Commit Graph

21 Commits

Author SHA1 Message Date
21in7
30ddb2fef4 feat(ml): relax training thresholds for 5-10x more training samples
Add TRAIN_* constants (signal_threshold=2, adx=15, vol_mult=1.5, neg_ratio=3)
as dataset_builder defaults. Remove hardcoded negative_ratio=5 from all callers.
Bot entry conditions unchanged (config.py strict values).

WF 5-fold results (all symbols AUC 0.91+):
- XRPUSDT: 0.9216 ± 0.0052
- SOLUSDT:  0.9174 ± 0.0063
- DOGEUSDT: 0.9222 ± 0.0085

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 19:38:15 +09:00
21in7
fe99885faa fix(ml): align dataset_builder default SL/TP with config (2.0/2.0)
Module-level ATR_SL_MULT was 1.5, now 2.0 to match config.py and CLI defaults.
This ensures generate_dataset_vectorized() produces correct labels even without
explicit parameters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 18:43:09 +09:00
21in7
75d1af7fcc feat(ml): parameterize SL/TP multipliers in dataset_builder
Add atr_sl_mult and atr_tp_mult parameters to _calc_labels_vectorized
and generate_dataset_vectorized, defaulting to existing constants (1.5,
2.0) for full backward compatibility. Callers (train scripts, backtester)
can now pass symbol-specific multipliers without modifying module-level
constants.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 18:03:24 +09:00
21in7
64f56806d2 fix: resolve 6 warning issues from code review
5. Add daily PnL reset loop — UTC midnight auto-reset via
   _daily_reset_loop in main.py, prevents stale daily_pnl accumulation
6. Fix set_base_balance race condition — call once in main.py before
   spawning bots, instead of each bot calling independently
7. Remove realized_pnl != 0 from close detection — prevents entry
   orders with small rp values being misclassified as closes
8. Rename xrp_btc_rs/xrp_eth_rs → primary_btc_rs/primary_eth_rs —
   generic column names for multi-symbol support (dataset_builder,
   ml_features, and tests updated consistently)
9. Replace asyncio.get_event_loop() → get_running_loop() — fixes
   DeprecationWarning on Python 3.10+
10. Parallelize candle preload — asyncio.gather for all symbols
    instead of sequential REST calls, ~3x faster startup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 22:44:40 +09:00
21in7
02e41881ac feat: strategy parameter sweep and production param optimization
- Add independent backtest engine (backtester.py) with walk-forward support
- Add backtest sanity check validator (backtest_validator.py)
- Add CLI tools: run_backtest.py, strategy_sweep.py (with --combined mode)
- Fix train-serve skew: unify feature z-score normalization (ml_features.py)
- Add strategy params (SL/TP ATR mult, ADX filter, volume multiplier) to
  config.py, indicators.py, dataset_builder.py, bot.py, backtester.py
- Fix WalkForwardBacktester not propagating strategy params to test folds
- Update production defaults: SL=2.0x, TP=2.0x, ADX=25, Vol=2.5
  (3-symbol combined PF: 0.71 → 1.24, MDD: 65.9% → 17.1%)
- Retrain ML models with new strategy parameters

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 23:39:43 +09:00
21in7
ff9e639142 feat: add OI derived features (oi_change_ma5, oi_price_spread) to dataset builder and ML features
Add two new OI-derived features to improve ML model's market microstructure
understanding:
- oi_change_ma5: 5-candle moving average of OI change rate (short-term trend)
- oi_price_spread: z-scored OI minus z-scored price return (divergence signal)

Both features use 96-candle rolling z-score window. FEATURE_COLS expanded from
24 to 26. Existing tests updated to reflect new feature counts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 20:07:40 +09:00
21in7
9c6f5dbd76 feat: remove ADX hard filter from dataset builder, add ADX as ML feature
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 21:17:49 +09:00
21in7
0af138d8ee feat: add stratified_undersample helper function
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 23:58:15 +09:00
21in7
b7ad358a0a fix: make HOLD negative sampling tests non-vacuous
The two HOLD negative tests (test_hold_negative_labels_are_all_zero,
test_signal_samples_preserved_after_sampling) were passing vacuously
because sample_df produces 0 signal candles (ADX ~18, below threshold
25). Added signal_producing_df fixture with higher volatility and volume
surges to reliably generate signals. Removed if-guards so assertions
are mandatory. Also restored the full docstring for
generate_dataset_vectorized() documenting btc_df/eth_df,
time_weight_decay, and negative_ratio parameters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 23:45:10 +09:00
21in7
8e56301d52 feat: add HOLD negative sampling to dataset_builder
Add negative_ratio parameter to generate_dataset_vectorized() that
samples HOLD candles as label=0 negatives alongside signal candles.
This increases training data from ~535 to ~3,200 samples when enabled.

- Split valid_rows into base_valid (shared) and sig_valid (signal-only)
- Add 'source' column ("signal" vs "hold_negative") for traceability
- HOLD samples get label=0 and random 50/50 side assignment
- Default negative_ratio=0 preserves backward compatibility
- Fix incorrect column count assertion in existing test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 23:34:45 +09:00
21in7
eeb5e9d877 feat: add ADX filter to block sideways market entries
ADX < 25 now returns HOLD in get_signal(), preventing entries during
trendless (sideways) markets. NaN ADX values fall through to existing
weighted signal logic. Also syncs the vectorized dataset builder with
the same ADX filter to keep training data consistent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 19:55:12 +09:00
21in7
bcc717776d fix: RS 계산을 np.divide(where=) 방식으로 교체 — epsilon 이상치 폭발 차단
Made-with: Cursor
2026-03-02 00:30:36 +09:00
21in7
820d8e0213 refactor: 분모 연산을 1e-8 epsilon 패턴으로 통일
Made-with: Cursor
2026-03-01 23:52:59 +09:00
21in7
417b8e3c6a feat: OI/펀딩비 결측 구간을 np.nan으로 마스킹 (0.0 → nan)
Made-with: Cursor
2026-03-01 23:52:19 +09:00
21in7
24d3ba9411 feat: enhance data fetching and model training with OI and funding rate integration
- Updated `fetch_history.py` to collect open interest (OI) and funding rate data from Binance, improving the dataset for model training.
- Modified `train_and_deploy.sh` to include options for OI and funding rate collection during data fetching.
- Enhanced `dataset_builder.py` to incorporate OI change and funding rate features with rolling z-score normalization.
- Updated training logs to reflect new metrics and features, ensuring comprehensive tracking of model performance.
- Adjusted feature columns in `ml_features.py` to include OI and funding rate for improved model robustness.
2026-03-01 22:25:38 +09:00
21in7
4245d7cdbf feat: implement 15-minute timeframe upgrade for model training and data processing
- Introduced a new markdown document detailing the plan to transition the entire pipeline from a 1-minute to a 15-minute timeframe, aiming to improve model AUC from 0.49-0.50 to over 0.53.
- Updated key parameters across multiple scripts, including `LOOKAHEAD` adjustments and default data paths to reflect the new 15-minute interval.
- Modified data fetching and training scripts to ensure compatibility with the new timeframe, including changes in `fetch_history.py`, `train_model.py`, and `train_and_deploy.sh`.
- Enhanced the bot's data stream configuration to operate on a 15-minute interval, ensuring real-time data processing aligns with the new model training strategy.
- Updated training logs to capture new model performance metrics under the revised timeframe.
2026-03-01 22:16:15 +09:00
21in7
a6697e7cca feat: implement LightGBM model improvement plan with feature normalization and walk-forward validation
- Added a new markdown document outlining the plan to enhance the LightGBM model's AUC from 0.54 to 0.57+ through feature normalization, strong time weighting, and walk-forward validation.
- Implemented rolling z-score normalization for absolute value features in `src/dataset_builder.py` to improve model robustness against regime changes.
- Introduced a walk-forward validation function in `scripts/train_model.py` to accurately measure future prediction performance.
- Updated training log to include new model performance metrics and added ONNX model export functionality for compatibility.
- Adjusted model training parameters for better performance and included detailed validation results in the training log.
2026-03-01 22:02:32 +09:00
21in7
d9238afaf9 feat: enhance MLX model training with combined data handling
- Introduced a new function `_split_combined` to separate XRP, BTC, and ETH data from a combined DataFrame.
- Updated `train_mlx` to utilize the new function, improving data management and feature handling.
- Adjusted dataset generation to accommodate BTC and ETH features, with warnings for missing features.
- Changed default data path in `train_mlx` and `train_model` to point to the combined dataset for consistency.
- Increased `LOOKAHEAD` from 60 to 90 and adjusted `ATR_TP_MULT` for better model performance.
2026-03-01 21:43:27 +09:00
21in7
db144750a3 feat: enhance model training and deployment scripts with time-weighted sampling
- Updated `train_model.py` and `train_mlx_model.py` to include a time weight decay parameter for improved sample weighting during training.
- Modified dataset generation to incorporate sample weights based on time decay, enhancing model performance.
- Adjusted deployment scripts to support new backend options and improved error handling for model file transfers.
- Added new entries to the training log for better tracking of model performance metrics over time.
- Included ONNX model export functionality in the MLX filter for compatibility with Linux servers.
2026-03-01 21:25:06 +09:00
21in7
d1af736bfc feat: implement BTC/ETH correlation features for improved model accuracy
- Added a new design document outlining the integration of BTC/ETH candle data as additional features in the XRP ML filter, enhancing prediction accuracy.
- Introduced `MultiSymbolStream` for combined WebSocket data retrieval of XRP, BTC, and ETH.
- Expanded feature set from 13 to 21 by including 8 new BTC/ETH-related features.
- Updated various scripts and modules to support the new feature set and data handling.
- Enhanced training and deployment scripts to accommodate the new dataset structure.

This commit lays the groundwork for improved model performance by leveraging the correlation between BTC and ETH with XRP.
2026-03-01 19:30:17 +09:00
21in7
e1560f882b feat: add vectorized dataset builder (1x pandas_ta call)
Made-with: Cursor
2026-03-01 18:52:34 +09:00