Added HOLD candles as negative samples to increase training data from ~535 to ~3,200 samples. Introduced a negative_ratio parameter in generate_dataset_vectorized() for sampling HOLD candles alongside signal candles. Implemented stratified undersampling to ensure signal samples are preserved during training. Updated relevant tests to validate new functionality and maintain compatibility with existing tests.
- Modified dataset_builder.py to include HOLD negative sampling logic
- Updated train_model.py to apply stratified undersampling
- Added tests for new sampling methods
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- tune_hyperparams.py: 탐색 완료 후 Best AUC > Baseline AUC 이면
models/active_lgbm_params.json 자동 갱신
- tune_hyperparams.py: 베이스라인을 active 파일 기준으로 측정
(active 없으면 코드 내 기본값 사용)
- train_model.py: _load_lgbm_params()에 active 파일 자동 탐색 추가
우선순위: --tuned-params > active_lgbm_params.json > 하드코딩 기본값
- models/active_lgbm_params.json: 현재 best 파라미터로 초기화
- .gitignore: tune_results_*.json 제외, active 파일은 git 추적 유지
Made-with: Cursor
- scripts/tune_hyperparams.py: Optuna + Walk-Forward 5폴드 AUC 목적 함수
- 데이터셋 1회 캐싱으로 모든 trial 공유 (속도 최적화)
- num_leaves <= 2^max_depth - 1 제약 강제 (소규모 데이터 과적합 방지)
- MedianPruner로 저성능 trial 조기 종료
- 결과: 콘솔 리포트 + models/tune_results_YYYYMMDD_HHMMSS.json
- requirements.txt: optuna>=3.6.0 추가
- README.md: 하이퍼파라미터 자동 튜닝 사용법 섹션 추가
- docs/plans/: 설계 문서 및 구현 플랜 추가
Made-with: Cursor
바이낸스 OI 히스토리 API가 최근 30일치만 제공하는 제약을 우회하기 위해
upsert_parquet() 함수를 추가. 매일 실행 시 기존 parquet의 oi_change/funding_rate가
0.0인 구간만 신규 값으로 덮어써 점진적으로 과거 데이터를 채워나감.
--no-upsert 플래그로 기존 덮어쓰기 동작 유지 가능.
Made-with: Cursor
- Updated README.md to reflect new features including dynamic margin ratio, model hot-reload, and multi-symbol streaming.
- Modified bot logic to ensure raw signals are passed to the `_close_and_reenter` method, even when the ML filter is loaded.
- Introduced a new script `run_tests.sh` for streamlined test execution.
- Improved test coverage for signal processing and re-entry logic, ensuring correct behavior under various conditions.
- Added `_close_and_reenter` method to handle immediate re-entry after closing a position when a reverse signal is detected, contingent on passing the ML filter.
- Updated `process_candle` to call `_close_and_reenter` instead of `_close_position` for reverse signals.
- Enhanced test coverage for the new functionality, ensuring correct behavior under various conditions, including ML filter checks and position limits.
- Added new training log entries for lgbm backend with AUC, precision, and recall metrics.
- Enhanced deploy_model.sh to manage ONNX and lgbm model files based on the selected backend.
- Adjusted output shape in mlx_filter.py for ONNX export to support dynamic batch sizes.
- Updated `fetch_history.py` to collect open interest (OI) and funding rate data from Binance, improving the dataset for model training.
- Modified `train_and_deploy.sh` to include options for OI and funding rate collection during data fetching.
- Enhanced `dataset_builder.py` to incorporate OI change and funding rate features with rolling z-score normalization.
- Updated training logs to reflect new metrics and features, ensuring comprehensive tracking of model performance.
- Adjusted feature columns in `ml_features.py` to include OI and funding rate for improved model robustness.
- Introduced a new markdown document detailing the plan to transition the entire pipeline from a 1-minute to a 15-minute timeframe, aiming to improve model AUC from 0.49-0.50 to over 0.53.
- Updated key parameters across multiple scripts, including `LOOKAHEAD` adjustments and default data paths to reflect the new 15-minute interval.
- Modified data fetching and training scripts to ensure compatibility with the new timeframe, including changes in `fetch_history.py`, `train_model.py`, and `train_and_deploy.sh`.
- Enhanced the bot's data stream configuration to operate on a 15-minute interval, ensuring real-time data processing aligns with the new model training strategy.
- Updated training logs to capture new model performance metrics under the revised timeframe.
- Added a new markdown document outlining the plan to enhance the LightGBM model's AUC from 0.54 to 0.57+ through feature normalization, strong time weighting, and walk-forward validation.
- Implemented rolling z-score normalization for absolute value features in `src/dataset_builder.py` to improve model robustness against regime changes.
- Introduced a walk-forward validation function in `scripts/train_model.py` to accurately measure future prediction performance.
- Updated training log to include new model performance metrics and added ONNX model export functionality for compatibility.
- Adjusted model training parameters for better performance and included detailed validation results in the training log.
- Added a new stage to the Jenkins pipeline to notify Discord when a build starts, succeeds, or fails, improving communication during the CI/CD process.
- Implemented model hot-reload functionality in the MLFilter class, allowing automatic reloading of models when file changes are detected, enhancing responsiveness to updates.
- Updated deployment scripts to provide clearer messaging regarding model loading and container status, improving user experience and debugging capabilities.
- Introduced a new function `_split_combined` to separate XRP, BTC, and ETH data from a combined DataFrame.
- Updated `train_mlx` to utilize the new function, improving data management and feature handling.
- Adjusted dataset generation to accommodate BTC and ETH features, with warnings for missing features.
- Changed default data path in `train_mlx` and `train_model` to point to the combined dataset for consistency.
- Increased `LOOKAHEAD` from 60 to 90 and adjusted `ATR_TP_MULT` for better model performance.
- Updated `train_model.py` and `train_mlx_model.py` to include a time weight decay parameter for improved sample weighting during training.
- Modified dataset generation to incorporate sample weights based on time decay, enhancing model performance.
- Adjusted deployment scripts to support new backend options and improved error handling for model file transfers.
- Added new entries to the training log for better tracking of model performance metrics over time.
- Included ONNX model export functionality in the MLX filter for compatibility with Linux servers.
- Added a new design document outlining the integration of BTC/ETH candle data as additional features in the XRP ML filter, enhancing prediction accuracy.
- Introduced `MultiSymbolStream` for combined WebSocket data retrieval of XRP, BTC, and ETH.
- Expanded feature set from 13 to 21 by including 8 new BTC/ETH-related features.
- Updated various scripts and modules to support the new feature set and data handling.
- Enhanced training and deployment scripts to accommodate the new dataset structure.
This commit lays the groundwork for improved model performance by leveraging the correlation between BTC and ETH with XRP.
- Added comprehensive plans for training a LightGBM model on M4 Mac Mini and deploying it to an LXC container.
- Created scripts for model training, deployment, and a full pipeline execution.
- Enhanced model transfer with error handling and logging for better tracking.
- Introduced profiling for training time analysis and dataset generation optimization.
Made-with: Cursor
- Added a new function to accurately retrieve the number of allocated CPUs in containerized environments, improving parallel processing efficiency.
- Updated the dataset generation function to utilize the new CPU count function, ensuring optimal resource usage during model training.
Made-with: Cursor
- Created README.md to document project features, structure, and setup instructions.
- Updated fetch_history.py to include path adjustments for module imports.
- Enhanced train_model.py for parallel processing of dataset generation and added command-line argument for specifying worker count.
Made-with: Cursor
- Added MLFilter class to load and evaluate LightGBM model for trading signals.
- Introduced retraining mechanism to update the model daily based on new data.
- Created feature engineering and label building utilities for model training.
- Updated bot logic to incorporate ML filter for signal validation.
- Added scripts for data fetching and model training.
Made-with: Cursor