Added HOLD candles as negative samples to increase training data from ~535 to ~3,200 samples. Introduced a negative_ratio parameter in generate_dataset_vectorized() for sampling HOLD candles alongside signal candles. Implemented stratified undersampling to ensure signal samples are preserved during training. Updated relevant tests to validate new functionality and maintain compatibility with existing tests.
- Modified dataset_builder.py to include HOLD negative sampling logic
- Updated train_model.py to apply stratified undersampling
- Added tests for new sampling methods
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add negative_ratio parameter to generate_dataset_vectorized() that
samples HOLD candles as label=0 negatives alongside signal candles.
This increases training data from ~535 to ~3,200 samples when enabled.
- Split valid_rows into base_valid (shared) and sig_valid (signal-only)
- Add 'source' column ("signal" vs "hold_negative") for traceability
- HOLD samples get label=0 and random 50/50 side assignment
- Default negative_ratio=0 preserves backward compatibility
- Fix incorrect column count assertion in existing test
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Introduced a comprehensive architecture document detailing the CoinTrader system, including an overview, core layer architecture, MLOps pipeline, and key operational scenarios.
- Updated README to reference the new architecture document and added a configuration option to disable the ML filter.
- Enhanced the ML filter to allow for complete signal acceptance when the NO_ML_FILTER environment variable is set.