Add two new open-interest (OI) derived features to improve the ML
model's market microstructure understanding:
- oi_change_ma5: 5-candle moving average of OI change rate (short-term trend)
- oi_price_spread: z-scored OI minus z-scored price return (divergence signal)
Both features use a 96-candle rolling z-score window. FEATURE_COLS expanded
from 24 to 26. Existing tests updated to reflect the new feature count.
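A minimal sketch of how the two features could be computed with pandas, for illustration only. The input column names (open_interest, close), the helper names, and the choice to z-score the OI change rate and the 5-candle average are assumptions, not taken from the actual implementation:

```python
import numpy as np
import pandas as pd

Z_WINDOW = 96  # rolling z-score window named in the commit message
MA_WINDOW = 5  # moving-average length for oi_change_ma5


def rolling_zscore(series: pd.Series, window: int = Z_WINDOW) -> pd.Series:
    """Z-score each value against the trailing `window` observations."""
    mean = series.rolling(window).mean()
    std = series.rolling(window).std(ddof=0)
    # A flat window has zero std; map it to NaN rather than dividing by zero.
    return (series - mean) / std.replace(0.0, np.nan)


def add_oi_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with oi_change_ma5 and oi_price_spread added."""
    out = df.copy()
    oi_change = out["open_interest"].pct_change()  # assumed column name
    price_ret = out["close"].pct_change()          # assumed column name
    # Assumption: the 5-candle average is itself z-scored over 96 candles.
    out["oi_change_ma5"] = rolling_zscore(oi_change.rolling(MA_WINDOW).mean())
    # Divergence signal: positive when OI moves up more than price does.
    out["oi_price_spread"] = rolling_zscore(oi_change) - rolling_zscore(price_ret)
    return out
```

Under these assumptions the leading rows are NaN until each rolling window fills, so downstream code would need to drop or mask them.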
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The two HOLD negative tests (test_hold_negative_labels_are_all_zero,
test_signal_samples_preserved_after_sampling) were passing vacuously
because sample_df produces 0 signal candles (ADX ~18, below the
threshold of 25). Added a signal_producing_df fixture with higher
volatility and volume surges to reliably generate signals. Removed the
if-guards so the assertions always run. Also restored the full docstring for
generate_dataset_vectorized() documenting btc_df/eth_df,
time_weight_decay, and negative_ratio parameters.
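The vacuous-pass problem can be illustrated with a small, self-contained sketch. The function names and the list-of-dicts sample shape are hypothetical and simplified from the real tests; only the "source" and "label" fields mirror names used in this repository:

```python
def check_guarded(samples):
    """Old pattern: the assertion is skipped when there are no HOLD rows."""
    hold = [s for s in samples if s["source"] == "hold_negative"]
    if hold:  # with an empty list nothing is checked, so the test "passes"
        assert all(s["label"] == 0 for s in hold)


def check_mandatory(samples):
    """New pattern: an empty result is itself a failure."""
    hold = [s for s in samples if s["source"] == "hold_negative"]
    assert hold, "fixture produced no HOLD negatives"
    assert all(s["label"] == 0 for s in hold)
```

With an empty sample list, check_guarded returns silently while check_mandatory raises AssertionError, which is why the fixture has to actually produce signals.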
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a negative_ratio parameter to generate_dataset_vectorized() that
samples HOLD candles as label=0 negatives alongside signal candles.
This increases training data from ~535 to ~3,200 samples when enabled.
- Split valid_rows into base_valid (shared) and sig_valid (signal-only)
- Add 'source' column ("signal" vs "hold_negative") for traceability
- HOLD samples get label=0 and a random 50/50 side assignment
- Default negative_ratio=0 preserves backward compatibility
- Fix incorrect column count assertion in existing test
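A rough sketch of the sampling idea, not the vectorized implementation itself. The function name, the row-dict representation, the "LONG"/"SHORT" side values, and the seed parameter are assumptions made for illustration; negative_ratio is interpreted here as the number of HOLD negatives per signal sample:

```python
import random


def sample_hold_negatives(signal_rows, hold_rows, negative_ratio=0, seed=0):
    """Return signal rows plus up to negative_ratio * len(signal_rows) HOLD rows."""
    rng = random.Random(seed)
    # Signal rows are kept unchanged apart from the traceability tag.
    out = [dict(row, source="signal") for row in signal_rows]
    # Cap at the number of available HOLD candles.
    n_neg = min(len(hold_rows), negative_ratio * len(signal_rows))
    for row in rng.sample(hold_rows, n_neg):
        side = rng.choice(("LONG", "SHORT"))  # random 50/50 side assignment
        out.append(dict(row, label=0, side=side, source="hold_negative"))
    return out
```

Under this interpretation, the reported growth from ~535 to ~3,200 samples would be consistent with a ratio of 5 (535 x 6 = 3,210), though the value actually used is not stated here.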
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Added a new design document outlining the integration of BTC/ETH candle data as additional features in the XRP ML filter, with the aim of improving prediction accuracy.
- Introduced `MultiSymbolStream` for combined WebSocket retrieval of XRP, BTC, and ETH data.
- Expanded the feature set from 13 to 21 by adding 8 new BTC/ETH-related features.
- Updated various scripts and modules to support the new feature set and data handling.
- Enhanced training and deployment scripts to accommodate the new dataset structure.
This commit lays the groundwork for improved model performance by leveraging the correlation of BTC and ETH with XRP.
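As a rough illustration of how cross-symbol data might be attached to the XRP candles, the sketch below aligns BTC and ETH returns on the XRP timestamp index. The function name, the `close` column, and the `btc_return`/`eth_return` feature names are hypothetical; they are not the 8 features added in this commit, which are not listed here.

```python
import pandas as pd


def add_cross_symbol_returns(xrp_df, btc_df, eth_df):
    """Return a copy of xrp_df with one return column per reference symbol."""
    out = xrp_df.copy()
    for name, other in (("btc", btc_df), ("eth", eth_df)):
        # Candle-to-candle return of the reference symbol.
        ret = other["close"].pct_change()
        # Align on the XRP index; candles the other symbol lacks become NaN.
        out[f"{name}_return"] = ret.reindex(out.index)
    return out
```

Aligning by timestamp rather than by row position keeps the features correct when one stream drops a candle, at the cost of NaN rows that later steps must handle.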