feat: add OI derived features (oi_change_ma5, oi_price_spread)

This commit is contained in:
21in7
2026-03-04 20:32:51 +09:00
13 changed files with 1283 additions and 21 deletions

View File

@@ -47,7 +47,7 @@ bash scripts/deploy_model.sh
**5-layer data flow on each 15m candle close:** **5-layer data flow on each 15m candle close:**
1. `src/data_stream.py` — Combined WebSocket for XRP/BTC/ETH, deque buffers (200 candles each) 1. `src/data_stream.py` — Combined WebSocket for XRP/BTC/ETH, deque buffers (200 candles each)
2. `src/indicators.py` — RSI, MACD, BB, EMA, StochRSI, ATR; weighted signal aggregation → LONG/SHORT/HOLD 2. `src/indicators.py` — RSI, MACD, BB, EMA, StochRSI, ATR; weighted signal aggregation → LONG/SHORT/HOLD
3. `src/ml_filter.py` + `src/ml_features.py` — 24-feature extraction (ADX 포함), ONNX priority > LightGBM fallback, threshold ≥ 0.55 3. `src/ml_filter.py` + `src/ml_features.py` — 26-feature extraction (ADX + OI 파생 피처 포함), ONNX priority > LightGBM fallback, threshold ≥ 0.55
4. `src/exchange.py` + `src/risk_manager.py` — Dynamic margin, MARKET orders with SL/TP, daily loss limit (5%) 4. `src/exchange.py` + `src/risk_manager.py` — Dynamic margin, MARKET orders with SL/TP, daily loss limit (5%)
5. `src/user_data_stream.py` + `src/notifier.py` — Real-time TP/SL detection via WebSocket, Discord webhooks 5. `src/user_data_stream.py` + `src/notifier.py` — Real-time TP/SL detection via WebSocket, Discord webhooks
@@ -116,3 +116,4 @@ All design documents and implementation plans are stored in `docs/plans/` with t
| 2026-03-03 | `position-monitor-logging` | Completed | | 2026-03-03 | `position-monitor-logging` | Completed |
| 2026-03-03 | `adx-ml-feature-migration` (design + plan) | Completed | | 2026-03-03 | `adx-ml-feature-migration` (design + plan) | Completed |
| 2026-03-03 | `optuna-precision-objective-plan` | Pending | | 2026-03-03 | `optuna-precision-objective-plan` | Pending |
| 2026-03-04 | `oi-derived-features` (design + plan) | In Progress |

View File

@@ -0,0 +1,70 @@
# OI 파생 피처 설계
## 목표
기존 `oi_change` 피처에 더해, OI 데이터에서 파생 피처 2개를 만들어 LightGBM 학습 데이터에 추가하고, 피처 추가 전후 검증셋 성능을 자동 비교한다.
## 제약사항
- Binance OI 히스토리 API는 최근 30일분만 제공
- 학습 데이터에서 OI 유효 구간 ≈ 2,880개 15분 캔들
- A/B 비교 결과는 방향성 참고용 (통계적 유의성 제한)
## 파생 피처
### 1. `oi_change_ma5`
- **계산**: OI 변화율의 5캔들(75분) 이동평균
- **의미**: OI 단기 추세. 급감/급증 노이즈 제거된 방향성
- **정규화**: rolling z-score (288캔들 윈도우, 기존 패턴 동일)
- **기존 `oi_change`와의 관계**: smoothed 버전. 상관관계 높을 수 있으나 LightGBM이 자연 선택. importance 낮으면 이후 제거
### 2. `oi_price_spread`
- **계산**: `rolling_zscore(oi_change) - rolling_zscore(price_ret_1)`
- **의미**: OI와 가격 움직임 간 괴리도 (연속값)
- 양수: OI가 가격 대비 강세 (자금 유입)
- 음수: OI가 가격 대비 약세 (자금 유출)
- **정규화**: 양쪽 입력이 이미 z-score이므로 추가 정규화 불필요
- **바이너리 대신 연속값 채택 이유**: sign() 기반 바이너리는 미미한 차이도 1/0으로 분류 → 노이즈 과잉. 연속값은 LightGBM이 분할점을 학습
## 수정 대상 파일
### dataset_builder.py
- OI 파생 피처 2개 계산 로직 추가
- 기존 `oi_change` z-score 결과를 재사용하여 `oi_change_ma5` 계산
- `oi_price_spread` = `oi_change` z-score - `ret_1` z-score
### ml_features.py
- `FEATURE_COLS``oi_change_ma5`, `oi_price_spread` 추가 (24→26)
- `build_features()`에 실시간 계산 로직 추가
- `oi_change_ma5`: bot에서 전달받은 최근 5봉 OI MA
- `oi_price_spread`: 실시간 z-scored OI - z-scored price change
### train_model.py
- `--compare` 플래그 추가
- Baseline (기존 24피처) vs New (26피처) 자동 비교 출력:
- Precision, Recall, F1, AUC-ROC
- Feature importance top 10
- Best threshold
- 검증셋 크기 (n=XX) 및 "방향성 참고용" 경고
### bot.py
- OI 변화율 히스토리 deque(maxlen=5) 관리
- `_init_oi_history()`: 봇 시작 시 Binance OI hist API에서 최근 5봉 fetch → cold start 해결
- `_fetch_market_microstructure()` 확장: MA5 계산, price_spread 계산 후 build_features()에 전달
### exchange.py
- `get_oi_history(limit=5)`: 봇 초기화용 최근 OI 히스토리 fetch 메서드 추가
### scripts/collect_oi.py (신규)
- OI 장기 수집 스크립트
- 15분마다 cron 실행
- Binance `/fapi/v1/openInterest` 호출 → `data/oi_history.parquet`에 append
- 기존 fetch_history.py의 30일 데이터 보완용

View File

@@ -0,0 +1,764 @@
# OI 파생 피처 구현 계획
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** OI 파생 피처 2개(`oi_change_ma5`, `oi_price_spread`)를 추가하고, 기존 대비 성능을 자동 비교하며, OI 장기 수집 스크립트를 만든다.
**Architecture:** dataset_builder.py에 파생 피처 계산 추가 → ml_features.py에 FEATURE_COLS/build_features 확장 → train_model.py에 --compare 플래그로 A/B 비교 → bot.py에 OI deque 히스토리 관리 및 cold start → scripts/collect_oi.py 신규
**Tech Stack:** Python, LightGBM, pandas, numpy, Binance REST API
---
### Task 1: dataset_builder.py — OI 파생 피처 계산
**Files:**
- Modify: `src/dataset_builder.py:277-291` (OI/FR 피처 계산 블록)
- Test: `tests/test_dataset_builder.py`
**Step 1: Write failing tests**
`tests/test_dataset_builder.py` 끝에 추가:
```python
def test_oi_derived_features_present():
"""OI 파생 피처 2개가 결과에 포함되어야 한다."""
import numpy as np
import pandas as pd
from src.dataset_builder import _calc_features_vectorized, _calc_signals, _calc_indicators
n = 300
np.random.seed(42)
df = pd.DataFrame({
"open": np.random.uniform(1, 2, n),
"high": np.random.uniform(2, 3, n),
"low": np.random.uniform(0.5, 1, n),
"close": np.random.uniform(1, 2, n),
"volume": np.random.uniform(1000, 5000, n),
"oi_change": np.concatenate([np.zeros(100), np.random.uniform(-0.05, 0.05, 200)]),
})
d = _calc_indicators(df)
sig = _calc_signals(d)
feat = _calc_features_vectorized(d, sig)
assert "oi_change_ma5" in feat.columns, "oi_change_ma5 컬럼이 없음"
assert "oi_price_spread" in feat.columns, "oi_price_spread 컬럼이 없음"
def test_oi_derived_features_nan_when_no_oi():
"""oi_change 컬럼이 없으면 파생 피처도 nan이어야 한다."""
import numpy as np
import pandas as pd
from src.dataset_builder import _calc_features_vectorized, _calc_signals, _calc_indicators
n = 200
np.random.seed(0)
df = pd.DataFrame({
"open": np.random.uniform(1, 2, n),
"high": np.random.uniform(2, 3, n),
"low": np.random.uniform(0.5, 1, n),
"close": np.random.uniform(1, 2, n),
"volume": np.random.uniform(1000, 5000, n),
})
d = _calc_indicators(df)
sig = _calc_signals(d)
feat = _calc_features_vectorized(d, sig)
assert feat["oi_change_ma5"].isna().all(), "oi_change 컬럼 없을 때 oi_change_ma5는 전부 nan이어야 함"
assert feat["oi_price_spread"].isna().all(), "oi_change 컬럼 없을 때 oi_price_spread는 전부 nan이어야 함"
def test_oi_price_spread_is_continuous():
"""oi_price_spread는 바이너리가 아닌 연속값이어야 한다."""
import numpy as np
import pandas as pd
from src.dataset_builder import _calc_features_vectorized, _calc_signals, _calc_indicators
n = 300
np.random.seed(42)
df = pd.DataFrame({
"open": np.random.uniform(1, 2, n),
"high": np.random.uniform(2, 3, n),
"low": np.random.uniform(0.5, 1, n),
"close": np.random.uniform(1, 2, n),
"volume": np.random.uniform(1000, 5000, n),
"oi_change": np.random.uniform(-0.05, 0.05, n),
})
d = _calc_indicators(df)
sig = _calc_signals(d)
feat = _calc_features_vectorized(d, sig)
valid = feat["oi_price_spread"].dropna()
assert len(valid.unique()) > 2, "oi_price_spread는 연속값이어야 함 (2개 초과 유니크 값)"
```
**Step 2: Run tests to verify they fail**
Run: `bash scripts/run_tests.sh -k "oi_derived"`
Expected: FAIL — `oi_change_ma5`, `oi_price_spread` 컬럼 없음
**Step 3: Implement in dataset_builder.py**
`src/dataset_builder.py:277-291` (기존 OI/FR 블록) 뒤에 파생 피처 추가:
```python
# OI 변화율 / 펀딩비 피처
# 컬럼 없으면 전체 nan, 있으면 0.0 구간(데이터 미제공 구간)을 nan으로 마스킹
if "oi_change" in d.columns:
oi_raw = np.where(d["oi_change"].values == 0.0, np.nan, d["oi_change"].values)
else:
oi_raw = np.full(len(d), np.nan)
if "funding_rate" in d.columns:
fr_raw = np.where(d["funding_rate"].values == 0.0, np.nan, d["funding_rate"].values)
else:
fr_raw = np.full(len(d), np.nan)
oi_z = _rolling_zscore(oi_raw.astype(np.float64), window=96)
result["oi_change"] = oi_z
result["funding_rate"] = _rolling_zscore(fr_raw.astype(np.float64), window=96)
# --- OI 파생 피처 ---
# 1. oi_change_ma5: OI 변화율의 5캔들 이동평균 (단기 추세)
oi_series = pd.Series(oi_raw.astype(np.float64))
oi_ma5_raw = oi_series.rolling(window=5, min_periods=1).mean().values
result["oi_change_ma5"] = _rolling_zscore(oi_ma5_raw, window=96)
# 2. oi_price_spread: z-scored OI 변화율 - z-scored 가격 수익률 (연속값)
# 양수: OI가 가격 대비 강세 (자금 유입)
# 음수: OI가 가격 대비 약세 (자금 유출)
result["oi_price_spread"] = oi_z - ret_1_z
```
주의: 기존 `oi_change``funding_rate`의 window도 288→96으로 변경. `oi_z` 변수를 재사용하여 `oi_price_spread` 계산. `ret_1_z`는 이미 위에서 계산됨 (line 181).
**Step 4: Update OPTIONAL_COLS in generate_dataset_vectorized**
`src/dataset_builder.py:387` 수정:
```python
OPTIONAL_COLS = {"oi_change", "funding_rate", "oi_change_ma5", "oi_price_spread"}
```
**Step 5: Run tests to verify they pass**
Run: `bash scripts/run_tests.sh -k "oi_derived"`
Expected: 3 tests PASS
**Step 6: Run full test suite**
Run: `bash scripts/run_tests.sh`
Expected: All existing tests PASS (기존 oi_change/funding_rate 테스트 포함)
**Step 7: Commit**
```bash
git add src/dataset_builder.py tests/test_dataset_builder.py
git commit -m "feat: add oi_change_ma5 and oi_price_spread derived features to dataset builder"
```
---
### Task 2: ml_features.py — FEATURE_COLS 및 build_features() 확장
**Files:**
- Modify: `src/ml_features.py:4-15` (FEATURE_COLS), `src/ml_features.py:33-139` (build_features)
- Test: `tests/test_ml_features.py`
**Step 1: Write failing tests**
`tests/test_ml_features.py` 끝에 추가:
```python
def test_feature_cols_has_26_items():
from src.ml_features import FEATURE_COLS
assert len(FEATURE_COLS) == 26
def test_build_features_with_oi_derived_params():
"""oi_change_ma5, oi_price_spread 파라미터가 피처에 반영된다."""
xrp_df = _make_df(10, base_price=1.0)
btc_df = _make_df(10, base_price=50000.0)
eth_df = _make_df(10, base_price=3000.0)
features = build_features(
xrp_df, "LONG",
btc_df=btc_df, eth_df=eth_df,
oi_change=0.05, funding_rate=0.0002,
oi_change_ma5=0.03, oi_price_spread=0.12,
)
assert features["oi_change_ma5"] == pytest.approx(0.03)
assert features["oi_price_spread"] == pytest.approx(0.12)
def test_build_features_oi_derived_defaults_to_zero():
"""oi_change_ma5, oi_price_spread 미제공 시 0.0으로 채워진다."""
xrp_df = _make_df(10, base_price=1.0)
features = build_features(xrp_df, "LONG")
assert features["oi_change_ma5"] == pytest.approx(0.0)
assert features["oi_price_spread"] == pytest.approx(0.0)
```
기존 테스트 수정:
- `test_feature_cols_has_24_items` → 삭제 또는 숫자를 26으로 변경
- `test_build_features_with_btc_eth_has_24_features``assert len(features) == 26`
- `test_build_features_without_btc_eth_has_16_features``assert len(features) == 18`
**Step 2: Run tests to verify they fail**
Run: `bash scripts/run_tests.sh -k "test_feature_cols_has_26 or test_build_features_oi_derived"`
Expected: FAIL
**Step 3: Implement**
`src/ml_features.py` FEATURE_COLS 수정 (line 4-15):
```python
FEATURE_COLS = [
"rsi", "macd_hist", "bb_pct", "ema_align",
"stoch_k", "stoch_d", "atr_pct", "vol_ratio",
"ret_1", "ret_3", "ret_5", "signal_strength", "side",
"btc_ret_1", "btc_ret_3", "btc_ret_5",
"eth_ret_1", "eth_ret_3", "eth_ret_5",
"xrp_btc_rs", "xrp_eth_rs",
# 시장 미시구조: OI 변화율(z-score), 펀딩비(z-score)
"oi_change", "funding_rate",
# OI 파생 피처
"oi_change_ma5", "oi_price_spread",
"adx",
]
```
`build_features()` 시그니처 수정 (line 33-40):
```python
def build_features(
df: pd.DataFrame,
signal: str,
btc_df: pd.DataFrame | None = None,
eth_df: pd.DataFrame | None = None,
oi_change: float | None = None,
funding_rate: float | None = None,
oi_change_ma5: float | None = None,
oi_price_spread: float | None = None,
) -> pd.Series:
```
`build_features()` 끝부분 (line 134-138) 수정:
```python
base["oi_change"] = float(oi_change) if oi_change is not None else 0.0
base["funding_rate"] = float(funding_rate) if funding_rate is not None else 0.0
base["oi_change_ma5"] = float(oi_change_ma5) if oi_change_ma5 is not None else 0.0
base["oi_price_spread"] = float(oi_price_spread) if oi_price_spread is not None else 0.0
base["adx"] = float(last.get("adx", 0))
```
**Step 4: Run tests**
Run: `bash scripts/run_tests.sh -k "test_ml_features"`
Expected: All PASS
**Step 5: Run full test suite**
Run: `bash scripts/run_tests.sh`
Expected: All PASS (test_dataset_builder의 FEATURE_COLS 참조도 26개로 통과)
**Step 6: Commit**
```bash
git add src/ml_features.py tests/test_ml_features.py
git commit -m "feat: add oi_change_ma5 and oi_price_spread to FEATURE_COLS and build_features"
```
---
### Task 3: train_model.py — --compare A/B 비교 모드
**Files:**
- Modify: `scripts/train_model.py:425-452` (main, argparse)
- Test: 수동 실행 확인 (학습 스크립트는 통합 테스트)
**Step 1: Implement compare function**
`scripts/train_model.py``compare()` 함수 추가 (train() 함수 뒤):
```python
def compare(data_path: str, time_weight_decay: float = 2.0, tuned_params_path: str | None = None):
"""기존 피처 vs OI 파생 피처 추가 버전 A/B 비교."""
print("=" * 70)
print(" OI 파생 피처 A/B 비교 (30일 데이터 기반, 방향성 참고용)")
print("=" * 70)
df_raw = pd.read_parquet(data_path)
base_cols = ["open", "high", "low", "close", "volume"]
btc_df = eth_df = None
if "close_btc" in df_raw.columns:
btc_df = df_raw[[c + "_btc" for c in base_cols]].copy()
btc_df.columns = base_cols
if "close_eth" in df_raw.columns:
eth_df = df_raw[[c + "_eth" for c in base_cols]].copy()
eth_df.columns = base_cols
df = df_raw[base_cols].copy()
if "oi_change" in df_raw.columns:
df["oi_change"] = df_raw["oi_change"]
if "funding_rate" in df_raw.columns:
df["funding_rate"] = df_raw["funding_rate"]
dataset = generate_dataset_vectorized(
df, btc_df=btc_df, eth_df=eth_df,
time_weight_decay=time_weight_decay,
negative_ratio=5,
)
if dataset.empty:
raise ValueError("데이터셋 생성 실패")
lgbm_params, weight_scale = _load_lgbm_params(tuned_params_path)
# Baseline: OI 파생 피처 제외
BASELINE_EXCLUDE = {"oi_change_ma5", "oi_price_spread"}
baseline_cols = [c for c in FEATURE_COLS if c in dataset.columns and c not in BASELINE_EXCLUDE]
new_cols = [c for c in FEATURE_COLS if c in dataset.columns]
results = {}
for label, cols in [("Baseline (24)", baseline_cols), ("New (26)", new_cols)]:
X = dataset[cols]
y = dataset["label"]
w = dataset["sample_weight"].values
source = dataset["source"].values if "source" in dataset.columns else np.full(len(X), "signal")
split = int(len(X) * 0.8)
X_tr, X_val = X.iloc[:split], X.iloc[split:]
y_tr, y_val = y.iloc[:split], y.iloc[split:]
w_tr = (w[:split] * weight_scale).astype(np.float32)
source_tr = source[:split]
balanced_idx = stratified_undersample(y_tr.values, source_tr, seed=42)
X_tr_b = X_tr.iloc[balanced_idx]
y_tr_b = y_tr.iloc[balanced_idx]
w_tr_b = w_tr[balanced_idx]
import warnings
model = lgb.LGBMClassifier(**lgbm_params, random_state=42, verbose=-1)
with warnings.catch_warnings():
warnings.simplefilter("ignore")
model.fit(X_tr_b, y_tr_b, sample_weight=w_tr_b)
proba = model.predict_proba(X_val)[:, 1]
auc = roc_auc_score(y_val, proba) if len(np.unique(y_val)) > 1 else 0.5
precs, recs, thrs = precision_recall_curve(y_val, proba)
precs, recs = precs[:-1], recs[:-1]
valid_idx = np.where(recs >= 0.15)[0]
if len(valid_idx) > 0:
best_i = valid_idx[np.argmax(precs[valid_idx])]
thr, prec, rec = float(thrs[best_i]), float(precs[best_i]), float(recs[best_i])
else:
thr, prec, rec = 0.50, 0.0, 0.0
# Feature importance
imp = dict(zip(cols, model.feature_importances_))
top10 = sorted(imp.items(), key=lambda x: x[1], reverse=True)[:10]
results[label] = {
"auc": auc, "precision": prec, "recall": rec,
"threshold": thr, "n_val": len(y_val),
"n_val_pos": int(y_val.sum()), "top10": top10,
}
# 비교 테이블 출력
print(f"\n{'지표':<20} {'Baseline (24)':>15} {'New (26)':>15} {'Delta':>10}")
print("-" * 62)
for metric in ["auc", "precision", "recall", "threshold"]:
b = results["Baseline (24)"][metric]
n = results["New (26)"][metric]
d = n - b
sign = "+" if d > 0 else ""
print(f"{metric:<20} {b:>15.4f} {n:>15.4f} {sign}{d:>9.4f}")
n_val = results["Baseline (24)"]["n_val"]
n_pos = results["Baseline (24)"]["n_val_pos"]
print(f"\n검증셋: n={n_val} (양성={n_pos}, 음성={n_val - n_pos})")
print("⚠ 30일 데이터 기반 — 방향성 참고용\n")
print("Feature Importance Top 10 (New):")
for feat_name, imp_val in results["New (26)"]["top10"]:
marker = " ← NEW" if feat_name in BASELINE_EXCLUDE else ""
print(f" {feat_name:<25} {imp_val:>6}{marker}")
```
**Step 2: Add --compare flag to argparse**
`scripts/train_model.py` main() 함수의 argparse에 추가:
```python
parser.add_argument("--compare", action="store_true",
help="OI 파생 피처 추가 전후 A/B 성능 비교")
```
main() 분기에 추가:
```python
if args.compare:
compare(args.data, time_weight_decay=args.decay, tuned_params_path=args.tuned_params)
elif args.wf:
...
```
**Step 3: Commit**
```bash
git add scripts/train_model.py
git commit -m "feat: add --compare flag for OI derived features A/B comparison"
```
---
### Task 4: bot.py — OI deque 히스토리 및 실시간 파생 피처 공급
**Files:**
- Modify: `src/bot.py:15-31` (init), `src/bot.py:60-83` (fetch/calc), `src/bot.py:110-114,287-291` (build_features 호출)
- Modify: `src/exchange.py` (get_oi_history 추가)
- Test: `tests/test_bot.py`
**Step 1: Write failing tests**
`tests/test_bot.py` 끝에 추가:
```python
def test_bot_has_oi_history_deque(config):
"""봇이 OI 히스토리 deque를 가져야 한다."""
with patch("src.bot.BinanceFuturesClient"):
bot = TradingBot(config)
from collections import deque
assert isinstance(bot._oi_history, deque)
assert bot._oi_history.maxlen == 5
@pytest.mark.asyncio
async def test_init_oi_history_fills_deque(config):
"""_init_oi_history가 deque를 채워야 한다."""
with patch("src.bot.BinanceFuturesClient"):
bot = TradingBot(config)
bot.exchange.get_oi_history = AsyncMock(return_value=[0.01, -0.02, 0.03, -0.01, 0.02])
await bot._init_oi_history()
assert len(bot._oi_history) == 5
@pytest.mark.asyncio
async def test_fetch_microstructure_returns_derived_features(config):
"""_fetch_market_microstructure가 oi_change_ma5와 oi_price_spread를 반환해야 한다."""
with patch("src.bot.BinanceFuturesClient"):
bot = TradingBot(config)
bot.exchange.get_open_interest = AsyncMock(return_value=5000000.0)
bot.exchange.get_funding_rate = AsyncMock(return_value=0.0001)
bot._prev_oi = 4900000.0
bot._oi_history.extend([0.01, -0.02, 0.03, -0.01])
bot._latest_ret_1 = 0.01
result = await bot._fetch_market_microstructure()
assert len(result) == 4 # oi_change, funding_rate, oi_change_ma5, oi_price_spread
```
**Step 2: Run tests to verify they fail**
Run: `bash scripts/run_tests.sh -k "oi_history or fetch_microstructure_returns_derived"`
Expected: FAIL
**Step 3: Implement exchange.get_oi_history()**
`src/exchange.py`에 추가:
```python
async def get_oi_history(self, limit: int = 5) -> list[float]:
"""최근 OI 변화율 히스토리를 조회한다 (봇 초기화용). 실패 시 빈 리스트."""
loop = asyncio.get_event_loop()
try:
result = await loop.run_in_executor(
None,
lambda: self.client.futures_open_interest_hist(
symbol=self.config.symbol, period="15m", limit=limit + 1,
),
)
if len(result) < 2:
return []
oi_values = [float(r["sumOpenInterest"]) for r in result]
changes = []
for i in range(1, len(oi_values)):
if oi_values[i - 1] > 0:
changes.append((oi_values[i] - oi_values[i - 1]) / oi_values[i - 1])
else:
changes.append(0.0)
return changes
except Exception as e:
logger.warning(f"OI 히스토리 조회 실패 (무시): {e}")
return []
```
**Step 4: Implement bot.py changes**
`src/bot.py` `__init__` 수정:
```python
from collections import deque
# __init__에 추가:
self._oi_history: deque = deque(maxlen=5)
self._latest_ret_1: float = 0.0 # 최신 가격 수익률 (oi_price_spread용)
```
`_init_oi_history()` 추가:
```python
async def _init_oi_history(self) -> None:
"""봇 시작 시 최근 OI 변화율 히스토리를 조회하여 deque를 채운다."""
try:
changes = await self.exchange.get_oi_history(limit=5)
for c in changes:
self._oi_history.append(c)
if changes:
self._prev_oi = None # 다음 실시간 OI로 갱신
logger.info(f"OI 히스토리 초기화: {len(self._oi_history)}")
except Exception as e:
logger.warning(f"OI 히스토리 초기화 실패 (무시): {e}")
```
`_fetch_market_microstructure()` 수정 — 4-tuple 반환:
```python
async def _fetch_market_microstructure(self) -> tuple[float, float, float, float]:
"""OI 변화율, 펀딩비, OI MA5, OI-가격 스프레드를 실시간으로 조회한다."""
oi_val, fr_val = await asyncio.gather(
self.exchange.get_open_interest(),
self.exchange.get_funding_rate(),
return_exceptions=True,
)
if isinstance(oi_val, (int, float)) and oi_val > 0:
oi_change = self._calc_oi_change(float(oi_val))
else:
oi_change = 0.0
fr_float = float(fr_val) if isinstance(fr_val, (int, float)) else 0.0
# OI 히스토리 업데이트 및 MA5 계산
self._oi_history.append(oi_change)
oi_ma5 = sum(self._oi_history) / len(self._oi_history) if self._oi_history else 0.0
# OI-가격 스프레드 (단순 차이, 실시간에서는 z-score 없이 raw)
oi_price_spread = oi_change - self._latest_ret_1
logger.debug(
f"OI={oi_val}, OI변화율={oi_change:.6f}, 펀딩비={fr_float:.6f}, "
f"OI_MA5={oi_ma5:.6f}, OI_Price_Spread={oi_price_spread:.6f}"
)
return oi_change, fr_float, oi_ma5, oi_price_spread
```
`process_candle()` 수정:
```python
# 캔들 마감 시 가격 수익률 계산 (oi_price_spread용)
if len(df) >= 2:
prev_close = df["close"].iloc[-2]
curr_close = df["close"].iloc[-1]
self._latest_ret_1 = (curr_close - prev_close) / prev_close if prev_close != 0 else 0.0
oi_change, funding_rate, oi_ma5, oi_price_spread = await self._fetch_market_microstructure()
```
모든 `build_features()` 호출에 새 파라미터 추가:
```python
features = build_features(
df_with_indicators, signal,
btc_df=btc_df, eth_df=eth_df,
oi_change=oi_change, funding_rate=funding_rate,
oi_change_ma5=oi_ma5, oi_price_spread=oi_price_spread,
)
```
`_close_and_reenter()` 시그니처도 확장:
```python
async def _close_and_reenter(
self,
position: dict,
signal: str,
df,
btc_df=None,
eth_df=None,
oi_change: float = 0.0,
funding_rate: float = 0.0,
oi_change_ma5: float = 0.0,
oi_price_spread: float = 0.0,
) -> None:
```
`run()` 수정 — `_init_oi_history()` 호출 추가:
```python
async def run(self):
logger.info(f"봇 시작: {self.config.symbol}, 레버리지 {self.config.leverage}x")
await self._recover_position()
await self._init_oi_history()
...
```
**Step 5: Run tests**
Run: `bash scripts/run_tests.sh -k "test_bot"`
Expected: All PASS
**Step 6: Run full test suite**
Run: `bash scripts/run_tests.sh`
Expected: All PASS
**Step 7: Commit**
```bash
git add src/bot.py src/exchange.py tests/test_bot.py
git commit -m "feat: add OI history deque, cold start init, and derived features to bot runtime"
```
---
### Task 5: scripts/collect_oi.py — OI 장기 수집 스크립트
**Files:**
- Create: `scripts/collect_oi.py`
**Step 1: Implement**
```python
"""
OI 장기 수집 스크립트.
15분마다 cron 실행하여 Binance OI를 data/oi_history.parquet에 누적한다.
사용법:
python scripts/collect_oi.py
python scripts/collect_oi.py --symbol XRPUSDT
crontab 예시:
*/15 * * * * cd /path/to/cointrader && .venv/bin/python scripts/collect_oi.py >> logs/collect_oi.log 2>&1
"""
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent))
import argparse
from datetime import datetime, timezone
import pandas as pd
from binance.client import Client
from dotenv import load_dotenv
import os
load_dotenv()
OI_PATH = Path("data/oi_history.parquet")
def collect(symbol: str = "XRPUSDT"):
client = Client(
api_key=os.getenv("BINANCE_API_KEY", ""),
api_secret=os.getenv("BINANCE_API_SECRET", ""),
)
result = client.futures_open_interest(symbol=symbol)
oi_value = float(result["openInterest"])
ts = datetime.now(timezone.utc)
new_row = pd.DataFrame([{
"timestamp": ts,
"symbol": symbol,
"open_interest": oi_value,
}])
if OI_PATH.exists():
existing = pd.read_parquet(OI_PATH)
combined = pd.concat([existing, new_row], ignore_index=True)
else:
OI_PATH.parent.mkdir(parents=True, exist_ok=True)
combined = new_row
combined.to_parquet(OI_PATH, index=False)
print(f"[{ts.isoformat()}] OI={oi_value:.2f}{OI_PATH}")
def main():
parser = argparse.ArgumentParser(description="OI 장기 수집")
parser.add_argument("--symbol", default="XRPUSDT")
args = parser.parse_args()
collect(symbol=args.symbol)
if __name__ == "__main__":
main()
```
**Step 2: Commit**
```bash
git add scripts/collect_oi.py
git commit -m "feat: add OI long-term collection script for cron-based data accumulation"
```
---
### Task 6: 기존 테스트 수정 및 전체 검증
**Files:**
- Modify: `tests/test_ml_features.py` (피처 수 변경)
- Modify: `tests/test_bot.py` (기존 OI 테스트가 4-tuple 반환에 호환되도록)
**Step 1: Fix test_ml_features.py assertions**
- `test_feature_cols_has_24_items` → 26으로 변경
- `test_build_features_with_btc_eth_has_24_features` → 26
- `test_build_features_without_btc_eth_has_16_features` → 18
**Step 2: Fix test_bot.py**
기존 `test_process_candle_fetches_oi_and_funding` 등에서 `_fetch_market_microstructure` 반환값이 4-tuple이 되므로 mock 반환값 수정:
```python
bot._fetch_market_microstructure = AsyncMock(return_value=(0.02, 0.0001, 0.015, 0.01))
```
또는 `_fetch_market_microstructure`를 mock하지 않는 테스트는 exchange mock이 정상이면 자동 통과.
**Step 3: Run full test suite**
Run: `bash scripts/run_tests.sh`
Expected: All PASS
**Step 4: Commit**
```bash
git add tests/test_ml_features.py tests/test_bot.py
git commit -m "test: update test assertions for 26-feature model and 4-tuple microstructure"
```
---
### Task 7: CLAUDE.md 업데이트
**Files:**
- Modify: `CLAUDE.md`
**Step 1: Update plan table**
CLAUDE.md의 plan history 테이블에 추가:
```
| 2026-03-04 | `oi-derived-features` (design + plan) | In Progress |
```
ml_features.py 설명도 24→26개로 갱신.
**Step 2: Commit**
```bash
git add CLAUDE.md
git commit -m "docs: update CLAUDE.md with OI derived features plan status"
```

64
scripts/collect_oi.py Normal file
View File

@@ -0,0 +1,64 @@
"""
OI 장기 수집 스크립트.
15분마다 cron 실행하여 Binance OI를 data/oi_history.parquet에 누적한다.
사용법:
python scripts/collect_oi.py
python scripts/collect_oi.py --symbol XRPUSDT
crontab 예시:
*/15 * * * * cd /path/to/cointrader && .venv/bin/python scripts/collect_oi.py >> logs/collect_oi.log 2>&1
"""
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent))
import argparse
from datetime import datetime, timezone
import pandas as pd
from binance.client import Client
from dotenv import load_dotenv
import os
load_dotenv()
OI_PATH = Path("data/oi_history.parquet")
def collect(symbol: str = "XRPUSDT"):
client = Client(
api_key=os.getenv("BINANCE_API_KEY", ""),
api_secret=os.getenv("BINANCE_API_SECRET", ""),
)
result = client.futures_open_interest(symbol=symbol)
oi_value = float(result["openInterest"])
ts = datetime.now(timezone.utc)
new_row = pd.DataFrame([{
"timestamp": ts,
"symbol": symbol,
"open_interest": oi_value,
}])
if OI_PATH.exists():
existing = pd.read_parquet(OI_PATH)
combined = pd.concat([existing, new_row], ignore_index=True)
else:
OI_PATH.parent.mkdir(parents=True, exist_ok=True)
combined = new_row
combined.to_parquet(OI_PATH, index=False)
print(f"[{ts.isoformat()}] OI={oi_value:.2f}{OI_PATH}")
def main():
parser = argparse.ArgumentParser(description="OI 장기 수집")
parser.add_argument("--symbol", default="XRPUSDT")
args = parser.parse_args()
collect(symbol=args.symbol)
if __name__ == "__main__":
main()

View File

@@ -422,6 +422,113 @@ def walk_forward_auc(
print(f" 폴드별: {[round(a, 4) for a in aucs]}") print(f" 폴드별: {[round(a, 4) for a in aucs]}")
def compare(data_path: str, time_weight_decay: float = 2.0, tuned_params_path: str | None = None):
"""기존 피처 vs OI 파생 피처 추가 버전 A/B 비교."""
import warnings
print("=" * 70)
print(" OI 파생 피처 A/B 비교 (30일 데이터 기반, 방향성 참고용)")
print("=" * 70)
df_raw = pd.read_parquet(data_path)
base_cols = ["open", "high", "low", "close", "volume"]
btc_df = eth_df = None
if "close_btc" in df_raw.columns:
btc_df = df_raw[[c + "_btc" for c in base_cols]].copy()
btc_df.columns = base_cols
if "close_eth" in df_raw.columns:
eth_df = df_raw[[c + "_eth" for c in base_cols]].copy()
eth_df.columns = base_cols
df = df_raw[base_cols].copy()
if "oi_change" in df_raw.columns:
df["oi_change"] = df_raw["oi_change"]
if "funding_rate" in df_raw.columns:
df["funding_rate"] = df_raw["funding_rate"]
dataset = generate_dataset_vectorized(
df, btc_df=btc_df, eth_df=eth_df,
time_weight_decay=time_weight_decay,
negative_ratio=5,
)
if dataset.empty:
raise ValueError("데이터셋 생성 실패")
lgbm_params, weight_scale = _load_lgbm_params(tuned_params_path)
# Baseline: OI 파생 피처 제외
BASELINE_EXCLUDE = {"oi_change_ma5", "oi_price_spread"}
baseline_cols = [c for c in FEATURE_COLS if c in dataset.columns and c not in BASELINE_EXCLUDE]
new_cols = [c for c in FEATURE_COLS if c in dataset.columns]
results = {}
for label, cols in [("Baseline", baseline_cols), ("New", new_cols)]:
X = dataset[cols]
y = dataset["label"]
w = dataset["sample_weight"].values
source = dataset["source"].values if "source" in dataset.columns else np.full(len(X), "signal")
split = int(len(X) * 0.8)
X_tr, X_val = X.iloc[:split], X.iloc[split:]
y_tr, y_val = y.iloc[:split], y.iloc[split:]
w_tr = (w[:split] * weight_scale).astype(np.float32)
source_tr = source[:split]
balanced_idx = stratified_undersample(y_tr.values, source_tr, seed=42)
X_tr_b = X_tr.iloc[balanced_idx]
y_tr_b = y_tr.iloc[balanced_idx]
w_tr_b = w_tr[balanced_idx]
model = lgb.LGBMClassifier(**lgbm_params, random_state=42, verbose=-1)
with warnings.catch_warnings():
warnings.simplefilter("ignore")
model.fit(X_tr_b, y_tr_b, sample_weight=w_tr_b)
proba = model.predict_proba(X_val)[:, 1]
auc = roc_auc_score(y_val, proba) if len(np.unique(y_val)) > 1 else 0.5
precs, recs, thrs = precision_recall_curve(y_val, proba)
precs, recs = precs[:-1], recs[:-1]
valid_idx = np.where(recs >= 0.15)[0]
if len(valid_idx) > 0:
best_i = valid_idx[np.argmax(precs[valid_idx])]
thr, prec, rec = float(thrs[best_i]), float(precs[best_i]), float(recs[best_i])
else:
thr, prec, rec = 0.50, 0.0, 0.0
# Feature importance
imp = dict(zip(cols, model.feature_importances_))
top10 = sorted(imp.items(), key=lambda x: x[1], reverse=True)[:10]
results[label] = {
"auc": auc, "precision": prec, "recall": rec,
"threshold": thr, "n_val": len(y_val),
"n_val_pos": int(y_val.sum()), "top10": top10,
}
# 비교 테이블 출력
n_base = len(baseline_cols)
n_new = len(new_cols)
print(f"\n{'지표':<20} {f'Baseline({n_base})':>15} {f'New({n_new})':>15} {'Delta':>10}")
print("-" * 62)
for metric in ["auc", "precision", "recall", "threshold"]:
b = results["Baseline"][metric]
n = results["New"][metric]
d = n - b
sign = "+" if d > 0 else ""
print(f"{metric:<20} {b:>15.4f} {n:>15.4f} {sign}{d:>9.4f}")
n_val = results["Baseline"]["n_val"]
n_pos = results["Baseline"]["n_val_pos"]
print(f"\n검증셋: n={n_val} (양성={n_pos}, 음성={n_val - n_pos})")
print("⚠ 30일 데이터 기반 — 방향성 참고용\n")
print("Feature Importance Top 10 (New):")
for feat_name, imp_val in results["New"]["top10"]:
marker = " ← NEW" if feat_name in BASELINE_EXCLUDE else ""
print(f" {feat_name:<25} {imp_val:>6}{marker}")
def main(): def main():
parser = argparse.ArgumentParser() parser = argparse.ArgumentParser()
parser.add_argument("--data", default="data/combined_15m.parquet") parser.add_argument("--data", default="data/combined_15m.parquet")
@@ -435,9 +542,13 @@ def main():
"--tuned-params", type=str, default=None, "--tuned-params", type=str, default=None,
help="Optuna 튜닝 결과 JSON 경로 (지정 시 기본 파라미터를 덮어씀)", help="Optuna 튜닝 결과 JSON 경로 (지정 시 기본 파라미터를 덮어씀)",
) )
parser.add_argument("--compare", action="store_true",
help="OI 파생 피처 추가 전후 A/B 성능 비교")
args = parser.parse_args() args = parser.parse_args()
if args.wf: if args.compare:
compare(args.data, time_weight_decay=args.decay, tuned_params_path=args.tuned_params)
elif args.wf:
walk_forward_auc( walk_forward_auc(
args.data, args.data,
time_weight_decay=args.decay, time_weight_decay=args.decay,

View File

@@ -1,4 +1,5 @@
import asyncio import asyncio
from collections import deque
import pandas as pd import pandas as pd
from loguru import logger from loguru import logger
from src.config import Config from src.config import Config
@@ -24,6 +25,8 @@ class TradingBot:
self._entry_quantity: float | None = None self._entry_quantity: float | None = None
self._is_reentering: bool = False # _close_and_reenter 중 콜백 상태 초기화 방지 self._is_reentering: bool = False # _close_and_reenter 중 콜백 상태 초기화 방지
self._prev_oi: float | None = None # OI 변화율 계산용 이전 값 self._prev_oi: float | None = None # OI 변화율 계산용 이전 값
self._oi_history: deque = deque(maxlen=5)
self._latest_ret_1: float = 0.0
self.stream = MultiSymbolStream( self.stream = MultiSymbolStream(
symbols=[config.symbol, "BTCUSDT", "ETHUSDT"], symbols=[config.symbol, "BTCUSDT", "ETHUSDT"],
interval="15m", interval="15m",
@@ -57,21 +60,43 @@ class TradingBot:
else: else:
logger.info("기존 포지션 없음 - 신규 진입 대기") logger.info("기존 포지션 없음 - 신규 진입 대기")
async def _fetch_market_microstructure(self) -> tuple[float, float]: async def _init_oi_history(self) -> None:
"""OI 변화율과 펀딩비를 실시간으로 조회한다. 실패 시 0.0으로 폴백.""" """봇 시작 시 최근 OI 변화율 히스토리를 조회하여 deque를 채운다."""
try:
changes = await self.exchange.get_oi_history(limit=5)
for c in changes:
self._oi_history.append(c)
if changes:
self._prev_oi = None
logger.info(f"OI 히스토리 초기화: {len(self._oi_history)}")
except Exception as e:
logger.warning(f"OI 히스토리 초기화 실패 (무시): {e}")
async def _fetch_market_microstructure(self) -> tuple[float, float, float, float]:
"""OI 변화율, 펀딩비, OI MA5, OI-가격 스프레드를 실시간으로 조회한다."""
oi_val, fr_val = await asyncio.gather( oi_val, fr_val = await asyncio.gather(
self.exchange.get_open_interest(), self.exchange.get_open_interest(),
self.exchange.get_funding_rate(), self.exchange.get_funding_rate(),
return_exceptions=True, return_exceptions=True,
) )
# None(API 실패) 또는 Exception이면 _calc_oi_change를 호출하지 않고 0.0 반환
if isinstance(oi_val, (int, float)) and oi_val > 0: if isinstance(oi_val, (int, float)) and oi_val > 0:
oi_change = self._calc_oi_change(float(oi_val)) oi_change = self._calc_oi_change(float(oi_val))
else: else:
oi_change = 0.0 oi_change = 0.0
fr_float = float(fr_val) if isinstance(fr_val, (int, float)) else 0.0 fr_float = float(fr_val) if isinstance(fr_val, (int, float)) else 0.0
logger.debug(f"OI={oi_val}, OI변화율={oi_change:.6f}, 펀딩비={fr_float:.6f}")
return oi_change, fr_float # OI 히스토리 업데이트 및 MA5 계산
self._oi_history.append(oi_change)
oi_ma5 = sum(self._oi_history) / len(self._oi_history) if self._oi_history else 0.0
# OI-가격 스프레드
oi_price_spread = oi_change - self._latest_ret_1
logger.debug(
f"OI={oi_val}, OI변화율={oi_change:.6f}, 펀딩비={fr_float:.6f}, "
f"OI_MA5={oi_ma5:.6f}, OI_Price_Spread={oi_price_spread:.6f}"
)
return oi_change, fr_float, oi_ma5, oi_price_spread
def _calc_oi_change(self, current_oi: float) -> float: def _calc_oi_change(self, current_oi: float) -> float:
"""이전 OI 대비 변화율을 계산한다. 첫 캔들은 0.0 반환.""" """이전 OI 대비 변화율을 계산한다. 첫 캔들은 0.0 반환."""
@@ -85,8 +110,14 @@ class TradingBot:
async def process_candle(self, df, btc_df=None, eth_df=None): async def process_candle(self, df, btc_df=None, eth_df=None):
self.ml_filter.check_and_reload() self.ml_filter.check_and_reload()
# 가격 수익률 계산 (oi_price_spread용)
if len(df) >= 2:
prev_close = df["close"].iloc[-2]
curr_close = df["close"].iloc[-1]
self._latest_ret_1 = (curr_close - prev_close) / prev_close if prev_close != 0 else 0.0
# 캔들 마감 시 OI/펀딩비 실시간 조회 (실패해도 0으로 폴백) # 캔들 마감 시 OI/펀딩비 실시간 조회 (실패해도 0으로 폴백)
oi_change, funding_rate = await self._fetch_market_microstructure() oi_change, funding_rate, oi_ma5, oi_price_spread = await self._fetch_market_microstructure()
if not self.risk.is_trading_allowed(): if not self.risk.is_trading_allowed():
logger.warning("리스크 한도 초과 - 거래 중단") logger.warning("리스크 한도 초과 - 거래 중단")
@@ -111,6 +142,7 @@ class TradingBot:
df_with_indicators, signal, df_with_indicators, signal,
btc_df=btc_df, eth_df=eth_df, btc_df=btc_df, eth_df=eth_df,
oi_change=oi_change, funding_rate=funding_rate, oi_change=oi_change, funding_rate=funding_rate,
oi_change_ma5=oi_ma5, oi_price_spread=oi_price_spread,
) )
if self.ml_filter.is_model_loaded(): if self.ml_filter.is_model_loaded():
if not self.ml_filter.should_enter(features): if not self.ml_filter.should_enter(features):
@@ -126,6 +158,7 @@ class TradingBot:
position, raw_signal, df_with_indicators, position, raw_signal, df_with_indicators,
btc_df=btc_df, eth_df=eth_df, btc_df=btc_df, eth_df=eth_df,
oi_change=oi_change, funding_rate=funding_rate, oi_change=oi_change, funding_rate=funding_rate,
oi_change_ma5=oi_ma5, oi_price_spread=oi_price_spread,
) )
async def _open_position(self, signal: str, df): async def _open_position(self, signal: str, df):
@@ -272,6 +305,8 @@ class TradingBot:
eth_df=None, eth_df=None,
oi_change: float = 0.0, oi_change: float = 0.0,
funding_rate: float = 0.0, funding_rate: float = 0.0,
oi_change_ma5: float = 0.0,
oi_price_spread: float = 0.0,
) -> None: ) -> None:
"""기존 포지션을 청산하고, ML 필터 통과 시 반대 방향으로 즉시 재진입한다.""" """기존 포지션을 청산하고, ML 필터 통과 시 반대 방향으로 즉시 재진입한다."""
# 재진입 플래그: User Data Stream 콜백이 신규 포지션 상태를 초기화하지 않도록 보호 # 재진입 플래그: User Data Stream 콜백이 신규 포지션 상태를 초기화하지 않도록 보호
@@ -288,6 +323,7 @@ class TradingBot:
df, signal, df, signal,
btc_df=btc_df, eth_df=eth_df, btc_df=btc_df, eth_df=eth_df,
oi_change=oi_change, funding_rate=funding_rate, oi_change=oi_change, funding_rate=funding_rate,
oi_change_ma5=oi_change_ma5, oi_price_spread=oi_price_spread,
) )
if not self.ml_filter.should_enter(features): if not self.ml_filter.should_enter(features):
logger.info(f"ML 필터 차단: {signal} 재진입 무시") logger.info(f"ML 필터 차단: {signal} 재진입 무시")
@@ -300,6 +336,7 @@ class TradingBot:
async def run(self): async def run(self):
logger.info(f"봇 시작: {self.config.symbol}, 레버리지 {self.config.leverage}x") logger.info(f"봇 시작: {self.config.symbol}, 레버리지 {self.config.leverage}x")
await self._recover_position() await self._recover_position()
await self._init_oi_history()
balance = await self.exchange.get_balance() balance = await self.exchange.get_balance()
self.risk.set_base_balance(balance) self.risk.set_base_balance(balance)
logger.info(f"기준 잔고 설정: {balance:.2f} USDT (동적 증거금 비율 기준점)") logger.info(f"기준 잔고 설정: {balance:.2f} USDT (동적 증거금 비율 기준점)")

View File

@@ -287,8 +287,18 @@ def _calc_features_vectorized(
else: else:
fr_raw = np.full(len(d), np.nan) fr_raw = np.full(len(d), np.nan)
result["oi_change"] = _rolling_zscore(oi_raw.astype(np.float64)) oi_z = _rolling_zscore(oi_raw.astype(np.float64), window=96)
result["funding_rate"] = _rolling_zscore(fr_raw.astype(np.float64)) result["oi_change"] = oi_z
result["funding_rate"] = _rolling_zscore(fr_raw.astype(np.float64), window=96)
# --- OI 파생 피처 ---
# 1. oi_change_ma5: OI 변화율의 5캔들 이동평균 (단기 추세)
oi_series = pd.Series(oi_raw.astype(np.float64))
oi_ma5_raw = oi_series.rolling(window=5, min_periods=1).mean().values
result["oi_change_ma5"] = _rolling_zscore(oi_ma5_raw, window=96)
# 2. oi_price_spread: z-scored OI - z-scored 가격 수익률 (연속값)
result["oi_price_spread"] = oi_z - ret_1_z
return result return result
@@ -384,7 +394,7 @@ def generate_dataset_vectorized(
feat_all = _calc_features_vectorized(d, signal_arr, btc_df=btc_df, eth_df=eth_df) feat_all = _calc_features_vectorized(d, signal_arr, btc_df=btc_df, eth_df=eth_df)
# 신호 발생 + NaN 없음 + 미래 데이터 충분한 인덱스만 # 신호 발생 + NaN 없음 + 미래 데이터 충분한 인덱스만
OPTIONAL_COLS = {"oi_change", "funding_rate"} OPTIONAL_COLS = {"oi_change", "funding_rate", "oi_change_ma5", "oi_price_spread"}
available_cols_for_nan_check = [ available_cols_for_nan_check = [
c for c in FEATURE_COLS c for c in FEATURE_COLS
if c in feat_all.columns and c not in OPTIONAL_COLS if c in feat_all.columns and c not in OPTIONAL_COLS

View File

@@ -173,6 +173,30 @@ class BinanceFuturesClient:
logger.warning(f"펀딩비 조회 실패 (무시): {e}") logger.warning(f"펀딩비 조회 실패 (무시): {e}")
return None return None
async def get_oi_history(self, limit: int = 5) -> list[float]:
"""최근 OI 변화율 히스토리를 조회한다 (봇 초기화용). 실패 시 빈 리스트."""
loop = asyncio.get_event_loop()
try:
result = await loop.run_in_executor(
None,
lambda: self.client.futures_open_interest_hist(
symbol=self.config.symbol, period="15m", limit=limit + 1,
),
)
if len(result) < 2:
return []
oi_values = [float(r["sumOpenInterest"]) for r in result]
changes = []
for i in range(1, len(oi_values)):
if oi_values[i - 1] > 0:
changes.append((oi_values[i] - oi_values[i - 1]) / oi_values[i - 1])
else:
changes.append(0.0)
return changes
except Exception as e:
logger.warning(f"OI 히스토리 조회 실패 (무시): {e}")
return []
async def create_listen_key(self) -> str: async def create_listen_key(self) -> str:
"""POST /fapi/v1/listenKey — listenKey 신규 발급""" """POST /fapi/v1/listenKey — listenKey 신규 발급"""
loop = asyncio.get_event_loop() loop = asyncio.get_event_loop()

View File

@@ -9,8 +9,9 @@ FEATURE_COLS = [
"eth_ret_1", "eth_ret_3", "eth_ret_5", "eth_ret_1", "eth_ret_3", "eth_ret_5",
"xrp_btc_rs", "xrp_eth_rs", "xrp_btc_rs", "xrp_eth_rs",
# 시장 미시구조: OI 변화율(z-score), 펀딩비(z-score) # 시장 미시구조: OI 변화율(z-score), 펀딩비(z-score)
# parquet에 oi_change/funding_rate 컬럼이 없으면 dataset_builder에서 0으로 채움
"oi_change", "funding_rate", "oi_change", "funding_rate",
# OI 파생 피처
"oi_change_ma5", "oi_price_spread",
"adx", "adx",
] ]
@@ -37,12 +38,14 @@ def build_features(
eth_df: pd.DataFrame | None = None, eth_df: pd.DataFrame | None = None,
oi_change: float | None = None, oi_change: float | None = None,
funding_rate: float | None = None, funding_rate: float | None = None,
oi_change_ma5: float | None = None,
oi_price_spread: float | None = None,
) -> pd.Series: ) -> pd.Series:
""" """
기술 지표가 계산된 DataFrame의 마지막 행에서 ML 피처를 추출한다. 기술 지표가 계산된 DataFrame의 마지막 행에서 ML 피처를 추출한다.
btc_df, eth_df가 제공되면 24개 피처를, 없으면 16개 피처를 반환한다. btc_df, eth_df가 제공되면 26개 피처를, 없으면 18개 피처를 반환한다.
signal: "LONG" | "SHORT" signal: "LONG" | "SHORT"
oi_change, funding_rate: 실제 값이 제공되면 사용, 없으면 0.0으로 채운다. oi_change, funding_rate, oi_change_ma5, oi_price_spread: 실제 값이 제공되면 사용, 없으면 0.0으로 채운다.
""" """
last = df.iloc[-1] last = df.iloc[-1]
close = last["close"] close = last["close"]
@@ -132,8 +135,10 @@ def build_features(
}) })
# 실시간에서 실제 값이 제공되면 사용, 없으면 0으로 채운다 # 실시간에서 실제 값이 제공되면 사용, 없으면 0으로 채운다
base["oi_change"] = float(oi_change) if oi_change is not None else 0.0 base["oi_change"] = float(oi_change) if oi_change is not None else 0.0
base["funding_rate"] = float(funding_rate) if funding_rate is not None else 0.0 base["funding_rate"] = float(funding_rate) if funding_rate is not None else 0.0
base["oi_change_ma5"] = float(oi_change_ma5) if oi_change_ma5 is not None else 0.0
base["oi_price_spread"] = float(oi_price_spread) if oi_price_spread is not None else 0.0
base["adx"] = float(last.get("adx", 0)) base["adx"] = float(last.get("adx", 0))
return pd.Series(base) return pd.Series(base)

View File

@@ -227,6 +227,42 @@ async def test_process_candle_fetches_oi_and_funding(config, sample_df):
assert "funding_rate" in call_kwargs assert "funding_rate" in call_kwargs
def test_bot_has_oi_history_deque(config):
"""봇이 OI 히스토리 deque를 가져야 한다."""
from collections import deque
with patch("src.bot.BinanceFuturesClient"):
bot = TradingBot(config)
assert isinstance(bot._oi_history, deque)
assert bot._oi_history.maxlen == 5
@pytest.mark.asyncio
async def test_init_oi_history_fills_deque(config):
"""_init_oi_history가 deque를 채워야 한다."""
with patch("src.bot.BinanceFuturesClient"):
bot = TradingBot(config)
bot.exchange = AsyncMock()
bot.exchange.get_oi_history = AsyncMock(return_value=[0.01, -0.02, 0.03, -0.01, 0.02])
await bot._init_oi_history()
assert len(bot._oi_history) == 5
@pytest.mark.asyncio
async def test_fetch_microstructure_returns_4_tuple(config):
"""_fetch_market_microstructure가 4-tuple을 반환해야 한다."""
with patch("src.bot.BinanceFuturesClient"):
bot = TradingBot(config)
bot.exchange = AsyncMock()
bot.exchange.get_open_interest = AsyncMock(return_value=5000000.0)
bot.exchange.get_funding_rate = AsyncMock(return_value=0.0001)
bot._prev_oi = 4900000.0
bot._oi_history.extend([0.01, -0.02, 0.03, -0.01])
bot._latest_ret_1 = 0.01
result = await bot._fetch_market_microstructure()
assert len(result) == 4
def test_calc_oi_change_first_candle_returns_zero(config): def test_calc_oi_change_first_candle_returns_zero(config):
"""첫 캔들은 0.0을 반환하고 _prev_oi를 설정한다.""" """첫 캔들은 0.0을 반환하고 _prev_oi를 설정한다."""
with patch("src.bot.BinanceFuturesClient"): with patch("src.bot.BinanceFuturesClient"):

View File

@@ -266,3 +266,74 @@ def test_stratified_undersample_preserves_signal():
signal_indices = np.where(source == "signal")[0] signal_indices = np.where(source == "signal")[0]
for si in signal_indices: for si in signal_indices:
assert si in idx, f"signal 인덱스 {si}가 누락됨" assert si in idx, f"signal 인덱스 {si}가 누락됨"
def test_oi_derived_features_present():
"""OI 파생 피처 2개가 결과에 포함되어야 한다."""
import numpy as np
import pandas as pd
from src.dataset_builder import _calc_features_vectorized, _calc_signals, _calc_indicators
n = 300
np.random.seed(42)
df = pd.DataFrame({
"open": np.random.uniform(1, 2, n),
"high": np.random.uniform(2, 3, n),
"low": np.random.uniform(0.5, 1, n),
"close": np.random.uniform(1, 2, n),
"volume": np.random.uniform(1000, 5000, n),
"oi_change": np.concatenate([np.zeros(100), np.random.uniform(-0.05, 0.05, 200)]),
})
d = _calc_indicators(df)
sig = _calc_signals(d)
feat = _calc_features_vectorized(d, sig)
assert "oi_change_ma5" in feat.columns, "oi_change_ma5 컬럼이 없음"
assert "oi_price_spread" in feat.columns, "oi_price_spread 컬럼이 없음"
def test_oi_derived_features_nan_when_no_oi():
"""oi_change 컬럼이 없으면 파생 피처도 nan이어야 한다."""
import numpy as np
import pandas as pd
from src.dataset_builder import _calc_features_vectorized, _calc_signals, _calc_indicators
n = 200
np.random.seed(0)
df = pd.DataFrame({
"open": np.random.uniform(1, 2, n),
"high": np.random.uniform(2, 3, n),
"low": np.random.uniform(0.5, 1, n),
"close": np.random.uniform(1, 2, n),
"volume": np.random.uniform(1000, 5000, n),
})
d = _calc_indicators(df)
sig = _calc_signals(d)
feat = _calc_features_vectorized(d, sig)
assert feat["oi_change_ma5"].isna().all(), "oi_change 컬럼 없을 때 oi_change_ma5는 전부 nan이어야 함"
assert feat["oi_price_spread"].isna().all(), "oi_change 컬럼 없을 때 oi_price_spread는 전부 nan이어야 함"
def test_oi_price_spread_is_continuous():
"""oi_price_spread는 바이너리가 아닌 연속값이어야 한다."""
import numpy as np
import pandas as pd
from src.dataset_builder import _calc_features_vectorized, _calc_signals, _calc_indicators
n = 300
np.random.seed(42)
df = pd.DataFrame({
"open": np.random.uniform(1, 2, n),
"high": np.random.uniform(2, 3, n),
"low": np.random.uniform(0.5, 1, n),
"close": np.random.uniform(1, 2, n),
"volume": np.random.uniform(1000, 5000, n),
"oi_change": np.random.uniform(-0.05, 0.05, n),
})
d = _calc_indicators(df)
sig = _calc_signals(d)
feat = _calc_features_vectorized(d, sig)
valid = feat["oi_price_spread"].dropna()
assert len(valid.unique()) > 2, "oi_price_spread는 연속값이어야 함 (2개 초과 유니크 값)"

View File

@@ -113,3 +113,43 @@ async def test_get_funding_rate_error_returns_none(exchange):
) )
result = await exchange.get_funding_rate() result = await exchange.get_funding_rate()
assert result is None assert result is None
@pytest.mark.asyncio
async def test_get_oi_history_returns_changes(exchange):
"""get_oi_history()가 OI 변화율 리스트를 반환하는지 확인."""
exchange.client.futures_open_interest_hist = MagicMock(
return_value=[
{"sumOpenInterest": "1000000"},
{"sumOpenInterest": "1010000"},
{"sumOpenInterest": "1005000"},
{"sumOpenInterest": "1020000"},
{"sumOpenInterest": "1015000"},
{"sumOpenInterest": "1030000"},
]
)
result = await exchange.get_oi_history(limit=5)
assert len(result) == 5
assert isinstance(result[0], float)
# 첫 번째 변화율: (1010000 - 1000000) / 1000000 = 0.01
assert abs(result[0] - 0.01) < 1e-6
@pytest.mark.asyncio
async def test_get_oi_history_error_returns_empty(exchange):
"""API 오류 시 빈 리스트 반환 확인."""
exchange.client.futures_open_interest_hist = MagicMock(
side_effect=Exception("API error")
)
result = await exchange.get_oi_history(limit=5)
assert result == []
@pytest.mark.asyncio
async def test_get_oi_history_insufficient_data_returns_empty(exchange):
"""데이터가 부족하면 빈 리스트 반환 확인."""
exchange.client.futures_open_interest_hist = MagicMock(
return_value=[{"sumOpenInterest": "1000000"}]
)
result = await exchange.get_oi_history(limit=5)
assert result == []

View File

@@ -21,17 +21,17 @@ def _make_df(n=10, base_price=1.0):
}) })
def test_build_features_with_btc_eth_has_24_features(): def test_build_features_with_btc_eth_has_26_features():
xrp_df = _make_df(10, base_price=1.0) xrp_df = _make_df(10, base_price=1.0)
btc_df = _make_df(10, base_price=50000.0) btc_df = _make_df(10, base_price=50000.0)
eth_df = _make_df(10, base_price=3000.0) eth_df = _make_df(10, base_price=3000.0)
features = build_features(xrp_df, "LONG", btc_df=btc_df, eth_df=eth_df) features = build_features(xrp_df, "LONG", btc_df=btc_df, eth_df=eth_df)
assert len(features) == 24 assert len(features) == 26
def test_build_features_without_btc_eth_has_16_features(): def test_build_features_without_btc_eth_has_18_features():
xrp_df = _make_df(10, base_price=1.0) xrp_df = _make_df(10, base_price=1.0)
features = build_features(xrp_df, "LONG") features = build_features(xrp_df, "LONG")
assert len(features) == 16 assert len(features) == 18
def test_build_features_btc_ret_1_correct(): def test_build_features_btc_ret_1_correct():
xrp_df = _make_df(10, base_price=1.0) xrp_df = _make_df(10, base_price=1.0)
@@ -51,8 +51,9 @@ def test_build_features_rs_zero_when_btc_ret_zero():
assert features["xrp_btc_rs"] == 0.0 assert features["xrp_btc_rs"] == 0.0
def test_feature_cols_has_24_items(): def test_feature_cols_has_24_items():
"""Legacy test — updated to 26 after OI derived features added."""
from src.ml_features import FEATURE_COLS from src.ml_features import FEATURE_COLS
assert len(FEATURE_COLS) == 24 assert len(FEATURE_COLS) == 26
def make_df(n=100): def make_df(n=100):
@@ -139,3 +140,31 @@ def test_build_features_defaults_to_zero_when_not_provided(sample_df_with_indica
feat = build_features(sample_df_with_indicators, signal="LONG") feat = build_features(sample_df_with_indicators, signal="LONG")
assert feat["oi_change"] == pytest.approx(0.0) assert feat["oi_change"] == pytest.approx(0.0)
assert feat["funding_rate"] == pytest.approx(0.0) assert feat["funding_rate"] == pytest.approx(0.0)
def test_feature_cols_has_26_items():
from src.ml_features import FEATURE_COLS
assert len(FEATURE_COLS) == 26
def test_build_features_with_oi_derived_params():
"""oi_change_ma5, oi_price_spread 파라미터가 피처에 반영된다."""
xrp_df = _make_df(10, base_price=1.0)
btc_df = _make_df(10, base_price=50000.0)
eth_df = _make_df(10, base_price=3000.0)
features = build_features(
xrp_df, "LONG",
btc_df=btc_df, eth_df=eth_df,
oi_change=0.05, funding_rate=0.0002,
oi_change_ma5=0.03, oi_price_spread=0.12,
)
assert features["oi_change_ma5"] == pytest.approx(0.03)
assert features["oi_price_spread"] == pytest.approx(0.12)
def test_build_features_oi_derived_defaults_to_zero():
"""oi_change_ma5, oi_price_spread 미제공 시 0.0으로 채워진다."""
xrp_df = _make_df(10, base_price=1.0)
features = build_features(xrp_df, "LONG")
assert features["oi_change_ma5"] == pytest.approx(0.0)
assert features["oi_price_spread"] == pytest.approx(0.0)