chore: update active LGBM parameters and add new training log entry

- Updated timestamp and elapsed seconds in models/active_lgbm_params.json.
- Adjusted baseline AUC and fold AUCs to reflect new model performance.
- Added a new entry in models/training_log.json with detailed metrics from the latest training run, including tuned parameters and model path.

Made-with: Cursor
feat: implement Active Config pattern for automatic param promotion
2026-03-02 15:03:35 +09:00 · 2026-03-02 14:56:42 +09:00 · 2026-03-02 14:52:41 +09:00 · 2026-03-02 14:50:50 +09:00 · 2026-03-02 14:45:15 +09:00 · 2026-03-02 14:41:13 +09:00
10 changed files with 2431 additions and 30 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -7,6 +7,8 @@ logs/
 .venv/
 venv/
 models/*.pkl
+models/*.onnx
+models/tune_results_*.json
 data/*.parquet
 .worktrees/
 .DS_Store
--- a/README.md
+++ b/README.md
@@ -48,6 +48,7 @@ cointrader/
 │   ├── train_model.py         # LightGBM model training (CPU)
 │   ├── train_mlx_model.py     # MLX neural network training (Apple Silicon GPU)
 │   ├── train_and_deploy.sh    # Full pipeline (collect → train → deploy to LXC)
+│   ├── tune_hyperparams.py    # Automated Optuna hyperparameter search (manual trigger)
 │   ├── deploy_model.sh        # Transfer model files to the LXC server
 │   └── run_tests.sh           # Run the full test suite
 ├── models/                    # Trained model storage (.pkl / .onnx)
@@ -160,6 +161,27 @@ bash scripts/deploy_model.sh mlx    # MLX (ONNX)

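 > **Model hot reload**: If you replace the model file while the bot is running, the bot detects the change at the next candle close and reloads it automatically. No restart is required.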

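+### Automated hyperparameter tuning (Optuna)
+
+When bot performance degrades or enough new data has accumulated, use Optuna to search for better LightGBM parameters.
+This is a **manually triggered** workflow: you review the results and approve them yourself before they are applied to retraining.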
+
+```bash
+# Default run (50 trials, 5-fold walk-forward, ~30 min)
+python scripts/tune_hyperparams.py
+
+# Quick test (10 trials, 3 folds, ~5 min)
+python scripts/tune_hyperparams.py --trials 10 --folds 3
+
+# Search only, skipping the baseline measurement
+python scripts/tune_hyperparams.py --no-baseline
+```
+
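+Results are saved to `models/tune_results_YYYYMMDD_HHMMSS.json`.
+The console prints the best parameters, the improvement over the baseline, and the per-fold AUCs, so review them before deciding.
+
+> **Caution**: Parameters found by Optuna carry a risk of overfitting. Before applying the best parameters to `train_model.py`, always review the variance of the per-fold AUCs and the size of the improvement.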
+
 ### Apple Silicon GPU-accelerated training (M1/M2/M3/M4)

 On M-series Macs, you can train on the integrated GPU (Metal) using MLX.
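
The README caution above asks reviewers to weigh the improvement against the spread of the per-fold AUCs. A minimal sketch of that check follows, using the illustrative numbers from the design document's sample report. The helper name `summarize_folds` and the idea of expressing the gain in units of the fold standard deviation are assumptions for illustration, not part of this commit.

```python
from statistics import mean, pstdev


def summarize_folds(fold_aucs: list[float], baseline_auc: float) -> dict:
    """Summarize fold AUCs and relate the gain over baseline to the fold spread.

    Hypothetical helper: pstdev uses the population formula, like np.std's default.
    """
    avg = mean(fold_aucs)
    spread = pstdev(fold_aucs)
    gain = avg - baseline_auc
    return {
        "mean": avg,
        "std": spread,
        "gain": gain,
        # Gain expressed in units of fold-to-fold standard deviation.
        "gain_over_std": gain / spread if spread > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Sample values taken from the design document's console report (illustrative).
    report = summarize_folds([0.6102, 0.6341, 0.6198, 0.6287, 0.6241], baseline_auc=0.5891)
    for key, value in report.items():
        print(f"{key:>14}: {value:.4f}")
```

A gain that is small relative to the fold spread is a hint, not proof, that the tuned parameters may not generalize; the final call stays with the reviewer.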
--- a/docs/plans/2026-03-02-optuna-hyperparam-tuning-design.md
+++ b/docs/plans/2026-03-02-optuna-hyperparam-tuning-design.md
@@ -0,0 +1,184 @@
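+# Optuna Automated Hyperparameter Tuning: Design Document
+
+**Date:** 2026-03-02  
+**Goal:** Build a manually triggered pipeline that uses Optuna to search LightGBM hyperparameters based on bot operation logs and training results, with a human reviewing and approving the results before they are applied to retraining
+
+---
+
+## Background and motivation
+
+The LightGBM parameters in `train_model.py` are currently hard-coded. Whenever bot performance degrades or new data accumulates, someone has to adjust the parameters by hand. This design automates the search with Optuna while keeping **a human review-and-approve step before results are applied**, to guard against overfitting.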
+
+---
+
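+## Implementation scope (two phases)
+
+### Phase 1 (current): LightGBM hyperparameter tuning
+- Create `scripts/tune_hyperparams.py`
+- Optuna with a walk-forward AUC objective function
+- Output results as JSON plus a console report
+
+### Phase 2 (later): extend to technical-indicator parameters
+- Add RSI thresholds, MACD weights, Stochastic RSI thresholds, the volume multiplier, the entry score threshold, and similar values to the search space
+- Requires parameterizing `_calc_signals()` in `dataset_builder.py`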
+
+---
+
+## Architecture
+
+```
+scripts/tune_hyperparams.py
+├── load_dataset()              ← load data + build the vectorized dataset once (cached)
+├── objective(trial, dataset)   ← Optuna trial function
+│   ├── trial.suggest_*()       ← sample hyperparameters
+│   ├── enforce num_leaves cap  ← 2^max_depth - 1 constraint
+│   └── _walk_forward_cv()      ← walk-forward cross-validation → returns mean AUC
+├── run_study()                 ← run the Optuna study (TPESampler + MedianPruner)
+├── print_report()              ← print the console report
+└── save_results()              ← save JSON (models/tune_results_YYYYMMDD_HHMMSS.json)
+```
+
+---
+
+## Search space (conservative design for a small dataset)
+
+| Parameter | Range | Type | Rationale |
+|---|---|---|---|
+| `n_estimators` | 100 ~ 600 | int | With little data, 500+ trees tend to overfit |
+| `learning_rate` | 0.01 ~ 0.2 | float (log) | Lower values favor generalization |
+| `max_depth` | 2 ~ 7 | int | Hard cap on tree depth |
+| `num_leaves` | 7 ~ min(31, 2^max_depth-1) | int | **Key**: guards against leaf-wise overfitting |
+| `min_child_samples` | 10 ~ 50 | int | Minimum samples per leaf |
+| `subsample` | 0.5 ~ 1.0 | float | Row sampling |
+| `colsample_bytree` | 0.5 ~ 1.0 | float | Column sampling |
+| `reg_alpha` | 1e-4 ~ 1.0 | float (log) | L1 regularization |
+| `reg_lambda` | 1e-4 ~ 1.0 | float (log) | L2 regularization |
+| `time_weight_decay` | 0.5 ~ 4.0 | float | Time-weight strength |
+
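+### Key constraint: `num_leaves <= 2^max_depth - 1`
+
+LightGBM grows trees leaf-wise, and a tree limited to `max_depth` can hold at most `2^max_depth` leaves. If `num_leaves` is set above that, it stops constraining the tree and only `max_depth` applies, so `num_leaves` is kept at or below `2^max_depth - 1`. Within each trial, `max_depth` is sampled first and the upper bound for `num_leaves` is computed from it. At `max_depth = 2` that bound is 3, below the nominal lower bound of 7, so the lower bound is clamped to the upper bound.
+
+```python
+max_depth = trial.suggest_int("max_depth", 2, 7)
+max_leaves = min(31, 2 ** max_depth - 1)
+# Clamp the lower bound so that low <= high also holds at max_depth = 2.
+num_leaves = trial.suggest_int("num_leaves", min(7, max_leaves), max_leaves)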
+```
+
+---
+
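+## Objective function: walk-forward AUC
+
+Reuses the `walk_forward_auc()` logic from the existing `train_model.py`. The dataset is built once before the study starts and shared by all trials (a speed optimization).
+
+```
+Full dataset (N samples)
+├── Fold 1: train[0:60%] → validate[60%:68%]
+├── Fold 2: train[0:68%] → validate[68%:76%]
+├── Fold 3: train[0:76%] → validate[76%:84%]
+├── Fold 4: train[0:84%] → validate[84%:92%]
+└── Fold 5: train[0:92%] → validate[92%:100%]
+Objective = mean AUC over the 5 folds (maximized)
+```
+
+### Pruning (early stopping)
+
+`MedianPruner` is applied: after each fold, the running mean AUC is reported to Optuna. If it falls below the median of previous trials at the same step, the remaining folds are skipped and the trial ends. This is expected to cut total search time by roughly 40%.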
+
+---
+
+## Output format
+
+### Console report
+
+```
+============================================================
+  Optuna tuning complete | 50 trials | elapsed: 28 min 42 s
+============================================================
+  Best AUC : 0.6234 (Trial #31)
+  Baseline : 0.5891 (current train_model.py fixed values)
+  Gain     : +0.0343 (+5.8%)
+------------------------------------------------------------
+  Best Parameters:
+    n_estimators      : 320
+    learning_rate     : 0.0412
+    max_depth         : 4
+    num_leaves        : 15
+    min_child_samples : 28
+    subsample         : 0.72
+    colsample_bytree  : 0.81
+    reg_alpha         : 0.0023
+    reg_lambda        : 0.0891
+    time_weight_decay : 2.31
+------------------------------------------------------------
+  Walk-forward per-fold AUC:
+    Fold 1: 0.6102
+    Fold 2: 0.6341
+    Fold 3: 0.6198
+    Fold 4: 0.6287
+    Fold 5: 0.6241
+    Mean: 0.6234 ± 0.0082
+------------------------------------------------------------
+  Saved to: models/tune_results_20260302_143022.json
+  Next step: python scripts/train_model.py --tuned-params models/tune_results_20260302_143022.json
+============================================================
+```
+
+### JSON output (`models/tune_results_YYYYMMDD_HHMMSS.json`)
+
+```json
+{
+  "timestamp": "2026-03-02T14:30:22",
+  "n_trials": 50,
+  "elapsed_sec": 1722,
+  "baseline_auc": 0.5891,
+  "best_trial": {
+    "number": 31,
+    "auc": 0.6234,
+    "fold_aucs": [0.6102, 0.6341, 0.6198, 0.6287, 0.6241],
+    "params": { ... }
+  },
+  "all_trials": [ ... ]
+}
+```
+
+---
+
+## Usage
+
+```bash
+# Default run (50 trials, 5 folds)
+python scripts/tune_hyperparams.py
+
+# Quick test (10 trials, 3 folds)
+python scripts/tune_hyperparams.py --trials 10 --folds 3
+
+# Specify the data path
+python scripts/tune_hyperparams.py --data data/combined_15m.parquet --trials 100
+```
+
+---
+
+## Files changed
+
+| File | Change | Description |
+|---|---|---|
+| `scripts/tune_hyperparams.py` | **New** | Optuna tuning script |
+| `requirements.txt` | **Modified** | Adds the `optuna` dependency |
+| `README.md` | **Modified** | Adds a tuning usage section |
+
+---
+
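+## Future extension (phase 2)
+
+Parameterize the `_calc_signals()` function in `dataset_builder.py` so that technical-indicator thresholds can also be added to the search space:
+
+```python
+# Example of the search space to be added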
+rsi_long_threshold  = trial.suggest_int("rsi_long",  25, 40)
+rsi_short_threshold = trial.suggest_int("rsi_short", 60, 75)
+vol_surge_mult      = trial.suggest_float("vol_surge_mult", 1.2, 2.5)
+entry_threshold     = trial.suggest_int("entry_threshold", 3, 5)
+stoch_low           = trial.suggest_int("stoch_low",  10, 30)
+stoch_high          = trial.suggest_int("stoch_high", 70, 90)
+```
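
The fold layout in the design document (train on an expanding prefix, validate on the next slice) comes down to index arithmetic. The sketch below mirrors the `step` and `train_end_start` computation used by `_walk_forward_cv()` in this commit; the function name `walk_forward_bounds` is hypothetical and only serves the illustration.

```python
def walk_forward_bounds(
    n: int, n_splits: int = 5, train_ratio: float = 0.6
) -> list[tuple[int, int]]:
    """Return (train_end, val_end) index pairs for expanding-window folds.

    Fold k trains on rows [0, train_end) and validates on [train_end, val_end).
    """
    step = max(1, int(n * (1 - train_ratio) / n_splits))
    train_end = int(n * train_ratio)
    bounds = []
    for _ in range(n_splits):
        val_end = train_end + step
        if val_end > n:  # not enough rows left for a full validation slice
            break
        bounds.append((train_end, val_end))
        train_end = val_end  # the next fold also trains on this validation slice
    return bounds


if __name__ == "__main__":
    for k, (tr_end, val_end) in enumerate(walk_forward_bounds(100), 1):
        print(f"fold {k}: train[0:{tr_end}] -> validate[{tr_end}:{val_end}]")
```

With `n = 100` the boundaries line up with the 60/68/76/84/92 percent marks drawn in the design document. Validation data always lies after the training data, so no future rows leak into training.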
--- a/docs/plans/2026-03-02-optuna-hyperparam-tuning-plan.md
+++ b/docs/plans/2026-03-02-optuna-hyperparam-tuning-plan.md
@@ -0,0 +1,569 @@
+# Optuna Automated Hyperparameter Tuning: Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Create `scripts/tune_hyperparams.py` to build an automated LightGBM hyperparameter search pipeline based on Optuna and walk-forward AUC.
+
+**Architecture:** The dataset is built and cached once before the study starts. Each Optuna trial samples LightGBM parameters and maximizes the walk-forward 5-fold AUC as its objective. The `num_leaves <= 2^max_depth - 1` constraint is enforced in code to prevent overfitting on a small dataset. Results are output as a console report plus a JSON file.
+
+**Tech Stack:** Python 3.11+, optuna, lightgbm, numpy, pandas, scikit-learn (reusing existing dependencies)
+
+**Design document:** `docs/plans/2026-03-02-optuna-hyperparam-tuning-design.md`
+
+---
+
+## Task 1: Add the optuna dependency
+
+**Files:**
+- Modify: `requirements.txt`
+
+**Step 1: Add optuna to requirements.txt**
+
+```
+optuna>=3.6.0
+```
+
+Append this line to the end of `requirements.txt`.
+
+**Step 2: Verify the installation (local)**
+
+```bash
+pip install optuna
+python -c "import optuna; print(optuna.__version__)"
+```
+
+Expected: a version number is printed (for example, `3.6.0`)
+
+**Step 3: Commit**
+
+```bash
+git add requirements.txt
+git commit -m "feat: add optuna dependency for hyperparameter tuning"
+```
+
+---
+
+## Task 2: Create the core structure of `scripts/tune_hyperparams.py`
+
+**Files:**
+- Create: `scripts/tune_hyperparams.py`
+
+**Step 1: Create the file (full code)**
+
+Save the code below as `scripts/tune_hyperparams.py`.
+
+```python
+"""
+Automated LightGBM hyperparameter search using Optuna.
+
+Usage:
+    python scripts/tune_hyperparams.py                          # default (50 trials, 5 folds)
+    python scripts/tune_hyperparams.py --trials 10 --folds 3   # quick test
+    python scripts/tune_hyperparams.py --data data/combined_15m.parquet --trials 100
+
+Output:
+    - Console: best params + walk-forward report
+    - JSON: models/tune_results_YYYYMMDD_HHMMSS.json
+"""
+import sys
+import warnings
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+import argparse
+import json
+import time
+from datetime import datetime
+
+import numpy as np
+import pandas as pd
+import lightgbm as lgb
+import optuna
+from optuna.samplers import TPESampler
+from optuna.pruners import MedianPruner
+from sklearn.metrics import roc_auc_score
+
+from src.ml_features import FEATURE_COLS
+from src.dataset_builder import generate_dataset_vectorized
+
+
+# ──────────────────────────────────────────────
+# Load data and build the dataset (cached once)
+# ──────────────────────────────────────────────
+
+def load_dataset(data_path: str) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
+    """
+    Load the parquet file, build the vectorized dataset, and return (X, y, w) as NumPy arrays.
+    Called once before the study starts so that every trial shares the same data.
+    """
+    print(f"Loading data: {data_path}")
+    df_raw = pd.read_parquet(data_path)
+    print(f"Candles: {len(df_raw):,}, columns: {list(df_raw.columns)}")
+
+    base_cols = ["open", "high", "low", "close", "volume"]
+    btc_df = eth_df = None
+
+    if "close_btc" in df_raw.columns:
+        btc_df = df_raw[[c + "_btc" for c in base_cols]].copy()
+        btc_df.columns = base_cols
+        print("BTC features enabled")
+
+    if "close_eth" in df_raw.columns:
+        eth_df = df_raw[[c + "_eth" for c in base_cols]].copy()
+        eth_df.columns = base_cols
+        print("ETH features enabled")
+
+    df = df_raw[base_cols].copy()
+
+    print("\nBuilding dataset (runs once)...")
+    dataset = generate_dataset_vectorized(df, btc_df=btc_df, eth_df=eth_df, time_weight_decay=0.0)
+
+    if dataset.empty or "label" not in dataset.columns:
+        raise ValueError("Dataset build failed: 0 samples")
+
+    actual_feature_cols = [c for c in FEATURE_COLS if c in dataset.columns]
+    X = dataset[actual_feature_cols].values.astype(np.float32)
+    y = dataset["label"].values.astype(np.int8)
+    w = dataset["sample_weight"].values.astype(np.float32)
+
+    pos = y.sum()
+    neg = (y == 0).sum()
+    print(f"Dataset ready: {len(dataset):,} samples (positive={pos:.0f}, negative={neg:.0f})")
+    print(f"Features used: {len(actual_feature_cols)}\n")
+
+    return X, y, w
+
+
+# ──────────────────────────────────────────────
+# Walk-forward cross-validation
+# ──────────────────────────────────────────────
+
+def _walk_forward_cv(
+    X: np.ndarray,
+    y: np.ndarray,
+    w: np.ndarray,
+    params: dict,
+    n_splits: int,
+    train_ratio: float,
+    trial: optuna.Trial | None = None,
+) -> tuple[float, list[float]]:
+    """
+    Return the mean AUC from walk-forward cross-validation.
+    If a trial is given, report the running value to Optuna after each fold to enable pruning.
+    """
+    n = len(X)
+    step = max(1, int(n * (1 - train_ratio) / n_splits))
+    train_end_start = int(n * train_ratio)
+
+    fold_aucs = []
+
+    for fold_idx in range(n_splits):
+        tr_end = train_end_start + fold_idx * step
+        val_end = tr_end + step
+        if val_end > n:
+            break
+
+        X_tr, y_tr, w_tr = X[:tr_end], y[:tr_end], w[:tr_end]
+        X_val, y_val = X[tr_end:val_end], y[tr_end:val_end]
+
+        # Class imbalance handling: undersampling (time order preserved)
+        pos_idx = np.where(y_tr == 1)[0]
+        neg_idx = np.where(y_tr == 0)[0]
+        if len(neg_idx) > len(pos_idx) and len(pos_idx) > 0:
+            rng = np.random.default_rng(42)
+            neg_idx = rng.choice(neg_idx, size=len(pos_idx), replace=False)
+        bal_idx = np.sort(np.concatenate([pos_idx, neg_idx]))
+
+        if len(bal_idx) < 20 or len(np.unique(y_val)) < 2:
+            fold_aucs.append(0.5)
+            continue
+
+        model = lgb.LGBMClassifier(**params, random_state=42, verbose=-1)
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            model.fit(X_tr[bal_idx], y_tr[bal_idx], sample_weight=w_tr[bal_idx])
+
+        proba = model.predict_proba(X_val)[:, 1]
+        auc = roc_auc_score(y_val, proba) if len(np.unique(y_val)) > 1 else 0.5
+        fold_aucs.append(auc)
+
+        # Optuna pruning: report the intermediate value
+        if trial is not None:
+            trial.report(float(np.mean(fold_aucs)), step=fold_idx)
+            if trial.should_prune():
+                raise optuna.TrialPruned()
+
+    mean_auc = float(np.mean(fold_aucs)) if fold_aucs else 0.5
+    return mean_auc, fold_aucs
+
+
+# ──────────────────────────────────────────────
+# Optuna objective function
+# ──────────────────────────────────────────────
+
+def make_objective(
+    X: np.ndarray,
+    y: np.ndarray,
+    w: np.ndarray,
+    n_splits: int,
+    train_ratio: float,
+):
+    """Return an objective function that captures the dataset in a closure."""
+
+    def objective(trial: optuna.Trial) -> float:
+        # ── Hyperparameter sampling ──
+        n_estimators = trial.suggest_int("n_estimators", 100, 600)
+        learning_rate = trial.suggest_float("learning_rate", 0.01, 0.2, log=True)
+        max_depth = trial.suggest_int("max_depth", 2, 7)
+
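+        # Key constraint: num_leaves <= 2^max_depth - 1 (guards against leaf-wise overfitting).
+        # The lower bound is clamped too, so the constraint also holds at max_depth = 2.
+        max_leaves_upper = min(31, 2 ** max_depth - 1)
+        num_leaves = trial.suggest_int("num_leaves", min(7, max_leaves_upper), max_leaves_upper)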
+
+        min_child_samples = trial.suggest_int("min_child_samples", 10, 50)
+        subsample = trial.suggest_float("subsample", 0.5, 1.0)
+        colsample_bytree = trial.suggest_float("colsample_bytree", 0.5, 1.0)
+        reg_alpha = trial.suggest_float("reg_alpha", 1e-4, 1.0, log=True)
+        reg_lambda = trial.suggest_float("reg_lambda", 1e-4, 1.0, log=True)
+
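+        # time_weight_decay should be applied when the dataset is built, but since the
+        # dataset is cached once, it is approximated by scaling LightGBM's sample_weight.
+        # Whatever decay is present in w is kept as-is and weight_scale only rescales it
+        # (note: load_dataset() builds w with time_weight_decay=0.0).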
+        weight_scale = trial.suggest_float("weight_scale", 0.5, 2.0)
+        w_scaled = (w * weight_scale).astype(np.float32)
+
+        params = {
+            "n_estimators": n_estimators,
+            "learning_rate": learning_rate,
+            "max_depth": max_depth,
+            "num_leaves": num_leaves,
+            "min_child_samples": min_child_samples,
+            "subsample": subsample,
+            "colsample_bytree": colsample_bytree,
+            "reg_alpha": reg_alpha,
+            "reg_lambda": reg_lambda,
+        }
+
+        mean_auc, fold_aucs = _walk_forward_cv(
+            X, y, w_scaled, params,
+            n_splits=n_splits,
+            train_ratio=train_ratio,
+            trial=trial,
+        )
+
+        # Store per-fold AUCs in user_attrs (for the results report)
+        trial.set_user_attr("fold_aucs", fold_aucs)
+
+        return mean_auc
+
+    return objective
+
+
+# ──────────────────────────────────────────────
+# Baseline AUC measurement (current fixed parameters)
+# ──────────────────────────────────────────────
+
+def measure_baseline(
+    X: np.ndarray,
+    y: np.ndarray,
+    w: np.ndarray,
+    n_splits: int,
+    train_ratio: float,
+) -> tuple[float, list[float]]:
+    """Measure the baseline AUC with the current fixed parameters from train_model.py."""
+    baseline_params = {
+        "n_estimators": 500,
+        "learning_rate": 0.05,
+        "num_leaves": 31,
+        "min_child_samples": 15,
+        "subsample": 0.8,
+        "colsample_bytree": 0.8,
+        "reg_alpha": 0.05,
+        "reg_lambda": 0.1,
+        "max_depth": -1,  # train_model.py currently does not set max_depth
+    }
+    print("Measuring baseline (current train_model.py fixed parameters)...")
+    return _walk_forward_cv(X, y, w, baseline_params, n_splits=n_splits, train_ratio=train_ratio)
+
+
+# ──────────────────────────────────────────────
+# Printing and saving results
+# ──────────────────────────────────────────────
+
+def print_report(
+    study: optuna.Study,
+    baseline_auc: float,
+    baseline_folds: list[float],
+    elapsed_sec: float,
+    output_path: Path,
+) -> None:
+    """Print the final report to the console."""
+    best = study.best_trial
+    best_auc = best.value
+    best_folds = best.user_attrs.get("fold_aucs", [])
+    improvement = best_auc - baseline_auc
+    improvement_pct = (improvement / baseline_auc * 100) if baseline_auc > 0 else 0.0
+
+    elapsed_min = int(elapsed_sec // 60)
+    elapsed_s = int(elapsed_sec % 60)
+
+    sep = "=" * 62
+    dash = "-" * 62
+
+    print(f"\n{sep}")
+    print(f"  Optuna tuning complete | {len(study.trials)} trials | elapsed: {elapsed_min} min {elapsed_s} s")
+    print(sep)
+    print(f"  Best AUC  : {best_auc:.4f}  (Trial #{best.number})")
+    print(f"  Baseline  : {baseline_auc:.4f}  (current train_model.py fixed values)")
+    sign = "+" if improvement >= 0 else ""
+    print(f"  Gain      : {sign}{improvement:.4f} ({sign}{improvement_pct:.1f}%)")
+    print(dash)
+    print("  Best Parameters:")
+    for k, v in best.params.items():
+        if isinstance(v, float):
+            print(f"    {k:<22}: {v:.6f}")
+        else:
+            print(f"    {k:<22}: {v}")
+    print(dash)
+    print("  Walk-forward per-fold AUC (best trial):")
+    for i, auc in enumerate(best_folds, 1):
+        print(f"    Fold {i}: {auc:.4f}")
+    if best_folds:
+        print(f"    Mean: {np.mean(best_folds):.4f} ± {np.std(best_folds):.4f}")
+    print(dash)
+    print("  Baseline per-fold AUC:")
+    for i, auc in enumerate(baseline_folds, 1):
+        print(f"    Fold {i}: {auc:.4f}")
+    if baseline_folds:
+        print(f"    Mean: {np.mean(baseline_folds):.4f} ± {np.std(baseline_folds):.4f}")
+    print(dash)
+    print(f"  Saved to: {output_path}")
+    print(f"  Next step: python scripts/train_model.py --tuned-params {output_path}")
+    print(sep)
+
+
+def save_results(
+    study: optuna.Study,
+    baseline_auc: float,
+    baseline_folds: list[float],
+    elapsed_sec: float,
+    data_path: str,
+) -> Path:
+    """Save the results to a JSON file and return its path."""
+    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+    output_path = Path(f"models/tune_results_{timestamp}.json")
+    output_path.parent.mkdir(exist_ok=True)
+
+    best = study.best_trial
+
+    all_trials = []
+    for t in study.trials:
+        if t.state == optuna.trial.TrialState.COMPLETE:
+            all_trials.append({
+                "number": t.number,
+                "auc": round(t.value, 6),
+                "fold_aucs": [round(a, 6) for a in t.user_attrs.get("fold_aucs", [])],
+                "params": {k: (round(v, 6) if isinstance(v, float) else v) for k, v in t.params.items()},
+            })
+
+    result = {
+        "timestamp": datetime.now().isoformat(),
+        "data_path": data_path,
+        "n_trials_total": len(study.trials),
+        "n_trials_complete": len(all_trials),
+        "elapsed_sec": round(elapsed_sec, 1),
+        "baseline": {
+            "auc": round(baseline_auc, 6),
+            "fold_aucs": [round(a, 6) for a in baseline_folds],
+        },
+        "best_trial": {
+            "number": best.number,
+            "auc": round(best.value, 6),
+            "fold_aucs": [round(a, 6) for a in best.user_attrs.get("fold_aucs", [])],
+            "params": {k: (round(v, 6) if isinstance(v, float) else v) for k, v in best.params.items()},
+        },
+        "all_trials": all_trials,
+    }
+
+    with open(output_path, "w", encoding="utf-8") as f:
+        json.dump(result, f, indent=2, ensure_ascii=False)
+
+    return output_path
+
+
+# ──────────────────────────────────────────────
+# Main
+# ──────────────────────────────────────────────
+
+def main():
+    parser = argparse.ArgumentParser(description="Optuna LightGBM hyperparameter tuning")
+    parser.add_argument("--data",   default="data/combined_15m.parquet", help="path to the training data")
+    parser.add_argument("--trials", type=int, default=50,  help="number of Optuna trials (default: 50)")
+    parser.add_argument("--folds",  type=int, default=5,   help="number of walk-forward folds (default: 5)")
+    parser.add_argument("--train-ratio", type=float, default=0.6, help="training window ratio (default: 0.6)")
+    parser.add_argument("--no-baseline", action="store_true", help="skip the baseline measurement")
+    args = parser.parse_args()
+
+    # 1. Load the dataset (once)
+    X, y, w = load_dataset(args.data)
+
+    # 2. Measure the baseline
+    if args.no_baseline:
+        baseline_auc, baseline_folds = 0.0, []
+        print("Skipping baseline measurement (--no-baseline)")
+    else:
+        baseline_auc, baseline_folds = measure_baseline(X, y, w, args.folds, args.train_ratio)
+        print(f"Baseline AUC: {baseline_auc:.4f} (per fold: {[round(a,4) for a in baseline_folds]})\n")
+
+    # 3. Run the Optuna study
+    optuna.logging.set_verbosity(optuna.logging.WARNING)
+    sampler = TPESampler(seed=42)
+    pruner  = MedianPruner(n_startup_trials=5, n_warmup_steps=2)
+    study   = optuna.create_study(
+        direction="maximize",
+        sampler=sampler,
+        pruner=pruner,
+        study_name="lgbm_wf_auc",
+    )
+
+    objective = make_objective(X, y, w, n_splits=args.folds, train_ratio=args.train_ratio)
+
+    print(f"Starting Optuna search: {args.trials} trials, {args.folds}-fold walk-forward")
+    print("(Progress is printed as each trial finishes)\n")
+
+    start_time = time.time()
+
+    def _progress_callback(study: optuna.Study, trial: optuna.trial.FrozenTrial):
+        if trial.state == optuna.trial.TrialState.COMPLETE:
+            best_so_far = study.best_value
+            print(
+                f"  Trial #{trial.number:3d} | AUC={trial.value:.4f} "
+                f"| Best={best_so_far:.4f} "
+                f"| {trial.params.get('num_leaves', '?')}leaves "
+                f"depth={trial.params.get('max_depth', '?')}"
+            )
+        elif trial.state == optuna.trial.TrialState.PRUNED:
+            print(f"  Trial #{trial.number:3d} | PRUNED")
+
+    study.optimize(
+        objective,
+        n_trials=args.trials,
+        callbacks=[_progress_callback],
+        show_progress_bar=False,
+    )
+
+    elapsed = time.time() - start_time
+
+    # 4. Save and print the results
+    output_path = save_results(study, baseline_auc, baseline_folds, elapsed, args.data)
+    print_report(study, baseline_auc, baseline_folds, elapsed, output_path)
+
+
+if __name__ == "__main__":
+    main()
+```
+
+**Step 2: Check for syntax errors**
+
+```bash
+cd /path/to/cointrader
+python -c "import ast; ast.parse(open('scripts/tune_hyperparams.py').read()); print('syntax OK')"
+```
+
+Expected: `syntax OK`
+
+**Step 3: Commit**
+
+```bash
+git add scripts/tune_hyperparams.py
+git commit -m "feat: add Optuna Walk-Forward AUC hyperparameter tuning script"
+```
+
+---
+
+## Task 3: Verify that it runs (quick test)
+
+**Files:**
+- Read: `scripts/tune_hyperparams.py`
+
+**Step 1: Run a quick test (10 trials, 3 folds)**
+
+```bash
+python scripts/tune_hyperparams.py --trials 10 --folds 3 --no-baseline
+```
+
+Expected:
+- All 10 trials complete without errors
+- `models/tune_results_YYYYMMDD_HHMMSS.json` is created
+- The best parameters are printed to the console
+
+**Step 2: Check the JSON results**
+
+```bash
+cat models/tune_results_*.json | python -m json.tool | head -40
+```
+
+Expected: the structure includes `best_trial.auc`, `best_trial.params`, and related fields
+
+**Step 3: Commit**
+
+```bash
+git add models/tune_results_*.json
+git commit -m "test: verify Optuna tuning pipeline with 10 trials"
+```
+
+---
+
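+## Task 4: Update README.md
+
+**Files:**
+- Modify: `README.md`
+
+**Step 1: Add tuning usage to the ML model training section**
+
+Add the following under the `## ML 모델 학습` (ML model training) section of `README.md`:
+
+````markdown
+### Automated hyperparameter tuning (Optuna)
+
+When bot performance degrades or enough new data has accumulated, use Optuna to search for better parameters.
+This is a **manually triggered** workflow: you review the results and approve them yourself before they are applied to retraining.
+
+```bash
+# Default run (50 trials, 5-fold walk-forward, ~30 min)
+python scripts/tune_hyperparams.py
+
+# Quick test (10 trials, 3 folds, ~5 min)
+python scripts/tune_hyperparams.py --trials 10 --folds 3
+
+# After reviewing and approving the results, retrain
+python scripts/train_model.py
+```
+
+Results are saved to `models/tune_results_YYYYMMDD_HHMMSS.json`.
+Review the best parameters and the improvement over the baseline, then decide for yourself.
+````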
+
+**Step 2: Commit**
+
+```bash
+git add README.md
+git commit -m "docs: add Optuna hyperparameter tuning usage to README"
+```
+
+---
+
+## Verification checklist
+
+- [ ] `python -c "import optuna"` runs without errors
+- [ ] `python scripts/tune_hyperparams.py --trials 10 --folds 3 --no-baseline` completes without errors
+- [ ] `models/tune_results_*.json` is created
+- [ ] The JSON contains `best_trial.params` and `best_trial.fold_aucs`
+- [ ] The console report prints the best AUC, per-fold AUCs, and parameters
+- [ ] The JSON confirms that `num_leaves <= 2^max_depth - 1` holds for every trial
+
+---
+
+## Future extension (phase 2, separate plan)
+
+Once the pipeline is stable, parameterize the `_calc_signals()` function in `dataset_builder.py` and add technical-indicator thresholds (RSI, Stochastic RSI, volume multiplier, entry score threshold) to the search space.
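
The plan's `num_leaves <= 2^max_depth - 1` constraint interacts with a fixed lower bound: for small depths the cap falls below 7 (at `max_depth = 2` it is 3), so the lower bound has to shrink with it. A minimal sketch of computing both bounds together follows. The helper name `num_leaves_bounds` and its `floor`/`cap` parameters are hypothetical, chosen to match the 7 and 31 used in the search space.

```python
def num_leaves_bounds(max_depth: int, floor: int = 7, cap: int = 31) -> tuple[int, int]:
    """Return (low, high) for num_leaves such that low <= high <= 2**max_depth - 1."""
    if max_depth < 2:
        # LightGBM needs num_leaves >= 2, which requires max_depth >= 2 here.
        raise ValueError("max_depth must be >= 2")
    high = min(cap, 2 ** max_depth - 1)
    low = min(floor, high)  # shrink the floor when the cap falls below it
    return low, high


if __name__ == "__main__":
    for depth in range(2, 8):
        low, high = num_leaves_bounds(depth)
        print(f"max_depth={depth}: num_leaves in [{low}, {high}]")
```

Inside an Optuna objective, the pair would feed `trial.suggest_int("num_leaves", low, high)` after `max_depth` has been sampled, which keeps the range valid for every depth in the 2 to 7 search space.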
--- a/models/active_lgbm_params.json
+++ b/models/active_lgbm_params.json
--- a/models/training_log.json
+++ b/models/training_log.json
@@ -276,5 +276,30 @@
    "features": 23,
    "time_weight_decay": 2.0,
    "model_path": "models/lgbm_filter.pkl"
+  },
+  {
+    "date": "2026-03-02T14:51:09.101738",
+    "backend": "lgbm",
+    "auc": 0.5361,
+    "best_threshold": 0.5308,
+    "best_precision": 0.406,
+    "best_recall": 0.371,
+    "samples": 533,
+    "features": 23,
+    "time_weight_decay": 2.0,
+    "model_path": "models/lgbm_filter.pkl",
+    "tuned_params_path": "models/tune_results_20260302_144749.json",
+    "lgbm_params": {
+      "n_estimators": 434,
+      "learning_rate": 0.123659,
+      "num_leaves": 14,
+      "min_child_samples": 10,
+      "subsample": 0.929062,
+      "colsample_bytree": 0.94633,
+      "reg_alpha": 0.573971,
+      "reg_lambda": 0.000157,
+      "max_depth": 6
+    },
+    "weight_scale": 1.783105
  }
 ]
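
The logged `lgbm_params` above (`num_leaves` 14 with `max_depth` 6) satisfy the plan's `num_leaves <= 2^max_depth - 1` checklist item. A sketch of running the same check over the `all_trials` list of a tune-results file follows. The function name `leaf_constraint_violations` is hypothetical, and the sample entries are illustrative rather than taken from a real results file.

```python
def leaf_constraint_violations(trials: list[dict]) -> list[int]:
    """Return the numbers of trials whose num_leaves exceeds 2**max_depth - 1."""
    violations = []
    for trial in trials:
        params = trial["params"]
        if params["num_leaves"] > 2 ** params["max_depth"] - 1:
            violations.append(trial["number"])
    return violations


if __name__ == "__main__":
    # Illustrative entries shaped like `all_trials` in a tune-results file.
    # Trial 0 reuses the logged num_leaves/max_depth; trial 1 is a made-up violation.
    all_trials = [
        {"number": 0, "params": {"max_depth": 6, "num_leaves": 14}},
        {"number": 1, "params": {"max_depth": 2, "num_leaves": 7}},
    ]
    print(leaf_constraint_violations(all_trials))
```

For a real file, the list would come from `json.load(...)["all_trials"]`; an empty result means every completed trial respected the constraint.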
--- a/requirements.txt
+++ b/requirements.txt
@@ -13,3 +13,4 @@ scikit-learn>=1.4.0
 joblib>=1.3.0
 pyarrow>=15.0.0
 onnxruntime>=1.18.0
+optuna>=3.6.0
--- a/scripts/run_optuna.sh
+++ b/scripts/run_optuna.sh
@@ -0,0 +1,49 @@
+#!/usr/bin/env bash
+# Search LightGBM hyperparameters with Optuna and print the results.
+# A human reviews and approves the results, then applies them to train_model.py manually.
+#
+# Usage:
+#   bash scripts/run_optuna.sh              # default (50 trials, 5 folds)
+#   bash scripts/run_optuna.sh 100          # 100 trials
+#   bash scripts/run_optuna.sh 100 3        # 100 trials, 3 folds
+#   bash scripts/run_optuna.sh 10 3 --no-baseline  # quick test
+#
+# After reviewing and approving the results:
+#   python scripts/train_model.py --tuned-params models/tune_results_YYYYMMDD_HHMMSS.json
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
+
+VENV_PATH="${VENV_PATH:-$PROJECT_ROOT/.venv}"
+if [ -f "$VENV_PATH/bin/activate" ]; then
+    # shellcheck source=/dev/null
+    source "$VENV_PATH/bin/activate"
+else
+    echo "Warning: virtual environment not found ($VENV_PATH). Using the system Python." >&2
+fi
+
+TRIALS="${1:-50}"
+FOLDS="${2:-5}"
+EXTRA_ARGS="${3:-}"
+
+cd "$PROJECT_ROOT"
+
+echo "=== Optuna hyperparameter search ==="
+echo "  trials=${TRIALS}, folds=${FOLDS}"
+echo ""
+
+python scripts/tune_hyperparams.py \
+    --trials "$TRIALS" \
+    --folds  "$FOLDS" \
+    $EXTRA_ARGS
+
+echo ""
+echo "=== Search complete ==="
+echo ""
+echo "Review the result JSON and, once approved, retrain with:"
+echo "  python scripts/train_model.py --tuned-params models/tune_results_<timestamp>.json"
+echo ""
+echo "Retrain together with walk-forward validation:"
+echo "  python scripts/train_model.py --tuned-params models/tune_results_<timestamp>.json --wf"
--- a/scripts/train_model.py
+++ b/scripts/train_model.py
@@ -146,7 +146,52 @@ def generate_dataset(df: pd.DataFrame, n_jobs: int | None = None) -> pd.DataFram
    return pd.DataFrame(rows)


-def train(data_path: str, time_weight_decay: float = 2.0):
+ACTIVE_PARAMS_PATH = Path("models/active_lgbm_params.json")
+
+
+def _load_lgbm_params(tuned_params_path: str | None) -> tuple[dict, float]:
+    """Return the default LightGBM parameters, overridden by a tuning JSON if one is given.
+
+    Priority:
+      1. Explicit --tuned-params argument
+      2. models/active_lgbm_params.json (updated automatically by Optuna)
+      3. Hard-coded defaults in the code (fallback)
+    """
+    lgbm_params: dict = {
+        "n_estimators":      434,
+        "learning_rate":     0.123659,
+        "max_depth":         6,
+        "num_leaves":        14,
+        "min_child_samples": 10,
+        "subsample":         0.929062,
+        "colsample_bytree":  0.946330,
+        "reg_alpha":         0.573971,
+        "reg_lambda":        0.000157,
+    }
+    weight_scale = 1.783105
+
+    # Without an explicit argument, look for the active file automatically
+    resolved_path = tuned_params_path or (
+        str(ACTIVE_PARAMS_PATH) if ACTIVE_PARAMS_PATH.exists() else None
+    )
+
+    if resolved_path:
+        with open(resolved_path, "r", encoding="utf-8") as f:
+            tune_data = json.load(f)
+        best_params = dict(tune_data["best_trial"]["params"])
+        weight_scale = float(best_params.pop("weight_scale", 1.0))
+        lgbm_params.update(best_params)
+        source = "explicit argument" if tuned_params_path else "auto-loaded active file"
+        print(f"\n[Optuna] Loaded tuned parameters ({source}): {resolved_path}")
+        print(f"[Optuna] Applied parameters: {lgbm_params}")
+        print(f"[Optuna] weight_scale: {weight_scale}\n")
+    else:
+        print("[Optuna] No active file → using the hard-coded default parameters\n")
+
+    return lgbm_params, weight_scale
+
+
+def train(data_path: str, time_weight_decay: float = 2.0, tuned_params_path: str | None = None):
    print(f"Loading data: {data_path}")
    df_raw = pd.read_parquet(data_path)
    print(f"Candles: {len(df_raw)}, columns: {list(df_raw.columns)}")
@@ -188,7 +233,10 @@ def train(data_path: str, time_weight_decay: float = 2.0):
    split = int(len(X) * 0.8)
    X_train, X_val = X.iloc[:split], X.iloc[split:]
    y_train, y_val = y.iloc[:split], y.iloc[split:]
-    w_train = w[:split]
+
+    # Load tuned parameters (fall back to defaults if none are found)
+    lgbm_params, weight_scale = _load_lgbm_params(tuned_params_path)
+    w_train = (w[:split] * weight_scale).astype(np.float32)

    # --- Class imbalance handling: undersampling (time-weight indices preserved) ---
    pos_idx = np.where(y_train == 1)[0]
@@ -208,18 +256,7 @@ def train(data_path: str, time_weight_decay: float = 2.0):
    print(f"Validation data: {len(X_val)} samples (positive={int(y_val.sum())}, negative={int((y_val==0).sum())})")
    # ---------------------------------------------------------------

-    model = lgb.LGBMClassifier(
-        n_estimators=500,
-        learning_rate=0.05,
-        num_leaves=31,
-        min_child_samples=15,
-        subsample=0.8,
-        colsample_bytree=0.8,
-        reg_alpha=0.05,
-        reg_lambda=0.1,
-        random_state=42,
-        verbose=-1,
-    )
+    model = lgb.LGBMClassifier(**lgbm_params, random_state=42, verbose=-1)
    model.fit(
        X_train, y_train,
        sample_weight=w_train,
@@ -268,7 +305,7 @@ def train(data_path: str, time_weight_decay: float = 2.0):
    if LOG_PATH.exists():
        with open(LOG_PATH) as f:
            log = json.load(f)
-    log.append({
+    log_entry: dict = {
        "date": datetime.now().isoformat(),
        "backend": "lgbm",
        "auc": round(auc, 4),
@@ -279,7 +316,11 @@ def train(data_path: str, time_weight_decay: float = 2.0):
        "features": len(actual_feature_cols),
        "time_weight_decay": time_weight_decay,
        "model_path": str(MODEL_PATH),
-    })
+        "tuned_params_path": tuned_params_path,
+        "lgbm_params": lgbm_params,
+        "weight_scale": weight_scale,
+    }
+    log.append(log_entry)
    with open(LOG_PATH, "w") as f:
        json.dump(log, f, indent=2)

@@ -291,6 +332,7 @@ def walk_forward_auc(
    time_weight_decay: float = 2.0,
    n_splits: int = 5,
    train_ratio: float = 0.6,
+    tuned_params_path: str | None = None,
 ) -> None:
    """Walk-Forward validation: repeat train/validate n_splits times over a sliding window.

@@ -320,6 +362,9 @@ def walk_forward_auc(
    w = dataset["sample_weight"].values
    n = len(dataset)

+    lgbm_params, weight_scale = _load_lgbm_params(tuned_params_path)
+    w = (w * weight_scale).astype(np.float32)
+
    step = max(1, int(n * (1 - train_ratio) / n_splits))
    train_end_start = int(n * train_ratio)

@@ -340,18 +385,7 @@ def walk_forward_auc(
            neg_idx = np.random.choice(neg_idx, size=len(pos_idx), replace=False)
        idx = np.sort(np.concatenate([pos_idx, neg_idx]))

-        model = lgb.LGBMClassifier(
-            n_estimators=500,
-            learning_rate=0.05,
-            num_leaves=31,
-            min_child_samples=15,
-            subsample=0.8,
-            colsample_bytree=0.8,
-            reg_alpha=0.05,
-            reg_lambda=0.1,
-            random_state=42,
-            verbose=-1,
-        )
+        model = lgb.LGBMClassifier(**lgbm_params, random_state=42, verbose=-1)
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            model.fit(X_tr[idx], y_tr[idx], sample_weight=w_tr[idx])
@@ -377,12 +411,21 @@ def main():
    )
    parser.add_argument("--wf", action="store_true", help="Run Walk-Forward validation")
    parser.add_argument("--wf-splits", type=int, default=5, help="Number of Walk-Forward folds")
+    parser.add_argument(
+        "--tuned-params", type=str, default=None,
+        help="Path to the Optuna tuning result JSON (overrides the default parameters when given)",
+    )
    args = parser.parse_args()

    if args.wf:
-        walk_forward_auc(args.data, time_weight_decay=args.decay, n_splits=args.wf_splits)
+        walk_forward_auc(
+            args.data,
+            time_weight_decay=args.decay,
+            n_splits=args.wf_splits,
+            tuned_params_path=args.tuned_params,
+        )
    else:
-        train(args.data, time_weight_decay=args.decay)
+        train(args.data, time_weight_decay=args.decay, tuned_params_path=args.tuned_params)


 if __name__ == "__main__":
--- a/scripts/tune_hyperparams.py
+++ b/scripts/tune_hyperparams.py
@@ -0,0 +1,452 @@
+#!/usr/bin/env python3
+"""
+Automatic LightGBM hyperparameter search with Optuna.
+
+Usage:
+    python scripts/tune_hyperparams.py                          # default (50 trials, 5 folds)
+    python scripts/tune_hyperparams.py --trials 10 --folds 3   # quick test
+    python scripts/tune_hyperparams.py --data data/combined_15m.parquet --trials 100
+    python scripts/tune_hyperparams.py --no-baseline            # skip the baseline measurement
+
+Output:
+    - Console: Best Params + Walk-Forward report
+    - JSON: models/tune_results_YYYYMMDD_HHMMSS.json
+"""
+import sys
+import warnings
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+import argparse
+import json
+import time
+from datetime import datetime
+
+import numpy as np
+import pandas as pd
+import lightgbm as lgb
+import optuna
+from optuna.samplers import TPESampler
+from optuna.pruners import MedianPruner
+from sklearn.metrics import roc_auc_score
+
+from src.ml_features import FEATURE_COLS
+from src.dataset_builder import generate_dataset_vectorized
+
+
+# ──────────────────────────────────────────────
+# Data loading and dataset generation (cached once)
+# ──────────────────────────────────────────────
+
+def load_dataset(data_path: str) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
+    """
+    Load the parquet file → build the vectorized dataset → return (X, y, w) as numpy arrays.
+    Called only once before the study starts so that every trial shares the result.
+    """
+    print(f"Loading data: {data_path}")
+    df_raw = pd.read_parquet(data_path)
+    print(f"Candles: {len(df_raw):,}, columns: {list(df_raw.columns)}")
+
+    base_cols = ["open", "high", "low", "close", "volume"]
+    btc_df = eth_df = None
+
+    if "close_btc" in df_raw.columns:
+        btc_df = df_raw[[c + "_btc" for c in base_cols]].copy()
+        btc_df.columns = base_cols
+        print("BTC features enabled")
+
+    if "close_eth" in df_raw.columns:
+        eth_df = df_raw[[c + "_eth" for c in base_cols]].copy()
+        eth_df.columns = base_cols
+        print("ETH features enabled")
+
+    df = df_raw[base_cols].copy()
+
+    print("\nGenerating dataset (runs only once)...")
+    dataset = generate_dataset_vectorized(df, btc_df=btc_df, eth_df=eth_df, time_weight_decay=0.0)
+
+    if dataset.empty or "label" not in dataset.columns:
+        raise ValueError("Dataset generation failed: 0 samples")
+
+    actual_feature_cols = [c for c in FEATURE_COLS if c in dataset.columns]
+    X = dataset[actual_feature_cols].values.astype(np.float32)
+    y = dataset["label"].values.astype(np.int8)
+    w = dataset["sample_weight"].values.astype(np.float32)
+
+    pos = int(y.sum())
+    neg = int((y == 0).sum())
+    print(f"Dataset ready: {len(dataset):,} samples (positive={pos}, negative={neg})")
+    print(f"Features used: {len(actual_feature_cols)}\n")
+
+    return X, y, w
+
+
+# ──────────────────────────────────────────────
+# Walk-Forward cross-validation
+# ──────────────────────────────────────────────
+
+def _walk_forward_cv(
+    X: np.ndarray,
+    y: np.ndarray,
+    w: np.ndarray,
+    params: dict,
+    n_splits: int,
+    train_ratio: float,
+    trial: "optuna.Trial | None" = None,
+) -> tuple[float, list[float]]:
+    """
+    Return the mean AUC from Walk-Forward cross-validation.
+    When a trial is given, the running mean is reported to Optuna after each fold to enable pruning.
+    """
+    n = len(X)
+    step = max(1, int(n * (1 - train_ratio) / n_splits))
+    train_end_start = int(n * train_ratio)
+
+    fold_aucs: list[float] = []
+
+    for fold_idx in range(n_splits):
+        tr_end = train_end_start + fold_idx * step
+        val_end = tr_end + step
+        if val_end > n:
+            break
+
+        X_tr, y_tr, w_tr = X[:tr_end], y[:tr_end], w[:tr_end]
+        X_val, y_val = X[tr_end:val_end], y[tr_end:val_end]
+
+        # Class imbalance: undersampling (chronological order preserved)
+        pos_idx = np.where(y_tr == 1)[0]
+        neg_idx = np.where(y_tr == 0)[0]
+        if len(neg_idx) > len(pos_idx) and len(pos_idx) > 0:
+            rng = np.random.default_rng(42)
+            neg_idx = rng.choice(neg_idx, size=len(pos_idx), replace=False)
+        bal_idx = np.sort(np.concatenate([pos_idx, neg_idx]))
+
+        if len(bal_idx) < 20 or len(np.unique(y_val)) < 2:
+            fold_aucs.append(0.5)
+            continue
+
+        model = lgb.LGBMClassifier(**params, random_state=42, verbose=-1)
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            model.fit(X_tr[bal_idx], y_tr[bal_idx], sample_weight=w_tr[bal_idx])
+
+        proba = model.predict_proba(X_val)[:, 1]
+        auc = roc_auc_score(y_val, proba) if len(np.unique(y_val)) > 1 else 0.5
+        fold_aucs.append(float(auc))
+
+        # Optuna pruning: report the intermediate value
+        if trial is not None:
+            trial.report(float(np.mean(fold_aucs)), step=fold_idx)
+            if trial.should_prune():
+                raise optuna.TrialPruned()
+
+    mean_auc = float(np.mean(fold_aucs)) if fold_aucs else 0.5
+    return mean_auc, fold_aucs
+
+
+# ──────────────────────────────────────────────
+# Optuna objective function
+# ──────────────────────────────────────────────
+
+def make_objective(
+    X: np.ndarray,
+    y: np.ndarray,
+    w: np.ndarray,
+    n_splits: int,
+    train_ratio: float,
+):
+    """Return an objective function that captures the dataset in a closure."""
+
+    def objective(trial: optuna.Trial) -> float:
+        # ── Hyperparameter sampling ──
+        n_estimators     = trial.suggest_int("n_estimators", 100, 600)
+        learning_rate    = trial.suggest_float("learning_rate", 0.01, 0.2, log=True)
+        max_depth        = trial.suggest_int("max_depth", 2, 7)
+
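+        # Key constraint: num_leaves <= 2^max_depth - 1, capped at 31 (limits leaf-wise overfitting).
+        # The lower bound of 7 wins when max_depth=2. With only ~360 samples, a large num_leaves memorizes the data.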
+        max_leaves_upper = min(31, 2 ** max_depth - 1)
+        num_leaves       = trial.suggest_int("num_leaves", 7, max(7, max_leaves_upper))
+
+        min_child_samples = trial.suggest_int("min_child_samples", 10, 50)
+        subsample         = trial.suggest_float("subsample", 0.5, 1.0)
+        colsample_bytree  = trial.suggest_float("colsample_bytree", 0.5, 1.0)
+        reg_alpha         = trial.suggest_float("reg_alpha", 1e-4, 1.0, log=True)
+        reg_lambda        = trial.suggest_float("reg_lambda", 1e-4, 1.0, log=True)
+
+        # weight_scale: because the dataset is cached once,
+        # the effect of time_weight_decay is approximated by scaling sample_weight.
+        weight_scale = trial.suggest_float("weight_scale", 0.5, 2.0)
+        w_scaled = (w * weight_scale).astype(np.float32)
+
+        params = {
+            "n_estimators":     n_estimators,
+            "learning_rate":    learning_rate,
+            "max_depth":        max_depth,
+            "num_leaves":       num_leaves,
+            "min_child_samples": min_child_samples,
+            "subsample":        subsample,
+            "colsample_bytree": colsample_bytree,
+            "reg_alpha":        reg_alpha,
+            "reg_lambda":       reg_lambda,
+        }
+
+        mean_auc, fold_aucs = _walk_forward_cv(
+            X, y, w_scaled, params,
+            n_splits=n_splits,
+            train_ratio=train_ratio,
+            trial=trial,
+        )
+
+        # Store per-fold AUCs in user_attrs (for the result report)
+        trial.set_user_attr("fold_aucs", fold_aucs)
+
+        return mean_auc
+
+    return objective
+
+
+# ──────────────────────────────────────────────
+# Baseline AUC measurement (current fixed parameters)
+# ──────────────────────────────────────────────
+
+def measure_baseline(
+    X: np.ndarray,
+    y: np.ndarray,
+    w: np.ndarray,
+    n_splits: int,
+    train_ratio: float,
+) -> tuple[float, list[float]]:
+    """Measure the baseline AUC with the current production parameters (active file, or hard-coded defaults)."""
+    active_path = Path("models/active_lgbm_params.json")
+
+    if active_path.exists():
+        with open(active_path, "r", encoding="utf-8") as f:
+            tune_data = json.load(f)
+        best_params = dict(tune_data["best_trial"]["params"])
+        best_params.pop("weight_scale", None)
+        baseline_params = best_params
+        print(f"Measuring baseline (active file: {active_path})...")
+    else:
+        baseline_params = {
+            "n_estimators":      434,
+            "learning_rate":     0.123659,
+            "max_depth":         6,
+            "num_leaves":        14,
+            "min_child_samples": 10,
+            "subsample":         0.929062,
+            "colsample_bytree":  0.946330,
+            "reg_alpha":         0.573971,
+            "reg_lambda":        0.000157,
+        }
+        print("Measuring baseline (no active file → hard-coded default parameters)...")
+
+    return _walk_forward_cv(X, y, w, baseline_params, n_splits=n_splits, train_ratio=train_ratio)
+
+
+# ──────────────────────────────────────────────
+# Result output and saving
+# ──────────────────────────────────────────────
+
+def print_report(
+    study: optuna.Study,
+    baseline_auc: float,
+    baseline_folds: list[float],
+    elapsed_sec: float,
+    output_path: Path,
+) -> None:
+    """Print the final report to the console."""
+    best = study.best_trial
+    best_auc = best.value
+    best_folds = best.user_attrs.get("fold_aucs", [])
+    improvement = best_auc - baseline_auc
+    improvement_pct = (improvement / baseline_auc * 100) if baseline_auc > 0 else 0.0
+
+    elapsed_min = int(elapsed_sec // 60)
+    elapsed_s   = int(elapsed_sec % 60)
+
+    sep  = "=" * 64
+    dash = "-" * 64
+
+    completed = [t for t in study.trials if t.state == optuna.trial.TrialState.COMPLETE]
+    pruned    = [t for t in study.trials if t.state == optuna.trial.TrialState.PRUNED]
+
+    print(f"\n{sep}")
+    print(f"  Optuna tuning finished | {len(study.trials)} trials "
+          f"(complete={len(completed)}, pruned={len(pruned)}) | "
+          f"elapsed: {elapsed_min}m {elapsed_s}s")
+    print(sep)
+    print(f"  Best AUC  : {best_auc:.4f}  (Trial #{best.number})")
+    if baseline_auc > 0:
+        sign = "+" if improvement >= 0 else ""
+        print(f"  Baseline  : {baseline_auc:.4f}  (current active parameters)")
+        print(f"  Gain      : {sign}{improvement:.4f} ({sign}{improvement_pct:.1f}%)")
+    print(dash)
+    print("  Best Parameters:")
+    for k, v in best.params.items():
+        if isinstance(v, float):
+            print(f"    {k:<22}: {v:.6f}")
+        else:
+            print(f"    {k:<22}: {v}")
+    print(dash)
+    print("  Walk-Forward per-fold AUC (Best Trial):")
+    for i, auc in enumerate(best_folds, 1):
+        print(f"    Fold {i}: {auc:.4f}")
+    if best_folds:
+        arr = np.array(best_folds)
+        print(f"    Mean: {arr.mean():.4f} ± {arr.std():.4f}")
+    if baseline_folds:
+        print(dash)
+        print("  Baseline per-fold AUC:")
+        for i, auc in enumerate(baseline_folds, 1):
+            print(f"    Fold {i}: {auc:.4f}")
+        arr = np.array(baseline_folds)
+        print(f"    Mean: {arr.mean():.4f} ± {arr.std():.4f}")
+    print(dash)
+    print(f"  Results saved: {output_path}")
+    print("  Next step: python scripts/train_model.py  (after applying the parameters manually)")
+    print(sep)
+
+
+def save_results(
+    study: optuna.Study,
+    baseline_auc: float,
+    baseline_folds: list[float],
+    elapsed_sec: float,
+    data_path: str,
+) -> Path:
+    """Save the results to a JSON file and return its path."""
+    timestamp   = datetime.now().strftime("%Y%m%d_%H%M%S")
+    output_path = Path(f"models/tune_results_{timestamp}.json")
+    output_path.parent.mkdir(exist_ok=True)
+
+    best = study.best_trial
+
+    all_trials = []
+    for t in study.trials:
+        if t.state == optuna.trial.TrialState.COMPLETE:
+            all_trials.append({
+                "number":    t.number,
+                "auc":       round(t.value, 6),
+                "fold_aucs": [round(a, 6) for a in t.user_attrs.get("fold_aucs", [])],
+                "params":    {
+                    k: (round(v, 6) if isinstance(v, float) else v)
+                    for k, v in t.params.items()
+                },
+            })
+
+    result = {
+        "timestamp":        datetime.now().isoformat(),
+        "data_path":        data_path,
+        "n_trials_total":   len(study.trials),
+        "n_trials_complete": len(all_trials),
+        "elapsed_sec":      round(elapsed_sec, 1),
+        "baseline": {
+            "auc":       round(baseline_auc, 6),
+            "fold_aucs": [round(a, 6) for a in baseline_folds],
+        },
+        "best_trial": {
+            "number":    best.number,
+            "auc":       round(best.value, 6),
+            "fold_aucs": [round(a, 6) for a in best.user_attrs.get("fold_aucs", [])],
+            "params":    {
+                k: (round(v, 6) if isinstance(v, float) else v)
+                for k, v in best.params.items()
+            },
+        },
+        "all_trials": all_trials,
+    }
+
+    with open(output_path, "w", encoding="utf-8") as f:
+        json.dump(result, f, indent=2, ensure_ascii=False)
+
+    return output_path
+
+
+# ──────────────────────────────────────────────
+# Main
+# ──────────────────────────────────────────────
+
+def main():
+    parser = argparse.ArgumentParser(description="Optuna LightGBM hyperparameter tuning")
+    parser.add_argument("--data",        default="data/combined_15m.parquet", help="Path to the training data")
+    parser.add_argument("--trials",      type=int,   default=50,  help="Number of Optuna trials (default: 50)")
+    parser.add_argument("--folds",       type=int,   default=5,   help="Number of Walk-Forward folds (default: 5)")
+    parser.add_argument("--train-ratio", type=float, default=0.6, help="Training window ratio (default: 0.6)")
+    parser.add_argument("--no-baseline", action="store_true",     help="Skip the baseline measurement")
+    args = parser.parse_args()
+
+    # 1. Load the dataset (once)
+    X, y, w = load_dataset(args.data)
+
+    # 2. Measure the baseline
+    if args.no_baseline:
+        baseline_auc, baseline_folds = 0.0, []
+        print("Skipping baseline measurement (--no-baseline)\n")
+    else:
+        baseline_auc, baseline_folds = measure_baseline(X, y, w, args.folds, args.train_ratio)
+        print(
+            f"Baseline AUC: {baseline_auc:.4f} "
+            f"(per fold: {[round(a, 4) for a in baseline_folds]})\n"
+        )
+
+    # 3. Run the Optuna study
+    optuna.logging.set_verbosity(optuna.logging.WARNING)
+    sampler = TPESampler(seed=42)
+    pruner  = MedianPruner(n_startup_trials=5, n_warmup_steps=2)
+    study   = optuna.create_study(
+        direction="maximize",
+        sampler=sampler,
+        pruner=pruner,
+        study_name="lgbm_wf_auc",
+    )
+
+    objective = make_objective(X, y, w, n_splits=args.folds, train_ratio=args.train_ratio)
+
+    print(f"Starting Optuna search: {args.trials} trials, {args.folds}-fold Walk-Forward")
+    print("(progress is printed after each trial)\n")
+
+    start_time = time.time()
+
+    def _progress_callback(study: optuna.Study, trial: optuna.trial.FrozenTrial) -> None:
+        if trial.state == optuna.trial.TrialState.COMPLETE:
+            best_so_far = study.best_value
+            leaves  = trial.params.get("num_leaves", "?")
+            depth   = trial.params.get("max_depth", "?")
+            print(
+                f"  Trial #{trial.number:3d} | AUC={trial.value:.4f} "
+                f"| Best={best_so_far:.4f} "
+                f"| leaves={leaves} depth={depth}"
+            )
+        elif trial.state == optuna.trial.TrialState.PRUNED:
+            print(f"  Trial #{trial.number:3d} | PRUNED (stopped early)")
+
+    study.optimize(
+        objective,
+        n_trials=args.trials,
+        callbacks=[_progress_callback],
+        show_progress_bar=False,
+    )
+
+    elapsed = time.time() - start_time
+
+    # 4. Save and print the results
+    output_path = save_results(study, baseline_auc, baseline_folds, elapsed, args.data)
+    print_report(study, baseline_auc, baseline_folds, elapsed, output_path)
+
+    # 5. Automatically update the active file when performance improves
+    import shutil
+    active_path = Path("models/active_lgbm_params.json")
+    if not args.no_baseline and study.best_value > baseline_auc:
+        shutil.copy(output_path, active_path)
+        improvement = study.best_value - baseline_auc
+        print(f"[MLOps] AUC improved by +{improvement:.4f} → {active_path} updated automatically")
+        print("[MLOps] The new parameters will be applied automatically on the next train_model.py run.\n")
+    elif args.no_baseline:
+        print("[MLOps] --no-baseline mode: active file kept without a performance comparison\n")
+    else:
+        print(f"[MLOps] No improvement (Best={study.best_value:.4f} ≤ Baseline={baseline_auc:.4f}) → active file kept\n")
+
+
+if __name__ == "__main__":
+    main()
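
The fold arithmetic in `_walk_forward_cv` is easier to check in isolation. The sketch below copies only the index computation from the diff into a standalone helper. The name `walk_forward_bounds` is hypothetical and does not exist in the repository. Each training slice starts at index 0, so the training window expands from fold to fold while the validation block keeps a fixed size.

```python
def walk_forward_bounds(n: int, n_splits: int = 5, train_ratio: float = 0.6) -> list[tuple[int, int]]:
    """Return (train_end, val_end) index pairs, one per fold.

    Hypothetical helper: it mirrors the step / train_end_start arithmetic
    used in _walk_forward_cv, without fitting any model.
    """
    step = max(1, int(n * (1 - train_ratio) / n_splits))  # size of each validation block
    train_end_start = int(n * train_ratio)                 # end of the first training slice
    bounds = []
    for fold_idx in range(n_splits):
        tr_end = train_end_start + fold_idx * step
        val_end = tr_end + step
        if val_end > n:  # not enough rows left for a full validation block
            break
        bounds.append((tr_end, val_end))
    return bounds


if __name__ == "__main__":
    # Train on rows [0, tr_end), validate on rows [tr_end, val_end).
    for fold, (tr_end, val_end) in enumerate(walk_forward_bounds(1000, n_splits=5, train_ratio=0.5), 1):
        print(f"fold {fold}: train=[0, {tr_end})  val=[{tr_end}, {val_end})")
```

Validation blocks never overlap, and each one begins exactly where the previous one ended, so no validation row is ever seen during training for the same fold.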
Author	SHA1	Message	Date
21in7	dcdaf9f90a	chore: update active LGBM parameters and add new training log entry - Updated timestamp and elapsed seconds in models/active_lgbm_params.json. - Adjusted baseline AUC and fold AUCs to reflect new model performance. - Added a new entry in models/training_log.json with detailed metrics from the latest training run, including tuned parameters and model path. Made-with: Cursor	2026-03-02 15:03:35 +09:00
21in7	6d82febab7	feat: implement Active Config pattern for automatic param promotion - tune_hyperparams.py: after the search completes, automatically update models/active_lgbm_params.json if Best AUC > Baseline AUC - tune_hyperparams.py: measure the baseline against the active file (falls back to the hard-coded defaults when no active file exists) - train_model.py: add automatic active-file lookup to _load_lgbm_params(); priority: --tuned-params > active_lgbm_params.json > hard-coded defaults - models/active_lgbm_params.json: initialized with the current best parameters - .gitignore: exclude tune_results_*.json, keep the active file tracked in git Made-with: Cursor	2026-03-02 14:56:42 +09:00
21in7	d5f8ed4789	feat: update default LightGBM params to Optuna best (trial #46, AUC=0.6002) Results of the Optuna search, 50 trials with 5-fold Walk-Forward (tune_results_20260302_144749.json): - Baseline AUC: 0.5803 → Best AUC: 0.6002 (+0.0199, +3.4%) - n_estimators: 500 → 434 - learning_rate: 0.05 → 0.123659 - max_depth: (unset) → 6 - num_leaves: 31 → 14 - min_child_samples: 15 → 10 - subsample: 0.8 → 0.929062 - colsample_bytree: 0.8 → 0.946330 - reg_alpha: 0.05 → 0.573971 - reg_lambda: 0.1 → 0.000157 - weight_scale: 1.0 → 1.783105 Made-with: Cursor	2026-03-02 14:52:41 +09:00
21in7	ce02f1335c	feat: add run_optuna.sh wrapper script for Optuna tuning Made-with: Cursor	2026-03-02 14:50:50 +09:00
21in7	4afc7506d7	feat: connect Optuna tuning results to train_model.py via --tuned-params - add _load_lgbm_params() helper: returns the default parameters, overridden when a JSON file is given - train(): add tuned_params_path argument, apply weight_scale - walk_forward_auc(): add tuned_params_path argument, apply weight_scale - main(): add --tuned-params argparse argument and pass it to both functions - record tuned_params_path, lgbm_params, weight_scale in training_log.json Made-with: Cursor	2026-03-02 14:45:15 +09:00
21in7	caaa81f5f9	fix: add shebang and executable permission to tune_hyperparams.py Made-with: Cursor	2026-03-02 14:41:13 +09:00
21in7	8dd1389b16	feat: add Optuna Walk-Forward AUC hyperparameter tuning pipeline - scripts/tune_hyperparams.py: Optuna objective based on 5-fold Walk-Forward AUC - dataset cached once and shared by all trials (speed optimization) - enforce num_leaves <= 2^max_depth - 1 (guards against overfitting on small data) - MedianPruner stops low-performing trials early - output: console report + models/tune_results_YYYYMMDD_HHMMSS.json - requirements.txt: add optuna>=3.6.0 - README.md: add a usage section on automatic hyperparameter tuning - docs/plans/: add design document and implementation plan Made-with: Cursor	2026-03-02 14:39:07 +09:00
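
The commit messages describe a lookup priority of `--tuned-params` > `active_lgbm_params.json` > hard-coded defaults, but the body of `_load_lgbm_params()` is not part of the diff shown here. The following is a minimal sketch of how such a resolver could look, not the author's implementation. The function name `resolve_lgbm_params`, the error behaviour for a missing explicit path, and the fallback of 1.0 when a JSON file has no `weight_scale` are all assumptions. The default values are copied from the baseline dict in `tune_hyperparams.py` and the trial #46 commit message.

```python
import json
from pathlib import Path

# Hard-coded fallback, copied from the baseline dict in tune_hyperparams.py.
DEFAULT_PARAMS = {
    "n_estimators": 434,
    "learning_rate": 0.123659,
    "max_depth": 6,
    "num_leaves": 14,
    "min_child_samples": 10,
    "subsample": 0.929062,
    "colsample_bytree": 0.946330,
    "reg_alpha": 0.573971,
    "reg_lambda": 0.000157,
}
DEFAULT_WEIGHT_SCALE = 1.783105  # value listed in the trial #46 commit message
ACTIVE_PATH = Path("models/active_lgbm_params.json")


def resolve_lgbm_params(tuned_params_path=None, active_path=ACTIVE_PATH):
    """Hypothetical stand-in for _load_lgbm_params(); returns (params, weight_scale)."""
    if tuned_params_path is not None:
        # 1. An explicit --tuned-params path always wins (assumed: fail loudly if missing).
        source = Path(tuned_params_path)
        if not source.exists():
            raise FileNotFoundError(source)
    elif Path(active_path).exists():
        # 2. Otherwise use the promoted active file.
        source = Path(active_path)
    else:
        # 3. Otherwise fall back to the hard-coded defaults.
        return dict(DEFAULT_PARAMS), DEFAULT_WEIGHT_SCALE

    with open(source, encoding="utf-8") as f:
        tuned = dict(json.load(f)["best_trial"]["params"])
    # weight_scale is not a LightGBM argument, so it is split off from the params.
    weight_scale = float(tuned.pop("weight_scale", 1.0))
    return {**DEFAULT_PARAMS, **tuned}, weight_scale
```

The JSON layout read here (`best_trial` → `params`, with `weight_scale` stored next to the LightGBM parameters) matches what `save_results` writes, so a `tune_results_*.json` file and the active file can be read by the same code path.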