ml-antipattern-validator

Prevents 30+ critical AI/ML mistakes including data leakage, evaluation errors, training pitfalls, and deployment issues. Use when working with ML training, testing, model evaluation, or deployment.

231 stars

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/ml-antipattern-validator/SKILL.md --create-dirs "https://raw.githubusercontent.com/aiskillstore/marketplace/main/skills/doyajin174/ml-antipattern-validator/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/ml-antipattern-validator/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How ml-antipattern-validator Compares

| Feature / Agent | ml-antipattern-validator | Standard Approach |
|---|---|---|
| Platform Support | Multi | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Prevents 30+ critical AI/ML mistakes including data leakage, evaluation errors, training pitfalls, and deployment issues. Use when working with ML training, testing, model evaluation, or deployment.

Which AI agents support this skill?

This skill is multi-platform: it works with Claude Code, Cursor, and Codex.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# ML Antipattern Validator

## Overview

A skill that detects and prevents 30+ antipatterns in AI/ML development.

**Key Principle**: Honest evaluation > Impressive metrics.

## When to Activate

**Automatic Triggers**:
- ML training code (`train*.py`, model training)
- Dataset preparation or splitting
- Model evaluation or testing
- Production deployment planning

**Manual Triggers**:
- `@validate-ml` - Full validation
- `@check-leakage` - Data leakage detection
- `@verify-eval` - Evaluation methodology

---

## Pre-Implementation Checklist

```
✅ Requirements:
□ Problem clearly defined with success metrics
□ Train/test split strategy defined
□ Evaluation methodology matches business objective

✅ Data Integrity:
□ No temporal leakage (future → past)
□ No target leakage (answer in features)
□ No preprocessing leakage (fit on all data)
□ No group leakage (related samples split)

✅ Evaluation Setup:
□ Test set completely held out
□ Metrics aligned with business objective
□ Baseline models defined
```

---

## Critical Antipatterns

### Category 1: Data Leakage 🚨

#### 1.1 Target Leakage
```python
❌ WRONG: Using "refund_issued" to predict "purchase_fraud"
✅ CORRECT: Only use features available at purchase time
```
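
A minimal sketch of the fix on a toy fraud dataset (all column names here are hypothetical): drop any feature that is only recorded after the outcome before training.

```python
import pandas as pd

# Toy frame; "refund_issued" is only known after the fraud outcome.
df = pd.DataFrame({
    "amount": [120.0, 30.5, 999.0],
    "merchant_risk_score": [0.2, 0.1, 0.9],
    "refund_issued": [0, 0, 1],   # recorded AFTER the label -> target leakage
    "purchase_fraud": [0, 0, 1],  # target
})

post_outcome_features = ["refund_issued"]  # unavailable at purchase time
X = df.drop(columns=post_outcome_features + ["purchase_fraud"])
y = df["purchase_fraud"]
```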

#### 1.2 Temporal Leakage
```python
❌ WRONG: train = df[df['date'] > '2024-06-01']  # Future data
✅ CORRECT: train = df[df['date'] < '2024-06-01']  # Past for training
```
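
A minimal sketch of a time-based split with pandas, using a toy frame with a `date` column (the dates and cutoff are illustrative):

```python
import pandas as pd

# Toy frame standing in for the real dataset.
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-15", "2024-05-20", "2024-07-01", "2024-08-10"]),
    "y": [0, 1, 0, 1],
})

cutoff = pd.Timestamp("2024-06-01")
train = df[df["date"] < cutoff]    # train strictly on the past
test = df[df["date"] >= cutoff]    # evaluate on the future

assert train["date"].max() < test["date"].min()  # guard against overlap
```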

#### 1.3 Preprocessing Leakage
```python
❌ WRONG: X_scaled = scaler.fit_transform(X); train_test_split(X_scaled)
✅ CORRECT: Split first, then scaler.fit(X_train)
```
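
One way to make this hard to get wrong is scikit-learn's `Pipeline`, which refits the scaler on each training fold automatically; a minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_train, y_train)           # scaler statistics come from X_train only
print(pipe.score(X_test, y_test))

# Inside CV, the scaler is refit on each training fold, never the val fold.
print(cross_val_score(pipe, X_train, y_train, cv=5).mean())
```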

#### 1.4 Group Leakage
```python
❌ WRONG: train_test_split(df)  # Same user in both sets
✅ CORRECT: GroupShuffleSplit(groups=df['user_id'])
```
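
A minimal sketch with scikit-learn's `GroupShuffleSplit`, assuming `user_id` identifies related rows:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.arange(10).reshape(-1, 1)
user_id = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])

gss = GroupShuffleSplit(n_splits=1, test_size=0.4, random_state=0)
train_idx, test_idx = next(gss.split(X, groups=user_id))

# No user appears on both sides of the split.
assert not set(user_id[train_idx]) & set(user_id[test_idx])
```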

#### 1.5 Data Augmentation Leakage
```python
❌ WRONG: augment(X) → train_test_split()
✅ CORRECT: train_test_split() → augment(X_train)
```
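
A minimal sketch of the correct ordering; `augment` here is a hypothetical stand-in for whatever augmentation you use (noise, flips, oversampling):

```python
import numpy as np
from sklearn.model_selection import train_test_split

def augment(X, y, noise=0.01, copies=2):
    """Hypothetical augmentation: jittered copies of the training rows."""
    X_aug = np.concatenate([X] + [X + np.random.normal(0, noise, X.shape)
                                  for _ in range(copies)])
    y_aug = np.concatenate([y] * (copies + 1))
    return X_aug, y_aug

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
X_train, y_train = augment(X_train, y_train)  # the test set stays untouched
```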

---

### Category 2: Evaluation Mistakes ⚠️

#### 2.1 Testing on Training Data
```python
❌ WRONG: evaluate(model, training_data)
✅ CORRECT: evaluate(model, unseen_test_data)
```

#### 2.2 Metric Misalignment
```
Business Objective → Appropriate Metric:
- Ranking → NDCG, MRR, MAP
- Imbalanced → F1, Precision@K, AUC-PR
- Balanced → Accuracy, AUC-ROC
```
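
For example, on imbalanced data a minimal sketch would report F1 and AUC-PR instead of raw accuracy (synthetic data, illustrative model):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("F1:    ", f1_score(y_te, model.predict(X_te)))
print("AUC-PR:", average_precision_score(y_te, model.predict_proba(X_te)[:, 1]))
```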

#### 2.3 Accuracy Paradox
```python
❌ WRONG: 99% accuracy on 99:1 imbalanced data
✅ CORRECT: Check per-class metrics with classification_report()
```
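
A minimal sketch of the paradox: a baseline that always predicts the majority class scores 99% accuracy yet recalls zero positives.

```python
import numpy as np
from sklearn.metrics import classification_report

y_true = np.array([0] * 990 + [1] * 10)   # 99:1 imbalance
y_pred = np.zeros_like(y_true)            # "always predict majority" baseline

print(classification_report(y_true, y_pred, zero_division=0))
# Accuracy is 0.99, but recall for class 1 is 0.00.
```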

#### 2.4 Invalid Time Series CV
```python
❌ WRONG: cross_val_score(model, X, y, cv=5)  # Shuffles time!
✅ CORRECT: TimeSeriesSplit(n_splits=5)
```
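
A minimal sketch with `TimeSeriesSplit`, assuming rows are already sorted by time; every validation fold comes strictly after its training window.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # rows assumed in chronological order
tscv = TimeSeriesSplit(n_splits=3)
for train_idx, val_idx in tscv.split(X):
    assert train_idx.max() < val_idx.min()  # never validate on the past
    print("train:", train_idx, "val:", val_idx)
```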

#### 2.5 Hyperparameter Tuning on Test Set
```python
❌ WRONG: grid_search(model, X_test, y_test)
✅ CORRECT: train/validation/test three-way split
```
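
A minimal sketch of a three-way split: tune against the validation set, then score the untouched test set exactly once (the `C` grid is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

best_score, best_C = -1.0, None
for C in [0.01, 0.1, 1.0, 10.0]:  # tune on validation only
    score = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train).score(X_val, y_val)
    if score > best_score:
        best_score, best_C = score, C

final = LogisticRegression(C=best_C, max_iter=1000).fit(X_train, y_train)
print("test score:", final.score(X_test, y_test))  # touched exactly once
```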

---

### Category 3: Training Pitfalls 🔧

#### 3.1 Batch Norm Inference Error
```python
❌ WRONG: predictions = model(X_test)  # Still in train mode
✅ CORRECT: model.eval(); with torch.no_grad(): predictions = model(X_test)
```
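
A minimal PyTorch sketch (toy model, random inputs): `eval()` switches BatchNorm and Dropout to inference behavior, and `no_grad()` disables gradient tracking.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.BatchNorm1d(16), nn.ReLU(), nn.Linear(16, 2))
X_test = torch.randn(4, 8)

model.eval()                 # use running stats, disable dropout
with torch.no_grad():        # no autograd bookkeeping at inference
    predictions = model(X_test)
print(predictions.shape)
```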

#### 3.2 Early Stopping Overfitting
```python
❌ WRONG: EarlyStopping(patience=50)
✅ CORRECT: EarlyStopping(patience=5, min_delta=0.001, restore_best_weights=True)
```
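
A minimal sketch using the Keras `EarlyStopping` callback shown above (the `fit` call is commented out because the model and data are not defined here):

```python
from tensorflow import keras

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,                 # stop after 5 epochs without improvement
    min_delta=0.001,            # ignore improvements smaller than this
    restore_best_weights=True,  # keep the best checkpoint, not the last
)
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=100, callbacks=[early_stop])
```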

#### 3.3 Learning Rate Warmup
```python
✅ CORRECT: get_linear_schedule_with_warmup(num_warmup_steps=1000)
```
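
`get_linear_schedule_with_warmup` comes from Hugging Face `transformers`; a minimal sketch with an illustrative model and step counts (step the scheduler once per optimizer step):

```python
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# LR ramps up over the warmup steps, then decays linearly to zero.
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=1000, num_training_steps=10000
)

# Inside the training loop:
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```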

#### 3.4 Class Imbalance
```python
❌ WRONG: CrossEntropyLoss()  # Biased toward majority
✅ CORRECT: CrossEntropyLoss(weight=class_weights)
```
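
A minimal PyTorch sketch: weight each class inversely to its frequency so the loss stops rewarding majority-class predictions (the 95:5 split is illustrative):

```python
import torch
import torch.nn as nn

labels = torch.tensor([0] * 95 + [1] * 5)              # 95:5 imbalance
counts = torch.bincount(labels).float()
class_weights = counts.sum() / (len(counts) * counts)  # inverse frequency

criterion = nn.CrossEntropyLoss(weight=class_weights)
logits = torch.randn(100, 2)
loss = criterion(logits, labels)
print(class_weights, loss.item())
```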

---

## Detection Patterns

### Leakage Detection
```python
# Custom exception types used by the checks below.
class DataLeakageError(Exception): ...
class TemporalLeakageError(Exception): ...
class GroupLeakageError(Exception): ...

# Check feature-target correlation: a near-perfect correlation usually
# means the target (or a post-outcome proxy) leaked into the features.
correlation = df[features].corrwith(df['target'])
if (correlation.abs() > 0.95).any():
    raise DataLeakageError("Suspiciously high correlation")

# Check temporal ordering: training data must end before test data begins.
if train['date'].max() >= test['date'].min():
    raise TemporalLeakageError("Training period overlaps or follows the test period")

# Check group overlap (train_groups and test_groups are sets of group IDs).
if train_groups & test_groups:
    raise GroupLeakageError("Overlapping groups")
```

### Mode Check
```python
if model.training:
    raise InferenceModeError("Model in training mode during evaluation")
```

---

## Validation Checklist

Before deployment:

- [ ] No data leakage detected
- [ ] Test set never seen during training
- [ ] Metrics aligned with business objective
- [ ] model.eval() called for inference
- [ ] Class imbalance handled
- [ ] Covariate shift monitoring planned (see the sketch below)
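
A minimal sketch of covariate-shift monitoring, assuming tabular features: compare each feature's live distribution to the training distribution with a two-sample KS test (the threshold and data are illustrative).

```python
import numpy as np
from scipy.stats import ks_2samp

def drift_report(X_train, X_live, alpha=0.01):
    """Return indices of features whose live distribution drifted."""
    drifted = []
    for j in range(X_train.shape[1]):
        stat, p_value = ks_2samp(X_train[:, j], X_live[:, j])
        if p_value < alpha:
            drifted.append(j)
    return drifted

X_train = np.random.normal(0, 1, (1000, 3))
X_live = X_train.copy()
X_live[:, 0] += 0.5                    # simulate drift in feature 0
print(drift_report(X_train, X_live))   # -> [0]
```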

---

## References

See [references/REFERENCE.md](references/REFERENCE.md) for detailed examples and scenarios.