when-developing-ml-models-use-ml-expert
Specialized ML model development, training, and deployment workflow
Best use case
when-developing-ml-models-use-ml-expert is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi. Specialized ML model development, training, and deployment workflow
Specialized ML model development, training, and deployment workflow
Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.
Practical example
Example input
Use the "when-developing-ml-models-use-ml-expert" skill to help with this workflow task. Context: Specialized ML model development, training, and deployment workflow
Example output
A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.
When to use this skill
- Use this skill when you want a reusable workflow rather than writing the same prompt again and again.
- Use it when you already have the supporting tools or dependencies needed by the workflow.
When not to use this skill
- Do not use this when you only need a one-off answer and do not need a reusable workflow.
- Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/when-developing-ml-models-use-ml-expert/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How when-developing-ml-models-use-ml-expert Compares
| Feature / Agent | when-developing-ml-models-use-ml-expert | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Specialized ML model development, training, and deployment workflow
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
Cursor vs Codex for AI Workflows
Compare Cursor and Codex for AI coding workflows, repository assistance, debugging, refactoring, and reusable developer skills.
SKILL.md Source
# ML Expert - Machine Learning Model Development
## Overview
Specialized workflow for ML model development, training, and deployment. Supports various architectures (CNNs, RNNs, Transformers) with distributed training capabilities.
## When to Use
- Developing new ML models
- Training neural networks
- Model optimization
- Production deployment
- Transfer learning
- Fine-tuning existing models
## Phase 1: Data Preparation (10 min)
### Objective
Clean, preprocess, and prepare training data
### Agent: ML-Developer
**Step 1.1: Load and Analyze Data**
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
# Load data
data = pd.read_csv('dataset.csv')
# Analyze
analysis = {
'shape': data.shape,
'columns': data.columns.tolist(),
'dtypes': data.dtypes.to_dict(),
'missing': data.isnull().sum().to_dict(),
'stats': data.describe().to_dict()
}
# Store analysis
await memory.store('ml-expert/data-analysis', analysis)
```
**Step 1.2: Data Cleaning**
```python
# Handle missing values
data = data.fillna(data.mean())
# Remove duplicates
data = data.drop_duplicates()
# Handle outliers
from scipy import stats
z_scores = np.abs(stats.zscore(data.select_dtypes(include=[np.number])))
data = data[(z_scores < 3).all(axis=1)]
# Encode categorical variables
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
for col in data.select_dtypes(include=['object']).columns:
data[col] = le.fit_transform(data[col])
```
**Step 1.3: Split Data**
```python
# Split into train/val/test
X = data.drop('target', axis=1)
y = data['target']
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)
# Normalize
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)
# Save preprocessed data
np.save('data/X_train.npy', X_train)
np.save('data/X_val.npy', X_val)
np.save('data/X_test.npy', X_test)
```
### Validation Criteria
- [ ] Data loaded successfully
- [ ] Missing values handled
- [ ] Train/val/test split created
- [ ] Normalization applied
## Phase 2: Model Selection (10 min)
### Objective
Choose and configure model architecture
### Agent: Researcher
**Step 2.1: Analyze Task Type**
```python
task_analysis = {
'type': 'classification|regression|clustering',
'complexity': 'low|medium|high',
'dataSize': len(data),
'features': X.shape[1],
'classes': len(np.unique(y)) if classification else None
}
# Recommend architecture
if task_analysis['type'] == 'classification' and task_analysis['features'] > 100:
recommended_architecture = 'deep_neural_network'
elif task_analysis['type'] == 'regression' and task_analysis['dataSize'] < 10000:
recommended_architecture = 'random_forest'
# ... more logic
```
**Step 2.2: Define Architecture**
```python
import tensorflow as tf
def create_model(architecture='dnn', input_shape, num_classes):
if architecture == 'dnn':
model = tf.keras.Sequential([
tf.keras.layers.Dense(256, activation='relu', input_shape=input_shape),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(num_classes, activation='softmax')
])
elif architecture == 'cnn':
model = tf.keras.Sequential([
tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=input_shape),
tf.keras.layers.MaxPooling2D((2,2)),
tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
tf.keras.layers.MaxPooling2D((2,2)),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(num_classes, activation='softmax')
])
# ... more architectures
return model
model = create_model('dnn', input_shape=(X_train.shape[1],), num_classes=len(np.unique(y)))
model.summary()
```
**Step 2.3: Configure Training**
```python
training_config = {
'optimizer': tf.keras.optimizers.Adam(learning_rate=0.001),
'loss': 'sparse_categorical_crossentropy',
'metrics': ['accuracy'],
'batch_size': 32,
'epochs': 100,
'callbacks': [
tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),
tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5),
tf.keras.callbacks.ModelCheckpoint('best_model.h5', save_best_only=True)
]
}
model.compile(
optimizer=training_config['optimizer'],
loss=training_config['loss'],
metrics=training_config['metrics']
)
```
### Validation Criteria
- [ ] Task analyzed
- [ ] Architecture selected
- [ ] Model configured
- [ ] Training parameters set
## Phase 3: Train Model (20 min)
### Objective
Execute training with monitoring
### Agent: ML-Developer
**Step 3.1: Start Training**
```python
# Train model
history = model.fit(
X_train, y_train,
validation_data=(X_val, y_val),
batch_size=training_config['batch_size'],
epochs=training_config['epochs'],
callbacks=training_config['callbacks'],
verbose=1
)
# Save training history
import json
with open('training_history.json', 'w') as f:
json.dump({
'loss': history.history['loss'],
'val_loss': history.history['val_loss'],
'accuracy': history.history['accuracy'],
'val_accuracy': history.history['val_accuracy']
}, f)
```
**Step 3.2: Monitor Training**
```python
import matplotlib.pyplot as plt
# Plot training curves
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
# Loss
ax1.plot(history.history['loss'], label='Train Loss')
ax1.plot(history.history['val_loss'], label='Val Loss')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.legend()
ax1.grid(True)
# Accuracy
ax2.plot(history.history['accuracy'], label='Train Accuracy')
ax2.plot(history.history['val_accuracy'], label='Val Accuracy')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Accuracy')
ax2.legend()
ax2.grid(True)
plt.savefig('training_curves.png')
```
**Step 3.3: Distributed Training (Optional)**
```python
# Using Flow-Nexus for distributed training
from flow_nexus import DistributedTrainer
trainer = DistributedTrainer({
'cluster_id': 'ml-training-cluster',
'num_nodes': 4,
'strategy': 'data_parallel'
})
# Train across multiple nodes
trainer.fit(
model=model,
train_data=(X_train, y_train),
val_data=(X_val, y_val),
config=training_config
)
```
### Validation Criteria
- [ ] Training completed
- [ ] No NaN losses
- [ ] Validation metrics improving
- [ ] Model checkpoints saved
## Phase 4: Validate Performance (10 min)
### Objective
Evaluate model on test set
### Agent: Tester
**Step 4.1: Evaluate on Test Set**
```python
# Load best model
best_model = tf.keras.models.load_model('best_model.h5')
# Evaluate
test_loss, test_accuracy = best_model.evaluate(X_test, y_test, verbose=0)
# Predictions
y_pred = best_model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)
# Detailed metrics
from sklearn.metrics import classification_report, confusion_matrix
metrics = {
'test_loss': float(test_loss),
'test_accuracy': float(test_accuracy),
'classification_report': classification_report(y_test, y_pred_classes, output_dict=True),
'confusion_matrix': confusion_matrix(y_test, y_pred_classes).tolist()
}
await memory.store('ml-expert/metrics', metrics)
```
**Step 4.2: Generate Evaluation Report**
```python
report = f"""
# Model Evaluation Report
## Performance Metrics
- Test Loss: {test_loss:.4f}
- Test Accuracy: {test_accuracy:.4f}
## Classification Report
{classification_report(y_test, y_pred_classes)}
## Model Summary
- Architecture: {recommended_architecture}
- Parameters: {model.count_params()}
- Training Time: {training_time} seconds
## Training History
- Best Val Loss: {min(history.history['val_loss']):.4f}
- Best Val Accuracy: {max(history.history['val_accuracy']):.4f}
- Epochs Trained: {len(history.history['loss'])}
"""
with open('evaluation_report.md', 'w') as f:
f.write(report)
```
### Validation Criteria
- [ ] Test accuracy > target threshold
- [ ] No overfitting detected
- [ ] Metrics documented
- [ ] Report generated
## Phase 5: Deploy to Production (15 min)
### Objective
Package model for deployment
### Agent: ML-Developer
**Step 5.1: Export Model**
```python
# Save in multiple formats
model.save('model.h5') # Keras format
model.save('model_savedmodel') # TensorFlow SavedModel
# Convert to TFLite for mobile
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
f.write(tflite_model)
# Save preprocessing pipeline
import joblib
joblib.dump(scaler, 'scaler.pkl')
```
**Step 5.2: Create Deployment Package**
```python
import shutil
import os
# Create deployment directory
os.makedirs('deployment', exist_ok=True)
# Copy necessary files
shutil.copy('model.h5', 'deployment/')
shutil.copy('scaler.pkl', 'deployment/')
shutil.copy('evaluation_report.md', 'deployment/')
# Create inference script
inference_script = '''
import tensorflow as tf
import joblib
import numpy as np
class ModelInference:
def __init__(self, model_path, scaler_path):
self.model = tf.keras.models.load_model(model_path)
self.scaler = joblib.load(scaler_path)
def predict(self, input_data):
# Preprocess
scaled = self.scaler.transform(input_data)
# Predict
predictions = self.model.predict(scaled)
return np.argmax(predictions, axis=1)
# Usage
inference = ModelInference('model.h5', 'scaler.pkl')
result = inference.predict(new_data)
'''
with open('deployment/inference.py', 'w') as f:
f.write(inference_script)
```
**Step 5.3: Generate Documentation**
```markdown
# Model Deployment Guide
## Files
- `model.h5`: Trained Keras model
- `scaler.pkl`: Preprocessing scaler
- `inference.py`: Inference script
## Usage
\`\`\`python
from inference import ModelInference
model = ModelInference('model.h5', 'scaler.pkl')
predictions = model.predict(new_data)
\`\`\`
## Performance
- Latency: < 50ms per prediction
- Accuracy: ${test_accuracy}
- Model Size: ${model_size}MB
## Requirements
- tensorflow>=2.0
- scikit-learn
- numpy
```
### Validation Criteria
- [ ] Model exported successfully
- [ ] Deployment package created
- [ ] Inference script tested
- [ ] Documentation complete
## Success Metrics
- Test accuracy meets target
- Training converged
- No overfitting
- Production-ready deployment
## Skill Completion
Outputs:
1. **model.h5**: Trained model file
2. **evaluation_report.md**: Performance metrics
3. **deployment/**: Production package
4. **training_history.json**: Training logs
Complete when model deployed and validated.Related Skills
typescript-expert
TypeScript and JavaScript expert with deep knowledge of type-level programming, performance optimization, monorepo management, migration strategies, and modern tooling. Use PROACTIVELY for any TypeScript/JavaScript issues including complex type gymnastics, build performance, debugging, and architectural decisions. If a specialized expert is a better fit, I will recommend switching and stop.
threat-modeling-expert
Expert in threat modeling methodologies, security architecture review, and risk assessment. Masters STRIDE, PASTA, attack trees, and security requirement extraction. Use for security architecture reviews, threat identification, and secure-by-design planning.
service-mesh-expert
Expert service mesh architect specializing in Istio, Linkerd, and cloud-native networking patterns. Masters traffic management, security policies, observability integration, and multi-cluster mesh con
pydantic-models-py
Create Pydantic models following the multi-model pattern with Base, Create, Update, Response, and InDB variants. Use when defining API request/response schemas, database models, or data validation in Python applications using Pydantic v2.
prisma-expert
Prisma ORM expert for schema design, migrations, query optimization, relations modeling, and database operations. Use PROACTIVELY for Prisma schema issues, migration problems, query performance, relation design, or database connection issues.
nosql-expert
Expert guidance for distributed NoSQL databases (Cassandra, DynamoDB). Focuses on mental models, query-first modeling, single-table design, and avoiding hot partitions in high-scale systems.
nestjs-expert
Nest.js framework expert specializing in module architecture, dependency injection, middleware, guards, interceptors, testing with Jest/Supertest, TypeORM/Mongoose integration, and Passport.js authentication. Use PROACTIVELY for any Nest.js application issues including architecture decisions, testing strategies, performance optimization, or debugging complex dependency injection problems. If a specialized expert is a better fit, I will recommend switching and stop.
n8n-mcp-tools-expert
Expert guide for using n8n-mcp MCP tools effectively. Use when searching for nodes, validating configurations, accessing templates, managing workflows, or using any n8n-mcp tool. Provides tool selection guidance, parameter formats, and common patterns.
mermaid-expert
Create Mermaid diagrams for flowcharts, sequences, ERDs, and architectures. Masters syntax for all diagram types and styling. Use PROACTIVELY for visual documentation, system diagrams, or process flows.
laravel-expert
Senior Laravel Engineer role for production-grade, maintainable, and idiomatic Laravel solutions. Focuses on clean architecture, security, performance, and modern standards (Laravel 10/11+).
kotlin-coroutines-expert
Expert patterns for Kotlin Coroutines and Flow, covering structured concurrency, error handling, and testing.
computer-vision-expert
SOTA Computer Vision Expert (2026). Specialized in YOLO26, Segment Anything 3 (SAM 3), Vision Language Models, and real-time spatial analysis.