mlops

MLflow, model versioning, experiment tracking, model registry, and production ML systems

Best use case

mlops is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using mlops should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/mlops/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/data-ai/mlops/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/mlops/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How mlops Compares

| Feature / Agent | mlops | Standard Approach |
|-----------------|-------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It guides an agent through production ML work with MLflow: experiment tracking, model versioning, the model registry, and deployment pipelines.

Where can I find the source code?

The source is on GitHub in the diegosouzapw/awesome-omni-skill repository, at skills/data-ai/mlops/SKILL.md (the same file the install command above downloads).

SKILL.md Source

# MLOps

Production machine learning systems with MLflow, model versioning, and deployment pipelines.

## Quick Start

```python
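import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Placeholder data so the example runs end to end; swap in your own churn dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Configure MLflow (the tracking server URL is environment-specific)
mlflow.set_tracking_uri("http://mlflow-server:5000")
mlflow.set_experiment("customer-churn-prediction")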

# Training with experiment tracking
with mlflow.start_run(run_name="rf-baseline"):
    # Log parameters
    params = {"n_estimators": 100, "max_depth": 10, "random_state": 42}
    mlflow.log_params(params)

    # Train model
    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)

    # Evaluate and log metrics
    y_pred = model.predict(X_test)
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1_score": f1_score(y_test, y_pred, average="weighted")
    }
    mlflow.log_metrics(metrics)

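    # Log model to registry; the signature pairs the training inputs with
    # the model's predictions for those same inputs
    mlflow.sklearn.log_model(
        model, "model",
        registered_model_name="churn-classifier",
        signature=mlflow.models.infer_signature(X_train, model.predict(X_train))
    )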

    print(f"Run ID: {mlflow.active_run().info.run_id}")
```

## Core Concepts

### 1. Model Registry & Versioning

```python
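import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Promote model to production
# Note: model stages are deprecated since MLflow 2.9 in favor of model
# version aliases (see client.set_registered_model_alias)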
client.transition_model_version_stage(
    name="churn-classifier",
    version=3,
    stage="Production"
)

# Archive old version
client.transition_model_version_stage(
    name="churn-classifier",
    version=2,
    stage="Archived"
)

# Load production model
model_uri = "models:/churn-classifier/Production"
model = mlflow.sklearn.load_model(model_uri)

# Model comparison
def compare_model_versions(model_name: str, versions: list[int]) -> dict:
    results = {}
    for version in versions:
        run_id = client.get_model_version(model_name, str(version)).run_id
        run = client.get_run(run_id)
        results[version] = run.data.metrics
    return results
```
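
The dictionary returned by `compare_model_versions` maps each version to the metrics logged by its run. A minimal sketch of picking a promotion candidate from that dictionary follows. The helper `pick_best_version` and the sample metric values are hypothetical and are not part of MLflow.

```python
def pick_best_version(results: dict, metric: str = "f1_score") -> int:
    """Return the version with the highest value for `metric`.

    Versions whose run did not log the metric are skipped.
    """
    scored = {version: m[metric] for version, m in results.items() if metric in m}
    if not scored:
        raise ValueError(f"no version logged metric {metric!r}")
    return max(scored, key=scored.get)


# Hypothetical output of compare_model_versions("churn-classifier", [2, 3])
example_results = {
    2: {"accuracy": 0.88, "f1_score": 0.86},
    3: {"accuracy": 0.90, "f1_score": 0.89},
}
best = pick_best_version(example_results)
print(f"Best version by f1_score: {best}")
```

Keeping the selection separate from the registry calls makes the promotion rule easy to unit test without a tracking server.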

### 2. Feature Store Pattern

```python
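from datetime import datetime

import pandas as pd
from feast import FeatureStore

# Load the feature store from its repository directory
store = FeatureStore(repo_path="feature_repo/")

# Entity dataframe: entity keys plus the timestamps used for the
# point-in-time join (IDs and dates here are placeholder values)
entity_df = pd.DataFrame({
    "customer_id": ["12345", "67890"],
    "event_timestamp": [datetime(2025, 1, 1), datetime(2025, 1, 2)],
})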

# Get training features
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "customer_features:total_purchases",
        "customer_features:days_since_last_order",
        "customer_features:avg_order_value"
    ]
).to_df()

# Get online features for inference
feature_vector = store.get_online_features(
    features=[
        "customer_features:total_purchases",
        "customer_features:days_since_last_order"
    ],
    entity_rows=[{"customer_id": "12345"}]
).to_dict()
```
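
Historical retrieval joins features "as of" each row's `event_timestamp`, so training rows never see values recorded after the event. The sketch below illustrates that idea in plain Python. It is a conceptual illustration only, not how Feast implements the join, and `point_in_time_value` and the sample history are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional


def point_in_time_value(history: list, as_of: datetime) -> Optional[float]:
    """Return the latest feature value recorded at or before `as_of`.

    `history` is a list of (timestamp, value) tuples sorted by timestamp.
    Returns None when no value had been recorded yet.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx else None


# Hypothetical history of total_purchases for one customer
purchases = [
    (datetime(2025, 1, 1), 3.0),
    (datetime(2025, 1, 10), 5.0),
]
print(point_in_time_value(purchases, datetime(2025, 1, 5)))
```

Using the value known at the event time, rather than the latest value, is what prevents label leakage in the training set.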

### 3. Model Serving with FastAPI

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import mlflow
import numpy as np

app = FastAPI()

# Load model at startup
model = mlflow.sklearn.load_model("models:/churn-classifier/Production")

class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: int
    probability: float
    model_version: str

@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    try:
        X = np.array(request.features).reshape(1, -1)
        prediction = model.predict(X)[0]
        probability = model.predict_proba(X)[0].max()

        return PredictionResponse(
            prediction=int(prediction),
            probability=float(probability),
            model_version="v3"  # hardcoded placeholder; report the version actually loaded
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/health")
async def health():
    return {"status": "healthy", "model_loaded": model is not None}
```
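
The endpoint above accepts a feature list of any length, so a malformed request only fails inside `model.predict` and surfaces as a 500. One option is to validate the vector first. This is a minimal sketch: `validate_features` is a hypothetical helper, and `EXPECTED_FEATURES = 10` is an assumption about how many columns the model was trained on.

```python
import math

EXPECTED_FEATURES = 10  # assumption: number of columns used at training time


def validate_features(features: list, expected: int = EXPECTED_FEATURES) -> list:
    """Reject vectors with the wrong length or non-finite values."""
    if len(features) != expected:
        raise ValueError(f"expected {expected} features, got {len(features)}")
    if not all(math.isfinite(x) for x in features):
        raise ValueError("features must be finite numbers")
    return features
```

Inside the handler, a `ValueError` from this check could be mapped to a 4xx response so that client mistakes are not reported as server errors.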

### 4. CI/CD for ML

```yaml
# .github/workflows/ml-pipeline.yml
name: ML Pipeline

on:
  push:
    paths:
      - 'src/**'
      - 'data/**'

jobs:
  train-and-evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run tests
        run: pytest tests/

      - name: Train model
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_URI }}
        run: python src/train.py

      - name: Evaluate model
        run: python src/evaluate.py --threshold 0.85

      - name: Register model
        if: success()
        run: python src/register_model.py

  deploy:
    needs: train-and-evaluate
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
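      # Assumes the runner already has kubectl credentials for the cluster
      # and that PROJECT is set in the job environment
      - name: Deploy to production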
        run: |
          kubectl set image deployment/model-server \
            model-server=gcr.io/$PROJECT/model:${{ github.sha }}
```
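
The workflow calls `src/evaluate.py --threshold 0.85` but does not show the script. A minimal sketch of such a quality gate follows, assuming the script compares an accuracy metric against the threshold and signals failure through its exit code. The function names and the way `metrics` is passed in are assumptions for illustration.

```python
import argparse


def passes_gate(metrics: dict, threshold: float, metric: str = "accuracy") -> bool:
    """True when the metric is present and meets the threshold."""
    value = metrics.get(metric)
    return value is not None and value >= threshold


def main(argv: list, metrics: dict) -> int:
    """Parse --threshold and return a process exit code (0 = pass, 1 = fail)."""
    parser = argparse.ArgumentParser(description="Quality gate for a trained model")
    parser.add_argument("--threshold", type=float, default=0.85)
    args = parser.parse_args(argv)

    ok = passes_gate(metrics, args.threshold)
    status = "PASS" if ok else "FAIL"
    print(f"accuracy={metrics.get('accuracy')} threshold={args.threshold} -> {status}")
    return 0 if ok else 1
```

A real script would load `metrics` from the latest training run and end with `sys.exit(main(sys.argv[1:], metrics))`. A non-zero exit code fails the step, which skips the later steps in the job and the dependent `deploy` job.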

## Tools & Technologies

| Tool | Purpose | Version (2025) |
|------|---------|----------------|
| **MLflow** | Experiment tracking | 2.10+ |
| **Feast** | Feature store | 0.36+ |
| **BentoML** | Model serving | 1.2+ |
| **Seldon** | K8s model serving | 1.17+ |
| **DVC** | Data versioning | 3.40+ |
| **Weights & Biases** | Experiment tracking | Latest |
| **Evidently** | Model monitoring | 0.4+ |

## Troubleshooting Guide

| Issue | Symptoms | Root Cause | Fix |
|-------|----------|------------|-----|
| **Model Drift** | Accuracy drops | Data distribution change | Monitor, retrain |
| **Slow Inference** | High latency | Large model, no optimization | Quantize, distill |
| **Version Mismatch** | Prediction errors | Wrong model version | Pin versions |
| **Feature Skew** | Train/serve mismatch | Different preprocessing | Use feature store |
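
For the Model Drift row, "monitor" usually means comparing the distribution of a feature in production against its distribution at training time. The sketch below computes the Population Stability Index (PSI) with the standard library only. It is a swapped-in illustration rather than the Evidently API, and the bin count and sample data are assumptions.

```python
import math


def population_stability_index(expected: list, actual: list, bins: int = 10) -> float:
    """PSI between a reference sample (`expected`) and a new sample (`actual`).

    Bins are equal-width over the range of the reference sample; values
    outside that range are clipped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each proportion so empty bins do not produce log(0)
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(p, q))


# Hypothetical feature samples: training data vs. shifted production data
train_sample = [i / 100 for i in range(100)]
prod_sample = [0.5 + i / 200 for i in range(100)]
psi = population_stability_index(train_sample, prod_sample)
print(f"PSI = {psi:.3f}")
```

A commonly cited rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as a significant shift that may warrant retraining. Treat these cutoffs as starting points and tune them per feature.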

## Best Practices

```python
# ✅ DO: Version everything
mlflow.log_artifact("data/train.csv")
mlflow.log_params({"data_version": "v2.3"})

# ✅ DO: Test model before deployment
# (evaluate_model is a placeholder for your own scoring function)
def test_model_performance(model, threshold=0.85):
    score = evaluate_model(model)
    assert score >= threshold, f"Model score {score} below threshold"

# ✅ DO: Monitor in production
# ✅ DO: A/B test new models

# ❌ DON'T: Deploy without validation
# ❌ DON'T: Skip rollback strategy
```
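
The "A/B test new models" item can be sketched as deterministic traffic splitting: hash a stable user ID into a bucket and route a fixed share of users to the candidate model. The function name, salt, and 10% share below are assumptions for illustration.

```python
import hashlib


def assign_variant(user_id: str, treatment_share: float = 0.1, salt: str = "churn-ab-1") -> str:
    """Map a user to "candidate" or "production" deterministically.

    The same user always gets the same variant for a given salt, so their
    predictions stay consistent across requests.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform in [0, 1)
    return "candidate" if bucket < treatment_share else "production"


variant = assign_variant("12345")
print(f"customer 12345 -> {variant}")
```

Setting `treatment_share` to 0 sends all traffic back to the production model, which also serves as a simple rollback switch. Changing the salt reshuffles the assignment for a new experiment.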

## Resources

- [MLflow Docs](https://mlflow.org/docs/latest/)
- [Made With ML](https://madewithml.com/)
- [Google ML Best Practices](https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning)

---

**Skill Certification Checklist:**
- [ ] Can track experiments with MLflow
- [ ] Can manage model registry
- [ ] Can deploy models with FastAPI/BentoML
- [ ] Can set up CI/CD for ML
- [ ] Can monitor models in production
