tracking-model-versions

This skill enables an AI assistant to track and manage AI/ML model versions using the model-versioning-tracker plugin. Use it when the user asks to manage model versions, track model lineage, log model performance, or implement version control f... Use when the appropriate context is detected. Trigger with relevant phrases based on the skill's purpose.

1,868 stars

by jeremylongshore


Best use case

tracking-model-versions is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using tracking-model-versions should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/tracking-model-versions/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/ai-ml/model-versioning-tracker/skills/tracking-model-versions/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/tracking-model-versions/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How tracking-model-versions Compares

| Feature / Agent | tracking-model-versions | Standard Approach |
|-----------------|-------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

ChatGPT vs Claude for Agent Skills

Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

SKILL.md Source

# Model Versioning Tracker

## Overview

Track and manage AI/ML model versions using MLflow, DVC, or Weights & Biases. Log model metadata (hyperparameters, training data hash, framework version), record evaluation metrics (accuracy, F1, latency), manage model registry transitions (Staging, Production, Archived), and generate model cards documenting lineage and performance.
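
The metadata listed above can be collected before any tracking tool is involved. The following is a minimal sketch using only the Python standard library; the helper names (`file_sha256`, `build_run_metadata`) and the `hp.` key prefix are illustrative assumptions, not part of the plugin.

```python
import hashlib
import platform
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_run_metadata(data_path: Path, hyperparameters: dict, framework_version: str) -> dict:
    """Flatten run metadata into string values (hypothetical key names)."""
    metadata = {f"hp.{key}": str(value) for key, value in hyperparameters.items()}
    metadata["training_data_sha256"] = file_sha256(data_path)
    metadata["framework_version"] = framework_version
    metadata["python_version"] = platform.python_version()
    return metadata
```

A dictionary of this shape could then be passed to whichever tracker is in use, for example as run parameters or tags.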

## Prerequisites

- MLflow tracking server running locally or remotely (`mlflow server` or managed MLflow)
- Python 3.9+ with `mlflow`, `pandas`, and the relevant ML framework installed
- Model artifacts accessible on the local filesystem or cloud storage (S3, GCS)
- Write access to the MLflow tracking URI and artifact store

## Instructions

1. Connect to the MLflow tracking server by setting `MLFLOW_TRACKING_URI` and verify connectivity with `mlflow experiments search` (`mlflow experiments list` on MLflow 1.x).
2. Create or select an MLflow experiment for the model project using `mlflow experiments create --experiment-name <name>`.
3. Log a new model version: start an MLflow run, log parameters (learning rate, epochs, batch size), log metrics (accuracy, loss, F1 score), and log the model artifact with `mlflow.<flavor>.log_model()`.
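4. Register the model in the MLflow Model Registry using `mlflow.register_model()` with the model URI (`runs:/<run_id>/<artifact_path>`) and a descriptive model name.
5. Transition the model version through stages: `None` -> `Staging` -> `Production` using `client.transition_model_version_stage()`. Pass `archive_existing_versions=True` when promoting to Production to archive the previous production versions. Recent MLflow releases deprecate registry stages in favor of model version aliases, so check which mechanism your server version supports.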
6. Compare model versions by querying metrics across runs with `mlflow.search_runs()` and generating comparison tables showing metric improvements between versions.
7. Generate a model card from the registered model metadata, including training data description, evaluation metrics, intended use, limitations, and ethical considerations. See `${CLAUDE_SKILL_DIR}/assets/model_card_template.md`.
8. Set up automated alerts for model performance degradation by comparing production metrics against baseline thresholds stored in the model registry.
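
Steps 3 to 5 above can be sketched roughly as follows. This assumes MLflow 2.x with a scikit-learn model (hence the `mlflow.sklearn` flavor); the function names, the default experiment name `demo-experiment`, and the `model` artifact path are hypothetical choices for illustration. MLflow is imported lazily so the URI helper can be used on its own.

```python
def model_uri_for_run(run_id: str, artifact_path: str = "model") -> str:
    """Build the runs:/ model URI that mlflow.register_model() expects."""
    return f"runs:/{run_id}/{artifact_path}"


def log_and_register(model, params: dict, metrics: dict, registered_name: str,
                     experiment: str = "demo-experiment") -> str:
    """Log one run, register its model, and move the new version to Staging."""
    # Lazy imports: only needed when a tracking server is actually used.
    import mlflow
    import mlflow.sklearn
    from mlflow.tracking import MlflowClient

    mlflow.set_experiment(experiment)
    with mlflow.start_run() as run:
        # All logging happens inside the active run context.
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        mlflow.sklearn.log_model(model, artifact_path="model")

    version = mlflow.register_model(model_uri_for_run(run.info.run_id), registered_name)
    # Stage transitions are deprecated in newer MLflow releases (aliases replace them).
    MlflowClient().transition_model_version_stage(
        name=registered_name, version=version.version, stage="Staging",
    )
    return version.version
```

Calling `log_and_register(...)` requires `MLFLOW_TRACKING_URI` to point at a reachable server with a registry backend; for other frameworks, swap in the matching `mlflow.<flavor>` module.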

See `${CLAUDE_SKILL_DIR}/assets/example_mlflow_workflow.yaml` for a complete workflow configuration.
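
Step 8 does not prescribe how production metrics are compared against the baseline. One possible approach, shown here as a standard-library sketch, is a relative-tolerance check; the function name, the 2% default tolerance, and the set of "lower is better" metric names are all assumptions.

```python
def find_degraded_metrics(baseline: dict, current: dict, tolerance: float = 0.02,
                          lower_is_better: frozenset = frozenset({"loss", "latency_ms"})) -> dict:
    """Return {metric: (baseline, current)} for metrics worse than baseline by more than `tolerance` (relative)."""
    degraded = {}
    for name, base in baseline.items():
        if name not in current or base == 0:
            continue  # nothing to compare, or relative change undefined
        change = (current[name] - base) / abs(base)
        if name in lower_is_better:
            worse = change > tolerance   # e.g. latency went up
        else:
            worse = change < -tolerance  # e.g. accuracy went down
        if worse:
            degraded[name] = (base, current[name])
    return degraded
```

A non-empty result would be the trigger for an alert; how the alert is delivered (email, chat webhook, pager) is left to the surrounding system.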

## Examples

**Tracking a new image classification model version**: Log a ResNet-50 fine-tuned on a custom dataset. Record hyperparameters (lr=0.001, epochs=50, optimizer=Adam), metrics (val_accuracy=0.94, val_loss=0.18, inference_latency_ms=12), and the serialized model artifact. Register as version 3 in the model registry and transition to Staging for validation.

**Comparing model versions before production promotion**: Query MLflow for all versions of the sentiment-analysis model. Generate a comparison table showing accuracy improved from 0.87 (v2) to 0.91 (v3) while inference latency increased from 8ms to 15ms. Recommend promoting v3 to Production only if latency is acceptable for the use case.
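
The comparison table in this example could be produced along these lines. The sketch hard-codes the metric values quoted above; in practice they would come from a query such as `mlflow.search_runs()`. The function name and the metric keys are hypothetical.

```python
def comparison_table(old_label: str, old: dict, new_label: str, new: dict) -> str:
    """Render a Markdown table of metrics shared by two versions, with deltas."""
    lines = [
        f"| Metric | {old_label} | {new_label} | Delta |",
        "|--------|------|------|-------|",
    ]
    for name in sorted(old.keys() & new.keys()):
        delta = new[name] - old[name]
        lines.append(f"| {name} | {old[name]:g} | {new[name]:g} | {delta:+.2f} |")
    return "\n".join(lines)


v2 = {"accuracy": 0.87, "inference_latency_ms": 8}
v3 = {"accuracy": 0.91, "inference_latency_ms": 15}
print(comparison_table("v2", v2, "v3", v3))
```

Whether a positive delta is an improvement depends on the metric (higher accuracy is good, higher latency is not), so the promotion decision still needs a per-metric reading.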

**Generating a model card for compliance review**: Extract metadata from MLflow model registry version 5: training dataset (100K customer reviews), evaluation results (F1=0.89 on held-out test set), known limitations (struggles with sarcasm and multilingual input), and intended use (customer feedback classification). Output a structured Markdown model card.
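
A structured Markdown card like the one described here could be rendered with a small helper. This is a sketch only: the section layout is an assumption and is not taken from `model_card_template.md`, and the model name `customer-feedback-classifier` is hypothetical since the example does not name the model.

```python
def render_model_card(name: str, version: int, training_data: str, metrics: dict,
                      intended_use: str, limitations: list) -> str:
    """Render registry metadata as a Markdown model card (assumed layout)."""
    lines = [
        f"# Model Card: {name} (version {version})",
        "",
        "## Intended Use",
        intended_use,
        "",
        "## Training Data",
        training_data,
        "",
        "## Evaluation Metrics",
    ]
    lines += [f"- {metric}: {value}" for metric, value in metrics.items()]
    lines += ["", "## Known Limitations"]
    lines += [f"- {item}" for item in limitations]
    return "\n".join(lines) + "\n"


card = render_model_card(
    name="customer-feedback-classifier",  # hypothetical name
    version=5,
    training_data="100K customer reviews",
    metrics={"F1": 0.89},
    intended_use="Customer feedback classification",
    limitations=["Struggles with sarcasm", "Struggles with multilingual input"],
)
print(card)
```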

## Output

- MLflow run with logged parameters, metrics, and model artifact
- Model registry entry with version number and stage assignment
- Version comparison table with metric deltas across runs
- Model card in Markdown format documenting lineage, performance, and limitations

## Error Handling

| Error | Cause | Solution |
|-------|-------|----------|
| MLflow connection refused | Tracking server not running or wrong URI | Verify `MLFLOW_TRACKING_URI` is correct; start server with `mlflow server --host 0.0.0.0 --port 5000` |
| Artifact upload failed | Insufficient permissions on artifact store | Check S3/GCS bucket permissions; verify IAM role has write access to the artifact path |
| Model registration conflict | Model name already exists with incompatible schema | Use a versioned model name or delete the conflicting registry entry |
| Metrics not logged | MLflow run ended before logging completed | Ensure all `log_metric()` calls happen within the active run context (`with mlflow.start_run():`) |
| Stage transition denied | Model version already in target stage | Archive the existing version in that stage first, then retry the transition |
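
For the first row of the table, a quick reachability probe can separate "server not running" from a wrong URI before any MLflow call is made. The sketch below uses only the standard library and assumes the tracking server answers on a `/health` path, which is an assumption to verify against your deployment; the function names are hypothetical.

```python
import urllib.error
import urllib.request


def health_url(tracking_uri: str) -> str:
    """Append the assumed /health path to the tracking URI."""
    return tracking_uri.rstrip("/") + "/health"


def tracking_server_reachable(tracking_uri: str, timeout: float = 3.0) -> bool:
    """Return True if the server answers the health probe with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(tracking_uri), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused, DNS failures, timeouts, and HTTP errors.
        return False
```

For example, `tracking_server_reachable("http://localhost:5000")` would be expected to return `False` until `mlflow server` has been started on that port.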

## Resources

- MLflow documentation: https://mlflow.org/docs/latest/index.html
- MLflow Model Registry: https://mlflow.org/docs/latest/model-registry.html
- DVC (Data Version Control): https://dvc.org/doc
- Weights & Biases Model Registry: https://docs.wandb.ai/guides/model-registry
- ML Model Cards: https://modelcards.withgoogle.com/about

Related Skills

tracking-regression-tests

1868

from jeremylongshore/claude-code-plugins-plus-skills

Track and manage regression test suites across releases. Use when performing specialized testing. Trigger with phrases like "track regressions", "manage regression suite", or "validate against baseline".

openrouter-model-routing

1868

from jeremylongshore/claude-code-plugins-plus-skills

Implement intelligent model routing to optimize cost, quality, and latency on OpenRouter. Use when building multi-model systems or optimizing spend across task types. Triggers: 'openrouter routing', 'model routing', 'route to model', 'model selection openrouter'.

openrouter-model-catalog

1868

from jeremylongshore/claude-code-plugins-plus-skills

Query, filter, and select from OpenRouter's 400+ model catalog. Use when choosing models, comparing pricing, or checking capabilities. Triggers: 'openrouter models', 'list models', 'model catalog', 'compare models', 'available models'.

openrouter-model-availability

1868

from jeremylongshore/claude-code-plugins-plus-skills

Monitor OpenRouter model availability and implement health checks. Use when building systems that depend on specific models being online. Triggers: 'openrouter model status', 'is model available', 'openrouter health check', 'model availability'.

klingai-model-catalog

1868

from jeremylongshore/claude-code-plugins-plus-skills

Explore Kling AI models, versions, and capabilities for video and image generation. Use when selecting models or comparing features. Trigger with phrases like 'kling ai models', 'klingai capabilities', 'kling video models', 'klingai features'.

cursor-model-selection

1868

from jeremylongshore/claude-code-plugins-plus-skills

Configure and select AI models in Cursor for Chat, Composer, and Agent mode. Triggers on "cursor model", "cursor gpt", "cursor claude", "change cursor model", "cursor ai model", "cursor auto mode".

clade-model-inference

1868

from jeremylongshore/claude-code-plugins-plus-skills

Stream Claude responses, use system prompts, handle multi-turn conversations, and process structured output with the Messages API. Use when working with model-inference patterns. Trigger with "anthropic streaming", "claude messages api", "claude inference", "stream claude response".

tracking-service-reliability

1868

from jeremylongshore/claude-code-plugins-plus-skills

Define and track SLAs, SLIs, and SLOs for service reliability including availability, latency, and error rates. Use when establishing reliability targets or monitoring service health. Trigger with phrases like "define SLOs", "track SLI metrics", or "calculate error budget".

tracking-application-response-times

1868

from jeremylongshore/claude-code-plugins-plus-skills

Track and optimize application response times across API endpoints, database queries, and service calls. Use when monitoring performance or identifying bottlenecks. Trigger with phrases like "track response times", "monitor API performance", or "analyze latency".

tracking-resource-usage

1868

from jeremylongshore/claude-code-plugins-plus-skills

Track and optimize resource usage across application stack including CPU, memory, disk, and network I/O. Use when identifying bottlenecks or optimizing costs. Trigger with phrases like "track resource usage", "monitor CPU and memory", or "optimize resource allocation".

modeling-nosql-data

1868

from jeremylongshore/claude-code-plugins-plus-skills

Use when you need to work with NoSQL data modeling. This skill provides NoSQL database design guidance and automation. Trigger with phrases like "model NoSQL data", "design document structure", or "optimize NoSQL schema".

tracking-token-launches

1868

from jeremylongshore/claude-code-plugins-plus-skills

Track new token launches across DEXes with risk analysis and contract verification. Use when discovering new token launches, monitoring IDOs, or analyzing token contracts. Trigger with phrases like "track launches", "find new tokens", "new pairs on uniswap", "token risk analysis", or "monitor IDOs".