ML Experiment Tracking

Track machine learning experiments with reproducible parameters and metrics

16 stars

Best use case

ML Experiment Tracking is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ML Experiment Tracking should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/ml-experiment-tracking/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/data-ai/ml-experiment-tracking/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/ml-experiment-tracking/SKILL.md inside your project.
3. Restart your AI agent so that it auto-discovers the skill.

How ML Experiment Tracking Compares

Feature / Agent	ML Experiment Tracking	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

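It tracks machine learning experiments with reproducible parameters and metrics: it logs each run, compares it against a baseline or prior runs, and recommends whether to promote or iterate.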

Where can I find the source code?

The source code is on GitHub in the diegosouzapw/awesome-omni-skill repository, under skills/data-ai/ml-experiment-tracking/.

SKILL.md Source

# ML Experiment Tracking Skill

Track machine learning experiments with reproducible parameters and metrics.

## Trigger Conditions
- Model configuration changes or hyperparameter updates
- New experiment run initiated
- User invokes with "track experiment" or "compare models"

## Input Contract
- **Required:** Experiment parameters (model, hyperparameters, data)
- **Required:** Evaluation metrics
- **Optional:** Baseline comparison, hypothesis
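
The input contract above could be checked before a run is logged. The following is a minimal sketch, not part of the skill itself: the `ExperimentInput` class and its field names are hypothetical, chosen only to mirror the required and optional items listed above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ExperimentInput:
    """Hypothetical container mirroring the Input Contract."""
    model: str                        # required: model identifier
    hyperparameters: dict[str, Any]   # required: e.g. {"lr": 3e-5}
    data: str                         # required: dataset name or version tag
    metrics: dict[str, float]         # required: evaluation metrics
    baseline: Optional[str] = None    # optional: run to compare against
    hypothesis: Optional[str] = None  # optional: what the run should show

    def validate(self) -> list[str]:
        """Return a list of contract violations (empty means valid)."""
        problems = []
        if not self.model:
            problems.append("model is required")
        if not self.hyperparameters:
            problems.append("hyperparameters are required")
        if not self.data:
            problems.append("data is required")
        if not self.metrics:
            problems.append("at least one evaluation metric is required")
        return problems
```

Returning a list of problems rather than raising on the first one lets the agent report every missing input in a single message.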

## Output Contract
- Experiment log entry with full reproducibility info
- Comparison table against baseline/prior runs
- Recommendation on whether to promote or iterate
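
One way to produce the comparison table is sketched below. The function name `comparison_table` and the column layout are assumptions for illustration; the skill does not prescribe a table format.

```python
def comparison_table(baseline: dict[str, float], candidate: dict[str, float]) -> str:
    """Render a Markdown table of the metrics present in both runs."""
    lines = [
        "| Metric | Baseline | Candidate | Delta |",
        "| --- | --- | --- | --- |",
    ]
    # Only metrics logged for both runs can be compared.
    for name in sorted(baseline.keys() & candidate.keys()):
        delta = candidate[name] - baseline[name]
        lines.append(
            f"| {name} | {baseline[name]:g} | {candidate[name]:g} | {delta:+.3g} |"
        )
    return "\n".join(lines)


print(comparison_table({"f1": 0.92, "latency_ms": 45.0},
                       {"f1": 0.89, "latency_ms": 12.0}))
```

The sign of each delta does not say whether the change is good or bad: a lower F1 is worse and a lower latency is better, so the recommendation still needs a per-metric direction.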

## Tool Permissions
- **Read:** Model configs, training data metadata, metric logs
- **Write:** Experiment logs, comparison reports
- **Execute:** Metric collection commands

## Execution Steps
1. Record experiment hypothesis and parameters
2. Capture environment (dependencies, data version, code commit)
3. Execute or observe training run
4. Collect metrics and artifacts
5. Compare against baseline and prior experiments
6. Recommend: promote, iterate, or abandon
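
Steps 1, 2, and 4 above can be sketched as a single log record. This is an illustrative sketch under stated assumptions: the helper names (`current_commit`, `build_log_entry`, `append_entry`), the record fields, and the JSON Lines log file are all hypothetical, and dependency capture (for example a lockfile or `pip freeze` output) is left out for brevity.

```python
import json
import platform
import subprocess
import sys
from datetime import datetime, timezone


def current_commit() -> str:
    """Return the current git commit hash, or "unknown" outside a repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def build_log_entry(hypothesis, params, data_version, metrics):
    """Assemble one experiment record (hypothetical schema)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "hypothesis": hypothesis,            # step 1
        "params": params,                    # step 1
        "environment": {                     # step 2
            "python": sys.version.split()[0],
            "platform": platform.platform(),
            "code_commit": current_commit(),
            "data_version": data_version,
        },
        "metrics": metrics,                  # step 4
    }


def append_entry(path, entry):
    """Append the record as one JSON line so earlier runs are never rewritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")
```

An append-only log keeps prior runs available for the comparison in step 5.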

## Success Criteria
- Experiment is fully reproducible from logged parameters
- Metrics compared against baseline
- Clear recommendation with rationale
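
A small aid for the first criterion is a deterministic fingerprint of the logged parameters, so two runs with the same configuration can be recognised as such. The sketch below is an assumption about how this could be done; `params_fingerprint` is a hypothetical helper, and a matching fingerprint does not by itself prove that the data or code were identical.

```python
import hashlib
import json


def params_fingerprint(params: dict) -> str:
    """Hash a parameter dict so identical configurations get identical IDs."""
    # Sorting keys makes the hash independent of dict insertion order.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


same = params_fingerprint({"lr": 3e-5, "epochs": 3}) == params_fingerprint({"epochs": 3, "lr": 3e-5})
print(same)  # True: key order does not change the fingerprint
```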

## Escalation Rules
- Escalate if model performance degrades vs. baseline
- Escalate if data drift detected in training set
- Escalate if experiment requires new infrastructure
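
The three rules above could be encoded as a single check. This is a hedged sketch: the function name, its parameters, and the `tolerance` threshold are hypothetical, and how drift or infrastructure needs are detected is outside its scope (they are passed in as flags).

```python
def escalation_reasons(baseline_score, candidate_score, drift_detected=False,
                       needs_new_infrastructure=False, tolerance=0.0):
    """Return the escalation rules triggered by a run (empty list means none)."""
    reasons = []
    # Assumes a higher score is better; tolerance allows for small noise.
    if candidate_score < baseline_score - tolerance:
        reasons.append("performance degraded vs. baseline")
    if drift_detected:
        reasons.append("data drift detected in training set")
    if needs_new_infrastructure:
        reasons.append("experiment requires new infrastructure")
    return reasons
```

For example, `escalation_reasons(0.92, 0.89)` reports a degradation, while `escalation_reasons(0.92, 0.89, tolerance=0.05)` does not.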

## Example Invocations

**Input:** "Compare the BERT-base and DistilBERT models for our classification task"

**Output:** Experiment log: BERT-base (F1: 0.92, latency: 45ms, size: 440MB) vs DistilBERT (F1: 0.89, latency: 12ms, size: 260MB). Recommendation: DistilBERT for production (a 0.03 drop in F1 in exchange for a 73% reduction in latency). Promote to staging for A/B test.
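
The trade-off figures in this example can be reproduced from the logged numbers. The snippet below is a worked check only; the variable and function names are illustrative.

```python
bert = {"f1": 0.92, "latency_ms": 45, "size_mb": 440}
distilbert = {"f1": 0.89, "latency_ms": 12, "size_mb": 260}


def relative_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


# Absolute F1 drop, expressed in percentage points.
f1_drop_points = round((bert["f1"] - distilbert["f1"]) * 100, 1)
latency_gain = round(relative_reduction(bert["latency_ms"], distilbert["latency_ms"]), 1)
size_gain = round(relative_reduction(bert["size_mb"], distilbert["size_mb"]), 1)

print(f1_drop_points, latency_gain, size_gain)  # 3.0 73.3 40.9
```

The F1 difference is an absolute drop of 3 points, while the latency and size figures are relative reductions.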

Related Skills

asset-tracking

from diegosouzapw/awesome-omni-skill

Use when managing asset metadata, dependencies, and delivery workflows across teams.

analytics-tracking

from diegosouzapw/awesome-omni-skill

When the user wants to set up, improve, or audit analytics tracking and measurement. Also use when the user mentions "set up tracking," "GA4," "Google Analytics," "conversion tracking," "event tracking," "UTM parameters," "tag manager," "GTM," "analytics implementation," or "tracking plan." For A/B test measurement, see ab-test-setup.

prediction-tracking

from diegosouzapw/awesome-omni-skill

Track and evaluate AI predictions over time to assess accuracy. Use when reviewing past predictions to determine if they came true, failed, or remain uncertain.

aiwf:error-tracking

from diegosouzapw/awesome-omni-skill

Add Sentry v8 error tracking and performance monitoring to your project services. Use this skill when adding error handling, creating new controllers, instrumenting cron jobs, or tracking database performance. ALL ERRORS MUST BE CAPTURED TO SENTRY - no exceptions.

artifact-tracking

from diegosouzapw/awesome-omni-skill

Token-efficient tracking for AI orchestration. CLI-first for status updates (~50 tokens), agent fallback for complex ops (~1KB). Use when: updating task status, querying blockers, creating progress files, validating phases.

agentic-kpi-tracking

from diegosouzapw/awesome-omni-skill

Track and measure agentic coding KPIs for ZTE progression. Use when measuring workflow effectiveness, tracking Size/Attempts/Streak/Presence metrics, or assessing readiness for autonomous operation.

bgo

from diegosouzapw/awesome-omni-skill

Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.

Coding & Development

obsidian-daily

from diegosouzapw/awesome-omni-skill

Manage Obsidian Daily Notes via obsidian-cli. Create and open daily notes, append entries (journals, logs, tasks, links), read past notes by date, and search vault content. Handles relative dates like "yesterday", "last Friday", "3 days ago".

obsidian-additions

from diegosouzapw/awesome-omni-skill

Create supplementary materials attached to existing notes: experiments, meetings, reports, logs, conspectuses, practice sessions, annotations, AI outputs, links collections. Two-step process: (1) create aggregator space, (2) create concrete addition in base/additions/. INVOKE when user wants to attach any supplementary material to an existing note. Triggers: "addition", "create addition", "experiment", "meeting notes", "report", "conspectus", "log", "practice", "annotations", "links", "link collection", "аддишн", "конспект", "встреча", "отчёт", "эксперимент", "практика", "аннотации", "ссылки", "добавь к заметке".

observe

from diegosouzapw/awesome-omni-skill

Query and manage Observe using the Observe CLI. Use when the user wants to run OPAL queries, list datasets, manage objects, or interact with their Observe tenant from the command line.

observability-review

from diegosouzapw/awesome-omni-skill

AI agent that analyzes operational signals (metrics, logs, traces, alerts, SLO/SLI reports) from observability platforms (Prometheus, Datadog, New Relic, CloudWatch, Grafana, Elastic) and produces practical, risk-aware triage and recommendations. Use when reviewing system health, investigating performance issues, analyzing monitoring data, evaluating service reliability, or providing SRE analysis of operational metrics. Distinguishes between critical issues requiring action, items needing investigation, and informational observations requiring no action.

nvidia-nim

from diegosouzapw/awesome-omni-skill

NVIDIA NIM inference microservices for deploying AI models with OpenAI-compatible APIs, self-hosted or cloud