ml-experiment-tracker

Plan reproducible ML experiment runs with explicit parameters, metrics, and artifacts. Use before model training to standardize tracking-ready experiment definitions.

3,891 stars
Complexity: easy

About this skill

The ML Experiment Tracker skill helps ML engineers and data scientists create structured, reproducible plans for their machine learning experiments. It provides a guided workflow for defining the core components of an experiment: the dataset, target task, model families, and parameter search space. Standardizing these definitions keeps experiment setup and logging consistent across tracking systems, which makes runs easier to compare, keeps research and development records clear, and supports collaboration within ML teams.

Before any model training begins, the skill prompts users to define metrics and acceptance thresholds, so success criteria are established upfront. It culminates in a detailed run plan, complete with versioning and artifact expectations, that can be exported for execution with various tracking tools, improving experimental rigor, clarity, and reproducibility.
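Because metrics and acceptance thresholds are part of the plan rather than an afterthought, downstream code can verify a finished run against them. Below is a minimal Python sketch of that idea; the helper function and the measured values are illustrative assumptions, not part of the skill's bundled scripts.

```python
# Minimal sketch: check a run's measured metrics against the acceptance
# thresholds defined in a plan. Names and values here are illustrative.
def run_meets_plan(plan_metrics, measured):
    for m in plan_metrics:
        value = measured.get(m["name"])
        if value is None:
            return False  # the metric was planned but never logged
        if m["direction"] == "maximize" and value < m["threshold"]:
            return False
        if m["direction"] == "minimize" and value > m["threshold"]:
            return False
    return True

plan_metrics = [{"name": "f1_score_macro", "threshold": 0.85, "direction": "maximize"}]
print(run_meets_plan(plan_metrics, {"f1_score_macro": 0.87}))  # True
```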

Best use case

The primary use case for the ML Experiment Tracker is to standardize the planning phase of machine learning projects, ensuring reproducibility and consistent logging. It is most beneficial for ML engineers, data scientists, and research teams who need to manage multiple experiment runs, systematically compare model performance, and ensure that all experimental details (parameters, metrics, artifacts) are explicitly defined and trackable from the outset.

The skill's output is a structured, machine-readable ML experiment run plan detailing datasets, parameters, metrics, artifacts, and versioning for consistent logging and execution.

Practical example

Example input

Plan a new experiment for a multi-class text classification model. Use the AG News dataset, target F1-score (macro), and evaluate fastText and TF-IDF + Logistic Regression models. Include a hyperparameter search space for learning rates [0.01, 0.001] and regularization strength [0.1, 1.0].

Example output

```json
{
  "experiment_name": "text-classification-v2",
  "dataset": "AG News",
  "task": "multi-class text classification",
  "model_family": [
    "fastText",
    "TF-IDF_LogisticRegression"
  ],
  "parameters_search_space": {
    "learning_rate": [
      0.01,
      0.001
    ],
    "regularization_strength": [
      0.1,
      1.0
    ]
  },
  "metrics": [
    {
      "name": "f1_score_macro",
      "threshold": 0.85,
      "direction": "maximize"
    }
  ],
  "artifacts_to_track": [
    "model_weights",
    "training_logs",
    "confusion_matrix.png"
  ],
  "version": "2.0.0"
}
```
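One way an exported plan like this could be consumed is to seed a tracking backend before training starts. The sketch below assumes MLflow is installed and that the plan was saved as `run_plan.json` (a hypothetical filename); it illustrates the hand-off and is not part of the skill itself.

```python
import json

import mlflow  # assumes the mlflow package is installed

# Assumption: the run plan shown above was exported to run_plan.json.
with open("run_plan.json") as f:
    plan = json.load(f)

mlflow.set_experiment(plan["experiment_name"])

for model_name in plan["model_family"]:
    with mlflow.start_run(run_name=f"{model_name}-{plan['version']}"):
        # Record the planned configuration before any training happens.
        mlflow.set_tags({"dataset": plan["dataset"],
                         "task": plan["task"],
                         "model_family": model_name})
        mlflow.log_params({k: str(v) for k, v in plan["parameters_search_space"].items()})
        # Planned thresholds become explicit acceptance criteria on the run.
        for metric in plan["metrics"]:
            mlflow.set_tag(f"threshold_{metric['name']}", metric["threshold"])
        # ... train here, then log measured values, e.g.:
        # mlflow.log_metric("f1_score_macro", f1)
```

Logging the search space and thresholds up front means every run carries its own success criteria, which is what makes later comparisons between runs meaningful.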

When to use this skill

  • Before starting a new ML model training project.
  • When reproducibility of ML experiments is a high priority.
  • To standardize experiment definitions across a team or organization.
  • When integrating with experiment tracking systems such as MLflow or Weights & Biases.

When not to use this skill

  • For tasks unrelated to ML experiment planning or tracking.
  • When quick, ad-hoc model training without formal tracking is sufficient.
  • If you already have a mature and standardized experiment planning process fully automated.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/ml-experiment-tracker/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/0x-professor/ml-experiment-tracker/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/ml-experiment-tracker/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How ml-experiment-tracker Compares

| Feature / Agent | ml-experiment-tracker | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | easy | N/A |

Frequently Asked Questions

What does this skill do?

Plan reproducible ML experiment runs with explicit parameters, metrics, and artifacts. Use before model training to standardize tracking-ready experiment definitions.

How difficult is it to install?

The installation complexity is rated as easy. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# ML Experiment Tracker

## Overview

Generate structured experiment plans that can be logged consistently in experiment tracking systems.

## Workflow

1. Define dataset, target task, model family, and parameter search space.
2. Define metrics and acceptance thresholds before training.
3. Produce run plan with version and artifact expectations.
4. Export the run plan for execution in tracking tools.

## Use Bundled Resources

- Run `scripts/build_experiment_plan.py` to generate consistent run plans.
- Read `references/tracking-guide.md` for reproducibility checklist.

## Guardrails

- Keep inputs explicit and machine-readable.
- Always include metrics and baseline criteria.

Related Skills

tavily-search

3891
from openclaw/skills

Use Tavily API for real-time web search and content extraction. Use when: user needs real-time web search results, research, or current information from the web. Requires Tavily API key.

Data & Research

baidu-search

3891
from openclaw/skills

Search the web using Baidu AI Search Engine (BDSE). Use for live information, documentation, or research topics.

Data & Research

notebooklm

3891
from openclaw/skills

An OpenClaw Skill for the unofficial Google NotebookLM Python API. Supports content generation (podcasts, videos, slides, quizzes, mind maps, and more), document management, and research automation. Triggers when the user needs NotebookLM to generate audio overviews, videos, or study materials, or to manage a knowledge base.

Data & Research

openclaw-search

3891
from openclaw/skills

Intelligent search for agents. Multi-source retrieval with confidence scoring - web, academic, and Tavily in one unified API.

Data & Research

aisa-tavily

3891
from openclaw/skills

AI-optimized web search via AIsa's Tavily API proxy. Returns concise, relevant results for AI agents through AIsa's unified API gateway.

Data & Research

Market Sizing — TAM/SAM/SOM Calculator

3891
from openclaw/skills

Build defensible market sizing for any product, pitch deck, or business case. Top-down and bottom-up methodologies combined.

Data & Research

Data Analyst — AfrexAI ⚡📊

3891
from openclaw/skills

**Transform raw data into decisions. Not just charts — answers.**

Data & Research

Competitor Monitor

3891
from openclaw/skills

Tracks and analyzes competitor moves — pricing changes, feature launches, hiring, and positioning shifts

Data & Research

afrexai-competitive-intel

3891
from openclaw/skills

Complete competitive intelligence system — market mapping, product teardowns, pricing intel, win/loss analysis, battlecards, and strategic monitoring. Goes far beyond SEO to cover the full business landscape.

Data & Research

trending-news-aggregator

3891
from openclaw/skills

Intelligent trending-news aggregator: automatically scrapes trending news from multiple platforms, analyzes trends with AI, and supports scheduled delivery and popularity scoring. Core features: daily aggregation of trending topics across platforms (Weibo, Zhihu, Baidu, etc.), intelligent categorization (tech, finance, society, international, etc.), a popularity scoring algorithm, incremental detection (flagging newly added topics), and AI trend analysis.

Data & Research

search-cluster

3891
from openclaw/skills

Aggregated search aggregator using Google CSE, GNews RSS, Wikipedia, Reddit, and Scrapling.

Data & Research

data-analysis-partner

3891
from openclaw/skills

An intelligent data-analysis Skill: give it CSV/Excel files and an analysis request, and it outputs a self-contained HTML report with interactive ECharts charts.

Data & Research