setting-up-experiment-tracking

Sets up machine learning experiment tracking with MLflow or Weights & Biases: it configures the environment and provides code for logging parameters, metrics, and artifacts. Use when asked to "setup experiment tracking" or "initialize MLflow".

1,868 stars

Best use case

setting-up-experiment-tracking is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using setting-up-experiment-tracking should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/setting-up-experiment-tracking/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/ai-ml/experiment-tracking-setup/skills/setting-up-experiment-tracking/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/setting-up-experiment-tracking/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How setting-up-experiment-tracking Compares

| Feature / Agent | setting-up-experiment-tracking | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It sets up machine learning experiment tracking with MLflow or Weights & Biases: it configures the environment and provides code for logging parameters, metrics, and artifacts. It is meant for requests such as "setup experiment tracking" or "initialize MLflow".

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Experiment Tracking Setup

Configure ML experiment tracking with MLflow or Weights & Biases, including environment setup and code for logging parameters, metrics, and artifacts.

## Overview

This skill streamlines setting up experiment tracking for machine learning projects. It automates environment configuration and tool initialization, and provides code examples to get you started quickly.

## How It Works

1. **Analyze Context**: The skill analyzes the current project context to determine the appropriate experiment tracking tool (MLflow or W&B) based on user preference or existing project configuration.
2. **Configure Environment**: It configures the environment by installing necessary Python packages and setting environment variables.
3. **Initialize Tracking**: The skill initializes the chosen tracking tool, potentially starting a local MLflow server or connecting to a W&B project.
4. **Provide Code Snippets**: It provides code snippets demonstrating how to log experiment parameters, metrics, and artifacts within your ML code.
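
Steps 1 and 2 above could be sketched roughly as follows. This is an illustration only, not the skill's actual logic: the `requirements.txt` detection rule, the helper names `detect_tool` and `tracking_env`, and the default project name are all assumptions. The environment variable names themselves (`MLFLOW_TRACKING_URI`, `MLFLOW_EXPERIMENT_NAME`, `WANDB_PROJECT`, `WANDB_MODE`) are ones the two tools read.

```python
from pathlib import Path

KNOWN_TOOLS = ("mlflow", "wandb")


def detect_tool(project_dir, preference=None):
    """Return 'mlflow' or 'wandb', or None when nothing indicates a choice.

    Hypothetical rule: an explicit preference wins; otherwise look for a
    known package name at the start of a line in requirements.txt.
    """
    if preference in KNOWN_TOOLS:
        return preference
    requirements = Path(project_dir) / "requirements.txt"
    if requirements.is_file():
        for line in requirements.read_text().splitlines():
            name = line.strip().lower()
            for tool in KNOWN_TOOLS:
                if name.startswith(tool):
                    return tool
    return None


def tracking_env(tool, project="demo-project"):
    """Build environment variables for the chosen tool (values are examples)."""
    if tool == "mlflow":
        return {
            # Local file-based store; a server URL would go here instead.
            "MLFLOW_TRACKING_URI": Path("mlruns").resolve().as_uri(),
            "MLFLOW_EXPERIMENT_NAME": project,
        }
    if tool == "wandb":
        return {
            "WANDB_PROJECT": project,
            # Offline mode avoids needing an API key for a first try.
            "WANDB_MODE": "offline",
        }
    raise ValueError(f"unsupported tool: {tool!r}")
```

The returned dictionary can be merged into `os.environ` with `os.environ.update(...)` before the training script imports the tracking library.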

## When to Use This Skill

This skill activates when you need to:
- Start tracking machine learning experiments in a new project.
- Integrate experiment tracking into an existing ML project.
- Quickly set up MLflow or Weights & Biases for experiment management.
- Automate the process of logging parameters, metrics, and artifacts.

## Examples

### Example 1: Starting a New Project with MLflow

User request: "track experiments using mlflow"

The skill will:
1. Install the `mlflow` Python package.
2. Generate example code for logging parameters, metrics, and artifacts to an MLflow server.
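
The generated code for this example might look something like the sketch below. It is not the skill's actual output; the function name `log_example_run`, the experiment name, and the `notes.txt` artifact are hypothetical, and it assumes `mlflow` is installed and that a local file-based store is acceptable in place of a running MLflow server.

```python
import tempfile
from pathlib import Path


def log_example_run(tracking_dir, params, losses):
    """Log params, a per-step loss metric, and one file artifact to MLflow."""
    import mlflow  # imported here so the module loads even without mlflow

    # Assumption: a local directory is used as the tracking store.
    mlflow.set_tracking_uri(Path(tracking_dir).resolve().as_uri())
    mlflow.set_experiment("demo-experiment")

    with mlflow.start_run() as run:
        mlflow.log_params(params)
        for step, loss in enumerate(losses):
            mlflow.log_metric("loss", loss, step=step)

        # Any local file can be attached to the run as an artifact.
        with tempfile.TemporaryDirectory() as tmp:
            notes = Path(tmp) / "notes.txt"
            notes.write_text("example artifact\n")
            mlflow.log_artifact(str(notes))

        return run.info.run_id
```

Calling `log_example_run("mlruns", {"lr": 0.01}, [0.9, 0.5, 0.3])` would create one run; `mlflow ui` started from the same directory can then be used to browse it.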

### Example 2: Integrating W&B into an Existing Project

User request: "setup experiment tracking with wandb"

The skill will:
1. Install the `wandb` Python package.
2. Generate example code for initializing W&B and logging experiment data.
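
For the W&B case, the generated code might resemble this minimal sketch. The function name `log_wandb_run` and the project name are hypothetical, and it assumes `wandb` is installed. Offline mode is chosen here only so the example does not require an API key; a real setup would typically log in and use the default online mode.

```python
def log_wandb_run(config, losses, project="demo-project", mode="offline"):
    """Initialize a W&B run, log a per-step loss, and close the run."""
    import wandb  # imported here so the module loads even without wandb

    # `config` stores hyperparameters alongside the run.
    run = wandb.init(project=project, config=config, mode=mode)
    for step, loss in enumerate(losses):
        run.log({"loss": loss}, step=step)
    run.finish()
    return run.id
```

A call such as `log_wandb_run({"lr": 0.01, "epochs": 3}, [0.9, 0.5, 0.3])` writes the run to a local `wandb/` directory, which `wandb sync` can upload later.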

## Best Practices

- **Tool Selection**: Consider the scale and complexity of your project when choosing between MLflow and W&B. MLflow is open source and works well for local or self-hosted tracking, while W&B is primarily a hosted service geared toward cloud-based collaboration and shared dashboards.
- **Consistent Logging**: Establish a consistent logging strategy for parameters, metrics, and artifacts to ensure comparability across experiments.
- **Artifact Management**: Utilize artifact logging to track models, datasets, and other relevant files associated with each experiment.
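
One way to keep parameter names comparable across experiments is to flatten nested configuration into dotted keys before logging. The helper below is a sketch of that idea; `flatten_params` is a hypothetical name, not part of MLflow or W&B.

```python
def flatten_params(config, prefix=""):
    """Flatten a nested dict into {'section.key': value} pairs."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse so every leaf gets a stable, fully qualified name.
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


params = flatten_params({"model": {"lr": 0.01, "layers": 3}, "seed": 7})
# params == {"model.lr": 0.01, "model.layers": 3, "seed": 7}
```

The flattened dictionary can be passed to `mlflow.log_params` or used as the `config` argument of `wandb.init`, so every run reports the same key names.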

## Integration

This skill can be used in conjunction with other skills that generate or modify machine learning code, such as skills for model training or data preprocessing. It ensures that all experiments are properly tracked and documented.

## Prerequisites

- Read and write access to the project directory
- Python with a package installer such as pip, so that `mlflow` or `wandb` can be installed

## Instructions

1. Invoke this skill when the trigger conditions are met
2. Provide necessary context and parameters
3. Review the generated output
4. Apply modifications as needed

## Output

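The skill produces an environment configured for the chosen tool (installed packages and environment variables) and example code for logging parameters, metrics, and artifacts.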

## Error Handling

- Invalid input: Prompts for correction
- Missing dependencies: Lists required components
- Permission errors: Suggests remediation steps
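
The missing-dependency case could be checked along these lines. This is a sketch under assumptions: the helper name `missing_packages` is hypothetical, and the skill's own check may work differently.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Report what would need installing before tracking can be set up.
missing = missing_packages(["mlflow", "wandb"])
if missing:
    print("Missing packages:", ", ".join(missing))
```

`importlib.util.find_spec` only looks the package up, so the check does not import (and does not initialize) the tracking libraries themselves.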

## Resources

- Project documentation
- Related skills and commands

Related Skills

tracking-regression-tests

1868
from jeremylongshore/claude-code-plugins-plus-skills

Track and manage regression test suites across releases. Use when performing specialized testing. Trigger with phrases like "track regressions", "manage regression suite", or "validate against baseline".

windsurf-team-settings

1868
from jeremylongshore/claude-code-plugins-plus-skills

Manage team-wide Windsurf settings and AI policies. Activate when users mention "team settings", "organization config", "team policies", "shared settings", or "team standardization". Handles team configuration management. Use when working with windsurf team settings functionality. Trigger with phrases like "windsurf team settings", "windsurf settings", "windsurf".

cursor-privacy-settings

1868
from jeremylongshore/claude-code-plugins-plus-skills

Configure Cursor privacy mode, data handling, telemetry, and sensitive file exclusion. Triggers on "cursor privacy", "cursor data", "cursor security", "privacy mode", "cursor telemetry", "cursor data retention".

setting-up-synthetic-monitoring

1868
from jeremylongshore/claude-code-plugins-plus-skills

Setup synthetic monitoring for proactive performance tracking including uptime checks, transaction monitoring, and API health. Use when implementing availability monitoring or tracking critical user journeys. Trigger with phrases like "setup synthetic monitoring", "monitor uptime", or "configure health checks".

tracking-service-reliability

1868
from jeremylongshore/claude-code-plugins-plus-skills

Define and track SLAs, SLIs, and SLOs for service reliability including availability, latency, and error rates. Use when establishing reliability targets or monitoring service health. Trigger with phrases like "define SLOs", "track SLI metrics", or "calculate error budget".

tracking-application-response-times

1868
from jeremylongshore/claude-code-plugins-plus-skills

Track and optimize application response times across API endpoints, database queries, and service calls. Use when monitoring performance or identifying bottlenecks. Trigger with phrases like "track response times", "monitor API performance", or "analyze latency".

tracking-resource-usage

1868
from jeremylongshore/claude-code-plugins-plus-skills

Track and optimize resource usage across application stack including CPU, memory, disk, and network I/O. Use when identifying bottlenecks or optimizing costs. Trigger with phrases like "track resource usage", "monitor CPU and memory", or "optimize resource allocation".

setting-up-distributed-tracing

1868
from jeremylongshore/claude-code-plugins-plus-skills

Automates the setup of distributed tracing for microservices. It helps developers implement end-to-end request visibility by configuring context propagation, span creation, trace collection, and analysis. Use this skill when the user re... Use when appropriate context is detected. Trigger with relevant phrases based on skill purpose.

setting-up-log-aggregation

1868
from jeremylongshore/claude-code-plugins-plus-skills

Use when setting up log aggregation solutions using ELK, Loki, or Splunk. Trigger with phrases like "setup log aggregation", "deploy ELK stack", "configure Loki", or "install Splunk". Generates production-ready configurations for data ingestion, processing, storage, and visualization with proper security and scalability.

tracking-token-launches

1868
from jeremylongshore/claude-code-plugins-plus-skills

Track new token launches across DEXes with risk analysis and contract verification. Use when discovering new token launches, monitoring IDOs, or analyzing token contracts. Trigger with phrases like "track launches", "find new tokens", "new pairs on uniswap", "token risk analysis", or "monitor IDOs".

tracking-crypto-prices

1868
from jeremylongshore/claude-code-plugins-plus-skills

Track real-time cryptocurrency prices across exchanges with historical data and alerts. Provides price data infrastructure for dependent skills (portfolio, tax, DeFi, arbitrage). Use when checking crypto prices, monitoring markets, or fetching historical price data. Trigger with phrases like "check price", "BTC price", "crypto prices", "price history", "get quote for", "what's ETH trading at", "show me top coins", or "track my watchlist".

tracking-crypto-portfolio

1868
from jeremylongshore/claude-code-plugins-plus-skills

Track cryptocurrency portfolio with real-time valuations, allocation analysis, and P&L tracking. Use when checking portfolio value, viewing holdings breakdown, analyzing allocations, or exporting portfolio data. Trigger with phrases like "show my portfolio", "check crypto holdings", "portfolio allocation", "track my crypto", or "export portfolio".