mlflow-python

MLflow experiment tracking via Python API. TRIGGERS - MLflow metrics, log backtest, experiment tracking, search runs.

by terrylica

Best use case

mlflow-python is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using mlflow-python should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/mlflow-python/SKILL.md --create-dirs "https://raw.githubusercontent.com/terrylica/cc-skills/main/plugins/devops-tools/skills/mlflow-python/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/mlflow-python/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How mlflow-python Compares

Feature / Agent	mlflow-python	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

MLflow experiment tracking via Python API. TRIGGERS - MLflow metrics, log backtest, experiment tracking, search runs.

Where can I find the source code?

The source code is on GitHub in the terrylica/cc-skills repository, under plugins/devops-tools/skills/mlflow-python.

SKILL.md Source

# MLflow Python Skill

Unified read/write MLflow operations via Python API with QuantStats integration for comprehensive trading metrics.

**ADR**: [2025-12-12-mlflow-python-skill](/docs/adr/2025-12-12-mlflow-python-skill.md)

> **Note**: This skill uses Pandas (MLflow API requires it). The `mlflow-python` path is auto-skipped by the Polars preference hook.

> **Self-Evolving Skill**: This skill improves through use. If instructions are wrong, parameters drifted, or a workaround was needed — fix this file immediately, don't defer. Only update for real, reproducible issues.

## When to Use This Skill

**CAN Do**:

- Log backtest metrics (Sharpe, max_drawdown, total_return, etc.)
- Log experiment parameters (strategy config, timeframes)
- Create and manage experiments
- Query runs with SQL-like filtering
- Calculate 70+ trading metrics via QuantStats
- Retrieve metric history (time-series data)

**CANNOT Do**:

- Direct database access to MLflow backend
- Artifact storage management (S3/GCS configuration)
- MLflow server administration

## Prerequisites

### Authentication Setup

MLflow uses separate environment variables for credentials (NOT embedded in URI):

```bash
# Option 1: mise + .env.local (recommended)
# Create .env.local in skill directory with:
MLFLOW_TRACKING_URI=http://mlflow.eonlabs.com:5000
MLFLOW_TRACKING_USERNAME=eonlabs
MLFLOW_TRACKING_PASSWORD=<password>

# Option 2: Direct environment variables
export MLFLOW_TRACKING_URI="http://mlflow.eonlabs.com:5000"
export MLFLOW_TRACKING_USERNAME="eonlabs"
export MLFLOW_TRACKING_PASSWORD="<password>"
```
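
Before running any script, it can help to confirm that the three variables are present and that the URI does not carry credentials. The following is a minimal sketch using only the standard library; the function name `check_mlflow_env` is hypothetical and not part of the skill's bundled scripts.

```python
import os
from urllib.parse import urlsplit

REQUIRED_VARS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def check_mlflow_env(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]

    parts = urlsplit(env.get("MLFLOW_TRACKING_URI", ""))
    # Credentials embedded as http://user:pass@host are the pattern to avoid.
    if parts.username or parts.password:
        problems.append(
            "MLFLOW_TRACKING_URI embeds credentials; use the separate variables"
        )
    return problems
```

Calling `check_mlflow_env()` with no argument inspects the current process environment; passing a dict makes the check easy to exercise without touching real credentials.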

### Verify Connection

```bash
/usr/bin/env bash << 'SKILL_SCRIPT_EOF'
cd "${CLAUDE_PLUGIN_ROOT}/skills/mlflow-python"
uv run scripts/query_experiments.py experiments
SKILL_SCRIPT_EOF
```

## Quick Start Workflows

### A. Log Backtest Results (Primary Use Case)

```bash
/usr/bin/env bash << 'SKILL_SCRIPT_EOF_2'
cd "${CLAUDE_PLUGIN_ROOT}/skills/mlflow-python"
uv run scripts/log_backtest.py \
  --experiment "crypto-backtests" \
  --run-name "btc_momentum_v2" \
  --returns path/to/returns.csv \
  --params '{"strategy": "momentum", "timeframe": "1h"}'
SKILL_SCRIPT_EOF_2
```
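
For orientation, the logging step behind a command like this can be sketched with the public MLflow API (`mlflow.set_experiment`, `mlflow.start_run`, `mlflow.log_params`, `mlflow.log_metrics`). This is an illustrative sketch, not the actual contents of `log_backtest.py`: the helper names `parse_params`, `clean_metrics`, and `log_backtest` are hypothetical, and dropping non-finite metric values is a design choice assumed here.

```python
import json
import math


def parse_params(raw):
    """Parse the --params JSON string into a flat dict of strings."""
    params = json.loads(raw)
    if not isinstance(params, dict):
        raise ValueError("--params must be a JSON object")
    return {str(k): str(v) for k, v in params.items()}


def clean_metrics(metrics):
    """Keep only finite numeric values and convert them to float."""
    cleaned = {}
    for name, value in metrics.items():
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if math.isfinite(value):
            cleaned[name] = float(value)
    return cleaned


def log_backtest(experiment, run_name, params, metrics):
    """Log one backtest as a single MLflow run and return its run ID."""
    import mlflow  # imported lazily so the helpers above work without MLflow

    mlflow.set_experiment(experiment)
    with mlflow.start_run(run_name=run_name) as run:
        mlflow.log_params(params)
        mlflow.log_metrics(clean_metrics(metrics))
        return run.info.run_id
```

A call such as `log_backtest("crypto-backtests", "btc_momentum_v2", parse_params('{"strategy": "momentum"}'), {"sharpe": 1.8})` would need a reachable tracking server and the environment variables from the Prerequisites section.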

### B. Search Experiments

```bash
uv run scripts/query_experiments.py experiments
```

### C. Query Runs with Filter

```bash
uv run scripts/query_experiments.py runs \
  --experiment "crypto-backtests" \
  --filter "metrics.sharpe_ratio > 1.5" \
  --order-by "metrics.sharpe_ratio DESC"
```
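
The same query can be expressed directly against the Python API with `mlflow.search_runs`, which returns a pandas DataFrame by default. The sketch below is an assumption about how such a wrapper might look, not the bundled script itself; `build_filter` and `search_runs` are hypothetical helper names. MLflow's documented filter syntax combines clauses with `AND`, which is what the helper relies on.

```python
def build_filter(conditions):
    """Join individual filter clauses with AND, skipping blank entries."""
    return " AND ".join(c.strip() for c in conditions if c.strip())


def search_runs(experiment, conditions, order_by=None, max_results=50):
    """Return matching runs as a pandas DataFrame (one row per run)."""
    import mlflow  # lazy import: build_filter is usable without MLflow

    return mlflow.search_runs(
        experiment_names=[experiment],
        filter_string=build_filter(conditions),
        order_by=order_by,
        max_results=max_results,
    )
```

For example, `search_runs("crypto-backtests", ["metrics.sharpe_ratio > 1.5"], order_by=["metrics.sharpe_ratio DESC"])` mirrors the command above, assuming the tracking server is reachable.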

### D. Create New Experiment

```bash
uv run scripts/create_experiment.py \
  --name "crypto-backtests-2025" \
  --description "Q1 2025 cryptocurrency trading strategy backtests"
```
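
A get-or-create pattern avoids failing when the experiment already exists. The sketch below takes any client object exposing `get_experiment_by_name` and `create_experiment`, as `mlflow.tracking.MlflowClient` does; the function name is hypothetical, and storing the description under the `mlflow.note.content` tag is an assumption about how the MLflow UI surfaces descriptions rather than something the skill documents.

```python
def get_or_create_experiment(client, name, description=""):
    """Return the experiment ID for `name`, creating the experiment if needed."""
    existing = client.get_experiment_by_name(name)
    if existing is not None:
        return existing.experiment_id
    # Assumption: the MLflow UI reads the description from this reserved tag.
    tags = {"mlflow.note.content": description} if description else None
    return client.create_experiment(name, tags=tags)
```

With MLflow installed, `get_or_create_experiment(MlflowClient(), "crypto-backtests-2025", "Q1 2025 backtests")` would return the same ID on every call after the first.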

### E. Get Metric History

```bash
uv run scripts/get_metric_history.py \
  --run-id abc123 \
  --metrics sharpe_ratio,cumulative_return
```
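
Metric history is available in the Python API through `MlflowClient.get_metric_history(run_id, key)`, which returns metric objects carrying `key`, `value`, `step`, and `timestamp` (milliseconds since the Unix epoch). The following sketch shows one way to flatten that into rows; `history_to_rows` and `fetch_history` are hypothetical names and may not match what `get_metric_history.py` does internally.

```python
from datetime import datetime, timezone


def history_to_rows(history):
    """Convert MLflow metric objects into plain dicts ordered by step, then time."""
    rows = [
        {
            "key": m.key,
            "step": m.step,
            "value": m.value,
            # Timestamps are in milliseconds, so scale to seconds first.
            "time": datetime.fromtimestamp(m.timestamp / 1000, tz=timezone.utc),
        }
        for m in history
    ]
    return sorted(rows, key=lambda r: (r["step"], r["time"]))


def fetch_history(run_id, metric_names):
    """Fetch the full history of each named metric for one run."""
    from mlflow.tracking import MlflowClient  # lazy import

    client = MlflowClient()
    return {
        name: history_to_rows(client.get_metric_history(run_id, name))
        for name in metric_names
    }
```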

## QuantStats Metrics Available

The `log_backtest.py` script calculates 70+ metrics via QuantStats, including:

| Category     | Metrics                                                           |
| ------------ | ----------------------------------------------------------------- |
| **Ratios**   | sharpe, sortino, calmar, omega, treynor                           |
| **Returns**  | cagr, total_return, avg_return, best, worst                       |
| **Drawdown** | max_drawdown, avg_drawdown, drawdown_days                         |
| **Trade**    | win_rate, profit_factor, payoff_ratio, consecutive_wins/losses    |
| **Risk**     | volatility, var, cvar, ulcer_index, serenity_index                |
| **Advanced** | kelly_criterion, recovery_factor, risk_of_ruin, information_ratio |

See [quantstats-metrics.md](./references/quantstats-metrics.md) for full list.
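
To make a few of these names concrete, the sketch below computes three of them from a list of per-period simple returns using only the standard library. These are textbook definitions written for illustration; QuantStats's own implementations may differ in annualization, risk-free handling, and edge cases, so treat the values as approximations of what the skill logs.

```python
import math
from statistics import mean, stdev


def sharpe(returns, periods_per_year=252, risk_free=0.0):
    """Annualized Sharpe ratio from per-period simple returns."""
    excess = [r - risk_free / periods_per_year for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve (<= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


def win_rate(returns):
    """Share of non-zero periods with a positive return."""
    active = [r for r in returns if r != 0]
    return sum(r > 0 for r in active) / len(active) if active else 0.0
```

The default of 252 periods per year assumes daily data on an equity-style calendar; crypto strategies that trade every day or every hour would need a different value.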

## Bundled Scripts

| Script                  | Purpose                                      |
| ----------------------- | -------------------------------------------- |
| `log_backtest.py`       | Log backtest returns with QuantStats metrics |
| `query_experiments.py`  | Search experiments and runs (replaces CLI)   |
| `create_experiment.py`  | Create new experiment with metadata          |
| `get_metric_history.py` | Retrieve metric time-series data             |

## Configuration

The skill uses mise `[env]` pattern for configuration. See `.mise.toml` for defaults.

Create `.env.local` (gitignored) for credentials:

```bash
MLFLOW_TRACKING_URI=http://mlflow.eonlabs.com:5000
MLFLOW_TRACKING_USERNAME=eonlabs
MLFLOW_TRACKING_PASSWORD=<password>
```

## Reference Documentation

- [Authentication Patterns](./references/authentication.md) - Idiomatic MLflow auth
- [QuantStats Metrics](./references/quantstats-metrics.md) - Full list of 70+ metrics
- [Query Patterns](./references/query-patterns.md) - DataFrame operations
- [Migration from CLI](./references/migration-from-cli.md) - CLI to Python API mapping

## Migration from mlflow-query

This skill replaces the CLI-based `mlflow-query` skill. Key differences:

| Feature        | mlflow-query (old) | mlflow-python (new)    |
| -------------- | ------------------ | ---------------------- |
| Log metrics    | Not supported      | `mlflow.log_metrics()` |
| Log params     | Not supported      | `mlflow.log_params()`  |
| Query runs     | CLI text parsing   | DataFrame output       |
| Metric history | Workaround only    | Native support         |
| Auth pattern   | Embedded in URI    | Separate env vars      |

See [migration-from-cli.md](./references/migration-from-cli.md) for detailed mapping.

---

## Troubleshooting

| Issue                   | Cause                        | Solution                                            |
| ----------------------- | ---------------------------- | --------------------------------------------------- |
| Connection refused      | MLflow server not running    | Verify MLFLOW_TRACKING_URI and server status        |
| Authentication failed   | Wrong credentials            | Check MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD in .env.local |
| Experiment not found    | Experiment name typo         | Run `query_experiments.py experiments` to list all  |
| QuantStats import error | Missing dependency           | `uv add quantstats` in skill directory              |
| Pandas import warning   | Expected for this skill      | Ignore - MLflow requires Pandas (hook-excluded)     |
| Run creation fails      | Experiment doesn't exist     | Use `create_experiment.py` to create first          |
| Metric history empty    | Wrong run_id or metric name  | Verify run_id with `query_experiments.py runs`      |
| Returns CSV parse error | Wrong date format or columns | Check CSV has date index and returns column         |
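
For the last row, a quick pre-flight check of the returns file can surface the problem before anything is logged. The sketch below assumes the columns are literally named `date` and `returns` and that dates are ISO formatted (`YYYY-MM-DD`); both are assumptions, since the exact layout expected by `log_backtest.py` is not spelled out here. `load_returns` is a hypothetical helper.

```python
import csv
import io
from datetime import date


def load_returns(text, date_col="date", value_col="returns"):
    """Parse CSV text into (date, float) pairs sorted by date, or raise ValueError."""
    reader = csv.DictReader(io.StringIO(text))
    missing = {date_col, value_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            rows.append((date.fromisoformat(row[date_col]), float(row[value_col])))
        except (TypeError, ValueError) as exc:
            raise ValueError(f"line {line_no}: {exc}") from exc
    return sorted(rows)
```

Reading the file with `load_returns(open(path).read())` raises a `ValueError` that names the missing column or the offending line.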


## Post-Execution Reflection

After this skill completes, check before closing:

1. **Did the command succeed?** If not, fix the instruction or Troubleshooting row that caused the failure.
2. **Did parameters or output change?** If the underlying tool's interface drifted, update the Quick Start Workflows examples and the Bundled Scripts table to match.
3. **Was a workaround needed?** If you had to improvise (different flags, extra steps), update this SKILL.md so the next invocation doesn't need the same workaround.

Only update if the issue is real and reproducible — not speculative.

Related Skills

python-workspace

from terrylica/cc-skills

Python workspace for MQL5 integration. TRIGGERS - MetaTrader 5 Python, mt5 package, MQL5-Python setup.

python-memory-safe-scripts

from terrylica/cc-skills

Memory-safe Python script patterns for long-running processes under systemd MemoryMax constraints. Covers allocator purge (mimalloc/glibc malloc_trim), HTTP response lifecycle, DataFrame cleanup, thread-local connection reuse, and periodic GC cadence. Battle-tested through 5 OOM optimization cycles on production GPU workstations. Use this skill proactively whenever writing or reviewing Python scripts that: run under systemd with MemoryMax, process data in loops (downloads, ETL, backfill), use ThreadPoolExecutor, or make repeated HTTP requests. Also use when diagnosing OOM kills, RSS creep, or fd exhaustion in Python services. TRIGGERS - memory optimization, OOM prevention, RSS reduction, malloc_trim, systemd MemoryMax, memory leak, allocator purge, memory-safe script, RSS creep, fd exhaustion, SIGKILL status 9, MemoryHigh, glibc arena, mimalloc purge, requests memory leak, ThreadPoolExecutor cleanup.

python-logging-best-practices

from terrylica/cc-skills

Python logging with loguru, structlog, and orjson. TRIGGERS - loguru, structlog, structured logging, JSONL logs, log rotation, secret redaction, OTel logging, lightweight logging, print logging, systemd logging.

voice-quality-audition

from terrylica/cc-skills

Audition Kokoro TTS voices to compare quality and grade. TRIGGERS - audition voices, kokoro voices, voice comparison, tts voice, voice quality, compare voices.

settings-and-tuning

from terrylica/cc-skills

Configure TTS voices, speed, timeouts, queue depth, and bot settings. TRIGGERS - configure tts, change voice, tts speed, queue depth, tts timeout, bot config, tune settings, adjust parameters.

full-stack-bootstrap

from terrylica/cc-skills

One-time bootstrap for Kokoro TTS engine, Telegram bot, and BotFather setup. TRIGGERS - setup tts, install kokoro, botfather, bootstrap tts-tg-sync, configure telegram bot, full stack setup.

diagnostic-issue-resolver

from terrylica/cc-skills

Diagnose and resolve TTS and Telegram bot issues. TRIGGERS - tts not working, bot not responding, kokoro error, audio not playing, lock stuck, telegram bot troubleshoot, diagnose issue.

component-version-upgrade

from terrylica/cc-skills

Upgrade Kokoro model, bot dependencies, or TTS components. TRIGGERS - upgrade kokoro, update model, upgrade bot, update dependencies, version bump, component update.

clean-component-removal

from terrylica/cc-skills

Remove TTS and Telegram sync components cleanly. TRIGGERS - uninstall tts, remove telegram bot, uninstall kokoro, clean tts, teardown, component removal.

send-message

from terrylica/cc-skills

Use when user wants to send a text message on Telegram as their personal account via MTProto, text someone, or message a contact by username, phone, or chat ID.

send-media

from terrylica/cc-skills

Use when user wants to send or upload a file, photo, video, voice note, or document on Telegram via their personal account.

search-messages

from terrylica/cc-skills

Use when user wants to search for messages across all Telegram chats or within a specific chat, find old messages by text, or look up Telegram message history filtered by sender.