wandb-weave
Query and analyze W&B experiment data and Weave LLM traces using Python scripts. Use when working with Weights & Biases data, including (1) querying ML experiment runs, metrics, and hyperparameters, (2) analyzing LLM traces and evaluations, (3) creating W&B reports, (4) listing projects and entities.
Best use case
wandb-weave is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using wandb-weave should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/wandb-weave/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How wandb-weave Compares
| Feature / Agent | wandb-weave | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Query and analyze W&B experiment data and Weave LLM traces using Python scripts. Use when working with Weights & Biases data, including (1) querying ML experiment runs, metrics, and hyperparameters, (2) analyzing LLM traces and evaluations, (3) creating W&B reports, (4) listing projects and entities.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# W&B & Weave Data Tools
Python scripts to query W&B experiment data and Weave LLM traces.
## Prerequisites
```bash
pip install wandb weave
export WANDB_API_KEY="your-api-key"
```
## Workflow Decision Tree
```
What do you want to do?
│
├─ Query ML experiments (runs, metrics, sweeps)
│   └─ Run: scripts/query_runs.py
│
├─ Analyze LLM traces
│   ├─ Need trace data? → scripts/query_traces.py
│   └─ Just need count? → scripts/query_traces.py --count-only
│
├─ Create a report
│   └─ Run: scripts/create_report.py
│
└─ List projects
    └─ Run: scripts/list_projects.py
```
## Scripts
### query_runs.py
Query W&B experiment runs with filtering and sorting.
```bash
# List recent runs
python scripts/query_runs.py <entity> <project> --limit 10
# Filter by state
python scripts/query_runs.py my-team my-project --state finished
# Sort by metric (best first)
python scripts/query_runs.py my-team my-project --sort "-summary_metrics.accuracy"
# Custom filter
python scripts/query_runs.py my-team my-project --filter '{"config.model": "gpt-4"}'
```
| Option | Description |
|--------|-------------|
| `--limit N` | Max results (default: 20) |
| `--state` | Filter: running, finished, crashed, failed |
| `--sort` | Sort field (prefix `-` for desc) |
| `--filter` | JSON filter dict |
| `--output` | json or table |
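The options above map naturally onto the W&B public API. The sketch below shows one plausible way a script like `query_runs.py` could translate them; the helper names `build_filters` and `fetch_runs` are hypothetical, and the actual script may work differently. It assumes `wandb.Api().runs()` with its `filters`, `order`, and `per_page` arguments.

```python
import itertools
import json
from typing import Any, Optional


def build_filters(state: Optional[str] = None,
                  extra_json: Optional[str] = None) -> dict[str, Any]:
    """Merge --state and --filter into one MongoDB-style filter dict (hypothetical helper)."""
    filters: dict[str, Any] = {}
    if extra_json:
        # --filter arrives as a JSON string, e.g. '{"config.model": "gpt-4"}'
        filters.update(json.loads(extra_json))
    if state:
        filters["state"] = state
    return filters


def fetch_runs(entity: str, project: str, state: Optional[str] = None,
               extra_json: Optional[str] = None,
               sort: str = "-created_at", limit: int = 20) -> list[dict[str, Any]]:
    """Query runs through the W&B public API (assumed approach, not the script's source)."""
    import wandb  # imported lazily so build_filters works without wandb installed

    api = wandb.Api()  # picks up WANDB_API_KEY from the environment
    runs = api.runs(
        path=f"{entity}/{project}",
        filters=build_filters(state, extra_json),
        order=sort,       # e.g. "-summary_metrics.accuracy" for descending accuracy
        per_page=limit,
    )
    # The result is a lazy paginator, so stop after `limit` items.
    return [
        {"id": run.id, "name": run.name, "state": run.state}
        for run in itertools.islice(runs, limit)
    ]
```

Calling `fetch_runs("my-team", "my-project", state="finished", sort="-summary_metrics.accuracy", limit=10)` would then correspond to the filter and sort examples shown earlier.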
### query_traces.py
Query Weave LLM traces with filtering.
```bash
# List recent traces
python scripts/query_traces.py <entity> <project> --limit 50
# Filter by status
python scripts/query_traces.py my-team my-project --status success
# Filter by model
python scripts/query_traces.py my-team my-project --model gpt-4o
# Find slow traces
python scripts/query_traces.py my-team my-project --min-latency 5000
# Count only
python scripts/query_traces.py my-team my-project --count-only
```
| Option | Description |
|--------|-------------|
| `--limit N` | Max results (default: 50) |
| `--status` | Filter: success, error, running |
| `--model` | Filter by model name |
| `--min-latency` | Min latency in ms |
| `--roots-only` | Only root traces |
| `--count-only` | Return count, not data |
| `--filter` | Custom JSON filter (advanced) |
For advanced filter syntax (when `--status`, `--model`, `--min-latency` are not enough), see [references/weave_filters.md](references/weave_filters.md).
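To make the `--status` and `--min-latency` semantics concrete, here is a minimal sketch of the same filtering done client-side. It assumes each trace exposes `started_at`, `ended_at`, and `exception` fields; the `Trace` class and the function names are hypothetical stand-ins, and `query_traces.py` may well filter server-side instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Trace:
    """Hypothetical stand-in for a trace record."""
    op_name: str
    started_at: datetime
    ended_at: Optional[datetime] = None
    exception: Optional[str] = None


def status_of(trace: Trace) -> str:
    """Classify a trace as error, running, or success."""
    if trace.exception is not None:
        return "error"
    if trace.ended_at is None:
        return "running"
    return "success"


def latency_ms(trace: Trace) -> Optional[float]:
    """Wall-clock duration in milliseconds, or None while still running."""
    if trace.ended_at is None:
        return None
    return (trace.ended_at - trace.started_at).total_seconds() * 1000.0


def filter_traces(traces, status: Optional[str] = None,
                  min_latency: Optional[float] = None) -> list:
    """Keep traces matching the given status and minimum latency (ms)."""
    kept = []
    for t in traces:
        if status is not None and status_of(t) != status:
            continue
        if min_latency is not None:
            ms = latency_ms(t)
            if ms is None or ms < min_latency:
                continue
        kept.append(t)
    return kept
```

In this sketch a trace that is still running has no latency yet, so it is excluded whenever `min_latency` is set.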
### list_projects.py
List entities and projects.
```bash
# List all entities and projects
python scripts/list_projects.py
# List projects for specific entity
python scripts/list_projects.py my-team
# List entities only
python scripts/list_projects.py --entities-only
```
### create_report.py
Create W&B reports programmatically.
```bash
# Create with inline content
python scripts/create_report.py my-team my-project "Weekly Summary" \
  --content "## Results\n\n- Accuracy: 95%\n- Loss: 0.05"
# Create from markdown file
python scripts/create_report.py my-team my-project "Analysis" --file report.md
# With description
python scripts/create_report.py my-team my-project "Q4 Report" \
  --content "..." --description "Quarterly analysis"
```
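Note that inside double quotes, bash passes `\n` to the script as a literal backslash followed by `n`, not as a newline. How `create_report.py` handles this is not shown here; the sketch below illustrates one way a script could expand those escapes before building the report body. The function name `expand_newlines` is hypothetical.

```python
def expand_newlines(content: str) -> str:
    """Turn literal backslash-n sequences from the shell into real newlines."""
    return content.replace("\\n", "\n")


# The raw string mimics what the shell hands over for --content "...\n...".
raw = r"## Results\n\n- Accuracy: 95%\n- Loss: 0.05"
print(expand_newlines(raw))
```

If the script does not perform this kind of expansion, passing the content with `--file` avoids the question entirely.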
## Common Workflows
### Analyze Experiment Performance
```bash
# 1. Find your project
python scripts/list_projects.py my-team
# 2. Query best runs
python scripts/query_runs.py my-team my-project \
  --state finished \
  --sort "-summary_metrics.accuracy" \
  --limit 10
# 3. Create summary report
python scripts/create_report.py my-team my-project "Best Runs" \
  --content "## Top 10 Runs by Accuracy\n\n..."
```
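Step 3 leaves the report body elided. As an illustration, the body could be generated from the run records that step 2 returns. The sketch assumes each record is a dict with `name` and `accuracy` keys; that shape is hypothetical, since the output format of `query_runs.py --output json` is not specified here.

```python
def top_runs_markdown(runs, metric="accuracy", n=10):
    """Render the best runs by `metric` as a markdown numbered list."""
    # Skip records that lack the metric, then rank best-first.
    ranked = sorted(
        (r for r in runs if metric in r),
        key=lambda r: r[metric],
        reverse=True,
    )[:n]
    lines = [f"## Top {len(ranked)} Runs by {metric.capitalize()}", ""]
    for i, run in enumerate(ranked, start=1):
        lines.append(f"{i}. {run['name']}: {run[metric]:.4f}")
    return "\n".join(lines)
```

The returned string can be written to a file and passed to `create_report.py` with `--file`.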
### Debug LLM Application
```bash
# 1. Count errors
python scripts/query_traces.py my-team my-project --status error --count-only
# 2. Get error details
python scripts/query_traces.py my-team my-project --status error --limit 20
# 3. Find slow traces
python scripts/query_traces.py my-team my-project --min-latency 5000
```
## Resources
- **Advanced trace filters**: Load [references/weave_filters.md](references/weave_filters.md) when the `--filter` option is needed for complex queries not covered by built-in options