experiment-loop
Autonomous experiment loop - modify code, measure, keep/discard, iterate until target met
Best use case
experiment-loop is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using experiment-loop should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/experiment-loop/SKILL.md` inside your project
- Restart your AI agent so it auto-discovers the skill
How experiment-loop Compares
| Feature / Agent | experiment-loop | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It runs an autonomous experiment loop: it modifies code, measures the result, keeps or discards each change, and iterates until the target is met.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Experiment Loop
Autonomous optimization loop: modify code, run measurement, evaluate results, keep improvements, discard regressions. Inspired by the 3-file principle: fixed measurement + mutable code + human instructions.
## Concept
```
┌─────────────────────────────────────────┐
│ INSTRUCTIONS (fixed) │
│ "Optimize X to achieve Y metric" │
└──────────────┬──────────────────────────┘
│
v
┌──────────────────────────────┐
│ MEASURE (fixed) │ ◄── Never changes during loop
│ benchmark.sh / test suite │
└──────────────┬───────────────┘
│
v
┌──────────────────────────────┐
│ CODE (mutable) │ ◄── Agent modifies this
│ src/target-module.ts │
└──────────────┬───────────────┘
│
v
┌──────────────────────────────┐
│ EVALUATE │
│ Better? → KEEP + next iter │
│ Worse? → REVERT + try alt │
│ Target met? → DONE │
└──────────────────────────────┘
```
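As a sketch, the whole loop fits in a few lines of TypeScript. Here `proposeChange` and `revertChange` are hypothetical stand-ins for the agent's edit and undo actions, and a lower-is-better numeric metric (e.g. response time in ms) is assumed:
```typescript
import { execSync } from "node:child_process";

// MEASURE (fixed): run the benchmark command and parse one number from stdout.
function runMeasure(cmd: string): number {
  return parseFloat(execSync(cmd, { encoding: "utf8" }).trim());
}

// Sketch of the loop driver; lower metric = better is assumed throughout.
function experimentLoop(
  measureCmd: string,
  targetMet: (metric: number) => boolean,
  proposeChange: (iteration: number) => void, // CODE (mutable): one focused edit
  revertChange: () => void,                   // undo the last edit
  maxIterations = 10,
): number {
  let best = runMeasure(measureCmd); // baseline
  for (let i = 1; i <= maxIterations && !targetMet(best); i++) {
    proposeChange(i);
    const metric = runMeasure(measureCmd);
    if (metric < best) best = metric; // EVALUATE: better → keep
    else revertChange();              // worse or unchanged → revert
  }
  return best;
}
```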
## Usage
```
/experiment-loop --target "response time < 100ms" --measure "npm run benchmark" --file src/api/handler.ts
```
## Parameters
| Param | Required | Description |
|-------|----------|-------------|
| `--target` | Yes | Success criteria (quantitative) |
| `--measure` | Yes | Command to run measurement |
| `--file` | Yes | File(s) to optimize |
| `--max-iterations` | No | Max attempts (default: 10) |
| `--baseline` | No | Run baseline measurement first (default: true) |
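As a rough sketch, the parameters map onto a TypeScript options type like this (names and defaults mirror the table above; this is not a published API):
```typescript
// Illustrative options shape for the loop; defaults come from the table.
interface ExperimentOptions {
  target: string;           // success criteria, e.g. "response time < 100ms"
  measure: string;          // measurement command, e.g. "npm run benchmark"
  file: string | string[];  // file(s) the agent is allowed to modify
  maxIterations?: number;   // default: 10
  baseline?: boolean;       // default: true (measure before the first change)
}

const withDefaults = (o: ExperimentOptions) =>
  ({ maxIterations: 10, baseline: true, ...o });
```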
## Workflow
### Step 1: Baseline
```
1. Run measurement command
2. Record baseline metrics
3. Save current code state (git stash or copy)
```
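A minimal sketch of this step, assuming the measurement command prints a single number to stdout and that a plain file copy is an acceptable rollback point:
```typescript
import { execSync } from "node:child_process";
import { copyFileSync } from "node:fs";

// 1-2: run the fixed measurement command and record the baseline metric.
const measureCmd = "npm run benchmark"; // hypothetical project script
const baseline = parseFloat(execSync(measureCmd, { encoding: "utf8" }).trim());

// 3: snapshot the current code state (git stash works too; a copy is simplest).
const file = "src/api/handler.ts";
copyFileSync(file, `${file}.baseline`);
```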
### Step 2: Hypothesize
```
1. Analyze current code
2. Identify optimization opportunity
3. Predict expected improvement
```
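Recording the prediction before touching the code keeps Step 5 honest. A sketch with illustrative field names:
```typescript
// Written before each change so the measured result can be checked
// against what was predicted.
interface Hypothesis {
  iteration: number;
  observation: string;         // what the code analysis found
  change: string;              // the ONE focused change to try
  expectedImprovement: string; // quantitative prediction
}

const h: Hypothesis = {
  iteration: 1,
  observation: "handler re-parses config on every request",
  change: "cache the parsed config at module scope",
  expectedImprovement: "~20% lower response time",
};
```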
### Step 3: Modify
```
1. Make ONE focused change
2. Keep change small and reversible
3. Document what was changed and why
```
### Step 4: Measure
```
1. Run same measurement command
2. Record new metrics
3. Compare against baseline AND previous best
```
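The double comparison as a sketch, assuming a numeric metric where lower is better (as with response time):
```typescript
// Compare a new measurement against both reference points.
function compare(current: number, baseline: number, previousBest: number) {
  return {
    vsBaselinePct: ((baseline - current) / baseline) * 100,     // + means faster
    vsBestPct: ((previousBest - current) / previousBest) * 100, // + means faster
    improved: current < previousBest,
  };
}
```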
### Step 5: Evaluate
```
IF metrics improved:
→ KEEP change
→ Update "current best" baseline
→ Log: "Iteration N: +X% improvement from [change description]"
IF metrics worsened or unchanged:
→ REVERT change
→ Log: "Iteration N: [change description] did not improve, reverted"
→ Try different approach
IF target met:
→ DONE
→ Generate summary report
```
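The same decision logic as a TypeScript sketch; `target` is assumed to be the numeric threshold parsed out of `--target` (e.g. 100 for "< 100ms"):
```typescript
type Verdict = "DONE" | "KEEP" | "REVERT";

// Lower-is-better metric assumed.
function evaluate(metric: number, best: number, target: number): Verdict {
  if (metric <= target) return "DONE"; // target met → summary report
  if (metric < best) return "KEEP";    // improved → new "current best"
  return "REVERT";                     // worse or unchanged → try another approach
}
```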
### Step 6: Iterate or Stop
```
IF target met → Report success
IF max_iterations reached → Report best achieved
IF no improvement for 3 consecutive iterations → Report plateau
```
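Plateau detection reduces to a counter of consecutive non-improvements, reset on every kept change. A sketch of the combined stop check:
```typescript
// Returns the report status, or null to keep iterating.
function shouldStop(
  targetMet: boolean,
  iteration: number,
  maxIterations: number,
  noImprovementStreak: number, // reset to 0 whenever a change is kept
): "TARGET MET" | "MAX ITERATIONS" | "PLATEAU" | null {
  if (targetMet) return "TARGET MET";
  if (noImprovementStreak >= 3) return "PLATEAU";
  if (iteration >= maxIterations) return "MAX ITERATIONS";
  return null;
}
```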
## Output
```markdown
# Experiment Report: [Target]
Date: [timestamp]
Iterations: [N]
Status: TARGET MET / PLATEAU / MAX ITERATIONS
## Baseline
[Initial measurement results]
## Best Result
[Best measurement achieved]
Improvement: [X% over baseline]
## Iteration Log
| # | Change | Result | Delta | Keep? |
|---|--------|--------|-------|-------|
| 1 | [desc] | [metric] | +X% | Yes |
| 2 | [desc] | [metric] | -Y% | No |
## Final Code State
[What changes were kept]
## Learnings
- [What worked]
- [What didn't work]
- [Suggestions for further optimization]
```
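A sketch of rendering one iteration-log row for the report above; the entry shape is illustrative:
```typescript
interface IterationEntry {
  n: number;
  change: string;
  result: string;   // e.g. "87ms"
  deltaPct: number; // improvement over previous best
  kept: boolean;
}

// One row of the "Iteration Log" table in the report template.
const row = (e: IterationEntry): string =>
  `| ${e.n} | ${e.change} | ${e.result} | ` +
  `${e.deltaPct >= 0 ? "+" : ""}${e.deltaPct}% | ${e.kept ? "Yes" : "No"} |`;
```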
## Best For
| Agent | Use Case |
|-------|----------|
| nitro | Performance optimization loops |
| profiler | Memory/CPU optimization |
| backend-dev | API response time optimization |
| frontend-dev | Bundle size / render time optimization |
| ai-engineer | Prompt optimization loops |
## Rules
- ONE change per iteration (isolate variables)
- ALWAYS measure before and after
- ALWAYS revert failed changes
- Max 10 iterations default (prevent infinite loops)
- Report progress every 3 iterations
- Stop after 3 consecutive non-improvements (plateau detection)
Related Skills
verification-loop
Comprehensive verification system covering build, types, lint, tests, security, and diff review before a PR.
workflow-router
Goal-based workflow orchestration - routes tasks to specialist agents based on user goals
wiring
Wiring Verification
websocket-patterns
Connection management, room patterns, reconnection strategies, message buffering, and binary protocol design.
visual-verdict
Screenshot comparison QA for frontend development. Takes a screenshot of the current implementation, scores it across multiple visual dimensions, and returns a structured PASS/REVISE/FAIL verdict with concrete fixes. Use when implementing UI from a design reference or verifying visual correctness.
vector-db-patterns
Embedding strategies, ANN algorithms, hybrid search, RAG chunking strategies, and reranking for semantic search and retrieval.
variant-analysis
Find similar vulnerabilities across a codebase after discovering one instance. Uses pattern matching, AST search, Semgrep/CodeQL queries, and manual tracing to propagate findings. Adapted from Trail of Bits. Use after finding a bug to check if the same pattern exists elsewhere.
validate-agent
Validation agent that validates plan tech choices against current best practices
tracing-patterns
OpenTelemetry setup, span context propagation, sampling strategies, Jaeger queries
tour
Friendly onboarding tour of Claude Code capabilities for users asking what it can do.
tldr-stats
Show full session token usage, costs, TLDR savings, and hook activity
tldr-router
Map code questions to the optimal tldr command by detecting intent and routing to the right analysis layer.