experiment-loop

Autonomous experiment loop - modify code, measure, keep/discard, iterate until target met

422 stars

Best use case

experiment-loop is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using experiment-loop can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/experiment-loop/SKILL.md --create-dirs "https://raw.githubusercontent.com/vibeeval/vibecosystem/main/skills/experiment-loop/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/experiment-loop/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How experiment-loop Compares

| Feature / Agent | experiment-loop | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It runs an autonomous experiment loop: the agent modifies code, measures the result, keeps improvements, discards regressions, and iterates until the target is met.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Experiment Loop

Autonomous optimization loop: modify code, run measurement, evaluate results, keep improvements, discard regressions. Inspired by the 3-file principle: fixed measurement + mutable code + human instructions.

## Concept

```
┌─────────────────────────────────────────┐
│  INSTRUCTIONS (fixed)                    │
│  "Optimize X to achieve Y metric"        │
└──────────────┬──────────────────────────┘
               │
               v
┌──────────────────────────────┐
│  MEASURE (fixed)              │ ◄── Never changes during loop
│  benchmark.sh / test suite    │
└──────────────┬───────────────┘
               │
               v
┌──────────────────────────────┐
│  CODE (mutable)               │ ◄── Agent modifies this
│  src/target-module.ts         │
└──────────────┬───────────────┘
               │
               v
┌──────────────────────────────┐
│  EVALUATE                     │
│  Better? → KEEP + next iter   │
│  Worse?  → REVERT + try alt   │
│  Target met? → DONE           │
└──────────────────────────────┘
```

## Usage

```
/experiment-loop --target "response time < 100ms" --measure "npm run benchmark" --file src/api/handler.ts
```

## Parameters

| Param | Required | Description |
|-------|----------|-------------|
| `--target` | Yes | Success criteria (quantitative) |
| `--measure` | Yes | Command to run measurement |
| `--file` | Yes | File(s) to optimize |
| `--max-iterations` | No | Max attempts (default: 10) |
| `--baseline` | No | Run baseline measurement first (default: true) |

## Workflow

### Step 1: Baseline
```
1. Run measurement command
2. Record baseline metrics
3. Save current code state (git stash or copy)
```
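As a sketch, the baseline step might look like this in Python. The shell-command interface and the `git stash` snapshot are assumptions about how an agent would wire this up, not part of the skill itself:

```python
import subprocess

def run_measurement(measure_cmd: str) -> str:
    """Run the fixed measurement command and return its stdout."""
    result = subprocess.run(
        measure_cmd, shell=True, capture_output=True, text=True, check=True
    )
    return result.stdout

def snapshot_code() -> None:
    """Save the current code state so a failed experiment can be reverted.

    `git stash push` keeps a restorable copy; a plain file copy also works.
    """
    subprocess.run(["git", "stash", "push", "--include-untracked"], check=True)
```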

### Step 2: Hypothesize
```
1. Analyze current code
2. Identify optimization opportunity
3. Predict expected improvement
```

### Step 3: Modify
```
1. Make ONE focused change
2. Keep change small and reversible
3. Document what was changed and why
```

### Step 4: Measure
```
1. Run same measurement command
2. Record new metrics
3. Compare against baseline AND previous best
```
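Comparing runs presupposes a numeric metric, which usually means parsing it out of the measurement output. A hedged sketch, where the `ms` regex is purely illustrative and would need to match whatever your benchmark actually prints:

```python
import re

def parse_metric(output: str, pattern: str = r"([\d.]+)\s*ms") -> float:
    """Extract the first numeric metric (e.g. '87.3 ms') from command output."""
    match = re.search(pattern, output)
    if match is None:
        raise ValueError(f"no metric matching {pattern!r} in output")
    return float(match.group(1))
```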

### Step 5: Evaluate
```
IF metrics improved:
  → KEEP change
  → Update "current best" baseline
  → Log: "Iteration N: +X% improvement from [change description]"

IF metrics worsened or unchanged:
  → REVERT change
  → Log: "Iteration N: [change description] did not improve, reverted"
  → Try different approach

IF target met:
  → DONE
  → Generate summary report
```
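The keep/revert/done decision above reduces to a small pure function. This sketch assumes a lower-is-better metric such as latency; for higher-is-better metrics the comparisons would flip:

```python
def evaluate(new: float, best: float, target: float) -> str:
    """Decide the next action for a lower-is-better metric.

    Returns 'done' when the target is met, 'keep' on improvement
    over the current best, and 'revert' otherwise.
    """
    if new <= target:
        return "done"
    if new < best:
        return "keep"
    return "revert"
```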

### Step 6: Iterate or Stop
```
IF target met → Report success
IF max_iterations reached → Report best achieved
IF no improvement for 3 consecutive iterations → Report plateau
```
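Taken together, the workflow can be sketched as a loop driver. The `measure`, `try_change`, and `revert` callables are hypothetical hooks standing in for the agent's actions; the defaults mirror the skill's rules (10 iterations max, plateau after 3 non-improvements):

```python
from typing import Callable

def experiment_loop(
    measure: Callable[[], float],       # fixed measurement, lower is better
    try_change: Callable[[int], None],  # agent applies one focused change
    revert: Callable[[], None],         # undo the last change
    target: float,
    max_iterations: int = 10,
    plateau_limit: int = 3,
) -> float:
    """Iterate modify -> measure -> keep/revert until target, plateau, or cap."""
    best = measure()                    # baseline
    if best <= target:
        return best
    stale = 0                           # consecutive non-improvements
    for i in range(1, max_iterations + 1):
        try_change(i)
        new = measure()
        if new < best:
            best, stale = new, 0        # keep: new current best
        else:
            revert()
            stale += 1
        if best <= target or stale >= plateau_limit:
            break
    return best
```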

## Output

```markdown
# Experiment Report: [Target]
Date: [timestamp]
Iterations: [N]
Status: TARGET MET / PLATEAU / MAX ITERATIONS

## Baseline
[Initial measurement results]

## Best Result
[Best measurement achieved]
Improvement: [X% over baseline]

## Iteration Log
| # | Change | Result | Delta | Keep? |
|---|--------|--------|-------|-------|
| 1 | [desc] | [metric] | +X% | Yes |
| 2 | [desc] | [metric] | -Y% | No |

## Final Code State
[What changes were kept]

## Learnings
- [What worked]
- [What didn't work]
- [Suggestions for further optimization]
```

## Best For

| Agent | Use Case |
|-------|----------|
| nitro | Performance optimization loops |
| profiler | Memory/CPU optimization |
| backend-dev | API response time optimization |
| frontend-dev | Bundle size / render time optimization |
| ai-engineer | Prompt optimization loops |

## Rules

- ONE change per iteration (isolate variables)
- ALWAYS measure before and after
- ALWAYS revert failed changes
- Max 10 iterations default (prevent infinite loops)
- Report progress every 3 iterations
- Stop if 3 consecutive non-improvements (plateau detection)

Related Skills

All from vibeeval/vibecosystem:

verification-loop
Comprehensive verification system covering build, types, lint, tests, security, and diff review before a PR.

workflow-router
Goal-based workflow orchestration - routes tasks to specialist agents based on user goals.

wiring
Wiring Verification.

websocket-patterns
Connection management, room patterns, reconnection strategies, message buffering, and binary protocol design.

visual-verdict
Screenshot comparison QA for frontend development. Takes a screenshot of the current implementation, scores it across multiple visual dimensions, and returns a structured PASS/REVISE/FAIL verdict with concrete fixes. Use when implementing UI from a design reference or verifying visual correctness.

vector-db-patterns
Embedding strategies, ANN algorithms, hybrid search, RAG chunking strategies, and reranking for semantic search and retrieval.

variant-analysis
Find similar vulnerabilities across a codebase after discovering one instance. Uses pattern matching, AST search, Semgrep/CodeQL queries, and manual tracing to propagate findings. Adapted from Trail of Bits. Use after finding a bug to check if the same pattern exists elsewhere.

validate-agent
Validation agent that validates plan tech choices against current best practices.

tracing-patterns
OpenTelemetry setup, span context propagation, sampling strategies, Jaeger queries.

tour
Friendly onboarding tour of Claude Code capabilities for users asking what it can do.

tldr-stats
Show full session token usage, costs, TLDR savings, and hook activity.

tldr-router
Map code questions to the optimal tldr command by detecting intent and routing to the right analysis layer.