autoresearch

Autonomous iterative improvement loop. Agent modifies code, verifies against metrics, keeps improvements or reverts failures, and repeats. Uses git as memory — each change is committed, measured, and kept or discarded. Runs until a target metric is hit or max iterations reached. Triggers on: "autoresearch", "auto improve", "iterative improvement", "autonomous loop", "hill climb"

170 stars

Best use case

autoresearch is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Autonomous iterative improvement loop. Agent modifies code, verifies against metrics, keeps improvements or reverts failures, and repeats. Uses git as memory — each change is committed, measured, and kept or discarded. Runs until a target metric is hit or max iterations reached. Triggers on: "autoresearch", "auto improve", "iterative improvement", "autonomous loop", "hill climb"

Teams using autoresearch should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/autoresearch/SKILL.md --create-dirs "https://raw.githubusercontent.com/Miosa-osa/canopy/main/library/skills/development/autoresearch/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/autoresearch/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How autoresearch Compares

Feature / AgentautoresearchStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Autonomous iterative improvement loop. Agent modifies code, verifies against metrics, keeps improvements or reverts failures, and repeats. Uses git as memory — each change is committed, measured, and kept or discarded. Runs until a target metric is hit or max iterations reached. Triggers on: "autoresearch", "auto improve", "iterative improvement", "autonomous loop", "hill climb"

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# /autoresearch

> Autonomous code improvement through iterative experiment-and-measure cycles.

## Purpose

Run an autonomous loop that improves code against a measurable metric. Each iteration: the agent analyzes current performance, hypothesizes an improvement, implements it, commits it, runs the metric, and decides keep-or-revert. Git history becomes the experiment log. The loop continues until the target is met, max iterations are exhausted, or no progress is made for N consecutive rounds.

## Usage

```bash
# Basic — improve test pass rate
/autoresearch --metric "mix test --cover" --target 95

# Improve benchmark performance
/autoresearch --metric "cargo bench --output json" --target "p99 < 10ms" --max-iter 20

# Improve code quality score
/autoresearch --metric "npx eslint . --format json | jq '.errorCount'" --target 0

# Dry run — show plan without executing
/autoresearch --metric "pytest" --target "all pass" --dry-run

# Resume a previous session
/autoresearch --resume

# Limit scope to specific files
/autoresearch --metric "mix test test/parser_test.exs" --target "0 failures" --scope "lib/parser.ex"
```

## Arguments

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `--metric` | string | required | Command that produces a measurable result |
| `--target` | string | required | Success condition (number, comparison, or keyword) |
| `--max-iter` | int | `10` | Maximum improvement iterations |
| `--stall-limit` | int | `3` | Stop after N consecutive iterations with no improvement |
| `--scope` | string | `.` | Files or directories the agent may modify |
| `--dry-run` | flag | false | Show the improvement plan without executing |
| `--resume` | flag | false | Resume from last autoresearch session |
| `--branch` | string | `autoresearch/<timestamp>` | Git branch name for the experiment |
| `--commit-each` | flag | true | Commit each successful iteration |
| `--verbose` | flag | false | Show full metric output each iteration |

## Workflow

1. **Baseline** — Run the metric command on current code. Record the baseline score. Create a new git branch.
2. **Analyze** — Read the metric output and the scoped source files. Identify the highest-impact area for improvement.
3. **Hypothesize** — Form a single, testable change hypothesis. Write it to the commit message.
4. **Implement** — Make the code change. Keep changes small and focused (one logical modification per iteration).
5. **Measure** — Run the metric command again. Parse the result.
6. **Decide** — Compare to previous best score. If improved or equal: `git commit` with hypothesis + result. If worse: `git checkout -- .` to revert.
7. **Log** — Record iteration number, hypothesis, before/after scores, and keep/revert decision.
8. **Repeat** — Return to step 2 unless: target met, max iterations reached, or stall limit hit.
9. **Report** — Summarize all iterations, net improvement, and final state. Optionally squash commits.

## Examples

### Improving test coverage
```
/autoresearch --metric "mix test --cover | grep 'Total:'" --target 90 --scope "lib/"

## Autoresearch — Session ar-20260320-1
| Iter | Hypothesis | Before | After | Decision |
|------|-----------|--------|-------|----------|
| 1 | Add missing tests for Parser.parse_header/1 edge cases | 72.3% | 78.1% | KEEP |
| 2 | Cover error branches in Validator.check/2 | 78.1% | 83.4% | KEEP |
| 3 | Add property tests for Encoder module | 83.4% | 82.9% | REVERT |
| 4 | Test Encoder.encode/1 boundary inputs directly | 83.4% | 88.7% | KEEP |
| 5 | Cover remaining uncovered functions in Formatter | 88.7% | 91.2% | KEEP |

Result: TARGET MET (91.2% >= 90%)
Branch: autoresearch/20260320-143022 (5 commits, 2 reverted)
```

## Output

```markdown
## Autoresearch Complete

- **Branch**: autoresearch/20260320-143022
- **Iterations**: 7 (5 kept, 2 reverted)
- **Baseline**: 72.3%
- **Final**: 91.2%
- **Target**: 90% — MET
- **Net improvement**: +18.9%

### Iteration Log
| # | Hypothesis | Score | Delta | Decision |
|---|-----------|-------|-------|----------|
| 1 | ... | 78.1% | +5.8 | KEEP |
| ... | ... | ... | ... | ... |

### Recommended Next Steps
- Merge branch or cherry-pick successful commits
- Run full test suite to confirm no regressions
```

## Dependencies

- Git (branching, committing, reverting)
- A runnable metric command that produces parseable output
- `/commit` — Used for each successful iteration
- File system access to scoped source files