autoresearch-pro

Automatically improve OpenClaw skills, prompts, or articles through iterative mutation-testing loops. Inspired by Karpathy's autoresearch. Use when the user says 'optimize [skill]', 'autoresearch [skill]', 'improve my skill', 'optimize this prompt', 'improve my prompt', 'polish this article', 'improve this article', or explicitly requests quality improvement for any text-based content. Supports three modes: skill (SKILL.md files), prompt (any prompt text), and article (any document).

3,556 stars

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/autoresearch-pro/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/0xcjl/autoresearch-pro/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/autoresearch-pro/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How autoresearch-pro Compares

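| Feature / Agent | autoresearch-pro | Standard Approach |
|-----------------|------------------|-------------------|
| Platform Support | Multiple | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |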

Frequently Asked Questions

Which AI agents support this skill?

This skill is listed as compatible with multiple agents. The installation section above names Claude Code, Cursor, and Codex.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# autoresearch-pro

## Overview

Automatically improve any OpenClaw skill, prompt, or article through iterative mutation-testing: small edits → run test cases → score with checklist → keep improvements, discard regressions.

**Inspired by [Karpathy/autoresearch](https://github.com/karpathy/autoresearch).**

Supports three optimization modes:

| Mode | Input | Output |
|------|-------|--------|
| **Skill** | Path to a skill directory | Improved SKILL.md |
| **Prompt** | A prompt text string | Improved prompt |
| **Article** | An article/document text | Improved article |

---

## Workflow

### Step 1 — Identify Mode and Input

Ask the user to confirm:

- **Mode 1 — Skill**: User says "optimize [skill-name]" or provides a skill path
- **Mode 2 — Prompt**: User says "optimize this prompt" or pastes a prompt
- **Mode 3 — Article**: User says "improve this article" or pastes article text

For **Skill mode**, resolve the skill path to `~/.openclaw/skills/<skill-name>/SKILL.md`.
For **Prompt/Article mode**, keep the text in context (do not write to disk unless needed).
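
The path resolution for Skill mode can be sketched as below. This is a minimal illustration, not part of the skill itself: the function name `resolve_skill_path` and the optional `root` override are hypothetical, and only the `~/.openclaw/skills/<skill-name>/SKILL.md` layout comes from the text above.

```python
from pathlib import Path
from typing import Optional


def resolve_skill_path(skill_name: str, root: Optional[Path] = None) -> Path:
    """Build the SKILL.md path for a skill name (hypothetical helper)."""
    # Default root follows the layout stated in Step 1; `root` exists only
    # so the function can be exercised without touching the home directory.
    base = root if root is not None else Path.home() / ".openclaw" / "skills"
    return base / skill_name / "SKILL.md"
```

The helper only builds the path; checking that the file exists before starting the loop is left to the caller.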

### Step 2 — Generate Checklist (10 Questions)

Read the target content first. Then generate 10 diverse, specific yes/no checklist questions relevant to the content type:

**For Skill mode (10 tailored questions):**

| # | Dimension | What to Check |
|---|----------|---------------|
| 1 | Description clarity | Is the frontmatter description precise and actionable? |
| 2 | Trigger coverage | Does it cover the main real-world use cases? |
| 3 | Workflow structure | Are steps clearly sequenced and unambiguous? |
| 4 | Error guidance | Does it handle error states and edge cases? |
| 5 | Tool usage accuracy | Are tool names and parameters correct for OpenClaw? |
| 6 | Example quality | Do examples reflect real usage patterns? |
| 7 | Conciseness | Is content free of redundant repetition? |
| 8 | Freedom calibration | Is instruction specificity appropriate? |
| 9 | Reference quality | Are references and links accurate? |
| 10 | Completeness | Are all sections filled with real content? |

**For Prompt mode (10 tailored questions):**

| # | Dimension | What to Check |
|---|----------|---------------|
| 1 | Goal clarity | Does the prompt state a clear, specific goal? |
| 2 | Role/tone | Is the desired role or tone specified? |
| 3 | Input format | Is the input format clearly described? |
| 4 | Output format | Is the expected output format specified? |
| 5 | Constraints | Are key constraints and boundaries stated? |
| 6 | Context sufficiency | Is enough context provided to avoid hallucination? |
| 7 | Edge cases | Does it handle ambiguous or edge case inputs? |
| 8 | Conciseness | Is it free of redundant or contradictory instructions? |
| 9 | Actionability | Are instructions concrete and actionable vs. vague? |
| 10 | Completeness | Are all necessary elements for the task present? |

**For Article mode (10 tailored questions):**

| # | Dimension | What to Check |
|---|----------|---------------|
| 1 | Title quality | Does the title clearly convey the main value? |
| 2 | Opening hook | Does the opening grab attention and set expectations? |
| 3 | Logical structure | Are ideas logically organized (not random)? |
| 4 | Argument clarity | Are claims supported with evidence or reasoning? |
| 5 | Conciseness | Is unnecessary padding or repetition removed? |
| 6 | Transition flow | Do paragraphs/sections flow smoothly? |
| 7 | Closing strength | Does the conclusion summarize and inspire action? |
| 8 | Tone consistency | Is the tone consistent throughout? |
| 9 | Readability | Is sentence/paragraph length varied appropriately? |
| 10 | Audience match | Does language match the target audience level? |

**Present the 10 questions**, numbered 1-10. Ask the user to select which ones to activate (e.g., "use questions 1, 3, 5, 7"). Default: use all 10 if the user doesn't specify.
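
One way to turn the user's reply into a list of active questions is sketched below. The function name `parse_selection` is hypothetical, and the parsing rules are assumptions: any integers from 1 to 10 in the reply count as selections, the word "all" or a reply with no valid numbers falls back to all 10, and ranges such as "1-5" are not expanded.

```python
import re
from typing import List

ALL_QUESTIONS = list(range(1, 11))


def parse_selection(reply: str) -> List[int]:
    """Extract active question numbers from a free-text reply (hypothetical helper)."""
    # "use all 10" would otherwise be read as question 10 only.
    if "all" in reply.lower():
        return list(ALL_QUESTIONS)
    numbers = [int(token) for token in re.findall(r"\d+", reply)]
    chosen = sorted({n for n in numbers if 1 <= n <= 10})
    # Default from Step 2: no usable selection means all 10 questions.
    return chosen or list(ALL_QUESTIONS)
```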

### Step 3 — Prepare Test Cases

- **Skill mode**: Generate 3-5 realistic prompts a user would send when using the skill
- **Prompt mode**: Generate 3-5 test inputs that the prompt would process
- **Article mode**: Generate 3-5 ways the article might be read or consumed

Store test cases in context — do not write to disk.

### Step 4 — Run Autoresearch Loop

**Loop configuration:**
- **Rounds per batch**: 30
- **Max total rounds**: 100
- **Pause**: After every 30 rounds, show a summary and ask the user to continue or stop
- **Stop conditions**: User says stop, OR 100 rounds completed
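
With these numbers the final batch is shorter than the others, which the following sketch makes explicit. The generator name `batch_ranges` is hypothetical; the defaults are the values from the configuration above.

```python
from typing import Iterator, Tuple


def batch_ranges(rounds_per_batch: int = 30, max_rounds: int = 100) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first_round, last_round) pairs for each batch."""
    start = 1
    while start <= max_rounds:
        # The last batch is clipped so the total never exceeds max_rounds.
        end = min(start + rounds_per_batch - 1, max_rounds)
        yield (start, end)
        start = end + 1
```

Calling `list(batch_ranges())` gives `[(1, 30), (31, 60), (61, 90), (91, 100)]`, so the pause prompts fall after rounds 30, 60, and 90.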

**Per-round procedure:**

1. **Mutate**: Make ONE small edit to the target content:
   - Skill mode: edit SKILL.md
   - Prompt mode: edit the prompt string
   - Article mode: edit the article text

2. **Test**: For each test case, simulate what output the content would produce.

3. **Score**: Apply each active checklist question (0 or 1 per question). Score = (passed / total) × 100.

4. **Decide**: If new score ≥ best score → keep the mutation. If lower → revert.

5. **Log**: Round number, mutation type, score, keep/revert decision.
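
The keep-or-revert logic above can be sketched as a simple hill-climbing loop. This is an illustration under stated assumptions, not the skill's implementation: checklist questions are modelled as hypothetical Python predicates over the content, `mutate` is a hypothetical callable returning a mutation type and a candidate, and the Test step (simulating output per test case) is omitted because the agent performs it rather than code.

```python
from typing import Callable, List, Tuple

Check = Callable[[str], bool]
Mutation = Callable[[str], Tuple[str, str]]  # returns (mutation_type, candidate)
LogEntry = Tuple[int, str, float, str]       # (round, type, score, decision)


def score(content: str, checks: List[Check]) -> float:
    """Score = (passed / total) x 100, one point per active checklist question."""
    passed = sum(1 for check in checks if check(content))
    return passed / len(checks) * 100


def run_rounds(content: str, mutate: Mutation, checks: List[Check],
               rounds: int) -> Tuple[str, float, List[LogEntry]]:
    """Apply one mutation per round, keeping it only if the score does not drop."""
    best, best_score = content, score(content, checks)
    log: List[LogEntry] = []
    for round_no in range(1, rounds + 1):
        mutation_type, candidate = mutate(best)
        new_score = score(candidate, checks)
        keep = new_score >= best_score  # ties are kept, as in step 4
        if keep:
            best, best_score = candidate, new_score
        log.append((round_no, mutation_type, new_score, "keep" if keep else "revert"))
    return best, best_score, log
```

Because ties are kept, a mutation that leaves the score unchanged still replaces the current best; swapping `>=` for `>` would keep only strict improvements.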

**Mutation types (pick one per round):**

| Type | Description |
|------|-------------|
| A | Add a constraint rule |
| B | Strengthen trigger/coverage |
| C | Add a concrete example |
| D | Tighten vague language |
| E | Improve error/edge case handling |
| F | Remove redundant content |
| G | Improve transitions |
| H | Expand a thin section |
| I | Add cross-reference |
| J | Adjust degree-of-freedom |

### Step 5 — Report Results

**After each batch (30 rounds):**
```
Batch N (rounds X-Y):
  Best score: XX%
  Mutations kept: N  |  Reverted: N
  Most effective types: [list top 2-3]
Accumulated improvements: [summary]
Continue? (yes/stop)
```
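
A batch summary in this format could be assembled from the round log roughly as follows. The function name `batch_report` and the log tuple layout `(round, mutation_type, score, decision)` are hypothetical. "Most effective" is approximated here by how often a type was kept, which is an assumption, since the text does not define the ranking. The log is assumed to be non-empty.

```python
from collections import Counter
from typing import List, Tuple

LogEntry = Tuple[int, str, float, str]  # (round, type, score, decision)


def batch_report(batch_no: int, log: List[LogEntry], best_score: float, summary: str) -> str:
    """Render the per-batch summary from a non-empty round log (hypothetical helper)."""
    kept = [entry for entry in log if entry[3] == "keep"]
    reverted = len(log) - len(kept)
    # Rank mutation types by number of kept rounds (assumed proxy for impact).
    top_types = [t for t, _ in Counter(entry[1] for entry in kept).most_common(3)]
    first_round, last_round = log[0][0], log[-1][0]
    return "\n".join([
        f"Batch {batch_no} (rounds {first_round}-{last_round}):",
        f"  Best score: {best_score:.0f}%",
        f"  Mutations kept: {len(kept)}  |  Reverted: {reverted}",
        f"  Most effective types: {', '.join(top_types)}",
        f"Accumulated improvements: {summary}",
        "Continue? (yes/stop)",
    ])
```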

**After full completion:**
- Original score vs. final score
- Top 3 most impactful mutations
- Final improved content (inline or diff)
- File path (skill mode only)

---

## Mutation Strategy Reference

**High-impact, low-risk changes:**
- Adding explicit constraints where the content is vague
- Expanding coverage to include edge cases
- Adding concrete examples to abstract instructions
- Tightening soft language ("try to" → "must")

**Avoid in one round:**
- Large rewrites of entire sections
- Multiple unrelated changes at once
- Changing fundamental scope or purpose

See `references/mutation_strategies.md` for the full strategy guide.

---

## Mode Selection Quick Reference

| User says | Mode |
|-----------|------|
| "optimize [skill]" / "autoresearch [skill]" | Skill |
| "optimize this prompt" / "improve my prompt" | Prompt |
| "polish this article" / "improve this article" | Article |
| "optimize this document" | Article |

Default to **Prompt mode** if the input is a text string without a skill path.
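
The table and default rule above could be approximated with a keyword heuristic like the one below. The function name `detect_mode` and the matching order are assumptions. Plain substring matching is crude: a skill whose name contains "prompt" or "article" would be misrouted, so the result should still be confirmed with the user as described in Step 1.

```python
def detect_mode(request: str) -> str:
    """Guess the optimization mode from the user's request (hypothetical heuristic)."""
    text = request.lower().strip()
    # Explicit content-type words win over the generic "optimize ..." trigger.
    if "prompt" in text:
        return "prompt"
    if "article" in text or "document" in text:
        return "article"
    if "skill" in text or text.startswith(("optimize ", "autoresearch ")):
        return "skill"
    # Default from the quick reference: plain text without a skill path.
    return "prompt"
```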