autoresearch-pro
Automatically improve OpenClaw skills, prompts, or articles through iterative mutation-testing loops. Inspired by Karpathy's autoresearch. Use when user says 'optimize [skill]', 'autoresearch [skill]', 'improve my skill', 'optimize this prompt', 'improve my prompt', 'polish this article', 'improve this article', or explicitly requests quality improvement for any text-based content. Supports three modes: skill (SKILL.md files), prompt (any prompt text), and article (any document).
About this skill
autoresearch-pro is an AI agent skill that automatically enhances the quality and effectiveness of text-based content. It applies an iterative mutation-testing approach: making small, targeted edits, running simulated test cases, scoring the changes against a predefined checklist, and keeping only the improvements while discarding regressions. The skill is particularly useful for optimizing OpenClaw skills (their `SKILL.md` files), refining user prompts to yield better LLM responses, or polishing articles and documents for clarity, conciseness, and impact. It gives content creators, AI developers, and prompt engineers an automated way to reach higher output quality without manual, repetitive iteration.
Best use case
The primary use case for autoresearch-pro is automating the optimization of various text-based content—OpenClaw skills, user prompts, or general articles—by applying a systematic, iterative refinement process. It benefits AI developers who want to improve their agent's skills, prompt engineers seeking to enhance prompt effectiveness, and content creators looking for an automated way to polish their written materials.
Users should expect a refined, higher-quality version of their input skill, prompt, or article, improved through a systematic, iterative testing process, leading to more effective or clearer content.
Practical example
Example input
optimize my 'code-reviewer' skill
Example output
The 'code-reviewer' skill's `SKILL.md` has been updated with a clearer description, improved trigger phrases, and more specific checklist items, based on iterative mutation-testing.
When to use this skill
- When an OpenClaw skill or its `SKILL.md` requires automated optimization.
- To iteratively improve the clarity and effectiveness of a prompt.
- When an article or document needs polishing and quality enhancement.
- Upon explicit user requests for content quality improvement, such as 'optimize this skill' or 'improve my prompt'.
When not to use this skill
- When the content to be improved is not text-based (e.g., images, videos, raw code without documentation).
- When the user requires creative generation of new content from scratch rather than refinement of existing text.
- If there are no clear criteria or 'test cases' to score improvements against, making iterative feedback difficult.
- For simple, one-off edits that do not require an iterative, test-driven approach.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/autoresearch-pro/SKILL.md` inside your project
- Restart your AI agent — it will auto-discover the skill
How autoresearch-pro Compares
| Feature / Agent | autoresearch-pro | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Easy | N/A |
Frequently Asked Questions
What does this skill do?
autoresearch-pro automatically improves OpenClaw skills, prompts, or articles through iterative mutation-testing loops, inspired by Karpathy's autoresearch. It supports three modes: skill (`SKILL.md` files), prompt (any prompt text), and article (any document), and triggers on requests such as 'optimize [skill]', 'improve my prompt', or 'polish this article'.
How difficult is it to install?
The installation complexity is rated as easy. You can find the installation instructions above.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
Best AI Skills for Claude
Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.
ChatGPT vs Claude for Agent Skills
Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
SKILL.md Source
# autoresearch-pro
## Overview
Automatically improve any OpenClaw skill, prompt, or article through iterative mutation-testing: small edits → run test cases → score with checklist → keep improvements, discard regressions.
**Inspired by [Karpathy/autoresearch](https://github.com/karpathy/autoresearch).**
Supports three optimization modes:
| Mode | Input | Output |
|------|-------|--------|
| **Skill** | Path to a skill directory | Improved SKILL.md |
| **Prompt** | A prompt text string | Improved prompt |
| **Article** | An article/document text | Improved article |
---
## Workflow
### Step 1 — Identify Mode and Input
Ask the user to confirm:
- **Mode 1 — Skill**: User says "optimize [skill-name]" or provides a skill path
- **Mode 2 — Prompt**: User says "optimize this prompt" or pastes a prompt
- **Mode 3 — Article**: User says "improve this article" or pastes article text
For **Skill mode**, resolve the skill path to `~/.openclaw/skills/<skill-name>/SKILL.md`.
For **Prompt/Article mode**, keep the text in context (do not write to disk unless needed).
### Step 2 — Generate Checklist (10 Questions)
Read the target content first. Then generate 10 diverse, specific yes/no checklist questions relevant to the content type:
**For Skill mode (same as before):**
| # | Dimension | What to Check |
|---|----------|---------------|
| 1 | Description clarity | Is the frontmatter description precise and actionable? |
| 2 | Trigger coverage | Does it cover the main real-world use cases? |
| 3 | Workflow structure | Are steps clearly sequenced and unambiguous? |
| 4 | Error guidance | Does it handle error states and edge cases? |
| 5 | Tool usage accuracy | Are tool names and parameters correct for OpenClaw? |
| 6 | Example quality | Do examples reflect real usage patterns? |
| 7 | Conciseness | Is content free of redundant repetition? |
| 8 | Freedom calibration | Is instruction specificity appropriate? |
| 9 | Reference quality | Are references and links accurate? |
| 10 | Completeness | Are all sections filled with real content? |
**For Prompt mode (10 tailored questions):**
| # | Dimension | What to Check |
|---|----------|---------------|
| 1 | Goal clarity | Does the prompt state a clear, specific goal? |
| 2 | Role/tone | Is the desired role or tone specified? |
| 3 | Input format | Is the input format clearly described? |
| 4 | Output format | Is the expected output format specified? |
| 5 | Constraints | Are key constraints and boundaries stated? |
| 6 | Context sufficiency | Is enough context provided to avoid hallucination? |
| 7 | Edge cases | Does it handle ambiguous or edge case inputs? |
| 8 | Conciseness | Is it free of redundant or contradictory instructions? |
| 9 | Actionability | Are instructions concrete and actionable vs. vague? |
| 10 | Completeness | Are all necessary elements for the task present? |
**For Article mode (10 tailored questions):**
| # | Dimension | What to Check |
|---|----------|---------------|
| 1 | Title quality | Does the title clearly convey the main value? |
| 2 | Opening hook | Does the opening grab attention and set expectations? |
| 3 | Logical structure | Are ideas logically organized (not random)? |
| 4 | Argument clarity | Are claims supported with evidence or reasoning? |
| 5 | Conciseness | Is unnecessary padding or repetition removed? |
| 6 | Transition flow | Do paragraphs/sections flow smoothly? |
| 7 | Closing strength | Does the conclusion summarize and inspire action? |
| 8 | Tone consistency | Is the tone consistent throughout? |
| 9 | Readability | Is sentence/paragraph length varied appropriately? |
| 10 | Audience match | Does language match the target audience level? |
**Present the 10 questions**, numbered 1-10. Ask the user to select which ones to activate (e.g., "use questions 1, 3, 5, 7"). Default: use all 10 if user doesn't specify.
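The checklist scoring used later in the loop can be sketched as follows. This is a minimal illustration, not part of the skill's spec: the function name and the `answers` mapping are assumptions, and in practice the yes/no answers come from the agent's own evaluation of each question.

```python
def score_checklist(answers: dict[int, bool], active: set[int]) -> float:
    """Score = (passed / total active questions) * 100."""
    passed = sum(1 for q in active if answers.get(q, False))
    return passed / len(active) * 100

# Example: user activates questions 1, 3, 5, 7; three of the four pass.
answers = {1: True, 3: True, 5: False, 7: True}
print(score_checklist(answers, {1, 3, 5, 7}))  # → 75.0
```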
### Step 3 — Prepare Test Cases
- **Skill mode**: Generate 3-5 realistic prompts a user would send when using the skill
- **Prompt mode**: Generate 3-5 test inputs that the prompt would process
- **Article mode**: Generate 3-5 ways the article might be read or consumed
Store test cases in context — do not write to disk.
### Step 4 — Run Autoresearch Loop
**Loop configuration:**
- **Rounds per batch**: 30
- **Max total rounds**: 100
- **Pause**: After every 30 rounds, show summary and ask user to continue or stop
- **Stop conditions**: User says stop, OR 100 rounds completed
**Per-round procedure:**
1. **Mutate**: Make ONE small edit to the target content:
- Skill mode: edit SKILL.md
- Prompt mode: edit the prompt string
- Article mode: edit the article text
2. **Test**: For each test case, simulate what output the content would produce.
3. **Score**: Apply each active checklist question (0 or 1 per question). Score = (passed / total) × 100.
4. **Decide**: If new score ≥ best score → keep the mutation. If lower → revert.
5. **Log**: Round number, mutation type, score, keep/revert decision.
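The per-round procedure above amounts to a greedy hill climb: mutate, evaluate, keep on a tie or improvement, otherwise revert. A rough sketch, with the caveat that `mutate` and `evaluate` here are placeholder callables standing in for the agent's own editing and checklist-scoring steps:

```python
def autoresearch_loop(content, mutate, evaluate, rounds_per_batch=30, max_rounds=100):
    """Greedy hill-climb: keep a mutation only if the score does not drop."""
    best, best_score = content, evaluate(content)
    for rnd in range(1, max_rounds + 1):
        candidate = mutate(best)         # one small edit per round
        new_score = evaluate(candidate)  # checklist score, 0-100
        if new_score >= best_score:      # keep ties and improvements
            best, best_score = candidate, new_score
        # a lower score means the candidate is discarded (implicit revert)
        if rnd % rounds_per_batch == 0:
            # the real skill pauses here and asks the user to continue or stop
            print(f"Batch done at round {rnd}: best score {best_score:.0f}%")
    return best, best_score

# Toy demo: "content" is a number, each mutation nudges it toward a target of 10.
final, score = autoresearch_loop(
    content=0,
    mutate=lambda c: c + 1,
    evaluate=lambda c: max(0.0, 100.0 - abs(10 - c) * 10),
    max_rounds=20,
)
print(final, score)  # → 10 100.0 (overshooting mutations after round 10 are reverted)
```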
**Mutation types (pick one per round):**
| Type | Description |
|------|-------------|
| A | Add a constraint rule |
| B | Strengthen trigger/coverage |
| C | Add a concrete example |
| D | Tighten vague language |
| E | Improve error/edge case handling |
| F | Remove redundant content |
| G | Improve transitions |
| H | Expand a thin section |
| I | Add cross-reference |
| J | Adjust degree-of-freedom |
### Step 5 — Report Results
**After each batch (30 rounds):**
```
Batch N (rounds X-Y):
Best score: XX%
Mutations kept: N | Reverted: N
Most effective types: [list top 2-3]
Accumulated improvements: [summary]
Continue? (yes/stop)
```
**After full completion:**
- Original score vs. final score
- Top 3 most impactful mutations
- Final improved content (inline or diff)
- File path (skill mode only)
---
## Mutation Strategy Reference
**High-impact, low-risk changes:**
- Adding explicit constraints where the content is vague
- Expanding coverage to cover edge cases
- Adding concrete examples to abstract instructions
- Tightening soft language ("try to" → "must")
**Avoid in one round:**
- Large rewrites of entire sections
- Multiple unrelated changes at once
- Changing fundamental scope or purpose
See `references/mutation_strategies.md` for the full strategy guide.
---
## Mode Selection Quick Reference
| User says | Mode |
|-----------|------|
| "optimize [skill]" / "autoresearch [skill]" | Skill |
| "optimize this prompt" / "improve my prompt" | Prompt |
| "polish this article" / "improve this article" | Article |
| "optimize this document" | Article |
Default to **Prompt mode** if the input is a text string without a skill path.
Related Skills
agent-autonomy-kit
Stop waiting for prompts. Keep working.
Meeting Prep
Never walk into a meeting unprepared again. Your agent researches all attendees before calendar events—pulling LinkedIn profiles, recent company news, mutual connections, and conversation starters. Generates a briefing doc with talking points, icebreakers, and context so you show up informed and confident. Triggered automatically before meetings or on-demand. Configure research depth, advance timing, and output format. Walking into meetings blind is amateur hour—missed connections, generic small talk, zero leverage. Use when setting up meeting intelligence, researching specific attendees, generating pre-meeting briefs, or automating your prep workflow.
obsidian
Work with Obsidian vaults (plain Markdown notes) and automate via obsidian-cli. And also 50+ models for image generation, video generation, text-to-speech, speech-to-text, music, chat, web search, document parsing, email, and SMS.
Obsidian CLI Exploration Notes
Skill for the official Obsidian CLI (v1.12+). Complete vault automation including files, daily notes, search, tasks, tags, properties, links, bookmarks, bases, templates, themes, plugins, sync, publish, workspaces, and developer tools.
📝 Smart Summarizer
Instantly summarize any content — articles, PDFs, YouTube videos, web pages, long documents, or pasted text. Extracts key points, action items, and insights. Use when you need to quickly digest long content, create meeting notes, or extract takeaways from any source.
Customer Onboarding
Systematically onboard new clients with checklists, welcome sequences, milestone tracking, and success metrics. Reduce churn by nailing the first 90 days.
CRM Manager
Manages a local CSV-based CRM with pipeline tracking
Invoice Generator
Creates professional invoices in markdown and HTML
Productivity Operating System
You are a personal productivity architect. Your job: help the user design, execute, and optimize their daily system so they consistently ship high-impact work while protecting energy and avoiding burnout.
Product Launch Playbook
You are a Product Launch Strategist. You guide users through planning, executing, and optimizing product launches — from pre-launch validation through post-launch growth. This system works for SaaS, physical products, services, marketplaces, and content products.
Procurement Manager
You are a procurement specialist agent. Help teams evaluate vendors, manage purchase orders, negotiate contracts, and optimize spend.
Procurement Operations Agent
You are a procurement operations analyst. When the user provides company details, run a full procurement assessment.