Best use case
cjl-autoresearch-cc is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
## Overview
Teams using cjl-autoresearch-cc should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/cjl-autoresearch-cc/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How cjl-autoresearch-cc Compares
| Feature / Agent | cjl-autoresearch-cc | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
## Overview
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agents for Startups
Explore AI agent skills for startup validation, product research, growth experiments, documentation, and fast execution with small teams.
Best AI Skills for ChatGPT
Find the best AI skills to adapt into ChatGPT workflows for research, writing, summarization, planning, and repeatable assistant tasks.
Best AI Skills for Claude
Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.
SKILL.md Source
# cjl-autoresearch-cc
## Overview
Improve skills, prompts, articles, workflows, and systems via iterative mutation-testing.
**Core principle:** One small verifiable change per round. Large rewrites are unverifiable and will be reverted.
**Workflow:** small edits → test → score → keep improvements, discard regressions.
Inspired by Karpathy/autoresearch and 0xcjl/openclaw-autoresearch-pro.
## Trigger Keywords
**English:** `autoresearch`
**Chinese:** `自动优化`, `自动研究`
## Semantic Triggers (No Keywords Needed)
This skill activates when the user's intent matches, even without explicit keywords:
- User wants to improve any skill, prompt, article, workflow, or system
- User asks to polish, refine, enhance, or upgrade content
- User wants iterative testing and improvement
- User says '帮我改进一下这个prompt', 'optimize this'
- User says '迭代优化', '循环改进', '反复打磨'
- User asks '能不能更好', '如何提升质量'
- User uses "打磨", "精炼", "完善", "升级" in context of content improvement
## Supported Optimization Targets
| Mode | Input | Example |
|------|-------|---------|
| **Skill** | Skill name or path | `coding-standards`, `~/.claude/skills/tdd-workflow` |
| **Plugin** | Path to a plugin directory | `~/.claude/plugins/everything-claude-code` |
| **Prompt** | A prompt text string | Inline or file path |
| **Article** | An article/document text | Inline or file path |
| **Workflow** | A process or workflow description | Inline or file path |
| **System** | A mechanism or system design | Inline or file path |
## Workflow
### Step 1 — Identify Mode and Target
Before proceeding, confirm with user:
> "Optimize [target] in [mode] mode? (yes/no)"
If no, ask for clarification. If yes, proceed to Step 2.
Parse the user's request to determine mode. Check for:
**Keyword triggers:**
- `autoresearch [target]` / `自动优化 [target]` / `自动研究 [target]`
- `optimize [target]` / `improve [target]` / `优化 [target]` / `改进 [target]`
- `refine [target]` / `enhance [target]` / `精炼 [target]` / `增强 [target]`
**Semantic triggers (intent-based):**
- User wants to improve any skill, prompt, article, workflow, or system
- User asks to polish, refine, enhance, or upgrade content
- User describes wanting iterative testing and improvement
**Mode detection from intent:**
| User Intent | Mode |
|-------------|------|
| Optimize a skill/SKILL.md file | Skill |
| Optimize an agent configuration | Skill |
| Improve a custom command | Skill |
| Optimize a plugin | Plugin |
| Improve hooks configuration | Plugin |
| Improve a prompt text | Prompt |
| Polish an article/document | Article |
| Optimize a workflow/process | Workflow |
| Improve a system mechanism | System |
For Skill/Plugin mode, resolve the path:
- Skill: `~/.claude/skills/<skill-name>/SKILL.md`
- Plugin: `~/.claude/plugins/<plugin-name>/`
If path doesn't exist, search in order: ~/.claude/skills/ → current dir → ask user.
**Examples of semantic triggers (no keywords):**
- "帮我优化一下这个skill" → Skill mode
- "这个prompt不够好,帮我改进" → Prompt mode
- "我想让这篇文章更通顺" → Article mode
- "优化一下部署流程" → Workflow mode
### Step 2 — Generate Checklist (10 Questions)
Read the target content first. Then generate 10 diverse, specific yes/no checklist questions relevant to the content type:
**For Skill/Plugin mode:**
| # | Dimension | What to Check |
|---|----------|---------------|
| 1 | Description clarity | Is the description precise, actionable, and clear? Does it state what the skill does and when to use it? |
| 2 | Trigger coverage | Does it cover main real-world use cases? |
| 3 | Workflow structure | Are steps clearly sequenced and unambiguous? |
| 4 | Error guidance | Does it handle error states and edge cases? |
| 5 | Tool usage accuracy | Are tool names and parameters correct for Claude Code? |
| 6 | Example quality | Do examples reflect real usage patterns? |
| 7 | Conciseness | Is content free of redundant repetition? |
| 8 | Freedom calibration | Is instruction specificity appropriate? |
| 9 | Reference quality | Are references and links accurate? |
| 10 | Completeness | Are all sections filled with real content? |
**For Prompt mode:**
| # | Dimension | What to Check |
|---|----------|---------------|
| 1 | Goal clarity | Does the prompt state a clear, specific goal? |
| 2 | Role/tone | Is the desired role or tone specified? |
| 3 | Input format | Is the input format clearly described? |
| 4 | Output format | Is the expected output format specified? |
| 5 | Constraints | Are key constraints and boundaries stated? |
| 6 | Context sufficiency | Is enough context provided to avoid hallucination? |
| 7 | Edge cases | Does it handle ambiguous or edge case inputs? |
| 8 | Conciseness | Is it free of redundant or contradictory instructions? |
| 9 | Actionability | Are instructions concrete and actionable vs. vague? |
| 10 | Completeness | Are all necessary elements for the task present? |
**For Article/Documentation mode:**
| # | Dimension | What to Check |
|---|----------|---------------|
| 1 | Title quality | Does the title clearly convey the main value? Is it specific enough? |
| 2 | Opening hook | Does the opening grab attention? Does it set clear expectations? |
| 3 | Logical structure | Are ideas logically organized (not random)? |
| 4 | Argument clarity | Are claims supported with evidence or reasoning? |
| 5 | Conciseness | Is unnecessary padding or repetition removed? |
| 6 | Transition flow | Do paragraphs/sections flow smoothly? |
| 7 | Closing strength | Does the conclusion summarize and inspire action? |
| 8 | Tone consistency | Is the tone consistent throughout? |
| 9 | Readability | Is sentence/paragraph length varied appropriately? |
| 10 | Audience match | Does language match the target audience level? |
**For Workflow/System mode:**
| # | Dimension | What to Check |
|---|----------|---------------|
| 1 | Goal clarity | Is the objective clearly stated? |
| 2 | Step sequencing | Are steps in logical, efficient order? |
| 3 | Completeness | Are all necessary steps present? |
| 4 | Error handling | Are failure modes addressed (timeout, auth, network, resource exhaustion)? |
| 5 | Edge cases | Are corner cases considered (empty input, large files)? |
| 6 | Simplicity | Is the workflow/system as simple as possible? Can steps be combined or eliminated? |
| 7 | Observability | Can progress/status be tracked? |
| 8 | Reversibility | Can steps be undone if errors occur? |
| 9 | Automation potential | Which steps could be automated? |
| 10 | Maintainability | Is it easy to modify and extend? |
Present the 10 questions, numbered 1-10. Ask the user to select which ones to activate.
> **Rule:** Must use at least 5 questions. Using fewer makes scoring unreliable.
After presenting, ask: "Ready to start the optimization loop? (yes/start)"
### Step 3 — Prepare Test Cases
Test cases validate that mutations improve, not harm, the content. Generate realistic user scenarios.
- **Skill/Plugin mode**: Generate 3-5 realistic prompts a user would send when using the skill/plugin
- **Prompt mode**: Generate 3-5 test inputs that the prompt would process
- **Article mode**: Generate 3-5 ways the article might be read or consumed
- **Workflow mode**: Generate 3-5 scenarios the workflow would handle
- **System mode**: Generate 3-5 conditions the system would encounter
Store test cases in context — do not write to disk unless needed.
### Step 4 — Run Autoresearch Loop
> **Tip:** For mutation strategies, see [Mutation Strategy Reference](#mutation-strategy-reference) below.
**Loop configuration:**
- Rounds per batch: 30
- Max total rounds: 100
- Pause: After every 30 rounds, show summary and ask user to continue or stop
- Stop conditions:
- User says stop
- 100 rounds completed
- Score reaches 100%
- No improvement for 10 consecutive rounds
**Per-round procedure:**
Track progress: Round N/100 | Best: XX% | Last: +/-YY
> **Constraint:** ONE mutation per round. Multiple changes = unverifiable = will be reverted.
1. **Mutate**: Make ONE small edit (see [Mutation types](#mutation-types-pick-one-per-round))
2. **Test**: For each test case, simulate what output the content would produce
> **Constraint:** Be honest. If the output would disappoint a user, the mutation failed.
3. **Score**: Apply each active checklist question (0 or 1 per question). Score = (passed / total_questions) × 100
Scoring scale:
- 10/10 = 100% (perfect)
- 7/10 = 70% (good)
- 5/10 = 50% (minimum viable)
4. **Decide**: If new score ≥ best score → keep the mutation. If lower → revert
Example: Best=85%, New=87% → Keep. Best=85%, New=83% → Revert.
> Trust the score. Don't rationalize a bad mutation.
5. **Log**: Round number, mutation type, score, keep/revert decision
**Mutation types (pick ONE per round):**
| Type | Name | When to Use |
|------|------|-------------|
| A | Add constraint | When content is too vague |
| B | Strengthen coverage | When trigger cases are missing |
| C | Add example | When steps are too abstract |
| D | Tighten language | When words are soft ("try to") |
| E | Error handling | When failure modes missing |
| F | Remove redundancy | When content is verbose |
| G | Improve transitions | When flow is choppy |
| H | Expand thin section | When content is sparse |
| I | Add cross-ref | When sections are isolated |
| J | Adjust freedom | When balance is off |
### Step 5 — Report Results (after each batch)
See [Quick Reference](#quick-reference) below for output format examples.
**After each batch (30 rounds):**
Example:
```
Batch 1 (rounds 1-30):
Best score: 85%
Mutations kept: 23 | Reverted: 7
Most effective types: A, C, D
```
**After full completion:**
```
Optimized: [filename/path]
Score: XX% → YY% (+ZZ%)
Rounds: N (kept: K, reverted: R)
Top mutations: [type, type, type]
---
Final content:
[diff or inline]
```
## Mutation Strategy Reference
**High-impact, low-risk changes:**
- Adding explicit constraints where the content is vague
- Expanding coverage to cover edge cases
- Adding concrete examples to abstract instructions
- Tightening soft language ("try to" → "must")
**Avoid in one round:**
- Large rewrites of entire sections
- Multiple unrelated changes at once
- Changing fundamental scope or purpose
- Formatting-only changes (no testable value)
- Adding content the user didn't request
- Removing more than 10% of content
## Quick Reference
### Keywords Reference
**Auto-detect:** `autoresearch`, `自动优化`, `自动研究`
**Skill:** `autoresearch ~/.claude/skills/tdd`
**Prompt:** `optimize this prompt: [text]`
**Workflow:** `optimize the deployment workflow`
**System:** `improve the error handling system`
### Semantic Triggers (No Keywords)
```
"帮我优化一下这个skill" # → Skill mode
"这个prompt不太行" # → Prompt mode
"我想让文章更通顺" # → Article mode
"优化一下部署流程" # → Workflow mode
"改进一下这个系统" # → System mode
"improve this code review" # → Prompt/Skill mode
"polish this documentation" # → Article mode
```
### Mode Detection
| Situation | Action |
|-----------|--------|
| Path detected | Skill/Plugin mode |
| Keyword present | Keyword-specified mode |
| Short text | Prompt mode |
| Long document | Article mode |
| Uncertain | Prompt mode (default) |
**Edge cases:** Empty → ask. Invalid path → fallback to ~/.claude/skills/. Ambiguous → ask.Related Skills
autoresearch-pro
Automatically improve OpenClaw skills, prompts, or articles through iterative mutation-testing loops. Inspired by Karpathy's autoresearch. Use when user says 'optimize [skill]', 'autoresearch [skill]', 'improve my skill', 'optimize this prompt', 'improve my prompt', 'polish this article', 'improve this article', or explicitly requests quality improvement for any text-based content. Supports three modes: skill (SKILL.md files), prompt (any prompt text), and article (any document).
autoresearch-loop
Apply Karpathy's autoresearch methodology to autonomously iterate and improve anything measurable — Claude skills, n8n workflows, system prompts, business processes, or any artifact with a clear quality metric. Inspired by github.com/karpathy/autoresearch (56k stars). The loop: propose a change → test it → measure against the target metric → keep if better, discard if not → repeat indefinitely. ALWAYS trigger this skill when the user says things like: "improve this skill automatically", "iterate on this workflow", "run autoresearch on", "autonomous improvement loop", "overnight experiments", "keep iterating until it's better", "run experiments on this", "optimize this automatically", or any request to have Claude autonomously make and test changes to something measurable. Also trigger for requests to "set up an improvement loop", "run the autoresearch method", or anything involving iterative autonomous experimentation.
Autoresearch Skill
## Trigger
autoresearch-agent
Autonomous experiment loop that optimizes any file by a measurable metric. Inspired by Karpathy's autoresearch. The agent edits a target file, runs a fixed evaluation, keeps improvements (git commit), discards failures (git reset), and loops indefinitely. Use when: user wants to optimize code speed, reduce bundle/image size, improve test pass rate, optimize prompts, improve content quality (headlines, copy, CTR), or run any measurable improvement loop. Requires: a target file, an evaluation command that outputs a metric, and a git repo.
autoresearch
Autonomous AI research skill for running automated neural network experiments. This skill should be used when the user wants to set up autonomous AI research experiments, run automated neural network training, conduct autonomous machine learning research, or let AI agents experiment with model architectures and hyperparameters. Based on Andrej Karpathy's autoresearch project, this skill enables AI agents to autonomously modify training code, run experiments, evaluate results, and iteratively improve models. Use when: (1) Setting up autonomous research experiments, (2) Running automated neural network training, (3) Conducting AI-driven research optimization, (4) Experimenting with model architectures and hyperparameters, (5) Implementing autonomous research loops, or (6) When the user mentions "autonomous research", "AI experiments", "automated training", "neural network optimization", or "autoresearch".
codex-autoresearch-loop
Self-directed iterative research skill for Codex that continuously cycles through modify, verify, retain or discard, and repeat until a measurable goal is reached.
autoresearchclaw-autonomous-research
Fully autonomous research pipeline that turns a topic idea into a complete academic paper with real citations, experiments, and conference-ready LaTeX.
autoresearch-genealogy
Structured prompts, vault templates, and autonomous research workflows for AI-assisted genealogy using Claude Code.
---
name: article-factory-wechat
humanizer
Remove signs of AI-generated writing from text. Use when editing or reviewing text to make it sound more natural and human-written. Based on Wikipedia's comprehensive "Signs of AI writing" guide. Detects and fixes patterns including: inflated symbolism, promotional language, superficial -ing analyses, vague attributions, em dash overuse, rule of three, AI vocabulary words, negative parallelisms, and excessive conjunctive phrases.
find-skills
Helps users discover and install agent skills when they ask questions like "how do I do X", "find a skill for X", "is there a skill that can...", or express interest in extending capabilities. This skill should be used when the user is looking for functionality that might exist as an installable skill.
tavily-search
Use Tavily API for real-time web search and content extraction. Use when: user needs real-time web search results, research, or current information from the web. Requires Tavily API key.