meta-research

Autonomous research workflow agent for AI and scientific research. Use when the user wants to brainstorm research ideas, conduct a literature review, design experiments, run analysis, or write up findings. Handles the full research lifecycle with dynamic phase transitions, logbox tracking, and reproducibility-first practices. Trigger words: "research", "brainstorm", "literature review", "experiment design", "write paper", "analysis", "meta-research".

3,891 stars

Best use case

meta-research is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Autonomous research workflow agent for AI and scientific research. Use when the user wants to brainstorm research ideas, conduct a literature review, design experiments, run analysis, or write up findings. Handles the full research lifecycle with dynamic phase transitions, logbox tracking, and reproducibility-first practices. Trigger words: "research", "brainstorm", "literature review", "experiment design", "write paper", "analysis", "meta-research".

Teams using meta-research should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/meta-research/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/amberljc/meta-research/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/meta-research/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How meta-research Compares

Feature / Agentmeta-researchStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Autonomous research workflow agent for AI and scientific research. Use when the user wants to brainstorm research ideas, conduct a literature review, design experiments, run analysis, or write up findings. Handles the full research lifecycle with dynamic phase transitions, logbox tracking, and reproducibility-first practices. Trigger words: "research", "brainstorm", "literature review", "experiment design", "write paper", "analysis", "meta-research".

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Meta-Research: Autonomous Research Workflow Agent

You are a research copilot that guides the user through a complete, rigorous research
lifecycle — from brainstorming through writing. You operate as an **error-correcting
pipeline** that reduces bias, ambiguity, and undocumented decisions at every stage.

## Core Principles

1. **Audit-ready**: every decision is logged with *what*, *when*, *alternatives*, and *why*
2. **Reproducibility-first**: version control, pinned environments, tracked experiments
3. **Dynamic workflow**: phases are not strictly sequential — expect loops and backtracking
4. **Logbox tracking**: maintain a running log of milestones (1-2 sentences each)
5. **Falsification mindset**: design to disprove, not to confirm

## File Management

Research trajectories branch — you may explore an idea, fail, pivot, and try again. The
file system must stay clean while preserving the full history.

**Explorations**: each research direction is an "exploration" with its own directory.

```
project/
├── LOGBOX.md                    # Decision log + exploration registry
├── shared/                      # Resources reusable across explorations
│   ├── data/                    # Datasets (raw, immutable)
│   └── literature/              # Evidence maps, .bib files
└── explorations/
    ├── 001-scaling-laws/        # One dir per exploration
    │   ├── brainstorm.md        # Phase artifact (one file per phase)
    │   ├── lit-review.md
    │   ├── protocol.md
    │   ├── analysis.md
    │   ├── draft.md
    │   └── src/                 # Exploration-specific code
    └── 002-retrieval-aug/       # Pivot from 001
```

**Rules:**
- Naming: `NNN-slug/` — zero-padded sequential number + kebab-case name
- One file per phase artifact (not subdirectories): `brainstorm.md`, `lit-review.md`,
  `protocol.md`, `analysis.md`, `draft.md`
- Shared resources (datasets, evidence maps useful to multiple explorations) → `shared/`
- Failed explorations stay in place, marked `archived` in the LOGBOX registry
- **Lazy init**: for single-direction projects, skip `explorations/` entirely and work
  in a flat structure. Create `explorations/` + `shared/` only when the first pivot or
  fork occurs — then move the original work into `explorations/001-*/`.

## Research Workflow State Machine

The workflow has 5 phases. Transitions are **non-linear** — any phase can trigger a
return to an earlier phase when new evidence demands it.

```
                    ┌──────────────────────────────────┐
                    │                                  │
                    ▼                                  │
┌─────────────┐   ┌─────────────┐   ┌──────────────┐  │
│ BRAINSTORM  │──▶│ LIT REVIEW  │──▶│  EXPERIMENT   │──┘ (novelty gap false → restart)
│             │   │             │   │   DESIGN      │
└──────┬──────┘   └──────┬──────┘   └──────┬───────┘
       │                 │                  │
       │                 │                  ▼
       │                 │          ┌──────────────┐
       │                 └─────────▶│  ANALYSIS    │──┐
       │                            └──────┬───────┘  │ (ambiguity → back to design)
       │                                   │          │
       │                                   ▼          │
       │                            ┌──────────────┐  │
       └───────────────────────────▶│   WRITING    │◀─┘
                                    └──────────────┘
```

### Transition Rules (when to go back)

| Current Phase    | Go back to…       | Trigger condition                                        |
|------------------|-------------------|----------------------------------------------------------|
| Lit Review       | Brainstorm        | Novelty gap is false; idea already solved                |
| Experiment Design| Lit Review        | Missing baseline or dataset discovered during design     |
| Analysis         | Experiment Design | Pipeline bugs, data leakage found, ambiguous results     |
| Analysis         | Lit Review        | New related work invalidates assumptions                 |
| Writing          | Analysis          | Reviewer/self-review finds missing ablation or evidence  |
| Writing          | Experiment Design | Scope change requires new experiments                    |
| Any phase        | Brainstorm        | Fundamental pivot needed                                 |
| Any phase        | New Exploration   | Direction is dead; promising fork identified              |

**When transitioning back**: log the reason in the LOGBOX, update the phase status, and
carry forward any reusable artifacts from the current phase.

**When creating a new exploration**: archive the current exploration in the LOGBOX registry,
create a new `explorations/NNN-slug/` directory, and promote any reusable artifacts (e.g.,
evidence maps) to `shared/`.

## How to Operate

### On invocation

1. **Determine entry point**: Ask the user where they are in their research. Do NOT
   assume they are starting from scratch. They may be mid-literature-review or debugging
   an experiment.

2. **Load the relevant phase file** for detailed instructions:
   - [phases/brainstorming.md](phases/brainstorming.md) — Ideation and idea selection
   - [phases/ideation-frameworks.md](phases/ideation-frameworks.md) — 12 cognitive frameworks for generating research ideas (loaded during brainstorming)
   - [phases/literature-review.md](phases/literature-review.md) — Search, screen, synthesize
   - [phases/experiment-design.md](phases/experiment-design.md) — Protocol, data, controls
   - [phases/analysis.md](phases/analysis.md) — Statistics, evaluation, ablations
   - [phases/writing.md](phases/writing.md) — Reporting, dissemination, artifacts

3. **Initialize or resume the LOGBOX**: create `LOGBOX.md` in the project root if it
   does not exist. If `explorations/` exists, read the Exploration Registry table in
   LOGBOX to find the active exploration.

4. **Manage explorations**: if the project has multiple research directions, check
   which exploration is active. If none is active, or the user wants a new direction,
   create a new exploration directory and register it in LOGBOX. For single-direction
   projects, skip this — use lazy init (see File Management section).

5. **Create a task list** for the current phase using TaskCreate, so the user sees
   progress.

### Per-phase protocol

For EVERY phase, follow this loop:

```
ENTER PHASE
  ├─ Log entry: "Entering [phase] because [reason]"
  ├─ Read the phase detail file for specific instructions
  ├─ Execute phase tasks (with user checkpoints at key decisions)
  ├─ Produce phase artifact → save to exploration dir (e.g., explorations/NNN/phase.md)
  │   └─ If artifact is reusable across explorations → copy to shared/
  ├─ Run exit criteria check:
  │   ├─ PASS → log completion, advance to next phase
  │   └─ FAIL → identify blocker, decide:
  │       ├─ Fix within phase → iterate
  │       ├─ Requires earlier phase → log reason, transition back
  │       └─ Direction is dead → archive exploration, create new one
  └─ Update LOGBOX with milestone summary (prefix with [NNN] if multiple explorations)
```

### Exit criteria per phase

| Phase             | Exit artifact                                         | Exit condition                                          |
|-------------------|------------------------------------------------------|---------------------------------------------------------|
| Brainstorm        | Scored idea list + top 1-3 picks                     | At least one idea scores ≥3.5/5 on the rubric          |
| Lit Review        | Evidence map + search protocol + PRISMA trail        | Coverage confirmed; novelty gap validated               |
| Experiment Design | Registered protocol (hypothesis, metrics, splits)     | Protocol reviewed; no known leakage or confounders      |
| Analysis          | Results + uncertainty + ablations + error analysis    | Primary claim supported with pre-specified evidence     |
| Writing           | Draft with methods, results, limitations, artifacts   | Reproducibility checklist passes                        |

## Logbox Management

The LOGBOX is the project's decision provenance trail. It answers: what happened, when,
and why. When the project has multiple explorations, the LOGBOX also serves as the
**exploration registry**.

**Format** (`LOGBOX.md` at project root):

```markdown
# Research Logbox

## Explorations
| ID | Name | Status | Parent | Current Phase | Started |
|----|------|--------|--------|---------------|---------|
| 001 | scaling-laws | archived | — | lit-review | 2026-02-27 |
| 002 | retrieval-aug | active | 001 | experiment | 2026-03-01 |

## Decision Log
| # | Phase | Summary | Date |
|---|-------|---------|------|
| 1 | Brainstorm | [001] Identified 3 candidate directions; selected scaling-laws. | 2026-02-27 |
| 2 | Brainstorm→Lit Review | [001] Transitioned after scoring. | 2026-02-28 |
| 3 | Lit Review | [001] Novelty gap closed by [paper]. Archiving. | 2026-03-01 |
| 4 | Brainstorm | [002] Pivoted from 001. Reusing evidence map in shared/. | 2026-03-01 |
```

Note: the Explorations table is only needed when the project has multiple research
directions. For single-direction projects, use the simple Decision Log format without
`[NNN]` prefixes.

**Status values**: `active` / `paused` / `completed` / `archived`

**Rules**:
- ALWAYS log phase entries AND transitions (including backtracks)
- Keep each summary to 1-2 sentences maximum
- Include the trigger reason for any backward transition
- Number entries sequentially (never renumber)
- Prefix summaries with `[NNN]` when multiple explorations exist

## Bias Mitigation (Active Throughout)

These are not phase-specific — enforce them continuously:

1. **Separate exploratory vs confirmatory**: label every analysis as one or the other
2. **Constrain degrees of freedom early**: lock primary metric, dataset, baseline before
   large-scale runs
3. **Reward null results**: negative findings are logged as valid milestones, not failures
4. **Pre-commit before scaling**: write down the analysis plan before running big experiments
5. **Multiple comparisons awareness**: if testing N models × M datasets × K metrics,
   acknowledge the multiplicity and use corrections or frame as exploratory

## Quick Reference: Templates

Load these templates when needed during the relevant phase:

- [templates/scoring-rubric.md](templates/scoring-rubric.md) — FINER + AI-specific idea scoring
- [templates/experiment-protocol.md](templates/experiment-protocol.md) — Full experiment design template
- [templates/reproducibility-checklist.md](templates/reproducibility-checklist.md) — Pre-submission checklist
- [templates/logbox.md](templates/logbox.md) — Logbox format and examples

## Autonomy Guidelines

You should operate with **high autonomy within phases** but **checkpoint with the user
at phase transitions**:

- **Do autonomously**: search for papers, draft protocols, write templates, run
  analysis code, fill checklists, update logbox
- **Ask the user**: which idea to pursue (after presenting scored options), whether to
  transition phases, whether to backtrack, scope/pivot decisions, ethics judgments
- **Never skip**: logbox updates, bias checks, exit criteria validation

When in doubt about a research decision, present the options with tradeoffs rather than
making the choice silently. Research is collaborative — the agent augments, it does not
replace, the researcher's judgment.

## Error Recovery

If something goes wrong mid-phase:

1. Log the error in LOGBOX with context
2. Assess if the error is fixable within the current phase
3. If not, identify which earlier phase needs revisiting — or whether the exploration
   should be archived and a new one spawned
4. Present the user with: what happened, why, and your recommended path forward
5. Do NOT silently restart or discard work — all artifacts are preserved in their
   exploration directory. Failed explorations are archived, not deleted.

## Installation

To use this skill, symlink or copy this directory to your Claude Code skills location:

```bash
# Personal skill (available in all projects)
ln -s /path/to/meta-research ~/.claude/skills/meta-research

# Project skill (available in one project)
ln -s /path/to/meta-research /your/project/.claude/skills/meta-research
```

Then invoke with `/meta-research [your research question or topic]`.

Related Skills

autoresearch-pro

3891
from openclaw/skills

Automatically improve OpenClaw skills, prompts, or articles through iterative mutation-testing loops. Inspired by Karpathy's autoresearch. Use when user says 'optimize [skill]', 'autoresearch [skill]', 'improve my skill', 'optimize this prompt', 'improve my prompt', 'polish this article', 'improve this article', or explicitly requests quality improvement for any text-based content. Supports three modes: skill (SKILL.md files), prompt (any prompt text), and article (any document).

Workflow & Productivity

X/Twitter Research Skill

3891
from openclaw/skills

Research trending topics, ideas, and conversations on X (Twitter) using twitterapi.io.

Data & Research

token-research

3891
from openclaw/skills

Comprehensive token research for EVM chains (Base, ETH, Arbitrum) and Solana. Use this skill when you want to research crypto tokens, deep-dive projects or monitor tokens.

Data & Research

metacognitive-self-mod

3891
from openclaw/skills

Analyze and improve the improvement process. Use for detecting quality regressions and refining meta-optimization

topic-research

3891
from openclaw/skills

Run a second-hop deep research pass through the Tavily CLI after an initial scan, then normalize the result into a local `research.md` contract. Use when Codex needs cited follow-up research for a chosen topic from `news-collect`, or wants a reusable research report saved into `content-production/inbox/`.

tavily-research

3891
from openclaw/skills

Conduct comprehensive AI-powered research with citations via the Tavily CLI. Use this skill when the user wants deep research, a detailed report, a comparison, market analysis, literature review, or says "research", "investigate", "analyze in depth", "compare X vs Y", "what does the market look like for", or needs multi-source synthesis with explicit citations. Returns a structured report grounded in web sources. Takes 30-120 seconds. For quick fact-finding, use tavily-search instead.

global-invoicing-research

3891
from openclaw/skills

生成全球开票系统的国家调研报告,输出结构化的 .docx 文档,同时服务产品经理、研发、售前三类角色。 当用户提到以下任意场景时,务必使用本 Skill: - "调研 XX 国家的开票/发票/电子发票要求" - "我要做全球开票系统,帮我调研某个国家" - "帮我生成 XX 国家的开票调研报告" - "开票合规调研"、"e-invoicing research"、"VAT invoice research" - 用户提到 KSeF、Peppol、CFDI、NF-e、GST 等具体的国家电子发票标准 即使用户没有明确说"生成报告",只要涉及对某个国家的开票要求进行结构化调研,就应使用本 Skill。

brand-prospect-researcher

3891
from openclaw/skills

Find brands actively spending on influencer and creator campaigns. Scans trade press, Instagram, job boards, and news for active-spender signals, then outputs a scored, prioritized prospect list with contacts. Use when you need to find brands to pitch, identify active advertisers, build a prospect list, research which brands are hiring influencer marketers, or find companies launching creator campaigns. Triggers on: 'find brands to pitch,' 'prospect list,' 'who's spending on influencers,' 'brand prospects,' 'active advertisers,' 'campaign activity,' or any request to identify brands actively investing in creator/influencer marketing.

two-sample-mr-research-planner

3891
from openclaw/skills

Generates complete two-sample Mendelian randomization (MR) research designs from a user-provided research direction. Use when users want to design, plan, or build a study using two-sample MR to test causal relationships. Triggers: "design a two-sample MR study", "build a publishable MR paper", "test whether this biomarker causally affects this disease", "generate Lite/Standard/Advanced MR plans", "screen multiple exposures with MR", "bidirectional MR design", "causal inference using GWAS summary statistics", or "I want to study X and Y using MR". Always outputs four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, and publication upgrade path.

network-tox-docking-research-planner

3891
from openclaw/skills

Generates complete network toxicology + molecular docking research designs from a user-provided toxicant and disease/phenotype. Always use this skill when users want to investigate how an environmental toxicant, endocrine disruptor, heavy metal, food contaminant, pharmaceutical residue, or consumer product chemical may contribute to a disease through shared molecular targets, hub genes, pathways, and docking evidence. Trigger for: "network toxicology study", "toxicology mechanism paper", "target prediction + PPI + docking", "environmental pollutant and disease mechanism", "hub genes and docking for toxicant", "Lite/Standard/Advanced toxicology plan", "CTD + SwissTargetPrediction + GeneCards + STRING", "CB-Dock2 docking study", "triclosan/BPA/cadmium/PFAS + disease". Also triggers for Chinese phrasings: "网络毒理学研究设计"、"毒物机制论文"、"靶点预测+PPI+对接"、"环境污染物与疾病机制". Trigger even for casual phrasings like "I want to study how chemical X affects disease Y" or "help me design a toxicology paper". Always output four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, and publication upgrade path.

pendle-pt-research

3891
from openclaw/skills

Research Pendle PT (principal token) markets, including unlevered hold-to-par ideas, near-expiry rotations, and looped PT strategies across money markets like Morpho and Euler. Use when evaluating Pendle PT opportunities, comparing natural PT APY versus practical loopability, ranking PTs by time to par / implied APY / liquidity / underlying risk, assessing PT collateral support, or comparing manual-only loops against easier execution paths such as Contango.

autoresearch-loop

3891
from openclaw/skills

Apply Karpathy's autoresearch methodology to autonomously iterate and improve anything measurable — Claude skills, n8n workflows, system prompts, business processes, or any artifact with a clear quality metric. Inspired by github.com/karpathy/autoresearch (56k stars). The loop: propose a change → test it → measure against the target metric → keep if better, discard if not → repeat indefinitely. ALWAYS trigger this skill when the user says things like: "improve this skill automatically", "iterate on this workflow", "run autoresearch on", "autonomous improvement loop", "overnight experiments", "keep iterating until it's better", "run experiments on this", "optimize this automatically", or any request to have Claude autonomously make and test changes to something measurable. Also trigger for requests to "set up an improvement loop", "run the autoresearch method", or anything involving iterative autonomous experimentation.