idea-discovery-robot

Workflow 1 adaptation for robotics and embodied AI. Orchestrates a robotics-aware literature survey, idea generation, novelty check, and critical review to go from a broad robotics direction to benchmark-grounded, simulation-first ideas. Use when the user says "robotics idea discovery", "机器人找idea" (find robotics ideas), "embodied AI idea", "机器人方向探索" (explore robotics directions), "sim2real 选题" (sim2real topic selection), or wants ideas for manipulation, locomotion, navigation, drones, humanoids, or general robot learning.

5,407 stars

by wanshuiyin

Best use case

idea-discovery-robot is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using idea-discovery-robot should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/idea-discovery-robot/SKILL.md --create-dirs "https://raw.githubusercontent.com/wanshuiyin/Auto-claude-code-research-in-sleep/main/skills/idea-discovery-robot/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/idea-discovery-robot/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How idea-discovery-robot Compares

Feature / Agent	idea-discovery-robot	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

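What does this skill do?

It chains /research-lit, /idea-creator, /novelty-check, and /research-review into one pipeline that turns a broad robotics direction into benchmark-grounded, simulation-first research ideas.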

Where can I find the source code?

The source code is in the wanshuiyin/Auto-claude-code-research-in-sleep repository on GitHub, under skills/idea-discovery-robot.

SKILL.md Source

# Robotics Idea Discovery Pipeline

Orchestrate a robotics-specific idea discovery workflow for: **$ARGUMENTS**

## Overview

This skill chains four sub-skills into a single automated pipeline:

```
/research-lit → /idea-creator (robotics framing) → /novelty-check → /research-review
  (survey)              (filter + pilot plan)         (verify novel)    (critical feedback)
```

Every phase must be grounded in these robotics-specific constraints:
- **Embodiment**: arm, mobile manipulator, drone, humanoid, quadruped, autonomous car, etc.
- **Task family**: grasping, insertion, locomotion, navigation, manipulation, rearrangement, multi-step planning
- **Observation + action interface**: RGB/RGB-D/tactile/language; torque/velocity/waypoints/end-effector actions
- **Simulator / benchmark availability**: simulation-first by default
- **Real robot constraints**: hardware availability, reset cost, safety, operator time
- **Evaluation quality**: success rate plus failure cases, safety violations, intervention count, latency, sample efficiency
- **Sim2real story**: whether the idea can stay in sim, needs offline logs, or truly requires hardware

The goal is not to produce flashy demos. The goal is to produce ideas that are:
- benchmarkable
- falsifiable
- feasible with available robotics infrastructure
- interesting even if the answer is negative
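
The evaluation-quality constraint above asks for more than a success rate. A minimal sketch of what such a summary could look like, assuming per-episode logs are available as dictionaries; the field names and the four episodes are placeholders for illustration, not output from any real benchmark:

```python
from statistics import mean

# Placeholder episode logs; every value here is made up for illustration.
episodes = [
    {"id": "ep0", "success": True,  "safety_violations": 0, "interventions": 0, "latency_ms": 40},
    {"id": "ep1", "success": True,  "safety_violations": 1, "interventions": 0, "latency_ms": 50},
    {"id": "ep2", "success": False, "safety_violations": 0, "interventions": 2, "latency_ms": 60},
    {"id": "ep3", "success": True,  "safety_violations": 0, "interventions": 0, "latency_ms": 50},
]


def summarize(logs):
    """Report success together with failure-side metrics, not success alone."""
    n = len(logs)
    return {
        "success_rate": sum(e["success"] for e in logs) / n,
        "safety_violations": sum(e["safety_violations"] for e in logs),
        "interventions_per_episode": sum(e["interventions"] for e in logs) / n,
        "mean_latency_ms": mean([e["latency_ms"] for e in logs]),
        "failure_cases": [e["id"] for e in logs if not e["success"]],
    }


summary = summarize(episodes)
```

Keeping the failed episode ids next to the aggregate numbers makes it harder to report a headline success rate without looking at what went wrong.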

## Constants

- **MAX_PILOT_IDEAS = 3** — Validate at most 3 top ideas deeply
- **PILOT_MODE = `sim-first`** — Prefer simulation or offline-log pilots before any hardware execution
- **REAL_ROBOT_PILOTS = `explicit approval only`** — Never assume physical robot access or approval
- **AUTO_PROCEED = true** — If user does not respond at checkpoints, proceed with the best sim-first option
- **REVIEWER_MODEL = `gpt-5.4`** — External reviewer model via Codex MCP
- **TARGET_VENUES = CoRL, RSS, ICRA, IROS, RA-L** — Default novelty and reviewer framing

> Override inline, e.g. `/idea-discovery-robot "bimanual manipulation" — only sim ideas, no real robot` or `/idea-discovery-robot "drone navigation" — focus on CoRL/RSS, 2 pilot ideas max`
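
One way to picture these constants and the inline overrides is as a small immutable config object. This is only a sketch: the skill itself reads the constants as prose instructions, and `PipelineConfig` and its field names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical container mirroring the constants listed above."""
    max_pilot_ideas: int = 3
    pilot_mode: str = "sim-first"
    real_robot_pilots: str = "explicit approval only"
    auto_proceed: bool = True
    reviewer_model: str = "gpt-5.4"
    target_venues: Tuple[str, ...] = ("CoRL", "RSS", "ICRA", "IROS", "RA-L")


DEFAULTS = PipelineConfig()

# Mirrors the second override example: "focus on CoRL/RSS, 2 pilot ideas max".
drone_run = replace(DEFAULTS, target_venues=("CoRL", "RSS"), max_pilot_ideas=2)
```

An override changes only the fields it names; everything else, including the sim-first pilot mode, keeps its default.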

## Execution Rule

Follow the phases in order. Do **not** stop after a checkpoint unless:
- the user explicitly says to stop, or
- the user asks to change scope and re-run an earlier phase

If `AUTO_PROCEED=true` and the user does not respond, continue immediately to the next phase using the strongest **sim-first, benchmark-grounded** option.
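
The checkpoint behavior described above can be sketched as a small decision function. The `Reply` enum and the action names are hypothetical, and the `"wait"` branch for `AUTO_PROCEED=false` is an assumption, since the text only specifies the `true` case.

```python
from enum import Enum


class Reply(Enum):
    NONE = "no response"
    APPROVE = "approve"
    STOP = "stop"
    CHANGE_SCOPE = "change scope"


def next_action(reply: Reply, auto_proceed: bool = True) -> str:
    """Decide what happens after a checkpoint (illustrative only)."""
    if reply is Reply.STOP:
        return "stop"
    if reply is Reply.CHANGE_SCOPE:
        return "rerun_earlier_phase"
    if reply is Reply.APPROVE:
        return "next_phase"
    # No response: continue only when AUTO_PROCEED is on.
    # Waiting otherwise is an assumption, not stated in the skill.
    return "next_phase" if auto_proceed else "wait"
```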

## Phase 0: Frame the Robotics Problem

Before generating ideas, extract or infer this **Robotics Problem Frame** from `$ARGUMENTS` and local project context:

- **Embodiment**
- **Task family**
- **Environment type**: tabletop, warehouse, home, outdoor, aerial, driving, legged terrain
- **Observation modalities**
- **Action interface / controller abstraction**
- **Learning regime**: RL, imitation, behavior cloning, world model, planning, VLA/VLM, classical robotics, hybrid
- **Available assets**: simulator, benchmark suite, teleop data, offline logs, existing codebase, real hardware
- **Compute budget**
- **Safety constraints**
- **Desired contribution type**: method, benchmark, diagnosis, systems, sim2real, data curation

If some fields are missing, make explicit assumptions and default to:
- **simulation-first**
- **public benchmark preferred**
- **no real robot execution**

Write this frame into working notes before moving on. Every later decision should reference it.
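
As a rough illustration, the frame could be kept as a typed record so that missing fields and the assumptions made for them stay visible. The class, field names, and `apply_defaults` helper below are hypothetical; the skill only requires that the frame be written into working notes.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The three documented fallbacks when fields are missing.
DEFAULT_ASSUMPTIONS = (
    "simulation-first",
    "public benchmark preferred",
    "no real robot execution",
)


@dataclass
class RoboticsProblemFrame:
    embodiment: Optional[str] = None
    task_family: Optional[str] = None
    environment_type: Optional[str] = None
    observation_modalities: List[str] = field(default_factory=list)
    action_interface: Optional[str] = None
    learning_regime: Optional[str] = None
    available_assets: List[str] = field(default_factory=list)
    compute_budget: Optional[str] = None
    safety_constraints: Optional[str] = None
    contribution_type: Optional[str] = None
    assumptions: List[str] = field(default_factory=list)


def missing_fields(frame: RoboticsProblemFrame) -> List[str]:
    """Names of frame fields that are still empty."""
    return [name for name, value in vars(frame).items()
            if name != "assumptions" and not value]


def apply_defaults(frame: RoboticsProblemFrame) -> RoboticsProblemFrame:
    """Record the gaps and the default assumptions explicitly."""
    gaps = missing_fields(frame)
    if gaps:
        frame.assumptions.append("missing fields: " + ", ".join(gaps))
        frame.assumptions.extend(DEFAULT_ASSUMPTIONS)
    return frame


partial = apply_defaults(
    RoboticsProblemFrame(embodiment="single-arm", task_family="insertion")
)
```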

## Phase 1: Robotics Literature Survey

Invoke:

```
/research-lit "$ARGUMENTS — focus venues: CoRL, RSS, ICRA, IROS, RA-L, TRO, Science Robotics"
```

Then reorganize the findings using a robotics lens instead of a generic ML lens.

### Build a Robotics Landscape Matrix

For each relevant paper, classify:

| Axis | Examples |
|------|----------|
| Embodiment | single-arm, mobile manipulator, humanoid, drone, quadruped |
| Task | pick-place, insertion, navigation, locomotion, long-horizon rearrangement |
| Learning setup | RL, BC, IL, offline RL, world model, planning, diffusion policy |
| Observation | RGB, RGB-D, proprioception, tactile, language |
| Action abstraction | torque, joint velocity, end-effector delta pose, waypoint planner |
| Eval regime | pure sim, sim+real, real-only, offline benchmark |
| Benchmark | ManiSkill, RLBench, Isaac Lab, Habitat, Meta-World, CALVIN, LIBERO, custom |
| Metrics | success rate, collision rate, intervention count, path length, latency, energy |
| Main bottleneck | sample inefficiency, brittleness, reset cost, perception drift, sim2real gap |
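
A matrix like this can be kept as plain records and grouped along any axis to see where the field is crowded or empty. The sketch below is illustrative: the paper ids and their classifications are placeholders, not real survey results, and `group_by` is a hypothetical helper.

```python
from collections import defaultdict

# Placeholder rows only; ids and classifications are made up for illustration.
papers = [
    {"id": "paper-A", "embodiment": "single-arm", "benchmark": "LIBERO",
     "bottleneck": "brittleness"},
    {"id": "paper-B", "embodiment": "single-arm", "benchmark": "LIBERO",
     "bottleneck": "sim2real gap"},
    {"id": "paper-C", "embodiment": "quadruped", "benchmark": "Isaac Lab",
     "bottleneck": "sample inefficiency"},
]


def group_by(rows, *axes):
    """Group paper ids by the chosen matrix axes."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[axis] for axis in axes)
        groups[key].append(row["id"])
    return dict(groups)


matrix = group_by(papers, "embodiment", "benchmark")
by_bottleneck = group_by(papers, "bottleneck")
```

Cells with many papers suggest a saturated setting; combinations that never appear are candidate gaps to raise at the checkpoint.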

### Search Priorities

When refining the survey, prioritize:
- recent work from **CoRL, RSS, ICRA, IROS, RA-L**
- recent arXiv papers from the last 6-12 months
- benchmark papers and follow-up reproductions
- negative-result or diagnosis papers if they reveal system bottlenecks

### What to Look For

Do not stop at "who got the best success rate." Explicitly identify:
- recurring failure modes that papers do not fix
- benchmarks that are saturated or misleading
- places where embodiment changes invalidate prior conclusions
- methods that only work with privileged observations
- ideas whose reported gains come from reset engineering, reward shaping, or hidden infrastructure
- task families where evaluation quality is weak even if performance numbers look high

**Checkpoint:** Present the landscape to the user in robotics terms:

```
🤖 Robotics survey complete. I grouped the field by embodiment, benchmark, action interface, and sim2real setup.

Main gaps:
1. [...]
2. [...]
3. [...]

Should I generate ideas under this framing, or should I narrow to a specific robot / benchmark / modality?
```

- **User approves** (or no response + AUTO_PROCEED=true) → proceed to Phase 2 with the best robotics frame.
- **User requests changes** (e.g. narrower embodiment, different benchmark family, no sim2real, no hardware) → refine the robotics frame, re-run Phase 1, and present again.

## Phase 2: Robotics-Specific Idea Generation and Filtering

Generate ideas only after the robotics frame is explicit.

Invoke the existing idea generator, but pass the **Robotics Problem Frame** and landscape matrix into the prompt so it does not produce generic ML ideas:

```
/idea-creator "$ARGUMENTS — robotics frame: [paste Robotics Problem Frame] — focus venues: CoRL, RSS, ICRA, IROS, RA-L — benchmark-specific ideas only — sim-first pilots — no real-robot execution without explicit approval — require failure metrics and baseline clarity"
```

Then rewrite and filter the output using the robotics-specific rules below.

Each candidate idea must include:
- **One-sentence summary**
- **Target embodiment**
- **Target benchmark / simulator / dataset**
- **Core bottleneck being addressed**
- **Minimum sim-first pilot**
- **Mandatory metrics**
- **Expected failure mode if the idea does not work**
- **Whether the idea truly needs real hardware**

### Good Robotics Idea Patterns

Prefer ideas that:
- expose a real bottleneck in perception-action coupling
- improve robustness under embodiment or environment shift
- reduce operator time, reset cost, or demonstration cost
- strengthen sim2real transfer with measurable mechanisms
- improve recovery, retry behavior, or failure detection
- create a better benchmark, diagnostic, or evaluation protocol
- test an assumption the community repeats but rarely measures

### Weak Robotics Idea Patterns

Downrank ideas that are mostly:
- "apply a foundation model / VLM / diffusion model to robot X" with no new bottleneck analysis
- demo-driven but not benchmarkable
- dependent on inaccessible hardware, custom sensors, or massive private datasets
- impossible to evaluate without a months-long infrastructure build
- only interesting if everything works perfectly

### Filtering Rules

For each idea, reject or heavily downrank if:
- no concrete simulator or benchmark is available
- no credible baseline exists
- no measurable metric exists beyond "looks better"
- real robot execution is required but hardware access is unclear
- the setup depends on privileged observations that make the claim weak
- the expected contribution disappears if evaluation is made fair
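
The filtering rules above amount to a checklist, which could be sketched as follows. The `Idea` fields are hypothetical, and several of them (for example whether a baseline is credible, or whether the claim survives fair evaluation) are judgment calls that this sketch simply takes as inputs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Idea:
    title: str
    benchmark: str = ""                 # simulator or benchmark name, empty if none
    baseline: str = ""                  # a credible baseline, empty if none
    metrics: Tuple[str, ...] = ()       # measurable metrics
    needs_real_robot: bool = False
    hardware_access_confirmed: bool = False
    relies_on_privileged_obs: bool = False
    survives_fair_eval: bool = True


def rejection_reasons(idea: Idea) -> List[str]:
    """Return every filtering rule the idea violates (empty list = keep)."""
    reasons = []
    if not idea.benchmark:
        reasons.append("no concrete simulator or benchmark")
    if not idea.baseline:
        reasons.append("no credible baseline")
    if not idea.metrics:
        reasons.append("no measurable metric")
    if idea.needs_real_robot and not idea.hardware_access_confirmed:
        reasons.append("real robot required but hardware access unclear")
    if idea.relies_on_privileged_obs:
        reasons.append("depends on privileged observations")
    if not idea.survives_fair_eval:
        reasons.append("contribution disappears under fair evaluation")
    return reasons
```

Returning all violated rules, rather than stopping at the first, gives the "killed because" text that the final report expects for eliminated ideas.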

**Checkpoint:** Present the ranked robotics ideas before novelty checking:

```
💡 Robotics ideas generated. Top candidates:

1. [Idea 1] — Embodiment: [...] — Benchmark: [...] — Pilot: sim/offline — Risk: LOW/MEDIUM/HIGH
2. [Idea 2] — Embodiment: [...] — Benchmark: [...] — Pilot: sim/offline — Risk: LOW/MEDIUM/HIGH
3. [Idea 3] — requires hardware / weak benchmark / high risk

Should I carry the top sim-first ideas into novelty checking and external review?
(If no response, I'll continue with the strongest benchmark-grounded ideas.)
```

- **User picks ideas** (or no response + AUTO_PROCEED=true) → proceed to Phase 3 with the top sim-first ideas, then continue to Phase 4 and Phase 5.
- **User wants different constraints** → update the robotics frame and re-run Phase 2.
- **User wants narrower scope** → go back to Phase 1 with a tighter embodiment / task / benchmark focus.

## Phase 3: Feasibility and Pilot Design

For the top ideas, design a **minimal validation package**.

If the repository already contains a usable simulator, benchmark harness, or offline dataset pipeline, you may validate the top 1-3 ideas there. If not, do **not** force execution. Produce a concrete pilot plan instead.

By default, pilots should be one of:
- **simulation pilot**
- **offline log / dataset pilot**
- **analysis-only pilot** using existing benchmark outputs

Only propose a real-robot pilot if the user explicitly wants that.

For each surviving idea, specify:

```markdown
- Embodiment:
- Benchmark / simulator:
- Baselines:
- Pilot type: sim / offline / real
- Compute estimate:
- Human/operator time:
- Success metrics:
- Failure metrics:
- Safety concerns:
- What result would count as positive signal:
- What negative result would still be publishable:
```

### Real Robot Rule

**Never auto-proceed to physical robot testing.** If an idea needs hardware:
- mark it as `needs physical validation`
- design the sim or offline precursor first
- ask for explicit user confirmation before any real-robot step

If no cheap sim/offline pilot exists, keep the idea in the report but label it **high execution risk**.
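
The Real Robot Rule can be read as a gate that never opens without explicit confirmation. A minimal sketch, assuming three boolean inputs; the function name and the return shape are hypothetical:

```python
def plan_hardware_step(needs_hardware: bool,
                       has_sim_or_offline_pilot: bool,
                       user_confirmed: bool = False) -> dict:
    """Label an idea and decide whether a real-robot step may be proposed."""
    labels = []
    if needs_hardware:
        labels.append("needs physical validation")
    if not has_sim_or_offline_pilot:
        labels.append("high execution risk")
    return {
        "labels": labels,
        # Never auto-proceed: hardware is allowed only after explicit approval.
        "real_robot_allowed": needs_hardware and user_confirmed,
    }
```

Note that `user_confirmed` defaults to `False`, so an unanswered checkpoint under `AUTO_PROCEED` can never unlock hardware in this sketch.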

After Phase 3, continue to Phase 4 even if you only produced a pilot plan rather than running a pilot. Lack of immediate execution is not a reason to stop the workflow.

## Phase 4: Deep Novelty Verification

For each top idea, run:

```
/novelty-check "[idea description with embodiment + task family + benchmark + sensor stack + controller/policy class + sim2real angle + target venues: CoRL/RSS/ICRA/IROS/RA-L]"
```

Robotics novelty checks must include:
- embodiment
- task family
- benchmark / simulator
- sensor stack
- controller / policy type
- sim2real or safety angle if relevant
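
To keep novelty queries from silently dropping one of the required fields, the query string could be assembled by a small helper that refuses incomplete input. This is a sketch only; `build_novelty_query` and the dictionary keys are hypothetical, and the output format is one reasonable choice rather than something `/novelty-check` prescribes.

```python
REQUIRED = ("embodiment", "task_family", "benchmark", "sensor_stack", "policy_type")
DEFAULT_VENUES = ("CoRL", "RSS", "ICRA", "IROS", "RA-L")


def build_novelty_query(summary: str, details: dict, venues=DEFAULT_VENUES) -> str:
    """Assemble the /novelty-check argument from the required robotics fields."""
    missing = [key for key in REQUIRED if not details.get(key)]
    if missing:
        raise ValueError("novelty check is missing: " + ", ".join(missing))
    parts = [summary]
    parts += [f"{key.replace('_', ' ')}: {details[key]}" for key in REQUIRED]
    # The sim2real or safety angle is optional ("if relevant").
    if details.get("sim2real_angle"):
        parts.append(f"sim2real/safety angle: {details['sim2real_angle']}")
    parts.append("target venues: " + "/".join(venues))
    return "; ".join(parts)


query = build_novelty_query(
    "retry-aware insertion policy",
    {
        "embodiment": "single-arm",
        "task_family": "insertion",
        "benchmark": "ManiSkill",
        "sensor_stack": "RGB-D + proprioception",
        "policy_type": "diffusion policy",
    },
)
```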

Be especially skeptical of ideas that are just:
- old method + new benchmark
- VLA/VLM + standard manipulation benchmark
- sim2real claim without new transfer mechanism

If the method is not novel but the **finding** or **evaluation protocol** is, say that explicitly.

## Phase 5: External Robotics Review

Invoke:

```
/research-review "[top idea with robotics framing, embodiment, benchmark, baselines, pilot plan, evaluation metrics, and sim2real/hardware risks — review as CoRL/RSS/ICRA reviewer]"
```

Frame the reviewer as a senior **CoRL / RSS / ICRA** reviewer. Ask them to focus on:
- whether the contribution is really new for robotics, not just ML
- the minimum benchmark package needed for credibility
- whether the sim2real story is justified
- missing baselines or failure analyses
- whether the idea survives realistic infrastructure constraints

Update the report with the reviewer's minimum viable evidence package.

## Phase 6: Final Report

Write or update `IDEA_REPORT.md` with a robotics-specific structure so it stays compatible with downstream workflows.

```markdown
# Robotics Idea Discovery Report

**Direction**: $ARGUMENTS
**Date**: [today]
**Pipeline**: research-lit → idea-creator (robotics framing) → novelty-check → research-review

## Robotics Problem Frame
- Embodiment:
- Task family:
- Observation / action interface:
- Available assets:
- Constraints:

## Landscape Matrix
[grouped by embodiment, benchmark, and bottleneck]

## Ranked Ideas

### Idea 1: [title] — RECOMMENDED
- Embodiment:
- Benchmark / simulator:
- Bottleneck addressed:
- Pilot type: sim / offline / real
- Positive signal:
- Novelty:
- Reviewer score:
- Hardware risk:
- Next step:

## Eliminated Ideas
- [idea] — killed because benchmark unclear / hardware inaccessible / novelty weak / no fair evaluation

## Evidence Package for the Top Idea
- Required baselines:
- Required metrics:
- Required failure cases:
- Whether real robot evidence is mandatory:

## Next Steps
- [ ] Implement sim-first pilot
- [ ] Run /novelty-check on the final idea wording
- [ ] Only after approval: consider hardware validation
```

## Key Rules

- **Simulation first.** Hardware is never the default.
- **Benchmark specificity is mandatory.** No benchmark, no serious idea.
- **Evaluation must include failures.** Success rate alone is not enough.
- **Embodiment matters.** Do not assume a result on one robot transfers to another.
- **Avoid foundation-model theater.** Novel terminology is not novelty.
- **Infrastructure realism matters.** Operator time, reset burden, and safety count as research constraints.
- **If the contribution is mainly diagnostic or evaluative, say so.** That can still be publishable.

## Composing with Later Work

After this workflow identifies a strong robotics idea:

```
/idea-discovery-robot "direction"   ← you are here
implement sim-first pilot
/run-experiment                     ← if infrastructure exists
/auto-review-loop "top robotics idea"
```

If no simulator or benchmark is available yet, stop at the report and ask the user to choose whether to build infrastructure or pivot to a more executable idea.

Related Skills

idea-discovery

5407

from wanshuiyin/Auto-claude-code-research-in-sleep

Workflow 1: Full idea discovery pipeline. Orchestrates research-lit → idea-creator → novelty-check → research-review to go from a broad research direction to validated, pilot-tested ideas. Use when the user says "找idea全流程" (full idea-finding pipeline), "idea discovery pipeline", "从零开始找方向" (find a direction from scratch), or wants the complete idea exploration workflow.

idea-creator

5407

from wanshuiyin/Auto-claude-code-research-in-sleep

Generate and rank research ideas given a broad direction. Use when the user says "找idea" (find ideas), "brainstorm ideas", "generate research ideas", "what can we work on", or wants to explore a research area for publishable directions.

vast-gpu

5407

from wanshuiyin/Auto-claude-code-research-in-sleep

Rent, manage, and destroy GPU instances on vast.ai. Use when user says "rent gpu", "vast.ai", "rent a server", "cloud gpu", or needs on-demand GPU without owning hardware.

system-profile

5407

from wanshuiyin/Auto-claude-code-research-in-sleep

Profile a target (script, process, GPU, memory, interconnect) using external tools and code instrumentation. Produces structured performance reports with actionable recommendations. Use when user says "profile", "benchmark", "bottleneck", or wants performance analysis.

training-check

5407

from wanshuiyin/Auto-claude-code-research-in-sleep

Periodically check WandB metrics during training to catch problems early (NaN, loss divergence, idle GPUs). Avoids wasting GPU hours on broken runs. Use when training is running and you want automated health checks.

serverless-modal

5407

from wanshuiyin/Auto-claude-code-research-in-sleep

Run GPU workloads on Modal — training, fine-tuning, inference, batch processing. Zero-config serverless: no SSH, no Docker, auto scale-to-zero. Use when user says "modal run", "modal training", "modal inference", "deploy to modal", "need a GPU", "run on modal", "serverless GPU", or needs remote GPU compute.

semantic-scholar

5407

from wanshuiyin/Auto-claude-code-research-in-sleep

Search published venue papers (IEEE, ACM, Springer, etc.) via Semantic Scholar API. Complements /arxiv (preprints) with citation counts, venue metadata, and TLDR. Use when user says "search semantic scholar", "find IEEE papers", "find journal papers", "venue papers", "citation search", or wants published literature beyond arXiv preprints.

run-experiment

5407

from wanshuiyin/Auto-claude-code-research-in-sleep

Deploy and run ML experiments on local, remote, Vast.ai, or Modal serverless GPU. Use when the user says "run experiment", "deploy to server", "跑实验" (run experiments), or needs to launch training jobs.

result-to-claim

5407

from wanshuiyin/Auto-claude-code-research-in-sleep

Use when experiments complete to judge what claims the results support, what they don't, and what evidence is still missing. Codex MCP evaluates results against intended claims and routes to next action (pivot, supplement, or confirm). Use after experiments finish — before writing the paper or running ablations.

research-review

5407

from wanshuiyin/Auto-claude-code-research-in-sleep

Get a deep critical review of research from GPT via Codex MCP. Use when user says "review my research", "help me review", "get external review", or wants critical feedback on research ideas, papers, or experimental results.

research-refine

5407

from wanshuiyin/Auto-claude-code-research-in-sleep

Turn a vague research direction into a problem-anchored, elegant, frontier-aware, implementation-oriented method plan via iterative GPT-5.4 review. Use when the user says "refine my approach", "帮我细化方案" (help me refine the plan), "decompose this problem", "打磨idea" (polish the idea), "refine research plan", "细化研究方案" (refine the research plan), or wants a concrete research method that stays simple, focused, and top-venue ready instead of a vague or overbuilt idea.

research-refine-pipeline

5407

from wanshuiyin/Auto-claude-code-research-in-sleep

Run an end-to-end workflow that chains `research-refine` and `experiment-plan`. Use when the user wants a one-shot pipeline from vague research direction to focused final proposal plus detailed experiment roadmap, or asks to "串起来" (chain them together), build a pipeline, do it end-to-end, or generate both the method and experiment plan together.