autopilot

Full autonomous execution from idea to working code

18,419 stars

Best use case

autopilot is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Full autonomous execution from idea to working code

Teams using autopilot should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/autopilot/SKILL.md --create-dirs "https://raw.githubusercontent.com/Yeachan-Heo/oh-my-codex/main/skills/autopilot/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/autopilot/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How autopilot Compares

Feature / AgentautopilotStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Full autonomous execution from idea to working code

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

<Purpose>
Autopilot takes a brief product idea and autonomously handles the full lifecycle: requirements analysis, technical design, planning, parallel implementation, QA cycling, and multi-perspective validation. It produces working, verified code from a 2-3 line description.
</Purpose>

<Use_When>
- User wants end-to-end autonomous execution from an idea to working code
- User says "autopilot", "auto pilot", "autonomous", "build me", "create me", "make me", "full auto", "handle it all", or "I want a/an..."
- Task requires multiple phases: planning, coding, testing, and validation
- User wants hands-off execution and is willing to let the system run to completion
</Use_When>

<Do_Not_Use_When>
- User wants to explore options or brainstorm -- use `plan` skill instead
- User says "just explain", "draft only", or "what would you suggest" -- respond conversationally
- User wants a single focused code change -- use `ralph` or delegate to an executor agent
- User wants to review or critique an existing plan -- use `plan --review`
- Task is a quick fix or small bug -- use direct executor delegation
</Do_Not_Use_When>

<Why_This_Exists>
Most non-trivial software tasks require coordinated phases: understanding requirements, designing a solution, implementing in parallel, testing, and validating quality. Autopilot orchestrates all of these phases automatically so the user can describe what they want and receive working code without managing each step.
</Why_This_Exists>

<Execution_Policy>
- Each phase must complete before the next begins
- Parallel execution is used within phases where possible (Phase 2 and Phase 4)
- QA cycles repeat up to 5 times; if the same error persists 3 times, stop and report the fundamental issue
- Validation requires approval from all reviewers; rejected items get fixed and re-validated
- Cancel with `/cancel` at any time; progress is preserved for resume
- If a deep-interview spec exists, use it as high-clarity phase input instead of re-expanding from scratch
- If input is too vague for reliable expansion, offer/trigger `$deep-interview` first
- Do not enter expansion/planning/execution-heavy phases until pre-context grounding exists; if fast execution is forced, proceed only with explicit risk notes
- Default to concise, evidence-dense progress and completion reporting unless the user or risk level requires more detail
- Treat newer user task updates as local overrides for the active workflow branch while preserving earlier non-conflicting constraints
- If correctness depends on additional inspection, retrieval, execution, or verification, keep using the relevant tools until the workflow is grounded
- Continue through clear, low-risk, reversible next steps automatically; ask only when the next step is materially branching, destructive, or preference-dependent
</Execution_Policy>

<Steps>
0. **Pre-context Intake (required before Phase 0 starts)**:
   - Derive a task slug from the request.
   - Load the latest relevant snapshot from `.omx/context/{slug}-*.md` when available.
   - If no snapshot exists, create `.omx/context/{slug}-{timestamp}.md` (UTC `YYYYMMDDTHHMMSSZ`) with:
     - Task statement
     - Desired outcome
     - Known facts/evidence
     - Constraints
     - Unknowns/open questions
     - Likely codebase touchpoints
   - If ambiguity remains high, run `explore` first for brownfield facts, then run `$deep-interview --quick <task>` before proceeding.
   - Carry the snapshot path into autopilot artifacts/state so all phases share grounded context.

1. **Phase 0 - Expansion**: Turn the user's idea into a detailed spec
   - If `.omx/specs/deep-interview-*.md` exists for this task: reuse it and skip redundant expansion work
   - If prompt is highly vague: route to `$deep-interview` for Socratic ambiguity-gated clarification
   - Analyst (THOROUGH tier): Extract requirements
   - Architect (THOROUGH tier): Create technical specification
   - Output: `.omx/plans/autopilot-spec.md`

2. **Phase 1 - Planning**: Create an implementation plan from the spec
   - Architect (THOROUGH tier): Create plan (direct mode, no interview)
   - Critic (THOROUGH tier): Validate plan
   - Output: `.omx/plans/autopilot-impl.md`

3. **Phase 2 - Execution**: Implement the plan using Ralph + Ultrawork
   - LOW-tier executor/search roles: Simple tasks
   - STANDARD-tier executor roles: Standard tasks
   - THOROUGH-tier executor/architect roles: Complex tasks
   - Run independent tasks in parallel

4. **Phase 3 - QA**: Cycle until all tests pass (UltraQA mode)
   - Build, lint, test, fix failures
   - Repeat up to 5 cycles
   - Stop early if the same error repeats 3 times (indicates a fundamental issue)

5. **Phase 4 - Validation**: Multi-perspective review in parallel
   - Architect: Functional completeness
   - Security-reviewer: Vulnerability check
   - Code-reviewer: Quality review
   - All must approve; fix and re-validate on rejection

6. **Phase 5 - Cleanup**: Clear all mode state via OMX MCP tools on successful completion
   - `state_clear({mode: "autopilot"})`
   - `state_clear({mode: "ralph"})`
   - `state_clear({mode: "ultrawork"})`
   - `state_clear({mode: "ultraqa"})`
   - Or run `/cancel` for clean exit
</Steps>

<Tool_Usage>
- Before first MCP tool use, call `ToolSearch("mcp")` to discover deferred MCP tools
- Use `ask_codex` with `agent_role: "architect"` for Phase 4 architecture validation
- Use `ask_codex` with `agent_role: "security-reviewer"` for Phase 4 security review
- Use `ask_codex` with `agent_role: "code-reviewer"` for Phase 4 quality review
- Agents form their own analysis first, then consult Codex for cross-validation
- If ToolSearch finds no MCP tools or Codex is unavailable, proceed without it -- never block on external tools
</Tool_Usage>

## State Management

Use `omx_state` MCP tools for autopilot lifecycle state.

- **On start**:
  `state_write({mode: "autopilot", active: true, current_phase: "expansion", started_at: "<now>", state: {context_snapshot_path: "<snapshot-path>"}})`
- **On phase transitions**:
  `state_write({mode: "autopilot", current_phase: "planning"})`
  `state_write({mode: "autopilot", current_phase: "execution"})`
  `state_write({mode: "autopilot", current_phase: "qa"})`
  `state_write({mode: "autopilot", current_phase: "validation"})`
- **On completion**:
  `state_write({mode: "autopilot", active: false, current_phase: "complete", completed_at: "<now>"})`
- **On cancellation/cleanup**:
  run `$cancel` (which should call `state_clear(mode="autopilot")`)


## Scenario Examples

**Good:** The user says `continue` after the workflow already has a clear next step. Continue the current branch of work instead of restarting or re-asking the same question.

**Good:** The user changes only the output shape or downstream delivery step (for example `make a PR`). Preserve earlier non-conflicting workflow constraints and apply the update locally.

**Bad:** The user says `continue`, and the workflow restarts discovery or stops before the missing verification/evidence is gathered.

<Examples>
<Good>
User: "autopilot A REST API for a bookstore inventory with CRUD operations using TypeScript"
Why good: Specific domain (bookstore), clear features (CRUD), technology constraint (TypeScript). Autopilot has enough context to expand into a full spec.
</Good>

<Good>
User: "build me a CLI tool that tracks daily habits with streak counting"
Why good: Clear product concept with a specific feature. The "build me" trigger activates autopilot.
</Good>

<Bad>
User: "fix the bug in the login page"
Why bad: This is a single focused fix, not a multi-phase project. Use direct executor delegation or ralph instead.
</Bad>

<Bad>
User: "what are some good approaches for adding caching?"
Why bad: This is an exploration/brainstorming request. Respond conversationally or use the plan skill.
</Bad>
</Examples>

<Escalation_And_Stop_Conditions>
- Stop and report when the same QA error persists across 3 cycles (fundamental issue requiring human input)
- Stop and report when validation keeps failing after 3 re-validation rounds
- Stop when the user says "stop", "cancel", or "abort"
- If requirements were too vague and expansion produces an unclear spec, pause and redirect to `$deep-interview` before proceeding
</Escalation_And_Stop_Conditions>

<Final_Checklist>
- [ ] All 5 phases completed (Expansion, Planning, Execution, QA, Validation)
- [ ] All validators approved in Phase 4
- [ ] Tests pass (verified with fresh test run output)
- [ ] Build succeeds (verified with fresh build output)
- [ ] State files cleaned up
- [ ] User informed of completion with summary of what was built
</Final_Checklist>

<Advanced>
## Configuration

Optional settings in `~/.codex/config.toml`:

```toml
[omx.autopilot]
maxIterations = 10
maxQaCycles = 5
maxValidationRounds = 3
pauseAfterExpansion = false
pauseAfterPlanning = false
skipQa = false
skipValidation = false
```

## Resume

If autopilot was cancelled or failed, run `/autopilot` again to resume from where it stopped.

## Recommended Clarity Pipeline

For ambiguous requests, prefer:

```
deep-interview -> ralplan -> autopilot
```

- `deep-interview`: ambiguity-gated Socratic requirements
- `ralplan`: consensus planning (planner/architect/critic)
- `autopilot`: execution + QA + validation

## Best Practices for Input

1. Be specific about the domain -- "bookstore" not "store"
2. Mention key features -- "with CRUD", "with authentication"
3. Specify constraints -- "using TypeScript", "with PostgreSQL"
4. Let it run -- avoid interrupting unless truly needed

## Pipeline Orchestrator (v0.8+)

Autopilot can be driven by the configurable pipeline orchestrator (`src/pipeline/`), which
sequences stages through a uniform `PipelineStage` interface:

```
RALPLAN (consensus planning) -> team-exec (Codex CLI workers) -> ralph-verify (architect verification)
```

Pipeline configuration options:

```toml
[omx.autopilot.pipeline]
maxRalphIterations = 10    # Ralph verification iteration ceiling
workerCount = 2            # Number of Codex CLI team workers
agentType = "executor"     # Agent type for team workers
```

The pipeline persists state via `pipeline-state.json` and supports resume from the last
incomplete stage. See `src/pipeline/orchestrator.ts` for the full API.

## Troubleshooting

**Stuck in a phase?** Check TODO list for blocked tasks, run `state_read({mode: "autopilot"})`, or cancel and resume.

**QA cycles exhausted?** The same error 3 times indicates a fundamental issue. Review the error pattern; manual intervention may be needed.

**Validation keeps failing?** Review the specific issues. Requirements may have been too vague -- cancel and provide more detail.
</Advanced>