# reprompter
Transform messy prompts into well-structured, effective prompts — single or multi-agent. Use when: "reprompt", "reprompt this", "clean up this prompt", "structure my prompt", rough text needing XML tags and best practices, "reprompter teams", "repromptverse", "run with quality", "smart run", "smart agents", "multi-agent marketing", "campaign swarm", "engineering swarm", "ops swarm", "research swarm", multi-agent tasks, audits, parallel work, anything going to agent teams. Don't use when: simple Q&A, pure chat, immediate execution-only tasks. See "Don't Use When" section for details. Outputs: Structured XML/Markdown prompt, quality score (before/after), optional team brief + per-agent sub-prompts, agent team output files, Agent Cards (plan/status/result). Success criteria: Single mode quality score ≥ 7/10; Repromptverse per-agent prompt quality score 8+/10; all required sections present, actionable and specific.
## SKILL.md Source
# RePrompter v10.0.0
> **Your prompt sucks. Let's fix that.** Single prompts or full agent teams — one skill, two modes. **v10.0 adds Dimension Interview + Agent Cards to Repromptverse.**
---
## Two modes
| Mode | Trigger | What happens |
|------|---------|-------------|
| **Single** | "reprompt this", "clean up this prompt" | Interview → structured prompt → score |
| **Repromptverse** | "reprompter teams", "repromptverse", "run with quality", "smart run", "smart agents", "campaign swarm", "engineering swarm", "ops swarm", "research swarm" | Dimension Interview → Plan team → Agent Cards → reprompt each agent → execute → Result Cards → evaluate → retry |
Auto-detection: if task mentions 2+ systems, "audit", or "parallel" → ask: "This looks like a multi-agent task. Want to use Repromptverse mode?"
Definition — **2+ systems** means at least two distinct technical domains that can be worked independently. Examples: frontend + backend, API + database, mobile app + backend, infrastructure + application code, security audit + cost audit.
## Don't use when
- User wants a simple direct answer (no prompt generation needed)
- User wants casual chat/conversation
- Task is immediate execution-only with no reprompting step
- Scope does not involve prompt design, structure, or orchestration
> Clarification: RePrompter **does** support code-related tasks (feature, bugfix, API, refactor) by generating better prompts. It does **not** directly apply code changes in Single mode. Direct code execution belongs to coding-agent unless Repromptverse execution mode is explicitly requested.
---
## Mode 1: Single prompt
### Process
1. **Receive raw input**
2. **Input guard** — if input is empty, a single word with no verb, or clearly not a task → ask the user to describe what they want to accomplish
- Reject examples: "hi", "thanks", "lol", "what's up", "good morning", random emoji-only input
- Accept examples: "fix login bug", "write API tests", "improve this prompt"
3. **Quick Mode gate** — under 20 words, single action, no complexity indicators → generate immediately
4. **Smart Interview** — use `AskUserQuestion` with clickable options (2-5 questions max)
5. **Generate + Score** — apply template, show before/after quality metrics
6. **Single-pass evaluator** — run self-eval rubric and do one delta rewrite if score < 7
### ⚠️ MUST GENERATE AFTER INTERVIEW
After interview completes, IMMEDIATELY:
1. Select template based on task type
2. Generate the full polished prompt
3. Show quality score (before/after table)
4. Ask if user wants to execute or copy
```
❌ WRONG: Ask interview questions → stop
✅ RIGHT: Ask interview questions → generate prompt → show score → offer to execute
```
### Interview questions
Ask via `AskUserQuestion`. **Max 5 questions total.**
**Standard questions** (priority order — drop lower ones if task-specific questions are needed):
1. Task type: Build Feature / Fix Bug / Refactor / Write Tests / API Work / UI / Security / Docs / Content / Research / Multi-Agent
- If user selects **Multi-Agent** while currently in **Single mode**, immediately transition to **Repromptverse Phase 1 (Team Plan)** and confirm team execution mode (Parallel vs Sequential).
2. Execution mode: Single Agent / Team (Parallel) / Team (Sequential) / Let RePrompter decide
3. Motivation: User-facing / Internal tooling / Bug fix / Exploration / Skip *(drop first if space needed)*
4. Output format: XML Tags / Markdown / Plain Text / JSON *(drop first if space needed)*
**Task-specific questions** (MANDATORY for compound prompts — replace lower-priority standard questions):
- Extract keywords from prompt → generate relevant follow-up options
- Example: prompt mentions "telegram" → ask about alert type, interactivity, delivery
- **Vague prompt fallback:** if input has no extractable keywords (e.g., "make it better"), ask open-ended: "What are you working on?" and "What's the goal?" before proceeding
### Single mode pattern pack (Microsoft-inspired)
Apply these patterns even without multi-agent execution:
1. **Intent router** — map task to template with explicit priority rules
2. **Constraint normalizer** — convert vague goals into measurable requirements/limits
3. **Spec contract** — enforce role/context/task/requirements/constraints/output/success structure
4. **Evaluator loop** — score clarity/specificity/structure/constraints/verifiability/decomposition; if score < 7, produce one delta rewrite
This keeps Single mode deterministic and compatible across Claude, OpenClaw, and Codex runtimes.
### Auto-detect complexity
| Signal | Suggested mode |
|--------|---------------|
| 2+ distinct systems (e.g., frontend + backend, API + DB, mobile + backend) | Team (Parallel) |
| Pipeline (fetch → transform → deploy) | Team (Sequential) |
| Single file/component | Single Agent |
| "audit", "review", "analyze" across areas | Team (Parallel) |
| "campaign", "launch", "growth", "SEO", "content calendar", "funnel" | Team (Parallel, Marketing Swarm) |
| "architecture", "feature delivery", "refactor", "migration", "test coverage" | Team (Parallel, Engineering Swarm) |
| "incident", "uptime", "gateway", "latency", "cron", "SLO", "health" | Team (Parallel, Ops Swarm) |
| "benchmark", "compare", "tradeoff", "options", "analysis", "research" | Team (Parallel, Research Swarm) |
### Quick mode
#### ⚠️ Force interview signals (check first)
**If ANY of the following signals are present, SKIP Quick Mode and go directly to interview — no exceptions:**
| Signal category | Keywords / patterns |
|----------------|---------------------|
| **Scope keywords** | system, platform, service, pipeline, dashboard, module, suite, management |
| **Ownership / existing state** | our, existing, the current, fresh, updated |
| **Integration verbs** | integrate, merge, connect, combine, sync |
| **Compound tasks** | "and", "plus", "also", "as well as" |
| **State management** | track, sync, manage |
| **Vague modifiers** | better, improved, some, maybe, kind of |
| **Ambiguous pronouns** | "it", "this", "that" without a clear referent in the same sentence |
| **Comprehensiveness** | comprehensive, complete, full, end-to-end, overall |
**Clause detection:** Treat any prompt with two or more independent clauses (comma-separated actions, semicolon-joined tasks, or consecutive imperative verbs) as a compound task — force interview.
**Broad-scope noun enforcement (`count_distinct_systems()`):** Count the number of distinct systems/modules implied by broad-scope nouns (system, module, suite, platform, pipeline, dashboard, management). If count >= 1 AND the prompt does not name a single, specific identifier — force interview.
#### Enable Quick Mode (only when NO force-interview signals are present)
Enable when ALL true:
- < 20 words (excluding code blocks)
- Exactly 1 action verb from: add, fix, remove, rename, move, delete, update, create
- Single target (one specific, named file, component, or identifier — NOT a broad-scope noun such as system, module, suite, or management)
- No conjunctions (and, or, plus, also)
- No vague modifiers (better, improved, some, maybe, kind of)
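The gate logic above can be expressed as a small check. The sketch below is illustrative only: the keyword lists are abbreviated samples of the tables above, the function names (including `countDistinctSystems`) are hypothetical, and the pronoun and clause-detection rules are skipped for brevity.
```js
// Illustrative Quick Mode gate. Keyword lists are abbreviated samples of the
// tables above; names (including countDistinctSystems) are hypothetical.
const FORCE_INTERVIEW = [
  "system", "platform", "service", "pipeline", "dashboard", "module", "suite",
  "management", "our", "existing", "integrate", "merge", "connect", "combine",
  "sync", "track", "manage", "better", "improved", "maybe", "comprehensive",
  "complete", "end-to-end", "and", "plus", "also",
];
const QUICK_VERBS = ["add", "fix", "remove", "rename", "move", "delete", "update", "create"];
const BROAD_NOUNS = ["system", "module", "suite", "platform", "pipeline", "dashboard", "management"];

// Broad-scope noun count from the enforcement rule above.
const countDistinctSystems = (words) => words.filter((w) => BROAD_NOUNS.includes(w)).length;

function quickModeAllowed(prompt) {
  // Exclude code blocks from the word count, per the rules above.
  const words = prompt.replace(/```[\s\S]*?```/g, "").toLowerCase().split(/\s+/).filter(Boolean);
  if (words.length >= 20) return false;                              // must be under 20 words
  if (words.some((w) => FORCE_INTERVIEW.includes(w))) return false;  // any force-interview signal
  if (countDistinctSystems(words) >= 1) return false;                // broad-scope noun present
  return words.filter((w) => QUICK_VERBS.includes(w)).length === 1;  // exactly one action verb
}

console.log(quickModeAllowed("fix the login redirect in auth.ts"));            // true
console.log(quickModeAllowed("improve our billing system and sync invoices")); // false
```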
### Task types & templates
Detect task type from input. Each type has a dedicated template in `references/`:
| Type | Template | Use when |
|------|----------|----------|
| Feature | `feature-template.md` | New functionality (default fallback) |
| Bugfix | `bugfix-template.md` | Debug + fix |
| Refactor | `refactor-template.md` | Structural cleanup |
| Testing | `testing-template.md` | Test writing |
| API | `api-template.md` | Endpoint/API work |
| UI | `ui-template.md` | UI components |
| Security | `security-template.md` | Security audit/hardening |
| Docs | `docs-template.md` | Documentation |
| Content | `content-template.md` | Blog posts, articles, marketing copy |
| Research | `research-template.md` | Analysis/exploration |
| Marketing Swarm | `marketing-swarm-template.md` | Marketing-first multi-agent orchestration |
| Engineering Swarm | `engineering-swarm-template.md` | Engineering-first multi-agent orchestration |
| Ops Swarm | `ops-swarm-template.md` | Reliability/infra multi-agent orchestration |
| Research Swarm | `research-swarm-template.md` | Analysis/benchmark multi-agent orchestration |
| Repromptverse | `repromptverse-template.md` | Multi-agent routing + termination + evaluator loop |
| Multi-Agent | `swarm-template.md` | Basic multi-agent coordination |
| Team Brief | `team-brief-template.md` | Team orchestration brief |
**Priority** (most specific wins): marketing-swarm > engineering-swarm > ops-swarm > research-swarm > repromptverse > api > security > ui > testing > bugfix > refactor > content > docs > research > feature. For multi-agent tasks, use the best-fit swarm template + `repromptverse-template` + `team-brief-template`, then type-specific templates for each agent sub-prompt.
**How it works:** Read the matching template from `references/{type}-template.md`, then fill it with task-specific context. Templates are NOT loaded into context by default — only read on demand when generating a prompt. If the template file is not found, fall back to the Base XML Structure below.
> To add a new task type: create `references/{type}-template.md` following the XML structure below, then add it to the table above.
### Base XML structure
All templates follow this core structure (8 required tags). Use as fallback if no specific template matches:
Exception: `team-brief-template.md` uses Markdown format for orchestration briefs. This is intentional — see template header for rationale.
```xml
<role>{Expert role matching task type and domain}</role>
<context>
- Working environment, frameworks, tools
- Available resources, current state
</context>
<task>{Clear, unambiguous single-sentence task}</task>
<motivation>{Why this matters — priority, impact}</motivation>
<requirements>
- {Specific, measurable requirement 1}
- {At least 3-5 requirements}
</requirements>
<constraints>
- {What NOT to do}
- {Boundaries and limits}
</constraints>
<output_format>{Expected format, structure, length}</output_format>
<success_criteria>
- {Testable condition 1}
- {Measurable outcome 2}
</success_criteria>
```
### Project context detection
Auto-detect tech stack from current working directory ONLY:
- Scan `package.json`, `tsconfig.json`, `prisma/schema.prisma`, etc.
- Session-scoped — different directory = fresh context
- Opt out with "no context", "generic", or "manual context"
- Never scan parent directories or carry context between sessions
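A minimal sketch of the detection step, assuming Node's built-in `fs`; the marker list is illustrative (only the first three entries appear in the rules above), and the scan checks the current working directory only, never parents.
```js
// Illustrative stack detection: check well-known config files in the current
// working directory only. Marker list is a sample, not exhaustive.
const fs = require("fs");
const path = require("path");

const MARKERS = {
  "package.json": "Node.js",
  "tsconfig.json": "TypeScript",
  "prisma/schema.prisma": "Prisma",
  "pyproject.toml": "Python",
  "Cargo.toml": "Rust",
};

function detectProjectContext(cwd = process.cwd()) {
  return Object.entries(MARKERS)
    .filter(([file]) => fs.existsSync(path.join(cwd, file)))
    .map(([, label]) => label); // empty array => fall back to a generic prompt or ask
}

console.log(detectProjectContext());
```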
---
## Mode 2: Repromptverse (Agent Teams)
### TL;DR
```
Raw task in → quality output out. Every agent gets a reprompted prompt.
Phase 1: Score raw prompt, dimension interview if needed, plan team, show Agent Cards (YOU do this, ~45s)
Phase 2: Write XML-structured prompt per agent (YOU do this, ~2min)
Phase 3: Launch agents (tmux, TeamCreate, sessions_spawn, Codex, or sequential) (AUTOMATED)
Phase 4: Show Result Cards, score, retry if needed (YOU do this)
```
**Key insight:** The reprompt phase costs ZERO extra tokens — YOU write the prompts, not another AI.
### Repromptverse control plane (Microsoft-inspired)
Every multi-agent run must include:
1. **Routing policy** — who speaks next and why (selector-style routing for non-trivial teams)
2. **Termination policy** — max turns, max wall time, and no-progress stop condition
3. **Artifact contract** — one writer per output file, fixed schema for handoffs
4. **Evaluator loop** — score each artifact, retry only with delta prompts (max 2 retries)
Use `references/repromptverse-template.md` to enforce this contract.
Domain profile auto-load rules (lazy-load, on demand):
- Marketing intent (`campaign`, `launch`, `growth`, `seo`, `content calendar`, `funnel`) -> `references/marketing-swarm-template.md`
- Engineering intent (`architecture`, `feature delivery`, `refactor`, `migration`, `test coverage`) -> `references/engineering-swarm-template.md`
- Ops intent (`incident`, `uptime`, `gateway`, `latency`, `cron`, `slo`, `health`) -> `references/ops-swarm-template.md`
- Research intent (`benchmark`, `compare`, `tradeoff`, `analysis`, `research`) -> `references/research-swarm-template.md`
Then merge with `references/repromptverse-template.md` for routing/termination/evaluation contract and add task-specific constraints.
Canonical implementation for deterministic routing lives in `scripts/intent-router.js`.
If docs and code ever diverge, the script is the source of truth for benchmark/testing paths.
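`scripts/intent-router.js` remains the source of truth; the sketch below only illustrates the keyword-to-template mapping listed above, with first match winning and a fallback to the generic Repromptverse contract.
```js
// Illustration only: scripts/intent-router.js is the source of truth.
// Keyword lists are copied from the domain profile rules above.
const DOMAIN_PROFILES = [
  { template: "marketing-swarm-template.md",   keywords: ["campaign", "launch", "growth", "seo", "content calendar", "funnel"] },
  { template: "engineering-swarm-template.md", keywords: ["architecture", "feature delivery", "refactor", "migration", "test coverage"] },
  { template: "ops-swarm-template.md",         keywords: ["incident", "uptime", "gateway", "latency", "cron", "slo", "health"] },
  { template: "research-swarm-template.md",    keywords: ["benchmark", "compare", "tradeoff", "analysis", "research"] },
];

function routeDomainProfile(task) {
  const text = task.toLowerCase();
  const match = DOMAIN_PROFILES.find((p) => p.keywords.some((k) => text.includes(k)));
  return match ? match.template : "repromptverse-template.md"; // generic multi-agent contract
}

console.log(routeDomainProfile("Plan a product launch with an SEO content calendar"));
// => marketing-swarm-template.md
```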
### Phase 1: Team plan (~45 seconds)
1. **Score raw prompt** (1-10): Clarity, Specificity, Structure, Constraints, Decomposition
- Phase 1 uses 5 quick-assessment dimensions. The full 6-dimension scoring (adding Verifiability) is used in Phase 4 evaluation.
2. **Dimension Interview gate** — check which askable dimensions scored ≤ 5 (see Dimension Interview section below)
3. **Pick mode:** parallel (independent agents) or sequential (pipeline with dependencies)
4. **Define team:** 2-5 agents max, each owns ONE domain, no overlap (informed by interviewContext if interview ran)
5. **Show Plan Cards** (see Agent Cards section below)
6. **User confirmation gate** — "Team plan ready. Proceed to execution?" User can approve, adjust, or cancel. In automated/batch runs, auto-proceed.
7. **Write team brief** to `/tmp/rpt-brief-{taskname}.md` (use unique tasknames to avoid collisions; includes interviewContext section if interview ran)
### Dimension Interview (Repromptverse only)
Score-driven interview for Repromptverse mode. Distinct from Single mode's "Smart Interview" (which uses a standard question list). The Dimension Interview derives questions from low-scoring raw prompt dimensions.
#### Trigger logic
```
scores = score_raw_prompt(rawInput) # 5 dimensions from step 1
# Structure is EXCLUDED — reprompter fixes structure via templates.
# Only 4 dimensions are interview-eligible:
askable = [d for d in scores if d.name != "Structure" and d.value <= 5]
# Threshold: less-than-or-equal. Scores of 5 ARE borderline and trigger questions.
if len(askable) == 0:
    SKIP interview → proceed to step 3 (pick mode)
elif len(askable) <= 2:
    ASK 1-2 questions (one per low dimension)
else:
    ASK 3-4 questions (max 4, prioritized by lowest score first)
```
#### Dimension-to-question mapping
| Dimension | Score ≤ 5 triggers | Question approach |
|-----------|-------------------|-------------------|
| **Clarity** | Task is ambiguous or multi-interpretable | Open-ended with dynamic options extracted from prompt keywords |
| **Specificity** | Scope is vague, no concrete targets | Dynamic options from prompt keywords + top-level directory names |
| **Constraints** | No boundaries defined | "Any areas to exclude?" with context-aware options |
| **Decomposition** | Unclear work split | "How many independent streams?" with suggested splits |
**Question rules:**
- Use `AskUserQuestion` with clickable options (consistent with Single mode)
- Options are **dynamic**: extracted from prompt keywords + codebase context (config files + top-level dirs only — no deep analysis)
- Every question includes a free-text escape hatch option
- Priority order: lowest scoring dimension first
- Language follows user's input language
#### Skip/dismiss handling
- User skips all questions → proceed with empty interviewContext. Plan Cards note: "Interview: skipped by user"
- User answers some, skips others → populate only answered fields
#### Interview output (interviewContext)
Responses merge into an interviewContext written to the team brief file:
```
interviewContext = {
scope: [from Specificity answer],
excludes: [from Constraints answer],
successCriteria: [from answers, or omitted — Phase 2 derives from requirements],
taskClarification: [from Clarity answer, if asked]
}
```
When `successCriteria` is not gathered (question not asked or user skipped), omit the field. Phase 2 derives success criteria from requirements as it does today.
**How interviewContext feeds into later phases:**
- **Agent count and roles** — scope determines which agents are created
- **Per-agent `<constraints>`** — excludes injected into each agent's prompt
- **Per-agent `<success_criteria>`** — user expectations propagated
- **Template selection** — clarified task type may route to a different swarm profile
**Precedence:** Interview responses override auto-detected codebase context. Conflicts noted in Plan Cards.
**Flywheel:** interviewContext is excluded from recipe fingerprint hash. The fingerprint captures strategy (template + patterns + tier), not user scope answers.
### Agent Cards (transparency layer)
Three fixed-format card types rendered at different phases. Templates are exact — do not invent new formats.
#### Plan Cards — rendered at end of Phase 1 (step 5)
After team plan is complete, before Phase 2 prompt writing. Use this exact table format:
```markdown
## Team: {N} Opus Agents ({Parallel|Sequential})
| # | Agent | Scope | Excludes | Output |
|---|-------|-------|----------|--------|
| 1 | {role} | {scope} | {excludes or "-"} | {output path} |
| 2 | {role} | {scope} | {excludes or "-"} | {output path} |
Interview context applied: {summary of influence, including override conflicts, or "No interview (high-quality prompt)", or "Interview: skipped by user"}
```
**Rules:**
- MUST appear before any agent is launched
- If interview ran, show which constraints came from interview vs auto-detected
- If user requests agent adjustments at confirmation gate, re-render Plan Cards with updated team
- Single-agent runs: table renders with one row (valid)
#### Status Line — rendered during Phase 3 polling
Compact one-line status with each poll cycle:
```
Agents: ✅ 2/4 ⏳ 1/4 🔄 1/4 (retry 1)
```
**Emoji mapping:** ✅ = completed, ⏳ = in-progress, 🔄 = retrying
**Rules:**
- Replace verbose poll output with this compact format
- Platform-dependent: TeamCreate uses TaskList status; tmux uses best-effort pane parsing; sequential is trivial
- Show retry count for retrying agents
#### Result Cards — rendered at start of Phase 4
After reading all agent outputs, before synthesis. Use this exact table format:
```markdown
## Results
| Agent | Score | Findings | Key Insight |
|-------|-------|----------|-------------|
| {role} | {score}/10 {pass/retry emoji} | {count} findings | {one-sentence top finding} |
Total: {N} findings | {accepted}/{total} accepted | {retry_count} retries
```
**Rules:**
- MUST appear before synthesis is written
- "Key Insight" = single most important finding per agent (forces prioritization)
- Retry agents show retry reason in findings column
#### Token budget (Agent Cards + Dimension Interview)
| Phase | Extra tokens | Source |
|-------|-------------|--------|
| Phase 1 (interview) | 100-400 | AskUserQuestion calls (0-4 questions) + option generation from config/directory scan |
| Phase 1 (plan cards) | 100-300 | Table render (varies by team size) |
| Phase 3 (status) | ~20/poll | Compact status line |
| Phase 4 (result cards) | 150-250 | Summary table |
| **Total** | **~400-1000** | **0.5-2% of typical 50K-200K run** |
### Phase 2: Repromptverse prompt pack (~2 minutes)
For EACH agent:
1. Pick the best-matching template from `references/` (or use base XML structure)
2. Read it, then apply these **per-agent adaptations**:
- `<role>`: Specific expert title for THIS agent's domain
- `<context>`: Add exact file paths (verified with `ls`), what OTHER agents handle (boundary awareness)
- `<requirements>`: At least 5 specific, independently verifiable requirements
- `<constraints>`: Scope boundary with other agents, read-only vs write, file/directory boundaries
- `<output_format>`: Exact path `/tmp/rpt-{taskname}-{agent-domain}.md`, required sections
- `<success_criteria>`: Minimum N findings, file:line references, no hallucinated paths
**Score each prompt — target 8+/10.** If under 8, add more context/constraints.
Write all to `/tmp/rpt-agent-prompts-{taskname}.md`
#### Reprompt quality scorecard (mandatory)
After writing all agent prompts, show the before/after comparison so the user sees the improvement:
```markdown
## Reprompt Quality
| Metric | Raw prompt | After reprompt | Change |
|--------|-----------|----------------|--------|
| Overall | {raw}/10 | {after}/10 | +{pct}% |
| Per-agent avg | - | {avg}/10 | - |
| Agents | - | {N} | - |
Raw prompt scored {raw}/10. After reprompting, each agent prompt scores {min}-{max}/10 (avg {avg}/10).
```
**Rules:**
- MUST appear after Phase 2 prompt generation, before Phase 3 execution
- Shows the user exactly how much reprompter improved their input
- If any agent prompt scores < 8, note which ones and what was added to fix them
### Phase 3: Execute
Phase 3 has platform-specific execution methods. Pick the one that matches your environment. The reprompted prompts from Phase 2 work with any method.
**Status Line (all platforms):** During polling, show compact agent status with each cycle. See Agent Cards section for format.
#### Option A: tmux (Claude Code)
```bash
# 1. Start Claude Code with Agent Teams
tmux new-session -d -s {session} "cd /path/to/workdir && CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 claude --model opus"
# placeholders:
# - {session}: unique tmux session name (example: rpt-auth-audit)
# - /path/to/workdir: absolute repository path for the target project (example: /tmp/reprompter-check)
# 2. Wait for startup
sleep 12
# 3. Send prompt — MUST use -l (literal), Enter SEPARATE
# IMPORTANT: Include POLLING RULES to prevent lead TaskList loop bug
tmux send-keys -t {session} -l 'Create an agent team with N teammates. CRITICAL: Use model opus for ALL tasks.
POLLING RULES — YOU MUST FOLLOW THESE:
- After sending tasks, poll TaskList at most 10 times
- If ALL tasks show "done" status, IMMEDIATELY stop polling
- After 3 consecutive TaskList calls showing the same status, STOP polling regardless
- Once you stop polling: read the output files, then write synthesis
- DO NOT call TaskList more than 20 times total under any circumstances
Teammate 1 (ROLE): TASK. Write output to /tmp/rpt-{taskname}-{domain}.md. ... After all complete, synthesize into /tmp/rpt-{taskname}-final.md'
sleep 0.5
tmux send-keys -t {session} Enter
# 4. Monitor (poll every 15-30s) — show Status Line: Agents: ✅ N/T ⏳ N/T 🔄 N/T
tmux capture-pane -t {session} -p -S -100
# 5. Verify outputs
ls -la /tmp/rpt-{taskname}-*.md
# 6. Cleanup
tmux kill-session -t {session}
```
#### Critical tmux rules
⚠️ **WARNING: Default teammate model is HAIKU unless explicitly overridden. Always set `--model opus` in both CLI launch command and team prompt.**
| Rule | Why |
|------|-----|
| Always `send-keys -l` (literal flag) | Without it, special chars break |
| Enter sent SEPARATELY | Combined fails for multiline |
| sleep 0.5 between text and Enter | Buffer processing time |
| sleep 12 after session start | Claude Code init time |
| `--model opus` in CLI AND prompt | Default teammate = HAIKU |
| Each agent writes own file | Prevents file conflicts |
| Unique taskname per run | Prevents collisions between concurrent sessions |
### Phase 4: Evaluate + retry
1. Read each agent's report
2. Score against success criteria from Phase 2:
- 8+/10 → ACCEPT
- 4-7/10 → RETRY with delta prompt (tell them what's missing)
- < 4/10 → RETRY with full rewrite
**Accept checklist** (use alongside score — all must pass):
- [ ] All required output sections present
- [ ] Requirements from Phase 2 independently verifiable
- [ ] No hallucinated file paths or line numbers
- [ ] Scope boundaries respected (no overlap with other agents)
3. Max 2 retries (3 total attempts)
4. **Show Result Cards** — render summary table before synthesis (see Agent Cards section for format)
5. Deliver final report to user
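A compact sketch of that decision, assuming the thresholds and the two-retry cap above; the record fields and the behavior after the final attempt are illustrative assumptions, not the skill's runtime.
```js
// Illustrative Phase 4 decision combining the score thresholds and the
// accept checklist. Field names and the post-cap behavior are assumptions.
function evaluateArtifact({ score, checklistPassed, attempt }) {
  const MAX_ATTEMPTS = 3; // 1 initial attempt + 2 retries
  if (score >= 8 && checklistPassed) return { action: "accept" };
  if (attempt >= MAX_ATTEMPTS) return { action: "deliver-with-caveats" };
  return { action: "retry", mode: score >= 4 ? "delta" : "full-rewrite" };
}

console.log(evaluateArtifact({ score: 5, checklistPassed: false, attempt: 1 }));
// => { action: "retry", mode: "delta" }
```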
**Delta prompt pattern:**
```
Previous attempt scored 5/10.
✅ Good: Sections 1-3 complete
❌ Missing: Section 4 empty, line references wrong
This retry: Focus on gaps. Verify all line numbers.
```
### Expected cost & time
| Team size | Time | Cost |
|-----------|------|------|
| 2 agents | ~5-8 min | ~$1-2 |
| 3 agents | ~8-12 min | ~$2-3 |
| 4 agents | ~10-15 min | ~$2-4 |
Estimates cover Phase 3 (execution) only. Add ~3 minutes for Phases 1-2 and ~5-8 minutes per retry. Each agent uses ~25-70% of its 200K token context window.
#### Option B: TeamCreate (Claude Code native)
When using Claude Code with TeamCreate/SendMessage tools (native agent teams, no tmux needed):
```text
# 1. Create team
TeamCreate(team_name="rpt-{taskname}", description="Repromptverse: {task summary}")
# 2. Create tasks (one per agent)
TaskCreate(subject="Agent 1 task", description="Full reprompted prompt from Phase 2")
TaskCreate(subject="Agent 2 task", description="Full reprompted prompt from Phase 2")
# 3. Spawn teammates (MUST specify model=opus)
Task(subagent_type="general-purpose", team_name="rpt-{taskname}", name="agent-1", model="opus",
prompt="You are {role} on the rpt-{taskname} team. Your task is Task #1. [full prompt]",
run_in_background=true)
Task(subagent_type="general-purpose", team_name="rpt-{taskname}", name="agent-2", model="opus",
prompt="You are {role} on the rpt-{taskname} team. Your task is Task #2. [full prompt]",
run_in_background=true)
# 4. Wait for teammates to complete — show Status Line per poll cycle
# Status Line: Agents: ✅ N/T ⏳ N/T 🔄 N/T (derived from TaskList status)
# 5. Compile synthesis from teammate reports
# 6. Shutdown teammates and delete team
SendMessage(type="shutdown_request", recipient="agent-1")
TeamDelete()
```
**Advantages over tmux:** Teammates can message each other (cross-agent flags), shared TaskList for progress tracking, no tmux/terminal dependency, built-in idle/shutdown protocol.
**When to use TeamCreate vs tmux:** Use TeamCreate when agents need to communicate (review teams, audit teams). Use tmux when agents are fully independent and you want visible terminal panes.
#### Option C: sessions_spawn (OpenClaw only)
When tmux/Claude Code is unavailable but running inside OpenClaw:
```
sessions_spawn(task: "<per-agent prompt>", model: "opus", label: "rpt-{role}")
```
Note: `sessions_spawn` is an OpenClaw-specific tool. Not available in standalone Claude Code.
#### Option D: Codex parallel sessions (Codex runtime)
When running in Codex:
1. Create one session per agent role (or use native subagents if available).
2. Send each session the corresponding prompt from `/tmp/rpt-agent-prompts-{taskname}.md`.
3. Require each session to write exactly one artifact to `/tmp/rpt-{taskname}-{agent-domain}.md`.
4. Poll for artifact completion every 15-30s with a hard cap (max 40 polls total).
5. Run Phase 4 evaluator loop and merge to `/tmp/rpt-{taskname}-final.md`.
If Codex parallel sessions are not available, immediately fall back to Option E.
#### Option E: Sequential (any LLM)
No parallel execution tools available? Run each agent's reprompted prompt one at a time in the same session. Works with any LLM (Claude, GPT, Gemini, Codex, etc.). Slower but fully platform-agnostic.
The reprompted prompts from Phase 2 are pure text. They work regardless of execution method.
---
## Quality scoring
**Always show before/after metrics:**
| Dimension | Weight | Criteria |
|-----------|--------|----------|
| Clarity | 20% | Task unambiguous? |
| Specificity | 20% | Requirements concrete? |
| Structure | 15% | Proper sections, logical flow? |
| Constraints | 15% | Boundaries defined? |
| Verifiability | 15% | Success measurable? |
| Decomposition | 15% | Work split cleanly? (Score 10 if task is correctly atomic) |
```markdown
| Dimension | Before | After | Change |
|-----------|--------|-------|--------|
| Clarity | 3/10 | 9/10 | +200% |
| Specificity | 2/10 | 8/10 | +300% |
| Structure | 1/10 | 10/10 | +900% |
| Constraints | 0/10 | 7/10 | new |
| Verifiability | 2/10 | 8/10 | +300% |
| Decomposition | 0/10 | 8/10 | new |
| **Overall** | **1.45/10** | **8.35/10** | **+476%** |
```
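The Overall row is the weighted sum of the per-dimension scores, using the weights from the table above. This small sketch reproduces the example's 1.45 and 8.35:
```js
// Weights from the scoring table above; reproduces the example's totals.
const WEIGHTS = {
  clarity: 0.20, specificity: 0.20, structure: 0.15,
  constraints: 0.15, verifiability: 0.15, decomposition: 0.15,
};

const overallScore = (scores) =>
  Object.entries(WEIGHTS).reduce((sum, [dim, w]) => sum + w * (scores[dim] ?? 0), 0);

const before = { clarity: 3, specificity: 2, structure: 1, constraints: 0, verifiability: 2, decomposition: 0 };
const after  = { clarity: 9, specificity: 8, structure: 10, constraints: 7, verifiability: 8, decomposition: 8 };
console.log(overallScore(before).toFixed(2)); // 1.45
console.log(overallScore(after).toFixed(2));  // 8.35
```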
> **Bias note:** Scores are self-assessed. Treat as directional indicators, not absolutes.
---
## Closed-loop quality (v6.0+)
For both modes, RePrompter supports post-execution evaluation:
1. **IMPROVE** — Score raw → generate structured prompt
2. **EXECUTE** — **Repromptverse mode only**: route to agent(s), collect output. **Single mode does not execute code/commands; it only generates prompts.**
3. **EVALUATE** — Score output/prompt against success criteria (0-10)
4. **RETRY** — Thresholds: Single mode retry if score < 7; Repromptverse retry if score < 8. Max 2 retries.
---
## Advanced features
### Reasoning-friendly prompting (Claude 4.x)
Prompts should be less prescriptive about HOW. Focus on WHAT — clear task, requirements, constraints, success criteria. Let the model's own reasoning handle execution strategy.
**Example:** Instead of "Step 1: read the file, Step 2: extract the function" → "Extract the authentication logic from auth.ts into a reusable middleware. Requirements: ..."
### Response prefilling (API only)
Prefill assistant response start to enforce format:
- `{` → forces JSON output
- `## Analysis` → skips preamble, starts with content
- `| Column |` → forces table format
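A sketch of prefilling with the Anthropic Messages API: the trailing assistant turn seeds the start of the reply, forcing JSON here. The model id is a placeholder and error handling is omitted; adapt to your SDK version.
```js
// Prefill sketch with the Anthropic Messages API: the trailing assistant turn
// seeds the reply, forcing JSON here. Model id is a placeholder.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function getJsonAnalysis(prompt) {
  const msg = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    messages: [
      { role: "user", content: prompt },
      { role: "assistant", content: "{" }, // prefill: the reply continues this JSON object
    ],
  });
  return "{" + msg.content[0].text; // re-attach the prefilled brace
}
```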
### Context engineering
Generated prompts should COMPLEMENT runtime context (CLAUDE.md, skills, MCP tools), not duplicate it. Before generating:
1. Check what context is already loaded (project files, skills, MCP servers)
2. Reference existing context: "Using the project structure from CLAUDE.md..."
3. Add ONLY what's missing — avoid restating what the model already knows
### Capability policy routing (OpenClaw + multi-LLM)
When multiple providers/models are available, route each agent by capability tier:
- `reasoning_high`: audits, synthesis, high-risk tasks
- `long_context`: very large context windows or broad codebase scans
- `cost_optimized` / `latency_optimized`: low-risk triage and bulk tasks
- Always emit fallback chain with provider diversity (avoid single-provider hard dependency)
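One way to express such a policy is a routing table keyed by tier, each with an ordered fallback chain. Tier names follow the list above; the provider and model identifiers below are placeholders, not recommendations.
```js
// Illustrative capability policy: tier -> ordered fallback chain.
// Model ids are placeholders; substitute whatever providers you run.
const CAPABILITY_POLICY = {
  reasoning_high:    ["provider-a/large-reasoning", "provider-b/large"],
  long_context:      ["provider-b/long-context", "provider-a/large"],
  cost_optimized:    ["provider-c/small", "provider-a/small"],
  latency_optimized: ["provider-c/fast", "provider-b/small"],
};

function resolveModel(tier, unavailable = new Set()) {
  const chain = CAPABILITY_POLICY[tier] ?? CAPABILITY_POLICY.reasoning_high;
  return chain.find((m) => !unavailable.has(m)) ?? chain[0];
}

console.log(resolveModel("long_context", new Set(["provider-b/long-context"])));
// => provider-a/large
```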
### Budgeted layered context
Build per-agent context in layers with explicit budgets:
1. Task contract (always preserved)
2. Local code facts
3. Selected references
4. Prior artifacts/handoffs
Emit a context manifest (used tokens, truncation flags, dropped entries) so retries are reproducible and debuggable.
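A sketch of budgeted assembly, assuming a rough four-characters-per-token estimate; layer names follow the list above and the manifest fields approximate the ones described, but the helper itself is illustrative.
```js
// Illustrative layered assembly with per-layer budgets and a context manifest.
// Token counts use a rough 4-characters-per-token estimate.
const estimateTokens = (text) => Math.ceil(text.length / 4);

function buildAgentContext(layers, budgets) {
  const manifest = [];
  const parts = [];
  for (const { name, text } of layers) {
    const budget = budgets[name] ?? Infinity;
    const tokens = estimateTokens(text);
    const truncated = tokens > budget;
    parts.push(truncated ? text.slice(0, budget * 4) : text);
    manifest.push({ layer: name, usedTokens: Math.min(tokens, budget), truncated });
  }
  return { context: parts.join("\n\n"), manifest };
}

const { manifest } = buildAgentContext(
  [
    { name: "task_contract", text: "<task>Audit auth flows in src/auth</task>" },
    { name: "code_facts", text: "src/auth/login.ts exports login(), refresh() ..." },
  ],
  { task_contract: Infinity, code_facts: 8 } // the task contract is always preserved
);
console.log(manifest);
```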
### Strict artifact gate
Before synthesis, evaluate each artifact for:
- Required section coverage
- Verifiability (file:line refs when required)
- Boundary compliance (forbidden-pattern checks)
- Overall weighted score threshold
If gate fails, retry only with delta prompts (max 2 retries).
Implementation note: combine routing + patterns + model policy + context + adapter + evaluator through a single orchestration contract (`scripts/repromptverse-runtime.js`) to keep behavior deterministic across runtimes.
### Runtime feature flags
Repromptverse runtime supports deterministic toggles for rollout and troubleshooting:
- `REPROMPTER_POLICY_ENGINE=0|1` — disable/enable capability-based model routing
- `REPROMPTER_LAYERED_CONTEXT=0|1` — disable/enable layered context assembly
- `REPROMPTER_STRICT_EVAL=0|1` — disable/enable strict artifact evaluator defaults
- `REPROMPTER_PATTERN_LIBRARY=0|1` — disable/enable pattern selector activation
- `REPROMPTER_TELEMETRY=0|1` — disable/enable runtime telemetry emission for observability reports
- `REPROMPTER_FLYWHEEL=0|1` — disable/enable Prompt Flywheel outcome learning (v9.0+)
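The flags are plain environment variables, so reading them can stay deterministic. The defaults below are assumptions for illustration; the runtime's own defaults govern.
```js
// Read a 0/1 feature flag from the environment. Fallback values here are
// assumptions for illustration, not the runtime's actual defaults.
const flag = (name, fallback = true) =>
  process.env[name] === undefined ? fallback : process.env[name] === "1";

const config = {
  policyEngine:   flag("REPROMPTER_POLICY_ENGINE"),
  layeredContext: flag("REPROMPTER_LAYERED_CONTEXT"),
  strictEval:     flag("REPROMPTER_STRICT_EVAL"),
  patternLibrary: flag("REPROMPTER_PATTERN_LIBRARY"),
  telemetry:      flag("REPROMPTER_TELEMETRY"),
  flywheel:       flag("REPROMPTER_FLYWHEEL"),
};
```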
### Telemetry and observability
Every Repromptverse run should emit stage-level telemetry events with `runId`, `taskId`, stage name, status, latency, and provider/model where applicable.
- Event stages: `route_intent`, `select_patterns`, `resolve_model`, `build_context`, `plan_ready`, `spawn_agent`, `poll_artifacts`, `evaluate_artifact`, `finalize_run`, `fingerprint_recipe`, `collect_outcome`, `learn_strategy`
- Storage: `.reprompter/telemetry/events.ndjson`
- Report command: `npm run telemetry:report`
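A sketch of an NDJSON emitter matching the storage path and field names above; the timestamp field and helper name are illustrative additions.
```js
// Illustrative telemetry emitter: one JSON object per line (NDJSON),
// appended to the events file named above. Core fields follow the doc.
const fs = require("fs");
const path = require("path");

function emitEvent(event) {
  const record = { ts: new Date().toISOString(), ...event };
  const file = path.join(".reprompter", "telemetry", "events.ndjson");
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.appendFileSync(file, JSON.stringify(record) + "\n");
}

emitEvent({
  runId: "run-01", taskId: "auth-audit", stage: "plan_ready",
  status: "ok", latencyMs: 1840, model: "opus",
});
```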
### Prompt Flywheel (v9.0+)
Closed-loop outcome learning system. Every prompt reprompter generates carries a **recipe fingerprint** — a deterministic hash of the strategy decisions (template, patterns, capability tier, domain, context layers, quality bucket). After execution, **outcome signals** are passively collected and linked back to the fingerprint.
#### Flywheel user guidance
When the flywheel has enough historical data to influence a recommendation, the AI agent should communicate this to the user concisely:
**When to show flywheel info:**
- Show a brief one-liner when flywheel bias is applied to a plan (e.g., "Flywheel: using constraint-first pattern based on 8 past runs (score 8.7, high confidence)")
- Show when the recommended strategy differs from what would have been selected without historical data
- If the flywheel recommends a different template (via `flywheelBias.template`), prefer that template for prompt generation in Phase 2 unless the user explicitly overrides
**Template bias:** When `flywheelBias.template` is set, use that template ID for prompt generation instead of the default intent-routed template. This is the most impactful flywheel signal — template choice shapes the entire prompt structure. Log the override: "Flywheel: using {template} (historically {score}/10 over {N} runs)"
**When NOT to show flywheel info:**
- No outcome data exists yet (cold start) — do not mention the flywheel at all
- Confidence is `insufficient` (<2 samples) or `low` (<5 samples) — silently skip, no user-facing note
- Bias lookup found data but no changes were applied — nothing to report
**Format:** Always a single inline note, never a table or multi-line block. Example:
> Flywheel: preferring `security-template` + `self-critique-checkpoint` pattern (9 runs, score 8.3/10, high confidence)
**Privacy:** All flywheel data is local (`.reprompter/flywheel/`). Never reference specific past prompts, tasks, or user content in flywheel messages — only aggregate statistics (run count, score, confidence level).
**All data is stored locally.** Nothing is transmitted anywhere. Storage: `.reprompter/flywheel/outcomes.ndjson`.
**How it works:**
1. **Fingerprint** — At `plan_ready`, the recipe vector (template + patterns + tier + domain + layers + quality bucket) is hashed into a 16-char fingerprint
2. **Outcome collection** — At `finalize_run`, passive signals are captured: artifact evaluator score/pass, retry count, execution time. Linked to the recipe fingerprint.
3. **Strategy learning** — On future runs, the learner queries the outcome ledger for similar past tasks, scores each recipe group (time-decay weighted), and recommends the historically best-performing strategy
**Effectiveness scoring:**
- Base: artifact evaluator score
- Penalties: retries (-0.5 each), post-corrections (-0.3 each, capped at -2.0)
- Bonus: first-attempt pass (+0.5)
- Overrides: explicit user reject (caps at 3.0), explicit user accept (floors at 7.0)
**Time decay:** 7-day half-life. Recent outcomes weigh more. Month-old outcomes have <10% influence.
**Confidence levels:** high (10+ samples), medium (5-9), low (2-4), insufficient (<2, no recommendation made).
Report command: `npm run flywheel:report`
Benchmark command: `npm run benchmark:flywheel`
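The effectiveness formula and decay weighting translate directly into code. This sketch follows the rules above; the outcome record shape is an assumption, not the stored schema.
```js
// Illustrative effectiveness scoring and time-decay weighting from the rules
// above. The outcome record shape is a sketch, not the stored schema.
function effectivenessScore(o) {
  let score = o.evaluatorScore;                     // base: artifact evaluator score
  score -= 0.5 * o.retries;                         // -0.5 per retry
  score -= Math.min(0.3 * o.postCorrections, 2.0);  // -0.3 each, capped at -2.0
  if (o.firstAttemptPass) score += 0.5;             // first-attempt bonus
  if (o.userRejected) score = Math.min(score, 3.0); // explicit reject caps at 3.0
  if (o.userAccepted) score = Math.max(score, 7.0); // explicit accept floors at 7.0
  return Math.max(0, Math.min(10, score));
}

// 7-day half-life: an outcome's weight halves every week.
const decayWeight = (ageDays) => Math.pow(0.5, ageDays / 7);

const outcome = { evaluatorScore: 8.5, retries: 1, postCorrections: 0, firstAttemptPass: false };
console.log(effectivenessScore(outcome));     // 8
console.log(decayWeight(30).toFixed(3));      // ~0.051, i.e. month-old runs have <10% influence
```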
### Pattern library (pluggable)
Treat prompt/context engineering advancements as toggleable patterns (not fixed doctrine):
- Constraint-first framing
- Uncertainty labeling
- Self-critique checkpoint
- Delta retry scaffold
- Evidence-strength labeling
- Context-manifest transparency
Activate by task/domain/outcome profile and validate via benchmark fixtures.
### Token budget
Keep generated prompts under ~2K tokens for single mode, ~1K per agent for Repromptverse. Longer prompts waste context window without improving quality. If a prompt exceeds budget, split into phases or move detail into constraints.
### Uncertainty handling
Always include explicit permission for the model to express uncertainty rather than fabricate:
- Add to constraints: "If unsure about any requirement, ask for clarification rather than assuming"
- For research tasks: "Clearly label confidence levels (high/medium/low) for each finding"
- For code tasks: "Flag any assumptions about the codebase with TODO comments"
---
## Settings (for Repromptverse mode)
> Note: `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` is an experimental flag that may change in future Claude Code versions. Check [Claude Code docs](https://docs.anthropic.com/en/docs/claude-code) for current status.
In `~/.claude/settings.json`:
```json
{
"env": {
"CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
},
"preferences": {
"teammateMode": "tmux",
"model": "opus"
}
}
```
| Setting | Values | Effect |
|---------|--------|--------|
| `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` | `"1"` | Enables agent team spawning |
| `teammateMode` | `"tmux"` / `"default"` | `tmux`: each teammate gets a visible split pane. `default`: teammates run in background |
| `model` | `"opus"` / `"sonnet"` | Teammates default to Haiku. Always set `model: opus` explicitly in your prompt — do not rely on runtime defaults. |
---
## Proven results
### Single prompt (v6.0)
Rough crypto dashboard prompt: **1.6/10 → 9.0/10** (+462%)
### Repromptverse E2E (v6.1)
3 Opus agents, sequential pipeline (PromptAnalyzer → PromptEngineer → QualityAuditor):
| Metric | Value |
|--------|-------|
| Original score | 2.15/10 |
| After Repromptverse | **9.15/10** (+326%) |
| Quality audit | PASS (99.1%) |
| Weaknesses found → fixed | 24/24 (100%) |
| Cost | $1.39 |
| Time | ~8 minutes |
### Repromptverse vs raw Agent Teams (v7.0)
Same audit task, 4 Opus agents:
| Metric | Raw | Repromptverse | Delta |
|--------|-----|----------------|-------|
| CRITICAL findings | 7 | 14 | +100% |
| Total findings | ~40 | 104 | +160% |
| Cost savings identified | $377/mo | $490/mo | +30% |
| Token bloat found | 45K | 113K | +151% |
| Cross-validated findings | 0 | 5 | — |
---
## Tips
- **More context = fewer questions** — mention tech stack, files
- **"expand"** — if Quick Mode gave too simple a result, re-run with full interview
- **"quick"** — skip interview for simple tasks
- **"no context"** — skip auto-detection
- Context is per-project — switching directories = fresh detection
---
## Test scenarios
See [TESTING.md](TESTING.md) for 44 verification scenarios + anti-pattern examples.
---
## Appendix: Extended XML tags
Templates may add domain-specific tags beyond the 8 required base tags. Always include all base tags first.
| Extended Tag | Used In | Purpose |
|-------------|---------|---------|
| `<symptoms>` | bugfix | What the user sees, error messages |
| `<investigation_steps>` | bugfix | Systematic debugging steps |
| `<endpoints>` | api | Endpoint specifications |
| `<component_spec>` | ui | Component props, states, layout |
| `<agents>` | swarm | Agent role definitions |
| `<task_decomposition>` | swarm | Work split per agent |
| `<coordination>` | swarm | Inter-agent handoff rules |
| `<routing_policy>` | repromptverse | Speaker and router policy |
| `<termination_policy>` | repromptverse | Max turn/time and stop conditions |
| `<artifact_contract>` | repromptverse | Output schema and ownership |
| `<evaluation_loop>` | repromptverse | Score thresholds and retry policy |
| `<research_questions>` | research | Specific questions to answer |
| `<methodology>` | research | Research approach and methods |
| `<reasoning>` | research | Reasoning notes space (non-sensitive, concise) |
| `<current_state>` | refactor | Before state of the code |
| `<target_state>` | refactor | Desired after state |
| `<coverage_requirements>` | testing | What needs test coverage |
| `<threat_model>` | security | Threat landscape and vectors |
| `<structure>` | docs | Document organization |
| `<reference>` | docs | Source material to reference |