# reprompter
Transform messy prompts into well-structured, effective prompts — single or multi-agent. Use when: "reprompt", "reprompt this", "clean up this prompt", "structure my prompt", rough text needing XML tags and best practices, "reprompter teams", "repromptverse", "run with quality", "smart run", "smart agents", "multi-agent marketing", "campaign swarm", "engineering swarm", "ops swarm", "research swarm", multi-agent tasks, audits, parallel work, anything going to agent teams. Don't use when: simple Q&A, pure chat, immediate execution-only tasks. See "Don't Use When" section for details. Outputs: Structured XML/Markdown prompt, quality score (before/after), optional team brief + per-agent sub-prompts, agent team output files, Agent Cards (plan/status/result). Success criteria: Single mode quality score ≥ 7/10; Repromptverse per-agent prompt quality score 8+/10; all required sections present, actionable and specific.
## SKILL.md Source
# RePrompter v10.0.0
> **Your prompt sucks. Let's fix that.** Single prompts or full agent teams — one skill, two modes. **v10.0 adds Dimension Interview + Agent Cards to Repromptverse.**
---
## Two modes
| Mode | Trigger | What happens |
|------|---------|-------------|
| **Single** | "reprompt this", "clean up this prompt" | Interview → structured prompt → score |
| **Repromptverse** | "reprompter teams", "repromptverse", "run with quality", "smart run", "smart agents", "campaign swarm", "engineering swarm", "ops swarm", "research swarm" | Dimension Interview → Plan team → Agent Cards → reprompt each agent → execute → Result Cards → evaluate → retry |
Auto-detection: if task mentions 2+ systems, "audit", or "parallel" → ask: "This looks like a multi-agent task. Want to use Repromptverse mode?"
Definition — **2+ systems** means at least two distinct technical domains that can be worked independently. Examples: frontend + backend, API + database, mobile app + backend, infrastructure + application code, security audit + cost audit.
## Don't use when
- User wants a simple direct answer (no prompt generation needed)
- User wants casual chat/conversation
- Task is immediate execution-only with no reprompting step
- Scope does not involve prompt design, structure, or orchestration
> Clarification: RePrompter **does** support code-related tasks (feature, bugfix, API, refactor) by generating better prompts. It does **not** directly apply code changes in Single mode. Direct code execution belongs to coding-agent unless Repromptverse execution mode is explicitly requested.
---
## Mode 1: Single prompt
### Process
1. **Receive raw input**
2. **Input guard** — if input is empty, a single word with no verb, or clearly not a task → ask the user to describe what they want to accomplish
- Reject examples: "hi", "thanks", "lol", "what's up", "good morning", random emoji-only input
- Accept examples: "fix login bug", "write API tests", "improve this prompt"
3. **Quick Mode gate** — under 20 words, single action, no complexity indicators → generate immediately
4. **Smart Interview** — use `AskUserQuestion` with clickable options (2-5 questions max)
5. **Generate + Score** — apply template, show before/after quality metrics
6. **Single-pass evaluator** — run self-eval rubric and do one delta rewrite if score < 7
### ⚠️ MUST GENERATE AFTER INTERVIEW
After interview completes, IMMEDIATELY:
1. Select template based on task type
2. Generate the full polished prompt
3. Show quality score (before/after table)
4. Ask if user wants to execute or copy
```
❌ WRONG: Ask interview questions → stop
✅ RIGHT: Ask interview questions → generate prompt → show score → offer to execute
```
### Interview questions
Ask via `AskUserQuestion`. **Max 5 questions total.**
**Standard questions** (priority order — drop lower ones if task-specific questions are needed):
1. Task type: Build Feature / Fix Bug / Refactor / Write Tests / API Work / UI / Security / Docs / Content / Research / Multi-Agent
- If user selects **Multi-Agent** while currently in **Single mode**, immediately transition to **Repromptverse Phase 1 (Team Plan)** and confirm team execution mode (Parallel vs Sequential).
2. Execution mode: Single Agent / Team (Parallel) / Team (Sequential) / Let RePrompter decide
3. Motivation: User-facing / Internal tooling / Bug fix / Exploration / Skip *(drop first if space needed)*
4. Output format: XML Tags / Markdown / Plain Text / JSON *(drop first if space needed)*
**Task-specific questions** (MANDATORY for compound prompts — replace lower-priority standard questions):
- Extract keywords from prompt → generate relevant follow-up options
- Example: prompt mentions "telegram" → ask about alert type, interactivity, delivery
- **Vague prompt fallback:** if input has no extractable keywords (e.g., "make it better"), ask open-ended: "What are you working on?" and "What's the goal?" before proceeding
### Single mode pattern pack (Microsoft-inspired)
Apply these patterns even without multi-agent execution:
1. **Intent router** — map task to template with explicit priority rules
2. **Constraint normalizer** — convert vague goals into measurable requirements/limits
3. **Spec contract** — enforce role/context/task/requirements/constraints/output/success structure
4. **Evaluator loop** — score clarity/specificity/structure/constraints/verifiability/decomposition; if score < 7, produce one delta rewrite
This keeps Single mode deterministic and compatible across Claude, OpenClaw, and Codex runtimes.
### Auto-detect complexity
| Signal | Suggested mode |
|--------|---------------|
| 2+ distinct systems (e.g., frontend + backend, API + DB, mobile + backend) | Team (Parallel) |
| Pipeline (fetch → transform → deploy) | Team (Sequential) |
| Single file/component | Single Agent |
| "audit", "review", "analyze" across areas | Team (Parallel) |
| "campaign", "launch", "growth", "SEO", "content calendar", "funnel" | Team (Parallel, Marketing Swarm) |
| "architecture", "feature delivery", "refactor", "migration", "test coverage" | Team (Parallel, Engineering Swarm) |
| "incident", "uptime", "gateway", "latency", "cron", "SLO", "health" | Team (Parallel, Ops Swarm) |
| "benchmark", "compare", "tradeoff", "options", "analysis", "research" | Team (Parallel, Research Swarm) |
### Quick mode
#### ⚠️ Force interview signals (check first)
**If ANY of the following signals are present, SKIP Quick Mode and go directly to interview — no exceptions:**
| Signal category | Keywords / patterns |
|----------------|---------------------|
| **Scope keywords** | system, platform, service, pipeline, dashboard, module, suite, management |
| **Ownership / existing state** | our, existing, the current, fresh, updated |
| **Integration verbs** | integrate, merge, connect, combine, sync |
| **Compound tasks** | "and", "plus", "also", "as well as" |
| **State management** | track, sync, manage |
| **Vague modifiers** | better, improved, some, maybe, kind of |
| **Ambiguous pronouns** | "it", "this", "that" without a clear referent in the same sentence |
| **Comprehensiveness** | comprehensive, complete, full, end-to-end, overall |
**Clause detection:** Treat any prompt with two or more independent clauses (comma-separated actions, semicolon-joined tasks, or consecutive imperative verbs) as a compound task — force interview.
**Broad-scope noun enforcement (`count_distinct_systems()`):** Count the number of distinct systems/modules implied by broad-scope nouns (system, module, suite, platform, pipeline, dashboard, management). If count >= 1 AND the prompt does not name a single, specific identifier — force interview.
#### Enable Quick Mode (only when NO force-interview signals are present)
Enable when ALL true:
- < 20 words (excluding code blocks)
- Exactly 1 action verb from: add, fix, remove, rename, move, delete, update, create
- Single target (one specific, named file, component, or identifier — NOT a broad-scope noun such as system, module, suite, or management)
- No conjunctions (and, or, plus, also)
- No vague modifiers (better, improved, some, maybe, kind of)
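The gate logic above can be expressed as a small check. The sketch below is illustrative only: the keyword lists are abbreviated samples of the tables above, the function names (including `countDistinctSystems`) are hypothetical, and the pronoun and clause-detection rules are skipped for brevity.
```js
// Illustrative Quick Mode gate. Keyword lists are abbreviated samples of the
// tables above; names (including countDistinctSystems) are hypothetical.
const FORCE_INTERVIEW = [
  "system", "platform", "service", "pipeline", "dashboard", "module", "suite",
  "management", "our", "existing", "integrate", "merge", "connect", "combine",
  "sync", "track", "manage", "better", "improved", "maybe", "comprehensive",
  "complete", "end-to-end", "and", "plus", "also",
];
const QUICK_VERBS = ["add", "fix", "remove", "rename", "move", "delete", "update", "create"];
const BROAD_NOUNS = ["system", "module", "suite", "platform", "pipeline", "dashboard", "management"];

// Broad-scope noun count from the enforcement rule above.
const countDistinctSystems = (words) => words.filter((w) => BROAD_NOUNS.includes(w)).length;

function quickModeAllowed(prompt) {
  // Exclude code blocks from the word count, per the rules above.
  const words = prompt.replace(/```[\s\S]*?```/g, "").toLowerCase().split(/\s+/).filter(Boolean);
  if (words.length >= 20) return false;                              // must be under 20 words
  if (words.some((w) => FORCE_INTERVIEW.includes(w))) return false;  // any force-interview signal
  if (countDistinctSystems(words) >= 1) return false;                // broad-scope noun present
  return words.filter((w) => QUICK_VERBS.includes(w)).length === 1;  // exactly one action verb
}

console.log(quickModeAllowed("fix the login redirect in auth.ts"));            // true
console.log(quickModeAllowed("improve our billing system and sync invoices")); // false
```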
### Task types & templates
Detect task type from input. Each type has a dedicated template in `references/`:
| Type | Template | Use when |
|------|----------|----------|
| Feature | `feature-template.md` | New functionality (default fallback) |
| Bugfix | `bugfix-template.md` | Debug + fix |
| Refactor | `refactor-template.md` | Structural cleanup |
| Testing | `testing-template.md` | Test writing |
| API | `api-template.md` | Endpoint/API work |
| UI | `ui-template.md` | UI components |
| Security | `security-template.md` | Security audit/hardening |
| Docs | `docs-template.md` | Documentation |
| Content | `content-template.md` | Blog posts, articles, marketing copy |
| Research | `research-template.md` | Analysis/exploration |
| Marketing Swarm | `marketing-swarm-template.md` | Marketing-first multi-agent orchestration |
| Engineering Swarm | `engineering-swarm-template.md` | Engineering-first multi-agent orchestration |
| Ops Swarm | `ops-swarm-template.md` | Reliability/infra multi-agent orchestration |
| Research Swarm | `research-swarm-template.md` | Analysis/benchmark multi-agent orchestration |
| Repromptverse | `repromptverse-template.md` | Multi-agent routing + termination + evaluator loop |
| Multi-Agent | `swarm-template.md` | Basic multi-agent coordination |
| Team Brief | `team-brief-template.md` | Team orchestration brief |
**Priority** (most specific wins): marketing-swarm > engineering-swarm > ops-swarm > research-swarm > repromptverse > api > security > ui > testing > bugfix > refactor > content > docs > research > feature. For multi-agent tasks, use the best-fit swarm template + `repromptverse-template` + `team-brief-template`, then type-specific templates for each agent sub-prompt.
**How it works:** Read the matching template from `references/{type}-template.md`, then fill it with task-specific context. Templates are NOT loaded into context by default — only read on demand when generating a prompt. If the template file is not found, fall back to the Base XML Structure below.
> To add a new task type: create `references/{type}-template.md` following the XML structure below, then add it to the table above.
### Base XML structure
All templates follow this core structure (8 required tags). Use as fallback if no specific template matches:
Exception: `team-brief-template.md` uses Markdown format for orchestration briefs. This is intentional — see template header for rationale.
```xml
<role>{Expert role matching task type and domain}</role>
<context>
- Working environment, frameworks, tools
- Available resources, current state
</context>
<task>{Clear, unambiguous single-sentence task}</task>
<motivation>{Why this matters — priority, impact}</motivation>
<requirements>
- {Specific, measurable requirement 1}
- {At least 3-5 requirements}
</requirements>
<constraints>
- {What NOT to do}
- {Boundaries and limits}
</constraints>
<output_format>{Expected format, structure, length}</output_format>
<success_criteria>
- {Testable condition 1}
- {Measurable outcome 2}
</success_criteria>
```
### Project context detection
Auto-detect tech stack from current working directory ONLY:
- Scan `package.json`, `tsconfig.json`, `prisma/schema.prisma`, etc.
- Session-scoped — different directory = fresh context
- Opt out with "no context", "generic", or "manual context"
- Never scan parent directories or carry context between sessions
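A minimal sketch of the detection step, assuming Node's built-in `fs`; the marker list is illustrative (only the first three entries appear in the rules above), and the scan checks the current working directory only, never parents.
```js
// Illustrative stack detection: check well-known config files in the current
// working directory only. Marker list is a sample, not exhaustive.
const fs = require("fs");
const path = require("path");

const MARKERS = {
  "package.json": "Node.js",
  "tsconfig.json": "TypeScript",
  "prisma/schema.prisma": "Prisma",
  "pyproject.toml": "Python",
  "Cargo.toml": "Rust",
};

function detectProjectContext(cwd = process.cwd()) {
  return Object.entries(MARKERS)
    .filter(([file]) => fs.existsSync(path.join(cwd, file)))
    .map(([, label]) => label); // empty array => fall back to a generic prompt or ask
}

console.log(detectProjectContext());
```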
---
## Mode 2: Repromptverse (Agent Teams)
### TL;DR
```
Raw task in → quality output out. Every agent gets a reprompted prompt.
Phase 1: Score raw prompt, dimension interview if needed, plan team, show Agent Cards (YOU do this, ~45s)
Phase 2: Write XML-structured prompt per agent (YOU do this, ~2min)
Phase 3: Launch agents (tmux, TeamCreate, sessions_spawn, Codex, or sequential) (AUTOMATED)
Phase 4: Show Result Cards, score, retry if needed (YOU do this)
```
**Key insight:** The reprompt phase costs ZERO extra tokens — YOU write the prompts, not another AI.
### Repromptverse control plane (Microsoft-inspired)
Every multi-agent run must include:
1. **Routing policy** — who speaks next and why (selector-style routing for non-trivial teams)
2. **Termination policy** — max turns, max wall time, and no-progress stop condition
3. **Artifact contract** — one writer per output file, fixed schema for handoffs
4. **Evaluator loop** — score each artifact, retry only with delta prompts (max 2 retries)
Use `references/repromptverse-template.md` to enforce this contract.
Domain profile auto-load rules (lazy-load, on demand):
- Marketing intent (`campaign`, `launch`, `growth`, `seo`, `content calendar`, `funnel`) -> `references/marketing-swarm-template.md`
- Engineering intent (`architecture`, `feature delivery`, `refactor`, `migration`, `test coverage`) -> `references/engineering-swarm-template.md`
- Ops intent (`incident`, `uptime`, `gateway`, `latency`, `cron`, `slo`, `health`) -> `references/ops-swarm-template.md`
- Research intent (`benchmark`, `compare`, `tradeoff`, `analysis`, `research`) -> `references/research-swarm-template.md`
Then merge with `references/repromptverse-template.md` for routing/termination/evaluation contract and add task-specific constraints.
Canonical implementation for deterministic routing lives in `scripts/intent-router.js`.
If docs and code ever diverge, the script is the source of truth for benchmark/testing paths.
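`scripts/intent-router.js` remains the source of truth; the sketch below only illustrates the keyword-to-template mapping listed above, with first match winning and a fallback to the generic Repromptverse contract.
```js
// Illustration only: scripts/intent-router.js is the source of truth.
// Keyword lists are copied from the domain profile rules above.
const DOMAIN_PROFILES = [
  { template: "marketing-swarm-template.md",   keywords: ["campaign", "launch", "growth", "seo", "content calendar", "funnel"] },
  { template: "engineering-swarm-template.md", keywords: ["architecture", "feature delivery", "refactor", "migration", "test coverage"] },
  { template: "ops-swarm-template.md",         keywords: ["incident", "uptime", "gateway", "latency", "cron", "slo", "health"] },
  { template: "research-swarm-template.md",    keywords: ["benchmark", "compare", "tradeoff", "analysis", "research"] },
];

function routeDomainProfile(task) {
  const text = task.toLowerCase();
  const match = DOMAIN_PROFILES.find((p) => p.keywords.some((k) => text.includes(k)));
  return match ? match.template : "repromptverse-template.md"; // generic multi-agent contract
}

console.log(routeDomainProfile("Plan a product launch with an SEO content calendar"));
// => marketing-swarm-template.md
```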
### Phase 1: Team plan (~45 seconds)
1. **Score raw prompt** (1-10): Clarity, Specificity, Structure, Constraints, Decomposition
- Phase 1 uses 5 quick-assessment dimensions. The full 6-dimension scoring (adding Verifiability) is used in Phase 4 evaluation.
2. **Dimension Interview gate** — check which askable dimensions scored ≤ 5 (see Dimension Interview section below)
3. **Pick mode:** parallel (independent agents) or sequential (pipeline with dependencies)
4. **Define team:** 2-5 agents max, each owns ONE domain, no overlap (informed by interviewContext if interview ran)
5. **Show Plan Cards** (see Agent Cards section below)
6. **User confirmation gate** — "Team plan ready. Proceed to execution?" User can approve, adjust, or cancel. In automated/batch runs, auto-proceed.
7. **Write team brief** to `/tmp/rpt-brief-{taskname}.md` (use unique tasknames to avoid collisions; includes interviewContext section if interview ran)
### Dimension Interview (Repromptverse only)
Score-driven interview for Repromptverse mode. Distinct from Single mode's "Smart Interview" (which uses a standard question list). The Dimension Interview derives questions from low-scoring raw prompt dimensions.
#### Trigger logic
```
scores = score_raw_prompt(rawInput) # 5 dimensions from step 1
# Structure is EXCLUDED — reprompter fixes structure via templates.
# Only 4 dimensions are interview-eligible:
askable = [d for d in scores if d.name != "Structure" and d.value <= 5]
# Threshold: less-than-or-equal. Scores of 5 ARE borderline and trigger questions.
if len(askable) == 0:
    SKIP interview → proceed to step 3 (pick mode)
elif len(askable) <= 2:
    ASK 1-2 questions (one per low dimension)
else:
    ASK 3-4 questions (max 4, prioritized by lowest score first)
```
#### Dimension-to-question mapping
| Dimension | Score ≤ 5 triggers | Question approach |
|-----------|-------------------|-------------------|
| **Clarity** | Task is ambiguous or multi-interpretable | Open-ended with dynamic options extracted from prompt keywords |
| **Specificity** | Scope is vague, no concrete targets | Dynamic options from prompt keywords + top-level directory names |
| **Constraints** | No boundaries defined | "Any areas to exclude?" with context-aware options |
| **Decomposition** | Unclear work split | "How many independent streams?" with suggested splits |
**Question rules:**
- Use `AskUserQuestion` with clickable options (consistent with Single mode)
- Options are **dynamic**: extracted from prompt keywords + codebase context (config files + top-level dirs only — no deep analysis)
- Every question includes a free-text escape hatch option
- Priority order: lowest scoring dimension first
- Language follows user's input language
#### Skip/dismiss handling
- User skips all questions → proceed with empty interviewContext. Plan Cards note: "Interview: skipped by user"
- User answers some, skips others → populate only answered fields
#### Interview output (interviewContext)
Responses merge into an interviewContext written to the team brief file:
```
interviewContext = {
scope: [from Specificity answer],
excludes: [from Constraints answer],
successCriteria: [from answers, or omitted — Phase 2 derives from requirements],
taskClarification: [from Clarity answer, if asked]
}
```
When `successCriteria` is not gathered (question not asked or user skipped), omit the field. Phase 2 derives success criteria from requirements as it does today.
**How interviewContext feeds into later phases:**
- **Agent count and roles** — scope determines which agents are created
- **Per-agent `<constraints>`** — excludes injected into each agent's prompt
- **Per-agent `<success_criteria>`** — user expectations propagated
- **Template selection** — clarified task type may route to a different swarm profile
**Precedence:** Interview responses override auto-detected codebase context. Conflicts noted in Plan Cards.
**Flywheel:** interviewContext is excluded from recipe fingerprint hash. The fingerprint captures strategy (template + patterns + tier), not user scope answers.
### Agent Cards (transparency layer)
Three fixed-format card types rendered at different phases. Templates are exact — do not invent new formats.
#### Plan Cards — rendered at end of Phase 1 (step 5)
After team plan is complete, before Phase 2 prompt writing. Use this exact table format:
```markdown
## Team: {N} Opus Agents ({Parallel|Sequential})
| # | Agent | Scope | Excludes | Output |
|---|-------|-------|----------|--------|
| 1 | {role} | {scope} | {excludes or "-"} | {output path} |
| 2 | {role} | {scope} | {excludes or "-"} | {output path} |
Interview context applied: {summary of influence, including override conflicts, or "No interview (high-quality prompt)", or "Interview: skipped by user"}
```
**Rules:**
- MUST appear before any agent is launched
- If interview ran, show which constraints came from interview vs auto-detected
- If user requests agent adjustments at confirmation gate, re-render Plan Cards with updated team
- Single-agent runs: table renders with one row (valid)
#### Status Line — rendered during Phase 3 polling
Compact one-line status with each poll cycle:
```
Agents: ✅ 2/4 ⏳ 1/4 🔄 1/4 (retry 1)
```
**Emoji mapping:** ✅ = completed, ⏳ = in-progress, 🔄 = retrying
**Rules:**
- Replace verbose poll output with this compact format
- Platform-dependent: TeamCreate uses TaskList status; tmux uses best-effort pane parsing; sequential is trivial
- Show retry count for retrying agents
#### Result Cards — rendered at start of Phase 4
After reading all agent outputs, before synthesis. Use this exact table format:
```markdown
## Results
| Agent | Score | Findings | Key Insight |
|-------|-------|----------|-------------|
| {role} | {score}/10 {pass/retry emoji} | {count} findings | {one-sentence top finding} |
Total: {N} findings | {accepted}/{total} accepted | {retry_count} retries
```
**Rules:**
- MUST appear before synthesis is written
- "Key Insight" = single most important finding per agent (forces prioritization)
- Retry agents show retry reason in findings column
#### Token budget (Agent Cards + Dimension Interview)
| Phase | Extra tokens | Source |
|-------|-------------|--------|
| Phase 1 (interview) | 100-400 | AskUserQuestion calls (0-4 questions) + option generation from config/directory scan |
| Phase 1 (plan cards) | 100-300 | Table render (varies by team size) |
| Phase 3 (status) | ~20/poll | Compact status line |
| Phase 4 (result cards) | 150-250 | Summary table |
| **Total** | **~400-1000** | **0.5-2% of typical 50K-200K run** |
### Phase 2: Repromptverse prompt pack (~2 minutes)
For EACH agent:
1. Pick the best-matching template from `references/` (or use base XML structure)
2. Read it, then apply these **per-agent adaptations**:
- `<role>`: Specific expert title for THIS agent's domain
- `<context>`: Add exact file paths (verified with `ls`), what OTHER agents handle (boundary awareness)
- `<requirements>`: At least 5 specific, independently verifiable requirements
- `<constraints>`: Scope boundary with other agents, read-only vs write, file/directory boundaries
- `<output_format>`: Exact path `/tmp/rpt-{taskname}-{agent-domain}.md`, required sections
- `<success_criteria>`: Minimum N findings, file:line references, no hallucinated paths
**Score each prompt — target 8+/10.** If under 8, add more context/constraints.
Write all to `/tmp/rpt-agent-prompts-{taskname}.md`
#### Reprompt quality scorecard (mandatory)
After writing all agent prompts, show the before/after comparison so the user sees the improvement:
```markdown
## Reprompt Quality
| Metric | Raw prompt | After reprompt | Change |
|--------|-----------|----------------|--------|
| Overall | {raw}/10 | {after}/10 | +{pct}% |
| Per-agent avg | - | {avg}/10 | - |
| Agents | - | {N} | - |
Raw prompt scored {raw}/10. After reprompting, each agent prompt scores {min}-{max}/10 (avg {avg}/10).
```
**Rules:**
- MUST appear after Phase 2 prompt generation, before Phase 3 execution
- Shows the user exactly how much reprompter improved their input
- If any agent prompt scores < 8, note which ones and what was added to fix them
### Phase 3: Execute
Phase 3 has platform-specific execution methods. Pick the one that matches your environment. The reprompted prompts from Phase 2 work with any method.
**Status Line (all platforms):** During polling, show compact agent status with each cycle. See Agent Cards section for format.
#### Option A: tmux (Claude Code)
```bash
# 1. Start Claude Code with Agent Teams
tmux new-session -d -s {session} "cd /path/to/workdir && CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 claude --model opus"
# placeholders:
# - {session}: unique tmux session name (example: rpt-auth-audit)
# - /path/to/workdir: absolute repository path for the target project (example: /tmp/reprompter-check)
# 2. Wait for startup
sleep 12
# 3. Send prompt — MUST use -l (literal), Enter SEPARATE
# IMPORTANT: Include POLLING RULES to prevent lead TaskList loop bug
tmux send-keys -t {session} -l 'Create an agent team with N teammates. CRITICAL: Use model opus for ALL tasks.
POLLING RULES — YOU MUST FOLLOW THESE:
- After sending tasks, poll TaskList at most 10 times
- If ALL tasks show "done" status, IMMEDIATELY stop polling
- After 3 consecutive TaskList calls showing the same status, STOP polling regardless
- Once you stop polling: read the output files, then write synthesis
- DO NOT call TaskList more than 20 times total under any circumstances
Teammate 1 (ROLE): TASK. Write output to /tmp/rpt-{taskname}-{domain}.md. ... After all complete, synthesize into /tmp/rpt-{taskname}-final.md'
sleep 0.5
tmux send-keys -t {session} Enter
# 4. Monitor (poll every 15-30s) — show Status Line: Agents: ✅ N/T ⏳ N/T 🔄 N/T
tmux capture-pane -t {session} -p -S -100
# 5. Verify outputs
ls -la /tmp/rpt-{taskname}-*.md
# 6. Cleanup
tmux kill-session -t {session}
```
#### Critical tmux rules
⚠️ **WARNING: Default teammate model is HAIKU unless explicitly overridden. Always set `--model opus` in both CLI launch command and team prompt.**
| Rule | Why |
|------|-----|
| Always `send-keys -l` (literal flag) | Without it, special chars break |
| Enter sent SEPARATELY | Combined fails for multiline |
| sleep 0.5 between text and Enter | Buffer processing time |
| sleep 12 after session start | Claude Code init time |
| `--model opus` in CLI AND prompt | Default teammate = HAIKU |
| Each agent writes own file | Prevents file conflicts |
| Unique taskname per run | Prevents collisions between concurrent sessions |
### Phase 4: Evaluate + retry
1. Read each agent's report
2. Score against success criteria from Phase 2:
- 8+/10 → ACCEPT
- 4-7/10 → RETRY with delta prompt (tell them what's missing)
- < 4/10 → RETRY with full rewrite
**Accept checklist** (use alongside score — all must pass):
- [ ] All required output sections present
- [ ] Requirements from Phase 2 independently verifiable
- [ ] No hallucinated file paths or line numbers
- [ ] Scope boundaries respected (no overlap with other agents)
3. Max 2 retries (3 total attempts)
4. **Show Result Cards** — render summary table before synthesis (see Agent Cards section for format)
5. Deliver final report to user
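A compact sketch of that decision, assuming the thresholds and the two-retry cap above; the record fields and the behavior after the final attempt are illustrative assumptions, not the skill's runtime.
```js
// Illustrative Phase 4 decision combining the score thresholds and the
// accept checklist. Field names and the post-cap behavior are assumptions.
function evaluateArtifact({ score, checklistPassed, attempt }) {
  const MAX_ATTEMPTS = 3; // 1 initial attempt + 2 retries
  if (score >= 8 && checklistPassed) return { action: "accept" };
  if (attempt >= MAX_ATTEMPTS) return { action: "deliver-with-caveats" };
  return { action: "retry", mode: score >= 4 ? "delta" : "full-rewrite" };
}

console.log(evaluateArtifact({ score: 5, checklistPassed: false, attempt: 1 }));
// => { action: "retry", mode: "delta" }
```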
**Delta prompt pattern:**
```
Previous attempt scored 5/10.
✅ Good: Sections 1-3 complete
❌ Missing: Section 4 empty, line references wrong
This retry: Focus on gaps. Verify all line numbers.
```
### Expected cost & time
| Team size | Time | Cost |
|-----------|------|------|
| 2 agents | ~5-8 min | ~$1-2 |
| 3 agents | ~8-12 min | ~$2-3 |
| 4 agents | ~10-15 min | ~$2-4 |
Estimates cover Phase 3 (execution) only. Add ~3 minutes for Phases 1-2 and ~5-8 minutes per retry. Each agent uses ~25-70% of its 200K token context window.
#### Option B: TeamCreate (Claude Code native)
When using Claude Code with TeamCreate/SendMessage tools (native agent teams, no tmux needed):
```text
# 1. Create team
TeamCreate(team_name="rpt-{taskname}", description="Repromptverse: {task summary}")
# 2. Create tasks (one per agent)
TaskCreate(subject="Agent 1 task", description="Full reprompted prompt from Phase 2")
TaskCreate(subject="Agent 2 task", description="Full reprompted prompt from Phase 2")
# 3. Spawn teammates (MUST specify model=opus)
Task(subagent_type="general-purpose", team_name="rpt-{taskname}", name="agent-1", model="opus",
prompt="You are {role} on the rpt-{taskname} team. Your task is Task #1. [full prompt]",
run_in_background=true)
Task(subagent_type="general-purpose", team_name="rpt-{taskname}", name="agent-2", model="opus",
prompt="You are {role} on the rpt-{taskname} team. Your task is Task #2. [full prompt]",
run_in_background=true)
# 4. Wait for teammates to complete — show Status Line per poll cycle
# Status Line: Agents: ✅ N/T ⏳ N/T 🔄 N/T (derived from TaskList status)
# 5. Compile synthesis from teammate reports
# 6. Shutdown teammates and delete team
SendMessage(type="shutdown_request", recipient="agent-1")
TeamDelete()
```
**Advantages over tmux:** Teammates can message each other (cross-agent flags), shared TaskList for progress tracking, no tmux/terminal dependency, built-in idle/shutdown protocol.
**When to use TeamCreate vs tmux:** Use TeamCreate when agents need to communicate (review teams, audit teams). Use tmux when agents are fully independent and you want visible terminal panes.
#### Option C: sessions_spawn (OpenClaw only)
When tmux/Claude Code is unavailable but running inside OpenClaw:
```
sessions_spawn(task: "<per-agent prompt>", model: "opus", label: "rpt-{role}")
```
Note: `sessions_spawn` is an OpenClaw-specific tool. Not available in standalone Claude Code.
#### Option D: Codex parallel sessions (Codex runtime)
When running in Codex:
1. Create one session per agent role (or use native subagents if available).
2. Send each session the corresponding prompt from `/tmp/rpt-agent-prompts-{taskname}.md`.
3. Require each session to write exactly one artifact to `/tmp/rpt-{taskname}-{agent-domain}.md`.
4. Poll for artifact completion every 15-30s with a hard cap (max 40 polls total).
5. Run Phase 4 evaluator loop and merge to `/tmp/rpt-{taskname}-final.md`.
If Codex parallel sessions are not available, immediately fall back to Option E.
#### Option E: Sequential (any LLM)
No parallel execution tools available? Run each agent's reprompted prompt one at a time in the same session. Works with any LLM (Claude, GPT, Gemini, Codex, etc.). Slower but fully platform-agnostic.
The reprompted prompts from Phase 2 are pure text. They work regardless of execution method.
---
## Quality scoring
**Always show before/after metrics:**
| Dimension | Weight | Criteria |
|-----------|--------|----------|
| Clarity | 20% | Task unambiguous? |
| Specificity | 20% | Requirements concrete? |
| Structure | 15% | Proper sections, logical flow? |
| Constraints | 15% | Boundaries defined? |
| Verifiability | 15% | Success measurable? |
| Decomposition | 15% | Work split cleanly? (Score 10 if task is correctly atomic) |
```markdown
| Dimension | Before | After | Change |
|-----------|--------|-------|--------|
| Clarity | 3/10 | 9/10 | +200% |
| Specificity | 2/10 | 8/10 | +300% |
| Structure | 1/10 | 10/10 | +900% |
| Constraints | 0/10 | 7/10 | new |
| Verifiability | 2/10 | 8/10 | +300% |
| Decomposition | 0/10 | 8/10 | new |
| **Overall** | **1.45/10** | **8.35/10** | **+476%** |
```
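The Overall row is the weighted sum of the per-dimension scores, using the weights from the table above. This small sketch reproduces the example's 1.45 and 8.35:
```js
// Weights from the scoring table above; reproduces the example's totals.
const WEIGHTS = {
  clarity: 0.20, specificity: 0.20, structure: 0.15,
  constraints: 0.15, verifiability: 0.15, decomposition: 0.15,
};

const overallScore = (scores) =>
  Object.entries(WEIGHTS).reduce((sum, [dim, w]) => sum + w * (scores[dim] ?? 0), 0);

const before = { clarity: 3, specificity: 2, structure: 1, constraints: 0, verifiability: 2, decomposition: 0 };
const after  = { clarity: 9, specificity: 8, structure: 10, constraints: 7, verifiability: 8, decomposition: 8 };
console.log(overallScore(before).toFixed(2)); // 1.45
console.log(overallScore(after).toFixed(2));  // 8.35
```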
> **Bias note:** Scores are self-assessed. Treat as directional indicators, not absolutes.
---
## Closed-loop quality (v6.0+)
For both modes, RePrompter supports post-execution evaluation:
1. **IMPROVE** — Score raw → generate structured prompt
2. **EXECUTE** — **Repromptverse mode only**: route to agent(s), collect output. **Single mode does not execute code/commands; it only generates prompts.**
3. **EVALUATE** — Score output/prompt against success criteria (0-10)
4. **RETRY** — Thresholds: Single mode retry if score < 7; Repromptverse retry if score < 8. Max 2 retries.
---
## Advanced features
### Reasoning-friendly prompting (Claude 4.x)
Prompts should be less prescriptive about HOW. Focus on WHAT — clear task, requirements, constraints, success criteria. Let the model's own reasoning handle execution strategy.
**Example:** Instead of "Step 1: read the file, Step 2: extract the function" → "Extract the authentication logic from auth.ts into a reusable middleware. Requirements: ..."
### Response prefilling (API only)
Prefill assistant response start to enforce format:
- `{` → forces JSON output
- `## Analysis` → skips preamble, starts with content
- `| Column |` → forces table format
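A sketch of prefilling with the Anthropic Messages API: the trailing assistant turn seeds the start of the reply, forcing JSON here. The model id is a placeholder and error handling is omitted; adapt to your SDK version.
```js
// Prefill sketch with the Anthropic Messages API: the trailing assistant turn
// seeds the reply, forcing JSON here. Model id is a placeholder.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function getJsonAnalysis(prompt) {
  const msg = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    messages: [
      { role: "user", content: prompt },
      { role: "assistant", content: "{" }, // prefill: the reply continues this JSON object
    ],
  });
  return "{" + msg.content[0].text; // re-attach the prefilled brace
}
```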
### Context engineering
Generated prompts should COMPLEMENT runtime context (CLAUDE.md, skills, MCP tools), not duplicate it. Before generating:
1. Check what context is already loaded (project files, skills, MCP servers)
2. Reference existing context: "Using the project structure from CLAUDE.md..."
3. Add ONLY what's missing — avoid restating what the model already knows
### Capability policy routing (OpenClaw + multi-LLM)
When multiple providers/models are available, route each agent by capability tier:
- `reasoning_high`: audits, synthesis, high-risk tasks
- `long_context`: very large context windows or broad codebase scans
- `cost_optimized` / `latency_optimized`: low-risk triage and bulk tasks
- Always emit fallback chain with provider diversity (avoid single-provider hard dependency)
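One way to express such a policy is a routing table keyed by tier, each with an ordered fallback chain. Tier names follow the list above; the provider and model identifiers below are placeholders, not recommendations.
```js
// Illustrative capability policy: tier -> ordered fallback chain.
// Model ids are placeholders; substitute whatever providers you run.
const CAPABILITY_POLICY = {
  reasoning_high:    ["provider-a/large-reasoning", "provider-b/large"],
  long_context:      ["provider-b/long-context", "provider-a/large"],
  cost_optimized:    ["provider-c/small", "provider-a/small"],
  latency_optimized: ["provider-c/fast", "provider-b/small"],
};

function resolveModel(tier, unavailable = new Set()) {
  const chain = CAPABILITY_POLICY[tier] ?? CAPABILITY_POLICY.reasoning_high;
  return chain.find((m) => !unavailable.has(m)) ?? chain[0];
}

console.log(resolveModel("long_context", new Set(["provider-b/long-context"])));
// => provider-a/large
```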
### Budgeted layered context
Build per-agent context in layers with explicit budgets:
1. Task contract (always preserved)
2. Local code facts
3. Selected references
4. Prior artifacts/handoffs
Emit a context manifest (used tokens, truncation flags, dropped entries) so retries are reproducible and debuggable.
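A sketch of budgeted assembly, assuming a rough four-characters-per-token estimate; layer names follow the list above and the manifest fields approximate the ones described, but the helper itself is illustrative.
```js
// Illustrative layered assembly with per-layer budgets and a context manifest.
// Token counts use a rough 4-characters-per-token estimate.
const estimateTokens = (text) => Math.ceil(text.length / 4);

function buildAgentContext(layers, budgets) {
  const manifest = [];
  const parts = [];
  for (const { name, text } of layers) {
    const budget = budgets[name] ?? Infinity;
    const tokens = estimateTokens(text);
    const truncated = tokens > budget;
    parts.push(truncated ? text.slice(0, budget * 4) : text);
    manifest.push({ layer: name, usedTokens: Math.min(tokens, budget), truncated });
  }
  return { context: parts.join("\n\n"), manifest };
}

const { manifest } = buildAgentContext(
  [
    { name: "task_contract", text: "<task>Audit auth flows in src/auth</task>" },
    { name: "code_facts", text: "src/auth/login.ts exports login(), refresh() ..." },
  ],
  { task_contract: Infinity, code_facts: 8 } // the task contract is always preserved
);
console.log(manifest);
```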
### Strict artifact gate
Before synthesis, evaluate each artifact for:
- Required section coverage
- Verifiability (file:line refs when required)
- Boundary compliance (forbidden-pattern checks)
- Overall weighted score threshold
If gate fails, retry only with delta prompts (max 2 retries).
Implementation note: combine routing + patterns + model policy + context + adapter + evaluator through a single orchestration contract (`scripts/repromptverse-runtime.js`) to keep behavior deterministic across runtimes.
### Runtime feature flags
Repromptverse runtime supports deterministic toggles for rollout and troubleshooting:
- `REPROMPTER_POLICY_ENGINE=0|1` — disable/enable capability-based model routing
- `REPROMPTER_LAYERED_CONTEXT=0|1` — disable/enable layered context assembly
- `REPROMPTER_STRICT_EVAL=0|1` — disable/enable strict artifact evaluator defaults
- `REPROMPTER_PATTERN_LIBRARY=0|1` — disable/enable pattern selector activation
- `REPROMPTER_TELEMETRY=0|1` — disable/enable runtime telemetry emission for observability reports
- `REPROMPTER_FLYWHEEL=0|1` — disable/enable Prompt Flywheel outcome learning (v9.0+)
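The flags are plain environment variables, so reading them can stay deterministic. The defaults below are assumptions for illustration; the runtime's own defaults govern.
```js
// Read a 0/1 feature flag from the environment. Fallback values here are
// assumptions for illustration, not the runtime's actual defaults.
const flag = (name, fallback = true) =>
  process.env[name] === undefined ? fallback : process.env[name] === "1";

const config = {
  policyEngine:   flag("REPROMPTER_POLICY_ENGINE"),
  layeredContext: flag("REPROMPTER_LAYERED_CONTEXT"),
  strictEval:     flag("REPROMPTER_STRICT_EVAL"),
  patternLibrary: flag("REPROMPTER_PATTERN_LIBRARY"),
  telemetry:      flag("REPROMPTER_TELEMETRY"),
  flywheel:       flag("REPROMPTER_FLYWHEEL"),
};
```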
### Telemetry and observability
Every Repromptverse run should emit stage-level telemetry events with `runId`, `taskId`, stage name, status, latency, and provider/model where applicable.
- Event stages: `route_intent`, `select_patterns`, `resolve_model`, `build_context`, `plan_ready`, `spawn_agent`, `poll_artifacts`, `evaluate_artifact`, `finalize_run`, `fingerprint_recipe`, `collect_outcome`, `learn_strategy`
- Storage: `.reprompter/telemetry/events.ndjson`
- Report command: `npm run telemetry:report`
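A sketch of an NDJSON emitter matching the storage path and field names above; the timestamp field and helper name are illustrative additions.
```js
// Illustrative telemetry emitter: one JSON object per line (NDJSON),
// appended to the events file named above. Core fields follow the doc.
const fs = require("fs");
const path = require("path");

function emitEvent(event) {
  const record = { ts: new Date().toISOString(), ...event };
  const file = path.join(".reprompter", "telemetry", "events.ndjson");
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.appendFileSync(file, JSON.stringify(record) + "\n");
}

emitEvent({
  runId: "run-01", taskId: "auth-audit", stage: "plan_ready",
  status: "ok", latencyMs: 1840, model: "opus",
});
```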
### Prompt Flywheel (v9.0+)
Closed-loop outcome learning system. Every prompt reprompter generates carries a **recipe fingerprint** — a deterministic hash of the strategy decisions (template, patterns, capability tier, domain, context layers, quality bucket). After execution, **outcome signals** are passively collected and linked back to the fingerprint.
#### Flywheel user guidance
When the flywheel has enough historical data to influence a recommendation, the AI agent should communicate this to the user concisely:
**When to show flywheel info:**
- Show a brief one-liner when flywheel bias is applied to a plan (e.g., "Flywheel: using constraint-first pattern based on 8 past runs (score 8.7, high confidence)")
- Show when the recommended strategy differs from what would have been selected without historical data
- If the flywheel recommends a different template (via `flywheelBias.template`), prefer that template for prompt generation in Phase 2 unless the user explicitly overrides
**Template bias:** When `flywheelBias.template` is set, use that template ID for prompt generation instead of the default intent-routed template. This is the most impactful flywheel signal — template choice shapes the entire prompt structure. Log the override: "Flywheel: using {template} (historically {score}/10 over {N} runs)"
**When NOT to show flywheel info:**
- No outcome data exists yet (cold start) — do not mention the flywheel at all
- Confidence is `insufficient` (<2 samples) or `low` (<5 samples) — silently skip, no user-facing note
- Bias lookup found data but no changes were applied — nothing to report
**Format:** Always a single inline note, never a table or multi-line block. Example:
> Flywheel: preferring `security-template` + `self-critique-checkpoint` pattern (9 runs, score 8.3/10, high confidence)
**Privacy:** All flywheel data is local (`.reprompter/flywheel/`). Never reference specific past prompts, tasks, or user content in flywheel messages — only aggregate statistics (run count, score, confidence level).
**All data is stored locally.** Nothing is transmitted anywhere. Storage: `.reprompter/flywheel/outcomes.ndjson`.
**How it works:**
1. **Fingerprint** — At `plan_ready`, the recipe vector (template + patterns + tier + domain + layers + quality bucket) is hashed into a 16-char fingerprint
2. **Outcome collection** — At `finalize_run`, passive signals are captured: artifact evaluator score/pass, retry count, execution time. Linked to the recipe fingerprint.
3. **Strategy learning** — On future runs, the learner queries the outcome ledger for similar past tasks, scores each recipe group (time-decay weighted), and recommends the historically best-performing strategy
**Effectiveness scoring:**
- Base: artifact evaluator score
- Penalties: retries (-0.5 each), post-corrections (-0.3 each, capped at -2.0)
- Bonus: first-attempt pass (+0.5)
- Overrides: explicit user reject (caps at 3.0), explicit user accept (floors at 7.0)
**Time decay:** 7-day half-life. Recent outcomes weigh more. Month-old outcomes have <10% influence.
**Confidence levels:** high (10+ samples), medium (5-9), low (2-4), insufficient (<2, no recommendation made).
Report command: `npm run flywheel:report`
Benchmark command: `npm run benchmark:flywheel`
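The effectiveness formula and decay weighting translate directly into code. This sketch follows the rules above; the outcome record shape is an assumption, not the stored schema.
```js
// Illustrative effectiveness scoring and time-decay weighting from the rules
// above. The outcome record shape is a sketch, not the stored schema.
function effectivenessScore(o) {
  let score = o.evaluatorScore;                     // base: artifact evaluator score
  score -= 0.5 * o.retries;                         // -0.5 per retry
  score -= Math.min(0.3 * o.postCorrections, 2.0);  // -0.3 each, capped at -2.0
  if (o.firstAttemptPass) score += 0.5;             // first-attempt bonus
  if (o.userRejected) score = Math.min(score, 3.0); // explicit reject caps at 3.0
  if (o.userAccepted) score = Math.max(score, 7.0); // explicit accept floors at 7.0
  return Math.max(0, Math.min(10, score));
}

// 7-day half-life: an outcome's weight halves every week.
const decayWeight = (ageDays) => Math.pow(0.5, ageDays / 7);

const outcome = { evaluatorScore: 8.5, retries: 1, postCorrections: 0, firstAttemptPass: false };
console.log(effectivenessScore(outcome));     // 8
console.log(decayWeight(30).toFixed(3));      // ~0.051, i.e. month-old runs have <10% influence
```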
### Pattern library (pluggable)
Treat prompt/context engineering advancements as toggleable patterns (not fixed doctrine):
- Constraint-first framing
- Uncertainty labeling
- Self-critique checkpoint
- Delta retry scaffold
- Evidence-strength labeling
- Context-manifest transparency
Activate by task/domain/outcome profile and validate via benchmark fixtures.
### Token budget
Keep generated prompts under ~2K tokens for single mode, ~1K per agent for Repromptverse. Longer prompts waste context window without improving quality. If a prompt exceeds budget, split into phases or move detail into constraints.
### Uncertainty handling
Always include explicit permission for the model to express uncertainty rather than fabricate:
- Add to constraints: "If unsure about any requirement, ask for clarification rather than assuming"
- For research tasks: "Clearly label confidence levels (high/medium/low) for each finding"
- For code tasks: "Flag any assumptions about the codebase with TODO comments"
---
## Settings (for Repromptverse mode)
> Note: `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` is an experimental flag that may change in future Claude Code versions. Check [Claude Code docs](https://docs.anthropic.com/en/docs/claude-code) for current status.
In `~/.claude/settings.json`:
```json
{
"env": {
"CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
},
"preferences": {
"teammateMode": "tmux",
"model": "opus"
}
}
```
| Setting | Values | Effect |
|---------|--------|--------|
| `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` | `"1"` | Enables agent team spawning |
| `teammateMode` | `"tmux"` / `"default"` | `tmux`: each teammate gets a visible split pane. `default`: teammates run in background |
| `model` | `"opus"` / `"sonnet"` | Teammates default to Haiku. Always set `model: opus` explicitly in your prompt — do not rely on runtime defaults. |
---
## Proven results
### Single prompt (v6.0)
Rough crypto dashboard prompt: **1.6/10 → 9.0/10** (+462%)
### Repromptverse E2E (v6.1)
3 Opus agents, sequential pipeline (PromptAnalyzer → PromptEngineer → QualityAuditor):
| Metric | Value |
|--------|-------|
| Original score | 2.15/10 |
| After Repromptverse | **9.15/10** (+326%) |
| Quality audit | PASS (99.1%) |
| Weaknesses found → fixed | 24/24 (100%) |
| Cost | $1.39 |
| Time | ~8 minutes |
### Repromptverse vs raw Agent Teams (v7.0)
Same audit task, 4 Opus agents:
| Metric | Raw | Repromptverse | Delta |
|--------|-----|----------------|-------|
| CRITICAL findings | 7 | 14 | +100% |
| Total findings | ~40 | 104 | +160% |
| Cost savings identified | $377/mo | $490/mo | +30% |
| Token bloat found | 45K | 113K | +151% |
| Cross-validated findings | 0 | 5 | — |
---
## Tips
- **More context = fewer questions** — mention tech stack, files
- **"expand"** — if Quick Mode gave too simple a result, re-run with full interview
- **"quick"** — skip interview for simple tasks
- **"no context"** — skip auto-detection
- Context is per-project — switching directories = fresh detection
---
## Test scenarios
See [TESTING.md](TESTING.md) for 44 verification scenarios + anti-pattern examples.
---
## Appendix: Extended XML tags
Templates may add domain-specific tags beyond the 8 required base tags. Always include all base tags first.
| Extended Tag | Used In | Purpose |
|-------------|---------|---------|
| `<symptoms>` | bugfix | What the user sees, error messages |
| `<investigation_steps>` | bugfix | Systematic debugging steps |
| `<endpoints>` | api | Endpoint specifications |
| `<component_spec>` | ui | Component props, states, layout |
| `<agents>` | swarm | Agent role definitions |
| `<task_decomposition>` | swarm | Work split per agent |
| `<coordination>` | swarm | Inter-agent handoff rules |
| `<routing_policy>` | repromptverse | Speaker and router policy |
| `<termination_policy>` | repromptverse | Max turn/time and stop conditions |
| `<artifact_contract>` | repromptverse | Output schema and ownership |
| `<evaluation_loop>` | repromptverse | Score thresholds and retry policy |
| `<research_questions>` | research | Specific questions to answer |
| `<methodology>` | research | Research approach and methods |
| `<reasoning>` | research | Reasoning notes space (non-sensitive, concise) |
| `<current_state>` | refactor | Before state of the code |
| `<target_state>` | refactor | Desired after state |
| `<coverage_requirements>` | testing | What needs test coverage |
| `<threat_model>` | security | Threat landscape and vectors |
| `<structure>` | docs | Document organization |
| `<reference>` | docs | Source material to reference |