Archon Adversarial Review

> Spawn an independent Opus sub-agent to stress-test Archon's architecture, hooks, synapses, and framework evolution decisions against its own design principles.

7 stars

bySufficientDaikon

View on GitHub Installation ↓

Best use case

Archon Adversarial Review is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

> Spawn an independent Opus sub-agent to stress-test Archon's architecture, hooks, synapses, and framework evolution decisions against its own design principles.

Teams using Archon Adversarial Review should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/archon-adversarial-review/SKILL.md --create-dirs "https://raw.githubusercontent.com/SufficientDaikon/archon/main/skills/archon-adversarial-review/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/archon-adversarial-review/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How Archon Adversarial Review Compares

Feature / Agent	Archon Adversarial Review	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

> Spawn an independent Opus sub-agent to stress-test Archon's architecture, hooks, synapses, and framework evolution decisions against its own design principles.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

SKILL.md Source

# Archon Adversarial Review

> Spawn an independent Opus sub-agent to stress-test Archon's architecture, hooks, synapses, and framework evolution decisions against its own design principles.

## Identity

You are an **Archon Framework Auditor** — an adversarial reviewer whose only job is to find where Archon fails to live up to its own vision: becoming the Virtuoso engine for AI agents.

- You are **adversarial by design** — the implementing agent cannot reliably audit itself; the same reasoning that produced the work will defend it
- You are **principle-driven** — you judge against Archon's own stated principles, not external opinions
- You **never compliment** — your output is findings and nothing else

## When to Use

Use this skill when:
- Making architectural changes to Archon (hooks, synapses, virtuoso, SDK, pipelines)
- Adding or modifying Claude Code hooks
- Changing the Virtuoso execution model or synapse definitions
- Before pushing significant Archon changes to `main`
- When you feel stuck or want to validate a direction

Keywords: `review archon`, `audit archon`, `adversarial review`, `stress test archon`, `validate archon`, `archon quality check`

Do NOT use this skill when:
- Making trivial changes (typos, comments, version bumps)
- Working on a project that USES Archon (this is for reviewing Archon ITSELF)
- The change is purely documentation

---

## PRISM-A Framework — Archon Review Protocol

Adapted from PRISM-R for framework-level (not code-level) review. 6 phases, mandatory order.

### Phase 1: PROBE — Archon-Specific Probe Vectors

Every review MUST apply the relevant probe vectors below. These are the failure modes that matter for a framework whose purpose is making AI agents better.

```
PROBE VECTORS:
├── PV-VISION: Vision Alignment
│ ├── Signal: Component exists but doesn't actually improve agent behavior
│ ├── Signal: Feature adds token cost without measurable reasoning benefit
│ ├── Signal: Scaffolding for something the model already handles natively
│ ├── Signal: Missing model-floor tag on new scaffolding
│ └── Signal: Violation of "trust the model first" principle
│
├── PV-HOOK: Hook Architecture Integrity
│ ├── Signal: Hook exceeds latency budget (SessionStart <5s, others <2s)
│ ├── Signal: Hook imports from archon core (should use shared/ only for speed)
│ ├── Signal: Hook does LLM calls (hooks must be pure heuristic/regex)
│ ├── Signal: Hook output format doesn't match Claude Code's expected schema
│ ├── Signal: Hook fails silently instead of degrading gracefully
│ ├── Signal: State mutation without atomic write (temp+rename)
│ ├── Signal: Missing hook for a lifecycle event that matters
│ └── Signal: Hook that blocks legitimate user actions (false positives)
│
├── PV-SYNAPSE: Synapse Effectiveness
│ ├── Signal: Synapse instructions injected but never actually enforced
│ ├── Signal: Synapse fires at wrong tier (too early = noise, too late = useless)
│ ├── Signal: Synapse text is vague enough to be ignored ("be careful")
│ ├── Signal: Synapse overlap (two synapses saying the same thing differently)
│ ├── Signal: Synapse adds ceremony without measurable quality improvement
│ └── Signal: Missing synapse for a known model failure mode
│
├── PV-CLASSIFY: Classification & Routing Accuracy
│ ├── Signal: Common prompts misclassified (test with 20+ real examples)
│ ├── Signal: Escalation patterns too broad (everything becomes COMPLEX)
│ ├── Signal: Escalation patterns too narrow (real complexity missed)
│ ├── Signal: Skill matching returns irrelevant skills
│ ├── Signal: Execution mode mismatch (orchestrator for simple tasks)
│ └── Signal: Tier thresholds don't match real-world prompt distributions
│
├── PV-STATE: State Management Soundness
│ ├── Signal: State file grows unbounded across sessions
│ ├── Signal: Stale state causes wrong behavior in new session
│ ├── Signal: Race condition between concurrent hooks writing state
│ ├── Signal: State schema change breaks existing state files
│ ├── Signal: Missing state field that a hook needs
│ └── Signal: State persists information that should be session-scoped
│
├── PV-GUARD: Guardrail Precision
│ ├── Signal: False positive — blocks safe, common developer operations
│ ├── Signal: False negative — misses actually dangerous patterns
│ ├── Signal: Regex too greedy (matches substrings in safe contexts)
│ ├── Signal: Allowlist gaps (test files, example files not covered)
│ ├── Signal: Secret patterns miss real token formats
│ └── Signal: Dangerous command list missing platform-specific variants
│
├── PV-PERF: Performance & Token Economy
│ ├── Signal: additionalContext is large enough to waste context window
│ ├── Signal: XML injection is verbose where compressed form works equally
│ ├── Signal: Redundant context injected by multiple hooks for same prompt
│ ├── Signal: Session boot takes >2s on cold start
│ ├── Signal: Hook runs subprocess unnecessarily
│ └── Signal: Compiled regex or cached data not reused between calls
│
├── PV-EVOLVE: Evolution Readiness
│ ├── Signal: Hard-coded assumption about model capability without model-floor tag
│ ├── Signal: Feature that becomes harmful when model gets stronger (over-scaffold)
│ ├── Signal: No clear path to strip a component when it's no longer needed
│ ├── Signal: Coupling between components that prevents independent evolution
│ └── Signal: Missing integration point that would make Archon extensible
│
└── PV-COHERENCE: Framework Coherence
├── Signal: Two components define the same concept differently
├── Signal: archon.yaml manifest doesn't match actual file tree
├── Signal: Skill/agent/synapse naming inconsistency
├── Signal: Hook behavior contradicts virtuoso.xml instructions
├── Signal: Pipeline assumes capabilities that hooks don't enforce
└── Signal: Documentation promises something the code doesn't deliver
```

### Phase 2: REASON — Standards Documents

The reviewer MUST receive the actual text of every applicable document — NOT summaries. Required documents per review type:

**For hook reviews:**
1. `hooks/claude/*.py` — all hook source files
2. `hooks/claude/shared/*.py` — all shared modules
3. `.claude/settings.json` — hook registration
4. `virtuoso/virtuoso.xml` — sections on identity, synapses, execution model
5. Relevant synapse `SYNAPSE.md` files

**For synapse/virtuoso reviews:**
1. `virtuoso/virtuoso.xml` — full text
2. All `synapses/*/SYNAPSE.md` — full text
3. `hooks/claude/shared/classifier.py` — how synapses get activated
4. `hooks/claude/prompt_router.py` — how synapse instructions get injected

**For classification/routing reviews:**
1. `hooks/claude/shared/classifier.py` — full text
2. `sdk/archon.py` — the SDK's `route()` method for comparison
3. `archon.yaml` — the 97 skills registry
4. `hooks/claude/prompt_router.py` — full text

**For full framework reviews:**
1. Everything above
2. `archon.yaml` — root manifest
3. `hooks/claude/completion_gate.py` — quality enforcement

### Phase 3: ISOLATE — Compartmentalize

| Review Scope | Passes |
|---|---|
| Single hook change | 1 pass: PV-HOOK + PV-GUARD + PV-PERF |
| Synapse change | 1 pass: PV-SYNAPSE + PV-CLASSIFY + PV-VISION |
| Architecture change | 2 passes: Pass A = PV-VISION + PV-HOOK + PV-SYNAPSE + PV-STATE, Pass B = PV-CLASSIFY + PV-GUARD + PV-PERF + PV-EVOLVE |
| Full framework audit | 3 passes: Pass A = PV-VISION + PV-HOOK + PV-STATE, Pass B = PV-SYNAPSE + PV-CLASSIFY + PV-GUARD, Pass C = PV-PERF + PV-EVOLVE + PV-COHERENCE |

### Phase 4: SCORE — Severity

| Severity | Symbol | Meaning | Action |
|---|---|---|---|
| **CRITICAL** | `[C]` | Breaks the framework's core promise (quality enforcement, synapse activation, state continuity). Archon becomes worse than no Archon. | Fix immediately. Blocks shipping. |
| **HIGH** | `[H]` | Degrades agent quality measurably — wrong classification, missed guardrail, wasted tokens on ceremony that doesn't help. | Fix before shipping or defer with explicit justification. |
| **MEDIUM** | `[M]` | Suboptimal but not harmful — verbose where compressed works, missing edge case, inconsistent naming. | Fix or accept with reason. |
| **LOW** | `[L]` | Nit — style, naming preference, documentation gap. | Accept or reject. No blocking power. |

**Severity Rules:**
- Anything that makes Archon WORSE than baseline Claude (no framework) is always `[C]`
- Over-scaffolding (adding ceremony that slows the model on tasks it handles fine alone) is `[H]`
- Hard-coded values without model-floor tags are `[M]`
- Vision alignment violations ("trust the model first") are `[H]` minimum
- Hook latency budget violations are `[H]`

### Phase 5: MANDATE — Reviewer Contract

The spawned reviewer MUST receive this mandate verbatim:

```
═══ ARCHON REVIEWER MANDATE ═══

ROLE: Independent framework architect. Your job is to find where Archon
FAILS to deliver on its vision — making AI agents genuinely better.
You have no loyalty to the current implementation.

CORE QUESTION: "Does this make Claude measurably better, or is it
ceremony that feels productive but adds cost without benefit?"

BEHAVIORAL RULES:
1. You are ADVERSARIAL. Assume every component is unnecessary scaffolding
until you can prove it provides measurable value over baseline Claude.
2. You cite SPECIFIC file paths, line numbers, and probe vector IDs.
3. You never generalize. "This could be better" is REJECTED. State exactly
what is wrong, where, and what the correct form looks like.
4. You never compliment. Findings only.
5. You evaluate RUNTIME BEHAVIOR — what actually happens when these hooks
fire in a real Claude Code session, not just what the code looks like.
6. You check for the ABSENCE of needed components, not just bugs in
existing ones. Missing hooks, missing guardrails, missing state fields.
7. You apply Archon's OWN principles against itself:
- "Trust the model first" — is this scaffolding needed?
- "Minimum viable ceremony" — is this the simplest form?
- "Evolve or die" — can this be stripped when models improve?
- "Evaluate like a stranger" — does this survive fresh eyes?
8. You DO NOT invent requirements. Only cite Archon's own docs and principles.
9. You test classification with REAL prompts. Don't just read the regex —
run mental examples through the classifier and report misclassifications.

OUTPUT CONTRACT:
- Every finding: [Severity] File:Line — Description — Probe Vector — Fix
- Grouped by Probe Vector
- Each PV section opens with: PASS, CONCERNS, or FAIL
- Final section: "Top N Recommendations" ranked by impact
- If zero issues found in a PV, state "No issues found" + what you checked

═══ END MANDATE ═══
```

### Phase 6: REPORT — Output Structure

```markdown
# Archon Adversarial Review — [Scope Description]

## Review Metadata
- Date: [date]
- Scope: [what was reviewed]
- Probe Vectors: [which PVs applied]
- Passes: [single / parallel A+B / parallel A+B+C]

## Summary
| Severity | Count |
|---|---|
| CRITICAL | N |
| HIGH | N |
| MEDIUM | N |
| LOW | N |

## Findings by Probe Vector

### PV-VISION: Vision Alignment
Verdict: PASS / CONCERNS / FAIL
[findings or "No issues found — checked: ..."]

### PV-HOOK: Hook Architecture Integrity
Verdict: PASS / CONCERNS / FAIL
[findings...]

[...continued for each applicable PV...]

## Top N Recommendations
1. [highest impact first]
2. ...

## Triage (filled by implementing agent after review)
| # | Severity | Finding | Verdict | Action |
|---|----------|---------|---------|--------|
| 1 | [C] | [summary] | ACCEPT / ACCEPT-DEFER / REJECT / ESCALATE | [action taken] |
```

---

## The Protocol — Step by Step

### Step 1: Determine Review Scope

Based on what changed, select the review type from the ISOLATE table (Phase 3). This determines which probe vectors and how many passes.

### Step 2: Assemble the Review Package

Gather ALL files listed in Phase 2 (REASON) for the review type. The reviewer sees artifacts, not narratives. Never include your reasoning, justifications, or "here's what I was thinking."

### Step 3: Spawn the Reviewer

Use the Agent tool:
```
subagent_type: general-purpose
model: opus
description: "Archon adversarial review — [scope]"
```

The prompt to the reviewer MUST contain, in this order:
1. **Mandate Block** (Phase 5 — verbatim, not summarized)
2. **Probe Vectors** (Phase 1 — only the PVs for this pass)
3. **Standards Documents** (Phase 2 — full file contents)
4. **The files being reviewed** (full text, with file paths as headers)
5. **Explicit instruction**: "Report all findings using the Probe Vector format defined in the mandate."

For multi-pass reviews, spawn passes in parallel.

### Step 4: Triage Findings

When the reviewer returns:

1. **Acknowledge every finding** — even if disagreeing, state why
2. **Categorize each:**
- `ACCEPT` — fix now
- `ACCEPT-DEFER` — valid, will address later (state when)
- `REJECT` — disagree, with specific counter-evidence
- `ESCALATE` — unclear, needs user input
3. **`[C]` findings cannot be `REJECT`ed** without demonstrating that the reviewer misread the code or applied the wrong standard
4. **Apply all ACCEPT fixes** before declaring review complete
5. **Present REJECT and ESCALATE items** to the user

### Step 5: Report

Write the report in the Phase 6 format. The user should be able to read the report and understand every finding, every decision, and every action taken.

---

## Reviewer Independence Rules

### AAR-01: No Leading the Witness
Never include your assessment in the reviewer's prompt.
Bad: "Review these hooks — they're working well but I want a second opinion."
Good: "Review these hooks against Archon's design principles. Find problems."

### AAR-02: Full Context, No Narrative
The reviewer gets the artifacts. Not your story about the artifacts.

### AAR-03: Reviewer Sees Output, Not Process
The reviewer never sees your planning, your session history, or your prior reasoning chain.

### AAR-04: No Suppressing Findings
Every finding appears in the report — including rejected ones with your counter-reasoning. The user judges.

### AAR-05: No Severity Downgrade Without Evidence
`[C]` cannot become `REJECT` without proving the reviewer misread the code. `[H]` cannot become `REJECT` without specific counter-evidence from the codebase.

### AAR-06: Real Prompt Testing
Classification reviews MUST include at least 20 test prompts run through the classifier mentally. Don't just read the regex — test it.

---

## Rules

### DO:
- Give the reviewer FULL file contents, never summaries
- Test classification with real prompts, not just pattern inspection
- Evaluate runtime behavior — what happens when hooks fire in sequence
- Check for missing components, not just bugs in existing ones
- Apply Archon's own principles against itself ruthlessly

### DON'T:
- Defend your own work to the reviewer (the reviewer never sees your reasoning)
- Skip the review because "it's a small change" (small changes break frameworks)
- Downgrade severity because fixing is inconvenient
- Add findings that aren't grounded in Archon's stated principles
- Run the review with a non-Opus model (framework audit requires maximum capability)

## Output Format

- **Primary output**: Adversarial review report
- **Format**: Markdown (Phase 6 template)
- **Location**: Inline in conversation (not saved to file unless user requests)

## Platform Notes

| Platform | Notes |
|----------|-------|
| Claude Code | Spawn reviewer via Agent tool with `model: opus`. For multi-pass, use parallel Agent calls. |

Related Skills

Using Archon — The Gatekeeper

from SufficientDaikon/archon

> **Type:** Meta-process (always active)

reviewer

from SufficientDaikon/archon

Use when reviewing implementation output against a specification for compliance. Generates a comprehensive HTML/CSS report comparing code to spec requirements. Triggers on tasks involving spec compliance review, implementation verification, quality auditing, or when user asks to "review against spec" or "verify implementation".

Requesting Code Review — The Quality Gate

from SufficientDaikon/archon

> **Type:** Rigid for mandatory triggers, flexible for optional

Receiving Code Review — The Response Protocol

from SufficientDaikon/archon

> **Type:** Rigid

design-review

from SufficientDaikon/archon

Design critique and review methodology including heuristic evaluation, consistency checking, accessibility compliance, and visual quality assessment. Use when reviewing design artifacts, evaluating UI quality, or auditing design compliance.

archon-scout

from SufficientDaikon/archon

Scout Intelligence Pipeline — deep multi-source research with parallel web fetching, evidence validation, and synthesis into actionable briefs. Use when you need comprehensive research on any topic: libraries, techniques, competitive analysis, architectural decisions, or technical landscape mapping.

YAML Prompt Library

from SufficientDaikon/archon

> Store reusable AI prompts as YAML files with structured messages, variables, and test data for version-controlled prompt engineering.

writing-skills

from SufficientDaikon/archon

Use when creating new skills, editing existing skills, or verifying skills work before deployment

Writing Plans — TDD-Sized Task Breakdown

from SufficientDaikon/archon

> **Type:** Rigid process (follow structure exactly)

wireframing

from SufficientDaikon/archon

Wireframing patterns including layout grids, content blocks, responsive breakpoints, and page layout patterns for landing pages, dashboards, and forms. Use when creating wireframes, defining layouts, or planning responsive behavior.

windows-registry-editor

from SufficientDaikon/archon

Expert Windows Registry editor and optimizer via PowerShell. Read, write, search, backup, restore, and bulk-modify registry keys across all hives (HKLM, HKCU, HKCR, HKU, HKCC). Includes curated optimization presets for network, gaming, privacy, performance, and input latency. Use this skill whenever the user asks to edit the registry, apply registry tweaks, check a registry value, optimize Windows via registry, fix registry issues, export/import .reg files, search the registry, or apply gaming/network/privacy registry presets. Also triggers for "regedit", "registry hack", "registry fix", "DWORD", "HKLM", "HKCU", or any mention of Windows registry keys or values.

windows-network-optimizer

from SufficientDaikon/archon

Diagnose, optimize, and verify Windows 11 network and system performance via PowerShell. Covers DNS, NIC tuning, TCP/IP registry, services, telemetry, power plan, and more.