verify-sprint
Spec-fidelity verification tracing requirements through code.
Best use case
verify-sprint is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using verify-sprint should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/verify-sprint/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How verify-sprint Compares
| Feature / Agent | verify-sprint | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
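It runs a spec-fidelity review: it walks the spec's algorithms and requirements step by step, verifies that the code implements each one correctly, and looks for behavioral bugs that mechanical checks miss.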
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Verify Sprint

Spec-fidelity review. Walk the spec's algorithms and requirements step-by-step, verify the code implements each one correctly. Find behavioral bugs that mechanical checks miss.

## When to Use

After `/review-sprint` passes. That command checks surface properties (coverage, linting, dead code). This command checks **behavioral correctness**: does the code actually do what the spec says?

## What This Is NOT

- Not a linter check (use `/review-sprint`)
- Not a docs-vs-code audit (use `/arch-review`)
- Not a test quality review (use `/review-tests`)

This is the review that asks: "If I follow the spec's algorithm with a pencil, does the code do the same thing at every step?"

## Context to Load

1. `CLAUDE.md`: Principles (especially #7, #8)
2. Architecture doc for the feature (e.g., `docs/architecture/pending/*.md`)
3. `docs/sprints/current/spec.md`: Contracts and phase breakdown
4. All new/modified source files (`git diff --name-only main..HEAD -- '*.py' ':!tests/' ':!demos/'`)
5. Test files for new source files

Load the architecture doc FIRST. Read it completely before touching any code. The spec is the oracle.

**Code navigation:** Use LSP tools for all code tracing: `find_definition` to jump to implementations, `find_references` to find usages, `get_incoming_calls` to trace call chains, `get_hover` for type info. Do not Grep for `def foo` or `class Bar`. Reserve Grep for pattern searches only.

## Review Process

### 1. Extract Spec Requirements

Read the architecture doc and sprint spec. Extract every behavioral requirement into a checklist. Categories:

**Algorithm steps:** Numbered steps in resolution/processing algorithms. Each step is a requirement.

**Branching logic:** "If X then Y, else Z". Each branch is a requirement.

**Timestamp/RNG semantics:** Which distribution, what parameters, what order of RNG consumption.

**Error conditions:** What raises, when, with what message.

**Feature interactions:** How the new feature interacts with existing features (events, re-entry, mutations, deactivation).

**Invariants:** Properties stated as "always true" in the spec.

Write each requirement as a one-line checklist item with a spec citation:

```
- [ ] Step 6i: Terminal state behaviors fire after transition recorded (architecture doc line N)
- [ ] Dropout behaviors use sequential exponential gaps, not uniform (Behaviors section)
- [ ] Runtime probability sum > 1.0 raises SimulationError (Algorithm step 4)
```

### 2. Trace Each Requirement Through Code

For EACH checklist item:

1. Find the corresponding code path
2. Read the code line by line
3. Verify behavioral equivalence (not just structural presence)
4. Check: does the code handle the same edge cases the spec describes?

**Critical distinction:** "Code exists for this feature" is NOT the same as "code correctly implements this feature." A function that handles transitions may still use the wrong selection algorithm.

### 3. Identify Untested Spec Requirements

For each requirement verified in step 2, check if a test exercises it:

1. Search test files for the specific behavior
2. Verify the test actually triggers the code path (not just nominally testing it)
3. Flag requirements that have NO test coverage

Common gaps:

- Terminal state behaviors (tests often stop at "reached terminal")
- Dropout-specific behavior (tests check dropout happened, not HOW)
- Timestamp algorithm differences between code paths
- RNG consumption order
- Edge cases mentioned in spec but not in tests

### 4. Check Code Quality at Boundaries

Look specifically at:

**Encapsulation:** Does new code use public APIs or reach into private attributes?

**Precision:** Are numeric conversions lossy? (float→int, timedelta→seconds)

**Floating point:** Are equality/comparison checks on accumulated floats safe?

**Duplication:** Is the same logic implemented twice with slight variations? (Often indicates a missed abstraction or a branch that should dispatch to different implementations.)

### 5. Assess Test-Spec Alignment

For each test:

1. What spec requirement does it claim to test?
2. Does it actually exercise that requirement's code path?
3. Would the test still pass if the requirement were implemented wrong?

A test that passes by accident (wrong layer, wrong code path, insufficient assertions) is worse than no test: it creates false confidence.

## Output Format

Structure findings as:

```markdown
# Sprint Verification: [Name]

**Date:** YYYY-MM-DD
**Spec:** [path to architecture doc]
**Sprint:** [path to sprint spec]

## Requirements Checklist

### Algorithm Steps
- [x] Step 1: Description — VERIFIED (file:line)
- [ ] Step 6i: Terminal behaviors — MISSING (code returns before evaluation)
- [x] Step 5: Weighted selection — VERIFIED (file:line)

### Feature Interactions
- [x] Mutations visible to subsequent states — VERIFIED
- [ ] Events frozen at entry tick — NOT TESTED

### Invariants
- [x] Deterministic — VERIFIED (test exists)
- [x] Monotonic timestamps — VERIFIED

## Findings

### Bug: [Title]
**Spec says:** [Quote from spec with section reference]
**Code does:** [What actually happens, with file:line]
**Impact:** [What breaks for educators/students]
**Test gap:** [Why existing tests don't catch this]

### Code Quality: [Title]
**Location:** file:line
**Issue:** [Description]
**Severity:** High / Medium / Low

## Summary

| Category | Total | Verified | Missing | Wrong |
|----------|-------|----------|---------|-------|
| Algorithm steps | N | N | N | N |
| Feature interactions | N | N | N | N |
| Invariants | N | N | N | N |
| Error conditions | N | N | N | N |

**Verdict:** [PASS / ISSUES FOUND]
```

## Principles

- **Spec is oracle.** If code differs from spec, it's a bug until proven otherwise. If the code is genuinely better, note it as a spec update candidate, but still flag it.
- **Behavioral equivalence, not structural.** Don't just check "there's a function for X." Check that the function does X correctly.
- **Negative space matters.** Requirements the tests DON'T cover are the highest-risk findings.
- **One finding per deviation.** Don't bundle. Each spec deviation is its own finding with its own citation.
- **No opinions on style.** This review is about correctness, not aesthetics. Leave style to `/review-sprint`.
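
The third question in step 5 (would the test still pass if the requirement were implemented wrong?) can be made concrete with a small sketch. Everything in it is hypothetical and not taken from any real codebase: the function names `select_by_spec` and `select_wrong`, the option labels, the weights, and the injected `draw` callable are assumptions chosen for illustration. The assumed spec requirement is weighted selection over cumulative weights; the "wrong" version has the same shape but ignores the weights.

```python
from typing import Callable, Sequence


def select_by_spec(options: Sequence[str], weights: Sequence[float],
                   draw: Callable[[], float]) -> str:
    """Hypothetical spec behavior: weighted selection via cumulative weights."""
    r = draw() * sum(weights)
    cumulative = 0.0
    for option, weight in zip(options, weights):
        cumulative += weight
        if r < cumulative:
            return option
    return options[-1]  # guard against rounding at the upper edge


def select_wrong(options: Sequence[str], weights: Sequence[float],
                 draw: Callable[[], float]) -> str:
    """Structurally similar, behaviorally wrong: picks uniformly by index."""
    index = min(int(draw() * len(options)), len(options) - 1)
    return options[index]


def weak_test(select) -> bool:
    """Passes by accident: only checks that some valid option comes back."""
    result = select(["stay", "drop"], [0.9, 0.1], lambda: 0.5)
    return result in {"stay", "drop"}


def strong_test(select) -> bool:
    """Pins the draw so only weight-respecting selection can pass."""
    options, weights = ["stay", "drop"], [0.9, 0.1]
    return (select(options, weights, lambda: 0.5) == "stay"
            and select(options, weights, lambda: 0.95) == "drop")


if __name__ == "__main__":
    for impl in (select_by_spec, select_wrong):
        print(impl.__name__, "weak:", weak_test(impl), "strong:", strong_test(impl))
```

Under these assumptions, `weak_test` accepts both implementations, while `strong_test` accepts only `select_by_spec`. The weak test is the kind that creates false confidence: it would be reported as covering weighted selection without ever exercising the weights.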
Related Skills
review-sprint
Post-implementation mechanical audit of sprint deliverables.
implement-sprint
Automated sprint implementation with context discipline and quality gates.
eval-sprint
Adversarial evaluation of sprint spec before implementation.
create-sprint
Guide sprint planning from scope assessment to spec artifacts.
session
Analyze Claude Code session transcripts — search, summarize, list, or inspect how a session went.
role-educator
Course designer mode for creating exercises, configs, and QA criteria.
role-architect
System architect mode for designing interfaces, contracts, and architecture decisions.
review-tests
Comprehensive test review using parallel test-reviewer agents.
issue
Create, list, and resolve review issues. Critical issues get individual files for research; warnings and gaps go to a quick-fix checklist.
get-started
Interactive setup guide for configuring CLAUDE.md and CAPABILITIES.md.
audit-docs
Orchestrate sequential documentation audits with checkpointing and resumption.
arch-review
Review architecture documents against code implementation and principles.