eval-sprint
Adversarial evaluation of sprint spec before implementation.
Best use case
eval-sprint is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Adversarial evaluation of sprint spec before implementation.
Teams using eval-sprint should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/eval-sprint/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How eval-sprint Compares
| Feature / Agent | eval-sprint | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Adversarial evaluation of sprint spec before implementation.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Evaluate Sprint Spec
Adversarial evaluation of sprint spec before implementation. Run in a **new session** to ensure fresh eyes.
## Purpose
Find problems in the spec that would cause implementation to fail or produce poor results. The evaluator is deliberately adversarial — looking for ways the spec could be misinterpreted, is incomplete, or violates principles.
## Context Loading
Load ONLY:
1. `CLAUDE.md` — Principles (especially #7 and #8)
2. `docs/sprints/current/spec.md` — The spec to evaluate
**DO NOT load:**
- Architecture docs (spec should be self-contained)
- Capability docs (evaluating spec as written)
- Previous conversation context (that's why new session)
- **Source files via Read** — Do NOT read `.py` files to understand the codebase. Use LSP tools instead (see below).
## Code Verification — LSP Only
When verifying spec claims against the codebase (class names, function signatures, caller counts, line numbers), use **only LSP tools**. Do NOT read entire source files — this wastes context on code irrelevant to the evaluation.
| Verification need | LSP tool | Example |
|---|---|---|
| "Does this class/function exist?" | `find_definition` | Spec says `TypeResolver` — verify the actual class name |
| "Who calls this function?" | `find_references` | Spec says "only two callers" — verify caller count |
| "What's the signature?" | `get_hover` | Spec says `rng` param at line 203 — verify |
| "What type is this?" | `get_hover` | Spec references `Distribution` union — verify it exists |
| "Does this symbol exist in the module?" | `find_workspace_symbols` | Spec says export from `__init__.py` — verify |
**NEVER** use `Read` on source files for this skill. If you catch yourself about to read a `.py` file, use an LSP tool instead.
## Evaluation Checklist
### Structural Checks
- [ ] Every phase has: Delivers, Demo, Contracts sections
- [ ] Every contract has: full signature with types
- [ ] Every contract has: docstring with Args, Returns, Raises
- [ ] Success criteria are checkboxes (testable)
- [ ] Scope lists what's NOT included
- [ ] File structure section shows where code goes
### Principle Checks
- [ ] No default parameters in any signature (Principle #7)
- [ ] No `= None`, `= []`, `= {}` in signatures (Principle #7)
- [ ] No "Future:", "TODO:", "placeholder", "stub" language (Principle #8)
- [ ] No loops/iterations described that "will do X later" (Principle #8)
- [ ] No hardcoded domain values in contracts (Principle #2)
### Consistency Checks
- [ ] All types referenced in contracts exist (in spec or codebase)
- [ ] All functions called in demos are defined (in spec or codebase)
- [ ] Phase N only uses contracts from phases 1..N (dependency order)
- [ ] Stated scope matches contracts (nothing extra, nothing missing)
- [ ] Contract names follow existing codebase conventions
### Ambiguity Checks
- [ ] No weasel words: "appropriate", "as needed", "etc.", "various", "properly"
- [ ] No vague verbs: "handle", "process", "manage" without specifics
- [ ] Error conditions are specific (not just "raises Exception")
- [ ] Return types are concrete (not "suitable value")
- [ ] None/empty input behavior is specified
- [ ] Edge cases appear in Raises section
### Testability Checks
- [ ] Each contract has at least one obvious test case
- [ ] Demo requirements are specific enough to automate
- [ ] Success criteria are measurable (not "works correctly")
- [ ] Can write a test that would FAIL if contract is wrong
### Architecture Checks
- [ ] New code doesn't break existing interfaces
- [ ] Module placement matches existing structure
- [ ] Naming follows existing conventions
- [ ] No circular dependencies introduced
## Evaluation Process
### Step 1: Read Spec Cold
Read the entire spec without referring to other docs. Note:
- What's confusing?
- What questions do you have?
- What seems underspecified?
### Step 2: Run Checklist
Go through each check systematically. For every failure:
- Note the location (section, line if possible)
- Describe the problem specifically
- Suggest a fix
### Step 3: Adversarial Questions
For each contract, ask:
- "What's the worst reasonable misinterpretation?"
- "What input would break this?"
- "What if I implemented this lazily/wrong — would tests catch it?"
### Step 4: Dependency Trace
Trace through phases in order:
- Phase 1: What does it need? (should be nothing or existing code)
- Phase 2: What does it need from Phase 1? Is that actually delivered?
- Continue for all phases
### Step 5: Demo Feasibility
For each demo:
- Can I actually write this with the contracts provided?
- Does it prove what it claims to prove?
- What could pass the demo but still be wrong?
## Output Format
```markdown
# Spec Evaluation: [Sprint Name]
**Verdict: PASS / NEEDS WORK / FAIL**
**Summary:** [One sentence assessment]
---
## Blocking Issues
Issues that MUST be fixed before implementation.
### 1. [Short Title]
**Location:** [Section/line reference]
**Category:** [Structural/Principle/Consistency/Ambiguity/Testability/Architecture]
**Problem:** [Specific description]
**Impact:** [What goes wrong if not fixed]
**Suggested Fix:** [How to fix it]
---
## Warnings
Issues that SHOULD be addressed but aren't blocking.
### 1. [Short Title]
**Location:** [Section/line reference]
**Problem:** [Description]
**Suggestion:** [How to improve]
---
## Notes
Observations that aren't issues but worth considering.
- [Note 1]
- [Note 2]
---
## Checklist Results
| Category | Pass | Fail | Issues |
|----------|------|------|--------|
| Structural | 5 | 1 | Missing file structure |
| Principles | 4 | 0 | — |
| Consistency | 3 | 1 | Undefined type |
| Ambiguity | 3 | 2 | Weasel words |
| Testability | 3 | 0 | — |
| Architecture | 4 | 0 | — |
---
## Verdict Explanation
**PASS:** No blocking issues. Warnings are minor. Spec is ready for implementation.
**NEEDS WORK:** No blocking issues but warnings are significant. Review before proceeding.
**FAIL:** Blocking issues found. Must fix and re-evaluate before implementation.
```
## Verdicts
### PASS
- Zero blocking issues
- Warnings are cosmetic or minor
- Checklist mostly green
- Confident implementation will succeed
### NEEDS WORK
- Zero blocking issues
- But significant warnings that could cause problems
- User should review warnings before proceeding
- Implementation might need course correction
### FAIL
- One or more blocking issues
- Spec is incomplete, contradictory, or violates principles
- Must fix and run eval-sprint again
- Do NOT proceed to implement-sprint
## Common Blocking Issues
| Issue | Why It Blocks |
|-------|---------------|
| Missing contract for stated scope | Implementer won't know what to build |
| Undefined type referenced | Code won't compile |
| Default parameter in signature | Principle #7 violation |
| Phase dependency violation | Phase N can't be built |
| Ambiguous return type | Implementer will guess wrong |
## Common Warnings
| Issue | Why It's a Warning |
|-------|-------------------|
| Vague error message | Implementation will work but be unhelpful |
| Missing edge case | Might cause bug but not blocking |
| Inconsistent naming | Annoying but not fatal |
| Demo doesn't fully exercise contract | Partial verification |
## After Evaluation
- **If PASS:** Proceed to `/implement-sprint`
- **If NEEDS WORK:** Review warnings, decide to proceed or fix
- **If FAIL:** Fix blocking issues, run `/eval-sprint` again
## Tips for Evaluators
1. **Be skeptical** — Assume the spec has problems until proven otherwise
2. **Read literally** — Don't fill in gaps with what you think they meant
3. **Think like a lazy implementer** — What's the minimal interpretation?
4. **Think like a malicious implementer** — What technically satisfies the spec but is wrong?
5. **Check the edges** — Empty lists, None values, zero counts, boundary conditionsRelated Skills
verify-sprint
Spec-fidelity verification tracing requirements through code.
review-sprint
Post-implementation mechanical audit of sprint deliverables.
implement-sprint
Automated sprint implementation with context discipline and quality gates.
create-sprint
Guide sprint planning from scope assessment to spec artifacts.
session
Analyze Claude Code session transcripts — search, summarize, list, or inspect how a session went.
role-educator
Course designer mode for creating exercises, configs, and QA criteria.
role-architect
System architect mode for designing interfaces, contracts, and architecture decisions.
review-tests
Comprehensive test review using parallel test-reviewer agents.
issue
Create, list, and resolve review issues. Critical issues get individual files for research; warnings and gaps go to a quick-fix checklist.
get-started
Interactive setup guide for configuring CLAUDE.md and CAPABILITIES.md.
audit-docs
Orchestrate sequential documentation audits with checkpointing and resumption.
arch-review
Review architecture documents against code implementation and principles.