review-implementation
Use after hyperpowers:executing-plans completes all tasks - verifies implementation against bd spec, all success criteria met, anti-patterns avoided
Best use case
review-implementation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Use after hyperpowers:executing-plans completes all tasks - verifies implementation against bd spec, all success criteria met, anti-patterns avoided
Teams using review-implementation should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/review-implementation/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How review-implementation Compares
| Feature / Agent | review-implementation | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Use after hyperpowers:executing-plans completes all tasks - verifies implementation against bd spec, all success criteria met, anti-patterns avoided
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
<skill_overview>
Review completed implementation against bd epic to catch gaps before claiming completion; spec is contract, implementation must fulfill contract completely.
</skill_overview>
<rigidity_level>
LOW FREEDOM - Follow the 4-step review process exactly. Review with Google Fellow-level scrutiny. Never skip automated checks, quality gates, or code reading. No approval without evidence for every criterion.
</rigidity_level>
<evidence_requirements>
## Evidence-Based Review (Zero Speculation Principle)
**Every claim requires evidence:**
| Claim Type | Required Evidence |
|------------|-------------------|
| "Code implements X" | File path:line number showing implementation |
| "Test covers Y" | Test name + specific assertion |
| "Criterion met" | Command output proving criterion |
| "No anti-pattern" | Search command showing no matches |
**Confidence Scores:**
Rate each finding 0.0-1.0:
- **1.0** - Verified with direct evidence (ran command, read code)
- **0.8** - Strong indirect evidence (multiple consistent signals)
- **0.5** - Uncertain (partial evidence, assumptions made)
- **0.3** - Weak (limited investigation, needs more verification)
**Findings below 0.8 confidence must be investigated until ≥0.8 or marked UNCERTAIN.**
**Example evidence format:**
```markdown
| Criterion | Status | Confidence | Evidence |
|-----------|--------|------------|----------|
| All tests pass | ✅ Met | 1.0 | `cargo test`: 127 passed, 0 failed |
| No unwrap in production | ❌ Not met | 1.0 | `rg "\.unwrap\(\)" src/`: Found at jwt.ts:45 |
| Error handling proper | ⚠️ Uncertain | 0.5 | Read jwt.ts, unclear if all paths covered |
```
</evidence_requirements>
<quick_reference>
| Step | Action | Deliverable |
|------|--------|-------------|
| 1 | Load bd epic + all tasks | TodoWrite with tasks to review |
| 2 | Review each task (automated checks, quality gates, read code, **audit tests**, verify criteria) | Findings per task |
| 3 | Report findings (approved / gaps found) | Review decision |
| 4 | Gate: If approved → finishing-a-development-branch, If gaps → STOP | Next action |
**Review Perspective:** Google Fellow-level SRE with 20+ years experience reviewing junior engineer code.
**Test Quality Gate:** Every new test must catch a real bug. Tautological tests (pass by definition, test mocks, verify compiler-checked facts) = GAPS FOUND.
</quick_reference>
<when_to_use>
- hyperpowers:executing-plans completed all tasks
- Before claiming work is complete
- Before hyperpowers:finishing-a-development-branch
- Want to verify implementation matches spec
**Don't use for:**
- Mid-implementation (use hyperpowers:executing-plans)
- Before all tasks done
- Code reviews of external PRs (this is self-review)
</when_to_use>
<the_process>
## Step 1: Load Epic Specification
**Announce:** "I'm using hyperpowers:review-implementation to verify implementation matches spec. Reviewing with Google Fellow-level scrutiny."
**Get epic and tasks:**
```bash
bd show bd-1 # Epic specification
bd dep tree bd-1 # Task tree
bd list --parent bd-1 # All tasks
```
**Create TodoWrite tracker:**
```
TodoWrite todos:
- Review bd-2: Task Name
- Review bd-3: Task Name
- Review bd-4: Task Name
- Compile findings and make decision
```
---
## Step 2: Review Each Task
For each task:
### A. Read Task Specification
```bash
bd show bd-3
```
Extract:
- Goal (what problem solved?)
- Success criteria (how verify done?)
- Implementation checklist (files/functions/tests)
- Key considerations (edge cases)
- Anti-patterns (prohibited patterns)
---
### B. Run Automated Code Completeness Checks
```bash
# TODOs/FIXMEs without issue numbers
rg -i "todo|fixme" src/ tests/ || echo "✅ None"
# Stub implementations
rg "unimplemented!|todo!|unreachable!|panic!\(\"not implemented" src/ || echo "✅ None"
# Unsafe patterns in production
rg "\.unwrap\(\)|\.expect\(" src/ | grep -v "/tests/" || echo "✅ None"
# Ignored/skipped tests
rg "#\[ignore\]|#\[skip\]|\.skip\(\)" tests/ src/ || echo "✅ None"
```
---
### B2. Dead Code and Refactoring Remnants Audit
**Context:** After refactoring, old code must be REMOVED, not kept as fallback.
The canonical implementation is the new one. Old code is dead code.
**Key principle:** "Don't bother with unused code. Delete it before you try to improve anything."
**Automated detection patterns:**
```bash
# 1. Fallback/Legacy Code Detection
# Patterns indicating old code left behind:
rg -i "fallback|legacy|old_|_old|deprecated|obsolete" src/ || echo "✅ None"
# Conditional using old implementation:
rg -i "if.*use.*old|if.*legacy|if.*fallback|ENABLE_OLD|USE_LEGACY|FALLBACK_TO" src/ || echo "✅ None"
# "was:" or "previously:" comments (describing removed behavior):
rg -i "was:|previously:|used to|before refactor" src/ || echo "✅ None"
# 2. Unused Code Detection (Language-Specific)
# Rust - dead code warnings:
cargo build 2>&1 | grep -E "warning.*never used|warning.*dead_code" || echo "✅ None"
# TypeScript/JavaScript - unused exports (if eslint configured):
npx eslint --rule 'no-unused-vars: error' src/ 2>/dev/null || echo "Check manually"
# Swift - unused variables (SwiftLint):
swiftlint lint --reporter json 2>/dev/null | jq '.[] | select(.rule_id == "unused")' || echo "Check manually"
# Python - vulture if available:
vulture src/ --min-confidence 80 2>/dev/null || echo "vulture not installed, check manually"
# 3. Orphaned Tests Detection
# Find tests that reference functions/classes that no longer exist:
git diff main...HEAD --name-only | grep -E "(test|spec)" || echo "No test files changed"
# 4. Deprecation Remnants (should be REMOVED, not marked):
rg "@deprecated|#\[deprecated\]|// deprecated|DEPRECATED|@Deprecated" src/ || echo "✅ None"
# 5. Backwards Compatibility Shims (unless external API):
rg -i "backward.*compat|legacy.*support|shim|polyfill" src/ || echo "✅ None"
```
**If any patterns found, investigate:**
1. **Fallback code:** Why does old implementation still exist? Delete it.
2. **Unused functions:** Who calls this? If nobody, delete it.
3. **Orphaned tests:** Does tested functionality still exist? If not, delete test.
4. **Deprecation markers:** Remove now or create bd issue with removal date.
5. **Backwards compat shims:** Is this external API? If internal, delete shim.
**Dead Code Audit Results Template:**
```markdown
#### Dead Code Audit Results
| Category | Pattern | Found | Location | Action |
|----------|---------|-------|----------|--------|
| Fallback code | `legacy\|old_\|fallback` | 0 | - | ✅ None |
| Unused functions | compiler warnings | 0 | - | ✅ None |
| Deprecation markers | `@deprecated` | 0 | - | ✅ None |
| Orphaned tests | tests for removed code | 0 | - | ✅ None |
| Backwards compat shims | `shim\|polyfill` | 0 | - | ✅ None |
**Verdict:** ✅ No dead code / ❌ Dead code found - refactoring incomplete
```
**If dead code found:** This is a **GAP**. Old code after refactoring = incomplete refactoring.
---
### C. Run Quality Gates (via test-runner agent)
**IMPORTANT:** Use hyperpowers:test-runner agent to avoid context pollution.
```
Dispatch hyperpowers:test-runner: "Run: cargo test"
Dispatch hyperpowers:test-runner: "Run: cargo fmt --check"
Dispatch hyperpowers:test-runner: "Run: cargo clippy -- -D warnings"
Dispatch hyperpowers:test-runner: "Run: .git/hooks/pre-commit"
```
---
### D. Read Implementation Files
**CRITICAL:** READ actual files, not just git diff.
```bash
# See changes
git diff main...HEAD -- src/auth/jwt.ts
# THEN READ FULL FILE
Read tool: src/auth/jwt.ts
```
**While reading, check:**
- ✅ Code implements checklist items (not stubs)
- ✅ Error handling uses proper patterns (Result, try/catch)
- ✅ Edge cases from "Key Considerations" handled
- ✅ Code is clear and maintainable
- ✅ No anti-patterns present
---
### E. Code Quality Review (Google Fellow Perspective)
**Assume code written by junior engineer. Apply production-grade scrutiny.**
**Error Handling:**
- Proper use of Result/Option or try/catch?
- Error messages helpful for production debugging?
- No unwrap/expect in production?
- Errors propagate with context?
- Failure modes graceful?
**Safety:**
- No unsafe blocks without justification?
- Proper bounds checking?
- No potential panics?
- No data races?
- No SQL injection, XSS vulnerabilities?
**Clarity:**
- Would junior understand in 6 months?
- Single responsibility per function?
- Descriptive variable names?
- Complex logic explained?
- No clever tricks - obvious and boring?
**Testing (CRITICAL - Apply strict scrutiny):**
- Edge cases covered (empty, max, Unicode)?
- Tests catch real bugs, not just inflate coverage?
- Test names describe specific bug prevented?
- Tests test behavior, not implementation?
- Failure scenarios tested?
- No tautological tests (see Test Quality Audit below)?
**Production Readiness:**
- Comfortable deploying to production?
- Could cause outage or data loss?
- Performance acceptable under load?
- Logging sufficient for debugging?
---
### E2. Test Quality Audit (Mandatory for All New Tests)
**CRITICAL:** Review every new/modified test for meaningfulness. Tautological tests are WORSE than no tests - they give false confidence.
**For each test, ask:**
1. **What bug would this catch?** → If you can't name a specific failure mode, test is pointless
2. **Could production code break while this test passes?** → If yes, test is too weak
3. **Does this test a real user scenario?** → Or just implementation details?
4. **Is the assertion meaningful?** → `expect(result != nil)` is weaker than `expect(result == expectedValue)`
**Red flags (REJECT implementation until fixed):**
- ❌ Tests that only verify syntax/existence ("enum has cases", "struct has fields")
- ❌ Tautological tests (pass by definition: `expect(builder.build() != nil)` when build() can't return nil)
- ❌ Tests that duplicate implementation (testing 1+1==2 by asserting 1+1==2)
- ❌ Tests without meaningful assertions (call code but don't verify outcomes matter)
- ❌ Tests that verify mock behavior instead of production code
- ❌ Codable/Equatable round-trip tests with only happy path data
- ❌ Generic test names ("test_basic", "test_it_works", "test_model")
**Examples of meaningless tests to reject:**
```swift
// ❌ REJECT: Tautological - compiler ensures enum has cases
func testEnumHasCases() {
_ = MyEnum.caseOne // This proves nothing
_ = MyEnum.caseTwo
}
// ❌ REJECT: Tautological - build() returns non-optional, can't be nil
func testBuilderReturnsValue() {
let result = Builder().build()
#expect(result != nil) // Always passes by type system
}
// ❌ REJECT: Tests mock, not production code
func testServiceCallsAPI() {
let mock = MockAPI()
let service = Service(api: mock)
service.fetchData()
#expect(mock.fetchCalled) // Tests mock behavior, not real logic
}
// ❌ REJECT: Happy path only, no edge cases
func testCodable() {
let original = User(name: "John", age: 30)
let data = try! encoder.encode(original)
let decoded = try! decoder.decode(User.self, from: data)
#expect(decoded == original) // What about empty name? Max age? Unicode?
}
```
**Examples of meaningful tests to approve:**
```swift
// ✅ APPROVE: Catches missing validation bug
func testEmptyPayloadReturnsValidationError() {
let result = validator.validate(payload: "")
#expect(result == .error(.emptyPayload))
}
// ✅ APPROVE: Catches race condition bug
func testConcurrentWritesDontCorruptData() {
let store = ThreadSafeStore()
DispatchQueue.concurrentPerform(iterations: 1000) { i in
store.write(key: "k\(i)", value: i)
}
#expect(store.count == 1000) // Would fail if race condition exists
}
// ✅ APPROVE: Catches error handling bug
func testMalformedJSONReturns400Not500() {
let response = api.parse(json: "{invalid")
#expect(response.status == 400) // Not 500 which would indicate unhandled exception
}
// ✅ APPROVE: Catches encoding bug with edge case
func testUnicodeNamePreservedAfterRoundtrip() {
let original = User(name: "日本語テスト 🎉")
let decoded = roundtrip(original)
#expect(decoded.name == original.name)
}
```
**Audit process:**
```bash
# Find all new/modified test files
git diff main...HEAD --name-only | grep -E "(test|spec)"
# Read each test file
Read tool: tests/new_feature_test.swift
# For EACH test function, document:
# - Test name
# - What bug it catches (or "TAUTOLOGICAL" if none)
# - Verdict: ✅ Keep / ⚠️ Strengthen / ❌ Remove/Replace
```
**If tautological tests found:**
```markdown
## Test Quality Audit: GAPS FOUND ❌
### Tautological/Meaningless Tests
| Test | Problem | Action |
|------|---------|--------|
| testEnumHasCases | Compiler already ensures this | ❌ Remove |
| testBuilderReturns | Non-optional return, can't be nil | ❌ Remove |
| testCodable | Happy path only, no edge cases | ⚠️ Add: empty, unicode, max values |
| testServiceCalls | Tests mock, not production | ❌ Replace with integration test |
**Cannot approve until tests are meaningful.**
```
---
### F. Verify Success Criteria with Evidence
For EACH criterion in bd task:
- Run verification command
- Check actual output
- Don't assume - verify with evidence
- Use hyperpowers:test-runner for tests/lints
**Example:**
```
Criterion: "All tests passing"
Command: cargo test
Evidence: "127 tests passed, 0 failures"
Result: ✅ Met
Criterion: "No unwrap in production"
Command: rg "\.unwrap\(\)" src/
Evidence: "No matches"
Result: ✅ Met
```
---
### G. Check Anti-Patterns
Search for each prohibited pattern from bd task:
```bash
# Example anti-patterns from task
rg "\.unwrap\(\)" src/ # If task prohibits unwrap
rg "TODO" src/ # If task prohibits untracked TODOs
rg "\.skip\(\)" tests/ # If task prohibits skipped tests
```
---
### H. Verify Key Considerations
Read code to confirm edge cases handled:
- Empty input validation
- Unicode handling
- Concurrent access
- Failure modes
- Performance concerns
**Example:** Task says "Must handle empty payload" → Find validation code for empty payload.
---
### I. Record Findings
```markdown
### Task: bd-3 - Implement JWT authentication
#### Evidence-Based Findings
| Criterion | Status | Confidence | Evidence |
|-----------|--------|------------|----------|
| All tests pass | ✅ Met | 1.0 | `cargo test`: 127 passed |
| Pre-commit passes | ❌ Not met | 1.0 | `cargo clippy`: 3 warnings |
| No unwrap in production | ❌ Not met | 1.0 | `rg "\.unwrap()"`: src/auth/jwt.ts:45 |
#### File Evidence
| File | Line | What Verified | Confidence |
|------|------|---------------|------------|
| src/auth/jwt.ts | 45 | unwrap violation | 1.0 |
| src/auth/jwt.ts | 12-30 | token generation logic | 0.9 |
**Findings below 0.8:** None (all verified)
#### Automated Checks
- TODOs: ✅ None
- Stubs: ✅ None
- Unsafe patterns: ❌ Found `.unwrap()` at src/auth/jwt.ts:45
- Ignored tests: ✅ None
#### Quality Gates
- Tests: ✅ Pass (127 tests)
- Formatting: ✅ Pass
- Linting: ❌ 3 warnings
- Pre-commit: ❌ Fails due to linting
#### Files Reviewed
- src/auth/jwt.ts: ⚠️ Contains `.unwrap()` at line 45
- tests/auth/jwt_test.rs: ✅ Complete
#### Code Quality
- Error Handling: ⚠️ Uses unwrap instead of proper error propagation
- Safety: ✅ Good
- Clarity: ✅ Good
- Testing: See Test Quality Audit below
#### Test Quality Audit (New/Modified Tests)
| Test | Bug It Catches | Verdict |
|------|----------------|---------|
| test_valid_token_accepted | Missing validation | ✅ Keep |
| test_expired_token_rejected | Expiration bypass | ✅ Keep |
| test_jwt_struct_exists | Nothing (tautological) | ❌ Remove |
| test_encode_decode | Encoding bug (but happy path only) | ⚠️ Add edge cases |
**Tautological tests found:** 1 (test_jwt_struct_exists)
**Weak tests found:** 1 (test_encode_decode needs edge cases)
#### Anti-Patterns
- "NO unwrap in production": ❌ Violated at src/auth/jwt.ts:45
#### Issues
**Critical:**
1. unwrap() at jwt.ts:45 - violates anti-pattern, must use proper error handling
2. Tautological test: test_jwt_struct_exists must be removed
**Important:**
3. 3 clippy warnings block pre-commit hook
4. test_encode_decode needs edge cases (empty, unicode, max length)
```
---
### J. Mark Task Reviewed (TodoWrite)
---
## Step 3: Report Findings
After reviewing ALL tasks:
**If NO gaps:**
```markdown
## Implementation Review: APPROVED ✅
Reviewed bd-1 (OAuth Authentication) against implementation.
### Tasks Reviewed
- bd-2: Configure OAuth provider ✅
- bd-3: Implement token exchange ✅
- bd-4: Add refresh logic ✅
### Verification Summary
- All success criteria verified
- No anti-patterns detected
- All key considerations addressed
- All files implemented per spec
### Evidence
- Tests: 127 passed, 0 failures (2.3s)
- Linting: No warnings
- Pre-commit: Pass
- Code review: Production-ready
Ready to proceed to hyperpowers:finishing-a-development-branch.
```
**If gaps found:**
```markdown
## Implementation Review: GAPS FOUND ❌
Reviewed bd-1 (OAuth Authentication) against implementation.
### Tasks with Gaps
#### bd-3: Implement token exchange
**Gaps:**
- ❌ Success criterion not met: "Pre-commit hooks pass"
- Evidence: cargo clippy shows 3 warnings
- ❌ Anti-pattern violation: Found `.unwrap()` at src/auth/jwt.ts:45
- ⚠️ Key consideration not addressed: "Empty payload validation"
- No check for empty payload in generateToken()
#### bd-4: Add refresh logic
**Gaps:**
- ❌ Success criterion not met: "All tests passing"
- Evidence: test_verify_expired_token failing
### Cannot Proceed
Implementation does not match spec. Fix gaps before completing.
```
---
## Step 4: Gate Decision
**If APPROVED:**
```
Announce: "I'm using hyperpowers:finishing-a-development-branch to complete this work."
Use Skill tool: hyperpowers:finishing-a-development-branch
```
**If GAPS FOUND:**
```
STOP. Do not proceed to finishing-a-development-branch.
Fix gaps or discuss with partner.
Re-run review after fixes.
```
</the_process>
<examples>
<example>
<scenario>Developer only checks git diff, doesn't read actual files</scenario>
<code>
# Review process
git diff main...HEAD # Shows changes
# Developer sees:
+ function generateToken(payload) {
+ return jwt.sign(payload, secret);
+ }
# Approves based on diff
"Looks good, token generation implemented ✅"
# Misses: Full context shows no validation
function generateToken(payload) {
// No validation of payload!
// No check for empty payload (key consideration)
// No error handling if jwt.sign fails
return jwt.sign(payload, secret);
}
</code>
<why_it_fails>
- Git diff shows additions, not full context
- Missed that empty payload not validated (key consideration)
- Missed that error handling missing (quality issue)
- False approval - gaps exist but not caught
- Will fail in production when empty payload passed
</why_it_fails>
<correction>
**Correct review process:**
```bash
# See changes
git diff main...HEAD -- src/auth/jwt.ts
# THEN READ FULL FILE
Read tool: src/auth/jwt.ts
```
**Reading full file reveals:**
```javascript
function generateToken(payload) {
// Missing: empty payload check (key consideration from bd task)
// Missing: error handling for jwt.sign failure
return jwt.sign(payload, secret);
}
```
**Record in findings:**
```
⚠️ Key consideration not addressed: "Empty payload validation"
- No check for empty payload in generateToken()
- Code at src/auth/jwt.ts:15-17
⚠️ Error handling: jwt.sign can throw, not handled
```
**What you gain:**
- Caught gaps that git diff missed
- Full context reveals missing validation
- Quality issues identified before production
- Spec compliance verified, not assumed
</correction>
</example>
<example>
<scenario>Developer assumes tests passing means done</scenario>
<code>
# Run tests
cargo test
# Output: 127 tests passed
# Developer concludes
"Tests pass, implementation complete ✅"
# Proceeds to finishing-a-development-branch
# Misses:
- bd task has 5 success criteria
- Only checked 1 (tests pass)
- Anti-pattern: unwrap() present (prohibited)
- Key consideration: Unicode handling not tested
- Linter has warnings (blocks pre-commit)
</code>
<why_it_fails>
- Tests passing ≠ spec compliance
- Didn't verify all success criteria
- Didn't check anti-patterns
- Didn't verify key considerations
- Pre-commit will fail (blocks merge)
- Ships code violating anti-patterns
</why_it_fails>
<correction>
**Correct review checks ALL criteria:**
```markdown
bd task has 5 success criteria:
1. "All tests pass" ✅ - Evidence: 127 passed
2. "Pre-commit passes" ❌ - Evidence: clippy warns (3 warnings)
3. "No unwrap in production" ❌ - Evidence: Found at jwt.ts:45
4. "Unicode handling tested" ⚠️ - Need to verify test exists
5. "Rate limiting implemented" ⚠️ - Need to check code
Result: 1/5 criteria verified met. GAPS EXIST.
```
**Run additional checks:**
```bash
# Check criterion 2
cargo clippy
# 3 warnings found ❌
# Check criterion 3
rg "\.unwrap\(\)" src/
# src/auth/jwt.ts:45 ❌
# Check criterion 4
rg "unicode" tests/
# No matches ⚠️ Need to verify
```
**Decision: GAPS FOUND, cannot proceed**
**What you gain:**
- Verified ALL criteria, not just tests
- Caught anti-pattern violations
- Caught pre-commit blockers
- Prevented shipping non-compliant code
- Spec contract honored completely
</correction>
</example>
<example>
<scenario>Developer rationalizes skipping rigor for "simple" task</scenario>
<code>
bd task: "Add logging to error paths"
# Developer thinks: "Simple task, just added console.log"
# Skips:
- Automated checks (assumes no issues)
- Code quality review (seems obvious)
- Full success criteria verification
# Approves quickly:
"Logging added ✅"
# Misses:
- console.log used instead of proper logger (anti-pattern)
- Only added to 2 of 5 error paths (incomplete)
- No test verifying logs actually output (criterion)
- Logs contain sensitive data (security issue)
</code>
<why_it_fails>
- "Simple" tasks have hidden complexity
- Skipped rigor catches exactly these issues
- Incomplete implementation (2/5 paths)
- Security vulnerability shipped
- Anti-pattern not caught
- Failed success criterion (test logs)
</why_it_fails>
<correction>
**Follow full review process:**
```bash
# Automated checks
rg "console\.log" src/
# Found at error-handler.ts:12, 15 ⚠️
# Read bd task
bd show bd-5
# Success criteria:
# 1. "All error paths logged"
# 2. "No sensitive data in logs"
# 3. "Test verifies log output"
# Check criterion 1
grep -n "throw new Error" src/
# 5 locations found
# Only 2 have logging ❌ Incomplete
# Check criterion 2
Read tool: src/error-handler.ts
# Logs contain password field ❌ Security issue
# Check criterion 3
rg "test.*log" tests/
# No matches ❌ Test missing
```
**Decision: GAPS FOUND**
- Incomplete (3/5 error paths missing logs)
- Security issue (logs password)
- Anti-pattern (console.log instead of logger)
- Missing test
**What you gain:**
- "Simple" task revealed multiple gaps
- Security vulnerability caught pre-production
- Rigor prevents incomplete work shipping
- All criteria must be met, no exceptions
</correction>
</example>
<example>
<scenario>Developer approves implementation with high test coverage but tautological tests</scenario>
<code>
# Test results show good coverage
cargo test
# 45 tests passed ✅
# Coverage: 92% ✅
# Developer approves based on numbers
"Tests pass with 92% coverage, implementation complete ✅"
# Proceeds to finishing-a-development-branch
# Later in production:
# - Validation bypassed because test only checked "validator exists"
# - Race condition because test only checked "lock was acquired"
# - Encoding corruption because test only checked "encode != nil"
</code>
<why_it_fails>
- High coverage doesn't mean meaningful tests
- Tests verified existence/syntax, not behavior
- Tautological tests passed by definition:
- `expect(validator != nil)` - always passes, doesn't test validation logic
- `expect(lock.acquire())` - tests mock, not thread safety
- `expect(encoded.count > 0)` - tests non-empty, not correctness
- Production bugs occurred despite "good" test coverage
- Coverage metrics were gamed with meaningless tests
</why_it_fails>
<correction>
**Audit each test for meaningfulness:**
```bash
# Find new tests
git diff main...HEAD --name-only | grep test
# Read and audit each test
Read tool: tests/validator_test.swift
```
**For each test, document:**
```markdown
#### Test Quality Audit
| Test | Assertion | Bug Caught? | Verdict |
|------|-----------|-------------|---------|
| testValidatorExists | `!= nil` | ❌ None (compiler checks) | ❌ Remove |
| testValidInput | `isValid == true` | ⚠️ Happy path only | ⚠️ Add edge cases |
| testEmptyInputFails | `isValid == false` | ✅ Missing validation | ✅ Keep |
| testLockAcquired | mock.acquireCalled | ❌ Tests mock | ❌ Replace |
| testConcurrentAccess | count == expected | ✅ Race condition | ✅ Keep |
| testEncodeNotNil | `!= nil` | ❌ Type guarantees this | ❌ Remove |
| testUnicodeRoundtrip | decoded == original | ✅ Encoding corruption | ✅ Keep |
**Tautological tests:** 3 (must remove)
**Weak tests:** 1 (must strengthen)
**Meaningful tests:** 3 (keep)
```
**Decision: GAPS FOUND ❌**
```markdown
## Test Quality Audit: GAPS FOUND
### Tautological Tests (Must Remove)
- testValidatorExists: Compiler ensures non-nil, test proves nothing
- testLockAcquired: Tests mock behavior, not actual thread safety
- testEncodeNotNil: Return type is non-optional, can never be nil
### Weak Tests (Must Strengthen)
- testValidInput: Only happy path, add:
- testEmptyStringRejected
- testMaxLengthRejected
- testUnicodeNormalized
### Action Required
Remove 3 tautological tests, add 3 edge case tests, then re-review.
```
**What you gain:**
- Real test quality, not coverage theater
- Bugs caught before production
- Tests that actually verify behavior
- Confidence in test suite
</correction>
</example>
<example>
<scenario>Developer completes refactoring but leaves old implementation as fallback</scenario>
<code>
# After refactoring auth system:
git diff shows:
+ function authenticateV2(token) { ... } # New implementation
function authenticate(token) { ... } # Old still exists!
function authenticateLegacy(token) { ... } # Even older!
# In config:
const USE_LEGACY_AUTH = process.env.LEGACY_AUTH ?? true
# Developer claims: "Refactoring complete"
</code>
<why_it_fails>
- Old implementations still present (authenticate, authenticateLegacy)
- Feature flag enables fallback to old code
- Tests may still use old functions
- No single canonical implementation
- Technical debt increased, not decreased
- "Refactoring" actually added code instead of replacing
</why_it_fails>
<correction>
**Run dead code audit:**
```bash
# Fallback patterns
rg -i "legacy|old_|fallback" src/
# Found: authenticateLegacy, USE_LEGACY_AUTH ❌
# Check callers
rg "authenticate\(" src/ --type ts
# authenticate: 0 callers ❌ DEAD
# authenticateLegacy: 0 callers ❌ DEAD
# authenticateV2: 15 callers ✅ ACTIVE
```
**Dead Code Audit Results:**
| Category | Pattern | Found | Location | Action |
|----------|---------|-------|----------|--------|
| Fallback code | `legacy\|fallback` | 2 | auth.ts:45,89 | ❌ Delete |
| Unused functions | no callers | 2 | authenticate(), authenticateLegacy() | ❌ Delete |
| Feature flags | `USE_LEGACY` | 1 | config.ts:12 | ❌ Delete |
**Decision: GAPS FOUND ❌**
```markdown
## Dead Code Audit: GAPS FOUND
### Refactoring Remnants
- authenticate() at auth.ts:12 - 0 callers, delete
- authenticateLegacy() at auth.ts:45 - 0 callers, delete
- USE_LEGACY_AUTH flag at config.ts:12 - enables dead code, delete
### Required Actions
1. Delete authenticate() - replaced by authenticateV2()
2. Delete authenticateLegacy() - obsolete
3. Delete USE_LEGACY_AUTH flag - no longer needed
4. Rename authenticateV2() to authenticate() (cleaner API)
5. Update/delete tests for removed functions
**Cannot approve until old code is removed.**
```
**What you gain:**
- Single canonical implementation
- No dead code accumulation
- Tests test actual functionality
- Technical debt reduced, not increased
- Refactoring actually complete
</correction>
</example>
</examples>
<critical_rules>
## Rules That Have No Exceptions
1. **Review every task** → No skipping "simple" tasks
2. **Run all automated checks** → TODOs, stubs, unwrap, ignored tests
3. **Run dead code audit** → Fallback code, unused functions, deprecation markers
4. **Read actual files with Read tool** → Not just git diff
5. **Verify every success criterion** → With evidence, not assumptions
6. **Check all anti-patterns** → Search for prohibited patterns
7. **Apply Google Fellow scrutiny** → Production-grade code review
8. **Audit all new tests for meaningfulness** → Tautological tests = gaps, not coverage
9. **If gaps found → STOP** → Don't proceed to finishing-a-development-branch
## Common Excuses
All of these mean: **STOP. Follow full review process.**
- "Tests pass, must be complete" (Tests ≠ spec, check all criteria)
- "I implemented it, it's done" (Implementation ≠ compliance, verify)
- "No time for thorough review" (Gaps later cost more than review now)
- "Looks good to me" (Opinion ≠ evidence, run verifications)
- "Small gaps don't matter" (Spec is contract, all criteria matter)
- "Will fix in next PR" (This PR completes this epic, fix now)
- "Can check diff instead of files" (Diff shows changes, not context)
- "Automated checks cover it" (Checks + code review both required)
- "Success criteria passing means done" (Also check anti-patterns, quality, edge cases)
- "Tests exist, so testing is complete" (Tautological tests = false confidence)
- "Coverage looks good" (Coverage can be gamed with meaningless tests)
- "Tests are boilerplate, don't need review" (Every test must catch a real bug)
- "It's just a simple existence check" (Compiler already checks existence)
- "Keeping old code as fallback is safe" (Old code = dead code, delete it)
- "We might need the old implementation later" (Version control remembers, delete now)
- "Backwards compatibility requires the shim" (Internal code doesn't need backwards compat)
- "Deprecation marker is enough" (Deprecation = "delete soon", not "keep forever")
- "The old tests still pass" (Tests for removed code = orphaned tests, delete)
</critical_rules>
<verification_checklist>
Before approving implementation:
**Per task:**
- [ ] Read bd task specification completely
- [ ] Ran all automated checks (TODOs, stubs, unwrap, ignored tests)
- [ ] **Ran dead code audit (fallback patterns, unused code, deprecation, orphaned tests)**
- [ ] Ran all quality gates via test-runner agent (tests, format, lint, pre-commit)
- [ ] Read actual implementation files with Read tool (not just diff)
- [ ] Reviewed code quality with Google Fellow perspective
- [ ] **Audited all new tests for meaningfulness (not tautological)**
- [ ] Verified every success criterion with evidence
- [ ] Checked every anti-pattern (searched for prohibited patterns)
- [ ] Verified every key consideration addressed in code
**Overall:**
- [ ] Reviewed ALL tasks (no exceptions)
- [ ] TodoWrite tracker shows all tasks reviewed
- [ ] Compiled findings (approved or gaps)
- [ ] If approved: all criteria met for all tasks
- [ ] If gaps: documented exactly what missing
**Can't check all boxes?** Return to Step 2 and complete review.
</verification_checklist>
<integration>
**This skill is called by:**
- hyperpowers:executing-plans (Step 5, after all tasks executed)
**This skill calls:**
- hyperpowers:finishing-a-development-branch (if approved)
- hyperpowers:test-runner agent (for quality gates)
**This skill uses:**
- hyperpowers:verification-before-completion principles (evidence before claims)
**Call chain:**
```
hyperpowers:executing-plans → hyperpowers:review-implementation → hyperpowers:finishing-a-development-branch
↓
(if gaps: STOP)
```
**CRITICAL:** Use bd commands (bd show, bd list, bd dep tree), never read `.beads/issues.jsonl` directly.
</integration>
<resources>
**Detailed guides:**
- [Code quality standards by language](resources/quality-standards.md)
- [Common anti-patterns to check](resources/anti-patterns-reference.md)
- [Production readiness checklist](resources/production-checklist.md)
**When stuck:**
- Unsure if gap critical → If violates criterion, it's a gap
- Criteria ambiguous → Ask user for clarification before approving
- Anti-pattern unclear → Search for it, document if found
- Quality concern → Document as gap, don't rationalize away
</resources>Related Skills
software-code-review
Use when reviewing code, pull requests, or diffs. Provides patterns, checklists, and templates for systematic code review with a focus on correctness, security, readability, performance, and maintainability.
slo-implementation
Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) with error budgets and alerting. Use when establishing reliability targets, implementing SRE practices, or measuring service performance.
skill-latex-implementation
Implement LaTeX documents following a plan. Invoke for LaTeX-language implementation tasks.
skill-code-review
Perform thorough, constructive code reviews focusing on correctness, security, maintainability, and best practices
semgrep-coderabbit-review
Two-stage code review combining fast pattern detection (Semgrep) with AI-powered semantic analysis (CodeRabbit)
security-review
Performs security reviews of Hone code using OWASP guidelines. Use when reviewing database queries, CSV import logic, API endpoints, authentication, encryption, or when the user asks about security.
search-web-implementation
Search the web monorepo (../app) to find how web handles equivalent functionality. Use when implementing mobile features that need to match web behavior, finding web routes, or understanding how web handles a specific feature like statements, portfolios, or user flows.
reviewing-python-architecture
Review ADRs to check they follow testing principles and parent PDR constraints. Use when reviewing ADRs or architecture decisions.
review-fix
Read the latest code review, plan fixes for all findings, then execute the fixes — all in one command. Produces a dated plan and execution log in .reviews/.
review-changes
[Review & Quality] Review all uncommitted changes before commit
review-architecture
Verify DDD patterns, Clean Architecture boundaries, and bITdevKit-specific conventions in modular monolith projects
requesting-code-review
Use when completing tasks, implementing major features, or before merging to verify work meets requirements