aposd-verifying-correctness

Use after implementing code. Triggers on: is it done, ready to commit, verify correctness, did I miss anything, pre-commit check.

211 stars

by ryanthedev

Best use case

aposd-verifying-correctness is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using aposd-verifying-correctness should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/aposd-verifying-correctness/SKILL.md --create-dirs "https://raw.githubusercontent.com/ryanthedev/code-foundations/main/skills/aposd-verifying-correctness/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/aposd-verifying-correctness/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How aposd-verifying-correctness Compares

Feature / Agent	aposd-verifying-correctness	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

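It is a verification checklist to run after implementing code and before calling the work done. It checks six dimensions: requirements coverage, concurrency safety, error handling, resource management, boundary conditions, and security. Triggers on: is it done, ready to commit, verify correctness, did I miss anything, pre-commit check.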

Where can I find the source code?

The source code is in the ryanthedev/code-foundations repository on GitHub, under skills/aposd-verifying-correctness.

SKILL.md Source

# Skill: aposd-verifying-correctness

## STOP - Before "Done"

**Design quality ≠ correctness.** Well-designed code can still have bugs, missing requirements, or safety issues.

**Run ALL dimension checks before claiming done.** "I think I covered everything" without explicit mapping is a red flag.

---

## Dimension Detection & Checks

For each dimension: detect if it applies, then verify.

---

### 1. Requirements Coverage

**Detect:** Were requirements stated? (explicit list, user request, spec)

**If YES, verify:**
- [ ] List each requirement explicitly
- [ ] For each: point to code that implements it
- [ ] Any requirement without code? → **Not done**
- [ ] Any code without requirement? → Scope creep or missing requirement

**Red flag:** "I think I covered everything" without explicit mapping
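
The explicit mapping can be kept as data rather than in your head. The following is a minimal sketch, not part of the skill itself: the `Requirement` class, the `unmapped` helper, and the example requirements and code locations are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    """One stated requirement and the code locations that implement it."""
    text: str
    implemented_in: list = field(default_factory=list)


def unmapped(requirements):
    """Return the text of every requirement that has no implementing code."""
    return [r.text for r in requirements if not r.implemented_in]


# Hypothetical requirements and locations, for illustration only.
requirements = [
    Requirement("Reject empty usernames", ["users.py:validate_username"]),
    Requirement("Log failed logins"),  # nothing implements this yet
]

missing = unmapped(requirements)
print("NOT DONE" if missing else "DONE", missing)
```

Any entry returned by `unmapped` is a requirement without code, which by the checklist above means the work is not done.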

---

### 2. Concurrency Safety

**Detect:** Any of these present?
- Multiple threads/processes accessing same data
- Async/await patterns
- Shared mutable state (class attributes, globals)
- "Thread-safe" in requirements or docstring
- Web handlers, queue workers, background tasks

**If YES, verify:**
- [ ] All shared mutable state identified
- [ ] Each access point protected (lock, atomic, queue, immutable)
- [ ] No time-of-check to time-of-use (TOCTOU) gaps
- [ ] Lock ordering consistent (if multiple locks)

**Red flag:** "It's probably fine" or "Python GIL handles it"
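
As an illustration of protecting each access point and closing a check-then-act gap, here is a small sketch using `threading.Lock` from the standard library. The `Counter` class and its method names are assumptions made for this example; the skill does not prescribe a particular design.

```python
import threading


class Counter:
    """Counter whose shared state is only touched while holding one lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # `+=` is a read, an add, and a write; the lock keeps other
        # threads from interleaving between those steps.
        with self._lock:
            self._value += 1

    def take_if_positive(self):
        # Check and act under the same lock, so the value cannot change
        # between the test and the decrement (no TOCTOU gap).
        with self._lock:
            if self._value > 0:
                self._value -= 1
                return True
            return False

    @property
    def value(self):
        with self._lock:
            return self._value


def worker(counter, times):
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 8000
```

Only one lock is involved here, so lock ordering does not arise; with several locks, every code path would need to acquire them in the same order.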

---

### 3. Error Handling

**Detect:** Can any operation fail?
- I/O (file, network, database)
- External calls (APIs, subprocesses)
- Resource acquisition (memory, connections)
- User input processing
- Parsing/deserialization

**If YES, verify:**
- [ ] Each failure point has explicit handling OR propagates
- [ ] No bare `except:` or `except Exception: pass`
- [ ] Error messages actionable (what failed, why, how to fix)
- [ ] Partial failures handled (rollback, cleanup, consistent state)

**Red flag:** "Errors are rare" or "caller handles it" without checking caller
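
A sketch of explicit handling with actionable messages, assuming a JSON configuration file as the failure-prone input. `ConfigError` and `load_config` are hypothetical names for this example, and the file name passed at the end is made up.

```python
import json
from pathlib import Path


class ConfigError(Exception):
    """Raised when the configuration file cannot be loaded."""


def load_config(path):
    """Load a JSON object from `path`, reporting what failed and how to fix it."""
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError as exc:
        raise ConfigError(
            f"Config file {path} not found; create it or pass a different path."
        ) from exc
    except OSError as exc:
        # Other I/O failures (permissions, path is a directory, ...).
        raise ConfigError(f"Could not read {path}: {exc}") from exc

    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ConfigError(
            f"{path} is not valid JSON (line {exc.lineno}): {exc.msg}"
        ) from exc

    if not isinstance(data, dict):
        raise ConfigError(f"{path} must contain a JSON object at the top level.")
    return data


try:
    load_config(Path("hypothetical-missing-config.json"))
except ConfigError as exc:
    print(exc)
```

Each `except` names a specific exception and re-raises with context via `from exc`, so nothing is swallowed and the original traceback stays attached.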

---

### 4. Resource Management

**Detect:** Does code acquire resources?
- File handles, sockets, connections
- Locks, semaphores
- Memory allocations (large buffers, caches)
- External service handles
- Background threads/processes

**If YES, verify:**
- [ ] Every acquire has corresponding release
- [ ] Release happens in finally/context manager/destructor
- [ ] Release happens on error paths too
- [ ] No resource leaks on repeated calls
- [ ] Bounded growth (caches have limits, queues have limits)

**Red flag:** "It cleans up eventually" or daemon threads without shutdown
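
The following sketch shows two of the checks above: release on the error path through a context manager, and bounded growth through a size-limited cache. The `acquired` helper and `BoundedCache` class are hypothetical, and the "resource" is simulated with a log list so the example runs anywhere.

```python
from collections import OrderedDict
from contextlib import contextmanager


@contextmanager
def acquired(name, log):
    """Acquire a stand-in resource and always release it."""
    log.append(f"acquire {name}")
    try:
        yield name
    finally:
        # Runs on normal exit and when the body raises.
        log.append(f"release {name}")


class BoundedCache:
    """Least-recently-used cache that never exceeds max_items entries."""

    def __init__(self, max_items):
        if max_items < 1:
            raise ValueError("max_items must be at least 1")
        self._max_items = max_items
        self._items = OrderedDict()

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self._max_items:
            self._items.popitem(last=False)  # evict the least recently used

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]

    def __len__(self):
        return len(self._items)


log = []
try:
    with acquired("connection", log):
        raise RuntimeError("work failed")
except RuntimeError:
    pass
print(log)  # ['acquire connection', 'release connection']

cache = BoundedCache(max_items=2)
for key in ("a", "b", "c"):
    cache.put(key, key.upper())
print(len(cache), cache.get("a"))  # 2 None
```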

---

### 5. Boundary Conditions

**Detect:** Does code handle variable-size input?
- Collections (lists, dicts, sets)
- Strings, byte arrays
- Numeric ranges
- Optional/nullable values

**If YES, verify:**
- [ ] Empty input: What happens with `[]`, `""`, `None`, `0`?
- [ ] Single item: Edge case often different from N items
- [ ] Maximum size: What if input is huge? Memory? Time?
- [ ] Invalid values: Negative numbers, NaN, special characters?
- [ ] Type boundaries: int overflow, float precision?

**Red flag:** "Nobody would pass that" or "that's an edge case"
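
To make the boundary checks concrete, here is a sketch of one small function with its edge cases decided explicitly. The function `mean` and its chosen behavior (return `None` for missing or empty input, reject non-finite values) are assumptions for illustration; a real function might choose differently, but the choice should be deliberate.

```python
import math


def mean(values):
    """Average of `values`; None for missing or empty input."""
    # Empty input: decide explicitly instead of hitting ZeroDivisionError.
    if values is None or len(values) == 0:
        return None
    # Invalid values: NaN or infinity would silently poison the result.
    if any(math.isnan(v) or math.isinf(v) for v in values):
        raise ValueError("values must be finite numbers")
    # math.fsum limits accumulated floating-point rounding error.
    return math.fsum(values) / len(values)


print(mean(None))       # None
print(mean([]))         # None
print(mean([5]))        # 5.0
print(mean([1, 2, 3]))  # 2.0
```

Maximum size is not addressed here: the function needs `len()`, so it assumes the whole input already fits in memory.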

---

### 6. Security (if applicable)

**Detect:** Does code handle untrusted input?
- User-provided data (forms, API requests)
- File contents from external sources
- URLs, paths, identifiers from users
- Data that becomes SQL, shell, HTML, or code

**If YES, verify:**
- [ ] Input validated before use
- [ ] No string concatenation for SQL/shell/HTML (use parameterized queries, argument lists, or auto-escaping templates)
- [ ] Path traversal prevented (no `../` exploitation)
- [ ] Secrets not logged or exposed in errors
- [ ] Auth/authz checked before action, not after

**Red flag:** "It's internal only" (internals get exposed)
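
Two of these checks can be sketched with the standard library alone: a parameterized query with `sqlite3`, and a path-traversal guard with `pathlib`. The helpers `find_user` and `safe_join`, the `users` table, and the `/srv/uploads` directory are hypothetical. `Path.is_relative_to` requires Python 3.9 or later.

```python
import sqlite3
from pathlib import Path


def find_user(conn, name):
    """Look up a user by name using a parameterized query."""
    # The ? placeholder passes `name` as data, never as SQL text.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchone()


def safe_join(base_dir, user_path):
    """Resolve `user_path` under `base_dir`, rejecting anything that escapes it."""
    base = base_dir.resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes {base}: {user_path!r}")
    return candidate


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

print(find_user(conn, "alice"))         # (1, 'alice')
print(find_user(conn, "x' OR '1'='1"))  # None

try:
    safe_join(Path("/srv/uploads"), "../secret.txt")
except ValueError as exc:
    print("rejected:", exc)
```

The injection attempt returns no row because the whole string is compared against the `name` column as a literal value.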

---

## Quick Checklist (Minimum)

Before "done", answer YES to all that apply:

| Dimension | Detection Trigger | Verified? |
|-----------|-------------------|-----------|
| Requirements | Requirements were stated | [ ] Each mapped to code |
| Concurrency | Shared state exists | [ ] All access protected |
| Errors | Operations can fail | [ ] All failures handled |
| Resources | Resources acquired | [ ] All released (incl. errors) |
| Boundaries | Variable-size input | [ ] Edge cases handled |
| Security | Untrusted input | [ ] Input validated |

---

## Output Format

When verifying, output:

```
## Correctness Verification

### Requirements: [PASS/FAIL/N/A]
- Requirement 1 → implemented in X
- Requirement 2 → implemented in Y

### Concurrency: [PASS/FAIL/N/A]
- Shared state: [list]
- Protection: [how]

### Errors: [PASS/FAIL/N/A]
- Failure points: [list]
- Handling: [approach]

### Resources: [PASS/FAIL/N/A]
- Acquired: [list]
- Released: [how]

### Boundaries: [PASS/FAIL/N/A]
- Edge cases: [list]
- Handling: [approach]

### Security: [PASS/FAIL/N/A]
- Untrusted input: [list]
- Validation: [approach]

**Verdict:** [DONE / NOT DONE - list blockers]
```
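
The verdict rule implied by this format (any FAIL blocks "done", and every dimension must be assessed) could be expressed as follows. This is a sketch under that reading of the format; `Status`, `DIMENSIONS`, and `verdict` are hypothetical names, not part of the skill.

```python
from enum import Enum


class Status(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    NA = "N/A"


DIMENSIONS = (
    "Requirements", "Concurrency", "Errors",
    "Resources", "Boundaries", "Security",
)


def verdict(results):
    """Return the verdict line for a mapping of dimension name to Status."""
    # Every dimension must be explicitly assessed, even if the answer is N/A.
    missing = [d for d in DIMENSIONS if d not in results]
    if missing:
        raise ValueError("dimensions not checked: " + ", ".join(missing))
    blockers = [d for d in DIMENSIONS if results[d] is Status.FAIL]
    if blockers:
        return "NOT DONE - " + ", ".join(blockers)
    return "DONE"


results = {d: Status.PASS for d in DIMENSIONS}
results["Concurrency"] = Status.NA
print(verdict(results))  # DONE

results["Errors"] = Status.FAIL
print(verdict(results))  # NOT DONE - Errors
```

Raising on a missing dimension mirrors the rule that skipping a check is different from marking it N/A.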

---

## Relationship to Other Skills

| Skill | Focus | When |
|-------|-------|------|
| **aposd-designing-deep-modules** | Design quality | FIRST—during design |
| **aposd-verifying-correctness** | Actual correctness | BEFORE "done" |
| **cc-quality-practices** | Testing/debugging | Throughout |

**Order:** Design → Implement → Verify (this skill) → Done


---

## Chain

| After | Next |
|-------|------|
| All dimensions pass | Done (pre-commit gate) |

Related Skills

aposd-simplifying-complexity

211

from ryanthedev/code-foundations

Use when code is too complex, has scattered error handling, configuration explosion, or callers doing module work. Triggers on: too complex, simplify, scattered errors, configuration proliferation, verbose error handling

aposd-reviewing-module-design

211

from ryanthedev/code-foundations

Use when reviewing code, assessing interfaces, during PR review, or evaluating 'is this too complex?' Triggers on: code review, design review, module complexity, interface assessment, PR review, structural analysis.

aposd-designing-deep-modules

211

from ryanthedev/code-foundations

Use when designing modules, APIs, or classes before implementation.

whiteboarding-planning

211

from ryanthedev/code-foundations

Standard/Full planning pipeline for whiteboarding. Steps: discover, classify, explore, detail, save, check, confirm, handoff. Use when dispatched from whiteboarding command for Medium/Complex tasks. Triggers on 'planning pipeline', 'standard track', 'full track'.

welc-legacy-code

211

from ryanthedev/code-foundations

Use when facing untested legacy code, test harness problems, dependency issues, or time pressure. Triggers on: legacy code, no tests, can't test, afraid to change, need to modify untested code.

performance-optimization

211

from ryanthedev/code-foundations

Use when code is too slow, has performance issues, timeouts, OOM errors, high CPU/memory, or doesn't scale. Triggers on: profiler hot spots, latency complaints, needs optimization, critical path analysis.

code-clarity-and-docs

211

from ryanthedev/code-foundations

Use when reviewing code clarity, writing comments, checking documentation accuracy, or auditing AI-facing docs. Triggers on: naming, comments, documentation, README, CLAUDE.md.

clarify

211

from ryanthedev/code-foundations

Decompose user intent through structured brainstorming. Detects underspecification, ambiguity, and false premises through hypothesis-driven questioning. Use when a request is unclear, could have multiple valid interpretations, or critical details are missing.

cc-routine-and-class-design

211

from ryanthedev/code-foundations

Use when designing routines or classes, reviewing class interfaces, choosing between inheritance and containment, or evaluating routine cohesion. Also trigger when inheritance is used without LSP verification, or when design issues are present despite passing tests.

cc-refactoring-guidance

211

from ryanthedev/code-foundations

Use when modifying existing code, improving structure without changing behavior, or deciding between refactor, rewrite, or fix-first.

cc-quality-practices

211

from ryanthedev/code-foundations

Use when planning QA, choosing review methods, designing tests, or debugging failures. Triggers on: defects found late, tests pass but production bugs, coverage disputes, review ineffective, spending excessive time debugging.

cc-pseudocode-programming

211

from ryanthedev/code-foundations

Use when designing routines, stuck on where to start coding, caught in compile-debug loops, or code works but you don't understand why. Triggers on: starting a new coding task