tdd-guide
Test-driven development skill for writing unit tests, generating test fixtures and mocks, analyzing coverage gaps, and guiding red-green-refactor workflows across Jest, Pytest, JUnit, Vitest, and Mocha. Use when the user asks to write tests, improve test coverage, practice TDD, generate mocks or stubs, or mentions testing frameworks like Jest, pytest, or JUnit.
Best use case
tdd-guide is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Test-driven development skill for writing unit tests, generating test fixtures and mocks, analyzing coverage gaps, and guiding red-green-refactor workflows across Jest, Pytest, JUnit, Vitest, and Mocha. Use when the user asks to write tests, improve test coverage, practice TDD, generate mocks or stubs, or mentions testing frameworks like Jest, pytest, or JUnit.
Teams using tdd-guide should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/tdd-guide/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How tdd-guide Compares
| Feature / Agent | tdd-guide | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Test-driven development skill for writing unit tests, generating test fixtures and mocks, analyzing coverage gaps, and guiding red-green-refactor workflows across Jest, Pytest, JUnit, Vitest, and Mocha. Use when the user asks to write tests, improve test coverage, practice TDD, generate mocks or stubs, or mentions testing frameworks like Jest, pytest, or JUnit.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
Best AI Skills for Claude
Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.
Cursor vs Codex for AI Workflows
Compare Cursor and Codex for AI coding workflows, repository assistance, debugging, refactoring, and reusable developer skills.
SKILL.md Source
# TDD Guide
Test-driven development skill for generating tests, analyzing coverage, and guiding red-green-refactor workflows across Jest, Pytest, JUnit, and Vitest.
---
## Workflows
### Generate Tests from Code
1. Provide source code (TypeScript, JavaScript, Python, Java)
2. Specify target framework (Jest, Pytest, JUnit, Vitest)
3. Run `test_generator.py` with requirements
4. Review generated test stubs
5. **Validation:** Tests compile and cover happy path, error cases, edge cases
### Analyze Coverage Gaps
1. Generate coverage report from test runner (`npm test -- --coverage`)
2. Run `coverage_analyzer.py` on LCOV/JSON/XML report
3. Review prioritized gaps (P0/P1/P2)
4. Generate missing tests for uncovered paths
5. **Validation:** Coverage meets target threshold (typically 80%+)
### TDD New Feature
1. Write failing test first (RED)
2. Run `tdd_workflow.py --phase red` to validate
3. Implement minimal code to pass (GREEN)
4. Run `tdd_workflow.py --phase green` to validate
5. Refactor while keeping tests green (REFACTOR)
6. **Validation:** All tests pass after each cycle
---
## Examples
### Test Generation — Input → Output (Pytest)
**Input source function (`math_utils.py`):**
```python
def divide(a: float, b: float) -> float:
if b == 0:
raise ValueError("Cannot divide by zero")
return a / b
```
**Command:**
```bash
python scripts/test_generator.py --input math_utils.py --framework pytest
```
**Generated test output (`test_math_utils.py`):**
```python
import pytest
from math_utils import divide
class TestDivide:
def test_divide_positive_numbers(self):
assert divide(10, 2) == 5.0
def test_divide_negative_numerator(self):
assert divide(-10, 2) == -5.0
def test_divide_float_result(self):
assert divide(1, 3) == pytest.approx(0.333, rel=1e-3)
def test_divide_by_zero_raises_value_error(self):
with pytest.raises(ValueError, match="Cannot divide by zero"):
divide(10, 0)
def test_divide_zero_numerator(self):
assert divide(0, 5) == 0.0
```
---
### Coverage Analysis — Sample P0/P1/P2 Output
**Command:**
```bash
python scripts/coverage_analyzer.py --report lcov.info --threshold 80
```
**Sample output:**
```
Coverage Report — Overall: 63% (threshold: 80%)
P0 — Critical gaps (uncovered error paths):
auth/login.py:42-58 handle_expired_token() 0% covered
payments/process.py:91-110 handle_payment_failure() 0% covered
P1 — High-value gaps (core logic branches):
users/service.py:77 update_profile() — else branch 0% covered
orders/cart.py:134 apply_discount() — zero-qty guard 0% covered
P2 — Low-risk gaps (utility / helper functions):
utils/formatting.py:12 format_currency() 0% covered
Recommended: Generate tests for P0 items first to reach 80% threshold.
```
---
## Key Tools
| Tool | Purpose | Usage |
|------|---------|-------|
| `test_generator.py` | Generate test cases from code/requirements | `python scripts/test_generator.py --input source.py --framework pytest` |
| `coverage_analyzer.py` | Parse and analyze coverage reports | `python scripts/coverage_analyzer.py --report lcov.info --threshold 80` |
| `tdd_workflow.py` | Guide red-green-refactor cycles | `python scripts/tdd_workflow.py --phase red --test test_auth.py` |
| `fixture_generator.py` | Generate test data and mocks | `python scripts/fixture_generator.py --entity User --count 5` |
Additional scripts: `framework_adapter.py` (convert between frameworks), `metrics_calculator.py` (quality metrics), `format_detector.py` (detect language/framework), `output_formatter.py` (CLI/desktop/CI output).
---
## Input Requirements
**For Test Generation:**
- Source code (file path or pasted content)
- Target framework (Jest, Pytest, JUnit, Vitest)
- Coverage scope (unit, integration, edge cases)
**For Coverage Analysis:**
- Coverage report file (LCOV, JSON, or XML format)
- Optional: Source code for context
- Optional: Target threshold percentage
**For TDD Workflow:**
- Feature requirements or user story
- Current phase (RED, GREEN, REFACTOR)
- Test code and implementation status
---
## Spec-First Workflow
TDD is most effective when driven by a written spec. The flow:
1. **Write or receive a spec** — stored in `specs/<feature>.md`
2. **Extract acceptance criteria** — each criterion becomes one or more test cases
3. **Write failing tests (RED)** — one test per acceptance criterion
4. **Implement minimal code (GREEN)** — satisfy each test in order
5. **Refactor** — clean up while all tests stay green
### Spec Directory Convention
```
project/
├── specs/
│ ├── user-auth.md # Feature spec with acceptance criteria
│ ├── payment-processing.md
│ └── notification-system.md
├── tests/
│ ├── test_user_auth.py # Tests derived from specs/user-auth.md
│ ├── test_payments.py
│ └── test_notifications.py
└── src/
```
### Extracting Tests from Specs
Each acceptance criterion in a spec maps to at least one test:
| Spec Criterion | Test Case |
|---------------|-----------|
| "User can log in with valid credentials" | `test_login_valid_credentials_returns_token` |
| "Invalid password returns 401" | `test_login_invalid_password_returns_401` |
| "Account locks after 5 failed attempts" | `test_login_locks_after_five_failures` |
**Tip:** Number your acceptance criteria in the spec. Reference the number in the test docstring for traceability (`# AC-3: Account locks after 5 failed attempts`).
> **Cross-reference:** See `engineering/spec-driven-workflow` for the full spec methodology, including spec templates and review checklists.
---
## Red-Green-Refactor Examples Per Language
### TypeScript / Jest
```typescript
// test/cart.test.ts
describe("Cart", () => {
describe("addItem", () => {
it("should add a new item to an empty cart", () => {
const cart = new Cart();
cart.addItem({ id: "sku-1", name: "Widget", price: 9.99, qty: 1 });
expect(cart.items).toHaveLength(1);
expect(cart.items[0].id).toBe("sku-1");
});
it("should increment quantity when adding an existing item", () => {
const cart = new Cart();
cart.addItem({ id: "sku-1", name: "Widget", price: 9.99, qty: 1 });
cart.addItem({ id: "sku-1", name: "Widget", price: 9.99, qty: 2 });
expect(cart.items).toHaveLength(1);
expect(cart.items[0].qty).toBe(3);
});
it("should throw when quantity is zero or negative", () => {
const cart = new Cart();
expect(() =>
cart.addItem({ id: "sku-1", name: "Widget", price: 9.99, qty: 0 })
).toThrow("Quantity must be positive");
});
});
});
```
### Python / Pytest (Advanced Patterns)
```python
# tests/conftest.py — shared fixtures
import pytest
from app.db import create_engine, Session
@pytest.fixture(scope="session")
def db_engine():
engine = create_engine("sqlite:///:memory:")
yield engine
engine.dispose()
@pytest.fixture
def db_session(db_engine):
session = Session(bind=db_engine)
yield session
session.rollback()
session.close()
# tests/test_pricing.py — parametrize for multiple cases
import pytest
from app.pricing import calculate_discount
@pytest.mark.parametrize("subtotal, expected_discount", [
(50.0, 0.0), # Below threshold — no discount
(100.0, 5.0), # 5% tier
(250.0, 25.0), # 10% tier
(500.0, 75.0), # 15% tier
])
def test_calculate_discount(subtotal, expected_discount):
assert calculate_discount(subtotal) == pytest.approx(expected_discount)
```
### Go — Table-Driven Tests
```go
// cart_test.go
package cart
import "testing"
func TestApplyDiscount(t *testing.T) {
tests := []struct {
name string
subtotal float64
want float64
}{
{"no discount below threshold", 50.0, 0.0},
{"5 percent tier", 100.0, 5.0},
{"10 percent tier", 250.0, 25.0},
{"15 percent tier", 500.0, 75.0},
{"zero subtotal", 0.0, 0.0},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := ApplyDiscount(tt.subtotal)
if got != tt.want {
t.Errorf("ApplyDiscount(%v) = %v, want %v", tt.subtotal, got, tt.want)
}
})
}
}
```
---
## Bounded Autonomy Rules
When generating tests autonomously, follow these rules to decide when to stop and ask the user:
### Stop and Ask When
- **Ambiguous requirements** — the spec or user story has conflicting or unclear acceptance criteria
- **Missing edge cases** — you cannot determine boundary values without domain knowledge (e.g., max allowed transaction amount)
- **Test count exceeds 50** — large test suites need human review before committing; present a summary and ask which areas to prioritize
- **External dependencies unclear** — the feature relies on third-party APIs or services with undocumented behavior
- **Security-sensitive logic** — authentication, authorization, encryption, or payment flows require human sign-off on test scenarios
### Continue Autonomously When
- **Clear spec with numbered acceptance criteria** — each criterion maps directly to tests
- **Straightforward CRUD operations** — create, read, update, delete with well-defined models
- **Well-defined API contracts** — OpenAPI spec or typed interfaces available
- **Pure functions** — deterministic input/output with no side effects
- **Existing test patterns** — the codebase already has similar tests to follow
---
## Property-Based Testing
Property-based testing generates random inputs to verify invariants instead of relying on hand-picked examples. Use it when the input space is large and the expected behavior can be described as a property.
### Python — Hypothesis
```python
from hypothesis import given, strategies as st
from app.serializers import serialize, deserialize
@given(st.text())
def test_roundtrip_serialization(data):
"""Serialization followed by deserialization returns the original."""
assert deserialize(serialize(data)) == data
@given(st.integers(), st.integers())
def test_addition_is_commutative(a, b):
assert a + b == b + a
```
### TypeScript — fast-check
```typescript
import fc from "fast-check";
import { encode, decode } from "./codec";
test("encode/decode roundtrip", () => {
fc.assert(
fc.property(fc.string(), (input) => {
expect(decode(encode(input))).toBe(input);
})
);
});
```
### When to Use Property-Based Over Example-Based
| Use Property-Based | Example |
|-------------------|---------|
| Data transformations | Serialize/deserialize roundtrips |
| Mathematical properties | Commutativity, associativity, idempotency |
| Encoding/decoding | Base64, URL encoding, compression |
| Sorting and filtering | Output is sorted, length preserved |
| Parser correctness | Valid input always parses without error |
---
## Mutation Testing
Mutation testing modifies your production code (creates "mutants") and checks whether your tests catch the changes. If a mutant survives (tests still pass), your tests have a gap that coverage alone cannot reveal.
### Tools
| Language | Tool | Command |
|----------|------|---------|
| TypeScript/JavaScript | **Stryker** | `npx stryker run` |
| Python | **mutmut** | `mutmut run --paths-to-mutate=src/` |
| Java | **PIT** | `mvn org.pitest:pitest-maven:mutationCoverage` |
### Why Mutation Testing Matters
- **100% line coverage != good tests** — coverage tells you code was executed, not that it was verified
- **Catches weak assertions** — tests that run code but assert nothing meaningful
- **Finds missing boundary tests** — mutants that change `<` to `<=` expose off-by-one gaps
- **Quantifiable quality metric** — mutation score (% mutants killed) is a stronger signal than coverage %
**Recommendation:** Run mutation testing on critical paths (auth, payments, data processing) even if overall coverage is high. Target 85%+ mutation score on P0 modules.
---
## Cross-References
| Skill | Relationship |
|-------|-------------|
| `engineering/spec-driven-workflow` | Spec → acceptance criteria → test extraction pipeline |
| `engineering-team/focused-fix` | Phase 5 (Verify) uses TDD to confirm the fix with a regression test |
| `engineering-team/senior-qa` | Broader QA strategy; TDD is one layer in the test pyramid |
| `engineering-team/code-reviewer` | Review generated tests for assertion quality and coverage completeness |
| `engineering-team/senior-fullstack` | Project scaffolders include testing infrastructure compatible with TDD workflows |
---
## Limitations
| Scope | Details |
|-------|---------|
| Unit test focus | Integration and E2E tests require different patterns |
| Static analysis | Cannot execute tests or measure runtime behavior |
| Language support | Best for TypeScript, JavaScript, Python, Java |
| Report formats | LCOV, JSON, XML only; other formats need conversion |
| Generated tests | Provide scaffolding; require human review for complex logic |
**When to use other tools:**
- E2E testing: Playwright, Cypress, Selenium
- Performance testing: k6, JMeter, Locust
- Security testing: OWASP ZAP, Burp SuiteRelated Skills
brand-guidelines
When the user wants to apply, document, or enforce brand guidelines for any product or company. Also use when the user mentions 'brand guidelines,' 'brand colors,' 'typography,' 'logo usage,' 'brand voice,' 'visual identity,' 'tone of voice,' 'brand standards,' 'style guide,' 'brand consistency,' or 'company design standards.' Covers color systems, typography, logo rules, imagery guidelines, and tone matrix for any brand — including Anthropic's official identity.
wiki-query
Query the LLM Wiki — reads index.md first, drills into 3-10 relevant pages, synthesizes an answer with inline [[wikilink]] citations, and offers to file the answer back as a new comparison or synthesis page. Usage /wiki-query "<question>"
wiki-log
Show recent entries from the LLM Wiki log (wiki/log.md). Uses the standardized
wiki-lint
Run a health check on the LLM Wiki vault — mechanical checks (orphans, broken links, stale pages, missing frontmatter, log gap, duplicates) plus semantic checks (contradictions, cross-reference gaps, concepts missing their own page). Outputs a markdown report with suggested actions. Usage /wiki-lint [--stale-days N] [--log-gap-days N]
wiki-init
Bootstrap a fresh LLM Wiki vault with the three-layer structure, schema files, and starter templates. Usage /wiki-init <path> --topic "<topic>" [--tool all|claude-code|codex|cursor|antigravity]
wiki-ingest
Ingest a source file from raw/ into the LLM Wiki — read, discuss, write summary page, update cross-references across 5-15 pages, regenerate index, append to log. Usage /wiki-ingest <path-to-source>
tc
Track technical changes with structured records, a state machine, and session handoff. Usage: /tc <init|create|update|status|resume|close|export|dashboard> [args]
tc-tracker
Use when the user asks to track technical changes, create change records, manage TC lifecycles, or hand off work between AI sessions. Covers init/create/update/status/resume/close/export workflows for structured code change documentation.
llm-wiki
Use when building or maintaining a persistent personal knowledge base (second brain) in Obsidian where an LLM incrementally ingests sources, updates entity/concept pages, maintains cross-references, and keeps a synthesis current. Triggers include "second brain", "Obsidian wiki", "personal knowledge management", "ingest this paper/article/book", "build a research wiki", "compound knowledge", "Memex", or whenever the user wants knowledge to accumulate across sessions instead of being re-derived by RAG on every query.
karpathy-coder
Use when writing, reviewing, or committing code to enforce Karpathy's 4 coding principles — surface assumptions before coding, keep it simple, make surgical changes, define verifiable goals. Triggers on "review my diff", "check complexity", "am I overcomplicating this", "karpathy check", "before I commit", or any code quality concern where the LLM might be overcoding.
karpathy-check
Run Karpathy's 4-principle review on staged changes or the last commit. Checks complexity, diff noise, hidden assumptions, and goal verification. Usage /karpathy-check [--last-commit]
cs-wiki-linter
Dispatched sub-agent that runs a periodic health check on an LLM Wiki vault. Runs mechanical checks via scripts (orphans, broken links, stale pages, missing frontmatter, duplicate titles, log gaps), does semantic checks (contradictions, stale claims, cross-reference gaps, concepts missing their own page), and produces a markdown report with suggested actions. Spawn weekly, after batch ingests, or when the user says "check the wiki" / "lint my wiki" / "audit the vault".