test-driven-development
Red-green-refactor development methodology requiring verified test coverage. Use for feature implementation, bugfixes, refactoring, or any behavior changes where tests must prove correctness.
Best use case
test-driven-development is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi.
Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.
Practical example
Example input
Use the "test-driven-development" skill to help with this workflow task. Context: Red-green-refactor development methodology requiring verified test coverage. Use for feature implementation, bugfixes, refactoring, or any behavior changes where tests must prove correctness.
Example output
A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.
When to use this skill
- Use this skill when you want a reusable workflow rather than writing the same prompt again and again.
When not to use this skill
- Do not use this when you only need a one-off answer and do not need a reusable workflow.
- Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/test-driven-development/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How test-driven-development Compares
| Feature / Agent | test-driven-development | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Red-green-refactor development methodology requiring verified test coverage. Use for feature implementation, bugfixes, refactoring, or any behavior changes where tests must prove correctness.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
Cursor vs Codex for AI Workflows
Compare Cursor and Codex for AI coding workflows, repository assistance, debugging, refactoring, and reusable developer skills.
SKILL.md Source
# Test-Driven Development
Write test first. Watch it fail. Write minimal code to pass. Refactor.
**Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing.
## The Iron Law
```
NO BEHAVIOR-CHANGING PRODUCTION CODE WITHOUT A FAILING TEST FIRST
```
Wrote code before test? Delete it completely. Implement fresh from tests.
**Refactoring is exempt:** The refactor step changes structure, not behavior. Tests stay green throughout. No new failing test required.
## Red-Green-Refactor Cycle
```
RED ──► Verify Fail ──► GREEN ──► Verify Pass ──► REFACTOR ──► Verify Pass ──► Next RED
             │                         │                            │
             ▼                         ▼                            ▼
      Wrong failure?            Still failing?                Broke tests?
      Fix test, retry           Fix code, retry                Fix, retry
```
### RED - Write Failing Test
Write one minimal test for one behavior.
**Good example:**
```typescript
test('retries failed operations 3 times', async () => {
  let attempts = 0;
  const operation = async () => {
    attempts++;
    if (attempts < 3) throw new Error('fail');
    return 'success';
  };
  const result = await retryOperation(operation);
  expect(result).toBe('success');
  expect(attempts).toBe(3);
});
```
*Clear name, tests real behavior, asserts observable outcome*
**Bad example:**
```typescript
test('retry works', async () => {
  const mock = jest.fn()
    .mockRejectedValueOnce(new Error())
    .mockRejectedValueOnce(new Error())
    .mockResolvedValueOnce('success');
  await retryOperation(mock);
  expect(mock).toHaveBeenCalledTimes(3);
});
```
*Vague name, asserts only call count without verifying outcome, tests mock mechanics not behavior*
**Requirements:** One behavior. Clear name. Real code (mocks only if unavoidable).
### Verify RED - Watch It Fail
**MANDATORY. Never skip.**
```bash
npm test path/to/test.test.ts
```
Test must go red for the right reason. Acceptable RED states:
- Assertion failure (expected behavior missing)
- Compile/type error (function doesn't exist yet)
Not acceptable: Runtime setup errors, import failures, environment issues.
Test passes immediately? You're testing existing behavior—fix test.
Test errors for wrong reason? Fix error, re-run until it fails correctly.
### GREEN - Minimal Code
Write simplest code to pass the test.
**Good example:**
```typescript
async function retryOperation<T>(fn: () => Promise<T>): Promise<T> {
  for (let i = 0; i < 3; i++) {
    try {
      return await fn();
    } catch (e) {
      if (i === 2) throw e;
    }
  }
  throw new Error('unreachable');
}
```
*Just enough to pass*
**Bad example:**
```typescript
async function retryOperation<T>(
  fn: () => Promise<T>,
  options?: { maxRetries?: number; backoff?: 'linear' | 'exponential'; }
): Promise<T> { /* YAGNI */ }
```
*Over-engineered beyond test requirements*
Write only what the test demands. No extra features, no "improvements."
### Verify GREEN - Watch It Pass
**MANDATORY.**
```bash
npm test path/to/test.test.ts
```
Confirm: Test passes. All other tests still pass. Output pristine (no errors, warnings).
Test fails? Fix code, not test.
Other tests fail? Fix now before continuing.
### REFACTOR - Clean Up
After green only: Remove duplication. Improve names. Extract helpers.
Keep tests green throughout. Add no new behavior.
### Repeat
Next failing test for next behavior.
## Good Tests
**Minimal:** One thing per test. "and" in name? Split it. ❌ `test('validates email and domain and whitespace')`
**Clear:** Name describes behavior. ❌ `test('test1')`
**Shows intent:** Demonstrates desired API usage, not implementation details.
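As a sketch of the **Minimal** rule, the over-broad `validates email and domain and whitespace` test could be split into one test per behavior. `validateEmail`, `runTest`, and `expectEqual` are hypothetical names: the first stands in for the code under test, the other two stand in for a test runner's `test` and `expect` so the snippet runs on its own.

```typescript
// Hypothetical function under test; not part of this skill.
function validateEmail(input: string): { ok: boolean; error?: string } {
  const email = input.trim();
  if (email === '') return { ok: false, error: 'Email required' };
  const parts = email.split('@');
  const [local = '', domain = ''] = parts;
  if (parts.length !== 2 || local === '') return { ok: false, error: 'Invalid email' };
  if (!domain.includes('.')) return { ok: false, error: 'Invalid domain' };
  return { ok: true };
}

// Minimal stand-in for a test runner's `test`, recording pass/fail per name.
const results: { name: string; passed: boolean }[] = [];
function runTest(name: string, fn: () => void): void {
  try {
    fn();
    results.push({ name, passed: true });
  } catch {
    results.push({ name, passed: false });
  }
}

// Minimal stand-in for `expect(actual).toBe(expected)`.
function expectEqual<T>(actual: T, expected: T): void {
  if (actual !== expected) {
    throw new Error(`expected ${String(expected)}, got ${String(actual)}`);
  }
}

// One behavior per test; no "and" in any name.
runTest('rejects email without @', () => {
  expectEqual(validateEmail('userexample.com').error, 'Invalid email');
});
runTest('rejects domain without a dot', () => {
  expectEqual(validateEmail('user@localhost').error, 'Invalid domain');
});
runTest('ignores surrounding whitespace', () => {
  expectEqual(validateEmail('  user@example.com  ').ok, true);
});
```

When one of these fails, its name alone says which behavior broke, which the combined test could not do.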
## Example: Bug Fix
**Bug:** Empty email accepted
**RED:**
```typescript
test('rejects empty email', async () => {
  const result = await submitForm({ email: '' });
  expect(result.error).toBe('Email required');
});
```
**Verify RED:**
```bash
$ npm test
FAIL: expected 'Email required', got undefined
```
**GREEN:**
```typescript
function submitForm(data: FormData) {
  if (!data.email?.trim()) {
    return { error: 'Email required' };
  }
  // ...
}
```
**Verify GREEN:**
```bash
$ npm test
PASS
```
**REFACTOR:** Extract validation helper if pattern repeats.
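If the same check is later needed for other fields, the refactor step might look like this sketch. `requireNonEmpty` and the `SubmitResult` type are hypothetical, and the form data is assumed to be a plain object with an optional `email` string. The observable result for an empty email is unchanged, so the existing test stays green.

```typescript
type SubmitResult = { error?: string };

// Hypothetical helper extracted during REFACTOR: returns an error message
// when the value is missing or blank, otherwise undefined.
function requireNonEmpty(value: string | undefined, label: string): string | undefined {
  return value?.trim() ? undefined : `${label} required`;
}

// Same observable behavior as the GREEN version:
// an empty or blank email still yields 'Email required'.
function submitForm(data: { email?: string }): SubmitResult {
  const error = requireNonEmpty(data.email, 'Email');
  if (error) return { error };
  // ... rest of the submission logic is unchanged and omitted here
  return {};
}
```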
## Red Flags - STOP and Start Over
Any of these means delete code and restart with TDD:
- Code written before test
- Test passes immediately (testing existing behavior)
- Can't explain why test failed
- Rationalizing "just this once" or "this is different"
- Keeping code "as reference" while writing tests
- Claiming "tests after achieve the same purpose"
## When Stuck
| Problem | Solution |
|---------|----------|
| Don't know how to test | Write the API you wish existed. Write assertion first. |
| Test too complicated | Design too complicated. Simplify the interface. |
| Must mock everything | Code too coupled. Introduce dependency injection. |
| Test setup huge | Extract helpers. Still complex? Simplify design. |
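For the "Must mock everything" row, a minimal sketch of dependency injection: the collaborator is passed in as a parameter, so a test can hand over a small in-memory fake instead of patching modules. `Mailer`, `registerUser`, `FakeMailer`, and the `'Welcome'` subject are hypothetical names chosen for illustration.

```typescript
// Hypothetical boundary the production code depends on.
interface Mailer {
  send(to: string, subject: string): void;
}

// The dependency is a parameter, not a hidden import.
function registerUser(email: string, mailer: Mailer): { registered: boolean } {
  if (!email.includes('@')) return { registered: false };
  mailer.send(email, 'Welcome');
  return { registered: true };
}

// In a test: a hand-written fake that records calls; no mocking library needed.
class FakeMailer implements Mailer {
  sent: { to: string; subject: string }[] = [];
  send(to: string, subject: string): void {
    this.sent.push({ to, subject });
  }
}

const mailer = new FakeMailer();
const outcome = registerUser('user@example.com', mailer);
// A test would now assert on `outcome` and on `mailer.sent`.
```

The test asserts on the outcome and on what reached the boundary, rather than on how a mocking framework was configured.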
## Legacy Code (No Existing Tests)
The Iron Law ("delete and restart") applies to **new code you wrote without tests**. For inherited code with no tests, use characterization tests:
1. Write tests that capture current behavior (even if "wrong")
2. Run tests, observe actual outputs
3. Update assertions to match reality (these are "golden masters")
4. Now you have a safety net for refactoring
5. Apply TDD for new behavior changes
Characterization tests lock down existing behavior so you can refactor safely. They're the on-ramp, not a permanent state.
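A characterization test might look like the sketch below. `legacyFormatPrice` is a hypothetical inherited function, and the expected strings in `goldenMaster` are assumed to have been copied from its actual output, quirks included.

```typescript
// Hypothetical inherited function with no tests. It truncates fractional
// cents instead of rounding, which may be a bug, but callers may rely on it.
function legacyFormatPrice(cents: number): string {
  const dollars = Math.floor(cents / 100);
  const rest = Math.floor(cents % 100);
  return '$' + dollars + '.' + (rest < 10 ? '0' + rest : String(rest));
}

// Golden master: inputs paired with outputs observed from the current code.
const goldenMaster: Array<[number, string]> = [
  [0, '$0.00'],
  [5, '$0.05'],
  [1999, '$19.99'],
  [1999.9, '$19.99'], // observed: fractional cents are truncated, not rounded
];

// Returns the inputs whose current output no longer matches the recorded one.
function findBehaviorChanges(): number[] {
  return goldenMaster
    .filter(([input, expected]) => legacyFormatPrice(input) !== expected)
    .map(([input]) => input);
}
```

An empty result from `findBehaviorChanges()` means a refactor preserved behavior. Changing the truncation itself would be a behavior change and would start with a new failing test.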
## Flakiness Rules
Tests must be deterministic. Ban these in unit tests:
- **Real sleeps / delays** → Use fake timers (`vi.useFakeTimers()`, `jest.useFakeTimers()`)
- **Wall clock time** → Inject clock, assert against injected time
- **Math.random()** → Seed or inject RNG
- **Network calls** → Mock at boundary or use MSW
- **Filesystem race conditions** → Use temp dirs with unique names
Flaky test? Fix or delete. Flaky tests erode trust in the entire suite.
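A sketch of the "inject clock" and "inject RNG" rules. `Clock`, `Rng`, `isExpired`, and `pickWinner` are hypothetical names. Production callers would rely on the `Date.now` and `Math.random` defaults, while tests pass fixed values.

```typescript
type Clock = () => number; // milliseconds since epoch
type Rng = () => number;   // value in [0, 1)

// Hypothetical production functions that take their sources of
// nondeterminism as parameters, defaulting to the real ones.
function isExpired(expiresAtMs: number, now: Clock = Date.now): boolean {
  return now() >= expiresAtMs;
}

function pickWinner(entrants: string[], rng: Rng = Math.random): string | undefined {
  if (entrants.length === 0) return undefined;
  return entrants[Math.floor(rng() * entrants.length)];
}

// In tests, fixed inputs make every run produce the same result.
const fixedClock: Clock = () => 1_000;
const fixedRng: Rng = () => 0.5;

const expiredAtBoundary = isExpired(1_000, fixedClock); // true: now equals expiry
const notYetExpired = isExpired(1_001, fixedClock);     // false: expiry is 1 ms away
const winner = pickWinner(['ann', 'bob', 'cy'], fixedRng); // 'bob': floor(0.5 * 3) = 1
```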
## Debugging Integration
Bug found? Write failing test reproducing it first. Then follow TDD cycle. Test proves fix and prevents regression.
## Planning: Test List
Before diving into the cycle, spend 2 minutes listing the next 3-10 tests you expect to write. This prevents local-optimum design where early tests paint you into a corner.
Example test list for a retry function:
- retries N times on failure
- returns result on success
- throws after max retries exhausted
- calls onRetry callback between attempts
- respects backoff delay
Work through the list in order. Add/remove tests as you learn.
## Testing Anti-Patterns
When writing tests involving mocks, dependencies, or test utilities: See [references/testing-anti-patterns.md](references/testing-anti-patterns.md) for common pitfalls including testing mock behavior and adding test-only methods to production classes.
## Philosophy and Rationalizations
For detailed rebuttals to common objections ("I'll test after", "deleting work is wasteful", "TDD is dogmatic"): See [references/tdd-philosophy.md](references/tdd-philosophy.md)
## Final Rule
```
Production code exists → test existed first and failed first
Otherwise → not TDD
```
Related Skills
vue-development-guides
A collection of best practices and tips for developing applications using Vue.js. This skill MUST be applied when developing, refactoring, or reviewing Vue.js or Nuxt projects.
testing-strategies
Design comprehensive testing strategies for software quality assurance. Use when planning test coverage, implementing test pyramids, or setting up testing infrastructure. Handles unit testing, integration testing, E2E testing, TDD, and testing best practices.
backend-testing
Write comprehensive backend tests including unit tests, integration tests, and API tests. Use when testing REST APIs, database operations, authentication flows, or business logic. Handles Jest, Pytest, Mocha, testing strategies, mocking, and test coverage.
qa-test-planner
Generate comprehensive test plans, manual test cases, regression test suites, and bug reports for QA engineers. Includes Figma MCP integration for design validation.
game-test-case-generator
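Generates professional game test cases from requirement documents (xls/csv), supporting two modes: full test cases and quick test points. Use this skill when the user mentions "game testing", "test case generation", or "requirements to test cases", or uploads a requirements document or prototype.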
wordpress-woocommerce-development
WooCommerce store development workflow covering store setup, payment integration, shipping configuration, and customization.
wordpress-theme-development
WordPress theme development workflow covering theme architecture, template hierarchy, custom post types, block editor support, and responsive design.
wordpress-plugin-development
WordPress plugin development workflow covering plugin architecture, hooks, admin interfaces, REST API, and security best practices.
wordpress-penetration-testing
This skill should be used when the user asks to "pentest WordPress sites", "scan WordPress for vulnerabilities", "enumerate WordPress users, themes, or plugins", "exploit WordPress vulnerabilities", or "use WPScan". It provides comprehensive WordPress security assessment methodologies.
web3-testing
Test smart contracts comprehensively using Hardhat and Foundry with unit tests, integration tests, and mainnet forking. Use when testing Solidity contracts, setting up blockchain test suites, or validating DeFi protocols.
web-security-testing
Web application security testing workflow for OWASP Top 10 vulnerabilities including injection, XSS, authentication flaws, and access control issues.
voice-ai-engine-development
Build real-time conversational AI voice engines using async worker pipelines, streaming transcription, LLM agents, and TTS synthesis with interrupt handling and multi-provider support