test-driven-fix

Test-first debugging loop that reproduces bugs with failing tests, then iterates until tests pass. Activates for "write a test first", "test-driven fix", "TDD fix", "reproduce with test", "make it pass", or when fixing bugs that have an existing test suite.

by abdullah1854

Best use case

test-driven-fix is best used when you need a repeatable agent workflow for fixing bugs against an existing test suite, rather than a one-off prompt.

Teams using test-driven-fix can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/test-driven-fix/SKILL.md --create-dirs "https://raw.githubusercontent.com/abdullah1854/MCPGateway/main/.agents/skills/test-driven-fix/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/test-driven-fix/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How test-driven-fix Compares

Feature / Agent	test-driven-fix	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

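What does this skill do?

It runs a test-first debugging loop: reproduce the bug with a failing test, diagnose the root cause, apply a minimal fix, then validate against the targeted test and the full suite.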

Where can I find the source code?

The source code is on GitHub in the abdullah1854/MCPGateway repository, at .agents/skills/test-driven-fix/SKILL.md.

SKILL.md Source

# Test-Driven Fix Protocol

## When This Skill Activates
- "Write a test first", "fix with TDD", "test-driven fix"
- "Reproduce with a test", "make the tests pass"
- Bug fixes where a test suite already exists
- Iterative fix cycles where verification is automated
- After 2+ failed fix attempts on the same bug (escalation)

## Anti-Hallucination Rules (NEVER violate)

| Rule | Description |
|------|-------------|
| **TEST MUST FAIL FIRST** | Never skip the "reproduce" step. The test MUST fail before you fix anything, proving it captures the bug |
| **MINIMAL FIX ONLY** | Fix the bug, not the neighborhood. Don't refactor, don't add features, don't "improve" surrounding code |
| **NO BLIND RETRIES** | If a fix doesn't work, DIAGNOSE why before trying again. Never retry the same approach |
| **EVIDENCE AT EVERY STEP** | Show test output at each phase. User should see: failing test → diagnosis → fix → passing test |
| **FULL SUITE AFTER FIX** | After your fix passes the targeted test, run the FULL test suite. No regressions allowed |
| **3-ATTEMPT LIMIT** | If 3 fix attempts fail, STOP and escalate to the user with findings so far |

## The Loop: REPRODUCE → DIAGNOSE → FIX → VALIDATE

### Phase 1: REPRODUCE (Write or Identify Failing Test)

**If tests already exist:**
```bash
# Run existing tests to identify failures
npm test          # or: bun test, pytest, cargo test, etc.

# Isolate the specific failing test
# (Jest and Vitest filter by name with -t; Mocha uses --grep)
npm test -- -t "test name"
```

**If no test captures the bug, write one:**
```typescript
// The test MUST:
// 1. Set up the exact conditions that trigger the bug
// 2. Assert the EXPECTED behavior (what should happen)
// 3. FAIL with the current code (proving it catches the bug)

describe('BugDescription', () => {
  it('should [expected behavior] when [condition]', () => {
    // Arrange: set up the bug conditions
    const input = /* exact input that triggers the bug */;

    // Act: run the code
    const result = functionUnderTest(input);

    // Assert: what SHOULD happen (this will fail now)
    expect(result).toBe(expectedValue);
  });
});
```

**Run the test — it MUST fail:**
```bash
npm test -- -t "BugDescription"   # Jest/Vitest flag; Mocha uses --grep
# Expected: FAIL (this proves the test captures the bug)
```

If the test passes immediately, your test doesn't capture the bug. Rewrite it.
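
As a concrete sketch of the "must fail first" rule, the snippet below borrows the `fetchUser` scenario from the hypothesis example in Phase 2. Everything in it is hypothetical (the `User` shape, the sample data, and the `reproductionTest` helper), and it avoids a test runner so it can run standalone. In a real project the same arrange/act/assert steps would sit inside an `it(...)` block.

```typescript
// Hypothetical code under test: names and data are illustrative only.
interface User {
  id: number;
  name: string;
}

const users: User[] = [
  { id: 0, name: "root" },
  { id: 1, name: "ada" },
];

// Current (buggy) implementation: the falsy check treats id 0 as missing.
function fetchUser(id: number): User | null {
  if (!id) return null;
  for (const user of users) {
    if (user.id === id) return user;
  }
  return null;
}

// Reproduction test: asserts the EXPECTED behavior and reports pass/fail.
function reproductionTest(): { passed: boolean; message: string } {
  // Arrange: the exact input that triggers the bug
  const id = 0;
  // Act: run the code
  const result = fetchUser(id);
  // Assert: what SHOULD happen
  if (result === null) {
    return { passed: false, message: "expected user with id 0, got null" };
  }
  if (result.name !== "root") {
    return { passed: false, message: "expected name root, got " + result.name };
  }
  return { passed: true, message: "ok" };
}

const beforeFix = reproductionTest();
console.log(beforeFix.passed ? "PASS" : "FAIL", "-", beforeFix.message);
// prints: FAIL - expected user with id 0, got null
```

The failure message is the evidence to show the user before any fix is attempted.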

### Phase 2: DIAGNOSE (Understand Root Cause)

Before writing any fix:
```
1. Read the failing test output carefully
2. Read the relevant source code (use Read, Grep, Glob)
3. Trace the execution path from input to failure point
4. Form a specific hypothesis:
   - NOT: "something is wrong"
   - YES: "fetchUser returns null when id=0 because the falsy check treats 0 as missing"
5. Document your hypothesis before proceeding
```
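
The example hypothesis in step 4 can be checked with a quick probe before any source is changed. The sketch below assumes the suspect code uses a plain falsy check; `isMissing` is a hypothetical stand-in for that check, not code from any real project.

```typescript
// Hypothetical stand-in for the suspect check inside fetchUser.
function isMissing(id: number | undefined): boolean {
  return !id; // suspected root cause: falsy check
}

// Probe the boundary values named in the hypothesis.
const probes: Array<number | undefined> = [undefined, 0, 1];
const evidence = probes.map(
  (id) => "id=" + String(id) + " -> missing=" + String(isMissing(id))
);
console.log(evidence.join("\n"));
// prints:
// id=undefined -> missing=true
// id=0 -> missing=true
// id=1 -> missing=false
```

The second line is the evidence: `0` is a valid id, yet it is classified the same way as `undefined`.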

**Use TodoWrite to track your diagnosis:**
```
[ ] Identified failing test and its assertion
[ ] Read source code at failure point
[ ] Formed specific hypothesis with evidence
[ ] Planned minimal fix
```

### Phase 3: FIX (Implement Minimal Change)

Apply the smallest possible change that addresses the root cause:
```
- Change ONLY the lines needed to fix the bug
- Do NOT refactor surrounding code
- Do NOT add "nice-to-have" improvements
- Do NOT change unrelated files
- If fix requires changes in multiple files, verify each file path before editing
```
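
For illustration, here is what a minimal change might look like for the `fetchUser` hypothesis used as an example in Phase 2. Both function variants, the `lookup` helper, and the sample data are hypothetical. The point is that only the faulty condition changes, while the lookup logic and everything around it stay untouched.

```typescript
// Hypothetical data and helper, shared by both variants.
interface User {
  id: number;
  name: string;
}

const users: User[] = [
  { id: 0, name: "root" },
  { id: 1, name: "ada" },
];

function lookup(id: number): User | null {
  for (const user of users) {
    if (user.id === id) return user;
  }
  return null;
}

// Before: the falsy check rejects the valid id 0.
function fetchUserBefore(id?: number): User | null {
  if (!id) return null;
  return lookup(id);
}

// After: one condition changed; nothing else is refactored.
function fetchUserAfter(id?: number): User | null {
  if (id === undefined) return null;
  return lookup(id);
}

const before = fetchUserBefore(0);
const after = fetchUserAfter(0);
console.log("before:", before === null ? "null" : before.name); // before: null
console.log("after:", after === null ? "null" : after.name); // after: root
```

Keeping the diff to a single line also makes any regression in Phase 4 easier to attribute to the fix.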

### Phase 4: VALIDATE (Run Tests)

**Step 1: Run the targeted test**
```bash
npm test -- -t "BugDescription"   # Jest/Vitest flag; Mocha uses --grep
# Expected: PASS
```

**Step 2: Run the full test suite**
```bash
npm test
# Expected: ALL PASS (no regressions)
```

**Step 3: If targeted test fails → back to Phase 2**
```
- Do NOT retry the same fix
- Re-read the test output
- What's different from your hypothesis?
- Form a NEW hypothesis based on the new evidence
- Track attempt number (max 3 before escalation)
```

**Step 4: If full suite has regressions → adjust fix**
```
- Read the newly failing tests
- Your fix broke something else
- Adjust fix to handle both cases
- Re-run full suite
```
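
One way to make "your fix broke something else" concrete is to compare the failing tests from before and after the fix. The sketch below assumes the failing test names have already been collected from the runner's output. `newlyFailing` and the sample names are hypothetical.

```typescript
// Returns tests that fail after the fix but did not fail before it.
function newlyFailing(failingBefore: string[], failingAfter: string[]): string[] {
  return failingAfter.filter((name) => failingBefore.indexOf(name) === -1);
}

// Illustrative test names, not output from a real suite.
const failingBefore = ["fetchUser returns user when id is 0"];
const failingAfter = ["fetchUser returns null when id is undefined"];

const regressions = newlyFailing(failingBefore, failingAfter);
console.log(
  regressions.length === 0
    ? "no regressions"
    : "regressions: " + regressions.join(", ")
);
// prints: regressions: fetchUser returns null when id is undefined
```

An empty result together with an empty `failingAfter` list is the condition for moving on to Step 5.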

**Step 5: Only when ALL tests pass:**
```bash
# Commit with descriptive message
git add [specific files]
git commit -m "fix: [description of what was fixed and why]"
```

## Attempt Tracking

Track each fix attempt:

```markdown
### Attempt 1
- Hypothesis: [what you thought was wrong]
- Fix applied: [what you changed]
- Result: FAIL — [why it failed]
- Learning: [what you learned]

### Attempt 2
- Hypothesis: [updated hypothesis based on attempt 1]
- Fix applied: [different approach]
- Result: PASS/FAIL
```

After 3 failed attempts:
```markdown
### Escalation
- Bug: [description]
- 3 attempts tried: [summary]
- Evidence gathered: [what we know]
- Remaining hypotheses: [what hasn't been tried]
- Recommendation: [suggested next step]
```
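
If you prefer to keep the attempt log as data rather than free text, a minimal sketch might look like the following. The type and function names are assumptions made for illustration; the fields mirror the attempt template above, and the limit comes from the 3-ATTEMPT LIMIT rule.

```typescript
// Hypothetical shapes for the attempt log.
type AttemptResult = "PASS" | "FAIL";

interface Attempt {
  hypothesis: string;
  fixApplied: string;
  result: AttemptResult;
  learning: string;
}

const MAX_FAILED_ATTEMPTS = 3; // the 3-ATTEMPT LIMIT rule

// True once the number of failed attempts reaches the limit.
function shouldEscalate(attempts: Attempt[]): boolean {
  const failed = attempts.filter((a) => a.result === "FAIL").length;
  return failed >= MAX_FAILED_ATTEMPTS;
}

// Renders the log in the same Markdown shape as the attempt template.
function renderAttempts(attempts: Attempt[]): string {
  return attempts
    .map((a, i) =>
      [
        "### Attempt " + (i + 1),
        "- Hypothesis: " + a.hypothesis,
        "- Fix applied: " + a.fixApplied,
        "- Result: " + a.result,
        "- Learning: " + a.learning,
      ].join("\n")
    )
    .join("\n\n");
}

// Illustrative log entries: placeholder text, not real findings.
const log: Attempt[] = [
  {
    hypothesis: "falsy check rejects id 0",
    fixApplied: "compare against undefined",
    result: "FAIL",
    learning: "caller also coerces id before the lookup",
  },
  {
    hypothesis: "caller coerces id before the lookup",
    fixApplied: "pass id through unchanged",
    result: "FAIL",
    learning: "a cached null is still returned",
  },
];

console.log(renderAttempts(log));
console.log(shouldEscalate(log) ? "ESCALATE" : "continue"); // continue
```

With two failed attempts the loop continues; a third failed entry makes `shouldEscalate` return `true`, at which point the escalation template applies.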

## Framework-Specific Commands

| Framework | Run All | Run Specific | Watch Mode |
|-----------|---------|-------------|------------|
| Jest | `npm test` | `npm test -- -t "name"` | `npm test -- --watch` |
| Vitest | `npx vitest run` | `npx vitest run -t "name"` | `npx vitest --watch` |
| Bun | `bun test` | `bun test -t "name"` | `bun test --watch` |
| Pytest | `pytest` | `pytest -k "name"` | `pytest-watch` |
| Cargo | `cargo test` | `cargo test test_name` | `cargo watch -x test` |
| Playwright | `npx playwright test` | `npx playwright test -g "name"` | N/A |

## Verification Checklist
- [ ] Failing test exists that reproduces the bug (test fails before fix)
- [ ] Root cause diagnosed with specific hypothesis and evidence
- [ ] Fix is minimal (only touches code needed to fix the bug)
- [ ] Targeted test now passes
- [ ] Full test suite passes (no regressions)
- [ ] Fix committed with descriptive message
- [ ] If 3 attempts failed: escalated to user with findings

## Key Principle
**The test is your contract.** Write a test that fails because of the bug, then make it pass with the smallest possible change. If you can't make it pass in 3 attempts, you don't understand the bug well enough yet — escalate with your evidence, don't keep guessing.

Related Skills

sql-analyzer

from abdullah1854/MCPGateway

Analyzes SQL queries for anti-patterns, performance issues, and suggests optimizations.

session-handoff

from abdullah1854/MCPGateway

Preserves context across sessions and IDE switches. Activates for "save session", "handoff", "continue later", "switching IDE", "save context", "what did we do", "summarize session", "end session".

seo-recovery

from abdullah1854/MCPGateway

SEO traffic recovery protocol for diagnosing and fixing indexing drops, canonical mismatches, hreflang bugs, and toxic sitemaps. Integrates with Google Search Console via MCP Gateway. Includes Hostinger deployment awareness for OPcache/CDN friction. Activates for "SEO", "traffic drop", "indexing", "canonical", "hreflang", "sitemap", "search console", "GSC", "crawl errors", "deindexed", "organic traffic".

infra-deploy

from abdullah1854/MCPGateway

Infrastructure deployment for VPS, Docker, and cloud platforms. Activates for "deploy", "setup server", "docker", "coolify", "VPS", "SSH", "nginx", "production", "hosting" requests.

hostinger-deploy

from abdullah1854/MCPGateway

Hostinger-specific deployment protocol handling OPcache, CDN caching, and hPanel Git mechanism. Activates for "deploy to hostinger", "hostinger", "redeploy", "publish website", "push to production" when target is Hostinger.

git-workflow

from abdullah1854/MCPGateway

Smart git operations including conventional commits, PR creation, branch management, and conflict resolution. Activates for "commit", "create PR", "push", "merge", "resolve conflict", "git" operations.

frontend-build

from abdullah1854/MCPGateway

Production-grade frontend development with distinctive design. Activates for "build UI", "create component", "landing page", "dashboard", "form", "responsive", "tailwind", "frontend", "design", "React", "Next.js" requests.

doc-coauthoring

from abdullah1854/MCPGateway

Collaborative document writing with structured workflow. Activates for "write document", "draft proposal", "create spec", "documentation", "co-author", "help me write", "RFC", "design doc".

deep-thinking

from abdullah1854/MCPGateway

Activates extended reasoning for complex problems. Use when asked to "think harder", "ultrathink", "think deeply", "analyze thoroughly", or when facing architecture decisions, complex debugging, system design, or trade-off analysis.

debugging

from abdullah1854/MCPGateway

Systematic debugging protocol for finding and fixing bugs. Activates for "debug", "fix bug", "not working", "error", "broken", "issue", "fails", "crash", "undefined", "null" problems. Hardened against 169 sessions of real-world friction data.

daily-standup

from abdullah1854/MCPGateway

Generates daily standup reports from accomplishments, plans, and blockers.

code-review

from abdullah1854/MCPGateway

Comprehensive code review for pull requests using parallel agents. Reviews for CLAUDE.md compliance, bugs, historical context, previous PR comments, and code comment guidance.