testing-anti-patterns

Identify and fix testing mistakes: flaky, brittle, over-mocked tests.

290 stars

Best use case

testing-anti-patterns is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Identify and fix testing mistakes: flaky, brittle, over-mocked tests.

Teams using testing-anti-patterns should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/testing-anti-patterns/SKILL.md --create-dirs "https://raw.githubusercontent.com/notque/claude-code-toolkit/main/skills/testing-anti-patterns/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/testing-anti-patterns/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How testing-anti-patterns Compares

Feature / Agenttesting-anti-patternsStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Identify and fix testing mistakes: flaky, brittle, over-mocked tests.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Testing Pattern Quality Skill

## Overview

This skill identifies and fixes common testing mistakes across unit, integration, and E2E test suites. Tests should verify behavior, be reliable, run fast, and fail for the right reasons.

**Scope:** This skill focuses on improving test quality and reliability. It complements `test-driven-development` by addressing what goes wrong with tests, complementing how to write them correctly from scratch.

**Out of scope:** Writing new tests from scratch (use `test-driven-development`), fixing fundamental architectural issues (use `systematic-refactoring`), or profiling test performance with external tools.

---

## Instructions

### Phase 1: SCAN

**Goal**: Identify quality issues present in the target test code.

**Step 1: Locate test files**

Use Grep/Glob to find test files in the relevant area. If user pointed to specific files, start there. Common patterns:
- Go: `*_test.go`
- Python: `test_*.py` or `*_test.py`
- JavaScript/TypeScript: `*.test.ts`, `*.spec.ts`, `*.test.js`, `*.spec.js`

**Step 2: Read CLAUDE.md**

Check for project-specific testing conventions before flagging quality issues. Some projects intentionally deviate from general best practices. This prevents false positives based on organizational standards.

**Step 3: Classify quality issues**

For each test file, scan for these 10 categories (detailed examples in `references/anti-pattern-catalog.md`):

| # | Pattern to Fix | Detection Signal |
|---|-------------|-----------------|
| 1 | Testing implementation details | Asserts on private fields, internal regex, spy on private methods |
| 2 | Over-mocking / brittle selectors | Mock setup > 50% of test code, CSS nth-child selectors |
| 3 | Order-dependent tests | Shared mutable state, class-level variables, numbered test names |
| 4 | Incomplete assertions | `!= nil`, `> 0`, `toBeTruthy()`, no value checks |
| 5 | Over-specification | Exact timestamps, hardcoded IDs, asserting every default field |
| 6 | Ignored failures | `@skip`, `.skip`, `xit`, empty catch blocks, `_ = err` |
| 7 | Poor naming | `testFunc2`, `test_new`, `it('works')`, `it('handles case')` |
| 8 | Missing edge cases | Only happy path, no empty/null/boundary/error tests |
| 9 | Slow test suites | Full DB reset per test, no parallelization, no fixture sharing |
| 10 | Flaky tests | `sleep()`, `time.Sleep()`, `setTimeout()`, unsynchronized goroutines |

**Step 4: Document findings**

```markdown
## Pattern Quality Report

### [File:Line] - [Pattern Name]
- **Severity**: HIGH / MEDIUM / LOW
- **Issue**: [What is wrong]
- **Impact**: [Flaky / slow / false-confidence / maintenance burden]
```

**Gate**: At least one quality issue identified with file:line reference. Proceed only when gate passes.

### Phase 2: PRIORITIZE

**Goal**: Rank findings by impact to fix the most damaging patterns first.

**Priority order:**
1. **HIGH** - Flaky tests, order-dependent tests, ignored failures (erode trust in suite)
2. **MEDIUM** - Over-mocking, incomplete assertions, missing edge cases (false confidence)
3. **LOW** - Poor naming, over-specification, slow suites (maintenance burden)

**Constraint: Fix one pattern at a time.** Mechanical bulk fixes (applying the same pattern to 50 tests without running them) miss context-specific nuances and cause regressions. Fix one, verify it works, then move to the next.

**Constraint: Preserve test intent.** When fixing quality issues, maintain what the test was originally trying to verify. Preserve the original test coverage scope.

**Constraint: Prevent over-engineering.** Fix the specific quality issue identified; make targeted fixes to the specific anti-pattern or delete tests and write new ones from scratch. Institutional knowledge lives in the existing tests.

**Gate**: Findings ranked. User agrees on scope of fixes. Proceed only when gate passes.

### Phase 3: FIX

**Goal**: Apply targeted fixes to identified quality issues.

**Step 1: For each quality issue (highest priority first):**

```markdown
ISSUE: [Name]
Location: [file:line]
Issue: [What is wrong]
Impact: [Flaky/slow/false-confidence/maintenance burden]

Current:
[problematic code snippet]

Fixed:
[improved code snippet]

Priority: [HIGH/MEDIUM/LOW]
```

**Step 2: Apply fix**

**Constraint: Show real examples.** Point to actual code when identifying quality issues, not abstract descriptions. Check for rationalization — if a test breaks during refactoring, that test was relying on buggy behavior. Investigate and fix the root cause, investigate and fix the root cause.

**Constraint: Guide toward behavior testing.** Always recommend testing observable behavior, not implementation internals. For example:
- ISSUE: Test asserts on private fields → FIX: Test the public behavior that those fields enable
- ISSUE: Test spies on `_getUser()` → FIX: Test what happens when a user exists or doesn't exist
- ISSUE: Test checks exact regex → FIX: Test that validation succeeds/fails for representative inputs

Change only what is needed to fix the anti-pattern. Consult `references/fix-strategies.md` for language-specific patterns.

**Step 3: Run tests after each fix**

- Run the specific fixed test first to confirm it passes
- Run the full file or package to check for interactions
- If a fix makes a previously-passing test fail, the test was likely depending on buggy behavior — investigate before proceeding

**Gate**: Each fix verified individually. Tests pass after each change.

### Phase 4: VERIFY

**Goal**: Confirm all fixes work together and suite is healthier.

**Step 1**: Run full test suite — all pass

**Step 2**: Verify previously-flaky tests are now deterministic (run 3x if applicable)
- Go: `go test -count=3 -run TestFixed ./...`
- Python: `pytest --count=3 tests/test_fixed.py`
- JS: Run test file 3 times sequentially

**Step 3**: Confirm no test was accidentally deleted or skipped
- Compare test count before and after fixes
- Search for any new `@skip` or `.skip` annotations introduced

**Step 4**: Summary report

```markdown
## Fix Summary
Anti-patterns fixed: [count]
Files modified: [list]
Tests affected: [count]
Suite status: all passing / [details]
Remaining issues: [any deferred items]
```

**Gate**: Full suite passes. All fixes verified. Summary delivered.

---

## Pattern Quality Catalog

This section documents the domain-specific anti-patterns this skill detects and fixes.

### Pattern 1: Test Observable Behavior

**What it looks like:** Tests assert on private fields, internal regex patterns, or spy on private methods.

**Why it's problematic:** Tests coupled to implementation details break whenever the implementation changes, even if public behavior is identical. This creates brittle tests that fail to reflect real-world usage.

**Example signals:**
- Test accesses `obj._privateField`
- Test mocks or spies on `_internalMethod()`
- Test asserts the exact regex used internally

**Fix:** Test the public behavior that those implementation details enable. If private fields matter, they matter because they affect what users see or experience.

### Pattern 2: Mock Only at Boundaries

**What it looks like:** Mock setup spans more than 50% of the test code. CSS selectors use nth-child or rely on brittle DOM structure.

**Why it's problematic:** Over-mocked tests verify mock wiring, not actual behavior. They miss real integration issues and break whenever the mocking structure changes.

**Example signals:**
- Test has 15 lines of setup and 5 lines of assertion
- Test uses `.querySelector('div:nth-child(3) > span')`
- Test mocks every dependency instead of using real implementations at I/O boundaries

**Fix:** Mock only at architectural boundaries (HTTP, DB, external services). Use real implementations for internal logic. For UI tests, select by semantic attributes (data-testid, role) instead of DOM structure.

### Pattern 3: Isolate Test State

**What it looks like:** Tests share mutable state, use class-level variables, or have numbered test names (test1, test2) suggesting sequence dependency.

**Why it's problematic:** Tests that pass in sequence but fail in parallel or random order hide bugs. The suite becomes unreliable — developers can't trust "all tests pass" locally if they fail in CI.

**Example signals:**
- Multiple tests modify a shared class-level variable
- Database is populated by test1, test2 depends on that state
- Test names: `test1_setup`, `test2_verify`, `test3_cleanup`

**Fix:** Each test owns its data. Use setup/teardown or test fixtures to isolate state. Run suite with `--shuffle` or `-random-order` to catch dependencies.

### Pattern 4: Assert Specific Values

**What it looks like:** Tests use assertions like `!= nil`, `> 0`, `toBeTruthy()` without checking specific values.

**Why it's problematic:** Incomplete assertions pass for many wrong reasons. A function that returns 999 (wrong) passes an `> 0` assertion. This gives false confidence — tests pass but miss bugs.

**Example signals:**
- `assert result != nil` (passes for any non-nil value)
- `assert response.status > 0` (passes for 404, 500, etc.)
- `expect(user).toBeTruthy()` (passes for any truthy user, even with wrong name)

**Fix:** Assert specific expected values:
- `assert.equal(result.name, "Alice")`
- `assert.equal(response.status, 200)`
- `expect(user.name).toBe("Alice")`

### Pattern 5: Assert Only What Matters

**What it looks like:** Tests assert on default values, exact timestamps, hardcoded IDs, or every field in a response.

**Why it's problematic:** Over-specified tests are fragile. When a default changes (legitimately), dozens of tests break even though behavior didn't change. Tests should specify only what matters for this test case.

**Example signals:**
- `assert.equal(user.createdAt, "2024-01-15T10:30:00Z")` (timestamp brittle to test time)
- `assert.equal(post.id, "uuid-1234-5678")` (hardcoded ID specific to this test)
- Test asserts `status`, `message`, `timestamp`, `userId`, `metadata` when only `status` matters

**Fix:** Assert only what matters. Use flexible matchers for timestamps and IDs:
- `expect(user.createdAt).toBeDefined()` or `toBeWithin(now, 1000ms)`
- `assert.truthy(post.id)` (just verify it exists)

### Pattern 6: Address or Remove Skipped Tests

**What it looks like:** Tests use `@skip`, `.skip`, `xit`, empty catch blocks, or `_ = err` (ignore error).

**Why it's problematic:** Skipped tests become permanent blind spots. Nobody remembers why they were skipped. Empty catch blocks hide real errors.

**Example signals:**
- `@skip` or `.skip()` with no expiration date
- `try { ...test code... } catch (e) {}` (silently ignore errors)
- `err := doSomething(); _ = err` (acknowledge but ignore)

**Fix:** Delete the test if no longer relevant, or unskip and fix it. Add a reason annotation with a date if skipping is truly necessary:
```go
t.Skip("TODO: fix timing issue (2024-01-15)")
```

### Pattern 7: Use Descriptive Test Names

**What it looks like:** Test names use sequential numbers (`test1`, `test2`), vague names (`testFunc`, `test_new`), or generic descriptions (`it('works')`, `it('handles case')`).

**Why it's problematic:** Poor names hide intent. Developers reading test output see `test1 failed` but have no idea what behavior broke. Good test names document expected behavior.

**Example signals:**
- `TestCreateUser1`, `TestCreateUser2`
- `test_new`, `testFunc`, `test_handle`
- `it('works')`, `it('handles case')`, `it('does something')`

**Fix:** Use descriptive names that describe the scenario and expected outcome:
- Go: `Test_CreateUser_WithValidEmail_ReturnsNewUser`
- Python: `test_create_user_with_valid_email_returns_new_user`
- JS: `it('creates a user when given a valid email')`

### Pattern 8: Cover Boundaries and Errors

**What it looks like:** Test suite covers only the happy path. No tests for empty inputs, null values, boundary conditions, errors, or large datasets.

**Why it's problematic:** Missing edge cases cause production bugs. The happy path works, but the code crashes on empty input, null reference, or boundary values.

**Example signals:**
- Only tests with valid input; no tests with empty/null
- No tests for negative numbers, zero, or max values
- No tests for error conditions (timeout, connection failure)

**Fix:** Add tests for:
- **Empty**: empty string, empty array, empty object
- **Null**: null input, missing required field
- **Boundary**: zero, max value, min value, off-by-one
- **Error**: timeout, network failure, permission denied
- **Large**: very large arrays, deep nesting

### Pattern 9: Optimize Test Speed

**What it looks like:** Full database reset between every test. No parallelization. Fixture data shared instead of created per-test. Tests wait on actual time.

**Why it's problematic:** Slow tests discourage running locally. Developers skip tests before committing, bugs slip through. CI builds take hours, slowing iteration.

**Example signals:**
- Each test: `DROP TABLE users; INSERT INTO users ...` (30s per test)
- Sequential execution with no parallelization
- Tests use `time.Sleep(1000)` to wait for something

**Fix:**
- Use transactions that rollback instead of dropping tables
- Run tests in parallel: `go test -parallel 8`, `pytest -n auto`
- Create fixtures once, reference per-test: fixture factories, test-specific data builders
- Replace waits with condition checks: `waitFor(() => element.textContent)` instead of `sleep(1000)`

### Pattern 10: Ensure Deterministic Tests

**What it looks like:** Tests use `sleep()`, `time.Sleep()`, `setTimeout()` or unsynchronized goroutines. Tests pass locally but fail randomly in CI.

**Why it's problematic:** Flaky tests erode trust in the test suite. Developers cannot tell if a failure is real or just timing. Teams start ignoring test failures — the worst outcome.

**Example signals:**
- `time.Sleep(100 * time.Millisecond)` to wait for goroutine
- `setTimeout(() => { ...assert... }, 500)` hoping it's ready
- Tests pass locally but fail in CI (slower machines, resource contention)

**Fix:**
- Replace `sleep()` with explicit waits: `waitFor()`, `sync.WaitGroup`, channels
- Inject fake clocks or time control: `time.Now()` should be mockable
- Synchronize goroutines with channels or `sync.WaitGroup`, not timing
- Tests must be deterministic: same input → same output, regardless of machine speed

---

## Error Handling

### Error: "Cannot Determine if Pattern is a Quality Issue"

Cause: Context-dependent — pattern may be valid in specific situations

Solution:
1. Check if the test has a comment explaining the unusual approach
2. Consider the testing layer (unit vs integration vs E2E)
3. If mock-heavy test is for a unit with many dependencies, suggest integration test instead
4. When in doubt, flag as MEDIUM and explain trade-offs

### Error: "Fix Changes Test Behavior"

Cause: Anti-pattern was masking an actual test gap or testing wrong thing

Solution:
1. Identify what the test was originally trying to verify
2. Write the correct assertion for that behavior
3. If original behavior was wrong, note it as a separate finding
4. Preserve what each test covers

### Error: "Suite Has Hundreds of Quality Issues"

Cause: Systemic test quality issues, not individual mistakes

Solution:
1. Fix issues incrementally, focusing on highest severity first
2. Focus on HIGH severity items only (flaky, order-dependent)
3. Recommend adopting TDD going forward to prevent new quality issues
4. Suggest incremental cleanup strategy (fix on touch, not bulk rewrite)

---

## References

### Quick Reference Table

| Pattern to Fix | Symptom | Fix |
|-------------|---------|-----|
| Testing implementation | Test breaks on refactor | Test behavior, not internals |
| Over-mocking | Mock setup > test logic | Integration test or mock only I/O |
| Order dependence | Tests fail in isolation | Each test owns its data |
| Incomplete assertions | `assert result != nil` | Assert specific expected values |
| Over-specification | Asserts on defaults/timestamps | Assert only what matters for this test |
| Ignored failures | `@skip`, empty catch | Delete or fix immediately |
| Poor naming | `testFunc2` | `Test{What}_{When}_{Expected}` |
| Missing edge cases | Only happy path | empty, null, boundary, error, large |
| Slow suite | 30s+ for simple tests | Parallelize, share fixtures, rollback |
| Flaky tests | Random failures | Control time, synchronize, no sleep |

### Red Flags During Review

- `@skip`, `@ignore`, `xit`, `.skip` without expiration date
- `time.sleep()`, `setTimeout()` in test code
- Test names with sequential numbers (`test1`, `test2`)
- Global mutable state accessed by multiple tests
- Mock setup spanning 20+ lines
- Empty catch blocks in tests
- Assertions like `!= nil`, `> 0`, `toBeTruthy()` without value checks

### TDD Relationship

Strict TDD prevents most quality issues:
1. **RED phase** catches incomplete assertions (test must fail first)
2. **GREEN phase minimum** prevents over-specification
3. **Watch failure** confirms you test behavior, not mocks
4. **Incremental cycles** prevent test interdependence
5. **Refactor phase** reveals tests coupled to implementation

If you find quality issues in a codebase, check if TDD discipline slipped.

### Reference Files

- `${CLAUDE_SKILL_DIR}/references/pattern-catalog.md`: Detailed code examples for all 10 anti-patterns (Go, Python, JavaScript)
- `${CLAUDE_SKILL_DIR}/references/fix-strategies.md`: Language-specific fix patterns and tooling
- `${CLAUDE_SKILL_DIR}/references/blind-spot-taxonomy.md`: 6-category taxonomy of what high-coverage test suites commonly miss (concurrency, state, boundaries, security, integration, resilience)
- `${CLAUDE_SKILL_DIR}/references/load-test-scenarios.md`: 6 load test scenario types (smoke, load, stress, spike, soak, breakpoint) with configurations and critical endpoint priorities