mutation-testing

Test quality validation through mutation testing, assessing test suite effectiveness by introducing code mutations and measuring kill rate. Use when evaluating test quality, identifying weak tests, or proving tests actually catch bugs.

298 stars

byproffesor-for-testing

View on GitHub Installation ↓

Best use case

mutation-testing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using mutation-testing should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/mutation-testing/SKILL.md --create-dirs "https://raw.githubusercontent.com/proffesor-for-testing/agentic-qe/main/.claude/skills/mutation-testing/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/mutation-testing/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How mutation-testing Compares

Feature / Agent	mutation-testing	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

SKILL.md Source

# Mutation Testing

<default_to_action>
When validating test quality or improving test effectiveness:
1. MUTATE code (change + to -, >= to >, remove statements)
2. RUN tests against each mutant
3. VERIFY tests catch mutations (kill mutants)
4. IDENTIFY surviving mutants (tests need improvement)
5. STRENGTHEN tests to kill surviving mutants

**Quick Mutation Metrics:**
- Mutation Score = Killed / (Killed + Survived)
- Target: > 80% mutation score
- Surviving mutants = weak tests

**Critical Success Factors:**
- High coverage ≠ good tests (100% coverage, 0% assertions)
- Mutation testing proves tests actually catch bugs
- Focus on critical code paths first
</default_to_action>

## Quick Reference Card

### When to Use
- Evaluating test suite quality
- Finding gaps in test assertions
- Proving tests catch bugs
- Before critical releases

### Mutation Score Interpretation
| Score | Interpretation |
|-------|----------------|
| **90%+** | Excellent test quality |
| **80-90%** | Good, minor improvements |
| **60-80%** | Needs attention |
| **< 60%** | Significant gaps |

### Common Mutation Operators
| Category | Original | Mutant |
|----------|----------|--------|
| **Arithmetic** | `a + b` | `a - b` |
| **Relational** | `x >= 18` | `x > 18` |
| **Logical** | `a && b` | `a \|\| b` |
| **Conditional** | `if (x)` | `if (true)` |
| **Statement** | `return x` | *(removed)* |

---

## How Mutation Testing Works

```javascript
// Original code
function isAdult(age) {
  return age >= 18; // ← Mutant: change >= to >
}

// Strong test (catches mutation)
test('18 is adult', () => {
  expect(isAdult(18)).toBe(true); // Kills mutant!
});

// Weak test (mutation survives)
test('19 is adult', () => {
  expect(isAdult(19)).toBe(true); // Doesn't catch >= vs >
});
// Surviving mutant → Test needs boundary value
```

---

## Using Stryker

```bash
# Install
npm install --save-dev @stryker-mutator/core @stryker-mutator/jest-runner

# Initialize
npx stryker init
```

**Configuration:**
```json
{
  "packageManager": "npm",
  "reporters": ["html", "clear-text", "progress"],
  "testRunner": "jest",
  "coverageAnalysis": "perTest",
  "mutate": [
    "src/**/*.ts",
    "!src/**/*.spec.ts"
  ],
  "thresholds": {
    "high": 90,
    "low": 70,
    "break": 60
  }
}
```

**Run:**
```bash
npx stryker run
```

**Output:**
```
Mutation Score: 87.3%
Killed: 124
Survived: 18
No Coverage: 3
Timeout: 1
```

---

## Fixing Surviving Mutants

```javascript
// Surviving mutant: >= changed to >
function calculateDiscount(quantity) {
  if (quantity >= 10) { // Mutant survives!
    return 0.1;
  }
  return 0;
}

// Original weak test
test('large order gets discount', () => {
  expect(calculateDiscount(15)).toBe(0.1); // Doesn't test boundary
});

// Fixed: Add boundary test
test('exactly 10 gets discount', () => {
  expect(calculateDiscount(10)).toBe(0.1); // Kills mutant!
});

test('9 does not get discount', () => {
  expect(calculateDiscount(9)).toBe(0); // Tests below boundary
});
```

---

## Agent-Driven Mutation Testing

```typescript
// Analyze mutation score and generate fixes
await Task("Mutation Analysis", {
  targetFile: 'src/payment.ts',
  generateMissingTests: true,
  minScore: 80
}, "qe-test-generator");

// Returns:
// {
//   mutationScore: 0.65,
//   survivedMutations: [
//     { line: 45, operator: '>=', mutant: '>', killedBy: null }
//   ],
//   generatedTests: [
//     'test for boundary at line 45'
//   ]
// }

// Coverage + mutation correlation
await Task("Coverage Quality Analysis", {
  coverageData: coverageReport,
  mutationData: mutationReport,
  identifyWeakCoverage: true
}, "qe-coverage-analyzer");
```

---

## Agent Coordination Hints

### Memory Namespace
```
aqe/mutation-testing/
├── mutation-results/*   - Stryker reports
├── surviving/*          - Surviving mutants
├── generated-tests/*    - Tests to kill mutants
└── trends/*             - Mutation score over time
```

### Fleet Coordination
```typescript
const mutationFleet = await FleetManager.coordinate({
  strategy: 'mutation-testing',
  agents: [
    'qe-test-generator',     // Generate tests for survivors
    'qe-coverage-analyzer',  // Coverage correlation
    'qe-quality-analyzer'    // Quality assessment
  ],
  topology: 'sequential'
});
```

---

## Related Skills
- [tdd-london-chicago](../tdd-london-chicago/) - Write effective tests first
- [test-design-techniques](../test-design-techniques/) - Boundary value analysis
- [quality-metrics](../quality-metrics/) - Measure test effectiveness

---

## Remember

**High code coverage ≠ good tests.** 100% coverage but weak assertions = useless. Mutation testing proves tests actually catch bugs.

**Focus on critical paths first.** Don't mutation test everything - prioritize payment, authentication, data integrity code.

**With Agents:** Agents run mutation analysis, identify surviving mutants, and generate missing test cases to kill them. Automated improvement of test quality.

## Run History

After each mutation test run, append results to `run-history.json` in this skill directory:
```bash
node -e "
const fs = require('fs');
const h = JSON.parse(fs.readFileSync('.claude/skills/mutation-testing/run-history.json'));
h.runs.push({date: new Date().toISOString().split('T')[0], mutation_score_pct: SCORE, killed: KILLED, survived: SURVIVED});
fs.writeFileSync('.claude/skills/mutation-testing/run-history.json', JSON.stringify(h, null, 2));
"
```
Read `run-history.json` before each run to track score improvements over time.

## Skill Composition

- **Before mutation testing** → Run `/qe-test-generation` to ensure tests exist
- **After mutation results** → Use `/qe-coverage-analysis` to prioritize improvement areas
- **Quality gate** → Feed results into `/qe-quality-assessment` for ship/no-ship decision

## Gotchas

- Stryker requires `--testRunner jest` explicitly if both jest and vitest are installed
- Mutating `>=` to `>` in date comparisons rarely gets killed — add boundary tests
- Running on files >500 LOC will timeout; use `--mutate` to target specific functions
- `--concurrency` defaults to CPU count which OOMs in containers — set to 2

Related Skills

qe-visual-testing-advanced

298

from proffesor-for-testing/agentic-qe

Advanced visual regression testing with pixel-perfect comparison, AI-powered diff analysis, responsive design validation, and cross-browser visual consistency. Use when detecting UI regressions, validating designs, or ensuring visual consistency.

qe-shift-right-testing

298

from proffesor-for-testing/agentic-qe

Testing in production with feature flags, canary deployments, synthetic monitoring, and chaos engineering. Use when implementing production observability or progressive delivery.

qe-shift-left-testing

298

from proffesor-for-testing/agentic-qe

Move testing activities earlier in the development lifecycle to catch defects when they're cheapest to fix. Use when implementing TDD, CI/CD, or early quality practices.

qe-security-visual-testing

298

from proffesor-for-testing/agentic-qe

Security-first visual testing combining URL validation, PII detection, and visual regression with parallel viewport support. Use when testing web applications that handle sensitive data, need visual regression coverage, or require WCAG accessibility compliance.

qe-security-testing

298

from proffesor-for-testing/agentic-qe

Test for security vulnerabilities using OWASP principles. Use when conducting security audits, testing auth, or implementing security practices.

qe-risk-based-testing

298

from proffesor-for-testing/agentic-qe

Focus testing effort on highest-risk areas using risk assessment and prioritization. Use when planning test strategy, allocating testing resources, or making coverage decisions.

qe-regression-testing

298

from proffesor-for-testing/agentic-qe

Strategic regression testing with test selection, impact analysis, and continuous regression management. Use when verifying fixes don't break existing functionality, planning regression suites, or optimizing test execution for faster feedback.

qe-performance-testing

298

from proffesor-for-testing/agentic-qe

Test application performance, scalability, and resilience. Use when planning load testing, stress testing, or optimizing system performance.

qe-observability-testing-patterns

298

from proffesor-for-testing/agentic-qe

Observability and monitoring validation patterns for dashboards, alerting, log aggregation, APM traces, and SLA/SLO verification. Use when testing monitoring infrastructure, dashboard accuracy, alert rules, or metric pipelines.

qe-n8n-workflow-testing-fundamentals

298

from proffesor-for-testing/agentic-qe

Comprehensive n8n workflow testing including execution lifecycle, node connection patterns, data flow validation, and error handling strategies. Use when testing n8n workflow automation applications.

qe-n8n-trigger-testing-strategies

298

from proffesor-for-testing/agentic-qe

Webhook testing, schedule validation, event-driven triggers, and polling mechanism testing for n8n workflows. Use when testing how workflows are triggered.

qe-n8n-security-testing

298

from proffesor-for-testing/agentic-qe

Credential exposure detection, OAuth flow validation, API key management testing, and data sanitization verification for n8n workflows. Use when validating n8n workflow security.