running-mutation-tests

Execute mutation testing to evaluate test suite effectiveness. Use when performing specialized testing. Trigger with phrases like "run mutation tests", "test the tests", or "validate test effectiveness".

1,868 stars

by jeremylongshore

View on GitHub

Best use case

running-mutation-tests is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using running-mutation-tests should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/running-mutation-tests/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/testing/mutation-test-runner/skills/running-mutation-tests/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/running-mutation-tests/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How running-mutation-tests Compares

Feature / Agent	running-mutation-tests	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

It runs mutation testing to evaluate how effective a test suite is: it introduces small code changes (mutants) and checks whether the existing tests detect them.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

ChatGPT vs Claude for Agent Skills

Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.

SKILL.md Source

# Mutation Test Runner

## Overview

Execute mutation testing to evaluate the effectiveness of a test suite by systematically introducing small code changes (mutants) and checking whether existing tests detect them. A killed mutant means the tests caught the change; a surviving mutant reveals a testing gap.
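
The killed/survived distinction can be illustrated with a minimal sketch. The function `isAdult`, its hand-written mutant, and the `classifyMutant` helper are hypothetical names invented for this illustration; real tools generate and run mutants automatically.

```javascript
// Original implementation (hypothetical example function).
const isAdult = (age) => age >= 18;

// Hand-written mutant: the boundary operator `>=` is changed to `>`.
const isAdultMutant = (age) => age > 18;

// Each test receives the implementation under test and returns true on pass.
const weakSuite = [
  (impl) => impl(30) === true,
  (impl) => impl(5) === false,
];
const strongSuite = [
  ...weakSuite,
  (impl) => impl(18) === true, // boundary check
];

// A mutant is "killed" if at least one test fails when run against it.
function classifyMutant(suite, mutant) {
  return suite.every((test) => test(mutant)) ? 'survived' : 'killed';
}

console.log(classifyMutant(weakSuite, isAdultMutant));   // survived
console.log(classifyMutant(strongSuite, isAdultMutant)); // killed
```

Both suites pass against the original `isAdult`; only the suite with the boundary check tells the mutant apart from it.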

## Prerequisites

- Mutation testing framework installed (Stryker, mutmut, PITest, or go-mutesting)
- Existing test suite with reasonable pass rate (all tests must pass before mutation testing)
- Source code with functions and logic suitable for mutation (conditionals, arithmetic, return values)
- Sufficient CI resources (mutation testing runs the test suite once per mutant -- CPU-intensive)
- Configuration file for the mutation tool specifying target files and test commands

## Instructions

1. Verify the existing test suite passes completely:
   - Run the full test suite and confirm 100% pass rate.
   - Fix any failing or skipped tests before proceeding.
   - Mutation testing is meaningless if the baseline tests are broken.
2. Configure the mutation testing tool:
   - Stryker: Create `stryker.config.mjs` with `mutate` patterns, test runner, and thresholds.
   - mutmut: Add a `[mutmut]` section to `setup.cfg` or a `[tool.mutmut]` table to `pyproject.toml`.
   - PITest: Add Maven/Gradle plugin with target classes and test configurations.
3. Select target files for mutation:
   - Focus on business logic modules (not configuration, constants, or type definitions).
   - Exclude auto-generated code, third-party wrappers, and test utilities.
   - Start with a small scope (one module) to validate setup before expanding.
4. Run the mutation testing suite:
   - Execute `npx stryker run`, `mutmut run`, or `mvn test-compile org.pitest:pitest-maven:mutationCoverage`.
   - Monitor progress -- expect long execution times (10-100x normal test runtime).
   - Use incremental mode if available to skip already-tested mutants.
5. Analyze the mutation report:
   - **Killed mutants**: Tests detected the change -- indicates strong test coverage.
   - **Survived mutants**: Tests did not catch the change -- indicates a testing gap.
   - **Timed out mutants**: The mutation likely caused an infinite loop or a severe slowdown -- typically counted as detected.
   - **No coverage mutants**: The mutated code is not exercised by any test.
6. For each surviving mutant, determine the appropriate action:
   - Write a new test that specifically catches the mutation.
   - Or determine the mutation is equivalent (functionally identical to original) and mark as ignored.
7. Set mutation score thresholds (recommended: 80% kill rate) and integrate into CI as a quality gate.
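
The quality gate in step 7 could be sketched as follows. The function names and the shape of the `counts` object are illustrative assumptions, not any tool's API. This sketch counts timed-out mutants as detected and assumes ignored (equivalent) mutants have already been left out of the counts. With Stryker, the built-in `thresholds.break` option would normally do this job without custom code.

```javascript
// Compute a mutation score in percent from hypothetical result counts.
function mutationScore({ killed, timedOut, survived, noCoverage }) {
  const detected = killed + timedOut;
  const total = detected + survived + noCoverage;
  // With no mutants at all the score is undefined; this sketch reports 0.
  return total === 0 ? 0 : (detected * 100) / total;
}

// Return true when the score meets the threshold (in percent).
function passesGate(counts, threshold = 80) {
  return mutationScore(counts) >= threshold;
}

const counts = { killed: 84, timedOut: 2, survived: 10, noCoverage: 4 };
console.log(mutationScore(counts).toFixed(1)); // 86.0
console.log(passesGate(counts));               // true
```

In CI, a script built on this idea would exit with a non-zero status when `passesGate` returns `false`.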

## Output

- Mutation testing report (HTML or JSON) with killed/survived/timed-out counts
- Mutation score percentage (killed / total non-equivalent mutants)
- Surviving mutant inventory with file, line, mutation type, and suggested test
- New test cases written to kill surviving mutants
- CI configuration with mutation score threshold enforcement
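
The surviving mutant inventory could be extracted from a JSON report roughly as shown below. The sample object assumes the shape of the mutation testing report schema used by Stryker's JSON reporter (`files`, then `mutants` with `mutatorName`, `status`, and `location`). Treat these field names as an assumption and check them against a real report. The helper `listSurvivors` is hypothetical.

```javascript
// Sample report fragment; field names are assumed, verify against real output.
const report = {
  files: {
    'src/utils/discount.ts': {
      mutants: [
        { id: '1', mutatorName: 'EqualityOperator', status: 'Survived',
          location: { start: { line: 15, column: 7 } } },
        { id: '2', mutatorName: 'ArithmeticOperator', status: 'Killed',
          location: { start: { line: 16, column: 12 } } },
      ],
    },
  },
};

// Collect file, line, and mutation type for every surviving mutant.
function listSurvivors(rep) {
  return Object.entries(rep.files).flatMap(([file, { mutants }]) =>
    mutants
      .filter((m) => m.status === 'Survived')
      .map((m) => ({ file, line: m.location.start.line, mutator: m.mutatorName }))
  );
}

console.log(listSurvivors(report));
```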

## Error Handling

| Error | Cause | Solution |
|-------|-------|---------|
| Mutation run takes hours | Too many files in scope or slow test suite | Narrow `mutate` scope to critical modules; use `--incremental` mode; parallelize with `--concurrency` |
| All mutants survive | Tests only check for truthiness, not specific values | Strengthen assertions -- use `toBe(42)` instead of `toBeTruthy()`; add boundary checks |
| Equivalent mutant false positive | Mutation produces functionally identical code (e.g., `x >= 0` vs `x > -1` for integer `x`) | Mark as equivalent in config; ignore in score calculation; document rationale |
| Out of memory during run | Too many concurrent mutation workers | Reduce `--concurrency` setting; increase Node.js `--max-old-space-size`; reduce shard size |
| Stryker "initial test run failed" | Test suite does not pass cleanly before mutations begin | Fix all failing tests first; ensure `npm test` exits 0; check test runner configuration |
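
Whether a mutant is truly equivalent depends on the input domain. The sketch below, using hypothetical function names, compares `x >= 0` with `x > -1`. The two agree for every integer but differ for fractional values such as `-0.5`, so the mutant is only equivalent when `x` is guaranteed to be an integer.

```javascript
const isNonNegative = (x) => x >= 0;
const isNonNegativeMutant = (x) => x > -1;

// Return the inputs on which two predicates disagree.
function disagreements(f, g, inputs) {
  return inputs.filter((x) => f(x) !== g(x));
}

const integers = [-3, -2, -1, 0, 1, 2, 3];
const fractions = [-1.5, -0.5, 0.5];

console.log(disagreements(isNonNegative, isNonNegativeMutant, integers));  // []
console.log(disagreements(isNonNegative, isNonNegativeMutant, fractions)); // [ -0.5 ]
```

If fractional inputs are possible, the mutant is not equivalent and a test with a value between -1 and 0 would kill it.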

## Examples

**Stryker configuration for TypeScript project:**
```javascript
// stryker.config.mjs
export default {
  mutate: ['src/**/*.ts', '!src/**/*.d.ts', '!src/**/index.ts'],
  testRunner: 'jest',
  jest: { configFile: 'jest.config.ts' },
  reporters: ['html', 'clear-text', 'progress'],
  thresholds: { high: 80, low: 60, break: 50 },
  concurrency: 4,
  timeoutMS: 10000, // 10 seconds, in milliseconds
};
```

**Example surviving mutant and fix:**
```
Mutant: src/utils/discount.ts:15 -- EqualityOperator
  Original:  if (total > 100)
  Mutant:    if (total >= 100)
  Status:    SURVIVED

Fix -- add boundary test:
it('does not apply discount at exactly 100', () => {
  expect(calculateDiscount(100)).toBe(0);
});
it('applies discount above 100', () => {
  expect(calculateDiscount(101)).toBeCloseTo(10.1); // avoids floating-point rounding failures
});
```

**mutmut for Python:**
```bash
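# Run mutation testing (these CLI flags are from mutmut 2.x; newer releases
# may expect the paths in setup.cfg or pyproject.toml instead)
mutmut run --paths-to-mutate=src/ --tests-dir=tests/

# View surviving mutants
mutmut results

# Inspect a specific mutant by the ID shown in the results
mutmut show 42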
```

## Resources

- Stryker Mutator: https://stryker-mutator.io/
- mutmut (Python): https://github.com/boxed/mutmut
- PITest (Java): https://pitest.org/
- go-mutesting: https://github.com/zimmski/go-mutesting
- Mutation testing theory: https://en.wikipedia.org/wiki/Mutation_testing

Related Skills

generating-unit-tests

1868

from jeremylongshore/claude-code-plugins-plus-skills

Automatically generate comprehensive unit tests from source code covering happy paths, edge cases, and error conditions. Use when creating test coverage for functions, classes, or modules. Trigger with phrases like "generate unit tests", "create tests for", or "add test coverage".

managing-snapshot-tests

1868

from jeremylongshore/claude-code-plugins-plus-skills

Create and validate component snapshots for UI regression testing. Use when performing specialized testing. Trigger with phrases like "update snapshots", "test UI snapshots", or "validate component snapshots".

running-smoke-tests

1868

from jeremylongshore/claude-code-plugins-plus-skills

Execute fast smoke tests validating critical functionality after deployment. Use when performing specialized testing. Trigger with phrases like "run smoke tests", "quick validation", or "test critical paths".

tracking-regression-tests

1868

from jeremylongshore/claude-code-plugins-plus-skills

Track and manage regression test suites across releases. Use when performing specialized testing. Trigger with phrases like "track regressions", "manage regression suite", or "validate against baseline".

running-performance-tests

1868

from jeremylongshore/claude-code-plugins-plus-skills

Execute load testing, stress testing, and performance benchmarking. Use when performing specialized testing. Trigger with phrases like "run load tests", "test performance", or "benchmark the system".

running-integration-tests

1868

from jeremylongshore/claude-code-plugins-plus-skills

Execute integration tests validating component interactions and system integration. Use when performing specialized testing. Trigger with phrases like "run integration tests", "test integration", or "validate component interactions".

running-e2e-tests

1868

from jeremylongshore/claude-code-plugins-plus-skills

Execute end-to-end tests covering full user workflows across frontend and backend. Use when performing specialized testing. Trigger with phrases like "run end-to-end tests", "test user flows", or "execute E2E suite".

managing-database-tests

1868

from jeremylongshore/claude-code-plugins-plus-skills

Run database tests, including fixtures, transactions, and rollback management. Use when performing specialized testing. Trigger with phrases like "test the database", "run database tests", or "validate data integrity".

running-chaos-tests

1868

from jeremylongshore/claude-code-plugins-plus-skills

Execute chaos engineering experiments to test system resilience. Use when performing specialized testing. Trigger with phrases like "run chaos tests", "test resilience", or "inject failures".

running-load-tests

1868

from jeremylongshore/claude-code-plugins-plus-skills

Create and execute load tests for performance validation using k6, JMeter, and Artillery. Use when validating application performance under load conditions or identifying bottlenecks. Trigger with phrases like "run load test", "create stress test", or "validate performance under load".

running-clustering-algorithms

1868

from jeremylongshore/claude-code-plugins-plus-skills

Analyze datasets by running clustering algorithms (K-means, DBSCAN, hierarchical) to identify data groups. Use when requesting "run clustering", "cluster analysis", or "group data points". Trigger with relevant phrases based on skill purpose.

graphql-mutation-builder

1868

from jeremylongshore/claude-code-plugins-plus-skills

GraphQL Mutation Builder: auto-activating skill for API Development. Triggers on: graphql mutation builder. Part of the API Development skill category.