Flaky Test Analyzer

Diagnoses why tests pass inconsistently and suggests fixes for timing, ordering, and state isolation issues.

Best use case

Flaky Test Analyzer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using Flaky Test Analyzer should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/flaky-test-analyzer/SKILL.md --create-dirs "https://raw.githubusercontent.com/Notysoty/openagentskills/main/skills/flaky-test-analyzer/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/flaky-test-analyzer/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How Flaky Test Analyzer Compares

Feature / Agent	Flaky Test Analyzer	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Diagnoses why tests pass inconsistently and suggests fixes for timing, ordering, and state isolation issues.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Flaky Test Analyzer

## What this skill does

This skill directs the agent to diagnose flaky tests — tests that sometimes pass and sometimes fail without any code changes. It examines the test code, the code under test, and the failure patterns to identify the root cause category (timing, shared state, ordering, network, randomness, etc.) and then suggests targeted fixes that make the test deterministic.

Use this when a test is unreliable in CI, when a test passes locally but fails on the CI server, or when a test fails intermittently with no obvious pattern.

## How to use

### Claude Code / Cline

Copy this file to `.agents/skills/flaky-test-analyzer/SKILL.md` in your project root.

Then ask:
- *"Use the Flaky Test Analyzer skill on `tests/checkout.test.ts` — it fails about 1 in 5 runs in CI."*
- *"This test passes locally but fails in CI. Use the Flaky Test Analyzer skill to diagnose it."*

Provide:
1. The test file (or the specific test that's flaky)
2. The failure message when it does fail
3. How often it fails (every time? 1 in 10? only in CI?)
4. Any observations about when it fails (after a specific other test? at a specific time of day?)

### Cursor

Add the instructions below to your `.cursorrules` or paste them into the Cursor AI pane. Provide the test code and failure output.

### Codex

Paste the test file, the failure message, and any relevant context. Ask Codex to follow the instructions below.

## The Prompt / Instructions for the Agent

When asked to diagnose a flaky test, follow this process:

### Step 1 — Gather information

Before analyzing, ensure you have:
- The full test code, including `beforeEach`, `afterEach`, `beforeAll`, and `afterAll` hooks
- The code under test (the function or module being tested)
- The error message when the test fails (not just "test failed")
- The failure frequency pattern (always, sometimes, CI only, etc.)
- The test framework and any relevant configuration (jest.config.js, vitest.config.ts, etc.)

If any of these are missing, ask for them.

### Step 2 — Identify the flakiness category

Check for each of these flakiness patterns in order:

**Timing and async issues**
- `setTimeout` or `setInterval` with hardcoded delays that may not be long enough
- Missing `await` on async operations
- Polling for a condition with a timeout that's too short
- Fake timers (`jest.useFakeTimers`) mixed with real timers
- Tests that depend on the exact time of day or system clock
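
As a hedged illustration of the clock-dependence pattern in the list above, the sketch below passes the clock in as a parameter so that a test can pin it to a known instant. `Clock`, `isBusinessHours`, and `fixedClock` are hypothetical names chosen for this example, not part of any framework.

```ts
// Hypothetical names: Clock, isBusinessHours, and fixedClock are illustrative only.
type Clock = () => Date;

// The code under test receives the clock instead of calling `new Date()` itself.
function isBusinessHours(now: Clock = () => new Date()): boolean {
  const hour = now().getUTCHours();
  return hour >= 9 && hour < 17;
}

// In a test, pin the clock so the result never depends on when the suite runs.
const fixedClock = (iso: string): Clock => () => new Date(iso);

const duringDay = isBusinessHours(fixedClock("2024-01-15T10:30:00Z"));
const atNight = isBusinessHours(fixedClock("2024-01-15T23:00:00Z"));
console.log(duringDay, atNight); // true false
```

Injecting the clock is one option among several; fake timers from the test framework can achieve the same result without changing the function signature.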

**Shared state and isolation**
- Global variables modified in one test but not reset before the next
- Database records created in one test that affect another
- File system changes not cleaned up
- In-memory caches not cleared between tests
- Static class properties mutated during tests
- Singleton services not reset
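
The shared-state patterns above can be sketched with a module-level cache that leaks a value from one test into the next. This is a minimal, framework-free illustration: `priceCache`, `getPrice`, and `resetSharedState` are hypothetical, and `resetSharedState` stands in for whatever an `afterEach` hook would do in a real suite.

```ts
// Hypothetical module-level cache standing in for any shared in-memory state.
const priceCache = new Map<string, number>();

function getPrice(sku: string, lookup: (sku: string) => number): number {
  const cached = priceCache.get(sku);
  if (cached !== undefined) return cached;
  const price = lookup(sku);
  priceCache.set(sku, price);
  return price;
}

// Equivalent of an afterEach hook: restore shared state to a known baseline.
function resetSharedState(): void {
  priceCache.clear();
}

// "Test A" caches a price of 10 for the SKU.
const first = getPrice("sku-1", () => 10);

// Without a reset, "test B" sees the stale 10 even though its lookup returns 20.
const leaked = getPrice("sku-1", () => 20);

// With a reset between tests, "test B" sees its own value.
resetSharedState();
const isolated = getPrice("sku-1", () => 20);

console.log({ first, leaked, isolated }); // { first: 10, leaked: 10, isolated: 20 }
```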

**Test ordering dependencies**
- Test A passes only when test B ran first (shared setup)
- Test A fails when run in isolation but passes in the suite
- Tests that rely on a specific execution order
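
One way to confirm an ordering dependency is to run each test alone against fresh state and see which ones stop passing. The sketch below is a toy harness under that assumption; `TestCase`, `cases`, and `findOrderDependent` are hypothetical names, and a real project would normally use its test runner's own filtering or randomization options instead.

```ts
// Hypothetical shape for a test that operates on an explicit state object.
type TestCase = { title: string; run: (state: Map<string, string>) => void };

// "reads user" only passes if "creates user" already ran on the same state.
const cases: TestCase[] = [
  { title: "creates user", run: (s) => { s.set("user", "alice"); } },
  {
    title: "reads user",
    run: (s) => {
      if (s.get("user") !== "alice") throw new Error("no user");
    },
  },
];

// Run every test in isolation; a test that fails here depends on another test's side effects.
function findOrderDependent(tests: TestCase[]): string[] {
  const dependent: string[] = [];
  for (const t of tests) {
    try {
      t.run(new Map());
    } catch {
      dependent.push(t.title);
    }
  }
  return dependent;
}

const orderDependent = findOrderDependent(cases);
console.log(orderDependent); // [ 'reads user' ]
```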

**External dependencies**
- HTTP calls to real external APIs (network flakiness, rate limits)
- File system reads of files that may not exist in CI
- Environment variables that differ between local and CI
- Random number generation with no seed
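
For the unseeded-randomness item above, a common remedy is to let the code under test accept a random source so the test can supply a seeded one. The sketch below uses mulberry32, a small public-domain PRNG that is not cryptographically secure; `rollDie` is a hypothetical function under test.

```ts
// mulberry32: a small seeded 32-bit PRNG. Suitable for tests, not for security.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical code under test: takes the random source as a parameter.
function rollDie(rng: () => number = Math.random): number {
  return Math.floor(rng() * 6) + 1;
}

// Two generators built from the same seed produce the same sequence on every run.
const rngA = mulberry32(42);
const rngB = mulberry32(42);
const rollsA = Array.from({ length: 5 }, () => rollDie(rngA));
const rollsB = Array.from({ length: 5 }, () => rollDie(rngB));

console.log(rollsA.join(",") === rollsB.join(",")); // true
```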

**Race conditions in the code under test**
- Concurrent operations with no locking
- Event listeners that fire at unpredictable times
- `Promise.all` with side effects that interfere
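
The `Promise.all` item above can be illustrated with a lost update: two concurrent read-modify-write operations both read the old value before either writes. This is a minimal sketch with hypothetical names (`balance`, `unsafeDeposit`, `safeDeposit`); serializing through a promise chain is just one possible fix.

```ts
let balance = 0;

// Reads, yields (standing in for I/O), then writes using the value it read earlier.
async function unsafeDeposit(amount: number): Promise<void> {
  const current = balance;
  await Promise.resolve();
  balance = current + amount;
}

// Serializes deposits by chaining each one onto the previous one.
let depositQueue: Promise<void> = Promise.resolve();
function safeDeposit(amount: number): Promise<void> {
  depositQueue = depositQueue.then(() => unsafeDeposit(amount));
  return depositQueue;
}

async function demo(): Promise<{ unsafe: number; safe: number }> {
  balance = 0;
  await Promise.all([unsafeDeposit(1), unsafeDeposit(1)]);
  const unsafe = balance; // both calls read 0, so one update is lost

  balance = 0;
  await Promise.all([safeDeposit(1), safeDeposit(1)]);
  const safe = balance; // deposits ran one after another

  return { unsafe, safe };
}

const raceResult = demo();
raceResult.then((r) => console.log(r)); // { unsafe: 1, safe: 2 }
```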

**Test framework issues**
- Snapshot files out of date
- Test timeouts set too low for the environment

### Step 3 — Diagnose the specific test

After identifying which category applies, pinpoint the exact line(s) causing the flakiness. Explain:
- What assumption the test is making
- Why that assumption is sometimes wrong
- What condition makes it fail vs pass

### Step 4 — Recommend a fix

Provide a concrete, specific fix. Common fixes include:
- Replace `setTimeout(() => ..., 100)` with `waitFor(() => ...)`, or use fake timers and flush them with `jest.runAllTimers()` (`vi.runAllTimers()` in Vitest)
- Add `afterEach(() => { jest.clearAllMocks(); db.cleanup(); })`
- Mock the external API instead of calling it
- Use `jest.useFakeTimers()` consistently and advance time explicitly
- Use a test database that's wiped between runs
- Set a random seed for deterministic random values
- Add proper `await` to async operations
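
As a sketch of the "mock the external API" fix from the list above, the code under test can depend on a small interface that a test replaces with a stub. Everything here (`RateClient`, `convert`, `makeStubClient`) is hypothetical and framework-free; in a Jest or Vitest project the framework's own mocking utilities would usually play the role of the stub.

```ts
// Hypothetical interface; a real project would define its own client shape.
interface RateClient {
  getRate(currency: string): Promise<number>;
}

// Code under test: depends on the interface, not on a concrete HTTP client.
async function convert(amountUsd: number, currency: string, client: RateClient): Promise<number> {
  const rate = await client.getRate(currency);
  return Math.round(amountUsd * rate * 100) / 100;
}

// Stub for tests: no network, fixed answers, and a record of the calls made.
function makeStubClient(rates: Record<string, number>): RateClient & { calls: string[] } {
  const calls: string[] = [];
  return {
    calls,
    async getRate(currency: string): Promise<number> {
      calls.push(currency);
      const rate = rates[currency];
      if (rate === undefined) throw new Error(`no stubbed rate for ${currency}`);
      return rate;
    },
  };
}

const stub = makeStubClient({ EUR: 0.5 });
const converted = convert(10, "EUR", stub);
converted.then((value) => console.log(value, stub.calls)); // 5 [ 'EUR' ]
```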

### Step 5 — Format the output

```markdown
## Flaky Test Diagnosis

### Test
`[test name]` in `[file path]`

### Failure Pattern
[How often it fails, under what conditions]

### Root Cause Category
[Timing / Shared state / Test ordering / External dependency / Race condition / Other]

### Root Cause
[2–3 sentences explaining exactly why the test is flaky — what assumption it makes and why that assumption sometimes fails]

### The Problematic Code
[Quote the specific lines that cause the flakiness]

### Fix
[The specific change(s) to make the test deterministic, with code]

### Why This Fix Works
[1–2 sentences explaining why the fix eliminates the non-determinism]

### Prevention
[1 sentence on how to avoid this class of flakiness in future tests]
```
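
If the diagnosis needs to be produced or checked programmatically, the template above could be modeled as a typed object plus a renderer. This is only a sketch: `FlakyDiagnosis`, `RootCauseCategory`, and `renderDiagnosis` are hypothetical names, and the sample values are taken from the example later in this file.

```ts
type RootCauseCategory =
  | "Timing"
  | "Shared state"
  | "Test ordering"
  | "External dependency"
  | "Race condition"
  | "Other";

// One field per section of the output template.
interface FlakyDiagnosis {
  testName: string;
  filePath: string;
  failurePattern: string;
  category: RootCauseCategory;
  rootCause: string;
  problematicCode: string;
  fix: string;
  whyItWorks: string;
  prevention: string;
}

// Renders the sections in the same order as the template.
function renderDiagnosis(d: FlakyDiagnosis): string {
  const section = (heading: string, body: string): string => `### ${heading}\n${body}\n`;
  return [
    "## Flaky Test Diagnosis\n",
    section("Test", `\`${d.testName}\` in \`${d.filePath}\``),
    section("Failure Pattern", d.failurePattern),
    section("Root Cause Category", d.category),
    section("Root Cause", d.rootCause),
    section("The Problematic Code", d.problematicCode),
    section("Fix", d.fix),
    section("Why This Fix Works", d.whyItWorks),
    section("Prevention", d.prevention),
  ].join("\n");
}

const rendered = renderDiagnosis({
  testName: "sends a welcome email after registration",
  filePath: "user.test.ts",
  failurePattern: "Fails ~25% of runs, more often in CI than locally.",
  category: "Timing",
  rootCause: "The assertion runs inside an un-awaited setTimeout.",
  problematicCode: "setTimeout(() => { expect(...) }, 200);",
  fix: "Await a retrying assertion instead of a fixed delay.",
  whyItWorks: "The assertion is retried until it passes or times out.",
  prevention: "Never put expect() calls inside a raw setTimeout.",
});
console.log(rendered);
```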

## Example

**Input to Agent:**
> "Use the Flaky Test Analyzer skill. This test fails about 1 in 4 runs in CI:
>
> ```ts
> it('sends a welcome email after registration', async () => {
>   await registerUser({ email: 'test@example.com', password: 'abc123' });
>   setTimeout(() => {
>     expect(emailService.send).toHaveBeenCalledWith({
>       to: 'test@example.com',
>       subject: 'Welcome!'
>     });
>   }, 200);
> });
> ```"

**Output from Agent:**

> ## Flaky Test Diagnosis
>
> ### Test
> `sends a welcome email after registration` in `user.test.ts`
>
> ### Failure Pattern
> Fails ~25% of runs, more often in CI than locally.
>
> ### Root Cause Category
> Timing / async issue
>
> ### Root Cause
> The `setTimeout` callback is not awaited, so Jest considers the test done as soon as `registerUser` resolves; the assertion runs 200 ms later, after Jest has already moved on. When the event loop is busy (common in CI), 200 ms may not be enough for the email to be sent, and even when the callback fires, Jest may no longer attribute its result to this test.
>
> ### The Problematic Code
> ```ts
> setTimeout(() => {
>   expect(emailService.send).toHaveBeenCalledWith(...);
> }, 200);
> ```
>
> ### Fix
> ```ts
> // Assumes React Testing Library; `waitFor` is also exported by `@testing-library/dom`.
> import { waitFor } from '@testing-library/react';
>
> it('sends a welcome email after registration', async () => {
>   await registerUser({ email: 'test@example.com', password: 'abc123' });
>   await waitFor(() => {
>     expect(emailService.send).toHaveBeenCalledWith({
>       to: 'test@example.com',
>       subject: 'Welcome!'
>     });
>   });
> });
> ```
> If `waitFor` (from Testing Library) isn't available in the project, use `jest.useFakeTimers()` and flush pending timers with `jest.runAllTimers()`, or poll manually with a small helper.
>
> ### Why This Fix Works
> `waitFor` repeatedly retries the assertion until it passes or a timeout is reached, making the test resilient to variable execution timing.
>
> ### Prevention
> Never put `expect()` calls inside a raw `setTimeout` — always await the operation that triggers the side effect, or use `waitFor`.
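
The "poll manually with a small helper" option mentioned in the example fix might look like the sketch below. `pollUntil` is a hypothetical helper written for this illustration, with arbitrary default timeout and interval values; it is not part of Jest or Testing Library.

```ts
// Hypothetical helper: retries an assertion until it stops throwing or the deadline passes.
async function pollUntil(
  assertion: () => void,
  { timeoutMs = 1000, intervalMs = 20 }: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      assertion();
      return; // assertion passed
    } catch (err) {
      // Out of time: surface the last assertion error to the caller.
      if (Date.now() >= deadline) throw err;
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
}

// Usage: a side effect that lands after a short, unpredictable delay.
const sent: string[] = [];
setTimeout(() => {
  sent.push("test@example.com");
}, 50);

const polled = pollUntil(() => {
  if (sent.indexOf("test@example.com") === -1) throw new Error("email not sent yet");
});
polled.then(() => console.log("assertion passed after polling"));
```

Because the returned promise is awaited by the test, a failure is reported against the test that made the assertion rather than surfacing after the test has finished.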

## Notes

- Always provide the failure message, not just "it fails." The exact error often reveals the category immediately.
- "Passes locally, fails in CI" is usually a timing issue (slower machines) or a missing environment variable.
- "Fails only when run with other tests" is almost always a shared state issue — try running the failing test in isolation first to confirm.

Related Skills

Unit Test Writer

from Notysoty/openagentskills

Generates comprehensive unit tests for any function or module with edge cases.

Unit Test Improver

from Notysoty/openagentskills

Reviews existing unit tests for gaps, weak assertions, and missing edge cases, then rewrites them to be more robust.

Stack Trace Analyzer

from Notysoty/openagentskills

Interprets error stack traces to pinpoint root cause, explain what went wrong, and suggest fixes.

Playwright Test Builder

from Notysoty/openagentskills

Generates robust Playwright end-to-end tests for web pages and user flows.

Bug Root Cause Analyzer

from Notysoty/openagentskills

Systematically diagnoses bugs by tracing execution flow and identifying root causes vs symptoms.

Troubleshooting Guide Builder

from Notysoty/openagentskills

Builds a structured troubleshooting guide with symptom → cause → fix format for any tool or system.

Tech Debt Auditor

from Notysoty/openagentskills

Identifies and prioritizes technical debt in a codebase with an effort/impact matrix.

Technical Blog Post Writer

from Notysoty/openagentskills

Writes engaging, accurate technical blog posts targeted at developer audiences.

SQL Query Optimizer

from Notysoty/openagentskills

Reviews SQL queries for performance issues and rewrites them with optimized execution plans.

Sprint Summary Generator

from Notysoty/openagentskills

Converts a list of completed tickets or commits into a clear sprint summary for stakeholders.

Social Post Thread Writer

from Notysoty/openagentskills

Converts a blog post, idea, or document into an engaging Twitter/X or LinkedIn thread with hooks and CTAs.

SEO Metadata Generator

from Notysoty/openagentskills

Generates optimized title tags, meta descriptions, Open Graph tags, and structured data for any web page.