analyze-test-failures

This skill should be used when the user asks to "analyze failing tests", "debug test failures", "investigate test errors", or provides specific failing test cases to examine. Analyzes failing test cases with a balanced, investigative approach to determine whether failures indicate test issues or genuine bugs.


by diegosouzapw


Best use case

analyze-test-failures is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using analyze-test-failures can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/analyze-test-failures/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/testing-security/analyze-test-failures/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/analyze-test-failures/SKILL.md inside your project.
3. Restart your AI agent so that it auto-discovers the skill.

How analyze-test-failures Compares

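| Feature / Agent         | analyze-test-failures | Standard Approach |
| ----------------------- | --------------------- | ----------------- |
| Platform Support        | Not specified         | Limited / Varies  |
| Context Awareness       | High                  | Baseline          |
| Installation Complexity | Unknown               | N/A               |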

Frequently Asked Questions

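What does this skill do?

It analyzes failing test cases with a balanced, investigative approach to determine whether each failure indicates a problem in the test or a genuine bug in the code.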

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Analyze Test Failures

Analyze failing test cases with a balanced, investigative approach.

## Context

When tests fail, there are two primary possibilities:

1. **False positive**: The test itself is incorrect
2. **True positive**: The test discovered a genuine bug

Assuming tests are wrong by default is a dangerous anti-pattern that defeats the purpose of testing.

## Analysis Process

### 1. Initial Analysis

- Read the failing test carefully, understanding its intent
- Examine the test's assertions and expected behavior
- Review the error message and stack trace

### 2. Investigate the Implementation

- Check the actual implementation being tested
- Trace through the code path that leads to the failure
- Verify that the implementation matches the documented behavior

### 3. Apply Critical Thinking

For each failing test, ask:

- What behavior is the test trying to verify?
- Is this behavior clearly documented or implied by the API design?
- Does the current implementation actually provide this behavior?
- Could this be an edge case the implementation missed?

### 4. Make a Determination

Classify the failure as one of:

| Classification         | Meaning                           |
| ---------------------- | --------------------------------- |
| **Test Bug**           | Test's expectations are incorrect |
| **Implementation Bug** | Code doesn't behave as it should  |
| **Ambiguous**          | Intended behavior is unclear      |
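
The determination step can be sketched as a small decision function. This is only an illustration, not part of the skill itself: the `Determination` enum, the `classify` function, and the `test_matches_spec` parameter are hypothetical names, and the sketch assumes the investigation has already reduced to one question, namely whether the documented or clearly implied behavior sides with the test.

```python
from enum import Enum
from typing import Optional


class Determination(Enum):
    """The three classifications from the table above."""
    TEST_BUG = "Test Bug"
    IMPLEMENTATION_BUG = "Implementation Bug"
    AMBIGUOUS = "Ambiguous"


def classify(test_matches_spec: Optional[bool]) -> Determination:
    """Classify a failing test from one (hypothetical) investigation result.

    test_matches_spec is True when the documented or clearly implied behavior
    agrees with the test, False when it agrees with the implementation, and
    None when the intended behavior cannot be established.
    """
    if test_matches_spec is None:
        return Determination.AMBIGUOUS
    if test_matches_spec:
        # The test encodes the intended behavior, so the code is at fault.
        return Determination.IMPLEMENTATION_BUG
    # The intended behavior contradicts the test's expectation.
    return Determination.TEST_BUG
```

Keeping `None` as an explicit third input makes "Ambiguous" a deliberate outcome rather than a fallback, which matches the principle of not assuming the test is wrong by default.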

### 5. Document Reasoning

Provide a clear explanation, including:

- Evidence supporting the conclusion
- Specific mismatch between expectation and reality
- Recommended fix (to test or implementation)

## Example Analyses

### Example 1: Ambiguous Behavior

**Scenario**: Test expects `calculateDiscount(100, 0.2)` to return 20, but it returns 80

**Analysis**:

- Test assumes function returns discount amount
- Implementation returns price after discount
- Function name is ambiguous

**Determination**: Ambiguous
**Recommendation**: Check documentation or clarify intended behavior
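
The ambiguity in this scenario can be made concrete by writing both readings as separate functions. The sketch below is a Python rendering with hypothetical names (`discount_amount`, `discounted_price`); the original `calculateDiscount` is not shown in this document, so neither function is claimed to be its actual implementation.

```python
def discount_amount(price: float, rate: float) -> float:
    """Reading 1 (the test's): return the amount taken off the price."""
    return price * rate


def discounted_price(price: float, rate: float) -> float:
    """Reading 2 (the implementation's): return the price after the discount."""
    return price - price * rate
```

With a price of 100 and a rate of 0.2, the first reading yields 20 and the second yields 80, which is exactly the mismatch in the scenario. Once the intended behavior is clarified, renaming the function along these lines is one way to keep the ambiguity from returning.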

### Example 2: Implementation Bug

**Scenario**: Test expects `validateEmail("user@example.com")` to return true, but it returns false

**Analysis**:

- Test provides a valid email format
- Implementation regex is missing support for dots in domain
- Other valid emails also fail

**Determination**: Implementation Bug
**Recommendation**: Fix the regex so that it accepts dots in the domain part, and check the result against the relevant RFC address syntax (for example, the addr-spec grammar in RFC 5322)
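
A minimal sketch of this kind of bug, assuming a regex-based validator. Both patterns are hypothetical: `BUGGY` is one way a pattern could reject dots in the domain, and `FIXED` is a deliberately simple correction, not a full RFC-compliant validator. `validate_email` is a Python stand-in for the `validateEmail` function in the scenario.

```python
import re

# Hypothetical buggy pattern: the domain part allows only word characters,
# so any domain that contains a dot is rejected.
BUGGY = re.compile(r"^[\w.+-]+@\w+$")

# Simplified corrected pattern: one or more dot-terminated labels followed
# by a final label. Real-world address syntax is broader than this.
FIXED = re.compile(r"^[\w.+-]+@(?:[\w-]+\.)+[\w-]+$")


def validate_email(address: str, pattern: re.Pattern = FIXED) -> bool:
    """Return True when the whole address matches the given pattern."""
    return pattern.fullmatch(address) is not None
```

Passing `BUGGY` as the pattern reproduces the failure for `"user@example.com"` and for other dotted domains, which is the evidence that points to the implementation rather than the test.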

### Example 3: Test Bug

**Scenario**: Test expects `divide(10, 0)` to return 0, but it throws an error

**Analysis**:

- Test assumes division by zero returns 0
- Implementation throws DivisionByZeroError
- Division by zero is mathematically undefined, so raising an error is the conventional behavior

**Determination**: Test Bug
**Recommendation**: Update test to expect an error, not 0
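
The corrected test might look like the sketch below, written with Python's standard `unittest` module. It assumes a Python `divide` function, so the error is the built-in `ZeroDivisionError` rather than the `DivisionByZeroError` named in the scenario. The `DivideTests` class and the `run_tests` helper are hypothetical names used only for illustration.

```python
import unittest


def divide(a: float, b: float) -> float:
    """Stand-in implementation; Python raises ZeroDivisionError when b is 0."""
    return a / b


class DivideTests(unittest.TestCase):
    def test_divide_by_zero_raises(self) -> None:
        # Corrected expectation: an error, not a return value of 0.
        with self.assertRaises(ZeroDivisionError):
            divide(10, 0)

    def test_divide_regular_case(self) -> None:
        self.assertEqual(divide(10, 2), 5)


def run_tests() -> unittest.TestResult:
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DivideTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests().wasSuccessful()` reports whether both tests passed. The fix is confined to the test: the implementation is left unchanged because its behavior was already correct.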

## Output Format

For each failing test, provide:

```text
Test: [test name/description]
Failure: [what failed and how]

Investigation:
- Test expects: [expected behavior]
- Implementation does: [actual behavior]
- Root cause: [why they differ]

Determination: [Test Bug | Implementation Bug | Ambiguous]

Recommendation:
[Specific fix to either test or implementation]
```
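
If reports are produced programmatically, the template above can be filled from a small record type. This is a sketch under stated assumptions: the `FailureReport` class and its field names are hypothetical and are not defined by the skill; only the rendered layout follows the template.

```python
from dataclasses import dataclass


@dataclass
class FailureReport:
    test: str
    failure: str
    expects: str
    actual: str
    root_cause: str
    determination: str  # "Test Bug", "Implementation Bug", or "Ambiguous"
    recommendation: str

    def render(self) -> str:
        """Return the report as text in the template's layout."""
        return "\n".join([
            f"Test: {self.test}",
            f"Failure: {self.failure}",
            "",
            "Investigation:",
            f"- Test expects: {self.expects}",
            f"- Implementation does: {self.actual}",
            f"- Root cause: {self.root_cause}",
            "",
            f"Determination: {self.determination}",
            "",
            "Recommendation:",
            self.recommendation,
        ])


# Usage: the division-by-zero scenario from Example 3.
report = FailureReport(
    test="divide(10, 0)",
    failure="Expected 0, but an error was raised",
    expects="division by zero returns 0",
    actual="raises an error",
    root_cause="the test assumes an undefined operation has a value",
    determination="Test Bug",
    recommendation="Update the test to expect an error, not 0",
)
print(report.render())
```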

## Key Principles

- NEVER automatically assume the test is wrong
- ALWAYS consider that the test might have found a real bug
- When uncertain, lean toward investigating the implementation
- Tests are often your specification - they define expected behavior
- A failing test is a gift - it's either catching a bug or clarifying requirements

## Related Skills

- **test-failure-mindset**: Set investigative approach for session
- **comprehensive-test-review**: Full test suite review
