root-cause-analysis
Performs systematic root cause analysis to identify the true source of bugs, errors, and unexpected behavior through structured investigation phases — not just treating symptoms. Use when a user reports a bug, crash, error, or broken behavior and needs to debug, troubleshoot, or investigate why something is not working; especially for complex or intermittent issues across multiple components. Applies the Five Whys method, hypothesis-driven testing, stack trace analysis, git blame/log evidence gathering, and causal chain documentation to isolate and confirm root causes before applying any fix.
Best use case
root-cause-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using root-cause-analysis should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/root-cause-analysis/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How root-cause-analysis Compares
| Feature / Agent | root-cause-analysis | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It guides an AI agent through a structured root cause analysis: reproduce the bug, gather evidence, form and test ranked hypotheses, and verify the root cause before any fix is applied. It draws on the Five Whys method, stack trace analysis, git blame/log evidence gathering, and causal chain documentation.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Root Cause Analysis

You are performing systematic root cause analysis to find the true source of a bug. Do not apply fixes until you understand WHY the bug exists.

## Core Principle

**Never fix a symptom. Always find and fix the root cause.**

## The Five Whys Method

Ask "Why?" repeatedly to drill down to the root cause:

1. **Why** did the API return an error? → The database query failed
2. **Why** did the database query fail? → The connection pool was exhausted
3. **Why** was the pool exhausted? → **ROOT CAUSE:** Missing `finally` block to close connections

## Investigation Phases

### Phase 1: Reproduce the Bug

Before investigating:

1. **Reproduce consistently** - If you can't reproduce it, you can't verify a fix
2. **Document reproduction steps** - Exact sequence of actions
3. **Note environment details** - OS, versions, configuration
4. **Identify minimal reproduction** - Smallest case that shows the bug

Questions to answer:

- Does it happen every time or intermittently?
- Does it happen in all environments?
- When did it start happening? (recent changes)

### Phase 2: Gather Evidence

Collect information before forming theories:

- Error messages and stack traces
- Log files (application, system, database)
- Recent code changes (git log, blame)
- User reports and reproduction steps
- Monitoring data (metrics, APM)
- Related issues (search issue tracker)

Do NOT:

- Make changes while gathering evidence
- Assume you know the cause without evidence
- Ignore related symptoms

### Phase 3: Form Hypotheses

Based on evidence, create ranked hypotheses:

| Priority | Hypothesis | Evidence | Test Plan |
|----------|------------|----------|-----------|
| 1 | Connection leak in UserService | Stack trace shows connection pool | Add logging, check usage |
| 2 | Query timeout too short | Occurs under load | Test with longer timeout |
| 3 | Database server overload | Correlates with peak hours | Check DB metrics |

For each hypothesis:

- What evidence supports it?
- What evidence contradicts it?
- How can we test it?

### Phase 4: Test Hypotheses

Test each hypothesis systematically:

1. **Start with highest probability**
2. **Design a definitive test** - Should clearly confirm or reject
3. **Make ONE change at a time**
4. **Document results**

If hypothesis is rejected:

- Cross it off the list
- Re-evaluate remaining hypotheses
- Consider if new evidence suggests new hypotheses

### Phase 5: Verify Root Cause

Before declaring root cause found:

- [ ] Can you explain the full causal chain?
- [ ] Does fixing it consistently prevent the bug?
- [ ] Does it explain ALL observed symptoms?
- [ ] Is there nothing earlier in the chain that could be fixed?

## Common Root Cause Categories

- **Code Defects:** logic errors, boundary conditions, race conditions, resource leaks, null/undefined handling
- **Design Issues:** missing error handling, inadequate validation, poor state management, coupling
- **Environment:** configuration errors, resource constraints, version mismatches, network issues
- **Data Issues:** invalid input, data corruption, schema mismatches, encoding problems

## Evidence Collection Commands

```bash
# Recent changes to relevant files
git log --oneline -20 -- path/to/file

# Who changed this line
git blame path/to/file

# Changes since last working version
git diff v1.2.3..HEAD -- src/

# Search for related error handling
grep -r "catch\|error\|throw" --include="*.ts" src/
```

## Red Flags - You Haven't Found Root Cause

- "I'm not sure why, but this fix works"
- "The bug went away after I restarted"
- "I added a check to prevent this case"
- "It's probably a race condition somewhere"

These suggest symptom treatment, not root cause resolution.

## Documentation Template

When root cause is found, document:

```markdown
## Bug: [Description]

### Root Cause
[Clear explanation of why the bug occurred]

### Evidence
- [Evidence 1]
- [Evidence 2]

### Causal Chain
1. [Initial trigger]
2. [Intermediate cause]
3. [Root cause]
4. [Observed symptom]

### Fix
[Description of the fix and why it addresses root cause]

### Prevention
[How to prevent similar issues in the future]
```

## Integration with Other Skills

After finding root cause:

- Use **testing/red-green-refactor** to write a test that exposes the bug
- Use **planning/verification-gates** to validate the fix
- Consider **collaboration/structured-review** for complex fixes
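
The Five Whys example in the skill source ends at a missing `finally` block that leaves connections checked out. A minimal Python sketch of that causal chain may make it easier to see why a pool-exhausted error is the symptom and the skipped release is the root cause. Everything here (`ConnectionPool`, `run_query`, `leaky_lookup`, `fixed_lookup`, `connections_leaked`) is a hypothetical stand-in invented for illustration; none of it comes from the skill or from a real database library.

```python
class PoolExhausted(Exception):
    """Raised when no connections are left to hand out."""


class ConnectionPool:
    """Hypothetical fixed-size pool that only counts checked-out connections."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.in_use = 0

    def acquire(self) -> int:
        if self.in_use >= self.size:
            raise PoolExhausted("connection pool exhausted")
        self.in_use += 1
        return self.in_use  # stand-in for a real connection handle

    def release(self) -> None:
        self.in_use -= 1


def run_query(fail: bool) -> str:
    """Stand-in for a database call that sometimes raises."""
    if fail:
        raise ValueError("query failed")
    return "ok"


def leaky_lookup(pool: ConnectionPool, fail: bool) -> str:
    """Buggy version: release() is skipped whenever the query raises."""
    pool.acquire()
    result = run_query(fail)
    pool.release()
    return result


def fixed_lookup(pool: ConnectionPool, fail: bool) -> str:
    """Root-cause fix: the finally block releases on every exit path."""
    pool.acquire()
    try:
        return run_query(fail)
    finally:
        pool.release()


def connections_leaked(lookup, failures: int, size: int = 3) -> int:
    """Run `failures` failing lookups; return connections still checked out."""
    pool = ConnectionPool(size)
    for _ in range(failures):
        try:
            lookup(pool, fail=True)
        except (ValueError, PoolExhausted):
            pass  # the caller only sees errors, not the leak behind them
    return pool.in_use


if __name__ == "__main__":
    print(connections_leaked(leaky_lookup, failures=5))  # 3: pool is full
    print(connections_leaked(fixed_lookup, failures=5))  # 0: nothing leaked
```

In this sketch, raising the pool size or restarting the process would only delay the `PoolExhausted` error, which matches the red flags listed in the skill. Only the `try`/`finally` change removes the leak itself.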
Related Skills
find-skills
Discovers, searches, and installs skills from multiple AI agent skill marketplaces (400K+ skills) using the SkillKit CLI. Supports browsing official partner collections (Anthropic, Vercel, Supabase, Stripe, and more) and community repositories, searching by domain or technology, and installing specific skills from GitHub. Use when the user wants to find, browse, or install new agent skills, plugins, extensions, or add-ons; asks 'is there a skill for X' or 'find a skill for X'; wants to explore a skill store or marketplace; needs to extend agent capabilities in areas like React, testing, DevOps, security, or APIs; or says 'browse skills', 'search skill marketplace', 'install a skill', or 'what skills are available'.
test-patterns
Applies proven testing patterns — Arrange-Act-Assert (AAA), Given-When-Then, Test Data Builders, Object Mother, parameterized tests, fixtures, spies, and test doubles — to help write maintainable, reliable, and readable test suites. Use when the user asks about writing unit tests, integration tests, or end-to-end tests; structuring test cases or test suites; applying TDD or BDD practices; working with mocks, stubs, spies, or fakes; improving test coverage or reducing flakiness; or needs guidance on test organization, naming conventions, or assertions in frameworks like Jest, Vitest, pytest, or similar.
red-green-refactor
Guides the red-green-refactor TDD workflow: write a failing test first, implement the minimum code to make it pass, then refactor while keeping tests green. Use when a user asks to practice TDD, write tests first, follow red-green-refactor, do test-driven development, write failing tests before code, or phrases like 'make the test pass', 'test coverage', or 'unit tests before implementation'.
testing-anti-patterns
Reviews test code to identify and fix common testing anti-patterns including flaky tests, over-mocking, brittle assertions, test interdependency, and hidden test logic. Flags bad patterns, explains the specific defect, and provides corrected implementations. Use when reviewing test code, debugging intermittent or unreliable test failures, or when the user mentions flaky tests, test smells, brittle tests, test isolation issues, mock overuse, slow tests, or test maintenance problems.
verification-gates
Creates explicit validation checkpoints (verification gates) between project phases to catch errors early and ensure quality before proceeding. Use when the user asks about quality gates, milestone checks, phase transitions, approval steps, go/no-go decision points, or preventing cascading errors across a multi-step workflow. Produces acceptance criteria checklists, automated CI gate configurations, manual sign-off requirements, and conditional review rules for scenarios such as security changes, API changes, or database migrations.
task-decomposition
Breaks down complex software, writing, or research tasks into small, atomic, independently completable units with dependency graphs and milestone breakdowns. Use when the user asks to plan a project, decompose a feature, create subtasks, split up work, or needs help organizing a large piece of work into a step-by-step plan. Triggered by phrases like "break down", "decompose", "where do I start", "too big", "split into tasks", "work breakdown", or "task list".
design-first
Guides the creation of technical design documents before writing code, producing architecture diagrams, data models, API interface definitions, implementation plans, and multi-option trade-off analyses. Use when the user asks to plan a feature, architect a system, design an API, explore implementation approaches, or requests a technical design or spec before coding — especially for complex features involving multiple components, ambiguous requirements, or significant architectural changes.
skill-authoring
Creates and structures SKILL.md files for AI coding agents, including YAML frontmatter, trigger phrases, directive instructions, decision trees, code examples, and verification checklists. Use when the user asks to write a new skill, create a skill file, author agent capabilities, generate skill documentation, or define a skill template for Claude Code agents.
trace-and-isolate
Applies systematic tracing and isolation techniques to pinpoint exactly where a bug originates in code. Use when a bug is hard to locate, code is not working as expected, an error or crash appears with unclear cause, a regression was introduced between recent commits, or you need to narrow down which component, function, or line is faulty. Covers binary search debugging, git bisect for regressions, strategic logging with [TRACE] patterns, data and control flow tracing, component isolation, minimal reproduction cases, conditional breakpoints, and watch expressions across TypeScript, SQL, and bash.
hypothesis-testing
Applies the scientific method to debugging by helping users form specific, testable hypotheses, design targeted experiments, and systematically confirm or reject theories to find root causes. Use when a user says their code isn't working, they're getting an error, something broke, they want to troubleshoot a bug, or they're trying to figure out what's causing an issue. Concrete actions include isolating failing components, forming and testing hypotheses, analyzing error messages, tracing execution paths, and interpreting test results to narrow down root causes.
structured-code-review
Performs a structured five-stage code review covering requirements compliance, correctness, code quality, testing, and security/performance. Each stage uses targeted checklists and categorized feedback (Blocker/Major/Minor/Nit) with actionable suggestions and rationale. Use when the user asks for code review, PR feedback, pull request review, or wants their code checked for bugs, style issues, or vulnerabilities — triggered by phrases like "review my code", "check this PR", "review my changes", "pull request review", or "code feedback".
parallel-investigation
Coordinates parallel investigation threads to simultaneously explore multiple hypotheses or root causes across different system areas. Use when debugging production incidents, slow API performance, multi-system integration failures, or complex bugs where the root cause is unclear and multiple plausible theories exist; when serial troubleshooting is too slow; or when multiple investigators can divide root-cause analysis work. Provides structured phases for problem decomposition, thread assignment, sync points with Continue/Pivot/Converge decisions, and final report synthesis.