multiAI Summary Pending
analyze-project
Forensic root cause analyzer for Antigravity sessions. Classifies scope deltas, rework patterns, root causes, hotspots, and auto-improves prompts/health.
28,273 stars
bysickn33
Installation
Claude Code / Cursor / Codex
$curl -o ~/.claude/skills/analyze-project/SKILL.md --create-dirs "https://raw.githubusercontent.com/sickn33/antigravity-awesome-skills/main/skills/analyze-project/SKILL.md"
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/analyze-project/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How analyze-project Compares
| Feature / Agent | analyze-project | Standard Approach |
|---|---|---|
| Platform Support | multi | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Forensic root cause analyzer for Antigravity sessions. Classifies scope deltas, rework patterns, root causes, hotspots, and auto-improves prompts/health.
Which AI agents support this skill?
This skill is compatible with multi.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# /analyze-project — Root Cause Analyst Workflow Analyze AI-assisted coding sessions in `~/.gemini/antigravity/brain/` and produce a report that explains not just **what happened**, but **why it happened**, **who/what caused it**, and **what should change next time**. ## Goal For each session, determine: 1. What changed from the initial ask to the final executed work 2. Whether the main cause was: - user/spec - agent - repo/codebase - validation/testing - legitimate task complexity 3. Whether the opening prompt was sufficient 4. Which files/subsystems repeatedly correlate with struggle 5. What changes would most improve future sessions ## Global Rules - Treat `.resolved.N` counts as **iteration signals**, not proof of failure - Separate **human-added scope**, **necessary discovered scope**, and **agent-introduced scope** - Separate **agent error** from **repo friction** - Every diagnosis must include **evidence** and **confidence** - Confidence levels: - **High** = direct artifact/timestamp evidence - **Medium** = multiple supporting signals - **Low** = plausible inference, not directly proven - Evidence precedence: - artifact contents > timestamps > metadata summaries > inference - If evidence is weak, say so --- ## Step 0.5: Session Intent Classification Classify the primary session intent from objective + artifacts: - `DELIVERY` - `DEBUGGING` - `REFACTOR` - `RESEARCH` - `EXPLORATION` - `AUDIT_ANALYSIS` Record: - `session_intent` - `session_intent_confidence` Use intent to contextualize severity and rework shape. Do not judge exploratory or research sessions by the same standards as narrow delivery sessions. --- ## Step 1: Discover Conversations 1. Read available conversation summaries from system context 2. List conversation folders in the user’s Antigravity `brain/` directory 3. Build a conversation index with: - `conversation_id` - `title` - `objective` - `created` - `last_modified` 4. If the user supplied a keyword/path, filter to matching conversations; otherwise analyze all Output: indexed list of conversations to analyze. --- ## Step 2: Extract Session Evidence For each conversation, read if present: ### Core artifacts - `task.md` - `implementation_plan.md` - `walkthrough.md` ### Metadata - `*.metadata.json` ### Version snapshots - `task.md.resolved.0 ... N` - `implementation_plan.md.resolved.0 ... N` - `walkthrough.md.resolved.0 ... N` ### Additional signals - other `.md` artifacts - timestamps across artifact updates - file/folder/subsystem names mentioned in plans/walkthroughs - validation/testing language - explicit acceptance criteria, constraints, non-goals, and file targets Record per conversation: #### Lifecycle - `has_task` - `has_plan` - `has_walkthrough` - `is_completed` - `is_abandoned_candidate` = task exists but no walkthrough #### Revision / change volume - `task_versions` - `plan_versions` - `walkthrough_versions` - `extra_artifacts` #### Scope - `task_items_initial` - `task_items_final` - `task_completed_pct` - `scope_delta_raw` - `scope_creep_pct_raw` #### Timing - `created_at` - `completed_at` - `duration_minutes` #### Content / quality - `objective_text` - `initial_plan_summary` - `final_plan_summary` - `initial_task_excerpt` - `final_task_excerpt` - `walkthrough_summary` - `mentioned_files_or_subsystems` - `validation_requirements_present` - `acceptance_criteria_present` - `non_goals_present` - `scope_boundaries_present` - `file_targets_present` - `constraints_present` --- ## Step 3: Prompt Sufficiency Score the opening request on a 0–2 scale for: - **Clarity** - **Boundedness** - **Testability** - **Architectural specificity** - **Constraint awareness** - **Dependency awareness** Create: - `prompt_sufficiency_score` - `prompt_sufficiency_band` = High / Medium / Low Then note which missing prompt ingredients likely contributed to later friction. Do not punish short prompts by default; a narrow, obvious task can still have high sufficiency. --- ## Step 4: Scope Change Classification Classify scope change into: - **Human-added scope** — new asks beyond the original task - **Necessary discovered scope** — work required to complete the original task correctly - **Agent-introduced scope** — likely unnecessary work introduced by the agent Record: - `scope_change_type_primary` - `scope_change_type_secondary` (optional) - `scope_change_confidence` - evidence Keep one short example in mind for calibration: - Human-added: “also refactor nearby code while you’re here” - Necessary discovered: hidden dependency must be fixed for original task to work - Agent-introduced: extra cleanup or redesign not requested and not required --- ## Step 5: Rework Shape Classify each session into one primary pattern: - **Clean execution** - **Early replan then stable finish** - **Progressive scope expansion** - **Reopen/reclose churn** - **Late-stage verification churn** - **Abandoned mid-flight** - **Exploratory / research session** Record: - `rework_shape` - `rework_shape_confidence` - evidence --- ## Step 6: Root Cause Analysis For every non-clean session, assign: ### Primary root cause One of: - `SPEC_AMBIGUITY` - `HUMAN_SCOPE_CHANGE` - `REPO_FRAGILITY` - `AGENT_ARCHITECTURAL_ERROR` - `VERIFICATION_CHURN` - `LEGITIMATE_TASK_COMPLEXITY` ### Secondary root cause Optional if materially relevant ### Root-cause guidance - **SPEC_AMBIGUITY**: opening ask lacked boundaries, targets, criteria, or constraints - **HUMAN_SCOPE_CHANGE**: scope expanded because the user broadened the task - **REPO_FRAGILITY**: hidden coupling, brittle files, unclear architecture, or environment issues forced extra work - **AGENT_ARCHITECTURAL_ERROR**: wrong files, wrong assumptions, wrong approach, hallucinated structure - **VERIFICATION_CHURN**: implementation mostly worked, but testing/validation caused loops - **LEGITIMATE_TASK_COMPLEXITY**: revisions were expected for the difficulty and not clearly avoidable Every root-cause assignment must include: - evidence - why stronger alternative causes were rejected - confidence --- ## Step 6.5: Session Severity Scoring (0–100) Assign each session a severity score to prioritize attention. Components (sum, clamp 0–100): - **Completion failure**: 0–25 (`abandoned = 25`) - **Replanning intensity**: 0–15 - **Scope instability**: 0–15 - **Rework shape severity**: 0–15 - **Prompt sufficiency deficit**: 0–10 (`low = 10`) - **Root cause impact**: 0–10 (`REPO_FRAGILITY` / `AGENT_ARCHITECTURAL_ERROR` highest) - **Hotspot recurrence**: 0–10 Bands: - **0–19 Low** - **20–39 Moderate** - **40–59 Significant** - **60–79 High** - **80–100 Critical** Record: - `session_severity_score` - `severity_band` - `severity_drivers` = top 2–4 contributors - `severity_confidence` Use severity as a prioritization signal, not a verdict. Always explain the drivers. Contextualize severity using session intent so research/exploration sessions are not over-penalized. --- ## Step 7: Subsystem / File Clustering Across all conversations, cluster repeated struggle by file, folder, or subsystem. For each cluster, calculate: - number of conversations touching it - average revisions - completion rate - abandonment rate - common root causes - average severity Goal: identify whether friction is mostly prompt-driven, agent-driven, or concentrated in specific repo areas. --- ## Step 8: Comparative Cohorts Compare: - first-shot successes vs re-planned sessions - completed vs abandoned - high prompt sufficiency vs low prompt sufficiency - narrow-scope vs high-scope-growth - short sessions vs long sessions - low-friction subsystems vs high-friction subsystems For each comparison, identify: - what differs materially - which prompt traits correlate with smoother execution - which repo traits correlate with repeated struggle Do not just restate averages; extract cautious evidence-backed patterns. --- ## Step 9: Non-Obvious Findings Generate 3–7 findings that are not simple metric restatements. Each finding must include: - observation - why it matters - evidence - confidence Examples of strong findings: - replans cluster around weak file targeting rather than weak acceptance criteria - scope growth often begins after initial success, suggesting post-success human expansion - auth-related struggle is driven more by repo fragility than agent hallucination --- ## Step 10: Report Generation Create `session_analysis_report.md` with this structure: # 📊 Session Analysis Report — [Project Name] **Generated**: [timestamp] **Conversations Analyzed**: [N] **Date Range**: [earliest] → [latest] ## Executive Summary | Metric | Value | Rating | |:---|:---|:---| | First-Shot Success Rate | X% | 🟢/🟡/🔴 | | Completion Rate | X% | 🟢/🟡/🔴 | | Avg Scope Growth | X% | 🟢/🟡/🔴 | | Replan Rate | X% | 🟢/🟡/🔴 | | Median Duration | Xm | — | | Avg Session Severity | X | 🟢/🟡/🔴 | | High-Severity Sessions | X / N | 🟢/🟡/🔴 | Thresholds: - First-shot: 🟢 >70 / 🟡 40–70 / 🔴 <40 - Scope growth: 🟢 <15 / 🟡 15–40 / 🔴 >40 - Replan rate: 🟢 <20 / 🟡 20–50 / 🔴 >50 Avg severity guidance: - 🟢 <25 - 🟡 25–50 - 🔴 >50 Note: avg severity is an aggregate health signal, not the same as per-session severity bands. Then add a short narrative summary of what is going well, what is breaking down, and whether the main issue is prompt quality, repo fragility, workflow discipline, or validation churn. ## Root Cause Breakdown | Root Cause | Count | % | Notes | |:---|:---|:---|:---| ## Prompt Sufficiency Analysis - common traits of high-sufficiency prompts - common missing inputs in low-sufficiency prompts - which missing prompt ingredients correlate most with replanning or abandonment ## Scope Change Analysis Separate: - Human-added scope - Necessary discovered scope - Agent-introduced scope ## Rework Shape Analysis Summarize the main failure patterns across sessions. ## Friction Hotspots Show the files/folders/subsystems most associated with replanning, abandonment, verification churn, and high severity. ## First-Shot Successes List the cleanest sessions and extract what made them work. ## Non-Obvious Findings List 3–7 evidence-backed findings with confidence. ## Severity Triage List the highest-severity sessions and say whether the best intervention is: - prompt improvement - scope discipline - targeted skill/workflow - repo refactor / architecture cleanup - validation/test harness improvement ## Recommendations For each recommendation, use: - **Observed pattern** - **Likely cause** - **Evidence** - **Change to make** - **Expected benefit** - **Confidence** ## Per-Conversation Breakdown | # | Title | Intent | Duration | Scope Δ | Plan Revs | Task Revs | Root Cause | Rework Shape | Severity | Complete? | |:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---| --- ## Step 11: Optional Post-Analysis Improvements If appropriate, also: - update any local project-health or memory artifact (if present) with recurring failure modes and fragile subsystems - generate `prompt_improvement_tips.md` from high-sufficiency / first-shot-success sessions - suggest missing skills or workflows when the same subsystem or task sequence repeatedly causes struggle Only recommend workflows/skills when the pattern appears repeatedly. --- ## Final Output Standard The workflow must produce: 1. metrics summary 2. root-cause diagnosis 3. prompt-sufficiency assessment 4. subsystem/friction map 5. severity triage and prioritization 6. evidence-backed recommendations 7. non-obvious findings Prefer explicit uncertainty over fake precision.