bug-clustering

Internal process for the bug-clusterer agent. Defines the step-by-step procedure for parsing, classifying, redacting, scoring, and clustering bug candidates from raw X/Twitter posts. Not user-invocable — loaded by the agent via its `skills: ["bug-clustering"]` frontmatter property.

1,868 stars

by jeremylongshore

Best use case

bug-clustering is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bug-clustering should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/bug-clustering/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/mcp/x-bug-triage/skills/bug-clustering/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/bug-clustering/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How bug-clustering Compares

Feature / Agent	bug-clustering	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

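What does this skill do?

It defines the step-by-step procedure the bug-clusterer agent follows to parse, classify, redact, score, and cluster bug candidates from raw X/Twitter posts.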

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Bug Clustering Process

Step-by-step procedure for transforming raw XPost objects into structured, clustered bug candidates with PII redaction and reliability scoring.

## Instructions

### Step 1: Parse

For each XPost, produce a BugCandidate with all 33 fields using `lib/parser.ts`:
- Extract product_surface, feature_area, symptoms, error_strings, repro_hints
- Extract urls, media_keys, language, conversation references
- Determine source_type (mention, reply, quote_post, search_hit)
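
The source_type decision can be sketched as a small function. This is only an illustration: `XPostLike` is a hypothetical minimal shape modeled on X API v2 tweet fields (`referenced_tweets`, `entities.mentions`), `determineSourceType` and `productHandle` are made-up names, and the precedence order (reply, then quote, then mention, else search hit) is an assumption rather than what `lib/parser.ts` necessarily does.

```typescript
type SourceType = "mention" | "reply" | "quote_post" | "search_hit";

// Hypothetical minimal post shape; the project's real XPost type may differ.
interface XPostLike {
  text: string;
  referenced_tweets?: { type: "replied_to" | "quoted" | "retweeted"; id: string }[];
  entities?: { mentions?: { username: string }[] };
}

// Assumed precedence: reply, then quote, then mention, else search hit.
function determineSourceType(post: XPostLike, productHandle: string): SourceType {
  const refs = post.referenced_tweets ?? [];
  if (refs.some((r) => r.type === "replied_to")) return "reply";
  if (refs.some((r) => r.type === "quoted")) return "quote_post";

  const handle = productHandle.toLowerCase();
  const mentions = post.entities?.mentions ?? [];
  if (mentions.some((m) => m.username.toLowerCase() === handle)) return "mention";

  // Nothing ties the post to the product account directly,
  // so assume it was found by keyword search.
  return "search_hit";
}
```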

### Step 1.5: Deduplicate

Before classification, run content-similarity deduplication using `lib/dedupe.ts`:
- Call `deduplicateCandidates()` with parsed candidates and the `candidate_dedup.hybrid_similarity_threshold` from `config/cluster-matching-thresholds.json` (default 0.70)
- Uses char-trigram + token-Jaccard hybrid similarity
- Does NOT remove posts — tags them as duplicate groups with a canonical post (highest engagement)
- Only canonical posts and non-duplicates (`forward_ids`) proceed to classification
- Log dedup stats: `"{n} posts ({m} unique, {k} duplicate groups)"`
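
A minimal sketch of the deduplication idea, under stated assumptions: the two similarity measures are averaged with equal weight, posts are grouped greedily against the first member of each group, and "unique" in the log line is taken to mean the number of forwarded posts. The names `Post`, `engagement`, `deduplicate`, and `hybridSimilarity` are hypothetical; the real `deduplicateCandidates()` in `lib/dedupe.ts` may differ on all of these points.

```typescript
interface Post { id: string; text: string; engagement: number; }
interface DuplicateGroup { canonicalId: string; memberIds: string[]; }
interface DedupResult { forwardIds: string[]; groups: DuplicateGroup[]; stats: string; }

function normalize(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

function charTrigrams(text: string): Set<string> {
  const s = normalize(text);
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= s.length; i++) grams.add(s.slice(i, i + 3));
  return grams;
}

function tokenSet(text: string): Set<string> {
  return new Set(normalize(text).split(" ").filter((t) => t.length > 0));
}

function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((item) => { if (b.has(item)) shared++; });
  return shared / (a.size + b.size - shared);
}

// Assumption: equal weighting of trigram and token similarity.
function hybridSimilarity(a: string, b: string): number {
  const trigram = jaccard(charTrigrams(a), charTrigrams(b));
  const token = jaccard(tokenSet(a), tokenSet(b));
  return 0.5 * trigram + 0.5 * token;
}

function deduplicate(posts: Post[], threshold = 0.7): DedupResult {
  // Greedy grouping: compare each post with the first member of each bucket.
  const buckets: Post[][] = [];
  for (const post of posts) {
    const match = buckets.find((b) => hybridSimilarity(b[0].text, post.text) >= threshold);
    if (match) match.push(post);
    else buckets.push([post]);
  }

  const forwardIds: string[] = [];
  const groups: DuplicateGroup[] = [];
  for (const bucket of buckets) {
    // Canonical post is the one with the highest engagement; nothing is deleted.
    const canonical = bucket.reduce((best, p) => (p.engagement > best.engagement ? p : best));
    forwardIds.push(canonical.id);
    if (bucket.length > 1) {
      groups.push({ canonicalId: canonical.id, memberIds: bucket.map((p) => p.id) });
    }
  }

  const stats = `${posts.length} posts (${forwardIds.length} unique, ${groups.length} duplicate groups)`;
  return { forwardIds, groups, stats };
}
```

Only the ids in `forwardIds` would move on to classification; the duplicate members stay available through `groups` for counting and audit purposes.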

### Step 2: Classify

Run `lib/classifier.ts` on each candidate:
- Assign one of 12 classifications with confidence score (0.0-1.0) and rationale
- Sarcastic bug reports get classified separately — still treated as signal

### Step 3: Redact PII

Run `lib/redactor.ts` on each candidate:
- Detect 6 PII types: email, API key, phone, account ID, media flag, URL token
- Replace with `[REDACTED:type]` tags
- Set pii_flags array and raw_text_storage_policy
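
To make the tag replacement concrete, here is a rough sketch covering three of the six PII types. The regular expressions, the `sk-` key prefix, the type names inside the tags, and the `redact` function are all illustrative assumptions; the actual detection rules in `lib/redactor.ts` are not shown in this document and are likely stricter.

```typescript
type PiiType = "email" | "api_key" | "phone";

interface RedactionResult {
  text: string;        // text with PII replaced by tags
  piiFlags: PiiType[]; // which PII types were found
}

// Hypothetical patterns, applied in order.
const PATTERNS: { type: PiiType; regex: RegExp }[] = [
  { type: "email", regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { type: "api_key", regex: /\bsk-[A-Za-z0-9]{16,}\b/g },
  { type: "phone", regex: /\+?\d[\d\s().-]{8,}\d/g },
];

function redact(raw: string): RedactionResult {
  let text = raw;
  const piiFlags: PiiType[] = [];
  for (const { type, regex } of PATTERNS) {
    const replaced = text.replace(regex, `[REDACTED:${type}]`);
    // Record the type only if this pattern actually changed the text.
    if (replaced !== text) piiFlags.push(type);
    text = replaced;
  }
  return { text, piiFlags };
}
```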

### Step 4: Score Reliability

Run `lib/reporter-scorer.ts` on each candidate:
- 4 dimensions: report quality, independence, account authenticity, historical accuracy
- Composite reporter_reliability_score (0.0-1.0)
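
One plausible way to combine the four dimensions into a composite is a weighted mean clamped to the 0.0-1.0 range. The equal weights and the names `ReliabilityDimensions`, `WEIGHTS`, and `reliabilityScore` below are assumptions for illustration; the weighting used by `lib/reporter-scorer.ts` is not stated here.

```typescript
interface ReliabilityDimensions {
  reportQuality: number;
  independence: number;
  accountAuthenticity: number;
  historicalAccuracy: number;
}

// Hypothetical equal weights; they sum to 1 so the result stays in [0, 1].
const WEIGHTS: ReliabilityDimensions = {
  reportQuality: 0.25,
  independence: 0.25,
  accountAuthenticity: 0.25,
  historicalAccuracy: 0.25,
};

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

function reliabilityScore(d: ReliabilityDimensions): number {
  const weighted =
    WEIGHTS.reportQuality * clamp01(d.reportQuality) +
    WEIGHTS.independence * clamp01(d.independence) +
    WEIGHTS.accountAuthenticity * clamp01(d.accountAuthenticity) +
    WEIGHTS.historicalAccuracy * clamp01(d.historicalAccuracy);
  return clamp01(weighted);
}
```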

### Step 5: Tag Reporter Category

Match author against approved_accounts config:
- Categories: public, internal, partner, tester
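
The lookup might be as simple as the sketch below. It assumes the approved_accounts config is a map from lowercase handle (without the leading `@`) to category, and that authors not listed fall back to "public"; both points, along with the `tagReporter` name, are assumptions rather than documented behavior.

```typescript
type ReporterCategory = "public" | "internal" | "partner" | "tester";

// Hypothetical config shape: lowercase handle -> category.
type ApprovedAccounts = { [handle: string]: ReporterCategory };

function tagReporter(handle: string, approved: ApprovedAccounts): ReporterCategory {
  const key = handle.replace(/^@/, "").toLowerCase();
  // Assumed default: anyone not in the config is a public reporter.
  return Object.prototype.hasOwnProperty.call(approved, key) ? approved[key] : "public";
}
```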

### Step 6: Cluster

Using `lib/clusterer.ts` and `lib/signatures.ts`:
- Generate deterministic bug signature from error_strings + symptoms + feature_area
- Match against active_clusters at >=70% signature overlap
- Family-first guard: different ClusterFamilies NEVER cluster together
- No match: create a new cluster (initial severity "low")
- Existing match: update report_count, last_seen, sub_status
- Resolved match: reopen with sub_status "regression_reopened"
- Suppressed match: skip, log to audit
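
The matching rules above can be sketched as follows. Several things here are assumptions: the signature is taken to be a sorted set of lowercase tokens, "overlap" is measured as Jaccard similarity over those tokens, and a reopened cluster also has its report count incremented. The types `CandidateLike` and `ClusterLike` and the function names are hypothetical, and `last_seen` updates and audit logging are left out for brevity.

```typescript
type ClusterStatus = "open" | "resolved" | "suppressed";
type ClusterAction = "created" | "updated" | "reopened" | "skipped";

interface CandidateLike {
  family: string;
  featureArea: string;
  errorStrings: string[];
  symptoms: string[];
}

interface ClusterLike {
  id: string;
  family: string;
  signature: string[];
  status: ClusterStatus;
  severity: string;
  reportCount: number;
  subStatus?: string;
}

// Deterministic: the same inputs always yield the same sorted, de-duplicated tokens.
function buildSignature(c: CandidateLike): string[] {
  const raw = [...c.errorStrings, ...c.symptoms, c.featureArea].join(" ").toLowerCase();
  const tokens = raw.split(/[^a-z0-9]+/).filter((t) => t.length > 0);
  return Array.from(new Set(tokens)).sort();
}

// Assumption: overlap is Jaccard similarity of the two token lists.
function signatureOverlap(a: string[], b: string[]): number {
  const setB = new Set(b);
  const shared = a.filter((t) => setB.has(t)).length;
  const union = a.length + b.length - shared;
  return union === 0 ? 0 : shared / union;
}

function assignToCluster(
  c: CandidateLike,
  clusters: ClusterLike[],
): { action: ClusterAction; cluster: ClusterLike } {
  const signature = buildSignature(c);
  for (const cluster of clusters) {
    if (cluster.family !== c.family) continue; // family-first guard
    if (signatureOverlap(signature, cluster.signature) < 0.7) continue;
    if (cluster.status === "suppressed") return { action: "skipped", cluster };

    cluster.reportCount += 1;
    if (cluster.status === "resolved") {
      cluster.status = "open";
      cluster.subStatus = "regression_reopened";
      return { action: "reopened", cluster };
    }
    return { action: "updated", cluster };
  }

  // No cluster matched: start a new one at the initial severity.
  const created: ClusterLike = {
    id: `cluster-${clusters.length + 1}`,
    family: c.family,
    signature,
    status: "open",
    severity: "low",
    reportCount: 1,
  };
  clusters.push(created);
  return { action: "created", cluster: created };
}
```

Checking the family before computing overlap keeps the guard absolute: two reports with identical wording but different families always end up in separate clusters.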

### Step 7: Persist

- Insert candidates to DB via `lib/db.ts`
- Insert/update clusters and cluster_posts junction
- Write audit events for each classification, redaction, and cluster action

## References

Load evidence tier definitions for proper cluster evidence assessment:
```
!cat skills/x-bug-triage/references/evidence-policy.md
```

Load data model reference for BugCandidate fields and cluster schemas:
```
!cat skills/x-bug-triage/references/schemas.md
```
