topic-cluster
BERTopic clustering: groups retros/PBIs/incidents/lessons into thematic topics with labels. Filters noise and surfaces cross-project patterns.
Best use case
topic-cluster is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using topic-cluster should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/topic-cluster/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How topic-cluster Compares
| Feature / Agent | topic-cluster | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
BERTopic clustering: groups retros/PBIs/incidents/lessons into thematic topics with labels. Filters noise and surfaces cross-project patterns.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Skill: Topic Cluster
> Discovers patterns that cut across retros, PBIs, incidents, and lessons.
> Ref: SE-033, docs/propuestas/SE-033-topic-cluster-skill.md.
## When to use
- At sprint close: cluster retros from N projects to detect shared themes
- Periodic backlog/incident audit: detect semantic duplicates
- Post-lesson-extract: cluster lessons across projects
- When `retro-patterns`, `backlog-patterns`, or `lesson-extract` lose signal
## When NOT to use
- Fewer than 6 documents (HDBSCAN does not find useful clusters)
- Very short documents (<20 words): the embeddings carry little signal
- Hot paths with a <500ms budget: BERTopic takes 10-30s on ~100 docs
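The document-count and length limits above can be checked before calling the skill. The following is a minimal sketch, not part of the script itself: `corpus_problems` is a hypothetical helper, and counting words with a whitespace split is an assumption about how "words" are measured.

```python
def corpus_problems(documents, min_docs=6, min_words=20):
    """Return human-readable reasons why a corpus is a poor fit (empty list if none).

    `documents` is a list of {"id": ..., "text": ...} dicts, as in the input schema.
    The thresholds mirror the limits listed in this section.
    """
    problems = []
    if len(documents) < min_docs:
        problems.append(f"only {len(documents)} documents (need at least {min_docs})")
    # Whitespace tokenisation is a rough proxy for word count (assumption).
    short = [d["id"] for d in documents if len(d["text"].split()) < min_words]
    if short:
        problems.append(f"{len(short)} documents under {min_words} words: {', '.join(short)}")
    return problems


if __name__ == "__main__":
    docs = [{"id": "retro-1", "text": "Too short to embed well"}]
    for problem in corpus_problems(docs):
        print("skip clustering:", problem)
```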
## Invocation
```bash
# Input via stdin
cat retros.json | python3 scripts/topic-cluster.py --min-cluster-size 3
# With pretty-printed JSON
cat pbis.json | python3 scripts/topic-cluster.py --json
```
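The same invocation can be driven from Python when the skill is called from another tool. This is a sketch under assumptions: `build_command` and `run_topic_cluster` are hypothetical wrappers, only flags that appear in this document are used, and the script is assumed to print a single JSON document on stdout.

```python
import json
import subprocess

SCRIPT = "scripts/topic-cluster.py"  # path as used in the shell examples


def build_command(min_cluster_size=None, nr_topics=None, pretty_json=False):
    """Assemble the argv list using only flags shown in this document."""
    cmd = ["python3", SCRIPT]
    if min_cluster_size is not None:
        cmd += ["--min-cluster-size", str(min_cluster_size)]
    if nr_topics is not None:
        cmd += ["--nr-topics", str(nr_topics)]
    if pretty_json:
        cmd.append("--json")
    return cmd


def run_topic_cluster(payload, **options):
    """Send `payload` on stdin and return the parsed result.

    Assumes stdout is one JSON document; if the script only emits JSON
    with --json, pass pretty_json=True.
    """
    proc = subprocess.run(
        build_command(**options),
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)
```

Keeping command assembly separate from execution makes the flag handling easy to test without the script or its dependencies present.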
## Input schema
```json
{
  "documents": [
    {"id": "retro-2026-q1-alpha", "text": "Sprint planning took 3x expected time..."}
  ],
  "min_cluster_size": 3,
  "nr_topics": null
}
```
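A payload in this shape can be assembled from plain `(id, text)` pairs. The sketch below is illustrative only; `build_payload` is a hypothetical helper, and its defaults simply copy the values from the schema example.

```python
import json


def build_payload(docs, min_cluster_size=3, nr_topics=None):
    """Build the stdin payload from an iterable of (id, text) pairs."""
    return {
        "documents": [{"id": doc_id, "text": text} for doc_id, text in docs],
        "min_cluster_size": min_cluster_size,
        "nr_topics": nr_topics,  # None serialises to JSON null
    }


payload = build_payload(
    [("retro-2026-q1-alpha", "Sprint planning took 3x expected time...")]
)
print(json.dumps(payload, indent=2))
```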
## Output
```json
{
  "topics": [
    {
      "id": 0,
      "label": "sprint planning time",
      "keywords": ["sprint", "planning", "time", "overrun"],
      "size": 7,
      "documents": ["retro-1", "retro-3", "retro-5"]
    }
  ],
  "outliers": ["retro-8"],
  "backend": "bertopic|fallback-keyword",
  "model_info": {"sbert": "all-MiniLM-L6-v2", "docs": 15, "clusters": 3},
  "latency_ms": 12000
}
```
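For a quick human-readable view, the result can be reduced to one line per topic. This is a minimal sketch that relies only on the fields shown in the output example; `summarize` is a hypothetical helper, and `SAMPLE` copies the example with `backend` set to one concrete value.

```python
import json

SAMPLE = """
{
  "topics": [
    {"id": 0, "label": "sprint planning time",
     "keywords": ["sprint", "planning", "time", "overrun"],
     "size": 7, "documents": ["retro-1", "retro-3", "retro-5"]}
  ],
  "outliers": ["retro-8"],
  "backend": "bertopic",
  "model_info": {"sbert": "all-MiniLM-L6-v2", "docs": 15, "clusters": 3},
  "latency_ms": 12000
}
"""


def summarize(result):
    """Return one line per topic, largest first, followed by an outlier line."""
    lines = []
    for topic in sorted(result["topics"], key=lambda t: t["size"], reverse=True):
        keywords = ", ".join(topic["keywords"])
        lines.append(f"[{topic['id']}] {topic['label']} ({topic['size']} docs): {keywords}")
    lines.append(f"outliers: {len(result['outliers'])} via {result['backend']}")
    return lines


for line in summarize(json.loads(SAMPLE)):
    print(line)
```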
## Backends
| Backend | When | Latency | Quality |
|---|---|---|---|
| `bertopic` | bertopic and sentence-transformers installed | 10-30s / 100 docs | High: semantic clusters |
| `fallback-keyword` | No ML deps | <1s / 100 docs | Medium: surface keywords |
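The table implies that the backend is selected by checking which packages are importable. How `scripts/topic-cluster.py` actually decides is not shown here, so the following is only one plausible sketch: `pick_backend` is a hypothetical function that probes for the two packages without importing them.

```python
from importlib.util import find_spec


def _is_installed(module_name):
    """True if a top-level module can be located, without importing it."""
    return find_spec(module_name) is not None


def pick_backend(is_installed=_is_installed):
    """Return "bertopic" when both ML packages are present, else the fallback.

    `is_installed` is injectable so the decision can be tested without
    installing or uninstalling anything.
    """
    required = ("bertopic", "sentence_transformers")
    if all(is_installed(name) for name in required):
        return "bertopic"
    return "fallback-keyword"


print(pick_backend())
```

Probing with `find_spec` avoids paying the import cost of the ML stack just to choose a backend.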
## Installation (opt-in)
```bash
pip install bertopic sentence-transformers
# The first invocation downloads all-MiniLM-L6-v2 (~80MB)
```
Zero-install default: the script works with the keyword fallback without installing anything.
## Use cases
### Sprint retro cluster
```bash
bash scripts/collect-retros.sh --sprint 42 --json | \
python3 scripts/topic-cluster.py --min-cluster-size 3
```
### Backlog pattern detection
```bash
bash scripts/backlog-dump.sh --project alpha --json | \
python3 scripts/topic-cluster.py --nr-topics auto
```
### Cross-project lessons
```bash
find output/lessons -name "*.json" -exec cat {} \; | \
jq -s '{documents: .}' | \
python3 scripts/topic-cluster.py --min-cluster-size 2
```
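Where `jq` is not available, the collection step of the lessons pipeline can be approximated in Python. This sketch assumes each file under the directory holds a single `{"id": ..., "text": ...}` object; `collect_documents` is a hypothetical helper, and the demo writes two made-up files to a temporary directory rather than reading `output/lessons`.

```python
import json
import tempfile
from pathlib import Path


def collect_documents(root):
    """Gather every *.json file under `root` into one input payload.

    Mirrors `find ... -name "*.json"` piped to `jq -s '{documents: .}'`,
    assuming one JSON object per file.
    """
    paths = sorted(Path(root).rglob("*.json"))  # sorted for a stable order
    return {"documents": [json.loads(p.read_text(encoding="utf-8")) for p in paths]}


# Demo on a throwaway directory with two invented lesson files.
with tempfile.TemporaryDirectory() as tmp:
    samples = [("a", "Flaky tests hid a regression"), ("b", "Deploys lacked a rollback plan")]
    for name, text in samples:
        Path(tmp, f"{name}.json").write_text(
            json.dumps({"id": name, "text": text}), encoding="utf-8"
        )
    payload = collect_documents(tmp)

print(json.dumps(payload))
```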
## Interpretation
- `clusters >= 3`: clear pattern, review the labels
- `outliers / total > 30%`: heterogeneous corpus, raise `min_cluster_size` or lower `nr_topics`
- small `size` (2-3): may be noise or an emerging pattern
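These rules of thumb can be applied mechanically to a result. The sketch below is a hypothetical helper, not part of the skill; it assumes `clusters` and `total` correspond to `model_info.clusters` and `model_info.docs` in the output.

```python
def interpret(result):
    """Turn the three heuristics in this section into a list of notes."""
    notes = []
    info = result["model_info"]
    if info["clusters"] >= 3:
        notes.append("clear pattern: review the labels")
    # Assumption: "total" is the number of input documents (model_info.docs).
    total = info["docs"]
    if total and len(result["outliers"]) / total > 0.30:
        notes.append("heterogeneous corpus: raise min_cluster_size or lower nr_topics")
    for topic in result["topics"]:
        if 2 <= topic["size"] <= 3:
            notes.append(
                f"topic {topic['id']} ({topic['label']}) is small: noise or emerging pattern"
            )
    return notes


example = {
    "topics": [{"id": 0, "label": "sprint planning time", "size": 2}],
    "outliers": ["retro-4", "retro-6", "retro-8", "retro-9"],
    "model_info": {"docs": 10, "clusters": 1},
}
for note in interpret(example):
    print(note)
```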
## Costs
- Without deps: 0 MB, <1s
- With BERTopic: ~200MB sbert + deps, ~800MB RAM
- Egress: only on the first invocation (model download)
## References
- Spec: `docs/propuestas/SE-033-topic-cluster-skill.md`
- Script: `scripts/topic-cluster.py`
- Probe: `scripts/bertopic-probe.sh`
- Tests: `tests/test-topic-cluster.bats`
Related Skills
zoom-out
Elevates perspective from trees to forest. Maps architecture, dependencies, and second-order effects before implementation decisions. Use when designing, when evaluating trade-offs, or at the start of design sessions.
workspace-integrity
Catalog of integrity auditors: CLAUDE.md drift, rule manifest, orphan rules, agents catalog sync, baseline, agent size
wellbeing-guardian
Proactive individual wellbeing system
web-research
Search the web to resolve context gaps — documentation, versions, CVEs, best practices. Auto-starts SearxNG Docker if available, falls back to WebSearch.
voice-inbox
Audio transcription and an audio→text→action flow for voice messages
verification-lattice
Multi-layer verification pipeline beyond Code Review
time-tracking-report
Generates time-allocation reports (hours logged) in Excel/Word
tier3-probes
Catalog of feasibility probes for Tier 3 champions: Scrapling, Oumi, Memvid, BERTopic, Reranker, PDF extract
test-architect
Design and generate highest-quality tests across 16 languages and 14 test types
tech-research-agent
Autonomous technical research agent: researches topics, generates reports, notifies the designated human
team-onboarding
Onboarding and skills assessment for new team members
team-coordination
Multi-team orchestration — create teams, assign members, detect cross-team blockers