corpus-index-build
Build graph indices (by-topic, by-year, authors, citation-network) from corpus state using definitions in .aiwg/config.yaml. Replaces manual 3-agent dispatch with a single command.
Best use case
corpus-index-build is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
It is a strong fit for teams already working in Codex.
Teams using corpus-index-build should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/corpus-index-build/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How corpus-index-build Compares
| Feature / Agent | corpus-index-build | Standard Approach |
|---|---|---|
| Platform Support | Codex | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
Which AI agents support this skill?
This skill is designed for Codex.
SKILL.md Source
# Corpus Index Build
Build research graph indices from corpus state. Reads graph definitions from `.aiwg/config.yaml` and generates by-topic, by-year, authors, and citation-network indices from the current findings and citation data.
## Triggers
- "build the research indices"
- "rebuild corpus graphs"
- "update the topic index"
- "index build"
- `/corpus-index-build`
## Parameters
### `--graph <name>` (optional)
Build a single named graph. The name must match a key in the `graphs` section of `.aiwg/config.yaml`.
### `--all` (optional)
Build all graphs defined in config, including those with `defaultBuild: false`. By default, only graphs with `defaultBuild: true` are built.
### `--force` (optional)
Rebuild from scratch, ignoring cached state. Default: incremental (only rebuild if source data changed).
### `--format` (optional)
Output format: `full` (default), `summary`, or `json`.
## Configuration
Graphs are defined in `.aiwg/config.yaml`:
```yaml
graphs:
  by-topic:
    type: cluster
    source: findings
    groupBy: tags
    output: indices/by-topic.md
    defaultBuild: true
  by-year:
    type: timeline
    source: findings
    groupBy: year
    output: indices/by-year.md
    defaultBuild: true
  authors:
    type: entity
    source: findings
    groupBy: authors
    output: indices/authors.md
    defaultBuild: true
  citation-network:
    type: graph
    source: citations
    edges: [outgoing, incoming]
    output: indices/citation-network.md
    defaultBuild: false  # expensive, build on demand
  by-methodology:
    type: cluster
    source: findings
    groupBy: methodology
    output: indices/by-methodology.md
    defaultBuild: false
```
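One way to model these definitions in code is a small set of types plus a validation pass. The sketch below is an assumption about the shape, inferred only from the YAML above. The names `GraphDefinition` and `validateGraphs`, and the rule that non-`graph` types need a `groupBy` field, are hypothetical and not taken from the AIWG codebase.

```typescript
// Hypothetical types inferred from the YAML above; not the actual AIWG schema.
type GraphType = "cluster" | "timeline" | "entity" | "graph";

interface GraphDefinition {
  type: GraphType;
  source: "findings" | "citations";
  groupBy?: string; // cluster, timeline, and entity graphs
  edges?: string[]; // citation graphs
  output: string;
  defaultBuild: boolean;
}

// Returns human-readable problems; an empty list means the definitions look usable.
function validateGraphs(graphs: Record<string, GraphDefinition>): string[] {
  const problems: string[] = [];
  for (const name of Object.keys(graphs)) {
    const def = graphs[name];
    if (!def) continue;
    if (def.type === "graph") {
      if (!def.edges || def.edges.length === 0) {
        problems.push(`${name}: type "graph" needs a non-empty "edges" list`);
      }
    } else if (!def.groupBy) {
      problems.push(`${name}: type "${def.type}" needs a "groupBy" field`);
    }
    if (!def.output) {
      problems.push(`${name}: missing "output" path`);
    }
  }
  return problems;
}

const sampleGraphs: Record<string, GraphDefinition> = {
  "by-topic": { type: "cluster", source: "findings", groupBy: "tags", output: "indices/by-topic.md", defaultBuild: true },
  "citation-network": { type: "graph", source: "citations", edges: ["outgoing", "incoming"], output: "indices/citation-network.md", defaultBuild: false },
  // Deliberately incomplete entry to show a reported problem.
  "broken-example": { type: "timeline", source: "findings", output: "indices/broken.md", defaultBuild: false },
};

console.log(validateGraphs(sampleGraphs)); // one problem, for "broken-example"
```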
## Execution Flow
### Phase 1: Load Configuration
1. Read `.aiwg/config.yaml` graph definitions
2. Determine which graphs to build:
   - No flags: build all `defaultBuild: true` graphs
   - `--graph <name>`: build only the named graph
   - `--all`: build every defined graph
3. Check for staleness (skip up-to-date graphs unless `--force`)
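The selection rule in step 2 can be sketched as a pure function. This is a minimal illustration, not the skill's implementation: `selectGraphs`, `GraphEntry`, and `BuildFlags` are hypothetical names, and giving `--graph` precedence when it is combined with `--all` is an assumption the source does not state. The staleness check from step 3 is left out.

```typescript
// Hypothetical shapes; only the fields needed for selection are modeled.
interface GraphEntry {
  defaultBuild: boolean;
}
interface BuildFlags {
  graph?: string; // value of --graph <name>
  all?: boolean;  // true when --all is passed
}

function selectGraphs(graphs: Record<string, GraphEntry>, flags: BuildFlags): string[] {
  const names = Object.keys(graphs);
  if (flags.graph !== undefined) {
    // The name must match a key in the `graphs` section.
    if (names.indexOf(flags.graph) === -1) {
      throw new Error(`Unknown graph "${flags.graph}"; expected one of: ${names.join(", ")}`);
    }
    return [flags.graph];
  }
  if (flags.all) return names;
  // No flags: only graphs marked defaultBuild: true.
  return names.filter((name) => graphs[name]?.defaultBuild === true);
}

const sampleConfig: Record<string, GraphEntry> = {
  "by-topic": { defaultBuild: true },
  "by-year": { defaultBuild: true },
  "authors": { defaultBuild: true },
  "citation-network": { defaultBuild: false },
  "by-methodology": { defaultBuild: false },
};

console.log(selectGraphs(sampleConfig, {}));                            // by-topic, by-year, authors
console.log(selectGraphs(sampleConfig, { graph: "citation-network" })); // citation-network only
console.log(selectGraphs(sampleConfig, { all: true }).length);          // 5
```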
### Phase 2: Collect Source Data
For each graph, collect the required data:
**Cluster graphs** (by-topic, by-methodology):
- Scan all `findings/REF-*.md` frontmatter
- Extract the `groupBy` field values (tags, methodology)
- Build `Map<group, Set<REF-XXX>>`
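The grouping step above can be sketched as follows, assuming the frontmatter has already been parsed into plain objects. The `Finding` shape and the `buildClusterMap` name are hypothetical; how the real skill reads `findings/REF-*.md` is not shown in the source.

```typescript
// Hypothetical in-memory shape for the parsed frontmatter of one findings/REF-*.md file.
interface Finding {
  ref: string;
  frontmatter: Record<string, unknown>;
}

// Groups REF ids by the values of one frontmatter field, for example "tags" or "methodology".
function buildClusterMap(findings: Finding[], groupBy: string): Map<string, Set<string>> {
  const groups = new Map<string, Set<string>>();
  for (const finding of findings) {
    const raw = finding.frontmatter[groupBy];
    // Accept a single value or a list of values; a missing field contributes nothing.
    const values = Array.isArray(raw) ? raw : raw === undefined || raw === null ? [] : [raw];
    for (const value of values) {
      const key = String(value).trim();
      if (key === "") continue;
      let refs = groups.get(key);
      if (!refs) {
        refs = new Set<string>();
        groups.set(key, refs);
      }
      refs.add(finding.ref);
    }
  }
  return groups;
}

const sampleFindings: Finding[] = [
  { ref: "REF-001", frontmatter: { tags: ["agentic-workflows", "multi-agent-systems"] } },
  { ref: "REF-016", frontmatter: { tags: ["agentic-workflows"] } },
  { ref: "REF-020", frontmatter: {} }, // no tags: appears in no group
];

const byTopic = buildClusterMap(sampleFindings, "tags");
console.log(Array.from(byTopic.get("agentic-workflows") ?? [])); // REF-001 and REF-016
```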
**Timeline graphs** (by-year):
- Extract `year` from each finding's frontmatter
- Build `Map<year, Set<REF-XXX>>` sorted chronologically
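A chronologically sorted map can be built by grouping first and then re-inserting keys in ascending order, since a JavaScript `Map` iterates in insertion order. The sketch below assumes `year` may arrive as a number or a numeric string; `DatedFinding` and `buildTimeline` are hypothetical names, and skipping findings without a usable year is an assumption.

```typescript
// Hypothetical shape: only the fields the timeline needs.
interface DatedFinding {
  ref: string;
  year?: number | string;
}

function buildTimeline(findings: DatedFinding[]): Map<number, Set<string>> {
  const byYear = new Map<number, Set<string>>();
  for (const finding of findings) {
    const year = Number(finding.year);
    // Skip missing or malformed years (an assumption; the source does not say how these are handled).
    if (!Number.isInteger(year) || year <= 0) continue;
    let refs = byYear.get(year);
    if (!refs) {
      refs = new Set<string>();
      byYear.set(year, refs);
    }
    refs.add(finding.ref);
  }
  // Re-insert in ascending key order so iteration is chronological.
  const sorted = new Map<number, Set<string>>();
  const years = Array.from(byYear.keys()).sort((a, b) => a - b);
  for (const year of years) {
    sorted.set(year, byYear.get(year) as Set<string>);
  }
  return sorted;
}

const timeline = buildTimeline([
  { ref: "REF-001", year: 2024 },
  { ref: "REF-016", year: "2023" },
  { ref: "REF-020" },
  { ref: "REF-021", year: 2024 },
]);
console.log(Array.from(timeline.keys())); // [2023, 2024]
```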
**Entity graphs** (authors):
- Extract `authors` field from each finding
- Normalize author names (Last, First → canonical form)
- Build `Map<author, Set<REF-XXX>>`
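The source does not define the canonical author form, so the sketch below assumes it is "First Last" with collapsed whitespace. `normalizeAuthor` and `buildAuthorMap` are hypothetical names, the author names are placeholders, and suffixes such as "Jr." or multiple commas are not handled.

```typescript
// Assumption: the canonical form is "First Last" with collapsed whitespace.
function normalizeAuthor(name: string): string {
  const cleaned = name.replace(/\s+/g, " ").trim();
  const comma = cleaned.indexOf(",");
  if (comma === -1) return cleaned; // already "First Last"
  const last = cleaned.slice(0, comma).trim();
  const first = cleaned.slice(comma + 1).trim();
  return first === "" ? last : `${first} ${last}`;
}

// Groups REF ids under each normalized author name.
function buildAuthorMap(findings: { ref: string; authors: string[] }[]): Map<string, Set<string>> {
  const byAuthor = new Map<string, Set<string>>();
  for (const finding of findings) {
    for (const author of finding.authors) {
      const key = normalizeAuthor(author);
      if (key === "") continue;
      let refs = byAuthor.get(key);
      if (!refs) {
        refs = new Set<string>();
        byAuthor.set(key, refs);
      }
      refs.add(finding.ref);
    }
  }
  return byAuthor;
}

// Placeholder names: both spellings collapse to one key.
const authorMap = buildAuthorMap([
  { ref: "REF-001", authors: ["Lovelace,  Ada"] },
  { ref: "REF-016", authors: ["Ada Lovelace", "Hopper, Grace"] },
]);
console.log(Array.from(authorMap.keys())); // ["Ada Lovelace", "Grace Hopper"]
```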
**Citation graphs** (citation-network):
- Read outgoing and incoming citation data (from citation-backfill output)
- Build adjacency list: `Map<REF-XXX, {outgoing: Set, incoming: Set}>`
- Compute: degree distribution, hubs, isolated nodes
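A minimal sketch of the adjacency list and the three computed summaries follows. The `CitationEdge` input shape and the function names are hypothetical; the format of the citation-backfill output is not shown in the source, so edges are assumed to be already loaded as `from`/`to` pairs.

```typescript
// Hypothetical edge shape: `from` cites `to`.
interface CitationEdge {
  from: string;
  to: string;
}
interface NodeLinks {
  outgoing: Set<string>;
  incoming: Set<string>;
}

function buildCitationGraph(refs: string[], edges: CitationEdge[]): Map<string, NodeLinks> {
  const graph = new Map<string, NodeLinks>();
  const node = (ref: string): NodeLinks => {
    let links = graph.get(ref);
    if (!links) {
      links = { outgoing: new Set<string>(), incoming: new Set<string>() };
      graph.set(ref, links);
    }
    return links;
  };
  for (const ref of refs) node(ref); // register every finding so isolated nodes stay visible
  for (const edge of edges) {
    node(edge.from).outgoing.add(edge.to);
    node(edge.to).incoming.add(edge.from);
  }
  return graph;
}

function summarizeGraph(graph: Map<string, NodeLinks>, topN: number) {
  const degrees = Array.from(graph.entries()).map(([ref, links]) => ({
    ref,
    inDegree: links.incoming.size,
    outDegree: links.outgoing.size,
    total: links.incoming.size + links.outgoing.size,
  }));
  // Hubs: highest total degree first, ties broken by REF id.
  const hubs = degrees
    .filter((d) => d.total > 0)
    .sort((a, b) => b.total - a.total || a.ref.localeCompare(b.ref))
    .slice(0, topN);
  const isolated = degrees.filter((d) => d.total === 0).map((d) => d.ref);
  const distribution = new Map<number, number>(); // degree -> number of nodes
  for (const d of degrees) {
    distribution.set(d.total, (distribution.get(d.total) ?? 0) + 1);
  }
  return { hubs, isolated, distribution };
}

const citationGraph = buildCitationGraph(
  ["REF-001", "REF-002", "REF-003", "REF-004"],
  [
    { from: "REF-001", to: "REF-002" },
    { from: "REF-003", to: "REF-002" },
    { from: "REF-002", to: "REF-001" },
  ],
);
const summary = summarizeGraph(citationGraph, 2);
console.log(summary.hubs.map((h) => h.ref), summary.isolated);
```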
### Phase 3: Generate Index Files
For each graph, write the index markdown to the configured `output` path:
**Cluster index format** (by-topic example):
```markdown
# By Topic Index
Generated: 2026-04-13T12:00:00Z
Sources: 372 findings
## agentic-workflows (47 papers)
| REF | Title | Year | GRADE |
|-----|-------|------|-------|
| REF-001 | Multi-Agent Orchestration | 2024 | High |
| REF-016 | AutoGen Framework | 2023 | High |
...
## multi-agent-systems (31 papers)
...
```
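Rendering this layout from grouped rows might look like the sketch below. `PaperRow` and `renderClusterIndex` are hypothetical names, and sorting groups alphabetically is an assumption, since the example above would also be consistent with sorting by paper count.

```typescript
// Hypothetical row shape matching the table columns in the example above.
interface PaperRow {
  ref: string;
  title: string;
  year: number;
  grade: string;
}

function renderClusterIndex(
  title: string,
  groups: Map<string, PaperRow[]>,
  generated: string,
  sourceCount: number,
): string {
  const lines: string[] = [
    `# ${title}`,
    `Generated: ${generated}`,
    `Sources: ${sourceCount} findings`,
  ];
  // Assumption: groups are listed alphabetically.
  const names = Array.from(groups.keys()).sort();
  for (const name of names) {
    const rows = groups.get(name) ?? [];
    lines.push(`## ${name} (${rows.length} papers)`);
    lines.push("| REF | Title | Year | GRADE |");
    lines.push("|-----|-------|------|-------|");
    for (const row of rows) {
      // Escape pipes so a title cannot break the table layout.
      const safeTitle = row.title.replace(/\|/g, "\\|");
      lines.push(`| ${row.ref} | ${safeTitle} | ${row.year} | ${row.grade} |`);
    }
  }
  return lines.join("\n") + "\n";
}

const indexGroups = new Map<string, PaperRow[]>([
  ["agentic-workflows", [
    { ref: "REF-001", title: "Multi-Agent Orchestration", year: 2024, grade: "High" },
    { ref: "REF-016", title: "AutoGen Framework", year: 2023, grade: "High" },
  ]],
]);
const indexText = renderClusterIndex("By Topic Index", indexGroups, "2026-04-13T12:00:00Z", 2);
console.log(indexText);
```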
**Citation network format**:
```markdown
# Citation Network
Nodes: 372 | Edges: 1,247 | Density: 0.009
Avg degree: 6.7 | Max hub: REF-016 (34 edges)
## Top 10 Hubs
| REF | Title | In | Out | Total |
...
## Isolated Nodes (0 edges)
| REF | Title | Reason |
...
```
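The header figures in this example are consistent with a directed-graph density of E / (N × (N − 1)) and an average degree of 2E / N. Both formulas are assumptions inferred from the sample numbers, not stated in the source, and `networkMetrics` is a hypothetical helper.

```typescript
// Assumption: density is the directed-graph density E / (N * (N - 1)),
// and average degree counts both endpoints of each edge (2E / N).
function networkMetrics(nodes: number, edges: number): { density: number; avgDegree: number } {
  const density = nodes > 1 ? edges / (nodes * (nodes - 1)) : 0;
  const avgDegree = nodes > 0 ? (2 * edges) / nodes : 0;
  return { density, avgDegree };
}

// Using the node and edge counts from the example header above.
const metrics = networkMetrics(372, 1247);
console.log(metrics.density.toFixed(3), metrics.avgDegree.toFixed(1)); // prints "0.009 6.7"
```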
### Phase 4: Report
```
Corpus Index Build
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Graphs built: 3 / 3
by-topic: 47 groups, 372 papers → indices/by-topic.md
by-year: 8 years, 372 papers → indices/by-year.md
authors: 412 authors, 372 papers → indices/authors.md
Skipped (not in defaultBuild):
citation-network: use --graph citation-network to build
by-methodology: use --graph by-methodology to build
```
## Staleness Detection
Each index file stores a `Generated:` timestamp and a source checksum. On incremental builds:
1. Compute checksum of all source frontmatter
2. Compare against stored checksum in the index file
3. Skip if identical (report "up to date")
4. Rebuild if different
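The comparison above can be sketched as below. Several details are assumptions: the source does not say which hash is used or how the checksum is stored, so this sketch uses a 32-bit FNV-1a hash over UTF-16 code units as a dependency-free stand-in (a real tool would more likely use something like SHA-256) and a hypothetical `Checksum: <hex>` line in the index file. All function names are illustrative.

```typescript
// FNV-1a (32-bit) over UTF-16 code units: a stand-in for whatever checksum the real tool uses.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Checksum of all source frontmatter, independent of the order files were read in.
function sourceChecksum(frontmatterByRef: Map<string, string>): string {
  const refs = Array.from(frontmatterByRef.keys()).sort();
  const joined = refs.map((ref) => `${ref}\n${frontmatterByRef.get(ref) ?? ""}`).join("\n---\n");
  return fnv1a(joined);
}

// Hypothetical "Checksum: <hex>" line; the storage format is not specified in the source.
function storedChecksum(indexText: string): string | null {
  const match = /^Checksum: ([0-9a-f]+)$/m.exec(indexText);
  return match?.[1] ?? null;
}

// True when the index must be rebuilt: forced, missing, or checksum mismatch.
function isStale(indexText: string | null, frontmatterByRef: Map<string, string>, force = false): boolean {
  if (force || indexText === null) return true;
  return storedChecksum(indexText) !== sourceChecksum(frontmatterByRef);
}

const sources = new Map<string, string>([
  ["REF-001", "year: 2024\ntags: [agentic-workflows]"],
  ["REF-016", "year: 2023\ntags: [agentic-workflows]"],
]);
const existingIndex =
  `# By Topic Index\nGenerated: 2026-04-13T12:00:00Z\nChecksum: ${sourceChecksum(sources)}\n`;

console.log(isStale(existingIndex, sources)); // false: report "up to date"
sources.set("REF-016", "year: 2023\ntags: [multi-agent-systems]");
console.log(isStale(existingIndex, sources)); // true: frontmatter changed, rebuild
```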
## Integration Points
| Component | Relationship |
|-----------|-------------|
| `citation-backfill` | Must run before citation-network graph build |
| `research-gap-detect` | Consumes citation-network graph for cluster analysis (#815) |
| `corpus-snapshot` | Reads index metrics for snapshot reports (#814) |
| `aiwg index build` | The existing CLI command — this skill extends it for research-specific graphs |
| `research-status` | Reports index staleness as a health metric |
## Examples
```bash
# Build default graphs (by-topic, by-year, authors)
/corpus-index-build
# Build a specific graph
/corpus-index-build --graph citation-network
# Build everything including optional graphs
/corpus-index-build --all
# Force full rebuild
/corpus-index-build --force
# JSON output for programmatic use
/corpus-index-build --format json
```
## References
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/citation-backfill/SKILL.md — Prerequisite for citation-network graph
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-gap-detect/SKILL.md — Consumes citation-network
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/corpus-snapshot/SKILL.md — Reads index metrics
- @$AIWG_ROOT/src/artifacts/cli.ts — Existing `aiwg index build` infrastructure