Codex

corpus-snapshot

Generate a comprehensive corpus snapshot report from template, computing all metrics (dimensions, topology, degree distribution, delta from previous) and assisting with analysis sections (clusters, chains, gaps).

104 stars

Best use case

corpus-snapshot is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

It is a strong fit for teams already working in Codex.

Teams using corpus-snapshot should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/corpus-snapshot/SKILL.md --create-dirs "https://raw.githubusercontent.com/jmagly/aiwg/main/.agents/skills/corpus-snapshot/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/corpus-snapshot/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How corpus-snapshot Compares

| Feature / Agent | corpus-snapshot | Standard Approach |
|-----------------|-----------------|-------------------|
| Platform Support | Codex | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It generates a corpus snapshot report from a template: it computes the metrics (dimensions, topology, degree distribution, delta from the previous snapshot) and assists with the analysis sections (clusters, chains, gaps).

Which AI agents support this skill?

This skill is designed for Codex.

Where can I find the source code?

The source is in the jmagly/aiwg repository on GitHub, at .agents/skills/corpus-snapshot/SKILL.md on the main branch.

SKILL.md Source

# Corpus Snapshot

Generate a point-in-time snapshot of the research corpus with computed metrics and analysis. Reads a snapshot template, fills `[COMPUTE]` sections with data, assists with `[ANALYZE]` sections, and writes the completed report.

## Triggers

- "take a corpus snapshot"
- "generate corpus report"
- "snapshot the research"
- "corpus snapshot"
- `/corpus-snapshot`

## Parameters

### `--compute-only` (optional)
Only compute data sections — skip analysis sections. Faster, fully automated.

### `--delta-only` (optional)
Only compute the delta from the previous snapshot. Useful for tracking session progress.

### `--template <path>` (optional)
Custom template path. Default: `.aiwg/reports/corpus-snapshot-template.md`.

### `--format` (optional)
Output format: `full` (default for the report file), `summary` (terminal), `json` (programmatic).

## Prerequisites

Before generating a snapshot, the following should be current:

| Prerequisite | Command | Gates on |
|-------------|---------|----------|
| Citation edges complete | `/citation-backfill` | Topology metrics |
| Indices up to date | `/corpus-index-build` | Group counts, hub analysis |
| Stub rate < 10% | `/research-quality-audit` | Snapshot validity |

If prerequisites are stale, the snapshot will include warnings.
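
The prerequisite table can be read as a set of checks that each yield a warning. The sketch below is one hypothetical way to express that. The `PrereqState` class, its field names, and the warning wording are assumptions made for illustration, not part of the skill itself. A stub rate of exactly 10% is treated as failing, since the table requires a rate below 10%.

```python
from dataclasses import dataclass

STUB_RATE_LIMIT = 0.10  # threshold taken from the prerequisites table


@dataclass
class PrereqState:
    """Hypothetical view of prerequisite freshness; field names are assumptions."""
    citation_edges_complete: bool
    indices_up_to_date: bool
    stub_rate: float  # fraction of stub documents, 0.0 to 1.0


def prerequisite_warnings(state: PrereqState) -> list[str]:
    """Return one warning per stale prerequisite, naming what it gates."""
    warnings = []
    if not state.citation_edges_complete:
        warnings.append(
            "Citation edges incomplete: topology metrics may be inaccurate "
            "(run /citation-backfill)"
        )
    if not state.indices_up_to_date:
        warnings.append(
            "Indices stale: group counts and hub analysis may be inaccurate "
            "(run /corpus-index-build)"
        )
    if state.stub_rate >= STUB_RATE_LIMIT:
        warnings.append(
            f"Stub rate {state.stub_rate:.0%} is not below {STUB_RATE_LIMIT:.0%}: "
            "snapshot validity is reduced (run /research-quality-audit)"
        )
    return warnings
```

For example, `prerequisite_warnings(PrereqState(True, True, 0.17))` returns a single warning about the stub rate.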

## Execution Flow

### Phase 1: Collect Raw Metrics

Scan the corpus and compute:

**Dimensions:**
- Total papers (node count)
- Total citation edges (edge count)
- Topics (unique tag count)
- Authors (unique author count)
- Year range (oldest → newest)
- Source types distribution

**Topology (from citation-network index):**
- Graph density: edges / (nodes * (nodes-1)), the directed-graph form, since each citation edge has a direction
- Average degree (mean edges per node)
- Max hub (node with most connections)
- Connected components count
- Isolated nodes (degree 0)
- Diameter estimate (longest shortest path in largest component)
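
As a rough illustration of the topology metrics listed above, here is a minimal Python sketch. It assumes the graph is given as a list of paper IDs and a list of `(citing, cited)` pairs, that every edge endpoint appears in the node list, that degree counts incoming plus outgoing citations, and that components are weakly connected. The function name and return keys are hypothetical, and the diameter estimate is left out.

```python
from collections import defaultdict


def topology_metrics(nodes, edges):
    """Compute basic topology metrics for a directed citation graph.

    nodes: iterable of paper IDs; edges: iterable of (citing, cited) pairs.
    """
    nodes = list(nodes)
    edges = list(edges)
    n = len(nodes)

    # Degree counts both incoming and outgoing citations (an assumption).
    degree = {node: 0 for node in nodes}
    neighbours = defaultdict(set)
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    # Weakly connected components via iterative depth-first search.
    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:
            current = stack.pop()
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)

    max_hub = max(nodes, key=lambda node: degree[node]) if nodes else None
    return {
        "density": len(edges) / (n * (n - 1)) if n > 1 else 0.0,
        "average_degree": sum(degree.values()) / n if n else 0.0,
        "max_hub": max_hub,
        "components": components,
        "isolated": sum(1 for d in degree.values() if d == 0),
    }
```

Note that an isolated node counts as its own component in this sketch. As a check on the density formula, 372 papers and 1,247 edges give 1247 / (372 × 371) ≈ 0.009.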

**Degree Distribution:**
- Histogram: how many nodes have degree 0, 1-2, 3-5, 6-10, 11-20, 21+
- Power law fit (if applicable)
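
The histogram step amounts to bucketing node degrees. The sketch below shows one way to do it, assuming the top bucket starts at 21 so that a degree of 20 falls into exactly one bucket. The names `BUCKETS` and `degree_histogram` are hypothetical, and the power law fit is not covered.

```python
# Bucket boundaries follow the histogram described above; the last bucket is
# assumed to start at 21 so that no degree falls into two buckets.
BUCKETS = [
    ("0", 0, 0),
    ("1-2", 1, 2),
    ("3-5", 3, 5),
    ("6-10", 6, 10),
    ("11-20", 11, 20),
    ("21+", 21, None),
]


def degree_histogram(degrees):
    """Count how many nodes fall into each degree bucket."""
    counts = {label: 0 for label, _, _ in BUCKETS}
    for d in degrees:
        for label, low, high in BUCKETS:
            if d >= low and (high is None or d <= high):
                counts[label] += 1
                break
    return counts
```

The returned dictionary keeps the bucket order, so it can be rendered directly as table rows.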

**Quality Distribution:**
- GRADE breakdown: High / Moderate / Low / Very Low
- Doc depth: Full / Adequate / Stub / Skeleton (from quality-audit)
- Source availability: PDF present / Full text extracted / Missing
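
The GRADE breakdown can be sketched as a simple percentage count. This assumes each paper carries exactly one of the four GRADE labels as a string; the function name and the whole-percent rounding are illustrative choices.

```python
from collections import Counter

GRADE_LEVELS = ("High", "Moderate", "Low", "Very Low")


def grade_breakdown(grades):
    """Return the percentage of papers at each GRADE level, in whole percents."""
    grades = list(grades)
    counts = Counter(grades)
    total = len(grades)
    if total == 0:
        return {level: 0 for level in GRADE_LEVELS}
    return {level: round(100 * counts[level] / total) for level in GRADE_LEVELS}
```

Because each level is rounded independently, the percentages may not sum to exactly 100.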

### Phase 2: Compute Delta (if previous snapshot exists)

Compare current metrics against the most recent snapshot:

```
Delta from previous snapshot (2026-04-10):
  Papers:     +12 (360 → 372)
  Edges:      +87 (1,160 → 1,247)
  Density:    +0.001 (0.008 → 0.009)
  New topics:  +2 (gui-agents, code-generation)
  Stubs fixed: 23 (88 → 65)
  New hubs:    REF-364 (entered top 10)
```
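
The count lines in a delta listing like the one above can be derived from two metric dictionaries. The following is a minimal sketch, assuming both snapshots expose integer metrics under the same keys. The function names and the key names `papers` and `edges` are assumptions; float metrics such as density would need rounding before formatting.

```python
def compute_delta(previous, current, keys=("papers", "edges")):
    """Return {key: (change, old, new)} for metrics present in both snapshots."""
    delta = {}
    for key in keys:
        if key in previous and key in current:
            old, new = previous[key], current[key]
            delta[key] = (new - old, old, new)
    return delta


def format_delta_line(label, change, old, new):
    """Render one line in the style of the delta listing."""
    return f"{label}: {change:+,} ({old:,} → {new:,})"
```

For instance, `format_delta_line("Edges", *compute_delta({"edges": 1160}, {"edges": 1247})["edges"])` yields `Edges: +87 (1,160 → 1,247)`.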

### Phase 3: Fill Template Sections

Read the snapshot template and fill sections:

**`[COMPUTE]` sections** — fully automated:
- Dimensions table
- Topology metrics
- Degree distribution histogram
- GRADE distribution
- Delta table

**`[ANALYZE]` sections** — agent-assisted:
- **Cluster narrative**: describe the main clusters and their themes
- **Chain analysis**: identify citation chains (A→B→C→D) and their significance
- **Gap narrative**: summarize disconnected areas and bridge opportunities
- **Trend analysis**: what's growing, what's stagnant

### Phase 4: Write Report

Write the completed snapshot to:
```
.aiwg/reports/corpus-snapshot-YYYY-MM-DD.md
```

With frontmatter:
```yaml
---
type: corpus-snapshot
date: 2026-04-13
papers: 372
edges: 1247
density: 0.009
components: 9
stub_rate: 0.17
previous: corpus-snapshot-2026-04-10.md
---
```
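
The dated file name and the frontmatter block can both be produced with a few lines of Python. This is a sketch under the assumption that all frontmatter values are simple scalars that need no YAML quoting; the helper names are hypothetical.

```python
from datetime import date


def report_path(snapshot_date: date) -> str:
    """Build the report path using the YYYY-MM-DD naming shown above."""
    return f".aiwg/reports/corpus-snapshot-{snapshot_date.isoformat()}.md"


def render_frontmatter(fields: dict) -> str:
    """Render a flat mapping as a frontmatter block, preserving key order."""
    lines = ["---"]
    lines += [f"{key}: {value}" for key, value in fields.items()]
    lines.append("---")
    return "\n".join(lines) + "\n"
```

Values containing colons, quotes, or newlines would need a real YAML serializer rather than this string formatting.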

### Phase 5: Report Summary

```
Corpus Snapshot Generated
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Papers: 372 (+12)  |  Edges: 1,247 (+87)
Density: 0.009     |  Components: 9
Hub: REF-016 (34)  |  Isolated: 3
GRADE: 33% High, 24% Mod, 26% Low, 16% VLow
Stubs: 65 (17%)    |  Full text: 54%

Delta highlights:
  +12 papers inducted
  +87 citation edges (backfill)
  -23 stubs (expanded)
  +2 new topics

Written to: .aiwg/reports/corpus-snapshot-2026-04-13.md
```

## Template Format

The default template uses markers for computed vs analyzed sections:

```markdown
# Corpus Snapshot — [DATE]

## Dimensions
[COMPUTE: dimensions-table]

## Topology
[COMPUTE: topology-metrics]

## Degree Distribution
[COMPUTE: degree-histogram]

## Quality Distribution
[COMPUTE: grade-distribution]
[COMPUTE: depth-distribution]

## Delta
[COMPUTE: delta-from-previous]

## Cluster Analysis
[ANALYZE: describe main clusters, their themes, and notable papers]

## Citation Chains
[ANALYZE: identify significant citation chains and their meaning]

## Gaps and Opportunities
[ANALYZE: summarize disconnected areas and bridge opportunities]

## Recommendations
[ANALYZE: what should be inducted next, what needs expansion]
```
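
Filling the `[COMPUTE: ...]` markers can be sketched as a regular-expression substitution. The example assumes marker names are lowercase with hyphens, as in the template above, and that each computed section is already rendered to a string. The function name and the choice to leave unknown markers in place are illustrative, not the skill's documented behavior.

```python
import re

# Matches markers such as [COMPUTE: dimensions-table] and captures the name.
MARKER = re.compile(r"\[COMPUTE:\s*([a-z0-9-]+)\]")


def fill_compute_sections(template: str, computed: dict) -> str:
    """Replace each [COMPUTE: name] marker with computed[name].

    Markers with no computed value are left in place so gaps stay visible;
    [ANALYZE: ...] markers are never touched.
    """
    def substitute(match):
        name = match.group(1)
        return computed.get(name, match.group(0))

    return MARKER.sub(substitute, template)
```

Leaving unmatched markers untouched makes an incomplete report easy to spot, and the `[ANALYZE: ...]` sections remain available for the agent-assisted pass.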

## Integration Points

| Component | Relationship |
|-----------|-------------|
| `corpus-index-build` | Reads index metrics (topology, hubs, components) |
| `research-quality-audit` | Reads depth distribution; gates if stub rate > 10% |
| `citation-backfill` | Must run before snapshot for accurate topology |
| `research-gap-detect` | Cluster data feeds into gap narrative |
| `research-status` | Snapshot is the detailed version of the health score |

## Examples

```bash
# Full snapshot with analysis
/corpus-snapshot

# Just data, no analysis sections
/corpus-snapshot --compute-only

# Delta from previous snapshot only
/corpus-snapshot --delta-only

# Custom template
/corpus-snapshot --template .aiwg/reports/custom-template.md

# JSON metrics for dashboards
/corpus-snapshot --format json
```

## References

- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/corpus-index-build/SKILL.md — Index metrics source
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-quality-audit/SKILL.md — Depth distribution source
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/citation-backfill/SKILL.md — Prerequisite for topology
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-gap-detect/SKILL.md — Cluster data for narrative
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-status/SKILL.md — Health scoring complement
