corpus-snapshot

Generate a comprehensive corpus snapshot report from template, computing all metrics (dimensions, topology, degree distribution, delta from previous) and assisting with analysis sections (clusters, chains, gaps).

Best use case

corpus-snapshot is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

It is a strong fit for teams already working in Codex.

Teams using corpus-snapshot should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/corpus-snapshot/SKILL.md --create-dirs "https://raw.githubusercontent.com/jmagly/aiwg/main/.agents/skills/corpus-snapshot/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/corpus-snapshot/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How corpus-snapshot Compares

Feature / Agent	corpus-snapshot	Standard Approach
Platform Support	Codex	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

Which AI agents support this skill?

This skill is designed for Codex.

Where can I find the source code?

The source code is in the jmagly/aiwg repository on GitHub, under .agents/skills/corpus-snapshot/.

SKILL.md Source

# Corpus Snapshot

Generate a point-in-time snapshot of the research corpus with computed metrics and analysis. The skill reads a snapshot template, fills `[COMPUTE]` sections with data, assists with `[ANALYZE]` sections, and writes the completed report.

## Triggers

- "take a corpus snapshot"
- "generate corpus report"
- "snapshot the research"
- "corpus snapshot"
- `/corpus-snapshot`

## Parameters

### `--compute-only` (optional)
Only compute data sections — skip analysis sections. Faster, fully automated.

### `--delta-only` (optional)
Only compute the delta from the previous snapshot. Useful for tracking session progress.

### `--template <path>` (optional)
Custom template path. Default: `.aiwg/reports/corpus-snapshot-template.md`.

### `--format` (optional)
Output format: `full` (default for the report file), `summary` (terminal), `json` (programmatic).

## Prerequisites

Before generating a snapshot, the following should be current:

| Prerequisite | Command | Gates on |
|-------------|---------|----------|
| Citation edges complete | `/citation-backfill` | Topology metrics |
| Indices up to date | `/corpus-index-build` | Group counts, hub analysis |
| Stub rate < 10% | `/research-quality-audit` | Snapshot validity |

If prerequisites are stale, the snapshot will include warnings.
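
The stub-rate prerequisite is the only one in the table with a numeric threshold, so it is the easiest to illustrate. The following is a minimal sketch, not the skill's actual check: the function name `stub_rate_warning` and the choice to warn at exactly 10% are assumptions, and how staleness of citation edges or indices is detected is not specified here.

```python
def stub_rate_warning(stub_count, paper_count, threshold=0.10):
    """Return a warning string when the stub rate is not below the threshold.

    Hypothetical helper: the name, signature, and message wording are
    illustrative assumptions, not part of the skill.
    """
    if paper_count == 0:
        return "WARNING: corpus is empty; stub rate is undefined"
    rate = stub_count / paper_count
    if rate >= threshold:
        return (
            f"WARNING: stub rate {rate:.0%} is not below {threshold:.0%}; "
            "run /research-quality-audit before trusting this snapshot"
        )
    return None  # prerequisite satisfied, nothing to report


if __name__ == "__main__":
    # 65 stubs out of 372 papers, the figures used in the example report
    print(stub_rate_warning(65, 372))
```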

## Execution Flow

### Phase 1: Collect Raw Metrics

Scan the corpus and compute:

**Dimensions:**
- Total papers (node count)
- Total citation edges (edge count)
- Topics (unique tag count)
- Authors (unique author count)
- Year range (oldest → newest)
- Source types distribution

**Topology (from citation-network index):**
- Graph density: edges / (nodes * (nodes - 1)), treating citations as directed edges
- Average degree (mean edges per node)
- Max hub (node with most connections)
- Connected components count
- Isolated nodes (degree 0)
- Diameter estimate (longest shortest path in largest component)
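
Most of the topology metrics above can be derived from a plain edge list. The sketch below is one possible way to do it, under stated assumptions: the function name `topology_metrics` and its input shape are hypothetical, degree is counted as in-degree plus out-degree, components are computed on the undirected view of the graph, and the diameter estimate is left out.

```python
from collections import deque


def topology_metrics(nodes, edges):
    """Compute basic topology metrics from (citing, cited) pairs.

    Hypothetical helper; the real skill reads these from the
    citation-network index rather than recomputing them.
    """
    edge_set = {(a, b) for a, b in edges if a != b}  # drop duplicates and self-citations
    all_nodes = set(nodes) | {n for edge in edge_set for n in edge}
    n, m = len(all_nodes), len(edge_set)

    neighbors = {node: set() for node in all_nodes}
    degree = {node: 0 for node in all_nodes}
    for citing, cited in edge_set:
        neighbors[citing].add(cited)
        neighbors[cited].add(citing)
        degree[citing] += 1  # out-degree contribution
        degree[cited] += 1   # in-degree contribution

    # Count connected components with a breadth-first search.
    seen, components = set(), 0
    for start in all_nodes:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                queue.append(nxt)

    hub = max(sorted(degree), key=degree.get) if degree else None
    return {
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
        "average_degree": 2 * m / n if n else 0.0,
        "max_hub": (hub, degree[hub]) if hub is not None else None,
        "components": components,
        "isolated": sorted(node for node, d in degree.items() if d == 0),
    }


if __name__ == "__main__":
    print(topology_metrics(["A", "B", "C", "D", "E"],
                           [("A", "B"), ("C", "B"), ("D", "B")]))
```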

**Degree Distribution:**
- Histogram: how many nodes have degree 0, 1-2, 3-5, 6-10, 11-20, 21+
- Power law fit (if applicable)
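
The histogram can be produced by binning each node's degree into fixed ranges. This is a small sketch assuming the final bucket means "above 20" so that it does not overlap the 11-20 range; the names `BUCKETS` and `degree_histogram` are illustrative, and the power law fit is not covered.

```python
from collections import Counter

# (label, low, high) ranges, inclusive on both ends
BUCKETS = [("0", 0, 0), ("1-2", 1, 2), ("3-5", 3, 5), ("6-10", 6, 10), ("11-20", 11, 20)]
OVERFLOW_LABEL = "21+"  # assumption: everything above 20 lands here


def degree_histogram(degrees):
    """Count how many nodes fall into each degree bucket, in display order."""
    counts = Counter()
    for d in degrees:
        for label, low, high in BUCKETS:
            if low <= d <= high:
                counts[label] += 1
                break
        else:
            counts[OVERFLOW_LABEL] += 1
    labels = [label for label, _, _ in BUCKETS] + [OVERFLOW_LABEL]
    return {label: counts[label] for label in labels}


if __name__ == "__main__":
    print(degree_histogram([0, 0, 1, 2, 3, 7, 20, 21, 34]))
```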

**Quality Distribution:**
- GRADE breakdown: High / Moderate / Low / Very Low
- Doc depth: Full / Adequate / Stub / Skeleton (from quality-audit)
- Source availability: PDF present / Full text extracted / Missing

### Phase 2: Compute Delta (if previous snapshot exists)

Compare current metrics against the most recent snapshot:

```
Delta from previous snapshot (2026-04-10):
  Papers:     +12 (360 → 372)
  Edges:      +87 (1,160 → 1,247)
  Density:    +0.001 (0.008 → 0.009)
  New topics:  +2 (gui-agents, code-generation)
  Stubs fixed: 23 (88 → 65)
  New hubs:    REF-364 (entered top 10)
```
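
The numeric part of a delta like the one above is a field-by-field subtraction between two snapshots. The sketch below assumes each snapshot's metrics are available as a dictionary (for instance, parsed from report frontmatter); `compute_delta` and its key names are hypothetical, and topic or hub changes are not handled.

```python
def compute_delta(previous, current, keys=("papers", "edges", "density")):
    """Format signed differences between two metric dictionaries.

    Hypothetical helper: integers are shown with thousands separators,
    floats with three decimal places.
    """
    lines = []
    for key in keys:
        old, new = previous[key], current[key]
        diff = new - old
        if isinstance(old, float) or isinstance(new, float):
            lines.append(f"{key}: {diff:+.3f} ({old:.3f} → {new:.3f})")
        else:
            lines.append(f"{key}: {diff:+,} ({old:,} → {new:,})")
    return lines


if __name__ == "__main__":
    prev = {"papers": 360, "edges": 1160, "density": 0.008}
    curr = {"papers": 372, "edges": 1247, "density": 0.009}
    for line in compute_delta(prev, curr):
        print(line)
```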

### Phase 3: Fill Template Sections

Read the snapshot template and fill sections:

**`[COMPUTE]` sections** — fully automated:
- Dimensions table
- Topology metrics
- Degree distribution histogram
- GRADE distribution
- Delta table

**`[ANALYZE]` sections** — agent-assisted:
- **Cluster narrative**: describe the main clusters and their themes
- **Chain analysis**: identify citation chains (A→B→C→D) and their significance
- **Gap narrative**: summarize disconnected areas and bridge opportunities
- **Trend analysis**: what's growing, what's stagnant

### Phase 4: Write Report

Write the completed snapshot to:
```
.aiwg/reports/corpus-snapshot-YYYY-MM-DD.md
```

With frontmatter:
```yaml
---
type: corpus-snapshot
date: 2026-04-13
papers: 372
edges: 1247
density: 0.009
components: 9
stub_rate: 0.17
previous: corpus-snapshot-2026-04-10.md
---
```
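
Building the dated report path and the frontmatter block is mechanical. The following sketch shows one way to do both with the standard library only; `report_path` and `render_frontmatter` are hypothetical names, and the renderer handles only flat scalar fields like the ones shown above.

```python
from datetime import date
from pathlib import Path


def report_path(snapshot_date, reports_dir=".aiwg/reports"):
    """Return the dated report path, e.g. corpus-snapshot-2026-04-13.md."""
    return Path(reports_dir) / f"corpus-snapshot-{snapshot_date.isoformat()}.md"


def render_frontmatter(fields):
    """Render a flat dict as a YAML frontmatter block (scalar values only)."""
    lines = ["---"]
    lines += [f"{key}: {value}" for key, value in fields.items()]
    lines.append("---")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    day = date(2026, 4, 13)
    print(report_path(day).as_posix())
    print(render_frontmatter({
        "type": "corpus-snapshot",
        "date": day.isoformat(),
        "papers": 372,
        "edges": 1247,
    }))
```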

### Phase 5: Report Summary

```
Corpus Snapshot Generated
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Papers: 372 (+12)  |  Edges: 1,247 (+87)
Density: 0.009     |  Components: 9
Hub: REF-016 (34)  |  Isolated: 3
GRADE: 33% High, 24% Mod, 26% Low, 16% VLow
Stubs: 65 (17%)    |  Full text: 54%

Delta highlights:
  +12 papers inducted
  +87 citation edges (backfill)
  -23 stubs (expanded)
  +2 new topics

Written to: .aiwg/reports/corpus-snapshot-2026-04-13.md
```

## Template Format

The default template uses markers for computed vs analyzed sections:

```markdown
# Corpus Snapshot — [DATE]

## Dimensions
[COMPUTE: dimensions-table]

## Topology
[COMPUTE: topology-metrics]

## Degree Distribution
[COMPUTE: degree-histogram]

## Quality Distribution
[COMPUTE: grade-distribution]
[COMPUTE: depth-distribution]

## Delta
[COMPUTE: delta-from-previous]

## Cluster Analysis
[ANALYZE: describe main clusters, their themes, and notable papers]

## Citation Chains
[ANALYZE: identify significant citation chains and their meaning]

## Gaps and Opportunities
[ANALYZE: summarize disconnected areas and bridge opportunities]

## Recommendations
[ANALYZE: what should be inducted next, what needs expansion]
```
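
Filling a template in this format amounts to substituting each `[COMPUTE: name]` marker with rendered content while leaving `[ANALYZE: ...]` markers for the agent. The sketch below is an illustration under assumptions, not the skill's implementation: `fill_template` is a hypothetical function, unknown `[COMPUTE]` names are left in place, and `[DATE]` is replaced with a caller-supplied string.

```python
import re

# Matches [COMPUTE: some-name] and captures the name.
COMPUTE_MARKER = re.compile(r"\[COMPUTE:\s*([^\]]+)\]")


def fill_template(template, computed, date):
    """Replace [DATE] and known [COMPUTE: name] markers.

    [ANALYZE: ...] markers are left untouched so the agent can fill
    them in a later pass.
    """
    def substitute(match):
        name = match.group(1).strip()
        # Keep the original marker when no content was computed for it.
        return computed.get(name, match.group(0))

    filled = COMPUTE_MARKER.sub(substitute, template)
    return filled.replace("[DATE]", date)


if __name__ == "__main__":
    template = (
        "# Corpus Snapshot: [DATE]\n\n"
        "## Dimensions\n[COMPUTE: dimensions-table]\n\n"
        "## Cluster Analysis\n[ANALYZE: describe main clusters]\n"
    )
    print(fill_template(template, {"dimensions-table": "| Papers | 372 |"}, "2026-04-13"))
```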

## Integration Points

| Component | Relationship |
|-----------|-------------|
| `corpus-index-build` | Reads index metrics (topology, hubs, components) |
| `research-quality-audit` | Reads depth distribution; gates if stub rate > 10% |
| `citation-backfill` | Must run before snapshot for accurate topology |
| `research-gap-detect` | Cluster data feeds into gap narrative |
| `research-status` | Snapshot is the detailed version of the health score |

## Examples

```bash
# Full snapshot with analysis
/corpus-snapshot

# Just data, no analysis sections
/corpus-snapshot --compute-only

# Delta from previous snapshot only
/corpus-snapshot --delta-only

# Custom template
/corpus-snapshot --template .aiwg/reports/custom-template.md

# JSON metrics for dashboards
/corpus-snapshot --format json
```

## References

- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/corpus-index-build/SKILL.md — Index metrics source
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-quality-audit/SKILL.md — Depth distribution source
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/citation-backfill/SKILL.md — Prerequisite for topology
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-gap-detect/SKILL.md — Cluster data for narrative
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-status/SKILL.md — Health scoring complement
