corpus-snapshot
Generate a comprehensive corpus snapshot report from template, computing all metrics (dimensions, topology, degree distribution, delta from previous) and assisting with analysis sections (clusters, chains, gaps).
Best use case
corpus-snapshot is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
It is a strong fit for teams already working in Codex.
Teams using corpus-snapshot should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/corpus-snapshot/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How corpus-snapshot Compares
| Feature / Agent | corpus-snapshot | Standard Approach |
|---|---|---|
| Platform Support | Codex | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Generate a comprehensive corpus snapshot report from template, computing all metrics (dimensions, topology, degree distribution, delta from previous) and assisting with analysis sections (clusters, chains, gaps).
Which AI agents support this skill?
This skill is designed for Codex.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Corpus Snapshot

Generate a point-in-time snapshot of the research corpus with computed metrics and analysis. Reads a snapshot template, fills `[COMPUTE]` sections with data, assists with `[ANALYZE]` sections, and writes the completed report.

## Triggers

- "take a corpus snapshot"
- "generate corpus report"
- "snapshot the research"
- "corpus snapshot"
- `/corpus-snapshot`

## Parameters

### `--compute-only` (optional)

Only compute data sections; skip analysis sections. Faster, fully automated.

### `--delta-only` (optional)

Only compute the delta from the previous snapshot. Useful for tracking session progress.

### `--template <path>` (optional)

Custom template path. Default: `.aiwg/reports/corpus-snapshot-template.md`.

### `--format` (optional)

Output format: `full` (default for the report file), `summary` (terminal), `json` (programmatic).

## Prerequisites

Before generating a snapshot, the following should be current:

| Prerequisite | Command | Gates on |
|-------------|---------|----------|
| Citation edges complete | `/citation-backfill` | Topology metrics |
| Indices up to date | `/corpus-index-build` | Group counts, hub analysis |
| Stub rate < 10% | `/research-quality-audit` | Snapshot validity |

If prerequisites are stale, the snapshot will include warnings.
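The prerequisite gating described above could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the `PrerequisiteState` class, its field names, and the warning wording are all hypothetical, and only the 10% stub-rate threshold and the command names come from the table.

```python
from dataclasses import dataclass
from typing import List

# From the Prerequisites table: the stub rate should be below 10%.
STUB_RATE_LIMIT = 0.10


@dataclass
class PrerequisiteState:
    """Hypothetical summary of prerequisite status (field names are illustrative)."""
    citation_edges_complete: bool
    indices_up_to_date: bool
    stub_rate: float  # fraction of stub documents, e.g. 0.17 for 17%


def prerequisite_warnings(state: PrerequisiteState) -> List[str]:
    """Return one warning per stale prerequisite, naming what it gates."""
    warnings = []
    if not state.citation_edges_complete:
        warnings.append(
            "Citation edges incomplete: run /citation-backfill "
            "(affects topology metrics)"
        )
    if not state.indices_up_to_date:
        warnings.append(
            "Indices stale: run /corpus-index-build "
            "(affects group counts, hub analysis)"
        )
    if state.stub_rate >= STUB_RATE_LIMIT:
        warnings.append(
            f"Stub rate {state.stub_rate:.0%} is not below 10%: "
            "run /research-quality-audit (affects snapshot validity)"
        )
    return warnings


if __name__ == "__main__":
    # A corpus with current edges and indices but a 17% stub rate.
    for line in prerequisite_warnings(PrerequisiteState(True, True, 0.17)):
        print("WARNING:", line)
```

Returning a list of warnings rather than raising an error mirrors the stated behaviour that a stale prerequisite adds warnings to the snapshot instead of blocking it.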
## Execution Flow

### Phase 1: Collect Raw Metrics

Scan the corpus and compute:

**Dimensions:**
- Total papers (node count)
- Total citation edges (edge count)
- Topics (unique tag count)
- Authors (unique author count)
- Year range (oldest → newest)
- Source types distribution

**Topology (from citation-network index):**
- Graph density: `edges / (nodes * (nodes - 1))`
- Average degree (mean edges per node)
- Max hub (node with most connections)
- Connected components count
- Isolated nodes (degree 0)
- Diameter estimate (longest shortest path in largest component)

**Degree Distribution:**
- Histogram: how many nodes have degree 0, 1-2, 3-5, 6-10, 11-20, 21+
- Power law fit (if applicable)

**Quality Distribution:**
- GRADE breakdown: High / Moderate / Low / Very Low
- Doc depth: Full / Adequate / Stub / Skeleton (from quality-audit)
- Source availability: PDF present / Full text extracted / Missing

### Phase 2: Compute Delta (if previous snapshot exists)

Compare current metrics against the most recent snapshot:

```
Delta from previous snapshot (2026-04-10):
  Papers: +12 (360 → 372)
  Edges: +87 (1,160 → 1,247)
  Density: +0.001 (0.008 → 0.009)
  New topics: +2 (gui-agents, code-generation)
  Stubs fixed: 23 (88 → 65)
  New hubs: REF-364 (entered top 10)
```

### Phase 3: Fill Template Sections

Read the snapshot template and fill sections:

**`[COMPUTE]` sections** (fully automated):
- Dimensions table
- Topology metrics
- Degree distribution histogram
- GRADE distribution
- Delta table

**`[ANALYZE]` sections** (agent-assisted):
- **Cluster narrative**: describe the main clusters and their themes
- **Chain analysis**: identify citation chains (A→B→C→D) and their significance
- **Gap narrative**: summarize disconnected areas and bridge opportunities
- **Trend analysis**: what's growing, what's stagnant

### Phase 4: Write Report

Write the completed snapshot to:

```
.aiwg/reports/corpus-snapshot-YYYY-MM-DD.md
```

With frontmatter:

```yaml
---
type: corpus-snapshot
date: 2026-04-13
papers: 372
edges: 1247
density: 0.009
components: 9
stub_rate: 0.17
previous: corpus-snapshot-2026-04-10.md
---
```

### Phase 5: Report Summary

```
Corpus Snapshot Generated
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Papers: 372 (+12) | Edges: 1,247 (+87)
Density: 0.009 | Components: 9
Hub: REF-016 (34) | Isolated: 3
GRADE: 33% High, 24% Mod, 26% Low, 16% VLow
Stubs: 65 (17%) | Full text: 54%

Delta highlights:
  +12 papers inducted
  +87 citation edges (backfill)
  -23 stubs (expanded)
  +2 new topics

Written to: .aiwg/reports/corpus-snapshot-2026-04-13.md
```

## Template Format

The default template uses markers for computed vs analyzed sections:

```markdown
# Corpus Snapshot — [DATE]

## Dimensions
[COMPUTE: dimensions-table]

## Topology
[COMPUTE: topology-metrics]

## Degree Distribution
[COMPUTE: degree-histogram]

## Quality Distribution
[COMPUTE: grade-distribution]
[COMPUTE: depth-distribution]

## Delta
[COMPUTE: delta-from-previous]

## Cluster Analysis
[ANALYZE: describe main clusters, their themes, and notable papers]

## Citation Chains
[ANALYZE: identify significant citation chains and their meaning]

## Gaps and Opportunities
[ANALYZE: summarize disconnected areas and bridge opportunities]

## Recommendations
[ANALYZE: what should be inducted next, what needs expansion]
```

## Integration Points

| Component | Relationship |
|-----------|-------------|
| `corpus-index-build` | Reads index metrics (topology, hubs, components) |
| `research-quality-audit` | Reads depth distribution; gates if stub rate > 10% |
| `citation-backfill` | Must run before snapshot for accurate topology |
| `research-gap-detect` | Cluster data feeds into gap narrative |
| `research-status` | Snapshot is the detailed version of the health score |

## Examples

```bash
# Full snapshot with analysis
/corpus-snapshot

# Just data, no analysis sections
/corpus-snapshot --compute-only

# Delta from previous snapshot only
/corpus-snapshot --delta-only

# Custom template
/corpus-snapshot --template .aiwg/reports/custom-template.md

# JSON metrics for dashboards
/corpus-snapshot --format json
```

## References

- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/corpus-index-build/SKILL.md: Index metrics source
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-quality-audit/SKILL.md: Depth distribution source
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/citation-backfill/SKILL.md: Prerequisite for topology
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-gap-detect/SKILL.md: Cluster data for narrative
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/skills/research-status/SKILL.md: Health scoring complement
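
The Phase 1 topology and degree-distribution metrics from the SKILL.md above can be illustrated with a small sketch. It makes several assumptions that the source does not state: the citation network is available as a list of node ids plus `(citing, cited)` pairs, every edge endpoint appears in the node list, a node's degree counts incoming and outgoing citations together, components are computed ignoring edge direction, and the open-ended top histogram bin starts above 20 so that bins do not overlap. The function names are hypothetical. The diameter estimate and power-law fit are omitted.

```python
# Closed bins from the Degree Distribution list; degrees above 20 fall into "21+".
DEGREE_BINS = [(0, 0, "0"), (1, 2, "1-2"), (3, 5, "3-5"), (6, 10, "6-10"), (11, 20, "11-20")]
TOP_BIN = "21+"


def degree_histogram(degrees):
    """Count how many nodes fall into each degree bin."""
    hist = {label: 0 for _, _, label in DEGREE_BINS}
    hist[TOP_BIN] = 0
    for d in degrees:
        for low, high, label in DEGREE_BINS:
            if low <= d <= high:
                hist[label] += 1
                break
        else:
            # No closed bin matched, so the degree is above 20.
            hist[TOP_BIN] += 1
    return hist


def topology_metrics(nodes, edges):
    """Compute density, average degree, max hub, components, and isolated nodes.

    nodes: iterable of node ids; edges: iterable of (citing, cited) pairs.
    Assumes no duplicate edges and no self-citations.
    """
    nodes = list(nodes)
    edges = list(edges)
    n = len(nodes)

    degree = {node: 0 for node in nodes}
    neighbors = {node: set() for node in nodes}
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    # Count connected components with an iterative depth-first search.
    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            current = stack.pop()
            for nxt in neighbors[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)

    hub = max(degree, key=degree.get) if nodes else None
    return {
        # Directed-graph density, as given in the Topology list.
        "density": len(edges) / (n * (n - 1)) if n > 1 else 0.0,
        "average_degree": sum(degree.values()) / n if n else 0.0,
        "max_hub": (hub, degree[hub]) if hub is not None else None,
        "components": components,
        "isolated": sum(1 for d in degree.values() if d == 0),
        "degree_histogram": degree_histogram(degree.values()),
    }


if __name__ == "__main__":
    # Toy five-paper corpus: E is isolated, A is the hub.
    toy_nodes = ["A", "B", "C", "D", "E"]
    toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "A")]
    print(topology_metrics(toy_nodes, toy_edges))
```

As a sanity check on the density formula, the sample frontmatter's 372 papers and 1,247 edges give 1247 / (372 × 371) ≈ 0.009, which matches the reported `density: 0.009`.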
Related Skills
snapshot
Capture, list, show, or replay point-in-time workflow snapshots so execution state can be preserved and reproduced
corpus-index-build
Build graph indices (by-topic, by-year, authors, citation-network) from corpus state using definitions in .aiwg/config.yaml. Replaces manual 3-agent dispatch with a single command.
corpus-health
Report on research corpus health, completeness, and integrity
corpus-export
Package corpus subsets as distribution archives. Select papers by cluster, topic, REF range, or custom filter; bundle PDFs, analysis docs, citation sidecars, web sources, and BibTeX into a tar.gz with manifest.
aiwg-orchestrate
Route structured artifact work to AIWG workflows via MCP with zero parent context cost
venv-manager
Create, manage, and validate Python virtual environments. Use for project isolation and dependency management.
pytest-runner
Execute Python tests with pytest, supporting fixtures, markers, coverage, and parallel execution. Use for Python test automation.
vitest-runner
Execute JavaScript/TypeScript tests with Vitest, supporting coverage, watch mode, and parallel execution. Use for JS/TS test automation.
eslint-checker
Run ESLint for JavaScript/TypeScript code quality and style enforcement. Use for static analysis and auto-fixing.
repo-analyzer
Analyze GitHub repositories for structure, documentation, dependencies, and contribution patterns. Use for codebase understanding and health assessment.
pr-reviewer
Review GitHub pull requests for code quality, security, and best practices. Use for automated PR feedback and approval workflows.
YouTube Acquisition
yt-dlp patterns for acquiring content from YouTube and video platforms