Codex

research-archive

Package research artifacts for long-term archival

104 stars

Best use case

research-archive is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

It is a strong fit for teams already working in Codex.

Package research artifacts for long-term archival

Teams using research-archive should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/research-archive/SKILL.md --create-dirs "https://raw.githubusercontent.com/jmagly/aiwg/main/.agents/skills/research-archive/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/research-archive/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How research-archive Compares

Feature / Agentresearch-archiveStandard Approach
Platform SupportCodexLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Package research artifacts for long-term archival

Which AI agents support this skill?

This skill is designed for Codex.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Research Archive Command

Package research artifacts for long-term preservation following archival best practices.

## Instructions

When invoked, create archival packages:

1. **Identify Artifacts**
   - If REF-XXX specified, collect all artifacts for that paper
   - If --all, collect entire research corpus
   - Gather: PDF, finding document, metadata, provenance records, quality assessments

2. **Validate Integrity**
   - Verify all checksums against fixity manifest
   - Confirm no file corruption
   - Check metadata completeness
   - Ensure provenance chains are complete

3. **Create Archive Package**
   - Package format options:
     - **BagIt** (default) - Library of Congress standard for digital preservation
     - **ZIP** - Universal compressed format
     - **TAR.GZ** - POSIX archival format
   - Include manifest with checksums
   - Add archive metadata (creation date, creator, package contents)

4. **Generate Archival Metadata**
   - Dublin Core metadata record
   - PREMIS preservation metadata
   - Research corpus inventory
   - Provenance summary

5. **Verify Package**
   - Validate package structure
   - Verify all files present
   - Check checksums
   - Test package extraction

6. **Store and Register**
   - Save to `.aiwg/research/archives/`
   - Register in archival index
   - Generate archival report
   - Create retrieval instructions

## Arguments

- `[ref-id or --all]` - Specific paper or entire corpus (required)
- `--format [bagit|zip|tar]` - Archive format (default: bagit)
- `--output [path]` - Custom output location (default: .aiwg/research/archives/)
- `--verify` - Perform integrity verification after creation
- `--compression [none|gzip|bzip2]` - Compression level (default: gzip)
- `--include-notes` - Include literature notes in package
- `--metadata-only` - Create metadata package without PDFs

## BagIt Format (Default)

BagIt is the Library of Congress standard for digital preservation:

```
REF-022-archive/
├── bagit.txt                    # BagIt declaration
├── bag-info.txt                 # Package metadata
├── manifest-sha256.txt          # Checksums for data files
├── tagmanifest-sha256.txt       # Checksums for tag files
└── data/
    ├── REF-022.pdf              # Source paper
    ├── REF-022-autogen.md       # Finding document
    ├── metadata.yaml            # Extracted metadata
    ├── provenance.yaml          # Provenance records
    └── quality-assessment.yaml  # GRADE assessment
```

## Examples

```bash
# Archive single paper in BagIt format
/research-archive REF-022

# Archive with verification
/research-archive REF-022 --verify

# Archive entire corpus
/research-archive --all --format bagit

# Archive with custom output
/research-archive REF-022 --output /backup/research-archives/

# Create metadata-only archive for sharing
/research-archive REF-022 --metadata-only --format zip

# Archive multiple papers
/research-archive REF-001 REF-013 REF-022 --format tar
```

## Expected Output

```
Creating Archive: REF-022
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 1: Collecting artifacts
  ✓ Source PDF: .aiwg/research/sources/REF-022.pdf (2.4 MB)
  ✓ Finding: .aiwg/research/findings/REF-022-autogen.md (12 KB)
  ✓ Metadata: Extracted from frontmatter
  ✓ Provenance: .aiwg/research/provenance/records/REF-022-acquisition.yaml
  ✓ Quality: .aiwg/research/quality-assessments/REF-022-assessment.yaml
  ✓ Literature notes: .aiwg/research/literature-notes/REF-022-notes.md

Step 2: Validating integrity
  ✓ PDF checksum verified: a1b2c3d4e5f6...
  ✓ All files present and intact
  ✓ Metadata complete
  ✓ Provenance chain validated

Step 3: Creating BagIt package
  ✓ BagIt structure created
  ✓ Files copied to data/ directory
  ✓ SHA-256 checksums generated
  ✓ bag-info.txt created with metadata
  ✓ Package size: 2.5 MB

Step 4: Generating archival metadata
  ✓ Dublin Core record created
  ✓ PREMIS preservation metadata added
  ✓ Inventory generated: 6 files

Step 5: Verifying package
  ✓ BagIt validation passed
  ✓ All checksums verified
  ✓ Package structure correct
  ✓ Test extraction successful

Step 6: Registering archive
  ✓ Saved to: .aiwg/research/archives/REF-022-archive-20260203.bag
  ✓ Registered in archival index
  ✓ Archival report generated

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Archive created successfully!

Package: .aiwg/research/archives/REF-022-archive-20260203.bag
Format: BagIt (Library of Congress standard)
Size: 2.5 MB
Files: 6

Contents:
  - REF-022.pdf (source paper)
  - REF-022-autogen.md (finding document)
  - metadata.yaml (frontmatter + enrichment)
  - provenance.yaml (acquisition + documentation history)
  - quality-assessment.yaml (GRADE assessment)
  - literature-notes.md (synthesis notes)

Verification: PASSED

Retrieval Instructions:
  1. Extract: bagit.py --validate REF-022-archive-20260203.bag
  2. Restore to corpus: /research-restore REF-022-archive-20260203.bag

Archive Report: .aiwg/research/archives/REF-022-archive-20260203-report.md
```

## Archival Metadata

Each archive includes comprehensive metadata:

```yaml
# bag-info.txt (BagIt metadata)
Source-Organization: AIWG Research Corpus
Organization-Address: https://github.com/jmagly/aiwg
Contact-Name: AIWG Archival Agent
Contact-Email: research@aiwg.io
External-Description: Research paper archive for REF-022 (AutoGen)
Bagging-Date: 2026-02-03
Bag-Size: 2.5 MB
Payload-Oxum: 2621440.6
External-Identifier: REF-022
Internal-Sender-Identifier: ai-writing-guide/research-corpus
Internal-Sender-Description: AIWG Research Framework

# Dublin Core metadata
dc:title: AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
dc:creator: Wu, Qingyun; Bansal, Gagan; Zhang, Jieyu; et al.
dc:date: 2023
dc:identifier: 10.48550/arXiv.2308.08155
dc:type: Conference Paper
dc:format: application/pdf
dc:language: en

# PREMIS preservation metadata
premis:objectIdentifier: REF-022
premis:originalName: REF-022.pdf
premis:fixity: sha256:a1b2c3d4e5f6...
premis:dateCreatedByApplication: 2026-02-03T12:00:00Z
premis:preservationLevel: bit-level
```

## Archival Index

All archives are tracked in `.aiwg/research/archives/archive-index.yaml`:

```yaml
archives:
  - archive_id: REF-022-archive-20260203
    ref_id: REF-022
    created_at: "2026-02-03T14:30:00Z"
    format: bagit
    size_bytes: 2621440
    file_count: 6
    checksum: "sha256:xyz789..."
    location: ".aiwg/research/archives/REF-022-archive-20260203.bag"
    verified: true
    last_verified: "2026-02-03T14:30:15Z"
```

## Bulk Archival

Archive entire corpus:

```bash
/research-archive --all

Output:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Archiving Entire Research Corpus
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Found 47 papers in corpus

Progress: [████████████████████] 47/47 (100%)

Summary:
  ✓ 47 papers archived
  ✓ Total size: 142.8 MB
  ✓ All packages verified
  ✓ Archival index updated

Archive Bundle: .aiwg/research/archives/corpus-archive-20260203.tar.gz
Manifest: .aiwg/research/archives/corpus-manifest-20260203.yaml

Individual Archives:
  REF-001-archive-20260203.bag
  REF-002-archive-20260203.bag
  ...
  REF-047-archive-20260203.bag
```

## Restoration

Archived packages can be restored using `/research-restore`:

```bash
/research-restore REF-022-archive-20260203.bag
```

## Long-Term Preservation Compliance

Archives follow best practices from:

- Library of Congress BagIt specification
- OAIS (Open Archival Information System) reference model
- Dublin Core metadata standard
- PREMIS preservation metadata standard

## Validation

All archives undergo validation:

- [ ] BagIt specification compliance
- [ ] All files listed in manifest present
- [ ] All checksums match manifest
- [ ] Metadata is complete and valid
- [ ] Package can be extracted successfully
- [ ] Contents can be restored to working corpus

## References

- @$AIWG_ROOT/agentic/code/frameworks/research-complete/agents/archival-agent.md - Archival Agent
- @$AIWG_ROOT/src/research/services/archival-service.ts - Archival implementation
- @.aiwg/research/fixity-manifest.json - Checksum tracking
- @.aiwg/research/archives/README.md - Archival procedures
- https://tools.ietf.org/html/rfc8493 - BagIt specification

Related Skills

Archive Acquisition

104
from jmagly/aiwg

Patterns for acquiring content from Internet Archive and archival sources

verify-archive

104
from jmagly/aiwg

Verify archive integrity with self-verifying SHA-256 checksums, generate VERIFY.md, and optionally create W3C PROV provenance

Codex

research-workflow

104
from jmagly/aiwg

Execute multi-stage research workflows

Codex

research-status

104
from jmagly/aiwg

Show research corpus health and statistics

Codex

research-query

104
from jmagly/aiwg

Search the local research corpus, read matching findings, and synthesize an answer with inline citations to REF-XXX sources. The "query" operation for the research pipeline.

Codex

research-quality

104
from jmagly/aiwg

Assess source quality using GRADE methodology

Codex

research-quality-audit

104
from jmagly/aiwg

Audit research corpus for shallow stubs, incomplete sections, missing source files, and doc depth issues. Detects docs written from abstracts rather than full papers and optionally auto-dispatches expansion agents.

Codex

research-provenance

104
from jmagly/aiwg

Query provenance chains and artifact relationships

Codex

research-lint

104
from jmagly/aiwg

Run the research corpus lint ruleset to detect structural and referential integrity issues — orphan notes, missing frontmatter, broken references, missing GRADE assessments.

Codex

research-gap

104
from jmagly/aiwg

Analyze gaps in research coverage

Codex

research-gap-detect

104
from jmagly/aiwg

Build the mutual citation graph, find connected components, identify isolated clusters, and optionally search for bridge candidates and file gap issues. Automates the manual cluster analysis workflow.

Codex

research-document

104
from jmagly/aiwg

Generate summaries and literature notes from research papers

Codex