AI Agent Skill HUB

Codex

research-archive

Package research artifacts for long-term archival


Best use case

research-archive is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

It is a strong fit for teams already working in Codex.

Teams using research-archive should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/research-archive/SKILL.md --create-dirs "https://raw.githubusercontent.com/jmagly/aiwg/main/.agents/skills/research-archive/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/research-archive/SKILL.md inside your project.
3. Restart your AI agent; it should auto-discover the skill.

How research-archive Compares

Feature / Agent	research-archive	Standard Approach
Platform Support	Codex	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Package research artifacts for long-term archival

Which AI agents support this skill?

This skill is designed for Codex.

Where can I find the source code?

The source is in the jmagly/aiwg repository on GitHub; the skill file lives at .agents/skills/research-archive/SKILL.md.

SKILL.md Source

# Research Archive Command

Package research artifacts for long-term preservation following archival best practices.

## Instructions

When invoked, create archival packages:

1. **Identify Artifacts**
   - If REF-XXX specified, collect all artifacts for that paper
   - If --all, collect entire research corpus
   - Gather: PDF, finding document, metadata, provenance records, quality assessments

2. **Validate Integrity**
   - Verify all checksums against fixity manifest
   - Confirm no file corruption
   - Check metadata completeness
   - Ensure provenance chains are complete

3. **Create Archive Package**
   - Package format options:
     - **BagIt** (default) - Library of Congress standard for digital preservation
     - **ZIP** - Universal compressed format
     - **TAR.GZ** - gzip-compressed tar archive (tar is the POSIX archival format)
   - Include manifest with checksums
   - Add archive metadata (creation date, creator, package contents)

4. **Generate Archival Metadata**
   - Dublin Core metadata record
   - PREMIS preservation metadata
   - Research corpus inventory
   - Provenance summary

5. **Verify Package**
   - Validate package structure
   - Verify all files present
   - Check checksums
   - Test package extraction

6. **Store and Register**
   - Save to `.aiwg/research/archives/`
   - Register in archival index
   - Generate archival report
   - Create retrieval instructions
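
The integrity check in step 2 can be sketched in Python. This is a minimal illustration, not the AIWG implementation: the manifest layout (a JSON object mapping relative paths to SHA-256 hex digests) and the names `sha256_of` and `verify_fixity` are assumptions, since the actual format of `.aiwg/research/fixity-manifest.json` is not described here.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(root: Path, manifest_path: Path) -> dict[str, str]:
    """Compare files under root with a manifest; return {path: problem}.

    Assumed (hypothetical) manifest layout: {"relative/path": "<sha256 hex>"}.
    An empty result means every listed file exists and matches.
    """
    expected = json.loads(manifest_path.read_text(encoding="utf-8"))
    problems: dict[str, str] = {}
    for rel_path, want in expected.items():
        target = root / rel_path
        if not target.is_file():
            problems[rel_path] = "missing"
        elif sha256_of(target) != want.lower():
            problems[rel_path] = "checksum mismatch"
    return problems
```

Returning a map of problems rather than a single boolean lets a report name each missing or altered artifact before packaging begins.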

## Arguments

- `[ref-id ... or --all]` - One or more specific papers, or the entire corpus (required)
- `--format [bagit|zip|tar]` - Archive format (default: bagit)
- `--output [path]` - Custom output location (default: .aiwg/research/archives/)
- `--verify` - Perform integrity verification after creation
- `--compression [none|gzip|bzip2]` - Compression method (default: gzip)
- `--include-notes` - Include literature notes in package
- `--metadata-only` - Create metadata package without PDFs
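
For readers who want to script around the command, the argument list above could be mirrored with `argparse` roughly as follows. This is a hypothetical parser for illustration only; the skill itself is a prompt-driven command, and the names `build_parser`, `parse_args`, and `archive_all` are not part of it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(prog="research-archive")
    parser.add_argument("ref_ids", nargs="*", metavar="REF-ID",
                        help="one or more paper identifiers, e.g. REF-022")
    parser.add_argument("--all", action="store_true", dest="archive_all",
                        help="archive the entire corpus")
    parser.add_argument("--format", choices=["bagit", "zip", "tar"], default="bagit")
    parser.add_argument("--output", default=".aiwg/research/archives/")
    parser.add_argument("--verify", action="store_true")
    parser.add_argument("--compression", choices=["none", "gzip", "bzip2"], default="gzip")
    parser.add_argument("--include-notes", action="store_true")
    parser.add_argument("--metadata-only", action="store_true")
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse argv and require either REF-IDs or --all, but not both."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if bool(args.ref_ids) == args.archive_all:
        parser.error("specify one or more REF-IDs or --all, but not both")
    return args
```

For example, `parse_args(["REF-001", "REF-013", "--format", "tar"])` yields two reference IDs with the default gzip compression.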

## BagIt Format (Default)

BagIt is a packaging format for digital preservation, developed at the Library of Congress and specified in RFC 8493:

```
REF-022-archive/
├── bagit.txt                    # BagIt declaration
├── bag-info.txt                 # Package metadata
├── manifest-sha256.txt          # Checksums for data files
├── tagmanifest-sha256.txt       # Checksums for tag files
└── data/
    ├── REF-022.pdf              # Source paper
    ├── REF-022-autogen.md       # Finding document
    ├── metadata.yaml            # Extracted metadata
    ├── provenance.yaml          # Provenance records
    └── quality-assessment.yaml  # GRADE assessment
```
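
A layout like this can be produced with the Python standard library alone. The sketch below is an assumption-laden illustration, not the skill's implementation: `make_bag` and its helpers are hypothetical names, files are read fully into memory for brevity, and the optional `bag-info.txt` is omitted, so the tag manifest here covers only the two tag files it writes.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_hex(path: Path) -> str:
    """Return the SHA-256 hex digest of a file (whole-file read, fine for a sketch)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_manifest(bag_dir: Path, name: str, files: list[Path]) -> None:
    """Write '<digest>  <path relative to the bag root>' lines for the given files."""
    lines = [f"{sha256_hex(f)}  {f.relative_to(bag_dir).as_posix()}" for f in sorted(files)]
    (bag_dir / name).write_text("\n".join(lines) + "\n", encoding="utf-8")


def make_bag(sources: list[Path], bag_dir: Path) -> Path:
    """Copy sources into bag_dir/data and write the BagIt tag files."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    for src in sources:
        shutil.copy2(src, data_dir / src.name)

    # The bag declaration: version and tag-file encoding, per RFC 8493.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    payload = [p for p in data_dir.rglob("*") if p.is_file()]
    write_manifest(bag_dir, "manifest-sha256.txt", payload)
    # The tag manifest covers the tag files written so far, not the payload.
    tag_files = [bag_dir / "bagit.txt", bag_dir / "manifest-sha256.txt"]
    write_manifest(bag_dir, "tagmanifest-sha256.txt", tag_files)
    return bag_dir
```

Payload paths in the manifest are relative to the bag root (for example `data/REF-022.pdf`), which is what allows a bag to be moved between machines and still validate.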

## Examples

```bash
# Archive single paper in BagIt format
/research-archive REF-022

# Archive with verification
/research-archive REF-022 --verify

# Archive entire corpus
/research-archive --all --format bagit

# Archive with custom output
/research-archive REF-022 --output /backup/research-archives/

# Create metadata-only archive for sharing
/research-archive REF-022 --metadata-only --format zip

# Archive multiple papers
/research-archive REF-001 REF-013 REF-022 --format tar
```

## Expected Output

```
Creating Archive: REF-022
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 1: Collecting artifacts
  ✓ Source PDF: .aiwg/research/sources/REF-022.pdf (2.4 MB)
  ✓ Finding: .aiwg/research/findings/REF-022-autogen.md (12 KB)
  ✓ Metadata: Extracted from frontmatter
  ✓ Provenance: .aiwg/research/provenance/records/REF-022-acquisition.yaml
  ✓ Quality: .aiwg/research/quality-assessments/REF-022-assessment.yaml
  ✓ Literature notes: .aiwg/research/literature-notes/REF-022-notes.md

Step 2: Validating integrity
  ✓ PDF checksum verified: a1b2c3d4e5f6...
  ✓ All files present and intact
  ✓ Metadata complete
  ✓ Provenance chain validated

Step 3: Creating BagIt package
  ✓ BagIt structure created
  ✓ Files copied to data/ directory
  ✓ SHA-256 checksums generated
  ✓ bag-info.txt created with metadata
  ✓ Package size: 2.5 MB

Step 4: Generating archival metadata
  ✓ Dublin Core record created
  ✓ PREMIS preservation metadata added
  ✓ Inventory generated: 6 files

Step 5: Verifying package
  ✓ BagIt validation passed
  ✓ All checksums verified
  ✓ Package structure correct
  ✓ Test extraction successful

Step 6: Registering archive
  ✓ Saved to: .aiwg/research/archives/REF-022-archive-20260203.bag
  ✓ Registered in archival index
  ✓ Archival report generated

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Archive created successfully!

Package: .aiwg/research/archives/REF-022-archive-20260203.bag
Format: BagIt (Library of Congress standard)
Size: 2.5 MB
Files: 6

Contents:
  - REF-022.pdf (source paper)
  - REF-022-autogen.md (finding document)
  - metadata.yaml (frontmatter + enrichment)
  - provenance.yaml (acquisition + documentation history)
  - quality-assessment.yaml (GRADE assessment)
  - literature-notes.md (synthesis notes)

Verification: PASSED

Retrieval Instructions:
  1. Validate: bagit.py --validate REF-022-archive-20260203.bag
  2. Restore to corpus: /research-restore REF-022-archive-20260203.bag

Archive Report: .aiwg/research/archives/REF-022-archive-20260203-report.md
```

## Archival Metadata

Each archive carries three metadata records: the BagIt `bag-info.txt` fields, a Dublin Core record, and PREMIS preservation metadata:

```yaml
# bag-info.txt (BagIt metadata)
Source-Organization: AIWG Research Corpus
Organization-Address: https://github.com/jmagly/aiwg
Contact-Name: AIWG Archival Agent
Contact-Email: research@aiwg.io
External-Description: Research paper archive for REF-022 (AutoGen)
Bagging-Date: 2026-02-03
Bag-Size: 2.5 MB
Payload-Oxum: 2621440.6
External-Identifier: REF-022
Internal-Sender-Identifier: ai-writing-guide/research-corpus
Internal-Sender-Description: AIWG Research Framework

# Dublin Core metadata
dc:title: "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation"
dc:creator: Wu, Qingyun; Bansal, Gagan; Zhang, Jieyu; et al.
dc:date: 2023
dc:identifier: 10.48550/arXiv.2308.08155
dc:type: Conference Paper
dc:format: application/pdf
dc:language: en

# PREMIS preservation metadata
premis:objectIdentifier: REF-022
premis:originalName: REF-022.pdf
premis:fixity: sha256:a1b2c3d4e5f6...
premis:dateCreatedByApplication: 2026-02-03T12:00:00Z
premis:preservationLevel: bit-level
```
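
The `Payload-Oxum` value above has the form `<octet count>.<stream count>`: 2621440 bytes (2.5 × 1024 × 1024) spread over 6 payload files. A minimal sketch of how such a value and the `Label: value` lines of `bag-info.txt` might be produced follows; the function names are hypothetical and not taken from the skill.

```python
from pathlib import Path


def payload_oxum(data_dir: Path) -> str:
    """Return '<total bytes>.<file count>' for all files under data_dir."""
    files = [p for p in data_dir.rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return f"{total_bytes}.{len(files)}"


def render_bag_info(fields: dict[str, str]) -> str:
    """Render bag-info.txt content as 'Label: value' lines, one per field."""
    return "".join(f"{label}: {value}\n" for label, value in fields.items())
```

Because the Oxum needs only file sizes and a count, a receiver can use it as a cheap completeness check before running full checksum validation.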

## Archival Index

All archives are tracked in `.aiwg/research/archives/archive-index.yaml`:

```yaml
archives:
  - archive_id: REF-022-archive-20260203
    ref_id: REF-022
    created_at: "2026-02-03T14:30:00Z"
    format: bagit
    size_bytes: 2621440
    file_count: 6
    checksum: "sha256:xyz789..."
    location: ".aiwg/research/archives/REF-022-archive-20260203.bag"
    verified: true
    last_verified: "2026-02-03T14:30:15Z"
```
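
An index entry with these fields could be assembled along the following lines. This is a sketch under stated assumptions: the archive is treated as a single file, `build_index_entry` and `upsert_entry` are hypothetical names, and serializing the resulting dictionary to YAML is left to whichever YAML library the project uses.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def build_index_entry(archive: Path, ref_id: str, file_count: int) -> dict:
    """Build one archive-index entry; assumes the archive is a single file."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    digest = hashlib.sha256(archive.read_bytes()).hexdigest()
    return {
        "archive_id": archive.stem,  # file name without its final extension
        "ref_id": ref_id,
        "created_at": now,
        "format": "bagit",
        "size_bytes": archive.stat().st_size,
        "file_count": file_count,
        "checksum": f"sha256:{digest}",
        "location": archive.as_posix(),
        "verified": False,  # set to True only after a validation pass succeeds
        "last_verified": None,
    }


def upsert_entry(index: dict, entry: dict) -> dict:
    """Return a new index with entry added, replacing any entry with the same archive_id."""
    kept = [e for e in index.get("archives", []) if e["archive_id"] != entry["archive_id"]]
    return {**index, "archives": kept + [entry]}
```

Replacing by `archive_id` keeps the index free of duplicates when the same archive is registered twice.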

## Bulk Archival

Archive the entire corpus:

```bash
/research-archive --all
```

Output:

```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Archiving Entire Research Corpus
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Found 47 papers in corpus

Progress: [████████████████████] 47/47 (100%)

Summary:
  ✓ 47 papers archived
  ✓ Total size: 142.8 MB
  ✓ All packages verified
  ✓ Archival index updated

Archive Bundle: .aiwg/research/archives/corpus-archive-20260203.tar.gz
Manifest: .aiwg/research/archives/corpus-manifest-20260203.yaml

Individual Archives:
  REF-001-archive-20260203.bag
  REF-002-archive-20260203.bag
  ...
  REF-047-archive-20260203.bag
```
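
Bundling the individual packages into one `.tar.gz` can be sketched with Python's `tarfile` module. The example assumes each package is a directory on disk, which may not match how the skill stores `.bag` packages; `bundle_bags` is a hypothetical helper name.

```python
import tarfile
from pathlib import Path


def bundle_bags(bag_dirs: list[Path], bundle_path: Path) -> list[str]:
    """Write bag directories into one gzip-compressed tar; return member names."""
    with tarfile.open(bundle_path, "w:gz") as tar:
        for bag in sorted(bag_dirs):
            # arcname keeps only the bag's own name, not its absolute path.
            tar.add(bag, arcname=bag.name)
    # Re-open the bundle to confirm it is readable and list what it contains.
    with tarfile.open(bundle_path, "r:gz") as tar:
        return tar.getnames()
```

Re-reading the bundle right after writing it doubles as a basic extraction test, and the returned names can feed a corpus manifest.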

## Restoration

Archived packages can be restored using `/research-restore`:

```bash
/research-restore REF-022-archive-20260203.bag
```

## Long-Term Preservation Compliance

Archives follow best practices from:

- Library of Congress BagIt specification
- OAIS (Open Archival Information System) reference model
- Dublin Core metadata standard
- PREMIS preservation metadata standard

## Validation

All archives undergo validation:

- [ ] BagIt specification compliance
- [ ] All files listed in manifest present
- [ ] All checksums match manifest
- [ ] Metadata is complete and valid
- [ ] Package can be extracted successfully
- [ ] Contents can be restored to working corpus
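
The manifest-related items in this checklist can be approximated with a short standard-library check. This sketch covers only the presence of `bagit.txt`, the files listed in `manifest-sha256.txt`, their checksums, and unlisted payload files; it is not a full RFC 8493 validator, and `validate_bag` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def validate_bag(bag_dir: Path) -> list[str]:
    """Return a list of problems found in a bag; an empty list means it passed."""
    problems: list[str] = []
    manifest = bag_dir / "manifest-sha256.txt"
    if not (bag_dir / "bagit.txt").is_file():
        problems.append("missing bagit.txt")
    if not manifest.is_file():
        return problems + ["missing manifest-sha256.txt"]

    listed: set[str] = set()
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each manifest line is '<digest> <path>' separated by whitespace.
        digest, rel_path = line.split(maxsplit=1)
        listed.add(rel_path)
        target = bag_dir / rel_path
        if not target.is_file():
            problems.append(f"listed but missing: {rel_path}")
        elif hashlib.sha256(target.read_bytes()).hexdigest() != digest.lower():
            problems.append(f"checksum mismatch: {rel_path}")

    # Payload files that the manifest does not mention also make the bag invalid.
    data_dir = bag_dir / "data"
    if data_dir.is_dir():
        for path in sorted(data_dir.rglob("*")):
            rel = path.relative_to(bag_dir).as_posix()
            if path.is_file() and rel not in listed:
                problems.append(f"not in manifest: {rel}")
    return problems
```

For production use, an established validator such as the `bagit.py --validate` command shown in the retrieval instructions is the safer choice.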

## References

- @$AIWG_ROOT/agentic/code/frameworks/research-complete/agents/archival-agent.md - Archival Agent
- @$AIWG_ROOT/src/research/services/archival-service.ts - Archival implementation
- @.aiwg/research/fixity-manifest.json - Checksum tracking
- @.aiwg/research/archives/README.md - Archival procedures
- https://tools.ietf.org/html/rfc8493 - BagIt specification

Related Skills

Archive Acquisition

from jmagly/aiwg

Patterns for acquiring content from Internet Archive and archival sources

verify-archive

from jmagly/aiwg

Verify archive integrity with self-verifying SHA-256 checksums, generate VERIFY.md, and optionally create W3C PROV provenance

research-workflow

from jmagly/aiwg

Execute multi-stage research workflows

research-status

from jmagly/aiwg

Show research corpus health and statistics

research-query

from jmagly/aiwg

Search the local research corpus, read matching findings, and synthesize an answer with inline citations to REF-XXX sources. The "query" operation for the research pipeline.

research-quality

from jmagly/aiwg

Assess source quality using GRADE methodology

research-quality-audit

from jmagly/aiwg

Audit research corpus for shallow stubs, incomplete sections, missing source files, and doc depth issues. Detects docs written from abstracts rather than full papers and optionally auto-dispatches expansion agents.

research-provenance

from jmagly/aiwg

Query provenance chains and artifact relationships

research-lint

from jmagly/aiwg

Run the research corpus lint ruleset to detect structural and referential integrity issues — orphan notes, missing frontmatter, broken references, missing GRADE assessments.

research-gap

from jmagly/aiwg

Analyze gaps in research coverage

research-gap-detect

from jmagly/aiwg

Build the mutual citation graph, find connected components, identify isolated clusters, and optionally search for bridge candidates and file gap issues. Automates the manual cluster analysis workflow.

research-document

from jmagly/aiwg

Generate summaries and literature notes from research papers