Codex

research-acquire

Download research papers and extract metadata

104 stars

Best use case

research-acquire is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

It is a strong fit for teams already working in Codex.

Teams using research-acquire should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/research-acquire/SKILL.md --create-dirs "https://raw.githubusercontent.com/jmagly/aiwg/main/.agents/skills/research-acquire/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/research-acquire/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How research-acquire Compares

| Feature / Agent | research-acquire | Standard Approach |
|---|---|---|
| Platform Support | Codex | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Download research papers and extract metadata

Which AI agents support this skill?

This skill is designed for Codex.

Where can I find the source code?

The source code is in the jmagly/aiwg repository on GitHub, under .agents/skills/research-acquire/SKILL.md.

SKILL.md Source

# Research Acquire Command

Download research papers from public repositories and extract metadata.

## Instructions

When invoked, perform automated paper acquisition:

1. **Identify Source**
   - Parse DOI, arXiv ID, or URL
   - Determine paper hosting location
   - Check if paper already exists in `.aiwg/research/sources/`

2. **Download Paper**
   - Attempt direct PDF download from source
   - Try fallback sources (arXiv mirror, Unpaywall, PMC)
   - Save to `.aiwg/research/sources/[ref-id].pdf`
   - Verify download integrity (file size, PDF structure)

3. **Extract Metadata**
   - Parse PDF metadata (title, authors, year)
   - Query CrossRef/Semantic Scholar for enhanced metadata
   - Extract abstract, keywords, citation count
   - Determine source type (journal, conference, preprint)

4. **Generate Frontmatter**
   - Create YAML frontmatter per @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/research/frontmatter-schema.yaml
   - Assign REF-XXX identifier
   - Calculate PDF checksum (SHA-256)
   - Set initial GRADE baseline from source type

5. **Extract Full Text** (default, unless `--no-extract-text`)
   - Extract full text from PDF to `.aiwg/research/sources/text/REF-XXX.txt`
   - This text is the primary input for downstream analysis — analysis agents
     must read this file, not just metadata or abstract
   - If extraction fails (scanned PDF, encrypted): log warning, set
     `full_text_available: false` in frontmatter

6. **Create Finding Document**
   - Generate `.aiwg/research/findings/REF-XXX-[slug].md` from template
   - Populate frontmatter with extracted metadata
   - Add placeholder sections for key findings
   - Update fixity manifest

7. **Post-Acquisition**
   - Log acquisition in `.aiwg/research/acquisition-log.yaml`
   - Update corpus index
   - Suggest next steps (quality assessment, documentation)
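
Step 1 above depends on telling a DOI, an arXiv ID, and a URL apart. The following is a minimal Python sketch of that classification. The function name `classify_identifier`, the regular expressions, and the returned tuple shape are illustrative assumptions, not the actual acquisition-service implementation. It treats DOIs with the `10.48550/arXiv.` prefix as aliases for arXiv IDs, in line with the "DOI resolved to arXiv" step in the expected output.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns: real DOIs and arXiv IDs have more variants
# (for example, pre-2007 arXiv IDs such as "hep-th/9901001").
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ARXIV_RE = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5})(v\d+)?$", re.IGNORECASE)
ARXIV_DOI_RE = re.compile(
    r"^10\.48550/arxiv\.(\d{4}\.\d{4,5})(v\d+)?$", re.IGNORECASE
)
ARXIV_HOSTS = {"arxiv.org", "www.arxiv.org", "export.arxiv.org"}


def classify_identifier(raw: str) -> tuple[str, str]:
    """Return (kind, normalized), where kind is 'arxiv', 'doi', or 'url'.

    Any version suffix (e.g. "v2") on an arXiv ID is dropped.
    """
    text = raw.strip()

    # arXiv-issued DOIs map directly onto an arXiv ID.
    match = ARXIV_DOI_RE.match(text)
    if match:
        return "arxiv", match.group(1)

    match = ARXIV_RE.match(text)
    if match:
        return "arxiv", match.group(1)

    if DOI_RE.match(text):
        return "doi", text

    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.hostname:
        # arXiv abs/pdf URLs carry the ID in the path.
        if parsed.hostname in ARXIV_HOSTS:
            match = re.search(r"(\d{4}\.\d{4,5})", parsed.path)
            if match:
                return "arxiv", match.group(1)
        return "url", text

    raise ValueError(f"Unrecognized identifier: {raw!r}")
```

Normalizing all three arXiv spellings to the same ID is one way to make the "already exists in the corpus" check independent of how the paper was requested.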

## Arguments

- `[identifier]` - DOI, arXiv ID, or URL (required)
- `--output [path]` - Custom output location (default: auto-generate)
- `--ref-id [REF-XXX]` - Specific REF-XXX identifier (default: auto-assign)
- `--extract-text` - Extract full text to `.txt` file for analysis (default: enabled; use `--no-extract-text` to skip)
- `--no-metadata` - Skip metadata enrichment
- `--force` - Re-download even if paper exists
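
To make the defaults above concrete, here is a sketch of how the documented flags could be modeled with Python's `argparse`. The skill itself is invoked as a slash command, so this parser is purely illustrative: `build_parser` is a hypothetical name, and only the flag names and defaults come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="research-acquire")
    parser.add_argument("identifier", help="DOI, arXiv ID, or URL")
    parser.add_argument("--output", metavar="path", default=None,
                        help="custom output location (None means auto-generate)")
    parser.add_argument("--ref-id", metavar="REF-XXX", default=None,
                        help="specific identifier (None means auto-assign)")
    # BooleanOptionalAction (Python 3.9+) registers both --extract-text and
    # --no-extract-text; extraction stays on unless explicitly disabled.
    parser.add_argument("--extract-text", action=argparse.BooleanOptionalAction,
                        default=True, help="extract full text to a .txt file")
    parser.add_argument("--no-metadata", action="store_true",
                        help="skip metadata enrichment")
    parser.add_argument("--force", action="store_true",
                        help="re-download even if the paper exists")
    return parser


args = build_parser().parse_args(["arXiv:2308.08155", "--no-extract-text"])
print(args.extract_text, args.no_metadata, args.force)  # prints: False False False
```

Because extraction is on by default, passing `--extract-text` explicitly changes nothing; only `--no-extract-text` alters behavior.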

## Examples

```bash
# Acquire by DOI
/research-acquire 10.48550/arXiv.2308.08155

# Acquire by arXiv ID
/research-acquire arXiv:2308.08155

# Acquire with custom identifier
/research-acquire https://arxiv.org/pdf/2308.08155.pdf --ref-id REF-022

# Acquire with full text extraction (already the default; flag shown for explicitness)
/research-acquire 10.1145/3377811.3380330 --extract-text
```

## Expected Output

```
Acquiring Paper: 10.48550/arXiv.2308.08155
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 1: Resolving identifier
  ✓ DOI resolved to arXiv:2308.08155
  ✓ Paper not found in corpus

Step 2: Downloading PDF
  ✓ Downloaded from arxiv.org (2.4 MB)
  ✓ Saved to .aiwg/research/sources/REF-022.pdf
  ✓ Checksum: a1b2c3d4e5f6...

Step 3: Extracting metadata
  ✓ Title: AutoGen: Enabling Next-Gen LLM Applications...
  ✓ Authors: Wu, Q., Bansal, G., Zhang, J., et al. (9 authors)
  ✓ Year: 2023
  ✓ Source: arXiv preprint
  ✓ Citations: 234 (as of 2026-02-03)

Step 4: Creating finding document
  ✓ Generated .aiwg/research/findings/REF-022-autogen.md
  ✓ Frontmatter populated
  ✓ Template sections added

Step 5: Updating corpus
  ✓ Added to fixity manifest
  ✓ Updated INDEX.md
  ✓ Logged acquisition

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Acquisition complete!

REF-ID: REF-022
Title: AutoGen: Enabling Next-Gen LLM Applications...
File: .aiwg/research/sources/REF-022.pdf
Finding: .aiwg/research/findings/REF-022-autogen.md

Next Steps:
1. /research-quality REF-022 - Assess evidence quality
2. /research-document REF-022 - Create detailed summary
3. /research-cite REF-022 - Generate citation
```
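
The checksum line in the output above, together with the integrity check from the download step (file size, PDF structure), could be produced along these lines. This is a sketch under stated assumptions: `sha256_of` and `looks_like_pdf` are hypothetical helpers, the 1024-byte minimum size is an arbitrary threshold, and checking for a `%PDF-` header plus an `%%EOF` marker near the end is only a shallow structural test, not full PDF validation.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large PDFs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def looks_like_pdf(path: Path, min_bytes: int = 1024) -> bool:
    """Shallow check: plausible size, %PDF- header, %%EOF near the end."""
    size = path.stat().st_size
    if size < min_bytes:
        return False
    with path.open("rb") as fh:
        head = fh.read(8)
        # Only the last kilobyte is scanned for the end-of-file marker.
        fh.seek(max(0, size - 1024))
        tail = fh.read()
    return head.startswith(b"%PDF-") and b"%%EOF" in tail
```

A check like this catches the common failure where a fallback source returns an HTML error page saved with a `.pdf` extension.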

## Provenance Tracking

All acquisitions create provenance records:

```yaml
# .aiwg/research/provenance/records/REF-022-acquisition.yaml
entity:
  id: "urn:aiwg:artifact:.aiwg/research/sources/REF-022.pdf"
  type: "research_paper"

activity:
  id: "urn:aiwg:activity:acquisition:REF-022:001"
  type: "acquisition"
  started_at: "2026-02-03T12:00:00Z"
  ended_at: "2026-02-03T12:00:15Z"

agent:
  id: "urn:aiwg:agent:acquisition-agent"
  type: "aiwg_agent"

source:
  identifier: "10.48550/arXiv.2308.08155"
  url: "https://arxiv.org/pdf/2308.08155.pdf"
```
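
A record with the shape shown above could be assembled as follows. This is a sketch only: `build_provenance_record` and its parameters are hypothetical, the field names and URN patterns are copied from the example record, and reading the trailing `001` as a zero-padded sequence number is an assumption. Serialization to YAML is left out because it would need a third-party library.

```python
from datetime import datetime, timezone


def _iso(ts: datetime) -> str:
    # Match the "2026-02-03T12:00:00Z" style used in the example record.
    # Expects a timezone-aware datetime; naive values are read as local time.
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_provenance_record(ref_id: str, identifier: str, url: str,
                            started: datetime, ended: datetime,
                            sequence: int = 1) -> dict:
    """Assemble a dict mirroring the example acquisition record."""
    return {
        "entity": {
            "id": f"urn:aiwg:artifact:.aiwg/research/sources/{ref_id}.pdf",
            "type": "research_paper",
        },
        "activity": {
            # Assumption: the final URN segment is a per-paper sequence number.
            "id": f"urn:aiwg:activity:acquisition:{ref_id}:{sequence:03d}",
            "type": "acquisition",
            "started_at": _iso(started),
            "ended_at": _iso(ended),
        },
        "agent": {
            "id": "urn:aiwg:agent:acquisition-agent",
            "type": "aiwg_agent",
        },
        "source": {"identifier": identifier, "url": url},
    }
```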

## References

- @$AIWG_ROOT/agentic/code/frameworks/research-complete/agents/acquisition-agent.md - Acquisition Agent
- @$AIWG_ROOT/src/research/services/acquisition-service.ts - Download implementation
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/research/frontmatter-schema.yaml - Metadata format
- @.aiwg/research/fixity-manifest.json - Checksum tracking
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/provenance-tracking.md - Provenance requirements

Related Skills

Each skill below is from jmagly/aiwg (104 stars) and targets Codex.

  • research-workflow: Execute multi-stage research workflows
  • research-status: Show research corpus health and statistics
  • research-query: Search the local research corpus, read matching findings, and synthesize an answer with inline citations to REF-XXX sources. The "query" operation for the research pipeline.
  • research-quality: Assess source quality using GRADE methodology
  • research-quality-audit: Audit research corpus for shallow stubs, incomplete sections, missing source files, and doc depth issues. Detects docs written from abstracts rather than full papers and optionally auto-dispatches expansion agents.
  • research-provenance: Query provenance chains and artifact relationships
  • research-lint: Run the research corpus lint ruleset to detect structural and referential integrity issues: orphan notes, missing frontmatter, broken references, missing GRADE assessments.
  • research-gap: Analyze gaps in research coverage
  • research-gap-detect: Build the mutual citation graph, find connected components, identify isolated clusters, and optionally search for bridge candidates and file gap issues. Automates the manual cluster analysis workflow.
  • research-document: Generate summaries and literature notes from research papers
  • research-discover: Search for research papers across academic databases
  • research-cite: Generate properly formatted citation from research corpus