Codex

research-acquire

Download research papers and extract metadata

Best use case

research-acquire is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

It is a strong fit for teams already working in Codex.

Teams using research-acquire can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/research-acquire/SKILL.md --create-dirs "https://raw.githubusercontent.com/jmagly/aiwg/main/.agents/skills/research-acquire/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/research-acquire/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How research-acquire Compares

Feature / Agent	research-acquire	Standard Approach
Platform Support	Codex	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Download research papers and extract metadata

Which AI agents support this skill?

This skill is designed for Codex.

Where can I find the source code?

The source code is on GitHub in the jmagly/aiwg repository, under .agents/skills/research-acquire/.

SKILL.md Source

# Research Acquire Command

Download research papers from public repositories and extract metadata.

## Instructions

When invoked, perform automated paper acquisition:

1. **Identify Source**
   - Parse DOI, arXiv ID, or URL
   - Determine paper hosting location
   - Check if paper already exists in `.aiwg/research/sources/`

2. **Download Paper**
   - Attempt direct PDF download from source
   - Try fallback sources (arXiv mirror, Unpaywall, PMC)
   - Save to `.aiwg/research/sources/[ref-id].pdf`
   - Verify download integrity (file size, PDF structure)

3. **Extract Metadata**
   - Parse PDF metadata (title, authors, year)
   - Query CrossRef/Semantic Scholar for enhanced metadata
   - Extract abstract, keywords, citation count
   - Determine source type (journal, conference, preprint)

4. **Generate Frontmatter**
   - Create YAML frontmatter per @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/research/frontmatter-schema.yaml
   - Assign REF-XXX identifier
   - Calculate PDF checksum (SHA-256)
   - Set initial GRADE baseline from source type

5. **Extract Full Text** (default, unless `--no-extract-text`)
   - Extract full text from PDF to `.aiwg/research/sources/text/REF-XXX.txt`
   - This text is the primary input for downstream analysis — analysis agents
     must read this file, not just metadata or abstract
   - If extraction fails (scanned PDF, encrypted): log warning, set
     `full_text_available: false` in frontmatter

6. **Create Finding Document**
   - Generate `.aiwg/research/findings/REF-XXX-[slug].md` from template
   - Populate frontmatter with extracted metadata
   - Add placeholder sections for key findings
   - Update fixity manifest

7. **Post-Acquisition**
   - Log acquisition in `.aiwg/research/acquisition-log.yaml`
   - Update corpus index
   - Suggest next steps (quality assessment, documentation)
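
The "Identify Source" step above can be sketched in Python. This is a minimal illustration, not the skill's actual parser: the function name `classify_identifier` and the regular expressions are assumptions, and they only cover new-style arXiv IDs (such as `2308.08155`) and ordinary DOIs.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns; the skill's real parsing rules are not shown in SKILL.md.
ARXIV_DOI = re.compile(r"^10\.48550/arXiv\.(\d{4}\.\d{4,5})(v\d+)?$", re.IGNORECASE)
ARXIV_ID = re.compile(r"^(?:arXiv:)?(\d{4}\.\d{4,5})(v\d+)?$", re.IGNORECASE)
DOI = re.compile(r"^10\.\d{4,9}/\S+$")
ARXIV_URL_PATH = re.compile(r"^/(?:abs|pdf)/(\d{4}\.\d{4,5})(v\d+)?(?:\.pdf)?$")


def classify_identifier(raw: str) -> tuple[str, str]:
    """Return (kind, normalized_value) where kind is 'arxiv', 'doi', or 'url'."""
    text = raw.strip()
    parsed = urlparse(text)

    # URLs: recognize arXiv abstract/PDF links, otherwise keep the URL as-is.
    if parsed.scheme in ("http", "https"):
        if parsed.netloc.lower() in ("arxiv.org", "www.arxiv.org"):
            match = ARXIV_URL_PATH.match(parsed.path)
            if match:
                return "arxiv", match.group(1)
        return "url", text

    # arXiv-issued DOIs and bare arXiv IDs both normalize to the arXiv ID.
    match = ARXIV_DOI.match(text) or ARXIV_ID.match(text)
    if match:
        return "arxiv", match.group(1)

    if DOI.match(text):
        return "doi", text

    raise ValueError(f"Unrecognized identifier: {raw!r}")
```

Normalizing arXiv DOIs, IDs, and URLs to the same value makes the "already exists in the corpus" check independent of how the paper was requested.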

## Arguments

- `[identifier]` - DOI, arXiv ID, or URL (required)
- `--output [path]` - Custom output location (default: auto-generate)
- `--ref-id [REF-XXX]` - Specific REF-XXX identifier (default: auto-assign)
- `--extract-text` - Extract full text to `.txt` file for analysis (default: enabled; use `--no-extract-text` to skip)
- `--no-metadata` - Skip metadata enrichment
- `--force` - Re-download even if paper exists
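
If the command were wrapped in a script, the flags above could map onto `argparse` roughly as follows. This is a sketch under that assumption: `build_parser` is a hypothetical helper, and the skill itself is an agent instruction file rather than a Python CLI. `argparse.BooleanOptionalAction` (Python 3.9+) generates the `--no-extract-text` form automatically.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirror of the documented arguments."""
    parser = argparse.ArgumentParser(prog="research-acquire")
    parser.add_argument("identifier", help="DOI, arXiv ID, or URL")
    parser.add_argument("--output", metavar="PATH", default=None,
                        help="custom output location (default: auto-generate)")
    parser.add_argument("--ref-id", metavar="REF-XXX", default=None,
                        help="specific identifier (default: auto-assign)")
    # Enabled by default; --no-extract-text switches it off.
    parser.add_argument("--extract-text", action=argparse.BooleanOptionalAction,
                        default=True, help="extract full text to a .txt file")
    parser.add_argument("--no-metadata", action="store_true",
                        help="skip metadata enrichment")
    parser.add_argument("--force", action="store_true",
                        help="re-download even if the paper exists")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["arXiv:2308.08155", "--ref-id", "REF-022"])
    print(args)
```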

## Examples

```bash
# Acquire by DOI
/research-acquire 10.48550/arXiv.2308.08155

# Acquire by arXiv ID
/research-acquire arXiv:2308.08155

# Acquire with custom identifier
/research-acquire https://arxiv.org/pdf/2308.08155.pdf --ref-id REF-022

# Acquire with full text extraction (enabled by default; flag shown explicitly)
/research-acquire 10.1145/3377811.3380330 --extract-text
```

## Expected Output

```
Acquiring Paper: 10.48550/arXiv.2308.08155
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 1: Resolving identifier
  ✓ DOI resolved to arXiv:2308.08155
  ✓ Paper not found in corpus

Step 2: Downloading PDF
  ✓ Downloaded from arxiv.org (2.4 MB)
  ✓ Saved to .aiwg/research/sources/REF-022.pdf
  ✓ Checksum: a1b2c3d4e5f6...

Step 3: Extracting metadata
  ✓ Title: AutoGen: Enabling Next-Gen LLM Applications...
  ✓ Authors: Wu, Q., Bansal, G., Zhang, J., et al. (9 authors)
  ✓ Year: 2023
  ✓ Source: arXiv preprint
  ✓ Citations: 234 (as of 2026-02-03)

Step 4: Creating finding document
  ✓ Generated .aiwg/research/findings/REF-022-autogen.md
  ✓ Frontmatter populated
  ✓ Template sections added

Step 5: Updating corpus
  ✓ Added to fixity manifest
  ✓ Updated INDEX.md
  ✓ Logged acquisition

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Acquisition complete!

REF-ID: REF-022
Title: AutoGen: Enabling Next-Gen LLM Applications...
File: .aiwg/research/sources/REF-022.pdf
Finding: .aiwg/research/findings/REF-022-autogen.md

Next Steps:
1. /research-quality REF-022 - Assess evidence quality
2. /research-document REF-022 - Create detailed summary
3. /research-cite REF-022 - Generate citation
```
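
The integrity check and SHA-256 checksum reported in Step 2 of the sample output could be computed along these lines. This is a sketch, not the implementation in `acquisition-service.ts`: the helper names and the `min_bytes` threshold are assumptions, and the structural check is only a cheap heuristic (a `%PDF-` header and an `%%EOF` marker near the end of the file).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large PDFs are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def looks_like_pdf(path: Path, min_bytes: int = 1024) -> bool:
    """Heuristic check: minimum size, PDF header, and end-of-file marker."""
    size = path.stat().st_size
    if size < min_bytes:  # hypothetical threshold for truncated downloads
        return False
    with path.open("rb") as handle:
        header = handle.read(5)
        handle.seek(max(0, size - 1024))
        tail = handle.read()
    return header == b"%PDF-" and b"%%EOF" in tail
```

A download that fails `looks_like_pdf` is often an HTML error page or a truncated transfer, which is a reasonable trigger for trying the fallback sources.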

## Provenance Tracking

All acquisitions create provenance records:

```yaml
# .aiwg/research/provenance/records/REF-022-acquisition.yaml
entity:
  id: "urn:aiwg:artifact:.aiwg/research/sources/REF-022.pdf"
  type: "research_paper"

activity:
  id: "urn:aiwg:activity:acquisition:REF-022:001"
  type: "acquisition"
  started_at: "2026-02-03T12:00:00Z"
  ended_at: "2026-02-03T12:00:15Z"

agent:
  id: "urn:aiwg:agent:acquisition-agent"
  type: "aiwg_agent"

source:
  identifier: "10.48550/arXiv.2308.08155"
  url: "https://arxiv.org/pdf/2308.08155.pdf"
```
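
A record with the shape shown above could be assembled as a plain dictionary before being serialized to YAML. This is a minimal sketch: `build_acquisition_record` and `iso_utc` are hypothetical helpers, and treating the trailing `001` in the activity ID as a zero-padded sequence number is an assumption based on the example. Serialization is left out because it would need a YAML library outside the standard library.

```python
from datetime import datetime, timezone


def iso_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as UTC with a trailing 'Z'."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_acquisition_record(ref_id: str, identifier: str, url: str,
                             started_at: datetime, ended_at: datetime,
                             sequence: int = 1) -> dict:
    """Build the provenance record as a dict mirroring the YAML example."""
    source_path = f".aiwg/research/sources/{ref_id}.pdf"
    return {
        "entity": {
            "id": f"urn:aiwg:artifact:{source_path}",
            "type": "research_paper",
        },
        "activity": {
            # Assumption: the suffix is a per-reference sequence counter.
            "id": f"urn:aiwg:activity:acquisition:{ref_id}:{sequence:03d}",
            "type": "acquisition",
            "started_at": iso_utc(started_at),
            "ended_at": iso_utc(ended_at),
        },
        "agent": {
            "id": "urn:aiwg:agent:acquisition-agent",
            "type": "aiwg_agent",
        },
        "source": {"identifier": identifier, "url": url},
    }
```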

## References

- @$AIWG_ROOT/agentic/code/frameworks/research-complete/agents/acquisition-agent.md - Acquisition Agent
- @$AIWG_ROOT/src/research/services/acquisition-service.ts - Download implementation
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/research/frontmatter-schema.yaml - Metadata format
- @.aiwg/research/fixity-manifest.json - Checksum tracking
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/provenance-tracking.md - Provenance requirements

Related Skills

research-workflow

from jmagly/aiwg

Execute multi-stage research workflows

research-status

from jmagly/aiwg

Show research corpus health and statistics

research-query

from jmagly/aiwg

Search the local research corpus, read matching findings, and synthesize an answer with inline citations to REF-XXX sources. The "query" operation for the research pipeline.

research-quality

from jmagly/aiwg

Assess source quality using GRADE methodology

research-quality-audit

from jmagly/aiwg

Audit research corpus for shallow stubs, incomplete sections, missing source files, and doc depth issues. Detects docs written from abstracts rather than full papers and optionally auto-dispatches expansion agents.

research-provenance

from jmagly/aiwg

Query provenance chains and artifact relationships

research-lint

from jmagly/aiwg

Run the research corpus lint ruleset to detect structural and referential integrity issues — orphan notes, missing frontmatter, broken references, missing GRADE assessments.

research-gap

from jmagly/aiwg

Analyze gaps in research coverage

research-gap-detect

from jmagly/aiwg

Build the mutual citation graph, find connected components, identify isolated clusters, and optionally search for bridge candidates and file gap issues. Automates the manual cluster analysis workflow.

research-document

from jmagly/aiwg

Generate summaries and literature notes from research papers

research-discover

from jmagly/aiwg

Search for research papers across academic databases

research-cite

from jmagly/aiwg

Generate properly formatted citation from research corpus