Best use case
research-acquire is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
It is a strong fit for teams already working in Codex.
Download research papers and extract metadata
Teams using research-acquire should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/research-acquire/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How research-acquire Compares
| Feature / Agent | research-acquire | Standard Approach |
|---|---|---|
| Platform Support | Codex | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Download research papers and extract metadata
Which AI agents support this skill?
This skill is designed for Codex.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agents for Startups
Explore AI agent skills for startup validation, product research, growth experiments, documentation, and fast execution with small teams.
AI Agent for YouTube Script Writing
Find AI agent skills for YouTube script writing, video research, content outlining, and repeatable channel production workflows.
AI Agent for Product Research
Browse AI agent skills for product research, competitive analysis, customer discovery, and structured product decision support.
SKILL.md Source
# Research Acquire Command
Download research papers from public repositories and extract metadata.
## Instructions
When invoked, perform automated paper acquisition:
1. **Identify Source**
- Parse DOI, arXiv ID, or URL
- Determine paper hosting location
- Check if paper already exists in `.aiwg/research/sources/`
2. **Download Paper**
- Attempt direct PDF download from source
- Try fallback sources (arXiv mirror, Unpaywall, PMC)
- Save to `.aiwg/research/sources/[ref-id].pdf`
- Verify download integrity (file size, PDF structure)
3. **Extract Metadata**
- Parse PDF metadata (title, authors, year)
- Query CrossRef/Semantic Scholar for enhanced metadata
- Extract abstract, keywords, citation count
- Determine source type (journal, conference, preprint)
4. **Generate Frontmatter**
- Create YAML frontmatter per @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/research/frontmatter-schema.yaml
- Assign REF-XXX identifier
- Calculate PDF checksum (SHA-256)
- Set initial GRADE baseline from source type
5. **Extract Full Text** (default, unless `--no-extract-text`)
- Extract full text from PDF to `.aiwg/research/sources/text/REF-XXX.txt`
- This text is the primary input for downstream analysis — analysis agents
must read this file, not just metadata or abstract
- If extraction fails (scanned PDF, encrypted): log warning, set
`full_text_available: false` in frontmatter
6. **Create Finding Document**
- Generate `.aiwg/research/findings/REF-XXX-[slug].md` from template
- Populate frontmatter with extracted metadata
- Add placeholder sections for key findings
- Update fixity manifest
7. **Post-Acquisition**
- Log acquisition in `.aiwg/research/acquisition-log.yaml`
- Update corpus index
- Suggest next steps (quality assessment, documentation)
## Arguments
- `[identifier]` - DOI, arXiv ID, or URL (required)
- `--output [path]` - Custom output location (default: auto-generate)
- `--ref-id [REF-XXX]` - Specific REF-XXX identifier (default: auto-assign)
- `--extract-text` - Extract full text to `.txt` file for analysis (default: enabled; use `--no-extract-text` to skip)
- `--no-metadata` - Skip metadata enrichment
- `--force` - Re-download even if paper exists
## Examples
```bash
# Acquire by DOI
/research-acquire 10.48550/arXiv.2308.08155
# Acquire by arXiv ID
/research-acquire arXiv:2308.08155
# Acquire with custom identifier
/research-acquire https://arxiv.org/pdf/2308.08155.pdf --ref-id REF-022
# Acquire with full text extraction
/research-acquire 10.1145/3377811.3380330 --extract-text
```
## Expected Output
```
Acquiring Paper: 10.48550/arXiv.2308.08155
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 1: Resolving identifier
✓ DOI resolved to arXiv:2308.08155
✓ Paper not found in corpus
Step 2: Downloading PDF
✓ Downloaded from arxiv.org (2.4 MB)
✓ Saved to .aiwg/research/sources/REF-022.pdf
✓ Checksum: a1b2c3d4e5f6...
Step 3: Extracting metadata
✓ Title: AutoGen: Enabling Next-Gen LLM Applications...
✓ Authors: Wu, Q., Bansal, G., Zhang, J., et al. (9 authors)
✓ Year: 2023
✓ Source: arXiv preprint
✓ Citations: 234 (as of 2026-02-03)
Step 4: Creating finding document
✓ Generated .aiwg/research/findings/REF-022-autogen.md
✓ Frontmatter populated
✓ Template sections added
Step 5: Updating corpus
✓ Added to fixity manifest
✓ Updated INDEX.md
✓ Logged acquisition
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Acquisition complete!
REF-ID: REF-022
Title: AutoGen: Enabling Next-Gen LLM Applications...
File: .aiwg/research/sources/REF-022.pdf
Finding: .aiwg/research/findings/REF-022-autogen.md
Next Steps:
1. /research-quality REF-022 - Assess evidence quality
2. /research-document REF-022 - Create detailed summary
3. /research-cite REF-022 - Generate citation
```
## Provenance Tracking
All acquisitions create provenance records:
```yaml
# .aiwg/research/provenance/records/REF-022-acquisition.yaml
entity:
id: "urn:aiwg:artifact:.aiwg/research/sources/REF-022.pdf"
type: "research_paper"
activity:
id: "urn:aiwg:activity:acquisition:REF-022:001"
type: "acquisition"
started_at: "2026-02-03T12:00:00Z"
ended_at: "2026-02-03T12:00:15Z"
agent:
id: "urn:aiwg:agent:acquisition-agent"
type: "aiwg_agent"
source:
identifier: "10.48550/arXiv.2308.08155"
url: "https://arxiv.org/pdf/2308.08155.pdf"
```
## References
- @$AIWG_ROOT/agentic/code/frameworks/research-complete/agents/acquisition-agent.md - Acquisition Agent
- @$AIWG_ROOT/src/research/services/acquisition-service.ts - Download implementation
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/research/frontmatter-schema.yaml - Metadata format
- @.aiwg/research/fixity-manifest.json - Checksum tracking
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/provenance-tracking.md - Provenance requirementsRelated Skills
research-workflow
Execute multi-stage research workflows
research-status
Show research corpus health and statistics
research-query
Search the local research corpus, read matching findings, and synthesize an answer with inline citations to REF-XXX sources. The "query" operation for the research pipeline.
research-quality
Assess source quality using GRADE methodology
research-quality-audit
Audit research corpus for shallow stubs, incomplete sections, missing source files, and doc depth issues. Detects docs written from abstracts rather than full papers and optionally auto-dispatches expansion agents.
research-provenance
Query provenance chains and artifact relationships
research-lint
Run the research corpus lint ruleset to detect structural and referential integrity issues — orphan notes, missing frontmatter, broken references, missing GRADE assessments.
research-gap
Analyze gaps in research coverage
research-gap-detect
Build the mutual citation graph, find connected components, identify isolated clusters, and optionally search for bridge candidates and file gap issues. Automates the manual cluster analysis workflow.
research-document
Generate summaries and literature notes from research papers
research-discover
Search for research papers across academic databases
research-cite
Generate properly formatted citation from research corpus