arxiv-mcp

Search and retrieve academic papers from arXiv.org using WebFetch and Exa. No MCP server required - uses existing tools to access arXiv API directly.

16 stars

by diegosouzapw

View on GitHub

Best use case

arxiv-mcp is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using arxiv-mcp can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/arxiv-mcp/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/backend/arxiv-mcp/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/arxiv-mcp/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How arxiv-mcp Compares

| Feature / Agent         | arxiv-mcp     | Standard Approach |
| ----------------------- | ------------- | ----------------- |
| Platform Support        | Not specified | Limited / Varies  |
| Context Awareness       | High          | Baseline          |
| Installation Complexity | Unknown       | N/A               |

Frequently Asked Questions

What does this skill do?

Search and retrieve academic papers from arXiv.org using WebFetch and Exa. No MCP server required - uses existing tools to access arXiv API directly.

Where can I find the source code?

The source lives in the diegosouzapw/awesome-omni-skill repository on GitHub, at skills/backend/arxiv-mcp/SKILL.md.

SKILL.md Source

# arXiv Search Skill

<identity>
arXiv Search Skill - Search and retrieve academic papers from arXiv.org using existing tools (WebFetch, Exa). No MCP server installation required.
</identity>

## ✅ No Installation Required

This skill uses **existing tools** to access arXiv:

- **WebFetch** - Direct access to arXiv API
- **Exa** - Semantic search with arXiv filtering

Works immediately - no MCP server, no restart needed.

<capabilities>
- Search academic papers by keywords, authors, categories, or date ranges
- Retrieve detailed paper metadata (title, authors, abstract, categories, PDF link)
- Get specific papers by arXiv ID
- Find related papers based on categories and keywords
- Filter by arXiv categories (cs.AI, cs.LG, cs.CV, math.*, physics.*, etc.)
- No API key required - uses public arXiv API
</capabilities>

<instructions>
<execution_process>

## Method 1: WebFetch with arXiv API (Recommended for specific queries)

The arXiv API is publicly accessible at `http://export.arxiv.org/api/query`.

### Search by Keywords

```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=all:transformer+attention&max_results=10&sortBy=relevance',
  prompt: 'Extract paper titles, authors, abstracts, arXiv IDs, and PDF links from these results',
});
```

### Search by Author

```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=au:LeCun&max_results=10&sortBy=submittedDate',
  prompt: 'Extract paper titles, authors, abstracts, and arXiv IDs',
});
```

### Search by Category

```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG&max_results=15&sortBy=submittedDate',
  prompt: 'Extract paper titles, authors, abstracts, categories, and arXiv IDs',
});
```

### Get Specific Paper by ID

```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?id_list=2301.07041',
  prompt:
    'Extract full details: title, all authors, abstract, categories, published date, PDF link',
});
```
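
Once an ID is known, the abstract and PDF pages follow arXiv's usual URL layout. The following is a minimal sketch, assuming only new-style IDs (`YYMM.NNNNN` with an optional `vN` suffix); `parseArxivId` and `paperLinks` are hypothetical helper names and are not part of this skill:

```javascript
// Hypothetical helpers (not part of the skill): normalize an arXiv reference
// and derive its abstract, PDF, and API links.
// Assumption: only new-style IDs (YYMM.NNNN or YYMM.NNNNN) are handled.
const NEW_STYLE_ID = /(\d{4}\.\d{4,5})(?:v(\d+))?/;

function parseArxivId(input) {
  const match = NEW_STYLE_ID.exec(String(input));
  if (!match) return null;
  return {
    id: match[1],
    // version is null when the reference carries no "vN" suffix
    version: match[2] ? Number(match[2]) : null,
  };
}

function paperLinks(input) {
  const parsed = parseArxivId(input);
  if (!parsed) throw new Error(`Not a new-style arXiv ID: ${input}`);
  return {
    abs: `https://arxiv.org/abs/${parsed.id}`,
    pdf: `https://arxiv.org/pdf/${parsed.id}`,
    api: `http://export.arxiv.org/api/query?id_list=${parsed.id}`,
  };
}

console.log(paperLinks('https://arxiv.org/abs/1706.03762v5'));
```

The `api` link can be passed to WebFetch as in the example above, while the `abs` and `pdf` links are convenient to show to a reader.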

### API Query Parameters

| Parameter      | Description                                                 | Example                                       |
| -------------- | ----------------------------------------------------------- | --------------------------------------------- |
| `search_query` | Search terms with field prefixes                            | `all:transformer`, `au:LeCun`, `ti:attention` |
| `id_list`      | Comma-separated arXiv IDs                                   | `2301.07041,2302.13971`                       |
| `max_results`  | Number of results (default 10; up to 2000 per request)      | `max_results=20`                              |
| `start`        | Offset for pagination                                       | `start=10`                                    |
| `sortBy`       | Sort order: `relevance`, `lastUpdatedDate`, `submittedDate` | `sortBy=submittedDate`                        |
| `sortOrder`    | `ascending` or `descending`                                 | `sortOrder=descending`                        |
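
The parameters in this table can be assembled with the standard `URLSearchParams` class rather than by hand. This is only a sketch: `buildArxivUrl` and its option names are hypothetical. Note that `URLSearchParams` encodes a space as `+` and a literal `+` as `%2B`, so queries are passed here with spaces:

```javascript
// Hypothetical helper: assemble an arXiv API URL from the parameters in the table.
const ARXIV_API = 'http://export.arxiv.org/api/query';

function buildArxivUrl(options = {}) {
  const { searchQuery, idList = [], start = 0, maxResults = 10, sortBy, sortOrder } = options;
  if (!searchQuery && idList.length === 0) {
    throw new Error('Provide searchQuery, idList, or both');
  }
  const params = new URLSearchParams();
  // Write terms with spaces ("ti:transformer AND abs:attention");
  // URLSearchParams turns each space into "+" when serializing.
  if (searchQuery) params.set('search_query', searchQuery);
  if (idList.length > 0) params.set('id_list', idList.join(','));
  params.set('start', String(start));
  params.set('max_results', String(maxResults));
  if (sortBy) params.set('sortBy', sortBy);
  if (sortOrder) params.set('sortOrder', sortOrder);
  return `${ARXIV_API}?${params}`;
}

console.log(
  buildArxivUrl({
    searchQuery: 'all:transformer attention',
    maxResults: 20,
    sortBy: 'submittedDate',
    sortOrder: 'descending',
  })
);
```

Colons and commas come out percent-encoded (`%3A`, `%2C`), which is ordinary URL encoding of the same query.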

### Field Prefixes for search_query

| Prefix | Field      | Example                   |
| ------ | ---------- | ------------------------- |
| `all:` | All fields | `all:machine+learning`    |
| `ti:`  | Title      | `ti:transformer`          |
| `au:`  | Author     | `au:Vaswani`              |
| `abs:` | Abstract   | `abs:attention+mechanism` |
| `cat:` | Category   | `cat:cs.LG`               |
| `co:`  | Comment    | `co:accepted`             |

### Boolean Operators

Combine terms with `AND`, `OR`, `ANDNOT`:

```
search_query=ti:transformer+AND+abs:attention
search_query=au:LeCun+OR+au:Bengio
search_query=cat:cs.LG+ANDNOT+ti:survey
```
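
Queries like these can also be composed programmatically. A small sketch with hypothetical helpers `field` and `combine`, which produce the raw `+`-separated form shown above (meant to be appended to the URL as-is, not passed through a URL encoder):

```javascript
// Hypothetical helpers for composing search_query values.
const OPERATORS = new Set(['AND', 'OR', 'ANDNOT']);

// field('abs', 'attention mechanism') returns 'abs:attention+mechanism'
function field(prefix, term) {
  return `${prefix}:${term.trim().split(/\s+/).join('+')}`;
}

// combine('AND', a, b) returns 'a+AND+b'; unknown operators are rejected
function combine(operator, ...clauses) {
  if (!OPERATORS.has(operator)) {
    throw new Error(`Unsupported operator: ${operator}`);
  }
  return clauses.join(`+${operator}+`);
}

console.log(combine('AND', field('ti', 'transformer'), field('abs', 'attention')));
// prints: ti:transformer+AND+abs:attention
```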

---

## Method 2: Exa Search (Better for semantic/natural language queries)

Use Exa for more natural language queries with arXiv filtering:

### Semantic Search

```javascript
mcp__Exa__web_search_exa({
  query: 'site:arxiv.org transformer architecture attention mechanism deep learning',
  numResults: 10,
});
```

### Recent Papers in a Field

```javascript
mcp__Exa__web_search_exa({
  query: 'site:arxiv.org large language model scaling laws 2024',
  numResults: 15,
});
```

### Author-Focused Search

```javascript
mcp__Exa__web_search_exa({
  query: 'site:arxiv.org author:"Yann LeCun" deep learning',
  numResults: 10,
});
```
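
The three Exa calls above share one shape: a `site:arxiv.org` prefix, free-text terms, and an optional year. A sketch of a helper that builds the argument object, assuming only the `query` and `numResults` fields shown above; `exaArxivArgs` is a hypothetical name:

```javascript
// Hypothetical helper: build the argument object for the Exa search tool.
// Assumption: only the `query` and `numResults` fields from the examples are used.
function exaArxivArgs(topic, { year, numResults = 10 } = {}) {
  const parts = ['site:arxiv.org', topic.trim()];
  if (year) parts.push(String(year)); // a year in the query biases toward recent work
  return { query: parts.join(' '), numResults };
}

console.log(exaArxivArgs('large language model scaling laws', { year: 2024, numResults: 15 }));
```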

---

## Common arXiv Categories

| Category   | Field                           |
| ---------- | ------------------------------- |
| cs.AI      | Artificial Intelligence         |
| cs.LG      | Machine Learning                |
| cs.CL      | Computation and Language (NLP)  |
| cs.CV      | Computer Vision                 |
| cs.SE      | Software Engineering            |
| cs.CR      | Cryptography and Security       |
| stat.ML    | Machine Learning (Statistics)   |
| math.\*    | Mathematics (all subcategories) |
| physics.\* | Physics (all subcategories)     |
| q-bio.\*   | Quantitative Biology            |
| econ.\*    | Economics                       |
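
Because a paper is often cross-listed, it can help to search several categories at once. The sketch below groups `cat:` clauses with `OR` inside parentheses, which the arXiv API accepts when percent-encoded as `%28` and `%29`; `anyCategory` is a hypothetical helper name:

```javascript
// Hypothetical helper: match any of several categories.
// Parentheses are percent-encoded as %28 and %29 so they survive inside a URL.
function anyCategory(categories) {
  const clauses = categories.map((cat) => `cat:${cat}`);
  return `%28${clauses.join('+OR+')}%29`;
}

// Narrow the grouped categories with a title term.
const query = `${anyCategory(['cs.AI', 'cs.LG', 'cs.CL'])}+AND+ti:attention`;
console.log(query);
// prints: %28cat:cs.AI+OR+cat:cs.LG+OR+cat:cs.CL%29+AND+ti:attention
```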

---

## Workflow: Complete Research Process

### Step 1: Initial Search

```javascript
// Start with broad Exa search for semantic matching
mcp__Exa__web_search_exa({
  query: 'site:arxiv.org transformer attention mechanism neural networks',
  numResults: 10,
});
```

### Step 2: Get Specific Papers

```javascript
// Get details for interesting papers by ID
WebFetch({
  url: 'http://export.arxiv.org/api/query?id_list=2301.07041,2302.13971',
  prompt: 'Extract full metadata for each paper: title, authors, abstract, categories, PDF URL',
});
```

### Step 3: Find Related Work

```javascript
// Search by category of interesting paper
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG+AND+ti:attention&max_results=10&sortBy=submittedDate',
  prompt: 'Find related papers, extract titles and abstracts',
});
```

### Step 4: Get Recent Papers

```javascript
// Latest papers in the field
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG&max_results=20&sortBy=submittedDate&sortOrder=descending',
  prompt: 'Extract the 20 most recent machine learning papers',
});
```
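
To go beyond a single page of recent papers, the `start` offset can be stepped across requests. A sketch under stated assumptions: `pageUrls` is a hypothetical helper, the page size is the caller's choice, and nothing is fetched here. The arXiv API documentation asks clients to pause a few seconds between consecutive calls, so the URLs are best fetched one at a time.

```javascript
// Hypothetical helper: list the URLs needed to page through a result set.
// Assumption: total and pageSize are chosen by the caller.
function pageUrls(searchQuery, total, pageSize) {
  const urls = [];
  for (let start = 0; start < total; start += pageSize) {
    // The last page may be shorter than pageSize.
    const count = Math.min(pageSize, total - start);
    urls.push(
      'http://export.arxiv.org/api/query' +
        `?search_query=${searchQuery}&start=${start}&max_results=${count}` +
        '&sortBy=submittedDate&sortOrder=descending'
    );
  }
  return urls;
}

// 50 papers in pages of 20: offsets 0, 20, and 40
for (const url of pageUrls('cat:cs.LG', 50, 20)) console.log(url);
```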

</execution_process>

<best_practices>

1. **Use Exa for discovery**: Natural language queries find semantically related papers
2. **Use WebFetch for precision**: Specific IDs, categories, or API queries
3. **Combine approaches**: Exa to discover, WebFetch to deep-dive
4. **Use specific queries**: "transformer attention mechanism" returns more focused results than "machine learning"
5. **Check multiple categories**: Papers often span cs.AI + cs.LG + cs.CL
6. **Sort by date for recent work**: `sortBy=submittedDate&sortOrder=descending`
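
The "Exa to discover, WebFetch to deep-dive" handoff can be sketched as follows. This assumes each discovery result exposes a plain URL string and that only new-style arXiv IDs matter; `idListUrl` is a hypothetical helper, not part of either tool:

```javascript
// Hypothetical glue between discovery and lookup:
// turn result URLs into a single id_list query.
function idListUrl(resultUrls) {
  const ids = new Set();
  for (const url of resultUrls) {
    const match = /arxiv\.org\/(?:abs|pdf)\/(\d{4}\.\d{4,5})/.exec(url);
    // The Set drops duplicates, e.g. abs and pdf links to the same paper.
    if (match) ids.add(match[1]);
  }
  if (ids.size === 0) return null;
  return `http://export.arxiv.org/api/query?id_list=${[...ids].join(',')}`;
}

console.log(
  idListUrl([
    'https://arxiv.org/abs/1706.03762',
    'https://arxiv.org/pdf/1706.03762v5',
    'https://arxiv.org/abs/2301.07041',
    'https://example.com/not-a-paper',
  ])
);
// prints: http://export.arxiv.org/api/query?id_list=1706.03762,2301.07041
```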

</best_practices>
</instructions>

<examples>
<usage_example>
**Example 1: Search for transformer papers**:

```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=ti:transformer+AND+abs:attention&max_results=10&sortBy=relevance',
  prompt: 'Extract paper titles, authors, abstracts, and arXiv IDs',
});
```

**Example 2: Find papers by researcher**:

```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=au:Vaswani&max_results=15',
  prompt: 'List the returned papers by this author with titles and dates',
});
```

**Example 3: Get recent ML papers**:

```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG&max_results=20&sortBy=submittedDate&sortOrder=descending',
  prompt: 'Extract the 20 most recent machine learning papers with titles and abstracts',
});
```

**Example 4: Semantic search with Exa**:

```javascript
mcp__Exa__web_search_exa({
  query: 'site:arxiv.org multimodal large language models vision 2024',
  numResults: 10,
});
```

**Example 5: Get specific paper details**:

```javascript
WebFetch({
  url: 'http://export.arxiv.org/api/query?id_list=1706.03762',
  prompt: "Extract complete details for the 'Attention Is All You Need' paper",
});
```

</usage_example>
</examples>

## Agent Integration

This skill is automatically assigned to:

- **researcher** - Academic research, literature review
- **scientific-research-expert** - Deep scientific analysis
- **developer** - Finding technical papers for implementation

## Memory Protocol (MANDATORY)

**Before starting:**

```bash
cat .claude/context/memory/learnings.md
```

**After completing:**

- New pattern -> `.claude/context/memory/learnings.md`
- Issue found -> `.claude/context/memory/issues.md`
- Decision made -> `.claude/context/memory/decisions.md`

> ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.
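
The "after completing" mapping can be sketched as a small Node.js helper. This is an illustration only: `recordMemory` is a hypothetical name, and the dated Markdown bullet format is an assumption, since the skill does not prescribe how entries are written:

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: append a dated note to one of the memory files.
// Assumption: entries are stored as Markdown bullets.
const MEMORY_FILES = new Map([
  ['pattern', 'learnings.md'],
  ['issue', 'issues.md'],
  ['decision', 'decisions.md'],
]);

function recordMemory(kind, note, baseDir = '.claude/context/memory') {
  const fileName = MEMORY_FILES.get(kind);
  if (!fileName) throw new Error(`Unknown memory kind: ${kind}`);
  fs.mkdirSync(baseDir, { recursive: true }); // create the directory if missing
  const filePath = path.join(baseDir, fileName);
  const date = new Date().toISOString().slice(0, 10); // YYYY-MM-DD (UTC)
  fs.appendFileSync(filePath, `- ${date}: ${note}\n`);
  return filePath;
}
```

For example, `recordMemory('pattern', 'Exa for discovery, id_list for details')` would append one line to `.claude/context/memory/learnings.md`.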

Related Skills

arxiv-paper-extract

from diegosouzapw/awesome-omni-skill

Extract, translate and save arXiv CS.CV papers for a specific date. Use when user asks to fetch arXiv papers, download paper lists, extract CV papers, translate paper titles to Chinese, or save paper metadata from arxiv.org/list/cs.CV.

arxivterminal

from diegosouzapw/awesome-omni-skill

CLI tool (arxivterminal) for fetching, searching, and managing arXiv papers locally. Use when working with arXiv papers using the arxivterminal command - fetching new papers by category, searching the local database, viewing papers from specific dates, or managing the local paper database.

arxiv-reader

from diegosouzapw/awesome-omni-skill

A skill that retrieves and summarizes the content of arXiv papers. Use when the URL has the form arxiv.org/abs/{paper ID}. Downloads the PDF and reads it with the Read tool.

arxiv-search

from diegosouzapw/awesome-omni-skill

Search arXiv preprint repository for papers in physics, mathematics, computer science, quantitative biology, and related fields

bgo

from diegosouzapw/awesome-omni-skill

Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.

Coding & Development

moai-lang-r

from diegosouzapw/awesome-omni-skill

R 4.4+ best practices with testthat 3.2, lintr 3.2, and data analysis patterns.

moai-lang-python

from diegosouzapw/awesome-omni-skill

Python 3.13+ development specialist covering FastAPI, Django, async patterns, data science, testing with pytest, and modern Python features. Use when developing Python APIs, web applications, data pipelines, or writing tests.

moai-icons-vector

from diegosouzapw/awesome-omni-skill

Vector icon libraries ecosystem guide covering 10+ major libraries with 200K+ icons, including React Icons (35K+), Lucide (1000+), Tabler Icons (5900+), Iconify (200K+), Heroicons, Phosphor, and Radix Icons with implementation patterns, decision trees, and best practices.

moai-foundation-trust

from diegosouzapw/awesome-omni-skill

Complete TRUST 4 principles guide covering Test First, Readable, Unified, Secured. Validation methods, enterprise quality gates, metrics, and November 2025 standards. Enterprise v4.0 with 50+ software quality standards references.

moai-foundation-memory

from diegosouzapw/awesome-omni-skill

Persistent memory across sessions using MCP Memory Server for user preferences, project context, and learned patterns

moai-foundation-core

from diegosouzapw/awesome-omni-skill

MoAI-ADK's foundational principles - TRUST 5, SPEC-First TDD, delegation patterns, token optimization, progressive disclosure, modular architecture, agent catalog, command reference, and execution rules for building AI-powered development workflows

moai-cc-claude-md

from diegosouzapw/awesome-omni-skill

Authoring CLAUDE.md Project Instructions. Design project-specific AI guidance, document workflows, define architecture patterns. Use when creating CLAUDE.md files for projects, documenting team standards, or establishing AI collaboration guidelines.