arxiv-search

Search arXiv for preprints in physics, math, CS, quantitative biology, quantitative finance, statistics, electrical engineering, economics. Use when: (1) finding preprints by topic, (2) searching by author, (3) browsing arXiv categories, (4) getting paper metadata/abstracts. NOT for: published journal articles (use crossref-search), biomedical (use pubmed-search).

564 stars

Best use case

arxiv-search is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using arxiv-search should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/arxiv-search/SKILL.md --create-dirs "https://raw.githubusercontent.com/beita6969/ScienceClaw/main/skills/arxiv-search/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/arxiv-search/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How arxiv-search Compares

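| Feature / Agent         | arxiv-search  | Standard Approach |
| ----------------------- | ------------- | ----------------- |
| Platform Support        | Not specified | Limited / Varies  |
| Context Awareness       | High          | Baseline          |
| Installation Complexity | Unknown       | N/A               |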

Frequently Asked Questions

What does this skill do?

Search arXiv for preprints in physics, math, CS, quantitative biology, quantitative finance, statistics, electrical engineering, economics. Use when: (1) finding preprints by topic, (2) searching by author, (3) browsing arXiv categories, (4) getting paper metadata/abstracts. NOT for: published journal articles (use crossref-search), biomedical (use pubmed-search).

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# arXiv Search

Search arXiv preprints via public API. Covers physics, math, CS, q-bio, q-fin,
statistics, electrical engineering, and economics.

## API Endpoint

```bash
curl -s "http://export.arxiv.org/api/query?search_query=all:transformer+attention&start=0&max_results=5"
```

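Parameters: `search_query=` (required unless `id_list=` is given), `id_list=`
(comma-separated arXiv IDs for direct lookup), `start=` (zero-based pagination offset),
`max_results=` (default 10; a query is capped at 30000 results, returned in slices of at most 2000 per request),
`sortBy=relevance|lastUpdatedDate|submittedDate`, `sortOrder=ascending|descending`.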
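The parameters above can be assembled programmatically rather than by hand. The following is a minimal sketch, not part of the skill itself: `build_query_url` and `BASE_URL` are hypothetical names introduced here for illustration, and only the standard library is used.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl example above.
BASE_URL = "http://export.arxiv.org/api/query"


def build_query_url(search_query=None, id_list=None, start=0, max_results=10,
                    sort_by=None, sort_order=None):
    """Assemble a query URL from the documented parameters (hypothetical helper)."""
    if not search_query and not id_list:
        raise ValueError("provide search_query, id_list, or both")
    params = {}
    if search_query:
        params["search_query"] = search_query
    if id_list:
        params["id_list"] = ",".join(id_list)
    params["start"] = start
    params["max_results"] = max_results
    if sort_by:
        params["sortBy"] = sort_by
    if sort_order:
        params["sortOrder"] = sort_order
    # urlencode percent-encodes reserved characters and turns spaces into "+".
    return f"{BASE_URL}?{urlencode(params)}"


url = build_query_url("all:transformer attention", max_results=5)
print(url)
# http://export.arxiv.org/api/query?search_query=all%3Atransformer+attention&start=0&max_results=5
```

The colon is emitted as `%3A`, which is the standard percent-encoding of `:` and decodes to the same query on the server side.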

## Query Syntax

**Field prefixes**: `ti:` title, `au:` author, `abs:` abstract, `co:` comment,
`jr:` journal ref, `cat:` category, `all:` all fields.

**Boolean**: `AND`, `OR`, `ANDNOT`. Example:
```bash
curl -s "http://export.arxiv.org/api/query?search_query=au:bengio+AND+cat:cs.LG+AND+ti:attention&max_results=10"
```
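Composing field prefixes and Boolean operators by string concatenation is error-prone, so a few small helpers can make the intent explicit. This is an illustrative sketch: `field`, `all_of`, `any_of`, `excluding`, and `encode` are hypothetical names, not part of any arXiv client library. It assumes the API's documented use of double quotes for phrases and parentheses for grouping.

```python
from urllib.parse import quote_plus


def field(prefix, value):
    """Return one field-prefixed term; multi-word values are quoted as a phrase."""
    return f'{prefix}:"{value}"' if " " in value else f"{prefix}:{value}"


def all_of(*terms):
    """Join terms with AND."""
    return " AND ".join(terms)


def any_of(*terms):
    """Join terms with OR, wrapped in parentheses so the group stays together."""
    return "(" + " OR ".join(terms) + ")"


def excluding(query, term):
    """Append an ANDNOT clause to an existing query."""
    return f"{query} ANDNOT {term}"


def encode(query):
    """URL-encode a query: spaces become '+', quotes and parentheses are escaped."""
    return quote_plus(query, safe=":")


query = all_of(field("au", "bengio"), field("cat", "cs.LG"), field("ti", "attention"))
print(encode(query))
# au:bengio+AND+cat:cs.LG+AND+ti:attention
```

The encoded string matches the `search_query` value in the curl example above and can be appended to the endpoint as is.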

## Category Codes

**Physics**: `astro-ph` (.CO/.EP/.GA/.HE/.IM/.SR), `cond-mat` (.dis-nn/.mes-hall/.mtrl-sci/.soft/.stat-mech/.str-el/.supr-con), `hep-ex`, `hep-lat`, `hep-ph`, `hep-th`, `quant-ph`, `gr-qc`, `nucl-ex`, `nucl-th`

**CS**: `cs.AI`, `cs.CL` (NLP), `cs.CV`, `cs.LG` (ML), `cs.CR`, `cs.DB`, `cs.DS`, `cs.SE`, `cs.RO`

**Math**: `math.AG`, `math.AP`, `math.CO`, `math.PR`, `math.ST`

**Other**: `q-bio` (.BM/.CB/.GN/.MN/.NC/.PE/.QM/.SC/.TO), `q-fin` (.CP/.EC/.GN/.MF/.PM/.PR/.RM/.ST/.TR), `stat` (.AP/.CO/.ME/.ML/.OT/.TH), `eess` (.AS/.IV/.SP/.SY), `econ` (.EM/.GN/.TH)

## Response Parsing

The API returns Atom XML. Parse with Python:

```bash
curl -s "http://export.arxiv.org/api/query?search_query=ti:large+language+model&max_results=5&sortBy=submittedDate&sortOrder=descending" | python3 -c "
import sys, xml.etree.ElementTree as ET
ns = {'a': 'http://www.w3.org/2005/Atom'}
root = ET.parse(sys.stdin).getroot()
for entry in root.findall('a:entry', ns):
    title = entry.find('a:title', ns).text.strip().replace('\n', ' ')
    aid = entry.find('a:id', ns).text.strip().split('/abs/')[-1]
    pub = entry.find('a:published', ns).text[:10]
    authors = ', '.join(a.find('a:name', ns).text for a in entry.findall('a:author', ns))
    print(f'[{aid}] {pub} | {title}')
    print(f'  Authors: {authors}\n')
"
```
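The same parsing logic can be kept as a reusable function that returns structured data instead of printing. The sketch below is one possible shape, assuming the feed layout used in the one-liner above. `parse_entries` is a hypothetical name, and `SAMPLE_FEED` is a made-up placeholder entry, not a real paper.

```python
import xml.etree.ElementTree as ET

ATOM = {"a": "http://www.w3.org/2005/Atom"}

# Made-up placeholder feed for illustration only; not real arXiv data.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <published>2000-01-01T00:00:00Z</published>
    <title>Placeholder title
      wrapped across lines</title>
    <author><name>First Author</name></author>
    <author><name>Second Author</name></author>
  </entry>
</feed>"""


def parse_entries(xml_text):
    """Return a list of dicts with id, published date, title, and authors."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("a:entry", ATOM):
        raw_id = entry.findtext("a:id", default="", namespaces=ATOM)
        raw_title = entry.findtext("a:title", default="", namespaces=ATOM)
        published = entry.findtext("a:published", default="", namespaces=ATOM)
        entries.append({
            "id": raw_id.strip().split("/abs/")[-1],
            "published": published[:10],
            # split()/join collapses the line breaks and indentation inside titles.
            "title": " ".join(raw_title.split()),
            "authors": [a.findtext("a:name", default="", namespaces=ATOM)
                        for a in entry.findall("a:author", ATOM)],
        })
    return entries


entries = parse_entries(SAMPLE_FEED)
print(entries[0]["id"], "|", entries[0]["title"])
```

Using `findtext` with a default avoids an `AttributeError` when an entry lacks one of the elements, so partial data is reported as empty rather than crashing the parse.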

## Direct Lookup and Pagination

```bash
# By ID
curl -s "http://export.arxiv.org/api/query?id_list=2301.07041,2302.13971"

# Pagination
curl -s "http://export.arxiv.org/api/query?search_query=cat:cs.AI&start=0&max_results=25&sortBy=submittedDate&sortOrder=descending"
curl -s "http://export.arxiv.org/api/query?search_query=cat:cs.AI&start=25&max_results=25&sortBy=submittedDate&sortOrder=descending"
```
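The two pagination requests above follow a simple pattern: `start` advances by the page size until the desired number of results is covered. A small generator can produce those URLs. This is a sketch; `page_urls` and `BASE_URL` are hypothetical names, and the sort parameters mirror the curl example.

```python
from urllib.parse import urlencode

BASE_URL = "http://export.arxiv.org/api/query"


def page_urls(search_query, total, page_size=25):
    """Yield one URL per page until `total` results are covered."""
    for start in range(0, total, page_size):
        params = {
            "search_query": search_query,
            "start": start,
            # The last page may be shorter than page_size.
            "max_results": min(page_size, total - start),
            "sortBy": "submittedDate",
            "sortOrder": "descending",
        }
        # safe=":" keeps the field prefix readable, as in the curl examples.
        yield f"{BASE_URL}?{urlencode(params, safe=':')}"


urls = list(page_urls("cat:cs.AI", total=60))
for u in urls:
    print(u)
```

For `total=60` and the default page size, this yields three URLs with `start` values 0, 25, and 50, the first two matching the curl commands above.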

## Rate Limiting

arXiv's API guidance asks clients to send at most 1 request every 3 seconds; add that delay between calls for bulk queries.
For large-scale harvesting, use the OAI-PMH bulk access endpoint instead.
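One way to enforce the 3-second spacing in a script is a small throttle object that is called before every request. The sketch below is an assumption about how one might structure this, not something the skill ships. `Throttle` is a hypothetical class, and the clock and sleep functions are injectable so the behavior can be checked without actually waiting.

```python
import time


class Throttle:
    """Block until at least `min_interval` seconds have passed since the last call."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In use, a script would create one `Throttle()` and call `throttle.wait()` immediately before each request. The first call returns at once, and later calls sleep only for whatever part of the interval has not already elapsed.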

## Best Practices

1. Use `sortBy=submittedDate&sortOrder=descending` for latest papers.
2. Combine `cat:` with keyword searches for targeted results.
3. Check `opensearch:totalResults` in the response for total match count.
4. For PDF access, replace `/abs/` with `/pdf/` in the paper URL.
5. Use `id_list` for direct lookups (faster and more reliable).
6. URL-encode spaces as `+` in query terms.
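Practices 3 and 4 can be sketched in a few lines. This example assumes the feed declares the OpenSearch 1.1 namespace for `totalResults`. `total_results` and `pdf_url` are hypothetical helper names, and `SAMPLE_FEED` is a made-up placeholder, not a real response.

```python
import xml.etree.ElementTree as ET

NS = {
    "a": "http://www.w3.org/2005/Atom",
    "opensearch": "http://a9.com/-/spec/opensearch/1.1/",
}

# Made-up placeholder feed for illustration only; not real arXiv data.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <opensearch:totalResults>42</opensearch:totalResults>
  <entry><id>http://arxiv.org/abs/0000.00000v1</id></entry>
</feed>"""


def total_results(xml_text):
    """Return the total match count, or None if the element is absent."""
    root = ET.fromstring(xml_text)
    text = root.findtext("opensearch:totalResults", namespaces=NS)
    return int(text) if text is not None else None


def pdf_url(abs_url):
    """Derive the PDF link from an abstract-page URL."""
    return abs_url.replace("/abs/", "/pdf/", 1)


print(total_results(SAMPLE_FEED))
print(pdf_url("http://arxiv.org/abs/0000.00000v1"))
```

Comparing the total against the number of entries already fetched shows whether further pages remain to be requested.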

## Zero-Hallucination Rule

NEVER fabricate results from training data. Every paper title, author, DOI, PMID, citation count, and metadata detail presented to the user MUST come from an actual API response in this conversation. If the API returns no results or partial data, report exactly what was returned. Do not "fill in" missing details from memory.

Related Skills

wikipedia-search

564
from beita6969/ScienceClaw

Search and fetch structured content from Wikipedia using the MediaWiki API for reliable, encyclopedic information

social-science-research

564
from beita6969/ScienceClaw

Orchestrates a social science research workflow from literature review through data collection, text analysis, statistical modeling, and report generation. Use when conducting empirical social science research, policy analysis, or mixed-methods studies. NOT for pure natural science analysis or clinical trial data.

search-strategy

564
from beita6969/ScienceClaw

research-reflection

564
from beita6969/ScienceClaw

Reflect on completed research tasks to improve future performance. Use when: a research task has just been completed and the agent should evaluate its own process, store lessons learned, or retrieve past reflections before starting new work. NOT for: active research execution or data analysis.

research-lookup

564
from beita6969/ScienceClaw

Look up current research information using the Parallel Chat API (primary) or Perplexity sonar-pro-search (academic paper searches). Automatically routes queries to the best backend. Use for finding papers, gathering research data, and verifying scientific information.

research-literature

564
from beita6969/ScienceClaw

research-grants

564
from beita6969/ScienceClaw

Write competitive research proposals for NSF, NIH, DOE, DARPA, and Taiwan NSTC. Agency-specific formatting, review criteria, budget preparation, broader impacts, significance statements, innovation narratives, and compliance with submission requirements.

research-ethics

564
from beita6969/ScienceClaw

Guides research ethics compliance including IRB protocol preparation, informed consent document drafting, research integrity standards, data management plans, and ethical considerations for human/animal subjects; trigger when users discuss IRB, ethical approval, consent forms, or responsible conduct of research.

pubmed-search

564
from beita6969/ScienceClaw

Search PubMed/MEDLINE for biomedical literature via NCBI E-utilities API. Use when: (1) searching medical/biomedical papers, (2) finding clinical studies, (3) querying with MeSH terms, (4) retrieving abstracts by PMID. NOT for: non-biomedical papers (use arxiv-search or semantic-scholar), full-text access (PubMed provides abstracts), or social science literature.

psychology-research

564
from beita6969/ScienceClaw

Conduct psychological research analysis including mental health, cognitive science, and behavioral studies

perplexity-search

564
from beita6969/ScienceClaw

Perform AI-powered web searches with real-time information using Perplexity models via LiteLLM and OpenRouter. This skill should be used when conducting web searches for current information, finding recent scientific literature, getting grounded answers with source citations, or accessing information beyond the model knowledge cutoff. Provides access to multiple Perplexity models including Sonar Pro, Sonar Pro Search (advanced agentic search), and Sonar Reasoning Pro through a single OpenRouter API key.

openalex-search

564
from beita6969/ScienceClaw

Open academic metadata via OpenAlex API. Use when: user needs author profiles, institution data, concept mapping, or open citation data. NOT for: full-text search or downloading papers.