literature-search

Comprehensive multi-database scientific literature search orchestrating Semantic Scholar, OpenAlex, arXiv, PubMed, and CrossRef. Use when: (1) systematic literature review, (2) finding all relevant papers on a topic, (3) checking state of the art, (4) building comprehensive bibliographies. NOT for: single-database queries (use specific search skills), data analysis (use code-execution).

564 stars

Best use case

literature-search is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using literature-search should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/literature-search/SKILL.md --create-dirs "https://raw.githubusercontent.com/beita6969/ScienceClaw/main/skills/literature-search/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/literature-search/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How literature-search Compares

| Feature / Agent | literature-search | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Comprehensive multi-database scientific literature search orchestrating Semantic Scholar, OpenAlex, arXiv, PubMed, and CrossRef. Use when: (1) systematic literature review, (2) finding all relevant papers on a topic, (3) checking state of the art, (4) building comprehensive bibliographies. NOT for: single-database queries (use specific search skills), data analysis (use code-execution).

Where can I find the source code?

The source code is on GitHub in the beita6969/ScienceClaw repository, under skills/literature-search/.

SKILL.md Source

# Literature Search (Meta Skill)

Orchestrate comprehensive literature searches across multiple databases.
**Always execute real API calls** — never fabricate results or rely on training data.

## Priority Order of Databases

1. **Semantic Scholar** (PRIMARY) — best relevance ranking, AI TLDR summaries, citation graph
2. **OpenAlex** (PRIMARY) — 250M+ works, powerful filtering, open access URLs
3. **arXiv** — preprints in physics, math, CS, biology, finance, statistics
4. **PubMed** — biomedical and life sciences (NCBI may be unreachable from some networks)
5. **CrossRef** — DOI resolution and metadata only (NOT for search — poor relevance ranking)

**IMPORTANT**: CrossRef search results are poorly ranked by relevance. Never use CrossRef
as the primary search engine. Use it only for DOI-based lookups and metadata enrichment.

## Mandatory Search Protocol

Every literature search MUST follow this protocol:

### Step 1: Semantic Scholar Search (always do this first)

```bash
# Primary search — returns papers ranked by relevance with AI summaries
curl -s "https://api.semanticscholar.org/graph/v1/paper/search?\
query=YOUR+SEARCH+TERMS&limit=10&\
fields=title,authors,year,abstract,citationCount,influentialCitationCount,\
isOpenAccess,openAccessPdf,url,externalIds,tldr,venue,publicationDate"
```

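Parse the results by appending this pipe to the curl command above:
```bash
| python3 -c "
import sys, json
data = json.load(sys.stdin)
# Error responses (e.g. rate limiting) carry no 'total' or 'data' keys
print(f'Total: {data.get(\"total\", 0)} papers')
for i, p in enumerate(data.get('data') or []):
    authors = ', '.join(a['name'] for a in (p.get('authors') or [])[:3])
    if len(p.get('authors') or []) > 3: authors += ' et al.'
    # 'tldr' may be null, or present with a null 'text'
    tldr = p.get('tldr') or {}
    tldr_text = (tldr.get('text') or 'N/A')[:150]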
    oa = '🔓' if p.get('isOpenAccess') else '🔒'
    doi = (p.get('externalIds') or {}).get('DOI', '')
    print(f'[{i+1}] {p[\"title\"]}')
    print(f'    {authors} ({p.get(\"year\",\"?\")}) — {p.get(\"venue\",\"?\")}')
    print(f'    Cited: {p.get(\"citationCount\",0)} (influential: {p.get(\"influentialCitationCount\",0)}) {oa}')
    print(f'    TLDR: {tldr_text}')
    print(f'    DOI: {doi}')
    print()
"
```

Useful filters (append to the URL as extra query parameters):
- `year=2022-2025`: restrict by year range
- `fieldsOfStudy=Computer+Science`: filter by discipline (URL-encode spaces)
- `minCitationCount=10`: only papers with at least 10 citations
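
The filters above can be attached programmatically so that spaces and commas are always URL-encoded. This is a minimal sketch, not part of the skill itself: the function name `build_s2_search_url` and its keyword arguments are hypothetical, and only the endpoint and parameter names come from the curl command above.

```python
from urllib.parse import urlencode

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_s2_search_url(query, limit=10, year=None,
                        fields_of_study=None, min_citations=None):
    """Return a search URL with optional filters (hypothetical helper)."""
    params = {
        "query": query,
        "limit": limit,
        "fields": "title,authors,year,citationCount,externalIds,tldr,venue",
    }
    # Only add a filter when the caller asked for it
    if year:
        params["year"] = year
    if fields_of_study:
        params["fieldsOfStudy"] = fields_of_study
    if min_citations is not None:
        params["minCitationCount"] = min_citations
    # urlencode percent-encodes commas and turns spaces into '+'
    return f"{S2_SEARCH}?{urlencode(params)}"


url = build_s2_search_url(
    "graph neural networks",
    year="2022-2025",
    fields_of_study="Computer Science",
    min_citations=10,
)
print(url)
```

The resulting URL can be passed to `curl -s` in place of the hand-written one.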

### Step 2: OpenAlex Search (for broader coverage + OA links)

```bash
# Complementary search with powerful filtering
curl -s "https://api.openalex.org/works?\
search=YOUR+SEARCH+TERMS&per_page=10&\
sort=relevance_score:desc&\
select=title,publication_year,cited_by_count,doi,authorships,open_access,\
primary_location,abstract_inverted_index&\
mailto=scienceclaw@openclaw.ai"
```
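
OpenAlex returns abstracts as `abstract_inverted_index`, a mapping from each word to the positions where it occurs, rather than as plain text. The sketch below shows one way to rebuild a readable abstract from it; the function name is hypothetical and the sample data is made up for illustration.

```python
def abstract_from_inverted_index(inverted_index):
    """Rebuild plain text from a {word: [positions]} mapping."""
    if not inverted_index:
        return ""  # OpenAlex returns null when no abstract is available
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    # Sorting by position restores the original word order
    return " ".join(word for _, word in sorted(positioned))


# Made-up sample, not a real API response
sample = {"Deep": [0], "learning": [1, 4], "improves": [2], "representation": [3]}
print(abstract_from_inverted_index(sample))
```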

Useful filters (append to the URL as `&filter=...`; join multiple filters with commas):
- `publication_year:2023-2025` — year range
- `cited_by_count:>50` — minimum citations
- `open_access.is_oa:true` — only open access
- `authorships.author.id:A5023888391` — by author OpenAlex ID
- `concepts.id:C41008148` — by concept (e.g., Computer Science)
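
Several of these filters can be combined into a single `filter=` parameter. The following is a rough sketch under stated assumptions: `build_openalex_url` is a hypothetical helper, and the `mailto` default is a placeholder address that should be replaced with a real contact.

```python
from urllib.parse import urlencode


def build_openalex_url(search, filters=None, per_page=10,
                       mailto="you@example.org"):
    """Return an OpenAlex works URL (hypothetical helper)."""
    params = {
        "search": search,
        "per_page": per_page,
        "sort": "relevance_score:desc",
        "mailto": mailto,
    }
    if filters:
        # OpenAlex expects key:value pairs joined by commas
        params["filter"] = ",".join(f"{k}:{v}" for k, v in filters.items())
    # Keep ':' ',' and '@' readable; everything else is percent-encoded
    return "https://api.openalex.org/works?" + urlencode(params, safe=":,@")


url = build_openalex_url(
    "graph neural networks",
    filters={
        "publication_year": "2023-2025",
        "cited_by_count": ">50",
        "open_access.is_oa": "true",
    },
)
print(url)
```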

### Step 3: Discipline-Specific Database (if relevant)

| Discipline | Additional Database | Skill |
|---|---|---|
| Biomedicine / Clinical | PubMed | `pubmed-search` |
| Physics / CS / Math | arXiv | `arxiv-search` |
| Computer Science | DBLP | `dblp-search` |
| Economics / Social Sci | SSRN/RePEc | `ssrn-econpapers` |

### Step 4: Deduplication and Ranking

Match across databases by DOI (most reliable), then normalized title.
Rank by: Semantic Scholar relevance > citation count > influential citations > recency.
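
One possible way to implement this step is sketched below. It assumes records have already been flattened into dicts with the hypothetical keys `doi`, `title`, `s2_rank` (position in the Semantic Scholar result list), `citations`, `influential`, and `year`; none of these names come from the APIs themselves. A record counts as a duplicate if either its DOI or its normalized title has been seen, so a preprint and its published version with the same title are merged. The sample records are placeholders.

```python
import re


def normalize_title(title):
    """Lowercase and collapse punctuation so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", (title or "").lower()).strip()


def normalize_doi(doi):
    """OpenAlex returns DOIs as full URLs; strip the prefix (Python 3.9+)."""
    return (doi or "").lower().removeprefix("https://doi.org/")


def dedupe(records):
    """Keep the first record seen for each DOI or normalized title."""
    seen, unique = set(), []
    for rec in records:
        keys = set()
        doi = normalize_doi(rec.get("doi"))
        if doi:
            keys.add("doi:" + doi)
        title = normalize_title(rec.get("title"))
        if title:
            keys.add("title:" + title)
        if keys & seen:
            continue  # already have this paper from another database
        seen |= keys
        unique.append(rec)
    return unique


def rank(records):
    """Sort by S2 rank, then citations, influential citations, recency."""
    return sorted(records, key=lambda r: (
        r.get("s2_rank", float("inf")),  # papers not found in S2 go last
        -(r.get("citations") or 0),
        -(r.get("influential") or 0),
        -(r.get("year") or 0),
    ))


# Placeholder records, not real papers
records = [
    {"title": "Example Paper A", "doi": "10.0000/example.a",
     "s2_rank": 1, "citations": 40, "influential": 3, "year": 2022},
    {"title": "Example paper A.", "doi": "https://doi.org/10.0000/EXAMPLE.A",
     "citations": 41, "year": 2022},
    {"title": "Example Paper B", "doi": None,
     "s2_rank": 0, "citations": 5, "influential": 0, "year": 2024},
    {"title": "Example Paper C", "doi": "10.0000/example.c",
     "citations": 90, "year": 2020},
]
ranked = rank(dedupe(records))
print([r["title"] for r in ranked])
```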

### Step 5: Citation Chaining (for thorough searches)

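For the top 3-5 seed papers, retrieve their references and citations. Replace `{paperId}` with a Semantic Scholar paper ID or a prefixed identifier such as `DOI:10.xxxx/xxxxx`: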
```bash
# Forward citations (who cites this paper)
curl -s "https://api.semanticscholar.org/graph/v1/paper/{paperId}/citations?\
fields=title,year,citationCount,venue&limit=20"

# Backward references (what this paper cites)
curl -s "https://api.semanticscholar.org/graph/v1/paper/{paperId}/references?\
fields=title,year,citationCount,venue&limit=20"
```
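
The two endpoints wrap each result differently: citations nest the paper under `citingPaper`, references under `citedPaper`. A small sketch of building the URLs and unwrapping the responses follows; `chain_url` and `extract_papers` are hypothetical helper names, and the sample response is made up.

```python
S2_PAPER = "https://api.semanticscholar.org/graph/v1/paper"


def chain_url(paper_id, direction, limit=20):
    """Build a citations or references URL for one seed paper."""
    if direction not in ("citations", "references"):
        raise ValueError("direction must be 'citations' or 'references'")
    return (f"{S2_PAPER}/{paper_id}/{direction}"
            f"?fields=title,year,citationCount,venue&limit={limit}")


def extract_papers(response, direction):
    """Unwrap the nested paper objects from a parsed JSON response."""
    key = "citingPaper" if direction == "citations" else "citedPaper"
    # Skip entries where the nested paper is missing or null
    return [item[key] for item in (response.get("data") or []) if item.get(key)]


# Made-up response shape for illustration
sample = {"data": [
    {"citingPaper": {"paperId": "abc123", "title": "Example citing paper", "year": 2024}},
    {"citingPaper": None},
]}
print(chain_url("DOI:10.0000/example", "references"))
print(extract_papers(sample, "citations"))
```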

### Step 6: Paper Recommendations (for discovery)

```bash
# Find similar papers
curl -s "https://api.semanticscholar.org/recommendations/v1/papers/\
forpaper/{paperId}?fields=title,year,citationCount,tldr&limit=10"
```

## Search Quality Checklist

Before presenting results, verify:
- [ ] At least Semantic Scholar was searched with a real API call
- [ ] Results contain real DOIs/paper IDs (not fabricated)
- [ ] Citation counts are from the API (not estimated)
- [ ] Each paper has a verifiable identifier (DOI, arXiv ID, PMID, or S2 URL)
- [ ] TLDR summaries are from Semantic Scholar (not self-generated)
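
The identifier check in this list can be partly automated. The sketch below assumes paper dicts in the shape returned by the Semantic Scholar search above (`externalIds` with `DOI`, `ArXiv`, and `PubMed` keys, plus `url`); the function names are hypothetical and the sample papers are placeholders.

```python
def verifiable_ids(paper):
    """Return the identifiers a reader could use to look the paper up."""
    ext = paper.get("externalIds") or {}
    candidates = {
        "DOI": ext.get("DOI"),
        "arXiv": ext.get("ArXiv"),
        "PMID": ext.get("PubMed"),
        "URL": paper.get("url"),
    }
    # Drop identifiers the API did not return
    return {name: value for name, value in candidates.items() if value}


def papers_without_ids(papers):
    """List titles that fail the 'verifiable identifier' checklist item."""
    return [p.get("title") for p in papers if not verifiable_ids(p)]


# Placeholder papers, not real API output
papers = [
    {"title": "Example Paper A", "externalIds": {"DOI": "10.0000/example.a"}},
    {"title": "Example Paper B", "externalIds": None, "url": None},
]
print(papers_without_ids(papers))
```

Any title this returns should be dropped or re-queried rather than patched from memory.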

## Output Format

```
[1] Title
    Authors (Year) — Venue
    Cited: N (influential: M) 🔓/🔒
    TLDR: AI-generated summary from Semantic Scholar
    DOI: 10.xxxx/xxxxx | arXiv: xxxx.xxxxx | PMID: xxxxxxxx
    URL: https://...
```

## Zero-Hallucination Rule (ABSOLUTE)

**Every citation detail must come from a tool result in this conversation.**

- NEVER fabricate or "fill in" paper titles, authors, DOIs, PMIDs, citation counts, or journal names from training data
- NEVER say "a well-known study by X et al." without having searched for it first
- If a search returns 0 results, report that honestly — do not substitute training knowledge
- If a tool returns partial metadata (title but no DOI), report only what the tool returned
- Before presenting any paper, verify: Did a tool in THIS conversation return this information?

## Common Pitfalls to Avoid

1. **DO NOT** use CrossRef `/works?query=` for discovery — its relevance ranking is poor
2. **DO NOT** fabricate paper titles, authors, or DOIs from training knowledge
3. **DO NOT** skip API calls and rely on what you "know" about the literature
4. **DO NOT** present Semantic Scholar TLDRs as your own analysis
5. **ALWAYS** run the actual curl commands and parse real responses
6. **ALWAYS** include at least one verifiable identifier per paper
7. **ALWAYS** self-check: every detail in your response must trace back to a tool result

## Rate Limits

| Database | Without Key | With Key |
|---|---|---|
| Semantic Scholar | 100 req/5 min | 1/sec sustained |
| OpenAlex | 10 req/sec (polite pool with mailto) | Same |
| arXiv | ~1 req/3 sec | Same |
| CrossRef | 1 req/sec | 50 req/sec (with mailto) |
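
A simple client-side throttle can help stay inside these limits. This is a sketch under stated assumptions: the `Throttle` class is hypothetical, and the intervals are derived from the "Without Key" column above (for example, 100 requests per 5 minutes is one request every 3 seconds). The demo injects a fake clock and sleep function so it runs instantly.

```python
import time

# Minimum seconds between requests, derived from the no-key limits above
MIN_INTERVAL = {
    "semantic_scholar": 3.0,  # 100 req / 5 min
    "openalex": 0.1,          # 10 req / sec
    "arxiv": 3.0,             # ~1 req / 3 sec
    "crossref": 1.0,          # 1 req / sec
}


class Throttle:
    """Wait long enough between calls to the same database."""

    def __init__(self, intervals=MIN_INTERVAL,
                 clock=time.monotonic, sleep=time.sleep):
        self.intervals, self.clock, self.sleep = intervals, clock, sleep
        self.last = {}  # database name -> time of the last request

    def wait(self, database):
        now = self.clock()
        ready_at = self.last.get(database, float("-inf")) + self.intervals[database]
        delay = max(0.0, ready_at - now)
        if delay:
            self.sleep(delay)
        self.last[database] = now + delay
        return delay


# Demo with a fake clock so nothing actually sleeps
slept, fake_now = [], [0.0]

def fake_sleep(seconds):
    slept.append(seconds)
    fake_now[0] += seconds

throttle = Throttle(clock=lambda: fake_now[0], sleep=fake_sleep)
throttle.wait("arxiv")  # first call goes through immediately
throttle.wait("arxiv")  # second call waits out the interval
print(slept)
```

In real use, construct `Throttle()` with the defaults and call `wait(name)` before each request.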

Related Skills

wikipedia-search

564
from beita6969/ScienceClaw

Search and fetch structured content from Wikipedia using the MediaWiki API for reliable, encyclopedic information

social-science-research

564
from beita6969/ScienceClaw

Orchestrates a social science research workflow from literature review through data collection, text analysis, statistical modeling, and report generation. Use when conducting empirical social science research, policy analysis, or mixed-methods studies. NOT for pure natural science analysis or clinical trial data.

search-strategy

564
from beita6969/ScienceClaw

research-reflection

564
from beita6969/ScienceClaw

Reflect on completed research tasks to improve future performance. Use when: a research task has just been completed and the agent should evaluate its own process, store lessons learned, or retrieve past reflections before starting new work. NOT for: active research execution or data analysis.

research-lookup

564
from beita6969/ScienceClaw

Look up current research information using the Parallel Chat API (primary) or Perplexity sonar-pro-search (academic paper searches). Automatically routes queries to the best backend. Use for finding papers, gathering research data, and verifying scientific information.

research-literature

564
from beita6969/ScienceClaw

research-grants

564
from beita6969/ScienceClaw

Write competitive research proposals for NSF, NIH, DOE, DARPA, and Taiwan NSTC. Agency-specific formatting, review criteria, budget preparation, broader impacts, significance statements, innovation narratives, and compliance with submission requirements.

research-ethics

564
from beita6969/ScienceClaw

Guides research ethics compliance including IRB protocol preparation, informed consent document drafting, research integrity standards, data management plans, and ethical considerations for human/animal subjects; trigger when users discuss IRB, ethical approval, consent forms, or responsible conduct of research.

pubmed-search

564
from beita6969/ScienceClaw

Search PubMed/MEDLINE for biomedical literature via NCBI E-utilities API. Use when: (1) searching medical/biomedical papers, (2) finding clinical studies, (3) querying with MeSH terms, (4) retrieving abstracts by PMID. NOT for: non-biomedical papers (use arxiv-search or semantic-scholar), full-text access (PubMed provides abstracts), or social science literature.

psychology-research

564
from beita6969/ScienceClaw

Conduct psychological research analysis including mental health, cognitive science, and behavioral studies

perplexity-search

564
from beita6969/ScienceClaw

Perform AI-powered web searches with real-time information using Perplexity models via LiteLLM and OpenRouter. This skill should be used when conducting web searches for current information, finding recent scientific literature, getting grounded answers with source citations, or accessing information beyond the model knowledge cutoff. Provides access to multiple Perplexity models including Sonar Pro, Sonar Pro Search (advanced agentic search), and Sonar Reasoning Pro through a single OpenRouter API key.

openalex-search

564
from beita6969/ScienceClaw

Open academic metadata via OpenAlex API. Use when: user needs author profiles, institution data, concept mapping, or open citation data. NOT for: full-text search or downloading papers.