asta-corpus-search

Search Allen AI's Asta Scientific Corpus (225M+ papers, 12M+ full-text, 2.4B+ citations) via MCP endpoint. Provides paragraph-level semantic search across full-text publications, citation graph traversal, and author analysis. Use as a complement to PubMed/OpenAlex/Semantic Scholar for deeper literature discovery, especially when full-text search or citation network analysis is needed. Requires ASTA_API_KEY in .env (free registration at allenai.org/asta).

42 stars

Best use case

asta-corpus-search is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using asta-corpus-search can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/asta-corpus-search/SKILL.md --create-dirs "https://raw.githubusercontent.com/Zaoqu-Liu/ScienceClaw/main/skills/asta-corpus-search/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/asta-corpus-search/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How asta-corpus-search Compares

| Feature / Agent | asta-corpus-search | Standard Approach |
|-----------------|--------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It searches Allen AI's Asta Scientific Corpus through an MCP endpoint, providing paragraph-level semantic search across full-text publications, citation graph traversal, and author analysis. It is meant to complement PubMed, OpenAlex, and Semantic Scholar, and it requires ASTA_API_KEY in .env (free registration at allenai.org/asta).

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Asta Scientific Corpus Search

Access Allen AI's massive scientific literature graph: 225M+ papers, 80M+ authors, 2.4B+ citation edges, and 12M+ full-text publications (285M+ passages).

## When to Use

- **Always** as part of multi-source literature search (Channel 4 alongside PubMed, OpenAlex, Semantic Scholar)
- **Especially useful** when you need:
  - Full-text paragraph-level search (not just title/abstract)
  - Deep citation network traversal
  - Cross-disciplinary paper discovery
  - Author relationship analysis

## API Configuration

| Parameter | Value |
|-----------|-------|
| **Endpoint** | `https://asta-tools.allen.ai/mcp/v1` |
| **Protocol** | MCP over HTTP POST (JSON-RPC style) |
| **Auth** | `x-api-key: $ASTA_API_KEY` header |
| **Rate limits** | Higher with API key; basic access without |
| **Key registration** | Free at https://allenai.org/asta/resources |
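
The settings in this table can be wrapped in a small shell helper so that each tool call differs only in its tool name and arguments. The following is a minimal sketch, not part of Asta itself: `asta_payload` and `asta_call` are hypothetical helper names, and the arguments are assumed to be passed as an already valid JSON object string.

```bash
# Hypothetical helpers (not provided by Asta): build and send a tools/call request.
ASTA_ENDPOINT="https://asta-tools.allen.ai/mcp/v1"

# asta_payload TOOL_NAME ARGUMENTS_JSON
# Prints the JSON-RPC request body. ARGUMENTS_JSON must already be a valid JSON object.
asta_payload() {
  printf '{"jsonrpc":"2.0","method":"tools/call","id":1,"params":{"name":"%s","arguments":%s}}\n' "$1" "$2"
}

# asta_call TOOL_NAME ARGUMENTS_JSON
# Sends the body with the same headers used by the curl examples in this document.
asta_call() {
  curl -s "$ASTA_ENDPOINT" \
    -H "Content-Type: application/json" \
    -H "x-api-key: $ASTA_API_KEY" \
    -d "$(asta_payload "$1" "$2")"
}

# Usage:
# asta_call search_papers '{"query":"THBS2 tumor microenvironment macrophage","limit":15}'
```

Keeping the payload builder separate from the network call makes it possible to inspect the request body without contacting the endpoint.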

## Core Tools

### 1. search_papers — Find papers by query

```bash
curl -s "https://asta-tools.allen.ai/mcp/v1" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ASTA_API_KEY" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "id": 1,
    "params": {
      "name": "search_papers",
      "arguments": {
        "query": "THBS2 tumor microenvironment macrophage",
        "limit": 15
      }
    }
  }'
```

Returns: paper IDs, titles, authors, year, citation count, abstract snippets.

### 2. get_papers — Retrieve paper details by ID

Supports multiple ID types: DOI, arXiv ID, PMID, Semantic Scholar CorpusId.

```bash
curl -s "https://asta-tools.allen.ai/mcp/v1" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ASTA_API_KEY" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "id": 1,
    "params": {
      "name": "get_papers",
      "arguments": {
        "paper_ids": ["PMID:32273438", "DOI:10.1038/s41586-024-07487-w"],
        "fields": ["title", "authors", "year", "abstract", "citationCount", "references"]
      }
    }
  }'
```

### 3. get_citations — Citation graph traversal

```bash
curl -s "https://asta-tools.allen.ai/mcp/v1" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ASTA_API_KEY" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "id": 1,
    "params": {
      "name": "get_citations",
      "arguments": {
        "paper_id": "PMID:32273438",
        "direction": "citations",
        "limit": 20
      }
    }
  }'
```

Use `"direction": "references"` for backward citations.
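
To collect both sides of the citation graph for one paper, the same request can be built once per direction. This is a sketch under stated assumptions: `citations_payload` is a hypothetical helper name, and it accepts only the two `direction` values described in this document.

```bash
# Hypothetical helper: build a get_citations request body for one direction.
# citations_payload PAPER_ID DIRECTION [LIMIT]
citations_payload() {
  case "$2" in
    citations|references) ;;   # the only two directions this document describes
    *) echo "direction must be 'citations' or 'references'" >&2; return 1 ;;
  esac
  printf '{"jsonrpc":"2.0","method":"tools/call","id":1,"params":{"name":"get_citations","arguments":{"paper_id":"%s","direction":"%s","limit":%s}}}\n' \
    "$1" "$2" "${3:-20}"
}

# Forward citations (papers citing the seed), then backward citations (papers the seed cites).
for direction in citations references; do
  citations_payload "PMID:32273438" "$direction" 20
done
```

Each printed body can be piped to the endpoint with `curl -s ... -d @-`, which reads the request data from standard input.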

## Integration with Multi-Source Search

Add Asta as the fourth channel in the standard literature search block:

```bash
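# Replace QUERY with your search term. URL-encode it in the three GET URLs (for example THBS2%20macrophage); the Asta JSON body takes the plain string.
echo "=== PubMed ===" && \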
curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmode=json&retmax=20&sort=relevance&term=QUERY" && \
echo -e "\n=== OpenAlex ===" && \
curl -s "https://api.openalex.org/works?search=QUERY&per_page=10&sort=relevance_score:desc&select=id,title,authorships,publication_year,cited_by_count,doi,primary_location" && \
echo -e "\n=== Semantic Scholar ===" && \
curl -s "https://api.semanticscholar.org/graph/v1/paper/search?query=QUERY&limit=10&fields=title,authors,year,abstract,citationCount,externalIds,url" && \
echo -e "\n=== Asta (225M papers, full-text index) ===" && \
curl -s "https://asta-tools.allen.ai/mcp/v1" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ASTA_API_KEY" \
  -d '{"jsonrpc":"2.0","method":"tools/call","id":1,"params":{"name":"search_papers","arguments":{"query":"QUERY","limit":15}}}'
```

## When Asta Adds Unique Value

| Scenario | Why Asta helps |
|----------|---------------|
| Full-text keyword search | PubMed searches only titles and abstracts; Asta indexes 285M+ passages from 12M+ full-text papers |
| Finding methods/protocols | Search for specific techniques mentioned only in methods sections |
| Citation network depth | 2.4B+ citation edges enable deep forward/backward chain analysis |
| Cross-disciplinary discovery | 225M+ papers across all fields, not limited to biomedical |
| Preprint coverage | Includes arXiv, bioRxiv, medRxiv alongside published papers |

## Fallback Behavior

If `ASTA_API_KEY` is not configured or the API returns an error:
- Skip Asta silently
- Continue with PubMed + OpenAlex + Semantic Scholar
- Do NOT report the failure to the user unless they explicitly asked for Asta results
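
The rules above can be sketched as a wrapper that never lets an Asta failure interrupt the rest of the search. This is an illustration only: `asta_or_skip` is a hypothetical name, and the 30-second timeout is an assumed value, not something Asta prescribes.

```bash
# Hypothetical wrapper: query Asta without ever failing the surrounding pipeline.
# asta_or_skip REQUEST_BODY_JSON
# Prints the raw response on success; prints nothing and returns 0 when skipped.
asta_or_skip() {
  # No key configured: skip silently.
  [ -n "${ASTA_API_KEY:-}" ] || return 0
  # --fail makes curl exit non-zero (and suppress the body) on HTTP errors.
  # stderr is discarded and "|| true" keeps the exit status at 0, so the
  # PubMed, OpenAlex, and Semantic Scholar steps are unaffected.
  curl -s --fail --max-time 30 "https://asta-tools.allen.ai/mcp/v1" \
    -H "Content-Type: application/json" \
    -H "x-api-key: $ASTA_API_KEY" \
    -d "$1" 2>/dev/null || true
}

# Usage:
# asta_or_skip '{"jsonrpc":"2.0","method":"tools/call","id":1,"params":{"name":"search_papers","arguments":{"query":"QUERY","limit":15}}}'
```

A JSON-RPC error returned with HTTP status 200 would still be printed by this sketch, so a caller may also want to check the response for an `error` member before using it.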

## Response Parsing

Asta MCP responses follow JSON-RPC format:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "... JSON string with paper results ..."
      }
    ]
  }
}
```

Parse `result.content[0].text` as JSON to extract the paper data. The `text` field is itself a JSON-encoded string, so it needs a second decode after the outer response has been parsed.
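
A minimal sketch of that two-step decode, assuming `jq` is installed. `asta_extract` is a hypothetical helper name, and the `papers` and `title` fields in the usage line are illustrative only, since the inner structure of the payload is not specified here.

```bash
# Hypothetical helper: read a raw Asta response on stdin and print the decoded payload.
# Requires jq.
asta_extract() {
  # First decode: the text field holds a JSON-encoded string.
  # Second decode: applied only if the first result is still a string (double-encoded).
  jq '.result.content[0].text | fromjson | if type == "string" then fromjson else . end'
}

# Usage (field names are assumptions, adjust to the actual payload):
# curl -s ... | asta_extract | jq -r '.papers[].title'
```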

Related Skills

tooluniverse-target-research

42
from Zaoqu-Liu/ScienceClaw

Gather comprehensive biological target intelligence from 9 parallel research paths covering protein info, structure, interactions, pathways, expression, variants, drug interactions, and literature. Features collision-aware searches, evidence grading (T1-T4), explicit Open Targets coverage, and mandatory completeness auditing. Use when users ask about drug targets, proteins, genes, or need target validation, druggability assessment, or comprehensive target profiling.

tooluniverse-literature-deep-research

42
from Zaoqu-Liu/ScienceClaw

Conduct comprehensive literature research with target disambiguation, evidence grading, and structured theme extraction. Creates a detailed report with mandatory completeness checklist, biological model synthesis, and testable hypotheses. For biological targets, resolves official IDs (Ensembl/UniProt), synonyms, naming collisions, and gathers expression/pathway context before literature search. Default deliverable is a report file; for single factoid questions, uses a fast verification mode and may include an inline answer. Use when users need thorough literature reviews, target profiles, or to verify specific claims from the literature.

tooluniverse-drug-research

42
from Zaoqu-Liu/ScienceClaw

Generates comprehensive drug research reports with compound disambiguation, evidence grading, and mandatory completeness sections. Covers identity, chemistry, pharmacology, targets, clinical trials, safety, pharmacogenomics, and ADMET properties. Use when users ask about drugs, medications, therapeutics, or need drug profiling, safety assessment, or clinical development research.

tooluniverse-disease-research

42
from Zaoqu-Liu/ScienceClaw

Generate comprehensive disease research reports using 100+ ToolUniverse tools. Creates a detailed markdown report file and progressively updates it with findings from 10 research dimensions. All information includes source references. Use when users ask about diseases, syndromes, or need systematic disease analysis.

Science Communication — Making Research Accessible

42
from Zaoqu-Liu/ScienceClaw

research-recipes

42
from Zaoqu-Liu/ScienceClaw

Pre-built research workflow templates that execute complete multi-step analyses from a single user prompt. Triggers on gene analysis, target validation, literature review, differential expression, clinical queries, researcher profiling, drug repurposing, or molecular dynamics simulation. Use when the user's query matches a Recipe pattern defined in SCIENCE.md.

research-lookup

42
from Zaoqu-Liu/ScienceClaw

Look up current research information using Perplexity Sonar Pro Search or Sonar Reasoning Pro models through OpenRouter. Automatically selects the best model based on query complexity. Search academic papers, recent studies, technical documentation, and general research information with citations.

research-grants

42
from Zaoqu-Liu/ScienceClaw

Write competitive research proposals for NSF, NIH, DOE, DARPA, and Taiwan NSTC. Agency-specific formatting, review criteria, budget preparation, broader impacts, significance statements, innovation narratives, and compliance with submission requirements.

research-alerts

42
from Zaoqu-Liu/ScienceClaw

Monitor research topics and alert the user when new papers are published. Use when user says "/watch", "监控", "关注这个课题", "有新文献告诉我", "monitor this topic", "alert me on new papers", "track new publications". Stores watch configurations and checks for new results at session start.

pubmed-search

42
from Zaoqu-Liu/ScienceClaw

Search PubMed biomedical literature with natural language queries powered by Valyu semantic search. Full-text access, integrate into your AI projects.

paper-search

42
from Zaoqu-Liu/ScienceClaw

Search and discover academic papers from arXiv and web sources using arxiv_to_prompt and web search

perplexity-search

42
from Zaoqu-Liu/ScienceClaw

Perform AI-powered web searches with real-time information using Perplexity models via LiteLLM and OpenRouter. This skill should be used when conducting web searches for current information, finding recent scientific literature, getting grounded answers with source citations, or accessing information beyond the model knowledge cutoff. Provides access to multiple Perplexity models including Sonar Pro, Sonar Pro Search (advanced agentic search), and Sonar Reasoning Pro through a single OpenRouter API key.