reactome-database

Query Reactome REST API for pathway analysis, enrichment, gene-pathway mapping, disease pathways, molecular interactions, expression analysis, for systems biology studies.

7 stars

bysanand0

View on GitHub Installation ↓

Best use case

reactome-database is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Query Reactome REST API for pathway analysis, enrichment, gene-pathway mapping, disease pathways, molecular interactions, expression analysis, for systems biology studies.

Teams using reactome-database should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/reactome-database/SKILL.md --create-dirs "https://raw.githubusercontent.com/sanand0/scientific-research/main/.claude/skills/reactome-database/reactome-database/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/reactome-database/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How reactome-database Compares

Feature / Agent	reactome-database	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Query Reactome REST API for pathway analysis, enrichment, gene-pathway mapping, disease pathways, molecular interactions, expression analysis, for systems biology studies.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Reactome Database

## Overview

Reactome is a free, open-source, curated pathway database with 2,825+ human pathways. Query biological pathways, perform overrepresentation and expression analysis, map genes to pathways, explore molecular interactions via REST API and Python client for systems biology research.

## When to Use This Skill

This skill should be used when:
- Performing pathway enrichment analysis on gene or protein lists
- Analyzing gene expression data to identify relevant biological pathways
- Querying specific pathway information, reactions, or molecular interactions
- Mapping genes or proteins to biological pathways and processes
- Exploring disease-related pathways and mechanisms
- Visualizing analysis results in the Reactome Pathway Browser
- Conducting comparative pathway analysis across species

## Core Capabilities

Reactome provides two main API services and a Python client library:

### 1. Content Service - Data Retrieval

Query and retrieve biological pathway data, molecular interactions, and entity information.

**Common operations:**
- Retrieve pathway information and hierarchies
- Query specific entities (proteins, reactions, complexes)
- Get participating molecules in pathways
- Access database version and metadata
- Explore pathway compartments and locations

**API Base URL:** `https://reactome.org/ContentService`

### 2. Analysis Service - Pathway Analysis

Perform computational analysis on gene lists and expression data.

**Analysis types:**
- **Overrepresentation Analysis**: Identify statistically significant pathways from gene/protein lists
- **Expression Data Analysis**: Analyze gene expression datasets to find relevant pathways
- **Species Comparison**: Compare pathway data across different organisms

**API Base URL:** `https://reactome.org/AnalysisService`

### 3. reactome2py Python Package

Python client library that wraps Reactome API calls for easier programmatic access.

**Installation:**
```bash
uv pip install reactome2py
```

**Note:** The reactome2py package (version 3.0.0, released January 2021) is functional but not actively maintained. For the most up-to-date functionality, consider using direct REST API calls.

## Querying Pathway Data

### Using Content Service REST API

The Content Service uses REST protocol and returns data in JSON or plain text formats.

**Get database version:**
```python
import requests

response = requests.get("https://reactome.org/ContentService/data/database/version")
version = response.text
print(f"Reactome version: {version}")
```

**Query a specific entity:**
```python
import requests

entity_id = "R-HSA-69278"  # Example pathway ID
response = requests.get(f"https://reactome.org/ContentService/data/query/{entity_id}")
data = response.json()
```

**Get participating molecules in a pathway:**
```python
import requests

event_id = "R-HSA-69278"
response = requests.get(
    f"https://reactome.org/ContentService/data/event/{event_id}/participatingPhysicalEntities"
)
molecules = response.json()
```

### Using reactome2py Package

```python
import reactome2py
from reactome2py import content

# Query pathway information
pathway_info = content.query_by_id("R-HSA-69278")

# Get database version
version = content.get_database_version()
```

**For detailed API endpoints and parameters**, refer to `references/api_reference.md` in this skill.

## Performing Pathway Analysis

### Overrepresentation Analysis

Submit a list of gene/protein identifiers to find enriched pathways.

**Using REST API:**
```python
import requests

# Prepare identifier list
identifiers = ["TP53", "BRCA1", "EGFR", "MYC"]
data = "\n".join(identifiers)

# Submit analysis
response = requests.post(
    "https://reactome.org/AnalysisService/identifiers/",
    headers={"Content-Type": "text/plain"},
    data=data
)

result = response.json()
token = result["summary"]["token"]  # Save token to retrieve results later

# Access pathways
for pathway in result["pathways"]:
    print(f"{pathway['stId']}: {pathway['name']} (p-value: {pathway['entities']['pValue']})")
```

**Retrieve analysis by token:**
```python
# Token is valid for 7 days
response = requests.get(f"https://reactome.org/AnalysisService/token/{token}")
results = response.json()
```

### Expression Data Analysis

Analyze gene expression datasets with quantitative values.

**Input format (TSV with header starting with #):**
```
#Gene	Sample1	Sample2	Sample3
TP53	2.5	3.1	2.8
BRCA1	1.2	1.5	1.3
EGFR	4.5	4.2	4.8
```

**Submit expression data:**
```python
import requests

# Read TSV file
with open("expression_data.tsv", "r") as f:
    data = f.read()

response = requests.post(
    "https://reactome.org/AnalysisService/identifiers/",
    headers={"Content-Type": "text/plain"},
    data=data
)

result = response.json()
```

### Species Projection

Map identifiers to human pathways exclusively using the `/projection/` endpoint:

```python
response = requests.post(
    "https://reactome.org/AnalysisService/identifiers/projection/",
    headers={"Content-Type": "text/plain"},
    data=data
)
```

## Visualizing Results

Analysis results can be visualized in the Reactome Pathway Browser by constructing URLs with the analysis token:

```python
token = result["summary"]["token"]
pathway_id = "R-HSA-69278"
url = f"https://reactome.org/PathwayBrowser/#{pathway_id}&DTAB=AN&ANALYSIS={token}"
print(f"View results: {url}")
```

## Working with Analysis Tokens

- Analysis tokens are valid for **7 days**
- Tokens allow retrieval of previously computed results without re-submission
- Store tokens to access results across sessions
- Use `GET /token/{TOKEN}` endpoint to retrieve results

## Data Formats and Identifiers

### Supported Identifier Types

Reactome accepts various identifier formats:
- UniProt accessions (e.g., P04637)
- Gene symbols (e.g., TP53)
- Ensembl IDs (e.g., ENSG00000141510)
- EntrezGene IDs (e.g., 7157)
- ChEBI IDs for small molecules

The system automatically detects identifier types.

### Input Format Requirements

**For overrepresentation analysis:**
- Plain text list of identifiers (one per line)
- OR single column in TSV format

**For expression analysis:**
- TSV format with mandatory header row starting with "#"
- Column 1: identifiers
- Columns 2+: numeric expression values
- Use period (.) as decimal separator

### Output Format

All API responses return JSON containing:
- `pathways`: Array of enriched pathways with statistical metrics
- `summary`: Analysis metadata and token
- `entities`: Matched and unmapped identifiers
- Statistical values: pValue, FDR (false discovery rate)

## Helper Scripts

This skill includes `scripts/reactome_query.py`, a helper script for common Reactome operations:

```bash
# Query pathway information
python scripts/reactome_query.py query R-HSA-69278

# Perform overrepresentation analysis
python scripts/reactome_query.py analyze gene_list.txt

# Get database version
python scripts/reactome_query.py version
```

## Additional Resources

- **API Documentation**: https://reactome.org/dev
- **User Guide**: https://reactome.org/userguide
- **Documentation Portal**: https://reactome.org/documentation
- **Data Downloads**: https://reactome.org/download-data
- **reactome2py Docs**: https://reactome.github.io/reactome2py/

For comprehensive API endpoint documentation, see `references/api_reference.md` in this skill.

## Current Database Statistics (Version 94, September 2025)

- 2,825 human pathways
- 16,002 reactions
- 11,630 proteins
- 2,176 small molecules
- 1,070 drugs
- 41,373 literature references

Related Skills

zinc-database

from sanand0/scientific-research

Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.

uspto-database

from sanand0/scientific-research

Access USPTO APIs for patent/trademark searches, examination history (PEDS), assignments, citations, office actions, TSDR, for IP analysis and prior art searches.

uniprot-database

from sanand0/scientific-research

Direct REST API access to UniProt. Protein searches, FASTA retrieval, ID mapping, Swiss-Prot/TrEMBL. For Python workflows with multiple databases, prefer bioservices (unified interface to 40+ services). Use this for direct HTTP/REST work or UniProt-specific control.

string-database

from sanand0/scientific-research

Query STRING API for protein-protein interactions (59M proteins, 20B interactions). Network analysis, GO/KEGG enrichment, interaction discovery, 5000+ species, for systems biology.

pubmed-database

from sanand0/scientific-research

Direct REST API access to PubMed. Advanced Boolean/MeSH queries, E-utilities API, batch processing, citation management. For Python workflows, prefer biopython (Bio.Entrez). Use this for direct HTTP/REST work or custom API implementations.

pubchem-database

from sanand0/scientific-research

Query PubChem via PUG-REST API/PubChemPy (110M+ compounds). Search by name/CID/SMILES, retrieve properties, similarity/substructure searches, bioactivity, for cheminformatics.

pdb-database

from sanand0/scientific-research

Access RCSB PDB for 3D protein/nucleic acid structures. Search by text/sequence/structure, download coordinates (PDB/mmCIF), retrieve metadata, for structural biology and drug discovery.

opentargets-database

from sanand0/scientific-research

Query Open Targets Platform for target-disease associations, drug target discovery, tractability/safety data, genetics/omics evidence, known drugs, for therapeutic target identification.

metabolomics-workbench-database

from sanand0/scientific-research

Access NIH Metabolomics Workbench via REST API (4,200+ studies). Query metabolites, RefMet nomenclature, MS/NMR data, m/z searches, study metadata, for metabolomics and biomarker discovery.

kegg-database

from sanand0/scientific-research

Direct REST API access to KEGG (academic use only). Pathway analysis, gene-pathway mapping, metabolic pathways, drug interactions, ID conversion. For Python workflows with multiple databases, prefer bioservices. Use this for direct HTTP/REST work or KEGG-specific control.

hmdb-database

from sanand0/scientific-research

Access Human Metabolome Database (220K+ metabolites). Search by name/ID/structure, retrieve chemical properties, biomarker data, NMR/MS spectra, pathways, for metabolomics and identification.

gwas-database

from sanand0/scientific-research

Query NHGRI-EBI GWAS Catalog for SNP-trait associations. Search variants by rs ID, disease/trait, gene, retrieve p-values and summary statistics, for genetic epidemiology and polygenic risk scores.