ensembl-database
Query Ensembl genome database REST API for 250+ species. Gene lookups, sequence retrieval, variant analysis, comparative genomics, orthologs, VEP predictions, for genomic research.
Best use case
ensembl-database is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Query Ensembl genome database REST API for 250+ species. Gene lookups, sequence retrieval, variant analysis, comparative genomics, orthologs, VEP predictions, for genomic research.
Teams using ensembl-database should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/ensembl-database/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How ensembl-database Compares
| Feature / Agent | ensembl-database | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Query Ensembl genome database REST API for 250+ species. Gene lookups, sequence retrieval, variant analysis, comparative genomics, orthologs, VEP predictions, for genomic research.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Ensembl Database
## Overview
Access and query the Ensembl genome database, a comprehensive resource for vertebrate genomic data maintained by EMBL-EBI. The database provides gene annotations, sequences, variants, regulatory information, and comparative genomics data for over 250 species. Current release is 115 (September 2025).
## When to Use This Skill
This skill should be used when:
- Querying gene information by symbol or Ensembl ID
- Retrieving DNA, transcript, or protein sequences
- Analyzing genetic variants using the Variant Effect Predictor (VEP)
- Finding orthologs and paralogs across species
- Accessing regulatory features and genomic annotations
- Converting coordinates between genome assemblies (e.g., GRCh37 to GRCh38)
- Performing comparative genomics analyses
- Integrating Ensembl data into genomic research pipelines
## Core Capabilities
### 1. Gene Information Retrieval
Query gene data by symbol, Ensembl ID, or external database identifiers.
**Common operations:**
- Look up gene information by symbol (e.g., "BRCA2", "TP53")
- Retrieve transcript and protein information
- Get gene coordinates and chromosomal locations
- Access cross-references to external databases (UniProt, RefSeq, etc.)
**Using the ensembl_rest package:**
```python
from ensembl_rest import EnsemblClient
client = EnsemblClient()
# Look up gene by symbol
gene_data = client.symbol_lookup(
species='human',
symbol='BRCA2'
)
# Get detailed gene information
gene_info = client.lookup_id(
id='ENSG00000139618', # BRCA2 Ensembl ID
expand=True
)
```
**Direct REST API (no package):**
```python
import requests
server = "https://rest.ensembl.org"
# Symbol lookup
response = requests.get(
f"{server}/lookup/symbol/homo_sapiens/BRCA2",
headers={"Content-Type": "application/json"}
)
gene_data = response.json()
```
### 2. Sequence Retrieval
Fetch genomic, transcript, or protein sequences in various formats (JSON, FASTA, plain text).
**Operations:**
- Get DNA sequences for genes or genomic regions
- Retrieve transcript sequences (cDNA)
- Access protein sequences
- Extract sequences with flanking regions or modifications
**Example:**
```python
# Using ensembl_rest package
sequence = client.sequence_id(
id='ENSG00000139618', # Gene ID
content_type='application/json'
)
# Get sequence for a genomic region
region_seq = client.sequence_region(
species='human',
region='7:140424943-140624564' # chromosome:start-end
)
```
### 3. Variant Analysis
Query genetic variation data and predict variant consequences using the Variant Effect Predictor (VEP).
**Capabilities:**
- Look up variants by rsID or genomic coordinates
- Predict functional consequences of variants
- Access population frequency data
- Retrieve phenotype associations
**VEP example:**
```python
# Predict variant consequences
vep_result = client.vep_hgvs(
species='human',
hgvs_notation='ENST00000380152.7:c.803C>T'
)
# Query variant by rsID
variant = client.variation_id(
species='human',
id='rs699'
)
```
### 4. Comparative Genomics
Perform cross-species comparisons to identify orthologs, paralogs, and evolutionary relationships.
**Operations:**
- Find orthologs (same gene in different species)
- Identify paralogs (related genes in same species)
- Access gene trees showing evolutionary relationships
- Retrieve gene family information
**Example:**
```python
# Find orthologs for a human gene
orthologs = client.homology_ensemblgene(
id='ENSG00000139618', # Human BRCA2
target_species='mouse'
)
# Get gene tree
gene_tree = client.genetree_member_symbol(
species='human',
symbol='BRCA2'
)
```
### 5. Genomic Region Analysis
Find all genomic features (genes, transcripts, regulatory elements) in a specific region.
**Use cases:**
- Identify all genes in a chromosomal region
- Find regulatory features (promoters, enhancers)
- Locate variants within a region
- Retrieve structural features
**Example:**
```python
# Find all features in a region
features = client.overlap_region(
species='human',
region='7:140424943-140624564',
feature='gene'
)
```
### 6. Assembly Mapping
Convert coordinates between different genome assemblies (e.g., GRCh37 to GRCh38).
**Important:** Use `https://grch37.rest.ensembl.org` for GRCh37/hg19 queries and `https://rest.ensembl.org` for current assemblies.
**Example:**
```python
from ensembl_rest import AssemblyMapper
# Map coordinates from GRCh37 to GRCh38
mapper = AssemblyMapper(
species='human',
asm_from='GRCh37',
asm_to='GRCh38'
)
mapped = mapper.map(chrom='7', start=140453136, end=140453136)
```
## API Best Practices
### Rate Limiting
The Ensembl REST API has rate limits. Follow these practices:
1. **Respect rate limits:** Maximum 15 requests per second for anonymous users
2. **Handle 429 responses:** When rate-limited, check the `Retry-After` header and wait
3. **Use batch endpoints:** When querying multiple items, use batch endpoints where available
4. **Cache results:** Store frequently accessed data to reduce API calls
### Error Handling
Always implement proper error handling:
```python
import requests
import time
def query_ensembl(endpoint, params=None, max_retries=3):
server = "https://rest.ensembl.org"
headers = {"Content-Type": "application/json"}
for attempt in range(max_retries):
response = requests.get(
f"{server}{endpoint}",
headers=headers,
params=params
)
if response.status_code == 200:
return response.json()
elif response.status_code == 429:
# Rate limited - wait and retry
retry_after = int(response.headers.get('Retry-After', 1))
time.sleep(retry_after)
else:
response.raise_for_status()
raise Exception(f"Failed after {max_retries} attempts")
```
## Installation
### Python Package (Recommended)
```bash
uv pip install ensembl_rest
```
The `ensembl_rest` package provides a Pythonic interface to all Ensembl REST API endpoints.
### Direct REST API
No installation needed - use standard HTTP libraries like `requests`:
```bash
uv pip install requests
```
## Resources
### references/
- `api_endpoints.md`: Comprehensive documentation of all 17 API endpoint categories with examples and parameters
### scripts/
- `ensembl_query.py`: Reusable Python script for common Ensembl queries with built-in rate limiting and error handling
## Common Workflows
### Workflow 1: Gene Annotation Pipeline
1. Look up gene by symbol to get Ensembl ID
2. Retrieve transcript information
3. Get protein sequences for all transcripts
4. Find orthologs in other species
5. Export results
### Workflow 2: Variant Analysis
1. Query variant by rsID or coordinates
2. Use VEP to predict functional consequences
3. Check population frequencies
4. Retrieve phenotype associations
5. Generate report
### Workflow 3: Comparative Analysis
1. Start with gene of interest in reference species
2. Find orthologs in target species
3. Retrieve sequences for all orthologs
4. Compare gene structures and features
5. Analyze evolutionary conservation
## Species and Assembly Information
To query available species and assemblies:
```python
# List all available species
species_list = client.info_species()
# Get assembly information for a species
assembly_info = client.info_assembly(species='human')
```
Common species identifiers:
- Human: `homo_sapiens` or `human`
- Mouse: `mus_musculus` or `mouse`
- Zebrafish: `danio_rerio` or `zebrafish`
- Fruit fly: `drosophila_melanogaster`
## Additional Resources
- **Official Documentation:** https://rest.ensembl.org/documentation
- **Python Package Docs:** https://ensemblrest.readthedocs.io
- **EBI Training:** https://www.ebi.ac.uk/training/online/courses/ensembl-rest-api/
- **Ensembl Browser:** https://useast.ensembl.org
- **GitHub Examples:** https://github.com/Ensembl/ensembl-rest/wikiRelated Skills
Validate with Database
Connect to live PostgreSQL database to validate schema assumptions, compare pg_dump vs pgschema output, and query system catalogs interactively
relational-database-mcp-cloudbase
This is the required documentation for agents operating on the CloudBase Relational Database. It lists the only four supported tools for running SQL and managing security rules. Read the full content to understand why you must NOT use standard Application SDKs and how to safely execute INSERT, UPDATE, or DELETE operations without corrupting production data.
moai-domain-database
Database specialist covering PostgreSQL, MongoDB, Redis, and advanced data patterns for modern applications
databases
Work with MongoDB (document database, BSON documents, aggregation pipelines, Atlas cloud) and PostgreSQL (relational database, SQL queries, psql CLI, pgAdmin). Use when designing database schemas, writing queries and aggregations, optimizing indexes for performance, performing database migrations, configuring replication and sharding, implementing backup and restore strategies, managing database users and permissions, analyzing query performance, or administering production databases.
database-testing
Database schema validation, data integrity testing, migration testing, transaction isolation, and query performance. Use when testing data persistence, ensuring referential integrity, or validating database migrations.
database-migration
Execute database migrations across ORMs and platforms with zero-downtime strategies, data transformation, and rollback procedures. Use when migrating databases, changing schemas, performing data transformations, or implementing zero-downtime deployment strategies.
Database Implementation
Database schema design, migrations, query optimization with SQL, Exposed ORM, Flyway. Use for database, migration, schema, sql, flyway tags. Provides migration patterns, validation commands, rollback strategies.
cloudbase-document-database-web-sdk
Use CloudBase document database Web SDK to query, create, update, and delete data. Supports complex queries, pagination, aggregation, and geolocation queries.
zinc-database
Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.
uspto-database
Access USPTO APIs for patent/trademark searches, examination history (PEDS), assignments, citations, office actions, TSDR, for IP analysis and prior art searches.
uniprot-database
Direct REST API access to UniProt. Protein searches, FASTA retrieval, ID mapping, Swiss-Prot/TrEMBL. For Python workflows with multiple databases, prefer bioservices (unified interface to 40+ services). Use this for direct HTTP/REST work or UniProt-specific control.
string-database
Query STRING API for protein-protein interactions (59M proteins, 20B interactions). Network analysis, GO/KEGG enrichment, interaction discovery, 5000+ species, for systems biology.