ensembl-database
Query Ensembl genome database REST API for 250+ species. Gene lookups, sequence retrieval, variant analysis, comparative genomics, orthologs, VEP predictions, for genomic research.
Best use case
ensembl-database is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi. Query Ensembl genome database REST API for 250+ species. Gene lookups, sequence retrieval, variant analysis, comparative genomics, orthologs, VEP predictions, for genomic research.
Query Ensembl genome database REST API for 250+ species. Gene lookups, sequence retrieval, variant analysis, comparative genomics, orthologs, VEP predictions, for genomic research.
Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.
Practical example
Example input
Use the "ensembl-database" skill to help with this workflow task. Context: Query Ensembl genome database REST API for 250+ species. Gene lookups, sequence retrieval, variant analysis, comparative genomics, orthologs, VEP predictions, for genomic research.
Example output
A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.
When to use this skill
- Use this skill when you want a reusable workflow rather than writing the same prompt again and again.
When not to use this skill
- Do not use this when you only need a one-off answer and do not need a reusable workflow.
- Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/ensembl-database/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How ensembl-database Compares
| Feature / Agent | ensembl-database | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Query Ensembl genome database REST API for 250+ species. Gene lookups, sequence retrieval, variant analysis, comparative genomics, orthologs, VEP predictions, for genomic research.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agent for Product Research
Browse AI agent skills for product research, competitive analysis, customer discovery, and structured product decision support.
AI Agent for SaaS Idea Validation
Use AI agent skills for SaaS idea validation, market research, customer discovery, competitor analysis, and documenting startup hypotheses.
SKILL.md Source
# Ensembl Database
## Overview
Access and query the Ensembl genome database, a comprehensive resource for vertebrate genomic data maintained by EMBL-EBI. The database provides gene annotations, sequences, variants, regulatory information, and comparative genomics data for over 250 species. Current release is 115 (September 2025).
## When to Use This Skill
This skill should be used when:
- Querying gene information by symbol or Ensembl ID
- Retrieving DNA, transcript, or protein sequences
- Analyzing genetic variants using the Variant Effect Predictor (VEP)
- Finding orthologs and paralogs across species
- Accessing regulatory features and genomic annotations
- Converting coordinates between genome assemblies (e.g., GRCh37 to GRCh38)
- Performing comparative genomics analyses
- Integrating Ensembl data into genomic research pipelines
## Core Capabilities
### 1. Gene Information Retrieval
Query gene data by symbol, Ensembl ID, or external database identifiers.
**Common operations:**
- Look up gene information by symbol (e.g., "BRCA2", "TP53")
- Retrieve transcript and protein information
- Get gene coordinates and chromosomal locations
- Access cross-references to external databases (UniProt, RefSeq, etc.)
**Using the ensembl_rest package:**
```python
from ensembl_rest import EnsemblClient
client = EnsemblClient()
# Look up gene by symbol
gene_data = client.symbol_lookup(
species='human',
symbol='BRCA2'
)
# Get detailed gene information
gene_info = client.lookup_id(
id='ENSG00000139618', # BRCA2 Ensembl ID
expand=True
)
```
**Direct REST API (no package):**
```python
import requests
server = "https://rest.ensembl.org"
# Symbol lookup
response = requests.get(
f"{server}/lookup/symbol/homo_sapiens/BRCA2",
headers={"Content-Type": "application/json"}
)
gene_data = response.json()
```
### 2. Sequence Retrieval
Fetch genomic, transcript, or protein sequences in various formats (JSON, FASTA, plain text).
**Operations:**
- Get DNA sequences for genes or genomic regions
- Retrieve transcript sequences (cDNA)
- Access protein sequences
- Extract sequences with flanking regions or modifications
**Example:**
```python
# Using ensembl_rest package
sequence = client.sequence_id(
id='ENSG00000139618', # Gene ID
content_type='application/json'
)
# Get sequence for a genomic region
region_seq = client.sequence_region(
species='human',
region='7:140424943-140624564' # chromosome:start-end
)
```
### 3. Variant Analysis
Query genetic variation data and predict variant consequences using the Variant Effect Predictor (VEP).
**Capabilities:**
- Look up variants by rsID or genomic coordinates
- Predict functional consequences of variants
- Access population frequency data
- Retrieve phenotype associations
**VEP example:**
```python
# Predict variant consequences
vep_result = client.vep_hgvs(
species='human',
hgvs_notation='ENST00000380152.7:c.803C>T'
)
# Query variant by rsID
variant = client.variation_id(
species='human',
id='rs699'
)
```
### 4. Comparative Genomics
Perform cross-species comparisons to identify orthologs, paralogs, and evolutionary relationships.
**Operations:**
- Find orthologs (same gene in different species)
- Identify paralogs (related genes in same species)
- Access gene trees showing evolutionary relationships
- Retrieve gene family information
**Example:**
```python
# Find orthologs for a human gene
orthologs = client.homology_ensemblgene(
id='ENSG00000139618', # Human BRCA2
target_species='mouse'
)
# Get gene tree
gene_tree = client.genetree_member_symbol(
species='human',
symbol='BRCA2'
)
```
### 5. Genomic Region Analysis
Find all genomic features (genes, transcripts, regulatory elements) in a specific region.
**Use cases:**
- Identify all genes in a chromosomal region
- Find regulatory features (promoters, enhancers)
- Locate variants within a region
- Retrieve structural features
**Example:**
```python
# Find all features in a region
features = client.overlap_region(
species='human',
region='7:140424943-140624564',
feature='gene'
)
```
### 6. Assembly Mapping
Convert coordinates between different genome assemblies (e.g., GRCh37 to GRCh38).
**Important:** Use `https://grch37.rest.ensembl.org` for GRCh37/hg19 queries and `https://rest.ensembl.org` for current assemblies.
**Example:**
```python
from ensembl_rest import AssemblyMapper
# Map coordinates from GRCh37 to GRCh38
mapper = AssemblyMapper(
species='human',
asm_from='GRCh37',
asm_to='GRCh38'
)
mapped = mapper.map(chrom='7', start=140453136, end=140453136)
```
## API Best Practices
### Rate Limiting
The Ensembl REST API has rate limits. Follow these practices:
1. **Respect rate limits:** Maximum 15 requests per second for anonymous users
2. **Handle 429 responses:** When rate-limited, check the `Retry-After` header and wait
3. **Use batch endpoints:** When querying multiple items, use batch endpoints where available
4. **Cache results:** Store frequently accessed data to reduce API calls
### Error Handling
Always implement proper error handling:
```python
import requests
import time
def query_ensembl(endpoint, params=None, max_retries=3):
server = "https://rest.ensembl.org"
headers = {"Content-Type": "application/json"}
for attempt in range(max_retries):
response = requests.get(
f"{server}{endpoint}",
headers=headers,
params=params
)
if response.status_code == 200:
return response.json()
elif response.status_code == 429:
# Rate limited - wait and retry
retry_after = int(response.headers.get('Retry-After', 1))
time.sleep(retry_after)
else:
response.raise_for_status()
raise Exception(f"Failed after {max_retries} attempts")
```
## Installation
### Python Package (Recommended)
```bash
uv pip install ensembl_rest
```
The `ensembl_rest` package provides a Pythonic interface to all Ensembl REST API endpoints.
### Direct REST API
No installation needed - use standard HTTP libraries like `requests`:
```bash
uv pip install requests
```
## Resources
### references/
- `api_endpoints.md`: Comprehensive documentation of all 17 API endpoint categories with examples and parameters
### scripts/
- `ensembl_query.py`: Reusable Python script for common Ensembl queries with built-in rate limiting and error handling
## Common Workflows
### Workflow 1: Gene Annotation Pipeline
1. Look up gene by symbol to get Ensembl ID
2. Retrieve transcript information
3. Get protein sequences for all transcripts
4. Find orthologs in other species
5. Export results
### Workflow 2: Variant Analysis
1. Query variant by rsID or coordinates
2. Use VEP to predict functional consequences
3. Check population frequencies
4. Retrieve phenotype associations
5. Generate report
### Workflow 3: Comparative Analysis
1. Start with gene of interest in reference species
2. Find orthologs in target species
3. Retrieve sequences for all orthologs
4. Compare gene structures and features
5. Analyze evolutionary conservation
## Species and Assembly Information
To query available species and assemblies:
```python
# List all available species
species_list = client.info_species()
# Get assembly information for a species
assembly_info = client.info_assembly(species='human')
```
Common species identifiers:
- Human: `homo_sapiens` or `human`
- Mouse: `mus_musculus` or `mouse`
- Zebrafish: `danio_rerio` or `zebrafish`
- Fruit fly: `drosophila_melanogaster`
## Additional Resources
- **Official Documentation:** https://rest.ensembl.org/documentation
- **Python Package Docs:** https://ensemblrest.readthedocs.io
- **EBI Training:** https://www.ebi.ac.uk/training/online/courses/ensembl-rest-api/
- **Ensembl Browser:** https://useast.ensembl.org
- **GitHub Examples:** https://github.com/Ensembl/ensembl-rest/wikiRelated Skills
vector-database-engineer
Expert in vector databases, embedding strategies, and semantic search implementation. Masters Pinecone, Weaviate, Qdrant, Milvus, and pgvector for RAG applications, recommendation systems, and similar
sqlmap-database-pentesting
This skill should be used when the user asks to "automate SQL injection testing," "enumerate database structure," "extract database credentials using sqlmap," "dump tables and columns...
sqlmap-database-penetration-testing
This skill should be used when the user asks to "automate SQL injection testing," "enumerate database structure," "extract database credentials using sqlmap," "dump tables and columns from a vulnerable database," or "perform automated database penetration testing." It provides comprehensive guidance for using SQLMap to detect and exploit SQL injection vulnerabilities.
database-optimizer
Expert database optimizer specializing in modern performance tuning, query optimization, and scalable architectures. Masters advanced indexing, N+1 resolution, multi-tier caching, partitioning strategies, and cloud database optimization. Handles complex query analysis, migration strategies, and performance monitoring. Use PROACTIVELY for database optimization, performance issues, or scalability challenges.
database-migrations-sql-migrations
SQL database migrations with zero-downtime strategies for PostgreSQL, MySQL, SQL Server
database-migrations-migration-observability
Migration monitoring, CDC, and observability infrastructure
database-design
Database design principles and decision-making. Schema design, indexing strategy, ORM selection, serverless databases.
database-cloud-optimization-cost-optimize
You are a cloud cost optimization expert specializing in reducing infrastructure expenses while maintaining performance and reliability. Analyze cloud spending, identify savings opportunities, and implement cost-effective architectures across AWS, Azure, and GCP.
database-architect
Expert database architect specializing in data layer design from scratch, technology selection, schema modeling, and scalable database architectures. Masters SQL/NoSQL/TimeSeries database selection, normalization strategies, migration planning, and performance-first design. Handles both greenfield architectures and re-architecture of existing systems. Use PROACTIVELY for database architecture, technology selection, or data modeling decisions.
database-admin
Expert database administrator specializing in modern cloud databases, automation, and reliability engineering. Masters AWS/Azure/GCP database services, Infrastructure as Code, high availability, disaster recovery, performance optimization, and compliance. Handles multi-cloud strategies, container databases, and cost optimization. Use PROACTIVELY for database architecture, operations, or reliability engineering.
zinc-database
Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.
uspto-database
Access USPTO APIs for patent/trademark searches, examination history (PEDS), assignments, citations, office actions, TSDR, for IP analysis and prior art searches.