chembl-database
Query ChEMBL's bioactive molecules and drug discovery data. Search compounds by structure/properties, retrieve bioactivity data (IC50, Ki), find inhibitors, perform SAR studies, for medicinal chemistry.
Best use case
chembl-database is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi. Query ChEMBL's bioactive molecules and drug discovery data. Search compounds by structure/properties, retrieve bioactivity data (IC50, Ki), find inhibitors, perform SAR studies, for medicinal chemistry.
Query ChEMBL's bioactive molecules and drug discovery data. Search compounds by structure/properties, retrieve bioactivity data (IC50, Ki), find inhibitors, perform SAR studies, for medicinal chemistry.
Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.
Practical example
Example input
Use the "chembl-database" skill to help with this workflow task. Context: Query ChEMBL's bioactive molecules and drug discovery data. Search compounds by structure/properties, retrieve bioactivity data (IC50, Ki), find inhibitors, perform SAR studies, for medicinal chemistry.
Example output
A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.
When to use this skill
- Use this skill when you want a reusable workflow rather than writing the same prompt again and again.
When not to use this skill
- Do not use this when you only need a one-off answer and do not need a reusable workflow.
- Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/chembl-database/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How chembl-database Compares
| Feature / Agent | chembl-database | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Query ChEMBL's bioactive molecules and drug discovery data. Search compounds by structure/properties, retrieve bioactivity data (IC50, Ki), find inhibitors, perform SAR studies, for medicinal chemistry.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# ChEMBL Database
## Overview
ChEMBL is a manually curated database of bioactive molecules maintained by the European Bioinformatics Institute (EBI), containing over 2 million compounds, 19 million bioactivity measurements, 13,000+ drug targets, and data on approved drugs and clinical candidates. Access and query this data programmatically using the ChEMBL Python client for drug discovery and medicinal chemistry research.
## When to Use This Skill
This skill should be used when:
- **Compound searches**: Finding molecules by name, structure, or properties
- **Target information**: Retrieving data about proteins, enzymes, or biological targets
- **Bioactivity data**: Querying IC50, Ki, EC50, or other activity measurements
- **Drug information**: Looking up approved drugs, mechanisms, or indications
- **Structure searches**: Performing similarity or substructure searches
- **Cheminformatics**: Analyzing molecular properties and drug-likeness
- **Target-ligand relationships**: Exploring compound-target interactions
- **Drug discovery**: Identifying inhibitors, agonists, or bioactive molecules
## Installation and Setup
### Python Client
The ChEMBL Python client is required for programmatic access:
```bash
uv pip install chembl_webresource_client
```
### Basic Usage Pattern
```python
from chembl_webresource_client.new_client import new_client
# Access different endpoints
molecule = new_client.molecule
target = new_client.target
activity = new_client.activity
drug = new_client.drug
```
## Core Capabilities
### 1. Molecule Queries
**Retrieve by ChEMBL ID:**
```python
molecule = new_client.molecule
aspirin = molecule.get('CHEMBL25')
```
**Search by name:**
```python
results = molecule.filter(pref_name__icontains='aspirin')
```
**Filter by properties:**
```python
# Find small molecules (MW <= 500) with favorable LogP
results = molecule.filter(
molecule_properties__mw_freebase__lte=500,
molecule_properties__alogp__lte=5
)
```
### 2. Target Queries
**Retrieve target information:**
```python
target = new_client.target
egfr = target.get('CHEMBL203')
```
**Search for specific target types:**
```python
# Find all kinase targets
kinases = target.filter(
target_type='SINGLE PROTEIN',
pref_name__icontains='kinase'
)
```
### 3. Bioactivity Data
**Query activities for a target:**
```python
activity = new_client.activity
# Find potent EGFR inhibitors
results = activity.filter(
target_chembl_id='CHEMBL203',
standard_type='IC50',
standard_value__lte=100,
standard_units='nM'
)
```
**Get all activities for a compound:**
```python
compound_activities = activity.filter(
molecule_chembl_id='CHEMBL25',
pchembl_value__isnull=False
)
```
### 4. Structure-Based Searches
**Similarity search:**
```python
similarity = new_client.similarity
# Find compounds similar to aspirin
similar = similarity.filter(
smiles='CC(=O)Oc1ccccc1C(=O)O',
similarity=85 # 85% similarity threshold
)
```
**Substructure search:**
```python
substructure = new_client.substructure
# Find compounds containing benzene ring
results = substructure.filter(smiles='c1ccccc1')
```
### 5. Drug Information
**Retrieve drug data:**
```python
drug = new_client.drug
drug_info = drug.get('CHEMBL25')
```
**Get mechanisms of action:**
```python
mechanism = new_client.mechanism
mechanisms = mechanism.filter(molecule_chembl_id='CHEMBL25')
```
**Query drug indications:**
```python
drug_indication = new_client.drug_indication
indications = drug_indication.filter(molecule_chembl_id='CHEMBL25')
```
## Query Workflow
### Workflow 1: Finding Inhibitors for a Target
1. **Identify the target** by searching by name:
```python
targets = new_client.target.filter(pref_name__icontains='EGFR')
target_id = targets[0]['target_chembl_id']
```
2. **Query bioactivity data** for that target:
```python
activities = new_client.activity.filter(
target_chembl_id=target_id,
standard_type='IC50',
standard_value__lte=100
)
```
3. **Extract compound IDs** and retrieve details:
```python
compound_ids = [act['molecule_chembl_id'] for act in activities]
compounds = [new_client.molecule.get(cid) for cid in compound_ids]
```
### Workflow 2: Analyzing a Known Drug
1. **Get drug information**:
```python
drug_info = new_client.drug.get('CHEMBL1234')
```
2. **Retrieve mechanisms**:
```python
mechanisms = new_client.mechanism.filter(molecule_chembl_id='CHEMBL1234')
```
3. **Find all bioactivities**:
```python
activities = new_client.activity.filter(molecule_chembl_id='CHEMBL1234')
```
### Workflow 3: Structure-Activity Relationship (SAR) Study
1. **Find similar compounds**:
```python
similar = new_client.similarity.filter(smiles='query_smiles', similarity=80)
```
2. **Get activities for each compound**:
```python
for compound in similar:
activities = new_client.activity.filter(
molecule_chembl_id=compound['molecule_chembl_id']
)
```
3. **Analyze property-activity relationships** using molecular properties from results.
## Filter Operators
ChEMBL supports Django-style query filters:
- `__exact` - Exact match
- `__iexact` - Case-insensitive exact match
- `__contains` / `__icontains` - Substring matching
- `__startswith` / `__endswith` - Prefix/suffix matching
- `__gt`, `__gte`, `__lt`, `__lte` - Numeric comparisons
- `__range` - Value in range
- `__in` - Value in list
- `__isnull` - Null/not null check
## Data Export and Analysis
Convert results to pandas DataFrame for analysis:
```python
import pandas as pd
activities = new_client.activity.filter(target_chembl_id='CHEMBL203')
df = pd.DataFrame(list(activities))
# Analyze results
print(df['standard_value'].describe())
print(df.groupby('standard_type').size())
```
## Performance Optimization
### Caching
The client automatically caches results for 24 hours. Configure caching:
```python
from chembl_webresource_client.settings import Settings
# Disable caching
Settings.Instance().CACHING = False
# Adjust cache expiration (seconds)
Settings.Instance().CACHE_EXPIRE = 86400
```
### Lazy Evaluation
Queries execute only when data is accessed. Convert to list to force execution:
```python
# Query is not executed yet
results = molecule.filter(pref_name__icontains='aspirin')
# Force execution
results_list = list(results)
```
### Pagination
Results are paginated automatically. Iterate through all results:
```python
for activity in new_client.activity.filter(target_chembl_id='CHEMBL203'):
# Process each activity
print(activity['molecule_chembl_id'])
```
## Common Use Cases
### Find Kinase Inhibitors
```python
# Identify kinase targets
kinases = new_client.target.filter(
target_type='SINGLE PROTEIN',
pref_name__icontains='kinase'
)
# Get potent inhibitors
for kinase in kinases[:5]: # First 5 kinases
activities = new_client.activity.filter(
target_chembl_id=kinase['target_chembl_id'],
standard_type='IC50',
standard_value__lte=50
)
```
### Explore Drug Repurposing
```python
# Get approved drugs
drugs = new_client.drug.filter()
# For each drug, find all targets
for drug in drugs[:10]:
mechanisms = new_client.mechanism.filter(
molecule_chembl_id=drug['molecule_chembl_id']
)
```
### Virtual Screening
```python
# Find compounds with desired properties
candidates = new_client.molecule.filter(
molecule_properties__mw_freebase__range=[300, 500],
molecule_properties__alogp__lte=5,
molecule_properties__hba__lte=10,
molecule_properties__hbd__lte=5
)
```
## Resources
### scripts/example_queries.py
Ready-to-use Python functions demonstrating common ChEMBL query patterns:
- `get_molecule_info()` - Retrieve molecule details by ID
- `search_molecules_by_name()` - Name-based molecule search
- `find_molecules_by_properties()` - Property-based filtering
- `get_bioactivity_data()` - Query bioactivities for targets
- `find_similar_compounds()` - Similarity searching
- `substructure_search()` - Substructure matching
- `get_drug_info()` - Retrieve drug information
- `find_kinase_inhibitors()` - Specialized kinase inhibitor search
- `export_to_dataframe()` - Convert results to pandas DataFrame
Consult this script for implementation details and usage examples.
### references/api_reference.md
Comprehensive API documentation including:
- Complete endpoint listing (molecule, target, activity, assay, drug, etc.)
- All filter operators and query patterns
- Molecular properties and bioactivity fields
- Advanced query examples
- Configuration and performance tuning
- Error handling and rate limiting
Refer to this document when detailed API information is needed or when troubleshooting queries.
## Important Notes
### Data Reliability
- ChEMBL data is manually curated but may contain inconsistencies
- Always check `data_validity_comment` field in activity records
- Be aware of `potential_duplicate` flags
### Units and Standards
- Bioactivity values use standard units (nM, uM, etc.)
- `pchembl_value` provides normalized activity (-log scale)
- Check `standard_type` to understand measurement type (IC50, Ki, EC50, etc.)
### Rate Limiting
- Respect ChEMBL's fair usage policies
- Use caching to minimize repeated requests
- Consider bulk downloads for large datasets
- Avoid hammering the API with rapid consecutive requests
### Chemical Structure Formats
- SMILES strings are the primary structure format
- InChI keys available for compounds
- SVG images can be generated via the image endpoint
## Additional Resources
- ChEMBL website: https://www.ebi.ac.uk/chembl/
- API documentation: https://www.ebi.ac.uk/chembl/api/data/docs
- Python client GitHub: https://github.com/chembl/chembl_webresource_client
- Interface documentation: https://chembl.gitbook.io/chembl-interface-documentation/
- Example notebooks: https://github.com/chembl/notebooksRelated Skills
vector-database-engineer
Expert in vector databases, embedding strategies, and semantic search implementation. Masters Pinecone, Weaviate, Qdrant, Milvus, and pgvector for RAG applications, recommendation systems, and similar
sqlmap-database-pentesting
This skill should be used when the user asks to "automate SQL injection testing," "enumerate database structure," "extract database credentials using sqlmap," "dump tables and columns...
sqlmap-database-penetration-testing
This skill should be used when the user asks to "automate SQL injection testing," "enumerate database structure," "extract database credentials using sqlmap," "dump tables and columns from a vulnerable database," or "perform automated database penetration testing." It provides comprehensive guidance for using SQLMap to detect and exploit SQL injection vulnerabilities.
database-optimizer
Expert database optimizer specializing in modern performance tuning, query optimization, and scalable architectures. Masters advanced indexing, N+1 resolution, multi-tier caching, partitioning strategies, and cloud database optimization. Handles complex query analysis, migration strategies, and performance monitoring. Use PROACTIVELY for database optimization, performance issues, or scalability challenges.
database-migrations-sql-migrations
SQL database migrations with zero-downtime strategies for PostgreSQL, MySQL, SQL Server
database-migrations-migration-observability
Migration monitoring, CDC, and observability infrastructure
database-design
Database design principles and decision-making. Schema design, indexing strategy, ORM selection, serverless databases.
database-cloud-optimization-cost-optimize
You are a cloud cost optimization expert specializing in reducing infrastructure expenses while maintaining performance and reliability. Analyze cloud spending, identify savings opportunities, and implement cost-effective architectures across AWS, Azure, and GCP.
database-architect
Expert database architect specializing in data layer design from scratch, technology selection, schema modeling, and scalable database architectures. Masters SQL/NoSQL/TimeSeries database selection, normalization strategies, migration planning, and performance-first design. Handles both greenfield architectures and re-architecture of existing systems. Use PROACTIVELY for database architecture, technology selection, or data modeling decisions.
database-admin
Expert database administrator specializing in modern cloud databases, automation, and reliability engineering. Masters AWS/Azure/GCP database services, Infrastructure as Code, high availability, disaster recovery, performance optimization, and compliance. Handles multi-cloud strategies, container databases, and cost optimization. Use PROACTIVELY for database architecture, operations, or reliability engineering.
zinc-database
Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.
uspto-database
Access USPTO APIs for patent/trademark searches, examination history (PEDS), assignments, citations, office actions, TSDR, for IP analysis and prior art searches.