kegg-database
Direct access to KEGG via the REST API for academic-only pathway/gene/compound/drug queries; use when you need precise HTTP-level control or targeted KEGG ID mapping.
Best use case
kegg-database is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using kegg-database should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/kegg-database/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How kegg-database Compares
| Feature / Agent | kegg-database | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Direct access to KEGG via the REST API for academic-only pathway/gene/compound/drug queries; use when you need precise HTTP-level control or targeted KEGG ID mapping.
Where can I find the source code?
The source code is on GitHub at https://github.com/aipoch/medical-research-skills.
SKILL.md Source
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)
## When to Use
- You need to fetch **KEGG pathway, gene, compound, enzyme, disease, or drug** records directly from the **KEGG REST API**.
- You want to perform **gene ↔ pathway** mapping (e.g., building inputs for pathway enrichment or reporting).
- You need **cross-references** between KEGG databases (e.g., pathway → genes, gene → KO, pathway → compounds).
- You must **convert identifiers** between KEGG and external databases (e.g., KEGG gene → NCBI Gene ID / UniProt; KEGG compound → PubChem).
- You need **drug–drug interaction (DDI)** lookups for KEGG drug IDs.
> Note: KEGG REST access is intended for academic use. Non-academic/commercial use may require a separate KEGG license.
## Key Features
- Full coverage of core KEGG REST operations via Python helpers:
  - `kegg_info` (database metadata)
  - `kegg_list` (catalog listing)
  - `kegg_find` (keyword/property search)
  - `kegg_get` (entry retrieval; sequences/structures/images)
  - `kegg_conv` (ID conversion)
  - `kegg_link` (cross-database linking)
  - `kegg_ddi` (drug–drug interactions)
- Supports common KEGG identifiers and formats:
  - Pathways: `map00010`, `hsa00010`
  - Genes: `hsa:10458`
  - Compounds: `cpd:C00002`
  - Drugs: `dr:D00001`
  - Enzymes: `ec:1.1.1.1`
  - KO: `ko:K00001`
- Output format options for `kegg_get`: `aaseq`, `ntseq`, `mol`, `kcf`, `image`, `kgml`, `json` (some formats are single-entry only).
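
The identifier formats listed above can be told apart by their prefixes. The following is a minimal sketch of that idea; `classify_kegg_id` is a hypothetical helper (not part of `scripts/kegg_api.py`), and its patterns are inferred only from the example IDs in this section, so they are deliberately loose and should not be read as an official KEGG grammar.

```python
import re
from typing import Optional

# Patterns inferred from the example IDs in this README (an assumption,
# not an official KEGG specification). Order matters: the specific
# database prefixes are checked before the generic "org:gene" pattern.
_KEGG_ID_PATTERNS = [
    ("compound", re.compile(r"^cpd:C\d{5}$")),
    ("drug", re.compile(r"^dr:D\d{5}$")),
    ("enzyme", re.compile(r"^ec:\d+(\.(\d+|-)){3}$")),
    ("ko", re.compile(r"^ko:K\d{5}$")),
    ("pathway", re.compile(r"^(path:)?[a-z]{2,4}\d{5}$")),
    ("gene", re.compile(r"^[a-z]{3,4}:\S+$")),
]


def classify_kegg_id(kegg_id: str) -> Optional[str]:
    """Return a coarse category name for a KEGG ID, or None if unrecognized."""
    for name, pattern in _KEGG_ID_PATTERNS:
        if pattern.match(kegg_id):
            return name
    return None


for example in ["map00010", "hsa00010", "hsa:10458", "cpd:C00002",
                "dr:D00001", "ec:1.1.1.1", "ko:K00001"]:
    print(example, "->", classify_kegg_id(example))
```

A check like this is only useful as an early guard before a request is sent; the KEGG server remains the authority on whether an ID exists.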
## Dependencies
- Python `>=3.9`
- `requests >=2.31.0`
## Example Usage
```python
"""
End-to-end example:
1) Find a human gene by keyword
2) Link the gene to pathways
3) Retrieve one pathway entry
4) Convert the gene ID to UniProt
"""
from scripts.kegg_api import kegg_find, kegg_link, kegg_get, kegg_conv
# 1) Search for a gene keyword in KEGG GENES
hits = kegg_find("genes", "p53")
print("FIND results (first lines):")
print("\n".join(hits.splitlines()[:5]), "\n")
# Choose a known KEGG gene ID for TP53 (human)
gene_id = "hsa:7157"
# 2) Link gene -> pathways
pathway_links = kegg_link("pathway", gene_id)
print("LINK gene -> pathways (first lines):")
print("\n".join(pathway_links.splitlines()[:5]), "\n")
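# Parse the first pathway ID from the link output.
# KEGG link lines are tab-delimited pairs, typically hsa:7157<TAB>path:hsaXXXXX;
# select the column carrying the "path:" prefix rather than relying on column order.
first_line = next((ln for ln in pathway_links.splitlines() if ln.strip()), None)
if not first_line:
    raise RuntimeError("No pathways returned for the gene ID.")
path_col = next((col for col in first_line.split("\t") if col.startswith("path:")), None)
if path_col is None:
    raise RuntimeError("No 'path:' identifier found in the link output.")
path_id = path_col.replace("path:", "", 1)
print("Selected pathway:", path_id, "\n")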
# 3) Retrieve the pathway entry (flat text)
pathway_entry = kegg_get(path_id)
print("GET pathway entry (first 30 lines):")
print("\n".join(pathway_entry.splitlines()[:30]), "\n")
# 4) Convert KEGG gene ID -> UniProt
uniprot_map = kegg_conv("uniprot", gene_id)
print("CONV KEGG -> UniProt:")
print(uniprot_map)
```
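
The LINK and CONV calls in the example return tab-delimited text, which is awkward to work with as a raw string. Below is a minimal sketch of grouping that text into a mapping, assuming one pair per line. `parse_kegg_pairs` is a hypothetical helper, not something provided by `scripts/kegg_api.py`, and the sample text is illustrative rather than captured API output.

```python
from collections import defaultdict


def parse_kegg_pairs(text: str) -> dict[str, list[str]]:
    """Group tab-delimited 'left<TAB>right' lines by their left column."""
    mapping: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        left, right = parts
        mapping[left].append(right)
    return dict(mapping)


# Illustrative sample only (not real API output).
sample = "hsa:7157\tpath:hsa04115\nhsa:7157\tpath:hsa04110\n"
pairs = parse_kegg_pairs(sample)
print(pairs)
```

Grouping by the first column keeps one-to-many relationships (one gene, several pathways) intact, which a plain `dict(line.split("\t") ...)` would silently overwrite.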
## Implementation Details
### API-to-function mapping
This skill wraps KEGG REST endpoints into Python functions (see `scripts/kegg_api.py`):
- `kegg_info(database_or_org)`
  Retrieves database or organism metadata (release info, counts, etc.).
- `kegg_list(database, organism=None)`
  Lists entries in a database; optionally scoped to an organism (e.g., `("pathway", "hsa")`).
  Also supports listing explicit IDs (batch-style) when passed as a single string.
- `kegg_find(database, query, option=None)`
  Searches by keyword or by chemical properties. Common `option` values:
  - `formula` (per the KEGG REST documentation, a partial match irrespective of atom order)
  - `exact_mass` (single value or a range like `300-310`)
  - `mol_weight` (single value or a range)
- `kegg_get(entry_ids, option=None)`
  Retrieves full entries or specific formats:
  - Sequences: `aaseq`, `ntseq`
  - Structures: `mol`, `kcf`
  - Pathway assets: `image` (PNG), `kgml` (XML)
  - `json`: JSON output, which the KEGG REST documentation describes for BRITE hierarchy entries (e.g., `br:br08301`)

  **Batching rules**:
  - Most operations allow up to **10 entries** per request.
  - `image`, `kgml`, and `json` typically allow **only 1 entry** per request.
- `kegg_conv(target_db, source)`
  Converts IDs between KEGG and external databases (e.g., `uniprot`, `ncbi-geneid`, `pubchem`, `chebi`).
  Output is tab-delimited pairs: `source_id<TAB>target_id`.
- `kegg_link(target_db, source)`
  Cross-references entries across KEGG databases (e.g., gene → pathway, pathway → compound, gene → KO).
- `kegg_ddi(drug_ids)`
  Returns known drug–drug interactions for one or more KEGG drug IDs (up to typical batch limits).
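
Each helper above corresponds to one KEGG REST operation whose arguments are path segments of the request URL. The sketch below shows how such a URL could be assembled against the public host `https://rest.kegg.jp`. The internals of `scripts/kegg_api.py` are not shown here, so `build_kegg_url` is a hypothetical illustration of the mapping, not the skill's actual code.

```python
from typing import Optional
from urllib.parse import quote

KEGG_BASE_URL = "https://rest.kegg.jp"


def build_kegg_url(operation: str, *parts: Optional[str]) -> str:
    """Join an operation name and its arguments into a KEGG REST URL.

    Hypothetical helper: None arguments are skipped so optional
    parameters (such as a `kegg_get` option) can be passed through.
    """
    # Keep ':' (database prefixes) and '+' (multi-entry separator) unescaped.
    segments = [quote(str(p), safe=":+") for p in parts if p is not None]
    return "/".join([KEGG_BASE_URL, operation, *segments])


print(build_kegg_url("link", "pathway", "hsa:7157"))
print(build_kegg_url("get", "hsa:10458+hsa:7157", "aaseq"))
print(build_kegg_url("get", "hsa00010", None))
```

Keeping URL construction in one place makes it easier to log the exact request when a call returns `400` or `404`.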
### Practical constraints and error handling
- **Entry limits**: Prefer chunking lists into batches of ≤10 IDs; enforce single-entry calls for `image/kgml/json`.
- **HTTP status codes**: Treat non-200 responses as failures; common issues include:
- `400` (bad request / malformed parameters)
- `404` (unknown database or entry ID)
- **Rate behavior**: KEGG does not publish strict rate limits; avoid high-frequency polling and add backoff/retry for robustness.
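
The batching and backoff advice above can be sketched as two small utilities. Both `chunked` and `with_retry` are hypothetical helpers, not part of `scripts/kegg_api.py`; the retry delays are illustrative defaults rather than values recommended by KEGG, and in real use `retry_on` would likely be narrowed to network errors such as `requests.RequestException`.

```python
import time
from typing import Callable, Iterator, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def chunked(ids: Sequence[str], size: int = 10) -> Iterator[list]:
    """Yield consecutive batches of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def with_retry(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying with exponential backoff on failure."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")


# Offline demonstration with a stand-in for a flaky request.
gene_ids = [f"hsa:{n}" for n in range(23)]
batches = list(chunked(gene_ids))

calls = {"count": 0}
delays = []


def flaky_call() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated transient failure")
    return "ok"


result = with_retry(flaky_call, sleep=delays.append)
print(len(batches), result, delays)
```

In practice each batch would be joined with `+` (the KEGG REST multi-entry separator) and passed to a helper such as `kegg_get` inside the `call` argument, with batch size forced to 1 for `image`, `kgml`, and `json` requests.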
### Reference documentation
For detailed endpoint syntax, database lists, and species codes, consult:
- `references/kegg_reference.md`
Related Skills
uspto-database
Access USPTO data (Patent Search, PEDS, TSDR, assignments) when you need to query patents/trademarks and retrieve prosecution or status information programmatically.
zinc-database
Access the ZINC (230M+ purchasable compounds) database when you need to look up compounds by ZINC ID/SMILES, run similarity/analog searches, or download 3D ready-to-dock structures for virtual screening and drug discovery.
uniprot-database
Direct REST API access to UniProt for protein search, entry retrieval, and identifier mapping; use when you need programmatic UniProtKB queries or cross-database ID conversion.
string-database
Access the STRING database to map identifiers, retrieve protein–protein interaction networks, and run functional/PPI enrichment when you need interaction context for a gene/protein set.
semantic-scholar-database
Access the Semantic Scholar Graph API to search papers and retrieve paper/author/citation data when you need literature discovery or citation graph exploration.
scite-database
Access Scite.ai Smart Citations to classify how a paper is cited (supporting, contrasting, mentioning) and assess scientific claims; use it when you need to evaluate a paper’s reliability or its acceptance in the literature.
pubchem-database-skill
Programmatic access to the PubChem database (via PUG-REST API and PubChemPy) for searching chemical compounds, retrieving physicochemical properties, performing structure similarity/substructure searches, and obtaining bioactivity data.
pdb-database
Access the RCSB Protein Data Bank (PDB) to search, download, and programmatically retrieve 3D macromolecular structures and metadata; use when you need structure discovery (text/sequence/3D similarity) or automated structural data ingestion for structural biology and drug discovery workflows.
kegg-api
Access the KEGG database API to retrieve biological data (genes, pathways, compounds, drugs). Invoke when the user asks to search, list, or get details from KEGG.
hmdb-database
Access the Human Metabolome Database (HMDB) to search metabolites by name/structure/ID and extract chemical/biological/clinical fields when you need metabolomics research data or automated HMDB XML mining.
gwas-database
Query the NHGRI-EBI GWAS Catalog to retrieve SNP–trait associations, study metadata, and (when available) summary statistics when you need evidence for a variant, trait/disease, gene, or genomic region.
gene-database
Query the NCBI Gene database via E-utilities and the NCBI Datasets API; use it when you need to search genes by symbol/ID and retrieve annotations (RefSeq, GO, location, phenotype) for single or batch gene lists.