kegg-database

Direct access to KEGG via the REST API for academic-only pathway/gene/compound/drug queries; use when you need precise HTTP-level control or targeted KEGG ID mapping.

by aipoch

Best use case

kegg-database is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using kegg-database should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/kegg-database/SKILL.md --create-dirs "https://raw.githubusercontent.com/aipoch/medical-research-skills/main/scientific-skills/Evidence%20Insight/kegg-database/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/kegg-database/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How kegg-database Compares

| Feature / Agent | kegg-database | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Direct access to KEGG via the REST API for academic-only pathway/gene/compound/drug queries; use when you need precise HTTP-level control or targeted KEGG ID mapping.

Where can I find the source code?

The source code is on GitHub in the aipoch/medical-research-skills repository: https://github.com/aipoch/medical-research-skills

SKILL.md Source

> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You need to fetch **KEGG pathway, gene, compound, enzyme, disease, or drug** records directly from the **KEGG REST API**.
- You want to perform **gene ↔ pathway** mapping (e.g., building inputs for pathway enrichment or reporting).
- You need **cross-references** between KEGG databases (e.g., pathway → genes, gene → KO, pathway → compounds).
- You must **convert identifiers** between KEGG and external databases (e.g., KEGG gene → NCBI Gene ID / UniProt; KEGG compound → PubChem).
- You need **drug–drug interaction (DDI)** lookups for KEGG drug IDs.

> Note: KEGG REST access is intended for academic use. Non-academic/commercial use may require a separate KEGG license.

## Key Features

- Full coverage of core KEGG REST operations via Python helpers:
  - `kegg_info` (database metadata)
  - `kegg_list` (catalog listing)
  - `kegg_find` (keyword/property search)
  - `kegg_get` (entry retrieval; sequences/structures/images)
  - `kegg_conv` (ID conversion)
  - `kegg_link` (cross-database linking)
  - `kegg_ddi` (drug–drug interactions)
- Supports common KEGG identifiers and formats:
  - Pathways: `map00010`, `hsa00010`
  - Genes: `hsa:10458`
  - Compounds: `cpd:C00002`
  - Drugs: `dr:D00001`
  - Enzymes: `ec:1.1.1.1`
  - KO: `ko:K00001`
- Output format options for `kegg_get`: `aaseq`, `ntseq`, `mol`, `kcf`, `image`, `kgml`, `json` (some formats are single-entry only).
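
To make the identifier shapes above concrete, here is a minimal sketch that splits a KEGG identifier into its database (or organism) prefix and its accession. The helper name `split_kegg_id` is hypothetical and is not one of this skill's functions; it only illustrates the `prefix:accession` convention.

```python
from typing import Optional, Tuple


def split_kegg_id(kegg_id: str) -> Tuple[Optional[str], str]:
    """Split 'prefix:accession' into its parts (hypothetical helper).

    Identifiers without a colon (e.g. 'map00010') return None as the prefix.
    """
    prefix, sep, accession = kegg_id.strip().partition(":")
    if not sep:
        # No colon present: the whole string is the accession.
        return None, prefix
    return prefix, accession


for example in ["hsa:10458", "cpd:C00002", "ec:1.1.1.1", "map00010"]:
    print(example, "->", split_kegg_id(example))
```

Only the first colon is treated as the separator, so accessions that contain dots or further punctuation are returned unchanged.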

## Dependencies

- Python `>=3.9`
- `requests >=2.31.0`

## Example Usage

```python
"""
End-to-end example:
1) Find a human gene by keyword
2) Link the gene to pathways
3) Retrieve one pathway entry
4) Convert the gene ID to UniProt
"""

from scripts.kegg_api import kegg_find, kegg_link, kegg_get, kegg_conv

# 1) Search for a gene keyword in KEGG GENES
hits = kegg_find("genes", "p53")
print("FIND results (first lines):")
print("\n".join(hits.splitlines()[:5]), "\n")

# Choose a known KEGG gene ID for TP53 (human)
gene_id = "hsa:7157"

# 2) Link gene -> pathways
pathway_links = kegg_link("pathway", gene_id)
print("LINK gene -> pathways (first lines):")
print("\n".join(pathway_links.splitlines()[:5]), "\n")

# Parse the first pathway ID from the link output
# Typical line format: hsa:7157<TAB>path:hsaXXXXX (queried ID first, linked ID second)
first_line = next((ln for ln in pathway_links.splitlines() if ln.strip()), None)
if not first_line:
    raise RuntimeError("No pathways returned for the gene ID.")

# The pathway ID is in the second tab-delimited column
path_id = first_line.split("\t")[1].replace("path:", "")
print("Selected pathway:", path_id, "\n")

# 3) Retrieve the pathway entry (flat text)
pathway_entry = kegg_get(path_id)
print("GET pathway entry (first 30 lines):")
print("\n".join(pathway_entry.splitlines()[:30]), "\n")

# 4) Convert KEGG gene ID -> UniProt
uniprot_map = kegg_conv("uniprot", gene_id)
print("CONV KEGG -> UniProt:")
print(uniprot_map)
```
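
The LINK and CONV steps above return two-column, tab-delimited text. The sketch below shows one way to turn that text into Python pairs, assuming the queried (source) ID is in the first column and the linked or converted ID is in the second. `parse_kegg_pairs` is a hypothetical helper, and the sample string is illustrative rather than a captured API response.

```python
from typing import List, Tuple


def parse_kegg_pairs(text: str) -> List[Tuple[str, str]]:
    """Parse two-column, tab-delimited KEGG output into (source, target) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        columns = line.split("\t")
        if len(columns) < 2:
            continue  # skip lines that are not tab-delimited pairs
        pairs.append((columns[0], columns[1]))
    return pairs


# Illustrative sample shaped like LINK output; not a captured API response.
sample = "hsa:7157\tpath:hsa04110\nhsa:7157\tpath:hsa04115\n"
pathway_ids = [target.replace("path:", "") for _, target in parse_kegg_pairs(sample)]
print(pathway_ids)
```

Collecting all pairs first, rather than taking only the first line, makes it easier to iterate over every pathway linked to a gene.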

## Implementation Details

### API-to-function mapping

This skill wraps KEGG REST endpoints into Python functions (see `scripts/kegg_api.py`):

- `kegg_info(database_or_org)`  
  Retrieves database or organism metadata (release info, counts, etc.).

- `kegg_list(database, organism=None)`  
  Lists entries in a database; optionally scoped to an organism (e.g., `("pathway", "hsa")`).  
  Also supports listing explicit IDs (batch-style) when passed as a single string.

- `kegg_find(database, query, option=None)`  
  Searches by keyword or by chemical properties. Common `option` values:
  - `formula` (partial match, irrespective of atom order)
  - `exact_mass` (range like `300-310`)
  - `mol_weight` (range)

- `kegg_get(entry_ids, option=None)`  
  Retrieves full entries or specific formats:
  - Sequences: `aaseq`, `ntseq`
  - Structures: `mol`, `kcf`
  - Pathway assets: `image` (PNG), `kgml` (XML)
  - `json` (JSON output; in the KEGG REST API this option applies to BRITE hierarchy entries)

  **Batching rules**:
  - Most operations allow up to **10 entries** per request.
  - `image`, `kgml`, and `json` typically allow **only 1 entry** per request.

- `kegg_conv(target_db, source)`  
  Converts IDs between KEGG and external databases (e.g., `uniprot`, `ncbi-geneid`, `pubchem`, `chebi`).  
  Output is tab-delimited pairs: `source_id<TAB>target_id`.

- `kegg_link(target_db, source)`  
  Cross-references entries across KEGG databases (e.g., gene → pathway, pathway → compound, gene → KO).

- `kegg_ddi(drug_ids)`  
  Returns known drug–drug interactions for one or more KEGG drug IDs (up to typical batch limits).
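
Each function above corresponds to a URL path of the form `operation/argument/...` on the KEGG REST host. The following is a minimal sketch of how such URLs can be assembled; it is not the code in `scripts/kegg_api.py`, and `build_kegg_url` and `KEGG_BASE_URL` are names introduced here for illustration. It performs no percent-encoding, so it assumes arguments contain no spaces or reserved characters.

```python
from typing import Sequence, Union

KEGG_BASE_URL = "https://rest.kegg.jp"  # public KEGG REST host


def build_kegg_url(operation: str, *parts: Union[str, Sequence[str], None]) -> str:
    """Join an operation and its path arguments into a KEGG REST URL.

    Lists of IDs are joined with '+'. None values are dropped so that
    optional trailing arguments can simply be omitted.
    """
    segments = [operation]
    for part in parts:
        if part is None:
            continue
        if isinstance(part, str):
            segments.append(part)
        else:
            segments.append("+".join(part))
    return f"{KEGG_BASE_URL}/" + "/".join(segments)


print(build_kegg_url("info", "pathway"))
print(build_kegg_url("list", "pathway", "hsa"))
print(build_kegg_url("find", "compound", "300-310", "exact_mass"))
print(build_kegg_url("get", ["hsa:7157", "hsa:10458"], "aaseq"))
print(build_kegg_url("conv", "uniprot", "hsa:7157"))
print(build_kegg_url("link", "pathway", "hsa:7157"))
print(build_kegg_url("ddi", "dr:D00001"))
```

Keeping URL construction separate from the HTTP call makes it possible to inspect or log the exact request before sending it.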

### Practical constraints and error handling

- **Entry limits**: Prefer chunking lists into batches of ≤10 IDs; enforce single-entry calls for `image/kgml/json`.
- **HTTP status codes**: Treat non-200 responses as failures; common issues include:
  - `400` (bad request / malformed parameters)
  - `404` (unknown database or entry ID)
- **Rate behavior**: KEGG does not publish strict rate limits; avoid high-frequency polling and add backoff/retry for robustness.
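
The constraints above can be sketched as two small helpers: one that splits an ID list into batches of at most 10, and one that retries a request with exponential backoff while failing fast on `400` and `404`. Both `chunked` and `fetch_with_retry` are hypothetical names, not part of this skill, and the attempt count and delays are illustrative defaults rather than KEGG recommendations. The HTTP call is passed in as a function so the retry logic stays independent of any particular client.

```python
import time
from typing import Callable, Iterator, List, Sequence, Tuple


def chunked(ids: Sequence[str], size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `fetch` until it returns HTTP 200, backing off exponentially.

    `fetch` is a caller-supplied function returning (status_code, body).
    400 and 404 are raised immediately because retrying cannot fix them.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = None
    for attempt in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body
        if status in (400, 404):
            raise ValueError(f"KEGG request rejected with HTTP {status}")
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # wait 1x, 2x, 4x, ... base_delay
    raise RuntimeError(f"KEGG request failed after {max_attempts} attempts (last HTTP {status})")


# Batching demo: 23 IDs split into batches of 10, 10, and 3.
batches = list(chunked([f"hsa:{n}" for n in range(1, 24)]))
print([len(batch) for batch in batches])

# Retry demo with a fake fetcher: one transient failure, then success.
responses = iter([(503, ""), (200, "ok")])
print(fetch_with_retry(lambda: next(responses), sleep=lambda seconds: None))
```

With `requests`, `fetch` could be a small function that calls `requests.get(url, timeout=...)` and returns `(response.status_code, response.text)`.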

### Reference documentation

For detailed endpoint syntax, database lists, and species codes, consult:
- `references/kegg_reference.md`

Related Skills

uspto-database

from aipoch/medical-research-skills

Access USPTO data (Patent Search, PEDS, TSDR, assignments) when you need to query patents/trademarks and retrieve prosecution or status information programmatically.

zinc-database

from aipoch/medical-research-skills

Access the ZINC (230M+ purchasable compounds) database when you need to look up compounds by ZINC ID/SMILES, run similarity/analog searches, or download 3D ready-to-dock structures for virtual screening and drug discovery.

uniprot-database

from aipoch/medical-research-skills

Direct REST API access to UniProt for protein search, entry retrieval, and identifier mapping; use when you need programmatic UniProtKB queries or cross-database ID conversion.

string-database

from aipoch/medical-research-skills

Access the STRING database to map identifiers, retrieve protein–protein interaction networks, and run functional/PPI enrichment when you need interaction context for a gene/protein set.

semantic-scholar-database

from aipoch/medical-research-skills

Access the Semantic Scholar Graph API to search papers and retrieve paper/author/citation data when you need literature discovery or citation graph exploration.

scite-database

from aipoch/medical-research-skills

Access Scite.ai Smart Citations to classify how a paper is cited (supporting, contrasting, mentioning) and assess scientific claims; use it when you need to evaluate a paper’s reliability or its acceptance in the literature.

pubchem-database-skill

from aipoch/medical-research-skills

Programmatic access to the PubChem database (via PUG-REST API and PubChemPy) for searching chemical compounds, retrieving physicochemical properties, performing structure similarity/substructure searches, and obtaining bioactivity data.

pdb-database

from aipoch/medical-research-skills

Access the RCSB Protein Data Bank (PDB) to search, download, and programmatically retrieve 3D macromolecular structures and metadata; use when you need structure discovery (text/sequence/3D similarity) or automated structural data ingestion for structural biology and drug discovery workflows.

kegg-api

from aipoch/medical-research-skills

Access the KEGG database API to retrieve biological data (genes, pathways, compounds, drugs). Invoke when the user asks to search, list, or get details from KEGG.

hmdb-database

from aipoch/medical-research-skills

Access the Human Metabolome Database (HMDB) to search metabolites by name/structure/ID and extract chemical/biological/clinical fields when you need metabolomics research data or automated HMDB XML mining.

gwas-database

from aipoch/medical-research-skills

Query the NHGRI-EBI GWAS Catalog to retrieve SNP–trait associations, study metadata, and (when available) summary statistics when you need evidence for a variant, trait/disease, gene, or genomic region.

gene-database

from aipoch/medical-research-skills

Query the NCBI Gene database via E-utilities and the NCBI Datasets API; use it when you need to search genes by symbol/ID and retrieve annotations (RefSeq, GO, location, phenotype) for single or batch gene lists.