hmdb-database
Access the Human Metabolome Database (HMDB) to search metabolites by name/structure/ID and extract chemical/biological/clinical fields when you need metabolomics research data or automated HMDB XML mining.
Best use case
hmdb-database is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using hmdb-database should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/hmdb-database/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How hmdb-database Compares
| Feature / Agent | hmdb-database | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Access the Human Metabolome Database (HMDB) to search metabolites by name/structure/ID and extract chemical/biological/clinical fields when you need metabolomics research data or automated HMDB XML mining.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)
## When to Use
- You need to look up a metabolite by **common name** (e.g., “Caffeine”) and retrieve its HMDB entry data.
- You have an **HMDB ID** (e.g., `HMDB0000001`) and want to extract standardized chemical/biological/clinical fields for downstream analysis.
- You want to build a **local, scriptable pipeline** to mine the HMDB XML dump instead of manually browsing the website.
- You need to **map HMDB identifiers** to external resources (e.g., KEGG, PubChem, ChEBI) for integration tasks.
- You are preparing metabolomics datasets and need **pathway/enzyme/transporter** annotations from HMDB entries.
## Key Features
- Search metabolites by:
  - Text name
  - HMDB identifier (e.g., `HMDB0000001`)
  - Structure-related query (as supported by the parser/search implementation)
- Parse the HMDB XML dataset and extract:
  - **Chemical data** (formula, molecular weight, InChI/SMILES where available)
  - **Biological data** (pathways, enzymes, transporters)
  - **Clinical data** (disease associations, biofluid concentrations)
- Optional structuring of extracted results for analysis workflows (e.g., tabular outputs).
- Supports integration workflows by exposing identifiers suitable for cross-database mapping.
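The cross-database mapping mentioned in the last item can be illustrated with a minimal sketch. It is not the skill's own implementation: the helper `xref_rows`, the element names (`accession`, `kegg_id`, `pubchem_compound_id`, `chebi_id`), and the `http://www.hmdb.ca` namespace are assumptions about the HMDB XML layout, and the sample records are made-up placeholders. Check the element names against `references/hmdb_data_fields.md` before relying on them.

```python
import xml.etree.ElementTree as ET

# Assumed element names for external identifiers; verify against the real dump.
XREF_TAGS = ("kegg_id", "pubchem_compound_id", "chebi_id")

# Made-up placeholder records standing in for the real HMDB XML dump.
SAMPLE_XML = """<hmdb xmlns="http://www.hmdb.ca">
  <metabolite>
    <accession>HMDB9999901</accession>
    <kegg_id>C99901</kegg_id>
    <pubchem_compound_id>99901</pubchem_compound_id>
    <chebi_id>99901</chebi_id>
  </metabolite>
  <metabolite>
    <accession>HMDB9999902</accession>
    <kegg_id>C99902</kegg_id>
    <chebi_id></chebi_id>
  </metabolite>
</hmdb>"""


def local_name(tag):
    """Drop a '{namespace}' prefix so the code works with or without one."""
    return tag.rsplit("}", 1)[-1]


def xref_rows(xml_text):
    """Return one dict per metabolite: HMDB ID plus external identifiers."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():
        if local_name(elem.tag) != "metabolite":
            continue
        # Only direct children are read, so nested elements are ignored.
        fields = {local_name(c.tag): (c.text or "").strip() for c in elem}
        row = {"hmdb_id": fields.get("accession", "")}
        for tag in XREF_TAGS:
            # Missing or empty identifiers become None.
            row[tag] = fields.get(tag) or None
        rows.append(row)
    return rows


rows = xref_rows(SAMPLE_XML)
for row in rows:
    print(row)
```

Each row is a plain dict, so the list can be passed to `csv.DictWriter` or `pandas.DataFrame` to build a mapping table for downstream joins.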
## Dependencies
- Python `>=3.9`
- Standard library:
  - `xml.etree.ElementTree` (built-in)
- Optional:
  - `pandas >= 1.5`
## Example Usage
### 1) Download HMDB XML
Download the HMDB metabolite XML dataset from:
- https://hmdb.ca/downloads
Assume you saved it as:
```text
data/hmdb_metabolites.xml
```
### 2) Search and Extract Fields (Runnable Example)
```python
from scripts.hmdb_parser import HMDBParser


def main():
    # Path to the HMDB XML dump downloaded from hmdb.ca/downloads
    xml_path = "data/hmdb_metabolites.xml"
    parser = HMDBParser(xml_path)

    # Search by metabolite name (text query)
    results = parser.search("Caffeine")

    # Print basic information from the first match (structure depends on implementation)
    if not results:
        print("No results found.")
        return

    first = results[0]
    print("Top match:")
    print(first)


if __name__ == "__main__":
    main()
```
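Because the return structure of `parser.search` depends on the implementation, the following standalone sketch shows one way a name/ID match could behave over already-parsed records. It does not describe how `HMDBParser` works internally. The function `search_records`, the record layout (dicts with `accession`, `name`, `synonyms`), and the sample entries are hypothetical.

```python
# Hypothetical, made-up records; a real pipeline would build these from the XML dump.
RECORDS = [
    {"accession": "HMDB9999901", "name": "Example metabolite A",
     "synonyms": ["Alpha-example"]},
    {"accession": "HMDB9999902", "name": "Example metabolite B",
     "synonyms": []},
]


def search_records(records, query):
    """Return records whose accession equals the query, or whose name or
    synonyms contain it. Matching is case-insensitive."""
    q = query.strip().lower()
    if not q:
        return []
    hits = []
    for rec in records:
        # Accessions are matched exactly to avoid partial-ID false positives.
        if rec.get("accession", "").lower() == q:
            hits.append(rec)
            continue
        # Names and synonyms are matched by substring.
        names = [rec.get("name", "")] + list(rec.get("synonyms", []))
        if any(q in n.lower() for n in names):
            hits.append(rec)
    return hits


by_synonym = search_records(RECORDS, "alpha-example")
by_accession = search_records(RECORDS, "hmdb9999902")
no_match = search_records(RECORDS, "not-present")
print(by_synonym, by_accession, no_match)
```

Exact matching on the accession and substring matching on names is a design choice for this sketch; a real implementation may rank or filter differently.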
### 3) Field Reference
For a curated list of extractable fields and how they map to HMDB XML elements, see:
- `references/hmdb_data_fields.md`
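To make the idea of mapping output fields to XML elements concrete, here is a minimal sketch that streams metabolite entries with `xml.etree.ElementTree.iterparse`, which avoids loading a large dump into memory at once. The `FIELD_MAP` contents, the helper `iter_metabolites`, and the XML element names are assumptions for illustration rather than the contents of `references/hmdb_data_fields.md`, and the sample entry is a placeholder.

```python
import io
import xml.etree.ElementTree as ET

# Output key -> assumed XML element name (verify against the field reference).
FIELD_MAP = {
    "hmdb_id": "accession",
    "name": "name",
    "formula": "chemical_formula",
    "avg_mol_weight": "average_molecular_weight",
    "smiles": "smiles",
    "inchi": "inchi",
}

# Placeholder entry standing in for the real HMDB XML dump.
SAMPLE_XML = """<hmdb xmlns="http://www.hmdb.ca">
  <metabolite>
    <accession>HMDB9999901</accession>
    <name>Placeholder entry</name>
    <chemical_formula>C2H6O</chemical_formula>
    <average_molecular_weight>46.07</average_molecular_weight>
    <smiles>CCO</smiles>
  </metabolite>
</hmdb>"""


def local_name(tag):
    """Drop a '{namespace}' prefix so tags compare by their local name."""
    return tag.rsplit("}", 1)[-1]


def iter_metabolites(source):
    """Yield one dict per <metabolite>; `source` is a path or binary file object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "metabolite":
            continue
        # Read direct children only, so nested <name> elements are not picked up.
        children = {local_name(c.tag): (c.text or "").strip() for c in elem}
        record = {key: children.get(tag) or None for key, tag in FIELD_MAP.items()}
        elem.clear()  # release the parsed subtree before moving on
        yield record


records = list(iter_metabolites(io.BytesIO(SAMPLE_XML.encode("utf-8"))))
print(records[0])
```

For the real dataset, pass the file path (for example `data/hmdb_metabolites.xml`) instead of the in-memory sample. Fields absent from an entry come back as `None`.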
## Implementation Details
- **Data acquisition**
  - Primary workflow uses the official HMDB downloadable XML dataset (recommended for bulk parsing).
  - Single-entry lookups can be done via the HMDB website, but this skill is designed around XML parsing.
- **Parsing approach**
  - The parser reads the HMDB XML and traverses metabolite entries using `xml.etree.ElementTree`.
  - Extracted fields should follow the definitions documented in `references/hmdb_data_fields.md`.
- **Search behavior**
  - Name/ID search typically matches against key textual identifiers (e.g., common name, synonyms, HMDB accession).
  - Structure-based search is dependent on what structural fields are indexed/exposed by `HMDBParser` (e.g., SMILES/InChI).
- **Integration / cross-references**
  - HMDB entries often include cross-references to external databases (e.g., KEGG, PubChem, ChEBI).
  - A common workflow is to extract these identifiers and build mapping tables for downstream joins.
- **Spectral analysis (conceptual)**
  - HMDB contains NMR/MS references for some metabolites; this skill can be extended to link parsed entries to spectral metadata.
  - Actual spectral matching/identification is not guaranteed unless implemented in the codebase.

Related Skills
uspto-database
Access USPTO data (Patent Search, PEDS, TSDR, assignments) when you need to query patents/trademarks and retrieve prosecution or status information programmatically.
zinc-database
Access the ZINC (230M+ purchasable compounds) database when you need to look up compounds by ZINC ID/SMILES, run similarity/analog searches, or download 3D ready-to-dock structures for virtual screening and drug discovery.
uniprot-database
Direct REST API access to UniProt for protein search, entry retrieval, and identifier mapping; use when you need programmatic UniProtKB queries or cross-database ID conversion.
string-database
Access the STRING database to map identifiers, retrieve protein–protein interaction networks, and run functional/PPI enrichment when you need interaction context for a gene/protein set.
semantic-scholar-database
Access the Semantic Scholar Graph API to search papers and retrieve paper/author/citation data when you need literature discovery or citation graph exploration.
scite-database
Access Scite.ai Smart Citations to classify how a paper is cited (supporting, contrasting, mentioning) and assess scientific claims; use it when you need to evaluate a paper’s reliability or its acceptance in the literature.
pubchem-database-skill
Programmatic access to the PubChem database (via PUG-REST API and PubChemPy) for searching chemical compounds, retrieving physicochemical properties, performing structure similarity/substructure searches, and obtaining bioactivity data.
pdb-database
Access the RCSB Protein Data Bank (PDB) to search, download, and programmatically retrieve 3D macromolecular structures and metadata; use when you need structure discovery (text/sequence/3D similarity) or automated structural data ingestion for structural biology and drug discovery workflows.
kegg-database
Direct access to KEGG via the REST API for academic-only pathway/gene/compound/drug queries; use when you need precise HTTP-level control or targeted KEGG ID mapping.
gwas-database
Query the NHGRI-EBI GWAS Catalog to retrieve SNP–trait associations, study metadata, and (when available) summary statistics when you need evidence for a variant, trait/disease, gene, or genomic region.
gene-database
Query the NCBI Gene database via E-utilities and the NCBI Datasets API; use it when you need to search genes by symbol/ID and retrieve annotations (RefSeq, GO, location, phenotype) for single or batch gene lists.
fda-database
Query the openFDA API to retrieve FDA regulatory datasets (drugs, devices, adverse events, recalls, submissions, UNII) when you need programmatic safety/regulatory evidence for analysis or research.