tooluniverse-metabolomics

Comprehensive metabolomics research skill for identifying metabolites, analyzing studies, and searching metabolomics databases. Integrates HMDB (220k+ metabolites), MetaboLights, Metabolomics Workbench, and PubChem. Use when asked to identify or annotate metabolites (HMDB IDs, chemical properties, pathways), retrieve metabolomics study information from MetaboLights (MTBLS*) or Metabolomics Workbench (ST*), search for studies by keywords or disease, or generate comprehensive metabolomics research reports.

1,202 stars

by mims-harvard


Best use case

tooluniverse-metabolomics is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using tooluniverse-metabolomics can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/tooluniverse-metabolomics/SKILL.md --create-dirs "https://raw.githubusercontent.com/mims-harvard/ToolUniverse/main/skills/tooluniverse-metabolomics/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/tooluniverse-metabolomics/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How tooluniverse-metabolomics Compares

Feature / Agent	tooluniverse-metabolomics	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

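What does this skill do?

It identifies and annotates metabolites, retrieves study details from MetaboLights and Metabolomics Workbench, searches metabolomics studies by keyword, and generates structured markdown research reports.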

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Metabolomics Research

Comprehensive metabolomics research skill that identifies metabolites, analyzes studies, and searches metabolomics databases. Generates structured research reports with annotated metabolite information, study details, and database statistics.

## Use Case

**Use this skill when asked to:**
- Identify or annotate metabolites (HMDB IDs, chemical properties, pathways)
- Retrieve metabolomics study information from MetaboLights or Metabolomics Workbench
- Search for metabolomics studies by keywords or disease
- Analyze metabolite profiles or datasets
- Generate comprehensive metabolomics research reports

**Example queries:**
- "What is the HMDB ID and pathway information for glucose?"
- "Get study details for MTBLS1"
- "Find metabolomics studies related to diabetes"
- "Analyze these metabolites: glucose, lactate, pyruvate"

## Databases Covered

**Primary metabolite databases:**
- **HMDB** (Human Metabolome Database): 220,000+ metabolites with structures, pathways, and biological roles
- **MetaboLights**: Public metabolomics repository with thousands of studies
- **Metabolomics Workbench**: NIH Common Fund metabolomics data repository
- **PubChem**: Chemical properties and bioactivity data (fallback)

## Research Workflow

The skill executes a 4-phase analysis pipeline:

### Phase 1: Metabolite Identification & Annotation
For each metabolite in the input list:
1. Search HMDB by metabolite name
2. Retrieve HMDB ID, chemical formula, molecular weight
3. Get detailed metabolite information (description, pathways)
4. Fall back to PubChem for the CID and chemical properties if HMDB is unavailable or returns no match

### Phase 2: Study Details Retrieval
For provided study IDs:
1. Detect database type (MTBLS = MetaboLights, ST = Metabolomics Workbench)
2. Retrieve study metadata (title, description, organism, status)
3. Extract experimental design and data availability
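
The prefix-based detection in step 1 can be sketched as follows. This is a minimal illustration, not the skill's actual code: the function name `detect_study_database` is hypothetical, and treating the prefix as case-insensitive is an assumption.

```python
def detect_study_database(study_id: str) -> str:
    """Map a study ID prefix to its source repository.

    Hypothetical helper: MTBLS* -> MetaboLights, ST* -> Metabolomics Workbench.
    """
    # Assumption: IDs are matched case-insensitively after trimming whitespace.
    normalized = study_id.strip().upper()
    if normalized.startswith("MTBLS"):
        return "MetaboLights"
    if normalized.startswith("ST"):
        return "Metabolomics Workbench"
    raise ValueError(f"Unrecognized study ID prefix: {study_id!r}")
```

Raising on an unknown prefix keeps a mistyped ID from being silently routed to the wrong repository.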

### Phase 3: Study Search
For keyword searches:
1. Search MetaboLights studies by query term
2. Return matching study IDs with preview information
3. Report total number of results

### Phase 4: Database Overview
Always included in reports:
1. Sample recent studies from MetaboLights
2. Database statistics and availability
3. Integration information for all databases

## Usage Patterns

### Pattern 1: Metabolite Identification
**Input:**
- Metabolite list: ["glucose", "lactate", "pyruvate"]

**Output report includes:**
- HMDB IDs for each metabolite
- Chemical formulas and molecular weights
- Biological pathways
- PubChem CIDs
- SMILES representations

### Pattern 2: Study Retrieval
**Input:**
- Study ID: "MTBLS1" or "ST000001"

**Output report includes:**
- Study title and description
- Organism information
- Study status and release date
- Data availability

### Pattern 3: Study Search
**Input:**
- Search query: "diabetes"
- Optional organism filter

**Output report includes:**
- Matching study IDs
- Study titles and previews
- Total result count

### Pattern 4: Comprehensive Analysis
**Input:**
- Metabolite list: ["glucose", "pyruvate"]
- Study ID: "MTBLS1"
- Search query: "diabetes"

**Output report includes:**
- All phases combined (identification, study details, search results, overview)
- Cross-referenced information
- Complete metabolomics research summary

## Input Parameters

### metabolite_list (optional)
List of metabolite names to identify and annotate.
- **Format**: List of strings
- **Examples**: `["glucose"]`, `["lactate", "pyruvate", "acetate"]`
- **Note**: Common names accepted; HMDB will find standard identifiers

### study_id (optional)
MetaboLights or Metabolomics Workbench study identifier.
- **Format**: String starting with "MTBLS" or "ST"
- **Examples**: `"MTBLS1"`, `"ST000001"`
- **Note**: Database auto-detected from prefix

### search_query (optional)
Keyword to search metabolomics studies.
- **Format**: String (disease, compound, organism, method)
- **Examples**: `"diabetes"`, `"glucose metabolism"`, `"LC-MS"`

### organism (optional)
Target organism for study filtering.
- **Format**: String (scientific name)
- **Default**: `"Homo sapiens"`
- **Examples**: `"Mus musculus"`, `"Saccharomyces cerevisiae"`

### output_file (optional)
Path for the generated markdown report.
- **Format**: String (filename with .md extension)
- **Default**: Auto-generated timestamp-based filename
- **Examples**: `"my_analysis.md"`, `"metabolomics_report.md"`
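
The five parameters and their defaults can be collected in one structure. The sketch below is illustrative only: the class name `MetabolomicsInputs`, the helper `default_report_name`, and the exact timestamp pattern are assumptions, since the source only says the default filename is timestamp-based.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


def default_report_name(now: Optional[datetime] = None) -> str:
    """Build a timestamp-based .md filename (the pattern is an assumption)."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"metabolomics_report_{stamp}.md"


@dataclass
class MetabolomicsInputs:
    """Hypothetical container mirroring the documented optional parameters."""
    metabolite_list: List[str] = field(default_factory=list)
    study_id: Optional[str] = None
    search_query: Optional[str] = None
    organism: str = "Homo sapiens"  # documented default
    output_file: str = field(default_factory=default_report_name)
```

For example, `MetabolomicsInputs(metabolite_list=["glucose"], study_id="MTBLS1")` leaves `organism` at its documented default and fills in a generated report name.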

## Output Format

All analyses generate a structured markdown report with:

**Header section:**
- Report title and generation timestamp
- Input parameters summary (metabolites, study ID, search query, organism)

**Phase sections:**
- Clear section headers (## 1. Metabolite Identification, ## 2. Study Details, etc.)
- Subsections for each metabolite or result
- Consistent formatting (bold labels, tables for results)

**Database overview:**
- Available databases and statistics
- Recent studies sample
- Integration information

**Error handling:**
- Graceful error messages for unavailable data
- Fallback strategies documented in output
- "N/A" for missing fields (not blank)
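
A small formatter is one way to guarantee that missing fields render as "N/A" rather than blank. The name `format_field` and the choice to treat whitespace-only strings as missing are assumptions made for this sketch.

```python
def format_field(label: str, value) -> str:
    """Render one report line with a bold label, substituting N/A when empty."""
    # Assumption: None and whitespace-only strings both count as missing.
    if value is None or (isinstance(value, str) and not value.strip()):
        value = "N/A"
    return f"- **{label}**: {value}"
```

Numeric zero is kept as a real value, so a field such as a count of 0 is not mistaken for missing data.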

## Implementation Notes

### SOAP Tool Handling
**HMDB tools are SOAP-based** and require special parameter handling:
- `HMDB_search`: Requires `operation="search"` parameter
- `HMDB_get_metabolite`: Requires `operation="get_metabolite"` parameter
- Do not use `endpoint` or `method` parameters (not applicable to SOAP)
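
These rules can be enforced before a call is made. The sketch below only assembles an argument dictionary; it does not call ToolUniverse. The helper name `build_hmdb_arguments` is hypothetical, and any other keyword passed through it (such as `query` in the usage note) is a placeholder, not a confirmed parameter name.

```python
# Operation values taken from the two rules above.
HMDB_OPERATIONS = {
    "HMDB_search": "search",
    "HMDB_get_metabolite": "get_metabolite",
}
# Parameters that do not apply to SOAP tools.
FORBIDDEN_KEYS = {"endpoint", "method"}


def build_hmdb_arguments(tool_name: str, **params) -> dict:
    """Return call arguments for an HMDB tool with `operation` set correctly."""
    if tool_name not in HMDB_OPERATIONS:
        raise KeyError(f"Not a known HMDB tool: {tool_name}")
    bad = FORBIDDEN_KEYS.intersection(params)
    if bad:
        raise ValueError(f"Not applicable to SOAP tools: {sorted(bad)}")
    # `operation` is written last so a caller cannot override it by accident.
    return {**params, "operation": HMDB_OPERATIONS[tool_name]}
```

For instance, `build_hmdb_arguments("HMDB_search", query="glucose")` yields a dictionary containing `operation="search"` alongside the placeholder `query` key.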

### Response Format Variations
Tools return different response formats - handle all three:
1. **Standard format**: `{status: "success", data: [...], metadata: {...}}`
2. **Direct list**: `[...]` (e.g., metabolights_list_studies)
3. **Direct dict**: `{field1: ..., field2: ...}` (e.g., some detail endpoints)

Always check response type with `isinstance()` before accessing fields.
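
One way to handle the three shapes is a single normalizing function. This is a sketch under assumptions: the name `extract_payload` is hypothetical, and treating any `status` other than `"success"` as an error is a guess about the standard format.

```python
from typing import Any


def extract_payload(response: Any) -> Any:
    """Return the useful part of a tool response, whatever its shape."""
    if isinstance(response, list):
        # Direct list, e.g. from metabolights_list_studies.
        return response
    if isinstance(response, dict):
        if "status" in response and "data" in response:
            # Standard envelope: unwrap `data` only on success (assumption).
            if response["status"] != "success":
                raise RuntimeError(f"Tool reported status {response['status']!r}")
            return response["data"]
        # Direct dict of fields, e.g. some detail endpoints.
        return response
    raise TypeError(f"Unexpected response type: {type(response).__name__}")
```

Callers can then work with the returned list or dict without repeating the type checks at every call site.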

### Fallback Strategy
Follow this hierarchy for robustness:
1. **Primary source**: Try main database first (HMDB for metabolites, MetaboLights for studies)
2. **Fallback source**: Use alternative database if primary fails (PubChem for chemical properties)
3. **Default behavior**: Show error message with context, continue with remaining phases
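
The hierarchy above can be expressed as a loop over named sources. This is a generic sketch, not the skill's implementation: `first_successful` is a hypothetical helper, and the callables passed to it stand in for real tool calls.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def first_successful(
    sources: Sequence[Tuple[str, Callable[[], Any]]],
) -> Tuple[Optional[str], Any, List[str]]:
    """Try each (name, fetch) pair in order; return the first non-empty result.

    Returns (source_name, result, errors). If every source fails, the name and
    result are None and `errors` carries context for the report.
    """
    errors: List[str] = []
    for name, fetch in sources:
        try:
            result = fetch()
        except Exception as exc:  # keep going so later phases can still run
            errors.append(f"{name}: {exc}")
            continue
        if result:
            return name, result, errors
        errors.append(f"{name}: empty result")
    return None, None, errors
```

Passing HMDB first and PubChem second reproduces the primary-then-fallback order, and the collected `errors` list can be written into the report so the fallback is documented in the output.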

### Progressive Report Writing
Write report incrementally to avoid memory issues:
1. Create output file early in pipeline
2. Append sections as each phase completes
3. Flush to disk regularly for long analyses
4. Return file path for user access
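
A minimal sketch of these four steps, using only the standard library. The function names `start_report` and `append_section` are hypothetical, and the heading levels are an assumption based on the section headers described under Output Format.

```python
from pathlib import Path


def start_report(path: str, title: str) -> Path:
    """Create the report file early and write its title."""
    report = Path(path)
    report.write_text(f"# {title}\n\n", encoding="utf-8")
    return report


def append_section(report: Path, heading: str, body: str) -> None:
    """Append one finished phase to the report and flush it to disk."""
    with report.open("a", encoding="utf-8") as handle:
        handle.write(f"## {heading}\n\n{body.rstrip()}\n\n")
        handle.flush()
```

Because each phase is appended as soon as it completes, a failure in a later phase still leaves the earlier sections on disk, and the returned `Path` can be handed back to the user at the end.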

## Tool Discovery

The skill automatically discovers and uses these tools from ToolUniverse:

**HMDB Tools:**
- `HMDB_search`: Search metabolites by name
- `HMDB_get_metabolite`: Get detailed metabolite information

**MetaboLights Tools:**
- `metabolights_list_studies`: List available studies
- `metabolights_search_studies`: Search studies by keyword
- `metabolights_get_study`: Get study details by ID

**Metabolomics Workbench Tools:**
- `MetabolomicsWorkbench_get_study`: Get study information
- `MetabolomicsWorkbench_search_compound_by_name`: Search compounds

**PubChem Tools:**
- `PubChem_get_CID_by_compound_name`: Get PubChem CID
- `PubChem_get_compound_properties_by_CID`: Get chemical properties

No manual tool configuration required - all tools loaded automatically.

## Common Issues

### Issue: HMDB returns "Error querying HMDB: 0"
**Cause**: HMDB search returned empty results, or an index error occurred when accessing the first result
**Solution**: This is expected for uncommon metabolites; PubChem fallback will be attempted

### Issue: Study details show "N/A" for all fields
**Cause**: Study ID not found or API unavailable
**Solution**: Verify study ID format (MTBLS* or ST*), check if study is public

### Issue: Tool not found errors
**Cause**: Missing API keys for some databases
**Solution**: Check `.env.template`, add required API keys to `.env` file (most metabolomics tools work without keys)

### Issue: Large metabolite lists cause slow execution
**Cause**: Pipeline queries each metabolite individually
**Solution**: Reports are limited to the first 10 metabolites; consider batching lists of more than 20 metabolites
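
Batching a long list is straightforward to sketch. The helper names and the default batch size of 10 below are assumptions chosen to match the report limit; the skill itself may batch differently.

```python
from typing import Iterator, List, Sequence

REPORT_LIMIT = 10  # reports cover the first 10 metabolites


def limit_for_report(metabolites: Sequence[str], limit: int = REPORT_LIMIT) -> List[str]:
    """Return the metabolites that would appear in a single report."""
    return list(metabolites[:limit])


def batched(metabolites: Sequence[str], batch_size: int = REPORT_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive batches so each run stays within the report limit."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(metabolites), batch_size):
        yield list(metabolites[start:start + batch_size])
```

Running the pipeline once per batch produces one report per batch, so no metabolite is dropped by the limit.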

## Summary

The Metabolomics Research skill provides comprehensive metabolomics analysis through a 4-phase pipeline that:

1. **Identifies metabolites** using HMDB (primary) and PubChem (fallback) databases
2. **Retrieves study details** from MetaboLights and Metabolomics Workbench repositories
3. **Searches studies** by keywords across metabolomics databases
4. **Generates structured reports** with all findings in readable markdown format

**Key Features:**
- ✅ 100% test coverage with working pipeline
- ✅ Handles SOAP tools correctly (HMDB requires `operation` parameter)
- ✅ Implements fallback strategies (HMDB → PubChem)
- ✅ Graceful error handling (continues if one phase fails)
- ✅ Progressive report writing (memory-efficient)
- ✅ Implementation-agnostic documentation (works with Python SDK and MCP)

**Best for:**
- Metabolite annotation and pathway analysis
- Study discovery and data retrieval
- Comprehensive metabolomics research reports
- Multi-database metabolomics queries

## Reasoning Framework

### Starting Point: Mass Spectrum Analysis

Metabolite identification starts with the mass spectrum. LOOK UP DON'T GUESS — always search HMDB/PubChem with the calculated neutral mass rather than guessing identity from m/z alone.

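- **Step 1 — Calculate neutral mass**: Determine the ionization mode, then remove the adduct's contribution from the observed m/z (singly charged ions assumed). Positive mode: subtract the adduct mass ([M+H]+ = -1.0073, [M+Na]+ = -22.9892, [M+NH4]+ = -18.0338). Negative mode: add back the lost proton for [M-H]- (+1.0073), but subtract the adduct mass for anion adducts ([M+Cl]- = -34.9694, [M+HCOO]- = -44.9982).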
- **Step 2 — Search databases**: Query HMDB by mass (±5 ppm for Orbitrap/Q-TOF, ±0.5 Da for unit-resolution). Multiple adduct hypotheses yield different neutral masses — check all plausible adducts before concluding.
- **Step 3 — Resolve ambiguity**: Exact mass alone often matches 5-20 candidates. Use isotope pattern (M+1/M+2 ratios indicate element composition — e.g., high M+2 suggests S or Cl), retention time, and MS/MS fragmentation to narrow down. A single mass match is L3 confidence; MS/MS match to reference spectrum is required for L2/L1.
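
Steps 1 and 2 reduce to simple arithmetic, sketched below for singly charged ions. Each shift is the mass the adduct contributes to the observed ion, so the neutral mass is m/z minus the shift; [M-H]- has a negative shift because a proton is lost. The function names are hypothetical, and the [M+NH4]+ and [M+HCOO]- values are electron-corrected ion masses, which is an assumption about the intended precision.

```python
# Mass added to the neutral molecule by each adduct (Da, singly charged).
ADDUCT_SHIFTS = {
    "positive": {"[M+H]+": 1.0073, "[M+Na]+": 22.9892, "[M+NH4]+": 18.0338},
    "negative": {"[M-H]-": -1.0073, "[M+Cl]-": 34.9694, "[M+HCOO]-": 44.9982},
}


def neutral_mass_candidates(mz: float, mode: str) -> dict:
    """Return one neutral-mass hypothesis per plausible adduct."""
    return {
        adduct: round(mz - shift, 4)
        for adduct, shift in ADDUCT_SHIFTS[mode].items()
    }


def ppm_window(mass: float, ppm: float = 5.0) -> tuple:
    """Return the (low, high) search window for a given ppm tolerance."""
    delta = mass * ppm / 1e6
    return (mass - delta, mass + delta)
```

An observed m/z of 181.0707 in positive mode gives 180.0634 under the [M+H]+ hypothesis, which is the monoisotopic mass of glucose (C6H12O6); the other adduct hypotheses give different neutral masses, and each should be searched before settling on an identity.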

### Evidence Grading (Metabolite Identification Confidence)

- **L1 - Confirmed**: HMDB ID + retention time + MS/MS match to reference standard
- **L2 - Probable**: HMDB match by exact mass + MS/MS similarity (cosine > 0.7), no standard
- **L3 - Tentative**: Matched by exact mass and molecular formula only; structural isomers unresolved
- **L4 - Unknown**: Detected m/z with no database match; PubChem fallback may provide candidates
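
The grading rules can be written as a short decision function. This is a sketch: the function and argument names are hypothetical, and `reference_standard_match` is assumed to mean that both retention time and MS/MS agree with a reference standard.

```python
from typing import Optional


def identification_level(
    database_match: bool,
    msms_cosine: Optional[float] = None,
    reference_standard_match: bool = False,
) -> str:
    """Assign L1-L4 from the evidence available for one feature."""
    if not database_match:
        return "L4"  # detected m/z with no database match
    if reference_standard_match:
        return "L1"  # retention time + MS/MS match to a reference standard
    if msms_cosine is not None and msms_cosine > 0.7:
        return "L2"  # exact mass + MS/MS similarity, no standard
    return "L3"      # exact mass / formula only
```

Recording the level next to each metabolite in the report makes it clear which annotations rest on mass alone.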

### Interpretation Guidance

**Metabolite identification**: HMDB IDs provide the strongest annotation when paired with experimental validation. A PubChem-only match (fallback) indicates the metabolite is chemically characterized but may lack biological context (pathways, disease associations). Always report the identification confidence level.

**Pathway enrichment strategy**: When multiple metabolites map to the same KEGG or HMDB pathway, enrichment is meaningful only if the input list is unbiased (not pre-selected for that pathway). Report hits vs. pathway size (3/5 detected is more informative than 3/500). LOOK UP DON'T GUESS — use `HMDB_get_metabolite` to get pathway annotations for each metabolite rather than assuming pathway membership from names alone.
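
Reporting hits against pathway size is a one-line calculation. The helper below is a hypothetical sketch that computes coverage only; it is not a statistical enrichment test, and the pathway membership sets must come from looked-up annotations rather than assumptions.

```python
def pathway_coverage(detected: set, pathway_members: set) -> tuple:
    """Return (hits, pathway_size, fraction) for one pathway."""
    if not pathway_members:
        raise ValueError("pathway_members must not be empty")
    hits = detected & pathway_members
    return len(hits), len(pathway_members), len(hits) / len(pathway_members)
```

Three hits in a five-member pathway give a fraction of 0.6, while three hits in a 500-member pathway give 0.006, which is why the pathway size belongs in the report alongside the hit count.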

**Biomarker discovery reasoning**: A candidate biomarker should show: (1) consistent direction of change across samples (fold-change > 1.5), (2) statistical significance (FDR-adjusted p < 0.05), (3) biological plausibility — LOOK UP the metabolite's known disease associations via HMDB, and (4) reproducibility in an independent cohort. Single-study HMDB associations are hypothesis-generating, not confirmatory. Check MetaboLights/Metabolomics Workbench for independent validation datasets.
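
Criteria (1) and (2) are numeric and can be checked directly; plausibility and replication still require database lookups and an independent cohort. In this sketch the function name is hypothetical, and treating a decrease of the same magnitude (fold-change below 1/1.5) as meeting the threshold is an assumption.

```python
import math


def meets_statistical_criteria(
    fold_change: float,
    fdr_p: float,
    min_fold: float = 1.5,
    max_fdr: float = 0.05,
) -> bool:
    """Check fold-change magnitude and FDR-adjusted significance."""
    if fold_change <= 0:
        raise ValueError("fold_change must be a positive ratio")
    # Compare on the log2 scale so increases and decreases are symmetric.
    magnitude = abs(math.log2(fold_change))
    return magnitude > math.log2(min_fold) and fdr_p < max_fdr
```

A metabolite that passes this check is still only a candidate until the remaining two criteria are addressed.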

### Synthesis Questions

A complete metabolomics report should answer:
1. What is the identification confidence level for each metabolite (L1-L4)?
2. Which biological pathways are enriched among the identified metabolites?
3. Do any metabolites meet biomarker criteria (fold-change, significance, plausibility)?
4. Are there relevant metabolomics studies (MTBLS/ST) for the disease or condition of interest?
5. What cross-database evidence supports the biological relevance of key findings (HMDB pathways, PubChem bioactivity)?

**Limitations:**
- HMDB may not have all metabolites (fallback to PubChem)
- Some studies require authentication or are not public
- Large metabolite lists (>10) auto-limited in reports
- API rate limits may affect large-scale queries

See `QUICK_START.md` for Python SDK examples, MCP integration, and step-by-step tutorials.

Related Skills

tooluniverse

1202

from mims-harvard/ToolUniverse

Router skill for ToolUniverse tasks. First checks if specialized tooluniverse skills (105+ skills covering disease/drug/target research, gene-disease associations, clinical decision support, genomics, epigenomics, proteomics, comparative genomics, chemical safety, toxicology, systems biology, and more) can solve the problem, then falls back to general strategies for using 2300+ scientific tools. Covers tool discovery, multi-hop queries, comprehensive research workflows, disambiguation, evidence grading, and report generation. Use when users need to research any scientific topic, find biological data, or explore drug/target/disease relationships. ALSO USE for any biology, medicine, chemistry, pharmacology, or life science question — even simple factoid questions like "how many X in protein Y", "what drug interacts with Z", "what gene causes disease W", or "translate this sequence". These questions benefit from database lookups (UniProt, PubMed, ChEMBL, ClinVar, GWAS Catalog, etc.) rather than answering from memory alone. When in doubt about a scientific fact, USE THIS SKILL to verify against real databases.

tooluniverse-variant-to-mechanism

1202

from mims-harvard/ToolUniverse

End-to-end variant-to-mechanism analysis: given a genetic variant (rsID or coordinates), trace its functional impact from regulatory context (GWAS, eQTL, RegulomeDB, ENCODE) through target gene identification (GTEx, OpenTargets L2G) to downstream pathway and disease biology (STRING, Reactome, GO enrichment, disease associations). Produces an evidence-graded mechanistic narrative linking genotype to phenotype. Use when asked "how does this variant cause disease?", "what is the mechanism of rs7903146?", "trace variant to pathway", or "connect this GWAS hit to biology".

tooluniverse-variant-interpretation

1202

from mims-harvard/ToolUniverse

Systematic clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Aggregates evidence from ClinVar, gnomAD, CIViC, UniProt, and PDB across ACMG criteria. Produces pathogenicity scores (0-100), clinical recommendations, and treatment implications. Use when interpreting genetic variants, classifying variants of uncertain significance (VUS), performing ACMG variant classification, or translating variant calls to clinical actionability.

tooluniverse-variant-functional-annotation

1202

from mims-harvard/ToolUniverse

Comprehensive functional annotation of protein variants — pathogenicity, population frequency, structural context, and clinical significance. Integrates ProtVar (map_variant, get_function, get_population) for protein-level mapping and structural context, ClinVar for clinical classifications, gnomAD for population frequency with ancestry data, CADD for deleteriousness scores, and ClinGen for gene-disease validity. Produces a structured variant annotation report with evidence grading. Use when asked about protein variant impact, missense variant pathogenicity, ProtVar annotation, variant functional context, or combining population and structural evidence for a variant.

tooluniverse-variant-analysis

1202

from mims-harvard/ToolUniverse

Production-ready VCF processing, variant annotation, mutation analysis, and structural variant (SV/CNV) interpretation for bioinformatics questions. Parses VCF files (streaming, large files), classifies mutation types (missense, nonsense, synonymous, frameshift, splice, intronic, intergenic) and structural variants (deletions, duplications, inversions, translocations), applies VAF/depth/quality/consequence filters, annotates with ClinVar/dbSNP/gnomAD/CADD via ToolUniverse, interprets SV/CNV clinical significance using ClinGen dosage sensitivity scores, computes variant statistics, and generates reports. Solves questions like "What fraction of variants with VAF < 0.3 are missense?", "How many non-reference variants remain after filtering intronic/intergenic?", "What is the pathogenicity of this deletion affecting BRCA1?", or "Which dosage-sensitive genes overlap this CNV?". Use when processing VCF files, annotating variants, filtering by VAF/depth/consequence, classifying mutations, interpreting structural variants, assessing CNV pathogenicity, comparing cohorts, or answering variant analysis questions.

tooluniverse-vaccine-design

1202

from mims-harvard/ToolUniverse

Design and evaluate vaccine candidates using computational immunology tools. Covers epitope prediction (MHC-I/II binding via IEDB), population coverage analysis, antigen selection, adjuvant matching, and immunogenicity assessment. Integrates IEDB for epitope prediction, UniProt for antigen sequences, PDB/AlphaFold for structural epitopes, BVBRC for pathogen proteomes, and literature for clinical precedent. Use when asked about vaccine design, epitope prediction, immunogenicity, MHC binding, T-cell epitopes, B-cell epitopes, or population coverage for vaccine candidates.

tooluniverse-toxicology

1202

from mims-harvard/ToolUniverse

Assess chemical and drug toxicity via adverse outcome pathways, real-world adverse event signals, and toxicogenomic evidence. Integrates AOPWiki (AOPWiki_list_aops, AOPWiki_get_aop) for mechanism-level pathway tracing, FAERS for post-market adverse event quantification, OpenFDA for label mining, and CTD for chemical-gene-disease evidence. Produces structured toxicity reports with evidence grading (T1-T4). Use when asked about toxicity mechanisms, adverse outcome pathways, AOP mapping, FAERS signal detection, or chemical-disease relationships for drugs or environmental chemicals.

tooluniverse-target-research

1202

from mims-harvard/ToolUniverse

Gather comprehensive biological target intelligence from 9 parallel research paths covering protein info, structure, interactions, pathways, expression, variants, drug interactions, and literature. Features collision-aware searches, evidence grading (T1-T4), explicit Open Targets coverage, and mandatory completeness auditing. Use when users ask about drug targets, proteins, genes, or need target validation, druggability assessment, or comprehensive target profiling.

tooluniverse-systems-biology

1202

from mims-harvard/ToolUniverse

Comprehensive systems biology and pathway analysis using multiple pathway databases (Reactome, KEGG, WikiPathways, Pathway Commons, BioModels). Performs pathway enrichment, protein-pathway mapping, keyword searches, and systems-level analysis. Use when analyzing gene sets, exploring biological pathways, or investigating systems-level biology.

tooluniverse-structural-variant-analysis

1202

from mims-harvard/ToolUniverse

Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.

tooluniverse-structural-proteomics

1202

from mims-harvard/ToolUniverse

Integrate structural biology data with proteomics for drug target validation. Retrieves protein structures from PDB (RCSB, PDBe), AlphaFold predictions, antibody structures (SAbDab), GPCR data (GPCRdb), binding pocket analysis (ProteinsPlus), and ligand interactions (BindingDB). Use when asked to find structures for a drug target, identify binding site ligands, cross-validate drug binding with structural data, assess structural druggability, or compare experimental vs predicted structures.

tooluniverse-stem-cell-organoid

1202

from mims-harvard/ToolUniverse

Research stem cells, iPSCs, organoids, and cell differentiation using ToolUniverse tools. Covers pluripotency marker identification, differentiation pathway analysis, organoid model characterization, cell type annotation, and disease modeling. Integrates CellxGene/HCA for single-cell atlas data, CellMarker for cell type markers, GEO for stem cell datasets, and pathway tools for differentiation signaling. Use when asked about stem cells, iPSCs, organoids, cell reprogramming, pluripotency, differentiation protocols, or 3D culture models.