tooluniverse-cancer-classification

Translate free-text tumor descriptions to OncoTree codes, look up cancer subtypes and tissue hierarchies, resolve UMLS/NCI cross-references, and obtain OncoKB-compatible tumor type codes for variant annotation. Use when asked to find the OncoTree code for a tumor type, enumerate subtypes of a cancer, list cancers by tissue of origin, or standardize tumor nomenclature for downstream precision oncology analysis.

1,202 stars

by mims-harvard

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/tooluniverse-cancer-classification/SKILL.md --create-dirs "https://raw.githubusercontent.com/mims-harvard/ToolUniverse/main/skills/tooluniverse-cancer-classification/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/tooluniverse-cancer-classification/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

SKILL.md Source

# Cancer Classification via OncoTree

Standardize cancer type nomenclature using the OncoTree ontology. Resolves free-text tumor
descriptions to structured codes with UMLS/NCI cross-references, enabling downstream use in
OncoKB variant annotation and GDC cohort selection.

## When to Use

Apply when a researcher asks questions such as:
- "What is the OncoTree code for [tumor description]?"
- "Find all subtypes of [cancer type]"
- "What cancers originate in [tissue]?"
- "I need the tumor type code for OncoKB annotation"
- "What is the TCGA/COSMIC code for [cancer]?"
- "List all CNS/Brain cancer subtypes"
- "What NCI code corresponds to glioblastoma?"

## Key Tools

| Tool | Purpose | Key Params |
|------|---------|-----------|
| `OncoTree_search` | Free-text search for cancer types | `query` (tumor name or description) |
| `OncoTree_get_type` | Full details for a known OncoTree code | `code` (e.g., "LUAD", "AML") |
| `OncoTree_list_tissues` | List all 32 tissue categories | (no params) |
| `OncoKB_annotate_variant` | Variant annotation using OncoTree code | `gene`, `variant`, `tumor_type` |
| `GDC_get_mutation_frequency` | Pan-cancer mutation frequency (TCGA) | `gene_symbol` |

## Workflow

### Phase 1: Cancer Type Discovery

Start with free-text search to find matching OncoTree codes:

```
OncoTree_search(query="breast cancer")
-> Returns list: code, name, main_type, tissue, parent, level, external_references
```

Key response fields:
- `code`: OncoTree code (e.g., "BRCA", "IBC") — use this in OncoKB calls
- `level`: hierarchy depth (1=tissue, 2=main type, 3-5=subtypes)
- `parent`: parent node code for navigating the hierarchy
- `external_references.UMLS`: UMLS CUI list
- `external_references.NCI`: NCI thesaurus code list

Search tips:
- Broad terms ("lung cancer") return many results; narrow by tissue or level
- Use tissue-specific terms ("invasive breast carcinoma") for precise matching
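- Acronyms work as search queries: query="GBM" finds glioblastoma, query="AML" finds leukemia types. A matching acronym is not necessarily a valid OncoTree code, so take the `code` field from the result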
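
The narrowing step in the search tips can be sketched as a small filter over the returned records. This is a minimal sketch, assuming each record is a dict carrying the `code`, `tissue`, and `level` fields listed above; the helper name `narrow_results` and the sample records are hypothetical, not real tool output.

```python
from __future__ import annotations

from typing import Any, Optional


def narrow_results(
    results: list[dict[str, Any]],
    tissue: Optional[str] = None,
    min_level: Optional[int] = None,
) -> list[dict[str, Any]]:
    """Keep records that match `tissue` and sit at `min_level` or deeper."""
    kept = []
    for record in results:
        if tissue is not None and record.get("tissue") != tissue:
            continue
        if min_level is not None and record.get("level", 0) < min_level:
            continue
        kept.append(record)
    return kept


# Illustrative records only; level values are assumptions, not API output.
sample = [
    {"code": "BREAST", "tissue": "Breast", "level": 1},
    {"code": "BRCA", "tissue": "Breast", "level": 2},
    {"code": "IBC", "tissue": "Breast", "level": 3},
    {"code": "LUAD", "tissue": "Lung", "level": 3},
]

breast_types = narrow_results(sample, tissue="Breast", min_level=2)
print([r["code"] for r in breast_types])  # ['BRCA', 'IBC']
```

Filtering on `level` drops the tissue-level node, which is rarely what a variant-annotation call needs.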

### Phase 2: Code Validation and Detail Retrieval

Once you have a candidate code, retrieve full details:

```
OncoTree_get_type(code="LUAD")
-> Returns: name, main_type, tissue, color, parent, level, history, external_references
```

Note: Not every familiar acronym is a valid code. "GBM" returns 404; the correct code is "GB" (Glioblastoma, IDH-Wildtype).
Always validate a code via `OncoTree_get_type` before using it in downstream tools.

### Phase 3: Tissue-Level Exploration

When the user wants all cancers in a tissue category:

```
OncoTree_list_tissues()
-> Returns 32 tissue names: "Breast", "CNS/Brain", "Lung", "Myeloid", ...

OncoTree_search(query="CNS/Brain")
-> All cancer types with tissue="CNS/Brain"
```
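
To enumerate everything beneath a tissue or main type, the `parent` field can be walked as a tree. The following is a minimal sketch, assuming the records have already been collected from search results and each carries `code` and `parent`; the function name `descendants` and all codes in the sample are hypothetical placeholders.

```python
from collections import defaultdict, deque


def descendants(records, root_code):
    """Return every code below root_code, breadth-first, via each record's `parent`."""
    children = defaultdict(list)
    for record in records:
        children[record.get("parent")].append(record["code"])

    found, seen = [], {root_code}
    queue = deque([root_code])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # guard against malformed, cyclic parent links
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


# Placeholder hierarchy; these codes are invented for illustration.
sample_tree = [
    {"code": "TISSUE_X", "parent": None},
    {"code": "TYPE_A", "parent": "TISSUE_X"},
    {"code": "TYPE_B", "parent": "TISSUE_X"},
    {"code": "SUBTYPE_A1", "parent": "TYPE_A"},
    {"code": "SUBTYPE_A2", "parent": "TYPE_A"},
]

print(descendants(sample_tree, "TISSUE_X"))
# ['TYPE_A', 'TYPE_B', 'SUBTYPE_A1', 'SUBTYPE_A2']
```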

### Phase 4: Downstream Use in Variant Annotation

Pass the validated OncoTree code to OncoKB to get cancer-type-specific therapeutic levels:

```
OncoKB_annotate_variant(gene="EGFR", variant="L858R", tumor_type="LUAD")
-> highestSensitiveLevel: "1" (FDA-approved therapy for this tumor+variant)
```

Without `tumor_type`, OncoKB returns pan-cancer levels, which may be less specific.

## Tool Parameter Reference

| Tool | Required | Optional | Notes |
|------|---------|---------|-------|
| `OncoTree_search` | `query` | — | Free text; returns list sorted by relevance |
| `OncoTree_get_type` | `code` | — | Case-sensitive; "BRCA" not "brca". Returns 404 for invalid codes |
| `OncoTree_list_tissues` | — | — | No params; returns list of 32 tissue strings |
| `OncoKB_annotate_variant` | `gene`, `variant` | `tumor_type` | `tumor_type` is OncoTree code; omit for pan-cancer |
| `GDC_get_mutation_frequency` | `gene_symbol` | — | Pan-cancer TCGA only; no per-subtype breakdown |

## Common OncoTree Codes (verified working)

| Code | Name | Tissue |
|------|------|--------|
| `BRCA` | Invasive Breast Carcinoma | Breast |
| `LUAD` | Lung Adenocarcinoma | Lung |
| `LUSC` | Lung Squamous Cell Carcinoma | Lung |
| `MEL` | Melanoma | Skin |
| `CRC` | Colorectal Cancer | Bowel |
| `PAAD` | Pancreatic Adenocarcinoma | Pancreas |
| `GBM` | (invalid — use `GB`) | CNS/Brain |
| `GB` | Glioblastoma, IDH-Wildtype | CNS/Brain |
| `AML` | Acute Myeloid Leukemia | Myeloid |
| `PRAD` | Prostate Adenocarcinoma | Prostate |

## Common Patterns

```python
# Pattern: Resolve free-text to OncoTree code
results = OncoTree_search(query="pancreatic ductal adenocarcinoma")
# Take the top relevance-ranked hit; a higher level number means a more specific node
code = results["data"][0]["code"]  # e.g., "PAAD"

# Pattern: Get all subtypes within a main type
results = OncoTree_search(query="Glioma")
subtypes = [r for r in results["data"] if r["main_type"] == "Glioma"]

# Pattern: Validate code before OncoKB call
detail = OncoTree_get_type(code="GB")
if detail["status"] == "success":
    OncoKB_annotate_variant(gene="IDH1", variant="R132H", tumor_type="GB")
```

## Tumor Classification Reasoning (CRITICAL)

**LOOK UP, DON'T GUESS** -- tumor classification determines treatment. Always verify codes and biomarker interpretation with the tools rather than relying on memory.

### Histological vs Molecular Classification

Tumors are classified on TWO axes -- both matter for treatment selection:
- **Histological** (what it looks like under the microscope): adenocarcinoma, squamous, small cell, etc. This determines where the tumor sits in the OncoTree hierarchy at level 3 and deeper.
- **Molecular** (what mutations/alterations drive it): EGFR-mutant, HER2-amplified, MSI-high, etc. This determines OncoKB therapeutic levels.

A tumor can be histologically identical to another yet molecularly different, and so require different treatment. Example: two lung adenocarcinomas (both LUAD), one EGFR-mutant (targeted therapy) and the other KRAS-mutant (a different targeted therapy). **Always check both axes.**

### Biomarker Interpretation Strategy

When interpreting cancer biomarkers, use OncoKB for actionability:
- **HER2**: Positive = IHC 3+ or FISH-amplified. Use `OncoKB_annotate_variant(gene="ERBB2", variant="Amplification", tumor_type="BRCA")` for therapeutic level
- **ER/PR**: Positive = hormone-receptor positive breast cancer. Changes treatment class (endocrine therapy)
- **Ki67**: Proliferation index. High values (a cutoff around 20% is common, though thresholds vary) suggest aggressive biology; used to help separate Luminal A from Luminal B breast cancer
- **TMB (Tumor Mutational Burden)**: High TMB (≥10 mut/Mb) is associated with immunotherapy response across tumor types. Use `OncoKB_annotate_variant(gene="Other Biomarkers", variant="TMB-H")`
- **MSI (Microsatellite Instability)**: MSI-High is an FDA-approved pan-cancer biomarker for pembrolizumab. Use `OncoKB_annotate_variant(gene="Other Biomarkers", variant="MSI-H")`

### Staging vs Grading -- Different Concepts

- **Stage** (TNM): How far has it spread? T=tumor size, N=lymph nodes, M=metastasis. Stage I-IV. Determines prognosis and surgery eligibility.
- **Grade**: How abnormal do the cells look? Grade 1 (well-differentiated, slow) to Grade 3 (poorly-differentiated, aggressive). Determines aggressiveness.
- A Stage I, Grade 3 tumor (small but aggressive) has different implications than Stage III, Grade 1 (spread but slow-growing).

### Actionability Assessment

After classifying the tumor, assess whether findings are clinically actionable:
1. **Level 1** (FDA-approved, specific tumor type): Immediate treatment implication. Example: EGFR L858R in LUAD
2. **Level 2** (Standard care): Strong evidence but context-dependent
3. **Level 3** (Compelling evidence): Clinical trial candidates
4. **Level 4** (Biological evidence): Research-stage only
5. Always provide the OncoTree code to OncoKB -- without it, you get pan-cancer levels which may understate or overstate actionability for the specific tumor type
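
The level list above can be turned into a lookup for reporting. This is a minimal sketch, assuming the level arrives as a string such as "1" or "3B" (as in the `highestSensitiveLevel` example earlier); tolerating a "LEVEL_" prefix is an extra assumption, and `describe_level` is a hypothetical helper name.

```python
# Descriptions paraphrase the actionability list in this document.
LEVEL_MEANING = {
    "1": "FDA-approved for this tumor type; immediate treatment implication",
    "2": "Standard care; strong evidence but context-dependent",
    "3": "Compelling evidence; clinical trial candidate",
    "4": "Biological evidence; research-stage only",
}


def describe_level(level):
    """Map a therapeutic level string to its actionability description."""
    if not level:
        return "No therapeutic level reported"
    text = str(level).upper()
    if text.startswith("LEVEL_"):  # assumed alternate spelling, e.g. "LEVEL_3B"
        text = text[len("LEVEL_"):]
    # Sub-levels such as "3A"/"3B" share the description of their leading digit.
    return LEVEL_MEANING.get(text[:1], f"Unrecognized level: {level}")


print(describe_level("1"))
print(describe_level("3B"))
print(describe_level(None))
```

Anything outside levels 1 to 4 is reported as unrecognized rather than silently mapped, so unexpected values stay visible in the output.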

## Reasoning Framework for Result Interpretation

### Evidence Grading

| Grade | Criteria | Example |
|-------|----------|---------|
| **Confirmed** | Exact OncoTree code validated via `OncoTree_get_type`, UMLS + NCI cross-refs present | LUAD: validated, UMLS C0152013, NCI C3512 |
| **Probable** | OncoTree search returns match, but code not yet validated or missing cross-refs | Search for "cholangiocarcinoma" returns CHOL with partial external refs |
| **Ambiguous** | Multiple OncoTree codes match the description at different hierarchy levels | "Breast cancer" matches BRCA (invasive), BREAST (tissue), IBC (inflammatory) |
| **Unresolved** | No OncoTree match; tumor type too rare or novel for the ontology | Ultra-rare sarcoma subtype not in OncoTree |
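
The grading table can be expressed as a small decision function. The sketch below assumes search records are dicts with `code`, `level`, and `external_references`, and that `validated_code` is set only after `OncoTree_get_type` succeeded. The order in which the rules are checked is an assumption, since the table does not define precedence, and `grade_evidence` is a hypothetical name.

```python
def grade_evidence(matches, validated_code=None):
    """Return Confirmed, Probable, Ambiguous, or Unresolved for a code lookup."""
    if not matches:
        return "Unresolved"

    if validated_code is not None:
        record = next((m for m in matches if m.get("code") == validated_code), None)
        refs = (record or {}).get("external_references") or {}
        if record and refs.get("UMLS") and refs.get("NCI"):
            return "Confirmed"
        return "Probable"  # validated, but cross-references are incomplete

    levels = {m.get("level") for m in matches}
    if len(matches) > 1 and len(levels) > 1:
        return "Ambiguous"  # several codes at different hierarchy depths
    return "Probable"


# Cross-reference IDs for LUAD are the ones quoted in the table above;
# the level value is illustrative.
luad = {
    "code": "LUAD",
    "level": 3,
    "external_references": {"UMLS": ["C0152013"], "NCI": ["C3512"]},
}

print(grade_evidence([luad], validated_code="LUAD"))  # Confirmed
print(grade_evidence([]))                             # Unresolved
```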

### Interpretation Guidance

- **OncoTree code confidence**: Always validate candidate codes with `OncoTree_get_type` before downstream use. Some common acronyms (e.g., "GBM") are NOT valid OncoTree codes (correct code is "GB"). A validated code with UMLS and NCI cross-references is highest confidence.
- **UMLS/NCI cross-reference priority**: For standardized reporting, NCI Thesaurus codes are preferred for cancer-specific contexts (used by caDSR, GDC). UMLS CUIs are broader (cross-disease) and useful for literature mining. When both are available, report both; when only one can be reported, prefer NCI for oncology workflows.
- **Tissue hierarchy interpretation**: OncoTree levels represent specificity: Level 1 = tissue of origin (e.g., "Lung"), Level 2 = main cancer type (e.g., "Non-Small Cell Lung Cancer"), Level 3+ = histological subtypes (e.g., "Lung Adenocarcinoma"). For OncoKB variant annotation, use the most specific (deepest) level that accurately describes the tumor. For cohort-level analysis (e.g., TCGA), the Level 2-3 code is typically appropriate.
- **OncoKB tumor type impact**: Providing a tumor type code to OncoKB can change the therapeutic level (e.g., EGFR L858R is Level 1 in LUAD but Level 3B pan-cancer). Always use the validated OncoTree code for the patient's specific tumor type.
- **Deprecated or renamed codes**: OncoTree evolves across versions. The `history` field in `OncoTree_get_type` response shows prior names. Always use the current code.
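
Two of the rules above, choosing the deepest matching node and preferring NCI over UMLS, can be sketched as follows. This assumes `external_references` maps "NCI" and "UMLS" to lists of identifiers, as the response fields suggest; both helper names are hypothetical and the level values are illustrative.

```python
def most_specific(records):
    """Return the record with the greatest hierarchy depth, or None if empty."""
    if not records:
        return None
    return max(records, key=lambda r: r.get("level", 0))


def primary_cross_ref(external_references):
    """Return (system, identifier), preferring NCI over UMLS; None if neither exists."""
    for system in ("NCI", "UMLS"):
        ids = external_references.get(system) or []
        if ids:
            return system, ids[0]
    return None


# Names follow the Lung example above; LUAD identifiers are the ones
# quoted in the evidence grading table.
lung_path = [
    {"name": "Lung", "level": 1},
    {"name": "Non-Small Cell Lung Cancer", "level": 2},
    {
        "name": "Lung Adenocarcinoma",
        "code": "LUAD",
        "level": 3,
        "external_references": {"UMLS": ["C0152013"], "NCI": ["C3512"]},
    },
]

best = most_specific(lung_path)
print(best["code"])                                    # LUAD
print(primary_cross_ref(best["external_references"]))  # ('NCI', 'C3512')
```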

### Synthesis Questions

1. Does the chosen OncoTree code represent the most specific histological subtype, or could a more precise code provide better therapeutic annotation in OncoKB?
2. When the free-text tumor description maps to multiple OncoTree codes, which hierarchy level best balances specificity and coverage for the analysis goal (variant annotation vs cohort selection)?
3. Are the UMLS/NCI cross-references consistent with external classifications (WHO, ICD-O), or are there discrepancies that need resolution?

---

## Fallback Chains

| Primary | Fallback | When |
|---------|---------|------|
| `OncoTree_get_type(code="GBM")` | `OncoTree_search(query="glioblastoma")` | 404 for common aliases |
| `OncoTree_search` (no results) | `OncoTree_list_tissues` + tissue-level search | Very rare/novel tumor types |
| OncoTree code for OncoKB | Omit `tumor_type` param | Code not recognized by OncoKB |
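
The first fallback row can be sketched as a resolver that tries the candidate code, then falls back to a free-text search. This is a minimal sketch under stated assumptions: the tool calls are passed in as plain callables, a failed lookup is assumed to return a dict whose `status` is not "success", and search results are assumed to sit under `data`. The stubs stand in for the real tools and are purely illustrative.

```python
def resolve_code(candidate, description, get_type, search):
    """Return a validated code: the candidate if valid, else the first valid search hit."""
    if get_type(candidate).get("status") == "success":
        return candidate
    for record in search(description).get("data", []):
        code = record.get("code")
        if code and get_type(code).get("status") == "success":
            return code
    return None  # nothing validated; caller may omit tumor_type downstream


# Stub tools for illustration only; real responses may be shaped differently.
VALID_CODES = {"GB", "LUAD"}


def fake_get_type(code):
    return {"status": "success"} if code in VALID_CODES else {"status": "error"}


def fake_search(query):
    hits = {"glioblastoma": [{"code": "GB"}]}
    return {"data": hits.get(query, [])}


print(resolve_code("GBM", "glioblastoma", fake_get_type, fake_search))   # GB
print(resolve_code("LUAD", "lung adenocarcinoma", fake_get_type, fake_search))  # LUAD
```

Returning `None` instead of raising leaves the final fallback (calling OncoKB without `tumor_type`) to the caller.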
