tooluniverse-cell-line-profiling
Help researchers select and characterize cancer cell lines for experiments. Given a cancer type, gene of interest, or cell line name, profiles molecular features (mutations, expression, CNV), gene dependencies (CRISPR screens), drug sensitivities (IC50/AUC), and genetic backgrounds using DepMap, Cellosaurus, PharmacoDB, COSMIC, CellMarker, CLUE, and SYNERGxDB. Generates a decision-support report for cell line selection. Use when researchers ask about which cell line to use, cell line characterization, DepMap dependencies, drug sensitivity profiles, or cancer model selection.
Best use case
tooluniverse-cell-line-profiling is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using tooluniverse-cell-line-profiling should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/tooluniverse-cell-line-profiling/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
SKILL.md Source
# Cancer Cell Line Profiling and Selection
Comprehensive profiling of cancer cell lines for experimental model selection. Transforms a query (cancer type, gene, or cell line name) into an actionable report covering identity verification, molecular features, gene dependencies, drug sensitivities, and druggable targets.
**KEY PRINCIPLES**:
1. **Decision-first** - Answer "which cell line should I use?" not "here is all the data"
2. **Multi-source validation** - Cross-reference DepMap, Cellosaurus, COSMIC, PharmacoDB
3. **Actionable output** - Ranked cell line recommendations with rationale
4. **Practical focus** - Include availability, growth characteristics, common pitfalls
5. **Gene-aware** - When a gene of interest is given, prioritize lines with relevant mutations/dependencies
6. **Source-referenced** - Cite database sources for every claim
7. **English-first queries** - Always use English terms in tool calls, even if the user writes in another language
## LOOK UP, DON'T GUESS
When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory. A database-verified answer is always more reliable than a guess.
---
## COMPUTE, DON'T DESCRIBE
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
## When to Use
Apply for: cell line selection by cancer type/gene, cell line profiling, gene dependencies, drug sensitivity queries, cell line comparisons, mutation checks.
---
## Phase 0: Tool Parameter Reference (CRITICAL)
**BEFORE calling ANY tool**, verify parameters against this table.
| Tool | Key Parameters | Notes |
|------|---------------|-------|
| `DepMap_search_cell_lines` | `query` (required) | Search by name, e.g., "A549", "MCF" |
| `DepMap_get_cell_line` | `model_name` OR `model_id` | Name: "A549"; ID: "SIDM00001" |
| `DepMap_get_cell_lines` | `tissue`, `cancer_type`, `page_size` | Filter by tissue (e.g., "Lung") |
| `DepMap_get_gene_dependencies` | `gene_symbol` (required), `model_id` | Returns gene metadata only (see Phase 3); per-cell-line gene effect scores (negative = essential) require the depmap.org download |
| `DepMap_search_genes` | `query` (required) | Validate gene symbol in DepMap first |
| `cellosaurus_search_cell_lines` | `q` (required), `size` | Solr syntax: `id:HeLa`, `ox:9606 AND char:cancer` |
| `cellosaurus_get_cell_line_info` | `accession` (required, CVCL_ format) | Full cell line record |
| `cellosaurus_query_converter` | `query` (required) | Natural language to Solr syntax |
| `COSMIC_search_mutations` | `terms` OR `query`, `max_results` | Search "BRAF V600E" or gene name |
| `COSMIC_get_mutations_by_gene` | `gene` OR `gene_name`, `max_results` | All mutations for a gene |
| `PharmacoDB_get_cell_line` | `operation="get_cell_line"`, `cell_name` | Cell line metadata + datasets |
| `PharmacoDB_get_experiments` | `operation="get_experiments"`, `compound_name`, `cell_line_name`, `dataset_name`, `per_page` | Drug response data (IC50, AAC, EC50) |
| `PharmacoDB_get_biomarker_assoc` | `operation="get_biomarker_associations"`, `compound_name`, `tissue_name`, `mdata_type`, `per_page` | Gene-drug sensitivity correlations |
| `PharmacoDB_search` | `operation="search"`, `query` | Find PharmacoDB IDs |
| `CellMarker_search_cancer_markers` | `operation="search_cancer_markers"`, `cancer_type`, `gene_symbol`, `cell_type` | Cancer cell markers |
| `CellMarker_search_by_gene` | `operation="search_by_gene"`, `gene_symbol` (required), `species` | Cell types expressing a gene |
| `HPA_get_comparative_expression_by_gene_and_cellline` | `gene_name` (required), `cell_line` (required) | Supported lines: ishikawa, hela, mcf7, a549, hepg2, jurkat, pc3, rh30, siha, u251 |
| `CLUE_get_cell_lines` | `operation="get_cell_lines"`, `cell_id` | L1000 CMap cell line info (requires CLUE_API_KEY) |
| `SYNERGxDB_search_combos` | `drug_name_1`, `drug_name_2`, `sample` (tissue or cell ID) | Drug combination synergy (ZIP, Bliss, Loewe) |
| `SYNERGxDB_list_cell_lines` | - | All cell lines in SYNERGxDB |
| `DGIdb_get_drug_gene_interactions` | `genes: list[str]` | Druggable gene interactions |
| `OpenTargets_get_associated_drugs_by_target_ensemblID` | `ensemblId`, `size` | Drugs targeting a gene |
| `STRING_get_network` | `protein_ids: list[str]`, `species: int` (9606) | PPI network for gene context |
| `MyGene_query_genes` | `query` (NOT `q`) | Resolve gene symbol to Ensembl ID |
| `cBioPortal_get_mutations` | `study_id`, `gene_list` (STRING, not array) | Cell line mutations from CCLE |
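The check against this table can be automated with a small guard run before each call. The following is a minimal sketch under stated assumptions: `TOOL_PARAMS` and `check_call` are hypothetical helpers (not part of ToolUniverse), and only three rows of the table are transcribed.

```python
# Hypothetical pre-flight guard; transcribes three rows of the table above.
TOOL_PARAMS = {
    "DepMap_search_cell_lines": {"required": {"query"}, "optional": set()},
    "cellosaurus_search_cell_lines": {"required": {"q"}, "optional": {"size"}},
    "MyGene_query_genes": {"required": {"query"}, "optional": set()},
}


def check_call(tool: str, **kwargs) -> list[str]:
    """Return a list of problems; an empty list means the call matches the table."""
    spec = TOOL_PARAMS.get(tool)
    if spec is None:
        return [f"{tool}: not in the local parameter table"]
    given = set(kwargs)
    problems = []
    # Report required parameters that were left out.
    for name in sorted(spec["required"] - given):
        problems.append(f"{tool}: missing required parameter '{name}'")
    # Report parameters the table does not list for this tool.
    allowed = spec["required"] | spec["optional"]
    for name in sorted(given - allowed):
        problems.append(f"{tool}: unexpected parameter '{name}'")
    return problems


# MyGene expects `query`, not `q`, so this call is flagged twice.
print(check_call("MyGene_query_genes", q="EGFR"))
print(check_call("cellosaurus_search_cell_lines", q="id:A549", size=5))
```

Extending the dictionary to the remaining rows is mechanical; tools that accept one of two alternative parameters (for example `model_name` OR `model_id`) would need an extra rule that this sketch does not model.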
---
## Workflow Overview
```
Input: Cancer type AND/OR Gene of interest AND/OR Cell line name(s)
Phase 1: Cell Line Identification
- Search and verify cell line identity (Cellosaurus)
- Get metadata: species, disease, STR profile, cross-references
- If cancer type given without cell line: find candidate lines (DepMap)
Phase 2: Molecular Profiling
- Mutation landscape (COSMIC, cBioPortal CCLE)
- Gene expression (HPA, DepMap)
- Cancer markers (CellMarker)
Phase 3: Gene Dependencies (CRISPR Screens)
- Gene essentiality scores from DepMap
- Identify selectively essential genes
- Compare across cell lines if multiple candidates
Phase 4: Drug Sensitivity
- IC50/AAC from PharmacoDB (GDSC, CCLE, CTRPv2, PRISM)
- Biomarker associations for drug response
- Drug combination synergy (SYNERGxDB)
Phase 5: Target Druggability & Recommendations
- Druggable targets (DGIdb, OpenTargets)
- Final ranked recommendation with rationale
```
---
## Phase 1: Cell Line Identification
**Goal**: Verify cell line identity and find candidates.
**If specific cell line given**: (1) `cellosaurus_search_cell_lines(q="id:<NAME>")` → get CVCL accession, species, disease, contamination flags. (2) `cellosaurus_get_cell_line_info(accession="CVCL_XXXX")` for STR profile. (3) `DepMap_get_cell_line(model_name="...")` for tissue, cancer_type, MSI, ploidy. (4) `PharmacoDB_get_cell_line(operation="get_cell_line", cell_name="...")` for datasets.
**If cancer type only**: (1) `DepMap_get_cell_lines(tissue="Lung", page_size=20)`. (2) Narrow by gene mutations/dependencies in Phases 2-3. (3) `CellMarker_search_cancer_markers(operation="search_cancer_markers", cancer_type="Lung")`.
**OUTPUT**: Table of candidate cell lines with: name, tissue, cancer type, key identifiers.
---
## Phase 2: Molecular Profiling
**Goal**: Characterize mutational and expression landscape.
**2A Mutations**: `COSMIC_get_mutations_by_gene(gene="EGFR")` + `cBioPortal_get_mutations(study_id="ccle_broad_2019", gene_list="EGFR,KRAS,TP53")`. Note: `gene_list` is a comma-separated STRING. CCLE study ID: `ccle_broad_2019`.
**2B Expression**: `HPA_get_comparative_expression_by_gene_and_cellline(gene_name="EGFR", cell_line="a549")`. Only 10 lines supported: hela, mcf7, a549, hepg2, jurkat, pc3, rh30, siha, u251, ishikawa.
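Because only these ten lines are accepted, it helps to normalize the user's cell line name and skip the HPA call when there is no match. A minimal sketch, assuming that stripping punctuation and lowercasing is enough to map common spellings onto the listed keys; `hpa_cell_line` is a hypothetical helper name.

```python
import re
from typing import Optional

# The ten cell lines listed above as supported by the HPA comparison tool.
HPA_SUPPORTED = {"hela", "mcf7", "a549", "hepg2", "jurkat",
                 "pc3", "rh30", "siha", "u251", "ishikawa"}


def hpa_cell_line(name: str) -> Optional[str]:
    """Return the lowercase HPA key for a cell line name, or None if unsupported."""
    # Lowercase, then drop everything that is not a letter or digit.
    key = re.sub(r"[^a-z0-9]", "", name.lower())
    return key if key in HPA_SUPPORTED else None


for name in ("MCF-7", "Hep G2", "PC-3", "H1975"):
    print(name, "->", hpa_cell_line(name))
```

A `None` result means the HPA step should be skipped and noted as a limitation in the report.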
**2C Cancer markers**: `CellMarker_search_by_gene(operation="search_by_gene", gene_symbol="EGFR", species="Human")`
**OUTPUT**: Mutation table (gene, AA change, type) + expression summary per cell line.
---
## Phase 3: Gene Dependencies (CRISPR Screens)
**Goal**: Determine which genes are essential in candidate cell lines.
**LIMITATION**: `DepMap_get_gene_dependencies` returns gene metadata (HGNC ID, Ensembl ID) but NOT per-cell-line CRISPR scores. Full Chronos scores require depmap.org download.
**Available tools**: (1) `DepMap_search_genes(query="EGFR")` — validate gene exists. (2) `DepMap_get_gene_dependencies(gene_symbol="EGFR")` — metadata only. (3) **Alternatives**: cBioPortal CCLE for mutation data, PubMed for published screens, or direct user to depmap.org/portal.
**Interpreting Chronos scores** (from DepMap portal): ~0 = not essential; < -0.5 = essential; ~-1.0 = strongly essential (the scale is anchored so that common essential genes have a median near -1). Selective dependency (essential in some lineages only) suggests a potential therapeutic window.
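These bands, together with the -0.5 to -0.2 "moderately essential" band from the Phase 5 decision table, can be applied with a small helper. This is a sketch only: `classify_chronos` is a hypothetical name, and treating -1.0 as a hard cutoff is an assumption, since the text gives it as an approximate value.

```python
def classify_chronos(score: float) -> str:
    """Label a Chronos gene-effect score; more negative means more dependent."""
    if score <= -1.0:          # assumed hard cutoff for "~-1.0"
        return "strongly essential"
    if score < -0.5:
        return "essential"
    if score <= -0.2:          # -0.5 to -0.2 band from the decision table
        return "moderately essential"
    return "not essential"


for s in (-1.3, -0.7, -0.3, 0.05):
    print(s, classify_chronos(s))
```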
**OUTPUT**: Gene validation + mutation status per cell line.
**Offline DepMap analysis** (when API lacks CRISPR scores): Download `CRISPRGeneEffect.csv` + `Model.csv` from https://depmap.org/portal/download/all/. Load with pandas, find gene column (format: "KRAS (3845)"), merge with metadata, filter by lineage, sort by score. Most negative Chronos score = most dependent.
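The offline steps can be sketched with pandas as follows. Assumptions to verify against your download: the column names `ModelID`, `CellLineName`, and `OncotreeLineage` are taken to match the DepMap release in use, and `rank_dependency` is a hypothetical helper. The two small frames stand in for the CSV files, and their values are made up for illustration.

```python
import pandas as pd


def rank_dependency(effects: pd.DataFrame, models: pd.DataFrame,
                    gene: str, lineage: str) -> pd.DataFrame:
    """Rank cell lines of one lineage by Chronos score for one gene."""
    # Gene columns look like "KRAS (3845)"; match on the symbol before " (".
    matches = [c for c in effects.columns if c.split(" (")[0] == gene]
    if len(matches) != 1:
        raise KeyError(f"expected one column for {gene}, found {len(matches)}")
    scores = effects[["ModelID", matches[0]]].rename(columns={matches[0]: "chronos"})
    # Attach cell line metadata, then keep only the requested lineage.
    merged = scores.merge(models, on="ModelID", how="inner")
    subset = merged[merged["OncotreeLineage"] == lineage]
    # Ascending sort puts the most negative (most dependent) line first.
    return subset.dropna(subset=["chronos"]).sort_values("chronos").reset_index(drop=True)


# In-memory stand-ins for CRISPRGeneEffect.csv and Model.csv (illustrative values).
effects = pd.DataFrame({
    "ModelID": ["ACH-1", "ACH-2", "ACH-3"],
    "KRAS (3845)": [-1.2, -0.1, -0.8],
    "TP53 (7157)": [0.1, 0.2, 0.0],
})
models = pd.DataFrame({
    "ModelID": ["ACH-1", "ACH-2", "ACH-3"],
    "CellLineName": ["LineA", "LineB", "LineC"],
    "OncotreeLineage": ["Lung", "Lung", "Pancreas"],
})
ranked = rank_dependency(effects, models, gene="KRAS", lineage="Lung")
print(ranked[["CellLineName", "chronos"]])
```

With the real downloads, replace the stand-in frames with `pd.read_csv("CRISPRGeneEffect.csv")` and `pd.read_csv("Model.csv")`. If the first column of the effect file is unnamed in your release, rename it to `ModelID` before calling the function.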
**If DepMap data is unavailable**: Use `cBioPortal_get_mutations(study_id="ccle_broad_2019", gene_list="KRAS")` for mutation data, and the Quick Reference table below for common recommendations.
---
## Phase 4: Drug Sensitivity
**Goal**: Profile drug response data.
**4A PharmacoDB**: `PharmacoDB_get_experiments(operation="get_experiments", compound_name="Erlotinib", cell_line_name="A549", per_page=20)` for dose-response (IC50, AAC, EC50). Omit `compound_name` to get all drugs for a cell line. Use `PharmacoDB_get_biomarker_assoc(operation="get_biomarker_associations", compound_name="...", tissue_name="...", mdata_type="mutation")` for sensitivity biomarkers.
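**4B SYNERGxDB**: `SYNERGxDB_search_combos(drug_name_1="gemcitabine", drug_name_2="erlotinib", sample="lung")`. Positive ZIP = synergy. Coverage is mostly cytotoxic agents; many targeted therapies and biologics are absent, so an empty result may reflect missing coverage rather than absence of synergy.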
**4C CLUE**: `CLUE_get_cell_lines(operation="get_cell_lines", cell_id="MCF7")` — requires CLUE_API_KEY.
**OUTPUT**: Drug sensitivity table (drug, IC50, AAC, dataset) + synergy data if available.
---
## Phase 5: Target Druggability and Recommendations
**5A Druggability**: `DGIdb_get_drug_gene_interactions(genes=["EGFR", "KRAS"])` + `MyGene_query_genes(query="EGFR")` → `OpenTargets_get_associated_drugs_by_target_ensemblID(ensemblId="...", size=10)` + `STRING_get_network(protein_ids=["EGFR"], species=9606)`.
**5B Final Recommendation**: Synthesize all phases. **Explain WHY one line is better for this specific use case.**
#### Decision Criteria with Concrete Thresholds
| Criterion | Weight | Score 3 (Best) | Score 2 (Acceptable) | Score 1 (Poor) |
|-----------|--------|----------------|---------------------|----------------|
| **Mutation match** | x3 | Exact mutation (e.g., KRAS G12D) | Same gene, different mutation | No mutation in gene of interest |
| **Co-mutation simplicity** | x2 | Few co-mutations (cleaner background) | Moderate co-mutations | Complex background (3+ driver mutations) |
| **Gene dependency** | x2 | DepMap score < -0.5 (essential) | Score -0.5 to -0.2 (moderately essential) | Score > -0.2 (not essential) |
| **Drug sensitivity data** | x1 | In GDSC + CCLE + PRISM (3+ datasets) | In 1-2 datasets | No drug response data |
| **Practical factors** | x1 | Adherent, well-characterized, widely used | Suspension or less common | Hard to culture, contamination-prone |
**Total score** = sum of (criterion score × weight). Max = 27. Rank cell lines by total score.
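The weighted sum can be computed directly once each criterion has been scored 1 to 3. A minimal sketch: the weights come from the table above, while the key names, helper names, and the two placeholder lines with their scores are illustrative assumptions.

```python
# Weights from the decision table above (they sum to 9, so the maximum is 27).
WEIGHTS = {
    "mutation_match": 3,
    "co_mutation_simplicity": 2,
    "gene_dependency": 2,
    "drug_sensitivity_data": 1,
    "practical_factors": 1,
}


def total_score(scores: dict[str, int]) -> int:
    """Sum of (criterion score x weight); each score must be 1, 2, or 3."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the five criteria")
    if any(s not in (1, 2, 3) for s in scores.values()):
        raise ValueError("each criterion score must be 1, 2, or 3")
    return sum(WEIGHTS[name] * s for name, s in scores.items())


def rank_lines(candidates: dict[str, dict[str, int]]) -> list[tuple[str, int]]:
    """Return (cell line, total) pairs, highest total first."""
    totals = [(name, total_score(s)) for name, s in candidates.items()]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


# Placeholder lines and scores, for illustration only.
candidates = {
    "LineA": {"mutation_match": 3, "co_mutation_simplicity": 2, "gene_dependency": 3,
              "drug_sensitivity_data": 3, "practical_factors": 3},
    "LineB": {"mutation_match": 2, "co_mutation_simplicity": 3, "gene_dependency": 2,
              "drug_sensitivity_data": 2, "practical_factors": 3},
}
print(rank_lines(candidates))  # [('LineA', 25), ('LineB', 21)]
```

Keep the per-criterion scores alongside the total in the report so the reader can see why the top pick won.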
#### Use-Case-Specific Guidance
The best cell line depends on what you're doing with it:
| Use Case | Key Requirements | Extra Considerations |
|----------|-----------------|---------------------|
| **CRISPR knockout screen** | Adherent growth, good lentiviral transduction, pre-existing Cas9 clones (check Cellosaurus for "-Cas9" derivatives) | Doubling time matters for library coverage; <72h ideal |
| **Drug sensitivity testing** | In PharmacoDB/GDSC, known IC50 for reference compounds | Check SYNERGxDB for combo data |
| **Xenograft model** | Known tumorigenicity in mice, available PDX data | Check if line forms tumors in nude/NSG mice (Cellosaurus often notes this) |
| **Mechanism of action** | Clean genetic background, gene dependency confirmed | Fewer co-mutations = easier to attribute phenotypes |
| **Biomarker discovery** | Isogenic pairs available, well-characterized omics | Check if isogenic knockouts exist (Cellosaurus) |
| **Drug combination** | In SYNERGxDB with combo data, known single-agent responses | ZIP score available for synergy assessment |
#### Cellosaurus Derivative Lines
**Check for pre-made derivatives** — this can save months of lab work:
- `cellosaurus_search_cell_lines(q="ca:<PARENT_LINE>", size=20)` — finds all derivatives
- Look for: Cas9-expressing clones, drug-resistant derivatives, knockout lines, fluorescent reporter lines
- Example: PANC-1-Cas9-554 through PANC-1-Cas9-559 (CVCL_WL48-WL53) are pre-validated Cas9 clones
#### DepMap API Fallbacks
**If DepMap_get_gene_dependencies fails** (common for some genes):
- The Sanger Cell Model Passports API may not index all genes. Note this limitation.
- Recommend the user check DepMap portal (depmap.org) directly for CRISPR dependency data.
- Use `cBioPortal_get_mutations(study_id="ccle_broad_2019", gene_list="<GENE>")` as an alternative source for cell line mutation data.
**OUTPUT**: Ranked cell line table with total scores, per-criterion breakdown, and a text recommendation explaining the top pick and runner-up with biological reasoning.
---
## Common Use Patterns
| Pattern | Question Type | Key Tools (in order) |
|---------|--------------|---------------------|
| **1** | "Which cell line for [cancer] + [gene]?" | DepMap_get_cell_lines → DepMap_get_gene_dependencies → COSMIC_get_mutations_by_gene → cBioPortal_get_mutations (ccle_broad_2019) → PharmacoDB_get_experiments → rank by mutation + dependency + drug sensitivity |
| **2** | "Profile cell line X" | cellosaurus_search → DepMap_get_cell_line → PharmacoDB_get_cell_line → cBioPortal_get_mutations → HPA expression (if supported) → PharmacoDB_get_experiments |
| **3** | "Which lines are sensitive to [drug]?" | DepMap_get_cell_lines (tissue filter) → PharmacoDB_get_experiments (compound) → PharmacoDB_get_biomarker_assoc → rank by AAC (higher=sensitive) or IC50 (lower=sensitive) |
| **4** | "Compare A vs B" | Run Pattern 2 for both in parallel → side-by-side comparison table |
| **5** | "Drug combos for [cell line]?" | SYNERGxDB_search_combos → PharmacoDB_get_experiments (single-agent baseline) → report synergistic pairs with ZIP scores |
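For Pattern 3, ranking by AAC can be done by taking the median AAC per cell line across datasets. This is a sketch under assumptions: the `cell_line`, `dataset`, and `aac` keys are a hypothetical record shape (not the confirmed PharmacoDB response format), and the values are made up.

```python
from collections import defaultdict
from statistics import median


def rank_by_aac(experiments: list[dict]) -> list[tuple[str, float]]:
    """Median AAC per cell line, most sensitive (highest AAC) first."""
    by_line: dict[str, list[float]] = defaultdict(list)
    for exp in experiments:
        if exp.get("aac") is not None:      # skip experiments without an AAC value
            by_line[exp["cell_line"]].append(exp["aac"])
    ranked = [(line, median(vals)) for line, vals in by_line.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Illustrative records only.
experiments = [
    {"cell_line": "LineA", "dataset": "GDSC", "aac": 0.5},
    {"cell_line": "LineA", "dataset": "CTRPv2", "aac": 0.25},
    {"cell_line": "LineB", "dataset": "GDSC", "aac": 0.125},
    {"cell_line": "LineB", "dataset": "PRISM", "aac": None},
]
print(rank_by_aac(experiments))
```

The median is used here so that one outlying dataset does not dominate. For IC50 the sort direction flips, since lower values indicate sensitivity.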
---
## Quick Reference: Common Cancer Cell Lines by Type
| Cancer Type | Key Cell Lines | Common Mutations |
|-------------|---------------|-----------------|
| NSCLC | A549 (KRAS G12S), H1975 (EGFR L858R/T790M), PC-9 (EGFR del19), HCC827 (EGFR del19/amp), H460 (KRAS Q61H), H1299 (NRAS Q61K, TP53-null) | KRAS, EGFR, TP53, STK11 |
| Breast | MCF7 (ER+/PR+), MDA-MB-231 (TNBC, KRAS G13D), T-47D (ER+), BT-474 (HER2+), SK-BR-3 (HER2+) | PIK3CA, TP53, BRCA1/2 |
| Colorectal | HCT116 (KRAS G13D, MSI-H), SW480 (KRAS G12V), HT-29 (BRAF V600E), Caco-2 (APC), LoVo (KRAS G13D, MSI-H) | APC, KRAS, TP53, BRAF |
| Melanoma | A375 (BRAF V600E), SK-MEL-28 (BRAF V600E), WM266-4 (BRAF V600D), MeWo (WT BRAF) | BRAF, NRAS, TP53 |
| Pancreatic | PANC-1 (KRAS G12D), MIA PaCa-2 (KRAS G12C), AsPC-1 (KRAS G12D), Capan-1 (BRCA2 mut) | KRAS, TP53, CDKN2A, SMAD4 |
| Prostate | PC-3 (AR-negative), LNCaP (AR+, PTEN-null), DU145 (AR-negative), VCaP (AR amp, TMPRSS2-ERG) | AR, PTEN, TP53, RB1 |
| Ovarian | SKOV3 (HER2+, TP53 mut), OVCAR3 (TP53 mut), A2780 (sensitive), A2780cis (cisplatin-resistant) | TP53, BRCA1/2 |
| Leukemia | K562 (CML, BCR-ABL), Jurkat (T-ALL), HL-60 (AML), THP-1 (AML, monocytic) | BCR-ABL, FLT3, NPM1 |
| Glioblastoma | U251 (TP53 mut), U87MG (PTEN-null), T98G (TP53/PTEN mut), LN229 (TP53 mut, PTEN WT) | TP53, PTEN, EGFR, IDH1 |
| Liver | HepG2 (hepatoblastoma, WT TP53), Hep3B (HBV+, TP53-null), Huh7 (HCC, TP53 Y220C) | TP53, CTNNB1, AXIN1 |
---
## Cross-Referencing Cell Line IDs
Use cell line NAME as common key across databases. IDs: DepMap=SIDM, Cellosaurus=CVCL, cBioPortal=sample (e.g. A549_LUNG), PharmacoDB/SYNERGxDB=name string. When names differ ("HCT 116" vs "HCT116"), check Cellosaurus synonyms first.
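Spelling variants can be grouped with a normalized key before falling back to Cellosaurus synonyms. A minimal sketch, assuming that dropping spaces and punctuation and uppercasing is an acceptable heuristic; `name_key` and `group_by_key` are hypothetical helpers. Distinct cell lines can collapse to the same key, so treat a match as a candidate to confirm, not as proof of identity.

```python
import re


def name_key(name: str) -> str:
    """Collapse spacing, punctuation, and case so name variants share one key."""
    return re.sub(r"[^A-Za-z0-9]", "", name).upper()


def group_by_key(names: list[str]) -> dict[str, list[str]]:
    """Group raw names from different databases under their normalized key."""
    groups: dict[str, list[str]] = {}
    for raw in names:
        groups.setdefault(name_key(raw), []).append(raw)
    return groups


print(group_by_key(["HCT 116", "HCT116", "MDA-MB-231", "mda-mb-231", "A549"]))
```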
**Mutation-based filtering**: `cBioPortal_get_mutations(study_id="ccle_broad_2019", gene_list="KRAS")` → filter by `amino_acid_change` → extract cell line names → query other databases.
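The filter-and-extract step might look like the sketch below. The record shape is an assumption: `sample_id` is a hypothetical key holding a CCLE-style sample ID such as `A549_LUNG`, and the records are hand-written to mirror the Quick Reference table, not a real API response.

```python
def lines_with_mutation(records: list[dict], change: str) -> list[str]:
    """Return sorted cell line names whose amino_acid_change equals `change`."""
    names = {
        # CCLE-style sample IDs are NAME_TISSUE; keep the part before the first "_".
        rec["sample_id"].split("_", 1)[0]
        for rec in records
        if rec.get("amino_acid_change") == change
    }
    return sorted(names)


# Hand-written KRAS records for illustration.
records = [
    {"sample_id": "PANC1_PANCREAS", "amino_acid_change": "G12D"},
    {"sample_id": "ASPC1_PANCREAS", "amino_acid_change": "G12D"},
    {"sample_id": "A549_LUNG", "amino_acid_change": "G12S"},
]
print(lines_with_mutation(records, "G12D"))  # ['ASPC1', 'PANC1']
```

The extracted names lack punctuation (PANC1 rather than PANC-1), so resolve them through Cellosaurus synonyms before querying other databases.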
---
## Error Handling
| Issue | Resolution |
|-------|-----------|
| DepMap returns no results for cell line name | Try alternative names: check Cellosaurus synonyms first |
| cBioPortal CCLE study ID unknown | Use `ccle_broad_2019` as default CCLE study |
| PharmacoDB cell line name mismatch | Use `PharmacoDB_search(operation="search", query="<name>")` to find the canonical name |
| HPA cell line not supported | Only 10 lines supported (hela, mcf7, a549, hepg2, jurkat, pc3, rh30, siha, u251, ishikawa). Skip HPA for other lines |
| CLUE requires API key | Skip CLUE tools if CLUE_API_KEY not set; note in report |
| Gene symbol not found in DepMap | Use `DepMap_search_genes(query="<symbol>")` to check aliases |
| Cellosaurus accession pattern | Must be CVCL_XXXX format; search first if you only have a name |
| SYNERGxDB no results for drug combo | Drug may not be in database; SYNERGxDB covers cytotoxic agents, not most targeted therapies |
---
## Completeness Checklist
Before finalizing the report, verify:
- [ ] Cell line identity verified (Cellosaurus or DepMap)
- [ ] Species confirmed as human (unless otherwise specified)
- [ ] Key mutations documented (COSMIC or cBioPortal)
- [ ] Gene dependency assessed (DepMap CRISPR, if gene of interest provided)
- [ ] Drug sensitivity data included (PharmacoDB, at least one dataset)
- [ ] Druggability of key targets checked (DGIdb or OpenTargets)
- [ ] Practical recommendation provided (not just raw data)
- [ ] All claims cite their source database
- [ ] Known limitations noted (missing data, unsupported lines)