tooluniverse-variant-to-mechanism

End-to-end variant-to-mechanism analysis: given a genetic variant (rsID or coordinates), trace its functional impact from regulatory context (GWAS, eQTL, RegulomeDB, ENCODE) through target gene identification (GTEx, OpenTargets L2G) to downstream pathway and disease biology (STRING, Reactome, GO enrichment, disease associations). Produces an evidence-graded mechanistic narrative linking genotype to phenotype. Use when asked "how does this variant cause disease?", "what is the mechanism of rs7903146?", "trace variant to pathway", or "connect this GWAS hit to biology".

1,202 stars

bymims-harvard

View on GitHub Installation ↓

Best use case

tooluniverse-variant-to-mechanism is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using tooluniverse-variant-to-mechanism should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/tooluniverse-variant-to-mechanism/SKILL.md --create-dirs "https://raw.githubusercontent.com/mims-harvard/ToolUniverse/main/skills/tooluniverse-variant-to-mechanism/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/tooluniverse-variant-to-mechanism/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How tooluniverse-variant-to-mechanism Compares

Feature / Agent	tooluniverse-variant-to-mechanism	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

SKILL.md Source

# Variant-to-Mechanism Analysis Skill

Trace the full causal chain from a genetic variant to its disease mechanism: regulatory context,
target gene(s), molecular pathways, and phenotypic consequences. Integrates 7+ databases across
3 evidence layers (regulatory, molecular, disease) to build an evidence-graded mechanistic model.

**IMPORTANT**: Always use English terms in tool calls. Respond in the user's language.

---

## LOOK UP, DON'T GUESS
When uncertain about any scientific fact, SEARCH databases first (PubMed, UniProt, ChEMBL, ClinVar, etc.) rather than reasoning from memory. A database-verified answer is always more reliable than a guess.

---

## When to Use This Skill

Apply when users:
- Ask "how does rs7903146 cause type 2 diabetes?"
- Want to trace a GWAS variant to its biological mechanism
- Need to connect a non-coding variant to downstream pathways
- Ask "what gene does this variant affect and through what pathway?"
- Want a mechanistic narrative for a regulatory variant
- Need to identify the causal gene for a GWAS locus
- Ask "what is the functional impact of this intronic SNP?"

**NOT for** (use other skills instead):
- Coding variant pathogenicity / ACMG classification -> Use `tooluniverse-variant-interpretation`
- Pure regulatory element annotation without mechanism -> Use `tooluniverse-regulatory-genomics`
- Pure GWAS hit listing without mechanism -> Use `tooluniverse-gwas-snp-interpretation`
- Pharmacogenomic variant annotation -> Use `tooluniverse-pharmacogenomics`
- Gene-disease association without variant context -> Use `tooluniverse-gene-disease-association`

---

## Input Parameters

| Parameter | Required | Description | Example |
|-----------|----------|-------------|---------|
| **variant** | Yes | rsID or genomic coordinates | `"rs7903146"` or `"10:112998590:C:T"` |
| **trait** | No | Disease/trait context (helps prioritize) | `"type 2 diabetes"` |
| **tissue** | No | Tissue of interest for eQTL/expression | `"pancreas"` |

---

## Workflow Overview

**Phase 1** Variant Characterization (VEP, population frequency, CADD) ->
**Phase 2** Regulatory Context (GWAS, RegulomeDB, ENCODE, cCREs) ->
**Phase 3** Target Gene Identification (GTEx eQTL, OpenTargets L2G) ->
**Phase 4** Gene Function & Pathways (STRING, Reactome, GO) ->
**Phase 5** Disease Connection (OpenTargets, GenCC, DisGeNET) ->
**Phase 6** Mechanistic Synthesis (causal chain, evidence grading, confidence scoring)

---

## Phase 1: Variant Characterization

Resolve variant identity and get basic annotations.

```python
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()

# Step 1: VEP annotation
vep = tu.tools.EnsemblVEP_annotate_rsid(variant_id="rs7903146")
# NOTE: param is `variant_id`, NOT `rsid`
# Response format variable: list, {data, metadata}, or {error} -- handle all three
# Extract: consequence_type, gene, chromosome, position, alleles

# Step 2: Population frequency and pathogenicity scores
myvar = tu.tools.MyVariant_query_variants(
    query="rs7903146",  # NOTE: param is `query`, NOT `variant_id`
    fields="dbsnp,gnomad_genome,cadd,clinvar"
)
# Returns: {hits: [{_id, dbsnp, gnomad_genome: {af: {af, ...ancestry}}, cadd: {phred}, clinvar}]}
# gnomAD AF: frequency in general population
# CADD phred: >=20 = top 1% deleterious, >=30 = top 0.1%

# Step 3: Confirm gene context
gwas_snp = tu.tools.gwas_search_snps(rs_id="rs7903146")
# Returns: SNP location, mapped genes, functional class
```

---

## Phase 2: Regulatory Context

Assess the variant's regulatory environment.

### 2a: GWAS Associations

```python
# All trait associations for this variant
gwas = tu.tools.gwas_search_associations(rs_id="rs7903146", size=50)
# Returns: {data: [{disease_trait, p_value, or_per_copy_number, beta, study_accession}]}
# NOTE: Use gwas_search_associations, NOT gwas_get_associations_for_trait (BROKEN)
```

### 2b: RegulomeDB Score

```python
regulome = tu.tools.RegulomeDB_query_variant(rsid="rs7903146")
# Returns: {status, data: {score/ranking, probability, features}}
# Score 1a-2a = strong regulatory evidence
# Uses GRCh38 (NOT hg19)
```

### 2c: ENCODE Chromatin Context

```python
# Check enhancer (H3K27ac), promoter (H3K4me3), and accessibility in disease-relevant tissue
encode_h3k27ac = tu.tools.ENCODE_search_histone_experiments(
    histone_mark="H3K27ac", biosample_term_name="pancreas", limit=10)
encode_h3k4me3 = tu.tools.ENCODE_search_histone_experiments(
    histone_mark="H3K4me3", biosample_term_name="pancreas", limit=10)
encode_atac = tu.tools.ENCODE_search_chromatin_accessibility(
    biosample_term_name="pancreas", limit=10)
```

### 2d: cCRE Overlap (if coordinates known)

```python
# Check candidate cis-regulatory elements
ccres = tu.tools.UCSC_get_encode_cCREs(
    chrom="chr10",
    start=112998000,
    end=112999000
)
# Returns: cCRE type (PLS=promoter, pELS=proximal enhancer, dELS=distal enhancer, CTCF-only)
```

### 2e: OLS Trait Resolution (if trait provided)

```python
# Resolve trait to ontology ID for downstream tools
ols = tu.tools.ols_search_terms(query="type 2 diabetes", ontology="efo")
# Returns: {status, data: [{iri, label, short_form, ontology_name}]}
# For OpenTargets: use MONDO_0005148 for T2D (NOT EFO_0001360)
```

---

## Phase 3: Target Gene Identification

Identify which gene(s) the variant regulates. This is the critical link.

### 3a: GTEx eQTLs (primary evidence)

```python
# Find eQTL associations for the nearest gene
eqtls = tu.tools.GTEx_query_eqtl(gene_symbol="TCF7L2", size=100)
# Returns: {status, data: [{snpId, geneSymbol, tissueSiteDetailId, pValue, nes}]}
# Filter for the specific rsID in the results
# NES > 0: alt allele increases expression; NES < 0: decreases

# Check expression levels to prioritize tissues
expression = tu.tools.GTEx_get_median_gene_expression(gene_symbol="TCF7L2")
# Returns: median TPM across GTEx tissues
# IMPORTANT: GTEx uses v8 data (v10 endpoints may return empty)
```

### 3b: OpenTargets Locus-to-Gene (L2G)

```python
# Get credible sets and L2G predictions
credible = tu.tools.OpenTargets_get_variant_credible_sets(
    variant_id="10_112998590_C_T"  # chr_pos_ref_alt format
)
# Returns: credible set membership, L2G scores for causal genes
# L2G > 0.5: high confidence causal gene
# L2G 0.1-0.5: moderate confidence
```

### 3c: Nearest Gene Mapping

```python
# Get gene details for the candidate target
gene_info = tu.tools.MyGene_query_genes(
    query="symbol:TCF7L2",
    species="human",
    fields="symbol,ensembl.gene,entrezgene,name,summary,go",
    size=5
)
# Extract Ensembl ID for OpenTargets queries
# Filter by exact symbol match (first hit may not be correct gene)
```

### How to Reason About Target Gene Identification

The nearest gene is often NOT the causal gene for non-coding variants. Regulatory elements can act over hundreds of kilobases, skipping intervening genes entirely. Before looking at tools, reason about which gene is the likely target:

- **eQTL in disease-relevant tissue is the strongest evidence.** If a variant is an eQTL for Gene X in pancreas and the trait is diabetes, that is a direct mechanistic link. An eQTL in an unrelated tissue is weaker but still informative.
- **L2G scores integrate multiple signals** (distance, chromatin interaction, eQTL) into a single prediction. L2G > 0.5 is high confidence, but it is a computational prediction, not experimental proof. Combine it with eQTL data -- when both agree, confidence is high.
- **Proximity alone is unreliable.** Many GWAS loci have the causal gene 2-3 genes away from the lead SNP. Never default to "nearest gene" without checking eQTL and L2G.
- **When multiple candidate genes have evidence**, present all of them ranked by strength. The answer may genuinely be uncertain.

---

## Phase 4: Gene Function & Pathways

Once target gene(s) identified, characterize molecular function.

```python
# STRING PPI network — NOTE: param is `identifiers` (string), NOT `protein_ids`
ppi = tu.tools.STRING_get_interaction_partners(
    identifiers="TCF7L2", species=9606, required_score=700)

# STRING functional enrichment (GO, KEGG, Reactome)
enrichment = tu.tools.STRING_get_functional_enrichment(
    identifiers="TCF7L2", species=9606)

# Reactome pathway enrichment — space-separated STRING, NOT array
reactome = tu.tools.ReactomeAnalysis_pathway_enrichment(
    identifiers="TCF7L2 CTNNB1 APC")

# UniProt function (returns list of strings, NOT dict)
uniprot_func = tu.tools.UniProt_get_function_by_accession(accession="Q9NQB0")

# PANTHER GO enrichment — GO:0008150=BP, GO:0003674=MF, GO:0005575=CC
panther = tu.tools.PANTHER_enrichment(
    gene_list="TCF7L2,CTNNB1,APC,LEF1", organism=9606,
    annotation_dataset="GO:0008150")
```

---

## Phase 5: Disease Connection

Link the molecular mechanism to disease phenotype.

```python
# Disease associations for target gene (requires Ensembl ID)
ot_diseases = tu.tools.OpenTargets_get_diseases_phenotypes_by_target_ensembl(
    ensemblId="ENSG00000148737")

# Specific gene-disease evidence (both IDs must be pre-resolved)
ot_evidence = tu.tools.OpenTargets_target_disease_evidence(
    ensemblId="ENSG00000148737", efoId="MONDO_0005148")

# GenCC curated validity (Definitive/Strong/Moderate/Limited/Disputed/Refuted)
gencc = tu.tools.GenCC_search_gene(gene_symbol="TCF7L2")

# DisGeNET scored associations (requires DISGENET_API_KEY)
disgenet = tu.tools.DisGeNET_search_gene(gene="TCF7L2", limit=20)

# OpenTargets GWAS studies — use MONDO IDs, NOT EFO
ot_gwas = tu.tools.OpenTargets_search_gwas_studies_by_disease(
    diseaseIds=["MONDO_0005148"], size=20)
```

---

## Phase 6: Mechanistic Synthesis

### How to Build and Evaluate the Causal Chain

The goal is an explicit chain: **Variant -> changed protein or expression -> altered pathway -> disease phenotype.** Each arrow is a link that needs evidence. A chain with all links supported is strong. A chain with a missing link is a hypothesis, not a conclusion.

**Start with the variant's location.** Is this coding or non-coding? Coding variants directly change the protein -- a missense variant swaps one amino acid, a nonsense variant truncates it. Non-coding variants change REGULATION -- they affect how much protein is made, not what protein is made. This distinction determines your entire analysis path.

**For coding variants**, ask: Is this position conserved across species? Is it in a functional domain (active site, binding interface, transmembrane helix)? A variant in a conserved active site residue is more likely pathogenic than one in a disordered loop. Check ClinVar and CADD for existing assessments, but reason about the structural context yourself.

**For non-coding variants**, ask: Is there an eQTL linking this variant to a nearby gene? Is the variant in a TFBS, enhancer, or promoter (from RegulomeDB and ENCODE)? A variant in open chromatin with an eQTL in disease-relevant tissue has a clear regulatory mechanism. A variant in a gene desert with no eQTL is much harder to interpret -- acknowledge this uncertainty.

**Evaluate each link independently.** For each arrow in the chain, ask: what is the evidence? Is it replicated? Is it from the right tissue? Then assess the chain as a whole:
- All links well-supported with convergent evidence from multiple databases = **Established** mechanism
- Most links strong but one relies on a single source = **Strong** mechanism
- Evidence is mixed or only moderate across links = **Moderate** mechanism
- One or more links have no direct evidence, only inference = **Preliminary** mechanism
- The connection rests on proximity or text-mining alone = **Speculative** mechanism

**The evidence hierarchy matters.** Clinical observation (ClinVar pathogenic with clinical data) is stronger than functional assays (eQTL, chromatin marks), which are stronger than computational predictions (CADD score, L2G model). A CADD score alone is weak evidence. Multiple convergent computational predictions are moderate. An eQTL replicated across studies in disease-relevant tissue is strong.

**Always consider alternative mechanisms.** If eQTL points to Gene A but L2G favors Gene B, present both models. If the variant could act through splicing rather than expression, note that. Honest uncertainty is more useful than false confidence.

---

## Evidence Grading

Grade each piece of evidence by how directly it supports a mechanistic link:

- **T1 (Strongest)**: Replicated GWAS association combined with functional validation -- an eQTL in the right tissue plus regulatory chromatin marks at the variant position. This is convergent evidence from independent sources.
- **T2**: Curated biological knowledge -- the target gene is in a known disease-relevant pathway (Reactome, GenCC Definitive). This does not prove the variant acts through this pathway, but it makes the mechanism biologically plausible.
- **T3**: Computational predictions -- L2G scores, PPI network enrichment, GO term enrichment. These are hypothesis-generating, not confirmatory. Useful when T1/T2 evidence is sparse.
- **T4**: Single-source or indirect evidence -- nearest gene assignment, literature co-mention, text-mining associations. These should never be the sole basis for a mechanistic claim.

---

## Common Mistakes and Fallbacks

**Parameter pitfalls** (correct param names are shown in code blocks above):
- `EnsemblVEP_annotate_rsid` uses `variant_id`, NOT `rsid`
- `MyVariant_query_variants` uses `query`, NOT `variant_id`
- `ReactomeAnalysis_pathway_enrichment` takes space-separated STRING, NOT array
- OpenTargets disease search uses `queryString`, NOT `query`
- Use MONDO IDs (e.g., MONDO_0005148 for T2D), NOT EFO IDs in OpenTargets
- `gwas_get_associations_for_trait` is BROKEN; use `gwas_search_associations`
- ENCODE biosample names are tissues/cell lines, not disease names

**When data is missing**, adapt the analysis:
- No eQTL: check additional tissues, use L2G alone, note reduced confidence
- No GWAS hits: variant may be sub-threshold or in LD with the true signal
- No RegulomeDB data: check ENCODE directly for chromatin marks at the position
- No L2G data: fall back to eQTL + nearest gene with appropriate caveats
- Gene not in STRING/Reactome: use UniProt function + GO annotations
- Multiple candidate genes: present all ranked by evidence, do not force a single answer

---

## Reasoning Through Common Scenarios

### Scenario 1: Non-coding variant with clear regulatory signal

When VEP shows an intronic or intergenic variant, GWAS shows strong trait associations, and RegulomeDB returns a high score (1a-2a), the path is straightforward: the variant disrupts a regulatory element. The key question becomes WHICH GENE it regulates. Check eQTL in disease-relevant tissue first, then L2G. When both converge on the same gene, follow the pathway analysis through to disease. This is the "happy path" and yields Established confidence.

### Scenario 2: Non-coding variant with ambiguous target gene

When the variant is intergenic with multiple nearby genes and eQTL/L2G evidence is weak or conflicting, resist the temptation to pick the nearest gene. Instead: check eQTLs for ALL candidate genes in the locus, compare L2G scores, and consider whether one candidate has known disease-relevant biology. Present multiple models ranked by evidence. Report as Preliminary or Moderate.

### Scenario 3: No functional signal at all

When there is no eQTL, no L2G support, and minimal regulatory annotation, acknowledge the gap honestly. The variant may act through a mechanism not captured by current databases (cell-type-specific regulation, splicing QTL, 3D genome interactions). State what was checked, what was negative, and what experiments would resolve the question. This is a valid and useful finding.

---

## Programmatic Access (Beyond Tools)

When ToolUniverse tools return partial annotations, aggregate from multiple sources directly:

```python
import requests, pandas as pd

rsid = "rs12345"

# Ensembl VEP REST — functional consequence prediction
vep = requests.get(f"https://rest.ensembl.org/vep/human/id/{rsid}?content-type=application/json").json()

# GTEx eQTL — tissue-specific expression effects
eqtl = requests.get(f"https://gtexportal.org/api/v2/association/singleTissueEqtl",
    params={"snpId": rsid, "datasetId": "gtex_v8"}).json()

# ClinVar bulk (variant_summary.txt.gz, ~200MB, all variants)
# Download once, filter locally:
# df = pd.read_csv("https://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/variant_summary.txt.gz", sep="\t")
# hits = df[df["RS# (dbSNP)"] == int(rsid.replace("rs",""))]

# Combine into unified annotation DataFrame
annotations = {"rsid": rsid, "vep_consequence": vep[0]["most_severe_consequence"] if vep else None,
    "eqtl_tissues": len(eqtl.get("singleTissueEqtl", [])),
    "sources_checked": ["VEP", "GTEx", "ClinVar"]}
```

See `tooluniverse-data-wrangling` skill for API patterns and error handling.

---

## Limitations

- GTEx eQTLs are bulk tissue (v8 data); cell-type-specific and splicing effects may be missed.
- L2G and CADD are computational predictions, not experimental proof.
- GWAS associations are correlative; LD tagging means the lead SNP may not be the causal variant.
- Long-range regulatory effects (>1Mb) and 3D genome interactions are poorly captured by current tools.
- OpenTargets variant credible sets require chr_pos_ref_alt format; get alleles from VEP first.

Related Skills

tooluniverse

1202

from mims-harvard/ToolUniverse

Router skill for ToolUniverse tasks. First checks if specialized tooluniverse skills (105+ skills covering disease/drug/target research, gene-disease associations, clinical decision support, genomics, epigenomics, proteomics, comparative genomics, chemical safety, toxicology, systems biology, and more) can solve the problem, then falls back to general strategies for using 2300+ scientific tools. Covers tool discovery, multi-hop queries, comprehensive research workflows, disambiguation, evidence grading, and report generation. Use when users need to research any scientific topic, find biological data, or explore drug/target/disease relationships. ALSO USE for any biology, medicine, chemistry, pharmacology, or life science question — even simple factoid questions like "how many X in protein Y", "what drug interacts with Z", "what gene causes disease W", or "translate this sequence". These questions benefit from database lookups (UniProt, PubMed, ChEMBL, ClinVar, GWAS Catalog, etc.) rather than answering from memory alone. When in doubt about a scientific fact, USE THIS SKILL to verify against real databases.

tooluniverse-variant-interpretation

1202

from mims-harvard/ToolUniverse

Systematic clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Aggregates evidence from ClinVar, gnomAD, CIViC, UniProt, and PDB across ACMG criteria. Produces pathogenicity scores (0-100), clinical recommendations, and treatment implications. Use when interpreting genetic variants, classifying variants of uncertain significance (VUS), performing ACMG variant classification, or translating variant calls to clinical actionability.

tooluniverse-variant-functional-annotation

1202

from mims-harvard/ToolUniverse

Comprehensive functional annotation of protein variants — pathogenicity, population frequency, structural context, and clinical significance. Integrates ProtVar (map_variant, get_function, get_population) for protein-level mapping and structural context, ClinVar for clinical classifications, gnomAD for population frequency with ancestry data, CADD for deleteriousness scores, and ClinGen for gene-disease validity. Produces a structured variant annotation report with evidence grading. Use when asked about protein variant impact, missense variant pathogenicity, ProtVar annotation, variant functional context, or combining population and structural evidence for a variant.

tooluniverse-variant-analysis

1202

from mims-harvard/ToolUniverse

Production-ready VCF processing, variant annotation, mutation analysis, and structural variant (SV/CNV) interpretation for bioinformatics questions. Parses VCF files (streaming, large files), classifies mutation types (missense, nonsense, synonymous, frameshift, splice, intronic, intergenic) and structural variants (deletions, duplications, inversions, translocations), applies VAF/depth/quality/consequence filters, annotates with ClinVar/dbSNP/gnomAD/CADD via ToolUniverse, interprets SV/CNV clinical significance using ClinGen dosage sensitivity scores, computes variant statistics, and generates reports. Solves questions like "What fraction of variants with VAF < 0.3 are missense?", "How many non-reference variants remain after filtering intronic/intergenic?", "What is the pathogenicity of this deletion affecting BRCA1?", or "Which dosage-sensitive genes overlap this CNV?". Use when processing VCF files, annotating variants, filtering by VAF/depth/consequence, classifying mutations, interpreting structural variants, assessing CNV pathogenicity, comparing cohorts, or answering variant analysis questions.

tooluniverse-vaccine-design

1202

from mims-harvard/ToolUniverse

Design and evaluate vaccine candidates using computational immunology tools. Covers epitope prediction (MHC-I/II binding via IEDB), population coverage analysis, antigen selection, adjuvant matching, and immunogenicity assessment. Integrates IEDB for epitope prediction, UniProt for antigen sequences, PDB/AlphaFold for structural epitopes, BVBRC for pathogen proteomes, and literature for clinical precedent. Use when asked about vaccine design, epitope prediction, immunogenicity, MHC binding, T-cell epitopes, B-cell epitopes, or population coverage for vaccine candidates.

tooluniverse-toxicology

1202

from mims-harvard/ToolUniverse

Assess chemical and drug toxicity via adverse outcome pathways, real-world adverse event signals, and toxicogenomic evidence. Integrates AOPWiki (AOPWiki_list_aops, AOPWiki_get_aop) for mechanism- level pathway tracing, FAERS for post-market adverse event quantification, OpenFDA for label mining, and CTD for chemical-gene-disease evidence. Produces structured toxicity reports with evidence grading (T1-T4). Use when asked about toxicity mechanisms, adverse outcome pathways, AOP mapping, FAERS signal detection, or chemical-disease relationships for drugs or environmental chemicals.

tooluniverse-target-research

1202

from mims-harvard/ToolUniverse

Gather comprehensive biological target intelligence from 9 parallel research paths covering protein info, structure, interactions, pathways, expression, variants, drug interactions, and literature. Features collision-aware searches, evidence grading (T1-T4), explicit Open Targets coverage, and mandatory completeness auditing. Use when users ask about drug targets, proteins, genes, or need target validation, druggability assessment, or comprehensive target profiling.

tooluniverse-systems-biology

1202

from mims-harvard/ToolUniverse

Comprehensive systems biology and pathway analysis using multiple pathway databases (Reactome, KEGG, WikiPathways, Pathway Commons, BioModels). Performs pathway enrichment, protein-pathway mapping, keyword searches, and systems-level analysis. Use when analyzing gene sets, exploring biological pathways, or investigating systems-level biology.

tooluniverse-structural-variant-analysis

1202

from mims-harvard/ToolUniverse

Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.

tooluniverse-structural-proteomics

1202

from mims-harvard/ToolUniverse

Integrate structural biology data with proteomics for drug target validation. Retrieves protein structures from PDB (RCSB, PDBe), AlphaFold predictions, antibody structures (SAbDab), GPCR data (GPCRdb), binding pocket analysis (ProteinsPlus), and ligand interactions (BindingDB). Use when asked to find structures for a drug target, identify binding site ligands, cross-validate drug binding with structural data, assess structural druggability, or compare experimental vs predicted structures.

tooluniverse-stem-cell-organoid

1202

from mims-harvard/ToolUniverse

Research stem cells, iPSCs, organoids, and cell differentiation using ToolUniverse tools. Covers pluripotency marker identification, differentiation pathway analysis, organoid model characterization, cell type annotation, and disease modeling. Integrates CellxGene/HCA for single-cell atlas data, CellMarker for cell type markers, GEO for stem cell datasets, and pathway tools for differentiation signaling. Use when asked about stem cells, iPSCs, organoids, cell reprogramming, pluripotency, differentiation protocols, or 3D culture models.

tooluniverse-statistical-modeling

1202

from mims-harvard/ToolUniverse

Perform statistical modeling and regression analysis on biomedical datasets. Supports linear regression, logistic regression (binary/ordinal/multinomial), mixed-effects models, Cox proportional hazards survival analysis, Kaplan-Meier estimation, and comprehensive model diagnostics. Extracts odds ratios, hazard ratios, confidence intervals, p-values, and effect sizes. Designed to solve BixBench statistical reasoning questions involving clinical/experimental data. Use when asked to fit regression models, compute odds ratios, perform survival analysis, run statistical tests, or interpret model coefficients from provided data.