tooluniverse-rare-disease-diagnosis
Provide differential diagnosis for patients with suspected rare diseases based on phenotype and genetic data. Matches symptoms to HPO terms, identifies candidate diseases from Orphanet/OMIM, prioritizes genes for testing, interprets variants of uncertain significance. Use when clinician asks about rare disease diagnosis, unexplained phenotypes, or genetic testing interpretation.
Best use case
tooluniverse-rare-disease-diagnosis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Provide differential diagnosis for patients with suspected rare diseases based on phenotype and genetic data. Matches symptoms to HPO terms, identifies candidate diseases from Orphanet/OMIM, prioritizes genes for testing, interprets variants of uncertain significance. Use when clinician asks about rare disease diagnosis, unexplained phenotypes, or genetic testing interpretation.
Teams using tooluniverse-rare-disease-diagnosis should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/tooluniverse-rare-disease-diagnosis/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How tooluniverse-rare-disease-diagnosis Compares
| Feature / Agent | tooluniverse-rare-disease-diagnosis | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Provide differential diagnosis for patients with suspected rare diseases based on phenotype and genetic data. Matches symptoms to HPO terms, identifies candidate diseases from Orphanet/OMIM, prioritizes genes for testing, interprets variants of uncertain significance. Use when clinician asks about rare disease diagnosis, unexplained phenotypes, or genetic testing interpretation.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Rare Disease Diagnosis Advisor
Systematic diagnosis support for rare diseases using phenotype matching, gene panel prioritization, and variant interpretation across Orphanet, OMIM, HPO, ClinVar, and structure-based analysis.
**KEY PRINCIPLES**:
1. **Report-first approach** - Create report file FIRST, update progressively
2. **Phenotype-driven** - Convert symptoms to HPO terms before searching
3. **Multi-database triangulation** - Cross-reference Orphanet, OMIM, OpenTargets
4. **Evidence grading** - Grade diagnoses by supporting evidence strength
5. **Actionable output** - Prioritized differential diagnosis with next steps
6. **Genetic counseling aware** - Consider inheritance patterns and family history
7. **English-first queries** - Always use English terms in tool calls (phenotype descriptions, gene names, disease names), even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language
---
## When to Use
Apply when user asks:
- "Patient has [symptoms], what rare disease could this be?"
- "Unexplained developmental delay with [features]"
- "WES found VUS in [gene], is this pathogenic?"
- "What genes should we test for [phenotype]?"
- "Differential diagnosis for [rare symptom combination]"
---
## Critical Workflow Requirements
### 1. Report-First Approach (MANDATORY)
1. **Create the report file FIRST**:
- File name: `[PATIENT_ID]_rare_disease_report.md`
- Initialize with all section headers
- Add placeholder text: `[Researching...]`
2. **Progressively update** as you gather data
3. **Output separate data files**:
- `[PATIENT_ID]_gene_panel.csv` - Prioritized genes for testing
- `[PATIENT_ID]_variant_interpretation.csv` - If variants provided
### 2. Citation Requirements (MANDATORY)
Every finding MUST include source:
```markdown
### Candidate Disease: Marfan Syndrome
- **ORPHA**: ORPHA:558
- **OMIM**: 154700
- **Phenotype match**: 85% (17/20 HPO terms)
- **Inheritance**: AD
- **Gene**: FBN1
*Source: Orphanet via `Orphanet_558`, OMIM via `OMIM_get_entry`*
```
---
## Phase 0: Tool Verification
**CRITICAL**: Verify tool parameters before calling.
### Known Parameter Corrections
| Tool | WRONG Parameter | CORRECT Parameter |
|------|-----------------|-------------------|
| `OpenTargets_get_associated_diseases_by_target_ensemblId` | `ensemblID` | `ensemblId` |
| `ClinVar_get_variant_by_id` | `variant_id` | `id` |
| `MyGene_query_genes` | `gene` | `q` |
| `gnomAD_get_variant_frequencies` | `variant` | `variant_id` |
---
## Workflow Overview
```
Phase 1: Phenotype Standardization
├── Convert symptoms to HPO terms
├── Identify core vs. variable features
└── Note age of onset, inheritance hints
↓
Phase 2: Disease Matching
├── Orphanet phenotype search
├── OMIM clinical synopsis match
├── OpenTargets disease associations
└── OUTPUT: Ranked differential diagnosis
↓
Phase 3: Gene Panel Identification
├── Extract genes from top diseases
├── Cross-reference expression (GTEx)
├── Prioritize by evidence strength
└── OUTPUT: Recommended gene panel
↓
Phase 3.5: Expression & Tissue Context (NEW)
├── CELLxGENE: Cell-type specific expression
├── ChIPAtlas: Regulatory context (TF binding)
├── Tissue-specific gene networks
└── OUTPUT: Expression validation
↓
Phase 3.6: Pathway Analysis (NEW)
├── KEGG: Metabolic/signaling pathways
├── Reactome: Biological processes
├── IntAct: Protein-protein interactions
└── OUTPUT: Biological context
↓
Phase 4: Variant Interpretation (if provided)
├── ClinVar pathogenicity lookup
├── gnomAD population frequency
├── Protein domain/function impact
├── ENCODE/ChIPAtlas: Regulatory variant impact
└── OUTPUT: Variant classification
↓
Phase 5: Structure Analysis (for VUS)
├── NvidiaNIM_alphafold2 → Predict structure
├── Map variant to structure
├── Assess functional domain impact
└── OUTPUT: Structural evidence
↓
Phase 6: Literature Evidence (NEW)
├── PubMed: Published studies
├── BioRxiv/MedRxiv: Preprints
├── OpenAlex: Citation analysis
└── OUTPUT: Literature support
↓
Phase 7: Report Synthesis
├── Prioritized differential diagnosis
├── Recommended genetic testing
├── Next steps for clinician
└── OUTPUT: Final report
```
---
## Phase 1: Phenotype Standardization
### 1.1 Convert Symptoms to HPO Terms
```python
def standardize_phenotype(tu, symptoms_list):
"""Convert clinical descriptions to HPO terms."""
hpo_terms = []
for symptom in symptoms_list:
# Search HPO for matching terms
results = tu.tools.HPO_search_terms(query=symptom)
if results:
hpo_terms.append({
'original': symptom,
'hpo_id': results[0]['id'],
'hpo_name': results[0]['name'],
'confidence': 'exact' if symptom.lower() in results[0]['name'].lower() else 'partial'
})
return hpo_terms
```
### 1.2 Phenotype Categories
| Category | Examples | Weight |
|----------|----------|--------|
| **Core features** | Always present in disease | High |
| **Variable features** | Present in >50% | Medium |
| **Occasional features** | Present in <50% | Low |
| **Age-specific** | Onset-dependent | Context |
### 1.3 Output for Report
```markdown
## 1. Phenotype Analysis
### 1.1 Standardized HPO Terms
| Clinical Feature | HPO Term | HPO ID | Category |
|------------------|----------|--------|----------|
| Tall stature | Tall stature | HP:0000098 | Core |
| Long fingers | Arachnodactyly | HP:0001166 | Core |
| Heart murmur | Cardiac murmur | HP:0030148 | Variable |
| Joint hypermobility | Joint hypermobility | HP:0001382 | Core |
**Total HPO Terms**: 8
**Onset**: Childhood
**Family History**: Father with similar features (AD suspected)
*Source: HPO via `HPO_search_terms`*
```
---
## Phase 2: Disease Matching
### 2.1 Orphanet Disease Search (NEW TOOLS)
```python
def match_diseases_orphanet(tu, symptom_keywords):
"""Find rare diseases matching symptoms using Orphanet."""
candidate_diseases = []
# Search Orphanet by disease keywords
for keyword in symptom_keywords:
results = tu.tools.Orphanet_search_diseases(
operation="search_diseases",
query=keyword
)
if results.get('status') == 'success':
candidate_diseases.extend(results['data']['results'])
# Get genes for each disease
for disease in candidate_diseases:
orpha_code = disease.get('ORPHAcode')
genes = tu.tools.Orphanet_get_genes(
operation="get_genes",
orpha_code=orpha_code
)
disease['genes'] = genes.get('data', {}).get('genes', [])
return deduplicate_and_rank(candidate_diseases)
```
### 2.2 OMIM Cross-Reference (NEW TOOLS)
```python
def cross_reference_omim(tu, orphanet_diseases, gene_symbols):
"""Get OMIM details for diseases and genes."""
omim_data = {}
# Search OMIM for each disease/gene
for gene in gene_symbols:
search_result = tu.tools.OMIM_search(
operation="search",
query=gene,
limit=5
)
if search_result.get('status') == 'success':
for entry in search_result['data'].get('entries', []):
mim_number = entry.get('mimNumber')
# Get detailed entry
details = tu.tools.OMIM_get_entry(
operation="get_entry",
mim_number=str(mim_number)
)
# Get clinical synopsis (phenotype features)
synopsis = tu.tools.OMIM_get_clinical_synopsis(
operation="get_clinical_synopsis",
mim_number=str(mim_number)
)
omim_data[gene] = {
'mim_number': mim_number,
'details': details.get('data', {}),
'clinical_synopsis': synopsis.get('data', {})
}
return omim_data
```
### 2.3 DisGeNET Gene-Disease Associations (NEW TOOLS)
```python
def get_gene_disease_associations(tu, gene_symbols):
"""Get gene-disease associations from DisGeNET."""
associations = {}
for gene in gene_symbols:
# Get diseases associated with gene
result = tu.tools.DisGeNET_search_gene(
operation="search_gene",
gene=gene,
limit=20
)
if result.get('status') == 'success':
associations[gene] = result['data'].get('associations', [])
return associations
def get_disease_genes_disgenet(tu, disease_name):
"""Get all genes associated with a disease."""
result = tu.tools.DisGeNET_search_disease(
operation="search_disease",
disease=disease_name,
limit=30
)
return result.get('data', {}).get('associations', [])
```
### 2.4 Phenotype Overlap Scoring
| Match Level | Score | Criteria |
|-------------|-------|----------|
| **Excellent** | >80% | Most core + variable features match |
| **Good** | 60-80% | Core features match, some variable |
| **Possible** | 40-60% | Some overlap, needs consideration |
| **Unlikely** | <40% | Poor phenotype fit |
### 2.5 Output for Report
```markdown
## 2. Differential Diagnosis
### Top Candidate Diseases (Ranked by Phenotype Match)
| Rank | Disease | ORPHA | OMIM | Match | Inheritance | Key Gene(s) |
|------|---------|-------|------|-------|-------------|-------------|
| 1 | Marfan syndrome | 558 | 154700 | 85% | AD | FBN1 |
| 2 | Loeys-Dietz syndrome | 60030 | 609192 | 72% | AD | TGFBR1, TGFBR2 |
| 3 | Ehlers-Danlos, vascular | 286 | 130050 | 65% | AD | COL3A1 |
| 4 | Homocystinuria | 394 | 236200 | 58% | AR | CBS |
### DisGeNET Gene-Disease Evidence
| Gene | Associated Diseases | GDA Score | Evidence |
|------|---------------------|-----------|----------|
| FBN1 | Marfan syndrome, MASS phenotype | 0.95 | ★★★ Curated |
| TGFBR1 | Loeys-Dietz syndrome | 0.89 | ★★★ Curated |
| COL3A1 | vascular EDS | 0.91 | ★★★ Curated |
*Source: DisGeNET via `DisGeNET_search_gene`*
### Disease Details
#### 1. Marfan Syndrome (★★★)
**ORPHA**: 558 | **OMIM**: 154700 | **Prevalence**: 1-5/10,000
**Phenotype Match Analysis**:
| Patient Feature | Disease Feature | Match |
|-----------------|-----------------|-------|
| Tall stature | Present in 95% | ✓ |
| Arachnodactyly | Present in 90% | ✓ |
| Joint hypermobility | Present in 85% | ✓ |
| Cardiac murmur | Aortic root dilation (70%) | Partial |
**OMIM Clinical Synopsis** (via `OMIM_get_clinical_synopsis`):
- **Cardiovascular**: Aortic root dilation, mitral valve prolapse
- **Skeletal**: Scoliosis, pectus excavatum, tall stature
- **Ocular**: Ectopia lentis, myopia
**Diagnostic Criteria**: Ghent nosology (2010)
- Aortic root dilation/dissection + FBN1 mutation = Diagnosis
- Without genetic testing: systemic score ≥7 + ectopia lentis
**Inheritance**: Autosomal dominant (25% de novo)
*Source: Orphanet via `Orphanet_get_disease`, OMIM via `OMIM_get_entry`, DisGeNET*
```
---
## Phase 3: Gene Panel Identification
### 3.1 Extract Disease Genes
```python
def build_gene_panel(tu, candidate_diseases):
"""Build prioritized gene panel from candidate diseases."""
genes = {}
for disease in candidate_diseases:
for gene in disease['genes']:
if gene not in genes:
genes[gene] = {
'symbol': gene,
'diseases': [],
'evidence_level': 'unknown'
}
genes[gene]['diseases'].append(disease['name'])
return genes
```
### 3.1.1 ClinGen Gene-Disease Validity Check (NEW)
**Critical**: Always verify gene-disease validity through ClinGen before including in panel.
```python
def get_clingen_gene_evidence(tu, gene_symbol):
"""
Get ClinGen gene-disease validity and dosage sensitivity.
ESSENTIAL for rare disease gene panel prioritization.
"""
# 1. Gene-disease validity classification
validity = tu.tools.ClinGen_search_gene_validity(gene=gene_symbol)
validity_levels = []
diseases_with_validity = []
if validity.get('data'):
for entry in validity.get('data', []):
validity_levels.append(entry.get('Classification'))
diseases_with_validity.append({
'disease': entry.get('Disease Label'),
'mondo_id': entry.get('Disease ID (MONDO)'),
'classification': entry.get('Classification'),
'inheritance': entry.get('Inheritance')
})
# 2. Dosage sensitivity (critical for CNV interpretation)
dosage = tu.tools.ClinGen_search_dosage_sensitivity(gene=gene_symbol)
hi_score = None
ts_score = None
if dosage.get('data'):
for entry in dosage.get('data', []):
hi_score = entry.get('Haploinsufficiency Score')
ts_score = entry.get('Triplosensitivity Score')
break
# 3. Clinical actionability (return of findings context)
actionability = tu.tools.ClinGen_search_actionability(gene=gene_symbol)
is_actionable = (actionability.get('adult_count', 0) > 0 or
actionability.get('pediatric_count', 0) > 0)
# Determine best evidence level
level_priority = ['Definitive', 'Strong', 'Moderate', 'Limited', 'Disputed', 'Refuted']
best_level = 'Not curated'
for level in level_priority:
if level in validity_levels:
best_level = level
break
return {
'gene': gene_symbol,
'evidence_level': best_level,
'diseases_curated': diseases_with_validity,
'haploinsufficiency_score': hi_score,
'triplosensitivity_score': ts_score,
'is_actionable': is_actionable,
'include_in_panel': best_level in ['Definitive', 'Strong', 'Moderate']
}
def prioritize_genes_with_clingen(tu, gene_list):
"""Prioritize genes using ClinGen evidence levels."""
prioritized = []
for gene in gene_list:
evidence = get_clingen_gene_evidence(tu, gene)
# Score based on ClinGen classification
score = 0
if evidence['evidence_level'] == 'Definitive':
score = 5
elif evidence['evidence_level'] == 'Strong':
score = 4
elif evidence['evidence_level'] == 'Moderate':
score = 3
elif evidence['evidence_level'] == 'Limited':
score = 1
# Disputed/Refuted get 0
# Bonus for haploinsufficiency score 3
if evidence['haploinsufficiency_score'] == '3':
score += 1
# Bonus for actionability
if evidence['is_actionable']:
score += 1
prioritized.append({
**evidence,
'priority_score': score
})
# Sort by priority score
return sorted(prioritized, key=lambda x: x['priority_score'], reverse=True)
```
**ClinGen Classification Impact on Panel**:
| Classification | Include in Panel? | Priority |
|----------------|-------------------|----------|
| **Definitive** | YES - mandatory | Highest |
| **Strong** | YES - highly recommended | High |
| **Moderate** | YES | Medium |
| **Limited** | Include but flag | Low |
| **Disputed** | Exclude or separate | Avoid |
| **Refuted** | EXCLUDE | Do not test |
| **Not curated** | Use other evidence | Variable |
### 3.2 Gene Prioritization Criteria
| Priority | Criteria | Points |
|----------|----------|--------|
| **Tier 1** | Gene causes #1 ranked disease | +5 |
| **Tier 2** | Gene causes multiple candidates | +3 |
| **Tier 3** | ClinGen "Definitive" evidence | +3 |
| **Tier 4** | Expressed in affected tissue | +2 |
| **Tier 5** | Constraint score pLI >0.9 | +1 |
### 3.3 Expression Validation
```python
def validate_expression(tu, gene_symbol, affected_tissue):
"""Check if gene is expressed in relevant tissue."""
# Get Ensembl ID
gene_info = tu.tools.MyGene_query_genes(q=gene_symbol, species="human")
ensembl_id = gene_info.get('ensembl', {}).get('gene')
# Check GTEx expression
expression = tu.tools.GTEx_get_median_gene_expression(
gencode_id=f"{ensembl_id}.latest"
)
return expression.get(affected_tissue, 0) > 1 # TPM > 1
```
### 3.4 Output for Report
```markdown
## 3. Recommended Gene Panel
### 3.1 Prioritized Genes for Testing
| Priority | Gene | Diseases | Evidence | Constraint (pLI) | Expression |
|----------|------|----------|----------|------------------|------------|
| ★★★ | FBN1 | Marfan syndrome | Definitive | 1.00 | Heart, aorta |
| ★★★ | TGFBR1 | Loeys-Dietz 1 | Definitive | 0.98 | Ubiquitous |
| ★★★ | TGFBR2 | Loeys-Dietz 2 | Definitive | 0.99 | Ubiquitous |
| ★★☆ | COL3A1 | EDS vascular | Definitive | 1.00 | Connective tissue |
| ★☆☆ | CBS | Homocystinuria | Definitive | 0.00 | Liver |
### 3.2 Panel Design Recommendation
**Minimum Panel** (high yield): FBN1, TGFBR1, TGFBR2, COL3A1
**Extended Panel** (+differential): Add CBS, SMAD3, ACTA2
**Testing Strategy**:
1. Start with FBN1 sequencing (highest pre-test probability)
2. If negative, proceed to full connective tissue panel
3. Consider WES if panel negative
*Source: ClinGen via gene-disease validity, GTEx expression*
```
---
## Phase 3.5: Expression & Tissue Context (ENHANCED)
### 3.5.1 Cell-Type Specific Expression (CELLxGENE)
```python
def get_cell_type_expression(tu, gene_symbol, affected_tissues):
"""Get single-cell expression to validate tissue relevance."""
# Get expression across cell types
expression = tu.tools.CELLxGENE_get_expression_data(
gene=gene_symbol,
tissue=affected_tissues[0] if affected_tissues else "all"
)
# Get cell type metadata
cell_metadata = tu.tools.CELLxGENE_get_cell_metadata(
gene=gene_symbol
)
# Identify high-expression cell types
high_expression = [
ct for ct in expression
if ct.get('mean_expression', 0) > 1.0 # TPM > 1
]
return {
'expression_data': expression,
'high_expression_cells': high_expression,
'total_cell_types': len(cell_metadata)
}
```
**Why it matters**: Confirms candidate genes are expressed in disease-relevant tissues/cells.
### 3.5.2 Regulatory Context (ChIPAtlas)
```python
def get_regulatory_context(tu, gene_symbol):
"""Get transcription factor binding for candidate genes."""
# Search for TF binding near gene
tf_binding = tu.tools.ChIPAtlas_enrichment_analysis(
gene=gene_symbol,
cell_type="all"
)
# Get specific binding peaks
peaks = tu.tools.ChIPAtlas_get_peak_data(
gene=gene_symbol,
experiment_type="TF"
)
return {
'transcription_factors': tf_binding,
'regulatory_peaks': peaks
}
```
**Why it matters**: Identifies regulatory mechanisms that may be disrupted in disease.
### 3.5.3 Output for Report
```markdown
## 3.5 Expression & Regulatory Context
### Cell-Type Specific Expression (CELLxGENE)
| Gene | Top Expressing Cell Types | Expression Level | Tissue Relevance |
|------|---------------------------|------------------|------------------|
| FBN1 | Fibroblasts, Smooth muscle | High (TPM=45) | ✓ Connective tissue |
| TGFBR1 | Endothelial, Fibroblasts | Medium (TPM=12) | ✓ Vascular |
| COL3A1 | Fibroblasts, Myofibroblasts | Very High (TPM=120) | ✓ Connective tissue |
**Interpretation**: All top candidate genes show high expression in disease-relevant cell types (connective tissue, vascular cells), supporting their candidacy.
### Regulatory Context (ChIPAtlas)
| Gene | Key TF Regulators | Regulatory Significance |
|------|-------------------|------------------------|
| FBN1 | TGFβ pathway (SMAD2/3), AP-1 | TGFβ-responsive |
| TGFBR1 | STAT3, NF-κB | Inflammation-responsive |
*Source: CELLxGENE Census, ChIPAtlas*
```
---
## Phase 3.6: Pathway Analysis (NEW)
### 3.6.1 KEGG Pathway Context
```python
def get_pathway_context(tu, gene_symbols):
"""Get pathway context for candidate genes."""
pathways = {}
for gene in gene_symbols:
# Search KEGG for gene
kegg_genes = tu.tools.kegg_find_genes(query=f"hsa:{gene}")
if kegg_genes:
# Get pathway membership
gene_info = tu.tools.kegg_get_gene_info(gene_id=kegg_genes[0]['id'])
pathways[gene] = gene_info.get('pathways', [])
return pathways
```
### 3.6.2 Protein-Protein Interactions (IntAct)
```python
def get_protein_interactions(tu, gene_symbol):
"""Get interaction partners for candidate genes."""
# Search IntAct for interactions
interactions = tu.tools.intact_search_interactions(
query=gene_symbol,
species="human"
)
# Get interaction network
network = tu.tools.intact_get_interaction_network(
gene=gene_symbol,
depth=1 # Direct interactors only
)
return {
'interactions': interactions,
'network': network,
'interactor_count': len(interactions)
}
```
### 3.6.3 Output for Report
```markdown
## 3.6 Pathway & Network Context
### KEGG Pathways
| Gene | Key Pathways | Biological Process |
|------|--------------|-------------------|
| FBN1 | ECM-receptor interaction (hsa04512) | Extracellular matrix |
| TGFBR1/2 | TGF-beta signaling (hsa04350) | Cell signaling |
| COL3A1 | Focal adhesion (hsa04510) | Cell-matrix adhesion |
### Shared Pathway Analysis
**Convergent pathways** (≥2 candidate genes):
- TGF-beta signaling pathway: FBN1, TGFBR1, TGFBR2, SMAD3
- ECM organization: FBN1, COL3A1
**Interpretation**: Candidate genes converge on TGF-beta signaling and extracellular matrix pathways, consistent with connective tissue disorder etiology.
### Protein-Protein Interactions (IntAct)
| Gene | Direct Interactors | Notable Partners |
|------|-------------------|------------------|
| FBN1 | 42 | LTBP1, TGFB1, ADAMTS10 |
| TGFBR1 | 68 | TGFBR2, SMAD2, SMAD3 |
*Source: KEGG, IntAct, Reactome*
```
---
## Phase 4: Variant Interpretation (If Provided)
### 4.1 ClinVar Lookup
```python
def interpret_variant(tu, variant_hgvs):
"""Get ClinVar interpretation for variant."""
result = tu.tools.ClinVar_search_variants(query=variant_hgvs)
return {
'clinvar_id': result.get('id'),
'classification': result.get('clinical_significance'),
'review_status': result.get('review_status'),
'conditions': result.get('conditions'),
'last_evaluated': result.get('last_evaluated')
}
```
### 4.2 Population Frequency
```python
def check_population_frequency(tu, variant_id):
"""Get gnomAD allele frequency."""
freq = tu.tools.gnomAD_get_variant_frequencies(variant_id=variant_id)
# Interpret rarity
if freq['allele_frequency'] < 0.00001:
rarity = "Ultra-rare"
elif freq['allele_frequency'] < 0.0001:
rarity = "Rare"
elif freq['allele_frequency'] < 0.01:
rarity = "Low frequency"
else:
rarity = "Common (likely benign)"
return freq, rarity
```
### 4.3 Computational Pathogenicity Prediction (ENHANCED)
Use state-of-the-art prediction tools for VUS interpretation:
```python
def comprehensive_vus_prediction(tu, variant_info):
"""
Combine multiple prediction tools for VUS classification.
Critical for rare disease variants not in ClinVar.
"""
predictions = {}
# 1. CADD - Deleteriousness (NEW API)
cadd = tu.tools.CADD_get_variant_score(
chrom=variant_info['chrom'],
pos=variant_info['pos'],
ref=variant_info['ref'],
alt=variant_info['alt'],
version="GRCh38-v1.7"
)
if cadd.get('status') == 'success':
predictions['cadd'] = {
'score': cadd['data'].get('phred_score'),
'interpretation': cadd['data'].get('interpretation'),
'acmg': 'PP3' if cadd['data'].get('phred_score', 0) >= 20 else 'neutral'
}
# 2. AlphaMissense - DeepMind pathogenicity (NEW)
if variant_info.get('uniprot_id') and variant_info.get('aa_change'):
am = tu.tools.AlphaMissense_get_variant_score(
uniprot_id=variant_info['uniprot_id'],
variant=variant_info['aa_change'] # e.g., "E1541K"
)
if am.get('status') == 'success' and am.get('data'):
classification = am['data'].get('classification')
predictions['alphamissense'] = {
'score': am['data'].get('pathogenicity_score'),
'classification': classification,
'acmg': 'PP3 (strong)' if classification == 'pathogenic' else (
'BP4 (strong)' if classification == 'benign' else 'neutral'
)
}
# 3. EVE - Evolutionary prediction (NEW)
eve = tu.tools.EVE_get_variant_score(
chrom=variant_info['chrom'],
pos=variant_info['pos'],
ref=variant_info['ref'],
alt=variant_info['alt']
)
if eve.get('status') == 'success':
eve_scores = eve['data'].get('eve_scores', [])
if eve_scores:
predictions['eve'] = {
'score': eve_scores[0].get('eve_score'),
'classification': eve_scores[0].get('classification'),
'acmg': 'PP3' if eve_scores[0].get('eve_score', 0) > 0.5 else 'BP4'
}
# 4. SpliceAI - Splice variant prediction (NEW)
# Use for intronic, synonymous, or exonic variants near splice sites
variant_str = f"chr{variant_info['chrom']}-{variant_info['pos']}-{variant_info['ref']}-{variant_info['alt']}"
splice = tu.tools.SpliceAI_predict_splice(
variant=variant_str,
genome="38"
)
if splice.get('data'):
max_score = splice['data'].get('max_delta_score', 0)
interpretation = splice['data'].get('interpretation', '')
if max_score >= 0.8:
splice_acmg = 'PP3 (strong) - high splice impact'
elif max_score >= 0.5:
splice_acmg = 'PP3 (moderate) - splice impact'
elif max_score >= 0.2:
splice_acmg = 'PP3 (supporting) - possible splice effect'
else:
splice_acmg = 'BP7 (if synonymous) - no splice impact'
predictions['spliceai'] = {
'max_delta_score': max_score,
'interpretation': interpretation,
'scores': splice['data'].get('scores', []),
'acmg': splice_acmg
}
# Consensus for PP3/BP4
damaging = sum(1 for p in predictions.values() if 'PP3' in p.get('acmg', ''))
benign = sum(1 for p in predictions.values() if 'BP4' in p.get('acmg', ''))
return {
'predictions': predictions,
'consensus': {
'damaging_count': damaging,
'benign_count': benign,
'pp3_applicable': damaging >= 2 and benign == 0,
'bp4_applicable': benign >= 2 and damaging == 0
}
}
```
### 4.4 ACMG Classification Criteria
| Evidence Type | Criteria | Weight |
|---------------|----------|--------|
| **PVS1** | Null variant in gene where LOF is mechanism | Very Strong |
| **PS1** | Same amino acid change as established pathogenic | Strong |
| **PM2** | Absent from population databases | Moderate |
| **PP3** | Computational evidence supports deleterious (AlphaMissense, CADD, EVE, SpliceAI) | Supporting |
| **BA1** | Allele frequency >5% | Benign standalone |
**Enhanced PP3 Evidence** (NEW):
- **AlphaMissense pathogenic** (>0.564) = Strong PP3 support (~90% accuracy)
- **CADD ≥20** + **EVE >0.5** = Multiple concordant predictions
- Agreement from 2+ predictors strengthens PP3 evidence
### 4.5 Output for Report
```markdown
## 4. Variant Interpretation
### 4.1 Variant: FBN1 c.4621G>A (p.Glu1541Lys)
| Property | Value | Interpretation |
|----------|-------|----------------|
| Gene | FBN1 | Marfan syndrome gene |
| Consequence | Missense | Amino acid change |
| ClinVar | VUS | Uncertain significance |
| gnomAD AF | 0.000004 | Ultra-rare (PM2) |
### 4.2 Computational Predictions (NEW)
| Predictor | Score | Classification | ACMG Support |
|-----------|-------|----------------|--------------|
| **AlphaMissense** | 0.78 | Pathogenic | PP3 (strong) |
| **CADD PHRED** | 28.5 | Top 0.1% deleterious | PP3 |
| **EVE** | 0.72 | Likely pathogenic | PP3 |
**Consensus**: 3/3 predictors concordant damaging → **Strong PP3 support**
*Source: AlphaMissense, CADD API, EVE via Ensembl VEP*
### 4.3 ACMG Evidence Summary
| Criterion | Evidence | Strength |
|-----------|----------|----------|
| PM2 | Absent from gnomAD (AF < 0.00001) | Moderate |
| PP3 | AlphaMissense + CADD + EVE concordant | Supporting (strong) |
| PP4 | Phenotype highly specific for Marfan | Supporting |
| PS4 | Multiple affected family members | Strong |
**Preliminary Classification**: Likely Pathogenic (1 Strong + 1 Moderate + 2 Supporting)
*Source: ClinVar, gnomAD, AlphaMissense, CADD, EVE*
```
---
## Phase 5: Structure Analysis for VUS
### 5.1 When to Perform Structure Analysis
Perform when:
- Variant is VUS or conflicting interpretations
- Missense variant in critical domain
- Novel variant not in databases
- Additional evidence needed for classification
### 5.2 Structure Prediction (NVIDIA NIM)
```python
def analyze_variant_structure(tu, protein_sequence, variant_position):
"""Predict structure and analyze variant impact."""
# Predict structure with AlphaFold2
structure = tu.tools.NvidiaNIM_alphafold2(
sequence=protein_sequence,
algorithm="mmseqs2",
relax_prediction=False
)
# Extract pLDDT at variant position
variant_plddt = get_residue_plddt(structure, variant_position)
# Check if in structured region
confidence = "High" if variant_plddt > 70 else "Low"
return {
'structure': structure,
'variant_plddt': variant_plddt,
'confidence': confidence
}
```
### 5.3 Domain Impact Assessment
```python
def assess_domain_impact(tu, uniprot_id, variant_position):
"""Check if variant affects functional domain."""
# Get domain annotations
domains = tu.tools.InterPro_get_protein_domains(accession=uniprot_id)
for domain in domains:
if domain['start'] <= variant_position <= domain['end']:
return {
'in_domain': True,
'domain_name': domain['name'],
'domain_function': domain['description']
}
return {'in_domain': False}
```
### 5.4 Output for Report
```markdown
## 5. Structural Analysis
### 5.1 Structure Prediction
**Method**: AlphaFold2 via NVIDIA NIM
**Protein**: Fibrillin-1 (FBN1)
**Sequence Length**: 2,871 amino acids
| Metric | Value | Interpretation |
|--------|-------|----------------|
| Mean pLDDT | 85.3 | High confidence overall |
| Variant position pLDDT | 92.1 | Very high confidence |
| Nearby domain | cbEGF-like domain 23 | Calcium-binding |
### 5.2 Variant Location Analysis
**Variant**: p.Glu1541Lys
| Feature | Finding | Impact |
|---------|---------|--------|
| Domain | cbEGF-like domain 23 | Critical for calcium binding |
| Conservation | 100% conserved across vertebrates | High constraint |
| Structural role | Calcium coordination residue | Likely destabilizing |
| Nearby pathogenic | p.Glu1540Lys (Pathogenic) | Adjacent residue |
### 5.3 Structural Interpretation
The variant p.Glu1541Lys:
1. **Located in cbEGF domain** - These domains are critical for fibrillin-1 function
2. **Glutamate → Lysine** - Charge reversal (negative to positive)
3. **Calcium binding** - Glutamate at this position coordinates Ca2+
4. **Adjacent pathogenic variant** - p.Glu1540Lys is classified Pathogenic
**Structural Evidence**: Strong support for pathogenicity (PM1 - critical domain)
*Source: NVIDIA NIM via `NvidiaNIM_alphafold2`, InterPro*
```
---
## Phase 6: Literature Evidence (NEW)
### 6.1 Published Literature (PubMed)
```python
def search_disease_literature(tu, disease_name, genes):
"""Search for relevant published literature."""
# Disease-specific search
disease_papers = tu.tools.PubMed_search_articles(
query=f'"{disease_name}" AND (genetics OR mutation OR variant)',
limit=20
)
# Gene-specific searches
gene_papers = []
for gene in genes[:5]: # Top 5 genes
papers = tu.tools.PubMed_search_articles(
query=f'"{gene}" AND rare disease AND pathogenic',
limit=10
)
gene_papers.extend(papers)
return {
'disease_literature': disease_papers,
'gene_literature': gene_papers
}
```
### 6.2 Preprint Literature (BioRxiv/MedRxiv)
```python
def search_preprints(tu, disease_name, genes):
"""Search preprints for cutting-edge findings."""
# BioRxiv search
biorxiv = tu.tools.BioRxiv_search_preprints(
query=f"{disease_name} genetics",
limit=10
)
# ArXiv for computational methods
arxiv = tu.tools.ArXiv_search_papers(
query=f"rare disease diagnosis {' OR '.join(genes[:3])}",
category="q-bio",
limit=5
)
return {
'biorxiv': biorxiv,
'arxiv': arxiv
}
```
### 6.3 Citation Analysis (OpenAlex)
```python
def analyze_citations(tu, key_papers):
"""Analyze citation network for key papers."""
citation_analysis = []
for paper in key_papers[:5]:
# Get citation data
work = tu.tools.openalex_search_works(
query=paper['title'],
limit=1
)
if work:
citation_analysis.append({
'title': paper['title'],
'citations': work[0].get('cited_by_count', 0),
'year': work[0].get('publication_year')
})
return citation_analysis
```
### 6.4 Output for Report
```markdown
## 6. Literature Evidence
### 6.1 Key Published Studies
| PMID | Title | Year | Citations | Relevance |
|------|-------|------|-----------|-----------|
| 32123456 | FBN1 variants in Marfan syndrome... | 2023 | 45 | Direct |
| 31987654 | TGF-beta signaling in connective... | 2022 | 89 | Pathway |
| 30876543 | Novel diagnostic criteria for... | 2021 | 156 | Diagnostic |
### 6.2 Recent Preprints (Not Yet Peer-Reviewed)
| Source | Title | Posted | Relevance |
|--------|-------|--------|-----------|
| BioRxiv | Novel FBN1 splice variant causes... | 2024-01 | Case report |
| MedRxiv | Machine learning for Marfan... | 2024-02 | Diagnostic |
**⚠️ Note**: Preprints have not undergone peer review. Use with caution.
### 6.3 Evidence Summary
| Evidence Type | Count | Strength |
|---------------|-------|----------|
| Case reports | 12 | Supporting |
| Functional studies | 5 | Strong |
| Clinical trials | 2 | Strong |
| Reviews | 8 | Context |
*Source: PubMed, BioRxiv, OpenAlex*
```
---
## Report Template
**File**: `[PATIENT_ID]_rare_disease_report.md`
```markdown
# Rare Disease Diagnostic Report
**Patient ID**: [ID] | **Date**: [Date] | **Status**: In Progress
---
## Executive Summary
[Researching...]
---
## 1. Phenotype Analysis
### 1.1 Standardized HPO Terms
[Researching...]
### 1.2 Key Clinical Features
[Researching...]
---
## 2. Differential Diagnosis
### 2.1 Ranked Candidate Diseases
[Researching...]
### 2.2 Disease Details
[Researching...]
---
## 3. Recommended Gene Panel
### 3.1 Prioritized Genes
[Researching...]
### 3.2 Testing Strategy
[Researching...]
---
## 4. Variant Interpretation (if applicable)
### 4.1 Variant Details
[Researching...]
### 4.2 ACMG Classification
[Researching...]
---
## 5. Structural Analysis (if applicable)
### 5.1 Structure Prediction
[Researching...]
### 5.2 Variant Impact
[Researching...]
---
## 6. Clinical Recommendations
### 6.1 Diagnostic Next Steps
[Researching...]
### 6.2 Specialist Referrals
[Researching...]
### 6.3 Family Screening
[Researching...]
---
## 7. Data Gaps & Limitations
[Researching...]
---
## 8. Data Sources
[Will be populated as research progresses...]
```
---
## Evidence Grading
| Tier | Symbol | Criteria | Example |
|------|--------|----------|---------|
| **T1** | ★★★ | Phenotype match >80% + gene match | Marfan with FBN1 mutation |
| **T2** | ★★☆ | Phenotype match 60-80% OR likely pathogenic variant | Good phenotype fit |
| **T3** | ★☆☆ | Phenotype match 40-60% OR VUS in candidate gene | Possible diagnosis |
| **T4** | ☆☆☆ | Phenotype <40% OR uncertain gene | Low probability |
---
## Completeness Checklist
### Phase 1: Phenotype
- [ ] All symptoms converted to HPO terms
- [ ] Core vs. variable features distinguished
- [ ] Age of onset documented
- [ ] Family history noted
### Phase 2: Disease Matching
- [ ] ≥5 candidate diseases identified (or all matching)
- [ ] Phenotype overlap % calculated
- [ ] Inheritance patterns noted
- [ ] ORPHA and OMIM IDs provided
### Phase 3: Gene Panel
- [ ] ≥5 genes prioritized (or all from top diseases)
- [ ] Evidence level for each gene (ClinGen)
- [ ] Expression validation performed
- [ ] Testing strategy recommended
### Phase 4: Variant Interpretation (if applicable)
- [ ] ClinVar classification retrieved
- [ ] gnomAD frequency checked
- [ ] ACMG criteria applied
- [ ] Classification justified
### Phase 5: Structure Analysis (if applicable)
- [ ] Structure predicted (if VUS)
- [ ] pLDDT confidence reported
- [ ] Domain impact assessed
- [ ] Structural evidence summarized
### Phase 6: Recommendations
- [ ] ≥3 next steps listed
- [ ] Specialist referrals suggested
- [ ] Family screening addressed
---
## Fallback Chains
| Primary Tool | Fallback 1 | Fallback 2 |
|--------------|------------|------------|
| `Orphanet_search_by_hpo` | `OMIM_search` | PubMed phenotype search |
| `ClinVar_get_variant` | `gnomAD_get_variant` | VEP annotation |
| `NvidiaNIM_alphafold2` | `alphafold_get_prediction` | UniProt features |
| `GTEx_expression` | `HPA_expression` | Tissue-specific literature |
| `gnomAD_get_variant` | `ExAC_frequencies` | 1000 Genomes |
---
## Tool Reference
See [TOOLS_REFERENCE.md](TOOLS_REFERENCE.md) for complete tool documentation.Related Skills
tooluniverse-target-research
Gather comprehensive biological target intelligence from 9 parallel research paths covering protein info, structure, interactions, pathways, expression, variants, drug interactions, and literature. Features collision-aware searches, evidence grading (T1-T4), explicit Open Targets coverage, and mandatory completeness auditing. Use when users ask about drug targets, proteins, genes, or need target validation, druggability assessment, or comprehensive target profiling.
tooluniverse-protein-therapeutic-design
Design novel protein therapeutics (binders, enzymes, scaffolds) using AI-guided de novo design. Uses RFdiffusion for backbone generation, ProteinMPNN for sequence design, ESMFold/AlphaFold2 for validation. Use when asked to design protein binders, therapeutic proteins, or engineer protein function.
tooluniverse-pharmacovigilance
Analyze drug safety signals from FDA adverse event reports, label warnings, and pharmacogenomic data. Calculates disproportionality measures (PRR, ROR), identifies serious adverse events, assesses pharmacogenomic risk variants. Use when asked about drug safety, adverse events, post-market surveillance, or risk-benefit assessment.
tooluniverse-network-pharmacology
Construct and analyze compound-target-disease networks for drug repurposing, polypharmacology discovery, and systems pharmacology. Builds multi-layer networks from ChEMBL, OpenTargets, STRING, DrugBank, Reactome, FAERS, and 60+ other ToolUniverse tools. Calculates Network Pharmacology Scores (0-100), identifies repurposing candidates, predicts mechanisms, and analyzes polypharmacology. Use when users ask about drug repurposing via network analysis, multi-target drug effects, compound-target-disease networks, systems pharmacology, or polypharmacology.
tooluniverse-drug-target-validation
Comprehensive computational validation of drug targets for early-stage drug discovery. Evaluates targets across 10 dimensions (disambiguation, disease association, druggability, chemical matter, clinical precedent, safety, pathway context, validation evidence, structural insights, validation roadmap) using 60+ ToolUniverse tools. Produces a quantitative Target Validation Score (0-100) with GO/NO-GO recommendation. Use when users ask about target validation, druggability assessment, target prioritization, or "is X a good drug target for Y?"
tooluniverse-drug-research
Generates comprehensive drug research reports with compound disambiguation, evidence grading, and mandatory completeness sections. Covers identity, chemistry, pharmacology, targets, clinical trials, safety, pharmacogenomics, and ADMET properties. Use when users ask about drugs, medications, therapeutics, or need drug profiling, safety assessment, or clinical development research.
tooluniverse-drug-repurposing
Identify drug repurposing candidates using ToolUniverse for target-based, compound-based, and disease-driven strategies. Searches existing drugs for new therapeutic indications by analyzing targets, bioactivity, safety profiles, and literature evidence. Use when exploring drug repurposing opportunities, finding new indications for approved drugs, or when users mention drug repositioning, off-label uses, or therapeutic alternatives.
tooluniverse-drug-drug-interaction
Comprehensive drug-drug interaction (DDI) prediction and risk assessment. Analyzes interaction mechanisms (CYP450, transporters, pharmacodynamic), severity classification, clinical evidence grading, and provides management strategies. Supports single drug pairs, polypharmacy analysis (3+ drugs), and alternative drug recommendations. Use when users ask about drug interactions, medication safety, polypharmacy risks, or need DDI assessment for clinical decision support.
tooluniverse-chemical-safety
Comprehensive chemical safety and toxicology assessment integrating ADMET-AI predictions, CTD toxicogenomics, FDA label safety data, DrugBank safety profiles, and STITCH chemical-protein interactions. Performs predictive toxicology (AMES, DILI, LD50, carcinogenicity), organ/system toxicity profiling, chemical-gene-disease relationship mapping, regulatory safety extraction, and environmental hazard assessment. Use when asked about chemical toxicity, drug safety profiling, ADMET properties, environmental health risks, chemical hazard assessment, or toxicogenomic analysis.
tooluniverse-chemical-compound-retrieval
Retrieves chemical compound information from PubChem and ChEMBL with disambiguation, cross-referencing, and quality assessment. Creates comprehensive compound profiles with identifiers, properties, bioactivity, and drug information. Use when users need chemical data, drug information, or mention PubChem CID, ChEMBL ID, SMILES, InChI, or compound names.
tooluniverse-binder-discovery
Discover novel small molecule binders for protein targets using structure-based and ligand-based approaches. Creates actionable reports with candidate compounds, ADMET profiles, and synthesis feasibility. Use when users ask to find small molecules for a target, identify novel binders, perform virtual screening, or need hit-to-lead compound identification.
tooluniverse-antibody-engineering
Comprehensive antibody engineering and optimization for therapeutic development. Covers humanization, affinity maturation, developability assessment, and immunogenicity prediction. Use when asked to optimize antibodies, humanize sequences, or engineer therapeutic antibodies from lead to clinical candidate.