tooluniverse-variant-interpretation

Systematic clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Aggregates evidence from ClinVar, gnomAD, CIViC, UniProt, and PDB across ACMG criteria. Produces pathogenicity scores (0-100), clinical recommendations, and treatment implications. Use when interpreting genetic variants, classifying variants of uncertain significance (VUS), performing ACMG variant classification, or translating variant calls to clinical actionability.

Best use case

tooluniverse-variant-interpretation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using tooluniverse-variant-interpretation should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/tooluniverse-variant-interpretation/SKILL.md --create-dirs "https://raw.githubusercontent.com/wu-yc/LabClaw/main/skills/bio/tooluniverse-variant-interpretation/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/tooluniverse-variant-interpretation/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How tooluniverse-variant-interpretation Compares

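| Feature / Agent | tooluniverse-variant-interpretation | Standard Approach |
|-----------------|-------------------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |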

Frequently Asked Questions

What does this skill do?

Systematic clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Aggregates evidence from ClinVar, gnomAD, CIViC, UniProt, and PDB across ACMG criteria. Produces pathogenicity scores (0-100), clinical recommendations, and treatment implications. Use when interpreting genetic variants, classifying variants of uncertain significance (VUS), performing ACMG variant classification, or translating variant calls to clinical actionability.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

---
name: tooluniverse-variant-interpretation
description: Systematic clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Aggregates evidence from ClinVar, gnomAD, CIViC, UniProt, and PDB across ACMG criteria. Produces pathogenicity scores (0-100), clinical recommendations, and treatment implications. Use when interpreting genetic variants, classifying variants of uncertain significance (VUS), performing ACMG variant classification, or translating variant calls to clinical actionability.
---

# Clinical Variant Interpreter

Systematic variant interpretation skill using ToolUniverse - from raw variant calls to ACMG-classified clinical recommendations with structural impact analysis.

---

## Problem This Skill Solves

Clinical labs and researchers face critical challenges in variant interpretation:

1. **Variant classification uncertainty** - VUS (Variants of Uncertain Significance) comprise 40-60% of clinical variants
2. **Evidence aggregation burden** - Must integrate data from 10+ databases per variant
3. **Structural context missing** - Traditional annotation ignores 3D protein impact
4. **Clinical actionability unclear** - How does classification translate to patient care?

**This skill provides**: A systematic workflow that combines population databases, functional predictions, structural analysis (via AlphaFold2), and literature evidence into ACMG-compliant interpretations with clear clinical recommendations.

---

## Key Principles

1. **ACMG-Guided Classification** - Follow ACMG/AMP 2015 guidelines with explicit evidence codes
2. **Structural Evidence Integration** - Use AlphaFold2 for novel structural impact analysis
3. **Population Context** - gnomAD frequencies with ancestry-specific data
4. **Gene-Disease Validity** - ClinGen curation status for clinical relevance
5. **Actionable Output** - Clear recommendations, not just classifications
6. **English-first queries** - Always use English terms in tool calls (gene names, variant descriptions, disease names), even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language.

---

## Triggers

Use this skill when users:
- Ask about variant interpretation or classification
- Have VCF data needing clinical annotation
- Ask "what does this variant mean clinically?"
- Need ACMG classification for variants
- Want structural impact analysis for missense variants
- Ask about pathogenicity of specific variants

---

## Workflow Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                    VARIANT INTERPRETATION                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Phase 1: VARIANT IDENTITY                                       │
│  ├── Normalize variant notation (HGVS)                          │
│  ├── Map to gene, transcript, protein                           │
│  └── Get consequence type (missense, nonsense, etc.)            │
│                                                                  │
│  Phase 2: CLINICAL DATABASES                                     │
│  ├── ClinVar: Existing classifications                          │
│  ├── gnomAD: Population frequencies (all + ancestry)            │
│  ├── OMIM: Gene-disease associations                            │
│  ├── ClinGen: Gene validity + dosage sensitivity (ENHANCED)     │
│  │   └─ ClinGen_search_gene_validity, ClinGen_search_dosage     │
│  └── SpliceAI: Splice variant prediction (NEW)                  │
│                                                                  │
│  Phase 2.5: REGULATORY CONTEXT (NEW - for non-coding variants)  │
│  ├── ChIPAtlas: TF binding at position                          │
│  ├── ENCODE: Regulatory elements (enhancers, promoters)         │
│  ├── Conservation in regulatory regions                         │
│  └── Functional annotation of regulatory impact                 │
│                                                                  │
│  Phase 3: COMPUTATIONAL PREDICTIONS                              │
│  ├── SIFT/PolyPhen: Damaging predictions                        │
│  ├── CADD: Deleteriousness score                                │
│  ├── SpliceAI: Splice impact (if applicable)                    │
│  └── Conservation: Cross-species alignment                      │
│                                                                  │
│  Phase 4: STRUCTURAL ANALYSIS (for VUS/novel missense)          │
│  ├── Get protein structure (PDB or AlphaFold2)                  │
│  ├── Map variant to structure                                   │
│  ├── Assess domain/functional site impact                       │
│  └── Predict structural destabilization                         │
│                                                                  │
│  Phase 4.5: EXPRESSION CONTEXT (NEW)                            │
│  ├── CELLxGENE: Cell-type specific expression                   │
│  ├── Tissue relevance to phenotype                              │
│  └── Expression validation                                       │
│                                                                  │
│  Phase 5: LITERATURE EVIDENCE                                    │
│  ├── PubMed: Functional studies                                 │
│  ├── BioRxiv/MedRxiv: Recent preprints (NEW)                   │
│  ├── Case reports: Phenotype correlations                       │
│  └── Segregation data (if in literature)                        │
│                                                                  │
│  Phase 6: ACMG CLASSIFICATION                                    │
│  ├── Apply evidence codes (PVS1, PM2, PP3, etc.)               │
│  ├── Calculate classification                                   │
│  ├── Identify limiting factors                                  │
│  └── Generate clinical recommendations                          │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```
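
The conditional phases in the diagram can be sketched as a small selector. This is a minimal illustration, not part of ToolUniverse: the function name `select_phases` and the consequence labels are hypothetical, and it assumes Phase 2.5 applies only to non-coding variants and Phase 4 only to missense variants that are novel or VUS.

```python
def select_phases(consequence, clinvar_class=None):
    """Return the workflow phases that apply to one variant.

    consequence: label such as "missense", "nonsense", "synonymous",
        "intronic", "promoter", "utr" or "intergenic" (hypothetical labels).
    clinvar_class: existing ClinVar classification, or None if the variant is novel.
    """
    non_coding = {"intronic", "promoter", "utr", "intergenic"}

    # Phases the diagram lists without any condition
    phases = ["1", "2", "3", "4.5", "5", "6"]

    if consequence in non_coding:
        phases.append("2.5")  # regulatory context

    if consequence == "missense" and clinvar_class in (None, "VUS"):
        phases.append("4")  # structural analysis

    # Sort numerically so "2.5" lands between "2" and "3"
    return sorted(phases, key=float)
```

Calling `select_phases("intronic", "VUS")` adds Phase 2.5 but skips Phase 4, since structural analysis is reserved for missense changes in this sketch.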

---

## Phase Details

### Phase 1: Variant Identity & Normalization

**Goal**: Standardize variant notation and determine molecular consequence

**Tools**:
| Tool | Purpose |
|------|---------|
| `myvariant_query` | Get variant annotations from MyVariant.info |
| `Ensembl_get_variant_info` | Variant effect predictor data |
| `NCBI_gene_search` | Gene information |

**Key Information to Capture**:
- HGVS notation (c. and p.)
- Gene symbol and Ensembl ID
- Transcript (canonical/MANE Select)
- Consequence type
- Amino acid change (for missense)
- Exon/intron location
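
One way to hold the fields listed above is a small record type. The sketch below is an assumption-laden illustration: `VariantIdentity` and `short_aa_change` are hypothetical names, not ToolUniverse objects, and the converter only handles simple substitutions written in three-letter HGVS protein notation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Three-letter to one-letter amino acid codes, plus Ter for a stop codon
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}


@dataclass
class VariantIdentity:
    """Phase 1 output: the normalized identity of one variant."""
    hgvs_c: str
    hgvs_p: Optional[str]
    gene_symbol: str
    ensembl_gene_id: str
    transcript: str               # canonical / MANE Select transcript
    consequence: str              # e.g. "missense"
    aa_change: Optional[str] = None
    location: Optional[str] = None  # exon or intron label


def short_aa_change(hgvs_p):
    """Convert 'p.Arg123His' to 'R123H'; return None for anything else."""
    match = re.fullmatch(r"p\.\(?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})\)?", hgvs_p)
    if not match:
        return None
    ref, pos, alt = match.groups()
    if ref not in THREE_TO_ONE or alt not in THREE_TO_ONE:
        return None
    return f"{THREE_TO_ONE[ref]}{pos}{THREE_TO_ONE[alt]}"
```

The short form is the `'R123H'` style that the AlphaMissense helper later in this document expects.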

### Phase 2: Clinical Database Queries

**Goal**: Aggregate existing clinical knowledge

**Tools**:
| Tool | Purpose | Key Data |
|------|---------|----------|
| `clinvar_search` | Existing classifications | Classification, review status, submissions |
| `gnomad_search` | Population frequency | AF, ancestry-specific AFs, homozygotes |
| `OMIM_search`, `OMIM_get_entry` | Gene-disease | Inheritance, phenotypes |
| `ClinGen_search_gene_validity` | Curation status | Gene-disease validity level |
| `COSMIC_search_mutations` | **Somatic mutations (NEW)** | Cancer frequency, histology |
| `DisGeNET_search_gene` | **Gene-disease associations (NEW)** | Evidence scores, sources |

### 2.1 COSMIC for Somatic Context (NEW)

For cancer variants, check COSMIC for somatic mutation frequency:

```python
from collections import Counter


def get_somatic_context(tu, gene_symbol, variant_aa):
    """Get somatic mutation context from COSMIC."""
    
    # Search for specific mutation
    cosmic = tu.tools.COSMIC_search_mutations(
        operation="search",
        terms=f"{gene_symbol} {variant_aa}",
        max_results=20,
        genome_build=38
    )
    
    # Get all gene mutations for context
    gene_mutations = tu.tools.COSMIC_get_mutations_by_gene(
        operation="get_by_gene",
        gene=gene_symbol,
        max_results=100
    )
    
    # Determine if it's a hotspot
    mutation_counts = Counter(m['MutationAA'] for m in gene_mutations.get('results', []))
    is_hotspot = variant_aa in [m[0] for m in mutation_counts.most_common(10)]
    
    return {
        'cosmic_hits': cosmic.get('results', []),
        'is_somatic_hotspot': is_hotspot,
        'cancer_types': [m['PrimarySite'] for m in cosmic.get('results', [])],
        'total_cosmic_count': cosmic.get('total_count', 0)
    }
```

### 2.2 OMIM Gene-Disease Context (NEW)

```python
def get_omim_context(tu, gene_symbol):
    """Get OMIM gene-disease associations."""
    
    # Search OMIM for gene
    search = tu.tools.OMIM_search(
        operation="search",
        query=gene_symbol,
        limit=5
    )
    
    omim_data = []
    for entry in search.get('data', {}).get('entries', []):
        mim = entry.get('mimNumber')
        
        # Get detailed entry
        details = tu.tools.OMIM_get_entry(
            operation="get_entry",
            mim_number=str(mim)
        )
        
        # Get clinical synopsis
        synopsis = tu.tools.OMIM_get_clinical_synopsis(
            operation="get_clinical_synopsis",
            mim_number=str(mim)
        )
        
        omim_data.append({
            'mim_number': mim,
            'title': details.get('data', {}).get('titles', {}),
            'inheritance': synopsis.get('data', {}).get('inheritance'),
            'clinical_features': synopsis.get('data', {})
        })
    
    return omim_data
```

### 2.3 DisGeNET Gene-Disease Evidence (NEW)

```python
def get_disgenet_context(tu, gene_symbol, variant_rsid=None):
    """Get gene-disease associations from DisGeNET."""
    
    # Gene-disease associations
    gda = tu.tools.DisGeNET_search_gene(
        operation="search_gene",
        gene=gene_symbol,
        limit=20
    )
    
    # Variant-disease associations (if rsID available)
    vda = None
    if variant_rsid:
        vda = tu.tools.DisGeNET_get_vda(
            operation="get_vda",
            variant=variant_rsid,
            limit=20
        )
    
    return {
        'gene_associations': gda.get('data', {}).get('associations', []),
        'variant_associations': vda.get('data', {}).get('associations', []) if vda else []
    }
```

### 2.4 ClinGen Gene Validity & Dosage Sensitivity (NEW)

ClinGen provides authoritative curation of gene-disease relationships:

```python
def get_clingen_evidence(tu, gene_symbol):
    """
    Get ClinGen gene validity and dosage sensitivity data.
    CRITICAL for ACMG classification - establishes gene-disease validity.
    """
    
    # 1. Gene-disease validity (Definitive/Strong/Moderate/Limited)
    validity = tu.tools.ClinGen_search_gene_validity(gene=gene_symbol)
    
    validity_data = []
    if validity.get('data'):
        for entry in validity.get('data', []):
            validity_data.append({
                'disease': entry.get('Disease Label'),
                'classification': entry.get('Classification'),  # Definitive, Strong, etc.
                'inheritance': entry.get('Inheritance'),
                'mondo_id': entry.get('Disease ID (MONDO)')
            })
    
    # 2. Dosage sensitivity (haploinsufficiency, triplosensitivity)
    dosage = tu.tools.ClinGen_search_dosage_sensitivity(gene=gene_symbol)
    
    dosage_data = {}
    if dosage.get('data'):
        for entry in dosage.get('data', []):
            dosage_data = {
                'haploinsufficiency_score': entry.get('Haploinsufficiency Score'),
                'triplosensitivity_score': entry.get('Triplosensitivity Score'),
                'disease': entry.get('Disease')
            }
            break  # Usually one entry per gene
    
    # 3. Clinical actionability (for incidental findings context)
    actionability = tu.tools.ClinGen_search_actionability(gene=gene_symbol)
    
    return {
        'gene_validity': validity_data,
        'dosage_sensitivity': dosage_data,
        'actionability': actionability.get('data', {}),
        'has_definitive_validity': any(v['classification'] == 'Definitive' for v in validity_data),
        'is_haploinsufficient': str(dosage_data.get('haploinsufficiency_score')) == '3'  # score may arrive as int or str
    }
```

**ClinGen Validity Levels** (for ACMG PM1/PP4):
| Classification | Meaning | ACMG Impact |
|----------------|---------|-------------|
| **Definitive** | Multiple concordant studies | Strong gene-disease support |
| **Strong** | Extensive evidence | Moderate-strong support |
| **Moderate** | Some evidence | Moderate support |
| **Limited** | Minimal evidence | Weak support, use caution |
| **Disputed** | Conflicting evidence | Do not use for classification |
| **Refuted** | Evidence against | Gene NOT associated |

**Dosage Sensitivity Scores** (for CNV interpretation):
| Score | Meaning | Interpretation |
|-------|---------|----------------|
| **3** | Sufficient evidence | Haploinsufficiency/triplosensitivity established |
| **2** | Emerging evidence | Some support, not definitive |
| **1** | Little evidence | Minimal support |
| **0** | No evidence | Unknown |

### 2.5 SpliceAI Splice Variant Prediction (NEW)

An estimated ~15% of pathogenic variants affect splicing. SpliceAI is a widely used deep-learning predictor of splice impact:

```python
def get_spliceai_prediction(tu, chrom, pos, ref, alt, genome="38"):
    """
    Get SpliceAI splice effect predictions.
    
    Delta scores:
    - DS_AG: Acceptor gain
    - DS_AL: Acceptor loss  
    - DS_DG: Donor gain
    - DS_DL: Donor loss
    
    Thresholds:
    - ≥0.8: High pathogenicity (strong PP3)
    - 0.5-0.8: Moderate (supporting PP3)
    - 0.2-0.5: Low (weak evidence)
    - <0.2: Likely benign
    """
    
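    # Format variant for SpliceAI (strip any existing "chr" prefix so it is not doubled)
    chrom = str(chrom)
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    variant = f"chr{chrom}-{pos}-{ref}-{alt}"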
    
    # Get full splice predictions
    result = tu.tools.SpliceAI_predict_splice(
        variant=variant,
        genome=genome
    )
    
    if result.get('data'):
        max_score = result['data'].get('max_delta_score', 0)
        interpretation = result['data'].get('interpretation', '')
        
        # Determine ACMG support
        if max_score >= 0.8:
            acmg = 'PP3 (strong) - high splice impact'
        elif max_score >= 0.5:
            acmg = 'PP3 (supporting) - moderate splice impact'
        elif max_score >= 0.2:
            acmg = 'PP3 (weak) - possible splice impact'
        else:
            acmg = 'BP7 (if synonymous) - splice benign'
        
        return {
            'max_delta_score': max_score,
            'interpretation': interpretation,
            'acmg_support': acmg,
            'scores': result['data'].get('scores', [])
        }
    return None

def quick_splice_check(tu, variant, genome="38"):
    """Quick triage using max delta score only."""
    
    result = tu.tools.SpliceAI_get_max_delta(
        variant=variant,
        genome=genome
    )
    
    return result.get('data', {})
```

**When to Use SpliceAI**:
- **Intronic variants** near splice sites (±50bp)
- **Synonymous variants** (may still affect splicing)
- **Exonic variants** near splice junctions
- **Variants creating cryptic splice sites**

**Report Section for Splice Variants**:
```markdown
### Splice Impact Analysis (SpliceAI)

| Score Type | Value | Position | Interpretation |
|------------|-------|----------|----------------|
| DS_AG | 0.02 | +15 | Acceptor gain unlikely |
| DS_AL | 0.85 | -2 | **High acceptor loss** |
| DS_DG | 0.01 | +8 | Donor gain unlikely |
| DS_DL | 0.03 | +1 | Donor loss unlikely |

**Max Delta Score**: 0.85 (DS_AL)
**Interpretation**: High impact - likely disrupts acceptor site
**ACMG Support**: PP3 (strong) for splice-altering effect

*Source: SpliceAI via `SpliceAI_predict_splice`*
```
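
The "Max Delta Score" line in the report can be derived from the four delta scores with a few lines of code. This is a minimal sketch: `summarize_delta_scores` is a hypothetical helper, the input is assumed to be a plain dict keyed by score type, and the cutoffs are the ones listed in the `get_spliceai_prediction` docstring.

```python
def summarize_delta_scores(scores):
    """Pick the largest SpliceAI delta score and bucket it.

    scores: dict such as {"DS_AG": 0.02, "DS_AL": 0.85, "DS_DG": 0.01, "DS_DL": 0.03}
    """
    # The score type with the highest value drives the interpretation
    score_type, max_score = max(scores.items(), key=lambda item: item[1])

    if max_score >= 0.8:
        level = "high"
    elif max_score >= 0.5:
        level = "moderate"
    elif max_score >= 0.2:
        level = "low"
    else:
        level = "likely benign"

    return {"max_delta_score": max_score, "score_type": score_type, "level": level}
```

With the example values from the report table, the helper selects `DS_AL` at 0.85 and labels it `high`.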

**ClinVar Classification Map**:
| ClinVar | Interpretation |
|---------|----------------|
| Pathogenic | Disease-causing |
| Likely pathogenic | 90%+ confidence pathogenic |
| VUS | Uncertain significance |
| Likely benign | 90%+ confidence benign |
| Benign | Not disease-causing |
| Conflicting | Multiple interpretations |

**gnomAD Thresholds (for rare disease)**:
| Frequency | ACMG Code | Interpretation |
|-----------|-----------|----------------|
| Absent | PM2_Supporting | Absent from controls |
| <0.00001 | PM2_Supporting | Extremely rare |
| <0.0001 | - | Rare (use with caution) |
| >0.01 | BS1 | More common than expected for a rare disease |
| >0.05 | BA1 | Stand-alone benign (ACMG/AMP 2015 default cutoff) |
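
The frequency table can be applied mechanically once an allele frequency is in hand. The sketch below is illustrative only: `frequency_evidence` is a hypothetical name, `None` is assumed to mean "absent from gnomAD", and it treats BA1 as applying above 5% (the ACMG/AMP 2015 default) and BS1 between 1% and 5%. Both cutoffs should be adjusted to the disease in question.

```python
def frequency_evidence(allele_frequency):
    """Map a gnomAD allele frequency to an ACMG frequency code.

    Returns None when the frequency falls in a range with no code assigned.
    """
    # Absent from the database (None) or observed zero times
    if allele_frequency is None or allele_frequency == 0:
        return "PM2_Supporting"
    if allele_frequency > 0.05:
        return "BA1"  # assumed stand-alone benign cutoff
    if allele_frequency > 0.01:
        return "BS1"
    if allele_frequency < 0.00001:
        return "PM2_Supporting"
    # Rare but not extremely rare: no frequency code, use with caution
    return None
```

For ancestry-specific data, the same function could be run on the highest population frequency rather than the global one; that choice is left to the caller.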

**COSMIC Somatic Evidence (NEW)**:
| COSMIC Finding | Interpretation | ACMG Support |
|----------------|----------------|--------------|
| Recurrent hotspot (>100 samples) | Known oncogenic driver | PS3 (functional) |
| Moderate frequency (10-100) | Likely oncogenic | PM1 (hotspot) |
| Rare somatic (<10) | Unknown significance | No support |

**DisGeNET Score Interpretation (NEW)**:
| GDA Score | Evidence Level | ACMG Support |
|-----------|----------------|--------------|
| >0.7 | Strong | PP4 (phenotype) |
| 0.4-0.7 | Moderate | Supporting |
| <0.4 | Weak | Insufficient |
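
The COSMIC and DisGeNET tables above reduce to two threshold lookups. This is a sketch under stated assumptions: both function names are hypothetical, the inputs are assumed to be a COSMIC sample count and a DisGeNET GDA score already extracted from the tool results, and boundary values (exactly 10, 100, 0.4 or 0.7) are assigned to the middle band.

```python
def cosmic_support(sample_count):
    """Return the ACMG code the COSMIC table assigns to a sample count."""
    if sample_count > 100:
        return "PS3"   # recurrent hotspot
    if sample_count >= 10:
        return "PM1"   # moderate frequency
    return None        # rare somatic: no support


def disgenet_level(gda_score):
    """Return the evidence level the DisGeNET table assigns to a GDA score."""
    if gda_score > 0.7:
        return "strong"
    if gda_score >= 0.4:
        return "moderate"
    return "weak"
```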

### Phase 2.5: Regulatory Context (NEW - for Non-Coding Variants)

**Goal**: Assess regulatory impact for non-coding, intronic, and promoter variants

**When to Apply**:
- Intronic variants (not splice site)
- Promoter variants
- 5'UTR / 3'UTR variants
- Intergenic variants near disease genes

**Tools**:
| Tool | Purpose | Key Data |
|------|---------|----------|
| `ChIPAtlas_enrichment_analysis` | TF binding at position | Bound TFs, cell types |
| `ChIPAtlas_get_peak_data` | ChIP-seq peaks | Peak coordinates, scores |
| `ENCODE_search_experiments` | Regulatory elements | Enhancers, promoters, DHS |
| `ENCODE_get_experiment` | Experiment details | Assay type, targets |

**Regulatory Impact Assessment**:

```python
def assess_regulatory_impact(tu, variant_position, gene_symbol):
    """Assess regulatory impact of non-coding variant."""
    
    # Check TF binding at position
    tf_binding = tu.tools.ChIPAtlas_enrichment_analysis(
        gene=gene_symbol,
        cell_type="all"
    )
    
    # Get ChIP-seq peaks overlapping variant
    peaks = tu.tools.ChIPAtlas_get_peak_data(
        gene=gene_symbol,
        experiment_type="TF"
    )
    
    # Search ENCODE for regulatory annotations
    encode_data = tu.tools.ENCODE_search_experiments(
        assay_title="ATAC-seq",
        biosample="all"
    )
    
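    # Assess if variant disrupts TF binding.
    # NOTE: check_motif_disruption is a local helper, not a ToolUniverse tool;
    # it must be defined separately before this function is called.
    binding_disrupted = check_motif_disruption(variant_position, peaks)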
    
    return {
        'tf_binding': tf_binding,
        'regulatory_peaks': peaks,
        'encode_annotations': encode_data,
        'likely_regulatory': binding_disrupted
    }
```
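
`assess_regulatory_impact` calls `check_motif_disruption`, which is not defined in this document. A minimal stand-in is sketched below. It only tests whether the variant position falls inside a peak interval, which is a proxy for overlap and not a true motif-scoring method. The peak format (a list of dicts with `start` and `end` on the variant's chromosome) and the `flank` parameter are assumptions; the raw ChIPAtlas tool result would need to be converted to that shape first.

```python
def check_motif_disruption(variant_position, peaks, flank=0):
    """Return True if the variant lies within any peak interval.

    variant_position: integer genomic coordinate of the variant.
    peaks: iterable of dicts with integer 'start' and 'end' keys (assumed
        inclusive, same chromosome and genome build as the variant).
    flank: optional padding in base pairs added to both sides of each peak.
    """
    for peak in peaks or []:
        start, end = peak.get("start"), peak.get("end")
        if start is None or end is None:
            continue  # skip malformed entries rather than fail
        if start - flank <= variant_position <= end + flank:
            return True
    return False
```

A positional overlap supports the "Moderate impact" category at most; claiming disruption of a specific TF motif would need a position weight matrix comparison, which this sketch does not attempt.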

**Regulatory Impact Categories**:
| Category | Criteria | ACMG Support |
|----------|----------|--------------|
| **High impact** | Disrupts known TF binding motif | PP3 (supporting) |
| **Moderate impact** | In active regulatory region | Consider context |
| **Low impact** | No regulatory annotation | No support |

**Output for Report**:
```markdown
### 2.5 Regulatory Context (for Non-Coding Variants)

| Feature | Finding | Significance |
|---------|---------|--------------|
| Variant location | Intron 5, 120bp from exon 6 | Not canonical splice |
| TF binding site | CTCF binding peak (ChIPAtlas) | May affect insulation |
| ENCODE annotation | Active enhancer (H3K27ac) | Regulatory function |
| Conservation | PhyloP = 2.8 | Moderate conservation |

**Regulatory Interpretation**: Variant overlaps CTCF binding site in active enhancer region. Potential impact on gene regulation.

*Source: ChIPAtlas, ENCODE*
```

### Phase 3: Computational Predictions (ENHANCED)

**Goal**: Assess in silico pathogenicity predictions using state-of-the-art models

**Tools**:
| Tool | Purpose | Score Range |
|------|---------|-------------|
| `CADD_get_variant_score` | **Deleteriousness score (NEW API)** | PHRED 0-99 |
| `AlphaMissense_get_variant_score` | **DeepMind pathogenicity (NEW)** | 0-1 |
| `EVE_get_variant_score` | **Evolutionary pathogenicity (NEW)** | 0-1 |
| `myvariant_query` | Aggregated predictions | SIFT, PolyPhen |
| `Ensembl_get_variant_info` | VEP predictions | SIFT, PolyPhen |

### 3.1 CADD Deleteriousness Scoring (NEW)

```python
def get_cadd_score(tu, chrom, pos, ref, alt):
    """Get CADD deleteriousness score for a variant."""
    
    result = tu.tools.CADD_get_variant_score(
        chrom=str(chrom),
        pos=pos,
        ref=ref,
        alt=alt,
        version="GRCh38-v1.7"
    )
    
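    if result.get('status') == 'success':
        phred = result['data'].get('phred_score')
        if phred is None:
            return None  # no score available for this variant

        # PHRED >= 20 supports PP3, < 15 supports BP4, anything between is neutral
        if phred >= 20:
            acmg = 'PP3'
        elif phred < 15:
            acmg = 'BP4'
        else:
            acmg = 'neutral'

        return {
            'score': phred,
            'interpretation': result['data'].get('interpretation'),
            'acmg_support': acmg
        }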
    return None
```

### 3.2 AlphaMissense Pathogenicity (NEW)

DeepMind's AlphaMissense provides state-of-the-art missense pathogenicity prediction:

```python
def get_alphamissense_score(tu, uniprot_id, variant):
    """
    Get AlphaMissense pathogenicity score.
    variant format: 'R123H' or 'p.R123H'
    
    Thresholds:
    - Pathogenic: score > 0.564
    - Ambiguous: 0.34-0.564
    - Benign: score < 0.34
    """
    
    result = tu.tools.AlphaMissense_get_variant_score(
        uniprot_id=uniprot_id,
        variant=variant
    )
    
    if result.get('status') == 'success' and result.get('data'):
        score = result['data'].get('pathogenicity_score')
        classification = result['data'].get('classification')
        
        # Map to ACMG
        if classification == 'pathogenic':
            acmg = 'PP3 (strong)'  # AlphaMissense has high accuracy
        elif classification == 'benign':
            acmg = 'BP4 (strong)'
        else:
            acmg = 'neutral'
        
        return {
            'score': score,
            'classification': classification,
            'acmg_support': acmg
        }
    return None
```

### 3.3 EVE Evolutionary Prediction (NEW)

EVE uses unsupervised learning on evolutionary data:

```python
def get_eve_score(tu, chrom, pos, ref, alt):
    """
    Get EVE evolutionary pathogenicity score.
    
    Threshold: >0.5 indicates likely pathogenic
    """
    
    result = tu.tools.EVE_get_variant_score(
        chrom=str(chrom),
        pos=pos,
        ref=ref,
        alt=alt
    )
    
    if result.get('status') == 'success':
        eve_scores = result['data'].get('eve_scores', [])
        if eve_scores:
            best_score = eve_scores[0]  # first entry returned by the tool
            score = best_score.get('eve_score')
            if score is None:
                return None  # no usable score for this variant
            return {
                'score': score,
                'classification': best_score.get('classification'),
                'gene': best_score.get('gene_symbol'),
                'acmg_support': 'PP3' if score > 0.5 else 'BP4'
            }
    return None
```

### 3.4 Integrated Prediction Strategy

**For VUS (Variants of Uncertain Significance)**, combine multiple predictors:

```python
def comprehensive_pathogenicity_assessment(tu, variant_info):
    """
    Combine all prediction tools for robust classification.
    """
    chrom = variant_info['chrom']
    pos = variant_info['pos']
    ref = variant_info['ref']
    alt = variant_info['alt']
    uniprot_id = variant_info.get('uniprot_id')
    aa_change = variant_info.get('aa_change')  # e.g., 'R123H'
    
    predictions = {}
    
    # 1. CADD (works for all variant types)
    cadd = get_cadd_score(tu, chrom, pos, ref, alt)
    if cadd:
        predictions['cadd'] = cadd
    
    # 2. AlphaMissense (missense only, requires UniProt ID)
    if uniprot_id and aa_change:
        am = get_alphamissense_score(tu, uniprot_id, aa_change)
        if am:
            predictions['alphamissense'] = am
    
    # 3. EVE (missense only)
    eve = get_eve_score(tu, chrom, pos, ref, alt)
    if eve:
        predictions['eve'] = eve
    
    # Consensus assessment
    damaging_count = sum(1 for p in predictions.values() 
                         if 'PP3' in p.get('acmg_support', ''))
    benign_count = sum(1 for p in predictions.values() 
                       if 'BP4' in p.get('acmg_support', ''))
    
    if damaging_count >= 2 and benign_count == 0:
        consensus = 'likely_damaging'
        acmg = 'PP3 (multiple predictors concordant)'
    elif benign_count >= 2 and damaging_count == 0:
        consensus = 'likely_benign'
        acmg = 'BP4 (multiple predictors concordant)'
    else:
        consensus = 'uncertain'
        acmg = 'neutral (discordant or insufficient predictions)'
    
    return {
        'predictions': predictions,
        'consensus': consensus,
        'acmg_recommendation': acmg
    }
```

**Prediction Interpretation** (Updated):
| Predictor | Damaging | Benign |
|-----------|----------|--------|
| **AlphaMissense** | >0.564 | <0.34 |
| **CADD PHRED** | ≥20 (top 1%) | <15 |
| **EVE** | >0.5 | ≤0.5 |
| SIFT | <0.05 | ≥0.05 |
| PolyPhen2 | >0.85 (probably) | <0.15 (benign) |
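
The cutoffs in this table can be encoded in a single lookup, which also makes the SIFT and PolyPhen2 rows usable alongside the helpers above. This is a minimal sketch: `classify_prediction` and its lowercase predictor keys are hypothetical, and scores that fall between the damaging and benign cutoffs are reported as `ambiguous`.

```python
def classify_prediction(predictor, score):
    """Return 'damaging', 'benign' or 'ambiguous' using the table's cutoffs."""
    if predictor == "alphamissense":
        if score > 0.564:
            return "damaging"
        if score < 0.34:
            return "benign"
    elif predictor == "cadd_phred":
        if score >= 20:
            return "damaging"
        if score < 15:
            return "benign"
    elif predictor == "eve":
        return "damaging" if score > 0.5 else "benign"
    elif predictor == "sift":
        # SIFT is inverted: low scores are damaging
        return "damaging" if score < 0.05 else "benign"
    elif predictor == "polyphen2":
        if score > 0.85:
            return "damaging"
        if score < 0.15:
            return "benign"
    else:
        raise ValueError(f"unknown predictor: {predictor}")
    return "ambiguous"
```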

**ACMG Application** (Enhanced):
- **PP3**: Multiple concordant damaging predictions (AlphaMissense + CADD + EVE agreement = strong PP3)
- **BP4**: Multiple concordant benign predictions
- **Note**: AlphaMissense's classification cutoffs were chosen to give about 90% precision on ClinVar variants

### Phase 4: Structural Analysis

**Goal**: Assess protein structural impact (especially for VUS)

**Tools**:
| Tool | Purpose |
|------|---------|
| `PDB_search_by_uniprot` | Find experimental structures |
| `NvidiaNIM_alphafold2` | Predict structure if no PDB |
| `alphafold_get_prediction` | Get AlphaFold DB structure |
| `InterPro_get_protein_domains` | Domain annotations |
| `UniProt_get_protein_function` | Functional sites |

**Structural Impact Categories**:

| Impact Level | Description | ACMG Support |
|--------------|-------------|--------------|
| **Critical** | Active site, catalytic residue | PM1 (strong) |
| **High** | Buried residue, disulfide, structural core | PM1 (moderate) |
| **Moderate** | Domain interface, binding site | PM1 (supporting) |
| **Low** | Surface, flexible region | No support |

**Using AlphaFold2 for VUS**:
```
1. Get wildtype structure (PDB or AlphaFold)
2. Identify residue location:
   - pLDDT at position (confidence)
   - Solvent accessibility
   - Secondary structure
3. Assess structural context:
   - Distance to functional sites
   - Interaction partners
   - Conservation in structure
4. Predict impact:
   - Side chain burial
   - Hydrogen bond disruption
   - Charge changes in buried positions
```
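
Once the residue has been annotated following the steps above, the impact categories can be assigned with a simple rule cascade. The sketch below is an illustration, not the skill's own implementation: `structural_impact` and the annotation keys are hypothetical, the annotations are assumed to be computed elsewhere, and the pLDDT cutoff of 70 follows the AlphaFold DB "confident" band and is used here only to flag low-confidence regions.

```python
def structural_impact(residue, min_plddt=70.0):
    """Assign a structural impact level from pre-computed residue annotations.

    residue: dict with optional boolean keys 'active_site', 'buried',
        'disulfide', 'interface', 'binding_site' and an optional float
        'plddt' (AlphaFold per-residue confidence). All keys are assumptions.
    """
    # Check categories from most to least severe; the first match wins
    if residue.get("active_site"):
        level, acmg = "Critical", "PM1 (strong)"
    elif residue.get("buried") or residue.get("disulfide"):
        level, acmg = "High", "PM1 (moderate)"
    elif residue.get("interface") or residue.get("binding_site"):
        level, acmg = "Moderate", "PM1 (supporting)"
    else:
        level, acmg = "Low", None

    # Flag predictions made in poorly modelled regions
    plddt = residue.get("plddt")
    low_confidence = plddt is not None and plddt < min_plddt

    return {"level": level, "acmg_support": acmg, "low_confidence": low_confidence}
```

A `low_confidence` flag does not change the level; it signals that the structural evidence should be weighted cautiously or confirmed against an experimental PDB structure.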

### Phase 4.5: Expression Context (NEW)

**Goal**: Validate gene expression in disease-relevant tissues/cells

**Tools**:
| Tool | Purpose | Key Data |
|------|---------|----------|
| `CELLxGENE_get_expression_data` | Cell-type specific expression | TPM per cell type |
| `CELLxGENE_get_cell_metadata` | Cell type annotations | Tissue, disease state |
| `GTEx_get_median_gene_expression` | Tissue expression | TPM per tissue |

**Expression Validation**:

```python
def validate_expression_context(tu, gene_symbol, phenotype_tissues):
    """Validate gene is expressed in phenotype-relevant tissues."""
    
    # Single-cell expression
    sc_expression = tu.tools.CELLxGENE_get_expression_data(
        gene=gene_symbol,
        tissue=phenotype_tissues[0] if phenotype_tissues else "all"
    )
    
    # Bulk tissue expression (GTEx)
    gtex = tu.tools.GTEx_get_median_gene_expression(
        gene=gene_symbol
    )
    
    # Check expression in relevant tissues
    relevant_expression = {
        tissue: gtex.get(tissue, 0)
        for tissue in phenotype_tissues
    }
    
    return {
        'single_cell': sc_expression,
        'gtex': relevant_expression,
        'expressed_in_phenotype_tissue': any(v > 1 for v in relevant_expression.values())
    }
```

**Why it matters**:
- Confirms gene is expressed where disease manifests
- Supports PP4 (phenotype-specific) if highly restricted expression
- Can challenge classification if not expressed in affected tissue

**Output for Report**:
```markdown
### 4.5 Expression Context

| Tissue | Expression (TPM) | Relevance |
|--------|------------------|-----------|
| Heart | 45.2 | ✓ Primary disease tissue |
| Skeletal muscle | 38.7 | ✓ Secondary involvement |
| Liver | 2.1 | Low expression |
| Brain | 0.5 | Not expressed |

**Single-Cell Analysis (CELLxGENE)**:
- **Cardiomyocytes**: High expression (TPM=85)
- **Cardiac fibroblasts**: Low expression (TPM=5)

**Interpretation**: Gene highly expressed in cardiomyocytes, supporting cardiac phenotype association.

*Source: GTEx, CELLxGENE Census*
```

### Phase 5: Literature Evidence (ENHANCED)

**Goal**: Find functional studies, case reports, and cutting-edge preprints

**Tools**:
| Tool | Purpose | Coverage |
|------|---------|----------|
| `PubMed_search` | Peer-reviewed studies | Comprehensive |
| `EuropePMC_search` | Additional literature | Europe PMC |
| `BioRxiv_search_preprints` | Biology preprints | Recent findings |
| `MedRxiv_search_preprints` | Clinical preprints | Clinical studies |
| `openalex_search_works` | Citation analysis | Impact metrics |
| `SemanticScholar_search_papers` | AI-ranked search | Relevance |

**Search Strategies**:
```python
def comprehensive_literature_search(tu, gene, variant, phenotype):
    """Search across all literature sources."""
    
    # 1. PubMed: Peer-reviewed
    pubmed = tu.tools.PubMed_search(
        query=f'"{gene}" AND ("{variant}" OR functional)',
        max_results=30
    )
    
    # 2. BioRxiv: Recent preprints
    biorxiv = tu.tools.BioRxiv_search_preprints(
        query=f"{gene} {phenotype}",
        limit=10
    )
    
    # 3. MedRxiv: Clinical preprints
    medrxiv = tu.tools.MedRxiv_search_preprints(
        query=f"{gene} variant {phenotype}",
        limit=10
    )
    
    # 4. Citation analysis
    key_papers = pubmed[:5]  # Top papers
    for paper in key_papers:
        citations = tu.tools.openalex_search_works(
            query=paper['title'],
            limit=1
        )
        paper['citation_count'] = citations[0].get('cited_by_count', 0) if citations else 0
    
    return {
        'pubmed': pubmed,
        'preprints': biorxiv + medrxiv,
        'key_papers_with_citations': key_papers
    }
```

**Search Queries**:
```
# Gene + variant specific
"{GENE} AND ({HGVS_p} OR {AA_change})"

# Functional studies
"{GENE} AND (functional OR functional study OR mutagenesis)"

# Clinical reports
"{GENE} AND (case report OR patient) AND {phenotype}"

# Preprint-specific (substitute the current year to surface recent preprints)
"{GENE} genetics {YEAR}"
```
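The query templates above can be filled in programmatically. This sketch assumes the search backend accepts boolean operators and quoted phrases (PubMed does; other tools may differ). `build_search_queries` is a hypothetical helper, and multi-word phrases are quoted here so they are matched as phrases rather than as separate terms.

```python
def build_search_queries(gene, hgvs_p, aa_change, phenotype):
    """Fill the literature query templates for one variant.

    Hypothetical helper: the dictionary keys are illustrative names for the
    search strategies, not identifiers used elsewhere in this skill.
    """
    return {
        "variant_specific": f"{gene} AND ({hgvs_p} OR {aa_change})",
        "functional": f'{gene} AND (functional OR "functional study" OR mutagenesis)',
        "clinical": f'{gene} AND ("case report" OR patient) AND "{phenotype}"',
    }


queries = build_search_queries("TP53", "p.Arg273His", "R273H", "Li-Fraumeni syndrome")
for strategy, query in queries.items():
    print(f"{strategy}: {query}")
```

Running at least two of these strategies satisfies the literature-search minimum stated later in this document.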

**⚠️ Preprint Warning**: Always flag preprints as NOT peer-reviewed in reports.
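One way to make this flag hard to lose is to attach it to the record before anything is written to the report. The sketch below assumes each search hit is a `dict` with a `title` key; the field names `peer_reviewed` and `report_label`, and both helper functions, are hypothetical.

```python
PREPRINT_LABEL = "[PREPRINT, NOT PEER-REVIEWED]"


def flag_preprints(records):
    """Return copies of preprint records carrying an explicit review flag."""
    return [
        {**record, "peer_reviewed": False, "report_label": PREPRINT_LABEL}
        for record in records
    ]


def format_citation(record):
    """Format a title for the report, appending the preprint label if present."""
    title = record.get("title", "Untitled")
    label = record.get("report_label", "")
    return f"{title} {label}".strip()
```

Applying `flag_preprints` to the bioRxiv and medRxiv results before merging them with PubMed hits keeps the distinction visible downstream.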

**Evidence Types**:
| Evidence | ACMG Code | Weight |
|----------|-----------|--------|
| Functional study (null) | PS3 | Strong |
| Functional study (reduced) | PS3_Moderate | Moderate |
| Case reports with segregation | PP1 | Supporting to Moderate |
| Co-occurrence with pathogenic | BP2 | Supporting against |

### Phase 6: ACMG Classification

**Goal**: Systematic classification with explicit evidence

**ACMG Evidence Codes**:

**Pathogenic**:
| Code | Strength | Description |
|------|----------|-------------|
| PVS1 | Very Strong | Null variant in gene where LOF is mechanism |
| PS1 | Strong | Same amino acid change as known pathogenic |
| PS3 | Strong | Well-established functional studies |
| PM1 | Moderate | Mutational hot spot / functional domain |
| PM2 | Moderate | Absent from controls |
| PM5 | Moderate | Different missense at same residue as pathogenic |
| PP3 | Supporting | Multiple computational predictions |
| PP5 | Supporting | Reputable source reports pathogenic |

**Benign**:
| Code | Strength | Description |
|------|----------|-------------|
| BA1 | Stand-alone | MAF >5% |
| BS1 | Strong | MAF greater than expected |
| BS3 | Strong | Functional studies show no effect |
| BP4 | Supporting | Multiple computational predictions benign |
| BP7 | Supporting | Synonymous with no splice impact |
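The frequency-based codes in the tables above (PM2, BA1, BS1) can be assigned from a single allele-frequency value. This is a rough sketch: `frequency_criteria` is a hypothetical helper, and `expected_max_af` (the highest frequency compatible with the disorder) is an assumed, disease-specific input that the caller must supply. Only the 5% BA1 cutoff comes from the table.

```python
def frequency_criteria(max_af, expected_max_af, ba1_threshold=0.05):
    """Map a population allele frequency to frequency-based ACMG codes.

    max_af: highest allele frequency observed across populations, or None
            if the variant is absent from the population database.
    expected_max_af: assumed disease-specific ceiling (caller-supplied).
    """
    if max_af is None or max_af == 0:
        return ["PM2"]  # absent from controls
    if max_af > ba1_threshold:
        return ["BA1"]  # stand-alone benign: MAF > 5%
    if max_af > expected_max_af:
        return ["BS1"]  # more common than expected for the disorder
    return []
```

A variant that is present but rare returns an empty list, since none of the three codes applies under these simplified rules.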

**Classification Algorithm**:
| Classification | Evidence Required |
|----------------|-------------------|
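| Pathogenic | 1 Very Strong + (≥1 Strong, OR ≥2 Moderate, OR 1 Moderate + 1 Supporting, OR ≥2 Supporting); OR ≥2 Strong; OR 1 Strong + (≥3 Moderate, OR 2 Moderate + ≥2 Supporting, OR 1 Moderate + ≥4 Supporting) |
| Likely Pathogenic | 1 Very Strong + 1 Moderate; OR 1 Strong + 1-2 Moderate; OR 1 Strong + ≥2 Supporting; OR ≥3 Moderate; OR 2 Moderate + ≥2 Supporting; OR 1 Moderate + ≥4 Supporting |
| Likely Benign | 1 Strong + 1 Supporting; OR ≥2 Supporting |
| Benign | 1 Stand-alone (BA1); OR ≥2 Strong |
| VUS | Criteria above not met, or pathogenic and benign criteria conflict |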
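The combining rules can be expressed as a small function. The sketch below follows the full rule set of the 2015 ACMG/AMP guideline, which the table summarizes. It covers only the codes listed in the tables above at their default strengths, and it does not handle strength-modified codes such as `PS3_Moderate`. `classify` and the two strength dictionaries are hypothetical names.

```python
from collections import Counter

# Default strengths for the codes listed in the tables above
PATHOGENIC_STRENGTH = {
    "PVS1": "very_strong",
    "PS1": "strong", "PS3": "strong",
    "PM1": "moderate", "PM2": "moderate", "PM5": "moderate",
    "PP3": "supporting", "PP5": "supporting",
}
BENIGN_STRENGTH = {
    "BA1": "stand_alone",
    "BS1": "strong", "BS3": "strong",
    "BP4": "supporting", "BP7": "supporting",
}


def _pathogenic_call(counts):
    vs, s = counts["very_strong"], counts["strong"]
    m, p = counts["moderate"], counts["supporting"]
    if (
        (vs >= 1 and (s >= 1 or m >= 2 or (m >= 1 and p >= 1) or p >= 2))
        or s >= 2
        or (s >= 1 and (m >= 3 or (m >= 2 and p >= 2) or (m >= 1 and p >= 4)))
    ):
        return "Pathogenic"
    if (
        (vs >= 1 and m >= 1)
        or (s >= 1 and m >= 1)
        or (s >= 1 and p >= 2)
        or m >= 3
        or (m >= 2 and p >= 2)
        or (m >= 1 and p >= 4)
    ):
        return "Likely Pathogenic"
    return None


def _benign_call(counts):
    if counts["stand_alone"] >= 1 or counts["strong"] >= 2:
        return "Benign"
    if (counts["strong"] >= 1 and counts["supporting"] >= 1) or counts["supporting"] >= 2:
        return "Likely Benign"
    return None


def classify(codes):
    """Combine applied ACMG codes into a five-tier classification."""
    unknown = [c for c in codes if c not in PATHOGENIC_STRENGTH and c not in BENIGN_STRENGTH]
    if unknown:
        raise ValueError(f"Unsupported codes: {unknown}")
    path = Counter(PATHOGENIC_STRENGTH[c] for c in codes if c in PATHOGENIC_STRENGTH)
    benign = Counter(BENIGN_STRENGTH[c] for c in codes if c in BENIGN_STRENGTH)
    p_call, b_call = _pathogenic_call(path), _benign_call(benign)
    if p_call and b_call:
        return "VUS"  # conflicting pathogenic and benign evidence
    return p_call or b_call or "VUS"
```

`Counter` returns 0 for strengths that were never applied, so each rule can be written as a direct comparison. The output is a starting point for expert review, not a substitute for it.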

---

## Output Structure

### Report Sections

```markdown
# Variant Interpretation Report: {GENE} {VARIANT}

## Executive Summary
- **Variant**: {HGVS notation}
- **Gene**: {gene symbol}
- **Classification**: {Pathogenic/Likely Pathogenic/VUS/Likely Benign/Benign}
- **Evidence Strength**: {strong/moderate/limited}
- **Key Finding**: {one-sentence summary}

## 1. Variant Identity
{gene, transcript, protein change, consequence}

## 2. Population Data
{gnomAD frequencies, ancestry breakdown}

## 3. Clinical Database Evidence
{ClinVar, ClinGen, OMIM}

## 4. Computational Predictions
{SIFT, PolyPhen, CADD scores}

## 5. Structural Analysis
{Domain location, functional site proximity, AlphaFold confidence}

## 6. Literature Evidence
{Functional studies, case reports}

## 7. ACMG Classification
{Evidence codes applied, classification rationale}

## 8. Clinical Recommendations
{Testing, management, family screening}

## 9. Limitations & Uncertainties
{Missing data, conflicting evidence}

## Data Sources
{All tools and databases queried}
```

---

## Evidence Grading

### Classification Confidence

| Symbol | Confidence | Evidence Level |
|--------|----------------|----------------|
| ★★★ | High confidence | Multiple independent lines |
| ★★☆ | Moderate confidence | Some supporting evidence |
| ★☆☆ | Limited confidence | Minimal evidence |
| VUS | Uncertain | Insufficient data |

### Structural Impact Confidence

| pLDDT Range | Interpretation |
|-------------|----------------|
| >90 | Very high confidence in position |
| 70-90 | High confidence |
| 50-70 | Low confidence (often loops or flexible regions) |
| <50 | Very low confidence (often disordered) |
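To look up the right row of this table for a given residue, a score can be mapped to its range. In this small sketch, `plddt_band` is a hypothetical helper, and assigning the boundary values 90, 70, and 50 to the lower-numbered row is an assumption, since the table does not say which row owns the boundaries.

```python
def plddt_band(plddt):
    """Return the pLDDT range (as written in the table) that contains a score."""
    if not 0 <= plddt <= 100:
        raise ValueError(f"pLDDT must be between 0 and 100, got {plddt}")
    if plddt > 90:
        return ">90"
    if plddt >= 70:
        return "70-90"
    if plddt >= 50:
        return "50-70"
    return "<50"
```

Returning the range rather than a verbal label keeps the helper aligned with the table even if the wording of the interpretation column changes.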

---

## Special Scenarios

### Scenario 1: Novel Missense VUS

**Additional workflow**:
1. Check whether other pathogenic variants are reported at the same residue (supports PM5)
2. Get the AlphaFold2 structure
3. Analyze:
   - Is the residue buried or surface-exposed?
   - Which secondary structure element contains it?
   - How close is it to active or binding sites?
   - Conservation across species?
4. Apply PM1 if in functional domain
5. Apply PP3 if predictions concordant
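Step 5 can be sketched as a concordance check. This version assumes predictor outputs have already been normalized to the strings `"damaging"` or `"benign"`, requires unanimous agreement, and reuses the three-predictor minimum stated under Quantified Minimums. `computational_evidence` is a hypothetical helper, and unanimity is a deliberately conservative assumption rather than a rule from the source.

```python
def computational_evidence(predictions, min_predictors=3):
    """Return 'PP3', 'BP4', or None from normalized in silico predictions.

    predictions: {predictor name: 'damaging' | 'benign'}; any other value
    (for example 'uncertain') is ignored.
    """
    calls = [c for c in predictions.values() if c in ("damaging", "benign")]
    if len(calls) < min_predictors:
        return None  # too few usable predictors to apply either code
    if all(c == "damaging" for c in calls):
        return "PP3"
    if all(c == "benign" for c in calls):
        return "BP4"
    return None  # discordant predictions: apply neither code
```

Discordant predictors return `None`, so neither PP3 nor BP4 is applied when the tools disagree.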

### Scenario 2: Truncating Variant

**Additional workflow**:
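1. Check whether loss of function (LOF) is an established disease mechanism for the gene
2. Determine whether the variant escapes nonsense-mediated decay (NMD), e.g. because it lies in the last exon or within roughly the last 50 nucleotides of the penultimate exon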
3. Check for alternative isoforms
4. Review ClinGen LOF curation

**PVS1 Application**:
| Scenario | PVS1 Strength |
|----------|---------------|
| Canonical LOF gene, NMD predicted | Very Strong |
| LOF gene, last exon | Moderate |
| Non-LOF gene | Not applicable |
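The table above translates directly into a decision function. In this sketch, `pvs1_strength` is a hypothetical helper; truncating variants that the table does not cover (NMD escape outside the last exon) are routed to manual review rather than guessed.

```python
def pvs1_strength(lof_is_mechanism, nmd_predicted, in_last_exon):
    """Return the PVS1 strength for a truncating variant, per the table above.

    Returns None when PVS1 is not applicable, and 'Manual review' for
    combinations the table does not cover.
    """
    if not lof_is_mechanism:
        return None  # non-LOF gene: PVS1 not applicable
    if nmd_predicted:
        return "Very Strong"
    if in_last_exon:
        return "Moderate"
    return "Manual review"
```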

### Scenario 3: Splice Variant

**Additional workflow**:
1. Check SpliceAI scores (if available)
2. Determine canonical splice site distance
3. Review for in-frame skipping potential
4. Check for cryptic splice activation
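Step 2 can be read off the HGVS coding notation, where intronic positions are written as an offset from the nearest exon boundary (for example `c.123+1G>A`). The sketch below parses that offset and treats ±1 and ±2 as the canonical splice sites. The function names are hypothetical, and the regular expression handles only simple `c.` descriptions, reading the first position of a range.

```python
import re

# Matches the intronic offset in descriptions such as c.123+1G>A or c.456-2A>G
_INTRONIC_OFFSET = re.compile(r"c\.[-*]?\d+([+-]\d+)")


def splice_site_offset(hgvs_c):
    """Return the signed intronic offset, or None for exonic descriptions."""
    match = _INTRONIC_OFFSET.search(hgvs_c)
    return int(match.group(1)) if match else None


def is_canonical_splice_site(hgvs_c):
    """True when the variant sits at the +1/+2 donor or -1/-2 acceptor position."""
    offset = splice_site_offset(hgvs_c)
    return offset is not None and abs(offset) in (1, 2)
```

Variants further into the intron (for example `+5`) are not canonical by this definition and depend more heavily on SpliceAI or similar predictions.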

---

## Quantified Minimums

| Section | Requirement |
|---------|-------------|
| Population frequency | gnomAD overall + ≥3 ancestry groups |
| Predictions | ≥3 computational predictors |
| Literature search | ≥2 search strategies |
| ACMG codes | All applicable codes listed |
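These minimums lend themselves to an automated pre-delivery check. The sketch below assumes the collected evidence is held in a `dict`; every key name (`gnomad_overall_af`, `ancestry_afs`, `predictors`, `search_strategies`, `acmg_codes`) is hypothetical. It can only confirm that ACMG codes were recorded, not that every applicable code was considered.

```python
def check_minimums(report):
    """Return a list of unmet minimum requirements (empty if all are met)."""
    problems = []
    if report.get("gnomad_overall_af") is None:
        problems.append("Population frequency: gnomAD overall frequency missing")
    if len(report.get("ancestry_afs", {})) < 3:
        problems.append("Population frequency: fewer than 3 ancestry groups")
    if len(report.get("predictors", {})) < 3:
        problems.append("Predictions: fewer than 3 computational predictors")
    if len(report.get("search_strategies", [])) < 2:
        problems.append("Literature search: fewer than 2 search strategies")
    if "acmg_codes" not in report:
        problems.append("ACMG codes: no code list recorded")
    return problems
```

An empty return value means the report meets the quantified minimums. It says nothing about the quality of the underlying evidence.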

---

## NVIDIA NIM Integration

### When to Use AlphaFold2 for Variants

**Use Case**: VUS missense variants where structural context aids interpretation

**Workflow**:
```python
# 1. Get protein sequence
protein_seq = tu.tools.UniProt_get_protein_sequence(accession=uniprot_id)

# 2. Get/predict structure
try:
    pdb_hits = tu.tools.PDB_search_by_uniprot(uniprot_id=uniprot_id)
    structure = tu.tools.PDB_get_structure(pdb_id=pdb_hits[0]['pdb_id'])
except Exception:
    # No usable experimental structure (empty result or failed lookup):
    # fall back to predicting with AlphaFold2
    structure = tu.tools.NvidiaNIM_alphafold2(
        sequence=protein_seq['sequence'],
        algorithm="mmseqs2"
    )

# 3. Analyze variant position
# - Extract pLDDT at residue position
# - Calculate solvent accessibility
# - Check for nearby functional sites
```
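The first item in step 3 can be sketched for PDB-format output. AlphaFold models store per-residue pLDDT in the B-factor column of each `ATOM` record. Whether `NvidiaNIM_alphafold2` returns PDB-format text is an assumption here, and `plddt_at_residue` is a hypothetical helper that relies on the fixed column layout of the PDB format.

```python
def plddt_at_residue(pdb_text, residue_number, chain_id="A"):
    """Return the pLDDT of one residue from an AlphaFold PDB-format model.

    Reads the B-factor field (columns 61-66) of the residue's C-alpha atom.
    Returns None if the residue is not found.
    """
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":  # atom name, columns 13-16
            continue
        if line[21] != chain_id:  # chain identifier, column 22
            continue
        if int(line[22:26]) == residue_number:  # residue number, columns 23-26
            return float(line[60:66])
    return None
```

For experimental structures from the PDB the same column holds a crystallographic B-factor instead, so this helper should only be applied to predicted models.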

**Structural Features to Report**:
- pLDDT at variant position
- Secondary structure (helix/sheet/coil)
- Solvent accessibility (buried/exposed)
- Distance to active site (if applicable)
- Interactions disrupted (H-bonds, salt bridges)

---

## Report File Naming

```
{GENE}_{VARIANT}_interpretation_report.md

Examples:
BRCA1_c.5266dupC_interpretation_report.md
TP53_p.R273H_interpretation_report.md
```
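A small helper can apply this naming pattern consistently. `report_filename` is a hypothetical function, and replacing characters outside letters, digits, `.`, `_`, and `-` with a hyphen is an assumption added here, because HGVS strings such as `c.76A>T` contain characters that are awkward in file names. The source does not specify how to handle them.

```python
import re


def report_filename(gene, variant):
    """Build the report file name for a gene/variant pair."""
    # Assumption: characters unsafe in file names are replaced with '-'
    safe_variant = re.sub(r"[^A-Za-z0-9._-]", "-", variant)
    return f"{gene}_{safe_variant}_interpretation_report.md"


print(report_filename("BRCA1", "c.5266dupC"))
print(report_filename("TP53", "p.R273H"))
```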

---

## Clinical Recommendations Framework

### For Pathogenic/Likely Pathogenic

| Disease Context | Recommendations |
|-----------------|-----------------|
| Cancer predisposition | Enhanced screening, risk-reducing options |
| Pharmacogenomics | Drug dosing adjustment |
| Carrier status | Reproductive counseling |
| Predictive testing | Family cascade screening |

### For VUS

| Action | Details |
|--------|---------|
| Clinical management | Do not use for medical decisions |
| Follow-up | Reinterpret in 1-2 years |
| Research | Functional studies if available |
| Family | Segregation data valuable |

### For Benign/Likely Benign

| Action | Details |
|--------|---------|
| Clinical | Not expected to cause disease |
| Family | No cascade testing needed |
| Documentation | Include in report for completeness |

---

## See Also

- `CHECKLIST.md` - Pre-delivery verification
- `EXAMPLES.md` - Sample interpretations
- `TOOLS_REFERENCE.md` - Tool parameters and fallbacks
