tooluniverse-drug-target-validation

Comprehensive computational validation of drug targets for early-stage drug discovery. Evaluates targets across 10 dimensions (disambiguation, disease association, druggability, chemical matter, clinical precedent, safety, pathway context, validation evidence, structural insights, validation roadmap) using 60+ ToolUniverse tools. Produces a quantitative Target Validation Score (0-100) with GO/NO-GO recommendation. Use when users ask about target validation, druggability assessment, target prioritization, or "is X a good drug target for Y?"

42 stars

byZaoqu-Liu

View on GitHub Installation ↓

Best use case

tooluniverse-drug-target-validation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using tooluniverse-drug-target-validation should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/tooluniverse-drug-target-validation/SKILL.md --create-dirs "https://raw.githubusercontent.com/Zaoqu-Liu/ScienceClaw/main/skills/tooluniverse-drug-target-validation/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/tooluniverse-drug-target-validation/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How tooluniverse-drug-target-validation Compares

Feature / Agent	tooluniverse-drug-target-validation	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Drug Target Validation Pipeline

Validate drug target hypotheses using multi-dimensional computational evidence before committing to wet-lab work. Produces a quantitative Target Validation Score (0-100) with priority tier classification and GO/NO-GO recommendation.

**KEY PRINCIPLES**:
1. **Report-first approach** - Create report file FIRST, then populate progressively
2. **Target disambiguation FIRST** - Resolve all identifiers before analysis
3. **Evidence grading** - Grade all evidence as T1 (experimental) to T4 (computational)
4. **Disease-specific** - Tailor analysis to disease context when provided
5. **Modality-aware** - Consider small molecule vs biologics tractability
6. **Safety-first** - Prominently flag safety concerns early
7. **Quantitative scoring** - Every dimension scored numerically (0-100 composite)
8. **Negative results documented** - "No data" is data; empty sections are failures
9. **Source references** - Every statement must cite tool/database
10. **Completeness checklist** - Mandatory section showing analysis coverage
11. **English-first queries** - Always use English terms in tool calls. Respond in user's language

---

## When to Use This Skill

Apply when users:
- Ask "Is [target] a good drug target for [disease]?"
- Need target validation or druggability assessment
- Want to compare targets for drug discovery prioritization
- Ask about safety risks of modulating a target
- Need chemical starting points for target validation
- Ask about pathway context for a target
- Need a GO/NO-GO recommendation for a target
- Want a comprehensive target dossier for investment decisions

**NOT for** (use other skills instead):
- General target biology overview -> Use `tooluniverse-target-research`
- Drug compound profiling -> Use `tooluniverse-drug-research`
- Variant interpretation -> Use `tooluniverse-variant-interpretation`
- Disease research -> Use `tooluniverse-disease-research`

---

## Input Parameters

| Parameter | Required | Description | Example |
|-----------|----------|-------------|---------|
| **target** | Yes | Gene symbol, protein name, or UniProt ID | `EGFR`, `P00533`, `Epidermal growth factor receptor` |
| **disease** | No | Disease/indication for context | `Non-small cell lung cancer`, `Pancreatic cancer` |
| **modality** | No | Preferred therapeutic modality | `small molecule`, `antibody`, `protein therapeutic`, `PROTAC` |

---

## Target Validation Scoring System

### Score Components (Total: 0-100)

**Disease Association** (0-30 points):
- Genetic evidence: 0-10 (GWAS, rare variants, somatic mutations)
- Literature evidence: 0-10 (publications, clinical studies)
- Pathway evidence: 0-10 (disease pathway involvement)

**Druggability** (0-25 points):
- Structural tractability: 0-10 (structure quality, binding pockets)
- Chemical matter: 0-10 (known compounds, bioactivity data)
- Target class: 0-5 (validated target family bonus)

**Safety Profile** (0-20 points):
- Tissue expression selectivity: 0-5 (expression in critical tissues)
- Genetic validation: 0-10 (knockout phenotypes, human genetics)
- Known adverse events: 0-5 (safety signals from modulators)

**Clinical Precedent** (0-15 points):
- Approved drugs: 15 (strong precedent, validated target)
- Clinical trials: 10 (moderate precedent)
- Preclinical compounds: 5 (weak precedent)
- None: 0 (novel target)

**Validation Evidence** (0-10 points):
- Functional studies: 0-5 (CRISPR, siRNA, biochemical)
- Disease models: 0-5 (animal models, patient data)

### Priority Tiers

| Score | Tier | Recommendation |
|-------|------|----------------|
| **80-100** | Tier 1 | Highly validated - proceed with confidence |
| **60-79** | Tier 2 | Good target - needs focused validation |
| **40-59** | Tier 3 | Moderate risk - significant validation needed |
| **0-39** | Tier 4 | High risk - consider alternatives |

### Evidence Grading System

| Tier | Symbol | Criteria | Examples |
|------|--------|----------|----------|
| **T1** | [T1] | Direct mechanistic, human clinical proof | FDA-approved drug, crystal structure with mechanism, patient mutation |
| **T2** | [T2] | Functional studies, model organism | siRNA phenotype, mouse KO, biochemical assay, CRISPR screen |
| **T3** | [T3] | Association, screen hits, computational | GWAS hit, DepMap essentiality, expression correlation |
| **T4** | [T4] | Mention, review, text-mined, predicted | Review article, database annotation, AlphaFold prediction |

---

## Phase 0: Target Disambiguation & ID Resolution (ALWAYS FIRST)

**Objective**: Resolve target to ALL needed identifiers before any analysis.

### Resolution Strategy

```python
# Step 1: Determine input type and get initial identifiers
# If gene symbol (e.g., "EGFR"):
mygene = tu.tools.MyGene_query_genes(query="EGFR", species="human", fields="symbol,name,ensembl.gene,uniprot.Swiss-Prot,entrezgene")
# Extract: ensembl_id, uniprot_id, entrez_id, symbol, name

# If UniProt ID (e.g., "P00533"):
uniprot = tu.tools.UniProt_get_entry_by_accession(accession="P00533")
# Extract: gene names, Ensembl xrefs, function

# Step 2: Resolve Ensembl ID and get versioned ID for GTEx
ensembl = tu.tools.ensembl_lookup_gene(gene_id=ensembl_id, species="homo_sapiens")
# CRITICAL: species parameter is REQUIRED
# CRITICAL: Response is wrapped in {status, data, url, content_type} - access via ensembl['data']
ensembl_data = ensembl.get('data', ensembl) if isinstance(ensembl, dict) else ensembl
# Extract: version for versioned_id (e.g., "ENSG00000146648.18")

# Step 3: Get Ensembl cross-references
xrefs = tu.tools.ensembl_get_xrefs(id=ensembl_id)
# Extract: HGNC, UniProt, EntrezGene mappings

# Step 4: Get OpenTargets target info
ot_target = tu.tools.OpenTargets_get_target_id_description_by_name(targetName="EGFR")
# Verify ensemblId matches

# Step 5: Get ChEMBL target ID
chembl_targets = tu.tools.ChEMBL_search_targets(pref_name__contains="EGFR", organism="Homo sapiens", limit=5)
# Extract: target_chembl_id for later use

# Step 6: Get UniProt function summary
function_info = tu.tools.UniProt_get_function_by_accession(accession=uniprot_id)
# Returns list of strings (NOT dict)

# Step 7: Get alternative names for collision detection
alt_names = tu.tools.UniProt_get_alternative_names_by_accession(accession=uniprot_id)
```

### Identifier Resolution Output

```markdown
## 1. Target Identity

| Database | Identifier | Verified |
|----------|-----------|----------|
| Gene Symbol | EGFR | Yes |
| Full Name | Epidermal growth factor receptor | Yes |
| Ensembl | ENSG00000146648 | Yes |
| Ensembl (versioned) | ENSG00000146648.18 | Yes |
| UniProt | P00533 | Yes |
| Entrez Gene | 1956 | Yes |
| ChEMBL | CHEMBL203 | Yes |
| HGNC | HGNC:3236 | Yes |

**Protein Function**: [from UniProt_get_function_by_accession]
**Subcellular Location**: [from UniProt_get_subcellular_location_by_accession]
**Target Class**: [from OpenTargets_get_target_classes_by_ensemblID]
```

### Known Parameter Corrections

| Tool | WRONG Parameter | CORRECT Parameter |
|------|-----------------|-------------------|
| `ensembl_lookup_gene` | `id` | `gene_id` (+ `species="homo_sapiens"` REQUIRED) |
| `Reactome_map_uniprot_to_pathways` | `uniprot_id` | `id` |
| `ensembl_get_xrefs` | `gene_id` | `id` |
| `GTEx_get_median_gene_expression` | `gencode_id` only | `gencode_id` + `operation="median"` |
| `OpenTargets_*` | `ensemblID` (uppercase) | `ensemblId` (camelCase) |
| `OpenTargets_get_publications_*` | `ensemblId` | `entityId` |
| `OpenTargets_get_associated_drugs_by_target_ensemblID` | `ensemblId` only | `ensemblId` + `size` (REQUIRED) |
| `MyGene_query_genes` | `q` | `query` |
| `PubMed_search_articles` | returns `{articles: [...]}` | returns **plain list** of dicts |
| `UniProt_get_function_by_accession` | returns dict | returns **list of strings** |
| `HPA_get_rna_expression_by_source` | `ensembl_id` | `gene_name` + `source_type` + `source_name` (ALL required) |
| `alphafold_get_prediction` | `uniprot_accession` | `qualifier` |
| `drugbank_get_safety_*` | simple params | `query`, `case_sensitive`, `exact_match`, `limit` (ALL required) |

---

## Phase 1: Disease Association Evidence (0-30 points)

**Objective**: Quantify the strength of target-disease association from genetic, literature, and pathway evidence.

### 1A. OpenTargets Disease Associations (Primary)

```python
# Get ALL disease associations for target
diseases = tu.tools.OpenTargets_get_diseases_phenotypes_by_target_ensembl(ensemblId=ensembl_id)

# If specific disease provided, get detailed evidence
if disease_name:
    disease_info = tu.tools.OpenTargets_get_disease_id_description_by_name(diseaseName=disease_name)
    efo_id = disease_info.get('id')  # e.g., "EFO_0003060"

    evidence = tu.tools.OpenTargets_target_disease_evidence(
        efoId=efo_id, ensemblId=ensembl_id
    )

    # Get evidence by data source for detailed breakdown
    datasource_evidence = tu.tools.OpenTargets_get_evidence_by_datasource(
        efoId=efo_id, ensemblId=ensembl_id,
        datasourceIds=["ot_genetics_portal", "eva", "gene2phenotype", "genomics_england", "uniprot_literature"],
        size=100
    )
```

### 1B. GWAS Genetic Evidence

```python
# GWAS associations for target gene
gwas_snps = tu.tools.gwas_get_snps_for_gene(mapped_gene=gene_symbol, size=50)

# If specific disease, search for trait-specific associations
if disease_name:
    gwas_studies = tu.tools.gwas_search_studies(query=disease_name, size=20)
```

### 1C. Constraint Scores (gnomAD)

```python
# Genetic constraint - intolerance to loss of function
constraints = tu.tools.gnomad_get_gene_constraints(gene_symbol=gene_symbol)
# Extract: pLI, LOEUF, missense_z, pRec
# High pLI (>0.9) = highly intolerant to LoF = likely essential
```

### 1D. Literature Evidence

```python
# PubMed for target-disease association
articles = tu.tools.PubMed_search_articles(
    query=f'"{gene_symbol}" AND "{disease_name}" AND (target OR therapeutic OR inhibitor)',
    limit=50
)
# PubMed_search_articles returns a plain list of dicts

# OpenTargets publications
pubs = tu.tools.OpenTargets_get_publications_by_target_ensemblID(entityId=ensembl_id)
```

### Scoring Logic - Disease Association

```
Genetic Evidence (0-10):
  - GWAS hits for specific disease: +3 per significant locus (max 6)
  - Rare variant evidence (ClinVar pathogenic): +2
  - Somatic mutations in disease: +2
  - pLI > 0.9 (essential gene): +2

Literature Evidence (0-10):
  - >100 publications on target+disease: 10
  - 50-100 publications: 7
  - 10-50 publications: 5
  - 1-10 publications: 3
  - 0 publications: 0

Pathway Evidence (0-10):
  - OpenTargets overall score > 0.8: 10
  - Score 0.5-0.8: 7
  - Score 0.2-0.5: 4
  - Score < 0.2: 1
```

---

## Phase 2: Druggability Assessment (0-25 points)

**Objective**: Assess whether the target is amenable to therapeutic intervention.

### 2A. OpenTargets Tractability

```python
# Tractability assessment across modalities
tractability = tu.tools.OpenTargets_get_target_tractability_by_ensemblID(ensemblId=ensembl_id)
# Returns: label, modality (SM, AB, PR, OC), value (boolean/score)
# Modalities: Small Molecule, Antibody, PROTAC, Other Clinical
```

### 2B. Target Class & Family

```python
# Target classification (kinase, GPCR, ion channel, etc.)
target_classes = tu.tools.OpenTargets_get_target_classes_by_ensemblID(ensemblId=ensembl_id)

# Pharos target development level
pharos = tu.tools.Pharos_get_target(gene=gene_symbol)
# TDL: Tclin (approved drug) > Tchem (compounds) > Tbio (biology) > Tdark (unknown)

# DGIdb druggability categories
druggability = tu.tools.DGIdb_get_gene_druggability(genes=[gene_symbol])
```

### 2C. Structural Tractability

```python
# PDB structures available
if uniprot_id:
    uniprot_entry = tu.tools.UniProt_get_entry_by_accession(accession=uniprot_id)
    # Extract PDB cross-references from entry

# AlphaFold prediction
alphafold = tu.tools.alphafold_get_prediction(qualifier=uniprot_id)
alphafold_summary = tu.tools.alphafold_get_summary(qualifier=uniprot_id)

# For top PDB structures, analyze binding pockets
# ProteinsPlus DoGSiteScorer for pocket detection
for pdb_id in top_pdb_ids[:3]:
    pockets = tu.tools.ProteinsPlus_predict_binding_sites(pdb_id=pdb_id)
    # Returns predicted druggable pockets with scores
```

### 2D. Chemical Probes & Enabling Packages

```python
# Chemical probes (validated tool compounds)
probes = tu.tools.OpenTargets_get_chemical_probes_by_target_ensemblID(ensemblId=ensembl_id)

# Target Enabling Packages (TEPs)
teps = tu.tools.OpenTargets_get_target_enabling_packages_by_ensemblID(ensemblId=ensembl_id)
```

### Scoring Logic - Druggability

```
Structural Tractability (0-10):
  - High-res co-crystal structure with ligand: 10
  - PDB structure available, pockets detected: 7
  - AlphaFold only, confident pocket prediction: 5
  - AlphaFold low confidence / no structure: 2
  - No structural data: 0

Chemical Matter (0-10):
  - Known drug-like compounds (IC50 < 100nM): 10
  - Tool compounds (IC50 < 1uM): 7
  - HTS hits only (IC50 > 1uM): 4
  - No known ligands: 0

Target Class Bonus (0-5):
  - Validated druggable family (kinase, GPCR, nuclear receptor): 5
  - Enzyme, ion channel: 4
  - Protein-protein interaction, transporter: 2
  - Novel/unknown class: 0
```

---

## Phase 3: Known Modulators & Chemical Matter (Feeds into Phase 2 scoring)

**Objective**: Identify existing chemical starting points for target validation.

### 3A. ChEMBL Bioactivity

```python
# Search for ChEMBL target
chembl_targets = tu.tools.ChEMBL_search_targets(
    pref_name__contains=gene_symbol, organism="Homo sapiens", limit=10
)

# Get activities for best matching target
target_chembl_id = chembl_targets[0]['target_chembl_id']
activities = tu.tools.ChEMBL_get_target_activities(
    target_chembl_id__exact=target_chembl_id, limit=100
)
# Parse: compound IDs, pChEMBL values, activity types (IC50, Ki, Kd)
# Filter: potent compounds (pChEMBL >= 6.0 = IC50 <= 1uM)
```

### 3B. BindingDB Ligands

```python
# Experimental binding data
ligands = tu.tools.BindingDB_get_ligands_by_uniprot(
    uniprot=uniprot_id, affinity_cutoff=10000  # nM
)
# Returns: SMILES, affinity_type (Ki/IC50/Kd), affinity value, PMID
```

### 3C. PubChem Bioassays

```python
# HTS screening data
assays = tu.tools.PubChem_search_assays_by_target_gene(gene_symbol=gene_symbol)
# Get details for top assays
for aid in assay_ids[:5]:
    summary = tu.tools.PubChem_get_assay_summary(aid=str(aid))
    targets = tu.tools.PubChem_get_assay_targets(aid=str(aid))
    actives = tu.tools.PubChem_get_assay_active_compounds(aid=str(aid))
```

### 3D. Known Drugs Targeting This Protein

```python
# OpenTargets known drugs
drugs = tu.tools.OpenTargets_get_associated_drugs_by_target_ensemblID(
    ensemblId=ensembl_id, size=25
)

# ChEMBL drug mechanisms
drug_mechanisms = tu.tools.ChEMBL_search_mechanisms(
    target_chembl_id=target_chembl_id, limit=50
)

# Drug interaction databases
dgidb = tu.tools.DGIdb_get_gene_info(genes=[gene_symbol])
```

### Report Format - Chemical Matter

```markdown
### 4. Known Modulators & Chemical Matter

#### 4.1 Approved Drugs
| Drug | ChEMBL ID | Mechanism | Phase | Indication | Source |
|------|-----------|-----------|-------|------------|--------|
| Erlotinib | CHEMBL553 | Inhibitor | 4 | NSCLC | [T1] OpenTargets |
| Gefitinib | CHEMBL939 | Inhibitor | 4 | NSCLC | [T1] OpenTargets |

#### 4.2 ChEMBL Bioactivity Summary
**Total Activities**: 12,456 datapoints across 2,341 assays
**Most Potent Compound**: CHEMBL413456 (IC50 = 0.3 nM) [T1]
**Chemical Series**: 8 distinct scaffolds with pChEMBL >= 7.0
**Selectivity Data**: Available for 45 compounds (kinase panel)

#### 4.3 BindingDB Ligands
**Total Ligands**: 856 with measured affinity
**Best Affinity**: 0.1 nM (Ki)
**Affinity Distribution**: <1nM: 23, 1-10nM: 89, 10-100nM: 234, 100nM-1uM: 510

#### 4.4 Chemical Probes
| Probe | Source | Potency | Selectivity | Use |
|-------|--------|---------|-------------|-----|
| SGC-1234 | SGC | IC50=5nM | >100x | In vitro |
```

---

## Phase 4: Clinical Precedent (0-15 points)

**Objective**: Assess clinical validation from approved drugs and clinical trials.

### 4A. FDA-Approved Drugs

```python
# FDA label information
fda_moa = tu.tools.FDA_get_mechanism_of_action_by_drug_name(drug_name=gene_symbol)
fda_indications = tu.tools.FDA_get_indications_by_drug_name(drug_name=known_drug_name)

# DrugBank pharmacology
drugbank_targets = tu.tools.drugbank_get_targets_by_drug_name_or_drugbank_id(
    query=known_drug_name, case_sensitive=False, exact_match=False, limit=10
)

# DrugBank safety info
drugbank_safety = tu.tools.drugbank_get_safety_by_drug_name_or_drugbank_id(
    query=known_drug_name, case_sensitive=False, exact_match=False, limit=10
)
```

### 4B. Clinical Trials

```python
# Active clinical trials targeting this protein
trials = tu.tools.search_clinical_trials(
    query_term=gene_symbol,
    intervention=gene_symbol,
    pageSize=50
)

# If specific disease context
if disease_name:
    disease_trials = tu.tools.search_clinical_trials(
        query_term=gene_symbol,
        condition=disease_name,
        pageSize=50
    )
```

### 4C. Failed Programs (Learn from Failures)

```python
# Drug warnings and withdrawals
for drug_chembl_id in known_drug_ids:
    warnings = tu.tools.OpenTargets_get_drug_warnings_by_chemblId(chemblId=drug_chembl_id)
    adverse = tu.tools.OpenTargets_get_drug_adverse_events_by_chemblId(chemblId=drug_chembl_id)
```

### Scoring Logic - Clinical Precedent

```
Clinical Precedent (0-15):
  - FDA-approved drug for SAME disease: 15
  - FDA-approved drug for DIFFERENT disease: 12
  - Phase 3 clinical trial: 10
  - Phase 2 clinical trial: 7
  - Phase 1 clinical trial: 5
  - Preclinical compounds only: 3
  - No clinical development: 0

Adjustment factors:
  - Failed clinical program for safety: -3
  - Drug withdrawal: -5
  - Multiple approved drugs (validated class): +2
```

---

## Phase 5: Safety & Toxicity Considerations (0-20 points)

**Objective**: Identify safety risks from expression, genetics, and known adverse events.

### 5A. OpenTargets Safety Profile

```python
safety = tu.tools.OpenTargets_get_target_safety_profile_by_ensemblID(ensemblId=ensembl_id)
# Returns: safety liabilities, adverse effects, experimental toxicity
```

### 5B. Expression in Critical Tissues

```python
# GTEx tissue expression (identifies essential organ expression)
gtex = tu.tools.GTEx_get_median_gene_expression(
    operation="median", gencode_id=ensembl_versioned_id
)
# If empty, try unversioned ID

# HPA expression
# NOTE: HPA_get_rna_expression_by_source requires gene_name, source_type, source_name
hpa = tu.tools.HPA_search_genes_by_query(search_query=gene_symbol)
hpa_details = tu.tools.HPA_get_comprehensive_gene_details_by_ensembl_id(ensembl_id=ensembl_id)

# Check expression in safety-critical tissues
# Heart, liver, kidney, brain, bone marrow = high risk if target is expressed
```

### 5C. Knockout Phenotypes

```python
# Mouse model phenotypes
mouse_models = tu.tools.OpenTargets_get_biological_mouse_models_by_ensemblID(ensemblId=ensembl_id)

# Genetic constraint (proxy for essentiality)
constraints = tu.tools.gnomad_get_gene_constraints(gene_symbol=gene_symbol)
# High pLI = essential gene = potential safety concern
```

### 5D. Known Adverse Events from Target Modulation

```python
# For known drugs targeting this protein
for drug_name in known_drug_names:
    fda_adr = tu.tools.FDA_get_adverse_reactions_by_drug_name(drug_name=drug_name)
    fda_warnings = tu.tools.FDA_get_warnings_and_cautions_by_drug_name(drug_name=drug_name)
    fda_boxed = tu.tools.FDA_get_boxed_warning_info_by_drug_name(drug_name=drug_name)
    fda_contraindications = tu.tools.FDA_get_contraindications_by_drug_name(drug_name=drug_name)
```

### 5E. Homologs & Off-Target Risks

```python
# Paralogs (close family members that might be hit)
homologs = tu.tools.OpenTargets_get_target_homologues_by_ensemblID(ensemblId=ensembl_id)
# Paralogs with high sequence identity = selectivity challenge
```

### Scoring Logic - Safety

```
Tissue Expression Selectivity (0-5):
  - Target restricted to disease tissue: 5
  - Low expression in heart/liver/kidney/brain: 4
  - Moderate expression in 1-2 critical tissues: 2
  - High expression in multiple critical tissues: 0

Genetic Validation (0-10):
  - Mouse KO viable, no severe phenotype: 10
  - Mouse KO viable with mild phenotype: 7
  - Mouse KO has concerning phenotype: 3
  - Mouse KO lethal: 0
  - No KO data, low pLI (<0.5): 5
  - No KO data, high pLI (>0.9): 2

Known Adverse Events (0-5):
  - No known safety signals: 5
  - Mild, manageable ADRs: 3
  - Serious ADRs reported: 1
  - Black box warning or drug withdrawal: 0
```

---

## Phase 6: Pathway Context & Network Analysis

**Objective**: Understand the target's role in biological networks and disease pathways.

### 6A. Reactome Pathways

```python
# Map target to pathways
pathways = tu.tools.Reactome_map_uniprot_to_pathways(id=uniprot_id)

# Get pathway details for top pathways
for pathway in top_pathways[:5]:
    detail = tu.tools.Reactome_get_pathway(id=pathway['stId'])
    reactions = tu.tools.Reactome_get_pathway_reactions(id=pathway['stId'])
```

### 6B. Protein-Protein Interactions

```python
# STRING network
string_ppi = tu.tools.STRING_get_protein_interactions(
    protein_ids=[gene_symbol], species=9606, confidence_score=0.7
)
# Higher confidence = more reliable

# IntAct interactions (experimental)
intact_ppi = tu.tools.intact_get_interactions(identifier=uniprot_id)

# OpenTargets interactions
ot_ppi = tu.tools.OpenTargets_get_target_interactions_by_ensemblID(ensemblId=ensembl_id)
```

### 6C. Functional Enrichment

```python
# GO annotations
go_terms = tu.tools.OpenTargets_get_target_gene_ontology_by_ensemblID(ensemblId=ensembl_id)

# Direct GO query
go_annotations = tu.tools.GO_get_annotations_for_gene(gene_id=gene_symbol)

# STRING functional enrichment of interaction partners
enrichment = tu.tools.STRING_functional_enrichment(
    protein_ids=[gene_symbol], species=9606
)
```

### Report Format - Pathway Context

```markdown
### 7. Pathway Context & Network Analysis

#### 7.1 Key Pathways
| Pathway | Reactome ID | Relevance to Disease | Evidence |
|---------|-------------|---------------------|----------|
| EGFR signaling | R-HSA-177929 | Driver pathway in NSCLC | [T1] |
| RAS-RAF-MEK-ERK | R-HSA-5673001 | Downstream effector | [T1] |
| PI3K-AKT signaling | R-HSA-2219528 | Resistance mechanism | [T2] |

#### 7.2 Protein-Protein Interactions
**Total Interactors**: 45 (STRING confidence > 0.7)
**Key Interactors**: GRB2, SHC1, PLCG1, PIK3CA, STAT3

#### 7.3 Pathway Redundancy Assessment
**Compensation Risk**: MODERATE
- Parallel pathways: HER2, HER3 can compensate
- Feedback loops: RAS activation bypasses EGFR
- Downstream convergence: MEK/ERK shared with other RTKs
```

---

## Phase 7: Validation Evidence (0-10 points)

**Objective**: Assess existing functional validation data.

### 7A. DepMap Essentiality (CRISPR/RNAi)

```python
# Gene essentiality in cancer cell lines
deps = tu.tools.DepMap_get_gene_dependencies(gene_symbol=gene_symbol)
# Negative scores = essential (cells die upon KO)
# Score < -0.5: moderately essential
# Score < -1.0: strongly essential
```

### 7B. Literature Validation Evidence

```python
# Search for functional studies
validation_papers = tu.tools.PubMed_search_articles(
    query=f'"{gene_symbol}" AND (CRISPR OR siRNA OR knockdown OR knockout OR "loss of function") AND "{disease_name}"',
    limit=30
)

# Search for biomarker studies
biomarker_papers = tu.tools.PubMed_search_articles(
    query=f'"{gene_symbol}" AND (biomarker OR "target engagement" OR "pharmacodynamic")',
    limit=20
)
```

### 7C. Animal Model Evidence

```python
# Mouse phenotypes from OpenTargets (already retrieved in Phase 5)
# Reuse mouse_models data

# CTD gene-disease associations (complementary)
ctd_diseases = tu.tools.CTD_get_gene_diseases(input_terms=gene_symbol)
```

### Scoring Logic - Validation Evidence

```
Functional Studies (0-5):
  - CRISPR KO shows disease-relevant phenotype: 5
  - siRNA knockdown shows phenotype: 4
  - Biochemical assay validates mechanism: 3
  - Overexpression study only: 2
  - No functional data: 0

Disease Models (0-5):
  - Patient-derived xenograft (PDX) response: 5
  - Genetically engineered mouse model: 4
  - Cell line model: 3
  - In silico model only: 1
  - No model data: 0
```

---

## Phase 8: Structural Insights

**Objective**: Leverage structural biology for druggability and mechanism understanding.

### 8A. PDB Structures

```python
# Get PDB entries from UniProt cross-references
uniprot_entry = tu.tools.UniProt_get_entry_by_accession(accession=uniprot_id)
# Parse: uniProtKBCrossReferences where database == "PDB"

# Get details for each PDB
for pdb_id in pdb_ids[:10]:
    metadata = tu.tools.get_protein_metadata_by_pdb_id(pdb_id=pdb_id)
    quality = tu.tools.pdbe_get_entry_quality(pdb_id=pdb_id)
    summary = tu.tools.pdbe_get_entry_summary(pdb_id=pdb_id)
    experiment = tu.tools.pdbe_get_entry_experiment(pdb_id=pdb_id)
    molecules = tu.tools.pdbe_get_entry_molecules(pdb_id=pdb_id)
```

### 8B. AlphaFold Prediction

```python
alphafold = tu.tools.alphafold_get_prediction(qualifier=uniprot_id)
alphafold_info = tu.tools.alphafold_get_summary(qualifier=uniprot_id)
# Check pLDDT scores for confidence
```

### 8C. Binding Pocket Analysis

```python
# ProteinsPlus DoGSiteScorer for best PDB structure
pockets = tu.tools.ProteinsPlus_predict_binding_sites(pdb_id=best_pdb_id)
# Returns: pocket locations, druggability scores, volume, surface

# Interaction diagram for co-crystal structures
if has_ligand:
    diagram = tu.tools.ProteinsPlus_generate_interaction_diagram(pdb_id=pdb_id)
```

### 8D. Domain Architecture

```python
# InterPro domains
domains = tu.tools.InterPro_get_protein_domains(uniprot_accession=uniprot_id)

# Domain details for key domains
for domain in domains[:5]:
    detail = tu.tools.InterPro_get_domain_details(entry_id=domain['accession'])
```

---


---

> **Extended Reference**: For detailed tool tables, examples, and templates, read `REFERENCE.md` in this skill directory.
> The agent can access it via: `read skills/tooluniverse-drug-target-validation/REFERENCE.md`

Related Skills

torchdrug

from Zaoqu-Liu/ScienceClaw

PyTorch-native graph neural networks for molecules and proteins. Use when building custom GNN architectures for drug discovery, protein modeling, or knowledge graph reasoning. Best for custom model development, protein property prediction, retrosynthesis. For pre-trained models and diverse featurizers use deepchem; for benchmark datasets use pytdc.

tooluniverse

from Zaoqu-Liu/ScienceClaw

Router skill for ToolUniverse tasks. First checks if specialized tooluniverse skills (34+ skills covering disease/drug/target research, clinical decision support, genomics, epigenomics, chemical safety, systems biology, and more) can solve the problem, then falls back to general strategies for using 1400+ scientific tools. Covers tool discovery, multi-hop queries, comprehensive research workflows, disambiguation, evidence grading, and report generation. Use when users need to research any scientific topic, find biological data, or explore drug/target/disease relationships.

tooluniverse-variant-interpretation

from Zaoqu-Liu/ScienceClaw

Systematic clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Aggregates evidence from ClinVar, gnomAD, CIViC, UniProt, and PDB across ACMG criteria. Produces pathogenicity scores (0-100), clinical recommendations, and treatment implications. Use when interpreting genetic variants, classifying variants of uncertain significance (VUS), performing ACMG variant classification, or translating variant calls to clinical actionability.

tooluniverse-variant-analysis

from Zaoqu-Liu/ScienceClaw

Production-ready VCF processing, variant annotation, mutation analysis, and structural variant (SV/CNV) interpretation for bioinformatics questions. Parses VCF files (streaming, large files), classifies mutation types (missense, nonsense, synonymous, frameshift, splice, intronic, intergenic) and structural variants (deletions, duplications, inversions, translocations), applies VAF/depth/quality/consequence filters, annotates with ClinVar/dbSNP/gnomAD/CADD via ToolUniverse, interprets SV/CNV clinical significance using ClinGen dosage sensitivity scores, computes variant statistics, and generates reports. Solves questions like "What fraction of variants with VAF < 0.3 are missense?", "How many non-reference variants remain after filtering intronic/intergenic?", "What is the pathogenicity of this deletion affecting BRCA1?", or "Which dosage-sensitive genes overlap this CNV?". Use when processing VCF files, annotating variants, filtering by VAF/depth/consequence, classifying mutations, interpreting structural variants, assessing CNV pathogenicity, comparing cohorts, or answering variant analysis questions.

tooluniverse-target-research

from Zaoqu-Liu/ScienceClaw

Gather comprehensive biological target intelligence from 9 parallel research paths covering protein info, structure, interactions, pathways, expression, variants, drug interactions, and literature. Features collision-aware searches, evidence grading (T1-T4), explicit Open Targets coverage, and mandatory completeness auditing. Use when users ask about drug targets, proteins, genes, or need target validation, druggability assessment, or comprehensive target profiling.

tooluniverse-systems-biology

from Zaoqu-Liu/ScienceClaw

Comprehensive systems biology and pathway analysis using multiple pathway databases (Reactome, KEGG, WikiPathways, Pathway Commons, BioModels). Performs pathway enrichment, protein-pathway mapping, keyword searches, and systems-level analysis. Use when analyzing gene sets, exploring biological pathways, or investigating systems-level biology.

tooluniverse-structural-variant-analysis

from Zaoqu-Liu/ScienceClaw

Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.

tooluniverse-statistical-modeling

from Zaoqu-Liu/ScienceClaw

Perform statistical modeling and regression analysis on biomedical datasets. Supports linear regression, logistic regression (binary/ordinal/multinomial), mixed-effects models, Cox proportional hazards survival analysis, Kaplan-Meier estimation, and comprehensive model diagnostics. Extracts odds ratios, hazard ratios, confidence intervals, p-values, and effect sizes. Designed to solve BixBench statistical reasoning questions involving clinical/experimental data. Use when asked to fit regression models, compute odds ratios, perform survival analysis, run statistical tests, or interpret model coefficients from provided data.

tooluniverse-spatial-transcriptomics

from Zaoqu-Liu/ScienceClaw

Analyze spatial transcriptomics data to map gene expression in tissue architecture. Supports 10x Visium, MERFISH, seqFISH, Slide-seq, and imaging-based platforms. Performs spatial clustering, domain identification, cell-cell proximity analysis, spatial gene expression patterns, tissue architecture mapping, and integration with single-cell data. Use when analyzing spatial transcriptomics datasets, studying tissue organization, identifying spatial expression patterns, mapping cell-cell interactions in tissue context, characterizing tumor microenvironment spatial structure, or integrating spatial and single-cell RNA-seq data for comprehensive tissue analysis.

tooluniverse-spatial-omics-analysis

from Zaoqu-Liu/ScienceClaw

Computational analysis framework for spatial multi-omics data integration. Given spatially variable genes (SVGs), spatial domain annotations, tissue type, and disease context from spatial transcriptomics/proteomics experiments (10x Visium, MERFISH, DBiTplus, SLIDE-seq, etc.), performs comprehensive biological interpretation including pathway enrichment, cell-cell interaction inference, druggable target identification, immune microenvironment characterization, and multi-modal integration. Produces a detailed markdown report with Spatial Omics Integration Score (0-100), domain-by-domain characterization, and validation recommendations. Uses 70+ ToolUniverse tools across 9 analysis phases. Use when users ask about spatial transcriptomics analysis, spatial omics interpretation, tissue heterogeneity, spatial gene expression patterns, tumor microenvironment mapping, tissue zonation, or cell-cell communication from spatial data.

tooluniverse-single-cell

from Zaoqu-Liu/ScienceClaw

Production-ready single-cell and expression matrix analysis using scanpy, anndata, and scipy. Performs scRNA-seq QC, normalization, PCA, UMAP, Leiden/Louvain clustering, differential expression (Wilcoxon, t-test, DESeq2), cell type annotation, per-cell-type statistical analysis, gene-expression correlation, batch correction (Harmony), trajectory inference, and cell-cell communication analysis. NEW: Analyzes ligand-receptor interactions between cell types using OmniPath (CellPhoneDB, CellChatDB), scores communication strength, identifies signaling cascades, and handles multi-subunit receptor complexes. Integrates with ToolUniverse gene annotation tools (HPA, Ensembl, MyGene, UniProt) and enrichment tools (gseapy, PANTHER, STRING). Supports h5ad, 10X, CSV/TSV count matrices, and pre-annotated datasets. Use when analyzing single-cell RNA-seq data, studying cell-cell interactions, performing cell type differential expression, computing gene-expression correlations by cell type, analyzing tumor-immune communication, or answering questions about scRNA-seq datasets.

tooluniverse-sequence-retrieval

from Zaoqu-Liu/ScienceClaw

Retrieves biological sequences (DNA, RNA, protein) from NCBI and ENA with gene disambiguation, accession type handling, and comprehensive sequence profiles. Creates detailed reports with sequence metadata, cross-database references, and download options. Use when users need nucleotide sequences, protein sequences, genome data, or mention GenBank, RefSeq, EMBL accessions.