tooluniverse-protein-therapeutic-design

Design novel protein therapeutics (binders, enzymes, scaffolds) using AI-guided de novo design. Uses RFdiffusion for backbone generation, ProteinMPNN for sequence design, ESMFold/AlphaFold2 for validation. Use when asked to design protein binders, therapeutic proteins, or engineer protein function.

1,802 stars

byFreedomIntelligence

View on GitHub Installation ↓

Best use case

tooluniverse-protein-therapeutic-design is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using tooluniverse-protein-therapeutic-design should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/tooluniverse-protein-therapeutic-design/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/tooluniverse-protein-therapeutic-design/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/tooluniverse-protein-therapeutic-design/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How tooluniverse-protein-therapeutic-design Compares

Feature / Agent	tooluniverse-protein-therapeutic-design	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Therapeutic Protein Designer

AI-guided de novo protein design using RFdiffusion backbone generation, ProteinMPNN sequence optimization, and structure validation for therapeutic protein development.

**KEY PRINCIPLES**:
1. **Structure-first design** - Generate backbone geometry before sequence
2. **Target-guided** - Design binders with target structure in mind
3. **Iterative validation** - Predict structure to validate designs
4. **Developability-aware** - Consider aggregation, immunogenicity, expression
5. **Evidence-graded** - Grade designs by confidence metrics
6. **Actionable output** - Provide sequences ready for experimental testing
7. **English-first queries** - Always use English terms in tool calls (protein names, target names), even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language

---

## When to Use

Apply when user asks:
- "Design a protein binder for [target]"
- "Create a therapeutic protein against [protein/epitope]"
- "Design a protein scaffold with [property]"
- "Optimize this protein sequence for [function]"
- "Design a de novo enzyme for [reaction]"
- "Generate protein variants for [target binding]"

---

## Critical Workflow Requirements

### 1. Report-First Approach (MANDATORY)

1. **Create the report file FIRST**:
   - File name: `[TARGET]_protein_design_report.md`
   - Initialize with section headers
   - Add placeholder: `[Designing...]`

2. **Progressively update** as designs are generated

3. **Output separate files**:
   - `[TARGET]_designed_sequences.fasta` - All designed sequences
   - `[TARGET]_top_candidates.csv` - Ranked candidates with metrics

### 2. Design Documentation (MANDATORY)

Every design MUST include:

```markdown
### Design: Binder_001

**Sequence**: MVLSPADKTN...
**Length**: 85 amino acids
**Target**: PD-L1 (UniProt: Q9NZQ7)
**Method**: RFdiffusion → ProteinMPNN → ESMFold validation

**Quality Metrics**:
| Metric | Value | Interpretation |
|--------|-------|----------------|
| pLDDT | 88.5 | High confidence |
| pTM | 0.82 | Good fold |
| ProteinMPNN score | -2.3 | Favorable |
| Predicted binding | Strong | Based on interface pLDDT |

*Source: NVIDIA NIM via `NvidiaNIM_rfdiffusion`, `NvidiaNIM_proteinmpnn`, `NvidiaNIM_esmfold`*
```

---

## Phase 0: Tool Verification

### NVIDIA NIM Tools Required

| Tool | Purpose | API Key Required |
|------|---------|------------------|
| `NvidiaNIM_rfdiffusion` | Backbone generation | Yes |
| `NvidiaNIM_proteinmpnn` | Sequence design | Yes |
| `NvidiaNIM_esmfold` | Fast structure validation | Yes |
| `NvidiaNIM_alphafold2` | High-accuracy validation | Yes |
| `NvidiaNIM_esm2_650m` | Sequence embeddings | Yes |

### Parameter Verification

| Tool | WRONG Parameter | CORRECT Parameter |
|------|-----------------|-------------------|
| `NvidiaNIM_rfdiffusion` | `num_steps` | `diffusion_steps` |
| `NvidiaNIM_proteinmpnn` | `pdb` | `pdb_string` |
| `NvidiaNIM_esmfold` | `seq` | `sequence` |

---

## Workflow Overview

```
Phase 1: Target Characterization
├── Get target structure (PDB, EMDB cryo-EM, or AlphaFold)
├── Identify binding epitope
├── Analyze existing binders
├── Check EMDB for membrane protein structures (NEW)
└── OUTPUT: Target profile
    ↓
Phase 2: Backbone Generation (RFdiffusion)
├── Define design constraints
├── Generate multiple backbones
├── Filter by geometry quality
└── OUTPUT: Candidate backbones
    ↓
Phase 3: Sequence Design (ProteinMPNN)
├── Design sequences for each backbone
├── Sample multiple sequences per backbone
├── Score by ProteinMPNN likelihood
└── OUTPUT: Designed sequences
    ↓
Phase 4: Structure Validation
├── Predict structure (ESMFold/AlphaFold2)
├── Compare to designed backbone
├── Assess fold quality (pLDDT, pTM)
└── OUTPUT: Validated designs
    ↓
Phase 5: Developability Assessment
├── Aggregation propensity
├── Expression likelihood
├── Immunogenicity prediction
└── OUTPUT: Developability scores
    ↓
Phase 6: Report Synthesis
├── Ranked candidate list
├── Experimental recommendations
├── Next steps
└── OUTPUT: Final report
```

---

## Phase 1: Target Characterization

### 1.1 Get Target Structure

```python
def get_target_structure(tu, target_id):
    """Get target structure from PDB, EMDB, or predict."""
    
    # Try PDB first (X-ray/NMR)
    pdb_results = tu.tools.PDB_search_by_uniprot(uniprot_id=target_id)
    
    if pdb_results:
        # Get highest resolution structure
        best_pdb = sorted(pdb_results, key=lambda x: x['resolution'])[0]
        structure = tu.tools.PDB_get_structure(pdb_id=best_pdb['pdb_id'])
        return {'source': 'PDB', 'pdb_id': best_pdb['pdb_id'], 
                'resolution': best_pdb['resolution'], 'structure': structure}
    
    # Try EMDB for cryo-EM structures (valuable for membrane proteins)
    protein_info = tu.tools.UniProt_get_protein_by_accession(accession=target_id)
    emdb_results = tu.tools.emdb_search(
        query=protein_info['proteinDescription']['recommendedName']['fullName']['value']
    )
    
    if emdb_results and len(emdb_results) > 0:
        # Get highest resolution cryo-EM entry
        best_emdb = sorted(emdb_results, key=lambda x: x.get('resolution', 99))[0]
        # Get associated PDB model if available
        emdb_details = tu.tools.emdb_get_entry(entry_id=best_emdb['emdb_id'])
        if emdb_details.get('pdb_ids'):
            structure = tu.tools.PDB_get_structure(pdb_id=emdb_details['pdb_ids'][0])
            return {'source': 'EMDB cryo-EM', 'emdb_id': best_emdb['emdb_id'],
                    'pdb_id': emdb_details['pdb_ids'][0], 
                    'resolution': best_emdb.get('resolution'), 'structure': structure}
    
    # Fallback to AlphaFold prediction
    sequence = tu.tools.UniProt_get_protein_sequence(accession=target_id)
    structure = tu.tools.NvidiaNIM_alphafold2(
        sequence=sequence['sequence'],
        algorithm="mmseqs2"
    )
    return {'source': 'AlphaFold2 (predicted)', 'structure': structure}
```

### 1.1b EMDB for Membrane Proteins (NEW)

**When to prioritize EMDB**: Membrane proteins, large complexes, and targets where conformational states matter.

```python
def get_cryoem_structures(tu, target_name):
    """Get cryo-EM structures for membrane proteins/complexes."""
    
    # Search EMDB
    emdb_results = tu.tools.emdb_search(
        query=f"{target_name} membrane OR receptor"
    )
    
    structures = []
    for entry in emdb_results[:5]:
        details = tu.tools.emdb_get_entry(entry_id=entry['emdb_id'])
        structures.append({
            'emdb_id': entry['emdb_id'],
            'resolution': entry.get('resolution', 'N/A'),
            'title': entry.get('title', 'N/A'),
            'conformational_state': details.get('state', 'Unknown'),
            'pdb_models': details.get('pdb_ids', [])
        })
    
    return structures
```

**Output for Report**:

```markdown
### 1.1b Cryo-EM Structures (EMDB)

| EMDB ID | Resolution | PDB Model | Conformation |
|---------|------------|-----------|--------------|
| EMD-12345 | 2.8 Å | 7ABC | Active state |
| EMD-23456 | 3.1 Å | 8DEF | Inactive state |

**Note**: Cryo-EM structures capture physiologically relevant conformations for membrane protein targets.

*Source: EMDB*
```

### 1.2 Identify Binding Epitope

```python
def identify_epitope(tu, target_structure, epitope_residues=None):
    """Identify or validate binding epitope."""
    
    if epitope_residues:
        # User-specified epitope
        return {'residues': epitope_residues, 'source': 'user-defined'}
    
    # Find surface-exposed regions
    # Use structural analysis to identify potential epitopes
    return analyze_surface(target_structure)
```

### 1.3 Output for Report

```markdown
## 1. Target Characterization

### 1.1 Target Information

| Property | Value |
|----------|-------|
| **Target** | PD-L1 (Programmed death-ligand 1) |
| **UniProt** | Q9NZQ7 |
| **Structure source** | PDB: 4ZQK (2.0 Å resolution) |
| **Binding epitope** | IgV domain, residues 19-127 |
| **Known binders** | Atezolizumab, durvalumab, avelumab |

### 1.2 Epitope Analysis

| Residue Range | Type | Surface Area | Druggability |
|---------------|------|--------------|--------------|
| 54-68 | Loop | 850 Å² | High |
| 115-125 | Beta strand | 420 Å² | Medium |
| 19-30 | N-terminus | 380 Å² | Medium |

**Selected Epitope**: Residues 54-68 (PD-1 binding interface)

*Source: PDB 4ZQK, surface analysis*
```

---

## Phase 2: Backbone Generation

### 2.1 RFdiffusion Design

```python
def generate_backbones(tu, design_params):
    """Generate de novo backbones using RFdiffusion."""
    
    backbones = tu.tools.NvidiaNIM_rfdiffusion(
        diffusion_steps=design_params.get('steps', 50),
        # Additional parameters depending on design type
    )
    
    return backbones
```

### 2.2 Design Modes

| Mode | Use Case | Key Parameters |
|------|----------|----------------|
| **Unconditional** | De novo scaffold | `diffusion_steps` only |
| **Binder design** | Target-guided binder | `target_structure`, `hotspot_residues` |
| **Motif scaffolding** | Functional motif embedding | `motif_sequence`, `motif_structure` |

### 2.3 Output for Report

```markdown
## 2. Backbone Generation

### 2.1 Design Parameters

| Parameter | Value |
|-----------|-------|
| **Method** | RFdiffusion via NVIDIA NIM |
| **Design mode** | Unconditional scaffold generation |
| **Diffusion steps** | 50 |
| **Number generated** | 10 backbones |

### 2.2 Generated Backbones

| Backbone | Length | Topology | Quality |
|----------|--------|----------|---------|
| BB_001 | 85 aa | 3-helix bundle | Good |
| BB_002 | 92 aa | Beta sandwich | Good |
| BB_003 | 78 aa | Alpha-beta | Good |
| BB_004 | 88 aa | All-alpha | Moderate |
| BB_005 | 95 aa | Mixed | Good |

**Selected for sequence design**: BB_001, BB_002, BB_003, BB_005 (top 4)

*Source: NVIDIA NIM via `NvidiaNIM_rfdiffusion`*
```

---

## Phase 3: Sequence Design

### 3.1 ProteinMPNN Design

```python
def design_sequences(tu, backbone_pdb, num_sequences=8):
    """Design sequences for backbone using ProteinMPNN."""
    
    sequences = tu.tools.NvidiaNIM_proteinmpnn(
        pdb_string=backbone_pdb,
        num_sequences=num_sequences,
        temperature=0.1  # Lower = more conservative
    )
    
    return sequences
```

### 3.2 Sampling Parameters

| Parameter | Conservative | Moderate | Diverse |
|-----------|--------------|----------|---------|
| Temperature | 0.1 | 0.2 | 0.5 |
| Sequences per backbone | 4 | 8 | 16 |
| Use case | Validated scaffold | Exploration | Diversity |

### 3.3 Output for Report

```markdown
## 3. Sequence Design

### 3.1 Design Parameters

| Parameter | Value |
|-----------|-------|
| **Method** | ProteinMPNN via NVIDIA NIM |
| **Temperature** | 0.1 (conservative) |
| **Sequences per backbone** | 8 |
| **Total sequences** | 32 |

### 3.2 Designed Sequences (Top 10 by Score)

| Rank | Backbone | Sequence ID | Length | MPNN Score | Predicted pI |
|------|----------|-------------|--------|------------|--------------|
| 1 | BB_001 | Seq_001_A | 85 | -1.89 | 6.2 |
| 2 | BB_002 | Seq_002_C | 92 | -1.95 | 5.8 |
| 3 | BB_001 | Seq_001_B | 85 | -2.01 | 7.1 |
| 4 | BB_003 | Seq_003_A | 78 | -2.08 | 6.5 |
| 5 | BB_005 | Seq_005_B | 95 | -2.12 | 5.4 |

### 3.3 Top Sequence: Seq_001_A

```
>Seq_001_A (85 aa, MPNN score: -1.89)
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH
GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL
```

*Source: NVIDIA NIM via `NvidiaNIM_proteinmpnn`*
```

---

## Phase 4: Structure Validation

### 4.1 ESMFold Validation

```python
def validate_structure(tu, sequence):
    """Validate designed sequence by structure prediction."""
    
    # Fast validation with ESMFold
    predicted = tu.tools.NvidiaNIM_esmfold(sequence=sequence)
    
    # Extract quality metrics
    plddt = extract_plddt(predicted)
    ptm = extract_ptm(predicted)
    
    return {
        'structure': predicted,
        'mean_plddt': np.mean(plddt),
        'ptm': ptm,
        'passes': np.mean(plddt) > 70 and ptm > 0.7
    }
```

### 4.2 Validation Criteria

| Metric | Threshold | Interpretation |
|--------|-----------|----------------|
| Mean pLDDT | >70 | Confident fold |
| pTM | >0.7 | Good global topology |
| RMSD to backbone | <2 Å | Design recapitulated |

### 4.3 Output for Report

```markdown
## 4. Structure Validation

### 4.1 Validation Results

| Sequence | pLDDT | pTM | RMSD to Design | Status |
|----------|-------|-----|----------------|--------|
| Seq_001_A | 88.5 | 0.85 | 1.2 Å | ✓ PASS |
| Seq_002_C | 82.3 | 0.79 | 1.5 Å | ✓ PASS |
| Seq_001_B | 85.1 | 0.82 | 1.3 Å | ✓ PASS |
| Seq_003_A | 79.8 | 0.76 | 1.8 Å | ✓ PASS |
| Seq_005_B | 68.2 | 0.65 | 2.8 Å | ✗ FAIL |

### 4.2 Top Validated Design: Seq_001_A

| Region | Residues | pLDDT | Interpretation |
|--------|----------|-------|----------------|
| Helix 1 | 1-28 | 92.3 | Very high confidence |
| Loop 1 | 29-35 | 78.4 | Moderate confidence |
| Helix 2 | 36-58 | 91.8 | Very high confidence |
| Loop 2 | 59-65 | 75.2 | Moderate confidence |
| Helix 3 | 66-85 | 90.1 | Very high confidence |

**Overall**: Well-folded 3-helix bundle with high confidence core

*Source: NVIDIA NIM via `NvidiaNIM_esmfold`*
```

---

## Phase 5: Developability Assessment

### 5.1 Aggregation Propensity

```python
def assess_aggregation(sequence):
    """Assess aggregation propensity."""
    
    # Calculate hydrophobic patches
    # Calculate isoelectric point
    # Identify aggregation-prone motifs
    
    return {
        'aggregation_score': score,
        'hydrophobic_patches': patches,
        'risk_level': 'Low' if score < 0.5 else 'Medium' if score < 0.7 else 'High'
    }
```

### 5.2 Developability Metrics

| Metric | Favorable | Marginal | Unfavorable |
|--------|-----------|----------|-------------|
| Aggregation score | <0.5 | 0.5-0.7 | >0.7 |
| Isoelectric point | 5-9 | 4-5 or 9-10 | <4 or >10 |
| Hydrophobic patches | <3 | 3-5 | >5 |
| Cysteine count | 0 or even | Odd | Multiple unpaired |

### 5.3 Output for Report

```markdown
## 5. Developability Assessment

### 5.1 Developability Scores

| Design | Aggregation | pI | Cysteines | Expression | Overall |
|--------|-------------|-----|-----------|------------|---------|
| Seq_001_A | 0.32 (Low) | 6.2 | 0 | High | ★★★ |
| Seq_002_C | 0.45 (Low) | 5.8 | 2 (paired) | Medium | ★★☆ |
| Seq_001_B | 0.38 (Low) | 7.1 | 0 | High | ★★★ |
| Seq_003_A | 0.58 (Med) | 6.5 | 0 | Medium | ★★☆ |

### 5.2 Recommendations

**Best candidate for expression**: Seq_001_A
- Low aggregation propensity
- Neutral pI (easy purification)
- No cysteines (no misfolding risk)
- Predicted high E. coli expression

*Source: Sequence analysis*
```

---

## Report Template

```markdown
# Therapeutic Protein Design Report: [TARGET]

**Generated**: [Date] | **Query**: [Original query] | **Status**: In Progress

---

## Executive Summary
[Designing...]

---

## 1. Target Characterization
### 1.1 Target Information
[Designing...]
### 1.2 Binding Epitope
[Designing...]

---

## 2. Backbone Generation
### 2.1 Design Parameters
[Designing...]
### 2.2 Generated Backbones
[Designing...]

---

## 3. Sequence Design
### 3.1 ProteinMPNN Results
[Designing...]
### 3.2 Top Sequences
[Designing...]

---

## 4. Structure Validation
### 4.1 ESMFold Validation
[Designing...]
### 4.2 Quality Metrics
[Designing...]

---

## 5. Developability Assessment
### 5.1 Scores
[Designing...]
### 5.2 Recommendations
[Designing...]

---

## 6. Final Candidates
### 6.1 Ranked List
[Designing...]
### 6.2 Sequences for Testing
[Designing...]

---

## 7. Experimental Recommendations
[Designing...]

---

## 8. Data Sources
[Will be populated...]
```

---

## Evidence Grading

| Tier | Symbol | Criteria |
|------|--------|----------|
| **T1** | ★★★ | pLDDT >85, pTM >0.8, low aggregation, neutral pI |
| **T2** | ★★☆ | pLDDT >75, pTM >0.7, acceptable developability |
| **T3** | ★☆☆ | pLDDT >70, pTM >0.65, developability concerns |
| **T4** | ☆☆☆ | Failed validation or major developability issues |

---

## Completeness Checklist

### Phase 1: Target
- [ ] Target structure obtained (PDB or predicted)
- [ ] Binding epitope identified
- [ ] Existing binders noted

### Phase 2: Backbones
- [ ] ≥5 backbones generated
- [ ] Top 3-5 selected for sequence design
- [ ] Selection criteria documented

### Phase 3: Sequences
- [ ] ≥8 sequences per backbone designed
- [ ] MPNN scores reported
- [ ] Top 10 sequences listed

### Phase 4: Validation
- [ ] All sequences validated by ESMFold
- [ ] pLDDT and pTM reported
- [ ] Pass/fail criteria applied
- [ ] ≥3 passing designs

### Phase 5: Developability
- [ ] Aggregation assessed
- [ ] pI calculated
- [ ] Expression prediction
- [ ] Final ranking

### Phase 6: Deliverables
- [ ] Ranked candidate list
- [ ] FASTA file with sequences
- [ ] Experimental recommendations

---

## Fallback Chains

| Primary Tool | Fallback 1 | Fallback 2 |
|--------------|------------|------------|
| `NvidiaNIM_rfdiffusion` | Manual backbone design | Scaffold from PDB |
| `NvidiaNIM_proteinmpnn` | Rosetta ProteinMPNN | Manual sequence design |
| `NvidiaNIM_esmfold` | `NvidiaNIM_alphafold2` | AlphaFold DB |
| PDB structure | `NvidiaNIM_alphafold2` | AlphaFold DB |

---

## Tool Reference

See [TOOLS_REFERENCE.md](TOOLS_REFERENCE.md) for complete tool documentation.

Related Skills

tooluniverse-variant-interpretation

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Systematic clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Aggregates evidence from ClinVar, gnomAD, CIViC, UniProt, and PDB across ACMG criteria. Produces pathogenicity scores (0-100), clinical recommendations, and treatment implications. Use when interpreting genetic variants, classifying variants of uncertain significance (VUS), performing ACMG variant classification, or translating variant calls to clinical actionability.

tooluniverse-variant-analysis

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Production-ready VCF processing, variant annotation, mutation analysis, and structural variant (SV/CNV) interpretation for bioinformatics questions. Parses VCF files (streaming, large files), classifies mutation types (missense, nonsense, synonymous, frameshift, splice, intronic, intergenic) and structural variants (deletions, duplications, inversions, translocations), applies VAF/depth/quality/consequence filters, annotates with ClinVar/dbSNP/gnomAD/CADD via ToolUniverse, interprets SV/CNV clinical significance using ClinGen dosage sensitivity scores, computes variant statistics, and generates reports. Solves questions like "What fraction of variants with VAF < 0.3 are missense?", "How many non-reference variants remain after filtering intronic/intergenic?", "What is the pathogenicity of this deletion affecting BRCA1?", or "Which dosage-sensitive genes overlap this CNV?". Use when processing VCF files, annotating variants, filtering by VAF/depth/consequence, classifying mutations, interpreting structural variants, assessing CNV pathogenicity, comparing cohorts, or answering variant analysis questions.

tooluniverse-target-research

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Gather comprehensive biological target intelligence from 9 parallel research paths covering protein info, structure, interactions, pathways, expression, variants, drug interactions, and literature. Features collision-aware searches, evidence grading (T1-T4), explicit Open Targets coverage, and mandatory completeness auditing. Use when users ask about drug targets, proteins, genes, or need target validation, druggability assessment, or comprehensive target profiling.

tooluniverse-systems-biology

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Comprehensive systems biology and pathway analysis using multiple pathway databases (Reactome, KEGG, WikiPathways, Pathway Commons, BioModels). Performs pathway enrichment, protein-pathway mapping, keyword searches, and systems-level analysis. Use when analyzing gene sets, exploring biological pathways, or investigating systems-level biology.

tooluniverse-structural-variant-analysis

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.

tooluniverse-statistical-modeling

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Perform statistical modeling and regression analysis on biomedical datasets. Supports linear regression, logistic regression (binary/ordinal/multinomial), mixed-effects models, Cox proportional hazards survival analysis, Kaplan-Meier estimation, and comprehensive model diagnostics. Extracts odds ratios, hazard ratios, confidence intervals, p-values, and effect sizes. Designed to solve BixBench statistical reasoning questions involving clinical/experimental data. Use when asked to fit regression models, compute odds ratios, perform survival analysis, run statistical tests, or interpret model coefficients from provided data.

tooluniverse-spatial-transcriptomics

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Analyze spatial transcriptomics data to map gene expression in tissue architecture. Supports 10x Visium, MERFISH, seqFISH, Slide-seq, and imaging-based platforms. Performs spatial clustering, domain identification, cell-cell proximity analysis, spatial gene expression patterns, tissue architecture mapping, and integration with single-cell data. Use when analyzing spatial transcriptomics datasets, studying tissue organization, identifying spatial expression patterns, mapping cell-cell interactions in tissue context, characterizing tumor microenvironment spatial structure, or integrating spatial and single-cell RNA-seq data for comprehensive tissue analysis.

tooluniverse-spatial-omics-analysis

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Computational analysis framework for spatial multi-omics data integration. Given spatially variable genes (SVGs), spatial domain annotations, tissue type, and disease context from spatial transcriptomics/proteomics experiments (10x Visium, MERFISH, DBiTplus, SLIDE-seq, etc.), performs comprehensive biological interpretation including pathway enrichment, cell-cell interaction inference, druggable target identification, immune microenvironment characterization, and multi-modal integration. Produces a detailed markdown report with Spatial Omics Integration Score (0-100), domain-by-domain characterization, and validation recommendations. Uses 70+ ToolUniverse tools across 9 analysis phases. Use when users ask about spatial transcriptomics analysis, spatial omics interpretation, tissue heterogeneity, spatial gene expression patterns, tumor microenvironment mapping, tissue zonation, or cell-cell communication from spatial data.

tooluniverse-single-cell

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Production-ready single-cell and expression matrix analysis using scanpy, anndata, and scipy. Performs scRNA-seq QC, normalization, PCA, UMAP, Leiden/Louvain clustering, differential expression (Wilcoxon, t-test, DESeq2), cell type annotation, per-cell-type statistical analysis, gene-expression correlation, batch correction (Harmony), trajectory inference, and cell-cell communication analysis. NEW: Analyzes ligand-receptor interactions between cell types using OmniPath (CellPhoneDB, CellChatDB), scores communication strength, identifies signaling cascades, and handles multi-subunit receptor complexes. Integrates with ToolUniverse gene annotation tools (HPA, Ensembl, MyGene, UniProt) and enrichment tools (gseapy, PANTHER, STRING). Supports h5ad, 10X, CSV/TSV count matrices, and pre-annotated datasets. Use when analyzing single-cell RNA-seq data, studying cell-cell interactions, performing cell type differential expression, computing gene-expression correlations by cell type, analyzing tumor-immune communication, or answering questions about scRNA-seq datasets.

tooluniverse-sequence-retrieval

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Retrieves biological sequences (DNA, RNA, protein) from NCBI and ENA with gene disambiguation, accession type handling, and comprehensive sequence profiles. Creates detailed reports with sequence metadata, cross-database references, and download options. Use when users need nucleotide sequences, protein sequences, genome data, or mention GenBank, RefSeq, EMBL accessions.

tooluniverse-rnaseq-deseq2

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Production-ready RNA-seq differential expression analysis using PyDESeq2. Performs DESeq2 normalization, dispersion estimation, Wald testing, LFC shrinkage, and result filtering. Handles multi-factor designs, multiple contrasts, batch effects, and integrates with gene enrichment (gseapy) and ToolUniverse annotation tools (UniProt, Ensembl, OpenTargets). Supports CSV/TSV/H5AD input formats and any organism. Use when analyzing RNA-seq count matrices, identifying DEGs, performing differential expression with statistical rigor, or answering questions about gene expression changes.

tooluniverse-rare-disease-diagnosis

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Provide differential diagnosis for patients with suspected rare diseases based on phenotype and genetic data. Matches symptoms to HPO terms, identifies candidate diseases from Orphanet/OMIM, prioritizes genes for testing, interprets variants of uncertain significance. Use when clinician asks about rare disease diagnosis, unexplained phenotypes, or genetic testing interpretation.