protein-design-workflow

End-to-end guidance for protein design pipelines. Use this skill when: (1) Starting a new protein design project, (2) Needing step-by-step workflow guidance, (3) Understanding the full design pipeline, (4) Planning compute resources and timelines, (5) Integrating multiple design tools. For tool selection, use binder-design. For QC thresholds, use protein-qc.

1,802 stars

by FreedomIntelligence

Best use case

protein-design-workflow is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using protein-design-workflow should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/protein-design-workflow/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/protein-design-workflow/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/protein-design-workflow/SKILL.md inside your project.
3. Restart your AI agent so that it auto-discovers the skill.

How protein-design-workflow Compares

| Feature / Agent | protein-design-workflow | Standard Approach |
|-----------------|-------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

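What does this skill do?

It provides end-to-end guidance for protein design pipelines: target preparation, backbone generation, sequence design, structure validation, and filtering, plus compute and timeline planning.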

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Protein Design Workflow Guide

## Standard binder design pipeline

### Overview
```
Target Preparation --> Backbone Generation --> Sequence Design
         |                     |                     |
         v                     v                     v
    (pdb skill)          (rfdiffusion)         (proteinmpnn)
                               |                     |
                               v                     v
                        Structure Validation --> Filtering
                               |                     |
                               v                     v
                         (alphafold/chai)      (protein-qc)
```

## Phase 1: Target preparation

### 1.1 Obtain target structure
```bash
# Download from PDB
curl -o target.pdb "https://files.rcsb.org/download/XXXX.pdb"
```

### 1.2 Clean and prepare
```python
# Extract target chain
# Remove waters, ligands if needed
# Trim to binding region + 10 Å buffer
```
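
The three preparation steps above can be sketched in plain Python, assuming a standard fixed-column PDB file. The function names `parse_atoms` and `prepare_target` and the `site_residues` argument are hypothetical, not part of the skill. The sketch ignores insertion codes and alternate locations, drops every `HETATM` record (which also removes modified residues stored as `HETATM`), and may leave non-contiguous fragments after trimming, so treat it as a starting point and not a complete cleaner.

```python
import math

def parse_atoms(pdb_text):
    """Yield (line, chain, residue number, xyz) for ATOM records (fixed PDB columns)."""
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):  # HETATM (waters, ligands) is skipped
            chain = line[21]
            resseq = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield line, chain, resseq, xyz

def prepare_target(pdb_text, chain_id, site_residues, buffer=10.0):
    """Keep one chain and trim it to residues within `buffer` Å of the binding site.

    `site_residues` is a set of residue numbers that define the binding region
    (hypothetical input; choose it from your own analysis of the target).
    """
    atoms = [a for a in parse_atoms(pdb_text) if a[1] == chain_id]
    site_xyz = [xyz for _, _, res, xyz in atoms if res in site_residues]
    keep = set()
    for _, _, res, xyz in atoms:
        # A residue is kept if any of its atoms is close to any binding-site atom.
        if any(math.dist(xyz, s) <= buffer for s in site_xyz):
            keep.add(res)
    kept_lines = [line for line, _, res, _ in atoms if res in keep]
    return "\n".join(kept_lines + ["END"]) + "\n"
```

A typical call would read `target.pdb`, pass the chain ID and binding-site residue numbers, and write the returned text to `target_prepared.pdb`.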

### 1.3 Select hotspots
- Choose 3-6 exposed residues
- Prefer charged/aromatic (K, R, E, D, W, Y, F)
- Check surface accessibility
- Verify residue numbering

**Output**: `target_prepared.pdb`, hotspot list
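
Two items in the hotspot checklist, residue type and residue numbering, can be checked mechanically. The sketch below is one way to do that, assuming hotspot labels of the form `A45` (single-letter chain plus residue number) as used in the commands later in this guide. `check_hotspots` and `residue_names` are hypothetical helpers. Surface accessibility is not computed here and still needs a separate tool or visual inspection.

```python
# Three-letter codes for the preferred charged/aromatic residues (K, R, E, D, W, Y, F).
PREFERRED = {"LYS", "ARG", "GLU", "ASP", "TRP", "TYR", "PHE"}

def residue_names(pdb_text):
    """Map (chain, residue number) -> residue name from ATOM records."""
    names = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            names[(line[21], int(line[22:26]))] = line[17:20].strip()
    return names

def check_hotspots(pdb_text, hotspots):
    """Validate labels such as 'A45' against the prepared structure.

    Returns (report, hotspot_string), where report lists
    (label, residue name, is_preferred_type) for each hotspot.
    """
    if not 3 <= len(hotspots) <= 6:
        raise ValueError("expected 3-6 hotspots")
    names = residue_names(pdb_text)
    report = []
    for label in hotspots:
        key = (label[0], int(label[1:]))
        if key not in names:
            raise KeyError(f"{label} not found; check residue numbering")
        report.append((label, names[key], names[key] in PREFERRED))
    return report, ",".join(hotspots)
```

The returned string has the same comma-separated form as the hotspot arguments in Phase 2, so numbering mistakes surface before any GPU time is spent.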

## Phase 2: Backbone generation

### Option A: RFdiffusion (diverse exploration)
```bash
modal run modal_rfdiffusion.py \
  --pdb target_prepared.pdb \
  --contigs "A1-150/0 70-100" \
  --hotspot "A45,A67,A89" \
  --num-designs 500
```
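
The contig string in the command above reads as: keep target residues `A1-150` fixed, insert a chain break (`/0 `), then generate a binder whose length is sampled from 70 to 100 residues. A small helper can build that string from numbers so the target range and binder length stay in sync with the prepared structure. `build_contig` is a hypothetical name, and the sketch only covers the single-segment target case shown here.

```python
def build_contig(chain, start, end, binder_min, binder_max):
    """Build a contig string: fixed target segment, chain break, binder length range."""
    if start > end or binder_min > binder_max:
        raise ValueError("ranges must be ascending")
    return f"{chain}{start}-{end}/0 {binder_min}-{binder_max}"

print(build_contig("A", 1, 150, 70, 100))  # A1-150/0 70-100
```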

### Option B: BindCraft (end-to-end)
```bash
modal run modal_bindcraft.py \
  --target-pdb target_prepared.pdb \
  --hotspots "A45,A67,A89" \
  --num-designs 100
```

**Output**: 100-500 backbone PDBs

## Phase 3: Sequence design

### For RFdiffusion backbones
```bash
for backbone in backbones/*.pdb; do
  modal run modal_proteinmpnn.py \
    --pdb-path "$backbone" \
    --num-seq-per-target 8 \
    --sampling-temp 0.1
done
```

**Output**: 8 sequences per backbone (800-4000 total)

## Phase 4: Structure validation

### Predict complexes
```bash
# Prepare FASTA with binder + target
# binder:target format for multimer

modal run modal_colabfold.py \
  --input-faa all_sequences.fasta \
  --out-dir predictions/
```
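
The `binder:target` format mentioned in the comments above means each FASTA record holds both chains on one sequence line, separated by a colon. A minimal sketch of building such a file follows. `complex_fasta` is a hypothetical helper, and the `binders` dictionary (design name to binder sequence) is assumed to have been collected from the Phase 3 output.

```python
def complex_fasta(binders, target_seq):
    """Return FASTA text with one binder:target record per design."""
    records = []
    for name, binder_seq in binders.items():
        # The colon separates the two chains of the complex.
        records.append(f">{name}\n{binder_seq}:{target_seq}\n")
    return "".join(records)

if __name__ == "__main__":
    text = complex_fasta({"design_0": "MKTAYIAK", "design_1": "MSEELLKK"}, "GSHMLEDPVAG")
    with open("all_sequences.fasta", "w") as handle:
        handle.write(text)
```

Keeping the binder first in every record makes it easier to tell which chain is which when reading interface metrics later.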

**Output**: AF2 predictions with pLDDT, ipTM, PAE

## Phase 5: Filtering and selection

### Apply standard thresholds
```python
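import pandas as pd

# Load per-design metrics (one row per design)
designs = pd.read_csv('all_metrics.csv')

# Keep designs that pass every threshold.
# pLDDT is assumed to be on a 0-1 scale here; divide by 100 first if your file uses 0-100.
# .copy() avoids pandas' SettingWithCopyWarning when the score column is added below.
filtered = designs[
    (designs['pLDDT'] > 0.85) &
    (designs['ipTM'] > 0.50) &
    (designs['PAE_interface'] < 10) &
    (designs['scRMSD'] < 2.0) &
    (designs['esm2_pll'] > 0.0)
].copy()

# Rank by composite score (PAE_interface is rescaled so that lower PAE scores higher)
filtered['score'] = (
    0.3 * filtered['pLDDT'] +
    0.3 * filtered['ipTM'] +
    0.2 * (1 - filtered['PAE_interface'] / 20) +
    0.2 * filtered['esm2_pll']
)

# Keep the 50 highest-scoring designs
top_designs = filtered.nlargest(50, 'score')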
```

**Output**: 50-200 filtered candidates, of which the top 50 by composite score are kept in `top_designs`

## Resource planning

### Compute requirements

| Stage | GPU | Time (100 designs) |
|-------|-----|-------------------|
| RFdiffusion | A10G | 30 min |
| ProteinMPNN | T4 | 15 min |
| ColabFold | A100 | 4-8 hours |
| Filtering | CPU | 15 min |

### Total timeline
- Small campaign (100 designs): 8-12 hours
- Medium campaign (500 designs): 24-48 hours
- Large campaign (1000+ designs): 2-5 days
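
As a rough cross-check of these figures, the per-stage times in the compute table can be summed and scaled. The sketch below assumes that time scales linearly with the number of designs and that stages run one after another; both are simplifications, and `estimate_hours` is a hypothetical helper. It reports compute time only, which may be why the small-campaign total quoted above is somewhat higher than the raw sum.

```python
# (min_hours, max_hours) per 100 designs, taken from the compute table above.
STAGE_HOURS_PER_100 = {
    "RFdiffusion": (0.5, 0.5),
    "ProteinMPNN": (0.25, 0.25),
    "ColabFold": (4.0, 8.0),
    "Filtering": (0.25, 0.25),
}

def estimate_hours(num_designs):
    """Return (low, high) total hours, assuming linear scaling with design count."""
    scale = num_designs / 100
    low = sum(lo for lo, _ in STAGE_HOURS_PER_100.values()) * scale
    high = sum(hi for _, hi in STAGE_HOURS_PER_100.values()) * scale
    return low, high

print(estimate_hours(100))  # (5.0, 9.0)
print(estimate_hours(500))  # (25.0, 45.0)
```

Structure prediction dominates the total, so reducing the number of sequences sent to ColabFold is the most effective way to shorten a campaign.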

## Quality checkpoints

### After backbone generation
- [ ] Visual inspection of diverse backbones
- [ ] Secondary structure present
- [ ] No clashes with target

### After sequence design
- [ ] ESM2 PLL > 0.0 for most sequences
- [ ] No unwanted cysteines (unless intentional)
- [ ] Reasonable sequence diversity
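
The cysteine and diversity checks can be automated with a few lines of plain Python. This sketch compares sequences position by position without alignment, which is a reasonable assumption for sequences designed on the same backbone (equal length) but not for designs of different lengths. The function names are hypothetical, and what counts as "reasonable" diversity is left to the user.

```python
from itertools import combinations

def has_cysteine(seq):
    """True if the sequence contains at least one cysteine."""
    return "C" in seq.upper()

def identity(a, b):
    """Fraction of matching positions over the shorter length (no alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n

def mean_pairwise_identity(seqs):
    """Average identity over all sequence pairs; lower means more diverse."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(identity(a, b) for a, b in pairs) / len(pairs)
```

A mean identity close to 1.0 across the eight sequences for a backbone suggests the sampling temperature is too low to give useful variation.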

### After validation
- [ ] pLDDT > 0.85
- [ ] ipTM > 0.50
- [ ] PAE_interface < 10
- [ ] Self-consistency RMSD < 2.0 Å

### Final selection
- [ ] Diverse sequences (cluster if needed)
- [ ] Manufacturable (no problematic motifs)
- [ ] Reasonable molecular weight
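
Molecular weight and simple sequence motifs can be screened directly from the sequence. The sketch below uses standard average residue masses; the motif patterns are illustrative assumptions (the checklist does not name specific motifs), and `molecular_weight` and `find_motifs` are hypothetical helpers. Sequences with non-standard letters raise a `KeyError`.

```python
import re

# Average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

# Illustrative liability patterns; an assumption, not a list taken from this guide.
MOTIFS = {
    "N-glycosylation sequon (N-X-S/T, X not P)": r"N[^P][ST]",
    "deamidation-prone NG": r"NG",
    "polybasic run (4 or more K/R)": r"[KR]{4,}",
}

def molecular_weight(seq):
    """Approximate molecular weight in Da of an unmodified linear chain."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def find_motifs(seq):
    """Return {motif name: [0-based start positions]} for motifs present in seq."""
    hits = {}
    for name, pattern in MOTIFS.items():
        starts = [m.start() for m in re.finditer(pattern, seq)]
        if starts:
            hits[name] = starts
    return hits
```

Which motifs matter depends on the expression system, so the pattern table is meant to be edited per project.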

## Common issues

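| Problem | Solution |
|---------|----------|
| Low ipTM | Re-check hotspot choice and residue numbering; generate more designs |
| Poor diversity | Raise the ProteinMPNN sampling temperature; generate more backbones |
| High scRMSD | The backbone may be unusual and hard to recover from sequence; drop it or design more sequences for it |
| Low pLDDT | Inspect backbone and sequence quality against the checkpoints above; discard designs that stay low |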

## Advanced workflows

### Multi-tool combination
1. RFdiffusion for initial backbones
2. ColabDesign for refinement
3. ProteinMPNN diversification
4. AF2 final validation

### Iterative refinement
1. Run initial campaign
2. Analyze failures
3. Adjust hotspots/parameters
4. Repeat with insights
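
For the "analyze failures" step, a quick tally of which threshold rejects the most designs shows where to adjust first. The sketch below reuses the structure-based thresholds from Phase 5 and assumes each design is a dictionary keyed by the same metric names. `failure_counts` is a hypothetical helper.

```python
# Pass conditions mirroring the Phase 5 thresholds (pLDDT assumed on a 0-1 scale).
THRESHOLDS = {
    "pLDDT": lambda v: v > 0.85,
    "ipTM": lambda v: v > 0.50,
    "PAE_interface": lambda v: v < 10,
    "scRMSD": lambda v: v < 2.0,
}

def failure_counts(designs):
    """Count, per metric, how many designs miss the threshold."""
    counts = {name: 0 for name in THRESHOLDS}
    for design in designs:
        for name, passes in THRESHOLDS.items():
            if not passes(design[name]):
                counts[name] += 1
    return counts
```

If most rejections come from ipTM, for example, the Common issues table points to hotspot choice; if they come from scRMSD, the backbones themselves are the more likely cause.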
