bio-structural-biology-modern-structure-prediction

Predict protein structures using modern ML models including AlphaFold3, ESMFold, Chai-1, and Boltz-1. Use when predicting structures for novel proteins, protein complexes, or when comparing predictions across multiple methods.

by FreedomIntelligence

Best use case

bio-structural-biology-modern-structure-prediction is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-structural-biology-modern-structure-prediction should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/bio-structural-biology-modern-structure-prediction/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-structural-biology-modern-structure-prediction/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/bio-structural-biology-modern-structure-prediction/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How bio-structural-biology-modern-structure-prediction Compares

Feature / Agent	bio-structural-biology-modern-structure-prediction	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

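What does this skill do?

It predicts protein structures with modern ML models (AlphaFold3, ESMFold, Chai-1, and Boltz-1). It covers novel proteins, protein complexes, and comparison of predictions across methods.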

Where can I find the source code?

The source code is in the FreedomIntelligence/OpenClaw-Medical-Skills repository on GitHub, under skills/bio-structural-biology-modern-structure-prediction.

SKILL.md Source

## Version Compatibility

Reference examples tested with: BioPython 1.83+, numpy 1.26+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
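
The version check described above can be scripted. This is a minimal sketch using only the standard library; the package list is an assumption (PyPI distribution names for the libraries used in this document), and the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Assumed list of distributions; adjust to the tools you actually use
for name, version in installed_versions(['biopython', 'numpy', 'requests']).items():
    print(f'{name}: {version or "not installed"}')
```

A `None` entry means the package is missing, which is worth resolving before adapting any of the examples below.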

# Modern Structure Prediction

**"Predict the structure of my protein"** → Run ML-based structure prediction using ESMFold (single-sequence, fast), AlphaFold3 (MSA-based, highest accuracy), Chai-1, or Boltz-1 and compare predictions across methods.
- Python: ESMFold API via `requests`, local ESMFold with `esm.pretrained`

Predict protein structures using state-of-the-art machine learning models. This covers cloud APIs, local installations, and interpretation of results.

## Model Comparison

| Model | Complexes | Ligands | Speed | Access |
|-------|-----------|---------|-------|--------|
| AlphaFold3 | Yes | Yes | Slow | Server, or local (weights on request) |
| ESMFold | No | No | Fast | API or local |
| Chai-1 | Yes | Yes | Moderate | Local or API |
| Boltz-1 | Yes | Yes | Moderate | Local |
| ColabFold | No* | No | Moderate | Colab/local |

*ColabFold can predict complexes with AlphaFold-Multimer.

## ESMFold (Fastest Single-Chain)

**Goal:** Predict a protein's 3D structure from its amino acid sequence using the ESMFold language model, which requires no MSA and runs in seconds.

**Approach:** Submit the sequence to the ESMFold API (or run locally with the esm library), retrieve the predicted PDB coordinates, and assess per-residue confidence via pLDDT scores in the B-factor column.

### Via ESM Atlas API

```python
import requests

def predict_esmfold(sequence):
    '''Predict structure using ESMFold API'''
    url = 'https://api.esmatlas.com/foldSequence/v1/pdb/'
    response = requests.post(url, data=sequence, timeout=300)
    if response.status_code == 200:
        return response.text
    raise Exception(f'ESMFold failed: {response.status_code}')

sequence = 'MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH'
pdb_text = predict_esmfold(sequence)
with open('predicted.pdb', 'w') as f:
    f.write(pdb_text)
```
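
Before submitting to a remote API it can help to normalize and validate the sequence locally. The sketch below is illustrative: `clean_sequence` is a hypothetical helper, and the default `max_len=400` is an assumed limit for the public endpoint, so check the service's current constraints and adjust it.

```python
VALID_RESIDUES = set('ACDEFGHIKLMNPQRSTVWY')  # 20 standard amino acids

def clean_sequence(raw, max_len=400):
    """Drop FASTA headers and whitespace, uppercase, and validate residues.

    max_len is an assumed API limit; pass None to skip the length check.
    """
    lines = [ln.strip() for ln in raw.splitlines()]
    seq = ''.join(ln for ln in lines if not ln.startswith('>'))
    seq = seq.replace(' ', '').upper()
    if not seq:
        raise ValueError('empty sequence')
    bad = sorted(set(seq) - VALID_RESIDUES)
    if bad:
        raise ValueError(f'non-standard residues: {"".join(bad)}')
    if max_len is not None and len(seq) > max_len:
        raise ValueError(f'sequence length {len(seq)} exceeds {max_len}')
    return seq
```

The cleaned string can then be passed to `predict_esmfold`.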

### Local ESMFold

```python
import torch
import esm

def predict_esmfold_local(sequence, device='cuda'):
    '''Run ESMFold locally (requires ~16GB GPU memory)'''
    model = esm.pretrained.esmfold_v1()
    model = model.eval().to(device)

    with torch.no_grad():
        output = model.infer_pdb(sequence)
    return output

# Extract pLDDT from ESMFold output
def extract_esmfold_plddt(pdb_text):
    plddt = {}
    for line in pdb_text.split('\n'):
        if line.startswith('ATOM') and line[12:16].strip() == 'CA':
            resnum = int(line[22:26])
            bfactor = float(line[60:66])
            plddt[resnum] = bfactor
    return plddt
```
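
The per-residue dictionary returned by `extract_esmfold_plddt` can be condensed into a short summary. This is a minimal sketch, assuming pLDDT on the 0-100 scale (multiply by 100 first if your output uses 0-1) and the widely used AlphaFold confidence bands (90, 70, 50); `summarize_plddt` is a hypothetical helper name.

```python
def summarize_plddt(plddt):
    """Summarize a {residue_number: pLDDT} mapping on the 0-100 scale."""
    values = list(plddt.values())
    if not values:
        raise ValueError('no pLDDT values supplied')

    # Count residues per confidence band
    bands = {'very_high': 0, 'confident': 0, 'low': 0, 'very_low': 0}
    for v in values:
        if v >= 90:
            bands['very_high'] += 1
        elif v >= 70:
            bands['confident'] += 1
        elif v >= 50:
            bands['low'] += 1
        else:
            bands['very_low'] += 1

    n = len(values)
    return {
        'mean': sum(values) / n,
        'fractions': {name: count / n for name, count in bands.items()},
        'low_confidence_residues': sorted(r for r, v in plddt.items() if v < 70),
    }
```

Residues listed under `low_confidence_residues` are often flexible or disordered regions and are usually excluded before structural comparison.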

## AlphaFold3 (Server)

Run AlphaFold3 predictions through the web server at alphafoldserver.com: prepare a job JSON, upload it, and download the results once the job finishes.

### Prepare Input JSON

```python
import json

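def create_af3_input(sequences, job_name='prediction'):
    '''Create AlphaFold Server input JSON (alphafoldserver dialect)'''
    # Each protein entity is wrapped in a "proteinChain" object.
    # Verify the schema against the server's current JSON documentation.
    entities = []
    for seq in sequences:
        entities.append({
            'proteinChain': {
                'sequence': seq,
                'count': 1
            }
        })

    job = {
        'name': job_name,
        'modelSeeds': [],            # Empty list lets the server pick a seed
        'sequences': entities,
        'dialect': 'alphafoldserver',
        'version': 1
    }
    # The server expects a list of jobs at the top level
    return json.dumps([job], indent=2)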

# Single protein
input_json = create_af3_input(['MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH'])

# Protein complex
input_json = create_af3_input([
    'MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH',
    'MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSS'
])
```

### Process AF3 Results

```python
import json

def analyze_af3_result(result_dir):
    '''Analyze AlphaFold3 prediction results'''
    # Load summary. Downloaded result files are often prefixed with the job
    # name and suffixed with a sample index; adjust the filename to match.
    with open(f'{result_dir}/summary_confidences.json') as f:
        summary = json.load(f)

    # Extract confidence metrics
    iptm = summary.get('iptm')              # Interface pTM (complexes only)
    ptm = summary.get('ptm')                # Predicted TM-score
    ranking = summary.get('ranking_score')

    # Compare against None so that a legitimate score of 0.0 is still printed
    print(f'pTM: {ptm:.3f}' if ptm is not None else 'pTM: N/A')
    print(f'ipTM: {iptm:.3f}' if iptm is not None else 'ipTM: N/A')
    print(f'Ranking score: {ranking:.3f}' if ranking is not None else 'Ranking score: N/A')

    return summary
```

### AF3 Confidence Interpretation

| Metric | Range | Interpretation |
|--------|-------|----------------|
| pTM | 0-1 | Overall structure confidence |
| ipTM | 0-1 | Interface prediction quality |
| pLDDT | 0-100 | Per-residue confidence |
| PAE | 0-30 Å | Predicted aligned error between residue pairs (lower is better) |
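
The global scores in the table can be turned into a quick triage label. The thresholds in this sketch (pTM above 0.5; ipTM above 0.8 confident, below 0.6 likely failed) are commonly quoted rules of thumb and are assumptions here, not values taken from this document; `interpret_af3_confidence` is a hypothetical helper.

```python
def interpret_af3_confidence(ptm=None, iptm=None):
    """Map pTM / ipTM (0-1) to coarse labels using rule-of-thumb cutoffs."""
    notes = {}
    if ptm is not None:
        # Assumed cutoff: pTM > 0.5 suggests the overall fold is plausible
        notes['ptm'] = 'fold likely correct' if ptm > 0.5 else 'fold uncertain'
    if iptm is not None:
        # Assumed cutoffs: > 0.8 confident, 0.6-0.8 ambiguous, < 0.6 likely failed
        if iptm > 0.8:
            notes['iptm'] = 'confident interface'
        elif iptm >= 0.6:
            notes['iptm'] = 'ambiguous interface'
        else:
            notes['iptm'] = 'interface likely wrong'
    return notes

print(interpret_af3_confidence(ptm=0.62, iptm=0.85))
```

Labels like these are only a first filter; inspect pLDDT and PAE for the regions you care about before drawing conclusions.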

## Chai-1 (Local Open-Source)

### Installation

```bash
pip install chai-lab
```

### Basic Prediction

```python
from chai_lab.chai1 import run_inference
from pathlib import Path

def predict_chai1(fasta_path, output_dir='chai_output'):
    '''Run Chai-1 structure prediction'''
    # chai-lab expects the output directory to be empty (or not yet created)
    Path(output_dir).mkdir(exist_ok=True)

    candidates = run_inference(
        fasta_file=Path(fasta_path),
        output_dir=Path(output_dir),
        num_trunk_recycles=3,        # 3: Standard. Use 5+ for difficult targets.
        num_diffn_timesteps=200,     # 200: Standard. 500 for higher quality.
        seed=42,
        device='cuda:0'
    )
    return candidates

# candidates.cif_paths lists the predicted structures (mmCIF files)
# candidates.ranking_data holds per-candidate scores for ranking them

### Chai-1 with Ligands

```python
# Chai-1 supports protein-ligand complexes
# Each FASTA header is written as >entity_type|name=label; ligands are given as SMILES

def create_chai_fasta_with_ligand(protein_seq, ligand_smiles, output_file):
    '''Create Chai-1 input with protein and ligand'''
    with open(output_file, 'w') as f:
        f.write('>protein|name=chain_A\n')
        f.write(f'{protein_seq}\n')
        f.write('>ligand|name=chain_B\n')
        f.write(f'{ligand_smiles}\n')
```

## Boltz-1 (Open-Source Complex Prediction)

### Installation

```bash
pip install boltz
```

### Basic Prediction

```python
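import subprocess
from pathlib import Path

def predict_boltz1(sequences, output_dir='boltz_output', fasta_path='boltz_input.fasta'):
    '''Run Boltz-1 through its command-line interface'''
    # The boltz package is driven by the `boltz predict` CLI rather than a
    # Python model class. Confirm flags with `boltz predict --help`.
    # FASTA headers follow >CHAIN_ID|ENTITY_TYPE (chains A, B, C, ...)
    with open(fasta_path, 'w') as f:
        for i, seq in enumerate(sequences):
            f.write(f'>{chr(ord("A") + i)}|protein\n{seq}\n')

    subprocess.run([
        'boltz', 'predict', fasta_path,
        '--out_dir', output_dir,
        '--use_msa_server',          # Build MSAs with the MMseqs2 server
        '--recycling_steps', '3',    # 3: Standard. Increase for difficult targets.
        '--sampling_steps', '200',   # 200: Standard. Increase for higher quality.
    ], check=True)
    return Path(output_dir)
```

### Boltz-1 for Complexes

```python
# Boltz-1 handles heteromeric complexes
def predict_complex_boltz(chain_sequences):
    '''Predict protein complex with Boltz-1'''
    # One sequence per chain; reuses predict_boltz1 defined above
    out_dir = predict_boltz1(chain_sequences, output_dir='complex_output')

    # Interface metrics (e.g. ipTM) are written as confidence_*.json files;
    # the exact folder layout depends on the installed boltz version
    return sorted(out_dir.rglob('confidence_*.json'))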
```

## ColabFold (AlphaFold2 + MMseqs2)

### Command Line

```bash
# Install ColabFold with the AlphaFold dependencies (a GPU build of JAX is also needed)
pip install "colabfold[alphafold]"

# Run prediction
colabfold_batch input.fasta output_dir/

# With custom templates
colabfold_batch input.fasta output_dir/ --templates

# For complexes (use : to separate chains)
# Create FASTA like: >complex\nSEQUENCE1:SEQUENCE2
```
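
The colon-separated complex format mentioned in the comments above can be generated programmatically. This is a minimal sketch; `colabfold_complex_record` is a hypothetical helper that only builds the FASTA text and does not call ColabFold.

```python
def colabfold_complex_record(name, chains):
    """Build one FASTA record with chains joined by ':' for a complex."""
    cleaned = [c.strip().upper() for c in chains if c.strip()]
    if not cleaned:
        raise ValueError('at least one chain sequence is required')
    return f'>{name}\n{":".join(cleaned)}\n'

record = colabfold_complex_record('complex', ['MVLSPADK', 'MGHFTEEDK'])
print(record, end='')
# To use it: write `record` to input.fasta, then run colabfold_batch on that file
```

Repeating the same sequence in the list describes a homo-oligomer, for example two copies for a homodimer.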

### Python API

```python
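from colabfold.batch import get_queries, run

def predict_colabfold(fasta_file, output_dir, use_templates=False):
    '''Run ColabFold prediction'''
    # get_queries parses the input and reports whether it describes a complex.
    # Signatures vary between releases; check help(run) for your version.
    queries, is_complex = get_queries(fasta_file)
    run(
        queries=queries,
        result_dir=output_dir,
        use_templates=use_templates,
        num_models=5,           # 5: Standard. Use 1 for quick predictions.
        num_recycles=3,         # 3: Standard. Increase for multimers.
        model_order=[1,2,3,4,5],
        is_complex=is_complex
    )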
```

## Comparing Predictions

```python
from Bio.PDB import PDBParser, Superimposer
import numpy as np

def compare_predictions(pdb_files, labels=None):
    '''Compare multiple structure predictions'''
    parser = PDBParser(QUIET=True)
    structures = [parser.get_structure(f'model_{i}', f) for i, f in enumerate(pdb_files)]

    # Extract CA atoms from all chains of the first model
    def get_ca_atoms(struct):
        return [r['CA'] for r in struct[0].get_residues() if 'CA' in r]

    all_atoms = [get_ca_atoms(s) for s in structures]

    # Pairwise RMSD
    n = len(structures)
    rmsd_matrix = np.zeros((n, n))

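    for i in range(n):
        for j in range(i+1, n):
            # Pair atoms by position and truncate to the shorter structure.
            # This assumes identical sequences with the same residue order.
            min_len = min(len(all_atoms[i]), len(all_atoms[j]))
            super_imposer = Superimposer()
            super_imposer.set_atoms(all_atoms[i][:min_len], all_atoms[j][:min_len])
            rmsd_matrix[i,j] = rmsd_matrix[j,i] = super_imposer.rms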

    return rmsd_matrix

# Compare ESMFold vs AlphaFold3 vs Chai-1
# (AlphaFold3 and Chai-1 write mmCIF; convert to PDB first or parse with MMCIFParser)
rmsd = compare_predictions(['esmfold.pdb', 'af3.pdb', 'chai1.pdb'])
print('RMSD matrix:')
print(rmsd)
```
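
Once the pairwise matrix is available, a short summary makes it easier to see which predictions agree. The sketch below is illustrative: `summarize_rmsd` is a hypothetical helper, it accepts either nested lists or a NumPy array, and it treats the model with the lowest mean RMSD to the others as the consensus.

```python
def summarize_rmsd(rmsd_matrix, labels):
    """Report the closest pair, the farthest pair, and a consensus model."""
    n = len(labels)
    if n < 2:
        raise ValueError('need at least two structures to compare')

    # Upper-triangle pairs as (rmsd, label_i, label_j)
    pairs = [
        (float(rmsd_matrix[i][j]), labels[i], labels[j])
        for i in range(n) for j in range(i + 1, n)
    ]
    # Mean RMSD of each model to all other models
    mean_to_others = {
        labels[i]: sum(float(rmsd_matrix[i][j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    }
    return {
        'most_similar': min(pairs),
        'most_different': max(pairs),
        'consensus': min(mean_to_others, key=mean_to_others.get),
    }

# Toy values for illustration only, not real prediction results
example = [[0.0, 1.0, 4.0], [1.0, 0.0, 3.0], [4.0, 3.0, 0.0]]
print(summarize_rmsd(example, ['esmfold', 'af3', 'chai1']))
```

Large disagreement between methods often coincides with low-confidence regions, so it is worth checking pLDDT for the divergent segments.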

## When to Use Each Model

| Scenario | Recommended Model |
|----------|-------------------|
| Quick single-chain prediction | ESMFold (API) |
| Highest accuracy single chain | AlphaFold3 or ColabFold |
| Protein-protein complex | AlphaFold3, Chai-1, or Boltz-1 |
| Protein-ligand complex | AlphaFold3 or Chai-1 |
| No GPU available | ESMFold API or AlphaFold3 server |
| Large-scale screening | ESMFold (local) |
| Open-source requirement | Chai-1 or Boltz-1 |

## Memory Requirements

| Model | GPU Memory | Notes |
|-------|------------|-------|
| ESMFold | ~16 GB | Sequence length dependent |
| ColabFold | ~8-16 GB | Model size dependent |
| Chai-1 | ~24 GB | Complex size dependent |
| Boltz-1 | ~24 GB | Complex size dependent |
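
The table can be used as a quick filter for a given GPU. This sketch copies the upper-bound figures from the table into a dictionary; `models_that_fit` is a hypothetical helper, and real requirements grow with sequence or complex size.

```python
# Approximate upper-bound GPU memory (GB) taken from the table above
GPU_MEMORY_GB = {'ESMFold': 16, 'ColabFold': 16, 'Chai-1': 24, 'Boltz-1': 24}

def models_that_fit(available_gb, requirements=GPU_MEMORY_GB):
    """Return models whose approximate requirement fits in available_gb."""
    return sorted(m for m, need in requirements.items() if need <= available_gb)

print(models_that_fit(16))
print(models_that_fit(24))
```

With no suitable GPU, the hosted options listed under "When to Use Each Model" remain available.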

## Related Skills

- alphafold-predictions - Download pre-computed AlphaFold structures
- structure-io - Parse and write structure files
- geometric-analysis - RMSD, superimposition, distance calculations
- structure-navigation - Navigate predicted structure hierarchy
