struct-predictor

Protein structure prediction with Boltz-2. Accepts YAML inputs (single protein or multi-chain complex), runs boltz predict, extracts per-residue pLDDT and PAE confidence, and writes a markdown report with figures.


by ClawBio

View on GitHub: https://github.com/ClawBio/ClawBio/tree/main/skills/struct-predictor

Best use case

struct-predictor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using struct-predictor should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/struct-predictor/SKILL.md --create-dirs "https://raw.githubusercontent.com/ClawBio/ClawBio/main/skills/struct-predictor/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/struct-predictor/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill


Frequently Asked Questions

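What does this skill do?

It runs Boltz-2 protein structure prediction on a YAML input (a single protein or a multi-chain complex), extracts per-residue pLDDT and PAE confidence, and writes a markdown report with figures.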

Where can I find the source code?

The source code is in the ClawBio/ClawBio repository on GitHub, under skills/struct-predictor/.

SKILL.md Source

# Struct Predictor

You are the **Struct Predictor**, a specialised agent for protein structure prediction using Boltz-2.

## Core Capabilities

1. **Structure Prediction**: Run Boltz-2 locally on a YAML input
2. **Confidence Extraction**: Per-residue pLDDT (from CIF B-factors) and PAE matrix (from confidence JSON)
3. **Report Generation**: Markdown with pLDDT line plot, PAE heatmap, band breakdown, and reproducibility bundle
4. **Demo Mode**: Trp-cage miniprotein (20 residues, PDB 1L2Y); runs without any input file
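
As an illustration of what can be done with the PAE matrix from capability 2, the sketch below averages PAE within and between the chains of a complex. It is not taken from struct_predictor.py: `chain_pair_mean_pae` is a hypothetical helper, and it assumes the matrix is already loaded as a square list of lists whose residues are ordered chain by chain, in the same order as the chains in the input YAML.

```python
from itertools import accumulate


def chain_pair_mean_pae(
    pae: list[list[float]], chain_lengths: dict[str, int]
) -> dict[tuple[str, str], float]:
    """Mean PAE for every ordered pair of chains (row chain, column chain).

    Assumptions (hypothetical helper): residues are ordered chain by chain
    in the same order as `chain_lengths`, and row i is the residue aligned
    on while column j is the residue whose error is scored.
    """
    if any(n <= 0 for n in chain_lengths.values()):
        raise ValueError("chain lengths must be positive")
    total = sum(chain_lengths.values())
    if len(pae) != total or any(len(row) != total for row in pae):
        raise ValueError(f"expected a {total}x{total} PAE matrix")

    # Half-open residue index range covered by each chain.
    ends = list(accumulate(chain_lengths.values()))
    starts = [0] + ends[:-1]
    spans = {cid: range(s, e) for cid, s, e in zip(chain_lengths, starts, ends)}

    means: dict[tuple[str, str], float] = {}
    for a, rows in spans.items():
        for b, cols in spans.items():
            block = [pae[i][j] for i in rows for j in cols]
            means[(a, b)] = sum(block) / len(block)
    return means
```

Low off-diagonal values such as `("A", "B")` indicate that the relative placement of the two chains is predicted confidently; high values suggest the interface is uncertain.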

## CLI Reference

```bash
# Single protein or multi-chain complex (YAML)
python skills/struct-predictor/struct_predictor.py \
  --input complex.yaml --output /tmp/struct_out

# Demo (Trp-cage miniprotein, PDB 1L2Y — no input needed)
python skills/struct-predictor/struct_predictor.py \
  --demo --output /tmp/struct_demo
```
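
The two invocations imply an interface with a required `--output` and either `--input` or `--demo`. A minimal argparse sketch of that interface follows; treating `--input` and `--demo` as mutually exclusive is an assumption, and the real struct_predictor.py may define its arguments differently.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the struct_predictor.py CLI."""
    parser = argparse.ArgumentParser(
        prog="struct_predictor.py",
        description="Boltz-2 structure prediction with a pLDDT/PAE report",
    )
    # Exactly one input source: a user YAML file or the bundled demo (assumed).
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--input", type=Path, help="Boltz YAML input file")
    source.add_argument(
        "--demo", action="store_true", help="run the Trp-cage demo (PDB 1L2Y)"
    )
    parser.add_argument(
        "--output", type=Path, required=True, help="directory for all outputs"
    )
    return parser
```

For example, `build_parser().parse_args(["--demo", "--output", "/tmp/struct_demo"])` yields `demo=True`, `input=None`, while passing both `--demo` and `--input` makes argparse exit with a usage error.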

### Plain Text Examples

Predict the structure of a single protein from a YAML file:

    python skills/struct-predictor/struct_predictor.py --input my_protein.yaml --output /tmp/struct_out

Run the built-in Trp-cage demo (no input file needed):

    python skills/struct-predictor/struct_predictor.py --demo --output /tmp/struct_demo

Predict a two-chain complex:

    python skills/struct-predictor/struct_predictor.py --input complex_ab.yaml --output /tmp/complex_out

## Output Structure

```
output_dir/
  boltz_results_[name]/                    # Boltz native output
    lightning_logs/                        # PyTorch Lightning run logs
    predictions/
      [name]/
        [name]_model_0.cif                 # predicted structure (pLDDT in B-factors)
        confidence_[name]_model_0.json     # confidence scores (ptm, iptm, pae, plddt)
    processed/                             # Boltz intermediate files
  report.md                                # primary markdown report
  viewer.html                              # self-contained 3Dmol.js 3D viewer (open in browser)
  result.json                              # machine-readable summary
  figures/
    plddt.png                              # per-residue pLDDT confidence plot
    pae.png                                # PAE inter-residue error heatmap
  reproducibility/
    commands.sh                            # exact boltz predict command used
    environment.txt                        # boltz version snapshot
```
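
The tree notes that the predicted structure stores pLDDT in the B-factor column. A minimal sketch of recovering per-residue values from the `_atom_site` loop of such an mmCIF file follows. It assumes a single `_atom_site` loop, one atom per line, and no quoted values containing spaces (true for standard protein atom names, but this is not a general mmCIF parser), and it takes one value per residue from the Cα atom; `per_residue_plddt` is a hypothetical helper. A dedicated parser such as Biopython's `MMCIF2Dict` or gemmi is the safer choice for arbitrary files.

```python
def per_residue_plddt(cif_text: str) -> list[tuple[str, int, float]]:
    """Return (chain, residue number, pLDDT) for each protein CA atom.

    Minimal parser, assuming one `_atom_site` loop, one atom per line and
    no quoted values with spaces. pLDDT is read from `B_iso_or_equiv`.
    """
    columns: list[str] = []
    rows: list[list[str]] = []
    state = "scan"  # scan -> loop -> header -> data
    for raw in cif_text.splitlines():
        line = raw.strip()
        if state == "scan":
            if line == "loop_":
                state = "loop"
            continue
        if state == "loop":
            if line.startswith("_atom_site."):
                columns.append(line.split(".", 1)[1])
                state = "header"
            else:
                state = "scan"  # a loop for some other category
            continue
        if state == "header":
            if line.startswith("_atom_site."):
                columns.append(line.split(".", 1)[1])
                continue
            state = "data"  # first data row: fall through and record it
        if not line:
            continue
        if line.startswith(("#", "_", "loop_", "data_")):
            break  # end of the atom_site loop
        rows.append(line.split())

    col = {name: i for i, name in enumerate(columns)}
    chain_key = "auth_asym_id" if "auth_asym_id" in col else "label_asym_id"
    seq_key = "auth_seq_id" if "auth_seq_id" in col else "label_seq_id"
    result = []
    for fields in rows:
        if len(fields) != len(columns):
            raise ValueError(f"unexpected field count in row: {fields}")
        if "group_PDB" in col and fields[col["group_PDB"]] != "ATOM":
            continue  # skip HETATM (a calcium ion is also named CA)
        if fields[col["label_atom_id"]] != "CA":
            continue
        result.append((
            fields[col[chain_key]],
            int(fields[col[seq_key]]),
            float(fields[col["B_iso_or_equiv"]]),
        ))
    return result
```

Applied to the text of `[name]_model_0.cif`, the returned values are the kind of per-residue series a pLDDT line plot would use.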

## YAML Complex Format

```yaml
version: 1
sequences:
  - protein:
      id: A
      sequence: ACDEFGHIKLMNPQRSTVWY
      msa: empty        # runs offline; replace with a path to a .a3m file for MSA-guided prediction
  - protein:
      id: B
      sequence: NPQRSTVWYLSDEDFKAVFG
      msa: empty
```
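
For scripted runs, the same layout can be produced from Python without a YAML library. In this sketch `boltz_yaml` is a hypothetical helper, and rejecting anything outside the 20 standard one-letter amino acid codes is a simplifying assumption (Boltz itself may accept more).

```python
from typing import Optional

STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def boltz_yaml(chains: dict[str, str], msa: Optional[str] = "empty") -> str:
    """Render protein chains in the Boltz YAML layout shown above.

    `chains` maps chain ID -> one-letter sequence (dict keys keep IDs
    unique). The same `msa` value is written for every chain; None omits it.
    """
    lines = ["version: 1", "sequences:"]
    for chain_id, seq in chains.items():
        seq = seq.strip().upper()
        if not seq:
            raise ValueError(f"chain {chain_id}: empty sequence")
        unknown = sorted(set(seq) - STANDARD_AA)
        if unknown:
            # Standard-residue-only check is an assumption of this sketch.
            raise ValueError(f"chain {chain_id}: non-standard residues {unknown}")
        lines += ["  - protein:", f"      id: {chain_id}", f"      sequence: {seq}"]
        if msa is not None:
            lines.append(f"      msa: {msa}")
    return "\n".join(lines) + "\n"
```

Because one `msa` value is written for every chain, this helper suits `msa: empty`; for MSA-guided complexes each chain would normally point to its own .a3m file.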

### MSA Options

| `msa` value | Behaviour |
|---|---|
| `msa: empty` | No MSA — fast, fully offline, suitable for short/designed sequences |
| `msa: /path/to/file.a3m` | Pre-computed MSA — best accuracy for natural proteins |
| *(omit field)* | Boltz errors unless `--use_msa_server` is passed at predict time |
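
The last row of the table can be checked before Boltz is called. The sketch below takes the input already parsed into a dict (for example with PyYAML's `yaml.safe_load`), decides whether any protein entry lacks an `msa` field, and builds a `boltz predict` argument list accordingly. `boltz predict`, `--out_dir` and `--use_msa_server` are Boltz CLI options; the helper names `needs_msa_server` and `boltz_command` are hypothetical, and struct_predictor.py may assemble its command differently.

```python
import shlex
from typing import Any


def needs_msa_server(config: dict[str, Any]) -> bool:
    """True if any protein entry omits the `msa` field."""
    for entry in config.get("sequences", []):
        protein = entry.get("protein")  # only protein entries carry an MSA
        if protein is not None and "msa" not in protein:
            return True
    return False


def boltz_command(yaml_path: str, out_dir: str, config: dict[str, Any]) -> list[str]:
    """Argument list for `boltz predict`, adding the MSA server flag if needed."""
    cmd = ["boltz", "predict", yaml_path, "--out_dir", out_dir]
    if needs_msa_server(config):
        cmd.append("--use_msa_server")
    return cmd
```

`shlex.join(boltz_command(...))` turns the list into a shell-quoted string, the kind of line that could be recorded in reproducibility/commands.sh.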

## pLDDT Confidence Bands

| Band | pLDDT Range | Interpretation |
|------|------------|----------------|
| Very high | ≥ 90 | Backbone accurate to ~0.5 Å |
| High | 70–90 | Generally reliable |
| Low | 50–70 | Disordered or uncertain |
| Very low | < 50 | Likely intrinsically disordered |
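
The band breakdown in the report can be computed from the per-residue values. This sketch assumes each lower bound is inclusive (a residue at exactly 70 counts as High, at exactly 50 as Low), since the table does not say which side the boundaries fall on, and it assumes scores on the 0–100 scale used above; `plddt_band` and `band_breakdown_markdown` are hypothetical names.

```python
from collections import Counter

# (inclusive lower bound, label), checked from highest to lowest.
BANDS = [(90.0, "Very high"), (70.0, "High"), (50.0, "Low"), (float("-inf"), "Very low")]


def plddt_band(score: float) -> str:
    for lower, label in BANDS:
        if score >= lower:
            return label
    raise ValueError(f"not a valid pLDDT: {score!r}")  # only reached for NaN


def band_breakdown_markdown(scores: list[float]) -> str:
    """Markdown table with residue count and percentage per band."""
    if not scores:
        raise ValueError("no pLDDT values")
    counts = Counter(plddt_band(s) for s in scores)
    rows = ["| Band | Residues | % |", "|------|---------:|--:|"]
    for _, label in BANDS:
        n = counts[label]
        rows.append(f"| {label} | {n} | {100 * n / len(scores):.1f} |")
    return "\n".join(rows)
```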

## Demo Data

| Item | Value |
|------|-------|
| File | `skills/struct-predictor/demo_data/trpcage.yaml` |
| Sequence | `NLYIQWLKDGGPSSGRPPPS` |
| Name | Trp-cage miniprotein |
| Length | 20 residues |
| PDB reference | 1L2Y |

## Dependencies

```bash
uv pip install boltz -U          # CPU
uv pip install "boltz[cuda]" -U  # GPU (recommended)
uv pip install numpy matplotlib pyyaml
```
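
The reproducibility bundle records a Boltz version snapshot in environment.txt. One way to capture comparable information from Python is the standard-library `importlib.metadata`, which reads installed distribution versions, plus `shutil.which` to confirm that the `boltz` executable is on PATH. The package list mirrors the install commands above; the line format is an assumption, not the skill's actual environment.txt layout.

```python
import shutil
from importlib import metadata

# Distribution names for the dependencies installed above.
PACKAGES = ("boltz", "numpy", "matplotlib", "PyYAML")


def environment_snapshot(packages: tuple[str, ...] = PACKAGES) -> list[str]:
    """One 'name==version' (or 'name: not installed') line per package."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{name}: not installed")
    exe = shutil.which("boltz")
    lines.append(f"boltz executable: {exe or 'not found on PATH'}")
    return lines
```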

## Citations

- Passaro S et al. (2025) *Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction*. bioRxiv. doi:10.1101/2025.06.14.659707. PMID: 40667369; PMCID: PMC12262699.
- Wohlwend J et al. (2024) *Boltz-1: Democratizing Biomolecular Interaction Modeling*. bioRxiv. doi:10.1101/2024.11.19.624167.
- Jumper J et al. (2021) *Highly accurate protein structure prediction with AlphaFold*. Nature. doi:10.1038/s41586-021-03819-2. (Source of the pLDDT definition.)

Related Skills (from ClawBio/ClawBio)

wes-clinical-report-es

Generates professional clinical PDF reports in Spanish from WES (Whole Exome Sequencing) data with clinical interpretation, pharmacogenomic alerts, and follow-up recommendations.

wes-clinical-report-en

Generates professional clinical PDF reports in English from WES (Whole Exome Sequencing) data with clinical interpretation summary, pharmacogenomic alerts, and follow-up recommendations.

vcf-annotator

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

variant-annotation

Annotate VCF variants with Ensembl VEP REST, ClinVar significance, gnomAD/population frequency context, and prioritized variant ranking.

ukb-navigator

Semantic search across UK Biobank's 12,000+ data fields and publications: find the right variables for your research question.

target-validation-scorer

Evidence-grounded target validation scoring with GO/NO-GO decisions for drug discovery campaigns.

soul2dna

Compile SOUL.md character profiles into synthetic diploid genomes (.genome.json) via trait-to-allele mapping.

seq-wrangler

Sequence QC, alignment, and BAM processing. Wraps FastQC, BWA/Bowtie2, SAMtools for automated read-to-BAM pipelines.

scrna-orchestrator

Local Scanpy pipeline for single-cell RNA-seq QC, optional doublet detection, clustering, marker discovery, optional CellTypist annotation, optional latent downstream mode from integrated.h5ad/X_scvi, and optional dataset-level plus within-cluster contrastive marker analysis from raw-count .h5ad or 10x Matrix Market input.

scrna-embedding

Local scVI/scANVI-based single-cell latent embedding and batch-aware integration from raw-count .h5ad or 10x Matrix Market input, with stable integrated AnnData export for downstream latent analysis.

rnaseq-de

Differential expression analysis for bulk RNA-seq and pseudo-bulk count matrices with QC, PCA, and contrast testing.

repro-enforcer

Export any bioinformatics analysis as a reproducible bundle with Conda environment, Singularity container definition, and Nextflow pipeline.