struct-predictor
Protein structure prediction with Boltz-2. Accepts YAML inputs (single protein or multi-chain complex), runs boltz predict, extracts per-residue pLDDT and PAE confidence, and writes a markdown report with figures.
Best use case
struct-predictor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using struct-predictor should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it at .claude/skills/struct-predictor/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How struct-predictor Compares
| Feature / Agent | struct-predictor | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Protein structure prediction with Boltz-2. Accepts YAML inputs (single protein or multi-chain complex), runs boltz predict, extracts per-residue pLDDT and PAE confidence, and writes a markdown report with figures.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Struct Predictor
You are the **Struct Predictor**, a specialised agent for protein structure prediction using Boltz-2.
## Core Capabilities
1. **Structure Prediction**: Run Boltz-2 locally on a YAML input
2. **Confidence Extraction**: Per-residue pLDDT (from CIF B-factors) and PAE matrix (from confidence JSON)
3. **Report Generation**: Markdown with pLDDT line plot, PAE heatmap, band breakdown, and reproducibility bundle
4. **Demo Mode**: Trp-cage miniprotein (20 residues, PDB 1L2Y) — runs immediately, no input required
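The confidence-extraction step reads pLDDT from the B-factor column of the predicted CIF. Below is a minimal sketch of that idea. `plddt_from_cif` is a hypothetical helper, not the skill's actual code.

It makes these assumptions:
- The file is a single-model mmCIF whose `_atom_site` loop has `label_asym_id`, `label_seq_id` and `B_iso_or_equiv` columns.
- No quoted values contain spaces.
- B-factors are already on the 0–100 pLDDT scale.

It averages the B-factors of all atoms in each polymer residue and skips non-polymer rows (`label_seq_id` of `.`). For anything beyond simple protein output, a full mmCIF parser such as Biopython's `MMCIF2Dict` or gemmi is the safer choice.

```python
from collections import defaultdict


def plddt_from_cif(cif_text):
    """Return {(chain_id, residue_number): mean B-factor} from an mmCIF _atom_site loop.

    Hypothetical helper: assumes one model, whitespace-separated atom rows and
    no quoted values containing spaces.
    """
    columns = []  # _atom_site column names, in file order
    in_rows = False  # becomes True once the first atom row has been read
    per_residue = defaultdict(list)
    for raw in cif_text.splitlines():
        line = raw.strip()
        if line.startswith("_atom_site."):
            columns.append(line.split()[0][len("_atom_site."):])
            continue
        if not columns:
            continue  # still before the _atom_site loop
        if not line or line.startswith(("#", "loop_", "_")):
            if in_rows:
                break  # first non-row line after the atoms closes the loop
            continue
        in_rows = True
        fields = dict(zip(columns, line.split()))
        if fields["label_seq_id"] == ".":
            continue  # ligand or other non-polymer atom
        key = (fields["label_asym_id"], int(fields["label_seq_id"]))
        per_residue[key].append(float(fields["B_iso_or_equiv"]))
    # Average over every atom of each residue; dicts keep residue (file) order
    return {key: sum(vals) / len(vals) for key, vals in per_residue.items()}
```

For example, `plddt_from_cif(Path(cif_path).read_text())` returns one value per residue, in file order, ready for a line plot.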
## CLI Reference
```bash
# Single protein or multi-chain complex (YAML)
python skills/struct-predictor/struct_predictor.py \
--input complex.yaml --output /tmp/struct_out
# Demo (Trp-cage miniprotein, PDB 1L2Y — no input needed)
python skills/struct-predictor/struct_predictor.py \
--demo --output /tmp/struct_demo
```
### Plain Text Examples
Predict the structure of a single protein from a YAML file:
python skills/struct-predictor/struct_predictor.py --input my_protein.yaml --output /tmp/struct_out
Run the built-in Trp-cage demo (no input file needed):
python skills/struct-predictor/struct_predictor.py --demo --output /tmp/struct_demo
Predict a two-chain complex:
python skills/struct-predictor/struct_predictor.py --input complex_ab.yaml --output /tmp/complex_out
## Output Structure
```
output_dir/
  boltz_results_[name]/                 # Boltz native output
    lightning_logs/                     # training/eval logs
    predictions/
      [name]/
        [name]_model_0.cif              # predicted structure (pLDDT in B-factors)
        confidence_[name]_model_0.json  # confidence scores (ptm, iptm, pae, plddt)
    processed/                          # Boltz intermediate files
  report.md                             # primary markdown report
  viewer.html                           # self-contained 3Dmol.js 3D viewer (open in browser)
  result.json                           # machine-readable summary
  figures/
    plddt.png                           # per-residue pLDDT confidence plot
    pae.png                             # PAE inter-residue error heatmap
  reproducibility/
    commands.sh                         # exact boltz predict command used
    environment.txt                     # boltz version snapshot
```
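The sketch below makes the layout above concrete. It does two things:
- computes where the Boltz files for one input should land;
- records a replayable `boltz predict` call in `reproducibility/commands.sh`.

Both helper names are hypothetical, and the skill's real script may record more options. Only the `--out_dir` and `--use_msa_server` flags of `boltz predict` are used. The `boltz_results_[name]` naming and the `model_0` index follow the tree above.

```python
import shlex
from pathlib import Path


def prediction_paths(output_dir, name, model=0):
    """Expected Boltz output files for one input, following the tree above."""
    pred_dir = Path(output_dir) / f"boltz_results_{name}" / "predictions" / name
    return {
        "cif": pred_dir / f"{name}_model_{model}.cif",
        "confidence": pred_dir / f"confidence_{name}_model_{model}.json",
    }


def write_commands_sh(input_yaml, output_dir, use_msa_server=False):
    """Write <output_dir>/reproducibility/commands.sh and return its path."""
    cmd = ["boltz", "predict", str(input_yaml), "--out_dir", str(output_dir)]
    if use_msa_server:
        cmd.append("--use_msa_server")  # only needed when the YAML omits `msa`
    repro_dir = Path(output_dir) / "reproducibility"
    repro_dir.mkdir(parents=True, exist_ok=True)
    script = repro_dir / "commands.sh"
    # shlex.join quotes each argument, so paths containing spaces replay correctly
    script.write_text("#!/usr/bin/env bash\nset -euo pipefail\n" + shlex.join(cmd) + "\n")
    return script
```

Quoting through `shlex.join` matters because input names with spaces would otherwise split into separate arguments when the script is replayed.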
## YAML Complex Format
```yaml
version: 1
sequences:
  - protein:
      id: A
      sequence: ACDEFGHIKLMNPQRSTVWY
      msa: empty  # runs offline; replace with a path to an .a3m file for MSA-guided prediction
  - protein:
      id: B
      sequence: NPQRSTVWYLSDEDFKAVFG
      msa: empty
```
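When chains come from another tool, generating this YAML can be easier than hand-editing it. The sketch below uses only the standard library, so it does not need PyYAML. `boltz_yaml` and its strict 20-letter alphabet check are hypothetical choices, not part of the skill or of Boltz.

Each chain carries its own `msa` value from the MSA Options table: `"empty"`, a path to an `.a3m` file, or `None` to omit the field. Values are written as plain YAML scalars, so MSA paths must not contain sequences such as `: ` or ` #`.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # 20 canonical one-letter codes


def boltz_yaml(chains):
    """Render a Boltz YAML input.

    chains: iterable of (chain_id, sequence, msa) where msa is "empty",
    a path to an .a3m file, or None to leave the field out.
    """
    lines = ["version: 1", "sequences:"]
    for chain_id, sequence, msa in chains:
        seq = sequence.strip().upper()
        unknown = set(seq) - STANDARD_AA
        if unknown:
            raise ValueError(f"chain {chain_id}: unexpected residues {sorted(unknown)}")
        lines += [
            "  - protein:",
            f"      id: {chain_id}",
            f"      sequence: {seq}",
        ]
        if msa is not None:
            lines.append(f"      msa: {msa}")
    return "\n".join(lines) + "\n"
```

For example, `boltz_yaml([("A", "NLYIQWLKDGGPSSGRPPPS", "empty")])` produces the single-chain form of the layout above, ready to write to a `.yaml` file and pass to `--input`.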
### MSA Options
| `msa` value | Behaviour |
|---|---|
| `msa: empty` | No MSA — fast, fully offline, suitable for short/designed sequences |
| `msa: /path/to/file.a3m` | Pre-computed MSA — best accuracy for natural proteins |
| *(omit field)* | Boltz errors unless `--use_msa_server` is passed at predict time |
## pLDDT Confidence Bands
| Band | pLDDT Range | Interpretation |
|------|------------|----------------|
| Very high | ≥ 90 | Backbone accurate to ~0.5 Å |
| High | 70–90 | Generally reliable |
| Low | 50–70 | Disordered or uncertain |
| Very low | < 50 | Likely intrinsically disordered |
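The band breakdown in the report can be reproduced from a list of per-residue scores. The sketch below uses hypothetical names (`plddt_band`, `band_breakdown`) and two assumptions:
- scores are on the 0 to 100 scale;
- a value exactly on a boundary (90, 70, 50) falls in the higher band, matching the `≥ 90` cut-off in the table.

```python
BANDS = [  # (inclusive lower bound, label), checked from highest to lowest
    (90.0, "Very high"),
    (70.0, "High"),
    (50.0, "Low"),
    (float("-inf"), "Very low"),
]


def plddt_band(score):
    """Map one pLDDT value (0-100 scale) to its confidence band."""
    for lower, label in BANDS:
        if score >= lower:
            return label
    raise ValueError(f"not a valid pLDDT score: {score!r}")  # e.g. NaN


def band_breakdown(scores):
    """Percentage of residues in each band, in table order."""
    counts = {label: 0 for _, label in BANDS}
    for score in scores:
        counts[plddt_band(score)] += 1
    total = sum(counts.values())
    return {label: (100.0 * n / total if total else 0.0) for label, n in counts.items()}
```

If your scores are on a 0 to 1 scale, multiply them by 100 before calling these helpers.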
## Demo Data
| Item | Value |
|------|-------|
| File | `skills/struct-predictor/demo_data/trpcage.yaml` |
| Sequence | `NLYIQWLKDGGPSSGRPPPS` |
| Name | Trp-cage miniprotein |
| Length | 20 residues |
| PDB reference | 1L2Y |
## Dependencies
```bash
uv pip install boltz -U # CPU
uv pip install "boltz[cuda]" -U # GPU (recommended)
uv pip install numpy matplotlib pyyaml
```
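After installing, the standard library's `importlib.metadata` is a quick way to confirm which versions are active. It can also capture something similar to `reproducibility/environment.txt`. This is only a sketch: the package list mirrors the install commands above, and the skill's own snapshot format may differ.

```python
from importlib.metadata import PackageNotFoundError, version


def environment_snapshot(packages=("boltz", "numpy", "matplotlib", "pyyaml")):
    """One 'name==version' line per package, or a 'not installed' marker."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(f"{name}: not installed")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(environment_snapshot(), end="")
```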
## Citations
- Passaro S et al. (2025) *Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction*. bioRxiv. doi:10.1101/2025.06.14.659707. PMID: 40667369; PMCID: PMC12262699.
- Wohlwend J et al. (2024) *Boltz-1: Democratizing Biomolecular Interaction Modeling*. bioRxiv. doi:10.1101/2024.11.19.624167
- Jumper J et al. (2021) *Highly accurate protein structure prediction with AlphaFold*. Nature. doi:10.1038/s41586-021-03819-2 (source of the pLDDT definition)
Related Skills
wes-clinical-report-es
Generates professional clinical PDF reports in Spanish from WES (Whole Exome Sequencing) data with clinical interpretation, pharmacogenomic alerts, and follow-up recommendations.
wes-clinical-report-en
Generates professional clinical PDF reports in English from WES (Whole Exome Sequencing) data with clinical interpretation summary, pharmacogenomic alerts, and follow-up recommendations.
vcf-annotator
Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.
variant-annotation
Annotate VCF variants with Ensembl VEP REST, ClinVar significance, gnomAD/population frequency context, and prioritized variant ranking.
ukb-navigator
Semantic search across UK Biobank's 12,000+ data fields and publications — find the right variables for your research question.
target-validation-scorer
Evidence-grounded target validation scoring with GO/NO-GO decisions for drug discovery campaigns
soul2dna
Compile SOUL.md character profiles into synthetic diploid genomes (.genome.json) via trait-to-allele mapping
seq-wrangler
Sequence QC, alignment, and BAM processing. Wraps FastQC, BWA/Bowtie2, SAMtools for automated read-to-BAM pipelines.
scrna-orchestrator
Local Scanpy pipeline for single-cell RNA-seq QC, optional doublet detection, clustering, marker discovery, optional CellTypist annotation, optional latent downstream mode from integrated.h5ad/X_scvi, and optional dataset-level plus within-cluster contrastive marker analysis from raw-count .h5ad or 10x Matrix Market input.
scrna-embedding
Local scVI/scANVI-based single-cell latent embedding and batch-aware integration from raw-count .h5ad or 10x Matrix Market input, with stable integrated AnnData export for downstream latent analysis.
rnaseq-de
Differential expression analysis for bulk RNA-seq and pseudo-bulk count matrices with QC, PCA, and contrast testing.
repro-enforcer
Export any bioinformatics analysis as a reproducible bundle with Conda environment, Singularity container definition, and Nextflow pipeline.