alphafold
Validate protein designs using AlphaFold2 structure prediction. Use this skill when: (1) Validating designed sequences fold correctly, (2) Predicting binder-target complex structures, (3) Calculating confidence metrics (pLDDT, pTM, ipTM), (4) Self-consistency validation of designs, (5) Multi-chain complex prediction with AlphaFold-Multimer. For faster single-chain prediction, use esm. For QC thresholds, use protein-qc.
Best use case
alphafold is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using alphafold should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/alphafold/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
SKILL.md Source
# AlphaFold2 Structure Validation
## Prerequisites
| Requirement | Minimum | Recommended |
|-------------|---------|-------------|
| Python | 3.8+ | 3.10 |
| CUDA | 11.0+ | 12.0+ |
| GPU VRAM | 32GB | 40GB (A100) |
| RAM | 32GB | 64GB |
| Disk | 100GB | 500GB (for databases) |
## How to run
> **First time?** See [Installation Guide](../../docs/installation.md) to set up Modal and biomodals.
### Option 1: ColabFold (recommended for multimer)
```bash
cd biomodals
modal run modal_colabfold.py \
  --input-faa sequences.fasta \
  --out-dir output/
```
**GPU**: A100 (40GB) | **Timeout**: 3600s default
### Option 2: Local installation
```bash
git clone https://github.com/deepmind/alphafold.git
cd alphafold
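# A full local run requires additional flags, such as --data_dir and the
# database paths; see the AlphaFold README for the complete list.
python run_alphafold.py \
  --fasta_paths=query.fasta \
  --output_dir=output/ \
  --model_preset=monomer \
  --max_template_date=2026-01-01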
```
### Option 3: ESMFold (fast single-chain)
```bash
modal run modal_esmfold.py \
  --sequence "MKTAYIAKQRQISFVK..."
```
## Key parameters
| Parameter | Default | Options | Description |
|-----------|---------|---------|-------------|
| `--model_preset` | monomer | monomer/multimer | Model type |
| `--num_recycle` | 3 | 1-20 | Recycling iterations |
| `--max_template_date` | - | YYYY-MM-DD | Template cutoff |
| `--use_templates` | True | True/False | Use template search |
## Output format
```
output/
├── ranked_0.pdb # Best model
├── ranked_1.pdb # Second best
├── ranking_debug.json # Confidence scores
├── result_model_1.pkl # Full results
├── msas/ # MSA files
└── features.pkl # Input features
```
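The ranking file can be read programmatically to recover the model order. The following is a minimal sketch, not the skill's own tooling: the helper name `load_ranking` is hypothetical, and it assumes the JSON stores scores under `plddts` (monomer) or `iptm+ptm` (multimer) alongside an `order` list, as AlphaFold's own `ranking_debug.json` does. Files written by other wrappers may differ.

```python
import json
from pathlib import Path


def load_ranking(output_dir):
    """Return [(model_name, score), ...] in ranked order, best first."""
    data = json.loads((Path(output_dir) / "ranking_debug.json").read_text())
    # Assumption: scores live under "iptm+ptm" (multimer) or "plddts" (monomer),
    # and "order" lists model names from best to worst.
    score_key = "iptm+ptm" if "iptm+ptm" in data else "plddts"
    scores = data[score_key]
    return [(name, scores[name]) for name in data["order"]]
```

The first tuple in the returned list corresponds to `ranked_0.pdb`.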
### Extracting metrics
```python
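import pickle

# Load the raw result dictionary for one model from the output directory.
with open('result_model_1.pkl', 'rb') as f:
    result = pickle.load(f)

plddt = result['plddt']          # Per-residue confidence (0-100)
ptm = result.get('ptm', None)    # Absent from plain monomer models
iptm = result.get('iptm', None)  # Multimer only
pae = result.get('predicted_aligned_error', None)  # Residue-by-residue matrix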
```
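For filtering, the per-residue pLDDT array usually needs collapsing to a single number. This sketch shows one way to do that on an already-loaded result dictionary. The function name `summarize_result` is hypothetical, and it assumes the keys shown above; `ptm` and `iptm` are treated as optional.

```python
from statistics import fmean


def summarize_result(result):
    """Collapse a loaded result dictionary into scalar metrics."""
    # Mean over the per-residue pLDDT values (works for lists and NumPy arrays).
    summary = {"plddt_mean": fmean(float(x) for x in result["plddt"])}
    # pTM and ipTM are optional; report None when a key is missing.
    for key in ("ptm", "iptm"):
        value = result.get(key)
        summary[key] = None if value is None else float(value)
    return summary
```

Calling `summarize_result(result)` on the dictionary loaded above returns a small dict that is easy to write to a CSV row.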
## Sample output
### Successful run
```
$ python run_alphafold.py --fasta_paths complex.fasta --model_preset multimer
[INFO] Running MSA search...
[INFO] Running model 1/5...
[INFO] Running model 5/5...
[INFO] Relaxing structures...
Results:
ranked_0.pdb:
pLDDT: 87.3 (mean)
pTM: 0.78
ipTM: 0.62
PAE (interface): 8.5
Saved to output/
```
**What good output looks like:**
- pLDDT: > 85 (mean, on 0-100 scale) or > 0.85 (normalized)
- pTM: > 0.70
- ipTM: > 0.50 for complexes
- PAE_interface: < 10
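
The thresholds above can be applied as a simple pass/fail check. This is a sketch only: `THRESHOLDS` and `check_design` are hypothetical names, and treating any mean pLDDT of 1.0 or less as the normalized scale is an assumption made here for convenience.

```python
# Thresholds copied from the list above (pLDDT on the 0-100 scale).
THRESHOLDS = {"plddt": 85.0, "ptm": 0.70, "iptm": 0.50, "pae_interface": 10.0}


def check_design(plddt_mean, ptm, iptm=None, pae_interface=None):
    """Return {metric: passed} for every metric that was supplied."""
    # Assumption: a mean pLDDT of 1.0 or less is on the normalized 0-1 scale.
    if plddt_mean <= 1.0:
        plddt_mean *= 100.0
    checks = {
        "plddt": plddt_mean > THRESHOLDS["plddt"],
        "ptm": ptm > THRESHOLDS["ptm"],
    }
    if iptm is not None:  # Complexes only
        checks["iptm"] = iptm > THRESHOLDS["iptm"]
    if pae_interface is not None:  # Lower PAE is better
        checks["pae_interface"] = pae_interface < THRESHOLDS["pae_interface"]
    return checks


# Values taken from the sample run above.
print(check_design(87.3, 0.78, iptm=0.62, pae_interface=8.5))
```

For the final thresholds used in a campaign, defer to `protein-qc`.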
## Decision tree
```
Should I use AlphaFold?
│
├─ What are you predicting?
│ ├─ Single protein → ESMFold (faster)
│ ├─ Protein-protein complex → AlphaFold/ColabFold ✓
│ ├─ Protein + ligand → Chai or Boltz
│ └─ Batch of sequences → ColabFold ✓
│
├─ What do you need?
│ ├─ Highest accuracy → AlphaFold/ColabFold ✓
│ ├─ Fast screening → ESMFold
│ └─ MSA-free prediction → Chai or ESMFold
│
└─ Which AF2 option?
├─ Local installation → Full control, slow setup
├─ ColabFold → Easier, MSA server
└─ Modal → Recommended for batch
```
## Typical performance
| Campaign Size | Time (A100) | Cost (Modal) | Notes |
|---------------|-------------|--------------|-------|
| 100 complexes | 1-2h | ~$8 | With MSA server |
| 500 complexes | 5-10h | ~$40 | Standard campaign |
| 1000 complexes | 10-20h | ~$80 | Large campaign |
**Per-complex**: ~30-60s with MSA server.
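
The arithmetic behind the table can be sketched as below. The function name `estimate_campaign` is hypothetical, and the cost rate of about $0.08 per complex is an assumption derived from the table (~$8 per 100 complexes), not a published price. Pure compute time from the 30-60s figure comes out slightly under the table's ranges, which presumably include overhead.

```python
def estimate_campaign(n_complexes, sec_per_complex=(30, 60), usd_per_complex=0.08):
    """Rough wall-clock and cost estimate for a prediction campaign."""
    fastest, slowest = sec_per_complex
    return {
        "hours_min": n_complexes * fastest / 3600,
        "hours_max": n_complexes * slowest / 3600,
        # Assumption: cost scales linearly at the rate implied by the table.
        "usd": n_complexes * usd_per_complex,
    }


estimate = estimate_campaign(500)
print(f"{estimate['hours_min']:.1f}-{estimate['hours_max']:.1f} h, ~${estimate['usd']:.0f}")
```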
---
## Verify
```bash
find output -name "ranked_0.pdb" | wc -l # Should match input count
```
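
The same check can be scripted so that the expected count comes from the input FASTA rather than from memory. This is a sketch with hypothetical helper names, and it assumes one FASTA record per prediction; if a wrapper spreads one complex across several records, the expected count needs adjusting.

```python
from pathlib import Path


def count_fasta_records(fasta_path):
    """Count header lines (those starting with '>') in a FASTA file."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_predictions(output_dir, filename="ranked_0.pdb"):
    """Count top-ranked models anywhere under the output directory."""
    return sum(1 for _ in Path(output_dir).rglob(filename))


def verify_run(fasta_path, output_dir):
    """Return (ok, expected, found); assumes one record per prediction."""
    expected = count_fasta_records(fasta_path)
    found = count_predictions(output_dir)
    return expected == found, expected, found
```

For example, `verify_run("sequences.fasta", "output")` returns `(False, 100, 97)` when three of 100 predictions are missing.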
---
## Troubleshooting
- **Low pLDDT regions**: May indicate disorder or poor design
- **Low ipTM**: Interface not confident, check hotspots
- **High PAE off-diagonal**: Chains may not interact
- **OOM errors**: Use ColabFold with MSA server instead
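
Two of the checks above, low-pLDDT regions and off-diagonal PAE, can be quantified directly from the `plddt` and `predicted_aligned_error` arrays. The following is a sketch under stated assumptions: both function names are hypothetical, the complex has exactly two chains with chain A occupying the first `len_a` residues, and the pLDDT cutoff of 70 is an illustrative choice rather than a value from this skill.

```python
def mean_interface_pae(pae, len_a):
    """Mean PAE over the inter-chain (off-diagonal) blocks of a two-chain complex."""
    n = len(pae)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            # A residue pair is inter-chain when exactly one index is in chain A.
            if (i < len_a) != (j < len_a):
                total += float(pae[i][j])
                count += 1
    if count == 0:
        raise ValueError("len_a must be between 1 and len(pae) - 1")
    return total / count


def low_plddt_regions(plddt, cutoff=70.0, min_len=3):
    """Return (start, end) pairs (0-based, end exclusive) of runs below cutoff."""
    regions, start = [], None
    for i, value in enumerate(plddt):
        if value < cutoff and start is None:
            start = i  # A low-confidence run begins here
        elif value >= cutoff and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(plddt) - start >= min_len:
        regions.append((start, len(plddt)))
    return regions
```

A mean interface PAE well above the `PAE_interface` threshold, combined with a low ipTM, suggests the chains are not predicted to interact.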
### Error interpretation
| Error | Cause | Fix |
|-------|-------|-----|
| `RuntimeError: CUDA out of memory` | Sequence too long | Use A100 or split prediction |
| `KeyError: 'iptm'` | Running monomer on complex | Use multimer preset |
| `FileNotFoundError: database` | Missing MSA databases | Use ColabFold MSA server |
| `TimeoutError` | MSA search slow | Raise the timeout or use the ColabFold MSA server; lowering `--num_recycle` shortens inference only |
---
**Next**: `protein-qc` for filtering and ranking.