tooluniverse-vaccine-design

Design and evaluate vaccine candidates using computational immunology tools. Covers epitope prediction (MHC-I/II binding via IEDB), population coverage analysis, antigen selection, adjuvant matching, and immunogenicity assessment. Integrates IEDB for epitope prediction, UniProt for antigen sequences, PDB/AlphaFold for structural epitopes, BVBRC for pathogen proteomes, and literature for clinical precedent. Use when asked about vaccine design, epitope prediction, immunogenicity, MHC binding, T-cell epitopes, B-cell epitopes, or population coverage for vaccine candidates.

1,202 stars

bymims-harvard

View on GitHub Installation ↓

Best use case

tooluniverse-vaccine-design is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using tooluniverse-vaccine-design should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/tooluniverse-vaccine-design/SKILL.md --create-dirs "https://raw.githubusercontent.com/mims-harvard/ToolUniverse/main/skills/tooluniverse-vaccine-design/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/tooluniverse-vaccine-design/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How tooluniverse-vaccine-design Compares

Feature / Agent	tooluniverse-vaccine-design	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

SKILL.md Source

# Vaccine Design

Computational pipeline for designing peptide/subunit vaccine candidates through epitope prediction, population coverage optimization, and immunogenicity assessment.

## Reasoning Strategy

Vaccine design requires presenting the right epitopes to elicit protective immunity — not just any immune response, but one that is neutralizing, durable, and broadly applicable. For T-cell vaccines, the core tool is MHC binding prediction (IEDB tools): predict peptide-MHC affinity across multiple HLA alleles, then select epitopes with broad coverage of the target population. For antibody vaccines, prioritize surface-exposed conserved regions — a deeply buried or hypervariable region makes a poor antibody target. MHC binding does not equal immunogenicity; many good binders are not immunogenic in vivo due to tolerance, poor processing, or lack of T-cell help. A multi-epitope strategy (combining MHC-I for CD8+ CTL response, MHC-II for CD4+ helper response, and B-cell epitopes for antibody induction) is more robust than any single epitope. Conservation across pathogen strains is critical — an epitope that mutates under immune pressure (like HIV envelope hypervariable regions) is a poor vaccine target.

**LOOK UP DON'T GUESS**: Do not predict MHC binding or population coverage from memory — use `IEDB_predict_mhci_binding` and `IEDB_predict_mhcii_binding` for predictions and `IEDB_search_epitopes` for validated experimental data. Do not assume what's on the pathogen surface; retrieve annotated sequences from UniProt or BVBRC.

**Key principles**:
1. **Epitope-driven** — vaccines work by presenting epitopes to T/B cells; start with epitope prediction
2. **Population coverage matters** — HLA diversity means no single epitope covers everyone; design for breadth
3. **Multi-epitope is better** — combine CD8+ (MHC-I) and CD4+ (MHC-II) epitopes for robust immunity
4. **Conservation = broad protection** — conserved epitopes across strains provide cross-protective immunity
5. **Evidence grading** — T1: clinical trial data, T2: in-vivo immunogenicity, T3: in-vitro binding, T4: computational prediction only

---

## When to Use

- "Design a vaccine against [pathogen]"
- "Predict T-cell epitopes for [protein]"
- "What MHC-I epitopes does [protein] have?"
- "Assess population coverage of these epitopes"
- "Find conserved epitopes across [pathogen] strains"

**Not this skill**: For HLA typing or allele frequency only, use `tooluniverse-hla-immunogenomics`. For antibody engineering, use `tooluniverse-antibody-engineering`.

---

## Core Tools

| Tool | Use For |
|------|---------|
| `IEDB_search_epitopes` | Search experimentally validated epitopes |
| `IEDB_get_epitope` | Get detailed epitope data (assay results, MHC restriction) |
| `iedb_search_mhc` | Search validated MHC binding assay data |
| `IEDB_predict_mhci_binding` | **Predict MHC-I binding** (NetMHCpan EL; rank < 0.5% = strong binder) |
| `IEDB_predict_mhcii_binding` | **Predict MHC-II binding** (NetMHCIIpan EL; CD4+ helper epitopes) |
| `UniProt_get_entry_by_accession` | Get antigen protein sequence |
| `UniProt_search` | Find pathogen protein sequences |
| `BVBRC_search_genome_features` | Search pathogen proteomes |
| `alphafold_get_prediction` | Get/predict antigen 3D structure |
| `EnsemblVEP_annotate_hgvs` | Check epitope conservation across variants |
| `PubMed_search_articles` | Find published vaccine studies |
| `search_clinical_trials` | Find ongoing vaccine clinical trials |

---

## Workflow

```
Phase 0: Antigen Selection
  Pathogen → essential surface proteins → sequence retrieval
    |
Phase 1: T-Cell Epitope Prediction
  MHC-I (CD8+ CTL) and MHC-II (CD4+ helper) binding prediction
    |
Phase 2: B-Cell Epitope Prediction
  Linear and conformational B-cell epitopes for antibody response
    |
Phase 3: Population Coverage
  HLA allele frequencies → design for target population
    |
Phase 4: Conservation Analysis
  Cross-strain epitope conservation → broad protection
    |
Phase 5: Candidate Assembly & Report
  Multi-epitope construct design → immunogenicity assessment
```

### Phase 0: Antigen Selection

**Best antigens for vaccines**: Surface-exposed, essential for pathogen function, conserved across strains.

```python
# Find pathogen surface proteins
UniProt_search(query="[organism] AND locations:(location:cell surface) AND reviewed:true")
# Or search BVBRC for annotated pathogen proteomes
BVBRC_search_genome_features(keyword="surface protein", genome_id="[taxon_id]")
```

**Antigen prioritization**: prefer surface-exposed (secreted/outer membrane) over cytoplasmic; >95% conserved across strains; essential for pathogen viability; known immunogen in natural infection. Use UniProt subcellular location annotations and PubMed to verify these properties.

### Phase 1: T-Cell Epitope Prediction

**MHC-I epitopes** (CD8+ cytotoxic T cells — kill infected cells):

```python
# Option A: Search for KNOWN validated epitopes from IEDB
iedb_search_mhc(
    mhc_class="I",
    qualitative_measure="Positive",
    filters={"source_organism_iri": "eq.NCBITaxon:2697049"},  # SARS-CoV-2
    select=["linear_sequence", "mhc_restriction", "qualitative_measure"],
    limit=50
)

# Option B: PREDICT novel peptide binding (recommended for new proteins)
IEDB_predict_mhci_binding(
    sequence="YOUR_PROTEIN_SEQUENCE",  # full protein or peptide
    allele="HLA-A*02:01",             # or H-2-Kd for mouse
    method="netmhcpan_el",            # EL = eluted ligand (recommended)
    length=9                           # 8-11 for MHC-I
)
# Returns peptides ranked by percentile_rank:
# < 0.5% = strong binder (include in vaccine)
# 0.5-2% = moderate binder (consider)
# > 2% = weak/non-binder (exclude)
```

**MHC-II epitopes** (CD4+ helper T cells — activate B cells and CD8+ T cells):

```python
iedb_search_mhc(
    mhc_class="II",
    qualitative_measure="Positive",
    filters={"source_organism_iri": "eq.NCBITaxon:2697049"},
    limit=50
)
```

**Binding affinity interpretation**:

| IC50 (nM) | Classification | Vaccine Relevance |
|-----------|---------------|-------------------|
| < 50 | Strong binder | Include — high presentation probability |
| 50-500 | Moderate binder | Consider — may contribute to response |
| 500-5000 | Weak binder | Exclude — unlikely to be presented |
| > 5000 | Non-binder | Exclude |

**HLA supertype strategy**: For broad coverage, predict against HLA supertypes:
- **A2 supertype** (A*02:01, A*02:06, A*68:02) — covers ~40% globally
- **A3 supertype** (A*03:01, A*11:01, A*31:01) — covers ~25%
- **B7 supertype** (B*07:02, B*35:01, B*51:01) — covers ~25%
- **A2 + A3 + B7 + B44** combined — covers >90% of most populations

### Phase 2: B-Cell Epitope Prediction

B-cell epitopes trigger antibody production. Look for:
- **Linear epitopes**: Continuous peptide sequences (easier to synthesize)
- **Conformational epitopes**: 3D surface patches (requires structural data)

```python
# Check for known B-cell epitopes
IEDB_search_epitopes(query="[protein_name]", epitope_type="B cell")
# Get structure for conformational epitope prediction
alphafold_get_prediction(uniprot_id="[accession]")
```

**B-cell epitope criteria**: Surface-exposed loops, hydrophilic regions, flexible regions (high B-factor). Combine computational prediction with structural analysis.

### Phase 3: Population Coverage

```python
# Search for epitopes restricted to common HLA alleles in target population
# NOTE: No HLA frequency tool exists in ToolUniverse. For population coverage:
# 1. Use IEDB Analysis Resource (tools.iedb.org/population) for population coverage calculation
# 2. Use the HLA supertype strategy (see above) to ensure broad coverage
# 3. Search PubMed for published HLA frequency data: PubMed_search_articles(query="HLA allele frequency [population]")
```

**Population coverage targets**:

| Coverage Level | Interpretation | Action |
|---------------|---------------|--------|
| >90% | Excellent — vaccine will work in most individuals | Proceed to development |
| 70-90% | Good — most people covered; some populations underserved | Add more epitopes for uncovered HLA types |
| 50-70% | Moderate — significant gaps | Redesign with broader HLA coverage |
| <50% | Poor — vaccine will miss too many people | Fundamental redesign needed |

### Phase 4: Conservation Analysis

Check if epitopes are conserved across pathogen strains/variants:

```python
# Search for protein variants across strains
PubMed_search_articles(query="[pathogen] [protein] sequence variation strains")
# Check specific mutations in epitope regions
EnsemblVEP_annotate_hgvs(hgvs_notation="[variant_in_epitope]")
```

**Conservation interpretation**:
- **100% conserved** across all known strains → ideal vaccine target
- **>95% conserved** → good target; monitor emerging variants
- **80-95% conserved** → may need strain-specific variants in construct
- **<80% conserved** → avoid; pathogen evolves to escape this epitope

### Phase 5: Candidate Assembly & Report

**Multi-epitope construct design principles**:
1. Include 3-5 MHC-I epitopes (CD8+ response)
2. Include 2-3 MHC-II epitopes (CD4+ helper response)
3. Include 1-2 B-cell epitopes (antibody response)
4. Connect with appropriate linkers (AAY for MHC-I, GPGPG for MHC-II)
5. Add adjuvant sequence if needed (e.g., flagellin domain for TLR5)

**Report structure**:
1. **Antigen Selection** — rationale, conservation, essentiality
2. **Epitope Map** — all predicted epitopes with binding affinities and HLA restrictions
3. **Top Epitopes** — ranked by binding strength × conservation × population coverage
4. **Population Coverage** — % coverage per major world population
5. **Conservation Analysis** — strain coverage, escape risk assessment
6. **Construct Design** — multi-epitope sequence with linkers
7. **Clinical Precedent** — existing vaccines/trials for related antigens
8. **Limitations** — predicted only (T4 evidence); needs experimental validation

---

## Limitations

- **All predictions are computational** (T4 evidence) — experimental validation (binding assays, immunogenicity studies) is required before any clinical development
- **No immunogenicity guarantee** — MHC binding ≠ immunogenicity; many good binders are not immunogenic in vivo
- **B-cell epitope prediction is less reliable** than T-cell; conformational epitopes require accurate structures
- **No adjuvant optimization** — adjuvant selection requires empirical testing
- **Pathogen evasion** — rapidly evolving pathogens (HIV, influenza) may escape epitope-based vaccines

Related Skills

tooluniverse

1202

from mims-harvard/ToolUniverse

Router skill for ToolUniverse tasks. First checks if specialized tooluniverse skills (105+ skills covering disease/drug/target research, gene-disease associations, clinical decision support, genomics, epigenomics, proteomics, comparative genomics, chemical safety, toxicology, systems biology, and more) can solve the problem, then falls back to general strategies for using 2300+ scientific tools. Covers tool discovery, multi-hop queries, comprehensive research workflows, disambiguation, evidence grading, and report generation. Use when users need to research any scientific topic, find biological data, or explore drug/target/disease relationships. ALSO USE for any biology, medicine, chemistry, pharmacology, or life science question — even simple factoid questions like "how many X in protein Y", "what drug interacts with Z", "what gene causes disease W", or "translate this sequence". These questions benefit from database lookups (UniProt, PubMed, ChEMBL, ClinVar, GWAS Catalog, etc.) rather than answering from memory alone. When in doubt about a scientific fact, USE THIS SKILL to verify against real databases.

tooluniverse-variant-to-mechanism

1202

from mims-harvard/ToolUniverse

End-to-end variant-to-mechanism analysis: given a genetic variant (rsID or coordinates), trace its functional impact from regulatory context (GWAS, eQTL, RegulomeDB, ENCODE) through target gene identification (GTEx, OpenTargets L2G) to downstream pathway and disease biology (STRING, Reactome, GO enrichment, disease associations). Produces an evidence-graded mechanistic narrative linking genotype to phenotype. Use when asked "how does this variant cause disease?", "what is the mechanism of rs7903146?", "trace variant to pathway", or "connect this GWAS hit to biology".

tooluniverse-variant-interpretation

1202

from mims-harvard/ToolUniverse

Systematic clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Aggregates evidence from ClinVar, gnomAD, CIViC, UniProt, and PDB across ACMG criteria. Produces pathogenicity scores (0-100), clinical recommendations, and treatment implications. Use when interpreting genetic variants, classifying variants of uncertain significance (VUS), performing ACMG variant classification, or translating variant calls to clinical actionability.

tooluniverse-variant-functional-annotation

1202

from mims-harvard/ToolUniverse

Comprehensive functional annotation of protein variants — pathogenicity, population frequency, structural context, and clinical significance. Integrates ProtVar (map_variant, get_function, get_population) for protein-level mapping and structural context, ClinVar for clinical classifications, gnomAD for population frequency with ancestry data, CADD for deleteriousness scores, and ClinGen for gene-disease validity. Produces a structured variant annotation report with evidence grading. Use when asked about protein variant impact, missense variant pathogenicity, ProtVar annotation, variant functional context, or combining population and structural evidence for a variant.

tooluniverse-variant-analysis

1202

from mims-harvard/ToolUniverse

Production-ready VCF processing, variant annotation, mutation analysis, and structural variant (SV/CNV) interpretation for bioinformatics questions. Parses VCF files (streaming, large files), classifies mutation types (missense, nonsense, synonymous, frameshift, splice, intronic, intergenic) and structural variants (deletions, duplications, inversions, translocations), applies VAF/depth/quality/consequence filters, annotates with ClinVar/dbSNP/gnomAD/CADD via ToolUniverse, interprets SV/CNV clinical significance using ClinGen dosage sensitivity scores, computes variant statistics, and generates reports. Solves questions like "What fraction of variants with VAF < 0.3 are missense?", "How many non-reference variants remain after filtering intronic/intergenic?", "What is the pathogenicity of this deletion affecting BRCA1?", or "Which dosage-sensitive genes overlap this CNV?". Use when processing VCF files, annotating variants, filtering by VAF/depth/consequence, classifying mutations, interpreting structural variants, assessing CNV pathogenicity, comparing cohorts, or answering variant analysis questions.

tooluniverse-toxicology

1202

from mims-harvard/ToolUniverse

Assess chemical and drug toxicity via adverse outcome pathways, real-world adverse event signals, and toxicogenomic evidence. Integrates AOPWiki (AOPWiki_list_aops, AOPWiki_get_aop) for mechanism- level pathway tracing, FAERS for post-market adverse event quantification, OpenFDA for label mining, and CTD for chemical-gene-disease evidence. Produces structured toxicity reports with evidence grading (T1-T4). Use when asked about toxicity mechanisms, adverse outcome pathways, AOP mapping, FAERS signal detection, or chemical-disease relationships for drugs or environmental chemicals.

tooluniverse-target-research

1202

from mims-harvard/ToolUniverse

Gather comprehensive biological target intelligence from 9 parallel research paths covering protein info, structure, interactions, pathways, expression, variants, drug interactions, and literature. Features collision-aware searches, evidence grading (T1-T4), explicit Open Targets coverage, and mandatory completeness auditing. Use when users ask about drug targets, proteins, genes, or need target validation, druggability assessment, or comprehensive target profiling.

tooluniverse-systems-biology

1202

from mims-harvard/ToolUniverse

Comprehensive systems biology and pathway analysis using multiple pathway databases (Reactome, KEGG, WikiPathways, Pathway Commons, BioModels). Performs pathway enrichment, protein-pathway mapping, keyword searches, and systems-level analysis. Use when analyzing gene sets, exploring biological pathways, or investigating systems-level biology.

tooluniverse-structural-variant-analysis

1202

from mims-harvard/ToolUniverse

Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.

tooluniverse-structural-proteomics

1202

from mims-harvard/ToolUniverse

Integrate structural biology data with proteomics for drug target validation. Retrieves protein structures from PDB (RCSB, PDBe), AlphaFold predictions, antibody structures (SAbDab), GPCR data (GPCRdb), binding pocket analysis (ProteinsPlus), and ligand interactions (BindingDB). Use when asked to find structures for a drug target, identify binding site ligands, cross-validate drug binding with structural data, assess structural druggability, or compare experimental vs predicted structures.

tooluniverse-stem-cell-organoid

1202

from mims-harvard/ToolUniverse

Research stem cells, iPSCs, organoids, and cell differentiation using ToolUniverse tools. Covers pluripotency marker identification, differentiation pathway analysis, organoid model characterization, cell type annotation, and disease modeling. Integrates CellxGene/HCA for single-cell atlas data, CellMarker for cell type markers, GEO for stem cell datasets, and pathway tools for differentiation signaling. Use when asked about stem cells, iPSCs, organoids, cell reprogramming, pluripotency, differentiation protocols, or 3D culture models.

tooluniverse-statistical-modeling

1202

from mims-harvard/ToolUniverse

Perform statistical modeling and regression analysis on biomedical datasets. Supports linear regression, logistic regression (binary/ordinal/multinomial), mixed-effects models, Cox proportional hazards survival analysis, Kaplan-Meier estimation, and comprehensive model diagnostics. Extracts odds ratios, hazard ratios, confidence intervals, p-values, and effect sizes. Designed to solve BixBench statistical reasoning questions involving clinical/experimental data. Use when asked to fit regression models, compute odds ratios, perform survival analysis, run statistical tests, or interpret model coefficients from provided data.