bio-immunoinformatics-tcr-epitope-binding

Predict TCR-epitope specificity using ERGO-II and deep learning models for T-cell receptor antigen recognition. Match TCRs to their cognate epitopes or predict TCR targets. Use when analyzing TCR repertoire specificity or identifying antigen-reactive T-cells.

1,802 stars

byFreedomIntelligence

View on GitHub Installation ↓

Best use case

bio-immunoinformatics-tcr-epitope-binding is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-immunoinformatics-tcr-epitope-binding should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/bio-immunoinformatics-tcr-epitope-binding/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-immunoinformatics-tcr-epitope-binding/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/bio-immunoinformatics-tcr-epitope-binding/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How bio-immunoinformatics-tcr-epitope-binding Compares

Feature / Agent	bio-immunoinformatics-tcr-epitope-binding	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

SKILL.md Source

## Version Compatibility

Reference examples tested with: MiXCR 4.6+, numpy 1.26+, pandas 2.2+, scikit-learn 1.4+, scipy 1.12+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# TCR-Epitope Binding

**"Predict which epitopes my TCRs recognize"** → Match T-cell receptors to their cognate epitopes using deep learning models for TCR antigen specificity prediction.
- Python: ERGO-II model for TCR-epitope binding prediction

## ERGO-II Model

```python
# ERGO-II uses deep learning to predict TCR-epitope binding
# GitHub: https://github.com/IdoSpringer/ERGO-II

def setup_ergo():
    '''Setup ERGO-II for TCR-epitope prediction

    Requirements:
    - PyTorch
    - Pre-trained models from ERGO-II repository

    ERGO-II features:
    - Uses both CDR3 alpha and beta chains
    - Incorporates MHC context
    - Trained on VDJdb and IEDB data
    '''
    print('ERGO-II setup:')
    print('1. Clone: git clone https://github.com/IdoSpringer/ERGO-II')
    print('2. Install: pip install torch pandas scikit-learn')
    print('3. Download models from repository')
```

## TCR Input Format

```python
def parse_tcr_data(tcr_file):
    '''Parse TCR sequence data

    Required columns:
    - cdr3_beta: CDR3 beta chain sequence (most informative)
    - cdr3_alpha: CDR3 alpha chain (optional, improves accuracy)
    - v_beta: V gene usage (optional)
    - j_beta: J gene usage (optional)

    CDR3 is the primary determinant of antigen specificity.
    Alpha chain provides ~20% additional specificity.
    '''
    import pandas as pd

    df = pd.read_csv(tcr_file, sep='\t')

    # Validate CDR3 sequences
    valid_aa = set('ACDEFGHIKLMNPQRSTVWY')

    def is_valid_cdr3(seq):
        if pd.isna(seq):
            return False
        return all(aa in valid_aa for aa in seq.upper())

    df['valid_beta'] = df['cdr3_beta'].apply(is_valid_cdr3)

    return df[df['valid_beta']]
```

## Predict TCR-Epitope Binding

```python
def predict_binding_simple(cdr3_beta, epitope):
    '''Simple TCR-epitope compatibility score

    This is a simplified heuristic. For accurate predictions,
    use ERGO-II or other deep learning models.

    Features considered:
    - CDR3 length compatibility
    - Amino acid composition
    - Hydrophobicity matching
    '''
    # Length compatibility
    # TCRs recognizing similar epitopes often have similar CDR3 lengths
    optimal_length = len(epitope) + 5  # Rough heuristic
    length_score = 1 - abs(len(cdr3_beta) - optimal_length) / 10

    # Charge complementarity
    positive = set('RKH')
    negative = set('DE')

    tcr_charge = sum(1 if aa in positive else -1 if aa in negative else 0
                    for aa in cdr3_beta)
    epitope_charge = sum(1 if aa in positive else -1 if aa in negative else 0
                        for aa in epitope)

    # Opposite charges suggest complementarity
    charge_score = 0.5 + (tcr_charge * -epitope_charge) / 20

    return {
        'cdr3_beta': cdr3_beta,
        'epitope': epitope,
        'length_score': max(0, min(1, length_score)),
        'charge_score': max(0, min(1, charge_score)),
        'combined': (length_score + charge_score) / 2
    }
```

## Match TCRs to Known Epitopes

```python
def match_to_vdjdb(tcr_sequences, vdjdb_path='vdjdb.tsv'):
    '''Match TCRs to known epitopes in VDJdb

    VDJdb is a curated database of TCR-epitope pairs.
    Download from: https://vdjdb.cdr3.net/

    Matching approaches:
    - Exact CDR3 match
    - Similar CDR3 (edit distance ≤1)
    - Cluster-based (group similar TCRs)
    '''
    import pandas as pd
    from difflib import SequenceMatcher

    vdjdb = pd.read_csv(vdjdb_path, sep='\t')

    matches = []
    for tcr in tcr_sequences:
        # Exact match
        exact = vdjdb[vdjdb['cdr3'] == tcr]
        if len(exact) > 0:
            matches.append({
                'query_tcr': tcr,
                'match_type': 'exact',
                'epitopes': exact['antigen.epitope'].tolist(),
                'species': exact['antigen.species'].tolist()
            })
            continue

        # Fuzzy match (1 mismatch)
        for _, row in vdjdb.iterrows():
            similarity = SequenceMatcher(None, tcr, row['cdr3']).ratio()
            if similarity > 0.9:  # >90% similar
                matches.append({
                    'query_tcr': tcr,
                    'match_type': 'similar',
                    'similarity': similarity,
                    'db_tcr': row['cdr3'],
                    'epitope': row['antigen.epitope'],
                    'species': row['antigen.species']
                })

    return pd.DataFrame(matches)
```

## TCR Clustering

**Goal:** Group TCRs that likely recognize the same epitope based on CDR3 sequence similarity, enabling specificity group discovery from large repertoire datasets.

**Approach:** Compute pairwise Levenshtein distances between CDR3 sequences, apply hierarchical clustering with average linkage, and cut the dendrogram at a maximum edit distance threshold to define specificity groups.

```python
def cluster_tcrs_by_specificity(tcr_sequences, method='levenshtein'):
    '''Cluster TCRs likely to share specificity

    TCRs recognizing the same epitope often have:
    - Similar CDR3 length
    - Shared motifs
    - Similar V gene usage

    Methods:
    - levenshtein: Edit distance clustering
    - tcrdist: TCRdist3 distance metric
    - deep: Deep learning embeddings
    '''
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist, squareform
    import numpy as np

    def levenshtein_distance(s1, s2):
        if len(s1) < len(s2):
            return levenshtein_distance(s2, s1)
        if len(s2) == 0:
            return len(s1)

        previous_row = range(len(s2) + 1)
        for i, c1 in enumerate(s1):
            current_row = [i + 1]
            for j, c2 in enumerate(s2):
                insertions = previous_row[j + 1] + 1
                deletions = current_row[j] + 1
                substitutions = previous_row[j] + (c1 != c2)
                current_row.append(min(insertions, deletions, substitutions))
            previous_row = current_row

        return previous_row[-1]

    # Calculate pairwise distances
    n = len(tcr_sequences)
    distances = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = levenshtein_distance(tcr_sequences[i], tcr_sequences[j])
            distances[i, j] = distances[j, i] = d

    # Cluster
    condensed = squareform(distances)
    Z = linkage(condensed, method='average')
    clusters = fcluster(Z, t=3, criterion='distance')  # Max 3 edits

    return dict(zip(tcr_sequences, clusters))
```

## Analyze Repertoire Specificity

```python
def analyze_repertoire_specificity(tcr_df, epitope_db):
    '''Analyze antigen specificity of TCR repertoire

    Reports:
    - Fraction matching known epitopes
    - Epitope diversity
    - Potential public TCRs (shared across individuals)
    '''
    results = {
        'total_tcrs': len(tcr_df),
        'unique_cdr3': tcr_df['cdr3_beta'].nunique(),
        'matched_epitopes': 0,
        'epitope_distribution': {}
    }

    # Match to database
    matched = match_to_vdjdb(tcr_df['cdr3_beta'].unique(), epitope_db)

    if len(matched) > 0:
        results['matched_epitopes'] = len(matched['query_tcr'].unique())
        results['epitope_distribution'] = matched['epitope'].value_counts().to_dict()

    return results
```

## Related Skills

- tcr-bcr-analysis/mixcr-analysis - TCR repertoire sequencing analysis
- immunoinformatics/mhc-binding-prediction - Epitope context
- single-cell/clustering - Single-cell TCR analysis

Related Skills

bio-immunoinformatics-neoantigen-prediction

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Identify tumor neoantigens from somatic mutations using pVACtools for personalized cancer immunotherapy. Predict mutant peptides that bind patient HLA and may elicit T-cell responses. Use when identifying vaccine targets or checkpoint inhibitor response biomarkers from tumor sequencing data.

bio-immunoinformatics-mhc-binding-prediction

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Predict peptide-MHC class I and II binding affinity using MHCflurry and NetMHCpan neural network models. Identify potential T-cell epitopes from protein sequences. Use when predicting MHC binding for vaccine design or neoantigen identification.

bio-immunoinformatics-immunogenicity-scoring

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Score and prioritize neoantigens and epitopes for immunogenicity using multi-factor models combining MHC binding, processing, expression, and sequence features. Rank candidates for vaccine design. Use when prioritizing epitopes for vaccine development or identifying the most immunogenic neoantigens.

bio-immunoinformatics-epitope-prediction

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Predict B-cell and T-cell epitopes using BepiPred, IEDB tools, and structure-based methods for vaccine and antibody design. Identify immunogenic regions in antigens. Use when designing vaccines, mapping antibody binding sites, or predicting immunogenic peptides.

bio-chipseq-differential-binding

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Differential binding analysis using DiffBind. Compare ChIP-seq peaks between conditions with statistical rigor. Requires replicate samples. Outputs differentially bound regions with fold changes and p-values. Use when comparing ChIP-seq binding between conditions.

bindingdb-database

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Query BindingDB for measured drug-target binding affinities (Ki, Kd, IC50, EC50). Search by target (UniProt ID), compound (SMILES/name), or pathogen. Essential for drug discovery, lead optimization, polypharmacology analysis, and structure-activity relationship (SAR) studies.

binding-characterization

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Guidance for SPR and BLI binding characterization experiments. Use when: (1) Planning binding kinetics experiments, (2) Troubleshooting poor/no binding signal, (3) Interpreting kinetic data artifacts, (4) Choosing between SPR vs BLI platforms.

zinc-database

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.

zarr-python

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

xlsx

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.

writing-skills

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Use when creating new skills, editing existing skills, or verifying skills work before deployment

writing-plans

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Use when you have a spec or requirements for a multi-step task, before touching code