bio-genome-engineering-off-target-prediction

Predict CRISPR off-target sites using Cas-OFFinder and CFD scoring algorithms. Identify potential unintended cleavage sites genome-wide and assess guide specificity. Use when evaluating guide RNA specificity or selecting guides with minimal off-target risk.

1,802 stars

byFreedomIntelligence

View on GitHub Installation ↓

Best use case

bio-genome-engineering-off-target-prediction is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-genome-engineering-off-target-prediction should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/bio-genome-engineering-off-target-prediction/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-genome-engineering-off-target-prediction/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/bio-genome-engineering-off-target-prediction/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How bio-genome-engineering-off-target-prediction Compares

Feature / Agent	bio-genome-engineering-off-target-prediction	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

## Version Compatibility

Reference examples tested with: pandas 2.2+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Off-Target Prediction

**"Check my guide RNA for off-target sites"** → Search the genome for potential unintended cleavage sites allowing mismatches, then score each off-target by cutting frequency determination (CFD) to assess guide specificity.
- CLI: `cas-offinder` for genome-wide off-target search
- Python: CFD scoring with mismatch penalty matrices

## Cas-OFFinder (CLI)

Cas-OFFinder searches genomes for potential off-target sites allowing mismatches.

```bash
# Input file format (input.txt):
# Line 1: Path to genome directory (2bit or fasta index)
# Line 2: PAM pattern (N = any, R = A/G, Y = C/T)
# Line 3+: Guide sequences with mismatch tolerance

# Example input.txt:
# /path/to/genome
# NNNNNNNNNNNNNNNNNNNNNGG
# ATCGATCGATCGATCGATCGNNN 4

# Run Cas-OFFinder
cas-offinder input.txt C output.txt  # C = use CPU
cas-offinder input.txt G output.txt  # G = use GPU (faster)
```

## Cas-OFFinder Input Preparation

```python
def prepare_cas_offinder_input(guides, genome_path, max_mismatches=4, pam='NGG'):
    '''Prepare Cas-OFFinder input file

    Args:
        guides: List of 20nt guide sequences
        genome_path: Path to genome directory with .2bit or indexed fasta
        max_mismatches: Maximum mismatches to search (0-6 typical)
                       More mismatches = slower but more comprehensive
                       4 mismatches: good balance of speed and sensitivity
        pam: PAM sequence (NGG for SpCas9)
    '''
    lines = [genome_path]

    # Build pattern: 20 N's for guide + PAM
    pattern = 'N' * 20 + pam
    lines.append(pattern)

    # Add each guide with mismatch tolerance
    for guide in guides:
        # Append NNN to represent PAM positions (not matched)
        lines.append(f'{guide}NNN {max_mismatches}')

    return '\n'.join(lines)
```

## Parse Cas-OFFinder Output

```python
import pandas as pd

def parse_cas_offinder_output(output_file):
    '''Parse Cas-OFFinder results

    Output columns:
    - Guide: Query guide sequence
    - Chromosome: Target chromosome
    - Position: Genomic position (0-based)
    - Sequence: Off-target sequence found
    - Strand: + or -
    - Mismatches: Number of mismatches
    '''
    columns = ['guide', 'chrom', 'position', 'sequence', 'strand', 'mismatches']
    df = pd.read_csv(output_file, sep='\t', header=None, names=columns)

    # Sort by mismatches (fewer = more concerning)
    df = df.sort_values('mismatches')

    return df

def summarize_off_targets(df):
    '''Summarize off-target counts by mismatch number'''
    summary = df.groupby(['guide', 'mismatches']).size().unstack(fill_value=0)

    # Calculate specificity score
    # Fewer off-targets with 0-2 mismatches = higher specificity
    summary['specificity'] = 1 / (1 + summary.get(0, 0) * 100 +
                                      summary.get(1, 0) * 10 +
                                      summary.get(2, 0))
    return summary
```

## CFD Score Calculation

```python
# Cutting Frequency Determination (CFD) score
# Predicts cleavage probability at off-target sites
# From Doench et al. 2016

# Position-specific mismatch penalties
CFD_MISMATCH_SCORES = {
    # (position, ref_nt, target_nt): penalty_multiplier
    # Position 1 = PAM-proximal, Position 20 = PAM-distal
    # Values < 1 indicate reduced cutting
    (1, 'C', 'A'): 0.5, (1, 'C', 'G'): 0.7, (1, 'C', 'T'): 0.3,
    (1, 'G', 'A'): 0.4, (1, 'G', 'C'): 0.6, (1, 'G', 'T'): 0.3,
    # ... (full matrix has all 20 positions x 12 mismatch types)
    (20, 'C', 'A'): 0.9, (20, 'C', 'G'): 0.95, (20, 'C', 'T'): 0.85,
}

# PAM mismatch penalties
CFD_PAM_SCORES = {
    'AGG': 0.26, 'CGG': 0.11, 'TGG': 0.02,  # First position
    'GAG': 0.07, 'GCG': 0.03, 'GTG': 0.02,  # Second position
    'GGA': 0.01, 'GGC': 0.01, 'GGT': 0.01,  # Third position
}

def calculate_cfd_score(guide, off_target, pam='NGG'):
    '''Calculate CFD score for an off-target site

    CFD score interpretation:
    - 1.0: Perfect match (on-target)
    - >0.5: High probability of cleavage (concerning)
    - 0.1-0.5: Moderate probability
    - <0.1: Low probability (likely acceptable)
    '''
    score = 1.0

    # Apply mismatch penalties
    for i, (g, t) in enumerate(zip(guide, off_target[:20]), 1):
        if g != t:
            key = (i, g, t)
            penalty = CFD_MISMATCH_SCORES.get(key, 0.5)  # Default 0.5
            score *= penalty

    # Apply PAM penalty if not NGG
    off_pam = off_target[20:23]
    if off_pam != 'NGG' and off_pam in CFD_PAM_SCORES:
        score *= CFD_PAM_SCORES[off_pam]

    return score
```

## Aggregate Off-Target Score

```python
def calculate_guide_specificity(guide, off_targets):
    '''Calculate aggregate specificity score for a guide

    Uses sum of CFD scores for all off-targets.
    Lower aggregate = more specific guide.

    Specificity score interpretation:
    - >0.9: Highly specific (excellent)
    - 0.7-0.9: Specific (good)
    - 0.5-0.7: Moderate specificity (acceptable)
    - <0.5: Poor specificity (consider alternatives)
    '''
    cfd_sum = sum(calculate_cfd_score(guide, ot['sequence'], ot.get('pam', 'NGG'))
                  for ot in off_targets)

    # Specificity = 1 / (1 + sum of off-target CFD scores)
    specificity = 1 / (1 + cfd_sum)

    return specificity
```

## CRISPOR-style Analysis

**Goal:** Run a complete off-target analysis pipeline from a single guide sequence to a scored, ranked list of potential off-target sites.

**Approach:** Prepare a Cas-OFFinder input file, run the genome-wide mismatch search, parse the tabular output, calculate CFD scores for each hit, and flag high-risk off-targets by cleavage probability.

```python
def analyze_guide_specificity(guide_seq, genome_fasta, max_mm=4):
    '''Full off-target analysis workflow

    1. Run Cas-OFFinder to find potential sites
    2. Calculate CFD scores for each site
    3. Compute aggregate specificity
    4. Flag high-risk off-targets
    '''
    import subprocess
    import tempfile

    # Prepare input
    with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as f:
        f.write(f'{genome_fasta}\n')
        f.write('N' * 20 + 'NGG\n')
        f.write(f'{guide_seq}NNN {max_mm}\n')
        input_file = f.name

    output_file = input_file.replace('.txt', '_out.txt')

    # Run Cas-OFFinder
    subprocess.run(['cas-offinder', input_file, 'C', output_file], check=True)

    # Parse results
    off_targets = parse_cas_offinder_output(output_file)

    # Calculate CFD for each
    off_targets['cfd_score'] = off_targets.apply(
        lambda row: calculate_cfd_score(guide_seq, row['sequence']), axis=1
    )

    # Flag high-risk (CFD > 0.5 and in exon)
    # Would need exon annotations for full implementation

    return off_targets
```

## Related Skills

- genome-engineering/grna-design - Design guides before off-target check
- variant-calling/variant-annotation - Check if off-targets overlap known variants
- genome-intervals/bed-file-basics - Intersect off-targets with genomic features

Related Skills

tooluniverse-target-research

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Gather comprehensive biological target intelligence from 9 parallel research paths covering protein info, structure, interactions, pathways, expression, variants, drug interactions, and literature. Features collision-aware searches, evidence grading (T1-T4), explicit Open Targets coverage, and mandatory completeness auditing. Use when users ask about drug targets, proteins, genes, or need target validation, druggability assessment, or comprehensive target profiling.

tooluniverse-immunotherapy-response-prediction

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Predict patient response to immune checkpoint inhibitors (ICIs) using multi-biomarker integration. Given a cancer type, somatic mutations, and optional biomarkers (TMB, PD-L1, MSI status), performs systematic analysis across 11 phases covering TMB classification, neoantigen burden estimation, MSI/MMR assessment, PD-L1 evaluation, immune microenvironment profiling, mutation-based resistance/sensitivity prediction, clinical evidence retrieval, and multi-biomarker score integration. Generates a quantitative ICI Response Score (0-100), response likelihood tier, specific ICI drug recommendations with evidence, resistance risk factors, and a monitoring plan. Use when oncologists ask about immunotherapy eligibility, checkpoint inhibitor selection, or biomarker-guided ICI treatment decisions.

tooluniverse-drug-target-validation

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Comprehensive computational validation of drug targets for early-stage drug discovery. Evaluates targets across 10 dimensions (disambiguation, disease association, druggability, chemical matter, clinical precedent, safety, pathway context, validation evidence, structural insights, validation roadmap) using 60+ ToolUniverse tools. Produces a quantitative Target Validation Score (0-100) with GO/NO-GO recommendation. Use when users ask about target validation, druggability assessment, target prioritization, or "is X a good drug target for Y?"

tooluniverse-antibody-engineering

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Comprehensive antibody engineering and optimization for therapeutic development. Covers humanization, affinity maturation, developability assessment, and immunogenicity prediction. Use when asked to optimize antibodies, humanize sequences, or engineer therapeutic antibodies from lead to clinical candidate.

opentargets-database

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Query Open Targets Platform for target-disease associations, drug target discovery, tractability/safety data, genetics/omics evidence, known drugs, for therapeutic target identification.

open-targets-search

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Search Open Targets drug-disease associations with natural language queries. Target validation powered by Valyu semantic search.

glycoengineering

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Analyze and engineer protein glycosylation. Scan sequences for N-glycosylation sequons (N-X-S/T), predict O-glycosylation hotspots, and access curated glycoengineering tools (NetOGlyc, GlycoShield, GlycoWorkbench). For glycoprotein engineering, therapeutic antibody optimization, and vaccine design.

genome-compare

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Compare your genome to George Church (PGP-1) and estimate ancestry composition via IBS and EM admixture

bio-structural-biology-modern-structure-prediction

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Predict protein structures using modern ML models including AlphaFold3, ESMFold, Chai-1, and Boltz-1. Use when predicting structures for novel proteins, protein complexes, or when comparing predictions across multiple methods.

bio-structural-biology-alphafold-predictions

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Access and analyze AlphaFold protein structure predictions. Use when predicted structures are needed for proteins without experimental structures, or for confidence scores (pLDDT).

bio-microbiome-functional-prediction

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Predict metagenome functional content from 16S rRNA marker gene data using PICRUSt2. Infer KEGG, MetaCyc, and EC abundances from ASV tables. Use when functional profiling is needed from 16S data without shotgun metagenomics sequencing.

bio-metabolomics-targeted-analysis

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Targeted metabolomics analysis using MRM/SRM with standard curves. Covers absolute quantification, method validation, and quality assessment. Use when quantifying specific metabolites using calibration curves and internal standards.