bio-crispr-screens-library-design

CRISPR library design for genetic screens. Covers sgRNA selection, library composition, control design, and oligo ordering. Use when designing custom sgRNA libraries for knockout, activation, or interference screens.

1,802 stars

byFreedomIntelligence

View on GitHub Installation ↓

Best use case

bio-crispr-screens-library-design is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-crispr-screens-library-design should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/bio-crispr-screens-library-design/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-crispr-screens-library-design/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/bio-crispr-screens-library-design/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How bio-crispr-screens-library-design Compares

Feature / Agent	bio-crispr-screens-library-design	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

## Version Compatibility

Reference examples tested with: BioPython 1.83+, MAGeCK 0.5+, numpy 1.26+, pandas 2.2+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Library Design

**"Design a custom CRISPR library for my screen"** → Select optimal sgRNAs for knockout, CRISPRi/a, or base editing libraries with on-target scoring, off-target filtering, and control guide design.
- Python: CRISPOR-based scoring with `BioPython` for sequence handling

## sgRNA Selection Criteria

**Goal:** Score and rank candidate sgRNAs for a target gene based on design quality metrics.

**Approach:** Scan the gene sequence for PAM sites, extract 20-nt protospacer sequences, score each on GC content, poly-T avoidance, 5' G preference, and length, then return the top-ranked candidates.

```python
import pandas as pd
import numpy as np
from Bio import SeqIO
from Bio.Seq import Seq

def score_sgrna(sequence, pam='NGG'):
    '''Score sgRNA based on multiple criteria.'''
    scores = {}

    gc_content = (sequence.count('G') + sequence.count('C')) / len(sequence)
    scores['gc_content'] = 1 - abs(gc_content - 0.5) * 2

    if len(sequence) >= 4:
        has_poly_t = 'TTTT' in sequence
        scores['poly_t'] = 0 if has_poly_t else 1

    starts_with_g = sequence.startswith('G')
    scores['start_g'] = 1 if starts_with_g else 0.5

    scores['length'] = 1 if len(sequence) == 20 else 0.8

    overall = np.mean(list(scores.values()))
    return overall, scores

def design_sgrnas_for_gene(gene_sequence, n_guides=4, pam='NGG'):
    '''Design sgRNAs targeting a gene.'''
    candidates = []

    pam_pattern = pam.replace('N', '[ACGT]')
    import re

    for strand in ['+', '-']:
        seq = gene_sequence if strand == '+' else str(Seq(gene_sequence).reverse_complement())

        for match in re.finditer(f'([ACGT]{{20}})({pam_pattern})', seq):
            sgrna = match.group(1)
            position = match.start()

            if strand == '-':
                position = len(seq) - position - 23

            score, details = score_sgrna(sgrna)

            candidates.append({
                'sequence': sgrna,
                'pam': match.group(2),
                'strand': strand,
                'position': position,
                'score': score,
                'gc_content': (sgrna.count('G') + sgrna.count('C')) / 20,
                **details
            })

    candidates_df = pd.DataFrame(candidates)
    candidates_df = candidates_df.sort_values('score', ascending=False)

    return candidates_df.head(n_guides)

gene_seq = 'ATGCGATCGATCGATCGATCGAATCGATCGATCGAGGCGATCGATCGATCGATCGAATCGATCGATCGAGGCGATCGATCGATCGATCGAATCGATCGATCGAGG'
guides = design_sgrnas_for_gene(gene_seq, n_guides=5)
print(guides[['sequence', 'position', 'strand', 'score', 'gc_content']])
```

## Library Composition

**Goal:** Assemble a complete sgRNA library targeting a list of genes with appropriate controls.

**Approach:** Design top-scoring guides for each gene, append non-targeting, essential-control, and safe-harbor-control guides, and compile into an ordered library table.

```python
def design_library(gene_list, guides_per_gene=4, include_controls=True):
    '''Design complete library for gene list.'''
    library = []

    for gene in gene_list:
        gene_data = get_gene_sequence(gene)

        guides = design_sgrnas_for_gene(gene_data['sequence'], n_guides=guides_per_gene)

        for idx, guide in guides.iterrows():
            library.append({
                'gene': gene,
                'gene_id': gene_data.get('ensembl_id', ''),
                'guide_number': idx + 1,
                'sequence': guide['sequence'],
                'pam': guide['pam'],
                'position': guide['position'],
                'strand': guide['strand'],
                'score': guide['score'],
                'type': 'targeting'
            })

    if include_controls:
        controls = design_control_guides()
        library.extend(controls)

    return pd.DataFrame(library)

def get_gene_sequence(gene_name):
    '''Fetch gene sequence (placeholder - use Ensembl API or local files).'''
    return {
        'sequence': 'ATGC' * 250,
        'ensembl_id': f'ENSG_{hash(gene_name) % 100000:05d}'
    }

genes = ['TP53', 'BRCA1', 'KRAS', 'MYC', 'CDK4']
library = design_library(genes, guides_per_gene=4)
print(f'Library size: {len(library)} guides')
print(f'Genes: {library["gene"].nunique()}')
```

## Control Guide Design

**Goal:** Design control guide sets for normalization and quality assessment in CRISPR screens.

**Approach:** Generate random non-targeting sequences with acceptable GC content, add validated guides against known essential genes (positive controls) and safe-harbor loci (negative controls).

```python
def design_control_guides(n_nontargeting=100, n_essential=20, n_nonessential=20):
    '''Design control guides for library.'''
    controls = []

    for i in range(n_nontargeting):
        sequence = generate_nontargeting_sequence()
        controls.append({
            'gene': f'NonTargeting_{i+1}',
            'gene_id': '',
            'guide_number': 1,
            'sequence': sequence,
            'pam': 'NGG',
            'position': -1,
            'strand': '',
            'score': 0,
            'type': 'non-targeting'
        })

    essential_genes = ['RPS3', 'RPL11', 'EIF3A', 'POLR2A', 'CDK1']
    for gene in essential_genes[:n_essential]:
        controls.append({
            'gene': gene,
            'gene_id': '',
            'guide_number': 1,
            'sequence': get_validated_guide(gene),
            'pam': 'NGG',
            'position': 0,
            'strand': '+',
            'score': 1,
            'type': 'essential-control'
        })

    nonessential_genes = ['AAVS1', 'ROSA26']
    for gene in nonessential_genes[:n_nonessential]:
        controls.append({
            'gene': gene,
            'gene_id': '',
            'guide_number': 1,
            'sequence': get_validated_guide(gene),
            'pam': 'NGG',
            'position': 0,
            'strand': '+',
            'score': 1,
            'type': 'safe-harbor-control'
        })

    return controls

def generate_nontargeting_sequence(length=20):
    '''Generate random non-targeting sequence.'''
    while True:
        seq = ''.join(np.random.choice(['A', 'C', 'G', 'T'], length))
        gc = (seq.count('G') + seq.count('C')) / length
        if 0.4 <= gc <= 0.6 and 'TTTT' not in seq:
            return seq

def get_validated_guide(gene):
    '''Get validated guide sequence for control gene.'''
    validated = {
        'RPS3': 'GAGCTTCTTCAGCAGCATGG',
        'RPL11': 'GAAACAGGGCATCATCTACG',
        'EIF3A': 'GTGCAAGAGGATGATGACAA',
        'AAVS1': 'GGGGCCACTAGGGACAGGAT',
        'ROSA26': 'GAAGATGGGCGGGAGTCTTC'
    }
    return validated.get(gene, generate_nontargeting_sequence())
```

## Off-Target Analysis

**Goal:** Filter library guides to remove those with excessive off-target genomic matches.

**Approach:** Align each guide sequence against the genome with Bowtie allowing mismatches, count off-target hits within a mismatch threshold, and remove guides exceeding the maximum.

```python
def check_offtargets(guide_sequence, genome_index, max_mismatches=3):
    '''Check for potential off-target sites.'''
    from subprocess import run
    import tempfile

    with tempfile.NamedTemporaryFile(mode='w', suffix='.fa', delete=False) as f:
        f.write(f'>guide\n{guide_sequence}\n')
        query_file = f.name

    result = run(
        ['bowtie', '-a', '-n', str(max_mismatches), '-l', '20', genome_index, '-f', query_file],
        capture_output=True, text=True
    )

    offtargets = []
    for line in result.stdout.strip().split('\n'):
        if line:
            fields = line.split('\t')
            offtargets.append({
                'chromosome': fields[2],
                'position': int(fields[3]),
                'strand': fields[1],
                'mismatches': int(fields[7]) if len(fields) > 7 else 0
            })

    return offtargets

def filter_by_offtargets(library_df, genome_index, max_offtargets=10):
    '''Filter library to remove guides with too many off-targets.'''
    filtered = []

    for _, guide in library_df.iterrows():
        offtargets = check_offtargets(guide['sequence'], genome_index)
        n_offtargets = len([ot for ot in offtargets if ot['mismatches'] <= 2])

        if n_offtargets <= max_offtargets:
            guide_dict = guide.to_dict()
            guide_dict['n_offtargets'] = n_offtargets
            filtered.append(guide_dict)

    return pd.DataFrame(filtered)
```

## Oligo Design for Cloning

**Goal:** Generate forward and reverse oligo sequences ready for ordering and cloning into a lentiviral vector.

**Approach:** Add vector-specific adapter sequences (overhangs for BsmBI or BbsI restriction sites) to each guide and its reverse complement, formatted for the target vector backbone.

```python
def design_oligos(library_df, vector='lentiGuide-Puro'):
    '''Design oligos for library cloning.'''
    vector_specs = {
        'lentiGuide-Puro': {
            'forward_prefix': 'CACCG',
            'forward_suffix': '',
            'reverse_prefix': 'AAAC',
            'reverse_suffix': 'C'
        },
        'pLKO': {
            'forward_prefix': 'CCGG',
            'forward_suffix': 'CTCGAG',
            'reverse_prefix': 'AATTCTCGAG',
            'reverse_suffix': ''
        }
    }

    spec = vector_specs.get(vector, vector_specs['lentiGuide-Puro'])

    oligos = []
    for _, guide in library_df.iterrows():
        seq = guide['sequence']

        forward = spec['forward_prefix'] + seq + spec['forward_suffix']
        reverse = spec['reverse_prefix'] + str(Seq(seq).reverse_complement()) + spec['reverse_suffix']

        oligos.append({
            'guide_id': f"{guide['gene']}_{guide['guide_number']}",
            'gene': guide['gene'],
            'guide_sequence': seq,
            'forward_oligo': forward,
            'reverse_oligo': reverse,
            'type': guide.get('type', 'targeting')
        })

    return pd.DataFrame(oligos)

oligos = design_oligos(library)
oligos.to_csv('library_oligos.csv', index=False)
print(f'Designed {len(oligos)} oligo pairs')
```

## Pool Design for Synthesis

```python
def design_array_oligos(library_df, array_format='12K'):
    '''Design array oligos for pooled synthesis.'''
    formats = {
        '12K': {'capacity': 12000, 'length': 200},
        '92K': {'capacity': 92000, 'length': 150},
        '244K': {'capacity': 244000, 'length': 60}
    }

    spec = formats[array_format]

    primer_5 = 'AGGCTTGGATTTCTATAACTTCGTATAGCATACATTATACGAAGTTAT'
    primer_3 = 'ATAACTTCGTATAATGTATGCTATACGAAGTTATCTTGGATTTCTAGA'
    scaffold = 'GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC'

    array_oligos = []
    for _, guide in library_df.iterrows():
        full_oligo = primer_5 + guide['sequence'] + scaffold + primer_3

        if len(full_oligo) > spec['length']:
            print(f"Warning: {guide['gene']} oligo too long for {array_format}")
            continue

        array_oligos.append({
            'id': f"{guide['gene']}_{guide['guide_number']}",
            'sequence': full_oligo,
            'length': len(full_oligo)
        })

    if len(array_oligos) > spec['capacity']:
        print(f"Warning: Library ({len(array_oligos)}) exceeds {array_format} capacity ({spec['capacity']})")

    return pd.DataFrame(array_oligos)

array_oligos = design_array_oligos(library, '92K')
array_oligos.to_csv('array_synthesis.csv', index=False)
```

## Library QC

```python
def qc_library(library_df):
    '''Quality control checks for library design.'''
    qc = {}

    qc['total_guides'] = len(library_df)
    qc['unique_genes'] = library_df[library_df['type'] == 'targeting']['gene'].nunique()
    qc['guides_per_gene'] = library_df[library_df['type'] == 'targeting'].groupby('gene').size().describe()

    gc_contents = library_df['sequence'].apply(lambda x: (x.count('G') + x.count('C')) / len(x))
    qc['gc_mean'] = gc_contents.mean()
    qc['gc_std'] = gc_contents.std()
    qc['gc_range'] = (gc_contents.min(), gc_contents.max())

    has_poly_t = library_df['sequence'].apply(lambda x: 'TTTT' in x)
    qc['poly_t_count'] = has_poly_t.sum()

    type_counts = library_df['type'].value_counts()
    qc['control_ratio'] = type_counts.get('non-targeting', 0) / len(library_df)

    return qc

qc = qc_library(library)
print('Library QC:')
for key, value in qc.items():
    print(f'  {key}: {value}')
```

## Alternative PAM Systems

The examples above use SpCas9 with NGG PAM. Alternative systems expand targeting range:

| System | PAM | Use Case |
|--------|-----|----------|
| SpCas9 | NGG | Standard, most validated |
| SpCas9-NG | NG | Relaxed PAM requirement |
| SpRY | NRN/NYN | Near-PAMless, broadest targeting |
| Cas12a (Cpf1) | TTTV | AT-rich regions, staggered cuts |
| SaCas9 | NNGRRT | AAV delivery (smaller gene) |

For alternative PAMs, modify the `design_sgrnas_for_gene()` function:

```python
# Cas12a example (TTTV PAM, 23nt guide)
def design_cas12a_guides(gene_sequence, n_guides=4):
    pam_pattern = 'TTT[ACG]'  # TTTV
    guide_length = 23

    for match in re.finditer(f'({pam_pattern})([ACGT]{{{guide_length}}})', gene_sequence):
        pam = match.group(1)
        guide = match.group(2)
        # Cas12a cuts downstream of guide
        # ...
```

## Related Skills

- mageck-analysis - Analyze screen results
- crispresso-editing - Validate editing efficiency
- screen-qc - QC sequencing data
- hit-calling - Identify screen hits

Related Skills

tooluniverse-protein-therapeutic-design

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Design novel protein therapeutics (binders, enzymes, scaffolds) using AI-guided de novo design. Uses RFdiffusion for backbone generation, ProteinMPNN for sequence design, ESMFold/AlphaFold2 for validation. Use when asked to design protein binders, therapeutic proteins, or engineer protein function.

tooluniverse-crispr-screen-analysis

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Comprehensive CRISPR screen analysis for functional genomics. Analyze pooled or arrayed CRISPR screens (knockout, activation, interference) to identify essential genes, synthetic lethal interactions, and drug targets. Perform sgRNA count processing, gene-level scoring (MAGeCK, BAGEL), quality control, pathway enrichment, and drug target prioritization. Use for CRISPR screen analysis, gene essentiality studies, synthetic lethality detection, functional genomics, drug target validation, or identifying genetic vulnerabilities.

tooluniverse-clinical-trial-design

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Strategic clinical trial design feasibility assessment using ToolUniverse. Evaluates patient population sizing, biomarker prevalence, endpoint selection, comparator analysis, safety monitoring, and regulatory pathways. Creates comprehensive feasibility reports with evidence grading, enrollment projections, and trial design recommendations. Use when planning Phase 1/2 trials, assessing trial feasibility, or designing biomarker-driven studies.

protein-design-workflow

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

End-to-end guidance for protein design pipelines. Use this skill when: (1) Starting a new protein design project, (2) Need step-by-step workflow guidance, (3) Understanding the full design pipeline, (4) Planning compute resources and timelines, (5) Integrating multiple design tools. For tool selection, use binder-design. For QC thresholds, use protein-qc.

bio-genome-engineering-prime-editing-design

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Design pegRNAs for prime editing using PrimeDesign algorithms. Generate spacer, PBS, and RT template sequences for precise genomic modifications without double-strand breaks. Use when designing prime editing experiments for precise insertions, deletions, or point mutations.

bio-genome-engineering-hdr-template-design

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Design homology-directed repair donor templates for CRISPR knock-ins using primer3-py. Create ssODN, dsDNA, or plasmid templates with optimized homology arms. Use when designing donor templates for precise insertions, tagging, or allele replacement.

bio-genome-engineering-grna-design

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Design guide RNAs for CRISPR-Cas9/Cas12a experiments using CRISPRscan and local scoring algorithms. Score guides for on-target activity using Rule Set 2 and Azimuth models. Use when designing sgRNAs for gene knockout, activation, or repression experiments.

bio-genome-engineering-base-editing-design

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Design guides for cytosine and adenine base editing using editing window optimization and BE-Hive outcome prediction. Select optimal positions for C-to-T or A-to-G conversions without double-strand breaks. Use when designing base editor experiments for precise nucleotide changes.

bio-crispr-screens-screen-qc

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Quality control for pooled CRISPR screens. Covers library representation, read distribution, replicate correlation, and essential gene recovery. Use when assessing screen quality before hit calling or diagnosing poor screen performance.

bio-crispr-screens-mageck-analysis

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout) for pooled CRISPR screen analysis. Covers count normalization, gene ranking, and pathway analysis. Use when identifying essential genes, drug targets, or resistance mechanisms from dropout or enrichment screens.

bio-crispr-screens-jacks-analysis

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

JACKS (Joint Analysis of CRISPR/Cas9 Knockout Screens) for modeling sgRNA efficacy and gene essentiality. Use when analyzing multiple CRISPR screens simultaneously or when accounting for variable sgRNA efficiency across experiments.

bio-crispr-screens-hit-calling

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Statistical methods for calling hits in CRISPR screens. Covers MAGeCK, BAGEL2, drugZ, and custom approaches for identifying essential and resistance genes. Use when identifying significant genes from screen count data after QC passes.