bio-longread-structural-variants

Detect structural variants from long-read alignments using Sniffles, cuteSV, and SVIM. Use when detecting deletions, insertions, inversions, translocations, or complex rearrangements from ONT or PacBio data, especially those missed by short-read methods.

1,802 stars

by FreedomIntelligence


Best use case

bio-longread-structural-variants is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-longread-structural-variants should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/bio-longread-structural-variants/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-longread-structural-variants/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/bio-longread-structural-variants/SKILL.md inside your project
Restart your AI agent; it will auto-discover the skill.

Frequently Asked Questions

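What does this skill do?

It detects structural variants (deletions, insertions, inversions, duplications, and translocations) from ONT or PacBio long-read alignments, mainly with Sniffles2, cuteSV, and SVIM (plus pbsv for PacBio data), and covers filtering, merging callsets with SURVIVOR, and annotation with AnnotSV.
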
Where can I find the source code?

The skill is published in the FreedomIntelligence/OpenClaw-Medical-Skills repository on GitHub, at skills/bio-longread-structural-variants/SKILL.md.

SKILL.md Source

## Version Compatibility

Reference examples tested with: bcftools 1.19+

Before using code patterns, verify installed versions match. If versions differ:
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Structural Variant Detection

**"Call structural variants from my long reads"** → Detect large deletions, insertions, inversions, duplications, and translocations with precise breakpoint resolution from ONT or PacBio alignments.
- CLI: `sniffles --input aligned.bam --vcf svs.vcf`, `cuteSV aligned.bam ref.fa svs.vcf output/`

## Sniffles2 - Basic SV Calling

```bash
# Call SVs from aligned BAM
sniffles --input aligned.bam \
    --vcf structural_variants.vcf \
    --reference reference.fa \
    --threads 4
```
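Sniffles2 works from a coordinate-sorted, indexed BAM, so a quick preflight check can save a failed run. A minimal sketch: `check_sv_inputs` is a hypothetical helper, and the index names it accepts (`aligned.bam.bai`, `aligned.bai`, `aligned.bam.csi`) are the usual samtools conventions, assumed here rather than taken from the skill.

```bash
# Hypothetical preflight helper: fail early if inputs are missing or the BAM is unindexed.
check_sv_inputs() {
  local bam=$1 ref=$2
  if [ ! -s "$bam" ]; then
    echo "BAM not found or empty: $bam" >&2
    return 1
  fi
  # Assumed index naming: aligned.bam.bai, aligned.bai, or aligned.bam.csi
  if [ ! -s "$bam.bai" ] && [ ! -s "${bam%.bam}.bai" ] && [ ! -s "$bam.csi" ]; then
    echo "No index for $bam (try: samtools index $bam)" >&2
    return 1
  fi
  if [ ! -s "$ref" ]; then
    echo "Reference not found or empty: $ref" >&2
    return 1
  fi
}

# Example:
# check_sv_inputs aligned.bam reference.fa && sniffles --input aligned.bam --vcf structural_variants.vcf --reference reference.fa --threads 4
```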

## Sniffles2 - Common Options

```bash
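# --minsupport:    min supporting reads
# --minsvlen:      min SV length (bp)
# --mapq:          min mapping quality
# --output-rnames: include supporting read names in the VCF
# --mosaic:        detect mosaic SVs
# (Comments cannot follow a trailing backslash, so they are kept above the command.)
sniffles --input aligned.bam \
    --vcf structural_variants.vcf \
    --reference reference.fa \
    --threads 8 \
    --minsupport 3 \
    --minsvlen 50 \
    --mapq 20 \
    --output-rnames \
    --mosaic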
```

## Sniffles2 - Population Calling

**Goal:** Jointly call and genotype structural variants across a cohort of long-read samples for population-level SV analysis.

**Approach:** Generate per-sample SNF signature files from individual BAMs, then merge and jointly genotype all samples in a single Sniffles2 call.

```bash
# Step 1: Call SVs per sample with SNF output
sniffles --input sample1.bam --snf sample1.snf --reference reference.fa
sniffles --input sample2.bam --snf sample2.snf --reference reference.fa

# Step 2: Merge and genotype
sniffles --input sample1.snf sample2.snf \
    --vcf population_svs.vcf \
    --reference reference.fa
```
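The two steps above generalize to any number of samples with a loop. A minimal sketch, assuming BAM names ending in `.bam` and SNF files written next to them; `call_population_svs` is a hypothetical wrapper that uses only the Sniffles2 flags shown above.

```bash
# Hypothetical wrapper: per-sample SNF files, then one joint Sniffles2 call.
# Usage: call_population_svs reference.fa population_svs.vcf sample1.bam sample2.bam ...
call_population_svs() {
  local ref=$1 out_vcf=$2
  shift 2
  local snfs=() bam snf
  for bam in "$@"; do
    snf="${bam%.bam}.snf"   # sample1.bam -> sample1.snf
    # Step 1: per-sample SV signatures
    sniffles --input "$bam" --snf "$snf" --reference "$ref" || return 1
    snfs+=("$snf")
  done
  # Step 2: merge all SNF files and genotype jointly
  sniffles --input "${snfs[@]}" --vcf "$out_vcf" --reference "$ref"
}
```

Keeping step 1 per sample means a new sample only needs its own SNF file before the joint call is rerun.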

## cuteSV - Alternative Caller

```bash
# cuteSV SV calling
cuteSV aligned.bam reference.fa output.vcf work_dir/ \
    --threads 8 \
    --min_support 3 \
    --min_size 50 \
    --genotype
```

## cuteSV - ONT Optimized

```bash
# Settings optimized for ONT
cuteSV aligned.bam reference.fa output.vcf work_dir/ \
    --threads 8 \
    --max_cluster_bias_INS 100 \
    --diff_ratio_merging_INS 0.3 \
    --max_cluster_bias_DEL 100 \
    --diff_ratio_merging_DEL 0.3 \
    --genotype
```

## cuteSV - PacBio HiFi Optimized

```bash
# Settings optimized for HiFi
cuteSV aligned.bam reference.fa output.vcf work_dir/ \
    --threads 8 \
    --max_cluster_bias_INS 1000 \
    --diff_ratio_merging_INS 0.9 \
    --max_cluster_bias_DEL 1000 \
    --diff_ratio_merging_DEL 0.5 \
    --genotype
```
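The ONT and HiFi presets differ only in the four clustering flags, so a pipeline can select them by platform. A minimal sketch; `cutesv_preset` is a hypothetical helper that prints exactly the values from the two blocks above and rejects any other platform name.

```bash
# Hypothetical helper: print the cuteSV clustering flags used above for each platform.
cutesv_preset() {
  case "$1" in
    ont)
      echo "--max_cluster_bias_INS 100 --diff_ratio_merging_INS 0.3 --max_cluster_bias_DEL 100 --diff_ratio_merging_DEL 0.3" ;;
    hifi)
      echo "--max_cluster_bias_INS 1000 --diff_ratio_merging_INS 0.9 --max_cluster_bias_DEL 1000 --diff_ratio_merging_DEL 0.5" ;;
    *)
      echo "unknown platform '$1' (expected: ont or hifi)" >&2
      return 1 ;;
  esac
}

# Example: split the preset into an array and pass it to cuteSV
# read -ra preset <<< "$(cutesv_preset ont)"
# cuteSV aligned.bam reference.fa output.vcf work_dir/ --threads 8 "${preset[@]}" --genotype
```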

## SVIM - Another Alternative

```bash
# SVIM for ONT data
svim alignment output_dir/ aligned.bam reference.fa \
    --insertion_sequences \
    --read_names \
    --sample sample_name
```

## pbsv - PacBio Specific

```bash
# Discover signatures
pbsv discover aligned.bam signatures.svsig.gz

# Call SVs
pbsv call reference.fa signatures.svsig.gz structural_variants.vcf
```

## Filter SV Calls

```bash
# Filter by quality and size
bcftools filter -i 'QUAL>=20 && ABS(SVLEN)>=50' svs.vcf > svs.filtered.vcf

# Keep only PASS
bcftools view -f PASS svs.vcf > svs.pass.vcf

# Filter specific SV types
bcftools view -i 'SVTYPE="DEL"' svs.vcf > deletions.vcf
bcftools view -i 'SVTYPE="INS"' svs.vcf > insertions.vcf
```
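The per-type filter above extends to every SV type in the table of codes. A minimal sketch, assuming an uncompressed `.vcf` input name; `split_by_svtype` is a hypothetical helper that reuses the same `bcftools view -i 'SVTYPE="..."'` expression, once per type.

```bash
# Hypothetical helper: write one VCF per SV type using the same bcftools filter as above.
split_by_svtype() {
  local vcf=$1 t
  for t in DEL INS INV DUP BND; do
    # e.g. svs.pass.vcf -> svs.pass.DEL.vcf, svs.pass.INS.vcf, ...
    bcftools view -i "SVTYPE=\"$t\"" "$vcf" > "${vcf%.vcf}.${t}.vcf" || return 1
  done
}

# Example: split_by_svtype svs.pass.vcf
```

Types with no calls still produce a header-only VCF, which keeps downstream file names predictable.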

## Merge Multiple Callers

```bash
# Use SURVIVOR to merge SV callsets
SURVIVOR merge sample_files.txt 1000 2 1 1 0 50 merged_svs.vcf

# sample_files.txt contains VCF paths, one per line
# Parameters: max_distance, min_callers, type_agree, strand_agree, est_distance, min_size
```
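`sample_files.txt` is just one VCF path per line, so it can be generated and checked before merging. A minimal sketch; `write_survivor_list` is a hypothetical helper, the caller VCF names in the example are placeholders, and it assumes plain (uncompressed) VCFs as SURVIVOR input.

```bash
# Hypothetical helper: build the SURVIVOR list file, one VCF path per line.
write_survivor_list() {
  local list=$1 vcf
  shift
  : > "$list"   # truncate any previous list
  for vcf in "$@"; do
    if [ ! -s "$vcf" ]; then
      echo "missing or empty VCF: $vcf" >&2
      return 1
    fi
    printf '%s\n' "$vcf" >> "$list"
  done
}

# Example (placeholder file names):
# write_survivor_list sample_files.txt sniffles.vcf cutesv.vcf svim.vcf &&
#   SURVIVOR merge sample_files.txt 1000 2 1 1 0 50 merged_svs.vcf
```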

## Annotate SVs

```bash
# Annotate with AnnotSV
AnnotSV -SVinputFile svs.vcf \
    -genomeBuild GRCh38 \
    -outputFile annotated_svs

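# Or transfer INFO from an indexed SV VCF with bcftools (exact position/allele matching only, so breakpoints must agree)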
bcftools annotate -a gnomad_sv.vcf.gz -c INFO svs.vcf > svs.annotated.vcf
```

## SV Types

| Type | Code | Description |
|------|------|-------------|
| Deletion | DEL | Sequence removed |
| Insertion | INS | Sequence added |
| Inversion | INV | Sequence inverted |
| Duplication | DUP | Sequence duplicated |
| Translocation | BND | Breakend (complex) |
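The type codes above live in the `SVTYPE` INFO key, so a quick tally per type is a useful sanity check after calling or filtering. A minimal sketch using only awk; `count_svtypes` is a hypothetical helper that expects an uncompressed VCF path or VCF text on stdin.

```bash
# Hypothetical helper: tally records per SVTYPE in an uncompressed VCF (or stdin).
count_svtypes() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    {
      n = split($8, info, ";")             # column 8 is INFO
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^SVTYPE=/) {
          counts[substr(info[i], 8)]++     # text after "SVTYPE="
          break
        }
      }
    }
    END { for (t in counts) print t "\t" counts[t] }
  ' "${1:--}" | sort
}

# Example: bcftools view svs.vcf.gz | count_svtypes
```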

## Key Parameters - Sniffles2

| Parameter | Default | Description |
|-----------|---------|-------------|
| --minsupport | auto | Min supporting reads |
| --minsvlen | 35 | Min SV length |
| --mapq | 20 | Min mapping quality |
| --reference | none | Reference FASTA (needed to output DEL sequences) |
| --tandem-repeats | none | BED of tandem repeats |
| --mosaic | off | Detect mosaic SVs |

## Key Parameters - cuteSV

| Parameter | Default | Description |
|-----------|---------|-------------|
| --min_support | 10 | Min supporting reads |
| --min_size | 30 | Min SV length |
| --max_size | 100000 | Max SV length |
| --genotype | off | Output genotypes |
| --report_readid | off | Report read IDs |

## Coverage Guidelines

| Coverage | SV Detection |
|----------|--------------|
| 5-10x | Large SVs (>1kb) |
| 10-20x | Most SVs |
| 20-30x | High confidence |
| >30x | Mosaic/rare SVs |
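The table can be applied to a BAM by computing mean depth and mapping it to a tier. A minimal sketch, assuming `samtools` is on `PATH`; `mean_depth` and `coverage_tier` are hypothetical helpers, and because the table's ranges share endpoints, the sketch treats each lower bound as inclusive (so exactly 30x counts as "High confidence").

```bash
# Hypothetical helper: mean per-base depth over all reference positions.
mean_depth() {
  # samtools depth -a reports every position (including zero coverage): chrom, pos, depth
  samtools depth -a "$1" | awk '{ sum += $3; n++ } END { if (n > 0) printf "%.1f\n", sum / n; else print 0 }'
}

# Hypothetical helper: map a mean coverage value onto the tiers in the table.
coverage_tier() {
  awk -v cov="$1" 'BEGIN {
    if (cov > 30)       print "Mosaic/rare SVs"
    else if (cov >= 20) print "High confidence"
    else if (cov >= 10) print "Most SVs"
    else if (cov >= 5)  print "Large SVs (>1kb)"
    else                print "Below 5x: not covered by the table"
  }'
}

# Example: coverage_tier "$(mean_depth aligned.bam)"
```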

## Related Skills

- long-read-alignment - Generate input BAM
- medaka-polishing - Polish assembly with SVs
- variant-calling/structural-variant-calling - Short-read SV comparison

Related Skills

tooluniverse-structural-variant-analysis


from FreedomIntelligence/OpenClaw-Medical-Skills

Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.

bio-variant-calling-structural-variant-calling


from FreedomIntelligence/OpenClaw-Medical-Skills

Call structural variants (SVs) from short-read sequencing using Manta, Delly, and LUMPY. Detects deletions, insertions, inversions, duplications, and translocations that are too large for standard SNV callers. Use when detecting structural variants from short-read data.

bio-structural-biology-modern-structure-prediction


from FreedomIntelligence/OpenClaw-Medical-Skills

Predict protein structures using modern ML models including AlphaFold3, ESMFold, Chai-1, and Boltz-1. Use when predicting structures for novel proteins, protein complexes, or when comparing predictions across multiple methods.

bio-structural-biology-alphafold-predictions


from FreedomIntelligence/OpenClaw-Medical-Skills

Access and analyze AlphaFold protein structure predictions. Use when predicted structures are needed for proteins without experimental structures, or for confidence scores (pLDDT).

bio-longread-qc


from FreedomIntelligence/OpenClaw-Medical-Skills

Quality control for long-read sequencing data using NanoPlot, NanoStat, and chopper. Generate QC reports, filter reads by length and quality, and visualize read characteristics. Use when assessing ONT or PacBio run quality or filtering reads before assembly or alignment.

bio-longread-medaka


from FreedomIntelligence/OpenClaw-Medical-Skills

Polish assemblies and call variants from Oxford Nanopore data using medaka. Uses neural networks trained on specific basecaller versions. Use when improving ONT-only assemblies or calling variants from Nanopore data without short-read polishing.

bio-longread-alignment


from FreedomIntelligence/OpenClaw-Medical-Skills

Align long reads using minimap2 for Oxford Nanopore and PacBio data. Supports various presets for different read types and applications. Use when aligning ONT or PacBio reads to a reference genome for variant calling, SV detection, or coverage analysis.

bio-long-read-sequencing-clair3-variants


from FreedomIntelligence/OpenClaw-Medical-Skills

Deep learning-based variant calling from long reads using Clair3 for SNPs and small indels. Use when calling germline variants from ONT or PacBio alignments, particularly when high accuracy is needed for clinical or research applications.

zinc-database


from FreedomIntelligence/OpenClaw-Medical-Skills

Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.

zarr-python


from FreedomIntelligence/OpenClaw-Medical-Skills

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

xlsx


from FreedomIntelligence/OpenClaw-Medical-Skills

Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.

writing-skills


from FreedomIntelligence/OpenClaw-Medical-Skills

Use when creating new skills, editing existing skills, or verifying skills work before deployment