bio-longread-alignment

Align long reads using minimap2 for Oxford Nanopore and PacBio data. Supports various presets for different read types and applications. Use when aligning ONT or PacBio reads to a reference genome for variant calling, SV detection, or coverage analysis.

by FreedomIntelligence

Best use case

bio-longread-alignment is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-longread-alignment should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/bio-longread-alignment/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-longread-alignment/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/bio-longread-alignment/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How bio-longread-alignment Compares

| Feature / Agent | bio-longread-alignment | Standard Approach |
|-----------------|------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

SKILL.md Source

## Version Compatibility

Reference examples tested with: minimap2 2.26+, samtools 1.19+

Before using code patterns, verify installed versions match. If versions differ:
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
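
The version check described above can be sketched in shell. This is a minimal sketch, not part of the original skill: `version_ge` is a hypothetical helper, it assumes a `sort` that supports `-V` (as in GNU coreutils), and it assumes `minimap2 --version` prints a string such as `2.26-r1175`, where the build suffix needs stripping before comparison.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed if version A >= version B (hypothetical helper).
# Relies on `sort -V` (version sort), assumed to be available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only query the tool if it is actually installed.
if command -v minimap2 >/dev/null 2>&1; then
    installed="$(minimap2 --version)"   # assumed format: 2.26-rNNNN
    installed="${installed%%-*}"        # drop the "-rNNNN" build suffix
    if version_ge "$installed" "2.26"; then
        echo "minimap2 $installed is at least 2.26"
    else
        echo "minimap2 $installed is older than the tested 2.26" >&2
    fi
fi
```

The same helper could be applied to the version number printed by `samtools --version`; if a check fails, confirm each flag with `--help` before running the examples.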

# Long-Read Alignment with minimap2

**"Align my long reads to the reference"** → Map ONT or PacBio reads using minimap2 with technology-specific presets for optimal sensitivity and accuracy.
- CLI: `minimap2 -ax map-ont ref.fa reads.fq | samtools sort -o aligned.bam` (ONT), `minimap2 -ax map-hifi` (PacBio HiFi)

## Oxford Nanopore Alignment

```bash
# Basic ONT alignment
minimap2 -ax map-ont reference.fa reads.fastq.gz | \
    samtools sort -o aligned.bam
samtools index aligned.bam
```

## PacBio HiFi Alignment

```bash
# PacBio HiFi reads (high accuracy)
minimap2 -ax map-hifi reference.fa reads.fastq.gz | \
    samtools sort -o aligned.bam
samtools index aligned.bam
```

## PacBio CLR Alignment

```bash
# PacBio CLR (continuous long reads, lower accuracy)
minimap2 -ax map-pb reference.fa reads.fastq.gz | \
    samtools sort -o aligned.bam
samtools index aligned.bam
```

## Pre-Build Index for Multiple Runs

```bash
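# Build the index once. Indexing parameters (-k, -w, -H) are fixed at build
# time, so build with the same preset you will map with
minimap2 -x map-ont -d reference.mmi reference.fa

# Use the index for alignment
minimap2 -ax map-ont reference.mmi reads.fastq.gz | samtools sort -o aligned.bam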
```
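
Because an index stores the k-mer and window settings it was built with, one option is to keep a separate index per preset. The sketch below illustrates that idea; `index_path_for`, `ensure_index`, and the `<reference>.<preset>.mmi` naming scheme are hypothetical and not part of minimap2 or the original skill.

```shell
#!/usr/bin/env bash
# index_path_for REF PRESET: derive a preset-specific index name
# (hypothetical scheme: reference.fa + map-hifi -> reference.map-hifi.mmi).
index_path_for() {
    printf '%s.%s.mmi\n' "${1%.*}" "$2"
}

# ensure_index REF PRESET: build the index only if it is missing or empty,
# then print its path.
ensure_index() {
    local idx
    idx="$(index_path_for "$1" "$2")"
    if [ ! -s "$idx" ]; then
        minimap2 -x "$2" -d "$idx" "$1" || return 1
    fi
    printf '%s\n' "$idx"
}

index_path_for reference.fa map-hifi   # prints reference.map-hifi.mmi
```

With these helpers, a mapping run could look like `minimap2 -ax map-hifi "$(ensure_index reference.fa map-hifi)" reads.fastq.gz | samtools sort -o aligned.bam`.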

## Common Options

```bash
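# -t 8            threads
# -R ...          read group header line
# --secondary=no  suppress secondary alignments
# --MD            generate the MD tag for variant calling
# -Y              use soft clipping for supplementary alignments
# (A comment after a trailing backslash breaks the line continuation,
#  so the option notes are listed here instead.)
minimap2 -ax map-ont \
    -t 8 \
    -R '@RG\tID:sample\tSM:sample' \
    --secondary=no \
    --MD \
    -Y \
    reference.fa reads.fastq.gz | \
    samtools sort -@ 4 -o aligned.bam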
```
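
One way to annotate each option inline without interfering with line continuation is to collect the options in a Bash array, where comments are allowed between elements. This is a sketch only: the variable names and the sample name `sample1` are hypothetical, and the command is printed for inspection rather than executed.

```shell
#!/usr/bin/env bash
# Collect options in an array; comments are legal between array elements.
sample="sample1"   # hypothetical sample name
mm2_opts=(
    -ax map-ont
    -t 8                                  # threads
    -R "@RG\tID:${sample}\tSM:${sample}"  # read group; \t stays literal here
    --secondary=no                        # no secondary alignments
    --MD                                  # generate MD tag
    -Y                                    # soft clipping for supplementary
)

# Print the command for inspection instead of running it.
printf '%q ' minimap2 "${mm2_opts[@]}" reference.fa reads.fastq.gz
echo
```

To run it for real, replace the `printf` line with `minimap2 "${mm2_opts[@]}" reference.fa reads.fastq.gz | samtools sort -@ 4 -o aligned.bam`.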

## Splice-Aware Alignment (RNA)

```bash
# For cDNA reads; for noisy ONT direct RNA the minimap2 README suggests
# adding -uf -k14 (forward-strand transcripts, shorter k-mer)
minimap2 -ax splice reference.fa reads.fastq.gz | \
    samtools sort -o aligned.bam
```

## With Junction BED (Known Splice Sites)

```bash
# Provide known splice junctions (gene annotation in BED12 format)
minimap2 -ax splice --junc-bed junctions.bed \
    reference.fa reads.fastq.gz | samtools sort -o aligned.bam
```

## Assembly to Reference Alignment

```bash
# Assembly with ~0.1% divergence
minimap2 -ax asm5 reference.fa assembly.fa > aligned_asm5.sam

# Assembly with higher divergence (several percent, e.g. ~5%)
minimap2 -ax asm20 reference.fa assembly.fa > aligned_asm20.sam
```

## Output PAF (Faster, No BAM)

```bash
# PAF format (faster, for quick analysis)
minimap2 -x map-ont reference.fa reads.fastq.gz > alignments.paf
```
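
As an example of the quick analysis PAF allows, the sketch below summarizes a PAF stream with `awk`. It relies on the standard PAF columns (1 = query name, 10 = matching bases, 11 = alignment block length); `paf_summary` is a hypothetical helper, and the two input records are made up purely to show the arithmetic. Without base-level alignment (`-c`), columns 10 and 11 are estimates, so treat the ratio as a rough proxy for identity.

```shell
#!/usr/bin/env bash
# paf_summary: read PAF on stdin (or from files) and report the number of
# alignment records, the number of distinct reads, and matches / block length.
paf_summary() {
    awk -F '\t' '
        { n++; reads[$1] = 1; matches += $10; block += $11 }
        END {
            if (n == 0) { print "alignments=0"; exit }
            nreads = 0
            for (r in reads) nreads++
            printf "alignments=%d reads=%d identity=%.3f\n", n, nreads, matches / block
        }' "$@"
}

# Two made-up records for the same read, for illustration only.
printf 'r1\t2000\t0\t1000\t+\tchr1\t50000\t100\t1100\t950\t1000\t60\nr1\t2000\t1000\t2000\t-\tchr2\t40000\t500\t1500\t900\t1000\t30\n' | paf_summary
# prints: alignments=2 reads=1 identity=0.925
```

On real data the call would be `paf_summary alignments.paf`.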

## Keep Secondary and Supplementary

```bash
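# Keep all alignments (for SV calling)
# --secondary=yes and -N 5 (max secondary alignments) match minimap2's
# defaults; they are spelled out here to make the intent explicit
minimap2 -ax map-ont \
    --secondary=yes \
    -N 5 \
    reference.fa reads.fastq.gz | samtools sort -o aligned.bam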
```

## Filter Alignments

```bash
# Filter during the alignment pipeline: -q 10 keeps alignments with MAPQ >= 10
minimap2 -ax map-ont reference.fa reads.fastq.gz | \
    samtools view -b -q 10 | \
    samtools sort -o aligned.bam
```

## Multiple FASTQ Files

```bash
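# Pass several input files; minimap2 processes them one after another
minimap2 -ax map-ont reference.fa reads1.fastq.gz reads2.fastq.gz | \
    samtools sort -o aligned.bam

# Or read the paths from a file list (one path per line; assumes the list
# is short enough for a single xargs invocation)
xargs minimap2 -ax map-ont reference.fa < file_list.txt | \
    samtools sort -o aligned.bam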
```

## Output Statistics

```bash
# Get alignment statistics
samtools flagstat aligned.bam

# Detailed stats
samtools stats aligned.bam | grep ^SN
```
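
To pull individual numbers out of the `SN` section rather than reading the whole summary, a small `awk` filter is enough. The sketch below assumes the tab-separated `SN<TAB>key:<TAB>value` layout of `samtools stats` output; `sn_value` is a hypothetical helper, and the input lines are made up to stand in for real output.

```shell
#!/usr/bin/env bash
# sn_value KEY: print the value of one summary-number (SN) field from
# `samtools stats` output read on stdin. KEY is given without the colon.
sn_value() {
    awk -F '\t' -v key="$1:" '$1 == "SN" && $2 == key { print $3 }'
}

# Made-up SN lines standing in for real `samtools stats` output.
stats="$(printf 'SN\traw total sequences:\t100\nSN\treads mapped:\t95\n')"
total="$(printf '%s\n' "$stats" | sn_value 'raw total sequences')"
mapped="$(printf '%s\n' "$stats" | sn_value 'reads mapped')"
awk -v m="$mapped" -v t="$total" 'BEGIN { printf "mapped: %.1f%%\n", 100 * m / t }'
# prints: mapped: 95.0%
```

On a real file the helper would be used as `samtools stats aligned.bam | sn_value 'reads mapped'`.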

## Convert PAF to BED

```bash
# Convert alignments to BED6: target, start, end, read name, MAPQ, strand
awk 'BEGIN { FS = OFS = "\t" } { print $6, $8, $9, $1, $12, $5 }' alignments.paf > alignments.bed
```
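
To make the column mapping concrete, here is the same conversion applied to a single record. The PAF line is made up for illustration, and `paf_to_bed` is a hypothetical wrapper name. PAF columns 6, 8, and 9 hold the target name, start, and end; column 1 the read name; column 12 the mapping quality; and column 5 the strand.

```shell
#!/usr/bin/env bash
# paf_to_bed: convert PAF (stdin or files) to six-column BED.
paf_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" } { print $6, $8, $9, $1, $12, $5 }' "$@"
}

# One made-up PAF record: read1 aligned to chr1:2000-2980 on the + strand.
printf 'read1\t1000\t10\t990\t+\tchr1\t50000\t2000\t2980\t950\t985\t60\n' | paf_to_bed
# prints (tab-separated): chr1  2000  2980  read1  60  +
```

PAF target coordinates are 0-based with an exclusive end, the same convention BED uses, so no offset is needed.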

## Key Presets

| Preset | Description | Best For |
|--------|-------------|----------|
| map-ont | ONT reads | Nanopore genomic |
| map-hifi | PacBio HiFi | PacBio genomic |
| map-pb | PacBio CLR | PacBio CLR |
| splice | Long RNA reads | cDNA, direct RNA |
| asm5 | Low divergence | Same species assembly |
| asm20 | High divergence | Cross-species assembly |
| sr | Short reads | Illumina (basic) |
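
The table above can be turned into a small lookup so that scripts pick a preset from a read-type label. This is a sketch under assumptions: the function name `preset_for` and the labels (`ont`, `hifi`, `clr`, and so on) are hypothetical; only the preset names come from the table.

```shell
#!/usr/bin/env bash
# preset_for TYPE: map a read-type label to a minimap2 preset.
# Labels are hypothetical; presets are those listed in the table.
preset_for() {
    case "$1" in
        ont)        echo "map-ont" ;;
        hifi)       echo "map-hifi" ;;
        clr)        echo "map-pb" ;;
        rna|cdna)   echo "splice" ;;
        asm-close)  echo "asm5" ;;
        asm-far)    echo "asm20" ;;
        illumina)   echo "sr" ;;
        *)          echo "unknown read type: $1" >&2; return 1 ;;
    esac
}

# Print the command that would be run for HiFi reads.
preset="$(preset_for hifi)" && echo "minimap2 -ax $preset reference.fa reads.fastq.gz"
```

Returning a non-zero status for unknown labels lets a calling script stop before it launches an alignment with the wrong settings.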

## Key Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| -t | 3 | CPU threads |
| -k | 15 | K-mer size (presets may override it, e.g. map-hifi uses 19) |
| -w | 10 | Minimizer window (presets may override it) |
| -a | off | Output SAM |
| -x | none | Preset |
| --secondary | yes | Output secondary |
| -N | 5 | Max secondary alignments |
| --MD | off | Generate MD tag |
| -R | none | Read group header |
| -Y | off | Soft clipping for supplementary |

## Output Formats

| Format | Flag | Description |
|--------|------|-------------|
| PAF | (default) | Pairwise Alignment Format |
| SAM | -a | Sequence Alignment Map |
| BAM | -a \| samtools | Binary SAM |

## Related Skills

- medaka-polishing - Polish consensus with medaka
- structural-variants - Call SVs from alignments
- alignment-files/sam-bam-basics - BAM manipulation
