bio-methylation-bismark-alignment

Bisulfite sequencing read alignment using Bismark with bowtie2/hisat2. Handles genome preparation and produces BAM files with methylation information. Use when aligning WGBS, RRBS, or other bisulfite-converted sequencing reads to a reference genome.

1,802 stars

Best use case

bio-methylation-bismark-alignment is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Bisulfite sequencing read alignment using Bismark with bowtie2/hisat2. Handles genome preparation and produces BAM files with methylation information. Use when aligning WGBS, RRBS, or other bisulfite-converted sequencing reads to a reference genome.

Teams using bio-methylation-bismark-alignment should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/bio-methylation-bismark-alignment/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-methylation-bismark-alignment/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/bio-methylation-bismark-alignment/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How bio-methylation-bismark-alignment Compares

Feature / Agentbio-methylation-bismark-alignmentStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Bisulfite sequencing read alignment using Bismark with bowtie2/hisat2. Handles genome preparation and produces BAM files with methylation information. Use when aligning WGBS, RRBS, or other bisulfite-converted sequencing reads to a reference genome.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

## Version Compatibility

Reference examples tested with: Bowtie2 2.5.3+, HISAT2 2.2.1+, Trim Galore 0.6.10+, samtools 1.19+

Before using code patterns, verify installed versions match. If versions differ:
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Bismark Alignment

**"Align my bisulfite sequencing reads"** → Map WGBS/RRBS reads to an in-silico bisulfite-converted reference genome, producing BAM files with methylation context tags.
- CLI: `bismark_genome_preparation genome/` then `bismark --genome genome/ reads.fq.gz`

## Prepare Genome Index

```bash
# One-time genome preparation (creates bisulfite-converted index)
bismark_genome_preparation --bowtie2 /path/to/genome_folder/

# Genome folder should contain FASTA files (e.g., hg38.fa, chr1.fa, etc.)
# Creates Bisulfite_Genome/ subdirectory with CT and GA converted indices
```

## Basic Single-End Alignment

```bash
bismark --genome /path/to/genome_folder/ reads.fastq.gz -o output_dir/
```

## Paired-End Alignment

```bash
bismark --genome /path/to/genome_folder/ \
    -1 reads_R1.fastq.gz \
    -2 reads_R2.fastq.gz \
    -o output_dir/
```

## Common Options

```bash
bismark --genome /path/to/genome_folder/ \
    --bowtie2 \                    # Use bowtie2 (default)
    --parallel 4 \                 # Number of parallel instances
    --temp_dir /tmp/ \             # Temporary directory
    --non_directional \            # For non-directional libraries
    --nucleotide_coverage \        # Generate nucleotide coverage report
    -o output_dir/ \
    reads.fastq.gz
```

## RRBS Mode

```bash
# Reduced Representation Bisulfite Sequencing
bismark --genome /path/to/genome_folder/ \
    --pbat \                       # For PBAT libraries (post-bisulfite adapter tagging)
    reads.fastq.gz

# MspI digestion (RRBS standard)
# Bismark handles MspI-digested libraries automatically
```

## PBAT Libraries

```bash
# Post-Bisulfite Adapter Tagging (e.g., scBS-seq)
bismark --genome /path/to/genome_folder/ --pbat reads.fastq.gz
```

## Non-Directional Libraries

```bash
# For libraries where all 4 strands are present
bismark --genome /path/to/genome_folder/ --non_directional reads.fastq.gz
```

## With Quality/Adapter Trimming (Pre-alignment)

```bash
# Trim adapters first with Trim Galore (recommended)
trim_galore --illumina --paired reads_R1.fastq.gz reads_R2.fastq.gz

# Then align
bismark --genome /path/to/genome_folder/ \
    -1 reads_R1_val_1.fq.gz \
    -2 reads_R2_val_2.fq.gz
```

## Multicore Processing

```bash
# --parallel sets instances per alignment direction
# Total threads = parallel * 2 (for directional) or parallel * 4 (non-directional)
bismark --genome /path/to/genome_folder/ \
    --parallel 4 \
    reads.fastq.gz
```

## Output Files

```bash
# Bismark produces:
# - reads_bismark_bt2.bam          # Aligned reads
# - reads_bismark_bt2_SE_report.txt # Alignment report

# View alignment report
cat output_dir/reads_bismark_bt2_SE_report.txt
```

## Sort and Index BAM

```bash
# Bismark output is unsorted
samtools sort output.bam -o output.sorted.bam
samtools index output.sorted.bam
```

## Deduplicate (Optional)

```bash
# Remove PCR duplicates (recommended for WGBS, not RRBS)
deduplicate_bismark --bam output_bismark_bt2.bam

# For paired-end
deduplicate_bismark --paired --bam output_bismark_bt2_pe.bam
```

## Check Alignment Statistics

```bash
# Bismark generates detailed report
cat *_SE_report.txt

# Key metrics:
# - Sequences analyzed
# - Unique alignments
# - Mapping efficiency
# - C methylated in CpG context
```

## Genome Preparation with HISAT2 (Recommended for Large Genomes)

```bash
# HISAT2 is faster and uses less memory for large mammalian genomes
bismark_genome_preparation --hisat2 /path/to/genome_folder/

# Align with HISAT2
bismark --genome /path/to/genome_folder/ --hisat2 reads.fastq.gz

# HISAT2 paired-end
bismark --genome /path/to/genome_folder/ --hisat2 \
    -1 reads_R1.fastq.gz \
    -2 reads_R2.fastq.gz
```

## Key Parameters

| Parameter | Description |
|-----------|-------------|
| --genome | Path to genome folder |
| --bowtie2 | Use Bowtie2 aligner (default) |
| --hisat2 | Use HISAT2 aligner |
| --parallel | Parallel alignment instances |
| --non_directional | Non-directional library |
| --pbat | PBAT library protocol |
| -o | Output directory |
| --temp_dir | Temporary file directory |
| --nucleotide_coverage | Generate nuc coverage report |
| -N | Mismatches in seed (0 or 1, default 0) |
| -L | Seed length (default 20) |

## Library Types

| Type | Parameter | Description |
|------|-----------|-------------|
| Directional | (default) | Standard WGBS/RRBS |
| Non-directional | --non_directional | All 4 strands |
| PBAT | --pbat | Post-bisulfite adapter tagging |

## Related Skills

- methylation-calling - Extract methylation from Bismark BAM
- methylkit-analysis - Import Bismark output to R
- sequence-io/read-sequences - FASTQ handling
- alignment-files/sam-bam-basics - BAM manipulation

Related Skills

bio-methylation-methylkit

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

DNA methylation analysis with methylKit in R. Import Bismark coverage files, filter by coverage, normalize samples, and perform statistical comparisons. Use when analyzing single-base methylation patterns, comparing samples, or preparing data for DMR detection.

bio-methylation-dmr-detection

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Differentially methylated region (DMR) detection using methylKit tiles, bsseq BSmooth, and DMRcate. Use when identifying contiguous genomic regions with methylation differences between experimental conditions or cell types.

bio-methylation-calling

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Extract methylation calls from Bismark BAM files using bismark_methylation_extractor. Generates per-cytosine reports for CpG, CHG, and CHH contexts. Use when extracting methylation levels from aligned bisulfite sequencing data for downstream analysis.

bio-methylation-based-detection

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Analyzes cfDNA methylation patterns for cancer detection using cfMeDIP-seq or bisulfite sequencing with MethylDackel. Identifies cancer-specific methylation signatures and performs tissue-of-origin deconvolution. Use when using methylation biomarkers for early cancer detection or minimal residual disease.

bio-longread-alignment

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Align long reads using minimap2 for Oxford Nanopore and PacBio data. Supports various presets for different read types and applications. Use when aligning ONT or PacBio reads to a reference genome for variant calling, SV detection, or coverage analysis.

bio-long-read-sequencing-nanopore-methylation

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Calls DNA methylation from Oxford Nanopore sequencing data using signal-level analysis. Use when detecting 5mC or 6mA modifications directly from nanopore reads without bisulfite conversion.

bio-alignment-pairwise

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Perform pairwise sequence alignment using Biopython Bio.Align.PairwiseAligner. Use when comparing two sequences, finding optimal alignments, scoring similarity, and identifying local or global matches between DNA, RNA, or protein sequences.

bio-alignment-msa-statistics

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Calculate alignment statistics including sequence identity, conservation scores, substitution matrices, and similarity metrics. Use when comparing alignment quality, measuring sequence divergence, and analyzing evolutionary patterns.

bio-alignment-msa-parsing

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Parse and analyze multiple sequence alignments using Biopython. Extract sequences, identify conserved regions, analyze gaps, work with annotations, and manipulate alignment data for downstream analysis. Use when parsing or manipulating multiple sequence alignments.

bio-alignment-io

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Read, write, and convert multiple sequence alignment files using Biopython Bio.AlignIO. Supports Clustal, PHYLIP, Stockholm, FASTA, Nexus, and other alignment formats for phylogenetics and conservation analysis. Use when reading, writing, or converting alignment file formats.

zinc-database

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.

zarr-python

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.