bio-workflows-atacseq-pipeline

End-to-end ATAC-seq workflow from FASTQ files to differential accessibility and TF footprinting. Covers alignment, peak calling with MACS3, QC metrics, and optional TOBIAS footprinting. Use when running end-to-end ATAC-seq analysis from FASTQ to differential accessibility.


by diegosouzapw

View on GitHub Installation ↓

Best use case

bio-workflows-atacseq-pipeline is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-workflows-atacseq-pipeline should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/bio-workflows-atacseq-pipeline/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/cli-automation/bio-workflows-atacseq-pipeline/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/bio-workflows-atacseq-pipeline/SKILL.md inside your project
Restart your AI agent; it will auto-discover the skill

How bio-workflows-atacseq-pipeline Compares

Feature / Agent	bio-workflows-atacseq-pipeline	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

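What does this skill do?

It provides an end-to-end ATAC-seq workflow from FASTQ files to differential accessibility and TF footprinting, covering alignment, peak calling with MACS3, QC metrics, and optional TOBIAS footprinting.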

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# ATAC-seq Pipeline

Complete workflow from raw ATAC-seq FASTQ files to accessibility peaks, differential analysis, and TF footprinting.

## Workflow Overview

```
FASTQ files
    |
    v
[1. QC & Trimming] -----> fastp (Nextera adapters)
    |
    v
[2. Alignment] ---------> Bowtie2
    |
    v
[3. BAM Processing] ----> filter, dedup, shift
    |
    v
[4. Peak Calling] ------> MACS3
    |
    v
[5. QC] ----------------> TSS enrichment, FRiP, fragment size
    |
    v
[6. Differential] ------> DiffBind (optional)
    |
    v
[7. Footprinting] ------> TOBIAS (optional)
    |
    v
Accessibility peaks + TF activity
```

## Primary Path: Bowtie2 + MACS3

### Step 1: Quality Control with fastp

```bash
# ATAC-seq libraries are built with Tn5/Nextera adapters; R1 and R2 share the same sequence
NEXTERA_R1="CTGTCTCTTATACACATCT"
NEXTERA_R2="CTGTCTCTTATACACATCT"

mkdir -p trimmed qc
for sample in sample1 sample2 sample3; do
    fastp -i ${sample}_R1.fastq.gz -I ${sample}_R2.fastq.gz \
        -o trimmed/${sample}_R1.fq.gz -O trimmed/${sample}_R2.fq.gz \
        --adapter_sequence ${NEXTERA_R1} \
        --adapter_sequence_r2 ${NEXTERA_R2} \
        --qualified_quality_phred 20 \
        --length_required 25 \
        --html qc/${sample}_fastp.html
done
```
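Before trimming, it can be useful to check how often the Nextera adapter appears in the raw reads. A minimal sketch, assuming uncompressed four-line FASTQ records on stdin (pipe `zcat` output in for `.fastq.gz` files). `count_adapter_reads` is a hypothetical helper, it counts only exact matches of the full adapter, and fastp's HTML report remains the more complete view.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count FASTQ reads whose sequence contains the Nextera adapter.
# Assumes uncompressed 4-line FASTQ on stdin, e.g. zcat sample1_R1.fastq.gz | count_adapter_reads
# Prints "<reads with adapter><TAB><total reads>".
set -euo pipefail

NEXTERA="CTGTCTCTTATACACATCT"

count_adapter_reads() {
    # Line 2 of every 4-line record is the sequence; index() does an exact substring search
    awk -v adapter="$NEXTERA" '
        NR % 4 == 2 { total++; if (index($0, adapter) > 0) hits++ }
        END { printf "%d\t%d\n", hits + 0, total + 0 }'
}

# Toy input: two reads, the first carries the adapter after a 10 bp insert
result=$(count_adapter_reads <<'EOF'
@read1
ACGTACGTACCTGTCTCTTATACACATCTGG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read2
ACGTACGTACGTACGTACGTACGTACGTACG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
EOF
)
echo "reads_with_adapter/total: ${result}"
```

A high fraction of adapter-containing reads is expected for ATAC-seq, since many fragments are shorter than the read length.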

### Step 2: Alignment with Bowtie2

```bash
# Build index (once)
bowtie2-build genome.fa bt2_index/genome

# Align with ATAC-seq specific settings:
#   -X 2000       allow fragments up to 2 kb (nucleosome ladder)
#   -q 30 -f 2    keep reads with MAPQ >= 30 that are in proper pairs
mkdir -p aligned
for sample in sample1 sample2 sample3; do
    bowtie2 -p 8 -x bt2_index/genome \
        -1 trimmed/${sample}_R1.fq.gz \
        -2 trimmed/${sample}_R2.fq.gz \
        --very-sensitive \
        --no-mixed --no-discordant \
        -X 2000 \
        2> aligned/${sample}.log | \
    samtools view -@ 4 -b -q 30 -f 2 - | \
    samtools sort -@ 4 -o aligned/${sample}.bam
    samtools index aligned/${sample}.bam
done
```
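Bowtie2 writes an alignment summary to stderr, which the loop above saves as `aligned/${sample}.log`. A quick way to compare samples is to pull out the final "overall alignment rate" line. The sketch below assumes that standard line format; `alignment_rate` and the 80% cutoff are illustrative choices, not Bowtie2 features, and the demo log is an abbreviated toy example.

```bash
#!/usr/bin/env bash
# Sketch: report the Bowtie2 "overall alignment rate" per sample and flag low values.
# Assumes the standard Bowtie2 summary, whose last line reads "NN.NN% overall alignment rate".
set -euo pipefail

MIN_RATE=80   # illustrative cutoff (assumption), not a Bowtie2 or ENCODE standard

alignment_rate() {
    # Print the percentage from one log, without the % sign
    awk '/overall alignment rate/ { sub(/%.*/, "", $1); print $1 }' "$1"
}

logdir=$(mktemp -d)
trap 'rm -rf "$logdir"' EXIT

# Abbreviated toy log for demonstration; real logs are aligned/<sample>.log
cat > "${logdir}/sample1.log" <<'EOF'
10000 reads; of these:
  10000 (100.00%) were paired; of these:
93.50% overall alignment rate
EOF

for log in "${logdir}"/*.log; do
    rate=$(alignment_rate "$log")
    status=$(awk -v r="$rate" -v m="$MIN_RATE" 'BEGIN { if (r + 0 >= m + 0) print "PASS"; else print "LOW" }')
    printf '%s\t%s\t%s\n' "$(basename "$log" .log)" "$rate" "$status"
done
```

To run it on real data, drop the toy log and point the loop at `aligned/*.log`.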

### Step 3: BAM Processing

ATAC-seq requires special processing: removing mitochondrial reads, shifting reads for Tn5 insertion, and removing duplicates.

```bash
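for sample in sample1 sample2 sample3; do
    # Remove mitochondrial reads (use "MT" instead of "chrM" for Ensembl references).
    # Filtering on the RNAME column (field 3) keeps every header line intact.
    samtools view -h aligned/${sample}.bam | \
        awk -F'\t' '$1 ~ /^@/ || $3 != "chrM"' | \
        samtools view -b -o aligned/${sample}.noMT.bam -

    # Mark and remove duplicates: fixmate needs name-grouped input,
    # markdup needs coordinate-sorted input
    samtools sort -n aligned/${sample}.noMT.bam | \
    samtools fixmate -m - - | \
    samtools sort - | \
    samtools markdup -r - aligned/${sample}.dedup.bam

    samtools index aligned/${sample}.dedup.bam

    # Shift reads for Tn5 (+ strand +4bp, - strand -5bp); sort the output
    # before indexing, since alignmentSieve does not guarantee sorted order
    alignmentSieve -b aligned/${sample}.dedup.bam \
        -o aligned/${sample}.shifted.unsorted.bam \
        --ATACshift \
        -p 8
    samtools sort -o aligned/${sample}.shifted.bam aligned/${sample}.shifted.unsorted.bam
    samtools index aligned/${sample}.shifted.bam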
```

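Alternative manual Tn5 shift with bedtools (this produces a BED file of shifted reads rather than a BAM):
```bash
for sample in sample1 sample2 sample3; do
    # Convert to BED and shift read ends: + strand start +4, - strand end -5
    bedtools bamtobed -i aligned/${sample}.dedup.bam | \
        awk 'BEGIN{OFS="\t"} {if($6=="+"){$2=$2+4} else if($6=="-"){$3=$3-5} print}' | \
        sort -k1,1 -k2,2n > aligned/${sample}.shifted.bed
done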
```
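How much of each library the chrM filter removes is worth tracking. A minimal sketch, assuming `samtools idxstats` output (reference name, length, mapped reads, unmapped reads) from an indexed BAM taken before filtering; `mito_fraction` is a hypothetical helper name and the demo numbers are toy values.

```bash
#!/usr/bin/env bash
# Sketch: fraction of mapped reads on the mitochondrial contig, from `samtools idxstats`.
# idxstats columns: reference name, length, mapped reads, unmapped reads.
# Real usage (indexed BAM, before chrM filtering):
#   samtools idxstats aligned/sample1.bam | mito_fraction chrM
# Use "MT" instead of "chrM" for Ensembl-style references.
set -euo pipefail

mito_fraction() {
    local mt_name="${1:-chrM}"
    awk -v mt="$mt_name" '
        { mapped += $3; if ($1 == mt) mito += $3 }
        END { if (mapped > 0) printf "%.4f\n", mito / mapped; else print "NA" }'
}

# Toy idxstats output for demonstration only
frac=$(mito_fraction chrM <<'EOF'
chr1 248956422 600000 0
chr2 242193529 300000 0
chrM 16569 100000 0
* 0 0 5000
EOF
)
echo "chrM fraction of mapped reads: ${frac}"
```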

### Step 4: Peak Calling with MACS3

```bash
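# Per-sample peaks (Step 6 expects one narrowPeak file per sample).
# With -f BAMPE, MACS3 piles up the whole fragment spanned by each read pair,
# so --shift/--extsize are not applied. Duplicates were already removed in Step 3.
# Cut-site alternative (reads as single-end tags): -f BAM --nomodel --shift -75 --extsize 150
# -g hs: human effective genome size (use mm for mouse)
mkdir -p peaks
for sample in sample1 sample2 sample3; do
    macs3 callpeak \
        -t aligned/${sample}.shifted.bam \
        -f BAMPE \
        -g hs \
        -n ${sample} \
        --outdir peaks \
        --keep-dup all \
        -q 0.01
done

# Consensus peaks: call on all samples together (-t accepts several files)
macs3 callpeak \
    -t aligned/*.shifted.bam \
    -f BAMPE \
    -g hs \
    -n consensus \
    --outdir peaks \
    --keep-dup all \
    -q 0.01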
```
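The `--nomodel --shift -75 --extsize 150` recipe treats each read's 5' end as a Tn5 cut site. The awk sketch below mimics that arithmetic (it is not MACS3 code): a + strand read is moved 75 bp upstream and extended 150 bp downstream, and a - strand read gets the mirror-image treatment from its BED end coordinate, so both become a 150 bp window centred on the cut site. It only applies to single-end style input (`-f BAM` or `-f BED`); the BED6 input and the `cutsite_windows` name are assumptions.

```bash
#!/usr/bin/env bash
# Illustration of the --shift/--extsize arithmetic (not MACS3 code).
# Input: BED6 reads (chrom, start, end, name, score, strand).
# The cut site is the read's 5' end: BED start on +, BED end on -.
set -euo pipefail

SHIFT=-75     # value passed to --shift
EXTSIZE=150   # value passed to --extsize

cutsite_windows() {
    awk -v s="$SHIFT" -v e="$EXTSIZE" 'BEGIN { OFS = "\t" }
        {
            if ($6 == "+") { cut = $2; start = cut + s; end = start + e }   # move upstream, extend downstream
            else           { cut = $3; end = cut - s; start = end - e }     # mirror image on the - strand
            if (start < 0) start = 0
            print $1, start, end, $4
        }'
}

windows=$(cutsite_windows <<'EOF'
chr1 1000 1050 readA 60 +
chr1 2000 2050 readB 60 -
EOF
)
echo "$windows"
```

Both toy reads end up as windows reaching 75 bp either side of their cut sites (1000 and 2050).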

### Step 5: ATAC-seq QC

```bash
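mkdir -p bigwig qc

# Coverage track for the TSS profile (CPM-normalised, 10 bp bins)
bamCoverage -b aligned/sample1.shifted.bam -o bigwig/sample1.bw \
    --normalizeUsing CPM --binSize 10 -p 8

# TSS enrichment (using deepTools); genes.bed needs a strand column so TSS orientation is known
computeMatrix reference-point \
    -S bigwig/sample1.bw \
    -R genes.bed \
    --referencePoint TSS \
    -a 2000 -b 2000 \
    -o tss_matrix.gz

plotProfile -m tss_matrix.gz -o qc/tss_enrichment.pdf

# Fragment size distribution: -f 66 keeps first mates of proper pairs, so each fragment counts once
samtools view -f 66 aligned/sample1.dedup.bam | \
    awk '{ tlen = $9; if (tlen < 0) tlen = -tlen; print tlen }' | \
    sort -n | uniq -c | \
    awk '{print $2"\t"$1}' > qc/fragment_sizes.txt

# FRiP calculation (bedtools writes BAM when -a is BAM, so count it from stdin)
total=$(samtools view -c aligned/sample1.shifted.bam)
in_peaks=$(bedtools intersect -a aligned/sample1.shifted.bam \
    -b peaks/sample1_peaks.narrowPeak -u | samtools view -c -)
echo "FRiP: $(echo "scale=4; $in_peaks/$total" | bc)"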
```
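The fragment size histogram can be condensed into nucleosome classes. This sketch reads the `size<TAB>count` format written to `qc/fragment_sizes.txt` above; the <100 bp nucleosome-free cutoff follows the checkpoint below, while the mono- and di-nucleosome ranges are illustrative assumptions. `bin_fragments` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: collapse a fragment size histogram into nucleosome classes.
# Input: "size<TAB>count" lines, as in qc/fragment_sizes.txt.
# The mono- and di-nucleosome ranges are illustrative assumptions; adjust for your data.
set -euo pipefail

bin_fragments() {
    awk '
        { n += $2 }
        $1 < 100               { nfr  += $2 }
        $1 >= 180 && $1 <= 247 { mono += $2 }
        $1 >= 315 && $1 <= 473 { di   += $2 }
        END {
            printf "NFR\t%d\nmono\t%d\ndi\t%d\nother\t%d\n",
                nfr, mono, di, n - nfr - mono - di
        }'
}

# Toy histogram for demonstration only; real usage: bin_fragments < qc/fragment_sizes.txt
summary=$(printf '50\t10\n95\t5\n150\t4\n200\t8\n350\t3\n600\t1\n' | bin_fragments)
echo "$summary"
```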

**QC Checkpoint:** Assess ATAC quality
- TSS enrichment score >5 (ideally >10)
- FRiP >20%
- Nucleosome-free (<100bp) and mono/di-nucleosome peaks visible

### Step 6: Differential Accessibility with DiffBind

```r
library(DiffBind)

# Create sample sheet (PeakCaller = 'narrow' tells DiffBind the peaks are narrowPeak files)
samples <- data.frame(
    SampleID = c('control_1', 'control_2', 'treated_1', 'treated_2'),
    Condition = c('control', 'control', 'treated', 'treated'),
    Replicate = c(1, 2, 1, 2),
    bamReads = c('aligned/control_1.shifted.bam', 'aligned/control_2.shifted.bam',
                 'aligned/treated_1.shifted.bam', 'aligned/treated_2.shifted.bam'),
    Peaks = c('peaks/control_1_peaks.narrowPeak', 'peaks/control_2_peaks.narrowPeak',
              'peaks/treated_1_peaks.narrowPeak', 'peaks/treated_2_peaks.narrowPeak'),
    PeakCaller = 'narrow'
)

# Create DBA object
atac <- dba(sampleSheet = samples)

# Count reads in consensus peaks
atac <- dba.count(atac)

# Normalize
atac <- dba.normalize(atac)

# Contrast: minMembers defaults to 3, so lower it for 2 replicates per group
atac <- dba.contrast(atac, categories = DBA_CONDITION, minMembers = 2)

# Differential analysis
atac <- dba.analyze(atac)

# Report
report <- dba.report(atac)
write.csv(as.data.frame(report), 'differential_peaks.csv', row.names = FALSE)

# Visualization
dba.plotMA(atac)
dba.plotVolcano(atac)
```
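DiffBind can also read the sample sheet from a CSV file (`dba(sampleSheet = "samples.csv")`), which is easier to maintain than a hand-typed data frame as the number of samples grows. This sketch writes such a file from sample names, assuming the `<condition>_<replicate>` naming and the `aligned/` and `peaks/` layout used in this workflow; `write_samplesheet` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: write a DiffBind sample sheet (CSV) from sample names of the form <condition>_<replicate>.
# The naming convention and file paths are assumptions matching this workflow's layout.
# Read it in R with: dba(sampleSheet = "samples.csv")
set -euo pipefail

write_samplesheet() {
    local id condition replicate
    printf 'SampleID,Condition,Replicate,bamReads,Peaks,PeakCaller\n'
    for id in "$@"; do
        condition="${id%_*}"    # text before the last underscore
        replicate="${id##*_}"   # text after the last underscore
        printf '%s,%s,%s,aligned/%s.shifted.bam,peaks/%s_peaks.narrowPeak,narrow\n' \
            "$id" "$condition" "$replicate" "$id" "$id"
    done
}

sheet=$(mktemp)
write_samplesheet control_1 control_2 treated_1 treated_2 > "$sheet"
cat "$sheet"
```

In practice, redirect to `samples.csv` in the project directory instead of a temporary file.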

### Step 7: TF Footprinting with TOBIAS

```bash
mkdir -p footprinting

# Correct Tn5 bias. ATACorrect shifts reads itself (--read_shift, default 4 -5),
# so pass the deduplicated but unshifted BAM to avoid shifting twice
TOBIAS ATACorrect \
    -b aligned/sample1.dedup.bam \
    -g genome.fa \
    -p peaks/consensus_peaks.narrowPeak \
    --prefix sample1 \
    --outdir footprinting \
    --cores 8

# Score footprints
TOBIAS ScoreBigwig \
    --signal footprinting/sample1_corrected.bw \
    --regions peaks/consensus_peaks.narrowPeak \
    --output footprinting/sample1_footprints.bw \
    --cores 8

# Bind detection
TOBIAS BINDetect \
    --motifs motifs.jaspar \
    --signals footprinting/sample1_footprints.bw \
    --genome genome.fa \
    --peaks peaks/consensus_peaks.narrowPeak \
    --outdir footprinting/bindetect \
    --cores 8

# Differential footprinting (two conditions): one footprint track per condition,
# e.g. scored from replicate BAMs merged with samtools merge
TOBIAS BINDetect \
    --motifs motifs.jaspar \
    --signals footprinting/control_footprints.bw footprinting/treated_footprints.bw \
    --cond_names control treated \
    --genome genome.fa \
    --peaks peaks/consensus_peaks.narrowPeak \
    --outdir footprinting/differential \
    --cores 8
```

## Parameter Recommendations

| Step | Parameter | Value |
|------|-----------|-------|
| fastp | adapter | Nextera (CTGTCTCTTATACACATCT) |
| Bowtie2 | -X | 2000 (max insert size) |
| samtools | -q | 30 (MAPQ filter) |
| MACS3 | --shift | -75 (cut-site mode with -f BAM; not used with -f BAMPE) |
| MACS3 | --extsize | 150 (cut-site mode with -f BAM) |
| MACS3 | -q | 0.01-0.05 |

## Troubleshooting

| Issue | Likely Cause | Solution |
|-------|--------------|----------|
| High mitochondrial read fraction | Common in ATAC-seq (mtDNA is highly accessible to Tn5) | Filter chrM reads |
| Low TSS enrichment | Poor library, overdigestion | Check Tn5 concentration |
| Many small peaks | Tn5 insertion noise | Use a stricter -q cutoff (e.g. 0.001) |
| No nucleosome periodicity | Overdigestion | Adjust Tn5:DNA ratio |

## Complete Pipeline Script

```bash
#!/bin/bash
set -euo pipefail

THREADS=8
INDEX="bt2_index/genome"
GENOME="genome.fa"
SAMPLES="sample1 sample2 sample3"
OUTDIR="atac_results"
NEXTERA="CTGTCTCTTATACACATCT"

mkdir -p ${OUTDIR}/{trimmed,aligned,peaks,qc,bigwig}

# Step 1: QC and adapter trimming
for sample in $SAMPLES; do
    fastp -i ${sample}_R1.fastq.gz -I ${sample}_R2.fastq.gz \
        -o ${OUTDIR}/trimmed/${sample}_R1.fq.gz \
        -O ${OUTDIR}/trimmed/${sample}_R2.fq.gz \
        --adapter_sequence ${NEXTERA} --adapter_sequence_r2 ${NEXTERA} \
        --html ${OUTDIR}/qc/${sample}_fastp.html -w ${THREADS}
done

# Step 2-3: Align, filter, deduplicate and shift
for sample in $SAMPLES; do
    prefix=${OUTDIR}/aligned/${sample}

    # Bowtie2 writes both mates of a pair together, so fixmate can read the stream.
    # Filter on SAM text (MAPQ >= 30, proper pairs, no chrM) before converting to BAM.
    bowtie2 -p ${THREADS} -x ${INDEX} \
        -1 ${OUTDIR}/trimmed/${sample}_R1.fq.gz \
        -2 ${OUTDIR}/trimmed/${sample}_R2.fq.gz \
        --very-sensitive --no-mixed --no-discordant -X 2000 \
        2> ${OUTDIR}/qc/${sample}_bowtie2.log | \
    samtools view -h -q 30 -f 2 - | \
    awk -F'\t' '$1 ~ /^@/ || $3 != "chrM"' | \
    samtools fixmate -m - - | \
    samtools sort -@ ${THREADS} - | \
    samtools markdup -r - ${prefix}.dedup.bam
    samtools index ${prefix}.dedup.bam

    # alignmentSieve needs an indexed BAM file, not a stream; sort its output before indexing
    alignmentSieve --ATACshift -p ${THREADS} \
        -b ${prefix}.dedup.bam -o ${prefix}.unsorted.bam
    samtools sort -@ ${THREADS} -o ${prefix}.shifted.bam ${prefix}.unsorted.bam
    samtools index ${prefix}.shifted.bam
    rm ${prefix}.unsorted.bam
done

# Step 4: Consensus peak calling on paired-end fragments
macs3 callpeak -t ${OUTDIR}/aligned/*.shifted.bam -f BAMPE -g hs \
    -n consensus --outdir ${OUTDIR}/peaks --keep-dup all -q 0.01

echo "Pipeline complete. Peaks: ${OUTDIR}/peaks/consensus_peaks.narrowPeak"
```

## Related Skills

- atac-seq/atac-peak-calling - MACS3 ATAC parameters
- atac-seq/atac-qc - TSS enrichment, FRiP details
- atac-seq/differential-accessibility - DiffBind for ATAC
- atac-seq/footprinting - TOBIAS and HINT details
- chip-seq/peak-annotation - Annotate ATAC peaks to genes

Related Skills

etl-pipeline

from diegosouzapw/awesome-omni-skill

Build automated ETL (Extract-Transform-Load) pipelines for construction data. Process PDFs, Excel, BIM exports. Generate reports, dashboards, and integrate with other systems. Orchestrate with Airflow or n8n.

data-pipeline

from diegosouzapw/awesome-omni-skill

Data pipeline and ETL automation - extract, transform, load workflows for data integration and analytics

data-pipeline-manager

from diegosouzapw/awesome-omni-skill

Design and troubleshoot robust data pipelines with comprehensive quality validation, error handling, and monitoring capabilities for bioinformatics and data processing workflows

data-engineering-data-pipeline

from diegosouzapw/awesome-omni-skill

You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.

book-sft-pipeline

from diegosouzapw/awesome-omni-skill

This skill should be used when the user asks to "fine-tune on books", "create SFT dataset", "train style model", "extract ePub text", or mentions style transfer, LoRA training, book segmentation, or author voice replication.

atft-pipeline

from diegosouzapw/awesome-omni-skill

Manage J-Quants ingestion, feature graph generation, and cache hygiene for the ATFT-GAT-FAN dataset pipeline.

ATACseq-QC

from diegosouzapw/awesome-omni-skill

Performs ATAC-specific biological validation. It calculates metrics unique to chromatin accessibility assays, such as TSS enrichment scores and fragment size distributions (nucleosome banding patterns). Use this skill when you have filtered BAM file and have called peak for the file. Do NOT use this skill for ChIP-seq data or general alignment statistics.

architecture-paradigm-pipeline

from diegosouzapw/awesome-omni-skill

Consult this skill when designing data pipelines or transformation workflows. Use when data flows through fixed sequence of transformations, stages can be independently developed and tested, parallel processing of stages is beneficial. Do not use when selecting from multiple paradigms - use architecture-paradigms first. DO NOT use when: data flow is not sequential or predictable. DO NOT use when: complex branching/merging logic dominates.

airflow-workflows

from diegosouzapw/awesome-omni-skill

Apache Airflow DAG design, operators, and scheduling best practices.

adaptive-workflows

from diegosouzapw/awesome-omni-skill

Self-learning workflow system that tracks what works best for your use cases. Records experiment results, suggests optimizations, creates custom templates, and builds a personal knowledge base. Use to learn from experience and optimize your LLM workflows over time.

ai-content-pipeline

from diegosouzapw/awesome-omni-skill

Build multi-step AI content creation pipelines combining image, video, audio, and text. Workflow examples: generate image -> animate -> add voiceover -> merge with music. Tools: FLUX, Veo, Kokoro TTS, OmniHuman, media merger, upscaling. Use for: YouTube videos, social media content, marketing materials, automated content. Triggers: content pipeline, ai workflow, content creation, multi-step ai, content automation, ai video workflow, generate and edit, ai content factory, automated content creation, ai production pipeline, media pipeline, content at scale

workflows-expert

from diegosouzapw/awesome-omni-skill

Activate when requests involve workflow execution, CI/CD pipelines, git automation, or multi-step task orchestration. This skill provides workflows-mcp MCP server integration with tag-based workflow discovery, DAG-based execution, and variable syntax expertise. Trigger on phrases like "run workflow", "execute workflow", "orchestrate tasks", "automate CI/CD", or "workflow information".