bio-workflows-atacseq-pipeline
End-to-end ATAC-seq workflow from FASTQ files to differential accessibility and TF footprinting. Covers alignment, peak calling with MACS3, QC metrics, and optional TOBIAS footprinting. Use when running end-to-end ATAC-seq analysis from FASTQ to differential accessibility.
Best use case
bio-workflows-atacseq-pipeline is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
End-to-end ATAC-seq workflow from FASTQ files to differential accessibility and TF footprinting. Covers alignment, peak calling with MACS3, QC metrics, and optional TOBIAS footprinting. Use when running end-to-end ATAC-seq analysis from FASTQ to differential accessibility.
Teams using bio-workflows-atacseq-pipeline should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/bio-workflows-atacseq-pipeline/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How bio-workflows-atacseq-pipeline Compares
| Feature / Agent | bio-workflows-atacseq-pipeline | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
End-to-end ATAC-seq workflow from FASTQ files to differential accessibility and TF footprinting. Covers alignment, peak calling with MACS3, QC metrics, and optional TOBIAS footprinting. Use when running end-to-end ATAC-seq analysis from FASTQ to differential accessibility.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# ATAC-seq Pipeline
Complete workflow from raw ATAC-seq FASTQ files to accessibility peaks, differential analysis, and TF footprinting.
## Workflow Overview
```
FASTQ files
|
v
[1. QC & Trimming] -----> fastp (Nextera adapters)
|
v
[2. Alignment] ---------> Bowtie2
|
v
[3. BAM Processing] ----> filter, shift, dedup
|
v
[4. Peak Calling] ------> MACS3
|
v
[5. QC] ----------------> TSS enrichment, FRiP, fragment size
|
v
[6. Differential] ------> DiffBind (optional)
|
v
[7. Footprinting] ------> TOBIAS (optional)
|
v
Accessibility peaks + TF activity
```
## Primary Path: Bowtie2 + MACS3
### Step 1: Quality Control with fastp
```bash
# ATAC-seq uses Nextera adapters
NEXTERA_R1="CTGTCTCTTATACACATCT"
NEXTERA_R2="CTGTCTCTTATACACATCT"
for sample in sample1 sample2 sample3; do
fastp -i ${sample}_R1.fastq.gz -I ${sample}_R2.fastq.gz \
-o trimmed/${sample}_R1.fq.gz -O trimmed/${sample}_R2.fq.gz \
--adapter_sequence ${NEXTERA_R1} \
--adapter_sequence_r2 ${NEXTERA_R2} \
--qualified_quality_phred 20 \
--length_required 25 \
--html qc/${sample}_fastp.html
done
```
### Step 2: Alignment with Bowtie2
```bash
# Build index (once)
bowtie2-build genome.fa bt2_index/genome
# Align with ATAC-seq specific settings
for sample in sample1 sample2 sample3; do
bowtie2 -p 8 -x bt2_index/genome \
-1 trimmed/${sample}_R1.fq.gz \
-2 trimmed/${sample}_R2.fq.gz \
--very-sensitive \
--no-mixed --no-discordant \
-X 2000 \
2> aligned/${sample}.log | \
samtools view -@ 4 -bS -q 30 -f 2 - | \
samtools sort -@ 4 -o aligned/${sample}.bam
done
```
### Step 3: BAM Processing
ATAC-seq requires special processing: removing mitochondrial reads, shifting reads for Tn5 insertion, and removing duplicates.
```bash
for sample in sample1 sample2 sample3; do
# Remove mitochondrial reads
samtools view -h aligned/${sample}.bam | \
grep -v chrM | \
samtools view -b - > aligned/${sample}.noMT.bam
# Mark and remove duplicates
samtools fixmate -m aligned/${sample}.noMT.bam - | \
samtools sort - | \
samtools markdup -r - aligned/${sample}.dedup.bam
samtools index aligned/${sample}.dedup.bam
# Shift reads for Tn5 (+ strand +4bp, - strand -5bp)
alignmentSieve -b aligned/${sample}.dedup.bam \
-o aligned/${sample}.shifted.bam \
--ATACshift \
-p 8
samtools index aligned/${sample}.shifted.bam
done
```
Alternative manual Tn5 shift with bedtools:
```bash
# Convert to BED and shift
bedtools bamtobed -i aligned/${sample}.dedup.bam | \
awk 'BEGIN{OFS="\t"} {if($6=="+"){$2=$2+4} else if($6=="-"){$3=$3-5} print}' | \
sort -k1,1 -k2,2n > aligned/${sample}.shifted.bed
```
### Step 4: Peak Calling with MACS3
```bash
# Call peaks (use --shift and --extsize for shifted reads)
macs3 callpeak \
-t aligned/sample1.shifted.bam \
-f BAMPE \
-g hs \
-n sample1 \
--outdir peaks \
--nomodel \
--shift -75 \
--extsize 150 \
--keep-dup all \
-q 0.01
# For calling on all samples together
macs3 callpeak \
-t aligned/*.shifted.bam \
-f BAMPE \
-g hs \
-n consensus \
--outdir peaks \
--nomodel \
--shift -75 \
--extsize 150 \
-q 0.01
```
### Step 5: ATAC-seq QC
```bash
# TSS enrichment (using deepTools)
computeMatrix reference-point \
-S bigwig/sample1.bw \
-R genes.bed \
--referencePoint TSS \
-a 2000 -b 2000 \
-o tss_matrix.gz
plotProfile -m tss_matrix.gz -o qc/tss_enrichment.pdf
# Fragment size distribution
samtools view aligned/sample1.dedup.bam | \
awk '{print sqrt($9^2)}' | \
sort | uniq -c | \
awk '{print $2"\t"$1}' > qc/fragment_sizes.txt
# FRiP calculation
total=$(samtools view -c aligned/sample1.shifted.bam)
in_peaks=$(bedtools intersect -a aligned/sample1.shifted.bam \
-b peaks/sample1_peaks.narrowPeak -u | samtools view -c)
echo "FRiP: $(echo "scale=4; $in_peaks/$total" | bc)"
```
**QC Checkpoint:** Assess ATAC quality
- TSS enrichment score >5 (ideally >10)
- FRiP >20%
- Nucleosome-free (<100bp) and mono/di-nucleosome peaks visible
### Step 6: Differential Accessibility with DiffBind
```r
library(DiffBind)
# Create sample sheet
samples <- data.frame(
SampleID = c('control_1', 'control_2', 'treated_1', 'treated_2'),
Condition = c('control', 'control', 'treated', 'treated'),
Replicate = c(1, 2, 1, 2),
bamReads = c('aligned/control_1.shifted.bam', 'aligned/control_2.shifted.bam',
'aligned/treated_1.shifted.bam', 'aligned/treated_2.shifted.bam'),
Peaks = c('peaks/control_1_peaks.narrowPeak', 'peaks/control_2_peaks.narrowPeak',
'peaks/treated_1_peaks.narrowPeak', 'peaks/treated_2_peaks.narrowPeak')
)
# Create DBA object
dba <- dba(sampleSheet = samples)
# Count reads in peaks
dba <- dba.count(dba)
# Normalize
dba <- dba.normalize(dba)
# Contrast
dba <- dba.contrast(dba, categories = DBA_CONDITION)
# Differential analysis
dba <- dba.analyze(dba)
# Report
report <- dba.report(dba)
write.csv(as.data.frame(report), 'differential_peaks.csv')
# Visualization
dba.plotMA(dba)
dba.plotVolcano(dba)
```
### Step 7: TF Footprinting with TOBIAS
```bash
# Correct Tn5 bias
TOBIAS ATACorrect \
-b aligned/sample1.shifted.bam \
-g genome.fa \
-p peaks/consensus_peaks.narrowPeak \
--outdir footprinting \
--cores 8
# Score footprints
TOBIAS ScoreBigwig \
--signal footprinting/sample1_corrected.bw \
--regions peaks/consensus_peaks.narrowPeak \
--output footprinting/sample1_footprints.bw \
--cores 8
# Bind detection
TOBIAS BINDetect \
--motifs motifs.jaspar \
--signals footprinting/sample1_footprints.bw \
--genome genome.fa \
--peaks peaks/consensus_peaks.narrowPeak \
--outdir footprinting/bindetect \
--cores 8
# Differential footprinting (two conditions)
TOBIAS BINDetect \
--motifs motifs.jaspar \
--signals footprinting/control_footprints.bw footprinting/treated_footprints.bw \
--genome genome.fa \
--peaks peaks/consensus_peaks.narrowPeak \
--outdir footprinting/differential \
--cores 8
```
## Parameter Recommendations
| Step | Parameter | Value |
|------|-----------|-------|
| fastp | adapter | Nextera (CTGTCTCTTATACACATCT) |
| Bowtie2 | -X | 2000 (max insert size) |
| samtools | -q | 30 (MAPQ filter) |
| MACS3 | --shift | -75 (for Tn5 shift) |
| MACS3 | --extsize | 150 |
| MACS3 | -q | 0.01-0.05 |
## Troubleshooting
| Issue | Likely Cause | Solution |
|-------|--------------|----------|
| High mitochondrial | Normal for ATAC | Filter chrM reads |
| Low TSS enrichment | Poor library, overdigestion | Check Tn5 concentration |
| Many small peaks | Tn5 insertion noise | Increase -q threshold |
| No nucleosome periodicity | Overdigestion | Adjust Tn5:DNA ratio |
## Complete Pipeline Script
```bash
#!/bin/bash
set -e
THREADS=8
INDEX="bt2_index/genome"
GENOME="genome.fa"
SAMPLES="sample1 sample2 sample3"
OUTDIR="atac_results"
mkdir -p ${OUTDIR}/{trimmed,aligned,peaks,qc,bigwig}
# Step 1: QC
for sample in $SAMPLES; do
fastp -i ${sample}_R1.fastq.gz -I ${sample}_R2.fastq.gz \
-o ${OUTDIR}/trimmed/${sample}_R1.fq.gz \
-O ${OUTDIR}/trimmed/${sample}_R2.fq.gz \
--adapter_sequence CTGTCTCTTATACACATCT \
--html ${OUTDIR}/qc/${sample}_fastp.html -w ${THREADS}
done
# Step 2-3: Align and process
for sample in $SAMPLES; do
bowtie2 -p ${THREADS} -x ${INDEX} \
-1 ${OUTDIR}/trimmed/${sample}_R1.fq.gz \
-2 ${OUTDIR}/trimmed/${sample}_R2.fq.gz \
--very-sensitive --no-mixed --no-discordant -X 2000 \
2> ${OUTDIR}/qc/${sample}_bowtie2.log | \
samtools view -@ ${THREADS} -bS -q 30 -f 2 - | \
grep -v chrM | \
samtools fixmate -m - - | \
samtools sort -@ ${THREADS} - | \
samtools markdup -r - - | \
alignmentSieve --ATACshift -b /dev/stdin -o ${OUTDIR}/aligned/${sample}.bam
samtools index ${OUTDIR}/aligned/${sample}.bam
done
# Step 4: Peak calling
macs3 callpeak -t ${OUTDIR}/aligned/*.bam -f BAMPE -g hs \
-n consensus --outdir ${OUTDIR}/peaks \
--nomodel --shift -75 --extsize 150 -q 0.01
echo "Pipeline complete. Peaks: ${OUTDIR}/peaks/consensus_peaks.narrowPeak"
```
## Related Skills
- atac-seq/atac-peak-calling - MACS3 ATAC parameters
- atac-seq/atac-qc - TSS enrichment, FRiP details
- atac-seq/differential-accessibility - DiffBind for ATAC
- atac-seq/footprinting - TOBIAS and HINT details
- chip-seq/peak-annotation - Annotate ATAC peaks to genesRelated Skills
etl-pipeline
Build automated ETL (Extract-Transform-Load) pipelines for construction data. Process PDFs, Excel, BIM exports. Generate reports, dashboards, and integrate with other systems. Orchestrate with Airflow or n8n.
data-pipeline
Data pipeline and ETL automation - extract, transform, load workflows for data integration and analytics
data-pipeline-manager
Design and troubleshoot robust data pipelines with comprehensive quality validation, error handling, and monitoring capabilities for bioinformatics and data processing workflows
data-engineering-data-pipeline
You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.
book-sft-pipeline
This skill should be used when the user asks to "fine-tune on books", "create SFT dataset", "train style model", "extract ePub text", or mentions style transfer, LoRA training, book segmentation, or author voice replication.
atft-pipeline
Manage J-Quants ingestion, feature graph generation, and cache hygiene for the ATFT-GAT-FAN dataset pipeline.
ATACseq-QC
Performs ATAC-specific biological validation. It calculates metrics unique to chromatin accessibility assays, such as TSS enrichment scores and fragment size distributions (nucleosome banding patterns). Use this skill when you have filtered BAM file and have called peak for the file. Do NOT use this skill for ChIP-seq data or general alignment statistics.
architecture-paradigm-pipeline
Consult this skill when designing data pipelines or transformation workflows. Use when data flows through fixed sequence of transformations, stages can be independently developed and tested, parallel processing of stages is beneficial. Do not use when selecting from multiple paradigms - use architecture-paradigms first. DO NOT use when: data flow is not sequential or predictable. DO NOT use when: complex branching/merging logic dominates.
airflow-workflows
Apache Airflow DAG design, operators, and scheduling best practices.
adaptive-workflows
Self-learning workflow system that tracks what works best for your use cases. Records experiment results, suggests optimizations, creates custom templates, and builds a personal knowledge base. Use to learn from experience and optimize your LLM workflows over time.
ai-content-pipeline
Build multi-step AI content creation pipelines combining image, video, audio, and text. Workflow examples: generate image -> animate -> add voiceover -> merge with music. Tools: FLUX, Veo, Kokoro TTS, OmniHuman, media merger, upscaling. Use for: YouTube videos, social media content, marketing materials, automated content. Triggers: content pipeline, ai workflow, content creation, multi-step ai, content automation, ai video workflow, generate and edit, ai content factory, automated content creation, ai production pipeline, media pipeline, content at scale
workflows-expert
Activate when requests involve workflow execution, CI/CD pipelines, git automation, or multi-step task orchestration. This skill provides workflows-mcp MCP server integration with tag-based workflow discovery, DAG-based execution, and variable syntax expertise. Trigger on phrases like "run workflow", "execute workflow", "orchestrate tasks", "automate CI/CD", or "workflow information".