differential-region-analysis
The differential-region-analysis pipeline identifies genomic regions exhibiting significant differences in signal intensity between experimental conditions using a count-based framework and DESeq2. It supports detection of both differentially accessible regions (DARs) from open-chromatin assays (e.g., ATAC-seq, DNase-seq) and differential transcription factor (TF) binding regions from TF-centric assays (e.g., ChIP-seq, CUT&RUN, CUT&Tag). The pipeline can start from aligned BAM files or a precomputed count matrix and is suitable whenever genomic signal can be summarized as read counts per region.
Best use case
differential-region-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using differential-region-analysis should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/differential-region-analysis/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
Frequently Asked Questions
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Differential Region Analysis with DESeq2
## Overview
This skill performs differential region analysis between experimental conditions using DESeq2 in a count-based framework.
Main steps include:
- Initialize the project directory.
- Refer to the **Inputs & Outputs** section to check inputs and build the output architecture. All output files should be located in `${proj_dir}`, the directory created in Step 0.
- **Always prompt user** if required files are missing.
- **Always prompt user** for the threshold of `qvalues` and `log2foldchange` to define significant regions.
- Merge peaks across replicates or samples to build a consensus peak set.
- Generate read count matrix over peaks using featureCounts or bedtools.
- Prepare sample metadata file describing conditions and replicates.
- Perform differential analysis using DESeq2.
- Visualize and interpret results (PCA, volcano plot).
- Output regions with significantly increased (up) or decreased (down) accessibility.
---
## When to use this skill
Use the differential-region-analysis pipeline when your goal is to identify genomic regions with condition-dependent changes in signal intensity, provided the signal can be represented as raw read counts per region.
Recommended scenarios include:
- Comparing treated vs. control samples to identify regulatory regions responsive to a drug, signaling molecule, or environmental change.
- Investigating cell differentiation or developmental trajectories to reveal dynamic chromatin remodeling.
- Analyzing disease vs. normal tissues to pinpoint dysregulated enhancer or promoter accessibility.
- Integrating with RNA-seq or ChIP-seq data to connect chromatin accessibility with transcriptional or epigenetic regulation.
The pipeline performs best with datasets containing biological replicates (≥2 per condition) and moderate to high sequencing depth (~20–50 million reads per sample).
---
## Inputs & Outputs
### Inputs (choose one)
- If starting from BAM files and BED peak files → Generate consensus peaks and count matrix.
- If starting from existing count matrix → Go directly to DESeq2 analysis.
- If multiple conditions or batches → Include batch and condition in the design.
### Outputs
```bash
${sample}_DAR_analysis/          # or ${tf}_${sample}_DB_analysis for differential TF binding detection
    tables/
        all_peaks.bed
        consensus_peaks.bed      # Unified peak set
        atac_counts.txt          # Count matrix of reads per peak
        samples.csv              # Sample metadata
    DARs/
        DAR_results.csv          # DESeq2 results (log2FC, p-values)
        DAR_sig.bed              # Significantly differentially accessible regions
        DAR_up.bed
        DAR_down.bed
    plots/                       # Visualization outputs
        PCA.pdf
        Volcano.pdf
    logs/                        # Analysis logs
    temp/                        # Other temporary files
```
---
## Decision Tree
### Step 0: Initialize Project
1. Create the directory for this project:
Call:
- `mcp__project-init-tools__project_init`
with:
- `sample`: sample name (e.g. c1_vs_c2)
- `task`: DAR_analysis
The tool will:
- Create `${sample}_DAR_analysis` (or `${tf}_${sample}_DB_analysis`) directory.
- Return the full path of the `${sample}_DAR_analysis` (or `${tf}_${sample}_DB_analysis`) directory, which will be used as `${proj_dir}`.
### Step 1: Generate Consensus Peaks
Combine peaks from replicates to define a shared feature space.
Call:
- `mcp__pydeseq2-tools__generate_consensus_peaks`
with:
- `bed_files`: List of paths to peak BED files from replicates.
- `output_bed`: Output path for the merged consensus BED file.
- `output_saf`: Output path for the SAF file (needed for featureCounts)
Output: `consensus_peaks.bed`, `consensus_peaks.saf`
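
As a rough illustration of what this step produces, the sketch below merges overlapping peaks into a consensus set and formats it as SAF rows. It is not the implementation of `generate_consensus_peaks`, whose merging rules are not documented here; it assumes a plain union-merge of overlapping or adjacent intervals, and the `chrom:start-end` peak ID scheme and the `+` placeholder strand are hypothetical choices. The one format detail it relies on is that BED starts are 0-based while SAF starts are 1-based.

```python
from typing import Iterable

# (chrom, start, end) in BED convention: 0-based start, exclusive end
Interval = tuple[str, int, int]


def merge_intervals(peaks: Iterable[Interval]) -> list[Interval]:
    """Union-merge overlapping or adjacent intervals on each chromosome."""
    merged: list[Interval] = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def to_saf_lines(peaks: list[Interval]) -> list[str]:
    """Format intervals as tab-separated SAF rows, header first."""
    lines = ["GeneID\tChr\tStart\tEnd\tStrand"]
    for chrom, start, end in peaks:
        peak_id = f"{chrom}:{start}-{end}"  # hypothetical ID scheme
        # SAF start is 1-based, so shift the BED start by one.
        # "+" is a placeholder strand for unstranded counting.
        lines.append(f"{peak_id}\t{chrom}\t{start + 1}\t{end}\t+")
    return lines


# Toy peaks pooled from two replicates (made-up coordinates).
replicate_peaks = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),  # overlaps the peak above, so the two are merged
    ("chr1", 500, 600),
    ("chr2", 50, 80),
]
consensus = merge_intervals(replicate_peaks)
saf_lines = to_saf_lines(consensus)
print("\n".join(saf_lines))
```

A stricter consensus definition (for example, keeping only regions supported by at least two replicates) would need an extra support count per merged interval.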
---
### Step 2: Generate Count Matrix
Call:
- `mcp__pydeseq2-tools__count_reads_featurecounts`
with:
- `saf_file`: SAF file output from Step 1.
- `bam_files`: List of paths to BAM files.
- `output_counts`: Path to output count matrix.
- `is_paired_end`: Whether the BAM files contain paired-end reads.
- `threads`: Number of threads to use for counting.
Output: `atac_counts.txt`
---
### Step 3: Prepare Metadata
Prepare `samples.csv` describing condition and replicate information.
```csv
sample,condition,replicate
sample1.bam,c1,1
sample2.bam,c1,2
sample3.bam,c2,1
sample4.bam,c2,2
```
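
Because the `sample` column must line up with the columns of the count matrix, a quick consistency check before running the analysis can save a failed run. The sketch below assumes the count matrix follows the standard featureCounts layout (a leading `#` comment line, then six annotation columns followed by one column per BAM file) and that the metadata uses the `sample` column shown above. The function names and toy data are hypothetical.

```python
import csv
import io


def count_matrix_samples(counts_text: str) -> list[str]:
    """Return the sample column names from featureCounts-style output."""
    rows = [ln for ln in counts_text.splitlines() if ln and not ln.startswith("#")]
    header = rows[0].split("\t")
    # The first six columns are Geneid, Chr, Start, End, Strand, Length.
    return header[6:]


def metadata_samples(metadata_text: str) -> list[str]:
    """Return the values of the `sample` column from the metadata CSV."""
    return [row["sample"] for row in csv.DictReader(io.StringIO(metadata_text))]


def compare_samples(counts_text: str, metadata_text: str) -> dict[str, list[str]]:
    """Report samples present in one file but not the other."""
    in_counts = set(count_matrix_samples(counts_text))
    in_meta = set(metadata_samples(metadata_text))
    return {
        "missing_from_counts": sorted(in_meta - in_counts),
        "missing_from_metadata": sorted(in_counts - in_meta),
    }


# Toy inputs: the count matrix has sample5.bam where the metadata lists sample4.bam.
counts_text = (
    "# Program:featureCounts\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\t"
    "sample1.bam\tsample2.bam\tsample3.bam\tsample5.bam\n"
    "peak1\tchr1\t101\t300\t+\t200\t12\t9\t30\t41\n"
)
metadata_text = (
    "sample,condition,replicate\n"
    "sample1.bam,c1,1\n"
    "sample2.bam,c1,2\n"
    "sample3.bam,c2,1\n"
    "sample4.bam,c2,2\n"
)
report = compare_samples(counts_text, metadata_text)
print(report)
```

If either list in the report is non-empty, fix the metadata or the count matrix header before moving on.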
---
### Step 4: Differential Accessibility with pyDESeq2
Call:
- `mcp__pydeseq2-tools__run_pydeseq2_analysis`
with:
- `counts_file`: Path to the featureCounts count matrix from Step 2.
- `metadata_file`: Path to the metadata CSV from Step 3.
- `design_factors`: Design formula columns (e.g. `condition` or `batch,condition`).
- `contrast_column`: Column name for the contrast (e.g. `condition`).
- `contrast_control`: Control group name (e.g. `Control`).
- `contrast_treatment`: Treatment group name (e.g. `Treated`).
- `output_csv`: Output path for the results CSV.
Output: `DAR_results.csv` or `${tf}_DB_results.csv`
---
### Step 5: Visualization and QC
Call:
- `mcp__pydeseq2-tools__visualize_results`
with:
- `results_csv`: Path to DESeq2 results CSV.
- `counts_file`: Path to original counts file (for PCA).
- `metadata_file`: Path to metadata (for PCA grouping).
- `output_dir`: Directory to save plots.
- `condition_col`: Metadata column used to group samples (e.g. `condition`).
---
### Step 6: Output significantly up and down accessible regions
Call:
- `mcp__pydeseq2-tools__filter_and_export_bed`
with:
- `results_csv`: Path to DESeq2 results CSV.
- `output_prefix`: Prefix for output BED files.
- `padj_cutoff`: Provided by user
- `log2fc_cutoff`: Provided by user
Output: `DAR_sig.bed`, `DAR_up.bed`, `DAR_down.bed` (or `${tf}_DB_sig.bed`, `${tf}_DB_up.bed`, `${tf}_DB_down.bed`)
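
The filtering logic behind this step can be sketched as follows. This is an illustration only, not the code of `filter_and_export_bed`: the row layout (`peak_id`, `chrom`, `start`, `end`) is hypothetical, the column names `log2FoldChange` and `padj` follow the usual DESeq2 results naming, and whether the real tool treats the cutoffs as strict or inclusive is an assumption. Regions with a missing adjusted p-value are never called significant.

```python
import math


def classify_regions(results, padj_cutoff, log2fc_cutoff):
    """Split result rows into significantly up and down regions.

    Assumed rule: padj < padj_cutoff and |log2FoldChange| >= log2fc_cutoff.
    """
    up, down = [], []
    for row in results:
        padj = row["padj"]
        lfc = row["log2FoldChange"]
        # Skip rows with no adjusted p-value (None or NaN).
        if padj is None or math.isnan(padj) or padj >= padj_cutoff:
            continue
        if lfc >= log2fc_cutoff:
            up.append(row)
        elif lfc <= -log2fc_cutoff:
            down.append(row)
    return up, down


def to_bed_line(row):
    """Format one row as a 4-column BED line: chrom, start, end, name."""
    return f"{row['chrom']}\t{row['start']}\t{row['end']}\t{row['peak_id']}"


# Toy results with made-up values.
results = [
    {"peak_id": "peak1", "chrom": "chr1", "start": 100, "end": 300, "log2FoldChange": 2.1, "padj": 0.001},
    {"peak_id": "peak2", "chrom": "chr1", "start": 500, "end": 600, "log2FoldChange": -1.8, "padj": 0.01},
    {"peak_id": "peak3", "chrom": "chr2", "start": 50, "end": 80, "log2FoldChange": 3.0, "padj": 0.2},
    {"peak_id": "peak4", "chrom": "chr2", "start": 900, "end": 990, "log2FoldChange": 0.4, "padj": 0.0001},
    {"peak_id": "peak5", "chrom": "chr3", "start": 10, "end": 60, "log2FoldChange": -2.5, "padj": float("nan")},
]
# The cutoffs here are placeholders; in the workflow the user supplies them.
up, down = classify_regions(results, padj_cutoff=0.05, log2fc_cutoff=1.0)
up_bed = [to_bed_line(r) for r in up]
down_bed = [to_bed_line(r) for r in down]
print("up:", up_bed)
print("down:", down_bed)
```

With these toy values, `peak3` fails the adjusted p-value cutoff, `peak4` fails the fold-change cutoff, and `peak5` is skipped because its adjusted p-value is missing.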
## Advanced Usage
- **Batch effects**: `design = ~ batch + condition`
- **Multi-group comparison**: `contrast=c("condition","A","B")`
- **Time series**: `DESeq(dds, test="LRT", reduced=~1)`
- **Filter low counts**: `dds[rowSums(counts(dds)) >= 20, ]`
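
The snippets above use R DESeq2 syntax. If the count matrix is being prepared in Python before it is passed to the analysis step, the low-count filter can be approximated as below. This is a minimal sketch that assumes the matrix is held as a mapping from peak ID to per-sample counts; the function name and toy data are hypothetical, and the threshold of 20 mirrors the R example.

```python
def filter_low_counts(counts: dict[str, list[int]], min_total: int = 20) -> dict[str, list[int]]:
    """Keep peaks whose counts summed across all samples reach min_total."""
    return {peak: row for peak, row in counts.items() if sum(row) >= min_total}


# Toy count matrix: peak ID -> counts in four samples (made-up values).
counts = {
    "peak1": [12, 9, 30, 41],  # total 92, kept
    "peak2": [1, 0, 2, 3],     # total 6, dropped
    "peak3": [5, 5, 5, 5],     # total 20, kept (threshold is inclusive)
}
filtered = filter_low_counts(counts)
print(sorted(filtered))
```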
---
## Notes & Troubleshooting
| Issue | Solution |
|-------|-----------|
| Very low counts | Increase threshold (`rowSums >= 20`) |
| Batch effect | Add batch term to design |
| Non-converging model | Use `fitType="local"` or `betaPrior=FALSE` |
| Mismatched sample names | Ensure count column names match metadata rows |