bio-chipseq-super-enhancers
Identifies super-enhancers from H3K27ac ChIP-seq data using ROSE and related tools. Use when studying cell identity genes, cancer-associated regulatory elements, or master transcription factor binding regions that cluster into large enhancer domains.
Best use case
bio-chipseq-super-enhancers is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using bio-chipseq-super-enhancers should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/bio-chipseq-super-enhancers/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How bio-chipseq-super-enhancers Compares
| Feature / Agent | bio-chipseq-super-enhancers | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Identifies super-enhancers from H3K27ac ChIP-seq data using ROSE and related tools. Use when studying cell identity genes, cancer-associated regulatory elements, or master transcription factor binding regions that cluster into large enhancer domains.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
## Version Compatibility
Reference examples tested with: GenomicRanges 1.54+, bedtools 2.31+, ggplot2 3.5+, samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
- CLI: `<tool> --version` then `<tool> --help` to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
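The CLI check described above can be scripted. A minimal sketch, assuming the tools are expected on `PATH`; the `check_tool` helper name is hypothetical and not part of any tool:

```bash
# Report whether each required command-line tool is available on PATH.
# check_tool is a hypothetical helper name used only for this sketch.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

for tool in samtools bedtools Rscript; do
    check_tool "$tool"
done
```

Any tool reported as missing should be installed before running the examples; for tools that are found, follow up with `<tool> --version` as described above.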
# Super-Enhancer Calling
**"Identify super-enhancers from H3K27ac ChIP-seq"** → Stitch nearby enhancer peaks and rank by signal to find large regulatory domains controlling cell identity genes.
- CLI: `ROSE_main.py -g HG38 -i peaks.gff -r chip.bam -c input.bam -o output_dir`
Identify super-enhancers (SEs) - large clusters of enhancers that control cell identity genes.
## Background
Super-enhancers are:
- Large clusters of enhancer regions
- Marked by H3K27ac, Med1, BRD4
- Control cell identity genes
- Often altered in disease/cancer
## ROSE (Rank Ordering of Super-Enhancers)
### Installation
```bash
git clone https://github.com/stjude/ROSE.git
cd ROSE
# Requires samtools, R, bedtools
```
### Input Requirements
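1. **BAM file** - H3K27ac ChIP-seq aligned reads, coordinate-sorted and indexed
2. **Peak file** - Called peaks in GFF format (convert from BED as shown under Prepare Input Files)
3. **Genome annotation** - Genome build passed with `-g`, used for TSS annotations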
### Run ROSE
**Goal:** Identify super-enhancers by stitching nearby enhancer peaks and ranking by H3K27ac signal.
**Approach:** Run ROSE_main.py with a GFF peak file, ChIP-seq BAM, and optional input control to stitch enhancers within 12.5 kb, rank by signal, and identify the inflection point separating super-enhancers from typical enhancers.
```bash
# Basic usage
python ROSE_main.py \
-g HG38 \
-i peaks.gff \
-r h3k27ac.bam \
-o output_dir \
-s 12500 \
-t 2500
# With control/input
python ROSE_main.py \
-g HG38 \
-i peaks.gff \
-r h3k27ac.bam \
-c input.bam \
-o output_dir
```
### Key Parameters
| Parameter | Description | Default |
|-----------|-------------|---------|
| `-s` | Stitching distance | 12500 bp |
| `-t` | TSS exclusion | 0 (no exclusion); 2500 bp is a common choice |
| `-c` | Control BAM | None |
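
To make the stitching distance concrete, here is a minimal awk sketch that merges sorted peaks whose gap is at most 12,500 bp. It only illustrates the idea behind `-s` and is not ROSE's implementation; the `toy_peaks.bed` file name and its coordinates are invented.

```bash
# Toy peaks (BED, sorted by chromosome and start); coordinates are invented
printf 'chr1\t1000\t2000\nchr1\t10000\t11000\nchr1\t50000\t51000\nchr2\t500\t900\n' > toy_peaks.bed

# Merge consecutive peaks on the same chromosome when the gap is <= stitch bp.
# Output columns: chrom, start, end, number of constituent peaks.
stitched=$(awk -v stitch=12500 'BEGIN{OFS="\t"}
NR == 1 {chr = $1; start = $2; end = $3; n = 1; next}
$1 == chr && $2 - end <= stitch {if ($3 > end) end = $3; n++; next}
{print chr, start, end, n; chr = $1; start = $2; end = $3; n = 1}
END {if (NR > 0) print chr, start, end, n}' toy_peaks.bed)

echo "$stitched"
```

The first two chr1 peaks are 8 kb apart and collapse into one region with two constituents; the third is 39 kb away and stays separate.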
### Output Files
```
output_dir/
├── *_AllEnhancers.table.txt # All enhancer regions
├── *_SuperEnhancers.table.txt # Super-enhancers only
├── *_Enhancers_withSuper.bed # BED with SE annotation
└── *_Plot_points.png # Hockey stick plot
```
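As a quick sanity check on the output, the sketch below counts super-enhancers in an `*_AllEnhancers.table.txt`-style file. It assumes the table has `#` comment lines, a header row, and an `isSuper` column coded 0/1; the toy table is invented, so confirm the layout against your own ROSE output first.

```bash
# Toy table mimicking the assumed layout (values are invented)
printf '# comment line\nREGION_ID\tCHROM\tSTART\tSTOP\tisSuper\n' > toy_AllEnhancers.table.txt
printf 'r1\tchr1\t100\t900\t1\nr2\tchr1\t5000\t5600\t0\nr3\tchr2\t100\t400\t0\n' >> toy_AllEnhancers.table.txt

# Skip comments, locate the isSuper column by name in the header, then tally rows
summary=$(awk -F'\t' '
/^#/ {next}
col == 0 {for (i = 1; i <= NF; i++) if ($i == "isSuper") col = i; next}
{if ($col == 1) super++; else typical++}
END {print "super=" super + 0, "typical=" typical + 0}' toy_AllEnhancers.table.txt)

echo "$summary"
```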
## Prepare Input Files
### Convert BED to GFF
```bash
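# ROSE requires GFF format for peaks; its README asks for a unique region ID in columns 2 and 9
# BED starts are 0-based and GFF starts are 1-based, so add 1 to the start
awk 'BEGIN{OFS="\t"} {id="peak_"NR; s=($6=="+"||$6=="-")?$6:"."; print $1,id,"",$2+1,$3,"",s,"",id}' \
peaks.bed > peaks.gff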
```
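Before handing the GFF to ROSE, it can help to confirm that every line has nine tab-separated columns, that start does not exceed end, and that IDs are unique. A minimal sketch, assuming the region ID sits in column 9; `toy.gff` and its coordinates are invented stand-ins for `peaks.gff`.

```bash
# Toy GFF with the region ID in column 9 (coordinates are invented)
printf 'chr1\tp1\t\t1001\t2000\t\t.\t\tp1\nchr1\tp2\t\t10001\t11000\t\t.\t\tp2\n' > toy.gff

# Count lines that fail any of the three checks; 0 means the file passed
bad=$(awk -F'\t' 'NF != 9 || $4 > $5 || seen[$9]++ {n++} END {print n + 0}' toy.gff)
echo "Problem lines: $bad"
```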
### Filter Peaks for Enhancers
```bash
# Remove promoter peaks; promoters.bed is assumed to hold windows of TSS +/- 2.5 kb
bedtools intersect -a peaks.bed -b promoters.bed -v > enhancer_peaks.bed
```
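`promoters.bed` is not produced by any step here. One way to build it, sketched under the assumption that you have a BED file of single-base TSS positions (the `tss.bed` name and the toy coordinates are hypothetical), is to pad each TSS by 2.5 kb on both sides and clamp the start at zero:

```bash
# Toy TSS file: chrom, start, end, gene (coordinates are invented)
printf 'chr1\t1000\t1001\tGENE_A\nchr1\t50000\t50001\tGENE_B\n' > tss.bed

# Pad each TSS by 2500 bp on both sides; never let the start go below 0
awk -v pad=2500 'BEGIN{OFS="\t"} {
    start = $2 - pad; if (start < 0) start = 0
    print $1, start, $3 + pad, $4
}' tss.bed > promoters.bed

cat promoters.bed
```

This sketch does not clamp at chromosome ends; `bedtools slop` with a genome file handles both ends.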
## Alternative: HOMER Super-Enhancers
```bash
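# Call super-enhancers with HOMER
findPeaks tag_dir/ -style super -o auto
# With an input control; also write typical enhancers to a separate file
# -superSlope -1000 keeps every stitched region in the output (useful for plotting the full curve)
findPeaks tag_dir/ -style super -i input_tag_dir/ \
-typical typical_enhancers.txt \
-superSlope -1000 \
> super_enhancers.txt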
```
## Alternative: SEanalysis
```bash
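# R-based analysis (confirm that a package named SEanalysis exposing identifySE() is installed before relying on this)
# Rscript needs "-" to read the script from stdin
Rscript - << 'EOF'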
library(SEanalysis)
# Load H3K27ac signal at enhancers
signal <- read.table('enhancer_signal.txt', header=TRUE)
# Rank and identify super-enhancers
se_result <- identifySE(signal$signal, method='ROSE')
# Get super-enhancer IDs
super_enhancers <- signal$id[se_result$is_super]
write.table(super_enhancers, 'super_enhancers.txt', quote=FALSE, row.names=FALSE)
EOF
```
## Custom Hockey Stick Analysis (R)
**Goal:** Classify enhancers as super-enhancers vs typical using a custom hockey stick plot and inflection-point detection.
**Approach:** Rank enhancers by normalized signal, compute the slope at each point, find where the tangent exceeds 1 (inflection point), and classify all enhancers above the inflection as super-enhancers.
```r
library(ggplot2)
# Load enhancer signal data
enhancers <- read.table('enhancer_signal.txt', header=TRUE)
# Rank by signal
enhancers <- enhancers[order(enhancers$signal), ]
enhancers$rank <- 1:nrow(enhancers)
# Find inflection point (tangent = 1)
# Normalize ranks and signal to 0-1
enhancers$rank_norm <- enhancers$rank / max(enhancers$rank)
enhancers$signal_norm <- enhancers$signal / max(enhancers$signal)
# Calculate the slope between consecutive ranked points (slopes[i] spans ranks i and i + 1)
slopes <- diff(enhancers$signal_norm) / diff(enhancers$rank_norm)
inflection <- which(slopes > 1)[1]
# Guard against data where no slope exceeds 1, which would otherwise yield NA labels
if (is.na(inflection)) stop('No point with slope > 1; check the signal column')
# Classify
enhancers$type <- ifelse(enhancers$rank >= inflection, 'Super-Enhancer', 'Typical')
# Plot
ggplot(enhancers, aes(rank, signal, color = type)) +
geom_point(size = 0.5) +
scale_color_manual(values = c('Super-Enhancer' = 'red', 'Typical' = 'grey60')) +
geom_vline(xintercept = inflection, linetype = 'dashed') +
labs(x = 'Enhancer Rank', y = 'H3K27ac Signal', title = 'Super-Enhancer Identification') +
theme_bw()
ggsave('hockey_stick_plot.pdf', width = 8, height = 6)
# Output super-enhancers
super_enhancers <- enhancers[enhancers$type == 'Super-Enhancer', ]
write.table(super_enhancers, 'super_enhancers.txt', sep = '\t', quote = FALSE, row.names = FALSE)
```
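Taking the first local slope above 1 can be sensitive to noise in the ranked curve. A tangent-style alternative, sketched here in awk, picks the rank where `signal - slope * rank` is smallest, with `slope` set to the diagonal of the curve. This follows the spirit of the hockey-stick cutoff but is not ROSE's code, and the toy signal values are invented.

```bash
# Toy signal values, one per enhancer (invented)
printf '1\n2\n3\n4\n5\n6\n7\n8\n30\n60\n' > toy_signal.txt

# Sort ascending, then find the rank where signal - slope * rank is smallest;
# that is where a line with the diagonal's slope touches the curve from below
cutoff=$(sort -n toy_signal.txt | awk '
{v[NR] = $1}
END {
    slope = (v[NR] - v[1]) / NR
    best = 1
    for (i = 2; i <= NR; i++)
        if (v[i] - slope * i < v[best] - slope * best) best = i
    print v[best]
}')

# Enhancers with signal strictly above the cutoff are called super-enhancers
n_super=$(awk -v c="$cutoff" '$1 > c' toy_signal.txt | wc -l | tr -d ' ')
echo "cutoff=$cutoff super=$n_super"
```

On this toy input the cutoff lands at 8, so only the two high-signal regions (30 and 60) are classified as super-enhancers.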
## Calculate Enhancer Signal
```bash
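# Get H3K27ac signal at peak regions (the BAM must be coordinate-sorted and indexed)
bedtools multicov -bams h3k27ac.bam -bed enhancer_peaks.bed > enhancer_counts.txt
# Total mapped primary reads, used for library-size normalization
TOTAL_READS=$(samtools view -c -F 260 h3k27ac.bam)
# Normalize by library size and peak size (RPKM); the output has no header row
awk -v total="$TOTAL_READS" 'BEGIN{OFS="\t"} {
size = $3 - $2
rpm = ($NF / total) * 1e6
rpkm = rpm / (size / 1000)
print $0, rpkm
}' enhancer_counts.txt > enhancer_signal.txt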
```
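A worked example of the normalization arithmetic, using one invented region so the numbers can be checked by hand. The library size of 10 million mapped reads is an assumption for illustration.

```bash
# One toy region: 2,000 bp wide with 500 reads (numbers are invented)
printf 'chr1\t10000\t12000\tpeak_1\t500\n' > toy_counts.txt

# total: assumed number of mapped reads in the library
rpkm=$(awk -v total=10000000 '{
    size = $3 - $2                 # region width in bp
    rpm = ($NF / total) * 1e6      # reads per million mapped reads
    printf "%.2f\n", rpm / (size / 1000)
}' toy_counts.txt)

echo "RPKM: $rpkm"
```

Here 500 reads out of 10 million gives 50 reads per million, and dividing by the 2 kb width gives an RPKM of 25.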
## Downstream Analysis
### Gene Assignment
```bash
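# Assign super-enhancers to nearest genes (bedtools closest needs position-sorted input)
sort -k1,1 -k2,2n genes.bed > genes.sorted.bed
sort -k1,1 -k2,2n super_enhancers.bed | bedtools closest -a stdin -b genes.sorted.bed -d > se_gene_assignment.txt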
```
### Compare Conditions
**Goal:** Find super-enhancers gained or lost between two experimental conditions.
**Approach:** Convert super-enhancer tables to GRanges objects and use subsetByOverlaps with invert to identify condition-specific super-enhancers.
```r
# Load SE from two conditions
se1 <- read.table('condition1_SE.txt', header=TRUE)
se2 <- read.table('condition2_SE.txt', header=TRUE)
# Find differential super-enhancers
library(GenomicRanges)
gr1 <- makeGRangesFromDataFrame(se1)
gr2 <- makeGRangesFromDataFrame(se2)
# Gained in condition 2
gained <- subsetByOverlaps(gr2, gr1, invert=TRUE)
# Lost in condition 2
lost <- subsetByOverlaps(gr1, gr2, invert=TRUE)
```
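The inverted-overlap logic can also be sketched without Bioconductor, which makes explicit what "gained" and "lost" mean. This is an illustration on invented BED intervals, not a replacement for `subsetByOverlaps`; the `no_overlap` helper name and the file names are hypothetical.

```bash
# Toy super-enhancer sets for two conditions (coordinates are invented)
printf 'chr1\t100\t500\nchr1\t2000\t3000\n' > cond1.bed
printf 'chr1\t400\t800\nchr2\t100\t900\n' > cond2.bed

# Print regions from the second file that overlap nothing in the first file
no_overlap() {
    awk 'BEGIN{OFS="\t"}
    NR == FNR {c[NR] = $1; s[NR] = $2; e[NR] = $3; n = NR; next}
    {
        hit = 0
        for (i = 1; i <= n; i++)
            if ($1 == c[i] && $2 < e[i] && $3 > s[i]) {hit = 1; break}
        if (!hit) print $1, $2, $3
    }' "$1" "$2"
}

gained=$(no_overlap cond1.bed cond2.bed)   # present in condition 2 only
lost=$(no_overlap cond2.bed cond1.bed)     # present in condition 1 only
echo "gained: $gained"
echo "lost: $lost"
```

Any overlap of at least one base counts as shared, so a region that merely shifts or shrinks between conditions is reported as neither gained nor lost.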
### Enrichment of Disease Variants
```bash
# Check if GWAS SNPs enriched in super-enhancers
bedtools intersect -a gwas_snps.bed -b super_enhancers.bed -wa -wb > snps_in_SE.txt
# Calculate enrichment
total_snps=$(wc -l < gwas_snps.bed)
snps_in_se=$(wc -l < snps_in_SE.txt)
se_coverage=$(awk '{sum += $3-$2} END {print sum}' super_enhancers.bed)
genome_size=3000000000
expected=$(echo "$total_snps * $se_coverage / $genome_size" | bc -l)
enrichment=$(echo "$snps_in_se / $expected" | bc -l)
echo "Enrichment: $enrichment"
```
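A worked example of the enrichment arithmetic with invented numbers, so the result can be checked by hand. It uses awk instead of `bc` and makes the same assumption as the calculation above: SNPs are spread uniformly across the genome.

```bash
# Assumed toy inputs: 1,000 SNPs, 50 of them inside super-enhancers,
# which cover 30 Mb of a 3 Gb genome (all numbers are invented)
enrichment=$(awk -v total=1000 -v in_se=50 -v cov=30000000 -v genome=3000000000 'BEGIN {
    expected = total * cov / genome   # SNPs expected inside SEs by chance
    printf "%.1f\n", in_se / expected
}')

echo "Enrichment: $enrichment"
```

Super-enhancers cover 1% of the genome here, so 10 SNPs are expected by chance; observing 50 gives a 5-fold enrichment.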
## Complete Workflow
```bash
#!/bin/bash
set -euo pipefail
H3K27AC_BAM=$1
PEAKS_BED=$2
OUTPUT_DIR=$3
mkdir -p "$OUTPUT_DIR"
echo "=== Convert peaks to GFF ==="
# Unique ID in columns 2 and 9; BED start (0-based) becomes GFF start (1-based)
awk 'BEGIN{OFS="\t"} {id="peak_"NR; s=($6=="+"||$6=="-")?$6:"."; print $1,id,"",$2+1,$3,"",s,"",id}' \
"$PEAKS_BED" > "$OUTPUT_DIR/peaks.gff"
echo "=== Run ROSE ==="
python ROSE_main.py \
-g HG38 \
-i "$OUTPUT_DIR/peaks.gff" \
-r "$H3K27AC_BAM" \
-o "$OUTPUT_DIR" \
-s 12500 \
-t 2500
echo "=== Summary ==="
# Count data rows only: skip '#' comment lines and the header row (assumed to start with REGION_ID)
count_regions() { awk '!/^#/ && $1 != "REGION_ID" {n++} END {print n+0}' "$1"; }
n_all=$(count_regions "$OUTPUT_DIR"/*_AllEnhancers.table.txt)
n_super=$(count_regions "$OUTPUT_DIR"/*_SuperEnhancers.table.txt)
echo "Typical enhancers: $((n_all - n_super))"
echo "Super-enhancers: $n_super"
## Related Skills
- chip-seq/peak-calling - Call H3K27ac peaks first
- chip-seq/peak-annotation - Annotate SE to genes
- chip-seq/differential-binding - Compare SE between conditions
- data-visualization/genome-tracks - Visualize SE regions
Related Skills
using-superpowers
Use when starting any conversation - establishes how to find and use skills, requiring Skill tool invocation before ANY response including clarifying questions
bio-chipseq-visualization
Visualize ChIP-seq data using deepTools, Gviz, and ChIPseeker. Create heatmaps, profile plots, and genome browser tracks. Visualize signal around peaks, TSS, or custom regions. Use when visualizing ChIP-seq signal and peaks.
bio-chipseq-qc
ChIP-seq quality control metrics including FRiP (Fraction of Reads in Peaks), cross-correlation analysis (NSC/RSC), library complexity, and IDR (Irreproducibility Discovery Rate) for replicate concordance. Use to assess experiment quality before downstream analysis. Use when assessing ChIP-seq data quality metrics.
bio-chipseq-peak-calling
ChIP-seq peak calling using MACS3 (or MACS2). Call narrow peaks for transcription factors or broad peaks for histone modifications. Supports input control, fragment size modeling, and various output formats including narrowPeak and broadPeak BED files. Use when calling peaks from ChIP-seq alignments.
bio-chipseq-peak-annotation
Annotate ChIP-seq peaks to genomic features and genes using ChIPseeker. Assign peaks to promoters, exons, introns, and intergenic regions. Find nearest genes and calculate distance to TSS. Generate annotation plots and statistics. Use when annotating ChIP-seq peaks to genomic features.
bio-chipseq-motif-analysis
De novo motif discovery and known motif enrichment analysis using HOMER and MEME-ChIP. Identify transcription factor binding motifs in ChIP-seq, ATAC-seq, or other genomic peak data. Use when finding enriched DNA motifs in peak sequences.
bio-chipseq-differential-binding
Differential binding analysis using DiffBind. Compare ChIP-seq peaks between conditions with statistical rigor. Requires replicate samples. Outputs differentially bound regions with fold changes and p-values. Use when comparing ChIP-seq binding between conditions.
zinc-database
Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.
zarr-python
Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.
xlsx
Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.
writing-skills
Use when creating new skills, editing existing skills, or verifying skills work before deployment
writing-plans
Use when you have a spec or requirements for a multi-step task, before touching code