bio-chipseq-peak-annotation

Annotate ChIP-seq peaks to genomic features and genes using ChIPseeker. Assign peaks to promoters, exons, introns, and intergenic regions. Find nearest genes and calculate distance to TSS. Generate annotation plots and statistics. Use when annotating ChIP-seq peaks to genomic features.

by FreedomIntelligence

Best use case

bio-chipseq-peak-annotation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-chipseq-peak-annotation should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/bio-chipseq-peak-annotation/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-chipseq-peak-annotation/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/bio-chipseq-peak-annotation/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How bio-chipseq-peak-annotation Compares

Feature / Agent	bio-chipseq-peak-annotation	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

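What does this skill do?

It annotates ChIP-seq peaks to genomic features and genes using ChIPseeker: it assigns peaks to promoters, exons, introns, and intergenic regions, finds the nearest genes, calculates distance to TSS, and generates annotation plots and statistics.
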
Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

## Version Compatibility

Reference examples tested with: MACS3 3.0+, clusterProfiler 4.10+

Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters

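If code fails with an R error such as `there is no package called ...`, `could not find function ...`,
or `unused argument ...` (the R counterparts of Python's ImportError, AttributeError, and TypeError),
introspect the installed package and adapt the example to match the actual API rather than retrying.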
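
The version check described above can be scripted. The following is a minimal sketch: `meets_min_version` is a hypothetical helper name (not part of any package), and the `1.0.0` floor used for ChIPseeker is an arbitrary placeholder, since no tested ChIPseeker version is stated here. MACS3 is a command-line tool and is not covered by this check.

```r
# Hypothetical helper: TRUE if `pkg` is installed at or above `min_version`
meets_min_version <- function(pkg, min_version) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        return(FALSE)  # package is not installed
    }
    packageVersion(pkg) >= package_version(min_version)
}

# 'stats' ships with R, so this call works on any installation
meets_min_version('stats', '3.0.0')

# Checks relevant to this skill; results depend on the local installation
meets_min_version('clusterProfiler', '4.10.0')
meets_min_version('ChIPseeker', '1.0.0')  # placeholder floor, see note above
```

A `FALSE` result is the cue to read `?function_name` for the installed version before adapting any of the examples.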

# Peak Annotation with ChIPseeker

**"Annotate my ChIP-seq peaks to genes"** → Assign peaks to genomic features (promoter, exon, intron, intergenic), find nearest genes, and calculate TSS distances.
- R: `ChIPseeker::annotatePeak(peaks, TxDb=txdb)`

## Load Peaks and Annotations

```r
library(ChIPseeker)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(org.Hs.eg.db)

txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene

# Read peaks from MACS3
peaks <- readPeakFile('sample_peaks.narrowPeak')
```

## Annotate Peaks

**Goal:** Assign each ChIP-seq peak to its nearest gene and genomic feature category.

**Approach:** Use annotatePeak with a TxDb annotation database to classify peaks as promoter, exon, intron, or intergenic and retrieve the nearest gene symbol.

```r
# Annotate with default settings
peak_anno <- annotatePeak(
    peaks,
    TxDb = txdb,
    annoDb = 'org.Hs.eg.db'
)

# View annotation summary
peak_anno
```

## Custom Promoter Definition

```r
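# Define promoter region around the TSS. c(-3000, 3000) is the annotatePeak
# default; change the values (e.g. c(-1000, 1000)) for a narrower definition.
peak_anno <- annotatePeak(
    peaks,
    TxDb = txdb,
    tssRegion = c(-3000, 3000),  # Promoter definition: upstream, downstream (bp)
    annoDb = 'org.Hs.eg.db'
)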
```

## Extract Annotated Data Frame

```r
# Convert to data frame
anno_df <- as.data.frame(peak_anno)

# Key columns: seqnames, start, end, annotation, distanceToTSS, SYMBOL, GENENAME
head(anno_df)

# Export to CSV
write.csv(anno_df, 'annotated_peaks.csv', row.names = FALSE)
```
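
The `annotation` column carries detailed labels (promoter distance bands, intron and exon numbers), so per-category counts need the detail stripped first. A minimal sketch in base R, assuming the labels follow a "Category (detail)" pattern; `toy_anno` and its transcript IDs are made-up stand-ins for `anno_df`:

```r
# Toy stand-in for as.data.frame(peak_anno); label contents are illustrative
toy_anno <- data.frame(
    annotation = c('Promoter (<=1kb)', 'Promoter (1-2kb)',
                   'Intron (tx1/101, intron 2 of 5)',
                   'Exon (tx2/202, exon 3 of 7)',
                   'Distal Intergenic'),
    stringsAsFactors = FALSE
)

# Drop the parenthesised detail to get one broad category per peak
toy_anno$category <- sub(' \\(.*$', '', toy_anno$annotation)

# Count peaks per broad category
category_counts <- table(toy_anno$category)
category_counts
```

Applied to real output, the same `sub()` call on `anno_df$annotation` gives a column that is easier to tabulate or group by.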

## Get Genes with Peaks in Promoter

```r
# Filter for promoter peaks
promoter_peaks <- anno_df[grep('Promoter', anno_df$annotation), ]

# Get unique genes
promoter_genes <- unique(promoter_peaks$SYMBOL)
```
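
Promoter filtering can be tightened by combining the annotation label with `distanceToTSS` and by dropping peaks whose gene has no symbol. A minimal sketch on made-up data; the 1 kb cutoff and the `GENE_*` names are assumptions for illustration only:

```r
# Toy stand-in for anno_df with the three columns used below
toy_anno <- data.frame(
    annotation = c('Promoter (<=1kb)', 'Promoter (2-3kb)',
                   'Distal Intergenic', 'Promoter (<=1kb)'),
    distanceToTSS = c(-250, 2400, 58000, 120),
    SYMBOL = c('GENE_A', 'GENE_B', 'GENE_C', NA),
    stringsAsFactors = FALSE
)

# Peaks labelled as promoter and within 1 kb of the TSS on either side
is_promoter <- grepl('Promoter', toy_anno$annotation, fixed = TRUE)
near_tss <- abs(toy_anno$distanceToTSS) <= 1000

# Unique gene symbols, excluding peaks without a symbol
symbols <- toy_anno$SYMBOL[is_promoter & near_tss]
core_promoter_genes <- unique(symbols[!is.na(symbols)])
core_promoter_genes
```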

## Annotation Pie Chart

```r
# Pie chart of genomic feature distribution
plotAnnoPie(peak_anno)

# Bar plot alternative
plotAnnoBar(peak_anno)
```

## Distance to TSS Plot

```r
# Distribution of peaks relative to TSS
plotDistToTSS(peak_anno, title = 'Distribution of peaks relative to TSS')
```
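
The numbers behind a distance-to-TSS plot can also be tabulated directly from the `distanceToTSS` column. A minimal sketch in base R on made-up distances; the bin edges are an assumption chosen for illustration and may not match the bins the plot uses in your installed version:

```r
# Made-up signed distances (bp); negative values are upstream of the TSS
distance_to_tss <- c(-250, 120, 2400, -4200, 58000, -250000)

# Assumed bin edges on the absolute distance
breaks <- c(0, 1000, 3000, 5000, 10000, 100000, Inf)
labels <- c('0-1kb', '1-3kb', '3-5kb', '5-10kb', '10-100kb', '>100kb')

# Assign each peak to a bin; include.lowest keeps a distance of exactly 0
bins <- cut(abs(distance_to_tss), breaks = breaks, labels = labels,
            include.lowest = TRUE)

# Counts and percentages per bin (empty bins are kept)
bin_counts <- table(bins)
bin_percent <- round(100 * prop.table(bin_counts), 1)
bin_percent
```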

## Compare Multiple Peak Sets

**Goal:** Compare genomic feature distributions across multiple ChIP-seq experiments (e.g., different histone marks).

**Approach:** Read and annotate each peak file separately, then use plotAnnoBar and plotDistToTSS on the annotation list for side-by-side comparison.

```r
# Read multiple peak files
peak_files <- list(
    H3K4me3 = 'H3K4me3_peaks.narrowPeak',
    H3K27ac = 'H3K27ac_peaks.narrowPeak',
    H3K27me3 = 'H3K27me3_peaks.broadPeak'
)

peak_list <- lapply(peak_files, readPeakFile)

# Annotate all
anno_list <- lapply(peak_list, annotatePeak, TxDb = txdb, annoDb = 'org.Hs.eg.db')

# Compare annotations
plotAnnoBar(anno_list)
plotDistToTSS(anno_list)
```

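## Venn Diagram of Annotated Gene Overlap

```r
# Compare the genes assigned to each peak set (this overlaps gene symbols,
# not the peak intervals themselves)
genes_list <- lapply(anno_list, function(x) unique(as.data.frame(x)$SYMBOL))
vennplot(genes_list)
```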
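
The regions of such a Venn diagram correspond to plain set operations on the gene vectors. A minimal sketch in base R, using made-up gene names in place of the symbols extracted from `anno_list`:

```r
# Made-up gene sets standing in for the per-experiment symbol vectors
genes_list <- list(
    H3K4me3 = c('GENE_A', 'GENE_B', 'GENE_C', NA),
    H3K27ac = c('GENE_B', 'GENE_C', 'GENE_D'),
    H3K27me3 = c('GENE_C', 'GENE_E', 'GENE_C')
)

# Remove missing symbols and duplicates before comparing sets
clean_list <- lapply(genes_list, function(x) unique(x[!is.na(x)]))

# Genes present in every set (the central region of the diagram)
shared_genes <- Reduce(intersect, clean_list)

# Genes found only in the first set
h3k4me3_only <- setdiff(clean_list$H3K4me3,
                        union(clean_list$H3K27ac, clean_list$H3K27me3))

# Size of each cleaned set
set_sizes <- sapply(clean_list, length)
```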

## Coverage Plot

```r
# Plot peak coverage around TSS
# Plot peak coverage along the chromosomes (genome-wide, not TSS-centered)
covplot(peaks, weightCol = 'V5')  # V5 is score column in narrowPeak
```
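
The name `V5` comes from reading a headerless file: columns without names are numbered by position, and the fifth narrowPeak column is the score. A minimal sketch in base R with two made-up peak lines; the descriptive column names follow the ENCODE narrowPeak layout and are assigned here only for readability:

```r
# Two made-up narrowPeak records (tab-separated, 10 columns)
narrowpeak_text <- paste(
    'chr1\t1000\t1500\tpeak_1\t250\t.\t8.5\t30.2\t27.1\t240',
    'chr1\t9000\t9400\tpeak_2\t90\t.\t4.1\t12.7\t10.3\t180',
    sep = '\n'
)

# Without a header, read.table names the columns V1 ... V10
raw <- read.table(text = narrowpeak_text, sep = '\t', header = FALSE,
                  stringsAsFactors = FALSE)
names(raw)

# The same table with descriptive names for the narrowPeak fields
named <- setNames(raw, c('chrom', 'start', 'end', 'name', 'score', 'strand',
                         'signalValue', 'pValue', 'qValue', 'summit'))
named$score
```

For broadPeak or plain BED input the column count differs, so it is worth inspecting the metadata columns of the loaded peaks before choosing a `weightCol`.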

## Profile Heatmap Around TSS

**Goal:** Visualize the distribution of ChIP-seq signal around transcription start sites.

**Approach:** Extract promoter regions from the TxDb, build a tag matrix of signal at those regions, and plot as a heatmap or average profile.

```r
# Get promoter coordinates
promoter <- getPromoters(TxDb = txdb, upstream = 3000, downstream = 3000)

# Get tag matrix
tagMatrix <- getTagMatrix(peaks, windows = promoter)

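# Plot heatmap (xlim and color are arguments of older ChIPseeker releases;
# if they are rejected, check ?tagHeatmap for the installed signature)
tagHeatmap(tagMatrix, xlim = c(-3000, 3000), color = 'red')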

# Average profile
plotAvgProf(tagMatrix, xlim = c(-3000, 3000), xlab = 'Distance from TSS')
```

## Functional Enrichment of Peak Genes

**Goal:** Determine which biological processes are enriched among genes with ChIP-seq peaks in their promoters.

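**Approach:** Keep the peaks annotated as promoter, take their Entrez gene IDs, and run GO enrichment analysis with clusterProfiler.

```r
library(clusterProfiler)

# Get genes with promoter peaks. With a UCSC knownGene TxDb the geneId column
# holds Entrez IDs (a separate ENTREZID column is not always present).
promoter_df <- anno_df[grepl('Promoter', anno_df$annotation), ]
genes <- unique(promoter_df$geneId)
genes <- genes[!is.na(genes)]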

# GO enrichment
ego <- enrichGO(
    gene = genes,
    OrgDb = org.Hs.eg.db,
    ont = 'BP',
    pAdjustMethod = 'BH',
    pvalueCutoff = 0.05
)
```
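
enrichGO performs an over-representation analysis. For a single GO term, that kind of test can be sketched in base R as a one-sided hypergeometric test followed by multiple-testing correction. All counts below are made up, and this illustrates the statistic only; it is not a reproduction of clusterProfiler's internals.

```r
# Made-up counts for one GO term
universe_size <- 20000  # annotated background genes
term_size <- 200        # background genes carrying the term
n_peak_genes <- 500     # genes with peaks
overlap <- 15           # peak genes carrying the term

# Overlap expected by chance
expected_overlap <- n_peak_genes * term_size / universe_size

# P(X >= overlap) when drawing n_peak_genes from the universe
p_value <- phyper(overlap - 1, term_size, universe_size - term_size,
                  n_peak_genes, lower.tail = FALSE)

# Benjamini-Hochberg adjustment across several terms (other p-values made up)
adjusted <- p.adjust(c(p_value, 0.04, 0.5), method = 'BH')
```

The choice of background matters: restricting the universe to genes that could plausibly be bound changes both the expected overlap and the p-value.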

## Seq2Gene - All Genes in Peak Regions

```r
# Find all genes overlapping peak regions (not just nearest)
genes_in_peaks <- seq2gene(peaks, tssRegion = c(-1000, 1000), flankDistance = 3000, TxDb = txdb)
```

## Different Organisms

```r
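# Mouse (peaks must come from mm10 alignments)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(org.Mm.eg.db)
peak_anno_mm <- annotatePeak(peaks, TxDb = TxDb.Mmusculus.UCSC.mm10.knownGene, annoDb = 'org.Mm.eg.db')

# Zebrafish (peaks must come from danRer11 alignments)
library(TxDb.Drerio.UCSC.danRer11.refGene)
library(org.Dr.eg.db)
peak_anno_dr <- annotatePeak(peaks, TxDb = TxDb.Drerio.UCSC.danRer11.refGene, annoDb = 'org.Dr.eg.db')
```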

## Key Functions

| Function | Purpose |
|----------|---------|
| readPeakFile | Read peak file (BED, narrowPeak) |
| annotatePeak | Annotate peaks to genes |
| plotAnnoPie | Pie chart of annotations |
| plotAnnoBar | Bar plot of annotations |
| plotDistToTSS | Distance to TSS distribution |
| getPromoters | Get promoter regions |
| getTagMatrix | Coverage matrix around regions |
| tagHeatmap | Heatmap of signal |
| plotAvgProf | Average profile plot |
| seq2gene | Map peaks to all genes they overlap or flank (many-to-many) |

## Annotation Categories

| Category | Description |
|----------|-------------|
| Promoter | Within tssRegion of TSS |
| 5' UTR | 5' untranslated region |
| 3' UTR | 3' untranslated region |
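| Exon | Exonic region not assigned to a UTR |
| Intron | Intronic region |
| Downstream | Just downstream of the gene end (the distance cutoff depends on the ChIPseeker version; check the labels in your output) |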
| Distal Intergenic | Beyond gene regions |

## Related Skills

- peak-calling - Generate peak files with MACS3
- differential-binding - Find differential peaks
- pathway-analysis - Functional enrichment
- chipseq-visualization - Additional visualizations
