global-methylation-profile

This skill performs genome-wide DNA methylation profiling. It supports single-sample and multi-sample workflows to compute methylation density distributions, genomic feature distribution of the methylation profile, and sample-level clustering/PCA. Use it when you want to systematically characterize global methylation patterns from WGBS or similar per-CpG methylation call files.

by diegosouzapw

Best use case

global-methylation-profile is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using global-methylation-profile should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/global-methylation-profile/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/development/global-methylation-profile/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/global-methylation-profile/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill
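
The manual steps above can be sketched as a small shell function. This is an illustration only: the function name `install_skill` and its two arguments (the downloaded file and the project root) are assumptions made for the example, not something the skill itself provides.

```shell
#!/bin/sh
# Hypothetical helper: copy a downloaded SKILL.md into a project's skill folder.
# $1 = path to the downloaded SKILL.md
# $2 = project root (defaults to the current directory)
install_skill() {
  src="$1"
  project="${2:-.}"
  dest="$project/.claude/skills/global-methylation-profile"
  if [ ! -f "$src" ]; then
    echo "SKILL.md not found: $src" >&2
    return 1
  fi
  # Create the skill folder if needed, then place the file under its expected name
  mkdir -p "$dest" && cp "$src" "$dest/SKILL.md"
}

# Example call (the download path is a placeholder):
#   install_skill "$HOME/Downloads/SKILL.md" "$PWD"
```

After the file is in place, restart the agent so it can discover the skill.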

How global-methylation-profile Compares

| Feature / Agent | global-methylation-profile | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

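What does this skill do?

It performs genome-wide DNA methylation profiling from WGBS or similar per-CpG methylation call files: methylation density distributions, genomic feature distribution, and sample-level clustering/PCA.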

Where can I find the source code?

The source code is on GitHub in the diegosouzapw/awesome-omni-skill repository, under skills/development/global-methylation-profile/.

SKILL.md Source

# Global DNA Methylation Profiling

## Overview

Main steps include:

- Refer to the **Inputs & Outputs** section to check available inputs and design the output structure.
- **Always prompt the user** for the genome assembly used.
- **Always prompt the user** for which columns hold the methylation fraction/percent, coverage, and strand.
- Analyze the genomic feature distribution of methylation for each sample.
- Compute and visualize genome-wide methylation density distributions.
- For multi-sample datasets, prepare the matrix of methylation data.
- Perform PCA and hierarchical clustering to assess sample similarity based on global methylation.
- **Never use MCP tools in this skill**; use R scripts instead.

---

## When to use this skill

Use the **global-methylation-profile** skill when you want to:

- Characterize **global DNA methylation status** of one or multiple samples (e.g. normal vs tumor, different cell types).  
- Compare broad methylation patterns across samples:  
  - Are some samples globally hypo-/hyper-methylated?  
  - Are certain chromosomes or genomic regions more strongly affected?  
- Explore the genomic feature distribution of your methylation dataset (e.g. promoter hypomethylation, gene body hypermethylation).  
- Perform **unsupervised clustering/PCA** to see if samples separate by condition based on genome-wide methylation patterns.

---

## Inputs & Outputs

### Inputs
`<sample>.bed`
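
Because the column layout of per-CpG call files varies between tools, it can help to show the user the first data row with its column numbers before asking which columns hold strand, coverage, and methylation. The sketch below assumes a tab-separated file and skips `#`, `track`, and `browser` header lines; the function name `show_columns` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "column-number<TAB>value" for the first data row
# of a tab-separated methylation call file ($1).
show_columns() {
  awk -F '\t' '
    /^#/ || /^track/ || /^browser/ { next }   # skip comment and header lines
    { for (i = 1; i <= NF; i++) printf "%d\t%s\n", i, $i; exit }
  ' "$1"
}

# Example call (the file name is a placeholder):
#   show_columns sample1.bed
```

The numbered output can be pasted into the prompt so the user only has to name the three column indices.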

### Outputs

```bash
global_methylation_profile/
  stats/
    summary_statistics.tsv
    ...
  plots/
    sample1_genomic_feature_pie.pdf
    sample2_genomic_feature_pie.pdf
    ... # Other samples
    allSamples_methylation_density_overlay.pdf
    PCA_scatterplot.pdf
    sample_correlation_heatmap.pdf
    ...
  logs/
  temp/ # all the temp files
```
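
A minimal sketch of creating this layout up front, so that later steps can write into `stats/`, `plots/`, `logs/`, and `temp/`. The function name `make_output_dirs` and the optional parent-directory argument are assumptions for illustration; the directory names come from the tree above.

```shell
#!/bin/sh
# Hypothetical helper: create the output layout under a parent directory.
# $1 = parent directory (defaults to the current directory)
make_output_dirs() {
  root="${1:-.}/global_methylation_profile"
  for sub in stats plots logs temp; do
    mkdir -p "$root/$sub" || return 1
  done
  # Print the root so callers can capture it
  echo "$root"
}

# Example call:
#   out_dir=$(make_output_dirs .)
```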

---

## Decision Tree

### Step 1: Prepare the sample metadata
```r
library(methylKit)
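# Example input: per-CpG methylation call files, one per sample.
# The column indices in `pipeline` below are placeholders (strand = 6,
# coverage = 10, percent methylation = 11, as in a bedMethyl-style layout);
# confirm the actual positions with the user.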
file.list <- list(
  "sample1.cov",
  "sample2.cov",
  "sample3.cov"
)

sample.id <- list("S1", "S2", "S3")
treatment <- c(0, 1, 1)  # e.g. 0 = control, 1 = treated

# Read methylation data
myobj <- methRead(
  location = file.list,
  sample.id = sample.id,
  assembly  = "hg38", # provided by user
  treatment = treatment,
  context   = "CpG",
  pipeline = list(
    fraction = FALSE,  # FALSE: methylation column is a percentage (0–100); TRUE: a fraction (0–1)
    chr.col = 1,
    start.col = 2,
    end.col = 3,
    strand.col = 6, # provided by user
    coverage.col = 10, # provided by user
    freqC.col = 11 # provided by user
  )
)

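# Optional filtering: drop CpGs covered by fewer than 10 reads or with
# coverage above the 99.9th percentile (hi.perc takes the percentile,
# hi.count would take an absolute read count)
filtered.myobj <- filterByCoverage(
  myobj,
  lo.count = 10,   lo.perc = NULL,
  hi.count = NULL, hi.perc = 99.9
)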

# Unite CpGs across samples (common CpG sites)
meth <- unite(filtered.myobj, destrand = TRUE)
```

### Step 2: Analyze Genomic Feature Distribution of CpGs

Annotate CpGs with genomic features (promoter, exon, intron, intergenic, etc.) using genomation, then summarize where CpGs (or methylated CpGs) fall for each sample.

```r
library(genomation)
library(TxDb.Hsapiens.UCSC.hg38.knownGene) # choose the TxDb that matches the user's genome assembly
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
# exons
exons <- unlist(exonsBy(txdb))
names(exons) <- NULL
mcols(exons)$type <- "exon"
# introns
introns <- unlist(intronsByTranscript(txdb))
names(introns) <- NULL
mcols(introns)$type <- "intron"
# promoters
promoters.gr <- promoters(txdb, upstream = 2000, downstream = 200)
names(promoters.gr) <- NULL
mcols(promoters.gr)$type <- "promoter"
# TSS (1 bp): upstream = 0, downstream = 1 yields a single-base range at the TSS
TSSes <- promoters(txdb, upstream = 0, downstream = 1)
names(TSSes) <- NULL
mcols(TSSes)$type <- "TSS"
# 3'UTR
utr3 <- unlist(threeUTRsByTranscript(txdb))
names(utr3) <- NULL
mcols(utr3)$type <- "UTR3"
# 5'UTR
utr5 <- unlist(fiveUTRsByTranscript(txdb))
names(utr5) <- NULL
mcols(utr5)$type <- "UTR5"

gene.obj <- GRangesList(
  promoters = promoters.gr,
  exons     = exons,
  introns   = introns,
  TSSes     = TSSes,
  UTR5      = utr5,
  UTR3      = utr3
  # add other features here as needed
)

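# Output folders, following the layout in the Outputs section
stats_dir <- file.path("global_methylation_profile", "stats")
plot_dir  <- file.path("global_methylation_profile", "plots")
dir.create(stats_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(plot_dir,  recursive = TRUE, showWarnings = FALSE)

for (i in seq_along(filtered.myobj)) {
  sample_id <- filtered.myobj[[i]]@sample.id
  cpg.gr <- as(filtered.myobj[[i]], "GRanges")
  ann.gene <- annotateWithGeneParts(cpg.gr, gene.obj)
  # Named numeric vector: percentage of CpGs per feature class
  feature.summary <- getTargetAnnotationStats(ann.gene, percentage = TRUE)
  # Keep the feature names as a column; row names are not written out
  out_tab <- data.frame(
    feature = names(feature.summary),
    percent = as.numeric(feature.summary)
  )
  write.table(
    out_tab,
    file = file.path(stats_dir, paste0(sample_id, "_feature_annotation_stats.tsv")),
    sep  = "\t", quote = FALSE, row.names = FALSE
  )
  pdf(file.path(plot_dir, paste0(sample_id, "_genomic_feature_pie.pdf")))
  plotTargetAnnotation(
    ann.gene,
    main = paste("Genomic feature distribution of CpGs -", sample_id)
  )
  dev.off()
}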
```

### Step 3: Compute & visualize genome-wide methylation density distributions

```r
library(ggplot2)

# Convert to percent methylation matrix: rows = CpGs, cols = samples
meth.mat <- percMethylation(meth)  # values 0–100

df.long <- reshape2::melt(
  as.data.frame(meth.mat),
  variable.name = "Sample",
  value.name   = "Methylation"
)

ggplot(df.long, aes(x = Methylation, color = Sample)) +
  geom_density() +
  theme_bw() +
  xlab("Percent methylation") +
  ggtitle("Genome-wide methylation density across samples")
```

### Step 4: PCA & Hierarchical Clustering of Multi-sample Methylation

- Use CpG methylation profiles across samples to assess sample similarity and batch effects.

```r
library(ggplot2)
library(pheatmap)

# Meth matrix: rows = CpGs, cols = samples (0–100)
meth.mat <- percMethylation(meth)

# (Optional) Filter CpGs by variability
cpg.sd <- apply(meth.mat, 1, sd, na.rm = TRUE)
keep.var <- cpg.sd > 0
meth.var <- meth.mat[keep.var, ]
if (sum(keep.var) > 10000) {
  keep.idx <- order(cpg.sd[keep.var], decreasing = TRUE)[1:10000]
  meth.var <- meth.var[keep.idx, ]
}

# Z-score transformation (per CpG) – helps clustering
meth.scaled <- t(scale(t(meth.var)))  # rows scaled

pca <- prcomp(t(meth.scaled), center = FALSE, scale. = FALSE)
pca.df <- data.frame(
  Sample = colnames(meth.scaled),
  PC1 = pca$x[, 1],
  PC2 = pca$x[, 2],
  Treatment = factor(treatment, labels = c("Control", "Treatment"))
)

ggplot(pca.df, aes(x = PC1, y = PC2, color = Treatment, label = Sample)) +
  geom_point(size = 3) +
  geom_text(vjust = -1) +
  theme_bw() +
  ggtitle("PCA of global CpG methylation") +
  xlab(paste0("PC1 (", round(summary(pca)$importance[2, 1] * 100, 1), "%)")) +
  ylab(paste0("PC2 (", round(summary(pca)$importance[2, 2] * 100, 1), "%)"))

dist.samples <- dist(t(meth.scaled), method = "euclidean")
hc <- hclust(dist.samples, method = "complete")

plot(hc, main = "Hierarchical clustering of samples (methylation)",
     xlab = "", sub = "")

cor.samples <- cor(meth.var, use = "pairwise.complete.obs")

pheatmap(cor.samples,
         clustering_method = "complete",
         main = "Sample correlation based on CpG methylation")
```
