methylation-variability-analysis
This skill provides a complete and streamlined workflow for performing methylation variability and epigenetic heterogeneity analysis from whole-genome bisulfite sequencing (WGBS) data. It is designed for researchers who want to quantify CpG-level variability across biological samples or conditions, identify highly variable CpGs (HVCs), and explore epigenetic heterogeneity.
Best use case
methylation-variability-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using methylation-variability-analysis can expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/methylation-variability-analysis/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How methylation-variability-analysis Compares
| Feature / Agent | methylation-variability-analysis | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
SKILL.md Source
# SKILL: Methylation Variability & Heterogeneity Analysis
## Overview
Main steps include:
- Refer to the **Inputs & Outputs** section to check available inputs and design the output structure.
- **Always prompt user** for genome assembly used.
- **Always prompt user** for which columns in the BED files hold the methylation value (fraction or percent), the coverage, and the strand.
- Building a multi-sample CpG methylation matrix from WGBS coverage files.
- Computing **between-sample variability** at CpG level (variance, MAD, CV).
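
As a minimal illustration of the three variability statistics named above, the following base R sketch computes variance, MAD, and CV on a small made-up methylation matrix. The matrix values and the `toy.*` names are hypothetical and exist only for this example; the pipeline itself uses `matrixStats` on real data.

```r
# Toy methylation fractions (rows = CpGs, columns = samples); values are invented.
toy <- matrix(
  c(0.10, 0.12, 0.11,   # stable, low methylation
    0.20, 0.80, 0.50,   # variable across samples
    0.90, 0.95, 0.85),  # stable, high methylation
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cpg1", "cpg2", "cpg3"), c("S1", "S2", "S3"))
)

toy.var  <- apply(toy, 1, var)        # sample variance per CpG
toy.mad  <- apply(toy, 1, mad)        # MAD scaled by the default constant 1.4826
toy.mean <- rowMeans(toy)
toy.cv   <- sqrt(toy.var) / toy.mean  # coefficient of variation (sd / mean)

print(round(cbind(mean = toy.mean, variance = toy.var,
                  MAD = toy.mad, CV = toy.cv), 4))
```

In this toy matrix `cpg2` has the largest variance and MAD, which is the kind of site the workflow would flag as highly variable.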
---
## When to use this skill
Use this methylKit-based variability pipeline when you want to:
- Quantify **between-sample variability** at CpG level (e.g., across replicates, cell types, conditions).
- Identify **highly variable CpGs (HVCs)** as candidate epigenetically heterogeneous loci.
- Explore **epigenetic heterogeneity** between groups (e.g., GM12878 vs K562, disease vs control).
---
## Inputs & Outputs
### Inputs
`<sample1>.bed`
`<sample2>.bed`
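
Before asking the user which columns hold strand, coverage, and methylation values, it can help to preview a file. The sketch below uses base R only and writes a tiny invented 11-column BED file to a temporary path; the file contents, the `pct.col` choice, and the heuristics are assumptions for illustration, and the user should still confirm the final column mapping.

```r
# Write a tiny, invented 11-column BED file to a temporary location.
bed.lines <- c(
  "chr1\t10468\t10469\t.\t12\t+\t10468\t10469\t0,255,0\t12\t83",
  "chr1\t10470\t10471\t.\t15\t+\t10470\t10471\t0,255,0\t15\t100",
  "chr1\t10483\t10484\t.\t9\t-\t10483\t10484\t255,0,0\t9\t0"
)
bed.path <- tempfile(fileext = ".bed")
writeLines(bed.lines, bed.path)

# Read the first rows without assuming a header line.
preview <- read.table(bed.path, sep = "\t", header = FALSE, nrows = 1000,
                      stringsAsFactors = FALSE, comment.char = "")

# One row per column: index, storage class, and first value.
col.summary <- data.frame(
  column  = seq_along(preview),
  class   = vapply(preview, function(x) class(x)[1], character(1)),
  example = vapply(preview, function(x) as.character(x[1]), character(1)),
  stringsAsFactors = FALSE
)
print(col.summary)

# Heuristic 1: a strand column contains only "+", "-" or "*".
is.strand <- vapply(preview, function(x) all(x %in% c("+", "-", "*")), logical(1))
print(which(is.strand))

# Heuristic 2: values above 1 in the candidate methylation column suggest
# a percentage (0-100) rather than a fraction (0-1).
pct.col <- 11  # hypothetical candidate; confirm with the user
looks.percent <- max(preview[[pct.col]], na.rm = TRUE) > 1
print(looks.percent)
```

If `looks.percent` is `TRUE`, the methylation column is most likely a percentage, which corresponds to `fraction = FALSE` when reading the files.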
### Outputs
```bash
methylation_variability/
    stats/
        top_variable_CpGs.tsv
        CpG_variability_stats.tsv
    plots/
        heatmap_top_variable_CpGs.pdf
        distribution_CpG_variance.pdf
        mean_vs_variance_scatter.pdf
    temp/
```
---
## Decision Tree
### Step 1: Prepare the sample metadata
```r
library(methylKit)

# Input files, one per sample (see Inputs); file names are placeholders
file.list <- list(
  "sample1.bed",
  "sample2.bed",
  "sample3.bed"
)
sample.id <- list("S1", "S2", "S3")
treatment <- c(0, 1, 1) # e.g. 0 = control, 1 = treated

# Read methylation data
myobj <- methRead(
  location = file.list,
  sample.id = sample.id,
  assembly = "hg38", # provided by user
  treatment = treatment,
  context = "CpG",
  header = FALSE, # assumption: BED files have no header line; set TRUE otherwise
  pipeline = list(
    fraction = FALSE, # FALSE if methylation is a percentage (0-100), TRUE if a fraction (0-1)
    chr.col = 1,
    start.col = 2,
    end.col = 3,
    strand.col = 6, # provided by user
    coverage.col = 10, # provided by user
    freqC.col = 11 # provided by user
  )
)

# Optional filtering: remove low-coverage and extremely high-coverage CpGs
filtered.myobj <- filterByCoverage(
  myobj,
  lo.count = 10, lo.perc = NULL,   # drop CpGs with fewer than 10 reads
  hi.count = NULL, hi.perc = 99.9  # drop CpGs above the 99.9th coverage percentile
)

# Unite CpGs across samples (keep CpG sites covered in every sample)
meth <- unite(filtered.myobj, destrand = TRUE)
```
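
The coverage filter in Step 1 drops CpGs with very low depth and those in the extreme upper tail of coverage, which often reflect PCR duplicates or repeats. The base R sketch below mimics that idea on invented read depths; it is not methylKit's implementation, and the exact boundary handling inside `filterByCoverage` may differ.

```r
set.seed(1)

# Invented per-CpG read depths: mostly ~30x, plus a few extreme values.
coverage <- c(rpois(995, lambda = 30), 400, 520, 610, 3, 5)

lo.count <- 10    # minimum read count to keep a CpG
hi.perc  <- 99.9  # upper coverage percentile to keep

# Coverage value at the chosen percentile.
hi.cut <- quantile(coverage, probs = hi.perc / 100)

# Keep CpGs inside [lo.count, hi.cut].
keep <- coverage >= lo.count & coverage <= hi.cut

print(hi.cut)
print(table(keep))
```

Raising `lo.count` trades the number of retained CpGs for more precise per-site methylation estimates, which matters here because noisy low-coverage sites can inflate between-sample variance.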
### Step 2: Statistical analysis
```r
library(matrixStats) # provides rowVars() and rowMads()

d <- getData(meth) # united object from Step 1
numCs.cols <- grep("numCs", colnames(d), value = TRUE)
cov.cols <- grep("coverage", colnames(d), value = TRUE)

# Per-sample methylation fraction (0-1) for each CpG
pmat01 <- d[, numCs.cols] / d[, cov.cols]
pmat01 <- as.matrix(data.frame(pmat01))

var.cpg <- rowVars(pmat01, na.rm = TRUE) # Variance across samples
mad.cpg <- rowMads(pmat01, na.rm = TRUE) # Median absolute deviation (MAD)

# Coefficient of variation (CV = sd / mean)
mean.cpg <- rowMeans(pmat01, na.rm = TRUE)
sd.cpg <- sqrt(var.cpg)
cv.cpg <- sd.cpg / (mean.cpg + 1e-6) # add small constant to avoid division by zero

# Assemble statistics table
var.stats <- data.frame(
  chr = d$chr,
  start = d$start,
  end = d$end,
  mean = mean.cpg,
  variance = var.cpg,
  MAD = mad.cpg,
  CV = cv.cpg,
  stringsAsFactors = FALSE
)
var.stats <- var.stats[order(-var.stats$variance), ] # Sort by variance (descending)

# Save full table as tab-separated text in the output layout described above
stats.dir <- file.path("methylation_variability", "stats")
dir.create(stats.dir, recursive = TRUE, showWarnings = FALSE)
write.table(
  var.stats,
  file = file.path(stats.dir, "CpG_variability_stats.tsv"),
  sep = "\t",
  quote = FALSE,
  row.names = FALSE
)
```
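
The small constant in the CV denominator matters for CpGs that are essentially unmethylated in every sample. The base R sketch below, on invented values, shows both effects: without the constant an all-zero CpG yields `NaN`, and a CpG with a tiny mean can still receive a large CV even though its absolute variation is negligible. The `low.*` names are hypothetical.

```r
# Invented methylation fractions for three CpGs across three samples.
low <- rbind(
  all_zero  = c(0.00, 0.00, 0.00),  # never methylated
  near_zero = c(0.00, 0.00, 0.01),  # one sample barely above zero
  mid       = c(0.40, 0.50, 0.60)   # intermediate methylation
)

low.mean <- rowMeans(low)
low.sd   <- apply(low, 1, sd)

cv.raw <- low.sd / low.mean           # 0 / 0 gives NaN for all_zero
cv.eps <- low.sd / (low.mean + 1e-6)  # stabilised version used in Step 2

print(cbind(mean = low.mean, sd = low.sd, cv.raw = cv.raw, cv.eps = cv.eps))
```

Because CV inflates at low means, ranking by CV alone tends to favour nearly unmethylated sites; this is one reason to inspect variance and MAD alongside it.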
### Step 3: Highly variable CpG selection
```r
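topN <- 1000
top.idx <- head(order(-var.cpg), topN) # row indices of the topN most variable CpGs
pmat.top <- pmat01[top.idx, , drop = FALSE]

# Save top-variable CpGs table.
# pmat01 carries no row names, so rows are looked up by their original row
# index, which var.stats keeps as its row names after sorting.
stats.dir <- file.path("methylation_variability", "stats")
dir.create(stats.dir, recursive = TRUE, showWarnings = FALSE)
write.table(
  var.stats[match(as.character(top.idx), rownames(var.stats)), ],
  file = file.path(stats.dir, "top_variable_CpGs.tsv"),
  sep = "\t",
  quote = FALSE,
  row.names = FALSE
)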
```
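
A fixed `topN` may be too large for small datasets or too small for genome-wide ones. One possible alternative, sketched here in base R on simulated variances, is to select a fixed fraction of CpGs instead. The `top.frac` parameter and the simulated values are assumptions for illustration, not part of the original workflow.

```r
set.seed(42)

# Invented per-CpG variances standing in for var.cpg.
toy.var <- rexp(5000, rate = 50)

top.frac <- 0.01  # hypothetical choice: keep the top 1% most variable CpGs
topN.scaled <- max(1, round(top.frac * length(toy.var)))

# Same ordering logic as Step 3, but with a size that scales with the data.
hvc.idx <- head(order(toy.var, decreasing = TRUE), topN.scaled)

print(length(hvc.idx))
print(range(toy.var[hvc.idx]))
```

The smallest selected variance can then be reported as the effective HVC threshold, which makes results easier to compare across datasets of different sizes.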
### Step 4: Visualization
```r
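library(ComplexHeatmap)
library(ggplot2) # geom_hex() additionally requires the hexbin package

plot.dir <- file.path("methylation_variability", "plots")
dir.create(plot.dir, recursive = TRUE, showWarnings = FALSE)

# 1. Heatmap of the top-variable CpGs
# Group labels are examples; adapt them to the user's treatment coding
group.factor <- factor(ifelse(treatment == 0, "GM12878", "K562"))
ha <- HeatmapAnnotation(Group = group.factor)
ht <- Heatmap(
  pmat.top,
  name = "methylation",
  show_row_names = FALSE,
  show_column_names = TRUE,
  top_annotation = ha,
  cluster_rows = TRUE,
  cluster_columns = TRUE
)
pdf(file.path(plot.dir, "heatmap_top_variable_CpGs.pdf"))
draw(ht)
dev.off()

# 2. Distribution of the CpG variability
var.df <- data.frame(
  variance = var.cpg,
  log10_variance = log10(var.cpg + 1e-8)
)
p.dist <- ggplot(var.df, aes(x = log10_variance)) +
  geom_histogram(bins = 50) +
  theme_minimal() +
  labs(
    title = "CpG-wise methylation variance (log10 scale)",
    x = "log10(variance + 1e-8)",
    y = "Count of CpGs"
  )
ggsave(file.path(plot.dir, "distribution_CpG_variance.pdf"), p.dist)

# 3. Mean vs variance scatter plot (var.stats stores the mean in column "mean")
p.mv <- ggplot(var.stats, aes(x = mean, y = variance)) +
  geom_hex(bins = 50) +
  scale_fill_viridis_c(trans = "log10") +
  theme_minimal() +
  labs(
    title = "Mean Methylation vs Variance",
    x = "Mean Methylation",
    y = "Variance",
    fill = "Count (log10)"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold")
  )
ggsave(file.path(plot.dir, "mean_vs_variance_scatter.pdf"), p.mv)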
```
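
When reading the mean vs variance plot, it may help to remember that methylation fractions are bounded in [0, 1], so for n samples the sample variance of a CpG with mean m cannot exceed n / (n - 1) * m * (1 - m). This gives the scatter its dome shape and means the top-variance list is dominated by intermediately methylated sites. The base R sketch below checks the bound on simulated values; the beta-distributed data and all names are assumptions for illustration.

```r
set.seed(3)

n.samples <- 6
# Simulated methylation fractions in [0, 1]; rows = CpGs, columns = samples.
sim <- matrix(rbeta(2000 * n.samples, shape1 = 0.5, shape2 = 0.5),
              ncol = n.samples)

m <- rowMeans(sim)
v <- apply(sim, 1, var)

# Upper bound on the sample variance for values confined to [0, 1].
bound <- n.samples / (n.samples - 1) * m * (1 - m)

print(all(v <= bound + 1e-12))
print(summary(v / bound))
```

CpGs close to the bound behave almost like on/off switches between samples, whereas CpGs far below it vary only modestly around their mean.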
---
## Recommended Extensions
- You can change `lo.count`, `hi.perc`, and `topN` depending on coverage and dataset size.
- For group-wise differential variability (e.g., GM12878 vs K562), apply a per-CpG variance test (e.g., F, Bartlett, or Levene test) using `pmat01` and `treatment`. Each group needs at least two samples for its variance to be estimated.
- Add region-level annotation (promoters, gene bodies, CpG islands) using `GenomicRanges` and TxDb annotations, then compute variability at region level by aggregating CpG variability.
- Combine this variability pipeline with DMR analysis from methylKit to look at mean shifts and heterogeneity together.
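
The differential variability extension could look roughly like the following base R sketch, which applies `bartlett.test` per CpG to a simulated matrix and adjusts the p-values with Benjamini-Hochberg. The simulated data, the group labels `A` and `B`, and the choice of Bartlett's test are assumptions for illustration; Bartlett's test is sensitive to non-normality, so a more robust test may be preferable on real methylation data.

```r
set.seed(7)

n.cpg <- 200
grp <- factor(rep(c("A", "B"), each = 5))  # hypothetical labels, 5 samples per group

# Simulated methylation fractions: same mean in both groups everywhere.
mat <- matrix(rnorm(n.cpg * 10, mean = 0.5, sd = 0.03), nrow = n.cpg)
# First 20 CpGs: group B is made much more variable than group A.
mat[1:20, grp == "B"] <- rnorm(20 * 5, mean = 0.5, sd = 0.2)
mat <- pmin(pmax(mat, 0), 1)  # keep values within [0, 1]

# Per-CpG test of equal variance between groups.
p.bartlett <- apply(mat, 1, function(x) bartlett.test(x, grp)$p.value)
p.adj <- p.adjust(p.bartlett, method = "BH")

print(sum(p.adj < 0.05))         # CpGs flagged as differentially variable
print(head(order(p.bartlett)))   # indices of the strongest signals
```

On real data, `mat` would be replaced by `pmat01` and `grp` by a factor built from `treatment`.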