methylation-variability-analysis
This skill provides a complete and streamlined workflow for performing methylation variability and epigenetic heterogeneity analysis from whole-genome bisulfite sequencing (WGBS) data. It is designed for researchers who want to quantify CpG-level variability across biological samples or conditions, identify highly variable CpGs (HVCs), and explore epigenetic heterogeneity.
Best use case
methylation-variability-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
This skill provides a complete and streamlined workflow for performing methylation variability and epigenetic heterogeneity analysis from whole-genome bisulfite sequencing (WGBS) data. It is designed for researchers who want to quantify CpG-level variability across biological samples or conditions, identify highly variable CpGs (HVCs), and explore epigenetic heterogeneity.
Teams using methylation-variability-analysis should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/25-methylation-variability/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How methylation-variability-analysis Compares
| Feature / Agent | methylation-variability-analysis | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
This skill provides a complete and streamlined workflow for performing methylation variability and epigenetic heterogeneity analysis from whole-genome bisulfite sequencing (WGBS) data. It is designed for researchers who want to quantify CpG-level variability across biological samples or conditions, identify highly variable CpGs (HVCs), and explore epigenetic heterogeneity.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
Best AI Skills for ChatGPT
Find the best AI skills to adapt into ChatGPT workflows for research, writing, summarization, planning, and repeatable assistant tasks.
Best AI Skills for Claude
Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.
AI Agent for Product Research
Browse AI agent skills for product research, competitive analysis, customer discovery, and structured product decision support.
SKILL.md Source
# SKILL: Methylation Variability & Heterogeneity Analysis
## Overview
Main steps include:
- Refer to the **Inputs & Outputs** section to check available inputs and design the output structure.
- **Always prompt user** for genome assembly used.
- **Always prompt user** for which columns in the BED files are methylation fraction/percent and coverage and strand.
- Building a multi-sample CpG methylation matrix from WGBS coverage files.
- Computing **between-sample variability** at CpG level (variance, MAD, CV).
---
## When to use this skill
Use this methylKit-based variability pipeline when you want to:
- Quantify **between-sample variability** at CpG level (e.g., across replicates, cell types, conditions).
- Identify **highly variable CpGs (HVCs)** as candidate epigenetically heterogeneous loci.
- Explore **epigenetic heterogeneity** between groups (e.g., GM12878 vs K562, disease vs control).
---
## Inputs & Outputs
### Inputs
`<sample1>.bed`
`<sample2>.bed`
### Outputs
```bash
methylation_variability/
stats/
top_variable_CpGs.tsv
CpG_variability_stats.tsv
plots/
heatmap_top_variable_CpGs.pdf
distribution_CpG_variance.pdf
mean_vs_variance_scatter.pdf
temp/
```
---
## Decision Tree
### Step 1: Prepare the sample meta data
```r
library(methylKit)
file.list <- list(
"sample1.cov",
"sample2.cov",
"sample3.cov"
)
sample.id <- list("S1", "S2", "S3")
treatment <- c(0, 1, 1) # e.g. 0 = control, 1 = treated
# Read methylation data
myobj <- methRead(
location = file.list,
sample.id = sample.id,
assembly = "hg38", # provided by user
treatment = treatment,
context = "CpG",
pipeline = list(
fraction = FALSE, # percMeth is 0–100, fraction is 0-1, depend on inputs
chr.col = 1,
start.col = 2,
end.col = 3,
strand.col = 6, # provided by user
coverage.col = 10, # provided by user
freqC.col = 11 # provided by user
)
)
# Optional filtering: remove low / extremely high coverage CpGs
filtered.myobj <- filterByCoverage(
myobj,
lo.count = 10, lo.perc = NULL,
hi.count = 99.9, hi.perc = TRUE
)
# Unite CpGs across samples (common CpG sites)
meth <- unite(filtered.myobj, destrand = TRUE)
```
### Step 2: Statistical analysis
```r
d <- getData(meth.united)
numCs.cols <- grep("numCs", colnames(d), value = TRUE)
cov.cols <- grep("coverage", colnames(d), value = TRUE)
pmat01 <- d[, numCs.cols] / d[, cov.cols]
pmat01 <- as.matrix(data.frame(pmat01))
var.cpg <- rowVars(pmat01, na.rm = TRUE) # Variance across samples
mad.cpg <- rowMads(pmat01, na.rm = TRUE) # Median absolute deviation (MAD)
# Coefficient of variation (CV = sd / mean)
mean.cpg <- rowMeans(pmat01, na.rm = TRUE)
sd.cpg <- sqrt(var.cpg)
cv.cpg <- sd.cpg / (mean.cpg + 1e-6) # add small constant to avoid division by zero
# Assemble statistics table
var.stats <- data.frame(
chr = d$chr,
start = d$start,
end = d$end,
mean = mean.cpg,
variance = var.cpg,
MAD = mad.cpg,
CV = cv.cpg,
stringsAsFactors = FALSE
)
var.stats <- var.stats[order(-var.stats$variance), ] # Sort by variance (descending)
# Save full table
write.table(
var.stats,
file = "CpG_variability_stats.tsv",
sep = " ",
quote = FALSE,
row.names = FALSE
)
```
### Step 3: high variable CpG selection
```r
topN <- 1000
top.idx <- head(order(-var.cpg), topN)
pmat.top <- pmat01[top.idx, , drop = FALSE]
# Save top-variable CpGs table
write.table(
var.stats[match(rownames(pmat.top), rownames(var.stats)), ],
file = "top_variable_CpGs.tsv",
sep = " ",
quote = FALSE,
row.names = FALSE
)
```
### Step 4: Visualization
```r
group.factor <- factor(ifelse(treatment == 0, "GM12878", "K562"))
ha <- HeatmapAnnotation(Group = group.factor)
Heatmap(
pmat.top,
name = "methylation",
show_row_names = FALSE,
show_column_names = TRUE,
top_annotation = ha,
cluster_rows = TRUE,
cluster_columns = TRUE
)
# Distribution of the CpG variability
var.df <- data.frame(
variance = var.cpg,
log10_variance = log10(var.cpg + 1e-8)
)
ggplot(var.df, aes(x = log10_variance)) +
geom_histogram(bins = 50) +
theme_minimal() +
labs(
title = "CpG-wise methylation variance (log10 scale)",
x = "log10(variance + 1e-8)",
y = "Count of CpGs"
)
# 3. Mean vs Variance scatter plot
ggplot(var.stats, aes(x = mean_methylation, y = variance)) +
geom_hex(bins = 50) +
scale_fill_viridis_c(trans = "log10") +
theme_minimal() +
labs(
title = "Mean Methylation vs Variance",
x = "Mean Methylation",
y = "Variance",
fill = "Count (log10)"
) +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold")
)
```
---
## Recommended Extensions
- You can change 'lo.count', 'hi.perc', and 'topN' depending on coverage and dataset size.
- If you want group-wise differential variability (e.g., GM12878 vs K562),
- you can apply variance/Bartlett/Levene tests per CpG using 'pmat01' and 'treatment'.
- Add region-level annotation (promoters, gene bodies, CpG islands) using `GenomicRanges` and TxDb annotations, then compute variability at region level by aggregating CpG variability.
- Implement differential variability tests between groups (e.g., variance comparison between GM12878 and K562).
- Combine this variability pipeline with DMR analysis from methylKit to simultaneously look at mean shifts and heterogeneity.Related Skills
Advanced RE Analysis
Specialized reverse engineering analysis workflows for binary analysis, pattern recognition, and vulnerability assessment
adaptive-temporal-analysis-integration
Integrate adaptive temporal analysis for drift detection.
abaqus-thermal-analysis
Complete workflow for heat transfer analysis - steady-state and transient thermal. Use when user asks about temperature distribution, conduction, convection, or heat flow.
abaqus-static-analysis
Complete workflow for static structural analysis. Use when analyzing stress, displacement, or reaction forces under constant loads. For strength and stiffness evaluation.
abaqus-modal-analysis
Complete workflow for modal/frequency analysis - extract natural frequencies and mode shapes. Use for vibration analysis and resonance avoidance.
abaqus-fatigue-analysis
Workflow for fatigue and durability analysis - cycle counting, damage accumulation, and fatigue life prediction.
abaqus-dynamic-analysis
Complete workflow for dynamic analysis. Use when user mentions impact, crash, drop test, transient, or time-varying response. Handles explicit and implicit dynamics.
abaqus-coupled-analysis
Complete workflow for coupled thermomechanical analysis. Use when user mentions thermal stress, thermal expansion, or temperature causing deformation.
abaqus-contact-analysis
Analyze multi-body contact. Use when user mentions parts touching, friction between surfaces, bolt-plate contact, press fit, or assembly with contact.
A/B Test Analysis
Design and analyze A/B tests, calculate statistical significance, and determine sample sizes for conversion optimization and experiment validation
a-share-analysis
Comprehensive China A-share stock analysis covering fundamental analysis, technical analysis, policy impact assessment, and market-specific features (T+1 trading, price limits, northbound capital flow). Use when user asks about A股分析, Chinese mainland stocks, Shanghai/Shenzhen listed stocks, or needs analysis considering China market characteristics.
differential-region-analysis
The differential-region-analysis pipeline identifies genomic regions exhibiting significant differences in signal intensity between experimental conditions using a count-based framework and DESeq2. It supports detection of both differentially accessible regions (DARs) from open-chromatin assays (e.g., ATAC-seq, DNase-seq) and differential transcription factor (TF) binding regions from TF-centric assays (e.g., ChIP-seq, CUT&RUN, CUT&Tag). The pipeline can start from aligned BAM files or a precomputed count matrix and is suitable whenever genomic signal can be summarized as read counts per region.