methylation-variability-analysis

This skill provides a workflow for methylation variability and epigenetic heterogeneity analysis of whole-genome bisulfite sequencing (WGBS) data. It is aimed at researchers who want to quantify CpG-level variability across biological samples or conditions, identify highly variable CpGs (HVCs), and explore epigenetic heterogeneity.

by majiayu000

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/25-methylation-variability/SKILL.md --create-dirs "https://raw.githubusercontent.com/majiayu000/claude-skill-registry/main/skills/data/25-methylation-variability/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/25-methylation-variability/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

SKILL.md Source

# SKILL: Methylation Variability & Heterogeneity Analysis

## Overview

The main steps are:

- Refer to the **Inputs & Outputs** section to check the available inputs and design the output structure.
- **Always prompt the user** for the genome assembly.
- **Always prompt the user** for which columns of the BED files hold the methylation level (fraction or percent), the coverage, and the strand.
- Build a multi-sample CpG methylation matrix from the WGBS coverage files.
- Compute **between-sample variability** at the CpG level (variance, MAD, CV).
- Select the most variable CpGs and visualize them.
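
The column prompt is easier to answer when the user can see the first rows of a file. Below is a minimal base-R sketch, assuming tab-separated BED files without a header line. `preview_bed_columns` is a hypothetical helper (not part of methylKit), and the toy file uses an assumed 11-column layout.

```r
# Hypothetical helper: show the first rows of a BED file with 1-based column
# indices as names, so the user can say which columns hold strand, coverage,
# and methylation level.
preview_bed_columns <- function(path, n = 5) {
  x <- read.table(path, header = FALSE, sep = "\t", nrows = n,
                  stringsAsFactors = FALSE, comment.char = "")
  colnames(x) <- paste0("col", seq_len(ncol(x)))
  x
}

# Toy file with an assumed 11-column layout (made-up values)
toy <- tempfile(fileext = ".bed")
writeLines(c(
  "chr1\t10468\t10469\t.\t12\t+\t10468\t10469\t0,255,0\t12\t83",
  "chr1\t10470\t10471\t.\t15\t+\t10470\t10471\t0,255,0\t15\t93"
), toy)

preview <- preview_bed_columns(toy)
print(preview)
```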

---

## When to use this skill

Use this methylKit-based variability pipeline when you want to:

- Quantify **between-sample variability** at CpG level (e.g., across replicates, cell types, conditions).
- Identify **highly variable CpGs (HVCs)** as candidate epigenetically heterogeneous loci.
- Explore **epigenetic heterogeneity** between groups (e.g., GM12878 vs K562, disease vs control).

---

## Inputs & Outputs

### Inputs

- `<sample1>.bed`
- `<sample2>.bed`

### Outputs
```text
methylation_variability/
  stats/
    top_variable_CpGs.tsv
    CpG_variability_stats.tsv
  plots/
    heatmap_top_variable_CpGs.pdf
    distribution_CpG_variance.pdf
    mean_vs_variance_scatter.pdf
  temp/
```
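
The output tree can be created up front so that later steps only need to build file paths. This is a small sketch in base R; the variable names `out.dir` and `stats.file` are assumptions for illustration.

```r
# Create the output tree in the current working directory
out.dir <- "methylation_variability"
for (sub in c("stats", "plots", "temp")) {
  dir.create(file.path(out.dir, sub), recursive = TRUE, showWarnings = FALSE)
}

# Build output paths from the tree instead of hard-coding them
stats.file <- file.path(out.dir, "stats", "CpG_variability_stats.tsv")
```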

---

## Decision Tree

### Step 1: Prepare the sample metadata
```r
library(methylKit)

# Input files, one per sample (paths are placeholders)
file.list <- list(
  "sample1.bed",
  "sample2.bed",
  "sample3.bed"
)

sample.id <- list("S1", "S2", "S3")
treatment <- c(0, 1, 1)  # e.g. 0 = control, 1 = treated

# Read methylation data with a generic column mapping
myobj <- methRead(
  location  = file.list,
  sample.id = sample.id,
  assembly  = "hg38",  # provided by user
  treatment = treatment,
  context   = "CpG",
  pipeline  = list(
    fraction     = FALSE,  # FALSE if methylation is percent (0-100), TRUE if fraction (0-1)
    chr.col      = 1,
    start.col    = 2,
    end.col      = 3,
    strand.col   = 6,   # provided by user
    coverage.col = 10,  # provided by user
    freqC.col    = 11   # provided by user
  )
)

# Optional filtering: drop CpGs with fewer than 10 reads or with coverage above
# the 99.9th percentile. hi.perc takes the percentile itself, not a logical flag.
filtered.myobj <- filterByCoverage(
  myobj,
  lo.count = 10, lo.perc = NULL,
  hi.count = NULL, hi.perc = 99.9
)

# Unite CpGs across samples (keep sites covered in all samples);
# destrand = TRUE merges the reads from both strands of each CpG
meth <- unite(filtered.myobj, destrand = TRUE)
```

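
To make the coverage filter in this step concrete, the base-R sketch below applies the same two rules (a minimum read count and an upper percentile cap) to a toy coverage vector. It illustrates the idea only and is not methylKit's internal implementation; the numbers are made up, and the 90th percentile is used so the effect is visible on ten values.

```r
# Toy per-CpG coverage for one sample (made-up numbers)
coverage <- c(3, 12, 25, 40, 18, 9, 500, 22, 30, 15)

lo.count <- 10                                # minimum reads per CpG
hi.cut   <- quantile(coverage, probs = 0.9)   # upper cap (0.999 in a real run)

# Keep CpGs that are neither under-covered nor extreme outliers
keep <- coverage >= lo.count & coverage <= hi.cut
coverage[keep]
```

The very high value (500) is dropped because coverage spikes of that kind often come from PCR duplication or repeats rather than from genuine signal.
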
### Step 2: Statistical analysis

```r
library(matrixStats)  # provides rowVars() and rowMads()

# Per-sample methylation fraction (0-1) = methylated C counts / coverage
d <- getData(meth)  # 'meth' is the united object from Step 1
numCs.cols <- grep("numCs", colnames(d), value = TRUE)
cov.cols   <- grep("coverage", colnames(d), value = TRUE)
pmat01 <- d[, numCs.cols] / d[, cov.cols]
pmat01 <- as.matrix(data.frame(pmat01))

var.cpg <- rowVars(pmat01, na.rm = TRUE) # Variance across samples
mad.cpg <- rowMads(pmat01, na.rm = TRUE) # Median absolute deviation (MAD)

# Coefficient of variation (CV = sd / mean)
mean.cpg <- rowMeans(pmat01, na.rm = TRUE)
sd.cpg <- sqrt(var.cpg)
cv.cpg <- sd.cpg / (mean.cpg + 1e-6)  # add small constant to avoid division by zero

# Assemble statistics table
var.stats <- data.frame(
  chr = d$chr,
  start = d$start,
  end = d$end,
  mean = mean.cpg,
  variance = var.cpg,
  MAD = mad.cpg,
  CV = cv.cpg,
  stringsAsFactors = FALSE
)

var.stats <- var.stats[order(-var.stats$variance), ] # Sort by variance (descending)

# Save full table
write.table(
  var.stats,
  file = "CpG_variability_stats.tsv",
  sep = "\t",
  quote = FALSE,
  row.names = FALSE
)
```
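
The three statistics behave differently near the ends of the methylation range. The toy example below uses base R only and made-up values to show this. The row names and the `pmat.toy` and `toy.stats` objects are illustrative assumptions, not part of the pipeline.

```r
# Toy methylation fractions: 3 CpGs (rows) x 4 samples (columns), made-up values
pmat.toy <- rbind(
  stable   = c(0.80, 0.82, 0.78, 0.80),
  variable = c(0.10, 0.90, 0.20, 0.80),
  low      = c(0.00, 0.02, 0.00, 0.02)
)

toy.stats <- data.frame(
  mean     = rowMeans(pmat.toy),
  variance = apply(pmat.toy, 1, var),
  MAD      = apply(pmat.toy, 1, mad),  # base R mad(), scaled by 1.4826 by default
  CV       = apply(pmat.toy, 1, sd) / (rowMeans(pmat.toy) + 1e-6)
)
print(round(toy.stats, 4))
```

The `variable` row has the largest variance, but the `low` row has the largest CV, because dividing by a mean close to zero inflates the ratio. This is one reason to rank by variance or MAD and to read CV with care for CpGs that are almost unmethylated.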

### Step 3: Highly variable CpG selection

```r
topN <- 1000
top.idx <- head(order(-var.cpg), topN)  # row indices of the topN most variable CpGs

pmat.top <- pmat01[top.idx, , drop = FALSE]

# Save top-variable CpGs table. var.stats was re-sorted in Step 2, but its row
# names still hold the original row indices, so match on those.
write.table(
  var.stats[match(as.character(top.idx), rownames(var.stats)), ],
  file = "top_variable_CpGs.tsv",
  sep = "\t",
  quote = FALSE,
  row.names = FALSE
)
```
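
A fixed `topN` may not suit every dataset. As a hedged alternative, the sketch below compares a capped top-N selection with a quantile-based cutoff on made-up variances. The names `var.toy` and `top.frac` and the 1% fraction are assumptions for illustration.

```r
set.seed(1)
var.toy <- runif(2000)^4   # made-up, right-skewed variances

topN     <- min(1000, length(var.toy))   # never ask for more rows than exist
top.frac <- 0.01                         # assumed alternative: keep the top 1%
cutoff   <- quantile(var.toy, probs = 1 - top.frac, na.rm = TRUE)

idx.topN  <- head(order(var.toy, decreasing = TRUE), topN)  # fixed-size selection
idx.quant <- which(var.toy >= cutoff)                       # size scales with the data
length(idx.quant)
```

A quantile cutoff keeps the selected fraction constant when the number of united CpGs changes between runs, whereas a fixed `topN` keeps the heatmap size constant.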

### Step 4: Visualization

```r
library(ComplexHeatmap)
library(ggplot2)  # geom_hex() also needs the hexbin package installed

# 1. Heatmap of the top-variable CpGs (adjust the group labels to your samples)
group.factor <- factor(ifelse(treatment == 0, "GM12878", "K562"))
ha <- HeatmapAnnotation(Group = group.factor)

Heatmap(
  pmat.top,
  name = "methylation",
  show_row_names = FALSE,
  show_column_names = TRUE,
  top_annotation = ha,
  cluster_rows = TRUE,
  cluster_columns = TRUE
)

# 2. Distribution of the CpG variability
var.df <- data.frame(
  variance = var.cpg,
  log10_variance = log10(var.cpg + 1e-8)
)

ggplot(var.df, aes(x = log10_variance)) +
  geom_histogram(bins = 50) +
  theme_minimal() +
  labs(
    title = "CpG-wise methylation variance (log10 scale)",
    x = "log10(variance + 1e-8)",
    y = "Count of CpGs"
  )

# 3. Mean vs variance scatter plot ('mean' is the column name set in Step 2)
ggplot(var.stats, aes(x = mean, y = variance)) +
    geom_hex(bins = 50) +
    scale_fill_viridis_c(trans = "log10") +
    theme_minimal() +
    labs(
      title = "Mean Methylation vs Variance",
      x = "Mean Methylation",
      y = "Variance",
      fill = "Count (log10)"
    ) +
    theme(
      plot.title = element_text(hjust = 0.5, size = 14, face = "bold")
    )
```
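
The plotting calls in this step draw to the active graphics device, while the Outputs tree lists PDF files. The base-R sketch below writes one plot to a PDF. `plots.dir`, `pdf.file`, and the made-up `var.toy` vector (a stand-in for `var.cpg`) are assumptions. For ggplot objects, `ggsave()` serves the same purpose, and a ComplexHeatmap object can be drawn with `draw()` between `pdf()` and `dev.off()`.

```r
plots.dir <- file.path("methylation_variability", "plots")  # path from the Outputs tree
dir.create(plots.dir, recursive = TRUE, showWarnings = FALSE)

set.seed(1)
var.toy <- runif(500)^4   # stand-in for var.cpg (made-up values)

# Open a PDF device, draw the histogram, then close the device to flush the file
pdf.file <- file.path(plots.dir, "distribution_CpG_variance.pdf")
pdf(pdf.file, width = 6, height = 4)
hist(log10(var.toy + 1e-8), breaks = 50,
     main = "CpG-wise methylation variance (log10 scale)",
     xlab = "log10(variance + 1e-8)", ylab = "Count of CpGs")
invisible(dev.off())
```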
---

## Recommended Extensions

- Adjust `lo.count`, `hi.perc`, and `topN` to the coverage and size of your dataset.
- For group-wise differential variability (e.g., GM12878 vs K562), apply a per-CpG test of equal variances (F, Bartlett, or Levene) using `pmat01` and `treatment`.
- Add region-level annotation (promoters, gene bodies, CpG islands) using `GenomicRanges` and TxDb annotations, then compute region-level variability by aggregating CpG variability.
- Combine this variability pipeline with DMR analysis from methylKit to look at mean shifts and heterogeneity together.
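
The differential variability idea can be sketched with `bartlett.test()` from base R's stats package. The example runs on made-up data with four samples per group; `pmat.toy` and `group` are stand-ins for `pmat01` and `treatment`. Bartlett's test is sensitive to non-normality and methylation fractions are bounded, so treat the p-values as a screening aid under those assumptions.

```r
set.seed(42)
# Made-up methylation fractions: 50 CpGs x 8 samples, 4 samples per group
group <- factor(rep(c("A", "B"), each = 4))
pmat.toy <- matrix(runif(50 * 8, 0.3, 0.7), nrow = 50)

# Make the first CpG tight in group A and widely spread in group B
pmat.toy[1, group == "A"] <- c(0.50, 0.51, 0.49, 0.50)
pmat.toy[1, group == "B"] <- c(0.05, 0.95, 0.10, 0.90)

# Bartlett's test of equal variances, one CpG at a time
p.bartlett <- apply(pmat.toy, 1, function(x) bartlett.test(x, group)$p.value)

# Benjamini-Hochberg correction across CpGs
p.adj <- p.adjust(p.bartlett, method = "BH")

head(order(p.bartlett), 3)  # CpGs with the strongest evidence of unequal variance
```

With only two or three samples per group the test has very little power, so this extension is most useful for larger cohorts.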
