bio-methylation-methylkit
DNA methylation analysis with methylKit in R. Import Bismark coverage files, filter by coverage, normalize samples, and perform statistical comparisons. Use when analyzing single-base methylation patterns, comparing samples, or preparing data for DMR detection.
Best use case
bio-methylation-methylkit is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
DNA methylation analysis with methylKit in R. Import Bismark coverage files, filter by coverage, normalize samples, and perform statistical comparisons. Use when analyzing single-base methylation patterns, comparing samples, or preparing data for DMR detection.
Teams using bio-methylation-methylkit should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/bio-methylation-methylkit/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How bio-methylation-methylkit Compares
| Feature / Agent | bio-methylation-methylkit | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
DNA methylation analysis with methylKit in R. Import Bismark coverage files, filter by coverage, normalize samples, and perform statistical comparisons. Use when analyzing single-base methylation patterns, comparing samples, or preparing data for DMR detection.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
## Version Compatibility
Reference examples tested with: Bismark 0.24+, methylKit 1.28+
Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# methylKit Analysis
**"Analyze methylation patterns across my samples"** → Import per-cytosine methylation data, filter by coverage, normalize across samples, and test for differential methylation at individual CpG sites.
- R: `methylKit::methRead()` → `filterByCoverage()` → `normalizeCoverage()` → `calculateDiffMeth()`
## Read Bismark Coverage Files
```r
library(methylKit)
file_list <- list('sample1.bismark.cov.gz', 'sample2.bismark.cov.gz',
'sample3.bismark.cov.gz', 'sample4.bismark.cov.gz')
sample_ids <- c('ctrl_1', 'ctrl_2', 'treat_1', 'treat_2')
treatment <- c(0, 0, 1, 1) # 0 = control, 1 = treatment
meth_obj <- methRead(
location = as.list(file_list),
sample.id = as.list(sample_ids),
treatment = treatment,
assembly = 'hg38',
context = 'CpG',
pipeline = 'bismarkCoverage'
)
```
## Read Bismark cytosine Report
```r
meth_obj <- methRead(
location = as.list(file_list),
sample.id = as.list(sample_ids),
treatment = treatment,
assembly = 'hg38',
context = 'CpG',
pipeline = 'bismarkCytosineReport'
)
```
## Basic Statistics
```r
# Coverage statistics
getMethylationStats(meth_obj[[1]], plot = TRUE, both.strands = FALSE)
# Coverage per sample
getCoverageStats(meth_obj[[1]], plot = TRUE, both.strands = FALSE)
```
## Filter by Coverage
```r
# Remove CpGs with very low or very high coverage
meth_filtered <- filterByCoverage(
meth_obj,
lo.count = 10, # Minimum 10 reads
lo.perc = NULL,
hi.count = NULL,
hi.perc = 99.9 # Remove top 0.1% (likely PCR artifacts)
)
```
## Normalize Coverage
```r
# Normalize coverage between samples (recommended)
meth_norm <- normalizeCoverage(meth_filtered, method = 'median')
```
## Merge Samples (Unite)
```r
# Find common CpGs across all samples
meth_united <- unite(meth_norm, destrand = TRUE) # Combine strands
# Allow some missing data
meth_united <- unite(meth_norm, destrand = TRUE, min.per.group = 2L)
```
## Visualize Samples
```r
# Correlation between samples
getCorrelation(meth_united, plot = TRUE)
# PCA of samples
PCASamples(meth_united, screeplot = TRUE)
PCASamples(meth_united)
# Clustering
clusterSamples(meth_united, dist = 'correlation', method = 'ward.D', plot = TRUE)
```
## Differential Methylation (Single CpGs)
```r
# Calculate differential methylation
diff_meth <- calculateDiffMeth(
meth_united,
overdispersion = 'MN', # Use shrinkage
test = 'Chisq',
mc.cores = 4
)
# Get significant differentially methylated CpGs
dmcs <- getMethylDiff(diff_meth, difference = 25, qvalue = 0.01)
# Hyper vs hypomethylated
dmcs_hyper <- getMethylDiff(diff_meth, difference = 25, qvalue = 0.01, type = 'hyper')
dmcs_hypo <- getMethylDiff(diff_meth, difference = 25, qvalue = 0.01, type = 'hypo')
```
## Tile-Based Analysis (Regions)
**Goal:** Detect differentially methylated regions by aggregating single CpG data into fixed-size genomic windows.
**Approach:** Tile individual CpG measurements into 1kb windows, unite common tiles across samples, and run differential methylation testing on the aggregated tiles.
```r
# Aggregate CpGs into tiles/windows
tiles <- tileMethylCounts(meth_obj, win.size = 1000, step.size = 1000)
tiles_united <- unite(tiles, destrand = TRUE)
# Differential methylation on tiles
diff_tiles <- calculateDiffMeth(tiles_united, overdispersion = 'MN', mc.cores = 4)
dmrs <- getMethylDiff(diff_tiles, difference = 25, qvalue = 0.01)
```
## Export Results
```r
# To data frame
diff_df <- getData(dmcs)
write.csv(diff_df, 'dmcs_results.csv', row.names = FALSE)
# To BED file
library(genomation)
dmcs_gr <- as(dmcs, 'GRanges')
export(dmcs_gr, 'dmcs.bed', format = 'BED')
```
## Annotate with Genomic Features
```r
library(genomation)
gene_obj <- readTranscriptFeatures('genes.bed')
annotated <- annotateWithGeneParts(as(dmcs, 'GRanges'), gene_obj)
# Or with annotatr
library(annotatr)
annotations <- build_annotations(genome = 'hg38', annotations = 'hg38_basicgenes')
dmcs_annotated <- annotate_regions(regions = as(dmcs, 'GRanges'), annotations = annotations)
```
## Reorganize for Multi-Group Comparison
```r
# For more than 2 groups
meth_obj <- reorganize(
meth_united,
sample.ids = c('A1', 'A2', 'B1', 'B2', 'C1', 'C2'),
treatment = c(0, 0, 1, 1, 2, 2)
)
```
## Pool Replicates
```r
# Combine biological replicates
meth_pooled <- pool(meth_united, sample.ids = c('control', 'treatment'))
```
## Key Functions
| Function | Purpose |
|----------|---------|
| methRead | Read methylation files |
| filterByCoverage | Remove low/high coverage |
| normalizeCoverage | Normalize between samples |
| unite | Find common CpGs |
| calculateDiffMeth | Statistical test |
| getMethylDiff | Filter significant results |
| tileMethylCounts | Region-level analysis |
| PCASamples | PCA visualization |
| getCorrelation | Sample correlation |
## Key Parameters for calculateDiffMeth
| Parameter | Default | Description |
|-----------|---------|-------------|
| overdispersion | none | MN (shrinkage) or shrinkMN |
| test | Chisq | Chisq, F, fast.fisher |
| mc.cores | 1 | Parallel cores |
| slim | TRUE | Remove unused columns |
## Related Skills
- bismark-alignment - Generate input BAM files
- methylation-calling - Extract coverage files
- dmr-detection - Advanced DMR methods
- pathway-analysis/go-enrichment - Functional annotationRelated Skills
bio-methylation-dmr-detection
Differentially methylated region (DMR) detection using methylKit tiles, bsseq BSmooth, and DMRcate. Use when identifying contiguous genomic regions with methylation differences between experimental conditions or cell types.
bio-methylation-calling
Extract methylation calls from Bismark BAM files using bismark_methylation_extractor. Generates per-cytosine reports for CpG, CHG, and CHH contexts. Use when extracting methylation levels from aligned bisulfite sequencing data for downstream analysis.
bio-methylation-bismark-alignment
Bisulfite sequencing read alignment using Bismark with bowtie2/hisat2. Handles genome preparation and produces BAM files with methylation information. Use when aligning WGBS, RRBS, or other bisulfite-converted sequencing reads to a reference genome.
bio-methylation-based-detection
Analyzes cfDNA methylation patterns for cancer detection using cfMeDIP-seq or bisulfite sequencing with MethylDackel. Identifies cancer-specific methylation signatures and performs tissue-of-origin deconvolution. Use when using methylation biomarkers for early cancer detection or minimal residual disease.
bio-long-read-sequencing-nanopore-methylation
Calls DNA methylation from Oxford Nanopore sequencing data using signal-level analysis. Use when detecting 5mC or 6mA modifications directly from nanopore reads without bisulfite conversion.
zinc-database
Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.
zarr-python
Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.
xlsx
Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.
writing-skills
Use when creating new skills, editing existing skills, or verifying skills work before deployment
writing-plans
Use when you have a spec or requirements for a multi-step task, before touching code
wikipedia-search
Search and fetch structured content from Wikipedia using the MediaWiki API for reliable, encyclopedic information
wellally-tech
Integrate digital health data sources (Apple Health, Fitbit, Oura Ring) and connect to WellAlly.tech knowledge base. Import external health device data, standardize to local format, and recommend relevant WellAlly.tech knowledge base articles based on health data. Support generic CSV/JSON import, provide intelligent article recommendations, and help users better manage personal health data.