bio-de-visualization

Visualize differential expression results using DESeq2/edgeR built-in functions. Covers plotMA, plotDispEsts, plotCounts, plotBCV, sample distance heatmaps, and p-value histograms. Use when visualizing differential expression results.

1,802 stars

byFreedomIntelligence

View on GitHub Installation ↓

Best use case

bio-de-visualization is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-de-visualization should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/bio-de-visualization/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-de-visualization/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/bio-de-visualization/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How bio-de-visualization Compares

Feature / Agent	bio-de-visualization	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

## Version Compatibility

Reference examples tested with: DESeq2 1.42+, edgeR 4.0+, ggplot2 3.5+, limma 3.58+, matplotlib 3.8+

Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# DE Visualization

Create visualizations for differential expression analysis using DESeq2 and edgeR built-in plotting functions.

## Scope

This skill covers **DE-specific built-in functions**:
- DESeq2: `plotMA()`, `plotPCA()`, `plotDispEsts()`, `plotCounts()`
- edgeR: `plotMD()`, `plotBCV()`, `plotMDS()`
- Sample distance heatmaps and p-value distributions

**For custom ggplot2/matplotlib implementations** of volcano, MA, and PCA plots, see `data-visualization/specialized-omics-plots`.

## Required Libraries

```r
library(DESeq2)
library(ggplot2)
library(pheatmap)
library(RColorBrewer)
library(ggrepel)  # For labeled points
```

## Installation

```r
install.packages(c('ggplot2', 'pheatmap', 'RColorBrewer', 'ggrepel'))
# Optional: Enhanced volcano plots
BiocManager::install('EnhancedVolcano')
```

## MA Plot

**Goal:** Visualize the relationship between mean expression and log fold change to assess DE results.

**Approach:** Plot log fold change against mean normalized counts, highlighting significant genes.

**"Make an MA plot of my DE results"** → Plot mean expression vs. fold change with significant genes colored, using plotMA or ggplot2.

### DESeq2 MA Plot

```r
# Built-in MA plot
plotMA(res, ylim = c(-5, 5), main = 'MA Plot')

# With custom alpha
plotMA(res, alpha = 0.05, ylim = c(-5, 5))

# Highlight specific genes
plotMA(res, ylim = c(-5, 5))
with(subset(res, padj < 0.01 & abs(log2FoldChange) > 2),
     points(baseMean, log2FoldChange, col = 'red', pch = 20))
```

### Custom ggplot2 MA Plot

```r
res_df <- as.data.frame(res)
res_df$significant <- res_df$padj < 0.05 & !is.na(res_df$padj)

ggplot(res_df, aes(x = log10(baseMean), y = log2FoldChange, color = significant)) +
    geom_point(alpha = 0.5, size = 1) +
    scale_color_manual(values = c('grey60', 'red')) +
    geom_hline(yintercept = 0, linetype = 'dashed') +
    labs(x = 'log10(Mean Expression)', y = 'log2 Fold Change', title = 'MA Plot') +
    theme_bw() +
    theme(legend.position = 'bottom')
```

### edgeR MA Plot

```r
# Using plotMD (mean-difference plot)
plotMD(qlf, main = 'MD Plot')
abline(h = c(-1, 1), col = 'blue', lty = 2)
```

## Volcano Plot

**Goal:** Display statistical significance against fold change magnitude to identify the most important DE genes.

**Approach:** Plot -log10(p-value) vs. log2 fold change with threshold lines and optional gene labels.

**"Create a volcano plot of differentially expressed genes"** → Scatter plot of fold change vs. significance with colored significance regions and labeled top hits.

### Basic Volcano Plot

```r
res_df <- as.data.frame(res)
res_df$significant <- res_df$padj < 0.05 & abs(res_df$log2FoldChange) > 1

ggplot(res_df, aes(x = log2FoldChange, y = -log10(pvalue), color = significant)) +
    geom_point(alpha = 0.5, size = 1) +
    scale_color_manual(values = c('grey60', 'red')) +
    geom_vline(xintercept = c(-1, 1), linetype = 'dashed', color = 'blue') +
    geom_hline(yintercept = -log10(0.05), linetype = 'dashed', color = 'blue') +
    labs(x = 'log2 Fold Change', y = '-log10(p-value)', title = 'Volcano Plot') +
    theme_bw()
```

### Volcano with Gene Labels

```r
res_df <- as.data.frame(res)
res_df$gene <- rownames(res_df)
res_df$significant <- res_df$padj < 0.05 & abs(res_df$log2FoldChange) > 1

# Label top genes
top_genes <- head(res_df[order(res_df$padj), ], 10)

ggplot(res_df, aes(x = log2FoldChange, y = -log10(pvalue))) +
    geom_point(aes(color = significant), alpha = 0.5, size = 1) +
    scale_color_manual(values = c('grey60', 'red')) +
    geom_text_repel(data = top_genes, aes(label = gene),
                    size = 3, max.overlaps = 20) +
    geom_vline(xintercept = c(-1, 1), linetype = 'dashed') +
    geom_hline(yintercept = -log10(0.05), linetype = 'dashed') +
    labs(x = 'log2 Fold Change', y = '-log10(p-value)') +
    theme_bw()
```

### EnhancedVolcano

```r
library(EnhancedVolcano)

EnhancedVolcano(res,
    lab = rownames(res),
    x = 'log2FoldChange',
    y = 'pvalue',
    pCutoff = 0.05,
    FCcutoff = 1,
    title = 'Differential Expression',
    subtitle = 'Treatment vs Control')
```

## PCA Plot

**Goal:** Assess sample clustering and identify batch effects or outliers via dimensionality reduction.

**Approach:** Apply variance-stabilizing transformation then project samples onto principal components, coloring by experimental variables.

**"Show me a PCA plot of my samples"** → Perform PCA on transformed expression data and visualize sample separation by condition and batch.

### DESeq2 PCA

```r
# Variance stabilizing transformation first
vsd <- vst(dds, blind = FALSE)

# Basic PCA
plotPCA(vsd, intgroup = 'condition')

# With more options
plotPCA(vsd, intgroup = c('condition', 'batch'), ntop = 500)
```

### Custom PCA with ggplot2

```r
vsd <- vst(dds, blind = FALSE)
pca_data <- plotPCA(vsd, intgroup = c('condition', 'batch'), returnData = TRUE)
percentVar <- round(100 * attr(pca_data, 'percentVar'))

ggplot(pca_data, aes(x = PC1, y = PC2, color = condition, shape = batch)) +
    geom_point(size = 4) +
    xlab(paste0('PC1: ', percentVar[1], '% variance')) +
    ylab(paste0('PC2: ', percentVar[2], '% variance')) +
    ggtitle('PCA Plot') +
    theme_bw() +
    theme(legend.position = 'right')
```

### edgeR PCA (via limma)

```r
library(limma)
log_cpm <- cpm(y, log = TRUE)
plotMDS(log_cpm, col = as.numeric(group), pch = 16)
legend('topright', legend = levels(group), col = 1:nlevels(group), pch = 16)
```

## Heatmaps

**Goal:** Visualize expression patterns of significant genes across samples to reveal clusters and condition effects.

**Approach:** Z-score normalize VST-transformed counts for significant genes and cluster with pheatmap, annotating by condition.

**"Make a heatmap of the top differentially expressed genes"** → Extract significant genes, z-score normalize, and create a clustered heatmap with sample annotations.

### Top DE Genes Heatmap

```r
library(pheatmap)

# Get top significant genes
sig_genes <- rownames(subset(res, padj < 0.01))

# Get normalized counts
vsd <- vst(dds, blind = FALSE)
mat <- assay(vsd)[sig_genes, ]

# Scale by row (z-score)
mat_scaled <- t(scale(t(mat)))

# Create annotation
annotation_col <- data.frame(
    condition = colData(dds)$condition,
    row.names = colnames(mat)
)

pheatmap(mat_scaled,
         annotation_col = annotation_col,
         show_rownames = FALSE,
         clustering_distance_rows = 'correlation',
         clustering_distance_cols = 'correlation',
         color = colorRampPalette(c('blue', 'white', 'red'))(100),
         main = 'Top DE Genes')
```

### Sample Distance Heatmap

```r
vsd <- vst(dds, blind = FALSE)

# Calculate sample distances
sampleDists <- dist(t(assay(vsd)))
sampleDistMatrix <- as.matrix(sampleDists)

# Annotation
annotation <- data.frame(
    condition = colData(dds)$condition,
    row.names = colnames(dds)
)

pheatmap(sampleDistMatrix,
         annotation_col = annotation,
         annotation_row = annotation,
         clustering_distance_rows = sampleDists,
         clustering_distance_cols = sampleDists,
         color = colorRampPalette(c('white', 'steelblue'))(100),
         main = 'Sample Distance Matrix')
```

### Gene Expression Heatmap

```r
# Select genes of interest
genes_of_interest <- c('gene1', 'gene2', 'gene3', 'gene4', 'gene5')
mat <- assay(vsd)[genes_of_interest, ]

pheatmap(mat,
         scale = 'row',
         annotation_col = annotation_col,
         show_rownames = TRUE,
         cluster_cols = TRUE,
         cluster_rows = TRUE,
         main = 'Genes of Interest')
```

## Dispersion Plot

**Goal:** Assess the fit of the dispersion model to verify DE analysis assumptions.

**Approach:** Plot gene-wise, fitted, and final dispersion estimates against mean expression.

### DESeq2

```r
plotDispEsts(dds, main = 'Dispersion Estimates')
```

### edgeR

```r
plotBCV(y, main = 'Biological Coefficient of Variation')
```

## Counts Plot for Individual Genes

**Goal:** Visualize expression of a specific gene across samples and conditions.

**Approach:** Extract per-sample counts for a gene and plot by condition using plotCounts or ggplot2.

### DESeq2

```r
# Plot counts for a specific gene
plotCounts(dds, gene = 'GENE_NAME', intgroup = 'condition')

# With ggplot2
d <- plotCounts(dds, gene = 'GENE_NAME', intgroup = 'condition', returnData = TRUE)
ggplot(d, aes(x = condition, y = count, color = condition)) +
    geom_point(position = position_jitter(width = 0.1), size = 3) +
    scale_y_log10() +
    ggtitle('GENE_NAME Expression') +
    theme_bw()
```

### edgeR

```r
# Get CPM for a gene
gene_idx <- which(rownames(y) == 'GENE_NAME')
cpm_gene <- cpm(y)[gene_idx, ]

# Plot
df <- data.frame(cpm = cpm_gene, group = group)
ggplot(df, aes(x = group, y = cpm, color = group)) +
    geom_point(position = position_jitter(width = 0.1), size = 3) +
    scale_y_log10() +
    labs(y = 'CPM', title = 'GENE_NAME Expression') +
    theme_bw()
```

## P-value Histogram

**Goal:** Diagnose the quality of the DE analysis by examining the raw p-value distribution.

**Approach:** Histogram of raw p-values; a uniform distribution with a peak near zero indicates a well-calibrated test.

```r
# Check p-value distribution (should be uniform under null with peak near 0)
res_df <- as.data.frame(res)
ggplot(res_df, aes(x = pvalue)) +
    geom_histogram(bins = 50, fill = 'steelblue', color = 'white') +
    labs(x = 'P-value', y = 'Frequency', title = 'P-value Distribution') +
    theme_bw()
```

## Saving Plots

**Goal:** Export publication-quality plots in vector or raster formats.

**Approach:** Use pdf/png devices or ggsave with appropriate resolution and dimensions.

```r
# Save as PDF (vector)
pdf('volcano_plot.pdf', width = 8, height = 6)
# ... plot code ...
dev.off()

# Save as PNG (raster)
png('volcano_plot.png', width = 800, height = 600, res = 150)
# ... plot code ...
dev.off()

# Using ggsave for ggplot objects
p <- ggplot(...) + ...
ggsave('plot.pdf', p, width = 8, height = 6)
ggsave('plot.png', p, width = 8, height = 6, dpi = 300)
```

## Color Palettes

**Goal:** Select appropriate color schemes for heatmaps and categorical data.

**Approach:** Use RColorBrewer palettes -- diverging for expression, sequential for distances, qualitative for groups.

```r
# For heatmaps
library(RColorBrewer)

# Diverging (for expression: blue-white-red)
colorRampPalette(rev(brewer.pal(n = 7, name = 'RdBu')))(100)

# Sequential (for distances)
colorRampPalette(brewer.pal(n = 9, name = 'Blues'))(100)

# For categorical groups
brewer.pal(n = 8, name = 'Set1')
```

## Quick Reference: Common Plots

| Plot | Purpose | Function |
|------|---------|----------|
| MA plot | LFC vs mean expression | `plotMA()`, `plotMD()` |
| Volcano | LFC vs significance | ggplot2, EnhancedVolcano |
| PCA | Sample clustering | `plotPCA()`, `plotMDS()` |
| Heatmap | Gene patterns | `pheatmap()` |
| Dispersion | Model fit | `plotDispEsts()`, `plotBCV()` |
| Counts | Individual genes | `plotCounts()` |

## Related Skills

- deseq2-basics - Generate DESeq2 results for visualization
- edger-basics - Generate edgeR results for visualization
- de-results - Filter genes before visualization
- data-visualization/specialized-omics-plots - Custom ggplot2 volcano/MA/PCA functions
- data-visualization/heatmaps-clustering - Advanced heatmap customization

Related Skills

scientific-visualization

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Create publication figures with matplotlib/seaborn/plotly. Multi-panel layouts, error bars, significance markers, colorblind-safe, export PDF/EPS/TIFF, for journal-ready scientific plots.

bio-tcr-bcr-analysis-repertoire-visualization

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Create publication-quality visualizations of immune repertoire data including circos plots, clone tracking, diversity plots, and network graphs. Use when generating figures for repertoire comparisons, clonal dynamics, or V(D)J gene usage.

bio-spatial-transcriptomics-spatial-visualization

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Visualize spatial transcriptomics data using Squidpy and Scanpy. Create tissue plots with gene expression, clusters, and annotations overlaid on histology images. Use when visualizing spatial expression patterns.

bio-pathway-enrichment-visualization

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Visualize enrichment results using enrichplot package functions. Use when creating publication-quality figures from clusterProfiler results. Covers dotplot, barplot, cnetplot, emapplot, gseaplot2, ridgeplot, and treeplot.

bio-metagenomics-visualization

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Visualize metagenomic profiles using R (phyloseq, microbiome) and Python (matplotlib, seaborn). Create stacked bar plots, heatmaps, PCA plots, and diversity analyses. Use when creating publication-quality figures from MetaPhlAn, Bracken, or other taxonomic profiling output.

bio-hi-c-analysis-hic-visualization

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Visualize Hi-C contact matrices, TADs, loops, and genomic features using matplotlib, cooltools, and HiCExplorer. Create triangle plots, virtual 4C, and multi-track figures. Use when visualizing contact matrices or genomic features.

bio-copy-number-cnv-visualization

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Visualize copy number profiles, segments, and compare across samples. Create publication-quality plots of CNV data from CNVkit, GATK, or other callers. Use when creating genome-wide CNV plots, sample heatmaps, or chromosome-level visualizations.

bio-chipseq-visualization

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Visualize ChIP-seq data using deepTools, Gviz, and ChIPseeker. Create heatmaps, profile plots, and genome browser tracks. Visualize signal around peaks, TSS, or custom regions. Use when visualizing ChIP-seq signal and peaks.

zinc-database

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.

zarr-python

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

xlsx

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.

writing-skills

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Use when creating new skills, editing existing skills, or verifying skills work before deployment