bulk-rna-seq-batch-correction-with-combat

Use omicverse's pyComBat wrapper to remove batch effects from merged bulk RNA-seq or microarray cohorts, export corrected matrices, and benchmark pre/post correction visualisations.

1,802 stars

byFreedomIntelligence

View on GitHub Installation ↓

Best use case

bulk-rna-seq-batch-correction-with-combat is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Use omicverse's pyComBat wrapper to remove batch effects from merged bulk RNA-seq or microarray cohorts, export corrected matrices, and benchmark pre/post correction visualisations.

Teams using bulk-rna-seq-batch-correction-with-combat should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/bulk-combat-correction/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bulk-combat-correction/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/bulk-combat-correction/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How bulk-rna-seq-batch-correction-with-combat Compares

Feature / Agent	bulk-rna-seq-batch-correction-with-combat	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Use omicverse's pyComBat wrapper to remove batch effects from merged bulk RNA-seq or microarray cohorts, export corrected matrices, and benchmark pre/post correction visualisations.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Bulk RNA-seq batch correction with ComBat

## Overview
Apply this skill when a user has multiple bulk expression matrices measured across different batches and needs to harmonise them
 before downstream analysis. It follows [`t_bulk_combat.ipynb`](../../omicverse_guide/docs/Tutorials-bulk/t_bulk_combat.ipynb), w
hich demonstrates the pyComBat workflow on ovarian cancer microarray cohorts.

## Instructions
1. **Import core libraries**
   - Load `omicverse as ov`, `anndata`, `pandas as pd`, and `matplotlib.pyplot as plt`.
   - Call `ov.ov_plot_set()` (aliased `ov.plot_set()` in some releases) to align figures with omicverse styling.
2. **Load each batch separately**
   - Read the prepared pickled matrices (or user-provided expression tables) with `pd.read_pickle(...)`/`pd.read_csv(...)`.
   - Transpose to gene × sample before wrapping them in `anndata.AnnData` objects so `adata.obs` stores sample metadata.
   - Assign a `batch` column for every cohort (`adata.obs['batch'] = '1'`, `'2'`, ...). Encourage descriptive labels when availa
ble.
3. **Concatenate on shared genes**
   - Use `anndata.concat([adata1, adata2, adata3], merge='same')` to retain the intersection of genes across batches.
   - Confirm the combined `adata` reports balanced sample counts per batch; if not, prompt users to re-check inputs.
4. **Run ComBat batch correction**
   - Execute `ov.bulk.batch_correction(adata, batch_key='batch')`.
   - Explain that corrected values are stored in `adata.layers['batch_correction']` while the original counts remain in `adata.X`.
5. **Export corrected and raw matrices**
   - Obtain DataFrames via `adata.to_df().T` (raw) and `adata.to_df(layer='batch_correction').T` (corrected).
   - Encourage saving both tables (`.to_csv(...)`) plus the harmonised AnnData (`adata.write_h5ad('adata_batch.h5ad', compressio
n='gzip')`).
6. **Benchmark the correction**
   - For per-sample variance checks, draw before/after boxplots and recolour boxes using `ov.utils.red_color`, `blue_color`, `gree
n_color` palettes to match batches.
   - Copy raw counts to a named layer with `adata.layers['raw'] = adata.X.copy()` before PCA.
   - Run `ov.pp.pca(adata, layer='raw', n_pcs=50)` and `ov.pp.pca(adata, layer='batch_correction', n_pcs=50)`.
   - Visualise embeddings with `ov.utils.embedding(..., basis='raw|original|X_pca', color='batch', frameon='small')` and repeat fo
r the corrected layer to verify mixing.
7. **Troubleshooting tips**
   - Mismatched gene identifiers cause dropped features—remind users to harmonise feature names (e.g., gene symbols) before conca
tenation.
   - pyComBat expects log-scale intensities or similarly distributed counts; recommend log-transforming strongly skewed matrices.
   - If `batch_correction` layer is missing, ensure the `batch_key` matches the column name in `adata.obs`.

## Examples
- "Combine three GEO ovarian cohorts, run ComBat, and export both the raw and corrected CSV matrices."
- "Plot PCA embeddings before and after batch correction to confirm that batches 1–3 overlap."
- "Save the harmonised AnnData file so I can reload it later for downstream DEG analysis."

## References
- Tutorial notebook: [`t_bulk_combat.ipynb`](../../omicverse_guide/docs/Tutorials-bulk/t_bulk_combat.ipynb)
- Example inputs: [`omicverse_guide/docs/Tutorials-bulk/data/combat/`](../../omicverse_guide/docs/Tutorials-bulk/data/combat/)
- Quick copy/paste commands: [`reference.md`](reference.md)

Related Skills

tcga-bulk-data-preprocessing-with-omicverse

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Guide Claude through ingesting TCGA sample sheets, expression archives, and clinical carts into omicverse, initialising survival metadata, and exporting annotated AnnData files.

single-cell-clustering-and-batch-correction-with-omicverse

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Guide Claude through omicverse's single-cell clustering workflow, covering preprocessing, QC, multimethod clustering, topic modeling, cNMF, and cross-batch integration as demonstrated in t_cluster.ipynb and t_single_batch.ipynb.

bulk-wgcna-analysis-with-omicverse

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Assist Claude in running PyWGCNA through omicverse—preprocessing expression matrices, constructing co-expression modules, visualising eigengenes, and extracting hub genes.

bulktrajblend-trajectory-interpolation

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Extend scRNA-seq developmental trajectories with BulkTrajBlend by generating intermediate cells from bulk RNA-seq, training beta-VAE and GNN models, and interpolating missing states.

bulk-rna-seq-deconvolution-with-bulk2single

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Turn bulk RNA-seq cohorts into synthetic single-cell datasets using omicverse's Bulk2Single workflow for cell fraction estimation, beta-VAE generation, and quality control comparisons against reference scRNA-seq.

bulk-rna-seq-deseq2-analysis-with-omicverse

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Walk Claude through PyDESeq2-based differential expression, including ID mapping, DE testing, fold-change thresholding, and enrichment visualisation.

bulk-rna-seq-differential-expression-with-omicverse

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Guide Claude through omicverse's bulk RNA-seq DEG pipeline, from gene ID mapping and DESeq2 normalization to statistical testing, visualization, and pathway enrichment. Use when a user has bulk count matrices and needs differential expression analysis in omicverse.

bio-single-cell-batch-integration

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Integrate multiple scRNA-seq samples/batches using Harmony, scVI, Seurat anchors, and fastMNN. Remove technical variation while preserving biological differences. Use when integrating multiple scRNA-seq batches or datasets.

bio-differential-expression-batch-correction

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Remove batch effects from RNA-seq data using ComBat, ComBat-Seq, limma removeBatchEffect, and SVA for unknown batch variables. Use when correcting batch effects in expression data.

bio-crispr-screens-batch-correction

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Batch effect correction for CRISPR screens. Covers normalization across batches, technical replicate handling, and batch-aware analysis. Use when combining screens from multiple batches or correcting systematic technical variation.

bio-batch-processing

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Process multiple sequence files in batch using Biopython. Use when working with many files, merging/splitting sequences, or automating file operations across directories.

zinc-database

1802

from FreedomIntelligence/OpenClaw-Medical-Skills

Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery.