bulk-rna-seq-differential-expression-with-omicverse

Guide Claude through omicverse's bulk RNA-seq DEG pipeline, from gene ID mapping and DESeq2 normalization to statistical testing, visualization, and pathway enrichment. Use when a user has bulk count matrices and needs differential expression analysis in omicverse.

Best use case

bulk-rna-seq-differential-expression-with-omicverse is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bulk-rna-seq-differential-expression-with-omicverse should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/bulk-deg-analysis/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bulk-deg-analysis/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/bulk-deg-analysis/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

Frequently Asked Questions

Where can I find the source code?

The source is in the FreedomIntelligence/OpenClaw-Medical-Skills repository on GitHub, at skills/bulk-deg-analysis/SKILL.md (the same file fetched by the installation command above).

SKILL.md Source

# Bulk RNA-seq differential expression with omicverse

## Overview
Follow this skill to run the end-to-end differentially expressed gene (DEG) workflow showcased in [`t_deg.ipynb`](../../omicverse_guide/docs/Tutorials-bulk/t_deg.ipynb). It assumes the user provides a raw gene-level count matrix (e.g., from featureCounts) and wants to analyse bulk RNA-seq cohorts inside omicverse.
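For orientation, featureCounts writes a `# Program:featureCounts ...` comment line, then a header row beginning `Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length`, followed by one count column per BAM file (often named by its full path). The stdlib-only sketch below shows the kind of cleanup the loading step needs: skip the comment line, index by gene ID, keep only the count columns, and reduce each column name to a bare sample ID. The helper names and the toy input are hypothetical; inside omicverse the same work is done on a pandas DataFrame.

```python
import csv
import io

# featureCounts places these annotation columns between Geneid and the per-BAM counts.
ANNOTATION_COLUMNS = {"Chr", "Start", "End", "Strand", "Length"}


def clean_sample_name(column):
    """'/data/bam/ctrl1.bam' -> 'ctrl1': drop the directory path and a trailing '.bam'."""
    name = column.split("/")[-1]
    return name[: -len(".bam")] if name.endswith(".bam") else name


def read_featurecounts(text):
    """Parse featureCounts text into {gene_id: {sample: count}} (hypothetical helper)."""
    # Skip the '# Program:featureCounts ...' comment line; pandas does this with header=1.
    body = "\n".join(line for line in text.splitlines() if not line.startswith("#"))
    reader = csv.reader(io.StringIO(body), delimiter="\t")
    header = next(reader)
    # Keep only per-sample count columns; column 0 is the gene ID.
    keep = [i for i in range(1, len(header)) if header[i] not in ANNOTATION_COLUMNS]
    samples = [clean_sample_name(header[i]) for i in keep]
    return {row[0]: {s: int(row[i]) for s, i in zip(samples, keep)} for row in reader if row}


# Toy input, not real data.
example = (
    "# Program:featureCounts (toy example)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\t/data/bam/ctrl1.bam\t/data/bam/treat1.bam\n"
    "g1\tchr1\t100\t900\t+\t801\t512\t640\n"
    "g2\tchr2\t50\t450\t-\t401\t0\t37\n"
)
counts = read_featurecounts(example)
print(counts)
# {'g1': {'ctrl1': 512, 'treat1': 640}, 'g2': {'ctrl1': 0, 'treat1': 37}}
```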

## Instructions
1. **Set up the session**
   - Import `omicverse as ov`, `scanpy as sc`, and `matplotlib.pyplot as plt`.
   - Call `ov.plot_set()` so downstream plots adopt omicverse styling.
2. **Prepare ID mapping assets**
   - When gene IDs must be converted to gene symbols, instruct the user to download mapping pairs via `ov.utils.download_geneid_annotation_pair()` and store them under `genesets/`.
   - Mention the available prebuilt genomes (T2T-CHM13, GRCh38, GRCh37, GRCm39, danRer7, danRer11) and that users can generate their own mapping from GTF files if needed.
3. **Load the raw counts**
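   - Read tab-delimited featureCounts output with `ov.pd.read_csv(..., sep='\t', header=1, index_col=0)`; `header=1` skips the `# Program:featureCounts` comment line so the `Geneid` row becomes the header.
   - If the annotation columns featureCounts writes before the counts (`Chr`, `Start`, `End`, `Strand`, `Length`) are still present, drop them so only per-sample counts remain.
   - Strip any leading directory path and the trailing `.bam` from column names with a list comprehension so sample IDs are clean.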
4. **Map gene identifiers**
   - Run `ov.bulk.Matrix_ID_mapping(counts_df, 'genesets/pair_<GENOME>.tsv')` to replace `gene_id` entries with gene symbols.
5. **Initialise the DEG object**
   - Create `dds = ov.bulk.pyDEG(mapped_counts)`.
   - Handle duplicate gene symbols with `dds.drop_duplicates_index()`, which keeps the most highly expressed entry for each symbol.
6. **Normalise and estimate size factors**
   - Execute `dds.normalize()` to calculate DESeq2 size factors, which correct for differences in sequencing depth (library size) and RNA composition between samples. Size factors do not remove batch effects.
7. **Run differential testing**
   - Collect treatment and control replicate labels into lists.
   - Call `dds.deg_analysis(treatment_groups, control_groups, method='ttest')` for the default Welch t-test.
   - Offer optional alternatives: `method='edgepy'` for edgeR-like tests and `method='limma'` for limma-style modelling.
8. **Filter and threshold results**
   - Note that lowly expressed genes are retained by default; filter with `dds.result.loc[dds.result['log2(BaseMean)'] > 1]` when needed, and assign the filtered frame back to `dds.result` if later steps should use it.
   - Set dynamic fold-change and significance cutoffs via `dds.foldchange_set(fc_threshold=-1, pval_threshold=0.05, logp_max=6)` (`fc_threshold=-1` auto-selects based on log2FC distribution).
9. **Visualise differential expression**
   - Produce volcano plots with `dds.plot_volcano(title=..., figsize=..., plot_genes=... or plot_genes_num=...)` to highlight key genes.
   - Generate per-gene boxplots using `dds.plot_boxplot(genes=[...], treatment_groups=..., control_groups=..., figsize=..., legend_bbox=...)`; adjust y-axis tick labels if required.
10. **Perform pathway enrichment (optional)**
    - Download curated pathway libraries through `ov.utils.download_pathway_database()`.
    - Load genesets with `ov.utils.geneset_prepare(<path>, organism='Mouse'|'Human'|...)`.
    - Build the DEG gene list from `dds.result.loc[dds.result['sig'] != 'normal'].index`.
    - Run enrichment with `ov.bulk.geneset_enrichment(gene_list=deg_genes, pathways_dict=..., pvalue_type='auto', organism=...)`. Encourage users without internet access to provide a `background` gene list, for example all genes that were tested.
    - Visualise single-library results via `ov.bulk.geneset_plot(...)` and combine multiple ontologies using `ov.bulk.geneset_plot_multi(enr_dict, colors_dict, num=...)`.
11. **Document outputs**
    - Suggest exporting `dds.result` and enrichment tables to CSV for downstream reporting.
    - Encourage users to save figures generated by matplotlib (`plt.savefig(...)`) when running outside notebooks.
12. **Troubleshooting tips**
    - Ensure sample labels in `treatment_groups`/`control_groups` exactly match column names post-cleanup.
    - Verify required packages (`omicverse`, `pyComplexHeatmap`, `gseapy`) are installed for enrichment visualisations.
    - Remind users that internet access is required the first time they download gene mappings or pathway databases.
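Steps 4 and 5 rename gene IDs to symbols and then collapse duplicated symbols to the most highly expressed row. The dictionary-based sketch below shows that combined logic. The helper name, the `{gene_id: symbol}` lookup (for instance, built from a two-column ID/symbol table), ranking rows by mean raw count, and keeping unmapped IDs unchanged are all assumptions for illustration, not confirmed behaviour of `ov.bulk.Matrix_ID_mapping` or `dds.drop_duplicates_index()`.

```python
from statistics import mean


def map_and_deduplicate(counts, id_to_symbol):
    """Rename rows to gene symbols, keeping the highest-mean row per symbol (hypothetical helper).

    counts: {gene_id: [raw count per sample]}
    id_to_symbol: {gene_id: symbol}
    """
    best = {}
    for gene_id, row in counts.items():
        # Assumption: IDs with no mapping keep their original identifier.
        symbol = id_to_symbol.get(gene_id, gene_id)
        # Keep whichever row for this symbol has the higher mean count.
        if symbol not in best or mean(row) > mean(best[symbol]):
            best[symbol] = row
    return best


counts = {"id1": [10, 20], "id2": [100, 200], "id3": [5, 5]}  # toy values
print(map_and_deduplicate(counts, {"id1": "SymA", "id2": "SymA"}))
# {'SymA': [100, 200], 'id3': [5, 5]}
```

Ranking by mean expression is one reasonable reading of "highest expressed"; ranking by total counts gives the same order when every row has the same number of samples.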
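Step 6 attributes `dds.normalize()` to DESeq2 size factors. DESeq2's default estimator is the median-of-ratios method: each gene's geometric mean across samples serves as a pseudo-reference, every count is divided by it, and the median of those ratios within a sample is that sample's size factor. The sketch below is a plain-Python illustration of that textbook calculation on nested lists (function names are hypothetical); it is not omicverse's implementation and omits any extra steps omicverse may apply.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: list of genes, each a list of raw counts (one per sample).
    """
    n_samples = len(counts[0])
    # Genes with a zero in any sample have a geometric mean of 0 and are skipped,
    # as in DESeq2's default estimator. If every gene has a zero, median() raises.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Ratio of each gene's count in sample j to its geometric mean, in log space.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]


raw = [[10, 20], [20, 40], [30, 60], [0, 5]]  # toy data; sample 2 has twice the depth
factors = size_factors(raw)
print([round(f, 4) for f in factors])  # [0.7071, 1.4142]
print(normalize(raw[:3], factors))     # each row is (approximately) equal across samples
```

Taking the median of log ratios and exponentiating mirrors how DESeq2 computes the estimator; the median makes it robust to a minority of strongly differentially expressed genes.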
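Step 7 describes the `method='ttest'` option as a Welch t-test. Welch's test compares group means without assuming equal variances: the statistic is the mean difference divided by the combined standard error, and the degrees of freedom come from the Welch-Satterthwaite approximation. The stdlib sketch below computes only those two quantities for a single gene (the function name is hypothetical). Turning them into a p-value needs the Student t distribution, which `scipy.stats.ttest_ind(a, b, equal_var=False)` provides directly; how omicverse computes its p-values and any multiple-testing correction is not reproduced here.

```python
from statistics import mean, variance


def welch_t(treatment, control):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for one gene.

    Each group needs at least two replicates, and the groups must not both have zero variance.
    """
    n1, n2 = len(treatment), len(control)
    # Squared standard errors of each group mean (sample variance / n).
    se1, se2 = variance(treatment) / n1, variance(control) / n2
    t = (mean(treatment) - mean(control)) / (se1 + se2) ** 0.5
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


t, df = welch_t([4, 5, 6], [1, 2, 3])  # toy normalized values for one gene
print(round(t, 3), round(df, 3))  # 3.674 4.0
```

With equal group sizes and equal variances, as in this toy case, the Welch degrees of freedom reduce to the classic n1 + n2 - 2.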
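Step 8 combines an expression filter with fold-change and p-value cutoffs, and step 10 selects genes whose `sig` label is not `'normal'`. The sketch below mimics that logic on a plain dictionary with fixed thresholds. The `'up'`/`'down'` labels, the strict inequalities, and the helper names are assumptions for illustration; omicverse's automatic threshold selection (`fc_threshold=-1`) is not reproduced.

```python
def classify(log2fc, pvalue, fc_threshold=1.0, pval_threshold=0.05):
    """Label a gene 'up', 'down' or 'normal' (labels assumed to mirror dds.result['sig'])."""
    if pvalue < pval_threshold and log2fc > fc_threshold:
        return "up"
    if pvalue < pval_threshold and log2fc < -fc_threshold:
        return "down"
    return "normal"


def significant_genes(results, min_log2_basemean=1.0, **thresholds):
    """results: {gene: (log2fc, pvalue, log2_basemean)}.

    Applies the expression filter first, then keeps genes not labelled 'normal'.
    """
    return [
        gene
        for gene, (fc, p, log2_bm) in results.items()
        if log2_bm > min_log2_basemean and classify(fc, p, **thresholds) != "normal"
    ]


toy = {  # hypothetical genes and values
    "geneA": (2.5, 0.001, 6.0),
    "geneB": (-3.0, 0.01, 3.2),
    "geneC": (0.2, 0.5, 8.0),   # small fold change, not significant
    "geneD": (4.0, 0.001, 0.5),  # removed by the expression filter
}
print(significant_genes(toy))  # ['geneA', 'geneB']
```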
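Over-representation analysis, the kind of test typically behind Enrichr-style enrichment, asks how surprising the overlap between the DEG list and a pathway is, given a background universe of genes. This is why the `background` list in step 10 matters: it defines that universe. The sketch below computes the one-sided hypergeometric tail probability with `math.comb`; it illustrates the statistic under that assumption, is not a reimplementation of `ov.bulk.geneset_enrichment`, and applies no multiple-testing correction. The function name is hypothetical.

```python
from math import comb


def hypergeom_pvalue(deg_genes, pathway_genes, background):
    """P(overlap >= observed) when len(deg) genes are drawn at random from the background."""
    universe = set(background)
    # Genes outside the background cannot be drawn, so restrict both sets to it.
    deg = set(deg_genes) & universe
    pathway = set(pathway_genes) & universe
    N, K, n = len(universe), len(pathway), len(deg)
    k = len(deg & pathway)
    # Sum the probabilities of every overlap at least as large as the observed one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


background = [f"g{i}" for i in range(10)]  # toy universe of 10 genes
p = hypergeom_pvalue(["g0", "g1", "g2"], ["g0", "g1", "g2"], background)
print(round(p, 5))  # 0.00833, i.e. 1 / C(10, 3)
```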
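The first troubleshooting tip can be turned into a quick pre-flight check run before `dds.deg_analysis(...)`. The hypothetical helper below takes the cleaned column names (for example `list(mapped_counts.columns)`) and the two label lists, and raises a `ValueError` naming any label that is missing, such as one that still ends in `.bam`, or that is assigned to both groups. The overlap check is an extra safeguard, not something the workflow above requires.

```python
def check_group_labels(columns, treatment_groups, control_groups):
    """Raise ValueError if group labels are missing from the matrix or appear in both groups."""
    available = set(columns)
    missing = [s for s in list(treatment_groups) + list(control_groups) if s not in available]
    if missing:
        raise ValueError(f"labels not found in count-matrix columns: {missing}")
    both = sorted(set(treatment_groups) & set(control_groups))
    if both:
        raise ValueError(f"labels assigned to both groups: {both}")


columns = ["ctrl1", "ctrl2", "treat1", "treat2"]  # toy sample IDs
check_group_labels(columns, ["treat1", "treat2"], ["ctrl1", "ctrl2"])  # passes silently
try:
    check_group_labels(columns, ["treat1.bam"], ["ctrl1"])
except ValueError as err:
    print(err)  # labels not found in count-matrix columns: ['treat1.bam']
```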

## Examples
- "I have a featureCounts matrix for mouse tumour samples—normalize it with DESeq2, run t-test DEG, and highlight the top 8 genes in a volcano plot."
- "Use omicverse to compute edgeR-style differential expression between treated and control replicates, then run GO enrichment on significant genes."
- "Guide me through converting Ensembl IDs to symbols, performing limma DEG, and plotting boxplots for Krtap9-5 and Lef1."

## References
- Detailed walkthrough notebook: [`t_deg.ipynb`](../../omicverse_guide/docs/Tutorials-bulk/t_deg.ipynb)
- Sample count matrix for testing: [`sample/counts.txt`](../../sample/counts.txt)
- Quick copy/paste commands: [`reference.md`](reference.md)

Related Skills

tooluniverse-expression-data-retrieval

from FreedomIntelligence/OpenClaw-Medical-Skills

Retrieves gene expression and omics datasets from ArrayExpress and BioStudies with gene disambiguation, experiment quality assessment, and structured reports. Creates comprehensive dataset profiles with metadata, sample information, and download links. Use when users need expression data, omics datasets, or mention ArrayExpress (E-MTAB, E-GEOD) or BioStudies (S-BSST) accessions.

tcga-bulk-data-preprocessing-with-omicverse

from FreedomIntelligence/OpenClaw-Medical-Skills

Guide Claude through ingesting TCGA sample sheets, expression archives, and clinical carts into omicverse, initialising survival metadata, and exporting annotated AnnData files.

spatial-transcriptomics-tutorials-with-omicverse

from FreedomIntelligence/OpenClaw-Medical-Skills

Guide users through omicverse's spatial transcriptomics tutorials covering preprocessing, deconvolution, and downstream modelling workflows across Visium, Visium HD, Stereo-seq, and Slide-seq datasets.

single-cell-preprocessing-with-omicverse

from FreedomIntelligence/OpenClaw-Medical-Skills

Walk through omicverse's single-cell preprocessing tutorials to QC PBMC3k data, normalise counts, detect HVGs, and run PCA/embedding pipelines on CPU, CPU–GPU mixed, or GPU stacks.

single-cell-clustering-and-batch-correction-with-omicverse

from FreedomIntelligence/OpenClaw-Medical-Skills

Guide Claude through omicverse's single-cell clustering workflow, covering preprocessing, QC, multimethod clustering, topic modeling, cNMF, and cross-batch integration as demonstrated in t_cluster.ipynb and t_single_batch.ipynb.

single-cell-annotation-skills-with-omicverse

from FreedomIntelligence/OpenClaw-Medical-Skills

Guide Claude through SCSA, MetaTiME, CellVote, CellMatch, GPTAnno, and weighted KNN transfer workflows for annotating single-cell modalities.

cell-free-expression

from FreedomIntelligence/OpenClaw-Medical-Skills

Guidance for cell-free protein synthesis (CFPS) optimization. Use when: (1) Planning CFPS experiments, (2) Troubleshooting low yield or aggregation, (3) Optimizing DNA template design for CFPS, (4) Expressing difficult proteins (disulfide-rich, toxic, membrane).

bulk-wgcna-analysis-with-omicverse

from FreedomIntelligence/OpenClaw-Medical-Skills

Assist Claude in running PyWGCNA through omicverse—preprocessing expression matrices, constructing co-expression modules, visualising eigengenes, and extracting hub genes.

bulktrajblend-trajectory-interpolation

from FreedomIntelligence/OpenClaw-Medical-Skills

Extend scRNA-seq developmental trajectories with BulkTrajBlend by generating intermediate cells from bulk RNA-seq, training beta-VAE and GNN models, and interpolating missing states.

bulk-rna-seq-deconvolution-with-bulk2single

from FreedomIntelligence/OpenClaw-Medical-Skills

Turn bulk RNA-seq cohorts into synthetic single-cell datasets using omicverse's Bulk2Single workflow for cell fraction estimation, beta-VAE generation, and quality control comparisons against reference scRNA-seq.

string-protein-interaction-analysis-with-omicverse

from FreedomIntelligence/OpenClaw-Medical-Skills

Help Claude query STRING for protein interactions, build PPI graphs with pyPPI, and render styled network figures for bulk gene lists.

bulk-rna-seq-deseq2-analysis-with-omicverse

from FreedomIntelligence/OpenClaw-Medical-Skills

Walk Claude through PyDESeq2-based differential expression, including ID mapping, DE testing, fold-change thresholding, and enrichment visualisation.