single-cell-downstream-analysis

Checklist-style reference for OmicVerse downstream tutorials covering AUCell scoring, metacell DEG, and related exports.

Best use case

single-cell-downstream-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using single-cell-downstream-analysis should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/single-downstream-analysis/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/single-downstream-analysis/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/single-downstream-analysis/SKILL.md inside your project
  3. Restart your AI agent; it will auto-discover the skill.

SKILL.md Source

# Single-cell downstream analysis quick-reference

This skill sheet distills the OmicVerse single-cell downstream tutorials into an executable checklist. Each module
highlights **prerequisites**, the **core API entry points**, **interpretation checkpoints**, **resource planning notes**, and
any **optional validation or export steps** surfaced in the notebooks.

## AUCell pathway scoring (`t_aucell.ipynb`)
- **Prerequisites**
  - Download pathway collections (GO, KEGG, or custom) that match the organism under study before running the tutorial.
  - Ensure an `AnnData` object with clustering/embedding (`adata.obsm['X_umap']`) is prepared.
- **Core calls**
  - `ov.single.geneset_aucell` for one pathway; `ov.single.pathway_aucell` for multiple pathways.
  - `ov.single.pathway_aucell_enrichment` to score all pathways in a library (set `num_workers` for parallelism).
- **Result checks**
  - Interpret AUCell scores as expression-like values (0–1). Use `sc.pl.embedding` to confirm pathway activity patterns.
  - Run `sc.tl.rank_genes_groups` on the AUCell `AnnData` to find cluster-enriched pathways and visualize with
    `sc.pl.rank_genes_groups_dotplot`.
- **Resources**
  - Library-wide scoring can be CPU-intensive; allocate workers (`num_workers=8` in the tutorial) and sufficient memory for the
    dense AUCell matrix.
- **Optional validation / exports**
  - Persist scores with `adata_aucs.write_h5ad('...')` for reuse.
  - Plot enriched pathways via `ov.single.pathway_enrichment` and `ov.single.pathway_enrichment_plot` heatmaps.
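
A minimal, dependency-free sketch of what an AUCell score measures for a single cell, useful for sanity-checking the 0–1 range noted above. It is not the OmicVerse implementation: ties are broken by gene order rather than randomly, and `top_frac=0.05` (the fraction of top-ranked genes used for the recovery curve) is an assumed default mirroring AUCell's usual setting.

```python
from typing import Mapping, Sequence


def aucell_score(
    expr: Mapping[str, float],
    gene_set: Sequence[str],
    top_frac: float = 0.05,
) -> float:
    """AUC of a gene set's recovery curve over one cell's top-ranked genes."""
    # Rank genes by expression, highest first; Python's stable sort keeps input
    # order for ties (AUCell itself breaks ties randomly).
    ranked = sorted(expr, key=lambda g: expr[g], reverse=True)
    k = max(1, int(len(ranked) * top_frac))  # size of the top-ranked window
    members = set(gene_set) & set(expr)  # gene-set genes measured in this cell
    if not members:
        return 0.0
    area = 0
    recovered = 0
    for gene in ranked[:k]:
        if gene in members:
            recovered += 1
        area += recovered  # height of the recovery curve at this rank
    # Normalize by the k x |set| bounding box so the score lies in [0, 1]
    return area / (k * len(members))


# Usage: one cell with 20 genes, scored on the top 25% (5 genes)
cell = {f"g{i}": float(20 - i) for i in range(20)}
print(aucell_score(cell, ["g0", "g1"], top_frac=0.25))  # 0.9
```

Scoring every cell against every gene set in a library repeats this rank-and-sum work many times, which is where the `num_workers` parallelism pays off.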

## scRNA-seq DEG (bulk-style metacell) (`t_scdeg.ipynb`)
- **Prerequisites**
  - Run quality control and preprocessing (`ov.pp.qc`, `ov.pp.preprocess`, `ov.pp.scale`, `ov.pp.pca`).
  - Retain raw counts in `adata.raw` before HVG filtering.
- **Core calls**
  - Construct differential objects with `ov.bulk.pyDEG(test_adata.to_df(...).T)` for full-cell and metacell views.
  - Build metacells via `ov.single.MetaCell(..., use_gpu=True)` when GPU is available for acceleration.
- **Result checks**
  - Inspect volcano plots (`dds.plot_volcano`) and targeted boxplots (`dds.plot_boxplot`) for top DEGs.
  - Map DEG markers back to UMAP embeddings using `ov.utils.embedding` to confirm localization.
- **Resources**
  - Metacell construction benefits from GPU but can fall back to CPU; ensure enough memory for transposed dense matrices
    passed to `pyDEG`.
- **Optional validation / exports**
  - Save metacell embedding plots as matplotlib figures; adjust `legend_*` settings for publication-ready visuals.
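
Metacell DEG treats each metacell as a bulk sample. The sketch below assumes cells arrive as gene-to-raw-count dicts and already carry a metacell label. It sums counts per metacell into a genes × metacells table, the same orientation as the transposed `to_df(...).T` frame passed to `pyDEG`. `pseudobulk_by_metacell` is a hypothetical helper, and `ov.single.MetaCell` may aggregate differently. Summing raw counts (hence keeping `adata.raw`, as the prerequisites require) preserves the unnormalized integer counts that DESeq2-style tests expect.

```python
from collections import defaultdict
from typing import Dict, Mapping, Sequence


def pseudobulk_by_metacell(
    counts: Sequence[Mapping[str, int]],
    metacell_labels: Sequence[str],
) -> Dict[str, Dict[str, int]]:
    """Sum raw counts per metacell and return a genes x metacells table."""
    if len(counts) != len(metacell_labels):
        raise ValueError("expected exactly one metacell label per cell")
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for cell, label in zip(counts, metacell_labels):
        for gene, value in cell.items():
            totals[gene][label] += value
    labels = sorted(set(metacell_labels))
    # Fill absent gene/metacell pairs with 0 so every row has the same columns
    return {gene: {mc: per_mc.get(mc, 0) for mc in labels} for gene, per_mc in totals.items()}


cells = [{"CD3E": 2, "MS4A1": 0}, {"CD3E": 1}, {"MS4A1": 5}]
print(pseudobulk_by_metacell(cells, ["mc1", "mc1", "mc2"]))
# {'CD3E': {'mc1': 3, 'mc2': 0}, 'MS4A1': {'mc1': 0, 'mc2': 5}}
```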

## scRNA-seq DEG (cell-type & composition) (`t_deg_single.ipynb`)
- **Prerequisites**
  - Annotated `adata` with `condition`, `cell_label`, and optional `batch` metadata.
  - Initialize mixed CPU/GPU resources when using graph-based differential abundance (DA) methods such as Milo (`ov.settings.cpu_gpu_mixed_init()`).
- **Core calls**
  - `ov.single.DEG(..., method='wilcoxon'|'t-test'|'memento-de')` with `deg_obj.run(...)` to target cell types.
  - `ov.single.DCT(..., method='sccoda'|'milo')` for differential composition testing.
  - Graph setup for Milo: `ov.pp.preprocess`, `ov.single.batch_correction`, `ov.pp.neighbors`, `ov.pp.umap`.
- **Result checks**
  - Review DEG tables from `deg_obj` (Wilcoxon / memento) and adjust capture rate / bootstraps for stability.
  - For scCODA, tune the FDR via `sim_results.set_fdr()` and read the boxplots as condition-level shifts in cell-type composition.
  - Milo diagnostics: histogram of P-values, logFC vs -log10 FDR scatter, and a beeswarm plot of differential abundance.
- **Resources**
  - Memento and Milo require multiple CPUs (`num_cpus`, `num_boot`, high `k`); ensure adequate compute time.
  - Harmony/scVI batch correction uses GPU memory when GPU acceleration is enabled; plan for VRAM usage.
- **Optional validation / exports**
  - Visual diagnostics include UMAP overlays (`ov.pl.embedding`), Milo beeswarm plots, and custom color palettes.
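
The Milo diagnostics and DEG tables above are read on multiple-testing-corrected values. To show what that correction does, here is plain Benjamini-Hochberg adjustment in pure Python. Milo itself applies a spatially weighted variant of this procedure, so its FDR values will not match this function exactly.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and <= 1
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

The running minimum is what lets a slightly larger raw p-value (0.04 here) inherit the adjusted value of its neighbour, so adjusted values never decrease as raw p-values increase.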

## scDrug response prediction (`t_scdrug.ipynb`)
- **Prerequisites**
  - Fetch a tumor dataset (e.g., `infercnvpy.datasets.maynard2020_3k`).
  - Download reference assets **before** running predictions:
    - Gene annotations via `ov.utils.get_gene_annotation` (requires GTF from GENCODE or T2T-CHM13).
    - `ov.utils.download_GDSC_data()` and `ov.utils.download_CaDRReS_model()` for drug-response models.
    - Clone CaDRReS-Sc repo (`git clone https://github.com/CSB5/CaDRReS-Sc`).
- **Core calls**
  - Automatic clustering-resolution selection: `ov.single.autoResolution(adata, cpus=4)`.
  - Drug response runner: `ov.single.Drug_Response(adata, scriptpath='CaDRReS-Sc', modelpath='models/', output='result')`.
- **Result checks**
  - Inspect clustering and IC50 outputs stored under `output`; cross-reference with inferred CNV states.
- **Resources**
  - Requires external CaDRReS-Sc environment (Python/R dependencies) and storage for model downloads.
  - Running inferCNV preprocessing may need multiple CPUs and substantial RAM.
- **Optional validation / exports**
  - Persist intermediate `AnnData` (`adata.write('scanpyobj.h5ad')`) to reuse for downstream analyses or re-runs.
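
Because the drug-response step only fails once it reaches a missing reference asset, a small preflight check can run before `Drug_Response`. This is a hypothetical helper, not part of OmicVerse. The paths in the usage comment echo the tutorial arguments, and the GTF filename is a placeholder.

```python
from pathlib import Path
from typing import List, Mapping


def missing_assets(required: Mapping[str, str]) -> List[str]:
    """Return the labels of required files or directories that do not exist yet."""
    return [label for label, path in required.items() if not Path(path).exists()]


# Usage (paths echo the tutorial arguments; the GTF name is a placeholder):
# missing = missing_assets({
#     "CaDRReS-Sc repository": "CaDRReS-Sc",
#     "drug-response models": "models/",
#     "gene annotation GTF": "gencode.annotation.gtf",  # hypothetical filename
# })
# if missing:
#     raise SystemExit("Download before running Drug_Response: " + ", ".join(missing))
```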

## SCENIC regulon discovery (`t_scenic.ipynb`)
- **Prerequisites**
  - Mouse hematopoiesis dataset loaded via `ov.single.mouse_hsc_nestorowa16()` (or provide preprocessed data with raw counts).
  - Download cisTarget ranking databases (`*.feather`) and motif annotations (`motifs-*.tbl`) for the species; allocate
    >3 GB disk space and verify paths (`db_glob`, `motif_path`).
- **Core calls**
  - Initialize analysis: `ov.single.SCENIC(adata, db_glob=..., motif_path=..., n_jobs=12)`.
  - Run RegDiffusion-based GRN inference, regulon pruning, and AUCell scoring via the SCENIC object methods.
- **Result checks**
  - Examine regulon activity matrices (`scenic_obj.auc_mtx.head()`), RSS scores, and embeddings colored by regulon activity.
  - Use RSS plots, dendrograms, and AUCell distributions to interpret TF specificity and activity thresholds.
- **Resources**
  - Multi-core CPU recommended (set `n_jobs` to the number of available cores); ensure enough RAM for motif enrichment.
  - Large downloads and intermediate objects (pickle/h5ad) require disk space.
- **Optional validation / exports**
  - Save `scenic_obj` (`ov.utils.save`) and regulon AnnData (`regulon_ad.write`).
  - Optional plots: RSS per cell type, regulon embeddings, AUC histograms with threshold lines, GRN network visualizations.
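
The regulon specificity score (RSS) is commonly defined as 1 - sqrt(JSD), where JSD is the Jensen-Shannon divergence between a regulon's normalized AUCell activity across cells and a cell-type indicator vector. The sketch below follows that definition with base-2 logarithms, so scores fall in [0, 1]. The implementation behind OmicVerse's RSS plots may use a different log base, so compare rankings rather than absolute values.

```python
import math
from typing import Sequence


def jensen_shannon(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence with base-2 logs (bounded by [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Terms with x_i == 0 contribute 0; otherwise y_i >= x_i / 2 > 0
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def regulon_specificity_score(auc: Sequence[float], in_cell_type: Sequence[bool]) -> float:
    """RSS = 1 - sqrt(JSD) between regulon activity and a cell-type indicator."""
    total = sum(auc)
    n_type = sum(1 for flag in in_cell_type if flag)
    if total <= 0 or n_type == 0:
        raise ValueError("need non-zero regulon activity and at least one cell of the type")
    p = [a / total for a in auc]  # regulon activity as a distribution over cells
    q = [1.0 / n_type if flag else 0.0 for flag in in_cell_type]  # uniform over the type
    jsd = max(0.0, jensen_shannon(p, q))  # guard against tiny negative rounding
    return 1.0 - math.sqrt(jsd)


# Activity confined to the cell type scores 1; activity only elsewhere scores 0
print(regulon_specificity_score([1, 1, 0, 0], [True, True, False, False]))  # 1.0
```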

## cNMF program discovery (`t_cnmf.ipynb`)
- **Prerequisites**
  - Preprocess with HVG selection (`ov.pp.preprocess`), scaling (`ov.pp.scale`), PCA, and have UMAP embeddings for inspection.
  - Select the component range (e.g., `np.arange(5, 11)`) and the number of iterations; ensure the output directory exists.
- **Core calls**
  - Instantiate analysis: `ov.single.cNMF(..., output_dir='...', name='...')`.
  - Factorization workflow: `cnmf_obj.factorize(...)`, `cnmf_obj.combine(...)`, `cnmf_obj.k_selection_plot()`,
    `cnmf_obj.consensus(...)`.
  - Extract results: `cnmf_obj.load_results(...)`, `cnmf_obj.get_results(...)`, optional RF classifier via `get_results_rfc`.
- **Result checks**
  - Evaluate stability via K-selection plot and local density histogram; confirm chosen K with consensus heatmaps.
  - Inspect topic usage embeddings (`ov.pl.embedding`), cluster labels, and dotplots of top genes.
- **Resources**
  - Multiple iterations and components are CPU-heavy; consider distributing workers (`total_workers`) and verifying disk
    space for intermediate factorization files.
- **Optional validation / exports**
  - Visualizations include Euclidean distance heatmaps, density histograms, UMAP overlays for topics/clusters, and dotplots.
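
The local density histogram reflects cNMF's outlier filter. Each replicate component is L2-normalized, its mean Euclidean distance to its nearest neighbors is computed, and components above a density threshold are dropped before consensus clustering. Below is a pure-Python sketch of that filter. `n_neighbors` and `density_threshold` are assumed parameter names for illustration, not the OmicVerse signature.

```python
import math
from typing import List, Sequence


def _unit(vec: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def local_density(spectra: Sequence[Sequence[float]], n_neighbors: int) -> List[float]:
    """Mean distance from each L2-normalized spectrum to its n nearest neighbors."""
    if len(spectra) < 2 or n_neighbors < 1:
        raise ValueError("need at least two spectra and n_neighbors >= 1")
    unit = [_unit(s) for s in spectra]
    densities = []
    for i, a in enumerate(unit):
        dists = sorted(math.dist(a, b) for j, b in enumerate(unit) if j != i)
        nearest = dists[:n_neighbors]
        densities.append(sum(nearest) / len(nearest))
    return densities


def keep_dense_components(
    spectra: Sequence[Sequence[float]], n_neighbors: int, density_threshold: float
) -> List[int]:
    """Indices of components whose local density passes the threshold."""
    densities = local_density(spectra, n_neighbors)
    return [i for i, d in enumerate(densities) if d <= density_threshold]


# Three near-identical replicate components and one outlier
components = [[1.0, 0.0, 0.0], [0.99, 0.01, 0.0], [1.0, 0.02, 0.0], [0.0, 0.0, 1.0]]
print(keep_dense_components(components, n_neighbors=2, density_threshold=0.5))  # [0, 1, 2]
```

A long right tail in the density histogram signals replicates that did not converge to a shared program; the threshold is usually placed in the gap before that tail.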

## NOCD overlapping communities (`t_nocd.ipynb`)
- **Prerequisites**
  - Prepare AnnData via `ov.single.scanpy_lazy` (automated preprocessing) before running NOCD.
  - Note: the tutorial warns that the NOCD implementation is under active development; expect variability.
- **Core calls**
  - Pipeline wrapper: `scbrca = ov.single.scnocd(adata)` followed by chained methods (`matrix_transform`, `matrix_normalize`,
    `GNN_configure`, `GNN_preprocess`, `GNN_model`, `GNN_result`, `GNN_plot`, `cal_nocd`, `calculate_nocd`).
- **Result checks**
  - Compare standard Leiden clusters versus NOCD outputs on UMAP embeddings to identify multi-fate cells.
- **Resources**
  - Graph neural network stages can be GPU-accelerated; ensure CUDA availability or be prepared for longer CPU runtimes.
  - Track memory usage when constructing large adjacency matrices.
- **Optional validation / exports**
  - Generate multiple UMAP overlays (`sc.pl.umap`) for `nocd`, `nocd_n`, and Leiden labels using shared color maps.
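
NOCD yields a non-negative cell × community affiliation matrix rather than one label per cell. The minimal sketch below turns that matrix into overlapping assignments. It assumes a 0.5 threshold, the value used in the original NOCD examples. How OmicVerse derives its `nocd` and `nocd_n` columns is not specified here.

```python
from typing import List, Sequence


def overlapping_assignments(
    affiliations: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[List[int]]:
    """Per cell, the communities whose affiliation strength exceeds the threshold."""
    return [[k for k, f in enumerate(row) if f > threshold] for row in affiliations]


def multi_community_cells(
    affiliations: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[int]:
    """Indices of cells assigned to two or more communities (candidate multi-fate cells)."""
    assignments = overlapping_assignments(affiliations, threshold)
    return [i for i, comms in enumerate(assignments) if len(comms) >= 2]


F = [[0.9, 0.1], [0.7, 0.6], [0.2, 0.3]]  # toy cells x communities matrix
print(overlapping_assignments(F))  # [[0], [0, 1], []]
print(multi_community_cells(F))  # [1]
```

Cells with no affiliation above the threshold come back with an empty list, so it is worth counting them before comparing against Leiden labels.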

## Lazy pipeline & reporting (`t_lazy.ipynb`)
- **Prerequisites**
  - Install OmicVerse ≥1.7.0 with the lazy utilities; supported species are currently human and mouse.
  - Prepare batch metadata (`sample_key`) and optionally initialize hybrid compute (`ov.settings.cpu_gpu_mixed_init()`).
- **Core calls**
  - Turnkey preprocessing: `ov.single.lazy(adata, species='mouse', sample_key='batch', ...)` with optional `reforce_steps`
    and module-specific kwargs.
  - Reporting: `ov.single.generate_scRNA_report(...)` to build HTML summary; `ov.generate_reference_table(adata)` for
    citation tracking.
- **Result checks**
  - Inspect generated embeddings (`ov.pl.embedding`) for quality and annotation alignment.
  - Review HTML report for QC metrics, normalization, batch correction, and embeddings.
- **Resources**
  - Steps like Harmony or scVI may use the GPU; confirm hardware availability or adjust `reforce_steps` accordingly.
  - Report generation writes to disk; ensure output path is writable.
- **Optional validation / exports**
  - Customize embeddings by color key; store HTML report and reference table alongside project documentation.
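
Below is a hypothetical preflight for the prerequisites above; it is not part of OmicVerse. It checks the species against the currently supported set, confirms that `sample_key` exists among the `adata.obs` columns, and verifies that the report directory is writable.

```python
import os
from typing import Iterable, List

SUPPORTED_SPECIES = {"human", "mouse"}  # per the prerequisites above


def lazy_preflight(
    species: str, sample_key: str, obs_columns: Iterable[str], report_dir: str = "."
) -> List[str]:
    """Collect problems to fix before running the lazy pipeline and report."""
    problems = []
    if species not in SUPPORTED_SPECIES:
        problems.append(f"species {species!r} is not supported (expected human or mouse)")
    if sample_key not in set(obs_columns):
        problems.append(f"sample_key {sample_key!r} is not a column of adata.obs")
    if not (os.path.isdir(report_dir) and os.access(report_dir, os.W_OK)):
        problems.append(f"report directory {report_dir!r} is missing or not writable")
    return problems


# Usage with a real AnnData object: lazy_preflight("mouse", "batch", adata.obs.columns)
print(lazy_preflight("zebrafish", "batch", ["batch", "cell_type"]))
```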

Related Skills

tooluniverse-variant-analysis

from FreedomIntelligence/OpenClaw-Medical-Skills

Production-ready VCF processing, variant annotation, mutation analysis, and structural variant (SV/CNV) interpretation for bioinformatics questions. Parses VCF files (streaming, large files), classifies mutation types (missense, nonsense, synonymous, frameshift, splice, intronic, intergenic) and structural variants (deletions, duplications, inversions, translocations), applies VAF/depth/quality/consequence filters, annotates with ClinVar/dbSNP/gnomAD/CADD via ToolUniverse, interprets SV/CNV clinical significance using ClinGen dosage sensitivity scores, computes variant statistics, and generates reports. Solves questions like "What fraction of variants with VAF < 0.3 are missense?", "How many non-reference variants remain after filtering intronic/intergenic?", "What is the pathogenicity of this deletion affecting BRCA1?", or "Which dosage-sensitive genes overlap this CNV?". Use when processing VCF files, annotating variants, filtering by VAF/depth/consequence, classifying mutations, interpreting structural variants, assessing CNV pathogenicity, comparing cohorts, or answering variant analysis questions.

tooluniverse-structural-variant-analysis

from FreedomIntelligence/OpenClaw-Medical-Skills

Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.

tooluniverse-spatial-omics-analysis

from FreedomIntelligence/OpenClaw-Medical-Skills

Computational analysis framework for spatial multi-omics data integration. Given spatially variable genes (SVGs), spatial domain annotations, tissue type, and disease context from spatial transcriptomics/proteomics experiments (10x Visium, MERFISH, DBiTplus, SLIDE-seq, etc.), performs comprehensive biological interpretation including pathway enrichment, cell-cell interaction inference, druggable target identification, immune microenvironment characterization, and multi-modal integration. Produces a detailed markdown report with Spatial Omics Integration Score (0-100), domain-by-domain characterization, and validation recommendations. Uses 70+ ToolUniverse tools across 9 analysis phases. Use when users ask about spatial transcriptomics analysis, spatial omics interpretation, tissue heterogeneity, spatial gene expression patterns, tumor microenvironment mapping, tissue zonation, or cell-cell communication from spatial data.

tooluniverse-single-cell

from FreedomIntelligence/OpenClaw-Medical-Skills

Production-ready single-cell and expression matrix analysis using scanpy, anndata, and scipy. Performs scRNA-seq QC, normalization, PCA, UMAP, Leiden/Louvain clustering, differential expression (Wilcoxon, t-test, DESeq2), cell type annotation, per-cell-type statistical analysis, gene-expression correlation, batch correction (Harmony), trajectory inference, and cell-cell communication analysis. NEW: Analyzes ligand-receptor interactions between cell types using OmniPath (CellPhoneDB, CellChatDB), scores communication strength, identifies signaling cascades, and handles multi-subunit receptor complexes. Integrates with ToolUniverse gene annotation tools (HPA, Ensembl, MyGene, UniProt) and enrichment tools (gseapy, PANTHER, STRING). Supports h5ad, 10X, CSV/TSV count matrices, and pre-annotated datasets. Use when analyzing single-cell RNA-seq data, studying cell-cell interactions, performing cell type differential expression, computing gene-expression correlations by cell type, analyzing tumor-immune communication, or answering questions about scRNA-seq datasets.

tooluniverse-proteomics-analysis

from FreedomIntelligence/OpenClaw-Medical-Skills

Analyze mass spectrometry proteomics data including protein quantification, differential expression, post-translational modifications (PTMs), and protein-protein interactions. Processes MaxQuant, Spectronaut, DIA-NN, and other MS platform outputs. Performs normalization, statistical analysis, pathway enrichment, and integration with transcriptomics. Use when analyzing proteomics data, comparing protein abundance between conditions, identifying PTM changes, studying protein complexes, integrating protein and RNA data, discovering protein biomarkers, or conducting quantitative proteomics experiments.

protein-interaction-network-analysis

from FreedomIntelligence/OpenClaw-Medical-Skills

Analyze protein-protein interaction networks using STRING, BioGRID, and SASBDB databases. Maps protein identifiers, retrieves interaction networks with confidence scores, performs functional enrichment analysis (GO/KEGG/Reactome), and optionally includes structural data. No API key required for core functionality (STRING). Use when analyzing protein networks, discovering interaction partners, identifying functional modules, or studying protein complexes.

tooluniverse-metabolomics-analysis

from FreedomIntelligence/OpenClaw-Medical-Skills

Analyze metabolomics data including metabolite identification, quantification, pathway analysis, and metabolic flux. Processes LC-MS, GC-MS, NMR data from targeted and untargeted experiments. Performs normalization, statistical analysis, pathway enrichment, metabolite-enzyme integration, and biomarker discovery. Use when analyzing metabolomics datasets, identifying differential metabolites, studying metabolic pathways, integrating with transcriptomics/proteomics, discovering metabolic biomarkers, performing flux balance analysis, or characterizing metabolic phenotypes in disease, drug response, or physiological conditions.

tooluniverse-immune-repertoire-analysis

from FreedomIntelligence/OpenClaw-Medical-Skills

Comprehensive immune repertoire analysis for T-cell and B-cell receptor sequencing data. Analyze TCR/BCR repertoires to assess clonality, diversity, V(D)J gene usage, CDR3 characteristics, convergence, and predict epitope specificity. Integrate with single-cell data for clonotype-phenotype associations. Use for adaptive immune response profiling, cancer immunotherapy research, vaccine response assessment, autoimmune disease studies, or repertoire diversity analysis in immunology research.

tooluniverse-image-analysis

from FreedomIntelligence/OpenClaw-Medical-Skills

Production-ready microscopy image analysis and quantitative imaging data skill for colony morphometry, cell counting, fluorescence quantification, and statistical analysis of imaging-derived measurements. Processes ImageJ/CellProfiler output (area, circularity, intensity, cell counts), performs Dunnett's test, Cohen's d effect size, power analysis, Shapiro-Wilk normality tests, two-way ANOVA, polynomial regression, natural spline regression with confidence intervals, and comparative morphometry. Supports CSV/TSV measurement tables, multi-channel fluorescence data, colony swarming assays, and neuron counting datasets. Use when analyzing microscopy measurement data, colony area/circularity, cell count statistics, swarming assays, co-culture ratio optimization, or answering questions about imaging-derived quantitative data.

tooluniverse-crispr-screen-analysis

from FreedomIntelligence/OpenClaw-Medical-Skills

Comprehensive CRISPR screen analysis for functional genomics. Analyze pooled or arrayed CRISPR screens (knockout, activation, interference) to identify essential genes, synthetic lethal interactions, and drug targets. Perform sgRNA count processing, gene-level scoring (MAGeCK, BAGEL), quality control, pathway enrichment, and drug target prioritization. Use for CRISPR screen analysis, gene essentiality studies, synthetic lethality detection, functional genomics, drug target validation, or identifying genetic vulnerabilities.

statistical-analysis

from FreedomIntelligence/OpenClaw-Medical-Skills

Statistical analysis toolkit. Hypothesis tests (t-test, ANOVA, chi-square), regression, correlation, Bayesian stats, power analysis, assumption checks, APA reporting, for academic research.

single-trajectory-analysis

from FreedomIntelligence/OpenClaw-Medical-Skills

Guide to reproducing OmicVerse trajectory workflows spanning PAGA, Palantir, VIA, velocity coupling, and fate scoring notebooks.