scvi-tools

Deep learning for single-cell analysis using scvi-tools. This skill should be used when users need (1) data integration and batch correction with scVI/scANVI, (2) ATAC-seq analysis with PeakVI, (3) CITE-seq multi-modal analysis with totalVI, (4) multiome RNA+ATAC analysis with MultiVI, (5) spatial transcriptomics deconvolution with DestVI, (6) label transfer and reference mapping with scANVI/scArches, (7) RNA velocity with veloVI, or (8) any deep learning-based single-cell method. Triggers include mentions of scVI, scANVI, totalVI, PeakVI, MultiVI, DestVI, veloVI, sysVI, scArches, variational autoencoder, VAE, batch correction, data integration, multi-modal, CITE-seq, multiome, reference mapping, latent space.

10,671 stars

byanthropics

View on GitHub Installation ↓

Best use case

scvi-tools is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using scvi-tools should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/scvi-tools/SKILL.md --create-dirs "https://raw.githubusercontent.com/anthropics/knowledge-work-plugins/main/bio-research/skills/scvi-tools/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/scvi-tools/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How scvi-tools Compares

Feature / Agent	scvi-tools	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

Best AI Skills for ChatGPT

Find the best AI skills to adapt into ChatGPT workflows for research, writing, summarization, planning, and repeatable assistant tasks.

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

SKILL.md Source

# scvi-tools Deep Learning Skill

This skill provides guidance for deep learning-based single-cell analysis using scvi-tools, the leading framework for probabilistic models in single-cell genomics.

## How to Use This Skill

1. Identify the appropriate workflow from the model/workflow tables below
2. Read the corresponding reference file for detailed steps and code
3. Use scripts in `scripts/` to avoid rewriting common code
4. For installation or GPU issues, consult `references/environment_setup.md`
5. For debugging, consult `references/troubleshooting.md`

## When to Use This Skill

- When scvi-tools, scVI, scANVI, or related models are mentioned
- When deep learning-based batch correction or integration is needed
- When working with multi-modal data (CITE-seq, multiome)
- When reference mapping or label transfer is required
- When analyzing ATAC-seq or spatial transcriptomics data
- When learning latent representations of single-cell data

## Model Selection Guide

| Data Type | Model | Primary Use Case |
|-----------|-------|------------------|
| scRNA-seq | **scVI** | Unsupervised integration, DE, imputation |
| scRNA-seq + labels | **scANVI** | Label transfer, semi-supervised integration |
| CITE-seq (RNA+protein) | **totalVI** | Multi-modal integration, protein denoising |
| scATAC-seq | **PeakVI** | Chromatin accessibility analysis |
| Multiome (RNA+ATAC) | **MultiVI** | Joint modality analysis |
| Spatial + scRNA reference | **DestVI** | Cell type deconvolution |
| RNA velocity | **veloVI** | Transcriptional dynamics |
| Cross-technology | **sysVI** | System-level batch correction |

## Workflow Reference Files

| Workflow | Reference File | Description |
|----------|---------------|-------------|
| Environment Setup | `references/environment_setup.md` | Installation, GPU, version info |
| Data Preparation | `references/data_preparation.md` | Formatting data for any model |
| scRNA Integration | `references/scrna_integration.md` | scVI/scANVI batch correction |
| ATAC-seq Analysis | `references/atac_peakvi.md` | PeakVI for accessibility |
| CITE-seq Analysis | `references/citeseq_totalvi.md` | totalVI for protein+RNA |
| Multiome Analysis | `references/multiome_multivi.md` | MultiVI for RNA+ATAC |
| Spatial Deconvolution | `references/spatial_deconvolution.md` | DestVI spatial analysis |
| Label Transfer | `references/label_transfer.md` | scANVI reference mapping |
| scArches Mapping | `references/scarches_mapping.md` | Query-to-reference mapping |
| Batch Correction | `references/batch_correction_sysvi.md` | Advanced batch methods |
| RNA Velocity | `references/rna_velocity_velovi.md` | veloVI dynamics |
| Troubleshooting | `references/troubleshooting.md` | Common issues and solutions |

## CLI Scripts

Modular scripts for common workflows. Chain together or modify as needed.

### Pipeline Scripts

| Script | Purpose | Usage |
|--------|---------|-------|
| `prepare_data.py` | QC, filter, HVG selection | `python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch` |
| `train_model.py` | Train any scvi-tools model | `python scripts/train_model.py prepared.h5ad results/ --model scvi` |
| `cluster_embed.py` | Neighbors, UMAP, Leiden | `python scripts/cluster_embed.py adata.h5ad results/` |
| `differential_expression.py` | DE analysis | `python scripts/differential_expression.py model/ adata.h5ad de.csv --groupby leiden` |
| `transfer_labels.py` | Label transfer with scANVI | `python scripts/transfer_labels.py ref_model/ query.h5ad results/` |
| `integrate_datasets.py` | Multi-dataset integration | `python scripts/integrate_datasets.py results/ data1.h5ad data2.h5ad` |
| `validate_adata.py` | Check data compatibility | `python scripts/validate_adata.py data.h5ad --batch-key batch` |

### Example Workflow

```bash
# 1. Validate input data
python scripts/validate_adata.py raw.h5ad --batch-key batch --suggest

# 2. Prepare data (QC, HVG selection)
python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch --n-hvgs 2000

# 3. Train model
python scripts/train_model.py prepared.h5ad results/ --model scvi --batch-key batch

# 4. Cluster and visualize
python scripts/cluster_embed.py results/adata_trained.h5ad results/ --resolution 0.8

# 5. Differential expression
python scripts/differential_expression.py results/model results/adata_clustered.h5ad results/de.csv --groupby leiden
```

### Python Utilities

The `scripts/model_utils.py` provides importable functions for custom workflows:

| Function | Purpose |
|----------|---------|
| `prepare_adata()` | Data preparation (QC, HVG, layer setup) |
| `train_scvi()` | Train scVI or scANVI |
| `evaluate_integration()` | Compute integration metrics |
| `get_marker_genes()` | Extract DE markers |
| `save_results()` | Save model, data, plots |
| `auto_select_model()` | Suggest best model |
| `quick_clustering()` | Neighbors + UMAP + Leiden |

## Critical Requirements

1. **Raw counts required**: scvi-tools models require integer count data
   ```python
   adata.layers["counts"] = adata.X.copy()  # Before normalization
   scvi.model.SCVI.setup_anndata(adata, layer="counts")
   ```

2. **HVG selection**: Use 2000-4000 highly variable genes
   ```python
   sc.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key="batch", layer="counts", flavor="seurat_v3")
   adata = adata[:, adata.var['highly_variable']].copy()
   ```

3. **Batch information**: Specify batch_key for integration
   ```python
   scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")
   ```

## Quick Decision Tree

```
Need to integrate scRNA-seq data?
├── Have cell type labels? → scANVI (references/label_transfer.md)
└── No labels? → scVI (references/scrna_integration.md)

Have multi-modal data?
├── CITE-seq (RNA + protein)? → totalVI (references/citeseq_totalvi.md)
├── Multiome (RNA + ATAC)? → MultiVI (references/multiome_multivi.md)
└── scATAC-seq only? → PeakVI (references/atac_peakvi.md)

Have spatial data?
└── Need cell type deconvolution? → DestVI (references/spatial_deconvolution.md)

Have pre-trained reference model?
└── Map query to reference? → scArches (references/scarches_mapping.md)

Need RNA velocity?
└── veloVI (references/rna_velocity_velovi.md)

Strong cross-technology batch effects?
└── sysVI (references/batch_correction_sysvi.md)
```

## Key Resources

- [scvi-tools Documentation](https://docs.scvi-tools.org/)
- [scvi-tools Tutorials](https://docs.scvi-tools.org/en/stable/tutorials/index.html)
- [Model Hub](https://huggingface.co/scvi-tools)
- [GitHub Issues](https://github.com/scverse/scvi-tools/issues)

Related Skills

pipeline-review

10671

from anthropics/knowledge-work-plugins

Analyze pipeline health — prioritize deals, flag risks, get a weekly action plan. Use when running a weekly pipeline review, deciding which deals to focus on this week, spotting stale or stuck opportunities, auditing for hygiene issues like bad close dates, or identifying single-threaded deals.

forecast

10671

from anthropics/knowledge-work-plugins

Generate a weighted sales forecast with best/likely/worst scenarios, commit vs. upside breakdown, and gap analysis. Use when preparing a quarterly forecast call, assessing gap-to-quota from a pipeline CSV, deciding which deals to commit vs. call upside, or checking pipeline coverage against your number.

draft-outreach

10671

from anthropics/knowledge-work-plugins

Research a prospect then draft personalized outreach. Uses web research by default, supercharged with enrichment and CRM. Trigger with "draft outreach to [person/company]", "write cold email to [prospect]", "reach out to [name]".

daily-briefing

10671

from anthropics/knowledge-work-plugins

Start your day with a prioritized sales briefing. Works standalone when you tell me your meetings and priorities, supercharged when you connect your calendar, CRM, and email. Trigger with "morning briefing", "daily brief", "what's on my plate today", "prep my day", or "start my day".

create-an-asset

10671

from anthropics/knowledge-work-plugins

Generate tailored sales assets (landing pages, decks, one-pagers, workflow demos) from your deal context. Describe your prospect, audience, and goal — get a polished, branded asset ready to share with customers.

competitive-intelligence

10671

from anthropics/knowledge-work-plugins

Research your competitors and build an interactive battlecard. Outputs an HTML artifact with clickable competitor cards and a comparison matrix. Trigger with "competitive intel", "research competitors", "how do we compare to [competitor]", "battlecard for [competitor]", or "what's new with [competitor]".

call-summary

10671

from anthropics/knowledge-work-plugins

Process call notes or a transcript — extract action items, draft follow-up email, generate internal summary. Use when pasting rough notes or a transcript after a discovery, demo, or negotiation call, drafting a customer follow-up, logging the activity for your CRM, or capturing objections and next steps for your team.

update

10671

from anthropics/knowledge-work-plugins

Sync tasks and refresh memory from your current activity. Use when pulling new assignments from your project tracker into TASKS.md, triaging stale or overdue tasks, filling memory gaps for unknown people or projects, or running a comprehensive scan to catch todos buried in chat and email.

task-management

10671

from anthropics/knowledge-work-plugins

Simple task management using a shared TASKS.md file. Reference this when the user asks about their tasks, wants to add/complete tasks, or needs help tracking commitments.

memory-management

10671

from anthropics/knowledge-work-plugins

Two-tier memory system that makes Claude a true workplace collaborator. Decodes shorthand, acronyms, nicknames, and internal language so Claude understands requests like a colleague would. CLAUDE.md for working memory, memory/ directory for the full knowledge base.

write-spec

10671

from anthropics/knowledge-work-plugins

Write a feature spec or PRD from a problem statement or feature idea. Use when turning a vague idea or user request into a structured document, scoping a feature with goals and non-goals, defining success metrics and acceptance criteria, or breaking a big ask into a phased spec.

synthesize-research

10671

from anthropics/knowledge-work-plugins

Synthesize user research from interviews, surveys, and feedback into structured insights. Use when you have a pile of interview notes, survey responses, or support tickets to make sense of, need to extract themes and rank findings by frequency and impact, or want to turn raw feedback into roadmap recommendations.