bio-single-cell-cell-annotation

Automated cell type annotation using reference-based methods including CellTypist, scPred, SingleR, and Azimuth for consistent, reproducible cell labeling. Use when automatically annotating cell types using reference datasets.

1,802 stars

Best use case

bio-single-cell-cell-annotation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-single-cell-cell-annotation should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/bio-single-cell-cell-annotation/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-single-cell-cell-annotation/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/bio-single-cell-cell-annotation/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How bio-single-cell-cell-annotation Compares

| Feature / Agent | bio-single-cell-cell-annotation | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

SKILL.md Source

## Version Compatibility

Reference examples tested with: pandas 2.2+, scanpy 1.10+, scikit-learn 1.4+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
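
The version check described above can be scripted. The following is a minimal sketch using only the standard library; the helper names `installed_versions` and `major_minor` are hypothetical, and the minimum versions are copied from the "tested with" line rather than from any official compatibility matrix.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def major_minor(version_string):
    """Parse the leading 'X.Y' of a version string; return None if it does not match."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Minimums taken from the "tested with" line above (an assumption, not a hard requirement).
MINIMUMS = {"pandas": (2, 2), "scanpy": (1, 10), "scikit-learn": (1, 4)}

for name, found in installed_versions(MINIMUMS).items():
    parsed = major_minor(found) if found is not None else None
    if found is None:
        print(f"{name}: not installed")
    elif parsed is None:
        print(f"{name}: could not parse version {found!r}")
    elif parsed < MINIMUMS[name]:
        print(f"{name}: {found} is older than the tested version")
    else:
        print(f"{name}: {found} ok")
```

Comparing `(major, minor)` tuples avoids the string-comparison trap where `'1.10'` sorts before `'1.4'`.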

# Automated Cell Type Annotation

## CellTypist (Python)

**Goal:** Automatically annotate cell types using a pre-trained or custom CellTypist model.

**Approach:** Load a reference model, predict cell types with majority voting for cluster-level consensus, and add predictions to AnnData.

**"Automatically label my cell types"** → Apply a trained classifier to assign cell type identities based on transcriptomic similarity to a reference atlas.

```python
import celltypist
import scanpy as sc

adata = sc.read_h5ad('adata_processed.h5ad')

# List available models
celltypist.models.models_description()

# Download model
celltypist.models.download_models(model='Immune_All_Low.pkl')

# Load model
model = celltypist.models.Model.load(model='Immune_All_Low.pkl')

# Predict cell types
# (CellTypist expects log1p-normalized expression scaled to 10,000 counts per cell)
predictions = celltypist.annotate(adata, model=model, majority_voting=True)

# Add predictions to adata
adata = predictions.to_adata()

# Access predictions
adata.obs['cell_type_celltypist'] = adata.obs['majority_voting']
adata.obs['cell_type_confidence'] = adata.obs['conf_score']

# Visualize
sc.pl.umap(adata, color=['cell_type_celltypist', 'conf_score'])
```
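
To make the "majority voting for cluster-level consensus" step concrete, here is a simplified sketch of the idea in plain Python. It is not CellTypist's implementation (which over-clusters the data before voting); the function name `majority_vote` and the toy labels are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def majority_vote(per_cell_labels, clusters):
    """Replace each cell's label with the most common label in its cluster."""
    by_cluster = defaultdict(list)
    for label, cluster in zip(per_cell_labels, clusters):
        by_cluster[cluster].append(label)
    # most_common(1) returns [(label, count)]; ties go to the label seen first.
    winner = {c: Counter(labels).most_common(1)[0][0] for c, labels in by_cluster.items()}
    return [winner[c] for c in clusters]


# Toy example: one stray 'NK cell' call inside a T-cell cluster gets overruled.
labels = ["T cell", "T cell", "NK cell", "B cell", "B cell"]
clusters = [0, 0, 0, 1, 1]
print(majority_vote(labels, clusters))
```

The trade-off is that rare populations sitting inside a larger cluster can be voted away, so it is worth comparing the per-cell and voted labels before relying on the consensus column.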

## CellTypist with Custom Model

**Goal:** Train a custom CellTypist model on a reference dataset for domain-specific annotation.

**Approach:** Train a logistic regression classifier on labeled reference data with feature selection, then apply to query data.

```python
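import celltypist

# Train custom model
# (adata_reference: labeled reference AnnData with cell types in obs['cell_type'])
new_model = celltypist.train(adata_reference, labels='cell_type', n_jobs=10,
                              feature_selection=True, use_SGD=True)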

# Save model
new_model.write('custom_model.pkl')

# Use custom model
predictions = celltypist.annotate(adata_query, model='custom_model.pkl')
```

## SingleR (R)

**Goal:** Annotate cell types by correlating expression profiles against curated reference datasets.

**Approach:** Compare each cell's expression to reference transcriptomes using SingleR's correlation-based assignment, with pruning for low-confidence calls.

```r
library(SingleR)
library(celldex)
library(Seurat)
library(SingleCellExperiment)

seurat_obj <- readRDS('seurat_processed.rds')
sce <- as.SingleCellExperiment(seurat_obj)

# Load reference (multiple available)
ref <- celldex::HumanPrimaryCellAtlasData()
# Other options:
# ref <- celldex::BlueprintEncodeData()
# ref <- celldex::MonacoImmuneData()
# ref <- celldex::ImmGenData()  # mouse

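# Run SingleR
# (celldex references are bulk profiles, so the default de.method = 'classic' applies;
#  de.method = 'wilcox' is intended for single-cell references)
pred <- SingleR(test = sce, ref = ref, labels = ref$label.main)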

# Add to Seurat
seurat_obj$SingleR_labels <- pred$labels
seurat_obj$SingleR_pruned <- pred$pruned.labels

# Check annotation quality
plotScoreHeatmap(pred)
plotDeltaDistribution(pred)
```
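
The correlation-based assignment described in the approach can be illustrated with a small standard-library sketch: rank-correlate one cell against each reference profile and take the best-scoring label. This shows only the core idea. SingleR itself restricts the comparison to marker genes, scores against multiple reference samples per label, and runs a fine-tuning step, none of which is reproduced here. The function names and the toy expression values are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks (0.0 if either is constant)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var) if var > 0 else 0.0


def assign_label(cell, reference_profiles):
    """Return (best label, {label: correlation}) for one cell's expression vector."""
    scores = {label: spearman(cell, profile) for label, profile in reference_profiles.items()}
    return max(scores, key=scores.get), scores


# Toy values over four genes, in the same gene order for the cell and every profile.
reference_profiles = {
    "T cell": [9, 1, 0, 3],
    "B cell": [0, 8, 1, 0],
    "Monocyte": [1, 0, 9, 1],
}
best, scores = assign_label([7, 0, 1, 2], reference_profiles)
print(best, round(scores[best], 3))
```

The gap between the best and second-best score is the intuition behind the delta values that `plotDeltaDistribution` displays and that pruning acts on.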

## SingleR Fine Labels

```r
# Use fine-grained labels
pred_fine <- SingleR(test = sce, ref = ref, labels = ref$label.fine)

# Combine multiple references
ref1 <- celldex::BlueprintEncodeData()
ref2 <- celldex::MonacoImmuneData()
pred_combined <- SingleR(test = sce, ref = list(BP = ref1, Monaco = ref2),
                          labels = list(ref1$label.main, ref2$label.main))
```

## Azimuth (R/Seurat)

**Goal:** Annotate cell types using Seurat's Azimuth reference-mapping framework.

**Approach:** Map query cells onto a pre-built Azimuth reference atlas to transfer cell type labels with confidence scores.

```r
library(Seurat)
library(Azimuth)

seurat_obj <- readRDS('seurat_processed.rds')

# Run Azimuth with PBMC reference
seurat_obj <- RunAzimuth(seurat_obj, reference = 'pbmcref')

# Available references: pbmcref, bonemarrowref, lungref, etc.

# Access predictions
seurat_obj$azimuth_labels <- seurat_obj$predicted.celltype.l2
seurat_obj$azimuth_score <- seurat_obj$predicted.celltype.l2.score

# Visualize
DimPlot(seurat_obj, group.by = 'azimuth_labels', label = TRUE) + NoLegend()
FeaturePlot(seurat_obj, features = 'predicted.celltype.l2.score')
```

## scPred (R)

**Goal:** Train and apply a supervised classifier for cell type prediction using scPred.

**Approach:** Extract informative PCA features from a labeled reference, train an SVM/RF classifier, and predict cell types on query data.

```r
library(scPred)
library(Seurat)

# Train on reference
reference <- readRDS('reference_seurat.rds')
reference <- getFeatureSpace(reference, 'cell_type')
reference <- trainModel(reference)

# Get training probabilities
get_probabilities(reference)
get_scpred(reference)

# Plot model performance
plot_probabilities(reference)

# Predict on query
query <- readRDS('query_seurat.rds')
query <- scPredict(query, reference)

# Results
query$scpred_prediction
query$scpred_max
```

## Annotation Confidence Filtering

```python
# CellTypist: filter low confidence
high_conf = adata[adata.obs['conf_score'] > 0.5].copy()

# Flag uncertain cells
adata.obs['annotation_uncertain'] = adata.obs['conf_score'] < 0.3
```

```r
# SingleR: use pruned labels (low-quality removed)
seurat_obj$final_labels <- ifelse(is.na(pred$pruned.labels), 'Unknown', pred$labels)

# Azimuth: filter by score
seurat_obj$high_conf_labels <- ifelse(seurat_obj$predicted.celltype.l2.score > 0.7,
                                       seurat_obj$predicted.celltype.l2, 'Low_confidence')
```
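
The cutoffs used above (0.5, 0.3, 0.7) are probably best treated as starting points. One way to sanity-check a cutoff is to look at how many cells of each predicted type it would discard. The sketch below does this in plain Python; `low_confidence_fraction` is a hypothetical helper and the labels and scores are toy values.

```python
from collections import Counter


def low_confidence_fraction(labels, scores, threshold=0.5):
    """Per predicted label, the fraction of cells that a 'score > threshold' filter drops."""
    total = Counter(labels)
    dropped = Counter(label for label, score in zip(labels, scores) if not score > threshold)
    return {label: dropped[label] / total[label] for label in total}


labels = ["T", "T", "B", "B", "B", "NK"]
scores = [0.9, 0.4, 0.8, 0.7, 0.2, 0.1]
for label, fraction in low_confidence_fraction(labels, scores).items():
    print(f"{label}: {fraction:.0%} of cells at or below the cutoff")
```

A label that loses most of its cells at a given cutoff may be absent from the reference or poorly separated from a neighbor, which is worth checking before filtering.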

## Consensus Annotation

**Goal:** Combine predictions from multiple annotation tools into a single consensus label per cell.

**Approach:** Aggregate labels from SingleR, Azimuth, and CellTypist using majority voting, flagging ambiguous cells where methods disagree.

```r
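# Combine multiple methods
# Assumes CellTypist labels were exported from Python and added to the Seurat metadata
# as 'celltypist_labels', and that all three columns use one shared label vocabulary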
annotations <- data.frame(
    SingleR = seurat_obj$SingleR_labels,
    Azimuth = seurat_obj$azimuth_labels,
    CellTypist = seurat_obj$celltypist_labels
)

# Majority vote
get_consensus <- function(x) {
    tbl <- table(x)
    if (max(tbl) >= 2) names(which.max(tbl)) else 'Ambiguous'
}
seurat_obj$consensus_label <- apply(annotations, 1, get_consensus)
```
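
The same majority-vote rule can be sketched in Python, with an explicit harmonization step, since each tool names cell types differently and a vote only works on a shared vocabulary. The `HARMONIZE` mapping below is a hypothetical, illustrative example; the label strings should be checked against the actual output of each tool.

```python
from collections import Counter

# Hypothetical mapping from tool-specific labels to one shared vocabulary.
HARMONIZE = {
    "T_cells": "T cell",
    "CD4 TCM": "T cell",
    "B_cell": "B cell",
    "B naive": "B cell",
}


def harmonize(label):
    """Map a tool-specific label to the shared vocabulary; unknown labels pass through."""
    return HARMONIZE.get(label, label)


def consensus_label(votes, min_votes=2, ambiguous="Ambiguous"):
    """Most common label among the votes if at least min_votes agree, else 'Ambiguous'."""
    counts = Counter(v for v in votes if v is not None)
    if not counts:
        return ambiguous
    label, n = counts.most_common(1)[0]
    return label if n >= min_votes else ambiguous


# One cell, three tools: two calls agree once harmonized.
votes = [harmonize(v) for v in ["T_cells", "CD4 TCM", "NK cells"]]
print(consensus_label(votes))
```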

## Compare Annotations

**Goal:** Quantitatively assess agreement between different annotation methods.

**Approach:** Compute adjusted Rand index and normalized mutual information between label sets, and build a confusion matrix.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

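# Compare two annotations
# ('manual_annotation' is assumed to hold curated labels;
#  'cell_type_celltypist' is the column created in the CellTypist section)
ari = adjusted_rand_score(adata.obs['manual_annotation'], adata.obs['cell_type_celltypist'])
nmi = normalized_mutual_info_score(adata.obs['manual_annotation'], adata.obs['cell_type_celltypist'])
print(f'ARI={ari:.3f}, NMI={nmi:.3f}')

# Confusion matrix
pd.crosstab(adata.obs['manual_annotation'], adata.obs['cell_type_celltypist'])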
```
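
For intuition about what the adjusted Rand index measures, here is a from-scratch sketch using only the standard library. It counts pairs of cells grouped together in both labelings and corrects for chance agreement. It is meant as an illustration, not a replacement for `adjusted_rand_score`; the function name and toy labels are assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells."""
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare
    # Pairs of cells that share a label in both labelings, in A only, and in B only.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings put every cell in one group
    return (pairs_joint - expected) / (maximum - expected)


manual = ["T", "T", "B", "B", "NK", "NK"]
renamed = ["T cells", "T cells", "B cells", "B cells", "NK cells", "NK cells"]
coarse = ["x", "x", "x", "y", "y", "y"]
print(adjusted_rand_index(manual, renamed))
print(round(adjusted_rand_index(manual, coarse), 3))
```

Because the index depends only on which cells are grouped together, two tools with different label names but identical groupings still score 1.0, whereas the confusion matrix shows which names correspond.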

## Marker-Based Validation

```r
# Validate predictions with known markers
canonical_markers <- list(
    T_cell = c('CD3D', 'CD3E', 'CD4', 'CD8A'),
    B_cell = c('CD19', 'MS4A1', 'CD79A'),
    Monocyte = c('CD14', 'LYZ', 'S100A8'),
    NK = c('NKG7', 'GNLY', 'NCAM1')
)

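# Check marker expression per predicted type
# (passing the named list groups markers by cell type; swap 'SingleR_labels' for
#  'azimuth_labels' or 'consensus_label' to check another annotation)
DotPlot(seurat_obj, features = canonical_markers, group.by = 'SingleR_labels') +
    RotatedAxis()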
```
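
A numeric counterpart to the dot plot can help flag labels whose canonical markers are not enriched. The sketch below, in plain Python, compares mean marker expression inside each predicted type against all other cells. The helper `marker_support`, the data layout, and the toy values are assumptions for illustration; in practice the expression values would come from the normalized matrix.

```python
from statistics import mean


def marker_support(expression, predicted, markers):
    """Per label: mean marker expression inside the label minus outside it.

    expression: {gene: [value per cell]}  (every marker gene must be present)
    predicted:  [predicted label per cell]
    markers:    {label: [marker genes]}
    """
    support = {}
    for label, genes in markers.items():
        # Per-cell score: average expression over this label's marker genes.
        scores = [mean(expression[g][i] for g in genes) for i in range(len(predicted))]
        inside = [s for s, p in zip(scores, predicted) if p == label]
        outside = [s for s, p in zip(scores, predicted) if p != label]
        if not inside or not outside:
            support[label] = None  # label absent, or no other cells to compare against
            continue
        support[label] = mean(inside) - mean(outside)
    return support


expression = {"CD3D": [3, 2, 0, 0], "MS4A1": [0, 0, 4, 2]}
predicted = ["T_cell", "T_cell", "B_cell", "B_cell"]
markers = {"T_cell": ["CD3D"], "B_cell": ["MS4A1"]}
print(marker_support(expression, predicted, markers))
```

A value near zero or below suggests the predicted type does not express its expected markers more than other cells, which is a cue to inspect that label by hand.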

## Related Skills

- single-cell/clustering - Manual marker-based annotation
- single-cell/cell-communication - Use annotated types for CCC
- single-cell/trajectory-inference - Trajectory on annotated data
