bio-single-cell-markers-annotation

Find marker genes and annotate cell types in single-cell RNA-seq using Seurat (R) and Scanpy (Python). Use for differential expression between clusters, identifying cluster-specific markers, scoring gene sets, and assigning cell type labels. Use when finding marker genes and annotating clusters.

1,802 stars

Best use case

bio-single-cell-markers-annotation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Find marker genes and annotate cell types in single-cell RNA-seq using Seurat (R) and Scanpy (Python). Use for differential expression between clusters, identifying cluster-specific markers, scoring gene sets, and assigning cell type labels. Use when finding marker genes and annotating clusters.

Teams using bio-single-cell-markers-annotation should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/bio-single-cell-markers-annotation/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-single-cell-markers-annotation/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/bio-single-cell-markers-annotation/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How bio-single-cell-markers-annotation Compares

Feature / Agentbio-single-cell-markers-annotationStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Find marker genes and annotate cell types in single-cell RNA-seq using Seurat (R) and Scanpy (Python). Use for differential expression between clusters, identifying cluster-specific markers, scoring gene sets, and assigning cell type labels. Use when finding marker genes and annotating clusters.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

## Version Compatibility

Reference examples tested with: DESeq2 1.42+, pandas 2.2+, scanpy 1.10+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Marker Genes and Cell Type Annotation

Find differentially expressed genes between clusters and annotate cell types.

## Scanpy (Python)

**Goal:** Identify cluster-specific marker genes, score gene sets, and annotate cell types using Scanpy.

**Approach:** Perform differential expression testing between clusters with Wilcoxon rank-sum, visualize markers with dot plots and heatmaps, and assign cell type labels manually.

**"Find marker genes for each cluster"** → Test each cluster against all others for differentially expressed genes and rank by statistical significance and fold change.

### Required Imports

```python
import scanpy as sc
import pandas as pd
```

### Find Markers for All Clusters

```python
# Find marker genes for each cluster vs all others
sc.tl.rank_genes_groups(adata, groupby='leiden', method='wilcoxon')

# View top markers
sc.pl.rank_genes_groups(adata, n_genes=10, sharey=False)

# Get results as DataFrame
markers = sc.get.rank_genes_groups_df(adata, group=None)
print(markers.head(20))
```

### Marker Detection Methods

```python
# Wilcoxon rank-sum test (default, recommended)
sc.tl.rank_genes_groups(adata, groupby='leiden', method='wilcoxon')

# t-test
sc.tl.rank_genes_groups(adata, groupby='leiden', method='t-test')

# Logistic regression
sc.tl.rank_genes_groups(adata, groupby='leiden', method='logreg')
```

### Filter Markers

```python
# Get markers with filters
markers = sc.get.rank_genes_groups_df(adata, group='0')
significant = markers[(markers['pvals_adj'] < 0.05) & (markers['logfoldchanges'] > 1)]
print(f'Cluster 0 significant markers: {len(significant)}')

# Filter all groups
sc.tl.filter_rank_genes_groups(adata, min_fold_change=1.5, min_in_group_fraction=0.25)
```

### Compare Specific Clusters

```python
# Find markers between two specific clusters
sc.tl.rank_genes_groups(adata, groupby='leiden', groups=['0'], reference='1', method='wilcoxon')
sc.pl.rank_genes_groups(adata, n_genes=10)
```

### Visualize Marker Expression

```python
# Dot plot of top markers per cluster
markers_to_plot = ['CD3D', 'CD8A', 'MS4A1', 'CD14', 'FCGR3A', 'NKG7']
sc.pl.dotplot(adata, var_names=markers_to_plot, groupby='leiden')

# Stacked violin
sc.pl.stacked_violin(adata, var_names=markers_to_plot, groupby='leiden')

# Heatmap
sc.pl.rank_genes_groups_heatmap(adata, n_genes=5, groupby='leiden')

# Matrix plot
sc.pl.matrixplot(adata, var_names=markers_to_plot, groupby='leiden')
```

### Gene Set Scoring

```python
# Score cells for gene set expression
t_cell_genes = ['CD3D', 'CD3E', 'CD4', 'CD8A', 'CD8B']
sc.tl.score_genes(adata, gene_list=t_cell_genes, score_name='T_cell_score')

# Visualize score
sc.pl.umap(adata, color='T_cell_score')
```

### Cell Cycle Scoring

```python
# Score cell cycle phases
s_genes = ['MCM5', 'PCNA', 'TYMS', 'FEN1', 'MCM2']  # S phase genes
g2m_genes = ['HMGB2', 'CDK1', 'NUSAP1', 'UBE2C', 'BIRC5']  # G2/M genes

sc.tl.score_genes_cell_cycle(adata, s_genes=s_genes, g2m_genes=g2m_genes)
sc.pl.umap(adata, color=['S_score', 'G2M_score', 'phase'])
```

### Manual Cell Type Annotation

```python
# Create annotation dictionary
cluster_annotations = {
    '0': 'CD4 T cells',
    '1': 'CD14 Monocytes',
    '2': 'B cells',
    '3': 'CD8 T cells',
    '4': 'NK cells',
    '5': 'FCGR3A Monocytes'
}

# Add annotations
adata.obs['cell_type'] = adata.obs['leiden'].map(cluster_annotations)

# Visualize
sc.pl.umap(adata, color='cell_type')
```

### Export Markers

```python
# Export all markers to CSV
markers = sc.get.rank_genes_groups_df(adata, group=None)
markers.to_csv('all_markers.csv', index=False)

# Export top markers per cluster
top_markers = markers.groupby('group').head(20)
top_markers.to_csv('top_markers.csv', index=False)
```

---

## Seurat (R)

**Goal:** Identify cluster-specific marker genes, score gene modules, and annotate cell types using Seurat.

**Approach:** Run FindAllMarkers with Wilcoxon or MAST tests, visualize with FeaturePlot/DotPlot/DoHeatmap, and rename cluster identities with cell type labels.

### Required Libraries

```r
library(Seurat)
library(dplyr)
```

### Find All Markers

```r
# Find markers for all clusters
all_markers <- FindAllMarkers(seurat_obj, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)

# View top markers per cluster
top_markers <- all_markers %>%
    group_by(cluster) %>%
    slice_max(n = 5, order_by = avg_log2FC)
print(top_markers)
```

### Find Markers for Specific Cluster

```r
# Markers for cluster 0 vs all others
cluster0_markers <- FindMarkers(seurat_obj, ident.1 = 0, min.pct = 0.25)
head(cluster0_markers)
```

### Compare Two Clusters

```r
# Find markers between two specific clusters
markers_0_vs_1 <- FindMarkers(seurat_obj, ident.1 = 0, ident.2 = 1, min.pct = 0.25)
head(markers_0_vs_1)
```

### Marker Detection Methods

```r
# Wilcoxon (default, fast)
markers <- FindMarkers(seurat_obj, ident.1 = 0, test.use = 'wilcox')

# MAST (recommended for DE)
markers <- FindMarkers(seurat_obj, ident.1 = 0, test.use = 'MAST')

# DESeq2
markers <- FindMarkers(seurat_obj, ident.1 = 0, test.use = 'DESeq2')

# Logistic regression
markers <- FindMarkers(seurat_obj, ident.1 = 0, test.use = 'LR')
```

### Visualize Markers

```r
# Feature plot on UMAP
FeaturePlot(seurat_obj, features = c('CD3D', 'MS4A1', 'CD14', 'NKG7'))

# Violin plot
VlnPlot(seurat_obj, features = c('CD3D', 'MS4A1', 'CD14'))

# Dot plot
markers_to_plot <- c('CD3D', 'CD8A', 'MS4A1', 'CD14', 'FCGR3A', 'NKG7')
DotPlot(seurat_obj, features = markers_to_plot) + RotatedAxis()

# Heatmap
top10 <- all_markers %>%
    group_by(cluster) %>%
    top_n(n = 10, wt = avg_log2FC)
DoHeatmap(seurat_obj, features = top10$gene)
```

### Gene Module Scoring

```r
# Score cells for gene set
t_cell_genes <- list(c('CD3D', 'CD3E', 'CD4', 'CD8A', 'CD8B'))
seurat_obj <- AddModuleScore(seurat_obj, features = t_cell_genes, name = 'T_cell_score')

# Visualize
FeaturePlot(seurat_obj, features = 'T_cell_score1')
```

### Cell Cycle Scoring

```r
# Built-in cell cycle genes
s.genes <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes

seurat_obj <- CellCycleScoring(seurat_obj, s.features = s.genes, g2m.features = g2m.genes)
DimPlot(seurat_obj, group.by = 'Phase')
```

### Manual Cell Type Annotation

```r
# Rename cluster identities
new_cluster_ids <- c(
    '0' = 'CD4 T cells',
    '1' = 'CD14 Monocytes',
    '2' = 'B cells',
    '3' = 'CD8 T cells',
    '4' = 'NK cells',
    '5' = 'FCGR3A Monocytes'
)

seurat_obj <- RenameIdents(seurat_obj, new_cluster_ids)
DimPlot(seurat_obj, reduction = 'umap', label = TRUE)

# Store in metadata
seurat_obj$cell_type <- Idents(seurat_obj)
```

### Export Markers

```r
# Export to CSV
write.csv(all_markers, file = 'all_markers.csv', row.names = FALSE)

# Export top markers
write.csv(top_markers, file = 'top_markers.csv', row.names = FALSE)
```

---

## Common PBMC Markers

| Cell Type | Markers |
|-----------|---------|
| CD4 T cells | CD3D, CD4, IL7R |
| CD8 T cells | CD3D, CD8A, CD8B |
| B cells | MS4A1, CD79A, CD19 |
| NK cells | NKG7, GNLY, NCAM1 |
| CD14 Monocytes | CD14, LYZ, S100A8 |
| FCGR3A Monocytes | FCGR3A, MS4A7 |
| Dendritic cells | FCER1A, CST3 |
| Platelets | PPBP, PF4 |

## Method Comparison

| Task | Scanpy | Seurat |
|------|--------|--------|
| All markers | `rank_genes_groups()` | `FindAllMarkers()` |
| Specific cluster | `rank_genes_groups(groups=['0'])` | `FindMarkers(ident.1=0)` |
| Two clusters | `rank_genes_groups(reference='1')` | `FindMarkers(ident.1=0, ident.2=1)` |
| Gene scoring | `score_genes()` | `AddModuleScore()` |
| Dot plot | `sc.pl.dotplot()` | `DotPlot()` |

## Related Skills

- clustering - Must cluster before finding markers
- preprocessing - Data must be normalized
- data-io - Export annotated data

Related Skills

tooluniverse-single-cell

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Production-ready single-cell and expression matrix analysis using scanpy, anndata, and scipy. Performs scRNA-seq QC, normalization, PCA, UMAP, Leiden/Louvain clustering, differential expression (Wilcoxon, t-test, DESeq2), cell type annotation, per-cell-type statistical analysis, gene-expression correlation, batch correction (Harmony), trajectory inference, and cell-cell communication analysis. NEW: Analyzes ligand-receptor interactions between cell types using OmniPath (CellPhoneDB, CellChatDB), scores communication strength, identifies signaling cascades, and handles multi-subunit receptor complexes. Integrates with ToolUniverse gene annotation tools (HPA, Ensembl, MyGene, UniProt) and enrichment tools (gseapy, PANTHER, STRING). Supports h5ad, 10X, CSV/TSV count matrices, and pre-annotated datasets. Use when analyzing single-cell RNA-seq data, studying cell-cell interactions, performing cell type differential expression, computing gene-expression correlations by cell type, analyzing tumor-immune communication, or answering questions about scRNA-seq datasets.

single-trajectory-analysis

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Guide to reproducing OmicVerse trajectory workflows spanning PAGA, Palantir, VIA, velocity coupling, and fate scoring notebooks.

single2spatial-spatial-mapping

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Map scRNA-seq atlases onto spatial transcriptomics slides using omicverse's Single2Spatial workflow for deep-forest training, spot-level assessment, and marker visualisation.

single-cell-preprocessing-with-omicverse

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Walk through omicverse's single-cell preprocessing tutorials to QC PBMC3k data, normalise counts, detect HVGs, and run PCA/embedding pipelines on CPU, CPU–GPU mixed, or GPU stacks.

single-cell-multi-omics-integration

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Quick-reference sheet for OmicVerse tutorials spanning MOFA, GLUE pairing, SIMBA integration, TOSICA transfer, and StaVIA cartography.

single-cell-downstream-analysis

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Checklist-style reference for OmicVerse downstream tutorials covering AUCell scoring, metacell DEG, and related exports.

single-cell-clustering-and-batch-correction-with-omicverse

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Guide Claude through omicverse's single-cell clustering workflow, covering preprocessing, QC, multimethod clustering, topic modeling, cNMF, and cross-batch integration as demonstrated in t_cluster.ipynb and t_single_batch.ipynb.

single-cell-cellphonedb-communication-mapping

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Run omicverse's CellPhoneDB v5 wrapper on annotated single-cell data to infer ligand-receptor networks and produce CellChat-style visualisations.

single-cell-rna-qc

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Performs quality control on single-cell RNA-seq data (.h5ad or .h5 files) using scverse best practices with MAD-based filtering and comprehensive visualizations. Use when users request QC analysis, filtering low-quality cells, assessing data quality, or following scverse/scanpy best practices for single-cell analysis.

single-cell-annotation-skills-with-omicverse

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Guide Claude through SCSA, MetaTiME, CellVote, CellMatch, GPTAnno, and weighted KNN transfer workflows for annotating single-cell modalities.

cellxgene-census

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Query CZ CELLxGENE Census (61M+ cells). Filter by cell type/tissue/disease, retrieve expression data, integrate with scanpy/PyTorch, for population-scale single-cell analysis.

cell-free-expression

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Guidance for cell-free protein synthesis (CFPS) optimization. Use when: (1) Planning CFPS experiments, (2) Troubleshooting low yield or aggregation, (3) Optimizing DNA template design for CFPS, (4) Expressing difficult proteins (disulfide-rich, toxic, membrane).