bio-flow-cytometry-clustering-phenotyping

Unsupervised clustering and cell type identification for flow/mass cytometry. Covers FlowSOM, Phenograph, and CATALYST workflows. Use when discovering cell populations in high-dimensional cytometry data without predefined gates.

by FreedomIntelligence

View on GitHub

Best use case

bio-flow-cytometry-clustering-phenotyping is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-flow-cytometry-clustering-phenotyping should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/bio-flow-cytometry-clustering-phenotyping/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-flow-cytometry-clustering-phenotyping/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/bio-flow-cytometry-clustering-phenotyping/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

Frequently Asked Questions

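What does this skill do?

It guides unsupervised clustering and cell type identification for flow and mass cytometry data with FlowSOM, Phenograph, and CATALYST workflows, for discovering cell populations without predefined gates.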

Where can I find the source code?

The source is in the FreedomIntelligence/OpenClaw-Medical-Skills repository on GitHub, at skills/bio-flow-cytometry-clustering-phenotyping/SKILL.md.

SKILL.md Source

## Version Compatibility

Reference examples tested with: FlowSOM 2.10+, scanpy 1.10+

Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters

If code fails with an error such as `could not find function` or `unused argument` (or ImportError, AttributeError, or TypeError in Python), inspect the installed package and adapt the example to match the actual API rather than retrying.
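Here is a minimal sketch for checking versions programmatically before running the examples. `check_versions()` is a hypothetical helper and is not part of any package. It returns `TRUE` or `FALSE` for each package, or `NA` when the package is not installed.

```r
# Hypothetical helper: check installed versions against minimum versions.
# Returns TRUE/FALSE per package, or NA if the package is not installed.
check_versions <- function(minimums) {
  vapply(names(minimums), function(pkg) {
    # requireNamespace() returns FALSE quietly when a package is absent
    if (!requireNamespace(pkg, quietly = TRUE)) return(NA)
    utils::packageVersion(pkg) >= minimums[[pkg]]
  }, logical(1))
}

# Minimum taken from the tested version listed above
check_versions(c(FlowSOM = "2.10.0"))
```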

# Clustering and Phenotyping

**"Cluster my cytometry data to find cell types"** → Discover cell populations in high-dimensional flow/mass cytometry data using unsupervised clustering without predefined gates.
- R: `FlowSOM::FlowSOM()` for self-organizing map clustering
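- R: `CATALYST::cluster()` for FlowSOM clustering with ConsensusClusterPlus metaclustering
- R: `Rphenograph::Rphenograph()` for graph-based Phenograph clustering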

## FlowSOM Clustering

**Goal:** Cluster cytometry events into cell populations using self-organizing maps.

**Approach:** Build a FlowSOM grid on marker channels, then extract metacluster assignments per cell.

```r
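library(flowCore)  # read.FCS(), arcsinhTransform(), fr_append_cols()
library(FlowSOM)

# Read one FCS file (placeholder path)
fcs <- read.FCS('sample.fcs', transformation = FALSE)

# Marker channels; assumes marker names appear in the channel names.
# If they only appear in the descriptions, match against markernames(fcs).
marker_cols <- grep('CD|HLA', colnames(fcs), value = TRUE)

# Arcsinh-transform the markers: asinh(x / 5), cofactor 5 for mass cytometry
asinh_tf <- arcsinhTransform(a = 0, b = 1/5, c = 0)
fcs <- transform(fcs, transformList(marker_cols, asinh_tf))

# Build SOM and metacluster its nodes
fsom <- FlowSOM(fcs,
                colsToUse = marker_cols,
                xdim = 10, ydim = 10,
                nClus = 20,
                seed = 42)

# Metacluster assignment for every cell
clusters <- GetMetaclusters(fsom)

# Append IDs as a new numeric channel with fr_append_cols()
# rather than widening exprs() directly
fcs <- fr_append_cols(fcs, matrix(as.numeric(clusters), ncol = 1,
                                  dimnames = list(NULL, 'cluster')))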
```
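The base-R sketch below makes the two-level structure concrete: many small nodes first, then metaclusters built from the node centroids. It is an approximation under stated assumptions. `kmeans()` stands in for the self-organizing map, and average-linkage `hclust()` stands in for FlowSOM's consensus metaclustering. `two_level_cluster()` is a hypothetical name.

```r
# Sketch of two-level clustering on a plain numeric matrix (cells x markers).
# ASSUMPTION: kmeans() replaces the SOM step and hclust() replaces the
# consensus metaclustering that FlowSOM actually uses.
two_level_cluster <- function(x, n_nodes = 100, n_meta = 20, seed = 42) {
  set.seed(seed)
  # Level 1: overcluster cells into many small nodes
  km <- kmeans(x, centers = n_nodes, iter.max = 50)
  # Level 2: cluster the node centroids into metaclusters
  hc <- hclust(dist(km$centers), method = "average")
  node_to_meta <- cutree(hc, k = n_meta)
  # Each cell inherits the metacluster of its node
  list(node = km$cluster, meta = unname(node_to_meta[km$cluster]))
}

# Simulated data: three well-separated populations in five markers
set.seed(1)
x <- rbind(matrix(rnorm(500, 0), ncol = 5),
           matrix(rnorm(500, 4), ncol = 5),
           matrix(rnorm(500, 8), ncol = 5))
res <- two_level_cluster(x, n_nodes = 25, n_meta = 3)
table(res$meta)
```

Metaclustering the node centroids instead of the cells keeps the second step cheap. The hierarchical step only ever sees `n_nodes` points.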

## CATALYST Workflow (Full Pipeline)

**Goal:** Run the complete CATALYST clustering pipeline from flowSet to annotated cell populations.

**Approach:** Convert flowSet to SingleCellExperiment with prepData, then cluster on type markers with FlowSOM via CATALYST.

```r
library(CATALYST)
library(SingleCellExperiment)

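# Create SCE from a flowSet (fs), a panel table (columns fcs_colname,
# antigen, marker_class) and sample metadata md (file_name, sample_id,
# condition, patient_id); these are CATALYST's default column names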
sce <- prepData(fs, panel, md, transform = TRUE, cofactor = 5)

# Clustering
sce <- cluster(sce,
               features = 'type',  # Use 'type' markers from panel
               xdim = 10, ydim = 10,
               maxK = 20,
               seed = 42)

# View cluster assignments
table(cluster_ids(sce, 'meta20'))
```
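With `transform = TRUE`, prepData stores arcsinh-transformed values in the `exprs` assay, computed as `asinh(x / cofactor)`. The sketch below reproduces that formula in base R to show its shape: roughly linear near zero and logarithmic for large values. A cofactor of 5 is the usual choice for mass cytometry. Fluorescence flow data commonly uses a larger one (often around 150), so treat the value as data-dependent.

```r
# asinh(x / cofactor): the transform prepData applies with transform = TRUE
arcsinh_transform <- function(x, cofactor = 5) {
  asinh(x / cofactor)
}

raw <- c(-10, 0, 5, 50, 500, 5000)       # simulated raw intensities
arcsinh_transform(raw)                   # cofactor 5 (mass cytometry)
arcsinh_transform(raw, cofactor = 150)   # larger cofactor, common for flow
```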

## Phenograph Clustering

**Goal:** Identify cell populations using graph-based community detection on marker expression.

**Approach:** Build a k-nearest-neighbor graph on type markers, then partition with Louvain community detection via Rphenograph.

```r
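library(Rphenograph)   # installed from GitHub (JinmiaoChenLab/Rphenograph)
library(igraph)        # membership()
library(SingleCellExperiment)

# Arcsinh-transformed expression (markers x cells) from the CATALYST SCE
expr <- assay(sce, 'exprs')
type_markers <- rowData(sce)$marker_class == 'type'

# Rphenograph expects cells in rows, so transpose
pheno_result <- Rphenograph(t(expr[type_markers, ]), k = 30)

# The second element holds the Louvain communities
sce$phenograph <- factor(membership(pheno_result[[2]]))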
```
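Phenograph first builds a k-nearest-neighbor graph in which each edge is weighted by the Jaccard overlap of the two cells' neighbor sets. Only then does it run Louvain. The base-R sketch below covers that first step under stated assumptions. It uses brute-force distances, which are only practical for a few thousand cells, and a hypothetical helper name, `jaccard_knn_graph()`.

```r
# Hypothetical helper: Jaccard-weighted kNN graph as an edge list.
# Brute-force distances: O(n^2) memory, so only for small n.
jaccard_knn_graph <- function(x, k = 30) {
  d <- as.matrix(dist(x))
  diag(d) <- Inf  # a cell is not its own neighbor
  # Row i holds the indices of the k nearest neighbors of cell i
  nn <- do.call(rbind, lapply(seq_len(nrow(d)),
                              function(i) order(d[i, ])[seq_len(k)]))
  edges <- data.frame(from = rep(seq_len(nrow(x)), each = k),
                      to   = as.vector(t(nn)))
  # Jaccard index of two neighbor sets that both have size k:
  # |intersection| / |union| = shared / (2k - shared)
  edges$weight <- mapply(function(i, j) {
    shared <- length(intersect(nn[i, ], nn[j, ]))
    shared / (2 * k - shared)
  }, edges$from, edges$to)
  edges
}

# Simulated data: two populations in two markers
set.seed(1)
x <- rbind(matrix(rnorm(100, 0), ncol = 2),
           matrix(rnorm(100, 6), ncol = 2))
edges <- jaccard_knn_graph(x, k = 10)
head(edges)
```

To obtain communities, the edge list (with its `weight` column) could then be handed to igraph, for example `graph_from_data_frame(edges, directed = FALSE)` followed by `cluster_louvain()`. Rphenograph performs roughly these steps internally, with a faster neighbor search.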

## Dimensionality Reduction

**Goal:** Project high-dimensional cytometry data into 2D for visualization of cell populations.

**Approach:** Run UMAP or tSNE on type marker channels using CATALYST's runDR wrapper, then plot colored by cluster.

```r
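# Embeddings are stochastic; fix the seed for reproducibility
set.seed(42)

# UMAP on type markers; cells = 1000 caps cells per sample to save time
sce <- runDR(sce, dr = 'UMAP', cells = 1000, features = 'type')

# tSNE
sce <- runDR(sce, dr = 'TSNE', cells = 1000, features = 'type')

# Plot, colored by metacluster
plotDR(sce, 'UMAP', color_by = 'meta20')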
```
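Large experiments are usually subsampled before UMAP or tSNE. The `cells` argument of runDR controls this by setting a maximum number of cells per sample. The base-R sketch below shows the idea with a hypothetical helper, `subsample_per_sample()`. It uses PCA via `prcomp()` as a cheap linear stand-in for the nonlinear embeddings.

```r
# Hypothetical helper: keep at most n_max randomly chosen cells per sample
subsample_per_sample <- function(sample_id, n_max, seed = 42) {
  set.seed(seed)
  idx_by_sample <- split(seq_along(sample_id), sample_id)
  keep <- lapply(idx_by_sample, function(i) {
    # Index into i rather than calling sample(i, ...), which misbehaves
    # when a sample has a single cell
    if (length(i) > n_max) i[sample.int(length(i), n_max)] else i
  })
  sort(unlist(keep, use.names = FALSE))
}

# Simulated data: three samples of unequal size, 10 markers
set.seed(1)
sample_id <- rep(c("s1", "s2", "s3"), times = c(5000, 800, 2000))
x <- matrix(rnorm(length(sample_id) * 10), ncol = 10)

keep <- subsample_per_sample(sample_id, n_max = 1000)
table(sample_id[keep])  # s1 and s3 capped at 1000; s2 keeps all 800

# Linear 2-D embedding of the subset as a cheap stand-in for UMAP/tSNE
pca <- prcomp(x[keep, ], rank. = 2)
```

Capping per sample, rather than drawing a global random subset, keeps small samples from being drowned out in the embedding.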

## Cluster Annotation

**Goal:** Assign cell type labels to clusters based on marker expression profiles.

**Approach:** Visualize median marker expression per cluster with a heatmap, then map cluster IDs to cell type names.

```r
# Heatmap of marker expression by cluster
plotExprHeatmap(sce, features = 'type',
                by = 'cluster_id', k = 'meta20',
                scale = 'first', row_anno = FALSE)

# Manual annotation: placeholder labels to assign from the heatmap;
# clusters not listed here get NA
cluster_annotation <- c(
    '1' = 'CD4 T cells',
    '2' = 'CD8 T cells',
    '3' = 'B cells',
    '4' = 'NK cells',
    '5' = 'Monocytes'
)

sce$cell_type <- cluster_annotation[as.character(cluster_ids(sce, 'meta20'))]
```
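The heatmap summarizes each cluster by its median marker expression, and the annotation step is a plain named-vector lookup. The sketch below shows both in base R on simulated data. `cluster_medians()` is a hypothetical helper, and the marker names and labels are illustrative. Clusters without an entry come back as `NA`, so it is worth labeling them explicitly.

```r
# Hypothetical helper: median of each marker within each cluster
# (returns a clusters x markers matrix when there are 2+ clusters)
cluster_medians <- function(expr, clusters) {
  apply(expr, 2, function(marker) tapply(marker, clusters, median))
}

# Simulated data: 500 cells, 3 markers, 7 clusters
set.seed(1)
clusters <- sample(1:7, 500, replace = TRUE)
expr <- matrix(rnorm(500 * 3), ncol = 3,
               dimnames = list(NULL, c("CD3", "CD19", "CD14")))
med <- cluster_medians(expr, clusters)

# Named-vector lookup: only clusters 1-5 have labels
cluster_annotation <- c('1' = 'CD4 T cells', '2' = 'CD8 T cells',
                        '3' = 'B cells', '4' = 'NK cells', '5' = 'Monocytes')
cell_type <- unname(cluster_annotation[as.character(clusters)])
cell_type[is.na(cell_type)] <- 'Unassigned'  # make unlabeled clusters explicit
table(cell_type)
```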

## Cluster Merging

**Goal:** Reduce overclustering by merging similar clusters into biologically meaningful groups.

**Approach:** Define a mapping table from original to merged cluster IDs, then apply with CATALYST's mergeClusters.

```r
# Merge similar clusters (illustrative pairing; base it on the heatmap)
merging_table <- data.frame(
    original = 1:20,
    merged = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5,
               6, 6, 7, 7, 8, 8, 9, 9, 10, 10)
)

sce <- mergeClusters(sce, k = 'meta20', table = merging_table, id = 'merged')

# Cells per merged cluster
table(cluster_ids(sce, 'merged'))
```
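The merge itself is a lookup from old IDs to new IDs. Below is a base-R sketch of that mapping with a hypothetical `merge_ids()` helper that fails loudly if a cluster is missing from the table. It shows the logic only and is not CATALYST's implementation.

```r
# Same mapping as merging_table above: 20 clusters merged pairwise into 10
merging_table <- data.frame(original = 1:20, merged = rep(1:10, each = 2))

# Hypothetical helper: map each cell's original cluster ID to its merged ID
merge_ids <- function(clusters, table) {
  out <- table$merged[match(clusters, table$original)]
  if (anyNA(out)) stop("some cluster IDs are missing from the merging table")
  out
}

merge_ids(c(1, 2, 3, 20), merging_table)  # 1 1 2 10
```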

## Abundance Analysis (per sample)

**Goal:** Quantify the relative frequency of each cell population across samples and conditions.

**Approach:** Cross-tabulate cluster assignments by sample ID, convert to proportions, and plot grouped by condition.

```r
# Cluster frequencies per sample
abundances <- table(cluster_ids(sce, 'meta20'), sce$sample_id)
freq <- prop.table(abundances, margin = 2)

# Plot
plotAbundances(sce, k = 'meta20', by = 'cluster_id', group_by = 'condition')
```
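`prop.table(..., margin = 2)` normalizes each column (sample) so that it sums to 1. The sketch below runs on simulated IDs with a hypothetical sample-to-condition mapping. It converts the proportions to percentages and averages them per condition, giving a numeric counterpart to the per-condition comparison that plotAbundances draws.

```r
# Simulated cluster and sample IDs for 600 cells
set.seed(1)
clusters  <- sample(paste0("c", 1:4), 600, replace = TRUE)
sample_id <- rep(c("s1", "s2", "s3"), each = 200)
condition <- c(s1 = "ctrl", s2 = "ctrl", s3 = "treat")  # hypothetical design

abundances <- table(clusters, sample_id)
freq <- prop.table(abundances, margin = 2) * 100  # % of each sample's cells

# Mean cluster percentage across the samples of each condition
samples_by_condition <- split(colnames(freq), condition[colnames(freq)])
cond_means <- sapply(samples_by_condition,
                     function(s) rowMeans(freq[, s, drop = FALSE]))
round(cond_means, 1)
```

Averaging per-sample percentages, rather than pooling cells across samples, gives each sample equal weight regardless of how many events it contributed.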

## Marker Expression Summary

**Goal:** Summarize and compare marker expression levels across clusters and conditions.

**Approach:** Plot per-cluster marker expression distributions with CATALYST's plotClusterExprs, and pseudo-bulk (per-sample summary) expression faceted by cluster with plotPbExprs.

```r
# Marker expression distributions (densities) per cluster
plotClusterExprs(sce, k = 'meta20', features = 'type')

# Expression by cluster and condition
plotPbExprs(sce, k = 'meta20', features = 'type', facet_by = 'cluster_id')
```
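Pseudo-bulk expression here means one summary value per marker, sample, and cluster. As we understand plotPbExprs, the default summary is the median. Below is a base-R sketch for a single marker with a hypothetical `pseudobulk_median()` helper.

```r
# Hypothetical helper: median of one marker per cluster x sample
# (empty cluster/sample combinations come back as NA)
pseudobulk_median <- function(values, sample_id, cluster_id) {
  tapply(values, list(cluster = cluster_id, sample = sample_id), median)
}

# Toy example: six cells, two samples, two clusters
values     <- c(1, 2, 3, 4, 5, 6)
sample_id  <- c("a", "a", "a", "b", "b", "b")
cluster_id <- c("x", "x", "y", "x", "y", "y")
pseudobulk_median(values, sample_id, cluster_id)
# x/a = 1.5, y/a = 3, x/b = 4, y/b = 5.5
```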

## Export Results

**Goal:** Save clustering results and annotated SCE object for downstream analysis or sharing.

**Approach:** Extract cluster assignments into colData, export as CSV, and serialize the full SCE as RDS.

```r
# Add cluster info to metadata
colData(sce)$cluster <- cluster_ids(sce, 'meta20')

# Export to CSV
results <- as.data.frame(colData(sce))
write.csv(results, 'clustering_results.csv', row.names = FALSE)

# Save SCE
saveRDS(sce, 'sce_clustered.rds')
```

## Choosing Number of Clusters

**Goal:** Choose an appropriate number of metaclusters for the dataset.

**Approach:** Inspect the delta area plot from the consensus clustering step and compare expression heatmaps at different K to find the point beyond which additional clusters stop being distinct.

```r
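# Delta area plot from the ConsensusClusterPlus step run by cluster()
delta_area(sce)

# Non-redundancy scores (NRS) rank markers by informativeness; useful for
# picking clustering markers rather than K
plotNRS(sce, features = 'type')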

# Or visual inspection of heatmap at different K
plotExprHeatmap(sce, features = 'type', by = 'cluster_id', k = 'meta10')
plotExprHeatmap(sce, features = 'type', by = 'cluster_id', k = 'meta20')
```
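The delta area criterion comes from consensus clustering. Items are repeatedly subsampled and clustered, and the fraction of runs in which each pair ends up together forms a consensus matrix. The area under that matrix's empirical CDF is tracked as K grows, and K is typically chosen where the relative gain in area levels off. The sketch below implements the idea in base R under stated assumptions: average-linkage `hclust()` as the base clusterer, illustrative resampling settings, and hypothetical helper names. It is meant for a small input such as SOM node codes, not millions of cells.

```r
# Hypothetical helper: consensus matrix for k clusters. Entry (i, j) is the
# fraction of resamples containing both i and j in which they share a cluster.
consensus_matrix <- function(x, k, reps = 50, p_item = 0.8) {
  n <- nrow(x)
  together <- matrix(0, n, n)  # runs in which i and j share a cluster
  sampled  <- matrix(0, n, n)  # runs in which i and j were both drawn
  for (r in seq_len(reps)) {
    idx <- sort(sample.int(n, floor(p_item * n)))
    hc  <- hclust(dist(x[idx, , drop = FALSE]), method = "average")
    cl  <- cutree(hc, k = k)
    together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
    sampled[idx, idx]  <- sampled[idx, idx] + 1
  }
  ifelse(sampled > 0, together / sampled, 0)
}

# Area under the empirical CDF of the pairwise consensus values on [0, 1]
cdf_area <- function(cons) {
  v <- cons[upper.tri(cons)]
  grid <- sort(unique(c(0, v, 1)))
  Fn <- ecdf(v)
  # The ECDF is a right-continuous step function, so integrate it exactly
  sum(diff(grid) * Fn(grid[-length(grid)]))
}

# Delta area: A(2) for K = 2, then (A(K) - A(K-1)) / A(K-1) for K > 2
delta_area_sketch <- function(x, k_max = 8, reps = 50, p_item = 0.8, seed = 42) {
  set.seed(seed)
  areas <- vapply(2:k_max,
                  function(k) cdf_area(consensus_matrix(x, k, reps, p_item)),
                  numeric(1))
  setNames(c(areas[1], diff(areas) / areas[-length(areas)]), 2:k_max)
}

# Simulated "node codes": three groups of 20 nodes in 2 markers
set.seed(1)
codes <- rbind(matrix(rnorm(40, 0), ncol = 2),
               matrix(rnorm(40, 5), ncol = 2),
               matrix(rnorm(40, 10), ncol = 2))
round(delta_area_sketch(codes, k_max = 6, reps = 20), 3)
```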

## Batch Integration

**Goal:** Remove batch effects from cytometry data before or after clustering.

**Approach:** Detect batch effects by coloring UMAP by batch variable, then apply MNN correction with batchelor if needed.

```r
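# If batch effects present
library(batchelor)

sce <- runDR(sce, dr = 'UMAP', features = 'type')

# Check for batch effects (assumes a 'batch' column in the sample metadata)
plotDR(sce, 'UMAP', color_by = 'batch')

# MNN correction if needed. fastMNN reads the 'logcounts' assay by default,
# so point it at CATALYST's 'exprs' assay and the type markers. d = NA is
# intended to skip the PCA step for small panels; confirm with ?fastMNN.
sce_corrected <- fastMNN(sce, batch = sce$batch,
                         assay.type = 'exprs',
                         subset.row = rowData(sce)$marker_class == 'type',
                         d = NA)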
```
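The core of MNN correction is finding pairs of cells that are each other's nearest neighbors across two batches; the differences within those pairs estimate the batch effect. The base-R sketch below finds the mutual nearest neighbors exactly. It then applies a single global shift, which is a deliberate simplification: batchelor's methods compute locally weighted correction vectors, and fastMNN does so in a PCA space. Helper names are hypothetical.

```r
# Hypothetical helper: for each row of `query`, indices of its k nearest
# rows in `ref` (squared Euclidean distance; brute force)
knn_idx <- function(query, ref, k) {
  d2 <- outer(rowSums(query^2), rowSums(ref^2), "+") - 2 * query %*% t(ref)
  do.call(rbind, lapply(seq_len(nrow(d2)),
                        function(i) order(d2[i, ])[seq_len(k)]))
}

# Pairs (i in A, j in B) where j is among i's k nearest B cells AND
# i is among j's k nearest A cells
mnn_pairs <- function(a, b, k = 20) {
  ab <- knn_idx(a, b, k)
  ba <- knn_idx(b, a, k)
  cand <- cbind(a = rep(seq_len(nrow(a)), each = k), b = as.vector(t(ab)))
  mutual <- mapply(function(i, j) i %in% ba[j, ], cand[, "a"], cand[, "b"])
  cand[mutual, , drop = FALSE]
}

# SIMPLIFIED correction: subtract the average MNN difference vector from B
mnn_shift_correct <- function(a, b, k = 20) {
  p <- mnn_pairs(a, b, k)
  if (nrow(p) == 0) stop("no mutual nearest neighbors found; try a larger k")
  shift <- colMeans(b[p[, "b"], , drop = FALSE] - a[p[, "a"], , drop = FALSE])
  sweep(b, 2, shift)
}

# Simulated batches: B is A shifted by +2 on the first marker, plus noise
set.seed(1)
a <- matrix(rnorm(400), ncol = 2)
b <- a + matrix(c(2, 0), nrow(a), 2, byrow = TRUE) + rnorm(400, sd = 0.1)
b_corr <- mnn_shift_correct(a, b)
colMeans(b) - colMeans(a)       # batch offset before correction
colMeans(b_corr) - colMeans(a)  # offset after correction (should be smaller)
```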

## Related Skills

- gating-analysis - Manual alternative
- differential-analysis - Compare clusters between conditions
- single-cell/clustering - Similar concepts for scRNA-seq

Related Skills

single-cell-clustering-and-batch-correction-with-omicverse

from FreedomIntelligence/OpenClaw-Medical-Skills

Guide Claude through omicverse's single-cell clustering workflow, covering preprocessing, QC, multimethod clustering, topic modeling, cNMF, and cross-batch integration as demonstrated in t_cluster.ipynb and t_single_batch.ipynb.

protein-design-workflow

from FreedomIntelligence/OpenClaw-Medical-Skills

End-to-end guidance for protein design pipelines. Use this skill when: (1) Starting a new protein design project, (2) Need step-by-step workflow guidance, (3) Understanding the full design pipeline, (4) Planning compute resources and timelines, (5) Integrating multiple design tools. For tool selection, use binder-design. For QC thresholds, use protein-qc.

nextflow-development

from FreedomIntelligence/OpenClaw-Medical-Skills

Run nf-core bioinformatics pipelines (rnaseq, sarek, atacseq) on sequencing data. Use when analyzing RNA-seq, WGS/WES, or ATAC-seq data—either local FASTQs or public datasets from GEO/SRA. Triggers on nf-core, Nextflow, FASTQ analysis, variant calling, gene expression, differential expression, GEO reanalysis, GSE/GSM/SRR accessions, or samplesheet creation.

flowio

from FreedomIntelligence/OpenClaw-Medical-Skills

Parse FCS (Flow Cytometry Standard) files v2.0-3.1. Extract events as NumPy arrays, read metadata/channels, convert to CSV/DataFrame, for flow cytometry data preprocessing.

bio-single-cell-clustering

from FreedomIntelligence/OpenClaw-Medical-Skills

Dimensionality reduction and clustering for single-cell RNA-seq using Seurat (R) and Scanpy (Python). Use for running PCA, computing neighbors, clustering with Leiden/Louvain algorithms, generating UMAP/tSNE embeddings, and visualizing clusters. Use when performing dimensionality reduction and clustering on single-cell data.

bio-read-qc-fastp-workflow

from FreedomIntelligence/OpenClaw-Medical-Skills

All-in-one read preprocessing with fastp including adapter trimming, quality filtering, deduplication, base correction, and HTML report generation. Use when preprocessing Illumina data and wanting a single fast tool instead of separate Cutadapt, Trimmomatic, and FastQC steps.

bio-microbiome-qiime2-workflow

from FreedomIntelligence/OpenClaw-Medical-Skills

QIIME2 command-line workflow for 16S/ITS amplicon analysis. Alternative to DADA2/phyloseq R workflow with built-in provenance tracking. Use when preferring CLI over R, needing reproducible provenance, or working within QIIME2 ecosystem.

bio-imaging-mass-cytometry-spatial-analysis

from FreedomIntelligence/OpenClaw-Medical-Skills

Spatial analysis of cell neighborhoods and interactions in IMC data. Covers neighbor graphs, spatial statistics, and interaction testing. Use when analyzing spatial relationships between cell types, testing for neighborhood enrichment, or identifying cell-cell interaction patterns in imaging mass cytometry data.

bio-imaging-mass-cytometry-quality-metrics

from FreedomIntelligence/OpenClaw-Medical-Skills

Quality metrics for IMC data including signal-to-noise, channel correlation, tissue integrity, and acquisition QC. Use when assessing data quality before analysis or troubleshooting problematic acquisitions.

bio-imaging-mass-cytometry-phenotyping

from FreedomIntelligence/OpenClaw-Medical-Skills

Cell type assignment from marker expression in IMC data. Covers manual gating, clustering, and automated classification approaches. Use when assigning cell types to segmented IMC cells based on protein marker expression or when phenotyping cells in multiplexed imaging data.

bio-imaging-mass-cytometry-interactive-annotation

from FreedomIntelligence/OpenClaw-Medical-Skills

Interactive cell type annotation for IMC data. Covers napari-based annotation, marker-guided labeling, training data generation, and annotation validation. Use when manually annotating cell types for training classifiers or validating automated phenotyping results.

bio-imaging-mass-cytometry-data-preprocessing

from FreedomIntelligence/OpenClaw-Medical-Skills

Load and preprocess imaging mass cytometry (IMC) and MIBI data. Covers MCD/TIFF handling, hot pixel removal, and image normalization. Use when starting IMC analysis from raw MCD files or preparing images for segmentation.