bio-flow-cytometry-doublet-detection

Detect and remove doublets from flow and mass cytometry data. Covers FSC/SSC gating and computational doublet detection methods. Use when filtering out cell aggregates before clustering or quantitative analysis.

1,802 stars

by FreedomIntelligence

Best use case

bio-flow-cytometry-doublet-detection is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using bio-flow-cytometry-doublet-detection should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/bio-flow-cytometry-doublet-detection/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-flow-cytometry-doublet-detection/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/bio-flow-cytometry-doublet-detection/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

Frequently Asked Questions

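What does this skill do?

It detects and removes doublets from flow and mass cytometry data, using FSC/SSC gating and computational doublet detection methods, so that cell aggregates are filtered out before clustering or quantitative analysis.

Where can I find the source code?

The source is in the FreedomIntelligence/OpenClaw-Medical-Skills repository on GitHub, at skills/bio-flow-cytometry-doublet-detection/SKILL.md.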

SKILL.md Source

## Version Compatibility

Reference examples tested with: flowCore 2.14+, ggplot2 3.5+

Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters

If code fails with an error such as `could not find function` or `unused argument`, inspect the installed
package (e.g. `args(fn)`, `ls('package:flowCore')`) and adapt the example to match the actual API rather than retrying.
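A quick way to run this check for several packages at once is a small base-R helper. The function name `check_min_versions()` is hypothetical (it is not part of any package), and the minimum versions passed in are the tested ones listed above; it relies only on `requireNamespace()`, `packageVersion()` and `package_version()`.

```r
# Hypothetical helper: compare installed package versions with required minimums
check_min_versions <- function(min_versions) {
    pkgs <- names(min_versions)
    # Installed version as a string, or NA when the package is not installed
    installed <- vapply(pkgs, function(p) {
        if (requireNamespace(p, quietly = TRUE)) as.character(packageVersion(p)) else NA_character_
    }, character(1))
    # TRUE only when installed and at least the required version
    ok <- mapply(function(v, m) !is.na(v) && package_version(v) >= package_version(m),
                 installed, min_versions)
    data.frame(package = pkgs, required = unname(min_versions),
               installed = unname(installed), ok = unname(ok))
}

check_min_versions(c(flowCore = '2.14', ggplot2 = '3.5'))
```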

# Doublet Detection

**"Remove doublets from my flow cytometry data"** → Detect and filter out cell aggregates using FSC-A/FSC-H gating or computational methods before clustering or quantitative analysis.
- R: `flowCore` rectangular gates on FSC-A vs FSC-H

## FSC-A vs FSC-H Gating (Standard Method)

```r
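library(flowCore)
library(ggcyto)

# Load all FCS files in data/ as a flowSet
fs <- read.flowSet(list.files('data/', pattern = '\\.fcs$', full.names = TRUE))

# FSC-A vs FSC-H for doublet discrimination:
# singlets fall on a diagonal; doublets have higher FSC-A for a given FSC-H.
# All gate coordinates below are placeholders; set them from your own plots.

# Rectangular gate: quick, but it only bounds each axis and does not cut
# along the diagonal, so it mainly removes events outside the scatter range
singlet_gate <- rectangleGate(
    filterId = 'singlets',
    'FSC-A' = c(50000, 250000),
    'FSC-H' = c(50000, 250000)
)

# Polygon gate that follows the diagonal. .gate must be a two-column matrix
# whose column names are the channel names; data.frame() would rename
# 'FSC-A' to 'FSC.A' (check.names = TRUE), so build it with cbind() instead
singlet_polygon <- polygonGate(
    filterId = 'singlets',
    .gate = cbind(
        'FSC-A' = c(50000, 250000, 250000, 50000),
        'FSC-H' = c(40000, 200000, 260000, 60000)
    )
)

# Apply the diagonal gate to every sample
singlets <- Subset(fs, singlet_polygon)

# Visualize the gate on the first sample
autoplot(fs[[1]], 'FSC-A', 'FSC-H') + geom_gate(singlet_polygon)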
```
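A polygon that follows the diagonal can be parameterized as the band between two lines through the origin, FSC-A = r_lo * FSC-H and FSC-A = r_hi * FSC-H, over a chosen FSC-H range. A minimal sketch, assuming you have already picked the ratio limits (for example from the median and MAD of FSC-A/FSC-H in a clean sample); `singlet_band_vertices()` and its arguments are hypothetical, and the numbers in the call are placeholders.

```r
# Hypothetical helper: vertices of a diagonal singlet band bounded by
# FSC-A = r_lo * FSC-H and FSC-A = r_hi * FSC-H, for FSC-H in [h_min, h_max]
singlet_band_vertices <- function(r_lo, r_hi, h_min, h_max) {
    stopifnot(r_lo < r_hi, h_min < h_max)
    # Walk up the lower edge, then back down the upper edge
    h <- c(h_min, h_max, h_max, h_min)
    r <- c(r_lo, r_lo, r_hi, r_hi)
    # cbind() keeps the hyphenated channel names as column names
    cbind('FSC-A' = r * h, 'FSC-H' = h)
}

verts <- singlet_band_vertices(r_lo = 0.9, r_hi = 1.3, h_min = 40000, h_max = 220000)
verts
# With flowCore loaded, the matrix can be passed as the polygon definition:
# singlet_band <- polygonGate(filterId = 'singlets', .gate = verts)
```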

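## Automated Scatter Gating with flowDensity

```r
library(flowDensity)

# Density-based thresholds on FSC-A and FSC-H: gates = NA lets flowDensity
# choose each cut point, position = TRUE keeps events above it. This is a
# two-channel quadrant gate that removes low-scatter debris; it does not cut
# along the FSC-A/FSC-H diagonal, so it is not by itself a doublet filter.
# For a data-driven diagonal singlet gate, flowStats::singletGate() is one option
scatter_result <- flowDensity(
    fs[[1]],
    channels = c('FSC-A', 'FSC-H'),
    position = c(TRUE, TRUE),
    gates = c(NA, NA)
)

# Get the gated population as a flowFrame
gated <- getflowFrame(scatter_result)

# Percentage of events retained
pct_kept <- nrow(gated) / nrow(fs[[1]]) * 100
cat('Events retained:', round(pct_kept, 1), '%\n')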
```
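Roughly speaking, flowDensity places its thresholds from features of a 1-D density estimate, such as peaks and valleys. To make that idea concrete for doublets, here is a base-R sketch that finds the first density valley to the right of the main peak of the FSC-A/FSC-H ratio. It is a simplified stand-in, not flowDensity's `deGate()` algorithm; `density_valley_cutoff()` is a hypothetical name and the input ratios are synthetic.

```r
# Hypothetical helper (not flowDensity's algorithm): position of the first
# density valley to the right of the main peak, e.g. of the FSC-A/FSC-H ratio
density_valley_cutoff <- function(x, adjust = 1) {
    d <- density(x, adjust = adjust)
    y <- d$y
    n <- length(y)
    peak <- which.max(y)  # main (presumed singlet) mode
    i <- 2:(n - 1)
    # Interior local minima: lower than the left neighbour, not higher than the right
    minima <- i[y[i] < y[i - 1] & y[i] <= y[i + 1]]
    right <- minima[minima > peak]
    if (length(right) == 0) return(NA_real_)  # no valley found (unimodal)
    d$x[right[1]]
}

set.seed(1)
# Synthetic ratios: 900 singlet-like events near 1.0, 100 doublet-like near 1.8
ratio <- c(rnorm(900, 1.0, 0.05), rnorm(100, 1.8, 0.1))
cutoff <- density_valley_cutoff(ratio, adjust = 2)
cutoff
```

Returning `NA` when no valley exists makes the caller decide on a fallback instead of silently cutting inside a single population.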

## flowAI Quality Control

```r
library(flowAI)

# flowAI performs time-based QC, including:
# - Flow rate anomaly detection
# - Signal acquisition anomaly detection
# - Dynamic range anomaly detection

# Run flowAI; fcs_QC and fcs_highQ take file-name suffixes (FALSE skips writing)
fs_qc <- flow_auto_qc(
    fs,
    folder_results = 'flowAI_results',
    fcs_QC = '_QC',
    fcs_highQ = '_HQ'
)

# flowAI does not discriminate doublets: it removes events from unstable
# acquisition periods. Apply a singlet gate (e.g. FSC-A vs FSC-H) separately.
```

## FSC-A/FSC-W Method (Width Parameter)

```r
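# Some instruments record FSC-W (pulse width) in addition to, or instead of, FSC-H
# Width is roughly proportional to FSC-A / FSC-H (exact scaling is instrument-specific)
# Doublets produce longer pulses, so they have higher width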

if ('FSC-W' %in% colnames(fs[[1]])) {
    singlet_gate_w <- rectangleGate(
        filterId = 'singlets',
        'FSC-A' = c(50000, 250000),
        'FSC-W' = c(50000, 100000)  # Lower width = singlets
    )

    singlets <- Subset(fs, singlet_gate_w)
}
```

## Ratio-Based Doublet Detection

```r
# Calculate FSC-A/FSC-H ratio
# Singlets have a roughly constant ratio (set by pulse geometry);
# doublets have an elevated ratio

calculate_fsc_ratio <- function(ff) {
    fsc_a <- exprs(ff)[, 'FSC-A']
    fsc_h <- exprs(ff)[, 'FSC-H']

    fsc_a / (fsc_h + 1)  # add 1 to avoid division by zero
}

# Add the ratio as a derived parameter. Building a new flowSet with fsApply()
# keeps column names identical across samples, as a flowSet requires;
# assigning a widened frame back into fs[[i]] can fail for that reason
fs_ratio <- fsApply(fs, function(ff) {
    fr_append_cols(ff, cbind(FSC_ratio = calculate_fsc_ratio(ff)))
})

# Gate on ratio. A fixed 95th-percentile cutoff always removes ~5% of events,
# whatever the true doublet rate; here it is taken from the first sample only
ratio_cutoff <- unname(quantile(exprs(fs_ratio[[1]])[, 'FSC_ratio'], 0.95))
singlet_gate_ratio <- rectangleGate(filterId = 'singlets', 'FSC_ratio' = c(0, ratio_cutoff))
singlets <- Subset(fs_ratio, singlet_gate_ratio)
```
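A fixed quantile removes the same fraction of events from every sample. An adaptive alternative, sketched here under the assumption that singlet ratios are roughly symmetric around their median, places the cutoff at the median plus a few MADs. `mad_ratio_cutoff()` and the default of 4 MADs are illustrative choices, not values from the original workflow, and the data are synthetic with a 1% doublet tail.

```r
# Hypothetical helper: upper cutoff at median + nmads * MAD of the ratio.
# mad() is scaled by default (constant = 1.4826) to match the SD for normal data
mad_ratio_cutoff <- function(ratio, nmads = 4) {
    median(ratio) + nmads * mad(ratio)
}

set.seed(42)
# Synthetic FSC-A/FSC-H ratios: 990 singlets and a 1% doublet tail
ratio <- c(rnorm(990, 1.1, 0.03), rnorm(10, 1.6, 0.1))
cutoff <- mad_ratio_cutoff(ratio)

mean(ratio > cutoff)                 # fraction flagged by the adaptive cutoff
mean(ratio > quantile(ratio, 0.95))  # fraction flagged by the fixed quantile
```

On this synthetic sample the adaptive cutoff flags roughly the 1% doublet tail, while the quantile cut still removes about 5% of events.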

## SSC-Based Doublet Detection

```r
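# For cell types where FSC doesn't discriminate well,
# also gate on SSC-A vs SSC-H. As with FSC, a rectangle only bounds each
# axis; a polygonGate along the SSC diagonal is the stricter option

ssc_singlet_gate <- rectangleGate(
    filterId = 'ssc_singlets',
    'SSC-A' = c(10000, 200000),
    'SSC-H' = c(10000, 200000)
)

# Combine FSC and SSC gates (singlet_gate comes from the FSC-A vs FSC-H section);
# & builds an intersection filter, so events must pass both gates
combined_gate <- singlet_gate & ssc_singlet_gate
singlets <- Subset(fs, combined_gate)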
```
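The same intersection logic can be written directly on per-event values, which is handy for checking how many events each scatter pair removes. A minimal sketch on synthetic data, reusing a median + MAD ratio cutoff; `scatter_singlets()` is hypothetical and is not a flowCore function.

```r
# Hypothetical helper: singlet calls from area/height ratios on FSC and SSC,
# each cut at median + nmads * MAD, then intersected (both must pass)
scatter_singlets <- function(fsc_a, fsc_h, ssc_a, ssc_h, nmads = 4) {
    keep_upper <- function(r) r <= median(r) + nmads * mad(r)
    fsc_ok <- keep_upper(fsc_a / pmax(fsc_h, 1))  # pmax() guards against zero height
    ssc_ok <- keep_upper(ssc_a / pmax(ssc_h, 1))
    data.frame(fsc_ok = fsc_ok, ssc_ok = ssc_ok, singlet = fsc_ok & ssc_ok)
}

set.seed(7)
n <- 500
# Synthetic scatter values; the first 20 events get inflated FSC-A (doublet-like)
fsc_h <- runif(n, 5e4, 2e5)
fsc_a <- fsc_h * rnorm(n, 1.1, 0.03)
fsc_a[1:20] <- fsc_a[1:20] * 1.6
ssc_h <- runif(n, 1e4, 1e5)
ssc_a <- ssc_h * rnorm(n, 1.05, 0.03)

calls <- scatter_singlets(fsc_a, fsc_h, ssc_a, ssc_h)
table(FSC = calls$fsc_ok, SSC = calls$ssc_ok)  # how the two gates overlap
```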

## CyTOF Doublet Detection

```r
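library(CATALYST)

# For CyTOF data (no scatter channels), use DNA intercalator channels and/or
# event length. panel and md are the panel and metadata tables prepData() needs
sce <- prepData(fs, panel, md)

# Start with every event kept; each check below can only remove events
singlet_idx <- rep(TRUE, ncol(sce))

# Event length: doublets produce longer events. prepData() may move non-mass
# channels such as Event_length out of the assay, so check rownames first
if ('Event_length' %in% rownames(sce)) {
    event_length <- assay(sce, 'counts')['Event_length', ]
    long_event <- event_length >= quantile(event_length, 0.95)
    singlet_idx <- singlet_idx & !long_event
    cat('Flagged', sum(long_event), 'events by event length\n')
}

# DNA intercalator: doublets have ~2x DNA content
if (all(c('DNA1', 'DNA2') %in% rownames(sce))) {
    dna_total <- assay(sce, 'counts')['DNA1', ] + assay(sce, 'counts')['DNA2', ]
    high_dna <- dna_total >= quantile(dna_total, 0.95)
    singlet_idx <- singlet_idx & !high_dna
    cat('Flagged', sum(high_dna), 'events by DNA content\n')
}

# Keep events that pass every available check
sce_singlets <- sce[, singlet_idx]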
```
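Because a quantile cut removes a fixed share of events, another option for the DNA channels is a cutoff relative to the main DNA peak. The sketch below assumes raw (untransformed) intercalator counts, since a fold change is not preserved on the arcsinh scale, and uses a hypothetical `fold = 1.5` midway between 1x and 2x DNA. Cells in G2/M also carry about twice the DNA of G0/G1 cells, so events above this cutoff are not all doublets; combining it with event length helps.

```r
# Hypothetical helper: cutoff at `fold` times the main DNA peak (the mode of the
# density, presumed to be G0/G1 singlets). Intended for raw, untransformed counts
dna_peak_cutoff <- function(dna_total, fold = 1.5) {
    d <- density(dna_total)
    fold * d$x[which.max(d$y)]
}

set.seed(3)
# Synthetic DNA totals: 900 events near 1000 and 100 with roughly double signal
dna_total <- c(rnorm(900, 1000, 80), rnorm(100, 2000, 150))
cutoff <- dna_peak_cutoff(dna_total)
mean(dna_total > cutoff)  # fraction flagged as high-DNA
```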

## CATALYST Workflow with Doublet Removal

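**Goal:** Detect and remove cell doublets from flow cytometry data held in a CATALYST SingleCellExperiment, using a regression-based approach on scatter parameters (mass cytometry data has no FSC/SSC channels; use the DNA and event-length checks above instead).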

**Approach:** Model the expected FSC-A vs FSC-H relationship for singlets with linear regression, classify events with large residuals (above the 95th percentile) as doublets, and filter them out.

```r
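library(CATALYST)

# Load and prepare data (panel and md as described in ?prepData)
sce <- prepData(fs, panel, md, transform = TRUE, cofactor = 5)

# CATALYST has no generic doublet caller (there is no is_doublet() function).
# For barcoded samples, debarcoding with assignPrelim(), estCutoffs() and
# applyCutoffs() leaves events with ambiguous barcodes unassigned, which
# removes many between-sample doublets; see the CATALYST debarcoding vignette

# Custom doublet detection based on FSC. Assumes FSC-A/FSC-H values were
# copied into colData as FSC_A/FSC_H; prepData() does not do this by default
fsc_a <- colData(sce)$FSC_A
fsc_h <- colData(sce)$FSC_H

# Model the expected singlet relationship. Doublets have excess FSC-A,
# so flag large positive residuals rather than absolute deviations
fit <- lm(fsc_a ~ fsc_h)
res <- fsc_a - predict(fit)
threshold <- quantile(res, 0.95)

# Mark doublets
colData(sce)$doublet <- res > threshold
sce_singlets <- sce[, !colData(sce)$doublet]

cat('Doublet rate:', round(mean(colData(sce)$doublet) * 100, 1), '%\n')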
```
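An ordinary `lm()` fit is itself pulled toward the doublets it is meant to detect. One simple mitigation, sketched here as a swapped-in technique rather than part of the original workflow, is to refit on the events retained after a first pass; `MASS::rlm()` (a robust M-estimator) is another common choice. `refit_doublet_flags()`, its two passes and the 4-MAD cut are hypothetical, and the data are synthetic.

```r
# Hypothetical helper: two-pass fit of FSC-A ~ FSC-H. Pass 1 fits all events;
# later passes refit only on events not flagged so far, so doublets pull the line less
refit_doublet_flags <- function(fsc_a, fsc_h, nmads = 4, passes = 2) {
    keep <- rep(TRUE, length(fsc_a))
    for (p in seq_len(passes)) {
        fit <- lm(fsc_a ~ fsc_h, subset = keep)
        res <- fsc_a - predict(fit, newdata = data.frame(fsc_h = fsc_h))
        # One-sided cut on the signed residual: doublets lie above the line
        keep <- res <= median(res[keep]) + nmads * mad(res[keep])
    }
    !keep  # TRUE = flagged as doublet
}

set.seed(11)
n <- 1000
# Synthetic singlets on FSC-A = 1.1 * FSC-H plus noise; first 50 events inflated
fsc_h <- runif(n, 5e4, 2e5)
fsc_a <- 1.1 * fsc_h + rnorm(n, 0, 3000)
fsc_a[1:50] <- fsc_a[1:50] * 1.7
flags <- refit_doublet_flags(fsc_a, fsc_h)
mean(flags)  # fraction flagged
```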

## Batch Processing

```r
# Process all samples
detect_doublets <- function(ff, method = c('fsc', 'ratio')) {
    method <- match.arg(method)  # fail early on an unknown method
    fsc_a <- exprs(ff)[, 'FSC-A']
    fsc_h <- exprs(ff)[, 'FSC-H']

    if (method == 'fsc') {
        # Signed residual: doublets have excess FSC-A for their FSC-H
        fit <- lm(fsc_a ~ fsc_h)
        res <- fsc_a - predict(fit)
        singlet_idx <- res <= quantile(res, 0.95)
    } else {
        ratio <- fsc_a / (fsc_h + 1)
        singlet_idx <- ratio < quantile(ratio, 0.95)
    }

    ff[singlet_idx, ]
}

# Apply to all samples (extra arguments are passed on to detect_doublets)
fs_singlets <- fsApply(fs, detect_doublets, method = 'fsc')

# Report the fraction of events removed per sample; with a fixed 95%
# quantile cut this is ~5% by construction, not an estimated doublet rate
doublet_rates <- sapply(seq_len(length(fs)), function(i) {
    1 - nrow(fs_singlets[[i]]) / nrow(fs[[i]])
})
cat('Mean doublet rate:', round(mean(doublet_rates) * 100, 1), '%\n')
```
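To see per-sample rather than mean rates, the counts can be collected into a small table. `doublet_report()` is a hypothetical helper working on plain count vectors; the commented lines only suggest how the counts might be gathered from the flowSets above.

```r
# Hypothetical helper: per-sample event counts before and after doublet removal
doublet_report <- function(n_before, n_after, sample_names = NULL) {
    stopifnot(length(n_before) == length(n_after))
    if (is.null(sample_names)) {
        sample_names <- if (is.null(names(n_before))) seq_along(n_before) else names(n_before)
    }
    before <- as.numeric(n_before)
    after <- as.numeric(n_after)
    data.frame(
        sample = sample_names,
        events = before,
        singlets = after,
        doublet_pct = round(100 * (1 - after / before), 1),
        row.names = NULL
    )
}

# With flowCore objects (not run here), counts could be gathered as:
# n_before <- sapply(seq_len(length(fs)), function(i) nrow(fs[[i]]))
# n_after  <- sapply(seq_len(length(fs)), function(i) nrow(fs_singlets[[i]]))
# doublet_report(n_before, n_after, sampleNames(fs))
doublet_report(c(a = 1000, b = 2000), c(a = 950, b = 1800))
```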

## Visualization

```r
library(ggplot2)

# Extract data for plotting
plot_data <- data.frame(
    FSC_A = exprs(fs[[1]])[, 'FSC-A'],
    FSC_H = exprs(fs[[1]])[, 'FSC-H']
)

# Calculate doublet status (signed residual: doublets lie above the fit)
fit <- lm(FSC_A ~ FSC_H, data = plot_data)
plot_data$residual <- plot_data$FSC_A - predict(fit)
plot_data$doublet <- plot_data$residual > quantile(plot_data$residual, 0.95)

# Plot (colors named so FALSE/TRUE map explicitly)
ggplot(plot_data, aes(x = FSC_H, y = FSC_A, color = doublet)) +
    geom_point(alpha = 0.3, size = 0.5) +
    scale_color_manual(values = c('FALSE' = 'gray', 'TRUE' = 'red')) +
    theme_bw() +
    labs(title = 'Doublet Detection', x = 'FSC-H', y = 'FSC-A')
ggsave('doublet_detection.png', width = 8, height = 6)
```

## Related Skills

Workflow order: cytometry-qc → doublet-detection → bead-normalization → clustering

- cytometry-qc - Run first: identify flow rate and signal issues
- bead-normalization - Run after: correct remaining instrument drift
- fcs-handling - Load FCS files
- gating-analysis - Manual gating workflows
- clustering-phenotyping - Downstream analysis after doublet removal

Related Skills

tooluniverse-adverse-event-detection

from FreedomIntelligence/OpenClaw-Medical-Skills

Detect and analyze adverse drug event signals using FDA FAERS data, drug labels, disproportionality analysis (PRR, ROR, IC), and biomedical evidence. Generates quantitative safety signal scores (0-100) with evidence grading. Use for post-market surveillance, pharmacovigilance, drug safety assessment, adverse event investigation, and regulatory decision support.

protein-design-workflow

from FreedomIntelligence/OpenClaw-Medical-Skills

End-to-end guidance for protein design pipelines. Use this skill when: (1) Starting a new protein design project, (2) Need step-by-step workflow guidance, (3) Understanding the full design pipeline, (4) Planning compute resources and timelines, (5) Integrating multiple design tools. For tool selection, use binder-design. For QC thresholds, use protein-qc.

nextflow-development

from FreedomIntelligence/OpenClaw-Medical-Skills

Run nf-core bioinformatics pipelines (rnaseq, sarek, atacseq) on sequencing data. Use when analyzing RNA-seq, WGS/WES, or ATAC-seq data—either local FASTQs or public datasets from GEO/SRA. Triggers on nf-core, Nextflow, FASTQ analysis, variant calling, gene expression, differential expression, GEO reanalysis, GSE/GSM/SRR accessions, or samplesheet creation.

flowio

from FreedomIntelligence/OpenClaw-Medical-Skills

Parse FCS (Flow Cytometry Standard) files v2.0-3.1. Extract events as NumPy arrays, read metadata/channels, convert to CSV/DataFrame, for flow cytometry data preprocessing.

crisis-detection-intervention-ai

from FreedomIntelligence/OpenClaw-Medical-Skills

Detect crisis signals in user content using NLP, mental health sentiment analysis, and safe intervention protocols. Implements suicide ideation detection, automated escalation, and crisis resource integration. Use for mental health apps, recovery platforms, support communities. Activate on "crisis detection", "suicide prevention", "mental health NLP", "intervention protocol". NOT for general sentiment analysis, medical diagnosis, or replacing professional help.

bio-single-cell-doublet-detection

from FreedomIntelligence/OpenClaw-Medical-Skills

Detect and remove doublets (multiple cells captured in one droplet) from single-cell RNA-seq data. Uses Scrublet (Python), DoubletFinder (R), and scDblFinder (R). Essential QC step before clustering to avoid artificial cell populations. Use when identifying and removing doublets from scRNA-seq data.

bio-ribo-seq-orf-detection

from FreedomIntelligence/OpenClaw-Medical-Skills

Detect and quantify translated ORFs from Ribo-seq data including uORFs and novel ORFs using RiboCode and ORFquant. Use when identifying translated regions beyond annotated coding sequences or quantifying ORF-level translation.

bio-read-qc-fastp-workflow

from FreedomIntelligence/OpenClaw-Medical-Skills

All-in-one read preprocessing with fastp including adapter trimming, quality filtering, deduplication, base correction, and HTML report generation. Use when preprocessing Illumina data and wanting a single fast tool instead of separate Cutadapt, Trimmomatic, and FastQC steps.

bio-microbiome-qiime2-workflow

from FreedomIntelligence/OpenClaw-Medical-Skills

QIIME2 command-line workflow for 16S/ITS amplicon analysis. Alternative to DADA2/phyloseq R workflow with built-in provenance tracking. Use when preferring CLI over R, needing reproducible provenance, or working within QIIME2 ecosystem.

bio-methylation-dmr-detection

from FreedomIntelligence/OpenClaw-Medical-Skills

Differentially methylated region (DMR) detection using methylKit tiles, bsseq BSmooth, and DMRcate. Use when identifying contiguous genomic regions with methylation differences between experimental conditions or cell types.

bio-methylation-based-detection

from FreedomIntelligence/OpenClaw-Medical-Skills

Analyzes cfDNA methylation patterns for cancer detection using cfMeDIP-seq or bisulfite sequencing with MethylDackel. Identifies cancer-specific methylation signatures and performs tissue-of-origin deconvolution. Use when using methylation biomarkers for early cancer detection or minimal residual disease.

bio-metagenomics-amr-detection

from FreedomIntelligence/OpenClaw-Medical-Skills

Detect antimicrobial resistance genes using AMRFinderPlus, ResFinder, and CARD. Screen isolates and metagenomes for resistance determinants. Use when characterizing resistance profiles in clinical isolates, surveillance samples, or metagenomic data.