bio-single-cell-splicing

Analyzes alternative splicing at single-cell resolution using BRIE2 for probabilistic PSI estimation or leafcutter2 for cluster-based analysis with NMD detection. Identifies cell-type-specific splicing patterns. Use when analyzing isoform usage in scRNA-seq or finding splicing differences between cell populations.

1,802 stars

Best use case

bio-single-cell-splicing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Analyzes alternative splicing at single-cell resolution using BRIE2 for probabilistic PSI estimation or leafcutter2 for cluster-based analysis with NMD detection. Identifies cell-type-specific splicing patterns. Use when analyzing isoform usage in scRNA-seq or finding splicing differences between cell populations.

Teams using bio-single-cell-splicing should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/bio-single-cell-splicing/SKILL.md --create-dirs "https://raw.githubusercontent.com/FreedomIntelligence/OpenClaw-Medical-Skills/main/skills/bio-single-cell-splicing/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/bio-single-cell-splicing/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How bio-single-cell-splicing Compares

Feature / Agentbio-single-cell-splicingStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Analyzes alternative splicing at single-cell resolution using BRIE2 for probabilistic PSI estimation or leafcutter2 for cluster-based analysis with NMD detection. Identifies cell-type-specific splicing patterns. Use when analyzing isoform usage in scRNA-seq or finding splicing differences between cell populations.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

## Version Compatibility

Reference examples tested with: anndata 0.10+, numpy 1.26+, pandas 2.2+, scanpy 1.10+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Single-Cell Splicing Analysis

Analyze alternative splicing at single-cell resolution.

## Tool Selection

| Tool | Approach | Strengths |
|------|----------|-----------|
| BRIE2 | Probabilistic PSI | Handles sparsity, regulatory features |
| leafcutter2 | Intron clustering | NMD detection, novel junctions |

Note: Avoid Whippet.jl (Julia 1.6.7 only, incompatible with Julia 1.9+)

## BRIE2 Analysis

**Goal:** Estimate per-cell PSI values for splicing events with uncertainty quantification.

**Approach:** Prepare splicing events from annotation, count reads per cell barcode, then fit a Bayesian variational inference model for probabilistic PSI estimation.

**"Analyze splicing in single-cell data"** -> Estimate per-cell inclusion levels for splicing events with uncertainty.
- Python: BRIE2 (probabilistic PSI, handles sparsity)
- Python/R: leafcutter2 (intron clustering, NMD detection)

```python
import brie
import scanpy as sc
import anndata as ad

# Load single-cell data
adata = sc.read_h5ad('scrnaseq.h5ad')

# Prepare splicing events from annotation
# BRIE2 uses pre-defined splicing events
brie.preprocessing.get_events(
    gtf_file='annotation.gtf',
    out_file='splicing_events.gff3'
)

# Count reads for splicing events from BAM files
# Requires cell barcodes and UMIs
brie.preprocessing.count(
    bam_file='possorted_genome_bam.bam',
    gff_file='splicing_events.gff3',
    out_dir='brie_counts/',
    cell_file='barcodes.tsv'  # Filtered cell barcodes
)

# Load BRIE count data
adata_splice = brie.read_h5ad('brie_counts/brie_count.h5ad')

# Run BRIE2 model for PSI estimation
# Uses variational inference for probabilistic estimates
brie.fit(
    adata_splice,
    layer='raw',
    n_epochs=400,
    batch_size=512
)

# PSI estimates stored in adata_splice.layers['Psi']
# Uncertainty in adata_splice.layers['Psi_var']
```

## Cell-Type Specific Splicing

**Goal:** Identify splicing events that vary between cell types.

**Approach:** Compute mean PSI per cell type from BRIE2 output and rank events by cross-cell-type variance.

```python
import numpy as np
import pandas as pd

# Add cell type annotations
adata_splice.obs['cell_type'] = adata.obs['cell_type']

# Calculate mean PSI per cell type
cell_types = adata_splice.obs['cell_type'].unique()
psi_matrix = adata_splice.layers['Psi']

mean_psi = pd.DataFrame(index=adata_splice.var_names)
for ct in cell_types:
    mask = adata_splice.obs['cell_type'] == ct
    mean_psi[ct] = np.nanmean(psi_matrix[mask, :], axis=0)

# Find cell-type specific splicing events
# Events with high variance across cell types
psi_var = mean_psi.var(axis=1)
variable_events = psi_var.nlargest(100)
print('Top variable splicing events:')
print(variable_events)
```

## leafcutter2 Analysis

**Goal:** Detect differential intron usage in single-cell data with NMD-inducing splicing detection.

**Approach:** Extract junctions from 10X BAMs with cell barcodes, cluster introns, and run differential analysis between cell groups.

```python
import subprocess

# leafcutter2 (April 2025): Adds NMD-inducing splicing detection

# Step 1: Extract junctions from BAM
# Works with 10X BAMs with cell barcodes
subprocess.run([
    'python', 'scripts/bam2junc.py',
    '-b', 'possorted_genome_bam.bam',
    '-o', 'junctions/',
    '--cb_tag', 'CB',  # Cell barcode tag
    '--umi_tag', 'UB'   # UMI tag
], check=True)

# Step 2: Cluster introns
subprocess.run([
    'python', 'clustering/leafcutter_cluster.py',
    '-j', 'junction_files.txt',
    '-o', 'leafcutter_sc',
    '-m', '10',  # Min reads per junction
    '-l', '500000'  # Max intron length
], check=True)

# Step 3: Differential splicing between clusters
# Pseudobulk approach for statistical power
subprocess.run([
    'Rscript', 'scripts/leafcutter_ds.R',
    'leafcutter_sc_perind_numers.counts.gz',
    'groups.txt',
    '-o', 'differential_splicing',
    '-e', 'annotation_exons.txt.gz'
], check=True)
```

## Pseudobulk Approach

**Goal:** Increase statistical power for splicing analysis by aggregating single cells into pseudobulk samples.

**Approach:** Sum junction counts within cell type groups, then apply bulk differential splicing methods to the aggregated counts.

```python
import pandas as pd
import numpy as np

# For better statistical power, aggregate cells by type
def pseudobulk_junctions(junction_counts, cell_metadata, groupby='cell_type'):
    '''Aggregate junction counts by cell group.'''
    groups = cell_metadata.groupby(groupby).groups

    pseudobulk = {}
    for group, cells in groups.items():
        cell_mask = junction_counts.index.isin(cells)
        pseudobulk[group] = junction_counts.loc[cell_mask].sum()

    return pd.DataFrame(pseudobulk)

# Run differential splicing on pseudobulk
# Use leafcutter or rMATS on aggregated counts
```

## Interpretation Considerations

| Challenge | Mitigation |
|-----------|------------|
| Sparse data | BRIE2 probabilistic model, pseudobulk |
| Low reads per cell | Aggregate similar cells |
| 3' bias (10X) | Use 5' kit or full-length methods |
| Doublets | Filter before splicing analysis |

## Quality Thresholds

| Metric | Recommendation |
|--------|----------------|
| Min cells per event | >= 50 with reads |
| Min reads per junction | >= 5 per cell with coverage |
| PSI confidence | Variance < 0.1 |

## Related Skills

- single-cell/preprocessing - QC before splicing analysis
- single-cell/clustering - Cell type annotation
- splicing-quantification - Bulk RNA-seq comparison

Related Skills

tooluniverse-single-cell

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Production-ready single-cell and expression matrix analysis using scanpy, anndata, and scipy. Performs scRNA-seq QC, normalization, PCA, UMAP, Leiden/Louvain clustering, differential expression (Wilcoxon, t-test, DESeq2), cell type annotation, per-cell-type statistical analysis, gene-expression correlation, batch correction (Harmony), trajectory inference, and cell-cell communication analysis. NEW: Analyzes ligand-receptor interactions between cell types using OmniPath (CellPhoneDB, CellChatDB), scores communication strength, identifies signaling cascades, and handles multi-subunit receptor complexes. Integrates with ToolUniverse gene annotation tools (HPA, Ensembl, MyGene, UniProt) and enrichment tools (gseapy, PANTHER, STRING). Supports h5ad, 10X, CSV/TSV count matrices, and pre-annotated datasets. Use when analyzing single-cell RNA-seq data, studying cell-cell interactions, performing cell type differential expression, computing gene-expression correlations by cell type, analyzing tumor-immune communication, or answering questions about scRNA-seq datasets.

single-trajectory-analysis

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Guide to reproducing OmicVerse trajectory workflows spanning PAGA, Palantir, VIA, velocity coupling, and fate scoring notebooks.

single2spatial-spatial-mapping

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Map scRNA-seq atlases onto spatial transcriptomics slides using omicverse's Single2Spatial workflow for deep-forest training, spot-level assessment, and marker visualisation.

single-cell-preprocessing-with-omicverse

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Walk through omicverse's single-cell preprocessing tutorials to QC PBMC3k data, normalise counts, detect HVGs, and run PCA/embedding pipelines on CPU, CPU–GPU mixed, or GPU stacks.

single-cell-multi-omics-integration

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Quick-reference sheet for OmicVerse tutorials spanning MOFA, GLUE pairing, SIMBA integration, TOSICA transfer, and StaVIA cartography.

single-cell-downstream-analysis

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Checklist-style reference for OmicVerse downstream tutorials covering AUCell scoring, metacell DEG, and related exports.

single-cell-clustering-and-batch-correction-with-omicverse

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Guide Claude through omicverse's single-cell clustering workflow, covering preprocessing, QC, multimethod clustering, topic modeling, cNMF, and cross-batch integration as demonstrated in t_cluster.ipynb and t_single_batch.ipynb.

single-cell-cellphonedb-communication-mapping

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Run omicverse's CellPhoneDB v5 wrapper on annotated single-cell data to infer ligand-receptor networks and produce CellChat-style visualisations.

single-cell-rna-qc

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Performs quality control on single-cell RNA-seq data (.h5ad or .h5 files) using scverse best practices with MAD-based filtering and comprehensive visualizations. Use when users request QC analysis, filtering low-quality cells, assessing data quality, or following scverse/scanpy best practices for single-cell analysis.

single-cell-annotation-skills-with-omicverse

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Guide Claude through SCSA, MetaTiME, CellVote, CellMatch, GPTAnno, and weighted KNN transfer workflows for annotating single-cell modalities.

cellxgene-census

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Query CZ CELLxGENE Census (61M+ cells). Filter by cell type/tissue/disease, retrieve expression data, integrate with scanpy/PyTorch, for population-scale single-cell analysis.

cell-free-expression

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Guidance for cell-free protein synthesis (CFPS) optimization. Use when: (1) Planning CFPS experiments, (2) Troubleshooting low yield or aggregation, (3) Optimizing DNA template design for CFPS, (4) Expressing difficult proteins (disulfide-rich, toxic, membrane).