single-cell-rnaseq-pipeline

Generate single-cell RNA-seq analysis code templates for Seurat and Scanpy, supporting QC, clustering, visualization, and downstream analysis. Trigger when users need scRNA-seq analysis pipelines, preprocessing workflows, or batch correction code.

3,891 stars

byopenclaw

View on GitHub Installation ↓

Best use case

single-cell-rnaseq-pipeline is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using single-cell-rnaseq-pipeline should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/single-cell-rnaseq-pipeline/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/aipoch-ai/single-cell-rnaseq-pipeline/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/single-cell-rnaseq-pipeline/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How single-cell-rnaseq-pipeline Compares

Feature / Agent	single-cell-rnaseq-pipeline	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

ChatGPT vs Claude for Agent Skills

Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.

Cursor vs Codex for AI Workflows

Compare Cursor and Codex for AI coding workflows, repository assistance, debugging, refactoring, and reusable developer skills.

SKILL.md Source

# Single-Cell RNA-seq Pipeline

## Overview

Generate comprehensive single-cell RNA-seq analysis code templates for **Seurat (R)** and **Scanpy (Python)**. This skill provides ready-to-use code frameworks for preprocessing, quality control, normalization, clustering, marker identification, visualization, and advanced analyses like batch correction and trajectory inference.

**Technical Difficulty**: High

## When to Use

- Building scRNA-seq analysis pipelines from raw count matrices
- Need standardized QC and preprocessing workflows
- Performing batch correction across multiple samples/datasets
- Running dimensionality reduction and clustering
- Identifying cell type-specific marker genes
- Creating publication-ready visualizations (UMAP, violin plots, heatmaps)
- Conducting trajectory inference (pseudotime analysis)
- Comparing cell populations between conditions

## Core Features

### Seurat (R) Templates
1. **Data Loading**: 10x Genomics, H5AD, Cell Ranger outputs
2. **QC Metrics**: Mitochondrial content, gene counts, doublet detection
3. **Normalization**: Log-normalization, SCTransform
4. **Integration**: Harmony, RPCA, CCA for batch correction
5. **Clustering**: Graph-based clustering with optimization
6. **Visualization**: UMAP, t-SNE, feature plots, dot plots
7. **Marker Analysis**: Wilcoxon tests, conserved markers
8. **Differential Expression**: FindAllMarkers, FindConservedMarkers
9. **Cell Typing**: Reference-based annotation with SingleR/Azimuth

### Scanpy (Python) Templates
1. **Data Loading**: AnnData, 10x, CSV, loom files
2. **QC Workflow**: Comprehensive filtering and metrics
3. **Normalization**: Log1p, scran, Combat batch correction
4. **Integration**: scVI, Scanorama, BBKNN
5. **Clustering**: Leiden/Louvain with resolution sweep
6. **Visualization**: UMAP, PAGA, embeddings
7. **Marker Analysis**: rank_genes_groups, filter markers
8. **Trajectory**: PAGA, diffusion pseudotime (DPT)
9. **CellChat/CellPhoneDB**: Cell-cell communication

## Usage

### Generate Seurat Template

```bash
python scripts/main.py --tool seurat --output seurat_analysis.R --species human
```

### Generate Scanpy Template

```bash
python scripts/main.py --tool scanpy --output scanpy_analysis.py --species mouse
```

### Generate Both Templates

```bash
python scripts/main.py --tool both --output scrna_pipeline --species human --batch-correction harmony --trajectory true
```

### Command-Line Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| --tool | string | Yes | Analysis tool: `seurat`, `scanpy`, or `both` |
| --output | string | Yes | Output file or directory path |
| --species | string | No | Species: `human` or `mouse` (default: human) |
| --batch-correction | string | No | Method: `harmony`, `rpca`, `cca`, `scanorama`, `scvi` |
| --trajectory | bool | No | Include trajectory analysis (default: false) |
| --cell-communication | bool | No | Include cell-cell communication (default: false) |
| --de-analysis | bool | No | Include differential expression (default: false) |
| --spatial | bool | No | Include spatial transcriptomics (default: false) |

## Output Structure

```
output/
├── seurat/
│   ├── 01_load_and_qc.R
│   ├── 02_normalize_integrate.R
│   ├── 03_cluster_annotate.R
│   ├── 04_visualize.R
│   └── 05_de_analysis.R (if --de-analysis)
├── scanpy/
│   ├── 01_load_qc.py
│   ├── 02_normalize_integrate.py
│   ├── 03_cluster_annotate.py
│   ├── 04_visualize.py
│   └── 05_trajectory.py (if --trajectory)
└── README.md
```

## Technical Details

### Supported Input Formats
- 10x Genomics Cell Ranger outputs (barcodes.tsv, features.tsv, matrix.mtx)
- H5AD (AnnData h5 format)
- Seurat RDS objects
- CSV/TSV count matrices
- HDF5 files

### QC Parameters (Default)
| Metric | Human | Mouse |
|--------|-------|-------|
| min_genes | 200 | 200 |
| max_genes | 25000 | 25000 |
| min_cells | 3 | 3 |
| max_mt_percent | 20% | 20% |
| doublet_threshold | Auto | Auto |

### Clustering Resolution Guidelines
- **0.4-0.6**: Broad cell types
- **0.8-1.2**: Subtypes
- **1.5-2.0**: Fine populations

### Batch Correction Recommendations
| Scenario | Seurat | Scanpy |
|----------|--------|--------|
| Small batches (<5) | Harmony | Harmony |
| Large batches | RPCA | Scanorama |
| Complex variation | CCA | scVI |

## Code Examples

### Seurat Quick Start

```r
# Load data
seurat_obj <- CreateSeuratObject(counts = raw_data, project = "Sample")

# QC
seurat_obj[["percent.mt"]] <- PercentageFeatureSet(seurat_obj, pattern = "^MT-")
seurat_obj <- subset(seurat_obj, subset = nFeature_RNA > 200 & percent.mt < 20)

# Normalize
seurat_obj <- NormalizeData(seurat_obj)
seurat_obj <- FindVariableFeatures(seurat_obj, selection.method = "vst", nfeatures = 2000)

# Scale and PCA
seurat_obj <- ScaleData(seurat_obj)
seurat_obj <- RunPCA(seurat_obj, features = VariableFeatures(object = seurat_obj))

# Cluster
seurat_obj <- FindNeighbors(seurat_obj, dims = 1:30)
seurat_obj <- FindClusters(seurat_obj, resolution = 1.0)
seurat_obj <- RunUMAP(seurat_obj, dims = 1:30)

# Visualize
DimPlot(seurat_obj, reduction = "umap", label = TRUE)
FeaturePlot(seurat_obj, features = c("CD3E", "CD14", "CD79A"))
```

### Scanpy Quick Start

```python
import scanpy as sc

# Load data
adata = sc.read_10x_mtx("filtered_gene_bc_matrices/")

# QC
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
adata.var['mt'] = adata.var_names.str.startswith('MT-')
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, inplace=True)
adata = adata[adata.obs.pct_counts_mt < 20, :]

# Normalize
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)

# PCA and UMAP
sc.pp.scale(adata)
sc.tl.pca(adata, svd_solver='arpack')
sc.pp.neighbors(adata, n_neighbors=15, n_pcs=30)
sc.tl.umap(adata)
sc.tl.leiden(adata, resolution=1.0)

# Visualize
sc.pl.umap(adata, color=['leiden', 'total_counts'])
sc.pl.dotplot(adata, var_names=['CD3E', 'CD14', 'CD79A'], groupby='leiden')
```

## References

- `references/seurat_template.R` - Complete Seurat analysis template
- `references/scanpy_template.py` - Complete Scanpy analysis template
- `references/batch_correction_guide.md` - Batch correction comparison
- `requirements.txt` - Python dependencies

## Dependencies

### Seurat (R)
```r
install.packages(c("Seurat", "SeuratObject", "tidyverse", "patchwork"))
# Optional
remotes::install_github("satijalab/seurat-wrappers")
remotes::install_github("immunogenomics/harmony")
BiocManager::install("SingleR")
```

### Scanpy (Python)
```bash
pip install scanpy leidenalg scvi-tools cellchatpy
```

## Testing

Run basic validation:
```bash
cd scripts
python test_main.py
```

## Error Handling

All errors return semantic messages:

```json
{
  "status": "error",
  "error": {
    "type": "invalid_parameter",
    "message": "Unsupported batch correction method: 'xyz'",
    "suggestion": "Use one of: harmony, rpca, cca, scanorama, scvi"
  }
}
```

## Safety & Compliance

- No external API calls
- All code templates are self-contained
- No hardcoded credentials or paths
- Templates use relative paths for data
- Default parameters are conservative for safety

## Citation

If using generated templates in publications:
- Seurat: Satija Lab, Nature Biotechnology 2015
- Scanpy: Wolf et al., Genome Biology 2018
- scVI: Lopez et al., Nature Methods 2018
- Harmony: Korsunsky et al., Nature Methods 2019

## Risk Assessment

| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |

## Security Checklist

- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
## Prerequisites

```bash
# Python dependencies
pip install -r requirements.txt
```

## Evaluation Criteria

### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable

### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time

## Lifecycle Status

- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**: 
  - Performance optimization
  - Additional feature support

Related Skills

docs-pipeline-automation

3891

from openclaw/skills

Build repeatable data-to-Docs pipelines from Sheets and Drive sources. Use for automated status reports, template-based document assembly, and scheduled publishing workflows.

Workflow & Productivity

scrna-cell-type-annotator

3891

from openclaw/skills

Auto-annotate cell clusters from single-cell RNA data using marker genes.

gitlab-mr-review-pipeline

3891

from openclaw/skills

自动化 GitLab MR 代码审核流水线。使用 AI 对 MR 进行代码审查，生成报告并邮件发送给提交人。

pipeline-analytics

3891

from openclaw/skills

Generate interactive analytics dashboards from CRM data. Use when asked to "show pipeline stats", "create a report", "analyze leads", "show conversion rates", "build a dashboard", "visualize outreach data", "funnel analysis", or any data visualization request from DuckDB workspace data.

ci-cd-pipeline-builder

3891

from openclaw/skills

CI/CD Pipeline Builder

lead-gen-pipeline

3891

from openclaw/skills

Automated lead generation pipeline with AI-powered lead scoring and personalized follow-up generation. Score leads 0-100 with reasoning, generate context-aware follow-ups in multiple tones. Integrates with any CRM. Use for sales automation, cold outreach, and pipeline management.

ml-pipeline

3891

from openclaw/skills

Complete machine learning pipeline for trading: feature engineering, AutoML, deep learning, and financial RL. Use for automated parameter sweeps, feature creation, model training, and anti-leakage validation.

Sales Pipeline Tracker

3891

from openclaw/skills

Track deals through every stage from lead to close. Manage pipeline stages, update deal status, forecast revenue, and identify bottlenecks in your sales process.

---

3891

from openclaw/skills

name: article-factory-wechat

Content & Documentation

humanizer

3891

from openclaw/skills

Remove signs of AI-generated writing from text. Use when editing or reviewing text to make it sound more natural and human-written. Based on Wikipedia's comprehensive "Signs of AI writing" guide. Detects and fixes patterns including: inflated symbolism, promotional language, superficial -ing analyses, vague attributions, em dash overuse, rule of three, AI vocabulary words, negative parallelisms, and excessive conjunctive phrases.

Content & Documentation

find-skills

3891

from openclaw/skills

Helps users discover and install agent skills when they ask questions like "how do I do X", "find a skill for X", "is there a skill that can...", or express interest in extending capabilities. This skill should be used when the user is looking for functionality that might exist as an installable skill.

General Utilities

tavily-search

3891

from openclaw/skills

Use Tavily API for real-time web search and content extraction. Use when: user needs real-time web search results, research, or current information from the web. Requires Tavily API key.

Data & Research

single-cell-rnaseq-pipeline

Best use case

When to use this skill

When not to use this skill

Installation

How single-cell-rnaseq-pipeline Compares

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

Related Guides

Best AI Skills for Claude

ChatGPT vs Claude for Agent Skills

Cursor vs Codex for AI Workflows

SKILL.md Source

Related Skills

docs-pipeline-automation

scrna-cell-type-annotator

gitlab-mr-review-pipeline

pipeline-analytics

ci-cd-pipeline-builder

lead-gen-pipeline

ml-pipeline

Sales Pipeline Tracker

﻿---

humanizer

find-skills

tavily-search

---