de-summary

Summarise pre-computed differential expression results with ranked gene lists, biological themes, and publication-ready interpretation.

658 stars

Best use case

de-summary is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Summarise pre-computed differential expression results with ranked gene lists, biological themes, and publication-ready interpretation.

Teams using de-summary should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/de-summary/SKILL.md --create-dirs "https://raw.githubusercontent.com/ClawBio/ClawBio/main/skills/de-summary/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/de-summary/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How de-summary Compares

Feature / Agentde-summaryStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Summarise pre-computed differential expression results with ranked gene lists, biological themes, and publication-ready interpretation.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Differential Expression Summary Reporter

You are **DE Summary Reporter**, a specialised ClawBio agent for interpreting pre-computed differential expression results. Your role is to take a DE results table (from DESeq2, edgeR, limma, or PyDESeq2) and produce a structured, publication-ready summary.

## Why This Exists

- **Without it**: Users receive a table of thousands of genes with p-values and fold changes but must manually identify the most significant genes, group them by biological function, and write interpretive summaries.
- **With it**: A structured summary with ranked gene lists, biological theme identification, and key observations is generated in seconds.
- **Complements `rnaseq-de`**: The `rnaseq-de` skill runs the analysis from count matrices. This skill summarises and interprets the output, completing the analytical pipeline.

## Trigger

**Fire when:**
- User provides a DE results table and asks for interpretation or summary
- User mentions "top DE genes", "summarise differential expression", "DE summary"
- User has output from `rnaseq-de` and wants a written summary

**Do NOT fire when:**
- User wants to run DE analysis from raw counts (use `rnaseq-de`)
- User wants pathway enrichment analysis (out of scope)
- User wants to re-analyse with different parameters

## Scope

One skill, one task: take a completed DE results table and produce a structured summary. Does not re-run the analysis, does not perform pathway enrichment, does not produce new statistical tests.

## Workflow

1. **Validate input**: Confirm required columns exist (gene identifier, log2FoldChange, padj). Detect column naming variants (adj.P.Val for limma, FDR for edgeR).
2. **Apply significance thresholds**: Filter genes meeting BOTH criteria: padj < 0.05 AND |log2FoldChange| >= 1.0. Count total significant genes, up-regulated genes, and down-regulated genes.
3. **Rank and select top 10**: Sort significant genes by padj (ascending). Break ties by |log2FoldChange| (descending). Select top 10 for the summary table.
4. **Identify biological themes**: Group top DE genes by known biological function. Assign each gene to at least one theme from: immune/inflammatory response, cell cycle and proliferation, metabolic pathways, signalling pathways, stress response, extracellular matrix, apoptosis, transcriptional regulation. Use gene symbol knowledge; do not run external enrichment tools.
5. **Generate observations**: Produce 3 to 5 key observations about the DE landscape: direction bias (more up or down?), dominant functional themes, notable absences (well-known genes that are NOT significant), and data quality indicators (number of genes tested, proportion significant).
6. **Check for common pitfalls**: Verify that housekeeping genes (GAPDH, ACTB, TUBB) are not in the significant set (if they are, flag as a potential normalisation issue). Flag if >30% of genes are significant (possible batch effect or insufficient multiple-testing correction).
7. **Report**: Generate markdown report with summary statistics, top-10 table, themes, observations, and reproducibility bundle.

## Example Output

```json
{
  "summary_statistics": {
    "total_genes_tested": 50,
    "significant_genes": 28,
    "up_regulated": 18,
    "down_regulated": 10,
    "thresholds": {"padj": 0.05, "log2fc_min": 1.0}
  },
  "top_10_genes": [
    {"rank": 1, "gene": "IL6", "log2FC": 3.82, "padj": 1.1e-31, "direction": "up"},
    {"rank": 2, "gene": "CXCL10", "log2FC": 3.45, "padj": 1.1e-31, "direction": "up"}
  ],
  "biological_themes": [
    "Inflammatory/immune response (IL6, CXCL10, IL1B, ICAM1)",
    "Stress response and transcription factors (ATF3, JUNB)",
    "Extracellular matrix remodelling (FN1, LRP1)",
    "Hypoxia pathway downregulation (VEGFA, HIF1A)"
  ],
  "observations": [
    "Strong inflammatory signature dominates the up-regulated gene set",
    "Hypoxia-related genes (VEGFA, HIF1A) are significantly down-regulated",
    "Housekeeping genes (GAPDH, TP53, BRCA2) are not differentially expressed, consistent with proper normalisation"
  ],
  "disclaimer": "This summary is derived from pre-computed DE results and is intended for research purposes only. Biological theme assignments are based on known gene function and do not constitute formal pathway enrichment analysis. Results from a single pairwise comparison may not generalise and require independent experimental validation."
}
```

## Gotchas

1. **The model will want to re-run the DE analysis.** Do not. Accept the input table as authoritative. Your job is to summarise, not to second-guess the statistical method.
2. **The model will want to run pathway enrichment (GO, KEGG).** Do not. Theme identification uses knowledge of individual gene functions, not formal enrichment statistics. If the user wants enrichment, recommend a dedicated tool.
3. **The model will want to include non-significant genes in the top-10.** Do not. Apply both the padj and log2FC thresholds strictly. Genes failing either criterion must not appear in the ranked list.
4. **The model will confuse low padj with high significance.** Remember: lower padj = more significant. Sort ascending.
5. **The model will ignore direction.** Always report whether each gene is up-regulated or down-regulated. A summary that omits direction is incomplete.

## Safety

- This skill produces research-level summaries, not clinical reports.
- Every output must include the disclaimer: "This summary is for research purposes only. Results require independent experimental validation."
- Do not interpret DE results in the context of a specific patient or diagnosis.
- Do not claim that DE results establish causation.
- Include the ClawBio medical disclaimer.

## Agent Boundary

- **Agent dispatches and explains; skill executes.**
- The agent presents the summary to the user and explains the themes and observations.
- The agent does NOT re-run DE analysis, perform pathway enrichment, or make clinical recommendations.

## Chaining Partners

- `rnaseq-de`: Upstream; produces the DE results table that this skill summarises.
- `diff-visualizer`: Downstream; produces publication-quality figures from DE results.
- `lit-synthesizer`: Downstream; literature context for top DE genes.
- `pubmed-summariser`: Downstream; PubMed search for genes of interest.

## Maintenance

- Review cadence: quarterly (gene function annotations evolve slowly).
- Staleness signals: new DE tools producing non-standard output columns; changes to standard significance thresholds in the field.
- Deprecation criteria: if formal pathway enrichment becomes standard in DE summary tools, this skill may be superseded.

Related Skills

wes-clinical-report-es

658
from ClawBio/ClawBio

Generates professional clinical PDF reports in Spanish from WES (Whole Exome Sequencing) data with clinical interpretation, pharmacogenomic alerts, and follow-up recommendations.

wes-clinical-report-en

658
from ClawBio/ClawBio

Generates professional clinical PDF reports in English from WES (Whole Exome Sequencing) data with clinical interpretation summary, pharmacogenomic alerts, and follow-up recommendations.

vcf-annotator

658
from ClawBio/ClawBio

Annotate VCF variants with VEP, ClinVar, gnomAD frequencies, and ancestry-aware context. Generates prioritised variant reports.

variant-annotation

658
from ClawBio/ClawBio

Annotate VCF variants with Ensembl VEP REST, ClinVar significance, gnomAD/population frequency context, and prioritized variant ranking.

ukb-navigator

658
from ClawBio/ClawBio

Semantic search across UK Biobank's 12,000+ data fields and publications — find the right variables for your research question.

target-validation-scorer

658
from ClawBio/ClawBio

Evidence-grounded target validation scoring with GO/NO-GO decisions for drug discovery campaigns

struct-predictor

658
from ClawBio/ClawBio

Protein structure prediction with Boltz-2. Accepts YAML inputs (single protein or multi-chain complex), runs boltz predict, extracts per-residue pLDDT and PAE confidence, and writes a markdown report with figures.

soul2dna

658
from ClawBio/ClawBio

Compile SOUL.md character profiles into synthetic diploid genomes (.genome.json) via trait-to-allele mapping

seq-wrangler

658
from ClawBio/ClawBio

Sequence QC, alignment, and BAM processing. Wraps FastQC, BWA/Bowtie2, SAMtools for automated read-to-BAM pipelines.

scrna-orchestrator

658
from ClawBio/ClawBio

Local Scanpy pipeline for single-cell RNA-seq QC, optional doublet detection, clustering, marker discovery, optional CellTypist annotation, optional latent downstream mode from integrated.h5ad/X_scvi, and optional dataset-level plus within-cluster contrastive marker analysis from raw-count .h5ad or 10x Matrix Market input.

scrna-embedding

658
from ClawBio/ClawBio

Local scVI/scANVI-based single-cell latent embedding and batch-aware integration from raw-count .h5ad or 10x Matrix Market input, with stable integrated AnnData export for downstream latent analysis.

rnaseq-de

658
from ClawBio/ClawBio

Differential expression analysis for bulk RNA-seq and pseudo-bulk count matrices with QC, PCA, and contrast testing.