dual-disease-transcriptomic-ml-planner

Generates complete dual-disease transcriptomic + machine learning research designs from a user-provided disease pair. Use when users want to identify shared DEGs, common hub genes, cross-disease biomarkers, or shared molecular mechanisms between two diseases using public GEO data. Triggers: "shared biomarker study for two diseases", "dual-disease transcriptomic ML paper", "identify common DEGs between disease A and B", "cross-disease hub gene discovery", "shared DEG + PPI + ROC design", "immune infiltration shared biomarker", or "I want to study disease X and Y together". Always outputs four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, and publication upgrade path.


by aipoch


Best use case

dual-disease-transcriptomic-ml-planner is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using dual-disease-transcriptomic-ml-planner should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/dual-disease-transcriptomic-ml-planner/SKILL.md --create-dirs "https://raw.githubusercontent.com/aipoch/medical-research-skills/main/scientific-skills/Protocol%20Design/dual-disease-transcriptomic-ml-planner/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/dual-disease-transcriptomic-ml-planner/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How dual-disease-transcriptomic-ml-planner Compares

Feature / Agent	dual-disease-transcriptomic-ml-planner	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

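What does this skill do?

It generates complete dual-disease transcriptomic + machine learning research designs from a user-provided disease pair, using public GEO data to identify shared DEGs, common hub genes, cross-disease biomarkers, and shared molecular mechanisms. It always outputs four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan.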

Where can I find the source code?

You can find the source code on GitHub in the aipoch/medical-research-skills repository, linked under "SKILL.md Source" below.

SKILL.md Source

> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

# Dual-Disease Transcriptomic Machine Learning Research Planner

Generates a complete dual-disease transcriptomic + ML study design from a user-provided disease pair. Always outputs four workload configurations and a recommended primary plan.

## Supported Study Styles

| Style | Description | Example |
|-------|-------------|---------|
| **A. Shared DEG → Hub Gene Core** | DEG overlap → PPI → hub consensus | Intracranial aneurysm + AAA; diabetic + hypertensive nephropathy |
| **B. Dual-Disease Shared Mechanism** | Pathway-level convergence | ECM, inflammation, fibrosis linking two diseases |
| **C. PPI + Multi-Algorithm Hub Prioritization** | STRING + MCODE + CytoHubba consensus | Any pair with sufficient shared DEGs |
| **D. Dual-Disease Biomarker Validation** | ROC in discovery + validation cohorts | Any pair with ≥2 GEO datasets per disease |
| **E. Immune Infiltration + Shared Biomarker** | CIBERSORT/alternative + gene–immune correlation | Immunologically active disease pairs |
| **F. Single-Gene Cross-Disease Deepening** | Hub-gene GSEA in both diseases | Single top hub with strong AUC |
| **G. Publication-Oriented Integrated Design** | Full pipeline: DEG → PPI → ROC → immune → GSEA | High-impact submission target |

## Minimum User Input

- Two diseases or phenotypes
- If limited detail is provided, infer a reasonable default design and state all assumptions explicitly (Hard Rule 9)

## Step-by-Step Execution

### Step 1: Infer Study Type

Identify:
- Disease pair and biological theme (vascular, autoimmune, fibrotic, metabolic, neurodegenerative, infectious-oncologic, comorbidity)
- User goal: shared biomarkers, shared mechanisms, immune relevance, or publication strength
- Whether ML is central (hub consensus, ROC) or supportive (biological interpretation)
- Whether immune analysis is appropriate: consult Hard Rule 5 and the tissue/tool decision guide ([references/tissue_and_tool_decisions.md](references/tissue_and_tool_decisions.md))
- Resource constraints: public data only, dataset count per disease, time limit, single-gene focus

### Step 2: Output Four Configurations

Always generate all four. For each, describe: goal, required data, major modules, expected workload, figure set, strengths, and weaknesses.

| Config | Goal | Timeframe | Best For |
|--------|------|-----------|----------|
| **Lite** | Shared DEG + basic hub, 1 dataset per disease | 2–4 weeks | Pilot, skeleton manuscript, single-dataset constraint |
| **Standard** | Full pipeline + validation + ROC + one deepening layer | 5–9 weeks | Core publishable paper |
| **Advanced** | Standard + immune + GSEA + multi-cohort robustness | 9–14 weeks | Competitive journal target |
| **Publication+** | Full multi-layer + experimental suggestions + reviewer defense | 12–20 weeks | High-impact submission |

### Step 3: Recommend One Primary Plan

Select the best-fit configuration and explain why, given disease pair biology, GEO data availability, time constraints, and publication ambition.

### Step 4: Full Step-by-Step Workflow

For each step include: step name, purpose, input, method, key parameters/thresholds, expected output, failure points, alternative approaches.

**Dataset & Preprocessing**
- GEO dataset search: one discovery + one validation per disease when feasible (see [references/geo_search_and_tools.md](references/geo_search_and_tools.md))
- Tissue-only filtering: exclude blood/CSF unless disease-appropriate; match tissue type across both diseases
- **Tissue selection rule**: use the tissue most proximal to disease pathology; for metabolic diseases refer to the tissue/tool decision guide
- Platform compatibility check: verify GPL IDs match or are cross-compatible before merging
- Within-dataset normalization; batch awareness without forced cross-dataset merging
- Disease vs control group assignment
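Group assignment usually has to be parsed from free-text phenotype fields. The base-R sketch below assumes a character vector of phenotype strings (real GEO series keep these in submission-specific `pData()` columns such as `characteristics_ch1` or `source_name_ch1`) and hypothetical keyword patterns; `assign_group` is an illustrative helper, not part of GEOquery.

```r
# Assign disease vs control from a free-text phenotype column.
# The regex patterns are assumptions: adapt them to the wording of each GSE.
assign_group <- function(pheno_text,
                         control_pattern = "control|normal|healthy",
                         disease_pattern) {
  is_ctrl <- grepl(control_pattern, pheno_text, ignore.case = TRUE)
  is_dis  <- grepl(disease_pattern, pheno_text, ignore.case = TRUE)
  group <- ifelse(is_ctrl & !is_dis, "control",
                  ifelse(is_dis & !is_ctrl, "disease", NA))
  if (anyNA(group)) {
    warning(sum(is.na(group)), " sample(s) unassigned or ambiguous; inspect manually")
  }
  # Control first, so coefficient 2 of model.matrix(~ group) is disease vs control
  factor(group, levels = c("control", "disease"))
}

# Toy phenotype strings (not from a real series)
pheno <- c("tissue: aneurysm wall", "tissue: normal artery",
           "tissue: aneurysm wall", "tissue: normal artery")
grp <- assign_group(pheno, disease_pattern = "aneurysm")
table(grp)
```

Ambiguous samples are returned as `NA` with a warning rather than guessed, so they can be reviewed before limma is run.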

**Fault tolerance — dataset level:**
- If no GEO dataset exists for one disease: state infeasibility, suggest the closest available proxy phenotype, downgrade to Lite with discovery-only design
- If only one dataset is available per disease: downgrade to Lite; clearly state validation ROC is not feasible; provide GEO search strategy for a second cohort
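The two dataset-level rules can be written as a small decision function. This is a sketch under the assumption that feasibility depends only on dataset counts per disease (time limits, cohort sizes, and platform checks are ignored), and that a single dataset for either disease already rules out validation-cohort ROC; `feasible_config` is a hypothetical helper name.

```r
# Apply the dataset-level fault-tolerance rules to per-disease dataset counts.
feasible_config <- function(n_d1, n_d2) {
  if (min(n_d1, n_d2) == 0) {
    return(list(config = "Lite",
                note = paste("No GEO dataset for one disease: state infeasibility,",
                             "use the closest proxy phenotype, discovery-only design")))
  }
  if (min(n_d1, n_d2) == 1) {
    return(list(config = "Lite",
                note = paste("Single dataset: validation-cohort ROC not feasible;",
                             "search GEO for a second cohort")))
  }
  list(config = "Standard or above",
       note = "Discovery + validation cohort available for both diseases")
}

feasible_config(1, 3)$config  # "Lite"
feasible_config(2, 2)$config  # "Standard or above"
```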

**DEG & Shared Signature**
- limma-based DEG analysis (|logFC| threshold of 1–2, BH-adjusted p < 0.05)
- Volcano plots, heatmaps
- Shared up/downregulated DEG intersection (Venn diagram)
- Shared-gene summary table
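A plain `intersect()` on gene lists mixes genes that change in opposite directions in the two diseases. A minimal base-R sketch of a direction-aware split into shared up- and downregulated sets, assuming each table has columns `gene`, `logFC`, and `adj.P.Val` (limma `topTable` provides the last two; `gene` is assumed to be added after probe-to-symbol mapping). The example values are toy numbers, not real results.

```r
# A gene is "shared" only if it passes thresholds in both diseases AND
# changes in the same direction (concordant up or concordant down).
shared_by_direction <- function(deg1, deg2, lfc = 1, padj = 0.05) {
  pick <- function(d, sign_ok) d$gene[d$adj.P.Val < padj & sign_ok(d$logFC)]
  up   <- intersect(pick(deg1, function(x) x >  lfc), pick(deg2, function(x) x >  lfc))
  down <- intersect(pick(deg1, function(x) x < -lfc), pick(deg2, function(x) x < -lfc))
  list(up = up, down = down)
}

d1 <- data.frame(gene = c("MMP9", "COL1A1", "ACTA2", "IL6"),
                 logFC = c(2.1, -1.5, -1.2, 1.8),
                 adj.P.Val = c(0.001, 0.01, 0.02, 0.2))
d2 <- data.frame(gene = c("MMP9", "COL1A1", "ACTA2", "IL6"),
                 logFC = c(1.4, 1.3, -2.0, 2.2),
                 adj.P.Val = c(0.003, 0.04, 0.001, 0.01))
res <- shared_by_direction(d1, d2)
res$up    # "MMP9"
res$down  # "ACTA2"
```

Here COL1A1 is significant in both toy tables but discordant, so it is excluded from both sets; `lengths(res)` gives the counts for the Venn diagram.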

**Fault tolerance — DEG intersection:**
- If shared DEG count = 0: do not proceed with PPI/hub analysis; apply the following recovery sequence in order:
  1. Relax logFC threshold to 0.5 (report alongside original results)
  2. Extend to top 500 DEGs per disease regardless of threshold
  3. Switch to WGCNA co-expression module overlap instead of direct DEG intersection
  4. Re-evaluate whether the disease pair shares a common tissue or biological mechanism; recommend alternative pairing if not
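Steps 1 and 2 of the recovery sequence can be automated so that the step which produced the overlap is recorded and reported alongside the original result. A sketch, assuming full (unfiltered) topTable-style tables with the same hypothetical `gene` column; ranking the "top 500" by adjusted p-value is an assumption (ranking by |logFC| would be an equally defensible choice), and steps 3 and 4 remain manual decisions.

```r
# Try the original threshold, then |logFC| > 0.5, then top-N genes per disease.
shared_with_recovery <- function(tt1, tt2, lfc = 1, padj = 0.05, top_n = 500) {
  sig <- function(tt, cut) tt$gene[abs(tt$logFC) > cut & tt$adj.P.Val < padj]
  shared <- intersect(sig(tt1, lfc), sig(tt2, lfc))
  if (length(shared) > 0) return(list(step = "original", genes = shared))
  shared <- intersect(sig(tt1, 0.5), sig(tt2, 0.5))
  if (length(shared) > 0) return(list(step = "relaxed logFC 0.5", genes = shared))
  top <- function(tt) head(tt$gene[order(tt$adj.P.Val)], top_n)  # ignores direction
  shared <- intersect(top(tt1), top(tt2))
  if (length(shared) > 0) return(list(step = "top-N per disease", genes = shared))
  list(step = "none: try WGCNA module overlap or re-evaluate the pairing",
       genes = character(0))
}

# Toy tables (not real data)
tt1 <- data.frame(gene = paste0("G", 1:6), logFC = c(1.5, 0.7, 0.2, -0.8, 0.1, 0.3),
                  adj.P.Val = c(0.01, 0.02, 0.5, 0.03, 0.9, 0.6))
tt2 <- data.frame(gene = paste0("G", 1:6), logFC = c(0.3, 0.6, 1.2, -0.9, 0.2, 0.1),
                  adj.P.Val = c(0.4, 0.01, 0.02, 0.04, 0.8, 0.7))
res2 <- shared_with_recovery(tt1, tt2)
res2$step   # "relaxed logFC 0.5"
res2$genes  # "G2" "G4"
```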

**Enrichment & Shared Mechanism**
- GO enrichment (BP, MF, CC) + KEGG enrichment (clusterProfiler / DAVID)
- Pathway visualization; shared biological module summarization
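GO/KEGG over-representation analysis (as run by clusterProfiler's `enrichGO` / `enrichKEGG`) rests on a one-sided hypergeometric test per term. The base-R sketch below shows that test for a single term; the counts are hypothetical, and `ora_pvalue` / `fold_enrichment` are illustrative helper names.

```r
# One-sided hypergeometric test: is a term over-represented among shared DEGs?
# k: shared DEGs annotated to the term   n: shared DEGs with any annotation
# K: background genes annotated to term  N: annotated background size
ora_pvalue <- function(k, n, K, N) {
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)  # P(X >= k)
}
fold_enrichment <- function(k, n, K, N) (k / n) / (K / N)

# Hypothetical counts for one term (not real data)
p  <- ora_pvalue(k = 12, n = 150, K = 200, N = 18000)
fe <- fold_enrichment(12, 150, 200, 18000)  # 7.2

# Across many terms, adjust for multiple testing (BH is clusterProfiler's default)
p.adjust(c(p, 0.02, 0.3), method = "BH")
```

The choice of background (all genes measured on the array versus the whole annotated genome) changes K and N and therefore the p-values, so it is worth stating explicitly in the methods.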

**PPI & Hub Prioritization**
- STRING PPI construction (confidence score > 0.4)
- Cytoscape visualization; MCODE dense-cluster identification
- CytoHubba multi-algorithm ranking (≥5 algorithms required: Degree, MCC, Betweenness, Closeness, EPC)
- Hub-gene consensus logic → top 1 / top 3 / top 10 candidates
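The consensus rule itself is not pinned down above, so the sketch below uses one plausible choice, stated as an assumption: a gene qualifies if it ranks in the top k for at least m algorithms, and qualifying genes are ordered by mean rank. Degree is computed in base R from a STRING-style edge table filtered at `combined_score > 400` (STRING exports scores on a 0–1000 scale, so this matches the 0.4 cutoff); the MCC, EPC, and other CytoHubba scores are assumed to be exported from Cytoscape. All gene names and scores are toy values.

```r
# Degree from an undirected edge list (igraph::degree would give the same counts)
edge_degree <- function(edges) {
  tab <- table(c(as.character(edges$node1), as.character(edges$node2)))
  setNames(as.integer(tab), names(tab))
}

# scores: one row per gene, one numeric column per algorithm (higher = more central)
hub_consensus <- function(scores, top_k = 10, min_hits = 3) {
  ranks <- apply(scores, 2, function(s) rank(-s, ties.method = "min"))
  hits  <- rowSums(ranks <= top_k)
  keep  <- hits >= min_hits
  out <- data.frame(gene = rownames(scores)[keep], hits = hits[keep],
                    mean_rank = rowMeans(ranks)[keep], row.names = NULL)
  out[order(-out$hits, out$mean_rank), ]
}

edges <- data.frame(node1 = c("MMP9", "MMP9", "IL6", "TIMP1", "COL1A1"),
                    node2 = c("TIMP1", "IL6", "TIMP1", "COL1A1", "ACTA2"),
                    combined_score = c(950, 700, 380, 820, 450))
deg <- edge_degree(edges[edges$combined_score > 400, ])  # IL6-TIMP1 edge dropped

scores <- data.frame(Degree = c(8, 6, 5, 3), MCC = c(120, 40, 60, 2),
                     Betweenness = c(30, 25, 4, 1), Closeness = c(9, 8, 7, 5),
                     EPC = c(20, 18, 17, 10),
                     row.names = c("MMP9", "TIMP1", "COL1A1", "ACTA2"))
hub_consensus(scores, top_k = 2, min_hits = 3)
```

Requiring agreement across several algorithms guards against a gene that is central under only one topology measure; the top 1 / top 3 / top 10 candidate lists are then the first rows of this table.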

**Biomarker Performance**
- ROC / AUC analysis (pROC); AUC > 0.70 as minimum threshold
- Discovery-cohort ROC + validation-cohort ROC (Standard and above)
- Expression validation across cohorts

**Fault tolerance — ROC:**
- If AUC ≈ 0.5 in discovery cohort: do not interpret as biomarker; flag as non-informative; consider mini-signature (3–5 genes) instead of single hub gene
- If n < 30 per group: explicitly flag AUC inflation risk; interpret AUC with bootstrap CI; do not generalize
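In practice pROC's `ci.auc(roc_obj, method = "bootstrap")` is the usual route in R. To make the logic explicit, here is a dependency-free sketch that computes AUC through the Mann-Whitney identity, a stratified percentile bootstrap CI, and the flags above. It assumes higher scores indicate disease (pROC's default `direction = "auto"` may flip the comparison instead), and it treats "AUC ≈ 0.5" as a CI whose lower bound reaches 0.5, which is an assumption rather than a rule stated in this skill.

```r
# AUC via the Mann-Whitney identity: probability that a random case scores
# higher than a random control (ties count one half).
auc_mw <- function(score, is_case) {
  r  <- rank(score)
  n1 <- sum(is_case); n0 <- sum(!is_case)
  (sum(r[is_case]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Stratified percentile bootstrap: cases and controls are resampled separately
# so every replicate keeps the original group sizes.
auc_boot_ci <- function(score, is_case, B = 2000, level = 0.95, seed = 1) {
  set.seed(seed)
  cases <- which(is_case); ctrls <- which(!is_case)
  boots <- replicate(B, {
    idx <- c(cases[sample.int(length(cases), replace = TRUE)],
             ctrls[sample.int(length(ctrls), replace = TRUE)])
    auc_mw(score[idx], is_case[idx])
  })
  a <- (1 - level) / 2
  unname(quantile(boots, c(a, 1 - a)))
}

# Apply the ROC fault-tolerance flags described above
flag_auc <- function(score, is_case, min_n = 30) {
  auc <- auc_mw(score, is_case)
  ci  <- auc_boot_ci(score, is_case)
  notes <- character(0)
  if (min(sum(is_case), sum(!is_case)) < min_n)
    notes <- c(notes, "n < 30 in a group: AUC inflation risk, do not generalize")
  if (ci[1] <= 0.5)
    notes <- c(notes, "CI reaches 0.5: non-informative, consider a 3-5 gene mini-signature")
  if (auc < 0.70)
    notes <- c(notes, "AUC below the 0.70 minimum threshold")
  list(auc = auc, ci = ci, notes = notes)
}

# Toy data (not real measurements): 5 cases, 5 controls
score   <- c(2.1, 3.4, 2.8, 3.9, 4.2, 1.0, 1.5, 2.2, 0.8, 1.9)
is_case <- rep(c(TRUE, FALSE), each = 5)
res <- flag_auc(score, is_case)
res$auc    # 0.96
res$notes  # includes the n < 30 warning
```

With five samples per group, a high toy AUC still comes with a wide interval, which is exactly the inflation risk the rule above warns about.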

**Immune Infiltration** (when disease-appropriate per Hard Rule 5)
- Deconvolution tool selection — consult [references/tissue_and_tool_decisions.md](references/tissue_and_tool_decisions.md) for the correct tool by tissue type
- Immune-cell proportion comparison (disease vs control); gene–immune cell correlation (Spearman)
- Violin plots, lollipop / heatmap correlation
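The gene–immune cell correlation step can be sketched in base R as follows, assuming `fractions` is a samples × cell-types data frame of deconvolution output (from whichever tool the tissue guide selects) in the same sample order as `expr`. BH adjustment across cell types is an added assumption, and `gene_immune_cor` is a hypothetical helper; the toy data are built so the expected correlations are exactly +1 and −1. Results remain associative only (Hard Rule 8).

```r
# Spearman correlation of one hub gene with each immune-cell fraction,
# BH-adjusted across cell types.
gene_immune_cor <- function(expr, fractions) {
  stats <- sapply(colnames(fractions), function(cell) {
    ct <- suppressWarnings(cor.test(expr, fractions[[cell]], method = "spearman"))
    c(rho = unname(ct$estimate), p = ct$p.value)
  })
  out <- data.frame(cell = colnames(stats), rho = stats["rho", ], p = stats["p", ],
                    row.names = NULL)
  out$p_adj <- p.adjust(out$p, method = "BH")
  out[order(out$p_adj, -abs(out$rho)), ]
}

# Toy data (not real deconvolution output)
expr <- c(5.1, 6.3, 7.2, 4.8, 8.0, 6.9)
fractions <- data.frame(M2_macrophage = c(0.10, 0.15, 0.22, 0.08, 0.30, 0.20),
                        CD8_T         = c(0.20, 0.12, 0.10, 0.25, 0.05, 0.11))
gene_immune_cor(expr, fractions)
```

The resulting `rho` and `p_adj` columns map directly onto a lollipop plot or a cell-type heatmap.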

**Single-Gene Deepening** (Standard and above)
- Stratify samples by hub gene expression (top vs bottom quartile)
- Single-gene GSEA in both diseases; cross-disease pathway convergence interpretation
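A minimal sketch of the quartile split, to be run separately within each disease's dataset. Samples in the two middle quartiles are set to `NA` and dropped from the contrast; `quartile_groups` is a hypothetical helper. The resulting factor could then be used as the limma group, with the ranked moderated t-statistics serving as the gene list for GSEA (for example `clusterProfiler::GSEA`).

```r
# Label samples by hub-gene expression quartile (quantile() default, type 7).
quartile_groups <- function(expr) {
  q   <- quantile(expr, c(0.25, 0.75), names = FALSE)
  grp <- ifelse(expr >= q[2], "high", ifelse(expr <= q[1], "low", NA))
  factor(grp, levels = c("low", "high"))
}

hub_expr <- c(3.2, 5.8, 4.1, 7.5, 6.0, 2.9, 5.1, 8.3)  # toy values
g <- quartile_groups(hub_expr)
table(g, useNA = "ifany")  # low 2, high 2, <NA> 4
```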

### Step 5: Figure Plan

→ Full figure list and table templates: [references/figure_plan_template.md](references/figure_plan_template.md)

Core figures: workflow schematic (Fig 1), DEG volcanos + Venn (Fig 2), shared DEG heatmap (Fig 3), GO/KEGG enrichment (Fig 4), PPI + MCODE + hub ranking (Fig 5), ROC curves (Fig 6), immune infiltration + correlation (Fig 7), single-gene GSEA (Fig 8). Tables: dataset summary, shared DEG list, hub rankings, ROC/AUC summary.

### Step 6: Validation and Robustness Plan

State what each layer proves and what it does not prove:
- **Shared-expression evidence** — DEG overlap + threshold reproducibility
- **Hub-prioritization evidence** — PPI topology + multi-algorithm consensus (association, not causation)
- **Biomarker performance evidence** — ROC/AUC in discovery + validation cohorts (diagnostic signal, not mechanistic proof)
- **Immune support** — immune landscape differences + gene–immune correlation (associative only; Hard Rule 8)
- **Single-gene mechanistic support** — GSEA pathway themes (hypothesis-generating only; Hard Rule 7)

### Step 7: Risk Review

Always include a self-critical section addressing:
- Strongest part of the design
- Most assumption-dependent part (typically: small cohort ROC inflation; platform differences across datasets)
- Most likely false-positive source (hub ranking with few shared DEGs; AUC > 0.9 in n < 50)
- Easiest part to overinterpret (immune deconvolution as causal; one hub gene as mechanistic proof)
- Most likely reviewer criticisms: small cohorts, no experimental validation, platform heterogeneity, overinterpretation of single biomarker, immune deconvolution limitations, CRC/infectious disease subtype heterogeneity
- Revision strategy if first-pass findings fail (broaden DEG threshold, alternate validation cohort, switch to mini-signature)

### Step 8: Minimal Executable Version

Public data only, one discovery dataset per disease, DEG + Venn + GO/KEGG, STRING + MCODE + CytoHubba top gene, ROC in discovery cohort, one-page interpretation. 2–4 week timeline. Confirm feasibility against any stated time or dataset constraints before recommending.

### Step 9: Publication Upgrade Path

→ Full upgrade impact table: [references/upgrade_path.md](references/upgrade_path.md)

Key upgrades, listed as (impact / complexity): validation cohort per disease (High / Low–Medium), multi-algorithm hub consensus (High / Low), cross-platform reproducibility logic (High / Medium), immune infiltration (Medium / Medium), single-gene GSEA (Medium / Low), mini-signature 3–5 genes (Medium / Medium).

## R Code Framework Guidelines

When providing R code examples or pipeline frameworks:

1. **EXAMPLE ID convention**: All GEO accession numbers in code must carry an inline comment: `# EXAMPLE ID — replace with your actual GSE accession before running`
2. **Zero-intersection guard**: All pipelines must include a feasibility check immediately after DEG intersection:
   ```r
   if (length(shared_genes) == 0) {
     stop("No shared DEGs found. Recovery options: (1) relax logFC to 0.5, (2) use top-500 DEGs per disease, (3) switch to WGCNA co-expression module overlap.")
   }
   ```
3. **Standard package list**: GEOquery, limma, clusterProfiler, org.Hs.eg.db, pROC, igraph, STRINGdb, WGCNA. Provide `BiocManager::install()` calls where needed.
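4. **GEO search pattern**: `GEOquery::getGEO()` downloads a known accession but does not search. To find valid accession IDs, search the GEO DataSets database at https://www.ncbi.nlm.nih.gov/geo/ or programmatically with `rentrez::entrez_search(db = "gds", term = ...)`.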

**Standard R pipeline template:**

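```r
library(GEOquery); library(limma); library(clusterProfiler); library(pROC)

# Load datasets: GSEXXXXX is a placeholder, not a real accession
gse_disease1 <- getGEO("GSEXXXXX", GSEMatrix = TRUE)[[1]]  # EXAMPLE ID — replace with your actual GSE accession before running
gse_disease2 <- getGEO("GSEXXXXX", GSEMatrix = TRUE)[[1]]  # EXAMPLE ID — replace with your actual GSE accession before running

# group_d1 / group_d2 are disease-vs-control factors (control level first) that you
# must build from each dataset's pData() phenotype columns; they are not created here.
run_deg <- function(eset, group, symbol_col = "Gene Symbol") {
  design <- model.matrix(~ group)
  fit    <- eBayes(lmFit(exprs(eset), design))
  tt     <- topTable(fit, coef = 2, adjust.method = "BH", number = Inf)
  tt     <- subset(tt, abs(logFC) > 1 & adj.P.Val < 0.05)
  # Probe IDs differ between GPL platforms, so intersect on gene symbols.
  # The symbol column name in fData() is platform-specific (check colnames(fData(eset))).
  tt$symbol <- as.character(fData(eset)[rownames(tt), symbol_col])
  tt <- tt[!is.na(tt$symbol) & tt$symbol != "", ]
  tt <- tt[order(tt$P.Value), ]
  tt[!duplicated(tt$symbol), ]  # keep the most significant probe per gene
}
deg_d1 <- run_deg(gse_disease1, group_d1)
deg_d2 <- run_deg(gse_disease2, group_d2)

# Shared DEG intersection with zero-guard
shared_genes <- intersect(deg_d1$symbol, deg_d2$symbol)
if (length(shared_genes) == 0) {
  stop("No shared DEGs found. Recovery options: (1) relax logFC to 0.5, (2) use top-500 DEGs per disease, (3) switch to WGCNA co-expression module overlap.")
}

# ROC for one hub gene in the discovery cohort; 'HUB_GENE' is a placeholder symbol
hub_probe <- rownames(deg_d1)[deg_d1$symbol == "HUB_GENE"][1]
roc_obj   <- roc(response = group_d1, predictor = exprs(gse_disease1)[hub_probe, ])
auc_val   <- as.numeric(auc(roc_obj))
ci_val    <- ci.auc(roc_obj, method = "bootstrap", boot.n = 2000)  # Hard Rule 6
cat("AUC:", auc_val, "| 95% bootstrap CI:", ci_val[1], "-", ci_val[3], "\n")
if (auc_val < 0.70) warning("AUC below 0.70 threshold. Consider mini-signature approach.")
```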

## Hard Rules

1. Never output only one generic plan — always output all four configurations.
2. Always recommend one primary plan with justification.
3. Always separate necessary modules from optional modules.
4. Distinguish shared-expression evidence, biomarker performance evidence, immune support, and mechanistic support — see Step 6.
5. Do not proceed with immune analysis if the disease pair is not immunologically suited or if deconvolution would be unreliable for the tissue type. Consult [references/tissue_and_tool_decisions.md](references/tissue_and_tool_decisions.md) to select the correct tool.
6. Do not overclaim diagnostic value from ROC in small (n < 30 per group) or unmatched cohorts. Always report bootstrap confidence intervals.
7. Do not overstate one hub gene as mechanistic proof — label consistently as "biomarker candidate."
8. Do not treat immune-correlation evidence as causal immune regulation.
9. If user provides limited detail, infer a reasonable default design and state all assumptions clearly.
10. Do not produce only a flat methods list or literature summary.
11. **Out-of-scope redirect**: If the request involves a single disease only, wet-lab experimental design, clinical trial planning, or non-GEO data types, do not proceed — activate the Input Validation refusal template below.

## Input Validation

This skill accepts: a pair of diseases or phenotypes for which the user wants to identify shared transcriptomic signatures, hub genes, or cross-disease biomarkers using publicly available GEO transcriptomic data.

If the request does not involve two diseases for GEO-based transcriptomic comparison — for example, asking to design a study for a single disease only, plan a wet-lab experiment, design a clinical trial, analyze non-transcriptomic omics data (e.g., proteomics, metabolomics), or conduct a systematic literature review — do not proceed with the planning workflow. Instead respond:
> "Dual-Disease Transcriptomic ML Planner is designed to generate GEO-based transcriptomic + machine learning study designs for pairs of diseases. Your request appears to be outside this scope. Please provide two diseases to compare, or use a more appropriate skill (e.g., a single-disease transcriptomic skill, an MR planner, or a systematic review skill)."

## Reference Files

| File | Content | Used In |
|------|---------|---------|
| [references/tissue_and_tool_decisions.md](references/tissue_and_tool_decisions.md) | Tissue prioritization rules by disease class; immune deconvolution tool selection by tissue type | Step 4 (immune module), Step 1 |
| [references/geo_search_and_tools.md](references/geo_search_and_tools.md) | GEO dataset search strategy by disease class; bioinformatics tool list with alternatives | Step 4 (dataset module) |
| [references/figure_plan_template.md](references/figure_plan_template.md) | Full figure list (Fig 1–8) and table templates (Table 1–4) | Step 5 |
| [references/upgrade_path.md](references/upgrade_path.md) | Publication upgrade impact vs complexity table | Step 9 |

Related Skills

two-sample-mr-research-planner

from aipoch/medical-research-skills

Generates complete two-sample Mendelian randomization (MR) research designs from a user-provided research direction. Use when users want to design, plan, or build a study using two-sample MR to test causal relationships. Triggers: "design a two-sample MR study", "build a publishable MR paper", "test whether this biomarker causally affects this disease", "generate Lite/Standard/Advanced MR plans", "screen multiple exposures with MR", "bidirectional MR design", "causal inference using GWAS summary statistics", or "I want to study X and Y using MR". Always outputs four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, and publication upgrade path.

non-tumor-ml-research-planner

from aipoch/medical-research-skills

Generates complete non-tumor biomedical machine learning research designs from a user-provided research direction. Always use this skill when users want to plan bioinformatics + ML papers for non-cancer diseases (metabolic, cardiovascular, kidney, inflammatory, autoimmune, infectious, neurological, endocrine, wound healing, chronic multifactor), design diagnostic biomarker studies, combine GEO datasets with feature selection and ML modeling, or generate Lite/Standard/Advanced/Publication+ workload plans. Trigger for: "non-tumor ML study", "bioinformatics paper outside oncology", "key genes and diagnostic model for a disease", "pyroptosis/ferroptosis/senescence/autophagy + disease", "GEO datasets + machine learning", "RF + LASSO diagnostic model", "DEG + feature selection + validation", "immune infiltration + biomarker", "non-cancer biomarker paper". Trigger even for casual phrasings like "I want to study X using machine learning", "help me design a non-tumor bioinformatics paper", or "how do I build a diagnostic model for disease Y".

network-tox-docking-research-planner

from aipoch/medical-research-skills

Generates complete network toxicology + molecular docking research designs from a user-provided toxicant and disease/phenotype. Always use this skill when users want to investigate how an environmental toxicant, endocrine disruptor, heavy metal, food contaminant, pharmaceutical residue, or consumer product chemical may contribute to a disease through shared molecular targets, hub genes, pathways, and docking evidence. Trigger for: "network toxicology study", "toxicology mechanism paper", "target prediction + PPI + docking", "environmental pollutant and disease mechanism", "hub genes and docking for toxicant", "Lite/Standard/Advanced toxicology plan", "CTD + SwissTargetPrediction + GeneCards + STRING", "CB-Dock2 docking study", "triclosan/BPA/cadmium/PFAS + disease". Also triggers for Chinese phrasings: "网络毒理学研究设计" (network toxicology study design), "毒物机制论文" (toxicant mechanism paper), "靶点预测+PPI+对接" (target prediction + PPI + docking), "环境污染物与疾病机制" (environmental pollutants and disease mechanisms). Trigger even for casual phrasings like "I want to study how chemical X affects disease Y" or "help me design a toxicology paper". Always output four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, and publication upgrade path.

faers-multi-drug-soc-planner

from aipoch/medical-research-skills

Generates complete FAERS-based multi-drug single-SOC safety comparison research designs from a user-provided drug set, comparator, and adverse event domain. Always use this skill when users want to compare safety signals across multiple drugs using FAERS or OpenFDA data within one System Organ Class (SOC) or bounded AE domain. Trigger for: "FAERS study comparing drugs within one SOC", "publishable FAERS safety comparison paper", "compare neuropsychiatric adverse events across beta-blockers", "Lite/Standard/Advanced FAERS safety plans", "active-comparator restricted disproportionality", "adjusted ROR logistic regression FAERS", "within-class head-to-head drug comparison", "pharmacovigilance signal comparison", "single-SOC PT-level FAERS design", or any phrasing like "I want to compare drug X and drug Y for adverse events in FAERS" or "build a comparative pharmacovigilance paper". Always output four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, and publication upgrade path.

time-zone-planner

from aipoch/medical-research-skills

Plan cross-time-zone meeting windows for distributed teams, providing region-by-region local time mappings and tradeoff analysis for scheduling decisions.

rare-disease-hpo-mapper

from aipoch/medical-research-skills

Map patient symptoms to Human Phenotype Ontology terms for gene diagnosis.

spatial-transcriptomics-mapper

from aipoch/medical-research-skills

Map spatial transcriptomics data from 10x Genomics Visium/Xenium onto.

poster-layout-planner

from aipoch/medical-research-skills

Use poster layout planner for other workflows that need structured execution, explicit assumptions, and clear output boundaries.

treatment-response-predictor-planner

from aipoch/medical-research-skills

Designs studies for predicting treatment response or resistance in biomedical and clinical research. Always use this skill when the user needs a treatment-response or resistance prediction study blueprint rather than a prognostic biomarker protocol, diagnostic test design, causal treatment-effect estimation, or a completed manuscript. Focus on responder definition, treatment context, baseline comparability, feature integration strategy, model development logic, validation architecture, and interpretation boundaries. Do not invent response rates, cohort size, assay readiness, regimen uniformity, literature support, or validation access.

single-compound-network-toxicology-disease-link-reference-grounded

from aipoch/medical-research-skills

Generates complete single-compound network-toxicology research designs from one exposure, one disease or toxic phenotype, and a validation direction. Use when a study centers on one compound–one disease link and needs target collection, overlap construction, enrichment, PPI hub prioritization, docking, optional transcriptomic cross-check, and conservative mechanistic synthesis. Covers five study patterns and always outputs Lite / Standard / Advanced / Publication+ with a recommended primary plan, stepwise workflow, figure plan, validation hierarchy, minimal executable version, publication upgrade path, and strictly verified literature retrieval.

qtl-colocalization-study-planner

from aipoch/medical-research-skills

Designs QTL colocalization studies that connect eQTL, pQTL, sQTL, or related molecular QTL signals with GWAS loci. Always use this skill whenever a user wants to plan, scope, or structure a locus-level study asking whether a GWAS association and a molecular QTL association may reflect the same underlying causal signal. Covers locus definition, QTL/GWAS source architecture, ancestry and LD alignment, single-locus vs multi-locus strategy, candidate-gene prioritization, optional fine-mapping, linked MR/SMR follow-up, and functional annotation. Always output four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan, stepwise workflow, method rationale, evidence hierarchy, figure plan, minimal executable version, and strictly verified literature guidance with no fabricated references. Never equate colocalization with causality proof, mediation proof, or automatic target validation. Always include the mandatory Dataset Disclaimer immediately before any workflow section that mentions datasets, repositories, consortia, or public resources.

medical-research-gap-to-study-planner

from aipoch/medical-research-skills

Converts an audited medical research gap into a complete, structured, gap-traceable study design. Always use this skill whenever a user already has one or more candidate research gaps and wants to transform them into an executable biomedical research plan rather than re-run broad topic ideation. Covers six gap-to-design patterns (evidence-completion, mechanism-resolution, cell-state/context-mapping, translation-bridge, causality-upgrade, population/stage-specific) and always outputs one recommended primary protocol, a gap-to-design dependency map, step-by-step workflow, figure plan, validation strategy, minimal executable version, publication upgrade path, and verified design-support literature rules. Never fabricate references. Preserve claim-evidence discipline and do not replace a topic-specific gap with a generic workflow.