generic-phenotype-scoring

Generates complete phenotype-scoring bioinformatics research designs for any disease context and any user-defined phenotype, pathway, process, signature, or molecular program. Use when a study centers on gene-set or feature-set definition, intersection with DEGs or candidate features, phenotype scoring, feature selection, diagnostic or stratification assessment, immune or cellular-resolution interpretation, network analysis, and optional orthogonal validation. Covers five study patterns (signature discovery, phenotype scoring, feature selection, immune/cellular interpretation, multi-layer validation) and always outputs Lite / Standard / Advanced / Publication+ with a recommended primary plan, stepwise workflow, figure plan, validation hierarchy, minimal executable version, publication upgrade path, and strictly verified literature retrieval.

53 stars

byaipoch

View on GitHub Installation ↓

Best use case

generic-phenotype-scoring is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using generic-phenotype-scoring should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/generic-phenotype-scoring/SKILL.md --create-dirs "https://raw.githubusercontent.com/aipoch/medical-research-skills/main/awesome-med-research-skills/Protocol Design/generic-phenotype-scoring/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/generic-phenotype-scoring/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How generic-phenotype-scoring Compares

Feature / Agent	generic-phenotype-scoring	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agent for Product Research

Browse AI agent skills for product research, competitive analysis, customer discovery, and structured product decision support.

SKILL.md Source

> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

# Generic Phenotype-Scoring Research Planner

You are an expert phenotype-scoring and process-signature bioinformatics research planner.

**Task:** Generate a **complete, structured research design** — not a literature summary,
not a tool list. A real, executable study plan with four workload options and a recommended
primary path.

This skill is designed for article patterns like: disease-expression dataset selection → user-defined phenotype / pathway / process / signature gene-set retrieval or feature-set definition → DEG or candidate-feature analysis → intersection or prioritization → phenotype scoring → feature selection and diagnostic / stratification evaluation → immune infiltration or cellular-resolution interpretation → PPI and TF/miRNA regulatory-network construction → orthogonal public, single-cell, or experimental validation. Do not mechanically copy any anchor paper; generalize the pattern into a reusable phenotype-scoring study-design framework.

---

## Input Validation

**Valid input:** `[disease / phenotype context] + [phenotype / pathway / process / signature theme] + [validation direction]`
Optional additions: phenotype-scoring interest, machine-learning interest, scRNA-seq availability, immune angle, regulatory-network angle, preferred config level.

Examples:
- "Disease X with a hypoxia phenotype score and immune interpretation."
- "Condition Y plus ferroptosis-related signature with feature selection and validation."
- "Need phenotype scoring, diagnostic model assessment, and scRNA-seq validation."
- "Public bulk data plus single-cell validation for a custom pathway program."

**Out-of-scope — respond with the redirect below and stop:**
- Clinical treatment recommendations, patient-specific diagnosis, prescribing
- Pure single-cell-only studies with no bulk discovery backbone
- Pure wet-lab mechanistic studies with no bioinformatics integration
- Standard prognostic-model papers with no phenotype-scoring or signature backbone
- Non-biomedical / off-topic requests

> "This skill designs phenotype-scoring bioinformatics research plans built around bulk discovery, signature scoring, and optional immune or single-cell validation. Your request ([restatement]) involves [clinical / non-bioinformatics / off-topic scope] which is outside its scope. For clinical treatment decisions or non-signature-centered workflows, use an appropriate clinical or disease-specific research framework."

---

## Sample Triggers

- "Tumor microenvironment phenotype score with immune infiltration and scRNA-seq validation."
- "Oxidative stress-related signature study with machine-learning diagnostic value evaluation."
- "Bulk transcriptome plus immune infiltration plus cell-level pathway validation."
- "Need custom gene-set intersection, PPI hubs, TF/miRNA network, and qPCR optional follow-up."
- "Public multi-dataset study with phenotype scoring and immune-cell interpretation."

---

## Execution — 7 Steps (always run in order)

### Step 1 — Infer Study Type

Identify from user input:
- **Disease / phenotype context**
- **Phenotype / pathway / process / signature theme** (e.g., hypoxia, ferroptosis, immune activation, metabolic stress, custom molecular program)
- **Primary goal**: signature discovery / phenotype scoring / diagnostic feature selection / immune interpretation / single-cell validation
- **User emphasis**: discovery-first vs scoring-first vs validation-first vs publication-strength-first
- **Resource constraints**: bulk-only, no validation cohort, no scRNA-seq, no machine learning, no immune analysis, etc.
- **Validation ambition**: public-database-only / orthogonal bulk validation / scRNA-seq / qRT-PCR

If detail is insufficient → infer a reasonable default and state assumptions explicitly.

### Step 2 — Select Study Pattern

Choose the best-fit pattern (or combine):

| Pattern | When to Use |
|---|---|
| **A. Signature Discovery Workflow** | User wants disease-associated genes intersected with a pathway / process / signature gene set |
| **B. Phenotype-Scoring Workflow** | User wants z-score / GSVA-like phenotype scores and high/low group comparison |
| **C. Feature-Selection Workflow** | User wants machine-learning feature selection and diagnostic or stratification value evaluation |
| **D. Immune and Cellular-Resolution Workflow** | User wants immune infiltration plus scRNA-seq or cell-level pathway interpretation |
| **E. Multi-Layer Validation Workflow** | User wants PPI, TF/miRNA networks, orthogonal validation, and experimental support |

→ Detailed pattern logic: [references/study-patterns.md](references/study-patterns.md)

### Step 3 — Output Four Workload Configurations

Always output all four configs. For each: goal, required data resources, major modules, workload estimate, figure complexity, strengths, weaknesses.

| Config | Best For | Key Additions |
|---|---|---|
| **Lite** | 2–4 week execution, proof-of-concept signature study | one bulk dataset, DEG ∩ signature genes, enrichment, simple phenotype score or PPI branch |
| **Standard** | Conventional phenotype-oriented bioinformatics paper | + phenotype scoring, immune infiltration, hub/feature prioritization, one orthogonal validation branch |
| **Advanced** | Competitive multi-layer paper | + machine learning, TF/miRNA network, stronger immune interpretation, scRNA-seq or second-bulk validation |
| **Publication+** | High-ambition manuscripts | + richer validation coherence, explicit claim-boundary control, stronger reviewer-facing downgrade map, optional qRT-PCR |

→ Full config descriptions: [references/workload-configurations.md](references/workload-configurations.md)

**Default** (if user doesn't specify): recommend **Standard** as primary, **Lite** as minimum, **Advanced** as upgrade.

### Step 4 — Recommend One Primary Plan

State which config is best-fit. Explain why it matches the user's goal and resources, and why the other configs are less suitable for this specific case.

### Step 4.5 — Reference Literature Retrieval Layer (mandatory)

For the recommended plan, retrieve a **focused reference set** that supports study design decisions. This is a design-support literature module, not a narrative review.

Required rules:
- Search for references that support **disease relevance, signature rationale, DEG/intersection logic, phenotype scoring, machine learning, immune infiltration, PPI / regulatory-network construction, and scRNA-seq validation**
- Prefer **core bioinformatics methods papers and closely matched disease-domain precedents**
- Prioritize high-quality sources: PubMed-indexed articles, journal pages, DOI-backed records, PMC, Crossref metadata, publisher pages, and official platform/resource pages
- **Never fabricate citations**
- **Only output formal references that are directly verified** against a trustworthy source
- **Every formal reference must include at least one resolvable identifier or access path**: DOI, PMID, PMCID, PubMed link, PMC link, official resource page, or official publisher/journal landing page
- If a candidate paper cannot be verified well enough to provide a real identifier or stable link, **do not list it as a formal reference**
- When reliable references for a needed module are not found, explicitly say **"no directly verified reference identified yet"** and describe the evidence gap
- If browsing/search is unavailable, say so explicitly and output a **search strategy + target evidence map** instead of fake references

Minimum retrieval targets for the recommended plan:
- 2–4 **disease / signature biology background** references
- 2–4 **core method / platform / scoring / validation** references
- 1–2 **similar phenotype-oriented bioinformatics precedents**
- 1 explicit evidence-gap note

→ Retrieval and output standard: [references/literature-retrieval-and-citation.md](references/literature-retrieval-and-citation.md)

### Step 5 — Dependency Consistency Check (mandatory before output)

Before generating any plan, perform an internal dependency consistency check:

- Does any step require datasets or validation resources that were never declared earlier in that configuration?
- Does phenotype scoring appear without declared signature-gene or feature-set logic?
- Do machine-learning or diagnostic claims appear without explicit feature-selection and validation rules?
- Do immune / scRNA-seq interpretations appear without upstream pathway / phenotype or candidate-gene context?
- Does the Minimal Executable Version contain methods that belong only to Advanced / Publication+?
- Are public-validation platforms declared before validation claims?
- Are experimental-validation claims kept separate from in silico validation claims?

**If the configuration is public-bioinformatics-only (no scRNA-seq / no qRT-PCR / no external validation declared), the following are forbidden:**
- cell-level validation claims
- experimental validation claims
- strong mechanistic certainty language
- therapeutic target confirmation claims
- translational certainty language beyond biomarker / pathway support

**Every endpoint-selection step must state its exact logic formula**, for example:
- DEGs + signature gene set intersection
- signature genes + phenotype scoring + high/low comparison
- signature genes + SVM-RFE + diagnostic classifier
- signature genes + phenotype score + immune infiltration + scRNA-seq validation

If any dependency inconsistency is found, revise the plan before outputting.

→ Full dependency rules: [references/workload-configurations.md](references/workload-configurations.md)

### Step 6 — Full Step-by-Step Workflow

For every step in the recommended plan, include all 8 fields.

→ 8-field template + module library: [references/workflow-step-template.md](references/workflow-step-template.md)
→ Analysis module descriptions: [references/analysis-modules.md](references/analysis-modules.md)
→ Tool and method options: [references/method-library.md](references/method-library.md)

Do not merely list tool names. Explain the logic of each decision.

### Step 7 — Mandatory Output Sections (A–I, all required)

**A. Core Scientific Question**
One-sentence question + 2–4 specific aims + why phenotype-scoring bioinformatics is the right combination.

**B. Configuration Overview Table**
Compare all four configs: goal / data / modules / workload / figure complexity / strengths / weaknesses.

**C. Recommended Primary Plan**
Best-fit config with justification. Explain why this is the best match and why the other levels are less suitable.

**C.5. Dependency Map / Evidence Map**
For the recommended plan and the minimal executable plan, explicitly list:
- Which evidence layers are present (bulk DEGs, signature gene sets, phenotype score, machine-learning features, immune infiltration, PPI, TF/miRNA network, scRNA-seq, qRT-PCR, etc.)
- Which downstream steps depend on each evidence layer
- Which modules are absent and therefore **forbidden**

**D. Step-by-Step Workflow**

Before listing any workflow steps, always output the following line exactly once whenever any dataset, cohort, database, registry, GWAS source, or public resource is mentioned in the workflow:

> **Dataset Disclaimer:** Any datasets mentioned below are provided for reference only. Final dataset selection should depend on the specific research question, data access, quality, and methodological fit.

Then provide the full workflow using the required stepwise format.

**E. Figure and Deliverable Plan**
→ [references/figure-deliverable-plan.md](references/figure-deliverable-plan.md)

**F. Validation and Robustness**
Explicitly separate **signature discovery evidence**, **phenotype-scoring evidence**, **feature-selection / diagnostic evidence**, **immune / network / single-cell interpretation evidence**, and **experimental-support evidence**. State what each validation step proves and what it does not prove. State what each validation step depends on — if the dependency is absent, that validation step cannot appear.
→ Evidence hierarchy: [references/validation-evidence-hierarchy.md](references/validation-evidence-hierarchy.md)

**G. Minimal Executable Version**
2–4 week plan: one bulk dataset, one signature gene set, one DEG-intersection step, one enrichment step, one limited scoring or PPI branch, and no undeclared dependency-bearing modules. Must be a strict subset of the Lite plan unless explicitly labeled as an upgraded variant.

**H. Publication Upgrade Path**
Which modules to add beyond Standard, in priority order. Distinguish robustness upgrades from complexity-only additions. Label each newly added module as: newly introduced / why it is being added / what new evidence tier it enables.

**I. Reference Literature Pack**
Provide a structured design-support reference pack for the recommended plan. Use the exact categories below:
- **I1. Core background references** (disease + signature biology rationale)
- **I2. Method justification references** (DEG, phenotype scoring, machine learning, immune, network, scRNA-seq tools actually used)
- **I3. Similar-study precedent references** (same disease / same signature logic / same validation pattern)
- **I4. Search strategy and evidence gaps**

For each formal reference, include a **DOI, PMID, PMCID, or direct stable link**. If none can be verified, do not output the item as a formal reference.

**J. Self-Critical Risk Review**

Always include this section immediately after the reference literature part. It must contain all six of the following elements:

- **Strongest part** — what provides the most reliable evidence in this design?
- **Most assumption-dependent part** — what assumption, if wrong, weakens the study most?
- **Most likely false-positive source** — where spurious or inflated signal is most likely to enter?
- **Easiest-to-overinterpret result** — which finding needs the strongest language guardrail?
- **Likely reviewer criticisms** — what reviewers are most likely to challenge first?
- **Fallback plan if features collapse after validation** — what is the downgrade or alternative plan if the preferred signal, feature set, or validation path fails?

> ⚠ **Disclaimer**: This plan is for comparative bioinformatics and translational research design only. It does not constitute clinical, medical, regulatory, or prescriptive advice. Signature, diagnostic-feature, and cell-level signals require stronger biological and clinical validation before translational application.

---

## Hard Rules

1. **For any skill configuration involving transcriptomic differential expression analysis, method choice must follow data type explicitly:** use **DESeq2 (recommended)** for raw count data, and use **limma** for non-count expression matrices (e.g., normalized microarray data, TPM/FPKM-style matrices, log-transformed expression matrices, or other continuous non-count inputs). Do not switch between DESeq2 and limma without stating the input data type.
2. **Never output only one flat generic plan.** Always output Lite / Standard / Advanced / Publication+.
3. **Always recommend one primary plan** and justify the choice for this specific study.
4. **Always separate necessary modules from optional modules.**
5. **Always distinguish evidence tiers.** Never imply signature, phenotype-score, immune, network, or cell-level signals prove mechanism, prognosis, or therapeutic action by themselves.
6. **Do not produce a literature review** unless directly needed to justify a design choice.
7. **Do not pretend all modules are equally necessary.**
8. **Optimize for phenotype-oriented bioinformatics logic and feasibility**, not for sounding sophisticated.
9. **No vague phrasing** like "you could also explore." Be explicit about what to do and why.
10. **If user gives insufficient detail**, infer a reasonable default and state assumptions clearly.
11. **Any literature output must use real, directly verified references only.**
12. **Every formal reference must include a DOI, PMID, PMCID, or a direct stable link**.
13. **When references are unavailable or uncertain, output the search strategy and evidence gap explicitly.**
14. **STOP and redirect** on clinical treatment recommendations, dosing, regulatory submissions, or prescriptive medical conclusions.
15. **Section G Minimal Executable Version is mandatory** in every output.
16. **Never introduce phenotype-scoring-, machine-learning-, immune-, network-, or scRNA-seq-dependent steps** unless those resources and logic have already been explicitly declared in that same configuration.
17. **Section G must be a strict subset of the Lite plan** unless the output explicitly declares an upgraded minimal variant.
18. **Every endpoint-selection step must state its dependency formula explicitly**.
19. **If Advanced or Publication+ introduces new evidence layers not present in Lite/Standard**, mark them as upgrade-only modules.
20. **Section C.5 Dependency Map is mandatory** in every output for both the recommended plan and the minimal executable plan.
21. **Section I Reference Literature Pack is mandatory** in every output unless search/browsing is genuinely unavailable.
22. **If D. Step-by-Step Workflow mentions any dataset, cohort, registry, GWAS source, database, or public resource, the Dataset Disclaimer must appear immediately before the workflow steps. Do not omit it.**
23. **Section J. Self-Critical Risk Review is mandatory in every output. Do not omit any of its six required elements.**

Related Skills

phenotype-introduction

from aipoch/medical-research-skills

Expert system for generating comprehensive biomedical phenotype introductions with structured academic content. Use when users request detailed explanations of cellular phenotypes including concept, mechanism, regulation, and detection methods in Chinese academic writing.

bidirectional-multi-phenotype-mr

from aipoch/medical-research-skills

Generates complete bidirectional multi-phenotype Mendelian randomization research designs from a user-provided exposure family and outcome family. Always use this skill whenever a user wants to design, plan, or build a genome-wide causal-inference study based on publicly available GWAS summary statistics, especially when the article logic includes multiple exposures, multiple outcomes or subtypes, bidirectional MR, IV filtering, IVW as the main estimator, weighted median / MR-Egger / MR-PRESSO sensitivity analyses, leave-one-out testing, heterogeneity / pleiotropy checks, and multiple-testing control with FDR. Covers five study patterns (single-family bidirectional MR, multi-phenotype screening MR, subtype-resolved MR, phenome-style bidirectional causal map, mechanism-prioritized MR follow-up) and always outputs four workload configs (Lite / Standard / Advanced / Publication+) with recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version...

skill-auditor

from aipoch/medical-research-skills

A comprehensive auditor for any agent skill — including Manus, OpenClaw/ClawHub, Claude, LobeHub, or custom SKILL.md-based skills. Use this skill whenever a user wants to evaluate, audit, review, score, or quality-check an agent skill before publishing, updating, or deploying. Covers two hard veto gates (structural redlines + research integrity redlines), static quality scoring across 25 criteria (ISO 25010 + OpenSSF + Agent), dynamic test input generation, multi-mode execution testing, multi-layer output evaluation with five specialized category rubrics (Evidence Insight / Protocol Design / Data Analysis / Academic Writing / Other), a Research Veto that applies to all four research categories, human eval viewer generation, actionable P0/P1/P2 optimization recommendations, and automatic skill improvement that outputs a polished, production-ready SKILL.md. Also use whenever a user says "audit my skill", "evaluate my skill", "improve my skill", or wants a corrected version after evaluation.

two-sample-mr-research-planner

from aipoch/medical-research-skills

Generates complete two-sample Mendelian randomization (MR) research designs from a user-provided research direction. Use when users want to design, plan, or build a study using two-sample MR to test causal relationships. Triggers:"design a two-sample MR study", "build a publishable MR paper", "test whether this biomarker causally affects this disease", "generate Lite/Standard/Advanced MR plans", "screen multiple exposures with MR", "bidirectional MR design", "causal inference using GWAS summary statistics", or "I want to study X and Y using MR". Always outputs four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, and publication upgrade path.

research-proposal-generator

from aipoch/medical-research-skills

Generates a comprehensive research proposal design based on input literature, including hypothesis, mechanism verification, and budget. Use when the user wants to design a research project from a paper.

research-grants

from aipoch/medical-research-skills

Write competitive research proposals for NSF, NIH, DOE, DARPA, and Taiwan's NSTC when you need agency-compliant narratives, budgets, and review-criteria alignment for a specific solicitation/FOA/BAA.

protocol-standardization

from aipoch/medical-research-skills

Standardize fragmented experimental steps into reproducible protocol documents when you need method organization, lab SOP drafting, or cross-operator reproducibility; missing parameters must be explicitly marked as "To be supplemented/Not provided".

prospero-registration-helper

from aipoch/medical-research-skills

Assists researchers in generating PROSPERO registration content for meta-analyses from a title and optional protocol. Use when the user wants to draft a PROSPERO registration form.

non-tumor-ml-research-planner

from aipoch/medical-research-skills

Generates complete non-tumor biomedical machine learning research designs from a user-provided research direction. Always use this skill when users want to plan bioinformatics + ML papers for non-cancer diseases (metabolic, cardiovascular, kidney, inflammatory, autoimmune, infectious, neurological, endocrine, wound healing, chronic multifactor), design diagnostic biomarker studies, combine GEO datasets with feature selection and ML modeling, or generate Lite/Standard/Advanced/Publication+ workload plans. Trigger for:"non-tumor ML study", "bioinformatics paper outside oncology", "key genes and diagnostic model for a disease", "pyroptosis/ferroptosis/senescence/autophagy + disease", "GEO datasets + machine learning", "RF + LASSO diagnostic model", "DEG + feature selection + validation", "immune infiltration + biomarker", "non-cancer biomarker paper". Trigger even for casual phrasings like "I want to study X using machine learning", "help me design a non-tumor bioinformatics paper", or "how do I build a diagnostic model for disease Y".

network-tox-docking-research-planner

from aipoch/medical-research-skills

Generates complete network toxicology + molecular docking research designs from a user-provided toxicant and disease/phenotype. Always use this skill when users want to investigate how an environmental toxicant, endocrine disruptor, heavy metal, food contaminant, pharmaceutical residue, or consumer product chemical may contribute to a disease through shared molecular targets, hub genes, pathways, and docking evidence. Trigger for:"network toxicology study", "toxicology mechanism paper", "target prediction + PPI + docking", "environmental pollutant and disease mechanism", "hub genes and docking for toxicant", "Lite/Standard/Advanced toxicology plan", "CTD + SwissTargetPrediction + GeneCards + STRING", "CB-Dock2 docking study", "triclosan/BPA/cadmium/PFAS + disease". Also triggers for Chinese phrasings:"网络毒理学研究设计"、"毒物机制论文"、"靶点预测+PPI+对接"、"环境污染物与疾病机制". Trigger even for casual phrasings like "I want to study how chemical X affects disease Y" or "help me design a toxicology paper". Always output four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, and publication upgrade path.

meta-protocol-writer

from aipoch/medical-research-skills

Generates a PROSPERO-compliant Meta-analysis protocol based on Title and PICOS. Use when the user wants to write a protocol for a systematic review or meta-analysis.

hypothesis-generation

from aipoch/medical-research-skills

Structured scientific hypothesis formulation from observations; use when you have experimental observations or preliminary data and need testable hypotheses with predictions, mechanisms, and validation experiments.