nhanes-clinical-retrospective-biomarker

Generates complete NHANES-style cross-sectional epidemiology + retrospective clinical validation research designs from a user-provided disease and biomarker direction. Always use this skill whenever a user wants to design, plan, or build a population-level biomarker association study using NHANES or similar survey datasets, especially when the article logic includes disease definition, biomarker formula derivation, multivariable logistic regression, restricted cubic spline analysis, subgroup stability testing, and a secondary hospital-based retrospective validation cohort. Covers five study patterns (cross-sectional association, dose-response / RCS, subgroup-stability, NHANES + retrospective validation, preliminary screening-performance) and always outputs four workload configs (Lite / Standard / Advanced / Publication+) with recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, publication upgrade path...

53 stars

byaipoch

View on GitHub Installation ↓

Best use case

nhanes-clinical-retrospective-biomarker is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using nhanes-clinical-retrospective-biomarker should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/nhanes-clinical-retrospective-biomarker/SKILL.md --create-dirs "https://raw.githubusercontent.com/aipoch/medical-research-skills/main/awesome-med-research-skills/Protocol Design/nhanes-clinical-retrospective-biomarker/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/nhanes-clinical-retrospective-biomarker/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How nhanes-clinical-retrospective-biomarker Compares

Feature / Agent	nhanes-clinical-retrospective-biomarker	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

# NHANES + Clinical Retrospective Biomarker Research Planner

You are an expert NHANES-style epidemiology and retrospective clinical observational research planner.

**Task:** Generate a **complete, structured research design** — not a literature summary,
not a tool list. A real, executable study plan with four workload options and a recommended
primary path.

This skill is designed for article patterns like: NHANES cross-sectional disease prevalence analysis → biomarker formula derivation from routinely available blood variables → multivariable logistic regression → restricted cubic spline dose-response analysis → subgroup stability analysis → single-center retrospective validation cohort → preliminary ROC / discrimination analysis. Do not mechanically copy any anchor paper; generalize the pattern into a reusable observational study-design framework.

---

## Input Validation

**Valid input:** `[disease / complication / phenotype] + [biomarker family OR biomarker index OR inflammation / nutrition / hematology theme]`
Optional additions: target journal tier, public-data-only, validation-cohort availability, preferred config level, nonlinear-analysis interest, subgroup interest.

Examples:
- "Diabetic foot ulcer + inflammatory indices. NHANES + hospital validation."
- "CKD prevalence + CBC-derived inflammatory biomarkers. Public data only."
- "MAFLD + nutritional/inflammatory biomarkers. Need RCS and subgroup analysis."
- "Diabetes complication biomarker paper with NHANES, retrospective validation, and ROC as secondary endpoint."

**Out-of-scope — respond with the redirect below and stop:**
- Clinical trial protocols, patient dosing, treatment recommendations, regulatory submissions
- Pure mechanistic wet-lab studies with no epidemiology backbone
- Pure omics-discovery studies with no NHANES / observational population design
- Non-biomedical / off-topic requests

> "This skill designs NHANES-style cross-sectional epidemiology + retrospective clinical validation computational research plans. Your request
> ([restatement]) involves [clinical/interventional/non-epidemiologic/off-topic scope] which is outside
> its scope. For interventional clinical study design, consult appropriate clinical trial and guideline resources."

---

## Sample Triggers

- "DFU + SIRI / SII / AISI. NHANES and retrospective validation. Standard and Advanced."
- "CKD prevalence and inflammatory biomarkers using NHANES. Public data only."
- "Diabetes complications + blood-cell-derived indices. Need RCS and subgroup analysis."
- "Hospital retrospective validation for NHANES biomarker findings, ROC only as secondary."

---

## Execution — 7 Steps (always run in order)

### Step 1 — Infer Study Type

Identify from user input:
- **Disease / complication / phenotype**
- **Biomarker family or index type** (CBC-derived inflammatory indices, nutritional ratios, metabolic biomarkers, etc.)
- **Primary goal**: prevalence association / dose-response characterization / subgroup stability / orthogonal retrospective validation / preliminary screening signal
- **User emphasis**: epidemiology-first vs validation-first vs publication-strength-first
- **Resource constraints**: NHANES only, no hospital cohort, small retrospective cohort, no weighted analysis, etc.

If detail is insufficient → infer a reasonable default and state assumptions explicitly.

### Step 2 — Select Study Pattern

Choose the best-fit pattern (or combine):

| Pattern | When to Use |
|---|---|
| **A. Cross-Sectional Association** | User starts from disease prevalence association in NHANES or similar survey data |
| **B. Dose-Response / RCS** | User wants to test whether the biomarker-outcome relationship is linear or nonlinear |
| **C. Subgroup-Stability** | User wants to know whether the biomarker association is stable across prespecified strata |
| **D. NHANES + Retrospective Validation** | User wants population-level association plus hospital-based validation |
| **E. Preliminary Screening-Performance** | User wants ROC / discrimination as a secondary, exploratory endpoint |

→ Detailed pattern logic: [references/study-patterns.md](references/study-patterns.md)

### Step 3 — Output Four Workload Configurations

Always output all four configs. For each: goal, required data, major modules, workload estimate, figure complexity, strengths, weaknesses.

| Config | Best For | Key Additions |
|---|---|---|
| **Lite** | 2–4 week execution, public data, preliminary observational outline | disease definition, biomarker formula, baseline table, crude + adjusted logistic model, one interpretation branch |
| **Standard** | Conventional NHANES biomarker paper | + tertiles/quantiles, RCS or subgroup branch, stronger adjusted models, explicit limitation control |
| **Advanced** | Competitive observational papers, stronger robustness | + RCS + subgroup + interaction review, sensitivity review, retrospective validation, weighted-analysis option |
| **Publication+** | High-ambition manuscripts | + stronger validation coherence, better matching logic, stricter caveats, integrated evidence map, stronger reviewer-facing sensitivity architecture |

→ Full config descriptions: [references/workload-configurations.md](references/workload-configurations.md)

**Default** (if user doesn't specify): recommend **Standard** as primary, **Lite** as minimum, **Advanced** as upgrade.

### Step 4 — Recommend One Primary Plan

State which config is best-fit. Explain why it matches the user's goal and resources, and why the other configs are less suitable for this specific case.

### Step 4.5 — Reference Literature Retrieval Layer (mandatory)

For the recommended plan, retrieve a **focused reference set** that supports study design decisions. This is a design-support literature module, not a narrative review.

Required rules:
- Search for references that support **disease burden, biomarker rationale, NHANES design logic, logistic regression / RCS / subgroup modules, and retrospective validation strategy**
- Prefer **recent reviews/method papers** for workflow justification and **original disease/biomarker studies** for biological plausibility
- Prioritize high-quality sources: PubMed-indexed articles, journal pages, DOI-backed records, PMC, Crossref metadata, publisher pages
- **Never fabricate citations**. Do not invent PMID, DOI, journal, year, authors, volume, pages, article titles, or URLs
- **Only output formal references that are directly verified** against a trustworthy source
- **Every formal reference must include at least one resolvable identifier or access path**: DOI, PMID, PMCID, PubMed link, PMC link, or official publisher/journal landing page
- If a candidate paper cannot be verified well enough to provide a real identifier or stable link, **do not list it as a formal reference**
- When reliable references for a needed module are not found, explicitly say **"no directly verified reference identified yet"** and describe the evidence gap
- If browsing/search is unavailable, say so explicitly and output a **search strategy + target evidence map** instead of fake references

Minimum retrieval targets for the recommended plan:
- 2–4 **disease / biomarker background** references
- 1–2 **observational epidemiology / logistic / RCS / subgroup / validation** references relevant to the selected workflow
- 1–2 **same-disease or closely related NHANES / observational biomarker precedents** when available
- 1 explicit evidence-gap note

→ Retrieval and output standard: [references/literature-retrieval-and-citation.md](references/literature-retrieval-and-citation.md)

### Step 5 — Dependency Consistency Check (mandatory before output)

Before generating any plan, perform an internal dependency consistency check:

- Does any step require data that was never declared earlier in that configuration?
- Does any ROC or threshold claim assume a validation cohort that is absent from the configuration?
- Does the Minimal Executable Version contain methods that belong only to Advanced / Publication+?
- Are all disease-definition and biomarker-formula rules declared before regression models are run?
- Are all subgroup / spline modules valid given sample structure and variable type?

**If the configuration is NHANES-only cross-sectional (no retrospective validation declared), the following are forbidden:**
- Hospital-based matched case–control replication
- Preliminary ROC / threshold optimization
- Clinical screening language suggesting external validation
- Stronger clinical translation claims based on local cohort replication

**Every endpoint-selection step must state its exact logic formula**, for example:
- disease definition + biomarker formula + adjusted logistic association
- disease definition + biomarker formula + adjusted logistic association + RCS
- disease definition + biomarker formula + adjusted logistic association + retrospective direction consistency
- disease definition + biomarker formula + adjusted logistic association + retrospective direction consistency + exploratory ROC

If any dependency inconsistency is found, revise the plan before outputting.

→ Full dependency rules: [references/workload-configurations.md](references/workload-configurations.md)

### Step 6 — Full Step-by-Step Workflow

For every step in the recommended plan, include all 8 fields.

→ 8-field template + module library: [references/workflow-step-template.md](references/workflow-step-template.md)
→ Analysis module descriptions: [references/analysis-modules.md](references/analysis-modules.md)
→ Tool and method options: [references/method-library.md](references/method-library.md)

Do not merely list tool names. Explain the logic of each decision.

### Step 7 — Mandatory Output Sections (A–I, all required)

**A. Core Scientific Question**
One-sentence question + 2–4 specific aims + why NHANES-style cross-sectional epidemiology plus retrospective validation is the right combination.

**B. Configuration Overview Table**
Compare all four configs: goal / data / modules / workload / figure complexity / strengths / weaknesses.

**C. Recommended Primary Plan**
Best-fit config with justification. Explain why this is the best match and why the other levels are less suitable.

**C.5. Dependency Map / Evidence Map**
For the recommended plan and the minimal executable plan, explicitly list:
- Which evidence layers are present (cross-sectional association, adjusted models, RCS, subgroup, retrospective validation, ROC, etc.)
- Which downstream steps depend on each evidence layer
- Which modules are absent and therefore **forbidden**

Example format:
- Present: disease definition, biomarker formula, adjusted logistic models, subgroup stability
- Absent: retrospective cohort, ROC validation, prospective follow-up
- Therefore forbidden: preliminary discrimination AUC, threshold optimization, predictive screening claims

**D. Step-by-Step Workflow**

Before listing any workflow steps, always output the following line exactly once whenever any dataset, cohort, database, registry, GWAS source, or public resource is mentioned in the workflow:

> **Dataset Disclaimer:** Any datasets mentioned below are provided for reference only. Final dataset selection should depend on the specific research question, data access, quality, and methodological fit.

Then provide the full workflow using the required stepwise format.

**E. Figure and Deliverable Plan**
→ [references/figure-deliverable-plan.md](references/figure-deliverable-plan.md)

**F. Validation and Robustness**
Explicitly separate **association-level** from **shape-characterization**, **orthogonal validation**, and **preliminary screening-performance** evidence. State what each validation step proves and what it does not prove. State what each validation step depends on — if the dependency is absent, that validation step cannot appear.
→ Evidence hierarchy: [references/validation-evidence-hierarchy.md](references/validation-evidence-hierarchy.md)

**G. Minimal Executable Version**
2–4 week plan: one disease, one NHANES-style cohort, one biomarker family, one adjusted association model, one optional tertile or descriptive extension, and no undeclared dependency-bearing modules. Must be a strict subset of the Lite plan unless explicitly labeled as an upgraded variant.

**H. Publication Upgrade Path**
Which modules to add beyond Standard, in priority order. Distinguish robustness upgrades from complexity-only additions. Label each newly added module as: newly introduced / why it is being added / what new evidence tier it enables.

**I. Reference Literature Pack**
Provide a structured design-support reference pack for the recommended plan. Use the exact categories below:
- **I1. Core background references** (disease burden + biomarker rationale)
- **I2. Method justification references** (NHANES / logistic / RCS / subgroup / retrospective validation methods actually used)
- **I3. Similar-study precedent references** (same disease / same biomarker logic / same observational pattern)
- **I4. Search strategy and evidence gaps**

For each reference item, include:
- citation status: verified only
- article type: original study / review / methods / resource paper
- why it is included in this study design
- one-line relevance note tied to a specific plan module

For each formal reference, include a **DOI, PMID, PMCID, or direct stable link**. If none can be verified, do not output the item as a formal reference.

If no reliable reference is found for a module, say **"no directly verified reference identified yet"** rather than filling the slot with a guessed citation.

**J. Self-Critical Risk Review**

Always include this section immediately after the reference literature part. It must contain all six of the following elements:

- **Strongest part** — what provides the most reliable evidence in this design?
- **Most assumption-dependent part** — what assumption, if wrong, weakens the study most?
- **Most likely false-positive source** — where spurious or inflated signal is most likely to enter?
- **Easiest-to-overinterpret result** — which finding needs the strongest language guardrail?
- **Likely reviewer criticisms** — what reviewers are most likely to challenge first?
- **Fallback plan if features collapse after validation** — what is the downgrade or alternative plan if the preferred signal, feature set, or validation path fails?

> ⚠ **Disclaimer**: This plan is for computational / observational research design only. It does not
> constitute clinical, medical, regulatory, or prescriptive advice. All biomarker and
> screening-performance claims require stronger prospective and/or external validation before application.

---

## Hard Rules

1. **Never output only one flat generic plan.** Always output Lite / Standard / Advanced / Publication+.
2. **Always recommend one primary plan** and justify the choice for this specific study.
3. **Always separate necessary modules from optional modules.**
4. **Always distinguish evidence tiers.** Never imply cross-sectional associations, ROC, or retrospective validation prove causality or prospective predictive value.
5. **Do not produce a literature review** unless directly needed to justify a design choice.
6. **Do not pretend all modules are equally necessary.**
7. **Optimize for epidemiologic logic and feasibility**, not for sounding sophisticated.
8. **No vague phrasing** like "you could also explore." Be explicit about what to do and why.
9. **If user gives insufficient detail**, infer a reasonable default and state assumptions clearly.
11. **Any literature output must use real, directly verified references only.** Never invent or auto-complete missing citation metadata.
12. **Every formal reference must include a DOI, PMID, PMCID, or a direct stable link**. If unavailable, do not promote the item to a formal citation.
13. **When references are unavailable or uncertain, output the search strategy and evidence gap explicitly.**
14. **STOP and redirect** on clinical trial protocols, dosing, regulatory submissions, or prescriptive medical conclusions.
15. **Section G Minimal Executable Version is mandatory** in every output.
16. **Never introduce retrospective-validation- or ROC-dependent steps** unless those resources and logic have already been explicitly declared in that same configuration.
17. **Section G must be a strict subset of the Lite plan** unless the output explicitly declares an upgraded minimal variant.
18. **Every endpoint-selection step must state its dependency formula explicitly** (e.g., disease definition + biomarker formula + adjusted logistic model). The skill must not switch from one formula to another silently.
19. **If Advanced or Publication+ introduces new evidence layers not present in Lite/Standard**, mark them as upgrade-only modules and do not back-propagate them into earlier sections.
20. **Section C.5 Dependency Map is mandatory** in every output for both the recommended plan and the minimal executable plan.
21. **Section I Reference Literature Pack is mandatory** in every output unless search/browsing is genuinely unavailable, in which case a transparent search strategy must be provided instead.
22. **If D. Step-by-Step Workflow mentions any dataset, cohort, registry, GWAS source, database, or public resource, the Dataset Disclaimer must appear immediately before the workflow steps. Do not omit it.**
23. **Section J. Self-Critical Risk Review is mandatory in every output. Do not omit any of its six required elements.**

Related Skills

clinical-reports

from aipoch/medical-research-skills

Write comprehensive clinical reports (case reports, diagnostic reports, clinical trial reports, and patient documentation) when accuracy, regulatory compliance (HIPAA/FDA/ICH-GCP), and template-driven validation are required.

clinicaltrials-gov-parser

from aipoch/medical-research-skills

Monitor and summarize competitor clinical trial status changes from ClinicalTrials.gov.

clinicaltrials-db

from aipoch/medical-research-skills

Query the ClinicalTrials.gov API v2 to search for clinical trials, retrieve detailed study protocols, and analyze recruitment status; use when you need to find trials by condition/drug, export results, or verify study details by NCT ID.

clinical-study-info-extractor

from aipoch/medical-research-skills

Batch extracts and verifies structured information (PMID, title, abstract, methodology, results, etc.) from clinical research literature using PMIDs. Use when the user wants to extract details from specific PMIDs.

preclinical-pkpd-analyst

from aipoch/medical-research-skills

Use preclinical pkpd analyst for data analysis workflows that need structured execution, explicit assumptions, and clear output boundaries.

outcome-extraction-for-clinical-trials

from aipoch/medical-research-skills

Clinical research outcome extraction for meta-analysis. Use when users need to extract outcome measures (binary, continuous, or survival data) from clinical research papers for systematic review and meta-analysis. Handles both database lookup by PMID and real-time LLM extraction.

clinical-data-cleaner

from aipoch/medical-research-skills

Use when cleaning clinical trial data, preparing data for FDA/EMA submission, standardizing SDTM datasets, handling missing values in clinical studies, detecting outliers in lab results, or converting raw CRF data to CDISC format. Cleans and standardizes clinical trial data for regulatory compliance with audit trails.

baseline-extraction-for-clinical-trials

from aipoch/medical-research-skills

Extracts clinical trial baseline data (study, region, participants, etc.) from article text or PMID. Checks PubMed for metadata; always falls back to LLM extraction for full details.

prognostic-biomarker-protocol-designer

from aipoch/medical-research-skills

Designs discovery, modeling, and validation workflows for prognostic biomarkers in biomedical and clinical research. Always use this skill when the user needs a prognostic biomarker study blueprint rather than a diagnostic test protocol, predictive biomarker design, treatment recommendation, or a completed manuscript. Focus on endpoint family, follow-up horizon, time scale, candidate marker strategy, model-building logic, risk stratification framework, and internal/external validation requirements. Do not invent cohort size, event rate, assay readiness, literature support, or validation access.

process-related-diagnostic-biomarker-nomogram

from aipoch/medical-research-skills

Generates complete process-related diagnostic biomarker bioinformatics research designs from a user-provided disease context, gene-family or pathway theme, and validation direction. Use when a study centers on process-related genes, DEG and WGCNA integration, machine-learning feature selection, nomogram-based diagnostic modeling, immune infiltration, regulatory-network analysis, and optional external or experimental validation. Covers five study patterns (process-DEG discovery, co-expression-module integration, machine-learning biomarker selection, diagnostic model/nomogram workflow, immune-regulatory interpretation and validation) and always outputs Lite / Standard / Advanced / Publication+ with a recommended primary plan, stepwise workflow, figure plan, validation hierarchy, minimal executable version, publication upgrade path, and strictly verified literature retrieval.

dual-disease-shared-transcriptome-biomarker

from aipoch/medical-research-skills

Generates complete dual-disease shared-transcriptome biomarker and hub-gene research designs from a user-provided disease pair and shared-biology direction. Always use this skill whenever a user wants to design, plan, or build a non-oncology two-disease transcriptome study centered on per-disease differential expression, shared-signal intersection or concordance, PPI-based hub-gene prioritization, diagnostic evaluation across both diseases, immune infiltration context, pathway interpretation, and optional orthogonal validation. Covers five study patterns (shared-DEG-first workflow, hub-gene-first shared-biomarker workflow, hybrid shared-biomarker compression workflow, immune-context shared-biomarker workflow, orthogonal validation workflow) and always outputs four workload configs (Lite / Standard / Advanced / Publication+) with recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, publication upgrade path...

cross-disease-shared-biomarker-network

from aipoch/medical-research-skills

Generates complete cross-disease shared-biomarker bioinformatics research designs from a user-provided disease pair and validation direction. Always use this skill whenever a user wants to design, plan, or build a multi-dataset study linking two related diseases through shared DEGs, enrichment, PPI hub genes, public validation, regulatory-network analysis, immune infiltration, drug-gene interaction screening, and optional qRT-PCR or cell-line validation. Covers five study patterns (shared-DEG discovery, hub-gene prioritization, regulatory-network interpretation, immune/drug follow-up, bioinformatics-plus-validation) and always outputs four workload configs (Lite / Standard / Advanced / Publication+) with recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, publication upgrade path, and a strictly verified reference literature retrieval layer with real references only.