conducting-meta-analyses

Performs meta-analysis with heterogeneity assessment, forest plot generation, and GRADE evidence grading. Use when conducting meta-analyses, assessing heterogeneity, or grading evidence quality.

11 stars

byCaseMark

View on GitHub Installation ↓

Best use case

conducting-meta-analyses is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Performs meta-analysis with heterogeneity assessment, forest plot generation, and GRADE evidence grading. Use when conducting meta-analyses, assessing heterogeneity, or grading evidence quality.

Teams using conducting-meta-analyses should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/conducting-meta-analyses/SKILL.md --create-dirs "https://raw.githubusercontent.com/CaseMark/skills/main/skills/med/conducting-meta-analyses/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/conducting-meta-analyses/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How conducting-meta-analyses Compares

Feature / Agent	conducting-meta-analyses	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Performs meta-analysis with heterogeneity assessment, forest plot generation, and GRADE evidence grading. Use when conducting meta-analyses, assessing heterogeneity, or grading evidence quality.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Conducting Meta-Analyses

## Why This Skill Exists

Meta-analysis quantitatively synthesizes results across multiple studies to produce pooled effect estimates with greater precision than any individual trial. It is the statistical engine behind systematic reviews, health technology assessments, and regulatory benefit-risk evaluations. Poorly conducted meta-analyses — using inappropriate pooling models, ignoring heterogeneity, or failing to assess publication bias — produce misleading conclusions that can harm patients and distort policy. This skill implements Cochrane Handbook methodology, PRISMA reporting standards, and GRADE evidence assessment for defensible quantitative synthesis.

---

## Checkpoint A — Intake and Scoping

### Required Intake Questions
1. Is this meta-analysis part of a registered systematic review (PROSPERO ID)?
2. What is the clinical question in PICOS format?
3. What is the primary effect measure (risk ratio, odds ratio, hazard ratio, mean difference, standardized mean difference)?
4. Are individual participant data (IPD) available, or is this an aggregate-data meta-analysis?
5. How many studies are expected to be included (impacts model choice and publication-bias assessment)?
6. Are subgroup analyses or meta-regression pre-specified?
7. Is a network meta-analysis (NMA) required for indirect comparisons?
8. What software will be used (R metafor/meta, Stata metan, RevMan, CMA)?
9. Is a GRADE Summary of Findings table required?
10. What is the intended audience (regulatory, HTA, journal publication, clinical guideline)?

### Required Source Documents
- Completed data extraction forms from the systematic review
- Risk-of-bias assessments for all included studies
- Pre-specified analysis plan (from the systematic review protocol)
- Existing meta-analyses on the same topic (for comparison and gap analysis)

---

## Step 1 — Select the Effect Measure

Choose the appropriate summary statistic based on outcome type:

### Binary Outcomes
- **Risk Ratio (RR)**: Preferred when baseline risk is stable across studies; interpretable as relative risk
- **Odds Ratio (OR)**: Required for case-control studies; approximates RR when event rates are low (<10%)
- **Risk Difference (RD)**: Absolute measure; useful for NNT calculation; but often heterogeneous across baseline risks

### Continuous Outcomes
- **Mean Difference (MD)**: When all studies use the same measurement scale
- **Standardized Mean Difference (SMD)**: When studies use different scales measuring the same construct (Cohen's d or Hedges' g with small-sample correction)

### Time-to-Event Outcomes
- **Hazard Ratio (HR)**: Preferred; requires either reported HRs or IPD reconstruction from KM curves using validated methods (Tierney et al., BMJ 2007)

### Count/Rate Data
- **Rate Ratio**: Events per person-time; use when follow-up durations differ substantially

Document the rationale for effect-measure selection in the methods section.

---

## Step 2 — Choose the Statistical Model

### Fixed-Effect Model
- Assumes all studies estimate the same true effect (common-effect assumption)
- Appropriate only when studies are clinically and methodologically homogeneous
- Weights studies by inverse variance (larger studies contribute more)
- Methods: Mantel-Haenszel (binary), inverse variance (continuous), Peto (rare events with balanced arms)

### Random-Effects Model
- Assumes study-level true effects vary around a mean (distributional assumption)
- Appropriate when heterogeneity is expected or observed
- Incorporates between-study variance (tau²) into weights
- Methods: DerSimonian-Laird (simple but biased tau² estimate), REML (restricted maximum likelihood — preferred), Paule-Mandel, Knapp-Hartung adjustment for CI calculation (recommended when <20 studies)

### Model Selection Decision
- Default to random-effects in most clinical contexts (heterogeneity is nearly always present)
- Present fixed-effect as sensitivity analysis
- When tau² estimate is zero, random-effects reduces to fixed-effect

---

## Step 3 — Assess and Quantify Heterogeneity

Heterogeneity assessment is mandatory, not optional:

1. **Cochran's Q test**: Chi-squared test for heterogeneity; low power with few studies (p < 0.10 threshold)
2. **I² statistic**: Proportion of variability due to between-study differences; 0-40% = low, 30-60% = moderate, 50-90% = substantial, 75-100% = considerable (Cochrane thresholds are overlapping deliberately)
3. **Tau² (τ²)**: Absolute between-study variance; more informative than I² for clinical interpretation
4. **Prediction interval**: Range within which the true effect in a new study would likely fall; far more informative than the confidence interval of the pooled estimate for clinical decision-making
5. **Visual inspection**: Examine forest plot for non-overlapping CIs, outliers, and directional inconsistency

When substantial heterogeneity exists, do not simply report the pooled estimate — explore and explain it.

---

## Step 4 — Explore Sources of Heterogeneity

When I² > 40% or clinical heterogeneity is suspected:

### Subgroup Analysis
- Pre-specified clinical or methodological moderators (dose, population, risk of bias, study design)
- Formal test for subgroup differences (interaction test, not separate subgroup p-values)
- Minimum 2-3 studies per subgroup for meaningful analysis

### Meta-Regression
- Requires ≥10 studies (rule of thumb: 10 studies per covariate)
- Random-effects meta-regression with Knapp-Hartung modification
- Report R² (proportion of between-study variance explained)
- Caution: ecological fallacy — study-level associations do not imply patient-level relationships

### Sensitivity Analyses
- Leave-one-out analysis (recompute removing each study sequentially)
- Restrict to low risk-of-bias studies
- Restrict to studies with adequate allocation concealment
- Compare fixed vs. random effects
- Compare effect measures (RR vs. OR)

---

## Step 5 — Assess Publication Bias

Required when ≥10 studies are included in the meta-analysis:

1. **Funnel plot**: Scatter plot of effect estimates vs. standard error; asymmetry suggests bias
2. **Egger's test**: Regression test for funnel-plot asymmetry (continuous outcomes)
3. **Peters' test**: Alternative for binary outcomes (Egger's test is biased for ORs)
4. **Trim-and-fill**: Imputes missing studies and recalculates the pooled estimate (provides sensitivity estimate)
5. **Contour-enhanced funnel plot**: Adds significance contours to distinguish publication bias from other asymmetry causes
6. **Selection models** (Copas, Vevea-Hedges): More sophisticated modeling of publication-selection process

Report the results of bias assessment even if no evidence of bias is found.

---

## Step 6 — Generate Forest Plots and Visualizations

Produce publication-quality figures:

### Forest Plot Requirements
- Individual study estimates with 95% CIs as horizontal lines/boxes
- Box size proportional to study weight
- Diamond for pooled estimate (width = 95% CI)
- Vertical line at null effect (RR=1 or MD=0)
- Numeric columns: study author/year, n per arm, effect estimate, CI, weight
- I² and Q-test results displayed below the plot
- Prediction interval overlaid on the diamond (when using random effects)

### Additional Plots
- Funnel plot (with or without contour enhancement)
- L'Abbe plot (for binary outcomes — event rates in treatment vs. control)
- Galbraith/radial plot (to identify outliers)
- Cumulative meta-analysis (chronological accumulation of evidence)
- Influence plot (leave-one-out results)

---

## Step 7 — Apply GRADE and Generate Summary of Findings

Rate the certainty of evidence for each outcome using GRADE:

| Domain | Downgrade Criteria |
|--------|--------------------|
| Risk of bias | Majority of evidence at high/unclear risk in key domains |
| Inconsistency | I² >60%, unexplained; prediction interval crosses null |
| Indirectness | Population, intervention, or outcome differs from target question |
| Imprecision | CI crosses clinical decision threshold; OIS not met |
| Publication bias | Significant asymmetry; missing studies suspected |

Upgrade criteria (for observational studies): large magnitude of effect, dose-response gradient, all plausible confounders would reduce effect.

Final rating: High, Moderate, Low, or Very Low certainty.

Present in a GRADE Summary of Findings (SoF) table with: outcome, number of studies/participants, effect estimate (95% CI), certainty rating, and plain-language interpretation.

---

## Checkpoint B — Analysis Review

1. [ ] Effect measure is appropriate for the outcome type and study designs
2. [ ] Statistical model (fixed/random) is justified and sensitivity analysis includes the alternative
3. [ ] Heterogeneity is quantified (I², tau², prediction interval) and explored
4. [ ] Subgroup analyses and meta-regression are pre-specified in the protocol
5. [ ] Publication-bias assessment is performed (for ≥10 studies)
6. [ ] Forest plots include all required elements (weights, CIs, pooled estimate, heterogeneity statistics)
7. [ ] GRADE assessment is completed for all critical outcomes
8. [ ] Sensitivity analyses are performed and reported
9. [ ] Results are interpreted in context of heterogeneity and certainty
10. [ ] Software, packages, and version numbers are documented

---

## Quality Audit

- [ ] Data extraction values entering the meta-analysis match the systematic review extraction forms
- [ ] Effect direction is consistent across all studies (verify coding of events/non-events)
- [ ] Zero-event studies are handled appropriately (continuity correction or exact methods, not excluded)
- [ ] Small-study effects are explored beyond simple funnel plots
- [ ] Prediction intervals are reported alongside confidence intervals for random-effects models
- [ ] Network meta-analysis (if conducted) assesses transitivity and coherence
- [ ] All analyses are reproducible from documented code/commands
- [ ] All [VERIFY] flags have been resolved or escalated

---

## Guidelines

1. Never pool clinically heterogeneous studies just because statistical heterogeneity is low — clinical judgment precedes statistical combination
2. Report prediction intervals for all random-effects meta-analyses — the pooled CI alone understates uncertainty
3. I² is not a measure of the amount of heterogeneity; it is the proportion of variability due to heterogeneity — interpret in context of tau²
4. Do not use fixed-effect models solely to obtain a smaller p-value when random-effects is more appropriate
5. Zero-event studies carry information and should not be silently excluded — use Peto method or exact methods
6. For rare events (<1% event rate), standard methods (Mantel-Haenszel, DerSimonian-Laird) may be unreliable — use exact methods or beta-binomial models
7. Subgroup analyses are hypothesis-generating unless pre-specified and powered — do not overinterpret
8. Cite the specific software, package, and version used (e.g., R 4.3, metafor 4.4-0)
9. Escalate to methodologist when I² >75%, when zero-event handling is complex, or when network meta-analysis shows incoherence
10. This skill produces statistical analyses — clinical interpretation of pooled effects requires domain-expert collaboration

Related Skills

conducting-surgical-time-outs

from CaseMark/skills

Structures WHO surgical safety checklist completion with sign-in, time-out, and sign-out documentation. Use when performing surgical time-outs, completing safety checklists, or documenting pre-incision verification.

conducting-stress-test-interpretation

from CaseMark/skills

Interprets exercise and pharmacologic stress tests with Duke treadmill score and nuclear findings. Use when reading stress tests, interpreting nuclear perfusion, or documenting exercise tolerance.

conducting-stark-law-analysis

from CaseMark/skills

Evaluates physician self-referral arrangements against Stark Law exceptions with documentation. Use when analyzing Stark compliance, evaluating referral arrangements, or documenting exception applicability.

conducting-psychiatric-evaluations

from CaseMark/skills

Structures comprehensive psychiatric evaluation with MSE, diagnostic formulation, and risk assessment. Use when performing psychiatric assessments, documenting mental status exams, or creating diagnostic formulations.

conducting-program-evaluation-public-health

from CaseMark/skills

Structures program evaluation using CDC framework with process, outcome, and impact assessment. Use when evaluating public health programs, measuring program effectiveness, or conducting logic model analysis.

conducting-preoperative-planning

from CaseMark/skills

Structures surgical planning with imaging review, risk stratification, and equipment/team requirements. Use when planning surgeries, reviewing preoperative imaging, or coordinating surgical teams.

conducting-pre-operative-evaluations

from CaseMark/skills

Structures pre-surgical risk assessment using ACC/AHA guidelines with cardiac and pulmonary clearance. Use when performing preop evaluations, assessing surgical risk, or providing medical clearance.

conducting-nursing-assessments

from CaseMark/skills

Structures head-to-toe nursing assessments with system-by-system documentation and abnormal findings. Use when performing nursing assessments, documenting patient evaluations, or creating assessment narratives.

conducting-mortality-reviews

from CaseMark/skills

Structures mortality case reviews with root cause analysis and system improvement recommendations. Use when conducting M&M reviews, analyzing adverse outcomes, or documenting mortality cases.

conducting-morbidity-mortality-reviews

from CaseMark/skills

Structures surgical M&M conference presentations with case analysis and system improvement recommendations. Use when presenting M&M cases, analyzing surgical outcomes, or documenting quality improvement.

conducting-literature-reviews-systematic

from CaseMark/skills

Performs systematic literature review following PRISMA guidelines with search strategy documentation. Use when conducting systematic reviews, documenting search strategies, or performing PRISMA analyses.

conducting-health-impact-assessments

from CaseMark/skills

Structures health impact assessment with exposure evaluation and risk characterization. Use when assessing health impacts, evaluating environmental exposures, or characterizing population risk.