epidemiology-guide

Epidemiological study designs, measures of association, and public health ana...

191 stars

Best use case

epidemiology-guide is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Epidemiological study designs, measures of association, and public health ana...

Teams using epidemiology-guide should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/epidemiology-guide/SKILL.md --create-dirs "https://raw.githubusercontent.com/wentorai/research-plugins/main/skills/domains/biomedical/epidemiology-guide/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/epidemiology-guide/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How epidemiology-guide Compares

Feature / Agent	epidemiology-guide	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Epidemiological study designs, measures of association, and public health ana...

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Epidemiology Guide

A skill for designing and analyzing epidemiological studies. Covers study design selection, measures of disease frequency and association, bias assessment, and public health data analysis methods.

## Study Design Selection

### Design Hierarchy

```
                    Evidence Strength
                         |
    Systematic Review / Meta-Analysis   (Highest)
                         |
         Randomized Controlled Trial
                         |
              Cohort Study (Prospective)
                         |
            Case-Control Study
                         |
         Cross-Sectional Study
                         |
         Case Report / Case Series       (Lowest)
```

### When to Use Each Design

| Design | Research Question | Time | Cost | Bias Risk |
|--------|------------------|------|------|-----------|
| RCT | Does intervention X prevent outcome Y? | Years | Very high | Lowest |
| Prospective Cohort | Does exposure X increase risk of Y? | Years | High | Moderate |
| Retrospective Cohort | Historical exposure-outcome relationship? | Months | Moderate | Moderate-High |
| Case-Control | What exposures are associated with rare disease? | Months | Low | High |
| Cross-Sectional | What is the prevalence of X? | Weeks | Low | High |
| Ecological | Do population-level factors correlate with disease? | Weeks | Very low | Very high |

## Measures of Disease Frequency

```python
import numpy as np

def compute_measures(cases: int, population: int,
                      person_time: float = None,
                      period_years: float = 1.0) -> dict:
    """
    Compute basic epidemiological measures.

    Args:
        cases: Number of new cases (for incidence) or existing cases (for prevalence)
        population: Population at risk
        person_time: Person-years of follow-up (for incidence rate)
        period_years: Time period in years (for cumulative incidence)
    """
    measures = {}

    # Point prevalence
    measures['prevalence'] = {
        'value': cases / population,
        'per_1000': (cases / population) * 1000,
        'formula': 'cases / population at a point in time'
    }

    # Cumulative incidence (risk)
    measures['cumulative_incidence'] = {
        'value': cases / population,
        'per_1000': (cases / population) * 1000,
        'period_years': period_years,
        'formula': 'new cases / population at risk during time period'
    }

    # Incidence rate (if person-time available)
    if person_time:
        measures['incidence_rate'] = {
            'value': cases / person_time,
            'per_1000_py': (cases / person_time) * 1000,
            'formula': 'new cases / person-time at risk'
        }

    return measures
```

## Measures of Association

### Risk Ratio, Odds Ratio, and Attributable Risk

```python
def measures_of_association(a: int, b: int, c: int, d: int) -> dict:
    """
    Compute epidemiological measures of association from a 2x2 table.

                    Disease+    Disease-
    Exposed+          a           b        a+b
    Exposed-          c           d        c+d
                     a+c         b+d        N

    Args:
        a: Exposed with disease
        b: Exposed without disease
        c: Unexposed with disease
        d: Unexposed without disease
    """
    # Risk in exposed and unexposed
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)

    # Risk Ratio (Relative Risk)
    rr = risk_exposed / risk_unexposed
    ln_rr = np.log(rr)
    se_ln_rr = np.sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d))
    rr_ci = (np.exp(ln_rr - 1.96*se_ln_rr), np.exp(ln_rr + 1.96*se_ln_rr))

    # Odds Ratio
    or_val = (a * d) / (b * c)
    ln_or = np.log(or_val)
    se_ln_or = np.sqrt(1/a + 1/b + 1/c + 1/d)
    or_ci = (np.exp(ln_or - 1.96*se_ln_or), np.exp(ln_or + 1.96*se_ln_or))

    # Attributable Risk (Risk Difference)
    ar = risk_exposed - risk_unexposed
    se_ar = np.sqrt(risk_exposed*(1-risk_exposed)/(a+b) +
                     risk_unexposed*(1-risk_unexposed)/(c+d))
    ar_ci = (ar - 1.96*se_ar, ar + 1.96*se_ar)

    # Attributable Fraction in Exposed
    af_exposed = (rr - 1) / rr

    # Population Attributable Fraction
    prevalence_exposure = (a + b) / (a + b + c + d)
    paf = prevalence_exposure * (rr - 1) / (prevalence_exposure * (rr - 1) + 1)

    return {
        'risk_ratio': {'value': round(rr, 3), 'ci_95': tuple(round(x, 3) for x in rr_ci)},
        'odds_ratio': {'value': round(or_val, 3), 'ci_95': tuple(round(x, 3) for x in or_ci)},
        'risk_difference': {'value': round(ar, 4), 'ci_95': tuple(round(x, 4) for x in ar_ci)},
        'attributable_fraction_exposed': round(af_exposed, 3),
        'population_attributable_fraction': round(paf, 3),
        'number_needed_to_harm': round(1/ar, 1) if ar > 0 else None
    }

# Example: smoking and lung cancer
result = measures_of_association(a=80, b=920, c=10, d=990)
print(f"RR: {result['risk_ratio']['value']} ({result['risk_ratio']['ci_95']})")
print(f"OR: {result['odds_ratio']['value']} ({result['odds_ratio']['ci_95']})")
print(f"PAF: {result['population_attributable_fraction']}")
```

## Bias Assessment

### Types of Bias and Mitigation

| Bias Type | Description | Mitigation Strategy |
|-----------|------------|-------------------|
| Selection bias | Non-random sample selection | Random sampling, matching |
| Information bias | Measurement error in exposure/outcome | Validated instruments, blinding |
| Recall bias | Differential recall by disease status | Use records, not self-report |
| Confounding | Third variable affects both exposure and outcome | Stratification, regression, matching |
| Lead-time bias | Earlier detection misinterpreted as longer survival | Use mortality, not survival |
| Healthy worker effect | Workers are healthier than general population | Use employed comparison group |

### Confounding Assessment

```python
def assess_confounding(crude_rr: float, adjusted_rr: float,
                        threshold: float = 0.10) -> dict:
    """
    Assess whether a variable is a confounder.
    """
    pct_change = abs(crude_rr - adjusted_rr) / crude_rr * 100

    return {
        'crude_RR': crude_rr,
        'adjusted_RR': adjusted_rr,
        'percent_change': round(pct_change, 1),
        'is_confounder': pct_change > threshold * 100,
        'interpretation': (
            f"{'Confounder detected' if pct_change > threshold * 100 else 'Not a confounder'}: "
            f"adjusting changed the RR by {pct_change:.1f}% "
            f"(threshold: {threshold*100:.0f}%)"
        )
    }
```

## Survival Analysis

For time-to-event data, use Kaplan-Meier estimators for descriptive analysis, log-rank tests for group comparisons, and Cox proportional hazards regression for multivariable analysis. Always check the proportional hazards assumption using Schoenfeld residuals and report median survival times with 95% confidence intervals.

## Reporting Standards

Follow STROBE (observational studies), CONSORT (trials), or RECORD (routinely collected data) reporting guidelines. Report all measures with 95% confidence intervals. Present both crude and adjusted estimates to show the impact of confounding adjustment.