bayesian-statistics-guide

Bayesian inference methods including prior selection, MCMC, and model comparison

191 stars

Best use case

bayesian-statistics-guide is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using bayesian-statistics-guide should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/bayesian-statistics-guide/SKILL.md --create-dirs "https://raw.githubusercontent.com/wentorai/research-plugins/main/skills/analysis/statistics/bayesian-statistics-guide/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/bayesian-statistics-guide/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How bayesian-statistics-guide Compares

Feature / Agent	bayesian-statistics-guide	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Bayesian inference methods including prior selection, MCMC, and model comparison

Where can I find the source code?

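The source is in the wentorai/research-plugins repository on GitHub, under skills/analysis/statistics/bayesian-statistics-guide/. The raw SKILL.md URL appears in the installation command above.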

SKILL.md Source

# Bayesian Statistics Guide

A skill for applying Bayesian statistical methods to research data analysis. Covers prior specification, Markov chain Monte Carlo (MCMC) sampling, posterior interpretation, model comparison, and reporting standards.

## Bayesian Framework Overview

### Bayes' Theorem in Practice

```
Posterior = (Likelihood x Prior) / Evidence

P(theta | data) = P(data | theta) * P(theta) / P(data)

In practice:
  P(theta | data) is proportional to P(data | theta) * P(theta)
  (the denominator is a normalizing constant)
```
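The proportionality above can be checked numerically with a grid approximation: multiply likelihood by prior on a grid of parameter values, then divide by the grid sum, which stands in for P(data). This is a minimal sketch assuming a hypothetical binomial experiment (7 successes in 10 trials) and a Beta(2, 2) prior; the function name `grid_posterior` is illustrative. Because this prior is conjugate, the result can be compared with the exact Beta(2 + 7, 2 + 3) posterior.

```python
import numpy as np
from scipy import stats

def grid_posterior(successes: int, trials: int, a: float = 2.0, b: float = 2.0,
                   n_grid: int = 2001):
    """Grid approximation of P(theta | data): binomial likelihood, Beta(a, b) prior."""
    theta = np.linspace(0.0, 1.0, n_grid)                   # candidate parameter values
    prior = stats.beta(a, b).pdf(theta)                      # P(theta)
    likelihood = stats.binom.pmf(successes, trials, theta)   # P(data | theta)
    unnormalized = likelihood * prior                        # proportional to P(theta | data)
    dtheta = theta[1] - theta[0]
    evidence = unnormalized.sum() * dtheta                   # numerical estimate of P(data)
    posterior = unnormalized / evidence                      # density integrating to ~1
    return theta, posterior

theta, post = grid_posterior(7, 10)
exact = stats.beta(2 + 7, 2 + 3).pdf(theta)   # conjugate posterior for comparison
print(np.max(np.abs(post - exact)))           # only a small discretization error remains
```

Grids only scale to one or two parameters, which is why the sections below rely on MCMC.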

### When to Use Bayesian Methods

| Scenario | Bayesian Advantage |
|----------|-------------------|
| Small sample sizes | Priors regularize estimates |
| Complex hierarchical models | Natural framework for multilevel data |
| Sequential data collection | Update beliefs as data arrives |
| Prior knowledge available | Formally incorporate existing evidence |
| Model comparison | Bayes factors and posterior model probabilities |
| Prediction | Full posterior predictive distributions |
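For the "Sequential data collection" row, the conjugate Beta-Binomial model makes updating explicit: each batch's posterior becomes the next batch's prior, and the end result matches a single update on the pooled data. A minimal sketch, assuming a Beta(1, 1) prior and hypothetical batches of (successes, failures); non-conjugate models would need refitting or an approximation instead.

```python
from scipy import stats

def update_beta(a: float, b: float, successes: int, failures: int) -> tuple[float, float]:
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta(a + successes, b + failures)."""
    return a + successes, b + failures

# Hypothetical data arriving in three batches of (successes, failures)
batches = [(3, 2), (5, 5), (8, 2)]

a, b = 1.0, 1.0  # Beta(1, 1) prior
for s, f in batches:
    a, b = update_beta(a, b, s, f)  # the previous posterior acts as the new prior

# A single update with all data pooled gives the same posterior
total_s = sum(s for s, _ in batches)
total_f = sum(f for _, f in batches)
a_all, b_all = update_beta(1.0, 1.0, total_s, total_f)

print((a, b) == (a_all, b_all))   # True
print(stats.beta(a, b).mean())    # posterior mean of the success probability
```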

## Prior Specification

### Types of Priors

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Common prior choices and when to use them
PRIORS = {
    'flat': {
        'dist': stats.uniform(loc=-100, scale=200),
        'description': 'Flat/Uniform(-100, 100): a proper stand-in for a flat prior',
        'recommendation': 'Avoid -- truly flat (improper) priors can lead to improper posteriors'
    },
    'weakly_informative': {
        'dist': stats.norm(loc=0, scale=2.5),
        'description': 'Weakly informative: Normal(0, 2.5)',
        'recommendation': 'Good default for coefficients on standardized predictors'
    },
    'informative': {
        'dist': stats.norm(loc=0.5, scale=0.2),
        'description': 'Informative: based on previous studies',
        'recommendation': 'Use when strong prior evidence exists'
    },
    'horseshoe': {
        'dist': stats.cauchy(loc=0, scale=1),
        'description': 'Cauchy(0, 1): horseshoe-like heavy tails, but no spike at zero',
        'recommendation': 'For sparse variable selection, prefer a true horseshoe prior'
    }
}

def visualize_priors(parameter_name: str, prior_type: str = 'weakly_informative'):
    """
    Plot the density of a common prior choice for a parameter and return its
    specification. Unknown prior types fall back to the weakly informative prior.
    """
    prior = PRIORS.get(prior_type, PRIORS['weakly_informative'])
    x = np.linspace(-10, 10, 1000)

    fig, ax = plt.subplots()
    ax.plot(x, prior['dist'].pdf(x))
    ax.set_xlabel(parameter_name)
    ax.set_ylabel('Prior density')
    ax.set_title(prior['description'])
    return prior

# Commonly used weakly informative defaults. Gelman et al. (2008) proposed
# Cauchy(0, 10) for the intercept and Cauchy(0, 2.5) for coefficients in
# logistic regression; the Normal versions below are a lighter-tailed variant.
# Intercept: Normal(0, 10)
# Coefficients: Normal(0, 2.5) on standardized predictors
# Standard deviation: Half-Cauchy(0, 2.5) or Exponential(1)
# Correlation: LKJ(2) for correlation matrices
```
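The practical difference between the weakly informative Normal(0, 2.5) and the heavy-tailed Cauchy(0, 1) shows up in how much prior mass each puts on large coefficients. A minimal sketch using SciPy's survival function; the threshold of 10 is an arbitrary illustration and `two_sided_tail` is a hypothetical helper name.

```python
from scipy import stats

def two_sided_tail(dist, threshold: float) -> float:
    """P(|X| > threshold) for a distribution symmetric about 0."""
    return 2.0 * dist.sf(threshold)

normal_prior = stats.norm(loc=0, scale=2.5)   # weakly informative default
cauchy_prior = stats.cauchy(loc=0, scale=1)   # heavy-tailed alternative

for name, dist in [('Normal(0, 2.5)', normal_prior), ('Cauchy(0, 1)', cauchy_prior)]:
    print(f"{name}: P(|beta| > 10) = {two_sided_tail(dist, 10.0):.2e}")
```

The Normal prior assigns roughly 6e-5 probability to |beta| > 10, while the Cauchy assigns about 0.06, so the Cauchy lets a strong signal dominate the prior much more easily.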

## MCMC with PyMC

### Linear Regression Example

```python
import pymc as pm
import arviz as az

def bayesian_regression(X, y, feature_names=None):
    """
    Fit a Bayesian linear regression model using PyMC.

    Args:
        X: Feature matrix (n_samples, n_features); standardize the columns so
           the Normal(0, 2.5) coefficient prior is on a sensible scale
        y: Response variable (n_samples,)
        feature_names: List of feature names, used to label the coefficients
    """
    n_features = X.shape[1]
    if feature_names is None:
        feature_names = [f'x{i}' for i in range(n_features)]

    with pm.Model(coords={'feature': feature_names}) as model:
        # Priors
        intercept = pm.Normal('intercept', mu=0, sigma=10)
        betas = pm.Normal('betas', mu=0, sigma=2.5, dims='feature')
        sigma = pm.HalfCauchy('sigma', beta=2.5)

        # Linear predictor
        mu = intercept + pm.math.dot(X, betas)

        # Likelihood
        pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y)

        # MCMC sampling (PyMC uses NUTS by default for continuous parameters)
        trace = pm.sample(
            draws=2000,
            tune=1000,
            chains=4,
            cores=4,
            target_accept=0.9,
            return_inferencedata=True,
            # Store pointwise log-likelihood so az.loo / az.compare can use it
            idata_kwargs={'log_likelihood': True}
        )

    return model, trace

# After fitting, analyze results:
# az.summary(trace, var_names=['intercept', 'betas', 'sigma'])
# az.plot_trace(trace)
# az.plot_forest(trace, var_names=['betas'])
```
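To see what `pm.sample` is doing conceptually, the sketch below implements random-walk Metropolis, a simpler MCMC algorithm than the NUTS sampler PyMC uses by default, so it is a swapped-in technique shown only for intuition. It assumes a hypothetical model: a normal mean `mu` with known data SD 1 and a Normal(0, 10) prior, which has a closed-form posterior to check against. All function names are illustrative.

```python
import numpy as np

def log_posterior(mu: float, y: np.ndarray, sigma: float = 1.0,
                  prior_mu: float = 0.0, prior_sd: float = 10.0) -> float:
    """Unnormalized log posterior (additive constants dropped):
    Normal(prior_mu, prior_sd) prior on mu plus a Normal likelihood with known sigma."""
    log_prior = -0.5 * ((mu - prior_mu) / prior_sd) ** 2
    log_lik = -0.5 * np.sum(((y - mu) / sigma) ** 2)
    return log_prior + log_lik

def metropolis(y: np.ndarray, n_draws: int = 5000, step: float = 0.5,
               init: float = 0.0, seed: int = 0) -> np.ndarray:
    """Random-walk Metropolis sampler for the single parameter mu."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_draws)
    current = init
    current_lp = log_posterior(current, y)
    for i in range(n_draws):
        proposal = current + step * rng.normal()   # symmetric random-walk proposal
        proposal_lp = log_posterior(proposal, y)
        # Accept with probability min(1, exp(proposal_lp - current_lp))
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        draws[i] = current                          # a rejection repeats the current value
    return draws

data_rng = np.random.default_rng(42)
y = data_rng.normal(loc=2.0, scale=1.0, size=50)   # simulated data
draws = metropolis(y)[1000:]                       # drop the first 1000 draws as warm-up

# Exact conjugate posterior for comparison (prior_mu = 0, prior_sd = 10, sigma = 1)
post_var = 1.0 / (1.0 / 10.0**2 + len(y) / 1.0**2)
post_mean = post_var * (0.0 / 10.0**2 + y.sum() / 1.0**2)
print(f"MCMC mean {draws.mean():.3f} vs exact {post_mean:.3f}")
```

NUTS replaces the blind random-walk proposal with gradient-guided trajectories, which is why it mixes far better in models with many correlated parameters.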

## Diagnostics

### MCMC Convergence Checks

```python
import arviz as az

def check_mcmc_diagnostics(trace) -> dict:
    """
    Check MCMC convergence diagnostics against common rule-of-thumb thresholds.
    Passing these checks does not guarantee convergence.
    """
    summary = az.summary(trace)

    diagnostics = {
        'r_hat': {
            'values': summary['r_hat'].to_dict(),
            'threshold': 1.01,
            'pass': bool((summary['r_hat'] < 1.01).all()),
            'interpretation': 'R-hat < 1.01 for every parameter is consistent with convergence'
        },
        'ess_bulk': {
            'min_value': summary['ess_bulk'].min(),
            'threshold': 400,
            'pass': bool((summary['ess_bulk'] > 400).all()),
            'interpretation': 'Bulk ESS > 400 supports reliable posterior means and medians'
        },
        'ess_tail': {
            'min_value': summary['ess_tail'].min(),
            'threshold': 400,
            'pass': bool((summary['ess_tail'] > 400).all()),
            'interpretation': 'Tail ESS > 400 supports reliable credible interval endpoints'
        }
    }

    # Overall assessment: every individual check must pass
    diagnostics['converged'] = all(
        d['pass'] for d in diagnostics.values() if 'pass' in d
    )

    return diagnostics
```
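R-hat compares variance between chains with variance within chains; if the chains have not mixed, the between-chain part inflates the ratio above 1. ArviZ's `az.summary` reports a rank-normalized split R-hat, so the sketch below, which computes the older, simpler split R-hat for one scalar parameter, can give slightly different numbers. The simulated chains are hypothetical.

```python
import numpy as np

def split_rhat(chains: np.ndarray) -> float:
    """
    Classic split R-hat for one scalar parameter.

    chains: array of shape (n_chains, n_draws). Each chain is split in half,
    then between-sequence (B) and within-sequence (W) variances are compared.
    """
    n_chains, n_draws = chains.shape
    half = n_draws // 2
    # Split each chain into two halves -> 2 * n_chains sequences of length `half`
    splits = np.concatenate([chains[:, :half], chains[:, half:2 * half]], axis=0)
    n = splits.shape[1]
    seq_means = splits.mean(axis=1)
    seq_vars = splits.var(axis=1, ddof=1)
    W = seq_vars.mean()                  # mean within-sequence variance
    B = n * seq_means.var(ddof=1)        # between-sequence variance
    var_plus = (n - 1) / n * W + B / n   # pooled estimate of the posterior variance
    return float(np.sqrt(var_plus / W))

rng = np.random.default_rng(1)
mixed = rng.normal(size=(4, 1000))                       # four chains on the same target
stuck = mixed + np.array([[0.0], [0.0], [0.0], [1.0]])   # one chain offset by 1
print(split_rhat(mixed), split_rhat(stuck))
```

The well-mixed chains give a value very close to 1, while the single offset chain pushes R-hat well past the 1.01 threshold.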

## Model Comparison

### Bayesian Model Selection

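```python
import arviz as az

def compare_models(traces: dict) -> dict:
    """
    Compare Bayesian models using PSIS-LOO-CV (pass ic='waic' to az.compare for WAIC).

    Args:
        traces: Dict mapping model names to InferenceData objects; each must
            contain a log_likelihood group (e.g. sampled in PyMC with
            idata_kwargs={'log_likelihood': True})
    """
    comparison = az.compare(traces, ic='loo')
    # Recent ArviZ versions name the ELPD column 'elpd_loo'; older ones used 'loo'
    elpd_col = 'elpd_loo' if 'elpd_loo' in comparison.columns else 'loo'

    return {
        'ranking': comparison.index.tolist(),  # ordered best to worst
        'loo_values': comparison[elpd_col].to_dict(),
        'weights': comparison['weight'].to_dict(),
        'interpretation': (
            f"Best model: {comparison.index[0]} "
            f"(weight = {comparison['weight'].iloc[0]:.2f})"
        )
    }
```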
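`az.compare` computes these criteria from each model's pointwise log-likelihood. As an illustration of the arithmetic, the sketch below computes WAIC directly from a log-likelihood matrix of shape (draws, observations); PSIS-LOO additionally needs Pareto-smoothed importance weights and is not reproduced here. The `waic` function and the simulated one-parameter normal model are hypothetical.

```python
import numpy as np
from scipy.special import logsumexp

def waic(log_lik: np.ndarray) -> dict:
    """
    WAIC from a pointwise log-likelihood matrix of shape (n_draws, n_observations).

    lppd      = sum_i log(mean_s p(y_i | theta_s))
    p_waic    = sum_i var_s(log p(y_i | theta_s))
    elpd_waic = lppd - p_waic   (higher is better; deviance scale is -2 * elpd_waic)
    """
    n_draws = log_lik.shape[0]
    # Log of the posterior-mean likelihood per observation, computed stably
    lppd_i = logsumexp(log_lik, axis=0) - np.log(n_draws)
    p_waic_i = log_lik.var(axis=0, ddof=1)   # effective-number-of-parameters penalty
    elpd_i = lppd_i - p_waic_i
    return {
        'elpd_waic': float(elpd_i.sum()),
        'p_waic': float(p_waic_i.sum()),
        'se': float(np.sqrt(len(elpd_i) * elpd_i.var(ddof=1))),
    }

rng = np.random.default_rng(0)
y = rng.normal(size=30)                                                    # simulated observations
mu_draws = rng.normal(loc=y.mean(), scale=1 / np.sqrt(30), size=(2000, 1))  # stand-in posterior draws
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws) ** 2    # Normal(mu, 1) log density
print(waic(log_lik))
```

For this one-parameter model, `p_waic` should come out close to 1.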

## Reporting Bayesian Results

Follow the WAMBS checklist (Depaoli & van de Schoot, 2017). At a minimum, cover these points:

1. **Priors**: Report all prior distributions and justify choices
2. **Convergence**: Report R-hat, ESS, and trace plots (in supplement)
3. **Posteriors**: Report posterior mean/median, 95% credible interval (HDI preferred)
4. **Sensitivity**: Show results are robust to reasonable prior changes
5. **Model fit**: Report LOO-IC, WAIC, or posterior predictive checks
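A prior sensitivity analysis (point 4) refits the model under several reasonable priors and checks whether conclusions change. In the conjugate normal-mean model this can be done in closed form, as in the minimal sketch below; with PyMC models the same idea means refitting with alternative prior scales. The data, the known SD of 1, and the grid of prior SDs are hypothetical.

```python
import numpy as np

def normal_mean_posterior(y: np.ndarray, sigma: float, prior_mean: float, prior_sd: float):
    """Conjugate posterior for a normal mean with known data SD `sigma` and a Normal prior."""
    precision = 1.0 / prior_sd**2 + len(y) / sigma**2
    post_sd = np.sqrt(1.0 / precision)
    post_mean = (prior_mean / prior_sd**2 + y.sum() / sigma**2) / precision
    return post_mean, post_sd

rng = np.random.default_rng(7)
y = rng.normal(loc=0.5, scale=1.0, size=20)   # hypothetical small study

# Refit under a range of reasonable priors and compare the posteriors
for prior_sd in [0.5, 1.0, 2.5, 10.0]:
    m, s = normal_mean_posterior(y, sigma=1.0, prior_mean=0.0, prior_sd=prior_sd)
    print(f"prior Normal(0, {prior_sd}): posterior mean {m:.3f}, sd {s:.3f}")
```

Tighter priors pull the estimate toward 0; if the substantive conclusion holds across the whole range, the result can be reported as robust to prior choice.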

Example results sentence: "The effect of treatment on outcome was estimated at beta = 0.45, 95% HDI [0.21, 0.68], with a posterior probability of 0.99 that the effect is positive."
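Both quantities in that sentence can be read directly off posterior draws: the HDI is the narrowest interval containing 95% of the draws, and the posterior probability of a positive effect is the share of draws above zero. A minimal sketch, assuming a unimodal posterior; ArviZ provides `az.hdi` for real use, and the draws here are simulated stand-ins chosen to resemble the example, not real results.

```python
import numpy as np

def hdi(draws: np.ndarray, prob: float = 0.95) -> tuple[float, float]:
    """Narrowest interval containing `prob` of the draws (assumes a unimodal posterior)."""
    sorted_draws = np.sort(draws)
    n = len(sorted_draws)
    n_in = int(np.floor(prob * n))                # index span of each candidate interval
    widths = sorted_draws[n_in:] - sorted_draws[: n - n_in]
    start = int(np.argmin(widths))                # the narrowest candidate wins
    return float(sorted_draws[start]), float(sorted_draws[start + n_in])

rng = np.random.default_rng(3)
beta_draws = rng.normal(loc=0.45, scale=0.12, size=8000)   # stand-in posterior draws
low, high = hdi(beta_draws)
p_positive = (beta_draws > 0).mean()                       # posterior P(beta > 0)
print(f"beta = {beta_draws.mean():.2f}, 95% HDI [{low:.2f}, {high:.2f}], "
      f"P(beta > 0) = {p_positive:.3f}")
```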

## References

- Gelman, A., et al. (2013). *Bayesian Data Analysis* (3rd ed.). CRC Press.
- McElreath, R. (2020). *Statistical Rethinking* (2nd ed.). CRC Press.