ab-test-calculator
Calculate statistical significance for A/B tests. Sample size estimation, power analysis, and conversion rate comparisons with confidence intervals.
Best use case
ab-test-calculator is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using ab-test-calculator should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/ab-test-calculator/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How ab-test-calculator Compares
| Feature / Agent | ab-test-calculator | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Calculate statistical significance for A/B tests. Sample size estimation, power analysis, and conversion rate comparisons with confidence intervals.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# A/B Test Calculator
Statistical significance testing for A/B experiments with power analysis and sample size estimation.
## Features
- **Significance Testing**: Chi-square, Z-test, T-test for conversions
- **Sample Size Estimation**: Calculate required samples for desired power
- **Power Analysis**: Determine test power given sample size
- **Confidence Intervals**: Calculate CIs for conversion rates
- **Multiple Variants**: Support A/B/n testing
- **Bayesian Analysis**: Probability to beat baseline
## Quick Start
```python
from ab_test_calc import ABTestCalculator
calc = ABTestCalculator()
# Test significance
result = calc.test_significance(
    control_visitors=10000,
    control_conversions=500,
    variant_visitors=10000,
    variant_conversions=550
)
print(f"Significant: {result['significant']}")
print(f"P-value: {result['p_value']:.4f}")
print(f"Lift: {result['lift']:.2%}")
```
## CLI Usage
```bash
# Test significance
python ab_test_calc.py --test 10000 500 10000 550
# Calculate sample size
python ab_test_calc.py --sample-size --baseline 0.05 --mde 0.10 --power 0.8
# Power analysis
python ab_test_calc.py --power-analysis --baseline 0.05 --mde 0.10 --samples 5000
# Bayesian analysis
python ab_test_calc.py --bayesian 10000 500 10000 550
# Multiple variants
python ab_test_calc.py --test-multi 10000 500 10000 550 10000 520
```
## API Reference
### ABTestCalculator Class
```python
class ABTestCalculator:
    """Signature overview only; method bodies are omitted."""

    def __init__(self, alpha: float = 0.05): ...

    # Significance testing
    def test_significance(self, control_visitors: int, control_conversions: int,
                          variant_visitors: int, variant_conversions: int,
                          test: str = "chi_square") -> dict: ...

    # Sample size calculation
    def calculate_sample_size(self, baseline_rate: float,
                              minimum_detectable_effect: float,
                              power: float = 0.8,
                              alpha: float = 0.05) -> dict: ...

    # Power analysis
    def calculate_power(self, baseline_rate: float,
                        minimum_detectable_effect: float,
                        sample_size: int,
                        alpha: float = 0.05) -> dict: ...

    # Confidence interval
    def confidence_interval(self, visitors: int, conversions: int,
                            confidence: float = 0.95) -> dict: ...

    # Bayesian analysis
    def bayesian_analysis(self, control_visitors: int, control_conversions: int,
                          variant_visitors: int, variant_conversions: int,
                          simulations: int = 100000) -> dict: ...

    # Multiple variants
    def test_multiple_variants(self, control: tuple, variants: list,
                               correction: str = "bonferroni") -> dict: ...

    # Duration estimation
    def estimate_duration(self, daily_visitors: int, baseline_rate: float,
                          minimum_detectable_effect: float,
                          power: float = 0.8) -> dict: ...
```
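The reference lists `confidence_interval` without a usage example. As a rough illustration of what such a method might compute, here is a minimal standard-library sketch of a Wilson score interval. The helper name `wilson_interval` and its return keys are hypothetical, and the source does not say which interval method the skill actually uses.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(visitors: int, conversions: int, confidence: float = 0.95) -> dict:
    """Wilson score interval for a conversion rate (hypothetical helper, not the skill's API)."""
    if visitors <= 0 or not 0 <= conversions <= visitors:
        raise ValueError("need visitors > 0 and 0 <= conversions <= visitors")
    rate = conversions / visitors
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    denom = 1 + z**2 / visitors
    center = (rate + z**2 / (2 * visitors)) / denom
    half_width = z * sqrt(rate * (1 - rate) / visitors + z**2 / (4 * visitors**2)) / denom
    return {"rate": rate, "lower": center - half_width, "upper": center + half_width}


ci = wilson_interval(visitors=10000, conversions=500)
print(f"{ci['rate']:.2%} (95% CI {ci['lower']:.2%} to {ci['upper']:.2%})")
```

The Wilson interval is chosen here because it behaves better than the plain normal approximation when rates are close to 0 or 1; other interval methods would be equally valid choices.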
## Test Methods
### Chi-Square Test (Default)
Best for comparing conversion rates between groups.
```python
result = calc.test_significance(
    control_visitors=10000,
    control_conversions=500,
    variant_visitors=10000,
    variant_conversions=550,
    test="chi_square"
)
```
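To make the calculation behind a chi-square test on conversion counts concrete, the sketch below computes a Pearson chi-square statistic for a 2x2 table with the standard library only. It assumes no continuity correction, which may differ from what the skill does internally; the function name `chi_square_2x2` is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(control_visitors, control_conversions,
                   variant_visitors, variant_conversions):
    """Pearson chi-square test on a 2x2 table (assumption: no continuity correction)."""
    observed = [
        [control_conversions, control_visitors - control_conversions],
        [variant_conversions, variant_visitors - variant_conversions],
    ]
    row_totals = [control_visitors, variant_visitors]
    col_totals = [observed[0][0] + observed[1][0], observed[0][1] + observed[1][1]]
    total = control_visitors + variant_visitors
    if 0 in col_totals or 0 in row_totals:
        raise ValueError("test is undefined when a row or column total is zero")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = chi_square_2x2(10000, 500, 10000, 550)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```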
### Z-Test for Proportions
Suitable when both groups are large enough for the normal approximation to the binomial to hold.
```python
result = calc.test_significance(
    control_visitors=10000,
    control_conversions=500,
    variant_visitors=10000,
    variant_conversions=550,
    test="z_test"
)
```
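For reference, a two-proportion z-test with a pooled standard error can be sketched as follows. This is an illustration of the general technique, not the skill's own code; `two_proportion_z_test` is a hypothetical name and the two-sided alternative is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(control_visitors, control_conversions,
                          variant_visitors, variant_conversions):
    """Two-sided z-test for a difference in conversion rates (pooled standard error)."""
    p_control = control_conversions / control_visitors
    p_variant = variant_conversions / variant_visitors
    # Pooled rate under the null hypothesis that both groups share one rate
    pooled = (control_conversions + variant_conversions) / (control_visitors + variant_visitors)
    se = sqrt(pooled * (1 - pooled) * (1 / control_visitors + 1 / variant_visitors))
    z = (p_variant - p_control) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = two_proportion_z_test(10000, 500, 10000, 550)
print(f"z = {z:.3f}, p = {p_value:.3f}")
```

For a 2x2 table, the square of this z statistic equals the Pearson chi-square statistic without continuity correction, so the two tests give the same two-sided p-value.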
## Sample Size Estimation
Calculate the number of visitors needed per variant:
```python
result = calc.calculate_sample_size(
    baseline_rate=0.05,              # Current conversion rate (5%)
    minimum_detectable_effect=0.10,  # 10% relative improvement
    power=0.8,                       # 80% power
    alpha=0.05                       # 5% significance level
)
# Returns:
{
    "sample_size_per_variant": 31234,
    "total_sample_size": 62468,
    "baseline_rate": 0.05,
    "expected_variant_rate": 0.055,
    "minimum_detectable_effect": 0.10,
    "power": 0.8,
    "alpha": 0.05
}
```
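The figure above is consistent with the classic two-sided sample size formula for comparing two proportions. The sketch below is one way to write that formula with the standard library; it assumes the minimum detectable effect is a relative lift and that both variants get equal traffic. The function name is hypothetical, and the skill may use a different approximation internally.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, minimum_detectable_effect,
                            power=0.8, alpha=0.05):
    """Visitors needed per variant for a two-sided two-proportion test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + minimum_detectable_effect)  # assumption: MDE is relative
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


n = sample_size_per_variant(0.05, 0.10, power=0.8, alpha=0.05)
print(n, 2 * n)  # per-variant and total sample size
```

Because the required sample grows with the inverse square of the absolute difference, halving the detectable effect roughly quadruples the number of visitors needed.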
## Power Analysis
Calculate the probability of detecting an effect:
```python
result = calc.calculate_power(
    baseline_rate=0.05,
    minimum_detectable_effect=0.10,
    sample_size=25000,
    alpha=0.05
)
# Returns:
{
    "power": 0.72,
    "interpretation": "72% chance of detecting the effect if it exists"
}
```
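Power can be approximated by inverting the sample size formula. The following sketch uses a normal approximation with a two-sided test and ignores the negligible chance of rejecting in the wrong direction. The function name is hypothetical and the relative-MDE convention is an assumption; with the inputs above this particular approximation yields roughly 0.71, so small differences from other implementations are expected.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(baseline_rate, minimum_detectable_effect,
                      sample_size, alpha=0.05):
    """Approximate power of a two-sided two-proportion test (sample_size is per variant)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + minimum_detectable_effect)  # assumption: MDE is relative
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error under the null (pooled) and under the alternative (unpooled)
    se_null = sqrt(2 * p_bar * (1 - p_bar) / sample_size)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / sample_size)
    return NormalDist().cdf((abs(p2 - p1) - z_alpha * se_null) / se_alt)


print(f"{approximate_power(0.05, 0.10, 25000):.2f}")
```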
## Bayesian Analysis
Get probability that variant beats control:
```python
result = calc.bayesian_analysis(
    control_visitors=10000,
    control_conversions=500,
    variant_visitors=10000,
    variant_conversions=550
)
# Returns (illustrative values; simulation output varies slightly from run to run):
{
    "prob_variant_better": 0.9523,
    "prob_control_better": 0.0477,
    "expected_lift": 0.098,
    "credible_interval_95": [0.02, 0.18]
}
```
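A common way to estimate the probability that the variant beats the control is to sample from Beta posteriors for each conversion rate. The sketch below assumes uniform Beta(1, 1) priors, which the source does not specify, and uses a hypothetical function name; it illustrates the technique and is not the skill's implementation.

```python
import random


def prob_variant_better(control_visitors, control_conversions,
                        variant_visitors, variant_conversions,
                        simulations=20000, seed=0):
    """Monte Carlo estimate of P(variant rate > control rate) under Beta(1, 1) priors."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    wins = 0
    for _ in range(simulations):
        # Posterior for each rate is Beta(1 + conversions, 1 + non-conversions)
        control_rate = rng.betavariate(1 + control_conversions,
                                       1 + control_visitors - control_conversions)
        variant_rate = rng.betavariate(1 + variant_conversions,
                                       1 + variant_visitors - variant_conversions)
        if variant_rate > control_rate:
            wins += 1
    return wins / simulations


print(f"{prob_variant_better(10000, 500, 10000, 550):.3f}")
```

More simulations reduce the Monte Carlo noise in the estimate at the cost of run time.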
## Multiple Variant Testing
Test multiple variants with correction for multiple comparisons:
```python
result = calc.test_multiple_variants(
    control=(10000, 500),  # (visitors, conversions)
    variants=[
        (10000, 550),  # Variant A
        (10000, 520),  # Variant B
        (10000, 480)   # Variant C
    ],
    correction="bonferroni"  # or "holm", "none"
)
# Returns a dict shaped like the one below. The p-value and significance flag are
# illustrative only: a two-proportion test of 500/10000 vs 550/10000 gives
# p ≈ 0.11 before any correction, so real output for these inputs would differ.
{
    "control": {"visitors": 10000, "conversions": 500, "rate": 0.05},
    "variants": [
        {"visitors": 10000, "conversions": 550, "rate": 0.055,
         "lift": 0.10, "p_value": 0.012, "significant": True},
        ...
    ],
    "winner": "Variant A",
    "correction_method": "bonferroni"
}
```
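The difference between the correction options can be shown with a short sketch. It takes a list of per-variant p-values and returns which ones remain significant under each method. The function name and return shape are hypothetical; only the option names `"bonferroni"`, `"holm"`, and `"none"` come from the example above.

```python
def significant_after_correction(p_values, alpha=0.05, correction="bonferroni"):
    """Return one significance flag per p-value under the chosen correction."""
    m = len(p_values)
    if correction == "none":
        return [p <= alpha for p in p_values]
    if correction == "bonferroni":
        # Every comparison is held to the same stricter threshold alpha / m
        return [p <= alpha / m for p in p_values]
    if correction == "holm":
        # Step-down procedure: test the smallest p-value at alpha / m, the next
        # at alpha / (m - 1), and so on, stopping at the first failure
        order = sorted(range(m), key=lambda i: p_values[i])
        flags = [False] * m
        for rank, idx in enumerate(order):
            if p_values[idx] <= alpha / (m - rank):
                flags[idx] = True
            else:
                break
        return flags
    raise ValueError(f"unknown correction: {correction!r}")


p_values = [0.01, 0.02, 0.04]
for method in ("none", "bonferroni", "holm"):
    print(method, significant_after_correction(p_values, correction=method))
```

Holm controls the same family-wise error rate as Bonferroni but never rejects fewer hypotheses, which is why it is often preferred when there are several variants.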
## Output Format
### Significance Test Result
```python
{
    "significant": True,
    "p_value": 0.0234,
    "control_rate": 0.05,
    "variant_rate": 0.055,
    "lift": 0.10,
    "lift_absolute": 0.005,
    "confidence_interval": {
        "lower": 0.02,
        "upper": 0.18
    },
    "test_method": "chi_square",
    "alpha": 0.05,
    "recommendation": "Variant shows significant improvement"
}
```
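The two lift fields in this result appear to be the relative and absolute change in conversion rate, which matches the values shown (0.005 / 0.05 = 0.10). A minimal sketch under that assumption, with a hypothetical helper name:

```python
def lift_summary(control_rate: float, variant_rate: float) -> dict:
    """Relative and absolute lift of the variant over the control."""
    if control_rate <= 0:
        raise ValueError("relative lift is undefined for a zero control rate")
    absolute = variant_rate - control_rate  # difference in conversion rates
    relative = absolute / control_rate      # change as a fraction of the control rate
    return {"lift": relative, "lift_absolute": absolute}


summary = lift_summary(control_rate=0.05, variant_rate=0.055)
print(f"Lift: {summary['lift']:.2%}, absolute: {summary['lift_absolute']:.3%}")
```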
## Example Workflows
### Pre-Test Planning
```python
calc = ABTestCalculator()
# 1. Estimate required sample size
sample = calc.calculate_sample_size(
    baseline_rate=0.03,              # Current 3% conversion
    minimum_detectable_effect=0.15,  # Want to detect 15% lift
    power=0.8
)
print(f"Need {sample['sample_size_per_variant']} visitors per variant")
# 2. Estimate test duration
duration = calc.estimate_duration(
    daily_visitors=5000,
    baseline_rate=0.03,
    minimum_detectable_effect=0.15
)
print(f"Test will take ~{duration['days']} days")
```
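A duration estimate of this kind can be derived by dividing the total required sample by daily traffic. The sketch below assumes that `daily_visitors` is the total traffic entering the experiment, that it is split evenly across two variants, and that the minimum detectable effect is relative. Both function names are hypothetical and the skill's own calculation may differ.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, minimum_detectable_effect,
                            power=0.8, alpha=0.05):
    """Per-variant sample size for a two-sided two-proportion test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + minimum_detectable_effect)  # assumption: MDE is relative
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def estimate_days(daily_visitors, baseline_rate, minimum_detectable_effect,
                  power=0.8, alpha=0.05, variants=2):
    """Days needed to collect the full sample, assuming an even traffic split."""
    per_variant = sample_size_per_variant(baseline_rate, minimum_detectable_effect,
                                          power, alpha)
    return ceil(per_variant * variants / daily_visitors)


days = estimate_days(daily_visitors=5000, baseline_rate=0.03,
                     minimum_detectable_effect=0.15)
print(f"Test will take ~{days} days")
```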
### Post-Test Analysis
```python
calc = ABTestCalculator()
# 1. Test significance
result = calc.test_significance(
    control_visitors=15000,
    control_conversions=450,
    variant_visitors=15000,
    variant_conversions=525
)
# 2. Get Bayesian probability
bayes = calc.bayesian_analysis(15000, 450, 15000, 525)
print(f"P-value: {result['p_value']:.4f}")
print(f"Lift: {result['lift']:.2%}")
print(f"Probability variant wins: {bayes['prob_variant_better']:.1%}")
```
## Dependencies
- scipy>=1.10.0
- numpy>=1.24.0
- statsmodels>=0.14.0
Related Skills
agent-test-automator
Expert test automation engineer specializing in building robust test frameworks, CI/CD integration, and comprehensive test coverage. Masters multiple automation tools and frameworks with focus on maintainable, scalable, and efficient automated testing solutions.
agent-penetration-tester
Expert penetration tester specializing in ethical hacking, vulnerability assessment, and security testing. Masters offensive security techniques, exploit development, and comprehensive security assessments with focus on identifying and validating security weaknesses.
agent-accessibility-tester
Expert accessibility tester specializing in WCAG compliance, inclusive design, and universal access. Masters screen reader compatibility, keyboard navigation, and assistive technology integration with focus on creating barrier-free digital experiences.
add-unit-tests
Guide for adding unit tests to AReaL. Use when user wants to add tests for new functionality or increase test coverage.
accessibility-testing
WCAG compliance testing and accessibility quality assurance workflows for iOS apps. Use when validating accessibility labels, testing VoiceOver compatibility, checking contrast ratios, or ensuring WCAG 2.1 compliance. Covers accessibility tree analysis, semantic validation, and automated accessibility testing patterns.
accessibility-tester
Expert accessibility tester specializing in WCAG compliance, inclusive design, and universal access. Masters screen reader compatibility, keyboard navigation, and assistive technology integration with focus on creating barrier-free digital experiences.
accessibility-test-axe
Expert in a11y testing. Use for axe-core, automated testing, and accessibility audits.
acceptance-tester
Execute systematic acceptance testing to verify implementations against acceptance criteria. Use this skill when tasks mention "驗收測試", "acceptance testing", "驗收", "validate implementation", or when Gherkin scenarios need to be executed.
acceptance-test-writing
Guide for writing high-quality acceptance criteria and acceptance tests using industry-standard BDD (Behavior-Driven Development) and ATDD (Acceptance Test-Driven Development) practices. Use this skill when creating acceptance criteria for user stories, writing Gherkin scenarios, or implementing acceptance test specifications following Given-When-Then format.
acceptance-test-driven-development
Write acceptance tests before unit tests to ensure you're building the right thing
acc-testing-knowledge
Testing knowledge base for PHP 8.5 projects. Provides testing pyramid, AAA pattern, naming conventions, isolation principles, DDD testing guidelines, and PHPUnit patterns.
acc-detect-test-smells
Detects test antipatterns and code smells in PHP test suites. Identifies 15 smells (Logic in Test, Mock Overuse, Fragile Tests, Mystery Guest, etc.) with fix recommendations and refactoring patterns for testability.