A/B Test Design

Statistical experiment design and analysis capabilities for product experimentation

509 stars

Best use case

A/B Test Design is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using A/B Test Design should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/ab-test-design/SKILL.md --create-dirs "https://raw.githubusercontent.com/a5c-ai/babysitter/main/library/specializations/product-management/skills/ab-test-design/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/ab-test-design/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How A/B Test Design Compares

Feature / Agent         | A/B Test Design | Standard Approach
Platform Support        | Not specified   | Limited / Varies
Context Awareness       | High            | Baseline
Installation Complexity | Unknown         | N/A

Frequently Asked Questions

What does this skill do?

Statistical experiment design and analysis capabilities for product experimentation

Where can I find the source code?

The source is hosted on GitHub in the a5c-ai/babysitter repository, at library/specializations/product-management/skills/ab-test-design/SKILL.md.

SKILL.md Source

# A/B Test Design Skill

## Overview

Specialized skill for statistical experiment design and analysis. It enables product teams to design rigorous experiments, calculate sample sizes, and interpret results with statistical confidence.

## Capabilities

### Experiment Design
- Calculate required sample sizes for experiments
- Design experiment variants and hypotheses
- Define success metrics and guardrail metrics
- Create experiment documentation templates
- Design multi-variant tests (A/B/n)
- Plan sequential and Bayesian experiments
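
The sample-size capability above can be illustrated with the textbook formula for comparing two proportions. This is a minimal sketch, not the skill's actual implementation: the function names (`normalCdf`, `normalQuantile`, `sampleSizePerVariant`) are hypothetical, the `power` parameter is not part of the input schema, and `mde` is assumed to be a relative lift over `baseline`.

```javascript
// Standard normal CDF via the Abramowitz-Stegun erf approximation
// (absolute error on erf of about 1.5e-7).
function normalCdf(x) {
  const t = 1 / (1 + (0.3275911 * Math.abs(x)) / Math.SQRT2);
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
    t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-(x * x) / 2);
  return x >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Inverse normal CDF by bisection on normalCdf.
function normalQuantile(p) {
  if (!(p > 0 && p < 1)) throw new RangeError('p must be in (0, 1)');
  let lo = -10;
  let hi = 10;
  for (let i = 0; i < 100; i++) {
    const mid = (lo + hi) / 2;
    if (normalCdf(mid) < p) lo = mid;
    else hi = mid;
  }
  return (lo + hi) / 2;
}

// Users needed per variant for a two-sided test on two conversion rates.
// ASSUMPTION: mde is relative (0.10 means "10% above baseline").
// ASSUMPTION: power defaults to 0.8; the input schema does not define it.
function sampleSizePerVariant({ baseline, mde, confidenceLevel = 0.95, power = 0.8 }) {
  const p1 = baseline;
  const p2 = baseline * (1 + mde);
  if (!(p1 > 0 && p1 < 1 && p2 > 0 && p2 < 1) || p1 === p2) {
    throw new RangeError('baseline and mde must yield two distinct rates in (0, 1)');
  }
  const zAlpha = normalQuantile(1 - (1 - confidenceLevel) / 2);
  const zBeta = normalQuantile(power);
  const pBar = (p1 + p2) / 2;
  const numerator =
    zAlpha * Math.sqrt(2 * pBar * (1 - pBar)) +
    zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2));
  return Math.ceil((numerator / (p2 - p1)) ** 2);
}
```

With a baseline of 0.05, a relative `mde` of 0.10, 95% confidence, and the assumed 80% power, this comes out to roughly 31,000 users per variant. If `mde` is meant as an absolute difference instead, replace the `p2` line with `baseline + mde`.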

### Statistical Analysis
- Validate statistical significance of results
- Calculate practical significance and effect sizes
- Detect interaction effects and segment-level differences
- Perform power analysis
- Calculate confidence intervals
- Handle multiple comparison corrections
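
As an illustration of the significance and confidence-interval items above, here is a small sketch of a two-proportion z-test with a Wald interval on the difference. The function name `compareProportions` and the `{ users, conversions }` input shape are assumptions made for this example, and the critical value is fixed at the 95% level rather than derived from `confidenceLevel`.

```javascript
// Two-sided 95% critical value of the standard normal distribution.
const Z_95 = 1.959964;

// Compare conversion rates of a control and a treatment group.
// ASSUMPTION: each argument looks like { users: number, conversions: number }.
function compareProportions(control, treatment) {
  const pC = control.conversions / control.users;
  const pT = treatment.conversions / treatment.users;
  const diff = pT - pC;

  // Pooled standard error: used for the test statistic under H0 (equal rates).
  const pooled =
    (control.conversions + treatment.conversions) / (control.users + treatment.users);
  const sePooled = Math.sqrt(
    pooled * (1 - pooled) * (1 / control.users + 1 / treatment.users)
  );
  const z = sePooled === 0 ? 0 : diff / sePooled;

  // Unpooled standard error: used for the Wald confidence interval.
  const se = Math.sqrt(
    (pC * (1 - pC)) / control.users + (pT * (1 - pT)) / treatment.users
  );

  return {
    absoluteLift: diff,
    relativeLift: diff / pC,
    z,
    significant: Math.abs(z) > Z_95,
    ci95: [diff - Z_95 * se, diff + Z_95 * se],
  };
}
```

Statistical significance here only says the interval excludes zero. Whether the lift matters in practice depends on comparing `ci95` against the smallest effect the team cares about. For A/B/n tests, the per-comparison threshold would also need a multiple-comparison correction, which this sketch does not apply.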

### Decision Support
- Recommend ship/iterate/kill decisions
- Identify segment-specific impacts
- Assess long-term vs short-term effects
- Generate experiment reports
- Track experiment velocity metrics
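
One way a ship/iterate/kill recommendation could be derived is from a confidence interval on the lift plus the guardrail status. The rule below is a hypothetical sketch, not the skill's documented logic: `recommendDecision`, `minPracticalLift`, and `regressedGuardrails` are names invented for this example, and the thresholds are a design choice each team would set for itself.

```javascript
// Map an interval estimate of the lift to a ship / iterate / kill decision.
// ASSUMPTION: ci is [lower, upper] on the same scale as minPracticalLift.
// ASSUMPTION: regressedGuardrails lists guardrail metrics that got worse.
function recommendDecision({ ci, minPracticalLift, regressedGuardrails = [] }) {
  const [lower, upper] = ci;

  // A regressed guardrail blocks shipping regardless of the primary metric.
  if (regressedGuardrails.length > 0) {
    return {
      decision: 'iterate',
      reason: `guardrail regression: ${regressedGuardrails.join(', ')}`,
    };
  }
  // The whole interval clears the practical bar: the win is credible.
  if (lower >= minPracticalLift) {
    return { decision: 'ship', reason: 'entire interval exceeds the practical threshold' };
  }
  // Even the optimistic end falls short: more traffic is unlikely to help.
  if (upper < minPracticalLift) {
    return { decision: 'kill', reason: 'entire interval is below the practical threshold' };
  }
  // The interval straddles the threshold: the result is inconclusive.
  return { decision: 'iterate', reason: 'interval straddles the practical threshold' };
}
```

Using the interval rather than a bare p-value keeps practical significance in the decision: a tiny but statistically significant lift lands in `kill` or `iterate`, not `ship`.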

## Target Processes

This skill integrates with the following processes:
- `product-market-fit.js` - Validation experiments for PMF hypotheses
- `conversion-funnel-analysis.js` - Funnel optimization experiments
- `beta-program.js` - A/B testing during beta phases

## Input Schema

```json
{
  "type": "object",
  "properties": {
    "experimentType": {
      "type": "string",
      "enum": ["ab", "multivariate", "sequential", "bandit"],
      "description": "Type of experiment to design"
    },
    "hypothesis": {
      "type": "string",
      "description": "Hypothesis to test"
    },
    "primaryMetric": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "baseline": { "type": "number" },
        "mde": { "type": "number", "description": "Minimum detectable effect" }
      }
    },
    "guardrailMetrics": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Metrics that should not regress"
    },
    "trafficAllocation": {
      "type": "number",
      "description": "Percentage of traffic for experiment"
    },
    "confidenceLevel": {
      "type": "number",
      "default": 0.95,
      "description": "Statistical confidence level"
    }
  },
  "required": ["experimentType", "hypothesis", "primaryMetric"]
}
```
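
Before invoking the skill, a caller may want to check an input object against the schema above. The following is a hand-rolled sketch that covers only the constraints the schema states (required fields, the `experimentType` enum, basic types, and the `confidenceLevel` default). The function name `validateInput` is hypothetical, and a full JSON Schema validator would be more thorough.

```javascript
const EXPERIMENT_TYPES = ['ab', 'multivariate', 'sequential', 'bandit'];
const REQUIRED_FIELDS = ['experimentType', 'hypothesis', 'primaryMetric'];

// Returns { valid, errors, value }, where value has schema defaults applied.
function validateInput(input) {
  const errors = [];

  for (const key of REQUIRED_FIELDS) {
    if (input[key] === undefined) errors.push(`missing required field: ${key}`);
  }
  if (input.experimentType !== undefined && !EXPERIMENT_TYPES.includes(input.experimentType)) {
    errors.push(`experimentType must be one of: ${EXPERIMENT_TYPES.join(', ')}`);
  }
  if (input.hypothesis !== undefined && typeof input.hypothesis !== 'string') {
    errors.push('hypothesis must be a string');
  }
  if (input.primaryMetric !== undefined) {
    const metric = input.primaryMetric;
    if (metric === null || typeof metric !== 'object' || Array.isArray(metric)) {
      errors.push('primaryMetric must be an object');
    } else {
      if (metric.name !== undefined && typeof metric.name !== 'string') {
        errors.push('primaryMetric.name must be a string');
      }
      for (const key of ['baseline', 'mde']) {
        if (metric[key] !== undefined && typeof metric[key] !== 'number') {
          errors.push(`primaryMetric.${key} must be a number`);
        }
      }
    }
  }
  if (input.guardrailMetrics !== undefined) {
    const ok = Array.isArray(input.guardrailMetrics) &&
      input.guardrailMetrics.every((m) => typeof m === 'string');
    if (!ok) errors.push('guardrailMetrics must be an array of strings');
  }
  for (const key of ['trafficAllocation', 'confidenceLevel']) {
    if (input[key] !== undefined && typeof input[key] !== 'number') {
      errors.push(`${key} must be a number`);
    }
  }

  // Apply the only default the schema declares.
  const value = { confidenceLevel: 0.95, ...input };
  return { valid: errors.length === 0, errors, value };
}
```

The sketch deliberately adds no range checks (for example on `confidenceLevel` or `trafficAllocation`), because the schema does not declare any.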

## Output Schema

```json
{
  "type": "object",
  "properties": {
    "experimentPlan": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "hypothesis": { "type": "string" },
        "variants": { "type": "array", "items": { "type": "object" } },
        "sampleSize": { "type": "number" },
        "duration": { "type": "string" },
        "metrics": { "type": "object" }
      }
    },
    "powerAnalysis": {
      "type": "object",
      "properties": {
        "requiredSampleSize": { "type": "number" },
        "estimatedDuration": { "type": "string" },
        "power": { "type": "number" }
      }
    },
    "implementation": {
      "type": "object",
      "properties": {
        "trackingEvents": { "type": "array", "items": { "type": "string" } },
        "segmentation": { "type": "array", "items": { "type": "string" } },
        "rolloutPlan": { "type": "string" }
      }
    },
    "analysisFramework": {
      "type": "object",
      "properties": {
        "primaryAnalysis": { "type": "string" },
        "secondaryAnalyses": { "type": "array", "items": { "type": "string" } },
        "decisionCriteria": { "type": "object" }
      }
    }
  }
}
```
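
The `powerAnalysis` fields relate to each other through traffic: the required sample size and the share of traffic in the experiment determine how long the test must run. The sketch below shows one way to fill that object. It rests on several assumptions not stated in the schema: `requiredSampleSize` is taken to be per variant, `dailyVisitors` and `variantCount` are hypothetical inputs, and the `"N days"` string format for `estimatedDuration` is invented for this example.

```javascript
// Build a powerAnalysis-shaped object from a per-variant sample size.
// ASSUMPTION: requiredSampleSize counts users per variant.
// ASSUMPTION: trafficAllocation is a percentage (0-100), as in the Usage Example.
function buildPowerAnalysis({
  requiredSampleSize,
  variantCount,
  dailyVisitors,
  trafficAllocation,
  power = 0.8,
}) {
  const dailyInExperiment = dailyVisitors * (trafficAllocation / 100);
  if (!(dailyInExperiment > 0)) {
    throw new RangeError('dailyVisitors and trafficAllocation must be positive');
  }
  const totalNeeded = requiredSampleSize * variantCount;
  // Round up: a partial day still has to be run in full.
  const days = Math.ceil(totalNeeded / dailyInExperiment);

  return {
    requiredSampleSize,
    estimatedDuration: `${days} days`,
    power,
  };
}
```

In practice teams often round the duration up to whole weeks so that every weekday is represented equally. That adjustment is left out here to keep the arithmetic visible.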

## Usage Example

```javascript
const experimentDesign = await executeSkill('ab-test-design', {
  experimentType: 'ab',
  hypothesis: 'Adding social proof to pricing page increases conversion by 10%',
  primaryMetric: {
    name: 'pricing_page_conversion',
    baseline: 0.05,
    mde: 0.10
  },
  guardrailMetrics: ['revenue_per_visitor', 'bounce_rate'],
  trafficAllocation: 50,
  confidenceLevel: 0.95
});
```

## Dependencies

- Statistical libraries for power analysis
- Experimentation platform integrations (Optimizely, LaunchDarkly, etc.)
