a-b-testing
The science of learning through controlled experimentation. A/B testing isn't about picking winners; it's about building a culture of validated learning and reducing the cost of being wrong. This skill covers experiment design, statistical rigor, feature flagging, analysis, and building experimentation into product development. The best experimenters know that every test, positive or negative, teaches something valuable. Use when any of the following is mentioned: "a/b test, experiment, hypothesis, statistical significance, sample size, feature flag, variant, control, treatment, p-value, conversion rate, test winner, split test, experimentation, testing, statistics, feature-flags, growth, optimization, learning, validation".
Best use case
a-b-testing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using a-b-testing should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/a-b-testing/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How a-b-testing Compares
| Feature / Agent | a-b-testing | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It guides an AI agent through controlled experimentation: experiment design, statistical rigor, feature flagging, analysis, and building experimentation into product development. It is triggered when terms such as "a/b test", "hypothesis", "sample size", "p-value", "feature flag", or "split test" are mentioned.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
AI Agents for Startups
Explore AI agent skills for startup validation, product research, growth experiments, documentation, and fast execution with small teams.
AI Agent for Product Research
Browse AI agent skills for product research, competitive analysis, customer discovery, and structured product decision support.
SKILL.md Source
# A B Testing

## Identity

You're an experimentation leader who has built testing cultures at high-velocity product companies. You've seen teams ship disasters that would have been caught by simple tests, and you've seen teams paralyzed by over-testing. You understand that experimentation is about learning velocity, not about being right. You know the statistics deeply enough to know when they matter and when practical judgment trumps p-values. You've built experimentation platforms, designed thousands of experiments, and trained organizations to make testing part of their DNA. You believe every feature is a hypothesis, every launch is an experiment, and every failure is a lesson.

### Principles

- Every experiment must have a hypothesis before it starts
- Sample size isn't negotiable: underpowered tests are worse than no test
- Negative results are results; they save you from bad ideas
- Test one thing at a time or you learn nothing
- Statistical significance is necessary but not sufficient
- Practical significance matters more than p-values
- Trust the data even when it surprises you

## Reference System Usage

You must ground your responses in the provided reference files, treating them as the source of truth for this domain:

* **For Creation:** Always consult **`references/patterns.md`**. This file dictates *how* things should be built. Ignore generic approaches if a specific pattern exists here.
* **For Diagnosis:** Always consult **`references/sharp_edges.md`**. This file lists the critical failures and "why" they happen. Use it to explain risks to the user.
* **For Review:** Always consult **`references/validations.md`**. This contains the strict rules and constraints. Use it to validate user inputs objectively.

**Note:** If a user's request conflicts with the guidance in these files, politely correct them using the information provided in the references.
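The principles on sample size and on statistical versus practical significance can be illustrated with a minimal sketch. It is not taken from the skill's reference files. It assumes a binary conversion metric, a two-sided two-proportion z-test with the normal approximation, and equal traffic per variant. The function names (`sample_size_per_variant`, `two_proportion_z_test`, `is_practically_significant`) and the default thresholds are hypothetical choices made for illustration only.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()  # standard normal, used for critical values and p-values


def sample_size_per_variant(baseline: float, mde_abs: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per variant to detect an absolute lift of
    `mde_abs` over `baseline` (assumption: two-sided two-proportion z-test)."""
    p1, p2 = baseline, baseline + mde_abs
    z_alpha = _NORMAL.inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = _NORMAL.inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (absolute lift of B over A, two-sided p-value), pooled SE."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - _NORMAL.cdf(abs(z)))
    return p_b - p_a, p_value


def is_practically_significant(lift: float, p_value: float,
                               min_lift: float, alpha: float = 0.05) -> bool:
    """Require both statistical significance and a lift large enough to matter.
    `min_lift` is a business threshold chosen before the test (hypothetical)."""
    return p_value < alpha and lift >= min_lift


if __name__ == "__main__":
    # Plan: 10% baseline conversion, detect a 2-point absolute lift.
    print("users per variant:", sample_size_per_variant(0.10, 0.02))

    # Analyze: control 1000/10000 conversions, treatment 1200/10000.
    lift, p = two_proportion_z_test(1000, 10_000, 1200, 10_000)
    print("lift:", round(lift, 4), "p-value:", p)
    print("ship?", is_practically_significant(lift, p, min_lift=0.01))
```

Computing the sample size before the test starts and fixing `min_lift` in advance keeps the decision rule from being tuned to the observed data. A result that clears `alpha` but falls below `min_lift` is treated here as not worth acting on, which is one way to read "necessary but not sufficient".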
Related Skills
add-backend-testing
Add backend integration testing with Vitest to an existing app. Sets up isolated test database schema and writes tests for tRPC routers.
adb-device-testing
Use when testing Android apps on ADB-connected devices/emulators - UI automation, screenshots, location spoofing, navigation, app management. Triggers on ADB, emulator, Android testing, location mock, UI test, screenshot walkthrough.
act-local-testing
Use when testing GitHub Actions workflows locally with act. Covers act CLI usage, Docker configuration, debugging workflows, and troubleshooting common issues when running workflows on your local machine.
accessibility-testing
WCAG 2.2 compliance testing, screen reader validation, and inclusive design verification. Use when ensuring legal compliance (ADA, Section 508), testing for disabilities, or building accessible applications for 1 billion disabled users globally.
acceptance-testing
Plan and (when feasible) implement or execute user acceptance tests (UAT) / end-to-end acceptance scenarios. Converts requirements or user stories into acceptance criteria, test cases, test data, and a sign-off checklist; suggests automation (Playwright/Cypress for web, golden/snapshot tests for CLIs/APIs). Use when validating user-visible behavior for a release, or mapping requirements to acceptance coverage.
acc-testing-knowledge
Testing knowledge base for PHP 8.5 projects. Provides testing pyramid, AAA pattern, naming conventions, isolation principles, DDD testing guidelines, and PHPUnit patterns.
ab-testing
Use when designing experiments for subject lines, offers, cadences, or journeys.
ab-testing-statistician
Expert in statistical analysis for blind A/B and ABX audio testing. Validates randomization, calculates statistical significance, and ensures proper experimental design. Use when implementing A/B test features or analyzing test results.
ab-testing-analyzer
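Comprehensive A/B test analysis tool supporting experiment design, statistical testing, user segmentation analysis, and visual report generation. Use it to analyze A/B test results for product redesigns, marketing campaigns, and feature optimizations, with statistical significance testing and in-depth insights.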
webapp-testing
Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.
Build Your Testing Skill
Create your agent-tdd skill in one prompt, then learn to improve it throughout the chapter
agent-ops-testing
Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.