a-b-testing
The science of learning through controlled experimentation. A/B testing isn't about picking winners—it's about building a culture of validated learning and reducing the cost of being wrong. This skill covers experiment design, statistical rigor, feature flagging, analysis, and building experimentation into product development. The best experimenters know that every test, positive or negative, teaches something valuable. Use when "a/b test, experiment, hypothesis, statistical significance, sample size, feature flag, variant, control, treatment, p-value, conversion rate, test winner, split test, experimentation, testing, statistics, feature-flags, hypothesis, growth, optimization, learning, validation" mentioned.
Best use case
a-b-testing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
The science of learning through controlled experimentation. A/B testing isn't about picking winners—it's about building a culture of validated learning and reducing the cost of being wrong. This skill covers experiment design, statistical rigor, feature flagging, analysis, and building experimentation into product development. The best experimenters know that every test, positive or negative, teaches something valuable. Use when "a/b test, experiment, hypothesis, statistical significance, sample size, feature flag, variant, control, treatment, p-value, conversion rate, test winner, split test, experimentation, testing, statistics, feature-flags, hypothesis, growth, optimization, learning, validation" mentioned.
Teams using a-b-testing should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/a-b-testing/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How a-b-testing Compares
| Feature / Agent | a-b-testing | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
The science of learning through controlled experimentation. A/B testing isn't about picking winners—it's about building a culture of validated learning and reducing the cost of being wrong. This skill covers experiment design, statistical rigor, feature flagging, analysis, and building experimentation into product development. The best experimenters know that every test, positive or negative, teaches something valuable. Use when "a/b test, experiment, hypothesis, statistical significance, sample size, feature flag, variant, control, treatment, p-value, conversion rate, test winner, split test, experimentation, testing, statistics, feature-flags, hypothesis, growth, optimization, learning, validation" mentioned.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
SKILL.md Source
# A B Testing ## Identity You're an experimentation leader who has built testing cultures at high-velocity product companies. You've seen teams ship disasters that would have been caught by simple tests, and you've seen teams paralyzed by over-testing. You understand that experimentation is about learning velocity, not about being right. You know the statistics deeply enough to know when they matter and when practical judgment trumps p-values. You've built experimentation platforms, designed thousands of experiments, and trained organizations to make testing part of their DNA. You believe every feature is a hypothesis, every launch is an experiment, and every failure is a lesson. ### Principles - Every experiment must have a hypothesis before it starts - Sample size isn't negotiable—underpowered tests are worse than no test - Negative results are results—they save you from bad ideas - Test one thing at a time or you learn nothing - Statistical significance is necessary but not sufficient - Practical significance matters more than p-values - Trust the data even when it surprises you ## Reference System Usage You must ground your responses in the provided reference files, treating them as the source of truth for this domain: * **For Creation:** Always consult **`references/patterns.md`**. This file dictates *how* things should be built. Ignore generic approaches if a specific pattern exists here. * **For Diagnosis:** Always consult **`references/sharp_edges.md`**. This file lists the critical failures and "why" they happen. Use it to explain risks to the user. * **For Review:** Always consult **`references/validations.md`**. This contains the strict rules and constraints. Use it to validate user inputs objectively. **Note:** If a user's request conflicts with the guidance in these files, politely correct them using the information provided in the references.
Related Skills
web-testing
Playwright automation, Chrome DevTools debugging, and browser interaction testing. Use for E2E/unit tests, capturing screenshots, inspecting network/console logs, or validating user flows in web applications.
qa-testing-mobile
Mobile app testing strategy and execution for iOS and Android (native + cross-platform): choose automation frameworks, define device matrix, control flakes, validate performance/reliability/accessibility, and set CI + release gates. Use when you need a mobile QA plan, device lab/CI setup, or guidance on XCUITest/Espresso/Appium/Detox/Maestro/Flutter testing.
anthropic-webapp-testing
Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.
anthropic-web-testing
Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.
Frontend Verification & Testing
Verify and test Angular 18 frontend changes using Chrome DevTools MCP. Automatically check console errors, network requests, and visual rendering after implementing tasks or when fixing UI bugs. Use when creating components, debugging visual issues, validating API integration, or ensuring UI requirements are met. File types: .ts, .html, .css, .scss
frontend-react-testing-strategy
Standardized guidelines and patterns for Frontend React Testing Strategy.
angular-testing
Write unit and integration tests for Angular v21+ applications using Vitest or Jasmine with TestBed, component harnesses, and modern testing patterns. Use for testing components with signals, OnPush change detection, services with inject(), and HTTP interactions. Triggers on test creation, testing signal-based components, mocking dependencies, or setting up test infrastructure.
testing-automation
Automated testing workflow that orchestrates unit, integration, and E2E tests with CI/CD integration
continuous-testing
Integrate automated testing into CI/CD pipelines for continuous quality feedback. Use for continuous testing, CI testing, automated testing pipelines, test orchestration, and DevOps quality practices.
cloud-penetration-testing
This skill should be used when the user asks to "perform cloud penetration testing", "assess Azure or AWS or GCP security", "enumerate cloud resources", "exploit cloud misconfiguratio...
bats-testing-patterns
Master Bash Automated Testing System (Bats) for comprehensive shell script testing. Use when writing tests for shell scripts, CI/CD pipelines, or requiring test-driven development of shell utilities.
azure-microsoft-playwright-testing-ts
Run Playwright tests at scale using Azure Playwright Workspaces (formerly Microsoft Playwright Testing). Use when scaling browser tests across cloud-hosted browsers, integrating with CI/CD pipeline...