Best use case
A/B Test Setup Skill is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using A/B Test Setup Skill should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/sw-ab-test-setup/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How A/B Test Setup Skill Compares
| Feature / Agent | A/B Test Setup Skill | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
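It plans A/B tests with proper methodology: hypothesis, sample size, duration, variant design, and statistical significance.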
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
AI Agents for Marketing
Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.
AI Agents for Startups
Explore AI agent skills for startup validation, product research, growth experiments, documentation, and fast execution with small teams.
SKILL.md Source
# A/B Test Setup Skill

## Trigger

Plan A/B tests with proper methodology — hypothesis, sample size, duration, variant design, statistical significance.

**Trigger phrases:** "A/B test", "split test", "experiment", "test this change", "variant", "multivariate test", "hypothesis"

## Process

1. **Hypothesis**: What are you testing and why?
2. **Metrics**: Primary metric, guardrail metrics, success criteria
3. **Design**: Control vs variant(s), what exactly changes
4. **Calculate**: Sample size, test duration, minimum detectable effect
5. **Plan**: Implementation, QA, analysis timeline

## Output Format

```markdown
# A/B Test Plan: [Name]

## Hypothesis
If we [change], then [metric] will [improve/increase] because [reason].

## Variants
- **Control (A):** [current experience]
- **Variant (B):** [proposed change — be specific]

## Metrics
- **Primary:** [metric] — current: [X%] — target: [Y%]
- **Guardrail:** [metric that should NOT decrease]

## Sample Size & Duration
- MDE: [minimum detectable effect, e.g., 10% relative]
- Sample needed: [N per variant]
- Current traffic: [X visitors/day to test area]
- Estimated duration: [Y days/weeks]
- Confidence level: 95%

## Implementation Notes
[What needs to change, where, any technical considerations]

## Decision Framework
- If primary metric improves ≥ MDE with p < 0.05 → ship variant
- If no significant difference after [duration] → keep control
- If guardrail metric drops > [threshold] → stop test immediately
```

## Rules

- Never run a test without a hypothesis
- One change per test (unless multivariate with sufficient traffic)
- Run for minimum 2 full business cycles (usually 2 weeks)
- Don't peek at results daily — pre-commit to evaluation date
- 95% confidence minimum. 80% power minimum.
- Document everything: future you needs to know why this was tested
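The Calculate step and the Decision Framework in the SKILL.md source can be illustrated with a short Python sketch. The skill does not specify any formulas, so everything here is an assumption: the sample size uses the common normal-approximation formula for comparing two proportions, the p-value comes from a pooled two-proportion z-test, and the function names (`sample_size_per_variant`, `estimated_duration_days`, `p_value`, `decide`) are hypothetical. The 95% confidence, 80% power, and two-week floor are taken from the Rules section; the guardrail check is left out.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed per variant (assumed: two-sided two-proportion z-test)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # rate the variant must reach
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 95% confidence -> ~1.96
    z_beta = NormalDist().inv_cdf(power)           # 80% power -> ~0.84
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def estimated_duration_days(n_per_variant, daily_visitors, variants=2, min_days=14):
    """Days needed to fill every variant, never shorter than min_days."""
    days = math.ceil(n_per_variant * variants / daily_visitors)
    return max(days, min_days)


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(conv_a, n_a, conv_b, n_b, relative_mde, alpha=0.05):
    """Apply the ship / keep-control rule; the guardrail check is not modeled."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    lift = (rate_b - rate_a) / rate_a  # relative improvement over control
    if lift >= relative_mde and p_value(conv_a, n_a, conv_b, n_b) < alpha:
        return "ship variant"
    return "keep control"


if __name__ == "__main__":
    # Illustrative inputs only: 5% baseline, 10% relative MDE, 5,000 visitors/day.
    n = sample_size_per_variant(0.05, 0.10)
    print("Sample needed per variant:", n)
    print("Estimated duration (days):", estimated_duration_days(n, 5000))
    print("Decision:", decide(1500, 30000, 1700, 30000, 0.10))
```

With these illustrative inputs the required sample is on the order of 31,000 visitors per variant, which shows why a small relative MDE on a low baseline rate often drives test duration more than the two-week floor does. The sketch assumes a nonzero control conversion rate and a single evaluation on the pre-committed date, in line with the rule against daily peeking.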
Related Skills
backtester
Professional backtesting framework for trading strategies. Tests SMA crossover, RSI, MACD, Bollinger Bands, and custom strategies on historical data. Generates equity curves, drawdown analysis, and performance metrics.
pentest-c2-operator
Set up authorized C2 simulation workflows and measure defensive detection outcomes.
pentest-auth-bypass
Test authentication and session management controls for bypass and account takeover scenarios.
pentest-api-attacker
Test APIs against OWASP API Security Top 10 including discovery, auth abuse, and protocol-specific checks.
pentest-active-directory
Assess Active Directory identity attack paths including roasting, relay, and delegation abuse.
nmap-pentest-scans
Plan and orchestrate authorized Nmap host discovery, port and service enumeration, NSE profiling, and reporting artifacts for in-scope targets.
rust-testing-code-review
Reviews Rust test code for unit test patterns, integration test structure, async testing, mocking approaches, and property-based testing. Use when reviewing _test.rs files,
skill-test-sandbox
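Summarizes any non-technical topic the user provides as a playful three-line doggerel verse (lines of similar length, rhyming or easy to recite). Calls no tools. Use when the user asks to test a Skill, wants a sandbox demo or a doggerel summary, or explicitly says the request is a joke or exercise unrelated to the charging business.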
pytest-code-review
Reviews pytest test code for async patterns, fixtures, parametrize, and mocking. Use when reviewing test_*.py files, checking async test functions, fixture usage, or mock patterns.
vynn-backtester
Run trading strategy backtests with natural language — powered by Vynn
qa-tester
Strict QA and test engineering skill for fullstack repositories. Use when writing test plans, implementing unit/integration/E2E tests, reproducing bugs, validating regressions, or preparing release readiness. Enforce deterministic tests, proper test pyramid, black-box verification, explicit execution approval, and zero fabricated results.
cmdb-test
Automates releasing IMyFone CMDB versions to the test environment: one-click application release.