A/B Test Setup Skill

3,891 stars

Best use case

A/B Test Setup Skill is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using A/B Test Setup Skill should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/sw-ab-test-setup/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/amdf01-debug/sw-ab-test-setup/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/sw-ab-test-setup/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How A/B Test Setup Skill Compares

| Feature / Agent | A/B Test Setup Skill | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

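It plans A/B tests with proper methodology: hypothesis, sample size, duration, variant design, and statistical significance.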

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# A/B Test Setup Skill

## Trigger
Plan A/B tests with proper methodology — hypothesis, sample size, duration, variant design, statistical significance.

**Trigger phrases:** "A/B test", "split test", "experiment", "test this change", "variant", "multivariate test", "hypothesis"

## Process

1. **Hypothesis**: What are you testing and why?
2. **Metrics**: Primary metric, guardrail metrics, success criteria
3. **Design**: Control vs variant(s), what exactly changes
4. **Calculate**: Sample size, test duration, minimum detectable effect
5. **Plan**: Implementation, QA, analysis timeline
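
The "Calculate" step can be sketched in code. The example below is a minimal illustration, not part of the skill itself: it assumes the primary metric is a conversion rate, that the MDE is expressed as a relative lift, and that a two-sided normal-approximation formula for comparing two proportions is acceptable. The function name `sample_size_per_variant` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-proportion test.

    Assumption: normal approximation, two-sided test, equal traffic split.
    """
    if not 0 < baseline_rate < 1:
        raise ValueError("baseline_rate must be between 0 and 1")
    if relative_mde <= 0:
        raise ValueError("relative_mde must be positive")

    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)  # rate we want to be able to detect
    if p2 >= 1:
        raise ValueError("baseline_rate * (1 + relative_mde) must stay below 1")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 95% confidence -> about 1.96
    z_beta = NormalDist().inv_cdf(power)           # 80% power -> about 0.84

    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical inputs: 5% baseline conversion, 10% relative MDE.
    print(sample_size_per_variant(0.05, 0.10))
```

The defaults mirror the 95% confidence and 80% power minimums stated in the rules. A smaller MDE or a lower baseline rate increases the required sample sharply, which is why the MDE should be fixed before the test starts.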

## Output Format

```markdown
# A/B Test Plan: [Name]

## Hypothesis
If we [change], then [metric] will [improve/increase] because [reason].

## Variants
- **Control (A):** [current experience]
- **Variant (B):** [proposed change — be specific]

## Metrics
- **Primary:** [metric] — current: [X%] — target: [Y%]
- **Guardrail:** [metric that should NOT decrease]

## Sample Size & Duration
- MDE: [minimum detectable effect, e.g., 10% relative]
- Sample needed: [N per variant]
- Current traffic: [X visitors/day to test area]
- Estimated duration: [Y days/weeks]
- Confidence level: 95%

## Implementation Notes
[What needs to change, where, any technical considerations]

## Decision Framework
- If primary metric improves ≥ MDE with p < 0.05 → ship variant
- If no significant difference after [duration] → keep control
- If guardrail metric drops > [threshold] → stop test immediately
```
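
The decision framework in the template can be expressed as a small function. This is a sketch under stated assumptions: the primary metric is a conversion rate compared with a pooled two-proportion z-test, the guardrail drop is supplied as a relative fraction, and results that are significant but below the MDE are folded into "keep control" (the template does not cover that case). The names `two_proportion_p_value` and `decide` are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(conv_a, n_a, conv_b, n_b, relative_mde,
           guardrail_drop, guardrail_threshold, alpha=0.05):
    """Apply the three rules of the decision framework, guardrail first."""
    if guardrail_drop > guardrail_threshold:
        return "stop test"

    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    relative_lift = (rate_b - rate_a) / rate_a
    p_value = two_proportion_p_value(conv_a, n_a, conv_b, n_b)

    if relative_lift >= relative_mde and p_value < alpha:
        return "ship variant"
    # Assumption: anything else, including significant-but-small lifts,
    # is treated as "keep control".
    return "keep control"


if __name__ == "__main__":
    # Hypothetical counts: 500/10,000 conversions for A, 600/10,000 for B.
    print(decide(500, 10_000, 600, 10_000, relative_mde=0.10,
                 guardrail_drop=0.0, guardrail_threshold=0.05))
```

Checking the guardrail before the primary metric matches the intent of "stop test immediately": a harmful variant should not ship even when its primary metric looks good.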

## Rules

- Never run a test without a hypothesis
- One change per test (unless multivariate with sufficient traffic)
- Run for a minimum of 2 full business cycles (usually 2 weeks)
- Don't peek at results daily — pre-commit to evaluation date
- 95% confidence minimum. 80% power minimum.
- Document everything: future you needs to know why this was tested
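
The duration rule can be combined with the sample size to pre-commit to an evaluation date. The sketch below assumes an even traffic split, a 7-day business cycle, and that the test should always end on a full cycle boundary. The function name `estimated_duration_days` and its parameters are hypothetical.

```python
import math


def estimated_duration_days(sample_per_variant, daily_visitors,
                            variants=2, cycle_days=7, min_cycles=2):
    """Days to run the test, rounded up to whole business cycles.

    Assumption: traffic is split evenly and every visitor enters the test.
    """
    if sample_per_variant <= 0 or daily_visitors <= 0:
        raise ValueError("sample_per_variant and daily_visitors must be positive")

    total_needed = sample_per_variant * variants
    raw_days = math.ceil(total_needed / daily_visitors)

    # Never run fewer than min_cycles full cycles, and never stop mid-cycle.
    cycles = max(min_cycles, math.ceil(raw_days / cycle_days))
    return cycles * cycle_days


if __name__ == "__main__":
    # Hypothetical inputs: 31,231 visitors per variant, 4,000 visitors per day.
    print(estimated_duration_days(31_231, 4_000))
```

Rounding up to whole cycles keeps weekday and weekend behavior equally represented in both variants, which is the reason for the two-cycle minimum.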
