A/B Test Setup Skill

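Plan A/B tests with proper methodology: hypothesis, sample size, duration, variant design, and statistical significance.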

Best use case

A/B Test Setup Skill is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using A/B Test Setup Skill should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/sw-ab-test-setup/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/amdf01-debug/sw-ab-test-setup/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it at .claude/skills/sw-ab-test-setup/SKILL.md inside your project.
3. Restart your AI agent so it can auto-discover the skill.

How A/B Test Setup Skill Compares

| Feature / Agent | A/B Test Setup Skill | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

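It plans A/B tests with proper methodology: hypothesis, sample size, duration, variant design, and statistical significance.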

Where can I find the source code?

The source is on GitHub in the openclaw/skills repository, at skills/amdf01-debug/sw-ab-test-setup/SKILL.md, the same path the installation command downloads from.

SKILL.md Source

# A/B Test Setup Skill

## Trigger
Plan A/B tests with proper methodology — hypothesis, sample size, duration, variant design, statistical significance.

**Trigger phrases:** "A/B test", "split test", "experiment", "test this change", "variant", "multivariate test", "hypothesis"

## Process

1. **Hypothesis**: What are you testing and why?
2. **Metrics**: Primary metric, guardrail metrics, success criteria
3. **Design**: Control vs variant(s), what exactly changes
4. **Calculate**: Sample size, test duration, minimum detectable effect
5. **Plan**: Implementation, QA, analysis timeline
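
The **Calculate** step can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: it assumes a conversion-rate metric, a two-sided two-proportion z-test, and the normal approximation. The function name `sample_size_per_variant` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant (hypothetical helper).

    baseline      -- current conversion rate, e.g. 0.05 for 5%
    relative_mde  -- minimum detectable effect, relative, e.g. 0.10 for 10%
    alpha         -- significance level (0.05 matches 95% confidence)
    power         -- desired statistical power (0.80 matches the 80% minimum)
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    # Critical values of the standard normal distribution.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the Bernoulli variances of the two groups.
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Example inputs are illustrative only.
    print(sample_size_per_variant(baseline=0.05, relative_mde=0.10))
```

Under these assumptions, a smaller MDE or a lower baseline rate increases the required sample sharply, which is why the MDE should be fixed before the test starts.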

## Output Format

```markdown
# A/B Test Plan: [Name]

## Hypothesis
If we [change], then [metric] will [improve/increase] because [reason].

## Variants
- **Control (A):** [current experience]
- **Variant (B):** [proposed change — be specific]

## Metrics
- **Primary:** [metric] — current: [X%] — target: [Y%]
- **Guardrail:** [metric that should NOT decrease]

## Sample Size & Duration
- MDE: [minimum detectable effect, e.g., 10% relative]
- Sample needed: [N per variant]
- Current traffic: [X visitors/day to test area]
- Estimated duration: [Y days/weeks]
- Confidence level: 95%
- Statistical power: 80%

## Implementation Notes
[What needs to change, where, any technical considerations]

## Decision Framework
- If primary metric improves ≥ MDE with p < 0.05 → ship variant
- If no significant difference after [duration] → keep control
- If guardrail metric drops > [threshold] → stop test immediately
```
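
The Decision Framework in the template can be expressed as a small function. The sketch below is one possible reading, assuming a conversion-rate primary metric and a pooled two-proportion z-test; the names `two_proportion_p_value` and `decide`, and the way the guardrail drop is passed in, are hypothetical. The template does not say what to do when a result is significant but smaller than the MDE, so this sketch treats that case as "keep control".

```python
import math
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(conv_a, n_a, conv_b, n_b, relative_mde,
           guardrail_drop, guardrail_threshold, alpha=0.05):
    """Map test results onto the three outcomes in the template.

    guardrail_drop and guardrail_threshold are relative decreases
    (0.05 means the guardrail metric fell by 5%).
    """
    # The guardrail check comes first: a breach stops the test regardless of lift.
    if guardrail_drop > guardrail_threshold:
        return "stop test"
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    if p_a == 0:
        return "keep control"  # relative lift is undefined without a baseline
    lift = (p_b - p_a) / p_a
    p_value = two_proportion_p_value(conv_a, n_a, conv_b, n_b)
    if lift >= relative_mde and p_value < alpha:
        return "ship variant"
    return "keep control"


if __name__ == "__main__":
    # Illustrative numbers only, not real results.
    print(decide(1500, 30000, 1700, 30000, relative_mde=0.10,
                 guardrail_drop=0.0, guardrail_threshold=0.02))
```

The function is meant to be called once, on the pre-committed evaluation date, rather than repeatedly while the test is running.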

## Rules

- Never run a test without a hypothesis
- One change per test (unless multivariate with sufficient traffic)
- Run for a minimum of 2 full business cycles (usually 2 weeks)
- Don't peek at results daily — pre-commit to evaluation date
- 95% confidence minimum. 80% power minimum.
- Document everything: future you needs to know why this was tested
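
The duration rule above can be combined with the sample size to estimate how long a test should run. This is a rough sketch that assumes traffic is split evenly across variants and that one business cycle is 7 days; `estimated_duration_days` and its parameters are hypothetical names.

```python
import math


def estimated_duration_days(sample_per_variant, daily_visitors,
                            variants=2, cycle_days=7, min_cycles=2):
    """Days to run the test, rounded up to whole business cycles."""
    if daily_visitors <= 0:
        raise ValueError("daily_visitors must be positive")
    total_needed = sample_per_variant * variants
    raw_days = math.ceil(total_needed / daily_visitors)
    # Round up to full cycles and never go below the minimum number of cycles.
    cycles = max(min_cycles, math.ceil(raw_days / cycle_days))
    return cycles * cycle_days


if __name__ == "__main__":
    # Illustrative inputs: sample size per variant and daily traffic to the test area.
    print(estimated_duration_days(31231, 5000))
```

Rounding up to whole cycles keeps every weekday equally represented in both variants, which is the point of the two-cycle minimum.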
