ab-testing

Set up and evaluate A/B tests using feature-flag-server. Use when you need to run a multivariate experiment, assign users to variants, track which variant they received, or analyze variant distribution. Triggers include "A/B test", "split test", "experiment", "variant", "multivariate", "rollout percentage", or any task involving exposing different experiences to different users.

by heldernoid

Best use case

ab-testing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ab-testing should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/ab-testing/SKILL.md --create-dirs "https://raw.githubusercontent.com/heldernoid/agentic-build-templates/main/projects/developer-tools/feature-flag-server/skills/ab-testing/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub
2. Place it in .claude/skills/ab-testing/SKILL.md inside your project
3. Restart your AI agent; it will auto-discover the skill

SKILL.md Source

# ab-testing

Run A/B tests and multivariate experiments using feature-flag-server segment flags.

## When to use

- Running a two-variant A/B test (control vs treatment)
- Multivariate tests with more than two variants
- Gradual feature rollout with percentage-based targeting
- Segment-targeted experiments (e.g., only beta users see the variant)

## Flag types for experiments

| Type | Use case |
|---|---|
| `percentage` | Random percentage split (e.g., 50% see the feature) |
| `segment` | Target specific user groups with different variants |

For A/B tests, use a `segment` flag with weighted variants. This gives full control over the variant split.

## Creating an A/B test via REST API

```bash
# Step 1: Create the flag
curl -s -X POST http://localhost:7777/api/flags \
  -H "Authorization: Bearer ffs_..." \
  -H "Content-Type: application/json" \
  -d '{
    "key": "button-color-test",
    "name": "Button Color A/B Test",
    "flagType": "segment"
  }'
# Response: { "id": "550e8400-e29b-...", "key": "button-color-test", ... }

# Step 2: Configure variants and weights for production
curl -s -X PATCH http://localhost:7777/api/flags/550e8400-e29b-.../environments/production \
  -H "Authorization: Bearer ffs_..." \
  -H "Content-Type: application/json" \
  -d '{
    "enabled": 1,
    "variants": [
      { "key": "control",   "value": "#3B82F6", "weight": 50 },
      { "key": "variant-a", "value": "#10B981", "weight": 30 },
      { "key": "variant-b", "value": "#F59E0B", "weight": 20 }
    ]
  }'
```

Note: variant weights must sum to 100.
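
The weight rule can be checked on the client before sending the PATCH request. The following is a minimal sketch, not part of feature-flag-server or its SDK: the `Variant` shape mirrors the payload above, while `validateVariants` and the extra check for negative or non-numeric weights are assumptions added for illustration.

```typescript
// Shape of one variant entry, mirroring the PATCH payload above.
interface Variant {
  key: string;
  value: string;
  weight: number;
}

// Hypothetical helper: returns a list of problems.
// An empty list means the split satisfies the "sum to 100" rule.
function validateVariants(variants: Variant[]): string[] {
  const problems: string[] = [];
  let total = 0;
  for (const v of variants) {
    // Assumption: weights are expected to be finite and non-negative.
    if (!Number.isFinite(v.weight) || v.weight < 0) {
      problems.push(`invalid weight for ${v.key}: ${v.weight}`);
    } else {
      total += v.weight;
    }
  }
  if (total !== 100) {
    problems.push(`weights sum to ${total}, expected 100`);
  }
  return problems;
}
```

Running this on the 50/30/20 split from Step 2 returns an empty list; a 50/30/30 split would be reported as summing to 110.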

## Evaluating a variant (SDK)

```typescript
import { FlagClient } from '@ffs/js-sdk';

const client = new FlagClient({
  baseUrl: 'http://localhost:7777',
  apiKey: process.env.FFS_API_KEY!,
  environment: 'production',
});

const result = await client.getVariant('button-color-test', {
  userId: req.user.id,
  plan: req.user.plan,
  country: req.user.country,
});

// result.value   => "#10B981"      (the variant's value)
// result.variant => "variant-a"   (the variant key)
// result.reason  => "segment_match"

// Use the value in your UI
const buttonStyle = { backgroundColor: result.value };
```

## Checking variant distribution via API

```bash
curl -s http://localhost:7777/api/analytics/flags/550e8400-e29b-.../variants \
  -H "Authorization: Bearer ffs_..."
# Response:
# {
#   "total": 111432,
#   "variants": [
#     { "key": "control",   "count": 55716, "pct": 50.0 },
#     { "key": "variant-a", "count": 33430, "pct": 30.0 },
#     { "key": "variant-b", "count": 22286, "pct": 20.0 }
#   ],
#   "periodDays": 7
# }
```
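
To analyze a response like the one above, the observed shares can be compared against the configured weights. This is a rough sketch under stated assumptions: the `key` and `count` fields come from the sample response, while `shareDrift` is a hypothetical helper and not part of the API or SDK.

```typescript
// One entry of the "variants" array in the analytics response.
interface VariantCount {
  key: string;
  count: number;
}

// Hypothetical helper: for each variant, returns the observed share in
// percent minus the configured weight. Values near 0 mean the live
// distribution matches the configured split.
function shareDrift(
  counts: VariantCount[],
  weights: Record<string, number>,
): Record<string, number> {
  const total = counts.reduce((sum, c) => sum + c.count, 0);
  const drift: Record<string, number> = {};
  for (const c of counts) {
    const observedPct = total === 0 ? 0 : (c.count * 100) / total;
    // Variants missing from `weights` are treated as weight 0.
    drift[c.key] = observedPct - (weights[c.key] ?? 0);
  }
  return drift;
}
```

For the sample response and the 50/30/20 weights, every drift value is well under 0.01 percentage points. A large drift on a high-traffic flag is worth investigating before trusting experiment results.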

## Percentage rollout (gradual launch)

For simple gradual rollouts without variants, use a `percentage` flag:

```bash
# Create percentage flag, start at 0%
curl -s -X POST http://localhost:7777/api/flags \
  -H "Authorization: Bearer ffs_..." \
  -H "Content-Type: application/json" \
  -d '{ "key": "feature-x-rollout", "name": "Feature X Rollout", "flagType": "percentage" }'

# Enable at 10% in production
curl -s -X PATCH http://localhost:7777/api/flags/<id>/environments/production \
  -H "Authorization: Bearer ffs_..." \
  -H "Content-Type: application/json" \
  -d '{ "enabled": 1, "rolloutPct": 10 }'

# Ramp to 50%
curl -s -X PATCH http://localhost:7777/api/flags/<id>/environments/production \
  -H "Authorization: Bearer ffs_..." \
  -H "Content-Type: application/json" \
  -d '{ "rolloutPct": 50 }'
```

Evaluating a percentage flag returns `true` if the user is within the rollout, `false` otherwise. The assignment is deterministic and stable for each user.
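
One common way to get a deterministic, stable assignment is to hash the flag key and user ID into a bucket and compare it with the rollout percentage. The sketch below illustrates that idea only; the source does not say how feature-flag-server computes its buckets, so the FNV-1a hash and the names `fnv1a`, `bucketFor`, and `isInRollout` are all assumptions.

```typescript
// FNV-1a 32-bit hash over UTF-16 code units (an illustrative choice of hash).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // convert to an unsigned 32-bit integer
}

// Maps a user to a stable bucket in [0, 100) for a given flag.
function bucketFor(flagKey: string, userId: string): number {
  return fnv1a(`${flagKey}:${userId}`) % 100;
}

// A user is inside the rollout when their bucket is below the percentage.
function isInRollout(flagKey: string, userId: string, rolloutPct: number): boolean {
  return bucketFor(flagKey, userId) < rolloutPct;
}
```

Under this scheme, ramping from 10% to 50% only adds users: anyone whose bucket was below 10 is still below 50, so nobody loses the feature during a ramp-up.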

## Segment-targeted experiments

Show a variant only to specific users by combining a segment with a flag:

1. Create a segment: `POST /api/segments` with rules (e.g., `user.plan eq "enterprise"`)
2. Create a segment flag
3. Add a rule that assigns `variant-a` to users matching the segment
4. Set a default variant (returned for users who do not match any rule)

```json
{
  "rules": [
    {
      "segmentId": "segment-uuid-here",
      "variant": "variant-a"
    }
  ],
  "defaultVariant": "control"
}
```
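
The rule payload above can be read as "use the variant of a matching rule, otherwise the default". The sketch below models that reading; the types mirror the JSON, but `resolveVariant`, the first-match-wins ordering, and the idea of passing in the user's segment IDs are assumptions for illustration, not documented server behavior.

```typescript
// Types mirroring the rule payload above.
interface SegmentRule {
  segmentId: string;
  variant: string;
}

interface TargetingConfig {
  rules: SegmentRule[];
  defaultVariant: string;
}

// Hypothetical resolver: returns the variant of the first rule whose segment
// the user belongs to, or the default variant when no rule matches.
function resolveVariant(config: TargetingConfig, userSegmentIds: Set<string>): string {
  for (const rule of config.rules) {
    if (userSegmentIds.has(rule.segmentId)) {
      return rule.variant;
    }
  }
  return config.defaultVariant;
}
```

With the payload shown, a user in `segment-uuid-here` resolves to `variant-a`, and every other user resolves to `control`.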

## Behavior

- **Determinism:** The same `userId` + `flagKey` combination always produces the same variant. Changing the flag weights may re-assign some users.
- **Context isolation:** Context values are only used for segment rule matching. They are not stored or logged in full.
- **Audit trail:** Every variant weight or rule change is recorded in the audit log with a diff.
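
The determinism note can be illustrated with cumulative weight ranges. This is a sketch of one plausible scheme, not the server's confirmed algorithm: it assumes each user already has a stable bucket in [0, 100) derived from `userId` + `flagKey`, and `pickVariant` is a hypothetical helper.

```typescript
interface WeightedVariant {
  key: string;
  weight: number;
}

// Walks cumulative weight ranges. With weights 50/30/20, buckets 0-49 map to
// the first variant, 50-79 to the second, and 80-99 to the third.
function pickVariant(variants: WeightedVariant[], bucket: number): string {
  let upper = 0;
  for (const v of variants) {
    upper += v.weight;
    if (bucket < upper) {
      return v.key;
    }
  }
  throw new Error("bucket is outside the total weight; weights should sum to 100");
}

const weightsBefore: WeightedVariant[] = [
  { key: "control", weight: 50 },
  { key: "variant-a", weight: 30 },
  { key: "variant-b", weight: 20 },
];

const weightsAfter: WeightedVariant[] = [
  { key: "control", weight: 40 },
  { key: "variant-a", weight: 40 },
  { key: "variant-b", weight: 20 },
];

// A user in bucket 45 moves from "control" to "variant-a" after the change,
// while users in buckets 10 and 90 keep their original variant.
console.log(pickVariant(weightsBefore, 45), pickVariant(weightsAfter, 45));
```

Under this scheme only users whose bucket falls between the old and new range boundaries are re-assigned, which is why changing weights mid-experiment can contaminate results for part of the audience.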
