experiment-designer
Use when planning product experiments, writing testable hypotheses, estimating sample size, prioritizing tests, or interpreting A/B outcomes with practical statistical rigor.
Best use case
experiment-designer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using experiment-designer should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/experiment-designer/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How experiment-designer Compares
| Feature / Agent | experiment-designer | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It helps you plan product experiments, write testable hypotheses, estimate sample size, prioritize tests, and interpret A/B outcomes with practical statistical rigor.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Experiment Designer

Design, prioritize, and evaluate product experiments with clear hypotheses and defensible decisions.

## When To Use

Use this skill for:

- A/B and multivariate experiment planning
- Hypothesis writing and success criteria definition
- Sample size and minimum detectable effect planning
- Experiment prioritization with ICE scoring
- Reading statistical output for product decisions

## Core Workflow

1. Write hypothesis in If/Then/Because format
   - If we change `[intervention]`
   - Then `[metric]` will change by `[expected direction/magnitude]`
   - Because `[behavioral mechanism]`

2. Define metrics before running test
   - Primary metric: single decision metric
   - Guardrail metrics: quality/risk protection
   - Secondary metrics: diagnostics only

3. Estimate sample size
   - Baseline conversion or baseline mean
   - Minimum detectable effect (MDE)
   - Significance level (alpha) and power

   Use:
   ```bash
   python3 scripts/sample_size_calculator.py --baseline-rate 0.12 --mde 0.02 --mde-type absolute
   ```

4. Prioritize experiments with ICE
   - Impact: potential upside
   - Confidence: evidence quality
   - Ease: cost/speed/complexity

   ICE Score = (Impact * Confidence * Ease) / 10

5. Launch with stopping rules
   - Decide fixed sample size or fixed duration in advance
   - Avoid repeated peeking without proper method
   - Monitor guardrails continuously

6. Interpret results
   - Statistical significance is not business significance
   - Compare point estimate + confidence interval to decision threshold
   - Investigate novelty effects and segment heterogeneity

## Hypothesis Quality Checklist

- [ ] Contains explicit intervention and audience
- [ ] Specifies measurable metric change
- [ ] States plausible causal reason
- [ ] Includes expected minimum effect
- [ ] Defines failure condition

## Common Experiment Pitfalls

- Underpowered tests leading to false negatives
- Running too many simultaneous changes without isolation
- Changing targeting or implementation mid-test
- Stopping early on random spikes
- Ignoring sample ratio mismatch and instrumentation drift
- Declaring success from p-value without effect-size context

## Statistical Interpretation Guardrails

- p-value < alpha indicates evidence against null, not guaranteed truth.
- Confidence interval crossing zero/no-effect means uncertain directional claim.
- Wide intervals imply low precision even when significant.
- Use practical significance thresholds tied to business impact.

See:

- `references/experiment-playbook.md`
- `references/statistics-reference.md`

## Tooling

### `scripts/sample_size_calculator.py`

Computes required sample size (per variant and total) from:

- baseline rate
- MDE (absolute or relative)
- significance level (alpha)
- statistical power

Example:

```bash
python3 scripts/sample_size_calculator.py \
  --baseline-rate 0.10 \
  --mde 0.015 \
  --mde-type absolute \
  --alpha 0.05 \
  --power 0.8
```
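
The source of `scripts/sample_size_calculator.py` is not shown on this page, so the following is only a sketch of how the sample size step could be computed for a conversion-rate metric. It assumes a two-sided two-proportion z-test with equal allocation, which may differ from what the bundled script does. The function name `sample_size_per_variant` and its parameters are hypothetical and simply mirror the CLI flags above.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, mde, mde_type="absolute",
                            alpha=0.05, power=0.8):
    """Approximate users needed per variant (hypothetical helper).

    Assumes a two-sided z-test on two proportions with equal group sizes.
    """
    if mde_type == "absolute":
        delta = mde
    elif mde_type == "relative":
        # A relative MDE is a fraction of the baseline rate.
        delta = baseline_rate * mde
    else:
        raise ValueError("mde_type must be 'absolute' or 'relative'")

    p1 = baseline_rate
    p2 = baseline_rate + delta
    if not (0 < p1 < 1 and 0 < p2 < 1) or delta == 0:
        raise ValueError("rates must lie in (0, 1) and the MDE must be non-zero")

    # Critical values for the significance level and the desired power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)

    # Sum of the Bernoulli variances of the two arms.
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)

    n = (z_alpha + z_power) ** 2 * variance_sum / delta ** 2
    return math.ceil(n)


if __name__ == "__main__":
    per_variant = sample_size_per_variant(0.10, 0.015, "absolute", 0.05, 0.8)
    print(f"per variant: {per_variant}, total (2 variants): {2 * per_variant}")
```

Under these assumptions, halving the MDE roughly quadruples the required sample, which is why the MDE should be tied to a practical significance threshold before the test starts.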
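
The ICE formula in step 4 can be illustrated with a short ranking sketch. The skill does not state a rating scale, so the 1 to 10 range enforced here is an assumption, and the `Idea` class, `ice_score`, `rank_by_ice`, and the example backlog are hypothetical names and data used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Idea:
    """A candidate experiment with ICE ratings (assumed 1-10 scale)."""
    name: str
    impact: int      # potential upside
    confidence: int  # evidence quality
    ease: int        # cost/speed/complexity (higher means easier)


def ice_score(idea):
    """ICE Score = (Impact * Confidence * Ease) / 10, as given in the skill."""
    for value in (idea.impact, idea.confidence, idea.ease):
        if not 1 <= value <= 10:
            raise ValueError("ratings are assumed to lie between 1 and 10")
    return idea.impact * idea.confidence * idea.ease / 10


def rank_by_ice(ideas):
    """Return ideas sorted from highest to lowest ICE score."""
    return sorted(ideas, key=ice_score, reverse=True)


if __name__ == "__main__":
    # Hypothetical backlog, not taken from the skill.
    backlog = [
        Idea("Shorter signup form", impact=6, confidence=7, ease=8),
        Idea("New pricing page", impact=9, confidence=4, ease=3),
        Idea("Onboarding checklist", impact=7, confidence=6, ease=5),
    ]
    for idea in rank_by_ice(backlog):
        print(f"{idea.name}: {ice_score(idea):.1f}")
```

Because the three ratings are multiplied, a single low rating pulls the whole score down, so a high-impact idea with weak evidence can rank below a modest but well-supported one.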
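
Step 6 and the interpretation guardrails recommend comparing the point estimate and confidence interval to a decision threshold. A minimal sketch of that comparison follows, assuming a conversion-rate metric and a normal-approximation (Wald) interval for the difference in proportions. The function names, the decision labels, and the rule itself are hypothetical illustrations, not part of the skill.

```python
import math
from statistics import NormalDist


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Wald interval for (rate_b - rate_a); returns (diff, lower, upper)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, diff - z * se, diff + z * se


def decide(lower, upper, threshold):
    """Compare the interval to a practical-significance threshold.

    Hypothetical rule: act only when the whole interval sits on one
    side of the threshold; otherwise treat the result as inconclusive.
    """
    if lower >= threshold:
        return "ship"
    if upper < threshold:
        return "stop"
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical counts: control 1000/10000, variant 1200/10000.
    diff, lower, upper = diff_confidence_interval(1000, 10000, 1200, 10000)
    print(f"lift: {diff:.4f}, 95% CI: [{lower:.4f}, {upper:.4f}]")
    print(decide(lower, upper, threshold=0.01))
```

With this rule, an interval that excludes zero but still straddles the threshold is reported as inconclusive, which reflects the point that statistical significance is not business significance.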
Related Skills
ui-designer
Design beautiful interfaces using 16+ design systems including Material You, Fluent Design, Apple HIG, Ant Design, Carbon Design, Shopify Polaris, Minimalism, Glassmorphism, Neo-Brutalism, Neumorphism, Skeuomorphism, Claymorphism, Swiss Design, and Atlassian Design. Expert in Tailwind CSS, color harmonics, component theming, and accessibility (WCAG).
designer-intelligence-station
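Intelligence-gathering tool for designers. Monitors 40 public information sources (AI, hardware, mobile phones, design), applies 6-dimension filtering criteria v2.0 (based on analysis of 120+ behaviors), and generates structured daily and weekly reports. Crawls public content only: it does not log in, submit forms, or bypass paywalls. Supports automatic dependency detection and installation.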
ml-experiment-tracker
Plan reproducible ML experiment runs with explicit parameters, metrics, and artifacts. Use before model training to standardize tracking-ready experiment definitions.
ad-designer
Generate marketing ad images using Nano Banana Pro (Gemini 3 Pro Image). Accepts campaign-planner creative briefs, reads brand bible for visual style, constructs marketing-optimized prompts, and produces platform-ready images at correct aspect ratios. Supports 1:1, 9:16, 16:9, 4:5 formats. Includes self-review loop to catch hallucinated logos, wrong text, and quality issues. Draft-first workflow (1K fast iteration, 4K final). Outputs to /tmp/marketing/assets/images/.
ux-researcher-designer
UX research and design toolkit for Senior UX Designer/Researcher including data-driven persona generation, journey mapping, usability testing frameworks, and research synthesis. Use for user research, persona creation, journey mapping, and design validation.
observability-designer
Observability Designer (POWERFUL)
interview-system-designer
This skill should be used when the user asks to "design interview processes", "create hiring pipelines", "calibrate interview loops", "generate interview questions", "design competency matrices", "analyze interviewer bias", "create scoring rubrics", "build question banks", or "optimize hiring systems". Use for designing role-specific interview loops, competency assessments, and hiring calibration systems.
database-schema-designer
Database Schema Designer
database-designer
Database Designer - POWERFUL Tier Skill
flow-panel-designer
Design multicolor flow cytometry panels minimizing spectral overlap
crispr-grna-designer
Design CRISPR gRNA sequences for specific gene exons with off-target prediction and efficiency scoring. Trigger when user needs gRNA design, CRISPR guide RNA selection, or genome editing target analysis.
stitch-ui-designer
Design, preview, and generate UI code using Google Stitch (via MCP). Helps developers choose the best UI by generating previews first, allowing iteration, and then exporting code.