dev-ai-based-test

AI-based testing via subagent + a per-task test-flow skill. Use when the user wants to verify something that mechanical assertions can't fully capture — image recognition, visual size/position comparison, animation smoothness, multi-step manual flows that need AI judgment. Triggers: 'AI-based test', 'AI test', 'visual verify', 'image recognition test', 'manual operation test', 'human-eye check', 'verify visually', 'compare screenshots', 'looks the same', 'looks correct'. The skill's job is to (1) author a focused test-flow skill that captures the exact procedure + verdict criteria, then (2) dispatch a verification subagent via the Agent tool that loads BOTH the test-flow skill AND a browser-driving skill (/verify-ui primary, /headless-browser fallback) so the subagent has clear context and consistent verdicts. NEVER uses `claude -p` — subagent dispatch goes through the Agent tool exclusively.


by Takazudo


Best use case

dev-ai-based-test is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using dev-ai-based-test should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/dev-ai-based-test/SKILL.md --create-dirs "https://raw.githubusercontent.com/Takazudo/claude-resources/main/skills/dev-ai-based-test/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/dev-ai-based-test/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How dev-ai-based-test Compares

| Feature / Agent | dev-ai-based-test | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

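What does this skill do?

It authors a focused, per-task test-flow skill that captures the test procedure and verdict criteria, then dispatches a verification subagent through the Agent tool to run that procedure and return a structured verdict.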

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.


SKILL.md Source

# Dev: AI-based test

AI-based testing for things that can't be cleanly mechanically asserted: image recognition, visual size / position parity, animation correctness, multi-step manual flows where a human eye would catch the bug but `assertEqual` won't.

The deliverable is **not just "run a test"** — it's a reusable, focused **test-flow skill** that captures the test procedure with clear context, plus a dispatched verification subagent that loads that skill alongside a browser-driving skill.

## When to use

- User explicitly asks for AI-based / visual / human-eye verification — "test this with AI", "verify visually", "make sure it looks the same", "image recognition test".
- A symptom is reported that mechanical assertions can't see: an image renders visibly smaller, an animation stutters in a way the frame rate doesn't capture, or a screenshot looks "off" in a way the user can describe even though the DOM measurements agree.
- A multi-step manual flow needs verification (drag-and-drop a real file, then compare two phases visually) and the existing fixed-suite specs don't cover it.
- The user wants the verdict produced **by an AI subagent** so it has fresh, isolated context — instead of inlining the test into the main agent's prompt where context drift erodes consistency.

## When NOT to use

- The test is a clean computed-style or DOM-rect assertion → use `/verify-ui` directly, no subagent needed.
- The test is a fixed Playwright spec that always runs the same way → write it as `.spec.ts` in `e2e/`, run via `pnpm exec playwright test`. No AI judgment required.
- The user wants a regression gate for CI — AI verdicts are non-deterministic. Reserve this for local b4push / one-shot evidence; pair with a deterministic spec for CI.

## Hard rule: NEVER use `claude -p`

The subagent dispatch in this skill uses the **Agent tool** (the same tool the main agent uses to spawn `subagent_type: general-purpose`, `Plan`, `Explore`, etc.). Never `claude -p`, never a subprocess shell invocation. The reasons matter:

- The Agent tool returns a structured result message into the parent's context. `claude -p` produces stdout text the parent has to re-parse and interpret.
- The Agent tool respects the parent's session permissions, memory, and skill availability. `claude -p` starts a fresh process that may not see project skills the parent does.
- The Agent tool's lifecycle is observable in the conversation. `claude -p` is opaque — if it stalls or fails, the parent doesn't get clean error signaling.

If you find yourself reaching for `claude -p` for a subagent dispatch, stop and use the Agent tool instead.

## Workflow

The skill has two halves: **author the test-flow skill**, then **dispatch the verification subagent**.

### Half 1 — author the test-flow skill

A test-flow skill is a small, focused skill at `$HOME/.claude/skills/test-flow-<topic>/SKILL.md` (or project-local `.claude/skills/test-flow-<topic>/SKILL.md`) that captures:

- **What scenario** to drive (the exact user-reproduce flow — open template, drop fixture, click button, etc.)
- **What to capture** (which screenshots, which DOM measurements, which evidence)
- **The verdict criteria** (specifically: what counts as PASS vs FAIL, tolerance numbers, threshold ratios)
- **The output format** (what the subagent should return — a JSON-like structured result with named fields)

The skill is **per-task**, not per-app. A single project will accumulate multiple test-flow skills as different tests are needed.

#### Authoring checklist

- [ ] Name follows convention: `test-flow-<short-topic-slug>` (e.g. `test-flow-composer-image-same-size`, `test-flow-animation-frame-pacing`).
- [ ] Description includes BOTH the trigger keywords AND a one-line "use when" — the test-flow skill is triggered by the verification subagent's prompt, so it has to load when the subagent reads its instructions.
- [ ] Body is **self-contained**: the subagent starts fresh with NO conversation history, so the test-flow skill body must include everything needed to drive the test and reach a verdict.
- [ ] Procedure is numbered and concrete — exact selectors, exact URLs, exact viewport sizes, exact fixture paths.
- [ ] Verdict criteria are mechanical where possible (tolerance numbers, pixel deltas) and AI-judgment-only where necessary (visual sameness, image recognition).
- [ ] Output schema is explicit — what fields the subagent must return (e.g. `pgenImageWidth`, `composerImageWidth`, `ratio`, `verdict`, `summary`).

Use the `skill-creator` skill's `init_skill.py` to scaffold the new test-flow skill, then write its body. Format with `pnpm dlx @takazudo/mdx-formatter --write <path-to-SKILL.md>`.
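
Parts of the checklist above can be checked mechanically before dispatch. The following TypeScript sketch is one possible way to do that; the function name `lintTestFlowSkill`, the slug pattern, and the required heading names (borrowed from the example skeleton later in this document) are assumptions for illustration, not part of this skill or of `skill-creator`.

```typescript
// Hypothetical lint helper for a test-flow skill; not part of any real tool.
interface LintResult {
  ok: boolean;
  problems: string[];
}

// Assumed slug shape: "test-flow-" followed by lowercase words joined by hyphens.
const NAME_PATTERN = /^test-flow-[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Assumed section headings, taken from the example skeleton in this document.
const REQUIRED_HEADINGS = ["## Scenario", "## Verdict", "## Output schema"];

function lintTestFlowSkill(skillName: string, body: string): LintResult {
  const problems: string[] = [];
  if (!NAME_PATTERN.test(skillName)) {
    problems.push('name "' + skillName + '" does not match test-flow-<short-topic-slug>');
  }
  const lines = body.split("\n").map((line) => line.trim());
  for (const heading of REQUIRED_HEADINGS) {
    if (lines.indexOf(heading) === -1) {
      problems.push("missing section: " + heading);
    }
  }
  // A numbered procedure needs at least one line that starts with "1.".
  if (!lines.some((line) => /^1\.\s/.test(line))) {
    problems.push("procedure is not numbered");
  }
  return { ok: problems.length === 0, problems };
}
```

A check like this only covers the structural items; whether the procedure is concrete enough and the verdict criteria are sound still needs a human or AI read.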

### Half 2 — dispatch the verification subagent

After the test-flow skill is written, dispatch a subagent via the Agent tool:

```
Agent({
  subagent_type: "general-purpose", // browser-driving + structured output, no specialty needed
  description: "<short description>",
  prompt: `<self-contained brief (see template below)>`,
})
```

The subagent's prompt must include:

- **Goal:** one sentence describing what verdict to produce.
- **Skills to load:** invoke `/test-flow-<topic>` (the just-authored skill) AND a browser-driving skill — `/verify-ui` for computed-styles / screenshot comparison, OR `/headless-browser` for multi-step interactive flows.
- **Inputs:** any per-run inputs the test-flow skill needs (e.g. the W2 PR preview URL, the fixture image path, the viewport size).
- **Output contract:** match the output schema declared in the test-flow skill.

#### Subagent prompt template

```
You are a verification subagent. Produce a structured verdict using the test-flow skill below.

## Goal
{one-sentence verdict goal, e.g. "Determine whether the composer-side image visually matches the pgen-side image at default landing viewport."}

## Skills to load
- /test-flow-<topic> — the test procedure and verdict criteria. Read this first.
- /verify-ui — primary browser-driving skill (computed-styles + screenshots).
- /headless-browser — fallback if /verify-ui doesn't fit the task shape.

## Inputs
- Preview URL: <resolved URL — pass from the parent>
- Fixture: <path or asset reference>
- Viewport: <e.g. 1440x900>
- Any other per-run knobs the test-flow skill expects

## Output contract
Return a structured result message containing exactly these fields:
{ <list each field from the test-flow skill's output schema> }

Plus a `summary` field with a one-line human-readable verdict.

## Don'ts
- Don't improvise the test procedure — follow /test-flow-<topic> exactly.
- Don't change the verdict tolerance — it's locked in /test-flow-<topic>.
- Don't post anywhere — return the result to me; I (the parent agent) handle posting.
```
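
If the parent agent fills this template programmatically, the rendering step might look like the sketch below. The `VerificationRun` interface and `buildSubagentPrompt` function are hypothetical names introduced for this example; only the template wording comes from the skill itself.

```typescript
// Hypothetical description of one verification run; the field names are illustrative.
interface VerificationRun {
  topic: string; // slug that follows "test-flow-"
  goal: string; // one-sentence verdict goal
  previewUrl: string;
  fixture: string;
  viewport: string; // e.g. "1440x900"
  outputFields: string[]; // field names from the test-flow skill's output schema
}

// Render the subagent prompt template as a single string.
function buildSubagentPrompt(run: VerificationRun): string {
  const skill = "/test-flow-" + run.topic;
  const lines: string[] = [
    "You are a verification subagent. Produce a structured verdict using the test-flow skill below.",
    "",
    "## Goal",
    run.goal,
    "",
    "## Skills to load",
    "- " + skill + ": the test procedure and verdict criteria. Read this first.",
    "- /verify-ui: primary browser-driving skill (computed-styles + screenshots).",
    "- /headless-browser: fallback if /verify-ui doesn't fit the task shape.",
    "",
    "## Inputs",
    "- Preview URL: " + run.previewUrl,
    "- Fixture: " + run.fixture,
    "- Viewport: " + run.viewport,
    "",
    "## Output contract",
    "Return a structured result message containing exactly these fields:",
    "{ " + run.outputFields.join(", ") + " }",
    "",
    "## Don'ts",
    "- Don't improvise the test procedure; follow " + skill + " exactly.",
    "- Don't change the verdict tolerance; it's locked in " + skill + ".",
    "- Don't post anywhere; return the result to the parent agent.",
  ];
  return lines.join("\n");
}
```

The returned string would be passed as the `prompt` argument of the Agent tool call. Keeping the per-run inputs in one typed object makes it harder to dispatch a subagent with a missing URL or viewport.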

### After the subagent returns

The parent agent receives the structured result and decides what to do with it: post to a PR comment, write to an evidence file, gate a workflow step, etc. The test-flow skill stays on disk for reuse — next time the same test class is needed, the existing skill is invoked without re-authoring.
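
Before acting on the result, the parent may want to confirm that it actually honors the output contract. The sketch below assumes the subagent's result arrives as a strict JSON string, which is stricter than the "JSON-like" result the skill describes; `checkSubagentResult` is a hypothetical helper name.

```typescript
// Hypothetical contract check; assumes the subagent returned strict JSON text.
interface ContractCheck {
  ok: boolean;
  error?: string;
  result?: Record<string, unknown>;
}

function checkSubagentResult(raw: string, requiredFields: string[]): ContractCheck {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: "result is not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: "result is not a JSON object" };
  }
  const result = parsed as Record<string, unknown>;
  // Every field named in the test-flow skill's output schema must be present.
  const missing = requiredFields.filter(
    (field) => !Object.prototype.hasOwnProperty.call(result, field)
  );
  if (missing.length > 0) {
    return { ok: false, error: "missing fields: " + missing.join(", ") };
  }
  if (result["verdict"] !== "PASS" && result["verdict"] !== "FAIL") {
    return { ok: false, error: 'verdict must be "PASS" or "FAIL"' };
  }
  return { ok: true, result };
}
```

A failed check is a signal to re-dispatch or to tighten the test-flow skill's output schema, rather than to guess at what the subagent meant.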

## Choosing the browser-driving skill — primary vs fallback

| Skill | Best for | Limitations |
|---|---|---|
| `/verify-ui` | Deterministic computed-style checks; pure pgen-vs-composer parity; CSS / layout assertions | Cannot drive multi-step UI flows beyond single-page reads |
| `/headless-browser` | Multi-step interactive flows (drag-drop a file, click → screenshot → click → screenshot); element bounding-rect reads via Playwright CLI | Slightly heavier; only use when /verify-ui can't reach the test surface |

The test-flow skill should name BOTH so the subagent picks based on the task shape. If `/verify-ui` returns "cannot perform this flow", the subagent transparently switches to `/headless-browser` without re-prompting the parent.
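
The primary-then-fallback choice can be pictured as a small piece of selection logic. This is only a sketch of the decision the subagent makes in prose: the `TaskShape` fields and both function names are hypothetical, and returning `null` stands in for a skill reporting that it cannot perform the flow.

```typescript
type BrowserSkill = "/verify-ui" | "/headless-browser";

// Hypothetical description of the task shape; not defined by either skill.
interface TaskShape {
  multiStep: boolean; // click -> screenshot -> click -> screenshot sequences
  needsFileDrop: boolean; // drag-and-drop of a real file
}

// Order in which the subagent would try the browser-driving skills.
function skillOrder(task: TaskShape): BrowserSkill[] {
  if (task.multiStep || task.needsFileDrop) {
    return ["/headless-browser"];
  }
  return ["/verify-ui", "/headless-browser"];
}

// Try each skill in order; `attempt` returns null when a skill cannot perform the flow.
function runWithFallback<T>(
  order: BrowserSkill[],
  attempt: (skill: BrowserSkill) => T | null
): { toolUsed: BrowserSkill; value: T } | null {
  for (const skill of order) {
    const value = attempt(skill);
    if (value !== null) {
      return { toolUsed: skill, value };
    }
  }
  return null;
}
```

Recording which skill succeeded lines up with the `toolUsed` field in the example output schema, so the parent can see when the fallback was taken.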

## Reusability — the test-flow skill outlives the test

A test-flow skill is **not** a one-shot scaffold for a single PR. It's a permanent artifact that captures "how to verify this class of behavior in this codebase." When a similar test is needed later (regression check, repeated verification across PRs), invoke the same test-flow skill — the AI subagent gets the same context, produces consistent verdicts.

Signs that you're using this pattern correctly:

- The test-flow skill is checked into the project's `.claude/skills/` (project-scope, shared with the team), not just `$HOME/.claude/skills/` (personal-only).
- Subsequent invocations DO NOT re-author the skill — they just dispatch a fresh subagent that loads it.
- Updates to the procedure happen by editing the test-flow skill, not by inlining new instructions in the subagent prompt.

## Example skeleton — what a real test-flow skill looks like

```markdown
---
name: test-flow-composer-image-same-size
description: Verify the composer-side image visually matches the pgen-side image at default landing viewport. Use when /dev-ai-based-test dispatches a subagent for issue #1678 / composer-image-same-size verification.
---

# Test flow: composer image same size as pgen

## Scenario
1. Open <preview URL from inputs> at viewport 1440x900.
2. Click the first template card.
3. Click "Start cropping the pattern".
4. Drop `packages/pattern-gen-viewer/e2e/fixtures/red-100-fits-composition.png` on the pgen canvas-layer.
5. Capture screenshot A (pgen with image visible).
6. Click "Commit selection and open Composer".
7. Wait for composer mount (composer-art-canvas visible).
8. Capture screenshot B (composer with image visible).

## Measurements
- pgen image width (CSS px): read via `__pgenLayerState.getSelectedLayerTransform()` + pgen canvas CSS scale.
- composer image width (CSS px): read via `__composerTest.getState()` + cameraZoom + composer canvas CSS rect.
- ratio = composer / pgen.

## Verdict
PASS if ratio ∈ [0.95, 1.05] (±5%). FAIL otherwise.

## Output schema
{
  pgenImageWidth: number,
  composerImageWidth: number,
  ratio: number,
  delta: number,
  verdict: "PASS" | "FAIL",
  summary: string,
  pgenScreenshot: string (path),
  composerScreenshot: string (path),
  toolUsed: "verify-ui" | "headless-browser"
}
```

The example shows the shape; the verification subagent reads this and follows the procedure verbatim.
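
The mechanical half of that verdict, ratio = composer / pgen with PASS inside [0.95, 1.05], is simple enough to write down as code. The sketch below is an illustration only: `judgeImageSize` is a hypothetical name, and the `delta` field from the schema is left out because the skeleton does not define how it is computed.

```typescript
// Subset of the example output schema that follows directly from the two widths.
interface SizeVerdict {
  pgenImageWidth: number;
  composerImageWidth: number;
  ratio: number;
  verdict: "PASS" | "FAIL";
  summary: string;
}

// Bounds copied from the skeleton's verdict rule: ratio in [0.95, 1.05].
const RATIO_MIN = 0.95;
const RATIO_MAX = 1.05;

function judgeImageSize(pgenImageWidth: number, composerImageWidth: number): SizeVerdict {
  // Reject zero, negative, and NaN widths so the ratio is always meaningful.
  if (!(pgenImageWidth > 0) || !(composerImageWidth > 0)) {
    throw new Error("widths must be positive numbers");
  }
  const ratio = composerImageWidth / pgenImageWidth;
  const verdict: "PASS" | "FAIL" =
    ratio >= RATIO_MIN && ratio <= RATIO_MAX ? "PASS" : "FAIL";
  const summary =
    verdict + ": composer/pgen width ratio is " + ratio.toFixed(3) +
    " (allowed " + RATIO_MIN + " to " + RATIO_MAX + ")";
  return { pgenImageWidth, composerImageWidth, ratio, verdict, summary };
}
```

Locking the bounds in one place mirrors the rule that the tolerance lives in the test-flow skill and is never adjusted in the subagent prompt.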
