visual-verdict

Structured visual QA verdict for screenshot-to-reference comparisons

18,419 stars

Best use case

visual-verdict is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Structured visual QA verdict for screenshot-to-reference comparisons

Teams using visual-verdict should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/visual-verdict/SKILL.md --create-dirs "https://raw.githubusercontent.com/Yeachan-Heo/oh-my-codex/main/skills/visual-verdict/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/visual-verdict/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How visual-verdict Compares

Feature / Agentvisual-verdictStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Structured visual QA verdict for screenshot-to-reference comparisons

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

<Purpose>
Use this skill to compare generated UI screenshots against one or more reference images and return a strict JSON verdict that can drive the next edit iteration.
</Purpose>

<Use_When>
- The task includes visual fidelity requirements (layout, spacing, typography, component styling)
- You have a generated screenshot and at least one reference image
- You need deterministic pass/fail guidance before continuing edits
</Use_When>

<Inputs>
- `reference_images[]` (one or more image paths)
- `generated_screenshot` (current output image)
- Optional: `category_hint` (e.g., `hackernews`, `sns-feed`, `dashboard`)
</Inputs>

<Output_Contract>
Return **JSON only** with this exact shape:

```json
{
  "score": 0,
  "verdict": "revise",
  "category_match": false,
  "differences": ["..."],
  "suggestions": ["..."],
  "reasoning": "short explanation"
}
```

Rules:
- `score`: integer 0-100
- `verdict`: short status (`pass`, `revise`, or `fail`)
- `category_match`: `true` when the generated screenshot matches the intended UI category/style
- `differences[]`: concrete visual mismatches (layout, spacing, typography, colors, hierarchy)
- `suggestions[]`: actionable next edits tied to the differences
- `reasoning`: 1-2 sentence summary

<Threshold_And_Loop>
- Target pass threshold is **90+**.
- If `score < 90`, continue editing and rerun `$visual-verdict` before any further code edits in the next iteration.
- Persist the verdict in `.omx/state/{scope}/ralph-progress.json` with both:
  - numeric signal (`score`, threshold pass/fail)
  - qualitative signal (`reasoning`, `suggestions`, `next_actions`)
</Threshold_And_Loop>

<Debug_Visualization>
When mismatch diagnosis is hard:
1. Keep `$visual-verdict` as the authoritative decision.
2. Use pixel-level diff tooling (pixel diff / pixelmatch overlay) as a **secondary debug aid** to localize hotspots.
3. Convert pixel diff hotspots into concrete `differences[]` and `suggestions[]` updates.
</Debug_Visualization>

<Example>
```json
{
  "score": 87,
  "verdict": "revise",
  "category_match": true,
  "differences": [
    "Top nav spacing is tighter than reference",
    "Primary button uses smaller font weight"
  ],
  "suggestions": [
    "Increase nav item horizontal padding by 4px",
    "Set primary button font-weight to 600"
  ],
  "reasoning": "Core layout matches, but style details still diverge."
}
```
</Example>