visual-verdict
Structured visual QA verdict for screenshot-to-reference comparisons
Best use case
visual-verdict is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using visual-verdict should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/visual-verdict/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How visual-verdict Compares
| Feature | visual-verdict | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Structured visual QA verdict for screenshot-to-reference comparisons
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
Cursor vs Codex for AI Workflows
Compare Cursor and Codex for AI coding workflows, repository assistance, debugging, refactoring, and reusable developer skills.
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
Best AI Skills for Claude
Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.
SKILL.md Source
<Purpose>
Use this skill to compare generated UI screenshots against one or more reference images and return a strict JSON verdict that can drive the next edit iteration.
</Purpose>
<Use_When>
- The task includes visual fidelity requirements (layout, spacing, typography, component styling)
- You have a generated screenshot and at least one reference image
- You need deterministic pass/fail guidance before continuing edits
</Use_When>
<Inputs>
- `reference_images[]` (one or more image paths)
- `generated_screenshot` (current output image)
- Optional: `category_hint` (e.g., `hackernews`, `sns-feed`, `dashboard`)
</Inputs>
<Output_Contract>
Return **JSON only** with this exact shape:
```json
{
  "score": 0,
  "verdict": "revise",
  "category_match": false,
  "differences": ["..."],
  "suggestions": ["..."],
  "reasoning": "short explanation"
}
```
Rules:
- `score`: integer 0-100
- `verdict`: short status (`pass`, `revise`, or `fail`)
- `category_match`: `true` when the generated screenshot matches the intended UI category/style
- `differences[]`: concrete visual mismatches (layout, spacing, typography, colors, hierarchy)
- `suggestions[]`: actionable next edits tied to the differences
- `reasoning`: 1-2 sentence summary
</Output_Contract>
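As a hedged illustration of this contract (not part of the skill itself), a consuming script might validate the returned JSON before acting on it. The helper below is a minimal TypeScript sketch; its name and error messages are assumptions, and it checks only the fields listed above.

```ts
// Hypothetical validator for the verdict contract; illustrative only.
interface Verdict {
  score: number;
  verdict: "pass" | "revise" | "fail";
  category_match: boolean;
  differences: string[];
  suggestions: string[];
  reasoning: string;
}

function parseVerdict(raw: string): Verdict {
  const v = JSON.parse(raw);
  if (!Number.isInteger(v.score) || v.score < 0 || v.score > 100) {
    throw new Error("score must be an integer between 0 and 100");
  }
  if (!["pass", "revise", "fail"].includes(v.verdict)) {
    throw new Error("verdict must be pass, revise, or fail");
  }
  if (typeof v.category_match !== "boolean") {
    throw new Error("category_match must be a boolean");
  }
  if (!Array.isArray(v.differences) || !Array.isArray(v.suggestions)) {
    throw new Error("differences and suggestions must be arrays");
  }
  if (typeof v.reasoning !== "string") {
    throw new Error("reasoning must be a short string");
  }
  return v as Verdict;
}
```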
<Threshold_And_Loop>
- Target pass threshold is **90+**.
- If `score < 90`, continue editing and rerun `$visual-verdict` before any further code edits in the next iteration.
- Persist the verdict in `.omx/state/{scope}/ralph-progress.json` with both:
- numeric signal (`score`, threshold pass/fail)
- qualitative signal (`reasoning`, `suggestions`, `next_actions`)
</Threshold_And_Loop>
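A minimal sketch of the loop gate follows, assuming a `{scope}` of `default` and illustrative field names for the progress file; the skill only requires that both numeric and qualitative signals are persisted under `.omx/state/{scope}/ralph-progress.json`.

```ts
import fs from "node:fs";
import path from "node:path";

const PASS_THRESHOLD = 90;

// Illustrative scope and field names; adjust to your actual {scope}.
const progressPath = path.join(".omx", "state", "default", "ralph-progress.json");

function recordVerdict(verdict: {
  score: number;
  reasoning: string;
  suggestions: string[];
}): boolean {
  const passed = verdict.score >= PASS_THRESHOLD;
  const entry = {
    score: verdict.score,         // numeric signal
    threshold_passed: passed,     // numeric signal
    reasoning: verdict.reasoning, // qualitative signal
    suggestions: verdict.suggestions,
    next_actions: passed ? [] : verdict.suggestions,
  };
  fs.mkdirSync(path.dirname(progressPath), { recursive: true });
  fs.writeFileSync(progressPath, JSON.stringify(entry, null, 2));
  return passed; // false => keep editing and rerun $visual-verdict first
}
```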
<Debug_Visualization>
When mismatch diagnosis is hard:
1. Keep `$visual-verdict` as the authoritative decision.
2. Use pixel-level diff tooling (pixel diff / pixelmatch overlay) as a **secondary debug aid** to localize hotspots.
3. Convert pixel diff hotspots into concrete `differences[]` and `suggestions[]` updates.
</Debug_Visualization>
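If a pixel-level overlay is needed, a minimal sketch using the pixelmatch and pngjs packages could look like the following; the file names and the 0.1 threshold are assumptions for illustration.

```ts
// Secondary debug aid only: localize hotspots, then fold them back into
// differences[] / suggestions[]. File names and threshold are illustrative.
import fs from "node:fs";
import { PNG } from "pngjs";
import pixelmatch from "pixelmatch";

const reference = PNG.sync.read(fs.readFileSync("reference.png"));
const generated = PNG.sync.read(fs.readFileSync("generated.png"));

// pixelmatch requires both images to share the same dimensions.
const { width, height } = reference;
const diff = new PNG({ width, height });

const mismatched = pixelmatch(
  reference.data,
  generated.data,
  diff.data,
  width,
  height,
  { threshold: 0.1 } // per-pixel color-distance sensitivity
);

fs.writeFileSync("diff-overlay.png", PNG.sync.write(diff));
console.log(
  `Mismatched pixels: ${mismatched} (${((mismatched / (width * height)) * 100).toFixed(2)}%)`
);
```

The overlay highlights mismatched regions so they can be reported as concrete `differences[]` entries rather than raw pixel counts, while `$visual-verdict` remains the authoritative decision.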
<Example>
```json
{
  "score": 87,
  "verdict": "revise",
  "category_match": true,
  "differences": [
    "Top nav spacing is tighter than reference",
    "Primary button uses smaller font weight"
  ],
  "suggestions": [
    "Increase nav item horizontal padding by 4px",
    "Set primary button font-weight to 600"
  ],
  "reasoning": "Core layout matches, but style details still diverge."
}
```
</Example>
Related Skills
worker
Team worker protocol (ACK, mailbox, task lifecycle) for tmux-based OMX teams
web-clone
URL-driven website cloning with visual + functional verification
ultrawork
Parallel execution engine for high-throughput task completion
ultraqa
QA cycling workflow - test, verify, fix, repeat until goal met
trace
Show agent flow trace timeline and summary
team
N coordinated agents on shared task list using tmux-based orchestration
tdd
Test-Driven Development enforcement skill - write tests first, always
swarm
N coordinated agents on shared task list (compatibility facade over team)
skill
Manage local skills - list, add, remove, search, edit, setup wizard
ralplan
Alias for $plan --consensus
ralph
Self-referential loop until task completion with architect verification
ralph-init
Initialize a PRD (Product Requirements Document) for structured ralph-loop execution