review-quality

Review quality framework for the work-to-review transition gate. Guides verification of plan alignment, test quality, and code simplification before marking implementation complete. Referenced by schema guidance fields during review-phase note filling. Read this skill when filling review-checklist notes or when asked to review completed implementation work.

177 stars

byjpicklyk

View on GitHub Installation ↓

Best use case

review-quality is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using review-quality should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/review-quality/SKILL.md --create-dirs "https://raw.githubusercontent.com/jpicklyk/task-orchestrator/main/.claude/skills/review-quality/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/review-quality/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How review-quality Compares

Feature / Agent	review-quality	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

Cursor vs Codex for AI Workflows

Compare Cursor and Codex for AI coding workflows, repository assistance, debugging, refactoring, and reusable developer skills.

SKILL.md Source

# Review Quality Framework

This skill defines what a reviewer must verify before implementation work advances to
completion. It applies whether the reviewer is the orchestrator directly or a delegated
subagent.

The review gate exists because implementation agents optimize for getting things working,
not for verifying they built the right thing. Without a structured review checkpoint,
planned work gets silently dropped, tests get written to pass rather than to verify, and
unnecessary complexity accumulates. The review is where these failure modes get caught.

**Critical separation of concerns:** The reviewer must not be the same agent that wrote
the code or the tests. An agent reviewing its own work will rationalize rather than
evaluate. The reviewer reads, runs, and reports — it never fixes. If issues are found,
they go back to the implementation agent for resolution.

---

## Getting Started

The reviewer is given an MCP item ID. Use MCP tools and codebase access to gather what
you need — do not expect context to be pre-loaded for you.

1. **Load the item's notes** — `query_notes(itemId=..., includeBody=true)` to retrieve
the `specification` and `implementation-notes`.
2. **Read the changed files** — use the implementation notes to identify which files were
modified, then read them directly. Review the actual code, not just summaries.
3. **Run the test suite** — execute the project's test command and capture the results.
Do not assume tests pass because the implementation agent said they did.

If the specification or implementation notes are missing, the review cannot proceed.
Report this as a blocking issue.

---

## Review Areas

These four areas form the minimum review. Each one catches a different class of failure.
If the review surfaces additional concerns, include them — this is a floor, not a ceiling.

### 1. Test Suite Verification

Run the test suite before anything else. Everything downstream depends on knowing the
actual state of the tests.

Run `./gradlew :current:test` and capture the output. Record the total test count and
the pass/fail breakdown.

**If tests fail:** Document every failure — test name, assertion message, and the file
where the test lives. Do not attempt to fix failures. Do not speculate about whether
failures are pre-existing or new. Report what you observe. Test failures are a blocking
issue — the item cannot advance with a failing test suite.

**If tests pass:** Record the count and move on.

### 2. Plan Alignment

Compare what was built against the specification. The goal is to catch drift in both
directions — work that was planned but not done, and work that was done but not planned.

**Check each acceptance criterion.** Walk through the acceptance criteria from the
specification one by one. For each criterion, identify the specific code change that
satisfies it. If a criterion has no corresponding implementation, flag it — either the
work is incomplete or the criterion was intentionally descoped (which should appear in
the implementation notes).

**Check for unplanned changes.** Review the changed files for modifications that don't
trace back to any acceptance criterion. Unplanned changes aren't automatically wrong —
sometimes implementation reveals necessary adjacent work. But they should be
acknowledged and justified in the implementation notes, not silent.

**Check non-goals weren't violated.** Review the specification's non-goals list. If the
implementation touched areas that were explicitly scoped out, flag it.

### 3. Test Quality

The specification's test strategy defined what should be tested — happy paths, failure
paths, and edge cases. The reviewer verifies that the tests actually deliver on that
strategy, not just that they exist and pass.

This is where the separation of concerns matters most. The agent that wrote the tests
has an inherent bias toward believing they're correct. An independent reviewer can
evaluate whether the tests verify real behavior or just confirm that code runs.

**Map tests to the test strategy.** For each scenario in the specification's test
strategy, identify the corresponding test. Missing coverage is a gap to report.

**Evaluate test substance.** Watch for these patterns that produce green results
without catching real bugs:

- **Tautological assertions** — asserting something equals itself, or that a non-null
value is not null, without verifying the actual value is correct.
- **Mock-heavy tests that verify nothing real** — every dependency mocked, test only
confirms mocks were called in order. Mocks are fine for isolation, but the test must
still assert something meaningful about the unit's output or state change.
- **Happy-path-only coverage** — if the test strategy called for failure paths, those
tests need to exist and need to verify the failure behavior is correct (right
exception type, right error message, right fallback behavior).
- **Overly broad assertions** — `result != null` or `list.isNotEmpty()` when specific
values, sizes, or contents should be checked. These pass even when the implementation
is wrong.

**Check edge cases.** Verify each boundary condition from the test strategy has a
corresponding test. If implementation notes documented new edge cases discovered during
development, check whether tests were added for those too.

### 4. Simplification

Run the `/simplify` skill on the changed files and document its findings. The reviewer
records what simplify identifies — it does not apply fixes.

Document each finding from simplify with:
- The file and area affected
- What simplify identified (duplication, unnecessary complexity, over-engineering)
- Whether it's minor (style/preference) or substantive (affects maintainability)

Simplification findings are not blocking unless they indicate a structural problem
that would make the code difficult to maintain or extend.

---

## Review Output

The review produces a `review-checklist` note on the MCP item. Structure the note
around findings, not process.

### Verdict

Every review must end with a clear verdict:

- **Pass** — all acceptance criteria met, tests pass and have substance, no blocking
issues. The item can advance.
- **Fail — blocking issues** — test failures, missing acceptance criteria, or critical
gaps in test coverage. The item must go back for fixes before it can advance. List
every blocking issue.
- **Pass with observations** — no blocking issues, but simplification findings or
minor test quality concerns worth addressing. The item can advance, but the
observations should be tracked for follow-up.

### Findings Format

For each finding, state:
- **What was expected** (from the specification or test strategy)
- **What was found** (in the code or test output)
- **Severity** (blocking or observation)

Be specific. "Tests could be better" is not actionable. "Test `testCreateItem` asserts
only that the result is not null — it should verify the item's title and status match
the input parameters" is actionable.

### Gate Enforcement

The reviewer does not advance the item. It fills the `review-checklist` note and
reports the verdict. The orchestrator reads the verdict and decides whether to:
- Advance the item (pass or pass-with-observations)
- Send the item back to the implementation agent with the blocking issues list (fail)

A failing verdict with clear findings gives the implementation agent exactly what to
fix without ambiguity.

Related Skills

spec-quality

177

from jpicklyk/task-orchestrator

Specification quality framework for planning. Defines the minimum bar for what a plan must address — alternatives, non-goals, blast radius, risk flags, and test strategy. Referenced by schema guidance fields during queue-phase note filling. Read this skill whenever filling requirements or design notes for any MCP work item.

work-summary

177

from jpicklyk/task-orchestrator

Generate a hierarchical project dashboard showing all work items organized by container, with IDs, tags, status, and priority visible. Always use this skill for any request about project status, work summaries, or item overviews — never construct dashboards manually with raw MCP calls. Trigger on any of these phrases or intent: "project status", "what's active", "show me the dashboard", "work summary", "summary", "what should I work on", "project health", "what's blocked", "where did I leave off", "show items", "what's in the backlog", "overview", or any request to see or review the current state of work items. This includes session-start context gathering — if you need to understand current project state, use this skill.

status-progression

177

from jpicklyk/task-orchestrator

Navigate role transitions for MCP work items using advance_item. Shows current role, gate status, required notes, and the correct trigger to use. Use when a user says "advance this item", "move to work", "start this task", "complete this item", "what's the next status", "why can't I advance", "unblock this", "cancel this item", or "check gate status".

schema-workflow

177

from jpicklyk/task-orchestrator

Guide an MCP work item through its schema-defined lifecycle — filling required notes using guidancePointer and advancing through gate-enforced phases. Internal skill triggered by hooks and output styles during orchestration workflows. Use when an item has schema tags and needs to progress through queue, work, review, or terminal phases with note gates.

quick-start

177

from jpicklyk/task-orchestrator

Interactive onboarding for the MCP Task Orchestrator. Detects empty or populated workspaces and walks through how plan mode, persistent tracking, and the MCP work together. Use when a user says "get started", "how do I use this", "quick start", "first time setup", "onboard me", "what can this MCP do", or "help me learn task orchestrator".

pre-plan-workflow

177

from jpicklyk/task-orchestrator

Internal workflow for plan mode — checks MCP for existing work, note schemas, and gate requirements to set the definition floor before planning begins. Triggered automatically when entering plan mode for any non-trivial implementation task.

post-plan-workflow

177

from jpicklyk/task-orchestrator

Internal workflow for post-plan materialization — creates MCP items from the approved plan and dispatches implementation. Triggered automatically after plan approval when MCP tracking is active.

manage-schemas

177

from jpicklyk/task-orchestrator

Create, view, edit, delete, and validate note schemas for the MCP Task Orchestrator in .taskorchestrator/config.yaml — the templates that define which notes agents must fill at each workflow phase. Use when user says "create schema", "show schemas", "edit schema", "delete schema", "validate config", "what schemas exist", "add a note to schema", "remove note from schema", or "configure gates".

dependency-manager

177

from jpicklyk/task-orchestrator

Visualize, create, and diagnose dependencies between MCP work items. Use when a user says "what blocks this", "add a dependency", "show dependency graph", "why can't this start", "link these items", "unblock this", "remove dependency", or "show blockers".

create-item

177

from jpicklyk/task-orchestrator

Create an MCP work item from conversation context. Scans existing containers to anchor the item in the right place (Bugs, Features, Tech Debt, Observations, etc.), infers type and priority, creates single items or work trees, and pre-fills required notes. Use this whenever the conversation surfaces a bug, feature idea, tech debt item, or observation worth tracking persistently. Also use when user says "track this", "log this bug", "create a task for", or "add this to the backlog".

batch-complete

177

from jpicklyk/task-orchestrator

Complete or cancel multiple items at once — close out features, clean up old work, archive completed workstreams. Use when a user says "close out this feature", "complete everything under X", "cancel this workstream", "clean up old items", "bulk complete", "finish this feature", or "archive completed work".

session-retrospective

177

from jpicklyk/task-orchestrator

Analyze the current implementation run — evaluate schema effectiveness, delegation alignment, note quality, plan-to-execution fit. Captures cross-session trends and proposes improvements when patterns repeat. Use after implementation runs, or when user says 'retrospective', 'session review', 'what did we learn', 'analyze this run', 'how did that go', 'evaluate our process', 'wrap up', 'end of session review'. Also use when the output style's retrospective nudge fires after complete_tree.