review

Review incoming PRs, agent-generated changes, or diffs. Structured review with security, correctness, performance, and maintainability checks. Triggers: "review", "review PR", "review changes", "code review", "review this PR", "review agent output", "check this diff".

244 stars

byboshu2

View on GitHub Installation ↓

Best use case

review is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using review should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/review/SKILL.md --create-dirs "https://raw.githubusercontent.com/boshu2/agentops/main/skills-codex/review/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/review/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How review Compares

Feature / Agent	review	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

Cursor vs Codex for AI Workflows

Compare Cursor and Codex for AI coding workflows, repository assistance, debugging, refactoring, and reusable developer skills.

SKILL.md Source

# Review Skill

> **Quick Ref:** `$review <PR>` reviews a PR, `$review --diff` reviews local changes, `$review --agent <path>` reviews agent output with extra scrutiny.

**YOU MUST EXECUTE THIS WORKFLOW. Do not just describe it.**

This skill is for reviewing OTHER people's or agents' changes. For validating your own code quality, use `$vibe` instead.

---

## Modes

```bash
$review 42                          # PR mode — review PR #42
$review https://github.com/o/r/pull/42  # PR mode — review by URL
$review --diff                      # Diff mode — review unstaged/staged changes
$review --diff --staged             # Diff mode — staged only
$review --agent .agents/crank/      # Agent mode — review agent-generated output
$review --agent ./output.patch      # Agent mode — review a patch file
$review --deep 42                   # Deep mode — spawns council for second opinion
```

---

## Execution Steps

### Step 0: Detect Review Target and Load Standards

Determine the review mode from arguments:

1. **PR mode** (default): argument is a number or GitHub PR URL.
2. **Diff mode**: `--diff` flag present.
3. **Agent mode**: `--agent <path>` flag present.

Load language-specific conventions from `$standards` based on file extensions in the diff. If `ao` is available, pull prior review context:

```bash
ao lookup --query "code review patterns $(basename "$PWD")" --limit 3 2>/dev/null || true
```

**Apply retrieved knowledge (mandatory when results returned):**

If learnings are returned, do NOT just load them as passive context. For each returned item:
1. Check: does this learning apply to the code under review? (answer yes/no)
2. If yes: include it as a `known_risk` — state the pattern, what to look for, and whether the diff exhibits it
3. Cite the learning by filename in your review output when it influences a finding

After applying, record the citation:
```bash
ao metrics cite "<learning-path>" --type applied 2>/dev/null || true
```

Skip silently if ao is unavailable or returns no results.

### Step 0.5: Apply Behavioral Discipline

Load the behavioral discipline standard from `$standards` before reviewing the diff. Use it to answer four questions:

1. What assumptions does this change make, and were they surfaced or silently chosen?
2. Could the same outcome be achieved with a smaller or more local change?
3. Does every changed line trace back to the stated goal?
4. Does the verification prove the claimed behavior, or only that the code builds?

If any answer is weak, record the problem as a finding. Hidden assumptions, speculative abstractions, drive-by edits, and weak verification are review defects, not style preferences.

---

### Step 1: Fetch the Diff

#### PR Mode

```bash
gh pr view "$PR_REF" --json title,body,author,baseRefName,headRefName,labels,reviewDecision,commits
gh pr diff "$PR_REF"
gh pr diff "$PR_REF" --name-only
```

If the PR has more than 500 changed lines, prioritize: security-sensitive files, high-complexity changes, new files, then test files.

#### Diff Mode

```bash
git diff HEAD                  # unstaged + staged
git diff --cached              # staged only (with --staged flag)
git diff HEAD --name-only      # changed file list
```

#### Agent Mode

```bash
# Directory: find all generated files
find "$AGENT_PATH" -type f \( -name '*.go' -o -name '*.py' -o -name '*.ts' -o -name '*.sh' -o -name '*.md' \)
# Patch file: inspect stats
git apply --stat "$AGENT_PATH"
```

---

### Step 2: Context Gathering

Understand the intent behind the changes before reviewing the code:

- **PR Mode:** Read PR title/body, check linked issues (`fixes #`, `closes #`), read commit messages.
- **Diff Mode:** Check `git log --oneline -5`, branch name, open issues via `bd list --status open`.
- **Agent Mode:** Read execution logs in output directory, check `.agents/rpi/` artifacts.

**Output a one-line intent summary before proceeding:**

```
INTENT: <what the change is trying to accomplish>
```

If intent is unclear, flag it: "PR description does not explain the purpose of this change."

---

### Step 3: Systematic Review Pass (SCORED)

Review every changed file against the SCORED checklist. For each category, actively look for problems. Do not skim -- read each changed line.

#### S -- Security

- [ ] No hardcoded secrets, API keys, tokens, or passwords
- [ ] Input validation on all external data (user input, API responses, file reads)
- [ ] SQL/command injection: parameterized queries, no string interpolation in commands
- [ ] Auth/authz checks present where needed (not just authn)
- [ ] Sensitive data not logged or exposed in error messages
- [ ] Dependencies: no known-vulnerable versions added
- [ ] File operations: path traversal prevention, safe temp file handling

#### C -- Correctness

- [ ] Logic errors: off-by-one, wrong operator, inverted condition
- [ ] Edge cases: nil/null handling, empty collections, boundary values
- [ ] Error handling: errors checked, not swallowed, wrapped with context
- [ ] Race conditions: shared mutable state, concurrent access patterns
- [ ] Resource leaks: unclosed files, connections, goroutines, channels
- [ ] Type safety: unchecked casts, implicit conversions, overflow potential
- [ ] Contract compliance: does the change match the stated intent?

#### O -- Observability

- [ ] Errors include enough context for debugging (what failed, with what input)
- [ ] New features have appropriate logging at correct levels
- [ ] Metrics or health indicators added for new failure modes
- [ ] Error messages are actionable (not just "something went wrong")

#### R -- Readability

- [ ] Names are descriptive and consistent with codebase conventions
- [ ] Functions are focused (single responsibility, not doing too much)
- [ ] Complex logic has comments explaining WHY (not WHAT)
- [ ] No dead code, commented-out code, or leftover debug statements
- [ ] Consistent formatting with the rest of the codebase

#### E -- Efficiency

- [ ] No unnecessary allocations in hot paths
- [ ] N+1 query patterns (database calls in loops)
- [ ] Unbounded growth: maps/slices that grow without limits
- [ ] Appropriate use of caching, batching, or pagination
- [ ] No blocking operations in async/concurrent contexts

#### D -- Design

- [ ] Abstraction level is appropriate (not over-engineered, not under-abstracted)
- [ ] API surface is minimal and consistent with existing patterns
- [ ] Changes are cohesive (single concern per PR, not mixing refactoring with features)
- [ ] Ambiguity was surfaced instead of silently assumed away
- [ ] No speculative flexibility or abstractions beyond the stated need
- [ ] Every changed line traces to the requested outcome or required cleanup
- [ ] Dependencies flow in the right direction (no circular imports)
- [ ] Test coverage: new code has tests, tests verify behavior (not just coverage)
- [ ] Breaking changes are documented and intentional

---

### Step 4: Agent-Specific Checks (--agent mode only)

When reviewing agent-generated code, apply additional scrutiny for common agent failure modes:

#### Hallucinated References
- [ ] All imports exist (no invented packages or modules)
- [ ] All called functions exist in the codebase or dependencies
- [ ] Referenced files and paths actually exist
- [ ] API endpoints and URLs are real

#### Over-Engineering
- [ ] No unnecessary abstractions (interfaces with one implementation, factory for one type)
- [ ] No premature generalization (generic solution where specific was asked)
- [ ] No gold-plating (features not requested)
- [ ] Reasonable LOC for the task complexity

#### Missing Fundamentals
- [ ] Error handling is present (agents frequently skip error paths)
- [ ] Edge cases are handled (agents often only handle the happy path)
- [ ] Cleanup/teardown logic exists (defer, finally, context cancellation)
- [ ] Concurrency safety if applicable

#### Test Quality
- [ ] Tests actually assert meaningful behavior (not just `!= nil` or `!= ""`)
- [ ] Test names describe the scenario being tested
- [ ] Tests cover error paths, not just happy paths
- [ ] No `cov*_test.go` naming pattern (coverage-padding anti-pattern)
- [ ] Mocks are realistic (not returning hardcoded success for everything)

#### Codebase Consistency
- [ ] Follows existing naming conventions (check 3+ similar files for patterns)
- [ ] Uses existing helpers/utilities instead of reimplementing
- [ ] Error handling style matches the codebase
- [ ] File organization follows project structure

---

### Step 5: Generate Structured Review Output

Create a review artifact:

```bash
REVIEW_DIR=".agents/review"
mkdir -p "$REVIEW_DIR"
REVIEW_FILE="$REVIEW_DIR/$(date +%Y-%m-%d)-review-$(echo "$PR_REF" | tr '/' '-').md"
```

#### Review Document Structure

```markdown
# Review: <PR title or change description>
**Date:** YYYY-MM-DD  |  **Verdict:** APPROVE | REQUEST_CHANGES | COMMENT
**Target:** PR #N / local diff / agent output at <path>

## Intent
<one-line summary>

## SCORED Assessment
| Category | Rating | Notes |
|----------|--------|-------|
| Security | pass/warn/fail | ... |
| Correctness | pass/warn/fail | ... |
| Observability | pass/warn/fail | ... |
| Readability | pass/warn/fail | ... |
| Efficiency | pass/warn/fail | ... |
| Design | pass/warn/fail | ... |

## Findings
### Critical (must fix)
- **[file:line]** Issue. Suggested fix: ...
### Warning (should fix)
- **[file:line]** Issue. Suggested fix: ...
### Suggestion / Nit
- **[file:line]** Description.

## Missing
<expected but absent: tests, docs, error handling, migration>
```

#### Verdict Rules

- **APPROVE**: No critical or warning findings. All SCORED categories pass.
- **REQUEST_CHANGES**: Any critical finding, OR 3+ warnings, OR any SCORED category rated "fail".
- **COMMENT**: 1-2 warnings with no critical findings. Worth discussing but not blocking.

#### PR Mode: Post Comments

If reviewing a PR and the verdict is REQUEST_CHANGES or COMMENT, offer to post the review:

```bash
# Post review comment on the PR
gh pr review "$PR_REF" --comment --body "$(cat "$REVIEW_FILE")"

# Or for blocking review
gh pr review "$PR_REF" --request-changes --body "$(cat "$REVIEW_FILE")"
```

Only post if the user confirms. Never auto-post a review without explicit approval.

---

## Deep Mode (--deep)

When `--deep` is specified, after the initial SCORED pass, spawn a council for a second opinion:

```bash
$council validate "Review these changes for issues I might have missed: <summary of changes>"
```

Merge council findings into the review document under a "## Council Findings" section.

---

## Integration with Other Skills

| Skill | Relationship |
|-------|-------------|
| `$vibe` | Self-review (your own code). `$review` is for others' code. |
| `$council` | Optional second opinion via `--deep` flag. |
| `$standards` | Auto-loaded for language-specific rules. |
| `$bug-hunt` | `$review` does a structured pass; `$bug-hunt` does deep investigation of suspected bugs. |
| `$pr-validate` | PR-specific validation (isolation, scope creep). Complementary to `$review`. |

---

## See Also

- [vibe](../vibe/SKILL.md) — Self-review and code quality validation
- [council](../council/SKILL.md) — Multi-model consensus council
- [standards](../standards/SKILL.md) — Language-specific coding conventions
- [bug-hunt](../bug-hunt/SKILL.md) — Deep bug investigation
- [pr-validate](../pr-validate/SKILL.md) — PR scope and isolation checks

Related Skills

vibe

244

from boshu2/agentops

Comprehensive code validation. Runs complexity analysis then multi-model council. Answer: Is this code ready to ship? Triggers: "vibe", "validate code", "check code", "review code", "code quality", "is this ready".

validation

244

from boshu2/agentops

Full validation phase orchestrator. Vibe + post-mortem + retro + forge. Reviews implementation quality, extracts learnings, feeds the knowledge flywheel. Triggers: "validation", "validate", "validate work", "review and learn", "validation phase", "post-implementation review".

update

244

from boshu2/agentops

Reinstall all AgentOps skills globally from the latest source. Triggers: "update skills", "reinstall skills", "sync skills".

trace

244

from boshu2/agentops

Trace design decisions and concepts through session history, handoffs, and git. Triggers: "trace decision", "how did we decide", "where did this come from", "design provenance", "decision history".

test

244

from boshu2/agentops

Test generation, coverage analysis, and TDD workflow. Triggers: "test", "generate tests", "test coverage", "write tests", "tdd", "add tests", "test strategy", "missing tests", "coverage gaps".

status

244

from boshu2/agentops

Single-screen dashboard showing current work, recent validations, flywheel health, and suggested next action. Triggers: "status", "dashboard", "what am I working on", "where was I".

standards

244

from boshu2/agentops

Language-specific coding standards and validation rules. Provides Python, Go, Rust, TypeScript, Shell, YAML, JSON, and Markdown standards. Auto-loaded by $vibe, $implement, $doc, $bug-hunt, $complexity based on file types.

shared

244

from boshu2/agentops

Shared reference documents for multi-agent skills (not directly invocable)

security

244

from boshu2/agentops

Continuous repository security scanning and release gating. Triggers: "security scan", "security audit", "pre-release security", "run scanners", "check vulnerabilities".

security-suite

244

from boshu2/agentops

Composable security suite for binary and prompt-surface assurance, static analysis, dynamic tracing, repo-native redteam scans, contract capture, baseline drift, and policy gating. Triggers: "binary security", "reverse engineer binary", "black-box binary test", "behavioral trace", "baseline diff", "prompt redteam", "security suite".

scenario

244

from boshu2/agentops

Author and manage holdout scenarios for behavioral validation. Scenarios are stored in .agents/holdout/ where implementing agents cannot see them. Triggers: "$scenario", "holdout", "behavioral scenario", "create scenario", "list scenarios".

scaffold

244

from boshu2/agentops

Project scaffolding, component generation, and boilerplate setup. Triggers: "scaffold", "new project", "init project", "create project", "generate component", "setup project", "starter", "boilerplate".