nw-ab-critique-dimensions

Review dimensions for validating agent quality - template compliance, safety, testing, and priority validation

322 stars

Best use case

nw-ab-critique-dimensions is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Review dimensions for validating agent quality - template compliance, safety, testing, and priority validation

Teams using nw-ab-critique-dimensions should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/nw-ab-critique-dimensions/SKILL.md --create-dirs "https://raw.githubusercontent.com/nWave-ai/nWave/main/nWave/skills/nw-ab-critique-dimensions/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/nw-ab-critique-dimensions/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How nw-ab-critique-dimensions Compares

Feature / Agent	nw-ab-critique-dimensions	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Review dimensions for validating agent quality - template compliance, safety, testing, and priority validation

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

SKILL.md Source

# Agent Quality Critique Dimensions

Use these dimensions when reviewing or validating agent definitions.

## Dimension 1: Template Compliance

Does the agent follow official Claude Code format?

**Check**: YAML frontmatter with name and description (required) | Markdown body as system prompt | No embedded YAML config blocks | No activation-instructions or IDE-FILE-RESOLUTION sections | Skills referenced in frontmatter, not inline

**Severity**: High -- non-compliant agents may not load correctly.

## Dimension 2: Size and Focus

**Check**: Core definition under 400 lines | Domain knowledge in Skills | Single clear responsibility | No monolithic sections (>50 lines without structure) | No redundant Claude default behaviors

**Measurement**: `wc -l {agent-file}`. Target: 200-400 lines.

**Severity**: High -- oversized agents suffer context rot.

## Dimension 3: Divergence Quality

Does the agent specify only what diverges from Claude defaults?

**Check**: No file operation instructions | No generic quality principles ("be thorough") | No tool usage guidelines | Core principles are domain-specific and non-obvious | Each instruction justifies why Claude wouldn't do this naturally

**Severity**: Medium -- redundant instructions waste tokens, cause overtriggering.

## Dimension 4: Safety Implementation

**Check**: Tools restricted via frontmatter `tools` field | maxTurns set | No prose-based security layers (use hooks) | No embedded enterprise safety frameworks | permissionMode set for risky actions

**Severity**: High -- prose safety is ineffective and token-wasteful.

## Dimension 5: Language and Tone

**Check**: No "CRITICAL:", "MANDATORY:", "ABSOLUTE" language | Direct statements ("Do X" not "You MUST X") | Affirmative phrasing ("Do Y" not "Don't do X") | Consistent terminology | No repetitive emphasis

**Severity**: Medium -- aggressive language causes overtriggering on Opus 4.6.

## Dimension 6: Examples Quality

**Check**: 3-5 canonical examples present | Cover critical/subtle decisions (not obvious cases) | Good/bad paired where useful | Concise (not full implementations)

**Severity**: Medium -- missing examples cause edge case failures.

## Dimension 7: Skill Loading Effectiveness

Does the agent ensure skills are actually loaded during execution?

**Check**: Skill Loading Strategy table present for agents with 3+ skills | Every frontmatter skill has matching `Load:` directive in workflow | Skills path documented (`~/.claude/skills/nw-{skill-name}/SKILL.md`) | Phase-gated loading (not "load everything at start")

**Severity**: High — orphan skills (declared but never loaded) mean sub-agents operate without domain knowledge. The `skills:` frontmatter field is declarative only; Claude Code does not auto-load skill files.

**Gold standard**: `nw-product-owner.md` — Skill Loading Strategy table mapping phases to skills with triggers + explicit `Load:` directives in each workflow phase.

## Dimension 8: Token Efficiency

Is the agent definition compressed without losing semantic content?

**Check**: No verbose prose where pipe-delimited lists suffice | Imperative voice throughout | No filler words ("in order to", "it is important to") | `### Example N:` headers preserved verbatim (not inlined) | AskUserQuestion options preserved with numbered descriptions | Code blocks preserved verbatim | No duplicate content already in skills

**Severity**: Medium — bloated definitions waste context window and degrade performance via context rot.

**Compression safe**: prose descriptions, bullet lists, related items → pipe-delimited
**Compression unsafe**: example headers, code blocks, decision tree options, YAML frontmatter

## Dimension 9: Priority Validation

**Questions**: 1. Is this the largest bottleneck? (Evidence required) | 2. Simpler alternatives considered? | 3. Constraint prioritization correct? | 4. Architecture data-justified?

**Severity**: High if agent addresses secondary concern while larger problem exists.

## Review Output Format

```yaml
review:
  agent: "{agent-name}"
  dimensions:
    template_compliance: {pass|fail}
    size_and_focus: {pass|fail}
    divergence_quality: {pass|fail}
    safety_implementation: {pass|fail}
    language_and_tone: {pass|fail}
    examples_quality: {pass|fail}
    skill_loading: {pass|fail|n/a}
    token_efficiency: {pass|fail}
    priority_validation: {pass|fail}
  issues:
    - dimension: "{dimension}"
      severity: "{high|medium|low}"
      finding: "{description}"
      recommendation: "{fix}"
  verdict: "{approved|revisions_needed}"
```

## Failure Conditions

Review blocked (verdict: revisions_needed) if: any high-severity dimension fails | 3+ medium-severity fail | Agent exceeds 400 lines without Skills extraction | Zero examples provided | Agent with 3+ skills missing Skill Loading Strategy table

Related Skills

nw-sc-review-dimensions

322

from nWave-ai/nWave

Reviewer critique dimensions for peer review - implementation bias detection, test quality validation, completeness checks, and priority validation

nw-sar-critique-dimensions

322

from nWave-ai/nWave

Architecture quality critique dimensions for peer review. Load when performing architecture document reviews.

nw-sa-critique-dimensions

322

from nWave-ai/nWave

Architecture quality critique dimensions for peer review. Load when invoking solution-architect-reviewer or performing self-review of architecture documents.

nw-rr-critique-dimensions

322

from nWave-ai/nWave

Critique dimensions and scoring for research document reviews

nw-po-review-dimensions

322

from nWave-ai/nWave

Requirements quality critique dimensions for peer review - confirmation bias detection, completeness validation, clarity checks, testability assessment, and priority validation

nw-par-critique-dimensions

322

from nWave-ai/nWave

Platform design review critique dimensions and severity levels. Load when reviewing CI/CD pipelines, infrastructure, deployment strategies, observability, or security designs.

nw-ad-critique-dimensions

322

from nWave-ai/nWave

Review dimensions for acceptance test quality - happy path bias, GWT compliance, business language purity, coverage completeness, walking skeleton user-centricity, priority validation, observable behavior assertions, traceability coverage, and walking skeleton boundary proof

nw-abr-critique-dimensions

322

from nWave-ai/nWave

Review dimensions for validating agent quality - template compliance, safety, testing, and priority validation

nw-ux-web-patterns

322

from nWave-ai/nWave

Web UI design patterns for product owners. Load when designing web application interfaces, writing web-specific acceptance criteria, or evaluating responsive designs.

nw-ux-tui-patterns

322

from nWave-ai/nWave

Terminal UI and CLI design patterns for product owners. Load when designing command-line tools, interactive terminal applications, or writing CLI-specific acceptance criteria.

nw-ux-principles

322

from nWave-ai/nWave

Core UX principles for product owners. Load when evaluating interface designs, writing acceptance criteria with UX requirements, or reviewing wireframes and mockups.

nw-ux-emotional-design

322

from nWave-ai/nWave

Emotional design and delight patterns for product owners. Load when designing onboarding flows, empty states, first-run experiences, or evaluating the emotional quality of an interface.