nw-agent-testing

5-layer testing approach for agent validation including adversarial testing, security validation, and prompt injection resistance

322 stars

by nWave-ai

GitHub: https://github.com/nWave-ai/nWave

Best use case

nw-agent-testing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using nw-agent-testing should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/nw-agent-testing/SKILL.md --create-dirs "https://raw.githubusercontent.com/nWave-ai/nWave/main/nWave/skills/nw-agent-testing/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/nw-agent-testing/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How nw-agent-testing Compares

Feature / Agent	nw-agent-testing	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

It provides a 5-layer testing approach for agent validation, including adversarial testing, security validation, and prompt injection resistance.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Agent Testing Framework

## 5-Layer Testing Approach

### Layer 1: Output Quality (Unit-Level)

Validate agent produces correct, well-structured outputs for typical inputs.

**Test**: Agent follows workflow phases | Outputs match expected format/structure | Domain-specific rules correctly applied | Token efficiency within bounds

**How**: Manual invocation with representative inputs. Check against acceptance criteria in agent description.
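
The structural part of these checks can be partly automated. The following is a minimal sketch, not part of the skill itself: `check_output`, the required section names, and the word-count token estimate are all hypothetical choices made for illustration.

```python
import re

def check_output(text, required_sections, max_tokens):
    """Return a list of failure messages; an empty list means the output passed."""
    failures = []
    # Collect Markdown heading titles (assumes the agent emits Markdown headings).
    headings = {
        m.group(1).strip()
        for m in re.finditer(r"^#{1,6}\s+(.+)$", text, re.MULTILINE)
    }
    for section in required_sections:
        if section not in headings:
            failures.append(f"missing section: {section}")
    # Rough estimate only: whitespace-separated words, not a real tokenizer.
    approx_tokens = len(text.split())
    if approx_tokens > max_tokens:
        failures.append(f"too long: ~{approx_tokens} tokens > {max_tokens}")
    return failures

sample = "# Summary\nAll good.\n\n## Findings\nNone.\n"
print(check_output(sample, ["Summary", "Findings", "Next Steps"], max_tokens=200))
# prints ['missing section: Next Steps']
```

Whether workflow phases were followed and domain rules applied still needs a human or reviewer agent; a script like this only covers format and length.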

### Layer 2: Integration / Handoff Validation

Validate correct input/output between agents in workflows.

**Test**: Input parsing handles upstream format | Output format matches downstream expectations | Error signals propagate correctly | Subagent mode activation works (skip greet, execute autonomously)

**How**: End-to-end workflow execution through full agent chain (e.g., DISCUSS -> DESIGN -> DELIVER).
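
A handoff check can be sketched as a contract test between two stages. The example below assumes, purely for illustration, that the upstream agent emits a JSON object; the field names in `DESIGN_INPUT_CONTRACT` and the `validate_handoff` helper are hypothetical and do not come from nWave.

```python
import json

# Hypothetical contract: fields a downstream agent expects from its upstream stage.
DESIGN_INPUT_CONTRACT = {"status": str, "artifacts": list, "summary": str}

def validate_handoff(raw_output, contract):
    """Parse upstream output as JSON and check it against the downstream contract."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"unparseable output: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["payload is not a JSON object"]
    errors = []
    for name, expected_type in contract.items():
        if name not in payload:
            errors.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            errors.append(f"wrong type for {name}: expected {expected_type.__name__}")
    # Error signals must propagate instead of being silently consumed downstream.
    if payload.get("status") == "error":
        errors.append("upstream reported an error")
    return errors

upstream = json.dumps({"status": "ok", "artifacts": ["design.md"]})
print(validate_handoff(upstream, DESIGN_INPUT_CONTRACT))
# prints ['missing field: summary']
```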

### Layer 3: Adversarial Output Validation

Challenge validity of agent outputs rather than accepting at face value.

**Test**: Source verification (cited sources real and accurate?) | Bias detection (favors one approach without evidence?) | Edge case coverage | Completeness (required sections present?)

**How**: Peer review by `-reviewer` agent using structured critique dimensions.
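
One way to keep a review structured is to record findings per critique dimension. This is an illustrative sketch only: the `Critique` class and its approve/reject rule are assumptions, and the dimension names simply mirror the test list above.

```python
from dataclasses import dataclass, field

# Dimension names mirror the Layer 3 test list; the identifiers are hypothetical.
DIMENSIONS = ("source_verification", "bias_detection", "edge_case_coverage", "completeness")

@dataclass
class Critique:
    """Collects reviewer findings per dimension."""
    issues: dict = field(default_factory=lambda: {d: [] for d in DIMENSIONS})

    def add(self, dimension, issue):
        if dimension not in self.issues:
            raise ValueError(f"unknown dimension: {dimension}")
        self.issues[dimension].append(issue)

    def verdict(self):
        # Assumed rule: any recorded issue rejects the output.
        return "rejected" if any(self.issues.values()) else "approved"

critique = Critique()
critique.add("source_verification", "cited URL does not resolve")
print(critique.verdict())  # prints rejected
```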

### Layer 4: Adversarial Verification (Peer Review)

Independent review to catch biases and blind spots in agent design.

**Test**: Definition follows validation checklist? | Redundant Claude default instructions? | Over/under-specified? | Could simpler agent achieve same results?

**How**: `@nw-agent-builder` validates via 11-point checklist or `@agent-builder-reviewer` runs structured review.

### Layer 5: Security Validation

Test resilience against misuse and prompt injection.

**Test**: Tool restriction enforcement | maxTurns respected | Permission mode correctly scoped | Agent stays within declared scope

**How**: Frontmatter fields are enforced at the platform level. Verify the configuration.

## Prompt Injection Resistance

Claude Code platform provides injection resistance through: subagent isolation (own context, no sub-subagents) | Tool restriction via frontmatter `tools` | Permission modes via `permissionMode` | Hook-based validation (PreToolUse, PostToolUse)

Do NOT add prose-based injection defense. Configure platform features:

```yaml
---
tools: Read, Glob, Grep           # Only tools this agent needs
maxTurns: 30                       # Prevents runaway execution
permissionMode: default            # User approves dangerous actions
---
```

## Security Validation Checklist

- [ ] `tools` restricted to minimum necessary (least privilege)
- [ ] `maxTurns` set to prevent runaway execution
- [ ] `permissionMode` appropriate for risk level
- [ ] No `Bash` unless agent requires command execution
- [ ] No `Write` unless agent creates/modifies files
- [ ] Description accurately describes scope
- [ ] Subagent mode handles autonomous execution correctly
- [ ] No sensitive data hardcoded in definition
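
Several checklist items can be verified mechanically from the frontmatter. The sketch below is an assumption-laden illustration: `parse_frontmatter` handles only flat `key: value` lines (it is not a YAML parser), and `security_findings` with its `needs_bash` / `needs_write` flags is a hypothetical helper covering only the `tools`, `maxTurns`, and `permissionMode` items.

```python
def parse_frontmatter(text):
    """Parse flat `key: value` pairs between the first two '---' lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            # Drop trailing `# comment` text before storing the value.
            fields[key.strip()] = value.split("#", 1)[0].strip()
    return fields

def security_findings(fields, needs_bash=False, needs_write=False):
    """Return checklist violations that can be detected from frontmatter alone."""
    findings = []
    tools = [t.strip() for t in fields.get("tools", "").split(",") if t.strip()]
    if not tools:
        findings.append("tools not restricted")
    if "Bash" in tools and not needs_bash:
        findings.append("Bash granted but not required")
    if "Write" in tools and not needs_write:
        findings.append("Write granted but not required")
    if not fields.get("maxTurns", "").isdigit():
        findings.append("maxTurns not set")
    if "permissionMode" not in fields:
        findings.append("permissionMode not set")
    return findings

definition = """---
tools: Read, Glob, Grep, Bash
maxTurns: 30
permissionMode: default
---
"""
print(security_findings(parse_frontmatter(definition)))
# prints ['Bash granted but not required']
```

Items such as scope accuracy and hardcoded sensitive data still need manual review.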

## Testing Workflow for New Agents

1. **Create** with minimal definition
2. **Layer 1**: Invoke with 2-3 representative inputs, check outputs
3. **Layer 2**: Run in workflow chain if applicable
4. **Fix** failures observed
5. **Validate**: Run 11-point checklist
6. **Iterate**: Add instructions only for observed failure modes

Related Skills

nw-property-based-testing

322

from nWave-ai/nWave

Property-based testing strategies, mutation testing, shrinking, and combined PBT+mutation workflow for test quality validation

nw-hexagonal-testing

322

from nWave-ai/nWave

5-layer agent output validation, I/O contract specification, vertical slice development, and test doubles policy with per-layer examples

nw-ux-web-patterns

322

from nWave-ai/nWave

Web UI design patterns for product owners. Load when designing web application interfaces, writing web-specific acceptance criteria, or evaluating responsive designs.

nw-ux-tui-patterns

322

from nWave-ai/nWave

Terminal UI and CLI design patterns for product owners. Load when designing command-line tools, interactive terminal applications, or writing CLI-specific acceptance criteria.

nw-ux-principles

322

from nWave-ai/nWave

Core UX principles for product owners. Load when evaluating interface designs, writing acceptance criteria with UX requirements, or reviewing wireframes and mockups.

nw-ux-emotional-design

322

from nWave-ai/nWave

Emotional design and delight patterns for product owners. Load when designing onboarding flows, empty states, first-run experiences, or evaluating the emotional quality of an interface.

nw-ux-desktop-patterns

322

from nWave-ai/nWave

Desktop application UI patterns for product owners. Load when designing native or cross-platform desktop applications, writing desktop-specific acceptance criteria, or evaluating panel layouts and keyboard workflows.

nw-user-story-mapping

322

from nWave-ai/nWave

User story mapping for backlog management and outcome-based prioritization. Load during Phase 2.5 (User Story Mapping) to produce story-map.md and prioritization.md.

nw-tr-review-criteria

322

from nWave-ai/nWave

Review dimensions and scoring for root cause analysis quality assessment

nw-tlaplus-verification

322

from nWave-ai/nWave

TLA+ formal verification for design correctness and PBT pipeline integration

nw-test-refactoring-catalog

322

from nWave-ai/nWave

Detailed refactoring mechanics with step-by-step procedures, and test code smell catalog with detection patterns and before/after examples

nw-test-organization-conventions

322

from nWave-ai/nWave

Test directory structure patterns by architecture style, language conventions, naming rules, and fixture placement. Decision tree for selecting test organization strategy.