nw-agent-testing
5-layer testing approach for agent validation including adversarial testing, security validation, and prompt injection resistance
Best use case
nw-agent-testing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using nw-agent-testing should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/nw-agent-testing/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How nw-agent-testing Compares
| Feature / Agent | nw-agent-testing | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
5-layer testing approach for agent validation including adversarial testing, security validation, and prompt injection resistance
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Agent Testing Framework

## 5-Layer Testing Approach

### Layer 1: Output Quality (Unit-Level)

Validate agent produces correct, well-structured outputs for typical inputs.

**Test**: Agent follows workflow phases | Outputs match expected format/structure | Domain-specific rules correctly applied | Token efficiency within bounds

**How**: Manual invocation with representative inputs. Check against acceptance criteria in agent description.

### Layer 2: Integration / Handoff Validation

Validate correct input/output between agents in workflows.

**Test**: Input parsing handles upstream format | Output format matches downstream expectations | Error signals propagate correctly | Subagent mode activation works (skip greet, execute autonomously)

**How**: End-to-end workflow execution through full agent chain (e.g., DISCUSS -> DESIGN -> DELIVER).

### Layer 3: Adversarial Output Validation

Challenge validity of agent outputs rather than accepting at face value.

**Test**: Source verification (cited sources real and accurate?) | Bias detection (favors one approach without evidence?) | Edge case coverage | Completeness (required sections present?)

**How**: Peer review by `-reviewer` agent using structured critique dimensions.

### Layer 4: Adversarial Verification (Peer Review)

Independent review to catch biases and blind spots in agent design.

**Test**: Definition follows validation checklist? | Redundant Claude default instructions? | Over/under-specified? | Could simpler agent achieve same results?

**How**: `@nw-agent-builder` validates via 11-point checklist or `@agent-builder-reviewer` runs structured review.

### Layer 5: Security Validation

Test resilience against misuse and prompt injection.

**Test**: Tool restriction enforcement | maxTurns respected | Permission mode correctly scoped | Agent stays within declared scope

**How**: Frontmatter fields enforce at platform level. Verify configuration.
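The structural checks named in Layers 1 and 3 (outputs match expected structure, required sections present) are described as manual or reviewer-driven. As a rough illustration only, part of that check could be scripted. The sketch below assumes the agent's output is markdown and that "required sections" means headings with known titles; the function name `missing_sections` and the sample titles are hypothetical and not part of the skill.

```python
import re


def missing_sections(output: str, required: list[str]) -> list[str]:
    """Return the required titles that have no matching markdown heading.

    Assumption: sections are marked with '#'-style headings and titles are
    compared case-insensitively. This does not judge section content.
    """
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^#{1,6}\s+(.+?)\s*$", output, flags=re.MULTILINE)
    }
    return [title for title in required if title.lower() not in headings]


# Hypothetical agent output used only to exercise the check.
sample_output = """# Findings
## Sources
- example
## Recommendation
Use approach A.
"""

print(missing_sections(sample_output, ["Findings", "Sources", "Risks"]))
# prints ['Risks']
```

A check like this only covers the completeness dimension; source verification and bias detection still need the reviewer step described above.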
## Prompt Injection Resistance

Claude Code platform provides injection resistance through: subagent isolation (own context, no sub-subagents) | Tool restriction via frontmatter `tools` | Permission modes via `permissionMode` | Hook-based validation (PreToolUse, PostToolUse)

Do NOT add prose-based injection defense. Configure platform features:

```yaml
---
tools: Read, Glob, Grep # Only tools this agent needs
maxTurns: 30 # Prevents runaway execution
permissionMode: default # User approves dangerous actions
---
```

## Security Validation Checklist

- [ ] `tools` restricted to minimum necessary (least privilege)
- [ ] `maxTurns` set to prevent runaway execution
- [ ] `permissionMode` appropriate for risk level
- [ ] No `Bash` unless agent requires command execution
- [ ] No `Write` unless agent creates/modifies files
- [ ] Description accurately describes scope
- [ ] Subagent mode handles autonomous execution correctly
- [ ] No sensitive data hardcoded in definition

## Testing Workflow for New Agents

1. **Create** with minimal definition
2. **Layer 1**: Invoke with 2-3 representative inputs, check outputs
3. **Layer 2**: Run in workflow chain if applicable
4. **Fix** failures observed
5. **Validate**: Run 11-point checklist
6. **Iterate**: Add instructions only for observed failure modes
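The mechanical items in the Security Validation Checklist above (`tools`, `maxTurns`, `permissionMode`, no unneeded `Bash` or `Write`) could be checked with a small script. The sketch below is an illustration under stated assumptions: frontmatter is a flat `key: value` block between `---` lines, `tools` is a comma-separated list of plain tool names, and the helper names `parse_frontmatter` and `security_findings` are hypothetical. It does not use a YAML library and is not a full YAML parser.

```python
def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat 'key: value' pairs between the first two '---' lines.

    Assumption: no nesting or YAML lists; trailing '# comments' are dropped.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.split("#", 1)[0].strip()
    return fields


def security_findings(
    fields: dict[str, str], needs_bash: bool = False, needs_write: bool = False
) -> list[str]:
    """Return checklist violations that can be detected from frontmatter alone."""
    findings = []
    tools = [t.strip() for t in fields.get("tools", "").split(",") if t.strip()]
    if not tools:
        findings.append("tools not restricted")
    if "Bash" in tools and not needs_bash:
        findings.append("Bash granted but not required")
    if "Write" in tools and not needs_write:
        findings.append("Write granted but not required")
    if not fields.get("maxTurns", "").isdigit():
        findings.append("maxTurns not set")
    if not fields.get("permissionMode"):
        findings.append("permissionMode not set")
    return findings


# Hypothetical agent definition with two checklist violations.
definition = """---
tools: Read, Glob, Grep, Bash
maxTurns: 30
---
Agent instructions go here.
"""

print(security_findings(parse_frontmatter(definition)))
# prints ['Bash granted but not required', 'permissionMode not set']
```

The remaining checklist items (accurate description, subagent behavior, no hardcoded sensitive data) need human or reviewer-agent judgment and are intentionally left out of this sketch.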
Related Skills
nw-property-based-testing
Property-based testing strategies, mutation testing, shrinking, and combined PBT+mutation workflow for test quality validation
nw-hexagonal-testing
5-layer agent output validation, I/O contract specification, vertical slice development, and test doubles policy with per-layer examples
nw-ux-web-patterns
Web UI design patterns for product owners. Load when designing web application interfaces, writing web-specific acceptance criteria, or evaluating responsive designs.
nw-ux-tui-patterns
Terminal UI and CLI design patterns for product owners. Load when designing command-line tools, interactive terminal applications, or writing CLI-specific acceptance criteria.
nw-ux-principles
Core UX principles for product owners. Load when evaluating interface designs, writing acceptance criteria with UX requirements, or reviewing wireframes and mockups.
nw-ux-emotional-design
Emotional design and delight patterns for product owners. Load when designing onboarding flows, empty states, first-run experiences, or evaluating the emotional quality of an interface.
nw-ux-desktop-patterns
Desktop application UI patterns for product owners. Load when designing native or cross-platform desktop applications, writing desktop-specific acceptance criteria, or evaluating panel layouts and keyboard workflows.
nw-user-story-mapping
User story mapping for backlog management and outcome-based prioritization. Load during Phase 2.5 (User Story Mapping) to produce story-map.md and prioritization.md.
nw-tr-review-criteria
Review dimensions and scoring for root cause analysis quality assessment
nw-tlaplus-verification
TLA+ formal verification for design correctness and PBT pipeline integration
nw-test-refactoring-catalog
Detailed refactoring mechanics with step-by-step procedures, and test code smell catalog with detection patterns and before/after examples
nw-test-organization-conventions
Test directory structure patterns by architecture style, language conventions, naming rules, and fixture placement. Decision tree for selecting test organization strategy.