subagent-testing

Test skills via RED/GREEN/REFACTOR TDD with fresh subagents

3,891 stars

Best use case

subagent-testing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Test skills via RED/GREEN/REFACTOR TDD with fresh subagents

Teams using subagent-testing should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/nm-abstract-subagent-testing/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/athola/nm-abstract-subagent-testing/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/nm-abstract-subagent-testing/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How subagent-testing Compares

Feature / Agentsubagent-testingStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Test skills via RED/GREEN/REFACTOR TDD with fresh subagents

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

> **Night Market Skill** — ported from [claude-night-market/abstract](https://github.com/athola/claude-night-market/tree/master/plugins/abstract). For the full experience with agents, hooks, and commands, install the Claude Code plugin.


# Subagent Testing - TDD for Skills

Test skills with fresh subagent instances to prevent priming bias and validate effectiveness.

## Table of Contents

1. [Overview](#overview)
2. [Why Fresh Instances Matter](#why-fresh-instances-matter)
3. [Testing Methodology](#testing-methodology)
4. [Quick Start](#quick-start)
5. [Detailed Testing Guide](#detailed-testing-guide)
6. [Success Criteria](#success-criteria)

## Overview

**Fresh instances prevent priming:** Each test uses a new Claude conversation to verify
the skill's impact is measured, not conversation history effects.

## Why Fresh Instances Matter

### The Priming Problem
Running tests in the same conversation creates bias:
- Prior context influences responses
- Skill effects get mixed with conversation history
- Can't isolate skill's true impact

### Fresh Instance Benefits
- **Isolation**: Each test starts clean
- **Reproducibility**: Consistent baseline state
- **Measurement**: Clear before/after comparison
- **Validation**: Proves skill effectiveness, not priming

## Testing Methodology

Three-phase TDD-style approach:

### Phase 1: Baseline Testing (RED)
Test without skill to establish baseline behavior.

### Phase 2: With-Skill Testing (GREEN)
Test with skill loaded to measure improvements.

### Phase 3: Rationalization Testing (REFACTOR)
Test skill's anti-rationalization guardrails.

## Quick Start

```bash
# 1. Create baseline tests (without skill)
# Use 5 diverse scenarios
# Document full responses

# 2. Create with-skill tests (fresh instances)
# Load skill explicitly
# Use identical prompts
# Compare to baseline

# 3. Create rationalization tests
# Test anti-rationalization patterns
# Verify guardrails work
```

## Detailed Testing Guide

For complete testing patterns, examples, and templates:
- **[Testing Patterns](modules/testing-patterns.md)** - Full TDD methodology
- **[Test Examples](modules/testing-patterns.md)** - Baseline, with-skill, rationalization tests
- **[Analysis Templates](modules/testing-patterns.md)** - Scoring and comparison frameworks

## Success Criteria

- **Baseline**: Document 5+ diverse baseline scenarios
- **Improvement**: ≥50% improvement in skill-related metrics
- **Consistency**: Results reproducible across fresh instances
- **Rationalization Defense**: Guardrails prevent ≥80% of rationalization attempts

## See Also

- **skill-authoring**: Creating effective skills
- **bulletproof-skill**: Anti-rationalization patterns
- **test-skill**: Automated skill testing command

Related Skills

limited-info-subagent-skill-verify

3891
from openclaw/skills

Validate whether a skill can be executed successfully by a minimally informed subagent. Use when the user wants to test a skill by giving a subagent only a minimal invocation, such as the skill name plus a single artifact like a link, file, or short prompt, and then grade whether the subagent actually performed the skill rather than merely describing it.

rust-testing-code-review

3891
from openclaw/skills

Reviews Rust test code for unit test patterns, integration test structure, async testing, mocking approaches, and property-based testing. Use when reviewing _test.rs files,

superpowers-subagent-dev

3891
from openclaw/skills

Use when executing implementation plans with independent tasks - coordinates task execution by dispatching subagents per task with verification checkpoints, adapted for OpenClaw's isolated session model

zod-testing

3891
from openclaw/skills

Testing patterns for Zod schemas using Jest and Vitest. Covers schema correctness testing, mock data generation, error assertion patterns, integration testing with API handlers and forms, snapshot testing with z.toJSONSchema(), and property-based testing. Baseline: zod ^4.0.0. Triggers on: test files for Zod schemas, zod-schema-faker imports, mentions of "test schema", "schema test", "zod mock", "zod test", or schema testing patterns.

redux-saga-testing

3891
from openclaw/skills

Write tests for Redux Sagas using redux-saga-test-plan, runSaga, and manual generator testing. Covers expectSaga (integration), testSaga (unit), providers, partial matchers, reducer integration, error simulation, and cancellation testing. Works with Jest and Vitest. Triggers on: test files for sagas, redux-saga-test-plan imports, mentions of "test saga", "saga test", "expectSaga", "testSaga", or "redux-saga-test-plan".

create-subagent

3891
from openclaw/skills

创建和管理 SubAgent(子智能体)。使用当用户需要:(1) 创建新的 SubAgent 执行特定任务,(2) 查看/管理已有的 SubAgent,(3) 终止或指导 SubAgent。支持多种预设类型:开发、研究、写作、数据分析等。

vitest-testing

3891
from openclaw/skills

Vitest testing framework patterns and best practices. Use when writing unit tests, integration tests, configuring vitest.config, mocking with vi.mock/vi.fn, using snapshots, or setting up test coverage. Triggers on describe, it, expect, vi.mock, vi.fn, beforeEach, afterEach, vitest.

swift-testing-code-review

3891
from openclaw/skills

Reviews Swift Testing code for proper use of

pydantic-ai-testing

3891
from openclaw/skills

Test PydanticAI agents using TestModel, FunctionModel, VCR cassettes, and inline snapshots. Use when writing unit tests, mocking LLM responses, or recording API interactions.

QA & Testing Engine — Complete Software Quality System

3880
from openclaw/skills

> The definitive testing methodology for AI agents. From test strategy to execution, coverage to reporting — everything you need to ship quality software.

---

3891
from openclaw/skills

name: article-factory-wechat

Content & Documentation

humanizer

3891
from openclaw/skills

Remove signs of AI-generated writing from text. Use when editing or reviewing text to make it sound more natural and human-written. Based on Wikipedia's comprehensive "Signs of AI writing" guide. Detects and fixes patterns including: inflated symbolism, promotional language, superficial -ing analyses, vague attributions, em dash overuse, rule of three, AI vocabulary words, negative parallelisms, and excessive conjunctive phrases.

Content & Documentation