validate

Verify implementation against specifications

Best use case

validate is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using validate should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/validate/SKILL.md --create-dirs "https://raw.githubusercontent.com/stevengonsalvez/agents-in-a-box/main/toolkit/packages/skills/validate/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/validate/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How validate Compares

| Feature / Agent | validate | Standard Approach |
|-----------------|----------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It validates that an implementation plan was executed correctly: it verifies the plan's success criteria and flags any deviations or issues.

Where can I find the source code?

The source code is on GitHub in the stevengonsalvez/agents-in-a-box repository, under toolkit/packages/skills/validate/.

SKILL.md Source

# Validate

You are tasked with validating that an implementation plan was correctly executed, verifying all success criteria and identifying any deviations or issues.

<!-- recall:begin -->

## Step 0: Prior-art check (RECOMMENDED)

Before validating, recall prior learnings from the global knowledge base so we don't re-learn or re-decide something already captured:

```bash
uv run "{{HOME_TOOL_DIR}}/skills/recall/scripts/recall.py" \
  "<QUERY>" \
  --limit 5 --format markdown
```

**Query construction for `/validate`**: the feature/change being validated + relevant verification keywords (e.g. `"OAuth callback validation edge cases"`).

**What to do with results:**

- If a returned learning names a constraint, anti-pattern, or prior decision directly relevant to the task — surface it to the user BEFORE proceeding with this skill's main flow.
- If nothing relevant returns — proceed silently, no need to mention the check.
- Never block on recall failure. Empty output / non-zero exit is expected when the KB is absent or the subprocess errors — treat it as "no prior art found", not as an error.

<!-- recall:end -->
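The "never block on recall failure" rule can be sketched as a small wrapper. This is a minimal illustration, not part of the skill: `recall_prior_art` is a hypothetical helper name, and the 30-second timeout is an assumption.

```python
import subprocess
import sys


def recall_prior_art(command, timeout=30):
    """Run a recall command and return its stdout, or "" if anything goes wrong.

    Hypothetical helper: a missing binary, a timeout, a non-zero exit, or
    empty output are all reported as "no prior art found" instead of raising.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.SubprocessError):
        # Missing executable or timeout: treat as "nothing found".
        return ""
    if result.returncode != 0:
        return ""
    return result.stdout.strip()


if __name__ == "__main__":
    # Stand-in command; in practice this would be the `uv run ... recall.py` call above.
    learnings = recall_prior_art([sys.executable, "-c", "print('example learning')"])
    print(learnings or "no prior art found")
```

Returning an empty string for every failure mode keeps the caller's logic to a single check: surface the text if there is any, otherwise proceed silently.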

## Initial Setup

When invoked:

1. **Determine context**:
   - Are you in an existing conversation where implementation just happened?
   - Or starting fresh and need to discover what was implemented?

2. **Locate the plan**:
   - If plan path provided, use it
   - Otherwise, check `plans/` directory for recent plans
   - Look for plans with checkmarks indicating implementation

3. **Respond appropriately**:
```
I'll validate the implementation against the plan.

[If in existing conversation]:
I'll verify the work we just completed.

[If starting fresh]:
Please provide the plan file to validate against, or I can check for recently implemented plans.
```
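The plan lookup in step 2 could be approximated as follows. This is a sketch under assumptions: plans are Markdown files directly inside `plans/`, "recent" means most recently modified, and `find_candidate_plans` is a hypothetical name.

```python
from pathlib import Path


def find_candidate_plans(plans_dir="plans"):
    """Return (path, checked, unchecked) tuples for plans with checkmarks, newest first."""
    root = Path(plans_dir)
    if not root.is_dir():
        return []
    by_mtime = sorted(root.glob("*.md"), key=lambda p: p.stat().st_mtime, reverse=True)
    candidates = []
    for path in by_mtime:
        lines = [line.lstrip() for line in path.read_text(encoding="utf-8").splitlines()]
        checked = sum(1 for line in lines if line.lower().startswith("- [x]"))
        unchecked = sum(1 for line in lines if line.startswith("- [ ]"))
        if checked:  # keep only plans that show signs of implementation
            candidates.append((path, checked, unchecked))
    return candidates
```

A plan with both checked and unchecked boxes is still returned, since a partially implemented plan is a valid validation target.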

## Validation Process

### Step 1: Context Discovery

If you are starting fresh or need more context:

1. **Read the implementation plan** completely
2. **Identify what should have changed**:
   - List all files that should be modified
   - Note all success criteria (automated and manual)
   - Identify key functionality to verify

3. **Spawn parallel research tasks** to discover implementation:
   ```
   Task 1 - Verify database changes:
   Research if migration [N] was added and schema changes match plan.
   Check: migration files, schema version, table structure
   Return: What was implemented vs what plan specified

   Task 2 - Verify code changes:
   Find all modified files related to [feature].
   Compare actual changes to plan specifications.
   Return: File-by-file comparison of planned vs actual

   Task 3 - Verify test coverage:
   Check if tests were added/modified as specified.
   Run test commands and capture results.
   Return: Test status and any missing coverage
   ```

### Step 2: Systematic Validation

For each phase in the plan:

1. **Check completion status**:
   - Look for checkmarks in the plan (- [x])
   - Verify the actual code matches claimed completion

2. **Run automated verification**:
   - Execute each command from "Automated Verification"
   - Document pass/fail status
   - If any command fails, investigate the root cause

3. **Assess manual criteria**:
   - List what needs manual testing
   - Provide clear steps for user verification

4. **Think deeply about edge cases**:
   - Were error conditions handled?
   - Are there missing validations?
   - Could the implementation break existing functionality?
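The "run automated verification" step can be sketched as a loop that executes each command and records pass/fail. `CheckResult` and `run_checks` are hypothetical names, and the 600-second timeout is an assumption; the commands themselves would come from the plan's "Automated Verification" section.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class CheckResult:
    command: str
    passed: bool
    output: str


def run_checks(commands, timeout=600):
    """Run each verification command in order and record pass/fail plus output."""
    results = []
    for command in commands:
        try:
            # shell=True because plans list commands as plain strings, e.g. "npm run build".
            proc = subprocess.run(
                command, shell=True, capture_output=True, text=True, timeout=timeout
            )
            passed = proc.returncode == 0
            output = (proc.stdout + proc.stderr).strip()
        except subprocess.TimeoutExpired:
            passed, output = False, f"timed out after {timeout}s"
        results.append(CheckResult(command, passed, output))
    return results
```

The loop does not stop at the first failure, so the report can list every failing command along with the captured output needed to investigate the root cause.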

### Step 2.5: Goal-Backward Verification

Go beyond task completion -- verify the codebase actually delivers what was promised.

1. **Extract must-haves from the plan**:
   - Read each phase's "Success Criteria" and "Overview"
   - Identify observable truths: what must be TRUE from a user's perspective
   - Map each truth to required artifacts (files) and key links (connections)

2. **Three-level artifact analysis** for each required artifact:

   | Level | Check | Question |
   |-------|-------|----------|
   | **Exists** | File/function present | `Does the file exist? Is the function defined?` |
   | **Substantive** | Real code, not stub | `Is this actual implementation or placeholder?` |
   | **Wired** | Connected and used | `Is this imported, called, routed, rendered?` |

3. **Stub detection** -- flag these patterns as NOT substantive:
   - `return null`, `return {}`, `return undefined`
   - `// TODO`, `// FIXME`, `// placeholder`
   - `throw new Error("Not implemented")`
   - Empty function bodies, `pass` in Python
   - `onClick={() => {}}`, empty event handlers
   - `Response.json({ message: "Not implemented" })`
   - Functions defined but never imported/called

4. **Produce verification matrix**:

   ```markdown
   ## Goal-Backward Verification

   | Truth | Artifact | Exists | Substantive | Wired | Status |
   |-------|----------|--------|-------------|-------|--------|
   | User can log in | src/auth/login.ts | Y | Y | Y | VERIFIED |
   | Session persists | src/auth/session.ts | Y | Y | N | ORPHANED |
   | Errors display | src/components/Error.tsx | Y | N | - | STUB |
   | Rate limited | src/middleware/rate.ts | N | - | - | MISSING |
   ```

5. **Gap structuring** -- for any non-VERIFIED items:
   ```markdown
   ### Gaps Found

   #### Gap 1: Session persistence (ORPHANED)
   - **Truth**: "Session persists across page reloads"
   - **Issue**: session.ts exists and is substantive but never imported in app routes
   - **Fix**: Import and wire session middleware in src/app.ts

   #### Gap 2: Error display (STUB)
   - **Truth**: "Errors display to user"
   - **Issue**: Error.tsx returns null -- placeholder component
   - **Fix**: Implement error rendering with message prop
   ```

6. **Include in validation report** (Step 4) under a "Goal-Backward Verification" section
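The three-level analysis and the status labels in the matrix could be approximated with a coarse heuristic like the one below. Everything here is an assumption for illustration: the regexes only loosely encode the stub patterns listed above, "wired" is reduced to "another source file mentions the artifact's file stem", and `classify_artifact` is a hypothetical name. A single `// TODO` marks the whole file as a stub, so results need human review.

```python
import re
from pathlib import Path

# Illustrative regexes for the stub markers listed above (not exhaustive).
STUB_PATTERNS = [
    r"return\s+(null|undefined|\{\})\s*;?\s*$",
    r"(//|#)\s*(TODO|FIXME|placeholder)",
    r"Not implemented",
    r"^\s*pass\s*$",
    r"=>\s*\{\s*\}",
]


def classify_artifact(artifact, project_root, source_globs=("**/*.ts", "**/*.tsx")):
    """Return VERIFIED, ORPHANED, STUB, or MISSING for one required artifact."""
    root = Path(project_root)
    path = root / artifact
    if not path.is_file():
        return "MISSING"  # level 1: exists
    text = path.read_text(encoding="utf-8")
    flags = re.MULTILINE | re.IGNORECASE
    if any(re.search(pattern, text, flags) for pattern in STUB_PATTERNS):
        return "STUB"  # level 2: substantive
    mention = re.compile(rf"\b{re.escape(path.stem)}\b")
    for source_glob in source_globs:
        for other in root.glob(source_glob):
            if other != path and mention.search(other.read_text(encoding="utf-8")):
                return "VERIFIED"  # level 3: wired
    return "ORPHANED"
```

The checks run in the same order as the table, so each status names the first level that failed.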

### Step 3: Code Quality Review

Spawn parallel Task agents for thorough review:

```
Task 1: "Review implementation of [Phase 1 feature]"
- Check if implementation matches plan specifications
- Verify error handling and edge cases
- Look for potential bugs or issues
- Return specific findings with file:line references

Task 2: "Verify test coverage for [feature]"
- Check if tests were added as specified
- Verify test quality and coverage
- Look for missing test cases
- Return test file locations and coverage gaps

Task 3: "Check for regressions in [related component]"
- Verify existing functionality still works
- Check for breaking changes
- Look for unintended side effects
- Return any concerning changes
```

### Step 4: Generate Validation Report

Create comprehensive validation summary:

```markdown
# Validation Report: [Plan Name]

**Date**: [Current date and time]
**Plan**: plans/[plan_file].md
**Validation Type**: [Automated | Manual | Comprehensive]

## Implementation Status

### Phase Completion
Phase 1: [Name] - Fully implemented
Phase 2: [Name] - Fully implemented
Phase 3: [Name] - Partially implemented (see issues)

### Files Modified
`src/auth/oauth.js` - Added as specified
`src/config/auth.config.js` - Updated correctly
`tests/auth.test.js` - Missing test cases

## Automated Verification Results

### Build & Compilation
Build passes: `npm run build` (2.3s)
TypeScript: No errors

### Tests
Unit tests: 142 passing
Integration tests: 2 failing
  - Error in OAuth callback test (timeout)
  - Error in token refresh test (undefined variable)

### Code Quality
Linting: Clean (ESLint)
Coverage: 78% (target was 80%)

## Code Review Findings

### Matches Plan
- OAuth2 flow implemented correctly
- Token storage follows security requirements
- Error handling comprehensive

### Deviations from Plan
- Used different library than specified (passport vs manual)
  - Impact: Positive - more maintainable
- Added extra validation not in plan
  - Impact: Positive - improved security

### Potential Issues
**Performance**: Token validation happens on every request
  - Recommendation: Add caching layer

**Security**: Refresh tokens stored in localStorage
  - Recommendation: Use httpOnly cookies

**Missing**: Rate limiting not implemented
  - Required by plan Phase 2

## Manual Testing Required

The following require manual verification:

### User Interface
- [ ] OAuth login button appears correctly
- [ ] Redirect flow works smoothly
- [ ] Error messages display appropriately

### Integration Testing
- [ ] Works with Google OAuth provider
- [ ] Works with GitHub OAuth provider
- [ ] Token refresh happens seamlessly

### Performance Testing
- [ ] Login completes within 3 seconds
- [ ] Handles 100 concurrent logins

## Recommendations

### Immediate Actions
1. Fix failing integration tests
2. Add missing rate limiting
3. Increase test coverage to 80%

### Before Production
1. Move refresh tokens to secure storage
2. Add token validation caching
3. Complete manual testing checklist

### Nice to Have
1. Add OAuth provider abstraction
2. Implement token rotation
3. Add audit logging

## Summary

**Overall Status**: **Mostly Complete**

The implementation follows the plan well with some positive deviations. However, there are 2 failing tests and missing rate limiting that must be addressed before this can be considered complete.

**Next Steps**:
1. Fix the failing integration tests
2. Implement rate limiting (Phase 2 requirement)
3. Complete manual testing
4. Address security recommendations
```

Save to: `validation/YYYY-MM-DD_HH-MM-SS_planname.md`
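One way to build that file name is sketched below. `report_path` is a hypothetical helper, and lowercasing the plan name and replacing unusual characters with hyphens is an assumption; the skill only specifies the `YYYY-MM-DD_HH-MM-SS_planname.md` shape.

```python
import re
from datetime import datetime
from pathlib import Path


def report_path(plan_file, now=None, out_dir="validation"):
    """Build validation/YYYY-MM-DD_HH-MM-SS_planname.md for a given plan file."""
    now = now or datetime.now()
    # Assumption: derive "planname" from the plan's file stem, made filesystem-friendly.
    plan_name = re.sub(r"[^A-Za-z0-9_-]+", "-", Path(plan_file).stem).strip("-").lower()
    return Path(out_dir) / f"{now:%Y-%m-%d_%H-%M-%S}_{plan_name}.md"


if __name__ == "__main__":
    path = report_path("plans/oauth-login.md")
    path.parent.mkdir(parents=True, exist_ok=True)  # create validation/ if needed
    print(path)
```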

## Interactive Mode

After presenting the report, ask:

```
Would you like me to:
1. Help fix the failing tests?
2. Implement the missing features?
3. Run specific additional checks?
4. Generate a checklist for manual testing?
```

## Best Practices

1. **Be thorough but practical** - Focus on what matters
2. **Run all automated checks** - Don't skip verification commands
3. **Think critically** - Question if the implementation truly solves the problem
4. **Consider maintenance** - Will this be maintainable long-term?
5. **Be constructive** - Provide actionable recommendations

## Validation Checklist

Always verify:
- [ ] All phases marked complete are actually done
- [ ] Automated tests pass
- [ ] Code follows project conventions
- [ ] No regressions introduced
- [ ] Error handling is robust
- [ ] Performance is acceptable
- [ ] Security considerations addressed
- [ ] Documentation updated if needed

## Common Validation Commands

```bash
# JavaScript/TypeScript
npm test
npm run lint
npm run typecheck
npm run build

# Python
pytest
python -m pytest --cov
flake8
mypy

# Go
go test ./...
go vet ./...
golangci-lint run

# Rust
cargo test
cargo clippy
cargo fmt --check

# Generic
make test
make check
make lint
```
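Choosing which of these commands apply can be sketched by looking for well-known marker files. The marker-to-command mapping is an assumption about typical project layouts, and `suggest_commands` is a hypothetical name. The commands are the ones listed above, and a plan's own "Automated Verification" section should take precedence.

```python
from pathlib import Path

# Assumption: these marker files identify the toolchain in use.
COMMANDS_BY_MARKER = {
    "package.json": ["npm test", "npm run lint", "npm run typecheck", "npm run build"],
    "pyproject.toml": ["pytest", "flake8", "mypy"],
    "go.mod": ["go test ./...", "go vet ./...", "golangci-lint run"],
    "Cargo.toml": ["cargo test", "cargo clippy", "cargo fmt --check"],
    "Makefile": ["make test", "make check", "make lint"],
}


def suggest_commands(project_root="."):
    """Return the validation commands whose marker file exists in project_root."""
    root = Path(project_root)
    commands = []
    for marker, marker_commands in COMMANDS_BY_MARKER.items():
        if (root / marker).is_file():
            commands.extend(marker_commands)
    return commands
```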

## When Validation Fails

If validation reveals issues:

1. **Categorize by severity**:
   - Blockers: Must fix before merge
   - Important: Should fix soon
   - Nice to have: Can be addressed later

2. **Provide fixes when possible**:
   - For simple issues, suggest the fix
   - For complex issues, outline an approach

3. **Update the plan**:
   - Add checkboxes for fixes needed
   - Note which phases need rework
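The severity triage can be sketched as a simple grouping. The `Finding` type, the severity keys, and the rule that only blockers prevent a merge are illustrative assumptions based on the three categories above.

```python
from dataclasses import dataclass

# Severity keys mirror the three categories above, most severe first.
SEVERITIES = ("blocker", "important", "nice-to-have")


@dataclass
class Finding:
    title: str
    severity: str  # one of SEVERITIES


def group_by_severity(findings):
    """Group finding titles under each severity, preserving severity order."""
    grouped = {severity: [] for severity in SEVERITIES}
    for finding in findings:
        grouped[finding.severity].append(finding.title)
    return grouped


def ready_to_merge(findings):
    """Assumption: only blockers must be fixed before merge."""
    return not any(finding.severity == "blocker" for finding in findings)
```

For example, a missing Phase 2 requirement would be filed as a blocker, while an optional refactor would go under nice-to-have.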

Remember: Good validation catches issues before they reach production. Be constructive but thorough in identifying gaps or improvements.
