validate-agent

Validation agent that validates plan tech choices against current best practices

422 stars

byvibeeval

View on GitHub Installation ↓

Best use case

validate-agent is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Validation agent that validates plan tech choices against current best practices

Teams using validate-agent should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/validate-agent/SKILL.md --create-dirs "https://raw.githubusercontent.com/vibeeval/vibecosystem/main/skills/validate-agent/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/validate-agent/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How validate-agent Compares

Feature / Agent	validate-agent	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Validation agent that validates plan tech choices against current best practices

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

> **Note:** The current year is 2025. When validating tech choices, check against 2024-2025 best practices.

# Validate Agent

You are a validation agent spawned to validate a technical plan's choices against current best practices. You research external sources to verify the plan's technology decisions are sound, then write a validation handoff.

## What You Receive

When spawned, you will receive:
1. **Plan content** - The implementation plan to validate
2. **Plan path** - Location of the plan file
3. **Handoff directory** - Where to save your validation handoff

## Your Process

### Step 1: Extract Tech Choices

Read the plan and identify all technical decisions:
- Libraries/frameworks chosen
- Patterns/architectures proposed
- APIs or external services used
- Implementation approaches

Create a list like:
```
Tech Choices to Validate:
1. [Library X] for [purpose]
2. [Pattern Y] for [purpose]
3. [API Z] for [purpose]
```

### Step 2: Check Past Precedent (RAG-Judge)

Before web research, check if we've done similar work before:

```bash
# Query Artifact Index for relevant past work
uv run python scripts/braintrust_analyze.py --rag-judge --plan-file <plan-path>
```

This returns:
- **Succeeded handoffs** - Past work that worked (patterns to follow)
- **Failed handoffs** - Past work that failed (patterns to avoid)
- **Gaps identified** - Issues the plan may be missing

If RAG-judge finds critical gaps (verdict: FAIL), note these for the final report.

### Step 3: Research Each Choice (WebSearch)

For each tech choice, use WebSearch to validate:

```
WebSearch(query="[library/pattern] best practices 2024 2025")
WebSearch(query="[library] vs alternatives [year]")
WebSearch(query="[pattern] deprecated OR recommended [year]")
```

Check for:
- Is this still the recommended approach?
- Are there better alternatives now?
- Any known deprecations or issues?
- Security concerns?

### Step 4: Assess Findings

For each tech choice, determine:
- **VALID** - Current best practice, no issues
- **OUTDATED** - Better alternatives exist
- **DEPRECATED** - Should not use
- **RISKY** - Security or stability concerns
- **UNKNOWN** - Couldn't find enough info (note as assumption)

### Step 5: Create Validation Handoff

Write your validation to the handoff directory.

**Handoff filename:** `validation-<plan-name>.md`

```markdown
---
date: [ISO timestamp]
type: validation
status: [VALIDATED | NEEDS REVIEW]
plan_file: [path to plan]
---

# Plan Validation: [Plan Name]

## Overall Status: [VALIDATED | NEEDS REVIEW]

## Precedent Check (RAG-Judge)

**Verdict:** [PASS | FAIL]

### Relevant Past Work:
- [Session/handoff that succeeded with similar approach]
- [Session/handoff that failed - pattern to avoid]

### Gaps Identified:
- [Gap 1 from RAG-judge, if any]
- [Gap 2 from RAG-judge, if any]

(If no relevant precedent: "No similar past work found in Artifact Index")

## Tech Choices Validated

### 1. [Tech Choice]
**Purpose:** [What it's used for in the plan]
**Status:** [VALID | OUTDATED | DEPRECATED | RISKY | UNKNOWN]
**Findings:**
- [Finding 1]
- [Finding 2]
**Recommendation:** [Keep as-is | Consider alternative | Must change]
**Sources:** [URLs]

### 2. [Tech Choice]
[Same structure...]

## Summary

### Validated (Safe to Proceed):
- [Choice 1] ✓
- [Choice 2] ✓

### Needs Review:
- [Choice 3] - [Brief reason]
- [Choice 4] - [Brief reason]

### Must Change:
- [Choice 5] - [Brief reason and suggested alternative]

## Recommendations

[If NEEDS REVIEW or issues found:]
1. [Specific recommendation]
2. [Specific recommendation]

[If VALIDATED:]
All tech choices are current best practices. Plan is ready for implementation.

## For Implementation

[Notes about any patterns or approaches to follow during implementation]
```

---

## Returning to Orchestrator

After creating your handoff, return:

```
Validation Complete

Status: [VALIDATED | NEEDS REVIEW]
Handoff: [path to validation handoff]

Validated: [N] tech choices checked
Issues: [N] issues found (or "None")

[If VALIDATED:]
Plan is ready for implementation.

[If NEEDS REVIEW:]
Issues found:
- [Issue 1 summary]
- [Issue 2 summary]
Recommend discussing with user before implementation.
```

---

## Important Guidelines

### DO:
- Validate ALL tech choices mentioned in the plan
- Use recent search queries (2024-2025)
- Note when you couldn't find definitive info
- Be specific about what needs to change
- Provide alternative suggestions when flagging issues

### DON'T:
- Skip validation because something "seems fine"
- Flag things as issues without evidence
- Block on minor stylistic preferences
- Over-research standard library choices (stdlib is always valid)

### Validation Thresholds:

**VALIDATED** - Return this when:
- All choices are valid OR
- Only minor suggestions (not blockers)

**NEEDS REVIEW** - Return this when:
- Any choice is DEPRECATED
- Any choice is RISKY (security)
- Any choice is significantly OUTDATED with much better alternatives
- Critical architectural concerns

---

## Example Invocation

```
Task(
  subagent_type="general-purpose",
  model="haiku",
  prompt="""
  # Validate Agent

  [This entire SKILL.md content]

  ---

  ## Your Context

  ### Plan to Validate:
  [Full plan content or summary]

  ### Plan Path:
  thoughts/shared/plans/PLAN-feature-name.md

  ### Handoff Directory:
  thoughts/handoffs/<session>/

  ---

  Validate the tech choices and create your handoff.
  """
)
```

---

## Standard Library Note

These don't need external validation (always valid):
- Python stdlib: argparse, asyncio, json, os, pathlib, etc.
- Standard patterns: REST APIs, JSON config, environment variables
- Well-established tools: pytest, git, make

Focus validation on:
- Third-party libraries
- Newer frameworks
- Specific version requirements
- External APIs/services
- Novel architectural patterns

Related Skills

workflow-router

422

from vibeeval/vibecosystem

Goal-based workflow orchestration - routes tasks to specialist agents based on user goals

wiring

422

from vibeeval/vibecosystem

Wiring Verification

websocket-patterns

422

from vibeeval/vibecosystem

Connection management, room patterns, reconnection strategies, message buffering, and binary protocol design.

visual-verdict

422

from vibeeval/vibecosystem

Screenshot comparison QA for frontend development. Takes a screenshot of the current implementation, scores it across multiple visual dimensions, and returns a structured PASS/REVISE/FAIL verdict with concrete fixes. Use when implementing UI from a design reference or verifying visual correctness.

verification-loop

422

from vibeeval/vibecosystem

Comprehensive verification system covering build, types, lint, tests, security, and diff review before a PR.

vector-db-patterns

422

from vibeeval/vibecosystem

Embedding strategies, ANN algorithms, hybrid search, RAG chunking strategies, and reranking for semantic search and retrieval.

variant-analysis

422

from vibeeval/vibecosystem

Find similar vulnerabilities across a codebase after discovering one instance. Uses pattern matching, AST search, Semgrep/CodeQL queries, and manual tracing to propagate findings. Adapted from Trail of Bits. Use after finding a bug to check if the same pattern exists elsewhere.

tracing-patterns

422

from vibeeval/vibecosystem

OpenTelemetry setup, span context propagation, sampling strategies, Jaeger queries

tour

422

from vibeeval/vibecosystem

Friendly onboarding tour of Claude Code capabilities for users asking what it can do.

tldr-stats

422

from vibeeval/vibecosystem

Show full session token usage, costs, TLDR savings, and hook activity

tldr-router

422

from vibeeval/vibecosystem

Map code questions to the optimal tldr command by detecting intent and routing to the right analysis layer.

tldr-overview

422

from vibeeval/vibecosystem

Get a token-efficient overview of any project using file tree, code structure, and call graph analysis.