validator-expert

Validate production readiness of Vertex AI Agent Engine deployments across security, monitoring, performance, compliance, and best practices. Generates weighted scores (0-100%) with actionable remediation plans. Use when asked to validate a deployment, run a production readiness check, audit security posture, or verify compliance for Vertex AI agents. Trigger with "validate deployment", "production readiness", "security audit", "compliance check", "is this agent ready for prod", "check my ADK agent", "review before deploy", or "production readiness check". Make sure to use this skill whenever validating ADK agents for Agent Engine.

Best use case

validator-expert is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using validator-expert should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/validator-expert/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/jeremylongshore/claude-code-plugins-plus-skills/validator-expert/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/validator-expert/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How validator-expert Compares

| Feature / Agent | validator-expert | Standard Approach |
|-----------------|------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It validates the production readiness of Vertex AI Agent Engine deployments across security, monitoring, performance, compliance, and best practices, and produces a weighted 0-100% score with an actionable remediation plan.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Validator Expert

## Current State
!`gcloud config get-value project 2>/dev/null || echo 'no active project'`
!`gcloud auth list --filter=status:ACTIVE --format="value(account)" 2>/dev/null || echo 'not authenticated'`

## Overview

Validate production readiness of Vertex AI Agent Engine deployments by executing weighted checks across five categories: security (30 points), monitoring (20 points), performance (25 points), compliance (15 points), and best practices (10 points). This skill produces a 0-100% composite score with pass/fail per check and prioritized remediation recommendations.
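
The category weights above can be combined into a composite score in several ways. The following is a minimal sketch, assuming each category's points are awarded in proportion to the share of its checks that pass; the skill does not state how points are split across individual checks, and the names `CATEGORY_WEIGHTS`, `category_points`, and `composite_score` are hypothetical.

```python
# Category weights taken from the overview; they sum to 100 points.
CATEGORY_WEIGHTS = {
    "security": 30,
    "monitoring": 20,
    "performance": 25,
    "compliance": 15,
    "best_practices": 10,
}


def category_points(category: str, results: list[bool]) -> float:
    """Award a category's points in proportion to its passing checks.

    Proportional allocation is an assumption, not part of the skill.
    """
    if not results:
        return 0.0
    return CATEGORY_WEIGHTS[category] * sum(results) / len(results)


def composite_score(results_by_category: dict[str, list[bool]]) -> float:
    """Sum the per-category points into a 0-100 composite score."""
    return sum(
        category_points(category, results_by_category.get(category, []))
        for category in CATEGORY_WEIGHTS
    )


# Hypothetical pass/fail results for illustration only.
example = {
    "security": [True, True, False],  # 2 of 3 pass -> 20.0 of 30
    "monitoring": [True, True],       # 20.0 of 20
    "performance": [True, False],     # 12.5 of 25
    "compliance": [True],             # 15.0 of 15
    "best_practices": [False],        # 0.0 of 10
}
print(f"{composite_score(example):.1f}%")  # prints "67.5%"
```

Because the weights sum to 100, the summed points can be read directly as a percentage.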

## Prerequisites

- `gcloud` CLI authenticated with `roles/aiplatform.viewer`, `roles/iam.securityReviewer`, and `roles/monitoring.viewer`
- Access to the target Google Cloud project and Vertex AI Agent Engine deployment
- Cloud Monitoring API and Cloud Logging API enabled in the project
- Knowledge of the deployment's expected SLOs (latency targets, error rate thresholds)
- Read-only access to IAM policies, VPC-SC configurations, and service account bindings

## Instructions

1. Retrieve the deployment configuration using the Python SDK (`vertexai.Client().agent_engines.get(name)`) or REST API (`GET https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT}/locations/{LOCATION}/reasoningEngines/{ID}`) and parse model, scaling, and feature settings
2. Run the security validation suite (see [security checklist](references/security-checklist.md)):
   - Check if Agent Identity is enabled (recommended over service accounts for 2025+ deployments)
   - If using service accounts, verify IAM roles follow least-privilege (`roles/aiplatform.expressUser`, not `roles/aiplatform.admin`)
   - Confirm VPC Service Controls perimeter is active and correctly scoped
   - Check encryption at rest (CMEK or Google-managed) and in-transit (TLS 1.3)
   - Scan configuration files and environment variables for hardcoded secrets
   - Validate Model Armor is enabled with `roles/modelarmor.user` granted
   - Check Memory Bank IAM Conditions for multi-tenant agents
3. Run the monitoring validation suite:
   - Verify Cloud Monitoring dashboards exist with required panels (request count, error rate, latency)
   - Confirm alerting policies cover error rate spikes, latency SLO breaches, and cost thresholds
   - Check token usage tracking is enabled with per-model granularity
   - Validate structured logging with severity levels and correlation IDs
   - Confirm latency SLOs are defined with p95 and p99 targets
4. Run the performance validation suite:
   - Verify auto-scaling is configured with appropriate min/max instance counts
   - Check resource limits (CPU, memory) match expected workload profile
   - Confirm caching strategy is implemented for repeated prompts or embeddings
   - Validate Code Execution Sandbox TTL is set between 7 and 14 days
   - Check Memory Bank retention policy (min 100 memories, auto-cleanup enabled)
5. Run the compliance validation suite:
   - Confirm audit logging is enabled for all admin and data access operations
   - Verify data residency meets regional requirements
   - Check privacy policies and data retention schedules
   - Validate backup and disaster recovery configuration
6. Calculate weighted scores per category and compute the overall production readiness percentage
7. Generate a prioritized recommendation list sorted by score impact per remediation effort
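
Step 7 ranks remediation items by score impact per unit of effort. One way to sketch that ordering is shown below, assuming effort is estimated in hours; the `Remediation` class, the `prioritize` function, and all sample figures are hypothetical and not defined by the skill.

```python
from dataclasses import dataclass


@dataclass
class Remediation:
    action: str
    score_gain: float    # points recovered if the check passes (estimate)
    effort_hours: float  # estimated effort; must be greater than zero


def prioritize(items: list[Remediation]) -> list[Remediation]:
    """Sort by points gained per hour of effort, highest ratio first."""
    return sorted(
        items,
        key=lambda item: item.score_gain / item.effort_hours,
        reverse=True,
    )


# Hypothetical backlog; the gains and efforts are illustrative only.
backlog = [
    Remediation("Enable Data Access audit logs", score_gain=5.0, effort_hours=1.0),
    Remediation("Scope the VPC-SC perimeter", score_gain=8.0, effort_hours=8.0),
    Remediation("Add latency SLO alerting policy", score_gain=4.0, effort_hours=2.0),
]

for item in prioritize(backlog):
    ratio = item.score_gain / item.effort_hours
    print(f"{ratio:.2f} pts/h  {item.action}")
```

With these sample values, the audit-log fix comes first (5.00 pts/h) and the perimeter work last (1.00 pts/h), even though the perimeter work recovers the most points in absolute terms.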

## Output

- Production readiness score: 0-100% with status (READY >= 85%, NEEDS WORK 70% to < 85%, NOT READY < 70%)
- Per-category breakdown: security (x/30), monitoring (x/20), performance (x/25), compliance (x/15), best practices (x/10)
- Pass/fail table for each individual check with evidence notes
- Prioritized remediation plan: action items ranked by score improvement per effort
- Comparison to previous validation run (if available) showing score delta
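
The status bands and the run-to-run comparison listed above could be derived as in the sketch below. It assumes fractional scores between 84% and 85% count as NEEDS WORK, which the skill does not state explicitly; `readiness_status` and `score_delta` are hypothetical helper names.

```python
from typing import Optional


def readiness_status(score: float) -> str:
    """Map a 0-100 score to the status bands from the output spec."""
    if score >= 85:
        return "READY"
    if score >= 70:
        return "NEEDS WORK"
    return "NOT READY"


def score_delta(current: float, previous: Optional[float]) -> Optional[float]:
    """Return the change since the previous run, or None if there was none."""
    if previous is None:
        return None
    return round(current - previous, 1)


print(readiness_status(88.0), score_delta(88.0, 81.5))  # prints "READY 6.5"
print(readiness_status(69.9), score_delta(69.9, None))  # prints "NOT READY None"
```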

## Error Handling

| Error | Cause | Solution |
|-------|-------|----------|
| Insufficient IAM permissions | Viewer roles not granted on target project | Request `roles/aiplatform.viewer` and `roles/iam.securityReviewer` from project admin |
| Agent deployment not found | Incorrect agent ID or deployment deleted | Verify agent ID with `vertexai.Client().agent_engines.list()` or REST `GET .../reasoningEngines`; confirm deployment region |
| Monitoring API returns no data | API not enabled or agent has zero traffic | Enable Monitoring API; generate synthetic traffic to populate baseline metrics |
| VPC-SC configuration inaccessible | Organization policy restricts VPC-SC reads | Request `roles/accesscontextmanager.policyReader` at organization level |
| Compliance check inconclusive | Audit logs not enabled or retention too short | Enable Data Access audit logs; set log retention to minimum 365 days |

## Examples

**Scenario 1: Pre-Launch Validation** -- Validate a new ADK agent before production launch. Run all five validation categories. Target score: 85%+ overall, with security score at 28/30 minimum. Generate remediation plan for any failing checks.

**Scenario 2: Post-Incident Security Audit** -- After a permission escalation incident, re-validate security posture. Focus on IAM least-privilege, service account bindings, and VPC-SC perimeter integrity. Compare scores against the last passing validation.

**Scenario 3: Quarterly Compliance Review** -- Execute compliance and monitoring validation suites for SOC 2 audit preparation. Verify audit logging coverage, data residency compliance, and backup/DR configuration. Export results as evidence artifacts.
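
The Scenario 1 targets (85%+ overall with security at 28/30 minimum) can be expressed as a simple launch gate. This is an illustrative sketch only: `prelaunch_gate` is a hypothetical function, and the sample points are made up to show a run that scores well overall but still fails the security floor.

```python
def prelaunch_gate(
    category_points: dict[str, float],
    overall_target: float = 85.0,
    security_minimum: float = 28.0,
) -> tuple[bool, list[str]]:
    """Return (passed, reasons) for the Scenario 1 launch criteria."""
    reasons: list[str] = []
    # Category points sum to at most 100, so the total is the overall score.
    overall = sum(category_points.values())
    if overall < overall_target:
        reasons.append(f"overall {overall:.1f} is below {overall_target:.1f}")
    security = category_points.get("security", 0.0)
    if security < security_minimum:
        reasons.append(
            f"security {security:.1f}/30 is below {security_minimum:.1f}/30"
        )
    return (not reasons, reasons)


# Hypothetical run: overall is 97.0, but security misses its floor.
points = {
    "security": 27.0,
    "monitoring": 20.0,
    "performance": 25.0,
    "compliance": 15.0,
    "best_practices": 10.0,
}
passed, reasons = prelaunch_gate(points)
print("PASS" if passed else "BLOCKED", reasons)
```

Keeping the security floor separate from the overall target means a strong total cannot mask a weak security score.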

## Resources

**Validation checklists** (read the relevant one during each validation step):
- [Security checklist](references/security-checklist.md) — IAM, VPC-SC, encryption, Model Armor (30% weight)
- [Monitoring checklist](references/monitoring-checklist.md) — dashboards, alerts, SLOs, logging (20% weight)
- [Performance & compliance checklist](references/performance-compliance-checklist.md) — auto-scaling, caching, audit logs, DR (40% weight)

**Official Google Cloud documentation:**
- [Vertex AI Security Best Practices](https://cloud.google.com/vertex-ai/docs/security)
- [Cloud Monitoring Alerting](https://cloud.google.com/monitoring/alerts)
- [VPC Service Controls](https://cloud.google.com/vpc-service-controls/docs)
- [Model Armor](https://cloud.google.com/vertex-ai/docs/generative-ai/model-armor)
- [Cloud Audit Logs](https://cloud.google.com/logging/docs/audit)

Related Skills

yaml-config-validator

25
from ComeOnOliver/skillshub

YAML Config Validator - Auto-activating skill for DevOps Basics. Triggers on: yaml config validator. Part of the DevOps Basics skill category.

webhook-signature-validator

25
from ComeOnOliver/skillshub

Webhook Signature Validator - Auto-activating skill for API Integration. Triggers on: webhook signature validator. Part of the API Integration skill category.

vertex-infra-expert

25
from ComeOnOliver/skillshub

Terraform infrastructure specialist for Vertex AI services and Gemini deployments. Provisions Model Garden, endpoints, vector search, pipelines, and enterprise AI infrastructure. Triggers: "vertex ai terraform", "gemini deployment terraform", "model garden infrastructure", "vertex ai endpoints"

schema-validator

25
from ComeOnOliver/skillshub

Schema Validator - Auto-activating skill for Data Pipelines. Triggers on: schema validator. Part of the Data Pipelines skill category.

request-validator-generator

25
from ComeOnOliver/skillshub

Request Validator Generator - Auto-activating skill for Backend Development. Triggers on: request validator generator. Part of the Backend Development skill category.

request-body-validator

25
from ComeOnOliver/skillshub

Request Body Validator - Auto-activating skill for API Development. Triggers on: request body validator. Part of the API Development skill category.

plugin-validator

25
from ComeOnOliver/skillshub

Automatically validates AI assistant code plugin structure, schemas, and compliance when the user mentions "validate plugin", "check plugin", or plugin errors. Runs comprehensive validation specific to the AI assistant-code-plugins repository standards. Use when validating configurations or code. Trigger with phrases like 'validate', 'check', or 'verify'.

jwt-token-validator

25
from ComeOnOliver/skillshub

JWT Token Validator - Auto-activating skill for Security Fundamentals. Triggers on: jwt token validator. Part of the Security Fundamentals skill category.

gh-actions-validator

25
from ComeOnOliver/skillshub

Automatically validates and enforces GitHub Actions best practices for Vertex AI and Google Cloud deployments. Expert in Workload Identity Federation (WIF), Vertex AI Agent Engine deployment pipelines, security validation, and CI/CD automation. Triggers: "create github actions", "deploy vertex ai", "setup wif", "validate github workflow", "gcp deployment pipeline"

genkit-production-expert

25
from ComeOnOliver/skillshub

Build production Firebase Genkit applications including RAG systems, multi-step flows, and tool calling for Node.js/Python/Go. Deploy to Firebase Functions or Cloud Run with AI monitoring. Use when asked to "create genkit flow" or "implement RAG". Trigger with relevant phrases based on skill purpose.

genkit-infra-expert

25
from ComeOnOliver/skillshub

Terraform infrastructure specialist for deploying Genkit applications to production. Provisions Firebase Functions, Cloud Run services, GKE clusters, monitoring, and CI/CD for Genkit AI workflows. Triggers: "deploy genkit terraform", "genkit infrastructure", "firebase functions terraform", "cloud run genkit"

gcp-examples-expert

25
from ComeOnOliver/skillshub

Generate production-ready Google Cloud code examples from official repositories including ADK samples, Genkit templates, Vertex AI notebooks, and Gemini patterns. Use when asked to "show ADK example" or "provide GCP starter kit". Trigger with relevant phrases based on skill purpose.