agent-architecture-evaluator
Use when evaluating, testing, and optimizing an agent architecture or multi-agent system. Best for reviewing planning, routing, memory, tool use, reliability, observability, cost, and system-level failure modes.
Best use case
agent-architecture-evaluator is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using agent-architecture-evaluator should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/agent-architecture-evaluator/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
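If you install the skill in several projects, a small script can repeat the copy step. This is a minimal Python sketch that assumes you have already downloaded SKILL.md to a local path; `install_skill` is a hypothetical helper, not something the skill ships, and it only places the file at the path the manual steps give. Restarting the agent is still up to you.

```python
import shutil
from pathlib import Path

# Target location taken from the manual installation steps.
SKILL_DIR = Path(".claude") / "skills" / "agent-architecture-evaluator"


def install_skill(downloaded_skill_md: Path, project_root: Path) -> Path:
    """Copy a locally downloaded SKILL.md into the project's skill directory (hypothetical helper)."""
    if not downloaded_skill_md.is_file():
        raise FileNotFoundError(f"SKILL.md not found at {downloaded_skill_md}")
    target_dir = project_root / SKILL_DIR
    # Create .claude/skills/agent-architecture-evaluator/ if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "SKILL.md"
    shutil.copyfile(downloaded_skill_md, target)
    return target
```

For example, `install_skill(Path("~/Downloads/SKILL.md").expanduser(), Path.cwd())` run from the project root places the file, after which the agent needs a restart to pick it up.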
How agent-architecture-evaluator Compares
| Feature / Agent | agent-architecture-evaluator | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It evaluates, tests, and helps optimize an agent architecture or multi-agent system, reviewing planning, routing, memory, tool use, reliability, observability, cost, and system-level failure modes.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Agent Architecture Evaluator

Version: `1.0.0`

## Overview

This skill reviews the architecture of an agent system, not just its prompts or its attached skills. Use it for architectures involving components such as:

- planner / executor splits
- routers and specialists
- tool-use layers
- memory systems
- human approval gates
- multi-agent coordination

## Use this skill when

- A user wants to assess an existing agent architecture.
- Reliability, latency, cost, or coordination problems appear to be architectural.
- A team needs a structured architecture review and optimization roadmap.
- You need system-level test scenarios rather than single-skill evals.

## Do not use this skill when

- The problem is one isolated skill.
- The task is to create a new skill from scratch.
- The main need is portfolio review across many related skills.

Use `agent-test-measure-refine` or `agent-skill-portfolio-evaluator` in those cases.

## Output contract

Always produce these named outputs:

- `architecture_inventory`
- `failure_mode_map`
- `architecture_test_plan`
- `optimization_roadmap`
- `measurement_plan`
- `architecture_recommendation`

## Review dimensions

Evaluate at least these dimensions:

1. `component clarity`
2. `routing correctness`
3. `memory usefulness`
4. `coordination reliability`
5. `cost and latency efficiency`
6. `observability and debuggability`

## Quick start

1. Map the current architecture.
2. Identify critical paths and failure-prone handoffs.
3. Define architecture-level test scenarios.
4. Identify bottlenecks in routing, memory, tools, or coordination.
5. Recommend the smallest structural changes with the highest leverage.

## Workflow

### 1. Build the architecture inventory

Capture:

- components
- responsibilities
- inputs and outputs
- state or memory boundaries
- human approval points
- observability signals

### 2. Map failure modes

Look for:

- planner produces unusable tasks
- router sends work to the wrong specialist
- memory pollutes current decisions
- tool calls are slow, redundant, or poorly validated
- multi-agent handoffs lose context
- approval gates appear too late

### 3. Design system tests

Cover:

- happy path
- degraded upstream input
- partial component failure
- tool unavailability
- stale or noisy memory
- high-latency coordination
- rollback or recovery behavior

See `references/architecture-review-framework-v1.0.0.md`.

### 4. Prioritize architectural changes

Prefer:

- clarifying responsibilities before adding components
- removing weak indirection
- tightening interface contracts
- adding observability before adding complexity
- isolating state when cross-contamination is likely

### 5. Define measurement

Recommend concrete metrics where available:

- task success rate
- retry rate
- fallback rate
- cost per successful task
- latency by stage
- human intervention rate

## Anti-patterns

- adding new components to hide unclear ownership
- keeping weak memory because it sounds sophisticated
- optimizing one stage without measuring system impact
- blaming prompts for structural routing failures

## Resources

- `references/architecture-review-framework-v1.0.0.md` for system review steps.
- `references/optimization-patterns-v1.0.0.md` for architecture optimization guidance.
- `assets/architecture-review-template.md` for the final report structure.
- `assets/example-architecture-review.md` for a realistic filled review.
- `assets/architecture-input-example.json` for structured input.
- `scripts/render_architecture_review.py` to normalize a structured architecture review into Markdown.
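The "Define measurement" step of the SKILL.md names six metrics. As a minimal sketch, assuming each end-to-end task run is logged as one record with the fields below, they could be computed as follows. The `TaskRun` schema and `measurement_summary` helper are hypothetical, not part of the skill's bundled files, and two definitions are assumptions: retry rate is the share of runs with at least one retry, and cost per successful task charges all spend, including failed runs, to the successes.

```python
from dataclasses import dataclass, field
from statistics import median


@dataclass
class TaskRun:
    """One end-to-end task execution (hypothetical trace schema)."""
    succeeded: bool
    retries: int              # retried steps within this run
    used_fallback: bool       # whether any fallback path was taken
    human_intervened: bool    # whether a human approval or override was needed
    cost_usd: float
    stage_latency_s: dict[str, float] = field(default_factory=dict)


def measurement_summary(runs: list[TaskRun]) -> dict:
    """Aggregate run records into the metrics named in the measurement plan."""
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs)
    successes = sum(r.succeeded for r in runs)
    total_cost = sum(r.cost_usd for r in runs)
    stages = sorted({s for r in runs for s in r.stage_latency_s})
    return {
        "task_success_rate": successes / n,
        "retry_rate": sum(r.retries > 0 for r in runs) / n,
        "fallback_rate": sum(r.used_fallback for r in runs) / n,
        "human_intervention_rate": sum(r.human_intervened for r in runs) / n,
        # All spend, failed runs included, divided by the number of successes.
        "cost_per_successful_task": total_cost / successes if successes else float("inf"),
        # Median latency per stage, over the runs that reached that stage.
        "median_latency_by_stage_s": {
            s: median(r.stage_latency_s[s] for r in runs if s in r.stage_latency_s)
            for s in stages
        },
    }
```

Charging failed runs to the successes makes this metric rise when reliability drops, which is usually what an architecture review wants to surface; a per-run average would hide that.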
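For the `architecture_test_plan` output, one way to make sure no scenario category from "Design system tests" is skipped is to cross every critical path with every category and prune irrelevant pairs afterwards. This is a sketch only; the record format (`id`, `path`, `scenario`) and the example path strings are hypothetical illustrations, not a format the skill prescribes.

```python
from itertools import product

# Scenario categories listed under "Design system tests" in the SKILL.md.
SCENARIOS = [
    "happy path",
    "degraded upstream input",
    "partial component failure",
    "tool unavailability",
    "stale or noisy memory",
    "high-latency coordination",
    "rollback or recovery behavior",
]


def architecture_test_plan(critical_paths: list[str]) -> list[dict[str, str]]:
    """Cross every critical path with every scenario category (hypothetical format)."""
    return [
        {"id": f"T{i:03d}", "path": path, "scenario": scenario}
        for i, (path, scenario) in enumerate(product(critical_paths, SCENARIOS), start=1)
    ]
```

For instance, `architecture_test_plan(["user -> router -> specialist", "planner -> executor -> tool"])` yields 14 candidate cases; a path with no memory component can then drop its "stale or noisy memory" case explicitly, so the omission is recorded rather than forgotten.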
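The output contract requires six named outputs, and the bundled `scripts/render_architecture_review.py` normalizes a structured review into Markdown. That script's input schema is not shown here, so the following is only a hypothetical stand-in, assuming the review is a flat dict keyed by the six output names, with list values rendered as bullets and anything else as a paragraph. It refuses to render a review that breaks the contract.

```python
# The six named outputs from the SKILL.md output contract, in report order.
REQUIRED_OUTPUTS = (
    "architecture_inventory",
    "failure_mode_map",
    "architecture_test_plan",
    "optimization_roadmap",
    "measurement_plan",
    "architecture_recommendation",
)


def render_review(review: dict[str, object]) -> str:
    """Render a review dict to Markdown, failing if any required output is missing (hypothetical)."""
    missing = [key for key in REQUIRED_OUTPUTS if key not in review]
    if missing:
        raise ValueError(f"review is missing required outputs: {', '.join(missing)}")
    lines = ["# Architecture Review", ""]
    for key in REQUIRED_OUTPUTS:
        lines += [f"## {key}", ""]
        value = review[key]
        if isinstance(value, list):
            lines.extend(f"- {item}" for item in value)  # list values become bullets
        else:
            lines.append(str(value))  # everything else becomes one paragraph
        lines.append("")
    return "\n".join(lines)
```

Validating before rendering keeps an incomplete review from looking finished; extra keys in the dict are ignored rather than rendered.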
Related Skills
Agent Memory Architecture
Complete zero-dependency memory system for AI agents — file-based architecture, daily notes, long-term curation, context management, heartbeat integration, and memory hygiene. No APIs, no databases, no external tools. Works with any agent framework.
project-evaluator
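Describe a project idea and the AI systematically evaluates it across four dimensions (market, technology, business, and risk), producing an evaluation report, a quick competitor scan, and MVP recommendations to help you decide whether it is worth doing.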
langgraph-architecture
Guides architectural decisions for LangGraph applications. Use when deciding between LangGraph vs alternatives, choosing state management strategies, designing multi-agent systems, or selecting persistence and streaming approaches.
deepagents-architecture
Guides architectural decisions for Deep Agents applications. Use when deciding between Deep Agents vs alternatives, choosing backend strategies, designing subagent systems, or selecting middleware approaches.
agent-architecture-analysis
Perform 12-Factor Agents compliance analysis on any codebase. Use when evaluating agent architecture, reviewing LLM-powered systems, or auditing agentic applications against the 12-Factor methodology.
tech-stack-evaluator
Technology stack evaluation and comparison with TCO analysis, security assessment, and ecosystem health scoring. Use when comparing frameworks, evaluating technology stacks, calculating total cost of ownership, assessing migration paths, or analyzing ecosystem viability.
site-architecture
When the user wants to audit, redesign, or plan their website's structure, URL hierarchy, navigation design, or internal linking strategy. Use when the user mentions 'site architecture,' 'URL structure,' 'internal links,' 'site navigation,' 'breadcrumbs,' 'topic clusters,' 'hub pages,' 'orphan pages,' 'silo structure,' 'information architecture,' or 'website reorganization.' Also use when someone has SEO problems and the root cause is structural (not content or schema). NOT for content strategy decisions about what to write (use content-strategy) or for schema markup (use schema-markup).
llm-evaluator
LLM-as-a-Judge evaluation system using Langfuse. Score AI outputs on relevance, accuracy, hallucination, and helpfulness. Backfill scoring on historical traces. Uses GPT-5-nano for cost-efficient judging. Use when evaluating AI quality, building evals, or monitoring output accuracy.
architecture-governance-assessment
Architecture governance and assessment tool. Evaluate cloud architectures against best practices and generate actionable improvement reports.
react-flow-architecture
Architectural guidance for building node-based UIs with React Flow. Use when designing flow-based applications, making decisions about state management, integration patterns, or evaluating whether React Flow fits a use case.