agent-architecture-analysis

Perform 12-Factor Agents compliance analysis on any codebase. Use when evaluating agent architecture, reviewing LLM-powered systems, or auditing agentic applications against the 12-Factor methodology.

3,891 stars

byopenclaw

View on GitHub Installation ↓

Best use case

agent-architecture-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using agent-architecture-analysis should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/agent-architecture-analysis/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/anderskev/agent-architecture-analysis/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/agent-architecture-analysis/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How agent-architecture-analysis Compares

Feature / Agent	agent-architecture-analysis	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

Best AI Skills for ChatGPT

Find the best AI skills to adapt into ChatGPT workflows for research, writing, summarization, planning, and repeatable assistant tasks.

SKILL.md Source

# 12-Factor Agents Compliance Analysis

> Reference: [12-Factor Agents](https://github.com/humanlayer/12-factor-agents)

## Input Parameters

| Parameter | Description | Required |
|-----------|-------------|----------|
| `docs_path` | Path to documentation directory (for existing analyses) | Optional |
| `codebase_path` | Root path of the codebase to analyze | Required |

## Analysis Framework

### Factor 1: Natural Language to Tool Calls

**Principle:** Convert natural language inputs into structured, deterministic tool calls using schema-validated outputs.

**Search Patterns:**
```bash
# Look for Pydantic schemas
grep -r "class.*BaseModel" --include="*.py"
grep -r "TaskDAG\|TaskResponse\|ToolCall" --include="*.py"

# Look for JSON schema generation
grep -r "model_json_schema\|json_schema" --include="*.py"

# Look for structured output generation
grep -r "output_type\|response_model" --include="*.py"
```

**File Patterns:** `**/agents/*.py`, `**/schemas/*.py`, `**/models/*.py`

**Compliance Criteria:**

| Level | Criteria |
|-------|----------|
| **Strong** | All LLM outputs use Pydantic/dataclass schemas with validators |
| **Partial** | Some outputs typed, but dict returns or unvalidated strings exist |
| **Weak** | LLM returns raw strings parsed manually or with regex |

**Anti-patterns:**
- `json.loads(llm_response)` without schema validation
- `output.split()` or regex parsing of LLM responses
- `dict[str, Any]` return types from agents
- No validation between LLM output and handler execution

---

### Factor 2: Own Your Prompts

**Principle:** Treat prompts as first-class code you control, version, and iterate on.

**Search Patterns:**
```bash
# Look for embedded prompts
grep -r "SYSTEM_PROMPT\|system_prompt" --include="*.py"
grep -r '""".*You are' --include="*.py"

# Look for template systems
grep -r "jinja\|Jinja\|render_template" --include="*.py"
find . -name "*.jinja2" -o -name "*.j2"

# Look for prompt directories
find . -type d -name "prompts"
```

**File Patterns:** `**/prompts/**`, `**/templates/**`, `**/agents/*.py`

**Compliance Criteria:**

| Level | Criteria |
|-------|----------|
| **Strong** | Prompts in separate files, templated (Jinja2), versioned |
| **Partial** | Prompts as module constants, some parameterization |
| **Weak** | Prompts hardcoded inline in functions, f-strings only |

**Anti-patterns:**
- `f"You are a {role}..."` inline in agent methods
- Prompts mixed with business logic
- No way to iterate on prompts without code changes
- No prompt versioning or A/B testing capability

---

### Factor 3: Own Your Context Window

**Principle:** Control how history, state, and tool results are formatted for the LLM.

**Search Patterns:**
```bash
# Look for context/message management
grep -r "AgentMessage\|ChatMessage\|messages" --include="*.py"
grep -r "context_window\|context_compiler" --include="*.py"

# Look for custom serialization
grep -r "to_xml\|to_context\|serialize" --include="*.py"

# Look for token management
grep -r "token_count\|max_tokens\|truncate" --include="*.py"
```

**File Patterns:** `**/context/*.py`, `**/state/*.py`, `**/core/*.py`

**Compliance Criteria:**

| Level | Criteria |
|-------|----------|
| **Strong** | Custom context format, token optimization, typed events, compaction |
| **Partial** | Basic message history with some structure |
| **Weak** | Raw message accumulation, standard OpenAI format only |

**Anti-patterns:**
- Unbounded message accumulation
- Large artifacts embedded inline (diffs, files)
- No agent-specific context filtering
- Same context for all agent types

---

### Factor 4: Tools Are Structured Outputs

**Principle:** Tools produce schema-validated JSON that triggers deterministic code, not magic function calls.

**Search Patterns:**
```bash
# Look for tool/response schemas
grep -r "class.*Response.*BaseModel" --include="*.py"
grep -r "ToolResult\|ToolOutput" --include="*.py"

# Look for deterministic handlers
grep -r "def handle_\|def execute_" --include="*.py"

# Look for validation layer
grep -r "model_validate\|parse_obj" --include="*.py"
```

**File Patterns:** `**/tools/*.py`, `**/handlers/*.py`, `**/agents/*.py`

**Compliance Criteria:**

| Level | Criteria |
|-------|----------|
| **Strong** | All tool outputs schema-validated, handlers type-safe |
| **Partial** | Most tools typed, some loose dict returns |
| **Weak** | Tools return arbitrary dicts, no validation layer |

**Anti-patterns:**
- Tool handlers that directly execute LLM output
- `eval()` or `exec()` on LLM-generated code
- No separation between decision (LLM) and execution (code)
- Magic method dispatch based on string matching

---

### Factor 5: Unify Execution State

**Principle:** Merge execution state (step, retries) with business state (messages, results).

**Search Patterns:**
```bash
# Look for state models
grep -r "ExecutionState\|WorkflowState\|Thread" --include="*.py"

# Look for dual state systems
grep -r "checkpoint\|MemorySaver" --include="*.py"
grep -r "sqlite\|database\|repository" --include="*.py"

# Look for state reconstruction
grep -r "load_state\|restore\|reconstruct" --include="*.py"
```

**File Patterns:** `**/state/*.py`, `**/models/*.py`, `**/database/*.py`

**Compliance Criteria:**

| Level | Criteria |
|-------|----------|
| **Strong** | Single serializable state object with all execution metadata |
| **Partial** | State exists but split across systems (memory + DB) |
| **Weak** | Execution state scattered, requires multiple queries to reconstruct |

**Anti-patterns:**
- Retry count stored separately from task state
- Error history in logs but not in state
- LangGraph checkpoints + separate database storage
- No unified event thread

---

### Factor 6: Launch/Pause/Resume

**Principle:** Agents support simple APIs for launching, pausing at any point, and resuming.

**Search Patterns:**
```bash
# Look for REST endpoints
grep -r "@router.post\|@app.post" --include="*.py"
grep -r "start_workflow\|pause\|resume" --include="*.py"

# Look for interrupt mechanisms
grep -r "interrupt_before\|interrupt_after" --include="*.py"

# Look for webhook handlers
grep -r "webhook\|callback" --include="*.py"
```

**File Patterns:** `**/routes/*.py`, `**/api/*.py`, `**/orchestrator/*.py`

**Compliance Criteria:**

| Level | Criteria |
|-------|----------|
| **Strong** | REST API + webhook resume, pause at any point including mid-tool |
| **Partial** | Launch/pause/resume exists but only at coarse-grained points |
| **Weak** | CLI-only launch, no pause/resume capability |

**Anti-patterns:**
- Blocking `input()` or `confirm()` calls
- No way to resume after process restart
- Approval only at plan level, not per-tool
- No webhook-based resume from external systems

---

### Factor 7: Contact Humans with Tools

**Principle:** Human contact is a tool call with question, options, and urgency.

**Search Patterns:**
```bash
# Look for human input mechanisms
grep -r "typer.confirm\|input(\|prompt(" --include="*.py"
grep -r "request_human_input\|human_contact" --include="*.py"

# Look for approval patterns
grep -r "approval\|approve\|reject" --include="*.py"

# Look for structured question formats
grep -r "question.*options\|HumanInputRequest" --include="*.py"
```

**File Patterns:** `**/agents/*.py`, `**/tools/*.py`, `**/orchestrator/*.py`

**Compliance Criteria:**

| Level | Criteria |
|-------|----------|
| **Strong** | `request_human_input` tool with question/options/urgency/format |
| **Partial** | Approval gates exist but hardcoded in graph structure |
| **Weak** | Blocking CLI prompts, no tool-based human contact |

**Anti-patterns:**
- `typer.confirm()` in agent code
- Human contact hardcoded at specific graph nodes
- No way for agents to ask clarifying questions
- Single response format (yes/no only)

---

### Factor 8: Own Your Control Flow

**Principle:** Custom control flow, not framework defaults. Full control over routing, retries, compaction.

**Search Patterns:**
```bash
# Look for routing logic
grep -r "add_conditional_edges\|route_\|should_continue" --include="*.py"

# Look for custom loops
grep -r "while True\|for.*in.*range" --include="*.py" | grep -v test

# Look for execution mode control
grep -r "execution_mode\|agentic\|structured" --include="*.py"
```

**File Patterns:** `**/orchestrator/*.py`, `**/graph/*.py`, `**/core/*.py`

**Compliance Criteria:**

| Level | Criteria |
|-------|----------|
| **Strong** | Custom routing functions, conditional edges, execution mode control |
| **Partial** | Framework control flow with some customization |
| **Weak** | Default framework loop with no custom routing |

**Anti-patterns:**
- Single path through graph with no branching
- No distinction between tool types (all treated same)
- Framework-default error handling only
- No rate limiting or resource management

---

### Factor 9: Compact Errors into Context

**Principle:** Errors in context enable self-healing. Track consecutive errors, escalate after threshold.

**Search Patterns:**
```bash
# Look for error handling
grep -r "except.*Exception\|error_history\|consecutive_errors" --include="*.py"

# Look for retry logic
grep -r "retry\|backoff\|max_attempts" --include="*.py"

# Look for escalation
grep -r "escalate\|human_escalation" --include="*.py"
```

**File Patterns:** `**/agents/*.py`, `**/orchestrator/*.py`, `**/core/*.py`

**Compliance Criteria:**

| Level | Criteria |
|-------|----------|
| **Strong** | Errors in context, retry with threshold, automatic escalation |
| **Partial** | Errors logged and returned, no automatic retry loop |
| **Weak** | Errors logged only, not fed back to LLM, task fails immediately |

**Anti-patterns:**
- `logger.error()` without adding to context
- No retry mechanism (fail immediately)
- No consecutive error tracking
- No escalation to humans after repeated failures

---

### Factor 10: Small, Focused Agents

**Principle:** Each agent has narrow responsibility, 3-10 steps max.

**Search Patterns:**
```bash
# Look for agent classes
grep -r "class.*Agent\|class.*Architect\|class.*Developer" --include="*.py"

# Look for step definitions
grep -r "steps\|tasks" --include="*.py" | head -20

# Count methods per agent
grep -r "async def\|def " agents/*.py 2>/dev/null | wc -l
```

**File Patterns:** `**/agents/*.py`

**Compliance Criteria:**

| Level | Criteria |
|-------|----------|
| **Strong** | 3+ specialized agents, each with single responsibility, step limits |
| **Partial** | Multiple agents but some have broad scope |
| **Weak** | Single "god" agent that handles everything |

**Anti-patterns:**
- Single agent with 20+ tools
- Agent with unbounded step count
- Mixed responsibilities (planning + execution + review)
- No step or time limits on agent execution

---

### Factor 11: Trigger from Anywhere

**Principle:** Workflows triggerable from CLI, REST, WebSocket, Slack, webhooks, etc.

**Search Patterns:**
```bash
# Look for entry points
grep -r "@cli.command\|@router.post\|@app.post" --include="*.py"

# Look for WebSocket support
grep -r "WebSocket\|websocket" --include="*.py"

# Look for external integrations
grep -r "slack\|discord\|webhook" --include="*.py" -i
```

**File Patterns:** `**/routes/*.py`, `**/cli/*.py`, `**/main.py`

**Compliance Criteria:**

| Level | Criteria |
|-------|----------|
| **Strong** | CLI + REST + WebSocket + webhooks + chat integrations |
| **Partial** | CLI + REST API available |
| **Weak** | CLI only, no programmatic access |

**Anti-patterns:**
- Only `if __name__ == "__main__"` entry point
- No REST API for external systems
- No event streaming for real-time updates
- Trigger logic tightly coupled to execution

---

### Factor 12: Stateless Reducer

**Principle:** Agents as pure functions: (state, input) -> (state, output). No side effects in agent logic.

**Search Patterns:**
```bash
# Look for state mutation patterns
grep -r "\.status = \|\.field = " --include="*.py"

# Look for immutable updates
grep -r "model_copy\|\.copy(\|with_" --include="*.py"

# Look for side effects in agents
grep -r "write_file\|subprocess\|requests\." agents/*.py 2>/dev/null
```

**File Patterns:** `**/agents/*.py`, `**/nodes/*.py`

**Compliance Criteria:**

| Level | Criteria |
|-------|----------|
| **Strong** | Immutable state updates, side effects isolated to tools/handlers |
| **Partial** | Mostly immutable, some in-place mutations |
| **Weak** | State mutated in place, side effects mixed with agent logic |

**Anti-patterns:**
- `state.field = new_value` (mutation)
- File writes inside agent methods
- HTTP calls inside agent decision logic
- Shared mutable state between agents

---

### Factor 13: Pre-fetch Context

**Principle:** Fetch likely-needed data upfront rather than mid-workflow.

**Search Patterns:**
```bash
# Look for context pre-fetching
grep -r "pre_fetch\|prefetch\|fetch_context" --include="*.py"

# Look for RAG/embedding systems
grep -r "embedding\|vector\|semantic_search" --include="*.py"

# Look for related file discovery
grep -r "related_tests\|similar_\|find_relevant" --include="*.py"
```

**File Patterns:** `**/context/*.py`, `**/retrieval/*.py`, `**/rag/*.py`

**Compliance Criteria:**

| Level | Criteria |
|-------|----------|
| **Strong** | Automatic pre-fetch of related tests, files, docs before planning |
| **Partial** | Manual context passing, design doc support |
| **Weak** | No pre-fetching, LLM must request all context via tools |

**Anti-patterns:**
- Architect starts with issue only, no codebase context
- No semantic search for similar past work
- Related tests/files discovered only during execution
- No RAG or document retrieval system

---

## Output Format

### Executive Summary Table

```markdown
| Factor | Status | Notes |
|--------|--------|-------|
| 1. Natural Language -> Tool Calls | **Strong/Partial/Weak** | [Key finding] |
| 2. Own Your Prompts | **Strong/Partial/Weak** | [Key finding] |
| ... | ... | ... |
| 13. Pre-fetch Context | **Strong/Partial/Weak** | [Key finding] |

**Overall**: X Strong, Y Partial, Z Weak
```

### Per-Factor Analysis

For each factor, provide:

1. **Current Implementation**
   - Evidence with file:line references
   - Code snippets showing patterns

2. **Compliance Level**
   - Strong/Partial/Weak with justification

3. **Gaps**
   - What's missing vs. 12-Factor ideal

4. **Recommendations**
   - Actionable improvements with code examples

---

## Analysis Workflow

1. **Initial Scan**
   - Run search patterns for all factors
   - Identify key files for each factor
   - Note any existing compliance documentation

2. **Deep Dive** (per factor)
   - Read identified files
   - Evaluate against compliance criteria
   - Document evidence with file paths

3. **Gap Analysis**
   - Compare current vs. 12-Factor ideal
   - Identify anti-patterns present
   - Prioritize by impact

4. **Recommendations**
   - Provide actionable improvements
   - Include before/after code examples
   - Reference roadmap if exists

5. **Summary**
   - Compile executive summary table
   - Highlight strengths and critical gaps
   - Suggest priority order for improvements

---

## Quick Reference: Compliance Scoring

| Score | Meaning | Action |
|-------|---------|--------|
| **Strong** | Fully implements principle | Maintain, minor optimizations |
| **Partial** | Some implementation, significant gaps | Planned improvements |
| **Weak** | Minimal or no implementation | High priority for roadmap |

## When to Use This Skill

- Evaluating new LLM-powered systems
- Reviewing agent architecture decisions
- Auditing production agentic applications
- Planning improvements to existing agents
- Comparing frameworks or implementations

Related Skills

Margin Analysis & Profit Optimization

3891

from openclaw/skills

Analyze gross, operating, and net margins by product line, customer segment, and channel. Identify margin erosion patterns and build pricing power.

Business Analysis

Investment Analysis & Portfolio Management Engine

3891

from openclaw/skills

Complete investment analysis, portfolio construction, risk management, and trade execution methodology. Works across stocks, crypto, ETFs, bonds, and alternatives. Zero dependencies — pure agent skill.

Finance & Investing

FP&A Command Center — Financial Planning & Analysis Engine

3891

from openclaw/skills

You are a senior FP&A professional. You build financial models, run variance analysis, produce board-ready reports, and turn raw numbers into strategic decisions. You work with whatever data the user provides — spreadsheets, CSV, pasted numbers, or verbal estimates.

Finance & Analytics

Agent Memory Architecture

3891

from openclaw/skills

Complete zero-dependency memory system for AI agents — file-based architecture, daily notes, long-term curation, context management, heartbeat integration, and memory hygiene. No APIs, no databases, no external tools. Works with any agent framework.

data-analysis-partner

3891

from openclaw/skills

智能数据分析 Skill，输入 CSV/Excel 文件和分析需求，输出带交互式 ECharts 图表的 HTML 自包含分析报告

Data & Research

onchain-contract-token-analysis

3891

from openclaw/skills

Analyze smart contracts, token mechanics, permissions, fee flows, upgradeability, market risks, and likely attack surfaces for onchain projects. Use when reviewing ERC-20s, launchpads, vaults, staking systems, LP fee routing, ownership controls, proxy setups, or suspicious token behavior.

Security

resume-analysis

3891

from openclaw/skills

简历分析 skill。用于诊断整份简历的完整性、清晰度、岗位相关性、成果表达和结构质量。当用户说“分析简历”“看看我的简历”“简历诊断”时使用。

Workflow & Productivity

contradiction-analysis

3891

from openclaw/skills

触发：当问题复杂、存在多个冲突因素、优先级不清，或你不知道应该先解决什么时调用；常见信号包括 trade-off、瓶颈、根因不明、主次不清、多个问题互相牵制。 English: Trigger when a problem contains competing forces, unclear priorities, or no obvious entry point. Use this skill to identify contradictions, isolate the principal contradiction, classify its nature, and choose the right response.

survey-analysis

3891

from openclaw/skills

AI-powered survey response analysis. Analyzes open-ended survey responses, clusters themes, detects sentiment, and generates actionable insights. Uses BERTopic + GPT-4o-mini.

ths-advanced-analysis

3891

from openclaw/skills

基于 thsdk 进行高级股票分析：分钟K线（1m/5m/15m/30m/60m/120m）、板块/指数行情（主要指数/申万行业/概念板块成分股）、多股票批量对比（表格+归一化走势图+相关性热力图）、盘口深度、大单流向、集合竞价异动、日内分时、历史分时。当用户提到"分钟K线"、"日内走势"、"盘口"、"大单"、"竞价异动"、"板块行情"、"行业排名"、"概念板块"、"成分股"、"对比多只股票"、"批量分析"、"涨幅对比"、"相关性"、"港股"、"美股"、"外汇"、"期货"、"资讯"、"快讯"，或者需要同时查看2只以上股票、关注短线交易、量化研究时，必须使用此skill。

ad-creative-analysis

3891

from openclaw/skills

Analyze ad creatives (images and videos) extracted from competitor research. Use when given a directory of ad images, video files, or transcripts to evaluate ad quality, score visual and messaging effectiveness, assign a scale score for viral/engagement potential, and generate a cross-creative pattern summary. Triggered by requests like "analyze these ads", "score these creatives", "what hooks are competitors using", "evaluate the ad library", "give me a scale score", "analyze the ad folder", or "what's working in these ads".

Amazon Listing Optimizer — Free Listing Analysis & Keyword Research

3891

from openclaw/skills

**Free alternative to Helium 10 ($97/mo) and Jungle Scout ($49/mo).**