architecture-discipline

Use when designing/modifying system architecture or evaluating technology choices. Enforces 7-section TodoWrite with 22+ items. Triggers: "design architecture", "system design", "architectural decision", "should we use [tech]", "compare [A] vs [B]", "add new service", "microservices", "database choice", "API design", "scale to [X] users", "infrastructure decision". If thinking ANY of these, USE THIS SKILL: "quick recommendation is fine", "obvious choice", "we already know the answer", "just need to pick one", "simple architecture question".

16 stars

Best use case

architecture-discipline is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using architecture-discipline should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/architecture-discipline/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/backend/architecture-discipline/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/architecture-discipline/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How architecture-discipline Compares

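| Feature / Agent | architecture-discipline | Standard Approach |
|-----------------|-------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |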

Frequently Asked Questions

What does this skill do?

It guides architecture and technology decisions through a 7-section TodoWrite with at least 22 items, covering scale analysis, architectural options, ripple effects, failure modes, observability, documentation, and migration/compatibility. The trigger phrases are listed in the description at the top of the page.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Architecture Discipline

## When to Use

**Use when decisions affect:**
- Data models or schema changes
- Service boundaries or new services
- Deployment topology or infrastructure
- Scale characteristics (10x growth implications)
- Technology stack choices
- API contracts or external integrations

**Skip for:**
- Parameter tweaks or configuration changes
- UI/styling changes
- Localization or copy changes
- Bug fixes within existing architecture
- Adding fields to existing models (unless this requires a schema migration)

**Threshold:** If the decision could cause a 3+ month re-architecture project if wrong, use this skill.

## CRITICAL: This Is Reasoning Discipline, Not a Checklist

The 7 sections are a **reasoning sequence**, not boxes to check:
- Alternatives BEFORE choosing
- Scale requirements BEFORE design
- Failure modes built in, not bolted on

🚨 **If you wrote architecture without starting with all 7 sections:** DELETE and restart. Retrofitting analysis is rationalization, not evaluation.

---

## MANDATORY FIRST STEP

**CREATE TodoWrite** with these 7 sections (22+ items total):

| Section | Minimum Items |
|---------|---------------|
| Scale Analysis | 4+ |
| Architectural Options | 3+ |
| Ripple Effect Analysis | 5+ |
| Failure Modes | 3+ |
| Observability | 3+ |
| Documentation | 2+ |
| Migration/Compatibility | 2+ |

**Do not design, propose solutions, or implement until TodoWrite is verified.**
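
The section minimums in the table above can be checked mechanically before any design work starts. The following is a minimal sketch, assuming the TodoWrite list is available as a plain mapping from section name to item strings. The function `check_todo` and that data shape are hypothetical and are not part of TodoWrite itself.

```python
# Minimum item counts per section, copied from the table above.
MIN_ITEMS = {
    "Scale Analysis": 4,
    "Architectural Options": 3,
    "Ripple Effect Analysis": 5,
    "Failure Modes": 3,
    "Observability": 3,
    "Documentation": 2,
    "Migration/Compatibility": 2,
}


def check_todo(todo: dict[str, list[str]]) -> list[str]:
    """Return a list of problems; an empty list means the counts pass.

    `todo` is a hypothetical representation: section name -> list of items.
    """
    problems = []
    for section, minimum in MIN_ITEMS.items():
        count = len(todo.get(section, []))
        if count < minimum:
            problems.append(f"{section}: {count} items, need {minimum}+")
    return problems
```

The minimums sum to 22, which is where the "22+ items" figure comes from. A count check like this only covers quantity; item quality still has to be verified separately.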

---

## Verification Checkpoint

After creating TodoWrite, pick 3 items at random and verify that each passes this test:

**Each item must have ALL THREE:**
- ✓ Concrete numbers/thresholds ("100K users", "$500/mo", "P95 < 500ms")
- ✓ Specific tools/technologies ("PostgreSQL", "Redis", "CloudWatch")
- ✓ Measurable outcome ("handles 1M req/sec", "costs $X at 10x")

| ❌ FAILS | ✅ PASSES |
|----------|-----------|
| "Add monitoring" | "CloudWatch: `websocket.connections.active`, alert if >5% error rate via PagerDuty" |
| "Evaluate caching" | "Compare: Redis (1ms, $300/mo) vs In-memory LRU (0.1ms, $0) vs No cache (100ms)" |
| "Analyze scale" | "Current: 100K DAU, 50 req/sec. 10x: 1M users, 500 req/sec. Bottleneck: PostgreSQL connection pool" |

**DO NOT PROCEED until 22+ items AND quality check passes.**

---

## Section Requirements

### 1. Scale Analysis (4+ items)

**NEVER design for current scale only.** Before proposing any solution:
- Current scale: Users (DAU/MAU), requests/sec, data volume, read/write ratio
- 10x scale: What numbers at 10x? When expected?
- Bottlenecks: What breaks at 10x? (DB connections, API limits, memory)
- Mitigation: Specific solution for each bottleneck
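
The 10x projection and bottleneck check can be expressed as simple arithmetic. This is a rough sketch under stated assumptions: load scales linearly with requests per second, and the resource names, capacities, and per-request usage figures are illustrative placeholders, not measurements from any real system.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    capacity: float       # maximum sustainable usage, in the resource's own unit
    usage_per_rps: float  # usage consumed per request/sec (assumed linear)


def bottlenecks(current_rps: float, factor: float, resources: list[Resource]) -> list[str]:
    """Names of resources whose projected usage exceeds capacity at `factor`x load."""
    projected_rps = current_rps * factor
    return [r.name for r in resources if projected_rps * r.usage_per_rps > r.capacity]


# Placeholder inventory: 50 req/sec today, projected to 500 req/sec at 10x.
resources = [
    Resource("postgres_connections", capacity=100, usage_per_rps=0.5),
    Resource("app_memory_gb", capacity=64, usage_per_rps=0.05),
]
print(bottlenecks(50, 10, resources))  # ['postgres_connections']
```

Each name returned by `bottlenecks` needs its own mitigation item in the Scale Analysis section.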

### 2. Architectural Options (3+ items)

**NEVER present a single solution.** Minimum 3 distinct options, each with:
- Performance: Latency (P50/P95/P99), throughput, scale limit
- Complexity: LOC estimate, services involved, operational burden
- Cost: Infrastructure ($X/mo current, $Y/mo at 10x), development (engineer-weeks)
- Trade-offs: Specific advantages (✅) and disadvantages (❌)

**If a stakeholder suggests a solution:** Add it as Option A and evaluate it with the SAME rigor as the alternatives.
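
One way to keep the comparison honest is to record every option with the same fields and filter them against explicit limits. The sketch below assumes a hypothetical `Option` record and a `shortlist` helper. Apart from the Redis latency and current cost quoted in the Verification Checkpoint table, every number in the usage lines is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    p95_latency_ms: float
    cost_now: float        # $/month at current scale
    cost_10x: float        # $/month at 10x scale (estimate)
    engineer_weeks: float  # development cost


def shortlist(options: list[Option], max_p95_ms: float, max_cost_10x: float) -> list[Option]:
    """Keep options that meet both limits, cheapest at 10x first."""
    ok = [o for o in options
          if o.p95_latency_ms <= max_p95_ms and o.cost_10x <= max_cost_10x]
    return sorted(ok, key=lambda o: o.cost_10x)


options = [
    Option("Redis", 1.0, cost_now=300, cost_10x=1500, engineer_weeks=2),      # 10x cost assumed
    Option("In-memory LRU", 0.1, cost_now=0, cost_10x=0, engineer_weeks=1),   # weeks assumed
    Option("No cache", 100.0, cost_now=0, cost_10x=0, engineer_weeks=0),
]
for o in shortlist(options, max_p95_ms=10, max_cost_10x=2000):
    print(o.name)
```

A stakeholder-suggested solution goes into the same list as any other `Option`, so it is filtered and ranked by the same limits.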

### 3. Ripple Effect Analysis (5+ items)

Changes propagate across layers. Analyze ALL:
- Data layer: Schema changes, migrations, indexes, query performance
- Services: Which need updates? API contracts changed?
- API: Breaking changes? Version bump? Backward compatibility?
- Clients: Mobile updates? Web UI changes?
- Operations: Deployment changes? New monitoring? Cost changes?

### 4. Failure Modes (3+ items)

For each mode:
- Scenario: [Component] fails because [reason]
- Detection: How we know (metrics drop, error rate spike)
- Impact: What breaks (user features, data integrity)
- Mitigation: Circuit breaker, fallback, redundancy
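
As an illustration of the circuit breaker mitigation named above, here is a minimal sketch. It assumes a single-threaded caller and counts consecutive failures. The class name, thresholds, and injectable `clock` parameter are illustrative choices, not taken from any specific library.

```python
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; retries after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock      # injectable so the breaker can be tested without sleeping
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    def call(self, func, fallback):
        # While open, skip the dependency until the reset window has passed.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # any success closes the circuit fully
        return result
```

While the circuit is open, the dependency is not called at all, which maps to the "Impact" and "Mitigation" items: users see the fallback rather than a cascading timeout.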

### 5. Observability (3+ items)

- Metrics: Specific (latency P95, error rate %, throughput)
- Alerts: Conditions (error rate > 5%, latency P95 > 500ms)
- Dashboards: Key visualizations
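
The example alert conditions above (error rate > 5%, latency P95 > 500ms) can be written down as executable checks. This is a sketch only: it uses the nearest-rank percentile method and operates on an in-memory list of latency samples, whereas a real monitoring system would evaluate these conditions itself. The function names are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile; `samples` must be non-empty and `pct` in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def firing_alerts(latencies_ms, errors, total, max_error_rate=0.05, max_p95_ms=500):
    """Return the names of alert conditions that are currently violated."""
    alerts = []
    if total and errors / total > max_error_rate:
        alerts.append("error_rate")
    if latencies_ms and percentile(latencies_ms, 95) > max_p95_ms:
        alerts.append("latency_p95")
    return alerts
```

Writing the thresholds as defaults makes them visible in review, so an Observability item can reference concrete numbers instead of "add monitoring".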

### 6. Documentation (2+ items)

- ADR: Chosen option, rejected alternatives, trade-offs, constraints
- Diagram: New components, data flows, failure paths

### 7. Migration/Compatibility (2+ items)

- Backward compatibility: Old clients work? API versioning?
- Migration path: Phased rollout, feature flags, rollback procedure
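
A phased rollout behind a feature flag is often implemented with stable hash bucketing, so that a given user stays in or out of the rollout as the percentage grows. The sketch below assumes string user IDs and a hypothetical `in_rollout` helper. It is not tied to any particular feature-flag product.

```python
import hashlib


def in_rollout(user_id: str, flag: str, percent: int) -> bool:
    """True if this user falls inside the first `percent` of 100 stable buckets."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


# Raising `percent` from 5 to 25 to 100 only ever adds users; setting it to 0 is the rollback.
print(in_rollout("user-42", "new-api-v2", 25))
```

Because the bucket depends only on the flag name and user ID, users included at a lower percentage remain included at every higher one.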

---

## Red Flags - STOP When You Think:

| Thought | Reality |
|---------|---------|
| "Analysis paralysis" | This IS the analysis that prevents expensive mistakes |
| "We'll add scale/alternatives/failure modes later" | Retrofitting costs 5-10x more |
| "CTO already decided" | Still needs independent evaluation |
| "Being pragmatic not dogmatic" | These requirements ARE pragmatic |
| "Just a simple feature" | Simple becomes complex at scale |
| "We already know the solution" | Compare 3 alternatives first |
| "Keep it simple" | Simple for current scale = complex re-architecture at 10x |
| "I can add missing sections to existing work" | DELETE and restart |

---

## Override Requirements

To skip ANY requirement, you MUST provide ALL 4:
1. Specific retrofit date (not "later")
2. Budget allocated (engineer-weeks)
3. Risk acceptance signed by decision maker
4. Interim mitigation plan

| Skipped | Risk | Cost |
|---------|------|------|
| Scale Analysis | Re-architecture in 6-12 months | 3-6 month project, 5-10x cost |
| Alternatives | Optimize wrong dimension | 2-4 month migration |
| Failure Modes | Production incidents | $5-50K per incident |
| Ripple Effects | Broken clients, data issues | Deployment failures |

---

## Verification Before Complete

| Category | Requirements |
|----------|-------------|
| Scale | ✓ Current + 10x projected ✓ Bottlenecks ✓ Mitigations |
| Trade-offs | ✓ 3+ options ✓ Performance/complexity/cost ✓ Rationale |
| Impact | ✓ All layers analyzed ✓ Breaking changes identified |
| Failure | ✓ Specific modes ✓ Detection ✓ Mitigation ✓ Rollback |
| Documentation | ✓ ADR ✓ Diagram updated |

**If any item is missing, do not proceed to implementation.**
