nw-sd-framework

4-step system design framework with back-of-envelope estimation, scaling ladder, and common pitfalls

322 stars

Best use case

nw-sd-framework is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using nw-sd-framework should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/nw-sd-framework/SKILL.md --create-dirs "https://raw.githubusercontent.com/nWave-ai/nWave/main/nWave/skills/nw-sd-framework/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/nw-sd-framework/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How nw-sd-framework Compares

| Feature / Agent | nw-sd-framework | Standard Approach |
|-----------------|-----------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It provides a 4-step system design framework with back-of-envelope estimation, a scaling ladder, and a list of common pitfalls.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# System Design Framework

## The 4-Step Process

Every system design follows this structure. Skipping steps is the top mistake.

### Step 1: Understand the Problem and Establish Design Scope (3-10 min)

Narrow an impossibly broad question into a tractable problem.

**Ask about**: users and scale | most important features | read/write ratio | non-functional requirements (latency, availability, consistency) | existing infrastructure | special constraints (mobile-first, offline, regulatory)

**Produce**: functional requirements (3-5 bullets) | non-functional requirements (scale, latency, availability, consistency model) | capacity estimation (QPS, storage, bandwidth)

**Red flags if skipped**: designing a system nobody asked for | over-engineering for imaginary scale | missing critical constraints (GDPR, real-time)

### Step 2: Propose High-Level Design and Get Buy-In (10-15 min)

Sketch the big picture. Validate before diving deep.

**Do**: draw architecture diagram (clients, servers, databases, caches, queues) | define API contract (REST/GraphQL/gRPC -- key endpoints) | design data model (entities, relationships, access patterns) | walk through 1-2 core use cases end-to-end | get buy-in: "Does this make sense before I go deeper?"

**API patterns**: RESTful for CRUD-heavy | GraphQL for flexible client queries | gRPC for internal service-to-service | WebSocket/SSE for real-time

**Data model**: SQL vs NoSQL based on access patterns, not hype | denormalization trade-offs | partitioning key selection (directly impacts scalability)
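
To make the partitioning-key point concrete, here is a minimal sketch that hashes synthetic events into partitions under two candidate keys and reports how much of the data lands on the busiest partition. The event data, the 8-partition count, and the helper names (`partition_for`, `largest_share`) are illustrative assumptions, not part of the framework.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # assumed partition count, for illustration only


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (md5 is used only to spread keys)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def largest_share(keys: list[str]) -> float:
    """Fraction of all rows that land on the most loaded partition."""
    counts = Counter(partition_for(k) for k in keys)
    return max(counts.values()) / len(keys)


# Synthetic workload: 1,000 events, one per user, 70% of them from one country.
events = [
    {"user_id": f"user-{i}", "country": "US" if i < 700 else f"C{i % 20}"}
    for i in range(1000)
]

by_user = largest_share([e["user_id"] for e in events])
by_country = largest_share([e["country"] for e in events])

print(f"largest partition share, key=user_id: {by_user:.2f}")
print(f"largest partition share, key=country: {by_country:.2f}")
```

With the high-cardinality key the busiest partition holds roughly an even share, while the low-cardinality key pins at least 70% of rows to a single partition, which then caps write throughput.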

### Step 3: Design Deep Dive (10-25 min)

Go deep on 2-3 components.

**Choose**: most technically challenging | most interesting trade-offs | bottleneck components (highest load, most failure-prone)

**Depth means**: specific algorithms (consistent hashing, Bloom filters) | failure modes and handling | scaling strategy per component | data flow with edge cases | monitoring and operational concerns
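
As one example of the algorithmic depth meant here, the following is a minimal consistent-hashing sketch with virtual nodes. The class name `HashRing`, the node names, and the choice of 100 virtual nodes are assumptions for illustration; a production ring would also need replication and weighting.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self._vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each node is placed at many positions to even out its share.
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring has no nodes")
        # First position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["cache-a", "cache-b", "cache-c"])
before = {k: ring.lookup(k) for k in keys}

ring.add("cache-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

The property worth calling out in a deep dive is that only keys claimed by the new node move; every other key keeps its owner, unlike `hash(key) % N`, where changing `N` remaps most keys.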

### Step 4: Wrap Up (3-5 min)

**Cover**: summarize design in 2-3 sentences | identify known bottlenecks | what you'd improve with more time | operational concerns (monitoring, alerting, deployment) | future enhancements

**Avoid**: introducing entirely new components at this stage | second-guessing your design

---

## Back-of-Envelope Estimation

### Powers of 2 Reference

| Power of 2 | Approximate value | Data size |
|-------|-------|---------|
| 10 | 1 Thousand | 1 KB |
| 20 | 1 Million | 1 MB |
| 30 | 1 Billion | 1 GB |
| 40 | 1 Trillion | 1 TB |
| 50 | 1 Quadrillion | 1 PB |
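
The values in the table are rounded to powers of 10. A short sketch of how far the exact powers of 2 drift from those round numbers follows; the helper name `overshoot_pct` is an assumption for illustration.

```python
UNITS = {10: "KB", 20: "MB", 30: "GB", 40: "TB", 50: "PB"}


def overshoot_pct(power: int) -> float:
    """Percent by which 2**power exceeds the matching power of 10 (2^10 vs 10^3, ...)."""
    exact = 2 ** power
    approx = 10 ** (power // 10 * 3)
    return (exact - approx) / approx * 100


for power, unit in UNITS.items():
    print(f"2^{power} = {2 ** power:,} bytes (1 {unit}), "
          f"{overshoot_pct(power):.1f}% above 10^{power // 10 * 3}")
```

The gap grows from about 2.4% at 2^10 to about 12.6% at 2^50, which is small enough to ignore in back-of-envelope work.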

### Latency Numbers Every Engineer Should Know

| Operation | Latency |
|-----------|---------|
| L1 cache reference | 0.5 ns |
| L2 cache reference | 7 ns |
| Main memory reference | 100 ns |
| Compress 1KB (Zippy) | 10 us |
| Send 2KB over 1 Gbps | 20 us |
| Read 1 MB from memory | 250 us |
| Datacenter round trip | 500 us |
| Disk seek | 10 ms |
| Read 1 MB from network | 10 ms |
| Read 1 MB from disk | 30 ms |
| CA to Netherlands round trip | 150 ms |

**Key takeaways**: memory fast, disk slow -- cache aggressively | compress before network send | inter-datacenter trips expensive -- minimize cross-region calls
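
The takeaways are easier to remember as ratios. This sketch converts a few rows of the table to nanoseconds and compares them; the dictionary keys and the `ratio` helper are assumed names for illustration.

```python
# Selected rows from the latency table, in nanoseconds.
LATENCY_NS = {
    "main_memory_reference": 100,
    "read_1mb_memory": 250_000,
    "datacenter_round_trip": 500_000,
    "disk_seek": 10_000_000,
    "read_1mb_network": 10_000_000,
    "read_1mb_disk": 30_000_000,
    "cross_region_round_trip": 150_000_000,
}


def ratio(slow: str, fast: str) -> float:
    """How many times slower one operation is than another."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


print(f"1 MB from disk vs memory: {ratio('read_1mb_disk', 'read_1mb_memory'):.0f}x")
print(f"cross-region vs in-datacenter round trip: "
      f"{ratio('cross_region_round_trip', 'datacenter_round_trip'):.0f}x")
```

By these figures, reading 1 MB from disk is 120 times slower than reading it from memory, and one cross-region round trip costs as much as 300 round trips inside a datacenter.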

### Common Estimation Patterns

**DAU to QPS**: `QPS = DAU * actions_per_user / 86400` (86,400 seconds per day) | `Peak QPS = QPS * 2` (use `* 3` for spiky traffic)
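
A minimal sketch of the DAU-to-QPS pattern, assuming a hypothetical service with 10M DAU and 5 actions per user per day. The function names and the sample numbers are illustrative.

```python
SECONDS_PER_DAY = 86_400


def average_qps(dau: int, actions_per_user: float) -> float:
    """Average queries per second, spreading daily actions evenly over the day."""
    return dau * actions_per_user / SECONDS_PER_DAY


def peak_qps(avg_qps: float, peak_factor: float = 2.0) -> float:
    """Peak estimate; use a factor of 3 for spiky traffic."""
    return avg_qps * peak_factor


avg = average_qps(dau=10_000_000, actions_per_user=5)
print(f"average QPS ~ {avg:,.0f}")
print(f"peak QPS (2x) ~ {peak_qps(avg):,.0f}")
print(f"peak QPS (3x) ~ {peak_qps(avg, peak_factor=3):,.0f}")
```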

**Storage**: `daily = DAU * actions * avg_size` | `yearly = daily * 365` | `5-year = yearly * 5`
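
The storage pattern can be sketched the same way. The sample inputs (10M DAU, 5 actions, 2,000 bytes per action) and the use of decimal units (1 KB = 1,000 bytes) are assumptions for illustration.

```python
def storage_estimate(dau: int, actions_per_user: float, avg_size_bytes: int,
                     years: int = 5) -> dict:
    """Daily, yearly, and multi-year storage in bytes."""
    daily = dau * actions_per_user * avg_size_bytes
    yearly = daily * 365
    return {"daily": daily, "yearly": yearly, "total": yearly * years}


def human(n_bytes: float) -> str:
    """Format a byte count with decimal units (powers of 1,000)."""
    for unit in ("B", "KB", "MB", "GB", "TB", "PB"):
        if n_bytes < 1000 or unit == "PB":
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1000


estimate = storage_estimate(dau=10_000_000, actions_per_user=5, avg_size_bytes=2_000)
for label, value in estimate.items():
    print(f"{label}: {human(value)}")
```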

**Bandwidth**: `QPS * average_response_size`

**Servers**: `servers = Peak QPS / QPS_per_server`, where rough per-server capacity is: CPU-bound ~hundreds of QPS | IO-bound with cache ~thousands of QPS | static content ~tens of thousands of QPS
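
The bandwidth and server-count patterns above reduce to one multiplication and one rounded-up division. In this sketch the sample figures (17,000 QPS, 10 KB responses, 1,000 QPS per server) are assumptions for illustration.

```python
import math


def bandwidth_bytes_per_sec(qps: float, avg_response_bytes: int) -> float:
    """Outbound bandwidth needed to serve qps responses of the given size."""
    return qps * avg_response_bytes


def servers_needed(peak_qps: float, qps_per_server: float) -> int:
    """Round up: a fractional server still means one more machine."""
    return math.ceil(peak_qps / qps_per_server)


print(f"bandwidth ~ {bandwidth_bytes_per_sec(17_000, 10_000) / 1e6:.0f} MB/s")
print(f"servers at 1,000 QPS each: {servers_needed(50_000, 1_000)}")
print(f"servers at 500 QPS each: {servers_needed(50_000, 500)}")
```

This gives a floor, not a deployment size; redundancy and headroom for failures come on top.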

### Estimation Example: Twitter-like Service

```
150M DAU, 2 tweets/day, 10 reads/day, media on 10% of tweets
Write QPS = 150M * 2 / 86400 ~ 3,500
Read QPS = 150M * 10 / 86400 ~ 17,000; Peak (3x) ~ 50,000
Storage: 300M tweets * 1KB + 30M media * 500KB ~ 15.3 TB/day
```
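
The arithmetic in the example above can be re-derived in a few lines. The 3x peak factor and decimal units (1 KB = 1,000 bytes) are assumptions chosen because they match the rounded figures in the example.

```python
SECONDS_PER_DAY = 86_400
KB = 1_000  # decimal units assumed
TB = 1_000_000_000_000

dau = 150_000_000
tweets_per_user = 2
reads_per_user = 10

write_qps = dau * tweets_per_user / SECONDS_PER_DAY
read_qps = dau * reads_per_user / SECONDS_PER_DAY
peak_read_qps = read_qps * 3  # assumed spiky-traffic factor

tweets_per_day = dau * tweets_per_user  # 300M
media_per_day = 30_000_000              # figure from the example (10% of tweets)
storage_per_day = tweets_per_day * 1 * KB + media_per_day * 500 * KB

print(f"write QPS ~ {write_qps:,.0f}")
print(f"read QPS ~ {read_qps:,.0f}, peak ~ {peak_read_qps:,.0f}")
print(f"storage ~ {storage_per_day / TB:.1f} TB/day")
```

Media dominates: text accounts for 0.3 TB of the daily total and media for 15 TB, so the media size assumption is the one to challenge first.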

---

## Scaling Ladder

Each step solves a specific bottleneck. Never introduce a component without articulating which bottleneck it addresses.

1. **Load balancer** -- distribute traffic across web servers
2. **Database replication** -- master-slave for read scaling
3. **Cache layer** -- reduce database load (Redis/Memcached)
4. **CDN** -- serve static content from edge
5. **Stateless web tier** -- move session state to shared store
6. **Database sharding** -- horizontal partitioning for write scaling
7. **Message queue** -- decouple components, handle spikes
8. **Logging, metrics, monitoring** -- observability at scale
9. **Multiple data centers** -- geographic redundancy and latency reduction
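
As an illustration of naming the bottleneck a step removes, here is a minimal cache-aside sketch for step 3. It counts how many reads still reach the database. The in-process dictionary stands in for Redis/Memcached, and the class name `CacheAside`, the TTL, and the fake database are assumptions for illustration.

```python
import time


class CacheAside:
    """Read-through-on-miss cache with a TTL, tracking database reads."""

    def __init__(self, load_from_db, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)
        self.db_reads = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the database is not touched
        value = self._load(key)  # cache miss: fall back to the database
        self.db_reads += 1
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key) -> None:
        """Call after a write so the next read reloads fresh data."""
        self._entries.pop(key, None)


fake_db = {"user:1": {"name": "Ada"}}
cache = CacheAside(load_from_db=fake_db.__getitem__)

for _ in range(1000):
    cache.get("user:1")
print(f"database reads for 1,000 requests: {cache.db_reads}")
```

The bottleneck addressed is read load on the database; the cost is staleness between a write and the next invalidation or TTL expiry.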

---

## Common Pitfalls

1. **Jumping to solutions** -- designing before understanding requirements
2. **Over-engineering** -- adding components for imaginary scale
3. **Ignoring trade-offs** -- every choice has a cost; name it
4. **SPOF blindness** -- always ask "what if this dies?"
5. **Neglecting data** -- the data model drives everything
6. **Forgetting operations** -- a system you can't monitor is one you can't run
7. **Not doing math** -- gut feelings are wrong; estimates keep you honest

Related Skills

nw-quality-framework

322
from nWave-ai/nWave

Quality gates - 11 commit readiness gates, build/test protocol, validation checkpoints, and quality metrics

nw-post-mortem-framework

322
from nWave-ai/nWave

Blameless post-mortem structure, incident timeline reconstruction, response evaluation, and organizational learning

nw-outcome-kpi-framework

322
from nWave-ai/nWave

Outcome KPI definition methodology - synthesizes Who Does What By How Much (Gothelf/Seiden), Running Lean (Maurya), and Measure What Matters (Doerr) into a practical framework for measurable outcome KPIs

nw-divio-framework

322
from nWave-ai/nWave

DIVIO/Diataxis four-quadrant documentation framework - type definitions, classification decision tree, and signal catalog

nw-ux-web-patterns

322
from nWave-ai/nWave

Web UI design patterns for product owners. Load when designing web application interfaces, writing web-specific acceptance criteria, or evaluating responsive designs.

nw-ux-tui-patterns

322
from nWave-ai/nWave

Terminal UI and CLI design patterns for product owners. Load when designing command-line tools, interactive terminal applications, or writing CLI-specific acceptance criteria.

nw-ux-principles

322
from nWave-ai/nWave

Core UX principles for product owners. Load when evaluating interface designs, writing acceptance criteria with UX requirements, or reviewing wireframes and mockups.

nw-ux-emotional-design

322
from nWave-ai/nWave

Emotional design and delight patterns for product owners. Load when designing onboarding flows, empty states, first-run experiences, or evaluating the emotional quality of an interface.

nw-ux-desktop-patterns

322
from nWave-ai/nWave

Desktop application UI patterns for product owners. Load when designing native or cross-platform desktop applications, writing desktop-specific acceptance criteria, or evaluating panel layouts and keyboard workflows.

nw-user-story-mapping

322
from nWave-ai/nWave

User story mapping for backlog management and outcome-based prioritization. Load during Phase 2.5 (User Story Mapping) to produce story-map.md and prioritization.md.

nw-tr-review-criteria

322
from nWave-ai/nWave

Review dimensions and scoring for root cause analysis quality assessment

nw-tlaplus-verification

322
from nWave-ai/nWave

TLA+ formal verification for design correctness and PBT pipeline integration