microservices-architecture

Microservices architecture patterns and best practices. Use when designing distributed systems, breaking down monoliths, or implementing service communication.

Best use case

microservices-architecture is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using microservices-architecture should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/microservices-architecture/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/design/microservices-architecture/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/microservices-architecture/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How microservices-architecture Compares

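| Feature / Agent         | microservices-architecture | Standard Approach |
| ----------------------- | -------------------------- | ----------------- |
| Platform Support        | Not specified              | Limited / Varies  |
| Context Awareness       | High                       | Baseline          |
| Installation Complexity | Unknown                    | N/A               |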

Frequently Asked Questions

What does this skill do?

Microservices architecture patterns and best practices. Use when designing distributed systems, breaking down monoliths, or implementing service communication.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Microservices Architecture

Comprehensive guide for designing and implementing microservices-based systems.

## Microservices Fundamentals

### What are Microservices?

```
MICROSERVICES = Independently deployable services
               that do one thing well

Characteristics:
┌─────────────────────────────────────────┐
│ ✓ Single responsibility                 │
│ ✓ Own their data                        │
│ ✓ Independently deployable              │
│ ✓ Communicate via APIs                  │
│ ✓ Technology agnostic                   │
│ ✓ Owned by small teams                  │
└─────────────────────────────────────────┘
```

### Monolith vs Microservices

```
MONOLITH:
┌─────────────────────────────────────────┐
│           Single Application            │
│  ┌──────┬──────┬──────┬──────┐         │
│  │ Users│Orders│ Cart │Search│         │
│  └──────┴──────┴──────┴──────┘         │
│           Single Database               │
└─────────────────────────────────────────┘

MICROSERVICES:
┌──────┐  ┌──────┐  ┌──────┐  ┌──────┐
│Users │  │Orders│  │ Cart │  │Search│
│ DB   │  │ DB   │  │ DB   │  │ DB   │
└──────┘  └──────┘  └──────┘  └──────┘
    │         │         │         │
    └─────────┴─────────┴─────────┘
              API Gateway
```

### When to Use Microservices

```
USE WHEN:
✓ Large, complex domain
✓ Need independent scaling
✓ Multiple teams working in parallel
✓ Different technology needs per service
✓ Fault isolation is critical
✓ Frequent, independent deployments

DON'T USE WHEN:
✗ Small team/application
✗ Simple domain
✗ Tight latency requirements
✗ Limited DevOps maturity
✗ Unclear domain boundaries
```

---

## Service Design

### Domain-Driven Design

```
BOUNDED CONTEXTS:
┌─────────────────┐  ┌─────────────────┐
│   Orders        │  │   Shipping      │
│  Context        │  │   Context       │
│ ┌─────────────┐ │  │ ┌─────────────┐ │
│ │ Order       │ │  │ │ Shipment    │ │
│ │ LineItem    │ │  │ │ Carrier     │ │
│ │ Customer(ID)│ │  │ │ Address     │ │
│ └─────────────┘ │  │ └─────────────┘ │
└─────────────────┘  └─────────────────┘

Each context has its own:
- Ubiquitous language
- Data model
- Business rules
```
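As a rough illustration, the two contexts in the diagram could be modeled as separate types. This is a minimal sketch: every type, field, and function name here (`OrderLineItem`, `unitPriceCents`, `toShipment`, and so on) is an assumption made for the example. It shows Orders holding only a customer ID while Shipping keeps its own address model.

```typescript
// Orders context: knows the customer only by ID (hypothetical field names).
interface OrderLineItem {
  sku: string;
  quantity: number;
  unitPriceCents: number;
}

interface Order {
  id: string;
  customerId: string; // reference only; customer data lives in another context
  lineItems: OrderLineItem[];
}

// Shipping context: has its own model and vocabulary.
interface ShippingAddress {
  street: string;
  city: string;
  postalCode: string;
}

interface Shipment {
  orderId: string; // reference back to the Orders context by ID
  carrier: string;
  address: ShippingAddress;
}

// A business rule that belongs to the Orders context.
function orderTotalCents(order: Order): number {
  return order.lineItems.reduce(
    (sum, item) => sum + item.quantity * item.unitPriceCents,
    0,
  );
}

// Translation at the boundary: Shipping receives only what it needs.
function toShipment(
  order: Order,
  carrier: string,
  address: ShippingAddress,
): Shipment {
  return { orderId: order.id, carrier, address };
}
```

Keeping the translation in one function makes the dependency between the contexts explicit and easy to change.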

### Service Boundaries

```
GOOD boundaries follow:
┌─────────────────────────────────────────┐
│ Business Capability                     │
│ - What the business does                │
│ - e.g., Payment Processing, Inventory   │
├─────────────────────────────────────────┤
│ Subdomain                               │
│ - Area of expertise                     │
│ - e.g., Pricing, Catalog, Customer      │
├─────────────────────────────────────────┤
│ Single Responsibility                   │
│ - Does one thing well                   │
│ - Can explain in one sentence           │
└─────────────────────────────────────────┘

BAD boundaries:
✗ Technical layers (UI service, DB service)
✗ CRUD operations (User CRUD service)
✗ Too granular (EmailSender service)
```

### Service Size Guidelines

```
Right-sized service:
- 2-pizza team can own it (5-8 people)
- Rewrite in 2-4 weeks if needed
- Clear, single business purpose
- Minimal external dependencies
- Own its data completely

Too big: Multiple teams needed, mixed concerns
Too small: Can't function independently
```

---

## Communication Patterns

### Synchronous (Request/Response)

```
REST:
┌──────┐  HTTP GET /users/123  ┌──────┐
│Client│ ─────────────────────>│Server│
│      │<───────────────────── │      │
└──────┘  { "name": "John" }   └──────┘

gRPC:
┌──────┐  Binary/Protobuf      ┌──────┐
│Client│ ─────────────────────>│Server│
│      │<───────────────────── │      │
└──────┘  Strongly typed       └──────┘

GraphQL:
┌──────┐  POST /graphql        ┌──────┐
│Client│ ─────────────────────>│Server│
│      │<───────────────────── │      │
└──────┘  Flexible queries     └──────┘
```

### Asynchronous (Event-Driven)

```
MESSAGE QUEUE:
┌────────┐      ┌─────────┐      ┌────────┐
│Producer│ ───> │  Queue  │ ───> │Consumer│
└────────┘      │(RabbitMQ│      └────────┘
                │ SQS)    │
                └─────────┘

EVENT STREAMING:
┌────────┐      ┌─────────┐      ┌────────┐
│Producer│ ───> │  Topic  │ ───> │Consumer│
└────────┘      │ (Kafka) │ ───> │Consumer│
                └─────────┘ ───> │Consumer│
                                 └────────┘
```
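The difference between the two diagrams is mainly in delivery semantics, which the in-memory sketch below tries to show. It is not a broker client: `InMemoryQueue` and `InMemoryTopic` are hypothetical classes, delivery is synchronous, and nothing is persisted. A queue hands each message to one consumer; a topic hands each message to every subscriber.

```typescript
type Handler<T> = (message: T) => void;

// Queue semantics: each message goes to exactly one consumer (round-robin here).
class InMemoryQueue<T> {
  private consumers: Handler<T>[] = [];
  private next = 0;

  subscribe(handler: Handler<T>): void {
    this.consumers.push(handler);
  }

  publish(message: T): void {
    // A real broker would buffer the message; this sketch simply drops it.
    if (this.consumers.length === 0) return;
    const handler = this.consumers[this.next % this.consumers.length];
    this.next++;
    handler(message);
  }
}

// Topic semantics: every subscriber receives every message.
class InMemoryTopic<T> {
  private subscribers: Handler<T>[] = [];

  subscribe(handler: Handler<T>): void {
    this.subscribers.push(handler);
  }

  publish(message: T): void {
    for (const handler of this.subscribers) handler(message);
  }
}
```

With two consumers and four published messages, the queue delivers two messages to each consumer, while the topic delivers all four to both.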

### Communication Comparison

| Pattern             | Use Case                   | Trade-offs             |
| ------------------- | -------------------------- | ---------------------- |
| **REST**            | CRUD, simple queries       | Simple, but chatty     |
| **gRPC**            | High performance, internal | Fast, but complex      |
| **GraphQL**         | Flexible client needs      | Flexible, but overhead |
| **Message Queue**   | Task processing            | Decoupled, but delay   |
| **Event Streaming** | Event sourcing, analytics  | Scalable, but complex  |

---

## Data Management

### Database per Service

```
SEPARATE DATABASES:
┌──────────┐  ┌──────────┐  ┌──────────┐
│Users     │  │Orders    │  │Products  │
│Service   │  │Service   │  │Service   │
├──────────┤  ├──────────┤  ├──────────┤
│PostgreSQL│  │MongoDB   │  │MySQL     │
└──────────┘  └──────────┘  └──────────┘

Benefits:
✓ Independent scaling
✓ Technology freedom
✓ No shared schema coupling
✓ Fault isolation

Challenges:
- No joins across services
- Eventual consistency
- Data duplication
```

### Data Consistency Patterns

```
SAGA PATTERN (Choreography):
┌───────┐      ┌───────┐      ┌───────┐
│Order  │─────>│Payment│─────>│Ship   │
│Create │      │Process│      │Order  │
└───────┘      └───────┘      └───────┘
    │              │              │
    └──────────────┴──────────────┘
    Each service publishes events
    that trigger next step

SAGA PATTERN (Orchestration):
           ┌────────────┐
           │Orchestrator│
           └────────────┘
          /      │      \
         ↓       ↓       ↓
    ┌───────┐ ┌───────┐ ┌───────┐
    │Order  │ │Payment│ │Ship   │
    └───────┘ └───────┘ └───────┘
    Central coordinator manages flow
```
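An orchestrated saga can be sketched as a loop over steps, where each step carries an undo action. The diagrams above do not show compensation; it is included here because a saga normally needs a way to roll back completed steps when a later one fails. `SagaStep` and `runSaga` are hypothetical names, and real orchestrators would also persist progress and handle failures during compensation.

```typescript
// Hypothetical step shape: a forward action plus its compensating action.
interface SagaStep {
  name: string;
  action: () => Promise<void>;
  compensate: () => Promise<void>;
}

// Runs steps in order. On failure, compensates completed steps in reverse
// order and reports false; reports true when every step succeeds.
async function runSaga(steps: SagaStep[]): Promise<boolean> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.action();
      completed.push(step);
    } catch {
      // Sketch only: a failing compensation is not handled here.
      for (const done of completed.reverse()) {
        await done.compensate();
      }
      return false;
    }
  }
  return true;
}
```

For the Order, Payment, Ship flow, a failure in Ship would trigger the Payment compensation first and then the Order compensation.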

### Event Sourcing

```
Instead of storing current state:
┌────────────────────────────┐
│ Account: $500              │
└────────────────────────────┘

Store all events:
┌────────────────────────────┐
│ 1. AccountCreated $0       │
│ 2. Deposited $1000         │
│ 3. Withdrawn $300          │
│ 4. Withdrawn $200          │
│ Current: $500              │
└────────────────────────────┘

Benefits:
✓ Complete audit trail
✓ Temporal queries
✓ Event replay
✓ Natural fit for CQRS
```
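The account example above can be expressed as a fold over the event log. This is a minimal sketch with hypothetical event and function names (`AccountEvent`, `applyEvent`, `replay`); a real event store would also deal with persistence, versioning, and snapshots.

```typescript
// Hypothetical event types mirroring the account example above.
type AccountEvent =
  | { type: "AccountCreated" }
  | { type: "Deposited"; amount: number }
  | { type: "Withdrawn"; amount: number };

// Pure reducer: applies one event to the current balance.
function applyEvent(balance: number, event: AccountEvent): number {
  switch (event.type) {
    case "AccountCreated":
      return 0;
    case "Deposited":
      return balance + event.amount;
    case "Withdrawn":
      return balance - event.amount;
  }
}

// Current state is derived by replaying the event log from the start.
function replay(events: AccountEvent[]): number {
  return events.reduce(applyEvent, 0);
}

const accountLog: AccountEvent[] = [
  { type: "AccountCreated" },
  { type: "Deposited", amount: 1000 },
  { type: "Withdrawn", amount: 300 },
  { type: "Withdrawn", amount: 200 },
];

console.log(replay(accountLog)); // 500
console.log(replay(accountLog.slice(0, 2))); // 1000 (balance after event 2)
```

Replaying a prefix of the log is what makes temporal queries possible: the second call reconstructs the balance as it was after the deposit.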

---

## API Gateway

### Gateway Pattern

```
               ┌─────────────┐
               │ API Gateway │
               └──────┬──────┘
        ┌──────────┬──┴──┬──────────┐
        ↓          ↓     ↓          ↓
   ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
   │Users   │ │Orders  │ │Products│ │Auth    │
   │Service │ │Service │ │Service │ │Service │
   └────────┘ └────────┘ └────────┘ └────────┘

Gateway responsibilities:
- Request routing
- Authentication/Authorization
- Rate limiting
- Load balancing
- Request/Response transformation
- Caching
- Monitoring
```
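Request routing, the first responsibility in the list, often comes down to a prefix table. The sketch below assumes path-prefix routing with longest-prefix matching; the upstream URLs and the `resolveUpstream` helper are hypothetical, and production gateways usually rely on a dedicated product rather than hand-written code.

```typescript
interface Route {
  prefix: string;
  upstream: string;
}

// Hypothetical routing table; hostnames and ports are placeholders.
const routes: Route[] = [
  { prefix: "/users", upstream: "http://users-service:3000" },
  { prefix: "/orders", upstream: "http://orders-service:3000" },
  { prefix: "/products", upstream: "http://products-service:3000" },
  { prefix: "/auth", upstream: "http://auth-service:3000" },
];

// Returns the upstream for a request path (without query string), preferring
// the longest matching prefix. Returns undefined when nothing matches.
function resolveUpstream(
  path: string,
  table: Route[] = routes,
): string | undefined {
  const match = table
    .filter((r) => path === r.prefix || path.startsWith(r.prefix + "/"))
    .sort((a, b) => b.prefix.length - a.prefix.length)[0];
  return match?.upstream;
}
```

Matching on `prefix + "/"` keeps `/usersettings` from being routed to the Users service by accident.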

### BFF (Backend for Frontend)

```
              Mobile App        Web App
                  │                │
                  ↓                ↓
          ┌────────────┐   ┌────────────┐
          │ Mobile BFF │   │  Web BFF   │
          └─────┬──────┘   └─────┬──────┘
                │                │
         ┌──────┴────────────────┴──────┐
         │         Internal APIs         │
         └───────────────────────────────┘

Each BFF:
- Optimized for its client
- Aggregates multiple services
- Handles client-specific logic
```
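A BFF is mostly aggregation plus shaping of the response. The sketch below assumes two internal calls behind a hypothetical `InternalApis` interface; the payload shapes and the function names `mobileHome` and `webDashboard` are invented for illustration.

```typescript
interface UserProfile {
  id: string;
  name: string;
  email: string;
}

interface OrderSummary {
  id: string;
  totalCents: number;
  status: string;
}

// Hypothetical internal service clients, injected so they can be stubbed.
interface InternalApis {
  getUser(id: string): Promise<UserProfile>;
  getOrders(userId: string): Promise<OrderSummary[]>;
}

// Mobile BFF: small payload, only what the home screen needs.
async function mobileHome(apis: InternalApis, userId: string) {
  const [user, orders] = await Promise.all([
    apis.getUser(userId),
    apis.getOrders(userId),
  ]);
  return {
    name: user.name,
    recentOrderIds: orders.slice(0, 3).map((o) => o.id),
  };
}

// Web BFF: richer payload for a dashboard view.
async function webDashboard(apis: InternalApis, userId: string) {
  const [user, orders] = await Promise.all([
    apis.getUser(userId),
    apis.getOrders(userId),
  ]);
  const totalSpentCents = orders.reduce((sum, o) => sum + o.totalCents, 0);
  return { user, orders, totalSpentCents };
}
```

Both functions call the same internal APIs; only the aggregation and the response shape differ per client.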

---

## Service Discovery

### Client-Side Discovery

```
┌──────┐     ┌──────────────┐
│Client│────>│Service       │────> Service A (192.168.1.10)
│      │     │Registry      │────> Service A (192.168.1.11)
│      │<────│(Consul, etcd)│────> Service A (192.168.1.12)
└──────┘     └──────────────┘

Client queries registry, then calls service directly.
Client handles load balancing.
```
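Client-side discovery can be sketched with an in-memory registry and a round-robin picker. `ServiceRegistry` and `RoundRobinClient` are hypothetical stand-ins for a real registry such as Consul or etcd and its client library; health checking and caching are left out.

```typescript
// Hypothetical in-memory registry: service name -> instance addresses.
class ServiceRegistry {
  private instances = new Map<string, string[]>();

  register(service: string, address: string): void {
    const list = this.instances.get(service) ?? [];
    list.push(address);
    this.instances.set(service, list);
  }

  lookup(service: string): string[] {
    // Return a copy so callers cannot mutate registry state.
    return [...(this.instances.get(service) ?? [])];
  }
}

// The client queries the registry and load-balances across instances itself.
class RoundRobinClient {
  private counters = new Map<string, number>();

  constructor(private readonly registry: ServiceRegistry) {}

  pick(service: string): string {
    const addresses = this.registry.lookup(service);
    if (addresses.length === 0) {
      throw new Error(`No instances registered for ${service}`);
    }
    const i = this.counters.get(service) ?? 0;
    this.counters.set(service, i + 1);
    return addresses[i % addresses.length];
  }
}
```

With the three addresses from the diagram registered, successive `pick("service-a")` calls cycle through `.10`, `.11`, `.12`, and back to `.10`.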

### Server-Side Discovery

```
┌──────┐     ┌─────────────┐     ┌──────────────┐
│Client│────>│Load Balancer│────>│Service       │
│      │     │             │     │Registry      │
└──────┘     └─────────────┘     └──────────────┘
                                        │
                               ┌────────┴────────┐
                               ↓        ↓        ↓
                            Service  Service  Service

Load balancer handles discovery and routing.
Simpler for clients.
```

---

## Resilience Patterns

### Circuit Breaker

```
States:
┌────────┐    Failures    ┌────────┐
│ CLOSED │───────────────>│  OPEN  │
│(normal)│                │(reject)│
└────────┘                └────────┘
    ↑                          │
    │      Timeout             │
    │   ┌────────────┐         │
    └───│HALF-OPEN   │<────────┘
        │(test)      │
        └────────────┘

Implementation:
- Track failure count
- Open circuit after threshold
- Reject calls while open
- Periodically test with half-open
- Close circuit on success
```
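The state machine above could be implemented roughly as follows. This is a sketch under simplifying assumptions: failures are counted consecutively, the class name `CircuitBreaker` and its parameters are hypothetical, and concurrent calls in the half-open state are not limited to a single trial.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold: number = 3,
    private readonly resetTimeoutMs: number = 10000,
    // Injectable clock so the timeout can be tested without waiting.
    private readonly now: () => number = Date.now,
  ) {}

  getState(): BreakerState {
    return this.state;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "OPEN") {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error("Circuit is open"); // reject without calling fn
      }
      this.state = "HALF_OPEN"; // timeout elapsed: allow a trial call
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.state = "CLOSED"; // success closes the circuit
      return result;
    } catch (error) {
      this.failures++;
      const trialFailed = this.state === "HALF_OPEN";
      if (trialFailed || this.failures >= this.failureThreshold) {
        this.state = "OPEN";
        this.openedAt = this.now();
      }
      throw error;
    }
  }
}
```

A caller would wrap each outbound request, for example `breaker.call(() => userService.getUser(id))`, and treat the "Circuit is open" error as a signal to fall back or fail fast.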

### Retry Pattern

```typescript
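// Resolves after `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts: number = 3,
  backoff: number = 1000,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Out of attempts: surface the last error to the caller
      if (attempt === maxAttempts) throw error;

      // Exponential backoff with jitter
      const delay = backoff * Math.pow(2, attempt - 1);
      const jitter = delay * 0.1 * Math.random();
      await sleep(delay + jitter);
    }
  }
  // Only reachable when maxAttempts < 1
  throw new Error("maxAttempts must be at least 1");
}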
```

### Bulkhead Pattern

```
Isolate resources to prevent cascade:

┌─────────────────────────────────────┐
│           Service A                 │
│  ┌─────────┐  ┌─────────┐          │
│  │Thread   │  │Thread   │          │
│  │Pool 1   │  │Pool 2   │          │
│  │(Service │  │(Service │          │
│  │   B)    │  │   C)    │          │
│  └─────────┘  └─────────┘          │
└─────────────────────────────────────┘

If Service C is slow, only Pool 2 is affected.
Service B calls continue normally.
```
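In a single-threaded runtime such as Node.js there are no thread pools to split, so a comparable effect is usually achieved by capping concurrent calls per dependency. The sketch below takes that approach as a substitute for the thread pools in the diagram; the `Bulkhead` class is hypothetical and has no queue limit or timeout.

```typescript
// Limits how many calls to one dependency may be in flight at once.
class Bulkhead {
  private active = 0;
  private readonly waiting: Array<() => void> = [];

  constructor(private readonly maxConcurrent: number) {}

  async run<T>(fn: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // All slots busy: wait until a finishing call hands over its slot.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await fn();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // pass the slot directly to the next waiter
      } else {
        this.active--;
      }
    }
  }
}

// One bulkhead per downstream dependency (sizes are arbitrary examples).
const serviceBCalls = new Bulkhead(10);
const serviceCCalls = new Bulkhead(10);
```

If Service C becomes slow, only calls routed through `serviceCCalls` queue up; calls through `serviceBCalls` keep their own slots.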

### Timeout Pattern

```typescript
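// Rejects if fn does not settle within `ms`. Note: this stops waiting,
// but it does not cancel the underlying operation.
async function withTimeout<T>(fn: () => Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("Timeout")), ms);
  });
  try {
    return await Promise.race([fn(), timeout]);
  } finally {
    // Clear the timer so it does not linger after fn settles
    clearTimeout(timer);
  }
}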

// Always set timeouts on external calls
const user = await withTimeout(
  () => userService.getUser(id),
  5000, // 5 second timeout
);
```

---

## Observability

### The Three Pillars

```
LOGS:
┌─────────────────────────────────────────┐
│ 2024-01-15 10:30:45 [INFO] OrderService │
│ Order created: { id: 123, user: 456 }   │
└─────────────────────────────────────────┘

METRICS:
┌─────────────────────────────────────────┐
│ order_created_total: 1523               │
│ order_processing_seconds: 0.234         │
│ active_connections: 45                  │
└─────────────────────────────────────────┘

TRACES:
┌─────────────────────────────────────────┐
│ [Request ID: abc123]                    │
│ ├─ Gateway: 2ms                         │
│ ├─ OrderService: 150ms                  │
│ │  ├─ UserService: 45ms                 │
│ │  └─ InventoryService: 80ms            │
│ └─ Total: 152ms                         │
└─────────────────────────────────────────┘
```

### Distributed Tracing

```
Trace Context Propagation:
┌──────┐  X-Trace-ID: abc  ┌──────┐
│API   │──────────────────>│Order │
│GW    │                   │Svc   │
└──────┘                   └──────┘
                              │
              X-Trace-ID: abc │
                              ↓
                           ┌──────┐
                           │User  │
                           │Svc   │
                           └──────┘

Tools: Jaeger, Zipkin, AWS X-Ray, Datadog
```
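Propagation amounts to reusing the incoming trace ID on every outgoing call, or creating one at the edge when none exists. The sketch below assumes headers arrive as a plain object with lowercase keys and uses the `X-Trace-ID` header from the diagram; the helper names are hypothetical. The W3C Trace Context standard defines a `traceparent` header for the same purpose, and tracing libraries normally handle this step.

```typescript
// Hypothetical header map; keys are assumed to be lowercased already.
type HeaderMap = Record<string, string | undefined>;

const TRACE_HEADER = "x-trace-id";

// Generates a short random ID. Sketch only: not collision-resistant.
function newTraceId(): string {
  return Math.random().toString(16).slice(2, 10);
}

// Reuse the caller's trace ID, or start a new trace at the edge.
function ensureTraceId(incoming: HeaderMap): string {
  return incoming[TRACE_HEADER] ?? newTraceId();
}

// Build headers for a downstream call, carrying the trace ID forward.
function outgoingHeaders(incoming: HeaderMap, extra: HeaderMap = {}): HeaderMap {
  return { ...extra, [TRACE_HEADER]: ensureTraceId(incoming) };
}
```

In the diagram, the Order service would call `outgoingHeaders(requestHeaders)` before contacting the User service, so both hops log the same ID.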

### Health Checks

```typescript
// Liveness: Is the service running?
app.get("/health/live", (req, res) => {
  res.status(200).json({ status: "alive" });
});

// Readiness: Is the service ready to handle traffic?
app.get("/health/ready", async (req, res) => {
  const dbHealthy = await checkDatabase();
  const cacheHealthy = await checkCache();

  if (dbHealthy && cacheHealthy) {
    res.status(200).json({ status: "ready" });
  } else {
    res.status(503).json({
      status: "not ready",
      checks: { database: dbHealthy, cache: cacheHealthy },
    });
  }
});
```

---

## Deployment

### Containerization

```dockerfile
# Dockerfile
FROM node:18-alpine

WORKDIR /app

COPY package*.json ./
RUN npm ci --omit=dev

COPY dist ./dist

USER node

EXPOSE 3000

CMD ["node", "dist/main.js"]
```

### Kubernetes Basics

```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
        - name: order-service
          image: order-service:1.0.0
          ports:
            - containerPort: 3000
          resources:
            limits:
              cpu: "500m"
              memory: "256Mi"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 3000
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 3000
```

---

## Testing Strategies

### Test Pyramid for Microservices

```
              /\
             /  \  E2E Tests
            /────\  (Few, slow, brittle)
           /      \
          /────────\  Contract Tests
         /          \  (Service boundaries)
        /────────────\
       /              \  Integration Tests
      /────────────────\  (With dependencies)
     /                  \
    /────────────────────\  Unit Tests
   /                      \  (Many, fast, isolated)
  /________________________\
```

### Contract Testing

```typescript
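// Consumer test (Order Service).
// Assumes a Pact-style consumer test: `provider` is the mock provider created
// in the test setup, and `like` is the library's type matcher.
describe("User Service Contract", () => {
  it("returns user by ID", async () => {
    // Define expected interaction
    await provider.addInteraction({
      state: "user 123 exists",
      uponReceiving: "a request for user 123",
      withRequest: {
        method: "GET",
        path: "/users/123",
      },
      willRespondWith: {
        status: 200,
        body: {
          id: "123",
          name: like("John"),
          email: like("john@example.com"),
        },
      },
    });

    // Next, call the Order Service's real client code against the mock
    // provider. The test passes only if the request it sends matches
    // `withRequest` above and the client can handle the response.
  });
});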
```

---

## Best Practices

### DO:

- Start with a monolith, extract services later
- Define clear service boundaries
- Use asynchronous communication where possible
- Implement circuit breakers
- Centralize logging and monitoring
- Automate everything
- Design for failure
- Version your APIs

### DON'T:

- Create too many, too small services
- Share databases between services
- Make synchronous chains too deep
- Ignore distributed system complexities
- Couple services through shared libraries
- Skip contract testing
- Deploy without monitoring

---

## Migration Checklist

### Monolith to Microservices

- [ ] Map domain boundaries clearly
- [ ] Identify candidate services (start small)
- [ ] Establish CI/CD for new services
- [ ] Implement API gateway
- [ ] Set up service discovery
- [ ] Add distributed tracing
- [ ] Implement circuit breakers
- [ ] Extract first service (strangler fig pattern)
- [ ] Test extensively
- [ ] Monitor and iterate
