microservices-architecture
Microservices architecture patterns and best practices. Use when designing distributed systems, breaking down monoliths, or implementing service communication.
Best use case
microservices-architecture is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using microservices-architecture should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/microservices-architecture/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
SKILL.md Source
# Microservices Architecture
Comprehensive guide for designing and implementing microservices-based systems.
## Microservices Fundamentals
### What are Microservices?
```
MICROSERVICES = Independently deployable services
that do one thing well
Characteristics:
┌─────────────────────────────────────────┐
│ ✓ Single responsibility │
│ ✓ Own their data │
│ ✓ Independently deployable │
│ ✓ Communicate via APIs │
│ ✓ Technology agnostic │
│ ✓ Owned by small teams │
└─────────────────────────────────────────┘
```
### Monolith vs Microservices
```
MONOLITH:
┌─────────────────────────────────────────┐
│ Single Application │
│ ┌──────┬──────┬──────┬──────┐ │
│ │ Users│Orders│ Cart │Search│ │
│ └──────┴──────┴──────┴──────┘ │
│ Single Database │
└─────────────────────────────────────────┘
MICROSERVICES:
┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐
│Users │ │Orders│ │ Cart │ │Search│
│ DB │ │ DB │ │ DB │ │ DB │
└──────┘ └──────┘ └──────┘ └──────┘
│ │ │ │
└─────────┴─────────┴─────────┘
API Gateway
```
### When to Use Microservices
```
USE WHEN:
✓ Large, complex domain
✓ Need independent scaling
✓ Multiple teams working in parallel
✓ Different technology needs per service
✓ Fault isolation is critical
✓ Frequent, independent deployments
DON'T USE WHEN:
✗ Small team/application
✗ Simple domain
✗ Tight latency requirements
✗ Limited DevOps maturity
✗ Unclear domain boundaries
```
---
## Service Design
### Domain-Driven Design
```
BOUNDED CONTEXTS:
┌─────────────────┐ ┌─────────────────┐
│ Orders │ │ Shipping │
│ Context │ │ Context │
│ ┌─────────────┐ │ │ ┌─────────────┐ │
│ │ Order │ │ │ │ Shipment │ │
│ │ LineItem │ │ │ │ Carrier │ │
│ │ Customer(ID)│ │ │ │ Address │ │
│ └─────────────┘ │ │ └─────────────┘ │
└─────────────────┘ └─────────────────┘
Each context has its own:
- Ubiquitous language
- Data model
- Business rules
```
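The two contexts in the diagram can be sketched as separate TypeScript models. This is only an illustration: the field names (`sku`, `unitPriceCents`, `carrier`, and so on) and the `toShipment` helper are assumptions, not part of the original guide. The point it shows is that Orders refers to the customer by ID only, and Shipping receives just the data it needs at the boundary.

```typescript
// Orders context: knows the customer only by ID, never by profile details.
type CustomerId = string;

interface LineItem {
  sku: string;
  quantity: number;
  unitPriceCents: number;
}

interface Order {
  orderId: string;
  customerId: CustomerId;
  items: LineItem[];
}

// Shipping context: its own model, with no pricing or line-item data.
interface Shipment {
  shipmentId: string;
  orderId: string;
  carrier: string;
  address: string;
}

// Hypothetical translation at the context boundary:
// Shipping takes only what it needs from an Order.
function toShipment(order: Order, address: string, carrier: string): Shipment {
  return {
    shipmentId: `shp-${order.orderId}`,
    orderId: order.orderId,
    carrier,
    address,
  };
}

const sampleOrder: Order = {
  orderId: "o-1",
  customerId: "c-42",
  items: [{ sku: "lamp", quantity: 2, unitPriceCents: 1999 }],
};

const sampleShipment = toShipment(sampleOrder, "1 Main St", "ExampleCarrier");
console.log(sampleShipment);
```

Keeping the translation in one function makes the coupling between contexts explicit and small.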
### Service Boundaries
```
GOOD boundaries follow:
┌─────────────────────────────────────────┐
│ Business Capability │
│ - What the business does │
│ - e.g., Payment Processing, Inventory │
├─────────────────────────────────────────┤
│ Subdomain │
│ - Area of expertise │
│ - e.g., Pricing, Catalog, Customer │
├─────────────────────────────────────────┤
│ Single Responsibility │
│ - Does one thing well │
│ - Can explain in one sentence │
└─────────────────────────────────────────┘
BAD boundaries:
✗ Technical layers (UI service, DB service)
✗ CRUD operations (User CRUD service)
✗ Too granular (EmailSender service)
```
### Service Size Guidelines
```
Right-sized service:
- 2-pizza team can own it (5-8 people)
- Rewrite in 2-4 weeks if needed
- Clear, single business purpose
- Minimal external dependencies
- Own its data completely
Too big: Multiple teams needed, mixed concerns
Too small: Can't function independently
```
---
## Communication Patterns
### Synchronous (Request/Response)
```
REST:
┌──────┐ HTTP GET /users/123 ┌──────┐
│Client│ ─────────────────────>│Server│
│ │<───────────────────── │ │
└──────┘ { "name": "John" } └──────┘
gRPC:
┌──────┐ Binary/Protobuf ┌──────┐
│Client│ ─────────────────────>│Server│
│ │<───────────────────── │ │
└──────┘ Strongly typed └──────┘
GraphQL:
┌──────┐ POST /graphql ┌──────┐
│Client│ ─────────────────────>│Server│
│ │<───────────────────── │ │
└──────┘ Flexible queries └──────┘
```
### Asynchronous (Event-Driven)
```
MESSAGE QUEUE:
┌────────┐ ┌─────────┐ ┌────────┐
│Producer│ ───> │ Queue │ ───> │Consumer│
└────────┘ │(RabbitMQ│ └────────┘
│ SQS) │
└─────────┘
EVENT STREAMING:
┌────────┐ ┌─────────┐ ┌────────┐
│Producer│ ───> │ Topic │ ───> │Consumer│
└────────┘ │ (Kafka) │ ───> │Consumer│
└─────────┘ ───> │Consumer│
└────────┘
```
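The difference between the two delivery models can be sketched with in-memory stand-ins. This is a simplified illustration, not how RabbitMQ, SQS, or Kafka work internally: the class names `InMemoryQueue` and `InMemoryTopic` are hypothetical, delivery here is synchronous, and nothing is persisted. It shows only that a queue hands each message to one consumer, while a topic fans each event out to every subscriber.

```typescript
type MessageHandler<M> = (message: M) => void;

// Queue semantics: each message is handled by exactly one consumer.
class InMemoryQueue<M> {
  private consumers: MessageHandler<M>[] = [];
  private nextIndex = 0;

  addConsumer(handler: MessageHandler<M>): void {
    this.consumers.push(handler);
  }

  send(message: M): void {
    if (this.consumers.length === 0) throw new Error("no consumers registered");
    // Round-robin across competing consumers.
    const consumer = this.consumers[this.nextIndex % this.consumers.length];
    this.nextIndex++;
    consumer(message);
  }
}

// Topic semantics: every subscriber receives every event.
class InMemoryTopic<E> {
  private subscribers: MessageHandler<E>[] = [];

  subscribe(handler: MessageHandler<E>): void {
    this.subscribers.push(handler);
  }

  publish(event: E): void {
    for (const subscriber of this.subscribers) subscriber(event);
  }
}

// Usage: two workers share a queue, two services share a topic.
const workerA: string[] = [];
const workerB: string[] = [];
const taskQueue = new InMemoryQueue<string>();
taskQueue.addConsumer((m) => workerA.push(m));
taskQueue.addConsumer((m) => workerB.push(m));
taskQueue.send("resize-image-1");
taskQueue.send("resize-image-2");

const billingSeen: string[] = [];
const analyticsSeen: string[] = [];
const orderEvents = new InMemoryTopic<string>();
orderEvents.subscribe((e) => billingSeen.push(e));
orderEvents.subscribe((e) => analyticsSeen.push(e));
orderEvents.publish("OrderCreated");
```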
### Communication Comparison
| Pattern | Use Case | Trade-offs |
| ------------------- | -------------------------- | ---------------------- |
| **REST** | CRUD, simple queries | Simple, but chatty |
| **gRPC** | High performance, internal | Fast, but complex |
| **GraphQL** | Flexible client needs | Flexible, but overhead |
| **Message Queue** | Task processing | Decoupled, but delay |
| **Event Streaming** | Event sourcing, analytics | Scalable, but complex |
---
## Data Management
### Database per Service
```
SEPARATE DATABASES:
┌──────────┐  ┌──────────┐  ┌──────────┐
│Users     │  │Orders    │  │Products  │
│Service   │  │Service   │  │Service   │
├──────────┤  ├──────────┤  ├──────────┤
│PostgreSQL│  │MongoDB   │  │MySQL     │
└──────────┘  └──────────┘  └──────────┘
Benefits:
✓ Independent scaling
✓ Technology freedom
✓ No shared schema coupling
✓ Fault isolation
Challenges:
- No joins across services
- Eventual consistency
- Data duplication
```
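Because there are no joins across services, a caller that needs combined data usually joins in application code. The sketch below illustrates that idea under stated assumptions: `ordersService` and `usersService` are hypothetical in-memory stand-ins for calls to each service's API, and the record shapes are invented for the example.

```typescript
interface OrderRow {
  orderId: string;
  userId: string;
  totalCents: number;
}

interface UserRow {
  userId: string;
  fullName: string;
}

// Hypothetical stand-ins for the Orders and Users service APIs.
const ordersService = {
  listOrders(): OrderRow[] {
    return [
      { orderId: "o-1", userId: "u-1", totalCents: 2500 },
      { orderId: "o-2", userId: "u-2", totalCents: 1000 },
      { orderId: "o-3", userId: "u-1", totalCents: 500 },
    ];
  },
};

const usersService = {
  getUsersByIds(ids: string[]): UserRow[] {
    const allUsers: UserRow[] = [
      { userId: "u-1", fullName: "Ada" },
      { userId: "u-2", fullName: "Grace" },
    ];
    return allUsers.filter((u) => ids.includes(u.userId));
  },
};

// "Join" in application code: one batched lookup instead of one call per order.
function composeOrdersWithUsers() {
  const orders = ordersService.listOrders();
  const userIds = Array.from(new Set(orders.map((o) => o.userId)));

  const usersById = new Map<string, UserRow>();
  for (const user of usersService.getUsersByIds(userIds)) {
    usersById.set(user.userId, user);
  }

  return orders.map((o) => ({
    orderId: o.orderId,
    totalCents: o.totalCents,
    userName: usersById.get(o.userId)?.fullName ?? "(unknown user)",
  }));
}

const composedOrders = composeOrdersWithUsers();
console.log(composedOrders);
```

Batching the user lookup keeps the composition from turning into a chatty one-request-per-row pattern.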
### Data Consistency Patterns
```
SAGA PATTERN (Choreography):
┌───────┐      ┌───────┐      ┌───────┐
│Order  │─────>│Payment│─────>│Ship   │
│Create │      │Process│      │Order  │
└───────┘      └───────┘      └───────┘
    │              │              │
    └──────────────┴──────────────┘
     Each service publishes events
     that trigger next step

SAGA PATTERN (Orchestration):
         ┌────────────┐
         │Orchestrator│
         └────────────┘
          /     │     \
         ↓      ↓      ↓
┌───────┐  ┌───────┐  ┌───────┐
│Order  │  │Payment│  │Ship   │
└───────┘  └───────┘  └───────┘
Central coordinator manages flow
```
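An orchestrated saga can be sketched as a list of steps, each paired with a compensating action that undoes it. The code below is a minimal illustration, not a production saga engine: `SagaStep` and `runSaga` are hypothetical names, steps are synchronous to keep the example short (real steps would be asynchronous service calls), and compensation failures are not handled.

```typescript
interface SagaStep {
  label: string;
  execute: () => void; // throws on failure
  compensate: () => void; // undoes a completed execute
}

interface SagaResult {
  ok: boolean;
  log: string[];
}

function runSaga(steps: SagaStep[]): SagaResult {
  const log: string[] = [];
  const completed: SagaStep[] = [];

  for (const step of steps) {
    try {
      step.execute();
      completed.push(step);
      log.push(`done:${step.label}`);
    } catch {
      log.push(`failed:${step.label}`);
      // Roll back completed steps in reverse order.
      for (const done of completed.reverse()) {
        done.compensate();
        log.push(`undone:${done.label}`);
      }
      return { ok: false, log };
    }
  }
  return { ok: true, log };
}

// Usage: shipping fails, so payment and order are compensated.
const sagaResult = runSaga([
  { label: "order", execute: () => {}, compensate: () => {} },
  { label: "payment", execute: () => {}, compensate: () => {} },
  {
    label: "ship",
    execute: () => {
      throw new Error("no carrier available");
    },
    compensate: () => {},
  },
]);
console.log(sagaResult.log.join(" -> "));
```

In a choreographed saga the same compensations exist, but each service triggers them in response to failure events rather than a central coordinator calling them.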
### Event Sourcing
```
Instead of storing current state:
┌────────────────────────────┐
│ Account: $500 │
└────────────────────────────┘
Store all events:
┌────────────────────────────┐
│ 1. AccountCreated $0 │
│ 2. Deposited $1000 │
│ 3. Withdrawn $300 │
│ 4. Withdrawn $200 │
│ Current: $500 │
└────────────────────────────┘
Benefits:
✓ Complete audit trail
✓ Temporal queries
✓ Event replay
✓ Natural fit for CQRS
```
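The account example above can be replayed in a few lines. This is a minimal sketch: the event type names come from the diagram, while `applyEvent` and `replay` are hypothetical helper names chosen for the illustration.

```typescript
type AccountEvent =
  | { type: "AccountCreated" }
  | { type: "Deposited"; amount: number }
  | { type: "Withdrawn"; amount: number };

// Pure function: previous balance + one event -> next balance.
function applyEvent(balance: number, accountEvent: AccountEvent): number {
  switch (accountEvent.type) {
    case "AccountCreated":
      return 0;
    case "Deposited":
      return balance + accountEvent.amount;
    case "Withdrawn":
      return balance - accountEvent.amount;
  }
}

// Current state is derived by folding over the event log.
function replay(events: AccountEvent[]): number {
  return events.reduce<number>((balance, e) => applyEvent(balance, e), 0);
}

const accountEvents: AccountEvent[] = [
  { type: "AccountCreated" },
  { type: "Deposited", amount: 1000 },
  { type: "Withdrawn", amount: 300 },
  { type: "Withdrawn", amount: 200 },
];

const currentBalance = replay(accountEvents); // 500
// Temporal query: the balance as it was after the first three events.
const balanceAfterThree = replay(accountEvents.slice(0, 3)); // 700
console.log(currentBalance, balanceAfterThree);
```

Because state is derived rather than stored, a temporal query is just a replay over a prefix of the log.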
---
## API Gateway
### Gateway Pattern
```
┌─────────────┐
│ API Gateway │
└──────┬──────┘
┌──────────┬──┴──┬──────────┐
↓ ↓ ↓ ↓
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│Users │ │Orders │ │Products│ │Auth │
│Service │ │Service │ │Service │ │Service │
└────────┘ └────────┘ └────────┘ └────────┘
Gateway responsibilities:
- Request routing
- Authentication/Authorization
- Rate limiting
- Load balancing
- Request/Response transformation
- Caching
- Monitoring
```
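Request routing, the first responsibility in the list, can be sketched as a prefix table. This is an illustration only: the upstream host names and the `resolveUpstream` helper are assumptions, and a real gateway would also handle authentication, rate limiting, and the other responsibilities listed above.

```typescript
interface GatewayRoute {
  prefix: string;
  upstream: string;
}

// Hypothetical routing table; host names are placeholders.
const gatewayRoutes: GatewayRoute[] = [
  { prefix: "/users", upstream: "http://users-service:3000" },
  { prefix: "/orders", upstream: "http://orders-service:3000" },
  { prefix: "/products", upstream: "http://products-service:3000" },
];

// Returns the full upstream URL for a path, or undefined if no route matches.
function resolveUpstream(path: string): string | undefined {
  const match = gatewayRoutes
    // Match whole path segments only, so "/usersx" does not hit "/users".
    .filter((r) => path === r.prefix || path.startsWith(r.prefix + "/"))
    // Longest prefix wins if several routes match.
    .sort((a, b) => b.prefix.length - a.prefix.length)[0];
  return match ? match.upstream + path : undefined;
}

console.log(resolveUpstream("/users/123"));
console.log(resolveUpstream("/unknown"));
```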
### BFF (Backend for Frontend)
```
Mobile App Web App
│ │
↓ ↓
┌────────────┐ ┌────────────┐
│ Mobile BFF │ │ Web BFF │
└─────┬──────┘ └─────┬──────┘
│ │
┌──────┴────────────────┴──────┐
│ Internal APIs │
└───────────────────────────────┘
Each BFF:
- Optimized for its client
- Aggregates multiple services
- Handles client-specific logic
```
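The BFF idea can be sketched as two view functions over the same internal data. Everything here is hypothetical: `getProduct` and `getStock` stand in for internal service calls, and the response shapes are invented to show a trimmed mobile payload next to a richer web payload.

```typescript
interface Product {
  id: string;
  title: string;
  description: string;
  priceCents: number;
  imageUrls: string[];
}

interface Stock {
  productId: string;
  available: number;
}

// Hypothetical stand-ins for internal service calls.
function getProduct(id: string): Product {
  return {
    id,
    title: "Desk Lamp",
    description: "Adjustable LED desk lamp.",
    priceCents: 1999,
    imageUrls: ["lamp-1.jpg", "lamp-2.jpg"],
  };
}

function getStock(productId: string): Stock {
  return { productId, available: 4 };
}

// Mobile BFF: small payload, one thumbnail, a simple in-stock flag.
function mobileProductView(id: string) {
  const product = getProduct(id);
  const stock = getStock(id);
  return {
    id: product.id,
    title: product.title,
    price: (product.priceCents / 100).toFixed(2),
    thumbnail: product.imageUrls[0],
    inStock: stock.available > 0,
  };
}

// Web BFF: richer payload with full description, all images, exact stock.
function webProductView(id: string) {
  const product = getProduct(id);
  const stock = getStock(id);
  return {
    ...product,
    price: (product.priceCents / 100).toFixed(2),
    available: stock.available,
  };
}

const mobileView = mobileProductView("p-1");
const webView = webProductView("p-1");
console.log(mobileView, webView);
```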
---
## Service Discovery
### Client-Side Discovery
```
┌──────┐ ┌──────────────┐
│Client│────>│Service │────> Service A (192.168.1.10)
│ │ │Registry │────> Service A (192.168.1.11)
│ │<────│(Consul, etcd)│────> Service A (192.168.1.12)
└──────┘ └──────────────┘
Client queries registry, then calls service directly.
Client handles load balancing.
```
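Client-side discovery with client-side load balancing can be sketched as a registry lookup followed by a round-robin pick. This is a simplified in-memory illustration, not the Consul or etcd API: `ServiceRegistry` and `RoundRobinClient` are hypothetical names, and health checking and registry caching are left out.

```typescript
// In-memory stand-in for a service registry.
class ServiceRegistry {
  private instances = new Map<string, string[]>();

  register(service: string, address: string): void {
    const list = this.instances.get(service) ?? [];
    list.push(address);
    this.instances.set(service, list);
  }

  lookup(service: string): string[] {
    // Return a copy so callers cannot mutate registry state.
    return (this.instances.get(service) ?? []).slice();
  }
}

// The client queries the registry and balances across instances itself.
class RoundRobinClient {
  private counter = 0;
  private readonly registry: ServiceRegistry;
  private readonly service: string;

  constructor(registry: ServiceRegistry, service: string) {
    this.registry = registry;
    this.service = service;
  }

  nextAddress(): string {
    const addresses = this.registry.lookup(this.service);
    if (addresses.length === 0) {
      throw new Error(`no instances registered for ${this.service}`);
    }
    const address = addresses[this.counter % addresses.length];
    this.counter++;
    return address;
  }
}

const serviceRegistry = new ServiceRegistry();
serviceRegistry.register("service-a", "192.168.1.10");
serviceRegistry.register("service-a", "192.168.1.11");
serviceRegistry.register("service-a", "192.168.1.12");

const discoveryClient = new RoundRobinClient(serviceRegistry, "service-a");
const pickedAddresses = [
  discoveryClient.nextAddress(),
  discoveryClient.nextAddress(),
  discoveryClient.nextAddress(),
  discoveryClient.nextAddress(),
];
console.log(pickedAddresses);
```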
### Server-Side Discovery
```
┌──────┐ ┌─────────────┐ ┌──────────────┐
│Client│────>│Load Balancer│────>│Service │
│ │ │ │ │Registry │
└──────┘ └─────────────┘ └──────────────┘
│
┌────────┴────────┐
↓ ↓ ↓
Service Service Service
Load balancer handles discovery and routing.
Simpler for clients.
```
---
## Resilience Patterns
### Circuit Breaker
```
States:
┌────────┐    Failures    ┌────────┐
│ CLOSED │───────────────>│  OPEN  │
│(normal)│                │(reject)│
└────────┘                └────────┘
    ↑                         │
    │                 Timeout │
    │    ┌───────────┐        │
    └────│ HALF-OPEN │<───────┘
         │  (test)   │
         └───────────┘
Implementation:
- Track failure count
- Open circuit after threshold
- Reject calls while open
- Periodically test with half-open
- Close circuit on success
```
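The state machine above can be sketched as a small class. This is one possible shape under stated assumptions, not a reference implementation: the class and method names are hypothetical, the clock is injectable so the timeout can be tested, and the half-open state here lets every caller through until the first result is recorded, where a stricter breaker would allow a single probe.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold = 3,
    resetTimeoutMs = 30_000,
    now: () => number = Date.now,
  ) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  // True if a call may proceed. Moves OPEN -> HALF_OPEN once the timeout elapses.
  allowRequest(): boolean {
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "HALF_OPEN";
    }
    return this.state !== "OPEN";
  }

  // A success closes the circuit and resets the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.state = "CLOSED";
  }

  // A failed probe, or too many failures while closed, opens the circuit.
  recordFailure(): void {
    this.failures++;
    if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = this.now();
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}

// Usage with a fake clock: two failures open the circuit, the timeout half-opens it.
let fakeNow = 0;
const breaker = new CircuitBreaker(2, 1000, () => fakeNow);
breaker.recordFailure();
breaker.recordFailure();
const allowedWhileOpen = breaker.allowRequest(); // false
fakeNow = 1000;
const allowedAfterTimeout = breaker.allowRequest(); // true (half-open probe)
breaker.recordSuccess();
console.log(allowedWhileOpen, allowedAfterTimeout, breaker.getState());
```

A caller would check `allowRequest()` before each remote call and report the outcome with `recordSuccess()` or `recordFailure()`.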
### Retry Pattern
```typescript
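// Resolve after `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts: number = 3,
  backoff: number = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts) break;

      // Exponential backoff with jitter
      const delay = backoff * Math.pow(2, attempt - 1);
      const jitter = delay * 0.1 * Math.random();
      await sleep(delay + jitter);
    }
  }
  // Reached when every attempt failed (or maxAttempts < 1)
  throw lastError ?? new Error("withRetry: maxAttempts must be at least 1");
}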
```
### Bulkhead Pattern
```
Isolate resources to prevent cascade:
┌─────────────────────────────────────┐
│ Service A │
│ ┌─────────┐ ┌─────────┐ │
│ │Thread │ │Thread │ │
│ │Pool 1 │ │Pool 2 │ │
│ │(Service │ │(Service │ │
│ │ B) │ │ C) │ │
│ └─────────┘ └─────────┘ │
└─────────────────────────────────────┘
If Service C is slow, only Pool 2 is affected.
Service B calls continue normally.
```
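The diagram describes thread pools; in a single-threaded runtime such as Node.js the same isolation is usually approximated with a per-dependency concurrency limit. The sketch below takes that approach and should be read as an assumption-laden illustration: `Bulkhead` is a hypothetical class, and it rejects excess calls immediately instead of queueing them.

```typescript
class Bulkhead {
  private inFlight = 0;
  private readonly maxConcurrent: number;

  constructor(maxConcurrent: number) {
    this.maxConcurrent = maxConcurrent;
  }

  // Reserve a slot; returns false when the compartment is full.
  tryAcquire(): boolean {
    if (this.inFlight >= this.maxConcurrent) return false;
    this.inFlight++;
    return true;
  }

  release(): void {
    if (this.inFlight > 0) this.inFlight--;
  }

  // Run an async call inside the compartment, always freeing the slot afterwards.
  async run<T>(fn: () => Promise<T>): Promise<T> {
    if (!this.tryAcquire()) throw new Error("bulkhead full");
    try {
      return await fn();
    } finally {
      this.release();
    }
  }
}

// One compartment per downstream dependency.
const serviceBCalls = new Bulkhead(2);
const serviceCCalls = new Bulkhead(2);

// Simulate Service C being slow: both of its slots stay occupied.
const cFirst = serviceCCalls.tryAcquire(); // true
const cSecond = serviceCCalls.tryAcquire(); // true
const cThird = serviceCCalls.tryAcquire(); // false, compartment is full

// Calls to Service B are unaffected.
const bFirst = serviceBCalls.tryAcquire(); // true
console.log(cFirst, cSecond, cThird, bFirst);
```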
### Timeout Pattern
```typescript
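async function withTimeout<T>(fn: () => Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("Timeout")), ms);
  });
  try {
    // Note: this stops waiting for fn(); it does not cancel the underlying call
    return await Promise.race([fn(), timeout]);
  } finally {
    // Clear the timer so it does not linger after fn() settles
    clearTimeout(timer);
  }
}

// Always set timeouts on external calls
const user = await withTimeout(
  () => userService.getUser(id),
  5000, // 5 second timeout
);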
```
---
## Observability
### The Three Pillars
```
LOGS:
┌─────────────────────────────────────────┐
│ 2024-01-15 10:30:45 [INFO] OrderService │
│ Order created: { id: 123, user: 456 } │
└─────────────────────────────────────────┘
METRICS:
┌─────────────────────────────────────────┐
│ order_created_total: 1523 │
│ order_processing_seconds: 0.234 │
│ active_connections: 45 │
└─────────────────────────────────────────┘
TRACES:
┌─────────────────────────────────────────┐
│ [Request ID: abc123] │
│ ├─ Gateway: 2ms │
│ ├─ OrderService: 150ms │
│ │ ├─ UserService: 45ms │
│ │ └─ InventoryService: 80ms │
│ └─ Total: 152ms │
└─────────────────────────────────────────┘
```
### Distributed Tracing
```
Trace Context Propagation:
┌──────┐ X-Trace-ID: abc ┌──────┐
│API │─────────────────>│Order │
│GW │ │Svc │
└──────┘ └──────┘
│
X-Trace-ID: abc │
↓
┌──────┐
│User │
│Svc │
└──────┘
Tools: Jaeger, Zipkin, AWS X-Ray, Datadog
```
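Propagation comes down to copying the incoming trace ID onto every outgoing call, and creating one when the request has none. The sketch below is a minimal illustration: `outgoingHeaders` and `newTraceId` are hypothetical helpers, the header is matched in lowercase (HTTP header names are case-insensitive), and the ID generator is a non-cryptographic placeholder. The tracing tools listed above define their own header formats.

```typescript
type HeaderMap = Record<string, string | undefined>;

const TRACE_HEADER = "x-trace-id";

// Placeholder ID generator: 16 random hex characters, not cryptographically secure.
function newTraceId(): string {
  let id = "";
  for (let i = 0; i < 16; i++) {
    id += Math.floor(Math.random() * 16).toString(16);
  }
  return id;
}

// Build headers for a downstream call, reusing the caller's trace ID if present.
function outgoingHeaders(incoming: HeaderMap): HeaderMap {
  const traceId = incoming[TRACE_HEADER] ?? newTraceId();
  return { [TRACE_HEADER]: traceId };
}

// Gateway -> Order service: the request already carries a trace ID.
const orderSvcHeaders = outgoingHeaders({ "x-trace-id": "abc" });
// Order service -> User service: the same ID is forwarded.
const userSvcHeaders = outgoingHeaders(orderSvcHeaders);
// A request arriving without a trace ID gets a fresh one at the edge.
const edgeHeaders = outgoingHeaders({});
console.log(orderSvcHeaders, userSvcHeaders, edgeHeaders);
```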
### Health Checks
```typescript
// Liveness: Is the service running?
app.get("/health/live", (req, res) => {
  res.status(200).json({ status: "alive" });
});

// Readiness: Is the service ready to handle traffic?
app.get("/health/ready", async (req, res) => {
  const dbHealthy = await checkDatabase();
  const cacheHealthy = await checkCache();

  if (dbHealthy && cacheHealthy) {
    res.status(200).json({ status: "ready" });
  } else {
    res.status(503).json({
      status: "not ready",
      checks: { database: dbHealthy, cache: cacheHealthy },
    });
  }
});
```
---
## Deployment
### Containerization
```dockerfile
# Dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY dist ./dist
USER node
EXPOSE 3000
CMD ["node", "dist/main.js"]
```
### Kubernetes Basics
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
        - name: order-service
          image: order-service:1.0.0
          ports:
            - containerPort: 3000
          resources:
            limits:
              cpu: "500m"
              memory: "256Mi"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 3000
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 3000
```
---
## Testing Strategies
### Test Pyramid for Microservices
```
            /\
           /  \            E2E Tests
          /────\           (Few, slow, brittle)
         /      \
        /────────\         Contract Tests
       /          \        (Service boundaries)
      /────────────\
     /              \      Integration Tests
    /────────────────\     (With dependencies)
   /                  \
  /────────────────────\   Unit Tests
 /                      \  (Many, fast, isolated)
/________________________\
```
### Contract Testing
```typescript
// Consumer test (Order Service)
// Assumes `provider` is an already configured Pact mock provider (setup not shown).
import { Matchers } from "@pact-foundation/pact";
const { like } = Matchers;

describe("User Service Contract", () => {
  it("returns user by ID", async () => {
    // Define expected interaction
    await provider.addInteraction({
      state: "user 123 exists",
      uponReceiving: "a request for user 123",
      withRequest: {
        method: "GET",
        path: "/users/123",
      },
      willRespondWith: {
        status: 200,
        body: {
          id: "123",
          name: like("John"),
          email: like("john@example.com"),
        },
      },
    });

    // Call the consumer's client code against the mock provider here.
    // Test passes if consumer expectations match
  });
});
```
---
## Best Practices
### DO:
- Start with a monolith, extract services later
- Define clear service boundaries
- Use asynchronous communication where possible
- Implement circuit breakers
- Centralize logging and monitoring
- Automate everything
- Design for failure
- Version your APIs
### DON'T:
- Create too many, too small services
- Share databases between services
- Make synchronous chains too deep
- Ignore distributed system complexities
- Couple services through shared libraries
- Skip contract testing
- Deploy without monitoring
---
## Migration Checklist
### Monolith to Microservices
- [ ] Map domain boundaries clearly
- [ ] Identify candidate services (start small)
- [ ] Establish CI/CD for new services
- [ ] Implement API gateway
- [ ] Set up service discovery
- [ ] Add distributed tracing
- [ ] Implement circuit breakers
- [ ] Extract first service (strangler fig pattern)
- [ ] Test extensively
- [ ] Monitor and iterate