canary-deploy-patterns
Traffic splitting, health checks, automated rollback, progressive delivery, and canary analysis for safe deployments.
Best use case
canary-deploy-patterns is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using canary-deploy-patterns should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/canary-deploy-patterns/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
SKILL.md Source
# Canary Deploy Patterns
Progressive delivery patterns for safe, automated production deployments.
## Traffic Splitting Strategy
```yaml
# Istio VirtualService: gradual traffic shift
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-canary
spec:
  hosts:
    - api.example.com
  http:
    - route:
        - destination:
            host: api-stable
            port:
              number: 80
          weight: 95 # 95% to stable version
        - destination:
            host: api-canary
            port:
              number: 80
          weight: 5 # 5% to canary version
---
# Progressive rollout schedule
# Step 1: 5% canary, observe 10 minutes
# Step 2: 25% canary, observe 10 minutes
# Step 3: 50% canary, observe 10 minutes
# Step 4: 75% canary, observe 10 minutes
# Step 5: 100% canary → promote to stable
```
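The rollout schedule in the comments above can also be expressed as data, which makes it easy to validate before it is applied. The following is a minimal sketch; `RolloutStep` and `buildSchedule` are hypothetical names for illustration and are not part of Istio or Argo Rollouts.

```typescript
// Hypothetical types for illustration; not an Istio or Argo Rollouts API.
interface RolloutStep {
  canaryWeight: number   // percent of traffic sent to the canary
  stableWeight: number   // percent of traffic sent to stable
  observeMinutes: number // how long to watch metrics before the next step
}

// Build a schedule from a list of canary weights, rejecting invalid input early.
function buildSchedule(canaryWeights: number[], observeMinutes: number): RolloutStep[] {
  return canaryWeights.map((weight, i) => {
    if (!Number.isInteger(weight) || weight < 0 || weight > 100) {
      throw new RangeError(`weight must be an integer in 0..100, got ${weight}`)
    }
    const previous = canaryWeights[i - 1]
    if (previous !== undefined && weight <= previous) {
      throw new RangeError('canary weights must strictly increase')
    }
    return {
      canaryWeight: weight,
      stableWeight: 100 - weight,
      // The final 100% step is the promotion, so no observation window follows it.
      observeMinutes: weight === 100 ? 0 : observeMinutes,
    }
  })
}

const schedule = buildSchedule([5, 25, 50, 75, 100], 10)
const totalObserveMinutes = schedule.reduce((sum, s) => sum + s.observeMinutes, 0)

console.log(schedule.map(s => `${s.stableWeight}/${s.canaryWeight}`).join(' -> '))
console.log(`minimum rollout time: ${totalObserveMinutes} minutes`)
```

With four observed steps of 10 minutes each, the schedule above takes at least 40 minutes before full promotion, which is worth knowing when choosing a pipeline timeout.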
## Argo Rollouts Canary
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api-server
spec:
  replicas: 10
  # selector and pod template omitted for brevity
  # Revisions inside this window can be rolled back to quickly, skipping steps and analysis.
  # A failed analysis already aborts the rollout and returns traffic to stable.
  rollbackWindow:
    revisions: 2
  strategy:
    canary:
      canaryService: api-canary-svc
      stableService: api-stable-svc
      trafficRouting:
        istio:
          virtualService:
            name: api-vsvc
      steps:
        # Step 1: 5% traffic to canary
        - setWeight: 5
        - pause: { duration: 10m }
        # Step 2: Run analysis (automated health check)
        - analysis:
            templates:
              - templateName: canary-success-rate
            args:
              - name: service-name
                value: api-canary-svc
        # Step 3: Increase to 25%
        - setWeight: 25
        - pause: { duration: 10m }
        # Step 4: Another analysis gate
        # (canary-latency and canary-error-rate templates are not shown here)
        - analysis:
            templates:
              - templateName: canary-success-rate
              - templateName: canary-latency
            args:
              - name: service-name
                value: api-canary-svc
        # Step 5: Increase to 50%
        - setWeight: 50
        - pause: { duration: 15m }
        # Step 6: Final analysis before full promotion
        - analysis:
            templates:
              - templateName: canary-success-rate
              - templateName: canary-latency
              - templateName: canary-error-rate
            args:
              - name: service-name
                value: api-canary-svc
        # Step 7: Full rollout
        - setWeight: 100
---
# Analysis template: success rate must stay above 99%
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: canary-success-rate
spec:
  # Arguments must be declared before they can be referenced as {{args.<name>}}
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 60s
      count: 5
      successCondition: result[0] >= 0.99
      failureLimit: 2
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{
              service="{{args.service-name}}",
              status=~"2.."
            }[2m]))
            /
            sum(rate(http_requests_total{
              service="{{args.service-name}}"
            }[2m]))
```
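To reason about how `count`, `failureLimit`, and `successCondition` interact in the template above, it can help to model the gate in a few lines. This is a simplified sketch and not Argo's implementation: it ignores inconclusive and errored measurements, and `MetricSpec` and `evaluateMetric` are hypothetical names.

```typescript
type AnalysisOutcome = 'Successful' | 'Failed' | 'Running'

// Hypothetical mirror of the template fields used above.
interface MetricSpec {
  count: number                                  // measurements to take in total
  failureLimit: number                           // failed measurements tolerated
  successCondition: (value: number) => boolean   // e.g. result[0] >= 0.99
}

// Simplified model: the metric fails as soon as failures exceed failureLimit,
// and succeeds once `count` measurements are in without crossing that limit.
function evaluateMetric(spec: MetricSpec, measurements: number[]): AnalysisOutcome {
  let failures = 0
  let taken = 0
  for (const value of measurements.slice(0, spec.count)) {
    taken++
    if (!spec.successCondition(value)) failures++
    if (failures > spec.failureLimit) return 'Failed'
  }
  return taken >= spec.count ? 'Successful' : 'Running'
}

const successRate: MetricSpec = {
  count: 5,
  failureLimit: 2,
  successCondition: value => value >= 0.99,
}

// Two dips below 99% are tolerated; a third one fails the gate.
console.log(evaluateMetric(successRate, [0.995, 0.98, 0.999, 0.97, 0.996]))
console.log(evaluateMetric(successRate, [0.98, 0.97, 0.96]))
console.log(evaluateMetric(successRate, [0.995, 0.999]))
```

Under this model, `failureLimit: 2` with `count: 5` tolerates two bad samples out of five, so a brief dip does not abort the rollout but a sustained one does.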
## Health Check Design
```typescript
// Multi-level health checks for canary validation.
// Assumes `db` (Prisma-style client) and `redis` (client exposing ping()) are created elsewhere.
interface HealthCheckResult {
  status: 'healthy' | 'degraded' | 'unhealthy'
  checks: Record<string, {
    status: 'pass' | 'fail'
    latencyMs: number
    message?: string
  }>
  version: string
  uptime: number
}

// Assumption: only the database is critical; other failures degrade the service.
// Without a distinction like this, 'degraded' could never be returned.
const CRITICAL_CHECKS = ['database']

async function deepHealthCheck(): Promise<HealthCheckResult> {
  const checks: HealthCheckResult['checks'] = {}

  // Database connectivity
  const dbStart = Date.now()
  try {
    await db.$queryRaw`SELECT 1`
    checks.database = { status: 'pass', latencyMs: Date.now() - dbStart }
  } catch (err) {
    checks.database = {
      status: 'fail',
      latencyMs: Date.now() - dbStart,
      message: (err as Error).message
    }
  }

  // Redis connectivity
  const redisStart = Date.now()
  try {
    await redis.ping()
    checks.redis = { status: 'pass', latencyMs: Date.now() - redisStart }
  } catch (err) {
    checks.redis = {
      status: 'fail',
      latencyMs: Date.now() - redisStart,
      message: (err as Error).message
    }
  }

  // Downstream service (3-second timeout so a hung dependency cannot stall the probe)
  const apiStart = Date.now()
  try {
    const res = await fetch('http://payment-service/health', { signal: AbortSignal.timeout(3000) })
    checks.paymentService = {
      status: res.ok ? 'pass' : 'fail',
      latencyMs: Date.now() - apiStart,
    }
  } catch (err) {
    checks.paymentService = {
      status: 'fail',
      latencyMs: Date.now() - apiStart,
      message: (err as Error).message
    }
  }

  const failed = Object.keys(checks).filter(name => checks[name].status === 'fail')
  const criticalFailure = failed.some(name => CRITICAL_CHECKS.includes(name))

  return {
    status: failed.length === 0 ? 'healthy' : criticalFailure ? 'unhealthy' : 'degraded',
    checks,
    version: process.env.APP_VERSION ?? 'unknown',
    uptime: process.uptime(),
  }
}
```
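The three try/catch blocks above share one shape: time a probe, catch its error, record the latency. A small helper can capture that shape and add a timeout to probes that lack one. This is a sketch under assumptions: `runCheck` and `httpStatusFor` are hypothetical helpers, and mapping `degraded` to HTTP 200 is one possible policy rather than a rule.

```typescript
interface CheckResult {
  status: 'pass' | 'fail'
  latencyMs: number
  message?: string
}

// Run one probe with a timeout and convert the outcome into a CheckResult.
async function runCheck(probe: () => Promise<unknown>, timeoutMs: number): Promise<CheckResult> {
  const start = Date.now()
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs}ms`)), timeoutMs)
  })
  try {
    await Promise.race([probe(), timeout])
    return { status: 'pass', latencyMs: Date.now() - start }
  } catch (err) {
    return {
      status: 'fail',
      latencyMs: Date.now() - start,
      message: err instanceof Error ? err.message : String(err),
    }
  } finally {
    clearTimeout(timer) // avoid a stray timer once the probe has settled
  }
}

// Assumed policy: only 'unhealthy' fails the probe; 'degraded' still serves traffic.
function httpStatusFor(status: 'healthy' | 'degraded' | 'unhealthy'): number {
  return status === 'unhealthy' ? 503 : 200
}

async function demo(): Promise<void> {
  const fast = await runCheck(async () => 'PONG', 100)
  const slow = await runCheck(() => new Promise(resolve => setTimeout(resolve, 200)), 20)
  console.log(fast.status, slow.status, slow.message)
  console.log(httpStatusFor('degraded'), httpStatusFor('unhealthy'))
}
void demo()
```

Returning 503 only for `unhealthy` matters because Kubernetes HTTP probes treat any status from 200 to 399 as success, so the status code is what actually takes a canary pod out of rotation.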
## Automated Rollback
```typescript
// Canary controller: monitor metrics and auto-rollback
interface CanaryConfig {
  maxErrorRate: number         // e.g., 0.02 (2%)
  maxP95LatencyMs: number      // e.g., 500
  minSuccessRate: number       // e.g., 0.99
  evaluationIntervalMs: number // e.g., 60000 (1 minute)
  warmupPeriodMs: number       // e.g., 120000 (2 minutes, ignore initial spike)
}

// Minimal client contracts assumed by this sketch; adapt them to your metrics and deploy tooling.
interface MetricsClient {
  getErrorRate(target: string, window: string): Promise<number>
  getP95Latency(target: string, window: string): Promise<number>
  getSuccessRate(target: string, window: string): Promise<number>
}

interface DeployClient {
  rollback(): Promise<void>
}

class CanaryController {
  private startTime: number = Date.now()

  constructor(
    private config: CanaryConfig,
    private metrics: MetricsClient,
    private deployer: DeployClient,
  ) {}

  // Note: this method only gates the rollout. It never returns 'promote';
  // promotion is left to the caller or the rollout schedule.
  async evaluate(): Promise<'continue' | 'promote' | 'rollback'> {
    // Skip evaluation during warmup
    if (Date.now() - this.startTime < this.config.warmupPeriodMs) {
      return 'continue'
    }

    const [errorRate, p95Latency, successRate] = await Promise.all([
      this.metrics.getErrorRate('canary', '5m'),
      this.metrics.getP95Latency('canary', '5m'),
      this.metrics.getSuccessRate('canary', '5m'),
    ])

    // Automatic rollback conditions
    if (errorRate > this.config.maxErrorRate) {
      console.error(`Canary rollback: error rate ${errorRate} > ${this.config.maxErrorRate}`)
      await this.deployer.rollback()
      return 'rollback'
    }
    if (p95Latency > this.config.maxP95LatencyMs) {
      console.error(`Canary rollback: p95 latency ${p95Latency}ms > ${this.config.maxP95LatencyMs}ms`)
      await this.deployer.rollback()
      return 'rollback'
    }
    if (successRate < this.config.minSuccessRate) {
      console.error(`Canary rollback: success rate ${successRate} < ${this.config.minSuccessRate}`)
      await this.deployer.rollback()
      return 'rollback'
    }

    return 'continue'
  }
}
```
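The controller above evaluates a single point in time, so something has to call it on an interval and decide when the canary has been observed for long enough. One way to sketch that loop is shown below. `runCanaryLoop`, `observationMs`, and the injectable `now` and `sleep` are assumptions made for illustration; the loop takes any `evaluate` function with the same return type as `CanaryController.evaluate`.

```typescript
type Verdict = 'continue' | 'promote' | 'rollback'

interface LoopOptions {
  evaluate: () => Promise<Verdict>        // e.g. () => controller.evaluate()
  intervalMs: number                      // e.g. config.evaluationIntervalMs
  observationMs: number                   // total time to observe; should exceed the warmup period
  now?: () => number                      // injectable clock, for tests
  sleep?: (ms: number) => Promise<void>   // injectable delay, for tests
}

// Poll until the canary is rolled back or survives the whole observation window.
async function runCanaryLoop(opts: LoopOptions): Promise<'promote' | 'rollback'> {
  const now = opts.now ?? Date.now
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms)))
  const deadline = now() + opts.observationMs

  while (true) {
    const verdict = await opts.evaluate()
    if (verdict !== 'continue') return verdict  // explicit promote or rollback wins
    if (now() >= deadline) return 'promote'     // healthy for the full window
    await sleep(opts.intervalMs)
  }
}

// Demo with a fake clock so the example finishes immediately.
async function demoLoop(): Promise<void> {
  let fakeTime = 0
  const result = await runCanaryLoop({
    evaluate: async () => 'continue',
    intervalMs: 60_000,
    observationMs: 600_000,
    now: () => fakeTime,
    sleep: async ms => { fakeTime += ms },
  })
  console.log(`canary result: ${result}`)
}
void demoLoop()
```

Injecting the clock and the sleep function keeps the loop testable without waiting ten real minutes, and it keeps the promotion rule in one place instead of spreading it across the pipeline.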
## CI/CD Integration
```yaml
# GitHub Actions: canary deploy pipeline
# Assumes registry login, cluster credentials, and the kubectl argo rollouts plugin
# are set up in earlier steps or on the runner.
name: Canary Deploy
on:
  push:
    branches: [main]
jobs:
  deploy-canary:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push image
        run: |
          # Tag with the registry prefix so the pushed name matches the built image
          docker build -t myregistry/myapp:${{ github.sha }} .
          docker push myregistry/myapp:${{ github.sha }}
      - name: Deploy canary (5%)
        run: |
          kubectl argo rollouts set image api-server \
            api=myregistry/myapp:${{ github.sha }}
      - name: Wait for canary analysis
        run: |
          kubectl argo rollouts status api-server \
            --watch \
            --timeout 30m
      - name: Promote canary
        if: success()
        run: |
          kubectl argo rollouts promote api-server
      - name: Rollback on failure
        if: failure()
        run: |
          kubectl argo rollouts abort api-server
          kubectl argo rollouts undo api-server
      - name: Notify on rollback
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "Canary deploy ROLLED BACK for ${{ github.sha }}"
            }
        env:
          # Assumed secret name; v1 of the action reads the webhook from the environment
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
```
## Deployment Comparison Table
```
Strategy | Risk | Speed | Complexity | Use When
----------------|---------|---------|------------|---------------------------
Rolling Update | Medium | Fast | Low | Non-critical services
Blue/Green | Low | Instant | Medium | Stateless services, instant rollback needed
Canary | Low | Slow | High | Critical services, need metric validation
Shadow/Dark | None | N/A | High | Testing with production traffic (no user impact)
Feature Flag | Low | Instant | Medium | Decoupling deploy from release
```
## Checklist
- [ ] Canary starts at 5% or less of total traffic
- [ ] Minimum 10 minutes observation per traffic increase step
- [ ] Automated analysis gates between each step (error rate, latency, success rate)
- [ ] Warmup period (2-5 min) before first evaluation (ignore cold-start metrics)
- [ ] Auto-rollback on metric threshold breach (no manual approval needed)
- [ ] Health checks include downstream dependencies (DB, cache, services)
- [ ] Rollback completes in under 60 seconds
- [ ] Slack/PagerDuty notification on rollback
- [ ] Canary uses same production database and config (not staging)
- [ ] Compare canary metrics against stable baseline (not absolute thresholds)
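
The last checklist item, comparing the canary against the stable baseline, can be sketched as a pure function that takes both snapshots. `Snapshot`, `Tolerance`, and `compareToBaseline` are hypothetical names, and the tolerance values are illustrative assumptions, not recommendations.

```typescript
interface Snapshot {
  errorRate: number     // fraction of failed requests, 0..1
  p95LatencyMs: number
}

interface Tolerance {
  maxErrorRateDelta: number // absolute increase allowed over stable, e.g. 0.005
  maxLatencyRatio: number   // relative increase allowed over stable, e.g. 1.2 = +20%
}

// Return a list of violations; an empty list means the canary is within tolerance.
function compareToBaseline(canary: Snapshot, stable: Snapshot, tol: Tolerance): string[] {
  const violations: string[] = []

  const errorDelta = canary.errorRate - stable.errorRate
  if (errorDelta > tol.maxErrorRateDelta) {
    violations.push(`error rate is ${(errorDelta * 100).toFixed(2)} points above stable`)
  }

  const latencyLimit = stable.p95LatencyMs * tol.maxLatencyRatio
  if (canary.p95LatencyMs > latencyLimit) {
    violations.push(`p95 latency ${canary.p95LatencyMs}ms exceeds ${latencyLimit.toFixed(0)}ms`)
  }

  return violations
}

const tolerance: Tolerance = { maxErrorRateDelta: 0.005, maxLatencyRatio: 1.2 }
const stable: Snapshot = { errorRate: 0.01, p95LatencyMs: 400 }

console.log(compareToBaseline({ errorRate: 0.012, p95LatencyMs: 450 }, stable, tolerance))
console.log(compareToBaseline({ errorRate: 0.03, p95LatencyMs: 600 }, stable, tolerance))
```

A relative check like this avoids rolling back a healthy canary when a shared dependency degrades both versions at once, and it still catches a canary that is merely worse than stable while staying under an absolute threshold.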
## Anti-Patterns
- Canary without automated analysis: manual watching is error-prone and slow
- Too fast promotion: 1-minute windows miss slow-burn issues (memory leaks)
- Only checking error rate: latency degradation goes undetected
- Canary on different infrastructure than production: results not representative
- No warmup period: JIT compilation and cache cold-start cause false alarms
- Rollback requires manual approval: defeats the purpose of automated safety