istio-traffic-management
Comprehensive guide to Istio traffic management for production service mesh deployments.
About this skill
This skill gives AI agents detailed, actionable guidance for configuring and managing Istio in production service mesh environments. It covers core traffic management concepts, best practices, and verification steps for reliable microservice communication. Key areas include advanced routing strategies such as canary deployments, blue/green deployments, and A/B testing, along with fine-grained control over service-to-service routing within the mesh. The agent is guided to clarify goals, apply best practices, and give verifiable steps for each Istio configuration.
Best use case
- Configure service-to-service routing in Istio
- Implement advanced deployment strategies such as canary releases, blue/green deployments, or A/B testing
- Troubleshoot Istio traffic flow issues
- Understand and apply Istio best practices for production service mesh environments
- Generate Istio configuration examples for specific traffic management scenarios
Users will receive actionable steps, relevant best practices, and verification guidance for Istio traffic management configurations. The agent can provide detailed examples or an implementation playbook when required, leading to a well-configured and validated Istio service mesh that meets specified routing and deployment goals.
Practical example
Example input
How do I set up a canary deployment for my 'my-microservice' using Istio, shifting 10% of traffic to the new 'v2' version, and what metrics should I monitor?
Example output
To set up a canary deployment for `my-microservice` in Istio, shifting 10% of traffic to version `v2`, follow these steps:
**1. Ensure your Service and Deployments are ready:**
* Confirm that two Deployments for `my-microservice` are running, one per version, with pod labels that distinguish them (e.g., `app: my-microservice` plus `version: v1` or `version: v2`).
* Verify that your Kubernetes Service `my-microservice` selects pods from both versions (typically via the selector `app: my-microservice`, with no `version` label in the selector).
**2. Create a DestinationRule:** Define subsets for your service versions based on labels.
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-microservice
  namespace: default
spec:
  host: my-microservice
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
```
**3. Create a VirtualService:** Define how traffic is routed to `my-microservice`, splitting 90% to `v1` and 10% to `v2`.
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-microservice
  namespace: default
spec:
  hosts:
  - my-microservice
  http:
  - route:
    - destination:
        host: my-microservice
        subset: v1
      weight: 90
    - destination:
        host: my-microservice
        subset: v2
      weight: 10
```
**4. Apply these Configurations:** Use `kubectl apply -f <your-manifests.yaml>` to deploy the `DestinationRule` and `VirtualService` to your cluster.
**Verification & Monitoring:**
* **Traffic Monitoring:** Use Istio's built-in telemetry (e.g., Kiali, Prometheus/Grafana) to observe the traffic split and confirm that roughly 10% of requests reach `v2`; weighted routing is probabilistic, so expect some deviation at low request volumes.
* **Error Rates:** Monitor HTTP 5xx errors for both `v1` and `v2`. A higher 5xx rate on `v2` than on `v1` points to a regression in the new version.
* **Latency:** Compare request latency between `v1` and `v2`. Higher latency in `v2` might signal performance regressions.
* **Resource Utilization:** Keep an eye on CPU, memory, and network I/O for `v2` pods.
* **Application-Specific Metrics:** Monitor business-critical metrics relevant to `my-microservice` (e.g., user sign-ups, transaction completion rates) for any degradation caused by `v2`.
Gradually increase the `weight` for `v2` in the `VirtualService` as you gain confidence in its stability and performance.
When to use this skill
- When configuring service-to-service routing within an Istio service mesh
- When planning or implementing canary deployments, blue/green deployments, or other advanced traffic-shifting strategies
- When seeking best practices and actionable steps for managing Istio traffic in a production environment
- When you need to understand or apply Istio's advanced traffic management features
When not to use this skill
- The task is unrelated to Istio traffic management
- You need assistance with a different domain or tool outside the scope of Istio (e.g., Kubernetes networking without Istio, cloud load balancing, general networking concepts)
- The task requires direct modification of live production systems without explicit approval or human oversight
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/istio-traffic-management/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How istio-traffic-management Compares
| Feature / Agent | istio-traffic-management | Standard Approach |
|---|---|---|
| Platform Support | Claude | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | medium | N/A |
Frequently Asked Questions
What does this skill do?
Comprehensive guide to Istio traffic management for production service mesh deployments.
Which AI agents support this skill?
This skill is designed for Claude.
How difficult is it to install?
The installation complexity is rated as medium. You can find the installation instructions above.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Istio Traffic Management
Comprehensive guide to Istio traffic management for production service mesh deployments.
## Do not use this skill when
- The task is unrelated to Istio traffic management
- You need a different domain or tool outside this scope
## Instructions
- Clarify goals, constraints, and required inputs.
- Apply relevant best practices and validate outcomes.
- Provide actionable steps and verification.
- If detailed examples are required, open `resources/implementation-playbook.md`.
## Use this skill when
- Configuring service-to-service routing
- Implementing canary or blue-green deployments
- Setting up circuit breakers and retries
- Load balancing configuration
- Traffic mirroring for testing
- Fault injection for chaos engineering
## Core Concepts
### 1. Traffic Management Resources
| Resource | Purpose | Scope |
|----------|---------|-------|
| **VirtualService** | Route traffic to destinations | Host-based |
| **DestinationRule** | Define policies after routing | Service-based |
| **Gateway** | Configure ingress/egress | Cluster edge |
| **ServiceEntry** | Add external services | Mesh-wide |
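**ServiceEntry** is the only resource in this table without a template below. A minimal sketch, assuming a hypothetical external HTTPS API at `api.external-example.com`, adds that host to the mesh's service registry so in-mesh workloads can reach it (which matters in particular when the mesh's outbound traffic policy is `REGISTRY_ONLY`) and so VirtualServices and DestinationRules can target it:

```yaml
# Hypothetical external host; replace with the API your workloads actually call.
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-api
spec:
  hosts:
  - api.external-example.com
  location: MESH_EXTERNAL   # the service runs outside the mesh
  ports:
  - number: 443
    name: https
    protocol: HTTPS         # TLS is originated by the application itself
  resolution: DNS           # resolve the host via DNS at connection time
```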
### 2. Traffic Flow
```
Client → Gateway → VirtualService → DestinationRule → Service
                     (routing)        (policies)      (pods)
```
## Templates
### Template 1: Basic Routing
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-route
  namespace: bookinfo
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-destination
  namespace: bookinfo
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3
```
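In Template 1 the first matching rule wins, so `jason` goes to `v2` and everyone else falls through to `v1`. When a rule needs several conditions, their placement decides the logic: conditions inside one `match` entry are ANDed, while separate entries in the `match` list are ORed. A sketch reusing the `reviews` subsets above; the `/beta` and `/preview` prefixes and the `x-beta-tester` header are hypothetical. It is meant to replace `reviews-route`, not run alongside it, since two VirtualServices for the same mesh host conflict:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-match-logic
  namespace: bookinfo
spec:
  hosts:
  - reviews
  http:
  - match:
    # One entry with two conditions: BOTH must hold (AND).
    - uri:
        prefix: /beta
      headers:
        x-beta-tester:          # hypothetical header
          exact: "true"
    route:
    - destination:
        host: reviews
        subset: v3
  - match:
    # Two separate entries: EITHER may hold (OR).
    - headers:
        end-user:
          exact: jason
    - uri:
        prefix: /preview        # hypothetical path
    route:
    - destination:
        host: reviews
        subset: v2
  - route:                      # default rule with no match: all other traffic
    - destination:
        host: reviews
        subset: v1
```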
### Template 2: Canary Deployment
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service-canary
spec:
  hosts:
  - my-service
  http:
  - route:
    - destination:
        host: my-service
        subset: stable
      weight: 90
    - destination:
        host: my-service
        subset: canary
      weight: 10
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-service-dr
spec:
  host: my-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: UPGRADE
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
  subsets:
  - name: stable
    labels:
      version: stable
  - name: canary
    labels:
      version: canary
```
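A common refinement of Template 2, sketched here under the assumption of a hypothetical `x-canary` request header set by internal testers, pins those requests to the canary while everyone else keeps the 90/10 split. HTTP rules are evaluated in order, so the header rule must come first. It reuses the name `my-service-canary`, so applying it updates Template 2's VirtualService in place:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service-canary
spec:
  hosts:
  - my-service
  http:
  - match:                      # evaluated first: opted-in testers
    - headers:
        x-canary:               # hypothetical opt-in header
          exact: "true"
    route:
    - destination:
        host: my-service
        subset: canary
  - route:                      # everyone else: weighted split
    - destination:
        host: my-service
        subset: stable
      weight: 90
    - destination:
        host: my-service
        subset: canary
      weight: 10
```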
### Template 3: Circuit Breaker
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: circuit-breaker
spec:
  host: my-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
        maxRequestsPerConnection: 10
        maxRetries: 3
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
      minHealthPercent: 30
```
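Template 3 applies one policy to every pod behind `my-service`. A `trafficPolicy` can also be set per subset, where its settings override the service-level ones for that subset only (and, per the DestinationRule reference, only take effect when a route sends traffic to that subset). Templates 2 and 3 both define a DestinationRule for `my-service`; keeping a single DestinationRule per host avoids ambiguity, so this sketch folds the `stable`/`canary` subsets from Template 2 into the circuit-breaker rule and ejects failing canary pods sooner. The thresholds are illustrative assumptions, not recommendations:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: circuit-breaker
spec:
  host: my-service
  trafficPolicy:                  # service-level default for all subsets
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
  subsets:
  - name: stable
    labels:
      version: stable
  - name: canary
    labels:
      version: canary
    trafficPolicy:                # overrides outlierDetection for canary pods only
      outlierDetection:
        consecutive5xxErrors: 2   # react to canary errors faster
        interval: 10s
        baseEjectionTime: 60s
        maxEjectionPercent: 50    # never eject every canary pod at once
```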
### Template 4: Retry and Timeout
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings-retry
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
    timeout: 10s
    retries:
      attempts: 3
      perTryTimeout: 3s
      retryOn: connect-failure,refused-stream,unavailable,cancelled,retriable-4xx,503
      retryRemoteLocalities: true
```
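When tuning Template 4, it helps to check that the retries fit inside the overall `timeout`. Assuming, as the VirtualService reference describes, that `attempts` counts retries after the initial try, the worst case is `(1 + attempts) × perTryTimeout`: with the values above that is 4 × 3s = 12s, so the 10s route timeout cuts the last retry short. A sketch of one way to keep the budget consistent (numbers are illustrative; it reuses the name `ratings-retry` and would replace Template 4):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings-retry
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
    # Worst case: (1 initial try + 2 retries) x 3s = 9s, plus the proxy's
    # automatic retry backoff (25ms+ between tries), which stays under 10s.
    timeout: 10s
    retries:
      attempts: 2
      perTryTimeout: 3s
      retryOn: connect-failure,refused-stream,unavailable,cancelled,503
```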
### Template 5: Traffic Mirroring
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: mirror-traffic
spec:
  hosts:
  - my-service
  http:
  - route:
    - destination:
        host: my-service
        subset: v1
    mirror:
      host: my-service
      subset: v2
    mirrorPercentage:
      value: 100.0
```
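Mirrored requests are fire-and-forget: the proxy discards the mirror's responses, but the mirrored workload still executes every request it receives, including any writes. A sketch that copies only 10% of traffic to a separate, hypothetical `my-service-shadow` Service (assumed to be isolated from production data stores) rather than to a subset of the live service:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: mirror-traffic
spec:
  hosts:
  - my-service
  http:
  - route:
    - destination:
        host: my-service
        subset: v1              # callers only ever see v1's responses
    mirror:
      host: my-service-shadow   # hypothetical isolated test deployment
    mirrorPercentage:
      value: 10.0               # copy roughly 1 in 10 requests
```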
### Template 6: Fault Injection
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: fault-injection
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        percentage:
          value: 10
        fixedDelay: 5s
      abort:
        percentage:
          value: 5
        httpStatus: 503
    route:
    - destination:
        host: ratings
```
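Template 6 injects faults into every caller of `ratings`. To confine a chaos experiment, a sketch that only delays requests carrying a hypothetical `x-chaos-test: "true"` header and leaves all other traffic untouched (it reuses the name `fault-injection` and would replace Template 6):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: fault-injection
spec:
  hosts:
  - ratings
  http:
  - match:
    - headers:
        x-chaos-test:           # hypothetical opt-in header for test clients
          exact: "true"
    fault:
      delay:
        percentage:
          value: 100.0          # delay every matching request
        fixedDelay: 5s
    route:
    - destination:
        host: ratings
  - route:                      # all other traffic: no fault injected
    - destination:
        host: ratings
```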
### Template 7: Ingress Gateway
```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: my-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: my-tls-secret
    hosts:
    - "*.example.com"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-vs
spec:
  hosts:
  - "api.example.com"
  gateways:
  - my-gateway
  http:
  - match:
    - uri:
        prefix: /api/v1
    route:
    - destination:
        host: api-service
        port:
          number: 8080
```
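Template 7 only listens on port 443. A common companion, sketched here, adds a plain-HTTP server on port 80 whose `tls.httpsRedirect` makes the gateway answer HTTP requests with a 301 redirect to HTTPS. It reuses the names from Template 7 and would replace `my-gateway`:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: my-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    tls:
      httpsRedirect: true       # reply with a 301 redirect to the https:// URL
    hosts:
    - "*.example.com"
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: my-tls-secret
    hosts:
    - "*.example.com"
```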
## Load Balancing Strategies
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: load-balancing
spec:
  host: my-service
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN  # or LEAST_REQUEST, RANDOM, PASSTHROUGH (LEAST_CONN is deprecated in favor of LEAST_REQUEST)
---
# Consistent hashing for sticky sessions
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: sticky-sessions
spec:
  host: my-service
  trafficPolicy:
    loadBalancer:
      consistentHash:
        httpHeaderName: x-user-id
        # or: httpCookie, useSourceIp, httpQueryParameterName
```
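The `httpCookie` option listed in the comment above can also issue the cookie itself: when a request arrives without it and a `ttl` is set, the proxy generates one. A sketch with `session-affinity` as a hypothetical cookie name (it reuses the name `sticky-sessions` and would replace the header-based rule). Affinity from consistent hashing is best-effort: adding or removing endpoints remaps some sessions.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: sticky-sessions
spec:
  host: my-service
  trafficPolicy:
    loadBalancer:
      consistentHash:
        httpCookie:
          name: session-affinity   # hypothetical cookie name
          path: /
          ttl: 3600s               # proxy issues the cookie if the client lacks one
```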
## Best Practices
### Do's
- **Start simple** - Add complexity incrementally
- **Use subsets** - Version your services clearly
- **Set timeouts** - Always configure reasonable timeouts
- **Enable retries** - But with backoff and limits
- **Monitor** - Use Kiali and Jaeger for visibility
### Don'ts
- **Don't over-retry** - Can cause cascading failures
- **Don't ignore outlier detection** - Enable circuit breakers
- **Don't mirror into systems with production side effects** - Send mirrored traffic to isolated test instances
- **Don't skip canary** - Test with small traffic percentage first
## Debugging Commands
```bash
# Analyze mesh configuration (VirtualServices, DestinationRules, etc.) for errors
istioctl analyze
# View the routes the sidecar has actually received
istioctl proxy-config routes deploy/my-app -o json
# Check which endpoints the sidecar has discovered
istioctl proxy-config endpoints deploy/my-app
# Raise the sidecar's Envoy log level to debug
istioctl proxy-config log deploy/my-app --level debug
```
## Resources
- [Istio Traffic Management](https://istio.io/latest/docs/concepts/traffic-management/)
- [Virtual Service Reference](https://istio.io/latest/docs/reference/config/networking/virtual-service/)
- [Destination Rule Reference](https://istio.io/latest/docs/reference/config/networking/destination-rule/)