istio-traffic-management

Comprehensive guide to Istio traffic management for production service mesh deployments.

31,392 stars
Complexity: medium

About this skill

This skill empowers AI agents to provide detailed, actionable guidance on configuring and managing Istio for production service mesh environments. It covers essential traffic management concepts, best practices, and verification steps for robust microservice communication. Key areas include advanced routing strategies like canary deployments, blue/green deployments, A/B testing, and fine-grained control over service-to-service routing within an Istio service mesh. The skill guides agents in clarifying goals, applying best practices, and providing verifiable, actionable steps to ensure successful Istio configurations.

Best use case

Configure service-to-service routing in Istio; Implement advanced deployment strategies such as canary releases, blue/green deployments, or A/B testing; Troubleshoot Istio traffic flow issues; Understand and apply Istio best practices for production service mesh environments; Generate Istio configuration examples for specific traffic management scenarios.

Comprehensive guide to Istio traffic management for production service mesh deployments.

Users will receive actionable steps, relevant best practices, and verification guidance for Istio traffic management configurations. The agent can provide detailed examples or an implementation playbook when required, leading to a well-configured and validated Istio service mesh that meets specified routing and deployment goals.

Practical example

Example input

How do I set up a canary deployment for my 'my-microservice' using Istio, shifting 10% of traffic to the new 'v2' version, and what metrics should I monitor?

Example output

To set up a canary deployment for `my-microservice` in Istio, shifting 10% of traffic to version `v2`, follow these steps:

**1. Ensure your Service and Deployments are ready:**
   *   Confirm you have `my-microservice` (v1) and `my-microservice` (v2) deployments running with appropriate labels (e.g., `app: my-microservice`, `version: v1` and `version: v2`).
   *   Verify your Kubernetes Service `my-microservice` targets both versions (typically selector `app: my-microservice`).

**2. Create a DestinationRule:** Define subsets for your service versions based on labels.
   ```yaml
   apiVersion: networking.istio.io/v1beta1
   kind: DestinationRule
   metadata:
     name: my-microservice
     namespace: default
   spec:
     host: my-microservice
     subsets:
       - name: v1
         labels:
           version: v1
       - name: v2
         labels:
           version: v2
   ```

**3. Create a VirtualService:** Define how traffic is routed to `my-microservice`, splitting 90% to `v1` and 10% to `v2`.
   ```yaml
   apiVersion: networking.istio.io/v1beta1
   kind: VirtualService
   metadata:
     name: my-microservice
     namespace: default
   spec:
     hosts:
       - my-microservice
     http:
       - route:
           - destination:
               host: my-microservice
               subset: v1
             weight: 90
           - destination:
               host: my-microservice
               subset: v2
             weight: 10
   ```

**4. Apply these Configurations:** Use `kubectl apply -f <your-manifests.yaml>` to deploy the `DestinationRule` and `VirtualService` to your cluster.

**Verification & Monitoring:**
*   **Traffic Monitoring:** Use Istio's built-in telemetry (e.g., Kiali, Prometheus/Grafana) to observe the traffic split and confirm 10% of requests are reaching `v2`.
*   **Error Rates:** Monitor HTTP 5xx errors for both `v1` and `v2`. An increase in `v2` errors indicates an issue.
*   **Latency:** Compare request latency between `v1` and `v2`. Higher latency in `v2` might signal performance regressions.
*   **Resource Utilization:** Keep an eye on CPU, memory, and network I/O for `v2` pods.
*   **Application-Specific Metrics:** Monitor business-critical metrics relevant to `my-microservice` (e.g., user sign-ups, transaction completion rates) for any degradation caused by `v2`.

Gradually increase the `weight` for `v2` in the `VirtualService` as you gain confidence in its stability and performance.

When to use this skill

  • When configuring service-to-service routing within an Istio service mesh; When planning or implementing canary deployments, blue/green deployments, or other advanced traffic shifting strategies; When seeking best practices and actionable steps for managing Istio traffic in a production environment; When you need to understand or apply Istio's advanced traffic management features.

When not to use this skill

  • The task is unrelated to Istio traffic management; You need assistance with a different domain or tool outside the scope of Istio (e.g., Kubernetes networking without Istio, cloud load balancing, general networking concepts); The task requires direct modification of live production systems without explicit approval or human oversight.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/istio-traffic-management/SKILL.md --create-dirs "https://raw.githubusercontent.com/sickn33/antigravity-awesome-skills/main/plugins/antigravity-awesome-skills-claude/skills/istio-traffic-management/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/istio-traffic-management/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How istio-traffic-management Compares

Feature / Agentistio-traffic-managementStandard Approach
Platform SupportClaudeLimited / Varies
Context Awareness High Baseline
Installation ComplexitymediumN/A

Frequently Asked Questions

What does this skill do?

Comprehensive guide to Istio traffic management for production service mesh deployments.

Which AI agents support this skill?

This skill is designed for Claude.

How difficult is it to install?

The installation complexity is rated as medium. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Istio Traffic Management

Comprehensive guide to Istio traffic management for production service mesh deployments.

## Do not use this skill when

- The task is unrelated to istio traffic management
- You need a different domain or tool outside this scope

## Instructions

- Clarify goals, constraints, and required inputs.
- Apply relevant best practices and validate outcomes.
- Provide actionable steps and verification.
- If detailed examples are required, open `resources/implementation-playbook.md`.

## Use this skill when

- Configuring service-to-service routing
- Implementing canary or blue-green deployments
- Setting up circuit breakers and retries
- Load balancing configuration
- Traffic mirroring for testing
- Fault injection for chaos engineering

## Core Concepts

### 1. Traffic Management Resources

| Resource | Purpose | Scope |
|----------|---------|-------|
| **VirtualService** | Route traffic to destinations | Host-based |
| **DestinationRule** | Define policies after routing | Service-based |
| **Gateway** | Configure ingress/egress | Cluster edge |
| **ServiceEntry** | Add external services | Mesh-wide |

### 2. Traffic Flow

```
Client → Gateway → VirtualService → DestinationRule → Service
                   (routing)        (policies)        (pods)
```

## Templates

### Template 1: Basic Routing

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-route
  namespace: bookinfo
spec:
  hosts:
    - reviews
  http:
    - match:
        - headers:
            end-user:
              exact: jason
      route:
        - destination:
            host: reviews
            subset: v2
    - route:
        - destination:
            host: reviews
            subset: v1
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-destination
  namespace: bookinfo
spec:
  host: reviews
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
    - name: v3
      labels:
        version: v3
```

### Template 2: Canary Deployment

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service-canary
spec:
  hosts:
    - my-service
  http:
    - route:
        - destination:
            host: my-service
            subset: stable
          weight: 90
        - destination:
            host: my-service
            subset: canary
          weight: 10
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-service-dr
spec:
  host: my-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: UPGRADE
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
  subsets:
    - name: stable
      labels:
        version: stable
    - name: canary
      labels:
        version: canary
```

### Template 3: Circuit Breaker

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: circuit-breaker
spec:
  host: my-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
        maxRequestsPerConnection: 10
        maxRetries: 3
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
      minHealthPercent: 30
```

### Template 4: Retry and Timeout

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings-retry
spec:
  hosts:
    - ratings
  http:
    - route:
        - destination:
            host: ratings
      timeout: 10s
      retries:
        attempts: 3
        perTryTimeout: 3s
        retryOn: connect-failure,refused-stream,unavailable,cancelled,retriable-4xx,503
        retryRemoteLocalities: true
```

### Template 5: Traffic Mirroring

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: mirror-traffic
spec:
  hosts:
    - my-service
  http:
    - route:
        - destination:
            host: my-service
            subset: v1
      mirror:
        host: my-service
        subset: v2
      mirrorPercentage:
        value: 100.0
```

### Template 6: Fault Injection

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: fault-injection
spec:
  hosts:
    - ratings
  http:
    - fault:
        delay:
          percentage:
            value: 10
          fixedDelay: 5s
        abort:
          percentage:
            value: 5
          httpStatus: 503
      route:
        - destination:
            host: ratings
```

### Template 7: Ingress Gateway

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: my-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: my-tls-secret
      hosts:
        - "*.example.com"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-vs
spec:
  hosts:
    - "api.example.com"
  gateways:
    - my-gateway
  http:
    - match:
        - uri:
            prefix: /api/v1
      route:
        - destination:
            host: api-service
            port:
              number: 8080
```

## Load Balancing Strategies

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: load-balancing
spec:
  host: my-service
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN  # or LEAST_CONN, RANDOM, PASSTHROUGH
---
# Consistent hashing for sticky sessions
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: sticky-sessions
spec:
  host: my-service
  trafficPolicy:
    loadBalancer:
      consistentHash:
        httpHeaderName: x-user-id
        # or: httpCookie, useSourceIp, httpQueryParameterName
```

## Best Practices

### Do's
- **Start simple** - Add complexity incrementally
- **Use subsets** - Version your services clearly
- **Set timeouts** - Always configure reasonable timeouts
- **Enable retries** - But with backoff and limits
- **Monitor** - Use Kiali and Jaeger for visibility

### Don'ts
- **Don't over-retry** - Can cause cascading failures
- **Don't ignore outlier detection** - Enable circuit breakers
- **Don't mirror to production** - Mirror to test environments
- **Don't skip canary** - Test with small traffic percentage first

## Debugging Commands

```bash
# Check VirtualService configuration
istioctl analyze

# View effective routes
istioctl proxy-config routes deploy/my-app -o json

# Check endpoint discovery
istioctl proxy-config endpoints deploy/my-app

# Debug traffic
istioctl proxy-config log deploy/my-app --level debug
```

## Resources

- [Istio Traffic Management](https://istio.io/latest/docs/concepts/traffic-management/)
- [Virtual Service Reference](https://istio.io/latest/docs/reference/config/networking/virtual-service/)
- [Destination Rule Reference](https://istio.io/latest/docs/reference/config/networking/destination-rule/)

Related Skills

linux-shell-scripting

31392
from sickn33/antigravity-awesome-skills

Provide production-ready shell script templates for common Linux system administration tasks including backups, monitoring, user management, log analysis, and automation. These scripts serve as building blocks for security operations and penetration testing environments.

DevOps & InfrastructureClaude

iterate-pr

31392
from sickn33/antigravity-awesome-skills

Iterate on a PR until CI passes. Use when you need to fix CI failures, address review feedback, or continuously push fixes until all checks are green. Automates the feedback-fix-push-wait cycle.

DevOps & InfrastructureClaude

incident-runbook-templates

31392
from sickn33/antigravity-awesome-skills

Production-ready templates for incident response runbooks covering detection, triage, mitigation, resolution, and communication.

DevOps & InfrastructureClaude

incident-response-smart-fix

31392
from sickn33/antigravity-awesome-skills

[Extended thinking: This workflow implements a sophisticated debugging and resolution pipeline that leverages AI-assisted debugging tools and observability platforms to systematically diagnose and res

DevOps & InfrastructureClaudeGitHub Copilot

incident-responder

31392
from sickn33/antigravity-awesome-skills

Expert SRE incident responder specializing in rapid problem resolution, modern observability, and comprehensive incident management.

DevOps & InfrastructureClaude

expo-cicd-workflows

31392
from sickn33/antigravity-awesome-skills

Helps understand and write EAS workflow YAML files for Expo projects. Use this skill when the user asks about CI/CD or workflows in an Expo or EAS context, mentions .eas/workflows/, or wants help with EAS build pipelines or deployment automation.

DevOps & InfrastructureClaude

error-diagnostics-error-trace

31392
from sickn33/antigravity-awesome-skills

You are an error tracking and observability expert specializing in implementing comprehensive error monitoring solutions. Set up error tracking systems, configure alerts, implement structured logging,

DevOps & InfrastructureClaude

error-debugging-error-trace

31392
from sickn33/antigravity-awesome-skills

You are an error tracking and observability expert specializing in implementing comprehensive error monitoring solutions. Set up error tracking systems, configure alerts, implement structured logging, and ensure teams can quickly identify and resolve production issues.

DevOps & InfrastructureClaude

error-debugging-error-analysis

31392
from sickn33/antigravity-awesome-skills

You are an expert error analysis specialist with deep expertise in debugging distributed systems, analyzing production incidents, and implementing comprehensive observability solutions.

DevOps & InfrastructureClaude

docker-expert

31392
from sickn33/antigravity-awesome-skills

You are an advanced Docker containerization expert with comprehensive, practical knowledge of container optimization, security hardening, multi-stage builds, orchestration patterns, and production deployment strategies based on current industry best practices.

DevOps & InfrastructureClaude

devops-troubleshooter

31392
from sickn33/antigravity-awesome-skills

Expert DevOps troubleshooter specializing in rapid incident response, advanced debugging, and modern observability.

DevOps & InfrastructureClaude

devops-deploy

31392
from sickn33/antigravity-awesome-skills

DevOps e deploy de aplicacoes — Docker, CI/CD com GitHub Actions, AWS Lambda, SAM, Terraform, infraestrutura como codigo e monitoramento.

DevOps & InfrastructureClaudeCursorGemini