gke-expert

Expert guidance for Google Kubernetes Engine (GKE) operations including cluster management, workload deployment, scaling, monitoring, troubleshooting, and optimization. Use when working with GKE clusters, Kubernetes deployments on GCP, container orchestration, or when users need help with kubectl commands, GKE networking, autoscaling, workload identity, or GKE-specific features like Autopilot, Binary Authorization, or Config Sync.

242 stars

by aiskillstore


Best use case

gke-expert is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi.

Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.

Practical example

Example input

Use the "gke-expert" skill to help with this workflow task. Context: Expert guidance for Google Kubernetes Engine (GKE) operations including cluster management, workload deployment, scaling, monitoring, troubleshooting, and optimization. Use when working with GKE clusters, Kubernetes deployments on GCP, container orchestration, or when users need help with kubectl commands, GKE networking, autoscaling, workload identity, or GKE-specific features like Autopilot, Binary Authorization, or Config Sync.

Example output

A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.

When to use this skill

Use this skill when you want a reusable workflow rather than writing the same prompt again and again.

When not to use this skill

Do not use this when you only need a one-off answer and do not need a reusable workflow.
Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/gke-expert/SKILL.md --create-dirs "https://raw.githubusercontent.com/aiskillstore/marketplace/main/skills/adminturneddevops/gke-expert/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub
2. Place it in .claude/skills/gke-expert/SKILL.md inside your project
3. Restart your AI agent; it will auto-discover the skill

How gke-expert Compares

| Feature / Agent | gke-expert | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# GKE Expert

## Initial Assessment

When a user requests GKE help, determine:

1. Cluster type: Autopilot or Standard?
2. Task: Create, Deploy, Scale, Troubleshoot, or Optimize?
3. Environment: Dev, Staging, or Production?

## Quick Start Workflows

### Create Cluster

Autopilot (recommended for most):

```bash
gcloud container clusters create-auto CLUSTER_NAME \
  --region=REGION \
  --release-channel=regular
```

Standard (for specific node requirements):

```bash
gcloud container clusters create CLUSTER_NAME \
  --zone=ZONE \
  --num-nodes=3 \
  --enable-autoscaling \
  --min-nodes=2 \
  --max-nodes=10
```

Always authenticate after creation:

```bash
# Use --zone=ZONE instead for a zonal cluster such as the Standard example
gcloud container clusters get-credentials CLUSTER_NAME --region=REGION
```
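
The Standard example creates a zonal cluster with `--zone`, while the `get-credentials` command uses `--region`; the flag has to match how the cluster was created. A minimal sketch of picking the flag from a location name, assuming zone names end in a hyphen plus a single letter (for example `us-central1-a`) and anything else is a region. `location_flag` is a hypothetical helper, not part of gcloud:

```shell
# Hypothetical helper: choose --zone or --region from a location name.
# Assumption: zones look like REGION-LETTER (e.g. us-central1-a).
location_flag() {
  case "$1" in
    *-[a-z]) printf '%s\n' "--zone=$1" ;;
    *)       printf '%s\n' "--region=$1" ;;
  esac
}

# Usage (not executed here, since it needs gcloud and a real cluster):
#   gcloud container clusters get-credentials CLUSTER_NAME "$(location_flag us-central1-a)"
```
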
### Deploy Application

1. Create deployment manifest:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: APP_NAME
spec:
  replicas: 3
  selector:
    matchLabels:
      app: APP_NAME
  template:
    metadata:
      labels:
        app: APP_NAME
    spec:
      containers:
      - name: APP_NAME
        # If the image lives in Artifact Registry, the path is
        # REGION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE:TAG
        image: gcr.io/PROJECT_ID/IMAGE:TAG
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
```

2. Apply and expose:

```bash
kubectl apply -f deployment.yaml
kubectl expose deployment APP_NAME --type=LoadBalancer --port=80 --target-port=8080
```
### Setup Autoscaling

HPA for pods:

```bash
kubectl autoscale deployment APP_NAME --cpu-percent=70 --min=2 --max=100
```

Cluster autoscaling (Standard only):

```bash
gcloud container clusters update CLUSTER_NAME \
  --enable-autoscaling --min-nodes=2 --max-nodes=10 --zone=ZONE
```
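
To see how the `--cpu-percent=70` target turns into a replica count, the Kubernetes HPA documentation gives the rule desired = ceil(current replicas × current metric / target metric). The following is a rough sketch of that arithmetic only, clamped to the `--min` and `--max` bounds. It ignores the controller's tolerance band and stabilization behavior, and `hpa_desired_replicas` is an illustrative name, not a kubectl feature:

```shell
# Illustrative only: approximates the HPA replica calculation.
# Args: current_replicas current_cpu_percent target_cpu_percent min max
hpa_desired_replicas() {
  current=$1; usage=$2; target=$3; min=$4; max=$5
  # Integer ceiling of current * usage / target
  desired=$(( (current * usage + target - 1) / target ))
  [ "$desired" -lt "$min" ] && desired=$min
  [ "$desired" -gt "$max" ] && desired=$max
  echo "$desired"
}

hpa_desired_replicas 4 90 70 2 100   # prints 6
hpa_desired_replicas 3 20 70 2 100   # prints 2 (clamped to the minimum)
```

In the first call, four pods averaging 90% CPU against a 70% target scale to six; in the second, low usage would suggest one pod, but the minimum of two wins.
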
### Configure Workload Identity

1. Enable on cluster:

```bash
# Use --zone=ZONE instead for a zonal cluster.
# Standard clusters: existing node pools also need --workload-metadata=GKE_METADATA
# (set with gcloud container node-pools update).
gcloud container clusters update CLUSTER_NAME \
  --region=REGION \
  --workload-pool=PROJECT_ID.svc.id.goog
```

2. Link service accounts:

```bash
# Create GCP service account
gcloud iam service-accounts create GSA_NAME

# Create K8s service account
kubectl create serviceaccount KSA_NAME

# Bind them
gcloud iam service-accounts add-iam-policy-binding \
  GSA_NAME@PROJECT_ID.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[default/KSA_NAME]"

# Annotate K8s SA
kubectl annotate serviceaccount KSA_NAME \
  iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
```
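
The binding hardcodes the `default` namespace in the member string. If the Kubernetes service account lives elsewhere, the namespace in `[NAMESPACE/KSA_NAME]` has to change with it, and the `kubectl` commands need a matching `--namespace`. A small sketch of building both identity strings; the helper names and the sample values (`my-project`, `prod`, `app-ksa`, `app-gsa`) are placeholders invented for illustration:

```shell
# Hypothetical helpers that build the two identity strings used in the binding.
# Args: PROJECT_ID NAMESPACE KSA_NAME
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

# Args: GSA_NAME PROJECT_ID
gsa_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

wi_member my-project prod app-ksa
gsa_email app-gsa my-project
```

The first line of output is the value for `--member`, and the second is the service account email used both as the binding target and in the `iam.gke.io/gcp-service-account` annotation.
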
## Troubleshooting Guide

### Pod Issues

```bash
# Pod not starting - check events
kubectl describe pod POD_NAME
kubectl get events --field-selector involvedObject.name=POD_NAME

# Common fixes:
# ImagePullBackOff: Check image exists and pull secrets
# CrashLoopBackOff: kubectl logs POD_NAME --previous
# Pending: kubectl describe nodes (check resources)
# OOMKilled: Increase memory limits
```
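
The common fixes can be read as a lookup from the pod's status reason to a first diagnostic command. A sketch of that mapping, useful as a starting point for a triage script; `pod_next_step` is a made-up helper, the choice of `kubectl describe pod` for image and OOM problems is an assumption about where to look first, and nothing here contacts a cluster:

```shell
# Illustrative lookup: pod status reason -> first command to try.
# POD_NAME is a placeholder; this only prints a suggestion.
pod_next_step() {
  case "$1" in
    ImagePullBackOff|ErrImagePull) echo "kubectl describe pod POD_NAME" ;;
    CrashLoopBackOff)              echo "kubectl logs POD_NAME --previous" ;;
    Pending)                       echo "kubectl describe nodes" ;;
    OOMKilled)                     echo "kubectl describe pod POD_NAME" ;;
    *)                             echo "kubectl get events --field-selector involvedObject.name=POD_NAME" ;;
  esac
}

pod_next_step CrashLoopBackOff
```
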
### Service Issues

```bash
# No endpoints
kubectl get endpoints SERVICE_NAME
kubectl get pods -l app=APP_NAME  # Check if pods match selector

# Test connectivity from a throwaway pod (removed when the command exits)
kubectl run test --image=busybox -it --rm --restart=Never -- wget -O- SERVICE_NAME
```
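
A bare `SERVICE_NAME` only resolves from pods in the same namespace as the Service. When testing from another namespace, the fully qualified DNS name is safer. A minimal sketch, assuming the default cluster domain `cluster.local`; `service_fqdn` and the `prod` namespace are illustrative:

```shell
# Hypothetical helper: fully qualified Service DNS name.
# Assumes the default cluster domain, cluster.local.
# Args: SERVICE_NAME [NAMESPACE] (namespace defaults to "default")
service_fqdn() {
  printf '%s.%s.svc.cluster.local\n' "$1" "${2:-default}"
}

service_fqdn SERVICE_NAME prod
```

The printed name can be passed to `wget -O-` in place of `SERVICE_NAME` in the connectivity test.
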
### Performance Issues

```bash
# Check resource usage
kubectl top nodes
kubectl top pods --all-namespaces

# Find bottlenecks
kubectl describe resourcequotas
kubectl describe limitranges
```
## Production Patterns

### Ingress with HTTPS

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: APP_NAME-ingress
  annotations:
    networking.gke.io/managed-certificates: "CERT_NAME"
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: APP_NAME
            port:
              number: 80
```
### Pod Disruption Budget

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: APP_NAME-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: APP_NAME
```
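
With an integer `minAvailable`, the number of pods that may be evicted voluntarily at once is the count of healthy pods minus `minAvailable`, never below zero. A quick sketch of that arithmetic; `pdb_allowed_disruptions` is an illustrative name, and percentage values of `minAvailable` are not handled:

```shell
# Illustrative arithmetic for an integer minAvailable.
# Args: healthy_pods min_available
pdb_allowed_disruptions() {
  allowed=$(( $1 - $2 ))
  [ "$allowed" -lt 0 ] && allowed=0
  echo "$allowed"
}

pdb_allowed_disruptions 3 1   # prints 2
pdb_allowed_disruptions 1 1   # prints 0
```

The second case is worth noting: a single-replica workload with `minAvailable: 1` allows no voluntary disruptions, so `kubectl drain` on its node will wait until another replica is healthy.
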
### Security Context

```yaml
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
  containers:
  - name: app
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
```
## Cost Optimization

1. Use Autopilot for automatic right-sizing
2. Enable cluster autoscaling with appropriate limits
3. Use Spot VMs for non-critical workloads:

```bash
gcloud container node-pools create spot-pool \
  --cluster=CLUSTER_NAME \
  --spot \
  --num-nodes=2
```

4. Set resource requests/limits appropriately
5. Use VPA for recommendations: `kubectl describe vpa APP_NAME-vpa`

## Essential Commands

```bash
# Cluster management
gcloud container clusters list
kubectl config get-contexts
kubectl cluster-info

# Deployments
kubectl rollout status deployment/APP_NAME
kubectl rollout undo deployment/APP_NAME
kubectl scale deployment APP_NAME --replicas=5

# Debugging
kubectl logs -f POD_NAME --tail=50
kubectl exec -it POD_NAME -- /bin/bash
kubectl port-forward pod/POD_NAME 8080:80

# Monitoring
kubectl top nodes
kubectl top pods
kubectl get events --sort-by='.lastTimestamp'
```

## External Documentation

For detailed documentation beyond this skill:
- **Official GKE Docs**: https://cloud.google.com/kubernetes-engine/docs
- **kubectl Reference**: https://kubernetes.io/docs/reference/kubectl/
- **GKE Best Practices**: https://cloud.google.com/kubernetes-engine/docs/best-practices
- **Workload Identity**: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
- **GKE Pricing Calculator**: https://cloud.google.com/products/calculator

## Cleanup

```bash
kubectl delete all -l app=APP_NAME
kubectl drain NODE_NAME --ignore-daemonsets
```

## Advanced Topics Reference

For complex scenarios, consult:

- **Stateful workloads**: Use StatefulSets with persistent volumes
- **Batch jobs**: Use Jobs/CronJobs with appropriate backoff policies
- **Multi-region**: Use Multi-cluster Ingress or Traffic Director
- **Service mesh**: Install Cloud Service Mesh (formerly Anthos Service Mesh) for advanced networking
- **GitOps**: Implement Config Sync or Flux for declarative management
- **Monitoring**: Integrate with Cloud Monitoring or install Prometheus
