deploying-cloud-k8s

Deploys applications to cloud Kubernetes (AKS/GKE/DOKS) with CI/CD pipelines. Use when deploying to production, setting up GitHub Actions, troubleshooting deployments. Covers build-time vs runtime vars, architecture matching, and battle-tested debugging.

25 stars

Best use case

deploying-cloud-k8s is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Deploys applications to cloud Kubernetes (AKS/GKE/DOKS) with CI/CD pipelines. Use when deploying to production, setting up GitHub Actions, troubleshooting deployments. Covers build-time vs runtime vars, architecture matching, and battle-tested debugging.

Teams using deploying-cloud-k8s should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/deploying-cloud-k8s/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/aiskillstore/marketplace/asmayaseen/deploying-cloud-k8s/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/deploying-cloud-k8s/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How deploying-cloud-k8s Compares

Feature / Agentdeploying-cloud-k8sStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Deploys applications to cloud Kubernetes (AKS/GKE/DOKS) with CI/CD pipelines. Use when deploying to production, setting up GitHub Actions, troubleshooting deployments. Covers build-time vs runtime vars, architecture matching, and battle-tested debugging.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Deploying Cloud K8s

## Quick Start

1. Check cluster architecture: `kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'`
2. Match build platform to cluster (arm64 vs amd64)
3. Set up GitHub Actions with path filters
4. Deploy with Helm, passing secrets via `--set`

## Critical: Build-Time vs Runtime Variables

### The Problem

Next.js `NEXT_PUBLIC_*` variables are **embedded at build time**, not runtime:

```dockerfile
# WRONG: Runtime ENV does nothing for NEXT_PUBLIC_*
ENV NEXT_PUBLIC_API_URL=https://api.example.com

# RIGHT: Must be build ARG
ARG NEXT_PUBLIC_API_URL=https://api.example.com
ENV NEXT_PUBLIC_API_URL=$NEXT_PUBLIC_API_URL
```

### Build-Time (Next.js)

| Variable | Purpose |
|----------|---------|
| `NEXT_PUBLIC_SSO_URL` | SSO endpoint for browser OAuth |
| `NEXT_PUBLIC_API_URL` | API endpoint for browser fetch |
| `NEXT_PUBLIC_APP_URL` | App URL for redirects |

### Runtime (ConfigMaps/Secrets)

| Variable | Source |
|----------|--------|
| `DATABASE_URL` | Secret (Neon/managed DB) |
| `SSO_URL` | ConfigMap (internal K8s: `http://sso:3001`) |
| `BETTER_AUTH_SECRET` | Secret |

## Architecture Matching

**BEFORE ANY DEPLOYMENT**, check architecture:

```bash
kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
# Output: arm64 arm64  OR  amd64 amd64
```

### Docker Build

```yaml
- uses: docker/build-push-action@v5
  with:
    platforms: linux/arm64      # MATCH YOUR CLUSTER!
    provenance: false           # Avoid manifest issues
    no-cache: true              # When debugging
```

**Why `provenance: false`?** Buildx attestation creates complex manifest lists that cause "no match for platform" errors.

## GitHub Actions CI/CD

### Selective Builds with Path Filters

```yaml
jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      api: ${{ steps.filter.outputs.api }}
      web: ${{ steps.filter.outputs.web }}
    steps:
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            api:
              - 'apps/api/**'
            web:
              - 'apps/web/**'

  build-api:
    needs: changes
    if: needs.changes.outputs.api == 'true'
```

### Next.js Build Args

```yaml
- name: Build and push (web)
  uses: docker/build-push-action@v5
  with:
    build-args: |
      NEXT_PUBLIC_SSO_URL=https://sso.${{ vars.DOMAIN }}
      NEXT_PUBLIC_API_URL=https://api.${{ vars.DOMAIN }}
```

### Helm Deployment

```yaml
- name: Deploy
  run: |
    helm upgrade --install myapp ./helm/myapp \
      --set global.imageTag=${{ github.sha }} \
      --set "secrets.databaseUrl=${{ secrets.DATABASE_URL }}" \
      --set "secrets.authSecret=${{ secrets.BETTER_AUTH_SECRET }}"
```

## Troubleshooting Guide

### Quick Diagnosis Flow

```
Pod not running?
    │
    ├─► ImagePullBackOff
    │       ├─► "not found" ──► Wrong tag or registry
    │       ├─► "unauthorized" ──► Auth/imagePullSecrets
    │       └─► "no match for platform" ──► Architecture mismatch
    │
    ├─► CrashLoopBackOff
    │       ├─► "exec format error" ──► Wrong CPU architecture
    │       ├─► Exit code 1 ──► App startup failure
    │       └─► OOMKilled ──► Memory limits too low
    │
    └─► Pending
            ├─► Insufficient resources ──► Scale cluster
            └─► No matching node ──► Check nodeSelector
```

### Diagnostic Commands

```bash
kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace> | grep -E "(Image:|Failed|Error)"
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -20
kubectl logs <pod-name> -n <namespace> --tail=50
```

### Error: ImagePullBackOff "not found"

**Causes:**
- Tag doesn't exist (short vs full SHA)
- Wrong registry path
- Builds skipped by path filters

**Fix:** Verify image was pushed with exact tag used in deployment

### Error: "no match for platform in manifest"

**Cause:** Image built for wrong architecture OR buildx provenance issue

**Fix:**
```yaml
platforms: linux/arm64  # Match cluster!
provenance: false       # Simple manifest
no-cache: true          # Force rebuild
```

### Error: "exec format error"

**Cause:** Binary architecture doesn't match node

**Fix:** Rebuild with correct platform, use `no-cache: true`

### Error: Helm comma parsing

```
failed parsing --set data: key "com" has no value
```

**Cause:** Helm interprets commas as array separators

**Fix:** Use heredoc values file:
```yaml
- name: Deploy
  run: |
    cat > /tmp/overrides.yaml << EOF
    sso:
      env:
        ALLOWED_ORIGINS: "https://a.com,https://b.com"
    EOF
    helm upgrade --install app ./chart --values /tmp/overrides.yaml
```

### Error: Password authentication failed

**Cause:** Password with special characters (base64 `+/=`)

**Fix:** Use hex passwords:
```bash
# Wrong
openssl rand -base64 16  # Can have +/=

# Right
openssl rand -hex 16     # Alphanumeric only
```

### Error: Logout redirects to 0.0.0.0

**Cause:** `request.url` returns container bind address

**Fix:**
```typescript
const APP_URL = process.env.NEXT_PUBLIC_APP_URL || "http://localhost:3000";
const response = NextResponse.redirect(new URL("/", APP_URL));
```

## Pre-Deployment Checklist

### Architecture
- [ ] Checked cluster node architecture
- [ ] Build platform matches cluster

### Docker Build
- [ ] `provenance: false` set
- [ ] `platforms: linux/<arch>` matches cluster
- [ ] Image tags consistent between build and deploy

### CI/CD
- [ ] All `NEXT_PUBLIC_*` as build args
- [ ] Secrets passed via `--set` (not in values.yaml)
- [ ] Path filters configured

### Helm
- [ ] No commas in `--set` values
- [ ] Internal K8s service names for inter-service communication
- [ ] Password single source of truth in values.yaml

## Production Debugging

### Trace Request Path

```bash
# 1. Frontend logs
kubectl logs deploy/web -n myapp --tail=50

# 2. API logs
kubectl logs deploy/api -n myapp --tail=100 | grep -i error

# 3. Sidecar logs (Dapr, etc.)
kubectl logs deploy/api -n myapp -c daprd --tail=50
```

### Common Bug Patterns

| Error | Likely Cause |
|-------|--------------|
| `AttributeError: no attribute 'X'` | Model/schema mismatch |
| `404 Not Found` on internal call | Wrong endpoint URL |
| Times off by hours | Timezone handling bug |
| `greenlet_spawn not called` | Async SQLAlchemy pattern |

## GitOps with ArgoCD

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  source:
    repoURL: https://github.com/org/repo.git
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true      # Delete resources not in Git
      selfHeal: true   # Fix drift automatically
```

## Observability

```yaml
# ServiceMonitor for Prometheus
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      interval: 30s
```

## Security

```yaml
# Pod Security Context
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
```

## Resilience

```yaml
# HPA + PDB
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
apiVersion: policy/v1
kind: PodDisruptionBudget
spec:
  minAvailable: 1
```

See [references/production-patterns.md](references/production-patterns.md) for full GitOps, observability, security, and resilience patterns.

## Verification

Run: `python scripts/verify.py`

## Related Skills

- `containerizing-applications` - Docker and Helm charts
- `operating-k8s-local` - Local Kubernetes with Minikube
- `building-nextjs-apps` - Next.js patterns

## References

- [references/production-patterns.md](references/production-patterns.md) - GitOps, ArgoCD, Prometheus, RBAC, HPA, PDB

Related Skills

optimizing-cloud-costs

25
from ComeOnOliver/skillshub

Execute use when you need to work with cloud cost optimization. This skill provides cost analysis and optimization with comprehensive guidance and automation. Trigger with phrases like "optimize costs", "analyze spending", or "reduce costs".

deploying-monitoring-stacks

25
from ComeOnOliver/skillshub

This skill deploys monitoring stacks, including Prometheus, Grafana, and Datadog. It is used when the user needs to set up or configure monitoring infrastructure for applications or systems. The skill generates production-ready configurations, implements best practices, and supports multi-platform deployments. Use this when the user explicitly requests to deploy a monitoring stack, or mentions Prometheus, Grafana, or Datadog in the context of infrastructure setup.

deploying-machine-learning-models

25
from ComeOnOliver/skillshub

This skill enables Claude to deploy machine learning models to production environments. It automates the deployment workflow, implements best practices for serving models, optimizes performance, and handles potential errors. Use this skill when the user requests to deploy a model, serve a model via an API, or put a trained model into a production environment. The skill is triggered by requests containing terms like "deploy model," "productionize model," "serve model," or "model deployment."

Google Cloud Agent SDK Master

25
from ComeOnOliver/skillshub

Execute automatic activation for all google cloud agent development kit (adk) Use when appropriate context detected. Trigger with relevant phrases based on skill purpose.

cloudwatch-alarm-creator

25
from ComeOnOliver/skillshub

Cloudwatch Alarm Creator - Auto-activating skill for AWS Skills. Triggers on: cloudwatch alarm creator, cloudwatch alarm creator Part of the AWS Skills skill category.

cloudfront-distribution-setup

25
from ComeOnOliver/skillshub

Cloudfront Distribution Setup - Auto-activating skill for AWS Skills. Triggers on: cloudfront distribution setup, cloudfront distribution setup Part of the AWS Skills skill category.

cloudformation-template-creator

25
from ComeOnOliver/skillshub

Cloudformation Template Creator - Auto-activating skill for AWS Skills. Triggers on: cloudformation template creator, cloudformation template creator Part of the AWS Skills skill category.

cloud-tasks-queue-setup

25
from ComeOnOliver/skillshub

Cloud Tasks Queue Setup - Auto-activating skill for GCP Skills. Triggers on: cloud tasks queue setup, cloud tasks queue setup Part of the GCP Skills skill category.

cloud-sql-instance-setup

25
from ComeOnOliver/skillshub

Cloud Sql Instance Setup - Auto-activating skill for GCP Skills. Triggers on: cloud sql instance setup, cloud sql instance setup Part of the GCP Skills skill category.

cloud-security-posture

25
from ComeOnOliver/skillshub

Cloud Security Posture - Auto-activating skill for Security Advanced. Triggers on: cloud security posture, cloud security posture Part of the Security Advanced skill category.

cloud-scheduler-job-creator

25
from ComeOnOliver/skillshub

Cloud Scheduler Job Creator - Auto-activating skill for GCP Skills. Triggers on: cloud scheduler job creator, cloud scheduler job creator Part of the GCP Skills skill category.

cloud-run-service-config

25
from ComeOnOliver/skillshub

Cloud Run Service Config - Auto-activating skill for GCP Skills. Triggers on: cloud run service config, cloud run service config Part of the GCP Skills skill category.