deploying-cloud-k8s

Deploys applications to cloud Kubernetes (AKS/GKE/DOKS) with CI/CD pipelines. Use when deploying to production, setting up GitHub Actions, troubleshooting deployments. Covers build-time vs runtime vars, architecture matching, and battle-tested debugging.

242 stars

by aiskillstore

Best use case

deploying-cloud-k8s is best used when you need a repeatable AI agent workflow instead of a one-off prompt: deploying to production on AKS, GKE, or DOKS, setting up GitHub Actions, or troubleshooting deployments.

Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.

Practical example

Example input

Use the "deploying-cloud-k8s" skill to help with this workflow task. Context: Deploys applications to cloud Kubernetes (AKS/GKE/DOKS) with CI/CD pipelines.
Use when deploying to production, setting up GitHub Actions, troubleshooting deployments.
Covers build-time vs runtime vars, architecture matching, and battle-tested debugging.

Example output

A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.

When to use this skill

Use this skill when you want a reusable workflow rather than writing the same prompt again and again.

When not to use this skill

Do not use this when you only need a one-off answer and do not need a reusable workflow.
Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/deploying-cloud-k8s/SKILL.md --create-dirs "https://raw.githubusercontent.com/aiskillstore/marketplace/main/skills/asmayaseen/deploying-cloud-k8s/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub
2. Place it in .claude/skills/deploying-cloud-k8s/SKILL.md inside your project
3. Restart your AI agent; it will auto-discover the skill

How deploying-cloud-k8s Compares

| Feature / Agent | deploying-cloud-k8s | Standard Approach |
|-----------------|---------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

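What does this skill do?

It deploys applications to cloud Kubernetes (AKS/GKE/DOKS) with CI/CD pipelines, and covers build-time vs runtime variables, architecture matching, and deployment debugging.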

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Deploying Cloud K8s

## Quick Start

1. Check cluster architecture: `kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'`
2. Match build platform to cluster (arm64 vs amd64)
3. Set up GitHub Actions with path filters
4. Deploy with Helm, passing secrets via `--set`

## Critical: Build-Time vs Runtime Variables

### The Problem

Next.js `NEXT_PUBLIC_*` variables are **embedded at build time**, not runtime:

```dockerfile
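# WRONG: ENV set only in the final (runtime) stage is too late;
# `next build` has already inlined NEXT_PUBLIC_* values into the bundle
ENV NEXT_PUBLIC_API_URL=https://api.example.com

# RIGHT: declare a build ARG in the build stage, before `next build` runs
ARG NEXT_PUBLIC_API_URL=https://api.example.com
ENV NEXT_PUBLIC_API_URL=$NEXT_PUBLIC_API_URL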
```

### Build-Time (Next.js)

| Variable | Purpose |
|----------|---------|
| `NEXT_PUBLIC_SSO_URL` | SSO endpoint for browser OAuth |
| `NEXT_PUBLIC_API_URL` | API endpoint for browser fetch |
| `NEXT_PUBLIC_APP_URL` | App URL for redirects |
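
For local builds outside CI, the build-time variables in this table can be turned into `docker build` flags. The following is a minimal sketch, not part of the skill itself: `next_public_build_args` is a hypothetical helper name, and it assumes the values contain no whitespace or newlines (true for plain URLs).

```bash
# Emit one `--build-arg NAME=value` line per exported NEXT_PUBLIC_* variable.
# Hypothetical helper; assumes values contain no whitespace or newlines.
next_public_build_args() {
    env | grep '^NEXT_PUBLIC_' | sort | while IFS= read -r pair; do
        printf '%s %s\n' --build-arg "$pair"
    done
}
```

A possible use is `docker build $(next_public_build_args) -t web .`, where the unquoted substitution is deliberate so that each flag becomes its own argument.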

### Runtime (ConfigMaps/Secrets)

| Variable | Source |
|----------|--------|
| `DATABASE_URL` | Secret (Neon/managed DB) |
| `SSO_URL` | ConfigMap (internal K8s: `http://sso:3001`) |
| `BETTER_AUTH_SECRET` | Secret |

## Architecture Matching

**BEFORE ANY DEPLOYMENT**, check architecture:

```bash
kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
# Output: arm64 arm64  OR  amd64 amd64
```
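
The output of that query can be turned into the `platforms` value for the build. This is a sketch under the assumption that every node should share one architecture; `cluster_platform` is a hypothetical helper name, not something the skill ships.

```bash
# Map the space-separated architectures reported by kubectl to one
# docker platform string. Fails when the list is empty or mixed.
# Hypothetical helper; $1 is the jsonpath output, e.g. "arm64 arm64".
cluster_platform() {
    # Word splitting on $1 is intentional: one architecture per line.
    archs=$(printf '%s\n' $1 | sort -u)
    count=$(printf '%s\n' "$archs" | grep -c . || true)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one architecture, got: ${1:-none}" >&2
        return 1
    fi
    echo "linux/$archs"
}
```

For example, `cluster_platform "$(kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}')"` prints `linux/arm64` on an all-arm64 cluster and exits non-zero on a mixed one, where a single-platform build would not fit every node.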

### Docker Build

```yaml
- uses: docker/build-push-action@v5
  with:
    platforms: linux/arm64      # MATCH YOUR CLUSTER!
    provenance: false           # Avoid manifest issues
    no-cache: true              # When debugging
```

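**Why `provenance: false`?** With provenance attestations enabled, Buildx pushes an image index that carries an extra attestation manifest (platform `unknown/unknown`), even for a single-platform build. Some registries and tools handle that index poorly, which can surface as "no match for platform" errors.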

## GitHub Actions CI/CD

### Selective Builds with Path Filters

```yaml
jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      api: ${{ steps.filter.outputs.api }}
      web: ${{ steps.filter.outputs.web }}
    steps:
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            api:
              - 'apps/api/**'
            web:
              - 'apps/web/**'

  build-api:
    needs: changes
    if: needs.changes.outputs.api == 'true'
```
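
To predict locally which jobs the filters above would trigger, the same two path rules can be approximated in shell. This is only a rough sketch of the matching logic, not how `dorny/paths-filter` is implemented; `changed_apps` is a hypothetical helper name.

```bash
# Read changed file paths on stdin and report which apps they touch,
# mirroring the `api` and `web` filters. Hypothetical helper.
changed_apps() {
    api=false
    web=false
    while IFS= read -r path; do
        case "$path" in
            apps/api/*) api=true ;;   # in `case`, * also matches slashes
            apps/web/*) web=true ;;
        esac
    done
    echo "api=$api"
    echo "web=$web"
}
```

One way to feed it is `git diff --name-only origin/main...HEAD | changed_apps`, where `origin/main` is an assumed base branch.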

### Next.js Build Args

```yaml
- name: Build and push (web)
  uses: docker/build-push-action@v5
  with:
    build-args: |
      NEXT_PUBLIC_SSO_URL=https://sso.${{ vars.DOMAIN }}
      NEXT_PUBLIC_API_URL=https://api.${{ vars.DOMAIN }}
```

### Helm Deployment

```yaml
- name: Deploy
  run: |
    helm upgrade --install myapp ./helm/myapp \
      --set global.imageTag=${{ github.sha }} \
      --set "secrets.databaseUrl=${{ secrets.DATABASE_URL }}" \
      --set "secrets.authSecret=${{ secrets.BETTER_AUTH_SECRET }}"
```

## Troubleshooting Guide

### Quick Diagnosis Flow

```
Pod not running?
    │
    ├─► ImagePullBackOff
    │       ├─► "not found" ──► Wrong tag or registry
    │       ├─► "unauthorized" ──► Auth/imagePullSecrets
    │       └─► "no match for platform" ──► Architecture mismatch
    │
    ├─► CrashLoopBackOff
    │       ├─► "exec format error" ──► Wrong CPU architecture
    │       ├─► Exit code 1 ──► App startup failure
    │       └─► OOMKilled ──► Memory limits too low
    │
    └─► Pending
            ├─► Insufficient resources ──► Scale cluster
            └─► No matching node ──► Check nodeSelector
```
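
The flow above can also be written as a small lookup, which is handy in scripts that summarize failing pods. This is a sketch: `diagnose_pod` is a hypothetical helper name, and the fallback branches (for example, treating any other CrashLoopBackOff as an app startup failure) are simplifications of the diagram.

```bash
# Map a pod status plus an event or log message to the likely cause
# from the diagnosis flow. Hypothetical helper; fallbacks are simplified.
diagnose_pod() {
    status=$1
    message=$2
    case "$status" in
        ImagePullBackOff)
            case "$message" in
                *"not found"*)             echo "Wrong tag or registry" ;;
                *unauthorized*)            echo "Auth/imagePullSecrets" ;;
                *"no match for platform"*) echo "Architecture mismatch" ;;
                *)                         echo "Unknown image pull error" ;;
            esac ;;
        CrashLoopBackOff)
            case "$message" in
                *"exec format error"*) echo "Wrong CPU architecture" ;;
                *OOMKilled*)           echo "Memory limits too low" ;;
                *)                     echo "App startup failure" ;;
            esac ;;
        Pending)
            case "$message" in
                *Insufficient*) echo "Scale cluster" ;;
                *)              echo "Check nodeSelector" ;;
            esac ;;
        *) echo "Not covered by this flow" ;;
    esac
}
```

The status and message would come from `kubectl get pods` and `kubectl describe pod`; the function itself only matches strings.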

### Diagnostic Commands

```bash
kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace> | grep -E "(Image:|Failed|Error)"
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -20
kubectl logs <pod-name> -n <namespace> --tail=50
```

### Error: ImagePullBackOff "not found"

**Causes:**
- Tag doesn't exist (short vs full SHA)
- Wrong registry path
- Builds skipped by path filters

**Fix:** Verify image was pushed with exact tag used in deployment
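
Since the deploy step tags images with `${{ github.sha }}`, which is the full 40-character commit SHA, a quick guard can catch a shortened tag before it reaches Helm. This is a sketch; `is_full_sha` is a hypothetical helper name.

```bash
# Return 0 if $1 looks like a full 40-character lowercase Git SHA.
# Hypothetical helper for catching short-vs-full SHA tag mismatches.
is_full_sha() {
    case "$1" in
        *[!0-9a-f]*) return 1 ;;   # contains a non-hex character
    esac
    [ "${#1}" -eq 40 ]
}
```

A deploy script could run `is_full_sha "$IMAGE_TAG" || exit 1` before `helm upgrade`, with `IMAGE_TAG` being an assumed variable name.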

### Error: "no match for platform in manifest"

**Cause:** Image built for wrong architecture OR buildx provenance issue

**Fix:**
```yaml
platforms: linux/arm64  # Match cluster!
provenance: false       # Simple manifest
no-cache: true          # Force rebuild
```

### Error: "exec format error"

**Cause:** Binary architecture doesn't match node

**Fix:** Rebuild with correct platform, use `no-cache: true`

### Error: Helm comma parsing

```
failed parsing --set data: key "com" has no value
```

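**Cause:** Helm's `--set` parser treats an unescaped comma as the separator between `key=value` pairs, so everything after the comma is parsed as a new key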

**Fix:** Use heredoc values file:
```yaml
- name: Deploy
  run: |
    cat > /tmp/overrides.yaml << EOF
    sso:
      env:
        ALLOWED_ORIGINS: "https://a.com,https://b.com"
    EOF
    helm upgrade --install app ./chart --values /tmp/overrides.yaml
```
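
When a values file is not an option, Helm also accepts a backslash-escaped comma inside a `--set` value. The sketch below shows that alternative; `helm_escape_commas` is a hypothetical helper name, and the values-file approach above remains the simpler fix.

```bash
# Backslash-escape commas so Helm's --set parser keeps the value whole.
# Hypothetical helper; prints the escaped value on stdout.
helm_escape_commas() {
    printf '%s\n' "$1" | sed 's/,/\\,/g'
}
```

It could be used as `helm upgrade --install app ./chart --set "sso.env.ALLOWED_ORIGINS=$(helm_escape_commas "$ORIGINS")"`, where `ORIGINS` is an assumed variable holding the comma-separated list.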

### Error: Password authentication failed

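**Cause:** The password contains special characters (base64 output can include `+`, `/`, and `=`), which typically break parsing when the password is embedded unencoded in a connection URL such as `DATABASE_URL`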

**Fix:** Use hex passwords:
```bash
# Wrong
openssl rand -base64 16  # Can have +/=

# Right
openssl rand -hex 16     # Alphanumeric only
```
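
An existing password can be checked for this problem before it is stored as a secret. This is a minimal sketch; `is_alphanumeric` is a hypothetical helper name.

```bash
# Return 0 only if $1 is non-empty and contains nothing but letters
# and digits. Hypothetical helper for vetting generated passwords.
is_alphanumeric() {
    case "$1" in
        ''|*[!A-Za-z0-9]*) return 1 ;;
    esac
    return 0
}
```

Output from `openssl rand -hex 16` always passes this check, while output from `openssl rand -base64 16` fails it whenever it contains `+`, `/`, or `=` padding.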

### Error: Logout redirects to 0.0.0.0

**Cause:** `request.url` returns container bind address

**Fix:**
```typescript
const APP_URL = process.env.NEXT_PUBLIC_APP_URL || "http://localhost:3000";
const response = NextResponse.redirect(new URL("/", APP_URL));
```

## Pre-Deployment Checklist

### Architecture
- [ ] Checked cluster node architecture
- [ ] Build platform matches cluster

### Docker Build
- [ ] `provenance: false` set
- [ ] `platforms: linux/<arch>` matches cluster
- [ ] Image tags consistent between build and deploy

### CI/CD
- [ ] All `NEXT_PUBLIC_*` as build args
- [ ] Secrets passed via `--set` (not in values.yaml)
- [ ] Path filters configured

### Helm
- [ ] No commas in `--set` values
- [ ] Internal K8s service names for inter-service communication
- [ ] Password single source of truth in values.yaml

## Production Debugging

### Trace Request Path

```bash
# 1. Frontend logs
kubectl logs deploy/web -n myapp --tail=50

# 2. API logs
kubectl logs deploy/api -n myapp --tail=100 | grep -i error

# 3. Sidecar logs (Dapr, etc.)
kubectl logs deploy/api -n myapp -c daprd --tail=50
```

### Common Bug Patterns

| Error | Likely Cause |
|-------|--------------|
| `AttributeError: no attribute 'X'` | Model/schema mismatch |
| `404 Not Found` on internal call | Wrong endpoint URL |
| Times off by hours | Timezone handling bug |
| `greenlet_spawn not called` | Async SQLAlchemy pattern |

## GitOps with ArgoCD

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default     # Required field; "default" is Argo CD's built-in project
  source:
    repoURL: https://github.com/org/repo.git
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true      # Delete resources not in Git
      selfHeal: true   # Fix drift automatically
```

## Observability

```yaml
# ServiceMonitor for Prometheus
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      interval: 30s
```

## Security

```yaml
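# Container securityContext (allowPrivilegeEscalation, readOnlyRootFilesystem,
# and capabilities are container-level fields, set under spec.containers[])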
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
```

## Resilience

```yaml
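# HPA + PDB (resource names and labels below are placeholders)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:            # Required: the workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp
spec:
  minAvailable: 1
  selector:                  # Required: which pods the budget protects
    matchLabels:
      app: myapp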
```
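
To get a feel for how the HPA settings above behave, the documented scaling rule, desired = ceil(currentReplicas × currentUtilization / targetUtilization), can be computed by hand. The sketch below applies that rule and clamps to the min/max bounds; it ignores the controller's tolerance band and stabilization window, and `hpa_desired_replicas` is a hypothetical helper name.

```bash
# Approximate the HPA replica calculation with integer arithmetic.
# Args: current replicas, current utilization %, target %, min, max.
# Hypothetical helper; ignores tolerance and stabilization behavior.
hpa_desired_replicas() {
    current=$1
    utilization=$2
    target=$3
    min=$4
    max=$5
    # Integer ceiling division: (a + b - 1) / b
    desired=$(( (current * utilization + target - 1) / target ))
    if [ "$desired" -lt "$min" ]; then desired=$min; fi
    if [ "$desired" -gt "$max" ]; then desired=$max; fi
    echo "$desired"
}
```

With the values from the manifest, `hpa_desired_replicas 4 90 70 2 10` prints `6`: four pods at 90% CPU against a 70% target scale out to six.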

See [references/production-patterns.md](references/production-patterns.md) for full GitOps, observability, security, and resilience patterns.

## Verification

Run: `python scripts/verify.py`

## Related Skills

- `containerizing-applications` - Docker and Helm charts
- `operating-k8s-local` - Local Kubernetes with Minikube
- `building-nextjs-apps` - Next.js patterns

## References

- [references/production-patterns.md](references/production-patterns.md) - GitOps, ArgoCD, Prometheus, RBAC, HPA, PDB
