deploying-cloud-k8s
Deploys applications to cloud Kubernetes (AKS/GKE/DOKS) with CI/CD pipelines. Use when deploying to production, setting up GitHub Actions, troubleshooting deployments. Covers build-time vs runtime vars, architecture matching, and battle-tested debugging.
Best use case
deploying-cloud-k8s is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi.
Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.
Practical example
Example input
Use the "deploying-cloud-k8s" skill to help with this workflow task. Context: Deploys applications to cloud Kubernetes (AKS/GKE/DOKS) with CI/CD pipelines. Use when deploying to production, setting up GitHub Actions, troubleshooting deployments. Covers build-time vs runtime vars, architecture matching, and battle-tested debugging.
Example output
A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.
When to use this skill
- Use this skill when you want a reusable workflow rather than writing the same prompt again and again.
When not to use this skill
- Do not use this when you only need a one-off answer and do not need a reusable workflow.
- Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/deploying-cloud-k8s/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How deploying-cloud-k8s Compares
| Feature / Agent | deploying-cloud-k8s | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Deploys applications to cloud Kubernetes (AKS/GKE/DOKS) with CI/CD pipelines. Use when deploying to production, setting up GitHub Actions, troubleshooting deployments. Covers build-time vs runtime vars, architecture matching, and battle-tested debugging.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Deploying Cloud K8s
## Quick Start
1. Check cluster architecture: `kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'`
2. Match build platform to cluster (arm64 vs amd64)
3. Set up GitHub Actions with path filters
4. Deploy with Helm, passing secrets via `--set`
## Critical: Build-Time vs Runtime Variables
### The Problem
Next.js `NEXT_PUBLIC_*` variables are **embedded at build time**, not runtime:
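```dockerfile
# WRONG: a hardcoded ENV (or a value injected only at container runtime, e.g. via
# a ConfigMap) cannot be changed per environment; NEXT_PUBLIC_* is inlined by `next build`
ENV NEXT_PUBLIC_API_URL=https://api.example.com
# RIGHT: declare a build ARG before the build step, then expose it as ENV
ARG NEXT_PUBLIC_API_URL=https://api.example.com
ENV NEXT_PUBLIC_API_URL=$NEXT_PUBLIC_API_URL
```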
### Build-Time (Next.js)
| Variable | Purpose |
|----------|---------|
| `NEXT_PUBLIC_SSO_URL` | SSO endpoint for browser OAuth |
| `NEXT_PUBLIC_API_URL` | API endpoint for browser fetch |
| `NEXT_PUBLIC_APP_URL` | App URL for redirects |
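
Because these values are fixed when the image is built, a missing variable only shows up later as a broken URL in the browser. The following is a minimal sketch of a fail-fast check that could run before `docker build`. The function name `require_build_vars` is hypothetical, and it assumes the variables are exported in the CI environment.

```bash
# Hypothetical helper (not part of the skill): fail fast when a build-time
# variable is missing or empty in the environment.
require_build_vars() {
  local missing=0 name
  for name in "$@"; do
    # printenv prints nothing for an unset variable, so the test below
    # catches both unset and empty values
    if [ -z "$(printenv "$name")" ]; then
      echo "missing build-time variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call in a CI step:
#   require_build_vars NEXT_PUBLIC_SSO_URL NEXT_PUBLIC_API_URL NEXT_PUBLIC_APP_URL
```

The function returns 1 if any listed variable is unset or empty, which is enough to stop a CI step before the image is built.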
### Runtime (ConfigMaps/Secrets)
| Variable | Source |
|----------|--------|
| `DATABASE_URL` | Secret (Neon/managed DB) |
| `SSO_URL` | ConfigMap (internal K8s: `http://sso:3001`) |
| `BETTER_AUTH_SECRET` | Secret |
## Architecture Matching
**BEFORE ANY DEPLOYMENT**, check architecture:
```bash
kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
# Output: arm64 arm64 OR amd64 amd64
```
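
The architecture list printed above can be turned into a `platforms` value mechanically. This is a sketch under the assumption that every distinct node architecture should be built; `arch_to_platforms` is a hypothetical helper name, not something the skill ships.

```bash
# Hypothetical helper: convert the space-separated architecture list printed
# by kubectl into a comma-separated buildx platform list.
arch_to_platforms() {
  local out="" arch
  # Word-splitting on $1 is intentional: kubectl prints one entry per node.
  # sort -u collapses "arm64 arm64" into a single entry.
  for arch in $(printf '%s\n' $1 | sort -u); do
    out="${out:+$out,}linux/$arch"
  done
  printf '%s\n' "$out"
}

# Example call:
#   arch_to_platforms "$(kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}')"
```

A cluster with mixed node architectures yields more than one platform (for example `linux/amd64,linux/arm64`), which signals that a multi-platform build or a `nodeSelector` is needed.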
### Docker Build
```yaml
- uses: docker/build-push-action@v5
  with:
    platforms: linux/arm64  # MATCH YOUR CLUSTER!
    provenance: false       # Avoid manifest issues
    no-cache: true          # When debugging
```
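**Why `provenance: false`?** With provenance enabled, Buildx pushes an image index that includes an extra attestation manifest (platform `unknown/unknown`), even for a single-platform build. Some registries and runtimes handle these manifest lists poorly, which can surface as "no match for platform" errors.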
## GitHub Actions CI/CD
### Selective Builds with Path Filters
```yaml
jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      api: ${{ steps.filter.outputs.api }}
      web: ${{ steps.filter.outputs.web }}
    steps:
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            api:
              - 'apps/api/**'
            web:
              - 'apps/web/**'
  build-api:
    needs: changes
    if: needs.changes.outputs.api == 'true'
```
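
To see locally which builds a change set would trigger, the same filters can be approximated in shell. This is a rough sketch: `changed_apps` is a hypothetical helper, and the `case` patterns only approximate the glob matching that `dorny/paths-filter` performs.

```bash
# Hypothetical helper: read changed file paths on stdin and print each
# affected app once. Mirrors the 'apps/api/**' and 'apps/web/**' filters.
changed_apps() {
  while IFS= read -r path; do
    case "$path" in
      apps/api/*) echo api ;;   # '*' in a case pattern also matches '/'
      apps/web/*) echo web ;;
    esac
  done | sort -u
}

# Example call:
#   git diff --name-only origin/main...HEAD | changed_apps
```

If an app is missing from the output, its build job is skipped and no image is pushed for that commit's tag.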
### Next.js Build Args
```yaml
- name: Build and push (web)
  uses: docker/build-push-action@v5
  with:
    build-args: |
      NEXT_PUBLIC_SSO_URL=https://sso.${{ vars.DOMAIN }}
      NEXT_PUBLIC_API_URL=https://api.${{ vars.DOMAIN }}
```
### Helm Deployment
```yaml
- name: Deploy
  run: |
    helm upgrade --install myapp ./helm/myapp \
      --set global.imageTag=${{ github.sha }} \
      --set "secrets.databaseUrl=${{ secrets.DATABASE_URL }}" \
      --set "secrets.authSecret=${{ secrets.BETTER_AUTH_SECRET }}"
```
## Troubleshooting Guide
### Quick Diagnosis Flow
```
Pod not running?
│
├─► ImagePullBackOff
│ ├─► "not found" ──► Wrong tag or registry
│ ├─► "unauthorized" ──► Auth/imagePullSecrets
│ └─► "no match for platform" ──► Architecture mismatch
│
├─► CrashLoopBackOff
│ ├─► "exec format error" ──► Wrong CPU architecture
│ ├─► Exit code 1 ──► App startup failure
│ └─► OOMKilled ──► Memory limits too low
│
└─► Pending
├─► Insufficient resources ──► Scale cluster
└─► No matching node ──► Check nodeSelector
```
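
The message-to-cause branches of the flow above can be written as a small lookup. This is only a sketch of the same mapping: `diagnose` is a hypothetical helper, and it matches on substrings of whatever message text you pass in.

```bash
# Hypothetical helper: map an error message (from `kubectl describe pod` or
# `kubectl logs`) to the likely cause listed in the diagnosis flow.
diagnose() {
  case "$1" in
    # Checked first because this message can also contain "not found"
    *"no match for platform"*) echo "architecture mismatch" ;;
    *"exec format error"*)     echo "wrong CPU architecture" ;;
    *"unauthorized"*)          echo "auth/imagePullSecrets" ;;
    *"not found"*)             echo "wrong tag or registry" ;;
    *"OOMKilled"*)             echo "memory limits too low" ;;
    *)                         echo "unknown" ;;
  esac
}
```

The order of the patterns matters: the platform check comes before the generic "not found" check so that a platform mismatch is not misreported as a missing tag.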
### Diagnostic Commands
```bash
kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace> | grep -E "(Image:|Failed|Error)"
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -20
kubectl logs <pod-name> -n <namespace> --tail=50
```
### Error: ImagePullBackOff "not found"
**Causes:**
- Tag doesn't exist (short vs full SHA)
- Wrong registry path
- Builds skipped by path filters
**Fix:** Verify that the image was pushed with the exact tag used in the deployment (for example, the full `github.sha` rather than a short SHA).
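
A short-versus-full SHA mismatch can be caught before deploying. The sketch below assumes images are tagged with the full 40-character commit SHA, as `github.sha` provides; `is_full_sha` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed only if the tag is a full 40-character
# lowercase hex commit SHA.
is_full_sha() {
  case "$1" in
    *[!0-9a-f]*) return 1 ;;   # contains a non-hex character
  esac
  [ "${#1}" -eq 40 ]           # a short SHA (e.g. 7 characters) fails here
}

# Example call:
#   is_full_sha "$IMAGE_TAG" || echo "image tag is not a full SHA" >&2
```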
### Error: "no match for platform in manifest"
**Cause:** Image built for wrong architecture OR buildx provenance issue
**Fix:**
```yaml
platforms: linux/arm64 # Match cluster!
provenance: false # Simple manifest
no-cache: true # Force rebuild
```
### Error: "exec format error"
**Cause:** Binary architecture doesn't match node
**Fix:** Rebuild with correct platform, use `no-cache: true`
### Error: Helm comma parsing
```
failed parsing --set data: key "com" has no value
```
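**Cause:** `helm --set` treats an unescaped comma as the separator between `key=value` pairs, so the text after the comma is parsed as a new key (and its dots as nested keys)
**Fix:** Use a heredoc values file: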
```yaml
- name: Deploy
  run: |
    cat > /tmp/overrides.yaml << EOF
    sso:
      env:
        ALLOWED_ORIGINS: "https://a.com,https://b.com"
    EOF
    helm upgrade --install app ./chart --values /tmp/overrides.yaml
```
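
If a values file is not an option, Helm also accepts a backslash-escaped comma inside a `--set` value. The following sketch shows that alternative; `helm_escape_commas` is a hypothetical helper, and the values file remains the simpler fix.

```bash
# Hypothetical helper: escape commas so `helm --set` keeps the value intact.
helm_escape_commas() {
  # Replace every "," with "\," (the escape Helm recognizes in --set values)
  printf '%s' "$1" | sed 's/,/\\,/g'
}

# Example call:
#   helm upgrade --install app ./chart \
#     --set "sso.env.ALLOWED_ORIGINS=$(helm_escape_commas 'https://a.com,https://b.com')"
```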
### Error: Password authentication failed
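**Cause:** The password contains special characters. Base64 output can include `+`, `/`, and `=`, which can be misinterpreted when the password is embedded in a connection URL such as `DATABASE_URL`
**Fix:** Use hex passwords:
```bash
# Wrong
openssl rand -base64 16   # Can contain +, /, =
# Right
openssl rand -hex 16      # Only 0-9 and a-f (32 characters)
```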
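
An existing password can be checked the same way before it goes into a connection string. This is a sketch that assumes the password will be embedded in a URL without percent-encoding; `is_url_safe` is a hypothetical helper that accepts only URL "unreserved" characters.

```bash
# Hypothetical helper: succeed only if the string is non-empty and made up of
# characters that need no percent-encoding in a URL (letters, digits, . _ ~ -).
is_url_safe() {
  case "$1" in
    ''|*[!A-Za-z0-9._~-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Example call:
#   is_url_safe "$DB_PASSWORD" || echo "password needs URL-encoding" >&2
```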
### Error: Logout redirects to 0.0.0.0
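**Cause:** Inside the container, `request.url` reflects the server's bind address (`0.0.0.0`) rather than the public hostname
**Fix:** Build the redirect from the configured public URL instead:
```typescript
import { NextResponse } from "next/server";
// Redirect to the configured public URL, not to a URL derived from request.url
const APP_URL = process.env.NEXT_PUBLIC_APP_URL || "http://localhost:3000";
const response = NextResponse.redirect(new URL("/", APP_URL));
```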
## Pre-Deployment Checklist
### Architecture
- [ ] Checked cluster node architecture
- [ ] Build platform matches cluster
### Docker Build
- [ ] `provenance: false` set
- [ ] `platforms: linux/<arch>` matches cluster
- [ ] Image tags consistent between build and deploy
### CI/CD
- [ ] All `NEXT_PUBLIC_*` as build args
- [ ] Secrets passed via `--set` (not in values.yaml)
- [ ] Path filters configured
### Helm
- [ ] No unescaped commas in `--set` values (use a values file instead)
- [ ] Internal K8s service names for inter-service communication
- [ ] Each password has a single source of truth in values.yaml
## Production Debugging
### Trace Request Path
```bash
# 1. Frontend logs
kubectl logs deploy/web -n myapp --tail=50
# 2. API logs
kubectl logs deploy/api -n myapp --tail=100 | grep -i error
# 3. Sidecar logs (Dapr, etc.)
kubectl logs deploy/api -n myapp -c daprd --tail=50
```
### Common Bug Patterns
| Error | Likely Cause |
|-------|--------------|
| `AttributeError: no attribute 'X'` | Model/schema mismatch |
| `404 Not Found` on internal call | Wrong endpoint URL |
| Times off by hours | Timezone handling bug |
| `greenlet_spawn not called` | Async SQLAlchemy misuse (e.g. lazy-loaded attribute accessed outside an awaited call) |
## GitOps with ArgoCD
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default  # Required field; "default" is an assumption, use your Argo CD project
  source:
    repoURL: https://github.com/org/repo.git
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true     # Delete resources not in Git
      selfHeal: true  # Fix drift automatically
```
## Observability
```yaml
# ServiceMonitor for Prometheus
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      interval: 30s
```
## Security
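```yaml
# Container securityContext (allowPrivilegeEscalation, readOnlyRootFilesystem,
# and capabilities are container-level fields, not pod-level)
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
```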
## Resilience
```yaml
# HPA + PDB (abbreviated: metadata, scaleTargetRef, and the PDB selector are omitted)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
apiVersion: policy/v1
kind: PodDisruptionBudget
spec:
  minAvailable: 1
```
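
To get a feel for how the HPA settings above behave, the core scaling rule can be worked through by hand. The sketch below applies the documented formula `desired = ceil(current * currentUtilization / targetUtilization)` and clamps the result to the min/max bounds. It deliberately ignores the controller's tolerance band and stabilization windows, and `hpa_desired` is a hypothetical helper name.

```bash
# Hypothetical helper: estimate the replica count an HPA would request.
# usage: hpa_desired <currentReplicas> <currentUtil%> <targetUtil%> <min> <max>
hpa_desired() {
  local desired
  # Integer ceiling of (current * currentUtil / targetUtil)
  desired=$(( ($1 * $2 + $3 - 1) / $3 ))
  if [ "$desired" -lt "$4" ]; then desired=$4; fi   # clamp to minReplicas
  if [ "$desired" -gt "$5" ]; then desired=$5; fi   # clamp to maxReplicas
  echo "$desired"
}

# Example call with the values from the manifest (target 70%, min 2, max 10):
#   hpa_desired 4 90 70 2 10
```

With 4 replicas at 90% CPU against a 70% target, the estimate is 6 replicas; very low utilization bottoms out at `minReplicas`, and very high utilization is capped at `maxReplicas`.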
See [references/production-patterns.md](references/production-patterns.md) for full GitOps, observability, security, and resilience patterns.
## Verification
Run: `python scripts/verify.py`
## Related Skills
- `containerizing-applications` - Docker and Helm charts
- `operating-k8s-local` - Local Kubernetes with Minikube
- `building-nextjs-apps` - Next.js patterns
## References
- [references/production-patterns.md](references/production-patterns.md) - GitOps, ArgoCD, Prometheus, RBAC, HPA, PDB
Related Skills
deploying-to-production
Automate creating a GitHub repository and deploying a web project to Vercel. Use when the user asks to deploy a website/app to production, publish a project, or set up GitHub + Vercel deployment.
openclaw-secure-linux-cloud
Use when self-hosting OpenClaw on a cloud server, hardening a remote OpenClaw gateway, choosing between SSH tunneling, Tailscale, or reverse-proxy exposure, or reviewing Podman, pairing, sandboxing, token auth, and tool-permission defaults for a secure personal deployment.
multi-cloud-architecture
Design multi-cloud architectures using a decision framework to select and integrate services across AWS, Azure, and GCP. Use when building multi-cloud systems, avoiding vendor lock-in, or leveraging best-of-breed services from multiple providers.
hybrid-cloud-networking
Configure secure, high-performance connectivity between on-premises infrastructure and cloud platforms using VPN and dedicated connections. Use when building hybrid cloud architectures, connecting data centers to cloud, or implementing secure cross-premises networking.
hybrid-cloud-architect
Expert hybrid cloud architect specializing in complex multi-cloud solutions across AWS/Azure/GCP and private clouds (OpenStack/VMware). Masters hybrid connectivity, workload placement optimization, edge computing, and cross-cloud automation. Handles compliance, cost optimization, disaster recovery, and migration strategies. Use PROACTIVELY for hybrid architecture, multi-cloud strategy, or complex infrastructure integration.
gcp-cloud-run
Specialized skill for building production-ready serverless applications on GCP. Covers Cloud Run services (containerized), Cloud Run Functions (event-driven), cold start optimization, and event-driven architecture with Pub/Sub.
database-cloud-optimization-cost-optimize
You are a cloud cost optimization expert specializing in reducing infrastructure expenses while maintaining performance and reliability. Analyze cloud spending, identify savings opportunities, and implement cost-effective architectures across AWS, Azure, and GCP.
cloudformation-best-practices
CloudFormation template optimization, nested stacks, drift detection, and production-ready patterns. Use when writing or reviewing CF templates.
cloud-penetration-testing
This skill should be used when the user asks to "perform cloud penetration testing", "assess Azure or AWS or GCP security", "enumerate cloud resources", "exploit cloud misconfigurations", "test O365 security", "extract secrets from cloud environments", or "audit cloud infrastructure". It provides comprehensive techniques for security assessment across major cloud platforms.
cloud-devops
Cloud infrastructure and DevOps workflow covering AWS, Azure, GCP, Kubernetes, Terraform, CI/CD, monitoring, and cloud-native development.
cloud-architect
Expert cloud architect specializing in AWS/Azure/GCP multi-cloud infrastructure design, advanced IaC (Terraform/OpenTofu/CDK), FinOps cost optimization, and modern architectural patterns. Masters serverless, microservices, security, compliance, and disaster recovery. Use PROACTIVELY for cloud architecture, cost optimization, migration planning, or multi-cloud strategies.
azure-cloud-migrate
Assess and migrate cross-cloud workloads to Azure. Generates assessment reports and converts code from AWS, GCP, or other providers to Azure services. WHEN: "migrate Lambda to Azure Functions", "migrate AWS to Azure", "Lambda migration assessment", "convert AWS serverless to Azure", "migration readiness report", "migrate from AWS", "migrate from GCP", "cross-cloud migration".