
k8s-debug-pods

Debug Kurtosis pods on Kubernetes. Diagnose why pods are Pending, CrashLoopBackOff, ImagePullBackOff, or Evicted. Check node taints, tolerations, resource pressure, and pod events. Use when kurtosis engine start fails or pods aren't coming online.

528 stars

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/k8s-debug-pods/SKILL.md --create-dirs "https://raw.githubusercontent.com/kurtosis-tech/kurtosis/main/skills/k8s-debug-pods/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/k8s-debug-pods/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill
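The three steps above reduce to a short shell sequence; this is a sketch that mirrors the one-liner in the previous section and assumes you run it from your project root:

```shell
# Steps 1-2: create the skill directory and download SKILL.md into it
mkdir -p .claude/skills/k8s-debug-pods
curl -fsSL -o .claude/skills/k8s-debug-pods/SKILL.md \
  "https://raw.githubusercontent.com/kurtosis-tech/kurtosis/main/skills/k8s-debug-pods/SKILL.md"
# Step 3: restart your AI agent so it re-scans .claude/skills/
```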

How k8s-debug-pods Compares

| Feature / Agent | k8s-debug-pods | Standard Approach |
|-----------------|----------------|-------------------|
| Platform Support | Multi | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Debug Kurtosis pods on Kubernetes. Diagnose why pods are Pending, CrashLoopBackOff, ImagePullBackOff, or Evicted. Check node taints, tolerations, resource pressure, and pod events. Use when kurtosis engine start fails or pods aren't coming online.

Which AI agents support this skill?

This skill is compatible with multiple AI agents, including Claude Code, Cursor, and Codex.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# K8s Debug Pods

Diagnose and fix issues with Kurtosis pods on Kubernetes.

## Quick triage

```bash
# See all kurtosis-related pods across namespaces
kubectl get pods -A | grep kurtosis

# Check for problem pods (not Running)
kubectl get pods -A | grep kurtosis | grep -v Running

# Get events for a specific pod
kubectl describe pod <POD_NAME> -n <NAMESPACE> | tail -30
```
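To show what the second triage command surfaces, here is the same filter run against a captured listing; the pod names and states below are fabricated for illustration:

```shell
# Fabricated `kubectl get pods -A` output for illustration
sample='NAMESPACE   NAME                         READY   STATUS             RESTARTS   AGE
kurtosis    kurtosis-engine-abc123       1/1     Running            0          5m
kt-test     kurtosis-api                 0/1     CrashLoopBackOff   4          3m
kurtosis    kurtosis-logs-collector-x1   0/1     Pending            0          5m'

# Same filter as above: kurtosis pods that are not Running
echo "$sample" | grep kurtosis | grep -v Running
# prints only the CrashLoopBackOff and Pending rows
```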

## Common pod states and fixes

### Pending — Unschedulable

The pod can't be scheduled because of node taints, resource pressure, or affinity rules.

```bash
# Check node taints
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

# Check node conditions (DiskPressure, MemoryPressure, etc.)
kubectl get nodes -o custom-columns=NAME:.metadata.name,CONDITIONS:.status.conditions[*].type
```

**Fix**: Add tolerations to the kurtosis config at `~/Library/Application Support/kurtosis/kurtosis-config.yml` or fix the node condition.

### ImagePullBackOff

The named image or tag can't be pulled, most often because that tag doesn't exist on the registry.

```bash
# Check which image is failing
kubectl describe pod <POD_NAME> -n <NAMESPACE> | grep -A5 "Image:"

# Verify image exists on Docker Hub
docker manifest inspect <IMAGE>:<TAG>
```

**Fix**: Push the correct image tag, or fix the image reference in the code.

### CrashLoopBackOff

The container starts but crashes immediately.

```bash
# Check container logs
kubectl logs <POD_NAME> -n <NAMESPACE>
kubectl logs <POD_NAME> -n <NAMESPACE> --previous
```
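Once the logs are in hand, a quick first pass is to scan them for fatal lines. A sketch against fabricated log output (the message and port are invented for illustration):

```shell
# Fabricated container log for illustration
logs='INFO  starting kurtosis engine
INFO  loading config
FATAL failed to bind port 9710: address already in use'

# Surface error/fatal/panic lines from the crash log
echo "$logs" | grep -iE 'error|fatal|panic'
# prints only the FATAL line
```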

### Evicted

The node evicted the pod due to resource pressure.

```bash
# Check which nodes have pressure: list each node with its currently-true conditions
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.status=="True")].type}{"\n"}{end}'

# Clean up evicted pods
kubectl get pods -A | grep Evicted | awk '{print $2 " -n " $1}' | xargs -L1 kubectl delete pod
```
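The awk step in the cleanup one-liner reorders the columns into arguments that `kubectl delete pod` accepts, one pod per line. Shown here against fabricated listing output:

```shell
# Fabricated `kubectl get pods -A` output for illustration
sample='NAMESPACE   NAME                READY   STATUS    RESTARTS   AGE
kt-test     kurtosis-api-old    0/1     Evicted   0          2d
kurtosis    kurtosis-engine-x   1/1     Running   0          5m'

# Produces "pod-name -n namespace", ready for xargs -L1
echo "$sample" | grep Evicted | awk '{print $2 " -n " $1}'
# prints: kurtosis-api-old -n kt-test
```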

## Kurtosis-specific pod types

| Pod pattern | Component | Image source |
|-------------|-----------|-------------|
| `kurtosis-engine-*` | Engine server | `engine/server/Dockerfile` |
| `kurtosis-api` (in `kt-*` namespaces) | API Container (APIC) | `core/server/Dockerfile` |
| `kurtosis-logs-collector-*` | Fluentbit DaemonSet | Pulled from registry |
| `kurtosis-logs-aggregator-*` | Vector deployment | Pulled from registry |
| `remove-dir-pod-*` | Fluentbit cleanup pods | busybox |
| `files-artifact-expander` (init container) | Files artifacts | `core/files_artifacts_expander/Dockerfile` |
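The name prefixes in the table above can be encoded as a small helper for triage scripts; `classify_pod` is a hypothetical function name, not part of Kurtosis:

```shell
# Map a pod name to its Kurtosis component (prefixes from the table above)
classify_pod() {
  case "$1" in
    kurtosis-engine-*)          echo "Engine server" ;;
    kurtosis-api*)              echo "API Container (APIC)" ;;
    kurtosis-logs-collector-*)  echo "Fluentbit DaemonSet" ;;
    kurtosis-logs-aggregator-*) echo "Vector deployment" ;;
    remove-dir-pod-*)           echo "Fluentbit cleanup pod" ;;
    files-artifact-expander*)   echo "Files artifacts expander" ;;
    *)                          echo "unknown" ;;
  esac
}

classify_pod kurtosis-engine-abc123   # prints: Engine server
```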

## Engine start failures

If `kurtosis engine start` fails:

1. Check if old kurtosis namespaces exist: `kubectl get ns | grep kurtosis`
2. Delete them: `kubectl get ns | grep kurtosis | awk '{print $1}' | xargs -r kubectl delete ns`
3. Retry engine start
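The pipeline in step 2, run against fabricated `kubectl get ns` output, extracts just the namespace names that would be passed to `kubectl delete ns`:

```shell
# Fabricated `kubectl get ns` output for illustration
sample='NAME          STATUS   AGE
default       Active   30d
kurtosis      Active   2d
kube-system   Active   30d'

# Same grep/awk steps as step 2, minus the delete
echo "$sample" | grep kurtosis | awk '{print $1}'
# prints: kurtosis
```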

## Logs collector issues

The logs collector is a DaemonSet that runs on every node. If some nodes are unhealthy:

```bash
# Check DaemonSet status
kubectl get ds -A | grep kurtosis

# See which pods are not running
kubectl get pods -A | grep logs-collector | grep -v Running
```

Nodes with DiskPressure or other taints may not schedule collector pods; this is expected, and the engine should still start with a warning about partially degraded log collection.
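To spot partially degraded collection numerically, compare the DESIRED and READY columns of the DaemonSet listing (columns 3 and 5 in `kubectl get ds -A` output). An awk sketch against fabricated output:

```shell
# Fabricated `kubectl get ds -A` output for illustration
sample='NAMESPACE   NAME                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kurtosis    kurtosis-logs-collector   3         3         2       3            2           <none>          5m'

# Flag DaemonSets where fewer pods are READY ($5) than DESIRED ($3)
echo "$sample" | awk 'NR > 1 && $5 < $3 {print $2 ": " $5 "/" $3 " ready"}'
# prints: kurtosis-logs-collector: 2/3 ready
```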