coreweave-debug-bundle

Collect CoreWeave cluster diagnostics for support tickets. Use when preparing a support case, collecting GPU node status, or documenting pod failures. Trigger with phrases like "coreweave debug", "coreweave support", "coreweave diagnostics", "collect coreweave logs".

1,868 stars

by jeremylongshore


Best use case

coreweave-debug-bundle is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using coreweave-debug-bundle can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/coreweave-debug-bundle/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/saas-packs/coreweave-pack/skills/coreweave-debug-bundle/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/coreweave-debug-bundle/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How coreweave-debug-bundle Compares

Feature / Agent	coreweave-debug-bundle	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

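What does this skill do?

It collects CoreWeave cluster diagnostics (GPU node health, pod status, event logs, and API connectivity) into a single archive that can be attached to a support ticket.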

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

Cursor vs Codex for AI Workflows

Compare Cursor and Codex for AI coding workflows, repository assistance, debugging, refactoring, and reusable developer skills.

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

SKILL.md Source

# CoreWeave Debug Bundle

## Overview

Collect GPU node health, Kubernetes pod status, event logs, and API connectivity into a single diagnostic archive for CoreWeave support tickets. This bundle captures cluster-level resource allocation, failed pod logs, GPU device plugin state, and network reachability so support engineers can diagnose infrastructure issues without requesting additional information. Useful when GPU pods are stuck pending, inference workloads OOM, or node autoscaling behaves unexpectedly.

## Debug Collection Script

```bash
#!/bin/bash
set -euo pipefail
BUNDLE="debug-coreweave-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BUNDLE"

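# Environment check
echo "=== CoreWeave Debug Bundle ===" | tee "$BUNDLE/summary.txt"
echo "Generated: $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$BUNDLE/summary.txt"
# Record only whether the key is set, never its value
if [ -n "${COREWEAVE_API_KEY:-}" ]; then KEY_STATE="[SET]"; else KEY_STATE="[NOT SET]"; fi
echo "COREWEAVE_API_KEY: $KEY_STATE" >> "$BUNDLE/summary.txt"
echo "KUBECONFIG: ${KUBECONFIG:-default}" >> "$BUNDLE/summary.txt"
# --short is not accepted by recent kubectl releases; keep the first line of the default output
echo "kubectl: $(kubectl version --client 2>/dev/null | head -n 1 || echo 'not found')" >> "$BUNDLE/summary.txt"

# API connectivity
# ${VAR:-} keeps `set -u` from aborting when the key is unset.
# curl already prints 000 when no HTTP response arrives, so assign the fallback
# outside the substitution to avoid recording "000000".
HTTP=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 \
  -H "Authorization: Bearer ${COREWEAVE_API_KEY:-}" \
  https://api.coreweave.com/v1/namespaces 2>/dev/null) || HTTP="000"
echo "API Status: HTTP $HTTP" >> "$BUNDLE/summary.txt"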

# Cluster state
kubectl get nodes -o wide > "$BUNDLE/nodes.txt" 2>&1 || true
kubectl get pods --all-namespaces -o wide > "$BUNDLE/pods.txt" 2>&1 || true
kubectl get events --all-namespaces --sort-by=.lastTimestamp > "$BUNDLE/events.txt" 2>&1 || true

# GPU allocation and device plugin status
kubectl describe nodes | grep -A10 "Allocated resources" > "$BUNDLE/gpu-allocation.txt" 2>&1 || true
kubectl get pods -n kube-system -l k8s-app=nvidia-device-plugin -o wide > "$BUNDLE/gpu-plugin.txt" 2>&1 || true

# Failed pod logs
for pod in $(kubectl get pods --field-selector=status.phase=Failed -o name 2>/dev/null); do
  kubectl logs "$pod" --tail=200 > "$BUNDLE/$(basename "$pod")-logs.txt" 2>&1 || true
done

# Rate limit headers (response headers only; the Authorization header is not saved)
curl -s --max-time 10 -D "$BUNDLE/rate-headers.txt" -o /dev/null \
  -H "Authorization: Bearer ${COREWEAVE_API_KEY:-}" \
  https://api.coreweave.com/v1/namespaces 2>/dev/null || true

tar -czf "$BUNDLE.tar.gz" "$BUNDLE" && rm -rf "$BUNDLE"
echo "Bundle: $BUNDLE.tar.gz"
```
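
The script above assumes `kubectl`, `curl`, and `tar` are on the `PATH`. A minimal preflight sketch is shown below; the function name `missing_tools` is hypothetical and not part of the skill. It prints each tool that cannot be found and returns non-zero if any are missing.

```bash
#!/bin/bash
# Hypothetical helper (not part of the original skill): list required tools
# that are not available on PATH. Returns 1 if at least one is missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}
```

One way to use it is to call `missing_tools kubectl curl tar` before collecting anything and stop early if it prints any names.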

## Analyzing the Bundle

```bash
tar -xzf debug-coreweave-*.tar.gz
cat debug-coreweave-*/summary.txt          # API + env status at a glance
grep -iE "error|fail|oom" debug-coreweave-*/events.txt   # Critical events
cat debug-coreweave-*/gpu-allocation.txt   # GPU resource pressure
```
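
For a quick sense of scale before reading `events.txt` line by line, the same three keywords can be counted separately. This is a rough sketch; `summarize_events` is a hypothetical name, and it counts matching lines rather than parsing event columns.

```bash
#!/bin/bash
# Hypothetical helper: count lines in an events file that match each keyword
# used in the grep above (case-insensitive). Prints one "keyword=count" per line.
summarize_events() {
  local file="$1" kw count
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  for kw in error fail oom; do
    # grep -c prints 0 and exits non-zero when nothing matches, hence `|| true`
    count=$(grep -ic -- "$kw" "$file" || true)
    printf '%s=%s\n' "$kw" "$count"
  done
}
```

Running `summarize_events debug-coreweave-*/events.txt` against a single extracted bundle gives three lines such as `oom=1`, which can hint at which row of the table in Common Issues to check first.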

## Common Issues

| Symptom | Check in Bundle | Fix |
|---------|----------------|-----|
| GPU pods stuck Pending | `gpu-allocation.txt` shows 0 allocatable GPUs | Request quota increase or switch to available GPU type |
| OOMKilled on inference pod | `events.txt` for OOMKilled entries | Increase memory limits in pod spec; check model size vs allocated RAM |
| Node NotReady | `nodes.txt` status column | Check `events.txt` for kubelet issues; contact CoreWeave if persistent |
| API returns 401 | `summary.txt` shows HTTP 401 | Regenerate API key at CoreWeave dashboard; verify `COREWEAVE_API_KEY` is set |
| NVIDIA device plugin missing | `gpu-plugin.txt` empty or error | Verify namespace `kube-system` has device plugin DaemonSet; redeploy if missing |
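
The checks in the table can be roughly automated against an extracted bundle directory. The sketch below covers the four rows that reduce to a simple text match; the function name `triage_bundle` and the output wording are illustrative assumptions, and the match strings assume the file contents produced by the collection script above.

```bash
#!/bin/bash
# Hypothetical helper: map an extracted bundle directory onto symptoms from
# the Common Issues table. Prints one line per matched symptom.
triage_bundle() {
  local dir="$1" found=0
  if grep -q 'API Status: HTTP 401' "$dir/summary.txt" 2>/dev/null; then
    echo "API returns 401: check COREWEAVE_API_KEY"
    found=1
  fi
  if grep -q 'NotReady' "$dir/nodes.txt" 2>/dev/null; then
    echo "Node NotReady: see events.txt for kubelet issues"
    found=1
  fi
  if grep -qi 'oomkill' "$dir/events.txt" 2>/dev/null; then
    echo "OOMKilled: review pod memory limits"
    found=1
  fi
  # An empty file or kubectl's "No resources found" message both suggest
  # that no device plugin pods matched the label selector.
  if [ ! -s "$dir/gpu-plugin.txt" ] || grep -q 'No resources found' "$dir/gpu-plugin.txt"; then
    echo "NVIDIA device plugin missing: check the kube-system DaemonSet"
    found=1
  fi
  [ "$found" -eq 1 ] || echo "No known symptoms matched"
}
```

GPU quota pressure is left out because reading `gpu-allocation.txt` reliably depends on the node's resource names, which this sketch does not assume.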

## Automated Health Check

```typescript
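async function checkCoreWeave(): Promise<void> {
  const key = process.env.COREWEAVE_API_KEY;
  if (!key) { console.error("[FAIL] COREWEAVE_API_KEY not set"); return; }

  try {
    const res = await fetch("https://api.coreweave.com/v1/namespaces", {
      headers: { Authorization: `Bearer ${key}` },
    });
    console.log(`[${res.ok ? "OK" : "FAIL"}] API: HTTP ${res.status}`);

    // The header may be absent, so only log it when present
    const limit = res.headers.get("x-ratelimit-remaining");
    if (limit !== null) console.log(`[INFO] Rate limit remaining: ${limit}`);
  } catch (err) {
    // fetch rejects on network-level failures (DNS, TLS, connection refused)
    console.error(`[FAIL] API unreachable: ${err instanceof Error ? err.message : String(err)}`);
  }
}
void checkCoreWeave();
```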
```
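
The same two checks can also be run offline against files already in the bundle. The sketch below is one possible shape: `classify_http_status` and `rate_limit_remaining` are hypothetical names, the status categories are illustrative, and the `x-ratelimit-remaining` header name is taken from the health check above rather than from confirmed API documentation.

```bash
#!/bin/bash
# Hypothetical helper: turn the HTTP code recorded in summary.txt into a label.
# 000 is what curl reports when no HTTP response was received.
classify_http_status() {
  case "$1" in
    2??)     echo "OK" ;;
    401|403) echo "AUTH" ;;
    429)     echo "RATE_LIMITED" ;;
    000)     echo "UNREACHABLE" ;;
    *)       echo "FAIL" ;;
  esac
}

# Hypothetical helper: read the remaining-requests value from a saved header
# dump such as rate-headers.txt. Header names are matched case-insensitively,
# and the trailing CR from HTTP line endings is stripped first.
rate_limit_remaining() {
  tr -d '\r' < "$1" |
    awk -F': *' 'tolower($1) == "x-ratelimit-remaining" { print $2 }'
}
```

For example, `classify_http_status 401` prints `AUTH`, and `rate_limit_remaining debug-coreweave-*/rate-headers.txt` prints nothing when the header was not returned.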

## Resources

- [CoreWeave Status](https://status.coreweave.com)

## Next Steps

See `coreweave-common-errors` for GPU scheduling and Kubernetes troubleshooting patterns.

Related Skills

workhuman-debug-bundle

1868

from jeremylongshore/claude-code-plugins-plus-skills

Workhuman debug bundle for employee recognition and rewards API. Use when integrating Workhuman Social Recognition, or building recognition workflows with HRIS systems. Trigger: "workhuman debug bundle".

wispr-debug-bundle

1868

from jeremylongshore/claude-code-plugins-plus-skills

Wispr Flow debug bundle for voice-to-text API integration. Use when integrating Wispr Flow dictation, WebSocket streaming, or building voice-powered applications. Trigger: "wispr debug bundle".

webflow-debug-bundle

1868

from jeremylongshore/claude-code-plugins-plus-skills

Collect Webflow debug evidence for support tickets and troubleshooting. Gathers SDK version, token validation, rate limit status, site connectivity, CMS health, and error logs into a single diagnostic bundle. Trigger with phrases like "webflow debug", "webflow support bundle", "collect webflow logs", "webflow diagnostic", "webflow troubleshoot".

vercel-debug-bundle

1868

from jeremylongshore/claude-code-plugins-plus-skills

Collect Vercel debug evidence for support tickets and troubleshooting. Use when encountering persistent issues, preparing support tickets, or collecting diagnostic information for Vercel problems. Trigger with phrases like "vercel debug", "vercel support bundle", "collect vercel logs", "vercel diagnostic".

veeva-debug-bundle

1868

from jeremylongshore/claude-code-plugins-plus-skills

Veeva Vault debug bundle for REST API and clinical operations. Use when working with Veeva Vault document management and CRM. Trigger: "veeva debug bundle".

vastai-debug-bundle

1868

from jeremylongshore/claude-code-plugins-plus-skills

Collect Vast.ai debug evidence for support tickets and troubleshooting. Use when encountering persistent issues, preparing support tickets, or collecting diagnostic information for Vast.ai problems. Trigger with phrases like "vastai debug", "vastai support bundle", "collect vastai logs", "vastai diagnostic".

twinmind-debug-bundle

1868

from jeremylongshore/claude-code-plugins-plus-skills

Collect comprehensive diagnostic information for TwinMind issues. Use when preparing support requests, investigating complex problems, or gathering evidence for bug reports. Trigger with phrases like "twinmind debug", "twinmind diagnostics", "collect twinmind info", "twinmind support bundle".

together-debug-bundle

1868

from jeremylongshore/claude-code-plugins-plus-skills

Together AI debug bundle for inference, fine-tuning, and model deployment. Use when working with Together AI's OpenAI-compatible API. Trigger: "together debug bundle".

techsmith-debug-bundle

1868

from jeremylongshore/claude-code-plugins-plus-skills

TechSmith debug bundle for Snagit COM API and Camtasia automation. Use when working with TechSmith screen capture and video editing automation. Trigger: "techsmith debug bundle".

supabase-debug-bundle

1868

from jeremylongshore/claude-code-plugins-plus-skills

Collect Supabase diagnostic info for troubleshooting and support tickets. Use when debugging connection failures, auth issues, Realtime drops, Storage errors, RLS misconfigurations, or preparing a support escalation. Trigger: "supabase debug", "supabase diagnostics", "supabase support bundle", "collect supabase logs", "debug supabase connection".

stackblitz-debug-bundle

1868

from jeremylongshore/claude-code-plugins-plus-skills

Collect WebContainer diagnostic info: boot state, file system, process list. Use when working with WebContainers or StackBlitz SDK. Trigger: "stackblitz debug".

speak-debug-bundle

1868

from jeremylongshore/claude-code-plugins-plus-skills

Collect diagnostic information for Speak API issues: auth verification, audio format validation, session inspection, and network testing. Use when implementing debug bundle features or troubleshooting Speak language learning integration issues. Trigger with the phrase "speak debug bundle".