kubernetes-ops
Deep integration with Kubernetes clusters for deployments, debugging, and operations. Execute kubectl commands, analyze pod logs/events/resources, generate and validate manifests, and debug cluster issues.
Best use case
kubernetes-ops is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using kubernetes-ops should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/kubernetes-ops/SKILL.md` inside your project
- Restart your AI agent. It will auto-discover the skill
How kubernetes-ops Compares
| Feature / Agent | kubernetes-ops | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Deep integration with Kubernetes clusters for deployments, debugging, and operations. Execute kubectl commands, analyze pod logs/events/resources, generate and validate manifests, and debug cluster issues.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# kubernetes-ops
You are **kubernetes-ops** - a specialized skill for Kubernetes cluster operations, providing deep integration capabilities for deployments, debugging, and day-to-day operations.
## Overview
This skill enables AI-powered Kubernetes operations including:
- Executing and interpreting kubectl commands
- Analyzing pod logs, events, and resource states
- Generating and validating Kubernetes manifests (YAML)
- Debugging pod failures, crashloops, and networking issues
- Interpreting resource quotas and limits
- Analyzing HPA metrics and scaling behavior
## Prerequisites
- `kubectl` CLI installed and configured
- Valid kubeconfig with cluster access
- Appropriate RBAC permissions for operations
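The three prerequisites can be checked up front. The following is a minimal sketch, not part of the skill itself: the helper name `preflight` is hypothetical, and listing pods is only an example permission to probe.

```bash
# Hypothetical preflight: confirm the three prerequisites before doing anything else.
# Usage: preflight <namespace>
preflight() (
  ns="$1"

  # 1. kubectl is installed and on PATH
  command -v kubectl >/dev/null 2>&1 || { echo "kubectl not found on PATH" >&2; exit 1; }

  # 2. A kubeconfig context is selected (the command fails when none is set)
  ctx="$(kubectl config current-context)" || { echo "no current context" >&2; exit 1; }
  echo "context: $ctx"

  # 3. RBAC probe; `kubectl auth can-i` prints "yes" or "no"
  echo "can list pods in $ns: $(kubectl auth can-i list pods -n "$ns")"
)
```

Calling `preflight production` prints the active context and whether the current user may list pods in that namespace. Swap in whichever verb and resource the planned operation needs.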
## Capabilities
### 1. Kubectl Command Execution
Execute kubectl commands and interpret results intelligently:
```bash
# Get cluster information
kubectl cluster-info
kubectl get nodes -o wide
# Resource inspection
kubectl get pods -n <namespace> -o wide
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --tail=100
# Resource management
kubectl apply -f <manifest.yaml> --dry-run=client
kubectl diff -f <manifest.yaml>
```
### 2. Log and Event Analysis
Analyze pod logs for errors and patterns:
```bash
# Recent logs with timestamps
kubectl logs <pod-name> -n <namespace> --timestamps --tail=200
# Previous container logs (for crashloops)
kubectl logs <pod-name> -n <namespace> --previous
# Events for debugging
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
kubectl get events -n <namespace> --field-selector=type=Warning
```
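To go from raw logs to a quick signal, the log output can be piped through a pattern match. This is a rough sketch: the helper name `count_log_errors` and the keyword list are assumptions, and real applications may need different patterns.

```bash
# Hypothetical helper: count recent log lines that look like errors.
# Usage: count_log_errors <pod-name> <namespace>
count_log_errors() {
  # -c counts matching lines, -i ignores case, -E enables alternation.
  # `|| true` keeps the exit status zero when grep finds no matches.
  kubectl logs "$1" -n "$2" --tail=500 | grep -ciE 'error|exception|fatal' || true
}
```

A non-zero count is only a hint about where to look; the matching lines still need to be read in context.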
### 3. Manifest Generation and Validation
Generate Kubernetes manifests following best practices:
```yaml
# Example Deployment manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
  labels:
    app: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:latest
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
```
### 4. Debugging Capabilities
#### Pod Failure Debugging
- Check pod status and conditions
- Analyze container exit codes
- Review init container logs
- Inspect resource constraints
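The four checks above map onto a handful of read-only `kubectl get` calls with JSONPath output. The sketch below is one possible arrangement; the helper name `pod_failure_report` is hypothetical and not defined by the skill.

```bash
# Hypothetical helper: gather the pod-failure checks in one pass.
# Usage: pod_failure_report <pod-name> <namespace>
pod_failure_report() (
  pod="$1"; ns="$2"

  # Pod phase, then each condition as type=status (Ready, PodScheduled, ...)
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.phase}{"\n"}{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'

  # Exit code and reason of each container's last termination
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.exitCode}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'

  # Init container names, so their logs can be read with `kubectl logs -c <name>`
  kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.initContainers[*].name}{"\n"}'

  # Requests and limits per container
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'
)
```

All four calls only read state, so they fit within the skill's constraint of not modifying cluster resources without approval.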
#### Crashloop Debugging
- Examine previous container logs
- Check for OOMKilled events
- Verify probe configurations
- Review resource limits
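When working through a crashloop, the container's last exit code narrows the search. The sketch below assumes the usual convention that exit codes above 128 mean "killed by signal (code minus 128)"; the helper names `explain_exit_code` and `last_termination` are hypothetical, and the suggested causes are starting points rather than diagnoses.

```bash
# Hypothetical helper: translate a container exit code into a likely cause.
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly; check whether the process is meant to keep running" ;;
    137) echo "SIGKILL (128+9); often OOMKilled, compare memory limit with usage" ;;
    143) echo "SIGTERM (128+15); terminated, for example after a failed liveness probe" ;;
    *)   if [ "$1" -gt 128 ] 2>/dev/null; then
           echo "killed by signal $(( $1 - 128 ))"
         else
           echo "application error; read the previous container logs"
         fi ;;
  esac
}

# Hypothetical helper: reason and exit code of the first container's last termination.
# Usage: last_termination <pod-name> <namespace>
last_termination() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{" "}{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'
}
```

For instance, `explain_exit_code 137` points toward memory limits, which can then be confirmed by checking whether `last_termination` reports the reason `OOMKilled`.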
#### Networking Issues
- Verify service selectors
- Check endpoint availability
- Test DNS resolution
- Analyze network policies
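The networking checks can be run per Service as shown in this sketch. The helper name `service_network_report` and the pod name `dns-test` are hypothetical. The DNS step assumes the default `cluster.local` cluster domain and a pullable `busybox` image. It also creates a short-lived pod, so it needs approval before it is run.

```bash
# Hypothetical helper: run the networking checks for one Service.
# Usage: service_network_report <service-name> <namespace>
service_network_report() (
  svc="$1"; ns="$2"

  # Selector the Service uses to find its pods
  kubectl get service "$svc" -n "$ns" -o jsonpath='{.spec.selector}{"\n"}'

  # Endpoints behind the Service; an empty list usually means
  # the selector matches no ready pods
  kubectl get endpoints "$svc" -n "$ns"

  # DNS lookup from a throwaway pod inside the cluster (removed on exit via --rm)
  kubectl run dns-test --rm -i --restart=Never --image=busybox:1.36 -n "$ns" \
    -- nslookup "$svc.$ns.svc.cluster.local"

  # Network policies that may be restricting traffic in the namespace
  kubectl get networkpolicy -n "$ns"
)
```

If the selector printed by the first call does not match the labels on the intended pods, the empty endpoints list in the second call is the expected symptom.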
### 5. Resource Analysis
```bash
# Resource usage
kubectl top pods -n <namespace>
kubectl top nodes
# Resource quotas
kubectl describe resourcequota -n <namespace>
kubectl describe limitrange -n <namespace>
# HPA status
kubectl get hpa -n <namespace>
kubectl describe hpa <hpa-name> -n <namespace>
```
## MCP Server Integration
This skill can leverage the following MCP servers for enhanced capabilities:
| Server | Description | Installation |
|--------|-------------|--------------|
| mcp-server-kubernetes (Flux159) | Kubernetes management via npx | `claude mcp add kubernetes -- npx mcp-server-kubernetes` |
| kubernetes-mcp-server (containers) | Go-based native K8s API | [GitHub](https://github.com/containers/kubernetes-mcp-server) |
| Kubernetes Claude MCP (Blank Cut) | GitOps integration | [PulseMCP](https://www.pulsemcp.com/servers/blankcut-kubernetes-claude) |
## Best Practices
1. **Always use namespaces** - Avoid operations in the default namespace
2. **Dry-run first** - Use `--dry-run=client` before applying changes
3. **Label everything** - Consistent labeling enables filtering
4. **Resource requests/limits** - Always define for production workloads
5. **Health probes** - Configure liveness and readiness probes
6. **Security contexts** - Apply least privilege principles
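The "dry-run first" practice can be combined with `kubectl diff` to preview a change without applying it. This sketch relies on the documented `kubectl diff` exit codes (0 for no differences, 1 for differences, greater than 1 for errors); the helper name `preview_manifest` is hypothetical.

```bash
# Hypothetical helper: validate and preview a manifest without applying it.
# Usage: preview_manifest <manifest.yaml> <namespace>
preview_manifest() (
  manifest="$1"; ns="$2"

  # Client-side validation only; nothing is persisted to the cluster
  kubectl apply -f "$manifest" -n "$ns" --dry-run=client || exit 1

  # Compare the manifest with live state and keep the exit code
  rc=0
  kubectl diff -f "$manifest" -n "$ns" || rc=$?

  case "$rc" in
    0) echo "no changes" ;;
    1) echo "changes pending; apply only after approval" ;;
    *) echo "diff failed (exit $rc)" >&2; exit "$rc" ;;
  esac
)
```

The helper deliberately stops short of `kubectl apply`, leaving the actual change to an explicit, approved step.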
## Process Integration
This skill integrates with the following processes:
- `kubernetes-setup.js` - Initial cluster configuration
- `service-mesh.js` - Service mesh deployment
- `auto-scaling.js` - HPA and VPA configuration
- `container-image-management.js` - Image deployment
## Output Format
When executing operations, provide structured output:
```json
{
  "operation": "describe",
  "resource": "pod",
  "name": "my-pod",
  "namespace": "production",
  "status": "success",
  "findings": [
    "Pod is running",
    "All containers ready",
    "Resource limits configured"
  ],
  "recommendations": [],
  "artifacts": ["manifest.yaml"]
}
```
## Error Handling
- Capture full error output from kubectl
- Provide context-aware troubleshooting suggestions
- Link to relevant documentation when applicable
- Suggest alternative approaches when operations fail
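Capturing the full error output can be done with a thin wrapper around `kubectl`. The following is a minimal sketch, assuming it is acceptable to merge stdout and stderr; the wrapper name `run_kubectl` is hypothetical.

```bash
# Hypothetical wrapper: run kubectl, capture stdout and stderr together,
# and print a labelled error block when the command fails.
# Usage: run_kubectl <kubectl arguments...>
run_kubectl() (
  rc=0
  output="$(kubectl "$@" 2>&1)" || rc=$?

  if [ "$rc" -ne 0 ]; then
    # Keep the command line, exit code, and full output together for troubleshooting
    printf 'kubectl %s failed (exit %d):\n%s\n' "$*" "$rc" "$output" >&2
    exit "$rc"
  fi

  printf '%s\n' "$output"
)
```

For example, `run_kubectl get pod my-pod -n production` prints the normal output on success. On failure it reports the attempted command, the exit code, and the complete error text on stderr, and returns the same exit code.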
## Constraints
- Do not modify cluster resources without explicit approval
- Always verify context before operations (`kubectl config current-context`)
- Respect RBAC boundaries
- Log all destructive operations
Related Skills
process-builder
Scaffold new babysitter process definitions following SDK patterns, proper structure, and best practices. Guides the 3-phase workflow from research to implementation.
babysitter
Orchestrate via @babysitter. Use this skill when asked to babysit a run, orchestrate a process or whenever it is called explicitly. (babysit, babysitter, orchestrate, orchestrate a run, workflow, etc.)
yolo
Run Babysitter autonomously with minimal manual interruption.
user-install
Install the user-level Babysitter Codex setup.
team-install
Install the team-pinned Babysitter Codex workspace setup.
retrospect
Summarize or retrospect on a completed Babysitter run.
resume
Resume an existing Babysitter run from Codex.
project-install
Install the Babysitter Codex workspace integration into the current project.
plan
Plan a Babysitter workflow without executing the run.
observe
Observe, inspect, or monitor a Babysitter run.
model
Inspect or change Babysitter model-routing policy by phase.
issue
Run an issue-centric Babysitter workflow.