runpod

Cloud GPU processing via RunPod serverless. Use when setting up RunPod endpoints, deploying Docker images, managing GPU resources, troubleshooting endpoint issues, or understanding costs. Covers all 5 toolkit images (qwen-edit, realesrgan, propainter, sadtalker, qwen3-tts).


by diegosouzapw


Best use case

runpod is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using runpod should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/runpod/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/devops/runpod/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/runpod/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How runpod Compares

| Feature / Agent | runpod | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

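What does this skill do?

It covers cloud GPU processing via RunPod serverless: setting up endpoints, deploying the toolkit's five Docker images, managing GPU resources, troubleshooting endpoint issues, and understanding costs.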

Where can I find the source code?

The source lives in the diegosouzapw/awesome-omni-skill repository on GitHub, at skills/devops/runpod/SKILL.md (the same file the curl command above downloads).

SKILL.md Source

# RunPod Cloud GPU

Run open-source AI models on cloud GPUs via RunPod serverless. Pay-per-second, no minimums.

## Setup

```bash
# 1. Create account at https://runpod.io
# 2. Add API key to .env
echo "RUNPOD_API_KEY=your_key_here" >> .env

# 3. Deploy any tool with --setup
python tools/image_edit.py --setup
python tools/upscale.py --setup
python tools/dewatermark.py --setup
python tools/sadtalker.py --setup
python tools/qwen3_tts.py --setup
```

Each `--setup` command:
1. Creates a RunPod **template** from the Docker image
2. Creates a serverless **endpoint** with appropriate GPU
3. Saves the endpoint ID to `.env` (e.g. `RUNPOD_QWEN_EDIT_ENDPOINT_ID`)
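Step 3 is essentially an upsert into `.env`. A minimal sketch of that step, assuming a plain `KEY=value` file with no quoting or `export` prefixes; `set_env_var` is a hypothetical helper, not the toolkit's own code:

```python
from pathlib import Path


def set_env_var(env_path: Path, key: str, value: str) -> None:
    """Insert or replace KEY=value in a dotenv-style file (hypothetical helper)."""
    lines = env_path.read_text().splitlines() if env_path.exists() else []
    new_line = f"{key}={value}"
    for i, line in enumerate(lines):
        # Compare only the name before the first "=", so a key like
        # RUNPOD_X never matches RUNPOD_X_OLD.
        if line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            break
    else:
        # Key not present yet: append it.
        lines.append(new_line)
    env_path.write_text("\n".join(lines) + "\n")
```

For example, `set_env_var(Path(".env"), "RUNPOD_QWEN_EDIT_ENDPOINT_ID", endpoint_id)` replaces a stale ID in place on a second `--setup` run, where a plain `echo ... >> .env` would leave a duplicate line behind.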

## Available Images

All images are public on GHCR — no authentication needed.

| Tool | Docker Image | GPU | VRAM | Typical Cost |
|------|-------------|-----|------|-------------|
| image_edit | `ghcr.io/conalmullan/video-toolkit-qwen-edit:latest` | A6000/L40S | 48GB+ | ~$0.05-0.15/job |
| upscale | `ghcr.io/conalmullan/video-toolkit-realesrgan:latest` | RTX 3090/4090 | 24GB | ~$0.01-0.05/job |
| dewatermark | `ghcr.io/conalmullan/video-toolkit-propainter:latest` | RTX 3090/4090 | 24GB | ~$0.05-0.30/job |
| sadtalker | `ghcr.io/conalmullan/video-toolkit-sadtalker:latest` | RTX 4090 | 24GB | ~$0.05-0.15/job |
| qwen3_tts | `ghcr.io/conalmullan/video-toolkit-qwen3-tts:latest` | ADA 24GB | 24GB | ~$0.01-0.05/job |

**Total monthly cost:** Rarely exceeds $10 even with heavy use, because billing is per second and endpoints scale to zero when idle (`workersMin: 0`).
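As a rough check on that figure, monthly spend is jobs multiplied by cost per job. A sketch using the upper end of each "Typical Cost" range from the table above; the job counts are made-up example values, not measurements:

```python
# Upper end of the "Typical Cost" ranges in the Available Images table (USD/job).
MAX_COST_PER_JOB = {
    "image_edit": 0.15,
    "upscale": 0.05,
    "dewatermark": 0.30,
    "sadtalker": 0.15,
    "qwen3_tts": 0.05,
}


def monthly_cost(jobs_per_month: dict[str, int]) -> float:
    """Upper-end monthly estimate: sum of jobs x max cost per job, in dollars."""
    total = sum(MAX_COST_PER_JOB[tool] * n for tool, n in jobs_per_month.items())
    return round(total, 2)


# Hypothetical month: 20 image edits, 40 upscales, 5 dewatermarks,
# 10 talking-head clips and 30 TTS voiceovers.
example = {"image_edit": 20, "upscale": 40, "dewatermark": 5, "sadtalker": 10, "qwen3_tts": 30}
print(monthly_cost(example))  # → 9.5
```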

## How It Works

All tools follow the same pattern:

```
Local CLI → Upload input to cloud storage → RunPod API → Poll for result → Download output
```
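The RunPod API leg of this pipeline can be sketched with the standard library alone. This assumes the serverless REST base URL `https://api.runpod.ai/v2`, Bearer-token auth, and the job states RunPod reports (`IN_QUEUE`, `IN_PROGRESS`, `COMPLETED`, `FAILED`, `CANCELLED`, `TIMED_OUT`). `run_job` and its parameters are hypothetical and not the toolkit's actual code. The HTTP function is injectable so the polling logic can be exercised without a network:

```python
import json
import time
import urllib.request
from typing import Any, Callable, Optional

API_BASE = "https://api.runpod.ai/v2"  # assumed RunPod serverless base URL
FAILED_STATES = {"FAILED", "CANCELLED", "TIMED_OUT"}

# (method, url, json_body_or_None, api_key) -> decoded JSON response
HttpFn = Callable[[str, str, Optional[dict], str], dict]


def http_json(method: str, url: str, body: Optional[dict], api_key: str) -> dict:
    """Send one JSON request with Bearer auth and decode the JSON reply."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode())


def run_job(
    endpoint_id: str,
    job_input: dict,
    api_key: str,
    http: HttpFn = http_json,
    sleep: Callable[[float], None] = time.sleep,
    poll_every: float = 2.0,
    max_wait: float = 600.0,
) -> Any:
    """Submit a job via /run, then poll /status/{job_id} until it finishes."""
    job = http("POST", f"{API_BASE}/{endpoint_id}/run", {"input": job_input}, api_key)
    job_id = job["id"]
    waited = 0.0
    while True:
        status = http("GET", f"{API_BASE}/{endpoint_id}/status/{job_id}", None, api_key)
        state = status.get("status")
        if state == "COMPLETED":
            return status.get("output")
        if state in FAILED_STATES:
            raise RuntimeError(f"job {job_id} ended as {state}: {status.get('error')}")
        if waited >= max_wait:
            # A cold start alone can take ~30-90s, so keep max_wait generous.
            raise TimeoutError(f"job {job_id} still {state} after {waited:.0f}s")
        sleep(poll_every)
        waited += poll_every
```

A call would look like `run_job(os.environ["RUNPOD_UPSCALE_ENDPOINT_ID"], job_input, os.environ["RUNPOD_API_KEY"])`, where the shape of `job_input` depends on each tool's worker and is not documented here.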

1. **File transfer:** Tools use Cloudflare R2 when configured (`R2_ACCOUNT_ID`, `R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY`, `R2_BUCKET_NAME`), falling back to free upload services
2. **RunPod API:** Tools call the `/run` endpoint, then poll `/status/{job_id}` until complete
3. **Cold vs warm start:** First request after idle spins up a worker (~30-90s). Subsequent requests are fast (~5-15s)
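The "when configured" rule in step 1 can be read as "use R2 only if all four variables are set". A sketch of that check, assuming an empty value counts as unset; `choose_transfer_backend` is hypothetical, and since the free fallback upload service is not named here it appears only as a `"fallback"` label. The account ID is needed because R2's S3-compatible API is served per account at `https://<account_id>.r2.cloudflarestorage.com`:

```python
import os
from typing import Mapping, Optional

R2_VARS = ("R2_ACCOUNT_ID", "R2_ACCESS_KEY_ID", "R2_SECRET_ACCESS_KEY", "R2_BUCKET_NAME")


def choose_transfer_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "r2" when every R2 variable is set and non-empty, else "fallback"."""
    env = os.environ if env is None else env
    if all(env.get(name, "").strip() for name in R2_VARS):
        return "r2"
    return "fallback"  # hypothetical label for the free upload service path


def r2_endpoint_url(account_id: str) -> str:
    """Per-account URL of R2's S3-compatible API."""
    return f"https://{account_id}.r2.cloudflarestorage.com"
```

An S3 client such as boto3 can then be pointed at that URL through its `endpoint_url` argument, with the R2 access key pair as credentials.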

## Endpoint Management

### Workers

```
workersMin: 0    — Scale to zero when idle (no cost)
workersMax: 1    — Max concurrent jobs (increase for throughput)
idleTimeout: 5   — Seconds before worker scales down
```

Across all endpoints, you share a total worker pool based on your RunPod plan. If you hit limits, reduce `workersMax` on endpoints you're not actively using.
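To see where the shared pool goes, sum `workersMax` across endpoints. A sketch, assuming the plan quota is checked against that sum; the quota of 5 and the per-endpoint values below are made-up examples, and `pool_report` is a hypothetical helper:

```python
def pool_report(
    workers_max: dict[str, int], quota: int, active: set[str]
) -> tuple[int, list[str]]:
    """Return (free worker slots, idle endpoints still holding slots).

    Assumes the plan quota is compared against the sum of workersMax
    over all endpoints.
    """
    free = quota - sum(workers_max.values())
    # Endpoints you aren't using but whose workersMax > 0 are the first
    # candidates for workersMax=0.
    idle_holders = sorted(
        name for name, n in workers_max.items() if n > 0 and name not in active
    )
    return free, idle_holders


# Made-up example: all five tools deployed with workersMax=1 on a hypothetical
# 5-worker quota, while only upscale and qwen3_tts are in use.
current = {"image_edit": 1, "upscale": 1, "dewatermark": 1, "sadtalker": 1, "qwen3_tts": 1}
print(pool_report(current, quota=5, active={"upscale", "qwen3_tts"}))
# → (0, ['dewatermark', 'image_edit', 'sadtalker'])
```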

### Checking Endpoint Status

Each tool stores its endpoint ID in `.env`:

| Tool | Env Var |
|------|---------|
| image_edit | `RUNPOD_QWEN_EDIT_ENDPOINT_ID` |
| upscale | `RUNPOD_UPSCALE_ENDPOINT_ID` |
| dewatermark | `RUNPOD_DEWATERMARK_ENDPOINT_ID` |
| sadtalker | `RUNPOD_SADTALKER_ENDPOINT_ID` |
| qwen3_tts | `RUNPOD_QWEN3_TTS_ENDPOINT_ID` |
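To check which tools have been deployed, the table can be turned into a lookup. A sketch using only the env var names above; `deployed_endpoints` is a hypothetical helper that reads variables already in the environment and does not query RunPod itself:

```python
import os
from typing import Mapping, Optional

ENDPOINT_VARS = {
    "image_edit": "RUNPOD_QWEN_EDIT_ENDPOINT_ID",
    "upscale": "RUNPOD_UPSCALE_ENDPOINT_ID",
    "dewatermark": "RUNPOD_DEWATERMARK_ENDPOINT_ID",
    "sadtalker": "RUNPOD_SADTALKER_ENDPOINT_ID",
    "qwen3_tts": "RUNPOD_QWEN3_TTS_ENDPOINT_ID",
}


def deployed_endpoints(env: Optional[Mapping[str, str]] = None) -> dict[str, Optional[str]]:
    """Map each tool to its endpoint ID, or None if it is unset or empty."""
    env = os.environ if env is None else env
    return {tool: (env.get(var) or None) for tool, var in ENDPOINT_VARS.items()}
```

Values in `.env` are not in `os.environ` automatically; load them first, for example with python-dotenv's `load_dotenv()`.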

### Disabling an Endpoint

To free worker slots without deleting the endpoint, set `workersMax=0` via the RunPod dashboard or GraphQL API.

## Troubleshooting

### Force Image Pull

When you push a new Docker image version, RunPod may still use the cached old one. To force a pull:

1. Update the template's `imageName` to use `@sha256:DIGEST` notation
2. Wait for the worker to restart
3. Revert to `:latest` tag after confirming
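Step 1 needs the pushed image's digest, which `docker push` prints (as `latest: digest: sha256:... size: ...`). A sketch of turning a tag reference into the pinned `imageName`; `pin_to_digest` is hypothetical, and it is careful not to mistake a registry port for a tag:

```python
import re

DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def pin_to_digest(image: str, digest: str) -> str:
    """Turn repo[:tag] into repo@sha256:<digest> for a template's imageName."""
    if not DIGEST_RE.fullmatch(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    repo = image.split("@", 1)[0]  # drop any digest already present
    last_slash = repo.rfind("/")
    last_colon = repo.rfind(":")
    # A colon after the last slash starts a tag; one before it is a registry port.
    if last_colon > last_slash:
        repo = repo[:last_colon]
    return f"{repo}@{digest}"
```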

### Cold Start Too Slow

- **qwen3-tts:** ~70s cold start, ~7s warm
- **sadtalker:** ~60s cold start, ~10s warm
- **image_edit:** ~90s cold start, ~15s warm

If cold starts are a problem, set `workersMin: 1` (costs money when idle).
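To size that trade-off, an always-on worker bills for every second of the month. A sketch; the per-second price is a placeholder, not a quoted RunPod rate, so substitute the current price for your GPU tier:

```python
SECONDS_PER_30_DAY_MONTH = 30 * 24 * 3600  # 2,592,000 s


def always_on_cost(price_per_second: float, workers: int = 1) -> float:
    """Monthly cost of keeping `workers` warm all month (workersMin = workers)."""
    return round(price_per_second * SECONDS_PER_30_DAY_MONTH * workers, 2)


def cold_start_seconds_saved(jobs_per_month: int, cold_s: float, warm_s: float) -> float:
    """Upper bound on waiting avoided, if every job would otherwise start cold."""
    return jobs_per_month * (cold_s - warm_s)
```

With a placeholder rate of $0.0002/s, one always-on worker comes to about $518 per month, while 100 image_edit jobs that would each have started cold (~90s vs ~15s warm) save at most 7,500 seconds of waiting.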

### Job Fails with OOM

The model needs more VRAM than the GPU provides. Options:
- Use a larger GPU tier
- For dewatermark: reduce `--resize-ratio` (default 0.5 for safety)
- For image_edit: reduce `--steps`
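The dewatermark option can be applied automatically by retrying with a smaller `--resize-ratio`. A sketch, assuming the failure surfaces as an exception whose message contains "out of memory" (as PyTorch's CUDA OOM errors do); `submit` stands in for whatever call runs the job and is hypothetical, as is the 0.125 floor:

```python
from typing import Any, Callable


def run_with_shrinking_ratio(
    submit: Callable[[float], Any],
    start_ratio: float = 0.5,
    min_ratio: float = 0.125,
) -> Any:
    """Call submit(ratio), halving the ratio after each out-of-memory failure."""
    ratio = start_ratio
    while True:
        try:
            return submit(ratio)
        except Exception as exc:  # the job runner's error type is unknown here
            if "out of memory" not in str(exc).lower():
                raise  # not an OOM: don't mask it
            if ratio / 2 < min_ratio:
                raise  # give up and move to a larger GPU tier instead
            ratio /= 2
```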

### "No workers available"

You've hit your plan's concurrent worker limit. Either:
- Wait for a running job to finish
- Set `workersMax=0` on endpoints you're not using
- Upgrade your RunPod plan

## Docker Images

All Dockerfiles live in `docker/runpod-*/`. Images use `runpod/pytorch` as the base to share layers across tools.

Building for RunPod (from Apple Silicon Mac):
```bash
docker buildx build --platform linux/amd64 -t ghcr.io/conalmullan/video-toolkit-<name>:latest docker/runpod-<name>/
docker push ghcr.io/conalmullan/video-toolkit-<name>:latest
```
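To rebuild every image in one go, the two commands can be looped over the five image names from the Available Images table. A sketch using `subprocess`, assuming each Dockerfile sits in `docker/runpod-<name>/` with the same `<name>` as the image and that you are already logged in with `docker login ghcr.io`; `build_all` and its `dry_run` flag are hypothetical and only collect the commands unless `dry_run=False`:

```python
import subprocess

IMAGE_NAMES = ("qwen-edit", "realesrgan", "propainter", "sadtalker", "qwen3-tts")
REGISTRY = "ghcr.io/conalmullan/video-toolkit"


def build_commands(name: str) -> list[list[str]]:
    """The buildx + push commands above, with <name> filled in."""
    tag = f"{REGISTRY}-{name}:latest"
    return [
        ["docker", "buildx", "build", "--platform", "linux/amd64", "-t", tag, f"docker/runpod-{name}/"],
        ["docker", "push", tag],
    ]


def build_all(names: tuple[str, ...] = IMAGE_NAMES, dry_run: bool = True) -> list[list[str]]:
    """Run (or, with dry_run, only collect) each image's commands in order."""
    executed = []
    for name in names:
        for cmd in build_commands(name):
            if not dry_run:
                subprocess.run(cmd, check=True)  # stops at the first failing command
            executed.append(cmd)
    return executed
```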

GHCR packages default to **private** — you must manually make them public for RunPod to pull them. Go to GitHub > Packages > Package Settings > Change Visibility.

## Cost Optimization

- Keep `workersMin: 0` on all endpoints (scale to zero)
- Only deploy endpoints you actively need
- Use `workersMax=0` to disable idle endpoints without deleting them
- Qwen3-TTS is significantly cheaper than ElevenLabs for voiceovers
- Check the RunPod dashboard for usage and billing

Related Skills

bgo

from diegosouzapw/awesome-omni-skill

Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.

Coding & Development

terraform-engineer

from diegosouzapw/awesome-omni-skill

Use when implementing infrastructure as code with Terraform across AWS, Azure, or GCP. Invoke for module development, state management, provider configuration, multi-environment workflows, infrastructure testing.

terraform-diagrams

from diegosouzapw/awesome-omni-skill

Generates architecture diagrams from Terraform code. Use when user has .tf files or asks to visualize Terraform infrastructure.

terraform-azurerm-set-diff-analyzer

from diegosouzapw/awesome-omni-skill

Wave 5 migration placeholder for `awesome-copilot/terraform-azurerm-set-diff-analyzer` imported from antigravity-awesome-skills manifest.

terraform-aws-modules

from diegosouzapw/awesome-omni-skill

Terraform module creation for AWS — reusable modules, state management, and HCL best practices. Use when building or reviewing Terraform AWS infrastructure.

terraform-analyzer

from diegosouzapw/awesome-omni-skill

Specialized skill for analyzing Terraform configurations. Supports parsing, security scanning (tfsec, checkov), cost estimation (infracost), drift detection, and plan visualization across AWS, Azure, and GCP.

terradev-gpu-cloud

from diegosouzapw/awesome-omni-skill

Cross-cloud GPU provisioning with NUMA-aligned topology optimization, K8s cluster creation, and inference overflow. Get real-time pricing across 11+ cloud providers, provision the cheapest GPUs in seconds, spin up production K8s clusters with automatic GPU-NIC pairing, and burst to cloud when your local GPU maxes out. BYOAPI — your keys never leave your machine.

tencent-cloud-pptx

from diegosouzapw/awesome-omni-skill

Create professional Tencent Cloud themed presentations from markdown content. Use when users request: (1) Creating presentations with Tencent Cloud branding, (2) Converting markdown documents to PowerPoint slides, (3) Generating slides with automatic content structuring, (4) Creating bilingual (Chinese/English) technical presentations, (5) Adding AI-generated images to presentation slides. Keywords to watch: 腾讯云, Tencent Cloud, markdown to PPT, presentation generation, slides with images.

telegram-reminders

from diegosouzapw/awesome-omni-skill

Send reminders and messages to Telegram with cloud-based scheduling. Use when the user wants to send immediate messages or schedule future reminders to Telegram. Supports text messages, timestamp-based scheduling, recurring reminders, viewing and canceling scheduled messages, and message history.

tech-detection

from diegosouzapw/awesome-omni-skill

Detects project tech stack including languages, frameworks, package managers, and cloud platforms. Use when analyzing a project, detecting technologies, bootstrapping infrastructure, or setting up permissions. Generates project-context.json with detected stack.

team-lifecycle

from diegosouzapw/awesome-omni-skill

Unified team skill for full lifecycle - spec/impl/test. All roles invoke this skill with --role arg for role-specific execution.

synchronization

from diegosouzapw/awesome-omni-skill

Convergence to common trajectory in coupled systems