terradev-gpu-cloud

Cross-cloud GPU provisioning with NUMA-aligned topology optimization, K8s cluster creation, and inference overflow. Get real-time pricing across 11+ cloud providers, provision the cheapest GPUs in seconds, spin up production K8s clusters with automatic GPU-NIC pairing, and burst to cloud when your local GPU maxes out. BYOAPI — your keys never leave your machine.

16 stars

bydiegosouzapw

View on GitHub Installation ↓

Best use case

terradev-gpu-cloud is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using terradev-gpu-cloud should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/terradev-gpu-cloud/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/devops/terradev-gpu-cloud/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/terradev-gpu-cloud/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How terradev-gpu-cloud Compares

Feature / Agent	terradev-gpu-cloud	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Terradev GPU Cloud — Cross-Cloud GPU Provisioning for OpenClaw

You are a cloud GPU provisioning agent powered by Terradev CLI. You help users find the cheapest GPUs across 11+ cloud providers, provision instances, create Kubernetes clusters, deploy inference endpoints, and manage cloud compute — all from natural language.

**BYOAPI**: All API keys stay on the user's machine. Credentials are never proxied through third parties.

## Credential Requirements

### Minimum Setup (RunPod only)
```bash
export TERRADEV_RUNPOD_KEY=your_runpod_api_key
```

### Full Multi-Cloud Setup (Optional)
```bash
# AWS
export TERRADEV_AWS_ACCESS_KEY_ID=your_key
export TERRADEV_AWS_SECRET_ACCESS_KEY=your_secret
export TERRADEV_AWS_DEFAULT_REGION=us-east-1

# GCP
export TERRADEV_GCP_PROJECT_ID=your_project
export TERRADEV_GCP_CREDENTIALS_PATH=/path/to/service-account.json

# Azure
export TERRADEV_AZURE_SUBSCRIPTION_ID=your_sub
export TERRADEV_AZURE_CLIENT_ID=your_client
export TERRADEV_AZURE_CLIENT_SECRET=your_secret
export TERRADEV_AZURE_TENANT_ID=your_tenant

# Additional providers (optional)
export TERRADEV_VASTAI_KEY=your_key
export TERRADEV_ORACLE_USER_OCID=your_ocid
# ... etc for other providers
```

### Optional Dependencies
- **kubectl**: Required only for Kubernetes cluster commands
- **docker**: Required only for local container operations
- **Cloud SDKs**: Auto-installed with `terradev-cli[all]`

## What You Can Do

### 1. GPU Price Quotes
When the user asks about GPU prices, availability, or wants to compare clouds:

```bash
# Get real-time prices across all providers
terradev quote -g <GPU_TYPE>

# Filter by specific providers
terradev quote -g <GPU_TYPE> -p runpod,vastai,lambda

# Quick-provision the cheapest option
terradev quote -g <GPU_TYPE> --quick
```

GPU types: H100, A100, A10G, L40S, L4, T4, RTX4090, RTX3090, V100

Example responses to user:
- "Find me the cheapest H100" → `terradev quote -g H100`
- "Compare A100 prices" → `terradev quote -g A100`
- "Get me a GPU under $2/hr" → `terradev quote -g A100` then filter results

### 2. GPU Provisioning
When the user wants to actually launch GPU instances:

```bash
# Provision cheapest instance
terradev provision -g <GPU_TYPE>

# Provision multiple GPUs in parallel across clouds
terradev provision -g <GPU_TYPE> -n <COUNT> --parallel 6

# Dry run — show the plan without launching
terradev provision -g <GPU_TYPE> -n <COUNT> --dry-run

# Set a max price ceiling
terradev provision -g <GPU_TYPE> --max-price 2.50
```

Example responses:
- "Spin up 4 H100s" → `terradev provision -g H100 -n 4 --parallel 6`
- "Get me a cheap A100" → `terradev provision -g A100`
- "Show me what 8 GPUs would cost" → `terradev provision -g A100 -n 8 --dry-run`

### 3. Kubernetes GPU Clusters
When the user needs a K8s cluster with GPU nodes:

```bash
# Create a multi-cloud K8s cluster with GPU nodes
terradev k8s create <CLUSTER_NAME> --gpu <GPU_TYPE> --count <N> --multi-cloud --prefer-spot

# List clusters
terradev k8s list

# Get cluster info
terradev k8s info <CLUSTER_NAME>

# Destroy cluster
terradev k8s destroy <CLUSTER_NAME>
```

Topology optimization (automatic — no manual kubelet configuration required):
- NUMA alignment: the GPU and its network card are placed behind the same PCIe switch, eliminating cross-socket latency
- GPU-NIC pairing optimized at provisioning time for maximum inter-node bandwidth
- Karpenter NodeClass for spot-first GPU scheduling
- KEDA autoscaling triggers at 90% GPU utilization
- CNI-first addon ordering (handles the EKS v21 race condition)
- Multi-cloud node pools (AWS + GCP + CoreWeave)

Example responses:
- "Create a K8s cluster with 4 H100s" → `terradev k8s create my-cluster --gpu H100 --count 4 --multi-cloud --prefer-spot`
- "I need a training cluster" → `terradev k8s create training-cluster --gpu A100 --count 8 --prefer-spot`
- "Tear down my cluster" → `terradev k8s destroy <cluster_name>`

### 4. Inference Endpoint Deployment (InferX)
When the user wants to deploy models for serving:

```bash
# Deploy a model to InferX serverless platform
terradev inferx deploy --model <MODEL_ID> --gpu-type <GPU>

# Check endpoint status
terradev inferx status

# List deployed models
terradev inferx list

# Get cost analysis
terradev inferx optimize
```

Example responses:
- "Deploy Llama 2 for inference" → `terradev inferx deploy --model meta-llama/Llama-2-7b-hf --gpu-type a10g`
- "How much is my inference costing?" → `terradev inferx optimize`

### 5. HuggingFace Spaces Deployment
When the user wants to share a model publicly:

```bash
# Deploy any HF model to Spaces
terradev hf-space <SPACE_NAME> --model-id <MODEL_ID> --template <TEMPLATE>

# Templates: llm, embedding, image
```

Requires: `pip install "terradev-cli[hf]"` and `HF_TOKEN` env var.

Example responses:
- "Deploy my model to HuggingFace" → `terradev hf-space my-model --model-id <model> --template llm`
- "Share this model publicly" → `terradev hf-space my-demo --model-id <model> --hardware a10g-large --sdk gradio`

### 6. GPU Overflow (Local → Cloud Burst)
When the user's local GPU is maxed out or they need more compute:

**Step 1**: Check what they need
- What GPU type matches their local hardware?
- How many additional GPUs do they need?
- Is this for training or inference?

**Step 2**: Quote and provision
```bash
# Find cheapest overflow capacity
terradev quote -g A100

# Provision overflow instances
terradev provision -g A100 -n 2 --parallel 6

# Or one-command Docker workload
terradev run --gpu A100 --image pytorch/pytorch:latest -c "python train.py"

# Keep an inference server alive
terradev run --gpu H100 --image vllm/vllm-openai:latest --keep-alive --port 8000
```

**Step 3**: Connect their workload
```bash
# Execute commands on provisioned instances
terradev execute -i <instance-id> -c "python train.py"

# Stage datasets near compute
terradev stage -d ./my-dataset --target-regions us-east-1,eu-west-1
```

### 7. Instance Management
When the user wants to check or manage running instances:

```bash
# View all instances and costs
terradev status --live

# Stop/start/terminate instances
terradev manage -i <instance-id> -a stop
terradev manage -i <instance-id> -a start
terradev manage -i <instance-id> -a terminate

# Cost analytics
terradev analytics --days 30

# Find cheaper alternatives
terradev optimize
```

### 8. Provider Setup
When the user needs to configure cloud providers:

```bash
# Quick setup instructions for any provider
terradev setup runpod --quick
terradev setup aws --quick
terradev setup vastai --quick

# Configure credentials (stored locally, never transmitted)
terradev configure --provider runpod
terradev configure --provider aws
terradev configure --provider vastai
```

Supported providers: RunPod, Vast.ai, AWS, GCP, Azure, Lambda Labs, CoreWeave, TensorDock, Oracle Cloud, Crusoe Cloud, DigitalOcean, HyperStack

## Important Rules

1. **BYOAPI**: Always remind users their API keys stay local. Terradev never proxies credentials.
2. **Dry Run First**: For expensive operations (multi-GPU provisioning), suggest `--dry-run` first.
3. **Spot Preference**: Default to `--prefer-spot` for cost savings. Warn about interruption risk for long training jobs.
4. **Price Awareness**: Always quote before provisioning so the user sees costs upfront.
5. **Safety**: Never auto-provision without user confirmation. Always show the plan first.
6. **Local First**: If the user has local GPU capacity, suggest using it before cloud overflow.

## Pricing Context

Typical spot GPU prices (varies in real-time):
- **H100 80GB**: $1.50–4.00/hr (RunPod/Lambda cheapest)
- **A100 80GB**: $1.00–3.00/hr
- **A10G 24GB**: $0.50–1.50/hr
- **T4 16GB**: $0.20–0.75/hr
- **RTX 4090 24GB**: $0.30–0.80/hr

Prices vary 3x across providers for identical hardware. Terradev queries all providers in parallel to find the cheapest option in real-time.

## Installation

```bash
pip install terradev-cli
# With all providers + HF Spaces:
pip install "terradev-cli[all]"
```

## Links

- GitHub: https://github.com/theoddden/Terradev
- PyPI: https://pypi.org/project/terradev-cli/
- Docs: https://theodden.github.io/Terradev/

Related Skills

tencent-cloud-pptx

from diegosouzapw/awesome-omni-skill

Create professional Tencent Cloud themed presentations from markdown content. Use when users request: (1) Creating presentations with Tencent Cloud branding, (2) Converting markdown documents to PowerPoint slides, (3) Generating slides with automatic content structuring, (4) Creating bilingual (Chinese/English) technical presentations, (5) Adding AI-generated images to presentation slides. Keywords to watch: 腾讯云, Tencent Cloud, markdown to PPT, presentation generation, slides with images.

salesforce-service-cloud-automation

from diegosouzapw/awesome-omni-skill

Automate Salesforce Service Cloud tasks via Rube MCP (Composio). Always search tools first for current schemas.

preferences-cloudflare-wrangler-reference

from diegosouzapw/awesome-omni-skill

Cloudflare wrangler comprehensive reference for Workers, D1, R2, and KV configuration. Load when working with Cloudflare deployment or wrangler.toml.

openai-cloudflare-deploy

from diegosouzapw/awesome-omni-skill

Deploy applications and infrastructure to Cloudflare using Workers, Pages, and related platform services. Use when the user asks to deploy, host, publish, or set up a project on Cloudflare. Originally from OpenAI's curated skills catalog.

multi-cloud-architecture

from diegosouzapw/awesome-omni-skill

Design multi-cloud architectures using a decision framework to select and integrate services across AWS, Azure, and GCP. Use when building multi-cloud systems, avoiding vendor lock-in, or leveragin...

jumpcloud-automation

from diegosouzapw/awesome-omni-skill

Automate Jumpcloud tasks via Rube MCP (Composio). Always search tools first for current schemas.

icims-talent-cloud-automation

from diegosouzapw/awesome-omni-skill

Automate Icims Talent Cloud tasks via Rube MCP (Composio). Always search tools first for current schemas.

hybrid-cloud-networking

from diegosouzapw/awesome-omni-skill

Configure secure, high-performance connectivity between on-premises infrastructure and cloud platforms using VPN and dedicated connections. Use when building hybrid cloud architectures, connecting ...

hybrid-cloud-architect

from diegosouzapw/awesome-omni-skill

Expert hybrid cloud architect specializing in complex multi-cloud solutions across AWS/Azure/GCP and private clouds (OpenStack/VMware).

google-cloud-vision-automation

from diegosouzapw/awesome-omni-skill

Automate Google Cloud Vision tasks via Rube MCP (Composio). Always search tools first for current schemas.

gcp-cloud

from diegosouzapw/awesome-omni-skill

Google Cloud Platform infrastructure patterns and best practices. Use when designing or implementing GCP solutions including Compute Engine, Cloud Functions, Cloud Storage, and BigQuery.

gcp-cloud-run

from diegosouzapw/awesome-omni-skill

Specialized skill for building production-ready serverless applications on GCP. Covers Cloud Run services (containerized), Cloud Run Functions (event-driven), cold start optimization, and event-dri...