vastai-migration-deep-dive

Migrate GPU workloads to or from Vast.ai, or between GPU providers. Use when switching from AWS/GCP/Azure GPU instances to Vast.ai, migrating between GPU types, or re-platforming ML infrastructure. Trigger with phrases like "migrate to vastai", "vastai migration", "switch to vastai", "vastai from aws", "vastai from lambda".

Best use case

vastai-migration-deep-dive is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using vastai-migration-deep-dive should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/vastai-migration-deep-dive/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/saas-packs/vastai-pack/skills/vastai-migration-deep-dive/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/vastai-migration-deep-dive/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How vastai-migration-deep-dive Compares

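| Feature / Agent | vastai-migration-deep-dive | Standard Approach |
|-----------------|----------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |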

SKILL.md Source

# Vast.ai Migration Deep Dive

## Current State
!`vastai --version 2>/dev/null || echo 'vastai CLI not installed'`
!`pip show vastai 2>/dev/null | grep Version || echo 'N/A'`

## Overview
Migrate GPU workloads to Vast.ai from hyperscaler providers (AWS, GCP, Azure) or other GPU clouds (Lambda, RunPod, CoreWeave). Also covers migrating between GPU types on Vast.ai and the reverse migration away from Vast.ai.

## Prerequisites
- Existing GPU workload with Docker image
- Understanding of current GPU costs and utilization
- Checkpoint-based training pipeline (for training migrations)

## Instructions

### Step 1: Cost Comparison Analysis

```python
# Compare your current GPU costs against Vast.ai marketplace prices.
# All prices below are illustrative snapshots; check current rates before deciding.
PROVIDER_COSTS = {
    "aws_p4d.24xlarge":      {"gpu": "A100 40GB", "gpus": 8, "hourly": 32.77},
    "aws_p3.2xlarge":        {"gpu": "V100 16GB", "gpus": 1, "hourly": 3.06},
    "gcp_a2-highgpu-1g":     {"gpu": "A100 40GB", "gpus": 1, "hourly": 3.67},
    "azure_NC24ads_A100_v4": {"gpu": "A100 80GB", "gpus": 1, "hourly": 3.67},
    "lambda_1xA100":         {"gpu": "A100",      "gpus": 1, "hourly": 1.25},
}

VASTAI_TYPICAL = {
    "RTX_4090":  0.20,
    "A100":      1.50,
    "H100_SXM":  3.00,
}

def savings_analysis(current_provider, current_hourly, vastai_gpu, vastai_hourly):
    monthly_current = current_hourly * 730  # hours/month
    monthly_vastai = vastai_hourly * 730
    savings = monthly_current - monthly_vastai
    pct = (savings / monthly_current) * 100
    print(f"Current ({current_provider}): ${monthly_current:,.0f}/mo")
    print(f"Vast.ai ({vastai_gpu}): ${monthly_vastai:,.0f}/mo")
    print(f"Savings: ${savings:,.0f}/mo ({pct:.0f}%)")

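savings_analysis(
    "AWS p3.2xlarge", PROVIDER_COSTS["aws_p3.2xlarge"]["hourly"],
    "RTX_4090", VASTAI_TYPICAL["RTX_4090"],
)
# Output:
# Current (AWS p3.2xlarge): $2,234/mo
# Vast.ai (RTX_4090): $146/mo
# Savings: $2,088/mo (93%)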
```
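
Instance prices are only comparable once they are normalized per GPU, because an instance such as p4d.24xlarge bundles eight A100s. The following is a minimal sketch that reuses the illustrative prices from the block above; the helper names `per_gpu_hourly` and `monthly_savings` are hypothetical, and the prices are snapshots rather than live quotes.

```python
# Illustrative snapshot prices copied from the block above; not live quotes.
PROVIDER_COSTS = {
    "aws_p4d.24xlarge":  {"gpu": "A100 40GB", "gpus": 8, "hourly": 32.77},
    "gcp_a2-highgpu-1g": {"gpu": "A100 40GB", "gpus": 1, "hourly": 3.67},
    "lambda_1xA100":     {"gpu": "A100",      "gpus": 1, "hourly": 1.25},
}
VASTAI_A100_HOURLY = 1.50  # the "A100" entry of VASTAI_TYPICAL above
HOURS_PER_MONTH = 730


def per_gpu_hourly(instance: dict) -> float:
    """Hourly price of one GPU inside a (possibly multi-GPU) instance."""
    return instance["hourly"] / instance["gpus"]


def monthly_savings(current_per_gpu: float, vastai_per_gpu: float, gpus: int = 1) -> float:
    """Monthly difference for `gpus` GPUs; a negative value means Vast.ai costs more."""
    return (current_per_gpu - vastai_per_gpu) * HOURS_PER_MONTH * gpus


for name, inst in PROVIDER_COSTS.items():
    rate = per_gpu_hourly(inst)
    delta = monthly_savings(rate, VASTAI_A100_HOURLY)
    print(f"{name}: ${rate:.2f}/GPU-hr, monthly delta vs Vast.ai A100: ${delta:,.0f}")
```

With these example numbers the Lambda A100 is already cheaper than the typical Vast.ai A100 price, so its delta is negative. A per-GPU comparison surfaces that case, whereas a headline instance price would hide it.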

### Step 2: Docker Image Migration

```bash
# Most Docker images work unchanged on Vast.ai
# Key differences:
# - Vast.ai instances run as root
# - /workspace is the default working directory
# - SSH access (not IAM roles) for authentication

# Adapt your existing Dockerfile
cat << 'DOCKERFILE' > Dockerfile.vastai
FROM your-existing-image:latest

# Vast.ai instances use /workspace by default
WORKDIR /workspace

# Install cloud storage tooling: boto3 for in-code checkpoint uploads,
# and the AWS CLI used by the validation script in Step 4
RUN pip install boto3 awscli

# Copy training code
COPY src/ /workspace/src/
COPY configs/ /workspace/configs/

CMD ["python", "src/train.py"]
DOCKERFILE

docker build -t ghcr.io/org/training:vastai -f Dockerfile.vastai .
docker push ghcr.io/org/training:vastai
```

### Step 3: Adapt Cloud Storage Credentials

```bash
# On AWS/GCP, IAM roles provide credentials automatically.
# On Vast.ai, pass credentials explicitly as environment variables.

# Create the instance with env vars for cloud storage access.
# --env takes Docker-style options, so each variable needs its own -e flag.
vastai create instance "$OFFER_ID" \
  --image ghcr.io/org/training:vastai \
  --disk 100 \
  --env "-e AWS_ACCESS_KEY_ID=AKIA... -e AWS_SECRET_ACCESS_KEY=... -e AWS_DEFAULT_REGION=us-east-1"
```
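
Hardcoding keys in a command line is easy to leak through shell history. One way to avoid that is to build the `--env` value from variables already set in the local environment. The sketch below assumes `--env` accepts Docker-style `-e KEY=VALUE` options (check `vastai create instance --help` for your CLI version); `build_env_flag` and `create_instance_command` are hypothetical helper names.

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def build_env_flag(environ=os.environ, names=REQUIRED_VARS) -> str:
    """Return a '-e KEY=VALUE ...' string for `vastai create instance --env`.

    Assumption: --env accepts Docker-style options. Raises KeyError when a
    variable is unset and ValueError when a value contains whitespace, since
    whitespace would split the option string.
    """
    missing = [n for n in names if not environ.get(n)]
    if missing:
        raise KeyError(f"missing credentials: {', '.join(missing)}")
    for n in names:
        if any(ch.isspace() for ch in environ[n]):
            raise ValueError(f"{n} contains whitespace")
    return " ".join(f"-e {n}={environ[n]}" for n in names)


def create_instance_command(offer_id, image: str, disk_gb: int, env_flag: str) -> list:
    """Argument list suitable for subprocess.run (no shell involved)."""
    return ["vastai", "create", "instance", str(offer_id),
            "--image", image, "--disk", str(disk_gb), "--env", env_flag]


# Demo with placeholder values; real runs would call build_env_flag() with no arguments.
demo_env = {
    "AWS_ACCESS_KEY_ID": "EXAMPLEKEY",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
    "AWS_DEFAULT_REGION": "us-east-1",
}
print(create_instance_command("12345", "ghcr.io/org/training:vastai", 100,
                              build_env_flag(demo_env)))
```

Returning an argument list rather than a shell string keeps the credentials out of shell parsing when the command is passed to `subprocess.run`.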

### Step 4: Migration Validation

```bash
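#!/bin/bash
set -euo pipefail
echo "Migration Validation Checklist"

# 1. Docker image runs on Vast.ai
#    (add the --env credentials from Step 3 so the S3 checks can authenticate)
vastai create instance "${OFFER_ID:?export OFFER_ID before running}" --image ghcr.io/org/training:vastai --disk 50

# Wait for the instance to reach "running", then enter the values from `vastai show instances`
read -r -p "Instance ID, SSH host, SSH port: " INSTANCE_ID HOST PORT
# Destroy the instance on exit so a failed check does not leave it billing
trap 'vastai destroy instance "$INSTANCE_ID"' EXIT

# 2. GPU access works (the assert fails the check if CUDA is unavailable)
ssh -p "$PORT" "root@$HOST" "nvidia-smi && python -c 'import torch; assert torch.cuda.is_available()'"

# 3. Cloud storage works
ssh -p "$PORT" "root@$HOST" "aws s3 ls s3://your-bucket/ | head -5"

# 4. Training runs and saves checkpoints
ssh -p "$PORT" "root@$HOST" "cd /workspace && python src/train.py --epochs 1 --checkpoint-dir /workspace/ckpt"

# 5. Checkpoints uploaded to cloud storage
ssh -p "$PORT" "root@$HOST" "aws s3 sync /workspace/ckpt/ s3://your-bucket/ckpt/"

# 6. Clean up runs via the EXIT trap above
echo "Migration validation complete"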
```
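
The "wait for running" step in the checklist can be automated with a small polling loop. The sketch below assumes that `vastai show instance ID --raw` prints JSON containing an `actual_status` field; verify both against your CLI version. `fetch_status` and `wait_until_running` are hypothetical helper names.

```python
import json
import subprocess
import time


def fetch_status(instance_id) -> str:
    """Read the instance status from the CLI.

    Assumptions: `vastai show instance ID --raw` prints a JSON object and the
    status lives under `actual_status`.
    """
    out = subprocess.run(
        ["vastai", "show", "instance", str(instance_id), "--raw"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out).get("actual_status") or "unknown"


def wait_until_running(instance_id, fetch=fetch_status, timeout_s=600,
                       poll_s=15, sleep=time.sleep) -> str:
    """Poll until the status is 'running'; raise TimeoutError after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch(instance_id)
        if status == "running":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"instance {instance_id} still '{status}' after {timeout_s}s")
        sleep(poll_s)
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised without a live instance, for example `wait_until_running("1", fetch=lambda _id: "running")`.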

### Step 5: Rollback Plan

```markdown
## Rollback Procedure
1. Confirm the latest checkpoint is in cloud storage, then destroy all Vast.ai instances: `vastai show instances` → `vastai destroy instance ID`
2. Re-provision on original cloud provider
3. Resume training from cloud-stored checkpoint
4. Vast.ai Docker image remains available for future retry
```
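
Resuming from a cloud-stored checkpoint requires picking the most recent one. A minimal sketch, assuming checkpoints are named `step_<N>.pt` (a hypothetical naming scheme; adapt the pattern to your pipeline) and that the object keys have already been listed, for example with `aws s3 ls`:

```python
import re

# Hypothetical naming scheme: ckpt/step_<N>.pt
CKPT_PATTERN = re.compile(r"step_(\d+)\.pt$")


def latest_checkpoint(keys):
    """Return the key with the highest step number, or None if none match."""
    best_key, best_step = None, -1
    for key in keys:
        match = CKPT_PATTERN.search(key)
        if match and int(match.group(1)) > best_step:
            best_key, best_step = key, int(match.group(1))
    return best_key


keys = ["ckpt/step_500.pt", "ckpt/step_1500.pt", "ckpt/config.json", "ckpt/step_1000.pt"]
print(latest_checkpoint(keys))
```

The step number is compared as an integer because a plain string sort would rank `step_500.pt` after `step_1500.pt`.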

## Migration Comparison

| Factor | AWS/GCP/Azure | Vast.ai |
|--------|--------------|---------|
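| Pricing | Fixed list prices, premium | Variable marketplace prices, often far lower (93% in the Step 1 example) |
| GPU availability | On-demand, subject to quota and regional capacity | Marketplace (offers may sell out) |
| SLA | Published uptime SLA | No SLA; interruptible instances can be preempted |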
| IAM roles | Native | Manual credential passing |
| Networking | VPC, private subnets | Public SSH only |
| Storage | EBS/PD attached | Local disk + cloud storage |
| Support | Enterprise support | Community/email |

## Output
- Cost savings analysis comparing providers
- Adapted Docker image for Vast.ai
- Cloud credential migration pattern
- Validation script for migration testing
- Rollback procedure

## Error Handling
| Error | Cause | Solution |
|-------|-------|----------|
| Docker image incompatible | Relies on IAM roles or cloud-specific APIs | Pass credentials via env vars |
| CUDA version mismatch | Different CUDA on Vast.ai hosts | Filter by `cuda_max_good` in search |
| Data transfer too slow | Large dataset over public internet | Stage data in cloud storage, download on instance |
| No matching offers | Specific GPU unavailable | Try alternative GPU type or wait for availability |
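
The last two remedies can be combined: filter offers by `cuda_max_good` and fall back to an alternative GPU type when nothing matches. This is a sketch under assumptions: that `vastai search offers QUERY --raw` prints a JSON list, and that `gpu_name` and `num_gpus` are valid query fields alongside `cuda_max_good` (confirm with `vastai search offers --help`). The helper names are hypothetical.

```python
import json
import subprocess


def build_query(gpu_name: str, min_cuda: float, num_gpus: int = 1) -> str:
    """Query string for `vastai search offers`; field names are assumptions to verify."""
    return f"gpu_name={gpu_name} num_gpus={num_gpus} cuda_max_good>={min_cuda}"


def search_offers(query: str) -> list:
    """Assumption: `--raw` makes the CLI print a JSON list of offers."""
    out = subprocess.run(
        ["vastai", "search", "offers", query, "--raw"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def first_available(gpu_preferences, min_cuda, search=search_offers):
    """Try GPU types in order of preference.

    Returns (gpu_name, offers) for the first type with at least one offer,
    or None when every type comes back empty.
    """
    for gpu in gpu_preferences:
        offers = search(build_query(gpu, min_cuda))
        if offers:
            return gpu, offers
    return None
```

A call such as `first_available(["H100_SXM", "A100", "RTX_4090"], 12.1)` would then return the first GPU type that currently has compatible offers; the `search` parameter can be replaced with a stub for testing.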

## Resources
- [Vast.ai vs AWS](https://vast.ai/)
- [Vast.ai CLI](https://docs.vast.ai/cli/get-started)
- [Docker Migration](https://docs.docker.com/get-started/)

## Next Steps
Review `vastai-reference-architecture` for best-practice project structure.

## Examples

**AWS to Vast.ai**: Replace p3.2xlarge ($3.06/hr) with RTX 4090 ($0.20/hr) for a 93% cost reduction at the example prices in Step 1. Pass AWS credentials as environment variables at instance creation (Step 3) for S3 checkpoint access.

**Hybrid approach**: Use Vast.ai for experimentation and hyperparameter search (cheap GPUs), then run final training on AWS for SLA guarantees.
