vastai-webhooks-events

Build event-driven workflows around Vast.ai instance lifecycle events. Use when monitoring instance status changes, implementing auto-recovery, or building event-driven GPU orchestration. Trigger with phrases like "vastai events", "vastai instance monitoring", "vastai status changes", "vastai lifecycle events".

Best use case

vastai-webhooks-events is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using vastai-webhooks-events should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/vastai-webhooks-events/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/saas-packs/vastai-pack/skills/vastai-webhooks-events/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/vastai-webhooks-events/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How vastai-webhooks-events Compares

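| Feature / Agent | vastai-webhooks-events | Standard Approach |
|-----------------|------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |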

SKILL.md Source

# Vast.ai Webhooks & Events

## Overview
Build event-driven workflows around the Vast.ai GPU instance lifecycle. Vast.ai does not provide traditional webhooks, so event detection relies on polling the REST API at `cloud.vast.ai/api/v0` and reacting to instance status transitions (loading, running, exited, error, offline). The examples in this skill poll through the `vastai` CLI rather than calling the REST API directly.
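
The core of polling-based event detection is comparing two status snapshots. The sketch below is a minimal illustration of that idea, not part of the Vast.ai CLI or API: `diff_states` is a hypothetical helper, and the `"gone"` status is a made-up marker for instances that disappear from the listing between polls.

```python
from typing import Dict, List, Tuple


def diff_states(previous: Dict[int, str],
                current: Dict[int, str]) -> List[Tuple[int, str, str]]:
    """Return (instance_id, old_status, new_status) for every change."""
    transitions = []
    for inst_id, status in current.items():
        old = previous.get(inst_id)
        # A first sighting (old is None) is not a transition.
        if old is not None and old != status:
            transitions.append((inst_id, old, status))
    # Instances missing from the new snapshot (for example, destroyed ones)
    # are reported with the hypothetical status "gone".
    for inst_id, old in previous.items():
        if inst_id not in current:
            transitions.append((inst_id, old, "gone"))
    return transitions


before = {101: "loading", 102: "running", 103: "running"}
after = {101: "running", 102: "running"}
changes = diff_states(before, after)
```

Keeping the comparison in a pure function makes it easy to unit test without touching the network or the CLI.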

## Prerequisites
- Vast.ai CLI (`vastai`) installed and authenticated
- Understanding of instance lifecycle states
- Python 3.8+ for event loop implementation

## Instructions

### Step 1: Instance Status Poller

```python
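import json
import subprocess
import time
from typing import Callable, Dict, List


class InstanceEventPoller:
    """Poll Vast.ai through the CLI and emit events on status transitions."""

    def __init__(self, api_key: str, poll_interval: int = 30):
        # The CLI call below relies on the CLI's own authentication (see
        # Prerequisites); api_key is stored but not used by this class.
        self.api_key = api_key
        self.poll_interval = poll_interval
        self.previous_states: Dict[int, str] = {}
        self.handlers: Dict[str, List[Callable]] = {}

    def on(self, event: str, handler: Callable):
        """Register a handler for "<old>_to_<new>" or "any_change"."""
        self.handlers.setdefault(event, []).append(handler)

    def poll_once(self):
        # check=True raises on a non-zero exit instead of parsing bad output.
        result = subprocess.run(
            ["vastai", "show", "instances", "--raw"],
            capture_output=True, text=True, check=True, timeout=60)
        instances = json.loads(result.stdout)

        for inst in instances:
            inst_id = inst["id"]
            status = inst.get("actual_status", "unknown")
            prev = self.previous_states.get(inst_id)

            # The first sighting of an instance only records its state.
            if prev and prev != status:
                event = f"{prev}_to_{status}"
                # Transition handlers receive the instance only.
                for handler in self.handlers.get(event, []):
                    handler(inst)
                # "any_change" handlers also receive old and new status.
                for handler in self.handlers.get("any_change", []):
                    handler(inst, prev, status)

            self.previous_states[inst_id] = status

    def run(self):
        print(f"Polling every {self.poll_interval}s...")
        while True:
            try:
                self.poll_once()
            except (subprocess.SubprocessError, json.JSONDecodeError) as exc:
                # Keep polling after a transient CLI or network failure.
                print(f"Poll failed: {exc}")
            time.sleep(self.poll_interval)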
```

### Step 2: Event Handlers

```python
import os


def on_instance_running(instance):
    print(f"Instance {instance['id']} is RUNNING")
    # .get() avoids a KeyError if SSH details are not populated yet.
    print(f"  SSH: ssh -p {instance.get('ssh_port')} root@{instance.get('ssh_host')}")
    # Trigger: start training job, send notification, etc.

def on_instance_exited(instance):
    print(f"Instance {instance['id']} EXITED")
    # Trigger: collect results, check for errors, notify team

def on_spot_preemption(instance, old_status, new_status):
    if old_status == "running" and new_status in ("exited", "offline"):
        print(f"ALERT: Instance {instance['id']} may have been preempted")
        # Trigger: auto-recovery, provision replacement

# Wire up handlers. Assumption: the API key is exported as VAST_API_KEY.
api_key = os.environ.get("VAST_API_KEY", "")
poller = InstanceEventPoller(api_key)
poller.on("loading_to_running", on_instance_running)
poller.on("running_to_exited", on_instance_exited)
poller.on("any_change", on_spot_preemption)
poller.run()  # Blocks forever; register all handlers before calling it.
```

### Step 3: Auto-Recovery on Preemption

```python
import json
import subprocess


def auto_recover(instance, old_status, new_status):
    """Automatically replace preempted instances."""
    if old_status != "running" or new_status not in ("exited", "offline", "error"):
        return

    # Search filters expect underscores instead of spaces (RTX_4090, not "RTX 4090").
    gpu_name = (instance.get("gpu_name") or "RTX_4090").replace(" ", "_")
    image = instance.get("image_uuid") or "pytorch/pytorch:latest"

    print(f"Auto-recovering {instance['id']} ({gpu_name})...")

    # Search for the cheapest reliable replacement offers
    offers = json.loads(subprocess.run(
        ["vastai", "search", "offers",
         f"gpu_name={gpu_name} reliability>0.98 rentable=true",
         "--order", "dph_total", "--raw", "--limit", "3"],
        capture_output=True, text=True, check=True).stdout)

    if not offers:
        print(f"No replacement offers found for {gpu_name}")
        return

    # Rent the first (cheapest) offer and read the new contract ID
    new_id = json.loads(subprocess.run(
        ["vastai", "create", "instance", str(offers[0]["id"]),
         "--image", image, "--disk", "50", "--raw"],
        capture_output=True, text=True, check=True).stdout)["new_contract"]
    print(f"Replacement instance: {new_id}")
```
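
To keep recovery from landing on the machine that just failed, the offer list can be filtered before renting. The following is a rough sketch under assumptions: `pick_offer` is a hypothetical helper, and it assumes each offer dict carries `machine_id` and `dph_total` keys, which should be checked against the actual `--raw` output.

```python
from typing import Dict, Iterable, Optional, Set


def pick_offer(offers: Iterable[Dict], failed_machines: Set[int]) -> Optional[Dict]:
    """Return the cheapest offer whose machine is not known to have failed."""
    # Assumption: offers expose "machine_id"; adjust the key if yours differ.
    candidates = [o for o in offers if o.get("machine_id") not in failed_machines]
    # Offers without a price sort last; None is returned when nothing is left.
    return min(candidates,
               key=lambda o: o.get("dph_total", float("inf")),
               default=None)


sample_offers = [
    {"id": 1, "machine_id": 500, "dph_total": 0.30},
    {"id": 2, "machine_id": 501, "dph_total": 0.35},
    {"id": 3, "machine_id": 502, "dph_total": 0.32},
]
chosen = pick_offer(sample_offers, failed_machines={500})
```

A caller would add the failed instance's machine to `failed_machines` before searching, then pass `chosen["id"]` to `vastai create instance` when `chosen` is not `None`.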

### Step 4: Cost Event Tracking

```python
def track_costs(instance, old_status, new_status):
    """Log cost events for billing tracking."""
    if new_status == "running":
        print(f"BILLING START: Instance {instance['id']} "
              f"at ${instance.get('dph_total', 0):.3f}/hr")
    elif old_status == "running":
        print(f"BILLING STOP: Instance {instance['id']}")
```
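
Beyond logging start and stop events, the same handler signature can accumulate an estimated spend. This is a sketch, not a billing record: `CostTracker` is a hypothetical class, it counts only time spent in the `running` state, and it assumes `dph_total` is the hourly rate in dollars as used in the logger above.

```python
import time
from typing import Callable, Dict


class CostTracker:
    """Estimate spend from running-time intervals (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self.clock = clock                  # injectable for testing
        self.started: Dict[int, float] = {}  # instance id -> start timestamp
        self.rates: Dict[int, float] = {}    # instance id -> $/hour at start
        self.total_usd = 0.0

    def handle(self, instance, old_status, new_status):
        """Usable as an "any_change" handler."""
        inst_id = instance["id"]
        if new_status == "running":
            self.started[inst_id] = self.clock()
            self.rates[inst_id] = float(instance.get("dph_total") or 0.0)
        elif old_status == "running" and inst_id in self.started:
            hours = (self.clock() - self.started.pop(inst_id)) / 3600
            self.total_usd += hours * self.rates.pop(inst_id)


# Demo with a fake clock: two hours at $0.50/hr.
_ticks = iter([0.0, 7200.0])
tracker = CostTracker(clock=lambda: next(_ticks))
tracker.handle({"id": 1, "dph_total": 0.5}, "loading", "running")
tracker.handle({"id": 1}, "running", "exited")
```

Because the estimate is bounded by the poll interval, it can be off by up to one interval at each end of a run.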

## Output
- Polling-based event detection for instance status changes
- Event handlers for running, exited, preempted states
- Auto-recovery on spot preemption
- Cost tracking event logger

## Error Handling
| Error | Cause | Solution |
|-------|-------|----------|
| Missed status transition | Poll interval too long | Reduce to 15-30s for critical instances |
| False preemption alert | Instance restarted intentionally | Track expected state changes |
| Auto-recovery loops | Same host keeps failing | Exclude failed host IDs from search |
| API timeout during poll | Network or rate limiting | Retry with backoff; continue polling |
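
The "retry with backoff" row can be sketched as a small wrapper. This is one possible approach with assumed defaults (four retries, delays doubling from 1 second up to a 30 second cap); `with_backoff` is a hypothetical helper, not part of any Vast.ai tooling.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(fn: Callable[[], T],
                 retries: int = 4,
                 base_delay: float = 1.0,
                 max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on any exception with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            # Out of retries: surface the last error to the caller.
            if attempt == retries:
                raise
            # Delays grow 1s, 2s, 4s, ... and are capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise RuntimeError("unreachable")  # keeps type checkers satisfied
```

Wrapping only the function that fetches the instance list keeps handler errors from being retried along with network errors.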

## Resources
- [Vast.ai REST API](https://vast.ai/developers/api)
- [Instance Management](https://docs.vast.ai/api-reference/instances/create-instance)

## Next Steps
For performance optimization, see `vastai-performance-tuning`.

## Examples

**Slack notifications**: Wire `on_instance_running` to send a Slack message with SSH connection details. Wire `on_spot_preemption` to alert the team.
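
A minimal sketch of the Slack wiring, assuming a Slack incoming webhook URL that you supply yourself. `build_slack_message` and `send_slack` are hypothetical helper names; the `ssh_host` and `ssh_port` keys follow the handler in Step 2.

```python
import json
import urllib.request


def build_slack_message(instance: dict) -> dict:
    """Build an incoming-webhook payload with SSH connection details."""
    host = instance.get("ssh_host", "?")
    port = instance.get("ssh_port", "?")
    return {"text": f"Instance {instance['id']} is running: "
                    f"ssh -p {port} root@{host}"}


def send_slack(webhook_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

A handler such as `lambda inst: send_slack(url, build_slack_message(inst))` could then be registered for `loading_to_running`, where `url` is your own webhook URL.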

**Training monitor**: Track `running_to_exited` events. If exit was expected (job complete), collect results. If unexpected, trigger auto-recovery with checkpoint resume.
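
One way to tell expected exits from unexpected ones is to record which instances your own code is about to stop. The sketch below assumes that approach; `ExitClassifier` is a hypothetical class, and the status names match those used in the steps above.

```python
class ExitClassifier:
    """Classify exits as expected or unexpected (hypothetical helper)."""

    def __init__(self):
        self.expected = set()  # instance IDs we plan to stop ourselves

    def mark_expected(self, inst_id: int) -> None:
        """Call this just before intentionally stopping an instance."""
        self.expected.add(inst_id)

    def classify(self, instance: dict, old_status: str, new_status: str) -> str:
        """Return "ignore", "expected", or "unexpected"."""
        if old_status != "running" or new_status not in ("exited", "offline", "error"):
            return "ignore"
        if instance["id"] in self.expected:
            # Consume the marker so a later crash is not masked.
            self.expected.discard(instance["id"])
            return "expected"
        return "unexpected"


classifier = ExitClassifier()
classifier.mark_expected(42)
first = classifier.classify({"id": 42}, "running", "exited")
second = classifier.classify({"id": 42}, "running", "exited")
```

An `any_change` handler can call `classify` and collect results on `"expected"` or trigger recovery on `"unexpected"`, which also addresses the false preemption alerts listed under Error Handling.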

Related Skills

workhuman-webhooks-events

1868
from jeremylongshore/claude-code-plugins-plus-skills

Workhuman webhooks events for employee recognition and rewards API. Use when integrating Workhuman Social Recognition, or building recognition workflows with HRIS systems. Trigger: "workhuman webhooks events".

wispr-webhooks-events

1868
from jeremylongshore/claude-code-plugins-plus-skills

Wispr Flow webhooks events for voice-to-text API integration. Use when integrating Wispr Flow dictation, WebSocket streaming, or building voice-powered applications. Trigger: "wispr webhooks events".

windsurf-webhooks-events

1868
from jeremylongshore/claude-code-plugins-plus-skills

Build Windsurf extensions and integrate with VS Code extension API events. Use when building custom Windsurf extensions, tracking editor events, or integrating Windsurf with external tools via extension development. Trigger with phrases like "windsurf extension", "windsurf events", "windsurf plugin", "build windsurf extension", "windsurf API".

webflow-webhooks-events

1868
from jeremylongshore/claude-code-plugins-plus-skills

Implement Webflow webhook registration, signature verification, and event handling for form_submission, site_publish, ecomm_new_order, page_created, and more. Use when setting up webhook endpoints, implementing event-driven workflows, or handling Webflow notifications. Trigger with phrases like "webflow webhook", "webflow events", "webflow webhook signature", "handle webflow events", "webflow notifications".

vercel-webhooks-events

1868
from jeremylongshore/claude-code-plugins-plus-skills

Implement Vercel webhook handling with signature verification and event processing. Use when setting up webhook endpoints, processing deployment events, or building integrations that react to Vercel deployment lifecycle. Trigger with phrases like "vercel webhook", "vercel events", "vercel deployment.ready", "handle vercel events", "vercel webhook signature".

veeva-webhooks-events

1868
from jeremylongshore/claude-code-plugins-plus-skills

Veeva Vault webhooks events for REST API and clinical operations. Use when working with Veeva Vault document management and CRM. Trigger: "veeva webhooks events".

vastai-upgrade-migration

1868
from jeremylongshore/claude-code-plugins-plus-skills

Upgrade Vast.ai CLI, migrate API versions, and handle breaking changes. Use when upgrading vastai CLI, detecting deprecations, or migrating between API versions. Trigger with phrases like "upgrade vastai", "vastai migration", "vastai breaking changes", "update vastai CLI".

vastai-security-basics

1868
from jeremylongshore/claude-code-plugins-plus-skills

Apply Vast.ai security best practices for API keys and instance access. Use when securing API keys, hardening SSH access to GPU instances, or auditing Vast.ai security configuration. Trigger with phrases like "vastai security", "vastai secrets", "secure vastai", "vastai API key security", "vastai ssh security".

vastai-sdk-patterns

1868
from jeremylongshore/claude-code-plugins-plus-skills

Apply production-ready Vast.ai SDK patterns for Python and REST API. Use when implementing Vast.ai integrations, refactoring SDK usage, or establishing coding standards for GPU cloud operations. Trigger with phrases like "vastai SDK patterns", "vastai best practices", "vastai code patterns", "idiomatic vastai".

vastai-reference-architecture

1868
from jeremylongshore/claude-code-plugins-plus-skills

Implement Vast.ai reference architecture for GPU compute workflows. Use when designing ML training pipelines, structuring GPU orchestration, or establishing architecture patterns for Vast.ai applications. Trigger with phrases like "vastai architecture", "vastai design pattern", "vastai project structure", "vastai ml pipeline".

vastai-rate-limits

1868
from jeremylongshore/claude-code-plugins-plus-skills

Handle Vast.ai API rate limits with backoff and request optimization. Use when encountering 429 errors, implementing retry logic, or optimizing API request throughput. Trigger with phrases like "vastai rate limit", "vastai throttling", "vastai 429", "vastai retry", "vastai backoff".

vastai-prod-checklist

1868
from jeremylongshore/claude-code-plugins-plus-skills

Execute Vast.ai production deployment checklist for GPU workloads. Use when deploying training pipelines to production, preparing for large-scale GPU jobs, or auditing production readiness. Trigger with phrases like "vastai production", "deploy vastai", "vastai go-live", "vastai launch checklist".