vercel-load-scale

Load test and scale Vercel deployments with concurrency tuning and capacity planning. Use when running performance tests, planning for traffic spikes, or optimizing serverless function scaling on Vercel. Trigger with phrases like "vercel load test", "vercel scale", "vercel performance test", "vercel capacity", "vercel benchmark".

1,868 stars

by jeremylongshore

Best use case

vercel-load-scale is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using vercel-load-scale can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/vercel-load-scale/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/saas-packs/vercel-pack/skills/vercel-load-scale/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/vercel-load-scale/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How vercel-load-scale Compares

Feature / Agent	vercel-load-scale	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

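What does this skill do?

It load tests and scales Vercel deployments with concurrency tuning and capacity planning, covering performance tests, traffic-spike planning, and serverless function scaling.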

Where can I find the source code?

The source code is on GitHub in the jeremylongshore/claude-code-plugins-plus-skills repository, under plugins/saas-packs/vercel-pack/skills/vercel-load-scale.

SKILL.md Source

# Vercel Load & Scale

## Overview
Load test Vercel deployments to identify scaling limits, cold start impact, and concurrency thresholds. Covers k6/autocannon test scripts, Vercel's auto-scaling model, Fluid Compute concurrency, and capacity planning.

## Prerequisites
- Load testing tool: k6, autocannon, or artillery
- Test environment deployment (never load test production without approval)
- Access to Vercel Analytics for monitoring during tests

## Instructions

### Step 1: Understand Vercel's Scaling Model
Vercel serverless functions scale automatically:

| Behavior | Details |
|----------|---------|
| Scale-up | New function instances spawn on demand |
| Scale-down | Idle instances shut down after ~15 minutes |
| Cold starts | First request to a new instance pays initialization cost |
| Concurrency | Each instance handles one request at a time (by default) |
| Fluid Compute | Pro/Enterprise: multiple requests per instance |

**Concurrency limits by plan:**

| Plan | Max Concurrent Functions |
|------|------------------------|
| Hobby | 10 |
| Pro | 1,000 |
| Enterprise | 100,000 |

### Step 2: Basic Load Test with autocannon
```bash
# Install autocannon
npm install -g autocannon

# Test with 50 concurrent connections for 30 seconds
autocannon -c 50 -d 30 https://my-app-preview.vercel.app/api/endpoint

# Output includes:
# Latency: avg, p50, p99, max
# Requests/sec: avg, min, max
# Errors: timeouts, non-2xx responses
```
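The p50 and p99 figures in that output are percentiles over per-request latencies. As a rough illustration of what they mean (this is not autocannon's own implementation), the nearest-rank method can be sketched in plain JavaScript. The function name `percentile` and the sample latencies are hypothetical.

```javascript
// Nearest-rank percentile over raw latency samples (milliseconds).
// Illustrative only: `percentile` and the samples below are hypothetical.
function percentile(samples, p) {
  if (samples.length === 0) {
    throw new Error('no samples');
  }
  if (p <= 0 || p > 100) {
    throw new RangeError('p must be in (0, 100]');
  }
  // Sort a copy numerically; the default sort compares values as strings.
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest rank: the smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Nine fast responses and one slow one (for example, a cold start).
const latenciesMs = [42, 45, 47, 51, 48, 44, 650, 46, 43, 49];
console.log('p50:', percentile(latenciesMs, 50)); // p50: 46
console.log('p99:', percentile(latenciesMs, 99)); // p99: 650
```

A single slow request leaves p50 at 46 ms but pulls p99 up to 650 ms, which is why a high P99 next to a low P50 often points at cold starts rather than at the typical request.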

### Step 3: k6 Load Test Script
```javascript
// load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';

const errorRate = new Rate('errors');
const coldStartRate = new Rate('cold_starts');
const latency = new Trend('api_latency');

export const options = {
  stages: [
    { duration: '1m', target: 10 },   // Warm up
    { duration: '3m', target: 50 },   // Ramp to 50 users
    { duration: '2m', target: 100 },  // Peak load
    { duration: '1m', target: 0 },    // Cool down
  ],
  thresholds: {
    http_req_duration: ['p(95)<2000'],  // P95 < 2s
    errors: ['rate<0.01'],              // Error rate < 1%
  },
};

export default function () {
  const res = http.get('https://my-app-preview.vercel.app/api/endpoint');

  check(res, {
    'status is 200': (r) => r.status === 200,
    'latency < 2s': (r) => r.timings.duration < 2000,
  });

  errorRate.add(res.status !== 200);
  latency.add(res.timings.duration);

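  // Track cold starts if your API returns this header.
  // A Rate needs a sample for every request, not only the cold ones;
  // otherwise the reported rate is always 100%.
  coldStartRate.add(res.headers['X-Cold-Start'] === 'true');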

  sleep(1);
}
```
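The script can only count cold starts if the endpoint reports them. One common approach, assumed here rather than taken from the source, is a module-scoped flag: module scope is evaluated once per function instance, so only the first request on an instance sees the flag set. The sketch below uses a plain Node-style `(req, res)` handler and a stand-in response object so it runs anywhere; `handler`, `fakeResponse`, and the file location mentioned in the comment are hypothetical.

```javascript
// Module scope is evaluated once per function instance, so this flag is
// true only for the first request an instance serves (the cold start).
let isColdStart = true;

// Hypothetical Node-style handler; in a real project it would be the
// default export of a file such as api/endpoint.js (an assumption).
function handler(req, res) {
  res.setHeader('X-Cold-Start', String(isColdStart));
  isColdStart = false; // every later request on this instance is warm
  res.statusCode = 200;
  res.end(JSON.stringify({ ok: true }));
}

// Minimal stand-in for the response object, for local experimentation.
function fakeResponse() {
  const headers = {};
  return {
    headers,
    statusCode: 0,
    body: '',
    setHeader(name, value) { headers[name] = value; },
    end(payload) { this.body = payload; },
  };
}

const first = fakeResponse();
const second = fakeResponse();
handler({}, first);
handler({}, second);
console.log(first.headers['X-Cold-Start'], second.headers['X-Cold-Start']); // true false
```

The header value is a string, which matches the `=== 'true'` comparison in the k6 script.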

```bash
# Run the load test
k6 run load-test.js

# Run with output to JSON for analysis
k6 run --out json=results.json load-test.js
```
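With `--out json`, k6 writes one JSON object per line, and individual samples have `type` set to `"Point"`. A minimal sketch of pulling one metric out of that file for later analysis follows. The helper name `collectMetric` is hypothetical, and the sample lines are hand-written in the shape k6 emits, with made-up values.

```javascript
// Collect the values of one metric from k6's line-delimited JSON output.
// Each line is a standalone JSON object; samples have type "Point".
function collectMetric(ndjsonText, metricName) {
  const values = [];
  for (const line of ndjsonText.split('\n')) {
    if (line.trim() === '') continue; // skip blank lines
    const entry = JSON.parse(line);
    if (entry.type === 'Point' && entry.metric === metricName) {
      values.push(entry.data.value);
    }
  }
  return values;
}

// Hand-written sample lines (values are made up for illustration).
const sampleOutput = [
  '{"type":"Metric","metric":"http_req_duration","data":{"type":"trend"}}',
  '{"type":"Point","metric":"http_req_duration","data":{"time":"2024-01-01T00:00:00Z","value":45.2,"tags":{"status":"200"}}}',
  '{"type":"Point","metric":"http_req_duration","data":{"time":"2024-01-01T00:00:01Z","value":320.5,"tags":{"status":"200"}}}',
  '{"type":"Point","metric":"errors","data":{"time":"2024-01-01T00:00:01Z","value":0,"tags":{}}}',
].join('\n');

const durations = collectMetric(sampleOutput, 'http_req_duration');
console.log(durations.length, Math.max(...durations)); // 2 320.5
```

For a real run, the text would come from `fs.readFileSync('results.json', 'utf8')`. Long tests produce large files, so a streaming reader may be a better fit there.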

### Step 4: Cold Start Stress Test
```javascript
// cold-start-test.js — specifically test cold start behavior
import http from 'k6/http';

export const options = {
  scenarios: {
    // Scenario 1: Sustained load (warm instances)
    sustained: {
      executor: 'constant-arrival-rate',
      rate: 10,
      timeUnit: '1s',
      duration: '2m',
      preAllocatedVUs: 20,
    },
    // Scenario 2: Spike (forces new cold starts)
    spike: {
      executor: 'ramping-arrival-rate',
      startRate: 10,
      timeUnit: '1s',
      stages: [
        { target: 200, duration: '10s' },  // Sudden spike
        { target: 10, duration: '1m' },     // Return to normal
      ],
      preAllocatedVUs: 300,
      startTime: '2m',  // Start after sustained phase
    },
  },
};

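export default function () {
  const res = http.get('https://my-app-preview.vercel.app/api/endpoint');
  // Log cold start timing for analysis (relies on the X-Cold-Start header from Step 3)
  if (res.headers['X-Cold-Start'] === 'true') {
    console.log(`cold start: ${res.timings.duration}ms`);
  }
}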
```
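Arrival-rate executors can only hold their target rate if enough VUs are pre-allocated; when they run short, k6 records the shortfall in its `dropped_iterations` metric. Each VU runs one iteration at a time, so a rough sizing rule is rate multiplied by expected iteration duration. The sketch below applies that rule; `estimateVUs`, the headroom factor, and the one-second response time are illustrative assumptions, not values from the source.

```javascript
// Rough VU sizing for k6 arrival-rate executors: each VU runs one iteration
// at a time, so sustaining `ratePerSecond` needs about rate * duration VUs.
// `estimateVUs` and the default headroom factor are illustrative assumptions.
function estimateVUs(ratePerSecond, iterationSeconds, headroom = 1.5) {
  if (ratePerSecond < 0 || iterationSeconds < 0 || headroom < 1) {
    throw new RangeError('invalid input');
  }
  return Math.ceil(ratePerSecond * iterationSeconds * headroom);
}

// Spike scenario above: 200 iterations/s, assuming responses take up to 1s
// while cold starts are happening (an assumed figure, not a measurement).
console.log(estimateVUs(200, 1)); // 300

// Sustained scenario: 10 iterations/s under the same assumption.
console.log(estimateVUs(10, 1)); // 15
```

Under these assumptions the estimates land at or below the `preAllocatedVUs` values used in the script (300 and 20).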

### Step 5: Fluid Compute Concurrency Tuning
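In `vercel.json`, configure concurrency for Fluid Compute (Pro/Enterprise):

```json
{
  "functions": {
    "api/high-throughput.ts": {
      "memory": 1024,
      "maxDuration": 30,
      "concurrency": 10
    }
  }
}
```

Check the `concurrency` key against the current `vercel.json` reference before deploying; Vercel also documents a top-level `"fluid": true` setting for enabling Fluid Compute.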

With Fluid Compute concurrency, a single function instance handles multiple requests:
- Reduces cold starts (fewer instances needed)
- Reduces cost (shared memory across requests)
- Best for I/O-bound functions (waiting on DB/API calls)
- Not ideal for CPU-bound functions (computation blocks other requests)

### Step 6: Capacity Planning
```
Capacity Planning Formula:

  Required instances = Peak RPS * Avg Response Time (seconds)

  Example:
  - Peak: 500 requests/second
  - Avg response: 200ms (0.2s)
  - Required: 500 * 0.2 = 100 concurrent instances

  With Fluid Compute (concurrency=10):
  - Required: 500 * 0.2 / 10 = 10 concurrent instances

  Plan check (without Fluid Compute, 100 concurrent instances):
  - Hobby (10 concurrent): NOT sufficient
  - Pro (1,000 concurrent): Sufficient with headroom
```
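The formula above is an application of Little's Law (requests in flight = arrival rate × time in system). A small sketch of the same arithmetic follows, with the plan limits copied from the table in Step 1. The function names are hypothetical, and the limits should be checked against current Vercel documentation before being relied on.

```javascript
// Little's Law: concurrent requests in flight = arrival rate * time in system.
// Plan limits are the figures from the table in Step 1 (verify before use).
const PLAN_LIMITS = { hobby: 10, pro: 1000, enterprise: 100000 };

// Hypothetical helper: instances needed at peak, given requests per second,
// average response time, and how many requests one instance handles at once.
function requiredInstances(peakRps, avgResponseMs, perInstanceConcurrency = 1) {
  if (peakRps < 0 || avgResponseMs < 0 || perInstanceConcurrency < 1) {
    throw new RangeError('invalid input');
  }
  // Stay in milliseconds and divide once to limit floating-point drift.
  const inFlight = (peakRps * avgResponseMs) / 1000;
  return Math.ceil(inFlight / perInstanceConcurrency);
}

// Hypothetical helper: plans whose concurrency limit covers the requirement.
function plansThatFit(instances) {
  return Object.keys(PLAN_LIMITS).filter((plan) => PLAN_LIMITS[plan] >= instances);
}

console.log(requiredInstances(500, 200));     // 100
console.log(requiredInstances(500, 200, 10)); // 10
console.log(plansThatFit(100));               // [ 'pro', 'enterprise' ]
```

Average response time understates the requirement during spikes, when cold starts lengthen responses, so sizing with a P95 or P99 latency gives a more conservative figure.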

## Load Test Results Template

```markdown
## Load Test Report — [Date]

### Configuration
- Target: https://my-app-preview.vercel.app/api/endpoint
- Tool: k6 v0.50
- Duration: 7 minutes (ramp up → peak → cool down)
- Peak concurrent users: 100

### Results
| Metric | Value |
|--------|-------|
| Total requests | 12,450 |
| Success rate | 99.8% |
| P50 latency | 45ms |
| P95 latency | 320ms |
| P99 latency | 1,200ms |
| Max latency | 3,400ms |
| Cold start % | 8% |
| Avg cold start duration | 650ms |
| Throttled (429) | 0 |

### Recommendations
1. Cold start: 650ms avg — consider Edge Functions for latency-critical paths
2. P99 spike: caused by cold starts — Fluid Compute concurrency would help
3. No throttling at 100 concurrent — Pro plan (1000 limit) is sufficient
```

## Output
- Load test scripts for sustained and spike traffic scenarios
- Cold start frequency and duration measured
- Concurrency limits tested and validated
- Capacity plan with scaling recommendations
- Benchmark results documented

## Error Handling
| Error | Cause | Solution |
|-------|-------|----------|
| `FUNCTION_THROTTLED` (503; a 429 indicates rate limiting instead) | Exceeded concurrent limit | Reduce test concurrency or upgrade plan |
| Vercel blocks load test | Not from approved IP | Contact Vercel support before load testing |
| High P99 but low P50 | Cold starts on spikes | Use Fluid Compute concurrency or Edge Functions |
| All requests time out | Function region far from test origin | Set `regions` in vercel.json closer to test source |
| Inconsistent results | Shared infrastructure variability | Run multiple test rounds, use median results |
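
For the last row, "use median results" can be applied per metric across rounds. The sketch below takes the median of one metric over several rounds; the function name `median` and the sample P95 values are hypothetical.

```javascript
// Median of a metric (for example P95 latency) across repeated test rounds.
function median(values) {
  if (values.length === 0) throw new Error('no values');
  // Sort a copy numerically so the caller's array is left untouched.
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical P95 latencies (ms) from five rounds; one round hit a noisy period.
const p95PerRound = [310, 325, 980, 318, 330];
console.log(median(p95PerRound)); // 325
```

The mean of the same five rounds is 452.6 ms, so a single noisy round would shift the reported figure far more than it shifts the median.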

## Resources
- [Vercel Function Limits](https://vercel.com/docs/functions/limitations)
- [Concurrency Scaling](https://vercel.com/docs/functions/concurrency-scaling)
- [Fluid Compute](https://vercel.com/docs/functions/usage-and-pricing)
- [k6 Documentation](https://k6.io/docs/)
- [Vercel Load Testing Policy](https://vercel.com/kb/guide/what-s-vercel-s-policy-regarding-load-testing-deployments)

## Next Steps
For reliability patterns, see `vercel-reliability-patterns`.

Related Skills

testing-load-balancers

1868

from jeremylongshore/claude-code-plugins-plus-skills

Validate load balancer behavior, failover, and traffic distribution. Use when performing specialized testing. Trigger with phrases like "test load balancer", "validate failover", or "check traffic distribution".

windsurf-load-scale

1868

from jeremylongshore/claude-code-plugins-plus-skills

Scale Windsurf adoption across large organizations with workspace strategies and performance tuning. Use when rolling out Windsurf to 50+ developers, managing large monorepo workspaces, or planning enterprise-scale deployment. Trigger with phrases like "windsurf at scale", "windsurf large team", "windsurf monorepo", "windsurf organization", "windsurf 100 developers".

vercel-webhooks-events

1868

from jeremylongshore/claude-code-plugins-plus-skills

Implement Vercel webhook handling with signature verification and event processing. Use when setting up webhook endpoints, processing deployment events, or building integrations that react to Vercel deployment lifecycle. Trigger with phrases like "vercel webhook", "vercel events", "vercel deployment.ready", "handle vercel events", "vercel webhook signature".

vercel-upgrade-migration

1868

from jeremylongshore/claude-code-plugins-plus-skills

Upgrade Vercel CLI, Node.js runtime, and Next.js framework versions with breaking change detection. Use when upgrading Vercel CLI versions, migrating Node.js runtimes, or updating Next.js between major versions on Vercel. Trigger with phrases like "upgrade vercel", "vercel migration", "vercel breaking changes", "update vercel CLI", "next.js upgrade on vercel".

vercel-security-basics

1868

from jeremylongshore/claude-code-plugins-plus-skills

Apply Vercel security best practices for secrets, headers, and access control. Use when securing API keys, configuring security headers, or auditing Vercel security configuration. Trigger with phrases like "vercel security", "vercel secrets", "secure vercel", "vercel headers", "vercel CSP".

vercel-sdk-patterns

1868

from jeremylongshore/claude-code-plugins-plus-skills

Production-ready Vercel REST API patterns with typed fetch wrappers and error handling. Use when integrating with the Vercel API programmatically, building deployment tools, or establishing team coding standards for Vercel API calls. Trigger with phrases like "vercel SDK patterns", "vercel API wrapper", "vercel REST API client", "vercel best practices", "idiomatic vercel API".

vercel-reliability-patterns

1868

from jeremylongshore/claude-code-plugins-plus-skills

Implement reliability patterns for Vercel deployments including circuit breakers, retry logic, and graceful degradation. Use when building fault-tolerant serverless functions, implementing retry strategies, or adding resilience to production Vercel services. Trigger with phrases like "vercel reliability", "vercel circuit breaker", "vercel resilience", "vercel fallback", "vercel graceful degradation".

vercel-reference-architecture

1868

from jeremylongshore/claude-code-plugins-plus-skills

Implement a Vercel reference architecture with layered project structure and best practices. Use when designing new Vercel projects, reviewing project structure, or establishing architecture standards for Vercel applications. Trigger with phrases like "vercel architecture", "vercel project structure", "vercel best practices layout", "how to organize vercel project".

vercel-rate-limits

1868

from jeremylongshore/claude-code-plugins-plus-skills

Handle Vercel API rate limits, implement retry logic, and configure WAF rate limiting. Use when hitting 429 errors, implementing retry logic, or setting up rate limiting for your Vercel-deployed API endpoints. Trigger with phrases like "vercel rate limit", "vercel throttling", "vercel 429", "vercel retry", "vercel backoff", "vercel WAF rate limit".

vercel-prod-checklist

1868

from jeremylongshore/claude-code-plugins-plus-skills

Vercel production deployment checklist with rollback and promotion procedures. Use when deploying to production, preparing for launch, or implementing go-live and instant rollback procedures. Trigger with phrases like "vercel production", "deploy vercel prod", "vercel go-live", "vercel launch checklist", "vercel promote".

vercel-policy-guardrails

1868

from jeremylongshore/claude-code-plugins-plus-skills

Implement lint rules, CI policy checks, and automated guardrails for Vercel projects. Use when setting up code quality rules, preventing secret exposure, or enforcing deployment policies for Vercel applications. Trigger with phrases like "vercel policy", "vercel lint", "vercel guardrails", "vercel best practices check", "vercel secret scan".

vercel-performance-tuning

1868

from jeremylongshore/claude-code-plugins-plus-skills

Optimize Vercel deployment performance with caching, bundle optimization, and cold start reduction. Use when experiencing slow page loads, optimizing Core Web Vitals, or reducing serverless function cold start times. Trigger with phrases like "vercel performance", "optimize vercel", "vercel latency", "vercel caching", "vercel slow", "vercel cold start".