vercel-incident-runbook

Vercel incident response procedures with triage, instant rollback, and postmortem. Use when responding to Vercel-related outages, investigating production errors, or running post-incident reviews for deployment failures. Trigger with phrases like "vercel incident", "vercel outage", "vercel down", "vercel on-call", "vercel emergency", "vercel broken".

1,868 stars

Best use case

vercel-incident-runbook is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using vercel-incident-runbook can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/vercel-incident-runbook/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/saas-packs/vercel-pack/skills/vercel-incident-runbook/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/vercel-incident-runbook/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How vercel-incident-runbook Compares

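| Feature / Agent | vercel-incident-runbook | Standard Approach |
|-----------------|-------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |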

Frequently Asked Questions

What does this skill do?

Vercel incident response procedures with triage, instant rollback, and postmortem. Use when responding to Vercel-related outages, investigating production errors, or running post-incident reviews for deployment failures. Trigger with phrases like "vercel incident", "vercel outage", "vercel down", "vercel on-call", "vercel emergency", "vercel broken".

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Vercel Incident Runbook

## Overview
Step-by-step incident response for Vercel deployment failures, function errors, and platform outages. Covers rapid triage, instant rollback, communication templates, and postmortem procedures.

## Prerequisites
- Access to Vercel dashboard and CLI
- Access to Vercel status page (vercel-status.com)
- Communication channels (Slack, PagerDuty) configured
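- Log drain or runtime log access
- `VERCEL_TOKEN` set in the shell, with `curl` and `jq` installed (the commands below use all three)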

## Instructions

### Step 1: Rapid Triage (First 5 Minutes)
```bash
# 1. Check if it's a Vercel platform issue
curl -s "https://www.vercel-status.com/api/v2/summary.json" \
  | jq '.status.description, [.components[] | select(.status != "operational") | {name, status}]'

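# 2. Check current production deployment status
# Flag support (--prod, --json) differs across Vercel CLI versions; confirm with `vercel ls --help`
vercel ls --prod
vercel inspect "$(vercel ls --prod --json | jq -r '.[0].url')"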

# 3. Check recent deployments — did a deploy just happen?
curl -s -H "Authorization: Bearer $VERCEL_TOKEN" \
  "https://api.vercel.com/v6/deployments?target=production&limit=5&projectId=prj_xxx" \
  | jq '.deployments[] | {uid, state, createdAt: (.createdAt/1000 | todate), url}'

# 4. Check function logs for errors
# --level and --limit are not accepted by every CLI version; confirm with `vercel logs --help`
vercel logs "$(vercel ls --prod --json | jq -r '.[0].url')" --level=error --limit=20
```
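
The next step asks whether a deployment happened in the last 30 minutes. A minimal sketch of that check, assuming `createdAt` is a Unix timestamp in milliseconds (as the `/1000 | todate` conversion above implies). The helper name `deployed_within` and its arguments are hypothetical, not part of the Vercel CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exits 0 if a deployment was created within the last
# WINDOW_MIN minutes, non-zero otherwise.
# Args: created_ms (ms since epoch), window_min, now_s (s since epoch, defaults to now)
deployed_within() {
  local created_ms="$1" window_min="$2" now_s="${3:-$(date +%s)}"
  local age_s=$(( now_s - created_ms / 1000 ))
  [ "$age_s" -ge 0 ] && [ "$age_s" -le $(( window_min * 60 )) ]
}

# Example with fixed timestamps: the deployment is 600 seconds old.
if deployed_within 1700000000000 30 1700000600; then
  echo "recent deploy: consider rollback"
else
  echo "no recent deploy: investigate application"
fi
```

Passing the current time as an optional third argument keeps the check deterministic when it is exercised in a drill or a test.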

### Step 2: Decision Tree
```
Is vercel-status.com showing an incident?
├── YES → Vercel platform issue
│   ├── Subscribe to updates on status page
│   ├── Post internal status: "Vercel platform incident — monitoring"
│   └── No action needed from us — wait for Vercel resolution
│
└── NO → Issue is in our deployment
    ├── Did a deployment happen in the last 30 minutes?
    │   ├── YES → Likely deployment regression
    │   │   └── ROLLBACK immediately (Step 3)
    │   └── NO → Application-level issue
    │       ├── Check function logs for new errors
    │       ├── Check external dependency status (DB, APIs)
    │       └── Investigate and hotfix (Step 4)
    │
    └── Is the issue region-specific?
        ├── YES → Check function regions, possible edge issue
        └── NO → Global issue, check code and env vars
```
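
The first two questions of the tree can be sketched as a small shell function so that automation and humans reach the same answer. This is an illustrative encoding only: the function name `triage_action` and its yes/no arguments are assumptions, and the region-specific branch is left to manual judgment.

```bash
#!/usr/bin/env bash
# Hypothetical encoding of the decision tree above.
# Args: platform_incident (yes|no), recent_deploy (yes|no)
# Prints the next action named in the tree.
triage_action() {
  local platform_incident="$1" recent_deploy="$2"
  if [ "$platform_incident" = "yes" ]; then
    echo "monitor: Vercel platform incident, post internal status"
  elif [ "$recent_deploy" = "yes" ]; then
    echo "rollback: likely deployment regression (Step 3)"
  else
    echo "investigate: check function logs and dependencies (Step 4)"
  fi
}

# No platform incident, but a deploy went out recently.
triage_action no yes
```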

### Step 3: Instant Rollback (< 30 Seconds)
```bash
# Option A: Rollback to previous production deployment (fastest)
vercel rollback
# This instantly swaps production traffic — no rebuild needed

# Option B: Rollback to a specific known-good deployment
vercel rollback dpl_xxxxxxxxxxxx

# Option C: Via API (for automation/PagerDuty integration)
# The promote endpoint takes the deployment ID in the path; confirm the
# version and path against the current Vercel REST API reference
curl -X POST "https://api.vercel.com/v10/projects/prj_xxx/promote/dpl_known_good_id" \
  -H "Authorization: Bearer $VERCEL_TOKEN"

# Verify rollback succeeded
vercel ls --prod
curl -s https://yourdomain.com/api/health | jq .
```
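
A single health check right after a rollback can catch the service mid-switch. A minimal sketch of a retry wrapper for the verification step, assuming a health endpoint like the placeholder above exists; the `retry` helper is hypothetical and works with any command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between tries. Returns 0 as soon as the command exits 0.
# Args: attempts, delay_seconds, command...
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed" >&2
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$(( i + 1 ))
  done
  return 1
}

# Usage (placeholder URL from the runbook; -f makes curl fail on HTTP errors):
# retry 5 3 curl -fsS https://yourdomain.com/api/health
```

If the wrapper still fails after the last attempt, treat the rollback target as broken too and move to the fallback actions in the Error Handling table.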

### Step 4: Investigate Root Cause
```bash
# Collect evidence while it's fresh
# (compute the directory name once so mkdir and cd cannot disagree)
incident_dir="incident-$(date -u +%Y%m%d)"
mkdir -p "$incident_dir" && cd "$incident_dir"

# Function logs around the incident time
vercel logs https://yourdomain.com --limit=200 > function-logs.txt

# Deployment diff — what changed?
curl -s -H "Authorization: Bearer $VERCEL_TOKEN" \
  "https://api.vercel.com/v13/deployments/dpl_broken" \
  | jq '.meta' > broken-deployment-meta.json

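# Snapshot the project's current env var names and environments (values are
# not printed); review it for variables added or changed before the incident
vercel env ls > env-vars.txt

# Check git diff between last good and broken commit
# GOOD_SHA and BROKEN_SHA are git commit SHAs, not dpl_ deployment IDs
git log --oneline -10
git diff "$GOOD_SHA".."$BROKEN_SHA" -- api/ src/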
```

### Step 5: Enable Maintenance Page (If Needed)
Add a temporary rewrite to `vercel.json` to enable maintenance mode. JSON does not allow comments, so do not paste a `//` line into the file.

```json
{
  "rewrites": [
    {
      "source": "/((?!_next|api/health).*)",
      "destination": "/maintenance.html"
    }
  ]
}
```

```html
<!-- public/maintenance.html -->
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Maintenance</title>
</head>
<body>
  <h1>We'll be right back</h1>
  <p>We're working to restore service. Please check back shortly.</p>
</body>
</html>
```

### Step 6: Communication Templates

**Internal — Slack (Incident Start)**
```
:rotating_light: INCIDENT: [Project Name] production issue detected
Status: Investigating
Impact: [Description of user impact]
Start time: [UTC timestamp]
On-call: @[engineer]
Thread: replies here
```

**Internal — Slack (Mitigation)**
```
:white_check_mark: MITIGATED: [Project Name]
Action: Rolled back to deployment dpl_xxx
Impact duration: [X minutes]
Root cause: [Brief description]
Postmortem: [link] scheduled for [date]
```

**External — Status Page**
```
Title: Degraded performance on [service]
Body: We are investigating reports of [issue]. Some users may experience
[impact]. Our team is actively working on a resolution.
Update: The issue has been resolved. [Brief root cause].
```
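
Filling a template by hand under pressure invites typos. A minimal sketch that renders the incident-start message from arguments; the function name `incident_start_message` and the sample values are hypothetical, and sending the text to Slack is left to whatever integration the team already uses.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fill the incident-start template from arguments.
# Args: project, impact, on_call handle, start time (UTC, defaults to now)
incident_start_message() {
  local project="$1" impact="$2" on_call="$3"
  local started="${4:-$(date -u +%Y-%m-%dT%H:%MZ)}"
  printf ':rotating_light: INCIDENT: %s production issue detected\n' "$project"
  printf 'Status: Investigating\n'
  printf 'Impact: %s\n' "$impact"
  printf 'Start time: %s\n' "$started"
  printf 'On-call: @%s\n' "$on_call"
  printf 'Thread: replies here\n'
}

# Sample values for illustration only.
incident_start_message "my-app" "Checkout returns 500 for all users" "alice" "2024-01-01T12:00Z"
```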

### Step 7: Postmortem Template
```markdown
# Incident Postmortem: [Title]

## Summary
- Duration: [start] to [end] ([X minutes])
- Impact: [users/requests affected]
- Severity: [P1/P2/P3]

## Timeline (UTC)
- HH:MM — [event]
- HH:MM — Alert fired
- HH:MM — On-call acknowledged
- HH:MM — Root cause identified
- HH:MM — Rollback executed
- HH:MM — Service restored

## Root Cause
[What broke and why]

## Resolution
[What was done to fix it]

## Action Items
- [ ] [Preventive action] — Owner: @xxx — Due: [date]
- [ ] [Detection improvement] — Owner: @xxx — Due: [date]
- [ ] [Process improvement] — Owner: @xxx — Due: [date]
```
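
The summary's duration field is easy to get wrong when subtracting clock times by hand. A minimal sketch that computes it from two `HH:MM` timeline entries; the helper name `duration_minutes` is hypothetical, and it assumes the incident lasted less than 24 hours.

```bash
#!/usr/bin/env bash
# Hypothetical helper: minutes between two HH:MM (UTC) timeline entries.
# If END is earlier than START, the incident is assumed to have crossed midnight.
duration_minutes() {
  local start="$1" end="$2"
  local s_h="${start%%:*}" s_m="${start##*:}"
  local e_h="${end%%:*}" e_m="${end##*:}"
  # Strip one leading zero so values like "09" are not parsed as octal.
  s_h="${s_h#0}"; s_m="${s_m#0}"; e_h="${e_h#0}"; e_m="${e_m#0}"
  local diff=$(( (e_h * 60 + e_m) - (s_h * 60 + s_m) ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( diff + 1440 ))
  fi
  echo "$diff"
}

duration_minutes 09:58 10:23   # prints 25
```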

## Incident Severity Levels

| Severity | Definition | Response Time | Rollback? |
|----------|-----------|---------------|-----------|
| P1 | Production down, all users affected | < 5 min | Immediate |
| P2 | Degraded, some users affected | < 15 min | If not fixable in 30 min |
| P3 | Minor issue, workaround exists | < 1 hour | No |
| P4 | Cosmetic or non-urgent | Next business day | No |
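
For paging or alert-routing scripts, the table can be mirrored as a lookup. This is an illustrative sketch; the function name `severity_policy` and the `|`-separated output format are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical lookup mirroring the severity table above.
# Prints "<response time>|<rollback policy>" for a severity level.
severity_policy() {
  case "$1" in
    P1) echo "< 5 min|Immediate" ;;
    P2) echo "< 15 min|If not fixable in 30 min" ;;
    P3) echo "< 1 hour|No" ;;
    P4) echo "Next business day|No" ;;
    *)  echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

severity_policy P2
```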

## Output
- Incident categorized and triaged within 5 minutes
- Instant rollback executed if deployment regression detected
- Communication sent to internal and external stakeholders
- Postmortem scheduled with action items

## Error Handling
| Scenario | Action |
|----------|--------|
| Vercel status page shows incident | Monitor, communicate, no deployment changes |
| `vercel rollback` fails | Use API promotion: POST to `/v10/projects/{projectId}/promote/{deploymentId}` (confirm against the REST API reference) |
| Rollback deployment also broken | Deploy from a known-good git tag |
| Cannot access Vercel dashboard | Use CLI with saved VERCEL_TOKEN |
| Log retention expired | Check external log drain provider |
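
The rollback rows of the table describe a fallback chain: try the CLI, then the API, then a redeploy from a known-good tag. A minimal sketch of that ordering, where `first_success` and the three step functions are hypothetical placeholders whose bodies should be replaced with the real commands.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the named recovery steps in order and stop at the
# first one that succeeds. Each argument is the name of a shell function.
first_success() {
  local step
  for step in "$@"; do
    if "$step"; then
      echo "succeeded: $step"
      return 0
    fi
    echo "failed: $step" >&2
  done
  return 1
}

# Placeholder steps that simulate outcomes; swap in the real commands.
cli_rollback() { false; }  # stands in for: vercel rollback
api_promote()  { true; }   # stands in for the API promotion call
redeploy_tag() { true; }   # stands in for deploying a known-good git tag

first_success cli_rollback api_promote redeploy_tag
```

Because the chain stops at the first success, later and slower options such as a full redeploy only run when the faster ones have failed.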

## Resources
- [Vercel Status Page](https://www.vercel-status.com)
- [Instant Rollback](https://vercel.com/docs/instant-rollback)
- [Vercel Support](https://vercel.com/support)
- [Vercel Logs CLI](https://vercel.com/docs/cli/logs)

## Next Steps
For data handling and compliance, see `vercel-data-handling`.

Related Skills

responding-to-security-incidents

1868
from jeremylongshore/claude-code-plugins-plus-skills

Analyze and guide security incident response, investigation, and remediation processes. Use when you need to handle security breaches, classify incidents, develop response playbooks, gather forensic evidence, or coordinate remediation efforts. Trigger with phrases like "security incident response", "ransomware attack response", "data breach investigation", "incident playbook", or "security forensics".

windsurf-incident-runbook

1868
from jeremylongshore/claude-code-plugins-plus-skills

Execute Windsurf incident response when AI features fail or cause production issues. Use when Cascade breaks code, Windsurf service is down, AI-generated code causes production incidents, or team needs emergency Windsurf troubleshooting. Trigger with phrases like "windsurf incident", "windsurf outage", "windsurf broke production", "cascade caused bug", "windsurf emergency".

webflow-incident-runbook

1868
from jeremylongshore/claude-code-plugins-plus-skills

Execute Webflow incident response — triage by HTTP status (401/403/429/500), circuit breaker activation, cached fallback, Webflow status page checks, communication templates, and postmortem process. Trigger with phrases like "webflow incident", "webflow outage", "webflow down", "webflow on-call", "webflow emergency", "webflow broken".

vercel-webhooks-events

1868
from jeremylongshore/claude-code-plugins-plus-skills

Implement Vercel webhook handling with signature verification and event processing. Use when setting up webhook endpoints, processing deployment events, or building integrations that react to Vercel deployment lifecycle. Trigger with phrases like "vercel webhook", "vercel events", "vercel deployment.ready", "handle vercel events", "vercel webhook signature".

vercel-upgrade-migration

1868
from jeremylongshore/claude-code-plugins-plus-skills

Upgrade Vercel CLI, Node.js runtime, and Next.js framework versions with breaking change detection. Use when upgrading Vercel CLI versions, migrating Node.js runtimes, or updating Next.js between major versions on Vercel. Trigger with phrases like "upgrade vercel", "vercel migration", "vercel breaking changes", "update vercel CLI", "next.js upgrade on vercel".

vercel-security-basics

1868
from jeremylongshore/claude-code-plugins-plus-skills

Apply Vercel security best practices for secrets, headers, and access control. Use when securing API keys, configuring security headers, or auditing Vercel security configuration. Trigger with phrases like "vercel security", "vercel secrets", "secure vercel", "vercel headers", "vercel CSP".

vercel-sdk-patterns

1868
from jeremylongshore/claude-code-plugins-plus-skills

Production-ready Vercel REST API patterns with typed fetch wrappers and error handling. Use when integrating with the Vercel API programmatically, building deployment tools, or establishing team coding standards for Vercel API calls. Trigger with phrases like "vercel SDK patterns", "vercel API wrapper", "vercel REST API client", "vercel best practices", "idiomatic vercel API".

vercel-reliability-patterns

1868
from jeremylongshore/claude-code-plugins-plus-skills

Implement reliability patterns for Vercel deployments including circuit breakers, retry logic, and graceful degradation. Use when building fault-tolerant serverless functions, implementing retry strategies, or adding resilience to production Vercel services. Trigger with phrases like "vercel reliability", "vercel circuit breaker", "vercel resilience", "vercel fallback", "vercel graceful degradation".

vercel-reference-architecture

1868
from jeremylongshore/claude-code-plugins-plus-skills

Implement a Vercel reference architecture with layered project structure and best practices. Use when designing new Vercel projects, reviewing project structure, or establishing architecture standards for Vercel applications. Trigger with phrases like "vercel architecture", "vercel project structure", "vercel best practices layout", "how to organize vercel project".

vercel-rate-limits

1868
from jeremylongshore/claude-code-plugins-plus-skills

Handle Vercel API rate limits, implement retry logic, and configure WAF rate limiting. Use when hitting 429 errors, implementing retry logic, or setting up rate limiting for your Vercel-deployed API endpoints. Trigger with phrases like "vercel rate limit", "vercel throttling", "vercel 429", "vercel retry", "vercel backoff", "vercel WAF rate limit".

vercel-prod-checklist

1868
from jeremylongshore/claude-code-plugins-plus-skills

Vercel production deployment checklist with rollback and promotion procedures. Use when deploying to production, preparing for launch, or implementing go-live and instant rollback procedures. Trigger with phrases like "vercel production", "deploy vercel prod", "vercel go-live", "vercel launch checklist", "vercel promote".

vercel-policy-guardrails

1868
from jeremylongshore/claude-code-plugins-plus-skills

Implement lint rules, CI policy checks, and automated guardrails for Vercel projects. Use when setting up code quality rules, preventing secret exposure, or enforcing deployment policies for Vercel applications. Trigger with phrases like "vercel policy", "vercel lint", "vercel guardrails", "vercel best practices check", "vercel secret scan".