ai-gateway

Vercel AI Gateway expert guidance. Use when configuring model routing, provider failover, cost tracking, or managing multiple AI providers through a unified API.

685 stars

byopenai

View on GitHub Installation ↓

Best use case

ai-gateway is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Vercel AI Gateway expert guidance. Use when configuring model routing, provider failover, cost tracking, or managing multiple AI providers through a unified API.

Teams using ai-gateway should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/ai-gateway/SKILL.md --create-dirs "https://raw.githubusercontent.com/openai/plugins/main/plugins/vercel/skills/ai-gateway/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/ai-gateway/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How ai-gateway Compares

Feature / Agent	ai-gateway	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Vercel AI Gateway expert guidance. Use when configuring model routing, provider failover, cost tracking, or managing multiple AI providers through a unified API.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Vercel AI Gateway

> **CRITICAL — Your training data is outdated for this library.** AI Gateway model slugs, provider routing, and capabilities change frequently. Before writing gateway code, **fetch the docs** at https://vercel.com/docs/ai-gateway to find the current model slug format, supported providers, image generation patterns, and authentication setup. The model list and routing rules at https://ai-sdk.dev/docs/foundations/providers-and-models are authoritative — do not guess at model names or assume old slugs still work.

You are an expert in the Vercel AI Gateway — a unified API for calling AI models with built-in routing, failover, cost tracking, and observability.

## Overview

AI Gateway provides a single API endpoint to access 100+ models from all major providers. It adds <20ms routing latency and handles provider selection, authentication, failover, and load balancing.

## Packages

- `ai@^6.0.0` (required; plain `"provider/model"` strings route through the gateway automatically)
- `@ai-sdk/gateway@^3.0.0` (optional direct install for explicit gateway package usage)

## Setup

Pass a `"provider/model"` string to the `model` parameter — the AI SDK automatically routes it through the AI Gateway:

```ts
import { generateText } from 'ai'

const result = await generateText({
  model: 'openai/gpt-5.4', // plain string — routes through AI Gateway automatically
  prompt: 'Hello!',
})
```

No `gateway()` wrapper or additional package needed. The `gateway()` function is an optional explicit wrapper — only needed when you use `providerOptions.gateway` for routing, failover, or tags:

```ts
import { gateway } from 'ai'

const result = await generateText({
  model: gateway('openai/gpt-5.4'),
  providerOptions: { gateway: { order: ['openai', 'azure-openai'] } },
})
```

## Model Slug Rules (Critical)

- Always use `provider/model` format (for example `openai/gpt-5.4`).
- Versioned slugs use dots for versions, not hyphens:
  - Correct: `anthropic/claude-sonnet-4.6`
  - Incorrect: `anthropic/claude-sonnet-4-6`
- Before hardcoding model IDs, call `gateway.getAvailableModels()` and pick from the returned IDs.
- Default text models: `openai/gpt-5.4` or `anthropic/claude-sonnet-4.6`.
- Do not default to outdated choices like `openai/gpt-4o`.

```ts
import { gateway } from 'ai'

const availableModels = await gateway.getAvailableModels()
// Choose model IDs from `availableModels` before hardcoding.
```

## Authentication (OIDC — Default)

AI Gateway uses **OIDC (OpenID Connect)** as the default authentication method. No manual API keys needed.

### Setup

```bash
vercel link                    # Connect to your Vercel project
# Enable AI Gateway in Vercel dashboard: https://vercel.com/{team}/{project}/settings → AI Gateway
vercel env pull .env.local     # Provisions VERCEL_OIDC_TOKEN automatically
```

### How It Works

1. `vercel env pull` writes a `VERCEL_OIDC_TOKEN` to `.env.local` — a short-lived JWT (~24h)
2. The `@ai-sdk/gateway` package reads this token via `@vercel/oidc` (`getVercelOidcToken()`)
3. No `AI_GATEWAY_API_KEY` or provider-specific keys (like `ANTHROPIC_API_KEY`) are needed
4. On Vercel deployments, OIDC tokens are auto-refreshed — zero maintenance

### Local Development

For local dev, the OIDC token from `vercel env pull` is valid for ~24 hours. When it expires:

```bash
vercel env pull .env.local --yes   # Re-pull to get a fresh token
```

### Alternative: Manual API Key

If you prefer a static key (e.g., for CI or non-Vercel environments):

```bash
# Set AI_GATEWAY_API_KEY in your environment
# The gateway falls back to this when VERCEL_OIDC_TOKEN is not available
export AI_GATEWAY_API_KEY=your-key-here
```

### Auth Priority

The `@ai-sdk/gateway` package resolves authentication in this order:
1. `AI_GATEWAY_API_KEY` environment variable (if set)
2. `VERCEL_OIDC_TOKEN` via `@vercel/oidc` (default on Vercel and after `vercel env pull`)

## Provider Routing

Configure how AI Gateway routes requests across providers:

```ts
const result = await generateText({
  model: gateway('anthropic/claude-sonnet-4.6'),
  prompt: 'Hello!',
  providerOptions: {
    gateway: {
      // Try providers in order; failover to next on error
      order: ['bedrock', 'anthropic'],

      // Restrict to specific providers only
      only: ['anthropic', 'vertex'],

      // Fallback models if primary model fails
      models: ['openai/gpt-5.4', 'google/gemini-3-flash'],

      // Track usage per end-user
      user: 'user-123',

      // Tag for cost attribution and filtering
      tags: ['feature:chat', 'env:production', 'team:growth'],
    },
  },
})
```

### Routing Options

| Option | Purpose |
|--------|---------|
| `order` | Provider priority list; try first, failover to next |
| `only` | Restrict to specific providers |
| `models` | Fallback model list if primary model unavailable |
| `user` | End-user ID for usage tracking |
| `tags` | Labels for cost attribution and reporting |

## Cache-Control Headers

AI Gateway supports response caching to reduce latency and cost for repeated or similar requests:

```ts
const result = await generateText({
  model: gateway('openai/gpt-5.4'),
  prompt: 'What is the capital of France?',
  providerOptions: {
    gateway: {
      // Cache identical requests for 1 hour
      cacheControl: 'max-age=3600',
    },
  },
})
```

### Caching strategies

| Header Value | Behavior |
|-------------|----------|
| `max-age=3600` | Cache response for 1 hour |
| `max-age=0` | Bypass cache, always call provider |
| `s-maxage=86400` | Cache at the edge for 24 hours |
| `stale-while-revalidate=600` | Serve stale for 10 min while refreshing in background |

### When to use caching

- **Static knowledge queries**: FAQs, translations, factual lookups — cache aggressively
- **User-specific conversations**: Do not cache — each response depends on conversation history
- **Embeddings**: Cache embedding results for identical inputs to save cost
- **Structured extraction**: Cache when extracting structured data from identical documents

### Cache key composition

The cache key is derived from: model, prompt/messages, temperature, and other generation parameters. Changing any parameter produces a new cache key.

## Per-User Rate Limiting

Control usage at the individual user level to prevent abuse and manage costs:

```ts
const result = await generateText({
  model: gateway('openai/gpt-5.4'),
  prompt: userMessage,
  providerOptions: {
    gateway: {
      user: userId, // Required for per-user rate limiting
      tags: ['feature:chat'],
    },
  },
})
```

### Rate limit configuration

Configure rate limits at `https://vercel.com/{team}/{project}/settings` → **AI Gateway** → **Rate Limits**:

- **Requests per minute per user**: Throttle individual users (e.g., 20 RPM)
- **Tokens per day per user**: Cap daily token consumption (e.g., 100K tokens/day)
- **Concurrent requests per user**: Limit parallel calls (e.g., 3 concurrent)

### Handling rate limit responses

When a user exceeds their limit, the gateway returns HTTP 429:

```ts
import { generateText, APICallError } from 'ai'

try {
  const result = await generateText({
    model: gateway('openai/gpt-5.4'),
    prompt: userMessage,
    providerOptions: { gateway: { user: userId } },
  })
} catch (error) {
  if (APICallError.isInstance(error) && error.statusCode === 429) {
    const retryAfter = error.responseHeaders?.['retry-after']
    return new Response(
      JSON.stringify({ error: 'Rate limited', retryAfter }),
      { status: 429 }
    )
  }
  throw error
}
```

## Budget Alerts and Cost Controls

### Tagging for cost attribution

Use tags to track spend by feature, team, and environment:

```ts
providerOptions: {
  gateway: {
    tags: [
      'feature:document-qa',
      'team:product',
      'env:production',
      'tier:premium',
    ],
    user: userId,
  },
}
```

### Setting up budget alerts

In the Vercel dashboard at `https://vercel.com/{team}/{project}/settings` → **AI Gateway**:

1. Navigate to **AI Gateway → Usage & Budgets**
2. Set monthly budget thresholds (e.g., $500/month warning, $1000/month hard limit)
3. Configure alert channels (email, Slack webhook, Vercel integration)
4. Optionally set per-tag budgets for granular control

### Budget isolation best practice

Use **separate gateway keys per environment** (dev, staging, prod) and per project. This keeps dashboards clean and budgets isolated:

- Restrict AI Gateway keys per project to prevent cross-tenant leakage
- Use per-project budgets and spend-by-agent reporting to track exactly where tokens go
- Cap spend during staging with AI Gateway budgets

### Pre-flight cost controls

The AI Gateway dashboard provides observability (traces, token counts, spend tracking) but no programmatic metrics API. Build your own cost guardrails by estimating token counts and rejecting expensive requests before they execute:

```ts
import { generateText } from 'ai'

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4) // rough estimate
}

async function callWithBudget(prompt: string, maxTokens: number) {
  const estimated = estimateTokens(prompt)
  if (estimated > maxTokens) {
    throw new Error(`Prompt too large: ~${estimated} tokens exceeds ${maxTokens} limit`)
  }
  return generateText({ model: 'openai/gpt-5.4', prompt })
}
```

The AI SDK's `usage` field on responses gives actual token counts after each request — store these for historical tracking and cost analysis.

### Hard spending limits

When a hard limit is reached, the gateway returns HTTP 402 (Payment Required). Handle this gracefully:

```ts
if (APICallError.isInstance(error) && error.statusCode === 402) {
  // Budget exceeded — degrade gracefully
  return fallbackResponse()
}
```

### Cost optimization patterns

- Use cheaper models for classification/routing, expensive models for generation
- Cache embeddings and static queries (see Cache-Control above)
- Set per-user daily token caps to prevent runaway usage
- Monitor cost-per-feature with tags to identify optimization targets

## Audit Logging

AI Gateway logs every request for compliance and debugging:

### What's logged

- Timestamp, model, provider used
- Input/output token counts
- Latency (routing + provider)
- User ID and tags
- HTTP status code
- Failover chain (which providers were tried)

### Accessing logs

- **Vercel Dashboard** at `https://vercel.com/{team}/{project}/ai` → **Logs** — filter by model, user, tag, status, date range
- **Vercel API**: Query logs programmatically:

```bash
curl -H "Authorization: Bearer $VERCEL_TOKEN" \
  "https://api.vercel.com/v1/ai-gateway/logs?projectId=$PROJECT_ID&limit=100"
```

- **Log Drains**: Forward AI Gateway logs to Datadog, Splunk, or other providers via Vercel Log Drains (configure at `https://vercel.com/dashboard/{team}/~/settings/log-drains`) for long-term retention and custom analysis

### Compliance considerations

- AI Gateway does not log prompt or completion content by default
- Enable content logging in project settings if required for compliance
- Logs are retained per your Vercel plan's retention policy
- Use `user` field consistently to support audit trails

## Error Handling Patterns

### Provider unavailable

When a provider is down, the gateway automatically fails over if you configured `order` or `models`:

```ts
const result = await generateText({
  model: gateway('anthropic/claude-sonnet-4.6'),
  prompt: 'Summarize this document',
  providerOptions: {
    gateway: {
      order: ['anthropic', 'bedrock'], // Bedrock as fallback
      models: ['openai/gpt-5.4'],   // Final fallback model
    },
  },
})
```

### Quota exceeded at provider

If your provider API key hits its quota, the gateway tries the next provider in the `order` list. Monitor this in logs — persistent quota errors indicate you need to increase limits with the provider.

### Invalid model identifier

```ts
// Bad — model doesn't exist
model: 'openai/gpt-99'  // Returns 400 with descriptive error

// Good — use models listed in Vercel docs
model: 'openai/gpt-5.4'
```

### Timeout handling

Gateway has a default timeout per provider. For long-running generations, use streaming:

```ts
import { streamText } from 'ai'

const result = streamText({
  model: 'anthropic/claude-sonnet-4.6',
  prompt: longDocument,
})

for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}
```

### Complete error handling template

```ts
import { generateText, APICallError } from 'ai'

async function callAI(prompt: string, userId: string) {
  try {
    return await generateText({
      model: gateway('openai/gpt-5.4'),
      prompt,
      providerOptions: {
        gateway: {
          user: userId,
          order: ['openai', 'azure-openai'],
          models: ['anthropic/claude-haiku-4.5'],
          tags: ['feature:chat'],
        },
      },
    })
  } catch (error) {
    if (!APICallError.isInstance(error)) throw error

    switch (error.statusCode) {
      case 402: return { text: 'Budget limit reached. Please try again later.' }
      case 429: return { text: 'Too many requests. Please slow down.' }
      case 503: return { text: 'AI service temporarily unavailable.' }
      default: throw error
    }
  }
}
```

## Gateway vs Direct Provider — Decision Tree

Use this to decide whether to route through AI Gateway or call a provider SDK directly:

```
Need failover across providers?
  └─ Yes → Use Gateway
  └─ No
      Need cost tracking / budget alerts?
        └─ Yes → Use Gateway
        └─ No
            Need per-user rate limiting?
              └─ Yes → Use Gateway
              └─ No
                  Need audit logging?
                    └─ Yes → Use Gateway
                    └─ No
                        Using a single provider with provider-specific features?
                          └─ Yes → Use direct provider SDK
                          └─ No → Use Gateway (simplifies code)
```

### When to use direct provider SDK

- You need provider-specific features not exposed through the gateway (e.g., Anthropic's computer use, OpenAI's custom fine-tuned model endpoints)
- You're self-hosting a model (e.g., vLLM, Ollama) that isn't registered with the gateway
- You need request-level control over HTTP transport (custom proxies, mTLS)

### When to always use Gateway

- Production applications — failover and observability are essential
- Multi-tenant SaaS — per-user tracking and rate limiting
- Teams with cost accountability — tag-based budgeting

## Latest Model Availability

**GPT-5.4** (added March 5, 2026) — agentic and reasoning leaps from GPT-5.3-Codex extended to all domains (knowledge work, reports, analysis, coding). Faster and more token-efficient than GPT-5.2.

| Model | Slug | Input | Output |
|-------|------|-------|--------|
| GPT-5.4 | `openai/gpt-5.4` | $2.50/M tokens | $15.00/M tokens |
| GPT-5.4 Pro | `openai/gpt-5.4-pro` | $30.00/M tokens | $180.00/M tokens |

GPT-5.4 Pro targets maximum performance on complex tasks. Use standard GPT-5.4 for most workloads.

## Supported Providers

- OpenAI (GPT-5.x including GPT-5.4 and GPT-5.4 Pro, o-series)
- Anthropic (Claude 4.x)
- Google (Gemini)
- xAI (Grok)
- Mistral
- DeepSeek
- Amazon Bedrock
- Azure OpenAI
- Cohere
- Perplexity
- Alibaba (Qwen)
- Meta (Llama)
- And many more (100+ models total)

## Pricing

- **Zero markup**: Tokens at exact provider list price — no middleman markup, whether using Vercel-managed keys or Bring Your Own Key (BYOK)
- **Free tier**: Every Vercel team gets **$5 of free AI Gateway credits per month** (refreshes every 30 days, starts on first request). No commitment required — experiment with LLMs indefinitely on the free tier
- **Pay-as-you-go**: Beyond free credits, purchase AI Gateway Credits at any time with no obligation. Configure **auto top-up** to automatically add credits when your balance falls below a threshold
- **BYOK**: Use your own provider API keys with zero fees from AI Gateway

## Multimodal Support

Text and image generation both route through the gateway. For embeddings, use a direct provider SDK.

```ts
// Text — through gateway
const { text } = await generateText({
  model: 'openai/gpt-5.4',
  prompt: 'Hello',
})

// Image — through gateway (multimodal LLMs return images in result.files)
const result = await generateText({
  model: 'google/gemini-3.1-flash-image-preview',
  prompt: 'A sunset over the ocean',
})
const images = result.files.filter((f) => f.mediaType?.startsWith('image/'))

// Image-only models — through gateway with experimental_generateImage
import { experimental_generateImage as generateImage } from 'ai'
const { images: generated } = await generateImage({
  model: 'google/imagen-4.0-generate-001',
  prompt: 'A sunset',
})
```

**Default image model**: `google/gemini-3.1-flash-image-preview` — fast multimodal image generation via gateway.

See [AI Gateway Image Generation docs](https://vercel.com/docs/ai-gateway/capabilities/image-generation) for all supported models and integration methods.

## Key Benefits

1. **Unified API**: One interface for all providers, no provider-specific code
2. **Automatic failover**: If a provider is down, requests route to the next
3. **Cost tracking**: Per-user, per-feature attribution with tags
4. **Observability**: Built-in monitoring of all model calls
5. **Low latency**: <20ms routing overhead
6. **No lock-in**: Switch models/providers by changing a string

## When to Use AI Gateway

| Scenario | Use Gateway? |
|----------|-------------|
| Production app with AI features | Yes — failover, cost tracking |
| Prototyping with single provider | Optional — direct provider works fine |
| Multi-provider setup | Yes — unified routing |
| Need provider-specific features | Use direct provider SDK + Gateway as fallback |
| Cost tracking and budgeting | Yes — user tracking and tags |
| Multi-tenant SaaS | Yes — per-user rate limiting and audit |
| Compliance requirements | Yes — audit logging and log drains |

## Official Documentation

- [AI Gateway](https://vercel.com/docs/ai-gateway)
- [Providers and Models](https://ai-sdk.dev/docs/foundations/providers-and-models)
- [AI SDK Core](https://ai-sdk.dev/docs/ai-sdk-core)
- [GitHub: AI SDK](https://github.com/vercel/ai)

Related Skills

workflow

685

from openai/plugins

Vercel Workflow DevKit (WDK) expert guidance. Use when building durable workflows, long-running tasks, API routes or agents that need pause/resume, retries, step-based execution, or crash-safe orchestration with Vercel Workflow.

verification

685

from openai/plugins

Full-story verification — infers what the user is building, then verifies the complete flow end-to-end: browser → API → data → response. Triggers on dev server start and 'why isn't this working' signals.

vercel-storage

685

from openai/plugins

Vercel storage expert guidance — Blob, Edge Config, and Marketplace storage (Neon Postgres, Upstash Redis). Use when choosing, configuring, or using data storage with Vercel applications.

vercel-services

685

from openai/plugins

Vercel Services — deploy multiple services within a single Vercel project. Use for monorepo layouts or when combining a backend (Python, Go) with a frontend (Next.js, Vite) in one deployment.

vercel-sandbox

685

from openai/plugins

Vercel Sandbox guidance — ephemeral Firecracker microVMs for running untrusted code safely. Supports AI agents, code generation, and experimentation. Use when executing user-generated or AI-generated code in isolation.

vercel-queues

685

from openai/plugins

Vercel Queues guidance (public beta) — durable event streaming with topics, consumer groups, retries, and delayed delivery. $0.60/1M ops. Powers Workflow DevKit. Use when building async processing, fan-out patterns, or event-driven architectures.

vercel-functions

685

from openai/plugins

Vercel Functions expert guidance — Serverless Functions, Edge Functions, Fluid Compute, streaming, Cron Jobs, and runtime configuration. Use when configuring, debugging, or optimizing server-side code running on Vercel.

vercel-flags

685

from openai/plugins

Vercel Flags guidance — feature flags platform with unified dashboard, Flags Explorer, gradual rollouts, A/B testing, and provider adapters. Use when implementing feature flags, experimentation, or staged rollouts.

vercel-firewall

685

from openai/plugins

Vercel Firewall and security expert guidance. Use when configuring DDoS protection, WAF rules, rate limiting, bot filtering, IP allow/block lists, OWASP rulesets, Attack Challenge Mode, or any security configuration on the Vercel platform.

vercel-cli

685

from openai/plugins

Vercel CLI expert guidance. Use when deploying, managing environment variables, linking projects, viewing logs, managing domains, or interacting with the Vercel platform from the command line.

vercel-api

685

from openai/plugins

Vercel MCP and REST API expert guidance. Use when the agent needs live access to Vercel projects, deployments, environment variables, domains, logs, or documentation through the MCP server or REST API.

vercel-agent

685

from openai/plugins

Vercel Agent guidance — AI-powered code review, incident investigation, and SDK installation. Automates PR analysis and anomaly debugging. Use when configuring or understanding Vercel's AI development tools.