ai-gateway
Route LLM API requests through a unified proxy with rate limiting, caching, and fallback. Use when you need to: call multiple LLM providers from one endpoint, add caching to reduce API costs, implement fallback between providers, enforce rate limits per team or application, or log all LLM usage centrally. Triggers include "LLM proxy", "unified API", "provider fallback", "LLM rate limiting", "AI gateway", "route to OpenAI", "cache LLM responses", or any task calling multiple LLM providers.
Best use case
ai-gateway is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using ai-gateway should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/ai-gateway/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How ai-gateway Compares
| Feature / Agent | ai-gateway | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It routes LLM API requests through a unified proxy that adds rate limiting, response caching, centralized usage logging, and automatic fallback between providers.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# ai-gateway
Unified LLM proxy with rate limiting, caching, logging, and automatic fallback. Drop-in replacement for direct provider calls.
## When to use
- Calling multiple LLM providers from one endpoint
- Adding response caching to reduce API costs
- Implementing automatic fallback when a provider fails
- Enforcing per-team or per-application rate limits
- Centralizing LLM usage logs and cost tracking
- A/B testing different models or providers
- Any application calling OpenAI, Anthropic, or Cohere APIs
## Prerequisites
1. A running ai-gateway instance (local or Docker)
2. A gateway API key (`gw_` prefix)
3. At least one provider configured in the gateway admin
## Quick Start
```bash
# Start the gateway
gateway start
# Create a gateway API key
gateway key create --label myapp
# Point your OpenAI client at the gateway
# (change the base URL, keep your application logic unchanged)
```
## Usage in Code
### Node.js with OpenAI SDK
```typescript
import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: 'gw_your_gateway_key',
  baseURL: 'http://localhost:4080/v1',
})

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
})
```
### Python with OpenAI SDK
```python
import openai

client = openai.OpenAI(
    api_key="gw_your_gateway_key",
    base_url="http://localhost:4080/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
```
### Direct HTTP
```bash
curl http://localhost:4080/v1/chat/completions \
  -H "Authorization: Bearer gw_your_gateway_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
## Model Routing
The gateway routes requests to providers based on the model name:
| Model pattern | Default provider |
|---|---|
| `gpt-4*` | OpenAI |
| `claude-*` | Anthropic |
| `command-*` | Cohere |
| unknown model | Highest-priority provider that lists it |
Custom routing rules can override defaults. Set them in the admin dashboard or API.
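
The default routing table above can be sketched as glob matching on the model name. This is only an illustration of the behaviour described, not the gateway's actual code: `DEFAULT_ROUTES`, `resolve_provider`, and the `providers_by_priority` argument are hypothetical names, and evaluating patterns in table order is an assumption.

```python
from fnmatch import fnmatchcase

# Default patterns from the routing table, checked in order (assumed).
DEFAULT_ROUTES = [
    ("gpt-4*", "OpenAI"),
    ("claude-*", "Anthropic"),
    ("command-*", "Cohere"),
]


def resolve_provider(model, providers_by_priority=()):
    """Return the provider name for a model, or None if nothing matches.

    providers_by_priority is a hypothetical list of (name, models) pairs,
    highest priority first, consulted only when no pattern matches.
    """
    for pattern, provider in DEFAULT_ROUTES:
        if fnmatchcase(model, pattern):
            return provider
    for name, models in providers_by_priority:
        if model in models:
            return name
    return None
```

Note that under these patterns a name such as `gpt-3.5-turbo` does not match `gpt-4*`, so it would fall through to the priority lookup.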
## Caching
Responses are cached automatically when `temperature` is 0 or not set. The cache key is derived from the model, messages, and parameters. On a cache hit, the stored response is returned without calling the provider.
Cache TTL defaults to 1 hour. Configure in settings:
```
GATEWAY_CACHE_TTL=3600 # seconds
GATEWAY_CACHE_ENABLED=1 # 0 to disable
```
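
To make the caching rule concrete, here is a minimal sketch of how a cache key could be derived from the model, messages, and parameters. The hashing scheme (SHA-256 over canonical JSON) and the function names `is_cacheable` and `cache_key` are assumptions for illustration; the gateway may derive its key differently.

```python
import hashlib
import json


def is_cacheable(params):
    """Cache only when temperature is 0 or not set."""
    temperature = params.get("temperature")
    return temperature is None or temperature == 0


def cache_key(model, messages, params):
    """Hash a canonical JSON encoding of the request (assumed scheme)."""
    payload = {"model": model, "messages": messages, "params": params}
    # sort_keys makes the key independent of parameter ordering.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two requests that differ only in the order of their parameters produce the same key, while any change to the model, a message, or a parameter value produces a different one.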
## Rate Limiting
Each gateway key has a requests-per-minute (rpm) limit. The default is 60 rpm, set globally by `GATEWAY_RATE_LIMIT_RPM`. Override it per key:
```bash
gateway key create --label myapp --rate-limit 100
```
On limit exceeded, the gateway returns:
```
HTTP 429 Too Many Requests
Retry-After: 15
```
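
A client can honour the `Retry-After` header instead of retrying immediately. The sketch below is transport-agnostic: `send` is a hypothetical callable returning `(status, headers, body)`, and the one-second fallback delay is an assumption, not a gateway default.

```python
import time


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send is a hypothetical callable returning (status, headers, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, body
        # Retry-After is read as a number of seconds; fall back to 1 second
        # if the header is missing or not numeric (assumed default).
        try:
            delay = float(headers.get("Retry-After", 1))
        except ValueError:
            delay = 1.0
        sleep(delay)
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.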
## Fallback Chains
When a provider fails (timeout, 5xx, or provider-side 429), the gateway automatically tries the next provider in the configured fallback chain. The response uses the standard OpenAI format regardless of which provider served it.
Configure fallback in the admin dashboard under Routes.
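
The fallback behaviour described above amounts to trying providers in order until one succeeds. The following is a sketch under stated assumptions: `ProviderError`, `should_fall_back`, and `complete_with_fallback` are hypothetical names, and treating other errors (for example a 400) as non-retryable is an assumption the source does not confirm.

```python
class ProviderError(Exception):
    """Failure reported by a provider call (hypothetical type)."""

    def __init__(self, status=None, timeout=False):
        super().__init__(f"status={status} timeout={timeout}")
        self.status = status
        self.timeout = timeout


def should_fall_back(error):
    """Timeouts, 5xx responses, and provider-side 429s trigger fallback."""
    if error.timeout:
        return True
    if error.status is None:
        return False
    return error.status == 429 or 500 <= error.status < 600


def complete_with_fallback(chain, request):
    """Try each (name, call) pair in order and return (name, response)."""
    last_error = None
    for name, call in chain:
        try:
            return name, call(request)
        except ProviderError as error:
            if not should_fall_back(error):
                raise  # assumed: other errors go straight back to the caller
            last_error = error
    raise RuntimeError("all providers in the fallback chain failed") from last_error
```

Returning the provider name alongside the response makes it possible to log which provider actually served a request.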
## CLI Reference
| Command | Description |
|---|---|
| `gateway start` | Start the gateway |
| `gateway key create --label <name>` | Create a gateway API key |
| `gateway key create --label <name> --rate-limit <rpm>` | Create key with custom rate limit |
| `gateway key list` | List all gateway keys |
| `gateway key revoke <id>` | Revoke a key |
| `gateway provider add` | Add a provider interactively |
| `gateway provider list` | List providers |
| `gateway provider health` | Check all provider health |
| `gateway --help` | Show help |
| `gateway --version` | Show version |
## Environment Variables
| Variable | Description | Default |
|---|---|---|
| `GATEWAY_PORT` | Proxy port | 4080 |
| `GATEWAY_ADMIN_PORT` | Admin API + dashboard port | 4081 |
| `GATEWAY_ENCRYPTION_KEY` | 32-byte hex key for provider key encryption | required |
| `GATEWAY_DATA_DIR` | Data directory for SQLite and logs | ~/.ai-gateway |
| `GATEWAY_REDIS_URL` | Redis URL for distributed rate limiting | (none) |
| `GATEWAY_CACHE_ENABLED` | Enable response cache (1 or 0) | 1 |
| `GATEWAY_CACHE_TTL` | Cache TTL in seconds | 3600 |
| `GATEWAY_RATE_LIMIT_RPM` | Default rate limit per key | 60 |
| `GATEWAY_LOG_RETENTION` | Request log retention in days | 30 |
| `GATEWAY_DEV` | Dev mode: verbose logging, no auth (1 or 0) | 0 |
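
As an illustration of how these variables and their defaults fit together, the sketch below loads a few of them into a settings object and generates a value for `GATEWAY_ENCRYPTION_KEY`. It assumes that "32-byte hex key" means 32 random bytes encoded as 64 hex characters; `GatewaySettings`, `load_settings`, and `new_encryption_key` are hypothetical names, not part of the gateway.

```python
import os
import secrets
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewaySettings:
    port: int
    admin_port: int
    encryption_key: str
    cache_enabled: bool
    cache_ttl: int
    rate_limit_rpm: int


def new_encryption_key():
    """Generate 32 random bytes as a 64-character hex string."""
    return secrets.token_hex(32)


def load_settings(env=os.environ):
    """Read settings from a mapping, applying the defaults from the table."""
    key = env.get("GATEWAY_ENCRYPTION_KEY", "")
    # Assumed format: 32 bytes hex-encoded, i.e. exactly 64 hex characters.
    if len(key) != 64 or not all(c in string.hexdigits for c in key):
        raise ValueError("GATEWAY_ENCRYPTION_KEY must be a 32-byte hex string")
    return GatewaySettings(
        port=int(env.get("GATEWAY_PORT", "4080")),
        admin_port=int(env.get("GATEWAY_ADMIN_PORT", "4081")),
        encryption_key=key,
        cache_enabled=env.get("GATEWAY_CACHE_ENABLED", "1") == "1",
        cache_ttl=int(env.get("GATEWAY_CACHE_TTL", "3600")),
        rate_limit_rpm=int(env.get("GATEWAY_RATE_LIMIT_RPM", "60")),
    )
```

Failing fast on a missing or malformed encryption key mirrors the `required` entry in the table.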
## Troubleshooting
### "invalid gateway key"
Key is wrong, revoked, or missing `gw_` prefix. Check with `gateway key list`.
### Request goes to wrong provider
Check routing rules in the admin dashboard. Model name must match the glob pattern. Default routing uses provider priority order.
### Cache never hits
Ensure `temperature` is 0 or absent; the cache only stores responses to such requests. Also check that `GATEWAY_CACHE_ENABLED=1`.
### 429 from gateway vs 429 from provider
Gateway 429 means your key hit the gateway rate limit. Provider 429 (wrapped in a 502) means the provider rejected the request. Check the error response body for the source.