ai-gateway

Route LLM API requests through a unified proxy with rate limiting, caching, and fallback. Use when you need to: call multiple LLM providers from one endpoint, add caching to reduce API costs, implement fallback between providers, enforce rate limits per team or application, or log all LLM usage centrally. Triggers include "LLM proxy", "unified API", "provider fallback", "LLM rate limiting", "AI gateway", "route to OpenAI", "cache LLM responses", or any task calling multiple LLM providers.

Best use case

ai-gateway is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ai-gateway should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/ai-gateway/SKILL.md --create-dirs "https://raw.githubusercontent.com/heldernoid/agentic-build-templates/main/projects/ai-llm-tools/ai-gateway/skills/ai-gateway/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/ai-gateway/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How ai-gateway Compares

| Feature / Agent | ai-gateway | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It routes LLM API requests through a single OpenAI-compatible proxy that adds rate limiting, response caching, centralized usage logging, and automatic fallback between providers, so you can call multiple LLM providers from one endpoint.

Where can I find the source code?

The source is in the heldernoid/agentic-build-templates repository on GitHub, under projects/ai-llm-tools/ai-gateway (the same path used in the installation command above).

SKILL.md Source

# ai-gateway

Unified LLM proxy with rate limiting, caching, logging, and automatic fallback. Drop-in replacement for direct provider calls.

## When to use

- Calling multiple LLM providers from one endpoint
- Adding response caching to reduce API costs
- Implementing automatic fallback when a provider fails
- Enforcing per-team or per-application rate limits
- Centralizing LLM usage logs and cost tracking
- A/B testing different models or providers
- Any application calling OpenAI, Anthropic, or Cohere APIs

## Prerequisites

1. A running ai-gateway instance (local or Docker)
2. A gateway API key (`gw_` prefix)
3. At least one provider configured in the gateway admin

## Quick Start

```bash
# Start the gateway
gateway start

# Create a gateway API key
gateway key create --label myapp

# Point your OpenAI client at the gateway: set its base URL to
# http://localhost:4080/v1 and use the gw_ key as its API key.
# Your application logic stays unchanged.
```

## Usage in Code

### Node.js with OpenAI SDK

```typescript
import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: 'gw_your_gateway_key',
  baseURL: 'http://localhost:4080/v1',
})

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
})
```

### Python with OpenAI SDK

```python
import openai

client = openai.OpenAI(
    api_key="gw_your_gateway_key",
    base_url="http://localhost:4080/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
```

### Direct HTTP

```bash
curl http://localhost:4080/v1/chat/completions \
  -H "Authorization: Bearer gw_your_gateway_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

## Model Routing

The gateway routes requests to providers based on the model name:

| Model pattern | Default provider |
|---|---|
| `gpt-4*` | OpenAI |
| `claude-*` | Anthropic |
| `command-*` | Cohere |
| unknown model | Highest-priority provider that lists it |

Custom routing rules override these defaults. Set them in the admin dashboard or through the admin API (port 4081 by default).
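
The routing rules above can be sketched as a glob lookup. This is a minimal illustration, not the gateway's source: the `Provider` type, the `route` function, and the convention that a lower `priority` number means a higher priority are all assumptions, and custom rules are simply checked before the defaults.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Sequence

# Hypothetical provider record (not the gateway's real data model).
# Assumption: a lower `priority` number means a higher priority.
@dataclass
class Provider:
    name: str
    priority: int
    models: set[str] = field(default_factory=set)

# Default glob pattern -> provider name, mirroring the routing table.
DEFAULT_RULES: list[tuple[str, str]] = [
    ("gpt-4*", "openai"),
    ("claude-*", "anthropic"),
    ("command-*", "cohere"),
]

def route(
    model: str,
    providers: Sequence[Provider],
    custom_rules: Sequence[tuple[str, str]] = (),
) -> str:
    """Return the name of the provider that should serve `model`."""
    # Custom rules are checked first, so they override the defaults.
    for pattern, provider_name in [*custom_rules, *DEFAULT_RULES]:
        if fnmatchcase(model, pattern):
            return provider_name
    # No pattern matched: use the highest-priority provider that lists the model.
    candidates = [p for p in providers if model in p.models]
    if not candidates:
        raise LookupError(f"no provider lists model {model!r}")
    return min(candidates, key=lambda p: p.priority).name
```

`fnmatchcase` is used instead of `fnmatch` so that pattern matching is case-sensitive on every operating system.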

## Caching

Responses are cached automatically when `temperature` is 0 or not set. The cache key is derived from the model, messages, and parameters. On a cache hit, the stored response is returned without calling the provider.

The cache TTL defaults to 1 hour (3600 seconds). Configure caching with these environment variables:

```
GATEWAY_CACHE_TTL=3600   # seconds
GATEWAY_CACHE_ENABLED=1  # 0 to disable
```
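
The caching rule (only temperature 0 or unset, a key derived from model, messages, and parameters, expiry after the TTL) can be sketched as an in-memory cache. This is an illustration under assumptions, not the gateway's implementation: hashing canonical JSON with SHA-256 is one plausible way to derive the key, and `TTLCache`, `cache_key`, and `is_cacheable` are hypothetical names.

```python
from __future__ import annotations

import hashlib
import json
import time
from typing import Callable

def is_cacheable(request: dict) -> bool:
    # Only requests with temperature 0 or no temperature are cached.
    return request.get("temperature") in (None, 0)

def cache_key(request: dict) -> str:
    # Assumption: hash canonical JSON (sorted keys, no extra whitespace) of the
    # whole request body, so key order does not change the cache key.
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class TTLCache:
    """Hypothetical in-memory response cache; entries expire after `ttl` seconds."""

    def __init__(self, ttl: float = 3600, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl  # mirrors the GATEWAY_CACHE_TTL default
        self.clock = clock
        self._store: dict[str, tuple[float, dict]] = {}

    def get(self, request: dict) -> dict | None:
        if not is_cacheable(request):
            return None
        key = cache_key(request)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return response

    def put(self, request: dict, response: dict) -> None:
        if is_cacheable(request):
            self._store[cache_key(request)] = (self.clock() + self.ttl, response)
```

Injecting `clock` keeps expiry testable without sleeping; a shared deployment would need an external store instead of a per-process dict.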

## Rate Limiting

Each gateway key has a requests-per-minute (rpm) limit. The default is 60 rpm, set globally by `GATEWAY_RATE_LIMIT_RPM`. Override it per key:

```bash
gateway key create --label myapp --rate-limit 100
```

When a key exceeds its limit, the gateway responds with:
```
HTTP 429 Too Many Requests
Retry-After: 15
```
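
A per-key limit with a `Retry-After` header can be sketched as a fixed-window counter. The gateway does not document its algorithm (and with `GATEWAY_REDIS_URL` set, the counters presumably live in Redis rather than in process memory), so treat `FixedWindowLimiter` as a hypothetical, single-process illustration.

```python
from __future__ import annotations

import math
import time
from typing import Callable

class FixedWindowLimiter:
    """Hypothetical per-key requests-per-minute limiter using 60-second windows."""

    def __init__(self, default_rpm: int = 60, clock: Callable[[], float] = time.time):
        self.default_rpm = default_rpm  # like GATEWAY_RATE_LIMIT_RPM
        self.clock = clock
        self.limits: dict[str, int] = {}  # per-key overrides, like --rate-limit
        self._windows: dict[str, tuple[int, int]] = {}  # key -> (window index, count)

    def check(self, api_key: str) -> tuple[int, dict[str, str]]:
        """Check one request, counting it if allowed; return (status, headers)."""
        now = self.clock()
        window = int(now // 60)
        current, count = self._windows.get(api_key, (window, 0))
        if current != window:
            count = 0  # a new minute started: reset the counter
        limit = self.limits.get(api_key, self.default_rpm)
        if count >= limit:
            # Tell the client how many whole seconds remain in this window.
            retry_after = math.ceil((window + 1) * 60 - now)
            return 429, {"Retry-After": str(retry_after)}
        self._windows[api_key] = (window, count + 1)
        return 200, {}
```

A fixed window is the simplest choice, but it allows bursts of up to twice the limit across a window boundary; a sliding window or token bucket smooths that out.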

## Fallback Chains

When a provider fails (timeout, 5xx, or provider-side 429), the gateway automatically tries the next provider in the configured fallback chain. The response is returned in the standard OpenAI format regardless of which provider served it.

Configure fallback in the admin dashboard under Routes.
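
The fallback behavior can be sketched as a loop over an ordered chain of provider callables. This is a hypothetical illustration: `ProviderError`, `should_fall_back`, and `call_with_fallback` are invented names, a `status` of `None` stands in for a timeout, and re-raising other errors (such as a 400) immediately is an assumption, since the docs only list the conditions that do trigger fallback. Converting each provider's response to the OpenAI format is left out.

```python
from __future__ import annotations

from typing import Callable, Sequence

class ProviderError(Exception):
    """Hypothetical provider failure; `status` is None for a timeout."""

    def __init__(self, status: int | None, message: str = ""):
        super().__init__(message or f"provider error (status={status})")
        self.status = status

def should_fall_back(err: ProviderError) -> bool:
    # Documented triggers: timeout, any 5xx, or a provider-side 429.
    return err.status is None or err.status >= 500 or err.status == 429

def call_with_fallback(
    request: dict,
    chain: Sequence[tuple[str, Callable[[dict], dict]]],
) -> dict:
    """Try each (provider name, call) pair in order until one succeeds."""
    last_error: ProviderError | None = None
    for _name, call in chain:
        try:
            return call(request)
        except ProviderError as err:
            if not should_fall_back(err):
                raise  # assumption: e.g. a 400 would fail on every provider
            last_error = err
    # Every provider failed (or the chain was empty): surface the last error.
    raise last_error or ProviderError(None, "empty fallback chain")
```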

## CLI Reference

| Command | Description |
|---|---|
| `gateway start` | Start the gateway |
| `gateway key create --label <name>` | Create a gateway API key |
| `gateway key create --label <name> --rate-limit <rpm>` | Create key with custom rate limit |
| `gateway key list` | List all gateway keys |
| `gateway key revoke <id>` | Revoke a key |
| `gateway provider add` | Add a provider interactively |
| `gateway provider list` | List providers |
| `gateway provider health` | Check the health of all providers |
| `gateway --help` | Show help |
| `gateway --version` | Show version |

## Environment Variables

| Variable | Description | Default |
|---|---|---|
| `GATEWAY_PORT` | Proxy port | 4080 |
| `GATEWAY_ADMIN_PORT` | Admin API + dashboard port | 4081 |
| `GATEWAY_ENCRYPTION_KEY` | 32-byte hex key for provider key encryption | required |
| `GATEWAY_DATA_DIR` | Data directory for SQLite and logs | ~/.ai-gateway |
| `GATEWAY_REDIS_URL` | Redis URL for distributed rate limiting | (none) |
| `GATEWAY_CACHE_ENABLED` | Enable response cache (1 or 0) | 1 |
| `GATEWAY_CACHE_TTL` | Cache TTL in seconds | 3600 |
| `GATEWAY_RATE_LIMIT_RPM` | Default rate limit per key | 60 |
| `GATEWAY_LOG_RETENTION` | Request log retention in days | 30 |
| `GATEWAY_DEV` | Dev mode: verbose logging, no auth (1 or 0) | 0 |
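
The variables above can be read into a typed config object. This sketch only mirrors the documented names and defaults; `GatewayConfig`, `load_config`, and `generate_encryption_key` are hypothetical, and reading "32-byte hex key" as 32 bytes encoded as 64 hex characters is an interpretation.

```python
from __future__ import annotations

import os
import secrets
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping

@dataclass(frozen=True)
class GatewayConfig:
    port: int
    admin_port: int
    encryption_key: bytes
    data_dir: Path
    redis_url: str | None
    cache_enabled: bool
    cache_ttl: int
    rate_limit_rpm: int
    log_retention_days: int
    dev: bool

def generate_encryption_key() -> str:
    # 32 random bytes, hex-encoded (64 characters).
    return secrets.token_hex(32)

def load_config(env: Mapping[str, str] | None = None) -> GatewayConfig:
    env = os.environ if env is None else env
    raw_key = env.get("GATEWAY_ENCRYPTION_KEY", "")
    try:
        key = bytes.fromhex(raw_key)
    except ValueError:
        key = b""
    if len(key) != 32:
        raise ValueError("GATEWAY_ENCRYPTION_KEY is required: 32 bytes, hex-encoded")
    return GatewayConfig(
        port=int(env.get("GATEWAY_PORT", "4080")),
        admin_port=int(env.get("GATEWAY_ADMIN_PORT", "4081")),
        encryption_key=key,
        data_dir=Path(env.get("GATEWAY_DATA_DIR", "~/.ai-gateway")).expanduser(),
        redis_url=env.get("GATEWAY_REDIS_URL") or None,
        cache_enabled=env.get("GATEWAY_CACHE_ENABLED", "1") == "1",
        cache_ttl=int(env.get("GATEWAY_CACHE_TTL", "3600")),
        rate_limit_rpm=int(env.get("GATEWAY_RATE_LIMIT_RPM", "60")),
        log_retention_days=int(env.get("GATEWAY_LOG_RETENTION", "30")),
        dev=env.get("GATEWAY_DEV", "0") == "1",
    )
```

Failing fast on a missing or malformed encryption key surfaces the misconfiguration at startup rather than on the first provider call.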

## Troubleshooting

### "invalid gateway key"

The key is wrong, has been revoked, or is missing the `gw_` prefix. Check with `gateway key list`.

### Request goes to wrong provider

Check the routing rules in the admin dashboard. The model name must match a rule's glob pattern (for example, `gpt-4*`); models that match no rule are routed by provider priority order.

### Cache never hits

Make sure `temperature` is 0 or absent: the cache only stores responses to deterministic requests. Also check that `GATEWAY_CACHE_ENABLED=1`.

### 429 from gateway vs 429 from provider

Gateway 429 means your key hit the gateway rate limit. Provider 429 (wrapped in a 502) means the provider rejected the request. Check the error response body for the source.
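
On the client side, the two cases can be told apart by status code before you inspect the body. A minimal standard-library sketch; `classify` and `retry_after_seconds` are hypothetical helpers, and treating every 502 as a provider-side failure is an assumption. `Retry-After` may carry either a number of seconds or an HTTP date, so both forms are handled.

```python
from __future__ import annotations

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping

def retry_after_seconds(value: str | None, now: datetime | None = None) -> float | None:
    """Parse Retry-After, which is either delay-seconds or an HTTP-date."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable header: let the caller choose a backoff
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # "-0000" dates parse as naive
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())

def classify(status: int, headers: Mapping[str, str]) -> tuple[str, float | None]:
    """Map a gateway response to (error source, suggested wait in seconds)."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    if status == 429:
        return "gateway_rate_limit", retry_after_seconds(lowered.get("retry-after"))
    if status == 502:
        return "provider_error", None  # assumption: includes wrapped provider 429s
    return ("ok" if status < 400 else "other_error"), None
```

A client would typically wait the returned number of seconds before retrying a `gateway_rate_limit` response, and read the body of a `provider_error` before deciding whether a retry is worthwhile.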

Related Skills

All from heldernoid/agentic-build-templates:

  • Uptime Monitoring
  • Status Page
  • unit-conversion
  • recipe-scaler
  • reading-list: Operate the reading-list API to save, manage, tag, search, and export articles.
  • email-digest: Configure, test, and troubleshoot the reading-list daily email digest delivered via nodemailer.
  • websocket-realtime: Use the WebSocket connection in poll-builder to receive live vote updates. Use when you need to stream real-time poll results, monitor a poll for new votes, or build a live dashboard. Triggers include "live results", "real-time updates", "stream votes", "watch poll", or "WebSocket".
  • poll-builder: Self-hosted poll creation tool with real-time results. Use when you need to create a poll, check vote counts, close a poll, export results, or get the shareable link for a poll. Triggers include "create poll", "vote", "poll results", "survey", "collect votes", "share poll", or any task involving polling or voting.
  • personal-finance
  • csv-import
  • Syntax Highlighting
  • Pastebin Core