rate-limiting

Rate limiting algorithms, implementation strategies, HTTP conventions, tiered limits, distributed patterns, and client-side handling. Use when protecting APIs from abuse, implementing usage tiers, or configuring gateway-level throttling.


by wpank


Best use case

rate-limiting is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using rate-limiting should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/rate-limiting/SKILL.md --create-dirs "https://raw.githubusercontent.com/wpank/ai/main/skills/api/rate-limiting/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/rate-limiting/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How rate-limiting Compares

Feature / Agent	rate-limiting	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

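What does this skill do?

It covers rate limiting algorithms, implementation strategies, HTTP conventions, tiered limits, distributed patterns, and client-side handling, for use when protecting APIs from abuse, implementing usage tiers, or configuring gateway-level throttling.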

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Rate Limiting Patterns

## Algorithms

| Algorithm | Accuracy | Burst Handling | Best For |
|-----------|----------|----------------|----------|
| **Token Bucket** | High | Allows controlled bursts | API rate limiting, traffic shaping |
| **Leaky Bucket** | High | Smooths bursts entirely | Steady-rate processing, queues |
| **Fixed Window** | Low | Allows edge bursts (2x) | Simple use cases, prototyping |
| **Sliding Window Log** | Very High | Precise control | Strict compliance, billing-critical |
| **Sliding Window Counter** | High | Good approximation | **Production APIs — best tradeoff** |

**Fixed window problem:** With an hourly window, a user can send the full limit at 11:59 and again at 12:01, so twice the limit passes within two minutes. Sliding-window algorithms avoid this boundary burst.
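To make the boundary burst concrete, here is a minimal sketch of a fixed-window counter. The `FixedWindowLimiter` class and its explicit `now` parameter are illustrative assumptions (not part of the original skill); timestamps are passed in so the result is deterministic.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Hypothetical fixed-window counter used only to demonstrate the edge burst."""

    def __init__(self, limit: int, window_sec: int):
        self.limit = limit
        self.window_sec = window_sec
        self.counts = defaultdict(int)  # window index -> requests seen

    def allow(self, now: float) -> bool:
        window = int(now // self.window_sec)
        if self.counts[window] >= self.limit:
            return False
        self.counts[window] += 1
        return True


limiter = FixedWindowLimiter(limit=100, window_sec=3600)

# 100 requests one minute before the hour boundary, 100 more one minute after it.
before = sum(limiter.allow(now=3600 - 60) for _ in range(100))
after = sum(limiter.allow(now=3600 + 60) for _ in range(100))

# All 200 requests pass within a two-minute span: twice the nominal hourly limit.
burst_total = before + after
```

Each window on its own never exceeds 100, which is why the counter sees nothing wrong.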

### Token Bucket

The bucket holds up to `capacity` tokens and refills at a fixed rate. Each request consumes one token; when no token is available, the request is rejected.

```python
import time


class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.tokens = float(capacity)   # start full so an initial burst is allowed
        self.refill_rate = refill_rate  # tokens per second
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        # Refill lazily from the time elapsed since the last call, capped at capacity.
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        # Spend one token if available; otherwise reject.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```
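For contrast, the leaky bucket from the algorithm table does not admit a burst immediately; it spaces requests out at a steady rate. The sketch below is one possible formulation, assuming a hypothetical `LeakyBucket` class that returns the time at which each request may proceed rather than running a real queue. Here `capacity` is taken to mean the maximum number of requests allowed to wait.

```python
from typing import Optional


class LeakyBucket:
    """Hypothetical leaky bucket that schedules requests at a fixed spacing."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity          # max requests allowed to wait
        self.interval = 1.0 / leak_rate   # seconds between released requests
        self.next_free = 0.0              # earliest time the next request may run

    def schedule(self, now: float) -> Optional[float]:
        """Return when the request may proceed, or None if the bucket is full."""
        start = max(now, self.next_free)
        if start - now > self.capacity * self.interval:
            return None  # waiting would exceed the queue capacity: reject
        self.next_free = start + self.interval
        return start


bucket = LeakyBucket(capacity=2, leak_rate=1.0)

# A burst of four requests arriving at t=0 is smoothed to one per second;
# the fourth would have to wait longer than the bucket allows.
times = [bucket.schedule(now=0.0) for _ in range(4)]  # → [0.0, 1.0, 2.0, None]
```

The token bucket would have let the first requests through back to back; this formulation trades latency for a steady output rate.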

### Sliding Window Counter

Hybrid of fixed window and sliding window log — weights the previous window's count by overlap percentage:

```python
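import time

# Hypothetical in-memory counter store so the example runs standalone;
# in production these helpers would read and write a shared store such as Redis.
_counts: dict[tuple[str, int], int] = {}

def get_count(key: str, window: int) -> int:
    return _counts.get((key, window), 0)

def increment_count(key: str, window: int) -> None:
    _counts[(key, window)] = _counts.get((key, window), 0) + 1

def sliding_window_allow(key: str, limit: int, window_sec: int) -> bool:
    now = time.time()
    current_window = int(now // window_sec)
    position_in_window = (now % window_sec) / window_sec  # 0.0 at window start

    prev_count = get_count(key, current_window - 1)
    curr_count = get_count(key, current_window)

    # Weight the previous window by the fraction still inside the sliding window.
    estimated = prev_count * (1 - position_in_window) + curr_count
    if estimated >= limit:
        return False
    increment_count(key, current_window)
    return True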
```
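A short worked example of the weighted estimate may help. The `estimate` helper below is a hypothetical extraction of the formula used in `sliding_window_allow`, and the counts are made-up numbers chosen for easy arithmetic.

```python
def estimate(prev_count: int, curr_count: int, position_in_window: float) -> float:
    # The previous window counts only for the fraction of it that still
    # overlaps the sliding window.
    return prev_count * (1 - position_in_window) + curr_count


limit = 100

# 25% into the current window: 75% of the previous window still counts.
early = estimate(prev_count=80, curr_count=30, position_in_window=0.25)   # 80*0.75 + 30 = 90.0

# 75% into the current window: only 25% of the previous window still counts.
late = estimate(prev_count=80, curr_count=30, position_in_window=0.75)    # 80*0.25 + 30 = 50.0

# A busier previous window pushes the estimate over the limit, so the request is rejected.
over = estimate(prev_count=100, curr_count=30, position_in_window=0.25)   # 100*0.75 + 30 = 105.0
rejected = over >= limit
```

The estimate assumes requests in the previous window were spread evenly, which is why this algorithm is an approximation rather than an exact count.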


## Installation

### OpenClaw / Moltbot / Clawbot

```bash
npx clawhub@latest install rate-limiting
```


---

## Implementation Options

| Approach | Scope | Best For |
|----------|-------|----------|
| **In-memory** | Single server | Single-instance services: no network hop, no dependencies |
| **Redis** (`INCR` + `EXPIRE`) | Distributed | **Multi-instance deployments** |
| **API Gateway** | Edge | No code, built-in dashboards |
| **Middleware** | Per-service | Fine-grained per-user/endpoint control |

Use gateway-level limiting as the outer defense and application-level limiting for fine-grained control.

---

## HTTP Headers

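Return rate limit information on every response, including successful ones. `Retry-After` is shown here for completeness and belongs only on `429` responses:

```
RateLimit-Limit: 1000
RateLimit-Remaining: 742
RateLimit-Reset: 1625097600
Retry-After: 30
```

`RateLimit-Reset` is shown as a Unix timestamp, the convention of the older `X-RateLimit-Reset` header. The IETF draft for the unprefixed `RateLimit-*` fields defines the reset value as seconds remaining, so document which form your API uses.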

| Header | When to Include |
|--------|-----------------|
| `RateLimit-Limit` | Every response |
| `RateLimit-Remaining` | Every response |
| `RateLimit-Reset` | Every response |
| `Retry-After` | 429 responses only |
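The table above can be sketched as a small helper. The function name `rate_limit_headers` and its parameters are assumptions for illustration; the `reset` value is passed through unchanged, so the caller decides its format.

```python
from typing import Optional


def rate_limit_headers(
    limit: int,
    remaining: int,
    reset: int,
    retry_after: Optional[int] = None,
) -> dict[str, str]:
    """Build rate limit headers; pass `retry_after` only for 429 responses."""
    headers = {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(max(0, remaining)),  # never report a negative count
        "RateLimit-Reset": str(reset),
    }
    if retry_after is not None:
        headers["Retry-After"] = str(retry_after)
    return headers


ok_headers = rate_limit_headers(limit=1000, remaining=742, reset=1625097600)
throttled_headers = rate_limit_headers(limit=1000, remaining=0, reset=1625097600, retry_after=30)
```

Keeping header construction in one place makes it harder for an endpoint to return a `429` without `Retry-After`.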

### 429 Response Body

```json
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Maximum 1000 requests per hour.",
    "retry_after": 30,
    "limit": 1000,
    "reset_at": "2025-07-01T12:00:00Z"
  }
}
```

Never return `500` or `503` for rate limiting — `429` is the correct status code.

---

## Rate Limit Tiers

Apply limits at multiple granularities:

| Scope | Key | Example Limit | Purpose |
|-------|-----|---------------|---------|
| **Per-IP** | Client IP | 100 req/min | Abuse prevention |
| **Per-User** | User ID | 1000 req/hr | Fair usage |
| **Per-API-Key** | API key | 5000 req/hr | Service-to-service |
| **Per-Endpoint** | Route + key | 60 req/min on `/search` | Protect expensive ops |

**Tiered pricing:**

| Tier | Rate Limit | Burst | Cost |
|------|-----------|-------|------|
| Free | 100 req/hr | 10 | $0 |
| Pro | 5,000 req/hr | 100 | $49/mo |
| Enterprise | 100,000 req/hr | 2,000 | Custom |

Evaluate from most specific to least specific: per-endpoint > per-user > per-IP.
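One way to read the evaluation order is as an ordered list of checks where the first exceeded limit decides the rejection. The sketch below assumes that interpretation; `first_exceeded`, the key formats, and the counts are hypothetical.

```python
from typing import Callable, Optional


def first_exceeded(
    checks: list[tuple[str, str, int]],
    current_count: Callable[[str], int],
) -> Optional[str]:
    """Return the scope of the first exceeded limit, or None if all pass.

    `checks` is ordered most specific first: (scope, counter key, limit).
    """
    for scope, key, limit in checks:
        if current_count(key) >= limit:
            return scope
    return None


# Made-up counter values for one user hitting /search.
counts = {"search:user:42": 60, "user:42": 310, "ip:203.0.113.7": 12}
checks = [
    ("per-endpoint", "search:user:42", 60),
    ("per-user", "user:42", 1000),
    ("per-ip", "ip:203.0.113.7", 100),
]

blocked_by = first_exceeded(checks, lambda key: counts.get(key, 0))  # → "per-endpoint"
```

Reporting the most specific scope first gives the client the most useful error: the user is under their hourly quota but has exhausted the `/search` allowance.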

---

## Distributed Rate Limiting

Redis-based pattern for consistent limiting across instances:

```python
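import time


def redis_rate_limit(redis, key: str, limit: int, window: int) -> bool:
    """Fixed-window counter; `redis` is a redis-py client, `window` is in seconds."""
    now = time.time()
    window_key = f"rl:{key}:{int(now // window)}"
    # Queue INCR and EXPIRE on a pipeline so both go out in one round trip.
    pipe = redis.pipeline()
    pipe.incr(window_key)
    pipe.expire(window_key, window * 2)  # keep the key long enough to outlive its window
    results = pipe.execute()
    return results[0] <= limit  # results[0] is the counter value after INCR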
```

**Atomic Lua script** (prevents race conditions):

```lua
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local current = redis.call('INCR', key)
if current == 1 then
    redis.call('EXPIRE', key, window)
end
return current <= limit and 1 or 0
```

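Never issue a separate GET followed by a SET: concurrent requests can read the same count in the gap between the two calls, and each one passes, so traffic exceeds the limit.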
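A sketch of calling the Lua script from Python follows, assuming a redis-py client whose `register_script` method returns a callable taking `keys` and `args`. The `LuaRateLimiter` wrapper and the `rl:` key prefix are illustrative choices, not part of the original skill.

```python
# Same logic as the Lua script above, inlined so this block is self-contained.
LUA_RATE_LIMIT = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('EXPIRE', KEYS[1], tonumber(ARGV[2]))
end
return current <= tonumber(ARGV[1]) and 1 or 0
"""


class LuaRateLimiter:
    """Hypothetical wrapper; `client` is assumed to be a redis-py client."""

    def __init__(self, client, limit: int, window: int):
        # register_script uploads the script once and returns a callable.
        self._script = client.register_script(LUA_RATE_LIMIT)
        self.limit = limit
        self.window = window

    def allow(self, key: str) -> bool:
        # The script increments and sets the expiry in a single atomic step.
        result = self._script(keys=[f"rl:{key}"], args=[self.limit, self.window])
        return result == 1


# Usage, assuming a reachable Redis server:
#   import redis
#   limiter = LuaRateLimiter(redis.Redis(), limit=1000, window=3600)
#   limiter.allow("user:42")
```

Because the expiry is set only on the first increment, the window starts at the first request for a key rather than on a clock boundary.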

---

## API Gateway Configuration

**NGINX:**

```nginx
http {
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
    server {
        location /api/ {
            limit_req zone=api burst=20 nodelay;
            limit_req_status 429;
        }
    }
}
```

**Kong:**

```yaml
plugins:
  - name: rate-limiting
    config:
      minute: 60
      hour: 1000
      policy: redis
      redis_host: redis.internal
```

---

## Client-Side Handling

Clients must handle `429` gracefully:

```typescript
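async function fetchWithRetry(url: string, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429) return res;

    // Retry-After is either a number of seconds or an HTTP-date; handle both.
    const retryAfter = res.headers.get('Retry-After');
    const seconds = retryAfter ? Number(retryAfter) : NaN;
    const untilDate = retryAfter ? Date.parse(retryAfter) - Date.now() : NaN;
    // Fallback: exponential backoff capped at 30 s, with full jitter.
    const backoff = Math.random() * Math.min(1000 * 2 ** attempt, 30000);
    const delay = Number.isFinite(seconds)
      ? seconds * 1000
      : Number.isFinite(untilDate)
        ? Math.max(0, untilDate)
        : backoff;
    await new Promise(r => setTimeout(r, delay));
  }
  throw new Error('Rate limit exceeded after retries');
}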
```

- Always respect `Retry-After` when present
- Use exponential backoff with jitter when absent
- Implement request queuing for batch operations
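The first two rules above can be sketched as a single delay calculation. The function name `retry_delay` and its defaults (1 s base, 30 s cap) are assumptions; the jitter variant shown is "full jitter", where the wait is drawn uniformly between zero and the exponential ceiling.

```python
import random
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 1.0,
    cap: float = 30.0,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        return max(0.0, retry_after)   # the server's hint takes priority
    ceiling = min(cap, base * 2 ** attempt)
    return random.uniform(0, ceiling)  # full jitter: anywhere up to the ceiling


hinted = retry_delay(attempt=3, retry_after=12)  # → 12
jittered = retry_delay(attempt=2)                # somewhere between 0 and 4 seconds
```

Jitter matters most when many clients are throttled at once: without it they all retry at the same instant and hit the limit again together.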

---

## Monitoring

Track these metrics:

- **Rate limit hit rate** — % of requests returning 429 (alert if >5% sustained)
- **Near-limit warnings** — requests where remaining < 10% of limit
- **Top offenders** — keys/IPs hitting limits most frequently
- **Limit headroom** — how close normal traffic is to the ceiling
- **False positives** — legitimate users being rate limited
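As a rough illustration of the first and third metrics, the sketch below summarizes one reporting period from a list of request events. The `summarize` function, the event shape, and the sample data are all hypothetical; a real deployment would derive these from its metrics system, and "sustained" alerting would need several periods rather than one.

```python
from collections import Counter


def summarize(
    events: list[tuple[str, int]],
    top_n: int = 3,
    alert_threshold: float = 0.05,
) -> dict:
    """`events` holds (rate-limit key, HTTP status) pairs for one reporting period."""
    total = len(events)
    throttled = Counter(key for key, status in events if status == 429)
    hit_rate = sum(throttled.values()) / total if total else 0.0
    return {
        "hit_rate": hit_rate,                           # share of requests answered with 429
        "alert": hit_rate > alert_threshold,            # 5% threshold from the list above
        "top_offenders": throttled.most_common(top_n),  # keys throttled most often
    }


# Made-up sample: 100 requests, 10 of them throttled.
events = [("user:1", 200)] * 90 + [("user:2", 429)] * 8 + [("ip:198.51.100.9", 429)] * 2
report = summarize(events)
```

Here the hit rate is 10%, above the 5% threshold, and `user:2` accounts for most of the throttled requests.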

---

## Anti-Patterns

| Anti-Pattern | Fix |
|-------------|-----|
| **Application-only limiting** | Always combine with infrastructure-level limits |
| **No retry guidance** | Always include `Retry-After` header on 429 |
| **Inconsistent limits** | Same endpoint, same limits across services |
| **No burst allowance** | Allow controlled bursts for legitimate traffic |
| **Silent dropping** | Always return 429 so clients can distinguish from errors |
| **Global single counter** | Per-endpoint counters to protect expensive operations |
| **Hard-coded limits** | Use configuration, not code constants |
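For the last row, a minimal sketch of configuration-driven limits might look like the following. The `load_limits` function, the JSON shape, and the default values are assumptions for illustration only.

```python
import json

# Hypothetical defaults, used when the configuration omits a limit.
DEFAULT_LIMITS = {"per_ip": 100, "per_user": 1000}


def load_limits(raw_json: str) -> dict[str, int]:
    """Merge configured limits over the defaults and reject invalid values."""
    limits = {**DEFAULT_LIMITS, **json.loads(raw_json)}
    for name, value in limits.items():
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"limit {name!r} must be a positive integer, got {value!r}")
    return limits


# The JSON string would normally come from a config file or environment variable.
limits = load_limits('{"per_user": 5000, "search": 60}')
# → {"per_ip": 100, "per_user": 5000, "search": 60}
```

Validating at load time means a typo in the configuration fails at startup rather than silently disabling a limit.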

---

## NEVER Do

1. **NEVER rate limit health check endpoints** — monitoring systems will false-alarm
2. **NEVER use client-supplied identifiers as sole rate limit key** — trivially spoofed
3. **NEVER return `200 OK` when rate limiting** — clients must know they were throttled
4. **NEVER set limits without measuring actual traffic first** — you'll block legitimate users or set limits too high to matter
5. **NEVER share counters across unrelated tenants** — noisy neighbor problem
6. **NEVER skip rate limiting on internal APIs** — misbehaving internal services can take down shared infrastructure
7. **NEVER implement rate limiting without logging** — you need visibility to tune limits and detect abuse
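Rules 1 and 2 can be combined in the function that picks the rate limit key. This is a sketch under stated assumptions: `rate_limit_key`, the health check paths, and the key prefixes are hypothetical, and `peer_ip` is taken to be the address observed by your own infrastructure rather than a value copied from a client-supplied header.

```python
from typing import Optional

# Hypothetical health check routes that bypass rate limiting entirely.
HEALTH_PATHS = {"/healthz", "/readyz"}


def rate_limit_key(
    path: str,
    authenticated_user_id: Optional[str],
    peer_ip: str,
) -> Optional[str]:
    """Return the counter key for a request, or None when it is exempt."""
    if path in HEALTH_PATHS:
        return None  # rule 1: never throttle health checks
    if authenticated_user_id is not None:
        # rule 2: key on the server-verified identity, not a client-supplied value
        return f"user:{authenticated_user_id}"
    return f"ip:{peer_ip}"  # unauthenticated traffic falls back to the peer address


exempt = rate_limit_key("/healthz", None, "203.0.113.7")    # → None
by_user = rate_limit_key("/search", "42", "203.0.113.7")    # → "user:42"
by_ip = rate_limit_key("/search", None, "203.0.113.7")      # → "ip:203.0.113.7"
```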
