openrouter-caching-strategy

Implement caching for OpenRouter API responses to reduce cost and latency. Use when optimizing repeat queries, building RAG systems, or reducing API spend. Triggers: 'openrouter cache', 'cache llm responses', 'openrouter caching', 'reduce openrouter cost'.

1,868 stars

byjeremylongshore

View on GitHub Installation ↓

Best use case

openrouter-caching-strategy is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using openrouter-caching-strategy should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/openrouter-caching-strategy/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/saas-packs/openrouter-pack/skills/openrouter-caching-strategy/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/openrouter-caching-strategy/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How openrouter-caching-strategy Compares

Feature / Agent	openrouter-caching-strategy	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

ChatGPT vs Claude for Agent Skills

Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.

SKILL.md Source

# OpenRouter Caching Strategy

## Overview

OpenRouter charges per token, so caching identical or similar requests can dramatically cut costs. Deterministic requests (`temperature=0`) with the same model and messages produce identical outputs -- these are safe to cache. This skill covers in-memory caching, persistent caching with TTL, and Anthropic prompt caching via OpenRouter.

## In-Memory Cache

```python
import os, hashlib, json, time
from typing import Optional
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
    default_headers={"HTTP-Referer": "https://my-app.com", "X-Title": "my-app"},
)

class LLMCache:
    def __init__(self, ttl_seconds: int = 3600):
        self._cache: dict[str, tuple[dict, float]] = {}
        self._ttl = ttl_seconds
        self.hits = 0
        self.misses = 0

    def _key(self, model: str, messages: list, **kwargs) -> str:
        blob = json.dumps({"model": model, "messages": messages, **kwargs}, sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def get(self, model: str, messages: list, **kwargs) -> Optional[dict]:
        k = self._key(model, messages, **kwargs)
        if k in self._cache:
            data, ts = self._cache[k]
            if time.time() - ts < self._ttl:
                self.hits += 1
                return data
            del self._cache[k]
        self.misses += 1
        return None

    def set(self, model: str, messages: list, response: dict, **kwargs):
        k = self._key(model, messages, **kwargs)
        self._cache[k] = (response, time.time())

cache = LLMCache(ttl_seconds=1800)

def cached_completion(messages, model="anthropic/claude-3.5-sonnet", **kwargs):
    """Only cache deterministic requests (temperature=0)."""
    kwargs.setdefault("temperature", 0)
    kwargs.setdefault("max_tokens", 1024)

    cached = cache.get(model, messages, **kwargs)
    if cached:
        return cached

    response = client.chat.completions.create(model=model, messages=messages, **kwargs)
    result = {
        "content": response.choices[0].message.content,
        "model": response.model,
        "usage": {"prompt": response.usage.prompt_tokens, "completion": response.usage.completion_tokens},
    }
    cache.set(model, messages, result, **kwargs)
    return result
```

## Persistent Cache with Redis

```python
import redis, json, hashlib

r = redis.Redis(host="localhost", port=6379, db=0)

def redis_cached_completion(messages, model="openai/gpt-4o-mini", ttl=3600, **kwargs):
    """Cache in Redis with automatic TTL expiry."""
    kwargs["temperature"] = 0  # Must be deterministic
    key = f"or:{hashlib.sha256(json.dumps({'m': model, 'msgs': messages, **kwargs}, sort_keys=True).encode()).hexdigest()}"

    cached = r.get(key)
    if cached:
        return json.loads(cached)

    response = client.chat.completions.create(model=model, messages=messages, **kwargs)
    result = {
        "content": response.choices[0].message.content,
        "model": response.model,
        "tokens": response.usage.prompt_tokens + response.usage.completion_tokens,
    }
    r.setex(key, ttl, json.dumps(result))
    return result
```

## Anthropic Prompt Caching via OpenRouter

Anthropic models on OpenRouter support prompt caching -- large system prompts are cached server-side, reducing input cost by 90% on cache hits.

```python
# Mark large static content blocks with cache_control
response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are an expert. Here is the full source:\n" + large_context,
                    "cache_control": {"type": "ephemeral"},  # Cache this block
                }
            ],
        },
        {"role": "user", "content": "What does the main() function do?"},
    ],
    max_tokens=1024,
)
# First call: cache_creation_input_tokens charged at 1.25x
# Subsequent: cache_read_input_tokens charged at 0.1x (90% savings)
```

## Cache Key Design

```python
def cache_key(model: str, messages: list, **params) -> str:
    """Deterministic cache key. Include everything that affects output.

    Include: model ID (with variant like :floor), messages, temperature,
    max_tokens, top_p, transforms, provider routing.
    Exclude: stream (doesn't affect content), HTTP-Referer, X-Title.
    """
    canonical = json.dumps({
        "model": model, "messages": messages,
        "temperature": params.get("temperature", 0),
        "max_tokens": params.get("max_tokens"),
        "top_p": params.get("top_p"),
    }, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()
```

## Cache Invalidation

| Trigger | Action | Why |
|---------|--------|-----|
| Model version update | Flush keys for that model | New version may give different outputs |
| System prompt change | Flush all keys | Output semantics changed |
| TTL expiry | Automatic eviction | Prevents stale data |
| Manual purge | `r.delete(key)` or clear by prefix | Debugging or policy change |

## Error Handling

| Error | Cause | Fix |
|-------|-------|-----|
| Stale cache response | TTL too long | Reduce TTL or version cache keys |
| Cache miss storm | Cold start or invalidation | Warm cache with common queries at deploy |
| Redis connection error | Redis down | Fall through to direct API call |
| Non-deterministic cache | `temperature > 0` cached | Only cache when `temperature=0` |

## Enterprise Considerations

- Only cache deterministic requests (`temperature=0`) -- non-zero temperatures produce different outputs each time
- Use Anthropic prompt caching for large system prompts (RAG context) -- 90% cost reduction on cache hits
- Set TTL based on content freshness needs (30 min for dynamic, 24h for reference data)
- Track cache hit rate to justify caching infrastructure cost
- Use Redis or Memcached for multi-instance deployments; in-memory only works for single-process
- Version cache keys when updating system prompts or switching model versions

## References

- [Examples](${CLAUDE_SKILL_DIR}/references/examples.md) | [Errors](${CLAUDE_SKILL_DIR}/references/errors.md)
- [Prompt Caching](https://openrouter.ai/docs/features/prompt-caching) | [Models API](https://openrouter.ai/docs/api/api-reference/models/get-models)

Related Skills

openrouter-usage-analytics

1868

from jeremylongshore/claude-code-plugins-plus-skills

Track and analyze OpenRouter API usage patterns, costs, and performance. Use when building dashboards, optimizing spend, or reporting on AI usage. Triggers: 'openrouter analytics', 'openrouter usage', 'openrouter metrics', 'track openrouter spend'.

openrouter-upgrade-migration

1868

from jeremylongshore/claude-code-plugins-plus-skills

Migrate to OpenRouter from direct provider APIs or upgrade between SDK/model versions. Triggers: 'openrouter migrate', 'openrouter upgrade', 'switch to openrouter', 'migrate from openai to openrouter'.

openrouter-team-setup

1868

from jeremylongshore/claude-code-plugins-plus-skills

Configure OpenRouter for multi-user teams with per-user keys, budget controls, and usage attribution. Triggers: 'openrouter team', 'openrouter multi-user', 'openrouter organization', 'team api keys openrouter'.

openrouter-routing-rules

1868

from jeremylongshore/claude-code-plugins-plus-skills

Define custom routing rules for OpenRouter requests based on user tier, task type, cost budget, and availability. Triggers: 'openrouter rules', 'routing rules', 'custom routing openrouter', 'conditional model selection'.

openrouter-reference-architecture

1868

from jeremylongshore/claude-code-plugins-plus-skills

Design production architectures using OpenRouter as the LLM gateway. Use when planning system design, reviewing architecture, or scaling AI applications. Triggers: 'openrouter architecture', 'openrouter system design', 'openrouter at scale', 'llm gateway architecture'.

openrouter-rate-limits

1868

from jeremylongshore/claude-code-plugins-plus-skills

Understand and handle OpenRouter rate limits. Use when hitting 429 errors, building high-throughput systems, or implementing retry logic. Triggers: 'openrouter rate limit', 'openrouter 429', 'openrouter throttle', 'rate limiting openrouter'.

openrouter-prod-checklist

1868

from jeremylongshore/claude-code-plugins-plus-skills

Validate production readiness of your OpenRouter integration. Use before launching to production or during operational reviews. Triggers: 'openrouter production', 'openrouter launch', 'production checklist openrouter', 'openrouter deploy'.

openrouter-pricing-basics

1868

from jeremylongshore/claude-code-plugins-plus-skills

Understand OpenRouter pricing, calculate costs, and optimize spend. Use when budgeting, comparing model costs, or tracking spend. Triggers: 'openrouter pricing', 'openrouter cost', 'model pricing', 'openrouter budget', 'how much does openrouter cost'.

openrouter-performance-tuning

1868

from jeremylongshore/claude-code-plugins-plus-skills

Optimize OpenRouter request latency and throughput. Use when building real-time applications, reducing TTFT, or scaling request volume. Triggers: 'openrouter performance', 'openrouter latency', 'openrouter speed', 'optimize openrouter throughput'.

openrouter-openai-compat

1868

from jeremylongshore/claude-code-plugins-plus-skills

Migrate from OpenAI to OpenRouter with minimal code changes. Use when switching to OpenRouter or maintaining dual compatibility. Triggers: 'openrouter openai compatible', 'openrouter drop-in', 'openai to openrouter', 'openrouter migration'.

openrouter-multi-provider

1868

from jeremylongshore/claude-code-plugins-plus-skills

Use multiple AI providers (OpenAI, Anthropic, Google, Meta) through OpenRouter's unified API. Use when comparing providers, building cross-provider workflows, or maximizing availability. Triggers: 'openrouter providers', 'multi provider', 'openrouter openai anthropic', 'compare models openrouter'.

openrouter-model-routing

1868

from jeremylongshore/claude-code-plugins-plus-skills

Implement intelligent model routing to optimize cost, quality, and latency on OpenRouter. Use when building multi-model systems or optimizing spend across task types. Triggers: 'openrouter routing', 'model routing', 'route to model', 'model selection openrouter'.