openrouter-context-optimization
Optimize context window usage for OpenRouter models to reduce cost and improve quality. Use when hitting context limits, managing long conversations, or building RAG systems. Triggers: 'openrouter context', 'context window', 'openrouter token limit', 'reduce tokens openrouter'.
Best use case
openrouter-context-optimization is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using openrouter-context-optimization should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/openrouter-context-optimization/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How openrouter-context-optimization Compares
| Feature / Agent | openrouter-context-optimization | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Optimize context window usage for OpenRouter models to reduce cost and improve quality. Use when hitting context limits, managing long conversations, or building RAG systems. Triggers: 'openrouter context', 'context window', 'openrouter token limit', 'reduce tokens openrouter'.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# OpenRouter Context Optimization
## Overview
OpenRouter models have varying context windows (4K to 1M+ tokens). Since pricing is per-token, stuffing unnecessary context wastes money and can degrade output quality. This skill covers context window lookup, token estimation, conversation trimming, chunking strategies, and Anthropic prompt caching for large contexts.
## Query Context Limits
```bash
# Check context window for specific models
curl -s https://openrouter.ai/api/v1/models | jq '[.data[] | select(
.id == "anthropic/claude-3.5-sonnet" or
.id == "openai/gpt-4o" or
.id == "google/gemini-2.0-flash-001" or
.id == "meta-llama/llama-3.1-70b-instruct"
) | {id, context_length, prompt_per_M: ((.pricing.prompt|tonumber)*1000000)}]'
```
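The same lookup can be scripted in Python when an application needs these numbers at runtime. This is a minimal standard-library sketch. It assumes the response shape the `jq` filter above relies on: a top-level `data` list whose entries have `id`, `context_length`, and a string `pricing.prompt` holding the price per token. `fetch_models` and `summarize_models` are hypothetical helper names.

```python
import json
import urllib.request

MODELS_URL = "https://openrouter.ai/api/v1/models"


def fetch_models(url: str = MODELS_URL) -> list[dict]:
    """Download the model list (assumes a top-level "data" list in the JSON)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["data"]


def summarize_models(models: list[dict], ids: set[str]) -> list[dict]:
    """Return id, context length, and prompt price per million tokens for the selected ids."""
    rows = []
    for m in models:
        if m["id"] in ids:
            rows.append({
                "id": m["id"],
                "context_length": m.get("context_length"),
                # pricing.prompt is assumed to be a per-token price encoded as a string
                "prompt_per_M": float(m["pricing"]["prompt"]) * 1_000_000,
            })
    return rows
```

For example, `summarize_models(fetch_models(), {"openai/gpt-4o"})` returns one row per matching model. Keeping the download separate from the parsing lets you test the parsing without network access.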
## Context-Aware Model Selection
```python
import os

import requests
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
    default_headers={"HTTP-Referer": "https://my-app.com", "X-Title": "my-app"},
)

# Cache model metadata at startup
_resp = requests.get("https://openrouter.ai/api/v1/models", timeout=30)
_resp.raise_for_status()
MODELS = {m["id"]: m for m in _resp.json()["data"]}

def estimate_tokens(text: str) -> int:
    """Rough estimate: 1 token ~ 4 characters for English text."""
    return len(text) // 4

def select_model_for_context(messages: list, preferred: str = "anthropic/claude-3.5-sonnet") -> str:
    """Pick a model that fits the context, falling back to larger windows."""
    # Assumes plain-string message content
    estimated_tokens = sum(estimate_tokens(m.get("content") or "") for m in messages)
    FALLBACK_CHAIN = [
        ("openai/gpt-4o-mini", 128_000),
        ("anthropic/claude-3.5-sonnet", 200_000),
        ("google/gemini-2.0-flash-001", 1_000_000),
    ]
    # Use at most 80% of a window, leaving 20% headroom for the reply
    preferred_ctx = MODELS.get(preferred, {}).get("context_length") or 0
    if estimated_tokens < preferred_ctx * 0.8:
        return preferred
    for model_id, fallback_ctx in FALLBACK_CHAIN:
        # Prefer the live context_length; the hardcoded value is only a fallback
        ctx = MODELS.get(model_id, {}).get("context_length") or fallback_ctx
        if estimated_tokens < ctx * 0.8:
            return model_id
    raise ValueError(f"Content too large ({estimated_tokens} est. tokens)")
```
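The fallback chain above is ordered by hand. An alternative is to choose the cheapest model whose window fits the prompt plus an explicit output budget, driven entirely by the cached metadata. This is a minimal sketch that assumes `models` is a dict shaped like `MODELS` (entries with `context_length` and a string per-token `pricing.prompt`). `cheapest_model_that_fits` is a hypothetical helper. Treating a negative price as "unknown" is also an assumption. Ranking by prompt price alone ignores output pricing and quality differences, and free models (price 0) will always win.

```python
def cheapest_model_that_fits(models: dict[str, dict], estimated_tokens: int,
                             max_output_tokens: int = 1024) -> str:
    """Return the id of the cheapest model whose window fits prompt + reply.

    Assumes `models` maps model id -> metadata shaped like /api/v1/models entries.
    """
    candidates = []
    for model_id, meta in models.items():
        ctx = meta.get("context_length") or 0
        price_str = meta.get("pricing", {}).get("prompt")
        # Skip models without a prompt price or whose window is too small
        if price_str is None or estimated_tokens + max_output_tokens > ctx:
            continue
        price = float(price_str)
        # Assumption: a negative price marks variable/unknown pricing, so skip it
        if price < 0:
            continue
        candidates.append((price, model_id))
    if not candidates:
        raise ValueError(
            f"No model fits {estimated_tokens} est. tokens + {max_output_tokens} output"
        )
    # Lowest prompt price wins; ties are broken by model id
    return min(candidates)[1]
```

In practice you would call `cheapest_model_that_fits(MODELS, estimate_tokens(prompt))`. Reserving a fixed `max_output_tokens` is stricter than a percentage margin when the reply budget is known.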
## Conversation Trimming
```python
def trim_conversation(
    messages: list[dict],
    max_tokens: int = 100_000,
    keep_system: bool = True,
    keep_last_n: int = 4,
) -> list[dict]:
    """Trim conversation history to fit the context window.

    Strategy: keep the system prompt + last N messages.
    If still too large, reduce to the last 2 messages.
    Uses estimate_tokens() defined above.
    """
    system = [m for m in messages if m["role"] == "system"] if keep_system else []
    non_system = [m for m in messages if m["role"] != "system"]
    kept = non_system[-keep_last_n:] if keep_last_n > 0 else []
    total_est = sum(estimate_tokens(m.get("content") or "") for m in system + kept)
    if total_est > max_tokens and keep_last_n > 2:
        kept = non_system[-2:]
    # Count every dropped message, including any cut by the fallback above
    dropped = len(non_system) - len(kept)
    if dropped:
        summary_note = {
            "role": "system",
            "content": f"[Previous {dropped} messages trimmed for context limits]",
        }
        return system + [summary_note] + kept
    return system + kept
```
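When message sizes vary widely, a fixed `keep_last_n` can still overflow or leave space unused. A token-budget sliding window is a common alternative: keep the system messages, then add messages from newest to oldest until the estimated budget runs out. This minimal sketch is a different technique from the one above. `trim_to_budget` is a hypothetical name, and the sketch reuses the rough 4-characters-per-token estimate.

```python
def trim_to_budget(messages: list[dict], max_tokens: int,
                   estimate=lambda text: len(text) // 4) -> list[dict]:
    """Keep system messages, then the newest non-system messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    budget = max_tokens - sum(estimate(m.get("content") or "") for m in system)
    kept: list[dict] = []
    # Walk from newest to oldest, stopping at the first message that no longer fits
    for m in reversed([m for m in messages if m["role"] != "system"]):
        cost = estimate(m.get("content") or "")
        if cost > budget:
            break
        kept.append(m)
        budget -= cost
    kept.reverse()  # restore chronological order
    return system + kept
```

Stopping at the first message that does not fit, rather than skipping it, keeps the retained history contiguous so the model never sees a reply without its question.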
## Chunking for Large Documents
```python
def chunk_and_process(document: str, question: str, model: str = "openai/gpt-4o-mini",
                      chunk_size: int = 8000, overlap: int = 500) -> str:
    """Process a large document in overlapping chunks, then synthesize.

    chunk_size and overlap are measured in characters (8000 chars ~ 2000 tokens).
    Uses the `client` configured above.
    """
    # Guard against an infinite loop when the step size would be <= 0
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(document):
        chunks.append(document[start:start + chunk_size])
        start += chunk_size - overlap
    results = []
    for i, chunk in enumerate(chunks):
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": f"Analyzing chunk {i+1}/{len(chunks)}."},
                {"role": "user", "content": f"Document:\n{chunk}\n\nQuestion: {question}"},
            ],
            max_tokens=1024, temperature=0,
        )
        results.append(response.choices[0].message.content or "")
    # Synthesize the per-chunk answers into one response
    synthesis = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Synthesize these partial analyses."},
            {"role": "user", "content": f"Question: {question}\n\nResults:\n" + "\n---\n".join(results)},
        ],
        max_tokens=2048, temperature=0,
    )
    return synthesis.choices[0].message.content
```
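Fixed character offsets can split a sentence across two chunks. One alternative is to pack whole paragraphs, split on blank lines, into chunks up to the size limit. This is a minimal sketch: `split_on_paragraphs` is a hypothetical helper. It drops the overlap used above, and any paragraph longer than `chunk_size` is still hard-split by characters.

```python
def split_on_paragraphs(document: str, chunk_size: int = 8000) -> list[str]:
    """Greedily pack whole paragraphs into chunks of at most chunk_size characters."""
    chunks: list[str] = []
    current = ""
    for para in document.split("\n\n"):
        if not para.strip():
            continue  # skip empty paragraphs from runs of blank lines
        # Hard-split paragraphs that are larger than a single chunk
        pieces = [para[i:i + chunk_size] for i in range(0, len(para), chunk_size)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= chunk_size:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

The resulting list can replace the `chunks` built by the `while` loop in `chunk_and_process`. The per-chunk and synthesis calls stay the same.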
## Prompt Caching for Repeated Context
```python
# Anthropic models support prompt caching -- mark large static blocks with cache_control.
# Later requests that reuse the same cached prefix while the cache is still live
# pay far less for those input tokens.
with open("reference.md") as f:  # hypothetical file holding a 50K+ token document
    large_reference_document = f.read()

response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": large_reference_document,
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        {"role": "user", "content": "Summarize section 3."},
    ],
    max_tokens=1024,
)
# First request: the cached block is written at 1.25x the base input price
# Later requests within the cache lifetime: it is read at 0.1x (90% savings)
```
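Whether caching pays off depends on how often the block is reused. This sketch turns the multipliers from the comments above into a quick comparison. It assumes a cache write costs 1.25x and a cache read 0.1x the base input price, and that every request after the first hits a live cache. `caching_costs` is a hypothetical helper that prices only the cached block.

```python
def caching_costs(cached_tokens: int, requests: int, price_per_m: float,
                  write_mult: float = 1.25, read_mult: float = 0.1) -> tuple[float, float]:
    """Return (cost_without_cache, cost_with_cache) in dollars for the cached block only."""
    if requests < 1:
        raise ValueError("requests must be >= 1")
    base = cached_tokens / 1_000_000 * price_per_m  # cost of sending the block once, uncached
    without_cache = base * requests
    # First request writes the cache; every later request reads it
    with_cache = base * write_mult + base * read_mult * (requests - 1)
    return without_cache, with_cache
```

With a 50K-token block at an assumed $3 per million input tokens, ten requests cost about $1.50 uncached versus about $0.32 cached. A single request costs more with caching ($0.1875 versus $0.15), so mark only blocks you expect to reuse.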
## Error Handling
| Error | Cause | Fix |
|-------|-------|-----|
| 400 `context_length_exceeded` | Input + max_tokens > model limit | Trim messages or use larger-context model |
| 400 `max_tokens too large` | max_tokens alone exceeds limit | Reduce max_tokens |
| Slow responses | Very large context | Use streaming; consider chunking |
| Degraded quality | Too much irrelevant context | Trim to relevant content only |
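The first two rows can often be caught before sending a request by checking that the prompt plus `max_tokens` fits the window. This minimal pre-flight sketch assumes you already have a prompt token estimate and the model's `context_length`. `clamp_max_tokens` and its `min_output` floor are hypothetical. It does not model a separate output-token cap, which some models enforce.

```python
def clamp_max_tokens(prompt_tokens: int, requested_max_tokens: int,
                     context_length: int, min_output: int = 256) -> int:
    """Shrink max_tokens so prompt + completion fits the window, or raise if it cannot."""
    available = context_length - prompt_tokens
    if available < min_output:
        raise ValueError(
            f"Prompt (~{prompt_tokens} tokens) leaves only {available} tokens of "
            f"a {context_length}-token window; trim messages or pick a larger model"
        )
    return min(requested_max_tokens, available)
```

Because token estimates are approximate, leave some slack: shrink `context_length` or raise `min_output` rather than filling the window exactly.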
## Enterprise Considerations
- Query `/api/v1/models` at startup to cache context limits -- don't hardcode (they change)
- Use `max_tokens` on every request to prevent runaway completion costs on large contexts
- Implement conversation trimming as middleware so all calls respect limits
- Use Anthropic prompt caching for RAG contexts that repeat across requests (90% input savings)
- Route large-context tasks to cost-effective models (Gemini Flash for 1M context at low cost)
- Monitor `prompt_tokens` in responses to detect context bloat before it hits limits
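The last bullet can be as simple as comparing each response's reported prompt tokens with the model's window. This minimal sketch uses the standard `logging` module. `check_context_usage` and the 70% warning threshold are hypothetical choices, not OpenRouter defaults.

```python
import logging

logger = logging.getLogger("openrouter.context")


def check_context_usage(model: str, prompt_tokens: int, context_length: int,
                        warn_ratio: float = 0.7) -> float:
    """Warn when a prompt uses at least warn_ratio of the window; return the fraction used."""
    ratio = prompt_tokens / context_length
    if ratio >= warn_ratio:
        logger.warning("%s prompt used %.0f%% of its %d-token window",
                       model, ratio * 100, context_length)
    return ratio
```

With the OpenAI SDK, a call might look like `check_context_usage(model, response.usage.prompt_tokens, MODELS[model]["context_length"])`. A steadily rising ratio across turns is the early signal to trim.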
## References
- [Examples](${CLAUDE_SKILL_DIR}/references/examples.md) | [Errors](${CLAUDE_SKILL_DIR}/references/errors.md)
- [Prompt Caching](https://openrouter.ai/docs/features/prompt-caching) | [Models API](https://openrouter.ai/docs/api/api-reference/models/get-models)