openrouter-streaming-setup
Implement streaming responses with OpenRouter for real-time UIs. Use when building chat interfaces, reducing time-to-first-token, or processing long completions. Triggers: 'openrouter streaming', 'openrouter sse', 'stream response openrouter', 'real-time openrouter'.
Best use case
openrouter-streaming-setup is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Implement streaming responses with OpenRouter for real-time UIs. Use when building chat interfaces, reducing time-to-first-token, or processing long completions. Triggers: 'openrouter streaming', 'openrouter sse', 'stream response openrouter', 'real-time openrouter'.
Teams using openrouter-streaming-setup should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/openrouter-streaming-setup/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How openrouter-streaming-setup Compares
| Feature / Agent | openrouter-streaming-setup | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Implement streaming responses with OpenRouter for real-time UIs. Use when building chat interfaces, reducing time-to-first-token, or processing long completions. Triggers: 'openrouter streaming', 'openrouter sse', 'stream response openrouter', 'real-time openrouter'.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
Best AI Skills for Claude
Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.
ChatGPT vs Claude for Agent Skills
Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.
SKILL.md Source
# OpenRouter Streaming Setup
## Overview
OpenRouter supports Server-Sent Events (SSE) streaming via `stream: true`, compatible with the OpenAI SDK. Streaming returns tokens as they're generated, reducing time-to-first-token (TTFT) from seconds to milliseconds. Usage stats are available via `stream_options: {include_usage: true}` in the final chunk. This skill covers Python and TypeScript streaming, SSE forwarding to browsers, and error recovery.
## Python: Basic Streaming
```python
import os
from openai import OpenAI
client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key=os.environ["OPENROUTER_API_KEY"],
default_headers={"HTTP-Referer": "https://my-app.com", "X-Title": "my-app"},
)
# Stream with usage stats
stream = client.chat.completions.create(
model="anthropic/claude-3.5-sonnet",
messages=[{"role": "user", "content": "Explain how HTTP streaming works"}],
max_tokens=500,
stream=True,
stream_options={"include_usage": True}, # Get token counts in final chunk
)
full_content = []
for chunk in stream:
if chunk.choices and chunk.choices[0].delta.content:
token = chunk.choices[0].delta.content
print(token, end="", flush=True)
full_content.append(token)
# Final chunk contains usage stats
if chunk.usage:
print(f"\n---\nTokens: {chunk.usage.prompt_tokens} in + {chunk.usage.completion_tokens} out")
result = "".join(full_content)
```
## Python: Streaming with Metrics
```python
import time
def stream_with_metrics(messages, model="anthropic/claude-3.5-sonnet", **kwargs):
"""Stream response and capture performance metrics."""
start = time.monotonic()
first_token_time = None
chunks = []
usage = None
stream = client.chat.completions.create(
model=model, messages=messages, stream=True,
stream_options={"include_usage": True},
**kwargs,
)
for chunk in stream:
if chunk.choices and chunk.choices[0].delta.content:
token = chunk.choices[0].delta.content
if first_token_time is None:
first_token_time = (time.monotonic() - start) * 1000
chunks.append(token)
yield token # Yield each token as it arrives
if chunk.usage:
usage = {
"prompt_tokens": chunk.usage.prompt_tokens,
"completion_tokens": chunk.usage.completion_tokens,
}
total_time = (time.monotonic() - start) * 1000
# Metrics available after generator exhausted
stream_with_metrics.last_metrics = {
"ttft_ms": round(first_token_time or 0),
"total_ms": round(total_time),
"usage": usage,
"model": model,
}
# Usage
for token in stream_with_metrics(
[{"role": "user", "content": "Hello"}],
model="openai/gpt-4o-mini",
max_tokens=200,
):
print(token, end="", flush=True)
print(f"\nMetrics: {stream_with_metrics.last_metrics}")
```
## TypeScript: Streaming
```typescript
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://openrouter.ai/api/v1",
apiKey: process.env.OPENROUTER_API_KEY,
defaultHeaders: { "HTTP-Referer": "https://my-app.com", "X-Title": "my-app" },
});
async function streamCompletion(prompt: string, model = "openai/gpt-4o-mini") {
const stream = await client.chat.completions.create({
model,
messages: [{ role: "user", content: prompt }],
max_tokens: 500,
stream: true,
});
const chunks: string[] = [];
for await (const chunk of stream) {
const token = chunk.choices[0]?.delta?.content;
if (token) {
process.stdout.write(token);
chunks.push(token);
}
}
return chunks.join("");
}
```
## SSE Forwarding to Browser (FastAPI)
```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
app = FastAPI()
@app.post("/v1/stream")
async def stream_endpoint(prompt: str, model: str = "openai/gpt-4o-mini"):
"""Forward OpenRouter SSE stream to browser."""
async def generate():
stream = client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": prompt}],
max_tokens=1024,
stream=True,
)
for chunk in stream:
if chunk.choices and chunk.choices[0].delta.content:
token = chunk.choices[0].delta.content
yield f"data: {json.dumps({'token': token})}\n\n"
yield "data: [DONE]\n\n"
return StreamingResponse(generate(), media_type="text/event-stream")
```
## Browser Client (JavaScript)
```javascript
// Consume SSE stream from your backend
async function streamChat(prompt) {
const response = await fetch("/v1/stream", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ prompt }),
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const text = decoder.decode(value);
for (const line of text.split("\n")) {
if (line.startsWith("data: ") && line !== "data: [DONE]") {
const data = JSON.parse(line.slice(6));
document.getElementById("output").textContent += data.token;
}
}
}
}
```
## Async Streaming (Python)
```python
from openai import AsyncOpenAI
aclient = AsyncOpenAI(
base_url="https://openrouter.ai/api/v1",
api_key=os.environ["OPENROUTER_API_KEY"],
default_headers={"HTTP-Referer": "https://my-app.com", "X-Title": "my-app"},
)
async def async_stream(messages, model="openai/gpt-4o-mini", **kwargs):
"""Async streaming for use in async web frameworks."""
stream = await aclient.chat.completions.create(
model=model, messages=messages, stream=True, **kwargs,
)
async for chunk in stream:
if chunk.choices and chunk.choices[0].delta.content:
yield chunk.choices[0].delta.content
```
## Error Handling
| Error | Cause | Fix |
|-------|-------|-----|
| Stream cuts off mid-response | Network timeout or provider error | Save partial content; implement retry from last position |
| Missing `usage` in stream | Didn't set `stream_options` | Add `stream_options: {"include_usage": True}` |
| Empty delta chunks | Keep-alive pings | Filter `chunk.choices[0].delta.content is None` |
| `finish_reason: "length"` | Hit max_tokens limit | Increase max_tokens or continue with follow-up request |
## Enterprise Considerations
- Always use `stream_options: {"include_usage": True}` to get token counts for cost tracking
- Set connection timeouts appropriate for streaming (longer than non-streaming, e.g., 120s)
- Implement heartbeat detection: if no chunks for >30s, consider the stream dead and retry
- Buffer partial tokens on the server before forwarding to the client for smoother rendering
- Log TTFT per model to benchmark streaming performance over time
- Use streaming for all user-facing requests; use non-streaming for batch/background processing
## References
- [Examples](${CLAUDE_SKILL_DIR}/references/examples.md) | [Errors](${CLAUDE_SKILL_DIR}/references/errors.md)
- [Streaming](https://openrouter.ai/docs/features/streaming) | [API Reference](https://openrouter.ai/docs/api/reference/overview)Related Skills
windsurf-multi-env-setup
Configure Windsurf IDE and Cascade AI across team members and project environments. Use when onboarding teams to Windsurf, setting up per-project Cascade configuration, or managing Windsurf settings across development, staging, and production contexts. Trigger with phrases like "windsurf team setup", "windsurf environments", "windsurf multi-project", "windsurf team config", "cascade rules per env".
webflow-multi-env-setup
Configure Webflow across development, staging, and production environments with per-environment API tokens, site IDs, and secret management via Vault/AWS/GCP. Trigger with phrases like "webflow environments", "webflow staging", "webflow dev prod", "webflow environment setup", "webflow config by env".
vercel-multi-env-setup
Configure Vercel across development, preview, and production environments with scoped secrets. Use when setting up per-environment configuration, managing environment-specific variables, or implementing environment isolation on Vercel. Trigger with phrases like "vercel environments", "vercel staging", "vercel dev prod", "vercel environment setup", "vercel env scoping".
veeva-multi-env-setup
Veeva Vault multi env setup for enterprise operations. Use when implementing advanced Veeva Vault patterns. Trigger: "veeva multi env setup".
vastai-multi-env-setup
Configure Vast.ai GPU cloud across dev, staging, and production environments. Use when isolating GPU pools per team, managing API key separation by env, or implementing spending controls per deployment tier. Trigger with phrases like "vastai environments", "vastai staging", "vastai dev prod", "vastai multi-env".
supabase-multi-env-setup
Configure Supabase across development, staging, and production with separate projects, environment-specific secrets, and safe migration promotion. Use when setting up multi-environment deployments, isolating dev from prod data, configuring per-environment Supabase projects, or promoting migrations through environments. Trigger: "supabase environments", "supabase staging", "supabase dev prod", "supabase multi-project", "supabase env config", "database branching".
speak-multi-env-setup
Configure Speak across dev, staging, and production with separate API keys and mock modes. Use when implementing multi env setup, or managing Speak language learning platform operations. Trigger with phrases like "speak multi env setup", "speak multi env setup".
snowflake-multi-env-setup
Configure Snowflake across dev, staging, and production with account-level isolation, zero-copy clones, and environment-specific RBAC. Trigger with phrases like "snowflake environments", "snowflake staging", "snowflake dev prod", "snowflake clone", "snowflake environment setup".
windsurf-workspace-setup
Initialize Windsurf workspace with project-specific AI rules. Activate when users mention "create windsurfrules", "setup workspace", "configure project ai", "initialize windsurf workspace", or "migrate to windsurf". Handles workspace configuration and team standardization. Use when working with windsurf workspace setup functionality. Trigger with phrases like "windsurf workspace setup", "windsurf setup", "windsurf".
shopify-multi-env-setup
Configure Shopify apps across development, staging, and production environments with separate stores, API credentials, and app instances. Trigger with phrases like "shopify environments", "shopify staging", "shopify dev vs prod", "shopify multi-store", "shopify environment setup".
salesforce-multi-env-setup
Configure Salesforce across Developer, Sandbox, and Production environments with proper org management. Use when setting up multi-environment deployments, configuring per-environment credentials, or implementing sandbox-to-production promotion flows. Trigger with phrases like "salesforce environments", "salesforce sandbox", "salesforce dev prod", "salesforce org management", "salesforce sandbox types".
retellai-multi-env-setup
Retell AI multi env setup — AI voice agent and phone call automation. Use when working with Retell AI for voice agents, phone calls, or telephony. Trigger with phrases like "retell multi env setup", "retellai-multi-env-setup", "voice agent".