openrouter-streaming-setup

Implement streaming responses with OpenRouter for real-time UIs. Use when building chat interfaces, reducing time-to-first-token, or processing long completions. Triggers: 'openrouter streaming', 'openrouter sse', 'stream response openrouter', 'real-time openrouter'.

1,867 stars

Best use case

openrouter-streaming-setup is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Implement streaming responses with OpenRouter for real-time UIs. Use when building chat interfaces, reducing time-to-first-token, or processing long completions. Triggers: 'openrouter streaming', 'openrouter sse', 'stream response openrouter', 'real-time openrouter'.

Teams using openrouter-streaming-setup should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/openrouter-streaming-setup/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/saas-packs/openrouter-pack/skills/openrouter-streaming-setup/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/openrouter-streaming-setup/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How openrouter-streaming-setup Compares

Feature / Agentopenrouter-streaming-setupStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Implement streaming responses with OpenRouter for real-time UIs. Use when building chat interfaces, reducing time-to-first-token, or processing long completions. Triggers: 'openrouter streaming', 'openrouter sse', 'stream response openrouter', 'real-time openrouter'.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# OpenRouter Streaming Setup

## Overview

OpenRouter supports Server-Sent Events (SSE) streaming via `stream: true`, compatible with the OpenAI SDK. Streaming returns tokens as they're generated, reducing time-to-first-token (TTFT) from seconds to milliseconds. Usage stats are available via `stream_options: {include_usage: true}` in the final chunk. This skill covers Python and TypeScript streaming, SSE forwarding to browsers, and error recovery.

## Python: Basic Streaming

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
    default_headers={"HTTP-Referer": "https://my-app.com", "X-Title": "my-app"},
)

# Stream with usage stats
stream = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Explain how HTTP streaming works"}],
    max_tokens=500,
    stream=True,
    stream_options={"include_usage": True},  # Get token counts in final chunk
)

full_content = []
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        token = chunk.choices[0].delta.content
        print(token, end="", flush=True)
        full_content.append(token)

    # Final chunk contains usage stats
    if chunk.usage:
        print(f"\n---\nTokens: {chunk.usage.prompt_tokens} in + {chunk.usage.completion_tokens} out")

result = "".join(full_content)
```

## Python: Streaming with Metrics

```python
import time

def stream_with_metrics(messages, model="anthropic/claude-3.5-sonnet", **kwargs):
    """Stream response and capture performance metrics."""
    start = time.monotonic()
    first_token_time = None
    chunks = []
    usage = None

    stream = client.chat.completions.create(
        model=model, messages=messages, stream=True,
        stream_options={"include_usage": True},
        **kwargs,
    )

    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            token = chunk.choices[0].delta.content
            if first_token_time is None:
                first_token_time = (time.monotonic() - start) * 1000
            chunks.append(token)
            yield token  # Yield each token as it arrives

        if chunk.usage:
            usage = {
                "prompt_tokens": chunk.usage.prompt_tokens,
                "completion_tokens": chunk.usage.completion_tokens,
            }

    total_time = (time.monotonic() - start) * 1000
    # Metrics available after generator exhausted
    stream_with_metrics.last_metrics = {
        "ttft_ms": round(first_token_time or 0),
        "total_ms": round(total_time),
        "usage": usage,
        "model": model,
    }

# Usage
for token in stream_with_metrics(
    [{"role": "user", "content": "Hello"}],
    model="openai/gpt-4o-mini",
    max_tokens=200,
):
    print(token, end="", flush=True)
print(f"\nMetrics: {stream_with_metrics.last_metrics}")
```

## TypeScript: Streaming

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
  defaultHeaders: { "HTTP-Referer": "https://my-app.com", "X-Title": "my-app" },
});

async function streamCompletion(prompt: string, model = "openai/gpt-4o-mini") {
  const stream = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }],
    max_tokens: 500,
    stream: true,
  });

  const chunks: string[] = [];
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content;
    if (token) {
      process.stdout.write(token);
      chunks.push(token);
    }
  }
  return chunks.join("");
}
```

## SSE Forwarding to Browser (FastAPI)

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/v1/stream")
async def stream_endpoint(prompt: str, model: str = "openai/gpt-4o-mini"):
    """Forward OpenRouter SSE stream to browser."""
    async def generate():
        stream = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=1024,
            stream=True,
        )
        for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                token = chunk.choices[0].delta.content
                yield f"data: {json.dumps({'token': token})}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")
```

## Browser Client (JavaScript)

```javascript
// Consume SSE stream from your backend
async function streamChat(prompt) {
  const response = await fetch("/v1/stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    const text = decoder.decode(value);
    for (const line of text.split("\n")) {
      if (line.startsWith("data: ") && line !== "data: [DONE]") {
        const data = JSON.parse(line.slice(6));
        document.getElementById("output").textContent += data.token;
      }
    }
  }
}
```

## Async Streaming (Python)

```python
from openai import AsyncOpenAI

aclient = AsyncOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
    default_headers={"HTTP-Referer": "https://my-app.com", "X-Title": "my-app"},
)

async def async_stream(messages, model="openai/gpt-4o-mini", **kwargs):
    """Async streaming for use in async web frameworks."""
    stream = await aclient.chat.completions.create(
        model=model, messages=messages, stream=True, **kwargs,
    )
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

## Error Handling

| Error | Cause | Fix |
|-------|-------|-----|
| Stream cuts off mid-response | Network timeout or provider error | Save partial content; implement retry from last position |
| Missing `usage` in stream | Didn't set `stream_options` | Add `stream_options: {"include_usage": True}` |
| Empty delta chunks | Keep-alive pings | Filter `chunk.choices[0].delta.content is None` |
| `finish_reason: "length"` | Hit max_tokens limit | Increase max_tokens or continue with follow-up request |

## Enterprise Considerations

- Always use `stream_options: {"include_usage": True}` to get token counts for cost tracking
- Set connection timeouts appropriate for streaming (longer than non-streaming, e.g., 120s)
- Implement heartbeat detection: if no chunks for >30s, consider the stream dead and retry
- Buffer partial tokens on the server before forwarding to the client for smoother rendering
- Log TTFT per model to benchmark streaming performance over time
- Use streaming for all user-facing requests; use non-streaming for batch/background processing

## References

- [Examples](${CLAUDE_SKILL_DIR}/references/examples.md) | [Errors](${CLAUDE_SKILL_DIR}/references/errors.md)
- [Streaming](https://openrouter.ai/docs/features/streaming) | [API Reference](https://openrouter.ai/docs/api/reference/overview)

Related Skills

windsurf-multi-env-setup

1868
from jeremylongshore/claude-code-plugins-plus-skills

Configure Windsurf IDE and Cascade AI across team members and project environments. Use when onboarding teams to Windsurf, setting up per-project Cascade configuration, or managing Windsurf settings across development, staging, and production contexts. Trigger with phrases like "windsurf team setup", "windsurf environments", "windsurf multi-project", "windsurf team config", "cascade rules per env".

webflow-multi-env-setup

1868
from jeremylongshore/claude-code-plugins-plus-skills

Configure Webflow across development, staging, and production environments with per-environment API tokens, site IDs, and secret management via Vault/AWS/GCP. Trigger with phrases like "webflow environments", "webflow staging", "webflow dev prod", "webflow environment setup", "webflow config by env".

vercel-multi-env-setup

1868
from jeremylongshore/claude-code-plugins-plus-skills

Configure Vercel across development, preview, and production environments with scoped secrets. Use when setting up per-environment configuration, managing environment-specific variables, or implementing environment isolation on Vercel. Trigger with phrases like "vercel environments", "vercel staging", "vercel dev prod", "vercel environment setup", "vercel env scoping".

veeva-multi-env-setup

1868
from jeremylongshore/claude-code-plugins-plus-skills

Veeva Vault multi env setup for enterprise operations. Use when implementing advanced Veeva Vault patterns. Trigger: "veeva multi env setup".

vastai-multi-env-setup

1868
from jeremylongshore/claude-code-plugins-plus-skills

Configure Vast.ai GPU cloud across dev, staging, and production environments. Use when isolating GPU pools per team, managing API key separation by env, or implementing spending controls per deployment tier. Trigger with phrases like "vastai environments", "vastai staging", "vastai dev prod", "vastai multi-env".

supabase-multi-env-setup

1868
from jeremylongshore/claude-code-plugins-plus-skills

Configure Supabase across development, staging, and production with separate projects, environment-specific secrets, and safe migration promotion. Use when setting up multi-environment deployments, isolating dev from prod data, configuring per-environment Supabase projects, or promoting migrations through environments. Trigger: "supabase environments", "supabase staging", "supabase dev prod", "supabase multi-project", "supabase env config", "database branching".

speak-multi-env-setup

1868
from jeremylongshore/claude-code-plugins-plus-skills

Configure Speak across dev, staging, and production with separate API keys and mock modes. Use when implementing multi env setup, or managing Speak language learning platform operations. Trigger with phrases like "speak multi env setup", "speak multi env setup".

snowflake-multi-env-setup

1868
from jeremylongshore/claude-code-plugins-plus-skills

Configure Snowflake across dev, staging, and production with account-level isolation, zero-copy clones, and environment-specific RBAC. Trigger with phrases like "snowflake environments", "snowflake staging", "snowflake dev prod", "snowflake clone", "snowflake environment setup".

windsurf-workspace-setup

1868
from jeremylongshore/claude-code-plugins-plus-skills

Initialize Windsurf workspace with project-specific AI rules. Activate when users mention "create windsurfrules", "setup workspace", "configure project ai", "initialize windsurf workspace", or "migrate to windsurf". Handles workspace configuration and team standardization. Use when working with windsurf workspace setup functionality. Trigger with phrases like "windsurf workspace setup", "windsurf setup", "windsurf".

shopify-multi-env-setup

1868
from jeremylongshore/claude-code-plugins-plus-skills

Configure Shopify apps across development, staging, and production environments with separate stores, API credentials, and app instances. Trigger with phrases like "shopify environments", "shopify staging", "shopify dev vs prod", "shopify multi-store", "shopify environment setup".

salesforce-multi-env-setup

1868
from jeremylongshore/claude-code-plugins-plus-skills

Configure Salesforce across Developer, Sandbox, and Production environments with proper org management. Use when setting up multi-environment deployments, configuring per-environment credentials, or implementing sandbox-to-production promotion flows. Trigger with phrases like "salesforce environments", "salesforce sandbox", "salesforce dev prod", "salesforce org management", "salesforce sandbox types".

retellai-multi-env-setup

1868
from jeremylongshore/claude-code-plugins-plus-skills

Retell AI multi env setup — AI voice agent and phone call automation. Use when working with Retell AI for voice agents, phone calls, or telephony. Trigger with phrases like "retell multi env setup", "retellai-multi-env-setup", "voice agent".