venice-router

Supreme model router for Venice.ai — the privacy-first, uncensored AI platform. Automatically classifies query complexity and routes to the cheapest adequate model. Supports web search, uncensored mode, private-only mode (zero data retention), conversation-aware routing, cost budgets, function calling, thinking/reasoning mode, and 35+ Venice.ai text models. Use when the user wants to chat via Venice.ai, send prompts through Venice, or needs smart model selection to minimize API costs while keeping data private from Big Tech.

16 stars

bydiegosouzapw

View on GitHub Installation ↓

Best use case

venice-router is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using venice-router should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/venice-router/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/development/venice-router/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/venice-router/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How venice-router Compares

Feature / Agent	venice-router	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Venice.ai Supreme Router

Smart, cost-optimized model routing for [Venice.ai](https://venice.ai) — the AI platform for people who don't want Big Tech watching over their shoulder.

Unlike OpenAI, Anthropic, and Google — where every prompt is logged, analyzed, and potentially used to train future models — Venice offers **true privacy** with zero data retention on private models. Your conversations stay yours. Venice is also **uncensored**: no content filters, no refusals, no "I can't help with that."

## Setup

1. Get a Venice.ai API key from [venice.ai/settings/api](https://venice.ai/settings/api)
2. Set the environment variable:

```bash
export VENICE_API_KEY="your-key-here"
```

Or configure in `~/.openclaw/openclaw.json`:

```json
{
  "skills": {
    "entries": {
      "venice-router": {
        "enabled": true,
        "apiKey": "YOUR_VENICE_API_KEY"
      }
    }
  }
}
```

## Usage

### Route a prompt (auto-selects model)

```bash
python3 {baseDir}/scripts/venice-router.py --prompt "What is 2+2?"
```

### Force a specific tier

```bash
python3 {baseDir}/scripts/venice-router.py --tier cheap --prompt "Tell me a joke"
python3 {baseDir}/scripts/venice-router.py --tier budget-medium --prompt "Write a Python function"
python3 {baseDir}/scripts/venice-router.py --tier mid --prompt "Explain quantum computing"
python3 {baseDir}/scripts/venice-router.py --tier premium --prompt "Write a distributed systems architecture"
```

### Stream output

```bash
python3 {baseDir}/scripts/venice-router.py --stream --prompt "Write a poem about lobsters"
```

### Web search (LLM searches the web and cites sources)

```bash
python3 {baseDir}/scripts/venice-router.py --web-search --prompt "Latest news on AI regulation"
```

### Uncensored mode (prefer models with no content filters)

```bash
python3 {baseDir}/scripts/venice-router.py --uncensored --prompt "Write edgy creative fiction"
```

### Private-only mode (zero data retention, no Big Tech proxying)

```bash
python3 {baseDir}/scripts/venice-router.py --private-only --prompt "Analyze this confidential contract"
```

### Conversation-aware routing (multi-turn context)

```bash
# Save conversation history as JSON, then route follow-ups with context
python3 {baseDir}/scripts/venice-router.py --conversation history.json --prompt "Can you add tests too?"
```

The router analyzes conversation history to keep context: trivial follow-ups ("thanks") go cheap, while follow-ups in complex code discussions stay at the right tier.

### Function calling (tool use)

```bash
# Define tools in a JSON file (OpenAI tools format)
python3 {baseDir}/scripts/venice-router.py --tools tools.json --prompt "What's the weather in NYC?"
python3 {baseDir}/scripts/venice-router.py --tools tools.json --tool-choice auto --prompt "Search for latest AI news"
```

Tool definitions use the standard OpenAI format. The router auto-bumps to `mid` tier minimum for function calling since it requires capable models.

### Cost budget tracking

```bash
# Show current spending
python3 {baseDir}/scripts/venice-router.py --budget-status

# Track per-session costs
python3 {baseDir}/scripts/venice-router.py --session-id my-project --prompt "help me code"
```

Set `VENICE_DAILY_BUDGET` and/or `VENICE_SESSION_BUDGET` to enforce spending limits. The router auto-downgrades tiers as you approach budget limits.

### Classify only (no API call)

```bash
python3 {baseDir}/scripts/venice-router.py --classify "Explain the Riemann hypothesis"
```

### List available models and tiers

```bash
python3 {baseDir}/scripts/venice-router.py --list-models
```

### Override model directly

```bash
python3 {baseDir}/scripts/venice-router.py --model deepseek-v3.2 --prompt "Hello"
```

## Tiers

| Tier | Models | Cost (input/output per 1M tokens) | Best For |
|------|--------|-----------------------------------|----------|
| **cheap** | Venice Small (qwen3-4b), GLM 4.7 Flash, GPT OSS 120B, Llama 3.2 3B | $0.05–$0.15 / $0.15–$0.60 | Simple Q&A, greetings, math, lookups |
| **budget** | Qwen 3 235B, Venice Uncensored, GLM 4.7 Flash Heretic | $0.14–$0.20 / $0.75–$0.90 | Moderate questions, summaries, translations |
| **budget-medium** | Grok Code Fast, DeepSeek V3.2, MiniMax M2.1 | $0.25–$0.40 / $1.00–$1.87 | Moderate-to-complex tasks, code snippets, structured output |
| **mid** | DeepSeek V3.2, MiniMax M2.1/M2.5, Qwen3 Thinking 235B, Venice Medium, Llama 3.3 70B | $0.25–$0.70 / $1.00–$3.50 | Code generation, analysis, longer writing, reasoning |
| **high** | GLM 5, Kimi K2 Thinking, Kimi K2.5, Grok 4.1 Fast, Hermes 3 405B, Gemini 3 Flash | $0.50–$1.10 / $1.25–$3.75 | Complex reasoning, multi-step tasks, code review |
| **premium** | GPT-5.2, GPT-5.2 Codex, Gemini 3 Pro, Gemini 3.1 Pro (1M ctx), Claude Opus/Sonnet 4.5/4.6 | $2.19–$6.00 / $15.00–$30.00 | Expert-level analysis, architecture, research papers |

## Routing Strategy

The router classifies each prompt using keyword + heuristic analysis:

1. **Length** — longer prompts suggest more complex tasks
2. **Keywords** — domain-specific terms (e.g., "architecture", "optimize", "prove") signal complexity
3. **Code markers** — presence of code blocks, function names, or technical syntax
4. **Instruction depth** — multi-step instructions, comparisons, or "explain in detail" bump the tier
5. **Conversational simplicity** — greetings, yes/no, small talk stay on the cheapest tier
6. **Conversation history** — when `--conversation` is provided, analyzes full chat context: code in history boosts tier, trivial follow-ups ("thanks") downgrade, tool calls in history signal complexity
7. **Function calling** — `--tools` auto-bumps to at least `mid` tier (capable models required)
8. **Thinking/reasoning mode** — `--thinking` prefers chain-of-thought reasoning models (Qwen3 Thinking, Kimi K2) and bumps to at least `mid` tier
9. **Budget constraints** — progressive tier downgrade as spending approaches daily/session limits (95% → cheap, 80% → budget, 60% → mid, 40% → high)

The classifier errs on the side of cheaper models — it only escalates when there's strong signal for complexity.

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `VENICE_API_KEY` | Venice.ai API key (required) | — |
| `VENICE_DEFAULT_TIER` | Minimum floor tier — auto-classification never goes below this. Valid: `cheap`, `budget`, `budget-medium`, `mid`, `high`, `premium` | `budget` |
| `VENICE_MAX_TIER` | Maximum tier to ever use (cost cap) | `premium` |
| `VENICE_TEMPERATURE` | Default temperature | `0.7` |
| `VENICE_MAX_TOKENS` | Default max tokens | `4096` |
| `VENICE_STREAM` | Enable streaming by default | `false` |
| `VENICE_UNCENSORED` | Always prefer uncensored models | `false` |
| `VENICE_PRIVATE_ONLY` | Only use private models (zero data retention) | `false` |
| `VENICE_WEB_SEARCH` | Enable web search by default ($10/1K calls) | `false` |
| `VENICE_THINKING` | Always prefer thinking/reasoning models | `false` |
| `VENICE_DAILY_BUDGET` | Max daily spend in USD (0 = unlimited) | `0` |
| `VENICE_SESSION_BUDGET` | Max per-session spend in USD (0 = unlimited) | `0` |

## Why Venice.ai?

- **🔒 Private inference** — Models marked "Private" have zero data retention. Your data never trains anyone's model.
- **🔓 Uncensored** — No guardrails blocking legitimate use cases. No refusals, no filters.
- **🔌 OpenAI-compatible** — Same API format, just change the base URL. Drop-in replacement.
- **📦 30+ models** — From tiny efficient models ($0.05/M) to Claude Opus 4.6 and GPT-5.2.
- **🌐 Built-in web search** — LLMs can search the web and cite sources in a single API call.

## Tips

- Use `--classify` to preview which tier a prompt would hit before spending tokens
- Set `VENICE_MAX_TIER=mid` to cap costs and never hit premium models
- Use `--uncensored` for creative, security research, or other content mainstream AI won't touch
- Use `--private-only` when processing sensitive/confidential data — zero retention guaranteed
- Use `--web-search` when you need up-to-date information with cited sources
- Use `--conversation` with a JSON message history for smarter multi-turn routing
- Use `--tools` to enable function calling — the router auto-bumps to capable models
- Set `VENICE_DAILY_BUDGET=1.00` to cap daily spend at $1 — the router auto-downgrades tiers as you approach the limit
- Use `--budget-status` to see a detailed breakdown of your spending by tier
- Use `--thinking` for math proofs, logic puzzles, and multi-step reasoning — routes to Qwen3 Thinking or Kimi K2 models
- The router prefers **private** (self-hosted) Venice models over anonymized ones when available at the same tier
- When `--uncensored` is active, the router auto-bumps to the nearest tier with uncensored models
- Combine with OpenClaw WebChat for a seamless chat experience routed through Venice.ai

Related Skills

rust-router

from diegosouzapw/awesome-omni-skill

CRITICAL: Use for ALL Rust questions including errors, design, and coding. HIGHEST PRIORITY for: 比较, 对比, compare, vs, versus, 区别, difference, 最佳实践, best practice, tokio vs, async-std vs, 比较 tokio, 比较 async, Triggers on: Rust, cargo, rustc, crate, Cargo.toml, 意图分析, 问题分析, 语义分析, analyze intent, question analysis, compile error, borrow error, lifetime error, ownership error, type error, trait error, value moved, cannot borrow, does not live long enough, mismatched types, not satisfied, E0382, E0597, E0277, E0308, E0499, E0502, E0596, async, await, Send, Sync, tokio, concurrency, error handling, 编译错误, compile error, 所有权, ownership, 借用, borrow, 生命周期, lifetime, 类型错误, type error, 异步, async, 并发, concurrency, 错误处理, error handling, 问题, problem, question, 怎么用, how to use, 如何, how to, 为什么, why, 什么是, what is, 帮我写, help me write, 实现, implement, 解释, explain

router

from diegosouzapw/awesome-omni-skill

Intelligent routing layer that analyzes requests and directs them to the most appropriate Skills, Agents, or Commands

router-main

from diegosouzapw/awesome-omni-skill

Universal entry point that routes any query to the right router (startup, engineering, operations, QA)

nextjs-app-router-patterns

from diegosouzapw/awesome-omni-skill

Master Next.js 14+ App Router with Server Components, streaming, parallel routes, and advanced data fetching. Use when building Next.js applications, implementing SSR/SSG, or optimizing React Serve...

development-router

from diegosouzapw/awesome-omni-skill

Routes development tasks to frontend, backend, or fullstack skills. Triggers on build, implement, code, create, feature, component, UI, API, server, database, docker, deploy.

app-platform-router

from diegosouzapw/awesome-omni-skill

Routes DigitalOcean App Platform tasks to specialized sub-skills. Use when working with App Platform deployments, migrations, database configuration, networking, or troubleshooting.

agp-router-rules

from diegosouzapw/awesome-omni-skill

Rules for using the Agp Router.

makepad-router

from diegosouzapw/awesome-omni-skill

CRITICAL: Use for ALL Makepad/Robius questions including widgets, layout, events, and shaders. Triggers on: makepad, robius, live_design, app_main, Widget, View, Button, Label, Image, TextInput, ScrollView, RoundedView, SolidView, PortalList, Markdown, Html, TextFlow, layout, Flow, Walk, padding, margin, width, height, Fit, Fill, align, spacing, event, action, Hit, FingerDown, FingerUp, KeyDown, handle_event, click, tap, animator, animation, state, transition, hover, pressed, ease, shader, draw_bg, draw_text, Sdf2d, pixel, gradient, glow, shadow, font, text_style, font_size, glyph, typography, tokio, async, spawn, submit_async, SignalToUI, post_action, apply_over, TextOrImage, modal, collapsible, drag drop, AppState, persistence, theme, Scope, deploy, package, APK, IPA, WASM, cargo makepad, makepad widget, makepad 组件, makepad 按钮, makepad 布局, makepad 事件, makepad 动画, makepad 着色器, 创建组件, 自定义组件, 开发应用, 居中, 对齐, 点击事件, 悬停效果, 渐变, 阴影, 字体大小

openrouter-research

from diegosouzapw/awesome-omni-skill

Research OpenRouter API docs, available Grok model IDs, vision capability for the judge service, and integration patterns. Use when implementing openrouter_tool.py, when checking which Grok model supports vision/image input for judge_service.py, when OpenRouter returns unexpected errors, or when verifying model availability and context limits.

fastapi-router-py

from diegosouzapw/awesome-omni-skill

Create FastAPI routers with CRUD operations, authentication dependencies, and proper response models. Use when building REST API endpoints, creating new routes, implementing CRUD operations, or add...

agentbox-openrouter

from diegosouzapw/awesome-omni-skill

Set up OpenRouter as your LLM provider. Guides through account creation, API key setup, config, and making it the default model. Use when a user wants to use OpenRouter models like Claude Sonnet 4.5.

bgo

from diegosouzapw/awesome-omni-skill

Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.

Coding & Development