local-stt
Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).
Best use case
local-stt is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using local-stt should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/local-stt/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
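The manual steps above can also be scripted. The following is a minimal sketch, assuming SKILL.md has already been downloaded to a local path. The function name `install_skill` and its parameters are hypothetical and are not part of the skill itself.

```python
import shutil
from pathlib import Path


def install_skill(skill_md: Path, project_root: Path) -> Path:
    """Copy a downloaded SKILL.md into <project_root>/.claude/skills/local-stt/."""
    target_dir = project_root / ".claude" / "skills" / "local-stt"
    # Create the skill folder (and any missing parents) if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "SKILL.md"
    shutil.copyfile(skill_md, target)
    return target
```

For example, `install_skill(Path("Downloads/SKILL.md"), Path("."))` would place the file under the current project. The agent still needs to be restarted afterwards.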
How local-stt Compares
| Feature / Agent | local-stt | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Local STT (Parakeet / Whisper)
Unified local speech-to-text using ONNX Runtime with int8 quantization. Choose your backend:
- **Parakeet** (default): Best accuracy for English, correctly captures names and filler words
- **Whisper**: Fastest inference, supports 99 languages
## Usage
```bash
# Default: Parakeet v2 (best English accuracy)
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg
# Explicit backend selection
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b whisper
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b parakeet -m v3
# Quiet mode (suppress progress)
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg --quiet
```
## Options
- `-b/--backend`: `parakeet` (default), `whisper`
- `-m/--model`: Model variant (see below)
- `--no-int8`: Disable int8 quantization
- `-q/--quiet`: Suppress progress
- `--room-id`: Matrix room ID for direct message
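If you want to call the script from Python rather than from a shell, the options above map naturally onto an argument list. The sketch below is illustrative only: `build_command` and `transcribe` are hypothetical helper names, the script path is taken from the usage examples, and it assumes (unconfirmed by this document) that the transcript is written to standard output.

```python
import os
import subprocess

# Path taken from the usage examples; adjust if the skill lives elsewhere.
SCRIPT = os.path.expanduser("~/.openclaw/skills/local-stt/scripts/local-stt.py")


def build_command(audio_path, backend=None, model=None, int8=True,
                  quiet=False, room_id=None):
    """Assemble the argv list for local-stt.py from the documented options."""
    cmd = [SCRIPT, audio_path]
    if backend is not None:
        cmd += ["--backend", backend]
    if model is not None:
        cmd += ["--model", model]
    if not int8:
        cmd.append("--no-int8")
    if quiet:
        cmd.append("--quiet")
    if room_id is not None:
        cmd += ["--room-id", room_id]
    return cmd


def transcribe(audio_path, **options):
    """Run the script and return its stdout (assumed to hold the transcript)."""
    options.setdefault("quiet", True)  # keep progress output out of the result
    result = subprocess.run(
        build_command(audio_path, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.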
## Models
### Parakeet (default backend)
| Model | Description |
|-------|-------------|
| **v2** (default) | English only, best accuracy |
| v3 | Multilingual |
### Whisper
| Model | Description |
|-------|-------------|
| tiny | Fastest, lower accuracy |
| **base** (default) | Good balance |
| small | Better accuracy |
| large-v3-turbo | Best quality, slower |
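A caller can check a backend and model combination before launching the script. The sketch below simply encodes the two tables above. `resolve_model` is a hypothetical helper, and the script itself may accept variants that are not listed here.

```python
# Model names and defaults copied from the tables above.
MODELS = {
    "parakeet": {"v2", "v3"},
    "whisper": {"tiny", "base", "small", "large-v3-turbo"},
}
DEFAULT_BACKEND = "parakeet"
DEFAULT_MODEL = {"parakeet": "v2", "whisper": "base"}


def resolve_model(backend=None, model=None):
    """Fill in documented defaults and reject combinations not in the tables."""
    backend = backend or DEFAULT_BACKEND
    if backend not in MODELS:
        raise ValueError(f"unknown backend: {backend!r}")
    model = model or DEFAULT_MODEL[backend]
    if model not in MODELS[backend]:
        raise ValueError(f"model {model!r} is not listed for backend {backend!r}")
    return backend, model
```

With no arguments this resolves to Parakeet v2, and `resolve_model("whisper")` resolves to the Whisper base model, matching the defaults marked in bold.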
## Benchmark (24s audio)
| Backend/Model | Time | RTF | Notes |
|---------------|------|-----|-------|
| Whisper Base int8 | 0.43s | 0.018x | Fastest |
| **Parakeet v2 int8** | 0.60s | 0.025x | Best accuracy |
| Parakeet v3 int8 | 0.63s | 0.026x | Multilingual |
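The RTF column is the real-time factor: processing time divided by audio duration, so lower is faster. A small sketch that recomputes it from the timings in the table (the function name `real_time_factor` is illustrative):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time as a fraction of the audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


AUDIO_SECONDS = 24.0  # clip length used in the benchmark
TIMINGS = {
    "Whisper Base int8": 0.43,
    "Parakeet v2 int8": 0.60,
    "Parakeet v3 int8": 0.63,
}

for name, seconds in TIMINGS.items():
    rtf = real_time_factor(seconds, AUDIO_SECONDS)
    print(f"{name}: RTF {rtf:.3f}x")
```

Read the other way round, an RTF of about 0.018 means the 24-second clip is transcribed roughly 56 times faster than real time.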
## openclaw.json
```json
{
"tools": {
"media": {
"audio": {
"enabled": true,
"models": [
{
"type": "cli",
"command": "~/.openclaw/skills/local-stt/scripts/local-stt.py",
"args": ["--quiet", "{{MediaPath}}"],
"timeoutSeconds": 30
}
]
}
}
}
}
```
Related Skills
local-first-llm
Routes LLM requests to a local model (Ollama, LM Studio, llamafile) before falling back to cloud APIs.
qwen3-tts-local-inference
Generate speech from text using Qwen3-TTS via direct Python inference — no server required.
local-system-info
Return system metrics (CPU, RAM, disk, processes) using psutil.
iyeque-local-system-info
Return system metrics (CPU, RAM, disk, processes) using psutil.
whisper-mlx-local
Free local speech-to-text for Telegram and WhatsApp using MLX Whisper on Apple Silicon. Private, no API costs.
parakeet-local-asr
Install and operate local NVIDIA Parakeet ASR for OpenClaw with an OpenAI-compatible transcription API.
browser-use-local
Use when you need browser automation via the browser-use CLI or Python code in this OpenClaw container/host: open pages, click/type, take screenshots, extract HTML/links, or run an Agent with an OpenAI-compatible LLM (e.g. Moonshot/Kimi) using a custom base_url. Also use for debugging browser-use sessions (state empty, page readiness timeouts), and for extracting login QR codes from demo/login pages via screenshots or HTML data:image.
zvec-local-rag-service
Operate an always-on local semantic-search service using zvec + Ollama embeddings.
shodh-local
Local Shodh-Memory v0.1.74 (offline cognitive memory for AI agents). Use for persistent remembering, semantic recall, GTD todos/projects, knowledge graph. Triggers: "remember/save/merke X", "recall/Erinnere/search memories about Y", "todos/add/complete", "projects", "proactive context", "what learned about Z". Server localhost:3030 (amber-seaslug), key in TOOLS.md. Hebbian learning, 3-tier (working/session/LTM), TUI dashboard.
comfyui-local
Generate high-quality images using a local ComfyUI instance.
local-task-runner
This skill provides a mechanism to execute Node.js code snippets or full scripts locally on the host machine.
localsend
Send and receive files to/from nearby devices using the LocalSend protocol.