local-whisper
Local speech-to-text using OpenAI Whisper. Runs fully offline after the model is downloaded. High-quality transcription with multiple model sizes.
Best use case
local-whisper is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using local-whisper should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/local-whisper/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
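The manual steps above can be sketched as a small shell helper. This is a minimal sketch, not an official installer: the function name `install_skill` is hypothetical, and it assumes SKILL.md has already been downloaded to a local path. Only the destination path `.claude/skills/local-whisper/SKILL.md` comes from the instructions above.

```bash
# Hypothetical helper: copy an already-downloaded SKILL.md into a project.
# Usage: install_skill <path-to-SKILL.md> [project-root]
install_skill() {
    src="$1"
    project_root="${2:-.}"   # defaults to the current directory
    dest_dir="$project_root/.claude/skills/local-whisper"

    # Refuse to continue if the source file is missing.
    if [ ! -f "$src" ]; then
        echo "SKILL.md not found at: $src" >&2
        return 1
    fi

    # Create the skill directory if needed, then copy the file into place.
    mkdir -p "$dest_dir" && cp "$src" "$dest_dir/SKILL.md"
}

# Example (run from the project root, then restart your AI agent):
#   install_skill ./SKILL.md .
```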
How local-whisper Compares
| Feature / Agent | local-whisper | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Local speech-to-text using OpenAI Whisper. Runs fully offline after the model is downloaded. High-quality transcription with multiple model sizes.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Local Whisper STT

Local speech-to-text using OpenAI's Whisper. **Fully offline** after initial model download.

## Usage

```bash
# Basic
~/.clawdbot/skills/local-whisper/scripts/local-whisper audio.wav

# Better model
~/.clawdbot/skills/local-whisper/scripts/local-whisper audio.wav --model turbo

# With timestamps
~/.clawdbot/skills/local-whisper/scripts/local-whisper audio.wav --timestamps --json
```

## Models

| Model | Size | Notes |
|-------|------|-------|
| `tiny` | 39M | Fastest |
| `base` | 74M | **Default** |
| `small` | 244M | Good balance |
| `turbo` | 809M | Best speed/quality |
| `large-v3` | 1.5GB | Maximum accuracy |

## Options

- `--model/-m`: Model size (default: base)
- `--language/-l`: Language code (auto-detect if omitted)
- `--timestamps/-t`: Include word timestamps
- `--json/-j`: JSON output
- `--quiet/-q`: Suppress progress

## Setup

Uses uv-managed venv at `.venv/`. To reinstall:

```bash
cd ~/.clawdbot/skills/local-whisper
uv venv .venv --python 3.12
uv pip install --python .venv/bin/python click openai-whisper torch --index-url https://download.pytorch.org/whl/cpu
```
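The commands in the Usage section handle one file at a time. To transcribe a whole directory, they could be wrapped in a loop, roughly as sketched below. The helper name `transcribe_dir` and the `WHISPER_BIN` variable are hypothetical. The sketch also assumes the script writes the transcript to standard output, which the source does not state. Only the script path and the `--model` and `--quiet` flags come from the SKILL.md.

```bash
# Script path from the Usage section; set WHISPER_BIN to override it.
WHISPER_BIN="${WHISPER_BIN:-$HOME/.clawdbot/skills/local-whisper/scripts/local-whisper}"

# Hypothetical helper: transcribe every .wav file in a directory.
# Usage: transcribe_dir <audio-dir> [model]
transcribe_dir() {
    audio_dir="$1"
    model="${2:-base}"   # "base" is the documented default model

    for audio in "$audio_dir"/*.wav; do
        # Skip the literal pattern when the glob matches nothing.
        [ -e "$audio" ] || continue

        out="${audio%.wav}.txt"

        # Assumption: the transcript is printed to stdout.
        if "$WHISPER_BIN" "$audio" --model "$model" --quiet > "$out"; then
            echo "ok: $out"
        else
            echo "failed: $audio" >&2
            rm -f "$out"   # do not leave a partial transcript behind
        fi
    done
}

# Example:
#   transcribe_dir ./recordings turbo
```

Each transcript is written next to its audio file with a `.txt` extension. Failed files are reported on stderr so that one bad recording does not stop the rest of the batch.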
Related Skills
mlx-whisper
Local speech-to-text with MLX Whisper (Apple Silicon optimized, no API key).
local-first-llm
Routes LLM requests to a local model (Ollama, LM Studio, llamafile) before falling back to cloud APIs.
qwen3-tts-local-inference
Generate speech from text using Qwen3-TTS via direct Python inference — no server required.
local-system-info
Return system metrics (CPU, RAM, disk, processes) using psutil.
iyeque-local-system-info
Return system metrics (CPU, RAM, disk, processes) using psutil.
whisper-mlx-local
Free local speech-to-text for Telegram and WhatsApp using MLX Whisper on Apple Silicon. Private, no API costs.
parakeet-local-asr
Install and operate local NVIDIA Parakeet ASR for OpenClaw with an OpenAI-compatible transcription API.
llmwhisperer
Extract text and layout from images and PDFs using LLMWhisperer API. Good for handwriting and complex forms.
whisper
End-to-end encrypted agent-to-agent private messaging via Moltbook dead drops. Use when agents need to communicate privately, exchange secrets, or coordinate without human visibility.
browser-use-local
Use when you need browser automation via the browser-use CLI or Python code in this OpenClaw container/host: open pages, click/type, take screenshots, extract HTML/links, or run an Agent with an OpenAI-compatible LLM (e.g. Moonshot/Kimi) using a custom base_url. Also use for debugging browser-use sessions (state empty, page readiness timeouts), and for extracting login QR codes from demo/login pages via screenshots or HTML data:image.
zvec-local-rag-service
Operate an always-on local semantic-search service using zvec + Ollama embeddings.
shodh-local
Local Shodh-Memory v0.1.74 (offline cognitive memory for AI agents). Use for persistent remembering, semantic recall, GTD todos/projects, knowledge graph. Triggers: "remember/save/merke X", "recall/Erinnere/search memories about Y", "todos/add/complete", "projects", "proactive context", "what learned about Z". Server localhost:3030 (amber-seaslug), key in TOOLS.md. Hebbian learning, 3-tier (working/session/LTM), TUI dashboard.