local-whisper

Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High quality transcription with multiple model sizes.

7 stars

Best use case

local-whisper is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using local-whisper should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/local-whisper/SKILL.md --create-dirs "https://raw.githubusercontent.com/Demerzels-lab/elsamultiskillagent/main/public/skills/araa47/local-whisper/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/local-whisper/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill
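Steps 1 and 2 can also be scripted from the project root. This is a minimal sketch, assuming `curl` is installed. `install_local_whisper_skill` is a hypothetical helper name, and the `SKILL_URL` override is an illustrative assumption; by default it uses the same raw GitHub URL as the one-line installer.

```bash
# Hypothetical helper: download SKILL.md into <project>/.claude/skills/local-whisper/.
# SKILL_URL (an assumed override) defaults to the raw GitHub URL used by the one-line installer.
install_local_whisper_skill() {
  local project_dir="${1:-.}"
  local url="${SKILL_URL:-https://raw.githubusercontent.com/Demerzels-lab/elsamultiskillagent/main/public/skills/araa47/local-whisper/SKILL.md}"
  local dest="$project_dir/.claude/skills/local-whisper"
  mkdir -p "$dest"
  # -f fails on HTTP errors instead of saving an error page,
  # -sS hides the progress bar but still prints errors, -L follows redirects.
  curl -fsSL -o "$dest/SKILL.md" "$url"
}
```

Running `install_local_whisper_skill .` from the project root covers steps 1 and 2. Step 3, restarting the agent, still has to be done by hand.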

How local-whisper Compares

| Feature | local-whisper | Standard approach |
|---------|---------------|-------------------|
| Platform support | Not specified | Limited / varies |
| Context awareness | High | Baseline |
| Installation complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High quality transcription with multiple model sizes.

Where can I find the source code?

The source is in the Demerzels-lab/elsamultiskillagent repository on GitHub, under public/skills/araa47/local-whisper/. The installation command above downloads SKILL.md from that path.

SKILL.md Source

# Local Whisper STT

Local speech-to-text using OpenAI's Whisper. **Fully offline** after initial model download.

## Usage

```bash
# Basic
~/.clawdbot/skills/local-whisper/scripts/local-whisper audio.wav

# Better model
~/.clawdbot/skills/local-whisper/scripts/local-whisper audio.wav --model turbo

# With timestamps
~/.clawdbot/skills/local-whisper/scripts/local-whisper audio.wav --timestamps --json
```
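To transcribe a whole folder, you can run the same command in a loop. This is a minimal sketch. It assumes the script prints the transcript to standard output, which this section does not actually state. `transcribe_dir` and the `LOCAL_WHISPER` path override are hypothetical names.

```bash
# Hypothetical helper: transcribe every .wav file in a directory with local-whisper,
# writing each transcript next to its audio file as <name>.txt.
# Assumes the CLI prints the transcript to stdout; --quiet suppresses progress output.
transcribe_dir() {
  local dir="${1:-.}"
  local model="${2:-base}"
  # LOCAL_WHISPER (assumed override) lets you point at a different install location.
  local bin="${LOCAL_WHISPER:-$HOME/.clawdbot/skills/local-whisper/scripts/local-whisper}"
  local f
  for f in "$dir"/*.wav; do
    [ -e "$f" ] || continue   # no .wav files: the glob stays literal, so skip it
    "$bin" "$f" --model "$model" --quiet > "${f%.wav}.txt"
  done
}
```

For example, `transcribe_dir recordings small` would write `recordings/meeting.txt` for `recordings/meeting.wav`.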

## Models

| Model | Parameters | Notes |
|-------|------|-------|
| `tiny` | 39M | Fastest |
| `base` | 74M | **Default** |
| `small` | 244M | Good balance |
| `turbo` | 809M | Best speed/quality |
| `large-v3` | 1550M | Maximum accuracy |

## Options

- `--model/-m` — Model size (default: base)
- `--language/-l` — Language code (auto-detect if omitted)
- `--timestamps/-t` — Include word timestamps
- `--json/-j` — JSON output
- `--quiet/-q` — Suppress progress
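When `--language` is omitted, the CLI auto-detects the language. A wrapper can therefore pass the flag only when a language is configured. This is a minimal sketch. The `WHISPER_MODEL` and `WHISPER_LANGUAGE` environment variables, the `LOCAL_WHISPER` path override, and `run_whisper` are all hypothetical. The skill itself does not read any of them.

```bash
# Hypothetical wrapper: choose the model and language from (assumed) environment variables.
# --language is added only when WHISPER_LANGUAGE is non-empty, so the CLI auto-detects otherwise.
# Any extra arguments after the audio file (e.g. --timestamps --json) are forwarded unchanged.
run_whisper() {
  local bin="${LOCAL_WHISPER:-$HOME/.clawdbot/skills/local-whisper/scripts/local-whisper}"
  local -a args=("$1" --model "${WHISPER_MODEL:-base}" --quiet)
  if [ -n "${WHISPER_LANGUAGE:-}" ]; then
    args+=(--language "$WHISPER_LANGUAGE")
  fi
  "$bin" "${args[@]}" "${@:2}"
}
```

For example, `WHISPER_LANGUAGE=en run_whisper audio.wav --json` pins English, while plain `run_whisper audio.wav` leaves detection to the CLI.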

## Setup

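The skill runs from a uv-managed virtual environment at `.venv/` in the skill directory. To recreate it:
```bash
cd ~/.clawdbot/skills/local-whisper
uv venv .venv --python 3.12
# CPU-only PyTorch. --index-url replaces PyPI, so install torch on its own from the PyTorch index
uv pip install --python .venv/bin/python torch --index-url https://download.pytorch.org/whl/cpu
# Remaining dependencies come from PyPI; the torch installed above satisfies openai-whisper
uv pip install --python .venv/bin/python click openai-whisper
```
openai-whisper also needs the `ffmpeg` command-line tool on the system to decode audio.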
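Transcription is offline only after the model has been downloaded, so it can help to fetch the weights once while a network connection is available. This is a minimal sketch under several assumptions. It uses the skill's `.venv` from the setup step and the `openai-whisper` Python API. `whisper.load_model` downloads a checkpoint into `~/.cache/whisper` by default, and the sketch assumes the local-whisper script reads from that same default cache. `prefetch_whisper_model` and the `WHISPER_PYTHON` override are hypothetical names.

```bash
# Hypothetical helper: download (and briefly load) a Whisper model so later runs need no network.
# Assumes the skill venv at ~/.clawdbot/skills/local-whisper/.venv and openai-whisper's default cache.
prefetch_whisper_model() {
  local model="${1:-base}"
  local python="${WHISPER_PYTHON:-$HOME/.clawdbot/skills/local-whisper/.venv/bin/python}"
  # whisper.load_model fetches the checkpoint if it is not cached yet, then loads it into memory.
  "$python" -c 'import sys, whisper; whisper.load_model(sys.argv[1]); print("cached", sys.argv[1])' "$model"
}
```

Running `prefetch_whisper_model turbo` before going offline would cache the `turbo` weights. Loading the model is heavier than a pure download, but it uses only the library's public function.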

Related Skills

mlx-whisper

7
from Demerzels-lab/elsamultiskillagent

Local speech-to-text with MLX Whisper (Apple Silicon optimized, no API key).

local-first-llm

7
from Demerzels-lab/elsamultiskillagent

Routes LLM requests to a local model (Ollama, LM Studio, llamafile) before falling back to cloud APIs.

qwen3-tts-local-inference

7
from Demerzels-lab/elsamultiskillagent

Generate speech from text using Qwen3-TTS via direct Python inference — no server required.

local-system-info

7
from Demerzels-lab/elsamultiskillagent

Return system metrics (CPU, RAM, disk, processes) using psutil.

iyeque-local-system-info

7
from Demerzels-lab/elsamultiskillagent

Return system metrics (CPU, RAM, disk, processes) using psutil.

whisper-mlx-local

7
from Demerzels-lab/elsamultiskillagent

Free local speech-to-text for Telegram and WhatsApp using MLX Whisper on Apple Silicon. Private, no API costs.

parakeet-local-asr

7
from Demerzels-lab/elsamultiskillagent

Install and operate local NVIDIA Parakeet ASR for OpenClaw with an OpenAI-compatible transcription API.

llmwhisperer

7
from Demerzels-lab/elsamultiskillagent

Extract text and layout from images and PDFs using LLMWhisperer API. Good for handwriting and complex forms.

whisper

7
from Demerzels-lab/elsamultiskillagent

End-to-end encrypted agent-to-agent private messaging via Moltbook dead drops. Use when agents need to communicate privately, exchange secrets, or coordinate without human visibility.

browser-use-local

7
from Demerzels-lab/elsamultiskillagent

Use when you need browser automation via the browser-use CLI or Python code in this OpenClaw container/host: open pages, click/type, take screenshots, extract HTML/links, or run an Agent with an OpenAI-compatible LLM (e.g. Moonshot/Kimi) using a custom base_url. Also use for debugging browser-use sessions (state empty, page readiness timeouts), and for extracting login QR codes from demo/login pages via screenshots or HTML data:image.

zvec-local-rag-service

7
from Demerzels-lab/elsamultiskillagent

Operate an always-on local semantic-search service using zvec + Ollama embeddings.

shodh-local

7
from Demerzels-lab/elsamultiskillagent

Local Shodh-Memory v0.1.74 (offline cognitive memory for AI agents). Use for persistent remembering, semantic recall, GTD todos/projects, knowledge graph. Triggers: "remember/save/merke X", "recall/Erinnere/search memories about Y", "todos/add/complete", "projects", "proactive context", "what learned about Z". Server localhost:3030 (amber-seaslug), key in TOOLS.md. Hebbian learning, 3-tier (working/session/LTM), TUI dashboard.