whisper-mlx-local
Free local speech-to-text for Telegram and WhatsApp using MLX Whisper on Apple Silicon. Private, no API costs.
Best use case
whisper-mlx-local is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using whisper-mlx-local should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/whisper-mlx-local/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How whisper-mlx-local Compares
| Feature | whisper-mlx-local | Standard Approach |
|---|---|---|
| Platform Support | macOS on Apple Silicon (M1/M2/M3/M4) | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Python dependencies, a local daemon, and an OpenClaw config edit | N/A |
Frequently Asked Questions
What does this skill do?
Free local speech-to-text for Telegram and WhatsApp using MLX Whisper on Apple Silicon. Private, no API costs.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Local Whisper
**Transcribe voice messages for free on Telegram and WhatsApp.** No API keys. No costs. Runs on your Mac.
## The Problem
Voice transcription APIs cost money:
- OpenAI Whisper: **$0.006/minute**
- Groq: **$0.001/minute**
- AssemblyAI: **$0.01/minute**
If you transcribe a lot of Telegram voice messages, it adds up.
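To see how quickly it adds up, here is a small cost estimate using the per-minute prices quoted above. The workload (50 messages a day, 30 seconds each) is a hypothetical example, and the sketch assumes straight per-minute billing; check each provider's current price list before relying on the numbers.

```python
# Per-minute prices quoted above (USD). Prices change; verify before use.
PRICES_PER_MINUTE = {
    "OpenAI Whisper": 0.006,
    "Groq": 0.001,
    "AssemblyAI": 0.01,
}


def monthly_cost(messages_per_day: int, avg_seconds: float,
                 price_per_minute: float, days: int = 30) -> float:
    """Estimated API cost for `days` days of voice messages (simple per-minute billing)."""
    minutes = messages_per_day * days * avg_seconds / 60
    return minutes * price_per_minute


if __name__ == "__main__":
    # Hypothetical workload: 50 voice messages per day, 30 seconds each.
    for provider, price in PRICES_PER_MINUTE.items():
        print(f"{provider:15s} ${monthly_cost(50, 30, price):.2f}/month")
```

That workload is 750 minutes a month, or roughly $4.50, $0.75 and $7.50 with the three providers; the local daemon costs nothing per minute.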
## The Solution
This skill runs Whisper **locally on your Mac** with Apple's MLX framework: comparable transcription quality, **zero cost**.
- ✅ Free forever
- ✅ Private (audio never leaves your Mac)
- ✅ Fast (~1 second for a short voice message once the model is loaded)
- ✅ Works offline
## ⚠️ Important Notes
- **First run downloads a ~1.5 GB model**: be patient, this only happens once
- **First transcription is slow**: the model loads into memory (~10-30 seconds); after that, transcriptions are fast
- **Already using the OpenAI API for transcription?** Replace your existing `tools.media.audio` config with the one below
## Quick Start
### 1. Install dependencies
```bash
pip3 install -r requirements.txt
```
### 2. Start the daemon
```bash
python3 scripts/daemon.py
```
The first run downloads the Whisper model (~1.5 GB). Wait for the "Ready" message before sending audio.
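If you script the setup instead of watching the terminal, one option is to poll the daemon's port (`localhost:8787`, from the API section) until it accepts connections. This is a minimal sketch with a hypothetical `wait_for_daemon` helper; it assumes an open port means the daemon is usable, which may not hold if the daemon binds the port before the model finishes loading, so the "Ready" log line remains the authoritative signal.

```python
import socket
import time


def wait_for_daemon(host: str = "localhost", port: int = 8787,
                    timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection, False if `timeout` expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection tries every address `host` resolves to (IPv4 and IPv6).
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the daemon is not listening yet.
            time.sleep(interval)
    return False
```

A generous timeout matters on the first run, because the ~1.5 GB model download happens before the daemon is ready.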
### 3. Add to OpenClaw config
Add this to your `~/.openclaw/openclaw.json`:
```json
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "~/.openclaw/workspace/skills/local-whisper/scripts/transcribe.sh",
            "args": ["{{MediaPath}}"],
            "timeoutSeconds": 60
          }
        ]
      }
    }
  }
}
```
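If `openclaw.json` already contains other settings, you can merge in just the `tools.media.audio` block rather than pasting over the file. The sketch below uses hypothetical helper names (`merge_audio_config`, `update_config_file`) and assumes the file is plain JSON; if yours contains comments or trailing commas, `json.loads` will reject it and a manual edit is simpler. Back up the file first.

```python
import json
from pathlib import Path

# Audio entry copied from the config above; the command path assumes the default install location.
LOCAL_WHISPER_AUDIO = {
    "enabled": True,
    "models": [
        {
            "type": "cli",
            "command": "~/.openclaw/workspace/skills/local-whisper/scripts/transcribe.sh",
            "args": ["{{MediaPath}}"],
            "timeoutSeconds": 60,
        }
    ],
}


def merge_audio_config(config: dict) -> dict:
    """Return a copy of `config` with tools.media.audio replaced; all other keys are kept."""
    merged = json.loads(json.dumps(config))  # deep copy of JSON-compatible data
    media = merged.setdefault("tools", {}).setdefault("media", {})
    media["audio"] = json.loads(json.dumps(LOCAL_WHISPER_AUDIO))
    return merged


def update_config_file(path: Path = Path.home() / ".openclaw" / "openclaw.json") -> None:
    """Read the config (if any), merge the audio block, and write it back with indentation."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_audio_config(config), indent=2) + "\n")
```

Replacing the whole `audio` object, rather than appending to `models`, matches the advice to swap out an existing OpenAI transcription entry.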
### 4. Restart gateway
```bash
openclaw gateway restart
```
Now voice messages from Telegram, WhatsApp, etc. will be transcribed locally for free!
### Manual test
```bash
./scripts/transcribe.sh voice_message.ogg
```
## Use Case: Telegram Voice Messages
Instead of paying for the OpenAI API to transcribe incoming voice messages, point OpenClaw to this local daemon. Free transcription forever.
## Auto-Start on Login
```bash
cp com.local-whisper.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.local-whisper.plist
```
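The contents of the bundled `com.local-whisper.plist` are not shown here. As a hedged sketch of what such a LaunchAgent typically contains, this builds one with Python's `plistlib`. The label is inferred from the file name, and the interpreter, script and log paths are hypothetical placeholders; launchd does not expand `~`, so use absolute paths.

```python
import plistlib
from pathlib import Path


def make_launch_agent(python: str, daemon_script: str, log_path: str) -> bytes:
    """Build a LaunchAgent plist that starts the daemon at login and relaunches it if it exits."""
    agent = {
        "Label": "com.local-whisper",               # assumed from the plist file name
        "ProgramArguments": [python, daemon_script],
        "RunAtLoad": True,                          # start when the agent is loaded (at login)
        "KeepAlive": True,                          # restart the daemon if it stops
        "StandardOutPath": log_path,
        "StandardErrorPath": log_path,
    }
    return plistlib.dumps(agent)


if __name__ == "__main__":
    # Hypothetical paths: adjust to your Python and skill install location.
    skill_dir = Path.home() / ".openclaw" / "workspace" / "skills" / "local-whisper"
    plist = make_launch_agent("/usr/bin/python3",
                              str(skill_dir / "scripts" / "daemon.py"),
                              "/tmp/local-whisper.log")
    print(plist.decode("utf-8"))
```

Writing the daemon's output to a log file gives you somewhere to look for the "Ready" message when it runs in the background.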
## API
The daemon listens on `localhost:8787`:
```bash
curl -X POST http://localhost:8787/transcribe -F "file=@audio.ogg"
# {"text": "Hello world", "language": "en"}
```
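The same request from Python, using only the standard library, is sketched below. It mirrors the `curl -F "file=@..."` call: one multipart form field named `file`, posted to `/transcribe`, and a JSON reply with `text` and `language`. The helper names are hypothetical, and the 60-second timeout simply matches the `timeoutSeconds` in the OpenClaw config.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes, boundary: str) -> bytes:
    """Encode one file as a multipart/form-data body, like curl -F field=@file."""
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail


def transcribe(path: str, url: str = "http://localhost:8787/transcribe",
               timeout: float = 60.0) -> dict:
    """POST an audio file to the local daemon and return its decoded JSON reply."""
    audio = Path(path)
    boundary = uuid.uuid4().hex  # random boundary, practically never present in the audio bytes
    body = build_multipart("file", audio.name, audio.read_bytes(), boundary)
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the daemon running, `transcribe("audio.ogg")["text"]` should return the same transcript the curl example prints.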
## Translation
Any language → English:
```bash
./scripts/transcribe.sh spanish_audio.ogg --translate
```
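To call the script from other Python tooling, for example to batch-process a folder of voice notes, a thin wrapper can build the argument list and add `--translate` when you want English output. This sketch assumes it runs from the skill directory (as the examples above do) and that `transcribe.sh` prints the transcript to stdout; both `build_command` and `run_transcribe` are hypothetical names.

```python
import subprocess
from pathlib import Path
from typing import List

SCRIPT = Path("scripts/transcribe.sh")  # relative to the skill directory, as in the examples


def build_command(audio: str, translate: bool = False, script: Path = SCRIPT) -> List[str]:
    """Argument list for transcribe.sh; --translate requests English output."""
    cmd = [str(script), audio]
    if translate:
        cmd.append("--translate")
    return cmd


def run_transcribe(audio: str, translate: bool = False) -> str:
    """Run the script and return its stdout (assumed to be the transcript text)."""
    result = subprocess.run(build_command(audio, translate),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Passing a list instead of a shell string avoids quoting problems with file names that contain spaces.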
## Requirements
- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.9+
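A quick way to check these requirements before installing is sketched below with a hypothetical `requirement_problems` helper. It also catches a common pitfall: an Intel (x86_64) Python running under Rosetta on an Apple Silicon Mac reports `x86_64`, and MLX needs a native arm64 interpreter.

```python
import platform
import sys
from typing import List, Optional, Tuple


def requirement_problems(system: Optional[str] = None,
                         machine: Optional[str] = None,
                         version: Optional[Tuple[int, int]] = None) -> List[str]:
    """List unmet requirements (empty list means OK). Defaults describe the current interpreter."""
    system = system or platform.system()
    machine = machine or platform.machine()
    version = version or (sys.version_info.major, sys.version_info.minor)
    problems = []
    if system != "Darwin":
        problems.append(f"macOS required, found {system}")
    if machine != "arm64":
        problems.append(f"Apple Silicon (arm64) required, found {machine}")
    if version < (3, 9):
        problems.append(f"Python 3.9+ required, found {version[0]}.{version[1]}")
    return problems


if __name__ == "__main__":
    issues = requirement_problems()
    print("OK" if not issues else "\n".join(issues))
```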
## License
MIT