whisper-mlx-local

Free local speech-to-text for Telegram and WhatsApp using MLX Whisper on Apple Silicon. Private, no API costs.

7 stars

Best use case

whisper-mlx-local is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using whisper-mlx-local should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/whisper-mlx-local/SKILL.md --create-dirs "https://raw.githubusercontent.com/Demerzels-lab/elsamultiskillagent/main/public/skills/impkind/whisper-mlx-local/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/whisper-mlx-local/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How whisper-mlx-local Compares

| Feature / Agent | whisper-mlx-local | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Free local speech-to-text for Telegram and WhatsApp using MLX Whisper on Apple Silicon. Private, no API costs.

Where can I find the source code?

The source code is on GitHub in the Demerzels-lab/elsamultiskillagent repository, under public/skills/impkind/whisper-mlx-local.

SKILL.md Source

# Local Whisper

**Transcribe voice messages for free on Telegram and WhatsApp.** No API keys. No costs. Runs on your Mac.

## The Problem

Voice transcription APIs cost money:
- OpenAI Whisper: **$0.006/minute**
- Groq: **$0.001/minute**  
- AssemblyAI: **$0.01/minute**

If you transcribe a lot of Telegram voice messages, it adds up.
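To put the per-minute rates in perspective, a rough monthly estimate can be sketched as below. The rates are the ones listed above; the message volume and average length are hypothetical inputs, not measurements.

```python
# Per-minute rates quoted in the list above (USD).
RATES_PER_MINUTE = {
    "OpenAI Whisper": 0.006,
    "Groq": 0.001,
    "AssemblyAI": 0.01,
}


def monthly_cost(rate_per_minute, messages_per_day, avg_seconds, days=30):
    """Estimate monthly spend in USD for a given voice-message volume."""
    minutes = messages_per_day * avg_seconds / 60 * days
    return round(rate_per_minute * minutes, 2)


if __name__ == "__main__":
    # Hypothetical workload: 100 messages a day, 30 seconds each.
    for provider, rate in RATES_PER_MINUTE.items():
        print(f"{provider}: ${monthly_cost(rate, 100, 30):.2f}/month")
```

Under that assumed workload (1,500 minutes a month), the three providers come to $9.00, $1.50, and $15.00 per month respectively.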

## The Solution

This skill runs Whisper **locally on your Mac**. Same quality, **zero cost**.

- ✅ Free forever
- ✅ Private (audio never leaves your Mac)
- ✅ Fast (~1 second per message)
- ✅ Works offline

## ⚠️ Important Notes

- **First run downloads ~1.5GB model** — be patient, this only happens once
- **First transcription is slow** — model loads into memory (~10-30 seconds), then it's instant
- **Already using OpenAI API for transcription?** Replace your existing `tools.media.audio` config with the one below

## Quick Start

### 1. Install dependencies
```bash
pip3 install -r requirements.txt
```

### 2. Start the daemon
```bash
python3 scripts/daemon.py
```
The first run downloads the Whisper model (~1.5GB). Wait for the "Ready" message before sending audio.
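A script that depends on the daemon cannot see its log output, so one option is to poll the port instead. The sketch below waits until `localhost:8787` (the address given in the API section) accepts a TCP connection. It assumes the daemon only starts listening once it is usable, which this document does not confirm; the function name and timeouts are illustrative.

```python
import socket
import time


def wait_for_port(host="localhost", port=8787, timeout=120.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and try again.
            time.sleep(interval)
    return False


if __name__ == "__main__":
    print("daemon reachable" if wait_for_port() else "daemon not reachable")
```

An open port only shows that the server is listening. If the model loads lazily on the first request, the first transcription may still take the 10 to 30 seconds mentioned above.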

### 3. Add to OpenClaw config

Add this to your `~/.openclaw/openclaw.json`:

```json
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "~/.openclaw/workspace/skills/local-whisper/scripts/transcribe.sh",
            "args": ["{{MediaPath}}"],
            "timeoutSeconds": 60
          }
        ]
      }
    }
  }
}
```
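If `openclaw.json` already has other settings, editing it by hand risks dropping them. The following sketch merges the audio entry shown above into an existing config and leaves unrelated keys alone. It assumes the file is strict JSON (no comments or trailing commas); the function names are hypothetical and not part of OpenClaw.

```python
import copy
import json
from pathlib import Path

# Model entry copied from the config shown above.
LOCAL_WHISPER_MODEL = {
    "type": "cli",
    "command": "~/.openclaw/workspace/skills/local-whisper/scripts/transcribe.sh",
    "args": ["{{MediaPath}}"],
    "timeoutSeconds": 60,
}


def with_local_whisper(config):
    """Return a copy of config whose tools.media.audio uses the local CLI model."""
    merged = copy.deepcopy(config)
    audio = (
        merged.setdefault("tools", {})
        .setdefault("media", {})
        .setdefault("audio", {})
    )
    audio["enabled"] = True
    # Replace any existing models list (for example an API-based entry).
    audio["models"] = [copy.deepcopy(LOCAL_WHISPER_MODEL)]
    return merged


def update_config_file(path=Path.home() / ".openclaw" / "openclaw.json"):
    """Rewrite the config file in place, keeping unrelated keys."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(with_local_whisper(config), indent=2) + "\n")
```

Calling `update_config_file()` overwrites the file, so keeping a backup copy first is a sensible precaution.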

### 4. Restart gateway
```bash
openclaw gateway restart
```

Now voice messages from Telegram, WhatsApp, etc. will be transcribed locally for free!

### Manual test
```bash
./scripts/transcribe.sh voice_message.ogg
```

## Use Case: Telegram Voice Messages

Instead of paying for the OpenAI API to transcribe incoming voice messages, point OpenClaw to this local daemon. Free transcription forever.

## Auto-Start on Login

```bash
cp com.local-whisper.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.local-whisper.plist
```

## API

Daemon runs at `localhost:8787`:

```bash
curl -X POST http://localhost:8787/transcribe -F "file=@audio.ogg"
# {"text": "Hello world", "language": "en"}
```
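The same request can be made from Python with only the standard library. This is a minimal sketch mirroring the `curl` call above: the endpoint, the `file` field name, and the response shape come from that example, while the helper names and the 60-second timeout are assumptions.

```python
import json
import urllib.request
import uuid
from pathlib import Path

TRANSCRIBE_URL = "http://localhost:8787/transcribe"


def build_multipart(filename, data, field="file"):
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path, url=TRANSCRIBE_URL, timeout=60):
    """POST an audio file to the daemon and return the decoded JSON reply."""
    audio = Path(path)
    body, content_type = build_multipart(audio.name, audio.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the daemon running, `transcribe("audio.ogg")["text"]` would hold the transcript, assuming the reply has the `text` and `language` keys shown in the `curl` output.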

## Translation

Any language → English:

```bash
./scripts/transcribe.sh spanish_audio.ogg --translate
```
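To call the script from Python, for example in a batch job, a thin wrapper around `subprocess` is enough. The sketch assumes `transcribe.sh` prints the transcript to standard output and exits non-zero on failure, neither of which is stated here; the function names are hypothetical.

```python
import subprocess


def build_command(audio_path, translate=False, script="./scripts/transcribe.sh"):
    """Build the argument list for transcribe.sh, optionally with --translate."""
    command = [script, str(audio_path)]
    if translate:
        command.append("--translate")
    return command


def run_transcribe(audio_path, translate=False):
    """Run the script and return whatever it printed (assumed to be the transcript)."""
    result = subprocess.run(
        build_command(audio_path, translate),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.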

## Requirements

- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.9+

## License

MIT

Related Skills

mlx-whisper

7
from Demerzels-lab/elsamultiskillagent

Local speech-to-text with MLX Whisper (Apple Silicon optimized, no API key).

local-first-llm

7
from Demerzels-lab/elsamultiskillagent

Routes LLM requests to a local model (Ollama, LM Studio, llamafile) before falling back to cloud APIs.

qwen3-tts-local-inference

7
from Demerzels-lab/elsamultiskillagent

Generate speech from text using Qwen3-TTS via direct Python inference — no server required.

local-system-info

7
from Demerzels-lab/elsamultiskillagent

Return system metrics (CPU, RAM, disk, processes) using psutil.

iyeque-local-system-info

7
from Demerzels-lab/elsamultiskillagent

Return system metrics (CPU, RAM, disk, processes) using psutil.

parakeet-local-asr

7
from Demerzels-lab/elsamultiskillagent

Install and operate local NVIDIA Parakeet ASR for OpenClaw with an OpenAI-compatible transcription API.

llmwhisperer

7
from Demerzels-lab/elsamultiskillagent

Extract text and layout from images and PDFs using LLMWhisperer API. Good for handwriting and complex forms.

whisper

7
from Demerzels-lab/elsamultiskillagent

End-to-end encrypted agent-to-agent private messaging via Moltbook dead drops. Use when agents need to communicate privately, exchange secrets, or coordinate without human visibility.

browser-use-local

7
from Demerzels-lab/elsamultiskillagent

Use when you need browser automation via the browser-use CLI or Python code in this OpenClaw container/host: open pages, click/type, take screenshots, extract HTML/links, or run an Agent with an OpenAI-compatible LLM (e.g. Moonshot/Kimi) using a custom base_url. Also use for debugging browser-use sessions (state empty, page readiness timeouts), and for extracting login QR codes from demo/login pages via screenshots or HTML data:image.

zvec-local-rag-service

7
from Demerzels-lab/elsamultiskillagent

Operate an always-on local semantic-search service using zvec + Ollama embeddings.

shodh-local

7
from Demerzels-lab/elsamultiskillagent

Local Shodh-Memory v0.1.74 (offline cognitive memory for AI agents). Use for persistent remembering, semantic recall, GTD todos/projects, knowledge graph. Triggers: "remember/save/merke X", "recall/Erinnere/search memories about Y", "todos/add/complete", "projects", "proactive context", "what learned about Z". Server localhost:3030 (amber-seaslug), key in TOOLS.md. Hebbian learning, 3-tier (working/session/LTM), TUI dashboard.

comfyui-local

7
from Demerzels-lab/elsamultiskillagent

Generate high-quality images using a local ComfyUI instance.