use-local-whisper

Use when the user wants local voice transcription instead of the OpenAI Whisper API. Switches to whisper.cpp running on Apple Silicon. WhatsApp only for now. Requires the voice-transcription skill to be applied first.

66 stars

by sbusso


Best use case

use-local-whisper is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using use-local-whisper can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/use-local-whisper/SKILL.md --create-dirs "https://raw.githubusercontent.com/sbusso/claudeclaw/main/skills/use-local-whisper/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/use-local-whisper/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How use-local-whisper Compares

Feature / Agent	use-local-whisper	Standard Approach
Platform Support	macOS, Apple Silicon recommended; WhatsApp channel only	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

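What does this skill do?

It switches voice transcription from OpenAI's Whisper API to a local whisper.cpp install, so transcription runs on-device without an API key or network access.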

Where can I find the source code?

The source code is on GitHub in the sbusso/claudeclaw repository, at skills/use-local-whisper/SKILL.md.

SKILL.md Source

# Use Local Whisper

Switches voice transcription from OpenAI's Whisper API to local whisper.cpp. Runs entirely on-device — no API key, no network, no cost.

**Channel support:** Currently WhatsApp only. The transcription module (`src/transcription.ts`) uses Baileys types for audio download. Other channels (Telegram, Discord, etc.) would need their own audio-download logic before this skill can serve them.

**Note:** The Homebrew package is `whisper-cpp`, but the CLI binary it installs is `whisper-cli`.

## Prerequisites

- `voice-transcription` skill must be applied first (WhatsApp channel)
- macOS with Apple Silicon (M1+) recommended
- `whisper-cpp` installed: `brew install whisper-cpp` (provides the `whisper-cli` binary)
- `ffmpeg` installed: `brew install ffmpeg`
- A GGML model file downloaded to `data/models/`

## Phase 1: Pre-flight

### Check if already applied

Check if `src/transcription.ts` already uses `whisper-cli`:

```bash
# -q suppresses the matching lines so only the status message is printed
grep -q 'whisper-cli' src/transcription.ts && echo "Already applied" || echo "Not applied"
```

If already applied, skip to Phase 3 (Verify).

### Check dependencies are installed

```bash
whisper-cli --help >/dev/null 2>&1 && echo "WHISPER_OK" || echo "WHISPER_MISSING"
ffmpeg -version >/dev/null 2>&1 && echo "FFMPEG_OK" || echo "FFMPEG_MISSING"
```

If missing, install via Homebrew:
```bash
brew install whisper-cpp ffmpeg
```
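
A variant of the check above that also reports where each tool lives can be useful later, because the resolved absolute path can be placed in `WHISPER_BIN`. The sketch below is illustrative only: `find_tool` is a hypothetical helper, not part of ClaudeClaw, and the extra directories are the two Homebrew prefixes named in Phase 3.

```bash
# Hypothetical helper: print the location of a tool, searching PATH first
# and then any extra directories given as further arguments.
find_tool() {
  tool="$1"; shift
  if command -v "$tool" >/dev/null 2>&1; then
    command -v "$tool"
    return 0
  fi
  for dir in "$@"; do
    if [ -x "$dir/$tool" ]; then
      echo "$dir/$tool"
      return 0
    fi
  done
  return 1
}

# Usage:
#   find_tool whisper-cli /opt/homebrew/bin /usr/local/bin || echo "WHISPER_MISSING"
#   find_tool ffmpeg      /opt/homebrew/bin /usr/local/bin || echo "FFMPEG_MISSING"
```

If a tool is found only in one of the extra directories, that is a hint that the service PATH will also need adjusting.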

### Check for model file

```bash
ls data/models/ggml-*.bin 2>/dev/null || echo "NO_MODEL"
```

If no model exists, download the base model (148MB, good balance of speed and accuracy):
```bash
mkdir -p data/models
curl -L -o data/models/ggml-base.bin "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"
```

For better accuracy at the cost of speed, use `ggml-small.bin` (466MB) or `ggml-medium.bin` (1.5GB).
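
Choosing between the three model sizes can be wrapped in a small helper so the download is skipped when the file already exists. This is a sketch under assumptions: `model_url` and `fetch_model` are hypothetical names, and the URLs for `small` and `medium` follow the same pattern as the `ggml-base.bin` URL above.

```bash
# Hypothetical helper: map a model size to its download URL.
model_url() {
  case "$1" in
    base|small|medium)
      echo "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-$1.bin" ;;
    *)
      echo "unknown model size: $1" >&2
      return 1 ;;
  esac
}

# Hypothetical helper: download the model only if it is not already present,
# then print its path relative to the project root.
fetch_model() {
  url=$(model_url "$1") || return 1
  dest="data/models/ggml-$1.bin"
  if [ ! -f "$dest" ]; then
    mkdir -p data/models
    # Download to a temporary name so an interrupted transfer is not mistaken for a model.
    curl -L --fail -o "$dest.part" "$url" && mv "$dest.part" "$dest" || return 1
  fi
  echo "$dest"
}

# Usage: fetch_model small
```

The printed path can then be used as the value of `WHISPER_MODEL`.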

## Phase 2: Apply Code Changes

### Ensure WhatsApp fork remote

```bash
git remote -v
```

If `whatsapp` is missing, add it:

```bash
git remote add whatsapp https://github.com/qwibitai/claudeclaw-whatsapp.git
```
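
To make this step safe to repeat, the presence of the remote can be tested before adding it. The helper below is a minimal sketch (`has_remote` is a hypothetical name); it reads `git remote -v` style output on standard input and succeeds only if a remote with the given name is listed.

```bash
# Hypothetical helper: exit 0 if the remote named $1 appears in the
# `git remote -v` output supplied on stdin, 1 otherwise.
has_remote() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Usage (run inside the repository):
#   git remote -v | has_remote whatsapp \
#     || git remote add whatsapp https://github.com/qwibitai/claudeclaw-whatsapp.git
```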

### Merge the skill branch

```bash
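# Fetch and merge the skill branch from the WhatsApp fork.
git fetch whatsapp skill/local-whisper
git merge whatsapp/skill/local-whisper || {
  # On a package-lock.json conflict, take the incoming version and finish the merge.
  # Any other conflicted file must be resolved by hand before `git merge --continue` succeeds.
  git checkout --theirs package-lock.json
  git add package-lock.json
  git merge --continue
}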
```

This modifies `src/transcription.ts` to use the `whisper-cli` binary instead of the OpenAI API.

### Validate

```bash
npm run build
```

## Phase 3: Verify

### Ensure launchd PATH includes Homebrew

The ClaudeClaw launchd service runs with a restricted PATH. `whisper-cli` and `ffmpeg` are in `/opt/homebrew/bin/` (Apple Silicon) or `/usr/local/bin/` (Intel), which may not be in the plist's PATH.

> **Service name:** Derived from the directory name: `com.claudeclaw.<dirname>` (macOS) / `claudeclaw-<dirname>` (Linux). For example, if cwd is `my-assistant`, the service is `com.claudeclaw.my-assistant`. Determine the correct service name before running service commands below.
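
The naming rule in the note can be expressed as a small function. This is a sketch, not ClaudeClaw code: `service_label` is a hypothetical helper, and the assumption that the plist file is named after the label is a launchd convention rather than something this document states.

```bash
# Hypothetical helper: derive the service name from a directory name.
#   $1 = install directory (defaults to the current directory)
#   $2 = OS name as printed by `uname -s` (defaults to the running OS)
service_label() {
  dir_name=$(basename "${1:-$PWD}")
  case "${2:-$(uname -s)}" in
    Darwin) echo "com.claudeclaw.$dir_name" ;;
    *)      echo "claudeclaw-$dir_name" ;;
  esac
}

# Usage (assumes the plist is named after the label):
#   plist="$HOME/Library/LaunchAgents/$(service_label).plist"
```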

Check the current PATH, substituting the service name determined above for `com.claudeclaw` in the plist filename:
```bash
grep -A1 'PATH' ~/Library/LaunchAgents/com.claudeclaw.plist
```

If `/opt/homebrew/bin` (or `/usr/local/bin` on Intel) is missing, add it to the `<string>` value that follows the `PATH` key in the plist. Then reload:
```bash
launchctl unload ~/Library/LaunchAgents/com.claudeclaw.plist
launchctl load ~/Library/LaunchAgents/com.claudeclaw.plist
```
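
Whether a directory is already part of a colon-separated PATH value can be checked mechanically rather than by eye. The function below is a minimal sketch (`path_has_dir` is a hypothetical name). The commented usage assumes the plist stores PATH under the standard launchd `EnvironmentVariables` dictionary, which this document does not confirm.

```bash
# Hypothetical helper: succeed if directory $2 is an exact entry in the
# colon-separated PATH string $1.
path_has_dir() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Usage on macOS (plist path and key location are assumptions):
#   current=$(/usr/libexec/PlistBuddy -c 'Print :EnvironmentVariables:PATH' "$plist")
#   path_has_dir "$current" /opt/homebrew/bin || echo "Homebrew bin missing from service PATH"
```

Wrapping the string in colons before matching avoids false positives on entries that merely share a prefix, such as `/opt/homebrew/bin2`.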

### Build and restart

```bash
npm run build
launchctl kickstart -k gui/$(id -u)/com.claudeclaw
```

### Test

Send a voice note in any registered group. The agent should receive it as `[Voice: <transcript>]`.

### Check logs

```bash
tail -f logs/claudeclaw.log | grep -i -E "voice|transcri|whisper"
```

Look for:
- `Transcribed voice message` — successful transcription
- `whisper.cpp transcription failed` — check model path, ffmpeg, or PATH
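
For a quick tally instead of a live tail, the two messages listed above can be counted. This is a sketch: `summarize_transcription_log` is a hypothetical helper, and it assumes only that the two message strings appear verbatim somewhere on a log line.

```bash
# Hypothetical helper: read log text on stdin and print "ok=<n> failed=<m>".
summarize_transcription_log() {
  awk '
    /Transcribed voice message/         { ok++ }
    /whisper\.cpp transcription failed/ { failed++ }
    END { printf "ok=%d failed=%d\n", ok, failed }
  '
}

# Usage: summarize_transcription_log < logs/claudeclaw.log
```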

## Configuration

Environment variables (optional, set in `.env`):

| Variable | Default | Description |
|----------|---------|-------------|
| `WHISPER_BIN` | `whisper-cli` | Path to whisper.cpp binary |
| `WHISPER_MODEL` | `data/models/ggml-base.bin` | Path to GGML model file |
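
The defaults in the table can be reproduced in a shell session to see which values would be in effect. The function below is an illustrative sketch (`resolve_whisper_config` is a hypothetical name); how `src/transcription.ts` actually reads these variables is not shown in this document.

```bash
# Hypothetical helper: apply the documented defaults and print the result.
resolve_whisper_config() {
  # Fall back to the defaults from the table when a variable is unset or empty.
  WHISPER_BIN="${WHISPER_BIN:-whisper-cli}"
  WHISPER_MODEL="${WHISPER_MODEL:-data/models/ggml-base.bin}"
  # Warn on stderr if the model file is not where the configuration points.
  [ -f "$WHISPER_MODEL" ] || echo "warning: model not found at $WHISPER_MODEL" >&2
  echo "bin=$WHISPER_BIN model=$WHISPER_MODEL"
}

# Usage: WHISPER_MODEL=data/models/ggml-small.bin resolve_whisper_config
```

Setting `WHISPER_BIN` to an absolute path such as `/opt/homebrew/bin/whisper-cli` is one way to avoid depending on the service PATH for that binary, although `ffmpeg` would still need to be reachable.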

## Troubleshooting

**"whisper.cpp transcription failed"**: Ensure both `whisper-cli` and `ffmpeg` are in PATH. The launchd service uses a restricted PATH — see Phase 3 above. Test manually:
```bash
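# Generate one second of 16 kHz mono silence, then run it through whisper-cli.
ffmpeg -f lavfi -i anullsrc=r=16000:cl=mono -t 1 -f wav /tmp/test.wav -y
# --no-timestamps and -nt are the same option, so one of them is enough.
whisper-cli -m data/models/ggml-base.bin -f /tmp/test.wav --no-timestamps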
```
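
The manual test above uses a synthetic WAV. To exercise the same two tools on a real voice note, the sketch below converts an input file to 16 kHz mono 16-bit WAV, the format the whisper.cpp documentation asks for, and then transcribes it. The function name and default output path are hypothetical, and the flags that `src/transcription.ts` passes may differ; this is not the module's actual code.

```bash
# Hypothetical helper: convert an audio file to 16 kHz mono WAV, then transcribe it.
#   $1 = input audio file (for example a saved voice note)
#   $2 = intermediate WAV path (defaults to /tmp/voice-note.wav)
transcribe_voice_note() {
  # Stop early if the conversion fails, so whisper is not run on a stale file.
  ffmpeg -y -loglevel error -i "$1" -ar 16000 -ac 1 -c:a pcm_s16le \
    "${2:-/tmp/voice-note.wav}" || return 1
  "${WHISPER_BIN:-whisper-cli}" \
    -m "${WHISPER_MODEL:-data/models/ggml-base.bin}" \
    -f "${2:-/tmp/voice-note.wav}" --no-timestamps
}

# Usage: transcribe_voice_note /path/to/note.ogg
```

If this works in a terminal but the service still fails, the difference is most likely the environment rather than the tools.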

**Transcription works in dev but not as service**: The launchd plist PATH likely doesn't include `/opt/homebrew/bin`. See "Ensure launchd PATH includes Homebrew" in Phase 3.

**Slow transcription**: The base model processes ~30s of audio in <1s on M1+. If slower, check CPU usage — another process may be competing.

**Wrong language**: whisper.cpp auto-detects language. To force a language, you can set `WHISPER_LANG` and modify `src/transcription.ts` to pass `-l $WHISPER_LANG`.
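
The language override can be tried from a shell before touching the TypeScript module. In the sketch below, `whisper_lang_flag` is a hypothetical helper and `WHISPER_LANG` is the variable proposed above, not one the skill currently reads; `-l` is whisper-cli's language option.

```bash
# Hypothetical helper: print "-l <code>" when WHISPER_LANG is set and
# non-empty, and nothing otherwise (leaving auto-detection in place).
whisper_lang_flag() {
  if [ -n "${WHISPER_LANG:-}" ]; then
    printf '%s %s\n' '-l' "$WHISPER_LANG"
  fi
}

# Usage (the substitution is left unquoted on purpose so it splits into two words):
#   WHISPER_LANG=de
#   whisper-cli -m data/models/ggml-base.bin -f /tmp/test.wav --no-timestamps $(whisper_lang_flag)
```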

Related Skills

x-integration

from sbusso/claudeclaw

X (Twitter) integration for ClaudeClaw. Post tweets, like, reply, retweet, and quote. Use for setup, testing, or troubleshooting X functionality. Triggers on "setup x", "x integration", "twitter", "post tweet", "tweet".

update-skills

from sbusso/claudeclaw

Check for and apply updates to installed skill branches from upstream.

update-claudeclaw

from sbusso/claudeclaw

Efficiently bring upstream ClaudeClaw updates into a customized install, with preview, selective cherry-pick, and low token usage.

uninstall

from sbusso/claudeclaw

Stop and remove the ClaudeClaw background service and agents for this instance

uninstall-extension

from sbusso/claudeclaw

Uninstall a ClaudeClaw extension

setup

from sbusso/claudeclaw

Run initial ClaudeClaw setup. Use when user wants to install dependencies, authenticate messaging channels, register their main channel, or start the background services. Triggers on "setup", "install", "configure claudeclaw", or first-time setup requests.

qodo-pr-resolver

from sbusso/claudeclaw

Review and resolve PR issues with Qodo - get AI-powered code review issues and fix them interactively (GitHub, GitLab, Bitbucket, Azure DevOps)

install-extension

from sbusso/claudeclaw

Install a ClaudeClaw extension (e.g., slack, triage)

get-qodo-rules

from sbusso/claudeclaw

Loads org- and repo-level coding rules from Qodo before code tasks begin, ensuring all generation and modification follows team standards. Use before any code generation or modification task when rules are not already loaded. Invoke when user asks to write, edit, refactor, or review code, or when starting implementation planning.

debug

from sbusso/claudeclaw

Debug container agent issues. Use when things aren't working, container fails, authentication problems, or to understand how the container system works. Covers logs, environment variables, mounts, and common issues.

customize

from sbusso/claudeclaw

Add new capabilities or modify ClaudeClaw behavior. Use when user wants to add channels (Telegram, Slack, email input), change triggers, add integrations, modify the router, or make any other customizations. This is an interactive skill that asks questions to understand what the user wants.

convert-to-apple-container

from sbusso/claudeclaw

Switch from Docker to Apple Container for macOS-native container isolation. Use when the user wants Apple Container instead of Docker, or is setting up on macOS and prefers the native runtime. Triggers on "apple container", "convert to apple container", "switch to apple container", or "use apple container".