mlx-whisper

Local speech-to-text with MLX Whisper (Apple Silicon optimized, no API key).

7 stars

Best use case

mlx-whisper is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Local speech-to-text with MLX Whisper (Apple Silicon optimized, no API key).

Teams using mlx-whisper should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/mlx-whisper/SKILL.md --create-dirs "https://raw.githubusercontent.com/Demerzels-lab/elsamultiskillagent/main/public/skills/kevin37li/mlx-whisper/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/mlx-whisper/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How mlx-whisper Compares

Feature / Agentmlx-whisperStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Local speech-to-text with MLX Whisper (Apple Silicon optimized, no API key).

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# MLX Whisper

Local speech-to-text using Apple MLX, optimized for Apple Silicon Macs.

## Quick Start

```bash
mlx_whisper /path/to/audio.mp3 --model mlx-community/whisper-large-v3-turbo
```

## Common Usage

```bash
# Transcribe to text file
mlx_whisper audio.m4a -f txt -o ./output

# Transcribe with language hint
mlx_whisper audio.mp3 --language en --model mlx-community/whisper-large-v3-turbo

# Generate subtitles (SRT)
mlx_whisper video.mp4 -f srt -o ./subs

# Translate to English
mlx_whisper foreign.mp3 --task translate
```

## Models (download on first use)

| Model | Size | Speed | Quality |
|-------|------|-------|---------|
| mlx-community/whisper-tiny | ~75MB | Fastest | Basic |
| mlx-community/whisper-base | ~140MB | Fast | Good |
| mlx-community/whisper-small | ~470MB | Medium | Better |
| mlx-community/whisper-medium | ~1.5GB | Slower | Great |
| mlx-community/whisper-large-v3 | ~3GB | Slowest | Best |
| mlx-community/whisper-large-v3-turbo | ~1.6GB | Fast | Excellent (Recommended) |

## Notes

- Requires Apple Silicon Mac (M1/M2/M3/M4)
- Models cache to `~/.cache/huggingface/`
- Default model is `mlx-community/whisper-tiny`; use `--model mlx-community/whisper-large-v3-turbo` for best results

Related Skills

whisper-mlx-local

7
from Demerzels-lab/elsamultiskillagent

Free local speech-to-text for Telegram and WhatsApp using MLX Whisper on Apple Silicon. Private, no API costs.

llmwhisperer

7
from Demerzels-lab/elsamultiskillagent

Extract text and layout from images and PDFs using LLMWhisperer API. Good for handwriting and complex forms.

whisper

7
from Demerzels-lab/elsamultiskillagent

End-to-end encrypted agent-to-agent private messaging via Moltbook dead drops. Use when agents need to communicate privately, exchange secrets, or coordinate without human visibility.

local-whisper

7
from Demerzels-lab/elsamultiskillagent

Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High quality transcription with multiple model sizes.

usewhisper

7
from Demerzels-lab/elsamultiskillagent

Official Whisper Context skill for OpenClaw.

usewhisper-autohook

7
from Demerzels-lab/elsamultiskillagent

Auto-hook tools for OpenClaw: query Whisper Context before every generation, ingest after every turn.

paylock

7
from Demerzels-lab/elsamultiskillagent

Non-custodial SOL escrow for AI agent deals.

agent-reputation

7
from Demerzels-lab/elsamultiskillagent

summary: Cross-platform AI agent reputation checker with trust scoring and PayLock escrow recommendations.

Telecom Agent Skill

7
from Demerzels-lab/elsamultiskillagent

Turn your AI Agent into a Telecom Operator. Bulk calling, ChatOps, and Field Monitoring.

OpenClaw-Finnhub

7
from Demerzels-lab/elsamultiskillagent

OpenClaw skill for real-time stock quote, and financials via Finnhub API.

```markdown

7
from Demerzels-lab/elsamultiskillagent

# OpenClaw-Last.fm

security-operator

7
from Demerzels-lab/elsamultiskillagent

Runtime security guardrails for OpenClaw agents.