local-stt

Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).

by Demerzels-lab

Best use case

local-stt is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using local-stt should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/local-stt/SKILL.md --create-dirs "https://raw.githubusercontent.com/Demerzels-lab/elsamultiskillagent/main/public/skills/araa47/local-stt/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/local-stt/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How local-stt Compares

Feature / Agent	local-stt	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

It transcribes audio files locally with ONNX Runtime, using either the Parakeet backend (best accuracy) or the Whisper backend (fastest, multilingual).

Where can I find the source code?

The source code is on GitHub in the Demerzels-lab/elsamultiskillagent repository, under public/skills/araa47/local-stt/.

SKILL.md Source

# Local STT (Parakeet / Whisper)

Unified local speech-to-text using ONNX Runtime with int8 quantization. Choose your backend:

- **Parakeet** (default): Best accuracy for English, correctly captures names and filler words
- **Whisper**: Fastest inference, supports 99 languages

## Usage

```bash
# Default: Parakeet v2 (best English accuracy)
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg

# Explicit backend selection
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b whisper
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b parakeet -m v3

# Quiet mode (suppress progress)
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg --quiet
```

## Options

- `-b/--backend`: `parakeet` (default), `whisper`
- `-m/--model`: Model variant (see below)
- `--no-int8`: Disable int8 quantization
- `-q/--quiet`: Suppress progress
- `--room-id`: Matrix room ID for direct message

## Models

### Parakeet (default backend)
| Model | Description |
|-------|-------------|
| **v2** (default) | English only, best accuracy |
| v3 | Multilingual |

### Whisper
| Model | Description |
|-------|-------------|
| tiny | Fastest, lower accuracy |
| **base** (default) | Good balance |
| small | Better accuracy |
| large-v3-turbo | Best quality, slower |
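
The options and model tables above can be combined into a small command builder. The following is a minimal sketch, not part of the skill: `build_command` and `MODELS` are hypothetical names, and the only flags used are the ones listed under Options.

```python
import os

# Script path as shown in the Usage section.
SCRIPT = "~/.openclaw/skills/local-stt/scripts/local-stt.py"

# Backend -> model variants, copied from the tables above.
MODELS = {
    "parakeet": ("v2", "v3"),
    "whisper": ("tiny", "base", "small", "large-v3-turbo"),
}


def build_command(audio, backend="parakeet", model=None, quiet=False, int8=True):
    """Return the argv list for one local-stt invocation (hypothetical helper)."""
    if backend not in MODELS:
        raise ValueError(f"unknown backend: {backend!r}")
    if model is not None and model not in MODELS[backend]:
        raise ValueError(f"model {model!r} is not listed for backend {backend!r}")

    cmd = [os.path.expanduser(SCRIPT), str(audio), "-b", backend]
    if model is not None:
        cmd += ["-m", model]      # omit -m to let the script pick its default
    if not int8:
        cmd.append("--no-int8")   # int8 quantization is on unless disabled
    if quiet:
        cmd.append("--quiet")
    return cmd


print(build_command("audio.ogg", backend="parakeet", model="v3", quiet=True))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. Whether the transcript is written to standard output is an assumption here, so check the script before relying on it.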

## Benchmark (24s audio)

| Backend/Model | Time | RTF | Notes |
|---------------|------|-----|-------|
| Whisper Base int8 | 0.43s | 0.018x | Fastest |
| **Parakeet v2 int8** | 0.60s | 0.025x | Best accuracy |
| Parakeet v3 int8 | 0.63s | 0.026x | Multilingual |
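
The RTF column is consistent with the usual definition of real-time factor, processing time divided by audio duration. The sketch below recomputes it from the Time column, assuming that definition and the 24 s clip length from the heading; `real_time_factor` is a hypothetical helper name.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration; lower means faster."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


AUDIO_SECONDS = 24.0  # clip length from the benchmark heading

# Times copied from the table above.
timings = {
    "Whisper Base int8": 0.43,
    "Parakeet v2 int8": 0.60,
    "Parakeet v3 int8": 0.63,
}

for name, seconds in timings.items():
    print(f"{name}: RTF {real_time_factor(seconds, AUDIO_SECONDS):.3f}x")
```

An RTF of 0.025x means one second of audio takes about 25 ms to transcribe on the benchmark machine.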

## openclaw.json

```json
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "~/.openclaw/skills/local-stt/scripts/local-stt.py",
            "args": ["--quiet", "{{MediaPath}}"],
            "timeoutSeconds": 30
          }
        ]
      }
    }
  }
}
```
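
To illustrate how a `cli` entry like the one above might turn into a command line, here is a minimal sketch. It assumes the `{{MediaPath}}` placeholder is replaced by the path of the incoming audio file; the actual substitution logic in OpenClaw is not shown in this document, and `expand_cli_entry` is a hypothetical helper.

```python
import json
import os

# The same configuration as above, embedded so the sketch is self-contained.
CONFIG = json.loads("""
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "~/.openclaw/skills/local-stt/scripts/local-stt.py",
            "args": ["--quiet", "{{MediaPath}}"],
            "timeoutSeconds": 30
          }
        ]
      }
    }
  }
}
""")


def expand_cli_entry(entry, media_path):
    """Substitute the media path into args (assumed placeholder behaviour)."""
    args = [arg.replace("{{MediaPath}}", media_path) for arg in entry["args"]]
    argv = [os.path.expanduser(entry["command"]), *args]
    return argv, entry.get("timeoutSeconds")


entry = CONFIG["tools"]["media"]["audio"]["models"][0]
argv, timeout = expand_cli_entry(entry, "/tmp/voice.ogg")
print(argv, timeout)
```

With `--quiet` in the argument list, progress output is suppressed, which presumably keeps the captured output limited to the transcript.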

Related Skills

local-first-llm

from Demerzels-lab/elsamultiskillagent

Routes LLM requests to a local model (Ollama, LM Studio, llamafile) before falling back to cloud APIs.

qwen3-tts-local-inference

from Demerzels-lab/elsamultiskillagent

Generate speech from text using Qwen3-TTS via direct Python inference — no server required.

local-system-info

from Demerzels-lab/elsamultiskillagent

Return system metrics (CPU, RAM, disk, processes) using psutil.

iyeque-local-system-info

from Demerzels-lab/elsamultiskillagent

Return system metrics (CPU, RAM, disk, processes) using psutil.

whisper-mlx-local

from Demerzels-lab/elsamultiskillagent

Free local speech-to-text for Telegram and WhatsApp using MLX Whisper on Apple Silicon. Private, no API costs.

parakeet-local-asr

from Demerzels-lab/elsamultiskillagent

Install and operate local NVIDIA Parakeet ASR for OpenClaw with an OpenAI-compatible transcription API.

browser-use-local

from Demerzels-lab/elsamultiskillagent

Use when you need browser automation via the browser-use CLI or Python code in this OpenClaw container/host: open pages, click/type, take screenshots, extract HTML/links, or run an Agent with an OpenAI-compatible LLM (e.g. Moonshot/Kimi) using a custom base_url. Also use for debugging browser-use sessions (state empty, page readiness timeouts), and for extracting login QR codes from demo/login pages via screenshots or HTML data:image.

zvec-local-rag-service

from Demerzels-lab/elsamultiskillagent

Operate an always-on local semantic-search service using zvec + Ollama embeddings.

shodh-local

from Demerzels-lab/elsamultiskillagent

Local Shodh-Memory v0.1.74 (offline cognitive memory for AI agents). Use for persistent remembering, semantic recall, GTD todos/projects, knowledge graph. Triggers: "remember/save/merke X", "recall/Erinnere/search memories about Y", "todos/add/complete", "projects", "proactive context", "what learned about Z". Server localhost:3030 (amber-seaslug), key in TOOLS.md. Hebbian learning, 3-tier (working/session/LTM), TUI dashboard.

comfyui-local

from Demerzels-lab/elsamultiskillagent

Generate high-quality images using a local ComfyUI instance.

local-task-runner

from Demerzels-lab/elsamultiskillagent

This skill provides a mechanism to execute Node.js code snippets or full scripts locally on the host machine.

localsend

from Demerzels-lab/elsamultiskillagent

Send and receive files to/from nearby devices using the LocalSend protocol.