parakeet-stt
Local speech-to-text with NVIDIA Parakeet TDT 0.6B v3 (ONNX on CPU). 30x faster than Whisper, 25 languages, auto-detection, OpenAI-compatible API. Use when transcribing audio files, converting speech to text, or processing voice recordings locally without cloud APIs.
Best use case
parakeet-stt is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using parakeet-stt should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/parakeet-stt/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How parakeet-stt Compares
| Feature / Agent | parakeet-stt | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Local speech-to-text with NVIDIA Parakeet TDT 0.6B v3 (ONNX on CPU). 30x faster than Whisper, 25 languages, auto-detection, OpenAI-compatible API. Use when transcribing audio files, converting speech to text, or processing voice recordings locally without cloud APIs.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Parakeet TDT (Speech-to-Text)
Local transcription using NVIDIA Parakeet TDT 0.6B v3 with ONNX Runtime.
Runs on CPU — no GPU required. ~30x faster than realtime.
## Installation
```bash
# Clone the repo
git clone https://github.com/groxaxo/parakeet-tdt-0.6b-v3-fastapi-openai.git
cd parakeet-tdt-0.6b-v3-fastapi-openai
# Run with Docker (recommended)
docker compose up -d parakeet-cpu
# Or run directly with Python
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 5000
```
The server listens on port `5000` by default. If it runs at a different address, set `PARAKEET_URL` to its base URL (e.g., `http://localhost:5092`) so that clients can reach it.
## API Endpoint
OpenAI-compatible API at `$PARAKEET_URL` (default: `http://localhost:5000`).
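
As a minimal sketch of how a client might resolve the endpoint, the helper below reads `PARAKEET_URL` and falls back to the default stated above. The name `transcriptions_url` is hypothetical and not part of the project; the path `/v1/audio/transcriptions` is the one used in the examples in this document.

```python
import os

DEFAULT_URL = "http://localhost:5000"  # default base URL stated in this document


def transcriptions_url(env=None):
    """Return the transcription endpoint, honoring PARAKEET_URL if set.

    Hypothetical helper for illustration; not part of the project.
    """
    env = os.environ if env is None else env
    base = env.get("PARAKEET_URL") or DEFAULT_URL
    # Strip a trailing slash so the joined path has exactly one separator.
    return base.rstrip("/") + "/v1/audio/transcriptions"
```

Passing a plain dict instead of `os.environ` makes the fallback easy to check without touching the real environment.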
## Quick Start
```bash
# Fall back to the default address if PARAKEET_URL is unset
PARAKEET_URL="${PARAKEET_URL:-http://localhost:5000}"

# Transcribe audio file (plain text)
curl -X POST "$PARAKEET_URL/v1/audio/transcriptions" \
  -F "file=@/path/to/audio.mp3" \
  -F "response_format=text"

# Get timestamps and segments
curl -X POST "$PARAKEET_URL/v1/audio/transcriptions" \
  -F "file=@/path/to/audio.mp3" \
  -F "response_format=verbose_json"

# Generate subtitles (SRT)
curl -X POST "$PARAKEET_URL/v1/audio/transcriptions" \
  -F "file=@/path/to/audio.mp3" \
  -F "response_format=srt"
```
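
For environments without `curl`, the same multipart request can be sketched with the Python standard library alone. This is an illustration under assumptions: the helper name `build_transcription_request` is hypothetical, and the form fields (`file`, `response_format`) are taken from the curl commands above. The function only builds the request; sending it is shown in the trailing comments.

```python
import urllib.request
import uuid


def build_transcription_request(base_url, filename, audio_bytes, response_format="text"):
    """Build (but do not send) a multipart POST for /v1/audio/transcriptions.

    Hypothetical helper for illustration; not part of the project.
    """
    boundary = uuid.uuid4().hex
    # Text field first, then the headers that introduce the file part.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="response_format"\r\n'
        "\r\n"
        f"{response_format}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    # Closing boundary terminates the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=head + audio_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Example use against a running server:
# with open("audio.mp3", "rb") as f:
#     req = build_transcription_request("http://localhost:5000", "audio.mp3", f.read())
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```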
## Python / OpenAI SDK
```python
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.getenv("PARAKEET_URL", "http://localhost:5000") + "/v1",
    api_key="not-needed",
)

with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="parakeet-tdt-0.6b-v3",
        file=f,
        response_format="text",
    )

print(transcript)
```
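
When several recordings need transcribing, the single call above can be wrapped in a loop. The sketch below assumes a `client` configured as in the previous example; `transcribe_files` is a hypothetical helper name, and it makes the same `client.audio.transcriptions.create` call once per file.

```python
from pathlib import Path


def transcribe_files(client, paths, model="parakeet-tdt-0.6b-v3", response_format="text"):
    """Transcribe each path with the given client and return {path: transcript}.

    Hypothetical helper for illustration; `client` is expected to expose
    audio.transcriptions.create() like the OpenAI SDK client shown above.
    """
    results = {}
    for path in map(Path, paths):
        # Open each file in binary mode, as the SDK example does.
        with path.open("rb") as f:
            results[str(path)] = client.audio.transcriptions.create(
                model=model, file=f, response_format=response_format
            )
    return results
```

Files are processed one at a time, which keeps the sketch simple; whether the server benefits from concurrent requests is not stated in the source.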
## Response Formats
| Format | Output |
|--------|--------|
| `text` | Plain text |
| `json` | `{"text": "..."}` |
| `verbose_json` | Segments with timestamps and words |
| `srt` | SRT subtitles |
| `vtt` | WebVTT subtitles |
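
The `srt` output can be post-processed with a few lines of standard-library Python. The sketch below parses generic SRT text into cues; it assumes the server emits conventional SRT (index line, `start --> end` line, then text, with blank lines between cues), and `parse_srt` is a hypothetical name rather than anything the project provides.

```python
import re
from datetime import timedelta

# Matches an SRT timestamp such as 00:01:02,345
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_timedelta(stamp):
    """Convert an HH:MM:SS,mmm stamp to a timedelta."""
    h, m, s, ms = (int(g) for g in _TIME.fullmatch(stamp.strip()).groups())
    return timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms)


def parse_srt(srt_text):
    """Parse SRT text into a list of (start, end, text) tuples."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip malformed or empty blocks
        start, end = lines[1].split("-->")
        cues.append((_to_timedelta(start), _to_timedelta(end), "\n".join(lines[2:])))
    return cues
```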
## Supported Languages (25)
English, Spanish, French, German, Italian, Portuguese, Polish, Russian,
Ukrainian, Dutch, Swedish, Danish, Finnish, Norwegian, Greek, Czech,
Romanian, Hungarian, Bulgarian, Slovak, Croatian, Lithuanian, Latvian,
Estonian, Slovenian
Language is auto-detected — no configuration needed.
## Web Interface
Open `$PARAKEET_URL` in a browser for a drag-and-drop transcription UI.
## Docker Management
```bash
# Check status
docker ps --filter "name=parakeet"
# View logs
docker logs -f <container-name>
# Restart
docker compose restart
# Stop
docker compose down
```
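
After starting or restarting the container, a script may want to wait until the server accepts connections before sending audio. The following is a minimal sketch using only a TCP connection check; `wait_for_port` is a hypothetical helper, and the source does not document a dedicated health endpoint, so none is assumed here.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, or False after timeout.

    Hypothetical helper for illustration; not part of the project.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example: wait_for_port("localhost", 5000) before the first transcription call.
```

An open port only shows that the process is listening; the model may still be loading, so a first request can take longer than later ones.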
## Why Parakeet over Whisper?
- **Speed**: ~30x faster than realtime on CPU
- **Accuracy**: Comparable to Whisper large-v3
- **Privacy**: Runs 100% locally, no cloud calls
- **Compatibility**: Drop-in replacement for OpenAI's transcription API
Related Skills
parakeet-local-asr
Install and operate local NVIDIA Parakeet ASR for OpenClaw with an OpenAI-compatible transcription API.
paylock
Non-custodial SOL escrow for AI agent deals.
agent-reputation
Cross-platform AI agent reputation checker with trust scoring and PayLock escrow recommendations.
Telecom Agent Skill
Turn your AI Agent into a Telecom Operator. Bulk calling, ChatOps, and Field Monitoring.
OpenClaw-Finnhub
OpenClaw skill for real-time stock quotes and financials via Finnhub API.
OpenClaw-Last.fm
security-operator
Runtime security guardrails for OpenClaw agents.
operator-humanizer
Transform AI-generated text into authentic human writing.
kit-email-operator
AI-powered email marketing for Kit (ConvertKit).
agora
Trade prediction markets on Agora — the prediction market exclusively for AI agents. Register, browse markets, trade YES/NO, create markets, earn reputation via Brier scores.
surf-check
Surf forecast decision engine.
jinko-flight-search
Search flights and discover travel destinations using the Jinko MCP server. Provides two core capabilities: (1) Destination discovery — find where to travel based on criteria like budget, climate, or activities when the user has no specific destination in mind, and (2) Specific flight search — compare flights between two known cities/airports with flexible dates, cabin classes, and budget filters. Use this skill when the user wants to: search for flights, find cheap flights, discover travel destinations, compare flight prices, plan a trip, find deals from a specific city, or explore where to go. Triggers on any flight-booking, travel-planning, or destination-discovery request. Requires the Jinko MCP server connected at https://mcp.gojinko.com.