whisper-transcribe
Transcribe audio and video files to text using the Whisper speech-to-text API at {{WHISPER_HOST}}:{{WHISPER_PORT}}.
Best use case
whisper-transcribe is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using whisper-transcribe should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/whisper-transcribe/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How whisper-transcribe Compares
| Feature / Agent | whisper-transcribe | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Transcribe audio and video files to text using the Whisper speech-to-text API at {{WHISPER_HOST}}:{{WHISPER_PORT}}.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Whisper Transcribe Skill
The Whisper speech-to-text service is available at `http://{{WHISPER_HOST}}:{{WHISPER_PORT}}` within the Docker network.
## Basic Transcription
Transcribe an audio file to text:
```bash
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=json"
```
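In an agent workflow it helps to fail early when the input file is missing and to surface HTTP errors as a non-zero exit status. The sketch below wraps the request above in a function. `transcribe_file` and the `WHISPER_URL` variable are hypothetical names, with `WHISPER_URL` assumed to hold `http://{{WHISPER_HOST}}:{{WHISPER_PORT}}`.

```bash
# Hypothetical helper: POST one file to /asr and print the JSON response.
# Assumes WHISPER_URL is set to the service base URL.
# The body runs in a subshell "( ... )" so its variables stay local.
transcribe_file() (
  input=$1
  # Stop before any network call if the file is missing or unreadable.
  if [ ! -r "$input" ]; then
    echo "transcribe_file: cannot read '$input'" >&2
    exit 2
  fi
  # --fail maps HTTP 4xx/5xx responses to a non-zero exit status;
  # --silent --show-error hides the progress meter but keeps error messages.
  curl --fail --silent --show-error -X POST \
    "${WHISPER_URL:?WHISPER_URL is not set}/asr" \
    -F "audio_file=@${input}" \
    -F "output=json"
)
```

Called as `transcribe_file /data/input/recording.wav`, it prints the response on stdout and returns 2 when the file cannot be read.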
## Transcription with Options
Specify language, task, and output format:
```bash
# Transcribe with language hint
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "language=en" \
  -F "output=json"

# Translate non-English audio to English
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/french_audio.mp3" \
  -F "task=translate" \
  -F "output=json"
```
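Because every option is just another form field, the two requests above can share one wrapper that takes any number of `key=value` pairs. This is a minimal sketch, not part of the service: `whisper_request`, `WHISPER_URL`, and the `WHISPER_CURL` dry-run override are hypothetical names, and it assumes the service accepts the form fields exactly as shown above.

```bash
# Hypothetical wrapper: whisper_request FILE [key=value ...]
# Assumes WHISPER_URL holds the service base URL.
whisper_request() (
  if [ "$#" -lt 1 ]; then
    echo "usage: whisper_request FILE [key=value ...]" >&2
    exit 2
  fi
  input=$1
  shift
  # Rewrite the remaining arguments: each key=value becomes "-F key=value".
  # The loop iterates over the original list, then the originals are shifted away.
  count=$#
  for field in "$@"; do
    set -- "$@" -F "$field"
  done
  shift "$count"
  # Setting WHISPER_CURL=echo prints the arguments instead of sending the request.
  "${WHISPER_CURL:-curl}" --fail --silent --show-error -X POST \
    "${WHISPER_URL:?WHISPER_URL is not set}/asr" \
    -F "audio_file=@${input}" "$@"
)
```

For example, `whisper_request /data/input/french_audio.mp3 task=translate output=json` corresponds to the translation request above.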
## Output Formats
Choose between different output formats:
```bash
# Plain text output
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=txt"

# JSON with timestamps
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=json"

# SRT subtitles
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=srt"

# VTT subtitles
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=vtt"

# Verbose JSON with word-level timestamps
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=verbose_json" \
  -F "word_timestamps=true"
```
## Preparing Audio with FFmpeg
For best results, convert audio to the optimal format before sending to Whisper:
```bash
# Convert any media to 16kHz mono WAV (optimal for Whisper)
ffmpeg -i /data/input/video.mp4 \
  -vn -ar 16000 -ac 1 -c:a pcm_s16le \
  /data/input/audio_for_whisper.wav

# Extract and convert specific time range
ffmpeg -i /data/input/long_recording.mp3 \
  -ss 00:05:00 -to 00:10:00 \
  -vn -ar 16000 -ac 1 -c:a pcm_s16le \
  /data/input/segment.wav
```
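The same `-ss`/`-to` extraction can cover a long recording in fixed-size chunks. The following is a minimal sketch, assuming the total duration in whole seconds is already known (for example from `ffprobe`). `chunk_ranges` and `split_to_wav` are hypothetical helper names, and the `segment_NNN.wav` naming is only an illustration.

```bash
# Print "start end" pairs (whole seconds) covering TOTAL in chunks of CHUNK.
chunk_ranges() (
  total=$1
  chunk=$2
  if [ "$chunk" -le 0 ]; then
    echo "chunk_ranges: chunk length must be positive" >&2
    exit 2
  fi
  start=0
  while [ "$start" -lt "$total" ]; do
    end=$((start + chunk))
    # The last chunk is shortened so it does not run past the end.
    if [ "$end" -gt "$total" ]; then
      end=$total
    fi
    printf '%s %s\n' "$start" "$end"
    start=$end
  done
)

# Cut INPUT into 16kHz mono WAV chunks under OUTDIR, one ffmpeg run per range.
split_to_wav() (
  input=$1
  total=$2
  chunk=$3
  outdir=$4
  mkdir -p "$outdir"
  i=0
  chunk_ranges "$total" "$chunk" | while read -r start end; do
    # -nostdin keeps ffmpeg from consuming the loop's input.
    ffmpeg -nostdin -loglevel error -y -i "$input" \
      -ss "$start" -to "$end" \
      -vn -ar 16000 -ac 1 -c:a pcm_s16le \
      "$(printf '%s/segment_%03d.wav' "$outdir" "$i")"
    i=$((i + 1))
  done
)
```

For a 4000-second recording, `split_to_wav /data/input/long_recording.mp3 4000 1800 /data/input/segments` would produce three files, the last one covering seconds 3600 to 4000.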
## Saving Transcription Output
```bash
# Save transcription to file
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=json" \
  -o /data/output/transcription.json

# Save subtitles
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=srt" \
  -o /data/output/subtitles.srt
```
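When many files are processed, the output path can be derived from the input name and the requested format rather than typed by hand. A small sketch under stated assumptions: `output_path_for` is a hypothetical helper, `/data/output` is taken from the examples above as the default directory, and mapping `verbose_json` to a `.json` extension is a naming choice made here, not something the service dictates.

```bash
# Hypothetical helper: output_path_for INPUT FORMAT [OUTDIR]
# Prints OUTDIR/<input name without extension>.<extension for FORMAT>.
output_path_for() (
  input=$1
  format=$2
  outdir=${3:-/data/output}
  case $format in
    txt|json|srt|vtt) ext=$format ;;
    verbose_json)     ext=json ;;
    *)
      echo "output_path_for: unknown format '$format'" >&2
      exit 2
      ;;
  esac
  name=$(basename "$input")
  # ${name%.*} drops only the last extension, so "talk.final.mp3" keeps "talk.final".
  printf '%s/%s.%s\n' "$outdir" "${name%.*}" "$ext"
)
```

It can be combined with the save commands above, for example `-o "$(output_path_for /data/input/recording.wav srt)"`.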
## Response Structure (JSON)
```json
{
  "text": "The full transcription text goes here.",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 4.5,
      "text": "The full transcription",
      "avg_logprob": -0.25,
      "no_speech_prob": 0.01
    },
    {
      "id": 1,
      "start": 4.5,
      "end": 7.2,
      "text": " text goes here.",
      "avg_logprob": -0.18,
      "no_speech_prob": 0.02
    }
  ],
  "language": "en"
}
```
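A saved response with this structure can be filtered on `no_speech_prob` to drop segments that are likely silence or noise. The sketch below assumes `jq` is installed, which the original setup does not mention. `speech_segments` is a hypothetical helper name, and the default threshold of 0.5 is an arbitrary starting point rather than a recommended value.

```bash
# Hypothetical helper: speech_segments JSON_FILE [MAX_NO_SPEECH_PROB]
# Prints start, end, and text (tab-separated) for segments at or below the threshold.
speech_segments() {
  jq -r --argjson max "${2:-0.5}" '
    .segments[]
    | select(.no_speech_prob <= $max)
    | [.start, .end, .text]
    | @tsv
  ' "$1"
}
```

For example, `speech_segments /data/output/transcription.json 0.3` lists only the segments whose `no_speech_prob` is 0.3 or lower.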
## Supported Audio Formats
- WAV (recommended for best quality)
- MP3
- FLAC
- OGG
- M4A
- WebM
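An agent can check a file's extension against this list before uploading and route anything else through the FFmpeg conversion first. This is a minimal sketch: `is_supported_audio` is a hypothetical helper, and it only inspects the file name, not the actual container or codec.

```bash
# Hypothetical helper: succeeds if the file extension is in the list above.
is_supported_audio() (
  # Lowercase the extension so "REC.WAV" and "rec.wav" are treated alike.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case $ext in
    wav|mp3|flac|ogg|m4a|webm) exit 0 ;;
    *) exit 1 ;;
  esac
)
```

A typical use is `is_supported_audio "$file" || echo "convert $file with ffmpeg first"`.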
## Tips for AI Agents
- Always convert input audio to 16kHz mono WAV for the best accuracy and fastest processing.
- Use the `language` parameter when you know the source language to improve accuracy and speed.
- Use `task=translate` to translate non-English audio directly to English text.
- For long audio (>30 minutes), split into segments with FFmpeg first, then transcribe each segment.
- Check `no_speech_prob` in segments to filter out silence or noise sections.
- Use `output=verbose_json` with `word_timestamps=true` when you need precise timing for subtitle sync.
- SRT/VTT output formats are ready to use as video subtitles without additional processing.
- Check Whisper service health at `http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/` or `/health`.