whisper-transcribe

Transcribe audio and video files to text using the Whisper speech-to-text API at {{WHISPER_HOST}}:{{WHISPER_PORT}}.

Best use case

whisper-transcribe is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using whisper-transcribe should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/whisper-transcribe/SKILL.md --create-dirs "https://raw.githubusercontent.com/bidewio/better-openclaw/main/skills/whisper-transcribe/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/whisper-transcribe/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How whisper-transcribe Compares

Feature / Agent         | whisper-transcribe | Standard Approach
Platform Support        | Not specified      | Limited / Varies
Context Awareness       | High               | Baseline
Installation Complexity | Unknown            | N/A

Frequently Asked Questions

What does this skill do?

Transcribe audio and video files to text using the Whisper speech-to-text API at {{WHISPER_HOST}}:{{WHISPER_PORT}}.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Whisper Transcribe Skill

Whisper speech-to-text service is available at `http://{{WHISPER_HOST}}:{{WHISPER_PORT}}` within the Docker network.

## Basic Transcription

Transcribe an audio file to text:

```bash
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=json"
```

## Transcription with Options

Specify language, task, and output format:

```bash
# Transcribe with language hint
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "language=en" \
  -F "output=json"

# Translate non-English audio to English
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/french_audio.mp3" \
  -F "task=translate" \
  -F "output=json"
```

## Output Formats

Choose between different output formats:

```bash
# Plain text output
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=txt"

# JSON with timestamps
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=json"

# SRT subtitles
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=srt"

# VTT subtitles
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=vtt"

# Verbose JSON with word-level timestamps
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=verbose_json" \
  -F "word_timestamps=true"
```

## Preparing Audio with FFmpeg

For best results, convert audio to the optimal format before sending to Whisper:

```bash
# Convert any media to 16kHz mono WAV (optimal for Whisper)
ffmpeg -i /data/input/video.mp4 \
  -vn -ar 16000 -ac 1 -c:a pcm_s16le \
  /data/input/audio_for_whisper.wav

# Extract and convert specific time range
ffmpeg -i /data/input/long_recording.mp3 \
  -ss 00:05:00 -to 00:10:00 \
  -vn -ar 16000 -ac 1 -c:a pcm_s16le \
  /data/input/segment.wav
```

## Saving Transcription Output

```bash
# Save transcription to file
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=json" \
  -o /data/output/transcription.json

# Save subtitles
curl -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
  -F "audio_file=@/data/input/recording.wav" \
  -F "output=srt" \
  -o /data/output/subtitles.srt
```

## Response Structure (JSON)

```json
{
  "text": "The full transcription text goes here.",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 4.5,
      "text": "The full transcription",
      "avg_logprob": -0.25,
      "no_speech_prob": 0.01
    },
    {
      "id": 1,
      "start": 4.5,
      "end": 7.2,
      "text": " text goes here.",
      "avg_logprob": -0.18,
      "no_speech_prob": 0.02
    }
  ],
  "language": "en"
}
```
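
Assuming the response structure above, the JSON can be post-processed with `jq`. The sketch below works on a local stand-in file; with a live service you would pipe the `curl` output into the same filters:

```shell
# Stand-in for a real API reply, matching the structure shown above
cat > /tmp/transcription.json <<'EOF'
{
  "text": "The full transcription text goes here.",
  "segments": [
    {"id": 0, "start": 0.0, "end": 4.5, "text": "The full transcription", "avg_logprob": -0.25, "no_speech_prob": 0.01},
    {"id": 1, "start": 4.5, "end": 7.2, "text": " text goes here.", "avg_logprob": -0.18, "no_speech_prob": 0.02}
  ],
  "language": "en"
}
EOF

# Extract the full transcription text
jq -r '.text' /tmp/transcription.json

# Keep only segments unlikely to be silence (no_speech_prob <= 0.5) and rejoin their text
jq -r '[.segments[] | select(.no_speech_prob <= 0.5) | .text] | join("")' /tmp/transcription.json
```

The 0.5 threshold is illustrative; tune it against your own audio.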

## Supported Audio Formats

- WAV (recommended for best quality)
- MP3
- FLAC
- OGG
- M4A
- WebM
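
If you are unsure whether a file is in one of these formats, `ffprobe` (shipped with FFmpeg) can report the audio codec, sample rate, and channel count before you upload. A minimal sketch, assuming the example path used throughout this document:

```shell
# Inspect the first audio stream of the input file
ffprobe -v error \
  -select_streams a:0 \
  -show_entries stream=codec_name,sample_rate,channels \
  -of default=noprint_wrappers=1 \
  /data/input/recording.wav
```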

## Tips for AI Agents

- Always convert input audio to 16kHz mono WAV for the best accuracy and fastest processing.
- Use `language` parameter when you know the source language to improve accuracy and speed.
- Use `task=translate` to translate any language directly to English text.
- For long audio (>30 minutes), split into segments with FFmpeg first, then transcribe each segment.
- Check `no_speech_prob` in segments to filter out silence or noise sections.
- Use `output=verbose_json` with `word_timestamps=true` when you need precise timing for subtitle sync.
- SRT/VTT output formats are ready to use as video subtitles without additional processing.
- Check Whisper service health at `http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/` or `/health`.
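
The long-audio tip above can be sketched as a split-then-loop pipeline. The segment length, paths, and output handling here are illustrative assumptions, not part of the Whisper API:

```shell
# Split a long recording into 10-minute, 16kHz mono WAV chunks
mkdir -p /data/input/chunks
ffmpeg -i /data/input/long_recording.mp3 \
  -vn -ar 16000 -ac 1 -c:a pcm_s16le \
  -f segment -segment_time 600 \
  /data/input/chunks/chunk_%03d.wav

# Transcribe each chunk in order and concatenate the plain-text output
for f in /data/input/chunks/chunk_*.wav; do
  curl -s -X POST "http://{{WHISPER_HOST}}:{{WHISPER_PORT}}/asr" \
    -F "audio_file=@${f}" \
    -F "output=txt"
done > /data/output/full_transcript.txt
```

Glob expansion sorts the zero-padded chunk names lexically, which keeps the transcript in chronological order.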

Related Skills

All from bidewio/better-openclaw:

  • youtube-growth: Act as an expert YouTube Strategy Consultant. Apply the Creator Unlock N.I.C.E.R. Framework for conducting channel audits, niche validation, and data-backed video ideation/thumbnail generation.
  • xyops-automate: Build and manage automation pipelines using xyOps at {{XYOPS_HOST}}:{{XYOPS_PORT}}.
  • xml-parse: Parse and transform XML/HTML documents using command-line tools in the shared volume at {{SHARED_VOLUME}}.
  • woodpecker-ci: Lightweight container-native CI/CD with Woodpecker.
  • web-interface-guidelines: Checklist for reviewing UI code for compliance with comprehensive web interface, accessibility, performance, and content guidelines, based on Vercel's Web Interface Guidelines.
  • web-design-reviewer: Inspect web interfaces for layout, responsive, accessibility, and visual issues, then apply targeted source code fixes and re-verify results.
  • weaviate-search: Perform hybrid vector and keyword search using Weaviate at {{WEAVIATE_HOST}}:{{WEAVIATE_PORT}}.
  • watchtower-update: Auto-update Docker containers using Watchtower.
  • vaultwarden-manage: Self-hosted password management with Vaultwarden.
  • vault-secrets: Secrets management with HashiCorp Vault.
  • vantajs-background: Add animated WebGL background effects with Vanta.js, including setup, parameters, resizing, performance considerations, and integration patterns in React/Next.js.
  • valkey-cache: Cache data and manage state using Valkey (Redis-compatible) at {{VALKEY_HOST}}:{{VALKEY_PORT}}.