Audio Transcription Skill

Auto-transcribe voice messages using faster-whisper (local, no API key needed).

3,891 stars

Best use case

Audio Transcription Skill is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Auto-transcribe voice messages using faster-whisper (local, no API key needed).

Teams using Audio Transcription Skill should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/audio-transcribe/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/aktheknight/audio-transcribe/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/audio-transcribe/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How Audio Transcription Skill Compares

Feature / AgentAudio Transcription SkillStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Auto-transcribe voice messages using faster-whisper (local, no API key needed).

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Audio Transcription Skill

Auto-transcribe voice messages using faster-whisper (local, no API key needed).

## Requirements

```bash
pip install faster-whisper
```

Models download automatically on first use.

## Usage

### Transcribe a file

```bash
python3 /root/clawd/skills/audio-transcribe/scripts/transcribe.py /path/to/audio.ogg
```

### Change model (edit script)

Edit `transcribe.py` and change:
```python
model = WhisperModel('small', device='cpu', compute_type='int8')  # Options: tiny, base, small, medium, large-v3
```

## Models

| Model | Size | VRAM/RAM | Speed | Use Case |
|-------|------|----------|-------|----------|
| tiny | 39 MB | ~1 GB | ⚡⚡⚡ | Quick drafts |
| base | 74 MB | ~1 GB | ⚡⚡ | Basic accuracy |
| **small** | **244 MB** | **~2 GB** | **⚡** | **Recommended** |
| medium | 769 MB | ~5 GB | 🐢 | Better accuracy |
| large-v3 | 1.5 GB | ~10 GB | 🐢🐢 | Best accuracy |

## Integration

Clawdbot auto-transcribes incoming voice messages when this skill is enabled.

## Files

- `scripts/transcribe.py` — Main transcription script
- `SKILL.md` — This file

Related Skills

youtube-audio-download

3891
from openclaw/skills

Download YouTube video audio and convert to MP3. Supports age-restricted videos with cookies.

audio-play

3891
from openclaw/skills

Play audio files using Windows media player. Non-blocking execution.

audio-rename

3891
from openclaw/skills

Rename audio files with Chinese/special characters to simple English names for mlx-stt compatibility.

audiobooklm

3891
from openclaw/skills

提供有声书创作与音频能力(ABS 读写、音效/音频检索、二创、音色推荐、章节角色分析等),通过 HTTP Streamable MCP 调用。

qwen-audio-lab

3891
from openclaw/skills

Hybrid text-to-speech, reusable voice cloning, and narrated audio generation for macOS plus Aliyun Qwen. Use when the user wants to convert text into speech, clone and reuse a voice from a reference recording, generate narration files from plain text or text files, or create PPT speaker-note voiceovers.

deapi-audio

3891
from openclaw/skills

Text-to-speech, voice cloning, voice design, and transcribe audio files via deAPI GPU network. Trigger on 'text to speech', 'TTS', 'generate voice', 'read aloud', 'voice clone', 'clone voice', 'voice design', 'design voice', 'custom voice', 'transcribe audio', 'STT'. For video/YouTube transcription use deapi-video instead.

audio-summary Skill

3891
from openclaw/skills

音频/视频转文本总结助手。

audio-script-writer

3891
from openclaw/skills

Convert written medical content into podcast or video scripts optimized for audio delivery. Transforms academic papers, reports, and educational materials into engaging spoken-word formats with pronunciation guides, timing markers, and audio-friendly structure.

audio-to-text-and-video-to-text

3891
from openclaw/skills

Transcribe audio and video files into text using OpenAI's Whisper API. Use this skill whenever a user wants to convert any audio or video file to text — including MP3, MP4, WAV, M4A, OGG, WEBM, MOV, AVI, FLAC, and more. Trigger this skill for any request involving: "transcribe", "convert audio to text", "speech to text", "get transcript of", "extract audio from video", "meeting notes from recording", "subtitles", "captions", or similar. Also trigger when the user uploads or references a media file and asks what was said, discussed, or mentioned in it. If unsure whether audio/video transcription is involved, use this skill.

u2-audio-file-transcriber

3891
from openclaw/skills

Transcribe audio files via UniCloud ASR (云知声语音识别, recorded audio → text) API from UniSound. Supports multiple formats, optimized for finance, customer service, and other domains.

name: u2-audio-file-transcriber

3891
from openclaw/skills

description: "Transcribe audio files via UniCloud ASR (云知声语音识别, recorded audio → text) API from UniSound. Supports multiple formats, optimized for finance, customer service, and other domains. 调用云知声语音识别服务转写音频文件,支持多种音频格式,适用于金融、客服等场景。Use when the user needs to transcribe recorded audio files, or asks for UniSound/云知声 audio file transcription. Do NOT use for real-time/streaming speech recognition, text-to-speech (TTS), or live captioning. 不适用于实时语音识别、语音合成(TTS)或直播字幕。"

li-feishu-audio

3891
from openclaw/skills

飞书语音交互技能。支持语音消息自动识别、AI 处理、语音回复全流程。需要配置 FEISHU_APP_ID 和 FEISHU_APP_SECRET 环境变量。使用 faster-whisper 进行语音识别,Edge TTS 进行语音合成,自动转换 OPUS 格式并通过飞书发送。适用于飞书平台的语音对话场景。