whisper-gpu-transcribe

Convert audio to SRT subtitles using OpenAI Whisper with automatic GPU acceleration for Intel XPU / NVIDIA CUDA / AMD ROCm / Apple Metal. Ideal for content creators as a free alternative to paid subtitle generation.

3,891 stars

Best use case

whisper-gpu-transcribe is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using whisper-gpu-transcribe should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/whisper-gpu-transcriber-skill/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/allanmeng/whisper-gpu-transcriber-skill/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/whisper-gpu-transcriber-skill/SKILL.md inside your project
  3. Restart your AI agent; it will auto-discover the skill

How whisper-gpu-transcribe Compares

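| Feature / Agent | whisper-gpu-transcribe | Standard Approach |
|---|---|---|
| Platform Support | Intel XPU, NVIDIA CUDA, AMD ROCm, Apple Metal, CPU fallback | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Single SKILL.md file, plus PyTorch and openai-whisper | N/A |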

Frequently Asked Questions

What does this skill do?

Convert audio to SRT subtitles using OpenAI Whisper with automatic GPU acceleration for Intel XPU / NVIDIA CUDA / AMD ROCm / Apple Metal. Ideal for content creators as a free alternative to paid subtitle generation.

Where can I find the source code?

The source is in the openclaw/skills repository on GitHub, under skills/allanmeng/whisper-gpu-transcriber-skill (the same path used by the installation command above).


SKILL.md Source

# 🎙️ Whisper GPU Audio Transcriber

Convert audio files to SRT subtitles using local Whisper models: **completely free**, offline after the first model download, and GPU accelerated.

---

## Use Cases

- Content creation: a free alternative to paid subtitle features (e.g., CapCut / Jianying)
- Turning meeting recordings into text
- Subtitles for podcasts and courses

---

## Supported GPU Acceleration

| Device | Acceleration | FP16 |
|--------|-------------|------|
| Intel Arc Series | XPU | ❌ Auto disabled |
| NVIDIA GPUs | CUDA | ✅ Auto enabled |
| AMD GPUs | ROCm | ✅ Auto enabled |
| Apple M Series | Metal | ✅ Auto enabled |
| No GPU | CPU | ❌ Auto disabled |
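The table above amounts to a backend priority order plus an FP16 rule. The sketch below assumes the checks run in this order (the actual order inside `scripts/transcribe.py` is not documented here), and `pick_device` / `detect_device` are hypothetical names. It relies on two PyTorch facts: ROCm builds report AMD GPUs through the `torch.cuda` API (with `torch.version.hip` set), and Apple Metal is exposed as the `mps` device.

```python
from typing import NamedTuple


class DeviceChoice(NamedTuple):
    device: str   # device string accepted by torch / whisper.load_model
    backend: str  # label matching the "Acceleration" column above
    fp16: bool    # whether half precision is enabled


def pick_device(cuda: bool, hip: bool, xpu: bool, mps: bool) -> DeviceChoice:
    """Pure selection rule (hypothetical order) mirroring the FP16 column."""
    if cuda:
        # ROCm builds of PyTorch also use the "cuda" device string
        return DeviceChoice("cuda", "ROCm" if hip else "CUDA", True)
    if xpu:
        # FP16 is disabled on Intel XPU, per the table
        return DeviceChoice("xpu", "XPU", False)
    if mps:
        return DeviceChoice("mps", "Metal", True)
    # No GPU: fall back to CPU with FP16 off
    return DeviceChoice("cpu", "CPU", False)


def detect_device() -> DeviceChoice:
    """Query the installed PyTorch build; torch is imported lazily."""
    import torch

    cuda = torch.cuda.is_available()
    hip = getattr(torch.version, "hip", None) is not None
    # torch.xpu only exists in recent PyTorch releases, so guard the lookup
    xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
    mps = torch.backends.mps.is_available()
    return pick_device(cuda, hip, xpu, mps)
```

Keeping the rule in a pure function means it can be checked without any GPU present; only `detect_device()` touches PyTorch.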

---

## Usage

### Basic Usage

Place the audio file in your current working directory and tell the AI:

```
Convert xxx.mp3 to SRT subtitles
```

Or specify the full path directly:

```
Convert /path/to/audio.mp3 to SRT subtitles
```

### Advanced Usage

```
Convert xxx.mp3 to English subtitles using large-v3-turbo model

Convert xxx.mp3 to subtitles, language is Japanese
```
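The advanced prompts above boil down to two parameters: a model name and an optional language. A minimal sketch of that mapping follows; `build_request` and `SUPPORTED_MODELS` are hypothetical names, since the skill does not document how the agent passes these values to its script. In openai-whisper, `load_model` takes the model name, and `transcribe` takes a `language` that can be a code such as `ja` or an English name such as `japanese`.

```python
from typing import Optional

# Names from the "Supported Models" table; in openai-whisper,
# "turbo" is an alias for the same checkpoint as "large-v3-turbo".
SUPPORTED_MODELS = {"tiny", "base", "small", "medium", "turbo", "large-v3", "large-v3-turbo"}


def build_request(model: str = "turbo", language: Optional[str] = None) -> dict:
    """Turn a parsed user request into whisper arguments (hypothetical helper)."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    options = {"model": model}
    if language is not None:
        # Lowercase names like "japanese" are accepted alongside codes like "ja"
        options["language"] = language.lower()
    return options
```

The result would feed `whisper.load_model(opts["model"])` and `model.transcribe(path, language=opts.get("language"))`; leaving `language` unset lets Whisper auto-detect it. One caveat from the openai-whisper README: `turbo` was not trained for the `translate` task, so for non-English audio, "English subtitles" may need `medium` or `large-v3`.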

---

## Execution

The AI agent runs the `scripts/transcribe.py` script, which will:

1. Automatically detect available GPU and select optimal acceleration
2. Load Whisper model (default: `turbo`)
3. Transcribe audio to SRT format
4. Save the `.srt` file in the same directory as the audio
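Steps 2 to 4 can be sketched with openai-whisper's `load_model` and `transcribe`, plus a small SRT formatter. This is an illustration, not the contents of `scripts/transcribe.py`; the helper names are hypothetical, and `device` / `fp16` would come from the GPU detection in step 1.

```python
from pathlib import Path
from typing import Optional


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render Whisper-style segments (dicts with start, end, text) as SRT."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_name: str = "turbo",
                      device: Optional[str] = None, fp16: bool = True) -> Path:
    """Load a model, transcribe, and write <audio>.srt next to the audio file."""
    import whisper  # lazy import; needs openai-whisper and ffmpeg installed

    model = whisper.load_model(model_name, device=device)
    result = model.transcribe(audio_path, fp16=fp16)
    out_path = Path(audio_path).with_suffix(".srt")  # same directory as the audio
    out_path.write_text(segments_to_srt(result["segments"]), encoding="utf-8")
    return out_path
```

openai-whisper also ships its own output writers in `whisper.utils`; the hand-rolled formatter here just makes the SRT layout (index, `start --> end` line, text, blank line) explicit.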

---

## Requirements

- Python 3.8+
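- A PyTorch build matching your hardware
  - Intel GPU: `pip install torch==2.10.0+xpu --index-url https://download.pytorch.org/whl/xpu`
  - NVIDIA GPU: `pip install torch --index-url https://download.pytorch.org/whl/cu121`
  - CPU: `pip install torch`
- openai-whisper: installed automatically by ClawHub via `pip install openai-whisper` (you can also run this command yourself)
- FFmpeg available on your `PATH` (openai-whisper calls the `ffmpeg` command to decode audio)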
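A quick preflight check for the requirements above might look like this. It is a hypothetical helper, not part of the skill; it also looks for `ffmpeg`, because openai-whisper decodes audio by running the `ffmpeg` command.

```python
import shutil
import sys
from importlib.util import find_spec
from typing import List


def missing_requirements() -> List[str]:
    """Return human-readable problems; an empty list means the basics are present."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    for module in ("torch", "whisper"):
        # find_spec checks importability without importing the heavy packages
        if find_spec(module) is None:
            problems.append(f"Python package '{module}' is not installed")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print("missing:", problem)
```

Note that the `whisper` import name belongs to the `openai-whisper` package; this check cannot tell whether the installed PyTorch build matches your GPU.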

---

## Notes

- The first run automatically downloads the model file (turbo is about 1.5 GB)
- Models are cached in `~/.cache/whisper` by default; use a symlink (or a Windows directory junction) to redirect them to another disk
- Intel XPU acceleration requires an Intel Arc GPU and a matching XPU build of PyTorch

> **Tip for users in mainland China**: If the model download fails, download the model file manually from a mirror site and place it in `~/.cache/whisper/`
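Besides a symlink or junction, openai-whisper's `load_model` accepts a `download_root` argument; when it is omitted, the library uses `$XDG_CACHE_HOME/whisper`, falling back to `~/.cache/whisper`. The sketch below mirrors that default, and `whisper_cache_dir` / `load_from` are hypothetical helper names.

```python
import os


def whisper_cache_dir() -> str:
    """Default model directory, computed the way openai-whisper's load_model does."""
    default = os.path.join(os.path.expanduser("~"), ".cache")
    return os.path.join(os.getenv("XDG_CACHE_HOME", default), "whisper")


def load_from(download_root: str, name: str = "turbo"):
    """Load (downloading if needed) a model from a custom directory."""
    import whisper  # lazy import; requires openai-whisper

    return whisper.load_model(name, download_root=download_root)
```

For example, `load_from("D:/whisper-models")` (a placeholder path) would download or reuse `turbo` in that directory instead of the default cache, which is also where a manually downloaded mirror file could be placed.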

---

## Supported Models

| Model | Parameters | Speed | Accuracy |
|-------|------|-------|----------|
| `tiny` | 39M | Fastest | Low |
| `base` | 74M | Fast | Medium |
| `small` | 244M | Medium | Medium |
| `medium` | 769M | Slow | High |
| `turbo` | 809M | Medium | High ✅ Recommended |
| `large-v3` | 1550M | Slowest | Highest |
| `large-v3-turbo` | 809M | Medium | High (same checkpoint as `turbo`) |

---

Related Skills

local-whisper

from openclaw/skills

Local speech-to-text using OpenAI Whisper. Runs fully offline after the model download. High-quality transcription with multiple model sizes.

openai-whisper

from openclaw/skills

Local speech-to-text with the Whisper CLI (no API key).

whisper-context

from openclaw/skills

Official Whisper Context skill for OpenClaw. Cuts context tokens via delta compression + caching, and adds long-term memory across sessions.

usewhisper-autohook

from openclaw/skills

Auto-hook tools for OpenClaw: query Whisper Context before every generation, ingest after every turn. Built for Telegram agents (stable user_id/session_id).

voice-transcriber

from openclaw/skills

Voice note transcription and archival for OpenClaw agents. Powered by Deepgram Nova-3. Transcribes audio messages, saves both audio files and text transcripts. Perfect for voice-first AI workflows, founder journaling, and meeting notes.

video-transcriber

from openclaw/skills

(Verified) A powerful batch transcriber for Douyin videos that combines downloading, audio extraction, and local Whisper transcription.

aj-openai-whisper

from openclaw/skills

Local speech-to-text with the Whisper CLI (no API key).

u2-audio-file-transcriber

from openclaw/skills

Transcribe audio files via the UniCloud ASR API (recorded audio → text) from UniSound (云知声). Supports multiple formats and is optimized for finance, customer service, and other domains.

video-transcribe-v1-0-3

from openclaw/skills

Local video-to-text: speech recognition with OpenAI Whisper; completely free, runs offline, and protects privacy.

whisper-asr

from openclaw/skills

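Local Whisper speech recognition setup. Automatically converts voice messages from channels such as Feishu and Telegram into text. Suited to scenarios that need offline, low-latency speech-to-text.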
