whisper-gpu-transcribe

Convert audio to SRT subtitles using OpenAI Whisper with automatic GPU acceleration for Intel XPU / NVIDIA CUDA / AMD ROCm / Apple Metal. Ideal for content creators as a free alternative to paid subtitle generation.

3,891 stars

by openclaw

View on GitHub Installation ↓

Best use case

whisper-gpu-transcribe is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using whisper-gpu-transcribe should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/whisper-gpu-transcriber-skill/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/allanmeng/whisper-gpu-transcriber-skill/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/whisper-gpu-transcriber-skill/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How whisper-gpu-transcribe Compares

Feature / Agent	whisper-gpu-transcribe	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

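What does this skill do?

It converts audio files to SRT subtitles with a local OpenAI Whisper model. It automatically picks GPU acceleration (Intel XPU, NVIDIA CUDA, AMD ROCm or Apple Metal) when available and falls back to the CPU otherwise.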

Where can I find the source code?

The source code is on GitHub in the openclaw/skills repository, under skills/allanmeng/whisper-gpu-transcriber-skill/. It is also linked at the top of the page.

Related Guides

AI Agents for Marketing

Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.

AI Agents for Startups

Explore AI agent skills for startup validation, product research, growth experiments, documentation, and fast execution with small teams.

Best AI Agents for Marketing

A curated list of the best AI agents and skills for marketing teams focused on SEO, content systems, outreach, and campaign execution.

SKILL.md Source

# 🎙️ Whisper GPU Audio Transcriber

Convert audio files to SRT subtitles using local Whisper models — **completely free**, offline, and GPU accelerated.

---

## Use Cases

- Content creation: a free alternative to paid auto-subtitle features (e.g., CapCut/剪映 JianYing)
- Meeting recordings to text
- Podcast and course subtitles

---

## Supported GPU Acceleration

| Device | Acceleration | FP16 |
|--------|-------------|------|
| Intel Arc Series | XPU | ❌ Auto disabled |
| NVIDIA GPUs | CUDA | ✅ Auto enabled |
| AMD GPUs | ROCm | ✅ Auto enabled |
| Apple M Series | Metal | ✅ Auto enabled |
| No GPU | CPU | ❌ Auto disabled |
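The automatic device selection can be sketched as follows. This is a hypothetical helper, not the code in `scripts/transcribe.py`. The priority order (CUDA, then XPU, then MPS, then CPU) is an assumption, and the FP16 flags simply follow the table. PyTorch's ROCm builds expose AMD GPUs through the `torch.cuda` API, so a single `cuda` check covers both NVIDIA and AMD. Apple's Metal backend appears in PyTorch as `mps`.

```python
from typing import Tuple


# Hypothetical helper: the real scripts/transcribe.py may use a different order.
def choose_device(cuda: bool, xpu: bool, mps: bool) -> Tuple[str, bool]:
    """Return (device_name, use_fp16) for the first available backend."""
    if cuda:  # NVIDIA CUDA; AMD ROCm builds of PyTorch also report here
        return "cuda", True
    if xpu:  # Intel Arc via XPU; FP16 disabled, as in the table
        return "xpu", False
    if mps:  # Apple M series via Metal Performance Shaders
        return "mps", True
    return "cpu", False  # no GPU: FP32 on the CPU


def probe_backends() -> Tuple[bool, bool, bool]:
    """Ask the installed PyTorch build which backends are usable."""
    try:
        import torch
    except ImportError:
        return False, False, False
    cuda = torch.cuda.is_available()
    # torch.xpu only exists in newer PyTorch releases, so guard the lookup
    xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
    mps = hasattr(torch.backends, "mps") and torch.backends.mps.is_available()
    return cuda, xpu, mps
```

In a script, you would call `choose_device(*probe_backends())`. The device string goes to `whisper.load_model(name, device=...)`, and the FP16 flag goes to `model.transcribe(..., fp16=...)`. Keeping the decision in a pure function makes it easy to test without a GPU.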

---

## Usage

### Basic Usage

Place the audio file in your current working directory and tell the AI:

```
Convert xxx.mp3 to SRT subtitles
```

Or specify the full path directly:

```
Convert /path/to/audio.mp3 to SRT subtitles
```

### Advanced Usage

```
Convert xxx.mp3 to English subtitles using large-v3-turbo model

Convert xxx.mp3 to subtitles, language is Japanese
```
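The advanced prompts reduce to two parameters: the model name and the spoken language. One hypothetical way a script could expose them is through `argparse`. The flag names `--model` and `--language` below are illustrative and are not confirmed to match `scripts/transcribe.py`. The `turbo` default comes from the Execution section.

```python
import argparse


# Hypothetical CLI: flag names are illustrative, not taken from the real script.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Transcribe audio to SRT with Whisper")
    parser.add_argument("audio", help="path to the input audio file")
    parser.add_argument("--model", default="turbo",
                        help="Whisper model name, e.g. tiny, small, turbo, large-v3")
    parser.add_argument("--language", default=None,
                        help="spoken language code such as 'en' or 'ja'; "
                             "omit it to let Whisper auto-detect")
    return parser


# "Convert xxx.mp3 to subtitles, language is Japanese" could map to:
args = build_parser().parse_args(["xxx.mp3", "--language", "ja"])
print(args.model, args.language)  # turbo ja
```

Whisper itself accepts a language code through `model.transcribe(audio, language="ja")`. Passing `language=None` makes it detect the language from the audio.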

---

## Execution

The AI agent runs the `scripts/transcribe.py` script, which will:

1. Automatically detect available GPU and select optimal acceleration
2. Load Whisper model (default: `turbo`)
3. Transcribe audio to SRT format
4. Save output in the same directory as the audio
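Steps 3 and 4 can be sketched without any Whisper internals. openai-whisper's `model.transcribe()` returns a dict whose `segments` list holds `start` and `end` times in seconds plus the segment `text`. An SRT entry needs only a running index, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` line and the text. The helper names here are hypothetical. openai-whisper also ships its own SRT writer (`whisper.utils.get_writer("srt", output_dir)`), which the real script may use instead.

```python
from pathlib import Path
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Turn Whisper-style segments into numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def srt_path_for(audio: str) -> Path:
    """Place the .srt file next to the audio file, as in step 4."""
    return Path(audio).with_suffix(".srt")
```

With a loaded model, the sketch would be used as `result = model.transcribe(audio)` followed by `srt_path_for(audio).write_text(segments_to_srt(result["segments"]), encoding="utf-8")`. That writes `talk.srt` next to `talk.mp3`.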

---

## Requirements

- Python 3.8+
- PyTorch (version matching your hardware)
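  - Intel GPU: `pip install torch==2.10.0+xpu --index-url https://download.pytorch.org/whl/xpu`
  - NVIDIA GPU: `pip install torch --index-url https://download.pytorch.org/whl/cu121`
  - AMD GPU (Linux): the ROCm build from the matching `rocm` index listed on the PyTorch "Get Started" page
  - CPU or Apple M series: `pip install torch` (the standard macOS wheels include Metal/MPS support)
- openai-whisper: installed automatically by ClawHub via `pip install openai-whisper`
- ffmpeg on your `PATH` (openai-whisper calls it to decode audio)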
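A quick preflight check can confirm the pieces are in place before the first run. This sketch is an assumption and is not part of the skill. It looks for PyTorch, the `whisper` module (the import name of the openai-whisper package) and the `ffmpeg` executable, which openai-whisper invokes to decode audio files. `find_spec` locates modules without importing them, so the check stays fast even though PyTorch is large.

```python
import importlib.util
import shutil
from typing import Dict


def check_requirements() -> Dict[str, bool]:
    """Report which prerequisites are present without importing heavy modules."""
    return {
        "torch": importlib.util.find_spec("torch") is not None,
        "whisper": importlib.util.find_spec("whisper") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


for name, ok in check_requirements().items():
    print(f"{name:8s} {'ok' if ok else 'MISSING'}")
```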

---

## Notes

- The first run automatically downloads the model file (`turbo` is about 1.5 GB)
- Models are cached in `~/.cache/whisper` by default; to store them on another disk, redirect that folder with a symlink (or a junction on Windows)
- Intel XPU requires an Intel Arc GPU and a matching PyTorch build
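openai-whisper resolves its default cache folder from the `XDG_CACHE_HOME` environment variable, falls back to `~/.cache`, and then appends `whisper`. The function below mirrors that rule as we understand it from the library source. It is a sketch for locating the folder and is not part of the skill.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def whisper_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Mirror openai-whisper's default download root: $XDG_CACHE_HOME/whisper, else ~/.cache/whisper."""
    env = os.environ if env is None else env
    default = os.path.join(os.path.expanduser("~"), ".cache")
    return Path(env.get("XDG_CACHE_HOME", default)) / "whisper"


print(whisper_cache_dir())
```

As an alternative to a symlink, `whisper.load_model("turbo", download_root="/path/to/models")` downloads and loads checkpoints from another folder. Setting `XDG_CACHE_HOME` also moves the default, but it affects other programs that follow the XDG convention.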

> **Tip for China users**: If the automatic model download fails, download the model file manually from a mirror site and place it in `~/.cache/whisper/`
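When you place a manually downloaded model, both the file name and the checksum matter. As far as we can tell from the openai-whisper source, the cached file is named after the last path segment of the model URL (for example `large-v3-turbo.pt` for `turbo`). The library checks the file's SHA-256 against the hash embedded as the second-to-last URL segment and re-downloads on a mismatch. The helpers below are a hypothetical check under that assumption. The official model URLs are listed in `whisper/__init__.py`.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-GB checkpoints don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_sha256(url: str) -> str:
    # Assumes the official URL layout .../<sha256>/<name>.pt
    return url.rstrip("/").split("/")[-2]


def check_manual_download(url: str, cache_dir: Path) -> bool:
    """True if cache_dir holds the file Whisper expects for this URL, with a matching hash."""
    target = cache_dir / url.rstrip("/").split("/")[-1]
    return target.is_file() and sha256_of(target) == expected_sha256(url)
```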

---

## Supported Models

| Model | Parameters | Speed | Accuracy |
|-------|------|-------|----------|
| `tiny` | 39M | Fastest | Low |
| `base` | 74M | Fast | Medium |
| `small` | 244M | Medium | Medium |
| `medium` | 769M | Slow | High |
| `turbo` | 809M | Medium | High ✅ Recommended |
| `large-v3` | 1550M | Slowest | Highest |
| `large-v3-turbo` | 809M | Same model as `turbo` (alias) | Same as `turbo` |
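The parameter counts also explain the download sizes quoted in the notes. openai-whisper checkpoints appear to store weights in FP16, which takes two bytes per parameter, so `turbo` at 809M parameters comes to roughly 1.5 GiB. The helper below does that arithmetic for several rows of the table. It is an approximation that ignores file overhead, and the dictionary is just a copy of the table, not a library API.

```python
# Parameter counts in millions, copied from the table above.
MODEL_PARAMS_M = {
    "tiny": 39, "base": 74, "small": 244, "medium": 769,
    "turbo": 809, "large-v3": 1550,
}


def approx_checkpoint_gib(model: str, bytes_per_param: int = 2) -> float:
    """Rough weight size in GiB: parameters x bytes per parameter (2 for FP16)."""
    return MODEL_PARAMS_M[model] * 1e6 * bytes_per_param / 2**30


for name in ("turbo", "large-v3"):
    print(f"{name}: ~{approx_checkpoint_gib(name):.2f} GiB")
# turbo: ~1.51 GiB
# large-v3: ~2.89 GiB
```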

---

Related Skills

local-whisper

3891

from openclaw/skills

Local speech-to-text using OpenAI Whisper. Runs fully offline after the model download. High-quality transcription with multiple model sizes.

openai-whisper

3891

from openclaw/skills

Local speech-to-text with the Whisper CLI (no API key).

whisper-context

3891

from openclaw/skills

Official Whisper Context skill for OpenClaw. Cuts context tokens via delta compression + caching, and adds long-term memory across sessions.

usewhisper-autohook

3891

from openclaw/skills

Auto-hook tools for OpenClaw: query Whisper Context before every generation, ingest after every turn. Built for Telegram agents (stable user_id/session_id).

voice-transcriber

3891

from openclaw/skills

Voice note transcription and archival for OpenClaw agents. Powered by Deepgram Nova-3. Transcribes audio messages, saves both audio files and text transcripts. Perfect for voice-first AI workflows, founder journaling, and meeting notes.

video-transcriber

3891

from openclaw/skills

(Verified) A powerful batch transcriber for Douyin videos that combines downloading, audio extraction and local Whisper model transcription.

aj-openai-whisper

3891

from openclaw/skills

Local speech-to-text with the Whisper CLI (no API key).

u2-audio-file-transcriber

3891

from openclaw/skills

Transcribe audio files via the UniCloud ASR API from UniSound (UniSound speech recognition, recorded audio → text). Supports multiple formats and is optimized for finance, customer service and other domains.
