whisper-gpu-transcribe
Convert audio to SRT subtitles using OpenAI Whisper with automatic GPU acceleration for Intel XPU / NVIDIA CUDA / AMD ROCm / Apple Metal. Ideal for content creators as a free alternative to paid subtitle generation.
Best use case
whisper-gpu-transcribe is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using whisper-gpu-transcribe should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/whisper-gpu-transcriber-skill/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
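The manual installation amounts to copying one file into a fixed directory. Below is a minimal Python sketch of that copy. It assumes you have already downloaded `SKILL.md` locally, and `install_skill` is a hypothetical helper, not something the skill provides.

```python
import shutil
from pathlib import Path

# Directory layout taken from the installation steps; the helper itself is hypothetical.
SKILL_DIR = Path(".claude") / "skills" / "whisper-gpu-transcriber-skill"


def install_skill(downloaded_skill_md: Path, project_root: Path) -> Path:
    """Copy a downloaded SKILL.md to <project_root>/.claude/skills/whisper-gpu-transcriber-skill/SKILL.md."""
    target = project_root / SKILL_DIR / "SKILL.md"
    target.parent.mkdir(parents=True, exist_ok=True)  # create .claude/skills/... if missing
    shutil.copyfile(downloaded_skill_md, target)
    return target
```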
How whisper-gpu-transcribe Compares
| Feature / Agent | whisper-gpu-transcribe | Standard Approach |
|---|---|---|
| Platform Support | Intel XPU, NVIDIA CUDA, AMD ROCm, Apple Metal, CPU fallback | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Convert audio to SRT subtitles using OpenAI Whisper with automatic GPU acceleration for Intel XPU / NVIDIA CUDA / AMD ROCm / Apple Metal. Ideal for content creators as a free alternative to paid subtitle generation.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agents for Marketing
Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.
AI Agents for Startups
Explore AI agent skills for startup validation, product research, growth experiments, documentation, and fast execution with small teams.
Best AI Agents for Marketing
A curated list of the best AI agents and skills for marketing teams focused on SEO, content systems, outreach, and campaign execution.
SKILL.md Source
# 🎙️ Whisper GPU Audio Transcriber

Convert audio files to SRT subtitles using local Whisper models: **completely free**, offline, and GPU-accelerated.

---

## Use Cases

- Content creation: a free alternative to paid subtitle features (e.g., CapCut/剪映)
- Meeting recordings to text
- Podcast/course subtitles

---

## Supported GPU Acceleration

| Device | Acceleration | FP16 |
|--------|-------------|------|
| Intel Arc Series | XPU | ❌ Auto-disabled |
| NVIDIA GPUs | CUDA | ✅ Auto-enabled |
| AMD GPUs | ROCm | ✅ Auto-enabled |
| Apple M Series | Metal (MPS) | ✅ Auto-enabled |
| No GPU | CPU | ❌ Auto-disabled |

---

## Usage

### Basic Usage

Place the audio file in your current working directory and tell the AI:

```
Convert xxx.mp3 to SRT subtitles
```

Or specify the full path directly:

```
Convert /path/to/audio.mp3 to SRT subtitles
```

### Advanced Usage

```
Convert xxx.mp3 to English subtitles using large-v3-turbo model
Convert xxx.mp3 to subtitles, language is Japanese
```

---

## Execution

The AI runs the `scripts/transcribe.py` script, which:

1. Automatically detects the available GPU and selects the optimal acceleration
2. Loads the Whisper model (default: `turbo`)
3. Transcribes the audio to SRT format
4. Saves the output in the same directory as the audio

---

## Requirements

- Python 3.8+
- PyTorch (a build matching your hardware)
  - Intel GPU: `pip install torch==2.10.0+xpu`
  - NVIDIA GPU: `pip install torch --index-url https://download.pytorch.org/whl/cu121`
  - CPU: `pip install torch`
- openai-whisper: installed automatically by ClawHub via `pip install openai-whisper`

---

## Notes

- The first run automatically downloads the model file (turbo is ~1.5GB)
- Models are cached in `~/.cache/whisper` by default; use a symlink/junction to redirect the cache to another disk
- Intel XPU requires an Intel Arc GPU plus a matching PyTorch build

> **Tip for China users**: If the model download fails, download the model manually from a mirror site and place it in `~/.cache/whisper/`

---

## Supported Models

| Model | Parameters | Speed | Accuracy |
|-------|------------|-------|----------|
| `tiny` | 39M | Fastest | Low |
| `base` | 74M | Fast | Medium |
| `small` | 244M | Medium | Medium |
| `medium` | 769M | Slow | High |
| `turbo` (alias: `large-v3-turbo`) | 809M | Medium | High ✅ Recommended |
| `large-v3` | 1550M | Slowest | Highest |
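Step 1 of the Execution section and the Supported GPU Acceleration table describe picking a backend and toggling FP16. The sketch below shows one way to do that with PyTorch's availability checks. It is not the actual `scripts/transcribe.py`: the probe order (XPU, then CUDA/ROCm, then Apple MPS, then CPU) and the function names are assumptions. ROCm builds of PyTorch expose AMD GPUs through the `cuda` device type, which is why there is no separate ROCm probe.

```python
from typing import Tuple

# FP16 policy per backend, taken from the "Supported GPU Acceleration" table.
# AMD GPUs on a ROCm build of PyTorch show up under the "cuda" device type.
FP16_BY_DEVICE = {"xpu": False, "cuda": True, "mps": True, "cpu": False}


def pick_device(has_xpu: bool, has_cuda: bool, has_mps: bool) -> Tuple[str, bool]:
    """Return (device, use_fp16) for the first available backend, else CPU."""
    # Probe order is an assumption; the skill does not document its priority.
    for device, available in (("xpu", has_xpu), ("cuda", has_cuda), ("mps", has_mps)):
        if available:
            return device, FP16_BY_DEVICE[device]
    return "cpu", FP16_BY_DEVICE["cpu"]


def detect_device() -> Tuple[str, bool]:
    """Ask the installed PyTorch build what it supports; fall back to CPU without torch."""
    try:
        import torch
    except ImportError:
        return pick_device(False, False, False)
    xpu = getattr(torch, "xpu", None)  # torch.xpu only exists in recent releases
    mps = getattr(torch.backends, "mps", None)  # Apple Metal Performance Shaders
    return pick_device(
        has_xpu=bool(xpu is not None and xpu.is_available()),
        has_cuda=torch.cuda.is_available(),
        has_mps=bool(mps is not None and mps.is_available()),
    )


if __name__ == "__main__":
    print(detect_device())
```

The returned pair would feed the `device=` argument of `whisper.load_model()` and the `fp16=` option of `model.transcribe()`.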
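Step 2 of the Execution section and the Advanced Usage prompts come down to two choices: which model to load and whether to force a language. Here is a minimal sketch using the `openai-whisper` package's `whisper.load_model(name, device=...)` and `model.transcribe(path, **options)` entry points. `build_request` and `transcribe` are hypothetical names. Leaving `language` unset lets Whisper auto-detect it.

```python
from typing import Optional

# Names from the "Supported Models" table; in openai-whisper, "large-v3-turbo"
# and "turbo" resolve to the same checkpoint.
SUPPORTED_MODELS = ("tiny", "base", "small", "medium", "turbo", "large-v3", "large-v3-turbo")


def build_request(model: str = "turbo", language: Optional[str] = None, fp16: bool = False) -> dict:
    """Validate the model name and collect keyword options for model.transcribe()."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model {model!r}; expected one of {SUPPORTED_MODELS}")
    options = {"fp16": fp16}
    if language is not None:
        options["language"] = language  # e.g. "ja"; omit to let Whisper auto-detect
    return {"model": model, "options": options}


def transcribe(audio_path: str, model: str = "turbo", language: Optional[str] = None,
               device: str = "cpu", fp16: bool = False) -> dict:
    """Load the model on `device` and return Whisper's result dict (text, segments, language)."""
    import whisper  # lazy import: build_request() stays usable without openai-whisper

    request = build_request(model, language, fp16)
    whisper_model = whisper.load_model(request["model"], device=device)
    return whisper_model.transcribe(audio_path, **request["options"])
```

Under these assumptions, the prompt "Convert xxx.mp3 to subtitles, language is Japanese" would correspond roughly to `transcribe("xxx.mp3", language="ja")`, with `device` and `fp16` taken from the detection step.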
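Steps 3 and 4 (write SRT, save next to the audio) can be sketched without any Whisper dependency, because `model.transcribe()` returns a dict whose `segments` entries carry `start`, `end`, and `text`. The helper names below are hypothetical, and the skill's own script may format SRT differently (openai-whisper also ships subtitle writers in `whisper.utils`).

```python
from pathlib import Path
from typing import Iterable, Mapping


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Render Whisper-style segments (dicts with start/end/text) as numbered SRT cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)  # a blank line separates consecutive cues


def srt_path_for(audio_path: str) -> Path:
    """Place the .srt in the audio file's directory, reusing its base name."""
    return Path(audio_path).with_suffix(".srt")
```

With a transcription `result`, `srt_path_for(path).write_text(segments_to_srt(result["segments"]), encoding="utf-8")` would write the subtitles beside the audio file.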
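The Notes section suggests a symlink/junction to move the model cache off the system disk. Below is a minimal POSIX-oriented sketch, assuming the default `~/.cache/whisper` location (openai-whisper honors `XDG_CACHE_HOME` when it is set). `redirect_cache` is a hypothetical helper. On Windows, `os.symlink` needs Developer Mode or admin rights, so a junction created with `mklink /J` is the usual alternative. If you control the loading code, `whisper.load_model(..., download_root=...)` avoids the link entirely.

```python
import os
import shutil
from pathlib import Path

DEFAULT_CACHE = Path.home() / ".cache" / "whisper"  # default when XDG_CACHE_HOME is unset


def redirect_cache(target_dir: Path, cache_dir: Path = DEFAULT_CACHE) -> Path:
    """Move cached models into target_dir and leave a symlink at cache_dir."""
    if cache_dir.is_symlink():
        return cache_dir  # already redirected; nothing to do
    target_dir = target_dir.resolve()  # absolute target so the link works from any cwd
    target_dir.mkdir(parents=True, exist_ok=True)
    if cache_dir.exists():
        for item in cache_dir.iterdir():  # move existing *.pt files so they are not re-downloaded
            shutil.move(str(item), str(target_dir / item.name))
        cache_dir.rmdir()
    cache_dir.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(target_dir, cache_dir, target_is_directory=True)
    return cache_dir
```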
Related Skills
local-whisper
Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High quality transcription with multiple model sizes.
openai-whisper
Local speech-to-text with the Whisper CLI (no API key).
whisper-context
Official Whisper Context skill for OpenClaw. Cuts context tokens via delta compression + caching, and adds long-term memory across sessions.
usewhisper-autohook
Auto-hook tools for OpenClaw: query Whisper Context before every generation, ingest after every turn. Built for Telegram agents (stable user_id/session_id).
voice-transcriber
Voice note transcription and archival for OpenClaw agents. Powered by Deepgram Nova-3. Transcribes audio messages, saves both audio files and text transcripts. Perfect for voice-first AI workflows, founder journaling, and meeting notes.
video-transcriber
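(Verified) A powerful batch transcriber for Douyin videos that integrates downloading, audio extraction, and transcription with a local Whisper model.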
aj-openai-whisper
Local speech-to-text with the Whisper CLI (no API key).
u2-audio-file-transcriber
Transcribe audio files via the UniCloud ASR API (UniSound speech recognition; recorded audio → text) from UniSound. Supports multiple formats and is optimized for finance, customer service, and other domains.
video-transcribe-v1-0-3
Local video-to-text: speech recognition with OpenAI Whisper; completely free, runs offline, and keeps your data private.
whisper-asr
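Local Whisper speech-recognition setup. Automatically converts voice messages from channels such as Feishu/Telegram into text. Suited to scenarios that need offline, low-latency speech-to-text.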