Best use case
whisper-asr is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
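Local Whisper speech recognition setup. Automatically transcribes voice messages from channels such as Feishu and Telegram into text. Suited to scenarios that need offline, low-latency speech-to-text.
Teams using whisper-asr should expect more consistent output, faster repeated execution, and less prompt rewriting.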
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/openclaw-whisper-asr/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
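The manual steps above can be sketched as a small shell helper. This is only an illustration: it assumes SKILL.md has already been downloaded, and `SKILL_SRC` and `install_skill` are hypothetical names, not part of the skill. Run it from the project root.

```shell
# Assumption: SKILL.md was downloaded beforehand; SKILL_SRC (hypothetical)
# points at that copy. The target path is the one given in the steps above.
SKILL_SRC="${SKILL_SRC:-$HOME/Downloads/SKILL.md}"
SKILL_DIR=".claude/skills/openclaw-whisper-asr"

install_skill() {
  # Refuse to continue if the downloaded file is missing.
  if [ ! -f "$SKILL_SRC" ]; then
    echo "SKILL.md not found at $SKILL_SRC" >&2
    return 1
  fi
  # Create the skill directory and copy the file into place.
  mkdir -p "$SKILL_DIR" && cp "$SKILL_SRC" "$SKILL_DIR/SKILL.md"
}
```

After calling `install_skill`, restart the agent so it can pick up the new file.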
How whisper-asr Compares
| Feature / Agent | whisper-asr | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
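It sets up local Whisper speech recognition and automatically transcribes voice messages from channels such as Feishu and Telegram into text. It is suited to scenarios that need offline, low-latency speech-to-text.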
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
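# Local Whisper Speech Recognition Setup (whisper-asr)

## Overview

Set up local speech recognition on a server with whisper.cpp, in order to:

- Transcribe voice messages sent by users
- Run offline, with no API required
- Support Chinese and many other languages

## Prerequisites

- Linux server (tested on Ubuntu/Debian)
- ffmpeg installed
- ~150 MB of disk space (base model)

---

## Installation

### 1. Install ffmpeg

```bash
sudo apt-get update
sudo apt-get install -y ffmpeg
```

### 2. Clone whisper.cpp

```bash
cd /home/brew/.openclaw/workspace
git clone https://github.com/ggml-org/whisper.cpp.git
```

### 3. Download a model with Chinese support

```bash
cd whisper.cpp
sh ./models/download-ggml-model.sh base
```

**Model selection guide:**

| Model | Size | Memory | Recommended for |
|------|------|------|---------|
| tiny | 75 MB | ~273 MB | Quick tests |
| **base** | 142 MB | ~388 MB | Balanced (recommended) |
| small | 466 MB | ~852 MB | Higher accuracy |

### 4. Build

```bash
# Absolute path, so this works whether or not you are still in the directory from step 3
cd /home/brew/.openclaw/workspace/whisper.cpp
cmake -B build
cmake --build build -j --config Release
```

---

## Usage

Run the commands below from the whisper.cpp directory.

### 1. Convert the audio format

Feishu voice messages are usually in ogg format and must be converted to the format whisper expects (16 kHz, mono, 16-bit PCM WAV):

```bash
ffmpeg -i input.ogg -ar 16000 -ac 1 -c:a pcm_s16le output.wav
```

### 2. Speech to text

```bash
./build/bin/whisper-cli \
  -m models/ggml-base.bin \
  -f output.wav \
  --language zh \
  --no-timestamps
```

**Common options:**

- `-m`: model path
- `-f`: input audio file
- `--language zh`: transcribe as Chinese
- `--no-timestamps`: do not print timestamps
- `-t 4`: thread count (defaults to 4, or fewer on machines with fewer cores)

### 3. Complete example (single command)

```bash
ffmpeg -i input.ogg -ar 16000 -ac 1 -c:a pcm_s16le /tmp/audio.wav && \
./build/bin/whisper-cli -m models/ggml-base.bin -f /tmp/audio.wav --language zh --no-timestamps
```

---

## Path Quick Reference

| Item | Path |
|------|------|
| whisper.cpp directory | `/home/brew/.openclaw/workspace/whisper.cpp` |
| Executable | `/home/brew/.openclaw/workspace/whisper.cpp/build/bin/whisper-cli` |
| Model directory | `/home/brew/.openclaw/workspace/whisper.cpp/models/` |
| base model | `/home/brew/.openclaw/workspace/whisper.cpp/models/ggml-base.bin` |

---

## FAQ

### Q: The transcription is inaccurate?

A: Try a larger model (small/medium), or record in a quiet environment.

### Q: Transcription is slow?

A: Increase the thread count: `./build/bin/whisper-cli -t 8 ...`

### Q: Are other languages supported?

A: Yes. Pass `--language auto` to auto-detect the spoken language, or name one explicitly, for example `--language en`. If `--language` is omitted, whisper-cli assumes English rather than auto-detecting.

---

## Advanced: Quantized Models (Saving Resources)

```bash
# Quantize (reduces model size)
./build/bin/quantize models/ggml-base.bin models/ggml-base-q5.bin q5_0

# Use the quantized model
./build/bin/whisper-cli -m models/ggml-base-q5.bin -f audio.wav --language zh
```

---

_This skill is based on the [official whisper.cpp documentation](https://github.com/ggml-org/whisper.cpp)._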
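The convert-then-transcribe steps from the SKILL.md above could be wrapped in one reusable shell function. The following is a minimal sketch, not part of the skill itself: `transcribe_voice` and the `WHISPER_DIR`, `WHISPER_MODEL`, and `WHISPER_LANG` variables are hypothetical names, with defaults taken from the paths and flags listed in the source. It writes the intermediate WAV to a private temporary directory instead of a fixed `/tmp/audio.wav`, so concurrent voice messages do not overwrite each other.

```shell
#!/usr/bin/env bash
# Hypothetical overrides; defaults follow the paths listed in the SKILL.md.
WHISPER_DIR="${WHISPER_DIR:-/home/brew/.openclaw/workspace/whisper.cpp}"
WHISPER_MODEL="${WHISPER_MODEL:-$WHISPER_DIR/models/ggml-base.bin}"
WHISPER_LANG="${WHISPER_LANG:-zh}"

# transcribe_voice FILE: convert FILE to 16 kHz mono WAV, then print the transcript.
transcribe_voice() {
  local input="$1"
  if [ ! -f "$input" ]; then
    echo "transcribe_voice: no such file: $input" >&2
    return 1
  fi

  local tmpdir wav status
  tmpdir="$(mktemp -d)" || return 1
  wav="$tmpdir/audio.wav"
  status=0

  # Same conversion as in the SKILL.md; -nostdin and -loglevel error keep ffmpeg quiet.
  if ffmpeg -nostdin -loglevel error -y -i "$input" \
       -ar 16000 -ac 1 -c:a pcm_s16le "$wav"; then
    "$WHISPER_DIR/build/bin/whisper-cli" \
      -m "$WHISPER_MODEL" -f "$wav" \
      --language "$WHISPER_LANG" --no-timestamps || status=$?
  else
    status=$?
  fi

  # Always remove the temporary WAV, whether or not transcription succeeded.
  rm -rf "$tmpdir"
  return "$status"
}
```

A caller would source the file and run `transcribe_voice input.ogg`, optionally with `WHISPER_LANG=en` or another model path set beforehand.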
Related Skills
local-whisper
Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High quality transcription with multiple model sizes.
openai-whisper
Local speech-to-text with the Whisper CLI (no API key).
whisper-gpu-transcribe
Convert audio to SRT subtitles using OpenAI Whisper with automatic GPU acceleration for Intel XPU / NVIDIA CUDA / AMD ROCm / Apple Metal. Ideal for content creators as a free alternative to paid subtitle generation.
whisper-context
Official Whisper Context skill for OpenClaw. Cuts context tokens via delta compression + caching, and adds long-term memory across sessions.
usewhisper-autohook
Auto-hook tools for OpenClaw: query Whisper Context before every generation, ingest after every turn. Built for Telegram agents (stable user_id/session_id).
aj-openai-whisper
Local speech-to-text with the Whisper CLI (no API key).
humanizer
Remove signs of AI-generated writing from text. Use when editing or reviewing text to make it sound more natural and human-written. Based on Wikipedia's comprehensive "Signs of AI writing" guide. Detects and fixes patterns including: inflated symbolism, promotional language, superficial -ing analyses, vague attributions, em dash overuse, rule of three, AI vocabulary words, negative parallelisms, and excessive conjunctive phrases.
find-skills
Helps users discover and install agent skills when they ask questions like "how do I do X", "find a skill for X", "is there a skill that can...", or express interest in extending capabilities. This skill should be used when the user is looking for functionality that might exist as an installable skill.
tavily-search
Use Tavily API for real-time web search and content extraction. Use when: user needs real-time web search results, research, or current information from the web. Requires Tavily API key.
baidu-search
Search the web using Baidu AI Search Engine (BDSE). Use for live information, documentation, or research topics.
agent-autonomy-kit
Stop waiting for prompts. Keep working.