whisper-asr

Local Whisper speech recognition setup. Automatically transcribes voice messages from channels such as Feishu and Telegram into text. Suited to scenarios that need offline, low-latency speech-to-text.

3,891 stars

by openclaw

Best use case

whisper-asr is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using whisper-asr can expect more consistent output, faster repeated runs, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/openclaw-whisper-asr/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/279458179/openclaw-whisper-asr/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/openclaw-whisper-asr/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How whisper-asr Compares

Feature / Agent	whisper-asr	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

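It sets up local Whisper speech recognition and automatically transcribes voice messages from channels such as Feishu and Telegram into text. It is suited to scenarios that need offline, low-latency speech-to-text.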

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Local Whisper Speech Recognition Setup (whisper-asr)

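## Overview

Set up local speech recognition on a server with whisper.cpp in order to:
- Transcribe voice messages sent by users
- Run offline, with no API required
- Support Chinese and many other languages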

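## Prerequisites

- Linux server (tested on Ubuntu/Debian)
- ffmpeg installed
- git, cmake, and a C/C++ compiler (used by the clone and build steps)
- ~150 MB of disk space (base model)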
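
The prerequisites can be checked before starting. The following is a minimal sketch, not part of the original skill: `missing_tools` and `free_mb` are hypothetical helper names, and the tool list (ffmpeg, git, cmake) is inferred from the commands used in the steps of this guide.

```bash
# Preflight sketch: report which required commands are missing from PATH.

# Print the name of every argument that is not an available command.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Free space, in MB, on the filesystem that holds the given directory.
# df -Pk prints one POSIX-format line per filesystem; field 4 is "Available" in KB.
free_mb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Example (uncomment to run):
# missing_tools ffmpeg git cmake
# [ "$(free_mb "$HOME")" -ge 150 ] || echo "less than 150 MB free" >&2
```

An empty result from `missing_tools` means every listed command was found.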

---

## Installation steps

### 1. Install ffmpeg

```bash
sudo apt-get update
sudo apt-get install -y ffmpeg
```

### 2. Clone whisper.cpp

```bash
cd /home/brew/.openclaw/workspace
git clone https://github.com/ggml-org/whisper.cpp.git
```
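
Running `git clone` a second time fails because the target directory already exists. If the setup may be re-run, a guard like the one below avoids that. This is only a sketch: `clone_if_missing` is a hypothetical helper, not something whisper.cpp or the skill provides.

```bash
# Clone a repository only when the target directory is not already a checkout.
# Usage: clone_if_missing <repo-url> <target-dir>
clone_if_missing() {
  repo_url=$1
  target_dir=$2
  if [ -d "$target_dir/.git" ]; then
    echo "already cloned: $target_dir"
  else
    git clone "$repo_url" "$target_dir"
  fi
}

# Example (uncomment to run):
# clone_if_missing https://github.com/ggml-org/whisper.cpp.git \
#   /home/brew/.openclaw/workspace/whisper.cpp
```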

### 3. Download the model (multilingual `base`, covers Chinese)

```bash
cd /home/brew/.openclaw/workspace/whisper.cpp
sh ./models/download-ggml-model.sh base
```

**Choosing a model:**

| Model | Size on disk | Memory | Recommended for |
|------|------|------|---------|
| tiny | 75 MB | ~273 MB | Quick tests |
| **base** | 142 MB | ~388 MB | Balanced default |
| small | 466 MB | ~852 MB | Higher accuracy |
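
The table can be turned into a simple selection rule: take the largest model whose memory figure fits the budget. The sketch below assumes the approximate memory values from the table as thresholds; `pick_model` is a hypothetical helper and the thresholds leave no headroom for anything else running on the server.

```bash
# Pick the largest listed model whose approximate memory need (in MB)
# fits into the given budget. Thresholds are the table's figures.
pick_model() {
  budget_mb=$1
  if [ "$budget_mb" -ge 852 ]; then
    echo small
  elif [ "$budget_mb" -ge 388 ]; then
    echo base
  elif [ "$budget_mb" -ge 273 ]; then
    echo tiny
  else
    echo "not enough memory for any listed model" >&2
    return 1
  fi
}

# Example (uncomment to run): download the model that fits a 512 MB budget.
# sh ./models/download-ggml-model.sh "$(pick_model 512)"
```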

### 4. Build

```bash
cd /home/brew/.openclaw/workspace/whisper.cpp
cmake -B build
cmake --build build -j --config Release
```

---

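## Usage

### 1. Convert the audio format

Feishu voice messages usually arrive as ogg files and must be converted to the format whisper.cpp expects (16 kHz, mono, 16-bit PCM WAV):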

```bash
ffmpeg -i input.ogg -ar 16000 -ac 1 -c:a pcm_s16le output.wav
```
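
When several voice messages need converting, the same ffmpeg options can be wrapped in a small function. This is a sketch under assumptions: `wav_path_for` and `convert_to_wav` are hypothetical names, and the input path is assumed to end in a file extension such as `.ogg`.

```bash
# Derive the .wav path for an input file: voice/msg.ogg -> voice/msg.wav
# (assumes the file name has an extension to strip).
wav_path_for() {
  printf '%s.wav\n' "${1%.*}"
}

# Convert one file to 16 kHz, mono, 16-bit PCM WAV next to the original.
# -y overwrites an existing output file; -loglevel error hides the banner.
convert_to_wav() {
  ffmpeg -y -loglevel error -i "$1" -ar 16000 -ac 1 -c:a pcm_s16le "$(wav_path_for "$1")"
}

# Example (uncomment to run): convert every .ogg file in the current directory.
# for f in ./*.ogg; do
#   [ -e "$f" ] && convert_to_wav "$f"
# done
```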

### 2. Transcribe speech to text

```bash
./build/bin/whisper-cli \
  -m models/ggml-base.bin \
  -f output.wav \
  --language zh \
  --no-timestamps
```

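**Common options:**
- `-m`: path to the model
- `-f`: input audio file
- `--language zh`: transcribe as Chinese
- `--no-timestamps`: omit timestamps from the output
- `-t 4`: number of threads (whisper-cli defaults to 4, or fewer on machines with fewer cores)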

### 3. Complete example (single command)

```bash
ffmpeg -i input.ogg -ar 16000 -ac 1 -c:a pcm_s16le /tmp/audio.wav && \
./build/bin/whisper-cli -m models/ggml-base.bin -f /tmp/audio.wav --language zh --no-timestamps
```
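
The single command above writes to a fixed `/tmp/audio.wav`, so two messages processed at the same time would overwrite each other's audio. One possible way around this is a wrapper that uses a private temporary directory and removes it afterwards. The sketch below is an assumption-laden illustration: the `transcribe` function and the `WHISPER_HOME`, `WHISPER_CLI`, `WHISPER_MODEL`, and `FFMPEG_BIN` variables are hypothetical names, with defaults taken from the paths in this guide.

```bash
# Defaults follow the paths used in this guide; override via the environment.
: "${WHISPER_HOME:=/home/brew/.openclaw/workspace/whisper.cpp}"
: "${WHISPER_CLI:=$WHISPER_HOME/build/bin/whisper-cli}"
: "${WHISPER_MODEL:=$WHISPER_HOME/models/ggml-base.bin}"
: "${FFMPEG_BIN:=ffmpeg}"

# Usage: transcribe <audio-file> [language]
# The body runs in a subshell ( ... ) so the EXIT trap always removes the
# temporary directory, whether the commands succeed or fail.
transcribe() (
  input=$1
  lang=${2:-zh}
  tmp_dir=$(mktemp -d) || exit 1
  trap 'rm -rf "$tmp_dir"' EXIT
  "$FFMPEG_BIN" -loglevel error -i "$input" -ar 16000 -ac 1 -c:a pcm_s16le "$tmp_dir/audio.wav" || exit 1
  "$WHISPER_CLI" -m "$WHISPER_MODEL" -f "$tmp_dir/audio.wav" --language "$lang" --no-timestamps
)

# Example (uncomment to run):
# transcribe input.ogg
```

Because the paths are read when the function is called, a different model or binary can be selected per invocation, for example `WHISPER_MODEL=models/ggml-small.bin transcribe input.ogg`.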

---

## Path quick reference

| Item | Path |
|------|------|
| whisper.cpp directory | `/home/brew/.openclaw/workspace/whisper.cpp` |
| Executable | `/home/brew/.openclaw/workspace/whisper.cpp/build/bin/whisper-cli` |
| Model directory | `/home/brew/.openclaw/workspace/whisper.cpp/models/` |
| base model | `/home/brew/.openclaw/workspace/whisper.cpp/models/ggml-base.bin` |
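
After the build and download steps, it can help to confirm that the listed files are actually in place. The following sketch checks the executable and the base model under a given root directory; `check_whisper_paths` is a hypothetical helper, and the default root is simply the directory used in this guide.

```bash
# Verify that the whisper-cli executable and the base model exist.
# Usage: check_whisper_paths [whisper.cpp-directory]
check_whisper_paths() {
  root=${1:-/home/brew/.openclaw/workspace/whisper.cpp}
  rc=0
  [ -x "$root/build/bin/whisper-cli" ] || {
    echo "missing executable: $root/build/bin/whisper-cli" >&2
    rc=1
  }
  [ -f "$root/models/ggml-base.bin" ] || {
    echo "missing model: $root/models/ggml-base.bin" >&2
    rc=1
  }
  return "$rc"
}

# Example (uncomment to run):
# check_whisper_paths && echo "whisper.cpp is ready"
```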

---

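## FAQ

### Q: The transcription is inaccurate?
A: Try a larger model (small/medium), or record in a quiet environment.

### Q: Transcription is slow?
A: Increase the thread count: `./build/bin/whisper-cli -t 8 ...`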
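
Asking for more threads than the machine has cores rarely helps, so the value passed to `-t` can be derived from the CPU count. This is a sketch, assuming `getconf _NPROCESSORS_ONLN` is available (it falls back to 1 otherwise); `thread_count` is a hypothetical helper and the cap of 8 simply mirrors the answer above.

```bash
# Thread count for -t: the number of online CPUs, capped at a maximum.
# Usage: thread_count [max]   (max defaults to 8)
thread_count() {
  max=${1:-8}
  cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
  if [ "$cores" -gt "$max" ]; then
    echo "$max"
  else
    echo "$cores"
  fi
}

# Example (uncomment to run):
# ./build/bin/whisper-cli -t "$(thread_count)" -m models/ggml-base.bin -f output.wav --language zh
```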

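### Q: Are other languages supported?
A: Yes. Pass `--language auto` to auto-detect the language, or name one explicitly, e.g. `--language en`. If `--language` is omitted, whisper-cli assumes English.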

---

## Advanced: quantize the model (saves resources)

```bash
# Quantize (reduces the model size)
./build/bin/quantize models/ggml-base.bin models/ggml-base-q5.bin q5_0

# Use the quantized model
./build/bin/whisper-cli -m models/ggml-base-q5.bin -f audio.wav --language zh
```
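
To see how much disk space quantization saved, the two model files can be compared directly. The sketch below uses hypothetical helpers (`file_bytes`, `saved_percent`) and assumes the original file is not empty; it reports whatever the files on disk show and makes no claim about the expected ratio.

```bash
# Size of a file in bytes (wc -c works on both Linux and macOS).
file_bytes() {
  wc -c < "$1" | tr -d ' '
}

# Integer percentage by which the second file is smaller than the first.
saved_percent() {
  before=$(file_bytes "$1")
  after=$(file_bytes "$2")
  echo $(( (before - after) * 100 / before ))
}

# Example (uncomment to run):
# echo "saved $(saved_percent models/ggml-base.bin models/ggml-base-q5.bin)%"
```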

---

_This skill is based on the [official whisper.cpp documentation](https://github.com/ggml-org/whisper.cpp)_
