audio-summary Skill

音频/视频转文本总结助手。

3,891 stars

Best use case

audio-summary Skill is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

音频/视频转文本总结助手。

Teams using audio-summary Skill should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/audio-summary/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/alanoo7/audio-summary/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/audio-summary/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How audio-summary Skill Compares

Feature / Agentaudio-summary SkillStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

音频/视频转文本总结助手。

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# audio-summary Skill

音频/视频转文本总结助手。

## 功能

1.  **自动音频提取**:使用 `ffmpeg` 从 MP4 等视频文件中提取 16k mono 压缩音频,以适配大模型体积限制。
2.  **转录转总结**:基于百炼 `qwen3-asr-flash` 模型,自动将音频转换为文字并生成内容分段总结。
3.  **大文件支持**:通过 48k 压缩,支持最长约 5-8 分钟的视频单次直接转录。

## 依赖

-   `ffmpeg` (已安装在系统路径)
-   `openai` Python SDK (已安装)
-   百炼 API KEY (已在脚本中配置为 `sk-76735...`)

## 使用方法

### 从命令行运行

```powershell
# 对指定视频进行提取和总结
python .openclaw/workspace/skills/audio-summary/audio_summary_skill.py "C:\Path\To\Your\Video.mp4"
```

### 文件位置
- 提取出的总结文本将自动保存在视频同级目录下,并命名为 `视频名_summary.txt`。

## 注意事项
- 目前单次 Base64 转录限制为 6MB,对于超过 10 分钟的长视频,建议先手动切分或进一步降低码率。
- API 费用按 `qwen3-asr-flash` 模型计费。

Related Skills

email-daily-summary

3891
from openclaw/skills

Automatically logs into email accounts (Gmail, Outlook, QQ Mail, etc.) and generates daily email summaries. Use when the user wants to get a summary of their emails, check important messages, or create daily email digests.

Workflow & Productivity

email-summary

3891
from openclaw/skills

Fetches recent emails from Gmail and provides concise summaries. Use when the user wants to check emails, get email summaries, or review their inbox.

youtube-audio-download

3891
from openclaw/skills

Download YouTube video audio and convert to MP3. Supports age-restricted videos with cookies.

audio-play

3891
from openclaw/skills

Play audio files using Windows media player. Non-blocking execution.

audio-rename

3891
from openclaw/skills

Rename audio files with Chinese/special characters to simple English names for mlx-stt compatibility.

audiobooklm

3891
from openclaw/skills

提供有声书创作与音频能力(ABS 读写、音效/音频检索、二创、音色推荐、章节角色分析等),通过 HTTP Streamable MCP 调用。

Daily Summary Skill - 每日总结技能

3891
from openclaw/skills

**Version:** 1.0.0

aibrary-podcast-summary

3891
from openclaw/skills

[Aibrary] Generate a book summary podcast script in a single-narrator storytelling style. Use when the user wants to turn a book into a podcast, create an audio summary of a book, or generate a summary-style podcast script. The output is a narrated monologue that distills a book's key ideas into an engaging 10-15 minute listening experience.

solax-summary-fetch

3891
from openclaw/skills

Fetch inverter summary data from the Solax Cloud API using the npm package solax-cloud-api. Use when the user provides (or has configured) a Solax tokenId and inverter serial number (sn) and wants current/summary energy data returned as JSON (typed as SolaxSummary) for dashboards/automation.

qwen-audio-lab

3891
from openclaw/skills

Hybrid text-to-speech, reusable voice cloning, and narrated audio generation for macOS plus Aliyun Qwen. Use when the user wants to convert text into speech, clone and reuse a voice from a reference recording, generate narration files from plain text or text files, or create PPT speaker-note voiceovers.

deapi-audio

3891
from openclaw/skills

Text-to-speech, voice cloning, voice design, and transcribe audio files via deAPI GPU network. Trigger on 'text to speech', 'TTS', 'generate voice', 'read aloud', 'voice clone', 'clone voice', 'voice design', 'design voice', 'custom voice', 'transcribe audio', 'STT'. For video/YouTube transcription use deapi-video instead.

Audio Transcription Skill

3891
from openclaw/skills

Auto-transcribe voice messages using faster-whisper (local, no API key needed).