video-summarizer

将 B 站/YouTube/小红书/抖音视频转换为结构化 Notion 总结文档，自动上传截图，一键推送 Notion

3,891 stars

byopenclaw

View on GitHub Installation ↓

Best use case

video-summarizer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

将 B 站/YouTube/小红书/抖音视频转换为结构化 Notion 总结文档，自动上传截图，一键推送 Notion

Teams using video-summarizer should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/video-summarizer/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/ajayhao/video-summarizer/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/video-summarizer/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How video-summarizer Compares

Feature / Agent	video-summarizer	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

将 B 站/YouTube/小红书/抖音视频转换为结构化 Notion 总结文档，自动上传截图，一键推送 Notion

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agent for YouTube Script Writing

Find AI agent skills for YouTube script writing, video research, content outlining, and repeatable channel production workflows.

AI Agents for Marketing

Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.

AI Agents for Startups

Explore AI agent skills for startup validation, product research, growth experiments, documentation, and fast execution with small teams.

SKILL.md Source

# Video Summarizer — OpenClaw Skill

将 B 站/YouTube/小红书/抖音视频转换为结构化 Notion 总结文档，自动上传截图，一键推送 Notion。

**版本**: 1.0.7  
**发布**: 2026-04-06  
**许可**: MIT  
**作者**: Ajay Hao

---

## 📖 技能描述

### 核心能力

- 🎬 **多平台支持**: B 站、YouTube、小红书、抖音
- 📝 **智能分析**: AI 提取关键概念、核心要点、注意事项
- 📸 **截图嵌入**: 基于 AI 分析结果自动生成关键帧截图
- ☁️ **图床集成**: 阿里云 OSS 自动上传，永久链接
- 🚀 **一键推送**: 自动推送 Notion 数据库

### 技术特性

- **双模式转录**: Plan A（官方字幕）优先，Plan B（语音转录）兜底
- **并行优化**: 字幕下载与视频下载并行执行，节省 32% 时间
- **GPU 自适应**: 自动检测显存，选择最优 Whisper 模型
- **断点续跑**: 支持从中断点恢复，避免重复处理
- **四层标签**: 标题 hashtag → 元数据 → AI 关键词 → 默认值

---

## 🎯 平台支持详情

### Bilibili（完整支持）

**字幕**: ✅ 官方字幕 + 自动字幕  
**语音转录**: ✅ 支持（Plan B）  
**Cookies**: 推荐（获取官方字幕）  
**下载工具**: yt-dlp (>=2026.03.17)

#### 操作步骤

```bash
# 1. 扫码登录（首次使用，获取官方字幕）
cd ~/.openclaw/skills/video-summarizer/scripts
./bili-login.sh

# 2. 处理视频
./video-summarize.sh "https://www.bilibili.com/video/BV1xxxx"

# 3. 查看结果
cat /tmp/video-summarizer-*/summary.md
```

**说明**:
- Cookies 文件：`~/.cookies/bilibili_cookies.txt`
- 无 Cookies 时使用 Plan B 语音转录
- 支持 b23.tv 短链

---

### YouTube（完整支持）

**字幕**: ✅ 自动字幕（多语言）  
**语音转录**: ✅ 支持（Plan B）  
**Cookies**: ❌ 不需要  
**下载工具**: yt-dlp (>=2026.03.17)

#### 操作步骤

```bash
# 直接处理（无需登录）
./video-summarize.sh "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# 指定输出目录
./video-summarize.sh "https://youtu.be/dQw4w9WgXcQ" /tmp/output
```

**说明**:
- 需网络可达（可能需要代理）
- 优先下载英文字幕，无则用语音转录
- 支持 youtu.be 短链

---

### 小红书（基本支持）

**字幕**: ❌ 无字幕  
**语音转录**: ✅ 唯一方式（Plan B）  
**Cookies**: ❌ 不需要  
**下载工具**: yt-dlp (>=2026.03.17)

#### 操作步骤

```bash
# 直接处理（自动使用 Plan B 语音转录）
./video-summarize.sh "https://www.xiaohongshu.com/explore/xxxx"

# 或短链
./video-summarize.sh "https://xhslink.com/o/xxxx"
```

**说明**:
- 必须使用 Plan B 语音转录
- 推荐配置 GROQ_API_KEY 加速转录
- 自动上传封面图到 OSS

---

### 抖音（完整支持）

**字幕**: ❌ 无字幕  
**语音转录**: ✅ 唯一方式（Plan B）  
**Cookies**: ❌ 不需要（专用下载器）  
**下载工具**: douyin_downloader.py

#### 操作步骤

```bash
# 直接处理（专用下载器，无需 Cookies）
./video-summarize.sh "https://www.douyin.com/video/7234567890"

# 支持短链
./video-summarize.sh "https://v.douyin.com/abc123/"
```

**说明**:
- 使用专用下载器 `douyin_downloader.py`
- 无反爬限制，无需 Cookies
- 必须使用 Plan B 语音转录

---

## ⚙️ 配置详解

### 环境变量（~/.openclaw/.env）

```bash
# ========== 必需配置 ==========

# 阿里云 OSS 图床
ALIYUN_OSS_AK=your_access_key_id
ALIYUN_OSS_SK=your_access_key_secret
ALIYUN_OSS_BUCKET_ID=your_bucket_name
ALIYUN_OSS_ENDPOINT=oss-cn-shanghai.aliyuncs.com

# AI 分析（DashScope API）
DASHSCOPE_API_KEY=sk-xxxxxxxxxxxxxxxx

# ========== 可选配置 ==========

# Notion 自动推送
NOTION_API_KEY=nop_xxxxxxxxxxxxxxxx
NOTION_VIDEO_SUMMARY_DATABASE_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

# 语音转录加速（无 GPU 时推荐）
GROQ_API_KEY=gsk_xxxxxxxxxxxxxxxx

# 硅基流动（备用转录）
SILICONFLOW_API_KEY=sk-xxxxxxxxxxxxxxxx

# 本地 Whisper 模型（有 GPU 时自动检测）
WHISPER_MODEL=base  # tiny/base/small/medium/large
USE_LOCAL_WHISPER=false  # true 强制使用本地 Whisper
```

### OSS Bucket 要求

- **访问权限**: 公开可读（直接 URL 访问）
- **CORS 配置**: 允许跨域访问（Notion 嵌入需要）
- **存储类型**: 标准存储（低频访问会影响加载速度）

### Notion 数据库配置

#### 数据库属性（Properties）

| 属性名 | 类型 | 说明 | 来源 |
|--------|------|------|------|
| **Title** | `title` | 视频标题（≤200 字符） | Markdown `# 标题` |
| **Source** | `rich_text` | 平台来源（Bilibili/YouTube/小红书/抖音） | metadata.platform / URL 推断 |
| **Author** | `rich_text` | UP 主/作者名称 | Markdown `**UP 主:**` / metadata.uploader |
| **Url** | `url` | 视频原始链接 | metadata.webpage_url / Markdown `**链接:**` |
| **Tags** | `multi_select` | 标签（最多 5 个） | 三层策略提取 |
| **PubDate** | `date` | 发布日期 | metadata.upload_date |
| **Length** | `rich_text` | 视频时长（MM:SS 格式） | metadata.duration_string |
| **Cover** | `files` | 封面图片（可选，外部 URL） | metadata.thumbnail |
| **ts** | `date` | 创建时间戳（ISO 8601，东八区 +08:00） | 当前时间 |

#### 配置步骤

1. **创建数据库**（Table 视图）

2. **添加上述 9 个属性**（字段名必须完全匹配）

3. **获取 Database ID**：
   - 打开数据库 → 复制 URL 中 `?v=` 后的 ID
   - 示例：`https://notion.so/your-workspace/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx?v=yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy`
   - Database ID = `yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy`

4. **配置环境变量**：
   ```bash
   NOTION_API_KEY=nop_xxxxxxxxxxxxxxxx
   NOTION_VIDEO_SUMMARY_DATABASE_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
   ```

#### 数据库视图示例

```
┌─────────┬──────────┬─────────┬──────┬──────┬─────────┬────────┬───────┬─────────────┐
│ Title   │ Source   │ Author  │ Url  │ Tags │ PubDate │ Length │ Cover │ ts          │
│ title   │ text     │ text    │ url  │ multi│ date    │ text   │ files │ date        │
└─────────┴──────────┴─────────┴──────┴──────┴─────────┴────────┴───────┴─────────────┘
```

---

## 🏗️ 系统架构

### 处理流程

```
用户输入 (视频 URL)
       ↓
Step 1: 平台识别 + 元数据
       ↓
┌──────┴──────┐
↓             ↓
Step 2: 字幕   Step 3: 视频下载  ← 并行执行
       ↓             ↓
       └──────┬──────┘
              ↓
Step 4: 文本提取 (VTT → TXT / Plan B 转录)
       ↓
Step 5: AI 分析 (DashScope API - qwen3.5-plus)
       ↓
Step 6: 截图生成 (ffmpeg, 基于 AI 时间戳)
       ↓
Step 7: OSS 上传 (截图 + 封面)
       ↓
Step 8: Markdown 渲染
       ↓
Step 9: Notion 推送 (可选)
```

### 技术栈

| 层级 | 技术 |
|------|------|
| 编排层 | Bash (video-summarize.sh) |
| 分析层 | Python + DashScope API (qwen3.5-plus) |
| 转录层 | Faster-Whisper (本地) / Groq API / 硅基流动 |
| 工具层 | yt-dlp (>=2026.03.17), ffmpeg (>=6.1), oss2, requests |

### 数据文件

| 文件 | 来源 | 用途 |
|------|------|------|
| `metadata.json` | yt-dlp / douyin_downloader | 视频元数据 |
| `transcript.txt` | VTT 提取 / Plan B 转录 | 纯文本字幕 |
| `ai_result.json` | analyze-subtitles-ai.py | AI 分析结果 |
| `screenshot_urls.txt` | upload-to-oss.py | 截图 OSS 链接 |
| `summary.md` | analyze-subtitles-ai.py | 最终总结 |

---

## 📋 脚本清单

### 核心脚本（4 个）

| 脚本 | 功能 |
|------|------|
| `video-summarize.sh` | 主流程编排（Plan A/B 自动选择） |
| `analyze-subtitles-ai.py` | AI 分析 + Markdown 渲染 |
| `upload-to-oss.py` | OSS 图床上传（截图 + 封面） |
| `push-to-notion.py` | Notion 推送 |

### 平台工具（2 个）

| 脚本 | 功能 |
|------|------|
| `bili-login.sh` | B 站扫码登录（获取 Cookies） |
| `douyin_downloader.py` | 抖音专用下载器（无需 Cookies） |

### 辅助工具（4 个）

| 脚本 | 功能 |
|------|------|
| `download-audio.sh` | Plan B: 音频下载 |
| `transcribe-audio.py` | Plan B: 语音转录（GPU 自适应） |
| `check-config.sh` | 配置检查 |
| `convert-bili-cookie.py` | Cookies 格式转换 |

---

## 🎯 Plan A vs Plan B

### 对比表

| 项目 | Plan A | Plan B |
|------|--------|--------|
| **字幕来源** | 平台官方字幕 | 语音转录 |
| **准确率** | 90%+ | 80-90% |
| **速度** | 快 (1-2 分钟) | 较慢 (3-5 分钟) |
| **依赖** | Cookies（B 站） | GPU 或 API Key |

### 各平台使用情况

| 平台 | Plan A | Plan B | 默认 |
|------|--------|--------|------|
| **Bilibili** | ✅ 官方 + 自动 | ✅ 备用 | Plan A |
| **YouTube** | ✅ 自动字幕 | ✅ 备用 | Plan A |
| **小红书** | ❌ 无 | ✅ 唯一 | Plan B |
| **抖音** | ❌ 无 | ✅ 唯一 | Plan B |

### Plan B 四层降级方案

```
1. Faster-Whisper (本地) → GPU/CPU 自适应
   ├─ GPU ≥8GB  → large-v2 模型
   ├─ GPU ≥4GB  → medium 模型
   ├─ GPU ≥2GB  → small 模型
   └─ 无 GPU    → base 模型 (CPU)

2. Groq API (whisper-large-v3) → 云端高速

3. 硅基流动 (FunAudioLLM/SenseVoiceSmall) → 备选云端

4. Whisper.cpp / OpenAI Whisper → 保底方案
```

---

## 📁 输出文件结构

```
output/
├── summary.md              # 📝 最终总结（主要成果）
├── screenshot_urls.txt     # 🔗 截图 OSS 链接
├── metadata.json           # 📊 视频元数据
├── transcript.txt          # 📄 纯文本字幕
├── screenshots/            # 📸 截图原图（本地备份）
├── cover.jpg              # 🖼️ 封面图（本地备份）
└── *.log                   # 📋 日志文件（verbose 模式）
```

### OSS 路径规范

**截图路径**: `/screenshots/<平台>/<视频 ID>_<时间戳>/<截图文件>`

```
screenshots/bilibili/BV1eTPEzNEqf_20260405_053203/screenshot_01.jpg
screenshots/douyin/7234567890_20260405_175010/chapter_01.jpg
screenshots/xhs/69c1493b000000002003b3ce_20260405_152852/screenshot_01.jpg
screenshots/youtube/dQw4w9WgXcQ_20260405_120000/screenshot_01.jpg
```

**封面路径**: `thumbnails/<平台>/<视频 ID>/cover.jpg`

```
thumbnails/bilibili/BV1eTPEzNEqf/cover.jpg
thumbnails/douyin/7234567890/cover.jpg
```

---

## 🏷️ 标签策略

### 四层提取策略

```
1. 标题 hashtag → #([\w\u4e00-\u9fa5]+)
   示例："#AI 教程 #大模型原理" → ["AI 教程", "大模型原理"]

2. 元数据 tags → yt-dlp 提取的原始标签
   示例：["原理", "AI", "教程", "claude", "大模型"]

3. AI 关键词 → AI 分析提取的核心概念
   示例：["Transformer", "注意力机制", "深度学习"]

4. 默认值 → 视频总结/知识分享/学习
   当前三层都为空时使用
```

### 标签规则

- **长度**: 2-15 字符（兼容英文如 "openclaw"）
- **数量**: 最多 5 个
- **去重**: 自动去重，保留唯一值
- **优先级**: 1 → 2 → 3 → 4

---

## 🔧 故障排查

### Cookies 过期（B 站）

```bash
# 重新扫码登录
./bili-login.sh
```

### 配置检查

```bash
# 运行检查脚本
./check-config.sh
```

### 查看详细日志

```bash
# 使用 verbose 模式
./video-summarize.sh "URL" --verbose

# 查看错误日志
cat /tmp/output/error.log
```

### 常见问题

| 问题 | 原因 | 解决方案 |
|------|------|----------|
| 截图 404 | OSS 路径不匹配 | `python3 upload-to-oss.py auto /tmp/output` |
| 标签默认值 | 标签提取失败 | 检查标题 hashtag 格式 `#标签` |
| 转录失败 | 无 GPU/API 配额 | 检查 `GROQ_API_KEY` |
| Notion 推送失败 | API Key 过期 | 更新 `NOTION_API_KEY` |
| 并行任务失败 | 依赖缺失 | 检查 `ffmpeg` / `yt-dlp` 安装（版本要求：ffmpeg >= 6.1, yt-dlp >= 2026.03.17） |
| 抖音下载失败 | 链接格式错误 | 使用完整 URL 或 v.douyin.com 短链 |

---

## 📊 性能基准

### 处理时间（10 分钟视频）

| 平台 | Plan A | Plan B（本地） | Plan B（Groq） |
|------|--------|----------------|----------------|
| **Bilibili** | ~90 秒 | ~150 秒 | ~120 秒 |
| **YouTube** | ~90 秒 | ~150 秒 | ~120 秒 |
| **小红书** | - | ~150 秒 | ~120 秒 |
| **抖音** | - | ~150 秒 | ~120 秒 |

**优化效果**: 并行优化后节省约 30 秒（32%↓）

---

## 📝 输出格式（summary.md）

1. **标题 + Tags + Author**
2. **📝 Note** — AI 概述（150-250 字）
3. **📺 视频信息** — 链接/时长/播放数据
4. **📚 关键概念** — 术语表格（3-5 个，按时间排序）
5. **🎯 核心要点** — emoji+ 描述 + 时间戳（5-8 个）
6. **🎬 视频章节** — 标题 + 时间轴 + 截图
7. **⚠️ 注意事项** — 特别提醒（2-4 个）
8. **💡 总结** — AI 归纳（200-300 字）

---

## 🔜 后续优化

### 计划中

- [ ] 代码重构（提取公共函数）
- [ ] 单元测试（核心函数覆盖率 80%+）
- [ ] 性能优化（截图并行上传、结果缓存）
- [ ] 支持更多平台（TikTok、Instagram Reels）

---

## 📞 更多文档

- **快速入门**: [README.md](README.md) - 5 分钟上手
- **变更历史**: [CHANGELOG.md](CHANGELOG.md) - 版本演进
- **提示词配置**: [prompt.json](prompt.json) - AI 分析参数

---

**维护人**: Ajay Hao  
**项目地址**: https://github.com/AjayHao/video-summarizer  
**OpenClaw Skill**: 已发布到 clawdhub

Related Skills

📝 智能摘要助手 (Smart Summarizer)

3891

from openclaw/skills

Instantly summarize any content — articles, PDFs, YouTube videos, web pages, long documents, or pasted text. Extracts key points, action items, and insights. Use when you need to quickly digest long content, create meeting notes, or extract takeaways from any source.

Workflow & Productivity

demo-video

3891

from openclaw/skills

Create product demo videos by automating browser interactions and capturing frames. Use when the user wants to record a demo, walkthrough, product showcase, or interactive video of a web application. Supports Playwright CDP screencast for high-quality capture and FFmpeg for video encoding.

Video Production

video-script-creator

3891

from openclaw/skills

Short video script generator. 短视频脚本生成器、视频脚本、抖音文案、抖音脚本、快手脚本、口播稿、视频拍摄脚本、YouTube脚本、YouTube Shorts脚本、B站脚本、bilibili脚本、分镜脚本、视频大纲、视频文案、短视频创作、Reels脚本、TikTok脚本、vlog脚本、带货脚本、种草视频脚本、系列视频规划、视频数据复盘、完播率分析、前3秒钩子。Generate complete video scripts with hooks, outlines, titles, tags, CTA, storyboards, series planning, and data review. Use when: (1) creating short video scripts for any platform, (2) writing口播稿/talking-head scripts, (3) generating viral video titles, (4) planning video outlines and storyboards, (5) writing opening hooks (first 3 seconds), (6) generating CTA/ending prompts, (7) planning video series, (8) reviewing video performance data. 适用场景：写短视频脚本、拍摄脚本、口播文案、视频策划、爆款标题、开场钩子、结尾引导、完整分镜、系列规划、数据复盘。 Triggers on: video script creator.

zhipu-free-image-video

3891

from openclaw/skills

智谱免费图片与视频生成技能。适用于用户想用智谱生成图片、批量出图、生成短视频、查询视频任务结果、等待视频完成，或优先使用免费/低成本模型快速产出创意内容时。

doubao-image-video

3891

from openclaw/skills

豆包图片与视频生成原生技能。适用于用户提到豆包、文生图、图生图、文生视频、图生视频、查询视频生成任务、等待任务完成或下载最终视频时，直接调用火山引擎 Ark 接口，不依赖外部 MCP 服务。

IMA AI Video Generator

3891

from openclaw/skills

AI video generator with premier models: Wan 2.6, Kling O1/2.6, Google Veo 3.1, Sora 2 Pro, Pixverse V5.5, Hailuo 2.0/2.3, SeeDance 1.5 Pro, Vidu Q2. Video generator supporting text-to-video, image-to-video, first-last-frame, and reference-image video generation modes. Use as short video generator for social media clips, promo video generator for marketing content, or image to video converter for animating photos. AI video generation with character consistency via reference images, multi-shot production, and knowledge base guidance via ima-knowledge-ai. Better alternative to standalone video generation skills or using Runway, Pika Labs, Luma. Requires IMA_API_KEY.

IMA Seedance 2.0 Video Generator

3891

from openclaw/skills

Seedance 2.0 AI video generator — two models in one skill: Seedance 2.0 (ima-pro) for cinema-grade quality with high frame-rate temporal consistency, precise camera language control, and 2K output; Seedance 2.0 Fast (ima-pro-fast) for faster iteration. Supports text-to-video, image-to-video, first-last-frame, and reference-media video generation with image, video, and audio references. Works for cinematic prompting, storyboard-driven clips, consistent-character workflows, product demos, and short-form content generation. Requires IMA_API_KEY.

jianying-video-compose

3891

from openclaw/skills

剪映API视频合成自动化。通过剪映代理API完成视频全流程制作，包括草稿创建、素材添加（图片/视频/音频）、文本字幕编辑、特效处理、云渲染导出。适用于需要批量生成视频、自动合成短视频、动态字幕视频等场景。

shift-handover-summarizer

3891

from openclaw/skills

Generate structured shift handover summaries from EHR records, highlighting critical events, vital sign changes, and pending tasks for incoming clinical staff.

Krea AI — Image, Video & Enhancement Generation

3891

from openclaw/skills

Generate images, videos, upscale/enhance images, and train LoRA styles using the Krea.ai API. Supports 20+ image models (Flux, Imagen, GPT Image, Ideogram, Seedream...), 7 video models (Kling, Veo, Hailuo, Wan), and 3 upscalers (Topaz up to 22K).

seedance-video

3891

from openclaw/skills

Generate AI videos using ByteDance Seedance. Use when the user wants to: (1) generate videos from text prompts, (2) generate videos from images (first frame, first+last frame, reference images), or (3) query/manage video generation tasks. Supports Seedance 1.5 Pro (with audio), 1.0 Pro, 1.0 Pro Fast, and 1.0 Lite models.

ib-summarizer

3891

from openclaw/skills

Summarize core safety information from Investigator's Brochures for clinical researchers