video-understand Skill

## Overview

506 stars

Best use case

The video-understand skill is best used when you need a repeatable AI agent workflow rather than a one-off prompt.

Teams using the video-understand skill should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/video-understand/SKILL.md --create-dirs "https://raw.githubusercontent.com/xiaotianfotos/skills/main/video-understand/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/video-understand/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How the video-understand Skill Compares

| Feature / Agent | video-understand Skill | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

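It passes videos or images to a multimodal omni model together with a natural-language request and returns the output you ask for: a generation prompt that recreates the video, a scene description, an audio transcript, content analysis, and so on.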

Where can I find the source code?

The source code is on GitHub in the xiaotianfotos/skills repository, under the video-understand directory.

SKILL.md Source

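# video-understand Skill

## Overview

Understands video and image content through a multimodal omni model, with natural language driving whatever output you need: a generation prompt that recreates the video, scene descriptions, audio transcription, content analysis, and more.

The current implementation is based on `xiaomi/mimo-v2-omni` on OpenRouter, but the tool can be adapted to any multimodal model that accepts video or image input (not only omni-type models) by replacing the model ID and the corresponding API integration.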
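
As a rough illustration of what replacing the model ID could look like, the sketch below builds an OpenAI-style chat completion body with images attached as base64 data URLs. It is not taken from the skill's script: `buildRequestBody` and its parameters are hypothetical names, and sending media as `image_url` parts (for example, frames extracted from a video) is an assumption about how the request is shaped.

```javascript
// Hypothetical helper, not part of the skill's script: builds the JSON body
// for an OpenAI-compatible chat completion request with image inputs.
function buildRequestBody({ model, prompt, images }) {
  const content = [{ type: "text", text: prompt }];
  for (const { mimeType, data } of images) {
    // `data` is a Buffer of raw image bytes; encode it as a base64 data URL.
    content.push({
      type: "image_url",
      image_url: { url: `data:${mimeType};base64,${data.toString("base64")}` },
    });
  }
  return { model, messages: [{ role: "user", content }] };
}

// Switching models only changes the `model` field, assuming the other model
// accepts the same message format.
const body = buildRequestBody({
  model: "xiaomi/mimo-v2-omni",
  prompt: "Describe this image",
  images: [{ mimeType: "image/jpeg", data: Buffer.from("placeholder-bytes") }],
});
console.log(body.messages[0].content.map((part) => part.type));
```

A body like this would be sent as a POST to `https://openrouter.ai/api/v1/chat/completions` with an `Authorization: Bearer` header carrying the API key. A provider with a different request format would need its own builder, which is the "API integration" part of the swap.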

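## Core Features

Hand a video or image to the model along with a description of what you need, and it generates the requested content directly:

- A generation prompt that recreates the video (for AI video generation tools)
- Video content understanding and description
- Audio transcription
- Comparative analysis across multiple media files
- Other custom requests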

## Usage

> Prerequisite: `export OPENROUTER_API_KEY=your_api_key`

```bash
# Single video
node scripts/openrouter-mimo-omni.js -p "your request" -v video.mp4

# Multiple images
node scripts/openrouter-mimo-omni.js -p "your request" -i img1.jpg -i img2.jpg

# Image and video together
node scripts/openrouter-mimo-omni.js -p "your request" -i photo.jpg -v video.mp4

# Set the number of frames to extract
node scripts/openrouter-mimo-omni.js -p "your request" -v video.mp4 --frames 15
```
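
The commands above repeat `-i` to pass several images. One way a Node script could accept that flag layout is with `util.parseArgs` from the standard library, sketched below. This is an assumption about the approach, not the actual contents of `openrouter-mimo-omni.js`: the short flags and `--frames` mirror the usage examples, while `parseCliArgs` and the long names `--prompt`, `--image`, and `--video` are hypothetical.

```javascript
const { parseArgs } = require("node:util");

// Hypothetical option table: short flags and --frames mirror the usage
// examples; the long names --prompt/--image/--video are assumptions.
function parseCliArgs(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      prompt: { type: "string", short: "p" },
      image: { type: "string", short: "i", multiple: true },
      video: { type: "string", short: "v", multiple: true },
      frames: { type: "string" },
    },
  });
  return {
    prompt: values.prompt,
    images: values.image ?? [],
    videos: values.video ?? [],
    // undefined means no --frames flag was given
    frames: values.frames === undefined ? undefined : Number(values.frames),
  };
}

console.log(parseCliArgs(["-p", "describe", "-i", "a.jpg", "-i", "b.jpg", "--frames", "15"]));
```

Marking `image` and `video` as `multiple: true` collects repeated flags into arrays, which is what makes the mixed image and video invocation possible.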

## Notes

- `ffmpeg` must be installed (used to automatically compress large videos)
- `OPENROUTER_API_KEY` must be set
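
Both prerequisites can be checked before running the script. The sketch below is a minimal preflight check under stated assumptions: `checkPrerequisites` and `ffmpegAvailable` are hypothetical helpers, not part of the skill, and probing with `ffmpeg -version` is just one reasonable way to detect the binary.

```javascript
const { spawnSync } = require("node:child_process");

// Default probe: true if `ffmpeg -version` can be spawned and exits with 0.
function ffmpegAvailable() {
  const result = spawnSync("ffmpeg", ["-version"], { stdio: "ignore" });
  return !result.error && result.status === 0;
}

// Hypothetical preflight helper: returns a list of problems, empty when both
// prerequisites are met. `env` and `hasFfmpeg` are injectable for testing.
function checkPrerequisites(env = process.env, hasFfmpeg = ffmpegAvailable) {
  const problems = [];
  if (!env.OPENROUTER_API_KEY) problems.push("OPENROUTER_API_KEY is not set");
  if (!hasFfmpeg()) problems.push("ffmpeg was not found on PATH");
  return problems;
}

for (const problem of checkPrerequisites()) {
  console.error(`warning: ${problem}`);
}
```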
