doubao-image-video

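Native skill for Doubao (豆包) image and video generation. Use it when the user mentions Doubao, text-to-image, image-to-image, text-to-video, image-to-video, querying a video generation task, waiting for a task to complete, or downloading the final video. It calls the Volcengine Ark API directly and does not depend on an external MCP service.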

3,891 stars

Best use case

doubao-image-video is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using doubao-image-video should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/doubao-image-video/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/156554395/doubao-image-video/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/doubao-image-video/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How doubao-image-video Compares

Feature / Agent	doubao-image-video	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

It generates images and videos with Doubao by calling the Volcengine Ark API directly: text-to-image, image-to-image, text-to-video, and image-to-video, plus querying, waiting on, and downloading the results of async video tasks.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agent for YouTube Script Writing

Find AI agent skills for YouTube script writing, video research, content outlining, and repeatable channel production workflows.

AI Agents for Marketing

Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.

AI Agents for Startups

Explore AI agent skills for startup validation, product research, growth experiments, documentation, and fast execution with small teams.

SKILL.md Source

# Doubao Native Media Skill

This is a native OpenClaw skill. Do not spin up the upstream MCP server unless the user explicitly asks for MCP compatibility.

## Use this skill for

- Doubao / 豆包 text-to-image
- image-to-image or multi-reference image generation
- Doubao text-to-video or image-to-video
- querying an async Doubao video task by `task_id`
- troubleshooting Volcengine Ark endpoint/model issues

## Commands

### Generate an image

```bash
python3 {baseDir}/scripts/doubao_media.py image \
  --prompt "A cinematic cyberpunk alley in rain" \
  --size 2560x1440
```

### Generate a video

```bash
python3 {baseDir}/scripts/doubao_media.py video \
  --prompt "A panda astronaut waves on the moon" \
  --video-duration 5 \
  --fps 24 \
  --resolution 1080p
```

### Query a video task

```bash
python3 {baseDir}/scripts/doubao_media.py task --task-id your-task-id
```

### Wait for a video task and optionally download the result

```bash
python3 {baseDir}/scripts/doubao_media.py wait \
  --task-id your-task-id \
  --timeout 600 \
  --interval 5 \
  --download-to ./doubao-result.mp4
```
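
The `--timeout` and `--interval` options suggest a polling loop with a deadline. The following is a minimal sketch of that pattern, not the script's actual implementation: the `wait_for_task` helper, the `status` field, and the status names are assumptions made for illustration, and the query function is a stand-in rather than a real Ark call.

```python
import time
from typing import Callable, Dict

# Assumed terminal status names; the real API may use different values.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def wait_for_task(
    query: Callable[[str], Dict],
    task_id: str,
    timeout: float = 600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll query(task_id) until a terminal status or until the deadline passes."""
    deadline = clock() + timeout
    while True:
        task = query(task_id)
        if task.get("status") in TERMINAL_STATES:
            return task
        # Stop if the next poll would land after the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout}s")
        sleep(interval)


# Stand-in query function that reports success on the third poll.
_responses = iter([
    {"status": "queued"},
    {"status": "running"},
    {"status": "succeeded", "video_url": "https://example.com/result.mp4"},
])


def fake_query(task_id: str) -> Dict:
    return next(_responses)


result = wait_for_task(fake_query, "your-task-id", sleep=lambda s: None)
print(result["status"])  # succeeded
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; a monotonic clock avoids surprises if the system time changes during a long wait.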

## Input rules

- Always prefer `--endpoint-id` when the user has a provisioned Volcengine Ark endpoint.
- Fall back to model names only when endpoint ids are unavailable.
- For video generation, this skill mirrors the upstream behavior and appends `--dur`, `--fps`, `--rs`, and `--ratio` to the prompt when they are not already present.
- If the user supplies image URLs, pass them through exactly; do not download or re-host unless asked.
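
The prompt-flag rule above can be sketched as follows. This is an illustration under stated assumptions, not the skill's code: the `append_video_flags` helper is hypothetical, and both the mapping from CLI options to prompt flags and the `--flag value` spacing are guesses based on the flag names listed in the rule.

```python
from typing import Optional


def append_video_flags(
    prompt: str,
    duration: Optional[int] = None,
    fps: Optional[int] = None,
    resolution: Optional[str] = None,
    ratio: Optional[str] = None,
) -> str:
    """Append --dur/--fps/--rs/--ratio unless the prompt already contains the flag."""
    flags = [("--dur", duration), ("--fps", fps), ("--rs", resolution), ("--ratio", ratio)]
    tokens = prompt.split()
    parts = [prompt.strip()]
    for flag, value in flags:
        # Skip flags with no value and flags the user already wrote into the prompt.
        if value is None or flag in tokens:
            continue
        parts.append(f"{flag} {value}")
    return " ".join(parts)


print(append_video_flags("A panda astronaut waves on the moon",
                         duration=5, fps=24, resolution="1080p"))
# A panda astronaut waves on the moon --dur 5 --fps 24 --rs 1080p

print(append_video_flags("A panda astronaut waves on the moon --fps 30",
                         duration=5, fps=24))
# A panda astronaut waves on the moon --fps 30 --dur 5
```

Matching on whole tokens rather than substrings means a user-written `--fps 30` wins over the CLI default, while a word that merely contains `--fps` would not suppress the flag.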

## Troubleshooting

- If neither `--endpoint-id` nor a default endpoint env var exists, the script falls back to the default model env var.
- If the API returns `InvalidEndpointOrModel.NotFound`, ask the user to verify the Volcengine Ark endpoint authorization first.
- Video generation is async. If generation succeeds, capture `task_id` and query it later with the `task` subcommand, or use `wait` for automatic polling.
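
The fallback order described in the first bullet (explicit `--endpoint-id`, then a default endpoint env var, then a default model env var) could look roughly like this. The environment variable names and the `resolve_target` helper are hypothetical placeholders; the real names are not given here.

```python
import os
from typing import Mapping, Optional, Tuple

# Hypothetical variable names, used only for this illustration.
DEFAULT_ENDPOINT_ENV = "EXAMPLE_DEFAULT_ENDPOINT_ID"
DEFAULT_MODEL_ENV = "EXAMPLE_DEFAULT_MODEL"


def resolve_target(
    endpoint_id: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, str]:
    """Return (kind, value), where kind is 'endpoint' or 'model'."""
    if endpoint_id:
        return ("endpoint", endpoint_id)
    if env.get(DEFAULT_ENDPOINT_ENV):
        return ("endpoint", env[DEFAULT_ENDPOINT_ENV])
    if env.get(DEFAULT_MODEL_ENV):
        return ("model", env[DEFAULT_MODEL_ENV])
    raise ValueError("no endpoint id or default model configured")


print(resolve_target("ep-123", {}))
# ('endpoint', 'ep-123')
print(resolve_target(None, {DEFAULT_MODEL_ENV: "some-model"}))
# ('model', 'some-model')
```

Returning the kind alongside the value lets the caller report which source was used, which helps when diagnosing an `InvalidEndpointOrModel.NotFound` response.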

## References

- Read `references/api-notes.md` when you need request shapes, defaults, or caveats.

Related Skills

alphashop-image

3891

from openclaw/skills

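AlphaShop (遨虾) image processing API toolkit. Supports 11 endpoints: image translation, image translation PRO, HD image upscaling, subject cutout, image element recognition, smart element removal, image cropping, virtual try-on (create + query), and model skin-tone swap (create + query). Trigger scenarios: image translation, translating text in images, enlarging images, HD upscaling, cutout, background removal, detecting watermarks/logos/text, watermark removal, removing cluttered overlay text, image cropping, virtual try-on, AI try-on, model skin-tone swap, model replacement, AlphaShop images, 遨虾 image processing.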

Image Processing & Analysis

demo-video

3891

from openclaw/skills

Create product demo videos by automating browser interactions and capturing frames. Use when the user wants to record a demo, walkthrough, product showcase, or interactive video of a web application. Supports Playwright CDP screencast for high-quality capture and FFmpeg for video encoding.

Video Production

image-gen

3891

from openclaw/skills

Generate AI images from text prompts. Triggers on: "生成图片", "画一张", "AI图", "generate image", "配图", "create picture", "draw", "visualize", "generate an image".

Content & Documentation

bing-keyword-image-downloader

3891

from openclaw/skills

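Use when the user needs to bulk-download images by keyword from Bing's public image search results. For requests such as "download 10 images from Bing by keyword", "bulk-scrape Bing images", or "save Bing images locally by keyword", use this skill proactively. It handles the workflow of keyword-based Bing image search, collecting candidate links across result pages, skipping failed source sites, and saving to a local directory.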

video-summarizer

3891

from openclaw/skills

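Converts Bilibili/YouTube/Xiaohongshu/Douyin videos into structured Notion summary documents, uploads screenshots automatically, and pushes to Notion in one click.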

video-script-creator

3891

from openclaw/skills

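Short video script generator. Keywords: short video script generator, video scripts, Douyin copy, Douyin scripts, Kuaishou scripts, talking-head scripts, shooting scripts, YouTube scripts, YouTube Shorts scripts, Bilibili scripts, storyboard scripts, video outlines, video copy, short video creation, Reels scripts, TikTok scripts, vlog scripts, product-sales scripts, product-recommendation video scripts, series planning, video performance review, completion-rate analysis, first-3-second hooks. Generate complete video scripts with hooks, outlines, titles, tags, CTA, storyboards, series planning, and data review. Use when: (1) creating short video scripts for any platform, (2) writing talking-head scripts, (3) generating viral video titles, (4) planning video outlines and storyboards, (5) writing opening hooks (first 3 seconds), (6) generating CTA/ending prompts, (7) planning video series, (8) reviewing video performance data. Use cases: writing short video scripts, shooting scripts, talking-head copy, video planning, viral titles, opening hooks, ending calls to action, full storyboards, series planning, and data review. Triggers on: video script creator.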

zhipu-free-image-video

3891

from openclaw/skills

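Zhipu free image and video generation skill. Use it when the user wants to generate images with Zhipu, produce images in batches, generate short videos, query video task results, wait for a video to finish, or prefers free/low-cost models for quickly producing creative content.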

IMA AI Video Generator

3891

from openclaw/skills

AI video generator with premier models: Wan 2.6, Kling O1/2.6, Google Veo 3.1, Sora 2 Pro, Pixverse V5.5, Hailuo 2.0/2.3, SeeDance 1.5 Pro, Vidu Q2. Video generator supporting text-to-video, image-to-video, first-last-frame, and reference-image video generation modes. Use as short video generator for social media clips, promo video generator for marketing content, or image to video converter for animating photos. AI video generation with character consistency via reference images, multi-shot production, and knowledge base guidance via ima-knowledge-ai. Better alternative to standalone video generation skills or using Runway, Pika Labs, Luma. Requires IMA_API_KEY.

IMA Seedance 2.0 Video Generator

3891

from openclaw/skills

Seedance 2.0 AI video generator — two models in one skill: Seedance 2.0 (ima-pro) for cinema-grade quality with high frame-rate temporal consistency, precise camera language control, and 2K output; Seedance 2.0 Fast (ima-pro-fast) for faster iteration. Supports text-to-video, image-to-video, first-last-frame, and reference-media video generation with image, video, and audio references. Works for cinematic prompting, storyboard-driven clips, consistent-character workflows, product demos, and short-form content generation. Requires IMA_API_KEY.

image-text-extractor

3891

from openclaw/skills

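Batch-recognizes the text in images and outputs it as a structured document segmented by image. Use when the user needs to extract text from multiple images, organize the text found in images, or convert image text into an editable document.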

jianying-video-compose

3891

from openclaw/skills

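Video composition automation via the Jianying (剪映) API. Uses the Jianying proxy API to run the full video production workflow, including draft creation, adding assets (images/video/audio), editing text subtitles, applying effects, and cloud-render export. Suited to batch video generation, automatic short-video composition, dynamic-subtitle videos, and similar scenarios.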

aws-wechat-article-images

3891

from openclaw/skills

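Generates cover images and in-article illustrations for WeChat Official Account articles, automatically matching the style to the article content. Use when the user mentions "cover", "article images", "illustration", "generate image", "add images to the article", "make a cover", "article illustration", or "add an image".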