multimodal-gen
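Multimodal content generation (images and video). Use this skill when the user wants to generate a picture or image, generate a video, create AI art or AI drawings, draw a picture, make a video, or run text-to-image or text-to-video. It automatically calls multimodal-agent to do the generation.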
Best use case
multimodal-gen is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using multimodal-gen should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/multimodal-gen/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
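The manual steps can also be scripted. The following is a minimal sketch, assuming SKILL.md has already been downloaded to a local path; `install_skill` is a hypothetical helper name, and only the target path comes from the instructions above.

```python
import shutil
from pathlib import Path


def install_skill(downloaded: Path, project_root: Path) -> Path:
    """Copy a downloaded SKILL.md into the project's skill directory.

    Hypothetical helper: the function name and arguments are illustrative.
    """
    target_dir = project_root / ".claude" / "skills" / "multimodal-gen"
    target_dir.mkdir(parents=True, exist_ok=True)  # create missing folders
    target = target_dir / "SKILL.md"
    shutil.copyfile(downloaded, target)
    return target
```

After the copy, the agent still needs a restart before it discovers the skill.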
How multimodal-gen Compares
| Feature / Agent | multimodal-gen | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
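Multimodal content generation (images and video). Use this skill when the user wants to generate a picture or image, generate a video, create AI art or AI drawings, draw a picture, make a video, or run text-to-image or text-to-video. It automatically calls multimodal-agent to do the generation.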
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Multimodal Content Generation
When the user needs an image or a video generated, automatically call `multimodal-agent` to handle the request.
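## Trigger Scenarios
### Image generation
- "Generate a picture"
- "Draw a ..."
- "AI drawing"
- "Text-to-image"
- "Generate an image for me"
- "Generate with flux/imagen/dalle"
### Video generation
- "Generate a video"
- "Make a video"
- "Text-to-video"
- "Generate a video with veo/sora/kling"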
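As a rough illustration of how the trigger phrases above separate image requests from video requests, here is a minimal keyword-matching sketch. It assumes simple substring matching; the skill itself may rely on the agent's own language understanding instead. The function name `classify_request` and the keyword lists are hypothetical, with the keywords drawn from the trigger phrases (original Chinese plus English glosses).

```python
from typing import Optional

# Keywords taken from the trigger phrases (Chinese originals plus English glosses).
VIDEO_KEYWORDS = ("视频", "video")
IMAGE_KEYWORDS = ("图片", "图像", "作图", "文生图", "画", "image", "picture", "draw")


def classify_request(text: str) -> Optional[str]:
    """Return "video", "image", or None when no trigger keyword matches."""
    lowered = text.lower()
    # Video is checked first so a request mentioning both kinds is
    # treated as a video request.
    if any(keyword in lowered for keyword in VIDEO_KEYWORDS):
        return "video"
    if any(keyword in lowered for keyword in IMAGE_KEYWORDS):
        return "image"
    return None
```

Substring matching is crude (for example, "draw" also matches "withdraw"), so this is only a starting point.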
## Usage
### Automatically calling multimodal-agent
```python
sessions_spawn(
    agentId="multimodal-agent",
    task="Generate image: {user description}, using the {model} model"
)
```
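The `task` argument follows a fixed pattern: request kind, description, then model alias. A minimal sketch of a formatter for that pattern follows, with the template rendered in English. `build_task` is a hypothetical helper, not part of the skill; how `sessions_spawn` interprets the string is not specified in the source.

```python
def build_task(kind: str, description: str, model: str) -> str:
    """Format a task string in the 'kind: description, model' pattern.

    Hypothetical helper; the wording of the template is an assumption.
    """
    if kind not in ("image", "video"):
        raise ValueError(f"unsupported kind: {kind!r}")
    description = description.strip()
    if not description:
        raise ValueError("description must not be empty")
    return f"Generate {kind}: {description}, using the {model} model"
```

The returned string would then be passed as `task` to `sessions_spawn` together with `agentId="multimodal-agent"`.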
### Available models
#### Image generation
| Alias | Model | Notes |
|------|------|------|
| `flux` | flux-pro-max | High quality; recommended |
| `imagen` | google/imagen-4-ultra | Google's most capable |
| `dalle` | gpt-image-1 | OpenAI image model |
| `doubao` | doubao-seedream-4-5 | Chinese aesthetic |
| `klingimg` | kling-image | Kling image generation |
#### Video generation
| Alias | Model | Notes |
|------|------|------|
| `veopro` | veo3.1-pro | Google professional tier |
| `veo4k` | veo3.1-pro-4k | 4K high definition |
| `sora` | sora-2-pro-all | OpenAI Sora |
| `kling` | kling-video | Kling video |
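The alias tables map naturally onto a lookup structure. Below is a minimal sketch, assuming the aliases and model identifiers listed above and the defaults this SKILL.md states in its notes (`flux` for images, `veopro` for video). `resolve_model` and the dictionary names are hypothetical.

```python
from typing import Optional, Tuple

IMAGE_MODELS = {
    "flux": "flux-pro-max",
    "imagen": "google/imagen-4-ultra",
    "dalle": "gpt-image-1",
    "doubao": "doubao-seedream-4-5",
    "klingimg": "kling-image",
}
VIDEO_MODELS = {
    "veopro": "veo3.1-pro",
    "veo4k": "veo3.1-pro-4k",
    "sora": "sora-2-pro-all",
    "kling": "kling-video",
}
# Defaults per the notes in this SKILL.md: flux for images, veopro for video.
DEFAULT_ALIAS = {"image": "flux", "video": "veopro"}


def resolve_model(kind: str, alias: Optional[str] = None) -> Tuple[str, str]:
    """Return (alias, model id) for a request kind, applying the default alias."""
    tables = {"image": IMAGE_MODELS, "video": VIDEO_MODELS}
    if kind not in tables:
        raise ValueError(f"unsupported kind: {kind!r}")
    chosen = alias or DEFAULT_ALIAS[kind]
    try:
        return chosen, tables[kind][chosen]
    except KeyError:
        raise ValueError(f"unknown {kind} model alias: {chosen!r}") from None
```

Keeping image and video aliases in separate tables means a video alias such as `sora` is rejected for an image request rather than silently accepted.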
## Execution Flow
```
User request: "Generate a picture of a cat"
                  │
                  ▼
┌─────────────────────────────────────┐
│ 1. Identify as an image request     │
│ 2. Extract the description: "cat"   │
│ 3. Pick the default: flux-pro-max   │
└─────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│ sessions_spawn(                     │
│   agentId="multimodal-agent",       │
│   task="Generate image: a cute cat, │
│         using the flux model"       │
│ )                                   │
└─────────────────────────────────────┘
                  │
                  ▼
multimodal-agent runs the generation and returns the result
```
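## Examples
### Generating an image
```
User: Generate a picture of a beach at sunset for me
Action:
sessions_spawn(
    agentId="multimodal-agent",
    task="Generate image: a beach at sunset, golden sunlight spilling across the sea, using the flux model"
)
```
### Generating a video
```
User: Use sora to generate a video of a cat playing
Action:
sessions_spawn(
    agentId="multimodal-agent",
    task="Generate video: a cute cat playing on the grass, using the sora model"
)
```
### Specifying a model
```
User: Use doubao to draw a Chinese-style landscape painting
Action:
sessions_spawn(
    agentId="multimodal-agent",
    task="Generate image: a Chinese-style landscape painting, peaks wreathed in mist and cloud, using the doubao model"
)
```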
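In the last two examples the user names a model alias in the request. One way to pick that alias out of free text is sketched below, assuming token-level matching against the aliases this skill defines. `find_model_alias` is a hypothetical helper; the agent may well detect the model by other means.

```python
import re
from typing import Optional

# Aliases defined by this skill for image and video models.
KNOWN_ALIASES = (
    "flux", "imagen", "dalle", "doubao", "klingimg",
    "veopro", "veo4k", "sora", "kling",
)


def find_model_alias(text: str) -> Optional[str]:
    """Return the first known alias that appears as a whole token, else None."""
    # Splitting on non-alphanumeric characters keeps "klingimg" from being
    # mistaken for "kling", and also works when the alias is surrounded by
    # Chinese text without spaces.
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token in KNOWN_ALIASES:
            return token
    return None
```

When the function returns `None`, the caller would fall back to the default model for the request kind.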
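## Model Selection Guide
| Scenario | Recommended model |
|------|----------|
| General-purpose, high quality | `flux` |
| Chinese style | `doubao` |
| Photorealistic | `imagen` |
| Creative art | `dalle` |
| High-definition video | `veo4k` |
| Cinematic video | `sora` |
| Fast video | `kling` |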
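The recommendations above amount to a lookup from scenario to alias. A minimal sketch follows, assuming English scenario labels as dictionary keys. The labels and the `recommend_model` name are hypothetical; only the scenario-to-alias pairings come from the table.

```python
from typing import Optional

# Scenario labels are illustrative; the pairings mirror the table above.
RECOMMENDED = {
    "general high quality": "flux",
    "chinese style": "doubao",
    "photorealistic": "imagen",
    "creative art": "dalle",
    "high-definition video": "veo4k",
    "cinematic video": "sora",
    "fast video": "kling",
}


def recommend_model(scenario: str) -> Optional[str]:
    """Return the recommended alias for a scenario label, or None if unknown."""
    return RECOMMENDED.get(scenario.strip().lower())
```

An unknown scenario returns `None`, so the caller can fall back to the default model.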
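## Notes
1. **Prompt optimization**: multimodal-agent automatically refines the user's description.
2. **Model selection**: If the user does not specify a model, the default is flux (images) or veopro (video).
3. **Asynchronous execution**: Video generation can take a long time, so it runs in the background.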
4. **Result delivery**: Once generation finishes, the result is sent to the user automatically.