multimodal-gen

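Multimodal content generation (images and video). Use this skill when the user wants to generate a picture, image, or video, or asks for AI painting, AI drawing, "draw a picture", "make a video", text-to-image, or text-to-video. It automatically calls multimodal-agent to do the generation.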

33 stars

by aAAaqwq

Best use case

multimodal-gen is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using multimodal-gen can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/multimodal-gen/SKILL.md --create-dirs "https://raw.githubusercontent.com/aAAaqwq/AGI-Super-Team/main/skills/multimodal-gen/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/multimodal-gen/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How multimodal-gen Compares

Feature / Agent	multimodal-gen	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

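What does this skill do?

It recognizes image and video generation requests and hands them to multimodal-agent, which runs the selected model and returns the result.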

Where can I find the source code?

The source code is on GitHub in the aAAaqwq/AGI-Super-Team repository, under skills/multimodal-gen/SKILL.md.

SKILL.md Source

# Multimodal Content Generation

When the user needs to generate an image or a video, automatically call `multimodal-agent` to handle the request.

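## Trigger Scenarios

### Image generation
- "Generate a picture" (生成一张图片)
- "Draw a picture of..." (画一张...)
- "AI drawing" (AI 作图)
- "Text-to-image" (文生图)
- "Generate an image for me" (帮我生成图像)
- "Generate with flux/imagen/dalle" (用 flux/imagen/dalle 生成)

### Video generation
- "Generate a video" (生成一个视频)
- "Make a video" (做个视频)
- "Text-to-video" (文生视频)
- "Generate a video with veo/sora/kling" (用 veo/sora/kling 生成视频)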

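## Usage

### Automatically call multimodal-agent

```python
# Tool-call template: fill in the {placeholders} before sending.
sessions_spawn(
    agentId="multimodal-agent",
    task="Generate image: {user description}, using the {model} model"
)
```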
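
The template above can be filled in programmatically. The following is a minimal sketch, not part of the skill itself: `build_task` and `build_spawn_args` are hypothetical helper names, and the exact wording of the task string is an assumption modeled on the template. The sketch only builds the arguments; it does not call `sessions_spawn`, which is provided by the agent runtime.

```python
def build_task(kind: str, description: str, model: str) -> str:
    """Fill the task template (hypothetical helper, wording is assumed)."""
    if kind not in ("image", "video"):
        raise ValueError(f"unsupported kind: {kind!r}")
    description = description.strip()
    if not description:
        raise ValueError("description must not be empty")
    return f"Generate {kind}: {description}, using the {model} model"


def build_spawn_args(kind: str, description: str, model: str) -> dict:
    """Return the keyword arguments that would be passed to sessions_spawn."""
    return {
        "agentId": "multimodal-agent",
        "task": build_task(kind, description, model),
    }


if __name__ == "__main__":
    args = build_spawn_args("image", "a beach at sunset", "flux")
    print(args["task"])
```

Keeping the string construction in one place makes it easy to validate the inputs before the agent is spawned.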

### Available models

#### Image generation
| Alias | Model | Notes |
|------|------|------|
| `flux` | flux-pro-max | High quality; recommended |
| `imagen` | google/imagen-4-ultra | Google's top-tier image model |
| `dalle` | gpt-image-1 | OpenAI GPT Image model |
| `doubao` | doubao-seedream-4-5 | Chinese aesthetic style |
| `klingimg` | kling-image | Kling image generation |

#### Video generation
| Alias | Model | Notes |
|------|------|------|
| `veopro` | veo3.1-pro | Google, Pro tier |
| `veo4k` | veo3.1-pro-4k | 4K resolution |
| `sora` | sora-2-pro-all | OpenAI Sora |
| `kling` | kling-video | Kling video generation |
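
The two tables can be represented as a small lookup structure. This is an illustrative sketch only: the aliases and model IDs are copied from the tables, while `resolve_model` is a hypothetical helper and not something the skill defines.

```python
# Alias -> model ID, copied from the tables above.
IMAGE_MODELS = {
    "flux": "flux-pro-max",
    "imagen": "google/imagen-4-ultra",
    "dalle": "gpt-image-1",
    "doubao": "doubao-seedream-4-5",
    "klingimg": "kling-image",
}
VIDEO_MODELS = {
    "veopro": "veo3.1-pro",
    "veo4k": "veo3.1-pro-4k",
    "sora": "sora-2-pro-all",
    "kling": "kling-video",
}


def resolve_model(alias: str) -> tuple[str, str]:
    """Return (kind, model_id) for an alias; raise KeyError if it is unknown."""
    key = alias.strip().lower()
    if key in IMAGE_MODELS:
        return "image", IMAGE_MODELS[key]
    if key in VIDEO_MODELS:
        return "video", VIDEO_MODELS[key]
    raise KeyError(f"unknown model alias: {alias!r}")


if __name__ == "__main__":
    print(resolve_model("doubao"))
```

Because the alias also determines whether the request is for an image or a video, a single lookup can answer both questions.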

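## Execution Flow

```
User request: "Generate a picture of a cat"
    │
    ▼
┌──────────────────────────────────────────────┐
│ 1. Classify as an image generation request   │
│ 2. Extract the description: "cat"            │
│ 3. Pick the default model: flux-pro-max      │
└──────────────────────────────────────────────┘
    │
    ▼
┌──────────────────────────────────────────────┐
│ sessions_spawn(                              │
│   agentId="multimodal-agent",                │
│   task="Generate image: a cute cat,          │
│         using the flux model"                │
│ )                                            │
└──────────────────────────────────────────────┘
    │
    ▼
multimodal-agent runs the generation and returns the result
```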
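
The first box of the flow (classify, extract, pick a default) can be sketched as follows. Everything here is an assumption made for illustration: the keyword lists, the function names `classify_request` and `plan_request`, and the decision to pass the request text through unchanged as the description, relying on multimodal-agent to refine it. In practice the calling agent makes this decision from the trigger phrases, not from hard-coded keywords.

```python
from typing import Optional

# Assumed keyword lists; video is checked first so that a request
# mentioning both is treated as a video request.
VIDEO_KEYWORDS = ("video", "视频")
IMAGE_KEYWORDS = ("image", "picture", "draw", "图", "画")

# Default aliases per kind, as stated in the skill's notes.
DEFAULT_MODEL = {"image": "flux", "video": "veopro"}


def classify_request(text: str) -> Optional[str]:
    """Return "video", "image", or None if the text matches no keyword."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in VIDEO_KEYWORDS):
        return "video"
    if any(keyword in lowered for keyword in IMAGE_KEYWORDS):
        return "image"
    return None


def plan_request(text: str) -> Optional[dict]:
    """Steps 1-3 of the flow: classify, take the description, pick a default."""
    kind = classify_request(text)
    if kind is None:
        return None
    return {
        "kind": kind,
        "description": text.strip(),  # passed through unchanged (simplification)
        "model": DEFAULT_MODEL[kind],
    }


if __name__ == "__main__":
    print(plan_request("Generate a picture of a cat"))
```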

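## Examples

### Generate an image
```
User: Generate a picture of a beach at sunset for me

Call:
sessions_spawn(
    agentId="multimodal-agent",
    task="Generate image: a beach at sunset, golden sunlight spilling across the sea, using the flux model"
)
```

### Generate a video
```
User: Use sora to generate a video of a cat playing

Call:
sessions_spawn(
    agentId="multimodal-agent",
    task="Generate video: a cute cat playing on the grass, using the sora model"
)
```

### Specify a model
```
User: Use doubao to draw a Chinese-style landscape painting

Call:
sessions_spawn(
    agentId="multimodal-agent",
    task="Generate image: a Chinese-style landscape painting, mountain peaks wreathed in cloud and mist, using the doubao model"
)
```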

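## Model Selection Guide

| Scenario | Recommended model |
|------|----------|
| General-purpose, high quality | `flux` |
| Chinese style | `doubao` |
| Photorealistic images | `imagen` |
| Creative art | `dalle` |
| High-resolution video | `veo4k` |
| Cinematic video | `sora` |
| Fast video | `kling` |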

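## Notes

1. **Prompt optimization**: multimodal-agent automatically refines the user's description.
2. **Model selection**: if the user does not specify a model, the default is flux (images) or veopro (video).
3. **Asynchronous execution**: video generation can take a long time, so it runs in the background.
4. **Result delivery**: the result is sent to the user automatically once generation completes.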
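
The model-selection rule in note 2 could look like the sketch below. It is hypothetical: `pick_model` is not defined by the skill, and matching aliases with a regular expression is only one possible approach. The lookarounds test for ASCII letters and digits only, so an alias is still found when it sits directly next to Chinese characters, and `kling` does not match inside `klingimg`.

```python
import re

# Aliases and defaults as listed in the skill.
ALIASES = {
    "image": ("flux", "imagen", "dalle", "doubao", "klingimg"),
    "video": ("veopro", "veo4k", "sora", "kling"),
}
DEFAULTS = {"image": "flux", "video": "veopro"}


def pick_model(text: str, kind: str) -> str:
    """Return the alias named in the request, or the default for this kind."""
    if kind not in DEFAULTS:
        raise ValueError(f"unsupported kind: {kind!r}")
    lowered = text.lower()
    for alias in ALIASES[kind]:
        # Match the alias only when it is not part of a longer ASCII token.
        pattern = rf"(?<![a-z0-9]){re.escape(alias)}(?![a-z0-9])"
        if re.search(pattern, lowered):
            return alias
    return DEFAULTS[kind]


if __name__ == "__main__":
    print(pick_model("Use sora to generate a video of a cat playing", "video"))
    print(pick_model("Make a video of a cat", "video"))
```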

Related Skills

wemp-operator

from aAAaqwq/AGI-Super-Team

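Full-featured WeChat Official Account operations: API wrappers for drafts, publishing, comments, users, media assets, mass messaging, statistics, menus, and QR codes.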

Content & Documentation

zsxq-smart-publish

from aAAaqwq/AGI-Super-Team

Publish and manage content on 知识星球 (zsxq.com). Supports talk posts, Q&A, long articles, file sharing, digest/bookmark, homework tasks, and tag management. Use when publishing content to 知识星球, creating/editing posts, uploading files/images/audio, managing digests, batch publishing, or formatting content for 知识星球.

zoom-automation

from aAAaqwq/AGI-Super-Team

Automate Zoom meeting creation, management, recordings, webinars, and participant tracking via Rube MCP (Composio). Always search tools first for current schemas.

zoho-crm-automation

from aAAaqwq/AGI-Super-Team

Automate Zoho CRM tasks via Rube MCP (Composio): create/update records, search contacts, manage leads, and convert leads. Always search tools first for current schemas.

ziliu-publisher

from aAAaqwq/AGI-Super-Team

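Ziliu (字流): an AI-driven multi-platform content distribution tool. Write once, adapt the layout automatically, and distribute with one click to 16+ platforms (WeChat Official Accounts, Zhihu, Xiaohongshu, Bilibili, Douyin, Weibo, X, and others). Use when the user needs multi-platform publishing, content layout, or format adaptation. Trigger words: 字流, ziliu, multi-platform publishing (多平台发布), one-click distribution (一键分发), content distribution (内容分发), layout and publish (排版发布).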

zhihu-post-skill

from aAAaqwq/AGI-Super-Team

Zhihu article publishing: automates content creation and publishing on the Zhihu platform.

zendesk-automation

from aAAaqwq/AGI-Super-Team

Automate Zendesk tasks via Rube MCP (Composio): tickets, users, organizations, replies. Always search tools first for current schemas.

youtube-knowledge-extractor

from aAAaqwq/AGI-Super-Team

Multimodal YouTube video analysis through both information channels: audio (transcript) and visual (frame extraction + image analysis). Especially powerful for HowTo videos, tutorials, demos, and explainer videos where what is SHOWN (screenshots, UI demos, diagrams, code, physical actions) is just as important as what is SAID. Use this skill whenever a user wants to analyze, summarize, or create step-by-step guides from YouTube videos, or when they share a YouTube URL and want to understand what happens in the video. Triggers on requests like "Analyze this YouTube video", "Create a step-by-step guide from this video", "What does this video show?", "Summarize this tutorial", or any YouTube URL shared with analysis intent.

youtube-factory

from aAAaqwq/AGI-Super-Team

Generate complete YouTube videos from a single prompt - script, voiceover, stock footage, captions, thumbnail. Self-contained, no external modules. 100% free tools.

youtube-automation

from aAAaqwq/AGI-Super-Team

Automate YouTube tasks via Rube MCP (Composio): upload videos, manage playlists, search content, get analytics, and handle comments. Always search tools first for current schemas.

xlsx

from aAAaqwq/AGI-Super-Team

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. Use when Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc.) for: (1) creating new spreadsheets with formulas and formatting, (2) reading or analyzing data, (3) modifying existing spreadsheets while preserving formulas, (4) data analysis and visualization in spreadsheets, or (5) recalculating formulas.

xiaomo-assistant-template

from aAAaqwq/AGI-Super-Team

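Configuration template for the 小a assistant. Adapted from xiaomo-starter-kit, it provides preconfigured OpenClaw assistant framework files. Use this skill when the user needs to quickly configure a new assistant, set an assistant's identity, or create assistant configuration files.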