Image Generation

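AI image generation and editing, built on Nano Banana (Gemini Image), covering text-to-image, image-to-image, and image editing. Suited to creative design, marketing assets, social media content, and presentation illustrations. Supports multiple styles, high-resolution output (up to 4K), text rendering, and character consistency.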

by diegosouzapw

Best use case

Image Generation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using Image Generation can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/image_generation/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/content-media/image_generation/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub
2. Place it in .claude/skills/image_generation/SKILL.md inside your project
3. Restart your AI agent; it will auto-discover the skill

How Image Generation Compares

Feature / Agent	Image Generation	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

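## Capability Overview

AI image generation lets you:
- **Text-to-image**: generate images from a text description
- **Image-to-image**: generate new images based on a reference image
- **Image editing**: modify specific parts of an existing image
- **Style transfer**: change an image's style (photorealistic, anime, oil painting, and so on)
- **Text rendering**: produce clear, legible text inside an image

It is built on Google Gemini's Nano Banana / Nano Banana Pro models.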

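## Workflow

### Phase 1: Understand the Request
1. Understand what image the user needs (subject, style, intended use)
2. Confirm the output format (dimensions, resolution, number of images)
3. If a reference image is provided, confirm the editing intent

### Phase 2: Build the Prompt
1. Convert the user's intent into an English prompt (English gives better results)
2. Follow the prompt formula: `<subject> <action> <scene> <style> <quality>`
3. Add any necessary detail

### Phase 3: Generate the Image
1. Call the `generate_image` tool
2. If editing is needed, call the `edit_image` tool
3. Generate multiple candidates (if the user wants to choose)

### Phase 4: Deliver
1. Show the generated results
2. Ask whether adjustments are needed
3. Save to the location the user specifies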
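
The generation and editing steps of this workflow can be sketched in Python as below. This is a minimal illustration, not the skill's implementation: `run_image_workflow`, `fake_generate`, and `fake_edit` are hypothetical names, and the assumption that the tools return file paths is made only so the example runs without an image backend.

```python
from typing import Callable, List, Optional


def run_image_workflow(
    prompt: str,
    generate: Callable[..., List[str]],
    edit: Optional[Callable[..., str]] = None,
    edit_instruction: Optional[str] = None,
    num_candidates: int = 1,
) -> List[str]:
    """Generate candidates, then optionally edit each one.

    `generate` and `edit` are injected stand-ins for the `generate_image`
    and `edit_image` tools; their return types are assumptions.
    """
    candidates = generate(prompt=prompt, num_images=num_candidates)
    if edit is not None and edit_instruction:
        candidates = [
            edit(image_path=path, prompt=edit_instruction, preserve_subject=True)
            for path in candidates
        ]
    return candidates


# Hypothetical stand-in tools so the sketch is runnable on its own.
def fake_generate(prompt: str, num_images: int = 1) -> List[str]:
    return [f"/tmp/candidate_{i}.png" for i in range(num_images)]


def fake_edit(image_path: str, prompt: str, preserve_subject: bool = True) -> str:
    return image_path.replace(".png", "_edited.png")


if __name__ == "__main__":
    print(run_image_workflow("A red lantern at dusk", fake_generate, num_candidates=2))
```

Passing the tools in as callables keeps the control flow testable; in a real agent session the tool calls would replace the stand-ins.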

## Tool Usage

### generate_image
- **Purpose**: generate an image from a text description
- **Parameters**:
  - `prompt`: image description (English works best)
  - `style`: style preset (realistic, anime, oil_painting, watercolor, minimal, cinematic)
  - `aspect_ratio`: aspect ratio (1:1, 16:9, 9:16, 4:3, 3:4)
  - `resolution`: resolution (1K, 2K, 4K)
  - `num_images`: number of images to generate (1-4)
- **Example**:
  ```python
  generate_image(
      prompt="A majestic horse galloping through cherry blossoms, golden hour lighting, Chinese New Year festive atmosphere",
      style="realistic",
      aspect_ratio="16:9",
      resolution="2K",
      num_images=2
  )
  ```
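
Because the parameters above accept only a fixed set of values, it can help to check arguments before calling the tool. The following is a sketch under assumptions: `validate_generate_args` is a hypothetical helper, and the default values it uses are not stated in this document.

```python
ALLOWED_STYLES = {"realistic", "anime", "oil_painting", "watercolor", "minimal", "cinematic"}
ALLOWED_ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4"}
ALLOWED_RESOLUTIONS = {"1K", "2K", "4K"}


def validate_generate_args(prompt, style="realistic", aspect_ratio="1:1",
                           resolution="1K", num_images=1):
    """Check arguments against the documented values; defaults are assumed."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if style not in ALLOWED_STYLES:
        raise ValueError(f"unsupported style: {style!r}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if not isinstance(num_images, int) or not 1 <= num_images <= 4:
        raise ValueError("num_images must be an integer from 1 to 4")
    # Return keyword arguments ready to pass on to the tool call.
    return {
        "prompt": prompt.strip(),
        "style": style,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "num_images": num_images,
    }
```

The returned dictionary could then be unpacked into the tool call, for example `generate_image(**validate_generate_args(...))`.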

### edit_image
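- **Purpose**: edit an existing image
- **Parameters**:
  - `image_path`: path or URL of the source image
  - `prompt`: editing instruction (for example, "change the background to a night scene")
  - `preserve_subject`: whether to keep the main subject unchanged (default: True)
- **Example**: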
  ```python
  edit_image(
      image_path="/workspace/photo.jpg",
      prompt="Add Chinese New Year decorations and red lanterns to the background",
      preserve_subject=True
  )
  ```

## Prompt Best Practices

### Basic Formula
```
[subject] + [action/pose] + [scene/background] + [style] + [mood/lighting]
```
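
As an illustration of the formula, the slots can be assembled programmatically. This is a minimal sketch: `build_prompt` is a hypothetical helper, and joining the slots with commas is a stylistic assumption rather than a requirement of the model.

```python
def build_prompt(subject, action="", scene="", style="", mood=""):
    """Assemble a prompt in formula order, skipping empty slots."""
    if not subject or not subject.strip():
        raise ValueError("subject is required")
    # Subject and action read naturally as one phrase.
    head = " ".join(s.strip() for s in (subject, action) if s and s.strip())
    # Scene, style, and mood/lighting follow as comma-separated modifiers.
    tail = [s.strip() for s in (scene, style, mood) if s and s.strip()]
    return ", ".join([head] + tail)


if __name__ == "__main__":
    print(build_prompt(
        subject="A majestic horse",
        action="galloping through cherry blossoms",
        scene="spring meadow",
        style="photorealistic",
        mood="golden hour lighting",
    ))
```

Keeping each slot as a separate argument makes it easy to see which part of the formula a prompt is missing.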

### Style Keywords
- **Photorealistic**: photorealistic, hyperrealistic, 8K, detailed
- **Anime**: anime style, Ghibli style, cel shading
- **Oil painting**: oil painting style, impressionist, Van Gogh style
- **Minimalist**: minimal, flat design, vector art
- **Cinematic**: cinematic, dramatic lighting, movie poster style

### Quality Enhancers
- `high quality`, `detailed`, `sharp focus`
- `professional photography`, `award winning`
- `4K resolution`, `ultra detailed`
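
The style keywords and quality enhancers can be appended to a base prompt mechanically. The sketch below is illustrative only: `enhance_prompt` is a hypothetical helper, and pairing each keyword group with a `style` preset name (for example `realistic`) is an assumption, since this document does not state that mapping.

```python
# Keyword groups copied from the lists above; the dictionary keys are assumed.
STYLE_KEYWORDS = {
    "realistic": ["photorealistic", "hyperrealistic", "8K", "detailed"],
    "anime": ["anime style", "Ghibli style", "cel shading"],
    "oil_painting": ["oil painting style", "impressionist", "Van Gogh style"],
    "minimal": ["minimal", "flat design", "vector art"],
    "cinematic": ["cinematic", "dramatic lighting", "movie poster style"],
}
QUALITY_TERMS = ("high quality", "detailed", "sharp focus")


def enhance_prompt(base, style, max_style_terms=2, quality_terms=QUALITY_TERMS):
    """Append style and quality terms to `base`, dropping repeated terms."""
    terms = STYLE_KEYWORDS.get(style, [])[:max_style_terms] + list(quality_terms)
    seen, parts = set(), []
    for term in [base.strip()] + terms:
        key = term.lower()
        if key not in seen:  # e.g. "detailed" appears in both lists
            seen.add(key)
            parts.append(term)
    return ", ".join(parts)


if __name__ == "__main__":
    print(enhance_prompt("A horse in a meadow", "realistic"))
```

Capping the number of style terms keeps the prompt short; whether more keywords improve results is something to check empirically.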

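### What to Avoid
- ❌ Avoid vague descriptions: "a nice-looking picture"
- ❌ Avoid contradictory descriptions: "a photorealistic cartoon"
- ❌ Avoid sensitive content
- ✅ Be specific, clear, and layered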

## Scenario Templates

### Scenario 1: WeChat Red Envelope Cover / Holiday Greeting Image
```yaml
prompt_template: |
  A {animal} in {pose}, surrounded by {decorations}, 
  Chinese New Year theme, festive red and gold colors, 
  {style} style, high quality, {text_content}
  
variables:
  animal: "majestic horse" # Year of the Horse
  pose: "a graceful running pose"
  decorations: "cherry blossoms, red lanterns, gold coins"
  style: "elegant illustration"
  text_content: "with Chinese text '恭喜发财' in golden calligraphy"
```
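
A template of this shape can be filled with Python's built-in `str.format`. This is a sketch, not the skill's loader: `render_prompt` is a hypothetical helper, the template is inlined as a string instead of being read from YAML, and the variable values are illustrative.

```python
PROMPT_TEMPLATE = (
    "A {animal} in {pose}, surrounded by {decorations}, "
    "Chinese New Year theme, festive red and gold colors, "
    "{style} style, high quality, {text_content}"
)

VARIABLES = {
    "animal": "majestic horse",
    "pose": "a graceful running pose",
    "decorations": "cherry blossoms, red lanterns, gold coins",
    "style": "elegant illustration",
    "text_content": "with Chinese text '恭喜发财' in golden calligraphy",
}


def render_prompt(template, variables):
    """Substitute {placeholders} and collapse whitespace from line breaks."""
    try:
        rendered = template.format(**variables)
    except KeyError as exc:
        # Surface a clear error when a placeholder has no value.
        raise ValueError(f"missing template variable: {exc.args[0]}") from None
    return " ".join(rendered.split())


if __name__ == "__main__":
    print(render_prompt(PROMPT_TEMPLATE, VARIABLES))
```

The same function works for the other scenario templates, since they use the same `{name}` placeholder syntax.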

### Scenario 2: Presentation Illustrations
```yaml
prompt_template: |
  {concept} visualization, professional infographic style,
  clean white background, modern corporate aesthetic,
  subtle gradients, minimalist design

variables:
  concept: "AI workflow automation"
```

### Scenario 3: Social Media Content
```yaml
prompt_template: |
  {subject} {action}, {platform} optimized aspect ratio,
  vibrant colors, eye-catching composition, 
  trending aesthetic, shareable content style
  
variables:
  subject: "coffee cup"
  action: "with steam rising"
  platform: "Instagram" # 1:1 or 4:5
```

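## Output Format

### Presenting the Results
```markdown
## 🎨 Image Generation Complete

**Prompt**: [the English prompt used]

**Parameters**:
- Style: [style]
- Aspect ratio: [aspect_ratio]
- Resolution: [resolution]

**Results**:
![Generated Image](path/to/image.png)

**Next steps**:
- [ ] Satisfied; save to the specified location
- [ ] Adjust the style or colors
- [ ] Modify a specific part
- [ ] Regenerate
```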
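
The result template can be filled in from the parameters actually used. The following is a sketch only: `render_result` is a hypothetical helper, its labels are written in English, and numbering the image links for multiple candidates is an assumption beyond the single-image template.

```python
from typing import List


def render_result(prompt: str, style: str, aspect_ratio: str,
                  resolution: str, image_paths: List[str]) -> str:
    """Build the Markdown result report for one generation run."""
    lines = [
        "## 🎨 Image Generation Complete",
        "",
        f"**Prompt**: {prompt}",
        "",
        "**Parameters**:",
        f"- Style: {style}",
        f"- Aspect ratio: {aspect_ratio}",
        f"- Resolution: {resolution}",
        "",
        "**Results**:",
    ]
    # One Markdown image link per generated candidate.
    lines += [f"![Generated Image {i}]({path})"
              for i, path in enumerate(image_paths, start=1)]
    lines += [
        "",
        "**Next steps**:",
        "- [ ] Satisfied; save to the specified location",
        "- [ ] Adjust the style or colors",
        "- [ ] Modify a specific part",
        "- [ ] Regenerate",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_result("A horse", "realistic", "16:9", "2K", ["out/a.png"]))
```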

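## Notes

1. **Copyright compliance**: generated images carry a SynthID watermark
2. **Content policy**: follow Google's usage policies; do not generate sensitive content
3. **Commercial use**: commercial use is supported (marketing, products)
4. **Text rendering**: Nano Banana Pro supports multilingual text, but Chinese rendering quality should be verified
5. **Character consistency**: keeping a character's features consistent across images requires the reference-image feature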

## Resources

- `resources/prompt_templates.yaml` - preset prompt templates
- `resources/style_presets.md` - detailed guide to the style presets
- `resources/chinese_new_year_2026.md` - Year of the Horse templates

Related Skills

generational-agent-succession

from diegosouzapw/awesome-omni-skill

Parallel agent swarms with generational succession. Combines agent-architect's multi-agent parallelism with automatic succession when agents degrade. Each parallel agent gets fresh context through controlled handoffs while maintaining accumulated wisdom.

all-images-ai-automation

from diegosouzapw/awesome-omni-skill

Automate All Images AI tasks via Rube MCP (Composio). Always search tools first for current schemas.

ai-image-generator

from diegosouzapw/awesome-omni-skill

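Generate AI images using platforms such as ModelScope. Use this skill when the user needs to generate images, design icons, create character art, or get help writing AI art prompts. Supports two modes: generating images directly and optimizing prompts only.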

xhs-images

from diegosouzapw/awesome-omni-skill

Xiaohongshu (Little Red Book) infographic series generator with multiple style options. Breaks down content into 1-10 cartoon-style infographics. Use when user asks to create "小红书图片", "XHS images", or "RedNote infographics".

x-image-cards

from diegosouzapw/awesome-omni-skill

Create X/Twitter cards that look like images, not marketing banners. Use when asked to "create OG images", "set up X cards", "make social cards", or "twitter card without text".

wiro-image-fill

from diegosouzapw/awesome-omni-skill

Generate missing or placeholder images in a project by calling the Wiro image generation API, saving assets under public/assets generated folders, and producing a JSON mapping. Use when you see empty img src, placeholder.png, or other image gaps that need real assets.

seedream-image-generator

from diegosouzapw/awesome-omni-skill

Generate images using the Doubao SeeDream API based on text prompts. Use this skill when users request AI-generated images, artwork, illustrations, or visual content creation. The skill handles API calls, downloads generated images to the project's /pic folder, and supports batch generation of up to 4 sequential images.

placeholder-images

from diegosouzapw/awesome-omni-skill

Rule to use placekitten.com for placeholder images in seed data.

og-image-generator

from diegosouzapw/awesome-omni-skill

Generate and optimize Open Graph meta images for social media sharing. Use this skill when building web applications that need dynamic OG image generation with support for Vercel's @vercel/og library, pre-generated image storage, and social media optimization (Twitter Cards, Facebook, LinkedIn). Handles dynamic routes, performance optimization, and includes best practices for crawler compatibility and testing.

nanobanana-image

from diegosouzapw/awesome-omni-skill

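A skill for generating and editing images with Nano Banana (Google Gemini API). Use it for requests, in Japanese or English, such as "generate an image", "make an illustration", "draw a picture of ...", "create a picture", "edit this image", or "make ... based on this image". Supports text-to-image generation, generation from reference images, image editing, and image generation that reflects up-to-date information through Google Search grounding. Also handles requests such as "the latest ...", "reflect current trends", and "real-time information".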

nano-image-generate

from diegosouzapw/awesome-omni-skill

Generate images using Nano Banana (Flash) or Nano Banana Pro. Use 'flash' for speed/efficiency and 'pro' for high quality, text rendering, and complex prompt adherence. Triggers include 'generate image', 'create logo', 'fast image', 'high quality image'.

media-generation

from diegosouzapw/awesome-omni-skill

Generate images, videos, and audio using Google's Gemini APIs. Use for image generation/editing (Gemini 3 Pro Image), video generation (Veo 3), and speech (TBD). Trigger words - images: generate, create, draw, design, make, edit, modify image/picture. Video: generate video, create video, animate, make a video. Supports text-to-image, image-to-image editing, text-to-video, and image-to-video.