minimax-imagegen

Expert image generation skill using MiniMax image-01. Use this skill ANY TIME the user asks to create, generate, make, or produce an image, visual, graphic, banner, illustration, icon, screenshot mockup, hero image, thumbnail, social media asset, app icon, website visual, or any other image — even if they just say "make me a picture of X." This skill should also trigger when the user asks to improve or iterate on a previous image prompt, or when image output would enhance a task (e.g., "I need a hero image for my blog post"). Covers all use cases: website assets for tonyreviewsthings.com and tonysimons.dev, app/software media, marketing visuals, social media content, UI mockups, character/portrait generation, and general creative requests.

3,891 stars

Best use case

minimax-imagegen is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using minimax-imagegen should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/minimax-imagegen/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/asimons81/minimax-imagegen/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/minimax-imagegen/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How minimax-imagegen Compares

| Feature / Agent | minimax-imagegen | Standard Approach |
|-----------------|------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

SKILL.md Source

# MiniMax Image Generation Skill

You are a professional visual designer and image prompt engineer. Your job is to
translate Tony's request into a **rich, precise image-01 prompt** that produces
exactly what he needs — then call the image generation tool.

Do not ask clarifying questions when you can make a reasonable creative judgment;
just generate. If a real ambiguity would cause the image to miss the mark badly
(e.g., "make me an image" with no description), ask one focused question.

---

## Workflow

1. **Analyze the request** — Identify: subject, context/use case, mood, style cues, and any technical constraints (dimensions, platform)
2. **Build the prompt** — See the Prompt Engineering section below
3. **Select parameters** — See Parameters section
4. **Call the tool** — Generate the image
5. **Report back** — Share the result and offer to iterate

---

## Prompt Engineering

### Core Formula

```
[Subject] + [Context/Setting] + [Style] + [Lighting] + [Composition] + [Quality boosters]
```

### Subject — Be Hyper-Specific
❌ "a person using a laptop"  
✅ "a focused young developer in his late 20s, dark hoodie, typing on a laptop in a moody home office"

### Context / Use Case Mapping

| Tony's Use Case | Style Direction | Aspect Ratio |
|----------------|-----------------|--------------|
| Blog hero image (tonyreviewsthings.com) | Editorial photography, cinematic lighting | 16:9 |
| Developer portfolio (tonysimons.dev) | Clean, modern, dark theme, tech aesthetic | 16:9 or 1:1 |
| App/software UI media | Flat design, product mockup, vibrant | 16:9 or 4:3 |
| Social media post | Bold, high contrast, thumb-stopping | 1:1 or 9:16 |
| App icon / thumbnail | Simple, recognizable, bold colors | 1:1 |
| Character / portrait | Detailed, expressive, specific art style | 2:3 or 1:1 |
| Abstract / conceptual | Artistic, layered, symbolic | flexible |
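
The mapping above could be encoded as a small lookup so the style direction and aspect ratio are chosen the same way on every run. The following is a minimal Python sketch, not part of the skill itself. The keys (`blog_hero`, `dev_portfolio`, and so on), the `defaults_for` helper, the choice of the first listed ratio where the table offers two, and the `1:1` fallback for "flexible" are all assumptions made for illustration.

```python
# Hypothetical lookup built from the use case table; keys are illustrative names.
# Where the table lists two ratios, the first one is taken as the default (assumption).
USE_CASE_DEFAULTS = {
    "blog_hero": ("editorial photography, cinematic lighting", "16:9"),
    "dev_portfolio": ("clean, modern, dark theme, tech aesthetic", "16:9"),
    "app_ui": ("flat design, product mockup, vibrant", "16:9"),
    "social_post": ("bold, high contrast, thumb-stopping", "1:1"),
    "app_icon": ("simple, recognizable, bold colors", "1:1"),
    "portrait": ("detailed, expressive, specific art style", "2:3"),
    "abstract": ("artistic, layered, symbolic", None),  # "flexible" in the table
}


def defaults_for(use_case, fallback_ratio="1:1"):
    """Return (style_direction, aspect_ratio) for a known use case key."""
    style, ratio = USE_CASE_DEFAULTS[use_case]
    # A flexible ratio falls back to a caller-supplied default (assumed 1:1 here).
    return style, ratio or fallback_ratio


style, ratio = defaults_for("blog_hero")
print(style, ratio)
```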

### Style Vocabulary

**Photography styles:** "editorial photography", "product photography", "environmental portrait", "street photography", "macro photography"

**Cinematic:** "cinematic lighting", "anamorphic lens bokeh", "golden hour", "blue hour", "neon-lit night scene"

**Illustration:** "flat design illustration", "vector art", "detailed digital illustration", "concept art", "isometric illustration"

**Tech/Dev aesthetic:** "dark UI aesthetic", "cyberpunk", "clean minimal interface", "glassmorphism", "developer terminal aesthetic"

**Quality boosters (always include 2-3):**
- "ultra-detailed", "sharp focus", "professional quality"
- "8K", "high resolution", "photorealistic"
- "masterpiece", "award-winning composition"

### Lighting Keywords
- Soft: "soft diffused lighting", "studio lighting with softbox"
- Dramatic: "chiaroscuro", "side lighting", "rim lighting"
- Natural: "golden hour sunlight", "overcast natural light", "window light"
- Artificial: "neon glow", "LED lighting", "monitor glow", "cyberpunk neon"

### Negative Space / Composition
- "rule of thirds composition"
- "centered composition with breathing room"
- "wide establishing shot"
- "close-up portrait with shallow depth of field"
- "overhead flat lay"
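
The Core Formula and the vocabulary lists above can be combined mechanically. The sketch below assembles a prompt in formula order and enforces the "2-3 quality boosters" rule. The `build_prompt` name, its keyword arguments, the default boosters, and the comma separator are assumptions for illustration; the skill does not prescribe a particular implementation.

```python
def build_prompt(subject, context="", style="", lighting="", composition="",
                 boosters=("sharp focus", "professional quality")):
    """Join non-empty components in Core Formula order, separated by commas."""
    if not 2 <= len(boosters) <= 3:
        # The skill asks for 2-3 quality boosters in every prompt.
        raise ValueError("include 2-3 quality boosters")
    parts = [subject, context, style, lighting, composition, *boosters]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    subject="a focused young developer in his late 20s, dark hoodie, typing on a laptop",
    context="moody home office",
    style="editorial photography",
    lighting="monitor glow",
    composition="rule of thirds composition",
)
print(prompt)
```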

---

## Parameters Reference

```json
{
  "model": "image-01",
  "prompt": "<your engineered prompt>",
  "aspect_ratio": "<see table above>",
  "n": 1,
  "prompt_optimizer": false
}
```

### Aspect Ratios
| Ratio | Best For |
|-------|----------|
| `16:9` | Website hero images, YouTube thumbnails, blog banners |
| `1:1` | Social posts, app icons, profile pictures |
| `9:16` | Instagram/TikTok stories, mobile wallpapers |
| `4:3` | App screenshots, presentation slides |
| `2:3` | Portrait photography, Pinterest pins |
| `3:2` | Landscape photography, standard photo format |

### `n` (number of images)
- Default: `1`  
- Use `2–3` when Tony asks for "options" or "variations"  
- Use `4` max for brainstorming/exploration rounds
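
As a rough illustration of the rules above, the image count could be derived from the wording of the request. This is a deliberately crude keyword sketch: the `choose_n` helper, the keywords it matches, and the choice of `3` within the "2–3" range are assumptions, and a real agent would judge intent from the whole request.

```python
def choose_n(request_text):
    """Pick an image count from the request wording (crude keyword match)."""
    text = request_text.lower()
    if "brainstorm" in text or "explor" in text:
        return 4  # maximum, reserved for brainstorming/exploration rounds
    if "option" in text or "variation" in text:
        return 3  # upper end of the 2-3 range (assumed choice)
    return 1  # default


print(choose_n("give me a few options for the blog banner"))
```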

### `prompt_optimizer`
- Default: `false` — you are the prompt optimizer; don't let the API change your work
- Set to `true` ONLY if Tony explicitly says "let MiniMax enhance the prompt"
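
The parameter rules in this section can be gathered into one validated request body. The sketch below is an assumption-laden illustration: `build_payload` is a hypothetical helper, the allowed ratios are only those listed in the table above (the API may accept others), and the cap of 4 images is this skill's own rule rather than a documented API limit.

```python
import json

# Ratios listed in this skill's Aspect Ratios table; the API may support more.
ALLOWED_RATIOS = {"16:9", "1:1", "9:16", "4:3", "2:3", "3:2"}
MAX_IMAGES = 4  # this skill's cap for brainstorming rounds


def build_payload(prompt, aspect_ratio="16:9", n=1, prompt_optimizer=False):
    """Return a request body dict following the Parameters Reference."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if aspect_ratio not in ALLOWED_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    if not 1 <= n <= MAX_IMAGES:
        raise ValueError(f"n must be between 1 and {MAX_IMAGES}")
    return {
        "model": "image-01",
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "n": n,
        # Off by default: the skill's engineered prompt should be sent unchanged.
        "prompt_optimizer": prompt_optimizer,
    }


print(json.dumps(build_payload("a red bicycle, product photography", "1:1"), indent=2))
```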

---

## Subject Reference (Character Consistency)

If Tony provides a reference image or needs a specific character to appear consistently across images, use `subject_reference`:

```json
{
  "subject_reference": [
    {
      "type": "character",
      "image_file": "<url or base64>"
    }
  ]
}
```

This is powerful for: consistent brand mascots, portraits of real people, or recurring characters across a project.
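
One way to attach the fragment above to an existing request body is sketched here. `with_subject_reference` is a hypothetical helper, and the example URL is a placeholder; the exact encoding the API expects for base64 input is not specified in this skill, so the value is passed through as an opaque string.

```python
def with_subject_reference(payload, image_file):
    """Return a copy of the payload with a character reference attached."""
    updated = dict(payload)  # shallow copy so the caller's dict is not mutated
    updated["subject_reference"] = [
        {"type": "character", "image_file": image_file}
    ]
    return updated


base = {"model": "image-01", "prompt": "brand mascot waving", "aspect_ratio": "1:1", "n": 1}
# Placeholder URL for illustration only.
request = with_subject_reference(base, "https://example.com/mascot.png")
print(request["subject_reference"])
```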

---

## Output & Delivery

- Save generated images to the workspace (e.g., `~/.openclaw/workspace/images/`)
- Report back with: the image, the prompt used (so Tony can iterate), and the parameters
- Always offer a quick iteration path: "Want me to try wider/tighter/different style/more options?"
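
The delivery steps above might look like the following sketch. It assumes the image has already been obtained as raw bytes (how the tool returns it is not specified here), and `save_and_report`, its arguments, and the report layout are hypothetical.

```python
import json
from pathlib import Path


def save_and_report(image_bytes, payload, filename,
                    workspace="~/.openclaw/workspace/images/"):
    """Write the image to the workspace and return a short text report."""
    out_dir = Path(workspace).expanduser()
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / filename
    path.write_bytes(image_bytes)
    # Report the prompt separately so it is easy to copy and iterate on.
    params = {k: v for k, v in payload.items() if k != "prompt"}
    return "\n".join([
        f"Saved: {path}",
        f"Prompt: {payload['prompt']}",
        f"Parameters: {json.dumps(params)}",
        "Want me to try wider/tighter/different style/more options?",
    ])
```

Calling `save_and_report(data, payload, "hero.png")` would write to the default workspace directory named in this section; pass `workspace=` to target another location.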

---

## Common Patterns for Tony's Projects

### tonyreviewsthings.com
- Tech review hero images: product on a clean surface, dramatic side lighting, dark background
- Lifestyle/opinion pieces: editorial photography feel, authentic environments
- Suggested prompt suffix: `"editorial photography style, professional quality, sharp focus, suitable for a tech blog hero image"`

### tonysimons.dev
- Dev/portfolio visuals: dark aesthetic, code/terminal motifs, modern tech feel  
- Suggested prompt suffix: `"dark developer aesthetic, clean and modern, high contrast, suitable for a software portfolio"`

### App/Software Media
- UI mockups and feature graphics: clean flat or semi-realistic, vibrant but professional
- Store screenshots: device frames, clean backgrounds, brand colors
- Suggested prompt suffix: `"clean product visual, app store quality, professional software marketing image"`
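
The suggested suffixes could be stored per project and appended automatically. In this sketch the suffix strings are copied from the lists above, while the dictionary keys and the `with_project_suffix` helper are assumptions for illustration.

```python
# Suffix text is taken from this section; the keys are illustrative names.
PROJECT_SUFFIXES = {
    "tonyreviewsthings.com": "editorial photography style, professional quality, "
                             "sharp focus, suitable for a tech blog hero image",
    "tonysimons.dev": "dark developer aesthetic, clean and modern, high contrast, "
                      "suitable for a software portfolio",
    "app": "clean product visual, app store quality, "
           "professional software marketing image",
}


def with_project_suffix(prompt, project):
    """Append the project's suggested suffix; unknown projects are left as-is."""
    suffix = PROJECT_SUFFIXES.get(project)
    return f"{prompt}, {suffix}" if suffix else prompt


print(with_project_suffix("wireless earbuds on a slate surface", "tonyreviewsthings.com"))
```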

---

## Iteration Strategy

After the first generation, if Tony wants changes, don't start from scratch. Refine the existing prompt instead:
- **Style tweak:** Adjust style keywords only
- **Composition tweak:** Adjust composition/framing keywords
- **Subject tweak:** Clarify or expand the subject description
- **Complete redo:** New prompt from scratch with lessons learned

When Tony refers to an existing image, use `subject_reference` to maintain consistency.
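
Keeping the prompt components separate makes targeted refinement straightforward. The sketch below shows a style-only tweak that leaves subject and composition untouched; the `PromptParts` class and its three fields are assumptions chosen to mirror the tweak types listed above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PromptParts:
    """Hypothetical container for the parts of a prompt that get tweaked."""
    subject: str
    style: str
    composition: str

    def render(self):
        return ", ".join([self.subject, self.style, self.composition])


first = PromptParts(
    subject="a focused young developer typing on a laptop",
    style="editorial photography",
    composition="rule of thirds composition",
)
# Style tweak: change only the style keywords, keep everything else.
second = replace(first, style="flat design illustration")
print(second.render())
```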
