image-prompt-generator

Generate AI images using Gemini image generation API. Use this skill when content needs images - thumbnails, social posts, blog headers, or creative visuals. Follows an iterative workflow - brainstorm concepts, select direction, generate in multiple styles, then produce via API.

8 stars

Best use case

image-prompt-generator is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Generate AI images using Gemini image generation API. Use this skill when content needs images - thumbnails, social posts, blog headers, or creative visuals. Follows an iterative workflow - brainstorm concepts, select direction, generate in multiple styles, then produce via API.

Teams using image-prompt-generator should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/image-prompt-generator/SKILL.md --create-dirs "https://raw.githubusercontent.com/cdeistopened/skill-stack/main/.claude/skills/image-prompt-generator/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/image-prompt-generator/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How image-prompt-generator Compares

Feature / Agentimage-prompt-generatorStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Generate AI images using Gemini image generation API. Use this skill when content needs images - thumbnails, social posts, blog headers, or creative visuals. Follows an iterative workflow - brainstorm concepts, select direction, generate in multiple styles, then produce via API.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Image Prompt Generator

Generate professional, non-generic images using Google's Gemini API for image generation.

## Prerequisites & Setup

### Getting Your Gemini API Key

1. Go to [Google AI Studio](https://aistudio.google.com/app/apikey)
2. Sign in with your Google account
3. Click "Create API Key"
4. Copy the generated key

### Configuring the API Key

**Option 1: Environment file (recommended)**

Create a `.env` file in your project root:

```bash
GEMINI_API_KEY=your_api_key_here
```

**Option 2: Direct environment variable**

```bash
export GEMINI_API_KEY=your_api_key_here
```

### Install Dependencies

```bash
pip install google-generativeai python-dotenv pillow
```

### Available Models

| Model | API Name | Best For |
|-------|----------|----------|
| **Flash** | `gemini-2.5-flash-image` | Speed, drafts, iteration |
| **Pro** | `gemini-3-pro-image-preview` | **Final assets, 16:9 aspect ratio, quality** |

### Image Size (CRITICAL for quality)

| Size | Output | Use For |
|------|--------|---------|
| 1K | ~400KB | Drafts, iteration |
| **2K** | ~2MB | **Thumbnails, web assets (default)** |
| 4K | ~7MB | Print, high-resolution needs |

**CRITICAL**: Always specify `--size 2K` or `--size 4K` for final assets. Without this, output defaults to 1K which looks low-quality.

### Skill Stack Thumbnails - MANDATORY SETTINGS

For ANY Skill Stack thumbnail, you MUST use:

```bash
python scripts/images/generate_image.py "YOUR PROMPT" \
  --model pro \
  --size 2K \
  --aspect 16:9 \
  --output public/images/thumbnails \
  --name your-slug
```

And your prompt MUST include risograph style instructions:

```
STYLE: Risograph / screen print aesthetic. Visible halftone dots throughout. 
Slight misregistration between color layers. Indie printmaking quality - warm, tactile, handmade.

COLOR: Warm cream base. Charcoal and gray tones. Terracotta (#D4654A) accent on ONE key element.

TEXTURE: Visible halftone pattern. Paper grain. NOT smooth digital gradients.

AVOID: Empty white borders, photorealism, glossy renders, smooth gradients, clip art aesthetic.
```

**WHY**: Without `--size 2K`, output is ~400KB garbage that looks like it was made with Flash. Without risograph style, output looks generic and AI-generated.

---

## Workflow Overview

1. **Brainstorm Concepts** - Generate 4-6 high-level visual ideas
2. **Select Direction** - User picks the concept they like
3. **Optimize Prompt** - Refine into a strong, detailed prompt
4. **Style Variations** - Adapt to 2-3 different visual styles
5. **Generate Images** - Run via Gemini API

## Step 1: Brainstorm Concepts

When the user provides a topic or use case, generate **3 high-level visual concepts** and indicate your pick. Each concept should be:

- **One sentence** describing the visual idea
- **Concrete and immediate** - you can picture it instantly
- **Conceptual but not abstract** - a clear object/scene with meaning
- **Non-generic** - avoid cliches (no lightbulbs for ideas, no handshakes for partnership)

**Format:**

```
**Concept A:** One sentence description of the visual concept and why it works.

**Concept B:** One sentence description...

**Concept C:** One sentence description...

**My pick: Concept X** — Brief reasoning why this one is strongest.
```

**Example for "newsletter about personal productivity":**

```
**Concept A:** A vintage compass where the needle points toward a coffee ring stain on a map—direction emerges from daily rituals.

**Concept B:** A clock where the 12 hours show seasonal changes—time management over long arcs, not just hours.

**Concept C:** One small key attached to dozens of decorative keychains—we overcomplicate simple solutions.

**My pick: Concept A** — The coffee stain is unexpected and personal, the compass is concrete without being cliché.
```

Wait for user to confirm or redirect before proceeding.

## Step 2: Optimize the Prompt

Once the user selects a concept, develop it into a full prompt. Structure:

```
Create a [style type] illustration of [subject].

CONCEPT: [Expand the one-sentence idea into a clear visual description]

STYLE: [Artistic approach - load from references/styles/ if brand-specific]

COMPOSITION: [Framing, focal point, negative space, balance]

COLORS: [Palette - describe by name, not hex codes which may render as text]

TEXTURE: [Surface qualities, analog/digital feel]

AVOID: [What should NOT appear - be specific]

FORMAT: [Aspect ratio]
```

**Key principles:**
- Natural language, full sentences - no tag soup
- Describe colors by name (burnt orange, sky blue, near-black) not hex codes
- Maximum 2-3 elements - if it feels busy, remove something
- Favor metaphor over literal depiction

## Step 3: Style Variations

**MANDATORY for Skill Stack: Risograph style.** Every thumbnail must include risograph style instructions in the prompt. See `references/styles/risograph.md`.

Key risograph elements to ALWAYS include in prompt:
- "Visible halftone dots throughout"
- "Slight misregistration between color layers"
- "Indie printmaking quality - warm, tactile, handmade"
- "NOT smooth digital gradients"

Available styles in `references/styles/` (for non-Skill-Stack projects):

- **risograph.md** - DEFAULT. Halftone dots, misregistration, indie printmaking aesthetic.
- **minimalist-ink.md** - High-contrast black and white, crosshatching.
- **watercolor-line.md** - Ink linework with watercolor washes.
- **editorial-conceptual.md** - Conceptual, sophisticated, editorial wit.

## Step 4: Generate via API

### Running the Script

```bash
# Generate 2K thumbnail (recommended for web)
python scripts/images/generate_image.py "prompt here" --model pro --aspect 16:9 --size 2K

# Generate 4K for print/high-res
python scripts/images/generate_image.py "prompt here" --model pro --size 4K

# Save to specific folder
python scripts/images/generate_image.py "prompt" --output "./images" --name "my_image" --size 2K
```

**Options:**
- `--model pro` (default, higher quality) or `--model flash` (faster)
- `--size 2K` (default), `1K`, or `4K` - **always use 2K or 4K for final assets**
- `--aspect 16:9` (default), `1:1`, `9:16`, `3:4`, `4:3`
- `--variations N` - generate N versions
- `--output ./path` - save location
- `--name prefix` - filename prefix

**Output location:** Save images alongside the content they belong to - not a generic images dump.

## Step 5: Iterate

After user reviews generated images:
- **80% good?** Request specific edits conversationally rather than regenerating
- **Composition off?** Adjust framing or element placement in prompt
- **Wrong style?** Try a different style reference
- **Too busy?** Simplify to fewer elements
- **Colors wrong?** Be more explicit about palette

## Prompting Principles

### Write Like a Creative Director

Brief the model like a human artist. Use proper grammar, full sentences, and descriptive adjectives.

| Don't | Do |
|-------|-----|
| "Cool car, neon, city, night, 8k" | "A cinematic wide shot of a futuristic sports car speeding through a rainy Tokyo street at night. The neon signs reflect off the wet pavement and the car's metallic chassis." |

**Be specific about:**
- **Subject:** Instead of "a woman," say "a sophisticated elderly woman wearing a vintage chanel-style suit"
- **Materiality:** Describe textures - "matte finish," "brushed steel," "soft velvet," "crumpled paper"
- **Setting:** Define location, time of day, weather
- **Lighting:** Specify mood and light source
- **Mood:** Emotional tone of the image

### Provide Context

Context helps the model make logical artistic decisions. Include the "why" or "for whom."

**Example:** "Create an image of a sandwich for a Brazilian high-end gourmet cookbook."
*(Model infers: professional plating, shallow depth of field, perfect lighting)*

### Keep It Simple

- One clear focal point
- Maximum 2-3 elements total
- Generous negative space
- If it feels busy, remove something

### Avoid the Generic

**Hard rules:**
- **NEVER include text in images** - Text renders poorly and looks amateur
- No lightbulbs for "ideas"
- No handshakes for "partnership"
- No audio waveforms for "podcasts" or "audio"
- No gears/cogs for "systems" or "process"
- No happy stock photo poses
- No glossy AI aesthetic
- No puzzle pieces for "connection" or "integration"

### Metaphors That Work

Strong concepts from past generations:
- **Orange cut open revealing neural network** - organic exterior, systematic interior (knowledge systems, RAG)
- **Mail slot with origami bird emerging** - delivery transforms into something alive (newsletters)
- **Exercise club crossed with quill pen** - physical culture meets intellectual work (body + mind)
- **Typewriter key extreme close-up with radiating impact** - decisive action, command (books, authority)
- **Medicine dropper with mandala bloom** - clinical precision meets transcendence (therapy, transformation)
- **Straight razor on leather strop** - precision tool, polishing, refinement (editing, production)
- **Prism dispersing light into distinct beams** - one input, multiple outputs (frameworks)
- **Compass with coffee stain on map** - direction from daily rituals (productivity)

## Resources

### references/styles/
Aesthetic style definitions:
- `risograph.md` - **DEFAULT** - Halftone, misregistration, indie printmaking
- `minimalist-ink.md` - Black and white ink illustration
- `watercolor-line.md` - Ink with watercolor washes
- `editorial-conceptual.md` - Conceptual editorial style

### scripts/
- `generate_image.py` - Gemini API image generation

## Prompt Modifiers Reference

| Category | Examples |
|----------|----------|
| **Lighting** | golden hour, dramatic shadows, soft diffused light, neon glow, overcast |
| **Style** | cinematic, editorial, technical diagram, hand-drawn, photorealistic |
| **Texture** | matte finish, brushed steel, soft velvet, crumpled paper, weathered wood |
| **Composition** | wide shot, close-up, bird's eye view, dutch angle, symmetrical |
| **Mood** | energetic, serene, dramatic, playful, sophisticated |
| **Quality** | 4K, high-fidelity, pixel-perfect, professional grade |

## Advanced Capabilities

### Text Rendering & Infographics

Put exact text in quotes. Specify style: "polished editorial," "technical diagram," or "hand-drawn whiteboard."

**Example prompts:**

```
Earnings Report Infographic:
"Generate a clean, modern infographic summarizing the key financial highlights from this earnings report. Include charts for 'Revenue Growth' and 'Net Income', and highlight the CEO's key quote in a stylized pull-quote box."
```

```
Whiteboard Summary:
"Summarize the concept of 'Transformer Neural Network Architecture' as a hand-drawn whiteboard diagram suitable for a university lecture. Use different colored markers for the Encoder and Decoder blocks, and include legible labels for 'Self-Attention' and 'Feed Forward'."
```

### Character Consistency & Thumbnails

Use reference images and state "Keep the person's facial features exactly the same as Image 1." Describe expression/action changes while maintaining identity.

**Example prompt:**

```
Viral Thumbnail:
"Design a viral video thumbnail using the person from Image 1.
Face Consistency: Keep the person's facial features exactly the same as Image 1, but change their expression to look excited and surprised.
Action: Pose the person on the left side, pointing their finger towards the right side of the frame.
Subject: On the right side, place a high-quality image of a delicious avocado toast.
Graphics: Add a bold yellow arrow connecting the person's finger to the toast.
Text: Overlay massive, pop-style text in the middle: 'Done in 3 mins!'. Use a thick white outline and drop shadow.
Background: A blurred, bright kitchen background. High saturation and contrast."
```

### Image Reworking (Edit Existing Images)

The `--input` flag enables "rework mode" - pass an existing image to Gemini and describe the changes you want.

**Key use cases:**
- **Small tweaks** - Adjust colors, add/remove elements, change lighting
- **Style transfer** - Keep composition but change artistic style
- **Object manipulation** - Remove, add, or modify specific objects
- **Seasonal/temporal changes** - Same scene, different time/season

**Running in rework mode:**

```bash
# Basic edit - add something
python scripts/generate_image.py "Add snow to the roof and yard" \
  --input ./house.png \
  --model pro

# Color adjustment
python scripts/generate_image.py "Change the accent color from red to teal, keep everything else identical" \
  --input ./thumbnail.png \
  --model pro

# Style transfer - keep composition, change aesthetic
python scripts/generate_image.py "Convert this to risograph style with halftone dots and slight color misregistration" \
  --input ./photo.png \
  --model pro

# Generate variations of an edit
python scripts/generate_image.py "Make the lighting warmer, like golden hour" \
  --input ./portrait.png \
  --variations 3 \
  --model pro
```

**Prompting tips for rework mode:**

1. **Be specific about what to preserve:**
   - "Keep the person's facial features exactly the same"
   - "Maintain the composition and framing"
   - "Don't change the background"

2. **Be explicit about what to change:**
   - "Change ONLY the color of the shirt from blue to red"
   - "Add snow to the roof and nothing else"
   - "Remove the text overlay"

3. **Use comparative language:**
   - "Make the colors more vibrant"
   - "Increase the contrast slightly"
   - "Make the lighting softer and more diffused"

**Output naming:** Files from rework mode are named `{prefix}_{timestamp}_edit_{model}.png` to distinguish from generated images (`_gen_`).

### Advanced Editing Examples

**Object Removal:**
```bash
python scripts/generate_image.py \
  "Remove the tourists from the background and fill with matching cobblestones and storefronts" \
  --input ./street-photo.png \
  --model pro
```

**Seasonal Control:**
```bash
python scripts/generate_image.py \
  "Turn this into winter. Add snow to the roof and yard. Change lighting to cold, overcast afternoon. Keep architecture identical." \
  --input ./house-summer.png \
  --model pro
```

**Character Consistency (thumbnail series):**
```bash
python scripts/generate_image.py \
  "Keep the person's face exactly the same. Change expression to surprised. Add a pointing gesture toward the right side of the frame." \
  --input ./person-reference.png \
  --model pro
```

---

## Related Skills

- **youtube-title-creator** - Pair generated images with optimized titles
- **social-content-creation** - Use images in platform-optimized posts

---

*For custom brand styles, create new style files in references/styles/ following the existing format*

Related Skills

video-generator

8
from cdeistopened/skill-stack

Generate AI videos using Google VEO 3.1 or OpenAI Sora. Two providers for different strengths - VEO for native audio, Sora for visual quality and longer clips.

clifton-sellers-content-prompts

8
from cdeistopened/skill-stack

Seven prompts for founder-led content creation from Clifton Sellers ($4M agency). Use when pressure-testing ideas, mining experience for content, enforcing clarity, building content systems, protecting voice, mapping content to sales, or prioritizing weekly focus. Core principle - AI amplifies conviction and experience, not generates on your behalf.

x-viral-template-miner

8
from cdeistopened/skill-stack

When the user wants to find proven-to-travel post templates in their niche and adapt them to their own product. Also use when the user mentions "what's going viral in my space", "what are competitors posting", "copy a viral post", "trending on X", "post ideas", "template mining", or "what to post this week". This is trend hunting, not plagiarism — the output is a template the user fills with their own assets.

x-linkedin-content-relay

8
from cdeistopened/skill-stack

When the user has X (Twitter) content that performed well and wants to relay it to LinkedIn 1-2 weeks later with reframing. Also use when the user mentions "repost to LinkedIn", "LinkedIn version of my tweet", "X to LinkedIn", "delayed repost", "LinkedIn for non-tech audience", or "LinkedIn relay". Also use when the user's ICP is non-tech and X is secondary — LinkedIn is the primary channel and this skill produces the content.

x-launch-video-structure

8
from cdeistopened/skill-stack

When the user is planning, scripting, or editing a product launch video for X (Twitter) and needs the structure. Also use when the user mentions "launch video", "demo video", "product launch on X", "60 second demo", "how to structure a launch", or "my launch video isn't working". Produces a beat-by-beat timing sheet, not copy.

x-account-warmup

8
from cdeistopened/skill-stack

When a user wants to grow an X (Twitter) account from zero before a product launch, or asks how to get first followers, warm up the algorithm, hit ~500-1,000 followers, or prepare an account to make a launch video land. Also use when the user mentions "new X account", "warm up my Twitter", "first 1000 followers", "building in public strategy", "X growth", or "engagement before launch".

skill-stack-thumbnails

8
from cdeistopened/skill-stack

Generate blog post thumbnails for Skill Stack using the brand aesthetic. Follows an iterative workflow - brainstorm concepts, get approval, generate with Gemini API.

youtube-ingest

8
from cdeistopened/skill-stack

Transcribe YouTube videos and playlists using Gemini Flash

web-scrape

8
from cdeistopened/skill-stack

Scrape web pages to clean markdown with optional AI summaries

voice-tyler-cowen

8
from cdeistopened/skill-stack

Write in Tyler Cowen's style - matter-of-fact, understated, treats enormous ideas as obvious observations. Read the passages. Absorb the flatness. Channel the HOW, not the content.

voice-trung-phan

8
from cdeistopened/skill-stack

Generate tweets and threads in the style of Trung Phan. Not just voice — captures his humor mechanics, format taxonomy, topic selection filter, and structural patterns. Use for trend-reactive tweets, meme commentary, and business/culture threads.

voice-levine-berry

8
from cdeistopened/skill-stack

Write in a combined Matt Levine + Wendell Berry voice. Levine's dry logic-walking and parenthetical humor for the analytical sections. Berry's meditative patience for the human ones. Read the passages. Absorb the rhythm. Channel the HOW, not the content.