image-prompt-generator

Generate AI images using Gemini image generation API. Use this skill when content needs images - thumbnails, social posts, blog headers, or creative visuals. Follows an iterative workflow - brainstorm concepts, select direction, generate in multiple styles, then produce via API.

8 stars

bycdeistopened

View on GitHub Installation ↓

Best use case

image-prompt-generator is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using image-prompt-generator should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/image-prompt-generator/SKILL.md --create-dirs "https://raw.githubusercontent.com/cdeistopened/skill-stack/main/.claude/skills/image-prompt-generator/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/image-prompt-generator/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How image-prompt-generator Compares

Feature / Agent	image-prompt-generator	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

ChatGPT vs Claude for Agent Skills

Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.

SKILL.md Source

# Image Prompt Generator

Generate professional, non-generic images using Google's Gemini API for image generation.

## Prerequisites & Setup

### Getting Your Gemini API Key

1. Go to [Google AI Studio](https://aistudio.google.com/app/apikey)
2. Sign in with your Google account
3. Click "Create API Key"
4. Copy the generated key

### Configuring the API Key

**Option 1: Environment file (recommended)**

Create a `.env` file in your project root:

```bash
GEMINI_API_KEY=your_api_key_here
```

**Option 2: Direct environment variable**

```bash
export GEMINI_API_KEY=your_api_key_here
```

### Install Dependencies

```bash
pip install google-generativeai python-dotenv pillow
```

### Available Models

| Model | API Name | Best For |
|-------|----------|----------|
| **Flash** | `gemini-2.5-flash-image` | Speed, drafts, iteration |
| **Pro** | `gemini-3-pro-image-preview` | **Final assets, 16:9 aspect ratio, quality** |

### Image Size (CRITICAL for quality)

| Size | Output | Use For |
|------|--------|---------|
| 1K | ~400KB | Drafts, iteration |
| **2K** | ~2MB | **Thumbnails, web assets (default)** |
| 4K | ~7MB | Print, high-resolution needs |

**CRITICAL**: Always specify `--size 2K` or `--size 4K` for final assets. Without this, output defaults to 1K which looks low-quality.

### Skill Stack Thumbnails - MANDATORY SETTINGS

For ANY Skill Stack thumbnail, you MUST use:

```bash
python scripts/images/generate_image.py "YOUR PROMPT" \
  --model pro \
  --size 2K \
  --aspect 16:9 \
  --output public/images/thumbnails \
  --name your-slug
```

And your prompt MUST include risograph style instructions:

```
STYLE: Risograph / screen print aesthetic. Visible halftone dots throughout. 
Slight misregistration between color layers. Indie printmaking quality - warm, tactile, handmade.

COLOR: Warm cream base. Charcoal and gray tones. Terracotta (#D4654A) accent on ONE key element.

TEXTURE: Visible halftone pattern. Paper grain. NOT smooth digital gradients.

AVOID: Empty white borders, photorealism, glossy renders, smooth gradients, clip art aesthetic.
```

**WHY**: Without `--size 2K`, output is ~400KB garbage that looks like it was made with Flash. Without risograph style, output looks generic and AI-generated.

---

## Workflow Overview

1. **Brainstorm Concepts** - Generate 4-6 high-level visual ideas
2. **Select Direction** - User picks the concept they like
3. **Optimize Prompt** - Refine into a strong, detailed prompt
4. **Style Variations** - Adapt to 2-3 different visual styles
5. **Generate Images** - Run via Gemini API

## Step 1: Brainstorm Concepts

When the user provides a topic or use case, generate **3 high-level visual concepts** and indicate your pick. Each concept should be:

- **One sentence** describing the visual idea
- **Concrete and immediate** - you can picture it instantly
- **Conceptual but not abstract** - a clear object/scene with meaning
- **Non-generic** - avoid cliches (no lightbulbs for ideas, no handshakes for partnership)

**Format:**

```
**Concept A:** One sentence description of the visual concept and why it works.

**Concept B:** One sentence description...

**Concept C:** One sentence description...

**My pick: Concept X** — Brief reasoning why this one is strongest.
```

**Example for "newsletter about personal productivity":**

```
**Concept A:** A vintage compass where the needle points toward a coffee ring stain on a map—direction emerges from daily rituals.

**Concept B:** A clock where the 12 hours show seasonal changes—time management over long arcs, not just hours.

**Concept C:** One small key attached to dozens of decorative keychains—we overcomplicate simple solutions.

**My pick: Concept A** — The coffee stain is unexpected and personal, the compass is concrete without being cliché.
```

Wait for user to confirm or redirect before proceeding.

## Step 2: Optimize the Prompt

Once the user selects a concept, develop it into a full prompt. Structure:

```
Create a [style type] illustration of [subject].

CONCEPT: [Expand the one-sentence idea into a clear visual description]

STYLE: [Artistic approach - load from references/styles/ if brand-specific]

COMPOSITION: [Framing, focal point, negative space, balance]

COLORS: [Palette - describe by name, not hex codes which may render as text]

TEXTURE: [Surface qualities, analog/digital feel]

AVOID: [What should NOT appear - be specific]

FORMAT: [Aspect ratio]
```

**Key principles:**
- Natural language, full sentences - no tag soup
- Describe colors by name (burnt orange, sky blue, near-black) not hex codes
- Maximum 2-3 elements - if it feels busy, remove something
- Favor metaphor over literal depiction

## Step 3: Style Variations

**MANDATORY for Skill Stack: Risograph style.** Every thumbnail must include risograph style instructions in the prompt. See `references/styles/risograph.md`.

Key risograph elements to ALWAYS include in prompt:
- "Visible halftone dots throughout"
- "Slight misregistration between color layers"
- "Indie printmaking quality - warm, tactile, handmade"
- "NOT smooth digital gradients"

Available styles in `references/styles/` (for non-Skill-Stack projects):

- **risograph.md** - DEFAULT. Halftone dots, misregistration, indie printmaking aesthetic.
- **minimalist-ink.md** - High-contrast black and white, crosshatching.
- **watercolor-line.md** - Ink linework with watercolor washes.
- **editorial-conceptual.md** - Conceptual, sophisticated, editorial wit.

## Step 4: Generate via API

### Running the Script

```bash
# Generate 2K thumbnail (recommended for web)
python scripts/images/generate_image.py "prompt here" --model pro --aspect 16:9 --size 2K

# Generate 4K for print/high-res
python scripts/images/generate_image.py "prompt here" --model pro --size 4K

# Save to specific folder
python scripts/images/generate_image.py "prompt" --output "./images" --name "my_image" --size 2K
```

**Options:**
- `--model pro` (default, higher quality) or `--model flash` (faster)
- `--size 2K` (default), `1K`, or `4K` - **always use 2K or 4K for final assets**
- `--aspect 16:9` (default), `1:1`, `9:16`, `3:4`, `4:3`
- `--variations N` - generate N versions
- `--output ./path` - save location
- `--name prefix` - filename prefix

**Output location:** Save images alongside the content they belong to - not a generic images dump.

## Step 5: Iterate

After user reviews generated images:
- **80% good?** Request specific edits conversationally rather than regenerating
- **Composition off?** Adjust framing or element placement in prompt
- **Wrong style?** Try a different style reference
- **Too busy?** Simplify to fewer elements
- **Colors wrong?** Be more explicit about palette

## Prompting Principles

### Write Like a Creative Director

Brief the model like a human artist. Use proper grammar, full sentences, and descriptive adjectives.

| Don't | Do |
|-------|-----|
| "Cool car, neon, city, night, 8k" | "A cinematic wide shot of a futuristic sports car speeding through a rainy Tokyo street at night. The neon signs reflect off the wet pavement and the car's metallic chassis." |

**Be specific about:**
- **Subject:** Instead of "a woman," say "a sophisticated elderly woman wearing a vintage chanel-style suit"
- **Materiality:** Describe textures - "matte finish," "brushed steel," "soft velvet," "crumpled paper"
- **Setting:** Define location, time of day, weather
- **Lighting:** Specify mood and light source
- **Mood:** Emotional tone of the image

### Provide Context

Context helps the model make logical artistic decisions. Include the "why" or "for whom."

**Example:** "Create an image of a sandwich for a Brazilian high-end gourmet cookbook."
*(Model infers: professional plating, shallow depth of field, perfect lighting)*

### Keep It Simple

- One clear focal point
- Maximum 2-3 elements total
- Generous negative space
- If it feels busy, remove something

### Avoid the Generic

**Hard rules:**
- **NEVER include text in images** - Text renders poorly and looks amateur
- No lightbulbs for "ideas"
- No handshakes for "partnership"
- No audio waveforms for "podcasts" or "audio"
- No gears/cogs for "systems" or "process"
- No happy stock photo poses
- No glossy AI aesthetic
- No puzzle pieces for "connection" or "integration"

### Metaphors That Work

Strong concepts from past generations:
- **Orange cut open revealing neural network** - organic exterior, systematic interior (knowledge systems, RAG)
- **Mail slot with origami bird emerging** - delivery transforms into something alive (newsletters)
- **Exercise club crossed with quill pen** - physical culture meets intellectual work (body + mind)
- **Typewriter key extreme close-up with radiating impact** - decisive action, command (books, authority)
- **Medicine dropper with mandala bloom** - clinical precision meets transcendence (therapy, transformation)
- **Straight razor on leather strop** - precision tool, polishing, refinement (editing, production)
- **Prism dispersing light into distinct beams** - one input, multiple outputs (frameworks)
- **Compass with coffee stain on map** - direction from daily rituals (productivity)

## Resources

### references/styles/
Aesthetic style definitions:
- `risograph.md` - **DEFAULT** - Halftone, misregistration, indie printmaking
- `minimalist-ink.md` - Black and white ink illustration
- `watercolor-line.md` - Ink with watercolor washes
- `editorial-conceptual.md` - Conceptual editorial style

### scripts/
- `generate_image.py` - Gemini API image generation

## Prompt Modifiers Reference

| Category | Examples |
|----------|----------|
| **Lighting** | golden hour, dramatic shadows, soft diffused light, neon glow, overcast |
| **Style** | cinematic, editorial, technical diagram, hand-drawn, photorealistic |
| **Texture** | matte finish, brushed steel, soft velvet, crumpled paper, weathered wood |
| **Composition** | wide shot, close-up, bird's eye view, dutch angle, symmetrical |
| **Mood** | energetic, serene, dramatic, playful, sophisticated |
| **Quality** | 4K, high-fidelity, pixel-perfect, professional grade |

## Advanced Capabilities

### Text Rendering & Infographics

Put exact text in quotes. Specify style: "polished editorial," "technical diagram," or "hand-drawn whiteboard."

**Example prompts:**

```
Earnings Report Infographic:
"Generate a clean, modern infographic summarizing the key financial highlights from this earnings report. Include charts for 'Revenue Growth' and 'Net Income', and highlight the CEO's key quote in a stylized pull-quote box."
```

```
Whiteboard Summary:
"Summarize the concept of 'Transformer Neural Network Architecture' as a hand-drawn whiteboard diagram suitable for a university lecture. Use different colored markers for the Encoder and Decoder blocks, and include legible labels for 'Self-Attention' and 'Feed Forward'."
```

### Character Consistency & Thumbnails

Use reference images and state "Keep the person's facial features exactly the same as Image 1." Describe expression/action changes while maintaining identity.

**Example prompt:**

```
Viral Thumbnail:
"Design a viral video thumbnail using the person from Image 1.
Face Consistency: Keep the person's facial features exactly the same as Image 1, but change their expression to look excited and surprised.
Action: Pose the person on the left side, pointing their finger towards the right side of the frame.
Subject: On the right side, place a high-quality image of a delicious avocado toast.
Graphics: Add a bold yellow arrow connecting the person's finger to the toast.
Text: Overlay massive, pop-style text in the middle: 'Done in 3 mins!'. Use a thick white outline and drop shadow.
Background: A blurred, bright kitchen background. High saturation and contrast."
```

### Image Reworking (Edit Existing Images)

The `--input` flag enables "rework mode" - pass an existing image to Gemini and describe the changes you want.

**Key use cases:**
- **Small tweaks** - Adjust colors, add/remove elements, change lighting
- **Style transfer** - Keep composition but change artistic style
- **Object manipulation** - Remove, add, or modify specific objects
- **Seasonal/temporal changes** - Same scene, different time/season

**Running in rework mode:**

```bash
# Basic edit - add something
python scripts/generate_image.py "Add snow to the roof and yard" \
  --input ./house.png \
  --model pro

# Color adjustment
python scripts/generate_image.py "Change the accent color from red to teal, keep everything else identical" \
  --input ./thumbnail.png \
  --model pro

# Style transfer - keep composition, change aesthetic
python scripts/generate_image.py "Convert this to risograph style with halftone dots and slight color misregistration" \
  --input ./photo.png \
  --model pro

# Generate variations of an edit
python scripts/generate_image.py "Make the lighting warmer, like golden hour" \
  --input ./portrait.png \
  --variations 3 \
  --model pro
```

**Prompting tips for rework mode:**

1. **Be specific about what to preserve:**
   - "Keep the person's facial features exactly the same"
   - "Maintain the composition and framing"
   - "Don't change the background"

2. **Be explicit about what to change:**
   - "Change ONLY the color of the shirt from blue to red"
   - "Add snow to the roof and nothing else"
   - "Remove the text overlay"

3. **Use comparative language:**
   - "Make the colors more vibrant"
   - "Increase the contrast slightly"
   - "Make the lighting softer and more diffused"

**Output naming:** Files from rework mode are named `{prefix}_{timestamp}_edit_{model}.png` to distinguish from generated images (`_gen_`).

### Advanced Editing Examples

**Object Removal:**
```bash
python scripts/generate_image.py \
  "Remove the tourists from the background and fill with matching cobblestones and storefronts" \
  --input ./street-photo.png \
  --model pro
```

**Seasonal Control:**
```bash
python scripts/generate_image.py \
  "Turn this into winter. Add snow to the roof and yard. Change lighting to cold, overcast afternoon. Keep architecture identical." \
  --input ./house-summer.png \
  --model pro
```

**Character Consistency (thumbnail series):**
```bash
python scripts/generate_image.py \
  "Keep the person's face exactly the same. Change expression to surprised. Add a pointing gesture toward the right side of the frame." \
  --input ./person-reference.png \
  --model pro
```

---

## Related Skills

- **youtube-title-creator** - Pair generated images with optimized titles
- **social-content-creation** - Use images in platform-optimized posts

---

*For custom brand styles, create new style files in references/styles/ following the existing format*

Related Skills

video-generator

from cdeistopened/skill-stack

Generate AI videos using Google VEO 3.1 or OpenAI Sora. Two providers for different strengths - VEO for native audio, Sora for visual quality and longer clips.

clifton-sellers-content-prompts

from cdeistopened/skill-stack

Seven prompts for founder-led content creation from Clifton Sellers ($4M agency). Use when pressure-testing ideas, mining experience for content, enforcing clarity, building content systems, protecting voice, mapping content to sales, or prioritizing weekly focus. Core principle - AI amplifies conviction and experience, not generates on your behalf.

x-viral-template-miner

from cdeistopened/skill-stack

When the user wants to find proven-to-travel post templates in their niche and adapt them to their own product. Also use when the user mentions "what's going viral in my space", "what are competitors posting", "copy a viral post", "trending on X", "post ideas", "template mining", or "what to post this week". This is trend hunting, not plagiarism — the output is a template the user fills with their own assets.

x-linkedin-content-relay

from cdeistopened/skill-stack

When the user has X (Twitter) content that performed well and wants to relay it to LinkedIn 1-2 weeks later with reframing. Also use when the user mentions "repost to LinkedIn", "LinkedIn version of my tweet", "X to LinkedIn", "delayed repost", "LinkedIn for non-tech audience", or "LinkedIn relay". Also use when the user's ICP is non-tech and X is secondary — LinkedIn is the primary channel and this skill produces the content.

x-launch-video-structure

from cdeistopened/skill-stack

When the user is planning, scripting, or editing a product launch video for X (Twitter) and needs the structure. Also use when the user mentions "launch video", "demo video", "product launch on X", "60 second demo", "how to structure a launch", or "my launch video isn't working". Produces a beat-by-beat timing sheet, not copy.

x-account-warmup

from cdeistopened/skill-stack

When a user wants to grow an X (Twitter) account from zero before a product launch, or asks how to get first followers, warm up the algorithm, hit ~500-1,000 followers, or prepare an account to make a launch video land. Also use when the user mentions "new X account", "warm up my Twitter", "first 1000 followers", "building in public strategy", "X growth", or "engagement before launch".

skill-stack-thumbnails

from cdeistopened/skill-stack

Generate blog post thumbnails for Skill Stack using the brand aesthetic. Follows an iterative workflow - brainstorm concepts, get approval, generate with Gemini API.

youtube-ingest

from cdeistopened/skill-stack

Transcribe YouTube videos and playlists using Gemini Flash

web-scrape

from cdeistopened/skill-stack

Scrape web pages to clean markdown with optional AI summaries

voice-tyler-cowen

from cdeistopened/skill-stack

Write in Tyler Cowen's style - matter-of-fact, understated, treats enormous ideas as obvious observations. Read the passages. Absorb the flatness. Channel the HOW, not the content.

voice-trung-phan

from cdeistopened/skill-stack

Generate tweets and threads in the style of Trung Phan. Not just voice — captures his humor mechanics, format taxonomy, topic selection filter, and structural patterns. Use for trend-reactive tweets, meme commentary, and business/culture threads.

voice-levine-berry

from cdeistopened/skill-stack

Write in a combined Matt Levine + Wendell Berry voice. Levine's dry logic-walking and parenthetical humor for the analytical sections. Berry's meditative patience for the human ones. Read the passages. Absorb the rhythm. Channel the HOW, not the content.