gemini-image-gen

Generate and edit images via Google Gemini API. Supports Gemini native generation, Imagen 3, style presets, and batch generation with HTML gallery. Zero dependencies — pure Python stdlib.

by Demerzels-lab

Best use case

gemini-image-gen is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using gemini-image-gen should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/gemini-image-gen/SKILL.md --create-dirs "https://raw.githubusercontent.com/Demerzels-lab/elsamultiskillagent/main/public/skills/iisweetheartii/gemini-image-gen/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/gemini-image-gen/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How gemini-image-gen Compares

Feature / Agent	gemini-image-gen	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Generate and edit images via Google Gemini API. Supports Gemini native generation, Imagen 3, style presets, and batch generation with HTML gallery. Zero dependencies — pure Python stdlib.

Where can I find the source code?

The source code is on GitHub. The repository URL is listed under "Repository" at the end of the SKILL.md source below.

SKILL.md Source

# Gemini Image Gen

Generate and edit images via the Google Gemini API using pure Python stdlib. Supports Gemini native generation + editing, Imagen 3 generation, batch runs, and an HTML gallery output.

## Quick Start

```bash
export GEMINI_API_KEY="your-key-here"

# Default: Gemini native, 4 random prompts
python3 scripts/gen.py

# Custom prompt
python3 scripts/gen.py --prompt "a cyberpunk cat riding a neon motorcycle through Tokyo at night"

# Imagen 3 engine
python3 scripts/gen.py --engine imagen --count 4 --aspect 16:9

# Edit an existing image (Gemini engine only)
python3 scripts/gen.py --edit path/to/image.png --prompt "change the background to a sunset beach"

# Use a style preset
python3 scripts/gen.py --style watercolor --prompt "floating islands above a calm sea"

# List available styles
python3 scripts/gen.py --styles
```
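
For orientation, here is a minimal sketch of what a stdlib-only call for the `gemini` engine could look like. It follows the publicly documented `generateContent` REST endpoint of the Gemini API. It is not taken from `scripts/gen.py`, so the helper names `build_request` and `extract_images` are hypothetical, and the real script may structure its request differently.

```python
import base64
import json
import urllib.request
from typing import List

# Public Gemini API REST root (assumption: the script targets the v1beta API).
API_ROOT = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(prompt: str, api_key: str,
                  model: str = "gemini-2.5-flash-image") -> urllib.request.Request:
    """Build (but do not send) a generateContent request for a text prompt."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        f"{API_ROOT}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def extract_images(response: dict) -> List[bytes]:
    """Collect decoded image bytes from the inlineData parts of a response."""
    images = []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            inline = part.get("inlineData")
            if inline:
                images.append(base64.b64decode(inline["data"]))
    return images
```

To send the request, pass it to `urllib.request.urlopen`, parse the body with `json.load`, and write each item returned by `extract_images` to a file.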

## Style Presets

| Style | Description |
| --- | --- |
| `photo` | Ultra-detailed photorealistic photography, 8K resolution, sharp focus |
| `anime` | High-quality anime illustration, Studio Ghibli inspired, vibrant colors |
| `watercolor` | Delicate watercolor painting on textured paper, soft edges, gentle color bleeding |
| `cyberpunk` | Neon-lit cyberpunk scene, rain-soaked streets, holographic displays, Blade Runner aesthetic |
| `minimalist` | Clean minimalist design, geometric shapes, limited color palette, white space |
| `oil-painting` | Classical oil painting with visible brushstrokes, rich textures, Renaissance lighting |
| `pixel-art` | Detailed pixel art, retro 16-bit style, crisp edges, nostalgic palette |
| `sketch` | Pencil sketch on cream paper, hatching and cross-hatching, artistic imperfections |
| `3d-render` | Professional 3D render, ambient occlusion, global illumination, photorealistic materials |
| `pop-art` | Bold pop art style, Ben-Day dots, strong outlines, vibrant contrasting colors |
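
The CLI reference describes `--style` as prepending the preset to the prompt. A minimal sketch of that idea follows. The `STYLE_PRESETS` dictionary and `apply_style` function are hypothetical names, and the `". "` separator is an assumption; the script may join the two parts differently.

```python
from typing import Optional

# Hypothetical excerpt of the preset table; descriptions copied from the table above.
STYLE_PRESETS = {
    "watercolor": "Delicate watercolor painting on textured paper, soft edges, gentle color bleeding",
    "pixel-art": "Detailed pixel art, retro 16-bit style, crisp edges, nostalgic palette",
}


def apply_style(prompt: str, style: Optional[str] = None) -> str:
    """Return the prompt with the preset description prepended, if a style is given."""
    if style is None:
        return prompt
    if style not in STYLE_PRESETS:
        raise ValueError(f"unknown style: {style!r}")
    # Assumption: preset and prompt are joined with ". ".
    return f"{STYLE_PRESETS[style]}. {prompt}"


print(apply_style("floating islands above a calm sea", "watercolor"))
```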

## Full CLI Reference

| Flag | Default | Description |
| --- | --- | --- |
| `--prompt` | (random) | Text prompt. Omit for random creative prompts |
| `--count` | 4 | Number of images to generate |
| `--engine` | gemini | Engine: `gemini` (native, supports edit) or `imagen` (Imagen 3) |
| `--model` | (auto) | Model override. Defaults to `gemini-2.5-flash-image` for the `gemini` engine and `imagen-3.0-generate-002` for the `imagen` engine |
| `--edit` | | Path to input image for editing (Gemini engine only) |
| `--aspect` | 1:1 | Aspect ratio for Imagen: `1:1`, `16:9`, `9:16`, `4:3`, `3:4` |
| `--out-dir` | (auto) | Output directory (default is a timestamped folder) |
| `--style` | | Style preset to prepend to the prompt |
| `--styles` | | List available style presets and exit |
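
As a reading aid, the table above can be mirrored in an `argparse` definition. This is a sketch reconstructed from the documented flags and defaults only, not the parser in `scripts/gen.py`. The `validate` check for `--edit` is an assumption about how the "Gemini engine only" restriction might be enforced.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="gen.py")
    parser.add_argument("--prompt", default=None, help="Text prompt; omit for a random prompt")
    parser.add_argument("--count", type=int, default=4, help="Number of images to generate")
    parser.add_argument("--engine", choices=["gemini", "imagen"], default="gemini")
    parser.add_argument("--model", default=None, help="Model override")
    parser.add_argument("--edit", default=None, help="Input image path (gemini engine only)")
    parser.add_argument("--aspect", choices=["1:1", "16:9", "9:16", "4:3", "3:4"], default="1:1")
    parser.add_argument("--out-dir", default=None, help="Output directory")
    parser.add_argument("--style", default=None, help="Style preset to prepend to the prompt")
    parser.add_argument("--styles", action="store_true", help="List style presets and exit")
    return parser


def validate(args: argparse.Namespace) -> argparse.Namespace:
    # Assumption: editing with a non-Gemini engine is rejected up front.
    if args.edit and args.engine != "gemini":
        raise SystemExit("--edit requires --engine gemini")
    return args


args = validate(build_parser().parse_args(["--engine", "imagen", "--aspect", "16:9"]))
print(args.count, args.engine, args.aspect)  # prints: 4 imagen 16:9
```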

## Python Example

```python
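import subprocess

# Invoke the generator script as a child process. Run this from the skill's
# root directory so the relative path scripts/gen.py resolves.
subprocess.run(
    [
        "python3",
        "scripts/gen.py",
        "--prompt",
        "a serene mountain landscape at golden hour",
        "--count",
        "4",
        "--style",
        "photo",
    ],
    check=True,  # raise CalledProcessError if the script exits non-zero
)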
```
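
When the same prompt should be rendered in several styles, the argument list can be built by a small helper. The sketch below uses only flags from the CLI reference. `build_command` and `generate_for_styles` are hypothetical names, not part of the skill.

```python
import subprocess
from typing import Iterable, List, Optional


def build_command(prompt: str, style: Optional[str] = None, count: int = 4,
                  script: str = "scripts/gen.py") -> List[str]:
    """Assemble the argv list for one invocation of the generator script."""
    cmd = ["python3", script, "--prompt", prompt, "--count", str(count)]
    if style:
        cmd += ["--style", style]
    return cmd


def generate_for_styles(prompt: str, styles: Iterable[str], run=subprocess.run) -> None:
    """Run the script once per style; `run` is injectable to allow dry runs."""
    for style in styles:
        run(build_command(prompt, style=style, count=1), check=True)


# Show the command without executing it.
print(build_command("a serene mountain landscape at golden hour", style="photo"))
```

Calling `generate_for_styles("a serene mountain landscape at golden hour", ["photo", "watercolor"])` would then produce one image per style, each through a separate run of the script.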

## Troubleshooting

- Missing API key: set `GEMINI_API_KEY` in your environment and retry.
- Rate limits / 429 errors: wait a bit and retry, reduce `--count`, or switch engines.
- Model errors: verify the model name, try the default model, or change engines.
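
The first two items can be automated with a pre-flight check and a retry loop. This is a minimal sketch, assuming the script exits with a non-zero status on failure; the source does not document exit codes. It therefore retries on any failure, not only on 429 responses. `require_api_key` and `run_with_backoff` are hypothetical helper names.

```python
import os
import subprocess
import time


def require_api_key(env=os.environ) -> str:
    """Fail early with a clear message if GEMINI_API_KEY is missing or empty."""
    key = env.get("GEMINI_API_KEY", "")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set")
    return key


def run_with_backoff(cmd, attempts=4, base_delay=2.0,
                     run=subprocess.run, sleep=time.sleep):
    """Run cmd, waiting base_delay * 2**n seconds between failed attempts."""
    for attempt in range(attempts):
        result = run(cmd)
        # Assumption: exit status 0 means the images were generated.
        if result.returncode == 0:
            return result
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"command failed after {attempts} attempts: {cmd}")
```

A typical call would be `require_api_key()` followed by `run_with_backoff(["python3", "scripts/gen.py", "--count", "1"])`.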

## Integration with Other Skills

- **[AgentGram](https://clawhub.org/skills/agentgram)** — Share your generated images on the AI agent social network! Create visual content and post it to your AgentGram feed.
- **[agent-selfie](https://clawhub.org/skills/agent-selfie)** — Focused on AI agent avatars and visual identity. Uses the same Gemini API key for personality-driven self-portraits.

## Changelog

- v1.1.0: Added style presets, `--style` and `--styles` flags, expanded documentation.
- v1.0.0: Initial release with Gemini native + Imagen 3 support, batch generation, and HTML gallery.

## Repository

https://github.com/IISweetHeartII/gemini-image-gen

Related Skills

ImageMagick Moltbot Skill

from Demerzels-lab/elsamultiskillagent

Comprehensive ImageMagick operations for image manipulation in Moltbot.

table-image

from Demerzels-lab/elsamultiskillagent

Generate images from tables for better readability in messaging apps like Telegram. Use when displaying tabular data.

ms-foundry-image-gen

from Demerzels-lab/elsamultiskillagent

Azure Foundry image generation skill for OpenClaw; generates images via a Foundry deployment and returns image.

antigravity-image-gen

from Demerzels-lab/elsamultiskillagent

Generate images using the internal Google Antigravity API (Gemini 3 Pro Image). High quality, native generation without browser automation.

antigravity-image

from Demerzels-lab/elsamultiskillagent

Generate images using the internal Antigravity Sandbox API (Gemini 3 Pro Image). Supports text-to-image generation via internal Google endpoints.

zhipu-cogview-image

from Demerzels-lab/elsamultiskillagent

Generate images using Zhipu AI's CogView model.

doubao-image-gen

from Demerzels-lab/elsamultiskillagent

Use Zhipu (智谱) web search API for searching the internet.

10-of-my-most-popular-text-to-image-series-prompts-78b0897e

from Demerzels-lab/elsamultiskillagent

generate a bunch of images, then you curate the results to handpick the best ones

Seasonal Product Image

from Demerzels-lab/elsamultiskillagent

**Version**: 1.0.0

google-imagen-3-portrait-photography

from Demerzels-lab/elsamultiskillagent

Generate professional portrait photography using Google Imagen 3. Use when creating realistic portraits, headshots, or artistic character photography with professional lighting and composition.

google-imagen-3-hyperrealistic-landscape

from Demerzels-lab/elsamultiskillagent

Generate hyperrealistic landscape photography using Google Imagen 3. Use when creating breathtaking natural scenes, landscapes, and nature photography with exceptional detail and realism.

gemini-nano-banana-pro-portraits

from Demerzels-lab/elsamultiskillagent

Generate ultra-photorealistic portraits using Gemini Nano Banana Pro with comprehensive JSON configuration templates. Use when creating cinematic quality portraits, fitness photography, or realistic character images. Includes complete JSON structure for prompt configuration, subject details, apparel, pose, environment, lighting, and technical specifications.