gemini-image-gen
Generate and edit images via Google Gemini API. Supports Gemini native generation, Imagen 3, style presets, and batch generation with HTML gallery. Zero dependencies — pure Python stdlib.
Best use case
gemini-image-gen is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using gemini-image-gen should expect more consistent output, faster repeated runs, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/gemini-image-gen/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How gemini-image-gen Compares
| Feature / Agent | gemini-image-gen | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Generate and edit images via Google Gemini API. Supports Gemini native generation, Imagen 3, style presets, and batch generation with HTML gallery. Zero dependencies — pure Python stdlib.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Gemini Image Gen
Generate and edit images via the Google Gemini API using pure Python stdlib. Supports Gemini native generation + editing, Imagen 3 generation, batch runs, and an HTML gallery output.
## Quick Start
```bash
export GEMINI_API_KEY="your-key-here"
# Default: Gemini native, 4 random prompts
python3 scripts/gen.py
# Custom prompt
python3 scripts/gen.py --prompt "a cyberpunk cat riding a neon motorcycle through Tokyo at night"
# Imagen 3 engine
python3 scripts/gen.py --engine imagen --count 4 --aspect 16:9
# Edit an existing image (Gemini engine only)
python3 scripts/gen.py --edit path/to/image.png --prompt "change the background to a sunset beach"
# Use a style preset
python3 scripts/gen.py --style watercolor --prompt "floating islands above a calm sea"
# List available styles
python3 scripts/gen.py --styles
```
## Style Presets
| Style | Description |
| --- | --- |
| `photo` | Ultra-detailed photorealistic photography, 8K resolution, sharp focus |
| `anime` | High-quality anime illustration, Studio Ghibli inspired, vibrant colors |
| `watercolor` | Delicate watercolor painting on textured paper, soft edges, gentle color bleeding |
| `cyberpunk` | Neon-lit cyberpunk scene, rain-soaked streets, holographic displays, Blade Runner aesthetic |
| `minimalist` | Clean minimalist design, geometric shapes, limited color palette, white space |
| `oil-painting` | Classical oil painting with visible brushstrokes, rich textures, Renaissance lighting |
| `pixel-art` | Detailed pixel art, retro 16-bit style, crisp edges, nostalgic palette |
| `sketch` | Pencil sketch on cream paper, hatching and cross-hatching, artistic imperfections |
| `3d-render` | Professional 3D render, ambient occlusion, global illumination, photorealistic materials |
| `pop-art` | Bold pop art style, Ben-Day dots, strong outlines, vibrant contrasting colors |
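The CLI reference describes `--style` as a preset that is prepended to the prompt. The following is a minimal sketch of that idea, not the code in `scripts/gen.py`: the `STYLE_PRESETS` dictionary, the `apply_style` helper, and the `". "` separator are all assumptions made for illustration, and only two presets from the table are copied in.

```python
# Hypothetical preset table. The keys and descriptions are taken from the
# Style Presets table, but the real gen.py may store them differently.
STYLE_PRESETS = {
    "watercolor": "Delicate watercolor painting on textured paper, soft edges, gentle color bleeding",
    "pixel-art": "Detailed pixel art, retro 16-bit style, crisp edges, nostalgic palette",
}


def apply_style(prompt, style=None):
    """Return the prompt with the preset description prepended (assumed format)."""
    if style is None:
        return prompt
    if style not in STYLE_PRESETS:
        raise ValueError(f"unknown style: {style!r}")
    # Assumption: the preset text and the user prompt are joined with ". ".
    return f"{STYLE_PRESETS[style]}. {prompt}"


print(apply_style("floating islands above a calm sea", "watercolor"))
```

Running `python3 scripts/gen.py --styles` remains the authoritative way to see which presets the script actually ships.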
## Full CLI Reference
| Flag | Default | Description |
| --- | --- | --- |
| `--prompt` | (random) | Text prompt. Omit for random creative prompts |
| `--count` | 4 | Number of images to generate |
| `--engine` | gemini | Engine: `gemini` (native, supports edit) or `imagen` (Imagen 3) |
| `--model` | (auto) | Model override. Default: `gemini-2.5-flash-image` or `imagen-3.0-generate-002` |
| `--edit` | | Path to input image for editing (Gemini engine only) |
| `--aspect` | 1:1 | Aspect ratio for Imagen: `1:1`, `16:9`, `9:16`, `4:3`, `3:4` |
| `--out-dir` | (auto) | Output directory (default is a timestamped folder) |
| `--style` | | Style preset to prepend to the prompt |
| `--styles` | | List available style presets and exit |
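To make the defaults and constraints in the table concrete, here is a hedged reconstruction of the command line using `argparse`. It is a sketch built only from the table above and is not the parser in `scripts/gen.py`; the names `build_parser`, `resolve`, and `DEFAULT_MODELS` are hypothetical, and rejecting `--edit` with the Imagen engine is an assumption about how the "Gemini engine only" rule might be enforced.

```python
import argparse

# Default model per engine, as listed in the --model row of the table.
DEFAULT_MODELS = {
    "gemini": "gemini-2.5-flash-image",
    "imagen": "imagen-3.0-generate-002",
}


def build_parser():
    # Hypothetical reconstruction of the documented flags; not the real parser.
    p = argparse.ArgumentParser(prog="gen.py")
    p.add_argument("--prompt", default=None)   # None stands in for "random prompts"
    p.add_argument("--count", type=int, default=4)
    p.add_argument("--engine", choices=["gemini", "imagen"], default="gemini")
    p.add_argument("--model", default=None)    # None stands in for "auto"
    p.add_argument("--edit", default=None)
    p.add_argument("--aspect", choices=["1:1", "16:9", "9:16", "4:3", "3:4"], default="1:1")
    p.add_argument("--out-dir", default=None)  # None stands in for a timestamped folder
    p.add_argument("--style", default=None)
    p.add_argument("--styles", action="store_true")
    return p


def resolve(argv):
    """Parse argv, apply the engine-specific model default, and check --edit."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.edit and args.engine != "gemini":
        # Assumed behavior: editing is documented as Gemini-only.
        parser.error("--edit is only supported by the gemini engine")
    if args.model is None:
        args.model = DEFAULT_MODELS[args.engine]
    return args


args = resolve(["--engine", "imagen", "--aspect", "16:9"])
print(args.engine, args.model, args.count, args.aspect)
```

Keeping the model default out of `add_argument` lets it depend on `--engine`, which matches the "(auto)" entry in the table.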
## Python Example
```python
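import subprocess

# Run the generator as a child process; check=True raises
# subprocess.CalledProcessError if the script exits with a non-zero status.
subprocess.run(
    [
        "python3",
        "scripts/gen.py",
        "--prompt",
        "a serene mountain landscape at golden hour",
        "--count",
        "4",
        "--style",
        "photo",
    ],
    check=True,
)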
```
## Troubleshooting
- Missing API key: set `GEMINI_API_KEY` in your environment and retry.
- Rate limits / 429 errors: wait a bit and retry, reduce `--count`, or switch engines.
- Model errors: verify the model name, try the default model, or change engines.
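The first two items above can be automated from a wrapper script. The sketch below checks for the API key up front and retries a failed run with exponential backoff. The helpers `require_api_key` and `retry_with_backoff` are hypothetical, and the sketch assumes that `scripts/gen.py` exits with a non-zero status when a request fails, which this document does not confirm.

```python
import os
import time


def require_api_key(env=os.environ):
    """Return GEMINI_API_KEY from env, or raise if it is missing or empty."""
    key = env.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set")
    return key


def retry_with_backoff(run, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call run() until it returns 0 or the attempts are exhausted.

    run is any callable returning an exit code. The delay doubles after each
    failure: base_delay, 2 * base_delay, 4 * base_delay, and so on.
    """
    code = None
    for attempt in range(attempts):
        code = run()
        if code == 0:
            return code
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return code
```

One possible use is `retry_with_backoff(lambda: subprocess.run(["python3", "scripts/gen.py", "--count", "2"]).returncode)` after calling `require_api_key()`. Passing `sleep` as a parameter keeps the helper easy to test without real waiting.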
## Integration with Other Skills
- **[AgentGram](https://clawhub.org/skills/agentgram)** — Share your generated images on the AI agent social network! Create visual content and post it to your AgentGram feed.
- **[agent-selfie](https://clawhub.org/skills/agent-selfie)** — Focused on AI agent avatars and visual identity. Uses the same Gemini API key for personality-driven self-portraits.
## Changelog
- v1.1.0: Added style presets, `--style` and `--styles` flags, expanded documentation.
- v1.0.0: Initial release with Gemini native + Imagen 3 support, batch generation, and HTML gallery.
## Repository
https://github.com/IISweetHeartII/gemini-image-gen
Related Skills
ImageMagick Moltbot Skill
Comprehensive ImageMagick operations for image manipulation in Moltbot.
table-image
Generate images from tables for better readability in messaging apps like Telegram. Use when displaying tabular data.
ms-foundry-image-gen
Azure Foundry image generation skill for OpenClaw; generates images via a Foundry deployment and returns image.
antigravity-image-gen
Generate images using the internal Google Antigravity API (Gemini 3 Pro Image). High quality, native generation without browser automation.
antigravity-image
Generate images using the internal Antigravity Sandbox API (Gemini 3 Pro Image). Supports text-to-image generation via internal Google endpoints.
zhipu-cogview-image
Generate images using Zhipu AI's CogView model.
doubao-image-gen
Use Zhipu (智谱) web search API for searching the internet.
10-of-my-most-popular-text-to-image-series-prompts-78b0897e
generate a bunch of images, then you curate the results to handpick the best ones
Seasonal Product Image
**Version**: 1.0.0
google-imagen-3-portrait-photography
Generate professional portrait photography using Google Imagen 3. Use when creating realistic portraits, headshots, or artistic character photography with professional lighting and composition.
google-imagen-3-hyperrealistic-landscape
Generate hyperrealistic landscape photography using Google Imagen 3. Use when creating breathtaking natural scenes, landscapes, and nature photography with exceptional detail and realism.
gemini-nano-banana-pro-portraits
Generate ultra-photorealistic portraits using Gemini Nano Banana Pro with comprehensive JSON configuration templates. Use when creating cinematic quality portraits, fitness photography, or realistic character images. Includes complete JSON structure for prompt configuration, subject details, apparel, pose, environment, lighting, and technical specifications.