ai-image-tools
Generate and edit images using either OpenAI GPT Image 1.5 or Google's Nano Banana Pro (Gemini 3 Pro Image). Use when the user asks to generate/create/edit/modify images. Supports image-to-image editing for both providers and optional mask-based inpainting for OpenAI.
Best use case
ai-image-tools is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using ai-image-tools should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/ai-image-tools/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
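The manual steps above could be scripted roughly as follows. This is a minimal sketch, not part of the skill: the helper name `install_skill` and its parameters are hypothetical, and it assumes SKILL.md has already been downloaded to a local path.

```python
import shutil
from pathlib import Path


def install_skill(downloaded_skill_md: Path, project_root: Path) -> Path:
    """Copy a downloaded SKILL.md into the project's skill directory.

    Hypothetical helper: only the target path comes from the
    installation steps; everything else is illustrative.
    """
    target = project_root / ".claude" / "skills" / "ai-image-tools" / "SKILL.md"
    # Create .claude/skills/ai-image-tools/ if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(downloaded_skill_md, target)
    return target
```

After copying, the agent still needs a restart before it picks up the skill.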
How ai-image-tools Compares
| Feature / Agent | ai-image-tools | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Generate and edit images using either OpenAI GPT Image 1.5 or Google's Nano Banana Pro (Gemini 3 Pro Image). Use when the user asks to generate/create/edit/modify images. Supports image-to-image editing for both providers and optional mask-based inpainting for OpenAI.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# AI Image Tools (OpenAI + Gemini)

One unified skill for image generation + editing, supporting:

- **OpenAI**: GPT Image 1.5 (generation + edits, optional mask inpainting)
- **Gemini**: Nano Banana Pro (Gemini 3 Pro Image) (generation + image-to-image edits)

## Usage

Run from your current working directory so outputs save where you're working.

### Generate (text → image)

```bash
uv run scripts/generate_image.py --prompt "A moody cinematic portrait of a golden retriever" --filename "out.png"
```

Pick a provider explicitly:

```bash
# OpenAI (GPT Image 1.5)
uv run scripts/generate_image.py --provider openai --prompt "..." --filename "out.png"

# Gemini (Nano Banana Pro)
uv run scripts/generate_image.py --provider gemini --prompt "..." --filename "out.png"
```

### Edit (image → image)

```bash
uv run scripts/generate_image.py --prompt "Make it look like a watercolor painting" --filename "out.png" --input-image "input.png"
```

Mask-based inpainting (OpenAI only):

```bash
uv run scripts/generate_image.py --provider openai --prompt "A red balloon" --filename "out.png" --input-image "input.png" --mask "mask.png"
```

## Provider Selection

- Default `--provider auto`:
  - uses OpenAI if `OPENAI_API_KEY` (or `--openai-api-key`) is available
  - otherwise uses Gemini if `GEMINI_API_KEY` (or `--gemini-api-key`) is available
- Set `--provider openai` or `--provider gemini` to force one.

## API Keys

- **OpenAI**:
  - env: `OPENAI_API_KEY`
  - flag: `--openai-api-key`
- **Gemini**:
  - env: `GEMINI_API_KEY`
  - flag: `--gemini-api-key`

## Options (Provider-Specific)

### OpenAI options

- `--quality low|medium|high` (generation only; default `medium`)
- `--size 1024x1024|1024x1536|1536x1024|auto` (default `1024x1024`)
- `--background transparent|opaque|auto` (generation only; default `auto`)
- `--mask path/to/mask.png` (edits only)

### Gemini options

- `--resolution 1K|2K|4K` (default `1K`)

## Notes

- Output is always saved as **PNG** at `--filename`.
- Don’t read the output image back into the model unless explicitly requested.
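
The `--provider auto` precedence described under Provider Selection can be sketched as below. This is an illustration of the documented rule, not the code in `scripts/generate_image.py`: the function name `resolve_provider`, its parameters, and the error raised when no key is found are all assumptions.

```python
import os
from typing import Mapping, Optional


def resolve_provider(
    provider: str = "auto",
    openai_api_key: Optional[str] = None,
    gemini_api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return "openai" or "gemini" following the documented precedence.

    Hypothetical helper; the real script may behave differently.
    """
    env = os.environ if env is None else env

    # A key counts as available if it came from the flag or the env var.
    openai_available = bool(openai_api_key or env.get("OPENAI_API_KEY"))
    gemini_available = bool(gemini_api_key or env.get("GEMINI_API_KEY"))

    # An explicit --provider value forces that provider.
    if provider in ("openai", "gemini"):
        return provider
    if provider != "auto":
        raise ValueError(f"unknown provider: {provider!r}")

    # auto: OpenAI first, Gemini as the fallback.
    if openai_available:
        return "openai"
    if gemini_available:
        return "gemini"
    # Assumption: the docs do not say what happens with no key at all.
    raise RuntimeError("no API key available for OpenAI or Gemini")
```

With both keys set, `auto` resolves to OpenAI, so pass `--provider gemini` explicitly when Gemini is wanted.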
Related Skills
browser-dev-tools
Use Chrome DevTools MCP for front-end page debugging, layout optimization, performance diagnosis, and interaction verification.
anthropic-dev-tools-mcp-builder
Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate external APIs or services, whether in Python (FastMCP) or Node/TypeScript (MCP SDK).
agent-ops-tools
Detect available development tools at session start. Saves to .agent/tools.json and warns about missing required tools. Works with or without aoc CLI installed.
ai-image-asset-generator
This skill should be used when generating AI image assets for websites, landing pages, or applications. It automatically analyzes page requirements, generates images using Gemini API, removes backgrounds, converts to SVG for interactivity, and places assets in frontend code. Ideal for creating hero images, icons, backgrounds, product mockups, and infographic elements. Use this skill when users need image assets for their web projects.
genesis-tools:living-docs
Self-maintaining documentation system. Bootstraps, validates, refines, and optimizes codebase documentation. Creates minimal, token-efficient doc chunks. Use when creating, updating, or auditing project documentation.
Docker Image Builder Skill
Transform Docker knowledge from Lessons 1-6 into a reusable AI skill for consistent, production-ready containerization
azure-ai-vision-imageanalysis-py
Azure AI Vision Image Analysis SDK for captions, tags, objects, OCR, people detection, and smart cropping. Use for computer vision and image understanding tasks.
argocd-image-updater
Automate container image updates for Kubernetes workloads managed by Argo CD. USE WHEN configuring ArgoCD Image Updater, setting up automatic image updates, configuring update strategies (semver, digest, newest-build, alphabetical), implementing git write-back, troubleshooting image update issues, or working with ImageUpdater CRDs. Covers installation, configuration, authentication, and best practices.
tools-ui-frontend-design
Create distinctive, production-grade frontend interfaces grounded in this repo's design system. Use when asked to build web components, pages, or applications. Combines bold creative direction with token-constrained implementation.
scanning-tools
This skill should be used when the user asks to "perform vulnerability scanning", "scan networks for open ports", "assess web application security", "scan wireless networks", "detec...
red-team-tools
This skill should be used when the user asks to "follow red team methodology", "perform bug bounty hunting", "automate reconnaissance", "hunt for XSS vulnerabilities", "enumerate su...
md-to-image
Convert Markdown tables to PNG images for Telegram, WhatsApp, and other chat interfaces that don't support table formatting.