ai-image-tools

Generate and edit images using either OpenAI GPT Image 1.5 or Google's Nano Banana Pro (Gemini 3 Pro Image). Use when the user asks to generate/create/edit/modify images. Supports image-to-image editing for both providers and optional mask-based inpainting for OpenAI.

16 stars

Best use case

ai-image-tools is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ai-image-tools should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/ai-image-tools/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/tools/ai-image-tools/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/ai-image-tools/SKILL.md inside your project
  3. Restart your AI agent; it will auto-discover the skill
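The manual steps above can also be scripted. This is a minimal sketch assuming `curl` is installed; `install_skill_local` and `SKILL_URL` are hypothetical names, while the URL and the `.claude/skills/ai-image-tools/SKILL.md` path are the ones given above.

```bash
# Raw URL taken from the curl command above.
SKILL_URL="https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/tools/ai-image-tools/SKILL.md"

# Hypothetical helper: fetch SKILL.md into <project>/.claude/skills/ai-image-tools/.
# Args: optional URL (defaults to SKILL_URL), optional project root (defaults to ".").
install_skill_local() {
  local url="${1:-$SKILL_URL}" project="${2:-.}"
  local dest="$project/.claude/skills/ai-image-tools/SKILL.md"
  # -f: fail on HTTP errors instead of saving an error page.
  # --create-dirs: create the missing .claude/skills/... directories.
  curl -fsSL --create-dirs -o "$dest" "$url" || return 1
  # Treat an empty download as a failure.
  [ -s "$dest" ]
}
```

Run `install_skill_local` from the project root, then restart the agent as in step 3.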

How ai-image-tools Compares

| Feature / Agent | ai-image-tools | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

Where can I find the source code?

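The source is in the diegosouzapw/awesome-omni-skill repository on GitHub, at skills/tools/ai-image-tools/SKILL.md; this is the same file the curl command in the Installation section downloads.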

SKILL.md Source

# AI Image Tools (OpenAI + Gemini)

One unified skill for image generation + editing, supporting:

- **OpenAI**: GPT Image 1.5 (generation + edits, optional mask inpainting)
- **Gemini**: Nano Banana Pro / Gemini 3 Pro Image (generation + image-to-image edits)

## Usage

Run commands from your current working directory so that outputs are saved where you're working.

### Generate (text → image)

```bash
uv run scripts/generate_image.py --prompt "A moody cinematic portrait of a golden retriever" --filename "out.png"
```

Pick a provider explicitly:

```bash
# OpenAI (GPT Image 1.5)
uv run scripts/generate_image.py --provider openai --prompt "..." --filename "out.png"

# Gemini (Nano Banana Pro)
uv run scripts/generate_image.py --provider gemini --prompt "..." --filename "out.png"
```
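To compare the two providers on the same prompt, the explicit `--provider` commands above can be wrapped in a loop. This is a sketch, not part of the skill: `compare_providers` is a hypothetical helper, it assumes both API keys are set, and it reuses the `uv run scripts/generate_image.py` invocation exactly as shown above.

```bash
# Hypothetical helper (not part of the skill): render one prompt with each
# provider, writing <stem>-openai.png and <stem>-gemini.png.
# Args: prompt, optional filename stem (defaults to "compare").
compare_providers() {
  local prompt="$1" stem="${2:-compare}"
  local provider
  for provider in openai gemini; do
    uv run scripts/generate_image.py --provider "$provider" \
      --prompt "$prompt" --filename "${stem}-${provider}.png" || return 1
  done
}
```

For example, `compare_providers "A moody cinematic portrait of a golden retriever" retriever` would produce `retriever-openai.png` and `retriever-gemini.png`.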

### Edit (image → image)

```bash
uv run scripts/generate_image.py --prompt "Make it look like a watercolor painting" --filename "out.png" --input-image "input.png"
```

Mask-based inpainting (OpenAI only):

```bash
uv run scripts/generate_image.py --provider openai --prompt "A red balloon" --filename "out.png" --input-image "input.png" --mask "mask.png"
```
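OpenAI's image-edit documentation states that a mask must be a PNG with the same dimensions as the input image, with fully transparent areas marking where the edit applies (worth re-checking against the GPT Image 1.5 docs). The sketch below is a hypothetical preflight, not part of the skill: `png_size` and `check_mask` are illustrative names, and it assumes both files are valid PNGs, whose width and height are stored as big-endian 32-bit integers at byte offset 16 (the IHDR chunk).

```bash
# Print "WIDTHxHEIGHT" read from a PNG's IHDR chunk (bytes 16-23, big-endian).
png_size() {
  od -An -tu1 -j16 -N8 "$1" | awk '{
    w = $1 * 16777216 + $2 * 65536 + $3 * 256 + $4
    h = $5 * 16777216 + $6 * 65536 + $7 * 256 + $8
    printf "%dx%d\n", w, h
  }'
}

# Succeed only if the mask has the same dimensions as the input image.
check_mask() {
  local input="$1" mask="$2"
  local in_size mask_size
  in_size=$(png_size "$input")
  mask_size=$(png_size "$mask")
  if [ "$in_size" != "$mask_size" ]; then
    echo "mask is $mask_size but input is $in_size" >&2
    return 1
  fi
}
```

Usage: `check_mask input.png mask.png && uv run scripts/generate_image.py --provider openai ...` runs the edit only when the sizes match.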

## Provider Selection

- Default `--provider auto`:
  - uses OpenAI if `OPENAI_API_KEY` (or `--openai-api-key`) is available
  - otherwise uses Gemini if `GEMINI_API_KEY` (or `--gemini-api-key`) is available
- Set `--provider openai` or `--provider gemini` to force one.
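The `auto` rule above can be restated as a small function. This is a sketch of the documented behavior, not the script's actual code: `resolve_provider` is a hypothetical name, its second and third arguments stand in for the `--openai-api-key` and `--gemini-api-key` flag values, and the error when neither key is present is an assumption, since the rule does not say what happens in that case.

```bash
# Sketch of the documented provider-selection rule (not the script's code).
# Args: requested provider (auto|openai|gemini), OpenAI flag key, Gemini flag key.
resolve_provider() {
  local requested="${1:-auto}" openai_flag="${2:-}" gemini_flag="${3:-}"
  case "$requested" in
    openai|gemini) echo "$requested"; return 0 ;;  # explicit choice wins
    auto) ;;
    *) echo "unknown provider: $requested" >&2; return 2 ;;
  esac
  # OpenAI is preferred whenever an OpenAI key is available from env or flag.
  if [ -n "${OPENAI_API_KEY:-}$openai_flag" ]; then
    echo openai
  elif [ -n "${GEMINI_API_KEY:-}$gemini_flag" ]; then
    echo gemini
  else
    # Assumed behavior: fail when no key is available.
    echo "no API key found for OpenAI or Gemini" >&2
    return 1
  fi
}
```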

## API Keys

- **OpenAI**:
  - env: `OPENAI_API_KEY`
  - flag: `--openai-api-key`
- **Gemini**:
  - env: `GEMINI_API_KEY`
  - flag: `--gemini-api-key`
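A config sketch of the two documented ways to supply keys; the key values are placeholders to replace with your own.

```bash
# Option 1: environment variables, picked up by every run in this shell session.
export OPENAI_API_KEY="your-openai-key"
export GEMINI_API_KEY="your-gemini-key"

# Option 2: pass a key for a single run with the documented flag, e.g.:
# uv run scripts/generate_image.py --provider gemini --gemini-api-key "your-gemini-key" --prompt "..." --filename "out.png"
```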

## Options (Provider-Specific)

### OpenAI options

- `--quality low|medium|high` (generation only; default `medium`)
- `--size 1024x1024|1024x1536|1536x1024|auto` (default `1024x1024`)
- `--background transparent|opaque|auto` (generation only; default `auto`)
- `--mask path/to/mask.png` (edits only)

### Gemini options

- `--resolution 1K|2K|4K` (default `1K`)
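The option lists above are small enough to check before spending an API call. This is a hypothetical preflight, not part of the skill: `check_image_args` is an illustrative name, it assumes every flag takes a value, and rejecting an option passed to the other provider is this sketch's choice, since the documentation does not say whether the script errors or ignores it.

```bash
# Hypothetical preflight (not part of the skill).
# Usage: check_image_args openai|gemini [--flag value ...]
check_image_args() {
  local provider="$1"; shift
  local has_input=0 has_mask=0
  while [ $# -gt 0 ]; do
    local flag="$1" value="${2:-}"
    # Provider-specific options (assumption: treat a mismatch as an error).
    case "$flag" in
      --quality|--size|--background|--mask)
        if [ "$provider" != openai ]; then
          echo "$flag is an OpenAI-only option" >&2; return 1
        fi ;;
      --resolution)
        if [ "$provider" != gemini ]; then
          echo "$flag is a Gemini-only option" >&2; return 1
        fi ;;
    esac
    # Allowed values, copied from the option lists above.
    case "$flag=$value" in
      --quality=low|--quality=medium|--quality=high) ;;
      --size=1024x1024|--size=1024x1536|--size=1536x1024|--size=auto) ;;
      --background=transparent|--background=opaque|--background=auto) ;;
      --resolution=1K|--resolution=2K|--resolution=4K) ;;
      --quality=*|--size=*|--background=*|--resolution=*)
        echo "invalid value for $flag: $value" >&2; return 1 ;;
      --mask=*) has_mask=1 ;;
      --input-image=*) has_input=1 ;;
    esac
    shift 2 || break
  done
  # --mask applies to edits only, so it needs an input image.
  if [ "$has_mask" = 1 ] && [ "$has_input" = 0 ]; then
    echo "--mask requires --input-image" >&2; return 1
  fi
}
```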

## Notes

- Output is always saved as **PNG** at `--filename`.
- Don’t read the output image back into the model unless explicitly requested.
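Because output is always PNG, a run can be verified from the file's first bytes without opening the image or passing it back to the model. This is a hypothetical post-run check, not part of the skill; `is_png` and `check_output` are illustrative names, and the test relies only on the standard 8-byte PNG signature.

```bash
# Succeed if the file starts with the PNG signature 89 50 4E 47 0D 0A 1A 0A.
is_png() {
  local sig
  sig=$(head -c 8 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
  [ "$sig" = "89504e470d0a1a0a" ]
}

# Warn about a misleading extension, then verify the PNG signature.
check_output() {
  local file="$1"
  case "$file" in
    *.png|*.PNG) ;;
    *) echo "warning: $file does not end in .png; the skill always writes PNG data" >&2 ;;
  esac
  is_png "$file"
}
```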

Related Skills

browser-dev-tools

16
from diegosouzapw/awesome-omni-skill

Use Chrome DevTools MCP for frontend page debugging, layout optimization, performance diagnosis, and interaction verification.

anthropic-dev-tools-mcp-builder

16
from diegosouzapw/awesome-omni-skill

Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate external APIs or services, whether in Python (FastMCP) or Node/TypeScript (MCP SDK).

agent-ops-tools

16
from diegosouzapw/awesome-omni-skill

Detect available development tools at session start. Saves to .agent/tools.json and warns about missing required tools. Works with or without aoc CLI installed.

ai-image-asset-generator

16
from diegosouzapw/awesome-omni-skill

This skill should be used when generating AI image assets for websites, landing pages, or applications. It automatically analyzes page requirements, generates images using Gemini API, removes backgrounds, converts to SVG for interactivity, and places assets in frontend code. Ideal for creating hero images, icons, backgrounds, product mockups, and infographic elements. Use this skill when users need image assets for their web projects.

genesis-tools:living-docs

16
from diegosouzapw/awesome-omni-skill

Self-maintaining documentation system. Bootstraps, validates, refines, and optimizes codebase documentation. Creates minimal, token-efficient doc chunks. Use when creating, updating, or auditing project documentation.

Docker Image Builder Skill

16
from diegosouzapw/awesome-omni-skill

Transform Docker knowledge from Lessons 1-6 into a reusable AI skill for consistent, production-ready containerization

azure-ai-vision-imageanalysis-py

16
from diegosouzapw/awesome-omni-skill

Azure AI Vision Image Analysis SDK for captions, tags, objects, OCR, people detection, and smart cropping. Use for computer vision and image understanding tasks.

argocd-image-updater

16
from diegosouzapw/awesome-omni-skill

Automate container image updates for Kubernetes workloads managed by Argo CD. USE WHEN configuring ArgoCD Image Updater, setting up automatic image updates, configuring update strategies (semver, digest, newest-build, alphabetical), implementing git write-back, troubleshooting image update issues, or working with ImageUpdater CRDs. Covers installation, configuration, authentication, and best practices.

tools-ui-frontend-design

16
from diegosouzapw/awesome-omni-skill

Create distinctive, production-grade frontend interfaces grounded in this repo's design system. Use when asked to build web components, pages, or applications. Combines bold creative direction with token-constrained implementation.

scanning-tools

16
from diegosouzapw/awesome-omni-skill

This skill should be used when the user asks to "perform vulnerability scanning", "scan networks for open ports", "assess web application security", "scan wireless networks", "detec...

red-team-tools

16
from diegosouzapw/awesome-omni-skill

This skill should be used when the user asks to "follow red team methodology", "perform bug bounty hunting", "automate reconnaissance", "hunt for XSS vulnerabilities", "enumerate su...

md-to-image

16
from diegosouzapw/awesome-omni-skill

Convert Markdown tables to PNG images for Telegram, WhatsApp, and other chat interfaces that don't support table formatting.