ai-image-tools

Generate and edit images using either OpenAI GPT Image 1.5 or Google's Nano Banana Pro (Gemini 3 Pro Image). Use when the user asks to generate/create/edit/modify images. Supports image-to-image editing for both providers and optional mask-based inpainting for OpenAI.

16 stars

by diegosouzapw

Best use case

ai-image-tools is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ai-image-tools can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/ai-image-tools/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/tools/ai-image-tools/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/ai-image-tools/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How ai-image-tools Compares

Feature / Agent	ai-image-tools	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

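What does this skill do?

It generates and edits images with OpenAI GPT Image 1.5 or Google's Nano Banana Pro (Gemini 3 Pro Image), including image-to-image editing for both providers and optional mask-based inpainting for OpenAI.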

Where can I find the source code?

You can find the source code on GitHub in the diegosouzapw/awesome-omni-skill repository, under skills/tools/ai-image-tools.

SKILL.md Source

# AI Image Tools (OpenAI + Gemini)

One unified skill for image generation + editing, supporting:

- **OpenAI**: GPT Image 1.5 (generation + edits, optional mask inpainting)
- **Gemini**: Nano Banana Pro (Gemini 3 Pro Image) (generation + image-to-image edits)

## Usage

Run from your current working directory so outputs save where you're working.

### Generate (text → image)

```bash
uv run scripts/generate_image.py --prompt "A moody cinematic portrait of a golden retriever" --filename "out.png"
```

Pick a provider explicitly:

```bash
# OpenAI (GPT Image 1.5)
uv run scripts/generate_image.py --provider openai --prompt "..." --filename "out.png"

# Gemini (Nano Banana Pro)
uv run scripts/generate_image.py --provider gemini --prompt "..." --filename "out.png"
```

### Edit (image → image)

```bash
uv run scripts/generate_image.py --prompt "Make it look like a watercolor painting" --filename "out.png" --input-image "input.png"
```

Mask-based inpainting (OpenAI only):

```bash
uv run scripts/generate_image.py --provider openai --prompt "A red balloon" --filename "out.png" --input-image "input.png" --mask "mask.png"
```
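
If you drive the script from Python rather than a shell, the commands above can be assembled as an argument list. This is a minimal sketch: `build_command` is a hypothetical helper, not part of the skill, and it only uses the flags shown in this section.

```python
import shlex
from typing import List, Optional


def build_command(
    prompt: str,
    filename: str,
    provider: Optional[str] = None,
    input_image: Optional[str] = None,
    mask: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for scripts/generate_image.py (hypothetical helper)."""
    # A mask describes an edit, so it needs an input image to apply to.
    if mask is not None and input_image is None:
        raise ValueError("--mask requires --input-image")
    # Mask-based inpainting is documented as OpenAI only.
    if mask is not None and provider == "gemini":
        raise ValueError("--mask is only supported with --provider openai")

    cmd = ["uv", "run", "scripts/generate_image.py"]
    if provider is not None:
        cmd += ["--provider", provider]
    cmd += ["--prompt", prompt, "--filename", filename]
    if input_image is not None:
        cmd += ["--input-image", input_image]
    if mask is not None:
        cmd += ["--mask", mask]
    return cmd


if __name__ == "__main__":
    # Print the shell-quoted command instead of running it.
    argv = build_command(
        "A red balloon", "out.png",
        provider="openai", input_image="input.png", mask="mask.png",
    )
    print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` avoids shell-quoting problems with prompts that contain spaces or quotes.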

## Provider Selection

- Default `--provider auto`:
  - uses OpenAI if `OPENAI_API_KEY` (or `--openai-api-key`) is available
  - otherwise uses Gemini if `GEMINI_API_KEY` (or `--gemini-api-key`) is available
- Set `--provider openai` or `--provider gemini` to force one.
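
The selection order above can be sketched as a small function. This is an illustration of the documented rule, not the script's actual code: `choose_provider` is a hypothetical name, and raising an error when no key is available is an assumption, since the behavior in that case is not documented here.

```python
from typing import Optional


def choose_provider(
    requested: str,
    openai_key: Optional[str],
    gemini_key: Optional[str],
) -> str:
    """Pick a provider following the documented `--provider auto` order."""
    # An explicit choice is used as given.
    if requested in ("openai", "gemini"):
        return requested
    if requested != "auto":
        raise ValueError(f"unknown provider: {requested!r}")

    # auto: OpenAI first, then Gemini.
    if openai_key:
        return "openai"
    if gemini_key:
        return "gemini"
    # Assumption: with no key at all, fail loudly rather than guess.
    raise RuntimeError("no API key available for OpenAI or Gemini")
```

With both keys set, `auto` resolves to OpenAI, so pass `--provider gemini` explicitly when you want Gemini.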

## API Keys

- **OpenAI**:
  - env: `OPENAI_API_KEY`
  - flag: `--openai-api-key`
- **Gemini**:
  - env: `GEMINI_API_KEY`
  - flag: `--gemini-api-key`
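
A key lookup covering both sources might look like the sketch below. `resolve_api_key` is a hypothetical helper, and letting the flag take precedence over the environment variable is an assumption: this document lists both sources but does not say which one wins.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    flag_value: Optional[str],
    env_var: str,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the API key from the flag or the environment, or None."""
    # Assumption: an explicit command-line flag overrides the environment.
    if flag_value:
        return flag_value
    # Treat an empty environment variable the same as an unset one.
    return env.get(env_var) or None
```

For example, `resolve_api_key(None, "OPENAI_API_KEY")` reads the key from the current environment, while passing a mapping as `env` makes the lookup easy to test.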

## Options (Provider-Specific)

### OpenAI options

- `--quality low|medium|high` (generation only; default `medium`)
- `--size 1024x1024|1024x1536|1536x1024|auto` (default `1024x1024`)
- `--background transparent|opaque|auto` (generation only; default `auto`)
- `--mask path/to/mask.png` (edits only)
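
The constraints in this list can be checked before calling the script. The sketch below encodes only the values and the generation-only/edits-only notes listed above. `check_openai_options` is a hypothetical helper, and the script itself may handle invalid combinations differently (for example, by ignoring them).

```python
from typing import Dict, List

# Allowed values as listed in this section.
OPENAI_CHOICES = {
    "quality": {"low", "medium", "high"},
    "size": {"1024x1024", "1024x1536", "1536x1024", "auto"},
    "background": {"transparent", "opaque", "auto"},
}
GENERATION_ONLY = {"quality", "background"}


def check_openai_options(options: Dict[str, str], editing: bool) -> List[str]:
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    for name, value in options.items():
        if name == "mask":
            # Any path is accepted here; only the mode is checked.
            if not editing:
                problems.append("--mask applies to edits only")
            continue
        allowed = OPENAI_CHOICES.get(name)
        if allowed is None:
            problems.append(f"unknown option --{name}")
        elif value not in allowed:
            problems.append(f"invalid value for --{name}: {value}")
        elif editing and name in GENERATION_ONLY:
            problems.append(f"--{name} applies to generation only")
    return problems
```

Here `editing` means the call includes `--input-image`.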

### Gemini options

- `--resolution 1K|2K|4K` (default `1K`)

## Notes

- Output is always saved as **PNG** at the path given by `--filename`.
- Don’t read the output image back into the model unless explicitly requested.
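
To confirm that a run produced a PNG without loading the image into the model, it is enough to inspect the first eight bytes of the file. A minimal sketch follows: the helper names are hypothetical, and the signature bytes are the standard PNG file header.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def is_png_header(data: bytes) -> bool:
    """True if the given bytes start with the PNG signature."""
    return data[: len(PNG_SIGNATURE)] == PNG_SIGNATURE


def looks_like_png(path: str) -> bool:
    """Read only the header of the file at `path` and check the signature."""
    with open(path, "rb") as fh:
        return is_png_header(fh.read(len(PNG_SIGNATURE)))
```

Calling `looks_like_png("out.png")` after a run checks the file type only. It does not validate the rest of the image data.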

Related Skills

browser-dev-tools

from diegosouzapw/awesome-omni-skill

Use Chrome DevTools MCP for front-end page debugging, layout optimization, performance diagnosis, and interaction verification.

anthropic-dev-tools-mcp-builder

from diegosouzapw/awesome-omni-skill

Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate external APIs or services, whether in Python (FastMCP) or Node/TypeScript (MCP SDK).

agent-ops-tools

from diegosouzapw/awesome-omni-skill

Detect available development tools at session start. Saves to .agent/tools.json and warns about missing required tools. Works with or without aoc CLI installed.

ai-image-asset-generator

from diegosouzapw/awesome-omni-skill

This skill should be used when generating AI image assets for websites, landing pages, or applications. It automatically analyzes page requirements, generates images using Gemini API, removes backgrounds, converts to SVG for interactivity, and places assets in frontend code. Ideal for creating hero images, icons, backgrounds, product mockups, and infographic elements. Use this skill when users need image assets for their web projects.

genesis-tools:living-docs

from diegosouzapw/awesome-omni-skill

Self-maintaining documentation system. Bootstraps, validates, refines, and optimizes codebase documentation. Creates minimal, token-efficient doc chunks. Use when creating, updating, or auditing project documentation.

Docker Image Builder Skill

from diegosouzapw/awesome-omni-skill

Transform Docker knowledge from Lessons 1-6 into a reusable AI skill for consistent, production-ready containerization

azure-ai-vision-imageanalysis-py

from diegosouzapw/awesome-omni-skill

Azure AI Vision Image Analysis SDK for captions, tags, objects, OCR, people detection, and smart cropping. Use for computer vision and image understanding tasks.

argocd-image-updater

from diegosouzapw/awesome-omni-skill

Automate container image updates for Kubernetes workloads managed by Argo CD. USE WHEN configuring ArgoCD Image Updater, setting up automatic image updates, configuring update strategies (semver, digest, newest-build, alphabetical), implementing git write-back, troubleshooting image update issues, or working with ImageUpdater CRDs. Covers installation, configuration, authentication, and best practices.

tools-ui-frontend-design

from diegosouzapw/awesome-omni-skill

Create distinctive, production-grade frontend interfaces grounded in this repo's design system. Use when asked to build web components, pages, or applications. Combines bold creative direction with token-constrained implementation.

scanning-tools

from diegosouzapw/awesome-omni-skill

This skill should be used when the user asks to "perform vulnerability scanning", "scan networks for open ports", "assess web application security", "scan wireless networks", "detec...

red-team-tools

from diegosouzapw/awesome-omni-skill

This skill should be used when the user asks to "follow red team methodology", "perform bug bounty hunting", "automate reconnaissance", "hunt for XSS vulnerabilities", "enumerate su...

md-to-image

from diegosouzapw/awesome-omni-skill

Convert Markdown tables to PNG images for Telegram, WhatsApp, and other chat interfaces that don't support table formatting.