image-gen
Modular image generation - supports local SDXL Lightning, OpenAI DALL-E, Replicate, or custom providers
Best use case
image-gen is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using image-gen should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/image-gen/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
SKILL.md Source
# Modular Image Generation Skill
---
## LIBRARY-FIRST PROTOCOL (MANDATORY)
**Before writing ANY code, you MUST check:**
### Step 1: Library Catalog
- Location: `.claude/library/catalog.json`
- If match >70%: REUSE or ADAPT
### Step 2: Patterns Guide
- Location: `.claude/docs/inventories/LIBRARY-PATTERNS-GUIDE.md`
- If pattern exists: FOLLOW documented approach
### Step 3: Existing Projects
- Location: `D:\Projects\*`
- If found: EXTRACT and adapt
### Decision Matrix
| Match | Action |
|-------|--------|
| Library >90% | REUSE directly |
| Library 70-90% | ADAPT minimally |
| Pattern exists | FOLLOW pattern |
| In project | EXTRACT |
| No match | BUILD (add to library after) |
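The decision matrix can be sketched as a small helper. This is an illustrative function, not part of the skill's codebase; the name `choose_action` and the 0–1 score scale are assumptions:

```python
def choose_action(library_match: float = 0.0,
                  pattern_exists: bool = False,
                  in_project: bool = False) -> str:
    # Mirror the decision matrix: library matches take priority,
    # then documented patterns, then code found in existing projects.
    if library_match > 0.90:
        return "REUSE directly"
    if library_match > 0.70:
        return "ADAPT minimally"
    if pattern_exists:
        return "FOLLOW pattern"
    if in_project:
        return "EXTRACT"
    return "BUILD (add to library after)"
```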
---
## Purpose
Generate images using the best available provider - local models (free, private) or cloud APIs (fast, paid). Fully modular architecture allows plugging in any image generation backend.
## Providers
| Provider | Type | Cost | Requirements | Quality |
|----------|------|------|--------------|---------|
| **SDXL Lightning** | Local | Free | 8GB VRAM, ~7GB disk | Excellent |
| **OpenAI DALL-E 3** | API | ~$0.04/image | OPENAI_API_KEY | Excellent |
| **Replicate** | API | ~$0.01/image | REPLICATE_API_TOKEN | Good |
| **Custom** | Any | Varies | User-defined | Varies |
## When to Use
### Perfect For:
- Blog banners and social media images (LinkedIn: 1200x630)
- Documentation diagrams and illustrations
- UI mockups and wireframes
- Concept visualization
- Any image generation need
### Provider Selection:
- **Privacy required?** -> Use local SDXL
- **No GPU?** -> Use OpenAI or Replicate API
- **Batch generation?** -> Use local (no API costs)
- **Highest quality?** -> DALL-E 3 or SDXL Lightning
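The selection heuristics above can be expressed as a hypothetical helper (the function and its provider-name strings are for illustration only; the actual CLI auto-selects via `ProviderRegistry`):

```python
def pick_provider(privacy_required: bool, has_gpu: bool, batch_job: bool) -> str:
    # Privacy and batch jobs favor the free, private local model;
    # without a GPU, fall back to a cloud API.
    if privacy_required:
        return "local"
    if not has_gpu:
        return "openai"  # or "replicate" for lower per-image cost
    if batch_job:
        return "local"  # avoids per-image API charges
    return "local"
```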
## Quick Start
### 1. Check Available Providers
```bash
python scripts/multi-model/image-gen/cli.py --list
```
### 2. Setup Local SDXL (Recommended)
```bash
# First-time setup (downloads ~7GB)
python scripts/multi-model/image-gen/cli.py --setup local
```
### 3. Generate Images
```bash
# Auto-selects best available provider
python scripts/multi-model/image-gen/cli.py "A sunset over mountains" output.png
# LinkedIn banner size
python scripts/multi-model/image-gen/cli.py "Tech concept" banner.png --width 1200 --height 630
# Specific provider
python scripts/multi-model/image-gen/cli.py "A cat" cat.png --provider openai
```
## Integration with Visual Art Composition
For professional-quality images, combine with `visual-art-composition`:
```
Step 1: visual-art-composition (Structure the prompt)
    |
    +---> 13-dimension aesthetic framework
    +---> Cross-cultural synthesis
    +---> Productive tension resolution
    |
    v
Step 2: image-gen (Generate the image)
    |
    +---> Select best provider (local or API)
    +---> Generate high-quality image
    +---> Save to specified path
```
### Example Pipeline
```bash
# 1. Get structured prompt from visual-art-composition
/visual-art-composition "tech dashboard for productivity app"

# 2. Generate with structured prompt
python scripts/multi-model/image-gen/cli.py \
  "Dashboard UI with linear perspective depth, composed blues and warm golds,
   focal hierarchy with clear primary metric, notan two-value contrast.
   Modern professional aesthetic, clean geometric forms." \
  docs/images/dashboard.png --width 1200 --height 630
```
## Provider Setup
### Local SDXL Lightning (Recommended)
**Requirements:**
- GPU with 8GB+ VRAM (or CPU with 16GB+ RAM, slower)
- ~7GB disk space on D: drive
- Python with diffusers, torch
**Setup:**
```bash
python scripts/multi-model/image-gen/cli.py --setup local
```
**Environment Variables (optional):**
```bash
export SDXL_MODEL_DIR="D:/AI-Models/sdxl-lightning"
```
### OpenAI DALL-E 3
**Requirements:**
- OpenAI API key
- ~$0.04 per image
**Setup:**
```bash
export OPENAI_API_KEY="sk-..."
python scripts/multi-model/image-gen/cli.py --setup openai
```
### Replicate
**Requirements:**
- Replicate API token
- ~$0.01 per image
**Setup:**
```bash
export REPLICATE_API_TOKEN="r8_..."
python scripts/multi-model/image-gen/cli.py --setup replicate
```
## Adding Custom Providers
Create a new provider by implementing `ImageGeneratorBase`:
```python
from base import ImageGeneratorBase, ImageProvider, ProviderRegistry

class MyCustomGenerator(ImageGeneratorBase):
    provider = ImageProvider.CUSTOM

    def is_available(self) -> bool:
        # Check if provider is configured (API key set, model present, etc.)
        return True

    def setup(self) -> bool:
        # Download models, verify API keys, etc.
        return True

    def generate(self, prompt, output_path, config=None):
        # Generate the image, write it to output_path,
        # and return a GeneratedImage describing the result
        pass

# Register so ProviderRegistry can discover and select the provider
ProviderRegistry.register(ImageProvider.CUSTOM, MyCustomGenerator)
```
## Python API
```python
from scripts.multi_model.image_gen.base import ProviderRegistry, ImageConfig

# Get best available provider
provider = ProviderRegistry.get_best_available()

# Configure
config = ImageConfig(
    width=1200,
    height=630,
    num_inference_steps=4
)

# Generate
result = provider.generate(
    prompt="A beautiful sunset",
    output_path="output.png",
    config=config
)
print(f"Generated: {result.path} in {result.generation_time_seconds}s")
```
## Batch Generation
```python
prompts = [
    "Sunset over mountains",
    "City skyline at night",
    "Forest in autumn"
]

results = provider.generate_batch(
    prompts=prompts,
    output_dir="./images/",
    config=config
)
```
## Best Practices
### Prompt Engineering
1. Be specific about composition, colors, style
2. Include negative prompts for local models
3. Use visual-art-composition for professional quality
4. Specify aspect ratio in prompt when needed
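Points 1 and 2 can be combined in a small prompt-building sketch. The helper below is hypothetical and backend-agnostic, since whether `ImageConfig` accepts a negative prompt is not stated in this document; negative prompts apply to local SDXL, not DALL-E 3:

```python
def build_prompt(subject: str, style: str, colors: str,
                 negative: str = "blurry, low quality, watermark") -> tuple[str, str]:
    # Be specific: combine subject, style, and color direction
    # into one positive prompt, and keep a reusable negative prompt.
    positive = f"{subject}, {style}, {colors}"
    return positive, negative

pos, neg = build_prompt("tech dashboard UI",
                        "clean geometric forms",
                        "composed blues and warm golds")
```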
### Performance
1. Local models: First generation is slow (model loading), subsequent are fast
2. API models: Consistent speed, watch for rate limits
3. Batch generation: More efficient than individual calls
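The first-generation cost for local models comes from loading the ~7GB model; a minimal caching sketch of that behavior, with a hypothetical `load_model` callable standing in for the real pipeline loader:

```python
_pipeline = None

def get_pipeline(load_model):
    # Load the model once (slow); every later call reuses the
    # cached pipeline, which is why only the first generation is slow.
    global _pipeline
    if _pipeline is None:
        _pipeline = load_model()
    return _pipeline
```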
### Quality
1. SDXL Lightning: 4 steps is optimal (more steps = minimal improvement)
2. DALL-E 3: No step control, always high quality
3. Always validate output matches intent
## Related Skills
- `visual-art-composition`: 13-dimension aesthetic framework for structured prompts
- `prompt-architect`: General prompt optimization
- `pptx-generation`: Uses images for presentation slides
## Troubleshooting
### "No provider available"
- Run `--list` to see what's configured
- Run `--setup local` to download SDXL Lightning
- Or set API keys for cloud providers
### Out of VRAM
- Use CPU mode (slower): Set `SDXL_DEVICE=cpu`
- Use API provider instead
- Reduce image size
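One way the `SDXL_DEVICE` override might be honored is sketched below; this is an illustrative function, not the actual implementation in `local_sdxl.py`:

```python
import os

def resolve_device(cuda_available: bool) -> str:
    # SDXL_DEVICE=cpu (per the troubleshooting note) overrides auto-detection
    override = os.environ.get("SDXL_DEVICE")
    if override:
        return override
    return "cuda" if cuda_available else "cpu"
```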
### Slow First Generation
- Normal for local models (loading ~7GB model)
- Subsequent generations are fast (~2-5 seconds)
### Poor Quality
- Use more descriptive prompts
- Apply visual-art-composition framework
- Try different provider
## Files
- CLI: `scripts/multi-model/image-gen/cli.py`
- Base classes: `scripts/multi-model/image-gen/base.py`
- Local SDXL: `scripts/multi-model/image-gen/local_sdxl.py`
- API providers: `scripts/multi-model/image-gen/api_providers.py`