image-gen
Modular image generation - supports local SDXL Lightning, OpenAI DALL-E, Replicate, or custom providers
Best use case
image-gen is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using image-gen should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/image-gen/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
SKILL.md Source
# Modular Image Generation Skill
---
## LIBRARY-FIRST PROTOCOL (MANDATORY)
**Before writing ANY code, you MUST check:**
### Step 1: Library Catalog
- Location: `.claude/library/catalog.json`
- If match >70%: REUSE or ADAPT
### Step 2: Patterns Guide
- Location: `.claude/docs/inventories/LIBRARY-PATTERNS-GUIDE.md`
- If pattern exists: FOLLOW documented approach
### Step 3: Existing Projects
- Location: `D:\Projects\*`
- If found: EXTRACT and adapt
### Decision Matrix
| Match | Action |
|-------|--------|
| Library >90% | REUSE directly |
| Library 70-90% | ADAPT minimally |
| Pattern exists | FOLLOW pattern |
| In project | EXTRACT |
| No match | BUILD (add to library after) |
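The decision matrix can be sketched as a small helper. This is an illustrative function, not part of the skill's codebase; the name `choose_action` and the 0–1 score scale are assumptions:

```python
def choose_action(library_match: float = 0.0,
                  pattern_exists: bool = False,
                  in_project: bool = False) -> str:
    # Mirror the decision matrix: library matches take priority,
    # then documented patterns, then code found in existing projects.
    if library_match > 0.90:
        return "REUSE directly"
    if library_match > 0.70:
        return "ADAPT minimally"
    if pattern_exists:
        return "FOLLOW pattern"
    if in_project:
        return "EXTRACT"
    return "BUILD (add to library after)"
```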
---
## Purpose
Generate images using the best available provider - local models (free, private) or cloud APIs (fast, paid). Fully modular architecture allows plugging in any image generation backend.
## Providers
| Provider | Type | Cost | Requirements | Quality |
|----------|------|------|--------------|---------|
| **SDXL Lightning** | Local | Free | 8GB VRAM, ~7GB disk | Excellent |
| **OpenAI DALL-E 3** | API | ~$0.04/image | OPENAI_API_KEY | Excellent |
| **Replicate** | API | ~$0.01/image | REPLICATE_API_TOKEN | Good |
| **Custom** | Any | Varies | User-defined | Varies |
## When to Use
### Perfect For:
- Blog banners and social media images (LinkedIn: 1200x630)
- Documentation diagrams and illustrations
- UI mockups and wireframes
- Concept visualization
- Any image generation need
### Provider Selection:
- **Privacy required?** -> Use local SDXL
- **No GPU?** -> Use OpenAI or Replicate API
- **Batch generation?** -> Use local (no API costs)
- **Highest quality?** -> DALL-E 3 or SDXL Lightning
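The selection heuristics above can be expressed as a hypothetical helper (the function and its provider-name strings are for illustration only; the actual CLI auto-selects via `ProviderRegistry`):

```python
def pick_provider(privacy_required: bool, has_gpu: bool, batch_job: bool) -> str:
    # Privacy and batch jobs favor the free, private local model;
    # without a GPU, fall back to a cloud API.
    if privacy_required:
        return "local"
    if not has_gpu:
        return "openai"  # or "replicate" for lower per-image cost
    if batch_job:
        return "local"  # avoids per-image API charges
    return "local"
```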
## Quick Start
### 1. Check Available Providers
```bash
python scripts/multi-model/image-gen/cli.py --list
```
### 2. Setup Local SDXL (Recommended)
```bash
# First-time setup (downloads ~7GB)
python scripts/multi-model/image-gen/cli.py --setup local
```
### 3. Generate Images
```bash
# Auto-selects best available provider
python scripts/multi-model/image-gen/cli.py "A sunset over mountains" output.png
# LinkedIn banner size
python scripts/multi-model/image-gen/cli.py "Tech concept" banner.png --width 1200 --height 630
# Specific provider
python scripts/multi-model/image-gen/cli.py "A cat" cat.png --provider openai
```
## Integration with Visual Art Composition
For professional-quality images, combine with `visual-art-composition`:
```
Step 1: visual-art-composition (Structure the prompt)
    |
    +---> 13-dimension aesthetic framework
    +---> Cross-cultural synthesis
    +---> Productive tension resolution
    |
    v
Step 2: image-gen (Generate the image)
    |
    +---> Select best provider (local or API)
    +---> Generate high-quality image
    +---> Save to specified path
```
### Example Pipeline
```bash
# 1. Get structured prompt from visual-art-composition
/visual-art-composition "tech dashboard for productivity app"

# 2. Generate with structured prompt
python scripts/multi-model/image-gen/cli.py \
  "Dashboard UI with linear perspective depth, composed blues and warm golds,
   focal hierarchy with clear primary metric, notan two-value contrast.
   Modern professional aesthetic, clean geometric forms." \
  docs/images/dashboard.png --width 1200 --height 630
```
## Provider Setup
### Local SDXL Lightning (Recommended)
**Requirements:**
- GPU with 8GB+ VRAM (or CPU with 16GB+ RAM, slower)
- ~7GB disk space on D: drive
- Python with diffusers, torch
**Setup:**
```bash
python scripts/multi-model/image-gen/cli.py --setup local
```
**Environment Variables (optional):**
```bash
export SDXL_MODEL_DIR="D:/AI-Models/sdxl-lightning"
```
### OpenAI DALL-E 3
**Requirements:**
- OpenAI API key
- ~$0.04 per image
**Setup:**
```bash
export OPENAI_API_KEY="sk-..."
python scripts/multi-model/image-gen/cli.py --setup openai
```
### Replicate
**Requirements:**
- Replicate API token
- ~$0.01 per image
**Setup:**
```bash
export REPLICATE_API_TOKEN="r8_..."
python scripts/multi-model/image-gen/cli.py --setup replicate
```
## Adding Custom Providers
Create a new provider by implementing `ImageGeneratorBase`:
```python
from base import ImageGeneratorBase, ImageProvider, ProviderRegistry

class MyCustomGenerator(ImageGeneratorBase):
    provider = ImageProvider.CUSTOM

    def is_available(self) -> bool:
        # Check if provider is configured (API key set, model present, etc.)
        return True

    def setup(self) -> bool:
        # Download models, verify API keys, etc.
        return True

    def generate(self, prompt, output_path, config=None):
        # Generate the image, write it to output_path,
        # and return a GeneratedImage describing the result
        pass

# Register so ProviderRegistry can discover and select the provider
ProviderRegistry.register(ImageProvider.CUSTOM, MyCustomGenerator)
```
## Python API
```python
from scripts.multi_model.image_gen.base import ProviderRegistry, ImageConfig

# Get best available provider
provider = ProviderRegistry.get_best_available()

# Configure
config = ImageConfig(
    width=1200,
    height=630,
    num_inference_steps=4
)

# Generate
result = provider.generate(
    prompt="A beautiful sunset",
    output_path="output.png",
    config=config
)
print(f"Generated: {result.path} in {result.generation_time_seconds}s")
```
## Batch Generation
```python
prompts = [
    "Sunset over mountains",
    "City skyline at night",
    "Forest in autumn"
]

results = provider.generate_batch(
    prompts=prompts,
    output_dir="./images/",
    config=config
)
```
## Best Practices
### Prompt Engineering
1. Be specific about composition, colors, style
2. Include negative prompts for local models
3. Use visual-art-composition for professional quality
4. Specify aspect ratio in prompt when needed
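Points 1 and 2 can be combined in a small prompt-building sketch. The helper below is hypothetical and backend-agnostic, since whether `ImageConfig` accepts a negative prompt is not stated in this document; negative prompts apply to local SDXL, not DALL-E 3:

```python
def build_prompt(subject: str, style: str, colors: str,
                 negative: str = "blurry, low quality, watermark") -> tuple[str, str]:
    # Be specific: combine subject, style, and color direction
    # into one positive prompt, and keep a reusable negative prompt.
    positive = f"{subject}, {style}, {colors}"
    return positive, negative

pos, neg = build_prompt("tech dashboard UI",
                        "clean geometric forms",
                        "composed blues and warm golds")
```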
### Performance
1. Local models: First generation is slow (model loading), subsequent are fast
2. API models: Consistent speed, watch for rate limits
3. Batch generation: More efficient than individual calls
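The first-generation cost for local models comes from loading the ~7GB model; a minimal caching sketch of that behavior, with a hypothetical `load_model` callable standing in for the real pipeline loader:

```python
_pipeline = None

def get_pipeline(load_model):
    # Load the model once (slow); every later call reuses the
    # cached pipeline, which is why only the first generation is slow.
    global _pipeline
    if _pipeline is None:
        _pipeline = load_model()
    return _pipeline
```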
### Quality
1. SDXL Lightning: 4 steps is optimal (more steps = minimal improvement)
2. DALL-E 3: No step control, always high quality
3. Always validate output matches intent
## Related Skills
- `visual-art-composition`: 13-dimension aesthetic framework for structured prompts
- `prompt-architect`: General prompt optimization
- `pptx-generation`: Uses images for presentation slides
## Troubleshooting
### "No provider available"
- Run `--list` to see what's configured
- Run `--setup local` to download SDXL Lightning
- Or set API keys for cloud providers
### Out of VRAM
- Use CPU mode (slower): Set `SDXL_DEVICE=cpu`
- Use API provider instead
- Reduce image size
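One way the `SDXL_DEVICE` override might be honored is sketched below; this is an illustrative function, not the actual implementation in `local_sdxl.py`:

```python
import os

def resolve_device(cuda_available: bool) -> str:
    # SDXL_DEVICE=cpu (per the troubleshooting note) overrides auto-detection
    override = os.environ.get("SDXL_DEVICE")
    if override:
        return override
    return "cuda" if cuda_available else "cpu"
```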
### Slow First Generation
- Normal for local models (loading ~7GB model)
- Subsequent generations are fast (~2-5 seconds)
### Poor Quality
- Use more descriptive prompts
- Apply visual-art-composition framework
- Try different provider
## Files
- CLI: `scripts/multi-model/image-gen/cli.py`
- Base classes: `scripts/multi-model/image-gen/base.py`
- Local SDXL: `scripts/multi-model/image-gen/local_sdxl.py`
- API providers: `scripts/multi-model/image-gen/api_providers.py`