image-generator

Generate professional visuals using Gemini via browser automation with 6-gate quality control. Use when creating chapter illustrations, diagrams, or teaching visuals. NOT for stock photos or decorative images.

25 stars

Best use case

image-generator is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Generate professional visuals using Gemini via browser automation with 6-gate quality control. Use when creating chapter illustrations, diagrams, or teaching visuals. NOT for stock photos or decorative images.

Teams using image-generator should expect a more consistent output, faster repeated execution, less prompt rewriting, better workflow continuity with your supporting tools.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.
  • You already have the supporting tools or dependencies needed by this skill.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/image-generator/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/aiskillstore/marketplace/92bilal26/image-generator/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/image-generator/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How image-generator Compares

Feature / Agentimage-generatorStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Generate professional visuals using Gemini via browser automation with 6-gate quality control. Use when creating chapter illustrations, diagrams, or teaching visuals. NOT for stock photos or decorative images.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Image Generator

Generate professional teaching visuals using Gemini 3 with multi-turn reasoning partnership.

## Quick Start

```bash
# 1. Start browser (via browser-use skill)
bash .claude/skills/browser-use/scripts/start-server.sh

# 2. Navigate to Gemini
# Use browser_navigate to https://gemini.google.com/

# 3. Generate image from creative brief
# Paste creative brief → Wait 30-35s → Verify 6 gates → Download
```

## Core Principles

1. **Reasoning over prediction** - Creative briefs (Story/Intent/Metaphor) activate reasoning; pixel specs don't
2. **Multi-turn partnership** - Teach Gemini your standards through principle-based feedback
3. **6-gate quality** - Explicit pass/fail before download
4. **Autonomous batch** - No permission-asking between visuals

## Input: Creative Brief Format

Receive from visual-asset-workflow:

```markdown
## The Story
[Narrative about what's visualized]

## Emotional Intent
[What it should FEEL like]

## Visual Metaphor
[Universal concept for instant comprehension]

## Subject / Composition / Action / Location / Style
[Gemini 3 prompt structure]

## Color Semantics
Blue (#2563eb) = Authority | Green (#10b981) = Execution

## Typography Hierarchy
Largest: Key insight | Medium: Supporting | Smallest: Context
```

**Do NOT convert to pixel specs** - use as-is to activate reasoning.

## Workflow (Per Visual)

| Step | Action | Tool |
|------|--------|------|
| 1 | Navigate to gemini.google.com | browser_navigate |
| 2 | Select "🍌 Create Image" | browser_click |
| 3 | Paste creative brief | browser_type |
| 4 | Wait 30-35 seconds | browser_wait_for |
| 5 | Verify 6 gates (below) | Visual inspection |
| 6 | If fail: Iterate with feedback (max 3) | browser_type |
| 7 | If pass: Download full size | browser_click |
| 8 | Copy to `apps/learn-app/static/img/part-{N}/chapter-{NN}/` | Bash |
| 9 | Embed in lesson immediately | Edit |
| 10 | NEW CHAT for next visual | browser_navigate |

## Quality Gates (ALL Must Pass)

| Gate | Criterion | Fail Action |
|------|-----------|-------------|
| 1. Spelling | 99% accuracy (Y-Combinator, Kubernetes) | Iterate |
| 2. Layout | Proportions match prompt (2×2 not 3×1) | Iterate |
| 3. Color | Brand colors match (#2563eb not #002050) | Iterate |
| 4. Typography | Largest = key concept (not decoration) | Iterate |
| 5. Teaching | <5 sec concept grasp at target proficiency | Iterate |
| 6. Uniqueness | Not duplicate of existing chapter image | New chat |

**Decision**: ALL pass → Download | ANY fail → Iterate (max 3 tries)

## Iteration: Principle-Based Feedback

When gate fails, provide teaching feedback:

```
Gate 4 FAILED: Typography hierarchy incorrect

The largest text is "$100K" (supporting detail) but should be "$3T"
(key insight students must grasp).

Increase '$3T' to dominant size. Reduce '$100K' to supporting size.
Information importance drives sizing.
```

## Batch Mode

When invoked with "generate all visuals":

```
For EACH visual in list:
  A. NEW CHAT (context isolation)
  B. Generate (paste brief)
  C. Verify 6 gates
  D. Iterate if needed (max 3)
  E. Download when pass
  F. Embed in lesson
  G. Log "✅ N/M"
  H. NEXT (no stopping)
```

**Never ask**: "Continue?" "Pause here?" "Review?"

**Report at END only**:
```
BATCH COMPLETE
✅ Generated: 16/18
⚠️ Deferred: 2 (quality issues)
Location: apps/learn-app/static/img/part-{N}/
```

## Proficiency Limits

| Level | Max Elements | Grasp Time |
|-------|--------------|------------|
| A2 | 5-7 | <5 sec |
| B1 | 7-10 | <10 sec |
| C2 | No limit | N/A |

## Token Conservation (Batch Mode)

For >8 visuals, condense briefs:

**Original** (250 tokens):
```
"Top Layer shows Coordinator at center top with label 'Orchestrator'
featuring conductor icon, with role 'Strategic oversight'..."
```

**Condensed** (80 tokens):
```
"Top Layer - Coordinator: Center top, 'Orchestrator' (conductor),
Role: 'Strategic oversight', Gold (#fbbf24), Large hexagon."
```

Keep: Story, Intent, Metaphor, Colors, Reasoning
Condense: Long examples → Short labels

## Anti-Patterns

| Don't | Why |
|-------|-----|
| Accept first output without 6 gates | Quality standard violation |
| Ask permission between batch items | Breaks autonomous agency |
| Convert briefs to pixel specs | Defeats reasoning activation |
| Skip embedding step | Creates orphan images |
| Reuse same chat for next visual | Context contamination |

## Session Interruption

If session ends mid-batch, create checkpoint:

```markdown
# Checkpoint: Part {N}
Status: INTERRUPTED at 8/18

## Completed:
- ✅ Image 1: filename (embedded lesson-01.md)
- ✅ Image 2: filename (embedded lesson-02.md)

## Remaining:
- ⏳ Image 8: filename
```

On continuation: Read checkpoint → Resume → Update incrementally

## Success Indicators

- ✅ All 6 gates verified before download
- ✅ Batch completion without permission-asking
- ✅ Principle-based iteration feedback
- ✅ Images organized by part/chapter
- ✅ Immediate embedding (no orphans)
- ✅ >85% production-ready rate

Related Skills

human-browser-use Skill

3891
from openclaw/skills

> Human-like browser automation extension for [browser-use](https://github.com/browser-use/browser-use).

browser-use

3891
from openclaw/skills

AI-powered browser automation for complex multi-step web workflows. Uses Browser-Use framework when OpenClaw's built-in browser tool can't handle login flows, anti-bot sites, or 5+ step sequences.

browser-use

1864
from LeoYeAI/openclaw-master-skills

Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, or extract information from web pages.

fast-browser-use

533
from sundial-org/awesome-openclaw-skills

No description provided.

browser-use

533
from sundial-org/awesome-openclaw-skills

Use Browser Use cloud API to spin up cloud browsers for Clawdbot and run autonomous browser tasks. Primary use is creating browser sessions with profiles (persisted logins/cookies) that Clawdbot can control. Secondary use is running task subagents for fast autonomous browser automation. Docs at docs.browser-use.com and docs.cloud.browser-use.com.

browser-use-2

533
from sundial-org/awesome-openclaw-skills

Cloud browser automation via Browser Use API. Use when you need AI-driven web browsing, scraping, form filling, or multi-step web tasks without local browser control. Triggers on "browser use", "cloud browser", "scrape website", "automate web task", or when local browser isn't available/suitable.

browser-use

242
from aiskillstore/marketplace

Browser automation using Playwright MCP. Navigate websites, fill forms, click elements, take screenshots, and extract data. Use when tasks require web browsing, form submission, web scraping, UI testing, or any browser interaction.

browser-use

197
from mxyhi/ok-skills

Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, or extract information from web pages.

Browser-Use Automation

97
from PramodDutta/qaskills

CLI tool for persistent browser automation with multi-session support, featuring Chromium/Real/Remote browser modes, cookie management, JavaScript execution, and long-running automation workflows.

browser-use

55
from wanikua/become-ceo

Automates browser interactions for social media management across Instagram, LinkedIn, and X. Handles posting, DMs, connection requests, lead scraping, and monitoring. Use when the user needs to navigate, interact with, or extract data from approved websites.

browser-use

50
from vm0-ai/vm0-skills

Browser Use Cloud API for AI-powered browser automation. Use when user mentions "Browser Use", "browser automation", "web task", "AI agent browser", "run browser task", or "automated browser session".

fast-browser-use

33
from aAAaqwq/AGI-Super-Team

Use when the user wants extremely fast browser automation via fast-browser-use / fbu, especially for DOM-heavy pages, fast extraction, or browser tasks on macOS/Linux with Chrome installed.