media-generation

Generate images, edit existing images, create short videos, run inpainting/outpainting and object-focused edits, use reference images as provider inputs, batch related media jobs from a manifest, and fetch returned media from URLs/HTML/JSON/data URLs/base64. Use when working on AI image generation, AI image editing, mask-based inpainting, outpainting, reference-image workflows, short AI video generation, product-shot variations, or reusable media-production pipelines.

1,864 stars

by LeoYeAI


Best use case

media-generation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using media-generation can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/media-generation/SKILL.md --create-dirs "https://raw.githubusercontent.com/LeoYeAI/openclaw-master-skills/main/skills/media-generation/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/media-generation/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How media-generation Compares

Feature / Agent	media-generation	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

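What does this skill do?

It handles image generation, image editing, and short video generation through one workflow, using the bundled helper scripts and saving outputs under `tmp/images/` or `tmp/videos/`.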

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Media Generation

Handle image generation, image editing, and short video generation through one workflow: choose the right modality, pass caller intent through to the provider, save outputs under `tmp/images/` or `tmp/videos/`, and prefer the bundled helpers over ad-hoc one-off API calls.

## Workflow decision

- If the user wants a brand-new still image, use an image-generation model.
- If the user supplies an image or wants a specific existing image changed, use an image-edit workflow.
- If the user wants motion / a clip / a short video, use a video-generation model.
- If the request includes one or more reference images, use the helper that supports reference-image transport.
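
The decision rules above can be sketched as a small routing function. This is an illustration only: the script paths come from this document, but `choose_helper`, its arguments, and the precedence order (references first, then video, then edit) are assumptions, not the skill's actual logic.

```python
from typing import Optional, Sequence

# Script paths are taken from this document; the lookup keys are hypothetical.
HELPERS = {
    "reference": "scripts/generate_consistent_media.py",
    "video": "scripts/generate_video.py",
    "edit": "scripts/edit_image.py",
    "image": "scripts/generate_image.py",
}


def choose_helper(wants_motion: bool = False,
                  source_image: Optional[str] = None,
                  reference_images: Sequence[str] = ()) -> str:
    """Pick a helper script for a request (assumed precedence order)."""
    if reference_images:       # reference images need the transport helper
        return HELPERS["reference"]
    if wants_motion:           # motion / clip / short video
        return HELPERS["video"]
    if source_image:           # an existing image should be changed
        return HELPERS["edit"]
    return HELPERS["image"]    # brand-new still image
```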

## Standard workflow

1. Determine whether the task is image generation, image editing, or video generation.
2. Clarify only when required to execute the request correctly.
3. Prefer `scripts/generate_image.py` for still-image generation.
4. Prefer `scripts/edit_image.py` for direct image edits.
5. Prefer `scripts/mask_inpaint.py` for localized edits with masks or generated regions.
6. Prefer `scripts/outpaint_image.py` for canvas expansion / outpainting.
7. Prefer `scripts/generate_consistent_media.py` when reference images need to be passed through.
8. Prefer `scripts/generate_video.py` for video generation, especially when the provider may return async job payloads.
9. Prefer `scripts/generate_batch_media.py` for repeatable batch jobs, templated variations, or auditable manifests.
10. Prefer `scripts/object_select_edit.py` for simple object-vs-background edits on transparent assets or clean backdrops.
11. If the provider returns a URL, path, HTML snippet, markdown snippet, `data:` URL, or `b64_json`, use `scripts/fetch_generated_media.py`.
12. Save outputs under:
    - images → `tmp/images/`
    - videos → `tmp/videos/`
13. If the user wants files sent in chat, prefer sending the local downloaded file.
14. Keep the original remote reference as fallback when local retrieval fails.

## Prompt handling

Default to **prompt pass-through**.

- Pass the caller's prompt through unchanged.
- Use optional request fields only when the caller provides them.
- Keep prompt semantics under caller control.

Use the scripts mainly as functional helpers:
- normalize arguments
- map fields to provider-specific JSON
- upload files
- poll async jobs
- download returned media
- save outputs under `tmp/images/` or `tmp/videos/`
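
A minimal sketch of what prompt pass-through could look like when building a request body. The `build_payload` name and the `model` / `prompt` JSON keys are assumptions for illustration; the real field mapping depends on the provider.

```python
from typing import Any, Dict


def build_payload(prompt: str, model: str, **optional: Any) -> Dict[str, Any]:
    """Build a request body without touching the caller's prompt."""
    payload: Dict[str, Any] = {"model": model, "prompt": prompt}
    for key, value in optional.items():
        # Only forward optional fields the caller actually provided.
        if value is not None:
            payload[key] = value
    return payload
```

For example, `build_payload("a red kettle", "image-model", size="1024x1024", seed=None)` keeps the prompt verbatim, includes `size`, and omits `seed` entirely rather than sending a default.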

## Delivery rules

- Save generated or edited images in `tmp/images/`.
- Save generated videos in `tmp/videos/`.
- Never scatter generated files in the workspace root.
- If message delivery blocks remote URLs, download locally first and then send the local file.
- If a remote file cannot be fetched locally but the raw link may still help, provide the original link clearly.
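
The delivery rules above might be enforced with two small helpers like the following. The function names and the `prefix-001.ext` naming scheme are hypothetical; only the `tmp/images` and `tmp/videos` directories come from this document.

```python
from pathlib import Path
from typing import Optional

# Output directories named in this document.
OUTPUT_DIRS = {"image": Path("tmp/images"), "video": Path("tmp/videos")}


def next_output_path(kind: str, prefix: str, ext: str, root: Path = Path(".")) -> Path:
    """Return an unused path under tmp/images or tmp/videos (assumed naming)."""
    out_dir = root / OUTPUT_DIRS[kind]
    out_dir.mkdir(parents=True, exist_ok=True)
    index = 1
    while True:
        candidate = out_dir / f"{prefix}-{index:03d}.{ext}"
        if not candidate.exists():
            return candidate
        index += 1


def pick_deliverable(local_path: Optional[Path], remote_url: Optional[str]) -> str:
    """Prefer the local file; fall back to the original remote link."""
    if local_path is not None and local_path.exists():
        return str(local_path)
    if remote_url:
        return remote_url
    raise FileNotFoundError("no local file and no remote reference available")
```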

## Image generation helper

Use `scripts/generate_image.py` for direct still-image generation.

Example:

```bash
python3 skills/media-generation/scripts/generate_image.py \
  --prompt 'person' \
  --size '1024x1024' \
  --out-dir 'tmp/images' \
  --prefix 'generated'
```

The helper:
- reads provider credentials from OpenClaw config (`~/.openclaw/openclaw.json` by default, or `--config` / `$OPENCLAW_CONFIG`)
- calls `/images/generations` by default
- supports `size`, `quality`, `style`, `background`, `n`, `seed`, `extra-json`, and `extra-json-file`
- downloads the returned image into `tmp/images/` by default
- handles providers that reply with URL/path, `data:` URL, or `b64_json`

## Image edit helper

Use `scripts/edit_image.py` for direct image-edit calls.

Example:

```bash
python3 skills/media-generation/scripts/edit_image.py \
  --image 'tmp/images/source.jpg' \
  --prompt 'replace the background' \
  --out-dir 'tmp/images' \
  --prefix 'edited'
```

The helper:
- reads provider credentials from OpenClaw config
- calls `/images/edits` by default
- supports optional `--mask` input for localized edits
- downloads the returned image into `tmp/images/` by default
- handles URL/path, `data:` URL, or `b64_json`

## Mask inpaint helper

Use `scripts/mask_inpaint.py` for localized repainting tasks.

Example:

```bash
python3 skills/media-generation/scripts/mask_inpaint.py \
  --image 'tmp/images/source.jpg' \
  --x 120 --y 80 --width 220 --height 180 \
  --prompt 'replace the masked area' \
  --out-dir 'tmp/images' \
  --prefix 'mask-result'
```

The helper:
- accepts either an existing `--mask` image or generated regions
- supports rectangle / ellipse regions and repeatable `--region` specs
- supports percentage-based regions like `rect-pct` / `ellipse-pct`
- supports `--expand` / `--shrink` before feathering
- supports `--mask-only` for local preparation / testing without a live API call
- forwards `--config`, `--provider`, `--model`, and `--endpoint` to `scripts/edit_image.py`
- reuses `scripts/edit_image.py` for the final edit call
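
As a rough illustration of percentage-based regions, the sketch below converts a percentage rectangle into a pixel box and grows it before any feathering. The function name, the argument order, and the assumption that percentages are relative to the image size while `expand` is in pixels are all hypothetical; check `scripts/mask_inpaint.py` for the real `--region` syntax.

```python
from typing import Tuple


def rect_pct_to_box(img_w: int, img_h: int,
                    x_pct: float, y_pct: float,
                    w_pct: float, h_pct: float,
                    expand: int = 0) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) in pixels, clamped to the image."""
    left = round(img_w * x_pct / 100) - expand
    top = round(img_h * y_pct / 100) - expand
    right = round(img_w * (x_pct + w_pct) / 100) + expand
    bottom = round(img_h * (y_pct + h_pct) / 100) + expand
    # Clamp so an expanded region never leaves the canvas.
    return (max(0, left), max(0, top), min(img_w, right), min(img_h, bottom))
```

A negative `expand` value shrinks the box instead.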

## Outpaint helper

Use `scripts/outpaint_image.py` for extension / canvas expansion tasks.

Example:

```bash
python3 skills/media-generation/scripts/outpaint_image.py \
  --image 'tmp/images/source.jpg' \
  --left 512 --right 512 --top 128 --bottom 128 \
  --mode blur \
  --prompt 'extend outward' \
  --out-dir 'tmp/images' \
  --prefix 'outpaint-result'
```

The helper:
- expands the canvas locally before calling the model
- supports directional expansion on each side
- supports `transparent`, `blur`, and `solid` initialization modes
- forwards `--config`, `--provider`, `--model`, and `--endpoint` to `scripts/edit_image.py`
- reuses `scripts/edit_image.py` for the final edit call
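
The local canvas expansion comes down to simple arithmetic, sketched here under assumptions: `expanded_canvas` is a hypothetical name, and the idea that the original pixels form a "keep" box inside the new canvas is an illustration rather than a description of the bundled script.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]


def expanded_canvas(width: int, height: int,
                    left: int = 0, right: int = 0,
                    top: int = 0, bottom: int = 0) -> Tuple[Tuple[int, int], Tuple[int, int], Box]:
    """Return (new_size, paste_offset, keep_box) for a directional expansion."""
    new_size = (width + left + right, height + top + bottom)
    paste_offset = (left, top)  # where the original image lands on the new canvas
    keep_box = (left, top, left + width, top + height)  # region the model should preserve
    return new_size, paste_offset, keep_box
```

With a 1024x768 source and the paddings from the example command (512 left and right, 128 top and bottom), the canvas becomes 2048x1024 and the original sits at offset (512, 128).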

## Reference-image helper

Use `scripts/generate_consistent_media.py` when one or more reference images need to be passed through to the provider.

Note: the script name is historical; its current role is reference-image transport and delegation.

Example:

```bash
python3 skills/media-generation/scripts/generate_consistent_media.py \
  --mode image \
  --reference-image 'tmp/images/reference.png' \
  --prompt 'character' \
  --size '1024x1024' \
  --out-dir 'tmp/images' \
  --prefix 'reference-output'
```

The helper:
- can pass encoded reference images in provider JSON (default key: `reference_images`)
- can retry without provider-json references when transport is `auto`
- delegates to `scripts/generate_image.py` or `scripts/generate_video.py`
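
One way encoded references could be attached to provider JSON is shown below. Only the default key `reference_images` comes from this document; the `attach_references` name and the choice of base64 `data:` URLs as the encoding are assumptions, and a given provider may expect a different shape.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Any, Dict, Iterable, Union


def attach_references(payload: Dict[str, Any],
                      paths: Iterable[Union[str, Path]],
                      key: str = "reference_images") -> Dict[str, Any]:
    """Return a copy of payload with reference images encoded under `key`."""
    encoded = []
    for path in paths:
        mime = mimetypes.guess_type(str(path))[0] or "application/octet-stream"
        data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
        encoded.append(f"data:{mime};base64,{data}")
    # Copy instead of mutating, so a retry without references can reuse the original.
    return {**payload, key: encoded}
```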

## Batch generation helper

Use `scripts/generate_batch_media.py` when the user wants several related outputs, repeatable batch rendering, or a manifest-driven workflow.

Example:

```bash
python3 skills/media-generation/scripts/generate_batch_media.py \
  --manifest 'tmp/images/media-batch.jsonl' \
  --vars-json '{"subject":"item"}' \
  --summary-out 'tmp/images/media-batch-summary.json' \
  --continue-on-error \
  --print-json
```

The helper supports:
- JSON array or JSONL manifests
- image generation, video generation, and reference-image generation
- shared templating vars via `--vars-json` or `--vars-file`
- item-local `vars` objects for per-item string rendering such as `{index}`
- `--summary-out` to persist the resolved batch result JSON
- `--dry-run` to validate a manifest before spending live generation calls
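
The manifest formats and templating described above can be sketched as follows. This is not the bundled implementation: `load_manifest` and `render_items` are hypothetical names, `str.format` is an assumed templating mechanism, and a 1-based automatic `{index}` is a guess at how that variable is supplied.

```python
import json
from typing import Any, Dict, List


def load_manifest(text: str) -> List[Dict[str, Any]]:
    """Parse either a JSON array or a JSONL manifest."""
    stripped = text.strip()
    if stripped.startswith("["):
        return json.loads(stripped)
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]


def render_items(items: List[Dict[str, Any]],
                 shared_vars: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Render string fields with shared vars, item-local vars, and {index}."""
    rendered = []
    for index, item in enumerate(items, start=1):
        # Item-local vars override shared ones; index is injected per item.
        variables = {**shared_vars, **item.get("vars", {}), "index": index}
        rendered.append({
            key: value.format(**variables) if isinstance(value, str) else value
            for key, value in item.items() if key != "vars"
        })
    return rendered
```

Note that `str.format` treats every brace as a placeholder, so a prompt containing literal `{` or `}` would need escaping under this approach.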

## Object-select edit helper

Use `scripts/object_select_edit.py` when the source has a transparent background or a simple clean backdrop and the user wants a one-step object or background edit workflow.

Example:

```bash
python3 skills/media-generation/scripts/object_select_edit.py \
  --image 'tmp/images/product.png' \
  --selection-mode alpha \
  --edit-target background \
  --prompt 'replace the background' \
  --out-dir 'tmp/images' \
  --prefix 'product-bg-edit'
```

The helper:
- prepares an object/background mask with `prepare_object_mask.py`
- flips the mask automatically when editing the background instead of the object
- passes the prepared mask into `mask_inpaint.py`
- supports `--prepare-only` for local inspection/testing without a live edit call
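
The mask flip can be illustrated with plain lists standing in for an alpha channel. Everything here is an assumption for illustration: the `mask_from_alpha` name, the threshold, and the convention that 255 marks the region to edit may not match `prepare_object_mask.py`.

```python
from typing import List


def mask_from_alpha(alpha_rows: List[List[int]],
                    edit_target: str = "object",
                    threshold: int = 0) -> List[List[int]]:
    """Build a 0/255 mask from alpha values; flip it for background edits."""
    # Pixels with alpha above the threshold are treated as the object.
    mask = [[255 if a > threshold else 0 for a in row] for row in alpha_rows]
    if edit_target == "background":
        mask = [[255 - value for value in row] for row in mask]
    return mask
```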

## Video generation helper

Use `scripts/generate_video.py` for direct video-generation calls.

Example:

```bash
python3 skills/media-generation/scripts/generate_video.py \
  --prompt 'motion clip' \
  --size '720x1280' \
  --seconds 6 \
  --out-dir 'tmp/videos' \
  --prefix 'generated-video'
```

The helper:
- reads provider credentials from OpenClaw config
- calls `/videos` by default
- supports `size`, `seconds` / `duration`, `fps`, `seed`, optional input image, `extra-json`, and `extra-json-file`
- handles both immediate results and async job responses, polling when the provider returns job metadata instead of the final media
- downloads the returned video into `tmp/videos/` by default
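
Async polling generally follows the pattern below. This is a generic sketch, not the helper's code: `poll_job`, the `status` field, and the status strings are assumptions, since each provider defines its own job schema and polling path.

```python
import time
from typing import Any, Callable, Dict

DONE = {"succeeded", "completed"}   # assumed terminal success states
FAILED = {"failed", "cancelled"}    # assumed terminal failure states


def poll_job(fetch_status: Callable[[], Dict[str, Any]],
             interval: float = 2.0,
             timeout: float = 300.0,
             sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Call fetch_status until the job finishes, fails, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        status = job.get("status")
        if status in DONE:
            return job
        if status in FAILED:
            raise RuntimeError(f"video job ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("video job did not finish before the timeout")
        sleep(interval)
```

`fetch_status` is any callable that performs one status request and returns the decoded JSON, which keeps the loop independent of the HTTP client in use.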

## Retrieval helper

Use `scripts/fetch_generated_media.py` for both images and videos.
It can extract downloadable refs from markdown / HTML / JSON, and can also persist `data:` URLs or `b64_json` payloads directly to local files.
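
A simplified view of ref extraction: one regular expression can pull `http(s)` links out of markdown, HTML, and JSON text alike. The `extract_refs` function and its pattern are illustrative assumptions and are likely cruder than what the bundled script does (for instance, they ignore local paths and JSON-escaped slashes).

```python
import re
from typing import List

# Stop at whitespace, quotes, angle brackets, and closing ) or ] delimiters.
URL_RE = re.compile(r"https?://[^\s\"'<>)\]]+")


def extract_refs(text: str) -> List[str]:
    """Return unique http(s) URLs in order of first appearance."""
    refs: List[str] = []
    for match in URL_RE.findall(text):
        url = match.rstrip(".,;")  # drop trailing sentence punctuation
        if url and url not in refs:
            refs.append(url)
    return refs
```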

## Quick compatibility checklist

Before blaming the skill, check these first:
- config exists and is valid JSON
- `config.models.providers.<provider>` exists
- the selected provider has both `baseUrl` and `apiKey`
- the chosen endpoint actually exists on that provider
- the chosen model name is valid for that endpoint
- any provider-specific fields passed through `--extra-json` or `--extra-json-file` match that provider's schema

Defaults used by the bundled scripts:
- config path: `~/.openclaw/openclaw.json` or `$OPENCLAW_CONFIG`
- default provider: `$OPENCLAW_MEDIA_PROVIDER`, otherwise the first provider found in config
- default model names: placeholders unless overridden by env vars or `--model`
  - image → `$OPENCLAW_MEDIA_IMAGE_MODEL` or `image-model`
  - edit → `$OPENCLAW_MEDIA_EDIT_MODEL` or `image-edit-model`
  - video → `$OPENCLAW_MEDIA_VIDEO_MODEL` or `video-model`
- output root: `tmp/` or `$MEDIA_GENERATION_OUTPUT_ROOT`
- output paths are resolved relative to the current working directory unless you pass an absolute `--out-dir`
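
The checklist and defaults above can be approximated in a few lines. This sketch assumes the config file holds `models.providers` at its top level and that `--provider` / `--model` take precedence over environment variables; the function names are hypothetical, while the env var names and placeholder model names come from this document.

```python
from typing import Any, Dict, Mapping, Optional, Tuple

MODEL_ENV = {
    "image": "OPENCLAW_MEDIA_IMAGE_MODEL",
    "edit": "OPENCLAW_MEDIA_EDIT_MODEL",
    "video": "OPENCLAW_MEDIA_VIDEO_MODEL",
}
PLACEHOLDERS = {"image": "image-model", "edit": "image-edit-model", "video": "video-model"}


def resolve_provider(config: Dict[str, Any], env: Mapping[str, str],
                     cli_provider: Optional[str] = None) -> Tuple[str, Dict[str, Any]]:
    """Pick a provider and verify it has baseUrl and apiKey."""
    providers = config.get("models", {}).get("providers", {})
    name = cli_provider or env.get("OPENCLAW_MEDIA_PROVIDER") or next(iter(providers), None)
    if name not in providers:
        raise KeyError(f"provider not found: {name!r}")
    missing = [field for field in ("baseUrl", "apiKey") if not providers[name].get(field)]
    if missing:
        raise ValueError(f"provider {name!r} is missing: {', '.join(missing)}")
    return name, providers[name]


def resolve_model(kind: str, env: Mapping[str, str],
                  cli_model: Optional[str] = None) -> str:
    """Fall back from --model to the env var to the placeholder name."""
    return cli_model or env.get(MODEL_ENV[kind]) or PLACEHOLDERS[kind]
```

Passing `os.environ` as `env` gives the real environment; a plain dict makes the resolution easy to test.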

## Quick troubleshooting

Common failure patterns:
- **`provider not found`** → pass `--provider` explicitly or set `$OPENCLAW_MEDIA_PROVIDER`
- **placeholder model warning (`image-model` / `image-edit-model` / `video-model`)** → pass `--model` explicitly or set the matching `$OPENCLAW_MEDIA_*_MODEL` env var
- **`config not found` / invalid JSON** → pass `--config` explicitly or fix the OpenClaw config file
- **HTTP 404** → check `--endpoint` and video polling paths
- **HTTP 400** → check model name and provider-specific payload fields in `--extra-json` / `--extra-json-file`
- **HTTP 401/403** → check the provider `apiKey`
- **request failed before HTTP response** → check base URL, proxy/TLS, or network reachability
- **video accepted then failed later** → check request payload, provider logs, or switch provider/model

Use `--print-json` when debugging so the response body, resolved endpoint, and failure hints stay visible.

## References

- Batch workflow reference: `references/batch-workflows.md`
- Model capability matrix: `references/model-capabilities.md`
- Reference-image workflow: `references/reference-image-workflow.md`
- Image generation helper: `scripts/generate_image.py`
- Reference-image helper: `scripts/generate_consistent_media.py`
- Image edit helper: `scripts/edit_image.py`
- Mask inpaint helper: `scripts/mask_inpaint.py`
- Outpaint helper: `scripts/outpaint_image.py`
- Video generation helper: `scripts/generate_video.py`
- Batch generation helper: `scripts/generate_batch_media.py`
- Object-select edit helper: `scripts/object_select_edit.py`
- Object mask prep helper: `scripts/prepare_object_mask.py`
- Shared request utility: `scripts/media_request_common.py`
- Smoke tests: `scripts/smoke_test.py`
- Unified fetch helper: `scripts/fetch_generated_media.py`
