Content & Documentation

openclaw-media-gen

Generate images & videos with AIsa. Gemini 3 Pro Image (image) + Qwen Wan 2.6 (video) via one API key.

3,891 stars

Complexity: easy

About this skill

This skill gives AI agents a single interface for generating images and videos. Images come from Google's Gemini 3 Pro Image model, for example a detailed cyberpunk cityscape or a cinematic animal portrait. Videos come from Qwen Wan 2.6 (Tongyi Wanxiang), which runs as an asynchronous task and produces short clips from a text prompt or a reference image. Both models sit behind one AIsa API key, so an agent does not have to manage separate API integrations.

Typical uses include prototyping visual assets for presentations or web content and generating short promotional or personalized videos. Because video generation is asynchronous, an agent can start a task and retrieve the result later, which fits into longer content production pipelines. The skill uses standard `curl` commands, so any agent that can run shell commands and manage an API key can use it.

Best use case

The primary use case is letting AI agents produce images and short videos efficiently. It suits content creators, marketers, and developers of AI-powered applications who need quick image and video generation through one API instead of a separate integration for each model.

Image generation returns a base64-encoded image directly in the response. Video generation returns a task ID, which is used later to retrieve the finished video file.

Practical example

Example input

Generate a cyberpunk city night scene, neon lights, rainy night, cinematic feel.

Example output

{"candidates":[{"parts":[{"inline_data":{"mime_type":"image/jpeg","data":"/9j/4AAQSkZJRg...<base64_image_data>..."}}]}]}
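A minimal sketch of turning a response like the one above into image bytes and files, assuming the `candidates[].parts[].inline_data` shape shown in the example output. The helper names, the output file stem, and the MIME-to-extension table are illustrative and not part of the skill; the fallback to a nested `content.parts` layout is a defensive assumption.

```python
import base64
import json
from pathlib import Path

# Hypothetical mapping from MIME type to file extension (illustrative only).
EXTENSIONS = {"image/jpeg": ".jpg", "image/png": ".png", "image/webp": ".webp"}


def extract_inline_images(response: dict) -> list[tuple[str, bytes]]:
    """Return (mime_type, raw_bytes) for every inline_data part in the response."""
    images = []
    for candidate in response.get("candidates", []):
        # Shape shown in the example output: candidates[].parts[].
        # Falling back to candidates[].content.parts[] is a defensive assumption.
        parts = candidate.get("parts") or candidate.get("content", {}).get("parts", [])
        for part in parts:
            inline = part.get("inline_data")
            if inline:
                images.append((inline["mime_type"], base64.b64decode(inline["data"])))
    return images


def save_inline_images(response: dict, stem: str = "out") -> list[Path]:
    """Write each decoded image to <stem>_<n><ext> and return the paths."""
    paths = []
    for index, (mime_type, raw) in enumerate(extract_inline_images(response)):
        path = Path(f"{stem}_{index}{EXTENSIONS.get(mime_type, '.bin')}")
        path.write_bytes(raw)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Tiny stand-in payload; a real response carries the full image data.
    sample = json.loads(
        '{"candidates":[{"parts":[{"inline_data":'
        '{"mime_type":"image/jpeg","data":"/9j/4AAQ"}}]}]}'
    )
    for mime_type, raw in extract_inline_images(sample):
        print(mime_type, len(raw), "bytes")
```

Keeping the decoding step separate from the file-writing step makes it possible to inspect or validate the bytes before anything touches disk.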

When to use this skill

When an AI agent needs to generate both still images and short videos from textual prompts.
For content creation tasks requiring a single API for multiple media types.
To quickly prototype visual assets for web, social media, or presentations.
When leveraging advanced models like Gemini 3 Pro Image and Qwen Wan 2.6 is desired without direct individual API integration.

When not to use this skill

When advanced video editing or highly specific frame-by-frame control is required.
For tasks demanding local image/video generation without API calls.
If the user does not have or cannot obtain an `AISA_API_KEY`.
When very long or complex video sequences are needed.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/aisa-media-gen/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/aisapay/aisa-media-gen/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/aisa-media-gen/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How openclaw-media-gen Compares

Feature / Agent	openclaw-media-gen	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	easy	N/A

Frequently Asked Questions

What does this skill do?

Generate images & videos with AIsa. Gemini 3 Pro Image (image) + Qwen Wan 2.6 (video) via one API key.

How difficult is it to install?

The installation complexity is rated as easy. You can find the installation instructions above.

Where can I find the source code?

The source code is on GitHub in the `openclaw/skills` repository. The installation command above points to this skill's SKILL.md there.

SKILL.md Source

# OpenClaw Media Gen 🎬

Generate **images** and **videos** with a single AIsa API key:

- **Image**: `gemini-3-pro-image-preview` (Gemini GenerateContent)
- **Video**: `wan2.6-t2v` (Tongyi Wanxiang / Qwen Wan 2.6, asynchronous task)

For the API documentation index, see the [AIsa API Reference](https://aisa.mintlify.app/api-reference/introduction) (all pages are listed at `https://aisa.mintlify.app/llms.txt`).

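## 🔥 What You Can Do

### Image Generation (Gemini)
```
"Generate a cyberpunk city night scene, neon lights, rainy night, cinematic feel"
```

### Video Generation (Wan 2.6)
```
"Generate a 5-second shot from a reference image: slow push-in, wind blowing through hair, cinematic feel, shallow depth of field"
```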

## Quick Start

```bash
export AISA_API_KEY="your-key"
```

---

## 🖼️ Image Generation (Gemini)

### Endpoint

- Base URL: `https://api.aisa.one/v1`
- `POST /models/{model}:generateContent`

Docs: `google-gemini-chat` (GenerateContent), see `https://aisa.mintlify.app/api-reference/chat/chat-api/google-gemini-chat.md`.

### curl example (a response containing `inline_data` is an image)

```bash
curl -X POST "https://api.aisa.one/v1/models/gemini-3-pro-image-preview:generateContent" \
  -H "Authorization: Bearer $AISA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents":[
      {"role":"user","parts":[{"text":"A cute red panda, ultra-detailed, cinematic lighting"}]}
    ]
  }'
```

> Note: the response may contain `candidates[].parts[].inline_data` (typically base64 data plus a MIME type); the client script parses it and saves the file automatically.
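For agents that prefer Python to a shell, the curl call above can be sketched with the standard library alone. This mirrors the endpoint, headers, and body from the curl example; the function names and the 120-second timeout are illustrative assumptions, and this is not the skill's bundled client.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.aisa.one/v1"
MODEL = "gemini-3-pro-image-preview"


def build_image_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the generateContent request from the curl example."""
    body = {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/models/{MODEL}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_image(prompt: str, timeout: float = 120.0) -> dict:
    """Send the request and return the parsed JSON response.

    Reads the key from AISA_API_KEY; the timeout value is an assumption.
    """
    request = build_image_request(prompt, os.environ["AISA_API_KEY"])
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only builds the request, so this runs without a key or network access.
    request = build_image_request("A cute red panda, cinematic lighting", "your-key")
    print(request.get_method(), request.full_url)
```

Splitting request construction from sending lets the payload be inspected or logged before any credits are spent.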

---

## 🎞️ Video Generation (Qwen Wan 2.6 / Tongyi Wanxiang)

### Create task

- Base URL: `https://api.aisa.one/apis/v1`
- `POST /services/aigc/video-generation/video-synthesis`
- Header: `X-DashScope-Async: enable` (required; makes the call asynchronous)

Docs: `video-generation`, see `https://aisa.mintlify.app/api-reference/aliyun/video/video-generation.md`.

```bash
curl -X POST "https://api.aisa.one/apis/v1/services/aigc/video-generation/video-synthesis" \
  -H "Authorization: Bearer $AISA_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-DashScope-Async: enable" \
  -d '{
    "model":"wan2.6-t2v",
    "input":{
      "prompt":"cinematic close-up, slow push-in, shallow depth of field",
      "img_url":"https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Cat03.jpg/320px-Cat03.jpg"
    },
    "parameters":{
      "resolution":"720P",
      "duration":5,
      "shot_type":"single",
      "watermark":false
    }
  }'
```
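The same task-creation call can be sketched in Python. The body and headers are copied from the curl example above; the function name and the choice of which parameters to expose as arguments are illustrative assumptions. The sketch only builds the request, since the response layout is not shown in this document.

```python
import json
import urllib.request

VIDEO_BASE_URL = "https://api.aisa.one/apis/v1"


def build_video_task_request(
    prompt: str,
    img_url: str,
    api_key: str,
    duration: int = 5,
    resolution: str = "720P",
) -> urllib.request.Request:
    """Build (but do not send) the asynchronous video-synthesis request."""
    body = {
        "model": "wan2.6-t2v",
        "input": {"prompt": prompt, "img_url": img_url},
        "parameters": {
            "resolution": resolution,
            "duration": duration,
            "shot_type": "single",
            "watermark": False,
        },
    }
    return urllib.request.Request(
        url=f"{VIDEO_BASE_URL}/services/aigc/video-generation/video-synthesis",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Required header: marks the call as an asynchronous task.
            "X-DashScope-Async": "enable",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_video_task_request(
        prompt="cinematic close-up, slow push-in, shallow depth of field",
        img_url="https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Cat03.jpg/320px-Cat03.jpg",
        api_key="your-key",
    )
    print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would submit the task; the returned JSON should then carry the task ID used for polling.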

### Poll task

- `GET /services/aigc/tasks?task_id=...`

Docs: `task`, see `https://aisa.mintlify.app/api-reference/aliyun/video/task.md`.

```bash
curl "https://api.aisa.one/apis/v1/services/aigc/tasks?task_id=YOUR_TASK_ID" \
  -H "Authorization: Bearer $AISA_API_KEY"
```
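Polling usually runs in a loop until the task reaches a terminal state. The sketch below shows one way to structure that loop, with the HTTP call injected as a function so the logic can run without network access. The status strings (`SUCCEEDED`, `FAILED`, and so on) and the `output.task_status` field are assumptions based on the DashScope-style header used above, so confirm them against the task docs. The 10-second interval and 600-second timeout echo the `--poll 10 --timeout 600` flags of the bundled client.

```python
import time
from typing import Callable

# Assumed terminal failure states; verify against the task documentation.
TERMINAL_FAILURES = {"FAILED", "CANCELED", "UNKNOWN"}


def wait_for_task(
    fetch_status: Callable[[], dict],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the task succeeds, fails, or the timeout elapses."""
    deadline = clock() + timeout_seconds
    while True:
        payload = fetch_status()
        # Assumption: the status lives at output.task_status.
        status = payload.get("output", {}).get("task_status")
        if status == "SUCCEEDED":
            return payload
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"video task ended with status {status}")
        if clock() >= deadline:
            raise TimeoutError("video task did not finish before the timeout")
        sleep(poll_seconds)


if __name__ == "__main__":
    # Simulated replies standing in for GET /services/aigc/tasks?task_id=...
    replies = iter([
        {"output": {"task_status": "PENDING"}},
        {"output": {"task_status": "RUNNING"}},
        {"output": {"task_status": "SUCCEEDED", "video_url": "https://example.com/out.mp4"}},
    ])
    result = wait_for_task(lambda: next(replies), sleep=lambda _seconds: None)
    print(result["output"]["task_status"])
```

In real use, `fetch_status` would wrap the GET request shown above and return its parsed JSON.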

---

## Python Client

```bash
# Generate an image (saved to a local file)
python3 {baseDir}/scripts/media_gen_client.py image \
  --prompt "A cute red panda, cinematic lighting" \
  --out "out.png"

# Create a video task (requires img_url)
python3 {baseDir}/scripts/media_gen_client.py video-create \
  --prompt "cinematic close-up, slow push-in" \
  --img-url "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Cat03.jpg/320px-Cat03.jpg" \
  --duration 5

# Poll the task status
python3 {baseDir}/scripts/media_gen_client.py video-status --task-id YOUR_TASK_ID

# Wait until the task succeeds (optional: prints video_url on success)
python3 {baseDir}/scripts/media_gen_client.py video-wait --task-id YOUR_TASK_ID --poll 10 --timeout 600

# Wait until the task succeeds and download the mp4 automatically
python3 {baseDir}/scripts/media_gen_client.py video-wait --task-id YOUR_TASK_ID --download --out out.mp4
```
