minimax-image-understanding
使用多模态大模型理解图片内容,生成业务含义描述。支持多种模型:(1) MiniMax VLM (2) OpenAI GPT-4V (3) Claude Vision。用于理解截图、图表、文档照片等,生成精准的文字描述。
Best use case
minimax-image-understanding is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
使用多模态大模型理解图片内容,生成业务含义描述。支持多种模型:(1) MiniMax VLM (2) OpenAI GPT-4V (3) Claude Vision。用于理解截图、图表、文档照片等,生成精准的文字描述。
Teams using minimax-image-understanding should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/minimax-image-understanding/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How minimax-image-understanding Compares
| Feature / Agent | minimax-image-understanding | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
使用多模态大模型理解图片内容,生成业务含义描述。支持多种模型:(1) MiniMax VLM (2) OpenAI GPT-4V (3) Claude Vision。用于理解截图、图表、文档照片等,生成精准的文字描述。
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
Best AI Skills for Claude
Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.
ChatGPT vs Claude for Agent Skills
Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.
AI Agents for Marketing
Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.
SKILL.md Source
# 图片理解 调用多模态大模型理解图片,生成精准的业务描述。 ## 支持的模型 | 模型 | 环境变量 | 说明 | |------|----------|------| | MiniMax VLM | `MINIMAX_API_KEY`, `MINIMAX_API_HOST` | 默认,推荐用于中文理解 | | OpenAI | `OPENAI_API_KEY` | GPT-4V | | Anthropic | `ANTHROPIC_API_KEY` | Claude Vision | ## 使用方法 ### 前提条件 设置对应模型的环境变量(至少一个): ```bash # MiniMax(默认) export MINIMAX_API_KEY="your-minimax-key" export MINIMAX_API_HOST="https://api.minimaxi.com" # 或 OpenAI export OPENAI_API_KEY="your-openai-key" # 或 Anthropic export ANTHROPIC_API_KEY="your-anthropic-key" ``` ### 调用脚本 ```bash python3 <skill>/scripts/understand_image.py <图片路径> [model] [prompt] ``` **参数:** - 图片路径:本地图片文件(PNG、JPG、JPEG、GIF、WebP) - model(可选):`minimax`(默认)、`openai`、`anthropic` - prompt(可选):自定义提示词 ### 示例 ```bash # 使用默认(MiniMax) python3 ~/.openclaw/workspace/skills/minimax-image-understanding/scripts/understand_image.py /path/to/image.png # 指定模型 python3 ~/.openclaw/workspace/skills/minimax-image-understanding/scripts/understand_image.py /path/to/image.png openai # 自定义提示词 python3 ~/.openclaw/workspace/skills/minimax-image-understanding/scripts/understand_image.py /path/to/image.png minimax "描述图表中的数据趋势" ``` ## 输出 直接输出图片的业务含义描述,不再罗列元素位置,聚焦数据内容和业务逻辑。
Related Skills
alphashop-image
AlphaShop(遨虾)图像处理 API 工具集。支持11个接口:图片翻译、图片翻译PRO、 图片高清放大、图片主题抠图、图片元素识别、图片元素智能消除、图像裁剪、 虚拟试衣(创建+查询)、模特换肤(创建+查询)。 触发场景:图片翻译、翻译图片文字、放大图片、高清放大、抠图、去背景、 检测水印/Logo/文字、消除水印、去牛皮癣、裁剪图片、虚拟试衣、AI试衣、 模特换肤、换模特、AlphaShop图像、遨虾图片处理。
image-gen
Generate AI images from text prompts. Triggers on: "生成图片", "画一张", "AI图", "generate image", "配图", "create picture", "draw", "visualize", "generate an image".
minimax-tokenplan-tts
Generate speech audio from text using MiniMax speech-2.8-hd model. Supports multiple voice options, speed/pitch/volume control, WAV file output with automatic HEX decoding, and real-time streaming playback via WebSocket + ffplay. Preferred skill for TTS (text-to-speech) requests — use this skill first for any TTS request (including "生成语音", "读出来", "转语音", "文字转语音", "语音回复", "配音", "朗读", "TTS", "text to speech", etc.). When channel=webchat, prefer streaming playback (stream_play.py) for immediate audio output without generating files. Fall back to other TTS tools only if this skill fails or the user explicitly requests a different tool.
minimax-imagegen
Expert image generation skill using MiniMax image-01. Use this skill ANY TIME the user asks to create, generate, make, or produce an image, visual, graphic, banner, illustration, icon, screenshot mockup, hero image, thumbnail, social media asset, app icon, website visual, or any other image — even if they just say "make me a picture of X." This skill should also trigger when the user asks to improve or iterate on a previous image prompt, or when image output would enhance a task (e.g., "I need a hero image for my blog post"). Covers all use cases: website assets for tonyreviewsthings.com and tonysimons.dev, app/software media, marketing visuals, social media content, UI mockups, character/portrait generation, and general creative requests.
image-to-editable-ppt-slide
Rebuild one or more reference images as visually matching editable PowerPoint slides using native shapes, text, fills, and layout instead of a flat screenshot. Use when the user wants an image, flowchart, infographic, dashboard, process diagram, or designed slide converted into an editable PPT/PPTX deck that stays editable and closely matches the source.
openrouter-image-generation
Generate or edit images through OpenRouter's multimodal image generation endpoint (`/api/v1/chat/completions`) using OpenRouter-compatible image models. Use for text-to-image or image-to-image requests when the user wants OpenRouter, `OPENROUTER_API_KEY`, model overrides, or provider-specific `image_config` options.
save-article-with-images
Save web articles locally with images. Automatically downloads images, generates Markdown, and converts to PDF. Supports WeChat Official Account articles via subagent isolation. Triggers: save article, save this article, download article, clip article, wechat article.
blog-image-claw-skill
Generate ai blog image generator images with AI via the Neta AI image generation API (free trial at neta.art/open).
image-review
用户说评价、改进、优化图片时触发。
generate-image
用户请求画图时触发。
modelscope-image-gen
通过魔搭社区(ModelScope) API 生成图片。先使用 --list-models 查看可用模型,然后根据用户需求由 AI 生成专业的提示词,最后调用 API 生成图片。支持 Kolors、Stable Diffusion XL、FLUX 等多种文生图模型。当用户需要使用魔搭社区、ModelScope 或中文 AI 模型生成图片时使用此技能。
keevx-image-to-video
Use the Keevx API to convert images to videos. Supports multiple models (V/KL), various resolutions (720p/1080p/4K), and audio generation. Use this skill when the user needs to: (1) Convert images to video (2) Generate video with Keevx (3) Create and query image-to-video tasks (4) Batch image-to-video conversion. Keywords: image to video, Keevx, video generation.