glm4v-analyze-image

Zhipu AI's vision-language model for image analysis, content recognition, and visual question answering.

242 stars

Best use case

glm4v-analyze-image is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi. It covers Zhipu AI's vision-language model for image analysis, content recognition, and visual question answering.

Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.

Practical example

Example input

Use the "glm4v-analyze-image" skill to help with this workflow task. Context: Zhipu AI's vision-language model for image analysis, content recognition, and visual question answering.

Example output

A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.

When to use this skill

Use this skill when you want a reusable workflow rather than writing the same prompt again and again.

When not to use this skill

Do not use this when you only need a one-off answer and do not need a reusable workflow.
Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/glm4v-analyze-image/SKILL.md --create-dirs "https://raw.githubusercontent.com/aiskillstore/marketplace/main/skills/ck991357/glm4v-analyze-image/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/glm4v-analyze-image/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How glm4v-analyze-image Compares

Feature / Agent	glm4v-analyze-image	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

It provides Zhipu AI's vision-language model for image analysis, content recognition, and visual question answering.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

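# GLM-4V Image Analysis Tool Guide

## Core Capabilities
- Image content recognition and description
- Visual question answering and reasoning
- Image detail analysis
- Multimodal understanding and generation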

## Calling Convention
```json
{
  "tool_name": "glm4v_analyze_image",
  "parameters": {
    "model": "glm-4v-flash",
    "image_url": "image URL",
    "prompt": "analysis prompt"
  }
}
```
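
As a rough illustration of the calling convention, the Python sketch below assembles the same payload programmatically. The helper name `build_call` is hypothetical and not part of the skill; only the field names, the tool name, and the model name are taken from the spec.

```python
import json

TOOL_NAME = "glm4v_analyze_image"
MODEL = "glm-4v-flash"


def build_call(image_url: str, prompt: str) -> dict:
    """Return a tool-call payload shaped like the spec (hypothetical helper)."""
    return {
        "tool_name": TOOL_NAME,
        "parameters": {
            "model": MODEL,
            "image_url": image_url,
            "prompt": prompt,
        },
    }


def to_json(call: dict) -> str:
    """Serialize the payload; ensure_ascii=False keeps non-ASCII prompts readable."""
    return json.dumps(call, ensure_ascii=False, indent=2)
```

Calling `to_json(build_call("https://example.com/image.jpg", "Describe this image."))` yields a JSON document with the same structure as the block above. Building the payload from a dict and serializing it with `json.dumps` avoids hand-written quoting mistakes.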

The following are **correct** and **incorrect** examples of calling the `glm4v_analyze_image` tool. Always follow the correct format.

## ✅ Correct Example
```json
{"model": "glm-4v-flash", "image_url": "https://path/to/image.jpg", "prompt": "Describe this image."}
```

## ❌ Incorrect Examples (avoid these common mistakes)

- **Malformed JSON (missing quote, comma, or brace):** 
  ```json
  {"model": "glm-4v-flash", "image_url": "https://path/to/image.jpg", "prompt": "Describe this image."
  ```
  (missing the closing `}`)

- **Wrong parameter name:** 
  ```json
  {"img_url": "https://path/to/image.jpg"}
  ```
  (should be "image_url", not "img_url")

- **Wrong model name:** 
  ```json
  {"model": "glm4v-flash", "image_url": "https://path/to/image.jpg", "prompt": "Describe this image."}
  ```
  (should be "glm-4v-flash")
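
The three mistakes listed here can be caught before the tool is called. The following is a minimal sketch, not part of the skill itself: `check_parameters` is a hypothetical helper, and treating all three keys as required is an assumption drawn from the correct example.

```python
import json

# Assumption: every call needs exactly these three keys, as in the correct example.
REQUIRED_KEYS = {"model", "image_url", "prompt"}
EXPECTED_MODEL = "glm-4v-flash"


def check_parameters(raw: str) -> list[str]:
    """Return a list of problems found in a raw JSON parameters string."""
    try:
        params = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Covers missing quotes, commas, and braces.
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(params, dict):
        return ["parameters must be a JSON object"]

    problems = []
    for key in sorted(REQUIRED_KEYS - set(params)):
        problems.append(f"missing parameter: {key}")
    for key in sorted(set(params) - REQUIRED_KEYS):
        # Catches misspellings such as "img_url".
        problems.append(f"unknown parameter: {key}")
    if "model" in params and params["model"] != EXPECTED_MODEL:
        problems.append(f"unexpected model: {params['model']}")
    return problems
```

An empty list means the parameters passed every check. A misspelled key such as `img_url` is reported both as an unknown parameter and as a missing `image_url`.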
  
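## Key Instructions
1. **Model selection**: Use the `glm-4v-flash` model
2. **Image format**: Common image formats are supported (JPEG, PNG, WebP, etc.)
3. **Prompt design**: Give clear, specific analysis instructions
4. **URL validity**: Make sure the image URL is publicly accessible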
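
Part of the format and URL guidance can be checked offline before a call is made. The sketch below is illustrative only: `looks_like_image_url` and the suffix allow-list are assumptions (the guide names JPEG, PNG, and WebP but leaves the list open), and the check cannot tell whether the URL is actually publicly reachable.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed allow-list; the guide lists JPEG, PNG, WebP "etc.", so extend as needed.
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def looks_like_image_url(url: str) -> bool:
    """Cheap offline check: http(s) scheme, a host, and a known image suffix."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    suffix = PurePosixPath(parsed.path).suffix.lower()
    return suffix in ALLOWED_SUFFIXES
```

URLs that serve images without a file extension would be rejected by this check, so treat a `False` result as a prompt to look closer rather than a definitive answer.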

## Use Cases

### Image Description
```json
{
  "tool_name": "glm4v_analyze_image",
  "parameters": {
    "model": "glm-4v-flash",
    "image_url": "https://example.com/image.jpg",
    "prompt": "Describe the content of this image in detail"
  }
}
```

### Visual Question Answering
```json
{
  "tool_name": "glm4v_analyze_image",
  "parameters": {
    "model": "glm-4v-flash",
    "image_url": "https://example.com/image.jpg",
    "prompt": "How many people are in the image? What are they doing?"
  }
}
```

### Detail Analysis
```json
{
  "tool_name": "glm4v_analyze_image",
  "parameters": {
    "model": "glm-4v-flash",
    "image_url": "https://example.com/image.jpg",
    "prompt": "Analyze the text and technical details in the image"
  }
}
```

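## Best Practices

### Prompt Design
- **Be specific**: "Describe the actions and facial expressions of the people in the image"
- **Be task-oriented**: "Identify all objects in the image and classify them"
- **Ask for details**: "Pay attention to details such as color, shape, and spatial relationships"

### Error Handling
- Check that the image URL is valid
- Confirm that the image format is supported
- Handle network timeouts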
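
For the timeout case, one common approach is to retry with exponential backoff. The sketch below assumes a caller-supplied `call_tool` function (hypothetical, standing in for whatever actually invokes the tool) that raises Python's built-in `TimeoutError`. A real client library may raise its own exception type, in which case the `except` clause would need adjusting.

```python
import time
from typing import Callable


def call_with_retry(call_tool: Callable[[dict], str], params: dict,
                    attempts: int = 3, backoff_s: float = 1.0) -> str:
    """Call `call_tool(params)`, retrying on TimeoutError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call_tool(params)
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the timeout to the caller
            # Wait backoff_s, 2*backoff_s, 4*backoff_s, ... between attempts.
            time.sleep(backoff_s * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Only timeouts are retried. Errors such as an invalid URL or an unsupported format propagate immediately, since repeating the same request would not fix them.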

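## Capabilities
- ✅ Object recognition and classification
- ✅ Scene understanding and description
- ✅ Text recognition (OCR)
- ✅ Emotion and atmosphere analysis
- ✅ Technical detail extraction

## Limitations
- ❌ Cannot process sensitive or inappropriate content
- ❌ Image size and resolution are limited
- ❌ Real-time video streams are not supported
- ❌ 3D model analysis is not supported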

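## Performance Optimization
- Use an appropriately sized image
- State the analysis requirements specifically
- Break complex analyses into steps
- Cross-check results with other tools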
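
Breaking a complex analysis into steps could look like the sketch below: one focused prompt per call, with the answers collected by step name. `analyze_in_steps` and `call_tool` are hypothetical names, and the skill does not prescribe this structure.

```python
from typing import Callable

MODEL = "glm-4v-flash"


def analyze_in_steps(call_tool: Callable[[dict], str], image_url: str,
                     steps: dict[str, str]) -> dict[str, str]:
    """Run one focused prompt per step and collect the answers by step name."""
    results = {}
    for name, prompt in steps.items():
        params = {"model": MODEL, "image_url": image_url, "prompt": prompt}
        results[name] = call_tool(params)
    return results
```

A caller might pass steps such as `{"objects": "Identify all objects in the image and classify them", "text": "Transcribe any text in the image"}` and then compare the individual answers against another tool.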
