glm-v-model

Q: What does this skill do?

智谱 GLM-4V/4.6V 视觉模型调用技能。用于图像/视频理解、多模态对话、图表分析等任务。 当用户提到：图片理解、图像识别、视觉模型、GLM-4V、GLM-4.6V、多模态分析、看图说话、图表分析、视频理解时使用此技能。

智谱 GLM-4V/4.6V 视觉模型调用技能。用于图像/视频理解、多模态对话、图表分析等任务。当用户提到：图片理解、图像识别、视觉模型、GLM-4V、GLM-4.6V、多模态分析、看图说话、图表分析、视频理解时使用此技能。

3,891 stars

byopenclaw

View on GitHub Installation ↓

Best use case

glm-v-model is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using glm-v-model should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/glm-v-model/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/baokui/glm-v-model/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/glm-v-model/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How glm-v-model Compares

Feature / Agent	glm-v-model	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Marketing

Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.

AI Agents for Startups

Explore AI agent skills for startup validation, product research, growth experiments, documentation, and fast execution with small teams.

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

SKILL.md Source

# GLM 视觉模型调用

本技能提供调用智谱 AI 的 GLM-4V 和 GLM-4.6V 视觉模型的能力，支持图像理解、视频分析、图表解读等功能。

## 支持的模型

| 模型 | 说明 | 特点 |
|------|------|------|
| glm-4v | GLM-4 视觉模型 | 基础视觉理解 |
| glm-4.6v | GLM-4.6V 视觉模型 | 更强的视觉理解能力，支持更长上下文 |

## 快速使用

### 基本图像理解

```python
from zai import ZhipuAiClient
import base64

client = ZhipuAiClient(api_key="YOUR_API_KEY")

# 读取本地图片并转为 base64
with open("image.jpg", "rb") as f:
    img_base = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="glm-4.6v",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img_base}"}},
            {"type": "text", "content": "描述这张图片"}
        ]
    }],
    thinking={"type": "enabled"}
)
print(response.choices[0].message.content)
```

### 使用图片URL

```python
response = client.chat.completions.create(
    model="glm-4.6v",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
            {"type": "text", "content": "这张图片里有什么？"}
        ]
    }]
)
```

### 多图理解

```python
response = client.chat.completions.create(
    model="glm-4.6v",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "图片1 base64 或 URL"}},
            {"type": "image_url", "image_url": {"url": "图片2 base64 或 URL"}},
            {"type": "text", "content": "比较这两张图片的异同"}
        ]
    }]
)
```

### 视频理解（GLM-4.6V）

```python
# 支持理解视频内容
response = client.chat.completions.create(
    model="glm-4.6v",
    messages=[{
        "role": "user",
        "content": [
            {"type": "video_url", "video_url": {"url": "视频URL"}},
            {"type": "text", "content": "描述这个视频的内容"}
        ]
    }]
)
```

## 使用脚本

项目中已包含脚本 `script/infer_glmv.py`，可直接调用：

```python
import sys
sys.path.append('/Users/guobaokui/.openclaw/workspace_multmodal/skills/glm-v-model/script')
from infer_glmv import glm_v

# 使用方式
# glm_v(['image.jpg'], '描述图片', 'glm-4.6v')
```

## 常用场景

| 场景 | Prompt 示例 |
|------|-------------|
| 图片描述 | "详细描述这张图片的内容" |
| 图表分析 | "分析这张图表数据" |
| 文字识别(OCR) | "提取图片中的文字" |
| 物体识别 | "图片中有哪些物体" |
| 场景理解 | "这是什么地方" |
| 多图对比 | "比较这两张图片的异同" |
| 视频理解 | "总结这个视频的内容" |

## 注意事项

1. **API Key**: 需要智谱 AI 的 API Key，可从 https://open.bigmodel.cn 获取
2. **图片格式**: 支持 JPEG、PNG、WebP 等常见格式
3. **图片大小**: 单张图片建议不超过 10MB
4. **thinking**: 可启用深度思考模式 `thinking={"type": "enabled"}`
5. **计费**: 按 token 计费，图片会转换为 token 消耗

Related Skills

MCP Engineering — Complete Model Context Protocol System

3891

from openclaw/skills

Build, integrate, secure, and scale MCP servers and clients. From first server to production multi-tool architecture.

AI Infrastructure & Integrations

ml-model-eval-benchmark

3891

from openclaw/skills

Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.

Machine Learning

decision-mental-models

3891

from openclaw/skills

Apply the most relevant mental models (First Principles, Inversion, Second-Order Thinking, Occam's Razor, and 16 others) to any problem or decision, surfaces non-obvious insights by explicitly matching and working through 2-3 models per query.

model-usage

3891

from openclaw/skills

Summarize per-model usage for Codex or Claude including cost tracking. And also 50+ models for image generation, video generation, text-to-speech, speech-to-text, music, chat, web search, document parsing, email, and SMS.

modelscope-image-gen

3891

from openclaw/skills

通过魔搭社区(ModelScope) API 生成图片。先使用 --list-models 查看可用模型，然后根据用户需求由 AI 生成专业的提示词，最后调用 API 生成图片。支持 Kolors、Stable Diffusion XL、FLUX 等多种文生图模型。当用户需要使用魔搭社区、ModelScope 或中文 AI 模型生成图片时使用此技能。

model-fallback

3891

from openclaw/skills

Multi-model automatic fallback system. Monitors model availability and automatically falls back to backup models when the primary model fails. Supports MiniMax, Kimi, Zhipu and other OpenAI-compatible APIs. Use when: (1) Primary model API is unavailable, (2) Model response time is too slow, (3) Rate limit exceeded, (4) Need to optimize costs by using cheaper models for simple tasks.

pydantic-ai-model-integration

3891

from openclaw/skills

Configure LLM providers, use fallback models, handle streaming, and manage model settings in PydanticAI. Use when selecting models, implementing resilience, or optimizing API calls.

model-council

3891

from openclaw/skills

Multi-model consensus system — send a query to 3+ different LLMs via OpenRouter simultaneously, then a judge model evaluates all responses and produces a winner, reasoning, and synthesized best answer. Like having a board of AI advisors. Use for important decisions, code review, research verification.

model-audit

3891

from openclaw/skills

Monthly LLM stack audit — compare your current models against latest benchmarks and pricing from OpenRouter. Identifies potential savings, upgrades, and better alternatives by category (reasoning, code, fast, cheap, vision). Use for optimizing AI costs and staying on the frontier.

Model Intel

3891

from openclaw/skills

Live LLM model intelligence from OpenRouter. Compare pricing, search models by name, find the best model for any task — code, reasoning, creative, fast, cheap, vision, long-context. Real-time data from 200+ models. Use when choosing models, comparing costs, or auditing your AI stack.

model-brain

3891

from openclaw/skills

Route each incoming message to the right Bankr/OpenClaw model or to a zero-LLM path based on task type, risk, and cost. Use when you need per-message model selection, cost-aware routing, deterministic skill bypasses, or a model recommendation for aaigotchi workflows.

wavelet-world-model

3891

from openclaw/skills

Generates a world model representation from state inputs using discrete wavelet transforms (DWT) to capture multi-resolution temporal and spatial features.