cliproxy-media

Analyze images (jpg, png, gif, webp) and PDFs via CLIProxyAPI — a Claude Max proxy that routes requests through your subscription at zero extra cost. Use this skill whenever you need to analyze, describe, or extract information from an image or photo ("analyze image", "describe photo", "what is in this picture"), read or summarize a PDF document ("read PDF", "summary of this document"), or process any media file via a CLIProxy-compatible endpoint ("process media via proxy", "cliproxy vision", "cliproxy media"). NEVER use the built-in `image` or `pdf` tools when using CLIProxyAPI — they fall back to direct Anthropic API which requires separate credits. Use this skill instead for all vision and document analysis tasks.

3,891 stars

Best use case

cliproxy-media is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Analyze images (jpg, png, gif, webp) and PDFs via CLIProxyAPI — a Claude Max proxy that routes requests through your subscription at zero extra cost. Use this skill whenever you need to analyze, describe, or extract information from an image or photo ("analyze image", "describe photo", "what is in this picture"), read or summarize a PDF document ("read PDF", "summary of this document"), or process any media file via a CLIProxy-compatible endpoint ("process media via proxy", "cliproxy vision", "cliproxy media"). NEVER use the built-in `image` or `pdf` tools when using CLIProxyAPI — they fall back to direct Anthropic API which requires separate credits. Use this skill instead for all vision and document analysis tasks.

Teams using cliproxy-media should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/cliproxy-media/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/bencoremans/cliproxy-media/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/cliproxy-media/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How cliproxy-media Compares

Feature / Agentcliproxy-mediaStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Analyze images (jpg, png, gif, webp) and PDFs via CLIProxyAPI — a Claude Max proxy that routes requests through your subscription at zero extra cost. Use this skill whenever you need to analyze, describe, or extract information from an image or photo ("analyze image", "describe photo", "what is in this picture"), read or summarize a PDF document ("read PDF", "summary of this document"), or process any media file via a CLIProxy-compatible endpoint ("process media via proxy", "cliproxy vision", "cliproxy media"). NEVER use the built-in `image` or `pdf` tools when using CLIProxyAPI — they fall back to direct Anthropic API which requires separate credits. Use this skill instead for all vision and document analysis tasks.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# cliproxy-media

**Source:** https://github.com/bencoremans/site/tree/main/skills/cliproxy-media

Analyze images and PDFs via CLIProxyAPI (Claude Max subscription, zero extra cost).

## Setup

Set the endpoint to your CLIProxy instance:

```bash
export CLIPROXY_URL=http://your-host:8317/v1/messages
```

For Docker setups, replace `your-host` with your container hostname (e.g. `cliproxyapi`, `localhost`, or the container IP).

## Quick start

```bash
# Analyze an image
python3 skills/cliproxy-media/scripts/analyze.py /path/to/image.jpg "What is in this image?"

# Read a PDF
python3 skills/cliproxy-media/scripts/analyze.py /path/to/document.pdf "Give a summary"

# Compare multiple images
python3 skills/cliproxy-media/scripts/analyze.py img1.jpg img2.jpg "Compare these images"

# With streaming (output appears immediately)
python3 skills/cliproxy-media/scripts/analyze.py --stream image.jpg "Describe in detail"

# With system prompt
python3 skills/cliproxy-media/scripts/analyze.py --system "You are a medical expert" scan.jpg "What do you see?"

# With higher token limit
python3 skills/cliproxy-media/scripts/analyze.py --max-tokens 4096 document.pdf "Extensive analysis"
```

## What works ✅ / What doesn't ❌

### ✅ Supported file types

| Type | Format | Note |
|------|--------|------|
| Image | `.jpg` / `.jpeg` | Requires valid JPEG data |
| Image | `.png` | Fully supported |
| Image | `.gif` | Fully supported |
| Image | `.webp` | Fully supported |
| Document | `.pdf` | Base64-encoded, via `document` content type |
| Image via URL | `http://` / `https://` | Direct URL reference, no download needed |

**Multiple files at once:** Provide multiple paths before the question. Max ~100 per request (Anthropic limit).

### ❌ Not supported

- **Office files** (`.docx`, `.xlsx`, `.pptx`) — Workaround: convert to PDF
- **Audio** (`.mp3`, `.wav`, `.ogg`) — Use Whisper for transcription
- **Video** (`.mp4`, `.mov`, `.avi`) — Not supported by the model
- **Other document types** (`.txt`, `.html`, `.md` as document) — Send text directly as a string

## ⚠️ System prompt warning

CLIProxyAPI accepts **only** the array notation for system prompts. The string notation is **silently ignored** — the model does not see it, but you also won't get an error message!

```python
# ❌ DOES NOT WORK — ignored without error message
payload["system"] = "You are an expert."

# ✅ WORKS — always use array notation
payload["system"] = [{"type": "text", "text": "You are an expert."}]
```

The `--system` argument in `analyze.py` automatically uses the correct array notation.

## Configuration (env vars)

| Variable | Default | Description |
|----------|---------|-------------|
| `CLIPROXY_URL` | `http://localhost:8317/v1/messages` | Full endpoint URL |
| `CLIPROXY_MODEL` | `claude-sonnet-4-6` | Model to use |

Example:
```bash
export CLIPROXY_URL=http://localhost:8317/v1/messages
export CLIPROXY_MODEL=claude-opus-4-6
python3 skills/cliproxy-media/scripts/analyze.py image.jpg "question"
```

## Additional options

```
--stream          Streaming output via SSE (output appears immediately)
--system TEXT     System prompt (automatically sent as array)
--max-tokens N    Maximum output tokens (default: 1024)
--model MODEL     Model override (overrides CLIPROXY_MODEL)
--url URL         Endpoint override (overrides CLIPROXY_URL)
```

## Compatibility

This script works with any API that supports the Anthropic Messages format:

| Provider | Compatible | Note |
|----------|-----------|------|
| **CLIProxyAPI** | ✅ Yes | Primarily tested, system prompt array required |
| **OpenRouter** | ✅ Yes | Use Bearer token instead of `x-api-key: dummy` |
| **LiteLLM** | ✅ Yes | As proxy for Anthropic format |
| **Anthropic direct** | ✅ Yes | Use `ANTHROPIC_API_KEY` as x-api-key |

**Note for non-CLIProxy endpoints:** Some proxies do accept string notation for system prompts. Always use array notation for maximum compatibility.

## Known limitations of CLIProxyAPI

- `temperature` and `top_p` may **not** be used at the same time (HTTP 400)
- PDF as document with URL source does not work (`Unable to download the file`)
- Only `claude-sonnet-4-6` and `claude-opus-4-6` available (haiku is deprecated)
- `inference_geo` is always `not_available` in the response

## Direct Python API

If you want to call the script from your own Python code:

```python
import subprocess, json

result = subprocess.run(
    ["python3", "skills/cliproxy-media/scripts/analyze.py", "image.jpg", "Describe this"],
    capture_output=True, text=True
)
print(result.stdout)
```

Or use the built-in exec tool:
```
exec: python3 skills/cliproxy-media/scripts/analyze.py /path/to/image.jpg "question"
```

Related Skills

openclaw-media-gen

3891
from openclaw/skills

Generate images & videos with AIsa. Gemini 3 Pro Image (image) + Qwen Wan 2.6 (video) via one API key.

Content & Documentation

media-compress

3891
from openclaw/skills

Compress and convert images and videos using ffmpeg. Use when the user wants to reduce file size, change format, resize, or optimize media files. Handles common formats like JPG, PNG, WebP, MP4, MOV, WebM. Triggers on phrases like "compress image", "compress video", "reduce file size", "convert to webp/mp4", "resize image", "make image smaller", "batch compress", "optimize media".

General Utilities

cpa-codex-auth-sweep-cliproxy

3891
from openclaw/skills

通过 CLI Proxy Management API 拉取 Codex 认证文件并高并发探活扫描。适用于「扫号」「清死号」「清理 Codex 401」场景;仅在用户明确确认后可删除 401。执行前必须提供 base_url 与 management_key。安全限制:默认仅允许 https://chatgpt.com 作为 probe 主机,非白名单目标需显式危险确认。

social-media-agent

3891
from openclaw/skills

Automated social media manager — plan, write, schedule, and analyze content across X/Twitter, LinkedIn, Instagram, TikTok, Facebook, and Pinterest. Integrates with Buffer (free) or Postiz (self-hosted) for scheduling.

social-media-content-scraper-pro

3891
from openclaw/skills

Social Media Content Bulk Scraper, extract articles/posts from WeChat, Instagram, TikTok, YouTube, export to Markdown/HTML with full metadata. $0.005 USDT per use.

cliproxy-openclaw

3891
from openclaw/skills

Deploy and configure CLIProxyAPI, expose its dashboard safely, connect OAuth providers like Claude Code, Gemini, Codex, Qwen, and iFlow, generate a reusable API endpoint and API key, and integrate it with OpenClaw or other OpenAI-compatible tools. Use when the user wants one API layer from subscription-based CLI or OAuth accounts, multi-account routing, or CLIProxy setup on a VPS or local machine.

siliconflow-media

3891
from openclaw/skills

SiliconFlow 多模态服务,支持图片生成(FLUX/Qwen)、视频生成(Wan)、TTS语音合成、ASR语音识别。使用代金券支付。

Macrocosmos SN13 API - Social Media Data Skill

3891
from openclaw/skills

Fetch real-time social media data from X (Twitter) and Reddit by keyword, username, date range, and filters with engagement metrics via Macrocosmos SN13 API on Bittensor.

muapi-media-generation

3891
from openclaw/skills

Generate AI images, videos, music, and audio from the terminal via muapi.ai — supports 100+ models including Flux, Midjourney v7, Kling 3.0, Veo3, and Suno V5

muapi-media-editing

3891
from openclaw/skills

Edit and enhance images and videos with AI via muapi.ai — prompt-based editing, upscaling, background removal, face swap, lipsync, video effects, and more

media-writing

3891
from openclaw/skills

You are a professional media writing expert with extensive experience in creating engaging and impactful content across multiple formats. Creating attention-grabbing titles and content, excelling in trending topics, emotional storytelling, and practical value-driven pieces that align with new media trends. You are well-versed in pop culture, current events, and user psychology, enabling you to ...

social-media-analyzer

3891
from openclaw/skills

Social media campaign analysis and performance tracking. Calculates engagement rates, ROI, and benchmarks across platforms. Use for analyzing social media performance, calculating engagement rate, measuring campaign ROI, comparing platform metrics, or benchmarking against industry standards.