vision

Analyze images, screenshots, diagrams, and visual content - Use when you need to understand visual content like screenshots, architecture diagrams, UI mockups, or error screenshots.

25 stars

by ComeOnOliver


Best use case

vision is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using vision should expect more consistent output, faster repeated runs, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/vision/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/aiskillstore/marketplace/0xsero/vision/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/vision/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill
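The manual steps can also be scripted for a project-level install. The following is a minimal sketch, assuming a POSIX shell with `curl` on the PATH; the function name `install_vision_skill` and its optional source-URL argument are hypothetical conveniences, not something the skill ships.

```shell
# Hypothetical helper that scripts the manual steps for a project-level install.
# Usage: install_vision_skill [project_root] [source_url]
#   project_root defaults to the current directory
#   source_url defaults to the raw GitHub URL used in the one-line install above
install_vision_skill() {
  project_root="${1:-.}"
  src="${2:-https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/aiskillstore/marketplace/0xsero/vision/SKILL.md}"
  dest="$project_root/.claude/skills/vision/SKILL.md"

  # Create .claude/skills/vision/ inside the project
  mkdir -p "$(dirname "$dest")" || return 1

  # Download SKILL.md; -f makes curl exit non-zero on HTTP errors
  # instead of saving an error page as SKILL.md
  curl -fsSL -o "$dest" "$src" || return 1

  # Sanity check: do not report success for an empty file
  [ -s "$dest" ] || { echo "empty download: $dest" >&2; return 1; }

  # Restarting the agent is still a manual step
  echo "Installed $dest; restart your AI agent to load it."
}
```

Passing a `file://` URL as the second argument installs from a local copy or fork without touching the network; for a user-level install, the one-line `curl` command above remains the simpler route.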

How vision Compares

Feature / Agent	vision	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Analyze images, screenshots, diagrams, and visual content - Use when you need to understand visual content like screenshots, architecture diagrams, UI mockups, or error screenshots.

Where can I find the source code?

The source is in the ComeOnOliver/skillshub repository on GitHub, at skills/aiskillstore/marketplace/0xsero/vision/SKILL.md.

SKILL.md Source

You are a Vision Analyst specialized in interpreting visual content.

## Focus
- Describe visible UI elements, text, errors, code, layout, and diagrams.
- Extract any legible text accurately, preserving formatting when relevant.
- Note uncertainty or low-confidence readings.

## Output
- Provide concise, actionable observations.
- Call out anything that looks broken, inconsistent, or suspicious.

Related Skills

processing-computer-vision-tasks

from ComeOnOliver/skillshub

Process images using object detection, classification, and segmentation. Use when requesting "analyze image", "object detection", "image classification", or "computer vision". Trigger with relevant phrases based on skill purpose.

vision-exploration

from ComeOnOliver/skillshub

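End-state vision exploration. The user offers a vague idea and the AI takes the lead, guiding them along the chain "question the value → uncover the motivation → project the evolution → sketch the end state" to help them see the furthest possibilities ahead. No limits, no convergence, purely divergent.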

computer-vision-expert

from ComeOnOliver/skillshub

SOTA Computer Vision Expert (2026). Specialized in YOLO26, Segment Anything 3 (SAM 3), Vision Language Models, and real-time spatial analysis.

azure-ai-vision-imageanalysis-py

from ComeOnOliver/skillshub

Azure AI Vision Image Analysis SDK for captions, tags, objects, OCR, people detection, and smart cropping. Use for computer vision and image understanding tasks. Triggers: "image analysis", "computer vision", "OCR", "object detection", "ImageAnalysisClient", "image caption".

azure-ai-vision-imageanalysis-java

from ComeOnOliver/skillshub

Build image analysis applications with Azure AI Vision SDK for Java. Use when implementing image captioning, OCR text extraction, object detection, tagging, or smart cropping.

Senior Computer Vision

from ComeOnOliver/skillshub


Product Strategy — Vision, Positioning, and Roadmap

from ComeOnOliver/skillshub


OpenCV — Computer Vision Library

from ComeOnOliver/skillshub

You are an expert in OpenCV (Open Source Computer Vision Library), the most popular library for real-time computer vision. You help developers build image processing pipelines, object detection systems, video analysis tools, augmented reality, and document processing using OpenCV's 2,500+ algorithms for image manipulation, feature detection, camera calibration, 3D reconstruction, and DNN inference — in Python, C++, or JavaScript.

LLaVA - Large Language and Vision Assistant

from ComeOnOliver/skillshub

Open-source vision-language model for conversational image understanding.

BLIP-2: Vision-Language Pre-training

from ComeOnOliver/skillshub

Comprehensive guide to using Salesforce's BLIP-2 for vision-language tasks with frozen image encoders and large language models.

Azure AI Custom Vision Skill

from ComeOnOliver/skillshub

This skill provides expert guidance for Azure AI Custom Vision. Covers best practices, decision making, limits & quotas, security, integrations & coding patterns, and deployment. It combines local quick-reference content with remote documentation fetching capabilities.

Azure AI Vision Skill

from ComeOnOliver/skillshub

This skill provides expert guidance for Azure AI Vision. Covers decision making, limits & quotas, configuration, integrations & coding patterns, and deployment. It combines local quick-reference content with remote documentation fetching capabilities.