vision

Analyze images, screenshots, diagrams, and visual content - Use when you need to understand visual content like screenshots, architecture diagrams, UI mockups, or error screenshots.

25 stars

Best use case

vision is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Analyze images, screenshots, diagrams, and visual content - Use when you need to understand visual content like screenshots, architecture diagrams, UI mockups, or error screenshots.

Teams using vision should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/vision/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/aiskillstore/marketplace/0xsero/vision/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/vision/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How vision Compares

Feature / AgentvisionStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Analyze images, screenshots, diagrams, and visual content - Use when you need to understand visual content like screenshots, architecture diagrams, UI mockups, or error screenshots.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

You are a Vision Analyst specialized in interpreting visual content.

## Focus
- Describe visible UI elements, text, errors, code, layout, and diagrams.
- Extract any legible text accurately, preserving formatting when relevant.
- Note uncertainty or low-confidence readings.

## Output
- Provide concise, actionable observations.
- Call out anything that looks broken, inconsistent, or suspicious.

Related Skills

processing-computer-vision-tasks

25
from ComeOnOliver/skillshub

Process images using object detection, classification, and segmentation. Use when requesting "analyze image", "object detection", "image classification", or "computer vision". Trigger with relevant phrases based on skill purpose.

vision-exploration

25
from ComeOnOliver/skillshub

终局愿景探索。用户抛出一个模糊 idea,AI 主导引导,通过"追问价值 → 挖掘动机 → 推导演化 → 画终局"的链路,帮用户看到未来最远的可能性。不设限,不收敛,纯发散。

computer-vision-expert

25
from ComeOnOliver/skillshub

SOTA Computer Vision Expert (2026). Specialized in YOLO26, Segment Anything 3 (SAM 3), Vision Language Models, and real-time spatial analysis.

azure-ai-vision-imageanalysis-py

25
from ComeOnOliver/skillshub

Azure AI Vision Image Analysis SDK for captions, tags, objects, OCR, people detection, and smart cropping. Use for computer vision and image understanding tasks. Triggers: "image analysis", "computer vision", "OCR", "object detection", "ImageAnalysisClient", "image caption".

azure-ai-vision-imageanalysis-java

25
from ComeOnOliver/skillshub

Build image analysis applications with Azure AI Vision SDK for Java. Use when implementing image captioning, OCR text extraction, object detection, tagging, or smart cropping.

Senior Computer Vision

25
from ComeOnOliver/skillshub

## Overview

Product Strategy — Vision, Positioning, and Roadmap

25
from ComeOnOliver/skillshub

## Overview

OpenCV — Computer Vision Library

25
from ComeOnOliver/skillshub

You are an expert in OpenCV (Open Source Computer Vision Library), the most popular library for real-time computer vision. You help developers build image processing pipelines, object detection systems, video analysis tools, augmented reality, and document processing using OpenCV's 2,500+ algorithms for image manipulation, feature detection, camera calibration, 3D reconstruction, and DNN inference — in Python, C++, or JavaScript.

LLaVA - Large Language and Vision Assistant

25
from ComeOnOliver/skillshub

Open-source vision-language model for conversational image understanding.

BLIP-2: Vision-Language Pre-training

25
from ComeOnOliver/skillshub

Comprehensive guide to using Salesforce's BLIP-2 for vision-language tasks with frozen image encoders and large language models.

Azure AI Custom Vision Skill

25
from ComeOnOliver/skillshub

This skill provides expert guidance for Azure AI Custom Vision. Covers best practices, decision making, limits & quotas, security, integrations & coding patterns, and deployment. It combines local quick-reference content with remote documentation fetching capabilities.

Azure AI Vision Skill

25
from ComeOnOliver/skillshub

This skill provides expert guidance for Azure AI Vision. Covers decision making, limits & quotas, configuration, integrations & coding patterns, and deployment. It combines local quick-reference content with remote documentation fetching capabilities.