computer-vision-expert

SOTA Computer Vision Expert (2026). Specialized in YOLO26, Segment Anything 3 (SAM 3), Vision Language Models, and real-time spatial analysis.

40 stars

bybenjaminasterA

View on GitHub Installation ↓

Best use case

computer-vision-expert is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

SOTA Computer Vision Expert (2026). Specialized in YOLO26, Segment Anything 3 (SAM 3), Vision Language Models, and real-time spatial analysis.

Teams using computer-vision-expert should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/computer-vision-expert/SKILL.md --create-dirs "https://raw.githubusercontent.com/benjaminasterA/antigravity-awesome-skills/main/skills/computer-vision-expert/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/computer-vision-expert/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How computer-vision-expert Compares

Feature / Agent	computer-vision-expert	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

SOTA Computer Vision Expert (2026). Specialized in YOLO26, Segment Anything 3 (SAM 3), Vision Language Models, and real-time spatial analysis.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Computer Vision Expert (SOTA 2026)

**Role**: Advanced Vision Systems Architect & Spatial Intelligence Expert

## Purpose
To provide expert guidance on designing, implementing, and optimizing state-of-the-art computer vision pipelines. From real-time object detection with YOLO26 to foundation model-based segmentation with SAM 3 and visual reasoning with VLMs.

## When to Use
- Designing high-performance real-time detection systems (YOLO26).
- Implementing zero-shot or text-guided segmentation tasks (SAM 3).
- Building spatial awareness, depth estimation, or 3D reconstruction systems.
- Optimizing vision models for edge device deployment (ONNX, TensorRT, NPU).
- Needing to bridge classical geometry (calibration) with modern deep learning.

## Capabilities

### 1. Unified Real-Time Detection (YOLO26)
- **NMS-Free Architecture**: Mastery of end-to-end inference without Non-Maximum Suppression (reducing latency and complexity).
- **Edge Deployment**: Optimization for low-power hardware using Distribution Focal Loss (DFL) removal and MuSGD optimizer.
- **Improved Small-Object Recognition**: Expertise in using ProgLoss and STAL assignment for high precision in IoT and industrial settings.

### 2. Promptable Segmentation (SAM 3)
- **Text-to-Mask**: Ability to segment objects using natural language descriptions (e.g., "the blue container on the right").
- **SAM 3D**: Reconstructing objects, scenes, and human bodies in 3D from single/multi-view images.
- **Unified Logic**: One model for detection, segmentation, and tracking with 2x accuracy over SAM 2.

### 3. Vision Language Models (VLMs)
- **Visual Grounding**: Leveraging Florence-2, PaliGemma 2, or Qwen2-VL for semantic scene understanding.
- **Visual Question Answering (VQA)**: Extracting structured data from visual inputs through conversational reasoning.

### 4. Geometry & Reconstruction
- **Depth Anything V2**: State-of-the-art monocular depth estimation for spatial awareness.
- **Sub-pixel Calibration**: Chessboard/Charuco pipelines for high-precision stereo/multi-camera rigs.
- **Visual SLAM**: Real-time localization and mapping for autonomous systems.

## Patterns

### 1. Text-Guided Vision Pipelines
- Use SAM 3's text-to-mask capability to isolate specific parts during inspection without needing custom detectors for every variation.
- Combine YOLO26 for fast "candidate proposal" and SAM 3 for "precise mask refinement".

### 2. Deployment-First Design
- Leverage YOLO26's simplified ONNX/TensorRT exports (NMS-free).
- Use MuSGD for significantly faster training convergence on custom datasets.

### 3. Progressive 3D Scene Reconstruction
- Integrate monocular depth maps with geometric homographies to build accurate 2.5D/3D representations of scenes.

## Anti-Patterns

- **Manual NMS Post-processing**: Stick to NMS-free architectures (YOLO26/v10+) for lower overhead.
- **Click-Only Segmentation**: Forgetting that SAM 3 eliminates the need for manual point prompts in many scenarios via text grounding.
- **Legacy DFL Exports**: Using outdated export pipelines that don't take advantage of YOLO26's simplified module structure.

## Sharp Edges (2026)

| Issue | Severity | Solution |
|-------|----------|----------|
| SAM 3 VRAM Usage | Medium | Use quantized/distilled versions for local GPU inference. |
| Text Ambiguity | Low | Use descriptive prompts ("the 5mm bolt" instead of just "bolt"). |
| Motion Blur | Medium | Optimize shutter speed or use SAM 3's temporal tracking consistency. |
| Hardware Compatibility | Low | YOLO26 simplified architecture is highly compatible with NPU/TPUs. |

## Related Skills
`ai-engineer`, `robotics-expert`, `research-engineer`, `embedded-systems`

Related Skills

typescript-expert

from benjaminasterA/antigravity-awesome-skills

threat-modeling-expert

from benjaminasterA/antigravity-awesome-skills

Expert in threat modeling methodologies, security architecture review, and risk assessment. Masters STRIDE, PASTA, attack trees, and security requirement extraction. Use for security architecture r...

swiftui-expert-skill

from benjaminasterA/antigravity-awesome-skills

Write, review, or improve SwiftUI code following best practices for state management, view composition, performance, modern APIs, Swift concurrency, and iOS 26+ Liquid Glass adoption. Use when buil...

service-mesh-expert

from benjaminasterA/antigravity-awesome-skills

Expert service mesh architect specializing in Istio, Linkerd, and cloud-native networking patterns. Masters traffic management, security policies, observability integration, and multi-cluster mesh con

prisma-expert

from benjaminasterA/antigravity-awesome-skills

Prisma ORM expert for schema design, migrations, query optimization, relations modeling, and database operations. Use PROACTIVELY for Prisma schema issues, migration problems, query performance, re...

nosql-expert

from benjaminasterA/antigravity-awesome-skills

Expert guidance for distributed NoSQL databases (Cassandra, DynamoDB). Focuses on mental models, query-first modeling, single-table design, and avoiding hot partitions in high-scale systems.

nestjs-expert

from benjaminasterA/antigravity-awesome-skills

Nest.js framework expert specializing in module architecture, dependency injection, middleware, guards, interceptors, testing with Jest/Supertest, TypeORM/Mongoose integration, and Passport.js auth...

n8n-mcp-tools-expert

from benjaminasterA/antigravity-awesome-skills

Expert guide for using n8n-mcp MCP tools effectively. Use when searching for nodes, validating configurations, accessing templates, managing workflows, or using any n8n-mcp tool. Provides tool sele...

mermaid-expert

from benjaminasterA/antigravity-awesome-skills

Create Mermaid diagrams for flowcharts, sequences, ERDs, and

laravel-expert

from benjaminasterA/antigravity-awesome-skills

Senior Laravel Engineer role for production-grade, maintainable, and idiomatic Laravel solutions. Focuses on clean architecture, security, performance, and modern standards (Laravel 10/11+).

flutter-expert

from benjaminasterA/antigravity-awesome-skills

Master Flutter development with Dart 3, advanced widgets, and

docker-expert

from benjaminasterA/antigravity-awesome-skills

Docker containerization expert with deep knowledge of multi-stage builds, image optimization, container security, Docker Compose orchestration, and production deployment patterns. Use PROACTIVELY f...