OpenCV — Computer Vision Library
You are an expert in OpenCV (Open Source Computer Vision Library), the most popular library for real-time computer vision. You help developers build image processing pipelines, object detection systems, video analysis tools, augmented reality, and document processing using OpenCV's 2,500+ algorithms for image manipulation, feature detection, camera calibration, 3D reconstruction, and DNN inference — in Python, C++, or JavaScript.
Best use case
OpenCV — Computer Vision Library is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
You are an expert in OpenCV (Open Source Computer Vision Library), the most popular library for real-time computer vision. You help developers build image processing pipelines, object detection systems, video analysis tools, augmented reality, and document processing using OpenCV's 2,500+ algorithms for image manipulation, feature detection, camera calibration, 3D reconstruction, and DNN inference — in Python, C++, or JavaScript.
Teams using OpenCV — Computer Vision Library should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/opencv/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How OpenCV — Computer Vision Library Compares
| Feature / Agent | OpenCV — Computer Vision Library | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
You are an expert in OpenCV (Open Source Computer Vision Library), the most popular library for real-time computer vision. You help developers build image processing pipelines, object detection systems, video analysis tools, augmented reality, and document processing using OpenCV's 2,500+ algorithms for image manipulation, feature detection, camera calibration, 3D reconstruction, and DNN inference — in Python, C++, or JavaScript.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# OpenCV — Computer Vision Library
You are an expert in OpenCV (Open Source Computer Vision Library), the most popular library for real-time computer vision. You help developers build image processing pipelines, object detection systems, video analysis tools, augmented reality, and document processing using OpenCV's 2,500+ algorithms for image manipulation, feature detection, camera calibration, 3D reconstruction, and DNN inference — in Python, C++, or JavaScript.
## Core Capabilities
### Image Processing
```python
import cv2
import numpy as np
# Read and display
img = cv2.imread("photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Resize
resized = cv2.resize(img, (800, 600))
# Or maintain aspect ratio
scale = 800 / img.shape[1]
resized = cv2.resize(img, None, fx=scale, fy=scale)
# Blur (noise reduction)
blurred = cv2.GaussianBlur(img, (5, 5), 0)
median = cv2.medianBlur(img, 5) # Better for salt-and-pepper noise
# Edge detection
edges = cv2.Canny(gray, 50, 150)
# Thresholding
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
adaptive = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
cv2.THRESH_BINARY, 11, 2)
# Morphological operations
kernel = np.ones((5, 5), np.uint8)
dilated = cv2.dilate(binary, kernel, iterations=1)
eroded = cv2.erode(binary, kernel, iterations=1)
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel) # Remove noise
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel) # Fill gaps
```
### Object Detection (DNN Module)
```python
# YOLO inference with OpenCV DNN
net = cv2.dnn.readNetFromONNX("yolov8n.onnx")
def detect_objects(image, conf_threshold=0.5):
"""Detect objects using YOLOv8 with OpenCV DNN backend.
Args:
image: BGR image (numpy array)
conf_threshold: Minimum confidence to keep detection
Returns:
List of (class_id, confidence, x, y, w, h) tuples
"""
blob = cv2.dnn.blobFromImage(image, 1/255.0, (640, 640), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())
detections = []
h, w = image.shape[:2]
for output in outputs:
for detection in output[0]:
scores = detection[4:]
class_id = np.argmax(scores)
confidence = scores[class_id]
if confidence > conf_threshold:
cx, cy, bw, bh = detection[:4]
x = int((cx - bw/2) * w / 640)
y = int((cy - bh/2) * h / 640)
detections.append((class_id, float(confidence), x, y, int(bw*w/640), int(bh*h/640)))
return detections
```
### Video Processing
```python
# Real-time video processing
cap = cv2.VideoCapture(0) # Webcam
# cap = cv2.VideoCapture("video.mp4") # File
# Output video
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
out = cv2.VideoWriter("output.mp4", fourcc, 30.0, (640, 480))
while cap.isOpened():
ret, frame = cap.read()
if not ret:
break
# Process frame
processed = cv2.GaussianBlur(frame, (15, 15), 0)
edges = cv2.Canny(frame, 50, 150)
# Draw detections
for cls, conf, x, y, w, h in detect_objects(frame):
cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
cv2.putText(frame, f"{CLASSES[cls]} {conf:.2f}", (x, y-10),
cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
out.write(frame)
cv2.imshow("Detection", frame)
if cv2.waitKey(1) & 0xFF == ord("q"):
break
cap.release()
out.release()
```
### Contours and Shape Detection
```python
# Find and analyze contours
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for contour in contours:
area = cv2.contourArea(contour)
if area < 100: # Skip small noise
continue
# Bounding box
x, y, w, h = cv2.boundingRect(contour)
# Shape approximation
peri = cv2.arcLength(contour, True)
approx = cv2.approxPolyDP(contour, 0.04 * peri, True)
sides = len(approx)
shape = "circle" if sides > 8 else {3: "triangle", 4: "rectangle"}.get(sides, "polygon")
cv2.drawContours(img, [contour], -1, (0, 255, 0), 2)
cv2.putText(img, shape, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)
```
## Installation
```bash
pip install opencv-python # Core + main modules
pip install opencv-contrib-python # + extra modules (SIFT, face detection, tracking)
pip install opencv-python-headless # Without GUI (for servers)
```
## Best Practices
1. **BGR not RGB** — OpenCV loads images in BGR; convert with `cv2.cvtColor` when using with matplotlib or PIL
2. **DNN for inference** — Use `cv2.dnn` for running YOLO, SSD, face detection; no PyTorch/TF dependency needed
3. **Preprocessing pipeline** — Resize → blur → convert → threshold → morphology → contours; order matters
4. **Contour hierarchy** — Use `RETR_EXTERNAL` for outermost contours only; `RETR_TREE` for nested relationships
5. **Video with codec** — Use `mp4v` for MP4, `XVID` for AVI; check codec availability on your platform
6. **NumPy integration** — OpenCV images are NumPy arrays; use NumPy for fast pixel-level operations
7. **Headless for servers** — Install `opencv-python-headless`; no X11/GUI dependencies needed for processing pipelines
8. **GPU acceleration** — Build from source with CUDA support for 10-50x speedup on GPU; or use `cv2.cuda` moduleRelated Skills
processing-computer-vision-tasks
Process images using object detection, classification, and segmentation. Use when requesting "analyze image", "object detection", "image classification", or "computer vision". Trigger with relevant phrases based on skill purpose.
zlibrary-to-notebooklm
自动从 Z-Library 下载书籍并上传到 Google NotebookLM。支持 PDF/EPUB 格式,自动转换,一键创建知识库。
vision-exploration
终局愿景探索。用户抛出一个模糊 idea,AI 主导引导,通过"追问价值 → 挖掘动机 → 推导演化 → 画终局"的链路,帮用户看到未来最远的可能性。不设限,不收敛,纯发散。
terraform-module-library
Build reusable Terraform modules for AWS, Azure, and GCP infrastructure following infrastructure-as-code best practices. Use when creating infrastructure modules, standardizing cloud provisioning, or implementing reusable IaC components.
prompt-library
Curated collection of high-quality prompts for various use cases. Includes role-based prompts, task-specific templates, and prompt refinement techniques. Use when user needs prompt templates, role-play prompts, or ready-to-use prompt examples for coding, writing, analysis, or creative tasks.
computer-vision-expert
SOTA Computer Vision Expert (2026). Specialized in YOLO26, Segment Anything 3 (SAM 3), Vision Language Models, and real-time spatial analysis.
computer-use-agents
Build AI agents that interact with computers like humans do - viewing screens, moving cursors, clicking buttons, and typing text. Covers Anthropic's Computer Use, OpenAI's Operator/CUA, and open-source alternatives. Critical focus on sandboxing, security, and handling the unique challenges of vision-based control. Use when: computer use, desktop automation agent, screen control AI, vision-based agent, GUI automation.
azure-ai-vision-imageanalysis-py
Azure AI Vision Image Analysis SDK for captions, tags, objects, OCR, people detection, and smart cropping. Use for computer vision and image understanding tasks. Triggers: "image analysis", "computer vision", "OCR", "object detection", "ImageAnalysisClient", "image caption".
azure-ai-vision-imageanalysis-java
Build image analysis applications with Azure AI Vision SDK for Java. Use when implementing image captioning, OCR text extraction, object detection, tagging, or smart cropping.
library-detection
Detect project stack from package manifests (package.json, pyproject.toml, go.mod, Cargo.toml, pubspec.yaml, CMakeLists.txt). Auto-identify frameworks, test tools, and build systems for onboarding.
fetching-library-docs
Token-efficient library API documentation fetcher using Context7 MCP with 77% token savings. Fetches code examples, API references, and usage patterns for published libraries (React, Next.js, Prisma, etc). Use when users ask "how do I use X library", need code examples, want API syntax, or are learning a framework's official API. Triggers: "Show me React hooks", "Prisma query syntax", "Next.js routing API". NOT for exploring repo internals/source code (use researching-with-deepwiki) or local files.
gemini-computer-use
Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.