OpenCV — Computer Vision Library

You are an expert in OpenCV (Open Source Computer Vision Library), the most popular library for real-time computer vision. You help developers build image processing pipelines, object detection systems, video analysis tools, augmented reality, and document processing using OpenCV's 2,500+ algorithms for image manipulation, feature detection, camera calibration, 3D reconstruction, and DNN inference — in Python, C++, or JavaScript.

25 stars

Best use case

OpenCV — Computer Vision Library is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using OpenCV — Computer Vision Library should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

```bash
curl -o ~/.claude/skills/opencv/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/TerminalSkills/skills/opencv/SKILL.md"
```

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/opencv/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill
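The manual steps above can be sketched as a short shell snippet. This assumes a project-local `.claude` directory and reuses the same raw GitHub URL as the one-step install:

```shell
# Project-local variant of the curl one-liner: place SKILL.md inside the repo
# instead of the home directory, then restart the agent so it re-scans skills.
mkdir -p .claude/skills/opencv
curl -fsSL -o .claude/skills/opencv/SKILL.md \
  "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/TerminalSkills/skills/opencv/SKILL.md" \
  || echo "download failed; copy SKILL.md in manually"
```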

How OpenCV — Computer Vision Library Compares

| Feature | OpenCV — Computer Vision Library | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It configures your AI agent as an OpenCV expert that helps build image processing pipelines, object detection systems, video analysis tools, augmented reality, and document processing — in Python, C++, or JavaScript.

Where can I find the source code?

The source code lives in the ComeOnOliver/skillshub repository on GitHub, linked at the top of the page.

SKILL.md Source

# OpenCV — Computer Vision Library

You are an expert in OpenCV (Open Source Computer Vision Library), the most popular library for real-time computer vision. You help developers build image processing pipelines, object detection systems, video analysis tools, augmented reality, and document processing using OpenCV's 2,500+ algorithms for image manipulation, feature detection, camera calibration, 3D reconstruction, and DNN inference — in Python, C++, or JavaScript.

## Core Capabilities

### Image Processing

```python
import cv2
import numpy as np

# Read and display
img = cv2.imread("photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Resize
resized = cv2.resize(img, (800, 600))
# Or maintain aspect ratio
scale = 800 / img.shape[1]
resized = cv2.resize(img, None, fx=scale, fy=scale)

# Blur (noise reduction)
blurred = cv2.GaussianBlur(img, (5, 5), 0)
median = cv2.medianBlur(img, 5)           # Better for salt-and-pepper noise

# Edge detection
edges = cv2.Canny(gray, 50, 150)

# Thresholding
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
adaptive = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                  cv2.THRESH_BINARY, 11, 2)

# Morphological operations
kernel = np.ones((5, 5), np.uint8)
dilated = cv2.dilate(binary, kernel, iterations=1)
eroded = cv2.erode(binary, kernel, iterations=1)
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)   # Remove noise
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # Fill gaps
```

### Object Detection (DNN Module)

```python
# YOLO inference with OpenCV DNN
net = cv2.dnn.readNetFromONNX("yolov8n.onnx")

def detect_objects(image, conf_threshold=0.5):
    """Detect objects using YOLOv8 with OpenCV DNN backend.

    Args:
        image: BGR image (numpy array)
        conf_threshold: Minimum confidence to keep detection

    Returns:
        List of (class_id, confidence, x, y, w, h) tuples
    """
    blob = cv2.dnn.blobFromImage(image, 1/255.0, (640, 640), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    detections = []
    h, w = image.shape[:2]

    for output in outputs:
        # YOLOv8 ONNX output is (1, 84, 8400); transpose so each row is one
        # candidate: [cx, cy, w, h, class scores...]
        for detection in output[0].T:
            scores = detection[4:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])

            if confidence > conf_threshold:
                cx, cy, bw, bh = detection[:4]
                x = int((cx - bw/2) * w / 640)
                y = int((cy - bh/2) * h / 640)
                detections.append((class_id, float(confidence), x, y, int(bw*w/640), int(bh*h/640)))

    # Boxes may overlap; filter with cv2.dnn.NMSBoxes before drawing
    return detections
```

### Video Processing

```python
# Real-time video processing
cap = cv2.VideoCapture(0)                  # Webcam
# cap = cv2.VideoCapture("video.mp4")     # File

# Output video (the (width, height) here must match the frames you write)
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
out = cv2.VideoWriter("output.mp4", fourcc, 30.0, (640, 480))

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Example per-frame transforms (swap in your own pipeline)
    processed = cv2.GaussianBlur(frame, (15, 15), 0)
    edges = cv2.Canny(frame, 50, 150)

    # Draw detections (CLASSES is assumed to be a list of label names, e.g. COCO)
    for cls, conf, x, y, w, h in detect_objects(frame):
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
        cv2.putText(frame, f"{CLASSES[cls]} {conf:.2f}", (x, y-10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    out.write(frame)
    cv2.imshow("Detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
out.release()
cv2.destroyAllWindows()
```

### Contours and Shape Detection

```python
# Find and analyze contours
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for contour in contours:
    area = cv2.contourArea(contour)
    if area < 100:                         # Skip small noise
        continue

    # Bounding box
    x, y, w, h = cv2.boundingRect(contour)

    # Shape approximation
    peri = cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, 0.04 * peri, True)
    sides = len(approx)

    shape = "circle" if sides > 8 else {3: "triangle", 4: "rectangle"}.get(sides, "polygon")
    cv2.drawContours(img, [contour], -1, (0, 255, 0), 2)
    cv2.putText(img, shape, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)
```

## Installation

```bash
pip install opencv-python                  # Core + main modules
pip install opencv-contrib-python          # + extra modules (SIFT, face detection, tracking)
pip install opencv-python-headless         # Without GUI (for servers)
```

## Best Practices

1. **BGR not RGB** — OpenCV loads images in BGR; convert with `cv2.cvtColor` when using with matplotlib or PIL
2. **DNN for inference** — Use `cv2.dnn` for running YOLO, SSD, face detection; no PyTorch/TF dependency needed
3. **Preprocessing pipeline** — Resize → blur → convert → threshold → morphology → contours; order matters
4. **Contour hierarchy** — Use `RETR_EXTERNAL` for outermost contours only; `RETR_TREE` for nested relationships
5. **Video with codec** — Use `mp4v` for MP4, `XVID` for AVI; check codec availability on your platform
6. **NumPy integration** — OpenCV images are NumPy arrays; use NumPy for fast pixel-level operations
7. **Headless for servers** — Install `opencv-python-headless`; no X11/GUI dependencies needed for processing pipelines
8. **GPU acceleration** — Build from source with CUDA support for 10-50x speedup on GPU; or use `cv2.cuda` module

Related Skills

processing-computer-vision-tasks

25
from ComeOnOliver/skillshub

Process images using object detection, classification, and segmentation. Use when requesting "analyze image", "object detection", "image classification", or "computer vision". Trigger with relevant phrases based on skill purpose.

zlibrary-to-notebooklm

25
from ComeOnOliver/skillshub

Automatically downloads books from Z-Library and uploads them to Google NotebookLM. Supports PDF/EPUB formats, automatic conversion, and one-click knowledge-base creation.

vision-exploration

25
from ComeOnOliver/skillshub

End-state vision exploration. The user throws out a vague idea and the AI takes the lead, guiding them through a chain of "probe the value → dig into motivation → derive the evolution → sketch the end state" to help them see the furthest future possibilities. No limits, no convergence, pure divergence.

terraform-module-library

25
from ComeOnOliver/skillshub

Build reusable Terraform modules for AWS, Azure, and GCP infrastructure following infrastructure-as-code best practices. Use when creating infrastructure modules, standardizing cloud provisioning, or implementing reusable IaC components.

prompt-library

25
from ComeOnOliver/skillshub

Curated collection of high-quality prompts for various use cases. Includes role-based prompts, task-specific templates, and prompt refinement techniques. Use when user needs prompt templates, role-play prompts, or ready-to-use prompt examples for coding, writing, analysis, or creative tasks.

computer-vision-expert

25
from ComeOnOliver/skillshub

SOTA Computer Vision Expert (2026). Specialized in YOLO26, Segment Anything 3 (SAM 3), Vision Language Models, and real-time spatial analysis.

computer-use-agents

25
from ComeOnOliver/skillshub

Build AI agents that interact with computers like humans do - viewing screens, moving cursors, clicking buttons, and typing text. Covers Anthropic's Computer Use, OpenAI's Operator/CUA, and open-source alternatives. Critical focus on sandboxing, security, and handling the unique challenges of vision-based control. Use when: computer use, desktop automation agent, screen control AI, vision-based agent, GUI automation.

azure-ai-vision-imageanalysis-py

25
from ComeOnOliver/skillshub

Azure AI Vision Image Analysis SDK for captions, tags, objects, OCR, people detection, and smart cropping. Use for computer vision and image understanding tasks. Triggers: "image analysis", "computer vision", "OCR", "object detection", "ImageAnalysisClient", "image caption".

azure-ai-vision-imageanalysis-java

25
from ComeOnOliver/skillshub

Build image analysis applications with Azure AI Vision SDK for Java. Use when implementing image captioning, OCR text extraction, object detection, tagging, or smart cropping.

library-detection

25
from ComeOnOliver/skillshub

Detect project stack from package manifests (package.json, pyproject.toml, go.mod, Cargo.toml, pubspec.yaml, CMakeLists.txt). Auto-identify frameworks, test tools, and build systems for onboarding.

fetching-library-docs

25
from ComeOnOliver/skillshub

Token-efficient library API documentation fetcher using Context7 MCP with 77% token savings. Fetches code examples, API references, and usage patterns for published libraries (React, Next.js, Prisma, etc). Use when users ask "how do I use X library", need code examples, want API syntax, or are learning a framework's official API. Triggers: "Show me React hooks", "Prisma query syntax", "Next.js routing API". NOT for exploring repo internals/source code (use researching-with-deepwiki) or local files.

gemini-computer-use

25
from ComeOnOliver/skillshub

Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.