# Senior Computer Vision

## Overview

Senior Computer Vision is best used when you need a repeatable AI agent workflow instead of a one-off prompt. Teams using it should expect more consistent output, faster repeated execution, and less prompt rewriting.

## When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

## When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

## Installation

### Claude Code / Cursor / Codex

```bash
curl -o ~/.claude/skills/senior-computer-vision/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/TerminalSkills/skills/senior-computer-vision/SKILL.md"
```

### Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/senior-computer-vision/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

## How Senior Computer Vision Compares

| Feature | Senior Computer Vision | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

## Frequently Asked Questions

### What does this skill do?

It builds and deploys computer vision pipelines for object detection, image segmentation, and image classification, supporting YOLO (v8/v11), Faster R-CNN, SAM, and Mask R-CNN, with TensorRT optimization for real-time production inference.

### Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

## SKILL.md Source

# Senior Computer Vision

## Overview

Build and deploy computer vision pipelines for object detection, image segmentation, and visual AI. Supports YOLO (v8/v11), Faster R-CNN, SAM (Segment Anything Model), and Mask R-CNN. Includes TensorRT optimization for production deployment with real-time inference.

## Instructions

When a user asks for computer vision help, determine the task:

### Task A: Object detection with YOLO

1. Install ultralytics:

```bash
pip install ultralytics
```

2. Run inference:

```python
from ultralytics import YOLO

# Load a pretrained model
model = YOLO("yolo11n.pt")  # nano (fastest) | s | m | l | x (most accurate)

# Detect objects in an image
results = model("image.jpg")

# Process results
for result in results:
    boxes = result.boxes
    for box in boxes:
        cls = int(box.cls[0])
        conf = float(box.conf[0])
        label = model.names[cls]
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(f"{label}: {conf:.2f} at [{x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f}]")

# Save annotated image
results[0].save("result.jpg")
```

3. Run on video:

```python
# Process video with tracking
results = model.track("video.mp4", show=False, save=True, tracker="bytetrack.yaml")
```
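
If you need per-object track IDs rather than a saved video, one option is to stream results frame by frame. A minimal sketch, assuming the same model and video file; `result.boxes.id` holds the ByteTrack IDs and can be `None` on frames with no confirmed tracks:

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
for result in model.track("video.mp4", stream=True, tracker="bytetrack.yaml"):
    if result.boxes.id is None:  # no confirmed tracks in this frame
        continue
    for box, track_id in zip(result.boxes, result.boxes.id.int().tolist()):
        label = model.names[int(box.cls[0])]
        print(f"track {track_id}: {label}")
```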

4. Train a custom YOLO model:

```python
model = YOLO("yolo11n.pt")
model.train(
    data="dataset.yaml",   # Path to dataset config
    epochs=100,
    imgsz=640,
    batch=16,
    device=0,              # GPU index
    patience=20,           # Early stopping
)
```

Dataset YAML format:
```yaml
path: ./dataset
train: images/train
val: images/val
names:
  0: cat
  1: dog
  2: bird
```
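
After training, the best checkpoint can be validated and used directly. A sketch, assuming the default ultralytics output directory (`runs/detect/train`), which varies per run:

```python
from ultralytics import YOLO

best = YOLO("runs/detect/train/weights/best.pt")  # default save location; varies per run
metrics = best.val()             # evaluates on the val split from dataset.yaml
print(f"mAP50-95: {metrics.box.map:.3f}")
results = best("new_image.jpg", conf=0.5)
```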

### Task B: Image segmentation with SAM

```python
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor, SamAutomaticMaskGenerator

# Load SAM model
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to(device="cuda")

# Load the image as an RGB numpy array of shape (H, W, 3)
image = cv2.cvtColor(cv2.imread("image.jpg"), cv2.COLOR_BGR2RGB)

# Point-based segmentation
predictor = SamPredictor(sam)
predictor.set_image(image)

# Segment with a point prompt
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),  # (x, y) coordinates
    point_labels=np.array([1]),            # 1 = foreground, 0 = background
    multimask_output=True,
)

# Automatic mask generation (segment everything)
mask_generator = SamAutomaticMaskGenerator(sam)
auto_masks = mask_generator.generate(image)
# Each mask dict: {segmentation, area, bbox, predicted_iou, stability_score}
print(f"Found {len(auto_masks)} segments")
```
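
With `multimask_output=True`, `predict` returns three candidate masks ranked by score. A quick way to keep only the best one and inspect it visually (a sketch, assuming matplotlib is installed):

```python
import matplotlib.pyplot as plt

best_mask = masks[scores.argmax()]  # boolean array of shape (H, W)
plt.imshow(image)
plt.imshow(best_mask, alpha=0.5)    # semi-transparent mask overlay
plt.axis("off")
plt.savefig("mask_overlay.png", bbox_inches="tight")
```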

### Task C: Faster R-CNN and Mask R-CNN with torchvision

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn_v2, maskrcnn_resnet50_fpn_v2
from torchvision.transforms import functional as F
from PIL import Image

# Object detection with Faster R-CNN
det_model = fasterrcnn_resnet50_fpn_v2(weights="DEFAULT")
det_model.eval().cuda()

img = Image.open("image.jpg").convert("RGB")  # ensure 3-channel input
img_tensor = F.to_tensor(img).unsqueeze(0).cuda()

with torch.no_grad():
    predictions = det_model(img_tensor)[0]

# Filter by confidence
threshold = 0.7
for i in range(len(predictions["scores"])):
    if predictions["scores"][i] > threshold:
        label = predictions["labels"][i].item()
        score = predictions["scores"][i].item()
        box = predictions["boxes"][i].tolist()
        print(f"Class {label}: {score:.2f} at {box}")

# Instance segmentation with Mask R-CNN
seg_model = maskrcnn_resnet50_fpn_v2(weights="DEFAULT")
seg_model.eval().cuda()

with torch.no_grad():
    predictions = seg_model(img_tensor)[0]
    # predictions["masks"] contains per-instance binary masks
```
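
The predictions use numeric COCO labels; torchvision bundles the category names in the weights metadata, so mapping them to readable names is straightforward (a short sketch continuing from the code above):

```python
from torchvision.models.detection import FasterRCNN_ResNet50_FPN_V2_Weights

# Category list shipped with the pretrained weights (COCO classes)
categories = FasterRCNN_ResNet50_FPN_V2_Weights.DEFAULT.meta["categories"]
for label, score in zip(predictions["labels"], predictions["scores"]):
    if score > 0.7:
        print(f"{categories[int(label)]}: {float(score):.2f}")
```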

### Task D: TensorRT optimization for deployment

```python
# Export YOLO to TensorRT
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
model.export(format="engine", device=0, half=True)  # Creates yolo11n.engine

# Run inference with TensorRT engine
trt_model = YOLO("yolo11n.engine")
results = trt_model("image.jpg")
```

For custom models:

```python
import tensorrt as trt
import torch

# Export PyTorch model to ONNX first
torch.onnx.export(
    model, dummy_input, "model.onnx",
    opset_version=17,
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)

# Convert ONNX to TensorRT
# trtexec --onnx=model.onnx --saveEngine=model.engine --fp16
```
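
If TensorRT is not available on the target machine, the same ONNX file can be served with ONNX Runtime instead. A minimal sketch; the `"input"` name and 640x640 shape are assumptions carried over from the export call above:

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],  # falls back to CPU
)
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)  # assumed input shape
outputs = session.run(None, {"input": dummy})  # "input" matches the export name
print(outputs[0].shape)
```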

### Task E: Image classification

```python
import torch
from torchvision.models import efficientnet_v2_s, EfficientNet_V2_S_Weights
from PIL import Image

weights = EfficientNet_V2_S_Weights.DEFAULT
model = efficientnet_v2_s(weights=weights).eval().cuda()
preprocess = weights.transforms()

img = Image.open("image.jpg").convert("RGB")  # ensure 3-channel input
batch = preprocess(img).unsqueeze(0).cuda()

with torch.no_grad():
    logits = model(batch)
    probs = torch.softmax(logits, dim=1)[0]
    top5 = torch.topk(probs, 5)

categories = weights.meta["categories"]
for score, idx in zip(top5.values, top5.indices):
    print(f"{categories[idx]}: {score:.2%}")
```

## Examples

### Example 1: Count products on a shelf

**User request:** "Count how many bottles are on each shelf in this image"

```python
model = YOLO("yolo11m.pt")
results = model("shelf.jpg", conf=0.5)
bottles = [b for b in results[0].boxes if model.names[int(b.cls[0])] == "bottle"]
print(f"Detected {len(bottles)} bottles")
results[0].save("shelf_annotated.jpg")
```
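
The snippet above gives a total count; per-shelf counts need a grouping heuristic. One sketch, assuming shelves are roughly horizontal bands and using a hypothetical 80-pixel gap between rows:

```python
# Cluster bottle box centers by their vertical position
centers_y = sorted(float((b.xyxy[0][1] + b.xyxy[0][3]) / 2) for b in bottles)
shelves, gap = [], 80  # assumed minimum vertical gap between shelf rows (px)
for y in centers_y:
    if shelves and y - shelves[-1][-1] < gap:
        shelves[-1].append(y)   # same band as the previous bottle
    else:
        shelves.append([y])     # start a new shelf band
for i, shelf in enumerate(shelves, start=1):
    print(f"Shelf {i}: {len(shelf)} bottles")
```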

### Example 2: Segment and extract a foreground object

**User request:** "Remove the background from this product photo"

Use SAM with a center-point prompt to segment the main object, then apply the mask to create a transparent PNG background.
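
A sketch of that flow, assuming the SAM predictor from Task B is already set on the RGB `image` array:

```python
import numpy as np
from PIL import Image

h, w = image.shape[:2]
masks, scores, _ = predictor.predict(
    point_coords=np.array([[w // 2, h // 2]]),  # center-point prompt
    point_labels=np.array([1]),
    multimask_output=True,
)
alpha = (masks[scores.argmax()] * 255).astype(np.uint8)  # best mask as alpha channel
rgba = np.dstack([image, alpha])
Image.fromarray(rgba).save("product_transparent.png")
```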

### Example 3: Real-time detection on a webcam

**User request:** "Run object detection on my webcam feed"

```python
model = YOLO("yolo11n.pt")
results = model(source=0, show=True, conf=0.5)  # source=0 for webcam
```

## Guidelines

- Start with the smallest model variant (nano/small) and scale up only if accuracy is insufficient.
- Use TensorRT or ONNX Runtime for production deployments; they provide 2-5x speedup over PyTorch.
- For custom detection tasks, fine-tune YOLO on your dataset rather than training from scratch.
- Set confidence thresholds based on the application: 0.5 for general use, 0.7+ for high-precision needs.
- Use half-precision (FP16) inference on GPUs for nearly 2x speedup with minimal accuracy loss (see the sketch after this list).
- Pre-process images to the model's expected resolution before inference for best results.
- For video processing, use batch inference and tracking (ByteTrack) for temporal consistency.
- Benchmark inference speed with `model.benchmark()` (YOLO) before committing to a model size.
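
Enabling FP16 with YOLO is a one-liner. A minimal sketch, assuming a CUDA GPU at index 0; `half=True` is a standard ultralytics predict argument, and the actual speedup depends on your hardware:

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
# half=True runs FP16 inference on the GPU; accuracy loss is usually negligible
results = model("image.jpg", device=0, half=True)
```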

## Related Skills

- **processing-computer-vision-tasks** (ComeOnOliver/skillshub): Process images using object detection, classification, and segmentation. Use when requesting "analyze image", "object detection", "image classification", or "computer vision".
- **senior-pm** (ComeOnOliver/skillshub): Senior Project Manager for enterprise software, SaaS, and digital transformation projects. Specializes in portfolio management, quantitative risk analysis, resource optimization, stakeholder alignment, and executive reporting. Uses advanced methodologies including EMV analysis, Monte Carlo simulation, WSJF prioritization, and multi-dimensional health scoring. Use for project plans, status reports, risk assessments, resource allocation, roadmaps, milestone tracking, capacity planning, portfolio health reviews, program management, or executive-level reporting, especially for enterprise-scale initiatives with multiple workstreams, complex dependencies, or multi-million dollar budgets.
- **vision-exploration** (ComeOnOliver/skillshub): End-state vision exploration. The user raises a vague idea and the AI takes the lead, guiding through "probe the value → uncover the motivation → derive the evolution → paint the end state" to help the user see the furthest future possibilities. No constraints, no convergence, pure divergence.
- **computer-vision-expert** (ComeOnOliver/skillshub): SOTA Computer Vision Expert (2026). Specialized in YOLO26, Segment Anything 3 (SAM 3), Vision Language Models, and real-time spatial analysis.
- **computer-use-agents** (ComeOnOliver/skillshub): Build AI agents that interact with computers like humans do: viewing screens, moving cursors, clicking buttons, and typing text. Covers Anthropic's Computer Use, OpenAI's Operator/CUA, and open-source alternatives, with a critical focus on sandboxing, security, and the unique challenges of vision-based control.
- **azure-ai-vision-imageanalysis-py** (ComeOnOliver/skillshub): Azure AI Vision Image Analysis SDK for captions, tags, objects, OCR, people detection, and smart cropping. Use for computer vision and image understanding tasks.
- **azure-ai-vision-imageanalysis-java** (ComeOnOliver/skillshub): Build image analysis applications with the Azure AI Vision SDK for Java. Use when implementing image captioning, OCR text extraction, object detection, tagging, or smart cropping.
- **professional-senior-chrome-extension-architect-developer** (ComeOnOliver/skillshub): Turns the agent into a professional MV3 architect and developer with a focus on AI integration, security, performance, testing, and publishing compliance.
- **senior-rust-practices** (ComeOnOliver/skillshub): Use when the user asks about "rust workspace", "rust best practices", "cargo workspace setup", "rust code organization", "rust dependency management", "rust testing strategy", or needs guidance on senior-level Rust development patterns, workspace design, code organization strategies, or production-ready Rust architectures.
- **gemini-computer-use** (ComeOnOliver/skillshub): Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when automating web browser tasks via the Gemini Computer Use model, building an agent loop (screenshot → function_call → action → function_response), or integrating safety confirmation for risky UI actions.
- **senior-security** (ComeOnOliver/skillshub): Comprehensive security engineering skill for application security, penetration testing, security architecture, and compliance auditing. Includes security assessment tools, threat modeling, crypto implementation, and security automation.
- **senior-secops** (ComeOnOliver/skillshub): Comprehensive SecOps skill for application security, vulnerability management, compliance, and secure development practices. Includes security scanning, vulnerability assessment, compliance checking, and security automation.