computer-vision-guide
Apply computer vision research methods, models, and evaluation tools
Best use case
computer-vision-guide is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Apply computer vision research methods, models, and evaluation tools
Teams using computer-vision-guide should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/computer-vision-guide/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How computer-vision-guide Compares
| Feature / Agent | computer-vision-guide | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Apply computer vision research methods, models, and evaluation tools
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Computer Vision Guide
A skill for conducting computer vision research, covering model architectures, dataset preparation, training pipelines, evaluation metrics, and common experimental protocols for image classification, object detection, and segmentation tasks.
## Core Tasks and Architectures
### Computer Vision Task Taxonomy
```
Image Classification:
Input: Single image
Output: Class label(s)
Models: ResNet, EfficientNet, ViT, ConvNeXt
Object Detection:
Input: Single image
Output: Bounding boxes + class labels
Models: YOLO (v5-v9), Faster R-CNN, DETR, RT-DETR
Semantic Segmentation:
Input: Single image
Output: Per-pixel class label
Models: U-Net, DeepLab, SegFormer, Mask2Former
Instance Segmentation:
Input: Single image
Output: Per-pixel labels distinguishing individual objects
Models: Mask R-CNN, Mask2Former, SAM
Image Generation:
Input: Text prompt or noise
Output: Generated image
Models: Stable Diffusion, DALL-E, Imagen
```
### Model Architecture Evolution
```
CNNs (Convolutional Neural Networks):
LeNet (1998) -> AlexNet (2012) -> VGG (2014) -> ResNet (2015)
-> EfficientNet (2019) -> ConvNeXt (2022)
Vision Transformers:
ViT (2020) -> DeiT (2021) -> Swin Transformer (2021)
-> BEiT (2021) -> DINOv2 (2023)
Trend: Transformers are competitive with CNNs at scale.
Hybrid architectures combining convolutions and attention are common.
```
## Dataset Preparation
### Building a Research Dataset
```python
import os
from pathlib import Path
def organize_image_dataset(source_dir: str,
split_ratios: dict = None) -> dict:
"""
Organize images into train/val/test splits.
Args:
source_dir: Directory containing class subdirectories
split_ratios: Dict with 'train', 'val', 'test' ratios
"""
if split_ratios is None:
split_ratios = {"train": 0.7, "val": 0.15, "test": 0.15}
import random
random.seed(42)
stats = {}
for class_dir in sorted(Path(source_dir).iterdir()):
if not class_dir.is_dir():
continue
images = list(class_dir.glob("*.jpg")) + list(class_dir.glob("*.png"))
random.shuffle(images)
n = len(images)
n_train = int(n * split_ratios["train"])
n_val = int(n * split_ratios["val"])
stats[class_dir.name] = {
"total": n,
"train": n_train,
"val": n_val,
"test": n - n_train - n_val
}
return stats
```
### Data Augmentation
```python
from torchvision import transforms
def get_training_transforms(img_size: int = 224) -> transforms.Compose:
"""
Standard data augmentation pipeline for training.
Args:
img_size: Target image size
"""
return transforms.Compose([
transforms.RandomResizedCrop(img_size, scale=(0.8, 1.0)),
transforms.RandomHorizontalFlip(p=0.5),
transforms.ColorJitter(brightness=0.2, contrast=0.2,
saturation=0.2, hue=0.1),
transforms.RandomRotation(15),
transforms.ToTensor(),
transforms.Normalize(
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]
)
])
```
## Training Pipeline
### Transfer Learning Workflow
```python
import torch
import torch.nn as nn
from torchvision import models
def create_classifier(num_classes: int,
backbone: str = "resnet50",
pretrained: bool = True) -> nn.Module:
"""
Create an image classifier using transfer learning.
Args:
num_classes: Number of target classes
backbone: Model architecture name
pretrained: Whether to use ImageNet-pretrained weights
"""
if backbone == "resnet50":
weights = models.ResNet50_Weights.DEFAULT if pretrained else None
model = models.resnet50(weights=weights)
model.fc = nn.Linear(model.fc.in_features, num_classes)
elif backbone == "vit_b_16":
weights = models.ViT_B_16_Weights.DEFAULT if pretrained else None
model = models.vit_b_16(weights=weights)
model.heads.head = nn.Linear(
model.heads.head.in_features, num_classes
)
else:
raise ValueError(f"Unknown backbone: {backbone}")
return model
```
## Evaluation Metrics
### Metrics by Task
```
Classification:
- Top-1 Accuracy: Fraction of correct predictions
- Top-5 Accuracy: Correct class in top 5 predictions
- Precision, Recall, F1: Per-class and macro-averaged
- Confusion Matrix: Visualize class-level errors
Object Detection:
- mAP (mean Average Precision): Standard COCO metric
- mAP@0.5: AP at IoU threshold 0.5
- mAP@0.5:0.95: AP averaged over IoU thresholds 0.5 to 0.95
- AP per class: Identifies weak categories
Segmentation:
- mIoU (mean Intersection over Union): Standard metric
- Pixel Accuracy: Fraction of correctly classified pixels
- Dice Coefficient: F1 score at the pixel level
```
## Reproducibility Checklist
### What to Report in Papers
```
1. Architecture: Exact model name, number of parameters
2. Pretraining: Dataset and weights used for initialization
3. Training: Optimizer, learning rate schedule, batch size, epochs
4. Augmentation: Full list of augmentations with parameters
5. Hardware: GPU type, number, training time
6. Evaluation: Exact metrics, test set version, evaluation protocol
7. Code: Link to repository with training and evaluation scripts
8. Random seeds: Report seeds used; ideally report mean over 3+ seeds
```
## Ethical Considerations
When collecting or using image datasets, consider consent (especially for images of people), geographic and demographic representation, potential for bias amplification, and dual-use concerns. Document the dataset's composition and limitations. Follow the Datasheets for Datasets framework. For generative models, implement safeguards against generating harmful content.Related Skills
thuthesis-guide
Write Tsinghua University theses using the ThuThesis LaTeX template
thesis-writing-guide
Templates, formatting rules, and strategies for thesis and dissertation writing
thesis-template-guide
Set up LaTeX templates for PhD and Master's thesis documents
sjtuthesis-guide
Write SJTU theses using the SJTUThesis LaTeX template with full compliance
novathesis-guide
LaTeX thesis template supporting multiple universities and formats
graphical-abstract-guide
Create SVG graphical abstracts for journal paper submissions
beamer-presentation-guide
Guide to creating academic presentations with LaTeX Beamer
plagiarism-detection-guide
Use plagiarism detection tools and ensure manuscript originality
paper-polish-guide
Review and polish LaTeX research papers for clarity and style
grammar-checker-guide
Use grammar and style checking tools to polish academic manuscripts
conciseness-editing-guide
Eliminate wordiness and redundancy in academic prose for clarity
academic-translation-guide
Academic translation, post-editing, and Chinglish correction guide