3d-cv-labeling-2026

Expert in 3D computer vision labeling tools, workflows, and AI-assisted annotation for LiDAR, point clouds, and sensor fusion. Covers SAM4D/Point-SAM, human-in-the-loop architectures, and vertical-specific training strategies. Activate on '3D labeling', 'point cloud annotation', 'LiDAR labeling', 'SAM 3D', 'SAM4D', 'sensor fusion annotation', '3D bounding box', 'semantic segmentation point cloud'. NOT for 2D image labeling (use clip-aware-embeddings), general ML training (use ml-engineer), video annotation without 3D (use computer-vision-pipeline), or VLM prompt engineering (use prompt-engineer).

85 stars

Best use case

3d-cv-labeling-2026 is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Expert in 3D computer vision labeling tools, workflows, and AI-assisted annotation for LiDAR, point clouds, and sensor fusion. Covers SAM4D/Point-SAM, human-in-the-loop architectures, and vertical-specific training strategies. Activate on '3D labeling', 'point cloud annotation', 'LiDAR labeling', 'SAM 3D', 'SAM4D', 'sensor fusion annotation', '3D bounding box', 'semantic segmentation point cloud'. NOT for 2D image labeling (use clip-aware-embeddings), general ML training (use ml-engineer), video annotation without 3D (use computer-vision-pipeline), or VLM prompt engineering (use prompt-engineer).

Teams using 3d-cv-labeling-2026 should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/3d-cv-labeling-2026/SKILL.md --create-dirs "https://raw.githubusercontent.com/curiositech/some_claude_skills/main/inbox/3d-cv-labeling-2026/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/3d-cv-labeling-2026/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How 3d-cv-labeling-2026 Compares

Feature / Agent3d-cv-labeling-2026Standard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Expert in 3D computer vision labeling tools, workflows, and AI-assisted annotation for LiDAR, point clouds, and sensor fusion. Covers SAM4D/Point-SAM, human-in-the-loop architectures, and vertical-specific training strategies. Activate on '3D labeling', 'point cloud annotation', 'LiDAR labeling', 'SAM 3D', 'SAM4D', 'sensor fusion annotation', '3D bounding box', 'semantic segmentation point cloud'. NOT for 2D image labeling (use clip-aware-embeddings), general ML training (use ml-engineer), video annotation without 3D (use computer-vision-pipeline), or VLM prompt engineering (use prompt-engineer).

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# 3D Computer Vision Labeling Expert (2026)

Expert guidance on 3D annotation tools, AI-assisted labeling workflows, and training architectures for LiDAR/point cloud computer vision in autonomous vehicles, robotics, infrastructure inspection, and geospatial applications.

## When to Use This Skill

✅ **Use for:**
- Selecting 3D point cloud annotation tools (BasicAI, Supervisely, Segments.ai, Deepen AI)
- Implementing SAM4D/Point-SAM for auto-labeling workflows
- Designing human-in-the-loop annotation pipelines
- Sensor fusion annotation (camera + LiDAR + radar)
- Training architecture decisions: specialized models vs VLMs
- Vertical-specific 3D detection (autonomous driving, inspection, agriculture, wildfire)

❌ **NOT for:**
- 2D image labeling without 3D context (use clip-aware-embeddings or Label Studio docs)
- General ML model training (use ml-engineer)
- Video annotation without point clouds (use computer-vision-pipeline)
- VLM prompt engineering (use prompt-engineer)
- Photogrammetry/3D reconstruction (use geo processing tools)

---

## 2026 Tool Landscape Overview

### Commercial Leaders

| Tool | Strength | Best For | Key AI Feature |
|------|----------|----------|----------------|
| **BasicAI** | One-click detection | Autonomous driving | Pre-labeling models fine-tuned for AV |
| **Supervisely** | Customization | R&D teams | AI tracking, 2D→3D single-click |
| **Segments.ai** | 2D+3D sync | Robotics perception | Sequential propagation |
| **Deepen AI** | Sensor calibration | In-house perception | Pixel-perfect multi-sensor |
| **Dataloop** | Enterprise MLOps | Large annotation teams | Model-assisted + Point Cloud Focus |
| **Encord** | Full workflow | Multi-modal projects | Track-ID management |
| **Ango Hub (iMerit)** | Dense annotation | Complex multi-modal | Frame-to-frame propagation |

### Open Source Options

| Tool | Maturity | Limitations |
|------|----------|-------------|
| **CVAT** | Stable | 3D bounding boxes only, limited interpolation |
| **3D BAT** | Good | Full-surround annotation, semi-auto tracking |
| **Label Studio** | Partial 3D | Better for multi-format, not specialized 3D |

---

## SAM Evolution for 3D (2024-2026)

### SAM4D (ICCV 2025) - Multi-Modal + Temporal

**Key innovation**: Unified Multi-modal Positional Encoding (UMPE) aligns camera and LiDAR in shared 3D space.

```
Camera Stream → Feature Extraction → ┐
                                      ├→ UMPE Alignment → Promptable 3D Segmentation
LiDAR Stream → Point Encoding     → ┘
```

**Data engine breakthrough**: Automatic pseudo-label generation at 100x+ faster than human annotation using:
1. VFM-driven video masklets
2. Spatiotemporal 4D reconstruction
3. Cross-modal masklet fusion

**Dataset**: Waymo-4DSeg (300k+ camera-LiDAR aligned masklets)

### Point-SAM (ICLR 2025) - Native 3D Prompting

**Architecture**: Efficient transformer designed specifically for point clouds (not adapted from 2D).

**Knowledge distillation**: 2D SAM → 3D Point-SAM via data engine that generates:
- Part-level pseudo-labels
- Object-level pseudo-labels

**Benchmarks**: Outperforms state-of-the-art on indoor (ScanNet) and outdoor (nuScenes, Waymo) datasets.

### SAMNet++ (2025) - Hybrid Pipeline

Two-stage approach:
1. SAM performs unsupervised segmentation
2. Adapted PointNet++ refines for semantic accuracy

**Best for**: UAV/drone workflows where colorized point clouds from L1 LiDAR + RGB cameras are available.

---

## Human-in-the-Loop Architecture

### The Model-in-the-Loop Paradigm (2023-2026)

**Old approach**: Human labels → Train model → Deploy
**New approach**: Model assists → Human validates → Rapid iteration

```
┌─────────────────────────────────────────────────────────┐
│                    LABELING PIPELINE                     │
├─────────────────────────────────────────────────────────┤
│  Raw Data → AI Pre-label → Human Review → QA Check      │
│     │           │              │             │          │
│     │     SAM4D/VLM       Corrections   Consensus      │
│     │     generates       only where    sampling        │
│     │     proposals       AI uncertain                  │
└─────────────────────────────────────────────────────────┘
```

### Efficiency Gains

| Approach | Time for 10k frames | Annotation Quality |
|----------|--------------------|--------------------|
| Manual only | 400 hours | 95% (expert) |
| AI pre-label + review | 50 hours | 97% (AI+human) |
| SAM4D data engine | 4 hours | 92% (pseudo) |

**The 80/20 rule**: ~80% of ML project time is data prep. Model-in-the-loop cuts this dramatically.

### Quality Assurance Strategies

1. **Consensus sampling**: Multiple annotators on subset, measure agreement
2. **Active learning**: Route uncertain predictions to experts
3. **Tiered review**: Tier 1 (critical objects) get SME validation, Tier 2/3 use AI confidence thresholds

---

## Why Specialized Training > VLMs for 3D

### The Core Trade-off

| Aspect | Specialized (YOLO, PointPillars) | VLMs (GPT-4V, Gemini) |
|--------|----------------------------------|----------------------|
| **Latency** | 10-50ms (real-time) | 500-2000ms |
| **3D precision** | Strong geometric priors | Noisy text-3D alignment |
| **Novel objects** | Closed-set (what you train) | Open-vocabulary |
| **Compute** | Edge-deployable | GPU cluster required |
| **Hallucinations** | None (deterministic) | Yes (safety-critical risk) |
| **Domain shift** | Struggles (fog, night) | Better generalization |

### When to Use Each

**Use Specialized Models When:**
- Real-time inference required (autonomous vehicles, robotics)
- Known object classes (infrastructure defects, crop types)
- Safety-critical deployment (can't tolerate hallucinations)
- Edge deployment (drones, embedded systems)

**Use VLMs/Foundation Models When:**
- Zero-shot exploration of new domains
- Generating training data (weak labels)
- Open-vocabulary requirements ("find anything damaged")
- Domain adaptation bootstrapping

### The Hybrid Architecture (2025+ Best Practice)

```
                    ┌───────────────────────┐
                    │    VLM (Slow Brain)   │
                    │  • Scene understanding│
                    │  • Open vocabulary    │
                    │  • Anomaly detection  │
                    └──────────┬────────────┘
                               │ High-level context
                               ▼
┌──────────────────────────────────────────────────────────┐
│              Specialized Detector (Fast Brain)           │
│  • Real-time inference (YOLO, PointPillars, CenterPoint)│
│  • Known object detection & tracking                    │
│  • Safety-critical decisions                            │
└──────────────────────────────────────────────────────────┘
```

**Examples**:
- VOLTRON: YOLOv8 + LLaMA2 for hazard identification
- DrivePI: Point clouds + multi-view + language instructions (0.5B Qwen2.5)

---

## Vertical-Specific Training Architecture

### Infrastructure Inspection

**Objects**: Utility poles, insulators, conductors, vegetation, damage types
**Sensor fusion**: RGB + thermal + LiDAR
**Training data needs**:
- Thermal anomaly samples (varied temperatures)
- Damage taxonomy (cracks, corrosion, rust grades)
- Vegetation clearance measurements

**Architecture**:
```
LiDAR → Point cloud encoder → ┐
Thermal → 2D encoder       → ├→ Fusion → Multi-task head
RGB → 2D encoder           → ┘          ├→ Object detection
                                         ├→ Defect classification
                                         └→ Clearance regression
```

### Autonomous Driving

**Objects**: Vehicles, pedestrians, cyclists, traffic signs, lane markings
**Key requirement**: Temporal consistency (track-IDs across frames)
**Training data needs**:
- Long-tail scenarios (emergency vehicles, animals, debris)
- Adverse weather (fog, rain, snow, night)
- Edge cases (construction zones, accidents)

**Architecture**: CenterPoint, PointPillars, or Voxel-based detectors with BEV (Bird's Eye View) representation.

### Agriculture/Wildfire

**Objects**: Crop rows, canopy height, fuel load, fire spread boundaries
**Sensor fusion**: RGB + multispectral + LiDAR
**Training data needs**:
- Crop growth stages
- Disease/pest visual signatures
- Fuel load density from LiDAR CHM (Canopy Height Model)

**Why not just VLM?** VLMs can't:
- Measure precise heights (LiDAR regression)
- Classify at hyperspectral wavelengths
- Maintain spatial precision for prescription maps

---

## Common Anti-Patterns

### Anti-Pattern: "Just Use SAM on Everything"

**Novice thinking**: "SAM segments anything, so I'll just run it on my LiDAR data"

**Reality**:
- SAM 1/2 are 2D models—they don't understand 3D geometry
- Point clouds need Point-SAM or SAM4D specifically
- Raw application produces noisy masks without geometric priors

**Correct approach**: Use Point-SAM for native 3D, or project to 2D for SAM → lift back to 3D.

### Anti-Pattern: Skipping Human Validation

**Novice thinking**: "AI pre-labels are 95% accurate, we can skip review"

**Reality**:
- 5% error on 100k objects = 5,000 wrong labels
- Errors compound in edge cases (exactly where you need accuracy)
- Model learns to reproduce annotation mistakes

**Correct approach**: Tier 1 (safety-critical) always human-validated. Use confidence thresholds for Tier 2/3.

### Anti-Pattern: VLM for Real-Time Inference

**Novice thinking**: "GPT-4V can identify damage in my photos"

**Reality**:
- 500-2000ms latency per frame
- Can't run on edge devices
- Hallucination risk in safety-critical contexts

**Correct approach**: Use VLM for data generation/exploration, specialized model for deployment.

### Anti-Pattern: Single-Modal Training

**Novice thinking**: "LiDAR is enough for 3D detection"

**Reality**:
- LiDAR: Precise geometry, no color/texture
- Camera: Rich semantics, no depth
- Fusion outperforms single-modal by 5-15% mAP

**Correct approach**: Sensor fusion from day one. SAM4D shows fusion pseudo-labels > single-modal.

---

## Decision Tree: Choosing Your Approach

```
                        Do you need real-time inference?
                              /                  \
                           YES                    NO
                            |                      |
                    Use specialized           Is this exploration?
                    detector (YOLO,              /        \
                    CenterPoint)               YES         NO
                            |                  |           |
                    Have labeled data?     Use VLM      Generate
                      /        \           for zero-    pseudo-labels
                   YES          NO         shot         with SAM4D
                    |            |
              Train model    Use SAM4D/
                             Point-SAM for
                             auto-labeling
```

---

## Tool Selection Decision Matrix

| Requirement | Recommended Tool |
|-------------|------------------|
| Autonomous driving at scale | Deepen AI or BasicAI |
| R&D/research flexibility | Supervisely or Segments.ai |
| Multi-modal (camera+LiDAR+radar) | Ango Hub or Dataloop |
| Self-hosted/open source | CVAT + 3D plugins or 3D BAT |
| Robotics perception | Segments.ai (2D+3D sync) |
| Budget-conscious | Label Studio + custom scripts |

---

## References

- `/references/sam4d-architecture.md` - Deep dive on SAM4D UMPE and data engine
- `/references/tool-comparison-matrix.md` - Detailed feature comparison of all tools
- `/references/hybrid-architecture-examples.md` - VOLTRON, DrivePI implementation patterns
- `/references/vertical-training-recipes.md` - Infrastructure, AV, agriculture specifics

---

## Sources

- [SAM4D: Segment Anything in Camera and LiDAR Streams](https://sam4d-project.github.io/) (ICCV 2025)
- [Point-SAM: Promptable 3D Segmentation Model](https://point-sam.github.io/) (ICLR 2025)
- [Segments.ai: 8 Best Point Cloud Labeling Tools](https://segments.ai/blog/the-8-best-point-cloud-labeling-tools/)
- [A Review of 3D Object Detection with Vision-Language Models](https://arxiv.org/html/2504.18738v1)
- [Vision-Language Models in Autonomous Driving Survey](https://arxiv.org/html/2310.14414v2)

Related Skills

modern-auth-2026

85
from curiositech/some_claude_skills

Modern authentication implementation for 2026 - passkeys (WebAuthn), OAuth (Google, Apple), magic links, and cross-device sync. Use for passwordless-first authentication, social login setup, Supabase Auth, Next.js auth flows, and multi-factor authentication. Activate on "passkeys", "WebAuthn", "Google Sign-In", "Apple Sign-In", "magic link", "passwordless", "authentication", "login", "OAuth", "social login". NOT for session management without auth (use standard JWT docs), authorization/RBAC (use security-auditor), or API key management (use api-architect).

2026-legal-research-agent

85
from curiositech/some_claude_skills

Expert legal research agent for finding and scraping expungement data state by state. Knows authoritative sources, URL patterns, Firecrawl configuration, and 2026 legal landscape. Activate on "find expungement data", "scrape state laws", "legal research", "court URLs", "statute sources", "Clean Slate laws", "automatic expungement research". NOT for interpreting laws (use national-expungement-expert), building UI, or legal advice.

skill-coach

85
from curiositech/some_claude_skills

Guides creation of high-quality Agent Skills with domain expertise, anti-pattern detection, and progressive disclosure best practices. Use when creating skills, reviewing existing skills, or when users mention improving skill quality, encoding expertise, or avoiding common AI tooling mistakes. Activate on keywords: create skill, review skill, skill quality, skill best practices, skill anti-patterns. NOT for general coding advice or non-skill Claude Code features.

wisdom-accountability-coach

85
from curiositech/some_claude_skills

Longitudinal memory tracking, philosophy teaching, and personal accountability with compassion. Expert in pattern recognition, Stoicism/Buddhism, and growth guidance. Activate on 'accountability', 'philosophy', 'Stoicism', 'Buddhism', 'personal growth', 'commitment tracking', 'wisdom teaching'. NOT for therapy or mental health treatment (refer to professionals), crisis intervention, or replacing professional coaching credentials.

windows-95-web-designer

85
from curiositech/some_claude_skills

Modern web applications with authentic Windows 95 aesthetic. Gradient title bars, Start menu paradigm, taskbar patterns, 3D beveled chrome. Extrapolates Win95 to AI chatbots, mobile UIs, responsive layouts. Activate on 'windows 95', 'win95', 'start menu', 'taskbar', 'retro desktop', '95 aesthetic', 'clippy'. NOT for Windows 3.1 (use windows-3-1-web-designer), vaporwave/synthwave, macOS, flat design.

windows-3-1-web-designer

85
from curiositech/some_claude_skills

Modern web applications with authentic Windows 3.1 aesthetic. Solid navy title bars, Program Manager navigation, beveled borders, single window controls. Extrapolates Win31 to AI chatbots (Cue Card paradigm), mobile UIs (pocket computing). Activate on 'windows 3.1', 'win31', 'program manager', 'retro desktop', '90s aesthetic', 'beveled'. NOT for Windows 95 (use windows-95-web-designer - has gradients, Start menu), vaporwave/synthwave, macOS, flat design.

win31-pixel-art-designer

85
from curiositech/some_claude_skills

Expert in Windows 3.1 era pixel art and graphics. Creates icons, banners, splash screens, and UI assets with authentic 16/256-color palettes, dithering patterns, and Program Manager styling. Activate on 'win31 icons', 'pixel art 90s', 'retro icons', '16-color', 'dithering', 'program manager icons', 'VGA palette'. NOT for modern flat icons, vaporwave art, or high-res illustrations.

win31-audio-design

85
from curiositech/some_claude_skills

Expert in Windows 3.1 era sound vocabulary for modern web/mobile apps. Creates satisfying retro UI sounds using CC-licensed 8-bit audio, Web Audio API, and haptic coordination. Activate on 'win31 sounds', 'retro audio', '90s sound effects', 'chimes', 'tada', 'ding', 'satisfying UI sounds'. NOT for modern flat UI sounds, voice synthesis, or music composition.

wedding-immortalist

85
from curiositech/some_claude_skills

Transform thousands of wedding photos and hours of footage into an immersive 3D Gaussian Splatting experience with theatre mode replay, face-clustered guest roster, and AI-curated best photos per person. Expert in 3DGS pipelines, face clustering, aesthetic scoring, and adaptive design matching the couple's wedding theme (disco, rustic, modern, LGBTQ+ celebrations). Activate on "wedding photos", "wedding video", "3D wedding", "Gaussian Splatting wedding", "wedding memory", "wedding immortalize", "face clustering wedding", "best wedding photos". NOT for general photo editing (use native-app-designer), non-wedding 3DGS (use drone-inspection-specialist), or event planning (not a wedding planner).

websocket-streaming

85
from curiositech/some_claude_skills

Implements real-time bidirectional communication between DAG execution engines and visualization dashboards via WebSocket. Covers connection management, typed event protocols, reconnection with backoff, and React hook integration. Activate on "WebSocket", "real-time updates", "live streaming", "execution events", "state streaming", "push notifications". NOT for HTTP REST APIs, server-sent events (SSE), or general networking.

webapp-testing

85
from curiositech/some_claude_skills

Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs. Activate on: Playwright, webapp testing, browser automation, E2E testing, UI testing. NOT for API-only testing without browser, unit tests, or mobile app testing.

web-weather-creator

85
from curiositech/some_claude_skills

Master of stylized atmospheric effects using SVG filters and CSS animations. Creates clouds, waves, lightning, rain, fog, aurora borealis, god rays, lens flares, twilight skies, and ocean spray—all with a premium aesthetic that's stylized but never cheap-looking.