depth-estimation

Performs real-time monocular depth estimation using Depth Anything v2, transforming live camera feeds into privacy-preserving depth maps or overlays for various platforms.

2,608 stars
Complexity: medium

About this skill

This AI agent skill provides real-time monocular depth estimation built on the Depth Anything v2 model. It processes live camera feeds into colorized depth maps in which objects closer to the camera typically appear in warm colors (e.g., red/yellow) and those farther away in cool colors (e.g., blue/green). Users can choose from three model sizes (small, base, large) and several display modes: 'depth_only' for full anonymization, 'overlay' to blend depth with the original feed, or 'side_by_side' for comparison.

The primary use case is privacy-preserving visual monitoring. In 'depth_only' mode the scene is fully anonymized: identities are obscured while the spatial layout and activity within the environment are preserved. This makes the skill well suited to security surveillance, smart building management, or any application that needs to track movement and presence without capturing personally identifying information.

The skill offers flexible configuration for performance, display, and model selection, supporting CoreML on macOS (optimized for the Apple Neural Engine) and PyTorch on Linux/Windows with CUDA or CPU backends.
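The options described above might be set through a configuration message such as the one below, following the skill's JSONL protocol. Note that `model_size` and `colormap` are assumed field names for illustration; only `blend_mode` and `opacity` appear in the skill's documented protocol.

```jsonl
{"command": "config-update", "config": {"model_size": "base", "blend_mode": "depth_only", "colormap": "viridis", "opacity": 1.0}}
```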

Best use case

The primary use case is to provide real-time privacy enhancement for video streams by generating depth maps that anonymize individuals while retaining spatial information about activity and object placement. This is particularly beneficial for security and surveillance applications, smart home systems, or any scenario where monitoring activity is required without infringing on personal privacy, helping professionals in these fields maintain compliance and ethical standards.


The user should expect a real-time stream or image output displaying a colorized depth map, an overlay of depth on the original feed, or a side-by-side comparison, depending on the chosen display mode and parameters, effectively transforming visual input into spatial data.

Practical example

Example input

Generate a real-time privacy-enhanced depth map from my camera feed using the 'base' model, displaying only the depth, and apply the 'viridis' colormap.

Example output

A live video stream displaying objects closer to the camera in warm colors (e.g., red/yellow) and objects further away in cool colors (e.g., blue/green), completely obscuring original visual details and personal identities.
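The near-warm/far-cool mapping can be illustrated with a minimal sketch. This is a simplified two-color ramp, not the skill's actual 'viridis' implementation, and the function name is hypothetical.

```python
import numpy as np

def colorize_depth(depth: np.ndarray) -> np.ndarray:
    """Map a depth map (small values = near) to a BGR image:
    near -> warm (red), far -> cool (blue). Illustration only."""
    # Normalize depth to [0, 1]; 0 = nearest, 1 = farthest.
    d = (depth - depth.min()) / (np.ptp(depth) + 1e-8)
    h, w = d.shape
    out = np.zeros((h, w, 3), dtype=np.uint8)
    out[..., 2] = ((1.0 - d) * 255).astype(np.uint8)  # red channel: strong when near
    out[..., 0] = (d * 255).astype(np.uint8)          # blue channel: strong when far
    return out

# A synthetic depth ramp: left edge near, right edge far.
depth = np.tile(np.linspace(0.5, 4.0, 64), (48, 1))
colored = colorize_depth(depth)
```

A real colormap such as viridis interpolates through a perceptually uniform lookup table, but the principle is the same: depth is normalized and mapped to color, discarding all original texture and identity.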

When to use this skill

  • Monitoring public or semi-public spaces while strictly maintaining individual privacy.
  • Anonymizing video data for analysis or storage to remove identifiable features.
  • Integrating depth perception into smart home or security systems for spatial awareness.
  • Real-time applications where object presence and movement are key, but visual identities are irrelevant or sensitive.

When not to use this skill

  • When high-fidelity visual detail or precise object recognition is critical for the task.
  • When accurate 3D reconstruction from multiple viewpoints is required (as monocular depth is an estimation).
  • In scenarios where capturing and identifying specific individuals is a necessary part of the objective.
  • If the processing device lacks sufficient computational resources for real-time machine learning inference.

How depth-estimation Compares

| Feature / Agent | depth-estimation | Standard Approach |
|-----------------|------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Medium | N/A |

Frequently Asked Questions

What does this skill do?

Performs real-time monocular depth estimation using Depth Anything v2, transforming live camera feeds into privacy-preserving depth maps or overlays for various platforms.

How difficult is it to install?

The installation complexity is rated as medium. Setup instructions are included in the SKILL.md source below.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.


SKILL.md Source

# Depth Estimation (Privacy)

Real-time monocular depth estimation using Depth Anything v2. Transforms camera feeds with colorized depth maps — near objects appear warm, far objects appear cool.

When used for **privacy mode**, the `depth_only` blend mode fully anonymizes the scene while preserving spatial layout and activity, enabling security monitoring without revealing identities.

## Hardware Backends

| Platform | Backend | Runtime | Model |
|----------|---------|---------|-------|
| **macOS** | CoreML | Apple Neural Engine | `apple/coreml-depth-anything-v2-small` (.mlpackage) |
| Linux/Windows | PyTorch | CUDA / CPU | `depth-anything/Depth-Anything-V2-Small` (.pth) |

On macOS, CoreML runs on the Neural Engine, leaving the GPU free for other tasks. The model is auto-downloaded from HuggingFace and stored at `~/.aegis-ai/models/feature-extraction/`.
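The platform-based backend choice described above can be sketched as follows. The function and the returned dictionary structure are illustrative, not the skill's actual loader; the model identifiers come from the table above.

```python
import platform

# Model identifiers from the backend table above.
COREML_MODEL = "apple/coreml-depth-anything-v2-small"
PYTORCH_MODEL = "depth-anything/Depth-Anything-V2-Small"

def select_backend(system=""):
    """Pick an inference backend by OS: CoreML on macOS, PyTorch elsewhere."""
    system = system or platform.system()
    if system == "Darwin":
        # macOS: run on the Apple Neural Engine via CoreML.
        return {"backend": "coreml", "model": COREML_MODEL, "device": "neural_engine"}
    # Linux/Windows: PyTorch with CUDA if available, CPU otherwise.
    return {"backend": "pytorch", "model": PYTORCH_MODEL, "device": "cuda_or_cpu"}
```

Keeping the decision in one function makes it easy to force a backend in tests by passing `system` explicitly.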

## What You Get

- **Privacy anonymization** — depth-only mode hides all visual identity
- **Depth overlays** on live camera feeds
- **3D scene understanding** — spatial layout of the scene
- **CoreML acceleration** — Neural Engine on Apple Silicon (3-5x faster than MPS)

## Interface: TransformSkillBase

This skill implements the `TransformSkillBase` interface. Any new privacy skill can be created by subclassing `TransformSkillBase` and implementing two methods:

```python
from transform_base import TransformSkillBase

class MyPrivacySkill(TransformSkillBase):
    def load_model(self, config):
        # Load your model, return {"model": "...", "device": "..."}
        ...

    def transform_frame(self, image, metadata):
        # Transform BGR image, return BGR image
        ...
```
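A concrete subclass might look like the sketch below. Since the real `transform_base` module is not shown here, a minimal stand-in base class is included so the example is self-contained; the pixelation transform is an illustrative privacy transform, not the depth model.

```python
import numpy as np

class TransformSkillBase:
    """Minimal stand-in for the real transform_base.TransformSkillBase,
    included only so this sketch is self-contained."""
    def load_model(self, config):
        raise NotImplementedError
    def transform_frame(self, image, metadata):
        raise NotImplementedError

class PixelatePrivacySkill(TransformSkillBase):
    """Illustrative privacy skill: coarse pixelation instead of depth."""
    def load_model(self, config):
        # No real model to load; just record the block size from config.
        self.block = int(config.get("block_size", 16))
        return {"model": "pixelate", "device": "cpu"}

    def transform_frame(self, image, metadata):
        # Downsample by striding, then expand each sample back into a
        # solid block, keeping the original BGR shape.
        h, w = image.shape[:2]
        b = self.block
        small = image[::b, ::b]
        out = np.repeat(np.repeat(small, b, axis=0), b, axis=1)
        return out[:h, :w]
```

The depth skill follows the same shape: `load_model` would download and initialize the Depth Anything v2 weights, and `transform_frame` would run inference and colorize the result.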

## Protocol

### Aegis → Skill (stdin)
```jsonl
{"event": "frame", "frame_id": "cam1_1710001", "camera_id": "front_door", "frame_path": "/tmp/frame.jpg", "timestamp": "..."}
{"command": "config-update", "config": {"opacity": 0.8, "blend_mode": "overlay"}}
{"command": "stop"}
```

### Skill → Aegis (stdout)
```jsonl
{"event": "ready", "model": "coreml-DepthAnythingV2SmallF16", "device": "neural_engine", "backend": "coreml"}
{"event": "transform", "frame_id": "cam1_1710001", "camera_id": "front_door", "transform_data": "<base64 JPEG>"}
{"event": "perf_stats", "total_frames": 50, "timings_ms": {"transform": {"avg": 12.5, ...}}}
```
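A minimal dispatch for the messages above might look like this. The handler is a sketch: field names mirror the protocol shown, but the real skill's control flow is not documented here.

```python
import json

def handle_line(line, config):
    """Dispatch one JSONL message from Aegis; returns a reply dict or None."""
    msg = json.loads(line)
    if msg.get("command") == "config-update":
        # Merge new settings, e.g. {"opacity": 0.8, "blend_mode": "overlay"}.
        config.update(msg["config"])
        return None
    if msg.get("command") == "stop":
        raise SystemExit(0)
    if msg.get("event") == "frame":
        # A real skill would read msg["frame_path"], run the depth model,
        # and place base64-encoded JPEG bytes in "transform_data".
        return {
            "event": "transform",
            "frame_id": msg["frame_id"],
            "camera_id": msg["camera_id"],
            "transform_data": "<base64 JPEG>",
        }
    return None
```

In production this would sit in a loop over `sys.stdin`, writing each non-None reply to stdout as one JSON line.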

## Setup

```bash
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```

Related Skills

progressive-estimation

31392
from sickn33/antigravity-awesome-skills

Estimate AI-assisted and hybrid human+agent development work with research-backed PERT statistics and calibration feedback loops

IterativeDepth

11146
from danielmiessler/Personal_AI_Infrastructure

2-8 scientific lens passes to surface hidden requirements single-pass analysis misses. USE WHEN iterative depth, deep exploration, multi-angle analysis, multiple perspectives, examine from angles, surface hidden requirements.

performing-graphql-depth-limit-attack

4032
from mukul975/Anthropic-Cybersecurity-Skills

Execute and test GraphQL depth limit attacks using deeply nested recursive queries to identify denial-of-service vulnerabilities in GraphQL APIs.

bio-tumor-fraction-estimation

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Estimates circulating tumor DNA fraction from shallow whole-genome sequencing using ichorCNA. Detects copy number alterations via HMM segmentation and calculates ctDNA percentage. Requires 0.1-1x sWGS coverage. Use when quantifying tumor burden from liquid biopsy or monitoring treatment response.

task-estimation

242
from aiskillstore/marketplace

Estimate software development tasks accurately using various techniques. Use when planning sprints, roadmaps, or project timelines. Handles story points, t-shirt sizing, planning poker, and estimation best practices.

defense-in-depth

242
from aiskillstore/marketplace

Use when invalid data causes failures deep in execution, requiring validation at multiple system layers - validates at every layer data passes through to make bugs structurally impossible

liquidity-depth-analyzer

242
from aiskillstore/marketplace

DEX liquidity analysis and slippage estimation for MEV trading. Use when implementing swaps, route selection, or position sizing. Triggers on: liquidity, slippage, price impact, depth, AMM math, Uniswap, Curve.

in-depth-research-guide

191
from wentorai/research-plugins

Structured methodology for conducting exhaustive multi-source investigations

defense-in-depth

153
from Microck/ordinary-claude-skills

Use when invalid data causes failures deep in execution, requiring validation at multiple system layers - validates at every layer data passes through to make bugs structurally impossible

Defense-in-Depth Validation

118
from einverne/dotfiles

Validate at every layer data passes through to make bugs impossible

sec-context-depth

108
from alfredolopez80/multi-agent-ralph-loop

Comprehensive AI code security review using 27 sec-context anti-patterns. Use for code review when security vulnerabilities are suspected, especially for AI-generated code.

a-share-market-depth

105
from aifinlab/FinClaw

A-share market depth / order book analysis. Triggers when the user says "市场深度", "market depth", "订单簿", "order book", "盘口深度" (order-book depth), "挂单量" (pending order volume), or "委托深度" (quote depth). Fetches data via cn-stock-data and analyzes market depth and order-book structure. Supports two output styles: formal and brief.