depth-estimation
Performs real-time monocular depth estimation using Depth Anything v2, transforming live camera feeds into privacy-preserving depth maps or overlays for various platforms.
About this skill
This AI agent skill provides real-time monocular depth estimation using the Depth Anything v2 model. It processes live camera feeds and converts them into colorized depth maps, where objects closer to the camera typically appear in warm colors (e.g., red/yellow) and those further away in cool colors (e.g., blue/green). Users can choose between model sizes (small, base, large) and display modes: 'depth_only' for full anonymization, 'overlay' to blend depth with the original feed, or 'side_by_side' for comparison.

The primary use case is enhancing privacy in visual monitoring. With the 'depth_only' blend mode, the scene is fully anonymized: identities are obscured while the spatial layout and activity in the environment are preserved. This makes the skill well suited to security surveillance, smart building management, or any application where tracking movement and presence is necessary without capturing personally identifying information.

The skill offers flexible configuration for performance, display, and model selection, supporting CoreML on macOS for optimized Apple Neural Engine performance and PyTorch on Linux/Windows with CUDA or CPU backends. Users can deploy it to integrate depth perception into their systems while maintaining privacy compliance without sacrificing essential contextual data about the scene.
Best use case
The primary use case is to provide real-time privacy enhancement for video streams by generating depth maps that anonymize individuals while retaining spatial information about activity and object placement. This is particularly beneficial for security and surveillance applications, smart home systems, or any scenario where monitoring activity is required without infringing on personal privacy, helping professionals in these fields maintain compliance and ethical standards.
The user should expect a real-time stream or image output displaying a colorized depth map, an overlay of depth on the original feed, or a side-by-side comparison, depending on the chosen display mode and parameters, effectively transforming visual input into spatial data.
Practical example
Example input
Generate a real-time privacy-enhanced depth map from my camera feed using the 'base' model, displaying only the depth, and apply the 'viridis' colormap.
Example output
A live video stream displaying objects closer to the camera in warm colors (e.g., red/yellow) and objects further away in cool colors (e.g., blue/green), completely obscuring original visual details and personal identities.
When to use this skill
- Monitoring public or semi-public spaces while strictly maintaining individual privacy.
- Anonymizing video data for analysis or storage to remove identifiable features.
- Integrating depth perception into smart home or security systems for spatial awareness.
- Real-time applications where object presence and movement are key, but visual identities are irrelevant or sensitive.
When not to use this skill
- When high-fidelity visual detail or precise object recognition is critical for the task.
- When accurate 3D reconstruction from multiple viewpoints is required (as monocular depth is an estimation).
- In scenarios where capturing and identifying specific individuals is a necessary part of the objective.
- If the processing device lacks sufficient computational resources for real-time machine learning inference.
How depth-estimation Compares
| Feature / Agent | depth-estimation | Standard Approach |
|---|---|---|
| Platform Support | macOS (CoreML), Linux/Windows (PyTorch) | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Medium | N/A |
Frequently Asked Questions
What does this skill do?
Performs real-time monocular depth estimation using Depth Anything v2, transforming live camera feeds into privacy-preserving depth maps or overlays for various platforms.
How difficult is it to install?
The installation complexity is rated as medium; see the Setup section of the SKILL.md source below for installation instructions.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Depth Estimation (Privacy)
Real-time monocular depth estimation using Depth Anything v2. Transforms camera feeds into colorized depth maps — near objects appear warm, far objects appear cool.
When used for **privacy mode**, the `depth_only` blend mode fully anonymizes the scene while preserving spatial layout and activity, enabling security monitoring without revealing identities.
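The warm/cool colorization described above can be sketched in plain NumPy: normalize the raw depth values, then interpolate from blue (far) to red (near). This is an illustrative sketch, not the skill's actual rendering code, which likely uses a standard library colormap (e.g. viridis) instead.

```python
import numpy as np

def colorize_depth(depth: np.ndarray) -> np.ndarray:
    """Map a raw depth array to a BGR image: near -> warm (red), far -> cool (blue).

    Illustrative linear blue-to-red ramp; a real implementation would
    typically apply a library colormap such as viridis.
    """
    d = depth.astype(np.float32)
    # Normalize to [0, 1]; guard against a constant-depth frame.
    span = d.max() - d.min()
    norm = (d - d.min()) / span if span > 0 else np.zeros_like(d)
    near = 1.0 - norm  # assume larger depth values mean farther away
    h, w = d.shape
    bgr = np.zeros((h, w, 3), dtype=np.uint8)
    bgr[..., 2] = (near * 255).astype(np.uint8)          # red channel: near
    bgr[..., 0] = ((1.0 - near) * 255).astype(np.uint8)  # blue channel: far
    return bgr
```

Because the output contains only relative distance, no texture or identity survives the transform — which is exactly what makes the depth-only mode privacy-preserving.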
## Hardware Backends
| Platform | Backend | Runtime | Model |
|----------|---------|---------|-------|
| **macOS** | CoreML | Apple Neural Engine | `apple/coreml-depth-anything-v2-small` (.mlpackage) |
| Linux/Windows | PyTorch | CUDA / CPU | `depth-anything/Depth-Anything-V2-Small` (.pth) |
On macOS, CoreML runs on the Neural Engine, leaving the GPU free for other tasks. The model is auto-downloaded from HuggingFace and stored at `~/.aegis-ai/models/feature-extraction/`.
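The platform-based backend choice in the table above could look roughly like this (the function name and return shape are illustrative, not the skill's actual API; the model identifiers are the ones listed in the table):

```python
import platform

# Model identifiers from the backends table above.
COREML_MODEL = "apple/coreml-depth-anything-v2-small"
PYTORCH_MODEL = "depth-anything/Depth-Anything-V2-Small"

def pick_backend(system=None):
    """Choose a depth backend for the current OS.

    On macOS, prefer CoreML so inference runs on the Neural Engine;
    elsewhere fall back to PyTorch on CUDA or CPU.
    """
    system = system or platform.system()
    if system == "Darwin":
        return {"backend": "coreml", "model": COREML_MODEL, "device": "neural_engine"}
    return {"backend": "pytorch", "model": PYTORCH_MODEL, "device": "cuda_or_cpu"}
```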
## What You Get
- **Privacy anonymization** — depth-only mode hides all visual identity
- **Depth overlays** on live camera feeds
- **3D scene understanding** — spatial layout of the scene
- **CoreML acceleration** — Neural Engine on Apple Silicon (3-5x faster than MPS)
## Interface: TransformSkillBase
This skill implements the `TransformSkillBase` interface. Any new privacy skill can be created by subclassing `TransformSkillBase` and implementing two methods:
```python
from transform_base import TransformSkillBase

class MyPrivacySkill(TransformSkillBase):
    def load_model(self, config):
        # Load your model, return {"model": "...", "device": "..."}
        ...

    def transform_frame(self, image, metadata):
        # Transform BGR image, return BGR image
        ...
```
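To make the interface concrete, here is a toy (hypothetical) privacy skill that pixelates each frame; `TransformSkillBase` is stubbed inline so the sketch is self-contained, and the real base class from `transform_base` may carry additional behavior:

```python
import numpy as np

class TransformSkillBase:  # stub standing in for transform_base.TransformSkillBase
    def load_model(self, config):
        raise NotImplementedError

    def transform_frame(self, image, metadata):
        raise NotImplementedError

class PixelateSkill(TransformSkillBase):
    """Toy privacy transform: downsample then upsample to pixelate a BGR frame."""

    def load_model(self, config):
        # No real model needed; just record the block size from config.
        self.block = int(config.get("block_size", 16))
        return {"model": "pixelate", "device": "cpu"}

    def transform_frame(self, image, metadata):
        h, w = image.shape[:2]
        b = self.block
        # Shrink by striding, then repeat pixels back up to the original size.
        small = image[::b, ::b]
        out = np.repeat(np.repeat(small, b, axis=0), b, axis=1)
        return out[:h, :w]
```

A real subclass would load a model in `load_model` and run inference in `transform_frame`, but the contract — config in, metadata dict out, BGR frame in, BGR frame out — is the same.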
## Protocol
### Aegis → Skill (stdin)
```jsonl
{"event": "frame", "frame_id": "cam1_1710001", "camera_id": "front_door", "frame_path": "/tmp/frame.jpg", "timestamp": "..."}
{"command": "config-update", "config": {"opacity": 0.8, "blend_mode": "overlay"}}
{"command": "stop"}
```
### Skill → Aegis (stdout)
```jsonl
{"event": "ready", "model": "coreml-DepthAnythingV2SmallF16", "device": "neural_engine", "backend": "coreml"}
{"event": "transform", "frame_id": "cam1_1710001", "camera_id": "front_door", "transform_data": "<base64 JPEG>"}
{"event": "perf_stats", "total_frames": 50, "timings_ms": {"transform": {"avg": 12.5, ...}}}
```
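The stdin/stdout protocol above could be consumed with a dispatch loop along these lines — a sketch assuming each line is a standalone JSON object; frame decoding and the actual model call are omitted, and the `config-updated` acknowledgement shape is illustrative, not part of the documented protocol:

```python
import json
import sys

def handle_line(line, transform):
    """Dispatch one JSONL message; return a response dict, or None to stop."""
    msg = json.loads(line)
    if msg.get("command") == "stop":
        return None
    if msg.get("command") == "config-update":
        # Apply the new settings, then acknowledge (ack shape is illustrative).
        return {"event": "config-updated", "config": msg["config"]}
    if msg.get("event") == "frame":
        data = transform(msg["frame_path"])  # would return base64-encoded JPEG
        return {"event": "transform", "frame_id": msg["frame_id"],
                "camera_id": msg["camera_id"], "transform_data": data}
    return {"event": "error", "detail": "unknown message"}

def run(stdin=sys.stdin, stdout=sys.stdout, transform=lambda path: ""):
    """Read JSONL messages until a stop command arrives."""
    for line in stdin:
        resp = handle_line(line, transform)
        if resp is None:
            break
        stdout.write(json.dumps(resp) + "\n")
        stdout.flush()
```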
## Setup
```bash
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```
Related Skills
progressive-estimation
Estimate AI-assisted and hybrid human+agent development work with research-backed PERT statistics and calibration feedback loops
IterativeDepth
2-8 scientific lens passes to surface hidden requirements single-pass analysis misses. USE WHEN iterative depth, deep exploration, multi-angle analysis, multiple perspectives, examine from angles, surface hidden requirements.
performing-graphql-depth-limit-attack
Execute and test GraphQL depth limit attacks using deeply nested recursive queries to identify denial-of-service vulnerabilities in GraphQL APIs.
bio-tumor-fraction-estimation
Estimates circulating tumor DNA fraction from shallow whole-genome sequencing using ichorCNA. Detects copy number alterations via HMM segmentation and calculates ctDNA percentage. Requires 0.1-1x sWGS coverage. Use when quantifying tumor burden from liquid biopsy or monitoring treatment response.
task-estimation
Estimate software development tasks accurately using various techniques. Use when planning sprints, roadmaps, or project timelines. Handles story points, t-shirt sizing, planning poker, and estimation best practices.
defense-in-depth
Use when invalid data causes failures deep in execution, requiring validation at multiple system layers - validates at every layer data passes through to make bugs structurally impossible
liquidity-depth-analyzer
DEX liquidity analysis and slippage estimation for MEV trading. Use when implementing swaps, route selection, or position sizing. Triggers on: liquidity, slippage, price impact, depth, AMM math, Uniswap, Curve.
in-depth-research-guide
Structured methodology for conducting exhaustive multi-source investigations
Defense-in-Depth Validation
Validate at every layer data passes through to make bugs impossible
sec-context-depth
Comprehensive AI code security review using 27 sec-context anti-patterns. Use for code review when security vulnerabilities are suspected, especially for AI-generated code.
a-share-market-depth
A-share market depth / order book analysis. Triggered when the user mentions "market depth", "order book", "bid/ask depth", or "pending order volume". Fetches data via cn-stock-data and analyzes market depth and order book structure. Supports formal and brief output styles.