photo-content-recognition-curation-expert

Expert in photo content recognition, intelligent curation, and quality filtering. Specializes in face/animal/place recognition, perceptual hashing for de-duplication, screenshot/meme detection, burst photo selection, and quick indexing strategies. Activate on 'face recognition', 'face clustering', 'perceptual hash', 'near-duplicate', 'burst photo', 'screenshot detection', 'photo curation', 'photo indexing', 'NSFW detection', 'pet recognition', 'DINOHash', 'HDBSCAN faces'. NOT for GPS-based location clustering (use event-detection-temporal-intelligence-expert), color palette extraction (use color-theory-palette-harmony-expert), semantic image-text matching (use clip-aware-embeddings), or video analysis/frame extraction.

by curiositech

Best use case

photo-content-recognition-curation-expert is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using photo-content-recognition-curation-expert should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/photo-content-recognition-curation-expert/SKILL.md --create-dirs "https://raw.githubusercontent.com/curiositech/some_claude_skills/main/.claude/skills/photo-content-recognition-curation-expert/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/photo-content-recognition-curation-expert/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How photo-content-recognition-curation-expert Compares

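| Feature / Agent | photo-content-recognition-curation-expert | Standard Approach |
|-----------------|-------------------------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |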

Frequently Asked Questions

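What does this skill do?

It covers photo content recognition, intelligent curation, and quality filtering: face, animal, and place recognition, perceptual hashing for de-duplication, screenshot and meme detection, burst photo selection, and quick indexing strategies.
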
Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Photo Content Recognition & Curation Expert

Expert in photo content analysis and intelligent curation. Combines classical computer vision with modern deep learning for comprehensive photo analysis.

## When to Use This Skill

✅ **Use for:**
- Face recognition and clustering (identifying important people)
- Animal/pet detection and clustering
- Near-duplicate detection using perceptual hashing (DINOHash, pHash, dHash)
- Burst photo selection (finding best frame from 10-50 shots)
- Screenshot vs photo classification
- Meme/download filtering
- NSFW content detection
- Quick indexing for large photo libraries (10K+)
- Aesthetic quality scoring (NIMA)

❌ **NOT for:**
- GPS-based location clustering → `event-detection-temporal-intelligence-expert`
- Color palette extraction → `color-theory-palette-harmony-expert`
- Semantic image-text matching → `clip-aware-embeddings`
- Video analysis or frame extraction

## Quick Decision Tree

```
What do you need to recognize/filter?
│
├─ Duplicate photos? ─────────────────────────────── Perceptual Hashing
│   ├─ Exact duplicates? ──────────────────────────── dHash (fastest)
│   ├─ Brightness/contrast changes? ───────────────── pHash (DCT-based)
│   ├─ Heavy crops/compression? ───────────────────── DINOHash (2025 SOTA)
│   └─ Production system? ─────────────────────────── Hybrid (pHash → DINOHash)
│
├─ People in photos? ─────────────────────────────── Face Clustering
│   ├─ Known thresholds? ──────────────────────────── Apple-style Agglomerative
│   └─ Unknown data distribution? ─────────────────── HDBSCAN
│
├─ Pets/Animals? ─────────────────────────────────── Pet Recognition
│   ├─ Detection? ─────────────────────────────────── YOLOv8
│   └─ Individual clustering? ─────────────────────── CLIP + HDBSCAN
│
├─ Best from burst? ──────────────────────────────── Burst Selection
│   └─ Score: sharpness + face quality + aesthetics
│
└─ Filter junk? ──────────────────────────────────── Content Detection
    ├─ Screenshots? ───────────────────────────────── Multi-signal classifier
    └─ NSFW? ──────────────────────────────────────── Safety classifier
```

---

## Core Concepts

### 1. Perceptual Hashing for Near-Duplicate Detection

**Problem:** Camera bursts, re-saved images, and minor edits create near-duplicates.

**Solution:** Perceptual hashes generate similar values for visually similar images.

**Method Comparison:**

| Method | Speed | Robustness | Best For |
|--------|-------|------------|----------|
| dHash | Fastest | Low | Exact duplicates |
| pHash | Fast | Medium | Brightness/contrast changes |
| DINOHash | Slower | High | Heavy crops, compression |
| Hybrid | Medium | Very High | Production systems |

**Hybrid Pipeline (2025 Best Practice):**
1. **Stage 1:** Fast pHash filtering (eliminates obvious non-duplicates)
2. **Stage 2:** DINOHash refinement (accurate detection)
3. **Stage 3:** Optional Siamese ViT verification

**Hamming Distance Thresholds:**
- Conservative: ≤5 bits different = duplicates
- Aggressive: ≤10 bits different = duplicates

→ **Deep dive**: `references/perceptual-hashing.md`
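
To make the Hamming-distance thresholds above concrete, here is a minimal pure-Python sketch of a difference hash (dHash). It assumes the image has already been converted to grayscale and downscaled to an 8x9 grid; the function names `dhash`, `hamming`, and `is_near_duplicate` are illustrative and not part of this skill's API.

```python
from typing import Sequence


def dhash(gray: Sequence[Sequence[float]]) -> int:
    """Difference hash of a small grayscale grid.

    `gray` is assumed to be already downscaled (8 rows x 9 columns gives
    64 bits). Each bit records whether a pixel is brighter than its
    right-hand neighbour.
    """
    value = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            value = (value << 1) | int(left > right)
    return value


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, max_bits: int = 5) -> bool:
    """Apply the conservative (5) or aggressive (10) bit threshold."""
    return hamming(a, b) <= max_bits


# Toy 8x9 grid standing in for a downscaled photo.
original = [[(r * 5 + c * 3) % 11 for c in range(9)] for r in range(8)]
# A uniformly brightened copy keeps every left/right ordering intact.
brighter = [[v + 40 for v in row] for row in original]

print(hamming(dhash(original), dhash(brighter)))
print(is_near_duplicate(dhash(original), dhash(brighter)))
```

Adding a constant to every pixel preserves each left/right comparison, so the two hashes are identical. Crops and geometric edits change the grid itself, which is where the more robust methods in the table come in.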

---

### 2. Face Recognition & Clustering

**Goal:** Group photos by person without user labeling.

**Apple Photos-Style Strategy (2021-2025):**
1. Extract face + upper body embeddings (e.g., FaceNet, 512-dim)
2. Two-pass agglomerative clustering
3. Conservative first pass (threshold=0.4, high precision)
4. HAC second pass (threshold=0.6, increase recall)
5. Incremental updates for new photos

**HDBSCAN Alternative:**
- No threshold tuning required
- Robust to noise
- Better for unknown data distributions

**Parameters:**

| Setting | Agglomerative | HDBSCAN |
|---------|---------------|---------|
| Pass 1 threshold | 0.4 (cosine distance) | - |
| Pass 2 threshold | 0.6 (cosine distance) | - |
| Min cluster size | - | 3 photos |
| Metric | cosine | cosine |

→ **Deep dive**: `references/face-clustering.md`
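
The two-pass idea can be sketched in pure Python as below. This is a simplified illustration, not Apple's algorithm and not this skill's implementation: it assumes the thresholds are cosine distances, uses a greedy centroid pass followed by centroid merging, and all function names are hypothetical. The 2-D unit vectors stand in for real face embeddings.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def two_pass_cluster(embeddings: Sequence[Vector],
                     tight: float = 0.4,
                     loose: float = 0.6) -> List[List[int]]:
    """Return clusters as lists of indices into `embeddings`."""
    def center(members: List[int]) -> List[float]:
        return centroid([embeddings[i] for i in members])

    # Pass 1 (high precision): join the first cluster within `tight`,
    # otherwise start a new cluster.
    clusters: List[List[int]] = []
    for index, embedding in enumerate(embeddings):
        for members in clusters:
            if cosine_distance(embedding, center(members)) <= tight:
                members.append(index)
                break
        else:
            clusters.append([index])

    # Pass 2 (higher recall): merge cluster pairs whose centroids are within
    # `loose`, restarting after each merge until no pair qualifies.
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if cosine_distance(center(clusters[a]), center(clusters[b])) <= loose:
                    clusters[a].extend(clusters.pop(b))
                    merged = True
                    break
            if merged:
                break
    return clusters


def unit(degrees: float) -> List[float]:
    """2-D unit vector, a toy stand-in for a 512-dim face embedding."""
    radians = math.radians(degrees)
    return [math.cos(radians), math.sin(radians)]


faces = [unit(0), unit(10), unit(60), unit(180)]
print(two_pass_cluster(faces))             # both passes
print(two_pass_cluster(faces, loose=0.4))  # second pass no looser than the first
```

In the toy data, the face at 60 degrees is too far from the first cluster for the tight pass and is only absorbed by the looser second pass, which is the precision-then-recall behaviour the list describes.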

---

### 3. Burst Photo Selection

**Problem:** Burst mode creates 10-50 nearly identical photos.

**Multi-Criteria Scoring:**

| Criterion | Weight | Measurement |
|-----------|--------|-------------|
| Sharpness | 30% | Laplacian variance |
| Face Quality | 35% | Eyes open, smiling, face sharpness |
| Aesthetics | 20% | NIMA score |
| Position | 10% | Middle frames bonus |
| Exposure | 5% | Histogram clipping check |

**Burst Detection:** Photos within 0.5 seconds of each other.

→ **Deep dive**: `references/content-detection.md`
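
A minimal sketch of burst grouping and weighted scoring follows. It assumes each criterion has already been normalized to the range 0 to 1, and it reads "within 0.5 seconds" as the gap between consecutive shots; both are assumptions, and the helper names are hypothetical.

```python
from typing import Dict, List

# Weights from the multi-criteria table.
WEIGHTS = {
    "sharpness": 0.30,
    "face_quality": 0.35,
    "aesthetics": 0.20,
    "position": 0.10,
    "exposure": 0.05,
}


def group_bursts(timestamps: List[float], max_gap: float = 0.5) -> List[List[int]]:
    """Group photo indices whose capture time is within `max_gap` seconds
    of the previous shot (timestamps are seconds, in any order)."""
    order = sorted(range(len(timestamps)), key=timestamps.__getitem__)
    groups: List[List[int]] = []
    for idx in order:
        if groups and timestamps[idx] - timestamps[groups[-1][-1]] <= max_gap:
            groups[-1].append(idx)
        else:
            groups.append([idx])
    return groups


def burst_score(scores: Dict[str, float]) -> float:
    """Weighted sum of per-criterion scores, each assumed to lie in [0, 1]."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def pick_best(frames: List[Dict[str, float]]) -> int:
    """Index of the highest-scoring frame in a burst."""
    return max(range(len(frames)), key=lambda i: burst_score(frames[i]))


frames = [
    {"sharpness": 0.5, "face_quality": 0.5, "aesthetics": 0.5, "position": 0.5, "exposure": 0.5},
    {"sharpness": 0.9, "face_quality": 0.2, "aesthetics": 0.5, "position": 0.5, "exposure": 0.5},
    {"sharpness": 0.6, "face_quality": 0.9, "aesthetics": 0.5, "position": 0.5, "exposure": 0.5},
]
print(group_bursts([0.0, 0.3, 0.6, 5.0, 5.2]))
print(pick_best(frames))
```

Because face quality carries the largest weight, the third frame wins even though the second is sharper.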

---

### 4. Screenshot Detection

**Multi-Signal Approach:**

| Signal | Confidence | Description |
|--------|------------|-------------|
| UI elements | 0.85 | Status bars, buttons detected |
| Perfect rectangles | 0.75 | >5 UI buttons (90° angles) |
| High text | 0.70 | >25% text coverage (OCR) |
| No camera EXIF | 0.60 | Missing Make/Model/Lens |
| Device aspect | 0.60 | Exact phone screen ratio |
| Perfect sharpness | 0.50 | >2000 Laplacian variance |

**Decision:** Confidence >0.6 = screenshot

→ **Deep dive**: `references/content-detection.md`
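
The table does not say how individual signal confidences are combined into one score. The sketch below assumes a noisy-OR combination (each fired signal independently "misses" with probability 1 minus its confidence); that rule, the signal keys, and the function names are assumptions for illustration only.

```python
from typing import Iterable

# Per-signal confidences from the table above.
SIGNAL_CONFIDENCE = {
    "ui_elements": 0.85,
    "perfect_rectangles": 0.75,
    "high_text": 0.70,
    "no_camera_exif": 0.60,
    "device_aspect": 0.60,
    "perfect_sharpness": 0.50,
}


def screenshot_confidence(fired: Iterable[str]) -> float:
    """Noisy-OR of the fired signals (an assumed combination rule)."""
    miss = 1.0
    for name in fired:
        miss *= 1.0 - SIGNAL_CONFIDENCE[name]
    return 1.0 - miss


def is_screenshot(fired: Iterable[str], threshold: float = 0.6) -> bool:
    """Apply the 'confidence > 0.6' decision rule."""
    return screenshot_confidence(fired) > threshold


print(is_screenshot(["perfect_sharpness"]))
print(is_screenshot(["no_camera_exif", "device_aspect"]))
```

Under this rule a single weak signal stays below the threshold, while two weak signals that agree push the score above it.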

---

### 5. Quick Indexing Pipeline

**Goal:** Index 10K+ photos efficiently with caching.

**Features Extracted:**
- Perceptual hashes (de-duplication)
- Face embeddings (people clustering)
- CLIP embeddings (semantic search)
- Color palettes
- Aesthetic scores

**Performance (10K photos, M1 MacBook Pro):**

| Operation | Time |
|-----------|------|
| Perceptual hashing | 2 min |
| CLIP embeddings | 3 min (GPU) |
| Face detection | 4 min |
| Color palettes | 1 min |
| Aesthetic scoring | 2 min (GPU) |
| Clustering + dedup | 1 min |
| **Total (first run)** | **~13 min** |
| **Incremental** | **<1 min** |

→ **Deep dive**: `references/photo-indexing.md`
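
One way the incremental run can stay fast is to recompute features only for files that are new or have changed. The sketch below assumes a cache keyed by file path with a (size, modification time) signature; the cache layout and the `extract` callback are hypothetical, not this skill's actual storage format.

```python
import os
from typing import Callable, Dict, Sequence, Tuple

Signature = Tuple[int, int]  # (size in bytes, mtime in nanoseconds)


def file_signature(path: str) -> Signature:
    """Cheap change detector: file size plus modification time."""
    stat = os.stat(path)
    return (stat.st_size, stat.st_mtime_ns)


def update_index(paths: Sequence[str],
                 cache: Dict[str, dict],
                 extract: Callable[[str], dict]) -> int:
    """Refresh `cache` in place and return how many files were (re)processed.

    `extract` is a caller-supplied function that computes the expensive
    features (hashes, embeddings, scores) for one file.
    """
    processed = 0
    for path in paths:
        signature = file_signature(path)
        entry = cache.get(path)
        if entry is None or entry["signature"] != signature:
            cache[path] = {"signature": signature, "features": extract(path)}
            processed += 1
    # Drop entries for files that are no longer part of the library.
    for stale in set(cache) - set(paths):
        del cache[stale]
    return processed
```

A second call with an unchanged library processes nothing. A content hash would be a stricter signature than size and mtime, at the cost of reading every file.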

---

## Common Anti-Patterns

### Anti-Pattern: Euclidean Distance for Face Embeddings

**What it looks like:**
```python
distance = np.linalg.norm(embedding1 - embedding2)  # WRONG
```

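**Why it's wrong:** The clustering thresholds in this skill are cosine distances. Euclidean distance is on a different scale (for unit-length embeddings, d² = 2 − 2·cos θ), and on unnormalized embeddings it also varies with vector magnitude, so cosine thresholds do not carry over.

**What to do instead:**
```python
from scipy.spatial.distance import cosine
distance = cosine(embedding1, embedding2)  # Cosine distance = 1 - cosine similarity
```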

### Anti-Pattern: Fixed Clustering Thresholds

**What it looks like:** Using same distance threshold for all face clusters.

**Why it's wrong:** Different people have varying intra-class variance (twins vs. diverse ages).

**What to do instead:** Use HDBSCAN for automatic threshold discovery, or two-pass clustering with conservative + relaxed passes.

### Anti-Pattern: Raw Pixel Comparison for Duplicates

**What it looks like:**
```python
is_duplicate = np.allclose(img1, img2)  # WRONG
```

**Why it's wrong:** Re-saved JPEGs, crops, brightness changes create pixel differences.

**What to do instead:** Perceptual hashing (pHash or DINOHash) with Hamming distance.

### Anti-Pattern: Sequential Face Detection

**What it looks like:** Processing faces one photo at a time without batching.

**Why it's wrong:** GPU underutilization, 10x slower than batched.

**What to do instead:** Batch process images (batch_size=32) with GPU acceleration.
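
A minimal batching helper might look like the following sketch. The file names are made up, and the detector call is left as a comment because it depends on the model in use.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int = 32) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


paths = [f"IMG_{i:04d}.jpg" for i in range(70)]
for batch in batched(paths):
    # A real pipeline would load these images and run one detector
    # forward pass on the whole batch here.
    print(len(batch))
```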

### Anti-Pattern: No Confidence Filtering

**What it looks like:**
```python
for face in all_detected_faces:
    cluster(face)  # No filtering
```

**Why it's wrong:** Low-confidence detections create noise clusters (hands, objects).

**What to do instead:** Filter by confidence (threshold 0.9 for faces).
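
As a sketch, the filter can sit between detection and clustering. The detection layout used here (a dict with `confidence` and `box` keys) is a hypothetical example, not the output format of any particular detector.

```python
from typing import Dict, List


def confident_faces(detections: List[Dict], min_confidence: float = 0.9) -> List[Dict]:
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d["confidence"] >= min_confidence]


all_detected_faces = [
    {"box": (10, 10, 80, 80), "confidence": 0.99},
    {"box": (200, 40, 30, 30), "confidence": 0.55},  # likely a hand or object
    {"box": (120, 15, 75, 75), "confidence": 0.93},
]
faces_to_cluster = confident_faces(all_detected_faces)
print(len(faces_to_cluster))
```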

### Anti-Pattern: Forcing Every Photo into Clusters

**What it looks like:** Assigning noise points to nearest cluster.

**Why it's wrong:** Solo appearances shouldn't pollute person clusters.

**What to do instead:** HDBSCAN/DBSCAN naturally identifies noise (label=-1). Keep noise separate.
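
Given the label array a density-based clusterer returns, noise can be kept in its own bucket as in this sketch. The label values are made up, and `split_clusters` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_clusters(labels: Sequence[int]) -> Tuple[Dict[int, List[int]], List[int]]:
    """Separate clustered face indices from noise points (label -1)."""
    clusters: Dict[int, List[int]] = defaultdict(list)
    noise: List[int] = []
    for index, label in enumerate(labels):
        if label == -1:
            noise.append(index)
        else:
            clusters[label].append(index)
    return dict(clusters), noise


people, unassigned = split_clusters([0, 0, -1, 1, -1, 1])
print(people)
print(unassigned)
```

The unassigned faces stay available for later passes (for example, when more photos of the same person arrive) without diluting existing person clusters.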

---

## Quick Start

```python
from photo_curation import PhotoCurationPipeline

pipeline = PhotoCurationPipeline()

# Index photo library
index = pipeline.index_library('/path/to/photos')

# De-duplicate
duplicates = index.find_duplicates()
print(f"Found {len(duplicates)} duplicate groups")

# Cluster faces
face_clusters = index.cluster_faces()
print(f"Found {len(face_clusters)} people")

# Select best from bursts
best_photos = pipeline.select_best_from_bursts(index)

# Filter screenshots
real_photos = pipeline.filter_screenshots(index)

# Curate for collage
collage_photos = pipeline.curate_for_collage(index, target_count=100)
```

---

## Python Dependencies

```
torch transformers facenet-pytorch ultralytics hdbscan opencv-python scipy numpy scikit-learn pillow pytesseract
```

---

## Integration Points

- **event-detection-temporal-intelligence-expert**: Provides temporal event clustering for event-aware curation
- **color-theory-palette-harmony-expert**: Extracts color palettes for visual diversity
- **collage-layout-expert**: Receives curated photos for assembly
- **clip-aware-embeddings**: Provides CLIP embeddings for semantic search and DeepDBSCAN

---

## References

1. **DINOHash (2025)**: "Adversarially Fine-Tuned DINOv2 Features for Perceptual Hashing"
2. **Apple Photos (2021)**: "Recognizing People in Photos Through Private On-Device ML"
3. **HDBSCAN**: "Hierarchical Density-Based Spatial Clustering" (2013-2025)
4. **Perceptual Hashing**: dHash (Neal Krawetz), DCT-based pHash

---

**Version**: 2.0.0
**Last Updated**: November 2025
