adb-screen-detection
Screen understanding with OCR and template matching for Android device automation
Best use case
adb-screen-detection is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using adb-screen-detection should expect more consistent output, faster repeated execution, less prompt rewriting, and better workflow continuity with their supporting tools.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
- You already have the supporting tools or dependencies needed by this skill.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/adb-screen-detection/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How adb-screen-detection Compares
| Feature / Agent | adb-screen-detection | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It provides OCR-based text detection and template matching so that automation can understand what is on an Android device screen and verify screen state before and after actions.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
---
## Quick Reference (30 seconds)
**Screen Understanding for Android Automation**
**What It Does**: Provides OCR-based text detection and template matching to understand Android device screens. Enables reliable UI automation by verifying screen state before and after actions.
**Core Capabilities**:
- 📸 **Screen Capture**: ADB screencap with local storage
- 🔍 **OCR Detection**: Tesseract-based text extraction
- 🎯 **Template Matching**: OpenCV-based element detection
- 👆 **Coordinate Tapping**: ADB input tap with verification
**When to Use**:
- You need to verify UI state before taking actions
- You need to find UI elements by text or appearance
- You are building reliable automation workflows
- Your next step depends on what is currently on screen
---
## Scripts
### 1. adb-screen-capture.py
Capture Android device screen and save locally.
```bash
# Basic usage
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py
# Specify device
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py --device 127.0.0.1:5555
# Custom output path
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py --output /tmp/screen.png
# JSON output
uv run .claude/skills/adb-screen-detection/scripts/adb-screen-capture.py --json
```
**Output**:
```json
{
"device": "127.0.0.1:5555",
"timestamp": "2025-12-01T10:30:45Z",
"local_path": "/tmp/screenshot.png",
"size": [1080, 2400],
"success": true
}
```
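The capture step can be sketched in Python as follows. This is a minimal illustration, assuming the script shells out to `adb exec-out screencap -p` (the source does not show its internals); the function names `build_screencap_cmd` and `capture_screen` are hypothetical, and the `size` field is omitted because reading it would need an image library.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def build_screencap_cmd(device=None):
    """Return the adb argv that streams a PNG screenshot to stdout."""
    cmd = ["adb"]
    if device:
        cmd += ["-s", device]  # target a specific serial or host:port
    return cmd + ["exec-out", "screencap", "-p"]


def capture_screen(output="/tmp/screenshot.png", device=None, timeout=10):
    """Capture the screen and return a dict shaped like the JSON output above.

    Hypothetical helper: raises FileNotFoundError if adb is not installed
    and subprocess.TimeoutExpired if the device does not answer in time.
    """
    proc = subprocess.run(
        build_screencap_cmd(device), capture_output=True, timeout=timeout
    )
    # A valid capture starts with the PNG magic bytes.
    ok = proc.returncode == 0 and proc.stdout.startswith(b"\x89PNG")
    if ok:
        Path(output).write_bytes(proc.stdout)
    return {
        "device": device,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "local_path": output if ok else None,
        "success": ok,
    }
```

Using `exec-out` rather than `shell` keeps the PNG bytes unmodified on their way to the host, which is why the sketch can write `stdout` straight to disk.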
---
### 2. adb-ocr-extract.py
Extract all visible text from device screen using Tesseract OCR.
```bash
# Basic usage (uses most recent screenshot)
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py
# Specify screenshot path
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py --image /tmp/screen.png
# Search for specific text
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py --search "Login"
# JSON output with coordinates
uv run .claude/skills/adb-screen-detection/scripts/adb-ocr-extract.py --json
```
**Output**:
```json
{
"text": ["Login", "Username", "Password", "Submit"],
"detected": true,
"search_found": true,
"search_term": "Login",
"coordinates": {
"Login": [[100, 200, 150, 230]]
}
}
```
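To illustrate how word-level coordinates like those above could be produced, here is a hedged sketch built on `pytesseract.image_to_data`. The helper names are hypothetical, and treating each box as `[left, top, right, bottom]` is an assumption about the output format, not something the source states.

```python
from collections import defaultdict


def group_word_boxes(data, min_conf=0.0):
    """Group OCR words into {word: [[left, top, right, bottom], ...]}.

    `data` is a dict in the shape returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    """
    boxes = defaultdict(list)
    rows = zip(
        data["text"], data["left"], data["top"],
        data["width"], data["height"], data["conf"],
    )
    for text, left, top, width, height, conf in rows:
        word = text.strip()
        # Skip empty entries and low-confidence words (conf is -1 for non-words).
        if not word or float(conf) < min_conf:
            continue
        boxes[word].append([left, top, left + width, top + height])
    return dict(boxes)


def ocr_image(path):
    """Run Tesseract on an image file and return grouped word boxes."""
    # Imported lazily so the pure helper above works without OCR installed.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(path), output_type=pytesseract.Output.DICT
    )
    return group_word_boxes(data)
```

Keeping the grouping logic separate from the Tesseract call makes it testable with canned data, without a device or an OCR install.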
---
### 3. adb-find-element.py
Find UI element by template matching or OCR text search.
```bash
# Find by OCR text
uv run .claude/skills/adb-screen-detection/scripts/adb-find-element.py \
--method ocr \
--target "Login Button" \
--threshold 0.8
# Find by template image
uv run .claude/skills/adb-screen-detection/scripts/adb-find-element.py \
--method template \
--template /path/to/template.png \
--threshold 0.8
# JSON output
uv run .claude/skills/adb-screen-detection/scripts/adb-find-element.py \
--method ocr \
--target "Login" \
--json
```
**Output**:
```json
{
"found": true,
"method": "ocr",
"target": "Login",
"coordinates": {
"x": 100,
"y": 200,
"width": 150,
"height": 30
},
"confidence": 0.95,
"message": "Element found at (100, 200)"
}
```
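The template method can be sketched with OpenCV's `cv2.matchTemplate`. This is an assumption about how such a script might work, not the skill's actual code: the function names are hypothetical, and `TM_CCOEFF_NORMED` is one reasonable scoring mode among several.

```python
def build_match_result(score, top_left, template_size, threshold=0.8):
    """Turn a best-match score and location into a find-element style dict."""
    x, y = top_left
    width, height = template_size
    found = score >= threshold
    return {
        "found": found,
        "method": "template",
        "coordinates": (
            {"x": x, "y": y, "width": width, "height": height} if found else None
        ),
        "confidence": round(float(score), 4),
    }


def find_by_template(screen_path, template_path, threshold=0.8):
    """Locate the template in the screenshot using normalized correlation."""
    import cv2  # imported lazily; requires opencv-python

    screen = cv2.imread(screen_path, cv2.IMREAD_GRAYSCALE)
    template = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
    if screen is None or template is None:
        raise FileNotFoundError("could not read screen or template image")
    scores = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
    # minMaxLoc returns (min_val, max_val, min_loc, max_loc); the best
    # match for TM_CCOEFF_NORMED is the maximum.
    _, best_score, _, best_loc = cv2.minMaxLoc(scores)
    height, width = template.shape[:2]
    return build_match_result(best_score, best_loc, (width, height), threshold)
```

In this sketch the reported `x` and `y` are the top-left corner of the match. A caller that wants to tap the element would likely add half the width and height to reach its center.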
---
### 4. adb-tap-coordinate.py
Tap device screen at specific coordinates.
```bash
# Tap at coordinates
uv run .claude/skills/adb-screen-detection/scripts/adb-tap-coordinate.py \
--x 100 \
--y 200 \
--device 127.0.0.1:5555
# Tap with verification (check screen after tap)
uv run .claude/skills/adb-screen-detection/scripts/adb-tap-coordinate.py \
--x 100 \
--y 200 \
--verify-text "Next Screen" \
--timeout 5
# JSON output
uv run .claude/skills/adb-screen-detection/scripts/adb-tap-coordinate.py \
--x 100 \
--y 200 \
--json
```
**Output**:
```json
{
"device": "127.0.0.1:5555",
"tap": {
"x": 100,
"y": 200
},
"success": true,
"verified": true,
"verify_text": "Next Screen",
"verification_match": true
}
```
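Tap-then-verify comes down to sending `adb shell input tap` and polling the screen until the expected text appears or the timeout expires. The sketch below shows that loop under stated assumptions: `build_tap_cmd` and `wait_for_text` are hypothetical names, and `read_screen_text` stands in for whatever capture-plus-OCR step supplies the visible text.

```python
import time


def build_tap_cmd(x, y, device=None):
    """Return the adb argv for a single tap at pixel (x, y)."""
    cmd = ["adb"]
    if device:
        cmd += ["-s", device]
    return cmd + ["shell", "input", "tap", str(int(x)), str(int(y))]


def wait_for_text(read_screen_text, expected, timeout=5.0, interval=0.5):
    """Poll until `expected` appears on screen or `timeout` seconds pass.

    `read_screen_text` is a caller-supplied callable that returns the
    strings currently visible (for example, the result of an OCR pass).
    """
    deadline = time.monotonic() + timeout
    while True:
        if any(expected in line for line in read_screen_text()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Polling instead of checking once tolerates screen transitions and animations, which is the usual reason a post-tap check needs a timeout at all.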
---
## Usage Patterns
### Pattern 1: Verify Screen State Before Action
```bash
# 1. Capture current screen
adb-screen-capture.py
# 2. Check for expected element
adb-find-element.py --method ocr --target "Login Button"
# 3. If found, tap it
adb-tap-coordinate.py --x 100 --y 200 --verify-text "Welcome"
```
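Step 3 of Pattern 1 depends on the result of step 2. One way to wire them together is to parse the `--json` output of `adb-find-element.py` and build the tap command only when the element was found. The helper below is a hypothetical sketch that assumes the JSON shape shown in the Scripts section.

```python
import json

# Script directory as used in the commands throughout this document.
SCRIPTS = ".claude/skills/adb-screen-detection/scripts"


def tap_command_from_find(find_json, verify_text=None):
    """Build the tap argv from find-element JSON, or None if nothing matched."""
    result = json.loads(find_json)
    if not result.get("found"):
        return None  # screen is not in the expected state; do not tap
    coords = result["coordinates"]
    cmd = [
        "uv", "run", f"{SCRIPTS}/adb-tap-coordinate.py",
        "--x", str(coords["x"]),
        "--y", str(coords["y"]),
    ]
    if verify_text:
        cmd += ["--verify-text", verify_text]
    return cmd
```

Returning `None` on a miss makes the "if found" branch explicit, so a workflow can retry the capture or abort instead of tapping stale coordinates.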
### Pattern 2: OCR-Based Automation
```bash
# 1. Capture screen
adb-screen-capture.py
# 2. Extract all text
adb-ocr-extract.py --search "Settings"
# 3. Get coordinates and tap
adb-find-element.py --method ocr --target "Settings"
adb-tap-coordinate.py --x 150 --y 300
```
### Pattern 3: Template-Based Element Detection
```bash
# 1. Have known UI template images in ./templates/
# 2. Capture screen
adb-screen-capture.py
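# 3. Match against templates and keep the JSON result
match=$(adb-find-element.py --method template --template ./templates/button.png --json)
# 4. Tap matched location, reading coordinates from the saved JSON
adb-tap-coordinate.py \
  --x "$(jq -r '.coordinates.x' <<<"$match")" \
  --y "$(jq -r '.coordinates.y' <<<"$match")"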
```
---
## Architecture
**Design Principles**:
- **Independent**: Each script can run standalone
- **Chainable**: Scripts output JSON for piping
- **Stateless**: No dependencies between executions
- **Verifiable**: Always verify screen state before proceeding
- **Timeout Protected**: All network operations have timeouts
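The "Chainable" and "Timeout Protected" principles can be illustrated with a small wrapper that runs any of the scripts, enforces a timeout, and always hands back a dict. This is a sketch under assumptions: `run_json` is a hypothetical helper, and the error-dict shape is invented for illustration rather than taken from the scripts.

```python
import json
import subprocess


def run_json(cmd, timeout=10.0):
    """Run a command that prints JSON and return the parsed result.

    Failures (timeout, non-zero exit, bad JSON) are reported as a dict
    with "success": False instead of raising, so callers can chain steps.
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return {"success": False, "error": f"timed out after {timeout}s"}
    if proc.returncode != 0:
        return {"success": False, "error": proc.stderr.strip()}
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError:
        return {"success": False, "error": "output was not valid JSON"}
```

A caller might pass `["uv", "run", ".claude/skills/adb-screen-detection/scripts/adb-screen-capture.py", "--json"]` as `cmd` and check the `success` field before moving to the next step.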
**Dependency Relationship**:
```
adb-screen-capture.py (foundation)
↓
adb-ocr-extract.py (uses capture)
adb-find-element.py (uses capture or templates)
↓
adb-tap-coordinate.py (uses find-element for verification)
```
---
## Integration Points
**Used By**:
- `adb-navigation-base` - Wait for elements between actions
- `adb-magisk` - Verify Magisk UI state
- `adb-karrot` - Verify app state during automation
- `adb-workflow-orchestrator` - Screen verification in workflows
**Dependencies**:
- System: `adb` command-line tool
- Python: pytesseract, opencv-python, pillow, numpy
---
## Troubleshooting
### OCR Not Working
- Install Tesseract: `brew install tesseract` (macOS) or `apt-get install tesseract-ocr` (Linux)
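- If Tesseract cannot find its language data, set TESSDATA_PREFIX to your tessdata directory: `export TESSDATA_PREFIX=/usr/local/share/tessdata` (the path varies by install; Homebrew on Apple Silicon typically uses `/opt/homebrew/share/tessdata`)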
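The two checks above can be automated before running OCR. The following is a minimal sketch with a hypothetical helper, `diagnose_tesseract`. Its `which` and `isdir` parameters exist only so the check can be exercised without touching the real system.

```python
import os
import shutil


def diagnose_tesseract(env=None, which=shutil.which, isdir=os.path.isdir):
    """Return a list of human-readable problems with the Tesseract setup."""
    env = os.environ if env is None else env
    problems = []
    if which("tesseract") is None:
        problems.append("tesseract binary not found on PATH")
    prefix = env.get("TESSDATA_PREFIX")
    # An unset prefix is fine; Tesseract then falls back to its default.
    if prefix and not isdir(prefix):
        problems.append(f"TESSDATA_PREFIX points to a missing directory: {prefix}")
    return problems
```

An empty list means neither of the two common misconfigurations was detected; it does not prove that the required language files are installed.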
### Template Matching Too Strict/Loose
- Adjust `--threshold` parameter (0.0-1.0)
- Higher threshold = stricter matching
- Recommended: 0.8-0.9 for reliable detection
### Device Offline
- Check ADB connection: `adb devices`
- Reconnect: `adb connect <device>`
- Restart ADB: `adb kill-server && adb start-server`
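The `adb devices` check can be scripted by parsing its tab-separated output, where each line holds a serial and a state such as `device`, `offline`, or `unauthorized`. The helper names below are hypothetical; the sketch is meant only to show how a workflow could decide whether a reconnect is needed.

```python
def parse_adb_devices(output):
    """Parse `adb devices` output into {serial: state}."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and "* daemon ..." status lines.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def needs_reconnect(devices, serial):
    """True unless the serial is listed and in the ready 'device' state."""
    return devices.get(serial) != "device"
```

A caller would feed it the captured stdout of `adb devices` and run `adb connect <device>` when `needs_reconnect` returns `True`.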
---
## Workflows
This skill includes TOON-based workflow definitions for automation.
### What is TOON?
TOON (Task-Oriented Orchestration Notation) is a structured workflow definition language that pairs with Markdown documentation. Each workflow consists of:
- **[name].toon** - Orchestration logic and execution steps
- **[name].md** - Complete documentation and usage guide
This TOON+MD pairing approach is inspired by the BMAD METHOD pattern, adapted to use TOON instead of YAML for better orchestration support.
### Available Workflows
Workflow files are located in the `workflow/` directory:
**Example Workflows (adb-screen-detection):**
- `workflow/screen-verification.toon` - Capture and verify screen state
- `workflow/element-detection.toon` - Find elements via OCR or template matching
- `workflow/screen-monitoring.toon` - Continuous screen monitoring and analysis
### Running a Workflow
Execute any workflow using the ADB workflow orchestrator:
```bash
uv run .claude/skills/adb-workflow-orchestrator/scripts/adb-run-workflow.py \
--workflow .claude/skills/adb-screen-detection/workflow/screen-verification.toon \
--param device="127.0.0.1:5555"
```
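When launching workflows from Python rather than a shell, the same command can be assembled as an argument list. This sketch uses only the `--workflow` and `--param` flags shown above. The helper name is hypothetical, and repeating `--param` once per key is an assumption about the orchestrator's CLI that the source does not confirm.

```python
ORCHESTRATOR = ".claude/skills/adb-workflow-orchestrator/scripts/adb-run-workflow.py"


def build_workflow_cmd(workflow, params=None):
    """Return the argv for running a .toon workflow with key=value params."""
    cmd = ["uv", "run", ORCHESTRATOR, "--workflow", workflow]
    for key, value in (params or {}).items():
        # Assumption: one --param flag per key=value pair.
        cmd += ["--param", f"{key}={value}"]
    return cmd
```

Passing a list to `subprocess.run` avoids shell quoting issues, so the value is written as `device=127.0.0.1:5555` without the extra quotes needed on the command line.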
### Workflow Documentation
Each workflow includes comprehensive documentation in the corresponding `.md` file:
- Purpose and use case
- Prerequisites and requirements
- Available parameters
- Execution phases and steps
- Success criteria
- Error handling and recovery
- Example commands
See the `workflow/` directory for complete TOON file definitions and documentation.
### Creating New Workflows
To create custom workflows for this skill:
1. Create a new `.toon` file in the `workflow/` directory
2. Define phases, steps, and parameters using TOON v4.0 syntax
3. Create a corresponding `.md` file with comprehensive documentation
4. Test with the workflow orchestrator
For more information, refer to the TOON specification and the workflow orchestrator documentation.
---
**Version**: 1.0.0
**Status**: ✅ Foundation Tier
**Scripts**: 4 (all MCP-ready)
**Last Updated**: 2025-12-01
**Tier**: 2 (Foundation)