Claw Use Android — Phone Control for AI Agents

Give your AI agent eyes, hands, and a voice on a real Android phone.

3,891 stars

Best use case

Claw Use Android — Phone Control for AI Agents is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Give your AI agent eyes, hands, and a voice on a real Android phone.

Teams using Claw Use Android — Phone Control for AI Agents should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/claw-use-android/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/4ier/claw-use-android/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/claw-use-android/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How Claw Use Android — Phone Control for AI Agents Compares

Feature / AgentClaw Use Android — Phone Control for AI AgentsStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Give your AI agent eyes, hands, and a voice on a real Android phone.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Claw Use Android — Phone Control for AI Agents

Give your AI agent eyes, hands, and a voice on a real Android phone.

`claw-use-android` is an Android app + CLI (`cua`) that exposes HTTP endpoints for full phone control. No ADB, no root, no PC.

## Setup

```bash
# Install the APK on your Android phone, enable Accessibility Service
# Then register the device:
cua add redmi 192.168.0.105 <token>
cua ping
```

## New in v2.0.0: Unified API

Three new endpoints replace the scattered old endpoints for AI agent workflows:

### GET /screen — Semantic UI Tree
Returns elements with stable integer `ref` IDs, semantic `zone`, and `role` annotations.

```bash
cua screen              # full semantic UI tree (JSON)
cua screen -c           # compact: only interactive/text elements
```

Response:
```json
{
  "package": "com.android.settings",
  "elements": [
    {"ref": 1, "text": "设置", "zone": "header"},
    {"ref": 2, "text": "搜索", "zone": "header", "role": "button", "click": true},
    {"ref": 3, "text": "WLAN", "zone": "content"}
  ]
}
```

### GET /snapshot — JPEG Screenshot
Returns a base64-encoded JPEG screenshot.

```bash
cua snapshot              # save screenshot, print path
cua snapshot 50 720 out.jpg  # quality, maxWidth, output
```

### POST /act — Unified Action Endpoint
All operations through a single entry point, using `ref` IDs from `/screen`.

```bash
cua act '{"click": 3}'              # click ref 3
cua act '{"click": "OK"}'           # click by text (fallback)
cua act '{"click": [1, 2, 3]}'      # click refs in sequence
cua act '{"tap": {"x": 540, "y": 960}}'
cua act '{"type": "hello"}'          # type into focused field
cua act '{"type": {"ref": 3, "text": "hello"}}'  # focus ref then type
cua act '{"swipe": "up"}'            # directional swipe
cua act '{"scroll": "down"}'         # scroll nearest scrollable
cua act '{"back": true}'
cua act '{"home": true}'
cua act '{"recents": true}'
cua act '{"longpress": 3}'           # long press ref
cua act '{"launch": "com.duolingo"}'

# Multiple actions in one request:
cua act '{"home": true, "back": true}'
```

### Agent Workflow Pattern (screen → act loop)
```bash
# 1. Observe
cua screen -c          # get refs
# 2. Act
cua act '{"click": 5}' # click ref 5
# 3. Observe again
cua screen -c          # see result
```

### Flow-First Principle

**执行手机操作前,先读 `flows.md`(与本文件同目录)。**

- 如果有匹配的 flow → 直接用 `/flow` 或批量脚本执行,跳过逐步推理
- 如果 flow 中有 `{"screen":true}` 断点 → 在该步读屏后由 agent 决策,然后继续
- 如果没有匹配 flow → 走 screen→act 循环,完成后**沉淀新 flow 到 `flows.md`**
- 如果 flow 执行失败(超时、元素未找到等)→ **回退到 screen→act 循环**继续完成任务,事后修正 flows.md

**主动沉淀(必须执行):** 完成任何多步操作后,立即审视刚才的步骤序列。如果发现可复用的模式(哪怕只是部分步骤),当场追加到 `flows.md`。不要等用户提醒。沉淀是 agent 的责任,不是用户的。

这样做的好处:
1. **快**:`/flow` 在设备端 100ms 轮询执行,不经过 LLM
2. **省 token**:一个 flow 替代 5-10 轮 agent 推理
3. **可积累**:每次新场景都沉淀,agent 越用越快

## Legacy CLI Reference (`cua`)

All legacy endpoints remain supported alongside the new unified API.

### Device Management
```bash
cua add <name> <ip> <token>    # register device with alias
cua devices                     # list all (with live status)
cua use <name>                  # switch default device
cua rm <name>                   # remove device
cua -d <name> <command>         # target specific device
cua discover                    # scan LAN for devices (192.168.x.x:7333)
```

### Perception — read the phone
```bash
cua screen              # full UI tree (JSON)
cua screen -c           # compact: only interactive/text elements
cua screenshot          # save screenshot, print path
cua screenshot 50 720 out.jpg  # quality, maxWidth, output
cua notifications       # list all notifications
cua status              # health dashboard
cua info                # device model, screen size, permissions
```

### Action — control the phone
```bash
cua tap <x> <y>         # tap coordinates
cua click <text>        # tap element by visible text
cua longpress <x> <y>   # long press
cua swipe up|down|left|right
cua scroll up|down|left|right
cua type "text"         # type text (CJK supported)
cua back                # system back
cua home                # go home
cua launch <package>    # launch app
cua launch              # list all apps
cua open <url>          # open URL
cua call <number>       # phone call
cua intent '<json>'     # fire Android Intent
```

### Audio
```bash
cua tts "hello"         # speak through phone speaker
cua say "你好"          # alias
```

### Device I/O (v1.7.0+)
```bash
cua clipboard           # read clipboard
cua clipboard "text"    # write to clipboard
cua camera [front|back] [quality] [output.jpg]  # take photo
cua volume              # read all volumes
cua volume media 10     # set media volume
cua volume media up     # adjust volume
cua battery             # battery status
cua wifi                # WiFi info
cua location            # GPS/network location
cua vibrate [ms]        # vibrate (default 200ms)
cua contacts [search]   # list/search contacts
cua sms list [limit]    # read SMS
cua sms send <number> <message>  # send SMS
cua file list [path]    # list directory
cua file read <path>    # read file
cua file write <path> <content>  # write file
cua file delete <path>  # delete file
```

### Device State
```bash
cua wake                # wake screen
cua lock / cua unlock   # lock/unlock (PIN required)
cua config pin 123456   # remember lock screen PIN for auto-unlock
cua config pattern 256398  # EXPERIMENTAL: pattern unlock (not yet verified)
```

### Flow Engine — phone-side scripted automation
```bash
cua flow '{
  "steps": [
    {"wait": "继续安装", "then": "tap", "timeout": 10000},
    {"wait": "继续更新", "then": "tap", "timeout": 10000},
    {"wait": "完成",     "then": "tap", "timeout": 60000, "optional": true}
  ]
}'
```

Flow runs entirely on the phone with zero LLM calls. The device polls its accessibility tree at 100ms intervals and reacts instantly when the target element appears.

**Step fields:**
- `wait` — text to find (case-insensitive partial match)
- `waitId` — resource ID to find
- `waitDesc` — content description to find  
- `waitGone` — wait for text to DISAPPEAR
- `then` — action: `tap`, `click`, `longpress`, `back`, `home`, `none`
- `timeout` — per-step timeout in ms (default 10000)
- `optional` — if true, timeout doesn't fail the flow
- `pauseMs` — pause after action before next step (default 500)

### Click with Retry
```bash
# Atomic find-and-tap: retries until element appears
curl -X POST /click -d '{"text":"继续安装","retry":3,"retryMs":2000}'
```

---

## Device Onboarding (New Device Setup)

Complete recipe for adding a new Android device from zero to fully operational.

### Prerequisites (human must do once)
1. Install APK on the device (download from GitHub Releases or LAN HTTP)
2. Enable Accessibility Service: Settings → Accessibility → Claw Use → ON
3. Note the auth token from the app notification or main screen

### Step 1: Discover & Register
```bash
# Scan LAN for devices
cua discover

# Register with a friendly name
cua add <name> <ip> <token>

# Verify connectivity
cua -d <name> ping
cua -d <name> info
```

### Step 2: Configure Auto-Unlock
```bash
# PIN unlock (recommended — proven reliable via a11y button tapping)
cua -d <name> config pin <PIN>

# Verify: lock then unlock
cua -d <name> lock
sleep 3
cua -d <name> unlock
# Should show {"unlocked":true}
```

**Important**: Only PIN unlock is verified to work. Pattern unlock is experimental and unreliable — the accessibility gesture dispatch doesn't consistently hit the correct grid coordinates across different devices and screen sizes. If the device uses pattern lock, change it to PIN.

### Step 3: MIUI/HyperOS Permissions (automated)
```bash
cua -d <name> setup-perms
```

This automates granting all 9 app permissions on MIUI devices:
位置, 相机, 麦克风, 照片和视频, 音乐和音频, 短信, 电话, 联系人, 日历

The command navigates through Settings → Apps → Claw Use → Permissions and clicks through each permission grant dialog.

**If `setup-perms` fails** (common on tablets with dual-pane layout), grant manually:
1. Open Settings → Apps → Manage Apps → search "Claw Use"
2. Tap "App permissions" (应用权限)
3. Enable each permission: prefer "始终允许" > "仅在使用中允许" > "允许"

### Step 4: Background Survival (MIUI)
These settings prevent MIUI from killing the service:

```bash
# Navigate to app settings
cua -d <name> intent '{"action":"android.settings.APPLICATION_DETAILS_SETTINGS","uri":"package:com.clawuse.android"}'
```

Then via a11y or manually ensure:
- **自启动 (Autostart)**: ON
- **省电策略 (Battery saver)**: 无限制 (No restrictions)
- **通知 (Notifications)**: 允许 (Allow)
- **WLAN联网 (WiFi access)**: ON (if available)

### Step 5: Verify Everything
```bash
cua -d <name> status    # check a11y health, uptime, request count
cua -d <name> screen -c # verify a11y tree works
cua -d <name> screenshot 50 720 /tmp/verify.jpg  # verify screenshot

# Test auto-unlock end-to-end
cua -d <name> lock
sleep 3
cua -d <name> screen -c  # should auto-unlock then return tree
```

### Known Device-Specific Issues

**MIUI Tablets (Xiaomi Pad 5, etc.)**:
- Settings uses dual-pane layout — left panel items NOT visible in a11y tree
- Must navigate through full Settings → Apps path instead of direct Intent
- `APPLICATION_DETAILS_SETTINGS` intent opens app LIST, not specific app
- `setup-perms` may need manual fallback for tablet layout

**MIUI Phones (Redmi K60 Ultra, etc.)**:
- ICP 备案 dialog may appear during APK install — click "继续安装"
- "仍然下载" confirmation in Chrome for HTTP downloads
- Chrome downloads don't auto-open APK — go to Downloads → tap the file icon (left side)

**General Android**:
- Notification Listener requires manual enable: Settings → 通知 → 设备和应用通知 → Claw Use
- `takeScreenshot()` returns black image on lock screen (Android security)
- Lock screen a11y tree requires `flagRetrieveInteractiveWindows` (added in v1.6.2)

---

## Self-Update (OTA via LAN)

Update a device to a new APK version without ADB:

```bash
# Serve APK on LAN (from the machine with the APK)
cd /path/to/apk && python3 -m http.server 9090 &

# On the device, open browser to download
cua -d <name> intent '{"action":"android.intent.action.VIEW","uri":"http://<lan-ip>:9090/app.apk"}'

# Or via browser navigation for MIUI browser:
cua -d <name> click "浏览器"
cua -d <name> click "搜索或输入网址"
cua -d <name> type "http://<lan-ip>:9090/app.apk"
# ... then handle download + install prompts

# MIUI install flow (after APK opens in installer)
cua -d <name> flow '{
  "steps": [
    {"wait": "继续安装", "then": "tap", "timeout": 15000},
    {"wait": "已了解此应用未经安全检测", "then": "tap", "timeout": 10000, "optional": true},
    {"wait": "继续更新", "then": "tap", "timeout": 15000}
  ]
}'

# Verify new version after service restart (~30s)
sleep 30
cua -d <name> ping
```

**UpdateReceiver**: The app listens for `MY_PACKAGE_REPLACED` broadcast and auto-restarts the service after update. No manual intervention needed after install completes.

---

## Workflow Patterns

### Navigate and interact (v2.0+ recommended)
```bash
cua act '{"launch": "org.telegram.messenger"}'
cua screen -c
cua act '{"click": "Search Chats"}'
cua act '{"type": "John"}'
cua act '{"click": "John"}'
```

### Navigate and interact (legacy)
```bash
cua launch org.telegram.messenger
cua screen -c
cua click "Search Chats"
cua type "John"
cua click "John"
```

### Visual + semantic perception
```bash
cua screen -c                          # what elements exist (structured, with refs)
cua snapshot 50 720 /tmp/look.jpg      # what it looks like (visual)
```

**Prefer `screen -c` over `snapshot`** for decision-making. Structured a11y data is faster to process, has exact coordinates, and provides ref IDs for `/act`. Use snapshot only when visual context matters (images, colors, layout).

### Handle locked device
Automatic — any command auto-unlocks if PIN is configured. No special handling needed.

### MIUI APK Install (via /flow)
```bash
cua flow '{
  "steps": [
    {"wait": "继续安装", "then": "tap", "timeout": 15000},
    {"wait": "已了解此应用未经安全检测", "then": "tap", "timeout": 10000, "optional": true},
    {"wait": "继续更新", "then": "tap", "timeout": 10000}
  ]
}'
```

### Multi-device
```bash
cua add phone1 192.168.0.101 <token>
cua add tablet 192.168.0.102 <token>
cua -d phone1 say "hello from phone 1"
cua -d tablet screenshot
```

## Operational Lessons

### DO
- **Use `click` by text** instead of `tap` by coordinates whenever text is visible
- **Use `screen -c`** as the primary perception tool — compact filters noise
- **Use `/flow`** for multi-step mechanical sequences — saves tokens, 100x faster than LLM-per-step
- **Use `intent` deep links** for app navigation (e.g., `https://t.me/c/{id}/{topic}/{msg}`)
- **Use PIN unlock** — proven 100% reliable via a11y button tapping

### DON'T
- **Don't use screenshot coordinates for tapping** — `screenshot?maxWidth=720` is scaled, `screen` bounds are actual pixels
- **Don't try pattern unlock** — coordinates vary by device/OS, no reliable way to locate the grid
- **Don't rely on `tap` when `click` can work** — text-based is resolution-independent
- **Don't manually navigate app UIs when deep links exist** — error-prone and slow
- **Don't rapid-fire requests** — allow 0.5-1s between actions for UI to settle

## Architecture

```
┌─────────────────────────────────────────────┐
│              Android Device                  │
│                                              │
│  :http process          main process         │
│  ┌──────────────┐      ┌──────────────────┐ │
│  │ BridgeService│ HTTP │ AccessibilityBridge│ │
│  │ NanoHTTPD    │─────→│ A11yInternalServer│ │
│  │ 0.0.0.0:7333│proxy │ 127.0.0.1:7334   │ │
│  └──────────────┘      └──────────────────┘ │
│    ↑ auth+CORS           ↑ a11y service      │
│    ↑ auto-unlock         ↑ gesture dispatch  │
│    ↑ config/status       ↑ tree traversal    │
└────────────────────────────────────────────── ┘
         ↑ HTTP
    ┌────────────┐
    │  Agent/CLI │  cua commands / curl
    └────────────┘
```

## Family

| Platform | Package | CLI | Status |
|----------|---------|-----|--------|
| Android | claw-use-android | `cua` | ✅ Available |
| iOS | claw-use-ios | `cui` | 🔮 Planned |
| Windows | claw-use-windows | `cuw` | 🔮 Planned |
| Linux | claw-use-linux | `cul` | 🔮 Planned |
| macOS | claw-use-mac | `cum` | 🔮 Planned |

Related Skills

openclaw-youtube

3891
from openclaw/skills

YouTube SERP Scout for agents. Search top-ranking videos, channels, and trends for content research and competitor tracking.

Content & Documentation

openclaw-search

3891
from openclaw/skills

Intelligent search for agents. Multi-source retrieval with confidence scoring - web, academic, and Tavily in one unified API.

Data & Research

openclaw-media-gen

3891
from openclaw/skills

Generate images & videos with AIsa. Gemini 3 Pro Image (image) + Qwen Wan 2.6 (video) via one API key.

Content & Documentation

Pest Control Operations Agent

3891
from openclaw/skills

You are an expert pest control business operations advisor. Help operators with licensing, EPA/FIFRA compliance, pricing, route optimization, seasonal planning, technician management, and growth strategy.

Business Management

OpenClaw Mastery — The Complete Agent Engineering & Operations System

3891
from openclaw/skills

> Built by AfrexAI — the team that runs 9+ production agents 24/7 on OpenClaw.

DevOps & Infrastructure

HIPAA Compliance for AI Agents

3891
from openclaw/skills

Generate HIPAA compliance checklists, risk assessments, and audit frameworks for healthcare organizations deploying AI agents.

Security

Export Compliance & Trade Controls

3891
from openclaw/skills

Analyze products, destinations, and end-users against US export control regulations (EAR, ITAR, OFAC sanctions). Generate classification recommendations, license requirements, and compliance checklists.

Regulatory Compliance

clawrouter

3891
from openclaw/skills

Smart LLM router — save 67% on inference costs. Routes every request to the cheapest capable model across 41 models from OpenAI, Anthropic, Google, DeepSeek, and xAI.

AI Optimization & Cost Savings

openclaw-safe-change-flow

3891
from openclaw/skills

Safe OpenClaw config change workflow with backup, minimal edits, validation, health checks, and rollback. Single-instance first; secondary instance optional.

DevOps & Infrastructure

jqopenclaw-node-invoker

3891
from openclaw/skills

统一通过 Gateway 的 node.invoke 调用 JQOpenClawNode 能力(file.read、file.write、process.exec、process.manage、system.run、process.which、system.info、system.screenshot、system.notify、system.clipboard、system.input、node.selfUpdate)。当用户需要远程文件读写、文件移动/删除、目录创建/删除、进程管理(列表/搜索/终止)、远程进程执行、命令可执行性探测、系统信息采集、截图采集、系统弹窗、系统剪贴板读写、输入控制(鼠标/键盘)、节点自更新、节点命令可用性排查或修复 node.invoke 参数错误时使用。

DevOps & Infrastructure

alphaclaw

3891
from openclaw/skills

AlphaClaw 是 SkillHub 技能商店的 CLI 工具,用于搜索、安装、发布和管理 Claude Code 技能。支持 AK/SK 登录、关键词搜索技能、一键安装/发布技能包、收藏和评论等完整功能。

ecovacs-robot-control

3891
from openclaw/skills

Control Ecovacs/DEEBOT robot vacuums via the Ecovacs IoT API. Use when the user wants to control a robot vacuum, check battery, start/stop/pause cleaning, return to dock, check clean status, set suction/water level, manage schedules, check consumables, or control auto-empty station. Covers all mainstream Ecovacs protocols including clean_V2, charge, getBattery, getCleanInfo_V2, getStats, getSpeed/setSpeed, getWaterInfo/setWaterInfo, getWorkMode/setWorkMode, getLifeSpan, getAutoEmpty/setAutoEmpty, getCachedMapInfo, getMapSet, getSched_V2/setSched_V2.

Smart Home & IoT