phoneagent

Control a connected iPhone, iOS simulator, Android emulator, or Android device from macOS through PhoneAgent's JSON-RPC bridge. Use when users ask to automate mobile UI actions, inspect accessibility trees, toggle Settings switches, navigate apps, or capture screenshots by sending RPC methods like get_tree, get_screen_image, get_context, tap_element, enter_text, scroll, swipe, and open_app.

Best use case

phoneagent is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using phoneagent should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/phoneagent/SKILL.md --create-dirs "https://raw.githubusercontent.com/rounak/PhoneAgent/main/.agents/skills/phoneagent/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/phoneagent/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How phoneagent Compares

| Feature / Agent | phoneagent | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

SKILL.md Source

# PhoneAgent

Use this workflow to drive iOS or Android UI through PhoneAgent's JSON-RPC bridge.

All shell commands below assume you are in the repo root:

```bash
cd "$(git rev-parse --show-toplevel)"
```

## Start the RPC bridge

1. Choose a platform bridge (both listen on `127.0.0.1:45678` by default).

```bash
# iOS (XCTest-hosted bridge)
./.agents/skills/phoneagent/scripts/start_rpc_bridge_local.sh

# Android (adb bridge; emulator or physical device)
./.agents/skills/phoneagent/scripts/start_android_rpc_bridge_local.sh
```

Notes:
- `start_rpc_bridge_local.sh` is interactive and will show a numbered list of iOS devices/simulators.
  Enter the number for the destination you want.
- `start_rpc_bridge_local.sh` starts a localhost-only forwarder.
- When the device is connected through Xcode's "Connect via network", the script uses the CoreDevice tunnel automatically (no extra deps).
- For USB fallback forwarding, install `pymobiledevice3` into a local venv:
  `python3 -m venv .venv && ./.venv/bin/python -m pip install -U pip && ./.venv/bin/python -m pip install pymobiledevice3`
- `start_android_rpc_bridge_local.sh` uses `adb`; if multiple devices are connected it prompts for the serial.

2. Keep the bridge process running.
3. Wait for `PHONEAGENT_RPC_READY ...` in logs before sending RPC calls.
4. Confirm socket readiness before first RPC:

```bash
./.agents/skills/phoneagent/scripts/rpc.py get-tree >/dev/null && echo rpc-ready
```
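If a lower-level check is useful before invoking `rpc.py`, a minimal sketch of polling the endpoint could look like the following. It assumes the bridge accepts plain TCP connections on `127.0.0.1:45678`; the function name `wait_for_port` and its timeout values are hypothetical and not part of PhoneAgent.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 45678,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: the defaults mirror the endpoint named in this
    document, while the timeout and interval values are arbitrary.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only shows that something is listening;
            # it does not prove the bridge is ready to answer RPC calls.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

A successful connection does not replace the `PHONEAGENT_RPC_READY` log line or the `get-tree` check above; it only rules out the case where nothing is listening yet.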

## Resolve host and port

1. Always use `127.0.0.1:45678` as the RPC endpoint (or `rpc.py --port <port>` if customized).

Notes:
- Both bridges are localhost-only.
- iOS physical-device flow uses a localhost forwarder.
- If you need to forward manually, first get a device UDID via `xcrun devicectl list devices`, then run:
  `python3 ./.agents/skills/phoneagent/scripts/forward_rpc_localhost.py --udid <UDID>` (binds `127.0.0.1:45678`)

## Send RPC calls

Use the helper CLI:

```bash
# iOS bundle identifier
./.agents/skills/phoneagent/scripts/rpc.py open-app com.apple.Preferences

# Android package name
./.agents/skills/phoneagent/scripts/rpc.py open-app com.android.settings
./.agents/skills/phoneagent/scripts/rpc.py get-tree | head

# Use coordinates copied from the tree (XCUI frame string).
./.agents/skills/phoneagent/scripts/rpc.py enter-text \
  --coordinate '{{33.0, 861.0}, {364.0, 38.0}}' \
  --text 'Display'

./.agents/skills/phoneagent/scripts/rpc.py tap-element \
  --coordinate '{{37.7, 969.7}, {199.7, 29.0}}'
```

## Core operating loop

1. Call `get_tree`.
2. Identify the best target element in the tree (label/identifier) and copy its frame coordinate string.
3. Prefer coordinate-based actions (`tap_element` / `enter_text`).
4. Use the returned `tree` from the action response to verify the UI changed as expected.
5. Repeat until complete.
6. When the task is complete, always capture a screenshot for the user:
   - Prefer `get_context` and write `result.screenshot_base64` to a PNG (or use `./.agents/skills/phoneagent/scripts/rpc.py get-screen-image`, which writes PNG files to `/tmp/phoneagent-artifacts`).
   - Include the PNG path in your final message so the user can open it.

Use `swipe` to reveal off-screen content, then use the returned `tree` (or call `get_tree` if needed).
Use one request at a time per server. Do not fire concurrent batches.
Split long keyboard input into chunks; do not send giant `enter_text` payloads in one call.
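
The chunking advice above might be sketched as follows. The helper name `chunk_text` and the default of 100 characters are assumptions; this document does not state a size limit. Because `enter_text` types a Return after the text, each chunk sent this way would also be followed by a Return, which may or may not suit the target field.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 100) -> List[str]:
    """Split text into consecutive pieces of at most max_chars characters.

    Hypothetical helper: max_chars=100 is an arbitrary default, not a
    limit documented by PhoneAgent.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

Each returned piece would then go into its own `enter_text` call, one request at a time.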

## RPC method reference

All RPC requests are newline-delimited JSON objects with this shape:

```json
{"id":1,"method":"<method>","params":{...}}
```

All success responses look like:

```json
{"id":1,"result":{...}}
```
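
For callers that prefer not to shell out to `rpc.py`, the framing above can be sketched in Python. This is a minimal illustration, assuming the bridge speaks newline-delimited JSON over a plain TCP socket on `127.0.0.1:45678`. The names `encode_request`, `decode_response`, and `call_rpc` are hypothetical. Only the success shape is documented here, so any other response is raised as-is rather than interpreted.

```python
import json
import socket
from typing import Any, Dict, Optional


def encode_request(request_id: int, method: str,
                   params: Optional[Dict[str, Any]] = None) -> bytes:
    """Serialize one request as a single JSON line terminated by a newline."""
    payload = {"id": request_id, "method": method, "params": params or {}}
    return (json.dumps(payload, separators=(",", ":")) + "\n").encode("utf-8")


def decode_response(line: bytes, expected_id: int) -> Dict[str, Any]:
    """Parse one response line and return its "result" object."""
    message = json.loads(line.decode("utf-8"))
    if message.get("id") != expected_id:
        raise ValueError(f"unexpected response id: {message.get('id')!r}")
    if "result" not in message:
        # Only the success shape is documented, so anything else is
        # surfaced verbatim instead of being interpreted.
        raise RuntimeError(f"response has no result: {message!r}")
    return message["result"]


def call_rpc(method: str, params: Optional[Dict[str, Any]] = None,
             host: str = "127.0.0.1", port: int = 45678,
             request_id: int = 1, timeout: float = 60.0) -> Dict[str, Any]:
    """Send one request over a fresh TCP connection and read one reply line.

    Assumption: the bridge accepts raw TCP and answers with one JSON line.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(encode_request(request_id, method, params))
        with conn.makefile("rb") as reader:
            line = reader.readline()
    if not line:
        raise ConnectionError("bridge closed the connection without replying")
    return decode_response(line, request_id)
```

With a bridge running, `call_rpc("get_tree")["tree"]` would return the tree string under these assumptions. Issuing one call at a time matches the one-request-per-server guidance.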

### `get_tree`

- Does: Returns the accessibility tree of the currently focused app.
- Params: none.
- Returns: `{"tree": "<string>"}`

Example:
```json
{"id":1,"method":"get_tree","params":{}}
```

### `get_screen_image`

- Does: Captures the current screen as a base64-encoded PNG plus image dimensions (when available).
- Params: none.
- Returns: `{"screenshot_base64":"<base64>","metadata":{"width":<number>,"height":<number>}}`

Example:
```json
{"id":2,"method":"get_screen_image","params":{}}
```

### `get_context`

- Does: Convenience method that returns both the current accessibility tree and the current screen image.
- Params: none.
- Returns: `{"tree":"<string>","screenshot_base64":"<base64>","metadata":{"width":<number>,"height":<number>}}`

Example:
```json
{"id":3,"method":"get_context","params":{}}
```
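
Writing the returned image to disk, as the operating loop asks for at the end of a task, could be sketched like this. The helper name `save_screenshot` and the output path are hypothetical; the only field used is the documented `screenshot_base64`.

```python
import base64
from pathlib import Path
from typing import Any, Dict


def save_screenshot(result: Dict[str, Any], path: str) -> Path:
    """Decode result["screenshot_base64"] and write the bytes to path.

    `result` is the "result" object of a get_context or get_screen_image
    response. Parent directories are created if they do not exist.
    """
    png_bytes = base64.b64decode(result["screenshot_base64"])
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(png_bytes)
    return target
```

The returned path is what would be included in the final message to the user.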

### `open_app`

- Does: Brings the specified app to the foreground (and makes it the focused app for subsequent calls).
- Params: `bundle_identifier` (string, required).
  - iOS: pass bundle identifier (example `com.apple.Preferences`).
  - Android: pass package name (example `com.android.settings`).
- Returns: `{"bundle_identifier":"<string>", "tree":"<string>"}` (Android also includes `package_name`).

Example:
```json
{"id":4,"method":"open_app","params":{"bundle_identifier":"com.apple.Preferences"}}
```

### `tap`

- Does: Taps an absolute point in the current app.
- Params: `x` (number, required), `y` (number, required). Coordinates are in absolute screen points as reported by the tree.
- Returns: `{"tree":"<string>"}`

Example:
```json
{"id":5,"method":"tap","params":{"x":120,"y":300}}
```

### `tap_element`

- Does: Taps the *center* of an element using its XCUI frame string from the accessibility tree.
- Params:
  - `coordinate` (string, required). Must look like `{{x, y}, {w, h}}` (copied from the tree).
  - `count` (integer, optional; default 1). Use 2 for double-tap.
  - `longPress` (boolean, optional; default false). When true, performs a long-press gesture.
- Returns: `{"coordinate":"<string>", "count":<number>, "longPress":<bool>, "tree":"<string>"}`

Example:
```json
{"id":6,"method":"tap_element","params":{"coordinate":"{{20.0, 165.0}, {390.0, 90.0}}","count":1,"longPress":false}}
```
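
To make the "center of the frame" idea concrete, here is a small sketch that parses a frame string and computes its midpoint. It assumes the center is `x + w/2, y + h/2`; the bridge's exact arithmetic is not documented here. The name `frame_center` is hypothetical.

```python
import re
from typing import Tuple

# Matches strings of the form "{{x, y}, {w, h}}" with integer or decimal numbers.
_NUM = r"(-?\d+(?:\.\d+)?)"
_FRAME_RE = re.compile(
    r"^\{\{\s*" + _NUM + r",\s*" + _NUM + r"\},\s*\{\s*" + _NUM + r",\s*" + _NUM + r"\}\}$"
)


def frame_center(coordinate: str) -> Tuple[float, float]:
    """Return the (x, y) midpoint of a "{{x, y}, {w, h}}" frame string.

    Assumption: the midpoint is origin plus half the size on each axis.
    """
    match = _FRAME_RE.match(coordinate.strip())
    if match is None:
        raise ValueError(f"not a frame string: {coordinate!r}")
    x, y, w, h = (float(group) for group in match.groups())
    return (x + w / 2.0, y + h / 2.0)


print(frame_center("{{20.0, 165.0}, {390.0, 90.0}}"))  # prints (215.0, 210.0)
```

A point computed this way could also be passed to `tap` as `x` and `y`, though `tap_element` avoids the extra step.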

### `enter_text`

- Does: Taps the center of the target element (to focus it), waits briefly for the keyboard, then types the provided text followed by a newline (Return).
- Params:
  - `coordinate` (string, required). Must look like `{{x, y}, {w, h}}` (copied from the tree).
  - `text` (string, required).
- Returns: `{"coordinate":"<string>", "tree":"<string>"}`

Example:
```json
{"id":7,"method":"enter_text","params":{"coordinate":"{{33.0, 861.0}, {364.0, 38.0}}","text":"hello"}}
```

### `scroll`

- Does: Scrolls by dragging from a starting point by the provided deltas.
- Params: `x` (number, required), `y` (number, required), `distanceX` (number, required), `distanceY` (number, required).
- Returns: `{"tree":"<string>"}`

Example:
```json
{"id":8,"method":"scroll","params":{"x":215,"y":760,"distanceX":0,"distanceY":-460}}
```

### `swipe`

- Does: Swipes in a direction starting from a given point (implemented as a bounded drag gesture).
- Params: `x` (number, required), `y` (number, required), `direction` (string, required; one of `up`, `down`, `left`, `right`).
- Returns: `{"tree":"<string>"}`

Example:
```json
{"id":9,"method":"swipe","params":{"x":215,"y":760,"direction":"up"}}
```
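
Since `direction` accepts only four values, a caller might validate it before sending. The sketch below builds the request object for `swipe`; the name `build_swipe_request` is hypothetical, and the check happens on the client side only.

```python
from typing import Any, Dict

# The four directions listed for the swipe method.
SWIPE_DIRECTIONS = ("up", "down", "left", "right")


def build_swipe_request(request_id: int, x: float, y: float,
                        direction: str) -> Dict[str, Any]:
    """Return a swipe request dict, rejecting unknown directions early."""
    if direction not in SWIPE_DIRECTIONS:
        raise ValueError(
            f"direction must be one of {SWIPE_DIRECTIONS}, got {direction!r}"
        )
    return {
        "id": request_id,
        "method": "swipe",
        "params": {"x": x, "y": y, "direction": direction},
    }
```

Serializing the returned dict as one JSON line gives the same request as the example above.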

### `stop`

- Does: Stops the RPC server test (ends the `xcodebuild test` session).
- Params: none.
- Returns: `{}`

Example:
```json
{"id":10,"method":"stop","params":{}}
```

## iOS app bundle IDs

- Settings: `com.apple.Preferences`
- Camera: `com.apple.camera`
- Photos: `com.apple.mobileslideshow`
- Messages: `com.apple.MobileSMS`
- Home Screen: `com.apple.springboard`

## Android package names

- Settings: `com.android.settings`
- Camera (AOSP): `com.android.camera2`
- Photos (Google): `com.google.android.apps.photos`
- Messages (Google): `com.google.android.apps.messaging`
- Home Screen: launcher package varies by emulator/device

## Recovery playbook

1. If RPC hangs after `open_app`, restart the test-hosted server and retry with a known-good bundle id.
2. If taps fail due to a stale UI, call `get_tree` again and recalculate the target.
3. If iOS bridge becomes unresponsive, stop/restart `xcodebuild test` and resume from latest verified app state.
4. If Android bridge becomes unresponsive, restart `adb` (`adb kill-server && adb start-server`), relaunch the bridge, and retry.
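
Recalculating a target after a fresh `get_tree` (step 2) might be sketched as a label lookup over the tree string. This assumes each element sits on its own line containing both its label and its `{{x, y}, {w, h}}` frame. That layout is an assumption about the tree format and is not specified in this document. The name `find_frame_for_label` is hypothetical.

```python
import re
from typing import Optional

# Finds a "{{x, y}, {w, h}}" frame string anywhere inside a line.
_FRAME_IN_LINE = re.compile(r"\{\{[^{}]*\},\s*\{[^{}]*\}\}")


def find_frame_for_label(tree: str, label: str) -> Optional[str]:
    """Return the frame string from the first tree line that mentions label.

    Assumption: one element per line, with label and frame on that line.
    Returns None when no line contains both the label and a frame.
    """
    for line in tree.splitlines():
        if label in line:
            match = _FRAME_IN_LINE.search(line)
            if match:
                return match.group(0)
    return None
```

The returned string can be passed unchanged as `coordinate` to `tap_element`. A `None` result suggests the element is off-screen or the label changed, in which case a `swipe` followed by another lookup may help.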

## End session

1. Send `stop` only when the task is complete.
2. If `stop` is not sent, terminate the `xcodebuild` session manually.
