android-use

Control Android devices via ADB commands - tap, swipe, type, navigate apps

16 stars

Best use case

android-use is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Control Android devices via ADB commands - tap, swipe, type, navigate apps

Teams using android-use should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/android-use/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/development/android-use/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/android-use/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How android-use Compares

Feature / Agentandroid-useStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Control Android devices via ADB commands - tap, swipe, type, navigate apps

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Android Device Control Skill

This skill enables you to control Android devices connected via ADB (Android Debug Bridge). You act as both the reasoning and execution engine - reading the device's UI state directly and deciding what actions to take.

## Prerequisites

- Android device connected via USB with USB debugging enabled
- ADB installed and accessible in PATH
- Device authorized for debugging (accepted the "Allow USB debugging?" prompt)

## Multi-Device Support

All scripts support the `-s <serial>` flag to target a specific device. This is essential when multiple devices are connected (e.g., a physical phone AND an emulator).

### Identifying Devices

Run `scripts/check-device.sh` to see all connected devices:

```
Multiple devices connected (2):

  [PHYSICAL] 1A051FDF6007PA - Pixel 6
  [EMULATOR] emulator-5554 - sdk_gphone64_arm64

Use -s <serial> to specify which device to use.
```

### Choosing the Right Device

When the user mentions:
- **"phone"**, **"my phone"**, **"physical device"** → Use the `[PHYSICAL]` device
- **"emulator"**, **"virtual device"**, **"AVD"** → Use the `[EMULATOR]` device
- If unclear, **ask the user** which device they want to target

### Using the Serial Flag

Once you identify the target device, pass `-s <serial>` to ALL subsequent scripts:

```bash
# Check specific device
scripts/check-device.sh -s 1A051FDF6007PA

# All actions on that device
scripts/get-screen.sh -s 1A051FDF6007PA
scripts/tap.sh -s 1A051FDF6007PA 540 960
scripts/launch-app.sh -s 1A051FDF6007PA chrome
```

**Important:** Be consistent - use the same serial for all commands in a session.

## Core Workflow

When given a task, follow this perception-action loop:

1. **Check device connection** - Run `scripts/check-device.sh` first
   - If multiple devices: identify target based on user intent or ask
   - Note the serial number for subsequent commands
2. **Get current screen state** - Run `scripts/get-screen.sh [-s serial]` to dump UI hierarchy
3. **Analyze the XML** - Read the accessibility tree to understand what's on screen
4. **Decide next action** - Based on goal + current state, choose an action
5. **Execute action** - Run the appropriate script with `-s serial` if needed
6. **Wait briefly** - Allow UI to update (typically 500ms-1s)
7. **Repeat** - Go back to step 2 until goal is achieved

## Reading UI XML

The `get-screen.sh` script outputs Android's accessibility XML. Key attributes to look for:

```xml
<node index="0" text="Settings" resource-id="com.android.settings:id/title"
      class="android.widget.TextView" content-desc=""
      bounds="[42,234][1038,345]" clickable="true" />
```

**Important attributes:**
- `text` - Visible text on the element
- `content-desc` - Accessibility description (useful for icons)
- `resource-id` - Unique identifier for the element
- `bounds` - Screen coordinates as `[left,top][right,bottom]`
- `clickable` - Whether element responds to taps
- `scrollable` - Whether element can be scrolled
- `focused` - Whether element has input focus

**Calculating tap coordinates:**
From `bounds="[left,top][right,bottom]"`, calculate center:
- x = (left + right) / 2
- y = (top + bottom) / 2

Example: `bounds="[42,234][1038,345]"` → tap at x=540, y=289

## Available Scripts

All scripts are in the `scripts/` directory. Run them via bash.

**All scripts support `-s <serial>` to target a specific device.**

### Device Management
| Script | Args | Description |
|--------|------|-------------|
| `check-device.sh` | `[-s serial]` | List devices / verify connection |
| `wake.sh` | `[-s serial]` | Wake device and dismiss lock screen |
| `screenshot.sh` | `[-s serial]` | Capture screen image |

### Screen Reading
| Script | Args | Description |
|--------|------|-------------|
| `get-screen.sh` | `[-s serial]` | Dump UI accessibility tree |

### Input Actions
| Script | Args | Description |
|--------|------|-------------|
| `tap.sh` | `[-s serial] x y` | Tap at coordinates |
| `type-text.sh` | `[-s serial] "text"` | Type text string |
| `swipe.sh` | `[-s serial] direction` | Swipe up/down/left/right |
| `key.sh` | `[-s serial] keyname` | Press key (home/back/enter/recent) |

### App Management
| Script | Args | Description |
|--------|------|-------------|
| `launch-app.sh` | `[-s serial] package_or_name` | Launch app by package or search by name |
| `install-apk.sh` | `[-s serial] path/to/file.apk` | Install APK to device |

## Action Guidelines

### When to tap
- Target clickable elements
- Always calculate center from bounds
- Prefer elements with `clickable="true"`

### When to type
- After tapping a text input field
- The field should have `focused="true"` or `class="android.widget.EditText"`
- Clear existing text first if needed (select all + delete)

### When to swipe
- To scroll lists or pages
- To navigate between screens (e.g., swipe left/right for tabs)
- Directions: `up` (scroll down), `down` (scroll up), `left`, `right`

### When to use keys
- `home` - Return to home screen
- `back` - Go back / close dialogs
- `enter` - Submit forms / confirm
- `recent` - Open recent apps

### When to take screenshots
- For visual debugging when XML doesn't capture enough info
- To verify visual state (colors, images, etc.)
- When the task requires visual confirmation

### When to wake the device
- Before starting any task (device may have gone to sleep)
- If `get-screen.sh` returns empty or minimal XML
- If actions don't seem to be working (screen may be off)
- Note: Won't bypass PIN/pattern/password - user must unlock manually

## Common Patterns

### Opening an app
```bash
# By package name (fastest)
scripts/launch-app.sh com.android.chrome

# By app name (searches installed apps)
scripts/launch-app.sh "Chrome"
```

### Tapping a button
1. Get screen: `scripts/get-screen.sh`
2. Find element with matching text/content-desc
3. Calculate center from bounds
4. Tap: `scripts/tap.sh 540 289`

### Entering text in a field
1. Tap the text field to focus it
2. Wait for keyboard
3. Type: `scripts/type-text.sh "your text here"`
4. Press enter if needed: `scripts/key.sh enter`

### Scrolling to find content
1. Get screen to check if target is visible
2. If not found, swipe: `scripts/swipe.sh up`
3. Get screen again, repeat until found or reached end

### Handling dialogs/popups
- Look for elements with text like "OK", "Allow", "Accept", "Cancel"
- Tap the appropriate button
- Or press back to dismiss: `scripts/key.sh back`

## Error Handling

### No device connected
- Check USB connection
- Verify USB debugging is enabled
- Run `adb devices` manually to troubleshoot

### Element not found
- The UI may have changed - get fresh screen dump
- Try scrolling to find the element
- Element might be in a different screen/state

### Action didn't work
- Wait longer between actions (UI might be slow)
- Verify coordinates are correct
- Check if a popup/dialog appeared

### App not responding
- Press home and reopen the app
- Or force close and restart

## Example Sessions

### Single Device

**User request:** "Open Chrome and search for weather"

```
1. scripts/check-device.sh
   → Device connected: Pixel 6
   → Serial: 1A051FDF6007PA
   → Type: Physical

2. scripts/launch-app.sh com.android.chrome
   → Chrome launched

3. scripts/get-screen.sh
   → [Read XML, find search/URL bar]
   → Found: bounds="[0,141][1080,228]" resource-id="com.android.chrome:id/url_bar"
   → Center: x=540, y=184

4. scripts/tap.sh 540 184
   → Tapped URL bar

5. scripts/get-screen.sh
   → [Verify keyboard appeared and field is focused]

6. scripts/type-text.sh "weather"
   → Typed "weather"

7. scripts/key.sh enter
   → Pressed enter to search

8. scripts/get-screen.sh
   → [Verify search results loaded]
   → Task complete!
```

### Multiple Devices

**User request:** "Open Settings on my phone" (with emulator also running)

```
1. scripts/check-device.sh
   → Multiple devices connected (2):
   →   [PHYSICAL] 1A051FDF6007PA - Pixel 6
   →   [EMULATOR] emulator-5554 - sdk_gphone64_arm64

   User said "my phone" → target the PHYSICAL device
   Serial to use: 1A051FDF6007PA

2. scripts/check-device.sh -s 1A051FDF6007PA
   → Device connected: Pixel 6
   → Serial: 1A051FDF6007PA
   → Type: Physical
   → Status: Ready

3. scripts/launch-app.sh -s 1A051FDF6007PA settings
   → Resolved 'settings' to package: com.android.settings
   → Launched: com.android.settings

4. scripts/get-screen.sh -s 1A051FDF6007PA
   → [Read XML, verify Settings app is open]
   → Task complete!
```

## Tips

- **Be patient** - Android UI can be slow, wait between actions
- **Read carefully** - The XML tells you exactly what's on screen
- **Check your work** - Get screen after each action to verify state
- **Use screenshots** - When XML doesn't give enough context
- **Start simple** - Break complex tasks into small steps
- **Multi-device** - Always check for multiple devices first; ask user if target is unclear
- **Consistent serial** - Once you pick a device, use `-s <serial>` on ALL commands

Related Skills

android

16
from diegosouzapw/awesome-omni-skill

Build, review, and refactor Android mobile apps (Kotlin) using modern Android patterns. Use for tasks like setting up Gradle modules, Jetpack Compose UI, navigation, ViewModel/state management, networking (Retrofit/OkHttp), persistence (Room/DataStore), DI (Hilt/Koin), testing, performance, release builds, and Play Store readiness.

android-watch-logs

16
from diegosouzapw/awesome-omni-skill

Start real-time log streaming from connected Android device using adb logcat. Shows only app's log messages. Use when monitoring app behavior, debugging, or viewing Android logs.

android-supabase

16
from diegosouzapw/awesome-omni-skill

Supabase integration patterns for Android - authentication, database, realtime subscriptions. Use when setting up Supabase SDK, implementing OAuth, querying database, or setting up realtime.

android-stop-app

16
from diegosouzapw/awesome-omni-skill

Stop the Android app running on connected device. Cleanly terminates the app using force-stop. Use when stopping the app for debugging, testing, or cleanup.

android-project

16
from diegosouzapw/awesome-omni-skill

Navigate and analyze Android project structure, modules, and dependencies. Use when exploring project structure, finding related files, analyzing dependencies, or locating code patterns.

android-notification-builder

16
from diegosouzapw/awesome-omni-skill

Эксперт Android notifications. Используй для push notifications, channels и notification patterns.

android-motion-specialist

16
from diegosouzapw/awesome-omni-skill

Expert Android developer for the Motion Detector project. Use this skill when working on Camera2 API integration, motion detection algorithms, Android networking (LAN sockets + Supabase Realtime), debugging crashes, or any Android/Kotlin development tasks specific to this sprint timing application.

android-kotlin

16
from diegosouzapw/awesome-omni-skill

Android Kotlin development with Coroutines, Jetpack Compose, Hilt, and MockK testing

android-kotlin-development

16
from diegosouzapw/awesome-omni-skill

Develop native Android apps with Kotlin. Covers MVVM with Jetpack, Compose for modern UI, Retrofit for API calls, Room for local storage, and navigation architecture.

android-keystore-generation

16
from diegosouzapw/awesome-omni-skill

Generate production and local development keystores for Android release signing

android-gradle

16
from diegosouzapw/awesome-omni-skill

Automate Gradle tasks for Android projects - build, test, coverage, clean. Use when building APKs, running unit tests, generating coverage reports, or checking dependencies.

android-firebase

16
from diegosouzapw/awesome-omni-skill

Firebase integration patterns for Android - Crashlytics, Analytics, Remote Config, FCM. Use when setting up crash reporting, analytics events, remote configuration, or push notifications.