linux-gui-control
Control the Linux desktop GUI using xdotool, wmctrl, and dogtail. Use when you need to interact with non-browser applications, simulate mouse/keyboard input, manage windows, or inspect the UI hierarchy of applications on X11/GNOME. Supports: (1) Clicking/typing in apps, (2) Resizing/moving windows, (3) Extracting text-based UI trees from apps (A11y), (4) Taking screenshots for visual analysis.
Best use case
linux-gui-control is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using linux-gui-control should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/guicountrol/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How linux-gui-control Compares
| Feature / Agent | linux-gui-control | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Control the Linux desktop GUI using xdotool, wmctrl, and dogtail. Use when you need to interact with non-browser applications, simulate mouse/keyboard input, manage windows, or inspect the UI hierarchy of applications on X11/GNOME. Supports: (1) Clicking/typing in apps, (2) Resizing/moving windows, (3) Extracting text-based UI trees from apps (A11y), (4) Taking screenshots for visual analysis.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Linux GUI Control

This skill provides tools and procedures for automating interactions with the Linux desktop environment.

## Quick Start

### 1. Identify Target Window

Use `wmctrl` to find the exact name of the window you want to control.

```bash
wmctrl -l
```

### 2. Inspect UI Hierarchy

For apps supporting accessibility (GNOME apps, Electron apps with `--force-renderer-accessibility`), use the inspection script to find button names without taking screenshots.

```bash
python3 scripts/inspect_ui.py "<app_name>"
```

### 3. Perform Actions

Use `xdotool` via the helper script for common actions.

```bash
# Activate window
./scripts/gui_action.sh activate "<window_name>"

# Click coordinates
./scripts/gui_action.sh click 500 500

# Type text
./scripts/gui_action.sh type "Hello World"

# Press a key
./scripts/gui_action.sh key "Return"
```

## Workflows

### Operating an App via Text UI

1. List windows with `wmctrl -l`.
2. Activate the target window.
3. Run `scripts/inspect_ui.py` to get the list of buttons and inputs.
4. Use `xdotool key Tab` and `Return` to navigate, or `click` if coordinates are known.
5. If text-based inspection fails, fall back to taking a screenshot and using vision.

### Forcing Accessibility in Electron Apps

Many modern apps (VS Code, Discord, Cider, Chrome) need a flag to expose their UI tree:

```bash
pkill <app>
nohup <app> --force-renderer-accessibility > /dev/null 2>&1 &
```

## Tool Reference

- **wmctrl**: Window management (list, activate, move, resize).
- **xdotool**: Input simulation (click, type, key, mousemove).
- **dogtail**: UI tree extraction via AT-SPI (Accessibility bus).
- **scrot**: Lightweight screenshot tool.
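To illustrate the first steps of the "Operating an App via Text UI" workflow, here is a minimal Python sketch that parses `wmctrl -l` output and builds the matching `xdotool` command lines. The function names (`parse_wmctrl_list`, `find_window`, `build_xdotool_command`) are hypothetical and introduced only for this example. They are not the contents of `scripts/gui_action.sh`, which this page does not show. The sketch assumes the usual `wmctrl -l` column layout (window ID, desktop number, client host, title) and only constructs argument lists, so it runs without an X11 session.

```python
import shlex


def parse_wmctrl_list(output):
    """Parse `wmctrl -l` text into (window_id, desktop, host, title) tuples."""
    windows = []
    for line in output.splitlines():
        # Assumed columns: window id, desktop number, client host, title.
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        window_id, desktop, host = parts[:3]
        title = parts[3].rstrip() if len(parts) == 4 else ""
        windows.append((window_id, int(desktop), host, title))
    return windows


def find_window(windows, needle):
    """Return the first window whose title contains `needle` (case-insensitive)."""
    for window in windows:
        if needle.lower() in window[3].lower():
            return window
    return None


def build_xdotool_command(action, *args):
    """Build an xdotool argument list for one of four illustrative actions."""
    if action == "activate":
        # wmctrl prints hex ids; pass xdotool a decimal id to be safe.
        return ["xdotool", "windowactivate", str(int(args[0], 16))]
    if action == "click":
        x, y = args
        # Move the pointer, then press the left button (button 1).
        return ["xdotool", "mousemove", str(x), str(y), "click", "1"]
    if action == "type":
        return ["xdotool", "type", args[0]]
    if action == "key":
        return ["xdotool", "key", args[0]]
    raise ValueError(f"unsupported action: {action}")


if __name__ == "__main__":
    # Static sample text standing in for real `wmctrl -l` output.
    sample = "0x03c00003  0 myhost Text Editor\n0x04000007 -1 myhost Desktop\n"
    target = find_window(parse_wmctrl_list(sample), "editor")
    if target is not None:
        for cmd in (
            build_xdotool_command("activate", target[0]),
            build_xdotool_command("type", "Hello World"),
            build_xdotool_command("key", "Return"),
        ):
            print(shlex.join(cmd))
```

On a real desktop, each argument list could be passed to `subprocess.run(cmd, check=True)`. Building commands as lists rather than shell strings avoids quoting problems when the typed text contains spaces or shell metacharacters.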
Related Skills
opencode-controller
Control and operate Opencode via slash commands. Use this skill to manage sessions, select models, switch agents (plan/build), and coordinate coding through Opencode.
toolguard-daemon-control
Manage long-running processes as macOS launchd services.
xdotool-control
Mouse and keyboard automation using xdotool.
iyeque-device-control
Expose safe device actions (volume, brightness, open/close apps) for personal automation.
roku-control
Control Roku devices via local network (ECP protocol).
dirigera-control
Control IKEA Dirigera smart home devices (lights, outlets, scenes, controllers). Use when the user asks to control smart home devices, check device status, turn lights on/off, adjust brightness/color, control outlets, trigger scenes, check battery levels, or work with IKEA smart home automation. Also use when the user needs help finding the Dirigera hub IP address or generating an API token. Accessible via Cloudflare tunnel on VPS.
macos-desktop-control
A high-fidelity automation bridge for macOS (Darwin) that enables agents to perceive the desktop state and execute.
mac-control
Control Mac via mouse/keyboard automation using cliclick and AppleScript.
vector-control
Control a Vector robot via Wirepod’s local HTTP API on the same network. Use when you need to move Vector, tilt head/lift, speak text, capture camera frames, or run patrol/explore routines from the Pi/Wirepod host. Includes a CLI helper script and endpoint reference.
ticktick-linux
Manage TickTick tasks (add, list, complete) via the local `tickrs` CLI.
govee-control
Script-free Govee OpenAPI setup and control guide.
orgo-desktop-control
Provision and control Orgo cloud computers using the orgo_client Python SDK.