computer-use
Full desktop computer use for headless Linux servers and VPS. Creates a virtual display (Xvfb + XFCE) to control GUI applications without a physical monitor. Screenshots, mouse clicks, keyboard input, scrolling, dragging — all 17 standard actions. Model-agnostic, works with any LLM.
Best use case
computer-use is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using computer-use should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
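- You need an AI agent to operate GUI applications on a headless Linux server or VPS that has no physical monitor.
- You want a reusable workflow that can be run more than once with consistent structure.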
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/computer-use-1-0-1/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
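The manual steps above can be scripted. This is a minimal sketch. It assumes SKILL.md has already been downloaded from GitHub (this page does not give the repository URL). `install_skill` is a hypothetical helper and is not provided by the skill.

```bash
# install_skill is a hypothetical helper, not part of the skill itself.
# Usage: install_skill PATH_TO_SKILL_MD [PROJECT_DIR]
install_skill() {
  src=$1
  project=${2:-.}   # default: the current directory is the project root
  dest="$project/.claude/skills/computer-use-1-0-1"

  if [ ! -f "$src" ]; then
    echo "install_skill: file not found: $src" >&2
    return 1
  fi

  # Create the skill directory and copy SKILL.md into it.
  mkdir -p "$dest" && cp "$src" "$dest/SKILL.md"
}
```

For example, `install_skill ~/Downloads/SKILL.md ~/my-project`, followed by restarting the agent.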
How computer-use Compares
| Feature / Agent | computer-use | Standard Approach |
|---|---|---|
| Platform Support | Headless Linux servers and VPS (Xvfb + XFCE) | Limited / Varies |
| Context Awareness | High | Baseline |
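| Installation Complexity | Requires system packages (xvfb, xfce4, xdotool, scrot, imagemagick, dbus-x11, chromium-browser) and two systemd services | N/A |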
Frequently Asked Questions
Where can I find the source code?
The source code is hosted on GitHub.
SKILL.md Source
# Computer Use Skill

Full desktop GUI control for headless Linux servers. Creates a virtual display (Xvfb + XFCE) so you can run and control desktop applications on VPS/cloud instances without a physical monitor.

## Environment

- **Display**: `:99`
- **Resolution**: 1024x768 (XGA, Anthropic recommended)
- **Desktop**: XFCE4

## Quick Start

```bash
export DISPLAY=:99

# Take screenshot
./scripts/screenshot.sh

# Click at coordinates
./scripts/click.sh 512 384 left

# Type text
./scripts/type_text.sh "Hello world"

# Press key combo
./scripts/key.sh "ctrl+s"

# Scroll down
./scripts/scroll.sh down 5
```

## Actions Reference

| Action | Script | Arguments | Description |
|--------|--------|-----------|-------------|
| screenshot | `screenshot.sh` | none | Capture screen → base64 PNG |
| cursor_position | `cursor_position.sh` | none | Get current mouse X,Y |
| mouse_move | `mouse_move.sh` | x y | Move mouse to coordinates |
| left_click | `click.sh` | x y left | Left click at coordinates |
| right_click | `click.sh` | x y right | Right click |
| middle_click | `click.sh` | x y middle | Middle click |
| double_click | `click.sh` | x y double | Double click |
| triple_click | `click.sh` | x y triple | Triple click (select line) |
| left_click_drag | `drag.sh` | x1 y1 x2 y2 | Drag from start to end |
| left_mouse_down | `mouse_down.sh` | none | Press mouse button |
| left_mouse_up | `mouse_up.sh` | none | Release mouse button |
| type | `type_text.sh` | "text" | Type text (50 char chunks, 12ms delay) |
| key | `key.sh` | "combo" | Press key (Return, ctrl+c, alt+F4) |
| hold_key | `hold_key.sh` | "key" secs | Hold key for duration |
| scroll | `scroll.sh` | dir amt [x y] | Scroll up/down/left/right |
| wait | `wait.sh` | seconds | Wait then screenshot |
| zoom | `zoom.sh` | x1 y1 x2 y2 | Cropped region screenshot |

## Workflow Pattern

1. **Screenshot**: Always start by seeing the screen
2. **Analyze**: Identify UI elements and coordinates
3. **Act**: Click, type, scroll
4. **Screenshot**: Verify result
5. **Repeat**

## Tips

- Screen is 1024x768, origin (0,0) at top-left
- Click to focus before typing in text fields
- Use `ctrl+End` to jump to page bottom in browsers
- Most actions auto-screenshot after 2 sec delay
- Long text is chunked (50 chars) with 12ms keystroke delay

## System Services

```bash
# Services auto-start on boot
sudo systemctl status virtual-desktop   # Xvfb on :99
sudo systemctl status xfce-desktop      # XFCE session

# Manual restart if needed
sudo systemctl restart virtual-desktop xfce-desktop
```

## Opening Applications

```bash
export DISPLAY=:99
chromium-browser --no-sandbox &   # Web browser
xfce4-terminal &                  # Terminal
thunar &                          # File manager
```

## Requirements

System packages (install once):

```bash
sudo apt install -y xvfb xfce4 xfce4-terminal xdotool scrot imagemagick dbus-x11 chromium-browser
```
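The Actions Reference lists the button names for click.sh and the directions for scroll.sh, but it does not say how they reach the X server. The sketch below shows one plausible mapping. It assumes the scripts wrap xdotool, which is listed under Requirements, and it uses the standard X11 button numbers. `click_args`, `scroll_button`, `click_at` and `scroll_at` are hypothetical names and are not the skill's actual code.

```bash
# Standard X11 button numbers: 1 = left, 2 = middle, 3 = right,
# 4/5 = wheel up/down, 6/7 = wheel left/right.

# Print the xdotool click arguments for a click.sh-style button name.
click_args() {
  case "$1" in
    left)   echo "click 1" ;;
    middle) echo "click 2" ;;
    right)  echo "click 3" ;;
    double) echo "click --repeat 2 1" ;;   # two left clicks
    triple) echo "click --repeat 3 1" ;;   # three left clicks (select line)
    *)      echo "unknown button: $1" >&2; return 1 ;;
  esac
}

# Print the X11 button number for a scroll direction.
scroll_button() {
  case "$1" in
    up)    echo 4 ;;
    down)  echo 5 ;;
    left)  echo 6 ;;
    right) echo 7 ;;
    *)     echo "unknown direction: $1" >&2; return 1 ;;
  esac
}

# click_at X Y BUTTON: move the pointer, then click (needs DISPLAY=:99).
click_at() {
  args=$(click_args "$3") || return 1
  # $args is left unquoted on purpose so it splits into separate xdotool words.
  xdotool mousemove "$1" "$2" $args
}

# scroll_at DIR AMOUNT [X Y]: optionally move the pointer, then scroll AMOUNT notches.
scroll_at() {
  button=$(scroll_button "$1") || return 1
  if [ $# -ge 4 ]; then
    xdotool mousemove "$3" "$4"
  fi
  xdotool click --repeat "$2" "$button"
}
```

Under these assumptions, `./scripts/scroll.sh down 5` from the Quick Start corresponds to `scroll_at down 5`, which sends X11 button 5 five times.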
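The `type` action and the Tips both say that long text is sent in 50-character chunks with a 12 ms keystroke delay. Below is a minimal sketch of that behaviour. It assumes xdotool does the typing and that the input is single-line ASCII text. `chunk_text` and `type_chunked` are hypothetical names.

```bash
# Split TEXT into pieces of at most 50 characters, one piece per line.
# fold counts columns, so this assumes single-line ASCII text without tabs;
# embedded newlines would start new chunks and would not be typed as Enter.
chunk_text() {
  printf '%s\n' "$1" | fold -w 50
}

# Type TEXT into the focused window one chunk at a time,
# with a 12 ms delay between keystrokes (needs DISPLAY=:99).
type_chunked() {
  chunk_text "$1" | while IFS= read -r chunk; do
    # "--" stops xdotool from treating text that starts with "-" as an option.
    xdotool type --delay 12 -- "$chunk"
  done
}
```

As the Tips note, click the target field first so that the keystrokes go to the right window.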
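`zoom.sh` is described only as a cropped region screenshot that takes `x1 y1 x2 y2`. The sketch below assumes those values are the top-left and bottom-right corners and that ImageMagick (listed under Requirements) does the cropping. It converts the corner pair into ImageMagick's `WxH+X+Y` crop geometry. `crop_geometry` and `zoom_region` are hypothetical helpers.

```bash
# crop_geometry X1 Y1 X2 Y2: print ImageMagick geometry WIDTHxHEIGHT+X+Y
# for the rectangle with top-left (X1,Y1) and bottom-right (X2,Y2).
crop_geometry() {
  if [ "$3" -le "$1" ] || [ "$4" -le "$2" ]; then
    echo "crop_geometry: need X2 > X1 and Y2 > Y1" >&2
    return 1
  fi
  echo "$(( $3 - $1 ))x$(( $4 - $2 ))+$1+$2"
}

# zoom_region X1 Y1 X2 Y2 SRC.png OUT.png: crop SRC into OUT.
# +repage resets the virtual canvas so OUT has no leftover page offset.
zoom_region() {
  geometry=$(crop_geometry "$1" "$2" "$3" "$4") || return 1
  convert "$5" -crop "$geometry" +repage "$6"
}
```

For example, the corners (100,200) and (300,400) become the geometry `200x200+100+200`.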
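The Workflow Pattern is a loop, and the sketch below expresses it in shell. `take_screenshot`, `decide_action` and `perform_action` are hypothetical hooks. `decide_action` stands in for whatever LLM call analyzes the screenshot; the skill is model-agnostic and does not define one. The sketch also assumes that each action is a script name followed by arguments that contain no spaces.

```bash
# Hook 1: capture the screen as base64 PNG (per the Actions Reference).
take_screenshot() { ./scripts/screenshot.sh; }

# Hook 2 (placeholder): given a screenshot file, print the next action,
# e.g. "click.sh 512 384 left", or "done" when the task is finished.
decide_action() { echo done; }

# Hook 3: run "SCRIPT ARGS..." from the skill's scripts directory.
perform_action() {
  script=$1
  shift
  "./scripts/$script" "$@"
}

# agent_loop [MAX_STEPS]: returns 0 when decide_action prints "done",
# or 1 if MAX_STEPS actions ran without finishing.
agent_loop() {
  max_steps=${1:-20}
  shot_file=$(mktemp) || return 1
  status=1
  steps=0
  take_screenshot > "$shot_file"            # 1. Screenshot first
  while [ "$steps" -lt "$max_steps" ]; do
    action=$(decide_action "$shot_file")    # 2. Analyze
    if [ "$action" = "done" ]; then
      status=0
      break
    fi
    # 3. Act: $action is split into words on purpose, so arguments
    # must not contain spaces or glob characters.
    perform_action $action
    take_screenshot > "$shot_file"          # 4. Screenshot to verify
    steps=$((steps + 1))                    # 5. Repeat
  done
  rm -f "$shot_file"
  return "$status"
}
```

The screenshot is passed to `decide_action` as a file path rather than as a string. A base64 capture of the full screen can exceed the per-argument size limit when it is handed to an external command. The step cap keeps a confused model from looping forever.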
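The Tips define the coordinate system as a 1024x768 screen with (0,0) at the top-left. A small guard such as the hypothetical `in_bounds` below can reject coordinates that a model proposes outside that area before any click is sent. It assumes whole-pixel integer coordinates.

```bash
SCREEN_WIDTH=1024
SCREEN_HEIGHT=768

# in_bounds X Y: succeed only if 0 <= X < 1024 and 0 <= Y < 768.
in_bounds() {
  [ "$1" -ge 0 ] && [ "$1" -lt "$SCREEN_WIDTH" ] &&
    [ "$2" -ge 0 ] && [ "$2" -lt "$SCREEN_HEIGHT" ]
}
```

Typical use would be `in_bounds "$x" "$y" && ./scripts/click.sh "$x" "$y" left`.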
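The System Services section names a `virtual-desktop` unit that runs Xvfb on `:99`, but it does not show the unit's definition. The hypothetical helper below writes a minimal unit that is consistent with the documented display and resolution. The 24-bit colour depth, `Restart=always` and `multi-user.target` are assumptions, and the unit shipped with the skill may differ.

```bash
# write_virtual_desktop_unit DEST: write an example Xvfb unit file to DEST.
write_virtual_desktop_unit() {
  cat > "$1" <<'EOF'
[Unit]
Description=Virtual X display :99 (Xvfb)

[Service]
ExecStart=/usr/bin/Xvfb :99 -screen 0 1024x768x24
Restart=always

[Install]
WantedBy=multi-user.target
EOF
}

# Example installation (requires root):
#   write_virtual_desktop_unit /tmp/virtual-desktop.service
#   sudo mv /tmp/virtual-desktop.service /etc/systemd/system/
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now virtual-desktop
```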
Related Skills
gemini-computer-use
Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.
senior-computer-vision
Computer vision engineering skill for object detection, image segmentation, and visual AI systems. Covers CNN and Vision Transformer architectures, YOLO/Faster R-CNN/DETR detection, Mask R-CNN/SAM segmentation, and production deployment with ONNX/TensorRT. Includes PyTorch, torchvision, Ultralytics, Detectron2, and MMDetection frameworks. Use when building detection pipelines, training custom models, optimizing inference, or deploying vision systems.
paylock
Non-custodial SOL escrow for AI agent deals.
agent-reputation
Cross-platform AI agent reputation checker with trust scoring and PayLock escrow recommendations.
Telecom Agent Skill
Turn your AI Agent into a Telecom Operator. Bulk calling, ChatOps, and Field Monitoring.
OpenClaw-Finnhub
OpenClaw skill for real-time stock quotes and financials via the Finnhub API.
OpenClaw-Last.fm
security-operator
Runtime security guardrails for OpenClaw agents.
operator-humanizer
Transform AI-generated text into authentic human writing.
kit-email-operator
AI-powered email marketing for Kit (ConvertKit).
agora
Trade prediction markets on Agora — the prediction market exclusively for AI agents. Register, browse markets, trade YES/NO, create markets, earn reputation via Brier scores.
surf-check
Surf forecast decision engine.