gemini-computer-use

Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.

533 stars

bysundial-org

View on GitHub Installation ↓

Best use case

gemini-computer-use is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using gemini-computer-use should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/gemini-computer-use/SKILL.md --create-dirs "https://raw.githubusercontent.com/sundial-org/awesome-openclaw-skills/main/skills/gemini-computer-use/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/gemini-computer-use/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How gemini-computer-use Compares

Feature / Agent	gemini-computer-use	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Gemini Computer Use

## Quick start

1. Source the env file and set your API key:

```bash
cp env.example env.sh
$EDITOR env.sh
source env.sh
```

2. Create a virtual environment and install dependencies:

```bash
python -m venv .venv
source .venv/bin/activate
pip install google-genai playwright
playwright install chromium
```

3. Run the agent script with a prompt:

```bash
python scripts/computer_use_agent.py \
--prompt "Find the latest blog post title on example.com" \
--start-url "https://example.com" \
--turn-limit 6
```

## Browser selection

- Default: Playwright's bundled Chromium (no env vars required).
- Choose a channel (Chrome/Edge) with `COMPUTER_USE_BROWSER_CHANNEL`.
- Use a custom Chromium-based executable (e.g., Brave) with `COMPUTER_USE_BROWSER_EXECUTABLE`.

If both are set, `COMPUTER_USE_BROWSER_EXECUTABLE` takes precedence.

## Core workflow (agent loop)

1. Capture a screenshot and send the user goal + screenshot to the model.
2. Parse `function_call` actions in the response.
3. Execute each action in Playwright.
4. If a `safety_decision` is `require_confirmation`, prompt the user before executing.
5. Send `function_response` objects containing the latest URL + screenshot.
6. Repeat until the model returns only text (no actions) or you hit the turn limit.

## Operational guidance

- Run in a sandboxed browser profile or container.
- Use `--exclude` to block risky actions you do not want the model to take.
- Keep the viewport at 1440x900 unless you have a reason to change it.

## Resources

- Script: `scripts/computer_use_agent.py`
- Reference notes: `references/google-computer-use.md`
- Env template: `env.example`

Related Skills

google-gemini-media

533

from sundial-org/awesome-openclaw-skills

Use the Gemini API (Nano Banana image generation, Veo video, Gemini TTS speech and audio understanding) to deliver end-to-end multimodal media workflows and code templates for "generation + understanding".

gemini

533

from sundial-org/awesome-openclaw-skills

Gemini CLI for one-shot Q&A, summaries, and generation.

gemini-yt-video-transcript

533

from sundial-org/awesome-openclaw-skills

Create a verbatim transcript for a YouTube URL using Google Gemini (speaker labels, paragraph breaks; no time codes). Use when the user asks to transcribe a YouTube video or wants a clean transcript (no timestamps).

gemini-stt

533

from sundial-org/awesome-openclaw-skills

Transcribe audio files using Google's Gemini API or Vertex AI

gemini-image-simple

533

from sundial-org/awesome-openclaw-skills

Generate and edit images with Gemini API using pure Python stdlib. Zero dependencies - works on locked-down environments where pip/uv aren't available.

gemini-deep-research

533

from sundial-org/awesome-openclaw-skills

Perform complex, long-running research tasks using Gemini Deep Research Agent. Use when asked to research topics requiring multi-source synthesis, competitive analysis, market research, or comprehensive technical investigations that benefit from systematic web search and analysis.

portfolio-watcher

533

from sundial-org/awesome-openclaw-skills

Monitor stock/crypto holdings, get price alerts, track portfolio performance

portainer

533

from sundial-org/awesome-openclaw-skills

Control Docker containers and stacks via Portainer API. List containers, start/stop/restart, view logs, and redeploy stacks from git.

portable-tools

533

from sundial-org/awesome-openclaw-skills

Build cross-device tools without hardcoding paths or account names

polymarket

533

from sundial-org/awesome-openclaw-skills

Trade prediction markets on Polymarket. Analyze odds, place bets, track positions, automate alerts, and maximize returns from event outcomes. Covers sports, politics, entertainment, and more.

polymarket-traiding-bot

533

from sundial-org/awesome-openclaw-skills

No description provided.

polymarket-analysis

533

from sundial-org/awesome-openclaw-skills

Analyze Polymarket prediction markets for trading edges. Pair Cost arbitrage, whale tracking, sentiment analysis, momentum signals, user profile tracking. No execution.