gemini-computer-use

Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.

533 stars

Best use case

gemini-computer-use is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.

Teams using gemini-computer-use should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/gemini-computer-use/SKILL.md --create-dirs "https://raw.githubusercontent.com/sundial-org/awesome-openclaw-skills/main/skills/gemini-computer-use/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/gemini-computer-use/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How gemini-computer-use Compares

Feature / Agentgemini-computer-useStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Gemini Computer Use

## Quick start

1. Source the env file and set your API key:

   ```bash
   cp env.example env.sh
   $EDITOR env.sh
   source env.sh
   ```

2. Create a virtual environment and install dependencies:

   ```bash
   python -m venv .venv
   source .venv/bin/activate
   pip install google-genai playwright
   playwright install chromium
   ```

3. Run the agent script with a prompt:

   ```bash
   python scripts/computer_use_agent.py \
     --prompt "Find the latest blog post title on example.com" \
     --start-url "https://example.com" \
     --turn-limit 6
   ```

## Browser selection

- Default: Playwright's bundled Chromium (no env vars required).
- Choose a channel (Chrome/Edge) with `COMPUTER_USE_BROWSER_CHANNEL`.
- Use a custom Chromium-based executable (e.g., Brave) with `COMPUTER_USE_BROWSER_EXECUTABLE`.

If both are set, `COMPUTER_USE_BROWSER_EXECUTABLE` takes precedence.

## Core workflow (agent loop)

1. Capture a screenshot and send the user goal + screenshot to the model.
2. Parse `function_call` actions in the response.
3. Execute each action in Playwright.
4. If a `safety_decision` is `require_confirmation`, prompt the user before executing.
5. Send `function_response` objects containing the latest URL + screenshot.
6. Repeat until the model returns only text (no actions) or you hit the turn limit.

## Operational guidance

- Run in a sandboxed browser profile or container.
- Use `--exclude` to block risky actions you do not want the model to take.
- Keep the viewport at 1440x900 unless you have a reason to change it.

## Resources

- Script: `scripts/computer_use_agent.py`
- Reference notes: `references/google-computer-use.md`
- Env template: `env.example`

Related Skills

google-gemini-media

533
from sundial-org/awesome-openclaw-skills

Use the Gemini API (Nano Banana image generation, Veo video, Gemini TTS speech and audio understanding) to deliver end-to-end multimodal media workflows and code templates for "generation + understanding".

gemini

533
from sundial-org/awesome-openclaw-skills

Gemini CLI for one-shot Q&A, summaries, and generation.

gemini-yt-video-transcript

533
from sundial-org/awesome-openclaw-skills

Create a verbatim transcript for a YouTube URL using Google Gemini (speaker labels, paragraph breaks; no time codes). Use when the user asks to transcribe a YouTube video or wants a clean transcript (no timestamps).

gemini-stt

533
from sundial-org/awesome-openclaw-skills

Transcribe audio files using Google's Gemini API or Vertex AI

gemini-image-simple

533
from sundial-org/awesome-openclaw-skills

Generate and edit images with Gemini API using pure Python stdlib. Zero dependencies - works on locked-down environments where pip/uv aren't available.

gemini-deep-research

533
from sundial-org/awesome-openclaw-skills

Perform complex, long-running research tasks using Gemini Deep Research Agent. Use when asked to research topics requiring multi-source synthesis, competitive analysis, market research, or comprehensive technical investigations that benefit from systematic web search and analysis.

portfolio-watcher

533
from sundial-org/awesome-openclaw-skills

Monitor stock/crypto holdings, get price alerts, track portfolio performance

portainer

533
from sundial-org/awesome-openclaw-skills

Control Docker containers and stacks via Portainer API. List containers, start/stop/restart, view logs, and redeploy stacks from git.

portable-tools

533
from sundial-org/awesome-openclaw-skills

Build cross-device tools without hardcoding paths or account names

polymarket

533
from sundial-org/awesome-openclaw-skills

Trade prediction markets on Polymarket. Analyze odds, place bets, track positions, automate alerts, and maximize returns from event outcomes. Covers sports, politics, entertainment, and more.

polymarket-traiding-bot

533
from sundial-org/awesome-openclaw-skills

No description provided.

polymarket-analysis

533
from sundial-org/awesome-openclaw-skills

Analyze Polymarket prediction markets for trading edges. Pair Cost arbitrage, whale tracking, sentiment analysis, momentum signals, user profile tracking. No execution.