gemini-computer-use

Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.

Best use case

gemini-computer-use is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using gemini-computer-use can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/gemini-computer-use/SKILL.md --create-dirs "https://raw.githubusercontent.com/Demerzels-lab/elsamultiskillagent/main/public/skills/am-will/gemini-computer-use/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/gemini-computer-use/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How gemini-computer-use Compares

| Feature / Agent | gemini-computer-use | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Gemini Computer Use

## Quick start

1. Copy the env template, set your API key in it, and source the file:

   ```bash
   cp env.example env.sh
   $EDITOR env.sh
   source env.sh
   ```

2. Create a virtual environment and install dependencies:

   ```bash
   python -m venv .venv
   source .venv/bin/activate
   pip install google-genai playwright
   playwright install chromium
   ```

3. Run the agent script with a prompt:

   ```bash
   python scripts/computer_use_agent.py \
     --prompt "Find the latest blog post title on example.com" \
     --start-url "https://example.com" \
     --turn-limit 6
   ```

## Browser selection

- Default: Playwright's bundled Chromium (no env vars required).
- Choose a channel (Chrome/Edge) with `COMPUTER_USE_BROWSER_CHANNEL`.
- Use a custom Chromium-based executable (e.g., Brave) with `COMPUTER_USE_BROWSER_EXECUTABLE`.

If both are set, `COMPUTER_USE_BROWSER_EXECUTABLE` takes precedence.
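
The precedence rule above can be sketched as a small helper that turns the two environment variables into keyword arguments for Playwright's `chromium.launch()`. The function name `resolve_launch_options` is hypothetical and is not taken from `scripts/computer_use_agent.py`; the actual script may resolve these settings differently.

```python
import os
from typing import Mapping, Optional


def resolve_launch_options(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for Playwright's chromium.launch().

    Hypothetical helper: it only illustrates the documented precedence.
    """
    env = os.environ if env is None else env
    executable = env.get("COMPUTER_USE_BROWSER_EXECUTABLE", "").strip()
    channel = env.get("COMPUTER_USE_BROWSER_CHANNEL", "").strip()

    if executable:
        # A custom executable wins over a channel when both are set.
        return {"executable_path": executable}
    if channel:
        # Playwright channel names include "chrome" and "msedge".
        return {"channel": channel}
    # Neither variable set: fall back to Playwright's bundled Chromium.
    return {}
```

With Playwright installed, the result could be passed straight through, for example `playwright.chromium.launch(**resolve_launch_options())`.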

## Core workflow (agent loop)

1. Capture a screenshot and send the user goal + screenshot to the model.
2. Parse `function_call` actions in the response.
3. If an action's `safety_decision` is `require_confirmation`, prompt the user before executing it.
4. Execute each approved action in Playwright.
5. Send `function_response` objects containing the latest URL + screenshot.
6. Repeat until the model returns only text (no actions) or you hit the turn limit.
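
The loop above can be sketched without any SDK by injecting the model call, the Playwright executor, and the confirmation prompt as plain callables. Everything named here (`Action`, `ModelTurn`, `run_agent_loop`, the dictionary shapes) is a hypothetical stand-in rather than the google-genai API or the bundled script, and stopping the run when the user declines is one possible policy, not something the source prescribes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """Hypothetical stand-in for one function_call in a model response."""
    name: str
    args: dict = field(default_factory=dict)
    safety_decision: Optional[str] = None  # e.g. "require_confirmation"


@dataclass
class ModelTurn:
    """Hypothetical stand-in for a response: text plus zero or more actions."""
    text: str = ""
    actions: List[Action] = field(default_factory=list)


def run_agent_loop(
    ask_model: Callable[[list], ModelTurn],
    execute: Callable[[Action], dict],
    confirm: Callable[[Action], bool],
    first_observation: dict,
    turn_limit: int = 6,
) -> str:
    """Run the screenshot -> function_call -> action -> function_response loop."""
    history = [first_observation]  # user goal plus the initial screenshot
    for _ in range(turn_limit):
        turn = ask_model(history)
        if not turn.actions:
            return turn.text  # text only, so the model considers the task done
        history.append({"model_turn": turn})
        responses = []
        for action in turn.actions:
            needs_ok = action.safety_decision == "require_confirmation"
            if needs_ok and not confirm(action):
                # Assumed policy: stop the whole run when the user declines.
                return "stopped: user declined " + action.name
            # execute() is expected to return the latest URL and screenshot.
            responses.append({"name": action.name, "response": execute(action)})
        history.append({"function_responses": responses})
    return "stopped: turn limit reached"
```

In a real agent, `ask_model` would wrap the Gemini request, `execute` would drive a Playwright page and capture a fresh screenshot, and `confirm` would ask the user on the terminal. Keeping them as parameters makes the control flow easy to test with stubs.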

## Operational guidance

- Run in a sandboxed browser profile or container.
- Use `--exclude` to block risky actions you do not want the model to take.
- Keep the viewport at 1440x900 unless you have a reason to change it.
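
One reason the viewport size matters is coordinate mapping. The sketch below assumes the model reports click positions on a normalized 0-999 grid, as Google's Computer Use documentation describes, so each coordinate has to be scaled to the real viewport before Playwright can act on it. The helper name `denormalize` and the clamping behavior are illustrative assumptions, not code from the bundled script.

```python
VIEWPORT_WIDTH = 1440
VIEWPORT_HEIGHT = 900
GRID = 1000  # assumed normalized grid size (coordinates run 0-999)


def denormalize(x: int, y: int,
                width: int = VIEWPORT_WIDTH,
                height: int = VIEWPORT_HEIGHT) -> tuple:
    """Map normalized model coordinates to viewport pixels (hypothetical helper)."""
    # Clamp first so an out-of-range value cannot land outside the page.
    x = min(max(x, 0), GRID - 1)
    y = min(max(y, 0), GRID - 1)
    # Integer arithmetic avoids floating-point rounding surprises.
    return x * width // GRID, y * height // GRID


print(denormalize(500, 500))  # centre of the grid on a 1440x900 viewport
```

If you change the viewport, pass the new `width` and `height` so the scaling stays consistent with the screenshots the model sees.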

## Resources

- Script: `scripts/computer_use_agent.py`
- Reference notes: `references/google-computer-use.md`
- Env template: `env.example`

Related Skills

gemini-image-gen

7
from Demerzels-lab/elsamultiskillagent

Generate and edit images via Google Gemini API. Supports Gemini native generation, Imagen 3, style presets, and batch generation with HTML gallery. Zero dependencies — pure Python stdlib.

gemini-nano-banana-pro-portraits

7
from Demerzels-lab/elsamultiskillagent

Generate ultra-photorealistic portraits using Gemini Nano Banana Pro with comprehensive JSON configuration templates. Use when creating cinematic quality portraits, fitness photography, or realistic character images. Includes complete JSON structure for prompt configuration, subject details, apparel, pose, environment, lighting, and technical specifications.

free-ai-prompt-generator-for-chatgpt-gemini-more-q-6e800b2c

7
from Demerzels-lab/elsamultiskillagent

Write an AI prompt for a job description that attracts top talent

50-viral-gemini-ai-prompts-ready-to-copy-paste-for-e7b5d316

7
from Demerzels-lab/elsamultiskillagent

Romantic couple hugging on a beach at sunset, cinematic lighting, soft focus, using reference faces

50-viral-gemini-ai-prompts-ready-to-copy-paste-for-e41bb853

7
from Demerzels-lab/elsamultiskillagent

Multi-age family playing in a park, golden-hour lighting, candid expressions, using reference photos

50-viral-gemini-ai-prompts-ready-to-copy-paste-for-aefb3d26

7
from Demerzels-lab/elsamultiskillagent

Polaroid-style portrait of a woman smiling, casual outfit, natural light, using reference face

50-viral-gemini-ai-prompts-ready-to-copy-paste-for-4ac228ab

7
from Demerzels-lab/elsamultiskillagent

Epic fantasy group portrait in a magical forest, mystical lighting, dynamic poses, remove objects from original photo, using reference faces

50-viral-gemini-ai-prompts-ready-to-copy-paste-for-335a199b

7
from Demerzels-lab/elsamultiskillagent

Three women posing in urban street fashion, dramatic lighting, stylish hairstyles, using reference faces

zown-gemini-governor

7
from Demerzels-lab/elsamultiskillagent

A high-fidelity token management and model stabilization skill.

gemini-web-search

7
from Demerzels-lab/elsamultiskillagent

Use Gemini CLI (@google/gemini-cli) to do web search / fact-finding and return a sourced summary. Use when the user asks “why did X happen today”, “what’s the latest news”, “search the web”, “find sources/links”, or any task requiring up-to-date info. Prefer this over other search tools when Gemini is available but slow; run it with a TTY, wait longer, and verify source quality.

gemini-image-simple

7
from Demerzels-lab/elsamultiskillagent

Generate and edit images with Gemini API using pure Python stdlib. Zero dependencies - works on locked-down environments where pip/uv aren't available.

computer-use

7
from Demerzels-lab/elsamultiskillagent

Full desktop computer use for headless Linux servers and VPS. Creates a virtual display (Xvfb + XFCE) to control GUI applications without a physical monitor. Screenshots, mouse clicks, keyboard input, scrolling, dragging — all 17 standard actions. Model-agnostic, works with any LLM.