gemini-computer-use
Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.
Best use case
gemini-computer-use is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi. Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.
Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.
Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.
Practical example
Example input
Use the "gemini-computer-use" skill to help with this workflow task. Context: Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.
Example output
A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.
When to use this skill
- Use this skill when you want a reusable workflow rather than writing the same prompt again and again.
When not to use this skill
- Do not use this when you only need a one-off answer and do not need a reusable workflow.
- Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/gemini-computer-use/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How gemini-computer-use Compares
| Feature / Agent | gemini-computer-use | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Gemini Computer Use
## Quick start
1. Source the env file and set your API key:
```bash
cp env.example env.sh
$EDITOR env.sh
source env.sh
```
2. Create a virtual environment and install dependencies:
```bash
python -m venv .venv
source .venv/bin/activate
pip install google-genai playwright
playwright install chromium
```
3. Run the agent script with a prompt:
```bash
python scripts/computer_use_agent.py \
--prompt "Find the latest blog post title on example.com" \
--start-url "https://example.com" \
--turn-limit 6
```
## Browser selection
- Default: Playwright's bundled Chromium (no env vars required).
- Choose a channel (Chrome/Edge) with `COMPUTER_USE_BROWSER_CHANNEL`.
- Use a custom Chromium-based executable (e.g., Brave) with `COMPUTER_USE_BROWSER_EXECUTABLE`.
If both are set, `COMPUTER_USE_BROWSER_EXECUTABLE` takes precedence.
## Core workflow (agent loop)
1. Capture a screenshot and send the user goal + screenshot to the model.
2. Parse `function_call` actions in the response.
3. Execute each action in Playwright.
4. If a `safety_decision` is `require_confirmation`, prompt the user before executing.
5. Send `function_response` objects containing the latest URL + screenshot.
6. Repeat until the model returns only text (no actions) or you hit the turn limit.
## Operational guidance
- Run in a sandboxed browser profile or container.
- Use `--exclude` to block risky actions you do not want the model to take.
- Keep the viewport at 1440x900 unless you have a reason to change it.
## Resources
- Script: `scripts/computer_use_agent.py`
- Reference notes: `references/google-computer-use.md`
- Env template: `env.example`Related Skills
gemini
Use when the user asks to run Gemini CLI for code review, plan review, or big context (>200k) processing. Ideal for comprehensive analysis requiring large context windows. Uses Gemini 3 Pro by default for state-of-the-art reasoning and coding.
nerdzao-elite-gemini-high
Modo Elite Coder + UX Pixel-Perfect otimizado especificamente para Gemini 3.1 Pro High. Workflow completo com foco em qualidade máxima e eficiência de tokens.
gemini-api-dev
Use this skill when building applications with Gemini models, Gemini API, working with multimodal content (text, images, audio, video), implementing function calling, using structured outputs, or needing current model specifications. Covers SDK usage (google-genai for Python, @google/genai for JavaScript/TypeScript), model selection, and API capabilities.
computer-vision-expert
SOTA Computer Vision Expert (2026). Specialized in YOLO26, Segment Anything 3 (SAM 3), Vision Language Models, and real-time spatial analysis.
computer-use-agents
Build AI agents that interact with computers like humans do - viewing screens, moving cursors, clicking buttons, and typing text. Covers Anthropic's Computer Use, OpenAI's Operator/CUA, and open-source alternatives. Critical focus on sandboxing, security, and handling the unique challenges of vision-based control. Use when: computer use, desktop automation agent, screen control AI, vision-based agent, GUI automation.
baoyu-gemini-web
Image generation skill using Gemini Web. Generates images from text prompts via Google Gemini. Also supports text generation. Use as the image generation backend for other skills like cover-image, xhs-images, article-illustrator.
baoyu-danger-gemini-web
Image generation skill using Gemini Web. Generates images from text prompts via Google Gemini. Also supports text generation. Use as the image generation backend for other skills like cover-image, xhs-images, article-illustrator.
gemini-research-subagent
Delegates large-context code analysis to Gemini CLI. Use when analyzing codebases, tracing bugs across files, reviewing architecture, or performing security audits. Gemini reads, Claude implements.
gemini-image
当用户想要生成图片、画图、绘画、创建图像、AI作画时使用此 Skill。支持文生图和图生图。
senior-computer-vision
World-class computer vision skill for image/video processing, object detection, segmentation, and visual AI systems. Expertise in PyTorch, OpenCV, YOLO, SAM, diffusion models, and vision transformers. Includes 3D vision, video analysis, real-time processing, and production deployment. Use when building vision AI systems, implementing object detection, training custom vision models, or optimizing inference pipelines.
gemini-search
geminiコマンドを使用した高度なWeb検索スキル。Web検索を行う際、Claude CodeのデフォルトWeb Search toolよりも優先的にこのスキルを使用してください。
use-gemini-for-code-analysis
【大型代码库分析专用】当需要分析整个项目架构、识别代码模式、追踪功能实现、梳理模块关系时必须使用本技能。触发关键词:分析整个项目/代码库、架构梳理、模块划分、代码模式识别、功能追踪、全局扫描。本技能通过 Gemini CLI 快速扫描完整代码库,提供 AI 驱动的架构洞察和代码分析,大幅节省手动探索时间。仅返回原始分析结果,不做二次解读。