Gemini Browser

Query Google Gemini via browser automation using OpenClaw's Browser Relay. Use when you need to ask Gemini questions and get AI responses. Requires OpenClaw with Browser Relay Chrome extension configured.

1,864 stars

by LeoYeAI


Best use case

Gemini Browser is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using Gemini Browser should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/gemini-browser/SKILL.md --create-dirs "https://raw.githubusercontent.com/LeoYeAI/openclaw-master-skills/main/skills/gemini-browser/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/gemini-browser/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How Gemini Browser Compares

| Feature / Agent | Gemini Browser | Standard Approach |
|-----------------|----------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

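What does this skill do?

It queries Google Gemini through browser automation using OpenClaw's Browser Relay and extracts the responses.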

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Gemini Browser Skill

Query Google Gemini (`gemini.google.com`) via OpenClaw Browser Relay and extract responses.

> **⚠️ Security Notice**: This skill operates on your **real Chrome browser** with
> your logged-in Google session via CDP (Chrome DevTools Protocol). The agent will
> have access to anything visible in the attached tab. Only attach tabs you
> explicitly intend for the agent to control. See [Security Considerations](#security-considerations).

## Prerequisites

- **OpenClaw** installed and running (this skill uses OpenClaw's `browser` command)
- **OpenClaw Browser Relay** Chrome extension installed and configured
  - Extension binds to loopback `127.0.0.1:18792` by default
  - Gateway auth token must be configured in extension options
- **Google account** logged in within Chrome (Gemini requires authentication)
- Use `profile=chrome` to relay through your existing Chrome (not the isolated `profile=openclaw-managed`)

## Quick Start

```bash
# 1. Open Gemini in Chrome (`open -a` is macOS-only; on other systems, open the URL in Chrome manually)
open -a "Google Chrome" "https://gemini.google.com"

# 2. Manually click the Browser Relay extension icon on the Gemini tab to attach
#    (the badge will show "ON" when attached)

# 3. Verify relay is connected
browser action=status profile=chrome
# Should show cdpReady: true

# 4. List tabs
browser action=tabs profile=chrome
# Note the targetId for the Gemini tab
```

## Input Method

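Gemini uses a **Quill rich-text editor** (a `contenteditable` div), not a standard `<textarea>`, so you must inject text via JavaScript. Replace `YOUR_QUERY_HERE` with your query. Because the text is assigned through `innerHTML` inside a quoted string, escape any quotes, `<`, and `&` in it first: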

```
browser action=act profile=chrome targetId=<id> request={
  "kind": "evaluate",
  "fn": "(() => { const editor = document.querySelector('div.ql-editor[contenteditable=\"true\"]'); if (!editor) return 'editor not found'; editor.focus(); editor.innerHTML = '<p>YOUR_QUERY_HERE</p>'; editor.dispatchEvent(new Event('input', { bubbles: true })); return 'ok'; })()"
}
```

Then submit:

```
browser action=act profile=chrome targetId=<id> request={"kind":"press","key":"Enter"}
```

## Complete Workflow

### 1. Prepare

Open Gemini in Chrome and **manually attach** the Browser Relay extension to the tab.

```bash
open -a "Google Chrome" "https://gemini.google.com"
# Then click the Browser Relay extension icon on the Gemini tab
```

### 2. Get Tab ID

```
browser action=tabs profile=chrome
```

Find the Gemini tab entry and note its `targetId`.
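Picking the tab out of the list can be scripted. The sketch below is only illustrative: the output format of `browser action=tabs` is not documented here, so it assumes the tab list has already been parsed into an array of objects with `targetId` and `url` fields. The `url` field name and the helper `findGeminiTargetId` are hypothetical.

```javascript
// Hypothetical helper: assumes each tab entry looks like { targetId, url }.
// The real shape of the `browser action=tabs` output may differ.
function findGeminiTargetId(tabs) {
  const matches = tabs.filter((tab) => {
    try {
      // Compare the hostname exactly so that URLs merely mentioning
      // "gemini.google.com" in a query string are not matched.
      return new URL(tab.url).hostname === "gemini.google.com";
    } catch {
      return false; // skip entries without a parseable URL
    }
  });
  // Return the first Gemini tab, or null when none is attached.
  return matches.length > 0 ? matches[0].targetId : null;
}

// Example with made-up tab data:
const exampleTabs = [
  { targetId: "A1", url: "https://example.com/?q=gemini.google.com" },
  { targetId: "B2", url: "https://gemini.google.com/app" },
];
console.log(findGeminiTargetId(exampleTabs));
```

Returning `null` rather than throwing lets the caller decide whether to prompt for re-attaching the extension.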

### 3. Input Query

```
browser action=act profile=chrome targetId=<id> request={
  "kind": "evaluate",
  "fn": "(() => { const editor = document.querySelector('div.ql-editor[contenteditable=\"true\"]'); if (!editor) return 'editor not found'; editor.focus(); editor.innerHTML = '<p>What is quantum computing?</p>'; editor.dispatchEvent(new Event('input', { bubbles: true })); return 'ok'; })()"
}
```
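Hand-editing the query into the `fn` string becomes fragile once the query contains quotes, angle brackets, or newlines. One possible way around this, sketched below, is to generate the `request` payload programmatically. Two things here are assumptions rather than part of the skill: the helper name `buildInputRequest` is hypothetical, and the sketch swaps `innerHTML` for a `<p>` element filled via `textContent`, which avoids HTML parsing but has not been verified against Gemini's editor.

```javascript
// Hypothetical helper: builds the JSON `request` payload for
// `browser action=act` so arbitrary query text is embedded safely.
function buildInputRequest(query) {
  // JSON.stringify produces a valid JavaScript string literal,
  // so quotes, backslashes, and newlines in the query are escaped.
  const literal = JSON.stringify(query);
  const fn =
    "(() => { " +
    "const editor = document.querySelector('div.ql-editor[contenteditable=\"true\"]'); " +
    "if (!editor) return 'editor not found'; " +
    "editor.focus(); " +
    // Assumption: textContent is used instead of innerHTML so the query
    // is never parsed as HTML.
    "const p = document.createElement('p'); " +
    "p.textContent = " + literal + "; " +
    "editor.replaceChildren(p); " +
    "editor.dispatchEvent(new Event('input', { bubbles: true })); " +
    "return 'ok'; })()";
  // The outer stringify escapes the function source for the JSON payload.
  return JSON.stringify({ kind: "evaluate", fn });
}

console.log(buildInputRequest("What is quantum computing?"));
```

The printed JSON can be passed as the `request=` value in place of the hand-written payload.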

### 4. Submit

```
browser action=act profile=chrome targetId=<id> request={"kind":"press","key":"Enter"}
```

### 5. Wait for Response

Gemini may take 10–60 seconds. Poll for completion by checking if the stop button has disappeared:

```
browser action=act profile=chrome targetId=<id> request={
  "kind": "evaluate",
  "fn": "(() => { const stop = document.querySelector('button[aria-label*=\"Stop\"]'); return stop ? 'generating' : 'done'; })()"
}
```
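A single `done` reading can be misleading if the stop button has not yet rendered right after submission, so it may help to require several consecutive `done` results and to cap the number of polls. The sketch below shows only that decision logic. The name `createCompletionTracker` and its options are hypothetical, and the caller is assumed to run the evaluate command above between sleeps and feed each result in.

```javascript
// Hypothetical helper: tracks poll results ('generating' or 'done') and
// reports when the response can be treated as complete.
function createCompletionTracker({ stableDone = 2, maxPolls = 60 } = {}) {
  let polls = 0;
  let doneStreak = 0;
  return function observe(status) {
    polls += 1;
    // Any non-'done' reading resets the streak.
    doneStreak = status === "done" ? doneStreak + 1 : 0;
    if (doneStreak >= stableDone) return "complete";
    if (polls >= maxPolls) return "timeout";
    return "waiting";
  };
}

// Example: feed in a sequence of poll results.
const observe = createCompletionTracker({ stableDone: 2, maxPolls: 10 });
for (const status of ["done", "generating", "generating", "done", "done"]) {
  console.log(status, "->", observe(status));
}
```

With a one-second interval between polls, `maxPolls: 60` roughly matches the upper end of the 10–60 second range mentioned above.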

### 6. Extract Response

**Option A — Clipboard (recommended, preserves Markdown formatting):**

```
# Take a snapshot and find the Copy button
browser action=snapshot profile=chrome targetId=<id>

# Click the Copy button by its ref from the snapshot
browser action=act profile=chrome targetId=<id> request={"kind":"click","ref":"<copy_button_ref>"}

# Read from clipboard (`pbpaste` is macOS-only)
pbpaste
```

**Option B — DOM extraction (fallback):**

```
browser action=act profile=chrome targetId=<id> request={
  "kind": "evaluate",
  "fn": "(() => { const msgs = document.querySelectorAll('.model-response-text'); if (msgs.length === 0) return 'no response found'; return msgs[msgs.length - 1].innerText; })()"
}
```

## New Chat

For unrelated queries, start a fresh chat to avoid context pollution:

```
browser action=navigate profile=chrome targetId=<id> targetUrl="https://gemini.google.com"
```

## Response Completion Signals

The response is complete when:
- The **stop button** disappears
- A **copy button** appears below the response
- **Suggested follow-up chips** appear

## Security Considerations

> **⚠️ Important**: Understand these risks before using this skill.

1. **Session access**: `profile=chrome` uses your real Chrome with all logged-in sessions. The agent can see and interact with anything in the attached tab, including your Google account context.
2. **JavaScript evaluation**: The `evaluate` action runs arbitrary JavaScript in the page context. This skill uses it only for DOM manipulation and inspection (filling the input field, checking the stop button, reading the response), but the mechanism itself is powerful.
3. **Manual attachment required**: The Browser Relay extension must be **manually clicked** by you to attach — the agent cannot auto-attach to arbitrary tabs. Only attach the specific Gemini tab.
4. **Loopback only**: The relay binds to `127.0.0.1` and requires an auth token, preventing remote access.
5. **Recommendation**: Use a **separate Chrome profile** dedicated to AI automation, logged into a non-primary Google account, to limit exposure.

## Troubleshooting

| Problem | Solution |
|---------|----------|
| `cdpReady: false` | Click the Browser Relay extension icon on the Gemini tab to re-attach |
| Tab not found | Run `browser action=tabs profile=chrome` to refresh tab list |
| Editor not found | Page may not be fully loaded; wait and retry. Gemini may have changed its DOM; check that `div.ql-editor` still exists |
| Copy button not found | Response may still be generating; poll stop button status first |
| Login wall | Ensure Chrome is logged into a Google account |
| Context overflow | Navigate to `gemini.google.com` for a fresh chat |
