gemma-gem-browser-ai

Build and extend Gemma Gem, an on-device AI browser assistant Chrome extension running Google's Gemma 4 model via WebGPU with no cloud dependencies.

22 stars

byAradotso

View on GitHub Installation ↓

Best use case

gemma-gem-browser-ai is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Build and extend Gemma Gem, an on-device AI browser assistant Chrome extension running Google's Gemma 4 model via WebGPU with no cloud dependencies.

Teams using gemma-gem-browser-ai should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/gemma-gem-browser-ai/SKILL.md --create-dirs "https://raw.githubusercontent.com/Aradotso/trending-skills/main/skills/gemma-gem-browser-ai/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/gemma-gem-browser-ai/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How gemma-gem-browser-ai Compares

Feature / Agent	gemma-gem-browser-ai	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Build and extend Gemma Gem, an on-device AI browser assistant Chrome extension running Google's Gemma 4 model via WebGPU with no cloud dependencies.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Gemma Gem Browser AI

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

Gemma Gem is a Chrome extension that runs Google's Gemma 4 model entirely on-device via WebGPU. It injects a chat overlay into every page and exposes a tool-calling agent loop that can read pages, click elements, fill forms, execute JavaScript, and take screenshots — all without sending data to any server.

## Architecture Overview

```
Offscreen Document          Service Worker           Content Script
(Gemma 4 + Agent Loop)  <-> (Message Router)    <-> (Chat UI + DOM Tools)
       |                         |
  WebGPU inference          Screenshot capture
  Token streaming           JS execution
```

- **Offscreen document** (`offscreen/`): Loads the ONNX model via `@huggingface/transformers`, runs the agent loop, streams tokens.
- **Service worker** (`background/`): Routes messages, handles `take_screenshot` and `run_javascript`.
- **Content script** (`content/`): Injects shadow DOM chat UI, executes DOM tools.
- **`agent/`**: Zero-dependency module defining `ModelBackend` and `ToolExecutor` interfaces — extractable as a standalone library.

## Install & Build

```bash
# Prerequisites: Node.js 18+, pnpm
pnpm install

# Development build (logging active, source maps)
pnpm build

# Production build (errors only, minified)
pnpm build:prod
```

Load the extension:
1. Open `chrome://extensions`
2. Enable **Developer mode**
3. Click **Load unpacked** → select `.output/chrome-mv3-dev/`

**Model download** happens automatically on first chat open:
- `onnx-community/gemma-4-E2B-it-ONNX` — ~500 MB (default)
- `onnx-community/gemma-4-E4B-it-ONNX` — ~1.5 GB

Models are cached in the browser's cache storage after the first run.

## Key Interfaces (`agent/`)

### ModelBackend

```typescript
// agent/types.ts
export interface ModelBackend {
  generate(
    messages: ChatMessage[],
    tools: ToolDefinition[],
    options: GenerateOptions
  ): AsyncGenerator<StreamChunk>;
}

export interface ToolDefinition {
  name: string;
  description: string;
  parameters: JSONSchema;
}

export interface GenerateOptions {
  maxNewTokens?: number;
  thinking?: boolean;
}
```

### ToolExecutor

```typescript
// agent/types.ts
export interface ToolExecutor {
  execute(toolName: string, args: Record<string, unknown>): Promise<unknown>;
}
```

### Agent Loop

```typescript
// agent/loop.ts — simplified illustration
export async function* runAgentLoop(
  userMessage: string,
  history: ChatMessage[],
  model: ModelBackend,
  tools: ToolExecutor,
  toolDefs: ToolDefinition[],
  maxIterations: number
): AsyncGenerator<AgentEvent> {
  const messages = [...history, { role: "user", content: userMessage }];

  for (let i = 0; i < maxIterations; i++) {
    for await (const chunk of model.generate(messages, toolDefs, {})) {
      if (chunk.type === "token") yield { type: "token", token: chunk.token };
      if (chunk.type === "tool_call") {
        yield { type: "tool_start", name: chunk.name };
        const result = await tools.execute(chunk.name, chunk.args);
        yield { type: "tool_result", name: chunk.name, result };
        messages.push({ role: "tool", name: chunk.name, content: String(result) });
      }
      if (chunk.type === "done") return;
    }
  }
}
```

## Built-in Tools

| Tool | Location | Description |
|------|----------|-------------|
| `read_page_content` | Content script | Read page text/HTML or a CSS selector |
| `take_screenshot` | Service worker | Capture visible tab as PNG |
| `click_element` | Content script | Click by CSS selector |
| `type_text` | Content script | Type into input by CSS selector |
| `scroll_page` | Content script | Scroll by pixel amount |
| `run_javascript` | Service worker | Execute JS in page context |

## Adding a New Tool

Tools live in two places: the **definition** (in the offscreen agent) and the **executor** (in content script or service worker).

### Step 1 — Define the tool schema

```typescript
// offscreen/tools/definitions.ts
export const MY_TOOL_DEFINITION: ToolDefinition = {
  name: "get_page_title",
  description: "Returns the document title of the current page.",
  parameters: {
    type: "object",
    properties: {},
    required: [],
  },
};
```

### Step 2 — Register in the tool list

```typescript
// offscreen/tools/index.ts
import { MY_TOOL_DEFINITION } from "./definitions";

export const ALL_TOOLS: ToolDefinition[] = [
  // ...existing tools
  MY_TOOL_DEFINITION,
];
```

### Step 3 — Implement execution in the content script

```typescript
// content/tools/executor.ts
export async function executeContentTool(
  name: string,
  args: Record<string, unknown>
): Promise<unknown> {
  switch (name) {
    case "get_page_title":
      return document.title;

    case "read_page_content": {
      const selector = args.selector as string | undefined;
      if (selector) {
        return document.querySelector(selector)?.textContent ?? "Not found";
      }
      return document.body.innerText;
    }

    case "click_element": {
      const el = document.querySelector(args.selector as string) as HTMLElement;
      if (!el) throw new Error(`Element not found: ${args.selector}`);
      el.click();
      return "clicked";
    }

    case "type_text": {
      const input = document.querySelector(args.selector as string) as HTMLInputElement;
      if (!input) throw new Error(`Input not found: ${args.selector}`);
      input.focus();
      input.value = args.text as string;
      input.dispatchEvent(new Event("input", { bubbles: true }));
      input.dispatchEvent(new Event("change", { bubbles: true }));
      return "typed";
    }

    default:
      throw new Error(`Unknown content tool: ${name}`);
  }
}
```

### Step 4 — Handle service-worker-side tools

```typescript
// background/tools.ts
export async function executeSwTool(
  name: string,
  args: Record<string, unknown>,
  tabId: number
): Promise<unknown> {
  switch (name) {
    case "take_screenshot": {
      const dataUrl = await chrome.tabs.captureVisibleTab({ format: "png" });
      return dataUrl;
    }

    case "run_javascript": {
      const results = await chrome.scripting.executeScript({
        target: { tabId },
        func: new Function(args.code as string) as () => unknown,
      });
      return results[0]?.result ?? null;
    }

    default:
      return null; // not a SW tool — forward to content script
  }
}
```

## Message Routing Pattern

The service worker acts as a message bus. All communication uses `chrome.runtime.sendMessage`.

```typescript
// Message types (shared/messages.ts)
export type ExtMessage =
  | { type: "TOOL_CALL"; name: string; args: Record<string, unknown>; tabId: number }
  | { type: "TOOL_RESULT"; name: string; result: unknown }
  | { type: "TOKEN"; token: string }
  | { type: "AGENT_DONE" }
  | { type: "AGENT_ERROR"; error: string };

// Offscreen → SW
chrome.runtime.sendMessage<ExtMessage>({
  type: "TOOL_CALL",
  name: "click_element",
  args: { selector: "#submit-btn" },
  tabId: currentTabId,
});

// SW → Content script
chrome.tabs.sendMessage<ExtMessage>(tabId, {
  type: "TOOL_CALL",
  name: "click_element",
  args: { selector: "#submit-btn" },
  tabId,
});
```

## Model Configuration

```typescript
// offscreen/model.ts — loading with transformers.js
import { pipeline, TextGenerationPipeline } from "@huggingface/transformers";

const MODEL_IDS = {
  E2B: "onnx-community/gemma-4-E2B-it-ONNX",
  E4B: "onnx-community/gemma-4-E4B-it-ONNX",
} as const;

export type ModelSize = keyof typeof MODEL_IDS;

export async function loadModel(
  size: ModelSize,
  onProgress: (progress: number) => void
): Promise<TextGenerationPipeline> {
  return pipeline("text-generation", MODEL_IDS[size], {
    dtype: "q4f16",
    device: "webgpu",
    progress_callback: (p: { progress: number }) => onProgress(p.progress),
  });
}
```

## Settings & Persistence

Settings are stored via `chrome.storage.sync`:

```typescript
export interface GemmaGemSettings {
  modelSize: "E2B" | "E4B";
  thinking: boolean;
  maxIterations: number;
  disabledHosts: string[];
}

const DEFAULT_SETTINGS: GemmaGemSettings = {
  modelSize: "E2B",
  thinking: false,
  maxIterations: 10,
  disabledHosts: [],
};

export async function getSettings(): Promise<GemmaGemSettings> {
  const stored = await chrome.storage.sync.get("settings");
  return { ...DEFAULT_SETTINGS, ...(stored.settings ?? {}) };
}

export async function saveSettings(patch: Partial<GemmaGemSettings>): Promise<void> {
  const current = await getSettings();
  await chrome.storage.sync.set({ settings: { ...current, ...patch } });
}

// Disable extension on current host
async function disableOnCurrentSite() {
  const host = new URL(location.href).hostname;
  const settings = await getSettings();
  if (!settings.disabledHosts.includes(host)) {
    await saveSettings({ disabledHosts: [...settings.disabledHosts, host] });
  }
}
```

## Shadow DOM Chat UI Pattern

The content script injects a shadow DOM to isolate styles:

```typescript
// content/ui.ts
export function injectChatOverlay(): ShadowRoot {
  const host = document.createElement("div");
  host.id = "gemma-gem-host";
  // Prevent page styles from leaking in
  const shadow = host.attachShadow({ mode: "closed" });

  // Inject styles
  const style = document.createElement("style");
  style.textContent = CHAT_STYLES; // imported CSS string
  shadow.appendChild(style);

  // Inject chat container
  const container = document.createElement("div");
  container.id = "gemma-gem-container";
  shadow.appendChild(container);

  document.body.appendChild(host);
  return shadow;
}
```

## Debugging

All logs use `[Gemma Gem]` prefix. Development builds log info/debug/warn; production only logs errors.

```
# Service worker logs
chrome://extensions → Gemma Gem → "Inspect views: service worker"

# Offscreen document (most useful: model loading, prompts, tool calls)
chrome://extensions → Gemma Gem → "Inspect views: offscreen.html"

# Content script logs
DevTools on any page → Console (filter: [Gemma Gem])

# All extension contexts
chrome://inspect#other
```

Key things to check in offscreen document logs:
- Model download progress
- Full prompt construction
- Token counts per turn
- Raw model output (before tool call parsing)
- Tool execution results

## Common Patterns & Gotchas

**WebGPU availability check:**
```typescript
if (!navigator.gpu) {
  throw new Error("WebGPU not supported. Use Chrome 113+ with hardware acceleration enabled.");
}
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) throw new Error("No WebGPU adapter found.");
```

**Offscreen document lifecycle** — Chrome may suspend the offscreen document. Ping it before sending messages:
```typescript
async function ensureOffscreen() {
  const existing = await chrome.offscreen.hasDocument();
  if (!existing) {
    await chrome.offscreen.createDocument({
      url: "offscreen.html",
      reasons: [chrome.offscreen.Reason.WORKERS],
      justification: "Run Gemma 4 inference via WebGPU",
    });
  }
}
```

**Context window management** — Gemma 4 supports 128K tokens but inference slows with long contexts. Clear history per-page with `clear_context` or limit stored turns:
```typescript
const MAX_HISTORY_TURNS = 20;
function trimHistory(messages: ChatMessage[]): ChatMessage[] {
  if (messages.length <= MAX_HISTORY_TURNS * 2) return messages;
  return messages.slice(-MAX_HISTORY_TURNS * 2);
}
```

**Tool call parsing** — Gemma 4 emits tool calls in a structured format. If adding custom parsing, guard against partial/streamed JSON:
```typescript
function safeParseToolCall(raw: string): { name: string; args: Record<string, unknown> } | null {
  try {
    return JSON.parse(raw);
  } catch {
    return null; // still streaming
  }
}
```

**CSS selector safety for DOM tools:**
```typescript
function safeQuerySelector(selector: string): Element | null {
  try {
    return document.querySelector(selector);
  } catch {
    return null; // invalid selector from model
  }
}
```

Related Skills

lightpanda-browser

from Aradotso/trending-skills

Expert skill for Lightpanda — the headless browser built in Zig for AI agents and automation. 9x less memory, 11x faster than Chrome. Installation, CLI, CDP server, Playwright/Puppeteer integration, and web scraping.

gemma-tuner-multimodal

from Aradotso/trending-skills

Fine-tune Gemma 4 and 3n models with audio, images, and text on Apple Silicon using PyTorch and Metal Performance Shaders.

chrome-cdp-live-browser

from Aradotso/trending-skills

Give AI agents access to your live Chrome session via CDP — interact with open tabs, logged-in accounts, and current page state

agent-browser-automation

from Aradotso/trending-skills

Headless browser automation CLI for AI agents using native Rust binary with Chrome DevTools Protocol

```markdown

from Aradotso/trending-skills

---

zeroboot-vm-sandbox

from Aradotso/trending-skills

Sub-millisecond VM sandboxes for AI agents using copy-on-write KVM forking via Zeroboot

yourvpndead-vpn-detection

from Aradotso/trending-skills

Android app that detects VPN/proxy servers (VLESS/xray/sing-box) via local SOCKS5 vulnerability, exposing exit IPs and server configs without root

xata-postgres-platform

from Aradotso/trending-skills

Expert skill for Xata open-source cloud-native Postgres platform with copy-on-write branching, scale-to-zero, and Kubernetes deployment

x-mentor-skill-nuwa

from Aradotso/trending-skills

AI-powered X (Twitter) content strategy skill that distills methodologies from 6 top creators + open-source algorithm data into actionable writing, growth, and monetization guidance.

wx-favorites-report

from Aradotso/trending-skills

End-to-end pipeline to extract, decrypt, and visualize WeChat Mac favorites from encrypted SQLite DB into an interactive HTML report.

wterm-web-terminal

from Aradotso/trending-skills

Web terminal emulator with Zig/WASM core, DOM rendering, and React/vanilla JS bindings

worldmonitor-intelligence-dashboard

from Aradotso/trending-skills

Real-time global intelligence dashboard with AI-powered news aggregation, geopolitical monitoring, and infrastructure tracking