Stagehand — AI Browser Automation in Natural Language

You are an expert in Stagehand by BrowserBase, the AI-powered browser automation framework that lets you control web pages using natural language instructions. You help developers build web automations that act, extract data, and observe pages using plain English commands instead of brittle CSS selectors — powered by GPT-4o or Claude for visual understanding of page layouts.

by ComeOnOliver

Best use case

Stagehand — AI Browser Automation in Natural Language is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using this skill should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/stagehand/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/TerminalSkills/skills/stagehand/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/stagehand/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How Stagehand — AI Browser Automation in Natural Language Compares

Feature / Agent	Stagehand — AI Browser Automation in Natural Language	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

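What does this skill do?

It helps an AI coding agent build Stagehand web automations that act on pages, extract data, and observe elements using plain English commands instead of brittle CSS selectors.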

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Stagehand — AI Browser Automation in Natural Language

You are an expert in Stagehand by BrowserBase, the AI-powered browser automation framework that lets you control web pages using natural language instructions. You help developers build web automations that act, extract data, and observe pages using plain English commands instead of brittle CSS selectors — powered by GPT-4o or Claude for visual understanding of page layouts.

## Core Capabilities

### Setup and Basic Actions

```typescript
import { Stagehand } from "@browserbasehq/stagehand";

const stagehand = new Stagehand({
  env: "LOCAL",                           // "LOCAL" for Playwright, "BROWSERBASE" for cloud
  modelName: "gpt-4o",
  modelClientOptions: { apiKey: process.env.OPENAI_API_KEY },
  enableCaching: true,                    // Cache AI decisions for repeated patterns
});

await stagehand.init();
const page = stagehand.page;              // Standard Playwright page object

// Navigate
await page.goto("https://app.example.com");

// Act — natural language browser control
await stagehand.act({ action: "Click the sign-in button" });
await stagehand.act({ action: "Type 'user@example.com' into the email field" });
await stagehand.act({ action: "Select 'Enterprise' from the plan dropdown" });
await stagehand.act({ action: "Scroll down to the pricing section" });

// Act with variables — keep credentials out of prompts
await stagehand.act({
  action: "Log in with username %user% and password %pass%",
  variables: {
    user: process.env.USERNAME!,
    pass: process.env.PASSWORD!,
  },
});
```
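
The `%user%` and `%pass%` placeholders above only help if every placeholder has a matching entry in `variables`. The following is a minimal sketch of a pre-flight check, assuming the `%name%` placeholder pattern shown in the example. `placeholdersIn` and `missingVariables` are hypothetical helpers written for this illustration; they are not part of Stagehand.

```typescript
// Hypothetical helpers for illustration; they are not part of the Stagehand API.

// Collect the distinct %name% placeholders that appear in an action string.
function placeholdersIn(action: string): string[] {
  const pattern = /%(\w+)%/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(action)) !== null) {
    if (names.indexOf(match[1]) === -1) {
      names.push(match[1]);
    }
  }
  return names;
}

// Return the placeholders that have no non-empty value in `variables`.
function missingVariables(
  action: string,
  variables: Record<string, string | undefined>,
): string[] {
  return placeholdersIn(action).filter((name) => !variables[name]);
}

const loginAction = "Log in with username %user% and password %pass%";
// Sample values only; in real use these would come from the environment.
const missing = missingVariables(loginAction, { user: "alice", pass: undefined });
if (missing.length > 0) {
  console.log(`Refusing to act, missing variables: ${missing.join(", ")}`);
}
```

Running a check like this before `act()` surfaces an unset environment variable as a clear error instead of a confusing login failure.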

### Extract Structured Data

```typescript
import { z } from "zod";

// Extract structured data from any page.
// Stagehand's documented examples pass a Zod schema to extract().
const products = await stagehand.extract({
  instruction: "Extract all product listings with name, price, rating, and availability",
  schema: z.object({
    products: z.array(
      z.object({
        name: z.string(),
        price: z.number(),
        rating: z.number().optional(),
        inStock: z.boolean().optional(),  // availability
      }),
    ),
  }),
});

// Illustrative invoice schema; the field names are assumptions for this example
const invoiceSchema = z.object({
  invoiceNumber: z.string(),
  date: z.string(),
  lineItems: z.array(
    z.object({
      description: z.string(),
      quantity: z.number(),
      amount: z.number(),
    }),
  ),
  total: z.number(),
});

// Extract from complex pages (tables, nested layouts)
const invoiceData = await stagehand.extract({
  instruction: "Extract the invoice number, date, line items with quantities and amounts, and the total",
  schema: invoiceSchema,
});
```
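
Extracted data comes from a model, so a caller may want to check its shape before using it. The sketch below is one defensive approach in plain TypeScript. The `Product` interface mirrors the fields of the product example above, and `isProduct` and `validProducts` are hypothetical helpers, not Stagehand functions.

```typescript
// Shape mirrored from the product example above; this interface is an
// assumption made for the sketch, not a type exported by Stagehand.
interface Product {
  name: string;
  price: number;
  rating?: number;
  inStock?: boolean;
}

// Runtime type guard: true only when the required fields have the right types.
function isProduct(value: unknown): value is Product {
  if (typeof value !== "object" || value === null) return false;
  const item = value as Record<string, unknown>;
  const name = item["name"];
  const price = item["price"];
  const rating = item["rating"];
  const inStock = item["inStock"];
  return (
    typeof name === "string" &&
    typeof price === "number" &&
    isFinite(price) &&
    (rating === undefined || typeof rating === "number") &&
    (inStock === undefined || typeof inStock === "boolean")
  );
}

// Keep only well-formed entries from an untrusted `{ products: [...] }` result.
function validProducts(result: unknown): Product[] {
  if (typeof result !== "object" || result === null) return [];
  const list = (result as { products?: unknown }).products;
  return Array.isArray(list) ? list.filter(isProduct) : [];
}

// Sample data standing in for an extraction result.
const sample = { products: [{ name: "Widget", price: 9.5 }, { name: "Broken", price: "n/a" }] };
console.log(validProducts(sample).map((p) => p.name));
```

Dropping malformed entries is only one policy; a stricter pipeline might throw instead so that bad extractions are noticed early.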

### Observe — Find Elements Without Acting

```typescript
// Observe returns possible actions without performing them
const actions = await stagehand.observe({
  instruction: "Find all clickable navigation items",
});
// Returns: [{description: "Home link", selector: "xpath=...", ...}, ...]

// Use observe for conditional logic
const buttons = await stagehand.observe({
  instruction: "Find the 'Accept cookies' button if it exists",
});
if (buttons.length > 0) {
  await stagehand.act({ action: "Dismiss the cookie popup" });
}
```
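
Checking `buttons.length > 0` treats any observed result as a match. When the instruction is broad, it can help to inspect the returned descriptions first. The sketch below assumes the result objects carry `description` and `selector` strings, as the comment in the example above indicates. `findObserved` is a hypothetical helper and the selectors are made-up sample data.

```typescript
// Assumed result shape, taken from the comment in the example above.
interface ObservedAction {
  description: string;
  selector: string;
}

// Return the first observed action whose description mentions every keyword.
function findObserved(
  results: ObservedAction[],
  keywords: string[],
): ObservedAction | undefined {
  const wanted = keywords.map((keyword) => keyword.toLowerCase());
  for (const result of results) {
    const text = result.description.toLowerCase();
    if (wanted.every((keyword) => text.indexOf(keyword) !== -1)) {
      return result;
    }
  }
  return undefined;
}

// Made-up sample data standing in for the output of observe().
const observed: ObservedAction[] = [
  { description: "Home link", selector: "xpath=/html/body/nav/a[1]" },
  { description: "Accept cookies button", selector: "xpath=/html/body/div[2]/button" },
];

const cookieButton = findObserved(observed, ["accept", "cookies"]);
console.log(
  cookieButton
    ? `Would dismiss via ${cookieButton.selector}`
    : "No cookie banner observed",
);
```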

### Cloud Execution with BrowserBase

```typescript
// Run in cloud for parallel, scalable automation
const stagehand = new Stagehand({
  env: "BROWSERBASE",                     // Cloud-hosted browser
  modelName: "gpt-4o",
  browserbaseSessionCreateParams: {
    projectId: process.env.BROWSERBASE_PROJECT_ID!,
    proxies: true,                        // Residential proxy
  },
});
```
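
Parallel cloud runs need the work divided among sessions. One simple way, sketched below, is to deal items round-robin into one batch per session. `splitIntoBatches` is a hypothetical helper for this illustration and does not call Stagehand or BrowserBase; each batch would then be handed to its own Stagehand instance.

```typescript
// Hypothetical helper: deal items round-robin into at most `sessionCount`
// batches, one batch per cloud browser session.
function splitIntoBatches<T>(items: T[], sessionCount: number): T[][] {
  if (Math.floor(sessionCount) !== sessionCount || sessionCount < 1) {
    throw new RangeError("sessionCount must be a positive integer");
  }
  // Never create more batches than there are items.
  const count = Math.min(sessionCount, items.length);
  const batches: T[][] = [];
  for (let i = 0; i < count; i++) {
    batches.push([]);
  }
  items.forEach((item, index) => {
    batches[index % count].push(item);
  });
  return batches;
}

// Placeholder URLs for illustration.
const urls = [
  "https://app.example.com/a",
  "https://app.example.com/b",
  "https://app.example.com/c",
];
console.log(splitIntoBatches(urls, 2));
```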

## Installation

```bash
npm install @browserbasehq/stagehand
# Requires: OPENAI_API_KEY or ANTHROPIC_API_KEY
# Optional: BROWSERBASE_API_KEY + BROWSERBASE_PROJECT_ID for cloud
```
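
The environment variables listed above can be checked before a Stagehand instance is created. This is a minimal sketch, assuming the variable names from the comments above. `missingEnvVars` is a hypothetical helper, and the environment is passed in as a plain object so the function stays easy to test.

```typescript
type StagehandEnv = "LOCAL" | "BROWSERBASE";

// Hypothetical helper: report which required variables are absent for a target.
function missingEnvVars(
  env: Record<string, string | undefined>,
  target: StagehandEnv,
): string[] {
  const missing: string[] = [];
  // One model provider key is enough.
  if (!env["OPENAI_API_KEY"] && !env["ANTHROPIC_API_KEY"]) {
    missing.push("OPENAI_API_KEY or ANTHROPIC_API_KEY");
  }
  // Cloud execution additionally needs both BrowserBase values.
  if (target === "BROWSERBASE") {
    for (const name of ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"]) {
      if (!env[name]) {
        missing.push(name);
      }
    }
  }
  return missing;
}

// Placeholder value for illustration; a real script would pass process.env.
const problems = missingEnvVars({ OPENAI_API_KEY: "sk-placeholder" }, "BROWSERBASE");
console.log(problems);
```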

## Best Practices

1. **Natural language for dynamic pages**: Use `act()` for pages that change layout frequently; CSS selectors break, natural language adapts
2. **Variables for secrets**: Never put credentials in action strings; use the `variables` parameter
3. **Enable caching**: Set `enableCaching: true` so identical actions reuse earlier AI decisions instead of triggering new model calls, which lowers cost on repeated runs
4. **Combine with Playwright**: Use `stagehand.page` for stable interactions (login forms) and `stagehand.act()` for dynamic ones
5. **Schema for extraction**: Always provide a Zod/JSON schema to `extract()`; structured output is more reliable than free text
6. **Observe before acting**: Use `observe()` to check whether elements exist before acting; this prevents errors on conditional UI
7. **BrowserBase for scale**: Use cloud browsers for parallel automation; local is fine for development and testing
8. **Model selection**: As a rule of thumb, GPT-4o for speed and Claude for complex visual reasoning; both work well for most tasks
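
The caching advice can be illustrated with a small memo cache. The sketch below shows only the general idea, that an identical instruction does not need a second model call. It is not a description of how Stagehand implements `enableCaching`; `DecisionCache` and the stub model function are hypothetical.

```typescript
// Illustration only: a tiny memo cache keyed by the trimmed instruction.
class DecisionCache {
  private decisions: Record<string, string> = {};
  public modelCalls = 0;

  // Return the cached decision, or ask the model once and remember the answer.
  resolve(instruction: string, askModel: (instruction: string) => string): string {
    const key = instruction.trim();
    if (Object.prototype.hasOwnProperty.call(this.decisions, key)) {
      return this.decisions[key];
    }
    this.modelCalls += 1;
    const decision = askModel(key);
    this.decisions[key] = decision;
    return decision;
  }
}

// Stub standing in for a real model call.
const stubModel = (instruction: string): string => `decision for "${instruction}"`;

const cache = new DecisionCache();
cache.resolve("Click the sign-in button", stubModel);
cache.resolve("Click the sign-in button", stubModel);
console.log(`Model calls made: ${cache.modelCalls}`);
```

A cache like this is only safe while the page stays the same; a cached decision for a layout that has since changed would need to be invalidated.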
