selenium-browser
Start a Selenium‑controlled Chrome browser, open a URL, take a screenshot, and report progress. Supports headless mode and optional proxy.
Best use case
selenium-browser is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using selenium-browser can expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download `SKILL.md` from GitHub
- Place it in `.claude/skills/selenium-browser/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
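The first two manual steps amount to copying one file into a fixed folder. Below is a minimal Python sketch, assuming you have already downloaded `SKILL.md` locally. `install_skill` is a hypothetical helper, not part of the skill, and restarting the agent remains a manual step.

```python
import shutil
from pathlib import Path


def install_skill(skill_md, project_root="."):
    """Copy SKILL.md to <project_root>/.claude/skills/selenium-browser/SKILL.md (hypothetical helper)."""
    dest = Path(project_root) / ".claude" / "skills" / "selenium-browser" / "SKILL.md"
    # Create the skill folder (and any missing parents) if it does not exist yet
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(skill_md, dest)
    return dest
```

For example, `install_skill("~/Downloads/SKILL.md".replace("~", str(Path.home())))` run from the project root places the file where the agent looks for it.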
How selenium-browser Compares
| Feature / Agent | selenium-browser | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Start a Selenium‑controlled Chrome browser, open a URL, take a screenshot, and report progress. Supports headless mode and optional proxy.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agents for Marketing
Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.
AI Agents for Startups
Explore AI agent skills for startup validation, product research, growth experiments, documentation, and fast execution with small teams.
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
SKILL.md Source
## Usage
The skill triggers on any message that contains *Chrome*, *browser*, *Selenium*, *screenshot*, or *open*.
```bash
selenium-browser <URL> [--headless] [--proxy=<url>]
```
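How the host agent matches trigger words is not documented here. The following is a hypothetical sketch of the keyword trigger described above; `should_trigger` is an illustrative name, and whole-word, case-insensitive matching (with an optional plural "s") is an assumption. Literal substring matching, as the word "contains" might suggest, would also fire on words such as "OpenAPI".

```python
import re

# Trigger keywords listed in the Usage section
TRIGGER_WORDS = ("chrome", "browser", "selenium", "screenshot", "open")
# Whole words only, case-insensitive, optional trailing "s" (assumed matching rule)
TRIGGER_RE = re.compile(r"\b(" + "|".join(TRIGGER_WORDS) + r")s?\b", re.IGNORECASE)


def should_trigger(message):
    """Return True if the message mentions any trigger keyword as a whole word."""
    return TRIGGER_RE.search(message) is not None
```

With this rule, "Please open https://example.com" triggers the skill, while "Summarise the OpenAPI spec" does not.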
### Command flow
1. **Launch** Chrome (or Chromium) under Selenium.
2. **Navigate** to `<URL>`.
3. **Take a screenshot** of the loaded page.
4. **Save** the image in `/home/main/clawd/diffusion_pdfs/` and **report** the path back to the chat.
5. If anything fails, send an **error message**.
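The usage line maps directly onto the arguments of `scripts/launch_browser.py`. Below is a minimal sketch of that mapping, assuming the agent runs the script with `python3` from the skill directory (substitute your virtual environment's interpreter if needed). `build_command` is a hypothetical helper, not part of the skill.

```python
def build_command(url, headless=False, proxy=None, script="scripts/launch_browser.py"):
    """Map `selenium-browser <URL> [--headless] [--proxy=<url>]` to an argv list."""
    argv = ["python3", script, url]
    if headless:
        argv.append("--headless")
    if proxy:
        # argparse accepts both "--proxy=<url>" and "--proxy <url>"
        argv.append(f"--proxy={proxy}")
    return argv
```

Returning a list rather than a shell string lets `subprocess.run(build_command(...))` pass URLs containing `&` or `?` to the script without any shell quoting.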
## Scripts
### scripts/launch_browser.py
```python
#!/usr/bin/env python3
"""Launch Selenium-controlled Chrome, open a URL, save a screenshot, print a JSON status."""
import argparse
import json
import os
import sys
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# Directory the skill reports screenshots from (see "Command flow", step 4)
SCREENSHOT_DIR = "/home/main/clawd/diffusion_pdfs"

# CLI parsing
parser = argparse.ArgumentParser(description="Launch Selenium Chrome and take a screenshot.")
parser.add_argument("url", help="URL to open")
parser.add_argument("--headless", action="store_true", help="Run Chrome headless")
parser.add_argument("--proxy", help="Proxy URL (e.g., http://proxy:3128)")
args = parser.parse_args()

# Prepare Chrome options
chrome_options = Options()
if args.headless:
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--disable-gpu")
# Commonly required when Chrome runs inside containers or as root
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
if args.proxy:
    chrome_options.add_argument(f"--proxy-server={args.proxy}")

# Locate binaries (override via CHROME_BIN / CHROMEDRIVER_PATH, see scripts/_env.sh)
chrome_bin = os.getenv("CHROME_BIN", "/usr/bin/google-chrome")
chromedriver_path = os.getenv("CHROMEDRIVER_PATH", "/usr/local/bin/chromedriver")
chrome_options.binary_location = chrome_bin  # tell Selenium which Chrome/Chromium to run
service = Service(executable_path=chromedriver_path)

# Start browser
try:
    driver = webdriver.Chrome(service=service, options=chrome_options)
except Exception as e:
    print(f"❌ Failed to start Chrome: {e}", file=sys.stderr)
    sys.exit(1)

try:
    # Navigate and wait for page load
    try:
        driver.get(args.url)
        time.sleep(5)  # simple wait; can replace with WebDriverWait for better reliability
    except Exception as e:
        print(f"❌ Navigation error: {e}", file=sys.stderr)
        sys.exit(1)

    # Take screenshot; save_screenshot() returns False (rather than raising) if the file can't be written
    screenshot_path = os.path.join(SCREENSHOT_DIR, "screenshot.png")
    try:
        os.makedirs(SCREENSHOT_DIR, exist_ok=True)
        if not driver.save_screenshot(screenshot_path):
            raise OSError(f"could not write {screenshot_path}")
    except Exception as e:
        print(f"❌ Screenshot error: {e}", file=sys.stderr)
        sys.exit(1)
finally:
    # Clean up: always close the browser, including on the error paths above
    driver.quit()

# Output a JSON object that OpenClaw can parse for the reply
print(json.dumps({"status": "ok", "screenshot": screenshot_path}))
```
### scripts/_env.sh
```bash
# Optional: set paths to Chrome/Chromedriver if not in standard locations
# export CHROME_BIN="/opt/google/chrome/google-chrome"
# export CHROMEDRIVER_PATH="/usr/local/bin/chromedriver"
```
## References
- [Selenium docs](https://www.selenium.dev/documentation/)
- [ChromeDriver download page](https://chromedriver.chromium.org/downloads)
## How the skill reports
The skill runs the Python script and captures its stdout as a JSON payload. OpenClaw parses the JSON and sends a message back:
```
✅ Screenshot saved: /home/main/clawd/diffusion_pdfs/screenshot.png
```
If any step fails, the script prints an error message (prefixed with ❌) to stderr and exits with status 1; the skill forwards that text to the chat.
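OpenClaw's own parsing code is not shown in this skill. The following is a hypothetical sketch of the reporting step: `format_reply` and `run_skill` are illustrative names, and the sketch assumes stdout holds a JSON document such as `{"status": "ok", "screenshot": "..."}`. A Python dict printed with plain `print()` uses single quotes and would not parse as JSON, so that case falls back to an error reply.

```python
import json
import subprocess


def format_reply(stdout, stderr, returncode):
    """Turn the script's output into the chat message described above."""
    if returncode == 0:
        try:
            payload = json.loads(stdout)
        except json.JSONDecodeError:
            return f"❌ Unexpected output: {stdout.strip()}"
        if payload.get("status") == "ok":
            return f"✅ Screenshot saved: {payload['screenshot']}"
    # Non-zero exit or non-ok status: forward whatever the script reported
    return stderr.strip() or stdout.strip() or "❌ Unknown error"


def run_skill(argv):
    """Run the script (argv as a list) and return the chat reply."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return format_reply(result.stdout, result.stderr, result.returncode)
```

Keeping the formatting in a pure function separate from `subprocess.run` makes the reply logic easy to check without launching a browser.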
---
## Installation notes
1. Make sure `chromedriver` is at `/usr/local/bin/chromedriver`, or set `CHROMEDRIVER_PATH`. Its major version must match your installed Chrome.
2. Make sure Google Chrome is at `/usr/bin/google-chrome`, or set `CHROME_BIN` to your Chrome or Chromium binary.
3. Install the Python dependency (`selenium`) inside the virtual environment you use for the skill:
```bash
pip install selenium
```
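The first two installation notes can be checked before the first run. Below is a minimal sketch, assuming the same defaults and environment variables the script uses. `missing_binaries` is a hypothetical helper that only checks that each path is an executable file; it does not verify that the Chrome and ChromeDriver versions match.

```python
import os

# Default locations from the installation notes; CHROME_BIN / CHROMEDRIVER_PATH override them
DEFAULT_PATHS = {
    "CHROME_BIN": "/usr/bin/google-chrome",
    "CHROMEDRIVER_PATH": "/usr/local/bin/chromedriver",
}


def missing_binaries(env=None):
    """Return {variable: path} for each binary that is missing or not executable."""
    env = os.environ if env is None else env
    missing = {}
    for var, default in DEFAULT_PATHS.items():
        path = env.get(var, default)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing[var] = path
    return missing
```

Calling `missing_binaries()` with no arguments checks the current environment; an empty dict means both binaries were found.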
---
## Logging & timeouts
The script uses a 5‑second static wait after navigation; replace with Selenium's `WebDriverWait` for dynamic waits.
If you encounter timeouts, adjust the `time.sleep(5)` value or use `WebDriverWait(driver, 20).until(...)`.
---
Feel free to tweak the script to fit your environment (proxy, authentication, etc.).
Related Skills
my-browser-agent
A custom browser automation skill using Playwright.
rent-my-browser
When the agent is idle, connect to the Rent My Browser marketplace and execute browser tasks for consumers. Earn money by renting out the node's browser during downtime. Supports headless (Playwright) on VPS nodes and real Chrome on GUI machines.
browser-cdp
Real Chrome browser automation via CDP Proxy — access pages with full user login state, bypass anti-bot detection, perform interactive operations (click/fill/scroll), extract dynamic JavaScript-rendered content, take screenshots. Triggers (satisfy ANY one): - Target URL is a search results page (Bing/Google/YouTube search) - Static fetch (agent-reach/WebFetch) is blocked by anti-bot (captcha/intercept/empty) - Need to read logged-in user's private content - YouTube, Twitter/X, Xiaohongshu, WeChat public accounts, etc. - Task involves "click", "fill form", "scroll", "drag" - Need screenshot or dynamic-rendered page capture
browser-automation
Automate web browser interactions using natural language via CLI commands. It also provides access to 50+ models for image generation, video generation, text-to-speech, speech-to-text, music, chat, web search, document parsing, email, and SMS.
Agent Browser Skill
stealth-browser
Anti-detection web browsing that bypasses bot detection, CAPTCHAs, and IP blocks using puppeteer-extra with stealth plugin and optional residential proxy support. Use when (1) websites block headless browsers or datacenter IPs, (2) need to bypass Cloudflare/Vercel protection, (3) accessing sites that detect automation (Reddit, Twitter/X, signup flows), (4) scraping protected content, or (5) automating web tasks that require human-like behavior.
agent-browser-zh
A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured commands. (Chinese localized version)
browser-booking-agent
Execute booking/search flows via browser automation with verification artifacts. Use for reservation forms, availability checks, and capture of proof (screenshots/confirmation IDs).
Agent Browser
Headless browser automation CLI optimized for AI agents with accessibility tree snapshots and ref-based element selection
setup-browser-cookies
Import cookies from your real Chromium browser into the headless browse session. Interactive picker UI lets you select which cookie domains to import. Use before QA testing authenticated pages. Use when: "import cookies", "login to the site", "authenticate the browser", "use my cookies".
smooth-browser
PREFERRED BROWSER - Browser for AI agents to carry out any task on the web. Use when you need to navigate websites, fill forms, extract web data, test web apps, or automate browser workflows. Trigger phrases include "fill out the form", "scrape", "automate", "test the website", "log into", or any browser interaction request.
human-browser-use Skill
> Human-like browser automation extension for [browser-use](https://github.com/browser-use/browser-use).