web-app-testing

Gemini 2.5 Computer Use for browser automation with VISIBLE local browser. Watch Gemini AI control your browser in real-time. Perfect for web app testing, automation demos, and debugging.

16 stars

by diegosouzapw

Best use case

web-app-testing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using web-app-testing should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/web-app-testing/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/testing-security/web-app-testing/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub
2. Place it in .claude/skills/web-app-testing/SKILL.md inside your project
3. Restart your AI agent; it will auto-discover the skill

How web-app-testing Compares

Feature / Agent	web-app-testing	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Gemini 2.5 Computer Use for browser automation with VISIBLE local browser. Watch Gemini AI control your browser in real-time. Perfect for web app testing, automation demos, and debugging.

Where can I find the source code?

The source code is on GitHub in the diegosouzapw/awesome-omni-skill repository, under skills/testing-security/web-app-testing.

SKILL.md Source

# Gemini Computer Use - Web Browser Automation

You are an expert web application testing assistant using **Gemini 2.5 Computer Use** - Google's AI that can see and control web browsers.

## What This Skill Does

This skill follows the Computer Use agent loop described in Google's official documentation:

1. **Gemini AI analyzes screenshots** of your browser
2. **Gemini decides what actions to take** (where to click, what to type)
3. **Actions execute on YOUR local browser** using Playwright
4. **You WATCH it happen** in real-time on your screen
5. **New screenshot sent back to Gemini** to continue the loop

✅ **AI-powered decision making** (Gemini)
✅ **Visible browser on your screen** (Playwright)
✅ **Best of both worlds!**

## Purpose

- **Web Application Testing**: Automated testing with AI understanding
- **Browser Automation**: Let AI navigate complex workflows
- **Debugging**: Watch AI interact with your site to find issues
- **Demos**: Show intelligent browser automation in action

## How It Works

```
┌─────────────┐
│   Gemini AI │  Analyzes screenshot
│             │  Decides: "Click search box at (821, 202)"
└──────┬──────┘
       │
       ↓ function_call: click(821, 202)
       │
┌──────┴──────┐
│  Playwright │  Executes click on YOUR screen
│   (Visible) │  Captures new screenshot
└──────┬──────┘
       │
       ↓ new screenshot + result
       │
┌──────┴──────┐
│   Gemini AI │  Sees result, plans next action
│             │  Loop continues...
└─────────────┘
```
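
The loop in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the code in `gemini_browser.py`: `run_agent_loop`, `decide`, `execute`, and `take_screenshot` are hypothetical names, and the model and the browser are replaced by plain callables so the control flow can be read and run without an API key or Playwright.

```python
from typing import Callable, Optional

# A proposed action: a function name plus its arguments, e.g.
# ("navigate", {"url": "https://en.wikipedia.org"}). None means "task finished".
Action = Optional[tuple[str, dict]]


def run_agent_loop(
    decide: Callable[[bytes], Action],
    execute: Callable[[str, dict], None],
    take_screenshot: Callable[[], bytes],
    max_turns: int = 20,
) -> int:
    """Run the screenshot -> decide -> execute loop; return the turns used."""
    screenshot = take_screenshot()
    for turn in range(1, max_turns + 1):
        action = decide(screenshot)        # model looks at the current page
        if action is None:                 # model reports the task is complete
            return turn
        name, args = action
        execute(name, args)                # action runs in the local browser
        screenshot = take_screenshot()     # fresh screenshot feeds the next turn
    return max_turns


if __name__ == "__main__":
    # Stand-ins for Gemini and Playwright: a scripted plan and a call recorder.
    plan = [
        ("navigate", {"url": "https://en.wikipedia.org"}),
        ("type_text_at", {"x": 821, "y": 202, "text": "cats", "press_enter": True}),
    ]
    executed = []
    turns = run_agent_loop(
        decide=lambda _shot: plan[len(executed)] if len(executed) < len(plan) else None,
        execute=lambda name, args: executed.append(name),
        take_screenshot=lambda: b"fake-png-bytes",
    )
    print(turns, executed)
```

Passing the model and browser in as callables keeps the loop itself trivial to test; a real implementation would put the Gemini request behind `decide` and the Playwright calls behind `execute` and `take_screenshot`.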

## Variables

- `{URL}`: Target URL to test/automate
- `{TASK}`: What you want Gemini to do (in natural language)

## Usage

### Basic Command (Windows)

**IMPORTANT**: Use the absolute path directly and do NOT use `cd` commands on Windows. Replace `USERNAME` with your Windows user name.

```bash
python "C:\Users\USERNAME\.claude\skills\web-app-testing\scripts\gemini_browser.py" "{URL}" --task "{TASK}"
```

### Example Commands (Windows)

```bash
# Search Wikipedia for cats (VISIBLE BROWSER)
python "C:\Users\USERNAME\.claude\skills\web-app-testing\scripts\gemini_browser.py" "https://en.wikipedia.org" --task "Search for cats and tell me the first paragraph about them"

# Test a login flow
python "C:\Users\USERNAME\.claude\skills\web-app-testing\scripts\gemini_browser.py" "http://localhost:3000" --task "Test the login flow with username 'test' and password 'demo123'"

# Check console errors
python "C:\Users\USERNAME\.claude\skills\web-app-testing\scripts\gemini_browser.py" "https://google.com" --task "Navigate to the site and check for any console errors"

# Fill out a form
python "C:\Users\USERNAME\.claude\skills\web-app-testing\scripts\gemini_browser.py" "https://example.com/contact" --task "Fill out the contact form with test data"

# Run with custom slow motion (1 second per action)
python "C:\Users\USERNAME\.claude\skills\web-app-testing\scripts\gemini_browser.py" "https://wikipedia.org" --task "Search for dogs" --slow 1000

# Run in headless mode (no visible browser)
python "C:\Users\USERNAME\.claude\skills\web-app-testing\scripts\gemini_browser.py" "https://google.com" --task "Check console" --headless
```

### Command Options

- `--task` / `-t`: **Required** - Natural language description of task
- `--slow`: Slow motion delay in milliseconds (default: 500ms)
- `--headless`: Run without visible browser (default: visible)
- `--max-turns`: Maximum conversation turns (default: 20)
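
For readers who want to see how such a command line fits together, here is a minimal `argparse` sketch that accepts the options listed above with the same defaults. It is an illustration only: the actual parser in `gemini_browser.py` is not shown on this page, and `build_parser` and the attribute names are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Drive a local browser with Gemini Computer Use."
    )
    parser.add_argument("url", help="Target URL to test or automate")
    parser.add_argument("--task", "-t", required=True,
                        help="Natural language description of the task")
    parser.add_argument("--slow", type=int, default=500,
                        help="Slow motion delay in milliseconds")
    parser.add_argument("--headless", action="store_true",
                        help="Run without a visible browser")
    parser.add_argument("--max-turns", type=int, default=20,
                        help="Maximum conversation turns")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["https://wikipedia.org", "--task", "Search for dogs", "--slow", "1000"]
    )
    print(args.url, args.task, args.slow, args.headless, args.max_turns)
```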

## Workflow for Claude Code

When user asks to test a web application or automate browser tasks:

### Step 1: Parse Request

Extract:
- **URL**: Target website
- **Task**: What to do (user's natural language description)

### Step 2: Run Gemini Computer Use (Windows-Optimized)

**CRITICAL**: Use the absolute path with quoted arguments and NO `cd` commands!

```bash
python "C:\Users\USERNAME\.claude\skills\web-app-testing\scripts\gemini_browser.py" "{URL}" --task "{TASK}"
```

**Token-Efficient Pattern**:
- ✅ Single command execution
- ✅ Absolute path in quotes
- ✅ No directory changes needed
- ✅ Works on Windows without path errors
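
When the script is launched from Python rather than typed into a shell, passing the command as an argument list sidesteps quoting problems altogether, which serves the same goal as the quoting rules above. The sketch below assumes the skill is installed under the user's home directory, as in the commands shown; `build_command` is a hypothetical helper, not part of the skill.

```python
import subprocess
import sys
from pathlib import Path

# Assumed install location, matching the paths used in this document.
SCRIPT = (Path.home() / ".claude" / "skills" / "web-app-testing"
          / "scripts" / "gemini_browser.py")


def build_command(url: str, task: str, script: Path = SCRIPT) -> list[str]:
    """Return the argument list; each element is passed to the script verbatim."""
    return [sys.executable, str(script), url, "--task", task]


if __name__ == "__main__":
    cmd = build_command("https://en.wikipedia.org", "Search for cats")
    print(cmd)
    if SCRIPT.exists():
        # No shell is involved, so spaces and quotes in the task need no escaping.
        subprocess.run(cmd, check=False)
```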

### Step 3: Observe Output

The script will:
1. ✅ Launch visible browser (maximized window)
2. ✅ Show Gemini's decisions in terminal
3. ✅ Execute actions in slow motion (you can watch)
4. ✅ Display console logs when done
5. ✅ Keep browser open 10 seconds for inspection
6. ✅ Return final results

### Step 4: Report Results

Summarize what Gemini accomplished, any errors found, and console logs.

## Example Session

```
User: "Go to Wikipedia and search for cats"

Claude Code executes:
  python "C:\Users\USERNAME\.claude\skills\web-app-testing\scripts\gemini_browser.py" "https://en.wikipedia.org" --task "Search for cats"

Output shows:
  [BROWSER] Launching VISIBLE browser...
  [BROWSER] ✓ Browser ready

  TURN 1
  [EXECUTING] navigate({"url": "https://en.wikipedia.org"})
    → Navigating to: https://en.wikipedia.org

  TURN 2
  [GEMINI] I can see the Wikipedia homepage. I'll search for "cats" now.
  [EXECUTING] type_text_at({"x": 821, "y": 202, "text": "cats", "press_enter": true})
    → Clicking at (821, 202) then typing: 'cats'
    → Typing: 'cats'
    → Pressing Enter

  TURN 3
  [GEMINI] I've successfully navigated to the Cat article on Wikipedia.
  [COMPLETE] Task finished!

  BROWSER CONSOLE LOGS
  ✓ No console errors

  [BROWSER] Keeping browser open for 10 seconds...
```

User sees:
- ✅ Browser window opens on their screen
- ✅ Watches Wikipedia load
- ✅ Sees search box get clicked
- ✅ Watches "cats" being typed
- ✅ Sees search submit and results appear
- ✅ Browser stays open to inspect

## Key Features

### AI Intelligence
- Gemini analyzes page visually (like a human)
- Adapts to different page layouts
- Makes intelligent decisions about what to click
- Understands context and intent

### Visible Execution
- Browser opens on YOUR screen (maximized)
- Actions happen in slow motion (configurable)
- You can watch every step
- Browser stays open for inspection

### Console Log Capture
- Captures errors, warnings, and info messages
- Displays organized summary at end
- Helps identify JavaScript issues
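
One way to collect and summarize console output is sketched below. It assumes Playwright's Python API, where `page.on("console", handler)` delivers a message object exposing `type` and `text`; the `ConsoleLog` class is hypothetical and may differ from what the script actually does. The collector itself is plain Python, so it can be exercised without a browser.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ConsoleLog:
    """Collects browser console messages and summarizes them by type."""
    entries: list[tuple[str, str]] = field(default_factory=list)

    def record(self, msg_type: str, text: str) -> None:
        self.entries.append((msg_type, text))

    def summary(self) -> str:
        counts = Counter(msg_type for msg_type, _ in self.entries)
        if counts["error"] == 0:
            header = "No console errors"
        else:
            header = f"{counts['error']} console error(s)"
        # List only errors and warnings; info messages are counted but not shown.
        lines = [header]
        lines += [f"[{t}] {text}" for t, text in self.entries
                  if t in ("error", "warning")]
        return "\n".join(lines)


# With Playwright this would be wired up roughly as:
#   log = ConsoleLog()
#   page.on("console", lambda msg: log.record(msg.type, msg.text))

if __name__ == "__main__":
    log = ConsoleLog()
    log.record("info", "app started")
    log.record("error", "Uncaught TypeError: x is undefined")
    print(log.summary())
```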

### Screenshot Loop
- Every action triggers new screenshot
- Gemini sees the updated page state
- Enables accurate decision-making

## Important Notes

### This is NOT a Hybrid System
This skill follows the **official Gemini Computer Use pattern** described in Google's documentation, rather than a hybrid approach. The pattern is:
1. Screenshot → Gemini
2. Gemini → Function call
3. Execute function locally
4. New screenshot → back to Gemini

### Browser Visibility
- **Default**: Visible browser (headless=False)
- **Option**: Can run headless with `--headless` flag
- **Recommended**: Keep visible for debugging/demos

### API Costs
- Each Gemini API call incurs costs
- Screenshots are sent with each turn
- Complex tasks = more API calls
- Monitor usage in Google AI Studio

### Best Practices
- ✅ Use specific, clear task descriptions
- ✅ Test on localhost first before production
- ✅ Watch the browser to understand AI behavior
- ✅ Keep tasks focused and achievable
- ❌ Don't test production without permission
- ❌ Don't use for CAPTCHA bypass or scraping at scale

## Troubleshooting

### Browser doesn't open
- Check that Playwright is installed; if it is not, run `pip install playwright`
- Install the browser binaries: `playwright install chromium`
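
A quick way to confirm that the Python package is importable before running the script is sketched below. It uses only the standard library; `missing_packages` is a hypothetical helper, and it checks for the `playwright` package only, not whether the Chromium binary has been downloaded.

```python
import importlib.util


def missing_packages(names: list[str]) -> list[str]:
    """Return the top-level packages from `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_packages(["playwright"])
    if missing:
        print("Missing:", ", ".join(missing),
              "- install with pip, then run 'playwright install chromium'")
    else:
        print("playwright is importable")
```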

### Gemini not finding elements
- Increase `--slow` to give page time to load
- Check if page uses dynamic content
- Verify URL is accessible

### API errors
- Check that the API key is valid
- Verify that the quota has not been exceeded
- Check internet connectivity
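
As a first check, it can help to confirm that a key is present in the environment at all. The sketch below is an assumption-heavy illustration: this page does not say how the script reads its key, so the variable names `GEMINI_API_KEY` and `GOOGLE_API_KEY` are guesses based on common Gemini client conventions, and `find_api_key_var` is a hypothetical helper. It checks presence only and cannot tell whether the key is valid or within quota.

```python
import os
from typing import Mapping, Optional

# Assumption: these variable names are not confirmed by the skill's documentation.
CANDIDATE_VARS = ("GEMINI_API_KEY", "GOOGLE_API_KEY")


def find_api_key_var(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the first candidate variable with a non-blank value."""
    for name in CANDIDATE_VARS:
        if env.get(name, "").strip():
            return name
    return None


if __name__ == "__main__":
    found = find_api_key_var()
    print(f"Using {found}" if found else "No API key variable is set")
```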

## Version History

- **v3.0.0**: Complete rewrite with proper Gemini Computer Use implementation
- **v2.1.0**: Added local Playwright mode (deprecated)
- **v2.0.0**: Initial Gemini integration (simulated, deprecated)

---

**Created by**: Custom Skill Builder
**Last Updated**: 2025-10-19
**Version**: 3.0.0
**Implementation**: Official Gemini Computer Use pattern

Related Skills

web-security-testing

from diegosouzapw/awesome-omni-skill

Web application security testing workflow for OWASP Top 10 vulnerabilities including injection, XSS, authentication flaws, and access control issues.

wallaby-testing

from diegosouzapw/awesome-omni-skill

Check test status and debug failing tests using Wallaby.js real-time test results. Use after making code changes to verify tests pass, when checking if tests are failing, debugging test errors, analyzing assertions, inspecting runtime values, checking coverage, updating snapshots, or when user mentions Wallaby, tests, coverage, or test status.

unit-testing-test-generate

from diegosouzapw/awesome-omni-skill

Generate comprehensive, maintainable unit tests across languages with strong coverage and edge case focus.

treido-testing

from diegosouzapw/awesome-omni-skill

Testing specialist for Treido (Playwright + Next.js). Use for writing/debugging E2E tests, deflaking, selectors, auth state, parallel execution, and CI stability.

testing-workflow

from diegosouzapw/awesome-omni-skill

Meta-skill that orchestrates comprehensive testing across a project by coordinating testing-patterns, e2e-testing, and testing agents. Use when setting up testing for a new project, improving coverage for an existing project, establishing a testing strategy, or verifying quality before a release.

testing-strategy

from diegosouzapw/awesome-omni-skill

Comprehensive guide for implementing AIDB tests following E2E-first philosophy, DebugInterface abstraction, and MCP response health standards

testing-strategies

from diegosouzapw/awesome-omni-skill

Testing strategies, patterns, and best practices for production code

testing-services

from diegosouzapw/awesome-omni-skill

Writes unit tests for Python service classes using Arrange-Act-Assert pattern with proper mocking at boundaries. Tests behavior, not implementation. Mocks external systems only (API calls, file I/O, databases). Use when writing tests for services or fixing test coverage.

testing-quality

from diegosouzapw/awesome-omni-skill

Plans and executes comprehensive testing strategy across frontend, backend, and AI tiers. Activates when writing tests, testing features, setting up test infrastructure, checking coverage, running E2E tests, or performance testing. Does not handle writing production code (backend-developer or frontend-developer), vulnerability/security review (security), or infrastructure deployment (devops).

testing-patterns

from diegosouzapw/awesome-omni-skill

Testing patterns using bun:test with in-memory SQLite. Use when writing unit tests, integration tests, or router tests.

testing-obsessive

from diegosouzapw/awesome-omni-skill

This skill should be used when the user mentions "write tests", "test coverage", "testing strategy", "unit tests", "integration tests", "e2e tests", "vitest", "jest", discusses testing approaches, asks about test patterns, or works on test files. Addresses testing fundamentals with emphasis on Vitest and Svelte component testing using pragmatic, risk-based approaches.

testing

from diegosouzapw/awesome-omni-skill

Comprehensive testing specialization covering test strategy, automation, TDD methodology, test writing, and web app testing. Use when setting up test infrastructure, writing tests, implementing TDD workflows, analyzing coverage, integrating tests into CI/CD, or testing web applications with Playwright. Framework-agnostic approach with framework-specific guidance via reference files.