anti-scraping

Use when you need to bypass Cloudflare protection, scrape websites with anti-bot measures, render JavaScript pages, or simulate real browser behavior for web scraping.


by diegosouzapw


Best use case

anti-scraping is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using anti-scraping should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/anti-scraping/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/backend/anti-scraping/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/anti-scraping/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How anti-scraping Compares

Feature / Agent	anti-scraping	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Use it when you need to bypass Cloudflare protection, scrape websites with anti-bot measures, render JavaScript pages, or simulate real browser behavior for web scraping.

Where can I find the source code?

The source is in the diegosouzapw/awesome-omni-skill repository on GitHub, at skills/backend/anti-scraping/SKILL.md.

SKILL.md Source

# Anti-Scraping & Web Scraping

**When to use**: Websites with Cloudflare protection, JavaScript rendering requirements, or anti-bot measures.

## Overview

Provides solutions for getting past common anti-scraping measures using a headless Playwright browser with stealth configuration.

## Key Capabilities

- ✅ Cloudflare challenge bypass
- ✅ JavaScript rendering
- ✅ Real browser context simulation
- ✅ Stealth mode (hides automation indicators from detection)
- ✅ Screenshot capture for debugging

## Quick Start

### Prerequisites
```bash
# Install Playwright in the project so the script can require() it
npm install playwright
npx playwright install chromium
```

### Basic Usage Pattern

```javascript
// n8n Execute Command node
const { execFileSync } = require('child_process');
const fs = require('fs');

const url = 'https://example.com';
const outputFile = '/tmp/page.html';

// Run the Playwright stealth script; an argument array avoids shell-quoting bugs in the URL
execFileSync('node', ['playwright-cloudflare.js', url, outputFile]);

// Read result
const html = fs.readFileSync(outputFile, 'utf8');
```

## Core Script: playwright-cloudflare.js

**Location**: `n8n-skills/anti-scraping/playwright-cloudflare.js`

**Key Features**:
- Masks browser automation indicators
- Sets real browser headers
- Configures viewport and user agent
- Handles Cloudflare waiting
- Captures screenshots on failure

**Configuration**:
```javascript
const config = {
  waitForCloudflare: true,      // Wait for CF challenge
  waitTime: 15000,               // Max wait time (ms)
  selector: '.product-list',     // Element to wait for
  screenshotOnError: true,       // Debug screenshots
  userAgent: 'Mozilla/5.0...'   // Real browser UA
};
```
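
How the script consumes these options is not shown on this page. The following is a minimal sketch, assuming the script drives a Playwright `page` object. `loadPage` and the `screenshotPath` option are hypothetical names introduced for illustration; `page.goto`, `page.waitForSelector`, `page.content`, and `page.screenshot` are standard Playwright methods.

```javascript
// Hypothetical helper, not the actual playwright-cloudflare.js implementation.
// `page` is a Playwright Page (or any object exposing the same methods).
async function loadPage(page, url, config) {
  try {
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: config.waitTime });
    if (config.selector) {
      // Resolves once the element is visible, or rejects after waitTime ms.
      await page.waitForSelector(config.selector, { timeout: config.waitTime });
    }
    return await page.content();
  } catch (err) {
    if (config.screenshotOnError) {
      // screenshotPath is an assumed option; the default mirrors the path
      // mentioned in the Troubleshooting section.
      const path = config.screenshotPath || '/tmp/error-screenshot.png';
      await page.screenshot({ path }).catch(() => {}); // never mask the original error
    }
    throw err;
  }
}
```

Taking `page` as a parameter keeps the helper independent of how the browser is launched, so it can be exercised with a stub object before running against a real site.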

## n8n Workflow Pattern

```
[Manual Trigger]
    ↓
[Set Parameters]
    target_url: https://site.com
    wait_selector: .content
    ↓
[Execute Command: Playwright]
    Command: node
    Arguments: playwright-cloudflare.js {{$json.target_url}} /tmp/output.html
    ↓
[Read HTML File]
    File: /tmp/output.html
    ↓
[Parse with Cheerio]
    (use html-parsing skill)
```

## Performance

- **Speed**: 15-25 seconds per page
- **Success Rate**: ~95% for Cloudflare sites
- **Resource Usage**: ~200-300MB RAM per browser instance
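
These figures make it easy to budget a sequential run. As a rough sketch, assuming one browser instance, the per-page time above, and the 3-5 second delay recommended under Best Practices (`estimateRunSeconds` is a hypothetical helper):

```javascript
// Rough sequential-run estimate: every page costs secondsPerPage, and one delay
// is inserted between consecutive pages (pages - 1 delays in total).
function estimateRunSeconds(pages, secondsPerPage, delaySeconds) {
  if (pages <= 0) return 0;
  return pages * secondsPerPage + (pages - 1) * delaySeconds;
}

const total = estimateRunSeconds(100, 20, 4); // 100 * 20 + 99 * 4 = 2396
console.log(`${(total / 60).toFixed(1)} minutes`); // prints "39.9 minutes"
```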

## Troubleshooting

### Cloudflare Still Blocking
```bash
# Increase wait time
--wait 30000

# Add specific selector to wait for
--selector '.product-list'

# Check screenshot for errors
/tmp/error-screenshot.png
```

### Timeout Errors
```javascript
// Increase timeout in playwright script
timeout: 60000  // 60 seconds
```

### Memory Issues
```javascript
// Close browser properly
await browser.close();

// Limit concurrent instances
// Use n8n Split Into Batches with batch size = 1
```
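
One way to make sure the browser is always closed, even when scraping throws, is a `try`/`finally` wrapper. This is a sketch only: `withBrowser` is a hypothetical helper, and `launch` is any function that returns a browser-like object, for example `() => require('playwright').chromium.launch()`.

```javascript
// Hypothetical wrapper: guarantees browser.close() runs on success and on error.
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    // Without this, a thrown error would leave the Chromium process running.
    await browser.close();
  }
}
```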

## Best Practices

1. **Add Delays**: Wait 3-5 seconds between requests
2. **Rotate User Agents**: Change UA periodically
3. **Use Residential Proxies**: For high-volume scraping
4. **Handle Errors**: Implement retry logic with exponential backoff
5. **Respect robots.txt**: Check site policies
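
Practices 1 and 5 can be sketched in a few lines. Both functions below are hypothetical helpers, and the robots.txt check is deliberately simplified: it only reads `User-agent: *` groups and prefix-matching `Disallow` rules, ignoring `Allow` rules and wildcards, so a dedicated parser is the safer choice in production.

```javascript
// Simplified robots.txt check (assumption: only "User-agent: *" groups and
// prefix Disallow rules matter; Allow rules and wildcards are ignored).
function isPathDisallowed(robotsTxt, path) {
  let inWildcardGroup = false;
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split('#')[0].trim(); // strip comments
    const idx = line.indexOf(':');
    if (!line || idx === -1) continue;
    const field = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (field === 'user-agent') {
      // Consecutive User-agent lines belong to the same group.
      if (!lastWasAgent) inWildcardGroup = false;
      if (value === '*') inWildcardGroup = true;
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      if (field === 'disallow' && inWildcardGroup && value && path.startsWith(value)) {
        return true;
      }
    }
  }
  return false;
}

// Pick a random delay in [minMs, maxMs) to space out consecutive requests.
function politeDelayMs(minMs = 3000, maxMs = 5000) {
  return minMs + Math.floor(Math.random() * (maxMs - minMs));
}
```

A workflow could fetch `/robots.txt` once per site, skip any URL for which `isPathDisallowed` returns `true`, and wait `politeDelayMs()` milliseconds between the remaining requests.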

## Common Patterns

### Pattern 1: Single Page Scraping
```
Trigger → Playwright → Parse → Export
```

### Pattern 2: Multi-Page with Pagination
```
Trigger → Generate URLs (pagination skill) →
Split Into Batches → Playwright → Wait 5s →
Parse → Deduplicate → Export
```
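
Outside n8n, the same pipeline can be approximated with a plain loop. In this sketch `fetchPage` and `parseItems` are hypothetical callbacks standing in for the Playwright and Parse steps, and deduplication by an `id` field is an assumption about the extracted data.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Processes URLs one at a time (batch size = 1), waits between requests,
// and keeps the first item seen for each id.
async function scrapeSequentially(urls, fetchPage, parseItems, delayMs = 5000) {
  const seen = new Map();
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await sleep(delayMs); // delay between pages, not before the first
    const html = await fetchPage(urls[i]);
    for (const item of parseItems(html)) {
      if (!seen.has(item.id)) seen.set(item.id, item);
    }
  }
  return [...seen.values()];
}
```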

### Pattern 3: With Error Handling
```
Playwright → [Error Trigger] → Retry Logic → Notification
```
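
The retry step is not spelled out above. A minimal sketch of retry with exponential backoff, as recommended under Best Practices, could look like this; `retryWithBackoff` and its option names are hypothetical.

```javascript
// Hypothetical helper: retries an async task, doubling the wait after each failure.
async function retryWithBackoff(task, { retries = 3, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === retries) break; // out of retries, give up
      const delay = baseDelayMs * 2 ** attempt; // 1s, 2s, 4s with the defaults
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError; // surface the final failure so a notification step can fire
}
```

Rethrowing the last error keeps the failure visible to whatever sits downstream, such as the Notification node in the pattern above.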

## Integration with Other Skills

- **pagination**: Generate URLs for multi-page scraping
- **html-parsing**: Extract data from rendered HTML
- **error-handling**: Retry on failures
- **debugging**: Validate extracted data

## Full Code and Documentation

Complete implementation with examples:
`/mnt/d/work/n8n_agent/n8n-skills/anti-scraping/`

Files:
- `playwright-cloudflare.js` - Main scraping script
- `README.md` - Detailed documentation
- `example-workflow.json` - n8n workflow example
- `config.template.env` - Configuration template
