anti-scraping
Use when you need to bypass Cloudflare protection, scrape websites with anti-bot measures, render JavaScript pages, or simulate real browser behavior for web scraping.
Best use case
anti-scraping is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using anti-scraping should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/anti-scraping/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How anti-scraping Compares
| Feature / Agent | anti-scraping | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Use it when you need to bypass Cloudflare protection, scrape websites with anti-bot measures, render JavaScript pages, or simulate real browser behavior for web scraping.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Anti-Scraping & Web Scraping
**When to use**: Websites with Cloudflare protection, JavaScript rendering requirements, or anti-bot measures.
## Overview
Provides solutions for bypassing common anti-scraping measures using a headless Playwright browser with stealth configurations.
## Key Capabilities
- ✅ Cloudflare challenge bypass
- ✅ JavaScript rendering
- ✅ Real browser context simulation
- ✅ Stealth mode (hides automation detection)
- ✅ Screenshot capture for debugging
## Quick Start
### Prerequisites
```bash
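# Install Playwright locally so require('playwright') resolves from your scripts
npm install playwright
# Download the Chromium browser binary
npx playwright install chromium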
```
### Basic Usage Pattern
```javascript
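// n8n Code node (built-in modules must be allowed, e.g. NODE_FUNCTION_ALLOW_BUILTIN=child_process,fs)
const { execFileSync } = require('child_process');
const fs = require('fs');
const url = 'https://example.com';
const outputFile = '/tmp/page.html';
// Run the Playwright script with stealth; an argument array avoids shell quoting and injection issues
execFileSync('node', ['playwright-cloudflare.js', url, outputFile]);
// Read the rendered HTML written by the script
const html = fs.readFileSync(outputFile, 'utf8');
// Return it as an n8n item for downstream nodes
return [{ json: { html } }];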
```
## Core Script: playwright-cloudflare.js
**Location**: `n8n-skills/anti-scraping/playwright-cloudflare.js`
**Key Features**:
- Disables automation detection
- Sets real browser headers
- Configures viewport and user agent
- Handles Cloudflare waiting
- Captures screenshots on failure
**Configuration**:
```javascript
const config = {
  waitForCloudflare: true,    // Wait for CF challenge
  waitTime: 15000,            // Max wait time (ms)
  selector: '.product-list',  // Element to wait for
  screenshotOnError: true,    // Debug screenshots
  userAgent: 'Mozilla/5.0...' // Real browser UA
};
```
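The Troubleshooting section passes `--wait` and `--selector` flags to the script, so the configuration can presumably be overridden from the command line. The real script's argument handling isn't shown. Below is a hypothetical sketch that merges such flags over defaults taken from the config above. Here `selector` defaults to `null`, meaning "don't wait for an element".

```javascript
// Hypothetical: build a config from defaults plus `--wait` / `--selector` flags.
// The real script's argument handling is not shown in this document.
const DEFAULTS = {
  waitForCloudflare: true,
  waitTime: 15000,
  selector: null,
  screenshotOnError: true,
};

function parseArgs(argv) {
  const config = { ...DEFAULTS };
  const positional = [];
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === '--wait') {
      // Consume the next argument as the wait time in milliseconds
      const ms = Number(argv[++i]);
      if (!Number.isInteger(ms) || ms <= 0) {
        throw new Error(`--wait expects a positive integer (ms), got "${argv[i]}"`);
      }
      config.waitTime = ms;
    } else if (arg === '--selector') {
      // Consume the next argument as a CSS selector
      config.selector = argv[++i];
      if (!config.selector) throw new Error('--selector expects a CSS selector');
    } else {
      positional.push(arg);
    }
  }
  const [url, outputFile] = positional;
  if (!url || !outputFile) {
    throw new Error('usage: <url> <outputFile> [--wait ms] [--selector css]');
  }
  return { url, outputFile, config };
}
```

A script would call it as `const { url, outputFile, config } = parseArgs(process.argv.slice(2));`. Invalid flags fail fast, before a browser is launched.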
## n8n Workflow Pattern
```
[Manual Trigger]
  ↓
[Set Parameters]
  target_url: https://site.com
  wait_selector: .content
  ↓
[Execute Command: Playwright]
  Command: node playwright-cloudflare.js "{{$json.target_url}}" /tmp/output.html
  ↓
[Read HTML File]
  File: /tmp/output.html
  ↓
[Parse with Cheerio]
  (use html-parsing skill)
```
## Performance
- **Speed**: 15-25 seconds per page
- **Success Rate**: ~95% for Cloudflare sites
- **Resource Usage**: ~200-300MB RAM per browser instance
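These figures make it straightforward to budget a run. Below is a small sketch of the arithmetic. It assumes requests are spread evenly across `concurrency` browser instances and that a delay separates consecutive requests on each instance. The 300 MB default is the top of the range above.

```javascript
// Back-of-envelope run estimate from per-page time, inter-request delay,
// and the number of parallel browser instances.
function estimateRun(pages, { secPerPage, delaySec, concurrency = 1, mbPerBrowser = 300 }) {
  // Each instance handles an even share of the pages
  const perStream = Math.ceil(pages / concurrency);
  // Delays fall between requests, so each stream has (perStream - 1) of them
  const seconds = perStream * secPerPage + Math.max(perStream - 1, 0) * delaySec;
  return { minutes: seconds / 60, peakRamMb: concurrency * mbPerBrowser };
}
```

For example, 100 pages at the slow end (25 s per page plus a 5 s delay) on one instance works out to 2,995 s, or roughly 50 minutes. Adding instances shortens the run but multiplies peak memory.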
## Troubleshooting
### Cloudflare Still Blocking
```bash
# Increase wait time (ms)
--wait 30000
# Add a specific selector to wait for
--selector '.product-list'
# Check the error screenshot
# /tmp/error-screenshot.png
```
### Timeout Errors
```javascript
// Increase the timeout in the Playwright script, e.g. for navigation
await page.goto(url, { timeout: 60000 }); // 60 seconds
```
### Memory Issues
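```javascript
// Close the browser properly, even when scraping fails
try {
  // ... scraping logic ...
} finally {
  await browser.close();
}
// Limit concurrent instances: in n8n, use Split In Batches (Loop Over Items) with batch size = 1
```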
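In a standalone Node.js runner outside n8n (an assumption, not part of this skill), the same "limit concurrent instances" advice can be sketched as a small pool. With `limit = 1` it matches a batch size of 1. If one task rejects, the returned promise rejects while the remaining workers finish their current tasks.

```javascript
// Run async tasks with at most `limit` in flight, so no more than `limit`
// browser instances exist at once. Results keep the input order.
async function runWithLimit(tasks, limit = 1) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const i = next++; // claim an index synchronously; the event loop is single-threaded
      results[i] = await tasks[i]();
    }
  }
  // Start between 1 and `limit` workers that pull tasks until none remain
  const count = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: count }, () => worker()));
  return results;
}
```

Each task would be a function that launches a browser, scrapes one URL, and closes the browser before resolving.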
## Best Practices
1. **Add Delays**: Wait 3-5 seconds between requests
2. **Rotate User Agents**: Change UA periodically
3. **Use Residential Proxies**: For high-volume scraping
4. **Handle Errors**: Implement retry logic with exponential backoff
5. **Respect robots.txt**: Check site policies
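Practice 5 can be partly automated. The function below is a deliberately simplified robots.txt matcher, not a full implementation. It reads only the `User-agent: *` group and applies the longest-match rule from RFC 9309, where Allow wins a tie. It ignores `*`/`$` wildcards and agent-specific groups.

```javascript
// Simplified robots.txt check for the `*` user-agent group only.
// Longest matching prefix wins; Allow beats Disallow on equal length (RFC 9309).
// Wildcards (`*`, `$`) are NOT supported.
function isAllowed(robotsTxt, path) {
  const rules = [];
  let inStarGroup = false;
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, '').trim(); // strip comments
    const idx = line.indexOf(':');
    if (!line || idx === -1) continue;
    const key = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (key === 'user-agent') {
      // Consecutive User-agent lines share one group; a new run starts a new group
      if (!lastWasAgent) inStarGroup = false;
      if (value === '*') inStarGroup = true;
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    // An empty Disallow value means "allow everything", so it adds no rule
    if (inStarGroup && (key === 'allow' || key === 'disallow') && value) {
      rules.push({ allow: key === 'allow', prefix: value });
    }
  }
  let best = null;
  for (const rule of rules) {
    if (!path.startsWith(rule.prefix)) continue;
    const longer = !best || rule.prefix.length > best.prefix.length;
    const tieAllow = best && rule.prefix.length === best.prefix.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best ? best.allow : true; // no matching rule: allowed
}
```

In Node 18+ the file can be fetched with the built-in `fetch`, e.g. `await (await fetch(new URL('/robots.txt', url))).text()`. Check `new URL(url).pathname` before queuing each page. For production use, prefer a dedicated parser that supports wildcards.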
## Common Patterns
### Pattern 1: Single Page Scraping
```
Trigger → Playwright → Parse → Export
```
### Pattern 2: Multi-Page with Pagination
```
Trigger → Generate URLs (pagination skill) →
Split In Batches → Playwright → Wait 5s →
Parse → Deduplicate → Export
```
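Outside n8n, Pattern 2 can be sketched in plain Node.js as below. This is an illustration under assumptions. The `page` query parameter, the `scrapeOne` callback (for example, a wrapper around the Playwright script), and the `item.id` deduplication key are hypothetical. The pause uses the randomized 3-5 s range from Best Practices instead of a fixed 5 s.

```javascript
// Hypothetical sketch of Pattern 2: build page URLs, scrape them one at a time
// with a randomized pause, and deduplicate results.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function pageUrls(baseUrl, pageParam, lastPage) {
  // e.g. https://site.com/list?page=1 ... ?page=N
  return Array.from({ length: lastPage }, (_, i) => {
    const u = new URL(baseUrl);
    u.searchParams.set(pageParam, String(i + 1));
    return u.toString();
  });
}

async function scrapeSequentially(urls, scrapeOne, { minDelayMs = 3000, maxDelayMs = 5000 } = {}) {
  const seen = new Set();
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    const items = await scrapeOne(urls[i]); // assumed to resolve to an array of items
    for (const item of items) {
      // Deduplicate on an assumed `id` field
      if (!seen.has(item.id)) {
        seen.add(item.id);
        results.push(item);
      }
    }
    // Pause between requests, but not after the last one
    if (i < urls.length - 1) {
      await sleep(minDelayMs + Math.random() * (maxDelayMs - minDelayMs));
    }
  }
  return results;
}
```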
### Pattern 3: With Error Handling
```
Playwright → [Error Trigger] → Retry Logic → Notification
```
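The retry logic in Pattern 3 (and practice 4) usually means exponential backoff. Here is a minimal sketch with assumed defaults: 3 retries and a 2 s base delay that doubles up to a 30 s cap. A little random jitter keeps parallel jobs from retrying in lockstep.

```javascript
// Retry an async operation with exponential backoff plus jitter.
// Delay before retry n (1-based) is baseMs * 2^(n-1), capped at maxMs, plus up to 20% jitter.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(fn, { retries = 3, baseMs = 2000, maxMs = 30000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === retries) break; // out of retries: rethrow below
      const delay = Math.min(baseMs * 2 ** attempt, maxMs);
      await sleep(delay + Math.random() * delay * 0.2);
    }
  }
  throw lastError;
}
```

Wrap the scraping call, e.g. `await withRetry(() => scrape(url))`, where `scrape` stands for whatever runs the browser. Once the retries are exhausted, the final error can feed the notification step.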
## Integration with Other Skills
- **pagination**: Generate URLs for multi-page scraping
- **html-parsing**: Extract data from rendered HTML
- **error-handling**: Retry on failures
- **debugging**: Validate extracted data
## Full Code and Documentation
Complete implementation with examples:
`/mnt/d/work/n8n_agent/n8n-skills/anti-scraping/`
Files:
- `playwright-cloudflare.js` - Main scraping script
- `README.md` - Detailed documentation
- `example-workflow.json` - n8n workflow example
- `config.template.env` - Configuration template
Related Skills
webscraping-ai-automation
Automate Webscraping AI tasks via Rube MCP (Composio). Always search tools first for current schemas.
web-scraping
Web scraping best practices for AI coding agents. Covers tmux session management for long-running scrapes, Crawl4AI integration, parallel pipeline orchestration, resume-friendly architecture, and rate limit handling. Use this skill when building scrapers, running data extraction jobs, or managing lead generation pipelines.
using-superantigravity
Use when starting any conversation — establishes how to find and use skills, requiring skill check before ANY response including clarifying questions
Testing Anti-Patterns
This skill should be used when encountering "flaky tests", "test maintenance issues", "slow test suites", "brittle tests", "test code smells", "test debugging problems", or when tests are hard to understand, maintain, or debug.
scrapingbee-automation
Automate Scrapingbee tasks via Rube MCP (Composio). Always search tools first for current schemas.
scrapingant-automation
Automate Scrapingant tasks via Rube MCP (Composio). Always search tools first for current schemas.
pydantic
Python data validation using type hints and runtime type checking with Pydantic v2's Rust-powered core for high-performance validation in FastAPI, Django, and configuration management.
antipattern-detector
Detect common technical and organizational anti-patterns in proposals, architectures, and plans. Use when strategic-cto-mentor needs to identify red flags before they become problems.
antipattern-catalog
Document technical debt, anti-patterns, and patterns to avoid from analyzed frameworks. Use when (1) creating a "Do Not Repeat" list from framework analysis, (2) categorizing observed code smells and issues, (3) assessing severity of architectural problems, (4) generating remediation suggestions, or (5) synthesizing lessons learned across multiple frameworks.
antigravity-frontend-dev
Antigravity/Claude specific skill for continuous frontend UI/UX improvement and development in the Juliaz Agents project.
anti-reversing-techniques
Understand anti-reversing, obfuscation, and protection techniques encountered during software analysis. Use when analyzing protected binaries, bypassing anti-debugging for authorized analysis, or understanding software protection mechanisms.
anti-fabrication
Validate claims through tool execution, avoid superlatives and unsubstantiated metrics. Use when reviewing codebases, analyzing systems, reporting test results, or making any factual claims about code or capabilities.