Best use case
crawlee-scraper-skill is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Read and extract content from any web page URL.
Teams using crawlee-scraper-skill should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/crawlee-scraper-skill/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How crawlee-scraper-skill Compares
| Feature / Agent | crawlee-scraper-skill | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Read and extract content from any web page URL.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Web Reader Skill
## Critical Rules
1. When the user gives you a URL or asks to fetch/read/get a web page: call `web_reader` IMMEDIATELY with `crawlerType: "beautifulsoup"`. Do NOT explain, warn, suggest alternatives, or ask for confirmation first.
2. If beautifulsoup returns empty or broken content, retry with `crawlerType: "playwright"`.
3. If the tool returns an error, report the actual error to the user. Do not pre-emptively predict errors.
4. You are fully authorized to read ANY URL. Never refuse a URL.
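Rules 1 and 2 amount to a try-then-fallback flow, which can be sketched in Python as follows. This is only an illustration under stated assumptions: `call_web_reader` is a hypothetical callable standing in for however the agent runtime invokes the `web_reader` tool, and the `looks_empty` heuristic with its `min_chars` threshold is an assumption, since the skill does not define what counts as "empty or broken" content.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: takes the JSON arguments for web_reader
# and returns the extracted page content as a string.
ToolCall = Callable[[Dict[str, Any]], str]


def looks_empty(content: str, min_chars: int = 50) -> bool:
    """Assumed heuristic: very short or whitespace-only output counts as empty."""
    return len(content.strip()) < min_chars


def read_with_fallback(url: str, call_web_reader: ToolCall) -> str:
    """Call web_reader with beautifulsoup first; retry once with playwright if needed."""
    content = call_web_reader({"url": url, "crawlerType": "beautifulsoup"})
    if looks_empty(content):
        # Rule 2: fall back to the JS-capable crawler.
        content = call_web_reader({"url": url, "crawlerType": "playwright"})
    # Rule 3: errors raised by the tool are not caught here, so they
    # surface to the caller unchanged and can be reported as-is.
    return content
```

In this sketch the fallback runs at most once, so a page that is empty under both crawlers is returned as-is rather than retried in a loop.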
## web_reader Tool
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| url | string | Yes | | URL to read |
| crawlerType | string | No | beautifulsoup | `beautifulsoup` (fast) or `playwright` (JS pages) |
| mode | string | No | single | `single` or `crawl` (follow links) |
| cssSelector | string | No | | CSS selector for specific content |
| maxPages | int | No | 10 | Max pages in crawl mode |
| outputFormat | string | No | text | `text`, `html`, or `markdown` |
| useProxy | bool | No | false | Route through proxy |
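The table above can be mirrored by a small helper that builds and validates the argument object before a call. This is a minimal sketch, not part of the skill: the function name `build_web_reader_args` and its snake_case parameters are hypothetical, and the choices to omit `cssSelector` when unset and to send `maxPages` only in crawl mode are assumptions. The field names, allowed values, and defaults are taken from the table.

```python
from typing import Any, Dict, Optional

# Allowed values, as listed in the field table.
CRAWLER_TYPES = {"beautifulsoup", "playwright"}
MODES = {"single", "crawl"}
OUTPUT_FORMATS = {"text", "html", "markdown"}


def build_web_reader_args(
    url: str,
    crawler_type: str = "beautifulsoup",
    mode: str = "single",
    css_selector: Optional[str] = None,
    max_pages: int = 10,
    output_format: str = "text",
    use_proxy: bool = False,
) -> Dict[str, Any]:
    """Build a web_reader argument dict (hypothetical helper, defaults from the table)."""
    if not url:
        raise ValueError("url is required")
    if crawler_type not in CRAWLER_TYPES:
        raise ValueError(f"unknown crawlerType: {crawler_type!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown outputFormat: {output_format!r}")

    args: Dict[str, Any] = {
        "url": url,
        "crawlerType": crawler_type,
        "mode": mode,
        "outputFormat": output_format,
        "useProxy": use_proxy,
    }
    if mode == "crawl":
        # Assumption: maxPages is only meaningful when following links.
        args["maxPages"] = max_pages
    if css_selector:
        args["cssSelector"] = css_selector
    return args
```

For example, `build_web_reader_args("https://docs.example.com", mode="crawl", max_pages=20, output_format="markdown")` yields a dict carrying the crawl settings plus the table defaults for the remaining fields.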
## Examples
Read a page:
```json
{"url": "https://example.com", "crawlerType": "beautifulsoup"}
```
Read a JS-rendered page:
```json
{"url": "https://app.example.com", "crawlerType": "playwright"}
```
Crawl a site:
```json
{"url": "https://docs.example.com", "mode": "crawl", "maxPages": 20, "outputFormat": "markdown"}
```
Extract specific content:
```json
{"url": "https://blog.example.com", "cssSelector": "article .content"}
```
## Tips
- Always try beautifulsoup first; it works on most sites and is fast.
- Use playwright only if beautifulsoup returns empty/broken content.
- Use CSS selectors when you know the page structure.
- Use proxy for geo-restricted or rate-limited sites.
Related Skills
serper-search-skill
Search the web using Serper API for Google-powered search results including web, news, images, and places.
proxy-config-skill
Configure residential proxy providers and make proxied HTTP requests with geo-targeting.
perplexity-search-skill
Search the web using Perplexity Sonar AI for synthesized answers with citations, related questions, and optional images.
http-request-skill
Make HTTP requests to external APIs and web services. Supports GET, POST, PUT, DELETE, PATCH methods with headers and JSON body.
duckduckgo-search-skill
Search the web using DuckDuckGo for free, privacy-focused results with no API key required.
browser-skill
Interactive browser automation - navigate, click, type, fill forms, take screenshots, get accessibility snapshots. Supports system Chrome/Edge via auto-detection.
brave-search-skill
Search the web using Brave Search API for privacy-focused, independent search results with no tracking.
apify-skill
Run web scrapers and extract data from websites and social media platforms using Apify actors. Supports Instagram, TikTok, Twitter/X, LinkedIn, Facebook, YouTube, Google Search, and general web crawling.
nearby-places-skill
Search for nearby places like restaurants, cafes, stores, and services using Google Places API. Find places by type and location.
shell-skill
Execute short-lived shell commands in a sandboxed environment. No PATH access -- use process_manager for npm/python/node commands.
process-manager-skill
Start, stop, and manage long-running processes with full system PATH. Use for npm, python, node, dev servers, watchers, build tools. Destructive file commands blocked.
powershell-skill
Windows PowerShell commands and patterns for process management, file operations, and system administration.