scrapling

CLI-first web scraping & content extraction with optional MCP server. Use when you have target URLs and need clean, selector-based outputs (html/md/txt).

1,174 stars

by foryourhealth111-pixel

Best use case

scrapling is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using scrapling should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/scrapling/SKILL.md --create-dirs "https://raw.githubusercontent.com/foryourhealth111-pixel/Vibe-Skills/main/bundled/skills/scrapling/SKILL.md"

Manual Installation

1) Download SKILL.md from GitHub
2) Place it in .claude/skills/scrapling/SKILL.md inside your project
3) Restart your AI agent; it will auto-discover the skill

How scrapling Compares

Feature / Agent	scrapling	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

CLI-first web scraping & content extraction with optional MCP server. Use when you have target URLs and need clean, selector-based outputs (html/md/txt).

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Scrapling Skill (VCO)

Scrapling is a Python-based web scraping / extraction toolkit that exposes:
- a **CLI** (`scrapling ...`) for fetching + extracting content into files
- an **optional MCP server** (`scrapling mcp`) so an agent can call structured scraping tools

This skill is **CLI-first**. Prefer it when you already have URLs and need reliable, repeatable extraction (CSS selector → file).

## When to use

Use `scrapling` when you need to:

- Extract **specific parts** of a web page (CSS selector / XPath) into `.txt` / `.md` / `.html`
- Run **repeatable scraping jobs** (batch URLs with a small wrapper script)
- Reduce token usage by extracting only the relevant DOM region before passing to the LLM
- Provide a local MCP endpoint for scraping tools (agent → MCP → scrapling)
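
The "small wrapper script" for batch jobs could look something like the sketch below. It only relies on the `scrapling extract get <url> <file> --css-selector <selector>` form shown under Common CLI patterns. The helper names (`output_name`, `build_commands`, `run_batch`) and the file-naming scheme are hypothetical and not part of Scrapling.

```python
import re
import subprocess
from pathlib import Path
from urllib.parse import urlparse


def output_name(url, ext="md"):
    """Derive a filesystem-safe file name from a URL (hypothetical naming scheme)."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}".strip("/")
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-") or "page"
    return f"{slug}.{ext}"


def build_commands(urls, out_dir, css_selector=None, ext="md"):
    """Build one `scrapling extract get` argv list per URL without running anything."""
    commands = []
    for url in urls:
        out_file = str(Path(out_dir) / output_name(url, ext))
        argv = ["scrapling", "extract", "get", url, out_file]
        if css_selector:
            argv += ["--css-selector", css_selector]
        commands.append(argv)
    return commands


def run_batch(urls, out_dir, css_selector=None):
    """Run each command in turn and return the URLs whose extraction failed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for url, argv in zip(urls, build_commands(urls, out_dir, css_selector)):
        # Assumes the `scrapling` executable is on PATH.
        result = subprocess.run(argv)
        if result.returncode != 0:
            failed.append(url)
    return failed
```

Separating `build_commands` from `run_batch` lets you print and review the exact commands before anything touches the network.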

## Boundaries (vs Playwright / Search)

### vs `playwright`
- `scrapling`: best for “get URL → extract selector → write file” workflows; simpler, faster iteration
- `playwright`: best for interactive UI flows (login, multi-step navigation, downloads, complex JS actions, stateful sessions)

If you must *navigate* or *click through* a UI, use `playwright`.
If you can directly fetch the target page and just need extraction, use `scrapling`.

### vs search tools
- Search tools are for discovering sources/URLs (query → result list → choose URLs).
- `scrapling` is for acquisition + extraction once you already know the URL(s).

A common pipeline:
1) Search → find candidate URLs
2) Scrapling → extract focused content from chosen URLs
3) LLM → summarize / transform / analyze extracted outputs

## Prerequisite check (required)

1) Python version (Scrapling requires Python >= 3.10):
```powershell
python --version
```

2) Scrapling CLI availability:
```powershell
scrapling --help
```
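
The two checks above can also be scripted. The following is a minimal sketch, assuming it is run with the same interpreter you intend to use for Scrapling; `check_prerequisites` is a hypothetical helper name, and the install hint simply repeats the recommended command from this document.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the prerequisite above


def check_prerequisites(version_info=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    if version_info is None:
        version_info = sys.version_info
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} found; Scrapling needs >= 3.10"
        )
    # shutil.which returns None when the executable is not on PATH.
    if which("scrapling") is None:
        problems.append(
            'scrapling CLI not on PATH; try: python -m pip install "scrapling[ai]"'
        )
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the running interpreter and the current PATH; the optional parameters exist only so the logic can be exercised without changing the environment.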

## Installation (recommended)

Scrapling’s CLI and MCP features are enabled via extras.

Recommended (CLI + MCP + fetchers):
```powershell
python -m pip install "scrapling[ai]"
```

If you only want CLI fetch/extract without MCP:
```powershell
python -m pip install "scrapling[fetchers]"
```

If you use browser-based fetchers, you may need browser binaries:
```powershell
# Option A: via Scrapling helper (after install)
scrapling install

# Option B: directly via Playwright
python -m playwright install
```

## Wrapper script (Windows convenience)

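This skill ships a thin PowerShell wrapper. The path below appears to be specific to the author's machine, so adjust it to your own skills directory:
- `C:/Users/羽裳/.codex/skills/scrapling/scripts/scrapling.ps1`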

It checks whether `scrapling` exists and prints install hints if missing.

## Common CLI patterns

### 1) Extract full page body (to Markdown)
```powershell
scrapling extract get "https://example.com" out.md
```

### 2) Extract a specific element (CSS selector) to text
```powershell
scrapling extract get "https://example.com" out.txt --css-selector "main article"
```

### 3) Extract HTML for downstream parsing
```powershell
scrapling extract get "https://example.com" out.html --css-selector "#content"
```

### 4) Use browser-backed fetcher mode (when a simple GET is blocked or the page is dynamic)
```powershell
scrapling extract fetch "https://example.com" out.md --css-selector "main"
```

Tip: keep outputs in files and only feed the smallest relevant snippet to the LLM.
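
One way to act on this tip is to cap how much of an extracted file reaches the prompt. The sketch below is illustrative only: `smallest_snippet`, the 4000-character default, and the `[truncated]` marker are assumptions, not Scrapling features.

```python
from pathlib import Path


def smallest_snippet(path, max_chars=4000, encoding="utf-8"):
    """Read an extracted file and return at most max_chars characters of it."""
    text = Path(path).read_text(encoding=encoding).strip()
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    # Prefer to cut at the last newline so a line is not split in the middle.
    cut = clipped.rfind("\n")
    if cut > 0:
        clipped = clipped[:cut]
    return clipped.rstrip() + "\n[truncated]"
```

A tighter CSS selector is usually a better first lever than truncation; the character cap is a backstop for pages where the selected region is still large.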

## MCP server relationship (optional)

Scrapling can run as an MCP server. This is useful when:
- the agent needs tool-style scraping calls
- you want scraping results to be structured and deterministic

Start MCP server (stdio transport by default):
```powershell
scrapling mcp
```

Optional: run MCP server with HTTP transport:
```powershell
scrapling mcp --http --host 127.0.0.1 --port 8765
```

### Example MCP server config snippet

```json
{
  "servers": {
    "scrapling": {
      "mode": "stdio",
      "command": "scrapling",
      "args": ["mcp"],
      "required": false,
      "note": "Requires: python -m pip install \"scrapling[ai]\""
    }
  }
}
```
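
The snippet above does not say which MCP client consumes it, so the exact schema may differ in your setup. As a rough sketch, assuming a client that reads the `servers`, `mode`, `command`, and `args` keys exactly as written, the launch command for each stdio server could be resolved as follows (`stdio_launch_commands` is a hypothetical helper):

```python
import json


def stdio_launch_commands(config_text):
    """Map server name -> argv list for every stdio server in the config."""
    config = json.loads(config_text)
    commands = {}
    for name, server in config.get("servers", {}).items():
        if server.get("mode") != "stdio":
            continue  # other transports are not launched as subprocesses here
        commands[name] = [server["command"], *server.get("args", [])]
    return commands
```

For the config shown, this resolves the `scrapling` entry to the same `scrapling mcp` invocation used to start the server by hand.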

## Safety & ops notes

- Prefer selector-based extraction to minimize data volume.
- Treat scraping as an external dependency: handle timeouts, retries, and failures explicitly.
- For aggressive bot protection, consider switching fetchers or using `playwright`.
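
The timeout, retry, and fetcher-switching advice can be combined in a small wrapper around the CLI. This is a sketch under stated assumptions: the function names, attempt count, timeout, and linear backoff are all illustrative choices, and it assumes `scrapling` exits with a non-zero status on failure.

```python
import subprocess
import time


def run_with_retries(argv, attempts=3, timeout=60.0, backoff=2.0, runner=subprocess.run):
    """Run argv until it exits 0; return True on success, False once attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            result = runner(argv, timeout=timeout)
            if result.returncode == 0:
                return True
        except subprocess.TimeoutExpired:
            pass  # treat a timeout like any other failed attempt
        if attempt < attempts:
            time.sleep(backoff * attempt)  # linear backoff between attempts
    return False


def extract_with_fallback(url, out_file, css_selector, **kwargs):
    """Try the plain `get` mode first, then the browser-backed `fetch` mode."""
    for mode in ("get", "fetch"):
        argv = ["scrapling", "extract", mode, url, out_file, "--css-selector", css_selector]
        if run_with_retries(argv, **kwargs):
            return mode  # report which mode succeeded
    return None  # both modes failed; the caller decides what to do next
```

Returning the mode that succeeded makes it easy to log which pages needed the heavier browser-backed path. A missing `scrapling` executable raises `FileNotFoundError` rather than being retried, which surfaces installation problems immediately.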
