doc-scraper
Generic web scraper for extracting and organizing Snowflake documentation with intelligent caching and configurable spider depth. Scrapes any section of docs.snowflake.com controlled by --base-path.
Best use case
doc-scraper is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using doc-scraper should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/doc-scraper/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How doc-scraper Compares
| Feature / Agent | doc-scraper | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Generic web scraper for extracting and organizing Snowflake documentation with intelligent caching and configurable spider depth. Scrapes any section of docs.snowflake.com controlled by --base-path.
SKILL.md Source
# Snowflake Documentation Scraper
Scrapes docs.snowflake.com sections to Markdown with SQLite caching (7-day expiration).
## Usage
**First-time setup** (auto-installs uv and doc-scraper):
```bash
python3 .claude/skills/doc-scraper/scripts/doc_scraper.py
```
**Subsequent runs:**
```bash
doc-scraper --output-dir=./snowflake-docs
doc-scraper --output-dir=./snowflake-docs --base-path="/en/sql-reference/"
doc-scraper --output-dir=./snowflake-docs --spider-depth=2
```
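When the scraper is driven from a script rather than typed by hand, the invocations above can be assembled programmatically. The following is a minimal sketch, not part of the skill itself: `build_command` is a hypothetical helper, and it assumes every value-taking option accepts the `--option=value` form shown above (the source only demonstrates that form for `--output-dir`, `--base-path`, and `--spider-depth`).

```python
import shlex


def build_command(output_dir, base_path=None, spider_depth=None,
                  limit=None, dry_run=False):
    """Assemble a doc-scraper argv list (hypothetical helper, not part of the skill)."""
    cmd = ["doc-scraper", f"--output-dir={output_dir}"]
    if base_path is not None:
        cmd.append(f"--base-path={base_path}")
    if spider_depth is not None:
        cmd.append(f"--spider-depth={spider_depth}")
    if limit is not None:
        # Assumption: --limit takes its value in the same --option=value form.
        cmd.append(f"--limit={limit}")
    if dry_run:
        cmd.append("--dry-run")
    return cmd


cmd = build_command("./snowflake-docs", base_path="/en/sql-reference/",
                    limit=5, dry_run=True)
# Print the shell-quoted command for review before running it.
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell-quoting problems if the output directory contains spaces.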
## Command Options
| Option           | Default           | Description                                                                              |
| ---------------- | ----------------- | ---------------------------------------------------------------------------------------- |
| `--output-dir`   | **Required**      | Output directory for scraped docs                                                        |
| `--base-path`    | `/en/migrations/` | URL section to scrape                                                                    |
| `--spider-depth` | `1`               | Link depth: 0 = seed pages only, 1 = seeds plus linked pages, 2 = plus second-level links |
| `--limit`        | None              | Cap the number of URLs (for testing)                                                     |
| `--dry-run`      | -                 | Preview without writing files                                                            |
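To make the `--spider-depth` values concrete, here is a small sketch of a depth-limited breadth-first traversal over an in-memory link graph. It illustrates the general technique only; the scraper's actual crawl logic is not shown in this document, and `crawl_order`, `LINKS`, and the page paths below are made up for the example.

```python
from collections import deque


def crawl_order(seeds, links, max_depth):
    """Return {url: depth} for every URL within max_depth link hops of the seeds."""
    visited = {}
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if url in visited:
            continue  # already reached by a path that is at least as short
        visited[url] = depth
        if depth < max_depth:
            for target in links.get(url, []):
                if target not in visited:
                    queue.append((target, depth + 1))
    return visited


# Hypothetical link graph: one seed page, two linked pages, one second-level page.
LINKS = {
    "/en/migrations/": ["/en/migrations/a", "/en/migrations/b"],
    "/en/migrations/a": ["/en/migrations/a1"],
    "/en/migrations/b": [],
}

for depth in (0, 1, 2):
    print(depth, sorted(crawl_order(["/en/migrations/"], LINKS, depth)))
```

In this toy graph, depth 0 yields only the seed, depth 1 adds the two pages it links to, and depth 2 also picks up the page linked from one of those.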
## Output
```text
output-dir/
├── SKILL.md # Auto-generated index
├── scraper_config.yaml # Editable config (auto-created)
├── .cache/ # SQLite cache (auto-managed)
└── en/migrations/*.md # Scraped pages with frontmatter
```
## Configuration
Auto-created at `{output-dir}/scraper_config.yaml`:
```yaml
rate_limiting:
  max_concurrent_threads: 4
spider:
  max_pages: 1000
  allowed_paths: ["/en/"]
scraped_pages:
  expiration_days: 7
```
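The `expiration_days` setting implies that a cached page is reused only while it is younger than seven days. The sketch below shows one way such a check could work with Python's built-in `sqlite3` module. The table name, columns, and helper functions are assumptions for illustration; the real layout of the files under `.cache/` is not documented here.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

EXPIRATION_DAYS = 7  # mirrors scraped_pages.expiration_days in the config


def open_cache(path=":memory:"):
    """Open a cache database with a hypothetical one-table schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages "
        "(url TEXT PRIMARY KEY, body TEXT, fetched_at TEXT)"
    )
    return conn


def store(conn, url, body, fetched_at):
    """Insert or overwrite a page, recording when it was fetched."""
    conn.execute(
        "INSERT OR REPLACE INTO pages VALUES (?, ?, ?)",
        (url, body, fetched_at.isoformat()),
    )


def load_fresh(conn, url, now, expiration_days=EXPIRATION_DAYS):
    """Return the cached body, or None if it is missing or too old."""
    row = conn.execute(
        "SELECT body, fetched_at FROM pages WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        return None
    body, fetched_at = row
    age = now - datetime.fromisoformat(fetched_at)
    return body if age < timedelta(days=expiration_days) else None


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
conn = open_cache()
store(conn, "/en/migrations/", "# Migrations", now - timedelta(days=3))
store(conn, "/en/sql-reference/", "# SQL", now - timedelta(days=8))
print(load_fresh(conn, "/en/migrations/", now))     # fetched 3 days ago: still fresh
print(load_fresh(conn, "/en/sql-reference/", now))  # fetched 8 days ago: expired
```

Passing `now` in explicitly, rather than reading the clock inside `load_fresh`, keeps the expiry rule easy to test.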
## Troubleshooting
| Issue | Solution |
| ---------------- | ------------------------------------- |
| Too many pages | Lower `--spider-depth` or edit config |
| Missing pages | Increase `--spider-depth` |
| Cache corruption | Delete `{output-dir}/.cache/` (rare) |
Related Skills
task-master
AI-powered task management for structured, specification-driven development. Use this skill when you need to manage complex projects with PRDs, break down tasks into subtasks, track dependencies, and maintain organized development workflows across features and branches.
task-master-viewer
Launch a Streamlit GUI for Task Master tasks.json editing. Use when users want a visual interface instead of CLI/MCP commands.
task-master-install
Install and initialize task-master for AI-powered task management and specification-driven development. Use this skill when users ask you to parse a new PRD, when starting a new project that needs structured task management, when users mention wanting task breakdown or project planning, or when implementing specification-driven development workflows.
streamlit-development
Developing, testing, and deploying Streamlit data applications on Snowflake. Use this skill when you're building interactive data apps, setting up local development environments, testing with pytest or Playwright, or deploying apps to Snowflake using Streamlit in Snowflake.
snowflake-connections
Configuring Snowflake connections using connections.toml (for Snowflake CLI, Streamlit, Snowpark) or profiles.yml (for dbt) with multiple authentication methods (SSO, key pair, username/password, OAuth), managing multiple environments, and overriding settings with environment variables. Use this skill when setting up Snowflake CLI, Streamlit apps, dbt, or any tool requiring Snowflake authentication and connection management.
snowflake-cli
Executing SQL, managing Snowflake objects, deploying applications, and orchestrating data pipelines using the Snowflake CLI (snow) command. Use this skill when you need to run SQL scripts, deploy Streamlit apps, execute Snowpark procedures, manage stages, automate Snowflake operations from CI/CD pipelines, or work with variables and templating.
skills-sync
Manage and synchronize AI agent skills from local SKILL.md files and remote Git repositories, generating Cursor rules with Agent Skills specification XML. This skill should be used when users need to sync skills, add/remove skill repositories, or set up the skills infrastructure.
schemachange
Deploying and managing Snowflake database objects using version control with schemachange. Use this skill when you need to manage database migrations for objects not handled by dbt, implement CI/CD pipelines for schema changes, or coordinate deployments across multiple environments.
playwright-mcp
Browser testing, web scraping, and UI validation using Playwright MCP. Use this skill when you need to test Streamlit apps, validate web interfaces, test responsive design, check accessibility, or automate browser interactions through MCP tools.
devcontainer-setup
Create Universal DevContainers optimized for AI agentic workflows with Claude Code, Snowflake CLI, Cortex Code, and dbt. Use when setting up development containers, configuring devcontainer.json, scaffolding AI-ready environments, or when the user mentions devcontainers, containerized development, or Docker development environments.
dbt-testing
dbt testing strategies using dbt_constraints for database-level enforcement, generic tests, and singular tests. Use this skill when implementing data quality checks, adding primary/foreign key constraints, creating custom tests, or establishing comprehensive testing frameworks across bronze/silver/gold layers.
dbt-projects-snowflake-setup
Step-by-step setup guide for dbt Projects on Snowflake including prerequisites, external access integration, Git API integration, event table configuration, and automated scheduling. Use this skill when setting up dbt Projects on Snowflake for the first time or troubleshooting setup issues.