seo-firecrawl

Full-site crawling, scraping, and site mapping via Firecrawl MCP. Use when user says "crawl site", "map site", "full crawl", "find all pages", "broken links", "site structure", "discover pages", "JS rendering", or needs site-wide analysis.

4,023 stars

Best use case

seo-firecrawl is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Full-site crawling, scraping, and site mapping via Firecrawl MCP. Use when user says "crawl site", "map site", "full crawl", "find all pages", "broken links", "site structure", "discover pages", "JS rendering", or needs site-wide analysis.

Teams using seo-firecrawl should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/seo-firecrawl/SKILL.md --create-dirs "https://raw.githubusercontent.com/AgriciDaniel/claude-seo/main/extensions/firecrawl/skills/seo-firecrawl/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/seo-firecrawl/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How seo-firecrawl Compares

Feature / Agentseo-firecrawlStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Full-site crawling, scraping, and site mapping via Firecrawl MCP. Use when user says "crawl site", "map site", "full crawl", "find all pages", "broken links", "site structure", "discover pages", "JS rendering", or needs site-wide analysis.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Firecrawl Extension for Claude SEO

This skill requires the Firecrawl extension to be installed:
```bash
./extensions/firecrawl/install.sh
```

**Check availability:** Before using any Firecrawl tool, verify the MCP server
is connected by checking if `firecrawl_scrape` or any Firecrawl tool
is available. If tools are not available, inform the user the extension is not
installed and provide install instructions.

## Quick Reference

| Command | Purpose |
|---------|---------|
| `/seo firecrawl crawl <url>` | Full-site crawl with content extraction |
| `/seo firecrawl map <url>` | Discover site structure (URLs only, fast) |
| `/seo firecrawl scrape <url>` | Single-page scrape with JS rendering |
| `/seo firecrawl search <query> <url>` | Search within a crawled site |

## Commands

### crawl -- Full-Site Crawl

Crawl an entire website starting from the given URL. Returns page content,
metadata, and links for all discovered pages.

**MCP Tool:** `firecrawl_crawl`

**Parameters:**
- `url` (required): Starting URL to crawl
- `limit`: Max pages to crawl (default: 100, max: 500)
- `maxDepth`: Max link depth from start URL (default: 3)
- `includePaths`: Array of glob patterns to include (e.g., `["/blog/*"]`)
- `excludePaths`: Array of glob patterns to exclude (e.g., `["/admin/*", "/api/*"]`)
- `scrapeOptions.formats`: Output formats -- `["markdown", "html", "links"]`

**SEO Usage Patterns:**
1. **Comprehensive audit crawl**: Crawl full site, extract all pages for subagent analysis
2. **Section-focused crawl**: Use `includePaths` to audit only `/blog/*` or `/products/*`
3. **Broken link detection**: Crawl with `["links"]` format, check all hrefs for 404s
4. **Content inventory**: Extract all page titles, meta descriptions, H1s at scale
5. **SPA/JS-rendered sites**: Firecrawl renders JavaScript, solving the Issue #11 problem

**Example orchestration for `/seo audit`:**
```
1. firecrawl_map(url) -> get all URLs (fast, no content)
2. Filter to top 50 most important pages (homepage, key sections)
3. firecrawl_crawl(url, limit=50) -> get full content
4. Feed content to seo-technical, seo-content, seo-schema agents
```

**Cost awareness:**
- Free tier: 500 credits/month
- 1 credit = 1 page crawled or scraped
- Map operations are cheaper (0.5 credits per URL discovered)
- Always inform user of estimated credit usage before large crawls

### map -- Site Structure Discovery

Discover all URLs on a website without fetching content. Fast and credit-efficient.

**MCP Tool:** `firecrawl_map`

**Parameters:**
- `url` (required): Website URL to map
- `limit`: Max URLs to discover (default: 5000)
- `search`: Optional search term to filter URLs

**SEO Usage Patterns:**
1. **Sitemap comparison**: Map site, compare discovered URLs vs XML sitemap
2. **Orphan page detection**: URLs in sitemap but not linked from any page
3. **Crawl budget analysis**: Total indexable pages vs pages linked from homepage
4. **URL pattern analysis**: Identify URL structure patterns, duplicates, parameter bloat
5. **Pre-audit discovery**: Run map first, then targeted crawl on key sections

**Output:** Array of URLs. Present as:
```
Site: example.com
Pages discovered: 342

URL Pattern Breakdown:
  /blog/*          - 128 pages (37%)
  /products/*      - 89 pages (26%)
  /category/*      - 45 pages (13%)
  /pages/*         - 32 pages (9%)
  / (root pages)   - 48 pages (14%)
```

### scrape -- Single-Page Deep Scrape

Scrape a single page with full JavaScript rendering. More thorough than
`fetch_page.py` because it executes JS and waits for dynamic content.

**MCP Tool:** `firecrawl_scrape`

**Parameters:**
- `url` (required): Page URL to scrape
- `formats`: Output formats -- `["markdown", "html", "links", "screenshot"]`
- `onlyMainContent`: Strip nav/footer/sidebar (default: true)
- `waitFor`: CSS selector or milliseconds to wait for content
- `timeout`: Request timeout in ms (default: 30000)
- `actions`: Browser actions before scraping (click, scroll, wait)

**SEO Usage Patterns:**
1. **SPA content extraction**: Scrape JS-rendered React/Vue/Angular pages
2. **Dynamic content audit**: Pages with lazy-loaded content below the fold
3. **Paywall/login detection**: Identify content behind authentication walls
4. **Main content extraction**: Use `onlyMainContent` for clean E-E-A-T analysis
5. **Screenshot capture**: Use `screenshot` format for visual analysis

**When to use scrape vs fetch_page.py:**
| Scenario | Use |
|----------|-----|
| Static HTML page | `fetch_page.py` (no API cost) |
| JS-rendered SPA | `firecrawl_scrape` (renders JS) |
| Need response headers | `fetch_page.py` (returns headers) |
| Need clean markdown | `firecrawl_scrape` (better extraction) |
| Rate-limited/blocked | `firecrawl_scrape` (handles anti-bot) |

### search -- Site-Scoped Search

Search within a website for specific content. Useful for finding pages
related to a topic without crawling everything.

**MCP Tool:** `firecrawl_search`

**Parameters:**
- `query` (required): Search query
- `url` (required): Website to search within
- `limit`: Max results (default: 10)
- `scrapeOptions.formats`: Output format for matched pages

**SEO Usage Patterns:**
1. **Content gap validation**: Search for a keyword on the site to check if content exists
2. **Internal linking opportunities**: Find pages mentioning a topic that could link to each other
3. **Duplicate content detection**: Search for key phrases to find near-duplicates
4. **Competitor content research**: Search competitor site for specific topics

## Cross-Skill Integration

### With seo-audit (full audit)
When Firecrawl is available during `/seo audit`:
1. Use `firecrawl_map` to discover all site URLs
2. Compare with XML sitemap (seo-sitemap) to find orphan/missing pages
3. Select top pages for deep analysis
4. Feed crawled content to all subagents (technical, content, schema, geo)
5. Report total crawlable pages, URL patterns, and crawl depth

### With seo-technical
- Broken link detection: crawl all internal links, check for 404s
- Redirect chain mapping: follow all redirects, flag chains > 2 hops
- Mixed content detection: check HTTP resources on HTTPS pages
- Canonical verification: compare canonical URLs with actual URLs

### With seo-sitemap
- Sitemap coverage: % of crawled pages present in sitemap
- Orphan pages: pages found by crawl but missing from sitemap
- Stale sitemap entries: URLs in sitemap that return 404/410

### With seo-content
- Content extraction: feed clean markdown to E-E-A-T analysis
- Thin content detection: identify pages with < 300 words at scale
- Duplicate content: compare content across pages for near-duplicates

### With seo-schema
- Schema extraction: pull JSON-LD from all crawled pages
- Schema coverage: % of pages with structured data
- Schema validation: batch-validate extracted schemas

## Error Handling

| Error | Cause | Resolution |
|-------|-------|-----------|
| `FIRECRAWL_API_KEY not set` | MCP not configured | Run `./extensions/firecrawl/install.sh` |
| `402 Payment Required` | Credits exhausted | Check usage at firecrawl.dev/app, upgrade plan |
| `429 Too Many Requests` | Rate limited | Wait 60s, reduce crawl concurrency |
| `408 Timeout` | Page too slow to render | Increase `timeout`, try without JS rendering |
| `403 Forbidden` | Site blocks crawling | Check robots.txt, may need to skip this site |

**Graceful fallback:** If Firecrawl is unavailable, inform the user and suggest:
1. Use `fetch_page.py` for single-page analysis (no API cost)
2. Use `WebFetch` tool for basic HTML retrieval
3. Install Firecrawl: `./extensions/firecrawl/install.sh`

Related Skills

seo

4023
from AgriciDaniel/claude-seo

Comprehensive SEO analysis for any website or business type. Full site audits, single-page analysis, technical SEO (crawlability, indexability, Core Web Vitals with INP), schema markup, content quality (E-E-A-T), image optimization, sitemap analysis, and GEO for AI Overviews/ChatGPT/Perplexity. Industry detection for SaaS, e-commerce, local, publishers, agencies. Triggers on: SEO, audit, schema, Core Web Vitals, sitemap, E-E-A-T, AI Overviews, GEO, technical SEO, content quality, page speed, structured data.

seo-technical

4023
from AgriciDaniel/claude-seo

Technical SEO audit across 9 categories: crawlability, indexability, security, URL structure, mobile, Core Web Vitals, structured data, JavaScript rendering, and IndexNow protocol. Use when user says "technical SEO", "crawl issues", "robots.txt", "Core Web Vitals", "site speed", or "security headers".

seo-sitemap

4023
from AgriciDaniel/claude-seo

Analyze existing XML sitemaps or generate new ones with industry templates. Validates format, URLs, and structure. Use when user says "sitemap", "generate sitemap", "sitemap issues", or "XML sitemap".

seo-schema

4023
from AgriciDaniel/claude-seo

Detect, validate, and generate Schema.org structured data. JSON-LD format preferred. Use when user says "schema", "structured data", "rich results", "JSON-LD", or "markup".

seo-programmatic

4023
from AgriciDaniel/claude-seo

Programmatic SEO planning and analysis for pages generated at scale from data sources. Covers template engines, URL patterns, internal linking automation, thin content safeguards, and index bloat prevention. Use when user says "programmatic SEO", "pages at scale", "dynamic pages", "template pages", "generated pages", or "data-driven SEO".

seo-plan

4023
from AgriciDaniel/claude-seo

Strategic SEO planning for new or existing websites. Industry-specific templates, competitive analysis, content strategy, and implementation roadmap. Use when user says "SEO plan", "SEO strategy", "SEO planning", "content strategy", "keyword strategy", "content calendar", "site architecture", or "SEO roadmap".

seo-page

4023
from AgriciDaniel/claude-seo

Deep single-page SEO analysis covering on-page elements, content quality, technical meta tags, schema, images, and performance. Use when user says "analyze this page", "check page SEO", "single URL", "check this page", "page analysis", or provides a single URL for review.

seo-maps

4023
from AgriciDaniel/claude-seo

Maps intelligence for local SEO — geo-grid rank tracking, GBP profile auditing via API, review intelligence across Google/Tripadvisor/Trustpilot, cross-platform NAP verification (Google/Bing/Apple/OSM), competitor radius mapping, and LocalBusiness schema generation from API data. Three-tier capability: free (Overpass + Geoapify), DataForSEO (full intelligence), DataForSEO + Google (maximum coverage). Use when user says "maps", "geo-grid", "rank tracking", "GBP audit", "review velocity", "competitor radius", "maps analysis", "local rank tracking", "Share of Local Voice", or "SoLV".

seo-local

4023
from AgriciDaniel/claude-seo

Local SEO analysis covering Google Business Profile optimization, NAP consistency, citation health, review signals, local schema markup, location page quality, multi-location SEO, and industry-specific recommendations. Detects business type (brick-and-mortar, SAB, hybrid) and industry vertical (restaurant, healthcare, legal, home services, real estate, automotive). Use when user says "local SEO", "Google Business Profile", "GBP", "map pack", "local pack", "citations", "NAP consistency", "local rankings", "service area", "multi-location", or "local search".

seo-images

4023
from AgriciDaniel/claude-seo

Image optimization analysis for SEO and performance. Checks alt text, file sizes, formats, responsive images, lazy loading, and CLS prevention. Use when user says "image optimization", "alt text", "image SEO", "image size", or "image audit".

seo-hreflang

4023
from AgriciDaniel/claude-seo

Hreflang and international SEO audit, validation, and generation. Detects common mistakes, validates language/region codes, and generates correct hreflang implementations. Use when user says "hreflang", "i18n SEO", "international SEO", "multi-language", "multi-region", or "language tags".

seo-google

4023
from AgriciDaniel/claude-seo

Google SEO APIs: Search Console (Search Analytics, URL Inspection, Sitemaps), PageSpeed Insights v5, CrUX field data with 25-week history, Indexing API v3, and GA4 organic traffic. Provides real Google field data for Core Web Vitals, indexation status, search performance, and organic traffic trends. Use when user says "search console", "GSC", "PageSpeed", "CrUX", "field data", "indexing API", "GA4 organic", "URL inspection", "google api setup", "real CWV data", "impressions", "clicks", "CTR", "position data", "LCP", "INP", "CLS", "FCP", "TTFB", or "Lighthouse scores".