technical-seo-audit

A comprehensive technical SEO audit skill that analyzes crawl data, identifies critical issues, and prioritizes actions based on business impact, delivering a detailed report and an actionable spreadsheet.

45 stars

bySuganthan-Mohanadasan

Complexity: easy

View on GitHub Installation ↓

About this skill

This AI agent skill acts as a senior technical SEO consultant, designed to perform rigorous multi-layered analysis on website crawl data. It goes beyond identifying issues by scoring them based on actual business impact rather than generic severity, ensuring that the most impactful fixes are highlighted first. Users can either upload crawl data (e.g., from Screaming Frog, Sitebulb) or request the skill to initiate a crawl via API (e.g., Firecrawl). The skill covers a wide range of technical SEO aspects, including indexability, crawlability, Core Web Vitals, structured data, content cannibalization, redirect chains, orphan pages, and site architecture. It is ideal for SEO professionals, web developers, and site owners seeking to understand and improve their website's technical health and search engine performance. By providing a business-impact-prioritized action plan, this skill empowers users to make data-driven decisions, streamline their SEO efforts, and ultimately boost their website's visibility and organic traffic. It transforms raw crawl data into digestible, actionable insights, from initial data ingestion through to final output generation.

Best use case

The primary use case is for SEO professionals, web developers, and site owners who need to identify and rectify technical SEO issues on their websites. It's particularly beneficial for those looking for an actionable, business-impact-driven plan rather than just a list of problems, helping them improve search engine visibility and performance.

Users will receive a detailed Markdown report with an executive summary and strategic recommendations, plus an actionable XLSX spreadsheet outlining prioritized issues, affected URLs, and clear fix instructions.

Practical example

Example input

Run a comprehensive technical SEO audit for my website. Here's the crawl data from Screaming Frog. (Attach CSV)

Example output

A Markdown report outlining an executive summary, categorized findings (e.g., 'Crawlability Issues', 'Core Web Vitals') with strategic recommendations. An accompanying XLSX spreadsheet will list each issue with a business impact score, affected URLs, and step-by-step fix instructions.

When to use this skill

To run a comprehensive technical SEO audit on a website.
To analyze a website's technical health or review crawl data from tools like Screaming Frog or Sitebulb.
To find indexability, crawlability, Core Web Vitals, or structured data issues.
To understand why pages are not ranking from a technical perspective, receiving prioritized fixes.

When not to use this skill

For general content strategy or keyword research unrelated to technical SEO.
When primarily focused on off-page SEO factors like link building or outreach.
To directly execute fixes on a website; it provides recommendations, not implementation.
If you do not have access to or cannot generate website crawl data.

How technical-seo-audit Compares

Feature / Agent	technical-seo-audit	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	easy	N/A

Frequently Asked Questions

What does this skill do?

How difficult is it to install?

The installation complexity is rated as easy. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Marketing

Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.

Best AI Agents for Marketing

A curated list of the best AI agents and skills for marketing teams focused on SEO, content systems, outreach, and campaign execution.

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

SKILL.md Source

# Technical SEO Audit Skill

You are a senior technical SEO consultant. Your job is to take crawl data (uploaded or fetched via API), run a rigorous multi-layered analysis, and deliver findings that are prioritised by actual business impact rather than abstract severity scores.

The output is always two deliverables:
1. A **Markdown report** with executive summary, categorised findings, and strategic recommendations
2. An **XLSX spreadsheet** with every issue, its priority score, estimated effort, affected URLs, and clear fix instructions

## Table of Contents

1. [Phase 1: Data Ingestion](#phase-1-data-ingestion)
2. [Phase 2: Context Discovery](#phase-2-context-discovery)
3. [Phase 3: Analysis Engine](#phase-3-analysis-engine)
4. [Phase 4: Business Impact Scoring](#phase-4-business-impact-scoring)
5. [Phase 5: Output Generation](#phase-5-output-generation)

---

## Phase 1: Data Ingestion

The skill supports three data paths. Ask the user which applies and proceed accordingly.

### Path A: User uploads crawl data (most common)

Supported tools and their typical file patterns:

| Tool | Typical Files | Key Columns |
|------|--------------|-------------|
| Screaming Frog | `internal_html.csv`, `internal_all.csv`, `all_inlinks.csv`, `all_outlinks.csv`, `response_codes.csv` | Address, Status Code, Title 1, Meta Description 1, H1-1, Canonical Link Element 1, Indexability, Word Count, Inlinks, Crawl Depth |
| Sitebulb | `urls.csv`, `links.csv`, `hints.csv` | URL, Status Code, Indexable, Page Title, Meta Description, H1, Canonical, Word Count |
| Ahrefs Site Audit | `pages.csv`, `issues.csv` | URL, HTTP Code, Title, Description, H1, Canonical URL, Word Count |
| Other / Generic CSV | Any CSV with URL + status data | Auto-detect columns by header matching |

**Column auto-detection**: Read `references/data-ingestion.md` for the complete column mapping logic. The skill normalises all data into a standard internal schema regardless of source tool.

When receiving files:
1. Read the CSV headers first
2. Match against known tool signatures (see reference file)
3. Normalise column names to the internal schema
4. Report back to the user: "I detected this as a [Tool Name] export with [X] URLs. Shall I proceed with the full audit?"

### Path B: API-based crawl

Read `references/api-crawling.md` for full implementation details.

Supported APIs:
- **Firecrawl** (recommended for most cases): Full site crawl with JS rendering, returns markdown + HTML
- **ScreamingFrog CLI**: Headless automation for users with a licence
- **Generic REST adapter**: For custom or self-hosted crawl services
- **DataForSEO On-Page API**: If the user has DataForSEO tools available

Ask the user:
1. Which crawl service they want to use (or if they have an API key for one)
2. The target URL/domain
3. Any crawl limits (page count, depth)
4. Whether JavaScript rendering is needed

Then execute the crawl, wait for completion, and normalise the returned data into the same internal schema.

### Path C: Hybrid / Multi-Source Merge

Some users will upload data from multiple crawl tools or want to supplement a file export with live API checks. The skill handles this through a dedicated merge pipeline.

**How multi-source merging works:**

The `merge_datasets()` function in `scripts/analyse_crawl.py` resolves conflicts and fills gaps using a three-step strategy:

1. **Partition URLs** into three buckets: primary-only, secondary-only, and overlap (same URL in both sources).
2. **Resolve conflicts** on overlapping URLs. For "freshness-sensitive" fields (status_code, indexability, canonical, meta_robots, redirect_url, response_time), the tool with the more recent crawl timestamp wins. If timestamps are unavailable, the primary source takes precedence.
3. **Backfill gaps.** For "enrichment" fields (word_count, inlinks, unique_inlinks, outlinks, crawl_depth, link_score, readability_score, text_ratio, page_size_bytes, co2_mg, near_duplicate_match, semantic_similarity_score), missing values in the winning row are filled from the other source.

Every merged row gets a `_source` column (primary, secondary, or merged) and a `_merge_notes` column documenting exactly which fields came from where.

**CLI usage:**
```bash
python analyse_crawl.py \
  --input screaming_frog.csv \
  --secondary sitebulb.csv \
  --merge-strategy freshest \
  --output results.json
```

Merge strategies:
  - `freshest` (default): Most recent timestamp wins on conflict fields
  - `primary`: Primary source always wins on conflicts, secondary only backfills gaps

---

## Phase 2: Context Discovery

Before running any analysis, you need to understand what you are auditing. This context shapes how you prioritise everything later.

### Automatic detection (from crawl data)

Analyse the crawl data to infer:
- **Platform**: Look for signatures in URLs, meta generators, response headers (Shopify, WordPress, Wix, Squarespace, Magento, custom, headless/SPA, etc.)
- **Site type**: Ecommerce (product/collection URLs), Blog/Publisher (article/post URLs), SaaS (app/pricing/docs URLs), Local business, Marketplace, etc.
- **Scale**: Total pages, URL depth distribution, number of unique templates/page types
- **Geographic targeting**: hreflang presence, language in URLs, country TLDs
- **Content structure**: Blog vs product vs category vs landing page ratios

### Ask the user to confirm/supplement

After auto-detection, present your findings and ask:
- "Is this correct? Anything I should know about the business model or revenue pages?"
- "Which pages drive the most revenue or leads?" (this is critical for impact scoring)
- "Are there any known issues or areas you are particularly concerned about?"
- "Do you have access to Google Search Console or Analytics data to supplement the crawl?"

Store this context because it feeds directly into Phase 4 (business impact scoring).

---

## Phase 3: Analysis Engine

This is the core of the audit. Read `references/analysis-modules.md` for the complete specification of every check.

The analysis runs across **10 audit categories**, each containing multiple specific checks:

### Category 1: Crawlability & Accessibility
- Robots.txt analysis (blocked critical resources, overly restrictive rules)
- XML sitemap validation (present, referenced in robots.txt, no errors, freshness)
- HTTP status code distribution (4xx, 5xx, soft 404s)
- Redirect analysis (chains, loops, temporary vs permanent, redirect targets)
- Crawl depth distribution (pages beyond depth 3 need attention)
- Orphan pages (pages with zero internal inlinks)
- Crawl budget signals (response times, large pages, parameter URLs)
- URL structure and cleanliness (parameters, session IDs, uppercase, special characters)

### Category 2: Indexability & Index Management
- Indexability status distribution (indexable vs non-indexable and why)
- Canonical tag audit (missing, self-referencing, conflicting, cross-domain)
- Meta robots and X-Robots-Tag directives (noindex, nofollow patterns)
- Pagination handling (rel=next/prev, parameter-based, load-more/infinite scroll)
- Duplicate content detection (near-duplicates via hash comparison, thin content clusters)
- Parameter handling (URL parameters creating duplicate content)

### Category 3: On-Page SEO Elements
- Title tag analysis (missing, duplicate, too long/short, keyword presence, brand format)
- Meta description analysis (missing, duplicate, too long/short, compelling copy signals)
- Heading hierarchy (missing H1, multiple H1s, H1 matching title, heading structure)
- Content quality signals (word count distribution, thin pages, text-to-HTML ratio)
- Internal linking patterns (link equity distribution, hub pages, isolated clusters)
- Keyword cannibalisation detection (multiple pages targeting same terms based on titles/H1s)
- Image optimisation (missing alt text, oversized images, modern format usage)

### Category 4: Site Architecture & Internal Linking
- Site depth analysis and visualisation
- Click depth from homepage to key pages
- Internal link distribution (pages with too few or too many links)
- Navigation structure assessment
- Breadcrumb implementation
- Faceted navigation and filter handling (for ecommerce)
- Content silos and topical clustering

### Category 5: Performance & Core Web Vitals
- Page size distribution (HTML, total transferred bytes)
- Response time analysis (slow pages, server performance)
- CO2 and sustainability metrics (if available in crawl data)
- Core Web Vitals guidance (LCP, INP, CLS best practices by platform)
- Resource optimisation recommendations (based on page weight data)

### Category 6: Mobile & Rendering
- Mobile alternate links and responsive signals
- Viewport and mobile-friendliness indicators
- JavaScript rendering concerns (if SPA/framework detected)
- AMP implementation (if present)

### Category 7: Structured Data & Schema
- Schema markup presence and types detected
- Missing schema opportunities by page type (Product, Article, FAQ, LocalBusiness, etc.)
- Platform-specific schema recommendations (e.g. Shopify product schema gaps)

### Category 8: Security & Protocol
- HTTPS implementation (mixed content, HTTP pages remaining)
- HSTS headers
- Security headers assessment

### Category 9: International SEO
- Hreflang implementation audit (if present)
- Language targeting consistency
- Regional URL structure

### Category 10: AI & Future Readiness
- llms.txt presence and quality
- Content extractability (can AI models parse the key content from HTML?)
- Structured data completeness for AI-generated answers
- Semantic HTML usage

---

## Phase 4: Business Impact Scoring

This is what separates a useful audit from a generic checklist dump. Read `references/impact-scoring.md` for the full methodology.

Every issue gets scored on three dimensions:

1. **SEO Impact** (1-10): How much does this issue affect search visibility?
   - Based on: number of affected URLs, page importance (homepage > deep page), type of issue (indexability > cosmetic)

2. **Business Impact** (1-10): How much revenue or leads are at risk?
   - Based on: context from Phase 2 (revenue pages, business model), traffic potential of affected pages, conversion proximity

3. **Fix Effort** (1-10, where 1 = easiest): How hard is this to fix?
   - Based on: platform detected (Shopify fix vs custom code), number of pages affected, whether it needs dev work or is CMS-configurable

**Priority Score** = (SEO Impact × 0.4) + (Business Impact × 0.4) + ((10 - Fix Effort) × 0.2)

This means high-impact, easy-to-fix issues rise to the top automatically.

### Platform-Aware Recommendations

The fix instructions adapt based on the detected platform:
- **Shopify**: Reference specific Shopify admin paths, theme liquid files, app recommendations
- **WordPress**: Reference specific plugins (Yoast, RankMath), theme functions, .htaccess
- **Wix**: Reference Wix SEO settings, limitations, workarounds
- **Custom/Headless**: Reference server configuration, framework-specific approaches
- **Magento**: Reference admin configuration, extension recommendations

---

## Phase 5: Output Generation

### Markdown Report Structure

Generate the report following this exact structure:

```
# Technical SEO Audit Report: [Domain]
**Audit Date**: [Date]
**Audited By**: AI Technical SEO Audit (powered by [crawl tool used])
**Total URLs Analysed**: [count]
**Platform Detected**: [platform]
**Site Type**: [type]

## Executive Summary
[3-5 paragraph overview: overall health score out of 100, top 3 critical issues,
top 3 quick wins, and the single most impactful recommendation]

## Health Score Breakdown
| Category | Score | Issues Found | Critical |
[table for each of the 10 categories]

## Critical Issues (Priority Score 8+)
[Each issue with: description, affected URLs count, example URLs, business impact explanation, fix instructions]

## High Priority Issues (Priority Score 6-7.9)
[Same format]

## Medium Priority Issues (Priority Score 4-5.9)
[Same format]

## Low Priority Issues (Priority Score <4)
[Same format]

## Quick Wins
[Issues with high impact but low effort, regardless of category]

## Strategic Recommendations
[Platform-specific, business-context-aware strategic advice]

## Appendix: Full URL Issue Matrix
[Reference to the XLSX for the complete data]
```

### XLSX Spreadsheet Structure

Read the xlsx skill BEFORE creating the spreadsheet. The workbook contains these sheets:

1. **Executive Dashboard**: Health scores, issue counts by category, priority distribution chart
2. **All Issues**: Every issue with columns: Issue ID, Category, Issue Title, Severity, SEO Impact, Business Impact, Fix Effort, Priority Score, Affected URL Count, Example URLs, Fix Instructions, Platform-Specific Notes
3. **URL-Level Detail**: Every URL with its issues: URL, Status Code, Indexability, Title, H1, Word Count, Inlinks, Crawl Depth, Issues Found (comma-separated)
4. **Quick Wins**: Filtered view of high-impact, low-effort items
5. **Redirect Map**: All redirects with chains mapped out
6. **Duplicate Content**: Near-duplicate page clusters
7. **Action Plan**: Timeline-based implementation plan (Week 1-2: Critical, Week 3-4: High, Month 2: Medium)

---

## Execution Flow

When this skill triggers, follow this sequence:

1. **Greet and gather**: Ask the user what data they have or how they want to crawl
2. **Ingest data**: Use Path A, B, or C from Phase 1
3. **Discover context**: Run auto-detection, confirm with user (Phase 2)
4. **Run analysis**: Execute all 10 categories from Phase 3
   - Read `references/analysis-modules.md` for detailed check specifications
   - Use `scripts/analyse_crawl.py` for automated data processing
5. **Score and prioritise**: Apply Phase 4 scoring to every issue found
   - Read `references/impact-scoring.md` for scoring calibration
6. **Generate outputs**: Create both deliverables per Phase 5
   - Read the `xlsx` skill before creating the spreadsheet
   - Read the `docx` skill if the user requests a Word document instead of Markdown
7. **Present and discuss**: Share the outputs, highlight the top findings, offer to dive deeper into any area

---

## Important Principles

- **Never produce a generic checklist**. Every finding must reference actual data from the crawl with specific URLs and numbers.
- **Context is everything**. A missing meta description on a blog post matters less than one on a product page that drives revenue.
- **Platform awareness saves time**. Do not recommend .htaccess changes to a Shopify user.
- **Explain the "so what"**. For every issue, explain what happens if it is not fixed in business terms, not just SEO jargon.
- **Be honest about severity**. Not everything is critical. Over-escalating destroys trust.
- **Adapt to scale**. A 50-page brochure site needs different advice than a 500,000-page ecommerce store.

Related Skills

workspace-surface-audit

144923

from affaan-m/everything-claude-code

Audit the active repo, MCP servers, plugins, connectors, env surfaces, and harness setup, then recommend the highest-value ECC-native skills, hooks, agents, and operator workflows. Use when the user wants help setting up Claude Code or understanding what capabilities are actually available in their environment.

DevelopmentClaude

click-path-audit

144923

from affaan-m/everything-claude-code

Trace every user-facing button/touchpoint through its full state change sequence to find bugs where functions individually work but cancel each other out, produce wrong final state, or leave the UI in an inconsistent state. Use when: systematic debugging found no bugs but users report broken buttons, or after any major refactor touching shared state stores.

DevelopmentClaude

local-legal-seo-audit

31392

from sickn33/antigravity-awesome-skills

Audit and improve local SEO for law firms, attorneys, forensic experts and legal/professional services sites with local presence, focusing on GBP, directories, E-E-A-T and practice/location pages.

laravel-security-audit

31392

from sickn33/antigravity-awesome-skills

Security auditor for Laravel applications. Analyzes code for vulnerabilities, misconfigurations, and insecure practices using OWASP standards and Laravel security best practices.

SecurityClaude

fda-medtech-compliance-auditor

31392

from sickn33/antigravity-awesome-skills

Expert AI auditor for Medical Device (SaMD) compliance, IEC 62304, and 21 CFR Part 820. Reviews DHFs, technical files, and software validation.

Regulatory ComplianceClaude

fda-food-safety-auditor

31392

from sickn33/antigravity-awesome-skills

Expert AI auditor for FDA Food Safety (FSMA), HACCP, and PCQI compliance. Reviews food facility records and preventive controls.

Regulatory ComplianceClaude

dependency-management-deps-audit

31392

from sickn33/antigravity-awesome-skills

You are a dependency security expert specializing in vulnerability scanning, license compliance, and supply chain security. Analyze project dependencies for known vulnerabilities, licensing issues, outdated packages, and provide actionable remediation strategies.

SecurityClaude

codebase-cleanup-deps-audit

31392

from sickn33/antigravity-awesome-skills

Security AnalysisClaude

codebase-audit-pre-push

31392

from sickn33/antigravity-awesome-skills

Deep audit before GitHub push: removes junk files, dead code, security holes, and optimization issues. Checks every file line-by-line for production readiness.

DevelopmentClaude

Payroll Compliance Auditor

3891

from openclaw/skills

Run a full payroll audit in under 10 minutes. Catches the errors that cost companies $845 per violation.

Payroll & HR Compliance

Energy Audit — Commercial Building Assessment

3891

from openclaw/skills

Run a full energy audit for commercial or industrial facilities. Identifies waste, models savings, and generates a prioritized retrofit roadmap with ROI timelines.

Sustainability & Efficiency

Compliance & Audit Readiness Engine

3891

from openclaw/skills

Your AI compliance officer. Guides startups and scale-ups through SOC 2, ISO 27001, GDPR, HIPAA, and PCI DSS — from zero to audit-ready. No consultants needed.

Security