robots-txt

When the user wants to configure, audit, or optimize robots.txt. Also use when the user mentions "robots.txt," "crawler rules," "block crawlers," "AI crawlers," "GPTBot," "allow/disallow," "disallow path," "crawl directives," "user-agent," "block Googlebot," "fix robots.txt," "robots.txt blocking," or "search engine crawling." For page-level index control (noindex), use indexing.


Best use case

robots-txt is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using robots-txt should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/robots/SKILL.md --create-dirs "https://raw.githubusercontent.com/kostja94/marketing-skills/main/skills/seo/technical/robots/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/robots/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

SKILL.md Source

# SEO Technical: robots.txt

Guides configuration and auditing of robots.txt for search engine and AI crawler control.

**When invoking**: On **first use**, if helpful, open with 1–2 sentences on what this skill covers and why it matters, then provide the main output. On **subsequent use** or when the user asks to skip, go directly to the main output.

## Scope (Technical SEO)

- **Robots.txt**: Configure Disallow/Allow, Sitemap, Clean-param; audit for accidental blocks
- **Crawler access**: Path-level crawl control; AI crawler allow/block strategy
- **Differentiation**: robots.txt = crawl control (who accesses what paths); noindex = index control (what gets indexed). See **indexing** for page-level exclusions.

## Initial Assessment

**Check for project context first:** If `.claude/project-context.md` or `.cursor/project-context.md` exists, read it for site URL and indexing goals.

Identify:
1. **Site URL**: Base domain (e.g., `https://example.com`)
2. **Indexing scope**: Full site, partial, or specific paths to exclude
3. **AI crawler strategy**: Allow search/indexing vs. block training data crawlers

## Best Practices

### Purpose and Limitations

| Point | Note |
|-------|------|
| **Purpose** | Controls crawler access; does NOT prevent indexing (disallowed URLs may still appear in search results without a snippet) |
| **Advisory** | Rules are advisory; malicious crawlers may ignore them |
| **Public** | robots.txt is publicly readable; use noindex or auth for sensitive content. See **indexing** |

### Crawl vs Index vs Link Equity (Quick Reference)

| Tool | Controls | Prevents indexing? |
|------|----------|-------------------|
| **robots.txt** | Crawl (path-level) | No—blocked URLs may still appear in SERP |
| **noindex** (meta / X-Robots-Tag) | Index (page-level) | Yes. See **indexing** |
| **nofollow** | Link equity only | No—does not control indexing |

### When to Use robots.txt vs noindex

| Use | Tool | Example |
|-----|------|---------|
| **Path-level** (whole directory) | robots.txt | `Disallow: /admin/`, `Disallow: /api/`, `Disallow: /staging/` |
| **Page-level** (specific pages) | noindex meta / X-Robots-Tag | Login, signup, thank-you, 404, legal. See **indexing** for full list |
| **Critical** | Do NOT block in robots.txt | Pages that use noindex—crawlers must access the page to read the directive |

**Paths to block in robots.txt**: /admin/, /api/, /staging/, temp files. **Paths to use noindex** (allow crawl): /login/, /signup/, /thank-you/, etc.—see **indexing**.
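Because a noindex page must stay crawlable, an audit usually starts by finding which pages actually carry the directive. The following is a minimal standard-library sketch. It assumes you already have each page's HTML and its `X-Robots-Tag` response header. `has_noindex` and `RobotsMetaParser` are hypothetical helpers, and they only look at `robots` and `googlebot` meta tags:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":  # HTMLParser already lowercases tag and attribute names
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html, x_robots_tag=""):
    """True if the robots meta tags or the X-Robots-Tag header contain noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    # Keep the text after the last ':' so "googlebot: noindex" yields "noindex"
    header = [t.split(":")[-1].strip().lower() for t in x_robots_tag.split(",")]
    return "noindex" in parser.directives or "noindex" in header
```

Check every URL this flags against robots.txt. If robots.txt disallows a flagged URL, crawlers never see its noindex.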

### Location and Format

| Item | Requirement |
|------|-------------|
| **Path** | Site root: `https://example.com/robots.txt` |
| **Encoding** | UTF-8 plain text |
| **Standard** | RFC 9309 (Robots Exclusion Protocol) |
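The file applies only to the scheme, host, and port where it is served. A subdomain, or the same host on another scheme or port, therefore needs its own robots.txt. The sketch below derives the governing robots.txt URL for any page and checks that fetched bytes decode as UTF-8. Both helper names are hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL that governs page_url (same scheme, host, port)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_utf8(raw):
    """True if the fetched robots.txt body (bytes) is valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For example, `robots_url("https://blog.example.com/post?id=1")` returns `https://blog.example.com/robots.txt`. That is a different file from `https://example.com/robots.txt`.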

### Core Directives

| Directive | Purpose | Example |
|-----------|---------|---------|
| `User-agent:` | Target crawler | `User-agent: Googlebot`, `User-agent: *` |
| `Disallow:` | Block path prefix | `Disallow: /admin/` |
| `Allow:` | Allow a path; overrides `Disallow:` when the Allow rule is more specific (longest match wins; ties go to Allow under RFC 9309) | `Allow: /public/` |
| `Sitemap:` | Declare sitemap absolute URL | `Sitemap: https://example.com/sitemap.xml` |
| `Clean-param:` | Strip query params (Yandex) | See below |
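When both `Allow:` and `Disallow:` match a URL, RFC 9309 applies the most specific (longest) matching rule, and `Allow:` wins a tie. Rule order in the file does not matter. The sketch below models this for the single group that applies to a crawler, including the `*` wildcard and a trailing `$` end anchor. It is a simplified illustration with no percent-encoding normalization. `is_allowed` and `rule_regex` are hypothetical helpers, not a library API:

```python
import re


def rule_regex(pattern):
    """Compile a robots.txt path pattern: '*' = any run of chars, trailing '$' = end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """rules: (directive, pattern) pairs from the one group that applies to the crawler."""
    best_length, allowed = -1, True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty Disallow/Allow value matches nothing
        if rule_regex(pattern).match(path):  # match() anchors at the path start
            is_allow = directive.lower() == "allow"
            # Longer pattern wins; on equal length, Allow wins
            if len(pattern) > best_length or (len(pattern) == best_length and is_allow):
                best_length, allowed = len(pattern), is_allow
    return allowed
```

Take `Disallow: /admin/` together with `Allow: /admin/public/`. Here `/admin/public/page` is allowed because the `Allow:` rule is longer, while `/admin/settings` stays blocked.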

### Critical: Do Not Block

| Do not block | Reason |
|--------------|--------|
| CSS, JS, images | Google needs them to render pages; blocking them can prevent correct rendering and indexing |
| `/_next/` (Next.js) | Blocking it breaks CSS/JS loading. Seeing its static assets in GSC as "Crawled - currently not indexed" is expected. See **indexing** |
| Pages that use noindex | Crawlers must access the page to read the noindex directive; blocking in robots.txt prevents that |

**Only block**: paths that don't need crawling: /admin/, /api/, /staging/, temp files.
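A quick audit is to test every URL that must stay crawlable against the live or proposed file. The sketch below uses Python's `urllib.robotparser`. That parser applies the first matching rule in file order and does not understand `*` or `$` inside paths. For files that rely on those, confirm the results with an RFC 9309 parser such as Google's open-source robotstxt library. The sample file and URLs are hypothetical:

```python
from urllib.robotparser import RobotFileParser


def find_blocked(robots_text, urls, user_agent="Googlebot"):
    """Return the URLs that robots_text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical file that makes two of the mistakes listed above
ROBOTS = """User-agent: *
Disallow: /admin/
Disallow: /_next/
Disallow: /login/
"""

MUST_STAY_CRAWLABLE = [
    "https://example.com/_next/static/chunks/main.js",  # Next.js JS bundle
    "https://example.com/styles/site.css",
    "https://example.com/login/",  # assumed to carry a noindex meta tag
]

if __name__ == "__main__":
    for url in find_blocked(ROBOTS, MUST_STAY_CRAWLABLE):
        print("blocked:", url)
```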

### AI Crawler Strategy

robots.txt is effective for all measured AI crawlers ([Vercel/MERJ study](https://vercel.com/blog/the-rise-of-the-ai-crawler), 2024). Set rules per user-agent; check each vendor's docs for current tokens.

| User-agent | Purpose | Typical |
|------------|---------|---------|
| **OAI-SearchBot** | ChatGPT search | Allow |
| **GPTBot** | OpenAI training | Disallow |
| **Claude-SearchBot** | Claude search | Allow |
| **ClaudeBot** | Anthropic training | Disallow |
| **PerplexityBot** | Perplexity search | Allow |
| **Google-Extended** | Gemini training | Disallow |
| **CCBot** | Common Crawl (LLM training) | Disallow |
| **Bytespider** | ByteDance | Disallow |
| **Meta-ExternalAgent** | Meta | Disallow |
| **Applebot** | Apple (Siri, Spotlight); renders JS | Allow for indexing |

**Allow vs Disallow**: Allow search/indexing bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot); Disallow training-only bots (GPTBot, ClaudeBot, CCBot) if you don't want content used for model training. See **site-crawlability** for AI crawler optimization (SSR, URL management).
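The allow/disallow table can be turned into file groups mechanically. One subtlety matters here: under RFC 9309, a crawler that finds a group naming it follows that group and ignores `User-agent: *`. Shared path blocks therefore have to be repeated in each allowed AI crawler's group. Below is a minimal sketch built around a hypothetical `build_robots_txt` helper and this skill's example paths. Its `Disallow:` lines come before `Allow: /` so that first-match parsers such as `urllib.robotparser` reach the same verdicts as RFC 9309 parsers:

```python
from urllib.robotparser import RobotFileParser

SHARED_DISALLOW = ["/admin/", "/api/", "/staging/"]

# True = allow (search/indexing bots), False = block (training-only bots)
AI_POLICY = {
    "OAI-SearchBot": True,
    "Claude-SearchBot": True,
    "PerplexityBot": True,
    "GPTBot": False,
    "ClaudeBot": False,
    "Google-Extended": False,
    "CCBot": False,
}


def build_robots_txt(shared_disallow, ai_policy, sitemaps):
    """Hypothetical helper: render a robots.txt with one group per AI crawler."""
    groups = [["User-agent: *"] + [f"Disallow: {p}" for p in shared_disallow]]
    for agent, allowed in ai_policy.items():
        if allowed:
            # A specific group replaces the * group, so repeat the shared blocks.
            # Disallow lines go first so first-match parsers agree with RFC 9309.
            rules = [f"Disallow: {p}" for p in shared_disallow] + ["Allow: /"]
        else:
            rules = ["Disallow: /"]
        groups.append([f"User-agent: {agent}"] + rules)
    body = "\n\n".join("\n".join(group) for group in groups)
    sitemap_lines = "\n".join(f"Sitemap: {url}" for url in sitemaps)
    return f"{body}\n\n{sitemap_lines}\n"


if __name__ == "__main__":
    text = build_robots_txt(SHARED_DISALLOW, AI_POLICY, ["https://example.com/sitemap.xml"])
    print(text)
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    print(parser.can_fetch("GPTBot", "https://example.com/blog/"))         # False
    print(parser.can_fetch("OAI-SearchBot", "https://example.com/blog/"))  # True
```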

### Clean-param (Yandex)

```
Clean-param: utm_source&utm_medium&utm_campaign&utm_term&utm_content&ref&fbclid&gclid
```
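Yandex treats URLs that differ only in the listed parameters as the same page. The directive also accepts an optional path prefix after the parameter list. The sketch below builds the directive line and uses the standard library to show the effect on a URL. It is not Yandex's implementation, and both helper names are hypothetical:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PARAMS = ["utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "ref", "fbclid", "gclid"]


def clean_param_line(params, path_prefix=None):
    """Render a Clean-param directive, optionally limited to a path prefix."""
    line = "Clean-param: " + "&".join(params)
    return f"{line} {path_prefix}" if path_prefix else line


def strip_params(url, params):
    """Drop the listed query parameters, keeping all others in order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in params]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For example, `strip_params("https://example.com/p?id=7&utm_source=x&gclid=abc", TRACKING_PARAMS)` returns `https://example.com/p?id=7`.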

## Output Format

- **Current state** (if auditing)
- **Recommended robots.txt** (full file)
- **Compliance checklist**
- **References**: [Google robots.txt](https://developers.google.com/search/docs/crawling-indexing/robots/create-robots-txt)
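Parts of the compliance checklist can be automated. The minimal lint sketch below checks directive spelling, `Allow:`/`Disallow:` rules that appear before any `User-agent:`, and `Sitemap:` URLs that are not absolute. These checks are a hypothetical starting set drawn from the directives in this skill, not an official validator. `lint_robots` is a hypothetical helper:

```python
from urllib.parse import urlsplit

# Directives covered by this skill; extend for crawler-specific ones you rely on
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "clean-param"}


def lint_robots(text):
    """Return human-readable issues found in a robots.txt body."""
    issues = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            issues.append(f"line {number}: missing ':' separator")
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key not in KNOWN_DIRECTIVES:
            issues.append(f"line {number}: unknown or unsupported directive '{key}'")
        elif key == "user-agent":
            seen_user_agent = True
        elif key in ("allow", "disallow") and not seen_user_agent:
            issues.append(f"line {number}: {key} appears before any User-agent line")
        elif key == "sitemap":
            parts = urlsplit(value)
            if not (parts.scheme and parts.netloc):
                issues.append(f"line {number}: Sitemap must be an absolute URL")
    return issues
```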

## Related Skills

- **indexing**: Full noindex page-type list; when to use noindex vs robots.txt; GSC indexing diagnosis
- **page-metadata**: Meta robots (noindex, nofollow) implementation
- **xml-sitemap**: Sitemap URL to reference in robots.txt
- **site-crawlability**: Broader crawl and structure guidance; AI crawler optimization
- **rendering-strategies**: SSR, SSG, CSR; content in initial HTML for crawlers

Related Skills

website-structure

from kostja94/marketing-skills

When the user wants to plan website structure, decide which pages to build, or prioritize pages for a new or existing site. Also use when the user mentions "website structure," "site structure," "which pages do I need," "page planning," "sitemap planning," "Must Have pages," "website architecture," or "site hierarchy." For a specific page template (e.g. homepage), use homepage-generator or landing-page-generator as appropriate. Not for organic SEO roadmap alone; use seo-strategy.

seo-strategy

from kostja94/marketing-skills

When the user wants to plan SEO strategy, prioritize SEO work, or understand the SEO workflow. Also use when the user mentions "SEO strategy," "SEO plan," "SEO roadmap," "SEO priority," "SEO audit," "SEO workflow," "where to start SEO," "SEO approach," "organic growth strategy," "why SEO," "SEO value," or "search strategy." For technical/crawl audit execution, use seo-audit. For keyword research, use keyword-research. For AI search visibility, use generative-engine-optimization.

seo-audit

from kostja94/marketing-skills

When the user wants to run an SEO audit, technical SEO audit, or site health check. Also use when the user mentions "SEO audit," "technical audit," "site audit," "crawl audit," "indexing audit," "SEO health," or "fix SEO issues." For prioritization and organic strategy, use seo-strategy. For GSC data analysis, use google-search-console.

retention-strategy

from kostja94/marketing-skills

When the user wants to reduce churn, improve customer retention, or plan lifecycle marketing. Also use when the user mentions "retention," "churn," "customer lifecycle," "churn prevention," "at-risk customers," or "loyalty program." For lifecycle, use growth-funnel.

research-sources

from kostja94/marketing-skills

When the user wants to find information sources for content ideation, competitor monitoring, or industry tracking. Also use when the user mentions "research sources," "information sources," "content ideation," "industry monitoring," "competitor monitoring," "market intelligence," "content research," or "topic research." For keywords, use keyword-research.

product-launch

from kostja94/marketing-skills

When the user wants to plan a product launch, execute launch channels, or create a launch checklist. Also use when the user mentions "product launch," "launch strategy," "product announcement," "launch channels," or "market launch." For GTM motion and positioning, use gtm-strategy. For cold start and first users, use cold-start-strategy. For Product Hunt day-of, use product-hunt-launch.

pmf-strategy

from kostja94/marketing-skills

When the user wants to validate product-market fit, measure PMF, or plan before scaling. Also use when the user mentions "PMF," "product-market fit," "product market fit," "Sean Ellis test," "very disappointed," "vitamin vs painkiller," "PMF validation," "premature scaling," or "validate before scale." For GTM after validation, use gtm-strategy.

indie-hacker-strategy

from kostja94/marketing-skills

When the user wants indie hacker or bootstrapping founder strategy—growth, channels, Build in Public, or solo founder tactics. Also use when the user mentions "indie hacker," "indie developer," "bootstrapping," "bootstrapped founder," "solo founder," "Build in Public," "scratch your own itch," "Micro-SaaS," "first 100 users," or "solo company." For cold start, use cold-start-strategy.

gtm-strategy

from kostja94/marketing-skills

When the user wants to plan go-to-market strategy, GTM framework, or market entry. Also use when the user mentions "GTM," "go-to-market," "market entry," "new market," "repositioning," "PLG," "sales-led," "product-led," "marketing-led," "ICP," "buyer persona," "GTM motion," or "market expansion." For launch checklist, use product-launch.

growth-funnel

from kostja94/marketing-skills

When the user wants to plan growth using the AARRR framework, diagnose growth bottlenecks, or map actions across the customer lifecycle. Also use when the user mentions "growth funnel," "AARRR," "pirate metrics," "acquisition activation retention," "customer lifecycle metrics," or "growth framework." For retention tactics, use retention-strategy.

conversion-optimization

from kostja94/marketing-skills

When the user wants to improve conversion rates, run A/B tests, optimize funnels, or reduce friction. Also use when the user mentions "CRO," "conversion rate optimization," "A/B test," "split test," "funnel optimization," "checkout optimization," "form optimization," or "conversion funnel." For pricing psychology, use pricing-strategy.

cold-start-strategy

from kostja94/marketing-skills

When the user wants to plan cold start, get first users, or launch a new product with zero traction. Also use when the user mentions "cold start," "cold start problem," "first users," "seed users," "finding users," "finding early users," "Fiverr Upwork," "comment outreach," "Twitter search users," "product launch strategy," "0 to 1 growth," "early-stage acquisition," "launch channels," "get first customers," "Product Hunt launch," "AppSumo," "LTD," "indie hacker," "bootstrapping," or "solo founder." For directory listing copy and submissions, use directory-submission. For Product Hunt day-of execution, use product-hunt-launch. For GTM motion design, use gtm-strategy.