check_robots
Fetch and parse a domain's robots.txt to report which AI training and search crawlers are allowed or disallowed, including GPTBot, CCBot, and PerplexityBot.
Instructions
Fetch and parse a domain's robots.txt; report per-crawler allow/disallow posture for all known AI training crawlers (GPTBot, CCBot, Anthropic-AI, Google-Extended, etc.), AI search crawlers (ChatGPT-User, PerplexityBot, OAI-SearchBot), and user-triggered fetchers.
Read-only. One HTTP GET to /robots.txt. No auth, no rate limits applied.
Deterministic, rule-based; no LLM. Returns structured findings with per-crawler status.
When to use: determining which AI crawlers a site blocks vs. allows. Combine with check_sitemap for a full pre-crawl audit. Distinct from audit_page, which evaluates a single URL; this tool evaluates whole-domain policy.
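The tool's internals aren't shown here, but the check it describes is straightforward to picture. The sketch below is an assumption-laden illustration, not the tool's implementation: it uses Python's standard-library `urllib.robotparser` to fetch `/robots.txt` with a single GET and report whether each listed crawler may fetch the site root.

```python
# Minimal sketch of this kind of check (illustrative only, not the tool's code).
from urllib import robotparser

AI_CRAWLERS = [
    # Training crawlers
    "GPTBot", "CCBot", "anthropic-ai", "Google-Extended",
    # AI search crawlers and user-triggered fetchers
    "OAI-SearchBot", "PerplexityBot", "ChatGPT-User",
]

def check_robots(domain: str) -> dict[str, str]:
    # Accept a bare hostname or a full origin, as the input schema allows.
    host = domain.removeprefix("https://").removeprefix("http://").rstrip("/")
    parser = robotparser.RobotFileParser(f"https://{host}/robots.txt")
    parser.read()  # one read-only HTTP GET; a 404 means everything is allowed
    return {
        bot: ("allowed" if parser.can_fetch(bot, "/") else "disallowed")
        for bot in AI_CRAWLERS
    }

if __name__ == "__main__":
    for bot, status in check_robots("example.com").items():
        print(f"{bot}: {status}")
```

`RobotFileParser.can_fetch` falls back to the `*` rules when a crawler has no dedicated group, which matches how crawlers themselves interpret robots.txt; a real per-crawler report would likely also note whether the verdict came from a bot-specific rule or the wildcard.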
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Hostname or origin to inspect. Examples: `example.com`, `https://example.com`, `https://example.com/`. The tool fetches `https://<domain>/robots.txt` and reports per-crawler allow/disallow posture for all known AI training crawlers (GPTBot, CCBot, etc.), AI search crawlers (ChatGPT-User, PerplexityBot), and user-triggered fetchers. Read-only; a single HTTP GET to `/robots.txt`. | — |
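For illustration, all three accepted forms of `domain` should resolve to the same robots.txt URL. The exact normalization isn't specified; this small snippet mirrors the sketch above:

```python
# Show the robots.txt URL each accepted input form maps to (no network needed).
for form in ("example.com", "https://example.com", "https://example.com/"):
    host = form.removeprefix("https://").removeprefix("http://").rstrip("/")
    print(f"{form!r} -> https://{host}/robots.txt")
```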