crawl_site
Crawl a website to generate a manifest of pages with titles and snippets. Full page content is cached and accessible via fetch_url for later retrieval.
Instructions
Crawl a site and return a manifest of pages with titles and snippets. Full page content is cached; call fetch_url on any page URL to retrieve the full text. Strategy order: Firecrawl (JS rendering) → sitemap-first → BFS (if enabled).
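The strategy chain above reads like an ordered fallback, and the following minimal Python sketch illustrates that interpretation. It is an assumption rather than the tool's actual implementation: the function `crawl_with_fallback`, the `Strategy` type, and the stand-in strategy callables are all hypothetical, and the real tool may choose between Firecrawl, the sitemap, and BFS differently.

```python
from typing import Callable, Optional

# A manifest entry is assumed to carry a URL, a title, and a snippet.
Manifest = list[dict[str, str]]
# Hypothetical strategy signature: takes the base URL, returns pages or None.
Strategy = Callable[[str], Optional[Manifest]]


def crawl_with_fallback(
    url: str, strategies: list[tuple[str, Strategy]]
) -> tuple[str, Manifest]:
    """Try each strategy in order and return the first non-empty manifest."""
    for name, strategy in strategies:
        try:
            pages = strategy(url)
        except Exception:
            # Assumed behavior: a failing strategy falls through to the next.
            continue
        if pages:
            return name, pages
    # No strategy produced any pages.
    return "none", []


# Stand-in strategies for illustration only.
def firecrawl_unavailable(url: str) -> Optional[Manifest]:
    raise RuntimeError("rendering service not reachable")


def sitemap_first(url: str) -> Optional[Manifest]:
    return [{"url": url + "/docs", "title": "Docs", "snippet": "Getting started"}]


used, manifest = crawl_with_fallback(
    "https://example.com",
    [("firecrawl", firecrawl_unavailable), ("sitemap", sitemap_first)],
)
```

In this sketch the first strategy raises, so the sitemap stand-in supplies the manifest. A caller would then pass any `url` from the manifest to fetch_url to read the cached full text.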
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Base URL to crawl | |
| max_pages | No | Maximum number of pages to crawl (max 100) | 20 |
| same_domain_only | No | Restrict the crawl to the same domain | true |
| include_path | No | Only include URLs matching this path prefix (e.g. '/docs') | |
| exclude_path | No | Exclude URLs matching this path prefix (e.g. '/blog') | |
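To make the parameters concrete, here is a hedged Python sketch that assembles an arguments object and shows one plausible reading of the filters. The helper names `build_crawl_args` and `url_allowed` are hypothetical, and the filtering logic is an assumption based on the descriptions in the table (exact host match for "same domain", plain prefix match on the URL path); the tool's real matching rules and its handling of out-of-range values are not documented here.

```python
from typing import Optional
from urllib.parse import urlparse


def build_crawl_args(
    url: str,
    max_pages: int = 20,
    same_domain_only: bool = True,
    include_path: Optional[str] = None,
    exclude_path: Optional[str] = None,
) -> dict:
    """Assemble crawl_site arguments using the documented defaults."""
    # The table states a maximum of 100; rejecting other values is an assumption.
    if not 1 <= max_pages <= 100:
        raise ValueError("max_pages must be between 1 and 100")
    args = {"url": url, "max_pages": max_pages, "same_domain_only": same_domain_only}
    # Optional filters are only sent when set.
    if include_path is not None:
        args["include_path"] = include_path
    if exclude_path is not None:
        args["exclude_path"] = exclude_path
    return args


def url_allowed(candidate: str, args: dict) -> bool:
    """Assumed filter semantics: host match, then path-prefix include/exclude."""
    base, cand = urlparse(args["url"]), urlparse(candidate)
    if args["same_domain_only"] and cand.netloc != base.netloc:
        return False
    path = cand.path or "/"
    include = args.get("include_path")
    if include and not path.startswith(include):
        return False
    exclude = args.get("exclude_path")
    if exclude and path.startswith(exclude):
        return False
    return True


args = build_crawl_args(
    "https://example.com",
    max_pages=50,
    include_path="/docs",
    exclude_path="/docs/archive",
)
```

Under these assumptions, `https://example.com/docs/intro` passes the filters, while `https://example.com/blog/post` (outside the include prefix), `https://example.com/docs/archive/old` (inside the exclude prefix), and `https://other.com/docs/intro` (different host) do not.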