zapfetch_crawl
Crawl a website to gather content from multiple pages. Returns a job ID for async polling. Best for whole-site extraction; for a single page use scrape, and for URL discovery use map.
Instructions
Crawl a website and extract content from multiple pages. Use this when the user wants to gather content from an entire site or section. Returns a job_id for async polling. Crawls can be long-running; consider confirming with the user before invoking this on a large site. For a single page, use zapfetch_scrape. For URL discovery only (no content extraction), use zapfetch_map.
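Since the tool returns a job_id rather than results, the caller is expected to poll until the crawl finishes. A minimal polling sketch follows; the `get_status` callback and the status strings (`"scraping"`, `"completed"`, `"failed"`) are assumptions for illustration, not part of the zapfetch API:

```python
import time

def poll_crawl_job(get_status, interval_s=2.0, timeout_s=600.0):
    """Poll a crawl job until it reaches a terminal state or times out.

    `get_status` is a caller-supplied function that queries the job
    (e.g. via the zapfetch status endpoint) and returns its current
    state string. Terminal states here are assumed for illustration.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        time.sleep(interval_s)  # back off between status checks
    raise TimeoutError("crawl job did not finish within timeout")

# Simulated status sequence standing in for real API responses.
_states = iter(["scraping", "scraping", "completed"])
print(poll_crawl_job(lambda: next(_states), interval_s=0.01))
```

A fixed interval is the simplest choice; for large crawls, exponential backoff on the polling interval reduces load on the status endpoint.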
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The root URL to start crawling from | |
| limit | No | Maximum pages to crawl | 50 |
| maxDiscoveryDepth | No | Maximum link depth from root URL | |
| includePaths | No | Only crawl URLs matching these path patterns (regex) | |
| excludePaths | No | Skip URLs matching these path patterns (regex) | |
| crawlEntireDomain | No | Crawl the entire domain, not just the subpath of the root URL | |
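An example argument payload, assuming the parameter names above; the URL and path patterns are illustrative, and treating includePaths/excludePaths as arrays of regex strings is an assumption based on the plural wording in the schema:

```json
{
  "url": "https://example.com/docs",
  "limit": 100,
  "maxDiscoveryDepth": 3,
  "includePaths": ["^/docs/.*"],
  "excludePaths": ["^/docs/archive/.*"],
  "crawlEntireDomain": false
}
```

With these arguments the crawl stays within `/docs/`, skips the archive subtree, and stops after 100 pages or three link hops from the root, whichever comes first.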