crw_crawl
Start an asynchronous website crawl to collect pages and optionally extract structured data using a JSON schema. Returns a job ID for status polling.
Instructions
Start an async crawl of a website. Returns a job ID that can be polled with crw_check_crawl_status.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| jsonSchema | No | JSON schema for LLM-based structured data extraction on each crawled page | |
| maxDepth | No | Maximum crawl depth (default: 2) | |
| maxPages | No | Maximum number of pages to crawl (default: 10) | |
| renderJs | No | Render JavaScript on every crawled page (true = force JS, false = HTTP only, omit = auto-detect or use the server's render_js_default) | |
| renderer | No | Pin every crawled page to a specific renderer. "auto" (default if omitted) uses the configured fallback chain. Other values hard-pin with no fallback. Pinning a non-auto value implies renderJs:true unless renderJs:false is set explicitly. | |
| url | Yes | The starting URL to crawl | |
| waitFor | No | Milliseconds to wait after JS rendering on each page |