crw_crawl
Crawl websites to extract structured data using JSON schemas. Start with a URL, set depth and page limits, then monitor progress with job IDs.
Instructions
Start an async crawl of a website. Returns a job ID that can be polled with crw_check_crawl_status.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| jsonSchema | No | JSON schema for LLM-based structured data extraction on each crawled page | |
| maxDepth | No | Maximum crawl depth (default: 2) | |
| maxPages | No | Maximum number of pages to crawl (default: 10) | |
| url | Yes | The starting URL to crawl |