# crawl
Extract website content by crawling from a seed URL, following links with configurable depth and page limits for structured data collection.
## Instructions
Crawl a website starting from a seed URL, following links breadth-first up to a configurable depth and page limit.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| concurrency | No | Number of concurrent requests | 5 |
| depth | No | Maximum link depth to follow | 2 |
| format | No | Output format for each page: "markdown", "llm", or "text" | "markdown" |
| max_pages | No | Maximum number of pages to crawl | 50 |
| url | Yes | Seed URL to start crawling from | |
| use_sitemap | No | Seed the frontier from sitemap discovery before crawling | |
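The breadth-first traversal with `depth` and `max_pages` limits can be sketched as follows. This is an illustrative model, not the tool's actual implementation: the `crawl` function and the in-memory link graph stand in for real page fetching and link extraction.

```python
from collections import deque

def crawl(seed, links, depth=2, max_pages=50):
    """Breadth-first crawl over a link graph.

    `links` maps each URL to the URLs it links to; a real crawler
    would fetch each page and parse its links instead.
    """
    visited = []
    queue = deque([(seed, 0)])   # frontier of (url, depth) pairs
    seen = {seed}
    while queue and len(visited) < max_pages:
        url, d = queue.popleft()
        visited.append(url)
        if d < depth:            # only follow links up to the depth limit
            for nxt in links.get(url, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
    return visited

# Toy link graph standing in for real pages.
site = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/c": ["/d"],
}
print(crawl("/", site, depth=2, max_pages=3))   # → ['/', '/a', '/b']
```

With `max_pages=3` the crawl stops after three pages even though `/c` is still in the frontier; with the defaults it also reaches `/c` but not `/d`, which lies beyond `depth=2`.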