crawl_site
Recursively crawl a website from a given URL, following links up to a maximum depth and page count, and return a short content summary for each page visited.
Instructions
Recursively crawl a website starting from a URL, following links up to a maximum depth and page count. Returns a short content summary for each page visited. Stays on the same domain by default.
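How the depth limit, page cap, and same-domain rule interact can be sketched with a small traversal over an in-memory link graph. This is only an illustration under stated assumptions, not the tool's actual implementation: the function name `crawl_order`, the `links` map standing in for real HTTP fetches, and the breadth-first visiting order are all hypothetical.

```python
from collections import deque
from urllib.parse import urlparse


def crawl_order(links, start_url, max_depth, max_pages, same_domain=True):
    """Return the URLs a depth- and page-limited crawl would visit, in order.

    `links` is a hypothetical map of URL -> outgoing links that replaces
    real network requests. Breadth-first order is an assumption.
    """
    start_host = urlparse(start_url).hostname
    visited = []
    seen = {start_url}
    queue = deque([(start_url, 0)])

    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # at the depth limit: visit the page but follow no links
        for nxt in links.get(url, []):
            if nxt in seen:
                continue  # never queue the same URL twice
            if same_domain and urlparse(nxt).hostname != start_host:
                continue  # skip links that leave the start hostname
            seen.add(nxt)
            queue.append((nxt, depth + 1))
    return visited


# Toy site: the start page links to one internal and one external page.
site = {
    "https://example.com/": ["https://example.com/a", "https://other.org/x"],
    "https://example.com/a": ["https://example.com/b"],
    "https://example.com/b": [],
}
print(crawl_order(site, "https://example.com/", max_depth=1, max_pages=10))
```

With `max_depth=0` only the start page is visited, and `max_pages` stops the crawl even when deeper links remain in the queue.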
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| start_url | Yes | Page URL (http/https). Scheme-less input like `example.com` is allowed. | |
| max_depth | No | How many link-hops to follow from the start URL (0 = only the start page). | |
| max_pages | No | Hard cap on the total number of pages fetched. | |
| same_domain | No | Only follow links on the start URL's hostname. | true |
| format | No | Output format for each page's content summary. | markdown |
| chars_per_page | No | Characters of content to include per page. | |
| delay_ms | No | Politeness delay between requests, in milliseconds. | |
| render | No | Rendering mode. `auto`: fast static fetch, falling back to a headless browser if the page needs JS. `static`: never use a browser. `browser`: always render with Playwright (handles SPAs). | auto |
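
As a rough illustration of the schema above, the arguments for one call might be assembled as follows. Every value here is a made-up example rather than a documented default (the table states defaults only for `format` and `render`), and how the arguments are sent to the tool depends on the client, which this sketch does not cover.

```python
import json

# Illustrative values only; most defaults are not stated in the table.
arguments = {
    "start_url": "example.com",  # scheme-less input is allowed per the table
    "max_depth": 1,              # follow links one hop from the start page
    "max_pages": 20,             # hard cap on pages fetched
    "same_domain": True,         # stay on the start URL's hostname
    "format": "markdown",        # documented default
    "chars_per_page": 500,       # hypothetical summary length
    "delay_ms": 250,             # hypothetical politeness delay
    "render": "auto",            # documented default
}

# The three render modes named in the table.
ALLOWED_RENDER = {"auto", "static", "browser"}
if arguments["render"] not in ALLOWED_RENDER:
    raise ValueError(f"unsupported render mode: {arguments['render']}")

print(json.dumps(arguments, indent=2))
```

Only `start_url` is required, so a minimal call could pass that key alone and leave the rest to the tool's defaults.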