crawl
Recursively follow on-domain links from a start URL to traverse a site, returning one Markdown record per page as JSON Lines. This is a single blocking call with a maximum page limit (default 50, cap 1000).
Instructions
Crawl a whole site: recursively follow on-domain links from the start URL and return one Markdown record per page as JSON Lines. WARNING: this is a single long BLOCKING call, bounded by max_pages (default 50, hard cap 1000). It emits MCP progress notifications per page for clients that render them (e.g. Claude Desktop); blocking clients like n8n just wait for the result. For a large site prefer 'map' to list URLs, then 'scrape' each URL, so every call stays short and you keep per-URL control.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The site root URL to crawl. | |
| browser | No | Use the headless browser for each page (JS-rendered sites). Default false. | |
| maxPages | No | Maximum pages to sweep. Default 50, hard cap 1000. |