crawl_website
BFS-crawl any website from a root URL, following internal links and respecting robots.txt. Extract each page's title, content, and links.
Instructions
BFS-crawl an entire website starting from the given root URL. Only follows internal (same-domain) links. Respects robots.txt. Avoids duplicate URLs. Limits crawl depth and total page count. Returns every scraped page with title, content, and links.
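The behaviour described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the tool's actual implementation. The names `crawl`, `fetch`, `robots_txt`, and `max_pages`, the default page limit of 50, the `*` user agent, and the choice to compare hosts by `netloc` are all assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser


class PageParser(HTMLParser):
    """Collects the <title> text, other text nodes, and <a href> values."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text = []
        self.hrefs = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif data.strip():
            self.text.append(data.strip())


def crawl(root, fetch, robots_txt="", max_depth=2, max_pages=50):
    """Breadth-first crawl of same-host pages reachable from `root`.

    `fetch` is a hypothetical callable mapping a URL to its HTML string
    (or None on failure); `robots_txt` is the site's robots.txt body.
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    host = urlparse(root).netloc

    seen = {root}               # every URL ever queued, to avoid duplicates
    queue = deque([(root, 0)])  # FIFO queue gives breadth-first order
    pages = []

    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        if not robots.can_fetch("*", url):
            continue            # disallowed by robots.txt
        html = fetch(url)
        if html is None:
            continue            # fetch failed or page missing

        parser = PageParser()
        parser.feed(html)

        links = []
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc != host:
                continue        # external link: not followed, not reported
            links.append(link)
            if link not in seen and depth < max_depth:
                seen.add(link)
                queue.append((link, depth + 1))

        pages.append({
            "url": url,
            "title": parser.title.strip(),
            "content": " ".join(parser.text),
            "links": links,
        })
    return pages
```

Passing `fetch` in as a parameter keeps the sketch free of network code; in practice it could wrap `urllib.request.urlopen` and return the decoded response body.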
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Root URL to start crawling from. | |
| max_depth | No | How many link-hops deep to crawl. | 2 |