scrape_full_site
Crawl every internal page of a website, clean each page's HTML content, and optionally split it into chunks for structured retrieval.
Instructions
End-to-end pipeline: crawl every internal page of a website, clean the HTML of each page, and optionally split the cleaned content into chunks. Returns a structured result containing each page's title, cleaned content, links, metadata, and (if requested) text chunks. Handles both static and dynamic pages automatically.
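The crawl, clean, and chunk stages described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual implementation: the names `crawl`, `fetch`, and `chunk_size` are hypothetical, "internal" is assumed to mean "same host as the root URL", cleaning is reduced to dropping tags plus `<script>`/`<style>` bodies, chunking is assumed to be fixed-size character slices, and dynamic (JavaScript-rendered) pages are not handled.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class PageParser(HTMLParser):
    """Collects the title, visible text, and href values from one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self.links = []
        self._skip = 0          # nesting depth inside <script>/<style>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip:
            return
        if self._in_title:
            self.title += data.strip()
        elif data.strip():
            self.text_parts.append(data.strip())


def crawl(root_url, fetch, max_depth=2, chunk_size=None):
    """Breadth-first crawl of same-host pages, up to max_depth links from the root.

    `fetch` is a hypothetical callable: URL -> HTML string, or None on failure.
    """
    root_host = urlparse(root_url).netloc
    seen = {root_url}
    queue = deque([(root_url, 0)])
    pages = []
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        content = " ".join(parser.text_parts)
        # Resolve relative links, drop fragments, keep only same-host URLs.
        internal = []
        for href in parser.links:
            absolute = urldefrag(urljoin(url, href)).url
            if urlparse(absolute).netloc == root_host:
                internal.append(absolute)
        page = {"url": url, "title": parser.title, "content": content, "links": internal}
        if chunk_size:
            # Assumed strategy: fixed-size character slices.
            page["chunks"] = [content[i:i + chunk_size]
                              for i in range(0, len(content), chunk_size)]
        pages.append(page)
        if depth < max_depth:
            for link in internal:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return pages
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client and lets it run against an in-memory dictionary of pages, for example `crawl("https://example.com/", site.get, max_depth=1)` where `site` maps URLs to HTML strings.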
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Root website URL to start from. | |
| chunk | No | Whether to split page content into chunks. | |
| clean | No | Whether to clean HTML before returning content. | |
| max_depth | No | Maximum crawl depth. | 2 |
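As an illustration of the schema above, the following sketch assembles an arguments payload for `scrape_full_site`. The helper `build_arguments` is hypothetical, and the parameter types (booleans for `chunk` and `clean`, an integer for `max_depth`) are assumptions, since the table does not state them. Optional parameters that the caller leaves unset are omitted so that the tool can apply its own defaults.

```python
import json


def build_arguments(url, chunk=None, clean=None, max_depth=None):
    """Build an arguments dict for scrape_full_site (hypothetical helper).

    Only `url` is required; unset optional parameters are left out of the payload.
    """
    if not url:
        raise ValueError("url is required")
    args = {"url": url}
    for name, value in (("chunk", chunk), ("clean", clean), ("max_depth", max_depth)):
        if value is not None:
            args[name] = value
    return args


# Example: crawl three levels deep and request chunked output.
payload = build_arguments("https://example.com", chunk=True, max_depth=3)
print(json.dumps(payload))
```

How the payload is delivered to the tool depends on the client in use and is not covered here.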