scrape_url
Scrape documentation websites by recursively following links, convert HTML to searchable markdown, and add all pages to a knowledge base.
Instructions
Scrape a documentation website and add all pages to the knowledge base. Supports recursive scraping of entire sites by following links. Great for ingesting online documentation like http://www.sidmusic.org/sid/. Converts HTML to searchable markdown.
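The crawl behaves roughly like a bounded breadth-first search over links. Below is a minimal single-threaded sketch of that loop, assuming a requests + BeautifulSoup + markdownify stack; the tool's actual internals are not documented here, so every name in this snippet is illustrative, and the real tool additionally downloads with multiple concurrent threads.

```python
import time
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup
from markdownify import markdownify  # one possible HTML-to-markdown converter


def crawl(start_url: str, max_pages: int = 50, depth: int = 3,
          same_domain_only: bool = True, delay_ms: int = 100) -> dict[str, str]:
    """Breadth-first crawl from start_url, returning {url: markdown}.

    Illustrative sketch only; the tool's real implementation may differ.
    """
    start_domain = urlparse(start_url).netloc
    queue = deque([(start_url, 1)])  # (url, depth); depth 1 = the start page
    seen = {start_url}
    pages: dict[str, str] = {}

    while queue and len(pages) < max_pages:
        url, d = queue.popleft()
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
        except requests.RequestException:
            continue                  # skip unreachable pages
        if "text/html" not in resp.headers.get("Content-Type", ""):
            continue                  # only HTML pages are converted

        pages[url] = markdownify(resp.text)  # store the page as markdown
        time.sleep(delay_ms / 1000)          # polite pause between requests

        if d >= depth:                # depth exhausted: don't expand this page
            continue
        soup = BeautifulSoup(resp.text, "html.parser")
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]
            if same_domain_only and urlparse(link).netloc != start_domain:
                continue              # stay on the starting domain
            if link not in seen:
                seen.add(link)
                queue.append((link, d + 1))
    return pages
```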
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Starting URL to scrape (e.g., http://www.sidmusic.org/sid/) | |
| title | No | Base title for scraped documents; defaults to each page's title | |
| tags | No | Tags for scraped documents (the domain name is added automatically) | |
| follow_links | No | Follow links to scrape sub-pages; set to false to scrape only the single page | true |
| same_domain_only | No | Only follow links on the same domain, preventing scraping of external sites | true |
| max_pages | No | Maximum number of pages to scrape | 50 |
| depth | No | Maximum link depth to follow: 1 = the single start page, 2 = pages it links to, 3 = two levels deep | 3 |
| limit | No | Advanced: only scrape URLs with this prefix (overrides same_domain_only) | |
| threads | No | Number of concurrent download threads | 10 |
| delay | No | Delay between requests, in milliseconds | 100 |
| selector | No | CSS selector for the main content; auto-detected if omitted | |
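For reference, here is one plausible argument set for a call against the schema above. The parameter names come from the table; the tag values are made up, and the exact calling convention depends on your client.

```python
# Illustrative arguments for scrape_url; only "url" is required.
args = {
    "url": "http://www.sidmusic.org/sid/",
    "tags": ["sid", "chiptune"],   # hypothetical tags; the domain is auto-added
    "follow_links": True,          # recurse into sub-pages
    "same_domain_only": True,      # stay on www.sidmusic.org
    "max_pages": 50,
    "depth": 3,                    # start page plus two levels of links
    "threads": 10,
    "delay": 100,                  # milliseconds between requests
}
```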