tavily-crawl
Initiate a structured web crawl from a base URL, following internal links with customizable depth, breadth, and focus. Extract content based on specific paths, domains, or predefined categories using advanced or basic extraction.
Instructions
A powerful web crawler that initiates a structured web crawl starting from a specified base URL. The crawler expands from that point like a tree, following internal links across pages. You can control how deep and wide it goes, and guide it to focus on specific sections of the site.
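To make the depth and breadth controls concrete, here is a minimal sketch (illustrative only, not part of the tool) of the upper bound they place on the crawl tree: with depth d and at most b links followed per page, the crawler can reach at most 1 + b + b² + … + b^d pages before limit is applied.

```python
def max_pages(max_depth: int, max_breadth: int) -> int:
    """Upper bound on pages visited: the base URL (level 0) plus up to
    max_breadth new links per page at each subsequent level."""
    return sum(max_breadth ** level for level in range(max_depth + 1))

print(max_pages(2, 3))  # 1 + 3 + 9 = 13 pages at most
```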
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| allow_external | No | Whether to allow following links that go to external domains | |
| categories | No | Filter URLs using predefined categories (e.g., documentation, blog, api) | |
| extract_depth | No | Advanced extraction retrieves more data, including tables and embedded content, with a higher success rate, but may increase latency | basic |
| instructions | No | Natural language instructions for the crawler | |
| limit | No | Total number of links the crawler will process before stopping | |
| max_breadth | No | Max number of links to follow per level of the tree (i.e., per page) | |
| max_depth | No | Max depth of the crawl; defines how far from the base URL the crawler can explore | |
| select_domains | No | Regex patterns to restrict crawling to specific domains or subdomains (e.g., ^docs\.example\.com$) | |
| select_paths | No | Regex patterns to select only URLs matching specific path patterns (e.g., /docs/.*, /api/v1.*) | |
| url | Yes | The root URL to begin the crawl | |
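For illustration, here is a hedged sketch of a request payload assembled from the schema above. The field names come from the table, but treating select_paths as a list of regex strings is an assumption made for the example, not something this page confirms.

```python
# Hypothetical tavily-crawl arguments built from the schema above.
# Field names match the table; the list type for select_paths is assumed.
crawl_args = {
    "url": "https://example.com",   # required: root URL to begin the crawl
    "max_depth": 2,                 # explore at most two levels from the root
    "max_breadth": 10,              # follow at most 10 links per page
    "limit": 50,                    # stop after processing 50 links total
    "select_paths": [r"/docs/.*"],  # only crawl URLs under /docs/
    "extract_depth": "basic",       # the documented default extraction mode
}
```

How these arguments are passed to the tool depends on the MCP client in use; the dictionary above only shows the parameter shapes implied by the schema.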