tavily-crawl
Crawl websites from a starting URL to extract structured content, controlling depth, breadth, and focus areas for targeted data collection.
Instructions
A web crawler that starts from a specified base URL and traverses the site like a graph, following internal links from page to page. You can control how deep and how wide the crawl goes, and guide it toward specific sections of the site.
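The following minimal Python sketch makes the depth, breadth, and limit controls concrete. It models the crawl as a breadth-first search over a small in-memory link graph. This is a hypothetical illustration, not Tavily's implementation. The `crawl` function and the `site` graph are invented for this example, and the sketch makes three assumptions: depth is counted in link hops from the root, breadth caps the links followed from each page, and `limit` caps the total number of pages processed.

```python
from collections import deque


def crawl(graph: dict[str, list[str]], root: str, max_depth: int,
          max_breadth: int, limit: int) -> list[str]:
    """Breadth-first crawl over an in-memory link graph.

    Hypothetical model of the crawl parameters (not Tavily's code):
    - max_depth: maximum number of link hops from the root
    - max_breadth: maximum number of links followed from any single page
    - limit: total number of pages processed before stopping
    """
    visited = {root}
    order: list[str] = []
    queue = deque([(root, 0)])  # (url, depth) pairs, root is depth 0
    while queue and len(order) < limit:
        url, depth = queue.popleft()
        order.append(url)  # "process" the page
        if depth >= max_depth:
            continue  # depth budget spent: do not expand this page
        # Consider only the first max_breadth outgoing links of this page.
        for link in graph.get(url, [])[:max_breadth]:
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order


# Hypothetical site: each key maps a page to the internal links it contains.
site = {
    "/": ["/docs", "/blog", "/pricing"],
    "/docs": ["/docs/api", "/docs/guide"],
    "/blog": ["/blog/post-1"],
}

print(crawl(site, "/", max_depth=1, max_breadth=2, limit=10))
# ['/', '/docs', '/blog']
print(crawl(site, "/", max_depth=2, max_breadth=2, limit=4))
# ['/', '/docs', '/blog', '/docs/api']
```

Because the traversal is breadth-first, pages closer to the root are processed first, so when `limit` runs out the deeper pages are the ones left unvisited.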
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The root URL to begin the crawl | |
| max_depth | No | Maximum depth of the crawl, i.e., how many link hops from the base URL the crawler can explore. | |
| max_breadth | No | Maximum number of links to follow per level of the tree (i.e., per page) | |
| limit | No | Total number of links the crawler will process before stopping | |
| instructions | No | Natural-language instructions for the crawler that specify which types of pages it should return. | |
| select_paths | No | Regex patterns to select only URLs with specific path patterns (e.g., `/docs/.*`, `/api/v1.*`) | |
| select_domains | No | Regex patterns to restrict crawling to specific domains or subdomains (e.g., `^docs\.example\.com$`) | |
| allow_external | No | Whether to return external links in the final response | |
| extract_depth | No | Extraction depth, `basic` or `advanced`. Advanced extraction retrieves more data, including tables and embedded content, with a higher success rate but may increase latency. | basic |
| format | No | Format of the extracted page content: `markdown` returns Markdown; `text` returns plain text and may increase latency. | markdown |
| include_favicon | No | Whether to include the favicon URL for each result | |
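The sketch below shows how the input schema might be filled in for a docs-focused crawl. It also gives one plausible reading of how `select_paths` and `select_domains` filter URLs. The argument values and the `is_selected` helper are hypothetical. The sketch rests on two assumptions: each pattern must fully match the URL path or hostname (which is how the table's examples read), and an empty list imposes no restriction. The service's actual matching rules may differ.

```python
import re
from urllib.parse import urlparse

# Hypothetical arguments for a tavily-crawl call. The parameter names come
# from the input schema; the values are illustrative only.
arguments = {
    "url": "https://docs.example.com",
    "max_depth": 2,
    "select_paths": [r"/docs/.*", r"/api/v1.*"],
    "select_domains": [r"^docs\.example\.com$"],
    "extract_depth": "basic",
    "format": "markdown",
}


def is_selected(url: str, select_paths: list[str],
                select_domains: list[str]) -> bool:
    """Return True if url passes both regex filters.

    Assumption (not confirmed by the schema): the URL path must fully match
    at least one select_paths pattern, the hostname must fully match at
    least one select_domains pattern, and an empty list means no restriction.
    """
    parsed = urlparse(url)
    path = parsed.path or "/"
    host = parsed.hostname or ""
    path_ok = not select_paths or any(re.fullmatch(p, path) for p in select_paths)
    domain_ok = not select_domains or any(re.fullmatch(d, host) for d in select_domains)
    return path_ok and domain_ok


candidates = [
    "https://docs.example.com/docs/quickstart",  # path and domain match
    "https://docs.example.com/api/v1/search",    # matches /api/v1.*
    "https://docs.example.com/blog/launch",      # path not selected
    "https://www.example.com/docs/quickstart",   # domain not selected
]
for url in candidates:
    print(url, is_selected(url, arguments["select_paths"], arguments["select_domains"]))
```

Under these assumptions, only the first two candidate URLs would be kept.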