# tavily-crawl
Crawl websites from a starting URL to extract structured content, controlling depth, breadth, and focus areas for targeted data collection.
## Instructions
A powerful web crawler that initiates a structured web crawl starting from a specified base URL. The crawler expands from that point like a graph, following internal links across pages. You can control how deep and wide it goes, and guide it to focus on specific sections of the site.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The root URL to begin the crawl | |
| max_depth | No | Max depth of the crawl. Defines how far from the base URL the crawler can explore. | 1 |
| max_breadth | No | Max number of links to follow per level of the tree (i.e., per page) | 20 |
| limit | No | Total number of links the crawler will process before stopping | 50 |
| instructions | No | Natural language instructions for the crawler. Instructions specify which types of pages the crawler should return. | |
| select_paths | No | Regex patterns to select only URLs with specific path patterns (e.g., /docs/.*, /api/v1.*) | [] |
| select_domains | No | Regex patterns to restrict crawling to specific domains or subdomains (e.g., ^docs\.example\.com$) | [] |
| allow_external | No | Whether to return external links in the final response | true |
| extract_depth | No | Advanced extraction retrieves more data, including tables and embedded content, with higher success but may increase latency | basic |
| format | No | The format of the extracted web page content. markdown returns content in markdown format. text returns plain text and may increase latency. | markdown |
| include_favicon | No | Whether to include the favicon URL for each result | false |
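
For orientation, here is an example arguments object matching this schema. It is a hedged sketch: the URL is a placeholder and the parameter values are illustrative, not recommended settings.

```typescript
// Hypothetical arguments for a documentation-focused crawl; the URL
// and all values here are illustrative only.
const crawlArgs = {
  url: "https://docs.example.com",  // required: where the crawl starts
  max_depth: 2,                     // follow links up to two hops from the base URL
  max_breadth: 10,                  // follow at most 10 links per page
  limit: 40,                        // stop after 40 pages total
  select_paths: ["/docs/.*"],       // regex: keep only paths under /docs/
  extract_depth: "advanced",        // also pull tables and embedded content
  format: "markdown",               // return page content as markdown
};
```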
## Implementation Reference
- **src/index.ts:610-625** (handler): Core handler function that executes the tavily-crawl tool logic by making an API call to the Tavily crawl endpoint.

  ```typescript
  async crawl(params: any): Promise<TavilyCrawlResponse> {
    try {
      const response = await this.axiosInstance.post(this.baseURLs.crawl, {
        ...params,
        api_key: API_KEY
      });
      return response.data;
    } catch (error: any) {
      if (error.response?.status === 401) {
        throw new Error('Invalid API key');
      } else if (error.response?.status === 429) {
        throw new Error('Usage limit exceeded');
      }
      throw error;
    }
  }
  ```
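  Stripped of the axios plumbing, the request reduces to a single POST with the API key in the body. A minimal sketch using `fetch`, assuming the crawl endpoint is `https://api.tavily.com/crawl` (the actual URL comes from `this.baseURLs.crawl` and is not shown in this snippet):

  ```typescript
  // Minimal sketch of the same call with fetch; the endpoint URL is an
  // assumption, and the 401/429 mapping mirrors the handler above.
  const res = await fetch("https://api.tavily.com/crawl", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      url: "https://docs.example.com",     // placeholder base URL
      max_depth: 2,
      api_key: process.env.TAVILY_API_KEY, // sent in the request body, as above
    }),
  });
  if (res.status === 401) throw new Error("Invalid API key");
  if (res.status === 429) throw new Error("Usage limit exceeded");
  const data = await res.json();
  ```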
- **src/index.ts:285-354** (schema): Input schema definition for the tavily-crawl tool, including parameters like url, max_depth, instructions, etc.

  ```typescript
  {
    name: "tavily-crawl",
    description: "A powerful web crawler that initiates a structured web crawl starting from a specified base URL. The crawler expands from that point like a graph, following internal links across pages. You can control how deep and wide it goes, and guide it to focus on specific sections of the site.",
    inputSchema: {
      type: "object",
      properties: {
        url: {
          type: "string",
          description: "The root URL to begin the crawl"
        },
        max_depth: {
          type: "integer",
          description: "Max depth of the crawl. Defines how far from the base URL the crawler can explore.",
          default: 1,
          minimum: 1
        },
        max_breadth: {
          type: "integer",
          description: "Max number of links to follow per level of the tree (i.e., per page)",
          default: 20,
          minimum: 1
        },
        limit: {
          type: "integer",
          description: "Total number of links the crawler will process before stopping",
          default: 50,
          minimum: 1
        },
        instructions: {
          type: "string",
          description: "Natural language instructions for the crawler. Instructions specify which types of pages the crawler should return."
        },
        select_paths: {
          type: "array",
          items: { type: "string" },
          description: "Regex patterns to select only URLs with specific path patterns (e.g., /docs/.*, /api/v1.*)",
          default: []
        },
        select_domains: {
          type: "array",
          items: { type: "string" },
          description: "Regex patterns to restrict crawling to specific domains or subdomains (e.g., ^docs\\.example\\.com$)",
          default: []
        },
        allow_external: {
          type: "boolean",
          description: "Whether to return external links in the final response",
          default: true
        },
        extract_depth: {
          type: "string",
          enum: ["basic", "advanced"],
          description: "Advanced extraction retrieves more data, including tables and embedded content, with higher success but may increase latency",
          default: "basic"
        },
        format: {
          type: "string",
          enum: ["markdown", "text"],
          description: "The format of the extracted web page content. markdown returns content in markdown format. text returns plain text and may increase latency.",
          default: "markdown"
        },
        include_favicon: {
          type: "boolean",
          description: "Whether to include the favicon URL for each result",
          default: false
        }
      },
      required: ["url"]
    }
  },
  ```
- **src/index.ts:454-475** (handler): Tool dispatch handler case that processes arguments, calls the crawl method, formats the response, and returns MCP content. Note that it hardcodes `chunks_per_source: 3` into every crawl request.

  ```typescript
  case "tavily-crawl":
    const crawlResponse = await this.crawl({
      url: args.url,
      max_depth: args.max_depth,
      max_breadth: args.max_breadth,
      limit: args.limit,
      instructions: args.instructions,
      select_paths: Array.isArray(args.select_paths) ? args.select_paths : [],
      select_domains: Array.isArray(args.select_domains) ? args.select_domains : [],
      allow_external: args.allow_external,
      extract_depth: args.extract_depth,
      format: args.format,
      include_favicon: args.include_favicon,
      chunks_per_source: 3,
    });
    return {
      content: [{
        type: "text",
        text: formatCrawlResults(crawlResponse)
      }]
    };
  ```
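  From a client's perspective, this case is reached through a standard MCP `tools/call` request. A sketch using the TypeScript MCP SDK, assuming the server is published as the `tavily-mcp` npm package and reads `TAVILY_API_KEY` from its environment (both are assumptions here, not confirmed by the snippets above):

  ```typescript
  import { Client } from "@modelcontextprotocol/sdk/client/index.js";
  import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

  // Assumes the server launches via `npx tavily-mcp`; package name and
  // env wiring are assumptions for illustration.
  const transport = new StdioClientTransport({
    command: "npx",
    args: ["-y", "tavily-mcp"],
    env: { TAVILY_API_KEY: process.env.TAVILY_API_KEY ?? "" },
  });
  const client = new Client({ name: "example-client", version: "1.0.0" });
  await client.connect(transport);

  const result = await client.callTool({
    name: "tavily-crawl",
    arguments: { url: "https://docs.example.com", max_depth: 2, limit: 30 },
  });
  console.log(result.content); // the formatted text block returned by the case above
  ```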
- **src/index.ts:686-708** (helper): Helper function to format the Tavily crawl response into readable text output.

  ```typescript
  function formatCrawlResults(response: TavilyCrawlResponse): string {
    const output: string[] = [];
    output.push(`Crawl Results:`);
    output.push(`Base URL: ${response.base_url}`);
    output.push('\nCrawled Pages:');
    response.results.forEach((page, index) => {
      output.push(`\n[${index + 1}] URL: ${page.url}`);
      if (page.raw_content) {
        // Truncate content if it's too long
        const contentPreview = page.raw_content.length > 200
          ? page.raw_content.substring(0, 200) + "..."
          : page.raw_content;
        output.push(`Content: ${contentPreview}`);
      }
      if (page.favicon) {
        output.push(`Favicon: ${page.favicon}`);
      }
    });
    return output.join('\n');
  }
  ```
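  The formatter only touches a handful of fields, which pins down the minimal shape of `TavilyCrawlResponse`. A sketch inferred from that usage; the real interface in src/index.ts may declare additional fields:

  ```typescript
  // Inferred from formatCrawlResults above; the actual interface in
  // src/index.ts may carry more fields than shown here.
  interface TavilyCrawlResponse {
    base_url: string;
    results: Array<{
      url: string;
      raw_content?: string; // page content, previewed at 200 chars in the output
      favicon?: string;     // present when include_favicon is true
    }>;
  }
  ```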
- **src/index.ts:735-737** (registration): Tool registration in the CLI listTools function.

  ```typescript
  {
    name: "tavily-crawl",
    description: "A sophisticated web crawler that systematically explores websites starting from a base URL. Features include configurable depth and breadth limits, domain filtering, path pattern matching, and category-based filtering. Perfect for comprehensive site analysis, content discovery, and structured data collection."
  },
  ```