tavily-crawl
Initiates a structured web crawl from a specified URL, following internal links to explore site content with configurable depth, breadth, and filtering options for targeted data extraction.
Instructions
A powerful web crawler that initiates a structured web crawl starting from a specified base URL. The crawler expands from that point like a tree, following internal links across pages. You can control how deep and wide it goes, and guide it to focus on specific sections of the site.
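For example, `max_depth: 2` with `max_breadth: 3` lets the crawler visit the base page, up to 3 links from it, and up to 3 links from each of those (roughly 1 + 3 + 3² = 13 pages, depending on how the base page is counted), with `limit` capping the total regardless of the tree's shape.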
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| allow_external | No | Whether to allow following links that go to external domains | false |
| categories | No | Filter URLs using predefined categories (e.g., Documentation, Blog, Developers) | [] |
| extract_depth | No | Advanced extraction retrieves more data, including tables and embedded content, with higher success but may increase latency | basic |
| instructions | No | Natural language instructions for the crawler | |
| limit | No | Total number of links the crawler will process before stopping | 50 |
| max_breadth | No | Max number of links to follow per level of the tree (i.e., per page) | 20 |
| max_depth | No | Max depth of the crawl; defines how far from the base URL the crawler can explore | 1 |
| select_domains | No | Regex patterns to restrict crawling to specific domains or subdomains (e.g., ^docs\.example\.com$) | [] |
| select_paths | No | Regex patterns to select only URLs with specific path patterns (e.g., /docs/.*, /api/v1.*) | [] |
| url | Yes | The root URL to begin the crawl | |
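As an illustration, the following arguments (hypothetical values; only `url` is required) would crawl a documentation subdomain two levels deep while staying on `/docs/` paths:

```typescript
// Illustrative tavily-crawl arguments; the values are hypothetical.
const crawlArgs = {
  url: "https://docs.example.com",             // required: root of the crawl
  max_depth: 2,                                // follow links up to two levels from the root
  max_breadth: 10,                             // follow at most 10 links per page
  limit: 40,                                   // stop after 40 pages total
  select_paths: ["/docs/.*"],                  // only URLs whose path matches this regex
  select_domains: ["^docs\\.example\\.com$"],  // stay on the docs subdomain
  extract_depth: "basic"                       // use "advanced" for tables/embedded content
};
```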
Implementation Reference
- `src/index.ts:764-790` (handler): Main handler logic for the `tavily-crawl` tool. Builds `TavilyCrawlParams` from the tool arguments, invokes `tavilyClient.crawl()`, formats the results with `formatCrawlResults`, and returns the MCP content response.

  ```typescript
  case 'tavily-crawl': {
    const crawlParams: TavilyCrawlParams = {
      url: args?.url as string,
      max_depth: (args?.max_depth as number) || 1,
      max_breadth: (args?.max_breadth as number) || 20,
      limit: (args?.limit as number) || 50,
      instructions: args?.instructions as string,
      select_paths: Array.isArray(args?.select_paths) ? args.select_paths : [],
      select_domains: Array.isArray(args?.select_domains) ? args.select_domains : [],
      allow_external: (args?.allow_external as boolean) || false,
      categories: Array.isArray(args?.categories) ? args.categories : [],
      extract_depth: (args?.extract_depth as 'basic' | 'advanced') || 'basic',
      format: (args?.format as 'markdown' | 'text') || 'markdown',
      include_favicon: (args?.include_favicon as boolean) || false
    };
    const crawlResult = await this.tavilyClient.crawl(crawlParams);
    return {
      content: [
        {
          type: 'text',
          text: formatCrawlResults(crawlResult),
        },
      ],
    };
  }
  ```
- `src/index.ts:489-556` (registration): Registration of the `tavily-crawl` tool in the tools list, including name, description, and the complete `inputSchema` used for MCP tool listing.

  ```typescript
  {
    name: "tavily-crawl",
    description: "A powerful web crawler that initiates a structured web crawl starting from a specified base URL. The crawler expands from that point like a tree, following internal links across pages. You can control how deep and wide it goes, and guide it to focus on specific sections of the site.",
    inputSchema: {
      type: "object",
      properties: {
        url: {
          type: "string",
          description: "The root URL to begin the crawl"
        },
        max_depth: {
          type: "integer",
          description: "Max depth of the crawl. Defines how far from the base URL the crawler can explore.",
          default: 1,
          minimum: 1
        },
        max_breadth: {
          type: "integer",
          description: "Max number of links to follow per level of the tree (i.e., per page)",
          default: 20,
          minimum: 1
        },
        limit: {
          type: "integer",
          description: "Total number of links the crawler will process before stopping",
          default: 50,
          minimum: 1
        },
        instructions: {
          type: "string",
          description: "Natural language instructions for the crawler"
        },
        select_paths: {
          type: "array",
          items: { type: "string" },
          description: "Regex patterns to select only URLs with specific path patterns (e.g., /docs/.*, /api/v1.*)",
          default: []
        },
        select_domains: {
          type: "array",
          items: { type: "string" },
          description: "Regex patterns to select crawling to specific domains or subdomains (e.g., ^docs\\.example\\.com$)",
          default: []
        },
        allow_external: {
          type: "boolean",
          description: "Whether to allow following links that go to external domains",
          default: false
        },
        categories: {
          type: "array",
          items: {
            type: "string",
            enum: ["Careers", "Blog", "Documentation", "About", "Pricing", "Community", "Developers", "Contact", "Media"]
          },
          description: "Filter URLs using predefined categories like documentation, blog, api, etc",
          default: []
        },
        extract_depth: {
          type: "string",
          enum: ["basic", "advanced"],
          description: "Advanced extraction retrieves more data, including tables and embedded content, with higher success but may increase latency",
          default: "basic"
        }
      },
      required: ["url"]
    }
  } as Tool,
  ```
- `src/tavilyClient.ts:281-283` (handler): Core method on `TavilyClient` that executes the crawl by POSTing the parameters to the Tavily API endpoint `https://api.tavily.com/crawl` via `makeRequest` (a hedged sketch of such a helper follows this list).

  ```typescript
  async crawl(params: TavilyCrawlParams): Promise<any> {
    return this.makeRequest(this.baseURLs.crawl, params);
  }
  ```
- `src/tavilyClient.ts:33-46` (schema): TypeScript interface defining the input parameters for the `tavily-crawl` tool. It mirrors the JSON schema and additionally declares `format` and `include_favicon`, which the handler defaults to `'markdown'` and `false`.

  ```typescript
  export interface TavilyCrawlParams {
    url: string;
    max_depth?: number;
    max_breadth?: number;
    limit?: number;
    instructions?: string;
    select_paths?: string[];
    select_domains?: string[];
    allow_external?: boolean;
    categories?: string[];
    extract_depth?: 'basic' | 'advanced';
    format?: 'markdown' | 'text';
    include_favicon?: boolean;
  }
  ```
- `src/index.ts:146-190` (helper): Helper function that formats the raw crawl results from the Tavily API into a string for the tool response, previewing up to 20 pages and capping the total output at 30,000 characters.

  ```typescript
  export function formatCrawlResults(response: any): string {
    try {
      if (!response || typeof response !== 'object') {
        return 'Error: Invalid crawl response format';
      }

      const output = [];
      output.push(`Crawl Results:`);
      output.push(`Base URL: ${sanitizeText(response.base_url, 500)}`);
      output.push('\nCrawled Pages:');

      if (Array.isArray(response.results)) {
        response.results.slice(0, 20).forEach((page: any, index: number) => {
          output.push(`\n[${index + 1}] URL: ${sanitizeText(page.url, 500)}`);
          if (page.raw_content) {
            const cleanContent = sanitizeText(page.raw_content, 1000);
            const contentPreview = cleanContent.length > 300
              ? cleanContent.substring(0, 300) + "... [content truncated]"
              : cleanContent;
            output.push(`Content: ${contentPreview}`);
          }
          if (page.title) {
            output.push(`Title: ${sanitizeText(page.title, 200)}`);
          }
        });
      }

      const result = output.join('\n');
      // Limit total output size
      if (result.length > 30000) {
        return result.substring(0, 30000) + '\n\n... [crawl results too long, truncated]';
      }
      return result;
    } catch (error) {
      console.error('Error formatting crawl results:', error);
      return 'Error: Failed to format crawl results';
    }
  }
  ```
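The body of `makeRequest` is not shown in this reference. Below is a minimal sketch of what such a helper could look like, assuming a JSON POST with Bearer-token authentication; the auth scheme, environment variable, and error handling here are assumptions, not the repository's actual code.

```typescript
// Hypothetical sketch of a Tavily request helper; assumes Bearer auth and JSON bodies.
async function makeRequest(endpoint: string, params: Record<string, unknown>): Promise<any> {
  const response = await fetch(endpoint, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Assumption: API key supplied as a Bearer token from the environment.
      'Authorization': `Bearer ${process.env.TAVILY_API_KEY}`,
    },
    body: JSON.stringify(params),
  });
  if (!response.ok) {
    throw new Error(`Tavily API error: ${response.status} ${response.statusText}`);
  }
  return response.json();
}
```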
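`formatCrawlResults` relies on a `sanitizeText` helper whose implementation is also not shown. A plausible minimal sketch, assuming it enforces a length cap and strips control characters; these details are guesses, not the repository's code.

```typescript
// Hypothetical sanitizeText: cap length and strip control characters.
// The real implementation in src/index.ts may differ.
function sanitizeText(text: unknown, maxLength: number): string {
  if (typeof text !== 'string') return '';
  // Remove ASCII control characters (keeping tab, newline, carriage return)
  // that could corrupt the formatted tool output.
  const cleaned = text.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, '');
  return cleaned.length > maxLength ? cleaned.slice(0, maxLength) : cleaned;
}
```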