tavily-crawl

Crawl websites starting from a base URL to map site structure, follow internal links, and extract content with controlled depth and breadth parameters for comprehensive web analysis.

Instructions

A powerful web crawler that initiates a structured web crawl starting from a specified base URL. The crawler expands from that point like a tree, following internal links across pages. You can control how deep and wide it goes, and guide it to focus on specific sections of the site.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | The root URL to begin the crawl | |
| max_depth | No | Max depth of the crawl. Defines how far from the base URL the crawler can explore. | 1 |
| max_breadth | No | Max number of links to follow per level of the tree (i.e., per page) | 20 |
| limit | No | Total number of links the crawler will process before stopping | 50 |
| instructions | No | Natural language instructions for the crawler | |
| select_paths | No | Regex patterns to select only URLs with specific path patterns (e.g., /docs/.*, /api/v1.*) | [] |
| select_domains | No | Regex patterns to select crawling to specific domains or subdomains (e.g., ^docs\.example\.com$) | [] |
| allow_external | No | Whether to allow following links that go to external domains | false |
| categories | No | Filter URLs using predefined categories like documentation, blog, api, etc. | [] |
| extract_depth | No | Advanced extraction retrieves more data, including tables and embedded content, with higher success but may increase latency | basic |
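
For example, a single call can be steered toward a documentation subtree by combining these parameters. The parameter names follow the schema above; the target site and values below are illustrative, not defaults:

    // Illustrative tavily-crawl arguments (hypothetical site and values).
    const crawlArgs = {
      url: "https://docs.example.com",   // root URL of the crawl
      max_depth: 2,                      // follow links up to two levels from the base URL
      max_breadth: 10,                   // at most 10 links per page
      limit: 40,                         // stop after 40 pages in total
      select_paths: ["/docs/.*"],        // only URLs under /docs/
      extract_depth: "basic"             // default extraction mode
    };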

Implementation Reference

  • Core handler function that executes the tavily-crawl tool by making a POST request to the Tavily API's crawl endpoint with user parameters and handling authentication/rate limit errors.
    async crawl(params: any): Promise<TavilyCrawlResponse> {
      try {
        const response = await this.axiosInstance.post(this.baseURLs.crawl, {
          ...params,
          api_key: API_KEY
        });
        return response.data;
      } catch (error: any) {
        if (error.response?.status === 401) {
          throw new Error('Invalid API key');
        } else if (error.response?.status === 429) {
          throw new Error('Usage limit exceeded');
        }
        throw error;
      }
    }
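
    A rough usage sketch, assuming an instance of the class above with axiosInstance, baseURLs.crawl, and API_KEY configured elsewhere (the URL is illustrative):

      // Hypothetical call site inside an async method of the same class.
      const crawlResponse = await this.crawl({
        url: "https://docs.example.com",
        max_depth: 1,
        limit: 20
      });
      console.log(crawlResponse.base_url, crawlResponse.results.length);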
  • src/index.ts:216-283 (registration)
    Registration of the tavily-crawl tool in the ListTools response, including name, description, and detailed input schema for parameters like url, depth, breadth, limits, filters, etc.
    { name: "tavily-crawl", description: "A powerful web crawler that initiates a structured web crawl starting from a specified base URL. The crawler expands from that point like a tree, following internal links across pages. You can control how deep and wide it goes, and guide it to focus on specific sections of the site.", inputSchema: { type: "object", properties: { url: { type: "string", description: "The root URL to begin the crawl" }, max_depth: { type: "integer", description: "Max depth of the crawl. Defines how far from the base URL the crawler can explore.", default: 1, minimum: 1 }, max_breadth: { type: "integer", description: "Max number of links to follow per level of the tree (i.e., per page)", default: 20, minimum: 1 }, limit: { type: "integer", description: "Total number of links the crawler will process before stopping", default: 50, minimum: 1 }, instructions: { type: "string", description: "Natural language instructions for the crawler" }, select_paths: { type: "array", items: { type: "string" }, description: "Regex patterns to select only URLs with specific path patterns (e.g., /docs/.*, /api/v1.*)", default: [] }, select_domains: { type: "array", items: { type: "string" }, description: "Regex patterns to select crawling to specific domains or subdomains (e.g., ^docs\\.example\\.com$)", default: [] }, allow_external: { type: "boolean", description: "Whether to allow following links that go to external domains", default: false }, categories: { type: "array", items: { type: "string", enum: ["Careers", "Blog", "Documentation", "About", "Pricing", "Community", "Developers", "Contact", "Media"] }, description: "Filter URLs using predefined categories like documentation, blog, api, etc", default: [] }, extract_depth: { type: "string", enum: ["basic", "advanced"], description: "Advanced extraction retrieves more data, including tables and embedded content, with higher success but may increase latency", default: "basic" } }, required: ["url"] } },
  • Dispatch handler in the CallToolRequestSchema switch statement that invokes the crawl method with parsed arguments and formats the response using formatCrawlResults.
    case "tavily-crawl": const crawlResponse = await this.crawl({ url: args.url, max_depth: args.max_depth, max_breadth: args.max_breadth, limit: args.limit, instructions: args.instructions, select_paths: Array.isArray(args.select_paths) ? args.select_paths : [], select_domains: Array.isArray(args.select_domains) ? args.select_domains : [], allow_external: args.allow_external, categories: Array.isArray(args.categories) ? args.categories : [], extract_depth: args.extract_depth }); return { content: [{ type: "text", text: formatCrawlResults(crawlResponse) }] };
  • Helper function to format the crawl response into a human-readable string, including base URL and preview of crawled pages' content.
    function formatCrawlResults(response: TavilyCrawlResponse): string {
      const output: string[] = [];
      output.push(`Crawl Results:`);
      output.push(`Base URL: ${response.base_url}`);
      output.push('\nCrawled Pages:');
      response.results.forEach((page, index) => {
        output.push(`\n[${index + 1}] URL: ${page.url}`);
        if (page.raw_content) {
          // Truncate content if it's too long
          const contentPreview = page.raw_content.length > 200
            ? page.raw_content.substring(0, 200) + "..."
            : page.raw_content;
          output.push(`Content: ${contentPreview}`);
        }
      });
      return output.join('\n');
    }
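
    For instance, with a two-page response the formatter produces output along these lines (URLs and content are made up):

      // Sample response passed through the formatter above.
      const sample: TavilyCrawlResponse = {
        base_url: "https://docs.example.com",
        results: [
          { url: "https://docs.example.com/intro", raw_content: "Welcome to the docs..." },
          { url: "https://docs.example.com/api", raw_content: "API reference overview..." }
        ],
        response_time: 1.2
      };
      console.log(formatCrawlResults(sample));
      // Crawl Results:
      // Base URL: https://docs.example.com
      //
      // Crawled Pages:
      //
      // [1] URL: https://docs.example.com/intro
      // Content: Welcome to the docs...
      //
      // [2] URL: https://docs.example.com/api
      // Content: API reference overview...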
  • TypeScript interface defining the expected response structure from the Tavily crawl API.
    interface TavilyCrawlResponse {
      base_url: string;
      results: Array<{
        url: string;
        raw_content: string;
      }>;
      response_time: number;
    }
