Tavily MCP Server

by Jeetanshu18

tavily-crawl

Crawl websites systematically from a base URL to map structure and extract content. Control depth, breadth, and focus using filters, patterns, and instructions for targeted data collection.

Instructions

A powerful web crawler that initiates a structured web crawl starting from a specified base URL. The crawler expands from that point like a tree, following internal links across pages. You can control how deep and wide it goes, and guide it to focus on specific sections of the site.
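
To make the interplay of depth, breadth, and the overall limit concrete, here is a minimal sketch of a breadth-first crawl over an in-memory link graph. It is a toy model, not Tavily's actual algorithm: the `LinkGraph` type, the `toyCrawl` function, and the sample site are all hypothetical, and the real service may count pages or apply `max_breadth` differently.

```typescript
// Toy model only: a hypothetical in-memory site, not Tavily's crawler.
type LinkGraph = Record<string, string[]>;

interface ToyCrawlOptions {
  maxDepth: number;   // how many link hops from the root are allowed
  maxBreadth: number; // how many links to follow from each page
  limit: number;      // total pages to visit (assumption: the root counts)
}

function toyCrawl(graph: LinkGraph, root: string, opts: ToyCrawlOptions): string[] {
  const visited: string[] = [root];
  const seen = new Set<string>([root]);
  let frontier: string[] = [root];

  for (let depth = 1; depth <= opts.maxDepth; depth++) {
    const next: string[] = [];
    for (const page of frontier) {
      // Follow at most maxBreadth links from this page.
      const links = (graph[page] ?? []).slice(0, opts.maxBreadth);
      for (const link of links) {
        if (seen.has(link)) continue;
        // Stop the whole crawl once the page budget is used up.
        if (visited.length >= opts.limit) return visited;
        seen.add(link);
        visited.push(link);
        next.push(link);
      }
    }
    frontier = next;
  }
  return visited;
}

const site: LinkGraph = {
  "/": ["/docs", "/blog", "/about"],
  "/docs": ["/docs/a", "/docs/b"],
  "/blog": ["/blog/1"],
};

const shallow = toyCrawl(site, "/", { maxDepth: 1, maxBreadth: 2, limit: 50 });
const deep = toyCrawl(site, "/", { maxDepth: 2, maxBreadth: 2, limit: 4 });
console.log(shallow); // [ '/', '/docs', '/blog' ]
console.log(deep);    // [ '/', '/docs', '/blog', '/docs/a' ]
```

In the first call, `maxBreadth: 2` drops `/about` and `maxDepth: 1` stops before any second-level page. In the second call, the crawl reaches depth 2 but the `limit` of 4 cuts it off after `/docs/a`.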

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | The root URL to begin the crawl | |
| max_depth | No | Max depth of the crawl. Defines how far from the base URL the crawler can explore. | |
| max_breadth | No | Max number of links to follow per level of the tree (i.e., per page) | |
| limit | No | Total number of links the crawler will process before stopping | |
| instructions | No | Natural language instructions for the crawler | |
| select_paths | No | Regex patterns to select only URLs with specific path patterns (e.g., /docs/.*, /api/v1.*) | |
| select_domains | No | Regex patterns to select crawling to specific domains or subdomains (e.g., ^docs\.example\.com$) | |
| allow_external | No | Whether to allow following links that go to external domains | |
| categories | No | Filter URLs using predefined categories like documentation, blog, api, etc | |
| extract_depth | No | Advanced extraction retrieves more data, including tables and embedded content, with higher success but may increase latency | basic |
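
As an illustration of how these parameters combine, the sketch below defines a hypothetical `CrawlArgs` type mirroring the table and a small helper that previews which URLs a `select_paths` list would keep. The helper `matchesSelectPaths` is not part of the server. It assumes patterns are tested against the URL path with JavaScript `RegExp` semantics, which may not match how Tavily evaluates them.

```typescript
// Hypothetical type mirroring the input table; only `url` is required.
interface CrawlArgs {
  url: string;
  max_depth?: number;
  max_breadth?: number;
  limit?: number;
  instructions?: string;
  select_paths?: string[];
  select_domains?: string[];
  allow_external?: boolean;
  categories?: string[];
  extract_depth?: "basic" | "advanced";
}

// Assumption: a URL is kept when its path matches at least one pattern,
// or when no patterns are given at all.
function matchesSelectPaths(url: string, patterns: string[]): boolean {
  if (patterns.length === 0) return true;
  const path = new URL(url).pathname;
  return patterns.some((p) => new RegExp(p).test(path));
}

const args: CrawlArgs = {
  url: "https://example.com",
  max_depth: 2,
  limit: 25,
  select_paths: ["^/docs/.*", "^/api/v1.*"],
  extract_depth: "basic",
};

const candidates = [
  "https://example.com/docs/intro",
  "https://example.com/blog/post-1",
  "https://example.com/api/v1/users",
];

// Keeps the /docs and /api/v1 URLs and drops the blog post.
const kept = candidates.filter((u) => matchesSelectPaths(u, args.select_paths ?? []));
console.log(kept);
```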

Implementation Reference

  • Core handler function that performs the actual API call to Tavily's /crawl endpoint using axios, handling errors like invalid API key or rate limits.
    async crawl(params: any): Promise<TavilyCrawlResponse> {
      try {
        const response = await this.axiosInstance.post(this.baseURLs.crawl, {
          ...params,
          api_key: API_KEY
        });
        return response.data;
      } catch (error: any) {
        if (error.response?.status === 401) {
          throw new Error('Invalid API key');
        } else if (error.response?.status === 429) {
          throw new Error('Usage limit exceeded');
        }
        throw error;
      }
    }
  • Dispatch handler in the CallToolRequestSchema switch statement that parses arguments, calls the crawl method, formats the response using formatCrawlResults, and returns MCP content.
    case "tavily-crawl":
      const crawlResponse = await this.crawl({
        url: args.url,
        max_depth: args.max_depth,
        max_breadth: args.max_breadth,
        limit: args.limit,
        instructions: args.instructions,
        select_paths: Array.isArray(args.select_paths) ? args.select_paths : [],
        select_domains: Array.isArray(args.select_domains) ? args.select_domains : [],
        allow_external: args.allow_external,
        categories: Array.isArray(args.categories) ? args.categories : [],
        extract_depth: args.extract_depth
      });
      return {
        content: [{
          type: "text",
          text: formatCrawlResults(crawlResponse)
        }]
      };
  • Tool schema definition including name, description, and detailed inputSchema with properties like url, max_depth, max_breadth, etc., registered in the ListTools response.
    {
      name: "tavily-crawl",
      description: "A powerful web crawler that initiates a structured web crawl starting from a specified base URL. The crawler expands from that point like a tree, following internal links across pages. You can control how deep and wide it goes, and guide it to focus on specific sections of the site.",
      inputSchema: {
        type: "object",
        properties: {
          url: {
            type: "string",
            description: "The root URL to begin the crawl"
          },
          max_depth: {
            type: "integer",
            description: "Max depth of the crawl. Defines how far from the base URL the crawler can explore.",
            default: 1,
            minimum: 1
          },
          max_breadth: {
            type: "integer",
            description: "Max number of links to follow per level of the tree (i.e., per page)",
            default: 20,
            minimum: 1
          },
          limit: {
            type: "integer",
            description: "Total number of links the crawler will process before stopping",
            default: 50,
            minimum: 1
          },
          instructions: {
            type: "string",
            description: "Natural language instructions for the crawler"
          },
          select_paths: {
            type: "array",
            items: { type: "string" },
            description: "Regex patterns to select only URLs with specific path patterns (e.g., /docs/.*, /api/v1.*)",
            default: []
          },
          select_domains: {
            type: "array",
            items: { type: "string" },
            description: "Regex patterns to select crawling to specific domains or subdomains (e.g., ^docs\\.example\\.com$)",
            default: []
          },
          allow_external: {
            type: "boolean",
            description: "Whether to allow following links that go to external domains",
            default: false
          },
          categories: {
            type: "array",
            items: { 
              type: "string",
              enum: ["Careers", "Blog", "Documentation", "About", "Pricing", "Community", "Developers", "Contact", "Media"]
            },
            description: "Filter URLs using predefined categories like documentation, blog, api, etc",
            default: []
          },
          extract_depth: {
            type: "string",
            enum: ["basic", "advanced"],
            description: "Advanced extraction retrieves more data, including tables and embedded content, with higher success but may increase latency",
            default: "basic"
          }
        },
        required: ["url"]
      }
    },
  • Helper function to format the crawl API response into a human-readable string for the MCP tool output.
    function formatCrawlResults(response: TavilyCrawlResponse): string {
      const output: string[] = [];
      
      output.push(`Crawl Results:`);
      output.push(`Base URL: ${response.base_url}`);
      
      output.push('\nCrawled Pages:');
      response.results.forEach((page, index) => {
        output.push(`\n[${index + 1}] URL: ${page.url}`);
        if (page.raw_content) {
          // Truncate content if it's too long
          const contentPreview = page.raw_content.length > 200 
            ? page.raw_content.substring(0, 200) + "..." 
            : page.raw_content;
          output.push(`Content: ${contentPreview}`);
        }
      });
      
      return output.join('\n');
    }
  • src/index.ts:108-348 (registration)
    Registration of the tavily-crawl tool in the ListToolsRequestSchema handler, including it in the tools array returned to clients.
    this.server.setRequestHandler(ListToolsRequestSchema, async () => {
      // Define available tools: tavily-search, tavily-extract, tavily-crawl, and tavily-map
      const tools: Tool[] = [
        {
          name: "tavily-search",
          description: "A powerful web search tool that provides comprehensive, real-time results using Tavily's AI search engine. Returns relevant web content with customizable parameters for result count, content type, and domain filtering. Ideal for gathering current information, news, and detailed web content analysis.",
          inputSchema: {
            type: "object",
            properties: {
              query: { 
                type: "string", 
                description: "Search query" 
              },
              search_depth: {
                type: "string",
                enum: ["basic","advanced"],
                description: "The depth of the search. It can be 'basic' or 'advanced'",
                default: "basic"
              },
              topic: {
                type: "string",
                enum: ["general","news"],
                description: "The category of the search. This will determine which of our agents will be used for the search",
                default: "general"
              },
              days: {
                type: "number",
                description: "The number of days back from the current date to include in the search results. This specifies the time frame of data to be retrieved. Please note that this feature is only available when using the 'news' search topic",
                default: 3
              },
              time_range: {
                type: "string",
                description: "The time range back from the current date to include in the search results. This feature is available for both 'general' and 'news' search topics",
                enum: ["day", "week", "month", "year", "d", "w", "m", "y"],
              },
              max_results: { 
                type: "number", 
                description: "The maximum number of search results to return",
                default: 10,
                minimum: 5,
                maximum: 20
              },
              include_images: { 
                type: "boolean", 
                description: "Include a list of query-related images in the response",
                default: false,
              },
              include_image_descriptions: { 
                type: "boolean", 
                description: "Include a list of query-related images and their descriptions in the response",
                default: false,
              },
              /*
              // Since the MCP server is using AI clients to generate answers from the search results, we don't need to include this feature.
              include_answer: { 
                type: ["boolean", "string"],
                enum: [true, false, "basic", "advanced"],
                description: "Include an answer to original query, generated by an LLM based on Tavily's search results. Can be boolean or string ('basic'/'advanced'). 'basic'/true answer will be quick but less detailed, 'advanced' answer will be more detailed but take longer to generate",
                default: false,
              },
              */
              include_raw_content: { 
                type: "boolean", 
                description: "Include the cleaned and parsed HTML content of each search result",
                default: false,
              },
              include_domains: {
                type: "array",
                items: { type: "string" },
                description: "A list of domains to specifically include in the search results, if the user asks to search on specific sites set this to the domain of the site",
                default: []
              },
              exclude_domains: {
                type: "array",
                items: { type: "string" },
                description: "List of domains to specifically exclude, if the user asks to exclude a domain set this to the domain of the site",
                default: []
              }
            },
            required: ["query"]
          }
        },
        {
          name: "tavily-extract",
          description: "A powerful web content extraction tool that retrieves and processes raw content from specified URLs, ideal for data collection, content analysis, and research tasks.",
          inputSchema: {
            type: "object",
            properties: {
              urls: { 
                type: "array",
                items: { type: "string" },
                description: "List of URLs to extract content from"
              },
              extract_depth: { 
                type: "string",
                enum: ["basic","advanced"],
                description: "Depth of extraction - 'basic' or 'advanced'. Use 'advanced' if the URLs are LinkedIn pages or if explicitly told to use advanced",
                default: "basic"
              },
              include_images: { 
                type: "boolean", 
                description: "Include a list of images extracted from the urls in the response",
                default: false,
              }
            },
            required: ["urls"]
          }
        },
        {
          name: "tavily-crawl",
          description: "A powerful web crawler that initiates a structured web crawl starting from a specified base URL. The crawler expands from that point like a tree, following internal links across pages. You can control how deep and wide it goes, and guide it to focus on specific sections of the site.",
          inputSchema: {
            type: "object",
            properties: {
              url: {
                type: "string",
                description: "The root URL to begin the crawl"
              },
              max_depth: {
                type: "integer",
                description: "Max depth of the crawl. Defines how far from the base URL the crawler can explore.",
                default: 1,
                minimum: 1
              },
              max_breadth: {
                type: "integer",
                description: "Max number of links to follow per level of the tree (i.e., per page)",
                default: 20,
                minimum: 1
              },
              limit: {
                type: "integer",
                description: "Total number of links the crawler will process before stopping",
                default: 50,
                minimum: 1
              },
              instructions: {
                type: "string",
                description: "Natural language instructions for the crawler"
              },
              select_paths: {
                type: "array",
                items: { type: "string" },
                description: "Regex patterns to select only URLs with specific path patterns (e.g., /docs/.*, /api/v1.*)",
                default: []
              },
              select_domains: {
                type: "array",
                items: { type: "string" },
                description: "Regex patterns to select crawling to specific domains or subdomains (e.g., ^docs\\.example\\.com$)",
                default: []
              },
              allow_external: {
                type: "boolean",
                description: "Whether to allow following links that go to external domains",
                default: false
              },
              categories: {
                type: "array",
                items: { 
                  type: "string",
                  enum: ["Careers", "Blog", "Documentation", "About", "Pricing", "Community", "Developers", "Contact", "Media"]
                },
                description: "Filter URLs using predefined categories like documentation, blog, api, etc",
                default: []
              },
              extract_depth: {
                type: "string",
                enum: ["basic", "advanced"],
                description: "Advanced extraction retrieves more data, including tables and embedded content, with higher success but may increase latency",
                default: "basic"
              }
            },
            required: ["url"]
          }
        },
        {
          name: "tavily-map",
          description: "A powerful web mapping tool that creates a structured map of website URLs, allowing you to discover and analyze site structure, content organization, and navigation paths. Perfect for site audits, content discovery, and understanding website architecture.",
          inputSchema: {
            type: "object",
            properties: {
              url: { 
                type: "string", 
                description: "The root URL to begin the mapping"
              },
              max_depth: {
                type: "integer",
                description: "Max depth of the mapping. Defines how far from the base URL the crawler can explore",
                default: 1,
                minimum: 1
              },
              max_breadth: {
                type: "integer",
                description: "Max number of links to follow per level of the tree (i.e., per page)",
                default: 20,
                minimum: 1
              },
              limit: {
                type: "integer",
                description: "Total number of links the crawler will process before stopping",
                default: 50,
                minimum: 1
              },
              instructions: {
                type: "string",
                description: "Natural language instructions for the crawler"
              },
              select_paths: {
                type: "array",
                items: { type: "string" },
                description: "Regex patterns to select only URLs with specific path patterns (e.g., /docs/.*, /api/v1.*)",
                default: []
              },
              select_domains: {
                type: "array",
                items: { type: "string" },
                description: "Regex patterns to select crawling to specific domains or subdomains (e.g., ^docs\\.example\\.com$)",
                default: []
              },
              allow_external: {
                type: "boolean",
                description: "Whether to allow following links that go to external domains",
                default: false
              },
              categories: {
                type: "array",
                items: { 
                  type: "string",
                  enum: ["Careers", "Blog", "Documentation", "About", "Pricing", "Community", "Developers", "Contact", "Media"]
                },
                description: "Filter URLs using predefined categories like documentation, blog, api, etc",
                default: []
              }
            },
            required: ["url"]
          }
        },
      ];
      return { tools };
    });
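
The pure parts of the handler, dispatch, and formatting code above can be tried without network access or an API key. The sketch below re-creates them as small standalone helpers. The names `toStringList`, `crawlErrorMessage`, `preview`, and `toMcpContent` are hypothetical and do not appear in the server source, where this logic is written inline.

```typescript
// Hypothetical helpers; none of these functions exist in the server source.

// Mirrors the Array.isArray guards in the dispatch case: anything that is
// not an array becomes an empty list.
function toStringList(value: unknown): string[] {
  return Array.isArray(value) ? (value as string[]) : [];
}

// Mirrors the status checks in the crawl method's catch block.
function crawlErrorMessage(status: number | undefined): string | null {
  if (status === 401) return "Invalid API key";
  if (status === 429) return "Usage limit exceeded";
  return null; // the original rethrows any other error unchanged
}

// Mirrors the 200-character truncation in formatCrawlResults.
function preview(rawContent: string, max: number = 200): string {
  return rawContent.length > max
    ? rawContent.substring(0, max) + "..."
    : rawContent;
}

// Wraps text in the same content shape the dispatch case returns.
function toMcpContent(text: string) {
  return { content: [{ type: "text" as const, text }] };
}

const paths = toStringList("/docs/.*");          // not an array, so []
const message = crawlErrorMessage(429);          // "Usage limit exceeded"
const shortened = preview("x".repeat(250));      // 200 chars plus "..."
const result = toMcpContent(`Content: ${shortened}`);
console.log(paths, message, shortened.length, result.content[0].type);
```

One consequence of the array guard is that a single pattern passed as a plain string is silently dropped rather than rejected, so callers should always wrap patterns in an array.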
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It describes the crawling behavior ('expands like a tree', 'following internal links'), scope control ('how deep and wide'), and guidance capabilities ('focus on specific sections'). However, it doesn't mention important behavioral aspects like rate limits, authentication needs, error handling, or what the output looks like.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise with three sentences that each earn their place: first establishes the core crawling functionality, second explains the expansion behavior, third describes the control mechanisms. No wasted words and front-loaded with the main purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 10 parameters and no annotations or output schema, the description provides good basic context about the crawling behavior but lacks important details about what the tool returns, error conditions, performance characteristics, or how it differs meaningfully from its sibling tools in practical use cases.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 10 parameters thoroughly. The description adds some context about the crawling approach ('tree expansion', 'internal links', 'specific sections') that helps understand the parameters' purpose, but doesn't provide additional syntax, format, or usage details beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
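
As one way to make those value ranges explicit, the sketch below checks arguments against the constraints stated in the schema (`minimum: 1` for the integer parameters and the two enums). `validateCrawlArgs` is a hypothetical helper; nothing in the source shown here indicates that the server performs this validation itself.

```typescript
// Enum values copied from the tool schema.
const CATEGORIES = ["Careers", "Blog", "Documentation", "About", "Pricing",
  "Community", "Developers", "Contact", "Media"];
const EXTRACT_DEPTHS = ["basic", "advanced"];

// Hypothetical helper: returns a list of problems, empty when the args pass.
function validateCrawlArgs(args: Record<string, unknown>): string[] {
  const problems: string[] = [];
  if (typeof args.url !== "string" || args.url.length === 0) {
    problems.push("url is required and must be a string");
  }
  for (const key of ["max_depth", "max_breadth", "limit"]) {
    const value = args[key];
    if (value === undefined) continue; // optional; the schema default applies
    if (typeof value !== "number" || !Number.isInteger(value) || value < 1) {
      problems.push(`${key} must be an integer >= 1`);
    }
  }
  if (Array.isArray(args.categories)) {
    for (const c of args.categories) {
      if (!CATEGORIES.includes(String(c))) {
        problems.push(`unknown category: ${String(c)}`);
      }
    }
  }
  if (args.extract_depth !== undefined &&
      !EXTRACT_DEPTHS.includes(String(args.extract_depth))) {
    problems.push("extract_depth must be 'basic' or 'advanced'");
  }
  return problems;
}

const ok = validateCrawlArgs({ url: "https://example.com", max_depth: 2 });
const bad = validateCrawlArgs({
  url: "https://example.com",
  max_depth: 0,
  categories: ["api"],
});
console.log(ok);  // []
console.log(bad); // two problems: max_depth and the "api" category
```

The second call also surfaces a mismatch in the schema itself: the `categories` description mentions `api`, but the enum values are capitalized and contain no such entry.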

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'initiates a structured web crawl starting from a specified base URL' with specific verbs ('crawl', 'expands', 'following') and resources ('web', 'internal links', 'pages'). It distinguishes from sibling tools by focusing on crawling rather than extraction, mapping, or searching.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('structured web crawl', 'following internal links', 'control how deep and wide it goes'), but doesn't explicitly mention when not to use it or name alternatives like tavily-extract, tavily-map, or tavily-search for different scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Jeetanshu18/tavily-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.