Skip to main content
Glama
Jeetanshu18

Tavily MCP Server

by Jeetanshu18

tavily-extract

Extract and process raw web content from URLs for data collection, content analysis, and research tasks using basic or advanced extraction methods.

Instructions

A powerful web content extraction tool that retrieves and processes raw content from specified URLs, ideal for data collection, content analysis, and research tasks.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlsYesList of URLs to extract content from
extract_depthNoDepth of extraction - 'basic' or 'advanced', if usrls are linkedin use 'advanced' or if explicitly told to use advancedbasic
include_imagesNoInclude a list of images extracted from the urls in the response

Implementation Reference

  • Core handler function that executes the tavily-extract tool by sending a POST request to the Tavily extract API with the provided parameters.
    async extract(params: any): Promise<TavilyResponse> {
      try {
        const response = await this.axiosInstance.post(this.baseURLs.extract, {
          ...params,
          api_key: API_KEY
        });
        return response.data;
      } catch (error: any) {
        if (error.response?.status === 401) {
          throw new Error('Invalid API key');
        } else if (error.response?.status === 429) {
          throw new Error('Usage limit exceeded');
        }
        throw error;
      }
    }
  • Defines the input schema, description, and name for the tavily-extract tool, used in tool listing and validation.
    {
      name: "tavily-extract",
      description: "A powerful web content extraction tool that retrieves and processes raw content from specified URLs, ideal for data collection, content analysis, and research tasks.",
      inputSchema: {
        type: "object",
        properties: {
          urls: { 
            type: "array",
            items: { type: "string" },
            description: "List of URLs to extract content from"
          },
          extract_depth: { 
            type: "string",
            enum: ["basic","advanced"],
            description: "Depth of extraction - 'basic' or 'advanced', if usrls are linkedin use 'advanced' or if explicitly told to use advanced",
            default: "basic"
          },
          include_images: { 
            type: "boolean", 
            description: "Include a list of images extracted from the urls in the response",
            default: false,
          }
        },
        required: ["urls"]
      }
    },
  • Dispatcher case in the CallToolRequest handler that invokes the extract method for tavily-extract tool calls.
    case "tavily-extract":
      response = await this.extract({
        urls: args.urls,
        extract_depth: args.extract_depth,
        include_images: args.include_images
      });
      break;
  • Helper function to format the Tavily API response (used by tavily-extract and search) into a readable text output for the tool response.
    function formatResults(response: TavilyResponse): string {
      // Format API response into human-readable text
      const output: string[] = [];
    
      // Include answer if available
      if (response.answer) {
        output.push(`Answer: ${response.answer}`);
      }
    
      // Format detailed search results
      output.push('Detailed Results:');
      response.results.forEach(result => {
        output.push(`\nTitle: ${result.title}`);
        output.push(`URL: ${result.url}`);
        output.push(`Content: ${result.content}`);
        if (result.raw_content) {
          output.push(`Raw Content: ${result.raw_content}`);
        }
      });
    
        // Add images section if available
        if (response.images && response.images.length > 0) {
          output.push('\nImages:');
          response.images.forEach((image, index) => {
            if (typeof image === 'string') {
              output.push(`\n[${index + 1}] URL: ${image}`);
            } else {
              output.push(`\n[${index + 1}] URL: ${image.url}`);
              if (image.description) {
                output.push(`   Description: ${image.description}`);
              }
            }
          });
        }  
    
      return output.join('\n');
    }
  • src/index.ts:109-215 (registration)
    Registers tavily-extract in the list of available tools returned by ListToolsRequest.
    // Define available tools: tavily-search and tavily-extract
    const tools: Tool[] = [
      {
        name: "tavily-search",
        description: "A powerful web search tool that provides comprehensive, real-time results using Tavily's AI search engine. Returns relevant web content with customizable parameters for result count, content type, and domain filtering. Ideal for gathering current information, news, and detailed web content analysis.",
        inputSchema: {
          type: "object",
          properties: {
            query: { 
              type: "string", 
              description: "Search query" 
            },
            search_depth: {
              type: "string",
              enum: ["basic","advanced"],
              description: "The depth of the search. It can be 'basic' or 'advanced'",
              default: "basic"
            },
            topic : {
              type: "string",
              enum: ["general","news"],
              description: "The category of the search. This will determine which of our agents will be used for the search",
              default: "general"
            },
            days: {
              type: "number",
              description: "The number of days back from the current date to include in the search results. This specifies the time frame of data to be retrieved. Please note that this feature is only available when using the 'news' search topic",
              default: 3
            },
            time_range: {
              type: "string",
              description: "The time range back from the current date to include in the search results. This feature is available for both 'general' and 'news' search topics",
              enum: ["day", "week", "month", "year", "d", "w", "m", "y"],
            },
            max_results: { 
              type: "number", 
              description: "The maximum number of search results to return",
              default: 10,
              minimum: 5,
              maximum: 20
            },
            include_images: { 
              type: "boolean", 
              description: "Include a list of query-related images in the response",
              default: false,
            },
            include_image_descriptions: { 
              type: "boolean", 
              description: "Include a list of query-related images and their descriptions in the response",
              default: false,
            },
            /*
            // Since the mcp server is using AI clients to generate answers form the search results, we don't need to include this feature.
            include_answer: { 
              type: ["boolean", "string"],
              enum: [true, false, "basic", "advanced"],
              description: "Include an answer to original query, generated by an LLM based on Tavily's search results. Can be boolean or string ('basic'/'advanced'). 'basic'/true answer will be quick but less detailed, 'advanced' answer will be more detailed but take longer to generate",
              default: false,
            },
            */
            include_raw_content: { 
              type: "boolean", 
              description: "Include the cleaned and parsed HTML content of each search result",
              default: false,
            },
            include_domains: {
              type: "array",
              items: { type: "string" },
              description: "A list of domains to specifically include in the search results, if the user asks to search on specific sites set this to the domain of the site",
              default: []
            },
            exclude_domains: {
              type: "array",
              items: { type: "string" },
              description: "List of domains to specifically exclude, if the user asks to exclude a domain set this to the domain of the site",
              default: []
            }
          },
          required: ["query"]
        }
      },
      {
        name: "tavily-extract",
        description: "A powerful web content extraction tool that retrieves and processes raw content from specified URLs, ideal for data collection, content analysis, and research tasks.",
        inputSchema: {
          type: "object",
          properties: {
            urls: { 
              type: "array",
              items: { type: "string" },
              description: "List of URLs to extract content from"
            },
            extract_depth: { 
              type: "string",
              enum: ["basic","advanced"],
              description: "Depth of extraction - 'basic' or 'advanced', if usrls are linkedin use 'advanced' or if explicitly told to use advanced",
              default: "basic"
            },
            include_images: { 
              type: "boolean", 
              description: "Include a list of images extracted from the urls in the response",
              default: false,
            }
          },
          required: ["urls"]
        }
      },
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states the tool 'retrieves and processes raw content' but lacks details on rate limits, authentication needs, error handling, or what 'processes' entails (e.g., formatting, filtering). For a web content extraction tool with potential complexities, this is insufficient behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded with the core purpose in the first clause. It uses two sentences efficiently without redundancy. However, the second sentence ('ideal for data collection...') could be more specific to earn a perfect score, but overall it's well-structured and avoids waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 3 parameters with full schema coverage but no annotations or output schema, the description is moderately complete. It covers the basic purpose but lacks behavioral details (e.g., rate limits, error handling) and output information. For a web extraction tool with sibling alternatives, more context on differences and usage would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all parameters (urls, extract_depth, include_images). The description adds no parameter-specific information beyond what's in the schema, such as explaining why 'advanced' is needed for LinkedIn or what 'basic' vs. 'advanced' extraction entails. Baseline 3 is appropriate since the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'retrieves and processes raw content from specified URLs' with specific verbs and resources. It distinguishes from siblings by focusing on extraction rather than crawling, mapping, or searching, though it doesn't explicitly contrast with them. The mention of 'data collection, content analysis, and research tasks' adds context but doesn't fully differentiate from sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no explicit guidance on when to use this tool versus its siblings (tavily-crawl, tavily-map, tavily-search). It mentions general use cases like 'data collection, content analysis, and research tasks' but offers no when/when-not rules or alternatives. This leaves the agent without clear direction for tool selection among similar options.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Jeetanshu18/tavily-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server