
Tavily MCP Server

Official
by tavily-ai

tavily-extract

Extract and process web content from URLs for data collection, content analysis, and research tasks, supporting multiple formats and extraction depths.

Instructions

A powerful web content extraction tool that retrieves and processes raw content from specified URLs, ideal for data collection, content analysis, and research tasks.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| urls | Yes | List of URLs to extract content from | — |
| extract_depth | No | Depth of extraction: 'basic' or 'advanced'. Use 'advanced' if the URLs are LinkedIn pages or if explicitly told to. | basic |
| include_images | No | Include a list of images extracted from the URLs in the response | false |
| format | No | The format of the extracted web page content. 'markdown' returns content in Markdown; 'text' returns plain text and may increase latency. | markdown |
| include_favicon | No | Whether to include the favicon URL for each result | false |
| query | No | User intent query for reranking extracted chunks based on relevance | — |
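To make the schema concrete, here is a hypothetical arguments object for a tavily-extract call, together with a minimal check of the schema's only hard requirement (a non-empty `urls` array). Field names mirror the schema; the URL and query values are illustrative only.

```typescript
// Hypothetical example arguments for tavily-extract. Only `urls` is required;
// the remaining fields fall back to the schema defaults shown in the table.
interface ExtractArgs {
  urls: string[];
  extract_depth?: "basic" | "advanced";
  include_images?: boolean;
  format?: "markdown" | "text";
  include_favicon?: boolean;
  query?: string;
}

function isValidExtractArgs(args: ExtractArgs): boolean {
  // The schema marks `urls` as the only required property.
  return Array.isArray(args.urls) && args.urls.length > 0;
}

const args: ExtractArgs = {
  urls: ["https://example.com/article"],
  extract_depth: "basic",
  format: "markdown",
  query: "key findings of the article",
};
```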

Implementation Reference

  • The core handler function implementing the 'tavily-extract' tool logic. Makes a POST request to Tavily's extract API endpoint (https://api.tavily.com/extract) using axios with user-provided parameters and API key, returns the response data, and handles specific errors like 401 (invalid key) and 429 (rate limit).
    async extract(params: any): Promise<TavilyResponse> {
      try {
        const response = await this.axiosInstance.post(this.baseURLs.extract, {
          ...params,
          api_key: API_KEY
        });
        return response.data;
      } catch (error: any) {
        if (error.response?.status === 401) {
          throw new Error('Invalid API key');
        } else if (error.response?.status === 429) {
          throw new Error('Usage limit exceeded');
        }
        throw error;
      }
    }
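    The 401/429 branches of the handler can be factored into a small helper and exercised without any network call. A sketch of that mapping (not the actual source; `errorForStatus` is an invented name):

```typescript
// Sketch: maps Tavily HTTP error statuses to the messages the extract
// handler throws. Any other status is left for the caller to rethrow.
function errorForStatus(status: number): Error | null {
  if (status === 401) return new Error("Invalid API key");
  if (status === 429) return new Error("Usage limit exceeded");
  return null;
}
```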
  • Input schema definition for the 'tavily-extract' tool, including required 'urls' array, optional parameters like extract_depth (basic/advanced), include_images, format (markdown/text), include_favicon, and query for reranking.
    {
      name: "tavily-extract",
      description: "A powerful web content extraction tool that retrieves and processes raw content from specified URLs, ideal for data collection, content analysis, and research tasks.",
      inputSchema: {
        type: "object",
        properties: {
          urls: { 
            type: "array",
            items: { type: "string" },
            description: "List of URLs to extract content from"
          },
          extract_depth: { 
            type: "string",
            enum: ["basic","advanced"],
            description: "Depth of extraction - 'basic' or 'advanced'; use 'advanced' if the URLs are LinkedIn pages or if explicitly told to",
            default: "basic"
          },
          include_images: { 
            type: "boolean", 
            description: "Include a list of images extracted from the urls in the response",
            default: false,
          },
          format: {
            type: "string",
            enum: ["markdown","text"],
            description: "The format of the extracted web page content. markdown returns content in markdown format. text returns plain text and may increase latency.",
            default: "markdown"
          },
          include_favicon: { 
            type: "boolean", 
            description: "Whether to include the favicon URL for each result",
            default: false,
          },
          query: {
            type: "string",
            description: "User intent query for reranking extracted chunks based on relevance"
          },
        },
        required: ["urls"]
      }
    },
  • src/index.ts:443-452 (registration)
    Registration and dispatch logic for 'tavily-extract' in the CallToolRequestSchema switch statement. Parses arguments and invokes the extract handler method.
    case "tavily-extract":
      response = await this.extract({
        urls: args.urls,
        extract_depth: args.extract_depth,
        include_images: args.include_images,
        format: args.format,
        include_favicon: args.include_favicon,
        query: args.query,
      });
      break;
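    The dispatch shape can be demonstrated in isolation with a stubbed handler. The `extract` stub below is an assumption standing in for the real method, which returns a full TavilyResponse rather than a string:

```typescript
type Args = Record<string, unknown>;

// Stub standing in for the real extract handler (the real one calls the API).
async function extract(params: Args): Promise<string> {
  return `extracting ${(params.urls as string[]).length} url(s)`;
}

async function dispatch(toolName: string, args: Args): Promise<string> {
  switch (toolName) {
    case "tavily-extract":
      // Forward only the schema-defined parameters, as the real dispatcher does.
      return extract({
        urls: args.urls,
        extract_depth: args.extract_depth,
        include_images: args.include_images,
        format: args.format,
        include_favicon: args.include_favicon,
        query: args.query,
      });
    default:
      throw new Error(`Unknown tool: ${toolName}`);
  }
}
```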
  • Helper function used to format the TavilyResponse from tavily-extract (and other tools) into a human-readable string with sections for answer, detailed results (title, URL, content, raw_content, favicon), and images.
    function formatResults(response: TavilyResponse): string {
      // Format API response into human-readable text
      const output: string[] = [];
    
      // Include answer if available
      if (response.answer) {
        output.push(`Answer: ${response.answer}`);
      }
    
      // Format detailed search results
      output.push('Detailed Results:');
      response.results.forEach(result => {
        output.push(`\nTitle: ${result.title}`);
        output.push(`URL: ${result.url}`);
        output.push(`Content: ${result.content}`);
        if (result.raw_content) {
          output.push(`Raw Content: ${result.raw_content}`);
        }
        if (result.favicon) {
          output.push(`Favicon: ${result.favicon}`);
        }
      });
    
      // Add images section if available
      if (response.images && response.images.length > 0) {
        output.push('\nImages:');
        response.images.forEach((image, index) => {
          if (typeof image === 'string') {
            output.push(`\n[${index + 1}] URL: ${image}`);
          } else {
            output.push(`\n[${index + 1}] URL: ${image.url}`);
            if (image.description) {
              output.push(`   Description: ${image.description}`);
            }
          }
        });
      }
    
      return output.join('\n');
    }
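A quick illustration of the shape this helper produces, using a condensed, self-contained copy of the function and an invented sample response (field values are illustrative only):

```typescript
// Condensed copy of formatResults, trimmed to the answer/results sections,
// applied to an invented sample response to show the output shape.
interface TavilyResponse {
  answer?: string;
  results: Array<{ title: string; url: string; content: string }>;
}

function formatResults(response: TavilyResponse): string {
  const output: string[] = [];
  if (response.answer) {
    output.push(`Answer: ${response.answer}`);
  }
  output.push('Detailed Results:');
  response.results.forEach(result => {
    output.push(`\nTitle: ${result.title}`);
    output.push(`URL: ${result.url}`);
    output.push(`Content: ${result.content}`);
  });
  return output.join('\n');
}

const sample: TavilyResponse = {
  results: [
    { title: "Example Domain", url: "https://example.com", content: "Illustrative text." },
  ],
};
```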
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions that the tool 'retrieves and processes raw content' but doesn't disclose critical behavioral traits: authentication requirements, rate limits, error handling, pagination, or the shape of the response. The description adds minimal context beyond the basic operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized with two concise sentences. The first sentence states the core functionality, and the second provides use cases. There's no wasted text, though it could be slightly more front-loaded with sibling differentiation.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 6 parameters, no annotations, and no output schema, the description is incomplete. It doesn't explain what the tool returns, error conditions, or behavioral constraints. For a web extraction tool with multiple configuration options and no structured output documentation, the description should provide more context about the extraction results and limitations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 6 parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema: it states the general purpose but no parameter semantics. A baseline of 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'retrieves and processes raw content from specified URLs' with specific verbs and resource. It mentions use cases like 'data collection, content analysis, and research tasks' which helps understanding. However, it doesn't explicitly differentiate from sibling tools like tavily-crawl or tavily-search, which likely have overlapping web-related functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus its siblings (tavily-crawl, tavily-map, tavily-search). It mentions the tool is 'ideal for data collection, content analysis, and research tasks' but doesn't specify contexts where alternatives might be better. There's no explicit when/when-not guidance or named alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
