tavily-extract
Extract and process web content from URLs for data collection, content analysis, and research tasks, supporting multiple formats and extraction depths.
Instructions
A powerful web content extraction tool that retrieves and processes raw content from specified URLs, ideal for data collection, content analysis, and research tasks.
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| urls | Yes | List of URLs to extract content from | |
| extract_depth | No | Depth of extraction - 'basic' or 'advanced', if usrls are linkedin use 'advanced' or if explicitly told to use advanced | basic |
| include_images | No | Include a list of images extracted from the urls in the response | |
| format | No | The format of the extracted web page content. markdown returns content in markdown format. text returns plain text and may increase latency. | markdown |
| include_favicon | No | Whether to include the favicon URL for each result | |
| query | No | User intent query for reranking extracted chunks based on relevance |
Implementation Reference
- src/index.ts:593-608 (handler)The core handler function implementing the 'tavily-extract' tool logic. Makes a POST request to Tavily's extract API endpoint (https://api.tavily.com/extract) using axios with user-provided parameters and API key, returns the response data, and handles specific errors like 401 (invalid key) and 429 (rate limit).async extract(params: any): Promise<TavilyResponse> { try { const response = await this.axiosInstance.post(this.baseURLs.extract, { ...params, api_key: API_KEY }); return response.data; } catch (error: any) { if (error.response?.status === 401) { throw new Error('Invalid API key'); } else if (error.response?.status === 429) { throw new Error('Usage limit exceeded'); } throw error; } }
- src/index.ts:244-284 (schema)Input schema definition for the 'tavily-extract' tool, including required 'urls' array, optional parameters like extract_depth (basic/advanced), include_images, format (markdown/text), include_favicon, and query for reranking.{ name: "tavily-extract", description: "A powerful web content extraction tool that retrieves and processes raw content from specified URLs, ideal for data collection, content analysis, and research tasks.", inputSchema: { type: "object", properties: { urls: { type: "array", items: { type: "string" }, description: "List of URLs to extract content from" }, extract_depth: { type: "string", enum: ["basic","advanced"], description: "Depth of extraction - 'basic' or 'advanced', if usrls are linkedin use 'advanced' or if explicitly told to use advanced", default: "basic" }, include_images: { type: "boolean", description: "Include a list of images extracted from the urls in the response", default: false, }, format: { type: "string", enum: ["markdown","text"], description: "The format of the extracted web page content. markdown returns content in markdown format. text returns plain text and may increase latency.", default: "markdown" }, include_favicon: { type: "boolean", description: "Whether to include the favicon URL for each result", default: false, }, query: { type: "string", description: "User intent query for reranking extracted chunks based on relevance" }, }, required: ["urls"] } },
- src/index.ts:443-452 (registration)Registration and dispatch logic for 'tavily-extract' in the CallToolRequestSchema switch statement. Parses arguments and invokes the extract handler method.case "tavily-extract": response = await this.extract({ urls: args.urls, extract_depth: args.extract_depth, include_images: args.include_images, format: args.format, include_favicon: args.include_favicon, query: args.query, }); break;
- src/index.ts:645-684 (helper)Helper function used to format the TavilyResponse from tavily-extract (and other tools) into a human-readable string with sections for answer, detailed results (title, URL, content, raw_content, favicon), and images.function formatResults(response: TavilyResponse): string { // Format API response into human-readable text const output: string[] = []; // Include answer if available if (response.answer) { output.push(`Answer: ${response.answer}`); } // Format detailed search results output.push('Detailed Results:'); response.results.forEach(result => { output.push(`\nTitle: ${result.title}`); output.push(`URL: ${result.url}`); output.push(`Content: ${result.content}`); if (result.raw_content) { output.push(`Raw Content: ${result.raw_content}`); } if (result.favicon) { output.push(`Favicon: ${result.favicon}`); } }); // Add images section if available if (response.images && response.images.length > 0) { output.push('\nImages:'); response.images.forEach((image, index) => { if (typeof image === 'string') { output.push(`\n[${index + 1}] URL: ${image}`); } else { output.push(`\n[${index + 1}] URL: ${image.url}`); if (image.description) { output.push(` Description: ${image.description}`); } } }); } return output.join('\n'); }