tavily-extract
Extract and process raw web content from URLs for data collection, content analysis, and research tasks with configurable depth and image options.
Instructions
A powerful web content extraction tool that retrieves and processes raw content from specified URLs, ideal for data collection, content analysis, and research tasks.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| urls | Yes | List of URLs to extract content from | |
| extract_depth | No | Depth of extraction: 'basic' or 'advanced'. Use 'advanced' for LinkedIn URLs or when explicitly asked to. | basic |
| include_images | No | Include a list of images extracted from the URLs in the response | false |
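
The defaults in the table can be applied on the caller's side before the tool is invoked. The following is a minimal sketch, not part of the server code: `ExtractArgs`, `buildExtractArgs`, and the `LINKEDIN_HOST` pattern are hypothetical names introduced here for illustration, and the LinkedIn check is only one possible reading of the hint in the `extract_depth` description.

```typescript
// Hypothetical types and helper; names are illustrative, not taken from src/index.ts.
type ExtractDepth = "basic" | "advanced";

interface ExtractArgs {
  urls: string[];
  extract_depth: ExtractDepth;
  include_images: boolean;
}

// Matches http(s) URLs whose host is linkedin.com or a subdomain of it.
const LINKEDIN_HOST = /^https?:\/\/([^\/]+\.)?linkedin\.com(?:[\/:?#]|$)/i;

function buildExtractArgs(
  urls: string[],
  options: { extract_depth?: ExtractDepth; include_images?: boolean } = {}
): ExtractArgs {
  // The schema marks urls as required; an empty list is rejected here as an assumption.
  if (urls.length === 0) {
    throw new Error("urls must contain at least one URL");
  }
  // Follow the schema hint: prefer "advanced" for LinkedIn URLs unless the caller chose a depth.
  const hasLinkedIn = urls.some((u) => LINKEDIN_HOST.test(u));
  const defaultDepth: ExtractDepth = hasLinkedIn ? "advanced" : "basic";
  return {
    urls,
    extract_depth: options.extract_depth ?? defaultDepth,
    include_images: options.include_images ?? false,
  };
}
```

Calling `buildExtractArgs(["https://example.com"])` yields an object with `extract_depth: "basic"` and `include_images: false`, matching the defaults listed above. An explicit `extract_depth` always takes precedence over the LinkedIn heuristic.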
Implementation Reference
- src/index.ts:290-305 (handler) Core handler function for the 'tavily-extract' tool. Makes a POST request to the Tavily extract API endpoint using the provided parameters (urls, extract_depth, include_images), handles API errors, and returns the response data.

  ```typescript
  async extract(params: any): Promise<TavilyResponse> {
    try {
      const response = await this.axiosInstance.post(this.baseURLs.extract, {
        ...params,
        api_key: API_KEY
      });
      return response.data;
    } catch (error: any) {
      if (error.response?.status === 401) {
        throw new Error('Invalid API key');
      } else if (error.response?.status === 429) {
        throw new Error('Usage limit exceeded');
      }
      throw error;
    }
  }
  ```
- src/index.ts:174-195 (schema) Input schema for the 'tavily-extract' tool, defining parameters: urls (required array of strings), extract_depth (enum basic/advanced, default basic), include_images (boolean, default false).

  ```typescript
  inputSchema: {
    type: "object",
    properties: {
      urls: {
        type: "array",
        items: { type: "string" },
        description: "List of URLs to extract content from"
      },
      extract_depth: {
        type: "string",
        enum: ["basic","advanced"],
        description: "Depth of extraction - 'basic' or 'advanced', if usrls are linkedin use 'advanced' or if explicitly told to use advanced",
        default: "basic"
      },
      include_images: {
        type: "boolean",
        description: "Include a list of images extracted from the urls in the response",
        default: false,
      }
    },
    required: ["urls"]
  }
  ```
- src/index.ts:171-196 (registration) Registration of the 'tavily-extract' tool in the ListToolsRequestSchema handler, including name, description, and input schema.

  ```typescript
  {
    name: "tavily-extract",
    description: "A powerful web content extraction tool that retrieves and processes raw content from specified URLs, ideal for data collection, content analysis, and research tasks.",
    inputSchema: {
      type: "object",
      properties: {
        urls: {
          type: "array",
          items: { type: "string" },
          description: "List of URLs to extract content from"
        },
        extract_depth: {
          type: "string",
          enum: ["basic","advanced"],
          description: "Depth of extraction - 'basic' or 'advanced', if usrls are linkedin use 'advanced' or if explicitly told to use advanced",
          default: "basic"
        },
        include_images: {
          type: "boolean",
          description: "Include a list of images extracted from the urls in the response",
          default: false,
        }
      },
      required: ["urls"]
    }
  },
  ```
- src/index.ts:223-229 (handler) Dispatch logic in the CallToolRequestSchema handler that invokes the extract method when 'tavily-extract' is called.

  ```typescript
  case "tavily-extract":
    response = await this.extract({
      urls: args.urls,
      extract_depth: args.extract_depth,
      include_images: args.include_images
    });
    break;
  ```
- src/index.ts:18-35 (helper) TypeScript interface defining the structure of the Tavily API response used by the tavily-extract tool.

  ```typescript
  interface TavilyResponse {
    // Response structure from Tavily API
    query: string;
    follow_up_questions?: Array<string>;
    answer?: string;
    images?: Array<string | {
      url: string;
      description?: string;
    }>;
    results: Array<{
      title: string;
      url: string;
      content: string;
      score: number;
      published_date?: string;
      raw_content?: string;
    }>;
  }
  ```
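
The error handling in the core handler (401 mapped to "Invalid API key", 429 mapped to "Usage limit exceeded", everything else rethrown) can be isolated so that it is testable without a network call. The following is a rough sketch under stated assumptions rather than the server's actual code: `mapExtractError`, `HttpLikeError`, `PostFn`, and `extractWith` are hypothetical names, and the injected `post` function stands in for the axios instance the handler uses.

```typescript
// Hypothetical helper; the name and shape are illustrative, not part of src/index.ts.
interface HttpLikeError {
  response?: { status?: number };
}

// Translate an HTTP failure into the messages the handler throws.
function mapExtractError(error: unknown): Error {
  const status = (error as HttpLikeError | undefined)?.response?.status;
  if (status === 401) return new Error("Invalid API key");
  if (status === 429) return new Error("Usage limit exceeded");
  // Other Error objects pass through unchanged; non-Error values are wrapped.
  return error instanceof Error ? error : new Error(String(error));
}

// Minimal POST abstraction (an assumption) so the sketch does not depend on axios.
type PostFn = (url: string, body: unknown) => Promise<{ data: unknown }>;

async function extractWith(
  post: PostFn,
  endpoint: string,
  apiKey: string,
  params: { urls: string[]; extract_depth?: "basic" | "advanced"; include_images?: boolean }
): Promise<unknown> {
  try {
    // Same request shape as the handler: tool parameters plus the API key.
    const response = await post(endpoint, { ...params, api_key: apiKey });
    return response.data;
  } catch (error) {
    throw mapExtractError(error);
  }
}
```

Passing the transport in as a parameter is a design choice made for this sketch only: a test can supply a stub `post` that rejects with `{ response: { status: 429 } }` and check that the resulting error message is "Usage limit exceeded".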