tavily_extract_process
Extract web content from URLs into clean text for analysis, data collection, and research with configurable depth and image options.
Instructions
Extract web page content from single or multiple URLs using Tavily Extract. Efficiently converts web content into clean, processable text with configurable extraction depth and optional image extraction. Returns both combined and individual URL content. Best for content analysis, data collection, and research.
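As a rough illustration of what "combined and individual URL content" means, the sketch below joins per-URL text into one string and counts whitespace-separated words. The names `RawContent`, `CombinedResult`, and `combine_contents` are hypothetical and chosen for this example; they are not exported by the server.

```typescript
// Hypothetical shapes for this sketch, modeled on the result described above.
interface RawContent {
  url: string;
  content: string;
}

interface CombinedResult {
  content: string; // all pages joined together
  raw_contents: RawContent[]; // one entry per extracted URL
  word_count: number;
}

// Join per-URL text with blank lines and count whitespace-separated words.
function combine_contents(raw_contents: RawContent[]): CombinedResult {
  const content = raw_contents.map((r) => r.content).join('\n\n');
  const word_count = content.split(/\s+/).filter(Boolean).length;
  return { content, raw_contents, word_count };
}

const example = combine_contents([
  { url: 'https://example.com/a', content: 'Hello world' },
  { url: 'https://example.com/b', content: 'Second page text' },
]);
console.log(example.word_count); // 5
```

Keeping `raw_contents` alongside the joined string lets a caller attribute text to its source URL while still having one string to analyze.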
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL(s) | |
| extract_depth | No | Extraction depth (`basic` or `advanced`) | basic |
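The two inputs can be normalized before use. The sketch below is a minimal stand-in, assuming that `url` may be a single string or an array and that an omitted `extract_depth` falls back to `basic`. The helper `normalize_input` is hypothetical; the server's own URL validation is not shown on this page, so the `new URL()` check here is only an assumption about what reasonable validation might look like.

```typescript
type ExtractDepth = 'basic' | 'advanced';

interface ExtractInput {
  url: string | string[];
  extract_depth?: ExtractDepth;
}

interface NormalizedInput {
  urls: string[];
  extract_depth: ExtractDepth;
}

// Hypothetical helper (not part of the server): wraps a single URL in an
// array, rejects malformed URLs, and applies the 'basic' default.
function normalize_input(input: ExtractInput): NormalizedInput {
  const urls = Array.isArray(input.url) ? input.url : [input.url];
  for (const candidate of urls) {
    new URL(candidate); // throws a TypeError if the URL cannot be parsed
  }
  return { urls, extract_depth: input.extract_depth ?? 'basic' };
}

// A single URL with the default depth.
console.log(normalize_input({ url: 'https://example.com' }));

// Several URLs with advanced extraction.
console.log(
  normalize_input({
    url: ['https://example.com/a', 'https://example.com/b'],
    extract_depth: 'advanced',
  }),
);
```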
Implementation Reference
- Core handler logic in TavilyExtractProvider.process_content: validates the URLs, sends a POST request to the Tavily Extract API with urls and extract_depth (include_images is fixed to false), then builds the result: combined content, raw_contents per URL, and metadata (word_count, failed_urls, etc.). Errors go through handle_provider_error, and the request is wrapped in retry_with_backoff.

  ```typescript
  async process_content(
    url: string | string[],
    extract_depth: 'basic' | 'advanced' = 'basic',
  ): Promise<ProcessingResult> {
    const urls = validate_processing_urls(url, this.name);

    const extract_request = async () => {
      const api_key = validate_api_key(
        config.processing.tavily_extract.api_key,
        this.name,
      );

      try {
        const data = await http_json<TavilyExtractResponse>(
          this.name,
          `${config.processing.tavily_extract.base_url}/extract`,
          {
            method: 'POST',
            headers: {
              Authorization: `Bearer ${api_key}`,
              'Content-Type': 'application/json',
            },
            body: JSON.stringify({
              urls: urls,
              include_images: false,
              extract_depth,
            }),
            signal: AbortSignal.timeout(
              config.processing.tavily_extract.timeout,
            ),
          },
        );

        // Check if there are any results
        if (data.results.length === 0) {
          throw new ProviderError(
            ErrorType.PROVIDER_ERROR,
            'No content extracted from URL',
            this.name,
          );
        }

        // Map results to raw_contents array
        const raw_contents = data.results.map((result) => ({
          url: result.url,
          content: result.raw_content,
        }));

        // Combine all results into a single content string
        const combined_content = raw_contents
          .map((result) => result.content)
          .join('\n\n');

        // Calculate total word count
        const word_count = combined_content
          .split(/\s+/)
          .filter(Boolean).length;

        // Include any failed URLs in metadata
        const failed_urls =
          data.failed_results.length > 0
            ? data.failed_results
            : undefined;

        return {
          content: combined_content,
          raw_contents,
          metadata: {
            word_count,
            failed_urls,
            urls_processed: urls.length,
            successful_extractions: data.results.length,
            extract_depth,
          },
          source_provider: this.name,
        };
      } catch (error) {
        handle_provider_error(error, this.name, 'extract content');
      }
    };

    return retry_with_backoff(extract_request);
  }
  ```
- src/server/tools.ts:413-466 (registration): Registers one MCP tool named `<provider>_process` for each processing provider, which yields 'tavily_extract_process' for tavily_extract. It defines the schema (url: string or array, optional extract_depth); the handler calls provider.process_content and returns the result as formatted JSON, or an error response with isError set.

  ```typescript
  // Register remaining processing providers (kagi_summarizer, tavily_extract)
  this.processing_providers.forEach((provider) => {
    server.tool(
      {
        name: `${provider.name}_process`,
        description: provider.description,
        schema: v.object({
          url: v.pipe(
            v.union([v.string(), v.array(v.string())]),
            v.description('URL(s)'),
          ),
          extract_depth: v.optional(
            v.pipe(
              v.union([v.literal('basic'), v.literal('advanced')]),
              v.description('Extraction depth'),
            ),
          ),
        }),
      },
      async ({ url, extract_depth }) => {
        try {
          const result = await provider.process_content(
            url,
            extract_depth,
          );
          const safe_result = handle_large_result(
            result,
            provider.name,
          );
          return {
            content: [
              {
                type: 'text' as const,
                text: JSON.stringify(safe_result, null, 2),
              },
            ],
          };
        } catch (error) {
          const error_response = create_error_response(
            error as Error,
          );
          return {
            content: [
              {
                type: 'text' as const,
                text: error_response.error,
              },
            ],
            isError: true,
          };
        }
      },
    );
  });
  ```
- src/providers/index.ts:116-123 (registration): Conditionally registers the TavilyExtractProvider instance if the API key is valid, making it available for tool registration as 'tavily_extract_process'.

  ```typescript
  if (
    is_api_key_valid(
      config.processing.tavily_extract.api_key,
      'tavily_extract',
    )
  ) {
    register_processing_provider(new TavilyExtractProvider());
  }
  ```
- src/server/tools.ts:419-430 (schema): Input schema for the tavily_extract_process tool: url (string or array of strings) and optional extract_depth ('basic' or 'advanced'). Defined inside the dynamic registration.

  ```typescript
  schema: v.object({
    url: v.pipe(
      v.union([v.string(), v.array(v.string())]),
      v.description('URL(s)'),
    ),
    extract_depth: v.optional(
      v.pipe(
        v.union([v.literal('basic'), v.literal('advanced')]),
        v.description('Extraction depth'),
      ),
    ),
  }),
  ```
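On the calling side, the handler's return value is a single text item that holds either JSON or an error message. The sketch below shows one way a client might unpack it. The types `ToolResponse` and `ExtractPayload` and the function `read_extract_response` are hypothetical names for this example, and it assumes `handle_large_result` passes the result through unchanged, which this page does not confirm.

```typescript
// Hypothetical client-side types mirroring the handler's return value.
interface ToolResponse {
  content: { type: 'text'; text: string }[];
  isError?: boolean;
}

interface ExtractPayload {
  content: string;
  raw_contents: { url: string; content: string }[];
  metadata: {
    word_count: number;
    failed_urls?: unknown[]; // element type is not shown in the source
    urls_processed: number;
    successful_extractions: number;
    extract_depth: 'basic' | 'advanced';
  };
  source_provider: string;
}

// Throw on error responses; otherwise parse the JSON text payload.
function read_extract_response(response: ToolResponse): ExtractPayload {
  const text = response.content[0]?.text ?? '';
  if (response.isError) {
    throw new Error(text || 'Extraction failed');
  }
  return JSON.parse(text) as ExtractPayload;
}

// Example with a hand-built success response.
const sample: ToolResponse = {
  content: [
    {
      type: 'text',
      text: JSON.stringify({
        content: 'Hello world',
        raw_contents: [{ url: 'https://example.com', content: 'Hello world' }],
        metadata: {
          word_count: 2,
          urls_processed: 1,
          successful_extractions: 1,
          extract_depth: 'basic',
        },
        source_provider: 'tavily_extract',
      }),
    },
  ],
};
console.log(read_extract_response(sample).metadata.word_count);
```

Because `failed_urls` is omitted when every URL succeeds, a caller that cares about partial failures should treat a missing field as "no failures" rather than as an error.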