tavily_extract_process

Extract and process web content from single or multiple URLs into clean, structured text. Configure extraction depth and optionally include images for enhanced data analysis, research, or collection tasks.

Instructions

Extract web page content from single or multiple URLs using Tavily Extract. Efficiently converts web content into clean, processable text with configurable extraction depth and optional image extraction. Returns both combined and individual URL content. Best for content analysis, data collection, and research.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| `extract_depth` | No | The depth of the extraction process. "advanced" retrieves more data but costs more credits. | `basic` |
| `url` | Yes | | |
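Concretely, arguments matching this schema might look like the following (the URLs are illustrative, and the `ExtractArgs` type name is an assumption for this sketch):

```typescript
// Illustrative inputs for tavily_extract_process. The url field accepts
// a single URL string or an array of URLs; extract_depth defaults to
// 'basic' when omitted.
type ExtractArgs = {
  url: string | string[];
  extract_depth?: 'basic' | 'advanced';
};

const single_url: ExtractArgs = { url: 'https://example.com/article' };

const multi_url: ExtractArgs = {
  url: ['https://example.com/a', 'https://example.com/b'],
  extract_depth: 'advanced',
};
```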

Implementation Reference

  • Dynamic registration of MCP tools for processing providers. For 'tavily_extract' provider, registers tool 'tavily_extract_process' with schema and handler that delegates to provider.process_content(). Includes schema definition and error handling.
```typescript
this.processing_providers.forEach((provider) => {
  server.tool(
    {
      name: `${provider.name}_process`,
      description: provider.description,
      schema: v.object({
        url: v.pipe(
          v.union([v.string(), v.array(v.string())]),
          v.description('URL(s)'),
        ),
        extract_depth: v.optional(
          v.pipe(
            v.union([v.literal('basic'), v.literal('advanced')]),
            v.description('Extraction depth'),
          ),
        ),
      }),
    },
    async ({ url, extract_depth }) => {
      try {
        const result = await provider.process_content(url, extract_depth);
        return {
          content: [
            {
              type: 'text' as const,
              text: JSON.stringify(result, null, 2),
            },
          ],
        };
      } catch (error) {
        const error_response = create_error_response(error as Error);
        return {
          content: [
            { type: 'text' as const, text: error_response.error },
          ],
          isError: true,
        };
      }
    },
  );
});
```
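The schema accepts either a single URL string or an array of strings, which the provider later normalizes via `validate_processing_urls`. That helper's implementation is not shown in this reference; a rough sketch of what such normalization might look like (the `normalize_urls` name and error messages are assumptions):

```typescript
// Hypothetical sketch of a validate_processing_urls-style helper: accept
// string | string[], normalize to string[], and reject malformed URLs.
// Not the actual implementation, which is not shown in this reference.
function normalize_urls(url: string | string[], provider: string): string[] {
  const urls = Array.isArray(url) ? url : [url];
  if (urls.length === 0) {
    throw new Error(`${provider}: at least one URL is required`);
  }
  for (const u of urls) {
    try {
      new URL(u); // throws a TypeError on malformed URLs
    } catch {
      throw new Error(`${provider}: invalid URL: ${u}`);
    }
  }
  return urls;
}
```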
  • Core handler logic in TavilyExtractProvider.process_content(): validates URLs, makes POST to Tavily /extract API, processes response into combined content and raw_contents, adds metadata, with retry and error handling.
```typescript
async process_content(
  url: string | string[],
  extract_depth: 'basic' | 'advanced' = 'basic',
): Promise<ProcessingResult> {
  const urls = validate_processing_urls(url, this.name);

  const extract_request = async () => {
    const api_key = validate_api_key(
      config.processing.tavily_extract.api_key,
      this.name,
    );

    try {
      const data = await http_json<TavilyExtractResponse>(
        this.name,
        `${config.processing.tavily_extract.base_url}/extract`,
        {
          method: 'POST',
          headers: {
            Authorization: `Bearer ${api_key}`,
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            urls: urls,
            include_images: false,
            extract_depth,
          }),
          signal: AbortSignal.timeout(
            config.processing.tavily_extract.timeout,
          ),
        },
      );

      // Check if there are any results
      if (data.results.length === 0) {
        throw new ProviderError(
          ErrorType.PROVIDER_ERROR,
          'No content extracted from URL',
          this.name,
        );
      }

      // Map results to raw_contents array
      const raw_contents = data.results.map((result) => ({
        url: result.url,
        content: result.raw_content,
      }));

      // Combine all results into a single content string
      const combined_content = raw_contents
        .map((result) => result.content)
        .join('\n\n');

      // Calculate total word count
      const word_count = combined_content
        .split(/\s+/)
        .filter(Boolean).length;

      // Include any failed URLs in metadata
      const failed_urls =
        data.failed_results.length > 0
          ? data.failed_results
          : undefined;

      return {
        content: combined_content,
        raw_contents,
        metadata: {
          word_count,
          failed_urls,
          urls_processed: urls.length,
          successful_extractions: data.results.length,
          extract_depth,
        },
        source_provider: this.name,
      };
    } catch (error) {
      handle_provider_error(error, this.name, 'extract content');
    }
  };

  return retry_with_backoff(extract_request);
}
```
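The response-shaping portion of this handler (mapping API results to `raw_contents`, joining them into a combined string, and counting words) can be exercised in isolation. The sketch below mirrors that logic; the `ExtractResult` and `shape_results` names are assumptions for illustration:

```typescript
// Simplified stand-in for one entry of the Tavily /extract results array.
type ExtractResult = { url: string; raw_content: string };

// Mirrors the handler's post-processing: per-URL contents, a combined
// string joined with blank lines, and a whitespace-based word count.
function shape_results(results: ExtractResult[]) {
  const raw_contents = results.map((r) => ({
    url: r.url,
    content: r.raw_content,
  }));
  const combined_content = raw_contents
    .map((r) => r.content)
    .join('\n\n');
  const word_count = combined_content.split(/\s+/).filter(Boolean).length;
  return { raw_contents, combined_content, word_count };
}
```

The `.filter(Boolean)` step drops empty strings produced by leading or trailing whitespace, so the word count stays accurate for padded content.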
  • TavilyExtractProvider class: defines name 'tavily_extract', description, and implements the process_content method used by the tool.
```typescript
export class TavilyExtractProvider implements ProcessingProvider {
  name = 'tavily_extract';
  description =
    'Extract web page content from single or multiple URLs using Tavily Extract. Efficiently converts web content into clean, processable text with configurable extraction depth and optional image extraction. Returns both combined and individual URL content. Best for content analysis, data collection, and research.';

  async process_content(
    url: string | string[],
    extract_depth: 'basic' | 'advanced' = 'basic',
  ): Promise<ProcessingResult> {
    // Identical to the process_content implementation shown in the
    // previous reference: validate URLs, call the Tavily /extract API,
    // shape the response, and retry with backoff on failure.
  }
}
```
  • Conditional registration of TavilyExtractProvider instance to the processing providers registry if API key is valid.
```typescript
if (
  is_api_key_valid(
    config.processing.tavily_extract.api_key,
    'tavily_extract',
  )
) {
  register_processing_provider(new TavilyExtractProvider());
}
```
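This guard keeps the provider out of the registry when no API key is configured, so the tool is never advertised in a state where every call would fail. A minimal sketch of the pattern, with a simplified registry and key check standing in for the real `register_processing_provider` and `is_api_key_valid` (whose implementations are not shown here):

```typescript
// Simplified registry and key check illustrating conditional provider
// registration; the real helpers are not shown in this reference.
type Provider = { name: string };

const processing_providers: Provider[] = [];

function key_is_valid(key: string | undefined): boolean {
  // Assumption: a key is "valid" if it is a non-empty, non-blank string.
  return typeof key === 'string' && key.trim().length > 0;
}

function register_provider(p: Provider): void {
  processing_providers.push(p);
}

// Only register when a key is present, e.g. from the environment.
if (key_is_valid(process.env.TAVILY_API_KEY)) {
  register_provider({ name: 'tavily_extract' });
}
```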

MCP directory API

We provide all the information about MCP servers via our MCP API.

```shell
curl -X GET 'https://glama.ai/api/mcp/v1/servers/spences10/mcp-omnisearch'
```

If you have feedback or need assistance with the MCP directory API, please join our Discord server.