tavily_extract_process

Extract web content from URLs into clean text for analysis, data collection, and research with configurable depth and image options.

Instructions

Extract web page content from single or multiple URLs using Tavily Extract. Efficiently converts web content into clean, processable text, with configurable extraction depth and optional image extraction. Returns both the combined content and each URL's individual content. Best for content analysis, data collection, and research.
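The shape of a successful result can be sketched as follows. Field names mirror the process_content handler shown in the Implementation Reference below; the URLs and values are illustrative assumptions:

```typescript
// Hypothetical successful result for a two-URL extraction (values are
// made up; field names follow the handler code on this page).
const example_result = {
	// All extracted pages joined with blank lines
	content: 'Page one text.\n\nPage two text.',
	// Individual per-URL content
	raw_contents: [
		{ url: 'https://example.com/one', content: 'Page one text.' },
		{ url: 'https://example.com/two', content: 'Page two text.' },
	],
	metadata: {
		word_count: 6, // whitespace-delimited tokens in `content`
		failed_urls: undefined, // only present when Tavily reports failures
		urls_processed: 2,
		successful_extractions: 2,
		extract_depth: 'basic',
	},
	source_provider: 'tavily_extract',
};
```

Note that word_count is computed over the combined content, so it spans all successfully extracted URLs.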

Input Schema

Name           Required  Description       Default
url            Yes       URL(s)
extract_depth  No        Extraction depth  basic
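As a sketch, the parameters above map to the following TypeScript shape; the type name and example URLs are illustrative assumptions, not part of the tool:

```typescript
// Hypothetical type mirroring the tool's input schema.
type TavilyExtractProcessArgs = {
	url: string | string[]; // a single URL or a list of URLs
	extract_depth?: 'basic' | 'advanced'; // the handler defaults to 'basic'
};

// Example arguments for a multi-URL, deeper extraction.
const example_args: TavilyExtractProcessArgs = {
	url: ['https://example.com/docs', 'https://example.com/changelog'],
	extract_depth: 'advanced',
};
```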

Implementation Reference

  • Core handler logic in TavilyExtractProvider.process_content: validates the URL(s), calls the Tavily Extract API via POST with urls and extract_depth, and processes the response into combined content, per-URL raw_contents, and metadata (word_count, failed_urls, etc.), with error handling and retry with backoff.
    async process_content(
      url: string | string[],
      extract_depth: 'basic' | 'advanced' = 'basic',
    ): Promise<ProcessingResult> {
      const urls = validate_processing_urls(url, this.name);

      const extract_request = async () => {
        const api_key = validate_api_key(
          config.processing.tavily_extract.api_key,
          this.name,
        );
        try {
          const data = await http_json<TavilyExtractResponse>(
            this.name,
            `${config.processing.tavily_extract.base_url}/extract`,
            {
              method: 'POST',
              headers: {
                Authorization: `Bearer ${api_key}`,
                'Content-Type': 'application/json',
              },
              body: JSON.stringify({
                urls: urls,
                include_images: false,
                extract_depth,
              }),
              signal: AbortSignal.timeout(
                config.processing.tavily_extract.timeout,
              ),
            },
          );

          // Check if there are any results
          if (data.results.length === 0) {
            throw new ProviderError(
              ErrorType.PROVIDER_ERROR,
              'No content extracted from URL',
              this.name,
            );
          }

          // Map results to raw_contents array
          const raw_contents = data.results.map((result) => ({
            url: result.url,
            content: result.raw_content,
          }));

          // Combine all results into a single content string
          const combined_content = raw_contents
            .map((result) => result.content)
            .join('\n\n');

          // Calculate total word count
          const word_count = combined_content
            .split(/\s+/)
            .filter(Boolean).length;

          // Include any failed URLs in metadata
          const failed_urls =
            data.failed_results.length > 0 ? data.failed_results : undefined;

          return {
            content: combined_content,
            raw_contents,
            metadata: {
              word_count,
              failed_urls,
              urls_processed: urls.length,
              successful_extractions: data.results.length,
              extract_depth,
            },
            source_provider: this.name,
          };
        } catch (error) {
          handle_provider_error(error, this.name, 'extract content');
        }
      };

      return retry_with_backoff(extract_request);
    }
  • Dynamically registers an MCP tool named '<provider>_process' for each processing provider, yielding 'tavily_extract_process' for tavily_extract. The registration defines the input schema (url: string or string array, optional extract_depth); the handler calls provider.process_content and returns the JSON-formatted result, or an error response on failure.
    // Register remaining processing providers (kagi_summarizer, tavily_extract)
    this.processing_providers.forEach((provider) => {
      server.tool(
        {
          name: `${provider.name}_process`,
          description: provider.description,
          schema: v.object({
            url: v.pipe(
              v.union([v.string(), v.array(v.string())]),
              v.description('URL(s)'),
            ),
            extract_depth: v.optional(
              v.pipe(
                v.union([v.literal('basic'), v.literal('advanced')]),
                v.description('Extraction depth'),
              ),
            ),
          }),
        },
        async ({ url, extract_depth }) => {
          try {
            const result = await provider.process_content(
              url,
              extract_depth,
            );
            const safe_result = handle_large_result(result, provider.name);
            return {
              content: [
                {
                  type: 'text' as const,
                  text: JSON.stringify(safe_result, null, 2),
                },
              ],
            };
          } catch (error) {
            const error_response = create_error_response(error as Error);
            return {
              content: [
                {
                  type: 'text' as const,
                  text: error_response.error,
                },
              ],
              isError: true,
            };
          }
        },
      );
    });
  • Registers a TavilyExtractProvider instance only when the configured API key passes validation, making it available for tool registration as 'tavily_extract_process'.
    if (
      is_api_key_valid(
        config.processing.tavily_extract.api_key,
        'tavily_extract',
      )
    ) {
      register_processing_provider(new TavilyExtractProvider());
    }
  • Input schema for the tavily_extract_process tool: url (a single string or an array of strings) and optional extract_depth ('basic' or 'advanced'). Defined during dynamic registration.
    schema: v.object({
      url: v.pipe(
        v.union([v.string(), v.array(v.string())]),
        v.description('URL(s)'),
      ),
      extract_depth: v.optional(
        v.pipe(
          v.union([v.literal('basic'), v.literal('advanced')]),
          v.description('Extraction depth'),
        ),
      ),
    }),
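The is_api_key_valid guard used during provider registration is not shown on this page. A minimal sketch, assuming a key is considered valid when it is a non-empty string (the real check may be stricter):

```typescript
// Minimal sketch (assumption): treat a key as valid when it is a
// non-empty string after trimming. The provider_name parameter is kept
// to match the call site above; a real implementation might use it for
// logging or error messages.
const is_api_key_valid = (
	api_key: string | undefined,
	provider_name: string,
): boolean => typeof api_key === 'string' && api_key.trim().length > 0;
```

Under this sketch, an unset or blank TAVILY_EXTRACT key simply skips registration, so the tool never appears rather than failing at call time.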
