Glama
by wlmwwx

parallel_read_url

Extract clean content from multiple web pages simultaneously to compare sources or gather information efficiently. Supports up to 5 URLs with options for links and images.

Instructions

Read multiple web pages in parallel to extract clean content efficiently. For best results, provide multiple URLs that you need to extract simultaneously. This is useful for comparing content across multiple sources or gathering information from multiple pages at once.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| urls | Yes | Array of URL configurations to read in parallel (maximum 5 URLs for optimal performance) | — |
| timeout | No | Timeout in milliseconds for all URL reads | 30000 |
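For illustration, a call that reads two pages in parallel might look like the following (the URLs are placeholders; `timeout` falls back to its 30000 ms default when omitted):

```json
{
  "urls": [
    { "url": "https://example.com/article-1", "withAllLinks": true },
    { "url": "https://example.com/article-2", "withAllImages": true }
  ],
  "timeout": 30000
}
```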

Implementation Reference

  • The complete handler and registration block for the `parallel_read_url` tool, including the Zod input schema, the tool description, and the execution logic that handles input validation, deduplication, parallel execution via a helper utility, and response formatting.
```typescript
server.tool(
  "parallel_read_url",
  "Read multiple web pages in parallel to extract clean content efficiently. For best results, provide multiple URLs that you need to extract simultaneously. This is useful for comparing content across multiple sources or gathering information from multiple pages at once. 💡 Use this when you need to analyze multiple sources simultaneously for efficiency.",
  {
    urls: z.array(z.object({
      url: z.string().url().describe("The complete URL of the webpage or PDF file to read and convert"),
      withAllLinks: z.boolean().default(false).describe("Set to true to extract and return all hyperlinks found on the page as structured data"),
      withAllImages: z.boolean().default(false).describe("Set to true to extract and return all images found on the page as structured data")
    })).max(5).describe("Array of URL configurations to read in parallel (maximum 5 URLs for optimal performance)"),
    timeout: z.number().default(30000).describe("Timeout in milliseconds for all URL reads")
  },
  async ({ urls, timeout }: { urls: Array<{ url: string; withAllLinks: boolean; withAllImages: boolean }>; timeout: number }) => {
    try {
      const props = getProps();

      // Deduplicate URLs, keeping the first occurrence of each
      const uniqueUrls = urls.filter((urlConfig, index, self) =>
        index === self.findIndex(u => u.url === urlConfig.url)
      );

      // Import the utility functions
      const { executeParallelUrlReads } = await import("../utils/read.js");

      // Execute parallel URL reads using the utility
      const results = await executeParallelUrlReads(uniqueUrls, props.bearerToken, timeout);

      // Format results for consistent output
      const contentItems: Array<{ type: 'text'; text: string }> = [];
      for (const result of results) {
        if ('success' in result && result.success) {
          contentItems.push({
            type: "text" as const,
            text: yamlStringify(result.structuredData),
          });
        }
      }

      return { content: contentItems };
    } catch (error) {
      return createErrorResponse(`Error: ${error instanceof Error ? error.message : String(error)}`);
    }
  },
);
```
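The deduplication step in the handler can be exercised on its own. A minimal sketch (the example URLs are illustrative, not from the source):

```typescript
// Same filter/findIndex pattern as the handler: keep only the first
// occurrence of each URL, preserving input order.
type UrlConfig = { url: string };

const configs: UrlConfig[] = [
  { url: "https://a.example" },
  { url: "https://b.example" },
  { url: "https://a.example" }, // duplicate, dropped
];

const unique = configs.filter((c, index, self) =>
  index === self.findIndex(u => u.url === c.url)
);

console.log(unique.map(c => c.url)); // [ 'https://a.example', 'https://b.example' ]
```

The filter/findIndex pairing is quadratic, which is harmless here given the tool's cap of 5 URLs.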
  • Helper utility to execute multiple single-URL reads in parallel with a global timeout, using Promise.all raced against a timeout promise.
```typescript
export async function executeParallelUrlReads(
  urlConfigs: ReadUrlConfig[],
  bearerToken?: string,
  timeout: number = 30000
): Promise<ReadUrlResponse[]> {
  const timeoutPromise = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error('Parallel URL read timeout')), timeout)
  );
  const readPromises = urlConfigs.map(urlConfig => readUrlFromConfig(urlConfig, bearerToken));
  return Promise.race([Promise.all(readPromises), timeoutPromise]);
}
```
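The race-against-timeout pattern used here can be reduced to a standalone sketch (the function name below is illustrative, not an export of this library). One design consequence worth noting: if the timeout wins, `Promise.race` only rejects the combined result; the in-flight reads themselves are not aborted and keep running in the background.

```typescript
// Race Promise.all against a single global deadline, as executeParallelUrlReads does.
function withGlobalTimeout<T>(work: Promise<T>[], ms: number): Promise<T[]> {
  const timeoutPromise = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("Parallel URL read timeout")), ms)
  );
  return Promise.race([Promise.all(work), timeoutPromise]);
}

// All tasks settle well before the deadline, so the combined result wins the race.
withGlobalTimeout([Promise.resolve("a"), Promise.resolve("b")], 1000)
  .then(results => console.log(results)); // [ 'a', 'b' ]
```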
  • Core helper function for reading a single URL: normalizes URL, sends POST to r.jina.ai with custom headers for auth/links/images, parses JSON response into structured data (url, title, content, optional links/images).
```typescript
export async function readUrlFromConfig(
  urlConfig: ReadUrlConfig,
  bearerToken?: string
): Promise<ReadUrlResponse> {
  try {
    // Normalize the URL first
    const normalizedUrl = normalizeUrl(urlConfig.url);
    if (!normalizedUrl) {
      return { error: "Invalid or unsupported URL", url: urlConfig.url };
    }

    const headers: Record<string, string> = {
      'Accept': 'application/json',
      'Content-Type': 'application/json',
      'X-Md-Link-Style': 'discarded',
    };

    // Add Authorization header if bearer token is available
    if (bearerToken) {
      headers['Authorization'] = `Bearer ${bearerToken}`;
    }
    if (urlConfig.withAllLinks) {
      headers['X-With-Links-Summary'] = 'all';
    }
    if (urlConfig.withAllImages) {
      headers['X-With-Images-Summary'] = 'true';
    } else {
      headers['X-Retain-Images'] = 'none';
    }

    const response = await fetch('https://r.jina.ai/', {
      method: 'POST',
      headers,
      body: JSON.stringify({ url: normalizedUrl }),
    });

    if (!response.ok) {
      return { error: `HTTP ${response.status}: ${response.statusText}`, url: urlConfig.url };
    }

    const data = await response.json() as any;
    if (!data.data) {
      return { error: "Invalid response data from r.jina.ai", url: urlConfig.url };
    }

    // Prepare structured data
    const structuredData: any = {
      url: data.data.url,
      title: data.data.title,
    };
    if (urlConfig.withAllLinks && data.data.links) {
      structuredData.links = data.data.links.map((link: [string, string]) => ({
        anchorText: link[0],
        url: link[1]
      }));
    }
    if (urlConfig.withAllImages && data.data.images) {
      structuredData.images = data.data.images;
    }
    structuredData.content = data.data.content || "";

    return {
      success: true,
      url: urlConfig.url,
      structuredData,
      withAllLinks: urlConfig.withAllLinks || false,
      withAllImages: urlConfig.withAllImages || false
    };
  } catch (error) {
    return { error: error instanceof Error ? error.message : String(error), url: urlConfig.url };
  }
}
```
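The header construction in this helper is deterministic and easy to isolate. A sketch with a hypothetical helper name (`buildReaderHeaders` is not part of the source; the header names and values are taken from it):

```typescript
// Mirror of the header logic: base headers always present; auth, link summary,
// and image handling toggled by the options.
function buildReaderHeaders(opts: {
  bearerToken?: string;
  withAllLinks?: boolean;
  withAllImages?: boolean;
}): Record<string, string> {
  const headers: Record<string, string> = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'X-Md-Link-Style': 'discarded',
  };
  if (opts.bearerToken) headers['Authorization'] = `Bearer ${opts.bearerToken}`;
  if (opts.withAllLinks) headers['X-With-Links-Summary'] = 'all';
  // Images are either summarized or dropped entirely -- never both.
  if (opts.withAllImages) headers['X-With-Images-Summary'] = 'true';
  else headers['X-Retain-Images'] = 'none';
  return headers;
}

console.log(buildReaderHeaders({})['X-Retain-Images']); // none
console.log(buildReaderHeaders({ withAllImages: true })['X-With-Images-Summary']); // true
```

Note the either/or on the image headers: when `withAllImages` is false, the request explicitly asks the reader to strip images rather than merely skipping the summary.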
  • Zod input schema for the parallel_read_url tool.
```typescript
{
  urls: z.array(z.object({
    url: z.string().url().describe("The complete URL of the webpage or PDF file to read and convert"),
    withAllLinks: z.boolean().default(false).describe("Set to true to extract and return all hyperlinks found on the page as structured data"),
    withAllImages: z.boolean().default(false).describe("Set to true to extract and return all images found on the page as structured data")
  })).max(5).describe("Array of URL configurations to read in parallel (maximum 5 URLs for optimal performance)"),
  timeout: z.number().default(30000).describe("Timeout in milliseconds for all URL reads")
}
```

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/wlmwwx/jina-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.