fetch_urls
Retrieve and extract web page content from multiple URLs using a headless browser, with options for timeout, content extraction, and format conversion.
Instructions
Retrieve web page content from multiple specified URLs
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| urls | Yes | Array of URLs to fetch | |
| timeout | No | Page loading timeout in milliseconds, default is 30000 (30 seconds) | |
| waitUntil | No | Specifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load' | |
| extractContent | No | Whether to intelligently extract the main content, default is true | |
| maxLength | No | Maximum length of returned content (in characters), default is no limit | |
| returnHtml | No | Whether to return HTML content instead of Markdown, default is false | |
| waitForNavigation | No | Whether to wait for additional navigation after initial page load (useful for sites with anti-bot verification), default is false | |
| navigationTimeout | No | Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds) | |
| disableMedia | No | Whether to disable media resources (images, stylesheets, fonts, media), default is true | |
| debug | No | Whether to enable debug mode (showing browser window), overrides the --debug command line flag if specified |
Implementation Reference
- src/tools/fetchUrls.ts:77-161 (handler)The handler function that implements the fetch_urls tool. It validates input URLs, sets up browser options, creates a stealth browser instance using BrowserService, processes each URL concurrently with WebContentProcessor, extracts and combines content, and returns it in a structured format.export async function fetchUrls(args: any) { const urls = (args?.urls as string[]) || []; if (!urls || !Array.isArray(urls) || urls.length === 0) { throw new Error("URLs parameter is required and must be a non-empty array"); } // Validate all URLs protocols for security (only allow HTTP and HTTPS) validateUrlsProtocol(urls); const options: FetchOptions = { timeout: Number(args?.timeout) || 30000, waitUntil: String(args?.waitUntil || "load") as | "load" | "domcontentloaded" | "networkidle" | "commit", extractContent: args?.extractContent !== false, maxLength: Number(args?.maxLength) || 0, returnHtml: args?.returnHtml === true, waitForNavigation: args?.waitForNavigation === true, navigationTimeout: Number(args?.navigationTimeout) || 10000, disableMedia: args?.disableMedia !== false, debug: args?.debug, }; // Create browser service const browserService = new BrowserService(options); if (browserService.isInDebugMode()) { logger.debug(`Debug mode enabled for URLs: ${urls.join(", ")}`); } let browser: Browser | null = null; try { // Create a stealth browser with anti-detection measures browser = await browserService.createBrowser(); // Create a stealth browser context const { context, viewport } = await browserService.createContext(browser); const processor = new WebContentProcessor(options, "[FetchURLs]"); const results = await Promise.all( urls.map(async (url, index) => { // Create a new page with human-like behavior const page = await browserService.createPage(context, viewport); try { const result = await processor.processPageContent(page, url); return { index, ...result } as FetchResult; } finally { if (!browserService.isInDebugMode()) { await page .close() .catch((e) => logger.error(`Failed to close page: ${e.message}`)); } else { logger.debug(`Page kept open for debugging. URL: ${url}`); } } }), ); results.sort((a, b) => (a.index || 0) - (b.index || 0)); const combinedResults = results .map( (result, i) => `[webpage ${i + 1} begin]\n${result.content}\n[webpage ${i + 1} end]`, ) .join("\n\n"); return { content: [{ type: "text", text: combinedResults }], }; } finally { // Clean up browser resources if (!browserService.isInDebugMode()) { if (browser) await browser .close() .catch((e) => logger.error(`Failed to close browser: ${e.message}`)); } else { logger.debug(`Browser kept open for debugging. URLs: ${urls.join(", ")}`); } } }
- src/tools/fetchUrls.ts:8-72 (schema)Defines the fetch_urls tool schema, including name, description, and detailed inputSchema with properties for urls (required), timeout, waitUntil, extractContent, maxLength, and other fetch options./** * Tool definition for fetch_urls */ export const fetchUrlsTool = { name: "fetch_urls", description: "Retrieve web page content from multiple specified URLs", inputSchema: { type: "object", properties: { urls: { type: "array", items: { type: "string", }, description: "Array of URLs to fetch", }, timeout: { type: "number", description: "Page loading timeout in milliseconds, default is 30000 (30 seconds)", }, waitUntil: { type: "string", description: "Specifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load'", }, extractContent: { type: "boolean", description: "Whether to intelligently extract the main content, default is true", }, maxLength: { type: "number", description: "Maximum length of returned content (in characters), default is no limit", }, returnHtml: { type: "boolean", description: "Whether to return HTML content instead of Markdown, default is false", }, waitForNavigation: { type: "boolean", description: "Whether to wait for additional navigation after initial page load (useful for sites with anti-bot verification), default is false", }, navigationTimeout: { type: "number", description: "Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)", }, disableMedia: { type: "boolean", description: "Whether to disable media resources (images, stylesheets, fonts, media), default is true", }, debug: { type: "boolean", description: "Whether to enable debug mode (showing browser window), overrides the --debug command line flag if specified", }, }, required: ["urls"], }, };
- src/tools/index.ts:5-17 (registration)Registers the fetchUrlsTool in the tools array and maps 'fetch_urls' to the fetchUrls handler in the toolHandlers object, aggregating all tools for export.// Export tool definitions export const tools = [ fetchUrlTool, fetchUrlsTool, browserInstallTool ]; // Export tool implementations export const toolHandlers = { [fetchUrlTool.name]: fetchUrl, [fetchUrlsTool.name]: fetchUrls, [browserInstallTool.name]: browserInstall };
- src/server.ts:27-32 (registration)Registers the MCP ListToolsRequest handler to return the list of tools, including fetch_urls.server.setRequestHandler(ListToolsRequestSchema, async () => { logger.info("[Tools] Listing available tools"); return { tools, }; });
- src/server.ts:38-47 (registration)Registers the MCP CallToolRequest handler that dispatches tool calls to the corresponding handler from toolHandlers, enabling execution of fetch_urls.server.setRequestHandler(CallToolRequestSchema, async (request) => { const toolName = request.params.name; const handler = toolHandlers[toolName]; if (!handler) { throw new Error(`Unknown tool: ${toolName}`); } return handler(request.params.arguments); });