fetch_url
Retrieve a web page from a URL, extract its main content, and convert it to Markdown. Settings such as the timeout, media handling, and navigation behavior can be customized.
Instructions
Retrieve web page content from a specified URL
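The snippet below sketches how a client might invoke this tool. It builds an MCP JSON-RPC `tools/call` request for `fetch_url` using arguments from the input schema. The `FetchUrlArgs` type and the `buildFetchUrlCall` and `withScheme` helpers are hypothetical illustrations, not part of the server. `withScheme` adds `https://` to a URL that has no scheme, following the advice in the `url` parameter's description.

```typescript
// Hypothetical argument type mirroring the fetch_url input schema.
interface FetchUrlArgs {
  url: string;
  timeout?: number;
  waitUntil?: "load" | "domcontentloaded" | "networkidle" | "commit";
  extractContent?: boolean;
  maxLength?: number;
  returnHtml?: boolean;
  waitForNavigation?: boolean;
  navigationTimeout?: number;
  disableMedia?: boolean;
  debug?: boolean;
}

// Shape of an MCP JSON-RPC "tools/call" request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: FetchUrlArgs };
}

// Hypothetical helper: prepend https:// when the URL carries no scheme.
function withScheme(url: string): string {
  return /^[a-z][a-z0-9+.-]*:\/\//i.test(url) ? url : `https://${url}`;
}

// Hypothetical helper: build a tools/call request for fetch_url.
function buildFetchUrlCall(id: number, args: FetchUrlArgs): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "fetch_url",
      arguments: { ...args, url: withScheme(args.url) },
    },
  };
}

// Example: fetch an article as Markdown, truncated to 5000 characters,
// treating navigation as complete once the network goes idle.
const request = buildFetchUrlCall(1, {
  url: "example.com/article",
  waitUntil: "networkidle",
  maxLength: 5000,
});
console.log(JSON.stringify(request, null, 2));
```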
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| debug | No | Whether to enable debug mode, which shows the browser window. Overrides the `--debug` command-line flag when specified | Value of `--debug` |
| disableMedia | No | Whether to disable media resources (images, stylesheets, fonts, and media) | `true` |
| extractContent | No | Whether to intelligently extract the main content of the page | `true` |
| maxLength | No | Maximum length of the returned content, in characters | No limit |
| navigationTimeout | No | Maximum time to wait for additional navigation, in milliseconds | `10000` (10 seconds) |
| returnHtml | No | Whether to return HTML instead of Markdown | `false` |
| timeout | No | Page loading timeout, in milliseconds | `30000` (30 seconds) |
| url | Yes | URL to fetch. Always include the scheme; if the URL lacks one, add `https://` (preferred in most cases) or `http://` | |
| waitForNavigation | No | Whether to wait for additional navigation after the initial page load (useful for sites with anti-bot verification) | `false` |
| waitUntil | No | When navigation is considered complete: `'load'`, `'domcontentloaded'`, `'networkidle'`, or `'commit'` | `'load'` |
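These defaults are not declared as JSON Schema `default` values. The handler in `src/tools/fetchUrl.ts` applies them through JavaScript coercion, which has some edge cases. For example, `timeout: 0` falls back to 30000 instead of disabling the timeout, and the string `"false"` does not turn off `extractContent`. The standalone `coerceOptions` function below is a hypothetical name. It copies a subset of that coercion from the handler so the behavior can be checked in isolation.

```typescript
// Subset of the fetch_url options, as coerced by the handler.
interface CoercedOptions {
  timeout: number;
  extractContent: boolean;
  maxLength: number;
  returnHtml: boolean;
  disableMedia: boolean;
}

// Hypothetical standalone copy of part of the handler's coercion logic.
function coerceOptions(args: Record<string, unknown>): CoercedOptions {
  return {
    // `Number(x) || default`: a missing value (NaN) and 0 both fall back.
    timeout: Number(args.timeout) || 30000,
    // `!== false`: only the boolean false disables the option.
    extractContent: args.extractContent !== false,
    // 0 is the handler's encoding of "no limit".
    maxLength: Number(args.maxLength) || 0,
    // `=== true`: only the boolean true enables the option.
    returnHtml: args.returnHtml === true,
    disableMedia: args.disableMedia !== false,
  };
}

// Edge cases: a zero timeout and string-typed booleans.
const edgeCases = coerceOptions({
  timeout: 0,
  extractContent: "false",
  returnHtml: "true",
});
console.log(edgeCases);
```

Clients that want `returnHtml` or `extractContent: false` should therefore send real JSON booleans, not strings.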
Implementation Reference
- `src/tools/fetchUrl.ts:73-132` (handler): the main handler function `fetchUrl`. It validates the arguments, creates a `BrowserService` and a `WebContentProcessor`, launches a stealth browser, navigates to the URL, processes the page content, and returns it.

  ```typescript
  export async function fetchUrl(args: any) {
    const url = String(args?.url || "");
    if (!url) {
      logger.error(`URL parameter missing`);
      throw new Error("URL parameter is required");
    }

    const options: FetchOptions = {
      timeout: Number(args?.timeout) || 30000,
      waitUntil: String(args?.waitUntil || "load") as
        | "load"
        | "domcontentloaded"
        | "networkidle"
        | "commit",
      extractContent: args?.extractContent !== false,
      maxLength: Number(args?.maxLength) || 0,
      returnHtml: args?.returnHtml === true,
      waitForNavigation: args?.waitForNavigation === true,
      navigationTimeout: Number(args?.navigationTimeout) || 10000,
      disableMedia: args?.disableMedia !== false,
      debug: args?.debug,
    };

    // Create browser service
    const browserService = new BrowserService(options);

    // Create content processor
    const processor = new WebContentProcessor(options, "[FetchURL]");

    let browser: Browser | null = null;
    let page: Page | null = null;

    if (browserService.isInDebugMode()) {
      logger.debug(`Debug mode enabled for URL: ${url}`);
    }

    try {
      // Create a stealth browser with anti-detection measures
      browser = await browserService.createBrowser();

      // Create a stealth browser context
      const { context, viewport } = await browserService.createContext(browser);

      // Create a new page with human-like behavior
      page = await browserService.createPage(context, viewport);

      // Process page content
      const result = await processor.processPageContent(page, url);

      return {
        content: [{ type: "text", text: result.content }],
      };
    } finally {
      // Clean up resources
      await browserService.cleanup(browser, page);

      if (browserService.isInDebugMode()) {
        logger.debug(`Browser and page kept open for debugging. URL: ${url}`);
      }
    }
  }
  ```
- `src/tools/fetchUrl.ts:10-68` (schema): the `fetchUrlTool` definition object, containing the tool name `fetch_url`, its description, and the `inputSchema` for parameters such as `url`, `timeout`, and `extractContent`.

  ```typescript
  export const fetchUrlTool = {
    name: "fetch_url",
    description: "Retrieve web page content from a specified URL",
    inputSchema: {
      type: "object",
      properties: {
        url: {
          type: "string",
          description:
            "URL to fetch. Make sure to include the schema (http:// or https:// if not defined, preferring https for most cases)",
        },
        timeout: {
          type: "number",
          description:
            "Page loading timeout in milliseconds, default is 30000 (30 seconds)",
        },
        waitUntil: {
          type: "string",
          description:
            "Specifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load'",
        },
        extractContent: {
          type: "boolean",
          description:
            "Whether to intelligently extract the main content, default is true",
        },
        maxLength: {
          type: "number",
          description:
            "Maximum length of returned content (in characters), default is no limit",
        },
        returnHtml: {
          type: "boolean",
          description:
            "Whether to return HTML content instead of Markdown, default is false",
        },
        waitForNavigation: {
          type: "boolean",
          description:
            "Whether to wait for additional navigation after initial page load (useful for sites with anti-bot verification), default is false",
        },
        navigationTimeout: {
          type: "number",
          description:
            "Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)",
        },
        disableMedia: {
          type: "boolean",
          description:
            "Whether to disable media resources (images, stylesheets, fonts, media), default is true",
        },
        debug: {
          type: "boolean",
          description:
            "Whether to enable debug mode (showing browser window), overrides the --debug command line flag if specified",
        },
      },
      required: ["url"],
    },
  };
  ```
- `src/tools/index.ts:1-17` (registration): imports `fetchUrlTool` and `fetchUrl`, adds `fetchUrlTool` to the exported `tools` array, and maps `fetch_url` to `fetchUrl` in the `toolHandlers` object.

  ```typescript
  import { fetchUrlTool, fetchUrl } from './fetchUrl.js';
  import { fetchUrlsTool, fetchUrls } from './fetchUrls.js';
  import { browserInstallTool, browserInstall } from './browserInstall.js';

  // Export tool definitions
  export const tools = [
    fetchUrlTool,
    fetchUrlsTool,
    browserInstallTool
  ];

  // Export tool implementations
  export const toolHandlers = {
    [fetchUrlTool.name]: fetchUrl,
    [fetchUrlsTool.name]: fetchUrls,
    [browserInstallTool.name]: browserInstall
  };
  ```
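The `toolHandlers` object is keyed by tool name, so a server can route a `tools/call` request by looking up the requested name. The source does not show how the server consumes the map, so the following is only a sketch under that assumption. `callTool` and the stub `fetchUrl` below are hypothetical stand-ins. The own-property check prevents inherited keys such as `toString` from being dispatched as if they were tools.

```typescript
// Result shape returned by fetchUrl in the handler shown above.
type ToolResult = { content: { type: "text"; text: string }[] };
type ToolHandler = (args: any) => Promise<ToolResult>;

// Hypothetical stand-ins for the real tool definition and handler.
const fetchUrlTool = { name: "fetch_url" };
async function fetchUrl(args: any): Promise<ToolResult> {
  return { content: [{ type: "text", text: `fetched ${args.url}` }] };
}

// Same pattern as tools/index.ts: key each handler by its tool's name.
const toolHandlers: Record<string, ToolHandler> = {
  [fetchUrlTool.name]: fetchUrl,
};

// Hypothetical dispatcher: look up the handler by name and invoke it.
async function callTool(name: string, args: unknown): Promise<ToolResult> {
  // Only own properties count, so "toString" etc. are not treated as tools.
  const handler = Object.prototype.hasOwnProperty.call(toolHandlers, name)
    ? toolHandlers[name]
    : undefined;
  if (!handler) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return handler(args);
}

callTool("fetch_url", { url: "https://example.com" }).then((result) => {
  console.log(result.content[0].text);
});
```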