Skip to main content
Glama

fetch_url

Retrieve and process web page content from a URL, extract main content, convert to Markdown, and customize settings like timeout, media handling, and navigation behavior.

Instructions

Retrieve web page content from a specified URL

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
debugNoWhether to enable debug mode (showing browser window), overrides the --debug command line flag if specified
disableMediaNoWhether to disable media resources (images, stylesheets, fonts, media), default is true
extractContentNoWhether to intelligently extract the main content, default is true
maxLengthNoMaximum length of returned content (in characters), default is no limit
navigationTimeoutNoMaximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)
returnHtmlNoWhether to return HTML content instead of Markdown, default is false
timeoutNoPage loading timeout in milliseconds, default is 30000 (30 seconds)
urlYesURL to fetch. Make sure to include the schema (http:// or https:// if not defined, preferring https for most cases)
waitForNavigationNoWhether to wait for additional navigation after initial page load (useful for sites with anti-bot verification), default is false
waitUntilNoSpecifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load'

Implementation Reference

  • The main handler function `fetchUrl` that implements the core logic: validates args, creates BrowserService and WebContentProcessor, launches stealth browser, navigates to URL, processes content, and returns it.
    export async function fetchUrl(args: any) { const url = String(args?.url || ""); if (!url) { logger.error(`URL parameter missing`); throw new Error("URL parameter is required"); } const options: FetchOptions = { timeout: Number(args?.timeout) || 30000, waitUntil: String(args?.waitUntil || "load") as | "load" | "domcontentloaded" | "networkidle" | "commit", extractContent: args?.extractContent !== false, maxLength: Number(args?.maxLength) || 0, returnHtml: args?.returnHtml === true, waitForNavigation: args?.waitForNavigation === true, navigationTimeout: Number(args?.navigationTimeout) || 10000, disableMedia: args?.disableMedia !== false, debug: args?.debug, }; // Create browser service const browserService = new BrowserService(options); // Create content processor const processor = new WebContentProcessor(options, "[FetchURL]"); let browser: Browser | null = null; let page: Page | null = null; if (browserService.isInDebugMode()) { logger.debug(`Debug mode enabled for URL: ${url}`); } try { // Create a stealth browser with anti-detection measures browser = await browserService.createBrowser(); // Create a stealth browser context const { context, viewport } = await browserService.createContext(browser); // Create a new page with human-like behavior page = await browserService.createPage(context, viewport); // Process page content const result = await processor.processPageContent(page, url); return { content: [{ type: "text", text: result.content }], }; } finally { // Clean up resources await browserService.cleanup(browser, page); if (browserService.isInDebugMode()) { logger.debug(`Browser and page kept open for debugging. URL: ${url}`); } } }
  • The `fetchUrlTool` definition object containing name 'fetch_url', description, and detailed inputSchema for parameters like url, timeout, extractContent, etc.
    export const fetchUrlTool = { name: "fetch_url", description: "Retrieve web page content from a specified URL", inputSchema: { type: "object", properties: { url: { type: "string", description: "URL to fetch. Make sure to include the schema (http:// or https:// if not defined, preferring https for most cases)", }, timeout: { type: "number", description: "Page loading timeout in milliseconds, default is 30000 (30 seconds)", }, waitUntil: { type: "string", description: "Specifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load'", }, extractContent: { type: "boolean", description: "Whether to intelligently extract the main content, default is true", }, maxLength: { type: "number", description: "Maximum length of returned content (in characters), default is no limit", }, returnHtml: { type: "boolean", description: "Whether to return HTML content instead of Markdown, default is false", }, waitForNavigation: { type: "boolean", description: "Whether to wait for additional navigation after initial page load (useful for sites with anti-bot verification), default is false", }, navigationTimeout: { type: "number", description: "Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)", }, disableMedia: { type: "boolean", description: "Whether to disable media resources (images, stylesheets, fonts, media), default is true", }, debug: { type: "boolean", description: "Whether to enable debug mode (showing browser window), overrides the --debug command line flag if specified", }, }, required: ["url"], }, };
  • Registration in tools/index.ts: imports fetchUrlTool and fetchUrl, adds fetchUrlTool to exported tools array, and maps 'fetch_url' to fetchUrl in toolHandlers object.
    import { fetchUrlTool, fetchUrl } from './fetchUrl.js'; import { fetchUrlsTool, fetchUrls } from './fetchUrls.js'; import { browserInstallTool, browserInstall } from './browserInstall.js'; // Export tool definitions export const tools = [ fetchUrlTool, fetchUrlsTool, browserInstallTool ]; // Export tool implementations export const toolHandlers = { [fetchUrlTool.name]: fetchUrl, [fetchUrlsTool.name]: fetchUrls, [browserInstallTool.name]: browserInstall };

Other Tools

Related Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jae-jae/fetcher-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server