Skip to main content
Glama
jae-jae
by jae-jae

fetch_url

Retrieve web page content from any URL, extract main content, and convert to Markdown format for analysis and processing.

Instructions

Retrieve web page content from a specified URL

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesURL to fetch. Make sure to include the schema (http:// or https:// if not defined, preferring https for most cases)
timeoutNoPage loading timeout in milliseconds, default is 30000 (30 seconds)
waitUntilNoSpecifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load'
extractContentNoWhether to intelligently extract the main content, default is true
maxLengthNoMaximum length of returned content (in characters), default is no limit
returnHtmlNoWhether to return HTML content instead of Markdown, default is false
waitForNavigationNoWhether to wait for additional navigation after initial page load (useful for sites with anti-bot verification), default is false
navigationTimeoutNoMaximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)
disableMediaNoWhether to disable media resources (images, stylesheets, fonts, media), default is true
debugNoWhether to enable debug mode (showing browser window), overrides the --debug command line flag if specified

Implementation Reference

  • Implementation of the fetch_url tool: validates URL, sets options, creates browser service and content processor, navigates to URL stealthily, extracts content, and returns it in MCP format.
    export async function fetchUrl(args: any) {
      const url = String(args?.url || "");
      if (!url) {
        logger.error(`URL parameter missing`);
        throw new Error("URL parameter is required");
      }
    
      // Validate URL protocol for security (only allow HTTP and HTTPS)
      validateUrlProtocol(url);
    
      const options: FetchOptions = {
        timeout: Number(args?.timeout) || 30000,
        waitUntil: String(args?.waitUntil || "load") as
          | "load"
          | "domcontentloaded"
          | "networkidle"
          | "commit",
        extractContent: args?.extractContent !== false,
        maxLength: Number(args?.maxLength) || 0,
        returnHtml: args?.returnHtml === true,
        waitForNavigation: args?.waitForNavigation === true,
        navigationTimeout: Number(args?.navigationTimeout) || 10000,
        disableMedia: args?.disableMedia !== false,
        debug: args?.debug,
      };
    
      // Create browser service
      const browserService = new BrowserService(options);
    
      // Create content processor
      const processor = new WebContentProcessor(options, "[FetchURL]");
      let browser: Browser | null = null;
      let page: Page | null = null;
    
      if (browserService.isInDebugMode()) {
        logger.debug(`Debug mode enabled for URL: ${url}`);
      }
    
      try {
        // Create a stealth browser with anti-detection measures
        browser = await browserService.createBrowser();
    
        // Create a stealth browser context
        const { context, viewport } = await browserService.createContext(browser);
    
        // Create a new page with human-like behavior
        page = await browserService.createPage(context, viewport);
    
        // Process page content
        const result = await processor.processPageContent(page, url);
    
        return {
          content: [{ type: "text", text: result.content }],
        };
      } finally {
        // Clean up resources
        await browserService.cleanup(browser, page);
    
        if (browserService.isInDebugMode()) {
          logger.debug(`Browser and page kept open for debugging. URL: ${url}`);
        }
      }
    }
  • Tool definition for fetch_url including name, description, and detailed input schema.
    export const fetchUrlTool = {
      name: "fetch_url",
      description: "Retrieve web page content from a specified URL",
      inputSchema: {
        type: "object",
        properties: {
          url: {
            type: "string",
            description:
              "URL to fetch. Make sure to include the schema (http:// or https:// if not defined, preferring https for most cases)",
          },
          timeout: {
            type: "number",
            description:
              "Page loading timeout in milliseconds, default is 30000 (30 seconds)",
          },
          waitUntil: {
            type: "string",
            description:
              "Specifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load'",
          },
          extractContent: {
            type: "boolean",
            description:
              "Whether to intelligently extract the main content, default is true",
          },
          maxLength: {
            type: "number",
            description:
              "Maximum length of returned content (in characters), default is no limit",
          },
          returnHtml: {
            type: "boolean",
            description:
              "Whether to return HTML content instead of Markdown, default is false",
          },
          waitForNavigation: {
            type: "boolean",
            description:
              "Whether to wait for additional navigation after initial page load (useful for sites with anti-bot verification), default is false",
          },
          navigationTimeout: {
            type: "number",
            description:
              "Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)",
          },
          disableMedia: {
            type: "boolean",
            description:
              "Whether to disable media resources (images, stylesheets, fonts, media), default is true",
          },
          debug: {
            type: "boolean",
            description:
              "Whether to enable debug mode (showing browser window), overrides the --debug command line flag if specified",
          },
        },
        required: ["url"],
      },
    };
  • Registration of the fetch_url tool: included in tools array and mapped from name to handler in toolHandlers object.
    // Export tool definitions
    export const tools = [
      fetchUrlTool,
      fetchUrlsTool,
      browserInstallTool
    ];
    
    // Export tool implementations
    export const toolHandlers = {
      [fetchUrlTool.name]: fetchUrl,
      [fetchUrlsTool.name]: fetchUrls,
      [browserInstallTool.name]: browserInstall
    };
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure but only states the basic action ('Retrieve web page content'). It doesn't mention potential side effects (e.g., network requests, rate limits), authentication needs, error handling, or what the return content looks like (structure or format). This is inadequate for a tool with 10 parameters and no output schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose without unnecessary words. It earns its place by clearly stating what the tool does, making it easy to scan and understand.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 10 parameters, no annotations, and no output schema, the description is insufficient. It doesn't explain behavioral traits, return values, or usage context, leaving significant gaps that could hinder correct tool selection and invocation by an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, providing detailed documentation for all 10 parameters. The description adds no additional parameter semantics beyond implying a 'URL' input, which is already covered in the schema. Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Retrieve') and resource ('web page content from a specified URL'), making the purpose immediately understandable. It doesn't explicitly differentiate from sibling tools like 'fetch_urls' (plural) or 'browser_install', but the singular 'URL' suggests this is for single-page retrieval versus batch operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like 'fetch_urls' or 'browser_install'. The description lacks context about prerequisites, typical use cases, or exclusions, leaving the agent to infer usage based on tool names alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jae-jae/fetcher-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server