Skip to main content
Glama
everford

Fetcher MCP

fetch_urls

Retrieve and process web page content from multiple URLs using customizable settings like timeout, content extraction, HTML return, and media disablement. Powered by Playwright headless browser for efficient data fetching.

Instructions

Retrieve web page content from multiple specified URLs

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
debugNoWhether to enable debug mode (showing browser window), overrides the --debug command line flag if specified
disableMediaNoWhether to disable media resources (images, stylesheets, fonts, media), default is true
extractContentNoWhether to intelligently extract the main content, default is true
maxLengthNoMaximum length of returned content (in characters), default is no limit
navigationTimeoutNoMaximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)
returnHtmlNoWhether to return HTML content instead of Markdown, default is false
timeoutNoPage loading timeout in milliseconds, default is 30000 (30 seconds)
urlsYesArray of URLs to fetch
waitForNavigationNoWhether to wait for additional navigation after initial page load (useful for sites with anti-bot verification), default is false
waitUntilNoSpecifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load'

Implementation Reference

  • Main implementation of the fetch_urls tool. Parses arguments, launches headless Chromium browser (or visible if debug), fetches content from multiple URLs in parallel using Playwright pages, extracts/processes with WebContentProcessor, combines results with section markers, returns as text content.
    export async function fetchUrls(args: any) {
      const urls = (args?.urls as string[]) || [];
      if (!urls || !Array.isArray(urls) || urls.length === 0) {
        throw new Error("URLs parameter is required and must be a non-empty array");
      }
    
      const options: FetchOptions = {
        timeout: Number(args?.timeout) || 30000,
        waitUntil: String(args?.waitUntil || "load") as 'load' | 'domcontentloaded' | 'networkidle' | 'commit',
        extractContent: args?.extractContent !== false,
        maxLength: Number(args?.maxLength) || 0,
        returnHtml: args?.returnHtml === true,
        waitForNavigation: args?.waitForNavigation === true,
        navigationTimeout: Number(args?.navigationTimeout) || 10000,
        disableMedia: args?.disableMedia !== false,
        debug: args?.debug
      };
    
      // 确定是否启用调试模式(优先使用参数指定的值,否则使用命令行标志)
      const useDebugMode = options.debug !== undefined ? options.debug : isDebugMode;
      
      if (useDebugMode) {
        console.log(`[Debug] Debug mode enabled for URLs: ${urls.join(', ')}`);
      }
    
      let browser = null;
      try {
        browser = await chromium.launch({ headless: !useDebugMode });
        const context = await browser.newContext({
          javaScriptEnabled: true,
          ignoreHTTPSErrors: true,
          userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
        });
    
        await context.route('**/*', async (route) => {
          const resourceType = route.request().resourceType();
          if (options.disableMedia && ['image', 'stylesheet', 'font', 'media'].includes(resourceType)) {
            await route.abort();
          } else {
            await route.continue();
          }
        });
    
        const processor = new WebContentProcessor(options, '[FetchURLs]');
        
        const results = await Promise.all(
          urls.map(async (url, index) => {
            const page = await context.newPage();
            try {
              const result = await processor.processPageContent(page, url);
              return { index, ...result } as FetchResult;
            } finally {
              if (!useDebugMode) {
                await page.close().catch(e => console.error(`[Error] Failed to close page: ${e.message}`));
              } else {
                console.log(`[Debug] Page kept open for debugging. URL: ${url}`);
              }
            }
          })
        );
    
        results.sort((a, b) => (a.index || 0) - (b.index || 0));
        const combinedResults = results
          .map((result, i) => `[webpage ${i + 1} begin]\n${result.content}\n[webpage ${i + 1} end]`)
          .join('\n\n');
    
        return {
          content: [{ type: "text", text: combinedResults }]
        };
      } finally {
        if (!useDebugMode) {
          if (browser) await browser.close().catch(e => console.error(`[Error] Failed to close browser: ${e.message}`));
        } else {
          console.log(`[Debug] Browser kept open for debugging. URLs: ${urls.join(', ')}`);
        }
      }
    }
  • Tool definition object for 'fetch_urls' including name, description, and detailed inputSchema defining parameters like urls (required), timeout, waitUntil, extractContent, etc.
    export const fetchUrlsTool = {
      name: "fetch_urls",
      description: "Retrieve web page content from multiple specified URLs",
      inputSchema: {
        type: "object",
        properties: {
          urls: {
            type: "array",
            items: {
              type: "string",
            },
            description: "Array of URLs to fetch",
          },
          timeout: {
            type: "number",
            description:
              "Page loading timeout in milliseconds, default is 30000 (30 seconds)",
          },
          waitUntil: {
            type: "string",
            description:
              "Specifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load'",
          },
          extractContent: {
            type: "boolean",
            description:
              "Whether to intelligently extract the main content, default is true",
          },
          maxLength: {
            type: "number",
            description:
              "Maximum length of returned content (in characters), default is no limit",
          },
          returnHtml: {
            type: "boolean",
            description:
              "Whether to return HTML content instead of Markdown, default is false",
          },
          waitForNavigation: {
            type: "boolean",
            description:
              "Whether to wait for additional navigation after initial page load (useful for sites with anti-bot verification), default is false",
          },
          navigationTimeout: {
            type: "number",
            description:
              "Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)",
          },
          disableMedia: {
            type: "boolean",
            description:
              "Whether to disable media resources (images, stylesheets, fonts, media), default is true",
          },
          debug: {
            type: "boolean",
            description:
              "Whether to enable debug mode (showing browser window), overrides the --debug command line flag if specified",
          },
        },
        required: ["urls"],
      }
    };
  • Maps tool names to their handler functions, registering 'fetch_urls' to the fetchUrls implementation.
    export const toolHandlers = {
      [fetchUrlTool.name]: fetchUrl,
      [fetchUrlsTool.name]: fetchUrls
    };
  • src/tools/index.ts:5-8 (registration)
    Array exporting tool definitions, including fetchUrlsTool for registration in the MCP tools list.
    export const tools = [
      fetchUrlTool,
      fetchUrlsTool
    ];
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool retrieves content but lacks details on critical behaviors: it doesn't mention authentication needs, rate limits, error handling, or what the output looks like (e.g., format, structure). For a tool with 10 parameters and no output schema, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence: 'Retrieve web page content from multiple specified URLs.' It's front-loaded with the core purpose, has zero wasted words, and is appropriately sized for the tool's complexity. Every part of the sentence earns its place by clearly stating the action and scope.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (10 parameters, no annotations, no output schema), the description is incomplete. It doesn't address behavioral aspects like how content is returned, error cases, or performance constraints. While the schema covers parameters well, the description fails to provide necessary context for effective use, especially without annotations or output schema to fill gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, meaning all parameters are well-documented in the input schema itself. The description doesn't add any semantic details beyond what's in the schema (e.g., it doesn't explain how 'urls' are processed or interactions between parameters). With high schema coverage, the baseline score of 3 is appropriate, as the description doesn't compensate but also doesn't detract.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Retrieve web page content from multiple specified URLs.' It specifies the verb ('Retrieve'), resource ('web page content'), and scope ('multiple specified URLs'), which is specific and actionable. However, it doesn't explicitly distinguish this tool from its sibling 'fetch_url' (which presumably handles single URLs), missing full differentiation for a top score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention the sibling tool 'fetch_url' or explain scenarios where fetching multiple URLs is preferred over single ones. There's no context about prerequisites, limitations, or best practices, leaving the agent without usage direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Related Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/everford/fetcher-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server