j0hanz

superFetch MCP Server

fetch-url

Fetch public webpages and convert HTML to clean Markdown for AI processing, with built-in content cleaning and secure web access.

Instructions

Web Content Extractor: fetch public webpages and convert HTML to clean Markdown.

  • READ-ONLY. No JavaScript execution.

  • GitHub/GitLab/Bitbucket URLs auto-transform to raw endpoints (check resolvedUrl).

  • If truncated=true, use cacheResourceUri with resources/read for full content.

  • For large pages/timeouts, use task mode (task: {}).

  • If error queue_full, retry with task mode.
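The follow-up rules above can be read as a small decision function on the client side. The sketch below is only an illustration: the `Outcome` and `NextStep` types and the `nextStep` function are hypothetical, and only the names `truncated`, `cacheResourceUri`, `queue_full`, and `task: {}` come from the instructions. How a client actually sends `task: {}` or issues `resources/read` depends on the MCP client in use and is not shown here.

```typescript
// Hypothetical client-side shapes; only the field names `markdown`, `truncated`,
// and `cacheResourceUri` are taken from the tool description.
interface FetchUrlResult {
  markdown?: string;
  truncated?: boolean;
  cacheResourceUri?: string;
}

// Either a successful result or an error code reported by the server.
type Outcome = { result: FetchUrlResult } | { errorCode: string };

type NextStep =
  | { kind: 'done'; markdown: string }
  | { kind: 'read-resource'; uri: string } // follow up with resources/read
  | { kind: 'retry-as-task'; task: Record<string, never> }; // retry with task: {}

function nextStep(outcome: Outcome): NextStep {
  if ('errorCode' in outcome) {
    // "If error queue_full, retry with task mode."
    if (outcome.errorCode === 'queue_full') {
      return { kind: 'retry-as-task', task: {} };
    }
    throw new Error(`fetch-url failed: ${outcome.errorCode}`);
  }

  const { result } = outcome;
  // "If truncated=true, use cacheResourceUri with resources/read for full content."
  if (result.truncated && result.cacheResourceUri) {
    return { kind: 'read-resource', uri: result.cacheResourceUri };
  }
  return { kind: 'done', markdown: result.markdown ?? '' };
}
```

Keeping the decision separate from the transport makes it easy to test without a running server.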

Input Schema

Name | Required | Description | Default
url | Yes | Target URL. Max 2048 chars. | -
skipNoiseRemoval | No | Preserve navigation/footers (disable noise filtering). | -
forceRefresh | No | Bypass cache and fetch fresh content. | -
maxInlineChars | No | Inline markdown limit (0-10485760, 0=unlimited). Lower of this or global limit applies. | -
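As a rough illustration of the constraints in the table, the sketch below checks arguments on the client side and shows one possible reading of "lower of this or global limit applies". The `checkArgs` and `effectiveInlineLimit` functions are hypothetical helpers, not part of the server, and treating a global limit of 0 as unlimited is an assumption. The server itself validates with its own schema.

```typescript
// Limits copied from the input table.
const MAX_URL_LENGTH = 2048;
const MAX_INLINE_CHARS = 10_485_760;

interface FetchUrlArgs {
  url: string;
  skipNoiseRemoval?: boolean;
  forceRefresh?: boolean;
  maxInlineChars?: number;
}

// Hypothetical pre-flight check; returns a list of problems (empty when OK).
function checkArgs(args: FetchUrlArgs): string[] {
  const problems: string[] = [];
  if (args.url.length === 0 || args.url.length > MAX_URL_LENGTH) {
    problems.push(`url must be 1-${MAX_URL_LENGTH} characters`);
  }
  if (!/^https?:\/\//i.test(args.url)) {
    problems.push('url must use http or https');
  }
  const limit = args.maxInlineChars;
  if (
    limit !== undefined &&
    (!Number.isInteger(limit) || limit < 0 || limit > MAX_INLINE_CHARS)
  ) {
    problems.push(`maxInlineChars must be an integer in 0-${MAX_INLINE_CHARS}`);
  }
  return problems;
}

// One reading of the limit rule: 0 (or undefined) means "no limit from this
// source", and the smaller of the remaining limits wins. Returns 0 when
// neither side sets a limit.
function effectiveInlineLimit(
  requested: number | undefined,
  globalLimit: number
): number {
  const limits = [requested, globalLimit].filter(
    (n): n is number => n !== undefined && n > 0
  );
  return limits.length > 0 ? Math.min(...limits) : 0;
}
```

For example, a request with `maxInlineChars: 50000` against a global limit of 20000 would be capped at 20000 under this reading.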

Implementation Reference

  • Main tool handler function - fetchUrlToolHandler is the primary entry point that wraps executeFetch with error handling and logging
    export async function fetchUrlToolHandler(
      input: FetchUrlInput,
      extra?: ToolHandlerExtra
    ): Promise<ToolResponseBase> {
      return executeFetch(input, extra).catch((error: unknown) => {
        logError('fetch-url tool error', toError(error));
        return handleToolError(error, input.url, 'Failed to fetch URL');
      });
    }
  • Core execution logic - executeFetch function orchestrates the URL fetching, progress reporting, and response building
    async function executeFetch(
      input: FetchUrlInput,
      extra?: ToolHandlerExtra
    ): Promise<ToolResponseBase> {
      const { url } = input;
      const signal = buildToolAbortSignal(extra?.signal);
      const progress = createProgressReporter(extra);
    
      const contextStr = getUrlContext(url);
      progress.report(0, `fetch-url: ${contextStr} [starting]`);
      logDebug('Fetching URL', { url });
    
      try {
        progress.report(1, `fetch-url: ${contextStr} [fetching HTML]`);
        const { pipeline, inlineResult } = await fetchPipeline(
          url,
          signal,
          progress,
          input.skipNoiseRemoval,
          input.forceRefresh,
          input.maxInlineChars
        );
    
        if (pipeline.fromCache) {
          progress.report(3, `fetch-url: ${contextStr} [loaded from cache]`);
        }
    
        progress.report(4, `fetch-url: ${contextStr} • completed`);
        return buildResponse(pipeline, inlineResult, url);
      } catch (error) {
        const isAbort = isAbortError(error);
        progress.report(
          4,
          `fetch-url: ${contextStr} • ${isAbort ? 'cancelled' : 'failed'}`
        );
        throw error;
      }
    }
  • Input validation schema - defines the URL, skipNoiseRemoval, forceRefresh, and maxInlineChars parameters for fetch-url tool
    export const fetchUrlInputSchema = z.strictObject({
      url: z
        .url({ protocol: /^https?$/i })
        .min(1)
        .max(config.constants.maxUrlLength)
        .describe(`Target URL. Max ${config.constants.maxUrlLength} chars.`),
      skipNoiseRemoval: z
        .boolean()
        .optional()
        .describe('Preserve navigation/footers (disable noise filtering).'),
      forceRefresh: z
        .boolean()
        .optional()
        .describe('Bypass cache and fetch fresh content.'),
      maxInlineChars: z
        .number()
        .int()
        .min(0)
        .max(config.constants.maxHtmlSize)
        .optional()
        .describe(
          `Inline markdown limit (0-${config.constants.maxHtmlSize}, 0=unlimited). Lower of this or global limit applies.`
        ),
    });
  • Output validation schema - defines the structure of the response including url, markdown, metadata, cacheResourceUri, and truncation fields
    export const fetchUrlOutputSchema = z.strictObject({
      url: z
        .string()
        .min(1)
        .max(config.constants.maxUrlLength)
        .describe('Fetched URL.'),
      inputUrl: z
        .string()
        .max(config.constants.maxUrlLength)
        .optional()
        .describe('Original requested URL.'),
      resolvedUrl: z
        .string()
        .max(config.constants.maxUrlLength)
        .optional()
        .describe('Final URL after raw-content transformations.'),
      finalUrl: z
        .string()
        .max(config.constants.maxUrlLength)
        .optional()
        .describe('Final URL after HTTP redirects.'),
      cacheResourceUri: z
        .string()
        .max(config.constants.maxUrlLength)
        .optional()
        .describe('URI for resources/read to get full markdown.'),
      title: z.string().max(512).optional().describe('Page title.'),
      metadata: z
        .strictObject({
          title: z.string().max(512).optional().describe('Detected page title.'),
          description: z
            .string()
            .max(2048)
            .optional()
            .describe('Detected page description.'),
          author: z.string().max(512).optional().describe('Detected page author.'),
          image: z
            .string()
            .max(config.constants.maxUrlLength)
            .optional()
            .describe('Detected page preview image URL.'),
          favicon: z
            .string()
            .max(config.constants.maxUrlLength)
            .optional()
            .describe('Detected page favicon URL.'),
          publishedAt: z
            .string()
            .max(64)
            .optional()
            .describe('Detected publication date.'),
          modifiedAt: z
            .string()
            .max(64)
            .optional()
            .describe('Detected last modified date.'),
        })
        .optional()
        .describe('Extracted page metadata.'),
      markdown: (config.constants.maxInlineContentChars > 0
        ? z.string().max(config.constants.maxInlineContentChars)
        : z.string()
      )
        .optional()
        .describe('Extracted Markdown. May be truncated (check truncated field).'),
      fromCache: z.boolean().optional().describe('True if served from cache.'),
      fetchedAt: z.string().max(64).optional().describe('ISO timestamp of fetch.'),
      contentSize: z
        .number()
        .int()
        .min(0)
        .max(config.constants.maxHtmlSize * 4)
        .optional()
        .describe('Full markdown size before truncation.'),
      truncated: z.boolean().optional().describe('True if markdown was truncated.'),
    });
  • Tool registration - registerTools function registers fetch-url with the MCP server, including input/output schemas and handler
    export function registerTools(server: McpServer): void {
      if (!config.tools.enabled.includes(FETCH_URL_TOOL_NAME)) {
        unregisterTaskCapableTool(FETCH_URL_TOOL_NAME);
        return;
      }
    
      registerTaskCapableTool({
        name: FETCH_URL_TOOL_NAME,
        parseArguments: (args) => {
          const parsed = fetchUrlInputSchema.safeParse(args);
          if (!parsed.success) {
            throw new McpError(
              ErrorCode.InvalidParams,
              'Invalid arguments for fetch-url'
            );
          }
          return parsed.data;
        },
        execute: fetchUrlToolHandler,
      });
    
      const registeredTool = server.registerTool(
        TOOL_DEFINITION.name,
        {
          title: TOOL_DEFINITION.title,
          description: TOOL_DEFINITION.description,
          inputSchema: TOOL_DEFINITION.inputSchema,
          outputSchema: TOOL_DEFINITION.outputSchema,
          annotations: TOOL_DEFINITION.annotations,
          execution: TOOL_DEFINITION.execution,
          icons: [TOOL_ICON],
        } as { inputSchema: typeof fetchUrlInputSchema } & Record<string, unknown>,
        withRequestContextIfMissing(TOOL_DEFINITION.handler)
      );
      // SDK typing gap workaround: preserve runtime `execution` metadata until the
      // registered tool type includes this field.
      applyRegisteredToolExecutionMetadata(
        registeredTool,
        TOOL_DEFINITION.execution
      );
    }
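The handler and executeFetch above split error handling in two layers: the inner function reports a final progress state and rethrows, while the outer wrapper turns any failure into an error response. The sketch below reproduces that shape with stand-in names (`ToolResponse`, `runWithProgress`, and `safeHandler` are hypothetical, and the `AbortError` name check is an assumed substitute for the server's `isAbortError`).

```typescript
// Stand-in response type; the real server returns ToolResponseBase.
type ToolResponse = { isError: boolean; text: string };
type Report = (step: number, message: string) => void;

// Inner layer: report progress, and on failure report the final state
// ("cancelled" or "failed") before rethrowing the original error.
async function runWithProgress(
  work: () => Promise<string>,
  report: Report
): Promise<ToolResponse> {
  report(0, 'starting');
  try {
    const text = await work();
    report(4, 'completed');
    return { isError: false, text };
  } catch (error) {
    const isAbort = error instanceof Error && error.name === 'AbortError';
    report(4, isAbort ? 'cancelled' : 'failed');
    throw error;
  }
}

// Outer layer: never rejects; converts any thrown error into an error response.
function safeHandler(
  work: () => Promise<string>,
  report: Report
): Promise<ToolResponse> {
  return runWithProgress(work, report).catch((error: unknown) => ({
    isError: true,
    text: error instanceof Error ? error.message : String(error),
  }));
}
```

With this split, callers of the outer function always receive a response object, while progress listeners still see whether the run completed, failed, or was cancelled.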

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/j0hanz/super-fetch-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.