Skip to main content
Glama
j0hanz

superFetch MCP Server

Fetch URL

fetch-url
Read-onlyIdempotent

Fetch public webpages and convert HTML to clean Markdown for AI processing, with built-in content cleaning and secure web access.

Instructions

Web Content Extractor Fetch public webpages and convert HTML to clean Markdown.

  • READ-ONLY. No JavaScript execution.

  • GitHub/GitLab/Bitbucket URLs auto-transform to raw endpoints (check resolvedUrl).

  • If truncated=true, use cacheResourceUri with resources/read for full content.

  • For large pages/timeouts, use task mode (task: {}).

  • If error queue_full, retry with task mode.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesTarget URL. Max 2048 chars.
skipNoiseRemovalNoPreserve navigation/footers (disable noise filtering).
forceRefreshNoBypass cache and fetch fresh content.
maxInlineCharsNoInline markdown limit (0-10485760, 0=unlimited). Lower of this or global limit applies.

Implementation Reference

  • Main tool handler function - fetchUrlToolHandler is the primary entry point that wraps executeFetch with error handling and logging
    export async function fetchUrlToolHandler(
      input: FetchUrlInput,
      extra?: ToolHandlerExtra
    ): Promise<ToolResponseBase> {
      return executeFetch(input, extra).catch((error: unknown) => {
        logError('fetch-url tool error', toError(error));
        return handleToolError(error, input.url, 'Failed to fetch URL');
      });
    }
  • Core execution logic - executeFetch function orchestrates the URL fetching, progress reporting, and response building
    async function executeFetch(
      input: FetchUrlInput,
      extra?: ToolHandlerExtra
    ): Promise<ToolResponseBase> {
      const { url } = input;
      const signal = buildToolAbortSignal(extra?.signal);
      const progress = createProgressReporter(extra);
    
      const contextStr = getUrlContext(url);
      progress.report(0, `fetch-url: ${contextStr} [starting]`);
      logDebug('Fetching URL', { url });
    
      try {
        progress.report(1, `fetch-url: ${contextStr} [fetching HTML]`);
        const { pipeline, inlineResult } = await fetchPipeline(
          url,
          signal,
          progress,
          input.skipNoiseRemoval,
          input.forceRefresh,
          input.maxInlineChars
        );
    
        if (pipeline.fromCache) {
          progress.report(3, `fetch-url: ${contextStr} [loaded from cache]`);
        }
    
        progress.report(4, `fetch-url: ${contextStr} • completed`);
        return buildResponse(pipeline, inlineResult, url);
      } catch (error) {
        const isAbort = isAbortError(error);
        progress.report(
          4,
          `fetch-url: ${contextStr} • ${isAbort ? 'cancelled' : 'failed'}`
        );
        throw error;
      }
    }
  • Input validation schema - defines the URL, skipNoiseRemoval, forceRefresh, and maxInlineChars parameters for fetch-url tool
    export const fetchUrlInputSchema = z.strictObject({
      url: z
        .url({ protocol: /^https?$/i })
        .min(1)
        .max(config.constants.maxUrlLength)
        .describe(`Target URL. Max ${config.constants.maxUrlLength} chars.`),
      skipNoiseRemoval: z
        .boolean()
        .optional()
        .describe('Preserve navigation/footers (disable noise filtering).'),
      forceRefresh: z
        .boolean()
        .optional()
        .describe('Bypass cache and fetch fresh content.'),
      maxInlineChars: z
        .number()
        .int()
        .min(0)
        .max(config.constants.maxHtmlSize)
        .optional()
        .describe(
          `Inline markdown limit (0-${config.constants.maxHtmlSize}, 0=unlimited). Lower of this or global limit applies.`
        ),
    });
  • Output validation schema - defines the structure of the response including url, markdown, metadata, cacheResourceUri, and truncation fields
    export const fetchUrlOutputSchema = z.strictObject({
      url: z
        .string()
        .min(1)
        .max(config.constants.maxUrlLength)
        .describe('Fetched URL.'),
      inputUrl: z
        .string()
        .max(config.constants.maxUrlLength)
        .optional()
        .describe('Original requested URL.'),
      resolvedUrl: z
        .string()
        .max(config.constants.maxUrlLength)
        .optional()
        .describe('Final URL after raw-content transformations.'),
      finalUrl: z
        .string()
        .max(config.constants.maxUrlLength)
        .optional()
        .describe('Final URL after HTTP redirects.'),
      cacheResourceUri: z
        .string()
        .max(config.constants.maxUrlLength)
        .optional()
        .describe('URI for resources/read to get full markdown.'),
      title: z.string().max(512).optional().describe('Page title.'),
      metadata: z
        .strictObject({
          title: z.string().max(512).optional().describe('Detected page title.'),
          description: z
            .string()
            .max(2048)
            .optional()
            .describe('Detected page description.'),
          author: z.string().max(512).optional().describe('Detected page author.'),
          image: z
            .string()
            .max(config.constants.maxUrlLength)
            .optional()
            .describe('Detected page preview image URL.'),
          favicon: z
            .string()
            .max(config.constants.maxUrlLength)
            .optional()
            .describe('Detected page favicon URL.'),
          publishedAt: z
            .string()
            .max(64)
            .optional()
            .describe('Detected publication date.'),
          modifiedAt: z
            .string()
            .max(64)
            .optional()
            .describe('Detected last modified date.'),
        })
        .optional()
        .describe('Extracted page metadata.'),
      markdown: (config.constants.maxInlineContentChars > 0
        ? z.string().max(config.constants.maxInlineContentChars)
        : z.string()
      )
        .optional()
        .describe('Extracted Markdown. May be truncated (check truncated field).'),
      fromCache: z.boolean().optional().describe('True if served from cache.'),
      fetchedAt: z.string().max(64).optional().describe('ISO timestamp of fetch.'),
      contentSize: z
        .number()
        .int()
        .min(0)
        .max(config.constants.maxHtmlSize * 4)
        .optional()
        .describe('Full markdown size before truncation.'),
      truncated: z.boolean().optional().describe('True if markdown was truncated.'),
    });
  • Tool registration - registerTools function registers fetch-url with the MCP server, including input/output schemas and handler
    export function registerTools(server: McpServer): void {
      if (!config.tools.enabled.includes(FETCH_URL_TOOL_NAME)) {
        unregisterTaskCapableTool(FETCH_URL_TOOL_NAME);
        return;
      }
    
      registerTaskCapableTool({
        name: FETCH_URL_TOOL_NAME,
        parseArguments: (args) => {
          const parsed = fetchUrlInputSchema.safeParse(args);
          if (!parsed.success) {
            throw new McpError(
              ErrorCode.InvalidParams,
              'Invalid arguments for fetch-url'
            );
          }
          return parsed.data;
        },
        execute: fetchUrlToolHandler,
      });
    
      const registeredTool = server.registerTool(
        TOOL_DEFINITION.name,
        {
          title: TOOL_DEFINITION.title,
          description: TOOL_DEFINITION.description,
          inputSchema: TOOL_DEFINITION.inputSchema,
          outputSchema: TOOL_DEFINITION.outputSchema,
          annotations: TOOL_DEFINITION.annotations,
          execution: TOOL_DEFINITION.execution,
          icons: [TOOL_ICON],
        } as { inputSchema: typeof fetchUrlInputSchema } & Record<string, unknown>,
        withRequestContextIfMissing(TOOL_DEFINITION.handler)
      );
      // SDK typing gap workaround: preserve runtime `execution` metadata until the
      // registered tool type includes this field.
      applyRegisteredToolExecutionMetadata(
        registeredTool,
        TOOL_DEFINITION.execution
      );
    }

Tool Definition Quality

Score is being calculated. Check back soon.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/j0hanz/super-fetch-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server