fetch-url

Fetch public webpages and convert HTML to clean Markdown for AI processing, with built-in content cleaning and secure web access.

Instructions

Web Content Extractor Fetch public webpages and convert HTML to clean Markdown.

READ-ONLY. No JavaScript execution.
GitHub/GitLab/Bitbucket URLs auto-transform to raw endpoints (check resolvedUrl).
If truncated=true, use cacheResourceUri with resources/read for full content.
For large pages/timeouts, use task mode (task: {}).
If error queue_full, retry with task mode.

Input Schema

TableJSON Schema

Name	Required	Description
`url`	Yes	Target URL. Max 2048 chars.
`skipNoiseRemoval`	No	Preserve navigation/footers (disable noise filtering).
`forceRefresh`	No	Bypass cache and fetch fresh content.
`maxInlineChars`	No	Inline markdown limit (0-10485760, 0=unlimited). Lower of this or global limit applies.

Implementation Reference

src/tools/fetch-url.ts:450-458 (handler)
Main tool handler function - fetchUrlToolHandler is the primary entry point that wraps executeFetch with error handling and logging
export async function fetchUrlToolHandler( input: FetchUrlInput, extra?: ToolHandlerExtra ): Promise<ToolResponseBase> { return executeFetch(input, extra).catch((error: unknown) => { logError('fetch-url tool error', toError(error)); return handleToolError(error, input.url, 'Failed to fetch URL'); }); }
src/tools/fetch-url.ts:411-448 (handler)
Core execution logic - executeFetch function orchestrates the URL fetching, progress reporting, and response building
async function executeFetch( input: FetchUrlInput, extra?: ToolHandlerExtra ): Promise<ToolResponseBase> { const { url } = input; const signal = buildToolAbortSignal(extra?.signal); const progress = createProgressReporter(extra); const contextStr = getUrlContext(url); progress.report(0, `fetch-url: ${contextStr} [starting]`); logDebug('Fetching URL', { url }); try { progress.report(1, `fetch-url: ${contextStr} [fetching HTML]`); const { pipeline, inlineResult } = await fetchPipeline( url, signal, progress, input.skipNoiseRemoval, input.forceRefresh, input.maxInlineChars ); if (pipeline.fromCache) { progress.report(3, `fetch-url: ${contextStr} [loaded from cache]`); } progress.report(4, `fetch-url: ${contextStr} • completed`); return buildResponse(pipeline, inlineResult, url); } catch (error) { const isAbort = isAbortError(error); progress.report( 4, `fetch-url: ${contextStr} • ${isAbort ? 'cancelled' : 'failed'}` ); throw error; } }
src/schemas/inputs.ts:5-28 (schema)
Input validation schema - defines the URL, skipNoiseRemoval, forceRefresh, and maxInlineChars parameters for fetch-url tool
export const fetchUrlInputSchema = z.strictObject({ url: z .url({ protocol: /^https?$/i }) .min(1) .max(config.constants.maxUrlLength) .describe(`Target URL. Max ${config.constants.maxUrlLength} chars.`), skipNoiseRemoval: z .boolean() .optional() .describe('Preserve navigation/footers (disable noise filtering).'), forceRefresh: z .boolean() .optional() .describe('Bypass cache and fetch fresh content.'), maxInlineChars: z .number() .int() .min(0) .max(config.constants.maxHtmlSize) .optional() .describe( `Inline markdown limit (0-${config.constants.maxHtmlSize}, 0=unlimited). Lower of this or global limit applies.` ), });
src/schemas/outputs.ts:5-80 (schema)
Output validation schema - defines the structure of the response including url, markdown, metadata, cacheResourceUri, and truncation fields
export const fetchUrlOutputSchema = z.strictObject({ url: z .string() .min(1) .max(config.constants.maxUrlLength) .describe('Fetched URL.'), inputUrl: z .string() .max(config.constants.maxUrlLength) .optional() .describe('Original requested URL.'), resolvedUrl: z .string() .max(config.constants.maxUrlLength) .optional() .describe('Final URL after raw-content transformations.'), finalUrl: z .string() .max(config.constants.maxUrlLength) .optional() .describe('Final URL after HTTP redirects.'), cacheResourceUri: z .string() .max(config.constants.maxUrlLength) .optional() .describe('URI for resources/read to get full markdown.'), title: z.string().max(512).optional().describe('Page title.'), metadata: z .strictObject({ title: z.string().max(512).optional().describe('Detected page title.'), description: z .string() .max(2048) .optional() .describe('Detected page description.'), author: z.string().max(512).optional().describe('Detected page author.'), image: z .string() .max(config.constants.maxUrlLength) .optional() .describe('Detected page preview image URL.'), favicon: z .string() .max(config.constants.maxUrlLength) .optional() .describe('Detected page favicon URL.'), publishedAt: z .string() .max(64) .optional() .describe('Detected publication date.'), modifiedAt: z .string() .max(64) .optional() .describe('Detected last modified date.'), }) .optional() .describe('Extracted page metadata.'), markdown: (config.constants.maxInlineContentChars > 0 ? z.string().max(config.constants.maxInlineContentChars) : z.string() ) .optional() .describe('Extracted Markdown. May be truncated (check truncated field).'), fromCache: z.boolean().optional().describe('True if served from cache.'), fetchedAt: z.string().max(64).optional().describe('ISO timestamp of fetch.'), contentSize: z .number() .int() .min(0) .max(config.constants.maxHtmlSize * 4) .optional() .describe('Full markdown size before truncation.'), truncated: z.boolean().optional().describe('True if markdown was truncated.'), });
src/tools/fetch-url.ts:571-611 (registration)
Tool registration - registerTools function registers fetch-url with the MCP server, including input/output schemas and handler
export function registerTools(server: McpServer): void { if (!config.tools.enabled.includes(FETCH_URL_TOOL_NAME)) { unregisterTaskCapableTool(FETCH_URL_TOOL_NAME); return; } registerTaskCapableTool({ name: FETCH_URL_TOOL_NAME, parseArguments: (args) => { const parsed = fetchUrlInputSchema.safeParse(args); if (!parsed.success) { throw new McpError( ErrorCode.InvalidParams, 'Invalid arguments for fetch-url' ); } return parsed.data; }, execute: fetchUrlToolHandler, }); const registeredTool = server.registerTool( TOOL_DEFINITION.name, { title: TOOL_DEFINITION.title, description: TOOL_DEFINITION.description, inputSchema: TOOL_DEFINITION.inputSchema, outputSchema: TOOL_DEFINITION.outputSchema, annotations: TOOL_DEFINITION.annotations, execution: TOOL_DEFINITION.execution, icons: [TOOL_ICON], } as { inputSchema: typeof fetchUrlInputSchema } & Record<string, unknown>, withRequestContextIfMissing(TOOL_DEFINITION.handler) ); // SDK typing gap workaround: preserve runtime `execution` metadata until the // registered tool type includes this field. applyRegisteredToolExecutionMetadata( registeredTool, TOOL_DEFINITION.execution ); }

superFetch MCP Server

fetch-url

Instructions

Input Schema

Implementation Reference

Other Tools

Latest Blog Posts

MCP directory API