Skip to main content
Glama

web-fetch

Fetch and extract content from URLs, supporting HTML text extraction, JSON, and plain text formats with configurable domain security controls.

Instructions

Fetch content from a URL. Supports HTML (extracts text), JSON, and plain text. By default, only allows trusted domains for security. Set allow_any_domain=true to fetch from any URL (use with caution).

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesThe URL to fetch content from
allow_any_domainNoAllow fetching from any domain (default: false, only trusted domains)
include_headersNoInclude response headers in the output (default: false)

Implementation Reference

  • The handler function that executes the web-fetch tool logic: fetches the URL using fetchUrl helper, formats the result, and returns text content or error.
    	async ({ url, allow_any_domain = false, include_headers = false }) => {
    		try {
    			const result = await fetchUrl(url, allow_any_domain, include_headers);
    			const formattedResult = formatFetchResult(result);
    
    			return {
    				content: [
    					{
    						type: "text" as const,
    						text: formattedResult,
    					},
    				],
    			};
    		} catch (error) {
    			return {
    				content: [
    					{
    						type: "text" as const,
    						text: `Error fetching URL: ${error instanceof Error ? error.message : "Unknown error"}`,
    					},
    				],
    				isError: true,
    			};
    		}
    	},
    );
  • Input schema for the web-fetch tool using Zod: requires URL, optional allow_any_domain and include_headers booleans.
    inputSchema: {
    	url: z.string().url().describe("The URL to fetch content from"),
    	allow_any_domain: z
    		.boolean()
    		.optional()
    		.describe(
    			"Allow fetching from any domain (default: false, only trusted domains)",
    		),
    	include_headers: z
    		.boolean()
    		.optional()
    		.describe("Include response headers in the output (default: false)"),
    },
  • registerWebFetchTool function that registers the 'web-fetch' tool on the MCP server, including name, description, input schema, and handler.
    export function registerWebFetchTool(server: McpServer) {
    	server.registerTool(
    		"web-fetch",
    		{
    			title: "Web Fetch",
    			description:
    				"Fetch content from a URL. Supports HTML (extracts text), JSON, and plain text. By default, only allows trusted domains for security. Set allow_any_domain=true to fetch from any URL (use with caution).",
    			inputSchema: {
    				url: z.string().url().describe("The URL to fetch content from"),
    				allow_any_domain: z
    					.boolean()
    					.optional()
    					.describe(
    						"Allow fetching from any domain (default: false, only trusted domains)",
    					),
    				include_headers: z
    					.boolean()
    					.optional()
    					.describe("Include response headers in the output (default: false)"),
    			},
    		},
    		async ({ url, allow_any_domain = false, include_headers = false }) => {
    			try {
    				const result = await fetchUrl(url, allow_any_domain, include_headers);
    				const formattedResult = formatFetchResult(result);
    
    				return {
    					content: [
    						{
    							type: "text" as const,
    							text: formattedResult,
    						},
    					],
    				};
    			} catch (error) {
    				return {
    					content: [
    						{
    							type: "text" as const,
    							text: `Error fetching URL: ${error instanceof Error ? error.message : "Unknown error"}`,
    						},
    					],
    					isError: true,
    				};
    			}
    		},
    	);
    }
  • Invocation of registerWebFetchTool during tools registration in index.ts, followed by logging.
    registerWebFetchTool(server);
    logger.tool("web-fetch", "registered");
  • Core helper function fetchUrl that performs the actual HTTP fetch, URL validation, content type processing (JSON pretty-print, HTML text extraction, text truncation), and returns structured FetchResult.
    async function fetchUrl(
    	url: string,
    	allowAnyDomain = false,
    	includeHeaders = false,
    ): Promise<FetchResult> {
    	// Validate URL
    	if (!isUrlAllowed(url, allowAnyDomain)) {
    		throw new Error(
    			allowAnyDomain
    				? "Invalid URL protocol (only http/https allowed)"
    				: `URL not allowed. Allowed domains: ${ALLOWED_DOMAINS.join(", ")}, and Duyet's GitHub repositories (github.com/duyet/*, raw.githubusercontent.com/duyet/*, gist.github.com/duyet/*)`,
    		);
    	}
    
    	try {
    		const response = await fetch(url, {
    			headers: {
    				"User-Agent": "Mozilla/5.0 (compatible; DuyetMCP/0.1; +https://duyet.net/)",
    			},
    			redirect: "follow",
    		});
    
    		const contentType = response.headers.get("content-type") || "text/plain";
    		const status = response.status;
    
    		// Check content length to prevent memory issues
    		const contentLength = response.headers.get("content-length");
    		if (contentLength) {
    			const size = Number.parseInt(contentLength, 10);
    			if (size > MAX_CONTENT_LENGTH) {
    				throw new Error(
    					`Content too large: ${(size / 1024 / 1024).toFixed(2)}MB (max ${MAX_CONTENT_LENGTH / 1024 / 1024}MB)`,
    				);
    			}
    		}
    
    		// Get response headers if requested
    		const headers: Record<string, string> = {};
    		if (includeHeaders) {
    			response.headers.forEach((value, key) => {
    				headers[key] = value;
    			});
    		}
    
    		let content: string;
    
    		// Process based on content type
    		if (contentType.includes("application/json")) {
    			const json = await response.json();
    			content = JSON.stringify(json, null, 2);
    		} else if (contentType.includes("text/html")) {
    			const html = await response.text();
    			// For HTML, extract readable text content
    			content = extractTextFromHtml(html);
    
    			// Limit content length for large pages
    			if (content.length > 10000) {
    				content = `${content.substring(0, 10000)}\n\n[Content truncated...]`;
    			}
    		} else {
    			// Plain text or other types
    			content = await response.text();
    
    			// Limit content length
    			if (content.length > 50000) {
    				content = `${content.substring(0, 50000)}\n\n[Content truncated...]`;
    			}
    		}
    
    		return {
    			url,
    			status,
    			contentType,
    			content,
    			headers: includeHeaders ? headers : undefined,
    		};
    	} catch (error) {
    		throw new Error(
    			`Failed to fetch URL: ${error instanceof Error ? error.message : "Unknown error"}`,
    		);
    	}
    }

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/duyet/duyet-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server