
Web Fetch

web-fetch

Fetch and extract content from URLs, with HTML-to-text extraction, JSON and plain-text support, and configurable domain security controls.

Instructions

Fetch content from a URL. Supports HTML (extracts text), JSON, and plain text. By default, only allows trusted domains for security. Set allow_any_domain=true to fetch from any URL (use with caution).

Input Schema

Name             | Required | Description                                                          | Default
url              | Yes      | The URL to fetch content from                                        | —
allow_any_domain | No       | Allow fetching from any domain (when false, only trusted domains)    | false
include_headers  | No       | Include response headers in the output                               | false
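Given this schema, valid argument payloads look like the following sketch (the URLs are illustrative examples, not endpoints taken from this page):

```typescript
// Illustrative web-fetch argument payloads; the URLs are examples only.
const minimal = {
  url: "https://duyet.net/", // required; must be a valid URL
};

const explicit = {
  url: "https://example.com/data.json",
  allow_any_domain: true, // needed here, since example.com is not a trusted domain
  include_headers: true,  // echo response headers in the output
};
```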

Implementation Reference

  • The handler function that executes the web-fetch tool logic: it fetches the URL with the fetchUrl helper, formats the result with formatFetchResult, and returns the text content, or an error message with isError set on failure.
    	async ({ url, allow_any_domain = false, include_headers = false }) => {
    		try {
    			const result = await fetchUrl(url, allow_any_domain, include_headers);
    			const formattedResult = formatFetchResult(result);
    
    			return {
    				content: [
    					{
    						type: "text" as const,
    						text: formattedResult,
    					},
    				],
    			};
    		} catch (error) {
    			return {
    				content: [
    					{
    						type: "text" as const,
    						text: `Error fetching URL: ${error instanceof Error ? error.message : "Unknown error"}`,
    					},
    				],
    				isError: true,
    			};
    		}
    	},
    );
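The formatFetchResult helper called above is not shown on this page. The sketch below is one plausible shape for it, assuming a FetchResult type with the fields that fetchUrl (further down) actually returns — url, status, contentType, content, and optional headers:

```typescript
// Hedged sketch of formatFetchResult; the real implementation is not shown
// on this page. FetchResult mirrors the object returned by fetchUrl below.
interface FetchResult {
  url: string;
  status: number;
  contentType: string;
  content: string;
  headers?: Record<string, string>;
}

function formatFetchResult(result: FetchResult): string {
  const lines = [
    `URL: ${result.url}`,
    `Status: ${result.status}`,
    `Content-Type: ${result.contentType}`,
  ];
  if (result.headers) {
    lines.push("Headers:");
    for (const [key, value] of Object.entries(result.headers)) {
      lines.push(`  ${key}: ${value}`);
    }
  }
  lines.push("", result.content);
  return lines.join("\n");
}
```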
  • Input schema for the web-fetch tool using Zod: requires URL, optional allow_any_domain and include_headers booleans.
    inputSchema: {
    	url: z.string().url().describe("The URL to fetch content from"),
    	allow_any_domain: z
    		.boolean()
    		.optional()
    		.describe(
    			"Allow fetching from any domain (default: false, only trusted domains)",
    		),
    	include_headers: z
    		.boolean()
    		.optional()
    		.describe("Include response headers in the output (default: false)"),
    },
  • registerWebFetchTool function that registers the 'web-fetch' tool on the MCP server, including name, description, input schema, and handler.
    export function registerWebFetchTool(server: McpServer) {
    	server.registerTool(
    		"web-fetch",
    		{
    			title: "Web Fetch",
    			description:
    				"Fetch content from a URL. Supports HTML (extracts text), JSON, and plain text. By default, only allows trusted domains for security. Set allow_any_domain=true to fetch from any URL (use with caution).",
    			inputSchema: {
    				url: z.string().url().describe("The URL to fetch content from"),
    				allow_any_domain: z
    					.boolean()
    					.optional()
    					.describe(
    						"Allow fetching from any domain (default: false, only trusted domains)",
    					),
    				include_headers: z
    					.boolean()
    					.optional()
    					.describe("Include response headers in the output (default: false)"),
    			},
    		},
    		async ({ url, allow_any_domain = false, include_headers = false }) => {
    			try {
    				const result = await fetchUrl(url, allow_any_domain, include_headers);
    				const formattedResult = formatFetchResult(result);
    
    				return {
    					content: [
    						{
    							type: "text" as const,
    							text: formattedResult,
    						},
    					],
    				};
    			} catch (error) {
    				return {
    					content: [
    						{
    							type: "text" as const,
    							text: `Error fetching URL: ${error instanceof Error ? error.message : "Unknown error"}`,
    						},
    					],
    					isError: true,
    				};
    			}
    		},
    	);
    }
  • Invocation of registerWebFetchTool during tools registration in index.ts, followed by logging.
    registerWebFetchTool(server);
    logger.tool("web-fetch", "registered");
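The logger used here is not shown on this page either. Purely for illustration, a tool() channel could be as small as the following (formatToolLog and the bracketed prefix are assumptions, not the project's actual log format):

```typescript
// Hypothetical logging helper; the real logger in index.ts is not shown here.
function formatToolLog(name: string, message: string): string {
  return `[tool:${name}] ${message}`;
}

const logger = {
  tool(name: string, message: string): void {
    console.log(formatToolLog(name, message));
  },
};

logger.tool("web-fetch", "registered"); // prints "[tool:web-fetch] registered"
```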
  • Core helper function fetchUrl that performs the actual HTTP fetch, URL validation, content type processing (JSON pretty-print, HTML text extraction, text truncation), and returns structured FetchResult.
    async function fetchUrl(
    	url: string,
    	allowAnyDomain = false,
    	includeHeaders = false,
    ): Promise<FetchResult> {
    	// Validate URL
    	if (!isUrlAllowed(url, allowAnyDomain)) {
    		throw new Error(
    			allowAnyDomain
    				? "Invalid URL protocol (only http/https allowed)"
    				: `URL not allowed. Allowed domains: ${ALLOWED_DOMAINS.join(", ")}, and Duyet's GitHub repositories (github.com/duyet/*, raw.githubusercontent.com/duyet/*, gist.github.com/duyet/*)`,
    		);
    	}
    
    	try {
    		const response = await fetch(url, {
    			headers: {
    				"User-Agent": "Mozilla/5.0 (compatible; DuyetMCP/0.1; +https://duyet.net/)",
    			},
    			redirect: "follow",
    		});
    
    		const contentType = response.headers.get("content-type") || "text/plain";
    		const status = response.status;
    
    		// Check content length to prevent memory issues
    		const contentLength = response.headers.get("content-length");
    		if (contentLength) {
    			const size = Number.parseInt(contentLength, 10);
    			if (size > MAX_CONTENT_LENGTH) {
    				throw new Error(
    					`Content too large: ${(size / 1024 / 1024).toFixed(2)}MB (max ${MAX_CONTENT_LENGTH / 1024 / 1024}MB)`,
    				);
    			}
    		}
    
    		// Get response headers if requested
    		const headers: Record<string, string> = {};
    		if (includeHeaders) {
    			response.headers.forEach((value, key) => {
    				headers[key] = value;
    			});
    		}
    
    		let content: string;
    
    		// Process based on content type
    		if (contentType.includes("application/json")) {
    			const json = await response.json();
    			content = JSON.stringify(json, null, 2);
    		} else if (contentType.includes("text/html")) {
    			const html = await response.text();
    			// For HTML, extract readable text content
    			content = extractTextFromHtml(html);
    
    			// Limit content length for large pages
    			if (content.length > 10000) {
    				content = `${content.substring(0, 10000)}\n\n[Content truncated...]`;
    			}
    		} else {
    			// Plain text or other types
    			content = await response.text();
    
    			// Limit content length
    			if (content.length > 50000) {
    				content = `${content.substring(0, 50000)}\n\n[Content truncated...]`;
    			}
    		}
    
    		return {
    			url,
    			status,
    			contentType,
    			content,
    			headers: includeHeaders ? headers : undefined,
    		};
    	} catch (error) {
    		throw new Error(
    			`Failed to fetch URL: ${error instanceof Error ? error.message : "Unknown error"}`,
    		);
    	}
    }
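The isUrlAllowed helper and ALLOWED_DOMAINS constant referenced above are not shown on this page. A minimal sketch of how the check might work, reconstructed from the error message in fetchUrl — the domain list here is an assumed example, not the project's actual configuration:

```typescript
// Hedged sketch of isUrlAllowed; ALLOWED_DOMAINS is an assumed example list.
const ALLOWED_DOMAINS = ["duyet.net", "blog.duyet.net"];

function tryParseUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null;
  }
}

function isUrlAllowed(url: string, allowAnyDomain: boolean): boolean {
  const parsed = tryParseUrl(url);
  if (parsed === null) return false;

  // Only http/https are ever permitted, even when any domain is allowed.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;
  if (allowAnyDomain) return true;

  const host = parsed.hostname;
  if (ALLOWED_DOMAINS.some((d) => host === d || host.endsWith(`.${d}`))) return true;

  // Duyet's GitHub namespaces, per the error message in fetchUrl.
  const githubHosts = ["github.com", "raw.githubusercontent.com", "gist.github.com"];
  return githubHosts.includes(host) && parsed.pathname.startsWith("/duyet/");
}
```

Note that in this sketch the protocol check runs before the allowAnyDomain shortcut, matching the error branches in fetchUrl: a non-http(s) URL is rejected even when allow_any_domain is true.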
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: content extraction for HTML (extracts text), support for multiple formats (JSON, plain text), security restrictions (default trusted domains only), and the ability to override security with 'allow_any_domain'. However, it doesn't mention potential rate limits, error handling, or response structure details, leaving some behavioral aspects uncovered.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured in three sentences that each serve a clear purpose: stating the core function, listing supported formats, and explaining security behavior with a cautionary note. There's no redundant information, and it's appropriately sized for the tool's complexity, making it easy for an agent to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 3 parameters, 100% schema coverage, but no annotations or output schema, the description does well by covering purpose, usage guidelines, and key behaviors. It adequately explains what the tool does and when to use it, though it could benefit from more detail about output format or error cases to be fully complete given the lack of output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, so the schema already documents all three parameters thoroughly. The description adds some context by explaining the security implications of 'allow_any_domain' ('use with caution') and mentioning content type support, but doesn't provide additional semantic meaning beyond what the schema descriptions already cover for each parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'fetch' and resource 'content from a URL', specifying the action and target. It distinguishes this tool from siblings like 'web-search' by focusing on direct URL content retrieval rather than search functionality, making the purpose specific and well-differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use the tool: for fetching content from URLs, with specific content types supported (HTML, JSON, plain text). It also includes a clear cautionary note about the 'allow_any_domain' parameter, advising 'use with caution' for non-trusted domains, which helps the agent understand security implications and appropriate usage contexts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
