Specialized AI Search Tools

by brendon92

webfetch

Fetch and parse HTML content from URLs to extract text, headings, links, metadata, and images using custom CSS selectors and configuration options.

Instructions

Fetch and parse HTML content from any URL. Extract text, headings, links, metadata, images, or use custom CSS selectors. Supports timeout configuration, custom user-agent, and redirect handling.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | URL to fetch content from | |
| extract | No | Types of content to extract (default: all) | |
| selectors | No | Custom CSS selectors to extract (key: name, value: selector) | |
| options | No | | |
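
As an illustration of how these parameters fit together, the following sketch builds one arguments object. The `WebFetchArgs` type is a hand-written approximation of the input schema, not the server's exported type; the `options` fields (`timeout`, `userAgent`, `followRedirects`, `maxRedirects`) are inferred from the handler code in the implementation reference, and the URL and selector values are made-up examples.

```typescript
// Hand-written approximation of the webfetch input shape
// (an assumption for illustration, not the server's exported type).
type ExtractType = 'text' | 'headings' | 'links' | 'metadata' | 'images';

interface WebFetchArgs {
    url: string;
    extract?: ExtractType[];
    selectors?: Record<string, string>;
    options?: {
        timeout?: number; // assumed to be milliseconds; the handler falls back to 10000
        userAgent?: string;
        followRedirects?: boolean;
        maxRedirects?: number;
    };
}

// Example arguments: request only headings and links, plus one custom selector.
const exampleArgs: WebFetchArgs = {
    url: 'https://example.com/docs',
    extract: ['headings', 'links'],
    selectors: { summary: 'article p.lead' },
    options: { timeout: 5000, followRedirects: true, maxRedirects: 3 },
};

console.log(JSON.stringify(exampleArgs, null, 2));
```

Omitting `extract` (or passing an empty array) asks for every content type, while `selectors` results are returned whenever `selectors` is present.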

Implementation Reference

  • Core handler function that fetches URL content using axios, parses the HTML with Cheerio, extracts the requested elements (text, headings, links, metadata, images, custom selectors), and returns a structured ExtractedContent. It applies the configured timeout and redirect limits and converts request failures into descriptive errors; no retry logic appears in this excerpt.
    protected async execute(params: WebFetchParams): Promise<ExtractedContent> {
        logger.info(`Fetching content from URL`, { url: params.url });
    
        try {
            // Configure request options
            const config: any = {
                timeout: params.options?.timeout || 10000,
                maxRedirects: params.options?.maxRedirects ?? 5,
                validateStatus: (status: number) => status >= 200 && status < 400,
            };
    
            if (params.options?.userAgent) {
                config.headers = { 'User-Agent': params.options.userAgent };
            }
    
            if (params.options?.followRedirects === false) {
                config.maxRedirects = 0;
            }
    
            // Fetch the HTML content
            const response = await httpClient.get(params.url, config);
            const html = response.data;
    
            // Load HTML into Cheerio
            const $ = cheerio.load(html);
    
            // Determine what to extract
            const extractAll = !params.extract || params.extract.length === 0;
            const shouldExtract = (type: string) => extractAll || params.extract?.includes(type as any);
    
            const result: ExtractedContent = {
                url: params.url,
            };
    
            // Extract text content
            if (shouldExtract('text')) {
                result.text = this.extractText($);
            }
    
            // Extract headings
            if (shouldExtract('headings')) {
                result.headings = this.extractHeadings($);
            }
    
            // Extract links
            if (shouldExtract('links')) {
                result.links = this.extractLinks($, params.url);
            }
    
            // Extract metadata
            if (shouldExtract('metadata')) {
                result.metadata = this.extractMetadata($);
            }
    
            // Extract images
            if (shouldExtract('images')) {
                result.images = this.extractImages($, params.url);
            }
    
            // Extract custom selectors
            if (params.selectors) {
                result.custom = this.extractCustomSelectors($, params.selectors);
            }
    
            logger.info(`Successfully fetched and parsed content`, { url: params.url });
            return result;
        } catch (error) {
            if (axios.isAxiosError(error)) {
                if (error.code === 'ECONNABORTED') {
                    throw new Error(`Request timeout: ${params.url}`);
                } else if (error.response) {
                    throw new Error(
                        `HTTP ${error.response.status}: ${error.response.statusText} - ${params.url}`
                    );
                } else if (error.request) {
                    throw new Error(`Network error: Unable to reach ${params.url}`);
                }
            }
            throw new Error(`Failed to fetch content: ${error instanceof Error ? error.message : 'Unknown error'}`);
        }
    }
  • Zod schema defining input parameters for webfetch tool: url (required), extract array, selectors record, options.
    const webFetchSchema = z.object({
        url: z.string().url().describe('URL to fetch content from'),
        extract: z
            .array(z.enum(['text', 'headings', 'links', 'metadata', 'images']))
            .optional()
            .describe('Types of content to extract (default: all)'),
        selectors: z
            .record(z.string())
            .optional()
            .describe('Custom CSS selectors to extract (key: name, value: selector)'),
        options: webFetchOptionsSchema,
    });
  • Function that instantiates and registers WebFetchTool along with other tools to the MCP server via server.registerTools().
    export function registerAllTools(server: MCPServer): void {
        const tools: BaseTool[] = [
            new WebSearchTool(),
            new WebFetchTool(),
            new TypeConversionTool(),
        ];
    
        // Register all tools
        if (tools.length > 0) {
            server.registerTools(tools);
            logger.info(`Registered ${tools.length} tool(s): ${tools.map(t => t.name).join(', ')}`);
        } else {
            logger.warn('No tools registered - add tool implementations to src/tools/index.ts');
        }
    }
  • TypeScript interface defining the output structure of the webfetch tool.
    export interface ExtractedContent {
        url: string;
        text?: string;
        headings?: { level: number; text: string }[];
        links?: { text: string; href: string }[];
        metadata?: Record<string, string>;
        images?: { src: string; alt: string }[];
        custom?: Record<string, string | string[]>;
    }
  • Class definition with tool name 'webfetch' set, extending BaseTool with the schema.
    export class WebFetchTool extends BaseTool<typeof webFetchSchema> {
        readonly name = 'webfetch';
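
The option handling and extraction selection in the handler can be isolated as two small pure functions. The sketch below mirrors the logic in `execute` but is not the author's code: `resolveRequestConfig`, `resolveExtractTypes`, `FetchOptions`, and `RequestConfig` are hypothetical names introduced for illustration, and no HTTP request is made.

```typescript
type ExtractType = 'text' | 'headings' | 'links' | 'metadata' | 'images';

// Hypothetical shape of the `options` parameter, inferred from the handler.
interface FetchOptions {
    timeout?: number;
    userAgent?: string;
    followRedirects?: boolean;
    maxRedirects?: number;
}

// Hypothetical subset of the request config the handler builds.
interface RequestConfig {
    timeout: number;
    maxRedirects: number;
    headers?: Record<string, string>;
}

// Mirrors the handler's defaults: 10000 for timeout, 5 for maxRedirects.
function resolveRequestConfig(options?: FetchOptions): RequestConfig {
    const config: RequestConfig = {
        // `||` (as in the handler) means a timeout of 0 also falls back to 10000.
        timeout: options?.timeout || 10000,
        maxRedirects: options?.maxRedirects ?? 5,
    };
    if (options?.userAgent) {
        config.headers = { 'User-Agent': options.userAgent };
    }
    // Disabling redirects overrides any maxRedirects value.
    if (options?.followRedirects === false) {
        config.maxRedirects = 0;
    }
    return config;
}

// A missing or empty `extract` array selects every content type.
function resolveExtractTypes(extract?: ExtractType[]): ExtractType[] {
    const all: ExtractType[] = ['text', 'headings', 'links', 'metadata', 'images'];
    if (!extract || extract.length === 0) {
        return all;
    }
    return all.filter((type) => extract.includes(type));
}

console.log(resolveRequestConfig({ followRedirects: false, maxRedirects: 3 }));
console.log(resolveExtractTypes(['links', 'text']));
```

As in the handler, `followRedirects: false` takes precedence over `maxRedirects`, and a `timeout` of 0 is replaced by the default because of the `||` operator.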
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions 'fetch and parse HTML content' and configuration options like timeout and redirect handling, which provides some behavioral context. However, it doesn't address important aspects like error handling, rate limits, authentication requirements, or what happens with malformed URLs or network failures.
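
Although the description is silent on failures, the handler shown in the implementation reference throws errors whose messages start with a small set of prefixes. A caller could branch on them roughly as follows. This is a sketch under assumptions: `classifyWebFetchError` and `WebFetchFailure` are hypothetical names, and it presumes the MCP server passes the handler's message through unchanged, which the source does not confirm.

```typescript
type WebFetchFailure = 'timeout' | 'http' | 'network' | 'other';

// Classifies an error message by the prefixes used in the handler's catch block.
function classifyWebFetchError(message: string): { kind: WebFetchFailure; status?: number } {
    if (message.startsWith('Request timeout:')) {
        return { kind: 'timeout' };
    }
    // Handler format: "HTTP <status>: <statusText> - <url>"
    const http = /^HTTP (\d{3}):/.exec(message);
    if (http) {
        return { kind: 'http', status: Number(http[1]) };
    }
    if (message.startsWith('Network error:')) {
        return { kind: 'network' };
    }
    // Everything else, including "Failed to fetch content: ...".
    return { kind: 'other' };
}

console.log(classifyWebFetchError('HTTP 404: Not Found - https://example.com/missing'));
```

Because the handler's `validateStatus` accepts only statuses from 200 to 399, responses with 4xx and 5xx codes surface through the `HTTP` branch.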

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise and front-loaded, packing comprehensive information into two efficient sentences. Every phrase ('fetch and parse HTML content,' 'extract text...') earns its place without redundancy or unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 parameters, nested objects, no output schema, and no annotations), the description is adequate but has gaps. It covers the core functionality and parameters well, but lacks details on return values, error conditions, and behavioral constraints that would be important for an AI agent to use this tool effectively.
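
To make the return value concrete: per the `ExtractedContent` interface in the implementation reference, only `url` is always present, and every other field appears only when it was requested. The sketch below shows one way a consumer might guard for that. The interface is copied from the reference; `describeContent` and the sample data are hypothetical.

```typescript
// Copied from the implementation reference.
interface ExtractedContent {
    url: string;
    text?: string;
    headings?: { level: number; text: string }[];
    links?: { text: string; href: string }[];
    metadata?: Record<string, string>;
    images?: { src: string; alt: string }[];
    custom?: Record<string, string | string[]>;
}

// Lists which optional sections are present in a result, with their sizes.
function describeContent(content: ExtractedContent): string[] {
    const present: string[] = [];
    if (content.text !== undefined) present.push(`text (${content.text.length} chars)`);
    if (content.headings) present.push(`headings (${content.headings.length})`);
    if (content.links) present.push(`links (${content.links.length})`);
    if (content.metadata) present.push(`metadata (${Object.keys(content.metadata).length} keys)`);
    if (content.images) present.push(`images (${content.images.length})`);
    if (content.custom) present.push(`custom (${Object.keys(content.custom).length} keys)`);
    return present;
}

// Made-up result resembling a call with extract: ['headings'] and one selector.
const sample: ExtractedContent = {
    url: 'https://example.com/docs',
    headings: [{ level: 1, text: 'Docs' }, { level: 2, text: 'Install' }],
    custom: { summary: 'A short lead paragraph.' },
};

console.log(describeContent(sample));
```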

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds meaningful context beyond the schema by explaining the purpose of extraction ('extract text, headings, links, metadata, images') and configuration options ('supports timeout configuration, custom user-agent, and redirect handling'). With 75% schema description coverage, the description compensates well by providing semantic understanding of what the parameters achieve, though it doesn't detail all parameter interactions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('fetch and parse HTML content') and resources ('from any URL'), distinguishing it from sibling tools like typeconversion and websearch. It explicitly mentions extraction capabilities and configuration options, providing comprehensive purpose definition.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context through phrases like 'extract text, headings, links, metadata, images' and 'supports timeout configuration,' suggesting when to use this tool for web content extraction. However, it lacks explicit guidance on when to choose this tool versus the websearch sibling tool or any exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/brendon92/mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.