Specialized AI Search Tools

by brendon92

webfetch

Fetch and parse HTML content from URLs to extract text, headings, links, metadata, and images using custom CSS selectors and configuration options.

Instructions

Fetch and parse HTML content from any URL. Extract text, headings, links, metadata, images, or use custom CSS selectors. Supports timeout configuration, custom user-agent, and redirect handling.

Input Schema

Name        Required  Description
url         Yes       URL to fetch content from
extract     No        Types of content to extract (default: all)
selectors   No        Custom CSS selectors to extract (key: name, value: selector)
options     No        Request options (timeout, maxRedirects, userAgent, followRedirects)
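Under this schema, a call to the tool might pass arguments shaped like the following (the URL, selector, and option values here are purely illustrative):

```typescript
// Illustrative arguments for the webfetch tool; all concrete values are made up.
const args = {
  url: 'https://example.com/docs',
  extract: ['text', 'headings', 'links'],   // a subset; omit to extract everything
  selectors: { price: '.product-price' },   // key: name, value: CSS selector
  options: { timeout: 5000, userAgent: 'my-bot/1.0', followRedirects: true },
};
```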

Implementation Reference

  • Core handler function that fetches URL content using axios, parses the HTML with Cheerio, extracts the requested elements (text, headings, links, metadata, images, custom selectors), and returns a structured ExtractedContent. Handles timeouts and error cases.
    protected async execute(params: WebFetchParams): Promise<ExtractedContent> {
        logger.info(`Fetching content from URL`, { url: params.url });
    
        try {
            // Configure request options
            const config: any = {
                timeout: params.options?.timeout || 10000,
                maxRedirects: params.options?.maxRedirects ?? 5,
                validateStatus: (status: number) => status >= 200 && status < 400,
            };
    
            if (params.options?.userAgent) {
                config.headers = { 'User-Agent': params.options.userAgent };
            }
    
            if (params.options?.followRedirects === false) {
                config.maxRedirects = 0;
            }
    
            // Fetch the HTML content
            const response = await httpClient.get(params.url, config);
            const html = response.data;
    
            // Load HTML into Cheerio
            const $ = cheerio.load(html);
    
            // Determine what to extract
            const extractAll = !params.extract || params.extract.length === 0;
            const shouldExtract = (type: string) => extractAll || params.extract?.includes(type as any);
    
            const result: ExtractedContent = {
                url: params.url,
            };
    
            // Extract text content
            if (shouldExtract('text')) {
                result.text = this.extractText($);
            }
    
            // Extract headings
            if (shouldExtract('headings')) {
                result.headings = this.extractHeadings($);
            }
    
            // Extract links
            if (shouldExtract('links')) {
                result.links = this.extractLinks($, params.url);
            }
    
            // Extract metadata
            if (shouldExtract('metadata')) {
                result.metadata = this.extractMetadata($);
            }
    
            // Extract images
            if (shouldExtract('images')) {
                result.images = this.extractImages($, params.url);
            }
    
            // Extract custom selectors
            if (params.selectors) {
                result.custom = this.extractCustomSelectors($, params.selectors);
            }
    
            logger.info(`Successfully fetched and parsed content`, { url: params.url });
            return result;
        } catch (error) {
            if (axios.isAxiosError(error)) {
                if (error.code === 'ECONNABORTED') {
                    throw new Error(`Request timeout: ${params.url}`);
                } else if (error.response) {
                    throw new Error(
                        `HTTP ${error.response.status}: ${error.response.statusText} - ${params.url}`
                    );
                } else if (error.request) {
                    throw new Error(`Network error: Unable to reach ${params.url}`);
                }
            }
            throw new Error(`Failed to fetch content: ${error instanceof Error ? error.message : 'Unknown error'}`);
        }
    }
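Note that the handler passes `params.url` into `extractLinks` and `extractImages`, presumably so that relative `href`/`src` values can be resolved against the page URL. That helper is not shown on this page, but the resolution step likely looks something like this sketch using the standard WHATWG URL constructor (`resolveHref` is a hypothetical name):

```typescript
// Hypothetical helper: resolve a possibly-relative href against the page URL.
// Returns null for values the URL parser rejects.
function resolveHref(href: string, baseUrl: string): string | null {
  try {
    return new URL(href, baseUrl).href;
  } catch {
    return null;
  }
}
```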
  • Zod schema defining the input parameters for the webfetch tool: url (required), extract array, selectors record, and options.
    const webFetchSchema = z.object({
        url: z.string().url().describe('URL to fetch content from'),
        extract: z
            .array(z.enum(['text', 'headings', 'links', 'metadata', 'images']))
            .optional()
            .describe('Types of content to extract (default: all)'),
        selectors: z
            .record(z.string())
            .optional()
            .describe('Custom CSS selectors to extract (key: name, value: selector)'),
        options: webFetchOptionsSchema,
    });
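The `webFetchOptionsSchema` referenced above is not reproduced on this page. From how the handler reads `params.options`, its shape is presumably something like the following hypothetical sketch, shown here as a plain interface together with the defaulting logic the handler applies:

```typescript
// Hypothetical shape of the `options` parameter, inferred from the handler's usage.
interface WebFetchOptions {
  timeout?: number;          // ms; the handler falls back to 10000
  maxRedirects?: number;     // the handler falls back to 5
  userAgent?: string;        // sent as the User-Agent header when present
  followRedirects?: boolean; // false forces maxRedirects to 0
}

// Mirrors the handler's defaulting logic, for illustration only.
function effectiveConfig(options?: WebFetchOptions) {
  return {
    timeout: options?.timeout ?? 10000,
    maxRedirects: options?.followRedirects === false ? 0 : options?.maxRedirects ?? 5,
  };
}
```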
  • Function that instantiates WebFetchTool and registers it, along with the other tools, with the MCP server via server.registerTools().
    export function registerAllTools(server: MCPServer): void {
        const tools: BaseTool[] = [
            new WebSearchTool(),
            new WebFetchTool(),
            new TypeConversionTool(),
        ];
    
        // Register all tools
        if (tools.length > 0) {
            server.registerTools(tools);
            logger.info(`Registered ${tools.length} tool(s): ${tools.map(t => t.name).join(', ')}`);
        } else {
            logger.warn('No tools registered - add tool implementations to src/tools/index.ts');
        }
    }
  • TypeScript interface defining the output structure of the webfetch tool.
    export interface ExtractedContent {
        url: string;
        text?: string;
        headings?: { level: number; text: string }[];
        links?: { text: string; href: string }[];
        metadata?: Record<string, string>;
        images?: { src: string; alt: string }[];
        custom?: Record<string, string | string[]>;
    }
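Given that interface, a successful fetch returns an object along these lines (all values below are illustrative, not real output):

```typescript
// Illustrative value matching the ExtractedContent shape above.
const sample = {
  url: 'https://example.com/',
  text: 'Example Domain ...',
  headings: [{ level: 1, text: 'Example Domain' }],
  links: [{ text: 'More information', href: 'https://www.iana.org/domains/example' }],
  metadata: { description: 'Example page' },
  images: [] as { src: string; alt: string }[],
  custom: { title: 'Example Domain' },
};
```

Fields for extraction types that were not requested are simply absent, since every property except `url` is optional.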
  • Class definition extending BaseTool with the schema, setting the tool name to 'webfetch'.
    export class WebFetchTool extends BaseTool<typeof webFetchSchema> {
        readonly name = 'webfetch';