webfetch
Fetch and parse HTML content from URLs to extract text, headings, links, metadata, and images using custom CSS selectors and configuration options.
Instructions
Fetch and parse HTML content from any URL. Extract text, headings, links, metadata, images, or use custom CSS selectors. Supports timeout configuration, custom user-agent, and redirect handling.
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to fetch content from | |
| extract | No | Types of content to extract (default: all) | |
| selectors | No | Custom CSS selectors to extract (key: name, value: selector) | |
| options | No |
Implementation Reference
- src/tools/webfetch.ts:78-158 (handler)Core handler function that fetches URL content using axios, parses HTML with Cheerio, extracts specified elements (text, headings, links, metadata, images, custom selectors), and returns structured ExtractedContent. Handles retries, timeouts, errors.protected async execute(params: WebFetchParams): Promise<ExtractedContent> { logger.info(`Fetching content from URL`, { url: params.url }); try { // Configure request options const config: any = { timeout: params.options?.timeout || 10000, maxRedirects: params.options?.maxRedirects ?? 5, validateStatus: (status: number) => status >= 200 && status < 400, }; if (params.options?.userAgent) { config.headers = { 'User-Agent': params.options.userAgent }; } if (params.options?.followRedirects === false) { config.maxRedirects = 0; } // Fetch the HTML content const response = await httpClient.get(params.url, config); const html = response.data; // Load HTML into Cheerio const $ = cheerio.load(html); // Determine what to extract const extractAll = !params.extract || params.extract.length === 0; const shouldExtract = (type: string) => extractAll || params.extract?.includes(type as any); const result: ExtractedContent = { url: params.url, }; // Extract text content if (shouldExtract('text')) { result.text = this.extractText($); } // Extract headings if (shouldExtract('headings')) { result.headings = this.extractHeadings($); } // Extract links if (shouldExtract('links')) { result.links = this.extractLinks($, params.url); } // Extract metadata if (shouldExtract('metadata')) { result.metadata = this.extractMetadata($); } // Extract images if (shouldExtract('images')) { result.images = this.extractImages($, params.url); } // Extract custom selectors if (params.selectors) { result.custom = this.extractCustomSelectors($, params.selectors); } logger.info(`Successfully fetched and parsed content`, { url: params.url }); return result; } catch (error) { if (axios.isAxiosError(error)) { if (error.code === 'ECONNABORTED') { throw new Error(`Request timeout: ${params.url}`); } else if (error.response) { throw new Error( `HTTP ${error.response.status}: ${error.response.statusText} - ${params.url}` ); } else if (error.request) { throw new Error(`Network error: Unable to reach ${params.url}`); } } throw new Error(`Failed to fetch content: ${error instanceof Error ? error.message : 'Unknown error'}`); } }
- src/tools/webfetch.ts:54-65 (schema)Zod schema defining input parameters for webfetch tool: url (required), extract array, selectors record, options.const webFetchSchema = z.object({ url: z.string().url().describe('URL to fetch content from'), extract: z .array(z.enum(['text', 'headings', 'links', 'metadata', 'images'])) .optional() .describe('Types of content to extract (default: all)'), selectors: z .record(z.string()) .optional() .describe('Custom CSS selectors to extract (key: name, value: selector)'), options: webFetchOptionsSchema, });
- src/tools/index.ts:17-31 (registration)Function that instantiates and registers WebFetchTool along with other tools to the MCP server via server.registerTools().export function registerAllTools(server: MCPServer): void { const tools: BaseTool[] = [ new WebSearchTool(), new WebFetchTool(), new TypeConversionTool(), ]; // Register all tools if (tools.length > 0) { server.registerTools(tools); logger.info(`Registered ${tools.length} tool(s): ${tools.map(t => t.name).join(', ')}`); } else { logger.warn('No tools registered - add tool implementations to src/tools/index.ts'); } }
- src/tools/webfetch.ts:29-37 (schema)TypeScript interface defining the output structure of the webfetch tool.export interface ExtractedContent { url: string; text?: string; headings?: { level: number; text: string }[]; links?: { text: string; href: string }[]; metadata?: Record<string, string>; images?: { src: string; alt: string }[]; custom?: Record<string, string | string[]>; }
- src/tools/webfetch.ts:72-73 (registration)Class definition with tool name 'webfetch' set, extending BaseTool with the schema.export class WebFetchTool extends BaseTool<typeof webFetchSchema> { readonly name = 'webfetch';