fetch_page
Retrieves a web page by URL and processes it for LLM context, with options to include a screenshot and to limit the length of the returned content. Part of the Web Content MCP Server.
Instructions
Fetches and processes a web page for LLM context
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| includeScreenshot | No | Whether to include a screenshot (base64 encoded) | `false` |
| maxContentLength | No | Maximum number of characters of processed content to return; longer content is truncated and `...` is appended | `10000` |
| url | Yes | URL to fetch | |
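To illustrate how these parameters fit together, here is a minimal sketch of the JSON-RPC `tools/call` request an MCP client might send for this tool. `FetchPageArgs` and `buildFetchPageRequest` are hypothetical names introduced for this example, and the URL is a placeholder; only the tool name and the argument names come from the schema.

```typescript
// Argument shape taken from the input schema table; only `url` is required.
interface FetchPageArgs {
  url: string;
  includeScreenshot?: boolean;
  maxContentLength?: number;
}

// Hypothetical helper: wraps the arguments in an MCP `tools/call` JSON-RPC request.
function buildFetchPageRequest(id: number, args: FetchPageArgs) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "fetch_page", arguments: args },
  };
}

// Placeholder URL; includeScreenshot is omitted, so the server-side default applies.
const request = buildFetchPageRequest(1, {
  url: "https://example.com",
  maxContentLength: 5000,
});

console.log(JSON.stringify(request, null, 2));
```

Omitted optional arguments are simply left out of `arguments`; the handler fills in its own defaults when it destructures them.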
Implementation Reference
- src/server.ts:176-227 (handler) Main handler function executing the fetch_page tool logic: validates input, fetches and processes page content, optionally includes a screenshot, truncates if needed, and formats the response.

      private async handleFetchPage(args: any) {
        // Validate arguments
        if (typeof args !== 'object' || args === null || typeof args.url !== 'string') {
          throw new McpError(ErrorCode.InvalidParams, 'Invalid arguments for fetch_page');
        }

        const { url, includeScreenshot = false, maxContentLength = 10000 } = args;

        try {
          // Fetch the page content
          const html = await this.browserClient.fetchContent(url);

          // Process the content for LLM
          const processedContent = this.contentProcessor.processForLLM(html, url);

          // Truncate if necessary
          const truncatedContent = processedContent.length > maxContentLength
            ? processedContent.substring(0, maxContentLength) + '...'
            : processedContent;

          // Get screenshot if requested
          let screenshot = null;
          if (includeScreenshot) {
            screenshot = await this.browserClient.takeScreenshot(url);
          }

          // Return the result
          return {
            content: [
              {
                type: 'text',
                text: truncatedContent,
              },
              ...(screenshot ? [{
                type: 'image',
                image: screenshot,
              }] : []),
            ],
          };
        } catch (error) {
          console.error('Error fetching page:', error);
          return {
            content: [
              {
                type: 'text',
                text: `Error fetching page: ${error instanceof Error ? error.message : String(error)}`,
              },
            ],
            isError: true,
          };
        }
      }
- src/server.ts:59-79 (schema) Input schema and metadata definition for the fetch_page tool in the ListTools response.

      name: 'fetch_page',
      description: 'Fetches and processes a web page for LLM context',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'URL to fetch',
          },
          includeScreenshot: {
            type: 'boolean',
            description: 'Whether to include a screenshot (base64 encoded)',
          },
          maxContentLength: {
            type: 'number',
            description: 'Maximum content length to return',
          },
        },
        required: ['url'],
      },
      },
- src/server.ts:146-147 (registration) Registration of the fetch_page handler in the tool call dispatcher switch statement.

      case 'fetch_page':
        return await this.handleFetchPage(args);
- src/browser-client.ts:20-36 (helper) Helper method in BrowserClient that fetches rendered HTML content from the Cloudflare Browser Rendering API.

      async fetchContent(url: string): Promise<string> {
        try {
          console.log(`Fetching content from: ${url}`);

          // Make the API call to the Cloudflare Worker
          const response = await axios.post(`${this.apiEndpoint}/content`, {
            url,
            rejectResourceTypes: ['image', 'font', 'media'],
            waitUntil: 'networkidle0',
          });

          return response.data.content;
        } catch (error) {
          console.error('Error fetching content:', error);
          throw new Error(`Failed to fetch content: ${error instanceof Error ? error.message : String(error)}`);
        }
      }
- src/content-processor.ts:11-20 (helper) Helper method in ContentProcessor that converts HTML to LLM-friendly markdown with extracted metadata.

      processForLLM(html: string, url: string): string {
        // Extract metadata
        const metadata = this.extractMetadata(html, url);

        // Clean the content
        const cleanedContent = this.cleanContent(html);

        // Format for LLM context
        return this.formatForLLM(cleanedContent, metadata);
      }
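The data flow through these pieces (fetch, process, truncate, wrap in a text content item) can be sketched in isolation. The version below is a simplified illustration, not the server's code: `ContentFetcher`, `LLMProcessor`, `stubFetcher`, `stubProcessor`, and `fetchPageSketch` are hypothetical names, the stubs replace the real browser client and content processor so the example runs without a network, and the tag-stripping regex is an assumption standing in for the unshown `cleanContent` logic. Only the truncation rule is copied from the handler.

```typescript
// Minimal interfaces mirroring the two helper methods referenced in the list above.
interface ContentFetcher {
  fetchContent(url: string): Promise<string>;
}
interface LLMProcessor {
  processForLLM(html: string, url: string): string;
}

// Same truncation rule as the handler: cut at maxLength characters and append '...'.
function truncate(content: string, maxLength: number): string {
  return content.length > maxLength
    ? content.substring(0, maxLength) + "..."
    : content;
}

// Hypothetical stand-ins so the sketch runs without a browser or network access.
const stubFetcher: ContentFetcher = {
  async fetchContent(url: string): Promise<string> {
    return `<html><title>Stub</title><body>Hello from ${url}</body></html>`;
  },
};
const stubProcessor: LLMProcessor = {
  // Naive tag stripping for illustration only; the real processing is not shown here.
  processForLLM(html: string, url: string): string {
    const text = html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
    return `Source: ${url}\n${text}`;
  },
};

// Fetch, process, truncate, then wrap the text in a single content item.
async function fetchPageSketch(
  fetcher: ContentFetcher,
  processor: LLMProcessor,
  url: string,
  maxContentLength: number = 10000,
) {
  const html = await fetcher.fetchContent(url);
  const processed = processor.processForLLM(html, url);
  return {
    content: [{ type: "text", text: truncate(processed, maxContentLength) }],
  };
}

fetchPageSketch(stubFetcher, stubProcessor, "https://example.com", 40).then(
  (result) => console.log(result.content[0].text),
);
```

Because the `...` marker is appended after the cut, truncated text can be up to three characters longer than `maxContentLength`. Callers that need a hard upper bound may want to account for that.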