web_read
Extract and process the readable content of web page HTML for structured analysis. Parsing runs locally with JSDOM and Readability, so local LLMs can interpret online information without external API dependencies.
Instructions
Alias of web.read
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
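| html | No | Raw HTML to parse (string). If omitted, the handler parses an empty document, so no content can be extracted. | |
| url | Yes | Page URL (string). Sets the document URL used to resolve relative links; the URL is not fetched. | |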
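The same input rules, required string `url` and optional string `html`, can be written as a small dependency-free TypeScript type guard. This is only a sketch for illustration; the server itself validates with Zod, and `WebReadArgs` and `isWebReadArgs` are hypothetical names.

```typescript
// Hypothetical argument type mirroring the input table: `url` required, `html` optional.
interface WebReadArgs {
  url: string;
  html?: string;
}

// Sketch of the checks the Zod shape performs: `url` must be a string and
// `html`, when present, must also be a string. Extra keys are not rejected,
// in line with Zod's default object behavior (unknown keys are stripped, not errors).
function isWebReadArgs(value: unknown): value is WebReadArgs {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.url === "string" && (v.html === undefined || typeof v.html === "string");
}

// Usage
console.log(isWebReadArgs({ url: "https://example.com", html: "<p>Hi</p>" }));
console.log(isWebReadArgs({ html: "<p>Hi</p>" }));
```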
Implementation Reference
- `src/tools/webRead.ts:4-30` (handler): Core handler function for `webRead`. It uses JSDOM and Readability to extract the title, byline, language, plain text, word count, links, and meta tags from the supplied HTML.
  ```typescript
  import { JSDOM } from 'jsdom';
  import { Readability } from '@mozilla/readability';

  export function webRead(args: { url: string, html?: string }) {
    const { url, html } = args;
    // `url` only sets the document URL (used to resolve relative hrefs); JSDOM does not fetch it.
    const doc = new JSDOM(html || '', { url });
    // Readability.parse() mutates the document it is given, so parse a clone and
    // keep the original DOM intact for the link and meta extraction below.
    const reader = new Readability(doc.window.document.cloneNode(true) as Document);
    const art = reader.parse();
    if (!art) return { title: '', byline: '', lang: '', text: '', wordCount: 0, links: [], meta: {} };

    // Collect every anchor with an href; `.href` is resolved against `url`.
    const links: Array<{ text: string, url: string }> = [];
    const anchorEls = doc.window.document.querySelectorAll('a[href]');
    anchorEls.forEach(a => {
      const href = (a as HTMLAnchorElement).href;
      const text = (a as HTMLElement).textContent?.trim() || '';
      if (href) links.push({ text, url: href });
    });

    // Map <meta name="..."> / <meta property="..."> tags to their content values.
    const meta: Record<string, string> = {};
    const metas = doc.window.document.querySelectorAll('meta[name], meta[property]');
    metas.forEach((m: Element) => {
      const key = m.getAttribute('name') || m.getAttribute('property');
      const val = m.getAttribute('content');
      if (key && val) meta[key] = val;
    });

    const text = art.textContent || '';
    return {
      title: art.title || '',
      byline: art.byline || '',
      lang: (doc.window.document.documentElement.getAttribute('lang') || '').toLowerCase(),
      text,
      // Count whitespace-separated tokens.
      wordCount: text.split(/\s+/).filter(Boolean).length,
      links,
      meta
    };
  }
  ```
- `src/server.ts:87` (schema): Zod schema defining the input parameters for the `web_read` tool: `url` (required string) and `html` (optional string).
  ```typescript
  import { z } from 'zod';

  const webReadShape = { url: z.string(), html: z.string().optional() };
  ```
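- `src/server.ts:95-101` (registration): Registration of the `web_read` tool in the MCP server. It is an alias that calls the `webRead` handler and returns the result as a JSON string in a single text content item.
  ```typescript
  // `server` and `OPEN` are defined elsewhere in src/server.ts (not shown here).
  server.tool(
    'web_read',
    'Alias of web.read',
    webReadShape,
    OPEN,
    async ({ url, html }) => {
      const res = webRead({ url, html });
      // Serialize the handler's result into a single text content item.
      return { content: [{ type: 'text', text: JSON.stringify(res) }] };
    }
  );
  ```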
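To make the output contract concrete, the sketch below restates parts of the handler and registration as standalone TypeScript with no dependencies: the result fields, the word-count rule, the meta key rule (`name` first, then `property`), and one way a client might decode the JSON text item the tool returns. `WebReadResult`, `countWords`, `collectMeta`, `ToolResponse`, and `decodeWebRead` are hypothetical names chosen for illustration, and `collectMeta` works on plain objects rather than DOM elements.

```typescript
// Hypothetical type for the object webRead returns; the field names come from
// the handler, but the type itself does not appear in the source.
interface WebReadResult {
  title: string;
  byline: string;
  lang: string;
  text: string;
  wordCount: number;
  links: Array<{ text: string; url: string }>;
  meta: Record<string, string>;
}

// Same rule as the handler: split on runs of whitespace, drop the empty strings
// produced by leading or trailing whitespace, and count what is left.
function countWords(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

// Hypothetical helper mirroring the handler's meta rule: the key is `name`,
// falling back to `property`; tags missing a key or content are skipped.
type MetaTag = { name?: string; property?: string; content?: string };
function collectMeta(tags: MetaTag[]): Record<string, string> {
  const meta: Record<string, string> = {};
  for (const t of tags) {
    const key = t.name || t.property;
    if (key && t.content) meta[key] = t.content;
  }
  return meta;
}

// Hypothetical minimal shape of the tool response built by the registration.
interface ToolResponse {
  content: Array<{ type: string; text?: string }>;
}

// Client-side decoding: the tool puts JSON.stringify(result) into a text
// content item, so parse the first such item back into a WebReadResult.
function decodeWebRead(res: ToolResponse): WebReadResult {
  const item = res.content.find((c) => c.type === "text");
  if (!item || item.text === undefined) {
    throw new Error("tool response has no text content");
  }
  return JSON.parse(item.text) as WebReadResult;
}

// Usage: round-trip a result through the same wrapping the tool applies.
const sample: WebReadResult = {
  title: "Example",
  byline: "",
  lang: "en",
  text: "Hello, world!",
  wordCount: countWords("Hello, world!"),
  links: [{ text: "Home", url: "https://example.com/" }],
  meta: collectMeta([
    { name: "description", content: "A page" },
    { property: "og:title", content: "Example" },
  ]),
};
const wrapped: ToolResponse = { content: [{ type: "text", text: JSON.stringify(sample) }] };
console.log(decodeWebRead(wrapped).wordCount);
```

Returning the result as serialized JSON keeps the tool response to a single text item; the trade-off is that clients must parse it themselves, as `decodeWebRead` does.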