web.read
Extract the main readable text of a web page from its HTML, along with the title, byline, language, word count, links, and meta tags, for analysis or reading.
Instructions
Extract readable content from given HTML (or pass html from web.fetch).
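Because the handler never fetches `url` itself, a client normally obtains the HTML first (for example with web.fetch) and passes both values to `web.read`. The sketch below shows how such a call might look, assuming the client speaks raw MCP JSON-RPC (`tools/call`) rather than using an SDK. The helper names `buildWebReadCall` and `decodeWebReadResult` are hypothetical, and the HTML literal stands in for whatever web.fetch returned. The decoder relies on the server returning `JSON.stringify(result)` as the text of a content item.

```ts
// Hypothetical client-side helpers for calling web.read over MCP JSON-RPC.
interface JsonRpcRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: { url: string; html?: string } };
}

interface ToolResult {
  content: Array<{ type: string; text?: string }>;
}

// Build a tools/call request; `html` would normally be the body returned by web.fetch.
function buildWebReadCall(id: number, url: string, html?: string): JsonRpcRequest {
  const args: { url: string; html?: string } = { url };
  if (html !== undefined) args.html = html; // only include html when one was supplied
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name: 'web.read', arguments: args } };
}

// The server serialises the handler's result into the text of a content item.
function decodeWebReadResult(result: ToolResult): unknown {
  const first = result.content.find((c) => c.type === 'text');
  if (!first || first.text === undefined) throw new Error('web.read returned no text content');
  return JSON.parse(first.text);
}

// Usage: the HTML literal stands in for a web.fetch response.
const request = buildWebReadCall(1, 'https://example.com/post', '<html><body><p>Hi</p></body></html>');
console.log(JSON.stringify(request.params));

const decoded = decodeWebReadResult({
  content: [{ type: 'text', text: '{"title":"Post","wordCount":1}' }],
}) as { title: string; wordCount: number };
console.log(decoded.title, decoded.wordCount); // Post 1
```

Since the tool does not download anything, a call that passes only `url` comes back with empty fields; a useful call always includes `html`.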
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
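| url | Yes | Page URL (string). Used as the document URL during parsing, so relative links resolve to absolute URLs. The tool does not fetch it. | |
| html | No | HTML to extract from (string), for example the output of web.fetch. If omitted, an empty document is parsed and all result fields are empty. | |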
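The table corresponds to the Zod shape `{ url: z.string(), html: z.string().optional() }` registered in server.ts. A client that wants to check its arguments before sending them could use a dependency-free guard like the one below. `isWebReadArgs` is a hypothetical helper, not part of the server; like `z.string()`, it checks only types, not whether `url` is a well-formed URL.

```ts
// Hypothetical client-side guard mirroring the web.read input shape:
// url is a required string, html is an optional string.
type WebReadArgs = { url: string; html?: string };

function isWebReadArgs(value: unknown): value is WebReadArgs {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  if (typeof v.url !== 'string') return false;
  // Optional field: absent or undefined is fine, otherwise it must be a string.
  if (v.html !== undefined && typeof v.html !== 'string') return false;
  // Extra keys are tolerated, on the assumption that the server's Zod object strips them.
  return true;
}

console.log(isWebReadArgs({ url: 'https://example.com' }));                   // true
console.log(isWebReadArgs({ url: 'https://example.com', html: '<p>x</p>' })); // true
console.log(isWebReadArgs({ html: '<p>x</p>' }));                             // false
console.log(isWebReadArgs({ url: 'https://example.com', html: 42 }));         // false
```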
Implementation Reference
- `src/tools/webRead.ts:4-30` (handler): the core handler for the `web.read` tool. It parses the HTML with JSDOM and uses Mozilla's Readability to extract the article title, byline, and text, then collects the page language (the `<html lang>` attribute), every `a[href]` link, and every `<meta>` tag that has a `name` or `property` attribute, and computes a word count of the extracted text.
  ```ts
  import { JSDOM } from 'jsdom';
  import { Readability } from '@mozilla/readability';

  export function webRead(args: { url: string; html?: string }) {
    const { url, html } = args;
    // `url` becomes the document URL, so anchor hrefs resolve to absolute URLs.
    const dom = new JSDOM(html || '', { url });
    const doc = dom.window.document;
    // Readability's parse() modifies the DOM it is given, so parse a clone and
    // keep `doc` intact for the link and meta collection below.
    const art = new Readability(doc.cloneNode(true) as Document).parse();
    if (!art) return { title: '', byline: '', lang: '', text: '', wordCount: 0, links: [], meta: {} };

    // Every <a href>; HTMLAnchorElement.href is already resolved against `url`.
    const links: Array<{ text: string; url: string }> = [];
    doc.querySelectorAll('a[href]').forEach((a) => {
      const href = (a as HTMLAnchorElement).href;
      const text = a.textContent?.trim() || '';
      if (href) links.push({ text, url: href });
    });

    // <meta name="..."> and <meta property="..."> (e.g. Open Graph) tags, keyed by name or property.
    const meta: Record<string, string> = {};
    doc.querySelectorAll('meta[name], meta[property]').forEach((m) => {
      const key = m.getAttribute('name') || m.getAttribute('property');
      const val = m.getAttribute('content');
      if (key && val) meta[key] = val;
    });

    const text = art.textContent || '';
    return {
      title: art.title || '',
      byline: art.byline || '',
      lang: (doc.documentElement.getAttribute('lang') || '').toLowerCase(),
      text,
      wordCount: text.split(/\s+/).filter(Boolean).length,
      links,
      meta,
    };
  }
  ```
- `src/server.ts:87` (schema): Zod schema defining the input parameters of the `web.read` tool: `url` (required string) and `html` (optional string).
  ```ts
  const webReadShape = { url: z.string(), html: z.string().optional() };
  ```
- `src/server.ts:88-93` (registration): registers the `web.read` tool with the MCP server, passing its name, description, input shape, the `OPEN` argument (defined outside this excerpt), and an inline async handler that calls `webRead` and returns the result as a JSON string in a single text content item.
  ```ts
  server.tool('web.read', 'Extract readable content from given HTML (or pass html from web.fetch).', webReadShape, OPEN, async ({ url, html }) => {
    const res = webRead({ url, html });
    return { content: [{ type: 'text', text: JSON.stringify(res) }] };
  });
  ```
- `src/server.ts:95-100` (registration): alias registration `web_read`, identical to `web.read` except for its name and description.
  ```ts
  server.tool('web_read', 'Alias of web.read', webReadShape, OPEN, async ({ url, html }) => {
    const res = webRead({ url, html });
    return { content: [{ type: 'text', text: JSON.stringify(res) }] };
  });
  ```
- `src/server.ts:10` (registration): import of the `webRead` handler from its module.
  ```ts
  import { webRead } from './tools/webRead.js';
  ```
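The handler always returns an object with the same seven fields, and its word count is a plain whitespace split of the extracted text. The sketch below restates that shape as a TypeScript interface so a client can type the decoded JSON. The names `WebReadResult`, `EMPTY_RESULT`, `countWords`, and `isWebReadResult` are hypothetical; the fields, the empty fallback, and the word-count expression are taken from the handler.

```ts
// Hypothetical type describing the object web.read serialises into its text content.
interface WebReadResult {
  title: string;
  byline: string;
  lang: string; // lower-cased <html lang> attribute, or '' when absent
  text: string; // Readability's textContent
  wordCount: number;
  links: Array<{ text: string; url: string }>;
  meta: Record<string, string>; // <meta> name or property mapped to its content
}

// Same value the handler returns when Readability's parse() yields null.
const EMPTY_RESULT: WebReadResult = {
  title: '', byline: '', lang: '', text: '', wordCount: 0, links: [], meta: {},
};

// Same expression as the handler: split on whitespace runs, drop empty strings.
function countWords(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

// Light runtime check for decoded JSON before treating it as a WebReadResult.
function isWebReadResult(value: unknown): value is WebReadResult {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.title === 'string' &&
    typeof v.byline === 'string' &&
    typeof v.lang === 'string' &&
    typeof v.text === 'string' &&
    typeof v.wordCount === 'number' &&
    Array.isArray(v.links) &&
    typeof v.meta === 'object' && v.meta !== null
  );
}

console.log(countWords('  Hello,\n  readable   world  ')); // 3
console.log(countWords(''));                                // 0
console.log(isWebReadResult(JSON.parse(JSON.stringify(EMPTY_RESULT)))); // true
```

Note that when Readability finds no article, the handler returns this empty object, so `links` and `meta` are empty even if the HTML contained anchors or meta tags.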