web_read
Extract and process content from web pages or HTML strings to retrieve information for analysis and research.
Instructions
Alias of web.read
Input Schema
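| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Document URL passed to JSDOM; relative links are resolved against it. | |
| html | No | HTML string to parse, for example the output of web.fetch. When omitted, an empty document is parsed. | |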
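
The input contract is small enough to check by hand. The following is a minimal sketch of that contract in plain TypeScript, without Zod; `WebReadArgs`, `isWebReadArgs`, and `exampleArgs` are hypothetical names introduced here for illustration and are not part of the server's code.

```typescript
// Hypothetical mirror of the input schema: url is a required string, html an optional string.
interface WebReadArgs {
  url: string;
  html?: string;
}

// Type guard that checks an unknown value against that shape.
// Extra properties are tolerated; only url and html are inspected.
function isWebReadArgs(value: unknown): value is WebReadArgs {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (typeof v.url !== "string") return false;
  return v.html === undefined || typeof v.html === "string";
}

// Illustrative arguments: the HTML is supplied by the caller, the URL gives it a base address.
const exampleArgs: WebReadArgs = {
  url: "https://example.com/post",
  html: '<html lang="en"><body><article><p>Hello world</p></article></body></html>',
};
```

Passing `html` alongside `url` matters in practice, because the handler parses whatever HTML it is given and does not download the page itself.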
Implementation Reference
- src/tools/webRead.ts:4-30 (handler) Core handler function for the webRead tool: parses HTML with JSDOM and Readability to extract readable article content, metadata, links, language, and word count.

      export function webRead(args: { url: string, html?: string }) {
        const { url, html } = args;
        const doc = new JSDOM(html || '', { url });
        const reader = new Readability(doc.window.document);
        const art = reader.parse();
        if (!art) return { title: '', byline: '', lang: '', text: '', wordCount: 0, links: [], meta: {} };
        const links: Array<{text: string, url: string}> = [];
        const anchorEls = doc.window.document.querySelectorAll('a[href]');
        anchorEls.forEach(a => {
          const href = (a as HTMLAnchorElement).href;
          const text = (a as HTMLElement).textContent?.trim() || '';
          if (href) links.push({ text, url: href });
        });
        const meta: Record<string,string> = {};
        const metas = doc.window.document.querySelectorAll('meta[name], meta[property]');
        metas.forEach((m:any) => {
          const key = m.getAttribute('name') || m.getAttribute('property');
          const val = m.getAttribute('content');
          if (key && val) meta[key] = val;
        });
        return {
          title: art.title || '',
          byline: art.byline || '',
          lang: (doc.window.document.documentElement.getAttribute('lang') || '').toLowerCase(),
          text: art.textContent || '',
          wordCount: (art.textContent || '').split(/\s+/).filter(Boolean).length,
          links,
          meta
        };
      }

- src/server.ts:87-87 (schema) Zod schema defining the input parameters for the web_read tool: url (required string) and optional html.

      const webReadShape = { url: z.string(), html: z.string().optional() };

- src/server.ts:95-101 (registration) Registration of the 'web_read' tool name, using the webReadShape schema and the webRead handler, returning the JSON-stringified result.

      server.tool('web_read', 'Alias of web.read', webReadShape, OPEN,
        async ({ url, html }) => {
          const res = webRead({ url, html });
          return { content: [{ type: 'text', text: JSON.stringify(res) }] };
        }
      );

- src/server.ts:88-94 (registration) Primary registration of the related 'web.read' tool, sharing the same schema and handler as the 'web_read' alias.

      server.tool('web.read', 'Extract readable content from given HTML (or pass html from web.fetch).', webReadShape, OPEN,
        async ({ url, html }) => {
          const res = webRead({ url, html });
          return { content: [{ type: 'text', text: JSON.stringify(res) }] };
        }
      );
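
Both registrations return the handler's result as a JSON string inside a single text content item, so a caller has to parse that string to get the structured fields back. The following is a minimal sketch of that decoding step, not the server's own client code. `WebReadResult`, `ToolTextResponse`, `decodeWebReadResult`, and `countWords` are hypothetical names chosen here, the result shape is inferred from the handler's return statement rather than from an exported type, and the sample response is hand-built for illustration.

```typescript
// Shape inferred from the object returned by webRead (an assumption, not a published type).
interface WebReadResult {
  title: string;
  byline: string;
  lang: string;
  text: string;
  wordCount: number;
  links: Array<{ text: string; url: string }>;
  meta: Record<string, string>;
}

// Hypothetical minimal view of the tool response: a list of content items.
interface ToolTextResponse {
  content: Array<{ type: string; text?: string }>;
}

// Returns the parsed payload of the first text item, or throws if there is none.
function decodeWebReadResult(response: ToolTextResponse): WebReadResult {
  for (const item of response.content) {
    if (item.type === "text" && typeof item.text === "string") {
      return JSON.parse(item.text) as WebReadResult;
    }
  }
  throw new Error("web_read response contained no text content");
}

// Same whitespace-based counting expression the handler uses for wordCount.
function countWords(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

// Hand-built response for illustration only; not output captured from a real server.
const sampleResponse: ToolTextResponse = {
  content: [
    {
      type: "text",
      text: JSON.stringify({
        title: "Hello",
        byline: "",
        lang: "en",
        text: "Hello  brave new\nworld",
        wordCount: 4,
        links: [{ text: "Home", url: "https://example.com/" }],
        meta: { description: "A greeting" },
      }),
    },
  ],
};

const decoded = decodeWebReadResult(sampleResponse);
```

Because the word count splits on runs of whitespace and drops empty pieces, repeated spaces and newlines in `text` do not inflate `wordCount`.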