web.read

Extract the main readable text of a web page from its HTML, together with its title, byline, language, word count, links, and meta tags.

Instructions

Extract readable content from given HTML (or pass html from web.fetch).

Input Schema


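Name	Required	Description	Default
`url`	Yes	Page URL (string). JSDOM uses it as the document URL, so relative links in the HTML resolve against it.
`html`	No	HTML to parse (string), for example the output of web.fetch. If omitted, an empty string is parsed.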
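The handler passes `url` to JSDOM as the document URL, so each anchor's `href` is read back already resolved against it. This is why the value of `url` matters even when `html` is supplied. The sketch below approximates that resolution with the standard WHATWG `URL` class (global in Node.js and browsers); `resolveHref` is a hypothetical helper for illustration, not part of the tool, and returning `null` for unparseable values is an assumption of this sketch.

```typescript
// Hypothetical helper: resolve an anchor's href against the page URL,
// approximating what a DOM anchor's `.href` property returns.
function resolveHref(href: string, pageUrl: string): string | null {
  try {
    return new URL(href, pageUrl).href;
  } catch {
    // Unparseable hrefs yield null here (an assumption of this sketch).
    return null;
  }
}

console.log(resolveHref("/about", "https://example.com/blog/post"));     // https://example.com/about
console.log(resolveHref("next", "https://example.com/blog/post"));       // https://example.com/blog/next
console.log(resolveHref("https://other.org/x", "https://example.com/")); // https://other.org/x
```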

Implementation Reference

src/tools/webRead.ts:4-30 (handler)
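The core handler for the 'web.read' tool. It parses the HTML with JSDOM and uses Mozilla's Readability to extract the main article text, title, and byline. It also returns the document language (from the <html lang> attribute), a word count of the article text, and the links (<a href>) and <meta> tags found in the document.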
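import { JSDOM } from 'jsdom';
import { Readability } from '@mozilla/readability';

export function webRead(args: { url: string, html?: string }) {
  const { url, html } = args;
  // Parse the HTML; `url` becomes the document URL, so relative links resolve against it.
  const doc = new JSDOM(html || '', { url });
  const page = doc.window.document;
  // Readability mutates the document it is given, so run it on a clone and keep
  // the original intact for the link and <meta> extraction below.
  const reader = new Readability(page.cloneNode(true) as Document);
  const art = reader.parse();
  // No article found: return an empty result (links and meta are not collected).
  if (!art) return { title: '', byline: '', lang: '', text: '', wordCount: 0, links: [], meta: {} };

  // Every <a href> in the page, with the href resolved against the document URL.
  const links: Array<{ text: string, url: string }> = [];
  page.querySelectorAll('a[href]').forEach(a => {
    const href = (a as HTMLAnchorElement).href;
    const text = a.textContent?.trim() || '';
    if (href) links.push({ text, url: href });
  });

  // <meta name=...> and <meta property=...> tags; `name` takes precedence over `property`.
  const meta: Record<string, string> = {};
  page.querySelectorAll('meta[name], meta[property]').forEach(m => {
    const key = m.getAttribute('name') || m.getAttribute('property');
    const val = m.getAttribute('content');
    if (key && val) meta[key] = val;
  });

  const articleText = art.textContent || '';
  return {
    title: art.title || '',
    byline: art.byline || '',
    // Language comes from <html lang>, lower-cased, not from Readability.
    lang: (page.documentElement.getAttribute('lang') || '').toLowerCase(),
    text: articleText,
    wordCount: articleText.split(/\s+/).filter(Boolean).length,
    links,
    meta,
  };
}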
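The word-count and metadata rules in the handler do not depend on the DOM, so they can be shown in isolation. This sketch re-implements them over plain values; `MetaAttrs`, `countWords`, and `collectMeta` are hypothetical names, and each `MetaAttrs` object stands in for the attributes the handler reads with `getAttribute`.

```typescript
// Attributes of one <meta> tag as plain values (hypothetical type for this sketch).
type MetaAttrs = { name?: string; property?: string; content?: string };

// Same rule as the handler: split on runs of whitespace and drop empty strings,
// so leading or trailing whitespace does not produce phantom words.
function countWords(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

// Same rule as the handler: `name` wins over `property`, tags without a key or
// content are skipped, and a later tag with the same key overwrites an earlier one.
function collectMeta(tags: MetaAttrs[]): Record<string, string> {
  const meta: Record<string, string> = {};
  for (const t of tags) {
    const key = t.name || t.property;
    const val = t.content;
    if (key && val) meta[key] = val;
  }
  return meta;
}

console.log(countWords("  Hello,\n\tworld!  again ")); // 3
console.log(collectMeta([
  { name: "description", content: "A page" },
  { property: "og:title", content: "Title" },
  { name: "robots" }, // no content: skipped
]));
// { description: 'A page', 'og:title': 'Title' }
```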
src/server.ts:87-87 (schema)
Zod schema defining the input parameters for the web.read tool: url (required string) and html (optional string).
const webReadShape = { url: z.string(), html: z.string().optional() };
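Without Zod, the same input contract can be written as a TypeScript type plus a small runtime guard. This is an illustrative sketch only: `WebReadArgs` and `isWebReadArgs` are hypothetical names, and the server itself validates input with the Zod shape above.

```typescript
// Hypothetical type mirroring webReadShape: url is required, html is optional.
interface WebReadArgs {
  url: string;
  html?: string;
}

// Runtime guard applying the same field rules as
// { url: z.string(), html: z.string().optional() }.
function isWebReadArgs(value: unknown): value is WebReadArgs {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (typeof v.url !== "string") return false;
  return v.html === undefined || typeof v.html === "string";
}

console.log(isWebReadArgs({ url: "https://example.com" }));           // true
console.log(isWebReadArgs({ url: "https://example.com", html: 42 })); // false
console.log(isWebReadArgs({ html: "<p>hi</p>" }));                    // false
```

As in the Zod shape, `url` only has to be a string; neither version checks that it is a well-formed URL.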
src/server.ts:88-93 (registration)
MCP server registration for the 'web.read' tool. It passes the tool description, the input schema, the OPEN value (not shown here), and an inline async handler that calls webRead and returns the result as a single JSON-encoded text item.
server.tool(
  'web.read',
  'Extract readable content from given HTML (or pass html from web.fetch).',
  webReadShape,
  OPEN,
  async ({ url, html }) => {
    const res = webRead({ url, html });
    // Return the result as one text content item containing JSON.
    return { content: [{ type: 'text', text: JSON.stringify(res) }] };
  }
);
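The registered handler returns the webRead result as a single text content item holding JSON, so a client has to parse that text to get the fields back. A minimal sketch of that step, assuming the client sees the tool result as a plain object with a `content` array; `WebReadResult`, `ToolResult`, and `parseWebReadResult` are hypothetical names, and `ToolResult` models only the fields used here rather than the full MCP SDK type.

```typescript
// Shape of the object webRead returns (mirrors the handler's return statement).
interface WebReadResult {
  title: string;
  byline: string;
  lang: string;
  text: string;
  wordCount: number;
  links: Array<{ text: string; url: string }>;
  meta: Record<string, string>;
}

// Minimal view of a tool result: only `type` and an optional `text` are modeled
// (a simplifying assumption for this sketch).
interface ToolResult {
  content: Array<{ type: string; text?: string }>;
}

// Hypothetical client helper: take the first text item and parse it as JSON.
function parseWebReadResult(result: ToolResult): WebReadResult {
  const item = result.content.find((c) => c.type === "text" && typeof c.text === "string");
  if (!item || item.text === undefined) {
    throw new Error("tool result has no text content");
  }
  return JSON.parse(item.text) as WebReadResult;
}

// Example: a result in the format the registered handler produces.
const sample: ToolResult = {
  content: [{
    type: "text",
    text: JSON.stringify({
      title: "Hello", byline: "", lang: "en", text: "Hello world",
      wordCount: 2, links: [], meta: {},
    }),
  }],
};
const parsed = parseWebReadResult(sample);
console.log(parsed.title, parsed.wordCount); // Hello 2
```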
src/server.ts:95-100 (registration)
Registers 'web_read' as an alias of 'web.read', with the same schema and handler; only the description string differs.
server.tool(
  'web_read',
  'Alias of web.read',
  webReadShape,
  OPEN,
  async ({ url, html }) => {
    const res = webRead({ url, html });
    return { content: [{ type: 'text', text: JSON.stringify(res) }] };
  }
);
src/server.ts:10-10 (registration)
Import of the webRead handler function from its module.
import { webRead } from './tools/webRead.js';

TOOL4LM
