Glama

web.read

Extracts the readable text of a web page from its HTML, giving clean content suitable for analysis or reading.

Instructions

Extract readable content from given HTML (or pass html from web.fetch).

Input Schema

Name    Required    Description    Default
url     Yes
html    No
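An MCP client invokes this tool through the standard JSON-RPC 'tools/call' method, passing the two parameters above as 'arguments'. The sketch below builds such a request. The helper name buildWebReadCall, the request id, and the example URL and HTML are hypothetical. Because 'url' only sets the document's base URL and the tool does not download anything, the page HTML has to be supplied in 'html', for example from a prior web.fetch result.

```typescript
// Shape of an MCP JSON-RPC "tools/call" request.
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: { url: string; html?: string };
  };
}

// Hypothetical helper: build a web.read call. `html` would typically come
// from an earlier web.fetch result; without it, web.read parses an empty document.
function buildWebReadCall(id: number, url: string, html?: string): ToolsCallRequest {
  const args: { url: string; html?: string } = { url };
  if (html !== undefined) args.html = html; // omit the key entirely when absent
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "web.read", arguments: args },
  };
}

const req = buildWebReadCall(
  1,
  "https://example.com/post",
  "<html><body><article><h1>Hi</h1><p>Hello world</p></article></body></html>",
);
console.log(JSON.stringify(req, null, 2));
```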

Implementation Reference

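  • The core handler for the 'web.read' tool. It parses the supplied HTML with JSDOM and uses Mozilla's Readability to extract the main article's title, byline, and text. It then derives the page language (from the lang attribute on <html>), a word count, every hyperlink, and the <meta> tags. It does not download the page: url only serves as the base URL for resolving relative links, so when html is omitted there is no content to extract. Readability modifies the DOM it parses, so the version below hands it a clone and collects links and metadata from the untouched original.
    import { JSDOM } from 'jsdom';
    import { Readability } from '@mozilla/readability';

    export function webRead(args: { url: string; html?: string }) {
      const { url, html } = args;
      // `url` only sets the base URL for resolving relative links; nothing is fetched.
      const doc = new JSDOM(html || '', { url });
      const page = doc.window.document;

      // Readability mutates the DOM it parses, so give it a clone and keep
      // the original intact for link and metadata extraction below.
      const reader = new Readability(page.cloneNode(true) as Document);
      const art = reader.parse();
      if (!art) {
        return { title: '', byline: '', lang: '', text: '', wordCount: 0, links: [], meta: {} };
      }

      // Collect every anchor with an href; `.href` is resolved against `url`.
      const links: Array<{ text: string; url: string }> = [];
      page.querySelectorAll('a[href]').forEach((a) => {
        const href = (a as HTMLAnchorElement).href;
        const text = a.textContent?.trim() || '';
        if (href) links.push({ text, url: href });
      });

      // Map <meta name|property="..." content="..."> to key -> value (later tags win).
      const meta: Record<string, string> = {};
      page.querySelectorAll('meta[name], meta[property]').forEach((m) => {
        const key = m.getAttribute('name') || m.getAttribute('property');
        const val = m.getAttribute('content');
        if (key && val) meta[key] = val;
      });

      const text = art.textContent || '';
      return {
        title: art.title || '',
        byline: art.byline || '',
        lang: (page.documentElement.getAttribute('lang') || '').toLowerCase(),
        text,
        wordCount: text.split(/\s+/).filter(Boolean).length,
        links,
        meta,
      };
    }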
  • Zod raw shape defining the input parameters for the web.read tool: url (required string) and html (optional string).
    import { z } from 'zod';

    const webReadShape = {
      url: z.string(),
      html: z.string().optional(),
    };
  • src/server.ts:88-93 (registration)
    Registers the 'web.read' tool on the MCP server with its description, input schema, a tool-annotations argument (OPEN, defined elsewhere in server.ts and not shown here), and an inline async handler that calls webRead and returns the result as JSON text.
    server.tool(
      'web.read',
      'Extract readable content from given HTML (or pass html from web.fetch).',
      webReadShape,
      OPEN, // tool annotations, defined elsewhere in server.ts
      async ({ url, html }) => {
        const res = webRead({ url, html });
        return { content: [{ type: 'text', text: JSON.stringify(res) }] };
      }
    );
  • src/server.ts:95-100 (registration)
    Registers 'web_read' as an alias that uses the same schema, annotations, and handler as 'web.read'.
    server.tool(
      'web_read',
      'Alias of web.read',
      webReadShape,
      OPEN,
      async ({ url, html }) => {
        const res = webRead({ url, html });
        return { content: [{ type: 'text', text: JSON.stringify(res) }] };
      }
    );
  • src/server.ts:10-10 (registration)
    Import of the webRead handler function from its module.
    import { webRead } from './tools/webRead.js';
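The webRead handler listed above always returns the same object shape. The following sketch spells that shape out as a TypeScript interface and isolates the word-count rule, which splits the extracted text on runs of whitespace and discards empty pieces. The names WebReadResult, countWords, and EMPTY_RESULT are hypothetical and are inferred from the handler's return statements.

```typescript
// Hypothetical type name; fields mirror the object returned by webRead.
interface WebReadResult {
  title: string;
  byline: string;
  lang: string;                 // lower-cased <html lang="..."> value, or ""
  text: string;                 // Readability's textContent
  wordCount: number;
  links: Array<{ text: string; url: string }>;
  meta: Record<string, string>; // <meta name|property> -> content
}

// Same rule as webRead: split on whitespace runs, then drop empty strings
// (leading or trailing whitespace would otherwise produce "" entries).
function countWords(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

// The fallback returned when Readability finds no article.
const EMPTY_RESULT: WebReadResult = {
  title: "", byline: "", lang: "", text: "", wordCount: 0, links: [], meta: {},
};

console.log(countWords("  Readable\ttext,\nextracted  ")); // 3
```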
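In plain terms, the Zod shape listed above accepts any arguments object whose url is a string and whose html, if present, is also a string. Note that z.string() does not check that url is a well-formed URL. Below is a dependency-free sketch of the same check. The function name parseWebReadArgs is hypothetical; the real server relies on Zod, with the MCP SDK doing the validation.

```typescript
type WebReadArgs = { url: string; html?: string };

// Hypothetical plain-TypeScript mirror of
// { url: z.string(), html: z.string().optional() }.
// Returns the parsed args, or null when validation fails. Unknown extra keys
// are dropped, as Zod object parsing does by default.
function parseWebReadArgs(input: unknown): WebReadArgs | null {
  if (typeof input !== "object" || input === null) return null;
  const { url, html } = input as Record<string, unknown>;
  if (typeof url !== "string") return null; // url: z.string() (any string, not URL-checked)
  if (html === undefined) return { url };   // html: optional
  if (typeof html !== "string") return null;
  return { url, html };
}

console.log(parseWebReadArgs({ url: "https://example.com" }));           // { url: 'https://example.com' }
console.log(parseWebReadArgs({ url: "https://example.com", html: 42 })); // null
```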
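Both registrations listed above wrap the handler's object in a single text content item holding JSON, so a client has to parse it back. The following is a minimal client-side sketch. It assumes the response follows the { content: [{ type: 'text', text }] } shape used in those registrations. The name decodeWebRead and the sample payload are hypothetical.

```typescript
// Minimal subset of an MCP tool result: a list of typed content items.
interface TextContent { type: "text"; text: string }
interface ToolResult { content: Array<TextContent | { type: string }> }

// Hypothetical helper: find the first text item and parse its JSON payload.
function decodeWebRead(result: ToolResult): Record<string, unknown> {
  const item = result.content.find((c): c is TextContent => c.type === "text");
  if (!item) throw new Error("web.read returned no text content");
  return JSON.parse(item.text);
}

const sample: ToolResult = {
  content: [{ type: "text", text: JSON.stringify({ title: "Hi", wordCount: 2, links: [] }) }],
};
const decoded = decodeWebRead(sample);
console.log(decoded.title, decoded.wordCount); // Hi 2
```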
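The source does not say why the 'web_read' alias exists. One plausible reason, offered here as an assumption, is client compatibility: several LLM tool-calling APIs limit tool names to letters, digits, underscores, and hyphens (up to 64 characters), and that rule rejects the dot in 'web.read'. The sketch below checks a name against that pattern and derives an underscore alias. Both helper names and the exact pattern are illustrative assumptions.

```typescript
// Assumed portable pattern: letters, digits, "_" and "-", 1 to 64 characters.
const PORTABLE_TOOL_NAME = /^[a-zA-Z0-9_-]{1,64}$/;

function isPortableToolName(name: string): boolean {
  return PORTABLE_TOOL_NAME.test(name);
}

// Hypothetical alias rule: replace every character outside the portable set
// with "_" and truncate to 64 characters.
function portableAlias(name: string): string {
  return name.replace(/[^a-zA-Z0-9_-]/g, "_").slice(0, 64);
}

for (const name of ["web.read", portableAlias("web.read")]) {
  console.log(name, isPortableToolName(name));
}
// web.read false
// web_read true
```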


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/khanhs-234/tool4lm'
