extract_readable
Extracts clean, readable text from web pages by removing ads, navigation, and other clutter to deliver focused content for analysis or reading.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL of the web page to extract readable content from; must be a valid URL string | |
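The table lists `url` as the only input, and the implementation validates it with Zod's `z.string().url()`. Below is a minimal plain-JavaScript sketch of a comparable check. It assumes that check behaves roughly like the WHATWG `URL` parser. `isParsableUrl` and `isHttpUrl` are hypothetical helper names and are not part of this server. The stricter variant shows that a parse-only check also admits non-web schemes such as `mailto:`.

```javascript
// Assumption: a Zod `.url()` check accepts roughly what the WHATWG URL
// parser accepts. This helper approximates that; it is not Zod itself.
function isParsableUrl(value) {
  if (typeof value !== "string") return false;
  try {
    new URL(value); // throws a TypeError for strings that are not absolute URLs
    return true;
  } catch {
    return false;
  }
}

// Hypothetical stricter variant: additionally require an HTTP(S) scheme,
// since the tool is meant to fetch web pages.
function isHttpUrl(value) {
  if (!isParsableUrl(value)) return false;
  const { protocol } = new URL(value);
  return protocol === "http:" || protocol === "https:";
}

console.log(isParsableUrl("https://example.com/post")); // true
console.log(isParsableUrl("not a url")); // false
console.log(isParsableUrl("mailto:someone@example.com")); // true
console.log(isHttpUrl("mailto:someone@example.com")); // false
```

A scheme allow-list like `isHttpUrl` is one way to reject inputs that parse as URLs but do not point at a web page before any request is made.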
Implementation Reference
- src/server.js:138-157 (handler): The handler function for the `extract_readable` tool. It fetches the HTML at the given URL, parses it with JSDOM, and applies Mozilla's Readability library to extract the main article content (title, byline, excerpt, and plain text). It returns these fields as one text block separated by blank lines, with the title rendered as a Markdown heading, or the message "No readable content found." when Readability cannot identify an article.

  ```javascript
  async (input) => {
    const res = await fetch(input.url, {
      headers: {
        "User-Agent": "Mozilla/5.0 (compatible; MCP-Web-Tools/0.1; +https://example.com)",
      },
    });
    const html = await res.text();
    // Setting `url` gives the document a base URL, so relative links resolve against the page.
    const dom = new JSDOM(html, { url: input.url });
    const reader = new Readability(dom.window.document);
    // Readability returns null when it cannot find article content.
    const article = reader.parse();
    if (!article) {
      return { content: [{ type: "text", text: "No readable content found." }] };
    }
    // Collect only the fields that are present, then join them with blank lines.
    const textBlocks = [];
    if (article.title) textBlocks.push(`# ${article.title}`);
    if (article.byline) textBlocks.push(`by ${article.byline}`);
    if (article.excerpt) textBlocks.push(article.excerpt);
    if (article.textContent) textBlocks.push(article.textContent);
    return { content: [{ type: "text", text: textBlocks.join("\n\n") }] };
  }
  ```
- src/server.js:137 (schema): The input schema for the `extract_readable` tool, which requires `url` to be a string that is a valid URL.

  ```javascript
  { url: z.string().url() },
  ```
- src/server.js:135-158 (registration): Registers the `extract_readable` tool on the `McpServer` instance, specifying its name, input schema, and handler function.

  ```javascript
  server.tool(
    "extract_readable",
    { url: z.string().url() },
    async (input) => {
      // Handler body as listed in the handler entry (src/server.js:138-157).
    }
  );
  ```
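The last part of the handler turns Readability's result into the tool response. Isolating that step as a pure function makes the output format easy to inspect and to test without network access. The sketch below copies the handler's assembly logic into a hypothetical `formatArticle` helper; the input object is a hand-built stand-in for what `Readability#parse()` returns, not real extraction output.

```javascript
// Hypothetical helper mirroring the handler's final step.
// `article` stands in for the object returned by Readability#parse(),
// or null when no article content was found.
function formatArticle(article) {
  if (!article) {
    return { content: [{ type: "text", text: "No readable content found." }] };
  }
  const textBlocks = [];
  if (article.title) textBlocks.push(`# ${article.title}`); // Markdown H1
  if (article.byline) textBlocks.push(`by ${article.byline}`);
  if (article.excerpt) textBlocks.push(article.excerpt);
  if (article.textContent) textBlocks.push(article.textContent);
  // Missing or empty fields are skipped; present ones are separated by a blank line.
  return { content: [{ type: "text", text: textBlocks.join("\n\n") }] };
}

// Usage with a hand-built stand-in article (no excerpt).
const result = formatArticle({
  title: "Example Title",
  byline: "Jane Doe",
  excerpt: null,
  textContent: "Body text.",
});
console.log(result.content[0].text);
// # Example Title
//
// by Jane Doe
//
// Body text.
```

Because empty fields are skipped rather than emitted as blank sections, the response never contains an empty heading or a bare `by ` line. Readability's `excerpt` is often taken from the page's description metadata or its first paragraph, so it may repeat the opening of `textContent`.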
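Once registered, the tool is invoked by an MCP client through a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool name and its arguments; the handler's return value becomes the `result` of the response. Here is a minimal sketch of both sides of that exchange, assuming a client that builds messages by hand (real clients usually go through an SDK). `buildToolCall` and `readTextResult` are hypothetical names, and the `id` and URL are placeholders.

```javascript
// Build a JSON-RPC 2.0 request asking the server to run extract_readable.
function buildToolCall(id, url) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "extract_readable",
      arguments: { url },
    },
  };
}

// Extract the text from a result shaped like the handler's return value:
// { content: [{ type: "text", text: "..." }] }. Non-text items are ignored.
function readTextResult(result) {
  return result.content
    .filter((item) => item.type === "text")
    .map((item) => item.text)
    .join("\n");
}

const request = buildToolCall(1, "https://example.com/article");
console.log(JSON.stringify(request));

const sampleResult = { content: [{ type: "text", text: "No readable content found." }] };
console.log(readTextResult(sampleResult)); // No readable content found.
```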