extract_readable
Extracts clean, readable text from a web page given its URL, stripping navigation elements and ads so that only the main article text remains.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL of the web page to extract; must be a valid URL string. | |
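
To make the input requirement concrete, here is a minimal sketch of a pre-flight check a caller might run before invoking the tool. It only approximates the "valid URL string" rule using the built-in `URL` constructor; it is not the server's Zod validation, and `isValidUrlInput` is a hypothetical helper name that does not appear in the source.

```javascript
// Hypothetical helper (not part of src/server.js): approximates the
// "url must be a valid URL string" rule with the built-in URL constructor.
function isValidUrlInput(input) {
  if (typeof input !== "object" || input === null) return false;
  if (typeof input.url !== "string") return false;
  try {
    new URL(input.url); // throws a TypeError if the string cannot be parsed
    return true;
  } catch {
    return false;
  }
}

console.log(isValidUrlInput({ url: "https://example.com/article" })); // true
console.log(isValidUrlInput({ url: "not a url" })); // false
console.log(isValidUrlInput({})); // false
```

The server-side schema remains the authority; a check like this only helps a client fail early with a clearer message.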
Implementation Reference
- src/server.js:138-157 (handler) The main handler function for the 'extract_readable' tool. Fetches the HTML from the provided URL, uses JSDOM to parse it, applies Mozilla's Readability library to extract the main article content (title, byline, excerpt, text), and formats it into a markdown-like text response.

      async (input) => {
        const res = await fetch(input.url, {
          headers: {
            "User-Agent": "Mozilla/5.0 (compatible; MCP-Web-Tools/0.1; +https://example.com)",
          },
        });
        const html = await res.text();
        const dom = new JSDOM(html, { url: input.url });
        const reader = new Readability(dom.window.document);
        const article = reader.parse();
        if (!article) {
          return { content: [{ type: "text", text: "No readable content found." }] };
        }
        const textBlocks = [];
        if (article.title) textBlocks.push(`# ${article.title}`);
        if (article.byline) textBlocks.push(`by ${article.byline}`);
        if (article.excerpt) textBlocks.push(article.excerpt);
        if (article.textContent) textBlocks.push(article.textContent);
        return { content: [{ type: "text", text: textBlocks.join("\n\n") }] };
      }

- src/server.js:137-137 (schema) Input schema validation using Zod: requires a single 'url' parameter that must be a valid URL string.

      { url: z.string().url() },

- src/server.js:135-158 (registration) Registration of the 'extract_readable' tool on the MCP server using server.tool(), including schema and handler.

      server.tool(
        "extract_readable",
        { url: z.string().url() },
        async (input) => {
          const res = await fetch(input.url, {
            headers: {
              "User-Agent": "Mozilla/5.0 (compatible; MCP-Web-Tools/0.1; +https://example.com)",
            },
          });
          const html = await res.text();
          const dom = new JSDOM(html, { url: input.url });
          const reader = new Readability(dom.window.document);
          const article = reader.parse();
          if (!article) {
            return { content: [{ type: "text", text: "No readable content found." }] };
          }
          const textBlocks = [];
          if (article.title) textBlocks.push(`# ${article.title}`);
          if (article.byline) textBlocks.push(`by ${article.byline}`);
          if (article.excerpt) textBlocks.push(article.excerpt);
          if (article.textContent) textBlocks.push(article.textContent);
          return { content: [{ type: "text", text: textBlocks.join("\n\n") }] };
        }
      );
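
The formatting step at the end of the handler can be looked at in isolation to see what the tool returns for a given Readability result. The sketch below copies that logic into a standalone function; `formatArticle` is a hypothetical name that does not exist in src/server.js, and the sample article object is made up for illustration.

```javascript
// Hypothetical helper mirroring the handler's formatting logic.
// `article` stands in for the object returned by Readability's parse(),
// which is null when no readable content is found.
function formatArticle(article) {
  if (!article) {
    return { content: [{ type: "text", text: "No readable content found." }] };
  }
  const textBlocks = [];
  if (article.title) textBlocks.push(`# ${article.title}`);
  if (article.byline) textBlocks.push(`by ${article.byline}`);
  if (article.excerpt) textBlocks.push(article.excerpt);
  if (article.textContent) textBlocks.push(article.textContent);
  // Blocks are separated by a blank line, giving a markdown-like layout.
  return { content: [{ type: "text", text: textBlocks.join("\n\n") }] };
}

// Made-up sample input; the empty excerpt is skipped by the truthiness check.
const sample = { title: "Hello", byline: "Ada", excerpt: "", textContent: "Body text." };
console.log(formatArticle(sample).content[0].text);
```

Because each field is guarded by a truthiness check, missing or empty fields are simply omitted from the output rather than producing blank sections.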