extract_readable

Extracts clean, readable text from web pages by removing ads, navigation, and other clutter to deliver focused content for analysis or reading.

Input Schema

Name	Required	Description	Default
`url`	Yes	URL of the web page to extract; must be a valid URL string.	(none)
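
As an illustration of the input shape, the sketch below builds a request for this tool. It assumes the server is reached through the standard MCP `tools/call` method over JSON-RPC 2.0; the `id` value and the target URL are placeholders, not values from this server.

```javascript
// Build a JSON-RPC 2.0 "tools/call" request for the extract_readable tool.
// The id and the target URL are illustrative placeholders.
const request = {
    jsonrpc: "2.0",
    id: 1,
    method: "tools/call",
    params: {
        name: "extract_readable",
        // The only argument the input schema requires.
        arguments: { url: "https://example.com/some-article" },
    },
};

// Serialize the request as it would be sent over the transport.
console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client library would normally construct this message for you; the raw form is shown only to make the single `url` argument explicit.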

Implementation Reference

src/server.js:138-157 (handler)

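The handler function for the 'extract_readable' tool. It fetches the HTML from the given URL, parses it with JSDOM, and applies Mozilla's Readability library to extract the main article content. The title (as a Markdown heading), byline, excerpt, and plain-text body are joined with blank lines and returned as a single text block. If Readability finds no article, the handler returns "No readable content found." The response status is not checked, so an error page is parsed like any other HTML.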

```
async (input) => {
    const res = await fetch(input.url, {
        headers: {
            "User-Agent": "Mozilla/5.0 (compatible; MCP-Web-Tools/0.1; +https://example.com)",
        },
    });
    const html = await res.text();
    const dom = new JSDOM(html, { url: input.url });
    const reader = new Readability(dom.window.document);
    const article = reader.parse();
    if (!article) {
        return { content: [{ type: "text", text: "No readable content found." }] };
    }
    const textBlocks = [];
    if (article.title) textBlocks.push(`# ${article.title}`);
    if (article.byline) textBlocks.push(`by ${article.byline}`);
    if (article.excerpt) textBlocks.push(article.excerpt);
    if (article.textContent) textBlocks.push(article.textContent);
    return { content: [{ type: "text", text: textBlocks.join("\n\n") }] };
}
```
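
The text assembly at the end of the handler can be read in isolation. The sketch below pulls it into a standalone function so the output shape is easy to see; `formatArticle` is a hypothetical name that does not appear in `src/server.js`, and the sample object only imitates the four fields the handler reads from the parsed article.

```javascript
// Hypothetical helper (not in src/server.js): mirrors how the handler
// joins the fields it reads from the parsed article into one text block.
function formatArticle(article) {
    const textBlocks = [];
    if (article.title) textBlocks.push(`# ${article.title}`);
    if (article.byline) textBlocks.push(`by ${article.byline}`);
    if (article.excerpt) textBlocks.push(article.excerpt);
    if (article.textContent) textBlocks.push(article.textContent);
    // Blocks are separated by a blank line; empty or missing fields are skipped.
    return textBlocks.join("\n\n");
}

// Illustrative input; a real object would come from Readability's parse().
const sample = {
    title: "Example Title",
    byline: "Jane Doe",
    excerpt: "",
    textContent: "First paragraph of the body.",
};

const formatted = formatArticle(sample);
console.log(formatted);
```

Because the excerpt in the sample is an empty string, it is skipped, and the result contains three blocks: the heading, the byline, and the body text.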

src/server.js:137-137 (schema)

The input schema for the 'extract_readable' tool, validating that the input contains a valid URL string.
```
{ url: z.string().url() },
```
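
The schema checks that `url` is a valid URL string; whether it also restricts the scheme is not stated here. A caller that wants to reject non-web URLs before invoking the tool could add a check like the one sketched below. This is a stand-in built on the `URL` class available in Node.js and browsers, not the zod validator itself, and `isHttpUrl` is a hypothetical helper name.

```javascript
// Hypothetical pre-check (not part of the server): accept only absolute
// http(s) URLs, using the built-in URL class.
function isHttpUrl(value) {
    if (typeof value !== "string") return false;
    try {
        const parsed = new URL(value);
        return parsed.protocol === "http:" || parsed.protocol === "https:";
    } catch {
        // new URL() throws a TypeError on strings it cannot parse.
        return false;
    }
}

console.log(isHttpUrl("https://example.com/post"));
console.log(isHttpUrl("ftp://example.com/file"));
console.log(isHttpUrl("not a url"));
```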

src/server.js:135-158 (registration)

The registration of the 'extract_readable' tool on the McpServer instance, specifying name, input schema, and handler function.

```
server.tool(
    "extract_readable",
    { url: z.string().url() },
    async (input) => {
        const res = await fetch(input.url, {
            headers: {
                "User-Agent": "Mozilla/5.0 (compatible; MCP-Web-Tools/0.1; +https://example.com)",
            },
        });
        const html = await res.text();
        const dom = new JSDOM(html, { url: input.url });
        const reader = new Readability(dom.window.document);
        const article = reader.parse();
        if (!article) {
            return { content: [{ type: "text", text: "No readable content found." }] };
        }
        const textBlocks = [];
        if (article.title) textBlocks.push(`# ${article.title}`);
        if (article.byline) textBlocks.push(`by ${article.byline}`);
        if (article.excerpt) textBlocks.push(article.excerpt);
        if (article.textContent) textBlocks.push(article.textContent);
        return { content: [{ type: "text", text: textBlocks.join("\n\n") }] };
    }
);
```

MCP Web Tools Server