
Mozilla Readability Parser MCP Server

by emzimmer

parse

Extract webpage content into clean, LLM-optimized Markdown by removing ads, navigation, and non-essential elements. Retrieve article title, main content, excerpt, byline, and site name using Mozilla's Readability algorithm.

Instructions

Extracts and transforms webpage content into clean, LLM-optimized Markdown. Returns article title, main content, excerpt, byline and site name. Uses Mozilla's Readability algorithm to remove ads, navigation, footers and non-essential elements while preserving the core content structure.

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| url | Yes | The website URL to parse | |
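
The schema only says that `url` is a required string, and the call handler in the Implementation Reference only checks that it is present. As a minimal sketch of a stricter client-side check, the helper below uses the built-in WHATWG `URL` class to reject values that are not absolute http(s) URLs. The name `validateParseArgs` and the http/https-only rule are assumptions made for illustration; they are not part of the server.

```javascript
// Hypothetical helper (not part of the server): checks arguments for the
// 'parse' tool against the schema, plus an assumed http(s)-only rule.
function validateParseArgs(args) {
  if (typeof args?.url !== "string" || args.url.length === 0) {
    return { ok: false, reason: "url is required and must be a string" };
  }

  let parsed;
  try {
    // The URL constructor throws a TypeError on relative or malformed input.
    parsed = new URL(args.url);
  } catch {
    return { ok: false, reason: "url is not a valid absolute URL" };
  }

  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { ok: false, reason: `unsupported protocol: ${parsed.protocol}` };
  }

  return { ok: true, url: parsed.href };
}

console.log(validateParseArgs({ url: "https://example.com/article" }));
console.log(validateParseArgs({ url: "not a url" }));
console.log(validateParseArgs({}));
```

Returning a result object rather than throwing keeps the check easy to reuse before the request is ever sent.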

Implementation Reference

  • Core handler function that fetches the webpage, parses it using Readability, extracts main content, converts to Markdown, and returns structured article data.
    async fetchAndParse(url) {
      try {
        // Fetch the webpage
        const response = await axios.get(url, {
          headers: {
            'User-Agent': 'Mozilla/5.0 (compatible; MCPBot/1.0)'
          }
        });

        // Create a DOM from the HTML
        const dom = new JSDOM(response.data, { url });
        const document = dom.window.document;

        // Use Readability to extract main content
        const reader = new Readability(document);
        const article = reader.parse();

        if (!article) {
          throw new Error('Failed to parse content');
        }

        // Convert HTML to Markdown
        const markdown = turndownService.turndown(article.content);

        return {
          title: article.title,
          content: markdown,
          excerpt: article.excerpt,
          byline: article.byline,
          siteName: article.siteName
        };
      } catch (error) {
        throw new Error(`Failed to fetch or parse content: ${error.message}`);
      }
    }
  • MCP tool call handler that validates the input, runs the parse logic via WebsiteParser, formats the output as an MCP content block, and handles errors.
    server.setRequestHandler(CallToolRequestSchema, async (request) => {
      const { name, arguments: args } = request.params;

      if (name !== "parse") {
        throw new McpError(ErrorCode.MethodNotFound, `Unknown tool: ${name}`);
      }

      if (!args?.url) {
        throw new McpError(ErrorCode.InvalidParams, "URL is required");
      }

      try {
        const result = await parser.fetchAndParse(args.url);
        return {
          content: [{
            type: "text",
            text: JSON.stringify({
              title: result.title,
              content: result.content,
              metadata: {
                excerpt: result.excerpt,
                byline: result.byline,
                siteName: result.siteName
              }
            }, null, 2)
          }]
        };
      } catch (error) {
        return {
          isError: true,
          content: [{ type: "text", text: `Error: ${error.message}` }]
        };
      }
    });
  • dist/index.js:64-79 (registration)
    Registers the 'parse' tool with the MCP server by defining it in the listTools response, including its name, description, and input schema.
    server.setRequestHandler(ListToolsRequestSchema, async () => ({
      tools: [{
        name: "parse",
        description: "Extracts and transforms webpage content into clean, LLM-optimized Markdown. Returns article title, main content, excerpt, byline and site name. Uses Mozilla's Readability algorithm to remove ads, navigation, footers and non-essential elements while preserving the core content structure.",
        inputSchema: {
          type: "object",
          properties: {
            url: {
              type: "string",
              description: "The website URL to parse"
            }
          },
          required: ["url"]
        }
      }]
    }));
  • Input schema for the 'parse' tool: requires a 'url' string.
    inputSchema: {
      type: "object",
      properties: {
        url: {
          type: "string",
          description: "The website URL to parse"
        }
      },
      required: ["url"]
    }
  • WebsiteParser helper class, whose fetchAndParse method is called by the tool handler.
    class WebsiteParser {
      async fetchAndParse(url) {
        try {
          // Fetch the webpage
          const response = await axios.get(url, {
            headers: {
              'User-Agent': 'Mozilla/5.0 (compatible; MCPBot/1.0)'
            }
          });

          // Create a DOM from the HTML
          const dom = new JSDOM(response.data, { url });
          const document = dom.window.document;

          // Use Readability to extract main content
          const reader = new Readability(document);
          const article = reader.parse();

          if (!article) {
            throw new Error('Failed to parse content');
          }

          // Convert HTML to Markdown
          const markdown = turndownService.turndown(article.content);

          return {
            title: article.title,
            content: markdown,
            excerpt: article.excerpt,
            byline: article.byline,
            siteName: article.siteName
          };
        } catch (error) {
          throw new Error(`Failed to fetch or parse content: ${error.message}`);
        }
      }
    }
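
Taken together, the registration and call handlers above imply a simple round trip: a client sends a `tools/call` request naming the `parse` tool and receives one text block that holds either pretty-printed JSON or an `Error: ...` message. The sketch below shows that round trip from the client's side, with no network access or SDK involved. `buildParseRequest` and `unpackParseResult` are hypothetical helper names, and the sample result is made up to mirror the handler's success branch; the JSON-RPC 2.0 envelope with method `tools/call` is the message format MCP uses for tool invocations.

```javascript
// Hypothetical helper: builds the JSON-RPC 2.0 message an MCP client sends
// to invoke the 'parse' tool. Only `url` is required by the input schema.
function buildParseRequest(id, url) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "parse", arguments: { url } }
  };
}

// Hypothetical helper: turns the tool result returned by the 'parse' handler
// into a flat object, or throws if the server flagged an error.
function unpackParseResult(result) {
  const block = result?.content?.[0];
  if (!block || block.type !== "text") {
    throw new Error("Unexpected tool result: no text content block");
  }
  if (result.isError) {
    // On failure the handler returns the message as plain text, not JSON.
    throw new Error(block.text);
  }
  const { title, content, metadata = {} } = JSON.parse(block.text);
  return { title, markdown: content, ...metadata };
}

// Sample result shaped like the handler's success branch (values are made up).
const sampleResult = {
  content: [{
    type: "text",
    text: JSON.stringify({
      title: "Example article",
      content: "# Example article\n\nBody text.",
      metadata: { excerpt: "Body text.", byline: null, siteName: "Example" }
    }, null, 2)
  }]
};

console.log(JSON.stringify(buildParseRequest(1, "https://example.com/article")));
console.log(unpackParseResult(sampleResult).title); // prints "Example article"
```

Flattening `metadata` into the returned object is a design choice of this sketch; a caller that prefers the server's nesting can return the parsed JSON unchanged.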

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/emzimmer/server-moz-readability'
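
For callers working in JavaScript, the same request can be sketched with the global `fetch()` available in Node.js 18+ and browsers. This is an illustration under assumptions: the helper names are hypothetical, the owner/name path pattern is inferred from the single URL in the curl command, and the response is assumed to be JSON, which the page does not state.

```javascript
// Base URL taken from the curl command above.
const GLAMA_API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Hypothetical helper: builds the endpoint URL for one server entry.
// The owner/name path layout is inferred from the example URL.
function serverUrl(owner, name) {
  return `${GLAMA_API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Hypothetical helper: requests the directory entry and returns the parsed
// body as-is. Assumes a JSON response; its fields are not documented here.
async function getServerInfo(owner, name) {
  const response = await fetch(serverUrl(owner, name));
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  return response.json();
}

console.log(serverUrl("emzimmer", "server-moz-readability"));
// prints "https://glama.ai/api/mcp/v1/servers/emzimmer/server-moz-readability"
```

Calling `getServerInfo("emzimmer", "server-moz-readability")` would then perform the actual request.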

If you have feedback or need assistance with the MCP directory API, please join our Discord server.