
Mozilla Readability Parser MCP Server

by emzimmer

parse

Extract webpage content into clean, LLM-optimized Markdown by removing ads, navigation, and non-essential elements. Retrieve article title, main content, excerpt, byline, and site name using Mozilla's Readability algorithm.

Instructions

Extracts and transforms webpage content into clean, LLM-optimized Markdown. Returns article title, main content, excerpt, byline and site name. Uses Mozilla's Readability algorithm to remove ads, navigation, footers and non-essential elements while preserving the core content structure.
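The flow described above (fetch the page, extract the article, convert it to Markdown, return five fields) can be sketched with the three external steps passed in as plain functions. This is a minimal sketch, not the server's code: `makeParser`, `fetchHtml`, `extractArticle`, and `toMarkdown` are hypothetical names, and the stand-ins below only imitate the roles played by axios, JSDOM with Readability, and Turndown in the implementation reference.

```javascript
// Sketch only: collaborators are injected so the flow runs without network
// access or third-party packages.
function makeParser({ fetchHtml, extractArticle, toMarkdown }) {
  return async function fetchAndParse(url) {
    try {
      const html = await fetchHtml(url);
      const article = extractArticle(html, url);
      if (!article) {
        throw new Error('Failed to parse content');
      }
      return {
        title: article.title,
        content: toMarkdown(article.content),
        excerpt: article.excerpt,
        byline: article.byline,
        siteName: article.siteName,
      };
    } catch (error) {
      // Every failure, including the one thrown just above, is wrapped here.
      throw new Error(`Failed to fetch or parse content: ${error.message}`);
    }
  };
}

// Hypothetical stand-ins, for illustration only.
const demoParser = makeParser({
  fetchHtml: async () => '<article><h1>Hello</h1><p>World</p></article>',
  extractArticle: (html) => ({
    title: 'Hello',
    content: html,
    excerpt: 'World',
    byline: null,
    siteName: null,
  }),
  // Placeholder that strips tags; it does not produce real Markdown.
  toMarkdown: (html) => html.replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim(),
});
```

Because the extraction failure is thrown inside the same try block, it reaches the caller with both prefixes, as "Failed to fetch or parse content: Failed to parse content".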

Input Schema

Name   Required   Description                Default
url    Yes        The website URL to parse   (none)
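As a rough illustration of the schema above, a client could build the MCP `tools/call` request like this. The helper name `buildParseRequest`, the incrementing id, and the http/https restriction are assumptions of this sketch; the tool's schema itself only requires `url` to be a string.

```javascript
// Build a JSON-RPC "tools/call" request for the parse tool.
// The id counter and the protocol check are assumptions, not server behavior.
let nextId = 1;

function buildParseRequest(url) {
  if (typeof url !== 'string' || url.length === 0) {
    throw new TypeError('url is required and must be a string');
  }
  let parsed;
  try {
    parsed = new URL(url); // throws on relative or malformed input
  } catch {
    throw new TypeError(`Not an absolute URL: ${url}`);
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new TypeError(`Unsupported protocol: ${parsed.protocol}`);
  }
  return {
    jsonrpc: '2.0',
    id: nextId++,
    method: 'tools/call',
    params: { name: 'parse', arguments: { url } },
  };
}
```

Checking the URL on the client side is optional, but it rejects obviously bad input before a request is sent.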

Implementation Reference

  • Core handler function that fetches the webpage, parses it using Readability, extracts main content, converts to Markdown, and returns structured article data.
    async fetchAndParse(url) {
      try {
        // Fetch the webpage
        const response = await axios.get(url, {
          headers: {
            'User-Agent': 'Mozilla/5.0 (compatible; MCPBot/1.0)'
          }
        });
    
        // Create a DOM from the HTML
        const dom = new JSDOM(response.data, { url });
        const document = dom.window.document;
    
        // Use Readability to extract main content
        const reader = new Readability(document);
        const article = reader.parse();
    
        if (!article) {
          throw new Error('Failed to parse content');
        }
    
        // Convert HTML to Markdown
        const markdown = turndownService.turndown(article.content);
    
        return {
          title: article.title,
          content: markdown,
          excerpt: article.excerpt,
          byline: article.byline,
          siteName: article.siteName
        };
      } catch (error) {
        throw new Error(`Failed to fetch or parse content: ${error.message}`);
      }
    }
  • MCP tool call handler that validates the input, runs the parse logic via WebsiteParser, formats the output as an MCP text content block, and reports fetch or parse failures as an isError result.
    server.setRequestHandler(CallToolRequestSchema, async (request) => {
      const { name, arguments: args } = request.params;
    
      if (name !== "parse") {
        throw new McpError(ErrorCode.MethodNotFound, `Unknown tool: ${name}`);
      }
    
      if (!args?.url) {
        throw new McpError(ErrorCode.InvalidParams, "URL is required");
      }
    
      try {
        const result = await parser.fetchAndParse(args.url);
        
        return {
          content: [{
            type: "text",
            text: JSON.stringify({
              title: result.title,
              content: result.content,
              metadata: {
                excerpt: result.excerpt,
                byline: result.byline,
                siteName: result.siteName
              }
            }, null, 2)
          }]
        };
      } catch (error) {
        return {
          isError: true,
          content: [{
            type: "text",
            text: `Error: ${error.message}`
          }]
        };
      }
    });
  • dist/index.js:64-79 (registration)
    Registers the 'parse' tool with the MCP server by listing it in the ListTools response, including its name, description, and input schema.
    server.setRequestHandler(ListToolsRequestSchema, async () => ({
      tools: [{
        name: "parse",
        description: "Extracts and transforms webpage content into clean, LLM-optimized Markdown. Returns article title, main content, excerpt, byline and site name. Uses Mozilla's Readability algorithm to remove ads, navigation, footers and non-essential elements while preserving the core content structure.",
        inputSchema: {
          type: "object",
          properties: {
            url: {
              type: "string",
              description: "The website URL to parse"
            }
          },
          required: ["url"]
        }
      }]
    }));
  • Input schema for the 'parse' tool: requires a 'url' string.
    inputSchema: {
      type: "object",
      properties: {
        url: {
          type: "string",
          description: "The website URL to parse"
        }
      },
      required: ["url"]
    }
  • WebsiteParser helper class; its fetchAndParse method is called by the tool handler.
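    // Imports and the Turndown instance are not part of the excerpt; the lines
    // below are assumptions based on the identifiers used in the method.
    import axios from 'axios';
    import { JSDOM } from 'jsdom';
    import { Readability } from '@mozilla/readability';
    import TurndownService from 'turndown';
    
    const turndownService = new TurndownService(); // options, if any, are unknown
    
    class WebsiteParser {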
      async fetchAndParse(url) {
        try {
          // Fetch the webpage
          const response = await axios.get(url, {
            headers: {
              'User-Agent': 'Mozilla/5.0 (compatible; MCPBot/1.0)'
            }
          });
    
          // Create a DOM from the HTML
          const dom = new JSDOM(response.data, { url });
          const document = dom.window.document;
    
          // Use Readability to extract main content
          const reader = new Readability(document);
          const article = reader.parse();
    
          if (!article) {
            throw new Error('Failed to parse content');
          }
    
          // Convert HTML to Markdown
          const markdown = turndownService.turndown(article.content);
    
          return {
            title: article.title,
            content: markdown,
            excerpt: article.excerpt,
            byline: article.byline,
            siteName: article.siteName
          };
        } catch (error) {
          throw new Error(`Failed to fetch or parse content: ${error.message}`);
        }
      }
    }
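On the client side, the result returned by the call handler above could be unpacked roughly as follows. This is a sketch under the assumption that the result has exactly the shape shown in the handler (one text block holding either a JSON string or an "Error: ..." message); `unpackParseResult` is a hypothetical helper, not part of the server.

```javascript
// Turn a parse tool result into a plain object.
// Success: the text block holds JSON with title, content, and metadata.
// Failure: isError is true and the text block holds the error message.
function unpackParseResult(result) {
  const block = result.content && result.content[0];
  const text = block && block.type === 'text' ? block.text : '';
  if (result.isError) {
    return { ok: false, message: text };
  }
  const payload = JSON.parse(text);
  return {
    ok: true,
    title: payload.title,
    markdown: payload.content,
    // excerpt, byline, and siteName are nested under metadata by the handler
    ...payload.metadata,
  };
}
```

Note that the handler nests `excerpt`, `byline`, and `siteName` under `metadata`, while `fetchAndParse` returns them at the top level; the helper flattens them again for convenience.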

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/emzimmer/server-moz-readability'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.