MCP Server Fetch TypeScript

by tatn

get_markdown_summary

Extracts main content from web pages and converts it to clean Markdown format, removing navigation menus and peripheral elements for focused reading.

Instructions

Extracts and converts the main content area of a web page to Markdown format, automatically removing navigation menus, headers, footers, and other peripheral content. Perfect for capturing the core content of articles, blog posts, or documentation pages.

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL of the web page whose main content should be extracted and converted to Markdown. | |
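
As a rough illustration of how a client might prepare the single `url` argument, the sketch below checks that the value parses as an absolute http(s) URL and wraps it in an MCP `tools/call` request. The names `buildGetMarkdownSummaryCall` and `ToolCallRequest`, and the restriction to http/https, are assumptions made for this example; the server's schema itself only requires a string.

```typescript
// Hypothetical client-side helper; not part of the server's source.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: { url: string } };
}

// Builds a JSON-RPC request for the get_markdown_summary tool.
// Assumption: only http(s) URLs are useful here; the schema only demands a string.
function buildGetMarkdownSummaryCall(url: string, id: number = 1): ToolCallRequest {
  let parsed: URL;
  try {
    parsed = new URL(url); // throws on relative or malformed input
  } catch {
    throw new Error(`Not a valid absolute URL: ${url}`);
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "get_markdown_summary", arguments: { url: parsed.href } },
  };
}
```

Validating before the call is a design choice for this sketch: it surfaces a malformed URL on the client rather than relying on whatever the server returns for it.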

Implementation Reference

  • Handler for the 'get_markdown_summary' tool. It calls getMarkdownStringFromHtmlByTD with mainOnly=true and returns the resulting Markdown as a single text content item.
    case "get_markdown_summary": {
      return {
        content: [{
          type: "text",
          text: (await getMarkdownStringFromHtmlByTD(url, true))
        }]
      };
    }
  • Input schema for the get_markdown_summary tool, requiring a single 'url' string parameter.
    inputSchema: {
      type: "object",
      properties: {
        url: {
          type: "string",
          description: "URL of the web page whose main content should be extracted and converted to Markdown."
        }
      },
      required: ["url"]
    }
  • src/index.ts:96-109 (registration)
    Registration of the 'get_markdown_summary' tool in the ListTools response, including name, description, and input schema.
    {
      name: "get_markdown_summary",
      description: "Extracts and converts the main content area of a web page to Markdown format, automatically removing navigation menus, headers, footers, and other peripheral content. Perfect for capturing the core content of articles, blog posts, or documentation pages.",
      inputSchema: {
        type: "object",
        properties: {
          url: {
            type: "string",
            description: "URL of the web page whose main content should be extracted and converted to Markdown."
          }
        },
        required: ["url"]
      }
    },
  • Primary helper that converts HTML to Markdown using the Turndown library. Script and style elements are always removed; in mainOnly mode, header, footer, and nav elements are removed as well. Custom rules render tables and definition lists. The handler calls it with mainOnly=true.
    // Helper method to convert HTML to Markdown using Turndown with custom rules for tables and definition lists
    export async function getMarkdownStringFromHtmlByTD(
      request_url: string,
      mainOnly: boolean = false,
    ) {
      const htmlString = await getHtmlString(request_url);
    
      const turndownService = new Turndown({ headingStyle: 'atx' });
      turndownService.remove('script');
      turndownService.remove('style');
    
      if (mainOnly) {
        turndownService.remove('header');
        turndownService.remove('footer');
        turndownService.remove('nav');
      }
    
      turndownService.addRule('table', {
        filter: 'table',
        // eslint-disable-next-line @typescript-eslint/no-unused-vars
        replacement: function (content, node, _options) {
          // Process each row in the table
          const rows = Array.from(node.querySelectorAll('tr'));
          if (rows.length === 0) {
            return '';
          }
          const headerRow = rows[0];
          const headerCells = Array.from(
            headerRow.querySelectorAll('th, td'),
          ).map((cell) => cell.textContent?.trim() || '');
          const separator = headerCells.map(() => '---').join('|');
          // Header row and separator line
          let markdown = `\n| ${headerCells.join(' | ')} |\n|${separator}|`;
          // Process remaining rows
          for (let i = 1; i < rows.length; i++) {
            const row = rows[i];
            const rowCells = Array.from(row.querySelectorAll('th, td')).map(
              (cell) => cell.textContent?.trim() || '',
            );
            markdown += `\n| ${rowCells.join(' | ')} |`;
          }
          return markdown + '\n';
        },
      });
    
      turndownService.addRule('dl', {
        filter: 'dl',
        // eslint-disable-next-line @typescript-eslint/no-unused-vars
        replacement: function (content, node, _options) {
          let markdown = '\n\n';
          const items = Array.from(node.children);
    
          let currentDt: string = '';
          items.forEach((item) => {
            if (item.tagName === 'DT') {
              currentDt = item.textContent?.trim() || '';
              if (currentDt) {
                markdown += `**${currentDt}:**`;
              }
            } else if (item.tagName === 'DD') {
              const ddContent = item.textContent?.trim() || '';
              if (ddContent) {
                markdown += ` ${ddContent}\n`;
              }
            }
          });
          return markdown + '\n';
        },
      });
    
      const markdownString = turndownService.turndown(htmlString);
    
      return markdownString;
    }
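  • Helper function that loads a URL in a headless Chromium browser via Playwright and returns the page HTML once the DOMContentLoaded event has fired, which lets it capture content that a plain HTTP fetch would miss. On failure it logs the error and returns an empty string.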
    async function getHtmlString(request_url: string): Promise<string> {
      let browser: Browser | null = null;
      let page: Page | null = null;
      try {
        browser = await chromium.launch({
          headless: true,
          // args: ['--single-process'], 
        });
        const context = await browser.newContext();
        page = await context.newPage();
    
        await page.goto(request_url, {
          waitUntil: 'domcontentloaded',
          timeout: TIMEOUT,
        });
        const htmlString = await page.content();
        return htmlString;
      } catch (error) {
        console.error(`Failed to fetch HTML for ${request_url}:`, error);
        return ""; 
      } finally {
        if (page) {
          try {
            await page.close();
          } catch (e) {
            console.error("Error closing page:", e);
          }
        }
        if (browser) {
          try {
            await browser.close();
          } catch (error) {
            console.error('Error closing browser:', error);
          }
        }
      }
    }
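
The table and definition-list rules in the helper above can be restated without a DOM, which makes the resulting Markdown shape easier to see. The following is a minimal sketch under that assumption: `tableToMarkdown`, `dlToMarkdown`, and `DlItem` are hypothetical names, and plain string arrays stand in for the `tr`, `dt`, and `dd` elements that the real rules read via `textContent`.

```typescript
// Hypothetical DOM-free restatement of the two custom rules, for illustration only.

// Mirrors the 'table' rule: the first row is treated as the header,
// followed by a separator line and one line per remaining row.
function tableToMarkdown(rows: string[][]): string {
  if (rows.length === 0) {
    return "";
  }
  const [header, ...body] = rows;
  const separator = header.map(() => "---").join("|");
  let markdown = `\n| ${header.map((c) => c.trim()).join(" | ")} |\n|${separator}|`;
  for (const row of body) {
    markdown += `\n| ${row.map((c) => c.trim()).join(" | ")} |`;
  }
  return markdown + "\n";
}

// Stand-in for the child elements of a <dl>.
type DlItem = { tag: "DT" | "DD"; text: string };

// Mirrors the 'dl' rule: terms become bold labels, definitions follow inline.
function dlToMarkdown(items: DlItem[]): string {
  let markdown = "\n\n";
  for (const item of items) {
    const text = item.text.trim();
    if (!text) continue; // empty terms and definitions are skipped
    markdown += item.tag === "DT" ? `**${text}:**` : ` ${text}\n`;
  }
  return markdown + "\n";
}

console.log(tableToMarkdown([["Name", "Required"], ["url", "Yes"]]));
console.log(dlToMarkdown([{ tag: "DT", text: "Term" }, { tag: "DD", text: "Definition" }]));
```

Neither this sketch nor the rule it mirrors escapes `|` characters inside cell text, so a cell containing a pipe would break the table layout.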
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It describes key behaviors like automatic removal of navigation menus and headers, which is useful context, but lacks details on potential limitations, error handling, or performance aspects that could affect tool selection.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, with two concise sentences that efficiently convey the tool's purpose and use case without any wasted words, making it easy for an agent to quickly understand the tool's value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (content extraction and conversion), no annotations, and no output schema, the description is somewhat complete but lacks details on output format, error conditions, or performance characteristics that would help an agent use it effectively in varied contexts.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the input schema already documents the 'url' parameter thoroughly. The description does not add any additional meaning or details beyond what the schema provides, such as URL format constraints or examples, resulting in a baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('extracts and converts') and resources ('main content area of a web page to Markdown format'), distinguishing it from siblings like get_raw_text or get_rendered_html by emphasizing content extraction and conversion to Markdown.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('Perfect for capturing the core content of articles, blog posts, or documentation pages'), but does not explicitly state when not to use it or name alternatives among the sibling tools, leaving some room for improvement in distinguishing usage scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/tatn/mcp-server-fetch-typescript'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.