MCP Server Fetch TypeScript

by tatn

get_markdown

Converts web page content into clean, structured Markdown format, preserving elements like tables and lists for readability and document consistency.

Instructions

Converts web page content to well-formatted Markdown, preserving structural elements like tables and definition lists. Recommended as the default tool for web content extraction when a clean, readable text format is needed while maintaining document structure.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | URL of the web page to convert to Markdown format, supporting various HTML elements and structures. | |
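As a rough illustration of how a client might satisfy this schema, the sketch below builds a JSON-RPC `tools/call` request for `get_markdown`. The helper name `buildGetMarkdownCall` is hypothetical and not part of the server, and the restriction to `http:` and `https:` URLs is an assumption made here rather than something the schema states.

```typescript
// Shape of a JSON-RPC 2.0 request for the MCP "tools/call" method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: { url: string } };
}

// Hypothetical helper (not part of the server): checks the URL and builds
// the request body a client could send to invoke get_markdown.
function buildGetMarkdownCall(url: string, id: number = 1): ToolCallRequest {
  const parsed = new URL(url); // throws a TypeError on malformed input
  // Assumption: only web URLs make sense for a page-fetching tool.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "get_markdown", arguments: { url } },
  };
}
```

Validating on the client side is optional; it only avoids a round trip for inputs that are obviously not web URLs.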

Implementation Reference

  • src/index.ts:82-94 (registration)
    Registers the 'get_markdown' tool with its description and input schema (requires 'url' string).
    {
      name: "get_markdown",
      description: "Converts web page content to well-formatted Markdown, preserving structural elements like tables and definition lists. Recommended as the default tool for web content extraction when a clean, readable text format is needed while maintaining document structure.",
      inputSchema: {
        type: "object",
        properties: {
          url: {
            type: "string",
            description: "URL of the web page to convert to Markdown format, supporting various HTML elements and structures."
          }
        },
        required: ["url"]
      }
    }
  • Dispatcher handler for 'get_markdown' tool that calls the markdown conversion function and returns the result as text content.
    case "get_markdown": {
      return {
        content: [{ type: "text", text: (await getMarkdownStringFromHtmlByNHM(url)) }]
      };
    }
  • Core handler function that fetches HTML using a headless browser and converts it to Markdown using NodeHtmlMarkdown with custom translators for definition lists (dl/dt/dd) and head elements.
    export async function getMarkdownStringFromHtmlByNHM(
      request_url: string,
      mainOnly: boolean = false,
    ) {
      // Fetch the rendered HTML before converting it.
      const htmlString = await getHtmlString(request_url);

      // Custom translators: definition lists become "**term:** definition" lines.
      const customTranslators: TranslatorConfigObject = {
        dl: () => ({
          preserveWhitespace: false,
          surroundingNewlines: true,
        }),
        dt: () => ({
          prefix: '**',
          postfix: ':** ',
          surroundingNewlines: false,
        }),
        dd: () => ({
          postfix: '\n',
          surroundingNewlines: false,
        }),
        // Reduce <head> to the text of its <title> element, if present.
        Head: () => ({
          postfix: '\n',
          ignore: false,
          postprocess: (ctx) => {
            const titleNode = ctx.node.querySelector('title');
            if (titleNode) {
              return titleNode.textContent || '';
            }
            return '';
          },
          surroundingNewlines: true,
        }),
      };

      // With mainOnly set, skip header, footer, and nav elements.
      if (mainOnly) {
        customTranslators.Header = () => ({
          ignore: true,
        });
        customTranslators.Footer = () => ({
          ignore: true,
        });
        customTranslators.Nav = () => ({
          ignore: true,
        });
      }

      const markdownString = NodeHtmlMarkdown.translate(
        htmlString,
        {},
        customTranslators,
      );
      return markdownString;
    }
  • Helper function to fetch fully rendered HTML content from a URL using Playwright Chromium headless browser.
    async function getHtmlString(request_url: string): Promise<string> {
      let browser: Browser | null = null;
      let page: Page | null = null;
      try {
        browser = await chromium.launch({
          headless: true,
          // args: ['--single-process'],
        });
        const context = await browser.newContext();
        page = await context.newPage();
        await page.goto(request_url, {
          waitUntil: 'domcontentloaded',
          timeout: TIMEOUT,
        });
        const htmlString = await page.content();
        return htmlString;
      } catch (error) {
        console.error(`Failed to fetch HTML for ${request_url}:`, error);
        return "";
      } finally {
        if (page) {
          try {
            await page.close();
          } catch (e) {
            console.error("Error closing page:", e);
          }
        }
        if (browser) {
          try {
            await browser.close();
          } catch (error) {
            console.error('Error closing browser:', error);
          }
        }
      }
    }
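To make the effect of the `dt` and `dd` translators above more concrete, here is a dependency-free sketch of the output shape they aim for. It does not call NodeHtmlMarkdown; `renderDefinitionList` is a hypothetical stand-in that only applies the same prefix and postfix strings, so the library's exact whitespace handling may differ.

```typescript
// A term/definition pair, standing in for a <dt>/<dd> pair in the HTML.
type DefinitionEntry = [term: string, definition: string];

// Hypothetical illustration only: mirrors the dt prefix '**', the dt
// postfix ':** ', and the dd postfix '\n' from the custom translators.
function renderDefinitionList(entries: DefinitionEntry[]): string {
  return entries
    .map(([term, definition]) => `**${term}:** ${definition}\n`)
    .join("");
}

const sample = renderDefinitionList([
  ["HTTP", "Hypertext Transfer Protocol"],
  ["DOM", "Document Object Model"],
]);
console.log(sample);
```

Each term ends up bolded with a trailing colon and its definition on the same line, which keeps the pairing readable in plain Markdown, a format with no native definition-list syntax.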

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/tatn/mcp-server-fetch-typescript'
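The same request can be sketched in TypeScript using the global `fetch` available in Node 18+ and browsers. The helper names `serverUrl` and `fetchServerInfo` are hypothetical, and the assumption that the endpoint returns JSON is not confirmed here, so the result is typed as `unknown`.

```typescript
const API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Builds the directory API URL for a server identified by owner and name.
function serverUrl(owner: string, name: string): string {
  return `${API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Hypothetical helper: issues the GET request and parses the body as JSON
// (assumed response format). Throws on non-2xx status codes.
async function fetchServerInfo(owner: string, name: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, name));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

A call such as `fetchServerInfo("tatn", "mcp-server-fetch-typescript")` would target the same URL as the curl command above.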

If you have feedback or need assistance with the MCP directory API, please join our Discord server.