get_markdown_summary
Extracts and converts the main content of a web page into Markdown format, removing headers, footers, and navigation menus. Ideal for capturing essential content from articles, blogs, or documentation.
Instructions
Extracts and converts the main content area of a web page to Markdown format, automatically removing navigation menus, headers, footers, and other peripheral content. Perfect for capturing the core content of articles, blog posts, or documentation pages.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL of the web page whose main content should be extracted and converted to Markdown. | |
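As an illustration of the schema above, the following sketch builds the JSON-RPC `tools/call` request an MCP client would send to invoke this tool. The helper name `buildMarkdownSummaryRequest`, the `CallToolRequest` interface, and the example URL are hypothetical. The up-front URL check is an added assumption; the schema itself only requires a string.

```typescript
// Shape of an MCP "tools/call" request carrying this tool's single argument.
// The interface name is illustrative, not taken from the server's source.
interface CallToolRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: { url: string } };
}

// Hypothetical helper: builds the request payload for get_markdown_summary.
function buildMarkdownSummaryRequest(url: string, id = 1): CallToolRequest {
  // Assumption: reject malformed URLs early. The URL constructor throws a
  // TypeError when the string cannot be parsed as an absolute URL.
  new URL(url);
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "get_markdown_summary", arguments: { url } },
  };
}

const request = buildMarkdownSummaryRequest("https://example.com/blog/post");
console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client library would normally construct this payload itself; the sketch only shows which fields map to the `url` row in the table.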
Implementation Reference
- src/index.ts:96-109 (registration): Registration of the get_markdown_summary tool in the ListToolsRequestSchema handler, including name, description, and input schema requiring a URL.

      {
        name: "get_markdown_summary",
        description:
          "Extracts and converts the main content area of a web page to Markdown format, automatically removing navigation menus, headers, footers, and other peripheral content. Perfect for capturing the core content of articles, blog posts, or documentation pages.",
        inputSchema: {
          type: "object",
          properties: {
            url: {
              type: "string",
              description:
                "URL of the web page whose main content should be extracted and converted to Markdown."
            }
          },
          required: ["url"]
        }
      },

- src/index.ts:151-158 (handler): Handler case in CallToolRequestSchema that executes the get_markdown_summary tool by calling the helper getMarkdownStringFromHtmlByTD(url, true) with mainOnly enabled for summary extraction.

      case "get_markdown_summary": {
        return {
          content: [{
            type: "text",
            text: (await getMarkdownStringFromHtmlByTD(url, true))
          }]
        };
      }

- src/index.ts:213-285 (helper): Primary helper function implementing the Markdown conversion logic using Turndown. It fetches the HTML via getHtmlString, removes script and style elements (plus header, footer, and nav when mainOnly is true), adds custom rules for tables and definition lists (dl), and converts the result to Markdown. The handler calls it with mainOnly=true for summary extraction.

      export async function getMarkdownStringFromHtmlByTD(
        request_url: string,
        mainOnly: boolean = false,
      ) {
        const htmlString = await getHtmlString(request_url);
        const turndownService = new Turndown({ headingStyle: 'atx' });
        turndownService.remove('script');
        turndownService.remove('style');
        if (mainOnly) {
          turndownService.remove('header');
          turndownService.remove('footer');
          turndownService.remove('nav');
        }
        turndownService.addRule('table', {
          filter: 'table',
          // eslint-disable-next-line @typescript-eslint/no-unused-vars
          replacement: function (content, node, _options) {
            // Process each row in the table
            const rows = Array.from(node.querySelectorAll('tr'));
            if (rows.length === 0) {
              return '';
            }
            const headerRow = rows[0];
            const headerCells = Array.from(
              headerRow.querySelectorAll('th, td'),
            ).map((cell) => cell.textContent?.trim() || '');
            const separator = headerCells.map(() => '---').join('|');
            // Header row and separator line
            let markdown = `\n| ${headerCells.join(' | ')} |\n|${separator}|`;
            // Process remaining rows
            for (let i = 1; i < rows.length; i++) {
              const row = rows[i];
              const rowCells = Array.from(row.querySelectorAll('th, td')).map(
                (cell) => cell.textContent?.trim() || '',
              );
              markdown += `\n| ${rowCells.join(' | ')} |`;
            }
            return markdown + '\n';
          },
        });
        turndownService.addRule('dl', {
          filter: 'dl',
          // eslint-disable-next-line @typescript-eslint/no-unused-vars
          replacement: function (content, node, _options) {
            let markdown = '\n\n';
            const items = Array.from(node.children);
            let currentDt: string = '';
            items.forEach((item) => {
              if (item.tagName === 'DT') {
                currentDt = item.textContent?.trim() || '';
                if (currentDt) {
                  markdown += `**${currentDt}:**`;
                }
              } else if (item.tagName === 'DD') {
                const ddContent = item.textContent?.trim() || '';
                if (ddContent) {
                  markdown += ` ${ddContent}\n`;
                }
              }
            });
            return markdown + '\n';
          },
        });
        const markdownString = turndownService.turndown(htmlString);
        return markdownString;
      }
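
To make the two custom rules in the helper easier to follow, here is a minimal DOM-free sketch of the same string-building logic. It assumes cell and item text has already been extracted and trimmed. The function names `tableRowsToMarkdown` and `definitionListToMarkdown` and the `DlItem` type are hypothetical and do not appear in src/index.ts. The strings shown are what each rule's replacement function would return; Turndown may still normalize surrounding whitespace in the final output.

```typescript
// Hypothetical stand-in for the 'table' rule: the first row is treated as the
// header, followed by a '---' separator and one line per remaining row.
function tableRowsToMarkdown(rows: string[][]): string {
  if (rows.length === 0) {
    return "";
  }
  const headerCells = rows[0];
  const separator = headerCells.map(() => "---").join("|");
  let markdown = `\n| ${headerCells.join(" | ")} |\n|${separator}|`;
  for (let i = 1; i < rows.length; i++) {
    markdown += `\n| ${rows[i].join(" | ")} |`;
  }
  return markdown + "\n";
}

// Hypothetical flattened view of a <dl>'s children: tag name plus trimmed text.
type DlItem = { tag: "DT" | "DD"; text: string };

// Hypothetical stand-in for the 'dl' rule: each term is rendered in bold with
// a trailing colon, and each definition follows on the same line.
function definitionListToMarkdown(items: DlItem[]): string {
  let markdown = "\n\n";
  for (const item of items) {
    if (item.tag === "DT" && item.text) {
      markdown += `**${item.text}:**`;
    } else if (item.tag === "DD" && item.text) {
      markdown += ` ${item.text}\n`;
    }
  }
  return markdown + "\n";
}

console.log(tableRowsToMarkdown([["Name", "Required"], ["url", "Yes"]]));
console.log(definitionListToMarkdown([
  { tag: "DT", text: "mainOnly" },
  { tag: "DD", text: "Strip header, footer, and nav" },
]));
```

Because both rules read `textContent`, inline formatting inside cells and definitions (links, emphasis, code) is flattened to plain text, and a pipe character inside a cell is not escaped.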