Glama

convert_html_to_markdown

Convert HTML files to clean Markdown format while preserving structure, links, images, tables, and formatting. Automatically saves converted files to the specified output directory.

Instructions

Enhanced HTML to Markdown conversion with style preservation. Converts HTML files to clean Markdown format while preserving structure, links, images, tables, and formatting. The output directory is controlled by the OUTPUT_DIR environment variable, and converted files are saved there automatically with auto-generated names.

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| htmlPath | Yes | HTML file path to convert | |
| preserveStyles | No | Preserve HTML formatting and styles | |
| includeCSS | No | Include CSS styles as comments in Markdown | |
| debug | No | Enable debug output | |
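A minimal sketch of validating a call's arguments against the input schema above. The listing gives field names but not types, so this assumes `htmlPath` is a string and the three optional flags are booleans; `ConvertHtmlToMarkdownArgs` and `parseArgs` are hypothetical names, not part of the server.

```typescript
// Hypothetical argument shape for the convert_html_to_markdown tool.
// Types are assumed: the schema listing names the fields but not their types.
interface ConvertHtmlToMarkdownArgs {
  htmlPath: string;
  preserveStyles?: boolean;
  includeCSS?: boolean;
  debug?: boolean;
}

const OPTIONAL_FLAGS = ["preserveStyles", "includeCSS", "debug"] as const;

// Validates an untrusted arguments object and returns a typed copy,
// throwing on a missing htmlPath or a non-boolean flag.
function parseArgs(raw: unknown): ConvertHtmlToMarkdownArgs {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("arguments must be an object");
  }
  const obj = raw as Record<string, unknown>;
  const htmlPath = obj["htmlPath"];
  if (typeof htmlPath !== "string" || htmlPath.length === 0) {
    throw new Error("htmlPath is required and must be a non-empty string");
  }
  const args: ConvertHtmlToMarkdownArgs = { htmlPath };
  for (const flag of OPTIONAL_FLAGS) {
    const value = obj[flag];
    if (value === undefined) continue; // optional: leave unset
    if (typeof value !== "boolean") {
      throw new Error(`${flag} must be a boolean`);
    }
    args[flag] = value;
  }
  return args;
}
```

Rejecting a malformed call before any file is read keeps the error message specific to the offending field.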

Implementation Reference

  • Core implementation of the HTML-to-Markdown conversion logic: parses the HTML with Cheerio, extracts CSS if requested, converts the document structure (headings, lists, tables, images, etc.) to Markdown, validates and sanitizes the output path, and saves the output file.
    async convertHtmlToMarkdown(
      inputPath: string,
      options: HtmlToMarkdownOptions = {}
    ): Promise<HtmlToMarkdownResult> {
      try {
        this.options = {
          preserveStyles: true,
          includeCSS: true,
          debug: false,
          ...options,
        };

        if (this.options.debug) {
          console.log('🚀 Starting enhanced HTML to Markdown conversion...');
          console.log('📄 Input file:', inputPath);
        }

        // Read the HTML file
        const htmlContent = await fs.readFile(inputPath, 'utf-8');

        // Parse the HTML with cheerio
        const $ = cheerio.load(htmlContent);

        // Extract CSS styles (if requested)
        let cssStyles = '';
        if (this.options.includeCSS) {
          cssStyles = this.extractCSS($);
        }

        // Convert to Markdown
        let markdownContent = this.htmlToMarkdown($);

        // If CSS is included, prepend it to the document as an HTML comment
        if (cssStyles && this.options.includeCSS) {
          markdownContent = `<!-- CSS Styles\n${cssStyles}\n-->\n\n${markdownContent}`;
        }

        // Prepend a style-preservation note
        if (this.options.preserveStyles) {
          const styleNote = `<!-- Style preservation note:\nThis document retains the original HTML style information from the conversion.\nTo see the full styling, view it in an environment that supports HTML.\nImage paths have been converted to relative paths; make sure the image files are in the correct location.\n-->\n\n`;
          markdownContent = styleNote + markdownContent;
        }

        // Import the security configuration helper
        const { validateAndSanitizePath } = require('../security/securityConfig');
        // Path restrictions removed to allow access to any directory (consistent with the validatePath function in index.ts)

        // Generate the output path
        const rawOutputPath = this.options.outputPath || inputPath.replace(/\.html?$/i, '.md');
        const outputPath = validateAndSanitizePath(rawOutputPath, []);

        // Save the file
        await fs.writeFile(outputPath, markdownContent, 'utf-8');

        if (this.options.debug) {
          console.log('✅ Enhanced Markdown conversion complete:', outputPath);
        }

        return {
          success: true,
          content: markdownContent,
          outputPath,
          metadata: {
            originalFormat: 'html',
            targetFormat: 'markdown',
            stylesPreserved: this.options.preserveStyles ?? false,
            contentLength: markdownContent.length,
            converter: 'enhanced-html-to-markdown-converter',
          },
        };
      } catch (error: any) {
        console.error('❌ Enhanced HTML to Markdown conversion failed:', error.message);
        return {
          success: false,
          error: error.message,
        };
      }
    }
  • Type definitions for the input options and output result of the HTML-to-Markdown conversion.
    interface HtmlToMarkdownOptions {
      preserveStyles?: boolean;
      includeCSS?: boolean;
      outputPath?: string;
      debug?: boolean;
    }

    interface HtmlToMarkdownResult {
      success: boolean;
      content?: string;
      outputPath?: string;
      metadata?: {
        originalFormat: string;
        targetFormat: string;
        stylesPreserved: boolean;
        contentLength: number;
        converter: string;
      };
      error?: string;
    }
  • Exported wrapper function around the enhanced converter, providing a simplified interface compatible with HtmlConversionOptions and HtmlConversionResult.
    export async function convertHtmlToMarkdown(
      inputPath: string,
      options: HtmlConversionOptions = {}
    ): Promise<HtmlConversionResult> {
      try {
        const enhancedConverter = new EnhancedHtmlToMarkdownConverter();
        const result = await enhancedConverter.convertHtmlToMarkdown(inputPath, {
          preserveStyles: true,
          includeCSS: false,
          outputPath: options.outputPath,
          debug: options.debug ?? false,
        });

        if (!result.success) {
          return {
            success: false,
            error: result.error ?? 'HTML to Markdown conversion failed',
          };
        }

        return {
          success: true,
          outputPath: result.outputPath,
          content: result.content,
          metadata: result.metadata,
        };
      } catch (error: any) {
        return {
          success: false,
          error: error.message,
        };
      }
    }
  • Class method wrapper that delegates to EnhancedHtmlToMarkdownConverter and adds input-file reading, output-path validation, and error handling.
    async convertHtmlToMarkdown(
      inputPath: string,
      options: HtmlConversionOptions = {}
    ): Promise<HtmlConversionResult> {
      try {
        this.options = {
          preserveStyles: false, // Markdown does not support complex styles
          debug: false,
          ...options,
        };

        if (this.options.debug) {
          console.log('🚀 Starting HTML to Markdown conversion...');
          console.log('📄 Input file:', inputPath);
        }

        // Read the HTML file; the contents are not used below, so this mainly
        // fails fast on a missing or unreadable input
        await fs.readFile(inputPath, 'utf-8');

        // Use the enhanced HTML to Markdown converter, forwarding the caller's debug flag.
        // With no outputPath passed, the enhanced converter also writes its own copy
        // next to the input file.
        const enhancedConverter = new EnhancedHtmlToMarkdownConverter();
        const result = await enhancedConverter.convertHtmlToMarkdown(inputPath, {
          preserveStyles: true,
          includeCSS: false,
          debug: this.options.debug ?? false,
        });

        if (!result.success) {
          throw new Error(result.error ?? 'HTML to Markdown conversion failed');
        }

        const markdownContent = result.content ?? '';

        // Import the security configuration helper
        const { validateAndSanitizePath } = require('../security/securityConfig');
        const allowedPaths = [path.dirname(inputPath), process.cwd()];

        // Generate the output path
        const rawOutputPath = this.options.outputPath || inputPath.replace(/\.html?$/i, '.md');
        const outputPath = validateAndSanitizePath(rawOutputPath, allowedPaths);

        // Save the file
        await fs.writeFile(outputPath, markdownContent, 'utf-8');

        if (this.options.debug) {
          console.log('✅ Markdown conversion complete:', outputPath);
        }

        return {
          success: true,
          outputPath,
          content: markdownContent,
          metadata: {
            originalFormat: 'html',
            targetFormat: 'markdown',
            contentLength: markdownContent.length,
            converter: 'html-converter',
          },
        };
      } catch (error: any) {
        console.error('❌ HTML to Markdown conversion failed:', error.message);
        return {
          success: false,
          error: error.message,
        };
      }
    }
  • Tool-name mapping in the conversion planner for conversions from HTML, including HTML to Markdown.
    html: {
      markdown: 'convert_html_to_markdown',
      md: 'convert_html_to_markdown',
      docx: 'convert_document',
      txt: 'convert_document',
      pdf: 'convert_document', // requires an external tool
    },
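In the core `convertHtmlToMarkdown` implementation, the CSS comment is prepended first and the style note second, so the style note ends up at the very top of the output. The pure function below mirrors only that assembly step so the ordering can be checked in isolation; `assembleMarkdown` is a hypothetical name and the note text is abbreviated.

```typescript
interface AssemblyOptions {
  preserveStyles?: boolean;
  includeCSS?: boolean;
}

// Hypothetical mirror of the prepend order in convertHtmlToMarkdown.
function assembleMarkdown(body: string, cssStyles: string, options: AssemblyOptions = {}): string {
  // Same defaulting as the converter: caller options are spread over the defaults.
  const opts = { preserveStyles: true, includeCSS: true, ...options };
  let markdown = body;
  // Step 1: prepend the extracted CSS as an HTML comment.
  if (cssStyles && opts.includeCSS) {
    markdown = `<!-- CSS Styles\n${cssStyles}\n-->\n\n${markdown}`;
  }
  // Step 2: prepend the (abbreviated) style note, which lands above the CSS block.
  if (opts.preserveStyles) {
    markdown = `<!-- Style preservation note -->\n\n${markdown}`;
  }
  return markdown;
}
```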
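Every field of `HtmlToMarkdownResult` except `success` is optional, so a caller has to check `success` before reading `content`. A minimal sketch of a hypothetical `unwrapContent` helper that does this; the interface is copied from the type definitions above so the block compiles on its own.

```typescript
interface HtmlToMarkdownResult {
  success: boolean;
  content?: string;
  outputPath?: string;
  metadata?: {
    originalFormat: string;
    targetFormat: string;
    stylesPreserved: boolean;
    contentLength: number;
    converter: string;
  };
  error?: string;
}

// Hypothetical caller-side helper: returns the Markdown on success,
// otherwise throws with the converter's error message.
function unwrapContent(result: HtmlToMarkdownResult): string {
  if (!result.success) {
    throw new Error(result.error ?? "HTML to Markdown conversion failed");
  }
  // The enhanced converter sets content on success, but the type still allows undefined.
  return result.content ?? "";
}
```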
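In the exported `convertHtmlToMarkdown` wrapper, only `outputPath` and `debug` come from the caller; `preserveStyles` is always `true` and `includeCSS` is always `false`. The sketch below isolates that mapping as a pure function. `HtmlConversionOptions` is not defined in the excerpt, so the shape used here (including optional `preserveStyles` and `includeCSS` fields) is an assumption, and `toEnhancedOptions` is a hypothetical name.

```typescript
// Assumed shape: HtmlConversionOptions is not shown in the excerpt.
interface HtmlConversionOptionsSketch {
  outputPath?: string;
  debug?: boolean;
  preserveStyles?: boolean;
  includeCSS?: boolean;
}

interface EnhancedOptionsSketch {
  preserveStyles: boolean;
  includeCSS: boolean;
  outputPath?: string;
  debug: boolean;
}

// Reproduces the options object the exported wrapper passes to
// EnhancedHtmlToMarkdownConverter.convertHtmlToMarkdown.
function toEnhancedOptions(options: HtmlConversionOptionsSketch = {}): EnhancedOptionsSketch {
  return {
    preserveStyles: true, // fixed; the caller's value is ignored
    includeCSS: false, // fixed; the caller's value is ignored
    outputPath: options.outputPath,
    debug: options.debug ?? false,
  };
}
```

If the tool handler (not shown here) routes through this wrapper, the `includeCSS` argument from the input schema would have no effect on the output.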
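The class-method wrapper derives its default output path by replacing a trailing `.html` or `.htm` (case-insensitive) with `.md`, and passes the input file's directory and the current working directory to `validateAndSanitizePath` as allowed paths. That helper's implementation is not shown, so the containment check below is an assumed, simplified stand-in that handles absolute POSIX paths only; `deriveOutputPath`, `normalizePosix`, and `isInsideAllowed` are hypothetical names.

```typescript
// Same default as the listing: swap a trailing .html/.htm (any case) for .md.
function deriveOutputPath(inputPath: string, outputPath?: string): string {
  return outputPath || inputPath.replace(/\.html?$/i, ".md");
}

// Collapses "." and ".." segments of an absolute POSIX path.
function normalizePosix(p: string): string {
  const parts: string[] = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") parts.pop();
    else parts.push(seg);
  }
  return "/" + parts.join("/");
}

// Assumed stand-in for validateAndSanitizePath: accept the path only if it
// normalizes to a location inside one of the allowed directories.
function isInsideAllowed(target: string, allowedDirs: string[]): boolean {
  const resolved = normalizePosix(target);
  return allowedDirs.some((dir) => {
    const base = normalizePosix(dir);
    // Compare on a trailing "/" so "/docs-evil" is not treated as inside "/docs".
    return resolved === base || resolved.startsWith(base === "/" ? "/" : base + "/");
  });
}
```

An input path without an `.html`/`.htm` extension passes through `replace` unchanged, so the default output path would be the input file itself.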
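A minimal sketch of how the planner's mapping can be queried, assuming it sits in a nested object keyed by source format and then target format. Only the `html` entries from the listing are reproduced, lower-casing the format names is an assumption, and `TOOL_MAP` and `pickTool` are hypothetical names.

```typescript
// Nested lookup: source format -> target format -> MCP tool name.
const TOOL_MAP: Record<string, Record<string, string>> = {
  html: {
    markdown: "convert_html_to_markdown",
    md: "convert_html_to_markdown",
    docx: "convert_document",
    txt: "convert_document",
    pdf: "convert_document", // requires an external tool
  },
};

// Returns the tool name for a conversion, or undefined if none is mapped.
function pickTool(from: string, to: string): string | undefined {
  return TOOL_MAP[from.toLowerCase()]?.[to.toLowerCase()];
}
```

Because `markdown` and `md` map to the same tool, callers can use either spelling of the target format.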

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Tele-AI/doc-ops-mcp'
