convert_html_to_markdown
Convert HTML files to clean Markdown format while preserving structure, links, images, tables, and formatting. Automatically saves converted files to the specified output directory.
Instructions
Enhanced HTML to Markdown conversion with style preservation. The output directory is controlled by the OUTPUT_DIR environment variable: converted files are saved there automatically with auto-generated names.
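
This page does not document how the auto-generated names are built, so the following is only a sketch of one way an OUTPUT_DIR-based path could be derived. The function name `resolveOutputFile`, the timestamp suffix, and the decision to throw when OUTPUT_DIR is unset are all assumptions made for illustration, not the tool's actual behavior.

```typescript
// Hypothetical sketch: the real naming scheme is not documented on this page.
// `env` is passed in (for example process.env) so the function stays pure.
function resolveOutputFile(
  htmlPath: string,
  env: Record<string, string | undefined>,
  now: Date = new Date(),
): string {
  const outputDir = env["OUTPUT_DIR"];
  if (!outputDir) {
    // Assumption: treat a missing OUTPUT_DIR as an error.
    throw new Error("OUTPUT_DIR is not set");
  }

  // Take the file name without its directory and without .html/.htm.
  const fileName = htmlPath.split(/[\\/]/).pop() ?? htmlPath;
  const stem = fileName.replace(/\.html?$/i, "") || "converted";

  // Assumption: a compact UTC timestamp keeps generated names unique.
  const stamp = now.toISOString().replace(/[-:.]/g, "");

  // Drop trailing separators from the directory before joining.
  return `${outputDir.replace(/[\\/]+$/, "")}/${stem}-${stamp}.md`;
}
```

Passing the environment as a parameter keeps the sketch easy to test. A real caller would most likely hand in `process.env`.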
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| htmlPath | Yes | HTML file path to convert | |
| preserveStyles | No | Preserve HTML formatting and styles | |
| includeCSS | No | Include CSS styles as comments in Markdown | |
| debug | No | Enable debug output | |
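
As a rough illustration of the schema above, the arguments can be modelled as a TypeScript type with a small runtime check. The parameter names come from the table. The value types (a string path and boolean flags) are assumptions inferred from the option interfaces in the Implementation Reference, and `parseArgs` is a hypothetical helper, not part of the tool.

```typescript
// Argument names taken from the Input Schema table; value types are assumed.
interface ConvertHtmlToMarkdownArgs {
  htmlPath: string;         // required
  preserveStyles?: boolean; // optional
  includeCSS?: boolean;     // optional
  debug?: boolean;          // optional
}

// Hypothetical helper: checks an untyped payload against the table
// before it is handed to the tool.
function parseArgs(raw: Record<string, unknown>): ConvertHtmlToMarkdownArgs {
  const htmlPath = raw["htmlPath"];
  if (typeof htmlPath !== "string" || htmlPath.length === 0) {
    throw new Error("htmlPath is required and must be a non-empty string");
  }

  const args: ConvertHtmlToMarkdownArgs = { htmlPath };
  for (const key of ["preserveStyles", "includeCSS", "debug"] as const) {
    const value = raw[key];
    if (value === undefined) continue; // optional flag left out
    if (typeof value !== "boolean") {
      throw new Error(`${key} must be a boolean when provided`);
    }
    args[key] = value;
  }
  return args;
}
```

A call such as `parseArgs({ htmlPath: "page.html", debug: true })` returns only the fields that were supplied, so any defaults remain the tool's responsibility.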
Implementation Reference
- Core implementation of the HTML to Markdown conversion logic. It parses the HTML with Cheerio, extracts CSS if requested, converts the structure to Markdown (headings, lists, tables, images, and so on), validates and sanitizes the output path, and saves the output file.

  ```typescript
  async convertHtmlToMarkdown(
    inputPath: string,
    options: HtmlToMarkdownOptions = {}
  ): Promise<HtmlToMarkdownResult> {
    try {
      this.options = {
        preserveStyles: true,
        includeCSS: true,
        debug: false,
        ...options,
      };

      if (this.options.debug) {
        console.log('Starting enhanced HTML to Markdown conversion...');
        console.log('Input file:', inputPath);
      }

      // Read the HTML file
      const htmlContent = await fs.readFile(inputPath, 'utf-8');

      // Parse the HTML with cheerio
      const $ = cheerio.load(htmlContent);

      // Extract CSS styles (if requested)
      let cssStyles = '';
      if (this.options.includeCSS) {
        cssStyles = this.extractCSS($);
      }

      // Convert to Markdown
      let markdownContent = this.htmlToMarkdown($);

      // If CSS was extracted, prepend it to the document
      if (cssStyles && this.options.includeCSS) {
        markdownContent = `<!-- CSS Styles\n${cssStyles}\n-->\n\n${markdownContent}`;
      }

      // Add the style preservation note
      if (this.options.preserveStyles) {
        const styleNote = `<!-- Style preservation note:\nThis document retains the style information of the original HTML.\nTo see the full styling, view it in an environment that supports HTML.\nImage paths have been converted to relative paths; make sure the image files are in the correct location.\n-->\n\n`;
        markdownContent = styleNote + markdownContent;
      }

      // Import the security configuration function
      const { validateAndSanitizePath } = require('../security/securityConfig');

      // Path restrictions removed: any directory may be accessed
      // (kept consistent with the validatePath function in index.ts)

      // Build the output path
      const rawOutputPath = this.options.outputPath || inputPath.replace(/\.html?$/i, '.md');
      const outputPath = validateAndSanitizePath(rawOutputPath, []);

      // Save the file
      await fs.writeFile(outputPath, markdownContent, 'utf-8');

      if (this.options.debug) {
        console.log('Enhanced Markdown conversion complete:', outputPath);
      }

      return {
        success: true,
        content: markdownContent,
        outputPath,
        metadata: {
          originalFormat: 'html',
          targetFormat: 'markdown',
          stylesPreserved: this.options.preserveStyles ?? false,
          contentLength: markdownContent.length,
          converter: 'enhanced-html-to-markdown-converter',
        },
      };
    } catch (error: any) {
      console.error('Enhanced HTML to Markdown conversion failed:', error.message);
      return {
        success: false,
        error: error.message,
      };
    }
  }
  ```
- Type definitions for the input options and output result of the HTML to Markdown conversion.

  ```typescript
  interface HtmlToMarkdownOptions {
    preserveStyles?: boolean;
    includeCSS?: boolean;
    outputPath?: string;
    debug?: boolean;
  }

  interface HtmlToMarkdownResult {
    success: boolean;
    content?: string;
    outputPath?: string;
    metadata?: {
      originalFormat: string;
      targetFormat: string;
      stylesPreserved: boolean;
      contentLength: number;
      converter: string;
    };
    error?: string;
  }
  ```
- src/tools/htmlConverter.ts:1353-1385 (helper) Exported wrapper function around the enhanced converter, providing a simplified interface compatible with HtmlConversionOptions and HtmlConversionResult.

  ```typescript
  export async function convertHtmlToMarkdown(
    inputPath: string,
    options: HtmlConversionOptions = {}
  ): Promise<HtmlConversionResult> {
    try {
      const enhancedConverter = new EnhancedHtmlToMarkdownConverter();
      const result = await enhancedConverter.convertHtmlToMarkdown(inputPath, {
        preserveStyles: true,
        includeCSS: false,
        outputPath: options.outputPath,
        debug: options.debug ?? false,
      });

      if (!result.success) {
        return {
          success: false,
          error: result.error ?? 'HTML to Markdown conversion failed',
        };
      }

      return {
        success: true,
        outputPath: result.outputPath,
        content: result.content,
        metadata: result.metadata,
      };
    } catch (error: any) {
      return {
        success: false,
        error: error.message,
      };
    }
  }
  ```
- src/tools/htmlConverter.ts:160-226 (handler) Class method wrapper that delegates to EnhancedHtmlToMarkdownConverter and adds file reading and error handling.

  ```typescript
  async convertHtmlToMarkdown(
    inputPath: string,
    options: HtmlConversionOptions = {}
  ): Promise<HtmlConversionResult> {
    try {
      this.options = {
        preserveStyles: false, // Markdown does not support complex styles
        debug: false,
        ...options,
      };

      if (this.options.debug) {
        console.log('Starting HTML to Markdown conversion...');
        console.log('Input file:', inputPath);
      }

      // Read the HTML file
      const htmlContent = await fs.readFile(inputPath, 'utf-8');

      // Use the enhanced HTML to Markdown converter
      const enhancedConverter = new EnhancedHtmlToMarkdownConverter();
      const result = await enhancedConverter.convertHtmlToMarkdown(inputPath, {
        preserveStyles: true,
        includeCSS: false,
        debug: true,
      });

      if (!result.success) {
        throw new Error(result.error ?? 'HTML to Markdown conversion failed');
      }

      const markdownContent = result.content ?? '';

      // Import the security configuration function
      const { validateAndSanitizePath } = require('../security/securityConfig');
      const allowedPaths = [path.dirname(inputPath), process.cwd()];

      // Build the output path
      const rawOutputPath = this.options.outputPath || inputPath.replace(/\.html?$/i, '.md');
      const outputPath = validateAndSanitizePath(rawOutputPath, allowedPaths);

      // Save the file
      await fs.writeFile(outputPath, markdownContent, 'utf-8');

      if (this.options.debug) {
        console.log('Markdown conversion complete:', outputPath);
      }

      return {
        success: true,
        outputPath,
        content: markdownContent,
        metadata: {
          originalFormat: 'html',
          targetFormat: 'markdown',
          contentLength: markdownContent.length,
          converter: 'html-converter',
        },
      };
    } catch (error: any) {
      console.error('HTML to Markdown conversion failed:', error.message);
      return {
        success: false,
        error: error.message,
      };
    }
  }
  ```
- src/tools/conversionPlanner.ts:67-73 (helper) Tool name mapping in the conversion planner for HTML to Markdown conversions.

  ```typescript
  html: {
    markdown: 'convert_html_to_markdown',
    md: 'convert_html_to_markdown',
    docx: 'convert_document',
    txt: 'convert_document',
    pdf: 'convert_document', // requires an external tool
  },
  ```
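
Three small rules from the core implementation can be pulled out as pure functions to make their behavior easier to see: the option defaults, the default output path, and the CSS comment header. This is only a sketch. The function names `applyDefaults`, `defaultOutputPath`, and `prependCssComment` are hypothetical, and `| undefined` has been added to the option types so that explicitly undefined values type-check under stricter compiler settings.

```typescript
// Option shape adapted from the HtmlToMarkdownOptions listing.
interface HtmlToMarkdownOptions {
  preserveStyles?: boolean | undefined;
  includeCSS?: boolean | undefined;
  outputPath?: string | undefined;
  debug?: boolean | undefined;
}

// Same defaults and spread order as the core implementation:
// caller-supplied keys win over the defaults.
function applyDefaults(options: HtmlToMarkdownOptions = {}): HtmlToMarkdownOptions {
  return { preserveStyles: true, includeCSS: true, debug: false, ...options };
}

// Same rule as the listing: swap a trailing .html or .htm (any case) for .md.
function defaultOutputPath(inputPath: string): string {
  return inputPath.replace(/\.html?$/i, ".md");
}

// Same template as the listing: non-empty CSS becomes a leading HTML comment.
function prependCssComment(markdown: string, css: string): string {
  return css ? `<!-- CSS Styles\n${css}\n-->\n\n${markdown}` : markdown;
}
```

Two edge cases follow from these rules. A key passed explicitly as `undefined` overrides its default through the spread, which is presumably why the listing falls back with `?? false` when reporting `stylesPreserved`. An input path that does not end in `.html` or `.htm` is returned unchanged by the regular expression, so with no `outputPath` the default output path is the input path itself.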
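
Because the converter reports failures through `success: false` rather than by throwing, callers need to branch on the result. The sketch below shows one way to do that against the HtmlToMarkdownResult shape from the type definitions. `describeResult` and its message wording are hypothetical.

```typescript
// Result shape copied from the HtmlToMarkdownResult listing.
interface HtmlToMarkdownResult {
  success: boolean;
  content?: string;
  outputPath?: string;
  metadata?: {
    originalFormat: string;
    targetFormat: string;
    stylesPreserved: boolean;
    contentLength: number;
    converter: string;
  };
  error?: string;
}

// Hypothetical helper: turns a result into a one-line status message.
function describeResult(result: HtmlToMarkdownResult): string {
  if (!result.success) {
    // Every field except `success` is optional, so fall back defensively.
    return `conversion failed: ${result.error ?? "unknown error"}`;
  }
  // Prefer the reported length, then the content itself, then zero.
  const length = result.metadata?.contentLength ?? result.content?.length ?? 0;
  return `wrote ${result.outputPath ?? "(no path reported)"} (${length} characters)`;
}
```

Since every field other than `success` is optional in the type, the helper supplies fallbacks instead of assuming `outputPath` or `metadata` is present.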
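
The planner mapping amounts to a lookup from a target format to a tool name. A minimal sketch of that lookup for the `html` entry follows. The `toolForHtmlTarget` function and its lowercasing of the input are assumptions; the planner's real lookup code is not shown on this page.

```typescript
// Entries copied from the `html` mapping in the conversion planner listing.
const htmlTargets = new Map<string, string>([
  ["markdown", "convert_html_to_markdown"],
  ["md", "convert_html_to_markdown"],
  ["docx", "convert_document"],
  ["txt", "convert_document"],
  ["pdf", "convert_document"], // requires an external tool, per the listing
]);

// Hypothetical lookup: returns the tool name for an HTML source and the
// given target format, or undefined when the format is not mapped.
function toolForHtmlTarget(targetFormat: string): string | undefined {
  // Assumption: format names are matched case-insensitively.
  return htmlTargets.get(targetFormat.toLowerCase());
}
```

A `Map` is used here instead of a plain object so that inherited keys such as `constructor` cannot be mistaken for formats.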