Glama

convert_html_to_markdown

Converts HTML files to Markdown while preserving structure, links, images, and tables. Output is saved automatically to a configured directory.

Instructions

Enhanced HTML to Markdown conversion with style preservation. Converts HTML files to clean Markdown while preserving structure, links, images, tables, and formatting. The output directory is controlled by the OUTPUT_DIR environment variable; files are saved there automatically with auto-generated names.

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| debug | No | Enable debug output | false |
| htmlPath | Yes | HTML file path to convert | |
| includeCSS | No | Include CSS styles as comments in Markdown | false |
| preserveStyles | No | Preserve HTML formatting and styles | true |

Input Schema (JSON Schema)

{
  "properties": {
    "debug": {
      "default": false,
      "description": "Enable debug output",
      "type": "boolean"
    },
    "htmlPath": {
      "description": "HTML file path to convert",
      "type": "string"
    },
    "includeCSS": {
      "default": false,
      "description": "Include CSS styles as comments in Markdown",
      "type": "boolean"
    },
    "preserveStyles": {
      "default": true,
      "description": "Preserve HTML formatting and styles",
      "type": "boolean"
    }
  },
  "required": [
    "htmlPath"
  ],
  "type": "object"
}
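As a rough illustration of how a caller might apply this schema before invoking the tool, the sketch below mirrors the four properties in a TypeScript type and fills in the schema's defaults. The names `ConvertHtmlToMarkdownArgs` and `applySchemaDefaults` are hypothetical and do not come from the server source; only the property names, types, and default values are taken from the schema above.

```typescript
// Hypothetical type mirroring the JSON Schema above (not from the server source).
interface ConvertHtmlToMarkdownArgs {
  htmlPath: string;          // required by the schema
  debug?: boolean;           // schema default: false
  includeCSS?: boolean;      // schema default: false
  preserveStyles?: boolean;  // schema default: true
}

// Hypothetical helper: checks the one required field and applies schema defaults.
function applySchemaDefaults(
  args: ConvertHtmlToMarkdownArgs
): Required<ConvertHtmlToMarkdownArgs> {
  if (typeof args.htmlPath !== "string") {
    throw new Error("htmlPath is required and must be a string");
  }
  return {
    htmlPath: args.htmlPath,
    debug: args.debug ?? false,
    includeCSS: args.includeCSS ?? false,
    preserveStyles: args.preserveStyles ?? true,
  };
}

// Example: only the required field is supplied; the rest fall back to defaults.
const resolved = applySchemaDefaults({ htmlPath: "./docs/report.html" });
console.log(resolved);
```

Using `??` rather than `||` keeps an explicit `false` from the caller intact, which matters for `preserveStyles`, whose default is `true`.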

Implementation Reference

  • Top-level exported handler function for the 'convert_html_to_markdown' MCP tool. Delegates to EnhancedHtmlToMarkdownConverter, which reads the HTML input file, sanitizes the output path, and saves the Markdown output file, then returns the conversion result. preserveStyles and includeCSS are hard-coded in this handler, so only outputPath and debug are taken from the caller's options.
    export async function convertHtmlToMarkdown(
      inputPath: string,
      options: HtmlConversionOptions = {}
    ): Promise<HtmlConversionResult> {
      try {
        const enhancedConverter = new EnhancedHtmlToMarkdownConverter();
        const result = await enhancedConverter.convertHtmlToMarkdown(inputPath, {
          preserveStyles: true,
          includeCSS: false,
          outputPath: options.outputPath,
          debug: options.debug ?? false,
        });

        if (!result.success) {
          return {
            success: false,
            error: result.error ?? 'HTML to Markdown conversion failed',
          };
        }

        return {
          success: true,
          outputPath: result.outputPath,
          content: result.content,
          metadata: result.metadata,
        };
      } catch (error: any) {
        return {
          success: false,
          error: error.message,
        };
      }
    }
  • Core helper class method implementing the HTML to Markdown conversion logic using Cheerio for parsing, CSS extraction, entity decoding, table/list/image handling, and file output.
    async convertHtmlToMarkdown(
      inputPath: string,
      options: HtmlToMarkdownOptions = {}
    ): Promise<HtmlToMarkdownResult> {
      try {
        this.options = {
          preserveStyles: true,
          includeCSS: true,
          debug: false,
          ...options,
        };

        if (this.options.debug) {
          console.log('🚀 Starting enhanced HTML to Markdown conversion...');
          console.log('📄 Input file:', inputPath);
        }

        // Read the HTML file
        const htmlContent = await fs.readFile(inputPath, 'utf-8');

        // Parse the HTML with cheerio
        const $ = cheerio.load(htmlContent);

        // Extract CSS styles (if requested)
        let cssStyles = '';
        if (this.options.includeCSS) {
          cssStyles = this.extractCSS($);
        }

        // Convert to Markdown
        let markdownContent = this.htmlToMarkdown($);

        // If CSS was extracted, prepend it to the document
        if (cssStyles && this.options.includeCSS) {
          markdownContent = `<!-- CSS Styles\n${cssStyles}\n-->\n\n${markdownContent}`;
        }

        // Add the style-preservation note
        if (this.options.preserveStyles) {
          const styleNote = `<!-- Style preservation note:\nThis document retains style information from the original HTML.\nTo see the full styling, view it in an environment that supports HTML.\nImage paths have been converted to relative paths; make sure the image files are in the correct location.\n-->\n\n`;
          markdownContent = styleNote + markdownContent;
        }

        // Import the security configuration function
        const { validateAndSanitizePath } = require('../security/securityConfig');
        // Path restrictions removed: any directory may be accessed
        // (consistent with the validatePath function in index.ts)

        // Build the output path
        const rawOutputPath = this.options.outputPath || inputPath.replace(/\.html?$/i, '.md');
        const outputPath = validateAndSanitizePath(rawOutputPath, []);

        // Save the file
        await fs.writeFile(outputPath, markdownContent, 'utf-8');

        if (this.options.debug) {
          console.log('✅ Enhanced Markdown conversion complete:', outputPath);
        }

        return {
          success: true,
          content: markdownContent,
          outputPath,
          metadata: {
            originalFormat: 'html',
            targetFormat: 'markdown',
            stylesPreserved: this.options.preserveStyles ?? false,
            contentLength: markdownContent.length,
            converter: 'enhanced-html-to-markdown-converter',
          },
        };
      } catch (error: any) {
        console.error('❌ Enhanced HTML to Markdown conversion failed:', error.message);
        return {
          success: false,
          error: error.message,
        };
      }
    }
  • Type definition for conversion options used by the handler, including preserveStyles, outputPath, debug, and format-specific options.
    interface HtmlConversionOptions {
      preserveStyles?: boolean;
      outputPath?: string;
      debug?: boolean;
      // PDF-specific options
      pdfOptions?: {
        format?: 'A4' | 'A3' | 'Letter';
        orientation?: 'portrait' | 'landscape';
        margins?: {
          top?: string;
          bottom?: string;
          left?: string;
          right?: string;
        };
      };
      // DOCX-specific options
      docxOptions?: {
        fontSize?: number;
        fontFamily?: string;
        lineSpacing?: number;
      };
    }
  • Additional class method in HtmlConverter that implements HTML to Markdown conversion, delegating to the enhanced converter and handling file I/O.
    async convertHtmlToMarkdown(
      inputPath: string,
      options: HtmlConversionOptions = {}
    ): Promise<HtmlConversionResult> {
      try {
        this.options = {
          preserveStyles: false, // Markdown does not support complex styles
          debug: false,
          ...options,
        };

        if (this.options.debug) {
          console.log('🚀 Starting HTML to Markdown conversion...');
          console.log('📄 Input file:', inputPath);
        }

        // Read the HTML file
        const htmlContent = await fs.readFile(inputPath, 'utf-8');

        // Use the enhanced HTML-to-Markdown converter
        const enhancedConverter = new EnhancedHtmlToMarkdownConverter();
        const result = await enhancedConverter.convertHtmlToMarkdown(inputPath, {
          preserveStyles: true,
          includeCSS: false,
          debug: true,
        });

        if (!result.success) {
          throw new Error(result.error ?? 'HTML to Markdown conversion failed');
        }

        const markdownContent = result.content ?? '';

        // Import the security configuration function
        const { validateAndSanitizePath } = require('../security/securityConfig');
        const allowedPaths = [path.dirname(inputPath), process.cwd()];

        // Build the output path
        const rawOutputPath = this.options.outputPath || inputPath.replace(/\.html?$/i, '.md');
        const outputPath = validateAndSanitizePath(rawOutputPath, allowedPaths);

        // Save the file
        await fs.writeFile(outputPath, markdownContent, 'utf-8');

        if (this.options.debug) {
          console.log('✅ Markdown conversion complete:', outputPath);
        }

        return {
          success: true,
          outputPath,
          content: markdownContent,
          metadata: {
            originalFormat: 'html',
            targetFormat: 'markdown',
            contentLength: markdownContent.length,
            converter: 'html-converter',
          },
        };
      } catch (error: any) {
        console.error('❌ HTML to Markdown conversion failed:', error.message);
        return {
          success: false,
          error: error.message,
        };
      }
    }
  • Tool name registration/mapping in directConversions object for HTML to Markdown conversion planning.
    html: {
      markdown: 'convert_html_to_markdown',
      md: 'convert_html_to_markdown',
      docx: 'convert_document',
      txt: 'convert_document',
      pdf: 'convert_document', // requires external tools
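
The excerpts above contain two small pieces of logic that are easy to try in isolation: the format-to-tool lookup in directConversions and the output-path fallback used by both converter methods. The sketch below is a minimal, standalone approximation. The helper names `pickTool` and `deriveOutputPath` are hypothetical, the lookup table holds only the `html` entries visible in the truncated excerpt, the case-insensitive matching is an assumption, and the real code additionally passes the derived path through validateAndSanitizePath, which is not reproduced here.

```typescript
// Entries copied from the excerpt above; the original table is truncated there
// and may contain more source formats.
const directConversions: Record<string, Record<string, string>> = {
  html: {
    markdown: "convert_html_to_markdown",
    md: "convert_html_to_markdown",
    docx: "convert_document",
    txt: "convert_document",
    pdf: "convert_document",
  },
};

// Hypothetical helper: find the tool name for a source/target format pair.
// Lower-casing the inputs is an assumption, not confirmed by the source.
function pickTool(source: string, target: string): string | undefined {
  return directConversions[source.toLowerCase()]?.[target.toLowerCase()];
}

// Mirrors the fallback in both converter methods: an explicit outputPath wins,
// otherwise a trailing .html or .htm (any case) is replaced with .md.
function deriveOutputPath(inputPath: string, outputPath?: string): string {
  return outputPath || inputPath.replace(/\.html?$/i, ".md");
}

console.log(pickTool("html", "md"));              // convert_html_to_markdown
console.log(deriveOutputPath("docs/index.HTM"));  // docs/index.md
console.log(deriveOutputPath("docs/README"));     // docs/README
```

The last call shows a consequence of the regex-based fallback: an input path without an .html or .htm extension comes back unchanged, so a caller would probably want to pass outputPath explicitly in that case to avoid writing over the input file.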

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Tele-AI/doc-ops-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.