convert_html_to_markdown

Convert HTML files to clean Markdown format while preserving structure, links, images, tables, and formatting. Automatically saves converted files to the specified output directory.

Instructions

Enhanced HTML-to-Markdown conversion with style preservation. The output directory is controlled by the OUTPUT_DIR environment variable: converted files are saved to OUTPUT_DIR automatically under auto-generated names.
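
The page does not say how the auto-generated name is chosen. As a minimal sketch, assuming the name is simply the input file's base name with a .md extension, the output path could be derived from OUTPUT_DIR like this. The helper name resolveOutputPath and the fallback to the current working directory are hypothetical and are not taken from the server's code.

```typescript
import * as path from 'node:path';

// Hypothetical helper: derives an output path inside OUTPUT_DIR.
// Assumption: the generated name is the input base name with a .md extension.
function resolveOutputPath(
  htmlPath: string,
  outputDir: string = process.env.OUTPUT_DIR ?? process.cwd()
): string {
  // Strip a trailing .html or .htm (case-insensitive), as the reference code does.
  const baseName = path.basename(htmlPath).replace(/\.html?$/i, '');
  return path.join(outputDir, `${baseName}.md`);
}

console.log(resolveOutputPath('/tmp/site/index.html', '/tmp/out'));
```

Passing the directory as a parameter keeps the helper easy to test without touching the real environment.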

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| htmlPath | Yes | HTML file path to convert | |
| preserveStyles | No | Preserve HTML formatting and styles | |
| includeCSS | No | Include CSS styles as comments in Markdown | |
| debug | No | Enable debug output | |
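
The schema above can be mirrored as a TypeScript type with a small runtime check. This is only a sketch: the schema lists no types, so treating the three optional flags as booleans is an assumption based on the option types in the implementation reference, and the names ConvertHtmlToMarkdownArgs and parseArgs are hypothetical.

```typescript
// Hypothetical argument shape for the convert_html_to_markdown tool.
// Assumption: the optional flags are booleans.
interface ConvertHtmlToMarkdownArgs {
  htmlPath: string;
  preserveStyles?: boolean;
  includeCSS?: boolean;
  debug?: boolean;
}

// Validates untyped input against the schema and returns typed arguments.
function parseArgs(input: unknown): ConvertHtmlToMarkdownArgs {
  if (typeof input !== 'object' || input === null) {
    throw new Error('arguments must be an object');
  }
  const raw = input as Record<string, unknown>;
  if (typeof raw.htmlPath !== 'string' || raw.htmlPath.length === 0) {
    throw new Error('htmlPath is required and must be a non-empty string');
  }
  const args: ConvertHtmlToMarkdownArgs = { htmlPath: raw.htmlPath };
  for (const key of ['preserveStyles', 'includeCSS', 'debug'] as const) {
    const value = raw[key];
    if (value === undefined) continue; // optional flag not supplied
    if (typeof value !== 'boolean') {
      throw new Error(`${key} must be a boolean`);
    }
    args[key] = value;
  }
  return args;
}

console.log(parseArgs({ htmlPath: 'docs/index.html', includeCSS: true }));
```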

Implementation Reference

  • Core implementation of the HTML to Markdown conversion logic. It parses the HTML with Cheerio, extracts CSS when includeCSS is set, converts the structure to Markdown (headings, lists, tables, images, etc.), validates the output path, and saves the output file.
    async convertHtmlToMarkdown(
      inputPath: string,
      options: HtmlToMarkdownOptions = {}
    ): Promise<HtmlToMarkdownResult> {
      try {
        this.options = {
          preserveStyles: true,
          includeCSS: true,
          debug: false,
          ...options,
        };
    
        if (this.options.debug) {
          console.log('🚀 Starting enhanced HTML to Markdown conversion...');
          console.log('📄 Input file:', inputPath);
        }
    
        // Read the HTML file
        const htmlContent = await fs.readFile(inputPath, 'utf-8');
    
        // Parse the HTML with cheerio
        const $ = cheerio.load(htmlContent);
    
        // Extract CSS styles (if requested)
        let cssStyles = '';
        if (this.options.includeCSS) {
          cssStyles = this.extractCSS($);
        }
    
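        // Convert to Markdown
        let markdownContent = this.htmlToMarkdown($);
    
        // If CSS was extracted, prepend it to the document
        if (cssStyles && this.options.includeCSS) {
          markdownContent = `<!-- CSS Styles\n${cssStyles}\n-->\n\n${markdownContent}`;
        }
    
        // Add a note about style preservation
        if (this.options.preserveStyles) {
          const styleNote = `<!-- Style preservation note:\nStyle information from the original HTML was preserved during conversion.\nTo see the full styling, view this document in an environment that supports HTML.\nImage paths have been converted to relative paths; make sure the image files are in the correct location.\n-->\n\n`;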
          markdownContent = styleNote + markdownContent;
        }
    
        // Import the security configuration helper
        const { validateAndSanitizePath } = require('../security/securityConfig');
        // Path restrictions removed: any directory may be accessed (consistent with the validatePath function in index.ts)
        
        // Generate the output path
        const rawOutputPath = this.options.outputPath || inputPath.replace(/\.html?$/i, '.md');
        const outputPath = validateAndSanitizePath(rawOutputPath, []);
    
        // Save the file
        await fs.writeFile(outputPath, markdownContent, 'utf-8');
    
        if (this.options.debug) {
          console.log('✅ Enhanced Markdown conversion complete:', outputPath);
        }
    
        return {
          success: true,
          content: markdownContent,
          outputPath,
          metadata: {
            originalFormat: 'html',
            targetFormat: 'markdown',
            stylesPreserved: this.options.preserveStyles ?? false,
            contentLength: markdownContent.length,
            converter: 'enhanced-html-to-markdown-converter',
          },
        };
      } catch (error: any) {
        console.error('❌ Enhanced HTML to Markdown conversion failed:', error.message);
        return {
          success: false,
          error: error.message,
        };
      }
    }
  • Type definitions for input options and output result of the HTML to Markdown conversion.
    interface HtmlToMarkdownOptions {
      preserveStyles?: boolean;
      includeCSS?: boolean;
      outputPath?: string;
      debug?: boolean;
    }
    
    interface HtmlToMarkdownResult {
      success: boolean;
      content?: string;
      outputPath?: string;
      metadata?: {
        originalFormat: string;
        targetFormat: string;
        stylesPreserved: boolean;
        contentLength: number;
        converter: string;
      };
      error?: string;
    }
  • Exported wrapper function around the enhanced converter, providing a simplified interface compatible with HtmlConversionOptions and HtmlConversionResult.
    export async function convertHtmlToMarkdown(
      inputPath: string,
      options: HtmlConversionOptions = {}
    ): Promise<HtmlConversionResult> {
      try {
        const enhancedConverter = new EnhancedHtmlToMarkdownConverter();
        const result = await enhancedConverter.convertHtmlToMarkdown(inputPath, {
          preserveStyles: true,
          includeCSS: false,
          outputPath: options.outputPath,
          debug: options.debug ?? false,
        });
    
        if (!result.success) {
          return {
            success: false,
            error: result.error ?? 'HTML to Markdown conversion failed',
          };
        }
    
        return {
          success: true,
          outputPath: result.outputPath,
          content: result.content,
          metadata: result.metadata,
        };
      } catch (error: any) {
        return {
          success: false,
          error: error.message,
        };
      }
    }
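  • Class method wrapper that delegates to EnhancedHtmlToMarkdownConverter, then validates the output path against the input file's directory and the current working directory before writing the result. Includes file reading and error handling.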
    async convertHtmlToMarkdown(
      inputPath: string,
      options: HtmlConversionOptions = {}
    ): Promise<HtmlConversionResult> {
      try {
        this.options = {
          preserveStyles: false, // Markdown does not support complex styles
          debug: false,
          ...options,
        };
    
        if (this.options.debug) {
          console.log('🚀 Starting HTML to Markdown conversion...');
          console.log('📄 Input file:', inputPath);
        }
    
        // Read the HTML file
        const htmlContent = await fs.readFile(inputPath, 'utf-8');
    
        // Use the enhanced HTML to Markdown converter
        const enhancedConverter = new EnhancedHtmlToMarkdownConverter();
        const result = await enhancedConverter.convertHtmlToMarkdown(inputPath, {
          preserveStyles: true,
          includeCSS: false,
          debug: true,
        });
    
        if (!result.success) {
          throw new Error(result.error ?? 'HTML to Markdown conversion failed');
        }
    
        const markdownContent = result.content ?? '';
    
        // Import the security configuration helper
        const { validateAndSanitizePath } = require('../security/securityConfig');
        const allowedPaths = [path.dirname(inputPath), process.cwd()];
        
        // Generate the output path
        const rawOutputPath = this.options.outputPath || inputPath.replace(/\.html?$/i, '.md');
        const outputPath = validateAndSanitizePath(rawOutputPath, allowedPaths);
    
        // Save the file
        await fs.writeFile(outputPath, markdownContent, 'utf-8');
    
        if (this.options.debug) {
          console.log('✅ Markdown conversion complete:', outputPath);
        }
    
        return {
          success: true,
          outputPath,
          content: markdownContent,
          metadata: {
            originalFormat: 'html',
            targetFormat: 'markdown',
            contentLength: markdownContent.length,
            converter: 'html-converter',
          },
        };
      } catch (error: any) {
        console.error('❌ HTML to Markdown conversion failed:', error.message);
        return {
          success: false,
          error: error.message,
        };
      }
    }
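  • Tool name mapping in the conversion planner for HTML sources. Markdown targets (markdown, md) route to convert_html_to_markdown; the other targets route to convert_document.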
    html: {
      markdown: 'convert_html_to_markdown',
      md: 'convert_html_to_markdown',
      docx: 'convert_document',
      txt: 'convert_document',
      pdf: 'convert_document', // requires an external tool
    },
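
To illustrate how a planner might consult the mapping above, here is a minimal lookup sketch. Only the html entry is copied from the reference. The names toolMap and pickTool, and the case-insensitive matching, are assumptions made for illustration and may not match the planner's actual code.

```typescript
// Mapping copied from the reference above; only the html source is shown.
const toolMap: Record<string, Record<string, string>> = {
  html: {
    markdown: 'convert_html_to_markdown',
    md: 'convert_html_to_markdown',
    docx: 'convert_document',
    txt: 'convert_document',
    pdf: 'convert_document', // requires an external tool
  },
};

// Hypothetical lookup: returns the tool name for a source/target pair,
// or undefined when the pair is not in the map.
function pickTool(source: string, target: string): string | undefined {
  return toolMap[source.toLowerCase()]?.[target.toLowerCase()];
}

console.log(pickTool('html', 'md'));
console.log(pickTool('html', 'pdf'));
```

Returning undefined for unknown pairs leaves it to the caller to decide whether to fail or fall back to another conversion route.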

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Tele-AI/doc-ops-mcp'
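
The same request can be sketched in TypeScript using the global fetch available in Node 18+ and modern browsers. The URL is the one from the curl command above. The helper names serverUrl and fetchServerInfo are hypothetical, and treating the response body as JSON is an assumption, since its shape is not described here.

```typescript
const API_BASE = 'https://glama.ai/api/mcp/v1/servers';

// Builds the directory URL for a server identified by owner and repository name.
function serverUrl(owner: string, repo: string): string {
  return `${API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// Hypothetical helper: fetches the server entry and parses it as JSON.
// Assumption: the endpoint responds with a JSON body.
async function fetchServerInfo(owner: string, repo: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, repo));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Usage (performs a network request):
// fetchServerInfo('Tele-AI', 'doc-ops-mcp').then((info) => console.log(info));
```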