cablate

Simple Document Processing MCP Server

html_to_text

Convert HTML files to structured plain text for easier processing. Save the output to a specified directory.

Instructions

Convert HTML to plain text while preserving structure

Input Schema

Name       Required  Description                                Default
inputPath  Yes       Path to the input HTML file                (none)
outputDir  Yes       Directory where text file should be saved  (none)
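
As a rough illustration of the schema above, the sketch below checks that an incoming arguments object carries both required string fields before it is handed to the tool. The `HtmlToTextArgs` interface, the `isHtmlToTextArgs` guard, and the example paths are hypothetical names introduced here for illustration; they are not part of the server's source.

```typescript
// Hypothetical type and guard mirroring the input schema; not taken from the server's code.
interface HtmlToTextArgs {
  inputPath: string;
  outputDir: string;
}

// Returns true only when both required fields are present and are strings.
function isHtmlToTextArgs(value: unknown): value is HtmlToTextArgs {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.inputPath === "string" &&
    typeof record.outputDir === "string"
  );
}

// Example arguments as a client might send them (placeholder paths).
const exampleArgs: unknown = {
  inputPath: "/tmp/report.html",
  outputDir: "/tmp/out",
};

if (isHtmlToTextArgs(exampleArgs)) {
  console.log(`Would convert ${exampleArgs.inputPath} into ${exampleArgs.outputDir}`);
}
```

A guard like this is one way to avoid relying on a bare type assertion when the arguments come from an untrusted client.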

Implementation Reference

  • The core handler function that implements the html_to_text tool logic: reads HTML file, parses with JSDOM, extracts plain text from body, generates unique filename, writes to output directory.
    export async function htmlToText(inputPath: string, outputDir: string) {
      try {
        console.error(`Starting HTML to text conversion...`);
        console.error(`Input file: ${inputPath}`);
        console.error(`Output directory: ${outputDir}`);

        // Ensure the output directory exists
        try {
          await fs.access(outputDir);
          console.error(`Output directory exists: ${outputDir}`);
        } catch {
          console.error(`Creating output directory: ${outputDir}`);
          await fs.mkdir(outputDir, { recursive: true });
          console.error(`Created output directory: ${outputDir}`);
        }

        const uniqueId = generateUniqueId();
        const htmlContent = await fs.readFile(inputPath, "utf-8");
        const dom = new JSDOM(htmlContent);
        const { document } = dom.window;

        // Structure-preserving text conversion
        // (textContent concatenates the text nodes under <body>)
        const text = document.body.textContent?.trim() || "";

        const outputPath = path.join(outputDir, `text_${uniqueId}.txt`);
        await fs.writeFile(outputPath, text);
        console.error(`Written text to ${outputPath}`);

        return {
          success: true,
          data: `Successfully converted HTML to text: ${outputPath}`,
        };
      } catch (error) {
        console.error(`Error in htmlToText:`, error);
        return {
          success: false,
          error: error instanceof Error ? error.message : "Unknown error",
        };
      }
    }
  • The Tool object definition including name, description, and inputSchema for validating arguments to the html_to_text tool.
    export const HTML_TO_TEXT_TOOL: Tool = {
      name: "html_to_text",
      description: "Convert HTML to plain text while preserving structure",
      inputSchema: {
        type: "object",
        properties: {
          inputPath: {
            type: "string",
            description: "Path to the input HTML file",
          },
          outputDir: {
            type: "string",
            description: "Directory where text file should be saved",
          },
        },
        required: ["inputPath", "outputDir"],
      },
    };
  • Registration of the HTML_TO_TEXT_TOOL in the central tools array exported for the ListToolsRequestSchema handler.
    export const tools = [
      DOCUMENT_READER_TOOL,
      PDF_MERGE_TOOL,
      PDF_SPLIT_TOOL,
      DOCX_TO_PDF_TOOL,
      DOCX_TO_HTML_TOOL,
      HTML_CLEAN_TOOL,
      HTML_TO_TEXT_TOOL,
      HTML_TO_MARKDOWN_TOOL,
      HTML_EXTRACT_RESOURCES_TOOL,
      HTML_FORMAT_TOOL,
      TEXT_DIFF_TOOL,
      TEXT_SPLIT_TOOL,
      TEXT_FORMAT_TOOL,
      TEXT_ENCODING_CONVERT_TOOL,
      EXCEL_READ_TOOL,
      FORMAT_CONVERTER_TOOL,
    ];
  • src/index.ts:168-184 (registration)
    Dispatch/execution logic in the main CallToolRequestSchema handler that invokes the htmlToText function when the tool name matches.
    if (name === "html_to_text") {
      const { inputPath, outputDir } = args as {
        inputPath: string;
        outputDir: string;
      };
      const result = await htmlToText(inputPath, outputDir);
      if (!result.success) {
        return {
          content: [{ type: "text", text: `Error: ${result.error}` }],
          isError: true,
        };
      }
      return {
        content: [{ type: "text", text: fileOperationResponse(result.data) }],
        isError: false,
      };
    }
  • Helper utility function used by htmlToText (and others) to generate unique IDs for output filenames.
    function generateUniqueId(): string {
      return randomBytes(9).toString("hex");
    }
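
The pieces listed above fit together roughly as follows: the handler returns a success-or-error result, the dispatch branch turns that result into a tool response, and the output filename is built from a random hex ID. The sketch below is a minimal, file-system-free illustration of that flow. The `ToolResult` type, `buildOutputPath`, and `toToolResponse` are hypothetical names introduced here; the real dispatch code also passes the success message through `fileOperationResponse`, which is not shown in the listing, so this sketch returns the message unchanged.

```typescript
import { randomBytes } from "node:crypto";
import { join } from "node:path";

// Assumed result shape, written as a discriminated union; the source shows no explicit type.
type ToolResult =
  | { success: true; data: string }
  | { success: false; error: string };

// Same approach as the helper in the listing: 9 random bytes become 18 hex characters.
function generateUniqueId(): string {
  return randomBytes(9).toString("hex");
}

// Hypothetical helper: builds <outputDir>/text_<id>.txt, as the handler does inline.
function buildOutputPath(outputDir: string, id: string = generateUniqueId()): string {
  return join(outputDir, `text_${id}.txt`);
}

// Hypothetical helper mirroring the dispatch branch: errors are prefixed and flagged,
// successes are passed through as plain text.
function toToolResponse(result: ToolResult) {
  if (result.success === false) {
    return {
      content: [{ type: "text", text: `Error: ${result.error}` }],
      isError: true,
    };
  }
  return {
    content: [{ type: "text", text: result.data }],
    isError: false,
  };
}

console.log(buildOutputPath("out"));
console.log(toToolResponse({ success: false, error: "file not found" }));
```

Because the ID comes from random bytes rather than the input filename, repeated conversions of the same file should produce distinct output files instead of overwriting one another.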

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/cablate/mcp-doc-forge'
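
The same request can be sketched in TypeScript. This assumes a runtime with a global `fetch` (for example Node.js 18 or later); `serverUrl` and `fetchServerInfo` are hypothetical helper names, and the response is left as `unknown` because its shape is not described here.

```typescript
const API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Builds the directory API URL for a given owner and server name.
function serverUrl(owner: string, name: string): string {
  return `${API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Fetches the server record and returns the parsed JSON body.
// The body is typed as unknown since its schema is not documented in this page.
async function fetchServerInfo(owner: string, name: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, name));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Usage (performs a network request):
// fetchServerInfo("cablate", "mcp-doc-forge").then((info) => console.log(info));
```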

If you have feedback or need assistance with the MCP directory API, please join our Discord server.