html_to_text

Extract plain text from HTML files while maintaining document structure for easier reading and processing.

Instructions

Convert HTML to plain text while preserving structure

Input Schema

Name	Required	Description	Default
`inputPath`	Yes	Path to the input HTML file
`outputDir`	Yes	Directory where text file should be saved
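
To make the schema concrete, here is a minimal sketch of a request that invokes this tool. It assumes the standard MCP JSON-RPC `tools/call` method. The helper name `buildHtmlToTextRequest`, the request id, and both example paths are illustrative assumptions, not part of the server's code.

```typescript
// Argument shape taken from the Input Schema table above.
interface HtmlToTextArgs {
  inputPath: string; // path to the input HTML file
  outputDir: string; // directory where the text file should be saved
}

// Hypothetical helper: build a JSON-RPC 2.0 "tools/call" request for html_to_text.
// The id and the example paths below are placeholders.
function buildHtmlToTextRequest(id: number, args: HtmlToTextArgs) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "html_to_text", arguments: args },
  };
}

const request = buildHtmlToTextRequest(1, {
  inputPath: "/tmp/example/page.html",
  outputDir: "/tmp/example/out",
});

console.log(JSON.stringify(request, null, 2));
```

Both properties are required, so a request that omits either one does not match the schema.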

Implementation Reference

src/tools/htmlTools.ts:165-203 (handler)
The core execution function for the html_to_text tool. Reads HTML file using fs, parses with JSDOM, extracts textContent from body, generates unique filename, writes to output directory.
export async function htmlToText(inputPath: string, outputDir: string) {
  try {
    console.error(`Starting HTML to text conversion...`);
    console.error(`Input file: ${inputPath}`);
    console.error(`Output directory: ${outputDir}`);

    // Ensure the output directory exists
    try {
      await fs.access(outputDir);
      console.error(`Output directory exists: ${outputDir}`);
    } catch {
      console.error(`Creating output directory: ${outputDir}`);
      await fs.mkdir(outputDir, { recursive: true });
      console.error(`Created output directory: ${outputDir}`);
    }

    const uniqueId = generateUniqueId();
    const htmlContent = await fs.readFile(inputPath, "utf-8");
    const dom = new JSDOM(htmlContent);
    const { document } = dom.window;

    // Structure-preserving text conversion
    const text = document.body.textContent?.trim() || "";
    const outputPath = path.join(outputDir, `text_${uniqueId}.txt`);
    await fs.writeFile(outputPath, text);
    console.error(`Written text to ${outputPath}`);

    return {
      success: true,
      data: `Successfully converted HTML to text: ${outputPath}`,
    };
  } catch (error) {
    console.error(`Error in htmlToText:`, error);
    return {
      success: false,
      error: error instanceof Error ? error.message : "Unknown error",
    };
  }
}
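The handler never throws. It returns either a success object carrying `data` or a failure object carrying `error`. The pattern can be sketched in isolation as below. The names `ToolResult`, `toErrorMessage`, and `runSafely` are hypothetical and do not appear in the source, and the wrapper is synchronous only to keep the example short (the real handler is async).

```typescript
// Result shape mirroring what htmlToText returns in the excerpt above.
type ToolResult =
  | { success: true; data: string }
  | { success: false; error: string };

// Same normalization as the handler's catch block: use the message of a real
// Error, and fall back to a fixed string for any other thrown value.
function toErrorMessage(error: unknown): string {
  return error instanceof Error ? error.message : "Unknown error";
}

// Hypothetical wrapper: run an operation and convert any thrown value
// into a failure result instead of letting it propagate.
function runSafely(operation: () => string): ToolResult {
  try {
    return { success: true, data: operation() };
  } catch (error) {
    return { success: false, error: toErrorMessage(error) };
  }
}

const ok = runSafely(() => "Successfully converted HTML to text");
const failed = runSafely(() => {
  throw new Error("ENOENT: no such file or directory");
});
const odd = runSafely(() => {
  throw "not an Error instance";
});

console.log(ok, failed, odd);
```

Returning a tagged result rather than throwing lets the caller branch on `success` without a second try/catch.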
src/tools/htmlTools.ts:33-50 (schema)
The Tool object definition for html_to_text, including name, description, and inputSchema for MCP validation.
export const HTML_TO_TEXT_TOOL: Tool = {
  name: "html_to_text",
  description: "Convert HTML to plain text while preserving structure",
  inputSchema: {
    type: "object",
    properties: {
      inputPath: {
        type: "string",
        description: "Path to the input HTML file",
      },
      outputDir: {
        type: "string",
        description: "Directory where text file should be saved",
      },
    },
    required: ["inputPath", "outputDir"],
  },
};
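As an illustration of what this `inputSchema` requires, the following sketch checks an arguments object against it by hand. The excerpts do not show how, or whether, the server performs this check, so the `ObjectSchema` type and the `checkArgs` helper are assumptions made for the example. The type comparison only covers primitive types such as `string`, which is all this schema uses.

```typescript
// Minimal subset of the JSON Schema shape used by HTML_TO_TEXT_TOOL.inputSchema.
interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: string; description?: string }>;
  required: string[];
}

const htmlToTextSchema: ObjectSchema = {
  type: "object",
  properties: {
    inputPath: { type: "string", description: "Path to the input HTML file" },
    outputDir: { type: "string", description: "Directory where text file should be saved" },
  },
  required: ["inputPath", "outputDir"],
};

// Hypothetical helper: list problems with the arguments.
// An empty array means the arguments satisfy the schema subset above.
function checkArgs(schema: ObjectSchema, args: Record<string, unknown>): string[] {
  const problems: string[] = [];
  // Every name listed in `required` must be present.
  for (const key of schema.required) {
    if (!(key in args)) {
      problems.push(`missing required property: ${key}`);
    }
  }
  // Every supplied value that the schema describes must have the declared primitive type.
  for (const [key, value] of Object.entries(args)) {
    const expected = schema.properties[key]?.type;
    if (expected !== undefined && typeof value !== expected) {
      problems.push(`property ${key} should be of type ${expected}`);
    }
  }
  return problems;
}

console.log(checkArgs(htmlToTextSchema, { inputPath: "page.html" }));
```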
src/tools/_index.ts:5-9 (registration)
Imports HTML_TO_TEXT_TOOL from htmlTools.ts and registers it in the central 'tools' array exported for the MCP server.
import { HTML_CLEAN_TOOL, HTML_EXTRACT_RESOURCES_TOOL, HTML_FORMAT_TOOL, HTML_TO_MARKDOWN_TOOL, HTML_TO_TEXT_TOOL } from "./htmlTools.js";
import { PDF_MERGE_TOOL, PDF_SPLIT_TOOL } from "./pdfTools.js";
import { TEXT_DIFF_TOOL, TEXT_ENCODING_CONVERT_TOOL, TEXT_FORMAT_TOOL, TEXT_SPLIT_TOOL } from "./txtTools.js";

export const tools = [
  DOCUMENT_READER_TOOL,
  PDF_MERGE_TOOL,
  PDF_SPLIT_TOOL,
  DOCX_TO_PDF_TOOL,
  DOCX_TO_HTML_TOOL,
  HTML_CLEAN_TOOL,
  HTML_TO_TEXT_TOOL,
  HTML_TO_MARKDOWN_TOOL,
  HTML_EXTRACT_RESOURCES_TOOL,
  HTML_FORMAT_TOOL,
  TEXT_DIFF_TOOL,
  TEXT_SPLIT_TOOL,
  TEXT_FORMAT_TOOL,
  TEXT_ENCODING_CONVERT_TOOL,
  EXCEL_READ_TOOL,
  FORMAT_CONVERTER_TOOL,
];
src/index.ts:47-49 (registration)
MCP server handler for listing tools, returns the 'tools' array which includes html_to_text.
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools,
}));
src/index.ts:168-184 (handler)
MCP CallToolRequestSchema dispatcher that matches name 'html_to_text', extracts args, calls htmlToText handler, and formats response.
if (name === "html_to_text") {
  const { inputPath, outputDir } = args as {
    inputPath: string;
    outputDir: string;
  };
  const result = await htmlToText(inputPath, outputDir);
  if (!result.success) {
    return {
      content: [{ type: "text", text: `Error: ${result.error}` }],
      isError: true,
    };
  }
  return {
    content: [{ type: "text", text: fileOperationResponse(result.data) }],
    isError: false,
  };
}
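The mapping from the handler's result to the tool response can be sketched as a standalone function. This is an illustration under assumptions, not the server's code: `toCallToolResponse` and the `CallToolResponse` type are hypothetical names, and `fileOperationResponse` is replaced by a pass-through stand-in because its real implementation is not shown in the excerpts.

```typescript
// Result shape returned by htmlToText in the handler excerpt.
type ToolResult =
  | { success: true; data: string }
  | { success: false; error: string };

// Response shape used by the dispatcher branch above.
interface CallToolResponse {
  content: { type: "text"; text: string }[];
  isError: boolean;
}

// Hypothetical stand-in: the real fileOperationResponse is not shown in the
// excerpt, so this version returns the message unchanged.
function fileOperationResponse(message: string): string {
  return message;
}

// Failures become an "Error: ..." text item with isError set to true;
// successes pass their data through fileOperationResponse.
function toCallToolResponse(result: ToolResult): CallToolResponse {
  if (!result.success) {
    return {
      content: [{ type: "text", text: `Error: ${result.error}` }],
      isError: true,
    };
  }
  return {
    content: [{ type: "text", text: fileOperationResponse(result.data) }],
    isError: false,
  };
}

console.log(toCallToolResponse({ success: false, error: "file not found" }));
```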

Simple Document Processing MCP Server
