html_to_text
Extract plain text from HTML files while maintaining document structure for easier reading and processing.
Instructions
Convert HTML to plain text while preserving structure
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| inputPath | Yes | Path to the input HTML file | |
| outputDir | Yes | Directory where text file should be saved | |
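
As a rough illustration of how the two required arguments travel to the server, the sketch below builds a JSON-RPC 2.0 `tools/call` request for this tool. The helper name `buildHtmlToTextRequest` and both paths are hypothetical and exist only for this example; only the tool name and the argument names come from the schema above.

```typescript
// Argument shape taken from the input schema table: both fields are required strings.
interface HtmlToTextArgs {
  inputPath: string;
  outputDir: string;
}

// Hypothetical helper: builds a JSON-RPC 2.0 "tools/call" request for html_to_text.
function buildHtmlToTextRequest(id: number, args: HtmlToTextArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "html_to_text" as const, arguments: args },
  };
}

// Hypothetical paths, for illustration only.
const request = buildHtmlToTextRequest(1, {
  inputPath: "/tmp/example/page.html",
  outputDir: "/tmp/example/out",
});

console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client library would normally assemble this envelope itself, so a caller would only supply the tool name and the `arguments` object.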
Implementation Reference
- src/tools/htmlTools.ts:165-203 (handler) The core execution function for the html_to_text tool. It reads the HTML file with fs, parses it with JSDOM, extracts textContent from the body, generates a unique filename, and writes the result to the output directory.

  ```typescript
  export async function htmlToText(inputPath: string, outputDir: string) {
    try {
      console.error(`Starting HTML to text conversion...`);
      console.error(`Input file: ${inputPath}`);
      console.error(`Output directory: ${outputDir}`);

      // Ensure the output directory exists
      try {
        await fs.access(outputDir);
        console.error(`Output directory exists: ${outputDir}`);
      } catch {
        console.error(`Creating output directory: ${outputDir}`);
        await fs.mkdir(outputDir, { recursive: true });
        console.error(`Created output directory: ${outputDir}`);
      }

      const uniqueId = generateUniqueId();
      const htmlContent = await fs.readFile(inputPath, "utf-8");
      const dom = new JSDOM(htmlContent);
      const { document } = dom.window;

      // Structure-preserving text conversion
      const text = document.body.textContent?.trim() || "";

      const outputPath = path.join(outputDir, `text_${uniqueId}.txt`);
      await fs.writeFile(outputPath, text);
      console.error(`Written text to ${outputPath}`);

      return {
        success: true,
        data: `Successfully converted HTML to text: ${outputPath}`,
      };
    } catch (error) {
      console.error(`Error in htmlToText:`, error);
      return {
        success: false,
        error: error instanceof Error ? error.message : "Unknown error",
      };
    }
  }
  ```
- src/tools/htmlTools.ts:33-50 (schema) The Tool object definition for html_to_text, including name, description, and inputSchema for MCP validation.

  ```typescript
  export const HTML_TO_TEXT_TOOL: Tool = {
    name: "html_to_text",
    description: "Convert HTML to plain text while preserving structure",
    inputSchema: {
      type: "object",
      properties: {
        inputPath: {
          type: "string",
          description: "Path to the input HTML file",
        },
        outputDir: {
          type: "string",
          description: "Directory where text file should be saved",
        },
      },
      required: ["inputPath", "outputDir"],
    },
  };
  ```
- src/tools/_index.ts:5-9 (registration) Imports HTML_TO_TEXT_TOOL from htmlTools.ts and registers it in the central 'tools' array exported for the MCP server.

  ```typescript
  import {
    HTML_CLEAN_TOOL,
    HTML_EXTRACT_RESOURCES_TOOL,
    HTML_FORMAT_TOOL,
    HTML_TO_MARKDOWN_TOOL,
    HTML_TO_TEXT_TOOL,
  } from "./htmlTools.js";
  import { PDF_MERGE_TOOL, PDF_SPLIT_TOOL } from "./pdfTools.js";
  import {
    TEXT_DIFF_TOOL,
    TEXT_ENCODING_CONVERT_TOOL,
    TEXT_FORMAT_TOOL,
    TEXT_SPLIT_TOOL,
  } from "./txtTools.js";

  export const tools = [
    DOCUMENT_READER_TOOL,
    PDF_MERGE_TOOL,
    PDF_SPLIT_TOOL,
    DOCX_TO_PDF_TOOL,
    DOCX_TO_HTML_TOOL,
    HTML_CLEAN_TOOL,
    HTML_TO_TEXT_TOOL,
    HTML_TO_MARKDOWN_TOOL,
    HTML_EXTRACT_RESOURCES_TOOL,
    HTML_FORMAT_TOOL,
    TEXT_DIFF_TOOL,
    TEXT_SPLIT_TOOL,
    TEXT_FORMAT_TOOL,
    TEXT_ENCODING_CONVERT_TOOL,
    EXCEL_READ_TOOL,
    FORMAT_CONVERTER_TOOL,
  ];
  ```
- src/index.ts:47-49 (registration) MCP server handler for listing tools. It returns the 'tools' array, which includes html_to_text.

  ```typescript
  server.setRequestHandler(ListToolsRequestSchema, async () => ({
    tools,
  }));
  ```
- src/index.ts:168-184 (handler) MCP CallToolRequestSchema dispatcher that matches the name 'html_to_text', extracts the arguments, calls the htmlToText handler, and formats the response.

  ```typescript
  if (name === "html_to_text") {
    const { inputPath, outputDir } = args as {
      inputPath: string;
      outputDir: string;
    };
    const result = await htmlToText(inputPath, outputDir);
    if (!result.success) {
      return {
        content: [{ type: "text", text: `Error: ${result.error}` }],
        isError: true,
      };
    }
    return {
      content: [{ type: "text", text: fileOperationResponse(result.data) }],
      isError: false,
    };
  }
  ```
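
The dispatcher entry above turns the handler's `{ success, data | error }` result into an MCP response. That mapping can be sketched in isolation as follows. This is a minimal illustration, not the project's code: the `ToolResult` and `ToolResponse` types, the `toToolResponse` function, and the sample messages are assumptions introduced here, and `formatMessage` is a hypothetical pass-through standing in for `fileOperationResponse`, whose body is not shown in the source.

```typescript
// Result shape returned by the handler, written here as a discriminated union (assumed typing).
type ToolResult =
  | { success: true; data: string }
  | { success: false; error: string };

// Minimal MCP-style tool response carrying a single text item.
interface ToolResponse {
  content: { type: "text"; text: string }[];
  isError: boolean;
}

// Hypothetical stand-in for fileOperationResponse; it simply passes the message through.
const formatMessage = (message: string): string => message;

// Map a handler result to a tool response, mirroring the dispatcher's two branches.
function toToolResponse(result: ToolResult): ToolResponse {
  if (!result.success) {
    return {
      content: [{ type: "text", text: `Error: ${result.error}` }],
      isError: true,
    };
  }
  return {
    content: [{ type: "text", text: formatMessage(result.data) }],
    isError: false,
  };
}

// Sample results with made-up messages, for illustration only.
const failure = toToolResponse({ success: false, error: "input file not found" });
const success = toToolResponse({ success: true, data: "converted: /tmp/example/out/text_1.txt" });

console.log(failure);
console.log(success);
```

Typing the result as a discriminated union lets the compiler narrow to `error` in the failure branch and `data` in the success branch, which is the same check the dispatcher performs with `!result.success`.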