html_to_text
Convert an HTML file to plain text while preserving document structure. Provide the path to the input HTML file and the directory where the output text file should be written.
Instructions
Convert HTML to plain text while preserving structure
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| inputPath | Yes | Path to the input HTML file | |
| outputDir | Yes | Directory where text file should be saved | |
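
As a rough illustration of the schema above, the two required string fields can be mirrored as a TypeScript interface with a runtime guard. This is only a sketch: the names `HtmlToTextArgs` and `isHtmlToTextArgs` are hypothetical and do not appear in the original source, and the paths are placeholders.

```typescript
// Shape of the arguments described by the input schema table.
// The interface name is hypothetical (not from the original source).
interface HtmlToTextArgs {
  inputPath: string;
  outputDir: string;
}

// Hypothetical runtime guard: checks that both required fields are
// present and are strings, which is all the schema asks for.
function isHtmlToTextArgs(args: unknown): args is HtmlToTextArgs {
  if (typeof args !== "object" || args === null) return false;
  const candidate = args as Record<string, unknown>;
  return (
    typeof candidate.inputPath === "string" &&
    typeof candidate.outputDir === "string"
  );
}

// Example arguments; both paths are placeholders.
const exampleArgs: unknown = {
  inputPath: "./docs/page.html",
  outputDir: "./out",
};

console.log(isHtmlToTextArgs(exampleArgs)); // true
console.log(isHtmlToTextArgs({ inputPath: "./docs/page.html" })); // false
```

A guard like this would let a caller reject malformed arguments before any file is read, rather than relying on a type cast alone.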
Implementation Reference
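- src/tools/htmlTools.ts:165-203 (handler) The main handler function for html_to_text. It reads the HTML file, parses it with JSDOM, takes the body's textContent (the concatenated text of all descendant nodes, which keeps the whitespace present in the source but adds no line breaks of its own for block elements), trims it, and writes the result to a uniquely named .txt file in the specified output directory.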
```typescript
export async function htmlToText(inputPath: string, outputDir: string) {
  try {
    console.error(`Starting HTML to text conversion...`);
    console.error(`Input file: ${inputPath}`);
    console.error(`Output directory: ${outputDir}`);

    // Ensure the output directory exists
    try {
      await fs.access(outputDir);
      console.error(`Output directory exists: ${outputDir}`);
    } catch {
      console.error(`Creating output directory: ${outputDir}`);
      await fs.mkdir(outputDir, { recursive: true });
      console.error(`Created output directory: ${outputDir}`);
    }

    const uniqueId = generateUniqueId();
    const htmlContent = await fs.readFile(inputPath, "utf-8");
    const dom = new JSDOM(htmlContent);
    const { document } = dom.window;

    // Structure-preserving text conversion
    const text = document.body.textContent?.trim() || "";
    const outputPath = path.join(outputDir, `text_${uniqueId}.txt`);
    await fs.writeFile(outputPath, text);
    console.error(`Written text to ${outputPath}`);

    return {
      success: true,
      data: `Successfully converted HTML to text: ${outputPath}`,
    };
  } catch (error) {
    console.error(`Error in htmlToText:`, error);
    return {
      success: false,
      error: error instanceof Error ? error.message : "Unknown error",
    };
  }
}
```

- src/tools/htmlTools.ts:33-50 (schema) The Tool definition for html_to_text. It sets the tool name to 'html_to_text', provides the description, and declares an input schema that requires 'inputPath' (path to the HTML file) and 'outputDir' (directory for the output text file).
```typescript
export const HTML_TO_TEXT_TOOL: Tool = {
  name: "html_to_text",
  description: "Convert HTML to plain text while preserving structure",
  inputSchema: {
    type: "object",
    properties: {
      inputPath: {
        type: "string",
        description: "Path to the input HTML file",
      },
      outputDir: {
        type: "string",
        description: "Directory where text file should be saved",
      },
    },
    required: ["inputPath", "outputDir"],
  },
};
```

- src/tools/_index.ts:5-9 (registration) HTML_TO_TEXT_TOOL is imported from htmlTools.js and included in the exported tools array, which is registered with the MCP server.
```typescript
import {
  HTML_CLEAN_TOOL,
  HTML_EXTRACT_RESOURCES_TOOL,
  HTML_FORMAT_TOOL,
  HTML_TO_MARKDOWN_TOOL,
  HTML_TO_TEXT_TOOL,
} from "./htmlTools.js";
import { PDF_MERGE_TOOL, PDF_SPLIT_TOOL } from "./pdfTools.js";
import {
  TEXT_DIFF_TOOL,
  TEXT_ENCODING_CONVERT_TOOL,
  TEXT_FORMAT_TOOL,
  TEXT_SPLIT_TOOL,
} from "./txtTools.js";

export const tools = [
  DOCUMENT_READER_TOOL,
  PDF_MERGE_TOOL,
  PDF_SPLIT_TOOL,
  DOCX_TO_PDF_TOOL,
  DOCX_TO_HTML_TOOL,
  HTML_CLEAN_TOOL,
  HTML_TO_TEXT_TOOL,
  HTML_TO_MARKDOWN_TOOL,
  HTML_EXTRACT_RESOURCES_TOOL,
  HTML_FORMAT_TOOL,
  TEXT_DIFF_TOOL,
  TEXT_SPLIT_TOOL,
  TEXT_FORMAT_TOOL,
  TEXT_ENCODING_CONVERT_TOOL,
  EXCEL_READ_TOOL,
  FORMAT_CONVERTER_TOOL,
];
```

- src/index.ts:168-184 (registration) The request handler in the MCP server that dispatches calls to 'html_to_text'. It extracts inputPath and outputDir from args, calls htmlToText, and returns either an error response or the formatted success message.
```typescript
if (name === "html_to_text") {
  const { inputPath, outputDir } = args as {
    inputPath: string;
    outputDir: string;
  };
  const result = await htmlToText(inputPath, outputDir);
  if (!result.success) {
    return {
      content: [{ type: "text", text: `Error: ${result.error}` }],
      isError: true,
    };
  }
  return {
    content: [{ type: "text", text: fileOperationResponse(result.data) }],
    isError: false,
  };
}
```
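
The mapping from the handler's result to the tool response can be sketched in isolation as a pure function, which may make the two branches easier to test. This is a minimal sketch under assumptions: `ToolResult`, `ToolResponse`, `toToolResponse`, and `formatSuccess` are hypothetical names, and `formatSuccess` merely stands in for `fileOperationResponse`, whose implementation is not shown here.

```typescript
// Result shape returned by htmlToText, written as a discriminated union.
// The type name is hypothetical; the fields match the handler's return values.
type ToolResult =
  | { success: true; data: string }
  | { success: false; error: string };

// Response shape used by the dispatcher excerpt (hypothetical type name).
interface ToolResponse {
  content: { type: "text"; text: string }[];
  isError: boolean;
}

// Stand-in for fileOperationResponse, whose real implementation is not shown.
// Assumption: here it simply returns the message unchanged.
const formatSuccess = (message: string): string => message;

// Maps a handler result to a response, mirroring the two branches above.
function toToolResponse(result: ToolResult): ToolResponse {
  if (result.success === false) {
    return {
      content: [{ type: "text", text: `Error: ${result.error}` }],
      isError: true,
    };
  }
  return {
    content: [{ type: "text", text: formatSuccess(result.data) }],
    isError: false,
  };
}

const failure = toToolResponse({ success: false, error: "ENOENT: no such file" });
console.log(failure.content[0].text); // Error: ENOENT: no such file
```

Modeling the result as a discriminated union lets the compiler check that `error` is only read on failure and `data` only on success.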