document_reader
Extract text and content from PDF, DOCX, TXT, HTML, and CSV files by specifying the file path. Ideal for processing and analyzing non-image document formats in document workflows.
Instructions
Read content from non-image document-files at specified paths, supporting various file formats: .pdf, .docx, .txt, .html, .csv
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| filePath | Yes | Path to the file to be read | |
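
As an illustration of the schema above, the following sketch builds the JSON-RPC 2.0 `tools/call` request that an MCP client would send to invoke this tool. The helper name `buildDocumentReaderCall`, the `ToolCallRequest` interface, and the example path are assumptions made for this sketch; they are not part of the server's code.

```typescript
// Shape of a JSON-RPC 2.0 "tools/call" request, following the MCP specification.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a file path in a document_reader tool call.
function buildDocumentReaderCall(id: number, filePath: string): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "document_reader", arguments: { filePath } },
  };
}

// Example path only; any path readable by the server process would do.
const request = buildDocumentReaderCall(1, "/tmp/report.pdf");
console.log(JSON.stringify(request, null, 2));
```

The only argument is `filePath`, matching the single required property in the input schema.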
Implementation Reference
- src/tools/documentReader.ts:83-118 (handler) The primary handler function for the document_reader tool, which reads and extracts text content from various supported document formats (PDF, DOCX, TXT, HTML, CSV) based on the file extension of the provided filePath.

      export async function readFile(filePath: string) {
        try {
          const ext = path.extname(filePath).toLowerCase();
          let content: string;

          switch (ext) {
            case ".pdf":
              content = await readPDFFile(filePath);
              break;
            case ".docx":
              content = await readDocxFile(filePath);
              break;
            case ".txt":
              content = await readTextFile(filePath);
              break;
            case ".html":
              content = await readHTMLFile(filePath);
              break;
            case ".csv":
              content = await readCSVFile(filePath);
              break;
            default:
              throw new Error(`Unsupported file format: ${ext}`);
          }

          return {
            success: true,
            data: content,
          };
        } catch (error) {
          return {
            success: false,
            error: error instanceof Error ? error.message : "Unknown error",
          };
        }
      }
- src/tools/documentReader.ts:9-23 (schema) Tool definition object for 'document_reader', including name, description, and inputSchema specifying the required 'filePath' parameter.

      export const DOCUMENT_READER_TOOL: Tool = {
        name: "document_reader",
        description:
          "Read content from non-image document-files at specified paths, supporting various file formats: .pdf, .docx, .txt, .html, .csv",
        inputSchema: {
          type: "object",
          properties: {
            filePath: {
              type: "string",
              description: "Path to the file to be read",
            },
          },
          required: ["filePath"],
        },
      };
- src/index.ts:59-75 (handler) MCP server CallToolRequest handler dispatch block for 'document_reader', which validates arguments and calls the readFile handler, formatting the response.

      if (name === "document_reader") {
        if (!isFileReaderArgs(args)) {
          throw new Error("Invalid arguments for document_reader");
        }

        const result = await readFile(args.filePath);
        if (!result.success) {
          return {
            content: [{ type: "text", text: `Error: ${result.error}` }],
            isError: true,
          };
        }
        return {
          content: [{ type: "text", text: result.data }],
          isError: false,
        };
      }
- src/tools/documentReader.ts:29-36 (schema) Type guard function to validate and type-narrow input arguments for the document_reader tool to FileReaderArgs.

      export function isFileReaderArgs(args: unknown): args is FileReaderArgs {
        return (
          typeof args === "object" &&
          args !== null &&
          "filePath" in args &&
          typeof (args as FileReaderArgs).filePath === "string"
        );
      }
- src/tools/_index.ts:9-9 (registration) Registration of the DOCUMENT_READER_TOOL (document_reader) in the central tools export array used for MCP ListTools response.

      export const tools = [
        DOCUMENT_READER_TOOL,
        PDF_MERGE_TOOL,
        PDF_SPLIT_TOOL,
        DOCX_TO_PDF_TOOL,
        DOCX_TO_HTML_TOOL,
        HTML_CLEAN_TOOL,
        HTML_TO_TEXT_TOOL,
        HTML_TO_MARKDOWN_TOOL,
        HTML_EXTRACT_RESOURCES_TOOL,
        HTML_FORMAT_TOOL,
        TEXT_DIFF_TOOL,
        TEXT_SPLIT_TOOL,
        TEXT_FORMAT_TOOL,
        TEXT_ENCODING_CONVERT_TOOL,
        EXCEL_READ_TOOL,
        FORMAT_CONVERTER_TOOL,
      ];
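
To make the validate-then-dispatch flow in the references above easier to follow, here is a minimal, file-system-free sketch. It repeats the type guard's checks and the extension lookup, then reports what the tool would do with a given argument object. The `FileReaderArgs` shape is an assumption (the source names the type but does not show its definition), and `previewDispatch` and `SUPPORTED_EXTENSIONS` are hypothetical names introduced only for this example.

```typescript
import * as path from "node:path";

// Assumed shape: the source references FileReaderArgs without showing it.
interface FileReaderArgs {
  filePath: string;
}

// Same checks as the type guard listed above.
function isFileReaderArgs(args: unknown): args is FileReaderArgs {
  return (
    typeof args === "object" &&
    args !== null &&
    "filePath" in args &&
    typeof (args as FileReaderArgs).filePath === "string"
  );
}

// Extensions accepted by the handler's switch statement.
const SUPPORTED_EXTENSIONS = new Set([".pdf", ".docx", ".txt", ".html", ".csv"]);

// Hypothetical helper: describes the outcome without reading any file.
function previewDispatch(args: unknown): string {
  if (!isFileReaderArgs(args)) {
    return "Invalid arguments for document_reader";
  }
  // The extension is lowercased before matching, so "Report.PDF" is accepted.
  const ext = path.extname(args.filePath).toLowerCase();
  return SUPPORTED_EXTENSIONS.has(ext)
    ? `dispatch to the ${ext} reader`
    : `Unsupported file format: ${ext}`;
}

console.log(previewDispatch({ filePath: "Report.PDF" })); // dispatch to the .pdf reader
console.log(previewDispatch({ filePath: "notes.md" }));   // Unsupported file format: .md
console.log(previewDispatch({ filePath: "README" }));     // Unsupported file format: (empty extension)
console.log(previewDispatch({ filePath: 42 }));           // Invalid arguments for document_reader
```

Two behaviors follow from the listed code: matching is case-insensitive because of `toLowerCase()`, and a path without an extension produces an error message that ends in an empty string, since `path.extname` returns `""` in that case. The guard only checks that `filePath` is a string, so an empty string passes validation and fails later at the extension check.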