Glama
cablate

Simple Document Processing MCP Server

document_reader

Extract text and content from PDF, DOCX, TXT, HTML, and CSV files by specifying the file path. Ideal for processing and analyzing non-image document formats in document workflows.

Instructions

Read content from non-image document-files at specified paths, supporting various file formats: .pdf, .docx, .txt, .html, .csv

Input Schema

Name      Required  Description                  Default
filePath  Yes       Path to the file to be read
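
As a rough illustration of the schema above, a client could invoke the tool with an MCP `tools/call` request whose `arguments` object carries the single required `filePath` string. This is a minimal sketch, not code from the server: the `buildDocumentReaderCall` helper, the `ToolCallRequest` interface name, and the `/tmp/example.pdf` path are hypothetical and used only for illustration.

```typescript
// Shape of a JSON-RPC 2.0 request for MCP's tools/call method.
// The interface name is an assumption made for this sketch.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: builds a document_reader call for a given file path.
function buildDocumentReaderCall(id: number, filePath: string): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "document_reader", arguments: { filePath } },
  };
}

// Example path is illustrative only.
const request = buildDocumentReaderCall(1, "/tmp/example.pdf");
console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client library would normally construct this envelope; the sketch only shows where `filePath` ends up in the payload.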

Implementation Reference

  • The primary handler function for the document_reader tool, which reads and extracts text content from various supported document formats (PDF, DOCX, TXT, HTML, CSV) based on the file extension of the provided filePath.
    export async function readFile(filePath: string) {
      try {
        const ext = path.extname(filePath).toLowerCase();
        let content: string;
        switch (ext) {
          case ".pdf":
            content = await readPDFFile(filePath);
            break;
          case ".docx":
            content = await readDocxFile(filePath);
            break;
          case ".txt":
            content = await readTextFile(filePath);
            break;
          case ".html":
            content = await readHTMLFile(filePath);
            break;
          case ".csv":
            content = await readCSVFile(filePath);
            break;
          default:
            throw new Error(`Unsupported file format: ${ext}`);
        }
        return {
          success: true,
          data: content,
        };
      } catch (error) {
        return {
          success: false,
          error: error instanceof Error ? error.message : "Unknown error",
        };
      }
    }
  • Tool definition object for 'document_reader', including name, description, and inputSchema specifying the required 'filePath' parameter.
    export const DOCUMENT_READER_TOOL: Tool = {
      name: "document_reader",
      description: "Read content from non-image document-files at specified paths, supporting various file formats: .pdf, .docx, .txt, .html, .csv",
      inputSchema: {
        type: "object",
        properties: {
          filePath: {
            type: "string",
            description: "Path to the file to be read",
          },
        },
        required: ["filePath"],
      },
    };
  • MCP server CallToolRequest handler dispatch block for 'document_reader', which validates arguments and calls the readFile handler, formatting the response.
    if (name === "document_reader") {
      if (!isFileReaderArgs(args)) {
        throw new Error("Invalid arguments for document_reader");
      }
      const result = await readFile(args.filePath);
      if (!result.success) {
        return {
          content: [{ type: "text", text: `Error: ${result.error}` }],
          isError: true,
        };
      }
      return {
        content: [{ type: "text", text: result.data }],
        isError: false,
      };
    }
  • Type guard function to validate and type-narrow input arguments for the document_reader tool to FileReaderArgs.
    export function isFileReaderArgs(args: unknown): args is FileReaderArgs {
      return (
        typeof args === "object" &&
        args !== null &&
        "filePath" in args &&
        typeof (args as FileReaderArgs).filePath === "string"
      );
    }
  • Registration of the DOCUMENT_READER_TOOL (document_reader) in the central tools export array used for MCP ListTools response.
    export const tools = [
      DOCUMENT_READER_TOOL,
      PDF_MERGE_TOOL,
      PDF_SPLIT_TOOL,
      DOCX_TO_PDF_TOOL,
      DOCX_TO_HTML_TOOL,
      HTML_CLEAN_TOOL,
      HTML_TO_TEXT_TOOL,
      HTML_TO_MARKDOWN_TOOL,
      HTML_EXTRACT_RESOURCES_TOOL,
      HTML_FORMAT_TOOL,
      TEXT_DIFF_TOOL,
      TEXT_SPLIT_TOOL,
      TEXT_FORMAT_TOOL,
      TEXT_ENCODING_CONVERT_TOOL,
      EXCEL_READ_TOOL,
      FORMAT_CONVERTER_TOOL,
    ];
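
The validation and response formatting described in the list above can be tried in isolation. The sketch below is self-contained and makes a few assumptions: the `FileReaderArgs` interface is not shown in the listing, so its shape here is inferred from the type guard; `ReadResult` is a name invented for the return shape of `readFile`; and `toToolResponse` is a hypothetical helper that mirrors the dispatch block's two return branches rather than a function from the server.

```typescript
// Assumed shape: the listing references FileReaderArgs but does not define it.
interface FileReaderArgs {
  filePath: string;
}

// Hypothetical name for the success/failure object that readFile returns.
type ReadResult =
  | { success: true; data: string }
  | { success: false; error: string };

// Same runtime check as the type guard in the listing.
function isFileReaderArgs(args: unknown): args is FileReaderArgs {
  return (
    typeof args === "object" &&
    args !== null &&
    "filePath" in args &&
    typeof (args as FileReaderArgs).filePath === "string"
  );
}

// Hypothetical helper: maps a ReadResult to the text-content response
// shape used by the dispatch block (error text is prefixed with "Error: ").
function toToolResponse(result: ReadResult) {
  if (result.success === false) {
    return {
      content: [{ type: "text", text: `Error: ${result.error}` }],
      isError: true,
    };
  }
  return {
    content: [{ type: "text", text: result.data }],
    isError: false,
  };
}

console.log(isFileReaderArgs({ filePath: "report.pdf" }));
console.log(isFileReaderArgs({ filePath: 42 }));
console.log(toToolResponse({ success: false, error: "Unsupported file format: .png" }));
```

Note the asymmetry this makes visible: invalid arguments cause the dispatch block to throw, while a failed read (for example an unsupported extension) is returned as a normal response with `isError: true`.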

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/cablate/mcp-doc-forge'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.