cablate

Simple Document Processing MCP Server

document_reader

Extract text content from PDF, DOCX, TXT, HTML, and CSV files by specifying the file path, enabling document analysis and processing.

Instructions

Read content from non-image document-files at specified paths, supporting various file formats: .pdf, .docx, .txt, .html, .csv

Input Schema

| Name     | Required | Description                 | Default |
|----------|----------|-----------------------------|---------|
| filePath | Yes      | Path to the file to be read |         |
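
As a rough illustration, a client would pass this argument inside an MCP `tools/call` request. The sketch below builds such a request; `buildDocumentReaderCall`, the request id, and the example path are hypothetical and not part of this server.

```typescript
// Shape of a JSON-RPC 2.0 request for the MCP `tools/call` method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// Hypothetical helper: wraps a file path in a document_reader call.
function buildDocumentReaderCall(id: number, filePath: string): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "document_reader",
      arguments: { filePath },
    },
  };
}

// The path below is an assumption used only for illustration.
const request = buildDocumentReaderCall(1, "/tmp/report.pdf");
console.log(JSON.stringify(request, null, 2));
```

Only `filePath` is required by the schema, so the `arguments` object carries a single key.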

Implementation Reference

  • Core handler function: determines the file type from the extension, calls the matching reader, and returns either success with the extracted text or failure with an error message.
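    import * as path from "path"; // provides path.extname

    // The reader helpers (readPDFFile, readDocxFile, ...) are defined elsewhere in the project.
    export async function readFile(filePath: string) {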
      try {
        const ext = path.extname(filePath).toLowerCase();
        let content: string;
    
        switch (ext) {
          case ".pdf":
            content = await readPDFFile(filePath);
            break;
          case ".docx":
            content = await readDocxFile(filePath);
            break;
          case ".txt":
            content = await readTextFile(filePath);
            break;
          case ".html":
            content = await readHTMLFile(filePath);
            break;
          case ".csv":
            content = await readCSVFile(filePath);
            break;
          default:
            throw new Error(`Unsupported file format: ${ext}`);
        }
    
        return {
          success: true,
          data: content,
        };
      } catch (error) {
        return {
          success: false,
          error: error instanceof Error ? error.message : "Unknown error",
        };
      }
    } 
  • MCP server request handler dispatch for 'document_reader' tool: validates input, calls readFile, formats response.
    if (name === "document_reader") {
      if (!isFileReaderArgs(args)) {
        throw new Error("Invalid arguments for document_reader");
      }
    
      const result = await readFile(args.filePath);
      if (!result.success) {
        return {
          content: [{ type: "text", text: `Error: ${result.error}` }],
          isError: true,
        };
      }
      return {
        content: [{ type: "text", text: result.data }],
        isError: false,
      };
    }
  • Tool definition with name, description, and input schema requiring 'filePath'.
    export const DOCUMENT_READER_TOOL: Tool = {
      name: "document_reader",
      description:
        "Read content from non-image document-files at specified paths, supporting various file formats: .pdf, .docx, .txt, .html, .csv",
      inputSchema: {
        type: "object",
        properties: {
          filePath: {
            type: "string",
            description: "Path to the file to be read",
          },
        },
        required: ["filePath"],
      },
    };
  • Exports array of all tools including DOCUMENT_READER_TOOL for server registration.
    export const tools = [
      DOCUMENT_READER_TOOL,
      PDF_MERGE_TOOL,
      PDF_SPLIT_TOOL,
      DOCX_TO_PDF_TOOL,
      DOCX_TO_HTML_TOOL,
      HTML_CLEAN_TOOL,
      HTML_TO_TEXT_TOOL,
      HTML_TO_MARKDOWN_TOOL,
      HTML_EXTRACT_RESOURCES_TOOL,
      HTML_FORMAT_TOOL,
      TEXT_DIFF_TOOL,
      TEXT_SPLIT_TOOL,
      TEXT_FORMAT_TOOL,
      TEXT_ENCODING_CONVERT_TOOL,
      EXCEL_READ_TOOL,
      FORMAT_CONVERTER_TOOL,
    ];
  • Type guard/validator for document_reader input arguments.
    export function isFileReaderArgs(args: unknown): args is FileReaderArgs {
      return (
        typeof args === "object" &&
        args !== null &&
        "filePath" in args &&
        typeof (args as FileReaderArgs).filePath === "string"
      );
    }
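
The pieces above fit together as validate, read, then format. The following is a minimal, self-contained sketch of the validation and formatting ends of that flow. `FileReaderArgs` is assumed to be `{ filePath: string }`, and `ReadResult` and `toToolResponse` are hypothetical names introduced here for illustration; no files are read.

```typescript
// Assumed shape; the source only shows that `filePath` must be a string.
interface FileReaderArgs {
  filePath: string;
}

// Same check as the guard above, repeated so this sketch is self-contained.
function isFileReaderArgs(args: unknown): args is FileReaderArgs {
  return (
    typeof args === "object" &&
    args !== null &&
    "filePath" in args &&
    typeof (args as FileReaderArgs).filePath === "string"
  );
}

// Hypothetical union describing the two object shapes readFile returns.
type ReadResult =
  | { success: true; data: string }
  | { success: false; error: string };

// Hypothetical helper mirroring the response formatting in the dispatch branch.
function toToolResponse(result: ReadResult) {
  if (result.success) {
    return { content: [{ type: "text", text: result.data }], isError: false };
  }
  return { content: [{ type: "text", text: `Error: ${result.error}` }], isError: true };
}

const accepted = isFileReaderArgs({ filePath: "notes.txt" }); // true
const rejected = isFileReaderArgs({ filePath: 42 }); // false: filePath is not a string

const ok = toToolResponse({ success: true, data: "hello" });
const failed = toToolResponse({ success: false, error: "Unsupported file format: .xyz" });
console.log(accepted, rejected, ok.isError, failed.isError);
```

Because `readFile` catches its own errors and reports them through `success: false`, an unsupported extension surfaces to the client as an `isError: true` response, while malformed arguments are rejected earlier by the guard.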

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/cablate/mcp-doc-forge'
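
The same request can be sketched in TypeScript using the global `fetch` available in Node 18+ and browsers. `serverUrl` and `fetchServerInfo` are hypothetical helper names. The response is assumed to be JSON, and since its schema is not shown here, it is left untyped.

```typescript
// Base URL taken from the curl command above.
const API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Hypothetical helper: builds the per-server endpoint URL.
function serverUrl(owner: string, repo: string): string {
  return `${API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// Fetches the server record. Assumes a JSON body; the schema is not
// documented in this page, so the result is returned as `unknown`.
async function fetchServerInfo(owner: string, repo: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, repo));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Usage (performs a network request when called):
// fetchServerInfo("cablate", "mcp-doc-forge").then(console.log);
```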

If you have feedback or need assistance with the MCP directory API, please join our Discord server.