by Theorhd

read_pdf

Extract text content from PDF files stored on disk. Provide the file path to retrieve readable text from a PDF document. The handler rejects any path that resolves outside the server's allowed directories.

Instructions

Read a PDF file from disk and return its text content

Input Schema

Name       Required  Description                                 Default
file_path  Yes       Absolute or relative path to the PDF file   (none)
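
As a rough illustration of the schema above, a client could pass the single `file_path` argument in an MCP `tools/call` request. The helper name `buildReadPdfCall`, the request `id`, and the example path are hypothetical. The JSON-RPC envelope follows the general MCP convention and is an assumption about how this server is invoked, not something stated on this page.

```typescript
// Hypothetical helper: builds a JSON-RPC request that calls the read_pdf tool.
interface ReadPdfCall {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: "read_pdf"; arguments: { file_path: string } };
}

function buildReadPdfCall(filePath: string, id = 1): ReadPdfCall {
  // Mirror the schema: file_path is required and must be a non-empty string.
  if (filePath.length === 0) {
    throw new Error("file_path is required and must be a non-empty string");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "read_pdf", arguments: { file_path: filePath } },
  };
}

// Example path only; it must point inside a directory the server allows.
const request = buildReadPdfCall("./docs/report.pdf");
console.log(JSON.stringify(request, null, 2));
```

Going by the handler listed under Implementation Reference, a successful call returns a `content` array holding one item of type `text` with the extracted text.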

Implementation Reference

  • Handler for the 'read_pdf' tool that validates the file path, reads the PDF file, parses it using the 'pdf-parse' library, and returns the extracted text content.
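    case "read_pdf": {
      // Treat the argument as unknown until it has been validated.
      const { file_path } = args as { file_path?: unknown };

      if (!file_path || typeof file_path !== 'string') {
        throw new Error('file_path is required and must be a string');
      }

      // Relative paths resolve against the server's working directory.
      const resolvedPath = path.resolve(file_path);

      // Compare on directory boundaries: a bare startsWith() would also accept
      // a sibling directory such as "<allowedDir>-other".
      const isAllowed = ALLOWED_DIRS.some(allowedDir => {
        const rel = path.relative(path.resolve(allowedDir), resolvedPath);
        return rel !== '..' && !rel.startsWith('..' + path.sep) && !path.isAbsolute(rel);
      });
      if (!isAllowed) {
        throw new Error(`file_path must be inside allowed directories: ${ALLOWED_DIRS.join(', ')}`);
      }

      const data = await fs.readFile(resolvedPath);

      // Load pdf-parse lazily so a missing dependency produces a clear error.
      let pdfParse: any;
      try {
        const require = createRequire(import.meta.url);
        pdfParse = require('pdf-parse');
      } catch (e) {
        throw new Error('Dependency "pdf-parse" is not installed. Please run `npm install pdf-parse` in pdftools-mcp');
      }

      const parsed: any = await pdfParse(data);

      // Return the extracted text as a single text content item.
      return {
        content: [
          {
            type: 'text',
            text: parsed.text || ''
          }
        ]
      };
    }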
  • index.ts:159-172 (registration)
    Registration of the 'read_pdf' tool in the tools list, including name, description, and input schema definition.
    {
      name: "read_pdf",
      description: "Read a PDF file from disk and return its text content",
      inputSchema: {
        type: "object",
        properties: {
          file_path: {
            type: "string",
            description: "Absolute or relative path to the PDF file"
          }
        },
        required: ["file_path"]
      }
    }
  • Input schema for the 'read_pdf' tool defining the required 'file_path' parameter.
    inputSchema: {
      type: "object",
      properties: {
        file_path: {
          type: "string",
          description: "Absolute or relative path to the PDF file"
        }
      },
      required: ["file_path"]
    }
  • TypeScript type definitions for the 'pdf-parse' library used in the read_pdf handler.
    declare module 'pdf-parse' {
      import { Buffer } from 'buffer';
    
      export default function pdfParse(data: Buffer | Uint8Array | string): Promise<{
        numpages?: number;
        numrender?: number;
        info?: any;
        metadata?: any;
        version?: string;
        text: string;
        textAsHtml?: string;
      }>;
    }
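
The handler's directory check decides whether a resolved path may be read. A plain string-prefix comparison would treat `/srv/pdfs-private` as being inside `/srv/pdfs`, so the minimal sketch below compares on directory boundaries with Node's `path.relative` instead. The function name `isInsideAllowedDir` and the example directories are hypothetical; the server's real `ALLOWED_DIRS` value is not shown on this page.

```typescript
import * as path from "node:path";

// Hypothetical helper: true when `candidate` resolves to a location inside
// (or equal to) one of `allowedDirs`.
function isInsideAllowedDir(candidate: string, allowedDirs: string[]): boolean {
  const resolved = path.resolve(candidate);
  return allowedDirs.some((dir) => {
    const rel = path.relative(path.resolve(dir), resolved);
    // A path that climbs out of `dir` is ".." or starts with ".." plus a
    // separator; on Windows a different drive yields an absolute path.
    return rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel);
  });
}

const allowed = ["/srv/pdfs"]; // example value, not the server's real list
console.log(isInsideAllowedDir("/srv/pdfs/report.pdf", allowed));         // true
console.log(isInsideAllowedDir("/srv/pdfs-private/report.pdf", allowed)); // false
console.log(isInsideAllowedDir("/srv/pdfs/../secrets.pdf", allowed));     // false
```

This check works on path strings only. It does not follow symbolic links, so a deployment that needs that guarantee would have to resolve real paths first.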

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Theorhd/Pdftools-mcp'
