get_pdf_metadata

Extracts metadata and basic information from a PDF file: page count, file size, filesystem and embedded document dates, and document properties (title, author, subject, creator, producer). Supply the file path, either absolute or relative to ~/pdf-agent/.

Instructions

Extract metadata and basic information from a PDF file, including page count, file size, creation dates, and document properties. Use either absolute_path for any location or relative_path for files in ~/pdf-agent/ directory.

Input Schema

Name	Required	Description
`absolute_path`	No	Absolute path to the PDF file (e.g., '/Users/john/documents/report.pdf')
`relative_path`	No	Path relative to ~/pdf-agent/ directory (e.g., 'reports/annual.pdf')
`use_pdf_home`	No	Use PDF agent home directory for relative paths (default: true)
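
To make the table concrete, here is a minimal sketch of how a client might package arguments for this tool. The interface mirrors the three parameters listed above; `buildGetPdfMetadataCall` is a hypothetical helper introduced only for illustration, and the request envelope assumes the standard MCP `tools/call` shape.

```typescript
// Argument shape taken from the Input Schema table.
interface GetPdfMetadataArgs {
  absolute_path?: string;
  relative_path?: string;
  use_pdf_home?: boolean; // the server defaults this to true
}

// Hypothetical helper (not part of the server): wraps the arguments
// in an MCP `tools/call` request body.
function buildGetPdfMetadataCall(args: GetPdfMetadataArgs) {
  return {
    method: "tools/call" as const,
    params: { name: "get_pdf_metadata" as const, arguments: args },
  };
}

// A file anywhere on disk, addressed by absolute path.
const byAbsolutePath = buildGetPdfMetadataCall({
  absolute_path: "/Users/john/documents/report.pdf",
});

// A file under ~/pdf-agent/, addressed by relative path.
const byRelativePath = buildGetPdfMetadataCall({
  relative_path: "reports/annual.pdf",
});
```

Supply one path parameter per call, not both; the schema shown later on this page rejects requests that set both or neither.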

Implementation Reference

src/index.ts:1773-1909 (handler)

The handler function for the 'get_pdf_metadata' tool. It parses arguments, resolves the file path (absolute or relative to ~/pdf-agent/), reads the PDF, extracts metadata (page count, file size, title, author, subject, creator, producer, dates), and returns the result as JSON.

case "get_pdf_metadata": {
  const { absolute_path, relative_path, use_pdf_home } = GetPdfMetadataSchema.parse(args);
  
  try {
    // Resolve the final path based on parameters
    let resolvedPath: string;
    
    if (use_pdf_home && relative_path) {
      // Use relative path from PDF agent home directory
      const pdfAgentHome = await ensurePdfAgentHome();
      resolvedPath = join(pdfAgentHome, relative_path);
    } else if (absolute_path) {
      // Use absolute path directly
      if (!isAbsolute(absolute_path)) {
        return {
          content: [
            {
              type: "text",
              text: JSON.stringify({ 
                error: `Path '${absolute_path}' is not absolute. Use relative_path parameter for relative paths or provide a full absolute path.` 
              }),
            },
          ],
        };
      }
      resolvedPath = absolute_path;
    } else {
      return {
        content: [
          {
            type: "text",
            text: JSON.stringify({ 
              error: `Must provide either 'absolute_path' or 'relative_path'. Examples: {"absolute_path": "/Users/john/document.pdf"} or {"relative_path": "reports/annual.pdf"}` 
            }),
          },
        ],
      };
    }
    
    if (!(await fileExists(resolvedPath))) {
      const pathType = relative_path ? 'relative path' : 'absolute path';
      const homeInfo = relative_path ? ` (resolved from ~/pdf-agent/ to ${resolvedPath})` : '';
      return {
        content: [
          {
            type: "text",
            text: JSON.stringify({ 
              error: `PDF file not found at ${pathType}: ${relative_path || absolute_path}${homeInfo}. Please check the file path and ensure the file exists.` 
            }),
          },
        ],
      };
    }

    // Read the PDF file
    const pdfBuffer = await safeReadFile(resolvedPath);
    
    // Get file stats
    const stats = await stat(resolvedPath);
    
    // Parse PDF to get metadata and page count.
    // Load normally first; if that fails because the PDF is encrypted, retry with encryption ignored.
    let pdfDoc: PDFDocument;
    try {
      pdfDoc = await PDFDocument.load(pdfBuffer);
    } catch (error) {
      if (error instanceof Error && error.message.includes('encrypted')) {
        pdfDoc = await PDFDocument.load(pdfBuffer, { ignoreEncryption: true });
      } else {
        throw error;
      }
    }
    
    // Get page count
    const pageCount = pdfDoc.getPageCount();
    
    // Extract document properties (each getter returns undefined when the field is absent)
    const title = pdfDoc.getTitle();
    const author = pdfDoc.getAuthor();
    const subject = pdfDoc.getSubject();
    const creator = pdfDoc.getCreator();
    const producer = pdfDoc.getProducer();
    
    // Handle potentially corrupted dates
    let creationDate: Date | null = null;
    let modificationDate: Date | null = null;
    
    try {
      creationDate = pdfDoc.getCreationDate() || null;
    } catch (e) {
      // Ignore corrupted creation date
    }
    
    try {
      modificationDate = pdfDoc.getModificationDate() || null;
    } catch (e) {
      // Ignore corrupted modification date
    }
    
    return {
      content: [
        {
          type: "text",
          text: JSON.stringify({
            file_path: resolvedPath,
            pages: pageCount,
            file_size_bytes: stats.size,
            file_size_mb: Number((stats.size / (1024 * 1024)).toFixed(2)),
            created_date: stats.birthtime?.toISOString() || null,
            modified_date: stats.mtime?.toISOString() || null,
            title: title || null,
            author: author || null,
            subject: subject || null,  
            creator: creator || null,
            producer: producer || null,
            creation_date: creationDate?.toISOString() || null,
            modification_date: modificationDate?.toISOString() || null,
            encrypted: pdfDoc.isEncrypted, // true when the file carries an encryption dictionary
          }),
        },
      ],
    };
  } catch (e) {
    const providedPath = relative_path || absolute_path || 'unknown';
    const pathType = relative_path ? 'relative path' : 'absolute path';
    return {
      content: [
        {
          type: "text",
          text: JSON.stringify({ 
            error: `Error processing PDF at ${pathType} '${providedPath}': ${e}. Please ensure the file is a valid PDF and not corrupted.` 
          }),
        },
      ],
    };
  }
}
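
The path-resolution branch of the handler can be sketched in isolation as follows. This is a simplified stand-in, not the server's code: `resolvePdfPath` and the `Resolved` type are hypothetical names, and the default home directory of `~/pdf-agent` replaces the `ensurePdfAgentHome()` call, whose implementation is not shown on this page.

```typescript
import { homedir } from "node:os";
import { isAbsolute, join } from "node:path";

interface PathArgs {
  absolute_path?: string;
  relative_path?: string;
  use_pdf_home?: boolean;
}

type Resolved = { path: string } | { error: string };

// Hypothetical helper mirroring the handler's branch order:
// relative path (when use_pdf_home is on), then absolute path, then error.
function resolvePdfPath(
  args: PathArgs,
  pdfAgentHome: string = join(homedir(), "pdf-agent"), // assumed location
): Resolved {
  const usePdfHome = args.use_pdf_home ?? true;

  if (usePdfHome && args.relative_path) {
    return { path: join(pdfAgentHome, args.relative_path) };
  }
  if (args.absolute_path) {
    if (!isAbsolute(args.absolute_path)) {
      return { error: `Path '${args.absolute_path}' is not absolute.` };
    }
    return { path: args.absolute_path };
  }
  return { error: "Must provide either 'absolute_path' or 'relative_path'." };
}
```

One consequence of this branch order, visible in the handler listing as well: a request with only `relative_path` and `use_pdf_home: false` reaches the final error branch, because the relative branch is skipped and no absolute path is available.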

src/index.ts:101-110 (schema)

Zod schema 'GetPdfMetadataSchema' defining input validation for get_pdf_metadata. Accepts optional 'absolute_path', 'relative_path', and 'use_pdf_home' (default true). Requires exactly one of absolute_path or relative_path.

const GetPdfMetadataSchema = z.object({
  absolute_path: z.string().optional(),
  relative_path: z.string().optional(),
  use_pdf_home: z.boolean().default(true),
}).refine(
  (data) => (data.absolute_path && !data.relative_path) || (!data.absolute_path && data.relative_path),
  {
    message: "Exactly one of 'absolute_path' or 'relative_path' must be provided",
  }
);
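
The refinement in this schema is an exclusive-or on truthiness, which can be illustrated without Zod. The sketch below uses a hypothetical `hasExactlyOnePath` function; it is meant to show the rule's behavior, including the detail that an empty string counts as "not provided", and is not taken from the server.

```typescript
interface PathArgs {
  absolute_path?: string;
  relative_path?: string;
}

// Hypothetical stand-in for the refine() predicate:
// true only when exactly one of the two paths is a non-empty string.
function hasExactlyOnePath(args: PathArgs): boolean {
  return Boolean(args.absolute_path) !== Boolean(args.relative_path);
}

const onlyAbsolute = hasExactlyOnePath({ absolute_path: "/tmp/a.pdf" });
const onlyRelative = hasExactlyOnePath({ relative_path: "a.pdf" });
const both = hasExactlyOnePath({ absolute_path: "/tmp/a.pdf", relative_path: "a.pdf" });
const neither = hasExactlyOnePath({});
const emptyString = hasExactlyOnePath({ absolute_path: "", relative_path: "a.pdf" });
```

Here `onlyAbsolute`, `onlyRelative`, and `emptyString` are `true`, while `both` and `neither` are `false`.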

src/index.ts:1458-1478 (registration)

Tool registration in the ListToolsRequestSchema handler. Defines the tool name 'get_pdf_metadata', its description, and input schema (JSON Schema format) for the MCP protocol.

{
  name: "get_pdf_metadata",
  description: "Extract metadata and basic information from a PDF file, including page count, file size, creation dates, and document properties. Use either absolute_path for any location or relative_path for files in ~/pdf-agent/ directory.",
  inputSchema: {
    type: "object",
    properties: {
      absolute_path: {
        type: "string",
        description: "Absolute path to the PDF file (e.g., '/Users/john/documents/report.pdf')",
      },
      relative_path: {
        type: "string", 
        description: "Path relative to ~/pdf-agent/ directory (e.g., 'reports/annual.pdf')",
      },
      use_pdf_home: {
        type: "boolean",
        description: "Use PDF agent home directory for relative paths (default: true)",
        default: true,
      },
    },
  },
},
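
On the client side, the handler's reply arrives as a single text content item whose `text` field is a JSON string, carrying either the metadata fields or an `error` message. A minimal sketch of decoding it follows; the field names come from the handler listing on this page, while `parseMetadataResult`, `isToolError`, and the interface names are hypothetical.

```typescript
// Fields as emitted by the handler's JSON.stringify call.
interface PdfMetadata {
  file_path: string;
  pages: number;
  file_size_bytes: number;
  file_size_mb: number;
  created_date: string | null;      // filesystem birthtime, ISO 8601
  modified_date: string | null;     // filesystem mtime, ISO 8601
  title: string | null;
  author: string | null;
  subject: string | null;
  creator: string | null;
  producer: string | null;
  creation_date: string | null;     // embedded in the PDF
  modification_date: string | null; // embedded in the PDF
  encrypted: boolean;
}

interface ToolError {
  error: string;
}

interface ToolResult {
  content: { type: string; text: string }[];
}

// Hypothetical helper: pull the first text item and decode its JSON payload.
function parseMetadataResult(result: ToolResult): PdfMetadata | ToolError {
  const first = result.content.find((item) => item.type === "text");
  if (!first) {
    return { error: "No text content in tool result" };
  }
  return JSON.parse(first.text) as PdfMetadata | ToolError;
}

// Type guard distinguishing the error payload from the metadata payload.
function isToolError(value: PdfMetadata | ToolError): value is ToolError {
  return "error" in value;
}
```

Because failures are reported inside the JSON payload, a caller needs to check for the `error` key before reading any metadata field.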
