get_pdf_metadata
Extract metadata and basic information from a PDF file, including page count, file size, creation dates, and document properties. Provide the file path to retrieve these details.
Instructions
Extract metadata and basic information from a PDF file, including page count, file size, creation dates, and document properties. Provide exactly one of absolute_path (a file at any location) or relative_path (a file in the ~/pdf-agent/ directory).
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| absolute_path | No | Absolute path to the PDF file (e.g., '/Users/john/documents/report.pdf') | |
| relative_path | No | Path relative to ~/pdf-agent/ directory (e.g., 'reports/annual.pdf') | |
| use_pdf_home | No | Use the PDF agent home directory (~/pdf-agent/) for relative paths | true |
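
The way the three parameters interact can be sketched as below. This is a minimal illustration rather than the server's code: `PdfPathArgs` and `resolvePdfPath` are hypothetical names, and the home directory is assumed to be `pdf-agent` under `os.homedir()`. The sketch also assumes that `relative_path` combined with `use_pdf_home: false` is rejected, since relative paths are only defined against the home directory.

```typescript
import { homedir } from "node:os";
import { isAbsolute, join } from "node:path";

// Hypothetical argument shape mirroring the input schema table.
interface PdfPathArgs {
  absolute_path?: string;
  relative_path?: string;
  use_pdf_home?: boolean; // treated as true when omitted
}

// Hypothetical helper: returns the path to read, or throws on invalid input.
// pdfHome is an assumption standing in for the ~/pdf-agent/ directory.
function resolvePdfPath(
  args: PdfPathArgs,
  pdfHome: string = join(homedir(), "pdf-agent"),
): string {
  const { absolute_path, relative_path, use_pdf_home = true } = args;

  // Exactly one of the two path parameters may be set.
  if (Boolean(absolute_path) === Boolean(relative_path)) {
    throw new Error("Exactly one of 'absolute_path' or 'relative_path' must be provided");
  }

  if (relative_path) {
    // Assumption: a relative path only makes sense against the home directory.
    if (!use_pdf_home) {
      throw new Error("'relative_path' requires use_pdf_home to be true");
    }
    return join(pdfHome, relative_path);
  }

  // Only absolute_path is set at this point.
  if (!isAbsolute(absolute_path!)) {
    throw new Error(`Path '${absolute_path}' is not absolute`);
  }
  return absolute_path!;
}
```

Calling `resolvePdfPath({ relative_path: "reports/annual.pdf" })` would yield a path under the assumed home directory, while `resolvePdfPath({})` throws because neither path parameter is set.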
Implementation Reference
- src/index.ts:1773-1909 (handler) The handler function for the 'get_pdf_metadata' tool. It parses arguments, resolves the file path (absolute or relative to ~/pdf-agent/), reads the PDF, extracts metadata (page count, file size, title, author, subject, creator, producer, dates), and returns the result as JSON.

      case "get_pdf_metadata": {
        const { absolute_path, relative_path, use_pdf_home } = GetPdfMetadataSchema.parse(args);

        try {
          // Resolve the final path based on parameters
          let resolvedPath: string;

          if (use_pdf_home && relative_path) {
            // Use relative path from PDF agent home directory
            const pdfAgentHome = await ensurePdfAgentHome();
            resolvedPath = join(pdfAgentHome, relative_path);
          } else if (absolute_path) {
            // Use absolute path directly
            if (!isAbsolute(absolute_path)) {
              return {
                content: [
                  {
                    type: "text",
                    text: JSON.stringify({ error: `Path '${absolute_path}' is not absolute. Use relative_path parameter for relative paths or provide a full absolute path.` }),
                  },
                ],
              };
            }
            resolvedPath = absolute_path;
          } else {
            return {
              content: [
                {
                  type: "text",
                  text: JSON.stringify({ error: `Must provide either 'absolute_path' or 'relative_path'. Examples: {"absolute_path": "/Users/john/document.pdf"} or {"relative_path": "reports/annual.pdf"}` }),
                },
              ],
            };
          }

          if (!(await fileExists(resolvedPath))) {
            const pathType = relative_path ? 'relative path' : 'absolute path';
            const homeInfo = relative_path ? ` (resolved from ~/pdf-agent/ to ${resolvedPath})` : '';
            return {
              content: [
                {
                  type: "text",
                  text: JSON.stringify({ error: `PDF file not found at ${pathType}: ${relative_path || absolute_path}${homeInfo}. Please check the file path and ensure the file exists.` }),
                },
              ],
            };
          }

          // Read the PDF file
          const pdfBuffer = await safeReadFile(resolvedPath);

          // Get file stats
          const stats = await stat(resolvedPath);

          // Parse PDF to get metadata and page count
          // Try a normal load first; fall back to ignoring encryption for encrypted PDFs
          let pdfDoc: PDFDocument;
          try {
            pdfDoc = await PDFDocument.load(pdfBuffer);
          } catch (error) {
            if (error instanceof Error && error.message.includes('encrypted')) {
              pdfDoc = await PDFDocument.load(pdfBuffer, { ignoreEncryption: true });
            } else {
              throw error;
            }
          }

          // Get page count
          const pageCount = pdfDoc.getPageCount();

          // Extract document properties from the PDF
          const title = pdfDoc.getTitle();
          const author = pdfDoc.getAuthor();
          const subject = pdfDoc.getSubject();
          const creator = pdfDoc.getCreator();
          const producer = pdfDoc.getProducer();

          // Handle potentially corrupted dates
          let creationDate: Date | null = null;
          let modificationDate: Date | null = null;

          try {
            creationDate = pdfDoc.getCreationDate() || null;
          } catch (e) {
            // Ignore corrupted creation date
          }

          try {
            modificationDate = pdfDoc.getModificationDate() || null;
          } catch (e) {
            // Ignore corrupted modification date
          }

          return {
            content: [
              {
                type: "text",
                text: JSON.stringify({
                  file_path: resolvedPath,
                  pages: pageCount,
                  file_size_bytes: stats.size,
                  file_size_mb: Number((stats.size / (1024 * 1024)).toFixed(2)),
                  // File system timestamps
                  created_date: stats.birthtime?.toISOString() || null,
                  modified_date: stats.mtime?.toISOString() || null,
                  title: title || null,
                  author: author || null,
                  subject: subject || null,
                  creator: creator || null,
                  producer: producer || null,
                  // Timestamps stored in the PDF document itself
                  creation_date: creationDate?.toISOString() || null,
                  modification_date: modificationDate?.toISOString() || null,
                  encrypted: false, // Always false: encrypted PDFs are loaded by ignoring encryption
                }),
              },
            ],
          };
        } catch (e) {
          const providedPath = relative_path || absolute_path || 'unknown';
          const pathType = relative_path ? 'relative path' : 'absolute path';
          return {
            content: [
              {
                type: "text",
                text: JSON.stringify({ error: `Error processing PDF at ${pathType} '${providedPath}': ${e}. Please ensure the file is a valid PDF and not corrupted.` }),
              },
            ],
          };
        }
      }

- src/index.ts:101-110 (schema) Zod schema 'GetPdfMetadataSchema' defining input validation for get_pdf_metadata. Accepts optional 'absolute_path', 'relative_path', and 'use_pdf_home' (default true). Requires exactly one of absolute_path or relative_path.

      const GetPdfMetadataSchema = z.object({
        absolute_path: z.string().optional(),
        relative_path: z.string().optional(),
        use_pdf_home: z.boolean().default(true),
      }).refine(
        (data) => (data.absolute_path && !data.relative_path) || (!data.absolute_path && data.relative_path),
        {
          message: "Exactly one of 'absolute_path' or 'relative_path' must be provided",
        }
      );

- src/index.ts:1458-1478 (registration) Tool registration in the ListToolsRequestSchema handler. Defines the tool name 'get_pdf_metadata', its description, and input schema (JSON Schema format) for the MCP protocol.

      {
        name: "get_pdf_metadata",
        description: "Extract metadata and basic information from a PDF file, including page count, file size, creation dates, and document properties. Use either absolute_path for any location or relative_path for files in ~/pdf-agent/ directory.",
        inputSchema: {
          type: "object",
          properties: {
            absolute_path: {
              type: "string",
              description: "Absolute path to the PDF file (e.g., '/Users/john/documents/report.pdf')",
            },
            relative_path: {
              type: "string",
              description: "Path relative to ~/pdf-agent/ directory (e.g., 'reports/annual.pdf')",
            },
            use_pdf_home: {
              type: "boolean",
              description: "Use PDF agent home directory for relative paths (default: true)",
              default: true,
            },
          },
        },
      },
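
The handler returns its result as a JSON string inside a single text content item: either the metadata object or an object with an `error` field. A client might unpack it roughly as follows. The field names are taken from the handler; the names `PdfMetadata`, `ToolResult`, and `parseMetadataResult`, and all values in `sample`, are hypothetical and only for illustration. Note that `created_date` and `modified_date` come from the file system, whereas `creation_date` and `modification_date` come from the PDF's own document properties.

```typescript
// Field names follow the JSON object built by the handler; type names are hypothetical.
interface PdfMetadata {
  file_path: string;
  pages: number;
  file_size_bytes: number;
  file_size_mb: number;
  created_date: string | null;      // file system birth time
  modified_date: string | null;     // file system modification time
  title: string | null;
  author: string | null;
  subject: string | null;
  creator: string | null;
  producer: string | null;
  creation_date: string | null;     // date stored in the PDF
  modification_date: string | null; // date stored in the PDF
  encrypted: boolean;
}

// Minimal assumed shape of the tool result: a list of text content items.
interface ToolResult {
  content: { type: string; text: string }[];
}

// Hypothetical helper: returns the metadata, or throws with the tool's error message.
function parseMetadataResult(result: ToolResult): PdfMetadata {
  const first = result.content[0];
  if (!first || first.type !== "text") {
    throw new Error("Unexpected tool result shape");
  }
  const payload = JSON.parse(first.text) as PdfMetadata | { error: string };
  if ("error" in payload) {
    throw new Error(payload.error);
  }
  return payload;
}

// Illustrative response with made-up values.
const sample: ToolResult = {
  content: [
    {
      type: "text",
      text: JSON.stringify({
        file_path: "/tmp/example.pdf",
        pages: 3,
        file_size_bytes: 1572864,
        file_size_mb: 1.5,
        created_date: null,
        modified_date: null,
        title: "Example",
        author: null,
        subject: null,
        creator: null,
        producer: null,
        creation_date: null,
        modification_date: null,
        encrypted: false,
      }),
    },
  ],
};

const metadata = parseMetadataResult(sample);
console.log(`${metadata.pages} pages, ${metadata.file_size_mb} MB`);
```

Because failures are reported inside the JSON payload rather than as a protocol-level error, checking for the `error` key before reading any metadata field is the safer order of operations.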