extract_pdf_metadata
Extract metadata and document information from PDF files to analyze file properties and content structure.
Instructions
Extract metadata and document information from PDF files
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | Path to the PDF file to extract metadata from | |
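
As a rough illustration of the input schema above, the sketch below builds the kind of request an MCP client might send to invoke this tool. It assumes the server is reached through MCP's JSON-RPC `tools/call` method; the helper name `buildExtractMetadataRequest`, the `ToolCallRequest` interface, and the example path are hypothetical and not part of the server's code.

```typescript
// Hypothetical request shape for illustration; only `file_path` comes from the schema above.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: { file_path: string };
  };
}

// Hypothetical helper: wraps a file path in a `tools/call` request for this tool.
function buildExtractMetadataRequest(id: number, filePath: string): ToolCallRequest {
  // `file_path` is the only required parameter, so reject an empty value early.
  if (filePath.trim() === "") {
    throw new Error("file_path is required");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "extract_pdf_metadata",
      arguments: { file_path: filePath },
    },
  };
}

// Example path is a placeholder.
const request = buildExtractMetadataRequest(1, "/path/to/report.pdf");
console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client library would normally build and send this envelope itself; the point here is only that `arguments` carries a single `file_path` string.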
Implementation Reference
- src/tools/extract-metadata.ts:22-32 (handler) The main handler function for the 'extract_pdf_metadata' tool. It validates input using Zod schema, instantiates MetadataParser, and calls parseMetadata on the provided file path.

  ```typescript
  export async function handleExtractMetadata(args: unknown): Promise<PDFMetadata> {
    try {
      const params = ExtractMetadataParamsSchema.parse(args);
      const parser = new MetadataParser();
      return await parser.parseMetadata(params.file_path);
    } catch (error) {
      const mcpError = handleError(error, typeof args === 'object' && args !== null && 'file_path' in args ? String(args.file_path) : undefined);
      throw new Error(JSON.stringify(mcpError));
    }
  }
  ```
- src/types/mcp-types.ts:14-16 (schema) Zod schema used for input validation in the tool handler, defining the required 'file_path' parameter.

  ```typescript
  export const ExtractMetadataParamsSchema = z.object({
    file_path: filePathValidation
  });
  ```
- src/index.ts:63-71 (registration) Switch case registration in the main server request handler that dispatches calls to 'extract_pdf_metadata' to the handleExtractMetadata function.

  ```typescript
  case 'extract_pdf_metadata':
    return {
      content: [
        {
          type: 'text',
          text: JSON.stringify(await handleExtractMetadata(args), null, 2),
        },
      ],
    };
  ```
- src/index.ts:41-44 (registration) Tool listing registration where extractMetadataTool is included in the list returned for ListToolsRequestSchema.

  ```typescript
  extractTextTool,
  extractMetadataTool,
  extractPagesTool,
  validatePDFTool,
  ```
- Core helper method that performs the actual PDF metadata extraction using pdf-parse library, including file validation, reading, parsing, and formatting.

  ```typescript
  async parseMetadata(filePath: string): Promise<PDFMetadata> {
    await validatePDFFile(filePath);
    const buffer = await fs.readFile(filePath);
    const stats = await fs.stat(filePath);
    const pdfData = await withTimeout(
      pdf(buffer),
      this.config.processingTimeout
    );
    return this.formatMetadata(pdfData, stats.size);
  }
  ```
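
The `withTimeout` helper called in `parseMetadata` is not shown on this page. A minimal sketch of how such a helper could work is given below, assuming it takes a promise and a timeout in milliseconds and rejects if the promise has not settled in time. This is a hypothetical stand-in, not the project's implementation, and the error message is invented for illustration.

```typescript
// Hypothetical stand-in for the project's withTimeout helper (real source not shown).
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    // Reject once the deadline passes; the wrapped operation itself is not cancelled.
    const timer = setTimeout(
      () => reject(new Error(`Operation timed out after ${ms} ms`)),
      ms,
    );
    promise.then(
      (value) => {
        clearTimeout(timer); // settled in time, so drop the pending timer
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Usage: a simulated slow parse (50 ms) wrapped with a 10 ms limit.
const slowParse = new Promise<string>((resolve) => setTimeout(() => resolve("parsed"), 50));
withTimeout(slowParse, 10).catch((e: Error) => console.log(e.message));
```

Note that a wrapper of this kind only stops waiting; the underlying parse keeps running until it finishes on its own, which may matter for very large files.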