Compare PDF Structures

compare_structure

Read-onlyIdempotent

Compare internal structures of two PDFs to identify differences in properties, fonts, and objects. Useful for verifying exports, tracking changes, and diagnosing generation issues.

Instructions

Compare the internal structures of two PDF documents and identify differences.

Args:

file_path_1 (string): Absolute path to the first PDF file
file_path_2 (string): Absolute path to the second PDF file
response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns: Structural comparison including: property-by-property diff (page count, PDF version, encryption, tagged status, object counts, page dimensions, file size, catalog entries, signatures), font comparison (fonts unique to each file and shared fonts), and a summary.

Examples:

Compare two versions of the same document
Verify structural consistency across PDF exports
Identify differences in PDF generation pipelines

Input Schema

TableJSON Schema

Name	Required	Description	Default
`file_path_1`	Yes	Absolute path to the first PDF file for comparison
`file_path_2`	Yes	Absolute path to the second PDF file for comparison
`response_format`	No	Output format: "markdown" for human-readable, "json" for structured data	markdown

Implementation Reference

src/services/validation-service.ts:615-707 (handler)

Main handler: compares two PDF documents by analyzing their structure, fonts, and metadata in parallel, then building a diff table and font comparison.

export async function compareStructure(
  filePath1: string,
  filePath2: string,
): Promise<StructureComparison> {
  // Analyze both files in parallel
  const [struct1, struct2, fonts1, fonts2, meta1, meta2] = await Promise.all([
    analyzeStructure(filePath1),
    analyzeStructure(filePath2),
    analyzeFontsWithPdfLib(filePath1),
    analyzeFontsWithPdfLib(filePath2),
    getMetadata(filePath1),
    getMetadata(filePath2),
  ]);

  const diffs: StructureDiffEntry[] = [];

  // Page count
  addDiff(
    diffs,
    'Page Count',
    String(struct1.pageTree.totalPages),
    String(struct2.pageTree.totalPages),
  );

  // PDF version
  addDiff(diffs, 'PDF Version', struct1.pdfVersion ?? 'Unknown', struct2.pdfVersion ?? 'Unknown');

  // Encrypted
  addDiff(diffs, 'Encrypted', String(struct1.isEncrypted), String(struct2.isEncrypted));

  // Tagged
  addDiff(diffs, 'Tagged', String(meta1.isTagged), String(meta2.isTagged));

  // Total objects
  addDiff(
    diffs,
    'Total Objects',
    String(struct1.objectStats.totalObjects),
    String(struct2.objectStats.totalObjects),
  );

  // Stream count
  addDiff(
    diffs,
    'Stream Count',
    String(struct1.objectStats.streamCount),
    String(struct2.objectStats.streamCount),
  );

  // First page dimensions
  const dim1 = struct1.pageTree.mediaBoxSamples[0];
  const dim2 = struct2.pageTree.mediaBoxSamples[0];
  addDiff(
    diffs,
    'Page 1 Dimensions (pt)',
    dim1 ? `${dim1.width} x ${dim1.height}` : 'N/A',
    dim2 ? `${dim2.width} x ${dim2.height}` : 'N/A',
  );

  // File size
  addDiff(diffs, 'File Size', formatFileSize(meta1.fileSize), formatFileSize(meta2.fileSize));

  // Catalog entry count
  addDiff(diffs, 'Catalog Entries', String(struct1.catalog.length), String(struct2.catalog.length));

  // Signatures
  addDiff(diffs, 'Has Signatures', String(meta1.hasSignatures), String(meta2.hasSignatures));

  // Font comparison
  const fontNames1 = new Set(fonts1.fontMap.keys());
  const fontNames2 = new Set(fonts2.fontMap.keys());
  const onlyInFile1 = [...fontNames1].filter((f) => !fontNames2.has(f));
  const onlyInFile2 = [...fontNames2].filter((f) => !fontNames1.has(f));
  const inBoth = [...fontNames1].filter((f) => fontNames2.has(f));

  addDiff(diffs, 'Total Fonts', String(fontNames1.size), String(fontNames2.size));

  // Summary
  const matchCount = diffs.filter((d) => d.status === 'match').length;
  const diffCount = diffs.filter((d) => d.status === 'differ').length;
  const summary =
    diffCount === 0
      ? `All ${matchCount} properties match between the two PDFs.`
      : `${diffCount} difference(s) found out of ${diffs.length} properties compared.`;

  return {
    file1: basename(filePath1),
    file2: basename(filePath2),
    diffs,
    fontComparison: { onlyInFile1, onlyInFile2, inBoth },
    summary,
  };
}

src/schemas/tier3.ts:25-31 (schema)

Zod input schema for compare_structure: requires file_path_1, file_path_2, and optional response_format.

export const CompareStructureSchema = z
  .object({
    file_path_1: FilePathSchema.describe('Absolute path to the first PDF file for comparison'),
    file_path_2: FilePathSchema.describe('Absolute path to the second PDF file for comparison'),
    response_format: ResponseFormatSchema,
  })
  .strict();

src/tools/tier3/compare-structure.ts:12-59 (registration)

Registers the compare_structure tool on the MCP server with schema, annotations, and async handler callback.

export function registerCompareStructure(server: McpServer): void {
  server.registerTool(
    'compare_structure',
    {
      title: 'Compare PDF Structures',
      description: `Compare the internal structures of two PDF documents and identify differences.

Args:
  - file_path_1 (string): Absolute path to the first PDF file
  - file_path_2 (string): Absolute path to the second PDF file
  - response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns:
  Structural comparison including: property-by-property diff (page count, PDF version, encryption, tagged status, object counts, page dimensions, file size, catalog entries, signatures), font comparison (fonts unique to each file and shared fonts), and a summary.

Examples:
  - Compare two versions of the same document
  - Verify structural consistency across PDF exports
  - Identify differences in PDF generation pipelines`,
      inputSchema: CompareStructureSchema,
      annotations: {
        readOnlyHint: true,
        destructiveHint: false,
        idempotentHint: true,
        openWorldHint: false,
      },
    },
    async (params: CompareStructureInput) => {
      try {
        const result = await compareStructure(params.file_path_1, params.file_path_2);

        const raw =
          params.response_format === ResponseFormat.JSON
            ? JSON.stringify(result, null, 2)
            : formatCompareStructureMarkdown(result);

        const { text } = truncateIfNeeded(raw);
        return { content: [{ type: 'text' as const, text }] };
      } catch (error) {
        const err = handleStructuredError(error);
        return {
          content: [{ type: 'text' as const, text: JSON.stringify(err, null, 2) }],
          isError: true,
        };
      }
    },
  );
}

src/utils/formatter.ts:446-483 (helper)

Helper that formats the StructureComparison result as Markdown with a property diff table, font comparison section, and summary.

export function formatCompareStructureMarkdown(result: StructureComparison): string {
  const lines: string[] = ['# PDF Structure Comparison', ''];

  lines.push(`Comparing **${result.file1}** vs **${result.file2}**`);

  // Property diff table
  lines.push('', '## Property Comparison', '');
  lines.push('| Property | File 1 | File 2 | Status |', '|---|---|---|---|');
  for (const diff of result.diffs) {
    const statusIcon = diff.status === 'match' ? '\u2705' : '\u274c';
    lines.push(`| ${diff.property} | ${diff.file1Value} | ${diff.file2Value} | ${statusIcon} |`);
  }

  // Font comparison
  const fc = result.fontComparison;
  lines.push('', '## Font Comparison', '');

  if (fc.inBoth.length > 0) {
    lines.push(`- **Shared fonts** (${fc.inBoth.length}): ${fc.inBoth.join(', ')}`);
  }
  if (fc.onlyInFile1.length > 0) {
    lines.push(
      `- **Only in ${result.file1}** (${fc.onlyInFile1.length}): ${fc.onlyInFile1.join(', ')}`,
    );
  }
  if (fc.onlyInFile2.length > 0) {
    lines.push(
      `- **Only in ${result.file2}** (${fc.onlyInFile2.length}): ${fc.onlyInFile2.join(', ')}`,
    );
  }
  if (fc.inBoth.length === 0 && fc.onlyInFile1.length === 0 && fc.onlyInFile2.length === 0) {
    lines.push('No fonts found in either document.');
  }

  lines.push('', '---', '', `**Summary**: ${result.summary}`);

  return lines.join('\n');
}

src/types.ts:292-310 (helper)

Type definitions: StructureDiffEntry (single diff row) and StructureComparison (full output type).

export interface StructureDiffEntry {
  property: string;
  file1Value: string;
  file2Value: string;
  status: 'match' | 'differ';
}

/** compare_structure output */
export interface StructureComparison {
  file1: string;
  file2: string;
  diffs: StructureDiffEntry[];
  fontComparison: {
    onlyInFile1: string[];
    onlyInFile2: string[];
    inBoth: string[];
  };
  summary: string;
}

@shuji-bonji/pdf-reader-mcp

Compare PDF Structures

Instructions

Input Schema

Implementation Reference

Tool Definition Quality

Other Tools

Latest Blog Posts

MCP directory API