Glama

Compare PDF Structures

compare_structure
Read-only, Idempotent

Compare internal structures of two PDFs to identify differences in properties, fonts, and objects. Useful for verifying exports, tracking changes, and diagnosing generation issues.

Instructions

Compare the internal structures of two PDF documents and identify differences.

Args:

  • file_path_1 (string): Absolute path to the first PDF file

  • file_path_2 (string): Absolute path to the second PDF file

  • response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns: Structural comparison including: property-by-property diff (page count, PDF version, encryption, tagged status, object counts, page dimensions, file size, catalog entries, signatures), font comparison (fonts unique to each file and shared fonts), and a summary.

Examples:

  • Compare two versions of the same document

  • Verify structural consistency across PDF exports

  • Identify differences in PDF generation pipelines
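
When `response_format` is `'json'`, the returned text can be parsed and filtered for the rows that differ. The following is a minimal sketch, assuming the payload follows the `StructureComparison` shape shown in the implementation reference on this page. The `differingProperties` helper and the sample payload are made up for illustration; they are not part of the server and not real tool output.

```typescript
// Shape of the JSON payload, mirroring the StructureComparison type on this page.
interface StructureDiffEntry {
  property: string;
  file1Value: string;
  file2Value: string;
  status: 'match' | 'differ';
}

interface StructureComparison {
  file1: string;
  file2: string;
  diffs: StructureDiffEntry[];
  fontComparison: { onlyInFile1: string[]; onlyInFile2: string[]; inBoth: string[] };
  summary: string;
}

// Hypothetical helper: return the names of the properties whose values differ.
function differingProperties(jsonText: string): string[] {
  const result = JSON.parse(jsonText) as StructureComparison;
  return result.diffs.filter((d) => d.status === 'differ').map((d) => d.property);
}

// Made-up sample payload, for illustration only.
const sampleJson = JSON.stringify({
  file1: 'a.pdf',
  file2: 'b.pdf',
  diffs: [
    { property: 'Page Count', file1Value: '3', file2Value: '4', status: 'differ' },
    { property: 'Encrypted', file1Value: 'false', file2Value: 'false', status: 'match' },
  ],
  fontComparison: { onlyInFile1: [], onlyInFile2: ['Helvetica'], inBoth: [] },
  summary: '1 difference(s) found out of 2 properties compared.',
});

console.log(differingProperties(sampleJson));
```

The handler passes its output through `truncateIfNeeded`, whose behavior is not shown here, so a very large JSON payload may not parse cleanly; a caller would probably want to wrap `JSON.parse` in a `try`/`catch`.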

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| file_path_1 | Yes | Absolute path to the first PDF file for comparison | |
| file_path_2 | Yes | Absolute path to the second PDF file for comparison | |
| response_format | No | Output format: "markdown" for human-readable, "json" for structured data | markdown |
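
The constraints in the input schema can be mirrored in a dependency-free check before calling the tool. This is a rough sketch only: `validateArgs`, `requirePath`, and `parseFormat` are hypothetical names, and the absolute-path test is an assumption based on the parameter descriptions, since the server's `FilePathSchema` is not shown on this page.

```typescript
type ResponseFormat = 'markdown' | 'json';

interface CompareStructureArgs {
  file_path_1: string;
  file_path_2: string;
  response_format: ResponseFormat;
}

// Assumption: "absolute" means a POSIX path or a Windows drive path.
function isAbsolutePath(p: string): boolean {
  return p.startsWith('/') || /^[A-Za-z]:[\\/]/.test(p);
}

// Hypothetical helper: read a required absolute path from the raw input.
function requirePath(input: Record<string, unknown>, key: string): string {
  const value = input[key];
  if (typeof value !== 'string' || !isAbsolutePath(value)) {
    throw new Error(`${key} must be an absolute path string`);
  }
  return value;
}

// Hypothetical helper: apply the documented default of 'markdown'.
function parseFormat(value: unknown): ResponseFormat {
  if (value === undefined) return 'markdown';
  if (value === 'markdown' || value === 'json') return value;
  throw new Error("response_format must be 'markdown' or 'json'");
}

function validateArgs(input: Record<string, unknown>): CompareStructureArgs {
  const allowed = new Set(['file_path_1', 'file_path_2', 'response_format']);
  // The published Zod schema is .strict(), so unknown keys are rejected.
  for (const key of Object.keys(input)) {
    if (!allowed.has(key)) throw new Error(`Unknown argument: ${key}`);
  }
  return {
    file_path_1: requirePath(input, 'file_path_1'),
    file_path_2: requirePath(input, 'file_path_2'),
    response_format: parseFormat(input.response_format),
  };
}

const args = validateArgs({ file_path_1: '/tmp/a.pdf', file_path_2: '/tmp/b.pdf' });
console.log(args.response_format);
```

The server performs its own validation through Zod, so a check like this is only useful for failing early on the client side.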

Implementation Reference

  • Main handler: compares two PDF documents by analyzing their structure, fonts, and metadata in parallel, then building a diff table and font comparison.
    export async function compareStructure(
      filePath1: string,
      filePath2: string,
    ): Promise<StructureComparison> {
      // Analyze both files in parallel
      const [struct1, struct2, fonts1, fonts2, meta1, meta2] = await Promise.all([
        analyzeStructure(filePath1),
        analyzeStructure(filePath2),
        analyzeFontsWithPdfLib(filePath1),
        analyzeFontsWithPdfLib(filePath2),
        getMetadata(filePath1),
        getMetadata(filePath2),
      ]);
    
      const diffs: StructureDiffEntry[] = [];
    
      // Page count
      addDiff(
        diffs,
        'Page Count',
        String(struct1.pageTree.totalPages),
        String(struct2.pageTree.totalPages),
      );
    
      // PDF version
      addDiff(diffs, 'PDF Version', struct1.pdfVersion ?? 'Unknown', struct2.pdfVersion ?? 'Unknown');
    
      // Encrypted
      addDiff(diffs, 'Encrypted', String(struct1.isEncrypted), String(struct2.isEncrypted));
    
      // Tagged
      addDiff(diffs, 'Tagged', String(meta1.isTagged), String(meta2.isTagged));
    
      // Total objects
      addDiff(
        diffs,
        'Total Objects',
        String(struct1.objectStats.totalObjects),
        String(struct2.objectStats.totalObjects),
      );
    
      // Stream count
      addDiff(
        diffs,
        'Stream Count',
        String(struct1.objectStats.streamCount),
        String(struct2.objectStats.streamCount),
      );
    
      // First page dimensions
      const dim1 = struct1.pageTree.mediaBoxSamples[0];
      const dim2 = struct2.pageTree.mediaBoxSamples[0];
      addDiff(
        diffs,
        'Page 1 Dimensions (pt)',
        dim1 ? `${dim1.width} x ${dim1.height}` : 'N/A',
        dim2 ? `${dim2.width} x ${dim2.height}` : 'N/A',
      );
    
      // File size
      addDiff(diffs, 'File Size', formatFileSize(meta1.fileSize), formatFileSize(meta2.fileSize));
    
      // Catalog entry count
      addDiff(diffs, 'Catalog Entries', String(struct1.catalog.length), String(struct2.catalog.length));
    
      // Signatures
      addDiff(diffs, 'Has Signatures', String(meta1.hasSignatures), String(meta2.hasSignatures));
    
      // Font comparison
      const fontNames1 = new Set(fonts1.fontMap.keys());
      const fontNames2 = new Set(fonts2.fontMap.keys());
      const onlyInFile1 = [...fontNames1].filter((f) => !fontNames2.has(f));
      const onlyInFile2 = [...fontNames2].filter((f) => !fontNames1.has(f));
      const inBoth = [...fontNames1].filter((f) => fontNames2.has(f));
    
      addDiff(diffs, 'Total Fonts', String(fontNames1.size), String(fontNames2.size));
    
      // Summary
      const matchCount = diffs.filter((d) => d.status === 'match').length;
      const diffCount = diffs.filter((d) => d.status === 'differ').length;
      const summary =
        diffCount === 0
          ? `All ${matchCount} properties match between the two PDFs.`
          : `${diffCount} difference(s) found out of ${diffs.length} properties compared.`;
    
      return {
        file1: basename(filePath1),
        file2: basename(filePath2),
        diffs,
        fontComparison: { onlyInFile1, onlyInFile2, inBoth },
        summary,
      };
    }
  • Zod input schema for compare_structure: requires file_path_1 and file_path_2, with an optional response_format.
    export const CompareStructureSchema = z
      .object({
        file_path_1: FilePathSchema.describe('Absolute path to the first PDF file for comparison'),
        file_path_2: FilePathSchema.describe('Absolute path to the second PDF file for comparison'),
        response_format: ResponseFormatSchema,
      })
      .strict();
  • Registers the compare_structure tool on the MCP server with schema, annotations, and async handler callback.
    export function registerCompareStructure(server: McpServer): void {
      server.registerTool(
        'compare_structure',
        {
          title: 'Compare PDF Structures',
          description: `Compare the internal structures of two PDF documents and identify differences.
    
    Args:
      - file_path_1 (string): Absolute path to the first PDF file
      - file_path_2 (string): Absolute path to the second PDF file
      - response_format ('markdown' | 'json'): Output format (default: 'markdown')
    
    Returns:
      Structural comparison including: property-by-property diff (page count, PDF version, encryption, tagged status, object counts, page dimensions, file size, catalog entries, signatures), font comparison (fonts unique to each file and shared fonts), and a summary.
    
    Examples:
      - Compare two versions of the same document
      - Verify structural consistency across PDF exports
      - Identify differences in PDF generation pipelines`,
          inputSchema: CompareStructureSchema,
          annotations: {
            readOnlyHint: true,
            destructiveHint: false,
            idempotentHint: true,
            openWorldHint: false,
          },
        },
        async (params: CompareStructureInput) => {
          try {
            const result = await compareStructure(params.file_path_1, params.file_path_2);
    
            const raw =
              params.response_format === ResponseFormat.JSON
                ? JSON.stringify(result, null, 2)
                : formatCompareStructureMarkdown(result);
    
            const { text } = truncateIfNeeded(raw);
            return { content: [{ type: 'text' as const, text }] };
          } catch (error) {
            const err = handleStructuredError(error);
            return {
              content: [{ type: 'text' as const, text: JSON.stringify(err, null, 2) }],
              isError: true,
            };
          }
        },
      );
    }
  • Helper that formats the StructureComparison result as Markdown with a property diff table, font comparison section, and summary.
    export function formatCompareStructureMarkdown(result: StructureComparison): string {
      const lines: string[] = ['# PDF Structure Comparison', ''];
    
      lines.push(`Comparing **${result.file1}** vs **${result.file2}**`);
    
      // Property diff table
      lines.push('', '## Property Comparison', '');
      lines.push('| Property | File 1 | File 2 | Status |', '|---|---|---|---|');
      for (const diff of result.diffs) {
        const statusIcon = diff.status === 'match' ? '\u2705' : '\u274c';
        lines.push(`| ${diff.property} | ${diff.file1Value} | ${diff.file2Value} | ${statusIcon} |`);
      }
    
      // Font comparison
      const fc = result.fontComparison;
      lines.push('', '## Font Comparison', '');
    
      if (fc.inBoth.length > 0) {
        lines.push(`- **Shared fonts** (${fc.inBoth.length}): ${fc.inBoth.join(', ')}`);
      }
      if (fc.onlyInFile1.length > 0) {
        lines.push(
          `- **Only in ${result.file1}** (${fc.onlyInFile1.length}): ${fc.onlyInFile1.join(', ')}`,
        );
      }
      if (fc.onlyInFile2.length > 0) {
        lines.push(
          `- **Only in ${result.file2}** (${fc.onlyInFile2.length}): ${fc.onlyInFile2.join(', ')}`,
        );
      }
      if (fc.inBoth.length === 0 && fc.onlyInFile1.length === 0 && fc.onlyInFile2.length === 0) {
        lines.push('No fonts found in either document.');
      }
    
      lines.push('', '---', '', `**Summary**: ${result.summary}`);
    
      return lines.join('\n');
    }
  • Type definitions: StructureDiffEntry (single diff row) and StructureComparison (full output type).
    export interface StructureDiffEntry {
      property: string;
      file1Value: string;
      file2Value: string;
      status: 'match' | 'differ';
    }
    
    /** compare_structure output */
    export interface StructureComparison {
      file1: string;
      file2: string;
      diffs: StructureDiffEntry[];
      fontComparison: {
        onlyInFile1: string[];
        onlyInFile2: string[];
        inBoth: string[];
      };
      summary: string;
    }
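
The main handler calls an `addDiff` helper that is not included in the listings above. Below is a minimal sketch of what such a helper might look like, assuming it compares the two string values for strict equality and records `match` or `differ`. This is a guess at its behavior, not the project's actual code. `partitionFonts` is also a hypothetical name; it restates the handler's font set logic on plain arrays so it can be run in isolation.

```typescript
interface StructureDiffEntry {
  property: string;
  file1Value: string;
  file2Value: string;
  status: 'match' | 'differ';
}

// Assumed behavior: identical strings are a match, anything else differs.
function addDiff(
  diffs: StructureDiffEntry[],
  property: string,
  file1Value: string,
  file2Value: string,
): void {
  const status = file1Value === file2Value ? 'match' : 'differ';
  diffs.push({ property, file1Value, file2Value, status });
}

// Same three-way split as the handler's font comparison, on plain name arrays.
function partitionFonts(names1: string[], names2: string[]) {
  const set1 = new Set(names1);
  const set2 = new Set(names2);
  return {
    onlyInFile1: Array.from(set1).filter((f) => !set2.has(f)),
    onlyInFile2: Array.from(set2).filter((f) => !set1.has(f)),
    inBoth: Array.from(set1).filter((f) => set2.has(f)),
  };
}

// Illustrative values only.
const sampleDiffs: StructureDiffEntry[] = [];
addDiff(sampleDiffs, 'Page Count', '3', '4');
addDiff(sampleDiffs, 'PDF Version', '1.7', '1.7');

const sampleFonts = partitionFonts(['Helvetica', 'Courier'], ['Helvetica', 'Times-Roman']);
console.log(sampleDiffs.map((d) => d.status), sampleFonts);
```

Because every value is stringified before comparison, a helper like this would treat `'612 x 792'` and `'612.0 x 792'` as different; whether the real implementation normalizes values is not stated on this page.
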
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark the tool as read-only and idempotent. The description adds value by specifying the return content (property-by-property diff, font comparison, summary) and that it reads two files. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is well-organized with a brief intro, args list, returns list, and examples. Every sentence is necessary, no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description thoroughly explains what is returned (structural comparison details). All 3 parameters (2 required) are described, and examples provide practical context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds examples and return format details beyond the schema, such as the default for response_format and the structure of the comparison output.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool compares the internal structures of two PDF documents and identifies differences, listing specific aspects such as the property-by-property diff and the font comparison. This distinguishes it from sibling tools like inspect_structure, which likely handles single documents.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Examples provided (e.g., comparing two versions of the same document, verifying structural consistency) give clear context for when to use the tool. However, it does not explicitly state when not to use it or mention alternatives, though the sibling list implies other tools for different tasks.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/shuji-bonji/pdf-reader-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.