Summarize PDF

summarize

Read-only, Idempotent

Generate a quick overview of any PDF document, including page count, file size, text presence, image count, and a text preview from the first page. Use it as a first step to decide which detailed tools to apply next.

Instructions

Generate a quick overview report of a PDF document.

Combines metadata, text presence check, image count, and a text preview from the first page into a single summary. Useful as a first step before deciding which detailed tools to use.

Args:

file_path (string): Absolute path to a local PDF file
response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns: Summary including: page count, PDF version, file size, tagged/encrypted/signature flags, text presence, image count, and a text preview from page 1.

Examples:

Quick overview: { file_path: "/path/to/doc.pdf" }
Machine-readable: { file_path: "/path/to/doc.pdf", response_format: "json" }
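
Because the summary is meant to guide which tool to use next, a caller requesting `response_format: "json"` might triage the result roughly as follows. This is a minimal sketch: the field names mirror the `PdfSummary` object and metadata properties that appear in the implementation reference on this page, `fileSize` is assumed to be a byte count, and the `triage` function with its advice strings is hypothetical and does not name actual tools of this server.

```typescript
// Shape of the JSON summary, reconstructed from the fields the handler and
// formatter on this page use. Exact types are assumptions.
interface SummaryMetadata {
  pageCount: number;
  pdfVersion?: string | null;
  fileSize: number; // assumed to be bytes
  isTagged: boolean;
  isEncrypted: boolean;
  hasSignatures: boolean;
  title?: string;
  author?: string;
}

interface PdfSummary {
  filePath: string;
  metadata: SummaryMetadata;
  textPreview: string;
  imageCount: number;
  hasText: boolean;
}

// Hypothetical triage helper: maps a summary to a plain-language next step.
function triage(summary: PdfSummary): string {
  if (summary.metadata.isEncrypted) {
    return 'encrypted: check access first';
  }
  if (!summary.hasText && summary.imageCount > 0) {
    return 'no page-1 text, images present: possibly scanned';
  }
  if (!summary.hasText) {
    return 'no page-1 text: inspect later pages';
  }
  return 'text present: extract text';
}

// Example: parse a JSON summary (sample values, not real tool output).
const raw = JSON.stringify({
  filePath: '/path/to/doc.pdf',
  metadata: {
    pageCount: 12,
    pdfVersion: '1.7',
    fileSize: 204800,
    isTagged: false,
    isEncrypted: false,
    hasSignatures: false,
  },
  textPreview: '',
  imageCount: 12,
  hasText: false,
});
const parsed = JSON.parse(raw) as PdfSummary;
console.log(triage(parsed));
```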

Input Schema

Name	Required	Description	Default
`file_path`	Yes	Absolute path to a local PDF file (e.g., "/path/to/document.pdf")
`response_format`	No	Output format: "markdown" for human-readable, "json" for structured data	markdown
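
To make the table concrete, the sketch below mimics the documented behavior in plain TypeScript: `file_path` is required, `response_format` falls back to `"markdown"`, and unknown keys are rejected, as the `.strict()` call in the schema suggests. The function name `parseSummarizeInput` and the absolute-path check are assumptions. The server itself validates with Zod, and the rules inside `FilePathSchema` are not shown on this page.

```typescript
type ResponseFormat = 'markdown' | 'json';

interface SummarizeInput {
  file_path: string;
  response_format: ResponseFormat;
}

// Hypothetical stand-in for the Zod schema: validates and applies the default.
function parseSummarizeInput(input: Record<string, unknown>): SummarizeInput {
  // Reject keys the schema does not define (mirrors a strict object schema).
  const allowed = ['file_path', 'response_format'];
  const unknownKeys = Object.keys(input).filter((k) => allowed.indexOf(k) === -1);
  if (unknownKeys.length > 0) {
    throw new Error('Unrecognized key(s): ' + unknownKeys.join(', '));
  }

  const filePath = input.file_path;
  if (typeof filePath !== 'string' || filePath.length === 0) {
    throw new Error('file_path is required and must be a non-empty string');
  }
  // Assumption: "absolute" means a POSIX "/..." path or a Windows drive path.
  if (!/^(\/|[A-Za-z]:[\\/])/.test(filePath)) {
    throw new Error('file_path must be an absolute path');
  }

  // Apply the documented default when response_format is omitted.
  const rawFormat = input.response_format;
  let format: ResponseFormat;
  if (rawFormat === undefined) {
    format = 'markdown';
  } else if (rawFormat === 'markdown' || rawFormat === 'json') {
    format = rawFormat;
  } else {
    throw new Error('response_format must be "markdown" or "json"');
  }

  return { file_path: filePath, response_format: format };
}

console.log(parseSummarizeInput({ file_path: '/path/to/doc.pdf' }));
console.log(parseSummarizeInput({ file_path: '/path/to/doc.pdf', response_format: 'json' }));
```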

Implementation Reference

src/tools/tier1/summarize.ts:18-88 (handler)

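Registration function and handler for the 'summarize' tool. registerSummarize registers the tool with the MCP server. The handler loads the PDF document once, then extracts the metadata, the first-page text (truncated to a 500-character preview), and the image count in parallel, and returns the result as either Markdown or JSON. Failures are returned as a structured JSON error with isError set to true.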

```
export function registerSummarize(server: McpServer): void {
  server.registerTool(
    'summarize',
    {
      title: 'Summarize PDF',
      description: `Generate a quick overview report of a PDF document.

Combines metadata, text presence check, image count, and a text preview from the first page into a single summary. Useful as a first step before deciding which detailed tools to use.

Args:
  - file_path (string): Absolute path to a local PDF file
  - response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns:
  Summary including: page count, PDF version, file size, tagged/encrypted/signature flags, text presence, image count, and a text preview from page 1.

Examples:
  - Quick overview: { file_path: "/path/to/doc.pdf" }
  - Machine-readable: { file_path: "/path/to/doc.pdf", response_format: "json" }`,
      inputSchema: SummarizeSchema,
      annotations: {
        readOnlyHint: true,
        destructiveHint: false,
        idempotentHint: true,
        openWorldHint: false,
      },
    },
    async (params: SummarizeInput) => {
      try {
        // Load the PDF document once and reuse for all operations
        const doc = await loadDocument(params.file_path);

        try {
          const [metadata, firstPageTexts, imageCount] = await Promise.all([
            getMetadataFromDoc(doc, params.file_path),
            extractTextFromDoc(doc, '1'),
            countImagesFromDoc(doc),
          ]);

          const textPreview = firstPageTexts[0]?.text?.slice(0, 500) ?? '';
          const hasText = textPreview.trim().length > 0;

          const summary: PdfSummary = {
            filePath: params.file_path,
            metadata,
            textPreview,
            imageCount,
            hasText,
          };

          const text =
            params.response_format === ResponseFormat.JSON
              ? JSON.stringify(summary, null, 2)
              : formatSummaryMarkdown(summary);

          return {
            content: [{ type: 'text' as const, text }],
          };
        } finally {
          await doc.destroy();
        }
      } catch (error) {
        const err = handleStructuredError(error);
        return {
          content: [{ type: 'text' as const, text: JSON.stringify(err, null, 2) }],
          isError: true,
        };
      }
    },
  );
}
```
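
The preview logic in the handler is compact, so a standalone sketch may help show its edge cases. `buildPreview` below is a hypothetical helper that repeats the handler's two preview lines. The `PageText` shape is an assumption inferred from `firstPageTexts[0]?.text` and may not match the project's real type.

```typescript
// Assumed shape of one entry returned by the text extraction step.
interface PageText {
  page: number;
  text?: string;
}

// Hypothetical helper mirroring the handler's preview and hasText computation.
function buildPreview(
  pages: PageText[],
  maxChars: number = 500,
): { textPreview: string; hasText: boolean } {
  // Missing page or missing text falls back to an empty preview.
  const textPreview = pages[0]?.text?.slice(0, maxChars) ?? '';
  // Whitespace-only text counts as "no text".
  const hasText = textPreview.trim().length > 0;
  return { textPreview, hasText };
}

// No pages extracted: empty preview, hasText is false.
console.log(buildPreview([]));

// Whitespace-only first page: preview is kept as-is, hasText is false.
console.log(buildPreview([{ page: 1, text: '   \n ' }]));

// Long first page: preview is cut at 500 characters.
console.log(buildPreview([{ page: 1, text: 'x'.repeat(600) }]).textPreview.length); // 500
```

Note that `hasText` is derived from the first-page preview only, so under this logic a document whose first page is blank or purely an image would report no text even if later pages contain some.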

src/schemas/tier1.ts:125-131 (schema)
Zod schema for the summarize tool input. Defines file_path (string, required) and response_format ('markdown' | 'json', optional, defaulting to 'markdown'). The object is strict, so unrecognized keys are rejected.
```
/** summarize */
export const SummarizeSchema = z
  .object({
    file_path: FilePathSchema,
    response_format: ResponseFormatSchema,
  })
  .strict();
```
src/tools/index.ts:14-14 (registration)
Import of registerSummarize in the central tool registration file.
```
import { registerSummarize } from './tier1/summarize.js';
```
src/tools/index.ts:38-38 (registration)
Call to registerSummarize(server) from the central registerAllTools function.
```
registerSummarize(server);
```

src/utils/formatter.ts:117-141 (helper)

Helper function formatSummaryMarkdown, which renders a PdfSummary object as a Markdown table (page count, PDF version, file size, tagged/encrypted/signature flags, text presence, image count), appends Title and Author rows when those fields are set, and ends with the first-page text preview if one exists.

```
export function formatSummaryMarkdown(summary: PdfSummary): string {
  const meta = summary.metadata;
  const lines: string[] = [
    `# PDF Summary`,
    '',
    `| Property | Value |`,
    `|---|---|`,
    `| Pages | ${meta.pageCount} |`,
    `| PDF Version | ${meta.pdfVersion ?? 'Unknown'} |`,
    `| File Size | ${formatFileSize(meta.fileSize)} |`,
    `| Tagged | ${meta.isTagged ? 'Yes' : 'No'} |`,
    `| Encrypted | ${meta.isEncrypted ? 'Yes' : 'No'} |`,
    `| Signatures | ${meta.hasSignatures ? 'Yes' : 'No'} |`,
    `| Has Text | ${summary.hasText ? 'Yes' : 'No'} |`,
    `| Images | ${summary.imageCount} |`,
  ];

  // Optional rows, added only when the metadata provides them
  if (meta.title) lines.push(`| Title | ${meta.title} |`);
  if (meta.author) lines.push(`| Author | ${meta.author} |`);

  // Text preview goes below the table, under its own heading
  if (summary.textPreview) {
    lines.push('', '## Text Preview (first page)', '', summary.textPreview);
  }

  return lines.join('\n');
}
```

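One detail worth noting about the table output: `meta.title` and `meta.author` are interpolated into table cells as-is, so a value containing `|` or a line break could break the row in some Markdown renderers. A hypothetical `escapeCell` helper, which is not part of the project as shown here, could guard against this:

```typescript
// Hypothetical helper: make an arbitrary string safe for a Markdown table cell.
function escapeCell(value: string): string {
  return value
    .replace(/\\/g, '\\\\') // escape backslashes first so later escapes stay intact
    .replace(/\|/g, '\\|') // escape pipes, which would otherwise end the cell
    .replace(/\r?\n/g, ' '); // replace line breaks, which would end the row
}

const title = 'Q3 Report | Draft\nv2';
console.log(`| Title | ${escapeCell(title)} |`);
// prints: | Title | Q3 Report \| Draft v2 |
```
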
@shuji-bonji/pdf-reader-mcp
