Server Configuration

Describes the environment variables required to run the server.

Name | Required | Description | Default

No arguments

Capabilities

Features and capabilities supported by this server

Capability | Details
tools | { "listChanged": true }

Tools

Functions exposed to the LLM to take actions

Name | Description

get_page_count

Get the total number of pages in a PDF document.

This is a lightweight operation that only reads the PDF header, not the full content.

Args:

  • file_path (string): Absolute path to a local PDF file

Returns: Page count as a number.

Examples:

  • Quick check before deciding which pages to extract

  • Validate a PDF file is readable

get_metadata

Extract metadata from a PDF document including title, author, creation date, page count, PDF version, and structural information.

Args:

  • file_path (string): Absolute path to a local PDF file

  • response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns: Metadata including: title, author, subject, keywords, creator, producer, creation/modification dates, page count, PDF version, linearized/encrypted/tagged/signature flags, file size.

Examples:

  • Get document properties for cataloging

  • Check if a PDF is tagged (accessibility)

  • Verify PDF version compatibility

read_text

Extract text content from a PDF document with Y-coordinate-based reading order preservation.

Text is extracted page by page, sorted by vertical position (top to bottom) then horizontal position (left to right), providing natural reading order.

For untagged multi-column PDFs (e.g. older 新旧対照表 PDFs that lack a structure tree), pass split_columns: 2 or 3 to bucket items by X-coordinate left-to-right. Tagged PDFs with proper <Table> markup should use the extract_tables tool instead.
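
The ordering described above can be sketched roughly as follows. This is a minimal illustration, not the server's implementation: the TextItem shape, the page_width parameter, the equal-width column bucketing, and the convention that a larger Y means higher on the page are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    """Hypothetical text fragment; not the server's actual type."""
    text: str
    x: float
    y: float  # assumption: larger y is higher on the page


def reading_order(items):
    """Sort top to bottom (descending y), then left to right (ascending x)."""
    return sorted(items, key=lambda it: (-it.y, it.x))


def split_into_columns(items, page_width, columns):
    """Bucket items into equal-width columns by x, then order each bucket."""
    buckets = [[] for _ in range(columns)]
    column_width = page_width / columns
    for it in items:
        idx = int(it.x / column_width)
        idx = max(0, min(idx, columns - 1))  # clamp stray coordinates
        buckets[idx].append(it)
    return [reading_order(bucket) for bucket in buckets]


def render(items, page_width, columns=1):
    """Join item text; with 2+ columns, separate columns by a blank line."""
    cols = split_into_columns(items, page_width, columns)
    return "\n\n".join("\n".join(it.text for it in col) for col in cols if col)
```

With columns=1 the two columns of a side-by-side layout interleave line by line; with columns=2 the left column is emitted in full before the right one.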

For Japanese form-style PDFs (帳票・様式) where U+3000 fullwidth spaces are used as visual indentation, pass compact_whitespace: true to collapse runs of whitespace to a single ASCII space. This can cut token consumption by 20–40% without losing content.
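
A minimal sketch of what collapsing whitespace runs might look like, assuming line breaks are preserved and each line is trimmed. The function name mirrors the option but is otherwise hypothetical; the server's actual rules may differ.

```python
import re

# Any whitespace except a newline; in Python, \s on str patterns
# covers Unicode whitespace, including U+3000 (ideographic space).
_WHITESPACE_RUN = re.compile(r"[^\S\n]+")


def compact_whitespace(text: str) -> str:
    """Collapse whitespace runs to one ASCII space and trim each line."""
    return "\n".join(
        _WHITESPACE_RUN.sub(" ", line).strip() for line in text.split("\n")
    )
```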

Args:

  • file_path (string): Absolute path to a local PDF file

  • pages (string, optional): Page range to extract. Format: "1-5", "3", or "1,3,5-7". Omit for all pages.

  • response_format ('markdown' | 'json'): Output format (default: 'markdown')

  • split_columns (1 | 2 | 3, optional): Column-aware reordering for untagged multi-column PDFs. Default 1 = existing Y-sort.

  • compact_whitespace (boolean, optional): Collapse whitespace runs (incl. U+3000) to one ASCII space and trim each line. Default false.

Returns: Extracted text organized by page number. With split_columns >= 2, columns are separated by a blank line so a downstream LLM can tell them apart.

Examples:

  • Extract all text: { file_path: "/path/to/doc.pdf" }

  • Untagged 新旧対照表: { file_path: "/path/to/older-shinkyu.pdf", split_columns: 2 }

  • Japanese form template: { file_path: "/path/to/form.pdf", compact_whitespace: true }
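
The pages format ("1-5", "3", or "1,3,5-7") can be expanded into page numbers along these lines. This is a sketch only: the parse_pages name, the page_count parameter, and the choice to reject out-of-range values with an error are assumptions, since the listing does not say how the server handles invalid ranges.

```python
def parse_pages(spec: str, page_count: int) -> list[int]:
    """Expand "1-5", "3", or "1,3,5-7" into sorted 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty segment in page range: {spec!r}")
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
        else:
            start = end = int(part)
        # Assumption: out-of-range input is an error rather than clamped.
        if start < 1 or end < start or end > page_count:
            raise ValueError(f"page range out of bounds: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```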

search_text

Search for text within a PDF document. Returns matching locations with surrounding context.

Case-insensitive search across all or specified pages. Each match includes the page number, the matched text, and configurable surrounding context.

Args:

  • file_path (string): Absolute path to a local PDF file

  • query (string): Text to search for (case-insensitive, 1-500 chars)

  • pages (string, optional): Page range to search. Omit for all pages.

  • context_chars (number): Characters of context before/after match (default: 80)

  • max_results (number): Maximum matches to return (default: 20, max: 100)

  • response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns: Search matches with page number, matched text, and surrounding context.

Examples:

  • Search entire PDF: { file_path: "/path/to/doc.pdf", query: "digital signature" }

  • Search specific pages: { file_path: "/path/to/doc.pdf", query: "error", pages: "1-10" }
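
To illustrate how query, context_chars, and max_results interact, here is a rough sketch over already-extracted text. The input shape (a dict of page number to text) and the result keys are assumptions for the example, not the server's actual schema.

```python
import re


def search_text(pages, query, context_chars=80, max_results=20):
    """Case-insensitive literal search.

    pages: dict mapping page number -> extracted text (hypothetical shape).
    """
    if not 1 <= len(query) <= 500:
        raise ValueError("query must be 1-500 characters")
    limit = min(max_results, 100)  # documented maximum
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    matches = []
    for page_no in sorted(pages):
        text = pages[page_no]
        for m in pattern.finditer(text):
            if len(matches) >= limit:
                return matches
            start = max(m.start() - context_chars, 0)
            end = m.end() + context_chars
            matches.append({
                "page": page_no,
                "match": m.group(0),          # text as it appears in the PDF
                "context": text[start:end],   # match plus surrounding chars
            })
    return matches
```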

read_images

Extract images from a PDF document as base64-encoded data.

Extracts embedded images from specified or all pages. Returns image metadata (dimensions, color space) along with raw pixel data in base64.

Args:

  • file_path (string): Absolute path to a local PDF file

  • pages (string, optional): Page range. Format: "1-5", "3", or "1,3,5-7". Omit for all pages.

Returns: Array of extracted images with: page number, index, width, height, color space (RGB/RGBA/Grayscale), bits per component, and base64-encoded data.

Note: Large images may produce very large responses. Use the pages parameter to limit scope.

Examples:

  • Extract all images: { file_path: "/path/to/doc.pdf" }

  • Extract from page 1: { file_path: "/path/to/doc.pdf", pages: "1" }
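
On the client side, the base64 payload can be turned back into raw pixel bytes and sanity-checked against the reported dimensions. The sketch below is an assumption-heavy illustration: the dictionary keys are hypothetical (the listing names the fields but not their exact spelling), and it assumes tightly packed rows with no padding.

```python
import base64

# Channels per pixel for the color spaces named in the listing.
CHANNELS = {"RGB": 3, "RGBA": 4, "Grayscale": 1}


def decode_pixels(image: dict) -> bytes:
    """Decode base64 pixel data and verify its length (hypothetical keys)."""
    raw = base64.b64decode(image["data"])
    bits = (
        image["width"]
        * image["height"]
        * CHANNELS[image["color_space"]]
        * image["bits_per_component"]
    )
    expected = bits // 8  # assumes no per-row padding
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    return raw
```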

read_url

Fetch a PDF from a URL and extract its text content.

Downloads the PDF from the specified URL, then extracts text with Y-coordinate-based reading order. Supports HTTP and HTTPS. Maximum file size: 50MB. Timeout: 30 seconds.

Like read_text, accepts split_columns: 2 | 3 for untagged multi-column PDFs and compact_whitespace: true to collapse U+3000 / ASCII whitespace runs. Tagged PDFs should use extract_tables instead.

Args:

  • url (string): URL pointing to a PDF file (HTTP or HTTPS)

  • pages (string, optional): Page range to extract. Format: "1-5", "3", or "1,3,5-7". Omit for all pages.

  • response_format ('markdown' | 'json'): Output format (default: 'markdown')

  • split_columns (1 | 2 | 3, optional): Column-aware reordering. Default 1 = existing Y-sort.

  • compact_whitespace (boolean, optional): Collapse whitespace runs (incl. U+3000) to one ASCII space. Default false.

Returns: Extracted text organized by page number, same format as read_text.

Examples:

  • Read remote PDF: { url: "https://example.com/document.pdf" }

  • Untagged 2-column PDF: { url: "https://...", split_columns: 2 }

  • Japanese form: { url: "https://...", compact_whitespace: true }
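
The download limits above (HTTP/HTTPS only, 50MB, 30 seconds) could be enforced roughly as follows. This is a sketch, not the server's code: the function names are hypothetical, 1MB is assumed to mean 1024 × 1024 bytes, and urllib's timeout applies per socket operation rather than to the whole download.

```python
import io
import urllib.request

MAX_BYTES = 50 * 1024 * 1024  # assumption: 1MB = 1024 * 1024 bytes
TIMEOUT_SECONDS = 30          # per socket operation in urllib, not total


def read_capped(stream, max_bytes=MAX_BYTES, chunk_size=64 * 1024):
    """Read a stream in chunks, aborting once the size cap is exceeded."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise ValueError("PDF exceeds size limit")
    return bytes(buf)


def fetch_pdf(url: str) -> bytes:
    """Download a PDF over HTTP(S) with a timeout and a size cap."""
    if not url.lower().startswith(("http://", "https://")):
        raise ValueError("only HTTP and HTTPS URLs are supported")
    with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as resp:
        return read_capped(resp)


# Usage with an in-memory stream instead of a network response:
sample = read_capped(io.BytesIO(b"%PDF-1.7"), max_bytes=1024)
```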

summarize

Generate a quick overview report of a PDF document.

Combines metadata, text presence check, image count, and a text preview from the first page into a single summary. Useful as a first step before deciding which detailed tools to use.

Args:

  • file_path (string): Absolute path to a local PDF file

  • response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns: Summary including: page count, PDF version, file size, tagged/encrypted/signature flags, text presence, image count, and a text preview from page 1.

Examples:

  • Quick overview: { file_path: "/path/to/doc.pdf" }

  • Machine-readable: { file_path: "/path/to/doc.pdf", response_format: "json" }
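
The argument objects shown in these examples are what an MCP client passes as the arguments of a tools/call request. A minimal sketch of building that JSON-RPC message follows; the helper name and the request id are illustrative, and transport details (stdio or HTTP) are left out.

```python
import json


def tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# The "Machine-readable" example above, serialized for sending.
message = tool_call(
    "summarize",
    {"file_path": "/path/to/doc.pdf", "response_format": "json"},
)
payload = json.dumps(message)
```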

inspect_structure

Examine PDF internal object structure including catalog entries, page tree, and object statistics.

Args:

  • file_path (string): Absolute path to a local PDF file

  • response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns: Catalog entries (keys and types), page tree info (page count, MediaBox samples), object statistics (total count, stream count, type distribution), and encryption status.

Examples:

  • Examine document catalog for structural features

  • Count PDF objects and streams

  • Check page dimensions across the document

inspect_tags

Analyze the Tagged PDF structure tree for accessibility assessment.

Args:

  • file_path (string): Absolute path to a local PDF file

  • response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns: Whether the PDF is tagged, the structure tree hierarchy with roles, max nesting depth, total element count, and role distribution (e.g., Document, P, H1, Table, Figure).

Examples:

  • Check if a PDF is tagged for accessibility (PDF/UA)

  • Inspect the tag hierarchy and role distribution

  • Assess document structure quality

inspect_fonts

List all fonts used in a PDF document with their properties.

Args:

  • file_path (string): Absolute path to a local PDF file

  • response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns: Font name, type (TrueType, Type1, CIDFont, etc.), encoding, embedded/subset status, and pages where each font is used.

Examples:

  • Check if all fonts are embedded (required for PDF/A, PDF/X)

  • Identify font types and encodings

  • Find which pages use specific fonts

inspect_annotations

Extract and categorize all annotations in a PDF document.

Args:

  • file_path (string): Absolute path to a local PDF file

  • pages (string, optional): Page range. Format: "1-5", "3", or "1,3,5-7". Omit for all pages.

  • response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns: Total annotation count, breakdown by subtype (Link, Widget, Highlight, Text, etc.) and by page, flags for links/forms/markup presence, and individual annotation details.

Examples:

  • Check for form fields (Widget annotations)

  • Find all links in a document

  • Inventory markup annotations (highlights, comments)

inspect_signatures

Examine digital signature fields in a PDF document.

Args:

  • file_path (string): Absolute path to a local PDF file

  • response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns: Total signature field count, signed/unsigned breakdown, and details for each field (signer name, reason, location, signing time, filter/subFilter).

Note: This tool inspects signature field structure only. Cryptographic signature verification is not performed.

Examples:

  • Check if a PDF has been digitally signed

  • Inspect signer information and signing dates

  • Verify signature field structure

extract_tables

Extract every <Table> subtree from a Tagged PDF as a structured row/cell list, optionally rendered as Markdown tables.

How it works: walks the StructTree and pulls cell text for each <TR><TH>/<TD>, then collapses kerning whitespace (e.g. "消 費 税 法" → "消費税法"). This sidesteps reading-order extraction's failure mode on multi-column tables (typical of 新旧対照表 PDFs).
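
The kerning-whitespace collapse can be approximated with a lookaround pattern. This sketch assumes the rule is "drop spaces that sit between two Japanese characters"; the character ranges (hiragana, katakana, common kanji) and the function name are assumptions, and the server's actual rule may be broader or narrower.

```python
import re

# Hiragana/katakana and the main CJK ideograph block (assumed ranges).
_CJK = r"\u3040-\u30ff\u4e00-\u9fff"

# ASCII or fullwidth spaces with a CJK character on both sides.
_KERNING_GAP = re.compile(rf"(?<=[{_CJK}])[ \u3000]+(?=[{_CJK}])")


def collapse_kerning(cell: str) -> str:
    """Remove spaces inserted between CJK characters by letter-spacing."""
    return _KERNING_GAP.sub("", cell).strip()
```

Spaces next to Latin letters or digits are left alone, so ordinary word spacing such as "PDF reader" survives.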

Args:

  • file_path (string): Absolute path to a local PDF file

  • pages (string, optional): Page range. Format: "1-5", "3", or "1,3,5-7". Omit for all pages.

  • response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns:

  • Markdown: a "# Extracted Tables" summary block, followed by one "## Page N — Table M" section per table, each containing a GFM table.

  • JSON: { isTagged, tables: [{ page, index, headerRows, bodyRows, footerRows }], totalTables, pagesScanned, note? }.

Limitations:

  • Untagged PDFs return an empty result and a note.

  • colspan/rowspan are not honoured (cells are listed in source order).

  • Nested tables are skipped to keep page indices stable.

Examples:

  • Pull 新旧対照表 from a kaisei tsutatsu PDF for diffing

  • Convert 帳票 (form template) tables into structured data
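
For clients that request JSON and want to render tables themselves, a conversion to a GFM table might look like the sketch below. It assumes each entry in headerRows, bodyRows, and footerRows is a list of cell strings, which the listing does not spell out, and the to_gfm name is hypothetical.

```python
def to_gfm(table: dict) -> str:
    """Render one extracted table as a GFM table.

    Assumes each row is a list of cell strings (not confirmed by the listing).
    """
    header_rows = list(table.get("headerRows") or [])
    body_rows = list(table.get("bodyRows") or []) + list(table.get("footerRows") or [])
    all_rows = header_rows + body_rows
    width = max((len(row) for row in all_rows), default=0)
    if width == 0:
        return ""

    def fmt(row):
        # Escape pipes and flatten newlines so cells stay on one line.
        cells = [c.replace("|", "\\|").replace("\n", " ") for c in row]
        cells += [""] * (width - len(cells))  # pad short rows
        return "| " + " | ".join(cells) + " |"

    # GFM needs exactly one header line; extra header rows become body rows.
    header = header_rows[0] if header_rows else [""] * width
    rest = header_rows[1:] + body_rows
    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines += [fmt(row) for row in rest]
    return "\n".join(lines)
```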

validate_tagged

Validate PDF/UA tagged structure requirements.

Args:

  • file_path (string): Absolute path to a local PDF file

  • response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns: Validation results including: whether the PDF is tagged, total checks performed, pass/fail counts, detailed issues with severity levels (error/warning/info), and a summary.

Checks performed:

  • Document marked as tagged

  • Structure tree root existence

  • Document root tag presence

  • Heading hierarchy (H1-H6) sequential order

  • Figure tags for images

  • Paragraph tag presence

  • Structure element count

  • Table tag structure (TR/TH/TD)

Examples:

  • Check if a PDF meets PDF/UA accessibility requirements

  • Identify missing or incorrect tag structure

  • Assess document accessibility quality
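
As an illustration of the heading hierarchy check listed above, the sketch below flags headings that skip a level (for example H1 followed directly by H3). The input shape, the issue fields, and the "warning" severity are assumptions; the listing does not describe how the server reports this check.

```python
def check_heading_order(levels):
    """Flag headings that skip a level.

    levels: heading levels in document order, e.g. [1, 2, 3] for H1, H2, H3
    (hypothetical input shape).
    """
    issues = []
    previous = 0  # no heading seen yet
    for position, level in enumerate(levels):
        if level > previous + 1:
            if previous == 0:
                message = f"H{level} appears before any H1"
            else:
                message = f"H{level} follows H{previous}, skipping a level"
            issues.append(
                {"severity": "warning", "position": position, "message": message}
            )
        previous = level
    return issues
```

Moving back up the hierarchy (H3 followed by H2) is not flagged, since only downward jumps of more than one level break sequential order.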

validate_metadata

Validate PDF metadata conformance against best practices and specification requirements.

Args:

  • file_path (string): Absolute path to a local PDF file

  • response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns: Validation results including: total checks, pass/fail counts, detailed issues with severity, metadata field presence summary, and an overall summary.

Checks performed:

  • Title presence (required for PDF/UA, PDF/A)

  • Author presence

  • Creation date format validation

  • Modification date presence

  • Producer identification

  • PDF version detection

  • Tagged flag status

  • Subject and Keywords presence

  • Encryption and accessibility impact

Examples:

  • Verify PDF metadata completeness for PDF/A archival

  • Check metadata requirements for PDF/UA compliance

  • Audit document metadata for publishing standards
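
Creation date format validation can be pictured as a check against the PDF date string layout, D:YYYYMMDDHHmmSS followed by an optional timezone such as +09'00' or Z. The sketch below is one possible check under that layout; it assumes the D: prefix is required and does not verify calendar details such as days per month. Whether the server applies the same rules is not stated in the listing.

```python
import re

# Everything after the year is optional in the PDF date layout.
_PDF_DATE = re.compile(
    r"^D:(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?P<tz>Z|[+-]\d{2}'?(?:\d{2}'?)?)?$"
)

_LIMITS = {
    "month": (1, 12),
    "day": (1, 31),
    "hour": (0, 23),
    "minute": (0, 59),
    "second": (0, 59),
}


def is_valid_pdf_date(value: str) -> bool:
    """Return True if value looks like a well-formed PDF date string."""
    m = _PDF_DATE.match(value)
    if not m:
        return False
    for name, (low, high) in _LIMITS.items():
        part = m.group(name)
        if part is not None and not low <= int(part) <= high:
            return False
    return True
```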

compare_structure

Compare the internal structures of two PDF documents and identify differences.

Args:

  • file_path_1 (string): Absolute path to the first PDF file

  • file_path_2 (string): Absolute path to the second PDF file

  • response_format ('markdown' | 'json'): Output format (default: 'markdown')

Returns: Structural comparison including: property-by-property diff (page count, PDF version, encryption, tagged status, object counts, page dimensions, file size, catalog entries, signatures), font comparison (fonts unique to each file and shared fonts), and a summary.

Examples:

  • Compare two versions of the same document

  • Verify structural consistency across PDF exports

  • Identify differences in PDF generation pipelines
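
The font comparison in the return value (fonts unique to each file, plus shared fonts) amounts to set difference and intersection. A minimal sketch, assuming fonts are compared by name and using hypothetical result keys:

```python
def compare_fonts(fonts_1, fonts_2):
    """Split two font-name lists into unique-to-each and shared sets."""
    first, second = set(fonts_1), set(fonts_2)
    return {
        "only_in_first": sorted(first - second),
        "only_in_second": sorted(second - first),
        "shared": sorted(first & second),
    }
```

Subset fonts often carry a random six-letter prefix (for example "ABCDEF+Helvetica"), so a name-based comparison may report the same typeface as different unless the prefix is stripped first.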

Prompts

Interactive templates invoked by user choice

Name | Description

No prompts

Resources

Contextual data attached and managed by the client

Name | Description

No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/shuji-bonji/pdf-reader-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.