Schema | @shuji-bonji/pdf-reader-mcp

@shuji-bonji/pdf-reader-mcp

Server Configuration

Describes the environment variables required to run the server.

Name	Required	Description	Default
No arguments

Capabilities

Features and capabilities supported by this server

Capability	Details
`tools`	{ "listChanged": true }
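
The `listChanged` flag means the server may announce that its tool list has changed, so a client should not cache the list forever. A minimal sketch of how a client might react, assuming the standard MCP notification method `notifications/tools/list_changed`; the `ToolCache` class and its methods are hypothetical names used only for illustration, not part of this server or of any SDK:

```python
import json

# Notification method defined by the MCP specification for tool-list changes.
LIST_CHANGED = "notifications/tools/list_changed"


class ToolCache:
    """Hypothetical client-side cache of a server's tool list."""

    def __init__(self):
        self.tools = None  # None means "re-fetch with tools/list"

    def store(self, tools):
        self.tools = list(tools)

    def handle_message(self, raw):
        """Drop the cached list when the server announces a change."""
        msg = json.loads(raw)
        # JSON-RPC notifications carry a method but no id.
        if msg.get("method") == LIST_CHANGED and "id" not in msg:
            self.tools = None
            return True
        return False
```

After `handle_message` returns `True`, the client would issue a fresh `tools/list` request before its next tool call.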

Tools

Functions exposed to the LLM to take actions
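
MCP clients invoke tools in this section through a JSON-RPC 2.0 `tools/call` request. The sketch below only builds such a request body for `read_text`; the helper name `build_tool_call` and the id counter are illustrative assumptions, and the transport (stdio or HTTP) that would carry the message is left out:

```python
import json
from itertools import count

_ids = count(1)  # simple request-id generator (illustrative)


def build_tool_call(name, arguments):
    """Return a JSON-RPC 2.0 request dict for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names come from the read_text description in the table.
request = build_tool_call(
    "read_text",
    {"file_path": "/path/to/doc.pdf", "pages": "1-5", "compact_whitespace": True},
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library would normally construct and send this message for you; the sketch is only meant to show where the documented arguments end up.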

Name	Description
get_page_count	Get the total number of pages in a PDF document. This is a lightweight operation that only reads the PDF header, not the full content. Args: file_path (string): Absolute path to a local PDF file. Returns: Page count as a number. Examples: Quick check before deciding which pages to extract; validate that a PDF file is readable.
get_metadata	Extract metadata from a PDF document, including title, author, creation date, page count, PDF version, and structural information. Args: file_path (string): Absolute path to a local PDF file; response_format ('markdown' | 'json'): Output format (default: 'markdown'). Returns: Metadata including title, author, subject, keywords, creator, producer, creation/modification dates, page count, PDF version, linearized/encrypted/tagged/signature flags, and file size. Examples: Get document properties for cataloging; check if a PDF is tagged (accessibility); verify PDF version compatibility.
read_text	Extract text content from a PDF document with Y-coordinate-based reading order preservation. Text is extracted page by page and sorted by vertical position (top to bottom), then horizontal position (left to right), which gives a natural reading order. For untagged multi-column PDFs (e.g. older 新旧対照表 (old/new comparison table) PDFs that lack a structure tree), pass `split_columns: 2` or `3` to bucket items by X-coordinate, left to right. Tagged PDFs with proper `<Table>` markup should use the `extract_tables` tool instead. For Japanese form-style PDFs (帳票・様式, forms and templates) where U+3000 fullwidth spaces are used as visual indentation, pass `compact_whitespace: true` to collapse runs of whitespace to a single ASCII space. This cuts token consumption by 20–40% without losing content. Args: file_path (string): Absolute path to a local PDF file; pages (string, optional): Page range to extract, in the format "1-5", "3", or "1,3,5-7" (omit for all pages); response_format ('markdown' | 'json'): Output format (default: 'markdown'); split_columns (1 | 2 | 3, optional): Column-aware reordering for untagged multi-column PDFs (default 1 = existing Y-sort); compact_whitespace (boolean, optional): Collapse whitespace runs (incl. U+3000) to one ASCII space and trim each line (default false). Returns: Extracted text organized by page number. With `split_columns >= 2`, columns are separated by a blank line so a downstream LLM can tell them apart. Examples: Extract all text: { file_path: "/path/to/doc.pdf" }; untagged 新旧対照表: { file_path: "/path/to/older-shinkyu.pdf", split_columns: 2 }; Japanese form template: { file_path: "/path/to/form.pdf", compact_whitespace: true }.
search_text	Search for text within a PDF document and return matching locations with surrounding context. The search is case-insensitive and covers all or specified pages. Each match includes the page number, the matched text, and a configurable amount of surrounding context. Args: file_path (string): Absolute path to a local PDF file; query (string): Text to search for (case-insensitive, 1-500 chars); pages (string, optional): Page range to search (omit for all pages); context_chars (number): Characters of context before/after each match (default: 80); max_results (number): Maximum matches to return (default: 20, max: 100); response_format ('markdown' | 'json'): Output format (default: 'markdown'). Returns: Search matches with page number, matched text, and surrounding context. Examples: Search entire PDF: { file_path: "/path/to/doc.pdf", query: "digital signature" }; search specific pages: { file_path: "/path/to/doc.pdf", query: "error", pages: "1-10" }.
read_images	Extract images from a PDF document as base64-encoded data. Extracts embedded images from specified or all pages. Returns image metadata (dimensions, color space) along with raw pixel data in base64. Args: file_path (string): Absolute path to a local PDF file; pages (string, optional): Page range, in the format "1-5", "3", or "1,3,5-7" (omit for all pages). Returns: Array of extracted images with page number, index, width, height, color space (RGB/RGBA/Grayscale), bits per component, and base64-encoded data. Note: Large images may produce very large responses. Use the pages parameter to limit scope. Examples: Extract all images: { file_path: "/path/to/doc.pdf" }; extract from page 1: { file_path: "/path/to/doc.pdf", pages: "1" }.
read_url	Fetch a PDF from a URL and extract its text content. Downloads the PDF from the specified URL, then extracts text with Y-coordinate-based reading order. Supports HTTP and HTTPS. Maximum file size: 50MB. Timeout: 30 seconds. Like `read_text`, accepts `split_columns: 2 | 3` for untagged multi-column PDFs and `compact_whitespace: true` to collapse U+3000 / ASCII whitespace runs. Tagged PDFs should use `extract_tables` instead. Args: url (string): URL pointing to a PDF file (HTTP or HTTPS); pages (string, optional): Page range to extract, in the format "1-5", "3", or "1,3,5-7" (omit for all pages); response_format ('markdown' | 'json'): Output format (default: 'markdown'); split_columns (1 | 2 | 3, optional): Column-aware reordering (default 1 = existing Y-sort); compact_whitespace (boolean, optional): Collapse whitespace runs (incl. U+3000) to one ASCII space (default false). Returns: Extracted text organized by page number, in the same format as read_text. Examples: Read remote PDF: { url: "https://example.com/document.pdf" }; untagged 2-column PDF: { url: "https://...", split_columns: 2 }; Japanese form: { url: "https://...", compact_whitespace: true }.
summarize	Generate a quick overview report of a PDF document. Combines metadata, a text presence check, image count, and a text preview from the first page into a single summary. Useful as a first step before deciding which detailed tools to use. Args: file_path (string): Absolute path to a local PDF file; response_format ('markdown' | 'json'): Output format (default: 'markdown'). Returns: Summary including page count, PDF version, file size, tagged/encrypted/signature flags, text presence, image count, and a text preview from page 1. Examples: Quick overview: { file_path: "/path/to/doc.pdf" }; machine-readable: { file_path: "/path/to/doc.pdf", response_format: "json" }.
inspect_structure	Examine PDF internal object structure, including catalog entries, page tree, and object statistics. Args: file_path (string): Absolute path to a local PDF file; response_format ('markdown' | 'json'): Output format (default: 'markdown'). Returns: Catalog entries (keys and types), page tree info (page count, MediaBox samples), object statistics (total count, stream count, type distribution), and encryption status. Examples: Examine the document catalog for structural features; count PDF objects and streams; check page dimensions across the document.
inspect_tags	Analyze the Tagged PDF structure tree for accessibility assessment. Args: file_path (string): Absolute path to a local PDF file; response_format ('markdown' | 'json'): Output format (default: 'markdown'). Returns: Whether the PDF is tagged, the structure tree hierarchy with roles, max nesting depth, total element count, and role distribution (e.g., Document, P, H1, Table, Figure). Examples: Check if a PDF is tagged for accessibility (PDF/UA); inspect the tag hierarchy and role distribution; assess document structure quality.
inspect_fonts	List all fonts used in a PDF document with their properties. Args: file_path (string): Absolute path to a local PDF file; response_format ('markdown' | 'json'): Output format (default: 'markdown'). Returns: Font name, type (TrueType, Type1, CIDFont, etc.), encoding, embedded/subset status, and pages where each font is used. Examples: Check if all fonts are embedded (required for PDF/A, PDF/X); identify font types and encodings; find which pages use specific fonts.
inspect_annotations	Extract and categorize all annotations in a PDF document. Args: file_path (string): Absolute path to a local PDF file; pages (string, optional): Page range, in the format "1-5", "3", or "1,3,5-7" (omit for all pages); response_format ('markdown' | 'json'): Output format (default: 'markdown'). Returns: Total annotation count, breakdown by subtype (Link, Widget, Highlight, Text, etc.) and by page, flags for links/forms/markup presence, and individual annotation details. Examples: Check for form fields (Widget annotations); find all links in a document; inventory markup annotations (highlights, comments).
inspect_signatures	Examine digital signature fields in a PDF document. Args: file_path (string): Absolute path to a local PDF file; response_format ('markdown' | 'json'): Output format (default: 'markdown'). Returns: Total signature field count, signed/unsigned breakdown, and details for each field (signer name, reason, location, signing time, filter/subFilter). Note: This tool inspects signature field structure only. Cryptographic signature verification is not performed. Examples: Check if a PDF has been digitally signed; inspect signer information and signing dates; verify signature field structure.
extract_tables	Extract every `<Table>` subtree from a Tagged PDF as a structured row/cell list, optionally rendered as Markdown tables. How it works: walks the StructTree and pulls cell text for each `<TR>` → `<TH>/<TD>`, then collapses kerning whitespace (e.g. "消 費 税 法" → "消費税法"). This sidesteps the failure mode of reading-order extraction on multi-column tables (typical of 新旧対照表, old/new comparison table, PDFs). Args: file_path (string): Absolute path to a local PDF file; pages (string, optional): Page range, in the format "1-5", "3", or "1,3,5-7" (omit for all pages); response_format ('markdown' | 'json'): Output format (default: 'markdown'). Returns: Markdown: an `# Extracted Tables` summary block followed by one `## Page N — Table M` section per table with a GFM table. JSON: `{ isTagged, tables: [{ page, index, headerRows, bodyRows, footerRows }], totalTables, pagesScanned, note? }`. Limitations: Untagged PDFs return an empty result and a `note`; colspan/rowspan are not honoured (cells are listed in source order); nested tables are skipped to keep page indices stable. Examples: Pull a 新旧対照表 from a kaisei tsutatsu (amendment notice) PDF for diffing; convert 帳票 (form template) tables into structured data.
validate_tagged	Validate PDF/UA tagged structure requirements. Args: file_path (string): Absolute path to a local PDF file; response_format ('markdown' | 'json'): Output format (default: 'markdown'). Returns: Validation results including whether the PDF is tagged, total checks performed, pass/fail counts, detailed issues with severity levels (error/warning/info), and a summary. Checks performed: Document marked as tagged; structure tree root existence; document root tag presence; heading hierarchy (H1-H6) sequential order; Figure tags for images; paragraph tag presence; structure element count; table tag structure (TR/TH/TD). Examples: Check if a PDF meets PDF/UA accessibility requirements; identify missing or incorrect tag structure; assess document accessibility quality.
validate_metadata	Validate PDF metadata conformance against best practices and specification requirements. Args: file_path (string): Absolute path to a local PDF file; response_format ('markdown' | 'json'): Output format (default: 'markdown'). Returns: Validation results including total checks, pass/fail counts, detailed issues with severity, metadata field presence summary, and an overall summary. Checks performed: Title presence (required for PDF/UA, PDF/A); author presence; creation date format validation; modification date presence; producer identification; PDF version detection; tagged flag status; Subject and Keywords presence; encryption and accessibility impact. Examples: Verify PDF metadata completeness for PDF/A archival; check metadata requirements for PDF/UA compliance; audit document metadata for publishing standards.
compare_structure	Compare the internal structures of two PDF documents and identify differences. Args: file_path_1 (string): Absolute path to the first PDF file; file_path_2 (string): Absolute path to the second PDF file; response_format ('markdown' | 'json'): Output format (default: 'markdown'). Returns: Structural comparison including a property-by-property diff (page count, PDF version, encryption, tagged status, object counts, page dimensions, file size, catalog entries, signatures), a font comparison (fonts unique to each file and shared fonts), and a summary. Examples: Compare two versions of the same document; verify structural consistency across PDF exports; identify differences in PDF generation pipelines.
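
Two argument conventions recur across the tools above: the `pages` range string ("1-5", "3", "1,3,5-7") and the `compact_whitespace` option. The sketch below illustrates one plausible reading of each, written from the descriptions alone. It is not the server's implementation; the function names, the `page_count` bound check, and the error behaviour are assumptions.

```python
import re


def parse_pages(spec, page_count):
    """Expand a range string such as "1,3,5-7" into sorted 1-based page numbers.

    Rejecting pages beyond page_count is an assumption made for this sketch.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty segment in page spec {spec!r}")
        start, sep, end = part.partition("-")
        first = int(start)
        last = int(end) if sep else first
        if first < 1 or last < first or last > page_count:
            raise ValueError(f"invalid page segment {part!r}")
        pages.update(range(first, last + 1))
    return sorted(pages)


def compact_whitespace(text):
    """Collapse whitespace runs (including U+3000) to one ASCII space per line."""
    # For str patterns, \s is Unicode-aware and therefore matches U+3000.
    lines = (re.sub(r"\s+", " ", line).strip() for line in text.split("\n"))
    return "\n".join(lines)
```

A client could use `parse_pages` to validate a range against the result of `get_page_count` before calling `read_text`, and `compact_whitespace` to estimate the token savings on already-extracted text.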

Prompts

Interactive templates invoked by user choice

Name	Description
No prompts

Resources

Contextual data attached and managed by the client

Name	Description
No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/shuji-bonji/pdf-reader-mcp'