PDF Reader MCP Server

by lihongwen

Server Configuration

Describes the environment variables required to run the server.

Name | Required | Description | Default

No arguments

Capabilities

Features and capabilities supported by this server

Capability | Details
tools
{
  "listChanged": false
}
prompts
{
  "listChanged": false
}
resources
{
  "subscribe": false,
  "listChanged": false
}
experimental
{}

Tools

Functions exposed to the LLM to take actions

Name | Description
read_pdf

Extract text from PDF files with intelligent page handling and chunking.

Args:
    file_path: Path to the PDF file
    pages: Page range (e.g., '1,3,5-10,-1' for pages 1, 3, 5 to 10, and last page)
    chunk_size: Maximum size of text chunks
    chunk_overlap: Overlap between chunks to preserve context
    
Returns:
    JSON string with extracted text and metadata
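
The page-range syntax above ('1,3,5-10,-1') can be illustrated with a small parser. This is a minimal sketch, not the server's actual code: the function name `parse_pages` and the `total_pages` parameter are hypothetical, and it assumes pages are 1-based and that a negative number counts from the end (so -1 is the last page).

```python
def parse_pages(spec: str, total_pages: int) -> list[int]:
    """Expand a page spec such as '1,3,5-10,-1' into 1-based page numbers.

    Hypothetical helper: assumes negative numbers count back from the
    last page and that duplicates are dropped while keeping order.
    """
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if part.startswith("-"):
            # Negative index: -1 is the last page, -2 the one before it.
            start = end = total_pages + int(part) + 1
        elif "-" in part:
            # Inclusive range such as '5-10'.
            low, high = part.split("-", 1)
            start, end = int(low), int(high)
        else:
            start = end = int(part)
        if start < 1 or end > total_pages or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        for page in range(start, end + 1):
            if page not in pages:
                pages.append(page)
    return pages


print(parse_pages("1,3,5-10,-1", total_pages=12))
```

For a 12-page document this expands to pages 1, 3, 5 through 10, and 12.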
split_pdf

Split PDF into multiple files based on page ranges.

Args:
    file_path: Path to the source PDF file
    split_ranges: List of page ranges (e.g., ["1-5", "6-10", "11-15"])
    output_dir: Output directory (defaults to source file directory)
    prefix: Output file prefix (defaults to source filename)
    
Returns:
    JSON string with split operation results and output file information
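
As a rough illustration of how a client might invoke this tool, the sketch below builds a JSON-RPC 2.0 `tools/call` request, which is the method MCP clients use to call a server tool. The helper name `build_tool_call` and the path `/tmp/report.pdf` are hypothetical; only the tool name and argument names come from the description above.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build an MCP 'tools/call' request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical input file; split_ranges follows the documented example.
request = build_tool_call(
    "split_pdf",
    {
        "file_path": "/tmp/report.pdf",
        "split_ranges": ["1-5", "6-10", "11-15"],
    },
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library usually builds and sends this message for you; the sketch only shows the shape of the payload.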
extract_pages

Extract specific pages from PDF to a new file.

Args:
    file_path: Path to the source PDF file
    pages: Page range (e.g., "1,3,5-7" for pages 1, 3, and 5 to 7)
    output_file: Output filename (optional, auto-generated if not provided)
    output_dir: Output directory (defaults to source file directory)
    
Returns:
    JSON string with extraction results and output file information
merge_pdfs

Merge multiple PDF files into a single file.

Pages are processed in the order specified in the file_paths list,
preserving the original page sequence in the merged document.

Args:
    file_paths: List of PDF file paths to merge
    output_file: Output filename (optional, auto-generated if not provided)
    output_dir: Output directory (defaults to first file's directory)
    
Returns:
    JSON string with merge results and output file information
ocr_pdf

Perform OCR on PDF pages using Tesseract for scanned documents.

Args:
    file_path: Path to the PDF file
    pages: Page range (e.g., '1,3,5-10,-1' for pages 1, 3, 5 to 10, and last page)
    language: OCR language code (default: 'chi_sim' for simplified Chinese)
    chunk_size: Maximum size of text chunks
    chunk_overlap: Overlap between chunks to preserve context
    dpi: DPI for PDF to image conversion (higher = better quality, slower)
    
Returns:
    JSON string with OCR results and metadata
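
The interaction between chunk_size and chunk_overlap can be sketched as a sliding window. This is an illustrative sketch only: the function name `chunk_text` and the default values are hypothetical, and it assumes sizes are measured in characters, which the description above does not confirm.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Each chunk after the first starts chunk_overlap characters before
    the end of the previous one, so context carries across the boundary.
    Defaults are assumptions, not the server's documented values.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            # The current chunk already reaches the end of the text.
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

With a chunk size of 4 and an overlap of 1, the ten-character string yields 'abcd', 'defg', and 'ghij': each chunk repeats the last character of the previous one.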
pdf_to_images

Convert PDF pages to images.

Args:
    file_path: Path to PDF file
    pages: Page range (e.g., '1,3,5-10,-1' for pages 1, 3, 5 to 10, and last page)
    dpi: Resolution for image conversion (default: 200)
    image_format: Output format ('PNG', 'JPEG', etc.)
    output_dir: Directory to save images (default: auto-generated)
    save_to_disk: Whether to save images to disk or keep in memory
    
Returns:
    JSON string with conversion results and file paths
images_to_pdf

Convert multiple images to a single PDF.

Images are processed in the order specified in the image_paths list,
preserving their sequence in the final PDF document.

Args:
    image_paths: List of image file paths to convert
    output_file: Output PDF file path
    page_size: Page size ('A4', 'Letter', 'Legal', or 'auto')
    quality: JPEG quality for compression (1-100)
    title: PDF document title (optional)
    author: PDF document author (optional)
    
Returns:
    JSON string with conversion results
extract_pdf_images

Extract images from PDF pages.

Args:
    file_path: Path to PDF file
    pages: Page range (e.g., '1,3,5-10,-1' for specific pages)
    min_size: Minimum image size to extract (format: 'WIDTHxHEIGHT', e.g., '100x100')
    output_dir: Directory to save extracted images (default: auto-generated)
    
Returns:
    JSON string with extraction results and file paths
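
The 'WIDTHxHEIGHT' format of min_size can be illustrated with a short parser. This is a sketch under assumptions: the helper names are hypothetical, and it assumes an image qualifies only when both its width and its height meet the minimum, which the description does not state.

```python
def parse_min_size(min_size: str) -> tuple[int, int]:
    """Parse a 'WIDTHxHEIGHT' string such as '100x100' into (width, height)."""
    width_text, separator, height_text = min_size.lower().partition("x")
    if not separator:
        raise ValueError(f"expected WIDTHxHEIGHT, got {min_size!r}")
    width, height = int(width_text), int(height_text)
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return width, height


def is_large_enough(width: int, height: int, min_size: str) -> bool:
    """Assumed rule: both dimensions must reach the minimum."""
    min_width, min_height = parse_min_size(min_size)
    return width >= min_width and height >= min_height


print(parse_min_size("100x100"))
print(is_large_enough(640, 80, "100x100"))
```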
get_pdf_metadata

Read PDF metadata including standard fields and optionally XMP metadata.

Args:
    file_path: Path to PDF file
    include_xmp: Whether to include advanced XMP metadata (default: False)
    
Returns:
    JSON string with comprehensive metadata information
set_pdf_metadata

Write or update PDF metadata fields.

Args:
    file_path: Path to source PDF file
    output_file: Output PDF file path (optional, defaults to overwrite source)
    title: Document title
    author: Document author
    subject: Document subject  
    creator: Creator application name
    producer: Producer application name
    keywords: Keywords or tags (comma-separated)
    preserve_existing: Whether to preserve existing metadata (default: True)
    
Returns:
    JSON string with operation results
remove_pdf_metadata

Remove specific metadata fields or all metadata from PDF.

The fields_to_remove and remove_all parameters are mutually exclusive:
use either fields_to_remove for selective removal OR remove_all for complete removal.

Args:
    file_path: Path to source PDF file
    output_file: Output PDF file path (optional, defaults to overwrite source)
    fields_to_remove: List of specific fields to remove (e.g., ['title', 'author'])
    remove_all: Remove all metadata if True (default: False)
    
Returns:
    JSON string with operation results
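
The mutual-exclusion rule for fields_to_remove and remove_all could be checked on the caller's side before sending a request. The sketch below is hypothetical (the function name and return values are not part of the server), and it additionally assumes that passing neither option is an error, which the description does not say.

```python
from typing import Optional


def validate_removal_args(
    fields_to_remove: Optional[list[str]] = None,
    remove_all: bool = False,
) -> str:
    """Return 'all' or 'selective', rejecting conflicting arguments."""
    if remove_all and fields_to_remove:
        raise ValueError("fields_to_remove and remove_all are mutually exclusive")
    if not remove_all and not fields_to_remove:
        # Assumption: a call that removes nothing is treated as a mistake.
        raise ValueError("specify fields_to_remove or set remove_all=True")
    return "all" if remove_all else "selective"


print(validate_removal_args(fields_to_remove=["title", "author"]))
print(validate_removal_args(remove_all=True))
```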
search_pdf_text

Search for text content across PDF pages with detailed match information.

Args:
    file_path: Path to PDF file
    query: Text to search for (or regex pattern if regex_search=True)
    pages: Page range (e.g., '1,3,5-10,-1') or None for all pages
    case_sensitive: Whether search is case-sensitive (default: False)
    regex_search: Whether to treat query as regex pattern (default: False)
    context_chars: Number of characters to show around matches (default: 100)
    max_matches: Maximum number of matches to return (default: 100)
    
Returns:
    JSON string with search results, match locations, and context
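
To show how the search options above fit together, here is a minimal sketch that searches a plain string rather than a PDF. It is not the server's implementation: the function name `search_text` and the result keys are hypothetical, and the use of Python's `re` module is an assumption. The parameter names and defaults mirror the description.

```python
import re


def search_text(
    text: str,
    query: str,
    case_sensitive: bool = False,
    regex_search: bool = False,
    context_chars: int = 100,
    max_matches: int = 100,
) -> list[dict]:
    """Find matches in text and return each with surrounding context."""
    # A literal query is escaped so regex metacharacters match themselves.
    pattern = query if regex_search else re.escape(query)
    flags = 0 if case_sensitive else re.IGNORECASE
    results: list[dict] = []
    for match in re.finditer(pattern, text, flags):
        if len(results) >= max_matches:
            break
        start, end = match.span()
        results.append({
            "match": match.group(0),
            "start": start,
            "end": end,
            "context": text[max(0, start - context_chars):end + context_chars],
        })
    return results


sample = "The PDF format. A pdf reader."
for hit in search_text(sample, "pdf", context_chars=3):
    print(hit)
```

With the defaults, the query is matched literally and case-insensitively, so both 'PDF' and 'pdf' in the sample are reported.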
extract_page_text

Extract text from a specific PDF page with various extraction options.

Args:
    file_path: Path to PDF file
    page_number: Page number to extract (1-based)
    extraction_mode: Text extraction mode ('default', 'layout', 'simple')
    
Returns:
    JSON string with extracted text and statistics
find_and_highlight_text

Find text and return information for highlighting matches.

Args:
    file_path: Path to PDF file
    query: Text to search for
    pages: Page range (e.g., '1,3,5-10,-1') or None for all pages
    case_sensitive: Whether search is case-sensitive (default: False)
    
Returns:
    JSON string with page highlights and position information
optimize_pdf

Optimize PDF file using various compression techniques.

Args:
    file_path: Path to source PDF file
    output_file: Output PDF file path (optional, defaults to '_optimized' suffix)
    optimization_level: Optimization preset ('light', 'medium', 'heavy', 'maximum')
    
Returns:
    JSON string with optimization results and file size statistics
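
The default output name (an '_optimized' suffix) might be derived as in the sketch below. The helper name is hypothetical, and it assumes the suffix goes before the file extension and that the output stays in the source directory. Neither detail is confirmed by the description.

```python
from pathlib import Path


def default_optimized_path(file_path: str) -> Path:
    """Derive 'name_optimized.pdf' next to the source file (assumed layout)."""
    source = Path(file_path)
    return source.with_name(f"{source.stem}_optimized{source.suffix}")


print(default_optimized_path("/docs/report.pdf"))
```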
compress_pdf_images

Compress images in PDF while preserving document structure.

Args:
    file_path: Path to source PDF file
    output_file: Output PDF file path (optional, auto-generated)
    quality: Image compression quality (1-100, where 100=best quality)
    
Returns:
    JSON string with compression results and statistics
remove_pdf_content

Remove specific content from PDF to reduce file size.

Args:
    file_path: Path to source PDF file
    output_file: Output PDF file path (optional, auto-generated)
    remove_images: Whether to remove all images
    remove_annotations: Whether to remove annotations
    compress_streams: Whether to compress content streams
    
Returns:
    JSON string with content removal results and statistics
analyze_pdf_size

Analyze PDF file to identify optimization opportunities.

Provides detailed size breakdown by content type (text, images, metadata, etc.)
and recommends specific optimization strategies for file size reduction.

Args:
    file_path: Path to PDF file to analyze
    
Returns:
    JSON string with size analysis breakdown and optimization recommendations

Prompts

Interactive templates invoked by user choice

Name | Description

No prompts

Resources

Contextual data attached and managed by the client

Name | Description

No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/lihongwen/pdfreadermcp'
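
The same request can be made from Python with the standard library. This is a minimal sketch: the function names are hypothetical, and it assumes the endpoint returns a JSON body, which the text above does not state.

```python
import json
import urllib.request

API_URL = "https://glama.ai/api/mcp/v1/servers/lihongwen/pdfreadermcp"


def build_request(url: str = API_URL) -> urllib.request.Request:
    """Build the GET request equivalent to the curl command."""
    return urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )


def fetch_server_info(url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server_info()` performs the network request; `build_request()` alone only constructs it.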

If you have feedback or need assistance with the MCP directory API, please join our Discord server.