PDF MCP Server

by volume19
# PDF MCP Server

A Model Context Protocol (MCP) server for processing large PDF files with intelligent chunking and text extraction.

## Features

- **PDF Metadata**: Get file info, page count, author, title, etc.
- **Text Extraction**: Extract text from specific page ranges with character limits
- **PDF Search**: Search within PDFs with contextual results
- **Smart Chunking**: Calculate optimal page ranges for processing large PDFs

## Tools

### 1. `pdf_get_metadata`

Get metadata about a PDF file.

**Parameters:**

- `pdf_path` (string, required): Full path to the PDF file

**Returns:**

- File size, page count, title, author, and other metadata

### 2. `pdf_extract_text`

Extract text from a range of pages.

**Parameters:**

- `pdf_path` (string, required): Full path to the PDF file
- `start_page` (integer, optional): Starting page (1-indexed, default: 1)
- `end_page` (integer, optional): Ending page (default: last page)
- `max_chars` (integer, optional): Maximum characters to extract

**Returns:**

- Extracted text with page markers
- Character count and truncation info

### 3. `pdf_search`

Search for text within a PDF.

**Parameters:**

- `pdf_path` (string, required): Full path to the PDF file
- `query` (string, required): Text to search for (case-insensitive)
- `context_chars` (integer, optional): Context characters around matches (default: 200)
- `max_results` (integer, optional): Maximum results (default: 50)

**Returns:**

- List of matches with page numbers and context

### 4. `pdf_get_chunks`

Calculate optimal chunking strategy for large PDFs.

**Parameters:**

- `pdf_path` (string, required): Full path to the PDF file
- `max_chars_per_chunk` (integer, optional): Target chunk size (default: 50000)
- `overlap_pages` (integer, optional): Page overlap between chunks (default: 1)

**Returns:**

- List of chunks with page ranges and estimated character counts

## Installation

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Configure in Claude Code (see Configuration section)

## Configuration

Add to your Claude Code MCP settings (`%APPDATA%\Claude\claude_desktop_config.json` on Windows):

```json
{
  "mcpServers": {
    "pdf-processor": {
      "command": "python",
      "args": ["c:\\Users\\Will\\pdf-mcp-server\\server.py"]
    }
  }
}
```

After configuration, restart Claude Code to load the MCP server.

## Usage Examples

### Processing a 55MB PDF

1. First, get metadata:

   ```
   Use pdf_get_metadata to check the page count
   ```

2. Calculate chunks:

   ```
   Use pdf_get_chunks to determine optimal page ranges
   ```

3. Extract text by chunk:

   ```
   Use pdf_extract_text with the page ranges from step 2
   ```

4. Search across the PDF:

   ```
   Use pdf_search to find specific content
   ```

## Technical Details

- Uses `pdfplumber` for high-quality text extraction
- Uses `pypdf` for metadata and PDF structure
- Runs locally using your compute resources
- No file size limits (processes in chunks)
- Handles encrypted PDFs (if not password-protected)

## Troubleshooting

**Server not appearing in Claude Code:**

- Check that the path in config is correct
- Restart Claude Code after configuration changes
- Check Python is accessible from command line

**Extraction issues:**

- Scanned PDFs may have poor text extraction (OCR not yet implemented)
- Some PDFs may have unusual encoding
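
## Example: Estimating Chunk Ranges

The README does not show how `pdf_get_chunks` computes its page ranges, so the following is only a minimal sketch of one plausible approach, not the server's actual implementation. It assumes a per-page character estimate is already available (for example, from a quick pass over the file). The function name `plan_chunks` and the `page_chars` input are hypothetical. The `max_chars_per_chunk` and `overlap_pages` parameters and their defaults mirror the tool description above.

```python
from typing import Dict, List


def plan_chunks(
    page_chars: List[int],
    max_chars_per_chunk: int = 50000,
    overlap_pages: int = 1,
) -> List[Dict[str, int]]:
    """Group pages into chunks that stay within a character budget.

    page_chars[i] is the estimated character count of page i + 1.
    This is an illustrative greedy strategy, not the server's own code.
    """
    if max_chars_per_chunk <= 0:
        raise ValueError("max_chars_per_chunk must be positive")
    if overlap_pages < 0:
        raise ValueError("overlap_pages must not be negative")

    chunks: List[Dict[str, int]] = []
    total = len(page_chars)
    start = 0  # 0-based index of the first page in the current chunk

    while start < total:
        end = start
        size = page_chars[start]
        # Always take at least one page, then add pages while under budget.
        while end + 1 < total and size + page_chars[end + 1] <= max_chars_per_chunk:
            end += 1
            size += page_chars[end]

        chunks.append(
            {
                "start_page": start + 1,  # report 1-indexed pages
                "end_page": end + 1,
                "estimated_chars": size,
            }
        )

        if end == total - 1:
            break
        # Step back by the overlap, but always advance by at least one page
        # so the loop terminates even when the overlap exceeds the chunk length.
        start = max(end + 1 - overlap_pages, start + 1)

    return chunks
```

For ten pages of roughly 10,000 characters each, `plan_chunks([10000] * 10, max_chars_per_chunk=30000)` yields the page ranges 1-3, 3-5, 5-7, 7-9, and 9-10. Each range could then be passed as `start_page` and `end_page` to `pdf_extract_text`. A page that exceeds the budget on its own still gets a chunk of its own here; using `max_chars` to truncate such a page is left to the caller.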
