The MCP PDF Reader server enables AI assistants to extract, search, and analyze content from PDF documents through the Model Context Protocol (MCP).
Core Capabilities:
Extract full text content from entire PDFs with optional metadata inclusion
Read specific pages or page ranges for targeted content extraction
Retrieve document metadata including title, author, creation/modification dates, keywords, and page count
Search for text within PDFs with case-sensitive/insensitive options, returning matches with surrounding context
Get page count to quickly determine document size
List embedded images with metadata (page location, dimensions, type, name, index)
Extract individual images as Base64-encoded data for processing or saving
Integrate with AI assistants like Claude Desktop and GitHub Copilot
Test functionality using the MCP Inspector for debugging and validation
Limitations:
⚠️ No OCR support - only works with text-based PDFs containing selectable/embedded text (not scanned documents)
⚠️ Read-only - cannot edit, create, or modify PDF files
⚠️ Standard image formats only - supports JPEG, PNG, and TIFF extraction
The server also lets GitHub Copilot analyze PDF files through standardized tools for searching content, reading specific page ranges, and extracting embedded images.
MCP PDF Reader
Available Languages: 🇬🇧 English | 🇪🇸 Español
A powerful Model Context Protocol (MCP) server that empowers AI assistants like Claude and GitHub Copilot to intelligently interact with PDF documents. Extract text, metadata, search content, and retrieve embedded images—all through a standardized, LLM-friendly interface. Not OCR-based.
Current Version: 1.0.0
Package: @rturv/mcp-pdf-reader
License: MIT
Quick Start
Installation
Run the Server
Configuration
Claude Desktop
Add to claude_desktop_config.json:
Location:
Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/claude/claude_desktop_config.json
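As a rough illustration of what a Claude Desktop entry might look like, the sketch below resolves the config path for each platform listed above and merges a server entry into an existing `mcpServers` object. The server key `pdf-reader` and the `npx -y @rturv/mcp-pdf-reader` launch command are assumptions made for this example; CLAUDE_CONFIG.md remains the authoritative source.

```typescript
// Sketch only: the key "pdf-reader" and the npx launch command are assumptions,
// not values taken from the project's configuration guide.
type Platform = "win32" | "darwin" | "linux";

interface McpServerEntry {
  command: string;
  args: string[];
}

// Config file locations as listed in this README.
function claudeConfigPath(platform: Platform, home: string, appData: string): string {
  switch (platform) {
    case "win32":
      return `${appData}\\Claude\\claude_desktop_config.json`;
    case "darwin":
      return `${home}/Library/Application Support/Claude/claude_desktop_config.json`;
    case "linux":
      return `${home}/.config/claude/claude_desktop_config.json`;
  }
}

// Merge the PDF reader entry into a config without dropping other servers.
function withPdfReader(config: { mcpServers?: Record<string, McpServerEntry> }) {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      "pdf-reader": { command: "npx", args: ["-y", "@rturv/mcp-pdf-reader"] },
    },
  };
}

// Print the JSON that would be written to claude_desktop_config.json.
console.log(JSON.stringify(withPdfReader({}), null, 2));
```

Merging rather than overwriting keeps any servers already registered in the file.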
GitHub Copilot (VS Code)
Add to mcpServers.json:
Location: %APPDATA%\Code\User\globalStorage\github.copilot-chat\mcpServers.json
See COPILOT_CONFIG.md for additional installation methods.
Features
✅ Full Text Extraction - Extract complete text from PDF files
✅ Metadata Extraction - Retrieve title, author, creation date, and more
✅ Page Range Reading - Extract text from specific pages
✅ Text Search - Find text with surrounding context
✅ Page Count - Get total page count
✅ Image Extraction - List and extract embedded images in Base64
✅ Standards Compliant - Follows MCP specification for seamless LLM integration
Tools Reference
This MCP server exposes 7 tools for reading and inspecting PDF files. All tools are accessible through Claude Desktop, GitHub Copilot, and other MCP-compatible clients.
1. read_pdf
Purpose: Extract all text content from a PDF file. Use this as your primary method for understanding PDF document content.
When to use:
Reading entire document content
Getting full document text for summarization or analysis
Extracting content when combined with metadata
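For orientation, here is a minimal sketch of how a client might frame a `read_pdf` call as an MCP `tools/call` request. The JSON-RPC envelope follows the MCP specification. The `filePath` and `includeMetadata` parameter names are taken from the Inspector examples in this README, and the helper `buildToolCall` is a hypothetical name introduced for illustration.

```typescript
// JSON-RPC 2.0 envelope that MCP uses for tool invocation ("tools/call").
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in the envelope.
function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Parameter names follow the Inspector examples; the path is a placeholder.
const request = buildToolCall(1, "read_pdf", {
  filePath: "C:/path/to/your/document.pdf",
  includeMetadata: true,
});

console.log(JSON.stringify(request, null, 2));
```

MCP clients such as Claude Desktop build this envelope themselves, so the sketch is mainly useful when scripting or debugging against the server directly.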
Input Parameters:
Example Request:
Example Response:
2. get_pdf_metadata
Purpose: Extract document metadata without reading the full text. Ideal for quick document inspection.
When to use:
Identifying document properties (author, title, dates)
Quick document validation
Building document catalogs
Checking modification dates
Input Parameters:
Example Request:
Example Response:
3. read_pdf_pages
Purpose: Extract text from a specific page or range of pages. Use this for targeted content extraction.
When to use:
Reading specific sections of a document
Analyzing particular chapters or pages
Extracting cover pages or specific reports within a multi-part document
Handling large PDFs by reading sections
Input Parameters:
Example Request (single page):
Example Request (page range):
Example Response:
4. search_pdf
Purpose: Search for text within a PDF and retrieve all matches with surrounding context.
When to use:
Finding specific terms or phrases
Locating sections by keyword
Validating content presence
Building keyword-based summaries
Compliance checking (finding specific clauses)
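To make "matches with surrounding context" concrete, the sketch below shows one simple way such a search could work on already-extracted text. It is an illustration of the idea, not the server's implementation; the function name, the `radius` parameter, and the result shape are assumptions.

```typescript
// Illustrative result shape; the server's real response format may differ.
interface Match {
  index: number;   // character offset of the match in the text
  context: string; // the match plus up to `radius` characters on each side
}

function searchWithContext(text: string, term: string, caseSensitive = false, radius = 40): Match[] {
  if (term.length === 0) return [];
  // Lowercasing is assumed to preserve string length, which holds for ASCII text.
  const haystack = caseSensitive ? text : text.toLowerCase();
  const needle = caseSensitive ? term : term.toLowerCase();
  const matches: Match[] = [];
  let from = 0;
  while (true) {
    const index = haystack.indexOf(needle, from);
    if (index === -1) break;
    const start = Math.max(0, index - radius);
    const end = Math.min(text.length, index + needle.length + radius);
    matches.push({ index, context: text.slice(start, end) });
    from = index + needle.length; // continue after this match (non-overlapping)
  }
  return matches;
}

const hits = searchWithContext("Payment due. payment late.", "payment");
console.log(hits.length);
```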
Input Parameters:
Example Request:
Example Response:
5. get_pdf_page_count
Purpose: Get the total number of pages in a PDF without reading content.
When to use:
Validating PDF integrity
Determining if a PDF is empty
Planning page range extractions
Batch processing logic based on document size
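As an example of planning page range extractions, the following sketch splits a document into fixed-size page ranges once the page count is known. It assumes pages are numbered from 1 and that ranges are inclusive. The function name and the chunk size are illustrative and not part of the server.

```typescript
// Split pages 1..pageCount into inclusive [start, end] ranges of at most chunkSize pages.
// Assumes 1-based page numbering.
function planPageRanges(pageCount: number, chunkSize: number): Array<[number, number]> {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const ranges: Array<[number, number]> = [];
  for (let start = 1; start <= pageCount; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize - 1, pageCount)]);
  }
  return ranges;
}

// A 23-page document read 10 pages at a time.
console.log(JSON.stringify(planPageRanges(23, 10))); // [[1,10],[11,20],[21,23]]
```

Each resulting range could then be passed to read_pdf_pages in turn, which keeps individual responses small for large documents.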
Input Parameters:
Example Request:
Example Response:
6. list_pdf_images
Purpose: List all images embedded in a PDF with their metadata and locations.
When to use:
Discovering embedded images before extraction
Getting image dimensions and types
Planning image extraction operations
Validating image content presence
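A small sketch of how a client might use the listing to decide which image to extract. The `ImageInfo` field names are assumptions based on the metadata this README says is returned (page location, dimensions, type, index); the actual response shape may differ. Only `imageIndex` and `filePath` are taken from the Inspector examples.

```typescript
// Hypothetical shape of one list_pdf_images entry; field names are assumptions.
interface ImageInfo {
  index: number;
  page: number;
  width: number;
  height: number;
  type: string;
}

// Pick the image with the largest pixel area; returns undefined for an empty list.
function largestImage(images: ImageInfo[]): ImageInfo | undefined {
  return images.reduce<ImageInfo | undefined>(
    (best, img) => (!best || img.width * img.height > best.width * best.height ? img : best),
    undefined,
  );
}

// Made-up sample data standing in for a list_pdf_images result.
const listed: ImageInfo[] = [
  { index: 0, page: 1, width: 64, height: 64, type: "png" },
  { index: 1, page: 3, width: 800, height: 600, type: "jpeg" },
];

const chosen = largestImage(listed);
// Arguments for a follow-up extract_pdf_image call; the path is a placeholder.
const extractArgs = chosen
  ? { filePath: "C:/path/to/your/document.pdf", imageIndex: chosen.index }
  : undefined;
console.log(extractArgs);
```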
Input Parameters:
Example Request:
Example Response:
7. extract_pdf_image
Purpose: Extract a specific image from a PDF and return it as Base64-encoded data.
When to use:
Recovering images from PDFs
Converting PDF images to standard formats
Processing visual content for analysis
Archiving embedded images
Before using: Call list_pdf_images first to discover available images and their indices.
Input Parameters:
Example Request:
Example Response:
Note: Image data is Base64 encoded. Decode it to save as a file (e.g., using atob() in JavaScript or base64 -d in bash).
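The decoding step can be sketched as follows, using the standard `atob()` global (available in browsers and in Node.js 16 and later). The helper names are illustrative, and the sample string encodes only the 8-byte PNG signature rather than a real image returned by the server.

```typescript
// Decode a Base64 string into raw bytes.
function base64ToBytes(base64: string): Uint8Array {
  const binary = atob(base64); // each character holds one byte value (0-255)
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// PNG files begin with this fixed 8-byte signature.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Check whether decoded bytes start with the PNG signature.
function isPng(bytes: Uint8Array): boolean {
  return PNG_SIGNATURE.every((b, i) => bytes[i] === b);
}

const sample = "iVBORw0KGgo="; // Base64 of the PNG signature only
console.log(isPng(base64ToBytes(sample))); // true
```

Once decoded, the bytes can be written to disk with whatever file API the host environment provides.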
Testing with MCP Inspector
The MCP Inspector is an interactive tool for testing and debugging MCP servers. It provides a web-based UI to invoke tools and inspect responses in real-time.
Installation
Running the Server with Inspector
Start the MCP server in debug mode:
mcp-pdf-reader
In a separate terminal, launch the Inspector:
mcp-inspector node dist/index.js
Or, if using the global npm package:
mcp-inspector npx @rturv/mcp-pdf-reader
Open the web UI: The Inspector will provide a URL (typically http://localhost:5173). Open it in your browser.
Using the Inspector
Example: Extract text from a PDF
In the Inspector UI, find the read_pdf tool in the left sidebar
Click on it to expand the tool interface
Enter the parameters:
{ "filePath": "C:/path/to/your/document.pdf", "includeMetadata": true }
Click "Call Tool"
View the response in the right panel
Example: Search for text
Select the search_pdf tool
Enter parameters:
{ "filePath": "C:/path/to/your/document.pdf", "searchTerm": "important keyword", "caseSensitive": false }
Click "Call Tool" and review the search results with context
Example: Extract an image
First, call list_pdf_images to discover images:
{ "filePath": "C:/path/to/your/document.pdf" }
Note the index of the image you want (e.g., index: 0)
Call extract_pdf_image with that index:
{ "filePath": "C:/path/to/your/document.pdf", "imageIndex": 0 }
The response will include Base64-encoded image data ready for decoding and saving
Troubleshooting Inspector Issues
Port already in use: Change the port with mcp-inspector --port 5174
Connection refused: Ensure the MCP server is running before starting the Inspector
Tool not appearing: Verify the tool definition in src/index.ts and rebuild with npm run build
Development
Build
Watch Mode
Run Tests
Note: Tests require a sample PDF at test-files/sample.pdf. Create one or skip PDF-dependent tests.
Watch Tests
Project Structure
Technology Stack
@modelcontextprotocol/sdk (^1.25.2) - MCP protocol implementation
pdf-parse (^2.4.5) - PDF text extraction
pdf-lib (^1.17.1) - PDF image extraction
TypeScript (^5.9.3) - Type-safe development
Jest (^29.7.0) - Unit testing
Limitations
No OCR: Only extracts selectable text from PDFs (not scanned images)
Text-based PDFs: Works best with PDFs containing embedded text. Scanned documents without OCR cannot be read
Image extraction: Standard formats only (JPEG, PNG, TIFF)
Base64 encoding: All images are returned as Base64 strings; large images may result in large responses
No PDF modification: This server is read-only; it cannot edit or create PDFs
Configuration Files
CLAUDE_CONFIG.md - Claude Desktop configuration guide
COPILOT_CONFIG.md - GitHub Copilot setup instructions
License
MIT - See LICENSE file for details