What can you do with this server?

The MCP PDF Reader server enables AI assistants to extract, search, and analyze content from PDF documents through the Model Context Protocol (MCP).

Core Capabilities:

* Extract full text content from entire PDFs with optional metadata inclusion
* Read specific pages or page ranges for targeted content extraction
* Retrieve document metadata including title, author, creation/modification dates, keywords, and page count
* Search for text within PDFs with case-sensitive/insensitive options, returning matches with surrounding context
* Get page count to quickly determine document size
* List embedded images with metadata (page location, dimensions, type, name, index)
* Extract individual images as Base64-encoded data for processing or saving
* Integrate with AI assistants like Claude Desktop and GitHub Copilot
* Test functionality using the MCP Inspector for debugging and validation

Limitations:

* ⚠️ No OCR support - only works with text-based PDFs containing selectable/embedded text (not scanned documents)
* ⚠️ Read-only - cannot edit, create, or modify PDF files
* ⚠️ Standard image formats only - supports JPEG, PNG, and TIFF extraction

Which integrations are available for this server?

Enables GitHub Copilot to analyze PDF files through standardized tools for searching content, reading specific page ranges, and extracting embedded visual assets.

How do I use MCP PDF Reader?

1. Click on "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@MCP PDF Reader summarize pages 1 to 3 of C:/Users/Documents/contract.pdf"

That's it! The server will respond to your query, and you can continue using it as needed.

MCP PDF Reader

Available Languages: 🇬🇧 English | 🇪🇸 Español

A Model Context Protocol (MCP) server that lets AI assistants like Claude and GitHub Copilot work with PDF documents: extract text and metadata, search content, and retrieve embedded images through a standardized, LLM-friendly interface. It does not perform OCR.

Current Version: 1.0.0
Package: @rturv/mcp-pdf-reader
License: MIT

Quick Start

Installation

npm install -g @rturv/mcp-pdf-reader

Run the Server

mcp-pdf-reader

Configuration

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "pdf-reader": {
      "command": "mcp-pdf-reader"
    }
  }
}

Location:

Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/claude/claude_desktop_config.json
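
If the config file already lists other servers, the new entry should be merged in rather than pasted over the existing content. The following is a minimal sketch of that merge as a pure function; the helper name `addPdfReader` and the `ClaudeDesktopConfig` type are hypothetical and only model the `mcpServers` key shown above.

```typescript
// Shape of a single server entry, limited to the fields shown in this README.
interface McpServerEntry {
  command: string;
  args?: string[];
}

// Hypothetical model of claude_desktop_config.json; unknown keys are preserved.
interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Returns a new config with the pdf-reader entry added, leaving other servers intact.
function addPdfReader(config: ClaudeDesktopConfig): ClaudeDesktopConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      "pdf-reader": { command: "mcp-pdf-reader" },
    },
  };
}

// Example: an existing config with one unrelated (made-up) server.
const existingConfig: ClaudeDesktopConfig = {
  mcpServers: { "other-server": { command: "other-server" } },
};
console.log(JSON.stringify(addPdfReader(existingConfig), null, 2));
```

Reading and writing the file itself is left out on purpose; back up the original before overwriting it.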

GitHub Copilot (VS Code)

Add to mcpServers.json:

{
  "mcpServers": {
    "pdf-reader": {
      "command": "mcp-pdf-reader",
      "args": [],
      "disabled": false
    }
  }
}

Location: %APPDATA%\Code\User\globalStorage\github.copilot-chat\mcpServers.json

See COPILOT_CONFIG.md for additional installation methods.

Features

✅ Full Text Extraction - Extract complete text from PDF files
✅ Metadata Extraction - Retrieve title, author, creation date, and more
✅ Page Range Reading - Extract text from specific pages
✅ Text Search - Find text with surrounding context
✅ Page Count - Get total page count
✅ Image Extraction - List and extract embedded images in Base64
✅ Standards Compliant - Follows MCP specification for seamless LLM integration

Tools Reference

This MCP server exposes 7 read-only tools for extracting text, metadata, and images from PDF files. All tools are accessible through Claude Desktop, GitHub Copilot, and other MCP-compatible clients.
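
MCP clients invoke tools by sending a JSON-RPC 2.0 request with the method `tools/call`, passing the tool name and its arguments. The sketch below builds such a message for the tools listed in this reference. The helper `buildToolCall` and the type names are hypothetical; in practice an MCP client library constructs this envelope for you.

```typescript
// Tool names as listed in this reference.
type PdfToolName =
  | "read_pdf"
  | "get_pdf_metadata"
  | "read_pdf_pages"
  | "search_pdf"
  | "get_pdf_page_count"
  | "list_pdf_images"
  | "extract_pdf_image";

// JSON-RPC 2.0 envelope for an MCP tools/call request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: PdfToolName; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in the envelope.
function buildToolCall(
  id: number,
  name: PdfToolName,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const pageCountCall = buildToolCall(1, "get_pdf_page_count", {
  filePath: "C:/Documents/manual.pdf",
});
console.log(JSON.stringify(pageCountCall, null, 2));
```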

1. `read_pdf`

Purpose: Extract all text content from a PDF file. Use this as your primary method for understanding PDF document content.

When to use:

Reading entire document content
Getting full document text for summarization or analysis
Retrieving text and metadata in a single call

Input Parameters:

{
  "filePath": "string (required) - Absolute path to the PDF file",
  "includeMetadata": "boolean (optional, default: false) - Include PDF metadata in response"
}

Example Request:

{ "filePath": "C:/Documents/report.pdf", "includeMetadata": true }

Example Response:

{
  "text": "Executive Summary\n\nThis report details Q4 2025 performance...",
  "metadata": {
    "title": "Q4 2025 Performance Report",
    "author": "Analytics Team",
    "subject": "Quarterly Report",
    "creator": "Microsoft Word",
    "producer": "iLovePDF",
    "creationDate": "D:20250115120000Z",
    "modificationDate": "D:20250115150000Z",
    "keywords": "Q4, report, performance",
    "totalPages": 12
  },
  "pageCount": 12
}
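
An MCP tool result arrives wrapped in a `content` array rather than as a bare object. The sketch below unwraps it under one assumption that this README does not confirm: that the server returns the JSON shown above as a string in the first `text` content item. The names `ToolResult`, `ReadPdfPayload`, and `parseReadPdfResult` are hypothetical.

```typescript
// Minimal model of an MCP tool result: a list of content items plus an error flag.
interface ContentItem {
  type: string;
  text?: string;
}
interface ToolResult {
  content: ContentItem[];
  isError?: boolean;
}

// Payload fields taken from the example response above.
interface ReadPdfPayload {
  text: string;
  pageCount: number;
  metadata?: Record<string, unknown>;
}

// Assumption: the payload is JSON-encoded in the first text content item.
function parseReadPdfResult(result: ToolResult): ReadPdfPayload {
  if (result.isError) {
    throw new Error("Tool call reported an error");
  }
  const item = result.content.find((c) => c.type === "text" && typeof c.text === "string");
  if (!item || item.text === undefined) {
    throw new Error("No text content in tool result");
  }
  return JSON.parse(item.text) as ReadPdfPayload;
}

// Example with a hand-built result object.
const sampleResult: ToolResult = {
  content: [{ type: "text", text: JSON.stringify({ text: "Executive Summary", pageCount: 12 }) }],
};
console.log(parseReadPdfResult(sampleResult).pageCount);
```

If the server returns plain text instead of JSON, `JSON.parse` will throw, so check the actual response shape in the MCP Inspector first.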

2. `get_pdf_metadata`

Purpose: Extract document metadata without reading the full text. Ideal for quick document inspection.

When to use:

Identifying document properties (author, title, dates)
Quick document validation
Building document catalogs
Checking modification dates

Input Parameters:

{ "filePath": "string (required) - Absolute path to the PDF file" }

Example Request:

{ "filePath": "C:/Documents/contract.pdf" }

Example Response:

{
  "title": "Service Agreement 2025",
  "author": "Legal Department",
  "subject": "Service Terms & Conditions",
  "creator": "Adobe InDesign",
  "producer": "Adobe PDF Library",
  "creationDate": "D:20250101090000Z",
  "modificationDate": "D:20250110140000Z",
  "keywords": "service, agreement, contract",
  "totalPages": 8
}
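
The `creationDate` and `modificationDate` values in the example use the PDF date string format (`D:YYYYMMDDHHmmSS` followed by `Z` or a `+HH'mm'` / `-HH'mm'` offset). Here is a minimal sketch of converting such a string to a JavaScript `Date`. The function `parsePdfDate` is hypothetical, and treating a missing timezone as UTC is an assumption made for this example.

```typescript
// Parses a PDF date string such as "D:20250115120000Z" or "D:20250101090000+02'00'".
// Returns null when the string does not match the expected layout.
function parsePdfDate(value: string): Date | null {
  const pattern =
    /^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?(?:[Zz]|([+-])(\d{2})'?(?:(\d{2})'?)?)?$/;
  const m = pattern.exec(value);
  if (!m) return null;

  // Fields after the year are optional; fall back to the PDF defaults.
  const num = (s: string | undefined, fallback: number): number =>
    s === undefined ? fallback : Number(s);
  const year = Number(m[1]);
  const month = num(m[2], 1);
  const day = num(m[3], 1);
  const hour = num(m[4], 0);
  const minute = num(m[5], 0);
  const second = num(m[6], 0);

  let utcMs = Date.UTC(year, month - 1, day, hour, minute, second);

  // Local time = UTC + offset, so subtract a positive offset to get UTC.
  const sign = m[7];
  if (sign === "+" || sign === "-") {
    const offsetMinutes = num(m[8], 0) * 60 + num(m[9], 0);
    utcMs -= (sign === "+" ? 1 : -1) * offsetMinutes * 60_000;
  }
  // Assumption: no timezone marker is treated as UTC.
  return new Date(utcMs);
}

console.log(parsePdfDate("D:20250115120000Z")?.toISOString());
// → 2025-01-15T12:00:00.000Z
```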

3. `read_pdf_pages`

Purpose: Extract text from a specific page or range of pages. Use this for targeted content extraction.

When to use:

Reading specific sections of a document
Analyzing particular chapters or pages
Extracting cover pages or specific reports within a multi-part document
Handling large PDFs by reading sections

Input Parameters:

{
  "filePath": "string (required) - Absolute path to the PDF file",
  "startPage": "number (required) - Starting page number (1-indexed)",
  "endPage": "number (optional) - Ending page number. If omitted, defaults to startPage"
}

Example Request (single page):

{ "filePath": "C:/Documents/thesis.pdf", "startPage": 1 }

Example Request (page range):

{ "filePath": "C:/Documents/thesis.pdf", "startPage": 5, "endPage": 12 }

Example Response:

{
  "text": "Chapter 2: Literature Review\n\nThis chapter examines existing research...",
  "startPage": 5,
  "endPage": 12,
  "totalPages": 45
}
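
A client can validate a page range before sending the request. The sketch below applies the documented rules (1-indexed pages, `endPage` defaulting to `startPage`) and adds two checks that are assumptions, since this README does not say how the server handles them: reversed ranges and pages beyond `totalPages`. The name `normalizePageRange` is hypothetical.

```typescript
interface PageRange {
  startPage: number;
  endPage: number;
}

// Resolves the optional endPage and rejects ranges that cannot be valid.
function normalizePageRange(totalPages: number, startPage: number, endPage?: number): PageRange {
  const end = endPage ?? startPage; // documented default: endPage = startPage
  if (!Number.isInteger(startPage) || !Number.isInteger(end)) {
    throw new RangeError("Page numbers must be integers");
  }
  if (startPage < 1) {
    throw new RangeError("startPage is 1-indexed and must be >= 1");
  }
  // Assumption: the client rejects reversed or out-of-bounds ranges up front.
  if (end < startPage) {
    throw new RangeError("endPage must be >= startPage");
  }
  if (end > totalPages) {
    throw new RangeError(`endPage ${end} exceeds totalPages ${totalPages}`);
  }
  return { startPage, endPage: end };
}

console.log(normalizePageRange(45, 5, 12)); // { startPage: 5, endPage: 12 }
console.log(normalizePageRange(45, 1)); // { startPage: 1, endPage: 1 }
```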

4. `search_pdf`

Purpose: Search for text within a PDF and retrieve all matches with surrounding context.

When to use:

Finding specific terms or phrases
Locating sections by keyword
Validating content presence
Building keyword-based summaries
Compliance checking (finding specific clauses)

Input Parameters:

{
  "filePath": "string (required) - Absolute path to the PDF file",
  "searchTerm": "string (required) - Text to search for",
  "caseSensitive": "boolean (optional, default: false) - Case-sensitive search"
}

Example Request:

{ "filePath": "C:/Documents/policy.pdf", "searchTerm": "termination clause", "caseSensitive": false }

Example Response:

[
  {
    "page": 3,
    "text": "termination clause",
    "context": "...either party may invoke the termination clause without prior written notice...",
    "position": 456
  },
  {
    "page": 7,
    "text": "termination clause",
    "context": "...In accordance with Section 4.2, the termination clause becomes effective...",
    "position": 1230
  }
]
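
Search results can be summarized on the client side, for instance to decide which pages to read next with `read_pdf_pages`. A small sketch, assuming the response has the shape shown above; `SearchMatch` and `countMatchesByPage` are hypothetical names.

```typescript
// One entry of the search_pdf response, as shown in the example above.
interface SearchMatch {
  page: number;
  text: string;
  context: string;
  position: number;
}

// Counts how many matches fall on each page.
function countMatchesByPage(matches: SearchMatch[]): Map<number, number> {
  const counts = new Map<number, number>();
  for (const match of matches) {
    counts.set(match.page, (counts.get(match.page) ?? 0) + 1);
  }
  return counts;
}

const exampleMatches: SearchMatch[] = [
  { page: 3, text: "termination clause", context: "...invoke the termination clause...", position: 456 },
  { page: 7, text: "termination clause", context: "...the termination clause becomes...", position: 1230 },
];

// Pages that contain at least one match, in ascending order.
const pagesToRead = [...countMatchesByPage(exampleMatches).keys()].sort((a, b) => a - b);
console.log(pagesToRead); // [ 3, 7 ]
```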

5. `get_pdf_page_count`

Purpose: Get the total number of pages in a PDF without reading content.

When to use:

Validating PDF integrity
Determining if a PDF is empty
Planning page range extractions
Batch processing logic based on document size

Input Parameters:

{ "filePath": "string (required) - Absolute path to the PDF file" }

Example Request:

{ "filePath": "C:/Documents/manual.pdf" }

Example Response:

{ "pageCount": 247 }
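
The page count is enough to plan a series of `read_pdf_pages` calls for a large document. The following sketch splits a document into fixed-size page ranges; `planPageChunks` and the chunk size are illustrative choices, not part of the server.

```typescript
interface PageChunk {
  startPage: number;
  endPage: number;
}

// Splits pages 1..pageCount into consecutive ranges of at most chunkSize pages.
function planPageChunks(pageCount: number, chunkSize: number): PageChunk[] {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: PageChunk[] = [];
  for (let start = 1; start <= pageCount; start += chunkSize) {
    chunks.push({ startPage: start, endPage: Math.min(start + chunkSize - 1, pageCount) });
  }
  return chunks;
}

console.log(planPageChunks(247, 100));
// [ { startPage: 1, endPage: 100 },
//   { startPage: 101, endPage: 200 },
//   { startPage: 201, endPage: 247 } ]
```

Each chunk maps directly onto the `startPage` and `endPage` parameters of `read_pdf_pages`.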

6. `list_pdf_images`

Purpose: List all images embedded in a PDF with their metadata and locations.

When to use:

Discovering embedded images before extraction
Getting image dimensions and types
Planning image extraction operations
Validating image content presence

Input Parameters:

{ "filePath": "string (required) - Absolute path to the PDF file" }

Example Request:

{ "filePath": "C:/Documents/presentation.pdf" }

Example Response:

{
  "images": [
    { "index": 0, "page": 2, "name": "Image42", "width": 800, "height": 600, "type": "JPEG" },
    { "index": 1, "page": 5, "name": "Image43", "width": 1024, "height": 768, "type": "PNG" },
    { "index": 2, "page": 8, "name": "Image44", "width": 640, "height": 480, "type": "TIFF" }
  ]
}
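
The listing can be filtered before any image data is requested, which keeps responses small. This sketch selects images above a pixel threshold and produces the arguments for `extract_pdf_image`; the helper name and the threshold are hypothetical.

```typescript
// One entry of the list_pdf_images response, as shown in the example above.
interface PdfImageInfo {
  index: number;
  page: number;
  name: string;
  width: number;
  height: number;
  type: string;
}

// Arguments accepted by extract_pdf_image.
interface ExtractImageArgs {
  filePath: string;
  imageIndex: number;
}

// Keeps images with at least minPixels pixels and maps them to extraction arguments.
function extractArgsForLargeImages(
  filePath: string,
  images: PdfImageInfo[],
  minPixels: number
): ExtractImageArgs[] {
  return images
    .filter((img) => img.width * img.height >= minPixels)
    .map((img) => ({ filePath, imageIndex: img.index }));
}

const listedImages: PdfImageInfo[] = [
  { index: 0, page: 2, name: "Image42", width: 800, height: 600, type: "JPEG" },
  { index: 1, page: 5, name: "Image43", width: 1024, height: 768, type: "PNG" },
  { index: 2, page: 8, name: "Image44", width: 640, height: 480, type: "TIFF" },
];
console.log(extractArgsForLargeImages("C:/Documents/presentation.pdf", listedImages, 400_000));
// Selects indices 0 (480,000 px) and 1 (786,432 px); index 2 (307,200 px) is skipped.
```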

7. `extract_pdf_image`

Purpose: Extract a specific image from a PDF and return it as Base64-encoded data.

When to use:

Recovering images from PDFs
Converting PDF images to standard formats
Processing visual content for analysis
Archiving embedded images

Before using: Call list_pdf_images first to discover available images and their indices.

Input Parameters:

{
  "filePath": "string (required) - Absolute path to the PDF file",
  "imageIndex": "number (required) - Image index from list_pdf_images (0-indexed)"
}

Example Request:

{ "filePath": "C:/Documents/presentation.pdf", "imageIndex": 0 }

Example Response:

{
  "index": 0,
  "page": 2,
  "name": "Image42",
  "width": 800,
  "height": 600,
  "type": "JPEG",
  "data": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=="
}

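Note: Image data is Base64-encoded. Decode it to save as a file (e.g., using atob() in JavaScript or base64 -d in bash). The data value in the example above is only a placeholder: it begins with a PNG signature even though the example describes a JPEG, so do not rely on it matching the type field.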
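
The decoding step can be sketched as follows, using `atob()` and a check of the leading bytes to confirm the format before saving. This assumes the `data` field holds a complete image file (header included), which this README does not state explicitly; `decodeBase64` and `sniffImageType` are hypothetical helpers.

```typescript
// Decodes a Base64 string into raw bytes using the global atob().
function decodeBase64(data: string): Uint8Array {
  const binary = atob(data);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Identifies the image format from its leading signature bytes.
function sniffImageType(bytes: Uint8Array): "JPEG" | "PNG" | "TIFF" | "unknown" {
  const startsWith = (signature: number[]): boolean =>
    signature.every((byte, i) => bytes[i] === byte);

  if (startsWith([0xff, 0xd8, 0xff])) return "JPEG";
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "PNG";
  // TIFF has little-endian ("II") and big-endian ("MM") variants.
  if (startsWith([0x49, 0x49, 0x2a, 0x00]) || startsWith([0x4d, 0x4d, 0x00, 0x2a])) return "TIFF";
  return "unknown";
}

// "iVBORw0KGgo=" is the Base64 form of the 8-byte PNG signature.
console.log(sniffImageType(decodeBase64("iVBORw0KGgo="))); // PNG
```

In Node.js the resulting bytes can be passed to `fs.writeFileSync` to save the image, with the file extension chosen from the detected type.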

Testing with MCP Inspector

The MCP Inspector is an interactive tool for testing and debugging MCP servers. It provides a web-based UI to invoke tools and inspect responses in real-time.

Installation

npm install -g @modelcontextprotocol/inspector

Running the Server with Inspector

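1. Launch the Inspector and pass it the command that starts the server. The Inspector spawns the server itself over stdio, so the server does not need to be running in a separate terminal. From a local checkout (after npm run build):
   mcp-inspector node dist/index.js
   Or, if using the global npm package:
   mcp-inspector npx @rturv/mcp-pdf-reader
2. Open the web UI: The Inspector prints a URL when it starts (http://localhost:5173 in older releases; recent releases default to http://localhost:6274). Open it in your browser.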

Using the Inspector

Example: Extract text from a PDF

In the Inspector UI, find the read_pdf tool in the left sidebar
Click on it to expand the tool interface
Enter the parameters:
{ "filePath": "C:/path/to/your/document.pdf", "includeMetadata": true }
Click "Call Tool"
View the response in the right panel

Example: Search for text

Select the search_pdf tool
Enter parameters:
{ "filePath": "C:/path/to/your/document.pdf", "searchTerm": "important keyword", "caseSensitive": false }
Click "Call Tool" and review the search results with context

Example: Extract an image

First, call list_pdf_images to discover images:
{ "filePath": "C:/path/to/your/document.pdf" }
Note the index of the image you want (e.g., index: 0)
Call extract_pdf_image with that index:
{ "filePath": "C:/path/to/your/document.pdf", "imageIndex": 0 }
The response will include Base64-encoded image data ready for decoding and saving

Troubleshooting Inspector Issues

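Port already in use: Recent Inspector releases read the UI port from the CLIENT_PORT environment variable (for example, CLIENT_PORT=5174 mcp-inspector node dist/index.js); check the Inspector README for the options your version supports
Connection refused: Confirm that the server command passed to the Inspector is correct and starts without errors (run mcp-pdf-reader on its own to check)
Tool not appearing: Verify the tool definition in src/index.ts and rebuild with npm run build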

Development

Build

npm run build

Watch Mode

npm run dev

Run Tests

npm test

Note: Tests require a sample PDF at test-files/sample.pdf. Create one or skip PDF-dependent tests.

Watch Tests

npm run test:watch

Project Structure

mcp-pdf-reader/
├── src/
│   ├── index.ts                # MCP server implementation & tool definitions
│   ├── pdf-tools.ts            # Core PDF manipulation functions
│   ├── types.ts                # TypeScript interfaces & types
│   └── __tests__/
│       └── pdf-tools.test.ts   # Unit tests
├── dist/                       # Compiled JavaScript (generated)
├── test-files/                 # Test PDF files
├── package.json
├── tsconfig.json
└── README.md

Technology Stack

@modelcontextprotocol/sdk (^1.25.2) - MCP protocol implementation
pdf-parse (^2.4.5) - PDF text extraction
pdf-lib (^1.17.1) - PDF image extraction
TypeScript (^5.9.3) - Type-safe development
Jest (^29.7.0) - Unit testing

Limitations

No OCR: Only extracts selectable text from PDFs (not scanned images)
Text-based PDFs: Works best with PDFs containing embedded text. Scanned documents without OCR cannot be read
Image extraction: Standard formats only (JPEG, PNG, TIFF)
Base64 encoding: All images are returned as Base64 strings; large images may result in large responses
No PDF modification: This server is read-only; it cannot edit or create PDFs
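
To gauge how large an image response may get, note that Base64 encodes every 3 bytes as 4 characters, with padding up to a multiple of 4. A short sketch of that arithmetic (the function name is illustrative):

```typescript
// Length in characters of the padded Base64 encoding of byteLength bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

console.log(base64Length(1_000_000)); // 1333336 characters for a 1 MB image
```

The encoded data is therefore about a third larger than the image itself, before any JSON overhead.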

Configuration Files

CLAUDE_CONFIG.md - Claude Desktop configuration guide
COPILOT_CONFIG.md - GitHub Copilot setup instructions

License

MIT - See LICENSE file for details

Repository

github.com/rturv/mcp-pdf-reader
