<div align="center">
# mcp-pdf-tools
**MCP server for extracting text, searching, and analyzing PDF files**
[npm package](https://www.npmjs.com/package/mcp-pdf-tools)
[MIT License](https://opensource.org/licenses/MIT)


Give Claude (or any MCP client) the ability to read, search, and analyze PDF documents.
</div>
---
## What is this?
`mcp-pdf-tools` is a [Model Context Protocol](https://modelcontextprotocol.io/) server that gives AI assistants the ability to work with PDF files. Point it at any text-based PDF and your assistant can extract content, search for specific text, pull metadata, and analyze word usage — all without leaving the conversation.
## Quick Start
### Claude Desktop
Add to your Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS, `%APPDATA%\Claude\claude_desktop_config.json` on Windows):
```json
{
  "mcpServers": {
    "pdf-tools": {
      "command": "npx",
      "args": ["-y", "mcp-pdf-tools"]
    }
  }
}
```
### Claude Code
```bash
claude mcp add pdf-tools -- npx -y mcp-pdf-tools
```
### Other MCP Clients
```bash
npx -y mcp-pdf-tools
```
The server communicates over stdio using the Model Context Protocol.
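For clients that talk to the server directly, the sketch below shows roughly what a tool invocation looks like on the wire. The `tools/call` request shape and newline-delimited framing come from the MCP specification; `buildToolCall` is a hypothetical helper written for illustration, and a real client must complete the `initialize` handshake before sending it.

```typescript
// JSON-RPC 2.0 request used by MCP to invoke a tool.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper (not part of mcp-pdf-tools) that builds the request.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Over stdio, each message is written as a single line of JSON.
const request = buildToolCall(1, "pdf_info", {
  file_path: "/documents/quarterly-report.pdf",
});
const wireLine = JSON.stringify(request) + "\n";
```

In practice an MCP client SDK handles this framing for you; the sketch is only meant to show what travels over stdin.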
## Tools
### `pdf_info`
Get metadata and statistics about a PDF file.
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `file_path` | string | Yes | Absolute path to the PDF file |
**Returns:** Title, author, page count, text length, creator, and producer information.
---
### `pdf_extract_text`
Extract all text content from a PDF file.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `file_path` | string | Yes | — | Absolute path to the PDF file |
| `max_chars` | number | No | 50000 | Maximum characters to return (truncates with notice) |
**Returns:** Full text content of the PDF, prefixed with page count.
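To illustrate the `max_chars` behaviour, here is a minimal sketch of truncation with a notice. `truncateText` and the wording of the notice are assumptions made for this example; the server's actual message may differ.

```typescript
// Hypothetical helper illustrating the documented `max_chars` cap.
// Lengths are measured in UTF-16 code units, as String.prototype.length does.
function truncateText(text: string, maxChars: number = 50000): string {
  if (text.length <= maxChars) {
    return text; // short enough, returned unchanged
  }
  const omitted = text.length - maxChars;
  // Keep the first maxChars characters and append a notice (wording assumed).
  return `${text.slice(0, maxChars)}\n\n[Truncated: ${omitted} more characters not shown]`;
}
```

If a document is routinely truncated, extracting a page range is usually a better fit than raising `max_chars`.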
---
### `pdf_extract_pages`
Extract text from a specific page range.
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `file_path` | string | Yes | Absolute path to the PDF file |
| `start_page` | number | Yes | Start page (1-indexed) |
| `end_page` | number | Yes | End page (inclusive) |
**Returns:** Text content from the specified page range.
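The range is 1-indexed and inclusive at both ends, which is easy to get wrong by one. The sketch below shows that convention applied to an array of per-page strings; `selectPages` is a hypothetical helper, and how the server handles an out-of-range `end_page` is not documented here, so the clamping is an assumption.

```typescript
// Hypothetical helper: `pages` holds one string per page, in document order.
function selectPages(
  pages: string[],
  startPage: number,
  endPage: number
): string[] {
  if (!Number.isInteger(startPage) || !Number.isInteger(endPage)) {
    throw new RangeError("start_page and end_page must be integers");
  }
  if (startPage < 1 || endPage < startPage) {
    throw new RangeError("expected 1 <= start_page <= end_page");
  }
  // Convert the 1-indexed inclusive range to slice()'s 0-indexed,
  // end-exclusive form. slice() clamps an end past the last page (assumed
  // behaviour for this sketch) rather than throwing.
  return pages.slice(startPage - 1, endPage);
}
```

For example, `selectPages(pages, 2, 3)` yields the second and third pages, and `selectPages(pages, 5, 5)` yields only the fifth.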
---
### `pdf_search`
Search for text within a PDF file, returning matches with surrounding context.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `file_path` | string | Yes | — | Absolute path to the PDF file |
| `query` | string | Yes | — | Text to search for (case-insensitive) |
| `max_results` | number | No | 20 | Maximum number of matches to return |
**Returns:** List of matches with line numbers and surrounding context lines.
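As a rough illustration of what a case-insensitive, line-based search with context involves, consider the sketch below. It is not the server's implementation: `searchText`, the `SearchMatch` shape, and the `contextLines` parameter (one line either side) are assumptions made for the example.

```typescript
interface SearchMatch {
  lineNumber: number; // 1-indexed line in the extracted text
  context: string[]; // the matching line plus its neighbouring lines
}

// Hypothetical helper; the amount of context is an assumed parameter.
function searchText(
  text: string,
  query: string,
  maxResults: number = 20,
  contextLines: number = 1
): SearchMatch[] {
  const needle = query.toLowerCase();
  if (needle === "") {
    return []; // an empty query would otherwise match every line
  }
  const lines = text.split("\n");
  const matches: SearchMatch[] = [];
  lines.forEach((line, index) => {
    if (matches.length >= maxResults) return; // cap reached
    if (!line.toLowerCase().includes(needle)) return; // case-insensitive test
    const from = Math.max(0, index - contextLines);
    const to = Math.min(lines.length, index + contextLines + 1);
    matches.push({ lineNumber: index + 1, context: lines.slice(from, to) });
  });
  return matches;
}
```

Note that the line numbers refer to lines of the extracted text, which need not correspond to lines as they appear in a PDF viewer.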
---
### `pdf_word_stats`
Get word count and top word frequencies from a PDF.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `file_path` | string | Yes | — | Absolute path to the PDF file |
| `top_n` | number | No | 20 | Number of top words to include |
**Returns:** Total word count, page count, and a ranked list of the most frequent words (3+ characters).
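The frequency ranking can be approximated as follows. This is a sketch under stated assumptions, not the server's code: `wordStats` is hypothetical, tokenisation is a simple ASCII rule, the total counts every word while only words of 3+ characters are ranked, and ties are broken alphabetically.

```typescript
interface WordStats {
  totalWords: number;
  topWords: Array<[string, number]>; // [word, occurrences], most frequent first
}

// Hypothetical helper illustrating a top-N word frequency count.
function wordStats(text: string, topN: number = 20): WordStats {
  // Assumed tokenisation: lowercase runs of ASCII letters, digits, apostrophes.
  const words: string[] = text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  const counts = new Map<string, number>();
  for (const word of words) {
    if (word.length < 3) continue; // only words of 3+ characters are ranked
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  const topWords = Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, topN);
  return { totalWords: words.length, topWords };
}
```

Because common words such as "the" and "and" pass the 3-character filter, raw frequencies are only a rough guide to a document's topics.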
## Example Conversations
### Summarizing a report
> **You:** Summarize the key points in /documents/quarterly-report.pdf
>
> **Claude:** *(uses `pdf_info` to check document size, then `pdf_extract_text` to read content)*
>
> This is a 24-page quarterly report covering Q4 2025. The key points are...
### Searching a contract
> **You:** Does the NDA in /legal/nda-acme.pdf mention anything about a non-compete?
>
> **Claude:** *(uses `pdf_search` with query "non-compete")*
>
> Yes — I found 3 mentions of "non-compete" in the document. On line 47, there's a clause stating...
### Analyzing word usage
> **You:** What are the most discussed topics in /research/paper.pdf?
>
> **Claude:** *(uses `pdf_word_stats` to get word frequencies)*
>
> The paper is 8,400 words across 12 pages. The most frequent terms are "neural" (47 occurrences), "training" (38), and "optimization" (29), suggesting the paper focuses heavily on...
## Limitations
Be aware of these current constraints:
- **Text-based PDFs only** — Scanned or image-based PDFs will return empty text. No OCR support (yet).
- **Page extraction is approximate** — Page boundaries are detected heuristically. Extracted page ranges may not align perfectly with the visual pages in your PDF viewer.
- **No table extraction** — Tabular data in PDFs may not preserve its structure in the extracted text.
- **Full file loaded into memory** — Very large PDFs may be slow to process.
- **No merge or split** — This tool reads PDFs; it does not modify, merge, or split them.
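Since scanned PDFs come back with empty text, a client can flag them before attempting a summary. The check below is a hypothetical client-side heuristic; `looksImageBased` and its threshold of 10 visible characters per page are arbitrary assumptions, not part of this package.

```typescript
// Hypothetical heuristic: very little visible text per page suggests a
// scanned or image-based PDF that would need OCR.
function looksImageBased(
  extractedText: string,
  pageCount: number,
  minCharsPerPage: number = 10 // assumed threshold
): boolean {
  if (pageCount <= 0) {
    return false; // no pages, nothing to judge
  }
  // Ignore whitespace so blank lines and page separators do not count.
  const visibleChars = extractedText.replace(/\s+/g, "").length;
  return visibleChars / pageCount < minCharsPerPage;
}
```

A `true` result is only a hint: a short, text-based PDF with a nearly blank cover page could trip the same threshold.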
## Development
```bash
git clone https://github.com/seraphinederenouard/mcp-pdf-tools.git
cd mcp-pdf-tools
npm install
npm run build
npm test
```
## License
MIT