
PDFtotext MCP Server

by jpwebb
MIT License

A reliable Model Context Protocol (MCP) server for PDF text extraction using the proven pdftotext utility from poppler-utils.

🚀 Why This Server?

Unlike other PDF MCP servers that suffer from logging interference (log output mixed into the JSON-RPC stream on stdout), complex dependencies, and reliability issues, pdftotext-mcp is:

  • Actually works - Clean JSON-RPC communication without stdout pollution
  • Reliable - Built on the mature, widely packaged pdftotext utility from poppler-utils
  • Lightweight - Minimal dependencies, maximum compatibility
  • Production tested - Successfully tested with Claude Desktop and other MCP clients
  • Feature complete - Page-specific extraction, layout preservation, encoding options
  • Error handling - Comprehensive validation and helpful error messages

📋 Features

  • 📄 Extract text from entire PDF documents or specific pages
  • 🎨 Preserve original layout formatting (optional)
  • 🔤 Multiple text encoding support (UTF-8, Latin1, ASCII)
  • 📊 Comprehensive metadata in responses (word count, file info, etc.)
  • 🛡️ File validation and security checks
  • ⚡ Fast processing with configurable timeouts
  • 🔍 Detailed error reporting with troubleshooting hints

🔧 Prerequisites

You must have pdftotext installed on your system:

Ubuntu/Debian

sudo apt update
sudo apt install poppler-utils

macOS

brew install poppler

Windows

# Using Chocolatey
choco install poppler
# Using Scoop
scoop install poppler

Verify Installation

pdftotext -v
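
The server reports "pdftotext is not available" when it cannot launch the binary. The sketch below runs a similar check from Node.js. The function name `isCommandAvailable` is hypothetical and not part of pdftotext-mcp. The check treats only a failure to spawn as "missing" and does not rely on the exit status of `-v`.

```javascript
const { spawnSync } = require("node:child_process");

// Returns true if `command` could be launched. Only a spawn failure
// (for example ENOENT when the binary is not on PATH) counts as missing;
// the exit status of the version flag is deliberately ignored.
function isCommandAvailable(command, args = ["-v"]) {
  const result = spawnSync(command, args, { encoding: "utf8" });
  return !result.error;
}

if (isCommandAvailable("pdftotext")) {
  console.log("pdftotext found");
} else {
  console.error("pdftotext is not available: install poppler-utils (Linux) or poppler (macOS, Windows)");
}
```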

📦 Installation

Option 1: Global Installation

npm install -g pdftotext-mcp

Option 2: Use with npx (No Installation)

npx pdftotext-mcp

Option 3: Local Development

git clone https://github.com/jpwebb/pdftotext-mcp.git
cd pdftotext-mcp
npm install
npm start

⚙️ Configuration

Add to your MCP client configuration:

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "pdftotext": {
      "command": "pdftotext-mcp"
    }
  }
}

Or with npx:

{
  "mcpServers": {
    "pdftotext": {
      "command": "npx",
      "args": ["pdftotext-mcp"]
    }
  }
}
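
If claude_desktop_config.json already lists other servers, add the entry inside the existing "mcpServers" object instead of replacing it. The sketch below shows a minimal way to do that programmatically. `addPdftotextServer` is a hypothetical helper, and the "other" / "other-mcp" server is a placeholder.

```javascript
// Returns a copy of an existing claude_desktop_config.json object with a
// "pdftotext" server added, leaving any other configured servers untouched.
function addPdftotextServer(config, { useNpx = false } = {}) {
  const entry = useNpx
    ? { command: "npx", args: ["pdftotext-mcp"] }
    : { command: "pdftotext-mcp" };
  return {
    ...config,
    mcpServers: { ...(config.mcpServers || {}), pdftotext: entry },
  };
}

// Example: an existing config that already lists another server.
const existing = { mcpServers: { other: { command: "other-mcp" } } };
const updated = addPdftotextServer(existing, { useNpx: true });
console.log(JSON.stringify(updated, null, 2));
```

The function returns a new object, so you can compare it with the original before writing it back to disk.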

Other MCP Clients

The server works with any MCP-compatible client. Use pdftotext-mcp as the command.

🎯 Usage

The server provides a single, powerful tool: read_pdf_text

Basic Usage

Extract entire document
{ "tool": "read_pdf_text", "arguments": { "path": "./document.pdf" } }
Extract specific page
{ "tool": "read_pdf_text", "arguments": { "path": "./document.pdf", "page": 2 } }
Preserve layout formatting
{ "tool": "read_pdf_text", "arguments": { "path": "./document.pdf", "layout": true } }
Custom encoding
{ "tool": "read_pdf_text", "arguments": { "path": "./document.pdf", "encoding": "Latin1" } }
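
The `{ "tool": ..., "arguments": ... }` objects above are shorthand. On the wire, MCP clients send JSON-RPC 2.0 `tools/call` requests with the tool name in `params.name`. Over the stdio transport, each message is one newline-terminated line of JSON. The sketch below builds the request for the page-2 example. A real client first completes the `initialize` handshake, which is omitted here. `buildToolCall` is a hypothetical helper.

```javascript
// Builds the JSON-RPC 2.0 request an MCP client sends to call read_pdf_text.
function buildToolCall(id, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "read_pdf_text", arguments: args },
  };
}

// Over stdio, each message is written as a single line of JSON.
const line = JSON.stringify(buildToolCall(1, { path: "./document.pdf", page: 2 })) + "\n";
process.stdout.write(line);
```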

Response Format

Success Response
{
  "success": true,
  "file": "document.pdf",
  "path": "/absolute/path/to/document.pdf",
  "extractedText": "Full text content...",
  "pageSpecific": "all",
  "layoutPreserved": false,
  "encoding": "UTF-8",
  "fileSize": 1048576,
  "lastModified": "2024-01-15T10:30:00.000Z",
  "extractedAt": "2024-01-15T10:35:00.000Z",
  "textLength": 5234,
  "wordCount": 892
}
Error Response
{
  "success": false,
  "error": "File not found: ./nonexistent.pdf",
  "errorType": "FILE_NOT_FOUND",
  "file": "./nonexistent.pdf",
  "timestamp": "2024-01-15T10:35:00.000Z"
}
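
Both response shapes share the `success` flag, so a client can branch on it. The sketch below assumes the response has already been parsed from JSON into an object; how that object is wrapped inside the MCP tool result is not covered here. `extractTextOrThrow` is a hypothetical helper.

```javascript
// Returns the extracted text from a success response, or throws an Error
// whose `code` is the documented errorType.
function extractTextOrThrow(response) {
  if (response.success === true) {
    return response.extractedText;
  }
  const err = new Error(response.error || "Unknown error");
  err.code = response.errorType || "UNKNOWN_ERROR";
  throw err;
}

// The error response from above, as a parsed object.
const failure = {
  success: false,
  error: "File not found: ./nonexistent.pdf",
  errorType: "FILE_NOT_FOUND",
  file: "./nonexistent.pdf",
  timestamp: "2024-01-15T10:35:00.000Z",
};

try {
  extractTextOrThrow(failure);
} catch (err) {
  if (err.code === "FILE_NOT_FOUND") {
    console.error(`Check the path: ${failure.file}`);
  }
}
```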

📚 API Reference

Tool: read_pdf_text

Extracts text content from PDF files using pdftotext.

Parameters
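| Parameter | Type    | Required | Default   | Description                             |
|-----------|---------|----------|-----------|-----------------------------------------|
| path      | string  | Yes      | -         | Path to PDF file (relative or absolute) |
| page      | number  | No       | all pages | Specific page to extract (1-based)      |
| layout    | boolean | No       | false     | Preserve original text layout           |
| encoding  | string  | No       | "UTF-8"   | Output text encoding                    |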
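
Each parameter corresponds to a standard pdftotext option:

  • `-f` and `-l` set the first and last page.
  • `-layout` keeps the physical layout.
  • `-enc` sets the output encoding.
  • `-` as the output file sends text to stdout.

The sketch below shows one possible mapping. It is an assumption about how a server like this could call pdftotext, not pdftotext-mcp's actual source. `buildPdftotextArgs` and `runPdftotext` are hypothetical names, and the 30-second timeout and 50 MB buffer are illustrative values.

```javascript
const { execFile } = require("node:child_process");

// Hypothetical translation of read_pdf_text arguments into pdftotext argv.
function buildPdftotextArgs({ path, page, layout = false, encoding = "UTF-8" }) {
  const args = [];
  if (page !== undefined) {
    // Extract a single page: first page = last page = `page` (1-based).
    args.push("-f", String(page), "-l", String(page));
  }
  if (layout) {
    args.push("-layout");
  }
  // "-" as the output file makes pdftotext write the text to stdout.
  args.push("-enc", encoding, path, "-");
  return args;
}

// Runs pdftotext without a shell, so spaces or quotes in the path are safe.
// stdout is decoded as UTF-8 (execFile's default); other output encodings
// would need `encoding: "buffer"` and explicit decoding.
function runPdftotext(toolArgs) {
  return new Promise((resolve, reject) => {
    execFile(
      "pdftotext",
      buildPdftotextArgs(toolArgs),
      { timeout: 30000, maxBuffer: 50 * 1024 * 1024 },
      (err, stdout, stderr) => (err ? reject(Object.assign(err, { stderr })) : resolve(stdout)),
    );
  });
}
```

For example, `runPdftotext({ path: "./document.pdf", page: 2 })` would resolve with the text of page 2. Passing an argument array to `execFile`, rather than building a shell string, avoids quoting problems with user-supplied paths.
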
Supported Encodings
  • UTF-8 (default)
  • Latin1
  • ASCII
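
The parameter table and encoding list are enough to check arguments before sending a request. The hypothetical `validateArgs` helper below mirrors them on the client side. The server performs its own validation, so this only surfaces mistakes earlier.

```javascript
const SUPPORTED_ENCODINGS = ["UTF-8", "Latin1", "ASCII"];

// Returns a list of problems with a read_pdf_text arguments object;
// an empty list means the arguments match the documented parameters.
function validateArgs(args) {
  const errors = [];
  if (typeof args.path !== "string" || args.path.length === 0) {
    errors.push("path must be a non-empty string");
  }
  if (args.page !== undefined && !(Number.isInteger(args.page) && args.page >= 1)) {
    errors.push("page must be an integer >= 1 (pages are 1-based)");
  }
  if (args.layout !== undefined && typeof args.layout !== "boolean") {
    errors.push("layout must be a boolean");
  }
  if (args.encoding !== undefined && !SUPPORTED_ENCODINGS.includes(args.encoding)) {
    errors.push(`encoding must be one of ${SUPPORTED_ENCODINGS.join(", ")}`);
  }
  return errors;
}

console.log(validateArgs({ path: "./document.pdf", page: 0 }));
```
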
Error Types
  • FILE_NOT_FOUND - PDF file doesn't exist
  • PERMISSION_DENIED - Cannot read the file
  • INVALID_PDF - File is not a valid PDF
  • PDFTOTEXT_ERROR - pdftotext utility error
  • UNKNOWN_ERROR - Unexpected error
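
In Node.js, failed filesystem calls carry an errno-style `code`, so the first two error types map onto familiar codes. The sketch below shows such a mapping, assuming a Node.js implementation. `classifyFsError` is hypothetical. INVALID_PDF and PDFTOTEXT_ERROR would come from separate checks on the file contents and on pdftotext's result.

```javascript
const fs = require("node:fs");

// Maps Node.js filesystem error codes onto the documented errorType values.
function classifyFsError(err) {
  switch (err && err.code) {
    case "ENOENT":
      return "FILE_NOT_FOUND";
    case "EACCES":
    case "EPERM":
      return "PERMISSION_DENIED";
    default:
      return "UNKNOWN_ERROR";
  }
}

try {
  fs.statSync("./nonexistent.pdf");
} catch (err) {
  console.log(classifyFsError(err));
}
```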

🔧 Troubleshooting

"pdftotext is not available"

Solution: Install poppler-utils (Linux) or poppler (macOS, Windows); see Prerequisites

"File not found"

Solutions:

  • Use absolute paths: /home/user/document.pdf
  • Check file exists: ls -la /path/to/file.pdf
  • Check the MCP server's working directory, since relative paths depend on it
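
Relative paths are the usual culprit. They resolve against the server process's working directory, which is set by whatever launched the server and may not be your project folder. The sketch below illustrates the effect, assuming Node.js path semantics. `resolvePdfPath` and `/srv/mcp` are illustrative.

```javascript
const path = require("node:path");

// Resolves a tool "path" argument the way Node.js would inside the server:
// relative paths are joined to the working directory, absolute ones are kept.
function resolvePdfPath(p, cwd = process.cwd()) {
  return path.resolve(cwd, p);
}

console.log(resolvePdfPath("./document.pdf", "/srv/mcp")); // → /srv/mcp/document.pdf (Linux/macOS)
console.log(resolvePdfPath("/home/user/document.pdf", "/srv/mcp")); // → /home/user/document.pdf
```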

"Permission denied"

Solutions:

  • Check file permissions: chmod 644 document.pdf
  • Ensure directory is readable: chmod 755 /path/to/directory/
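
Permissions are evaluated for the user account the MCP server runs as, which may differ from yours. The sketch below is a small Node.js check using `fs.accessSync` with `R_OK`. The `checkReadable` name and the example path are hypothetical.

```javascript
const fs = require("node:fs");

// Returns null if the current process can read `file`, otherwise the
// error code, e.g. "EACCES" (permission denied) or "ENOENT" (missing).
function checkReadable(file) {
  try {
    fs.accessSync(file, fs.constants.R_OK);
    return null;
  } catch (err) {
    return err.code;
  }
}

const problem = checkReadable("/path/to/document.pdf");
console.log(problem === null ? "readable" : `not readable: ${problem}`);
```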

"File is not a valid PDF"

Solutions:

  • Verify file is actually a PDF: file document.pdf
  • Check for file corruption
  • Try with a different PDF file
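
The `file` command recognizes PDFs by the `%PDF-` signature at the start of the file. The sketch below performs the same check in Node.js. `hasPdfHeader` is hypothetical. Some PDF readers tolerate a few stray bytes before the signature, which this strict check does not. A valid header also does not rule out corruption later in the file.

```javascript
const fs = require("node:fs");

// Reads the first five bytes of `file` and checks for the "%PDF-" signature.
function hasPdfHeader(file) {
  const fd = fs.openSync(file, "r");
  try {
    const buf = Buffer.alloc(5);
    const bytesRead = fs.readSync(fd, buf, 0, 5, 0);
    return bytesRead === 5 && buf.toString("latin1") === "%PDF-";
  } finally {
    fs.closeSync(fd);
  }
}
```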

MCP Connection Issues

Solutions:

  • Restart your MCP client completely
  • Check configuration syntax in config file
  • Verify pdftotext-mcp is on your PATH, or use its absolute path as the command
  • Check MCP client logs for detailed errors
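
GUI clients such as Claude Desktop may start with a different PATH than your terminal, so a command that works in a shell can still fail to launch. The `which`-style sketch below searches a PATH string for an executable. `findOnPath` is hypothetical. If it finds pdftotext-mcp, putting that absolute path in the "command" field avoids depending on PATH at all.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Searches each directory in `envPath` for an executable file named `name`,
// similar to `which`. On Windows, npm creates a "pdftotext-mcp.cmd" shim,
// so the PATHEXT extensions are tried too.
function findOnPath(name, envPath = process.env.PATH || "") {
  const exts =
    process.platform === "win32"
      ? (process.env.PATHEXT || ".EXE;.CMD;.BAT").split(";")
      : [""];
  for (const dir of envPath.split(path.delimiter)) {
    if (!dir) continue;
    for (const ext of exts) {
      const candidate = path.join(dir, name + ext);
      try {
        fs.accessSync(candidate, fs.constants.X_OK);
        if (fs.statSync(candidate).isFile()) return candidate;
      } catch {
        // Not present or not executable here; keep searching.
      }
    }
  }
  return null;
}

console.log(findOnPath("pdftotext-mcp") ?? "pdftotext-mcp is not on PATH");
```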

🧪 Testing

# Run tests
npm test
# Run tests with watch mode
npm run test:watch
# Run linter
npm run lint

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

git clone https://github.com/jpwebb/pdftotext-mcp.git
cd pdftotext-mcp
npm install

Running Locally

npm start

Code Style

This project uses ESLint. Run npm run lint to check code style.

📄 License

MIT - see LICENSE file for details.

🙏 Acknowledgments


Made for the MCP community
