PDFtotext MCP Server

by jpwebb
MIT License

A reliable Model Context Protocol (MCP) server for PDF text extraction using the proven pdftotext utility from poppler-utils.

🚀 Why This Server?

Unlike other PDF MCP servers that suffer from logging interference, complex dependencies, and reliability issues, pdftotext-mcp is:

  • Actually works - Clean JSON-RPC communication without stdout pollution
  • Reliable - Built on mature pdftotext from poppler-utils (used by millions)
  • Lightweight - Minimal dependencies, maximum compatibility
  • Production tested - Successfully tested with Claude Desktop and other MCP clients
  • Feature complete - Page-specific extraction, layout preservation, encoding options
  • Error handling - Comprehensive validation and helpful error messages

📋 Features

  • 📄 Extract text from entire PDF documents or specific pages
  • 🎨 Preserve original layout formatting (optional)
  • 🔤 Multiple text encoding support (UTF-8, Latin1, ASCII)
  • 📊 Comprehensive metadata in responses (word count, file info, etc.)
  • 🛡️ File validation and security checks
  • ⚡ Fast processing with configurable timeouts
  • 🔍 Detailed error reporting with troubleshooting hints

🔧 Prerequisites

You must have pdftotext installed on your system:

Ubuntu/Debian

sudo apt update
sudo apt install poppler-utils

macOS

brew install poppler

Windows

# Using Chocolatey
choco install poppler

# Using Scoop
scoop install poppler

Verify Installation

pdftotext -v

📦 Installation

Option 1: Global Install

npm install -g pdftotext-mcp

Option 2: Use with npx (No Installation)

npx pdftotext-mcp

Option 3: Local Development

git clone https://github.com/jpwebb/pdftotext-mcp.git
cd pdftotext-mcp
npm install
npm start

⚙️ Configuration

Add to your MCP client configuration:

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "pdftotext": {
      "command": "pdftotext-mcp"
    }
  }
}

Or with npx:

{
  "mcpServers": {
    "pdftotext": {
      "command": "npx",
      "args": ["pdftotext-mcp"]
    }
  }
}

Other MCP Clients

The server works with any MCP-compatible client. Use pdftotext-mcp as the command.

🎯 Usage

The server provides a single, powerful tool: read_pdf_text

Basic Usage

Extract entire document

{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf"
  }
}

Extract specific page

{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "page": 2
  }
}

Preserve layout formatting

{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "layout": true
  }
}

Custom encoding

{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "encoding": "Latin1"
  }
}
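
The `tool` and `arguments` pairs above are a shorthand. On the wire, MCP clients send tool invocations as JSON-RPC 2.0 `tools/call` requests. The sketch below shows how such a request could be assembled; `ReadPdfTextArgs`, `ToolCallRequest`, and `buildToolCall` are hypothetical names chosen for illustration and are not part of this package.

```typescript
// Arguments accepted by read_pdf_text, as listed in this README.
interface ReadPdfTextArgs {
  path: string;
  page?: number;
  layout?: boolean;
  encoding?: "UTF-8" | "Latin1" | "ASCII";
}

// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: ReadPdfTextArgs };
}

// Hypothetical helper: wraps the tool arguments in a JSON-RPC envelope.
function buildToolCall(id: number, args: ReadPdfTextArgs): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "read_pdf_text", arguments: args },
  };
}

// Example: request page 2 of ./document.pdf.
const request = buildToolCall(1, { path: "./document.pdf", page: 2 });
console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client library builds this envelope for you; the sketch is only meant to show where the tool name and arguments end up.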

Response Format

Success Response

{
  "success": true,
  "file": "document.pdf",
  "path": "/absolute/path/to/document.pdf",
  "extractedText": "Full text content...",
  "pageSpecific": "all",
  "layoutPreserved": false,
  "encoding": "UTF-8",
  "fileSize": 1048576,
  "lastModified": "2024-01-15T10:30:00.000Z",
  "extractedAt": "2024-01-15T10:35:00.000Z",
  "textLength": 5234,
  "wordCount": 892
}

Error Response

{
  "success": false,
  "error": "File not found: ./nonexistent.pdf",
  "errorType": "FILE_NOT_FOUND",
  "file": "./nonexistent.pdf",
  "timestamp": "2024-01-15T10:35:00.000Z"
}
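
Because both payloads carry a `success` flag, a client can treat them as a discriminated union. The following is a minimal sketch, assuming the fields shown above are always present. The type names and `summarize` are hypothetical, and typing `pageSpecific` as a number when a single page is requested is an assumption, since this README only shows the value "all".

```typescript
interface SuccessResponse {
  success: true;
  file: string;
  path: string;
  extractedText: string;
  pageSpecific: number | "all"; // assumption: a page number when `page` is set
  layoutPreserved: boolean;
  encoding: string;
  fileSize: number;
  lastModified: string;
  extractedAt: string;
  textLength: number;
  wordCount: number;
}

interface ErrorResponse {
  success: false;
  error: string;
  errorType: string;
  file: string;
  timestamp: string;
}

type ReadPdfTextResponse = SuccessResponse | ErrorResponse;

// Hypothetical helper: parse the JSON payload and return a one-line summary.
function summarize(json: string): string {
  const res = JSON.parse(json) as ReadPdfTextResponse;
  if (res.success) {
    // Narrowed to SuccessResponse here.
    return `${res.file}: ${res.wordCount} words, ${res.textLength} characters`;
  }
  // Narrowed to ErrorResponse here.
  return `${res.errorType}: ${res.error}`;
}
```

Checking `success` first lets the compiler rule out reading `extractedText` from an error payload.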

📚 API Reference

Tool: read_pdf_text

Extracts text content from PDF files using pdftotext.

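Parameters

| Parameter | Type    | Required | Default   | Description                             |
|-----------|---------|----------|-----------|-----------------------------------------|
| path      | string  | Yes      | -         | Path to PDF file (relative or absolute) |
| page      | number  | No       | all pages | Specific page to extract (1-based)      |
| layout    | boolean | No       | false     | Preserve original text layout           |
| encoding  | string  | No       | "UTF-8"   | Output text encoding                    |

Supported Encodings
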
  • UTF-8 (default)
  • Latin1
  • ASCII
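
To illustrate how these parameters could translate into a `pdftotext` command line, here is a minimal sketch. It assumes the server validates its input and then uses the standard `pdftotext` options `-f`/`-l` (first and last page), `-layout`, and `-enc`, with `-` as the output file so the text goes to stdout. The actual implementation is not shown in this README and may differ, including in how it names encodings for `pdftotext`. `ExtractionOptions` and `buildPdftotextArgs` are hypothetical names.

```typescript
const SUPPORTED_ENCODINGS = ["UTF-8", "Latin1", "ASCII"] as const;
type Encoding = (typeof SUPPORTED_ENCODINGS)[number];

// Hypothetical mirror of the read_pdf_text parameters.
interface ExtractionOptions {
  path: string;
  page?: number;
  layout?: boolean;
  encoding?: Encoding;
}

// Hypothetical helper: validate the options and build a pdftotext argv.
function buildPdftotextArgs(opts: ExtractionOptions): string[] {
  if (typeof opts.path !== "string" || opts.path.length === 0) {
    throw new Error("path is required");
  }
  const argv: string[] = [];
  if (opts.page !== undefined) {
    if (Math.floor(opts.page) !== opts.page || opts.page < 1) {
      throw new Error("page must be a positive integer (1-based)");
    }
    // Equal first and last page selects exactly one page.
    argv.push("-f", String(opts.page), "-l", String(opts.page));
  }
  if (opts.layout === true) {
    argv.push("-layout");
  }
  const encoding: Encoding = opts.encoding ?? "UTF-8";
  if (SUPPORTED_ENCODINGS.indexOf(encoding) === -1) {
    throw new Error(`unsupported encoding: ${encoding}`);
  }
  // Assumption: the encoding name is passed through unchanged.
  argv.push("-enc", encoding);
  // Input file first, then "-" to write the extracted text to stdout.
  argv.push(opts.path, "-");
  return argv;
}

console.log(buildPdftotextArgs({ path: "./document.pdf", page: 2, layout: true }));
```

Returning an argument array rather than a single command string means the path never passes through a shell, so spaces or quotes in file names need no escaping.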

Error Types

  • FILE_NOT_FOUND - PDF file doesn't exist
  • PERMISSION_DENIED - Cannot read the file
  • INVALID_PDF - File is not a valid PDF
  • PDFTOTEXT_ERROR - pdftotext utility error
  • UNKNOWN_ERROR - Unexpected error
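
As a rough illustration of how a failure might be mapped onto these error types, the sketch below classifies an error by its Node.js file-system code (`ENOENT`, `EACCES`). This is an assumption about one plausible approach, not the server's actual logic. `Failure`, `classify`, `toErrorResponse`, and the `fromPdftotext` flag are hypothetical. Detecting `INVALID_PDF` would need a check of the file contents and is left out.

```typescript
type ErrorType =
  | "FILE_NOT_FOUND"
  | "PERMISSION_DENIED"
  | "INVALID_PDF"
  | "PDFTOTEXT_ERROR"
  | "UNKNOWN_ERROR";

// Minimal view of a failure. Node.js file-system errors carry a `code`
// such as "ENOENT"; `fromPdftotext` is a hypothetical flag the caller sets
// when the pdftotext process itself reported the error.
interface Failure {
  message: string;
  code?: string;
  fromPdftotext?: boolean;
}

// Hypothetical helper: pick the error type for a failure.
function classify(failure: Failure): ErrorType {
  if (failure.code === "ENOENT") return "FILE_NOT_FOUND";
  if (failure.code === "EACCES") return "PERMISSION_DENIED";
  if (failure.fromPdftotext) return "PDFTOTEXT_ERROR";
  return "UNKNOWN_ERROR";
}

// Hypothetical helper: build a payload shaped like the error response
// documented in this README.
function toErrorResponse(file: string, failure: Failure) {
  return {
    success: false as const,
    error: failure.message,
    errorType: classify(failure),
    file,
    timestamp: new Date().toISOString(),
  };
}

console.log(
  toErrorResponse("./nonexistent.pdf", {
    message: "File not found: ./nonexistent.pdf",
    code: "ENOENT",
  })
);
```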

🔧 Troubleshooting

"pdftotext is not available"

Solution: Install poppler-utils (see Prerequisites)

"File not found"

Solutions:

  • Use absolute paths: /home/user/document.pdf
  • Check file exists: ls -la /path/to/file.pdf
  • Verify MCP server working directory

"Permission denied"

Solutions:

  • Check file permissions: chmod 644 document.pdf
  • Ensure directory is readable: chmod 755 /path/to/directory/

"File is not a valid PDF"

Solutions:

  • Verify file is actually a PDF: file document.pdf
  • Check for file corruption
  • Try with a different PDF file

MCP Connection Issues

Solutions:

  • Restart your MCP client completely
  • Check configuration syntax in config file
  • Verify pdftotext-mcp is accessible in PATH
  • Check MCP client logs for detailed errors

🧪 Testing

# Run tests
npm test

# Run tests with watch mode
npm run test:watch

# Run linter
npm run lint

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

git clone https://github.com/jpwebb/pdftotext-mcp.git
cd pdftotext-mcp
npm install

Running Locally

npm start

Code Style

This project uses ESLint. Run npm run lint to check code style.

📄 License

MIT - see LICENSE file for details.

🙏 Acknowledgments


Made for the MCP community
