
PDFtotext MCP Server

by jpwebb


A reliable Model Context Protocol (MCP) server for PDF text extraction using the proven pdftotext utility from poppler-utils.


🚀 Why This Server?

Unlike some other PDF MCP servers, which can suffer from logging interference, complex dependencies, and reliability issues, pdftotext-mcp is:

  • ✅ Actually works - Clean JSON-RPC communication with no log output polluting stdout

  • ✅ Reliable - Built on the mature, widely used pdftotext utility from poppler-utils

  • ✅ Lightweight - Minimal dependencies, maximum compatibility

  • ✅ Production tested - Successfully tested with Claude Desktop and other MCP clients

  • ✅ Feature complete - Page-specific extraction, layout preservation, encoding options

  • ✅ Error handling - Comprehensive validation and helpful error messages


📋 Features

  • 📄 Extract text from entire PDF documents or specific pages

  • 🎨 Preserve original layout formatting (optional)

  • 🔤 Multiple text encoding support (UTF-8, Latin1, ASCII)

  • 📊 Comprehensive metadata in responses (word count, file info, etc.)

  • 🛡️ File validation and security checks

  • ⚡ Fast processing with configurable timeouts

  • 🔍 Detailed error reporting with troubleshooting hints

🔧 Prerequisites

You must have pdftotext installed on your system:

Ubuntu/Debian

```bash
sudo apt update
sudo apt install poppler-utils
```

macOS

```bash
brew install poppler
```

Windows

```powershell
# Using Chocolatey
choco install poppler

# Using Scoop
scoop install poppler
```

Verify Installation

```bash
pdftotext -v
```
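To run the same check from a script, here is a minimal TypeScript sketch using Node's `child_process.spawnSync`. It only answers whether the binary can be launched at all; the helper name `isCommandAvailable` is hypothetical and not part of pdftotext-mcp.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper: returns true if `command` can be started.
// spawnSync sets `result.error` when the process cannot be launched
// (for example ENOENT when the binary is not on PATH). The exit code is
// deliberately ignored: we only care whether the executable exists.
function isCommandAvailable(command: string, args: string[] = []): boolean {
  const result = spawnSync(command, args, { stdio: "ignore" });
  return result.error === undefined;
}

// Usage: check for pdftotext before starting the MCP server.
if (!isCommandAvailable("pdftotext", ["-v"])) {
  console.error("pdftotext not found: install poppler (see Prerequisites)");
}
```

Ignoring the exit code keeps the check focused on whether the binary exists rather than on how `-v` exits.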

📦 Installation

Option 1: Global Installation (Recommended)

```bash
npm install -g pdftotext-mcp
```

Option 2: Use with npx (No Installation)

```bash
npx pdftotext-mcp
```

Option 3: Local Development

```bash
git clone https://github.com/jpwebb/pdftotext-mcp.git
cd pdftotext-mcp
npm install
npm start
```

⚙️ Configuration

Add to your MCP client configuration:

Claude Desktop

Add to claude_desktop_config.json:

```json
{
  "mcpServers": {
    "pdftotext": {
      "command": "pdftotext-mcp"
    }
  }
}
```

Or with npx:

```json
{
  "mcpServers": {
    "pdftotext": {
      "command": "npx",
      "args": ["pdftotext-mcp"]
    }
  }
}
```
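If you already have other servers in claude_desktop_config.json, the pdftotext entry has to be merged into the existing `mcpServers` object rather than replacing the file. A minimal TypeScript sketch of that merge, assuming the config has already been parsed into a plain object; `withPdftotextServer` is a hypothetical helper, not something the package ships.

```typescript
// Shape of the part of claude_desktop_config.json shown above.
interface McpServerEntry {
  command: string;
  args?: string[];
}

interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown; // other top-level keys are preserved untouched
}

// Hypothetical helper: returns a copy of `config` with a "pdftotext" entry,
// keeping every other server. The input object is not mutated.
function withPdftotextServer(config: ClaudeDesktopConfig, useNpx = false): ClaudeDesktopConfig {
  const entry: McpServerEntry = useNpx
    ? { command: "npx", args: ["pdftotext-mcp"] }
    : { command: "pdftotext-mcp" };
  return { ...config, mcpServers: { ...(config.mcpServers ?? {}), pdftotext: entry } };
}

const existing: ClaudeDesktopConfig = { mcpServers: { other: { command: "other-mcp" } } };
console.log(JSON.stringify(withPdftotextServer(existing, true), null, 2));
```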

Other MCP Clients

The server works with any MCP-compatible client. Use pdftotext-mcp as the command.

🎯 Usage

The server provides a single tool: read_pdf_text

Basic Usage

Extract entire document

```json
{
  "tool": "read_pdf_text",
  "arguments": { "path": "./document.pdf" }
}
```

Extract specific page

```json
{
  "tool": "read_pdf_text",
  "arguments": { "path": "./document.pdf", "page": 2 }
}
```

Preserve layout formatting

```json
{
  "tool": "read_pdf_text",
  "arguments": { "path": "./document.pdf", "layout": true }
}
```

Custom encoding

```json
{
  "tool": "read_pdf_text",
  "arguments": { "path": "./document.pdf", "encoding": "Latin1" }
}
```
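The `{ "tool": ..., "arguments": ... }` objects above are shorthand. Under the MCP specification, a client sends each call as a JSON-RPC 2.0 `tools/call` request, and the stdio transport frames each message as a single line of JSON. A TypeScript sketch of that framing, assuming the `initialize` handshake has already been completed (it is omitted here); `toolsCallMessage` is a hypothetical helper.

```typescript
// Arguments accepted by read_pdf_text, per the API Reference.
interface ReadPdfTextArgs {
  path: string;
  page?: number;
  layout?: boolean;
  encoding?: string;
}

// Builds one newline-terminated JSON-RPC 2.0 "tools/call" message.
// JSON.stringify without an indent argument escapes newlines inside strings
// and adds no whitespace, so the message stays on a single line.
function toolsCallMessage(id: number, args: ReadPdfTextArgs): string {
  const request = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "read_pdf_text", arguments: args },
  };
  return JSON.stringify(request) + "\n";
}

// Equivalent of the "Extract specific page" example, as it would be written
// to the server's stdin.
process.stdout.write(toolsCallMessage(1, { path: "./document.pdf", page: 2 }));
```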

Response Format

Success Response

```json
{
  "success": true,
  "file": "document.pdf",
  "path": "/absolute/path/to/document.pdf",
  "extractedText": "Full text content...",
  "pageSpecific": "all",
  "layoutPreserved": false,
  "encoding": "UTF-8",
  "fileSize": 1048576,
  "lastModified": "2024-01-15T10:30:00.000Z",
  "extractedAt": "2024-01-15T10:35:00.000Z",
  "textLength": 5234,
  "wordCount": 892
}
```
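The README does not specify how `textLength` and `wordCount` are computed. A plausible approach, shown here as a hypothetical TypeScript sketch, is the string length plus a count of whitespace-separated tokens; the server's actual rules may differ (for example, in how it treats the form feeds pdftotext inserts between pages).

```typescript
// Hypothetical: one way to derive textLength and wordCount from the text.
function textStats(extractedText: string): { textLength: number; wordCount: number } {
  const trimmed = extractedText.trim();
  return {
    // Length in UTF-16 code units, which is what String.prototype.length counts.
    textLength: extractedText.length,
    // Whitespace-separated tokens; \s also matches the \f page breaks.
    wordCount: trimmed === "" ? 0 : trimmed.split(/\s+/).length,
  };
}

console.log(textStats("Page one text.\fPage two text."));
```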

Error Response

```json
{
  "success": false,
  "error": "File not found: ./nonexistent.pdf",
  "errorType": "FILE_NOT_FOUND",
  "file": "./nonexistent.pdf",
  "timestamp": "2024-01-15T10:35:00.000Z"
}
```
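Because every response carries a boolean `success` field, a client can model the two shapes as a discriminated union. The TypeScript types below are inferred from the sample responses above (the `number` case of `pageSpecific` is an assumption, since only `"all"` is shown), and the `errorType` values come from the Error Types list in the API Reference. The sketch assumes you already have the tool's JSON output as a string; `summarize` is a hypothetical helper.

```typescript
// Hypothetical client-side types inferred from the sample responses.
interface ReadPdfTextSuccess {
  success: true;
  file: string;
  path: string;
  extractedText: string;
  pageSpecific: number | "all"; // assumption: a page number when `page` is set
  layoutPreserved: boolean;
  encoding: string;
  fileSize: number; // presumably bytes
  lastModified: string; // ISO 8601 timestamp
  extractedAt: string; // ISO 8601 timestamp
  textLength: number;
  wordCount: number;
}

type ErrorType =
  | "FILE_NOT_FOUND"
  | "PERMISSION_DENIED"
  | "INVALID_PDF"
  | "PDFTOTEXT_ERROR"
  | "UNKNOWN_ERROR";

interface ReadPdfTextError {
  success: false;
  error: string;
  errorType: ErrorType;
  file: string;
  timestamp: string;
}

type ReadPdfTextResult = ReadPdfTextSuccess | ReadPdfTextError;

// Returns a one-line summary. The cast trusts the server's output; a
// production client would validate the fields before relying on them.
function summarize(json: string): string {
  const result = JSON.parse(json) as ReadPdfTextResult;
  if (result.success) {
    // Narrowed to ReadPdfTextSuccess by the `success` discriminant.
    return `${result.file}: ${result.wordCount} words`;
  }
  return `${result.errorType}: ${result.error}`;
}

console.log(summarize('{"success":false,"error":"File not found: ./nonexistent.pdf","errorType":"FILE_NOT_FOUND","file":"./nonexistent.pdf","timestamp":"2024-01-15T10:35:00.000Z"}'));
```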

📚 API Reference

Tool: read_pdf_text

Extracts text content from PDF files using pdftotext.

Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| path | string | ✅ | - | Path to PDF file (relative or absolute) |
| page | number | ❌ | all pages | Specific page to extract (1-based) |
| layout | boolean | ❌ | false | Preserve original text layout |
| encoding | string | ❌ | "UTF-8" | Output text encoding |

Supported Encodings

  • UTF-8 (default)

  • Latin1

  • ASCII
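The parameters above line up with real pdftotext options: `-f` and `-l` set the first and last page, `-layout` keeps the physical layout, and `-enc` selects the output encoding. The TypeScript sketch below shows one way a server could translate tool arguments into a pdftotext argument list with basic validation. It is an assumption about the approach, not pdftotext-mcp's actual code, and `buildPdftotextArgs` is a hypothetical name.

```typescript
// Tool arguments as documented in the Parameters table.
interface ReadPdfTextArgs {
  path: string;
  page?: number;
  layout?: boolean;
  encoding?: string;
}

const SUPPORTED_ENCODINGS = new Set(["UTF-8", "Latin1", "ASCII"]);

// Hypothetical mapping onto real pdftotext options:
//   -f N -l N   first and last page (both N extracts a single page)
//   -layout     maintain the original physical layout
//   -enc NAME   output text encoding
// A trailing "-" as the output file sends the text to stdout.
function buildPdftotextArgs({ path, page, layout = false, encoding = "UTF-8" }: ReadPdfTextArgs): string[] {
  if (page !== undefined && (!Number.isInteger(page) || page < 1)) {
    throw new RangeError(`page must be a positive integer (1-based), got ${page}`);
  }
  if (!SUPPORTED_ENCODINGS.has(encoding)) {
    throw new RangeError(`unsupported encoding: ${encoding}`);
  }
  const args: string[] = [];
  if (page !== undefined) args.push("-f", String(page), "-l", String(page));
  if (layout) args.push("-layout");
  // Assumption: the encoding name is passed through unchanged.
  args.push("-enc", encoding, path, "-");
  return args;
}

console.log(buildPdftotextArgs({ path: "./document.pdf", page: 2 }));
```

Passing the array to `child_process.execFile("pdftotext", args)` rather than building a shell string keeps paths containing spaces or shell metacharacters from being reinterpreted by a shell. Run `pdftotext -listenc` to see the encoding names your poppler build accepts, since the exact spelling (for example for ASCII) can differ from the names listed above.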

Error Types

  • FILE_NOT_FOUND - PDF file doesn't exist

  • PERMISSION_DENIED - Cannot read the file

  • INVALID_PDF - File is not a valid PDF

  • PDFTOTEXT_ERROR - pdftotext utility error

  • UNKNOWN_ERROR - Unexpected error
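One way to produce these error types is to map Node.js filesystem error codes onto them. The TypeScript sketch below is hypothetical: `ENOENT`, `EACCES`, and `EPERM` are real Node.js error codes, but `INVALID_PDF` and `PDFTOTEXT_FAILED` are made-up markers that a server's own checks might attach, and pdftotext-mcp may classify errors differently.

```typescript
import { readFileSync } from "node:fs";

type ErrorType =
  | "FILE_NOT_FOUND"
  | "PERMISSION_DENIED"
  | "INVALID_PDF"
  | "PDFTOTEXT_ERROR"
  | "UNKNOWN_ERROR";

// Hypothetical classifier keyed on the `code` property Node.js sets on
// filesystem errors. INVALID_PDF and PDFTOTEXT_FAILED are invented codes
// that the server's own checks could attach to errors they throw.
function classifyError(err: unknown): ErrorType {
  const code = typeof err === "object" && err !== null ? (err as { code?: unknown }).code : undefined;
  if (typeof code !== "string") return "UNKNOWN_ERROR";
  switch (code) {
    case "ENOENT":
      return "FILE_NOT_FOUND";
    case "EACCES":
    case "EPERM":
      return "PERMISSION_DENIED";
    case "INVALID_PDF":
      return "INVALID_PDF";
    case "PDFTOTEXT_FAILED":
      return "PDFTOTEXT_ERROR";
    default:
      return "UNKNOWN_ERROR";
  }
}

// Usage: a missing file surfaces from fs as ENOENT.
try {
  readFileSync("./nonexistent.pdf");
} catch (err) {
  console.log(classifyError(err));
}
```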

🔧 Troubleshooting

"pdftotext is not available"

Solution: Install poppler (packaged as poppler-utils on Ubuntu/Debian; see Prerequisites)

"File not found"

Solutions:

  • Use absolute paths: /home/user/document.pdf

  • Check that the file exists: ls -la /path/to/file.pdf

  • Verify the MCP server's working directory when using relative paths
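Relative paths cause many "File not found" errors because they are resolved against a working directory that may not be the one you expect. The TypeScript sketch below assumes the server resolves paths the way Node's `path.resolve` does, against its own process working directory (the absolute `path` in the success response is consistent with this, but the README does not confirm it); `resolvePdfPath` is a hypothetical helper.

```typescript
import { existsSync } from "node:fs";
import { resolve } from "node:path";

// Hypothetical helper. path.resolve() joins `input` onto `baseDir` unless
// `input` is already absolute, in which case `baseDir` is ignored.
function resolvePdfPath(input: string, baseDir: string = process.cwd()): string {
  return resolve(baseDir, input);
}

// Shows where "./document.pdf" points from this process's working directory.
const candidate = resolvePdfPath("./document.pdf");
console.log(`${candidate}: ${existsSync(candidate) ? "exists" : "not found"}`);
```

The same relative path can therefore name different files depending on where the MCP client launched the server, which is why absolute paths are the safer choice.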

"Permission denied"

Solutions:

  • Check file permissions: chmod 644 document.pdf

  • Ensure directory is readable: chmod 755 /path/to/directory/

"File is not a valid PDF"

Solutions:

  • Verify that the file is actually a PDF: file document.pdf

  • Check for file corruption

  • Try a different PDF file
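Well-formed PDF files begin with the header signature `%PDF-` followed by a version number, so reading the first five bytes is a cheap sanity check. The TypeScript sketch below is a heuristic and not necessarily how pdftotext-mcp detects INVALID_PDF: some PDF readers tolerate a few bytes of junk before the header, which this strict check would reject. The helper names are hypothetical.

```typescript
import { closeSync, openSync, readSync } from "node:fs";

const PDF_MAGIC = "%PDF-";

// True if the bytes start with the PDF header signature "%PDF-".
function hasPdfHeader(bytes: Buffer): boolean {
  return bytes.subarray(0, PDF_MAGIC.length).toString("latin1") === PDF_MAGIC;
}

// Reads only the first five bytes instead of loading the whole file.
// Throws (e.g. ENOENT) if the file cannot be opened.
function fileLooksLikePdf(filePath: string): boolean {
  const fd = openSync(filePath, "r");
  try {
    const head = Buffer.alloc(PDF_MAGIC.length);
    const bytesRead = readSync(fd, head, 0, head.length, 0);
    return hasPdfHeader(head.subarray(0, bytesRead));
  } finally {
    closeSync(fd);
  }
}

console.log(hasPdfHeader(Buffer.from("%PDF-1.7\n")));
```

For example, `fileLooksLikePdf` returns false for an HTML error page that was saved with a .pdf extension, since it starts with `<` rather than `%PDF-`.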

MCP Connection Issues

Solutions:

  • Restart your MCP client completely

  • Check the JSON syntax of your client's configuration file

  • Verify that pdftotext-mcp is on your PATH (or use the npx configuration)

  • Check MCP client logs for detailed errors

🧪 Testing

```bash
# Run tests
npm test

# Run tests with watch mode
npm run test:watch

# Run linter
npm run lint
```

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

```bash
git clone https://github.com/jpwebb/pdftotext-mcp.git
cd pdftotext-mcp
npm install
```

Running Locally

```bash
npm start
```

Code Style

This project uses ESLint. Run npm run lint to check code style.

📄 License

MIT - see LICENSE file for details.

🙏 Acknowledgments

🔗 Related


Made for the MCP community
