PDFtotext MCP Server

by jpwebb

A reliable Model Context Protocol (MCP) server for PDF text extraction using the proven pdftotext utility from poppler-utils.

🚀 Why This Server?

Unlike other PDF MCP servers that suffer from logging interference, complex dependencies, and reliability issues, pdftotext-mcp is:

  • ✅ Actually works - Clean JSON-RPC communication without stdout pollution

  • ✅ Reliable - Built on mature pdftotext from poppler-utils (used by millions)

  • ✅ Lightweight - Minimal dependencies, maximum compatibility

  • ✅ Production tested - Successfully tested with Claude Desktop and other MCP clients

  • ✅ Feature complete - Page-specific extraction, layout preservation, encoding options

  • ✅ Error handling - Comprehensive validation and helpful error messages

📋 Features

  • 📄 Extract text from entire PDF documents or specific pages

  • 🎨 Preserve original layout formatting (optional)

  • 🔤 Multiple text encoding support (UTF-8, Latin1, ASCII)

  • 📊 Comprehensive metadata in responses (word count, file info, etc.)

  • 🛡️ File validation and security checks

  • ⚡ Fast processing with configurable timeouts

  • 🔍 Detailed error reporting with troubleshooting hints

🔧 Prerequisites

You must have pdftotext installed on your system:

Ubuntu/Debian

sudo apt update
sudo apt install poppler-utils

macOS

brew install poppler

Windows

# Using Chocolatey
choco install poppler

# Using Scoop
scoop install poppler

Verify Installation

pdftotext -v
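
The same check can be scripted, for example before launching an MCP client. The following is a minimal sketch, not part of pdftotext-mcp itself; the helper names find_pdftotext and pdftotext_version are hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def find_pdftotext(binary: str = "pdftotext") -> Optional[str]:
    """Return the absolute path of the binary if it is on PATH, else None."""
    return shutil.which(binary)


def pdftotext_version(binary_path: str) -> str:
    """Run `pdftotext -v` and return everything it prints.

    stdout and stderr are both captured because version banners are
    commonly written to stderr.
    """
    result = subprocess.run(
        [binary_path, "-v"], capture_output=True, text=True, check=False
    )
    return (result.stdout + result.stderr).strip()
```

If find_pdftotext() returns None, install poppler using the commands above before configuring the server.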

📦 Installation

Option 1: Global Installation

npm install -g pdftotext-mcp

Option 2: Use with npx (No Installation)

npx pdftotext-mcp

Option 3: Local Development

git clone https://github.com/jpwebb/pdftotext-mcp.git
cd pdftotext-mcp
npm install
npm start

⚙️ Configuration

Add to your MCP client configuration:

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "pdftotext": {
      "command": "pdftotext-mcp"
    }
  }
}

Or with npx:

{
  "mcpServers": {
    "pdftotext": {
      "command": "npx",
      "args": ["pdftotext-mcp"]
    }
  }
}
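
If the configuration file already lists other servers, the entry has to be merged in rather than pasted over the file. Here is a small sketch of that merge in Python. The function add_server is a hypothetical helper, and reading or writing the actual file is left out because its location depends on your platform and client.

```python
import copy
from typing import Any, Dict, List, Optional


def add_server(
    config: Dict[str, Any],
    name: str,
    command: str,
    args: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Return a copy of `config` with one entry added under "mcpServers".

    Existing servers are preserved; an entry with the same name is replaced.
    """
    updated = copy.deepcopy(config)
    entry: Dict[str, Any] = {"command": command}
    if args:
        entry["args"] = list(args)
    updated.setdefault("mcpServers", {})[name] = entry
    return updated
```

Calling add_server(existing, "pdftotext", "npx", ["pdftotext-mcp"]) produces the npx variant shown above while keeping any other servers in place.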

Other MCP Clients

The server works with any MCP-compatible client. Use pdftotext-mcp as the command.

🎯 Usage

The server exposes a single tool, read_pdf_text.

Basic Usage

Extract entire document

{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf"
  }
}

Extract specific page

{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "page": 2
  }
}

Preserve layout formatting

{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "layout": true
  }
}

Custom encoding

{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "encoding": "Latin1"
  }
}
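
The examples above show the tool name and arguments in shorthand. MCP clients normally wrap these in a JSON-RPC 2.0 request using the tools/call method. The sketch below builds such a request. The helper make_tool_call is hypothetical, and your MCP client library will usually do this for you.

```python
import json
from typing import Any, Dict


def make_tool_call(request_id: int, name: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for an MCP server."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Request the second page of a document with layout preserved.
request = make_tool_call(
    1, "read_pdf_text", {"path": "./document.pdf", "page": 2, "layout": True}
)
```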

Response Format

Success Response

{
  "success": true,
  "file": "document.pdf",
  "path": "/absolute/path/to/document.pdf",
  "extractedText": "Full text content...",
  "pageSpecific": "all",
  "layoutPreserved": false,
  "encoding": "UTF-8",
  "fileSize": 1048576,
  "lastModified": "2024-01-15T10:30:00.000Z",
  "extractedAt": "2024-01-15T10:35:00.000Z",
  "textLength": 5234,
  "wordCount": 892
}

Error Response

{
  "success": false,
  "error": "File not found: ./nonexistent.pdf",
  "errorType": "FILE_NOT_FOUND",
  "file": "./nonexistent.pdf",
  "timestamp": "2024-01-15T10:35:00.000Z"
}
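
Because both response shapes carry a success flag, a caller can branch on it before touching any other field. The following sketch assumes the JSON payload has already been obtained as a string from your MCP client. The names extract_text and PdfExtractionError are hypothetical.

```python
import json


class PdfExtractionError(Exception):
    """Raised when the tool reports success = false."""

    def __init__(self, error_type: str, message: str) -> None:
        super().__init__(f"{error_type}: {message}")
        self.error_type = error_type


def extract_text(response_json: str) -> str:
    """Return extractedText from a success response, or raise on an error response."""
    payload = json.loads(response_json)
    if payload.get("success"):
        return payload["extractedText"]
    raise PdfExtractionError(
        payload.get("errorType", "UNKNOWN_ERROR"),
        payload.get("error", "no error message provided"),
    )
```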

📚 API Reference

Tool: read_pdf_text

Extracts text content from PDF files using pdftotext.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| path | string | ✅ | - | Path to PDF file (relative or absolute) |
| page | number | ❌ | all pages | Specific page to extract (1-based) |
| layout | boolean | ❌ | false | Preserve original text layout |
| encoding | string | ❌ | "UTF-8" | Output text encoding |

Supported Encodings

  • UTF-8 (default)

  • Latin1

  • ASCII
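
To see how these parameters relate to the underlying utility, the sketch below validates the arguments and builds a pdftotext command line. It is an illustration only: the README does not show the server's actual mapping, and build_pdftotext_args is a hypothetical helper. The flags -f, -l, -layout and -enc are standard pdftotext options, and a trailing "-" sends the text to stdout. Encoding names are passed through unchanged, which is an assumption; pdftotext -listenc shows the names your poppler build accepts.

```python
from typing import Any, Dict, List

# Encodings listed in this README; passed to -enc verbatim (assumption).
SUPPORTED_ENCODINGS = ("UTF-8", "Latin1", "ASCII")


def build_pdftotext_args(arguments: Dict[str, Any]) -> List[str]:
    """Validate read_pdf_text-style arguments and return a pdftotext argv list."""
    path = arguments.get("path")
    if not isinstance(path, str) or not path:
        raise ValueError("'path' is required and must be a non-empty string")

    cmd = ["pdftotext"]

    page = arguments.get("page")
    if page is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(page, bool) or not isinstance(page, int) or page < 1:
            raise ValueError("'page' must be an integer >= 1")
        # First page and last page set to the same value selects one page.
        cmd += ["-f", str(page), "-l", str(page)]

    if arguments.get("layout", False):
        cmd.append("-layout")

    encoding = arguments.get("encoding", "UTF-8")
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    cmd += ["-enc", encoding]

    # Input file, then "-" to write the extracted text to stdout.
    cmd += [path, "-"]
    return cmd
```

For example, {"path": "./document.pdf", "page": 2} yields a command that extracts only page 2 as UTF-8 text.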

Error Types

  • FILE_NOT_FOUND - PDF file doesn't exist

  • PERMISSION_DENIED - Cannot read the file

  • INVALID_PDF - File is not a valid PDF

  • PDFTOTEXT_ERROR - pdftotext utility error

  • UNKNOWN_ERROR - Unexpected error
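
A client can use the errorType field to show a targeted hint instead of the raw message. The lookup below is a sketch: the hint wording is condensed from the Troubleshooting section of this README, the pairing of PDFTOTEXT_ERROR with the installation check is an assumption, and hint_for is a hypothetical helper.

```python
from typing import Dict

# Hints condensed from the Troubleshooting section; the mapping is illustrative.
ERROR_HINTS: Dict[str, str] = {
    "FILE_NOT_FOUND": "Use an absolute path and confirm the file exists.",
    "PERMISSION_DENIED": "Make sure the file and its directory are readable.",
    "INVALID_PDF": "Verify the file is really a PDF and is not corrupted.",
    "PDFTOTEXT_ERROR": "Confirm poppler is installed by running: pdftotext -v",
}

DEFAULT_HINT = "Check the MCP client logs for detailed errors."


def hint_for(error_type: str) -> str:
    """Return a troubleshooting hint for an errorType, with a generic fallback."""
    return ERROR_HINTS.get(error_type, DEFAULT_HINT)
```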

🔧 Troubleshooting

"pdftotext is not available"

Solution: Install poppler-utils (see Prerequisites)

"File not found"

Solutions:

  • Use absolute paths: /home/user/document.pdf

  • Check file exists: ls -la /path/to/file.pdf

  • Verify MCP server working directory
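
Relative paths are resolved against the server's working directory, which may differ from the client's. One way to avoid that mismatch is to convert the path to an absolute one on the client side before calling the tool. This is a minimal sketch, and to_absolute is a hypothetical helper.

```python
from pathlib import Path


def to_absolute(path: str) -> str:
    """Expand "~" and resolve `path` against the current working directory.

    The file does not need to exist for the path to be resolved.
    """
    return str(Path(path).expanduser().resolve())
```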

"Permission denied"

Solutions:

  • Check file permissions with ls -l document.pdf and, if needed, make the file readable: chmod 644 document.pdf

  • Ensure directory is readable: chmod 755 /path/to/directory/

"File is not a valid PDF"

Solutions:

  • Verify file is actually a PDF: file document.pdf

  • Check for file corruption

  • Try with a different PDF file

MCP Connection Issues

Solutions:

  • Restart your MCP client completely

  • Check configuration syntax in config file

  • Verify pdftotext-mcp is accessible in PATH

  • Check MCP client logs for detailed errors

🧪 Testing

# Run tests
npm test

# Run tests with watch mode
npm run test:watch

# Run linter
npm run lint

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

git clone https://github.com/jpwebb/pdftotext-mcp.git
cd pdftotext-mcp
npm install

Running Locally

npm start

Code Style

This project uses ESLint. Run npm run lint to check code style.

📄 License

MIT - see LICENSE file for details.

🙏 Acknowledgments


Made for the MCP community