PDF Extraction MCP Server (Claude Code Fork)
MCP server to extract contents from PDF files, with fixes for Claude Code CLI installation.
This fork includes critical fixes for installing and running the server with Claude Code (the CLI version).
What's Different in This Fork
- Added __main__.py - Enables the package to be run as a module with python -m pdf_extraction
- Claude Code-specific instructions - Clear installation steps that work with the Claude Code CLI
- Tested installation process - Verified working with the claude mcp add command
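The mechanism behind the __main__.py change is standard Python: python -m <package> executes <package>/__main__.py, and fails if that file is missing. The sketch below demonstrates this with a throwaway package. The package name demo_pdf_pkg and its main() function are hypothetical stand-ins; the real pdf_extraction entry point may be structured differently.

```python
"""Show how `python -m <package>` dispatches to <package>/__main__.py."""
import os
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path


def run_package_as_module(pkg_name: str = "demo_pdf_pkg") -> str:
    """Build a temporary package (hypothetical stand-in) and run it with -m."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / pkg_name
        pkg.mkdir()
        # __init__.py exposes the entry point (stand-in for the server's main()).
        (pkg / "__init__.py").write_text(textwrap.dedent("""
            def main():
                print("server main() called")
        """))
        # __main__.py is what `python -m pkg` executes; without it Python
        # reports that the package cannot be directly executed.
        (pkg / "__main__.py").write_text(textwrap.dedent(f"""
            from {pkg_name} import main

            if __name__ == "__main__":
                main()
        """))
        env = dict(os.environ, PYTHONPATH=tmp)  # make the temp package importable
        result = subprocess.run(
            [sys.executable, "-m", pkg_name],
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_package_as_module())
```

Deleting the __main__.py file in this sketch makes the subprocess fail, which mirrors why running the original package with python -m did not work.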
Components
Tools
The server implements one tool:
- extract-pdf-contents: Extract contents from a local PDF file
  - Takes pdf_path as a required string argument (local file path)
  - Takes pages as an optional string argument (comma-separated page numbers; supports negative indexing, such as -1 for the last page)
  - Supports both PDF text extraction and OCR for scanned documents
Installation for Claude Code CLI
Prerequisites
- Python 3.11 or higher
- pip or conda
- Claude Code CLI installed (the claude command)
Step 1: Clone and Install
Step 2: Find the Installed Command
Step 3: Add to Claude Code
Step 4: Use in Claude
Usage Example
Once connected, you can ask Claude to extract PDF contents:
Troubleshooting
Server Not Connecting
- Make sure you started a NEW Claude session after adding the server
- Verify the command path is correct: ls -la $(which pdf-extraction)
- Test the command directly (it should appear to hang, because it is waiting for MCP messages on standard input): pdf-extraction
Module Not Found Errors
If you get Python import errors:
- Make sure you're using the same Python environment where you installed the package
- Try using the full Python path: claude mcp add pdf-extraction /path/to/python -m pdf_extraction
Installation Issues
If pip install -e . fails:
- Make sure you have Python 3.11+: python --version
- Try creating a fresh virtual environment:
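Both checks can also be done with the standard library. This sketch uses sys.version_info for the version requirement and the venv module to build a clean environment; python_is_supported and create_fresh_env are hypothetical helpers, and the interpreter locations follow venv's usual bin/ (POSIX) and Scripts\ (Windows) layout.

```python
import sys
import venv
from pathlib import Path


def python_is_supported(minimum: tuple[int, int] = (3, 11)) -> bool:
    """Programmatic version of checking python --version against 3.11."""
    return sys.version_info[:2] >= minimum


def create_fresh_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create (or wipe and recreate) a virtual environment; return its interpreter."""
    venv.create(env_dir, clear=True, with_pip=with_pip)
    # Windows puts the interpreter under Scripts\, POSIX under bin/.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return Path(env_dir) / bin_dir / exe
```

After creating the environment, run that interpreter with -m pip install -e . from the repository root, and use the same interpreter path when adding the server to Claude Code.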
For Claude Desktop Users
This fork is specifically for Claude Code CLI. If you're using Claude Desktop (the GUI app), please refer to the original repository for installation instructions.
Dependencies
- mcp>=1.2.0
- pypdf2>=3.0.1
- pytesseract>=0.3.10 (for OCR support)
- Pillow>=10.0.0
- pydantic>=2.10.1,<3.0.0
- pymupdf>=1.24.0
Contributing
Contributions are welcome! The main change in this fork is the addition of __main__.py to make the package runnable as a module.
License
Same as the original repository.
Credits
This is a local-only server: it can only run on the client's local machine because it depends on local resources.
An MCP server that provides a tool to extract text content from local PDF files, supporting both standard PDF reading and OCR capabilities with optional page selection.