FastMCP PDF Processing Server
An MCP server built with FastMCP (STDIO transport) offering PDF utilities: text extraction, metadata, merge/split/rotate, and PDF↔image conversion.
Spanish version: README.es.md
Quick Start (Windows PowerShell)
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
Copy-Item .env.example .env
python -m fastmcp_pdf_server
If installed as a package, you may also run:
fastmcp-pdf-server
Quick Start (Linux/macOS)
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
python -m fastmcp_pdf_server
MCP Integration
Transport: STDIO. Do not print to stdout/stderr; logs go to file.
Server name/version: from config (server_name, server_version).
Tools are registered using @app.tool() and return structured outputs with a meta block containing operation_id and execution_ms.
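The meta block described above could be attached with a small decorator. This is only a sketch of the idea, not the server's actual code: with_meta and server_info_sketch are hypothetical names, and the only details taken from this README are the operation_id (hex) and execution_ms (int) keys.

```python
import functools
import time
import uuid


def with_meta(operation):
    """Wrap a dict-returning function so its result carries a meta block."""

    @functools.wraps(operation)
    def wrapper(*args, **kwargs):
        started = time.perf_counter()
        result = dict(operation(*args, **kwargs))
        # operation_id is a random hex string; execution_ms is whole milliseconds.
        result["meta"] = {
            "operation_id": uuid.uuid4().hex,
            "execution_ms": int((time.perf_counter() - started) * 1000),
        }
        return result

    return wrapper


@with_meta
def server_info_sketch():
    # Hypothetical stand-in for a real tool body.
    return {"name": "pdf-processor-server", "version": "1.0.0"}


print(server_info_sketch())
```

Keeping the timing and ID generation in one wrapper means each tool body only has to return its own payload.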
Claude Desktop config example
Add to claude_desktop_config.json (Update with your own File System Path):
{
  "mcpServers": {
    "pdf-processor-server": {
      "command": "D:\\Github Projects\\mcp_pdf_server\\.venv\\Scripts\\python.exe",
      "args": [
        "-m",
        "fastmcp_pdf_server"
      ],
      "env": {
        "MAX_FILE_SIZE_MB": "50",
        "TEMP_DIR": "D:\\Github Projects\\mcp_pdf_server\\temp_files",
        "LOG_LEVEL": "DEBUG",
        "LOG_FILE_PATH": "D:\\Github Projects\\mcp_pdf_server\\logs\\fastmcp_pdf_server.log",
        "SERVER_NAME": "pdf-processor-server",
        "SERVER_VERSION": "1.0.0",
        "PATH": "%PATH%;C:\\poppler-25.07.0\\Library\\bin"
      }
    }
  }
}
Note: If you update dependencies (e.g., requests was added for URL uploads), reinstall with:
pip install -r requirements.txt
Claude Desktop config example (Linux)
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "pdf-processor-server": {
      "command": "python3",
      "args": [
        "-m",
        "fastmcp_pdf_server"
      ],
      "env": {
        "MAX_FILE_SIZE_MB": "50",
        "TEMP_DIR": "/home/you/dev/mcp_pdf_server/temp_files",
        "LOG_LEVEL": "DEBUG"
      }
    }
  }
}
Claude Desktop config example (macOS)
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "pdf-processor-server": {
      "command": "python3",
      "args": [
        "-m",
        "fastmcp_pdf_server"
      ],
      "env": {
        "MAX_FILE_SIZE_MB": "50",
        "TEMP_DIR": "/Users/you/dev/mcp_pdf_server/temp_files",
        "LOG_LEVEL": "DEBUG"
      }
    }
  }
}
Programmatic usage (Python)
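import asyncio
from fastmcp import Client
from fastmcp.client.transports import StdioTransport

async def main():
    # Assumes the FastMCP 2.x client API: the server is launched as a
    # subprocess and spoken to over STDIO.
    transport = StdioTransport(command="python", args=["-m", "fastmcp_pdf_server"])
    # The context manager opens the session and closes it on exit.
    async with Client(transport) as client:
        info = await client.call_tool("server_info")
        print(info)

asyncio.run(main())
Exposed Tools (API)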
All tools return structured data; many responses include meta.operation_id and meta.execution_ms.
Some tools return lists (arrays). These are marked with FastMCP's x-fastmcp-wrap-result, so clients receive { "result": [...] } at the RPC layer.
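A client that reads the raw RPC payload may want to normalize the wrapped shape back into a plain list. The helper below is a hypothetical sketch (unwrap_result is not part of this server or of FastMCP); it assumes the wrapper is a dict whose only key is result, as described above.

```python
def unwrap_result(payload):
    """Return the inner value of a {"result": ...} wrapper, else the payload."""
    if isinstance(payload, dict) and set(payload) == {"result"}:
        return payload["result"]
    return payload


pages = unwrap_result({"result": [{"page_number": 1, "text": "Hello"}]})
info = unwrap_result({"name": "pdf-processor-server", "version": "1.0.0"})
print(pages)
print(info)
```

Note that a tool which legitimately returns a dict with a single result key would also be unwrapped, so apply this only to tools documented as returning lists.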
MCP Tools Reference
Each tool lists: purpose, inputs, outputs, behavior, examples, and notes about errors and usage.
Utilities / Server
server_info()
Purpose: Return basic server info and a non-secret configuration snapshot.
Inputs: None
Returns: dict with keys:
name (str): server name from settings
version (str): server version from settings
max_file_size_mb (int): maximum configured file size in megabytes
temp_dir (str): absolute path to the temporary files directory
log_file (str): absolute path to the log file
meta (dict): operation metadata: operation_id (hex), execution_ms (int)
Errors: none expected; if configuration is missing, the underlying access may raise exceptions.
Example:
Call:
{ "name": "server_info" }
Response:
{ "name": "mcp-pdf", "version": "1.0.0", "meta": { ... } }
list_temp_resources(content_type: Optional[str] = None, max_items: Optional[int] = 100) -> list[dict]
Purpose: List files currently in the server temp directory, optionally filtered by content type.
Inputs:
content_type (optional str): MIME filter; supported examples: application/pdf, image/png, image/jpeg.
max_items (optional int): maximum number of entries to return (default 100). If set to null or 0, defaults to 100.
Returns: list of resource dicts, each with:
path (str): absolute path to the temp file
size (int): size in bytes
created (str): creation timestamp (ISO or file-manager-specific format)
content_type (str): MIME type of the resource
filename (str): filename only
extension (str): lowercased file extension (e.g. .pdf)
directory (str): parent directory of the file
Behavior: Cleans up expired temp files before listing. The result list is sliced to max_items.
Errors: Raises ValueError if internal listing fails.
Example call:
{ "name": "list_temp_resources", "arguments": { "content_type": "application/pdf" } }
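The filtering and slicing behavior can be approximated with the standard library. This is a sketch under assumptions, not the server's file manager: list_resources is a hypothetical name, MIME types are guessed from the file extension, and the file's modification time stands in for the created timestamp.

```python
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def list_resources(temp_dir, content_type=None, max_items=100):
    """List files in temp_dir, optionally filtered by guessed MIME type."""
    limit = max_items or 100  # None or 0 falls back to 100
    entries = []
    for path in sorted(Path(temp_dir).iterdir()):
        if not path.is_file():
            continue
        guessed, _ = mimetypes.guess_type(path.name)
        if content_type and guessed != content_type:
            continue
        stat = path.stat()
        resolved = path.resolve()
        entries.append({
            "path": str(resolved),
            "size": stat.st_size,
            # Assumption: modification time used in place of creation time.
            "created": datetime.fromtimestamp(stat.st_mtime, timezone.utc).isoformat(),
            "content_type": guessed or "application/octet-stream",
            "filename": path.name,
            "extension": path.suffix.lower(),
            "directory": str(resolved.parent),
        })
    return entries[:limit]
```

Calling list_resources("temp_files", content_type="application/pdf") would return only the PDF entries, capped at 100.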
get_pdf_info(file_path: str) -> dict
Purpose: Read a PDF's header and basic info without extracting pages or text.
Inputs:
file_path (str): path to an existing file on disk (absolute or relative). Must be accessible to the server.
Returns: dict:
pages (int): number of pages
size (int): file size in bytes
version (str|None): PDF header/version info (if available)
encrypted (bool): whether the PDF is encrypted
meta (dict): operation_id, execution_ms
Errors:
Raises ValueError if the file is not found.
May raise other errors if the file is not a PDF or is corrupted.
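To illustrate where a version value like this can come from: a PDF file begins with a header of the form %PDF-1.7. The sketch below reads it with the standard library only. read_pdf_version is a hypothetical helper, not the server's implementation, which may rely on a PDF library instead.

```python
import re
from pathlib import Path

# Matches the "%PDF-<major>.<minor>" marker at the start of a PDF file.
_HEADER = re.compile(rb"%PDF-(\d+\.\d+)")


def read_pdf_version(file_path):
    """Return the version from the PDF header, or None if no header is found."""
    path = Path(file_path)
    if not path.is_file():
        raise ValueError(f"File not found: {file_path}")
    with path.open("rb") as handle:
        head = handle.read(1024)  # the header sits at the very start of the file
    match = _HEADER.search(head)
    return match.group(1).decode("ascii") if match else None
```

Returning None for a missing header mirrors the str|None type documented for version.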
get_resource_base64(file_path: str) -> dict
Purpose: Return the base64-encoded contents of a file inside the server temp directory.
Inputs:
file_path (str): path; must be inside the configured temp directory. The function enforces this.
Returns: dict:
path (str): resolved path inside temp
base64 (str): Base64-encoded content of the file
meta (dict): operation metadata
Errors:
Raises ValueError if the path is outside temp or the file is missing.
Notes: Use this to fetch content for download via MCP where direct file transfers aren't available.
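The containment check and the encoding step might look roughly like this. It is a sketch only: the two-argument ensure_within_temp signature and the read_as_base64 name are assumptions made for illustration, and Path.is_relative_to requires Python 3.9 or newer.

```python
import base64
from pathlib import Path


def ensure_within_temp(path, temp_dir):
    """Resolve path and reject anything that escapes temp_dir."""
    root = Path(temp_dir).resolve()
    resolved = Path(path).resolve()
    if not resolved.is_relative_to(root):
        raise ValueError(f"Path is outside the temp directory: {path}")
    return resolved


def read_as_base64(path, temp_dir):
    """Return the resolved path and Base64 content of a file inside temp_dir."""
    resolved = ensure_within_temp(path, temp_dir)
    if not resolved.is_file():
        raise ValueError(f"File not found: {path}")
    encoded = base64.b64encode(resolved.read_bytes()).decode("ascii")
    return {"path": str(resolved), "base64": encoded}
```

Resolving both paths before comparing them means that ".." segments and symlinks cannot be used to step outside the temp directory.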
Uploads
upload_file(file: Any, filename: Optional[str] = None) -> dict
Purpose: Persist an uploaded file into the server temp directory.
Inputs:
file (Any): Accepts:
a full path string to a local file
a short filename that refers to a file already stored in temp
bytes or a file-like object
a dict containing base64 and filename (will be saved to temp)
filename (Optional[str]): optional filename hint used when saving raw bytes.
Returns: dict:
path (str): absolute path to the saved file
filename (str): saved filename
directory (str): directory containing the file
meta (dict): operation metadata
Errors:
Raises ValueError with a descriptive message on failure (network, decoding, IO).
Example:
To upload base64: call upload_file with file={ "base64": "<...>", "filename": "my.pdf" }.
upload_file_base64(base64: str, filename: str) -> dict
Purpose: Upload raw Base64 content and persist it to temp storage.
Inputs:
base64 (str): Base64 string
filename (str): filename to use when saving
Returns: dict:
path, filename, directory, size (int), meta
Errors: Raises ValueError on decoding or write errors.
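On the client side, the request for this tool can be assembled from a local file. The helper below is a hypothetical convenience (build_base64_upload is not part of the server); it only produces the name/arguments shape used in this README's request examples.

```python
import base64
from pathlib import Path


def build_base64_upload(file_path):
    """Build an upload_file_base64 request for a local file."""
    path = Path(file_path)
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return {
        "name": "upload_file_base64",
        "arguments": {"base64": encoded, "filename": path.name},
    }
```

Base64 grows the payload by roughly a third, so keep MAX_FILE_SIZE_MB in mind for large documents.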
upload_file_url(url: str, filename: Optional[str] = None) -> dict
Purpose: Download a remote file (HTTP/HTTPS) and save it to temp storage.
Inputs:
url (str): direct URL to the file
filename (Optional[str]): optional override filename
Returns: dict with path, filename, directory, meta.
Notes: Requires the requests package to be available in the environment.
Text Extraction
extract_text(file: Any, encoding: Optional[str] = "utf-8") -> dict
Purpose: Extract all text from a PDF and return summary metrics.
Inputs:
file (Any): same resolver rules as upload_file (path, temp filename, bytes, base64 dict).
encoding (str|None): encoding used when returning text (default utf-8).
Returns: dict:
text (str): full extracted text
page_count (int): number of pages processed
char_count (int): number of characters in text
meta (dict): includes resolved_path pointing to the saved temp file
Errors:
Raises ValueError with a hint explaining how to provide the file if extraction fails.
Example usage:
Upload a file with upload_file, then call extract_text with the returned path.
extract_text_by_page(file: Any, pages: Optional[List[int]] = None, page_range: Optional[str] = None, encoding: Optional[str] = "utf-8") -> list[dict]
Purpose: Extract text from specific pages or a page range.
Inputs:
file (Any): resolver rules as above
pages (Optional[List[int]]): list of 1-based page numbers to extract (e.g., [1,3,5]).
page_range (Optional[str]): range expression like "1-3,5" (parsed by the parser in utils.parsers).
encoding (Optional[str]): text encoding
Returns: list of page result dicts; each dict typically contains:
page_number (int)
text (str)
char_count (int)
Behavior: If both pages and page_range are provided, pages takes precedence. The tool returns a list directly (the framework wraps list results).
Errors: Raises ValueError on invalid pages or extraction failures.
extract_metadata(file: Any) -> dict
Purpose: Extract detailed PDF metadata (author, title, producer, creation/mod dates, custom metadata, etc.).
Inputs:
file: same as above.
Returns: dict containing the metadata keys found in the PDF plus meta operation info.
Conversion
pdf_to_images(file_path: str, output_dir: str, format: str = "png", dpi: int = 150, pages: Optional[List[int]] = None) -> list[dict]
Purpose: Convert one or more PDF pages to image files.
Inputs:
file_path (str): path to the PDF on disk (absolute or temp path).
output_dir (str): directory where generated images will be written.
format (str): image format, e.g., png, jpeg.
dpi (int): resolution for conversion (default 150).
pages (Optional[List[int]]): list of 1-based pages to render; None for all pages.
Returns: list of dicts, one per generated image:
path (str), page_number (int), size (int), format (str)
Notes: Implementation uses pdf2image and PIL; ensure dependencies and Poppler are installed on the host.
images_to_pdf(image_paths: List[str], output_path: str, page_size: str = "A4", orientation: str = "portrait") -> dict
Purpose: Create a PDF document from multiple images.
Inputs:
image_paths (List[str]): list of image file paths in order
output_path (str): path for the generated PDF
page_size (str): e.g., A4, Letter (processor maps to physical sizes)
orientation (str): portrait or landscape
Returns: dict with success info and meta including operation timing.
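The mapping from page_size and orientation to physical dimensions could be sketched as follows. The table and the page_dimensions function are illustrative assumptions, not the processor's code; the sizes are the standard A4 (210 x 297 mm) and US Letter (8.5 x 11 in) dimensions expressed in PDF points (1/72 inch).

```python
# Page sizes in PDF points, portrait orientation (width, height).
PAGE_SIZES_PT = {
    "A4": (595.28, 841.89),
    "LETTER": (612.0, 792.0),
}


def page_dimensions(page_size="A4", orientation="portrait"):
    """Return (width, height) in points for a named page size and orientation."""
    key = page_size.upper()
    if key not in PAGE_SIZES_PT:
        raise ValueError(f"Unsupported page size: {page_size}")
    width, height = PAGE_SIZES_PT[key]
    if orientation == "landscape":
        return (height, width)  # landscape swaps the two sides
    if orientation != "portrait":
        raise ValueError(f"Unsupported orientation: {orientation}")
    return (width, height)


print(page_dimensions("Letter", "landscape"))
```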
PDF Manipulation
merge_pdfs(input_files: List[str], output_path: str) -> dict
Purpose: Merge multiple PDF files into a single PDF.
Inputs:
input_files (List[str]): file paths
output_path (str): destination path
Returns: dict with details (e.g., path) and meta.
split_pdf(file_path: str, split_ranges: List[Dict[str, Any]]) -> list[dict]
Purpose: Split a PDF into multiple files by page ranges.
Inputs:
file_path (str): source PDF
split_ranges (List[Dict]): each dict should describe start and end pages and an optional filename.
Returns: list of generated files info dicts.
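Before calling split_pdf, a client can sanity-check its split_ranges. The sketch below assumes that start and end are 1-based and inclusive, which this README does not state for split_pdf, and it invents a default filename pattern purely for illustration; normalize_split_ranges is a hypothetical helper.

```python
def normalize_split_ranges(split_ranges, page_count):
    """Validate start/end pairs and fill in a filename where none is given."""
    normalized = []
    for index, item in enumerate(split_ranges, start=1):
        start, end = int(item["start"]), int(item["end"])
        # Assumption: pages are 1-based and the end page is inclusive.
        if not 1 <= start <= end <= page_count:
            raise ValueError(
                f"Range {start}-{end} is invalid for a {page_count}-page PDF"
            )
        # Hypothetical default name; the server may choose differently.
        filename = item.get("filename") or f"part_{index}_{start}-{end}.pdf"
        normalized.append({"start": start, "end": end, "filename": filename})
    return normalized


print(normalize_split_ranges(
    [{"start": 1, "end": 3}, {"start": 4, "end": 6, "filename": "tail.pdf"}],
    page_count=6,
))
```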
rotate_pages(file_path: str, rotations: List[Dict[str, int]], output_path: str) -> dict
Purpose: Rotate specific pages in a PDF and write the result to output_path.
Inputs:
file_path (str): source PDF
rotations (List[Dict]): each dict should include page (1-based) and degrees (e.g., 90, 180, 270).
output_path (str): target PDF path
Returns: dict with path and meta.
Notes:
All tools report an operation_id and execution time in ms in the returned meta object.
Tools that return lists set x-fastmcp-wrap-result=true for the framework so they are returned as bare lists.
Tools raise ValueError for user-facing errors; internal exceptions are logged.
For file inputs, prefer uploading first via upload_file to ensure files are in the server temp directory.
page_range syntax uses utils.parsers.parse_page_range: e.g., "1-3,5,7-9".
If both pages and page_range are passed, pages takes precedence.
Image conversion requires Poppler (see below).
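For readers unfamiliar with the page_range syntax, here is one way such an expression could be expanded. This is a sketch of the idea rather than the code in utils.parsers; its error handling and de-duplication behavior are assumptions.

```python
def parse_page_range(expression):
    """Expand an expression like "1-3,5,7-9" into a list of 1-based pages."""
    pages = []
    for part in expression.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"Empty segment in page range: {expression!r}")
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"Invalid page range segment: {part!r}")
        for page in range(start, end + 1):
            if page not in pages:  # keep first occurrence, preserve order
                pages.append(page)
    return pages


print(parse_page_range("1-3,5,7-9"))
```

Non-numeric segments fail inside int(), which also raises ValueError, so callers only need to handle one exception type.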
Example JSON: extract_text (simple)
Request arguments:
{
  "file": "C:/path/to/input.pdf",
  "encoding": "utf-8"
}
Response shape:
{
  "text": "... full extracted text ...",
  "page_count": 3,
  "char_count": 1234,
  "meta": { "operation_id": "<hex>", "execution_ms": 42 }
}
Uploading files (Claude Desktop and clients)
Claude may not automatically send binary file contents. Use one of these upload tools to persist a file to the server temp directory, then reference it by short filename in subsequent calls.
Upload a file (generic)
Tool: upload_file
Request:
{
  "name": "upload_file",
  "arguments": {
    "file": { "base64": "<BASE64_DATA>", "filename": "document.pdf" }
  }
}
The response contains filename and the absolute path under the server temp_dir.
Upload a file as base64 (explicit schema)
Tool: upload_file_base64
Request:
{
  "name": "upload_file_base64",
  "arguments": { "base64": "<BASE64_DATA>", "filename": "document.pdf" }
}
Upload a file from URL (explicit schema)
Tool: upload_file_url
Request:
{
  "name": "upload_file_url",
  "arguments": { "url": "https://example.com/document.pdf", "filename": "document.pdf" }
}
Extract text using the saved short filename
Request:
{
  "name": "extract_text",
  "arguments": { "file": "document.pdf" }
}
Alternative: provide a URL to upload_file (requires requests installed):
{
  "name": "upload_file",
  "arguments": {
    "file": { "url": "https://example.com/document.pdf", "filename": "document.pdf" }
  }
}
Manual option: run server_info to get temp_dir, copy your file into that directory, then call tools with the short filename.
Example JSON: merge_pdfs
Request arguments:
{
  "input_files": [
    "C:/path/a.pdf",
    "C:/path/b.pdf"
  ],
  "output_path": "C:/path/merged.pdf"
}
Response shape:
{
  "output_path": "C:/path/merged.pdf",
  "page_count": 10,
  "size": 456789,
  "meta": { "operation_id": "<hex>", "execution_ms": 87 }
}
Configuration
Configuration is loaded via pydantic-settings from .env and environment variables.
Env vars (case-insensitive):
MAX_FILE_SIZE_MB (int, default 50): Max file size for inputs.
LOG_LEVEL (str, default INFO): Logging level.
LOG_FILE_PATH (str, default logs/pdf-processor-server.log): Log file path.
TEMP_DIR (str, default temp_files): Working temp storage directory.
SERVER_NAME (str, default pdf-processor-server): Server name.
SERVER_VERSION (str, default 1.0.0): Server version.
Path helpers:
TEMP_DIR resolves to absolute settings.temp_path.
LOG_FILE_PATH resolves to absolute settings.log_path.
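To make the defaults and the case-insensitive lookup concrete, here is a standard-library approximation of the settings object. It is not the project's config.py and deliberately does not use pydantic-settings (it also ignores the .env file); SettingsSketch and load_settings are hypothetical names, while the fields and defaults are the ones listed above.

```python
import os
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass(frozen=True)
class SettingsSketch:
    max_file_size_mb: int = 50
    log_level: str = "INFO"
    log_file_path: str = "logs/pdf-processor-server.log"
    temp_dir: str = "temp_files"
    server_name: str = "pdf-processor-server"
    server_version: str = "1.0.0"

    @property
    def temp_path(self):
        return Path(self.temp_dir).resolve()

    @property
    def log_path(self):
        return Path(self.log_file_path).resolve()


def load_settings(environ=None):
    """Build settings from environment variables, matching names case-insensitively."""
    source = os.environ if environ is None else environ
    env = {key.upper(): value for key, value in source.items()}
    overrides = {}
    for field in fields(SettingsSketch):
        raw = env.get(field.name.upper())
        if raw is not None:
            # Cast the string to the type of the field's default (int or str).
            overrides[field.name] = type(field.default)(raw)
    return SettingsSketch(**overrides)


settings = load_settings({"max_file_size_mb": "25", "LOG_LEVEL": "DEBUG"})
print(settings.max_file_size_mb, settings.log_level, settings.temp_path)
```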
Storage & Security
Temp files are stored under TEMP_DIR and cleaned up automatically after 24h of inactivity.
ensure_within_temp(path) prevents reading files outside TEMP_DIR for base64 retrieval.
Validators enforce allowed extensions and size limits for PDFs and images.
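The 24h cleanup can be pictured as a sweep over the temp directory. The sketch below is an assumption about how such a sweep might work, not the server's file manager: cleanup_expired is a hypothetical name, and "inactivity" is approximated by the file's last modification time.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 24 * 60 * 60  # 24 hours


def cleanup_expired(temp_dir, now=None, max_age=MAX_AGE_SECONDS):
    """Delete files in temp_dir not modified within max_age seconds."""
    now = time.time() if now is None else now
    removed = []
    for path in Path(temp_dir).iterdir():
        # Assumption: modification time is used as the activity marker.
        if path.is_file() and now - path.stat().st_mtime > max_age:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Passing now explicitly makes the function easy to exercise in tests without waiting a day.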
Logging & Telemetry
Rotating logs at LOG_FILE_PATH (10MB x 5). No stdout/stderr prints.
Each tool returns meta.operation_id and meta.execution_ms for traceability.
Server banner and lifecycle logs are emitted by FastMCP at startup/shutdown.
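A file-only rotating logger of this kind can be set up with the standard library. The sketch below reads "10MB x 5" as a 10 MB size limit with 5 backup files, which is an interpretation; build_file_logger and the logger name are hypothetical.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_file_logger(log_file_path, level="INFO"):
    """Create a logger that writes only to a rotating file, never to stdout/stderr."""
    path = Path(log_file_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(
        path,
        maxBytes=10 * 1024 * 1024,  # rotate at 10 MB
        backupCount=5,              # assumption: keep 5 rotated files
        encoding="utf-8",
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger = logging.getLogger("pdf_server_sketch")
    logger.setLevel(level)
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.propagate = False  # keep records away from the root logger's console output
    return logger
```

Disabling propagation matters in STDIO mode, where stray console output can interfere with the MCP client's view of the stream.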
Windows: Poppler for pdf2image
pdf2image requires Poppler binaries.
Download: https://github.com/oschwartz10612/poppler-windows/releases/
Extract, then add poppler-*/Library/bin to your PATH.
Verify: pdftoppm -v prints a version. If it is not available, the pdf_to_images tool will raise a helpful error.
Linux: Poppler for pdf2image
pdf2image requires Poppler binaries. Install via your package manager:
Debian/Ubuntu:
sudo apt update && sudo apt install -y poppler-utils
Fedora:
sudo dnf install -y poppler-utils
Arch:
sudo pacman -S --noconfirm poppler
Verify: pdftoppm -v prints a version.
macOS: Poppler for pdf2image
Install Poppler with Homebrew:
brew install poppler
If Homebrew is in /opt/homebrew/bin (Apple Silicon), ensure your shell PATH includes it. Verify: pdftoppm -v.
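On any of the three platforms, the same check can be made from Python before attempting a conversion. This is a minimal sketch; poppler_available is a hypothetical helper that only looks for the pdftoppm executable on PATH.

```python
import shutil


def poppler_available(executable="pdftoppm"):
    """Return True if the Poppler command-line tool is found on PATH."""
    return shutil.which(executable) is not None


if not poppler_available():
    print("Poppler not found on PATH; PDF to image conversion will fail.")
```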
Developer Guide
Project layout
src/fastmcp_pdf_server/main.py: Builds FastMCP app, registers tools, runs via STDIO.
config.py: Pydantic settings for env and paths.
utils/: Logger, validators, parsers.
services/: PDF and image operations, file manager.
tools/: Thin async wrappers exposing services as MCP tools.
Install & Run
Windows (PowerShell):
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
Copy-Item .env.example .env
python -m fastmcp_pdf_server
Linux/macOS:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
python -m fastmcp_pdf_server
Tests
pytest -q
Conversion tests are skipped if Poppler (pdftoppm) is not found.
Troubleshooting
Startup hangs after banner: normal for STDIO mode (the server is waiting for an MCP client).
pdf2image errors: ensure Poppler is on PATH; restart your shell after updating PATH.
ValueError: File not found or Invalid file extension: check inputs and validators.
Large files are slow or time out: reduce dpi, use a page range, or increase resources.
Performance Notes
Max file size is enforced; adjust MAX_FILE_SIZE_MB if needed.
Prefer page-scoped operations for large PDFs.
Lower dpi for faster PDF→image conversions.
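To see why dpi matters, it helps to work out the rendered size of a page. The arithmetic below assumes an A4 page (210 x 297 mm) and is independent of this server; rendered_pixels is a hypothetical helper.

```python
def rendered_pixels(width_mm, height_mm, dpi):
    """Return the (width, height) in pixels of a page rendered at the given dpi."""
    width_px = round(width_mm / 25.4 * dpi)   # 25.4 mm per inch
    height_px = round(height_mm / 25.4 * dpi)
    return width_px, height_px


for dpi in (75, 150, 300):
    width, height = rendered_pixels(210, 297, dpi)
    print(dpi, width, height, width * height)
```

At the default 150 dpi an A4 page is 1240 x 1754 pixels. Doubling to 300 dpi gives 2480 x 3508, four times as many pixels per page, so halving dpi cuts the work per page to roughly a quarter.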
Optional HTTP Mode (advanced)
FastMCP supports a streamable HTTP transport. This server defaults to STDIO. For experimentation, you can run an HTTP endpoint:
# run_http.py
import asyncio
from fastmcp_pdf_server.main import build_app

async def main():
    app = build_app()
    # Serve over streamable HTTP, bound to localhost only.
    await app.run_http_async(host="127.0.0.1", port=8000, path="mcp")

asyncio.run(main())
Happy Coding!