pdf-search-mcp
MCP server for full-text search across PDF document collections. Built for AI agents — index once, search instantly from any MCP client.
Search entire collections — pre-indexes all PDFs for instant ranked results with snippets, not one file at a time
Fully offline — no API keys, no cloud services, just SQLite FTS5 and PyMuPDF
Page rendering — render pages as PNG for formulas, diagrams, and tables; crop to a region with auto-DPI scaling for detail shots
Dual renderer — CoreGraphics on macOS (sharper math fonts), PyMuPDF on Linux/Windows
German-aware — automatic expansion of ß↔ss, ä↔ae, ö↔oe, ü↔ue so both spellings match
Installation
From PyPI
pip install pdf-search-mcp
From source
git clone https://github.com/renvk/pdf-search-mcp.git
cd pdf-search-mcp
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
Requires Python 3.10+. On macOS, pyobjc-framework-Quartz is installed automatically for native CoreGraphics PDF rendering (sharper formula and math font output). On Linux/Windows, PyMuPDF is used as the renderer.
Quick Start
1. Index your PDFs
PDF_SEARCH_DIR=/path/to/your/pdfs python -m pdf_search_mcp.pdf_search index
2. Register with your MCP client
The server runs over stdio. Example for Claude Code:
# project-scoped (only available in the current directory)
claude mcp add pdf-search -- pdf-search-mcp
# or global (available in all projects)
claude mcp add --scope global pdf-search -- pdf-search-mcp
For other MCP clients, add to your MCP config:
{
  "mcpServers": {
    "pdf-search": {
      "command": "pdf-search-mcp"
    }
  }
}
3. Search
Ask your AI agent to search your PDFs — it will use the search, read_page, and read_page_image tools automatically.
Configuration
Environment Variable | Default | Description |
PDF_SEARCH_DIR | (none) | Path to your PDF directory (required for first index, remembered after) |
| | Path to the SQLite database file |
CLI Usage
The pdf_search.py module doubles as a CLI for indexing and direct search:
# Build index (first time — PDF_SEARCH_DIR required)
PDF_SEARCH_DIR=/path/to/pdfs python -m pdf_search_mcp.pdf_search index
# Subsequent syncs (path remembered from first index)
python -m pdf_search_mcp.pdf_search index
# Search from command line
python -m pdf_search_mcp.pdf_search search "query terms"
# Read a specific page
python -m pdf_search_mcp.pdf_search read filename.pdf 5
# Show index statistics
python -m pdf_search_mcp.pdf_search stats
# Rebuild index from scratch (path remembered)
python -m pdf_search_mcp.pdf_search reindex
Search Syntax
Uses SQLite FTS5 query syntax:
Syntax | Example | Description |
Terms | | Both terms must appear (implicit AND) |
Phrase | | Exact phrase match |
OR | | Either term |
NOT | | Exclude term |
Prefix | | Prefix matching |
NEAR | | Terms within 10 tokens of each other |
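As a rough illustration of each row, the sketch below runs standard FTS5 queries against a toy in-memory table using Python's built-in sqlite3 module. The table name `pages`, the column `body`, and the sample rows are made up for the example and are not the server's actual schema. It assumes your Python's SQLite library was built with FTS5.

```python
import sqlite3

# Toy FTS5 table; "pages" and "body" are illustrative names, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE pages USING fts5(body)")
conn.executemany(
    "INSERT INTO pages(rowid, body) VALUES (?, ?)",
    [
        (1, "garbage collection reclaims unused memory in managed runtimes"),
        (2, "manual memory management avoids garbage collector pauses"),
        (3, "the collection of garbage trucks runs on tuesday"),
    ],
)


def match(query: str) -> list[int]:
    """Return the rowids matching an FTS5 query, in rowid order."""
    rows = conn.execute(
        "SELECT rowid FROM pages WHERE pages MATCH ? ORDER BY rowid", (query,)
    )
    return [row[0] for row in rows]


examples = {
    "Terms": "garbage collection",           # implicit AND
    "Phrase": '"garbage collection"',        # exact phrase
    "OR": "pauses OR trucks",                # either term
    "NOT": "garbage NOT memory",             # exclude a term
    "Prefix": "collect*",                    # prefix match
    "NEAR": "NEAR(garbage memory, 10)",      # within 10 tokens
}

for name, query in examples.items():
    print(f"{name:7} {query:28} -> rows {match(query)}")
```

FTS5 operators are case-sensitive, so `OR`, `NOT`, and `NEAR` must be written in upper case.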
Auto-quoting: Terms containing dots, hyphens, commas, or slashes are automatically quoted (e.g., ISO-27001 becomes "ISO-27001") because FTS5 treats these as token separators.
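The auto-quoting rule can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function name `auto_quote` is hypothetical, and the sketch leaves queries containing parentheses (such as NEAR groups) untouched to keep the logic short.

```python
import re

# Characters named above as triggers for auto-quoting.
_SEPARATORS = re.compile(r"[.\-,/]")
_OPERATORS = {"AND", "OR", "NOT"}


def auto_quote(query: str) -> str:
    """Quote bare terms that contain dots, hyphens, commas, or slashes.

    Hypothetical helper: already-quoted phrases and boolean operators
    are passed through unchanged.
    """
    if "(" in query:  # assumption: do not touch NEAR(...) or grouped queries
        return query
    out = []
    # Tokenize into "quoted phrases" or whitespace-delimited bare terms.
    for token in re.findall(r'"[^"]*"|\S+', query):
        if token.startswith('"') or token in _OPERATORS:
            out.append(token)
        elif _SEPARATORS.search(token):
            # FTS5 escapes a double quote inside a string by doubling it.
            out.append('"' + token.replace('"', '""') + '"')
        else:
            out.append(token)
    return " ".join(out)


print(auto_quote("ISO-27001 audit"))
print(auto_quote('"already quoted" v1.2 OR draft'))
```

Quoting turns the term into an FTS5 phrase, so `ISO-27001` is matched as the adjacent tokens `ISO` and `27001` rather than being parsed as query syntax.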
German expansion: Umlauts and eszett are automatically expanded to their digraph equivalents and vice versa (ß↔ss, ä↔ae, ö↔oe, ü↔ue). Searching for Größe also finds Groesse, and Weißbuch also finds Weissbuch.
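One simple way to implement this kind of expansion is to generate both spellings of a term and OR them together. The sketch below assumes that approach; the helper names `german_variants` and `expand_term` are hypothetical, and the real implementation may differ.

```python
# Mapping taken from the expansion rule above.
_TO_DIGRAPH = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def german_variants(term: str) -> list[str]:
    """Return the term plus its umlaut and digraph spellings (lower-cased)."""
    lowered = term.lower()
    folded = lowered      # umlaut/eszett -> digraph
    unfolded = lowered    # digraph -> umlaut/eszett
    for char, digraph in _TO_DIGRAPH.items():
        folded = folded.replace(char, digraph)
        unfolded = unfolded.replace(digraph, char)
    return sorted({lowered, folded, unfolded})


def expand_term(term: str) -> str:
    """Build an FTS5 expression that matches any spelling variant."""
    variants = german_variants(term)
    if len(variants) == 1:
        return f'"{variants[0]}"'
    return "(" + " OR ".join(f'"{v}"' for v in variants) + ")"


print(expand_term("Größe"))
print(expand_term("Weissbuch"))
```

A blanket digraph-to-umlaut substitution over-generates (for example, `neue` also yields `neü`), but the extra variant only adds an OR branch that matches nothing.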
Auto-relaxation: When a multi-term query returns no results (all terms must appear on the same page), the search automatically relaxes: first by dropping one term at a time to find the term blocking results, then by OR-ing all terms. A note in the output explains what was actually searched. Queries with explicit operators (AND, OR, NOT, NEAR) are not relaxed.
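The relaxation order described above can be sketched as follows. This is an illustration only: `relaxed_search` and its `run_query` callback are hypothetical names, the note wording is invented, and `toy_backend` is a stand-in for a real FTS5 query that understands only implicit AND and OR.

```python
from typing import Callable, List, Tuple

Hits = List[int]


def has_explicit_operator(terms: List[str]) -> bool:
    """True if the query already uses AND, OR, NOT, or NEAR."""
    return any(t in {"AND", "OR", "NOT"} or t.startswith("NEAR(") for t in terms)


def relaxed_search(query: str, run_query: Callable[[str], Hits]) -> Tuple[Hits, str]:
    """Run the strict query, then relax it step by step if nothing matches."""
    terms = query.split()
    hits = run_query(query)
    if hits or len(terms) < 2 or has_explicit_operator(terms):
        return hits, f"searched: {query}"
    # Step 1: drop one term at a time to find the term blocking results.
    for i, dropped in enumerate(terms):
        relaxed = " ".join(terms[:i] + terms[i + 1:])
        hits = run_query(relaxed)
        if hits:
            return hits, f"dropped {dropped!r}, searched: {relaxed}"
    # Step 2: fall back to OR-ing all terms.
    any_term = " OR ".join(terms)
    return run_query(any_term), f"searched: {any_term}"


# Toy stand-in for the search backend (page number -> page text).
PAGES = {1: "garbage collection in java", 2: "reference counting in swift"}


def toy_backend(query: str) -> Hits:
    if " OR " in query:
        wanted = query.split(" OR ")
        return [p for p, text in PAGES.items() if any(w in text.split() for w in wanted)]
    wanted = query.split()
    return [p for p, text in PAGES.items() if all(w in text.split() for w in wanted)]


print(relaxed_search("garbage collection python", toy_backend))
print(relaxed_search("java swift rust", toy_backend))
```

In the first call, dropping `python` is what unblocks the query; in the second, no single drop helps, so the sketch falls through to the OR query.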
MCP Tools
Tool | Parameters | Description |
search | | Full-text search with ranked results and snippets |
read_page | | Read the full text of a specific page |
read_page_image | | Render a page (or cropped region) as PNG. |
| (none) | Show index statistics (file count, pages, DB size, renderer) |
Python API
from pdf_search_mcp import search_pdfs, read_pdf_page, render_pdf_page, index_pdfs
# Index PDFs
index_pdfs("/path/to/pdfs")
# Search
results = search_pdfs("garbage collection", limit=5)
for r in results:
    print(f"{r['subfolder']}/{r['file']} p.{r['page']}: {r['snippet']}")
# Read full page text
text = read_pdf_page("document.pdf", 42)
# Render full page as PNG
png_path = render_pdf_page("document.pdf", 42)
# Render cropped region (DPI auto-scales to maximize detail)
png_path = render_pdf_page("document.pdf", 42, region=[0.0, 0.5, 1.0, 0.8])
How It Works
Indexing incrementally syncs your PDF directory into a SQLite FTS5 virtual table. On first run, all PDFs are indexed. On subsequent runs, only new, changed (by mtime/size), and deleted files are processed. Subdirectory names are preserved as a subfolder column for context. Directories starting with _ are skipped.
Searching runs FTS5 MATCH queries and re-ranks results by combining BM25 relevance with match density: pages where search terms cluster together score higher than pages with the same terms scattered throughout. The density signal blends term concentration (matches per character) and spatial clustering (how tightly grouped the matches are).
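The exact re-ranking formula is not spelled out here, so the following is only a sketch of how a density signal of this kind could be computed. The equal 0.5/0.5 blend, the `saturation` constant, and the way density is combined with BM25 are all assumptions made for illustration. The one fixed point is that SQLite's `bm25()` returns lower (more negative) values for better matches, so the sketch negates it.

```python
import re
from typing import List


def match_positions(text: str, terms: List[str]) -> List[int]:
    """Character offsets of every occurrence of every term (case-insensitive)."""
    lowered = text.lower()
    return sorted(
        m.start()
        for term in terms
        for m in re.finditer(re.escape(term.lower()), lowered)
    )


def density_score(text: str, terms: List[str], saturation: float = 0.01) -> float:
    """Blend term concentration and spatial clustering into a 0..1 score."""
    positions = match_positions(text, terms)
    if not positions:
        return 0.0
    # Concentration: matches per character, capped at an assumed saturation rate.
    concentration = min(1.0, (len(positions) / len(text)) / saturation)
    # Clustering: 1.0 when matches coincide, near 0.0 when spread over the page.
    spread = (positions[-1] - positions[0]) / len(text)
    clustering = 1.0 - spread
    return 0.5 * concentration + 0.5 * clustering  # assumed equal weights


def combined_score(bm25_raw: float, text: str, terms: List[str]) -> float:
    """Higher is better; negate bm25() because SQLite ranks best matches lowest."""
    return -bm25_raw * (1.0 + density_score(text, terms))


terms = ["garbage", "collection"]
clustered = "garbage collection" + " filler" * 20
scattered = "garbage" + " filler" * 20 + " collection"
print(combined_score(-2.0, clustered, terms) > combined_score(-2.0, scattered, terms))
```

Both sample pages have the same length and the same number of matches, so only the clustering term separates them.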
Reading re-opens the original PDF file on disk (path resolved via the stored pdf_dir metadata) for full page text or image rendering. Region crops auto-scale DPI to fill a 1568 px long-edge budget, maximizing detail without producing oversized images.
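The DPI auto-scaling amounts to simple arithmetic, sketched below. The function name `crop_dpi`, the optional `max_dpi` cap, and the reading of `region` as fractional `[x0, y0, x1, y1]` page coordinates are assumptions for illustration. Page sizes are taken in PDF points (72 per inch).

```python
from typing import Optional, Sequence

POINTS_PER_INCH = 72.0
LONG_EDGE_BUDGET_PX = 1568  # long-edge pixel budget stated above


def crop_dpi(
    page_width_pt: float,
    page_height_pt: float,
    region: Sequence[float],
    max_dpi: Optional[float] = None,  # hypothetical safety cap
) -> float:
    """DPI at which the crop's longer edge renders at exactly the pixel budget."""
    x0, y0, x1, y1 = region
    crop_w_pt = (x1 - x0) * page_width_pt
    crop_h_pt = (y1 - y0) * page_height_pt
    long_edge_in = max(crop_w_pt, crop_h_pt) / POINTS_PER_INCH
    if long_edge_in <= 0:
        raise ValueError("region must have positive width and height")
    dpi = LONG_EDGE_BUDGET_PX / long_edge_in
    return min(dpi, max_dpi) if max_dpi else dpi


# US Letter page (612 x 792 pt): full page versus a horizontal band.
print(round(crop_dpi(612, 792, [0.0, 0.0, 1.0, 1.0]), 1))
print(round(crop_dpi(612, 792, [0.0, 0.5, 1.0, 0.8]), 1))
```

The smaller the crop, the higher the resulting DPI, which is why a tight crop around a formula comes out sharper than the same area in a full-page render.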
The database stores the text content only — original PDFs are accessed on disk for read_page and read_page_image. Rendering uses CoreGraphics on macOS and PyMuPDF elsewhere.
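The incremental sync described under Indexing can be pictured as a diff between what the index remembers and what is on disk. The sketch below assumes the index keeps an `(mtime, size)` pair per relative path; `scan_pdfs` and `plan_sync` are hypothetical names, not the project's API.

```python
import os
from pathlib import Path
from typing import Dict, List, Tuple

Stamp = Tuple[float, int]  # (mtime, size in bytes)


def scan_pdfs(root: Path) -> Dict[str, Stamp]:
    """Map each PDF's path (relative to root) to its current mtime and size."""
    found: Dict[str, Stamp] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories starting with "_" so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if not d.startswith("_")]
        for name in filenames:
            if name.lower().endswith(".pdf"):
                path = Path(dirpath) / name
                stat = path.stat()
                found[path.relative_to(root).as_posix()] = (stat.st_mtime, stat.st_size)
    return found


def plan_sync(
    indexed: Dict[str, Stamp], on_disk: Dict[str, Stamp]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (new, changed, deleted) paths; unchanged files are left alone."""
    new = sorted(on_disk.keys() - indexed.keys())
    deleted = sorted(indexed.keys() - on_disk.keys())
    changed = sorted(
        p for p in on_disk.keys() & indexed.keys() if on_disk[p] != indexed[p]
    )
    return new, changed, deleted
```

On a first run `indexed` is empty, so every file lands in `new`; afterwards only the three returned lists need any work. Comparing mtime and size is cheap but will miss an edit that preserves both.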
License
MIT