# PDF Retrieval MCP Server
A completely free Model Context Protocol (MCP) server for retrieving relevant chunks from PDF documents using hybrid search (BM25 + Vector Search).
## Features

- **PDF Document Processing**: Automatic parsing and indexing of PDF files using Docling
- **Hybrid Retrieval**: Combines BM25 (keyword) and vector search (semantic) for accurate retrieval
- **Free Embeddings**: Uses ChromaDB's default sentence-transformers model (no API costs!)
- **Pure Retrieval Mode**: Returns raw document chunks for agent processing (no LLM answer generation)
- **Fresh Start**: Clears the vector database on each startup for clean indexing
- **MCP Integration**: Exposes a `retrieve_pdf_chunks` tool via FastMCP for seamless agent integration
## Prerequisites

- Python 3.11 or later
- PDF documents to index
- No API keys required!
## Installation
### 1. Clone the Repository (if not already done)
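The repository URL and directory name below are placeholders; substitute your own:

```bash
git clone <repo-url>
cd <project-directory>
```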
### 2. Install Dependencies with uv
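With uv installed, a single command handles environment setup and dependencies:

```bash
uv sync
```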
This will automatically:

- Create a virtual environment (`.venv`)
- Install all dependencies from `pyproject.toml`
- Set up the project
### 3. Add PDF Documents
Create a documents directory and add your PDF files:
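For example:

```bash
mkdir -p documents
cp /path/to/your/papers/*.pdf documents/
```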
That's it! No API keys or additional configuration needed.
## Usage

### Running the Server
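One way to start it (the `server.py` entry-point name is an assumption; use your project's actual entry point):

```bash
uv run python server.py
```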
Or activate the virtual environment first:
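The `server.py` entry-point name here is an assumption:

```bash
source .venv/bin/activate
python server.py
```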
The server will:

- Start immediately (lazy initialization)
- Load and index PDFs on first query
- Be ready to retrieve document chunks via MCP
### Using the `retrieve_pdf_chunks` Tool
The server exposes a single MCP tool: `retrieve_pdf_chunks(query: str, max_chunks: int = 5) -> str`
**Example Query:**
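A call might look like this (tool arguments as seen in an MCP client; the values are illustrative):

```json
{
  "query": "machine learning",
  "max_chunks": 3
}
```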
**Example Response:**
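A sketch of a possible response; the field names are illustrative, chosen to match the descriptions in the Response Structure table:

```json
{
  "query": "machine learning",
  "chunks": [
    {
      "content": "Machine learning is a subset of artificial intelligence...",
      "source": "ml_intro.pdf",
      "page": 3,
      "metadata": {}
    }
  ],
  "total_chunks": 1
}
```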
### Response Structure

| Field | Type | Description |
|-------|------|-------------|
| `query` | string | The original search query |
| `chunks` | array | List of relevant document chunks |
| `chunks[].content` | string | The text content of the chunk |
| `chunks[].source` | string | Source PDF filename |
| `chunks[].page` | int | Page number (if available) |
| `chunks[].metadata` | object | Additional metadata |
| `total_chunks` | int | Number of chunks returned |
### How Agents Use This

When an agent (like Claude) calls this tool:

1. The agent sends a search query
2. The server returns relevant document chunks
3. The agent uses the chunks in its context to answer questions
**Example Agent Flow:**
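A minimal sketch of this flow from the agent side. The real MCP tool call is replaced by a stub here, and the response field names (`chunks`, `content`, `source`, `page`) are assumptions matching the illustrative structure above:

```python
import json

def retrieve_pdf_chunks_stub(query: str, max_chunks: int = 5) -> str:
    """Stand-in for the real MCP tool call; returns a canned JSON response."""
    return json.dumps({
        "query": query,
        "chunks": [
            {"content": "Machine learning is a subset of AI...",
             "source": "ml_intro.pdf", "page": 3},
            {"content": "Supervised learning uses labeled data...",
             "source": "ml_intro.pdf", "page": 7},
        ],
        "total_chunks": 2,
    })

def build_context(query: str, max_chunks: int = 5) -> str:
    """Turn retrieved chunks into a context block the agent can reason over."""
    response = json.loads(retrieve_pdf_chunks_stub(query, max_chunks))
    # Prefix each chunk with its source file and page for citation-friendly context
    lines = [f"[{c['source']} p.{c['page']}] {c['content']}"
             for c in response["chunks"]]
    return "\n".join(lines)

print(build_context("machine learning"))
```

The agent would then place this context string into its prompt before answering the user's question.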
## Testing with MCP Inspector
The MCP Inspector is a web-based tool for testing and debugging MCP servers interactively.
### Running the Inspector
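Assuming the official Inspector package and a hypothetical `server.py` entry point, the launch command looks like:

```bash
npx @modelcontextprotocol/inspector uv run python server.py
```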
This command will:

- Start the MCP Inspector proxy server
- Launch your PDF Retrieval Server
- Open a web browser with the Inspector UI
### What You'll See

The Inspector provides:

- **Tool Discovery**: View available tools (`retrieve_pdf_chunks`)
- **Interactive Testing**: Test queries with custom parameters
- **Real-time Responses**: See JSON responses in real time
- **Request/Response Logs**: Debug the MCP protocol communication
### Example Inspector Workflow

1. **Open the Inspector** - Browser opens automatically at `http://localhost:6274`
2. **Wait for Initialization** - Server loads and indexes PDFs on first query (~1-2 minutes)
3. **Select Tool** - Click on `retrieve_pdf_chunks` in the tools list
4. **Enter Query** - Type your search query (e.g., "machine learning")
5. **Set Parameters** - Optionally adjust `max_chunks` (default: 5)
6. **Execute** - Click "Run" to see the results
7. **View Response** - Inspect the returned chunks and metadata
### Inspector Tips

- **First query is slow**: PDF indexing happens on the first query (~87 seconds for typical PDFs)
- **Subsequent queries are fast**: Embeddings are cached in ChromaDB
- **Fresh start**: The server clears ChromaDB on each restart for clean indexing
- **Check logs**: The terminal shows detailed logging of the indexing process
## Architecture

### Key Components

- **PDFProcessor**: Singleton class that loads PDFs, converts them to Markdown using Docling, and builds the hybrid retriever (BM25 + Vector Search)
- **RetrievalHandler**: Retrieves relevant chunks for queries - no LLM answer generation

## Configuration
Configuration is managed through environment variables. Create a .env file in the project root:
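A sample `.env`. Of these names, only `PDF_DOCUMENTS_DIR` appears elsewhere in this README; the other two are assumptions chosen to illustrate the options below:

```bash
PDF_DOCUMENTS_DIR=./documents
CHROMA_DB_PATH=./chroma_db
LOG_LEVEL=INFO
```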
### Configuration Options

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `PDF_DOCUMENTS_DIR` | No | `./documents` | Directory containing PDF files to index |
| `CHROMA_DB_PATH` | No | `./chroma_db` | Directory for ChromaDB vector storage |
| `LOG_LEVEL` | No | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
**Note**: No API keys required! ChromaDB uses free local embeddings (sentence-transformers).
## Testing
Run unit tests:
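Assuming pytest as the test runner:

```bash
uv run pytest
```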
## Troubleshooting

### No PDF files found

**Error**: `No PDF files found in ./documents`

**Solution**: Add PDF files to the `documents/` directory or update `PDF_DOCUMENTS_DIR` in `.env`

### Import errors

**Error**: `ModuleNotFoundError: No module named 'docling'`

**Solution**: Ensure all dependencies are installed: `uv sync`

### CUDA out of memory

**Error**: `CUDA out of memory`

**Solution**: The server is configured to use CPU-only mode. If you still see this error, check that `CUDA_VISIBLE_DEVICES=""` is set in `src/pdf_processor.py`
## Dependencies

- **fastmcp**: MCP server framework
- **docling**: Document processing and parsing
- **chromadb**: Vector database with free sentence-transformers embeddings
- **langchain**: RAG framework and retrievers
- **loguru**: Logging

No paid APIs required! All embeddings are generated locally using ChromaDB's default model (`all-MiniLM-L6-v2`).
## Contributing

This is a Proof of Concept (PoC) implementation. For production use, consider:

- Adding caching for processed documents
- Implementing a multi-agent workflow with fact verification
- Supporting additional document formats (DOCX, TXT, etc.)
- Adding authentication and rate limiting
## License

[Your License Here]
## Acknowledgments

Based on the docchat-docling architecture.