PDF Knowledgebase MCP Server

by juanqui

A Model Context Protocol (MCP) server that enables intelligent document search and retrieval from PDF collections. Built for seamless integration with Claude Desktop, Continue, Cline, and other MCP clients, this server provides advanced search capabilities powered by local, OpenAI, or HuggingFace embeddings and ChromaDB vector storage.

🆕 NEW Features:

  • Reranking Support: Advanced result reranking using Qwen3-Reranker models (standard and GGUF) for improved search relevance
  • GGUF Quantized Models: Memory-optimized local embeddings and rerankers with 50-70% smaller models using GGUF quantization
  • Qwen3-Embedding Exclusive Support: Local embeddings are now optimized for, and limited to, the Qwen3-Embedding model family (standard and GGUF variants)
  • HuggingFace Inference Embeddings: Use HuggingFace Inference API with support for custom providers like Nebius
  • Custom OpenAI Endpoints: Support for OpenAI-compatible APIs with custom base URLs
  • Minimum Chunk Filtering: Automatically filter out short, low-information chunks below configurable character threshold
  • Markdown Document Support: Native support for .md files with frontmatter parsing and page boundary detection
  • Page-Based Chunking: Preserve document structure with intelligent page-level chunk boundaries
  • Semantic Chunking: Advanced content-aware chunking using embedding similarity for better context preservation
  • Local Embeddings: Run embeddings locally with HuggingFace models - no API costs, full privacy
  • Hybrid Search: Combines semantic similarity with keyword matching (BM25) for superior search quality
  • Web Interface: Modern web UI for document management and search alongside the traditional MCP protocol

Table of Contents

  • 🚀 Quick Start
  • 🌐 Web Interface
  • 🏗️ Architecture Overview
  • 🤖 Embedding Options
  • 📝 Markdown Document Support
  • 🔄 Reranking
  • 🔍 Hybrid Search
  • 🔽 Minimum Chunk Filtering
  • 🧩 Semantic Chunking
  • 🎯 Parser Selection Guide
  • ⚙️ Configuration
  • 🖥️ MCP Client Setup
  • 📊 Performance & Troubleshooting
  • 🔧 Advanced Configuration
  • 📚 Appendix

🚀 Quick Start

Step 1: Configure Your MCP Client

Option A: Local Embeddings w/ Hybrid Search (No API Key Required)

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp[hybrid]"], "env": { "PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents", "PDFKB_ENABLE_HYBRID_SEARCH": "true" }, "transport": "stdio", "autoRestart": true } } }

🆕 Option A2: Local GGUF Embeddings (Memory Optimized, No API Key Required)

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp[hybrid]"], "env": { "PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents", "PDFKB_LOCAL_EMBEDDING_MODEL": "Qwen/Qwen3-Embedding-0.6B-GGUF", "PDFKB_GGUF_QUANTIZATION": "Q6_K", "PDFKB_ENABLE_HYBRID_SEARCH": "true" }, "transport": "stdio", "autoRestart": true } } }

🆕 Option A3: Local Embeddings with Reranking (Best Search Quality, No API Key Required)

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp[hybrid]"], "env": { "PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents", "PDFKB_ENABLE_HYBRID_SEARCH": "true", "PDFKB_ENABLE_RERANKER": "true", "PDFKB_RERANKER_MODEL": "Qwen/Qwen3-Reranker-0.6B" }, "transport": "stdio", "autoRestart": true } } }

Option B: OpenAI Embeddings w/ Hybrid Search

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp[hybrid]"], "env": { "PDFKB_EMBEDDING_PROVIDER": "openai", "PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...", "PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents", "PDFKB_ENABLE_HYBRID_SEARCH": "true" }, "transport": "stdio", "autoRestart": true } } }

🆕 Option C: HuggingFace w/ Custom Provider

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp[hybrid]"], "env": { "PDFKB_EMBEDDING_PROVIDER": "huggingface", "PDFKB_HUGGINGFACE_EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2", "PDFKB_HUGGINGFACE_PROVIDER": "nebius", "HF_TOKEN": "hf_your_token_here", "PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents", "PDFKB_ENABLE_HYBRID_SEARCH": "true" }, "transport": "stdio", "autoRestart": true } } }

🆕 Option D: Custom OpenAI-Compatible API

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp[hybrid]"], "env": { "PDFKB_EMBEDDING_PROVIDER": "openai", "PDFKB_OPENAI_API_KEY": "your-api-key", "PDFKB_OPENAI_API_BASE": "https://api.studio.nebius.com/v1/", "PDFKB_EMBEDDING_MODEL": "text-embedding-3-large", "PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents", "PDFKB_ENABLE_HYBRID_SEARCH": "true" }, "transport": "stdio", "autoRestart": true } } }

Step 2: Verify Installation

  1. Restart your MCP client completely
  2. Check for PDF KB tools: Look for add_document, search_documents, list_documents, remove_document
  3. Test functionality: Try adding a PDF and searching for content

🌐 Web Interface

The PDF Knowledgebase includes a modern web interface for easy document management and search. The web interface is disabled by default and must be explicitly enabled.

Server Modes

1. MCP Only Mode (Default):

pdfkb-mcp
  • Runs only the MCP server for integration with Claude Desktop, VS Code, etc.
  • Most resource-efficient option
  • Best for pure MCP integration

2. Integrated Mode (MCP + Web):

# Option A: Environment variable
PDFKB_WEB_ENABLE=true pdfkb-mcp

# Option B: Command line flag
pdfkb-mcp --enable-web
  • Runs both MCP server AND web interface concurrently
  • Web interface available at http://localhost:8080
  • Best of both worlds: API integration + web UI

Web Interface Features

PDF Knowledgebase Web Interface - Documents List: modern web interface showing the document collection with search, filtering, and management capabilities.

  • 📄 Document Upload: Drag & drop PDF files or upload via file picker
  • 🔍 Semantic Search: Powerful vector-based search with real-time results
  • 📊 Document Management: List, preview, and manage your PDF collection
  • 📈 Real-time Status: Live processing updates via WebSocket connections
  • 🎯 Chunk Explorer: View and navigate document chunks for detailed analysis
  • ⚙️ System Metrics: Monitor server performance and resource usage

PDF Knowledgebase Web Interface - Document Summary: detailed document view showing metadata, chunk analysis, and content preview.

Quick Web Setup

  1. Install and run:
    uvx pdfkb-mcp                      # Install if needed
    PDFKB_WEB_ENABLE=true pdfkb-mcp    # Start integrated server
  2. Open your browser: http://localhost:8080
  3. Configure environment (create .env file):
    PDFKB_OPENAI_API_KEY=sk-proj-abc123def456ghi789...
    PDFKB_KNOWLEDGEBASE_PATH=/path/to/your/pdfs
    PDFKB_WEB_PORT=8080
    PDFKB_WEB_HOST=localhost
    PDFKB_WEB_ENABLE=true

Web Configuration Options

Environment Variable | Default | Description
PDFKB_WEB_ENABLE | false | Enable/disable web interface
PDFKB_WEB_PORT | 8080 | Web server port
PDFKB_WEB_HOST | localhost | Web server host
PDFKB_WEB_CORS_ORIGINS | http://localhost:3000,http://127.0.0.1:3000 | CORS allowed origins

Command Line Options

The server supports command line arguments:

# Customize web server port with web interface enabled
pdfkb-mcp --enable-web --port 9000

# Use custom configuration file
pdfkb-mcp --config myconfig.env

# Change log level
pdfkb-mcp --log-level DEBUG

# Enable web interface via command line
pdfkb-mcp --enable-web

API Documentation

When running with web interface enabled, comprehensive API documentation is available at:

  • Swagger UI: http://localhost:8080/docs
  • ReDoc: http://localhost:8080/redoc

🏗️ Architecture Overview

MCP Integration

Internal Architecture

Available Tools & Resources

Tools (Actions your client can perform):

  • add_document - Add a PDF or Markdown file to the knowledgebase
  • search_documents - Search across the indexed documents
  • list_documents - List all documents with metadata
  • remove_document - Remove a document from the knowledgebase

Resources (Data your client can access):

  • pdf://{document_id} - Full document content as JSON
  • pdf://{document_id}/page/{page_number} - Specific page content
  • pdf://list - List of all documents with metadata
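
For programmatic use outside a desktop client, the tools and resources above can be exercised with the MCP Python SDK over stdio. The sketch below is illustrative only: the tool argument names (e.g. query) are assumptions rather than the server's documented schema.

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch pdfkb-mcp as a stdio subprocess, the same way an MCP client would
    params = StdioServerParameters(
        command="uvx",
        args=["pdfkb-mcp"],
        env={"PDFKB_KNOWLEDGEBASE_PATH": "/path/to/pdfs"},
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()        # should list add_document, search_documents, ...
            result = await session.call_tool("search_documents", {"query": "vector databases"})
            listing = await session.read_resource("pdf://list")
            print(tools, result, listing)

asyncio.run(main())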

🤖 Embedding Options

The server supports three embedding providers, each with different trade-offs:

1. Local Embeddings (Default)

Run embeddings locally using HuggingFace models, eliminating API costs and keeping your data completely private.

Features:

  • Zero API Costs: No external API charges
  • Complete Privacy: Documents never leave your machine
  • Hardware Acceleration: Automatic detection of Metal (macOS), CUDA (NVIDIA), or CPU
  • Smart Caching: LRU cache for frequently embedded texts
  • Multiple Model Sizes: Choose based on your hardware capabilities

Local embeddings are enabled by default. No configuration needed for basic usage:

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_KNOWLEDGEBASE_PATH": "/path/to/pdfs" } } } }

Supported Models

🆕 Qwen3-Embedding Series Only: The server now exclusively supports the Qwen3-Embedding model family, including both standard and quantized GGUF variants for optimized performance.

Standard Models
Model | Size | Dimensions | Max Context | Best For
Qwen/Qwen3-Embedding-0.6B (default) | 1.2GB | 1024 | 32K tokens | Best overall - long docs, fast
Qwen/Qwen3-Embedding-4B | 8.0GB | 2560 | 32K tokens | High quality, long context
Qwen/Qwen3-Embedding-8B | 16.0GB | 3584 | 32K tokens | Maximum quality, long context

🆕 GGUF Quantized Models (Reduced Memory Usage)

Model | Size | Dimensions | Max Context | Best For
Qwen/Qwen3-Embedding-0.6B-GGUF | 0.6GB | 1024 | 32K tokens | Quantized lightweight, 32K context
Qwen/Qwen3-Embedding-4B-GGUF | 2.4GB | 2560 | 32K tokens | Quantized high quality, 32K context
Qwen/Qwen3-Embedding-8B-GGUF | 4.8GB | 3584 | 32K tokens | Quantized maximum quality, 32K context

Configure your preferred model:

# Standard models
PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-0.6B"   # Default
PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-4B"
PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-8B"

# GGUF quantized models (reduced memory usage)
PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-0.6B-GGUF"
PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-4B-GGUF"
PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-8B-GGUF"

🆕 GGUF Quantization Options

When using GGUF models, you can configure the quantization level to balance between model size and quality:

# Configure quantization (default: Q6_K)
PDFKB_GGUF_QUANTIZATION="Q6_K"    # Default - balanced size/quality
PDFKB_GGUF_QUANTIZATION="Q8_0"    # Higher quality, larger size
PDFKB_GGUF_QUANTIZATION="F16"     # Highest quality, largest size
PDFKB_GGUF_QUANTIZATION="Q4_K_M"  # Smaller size, lower quality

Quantization Recommendations:

  • Q6_K (default): Best balance of quality and size
  • Q8_0: Near-original quality with moderate compression
  • F16: Original quality, minimal compression
  • Q4_K_M: Maximum compression, acceptable quality loss

Hardware Optimization

The server automatically detects and uses the best available hardware:

  • Apple Silicon (M1/M2/M3): Uses Metal Performance Shaders (MPS)
  • NVIDIA GPUs: Uses CUDA acceleration
  • CPU Fallback: Optimized for multi-core processing
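
Conceptually, this auto-detection boils down to a few capability checks. A minimal sketch, assuming PyTorch is installed (the actual selection logic inside pdfkb may differ):

import torch

def pick_device(preference: str = "auto") -> str:
    """Resolve a PDFKB_EMBEDDING_DEVICE-style value to a concrete backend."""
    if preference != "auto":
        return preference                      # explicit override wins
    if torch.backends.mps.is_available():      # Apple Silicon (Metal)
        return "mps"
    if torch.cuda.is_available():              # NVIDIA GPU
        return "cuda"
    return "cpu"                               # multi-core CPU fallback

print(pick_device())   # e.g. "mps" on an M-series Mac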

Force a specific device if needed:

PDFKB_EMBEDDING_DEVICE="mps" # Force Metal/MPS PDFKB_EMBEDDING_DEVICE="cuda" # Force CUDA PDFKB_EMBEDDING_DEVICE="cpu" # Force CPU

Configuration Options

# Embedding provider (local or openai)
PDFKB_EMBEDDING_PROVIDER="local"   # Default

# Model selection (Qwen3-Embedding series only)
PDFKB_LOCAL_EMBEDDING_MODEL="Qwen/Qwen3-Embedding-0.6B"   # Default
# Standard options:
# - "Qwen/Qwen3-Embedding-0.6B" (1.2GB, 1024 dims, default)
# - "Qwen/Qwen3-Embedding-4B" (8GB, 2560 dims, high quality)
# - "Qwen/Qwen3-Embedding-8B" (16GB, 3584 dims, maximum quality)
# GGUF quantized options (reduced memory usage):
# - "Qwen/Qwen3-Embedding-0.6B-GGUF" (0.6GB, 1024 dims)
# - "Qwen/Qwen3-Embedding-4B-GGUF" (2.4GB, 2560 dims)
# - "Qwen/Qwen3-Embedding-8B-GGUF" (4.8GB, 3584 dims)

# GGUF quantization configuration (only used with GGUF models)
PDFKB_GGUF_QUANTIZATION="Q6_K"   # Default quantization level
# Available options: Q8_0, F16, Q6_K, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S

# Performance tuning
PDFKB_LOCAL_EMBEDDING_BATCH_SIZE=32   # Adjust based on memory
PDFKB_EMBEDDING_CACHE_SIZE=10000      # Number of cached embeddings
PDFKB_MAX_SEQUENCE_LENGTH=512         # Maximum text length

# Hardware acceleration
PDFKB_EMBEDDING_DEVICE="auto"         # auto, mps, cuda, cpu
PDFKB_USE_MODEL_OPTIMIZATION=true     # Enable torch.compile optimization

# Fallback options
PDFKB_FALLBACK_TO_OPENAI=false        # Use OpenAI if local fails

2. OpenAI Embeddings

Use OpenAI's embedding API or any OpenAI-compatible endpoint for high-quality embeddings with minimal setup.

Features:

  • High Quality: State-of-the-art embedding models
  • No Local Resources: Runs entirely in the cloud
  • Fast: Optimized API with batching support
  • 🆕 Custom Endpoints: Support for OpenAI-compatible APIs like Together, Nebius, etc.

Standard OpenAI:

{ "env": { "PDFKB_EMBEDDING_PROVIDER": "openai", "PDFKB_OPENAI_API_KEY": "sk-proj-...", "PDFKB_EMBEDDING_MODEL": "text-embedding-3-large" } }

🆕 Custom OpenAI-Compatible Endpoints:

{ "env": { "PDFKB_EMBEDDING_PROVIDER": "openai", "PDFKB_OPENAI_API_KEY": "your-api-key", "PDFKB_OPENAI_API_BASE": "https://api.studio.nebius.com/v1/", "PDFKB_EMBEDDING_MODEL": "text-embedding-3-large" } }

3. HuggingFace Embeddings

🆕 ENHANCED: Use HuggingFace's Inference API with support for custom providers and thousands of embedding models.

Features:

  • 🆕 Multiple Providers: Use HuggingFace directly or third-party providers like Nebius
  • Wide Model Selection: Access to thousands of embedding models
  • Cost-Effective: Many free or low-cost options available
  • 🆕 Provider Support: Seamlessly switch between HuggingFace and custom inference providers

Configuration:

{ "mcpServers": { "pdfkb": { "command": "pdfkb-mcp", "env": { "PDFKB_KNOWLEDGEBASE_PATH": "/path/to/your/pdfs", "PDFKB_EMBEDDING_PROVIDER": "huggingface", "PDFKB_HUGGINGFACE_EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2", "HF_TOKEN": "hf_your_token_here" } } } }

Advanced Configuration:

# Use a specific provider like Nebius
PDFKB_HUGGINGFACE_PROVIDER=nebius
PDFKB_HUGGINGFACE_EMBEDDING_MODEL=Qwen/Qwen3-Embedding-8B

# Or use HuggingFace directly (auto/default)
PDFKB_HUGGINGFACE_PROVIDER=   # Leave empty for auto

Performance Tips

  1. Batch Size: Larger batches are faster but use more memory
    • Apple Silicon: 32-64 recommended
    • NVIDIA GPUs: 64-128 recommended
    • CPU: 16-32 recommended
  2. Model Selection: Choose based on your needs
    • Default (Qwen3-0.6B): Best for most users - 32K context, fast, 1.2GB
    • GGUF (Qwen3-0.6B-GGUF): Memory-optimized version - 32K context, fast, 0.6GB
    • High Quality (Qwen3-4B): Better accuracy - 32K context, 8GB
    • GGUF High Quality (Qwen3-4B-GGUF): Memory-optimized high quality - 32K context, 2.4GB
    • Maximum Quality (Qwen3-8B): Best accuracy - 32K context, 16GB
    • GGUF Maximum Quality (Qwen3-8B-GGUF): Memory-optimized maximum quality - 32K context, 4.8GB
  3. GGUF Quantization: Choose based on memory constraints
    • Q6_K (default): Best balance of quality and size
    • Q8_0: Higher quality, larger size
    • F16: Near-original quality, largest size
    • Q4_K_M: Smallest size, acceptable quality
  4. Memory Management: The server automatically handles OOM errors by reducing batch size
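
The OOM handling in point 4 amounts to retrying with a smaller batch. A rough sketch (embed_fn is a placeholder for whatever embedding call is in use; the server's exact recovery strategy may differ):

def embed_with_backoff(embed_fn, texts, batch_size=64):
    """Embed texts in batches, halving the batch size whenever the accelerator runs out of memory."""
    while batch_size >= 1:
        try:
            vectors = []
            for i in range(0, len(texts), batch_size):
                vectors.extend(embed_fn(texts[i:i + batch_size]))
            return vectors
        except RuntimeError as err:            # CUDA/MPS OOM surfaces as RuntimeError
            if "out of memory" not in str(err).lower():
                raise
            batch_size //= 2                   # shrink and retry
    raise RuntimeError("embedding failed even with batch size 1")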

📝 Markdown Document Support

The server now supports Markdown documents (.md, .markdown) alongside PDFs, perfect for:

  • Pre-processed documents where you've already extracted clean markdown
  • Technical documentation and notes
  • Avoiding complex PDF parsing for better quality content
  • Faster processing with no conversion overhead

Features

  • Native Processing: Markdown files are read directly without conversion
  • Page Boundary Detection: Automatically splits documents on page markers like --[PAGE: 142]--
  • Frontmatter Support: Automatically extracts YAML/TOML frontmatter metadata
  • Title Extraction: Intelligently extracts titles from H1 headers or frontmatter
  • Same Pipeline: Uses the same chunking, embedding, and search infrastructure as PDFs
  • Mixed Collections: Search across both PDFs and Markdown documents seamlessly
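
To make the page-boundary and frontmatter behavior above concrete, here is a simplified sketch of how a Markdown file could be pre-processed (assuming PyYAML; pdfkb's internal parser may differ in the details):

import re
import yaml

PAGE_PATTERN = re.compile(r"--\[PAGE:\s*(\d+)\]--")   # default PDFKB_MARKDOWN_PAGE_BOUNDARY_PATTERN

def parse_markdown(text: str):
    """Extract YAML frontmatter, a title, and page-sized sections from a Markdown document."""
    metadata = {}
    if text.startswith("---"):
        _, frontmatter, text = text.split("---", 2)
        metadata = yaml.safe_load(frontmatter) or {}
    parts = PAGE_PATTERN.split(text)            # the capture group interleaves page numbers with content
    pages = [chunk.strip() for chunk in parts[::2] if chunk.strip()]
    title = metadata.get("title") or next(
        (line[2:].strip() for line in text.splitlines() if line.startswith("# ")), None)
    return metadata, title, pages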

Usage

Simply add Markdown files the same way you add PDFs:

# In your MCP client
await add_document("/path/to/document.md")
await add_document("/path/to/paper.pdf")

# Search across both types
results = await search_documents("your query")

Configuration

# Markdown-specific settings
PDFKB_MARKDOWN_PAGE_BOUNDARY_PATTERN="--\\[PAGE:\\s*(\\d+)\\]--"   # Regex pattern for page boundaries
PDFKB_MARKDOWN_SPLIT_ON_PAGE_BOUNDARIES=true    # Enable page boundary detection
PDFKB_MARKDOWN_PARSE_FRONTMATTER=true           # Parse YAML/TOML frontmatter (default: true)
PDFKB_MARKDOWN_EXTRACT_TITLE=true               # Extract title from first H1 (default: true)

🔄 Reranking

🆕 NEW: The server now supports advanced reranking using multiple providers to significantly improve search result relevance and quality. Reranking is a post-processing step that re-orders initial search results based on deeper semantic understanding.

Supported Providers

  1. Local Models: Qwen3-Reranker models (both standard and GGUF quantized variants)
  2. DeepInfra API: Qwen3-Reranker-8B via DeepInfra's native API

How It Works

  1. Initial Search: Retrieves limit + reranker_sample_additional candidates using hybrid/vector/text search
  2. Reranking: Uses Qwen3-Reranker to deeply analyze query-document relevance and re-score results
  3. Final Results: Returns the top limit results based on reranker scores
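
In code, this retrieve-then-rerank flow looks roughly like the sketch below. The retrieve and score callables are injected placeholders standing in for the server's hybrid search and Qwen3-Reranker wrapper; only the over-fetch, re-score, truncate shape is the point here.

from typing import Awaitable, Callable, Sequence

async def search_with_rerank(
    query: str,
    retrieve: Callable[[str, int], Awaitable[Sequence[str]]],   # e.g. hybrid search returning chunk texts
    score: Callable[[str, Sequence[str]], Sequence[float]],     # e.g. a cross-encoder reranker
    limit: int = 5,
    sample_additional: int = 5,
):
    """Over-fetch candidates, re-score them with a reranker, and return the top `limit`."""
    candidates = await retrieve(query, limit + sample_additional)   # stage 1: cheap retrieval
    scores = score(query, candidates)                               # stage 2: deeper relevance scoring
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:limit]]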

Supported Models

Local Models (Qwen3-Reranker Series)

Standard Models

Model | Size | Best For
Qwen/Qwen3-Reranker-0.6B (default) | 1.2GB | Lightweight, fast reranking
Qwen/Qwen3-Reranker-4B | 8.0GB | High quality reranking
Qwen/Qwen3-Reranker-8B | 16.0GB | Maximum quality reranking

🆕 GGUF Quantized Models (Reduced Memory Usage)

Model | Size | Best For
Mungert/Qwen3-Reranker-0.6B-GGUF | 0.3GB | Quantized lightweight, very fast
Mungert/Qwen3-Reranker-4B-GGUF | 2.0GB | Quantized high quality
Mungert/Qwen3-Reranker-8B-GGUF | 4.0GB | Quantized maximum quality

🆕 DeepInfra Model

Model | Best For
Qwen/Qwen3-Reranker-8B | High-quality cross-encoder reranking via DeepInfra API

Configuration

Option 1: Local Reranking (Standard Models)
# Enable reranking with local models
PDFKB_ENABLE_RERANKER=true
PDFKB_RERANKER_PROVIDER=local   # Default

# Choose reranker model
PDFKB_RERANKER_MODEL="Qwen/Qwen3-Reranker-0.6B"   # Default
PDFKB_RERANKER_MODEL="Qwen/Qwen3-Reranker-4B"     # Higher quality
PDFKB_RERANKER_MODEL="Qwen/Qwen3-Reranker-8B"     # Maximum quality

# Configure candidate sampling
PDFKB_RERANKER_SAMPLE_ADDITIONAL=5   # Default: get 5 extra candidates for reranking

# Optional: specify device
PDFKB_RERANKER_DEVICE="mps"    # For Apple Silicon
PDFKB_RERANKER_DEVICE="cuda"   # For NVIDIA GPUs
PDFKB_RERANKER_DEVICE="cpu"    # For CPU-only

Option 2: GGUF Quantized Local Reranking (Memory Optimized)
# Enable reranking with GGUF quantized models
PDFKB_ENABLE_RERANKER=true
PDFKB_RERANKER_PROVIDER=local

# Choose GGUF reranker model
PDFKB_RERANKER_MODEL="Mungert/Qwen3-Reranker-0.6B-GGUF"   # Smallest
PDFKB_RERANKER_MODEL="Mungert/Qwen3-Reranker-4B-GGUF"     # Balanced
PDFKB_RERANKER_MODEL="Mungert/Qwen3-Reranker-8B-GGUF"     # Highest quality

# Configure GGUF quantization level
PDFKB_RERANKER_GGUF_QUANTIZATION="Q6_K"     # Balanced (recommended)
PDFKB_RERANKER_GGUF_QUANTIZATION="Q8_0"     # Higher quality, larger
PDFKB_RERANKER_GGUF_QUANTIZATION="Q4_K_M"   # Smaller, lower quality

# Configure candidate sampling
PDFKB_RERANKER_SAMPLE_ADDITIONAL=5   # Default: get 5 extra candidates

🆕 Option 3: DeepInfra Reranking (API-based)
# Enable reranking with DeepInfra
PDFKB_ENABLE_RERANKER=true
PDFKB_RERANKER_PROVIDER=deepinfra

# Set your DeepInfra API key
PDFKB_DEEPINFRA_API_KEY="your-deepinfra-api-key"

# Optional: Choose model (default: Qwen/Qwen3-Reranker-8B)
# Available: Qwen/Qwen3-Reranker-0.6B, Qwen/Qwen3-Reranker-4B, Qwen/Qwen3-Reranker-8B
PDFKB_DEEPINFRA_RERANKER_MODEL="Qwen/Qwen3-Reranker-8B"

# Configure candidate sampling
PDFKB_RERANKER_SAMPLE_ADDITIONAL=8   # Sample 8 extra docs for reranking

About DeepInfra Reranker:

  • Supports three Qwen3-Reranker models:
    • 0.6B: Lightweight model, fastest inference
    • 4B: Balanced model with good quality and speed
    • 8B: Maximum quality model (default)
  • Optimized for high-quality cross-encoder relevance scoring
  • Pay-per-use pricing model
  • Get your API key at https://deepinfra.com
  • Note: The API requires equal-length query and document arrays, so the query is duplicated for each document internally

Complete Examples

Local Reranking with GGUF Models

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp[hybrid]"], "env": { "PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents", "PDFKB_ENABLE_HYBRID_SEARCH": "true", "PDFKB_ENABLE_RERANKER": "true", "PDFKB_RERANKER_PROVIDER": "local", "PDFKB_RERANKER_MODEL": "Mungert/Qwen3-Reranker-4B-GGUF", "PDFKB_RERANKER_GGUF_QUANTIZATION": "Q6_K", "PDFKB_RERANKER_SAMPLE_ADDITIONAL": "8", "PDFKB_LOCAL_EMBEDDING_MODEL": "Qwen/Qwen3-Embedding-0.6B-GGUF", "PDFKB_GGUF_QUANTIZATION": "Q6_K" }, "transport": "stdio", "autoRestart": true } } }

🆕 DeepInfra Reranking with Local Embeddings

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp[hybrid]"], "env": { "PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents", "PDFKB_ENABLE_HYBRID_SEARCH": "true", "PDFKB_ENABLE_RERANKER": "true", "PDFKB_RERANKER_PROVIDER": "deepinfra", "PDFKB_DEEPINFRA_API_KEY": "your-deepinfra-api-key", "PDFKB_RERANKER_SAMPLE_ADDITIONAL": "8", "PDFKB_LOCAL_EMBEDDING_MODEL": "Qwen/Qwen3-Embedding-0.6B", "PDFKB_EMBEDDING_PROVIDER": "local" }, "transport": "stdio", "autoRestart": true } } }

Performance Impact

Search Quality: Reranking typically improves search relevance by 15-30% by better understanding query intent and document relevance.

Memory Usage:

  • Local standard models: 1.2GB - 16GB depending on model size
  • GGUF quantized: 0.3GB - 4GB depending on model and quantization
  • DeepInfra: No local memory usage (API-based)

Speed:

  • Local models: Adds ~100-500ms per search
  • GGUF models: Slightly slower initial load, similar inference
  • DeepInfra: Adds ~200-800ms depending on API latency

Cost:

  • Local models: Free after initial download
  • DeepInfra: Pay-per-use based on token usage

When to Use Reranking

✅ Recommended for:

  • High-stakes searches where quality matters most
  • Complex queries requiring nuanced understanding
  • Large document collections with diverse content
  • When you have adequate hardware resources

❌ Skip reranking for:

  • Simple keyword-based searches
  • Real-time applications requiring sub-100ms responses
  • Limited memory/compute environments
  • Very small document collections (<100 documents)

GGUF Quantization Recommendations

For GGUF reranker models, choose quantization based on your needs:

  • Q6_K (recommended): Best balance of quality and size
  • Q8_0: Near-original quality with moderate compression
  • F16: Original quality, minimal compression
  • Q4_K_M: Maximum compression, acceptable quality loss
  • Q4_K_S: Small size, lower quality
  • Q5_K_M: Medium compression and quality
  • Q5_K_S: Smaller variant of Q5

🔍 Hybrid Search

The server now supports Hybrid Search, which combines the strengths of semantic similarity search (vector embeddings) with traditional keyword matching (BM25) for improved search quality.

How It Works

  1. Dual Indexing: Documents are indexed in both a vector database (ChromaDB) and a full-text search index (Whoosh)
  2. Parallel Search: Queries execute both semantic and keyword searches simultaneously
  3. Reciprocal Rank Fusion (RRF): Results are intelligently merged using RRF algorithm for optimal ranking
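
The RRF step can be expressed in a few lines. The sketch below uses the documented defaults (k = 60, weights 0.6/0.4); exactly how pdfkb applies the weights internally is an assumption.

def reciprocal_rank_fusion(vector_hits, text_hits, k=60, vector_weight=0.6, text_weight=0.4):
    """Merge two ranked lists of document IDs using weighted Reciprocal Rank Fusion."""
    scores = {}
    for weight, hits in ((vector_weight, vector_hits), (text_weight, text_hits)):
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "c" appears near the top of both lists, so it wins the fused ranking
print(reciprocal_rank_fusion(["a", "c", "b"], ["c", "d", "a"]))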

Benefits

  • Better Recall: Finds documents that match exact keywords even if semantically different
  • Improved Precision: Combines conceptual understanding with keyword relevance
  • Technical Terms: Excellent for technical documentation, code references, and domain-specific terminology
  • Balanced Results: Configurable weights let you adjust the balance between semantic and keyword matching

Configuration

Enable hybrid search by setting:

PDFKB_ENABLE_HYBRID_SEARCH=true   # Enable hybrid search (default: true)
PDFKB_HYBRID_VECTOR_WEIGHT=0.6    # Weight for semantic search (default: 0.6)
PDFKB_HYBRID_TEXT_WEIGHT=0.4      # Weight for keyword search (default: 0.4)
PDFKB_RRF_K=60                    # RRF constant (default: 60)

Installation

To use hybrid search, install with the optional dependency:

pip install "pdfkb-mcp[hybrid]"

Or if using uvx, it's included by default when hybrid search is enabled.

🔽 Minimum Chunk Filtering

NEW: The server now supports Minimum Chunk Filtering, which automatically filters out short, low-information chunks that don't contain enough content to be useful for search and retrieval.

How It Works

Documents are processed normally through parsing and chunking, then chunks below the configured character threshold are automatically filtered out before indexing and embedding.
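
Conceptually the filter is a single pass over the chunk list, roughly:

def filter_chunks(chunks: list[str], min_size: int = 150) -> list[str]:
    """Drop chunks shorter than PDFKB_MIN_CHUNK_SIZE characters; 0 keeps everything."""
    if min_size <= 0:
        return chunks
    return [chunk for chunk in chunks if len(chunk) >= min_size]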

Benefits

  • Improved Search Quality: Eliminates noise from short, uninformative chunks
  • Reduced Storage: Less vector storage and faster search by removing low-value content
  • Better Context: Search results focus on chunks with substantial, meaningful content
  • Configurable: Set custom thresholds based on your document types and use case

Configuration

# Enable filtering (default: 0 = disabled)
PDFKB_MIN_CHUNK_SIZE=150   # Filter chunks smaller than 150 characters

# Examples for different use cases:
PDFKB_MIN_CHUNK_SIZE=100   # Permissive - keep most content
PDFKB_MIN_CHUNK_SIZE=200   # Stricter - only substantial chunks
PDFKB_MIN_CHUNK_SIZE=0     # Disabled - keep all chunks (default)

Or in your MCP client configuration:

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_OPENAI_API_KEY": "sk-proj-...", "PDFKB_KNOWLEDGEBASE_PATH": "/path/to/pdfs", "PDFKB_MIN_CHUNK_SIZE": "150" } } } }

Usage Guidelines

  • Default (0): No filtering - keeps all chunks for maximum recall
  • Conservative (100-150): Good balance - removes very short chunks while preserving content
  • Aggressive (200+): Strict filtering - only keeps substantial chunks with rich content

🧩 Semantic Chunking

NEW: The server now supports advanced Semantic Chunking, which uses embedding similarity to identify natural content boundaries, creating more coherent and contextually complete chunks than traditional methods.

How It Works

  1. Sentence Embedding: Each sentence in the document is embedded using your configured embedding model
  2. Similarity Analysis: Distances between consecutive sentence embeddings are calculated
  3. Breakpoint Detection: Natural content boundaries are identified where similarity drops significantly
  4. Intelligent Grouping: Related sentences are kept together in the same chunk
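
A stripped-down version of percentile-based breakpoint detection, assuming numpy and a generic embed() callable (the production implementation uses LangChain's SemanticChunker):

import re
import numpy as np

def semantic_chunks(text, embed, threshold_pct=95.0, min_chunk_chars=100):
    """Split text where the cosine distance between consecutive sentence embeddings spikes."""
    sentences = re.split(r"(?<=[.?!])\s+", text)
    if len(sentences) < 2:
        return [text]
    vectors = np.array([embed(s) for s in sentences], dtype=float)
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
    distances = 1.0 - np.sum(vectors[:-1] * vectors[1:], axis=1)   # distance to the next sentence
    cutoff = np.percentile(distances, threshold_pct)               # only the largest gaps become breakpoints
    chunks, current = [], [sentences[0]]
    for sentence, gap in zip(sentences[1:], distances):
        if gap >= cutoff and len(" ".join(current)) >= min_chunk_chars:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks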

Benefits

  • 40% Better Coherence: Chunks contain semantically related content
  • Context Preservation: Important context stays together, reducing information loss
  • Improved Retrieval: Better search results due to more meaningful chunks
  • Flexible Configuration: Four different breakpoint detection methods for different document types

Quick Start

Enable semantic chunking by setting:

PDFKB_PDF_CHUNKER=semantic
PDFKB_SEMANTIC_CHUNKER_THRESHOLD_TYPE=percentile   # Default
PDFKB_SEMANTIC_CHUNKER_THRESHOLD_AMOUNT=95.0       # Default

Or in your MCP client configuration:

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp[semantic]"], "env": { "PDFKB_KNOWLEDGEBASE_PATH": "/path/to/pdfs", "PDFKB_PDF_CHUNKER": "semantic", "PDFKB_SEMANTIC_CHUNKER_THRESHOLD_TYPE": "percentile", "PDFKB_SEMANTIC_CHUNKER_THRESHOLD_AMOUNT": "95.0" } } } }

Breakpoint Detection Methods

Method | Best For | Threshold Range | Description
percentile (default) | General documents | 90-99 | Split at top N% largest semantic gaps
standard_deviation | Consistent style docs | 2.0-4.0 | Split at mean + N×σ distance
interquartile | Noisy documents | 1.0-2.0 | Split at mean + N×IQR, robust to outliers
gradient | Technical/legal docs | 90-99 | Analyze rate of change in similarity

Configuration Options

# Breakpoint detection method
PDFKB_SEMANTIC_CHUNKER_THRESHOLD_TYPE=percentile   # percentile, standard_deviation, interquartile, gradient

# Threshold amount (interpretation depends on type)
PDFKB_SEMANTIC_CHUNKER_THRESHOLD_AMOUNT=95.0   # For percentile/gradient: 0-100; for others: positive float

# Context buffer size (sentences to include around breakpoints)
PDFKB_SEMANTIC_CHUNKER_BUFFER_SIZE=1   # Default: 1

# Optional: Fixed number of chunks (overrides threshold-based splitting)
PDFKB_SEMANTIC_CHUNKER_NUMBER_OF_CHUNKS=   # Leave empty for dynamic

# Minimum chunk size in characters
PDFKB_SEMANTIC_CHUNKER_MIN_CHUNK_CHARS=100   # Default: 100

# Sentence splitting regex
PDFKB_SEMANTIC_CHUNKER_SENTENCE_SPLIT_REGEX="(?<=[.?!])\\s+"   # Default pattern

Tuning Guidelines

  1. For General Documents (default):
    • Use percentile with 95.0 threshold
    • Good balance between chunk size and coherence
  2. For Technical Documentation:
    • Use gradient with 90.0 threshold
    • Better at detecting technical section boundaries
  3. For Academic Papers:
    • Use standard_deviation with 3.0 threshold
    • Maintains paragraph and section integrity
  4. For Mixed Content:
    • Use interquartile with 1.5 threshold
    • Robust against varying content styles

Installation

Install with the semantic chunking dependency:

pip install "pdfkb-mcp[semantic]"

Or if using uvx:

uvx pdfkb-mcp[semantic]

Compatibility

  • Works with both local and OpenAI embeddings
  • Compatible with all PDF parsers
  • Integrates with intelligent caching system
  • Falls back to LangChain chunker if dependencies missing

🎯 Parser Selection Guide

Decision Tree

Document Type & Priority? ├── 🏃 Speed Priority → PyMuPDF4LLM (fastest processing, low memory) ├── 📚 Academic Papers → MinerU (GPU-accelerated, excellent formulas/tables) ├── 📊 Business Reports → Docling (accurate tables, structured output) ├── ⚖️ Balanced Quality → Marker (good multilingual, selective OCR) └── 🎯 Maximum Accuracy → LLM (slow, API costs, complex layouts)

Performance Comparison

Parser | Processing Speed | Memory | Text Quality | Table Quality | Best For
PyMuPDF4LLM | Fastest | Low | Good | Basic-Moderate | RAG pipelines, bulk ingestion
MinerU | Fast with GPU¹ | ~4GB VRAM² | Excellent | Excellent | Scientific/technical PDFs
Docling | 0.9-2.5 pages/s³ | 2.5-6GB⁴ | Excellent | Excellent | Structured documents, tables
Marker | ~25 p/s batch⁵ | ~4GB VRAM⁶ | Excellent | Good-Excellent⁷ | Scientific papers, multilingual
LLM | Slow⁸ | Variable⁹ | Excellent¹⁰ | Excellent | Complex layouts, high-value docs

Notes:
¹ >10,000 tokens/s on RTX 4090 with sglang
² Reported for <1B parameter model
³ CPU benchmarks: 0.92-1.34 p/s (native), 1.57-2.45 p/s (pypdfium)
⁴ 2.42-2.56GB (pypdfium), 6.16-6.20GB (native backend)
⁵ Projected on H100 GPU in batch mode
⁶ Benchmark configuration on NVIDIA A6000
⁷ Enhanced with optional LLM mode for table merging
⁸ Order of magnitude slower than traditional parsers
⁹ Depends on token usage and model size
¹⁰ 98.7-100% accuracy when given clean text

⚙️ Configuration

Tier 1: Basic Configurations (80% of users)

Default (Recommended):

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...", "PDFKB_PDF_PARSER": "pymupdf4llm", "PDFKB_PDF_CHUNKER": "langchain", "PDFKB_EMBEDDING_MODEL": "text-embedding-3-large" }, "transport": "stdio" } } }

Speed Optimized:

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...", "PDFKB_PDF_PARSER": "pymupdf4llm", "PDFKB_CHUNK_SIZE": "800" }, "transport": "stdio" } } }

Memory Efficient:

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...", "PDFKB_PDF_PARSER": "pymupdf4llm", "PDFKB_EMBEDDING_BATCH_SIZE": "50" }, "transport": "stdio" } } }

Tier 2: Use Case Specific (15% of users)

Academic Papers:

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...", "PDFKB_PDF_PARSER": "mineru", "PDFKB_CHUNK_SIZE": "1200" }, "transport": "stdio" } } }

Business Documents:

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...", "PDFKB_PDF_PARSER": "pymupdf4llm", "PDFKB_DOCLING_TABLE_MODE": "ACCURATE", "PDFKB_DOCLING_DO_TABLE_STRUCTURE": "true" }, "transport": "stdio" } } }

Multi-language Documents:

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...", "PDFKB_PDF_PARSER": "docling", "PDFKB_DOCLING_OCR_LANGUAGES": "en,fr,de,es", "PDFKB_DOCLING_DO_OCR": "true" }, "transport": "stdio" } } }

Hybrid Search (NEW - Improved Search Quality):

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...", "PDFKB_ENABLE_HYBRID_SEARCH": "true", "PDFKB_HYBRID_VECTOR_WEIGHT": "0.6", "PDFKB_HYBRID_TEXT_WEIGHT": "0.4" }, "transport": "stdio" } } }

Semantic Chunking (NEW - Context-Aware Chunking):

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp[semantic]"], "env": { "PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...", "PDFKB_PDF_CHUNKER": "semantic", "PDFKB_SEMANTIC_CHUNKER_THRESHOLD_TYPE": "gradient", "PDFKB_SEMANTIC_CHUNKER_THRESHOLD_AMOUNT": "90.0", "PDFKB_ENABLE_HYBRID_SEARCH": "true" }, "transport": "stdio" } } }

Maximum Quality:

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...", "PDFKB_OPENROUTER_API_KEY": "sk-or-v1-abc123def456ghi789...", "PDFKB_PDF_PARSER": "llm", "PDFKB_LLM_MODEL": "anthropic/claude-3.5-sonnet", "PDFKB_EMBEDDING_MODEL": "text-embedding-3-large" }, "transport": "stdio" } } }

Essential Environment Variables

Variable | Default | Description
PDFKB_OPENAI_API_KEY | required for openai provider | OpenAI API key for embeddings
PDFKB_KNOWLEDGEBASE_PATH | ./pdfs | Directory containing PDF files
PDFKB_CACHE_DIR | ./.cache | Cache directory for processing
PDFKB_PDF_PARSER | pymupdf4llm | Parser: pymupdf4llm (default), marker, mineru, docling, llm
PDFKB_PDF_CHUNKER | langchain | Chunking strategy: langchain (default), page, unstructured, semantic
PDFKB_CHUNK_SIZE | 1000 | Target chunk size for LangChain chunker
PDFKB_WEB_ENABLE | false | Enable/disable web interface
PDFKB_WEB_PORT | 8080 | Web server port
PDFKB_WEB_HOST | localhost | Web server host
PDFKB_WEB_CORS_ORIGINS | http://localhost:3000,http://127.0.0.1:3000 | CORS allowed origins (comma-separated)
PDFKB_EMBEDDING_MODEL | text-embedding-3-large | OpenAI embedding model (use text-embedding-3-small for faster processing)
PDFKB_MIN_CHUNK_SIZE | 0 | Minimum chunk size in characters (0 = disabled; filters out chunks smaller than this size)
PDFKB_OPENAI_API_BASE | optional | Custom base URL for OpenAI-compatible APIs (e.g., https://api.studio.nebius.com/v1/)
PDFKB_HUGGINGFACE_EMBEDDING_MODEL | sentence-transformers/all-MiniLM-L6-v2 | HuggingFace model for embeddings when using the huggingface provider
PDFKB_HUGGINGFACE_PROVIDER | optional | HuggingFace provider (e.g., "nebius"); leave empty for default
PDFKB_ENABLE_HYBRID_SEARCH | true | Enable hybrid search combining semantic and keyword matching
PDFKB_HYBRID_VECTOR_WEIGHT | 0.6 | Weight for semantic search (0-1; must sum to 1 with text weight)
PDFKB_HYBRID_TEXT_WEIGHT | 0.4 | Weight for keyword/BM25 search (0-1; must sum to 1 with vector weight)
PDFKB_RRF_K | 60 | Reciprocal Rank Fusion constant (higher = less emphasis on rank differences)
PDFKB_LOCAL_EMBEDDING_MODEL | Qwen/Qwen3-Embedding-0.6B | Local embedding model (Qwen3-Embedding series only)
PDFKB_GGUF_QUANTIZATION | Q6_K | GGUF quantization level (Q8_0, F16, Q6_K, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S)
PDFKB_ENABLE_RERANKER | false | Enable/disable result reranking for improved search quality
PDFKB_RERANKER_PROVIDER | local | Reranker provider: 'local' or 'deepinfra'
PDFKB_RERANKER_MODEL | Qwen/Qwen3-Reranker-0.6B | Reranker model for the local provider
PDFKB_RERANKER_SAMPLE_ADDITIONAL | 5 | Additional results to sample for reranking
PDFKB_RERANKER_GGUF_QUANTIZATION | optional | GGUF quantization level (Q6_K, Q8_0, etc.)
PDFKB_DEEPINFRA_API_KEY | required for deepinfra provider | DeepInfra API key for reranking
PDFKB_DEEPINFRA_RERANKER_MODEL | Qwen/Qwen3-Reranker-8B | DeepInfra model: 0.6B, 4B, or 8B

🖥️ MCP Client Setup

Claude Desktop

Configuration File Location:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json

Configuration:

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...", "PDFKB_KNOWLEDGEBASE_PATH": "/Users/yourname/Documents", "PDFKB_CACHE_DIR": "/Users/yourname/Documents/PDFs/.cache" }, "transport": "stdio", "autoRestart": true, "PDFKB_EMBEDDING_MODEL": "text-embedding-3-small", } } }

Verification:

  1. Restart Claude Desktop completely
  2. Look for PDF KB tools in the interface
  3. Test with "Add a document" or "Search documents"

VS Code with Native MCP Support

Configuration (.vscode/mcp.json in workspace):

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...", "PDFKB_KNOWLEDGEBASE_PATH": "${workspaceFolder}/pdfs" }, "transport": "stdio" } } }

Verification:

  1. Reload VS Code window
  2. Check VS Code's MCP server status in Command Palette
  3. Use MCP tools in Copilot Chat

VS Code with Continue Extension

Configuration (.continue/config.json):

{ "models": [...], "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_OPENAI_API_KEY": "sk-proj-abc123def456ghi789...", "PDFKB_KNOWLEDGEBASE_PATH": "${workspaceFolder}/pdfs" }, "transport": "stdio" } } }

Verification:

  1. Reload VS Code window
  2. Check Continue panel for server connection
  3. Use @pdfkb in Continue chat

Generic MCP Client

Standard Configuration Template:

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_OPENAI_API_KEY": "required", "PDFKB_KNOWLEDGEBASE_PATH": "required-absolute-path", "PDFKB_PDF_PARSER": "optional-default-pymupdf4llm" }, "transport": "stdio", "autoRestart": true, "timeout": 30000 } } }

📊 Performance & Troubleshooting

Common Issues

Server not appearing in MCP client:

// ❌ Wrong: Missing transport
{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"] } } }

// ✅ Correct: Include transport and restart client
{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "transport": "stdio" } } }

System overload when processing multiple PDFs:

# Reduce parallel operations to prevent system stress
PDFKB_MAX_PARALLEL_PARSING=1      # Process one PDF at a time
PDFKB_MAX_PARALLEL_EMBEDDING=1    # Embed one document at a time
PDFKB_BACKGROUND_QUEUE_WORKERS=1  # Single background worker

Processing too slow:

// Switch to faster parser and increase parallelism (if system can handle it)
{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_OPENAI_API_KEY": "sk-key", "PDFKB_PDF_PARSER": "pymupdf4llm" }, "transport": "stdio" } } }

Memory issues:

// Reduce memory usage
{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_OPENAI_API_KEY": "sk-key", "PDFKB_EMBEDDING_BATCH_SIZE": "25", "PDFKB_CHUNK_SIZE": "500" }, "transport": "stdio" } } }

Poor table extraction:

// Use table-optimized parser
{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_OPENAI_API_KEY": "sk-key", "PDFKB_PDF_PARSER": "docling", "PDFKB_DOCLING_TABLE_MODE": "ACCURATE" }, "transport": "stdio" } } }

Resource Requirements

Configuration | RAM Usage | Processing Speed | Best For
Speed | 2-4 GB | Fastest | Large collections
Balanced | 4-6 GB | Medium | Most users
Quality | 6-12 GB | Medium-Fast | Accuracy priority
GPU | 8-16 GB | Very Fast | High-volume processing

🔧 Advanced Configuration

Parser-Specific Options

MinerU Configuration:

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_OPENAI_API_KEY": "sk-key", "PDFKB_PDF_PARSER": "mineru", "PDFKB_MINERU_LANG": "en", "PDFKB_MINERU_METHOD": "auto", "PDFKB_MINERU_VRAM": "16" }, "transport": "stdio" } } }

LLM Parser Configuration:

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_OPENAI_API_KEY": "sk-key", "PDFKB_OPENROUTER_API_KEY": "sk-or-v1-abc123def456ghi789...", "PDFKB_PDF_PARSER": "llm", "PDFKB_LLM_MODEL": "google/gemini-2.5-flash-lite", "PDFKB_LLM_CONCURRENCY": "5", "PDFKB_LLM_DPI": "150" }, "transport": "stdio" } } }

Performance Tuning

Parallel Processing Configuration:

Control the number of concurrent operations to optimize performance and prevent system overload:

# Maximum number of PDFs to parse simultaneously
PDFKB_MAX_PARALLEL_PARSING=1      # Default: 1 (conservative to prevent overload)

# Maximum number of documents to embed simultaneously
PDFKB_MAX_PARALLEL_EMBEDDING=1    # Default: 1 (prevents API rate limits)

# Number of background queue workers
PDFKB_BACKGROUND_QUEUE_WORKERS=2  # Default: 2

# Thread pool size for CPU-intensive operations
PDFKB_THREAD_POOL_SIZE=1          # Default: 1

Resource-Optimized Setup (for low-powered systems):

{ "env": { "PDFKB_MAX_PARALLEL_PARSING": "1", # Process one PDF at a time "PDFKB_MAX_PARALLEL_EMBEDDING": "1", # Embed one document at a time "PDFKB_BACKGROUND_QUEUE_WORKERS": "1", # Single background worker "PDFKB_THREAD_POOL_SIZE": "1" # Single thread for CPU tasks } }

High-Performance Setup (for powerful machines):

{ "env": { "PDFKB_MAX_PARALLEL_PARSING": "4", # Parse up to 4 PDFs in parallel "PDFKB_MAX_PARALLEL_EMBEDDING": "2", # Embed 2 documents simultaneously "PDFKB_BACKGROUND_QUEUE_WORKERS": "4", # More background workers "PDFKB_THREAD_POOL_SIZE": "2", # More threads for CPU tasks "PDFKB_EMBEDDING_BATCH_SIZE": "200", # Larger embedding batches "PDFKB_VECTOR_SEARCH_K": "15" # More search results } }

Complete High-Performance Setup:

{ "mcpServers": { "pdfkb": { "command": "uvx", "args": ["pdfkb-mcp"], "env": { "PDFKB_OPENAI_API_KEY": "sk-key", "PDFKB_PDF_PARSER": "mineru", "PDFKB_KNOWLEDGEBASE_PATH": "/Volumes/FastSSD/Documents/PDFs", "PDFKB_CACHE_DIR": "/Volumes/FastSSD/Documents/PDFs/.cache", "PDFKB_MAX_PARALLEL_PARSING": "4", "PDFKB_MAX_PARALLEL_EMBEDDING": "2", "PDFKB_BACKGROUND_QUEUE_WORKERS": "4", "PDFKB_THREAD_POOL_SIZE": "2", "PDFKB_EMBEDDING_BATCH_SIZE": "200", "PDFKB_VECTOR_SEARCH_K": "15", "PDFKB_FILE_SCAN_INTERVAL": "30" }, "transport": "stdio" } } }

Intelligent Caching

The server uses multi-stage caching:

Cache Invalidation Rules:

  • Changing PDFKB_PDF_PARSER → Full reset (parsing + chunking + embeddings)
  • Changing PDFKB_PDF_CHUNKER → Partial reset (chunking + embeddings)
  • Changing PDFKB_EMBEDDING_MODEL → Minimal reset (embeddings only)
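
One way to implement these rules is to key each cache stage on a fingerprint that folds in the stage above it. A sketch (the hashing scheme here is illustrative, not pdfkb's actual cache format):

import hashlib
import json

def stage_fingerprints(config: dict) -> dict:
    """One fingerprint per stage, so a config change only invalidates that stage and everything after it."""
    def digest(*parts) -> str:
        return hashlib.sha256(json.dumps(parts, sort_keys=True).encode()).hexdigest()[:12]

    parsing = digest(config["PDFKB_PDF_PARSER"])
    chunking = digest(parsing, config["PDFKB_PDF_CHUNKER"])
    embedding = digest(chunking, config["PDFKB_EMBEDDING_MODEL"])
    return {"parsing": parsing, "chunking": chunking, "embedding": embedding}

# Changing the parser changes all three keys (full reset); changing only the embedding
# model changes just the "embedding" key, so parsed and chunked caches stay valid.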

📚 Appendix

Installation Options

Primary (Recommended):

uvx pdfkb-mcp

**Web Interface Included**: All installation methods include the web interface. Use these commands:
- `pdfkb-mcp` - MCP server only (default, web disabled)
- `PDFKB_WEB_ENABLE=true pdfkb-mcp` - Integrated MCP + Web server (web enabled)

With Specific Parser Dependencies:

uvx pdfkb-mcp[marker]                 # Marker parser
uvx pdfkb-mcp[mineru]                 # MinerU parser
uvx pdfkb-mcp[docling]                # Docling parser
uvx pdfkb-mcp[llm]                    # LLM parser
uvx pdfkb-mcp[semantic]               # Semantic chunker (NEW)
uvx pdfkb-mcp[unstructured_chunker]   # Unstructured chunker
uvx pdfkb-mcp[web]                    # Enhanced web features (psutil for metrics)

pip install "pdfkb-mcp[web]" # Enhanced web features Or via pip/pipx:

pip install "pdfkb-mcp[marker]" # Marker parser pip install "pdfkb-mcp[docling-complete]" # Docling with OCR and full features

Development Installation:

git clone https://github.com/juanqui/pdfkb-mcp.git
cd pdfkb-mcp
pip install -e ".[dev]"

Complete Environment Variables Reference

Variable | Default | Description
PDFKB_OPENAI_API_KEY | required for openai provider | OpenAI API key for embeddings
PDFKB_OPENROUTER_API_KEY | optional | Required for the LLM parser
PDFKB_KNOWLEDGEBASE_PATH | ./pdfs | PDF directory path
PDFKB_CACHE_DIR | ./.cache | Cache directory
PDFKB_PDF_PARSER | pymupdf4llm | PDF parser selection
PDFKB_PDF_CHUNKER | langchain | Chunking strategy: langchain, page, unstructured, semantic
PDFKB_CHUNK_SIZE | 1000 | LangChain chunk size
PDFKB_CHUNK_OVERLAP | 200 | LangChain chunk overlap
PDFKB_MIN_CHUNK_SIZE | 0 | Minimum chunk size in characters (0 = disabled; filters out chunks smaller than this size)
PDFKB_EMBEDDING_MODEL | text-embedding-3-large | OpenAI model
PDFKB_OPENAI_API_BASE | optional | Custom base URL for OpenAI-compatible APIs
PDFKB_HUGGINGFACE_EMBEDDING_MODEL | sentence-transformers/all-MiniLM-L6-v2 | HuggingFace model
PDFKB_HUGGINGFACE_PROVIDER | optional | HuggingFace provider (e.g., "nebius")
PDFKB_LOCAL_EMBEDDING_MODEL | Qwen/Qwen3-Embedding-0.6B | Local embedding model (Qwen3-Embedding series only)
PDFKB_GGUF_QUANTIZATION | Q6_K | GGUF quantization level (Q8_0, F16, Q6_K, Q4_K_M, Q4_K_S, Q5_K_M, Q5_K_S)
PDFKB_EMBEDDING_DEVICE | auto | Hardware device (auto, mps, cuda, cpu)
PDFKB_USE_MODEL_OPTIMIZATION | true | Enable torch.compile optimization
PDFKB_EMBEDDING_CACHE_SIZE | 10000 | Number of cached embeddings in LRU cache
PDFKB_MODEL_CACHE_DIR | ~/.cache/huggingface | Local model cache directory
PDFKB_ENABLE_RERANKER | false | Enable/disable result reranking
PDFKB_RERANKER_PROVIDER | local | Reranker provider: 'local' or 'deepinfra'
PDFKB_RERANKER_MODEL | Qwen/Qwen3-Reranker-0.6B | Reranker model for the local provider
PDFKB_RERANKER_SAMPLE_ADDITIONAL | 5 | Additional results to sample for reranking
PDFKB_RERANKER_DEVICE | auto | Hardware device for local reranker (auto, mps, cuda, cpu)
PDFKB_RERANKER_MODEL_CACHE_DIR | ~/.cache/pdfkb-mcp/reranker | Cache directory for local reranker models
PDFKB_RERANKER_GGUF_QUANTIZATION | optional | GGUF quantization level (Q6_K, Q8_0, etc.)
PDFKB_DEEPINFRA_API_KEY | required for deepinfra provider | DeepInfra API key for reranking
PDFKB_DEEPINFRA_RERANKER_MODEL | Qwen/Qwen3-Reranker-8B | Model: Qwen/Qwen3-Reranker-0.6B, 4B, or 8B
PDFKB_EMBEDDING_BATCH_SIZE | 100 | Embedding batch size
PDFKB_MAX_PARALLEL_PARSING | 1 | Max concurrent PDF parsing operations
PDFKB_MAX_PARALLEL_EMBEDDING | 1 | Max concurrent embedding operations
PDFKB_BACKGROUND_QUEUE_WORKERS | 2 | Number of background processing workers
PDFKB_THREAD_POOL_SIZE | 1 | Thread pool size for CPU-intensive tasks
PDFKB_VECTOR_SEARCH_K | 5 | Default number of search results
PDFKB_FILE_SCAN_INTERVAL | 60 | File monitoring interval
PDFKB_LOG_LEVEL | INFO | Logging level
PDFKB_WEB_ENABLE | false | Enable/disable web interface
PDFKB_WEB_PORT | 8080 | Web server port
PDFKB_WEB_HOST | localhost | Web server host
PDFKB_WEB_CORS_ORIGINS | http://localhost:3000,http://127.0.0.1:3000 | CORS allowed origins (comma-separated)

Parser Comparison Details

Feature | PyMuPDF4LLM | Marker | MinerU | Docling | LLM
Speed | Fastest | Medium | Fast (GPU) | Medium | Slowest
Memory | Lowest | Medium | High | Medium | Lowest
Tables | Basic | Good | Excellent | Excellent | Excellent
Formulas | Basic | Good | Excellent | Good | Excellent
Images | Basic | Good | Good | Excellent | Excellent
Setup | Simple | Simple | Moderate | Simple | Simple
Cost | Free | Free | Free | Free | API costs

Chunking Strategies

LangChain (PDFKB_PDF_CHUNKER=langchain):

  • Header-aware splitting with MarkdownHeaderTextSplitter
  • Configurable via PDFKB_CHUNK_SIZE and PDFKB_CHUNK_OVERLAP
  • Best for customizable chunking
  • Default and installed with base package
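
As a rough illustration of the header-aware splitting described above, using LangChain's splitters and the documented defaults (the exact splitter wiring inside pdfkb is an assumption):

from langchain_text_splitters import MarkdownHeaderTextSplitter, RecursiveCharacterTextSplitter

markdown = "# Title\n\nIntro paragraph.\n\n## Methods\n\nLong body text that may exceed the chunk size..."

header_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "h1"), ("##", "h2"), ("###", "h3")]
)
size_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)  # PDFKB_CHUNK_SIZE / PDFKB_CHUNK_OVERLAP

sections = header_splitter.split_text(markdown)    # header-aware sections carrying header metadata
chunks = size_splitter.split_documents(sections)   # enforce the configured size and overlap
print(len(chunks), chunks[0].metadata)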

Page (PDFKB_PDF_CHUNKER=page) 🆕 NEW:

  • Page-based chunking that preserves document page boundaries
  • Works with page-aware parsers that output individual pages
  • Supports merging small pages and splitting large ones
  • Configurable via PDFKB_PAGE_CHUNKER_MIN_CHUNK_SIZE and PDFKB_PAGE_CHUNKER_MAX_CHUNK_SIZE
  • Best for preserving original document structure and page-level metadata

Semantic (PDFKB_PDF_CHUNKER=semantic):

  • Advanced semantic chunking using LangChain's SemanticChunker
  • Groups semantically related content together using embedding similarity
  • Four breakpoint detection methods: percentile, standard_deviation, interquartile, gradient
  • Preserves context and improves retrieval quality by 40%
  • Install extra: pip install "pdfkb-mcp[semantic]" to enable
  • Configurable via environment variables (see Semantic Chunking section)
  • Best for documents requiring high context preservation

Unstructured (PDFKB_PDF_CHUNKER=unstructured):

  • Intelligent semantic chunking with unstructured library
  • Zero configuration required
  • Install extra: pip install "pdfkb-mcp[unstructured_chunker]" to enable
  • Best for document structure awareness

First-run notes

  • On the first run, the server initializes caches and vector store and logs selected components:
    • Parser: PyMuPDF4LLM (default)
    • Chunker: LangChain (default)
    • Embedding Model: text-embedding-3-large (default)
  • If you select a parser/chunker that isn’t installed, the server logs a warning with the exact install command and falls back to the default components instead of exiting.

Troubleshooting Guide

API Key Issues:

  1. Verify key format starts with sk-
  2. Check account has sufficient credits
  3. Test connectivity: curl -H "Authorization: Bearer $PDFKB_OPENAI_API_KEY" https://api.openai.com/v1/models

Parser Installation Issues:

  1. MinerU: pip install mineru[all] and verify mineru --version
  2. Docling: pip install docling for basic, pip install pdfkb-mcp[docling-complete] for all features
  3. LLM: Requires PDFKB_OPENROUTER_API_KEY environment variable

Performance Optimization:

  1. Speed: Use pymupdf4llm parser (fastest, low memory footprint)
  2. Memory: Reduce PDFKB_EMBEDDING_BATCH_SIZE and PDFKB_CHUNK_SIZE; use pypdfium backend for Docling
  3. Quality: Use mineru with GPU (>10K tokens/s on RTX 4090) or marker for balanced quality
  4. Tables: Use docling with PDFKB_DOCLING_TABLE_MODE=ACCURATE or marker with LLM mode
  5. Batch Processing: Use marker on H100 (~25 pages/s) or mineru with sglang acceleration

For additional support, see implementation details in src/pdfkb/main.py and src/pdfkb/config.py.
