PDF Knowledgebase MCP Server

by juanqui

A powerful Model Context Protocol (MCP) server that transforms your PDF and Markdown document collection into an intelligent, searchable knowledge base. Built for seamless integration with Claude Desktop, VS Code, Continue, and other MCP-enabled AI assistants.

Description

pdfkb-mcp processes your documents using advanced PDF parsing, creates semantic embeddings, and provides sophisticated search capabilities through the Model Context Protocol. Whether you're managing research papers, technical documentation, or business reports, pdfkb-mcp makes your document collection instantly searchable and accessible to your AI assistant.

Motivation

I built pdfkb-mcp because I needed a way to efficiently index and search through hundreds of semiconductor datasheets and technical documents. Traditional file search wasn't sufficient—I needed semantic understanding, context preservation, and the ability to ask complex questions about technical specifications across multiple documents. This tool has transformed how I work with technical documentation, and I'm sharing it so others can benefit from intelligent document search in their workflows.

✨ Features

🤖 Intelligent Document Processing

  • Multiple PDF Parsers: PyMuPDF4LLM (fast), Marker (balanced), Docling (tables), MinerU (academic), LLM (complex layouts)
  • Markdown Support: Native processing of .md and .markdown files with metadata extraction
  • Smart Chunking: LangChain, semantic, page-based, and unstructured chunking strategies
  • Background Processing: Non-blocking document processing with intelligent caching
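
To make the page-based strategy concrete, here is a minimal sketch of what page-level chunking can look like. It is not pdfkb-mcp's actual implementation: the `Chunk` type and the `chunk_by_page` function are hypothetical names used only for illustration, and the input is assumed to be one string per page as produced by any of the parsers above.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One chunk of text plus the 1-based page it came from (hypothetical type)."""
    page: int
    text: str


def chunk_by_page(pages: list[str], min_chars: int = 1) -> list[Chunk]:
    """Emit one chunk per page, skipping pages that are empty after stripping."""
    chunks = []
    for number, text in enumerate(pages, start=1):
        cleaned = text.strip()
        if len(cleaned) >= min_chars:
            chunks.append(Chunk(page=number, text=cleaned))
    return chunks


chunks = chunk_by_page(["Introduction", "   ", "Register map"])
print([c.page for c in chunks])
```

Keeping the page number on each chunk is what lets a search result point back to a specific page of a datasheet.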

🔍 Advanced Search & AI

  • Hybrid Search: Combines semantic similarity with keyword matching (BM25) for superior results
  • AI Reranking: Qwen3-Reranker models improve search relevance by 15-30%
  • Local & Remote Embeddings: Privacy-focused local models or high-performance API-based options
  • Document Summarization: Auto-generates rich metadata with titles, descriptions, and summaries
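
As an illustration of how hybrid search can merge a semantic ranking with a BM25 keyword ranking, the sketch below uses reciprocal rank fusion (RRF). RRF is an assumption chosen for the example, and pdfkb-mcp may combine the two result lists differently. The function name, the constant `k = 60`, and the chunk IDs are all illustrative.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of chunk IDs into one list of (id, score).

    Each list contributes 1 / (k + rank) per item, so chunks that rank well
    in more than one list rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


semantic_hits = ["chunk-a", "chunk-b", "chunk-c"]  # hypothetical vector-search order
keyword_hits = ["chunk-b", "chunk-d", "chunk-a"]   # hypothetical BM25 order
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print([chunk_id for chunk_id, _ in fused])
```

Rank-based fusion avoids having to normalize cosine similarities and BM25 scores onto a common scale, which is why it is a common default for this kind of merge.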

🌐 Multi-Client & Remote Access

  • MCP Protocol Support: Works with Claude Desktop, VS Code, Continue, Cline, and other MCP clients
  • Web Interface: Modern web UI for document management, search, and analysis
  • HTTP/SSE Transport: Remote access from multiple clients simultaneously
  • Docker Deployment: Production-ready containerized deployment

🔒 Privacy & Performance

  • Local-First Option: Run completely offline with local embeddings—no API costs, full privacy
  • Quantized Models: GGUF models use 50-70% less memory with maintained quality
  • Best Practices: Background processing, health checks, monitoring, and scalability

🌐 Web Interface Preview

Once your setup is complete, you'll have access to a modern web interface for document management and search:

[Screenshot: PDF Knowledgebase Web Interface]

The web interface provides document upload, real-time processing status, semantic search, and comprehensive document management capabilities.

Key Features:

  • 🔍 Real-time Search: Instant semantic and hybrid search
  • 📊 Processing Status: Live updates on document processing
  • 📈 Document Analytics: View chunks, metadata, and summaries
  • ⚙️ System Monitoring: Server performance and resource usage

🚀 Quick Start

Get up and running in minutes using Docker/Podman with DeepInfra as your AI provider.

Prerequisites

  • Container Runtime: Docker or Podman installed
  • DeepInfra API Key: Get your free key (recommended for cost-effectiveness)
  • Documents: A folder with PDF or Markdown files to index

1. Set Up Docker Compose

    # Download configuration and create directories
    curl -o docker-compose.yml https://raw.githubusercontent.com/juanqui/pdfkb-mcp/main/docker-compose.sample.yml
    mkdir -p ./documents ./cache ./logs

    # Edit docker-compose.yml and update:
    # 1. Volume path: "/path/to/your/documents:/app/documents:rw"
    # 2. API key: PDFKB_OPENAI_API_KEY: "your-deepinfra-api-key-here"

2. Start the Server

    # Using Podman (recommended)
    podman-compose up -d

    # Or using Docker
    docker compose up -d

Access Points:

  • Web Interface: http://localhost:8000
  • MCP Endpoint: http://localhost:8000/mcp/
  • Health Check: http://localhost:8000/health
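
Containers can take a moment to become ready, so a script may want to poll the health check before continuing. The sketch below assumes the endpoint answers with HTTP 200 once the server is up, which the documentation above does not state explicitly. `http_status` and `wait_until_healthy` are hypothetical helper names, not part of pdfkb-mcp.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, ...


def wait_until_healthy(
    url: str,
    attempts: int = 10,
    delay: float = 2.0,
    probe: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns 200 (assumed to mean healthy) or attempts run out."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example call (needs the server from step 2 to be running):
# wait_until_healthy("http://localhost:8000/health")
```

The `probe` and `sleep` parameters exist only so the loop can be exercised without a network connection.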

3. Configure Your MCP Client

Claude Desktop - Add to claude_desktop_config.json:

    {
      "mcpServers": {
        "pdfkb": {
          "transport": "http",
          "url": "http://localhost:8000/mcp/"
        }
      }
    }

VS Code with Continue - Add to .continue/config.json:

    {
      "mcpServers": {
        "pdfkb": {
          "transport": "http",
          "url": "http://localhost:8000/mcp/"
        }
      }
    }
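
If your client config already lists other MCP servers, the `pdfkb` entry should be merged in rather than pasted over the file. A minimal sketch of that merge follows. `add_pdfkb_server` is a hypothetical helper and the `other` server entry is made up for the example. Only the `pdfkb` entry itself comes from the configuration shown above.

```python
import json


def add_pdfkb_server(config: dict, url: str = "http://localhost:8000/mcp/") -> dict:
    """Return a copy of config with the pdfkb entry added under "mcpServers".

    Servers that are already configured are kept; the input dict is not mutated.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["pdfkb"] = {"transport": "http", "url": url}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server already present.
existing = {"mcpServers": {"other": {"command": "example"}}}
print(json.dumps(add_pdfkb_server(existing), indent=2))
```

In practice you would load the existing file with `json.load`, apply the merge, and write the result back with `json.dump`.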

4. Add Your Documents

  • Web Interface: Open http://localhost:8000 and upload your documents there
  • File System: Copy files to your documents directory — they're automatically detected
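
Automatic detection can be pictured as a repeated scan of the documents directory that reports supported files it has not seen before. The sketch below is only an illustration of that idea: how pdfkb-mcp actually watches the directory is not described here, and `is_supported`, `list_files`, and `new_documents` are hypothetical names. The supported extensions are taken from the feature list above.

```python
from pathlib import Path, PurePath
from typing import Iterable

# Extensions named in the feature list: PDF plus .md and .markdown files.
SUPPORTED_SUFFIXES = {".pdf", ".md", ".markdown"}


def is_supported(name: str) -> bool:
    """True if the file name has a supported extension (case-insensitive)."""
    return PurePath(name).suffix.lower() in SUPPORTED_SUFFIXES


def list_files(root: Path) -> list[str]:
    """Return all files under root as sorted paths relative to root."""
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())


def new_documents(listing: Iterable[str], seen: set[str]) -> list[str]:
    """Return supported files not seen on earlier scans and remember them."""
    fresh = [name for name in listing if is_supported(name) and name not in seen]
    seen.update(fresh)
    return fresh


seen: set[str] = set()
print(new_documents(["a.pdf", "notes.MD", "image.png"], seen))
print(new_documents(["a.pdf", "notes.MD", "b.markdown"], seen))
```

On a real directory, each scan would pass `list_files(Path("./documents"))` as the listing.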

5. Start Searching

Ask your AI assistant to search your documents:

  • "What register do I need to configure to reset charging in the nPM1300?"
  • "Is XYZ a clock capable pin according to the nRF54L15 datasheet?"
  • "What is the conversion formula to interpret temperature as Celsius according to the XYZ datasheet?"

The setup includes:

  • DeepInfra AI: Cost-effective embeddings, reranking, and document summarization
  • Hybrid Search: Semantic + keyword matching
  • Document Summarization: Auto-generated metadata (e.g., title, description)
  • Web Interface: Document management UI
  • Persistent Storage: Documents and cache preserved

📚 User Guide

For complete documentation, configuration options, and advanced features:

👉 View the Complete User Guide

License

This project is licensed under the MIT License - see the LICENSE file for details.
