MCP Server Knowledge Engine

by lhstorm

A powerful Model Context Protocol (MCP) server that transforms any PDF document collection into an intelligent, searchable knowledge base accessible through Claude Desktop. This server features advanced search capabilities using TF-IDF scoring, proximity matching, and domain-specific optimization.

🌟 Key Features

🔍 Advanced Search Engine: TF-IDF-based inverted index with proximity matching for highly relevant results
📄 Universal PDF Support: Process any PDF collection - technical docs, legal papers, research, and more
⚡ High Performance: Cached search index, incremental processing, and background initialization
🎯 Domain Optimization: Configure domain-specific keywords for enhanced search accuracy
⚙️ Fully Configurable: JSON-based configuration with environment variable support
🛠️ Comprehensive CLI: Complete server management through intuitive commands
🔗 Seamless MCP Integration: Ready-to-use with Claude Desktop, VS Code, and other MCP clients
📊 Smart Caching: MD5 hash-based change detection for efficient updates

📋 Quick Start

Prerequisites

Python 3.8 or higher
pip (Python package manager)
Claude Desktop app (for MCP integration)

1. Installation

# Clone the repository
git clone https://github.com/lhstorm/mcp_server_knowledge_engine.git
cd mcp_server_knowledge_engine

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Create Your Server

# Interactive setup
python manage_server.py create-config

# This will ask you for:
# - Server name (e.g., 'legal-docs-server')
# - Display name (e.g., 'Legal Documents Server')
# - PDF folder location
# - Domain-specific keywords

3. Add PDF Documents

# Add individual PDFs
python manage_server.py add-pdf /path/to/document.pdf
python manage_server.py add-pdf /path/to/another-doc.pdf

# Or copy PDFs directly to your configured folder

4. Process Documents

# Convert PDFs to searchable format
python manage_server.py process-pdfs

5. Generate MCP Configuration

# Generate configuration for Claude Desktop
python generate_mcp_config.py --merge

# Or get the config to copy manually
python generate_mcp_config.py

6. Start Using with Claude

Restart Claude Desktop and your server will appear in the MCP tools menu!

💬 Using with Claude Desktop

Once configured, you can interact with your PDFs naturally:

Example prompts:

"Search for information about [topic] in the documentation"
"What does the documentation say about [specific feature]?"
"Find all references to [keyword] across all PDFs"
"Show me the content of [document name]"
"List all available documents"

Advanced usage:

"Search for [term1] near [term2]" - Leverages proximity matching
"Get page 15 of [document]" - Retrieves specific pages
"Find the top 10 results for [query]" - Adjusts result count

📁 Project Structure

mcp_server_knowledge_engine/
├── server.py              # Main MCP server with search engine
├── config.py              # Configuration management & validation
├── manage_server.py       # CLI for server management
├── generate_mcp_config.py # MCP configuration generator
├── convert_pdfs.py        # Standalone PDF conversion utility
├── server_config.json     # Active server configuration
├── requirements.txt       # Python dependencies
├── examples/              # Example configurations
│   ├── legal_docs_config.json
│   ├── medical_docs_config.json
│   ├── research_papers_config.json
│   └── tech_docs_config.json
└── your-pdfs/             # Your PDF folder (configurable)
    ├── document1.pdf
    ├── document2.pdf
    └── markdown/          # Auto-generated cache
        ├── .pdf_cache.json      # Processing metadata
        ├── .search_index.pkl    # Cached search index
        ├── document1.md         # Converted documents
        └── document2.md

⚙️ Configuration

The server is configured via server_config.json:

{
  "server": {
    "name": "my-docs-server",
    "display_name": "My Documents Server", 
    "description": "Search through my PDF collection",
    "version": "1.0.0"
  },
  "storage": {
    "pdf_folder": "./docs",
    "markdown_folder": "./docs/markdown",
    "domain_keywords": ["keyword1", "keyword2", "domain-term"]
  },
  "tools": {
    "search": {
      "name": "search_docs",
      "description": "Search through PDF documentation"
    },
    "list": {
      "name": "list_docs", 
      "description": "List all available documents"
    },
    "content": {
      "name": "get_document_content",
      "description": "Get full content from documents"
    },
    "max_results_default": 5
  },
  "processing": {
    "cache_enabled": true,
    "parallel_processing": true,
    "max_file_size_mb": 50,
    "context_size": 500
  }
}
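
To make the shape of this file concrete, here is a minimal sketch of reading it into typed objects with the standard library. The class names, the `parse_config` helper, and the choice of which sections are required are assumptions made for illustration; they are not the project's actual `config.py` API.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class StorageConfig:
    # Hypothetical typed view of the "storage" section
    pdf_folder: str
    markdown_folder: str
    domain_keywords: List[str] = field(default_factory=list)


@dataclass
class ProcessingConfig:
    # Hypothetical typed view of the "processing" section,
    # with defaults copied from the example above
    cache_enabled: bool = True
    parallel_processing: bool = True
    max_file_size_mb: int = 50
    context_size: int = 500


def parse_config(raw: dict):
    """Check that the main sections exist and return typed views of two of them."""
    for section in ("server", "storage", "tools"):  # assumed to be required
        if section not in raw:
            raise ValueError(f"missing required section: {section}")
    storage = StorageConfig(**raw["storage"])
    processing = ProcessingConfig(**raw.get("processing", {}))
    if processing.max_file_size_mb <= 0:
        raise ValueError("max_file_size_mb must be positive")
    return storage, processing


# A trimmed copy of the configuration shown above
SAMPLE = json.loads("""
{
  "server": {"name": "my-docs-server", "display_name": "My Documents Server"},
  "storage": {
    "pdf_folder": "./docs",
    "markdown_folder": "./docs/markdown",
    "domain_keywords": ["keyword1", "keyword2", "domain-term"]
  },
  "tools": {"max_results_default": 5},
  "processing": {"cache_enabled": true, "max_file_size_mb": 50, "context_size": 500}
}
""")

storage, processing = parse_config(SAMPLE)
print(storage.pdf_folder, processing.context_size)
```

For real validation, prefer `python manage_server.py test`, which checks the configuration with the project's own rules.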

🛠️ Management Commands

Server Management

# Create new configuration
python manage_server.py create-config

# Test configuration
python manage_server.py test

# Generate MCP config
python manage_server.py generate-mcp-config

PDF Management

# List all PDFs
python manage_server.py list-pdfs

# Add PDF
python manage_server.py add-pdf document.pdf

# Remove PDF  
python manage_server.py remove-pdf document.pdf

# Process all PDFs
python manage_server.py process-pdfs

MCP Configuration

# Print MCP config
python generate_mcp_config.py

# Automatically merge with Claude Desktop config
python generate_mcp_config.py --merge

# Save to file
python generate_mcp_config.py --output my_mcp_config.json
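
For orientation, the sketch below shows roughly what merging a server entry into a Claude Desktop configuration involves. Claude Desktop reads MCP servers from an `mcpServers` object in its config file; the `build_entry` and `merge_entry` helpers and the example paths are hypothetical, and the exact entry that `generate_mcp_config.py` emits may differ.

```python
import json


def build_entry(python_path: str, server_script: str) -> dict:
    """Build one server entry: the command to run and its arguments."""
    return {"command": python_path, "args": [server_script]}


def merge_entry(existing: dict, name: str, entry: dict) -> dict:
    """Return a copy of the client config with one server added or replaced.

    Other servers already listed under "mcpServers" are preserved.
    """
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config and paths, for illustration only
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["other.js"]}}}
entry = build_entry("/path/to/venv/bin/python", "/path/to/server.py")
print(json.dumps(merge_entry(existing, "my-docs-server", entry), indent=2))
```

Use absolute paths for both the interpreter and the script, since the client does not start the server from your project directory.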

💡 Usage Examples

Legal Documents Server

{
  "server": {
    "name": "legal-docs-server",
    "display_name": "Legal Documents Server"
  },
  "storage": {
    "domain_keywords": ["contract", "liability", "jurisdiction", "plaintiff", "defendant"]
  }
}

Technical Documentation Server

{
  "server": {
    "name": "tech-docs-server", 
    "display_name": "Technical Documentation Server"
  },
  "storage": {
    "domain_keywords": ["API", "function", "class", "method", "parameter", "return"]
  }
}

Research Papers Server

{
  "server": {
    "name": "research-server",
    "display_name": "Research Papers Server"
  },
  "storage": {
    "domain_keywords": ["hypothesis", "methodology", "results", "conclusion", "analysis"]
  }
}

🔧 Available MCP Tools

Each server provides three configurable tools:

Search Tool (default: search_docs)
- Intelligent search through all documents
- TF-IDF scoring with proximity matching
- Returns relevant excerpts with context
List Tool (default: list_docs)
- Lists all available documents
- Shows document metadata and page counts
Content Tool (default: get_document_content)
- Retrieves full document content
- Can fetch specific pages
- Includes complete markdown formatting
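
MCP clients invoke these tools with JSON-RPC `tools/call` requests. The sketch below builds such requests for the default tool names. The argument names (`query`, `max_results`) are assumptions for illustration; check the tool schema that the server advertises for the real parameter names.

```python
import json


def tool_call_request(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names here are hypothetical, not taken from the server's schema
search = tool_call_request(1, "search_docs", {"query": "data privacy", "max_results": 5})
listing = tool_call_request(2, "list_docs", {})
print(json.dumps(search, indent=2))
```

In normal use Claude Desktop builds these requests for you; the sketch is only meant to show what travels over the wire.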

🎯 Domain Customization

The server adapts to your domain through:

Domain Keywords: Configure terms important to your field
Tool Names: Customize tool names (e.g., search_legal_docs)
Descriptions: Tailor descriptions for your use case
Context Size: Adjust how much context to return in search results
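
As an illustration of what a context size setting controls, here is a minimal sketch of snippet extraction. It assumes that `context_size` is measured in characters and split evenly around the first match; the server's actual extraction logic may work differently.

```python
from typing import Optional


def extract_context(text: str, term: str, context_size: int = 500) -> Optional[str]:
    """Return the first match of `term` with surrounding context, or None."""
    pos = text.lower().find(term.lower())
    if pos == -1:
        return None
    half = context_size // 2
    start = max(0, pos - half)
    end = min(len(text), pos + len(term) + half)
    # Mark truncation so the reader can tell the snippet is an excerpt
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix


page = "Keys are rotated monthly. All data is protected with encryption at rest. Backups follow."
print(extract_context(page, "encryption", context_size=40))
```

A larger value gives Claude more surrounding text per result at the cost of longer responses.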

🔍 How the Search Engine Works

Inverted Index Architecture

The server uses an advanced inverted index for lightning-fast searches:

Document Processing: PDFs are converted to markdown and tokenized
Index Building: Words are mapped to their locations (document, page, position)
TF-IDF Scoring:
- TF (Term Frequency): How often a word appears in a document
- IDF (Inverse Document Frequency): How rare a word is across all documents
- Combined score: a document ranks higher when it uses a query term often and that term is rare across the collection
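
The steps above can be sketched in a few lines of Python. This is a simplified illustration, not the `SearchIndex` class from `server.py`: the tokenizer, the function names, and the smoothed IDF formula (`log(N / df) + 1`) are assumptions chosen to keep the example short.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to its locations.

    docs: {document_name: [page_text, ...]}
    returns: ({word: {document_name: [(page, position), ...]}}, {document_name: word_count})
    """
    index = defaultdict(lambda: defaultdict(list))
    lengths = {}
    for name, pages in docs.items():
        total = 0
        for page_no, page_text in enumerate(pages, start=1):
            for pos, word in enumerate(tokenize(page_text)):
                index[word][name].append((page_no, pos))
                total += 1
        lengths[name] = total
    return index, lengths


def tfidf_scores(query, index, lengths):
    """Rank documents by the summed TF-IDF of the query words they contain."""
    n_docs = len(lengths)
    scores = defaultdict(float)
    for word in tokenize(query):
        postings = index.get(word)
        if not postings:
            continue
        idf = math.log(n_docs / len(postings)) + 1.0  # rarer words weigh more
        for name, hits in postings.items():
            tf = len(hits) / lengths[name]  # share of the document's words
            scores[name] += tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "security.pdf": ["encryption protects data", "encryption keys rotate"],
    "privacy.pdf": ["data privacy policy"],
    "guide.pdf": ["getting started guide"],
}
index, lengths = build_index(docs)
print(tfidf_scores("encryption data", index, lengths))
```

Because the index stores page and position for every occurrence, the same structure can also support page-level results and proximity checks.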

Search Features

Proximity Boosting: Multi-word queries score higher when terms appear close together
Context Extraction: Returns relevant snippets with search terms highlighted
Domain Keyword Recognition: Configured keywords get special treatment
Page-Level Precision: Results include specific page numbers
Smart Caching: Search index persists between server restarts
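
Proximity boosting can be illustrated with word positions for two query terms in the same document. The sketch below is one possible scheme under stated assumptions: the `window` of 10 words and the linear boost formula are invented for this example and are not taken from `server.py`.

```python
def min_gap(positions_a, positions_b):
    """Smallest distance between a position in each sorted list, or None if either is empty."""
    i = j = 0
    best = None
    while i < len(positions_a) and j < len(positions_b):
        gap = abs(positions_a[i] - positions_b[j])
        if best is None or gap < best:
            best = gap
        # Advance whichever list is behind to look for a closer pair
        if positions_a[i] < positions_b[j]:
            i += 1
        else:
            j += 1
    return best


def proximity_boost(base_score, positions_a, positions_b, window=10):
    """Scale the score up when the two terms occur within `window` words of each other."""
    gap = min_gap(positions_a, positions_b)
    if gap is None or gap > window:
        return base_score
    return base_score * (1.0 + (window - gap + 1) / window)


# "privacy" at words 3 and 40, "encryption" at words 5 and 90: closest pair is 2 apart
print(proximity_boost(1.0, [3, 40], [5, 90]))
```

With this scheme, adjacent terms double the base score and the boost fades to nothing beyond the window.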

📊 Performance Optimizations

Incremental Processing: MD5 hash-based change detection - only new/modified PDFs are processed
Persistent Search Index: The pickled index is loaded from disk on server restart instead of being rebuilt
Background Initialization: Server accepts connections while building index
Memory Efficiency: Streaming PDF processing and markdown storage
Configurable Limits: Control file size limits and processing parameters
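
Hash-based change detection, as used for incremental processing, can be sketched as follows. The cache layout (a mapping from file path to MD5 hex digest) and the function names are assumptions for illustration; the real `.pdf_cache.json` format may store more metadata.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large PDFs are never read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_needing_processing(pdf_paths, cache):
    """Return (paths that are new or modified, updated cache).

    cache: {path: md5_hex} from the previous run (assumed format).
    """
    changed = []
    new_cache = dict(cache)
    for path in pdf_paths:
        file_hash = md5_of_file(path)
        if cache.get(path) != file_hash:
            changed.append(path)
        new_cache[path] = file_hash
    return changed, new_cache
```

On the first run every file counts as changed; on later runs only files whose content differs from the cached hash are converted again. MD5 is adequate here because the goal is detecting edits, not resisting tampering.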

🐛 Troubleshooting

Common Issues & Solutions

Server not appearing in Claude Desktop:

Ensure MCP configuration was merged: python generate_mcp_config.py --merge
Check Python path: which python or where python (Windows)
Verify server_config.json exists and is valid JSON
Restart Claude Desktop after configuration changes

PDFs not processing:

Check folder permissions: ls -la /path/to/pdf/folder
Verify PDF files aren't corrupted: file document.pdf
Look for errors in stderr: python server.py 2>error.log
Ensure sufficient disk space for markdown cache

Search returns no/poor results:

Initial indexing may take time - check stderr for progress
Verify markdown files exist: ls markdown/*.md
Check search index exists: ls markdown/.search_index.pkl
Try single-word queries first, then expand
Review domain keywords in configuration

Server crashes or hangs:

Check Python version (3.8+ required): python --version
Verify all dependencies installed: pip install -r requirements.txt
Clear cache and reprocess: rm -rf markdown/.pdf_cache.json markdown/.search_index.pkl
Check for file locking issues on Windows

Debug Mode

# Run with full debug output
python server.py 2>&1 | tee debug.log

# Check server initialization
grep "initialization" debug.log

# Monitor PDF processing
grep "Processing\|Error" debug.log

Validation Commands

# Test configuration validity
python manage_server.py test

# Verify configuration loading
python -c "from config import load_config_from_env_or_file; c=load_config_from_env_or_file(); print(f'✓ Config loaded: {c.server.name}')"

# Check MCP integration
python generate_mcp_config.py  # Should output valid JSON

🚀 Advanced Usage

Multiple Servers

You can run multiple specialized servers:

# Legal documents server
python manage_server.py --config legal_config.json create-config

# Technical docs server  
python manage_server.py --config tech_config.json create-config

# Research papers server
python manage_server.py --config research_config.json create-config

Batch Processing

# Process multiple PDF folders
for folder in docs legal_docs tech_docs; do
    python convert_pdfs.py "$folder" "$folder/markdown"
done

Custom Keywords

Configure domain-specific keywords for better search relevance:

{
  "storage": {
    "domain_keywords": [
      "algorithm", "data structure", "complexity",
      "optimization", "performance", "scalability"
    ]
  }
}

🏗️ Architecture Overview

Core Components

SearchIndex Class (server.py:27-140)
- Implements inverted index with TF-IDF scoring
- Handles word tokenization and document indexing
- Provides proximity-based ranking for multi-word queries
GenericPDFServer Class (server.py:142-661)
- Main server implementation with MCP protocol handling
- Manages PDF processing pipeline
- Handles async operations and background initialization
Configuration System (config.py)
- Dataclass-based type-safe configuration
- JSON schema validation
- Environment variable support
Management CLI (manage_server.py)
- Interactive configuration creation
- PDF management operations
- Server testing and validation

Data Flow

PDFs → PDF Reader → Markdown Converter → Search Index → MCP Tools → Claude
         ↓                    ↓                ↓
    [.pdf files]      [.md cache files]  [.search_index.pkl]

🔄 Current Server Configuration

The repository currently includes a configuration for QuantConnect documentation (server_config.json). To create your own server:

# Option 1: Interactive setup
python manage_server.py create-config

# Option 2: Copy and modify an example
cp examples/tech_docs_config.json server_config.json
# Edit server_config.json with your settings

📚 Example Use Cases

Legal Firms: Search through contracts, case files, and legal documents
Research Labs: Query scientific papers and technical reports
Software Teams: Access API documentation and technical specs
Medical Practices: Search patient records and medical literature
Educational Institutions: Browse course materials and textbooks

🤝 Contributing

We welcome contributions! Here are some ways to help:

Enhancement Ideas

Document Format Support: Add support for Word, HTML, or other formats
Search Improvements: Implement semantic search, fuzzy matching, or ML-based ranking
Performance: Add database backend, parallel processing, or distributed indexing
Tools: Create specialized MCP tools for specific domains
UI: Build a web interface for configuration management

Development Guidelines

Follow existing code style and patterns
Add tests for new functionality
Update documentation for new features
Submit PRs with clear descriptions

🔐 Security Considerations

The server reads PDFs only from the folders you configure and writes its markdown cache and search index to the configured markdown folder
No external network calls are made during operation
Sensitive data remains local - nothing is sent to external services
Configure appropriate file permissions for your PDF folders

📄 License

This project is open source. See LICENSE file for details.

🙏 Acknowledgments

Built with the Model Context Protocol by Anthropic.

Ready to transform your PDFs into a searchable knowledge base?

Run python manage_server.py create-config to get started! 🚀

📦 Dependencies

mcp: Model Context Protocol SDK for building MCP servers
PyPDF2: PDF parsing and text extraction
asyncio: Asynchronous I/O for concurrent operations (part of the Python standard library, so no separate install is needed)
jsonschema: JSON validation for configuration files

All dependencies are lightweight and have minimal system requirements.
