MCP Document Search System
A vector search system for document retrieval using MongoDB Atlas Vector Search and Voyage AI embeddings.
Sample data included is for Atlas Vector Search!
Features
- Ingests and chunks markdown documents with hierarchical headers
- Generates embeddings using Voyage AI's contextual embeddings API
- Stores documents and embeddings in MongoDB with parent-child relationships
- Provides a FastMCP server for semantic document search
- Supports configurable vector dimensions and chunking strategies
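The header-aware chunking and parent-child linking described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual ingestion code: the `Chunk` class, the `chunk_markdown` function, and the `parent_id` and `header_path` field names are all assumptions made for the example, and `#` lines inside fenced code blocks are not handled.

```python
import re
from dataclasses import dataclass

# Matches ATX-style markdown headers such as "## Setup".
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    parent_id: str      # hypothetical link back to the full source document
    header_path: list   # header trail, e.g. ["Guide", "Setup"]
    text: str


def chunk_markdown(markdown: str, parent_id: str) -> list:
    """Split markdown at headers, tagging each chunk with its header trail."""
    chunks = []
    trail = []    # (level, title) pairs for the current position in the document
    buffer = []   # body lines collected since the last header

    def flush():
        body = "\n".join(buffer).strip()
        if body:
            chunks.append(Chunk(parent_id, [title for _, title in trail], body))
        buffer.clear()

    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Leave any sections at the same or a deeper level.
            while trail and trail[-1][0] >= level:
                trail.pop()
            trail.append((level, match.group(2)))
        else:
            buffer.append(line)
    flush()
    return chunks
```

Each chunk keeps its header trail so the surrounding structure can travel with the text when it is embedded, and the shared `parent_id` lets a search hit be traced back to the complete document.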
Available MCP Tools
The document search server provides these tools:
- search_documents_vector(query: str, limit: int = 5)
  - Primary search method using vector similarity
  - Returns document chunks with metadata and similarity scores
  - Best for semantic/meaning-based queries
- search_documents_lexicaly(query: str, limit: int = 1)
  - Fallback search using lexical/text matching
  - Returns full parent documents with search scores
  - Useful when vector search doesn't find good matches
- get_parent_document(parent_id: str)
  - Retrieves the complete parent document by ID
  - Returns original content and file path
  - Use after search to get full context for a chunk
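For orientation, a vector similarity search on Atlas is typically expressed as an aggregation pipeline that starts with a `$vectorSearch` stage. The sketch below only builds such a pipeline as plain Python data; it is not taken from this project's server code. The index name `vector_index`, the field names `embedding`, `content`, and `parent_id`, and the `numCandidates` heuristic are assumptions chosen for illustration.

```python
def build_vector_search_pipeline(query_vector, limit=5,
                                 index_name="vector_index",  # assumed index name
                                 path="embedding"):          # assumed vector field
    """Build an Atlas $vectorSearch aggregation pipeline as plain dicts."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                # Consider more candidates than are returned; this ratio
                # is an arbitrary illustrative choice.
                "numCandidates": max(limit * 20, 100),
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "content": 1,     # hypothetical chunk text field
                "parent_id": 1,   # hypothetical link to the parent document
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With a driver such as PyMongo, a pipeline like this would be passed to `collection.aggregate(...)`, and the `parent_id` on each hit could then be handed to a lookup like get_parent_document.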
Prerequisites
- Python 3.10+
- MongoDB Atlas cluster with vector search enabled
- Voyage AI API key
Installation
- Clone the repository:
- Install dependencies:
- Create a .env file based on sample.env with your credentials
Usage
- Ingest documents in the docs/ directory:
- Run the search server:
Running the search server on its own does little beyond verifying that your MongoDB URI is correct. To use it, plug this MCP server into an MCP client such as Claude Desktop. Here's a sample config:
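As a rough sketch of the general shape such an entry takes, Claude Desktop reads MCP servers from an `mcpServers` object in its `claude_desktop_config.json`. The snippet below generates one entry with Python; the server name `document-search`, the `python` command, and the script path `/path/to/search_server.py` are placeholders, not values from this repository.

```python
import json


def claude_desktop_entry(server_script, python_cmd="python",
                         name="document-search"):
    """Return a config dict with one MCP server entry (all values are placeholders)."""
    return {
        "mcpServers": {
            name: {
                "command": python_cmd,
                "args": [server_script],
            }
        }
    }


# Print JSON that could be merged into claude_desktop_config.json.
print(json.dumps(claude_desktop_entry("/path/to/search_server.py"), indent=2))
```

Replace the placeholder path with the location of the server script in your clone before merging the output into the client's config file.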
Configuration
Copy sample.env to .env and edit it to configure:
- MongoDB connection string
- Database and collection names
- Voyage AI API key
- Vector dimensions (256 default)
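To illustrate how settings like these could be read, here is a minimal standard-library sketch that parses `.env`-style text and applies the 256-dimension default. The variable names (`MONGODB_URI`, `MONGODB_DATABASE`, `MONGODB_COLLECTION`, `VOYAGE_API_KEY`, `VECTOR_DIMENSIONS`) and the fallback database and collection names are assumptions; check sample.env for the names the project actually uses.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(env):
    """Map raw variables to typed settings (variable names are hypothetical)."""
    return {
        "mongodb_uri": env["MONGODB_URI"],
        "database": env.get("MONGODB_DATABASE", "docs"),
        "collection": env.get("MONGODB_COLLECTION", "chunks"),
        "voyage_api_key": env["VOYAGE_API_KEY"],
        # 256 matches the default vector dimension stated above.
        "vector_dimensions": int(env.get("VECTOR_DIMENSIONS", "256")),
    }
```

Keeping parsing separate from typing makes it easy to fail early on a missing connection string or API key while still falling back to defaults for optional values.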
Future Improvements
- Implement hybrid search combining vector and text search using $rankFusion (when MongoDB 8.1 is GA on Atlas)
- Support additional file formats (PDF, Word, etc.) with Docling
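As background on the hybrid search idea, rank fusion merges several ranked result lists into one. The sketch below shows reciprocal rank fusion performed client-side in Python, which is the general technique behind this style of merging; it is not MongoDB's implementation and not part of this project. The function name and the constant `k=60` (a conventional choice) are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs into (doc_id, score) pairs, best first."""
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            # Each list contributes 1 / (k + rank) for every ID it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A document that appears in both the vector and the lexical results accumulates score from each list, so it tends to rank above documents found by only one method.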
Contributing
Pull requests are welcome! For major changes, please open an issue first.
Author
Pat Wendorf
pat.wendorf@mongodb.com
GitHub: patw
License