
MCP VectorStore Server

A Model Context Protocol (MCP) server that provides advanced vector store operations for document search, PDF processing, and information retrieval. This server wraps the functionality from vectorstore.py into a standardized MCP interface.

Features

  • Vector Store Operations: Create, search, and manage document vector stores
  • PDF Processing: Extract and index content from PDF documents using LLMSherpa
  • Semantic Search: Advanced document search using HuggingFace embeddings
  • Web Search Integration: Google, Wikipedia, and DuckDuckGo search capabilities
  • File Operations: Read and process local files
  • Mathematical Calculations: Built-in calculator functionality

Prerequisites

System Requirements

  • Python: 3.8 or higher
  • Operating System: Linux, macOS, or Windows
  • Memory: Minimum 4GB RAM (8GB+ recommended for large document collections)
  • Storage: At least 2GB free space for models and vector stores
  • Network: Internet connection for downloading models and web searches

Optional GPU Support

For improved performance with large document collections:

  • CUDA: 11.8 or higher
  • GPU: NVIDIA GPU with 4GB+ VRAM
  • cuDNN: Compatible version for your CUDA installation

Installation

Step 1: Clone or Download the Repository

# If you have the files locally, navigate to the directory
cd /path/to/McpDocServer

# Or clone from a repository (if available)
# git clone <repository-url>
# cd McpDocServer

Step 2: Create a Virtual Environment

# Create a virtual environment
python3 -m venv venv

# Activate the virtual environment
# On Linux/macOS:
source venv/bin/activate

# On Windows:
# venv\Scripts\activate

Step 3: Install Dependencies

# Upgrade pip
pip install --upgrade pip

# Install all required packages
pip install -r requirements.txt

Step 4: Install LLMSherpa (Optional but Recommended)

For optimal PDF processing, install LLMSherpa locally:

# Install LLMSherpa
pip install llmsherpa

# Start the LLMSherpa server (in a separate terminal)
llmsherpa --port 5001
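Before pointing the server at the local API, it can help to verify that the LLMSherpa endpoint is actually answering. A minimal stdlib check (a hypothetical helper, not part of this project; adjust the URL to your setup):

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def llmsherpa_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url` within `timeout` seconds."""
    try:
        urlopen(url, timeout=timeout)
        return True
    except HTTPError:
        # The server answered, just with an HTTP error status.
        return True
    except URLError:
        # Connection refused or timed out: nothing is listening.
        return False

if __name__ == "__main__":
    print(llmsherpa_reachable("http://localhost:5001/"))
```

If this prints `False`, start the LLMSherpa server as shown above (or fall back to the cloud API URL in the configuration).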

Step 5: Download Embedding Models

The server will automatically download the required embedding model on first use, but you can pre-download it:

# Download the embedding model
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-mpnet-base-v2')"

Configuration

Environment Variables

Create a .env file in the project directory:

# LLMSherpa API URL (use local if available, otherwise cloud)
LLMSHERPA_API_URL=http://localhost:5001/api/parseDocument?renderFormat=all

# Vector store directory
VECTORSTORE_DIR=/path/to/your/documents

# User agent for web scraping
USER_AGENT=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36

# Optional: CUDA device for GPU acceleration
CUDA_VISIBLE_DEVICES=0
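Inside the server, these values would typically be read from the environment with stdlib fallbacks. A sketch, using the variable names and defaults from the .env example above (a loader such as python-dotenv would populate os.environ from the file first; the stdlib itself only reads the environment):

```python
import os

# Fall back to the documented defaults when a variable is not set.
LLMSHERPA_API_URL = os.getenv(
    "LLMSHERPA_API_URL",
    "http://localhost:5001/api/parseDocument?renderFormat=all",
)
VECTORSTORE_DIR = os.getenv("VECTORSTORE_DIR", ".")
USER_AGENT = os.getenv("USER_AGENT", "Mozilla/5.0")

print(LLMSHERPA_API_URL)
```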

Directory Structure

Prepare your document directory:

your_documents/
├── pdfs/
│   ├── document1.pdf
│   ├── document2.pdf
│   └── ...
├── text_files/
│   ├── notes.txt
│   └── ...
└── other_documents/
    └── ...
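As an illustration of how a directory like this can be scanned, the following sketch collects PDF and text files recursively (illustrative only; the actual indexing logic lives in vectorstore.py):

```python
from pathlib import Path
from typing import List, Tuple

def collect_documents(root: str, exts: Tuple[str, ...] = (".pdf", ".txt")) -> List[Path]:
    """Recursively gather files under `root` whose suffix matches `exts`."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in exts
    )
```

For example, `collect_documents("your_documents/")` would return all PDFs and text files in a deterministic order, ready to be handed to the indexer.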

Usage

Starting the MCP Server

# Make the server executable
chmod +x mcp_vectorstore_server.py

# Start the server on Linux:
python /home/em/McpDocServer/mcp_vectorstore_server.py

# Or on Windows with WSL:
wsl -d Ubuntu-24.04 bash -c "/mnt/c/Users/emanu/Desktop/McpDocServer/start_mcp.sh"

Using with MCP Clients

0. Claude Desktop

Add to your MCP configuration:

{
  "mcpServers": {
    "vectorstore": {
      "command": "python",
      "args": ["/home/em/McpDocServer/mcp_vectorstore_server.py"],
      "env": {
        "PYTHONPATH": "/home/em/McpDocServer/McpDocServer"
      }
    }
  }
}
1. GitHub Copilot
  1. Click Configure Tools in the GitHub Copilot Chat window.
  2. Click Add More Tools in the top search bar.
  3. Click Add MCP Server in the top search bar.
  4. Click command (stdio) in the top search bar.
  5. Enter the command to run:
  6. python /home/em/McpDocServer/mcp_vectorstore_server.py (or on Windows: wsl -d Ubuntu-24.04 /mnt/c/Users/emanu/Desktop/McpDocServer/start_mcp.sh)
  7. Enter the MCP server id/name, e.g. McpDocServer-19be5552.
  8. Configure settings.json:
{
  "security.workspace.trust.untrustedFiles": "open",
  "python.defaultInterpreterPath": "/mnt/c/Users/emanu/Desktop/LLM/venv/venv/bin/python",
  "terminal.integrated.inheritEnv": false,
  "git.openRepositoryInParentFolders": "never",
  "terminal.integrated.scrollback": 100000,
  "mcp": {
    "servers": {
      "McpDocServer-19be5552": {
        "type": "stdio",
        "command": "python",
        "args": [
          "/mnt/c/Users/emanu/Desktop/McpDocServer/mcp_vectorstore_server.py"
        ]
      }
    }
  }
}
  9. Click Configure Tools in the GitHub Copilot Chat window, scroll to the bottom, and check that the following tools appear in the MCP server tool list: vectorstore_search, vectorstore_create, vectorstore_info, vectorstore_clear, read_file, google_search, wikipedia_search, duckduckgo_search, calculate.
  10. Select Agent mode in the GitHub Copilot Chat window and use vectorstore_search to get information, e.g.: "use vectorstore_search to get information on unit testing".
  11. Confirm the tool call when prompted.
2. Continue MCP Client
name: McpDocServer
version: 1.0.1
schema: v1
mcpServers:
  - name: McpDocServer
    command: wsl -d Ubuntu-24.04
    args:
      - "/mnt/c/Users/emanu/Desktop/McpDocServer/start_mcp.sh"
    env: {}
    mcp_timeout: 180          # set timeout to 180 sec
    timeout: 9999
    connectionTimeout: 120000 # 120 seconds = 2 minutes
3. Other MCP Clients

Configure your MCP client to use the server:

# Example with a generic MCP client
mcp-client --server python --args /path/to/McpDocServer/mcp_vectorstore_server.py
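Under the hood, MCP clients talk to a stdio server using JSON-RPC 2.0. As a sketch (the message shape follows the MCP specification; the argument values are examples), a tools/call request for vectorstore_search looks like this:

```python
import json

# JSON-RPC 2.0 request an MCP client would write to the server's stdin
# to invoke the vectorstore_search tool (illustrative values).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "vectorstore_search",
        "arguments": {"query": "machine learning algorithms", "k": 5},
    },
}

# Serialize for the wire; framing (newline-delimited vs. Content-Length
# headers) depends on the transport the client uses.
wire = json.dumps(request)
print(wire)
```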

Available Tools

Vector Store Operations

vectorstore_search

Search the vector store for relevant documents.

Parameters:

  • query (string, required): Search query
  • k (integer, optional): Number of results (default: 2)

Example:

{
  "name": "vectorstore_search",
  "arguments": {
    "query": "machine learning algorithms",
    "k": 5
  }
}
vectorstore_create

Create a new vector store from documents in a directory.

Parameters:

  • directory_path (string, required): Path to directory containing documents

Example:

{
  "name": "vectorstore_create",
  "arguments": {
    "directory_path": "/home/user/documents/research_papers"
  }
}
vectorstore_info

Get information about the current vector store.

Example:

{
  "name": "vectorstore_info",
  "arguments": {}
}
vectorstore_clear

Clear all documents from the vector store.

Example:

{
  "name": "vectorstore_clear",
  "arguments": {}
}

File Operations

read_file

Read the contents of a file on the system.

Parameters:

  • filename (string, required): Path to the file to read

Example:

{
  "name": "read_file",
  "arguments": {
    "filename": "/home/user/documents/notes.txt"
  }
}

Web Search Operations

google_search

Search Google for information.

Parameters:

  • query (string, required): Search query
  • max_results (integer, optional): Maximum number of results (default: 3)

Example:

{
  "name": "google_search",
  "arguments": {
    "query": "latest AI developments 2024",
    "max_results": 5
  }
}

wikipedia_search

Search Wikipedia for information.

Parameters:

  • query (string, required): Search query

Example:

{
  "name": "wikipedia_search",
  "arguments": {
    "query": "artificial intelligence"
  }
}

duckduckgo_search

Search DuckDuckGo for information.

Parameters:

  • query (string, required): Search query

Example:

{
  "name": "duckduckgo_search",
  "arguments": {
    "query": "privacy-focused search engines"
  }
}

Utility Operations

calculate

Perform mathematical calculations.

Parameters:

  • operation (string, required): Mathematical operation to perform

Example:

{
  "name": "calculate",
  "arguments": {
    "operation": "2 + 2 * 3"
  }
}
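A common way to implement such a calculator safely in Python is to walk the expression's AST and permit only arithmetic nodes, instead of calling eval() on untrusted input. This is a sketch of that technique, not necessarily how vectorstore.py implements it:

```python
import ast
import operator

# Operators the restricted evaluator is allowed to apply.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.Mod: operator.mod,
    ast.USub: operator.neg, ast.UAdd: operator.pos,
}

def safe_calculate(expression: str) -> float:
    """Evaluate a pure-arithmetic expression without using eval()."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")
    return _eval(ast.parse(expression, mode="eval"))

print(safe_calculate("2 + 2 * 3"))  # → 8
```

Anything outside plain arithmetic (names, calls, attribute access) raises ValueError rather than being executed.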

Resources

The server provides the following resources:

vectorstore://info

Returns information about the current vector store in JSON format.

Example Response:

{
  "num_documents": 150,
  "directory": "/home/user/documents",
  "embeddings_model": "sentence-transformers/all-mpnet-base-v2"
}

Troubleshooting

Common Issues

1. Import Errors

Problem: ModuleNotFoundError for various packages
Solution: Ensure all dependencies are installed:

pip install -r requirements.txt
2. CUDA/GPU Issues

Problem: CUDA-related errors
Solution: Install CPU-only versions:

pip uninstall faiss-gpu torch
pip install faiss-cpu
3. LLMSherpa Connection Issues

Problem: Cannot connect to the LLMSherpa API
Solution:

  • Start LLMSherpa server: llmsherpa --port 5001
  • Or use cloud API by updating the URL in the code
4. Memory Issues

Problem: Out-of-memory errors with large documents
Solution:

  • Reduce chunk size in the text splitter
  • Use smaller embedding models
  • Process documents in batches
5. Permission Issues

Problem: Cannot read files or directories
Solution: Check file permissions:

chmod 644 /path/to/documents/*
chmod 755 /path/to/documents/

Performance Optimization

For Large Document Collections
  1. Use GPU acceleration:
    # In vectorstore.py, ensure CUDA is enabled
    model_kwargs={'device': 'cuda'}
  2. Optimize chunk size:
    # Adjust in PDFVectorStoreTool.__init__
    chunk_size=1000,    # Smaller chunks for better performance
    chunk_overlap=100,
  3. Batch processing:
    # Process documents in smaller batches
    batch_size = 10
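The batching idea above can be sketched as a generator that yields fixed-size slices of the document list (the batch size and document names here are illustrative):

```python
from typing import Iterator, List

def batched(items: List, batch_size: int = 10) -> Iterator[List]:
    """Yield successive fixed-size slices of `items`."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical usage: index documents one batch at a time to bound memory.
docs = [f"doc_{i}.pdf" for i in range(25)]
batches = list(batched(docs, batch_size=10))
print([len(b) for b in batches])  # → [10, 10, 5]
```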
For Better Search Results
  1. Adjust similarity threshold:
    # In vectorstore_search method
    similarity_threshold = 0.7
  2. Use different embedding models:
    # Try different models for better results
    model_name="sentence-transformers/all-MiniLM-L6-v2"   # Faster
    model_name="sentence-transformers/all-mpnet-base-v2"  # Better quality

Development

Project Structure

McpDocServer/
├── mcp_vectorstore_server.py   # Main MCP server
├── vectorstore.py              # Original vectorstore implementation
├── requirements.txt            # Python dependencies
├── README.md                   # This documentation
└── .env                        # Environment variables (create this)

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

Testing

# Run basic functionality tests
python -c "
from mcp_vectorstore_server import *
print('Server imports successfully')
"

# Test vector store operations
python -c "
from vectorstore import PDFVectorStoreTool
tool = PDFVectorStoreTool()
print(f'Vector store initialized with {tool.vectorstore_get_num_items()} documents')
"

License

This project is provided as-is for educational and research purposes. Please ensure you comply with the licenses of all included dependencies.

Support

For issues and questions:

  1. Check the troubleshooting section above
  2. Review the error logs
  3. Ensure all dependencies are correctly installed
  4. Verify your system meets the requirements

Changelog

Version 1.0.0

  • Initial release
  • MCP server implementation
  • Vector store operations
  • Web search integration
  • File operations
  • Mathematical calculations

