RAG MCP Server
A Model Context Protocol (MCP) server for Retrieval-Augmented Generation (RAG) operations. This server provides tools for building and querying vector-based knowledge bases from document collections, enabling semantic search and document retrieval capabilities.
Features
Document Processing: Supports multiple file formats (.txt, .pdf) with automatic text extraction
Intelligent Chunking: Configurable text chunking with overlap to preserve context
Vector Embeddings: Uses SentenceTransformers for high-quality text embeddings
Semantic Search: FAISS-powered similarity search for fast and accurate retrieval
Incremental Updates: Smart document tracking to only process new or changed files
Persistent Storage: SQLite-based document store for metadata and change tracking
Flexible Configuration: Customizable embedding models, chunk sizes, and search parameters
Architecture
rag-mcp-server/
├── src/rag_mcp_server/
│   ├── server.py                  # Main MCP server implementation
│   └── core/
│       ├── document_processor.py  # Document loading and chunking
│       ├── embedding_service.py   # Text embedding generation
│       ├── faiss_index.py         # Vector similarity search
│       └── document_store.py      # Document metadata storage
Installation
Using uvx (Recommended)
# Install with uvx (comes with uv)
uvx rag-mcp-server
Using pip
pip install rag-mcp-server
From source
git clone <repository-url>
cd rag-mcp-server
pip install -e .
Setup
The easiest way to run the MCP server is with uvx, but manual setup is also available.
Find the MCP settings file for the client
Claude Desktop
Install Claude Desktop as needed
Open the config file by opening the Claude Desktop app, going into its Settings, opening the 'Developer' tab, and clicking the 'Edit Config' button
Follow the 'Set up the MCP server' steps below
Claude Code
Install Claude Code as needed
Run the following command to add the RAG server:
claude mcp add rag
Or manually add with custom configuration:
claude mcp add-json rag '{"command":"uvx","args":["rag-mcp-server","--knowledge-base","/path/to/your/docs","--embedding-model","all-MiniLM-L6-v2","--chunk-size","1000","--chunk-overlap","200"]}'
Cursor
Install Cursor as needed
Open the config file by opening Cursor, going into 'Cursor Settings' (not the normal VSCode IDE settings), opening the 'MCP' tab, and clicking the 'Add new global MCP server' button
Follow the 'Set up the MCP server' steps below
Cline
Install Cline in your IDE as needed
Open the config file by opening your IDE, opening the Cline sidebar, clicking the 'MCP Servers' icon button that is second from left at the top, opening the 'Installed' tab, and clicking the 'Configure MCP Servers' button
Follow the 'Set up the MCP server' steps below
Windsurf
Install Windsurf as needed
Open the config file by opening Windsurf, going into 'Windsurf Settings' (not the normal VSCode IDE settings), opening the 'Cascade' tab, and clicking the 'View raw config' button in the 'Model Context Protocol (MCP) Servers' section
Follow the 'Set up the MCP server' steps below
Any other client
Find the MCP settings file, usually something like [client]_mcp_config.json
Follow the 'Set up the MCP server' steps below
Set up the MCP server
Install uv as needed (uvx comes bundled with uv)
Add the following to your MCP setup:
Basic Configuration:
{
  "mcpServers": {
    "rag": {
      "command": "uvx",
      "args": ["rag-mcp-server"]
    }
  }
}
Full Configuration with All Parameters:
{
  "mcpServers": {
    "rag": {
      "command": "uvx",
      "args": [
        "rag-mcp-server",
        "--knowledge-base", "/path/to/your/documents",
        "--embedding-model", "ibm-granite/granite-embedding-278m-multilingual",
        "--chunk-size", "500",
        "--chunk-overlap", "200",
        "--top-k", "7",
        "--verbose"
      ]
    }
  }
}
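If you maintain several client configs, the entry can also be generated rather than typed by hand. The following is a minimal sketch, not part of the project: the helper name `build_rag_server_entry` is hypothetical, and it only assembles the flags already shown in the configurations above.

```python
import json


def build_rag_server_entry(knowledge_base=None, embedding_model=None,
                           chunk_size=None, chunk_overlap=None,
                           top_k=None, verbose=False):
    """Build one server entry for the "mcpServers" section (hypothetical helper)."""
    args = ["rag-mcp-server"]
    options = [
        ("--knowledge-base", knowledge_base),
        ("--embedding-model", embedding_model),
        ("--chunk-size", chunk_size),
        ("--chunk-overlap", chunk_overlap),
        ("--top-k", top_k),
    ]
    for flag, value in options:
        # Only emit flags that were set, so the server's own defaults apply otherwise.
        if value is not None:
            args += [flag, str(value)]
    if verbose:
        args.append("--verbose")
    return {"command": "uvx", "args": args}


config = {
    "mcpServers": {
        "rag": build_rag_server_entry(
            knowledge_base="/path/to/your/documents",
            chunk_size=500,
            chunk_overlap=200,
            top_k=7,
            verbose=True,
        )
    }
}
print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the client's MCP settings file. Values are passed as strings because each element of "args" is a command-line argument.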
Variant: Manual setup with uvx
If you prefer to run the server manually or need a specific Python version:
# Run with default settings
uvx rag-mcp-server
# Run with all parameters specified
uvx rag-mcp-server \
--knowledge-base /path/to/documents \
--embedding-model "ibm-granite/granite-embedding-278m-multilingual" \
--chunk-size 500 \
--chunk-overlap 200 \
--top-k 7 \
--verbose
# Run from source directory
uvx --from . rag-mcp-server \
--knowledge-base /home/user/documents \
--embedding-model "all-MiniLM-L6-v2" \
--chunk-size 800 \
--chunk-overlap 100 \
--top-k 5
Usage Examples
Sample LLM Queries
Here are example queries you can use with your LLM to interact with the RAG server:
Initialize a knowledge base with custom parameters:
Initialize the knowledge base with:
- knowledge_base_path: "/home/user/research_papers"
- embedding_model: "ibm-granite/granite-embedding-278m-multilingual"
- chunk_size: 300
- chunk_overlap: 50
Search with specific parameters:
Search for "machine learning optimization techniques" in the knowledge base at "/home/user/research_papers" and return the top 10 results with similarity scores.
Initialize with high-quality embeddings:
Set up a knowledge base at "/data/technical_docs" using the "all-mpnet-base-v2" model with chunk_size of 1000 and chunk_overlap of 400 for better context preservation.
Refresh and get statistics:
Refresh the knowledge base at "/home/user/documents" to include any new files, then show me the statistics including total documents, chunks, and current configuration.
List and search documents:
List all documents in the knowledge base, then search for information about "API authentication" and show me the top 5 most relevant chunks.
Complex workflow example:
1. Initialize a knowledge base at "/home/user/project_docs" with embedding_model "all-MiniLM-L6-v2", chunk_size 800, and chunk_overlap 150
2. Show me the statistics
3. Search for "database optimization strategies"
4. List all documents that were processed
Multilingual search example:
Initialize the knowledge base at "/docs/international" using the multilingual model "ibm-granite/granite-embedding-278m-multilingual", then search for "machine learning" in multiple languages and show the top 7 results.
Command Line Examples
High-Quality Configuration for Research:
uvx rag-mcp-server \
--knowledge-base /home/tommasomariaungetti/RAG \
--embedding-model "all-mpnet-base-v2" \
--chunk-size 1000 \
--chunk-overlap 400 \
--top-k 10 \
--verbose
Fast Processing for Large Document Sets:
uvx rag-mcp-server \
--knowledge-base /data/large_corpus \
--embedding-model "all-MiniLM-L6-v2" \
--chunk-size 2000 \
--chunk-overlap 100 \
--top-k 5
Multilingual Document Processing:
uvx rag-mcp-server \
--knowledge-base /docs/multilingual \
--embedding-model "ibm-granite/granite-embedding-278m-multilingual" \
--chunk-size 500 \
--chunk-overlap 200 \
--top-k 7
Running from Source with Custom Settings:
uvx --from . rag-mcp-server \
--embedding-model "all-MiniLM-L6-v2" \
--chunk-size 800 \
--chunk-overlap 100 \
--top-k 5 \
--knowledge-base /home/tommasomariaungetti/RAG
MCP Tools
The following tools are available:
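Each tool is invoked through MCP's `tools/call` request. The "Example Tool Call" snippets in this section show only the tool name and its arguments; on the wire, an MCP client wraps them in a JSON-RPC 2.0 envelope. The sketch below illustrates that envelope. The helper name `make_tool_call` and the request-id counter are illustrative assumptions, and most MCP clients build this request for you.

```python
import json
from itertools import count

# Monotonic request ids; JSON-RPC only requires that each in-flight id is unique.
_request_ids = count(1)


def make_tool_call(name, arguments=None):
    """Wrap a tool name and its arguments in an MCP tools/call request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


request = make_tool_call(
    "semantic_search",
    {"query": "How to implement RAG systems?", "top_k": 5, "include_scores": True},
)
print(json.dumps(request, indent=2))
```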
1. initialize_knowledge_base
Initialize a knowledge base from a directory of documents.
Parameters:
knowledge_base_path (optional): Path to document directory - defaults to server config
embedding_model (optional): Model name for embeddings - defaults to "ibm-granite/granite-embedding-278m-multilingual"
chunk_size (optional): Maximum chunk size in characters - defaults to 500
chunk_overlap (optional): Chunk overlap size in characters - defaults to 200
Example Tool Call:
{
"tool": "initialize_knowledge_base",
"arguments": {
"knowledge_base_path": "/path/to/docs",
"embedding_model": "all-mpnet-base-v2",
"chunk_size": 1000,
"chunk_overlap": 200
}
}
Example LLM Query:
"Initialize a knowledge base from /home/user/documents using the all-mpnet-base-v2 embedding model with 1000 character chunks and 200 character overlap"
2. semantic_search
Perform semantic search on the knowledge base.
Parameters:
query: Search query text
knowledge_base_path (optional): Path to knowledge base - defaults to current KB
top_k (optional): Number of results to return - defaults to 7
include_scores (optional): Include similarity scores - defaults to false
Example Tool Call:
{
"tool": "semantic_search",
"arguments": {
"query": "How to implement RAG systems?",
"knowledge_base_path": "/path/to/docs",
"top_k": 5,
"include_scores": true
}
}
Example LLM Query:
"Search for 'machine learning optimization techniques' and show me the top 5 results with similarity scores"
3. refresh_knowledge_base
Update the knowledge base with new or changed documents.
Parameters:
knowledge_base_path (optional): Path to knowledge base - defaults to current KB
Example Tool Call:
{
"tool": "refresh_knowledge_base",
"arguments": {
"knowledge_base_path": "/path/to/docs"
}
}
Example LLM Query:
"Refresh the knowledge base to include any new or modified documents"
4. get_knowledge_base_stats
Get detailed statistics about the knowledge base.
Parameters:
knowledge_base_path (optional): Path to knowledge base - defaults to current KB
Example Tool Call:
{
"tool": "get_knowledge_base_stats",
"arguments": {
"knowledge_base_path": "/path/to/docs"
}
}
Example LLM Query:
"Show me the statistics for the knowledge base including document count, chunk information, and current configuration"
5. list_documents
List all documents in the knowledge base with metadata.
Parameters:
knowledge_base_path (optional): Path to knowledge base - defaults to current KB
Example Tool Call:
{
"tool": "list_documents",
"arguments": {
"knowledge_base_path": "/path/to/docs"
}
}
Example LLM Query:
"List all documents in the knowledge base with their chunk counts and metadata"
Technical Details
Document Processing
The system uses a sophisticated document processing pipeline:
File Detection: Scans directories for supported file types
Content Extraction:
Plain text files: Direct UTF-8/Latin-1 reading
PDF files: PyMuPDF-based text extraction
Text Chunking:
Splits documents into manageable chunks
Preserves word boundaries
Maintains context with configurable overlap
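The chunking step can be sketched as follows. This is an illustrative implementation, not the code in document_processor.py: the function name `chunk_text` and the exact boundary rules are assumptions. Sizes are in characters, matching the chunk_size and chunk_overlap parameters.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=200):
    """Split text into overlapping chunks without cutting words in half."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    chunks = []
    start = 0
    length = len(text)
    while start < length:
        end = min(start + chunk_size, length)
        # If the cut would land inside a word, back up to the previous space.
        if end < length and not text[end].isspace():
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= length:
            break
        # Step back by the overlap, but always advance by at least one character.
        next_start = max(end - chunk_overlap, start + 1)
        # If the new start lands inside a word, move forward to the next word.
        if not text[next_start - 1].isspace():
            space = text.find(" ", next_start, end)
            if space != -1:
                next_start = space + 1
        start = next_start
    return chunks


sample = "Retrieval-augmented generation combines search with text generation. " * 20
pieces = chunk_text(sample, chunk_size=120, chunk_overlap=30)
print(len(pieces), "chunks; first chunk:", repr(pieces[0]))
```

A larger overlap repeats more text between neighbouring chunks, which helps a query match content that straddles a chunk boundary at the cost of more chunks to embed.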
Embedding Generation
Default Model: ibm-granite/granite-embedding-278m-multilingual
Batch Processing: Efficient batch encoding for large document sets
Fallback Support: Automatic fallback to all-MiniLM-L6-v2 if primary model fails
Progress Tracking: Visual progress bars for large operations
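The fallback behaviour might look roughly like the sketch below. The function name `load_with_fallback` is hypothetical, and the loader is passed in as a parameter so the example runs without downloading a model; with Sentence Transformers installed, the `SentenceTransformer` class could be passed as `load_model`.

```python
import logging

logger = logging.getLogger("rag_fallback_sketch")


def load_with_fallback(load_model, primary, fallback="all-MiniLM-L6-v2"):
    """Try the primary model first; on any failure, load the fallback instead.

    Returns (model_name, model) so the caller knows which model is in use.
    """
    try:
        return primary, load_model(primary)
    except Exception as exc:
        logger.warning("Could not load %s (%s); falling back to %s", primary, exc, fallback)
        return fallback, load_model(fallback)


def fake_loader(name):
    """Stand-in loader for the demo: pretends the multilingual model is unavailable."""
    if name.startswith("ibm-granite/"):
        raise OSError("model not available offline")
    return f"<model {name}>"


name, model = load_with_fallback(
    fake_loader, "ibm-granite/granite-embedding-278m-multilingual"
)
print("Using", name)
```

Returning the name of the model that actually loaded matters because embeddings from different models are not comparable: an index built with one model should be queried with the same one.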
Vector Search
Index Type: FAISS IndexFlatIP (Inner Product)
Similarity Metric: Cosine similarity (via L2 normalization)
Performance: Scales to millions of documents
Accuracy: Exact nearest neighbor search
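To see why an inner-product index yields cosine similarity, note that after L2 normalization every vector has length 1, so the inner product of two vectors equals the cosine of the angle between them. The pure-Python sketch below demonstrates the idea on toy vectors; it is a stand-in for illustration, not the server's FAISS code, and the function names are assumptions. With FAISS, the same effect is commonly achieved by calling `faiss.normalize_L2` on the vectors before adding them to, and searching, an `IndexFlatIP`.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def search(chunk_vectors, query_vector, top_k=7):
    """Exhaustive (exact) search: score every chunk, return the top_k best."""
    query = l2_normalize(query_vector)
    scored = [
        (i, inner_product(l2_normalize(vector), query))
        for i, vector in enumerate(chunk_vectors)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


chunks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = search(chunks, [2.0, 0.0], top_k=2)
for index, score in results:
    print(f"chunk {index}: cosine similarity {score:.4f}")
```

Because every chunk is scored, results are exact, and search time grows linearly with the number of chunks.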
Document Store
Storage: SQLite database
Tracking: File hash, modification time, chunk count
Incremental Updates: Only processes changed files
Location: Stored alongside knowledge base documents
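Change tracking of this kind can be sketched with the standard library alone. The table layout, the function names, and the choice of SHA-256 below are assumptions made for illustration; the actual schema in document_store.py may differ.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def file_hash(path):
    """Content hash of a file (SHA-256 is an assumption, not the documented choice)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def open_store(db_path):
    """Open (or create) the document store with a hypothetical schema."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "path TEXT PRIMARY KEY, file_hash TEXT NOT NULL, "
        "mtime REAL NOT NULL, chunk_count INTEGER NOT NULL)"
    )
    return conn


def needs_processing(conn, path):
    """True if the file is new or its content hash differs from the stored one."""
    row = conn.execute(
        "SELECT file_hash FROM documents WHERE path = ?", (str(path),)
    ).fetchone()
    return row is None or row[0] != file_hash(path)


def record_document(conn, path, chunk_count):
    """Store the current hash, modification time, and chunk count for a file."""
    conn.execute(
        "INSERT OR REPLACE INTO documents VALUES (?, ?, ?, ?)",
        (str(path), file_hash(path), Path(path).stat().st_mtime, chunk_count),
    )
    conn.commit()


with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "notes.txt"
    doc.write_text("first version")
    store = open_store(":memory:")
    print("new file:", needs_processing(store, doc))
    record_document(store, doc, chunk_count=1)
    print("unchanged:", needs_processing(store, doc))
    doc.write_text("second version")
    print("after edit:", needs_processing(store, doc))
    store.close()
```

Comparing content hashes rather than modification times alone avoids reprocessing files that were touched but not changed.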
Configuration Examples
MCP Client Configurations
Basic Configuration (Claude Desktop/Cursor/Cline):
{
"mcpServers": {
"rag": {
"command": "uvx",
"args": ["rag-mcp-server"]
}
}
}
Full Configuration with All Parameters:
{
"mcpServers": {
"rag": {
"command": "uvx",
"args": [
"rag-mcp-server",
"--knowledge-base", "/path/to/documents",
"--embedding-model", "ibm-granite/granite-embedding-278m-multilingual",
"--chunk-size", "500",
"--chunk-overlap", "200",
"--top-k", "7",
"--verbose"
]
}
}
}
Multiple Knowledge Base Configuration:
{
"mcpServers": {
"rag-technical": {
"command": "uvx",
"args": [
"rag-mcp-server",
"--knowledge-base", "/docs/technical",
"--embedding-model", "all-mpnet-base-v2",
"--chunk-size", "1000",
"--chunk-overlap", "400"
]
},
"rag-research": {
"command": "uvx",
"args": [
"rag-mcp-server",
"--knowledge-base", "/docs/research",
"--embedding-model", "all-MiniLM-L6-v2",
"--chunk-size", "500",
"--chunk-overlap", "100",
"--port", "8001"
]
}
}
}
Command Line Examples
High-Quality Configuration for Research:
uvx rag-mcp-server \
--knowledge-base /path/to/research/docs \
--embedding-model "all-mpnet-base-v2" \
--chunk-size 1000 \
--chunk-overlap 400 \
--top-k 10
Fast Processing Configuration:
uvx rag-mcp-server \
--knowledge-base /path/to/large/corpus \
--embedding-model "all-MiniLM-L6-v2" \
--chunk-size 2000 \
--chunk-overlap 100 \
--top-k 5
Multilingual Configuration:
uvx rag-mcp-server \
--knowledge-base /path/to/multilingual/docs \
--embedding-model "ibm-granite/granite-embedding-278m-multilingual" \
--chunk-size 500 \
--chunk-overlap 200 \
--top-k 7
Development Configuration with Verbose Logging:
uvx --from . rag-mcp-server \
--knowledge-base ./test_documents \
--embedding-model "all-MiniLM-L6-v2" \
--chunk-size 300 \
--chunk-overlap 50 \
--top-k 3 \
--verbose
Error Handling
The server implements comprehensive error handling:
File Access Errors: Graceful handling of permission issues
Encoding Errors: Automatic encoding detection and fallback
Model Loading Errors: Fallback to default models
Database Errors: Transaction rollback and recovery
Search Errors: Informative error messages
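As an illustration of the encoding fallback, the sketch below tries UTF-8 first and then Latin-1, the two encodings named under Document Processing. The function name `read_text_with_fallback` is hypothetical and the server's actual detection logic may be more elaborate.

```python
import tempfile
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Decode a file with the first encoding that works.

    Latin-1 maps every byte to a character, so with the default tuple the
    final replacement branch is only reached if a caller passes other encodings.
    """
    data = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep what can be decoded and mark the rest.
    return data.decode("utf-8", errors="replace")


with tempfile.TemporaryDirectory() as tmp:
    legacy = Path(tmp) / "legacy.txt"
    legacy.write_bytes("café".encode("latin-1"))  # not valid UTF-8
    text = read_text_with_fallback(legacy)
    print(repr(text))
```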
Performance Considerations
Memory Usage
Embeddings are stored in memory for fast search
Approximate memory: num_chunks × embedding_dimension × 4 bytes
Example: 10,000 chunks × 384 dimensions × 4 bytes ≈ 15 MB
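The estimate is easy to reproduce for other corpus sizes. The helper below is a small sketch (the function name is an assumption); the 4 bytes per value correspond to 32-bit floats, and the figure covers only the raw vectors, not chunk text or metadata.

```python
def index_memory_bytes(num_chunks, embedding_dimension, bytes_per_value=4):
    """Raw size of the embedding matrix, assuming 32-bit floats by default."""
    return num_chunks * embedding_dimension * bytes_per_value


for num_chunks in (10_000, 100_000, 1_000_000):
    size = index_memory_bytes(num_chunks, 384)
    print(f"{num_chunks:>9,} chunks x 384 dims: {size / 1e6:,.1f} MB")
```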
Processing Speed
Document processing: ~100-500 docs/minute (depending on size)
Embedding generation: ~50-200 chunks/second (model dependent)
Search latency: <10ms for 100K documents
Optimization Tips
Use smaller embedding models for faster processing
Increase chunk size for fewer chunks (may reduce accuracy)
Decrease overlap for faster processing (may lose context)
Use SSD storage for document store database
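To gauge how chunk size and overlap affect the number of chunks (and therefore embedding time and memory), a rough estimate helps. The sketch below assumes a fixed stride of chunk_size minus chunk_overlap characters; because the real chunker respects word boundaries, actual counts will differ slightly. The function name is hypothetical.

```python
import math


def estimate_chunks(doc_length, chunk_size, chunk_overlap):
    """Approximate chunk count for a document of doc_length characters."""
    stride = chunk_size - chunk_overlap
    if stride <= 0:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if doc_length <= 0:
        return 0
    if doc_length <= chunk_size:
        return 1
    # One full chunk, then one more per stride needed to cover the remainder.
    return 1 + math.ceil((doc_length - chunk_size) / stride)


doc_length = 100_000  # characters
for size, overlap in ((500, 200), (1000, 400), (2000, 100)):
    count = estimate_chunks(doc_length, size, overlap)
    print(f"chunk_size={size}, chunk_overlap={overlap}: about {count} chunks")
```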
Development
Running Tests
pytest tests/
Code Formatting
black src/
isort src/
Type Checking
mypy src/
Troubleshooting
Common Issues
"No knowledge base path provided"
Solution: Either provide the path in the tool call or use the --knowledge-base flag
"Model mismatch detected"
Solution: This is a warning; the system will use the closest available model
"Failed to initialize embedding model"
Solution: Check internet connection or use a locally cached model
"No documents found in knowledge base"
Solution: Ensure directory contains .txt or .pdf files
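When diagnosing the last issue, it can help to list what the server would be able to index. The following sketch scans a directory for the two supported extensions. The function name is hypothetical, and whether the server itself descends into subdirectories or matches extensions case-insensitively is an assumption of this example.

```python
import tempfile
from pathlib import Path

SUPPORTED_EXTENSIONS = {".txt", ".pdf"}


def find_supported_files(root):
    """Return supported files under root, searching subdirectories too."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "notes.txt").write_text("plain text")
    (base / "paper.PDF").write_bytes(b"%PDF-1.4")
    (base / "readme.md").write_text("not indexed")
    found = [path.name for path in find_supported_files(base)]
    print(found if found else "No .txt or .pdf files found")
```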
Debug Mode
Enable verbose logging for troubleshooting:
uvx rag-mcp-server --verbose
Help and Resources
Contributing
Contributions are welcome! Please:
Fork the repository
Create a feature branch
Add tests for new functionality
Ensure all tests pass
Submit a pull request
License
MIT License - see LICENSE file for details.
Acknowledgments
Built on MCP (Model Context Protocol)
Powered by Sentence Transformers
Vector search by FAISS
PDF processing by PyMuPDF