Hybrid RAG Project
A generalized Retrieval-Augmented Generation (RAG) system with hybrid search capabilities that works with any documents you provide. Combines semantic (dense vector) search and keyword (sparse BM25) search for optimal document retrieval, with an MCP server API for easy integration.
🎯 Key Features: Multi-format support • Local LLM • Claude Desktop integration • Structured data queries • Document-type-aware retrieval
🚀 Quick Start (No MCP Required!)
You don't need Claude Desktop or MCP to use this project! Just run:
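```bash
# With Ollama running and dependencies installed (see Installation below):
python scripts/run_demo.py
```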
That's it! Ask questions about the 43,835 document chunks in the sample dataset.
📖 See docs/README.md for complete usage instructions. 📚 Browse all documentation in the docs/ folder.
Overview
This project implements a hybrid RAG system that combines:
Semantic Search: Dense vector embeddings for understanding meaning and context
Keyword Search: BM25 sparse retrieval for exact keyword matching
Hybrid Fusion: Reciprocal Rank Fusion (RRF) to combine results from both methods
MCP Server: Both REST API and Model Context Protocol server for Claude integration
Multi-format Support: Automatically loads documents from various file formats
The hybrid approach ensures better retrieval accuracy by leveraging the strengths of both search methods.
Features
- Vector-based semantic search using Chroma and Ollama embeddings
- BM25 keyword search for exact term matching
- Ensemble retriever with Reciprocal Rank Fusion (RRF)
- Integration with local Ollama LLM for answer generation
- Support for multiple document formats (TXT, PDF, MD, DOCX, CSV)
- Automated document loading from data directory
- RESTful API server with `/ingest` and `/query` endpoints
- Model Context Protocol (MCP) server for Claude Desktop/API integration
- Configuration-driven architecture (no hardcoded values)
- Persistent vector store for faster subsequent queries
Architecture
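A high-level sketch of the pipeline described in the Overview (reconstructed from this README; the original diagram is not reproduced here):

```
data/ (TXT, PDF, MD, DOCX, CSV)
        │  DocumentLoaderUtility
        ▼
   document chunks ─────────────┬────────────────────┐
                                ▼                    ▼
                      Chroma vector store        BM25 index
                      (Ollama embeddings)   (rank-bm25 keyword)
                                └────────┬───────────┘
                                         ▼
                     EnsembleRetriever (Reciprocal Rank Fusion)
                                         ▼
                        Ollama LLM (answer generation)
                                         ▼
                 REST API (/ingest, /query) · MCP server tools
```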
Prerequisites
- Python 3.9+
- Ollama installed and running locally
- Required Ollama models:
  - `llama3.1:latest` (or another LLM model)
  - `nomic-embed-text` (or another embedding model)
Installing Ollama
Visit ollama.ai to download and install Ollama for your platform.
After installation, pull the required models:
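```bash
ollama pull llama3.1:latest
ollama pull nomic-embed-text
```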
Verify Ollama is running:
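```bash
# Ollama listens on port 11434 by default
curl http://localhost:11434
# Expected output: "Ollama is running"
ollama list   # should show the models pulled above
```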
Installation
Clone the repository:
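```bash
# Substitute the actual repository URL
git clone https://github.com/<your-username>/hybrid-rag-project.git
cd hybrid-rag-project
```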
Create a virtual environment:
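```bash
python -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate
```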
Install dependencies:
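```bash
pip install -r requirements.txt
```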
Project Structure
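The layout below is reconstructed from the paths referenced throughout this README (the actual tree may contain additional files):

```
hybrid-rag-project/
├── config/
│   └── config.yaml          # models, data dir, k values, host/port
├── data/                    # drop your documents here (sample data included)
├── docs/
│   └── README.md            # documentation entry point
├── scripts/
│   ├── run_demo.py          # command-line demo
│   └── mcp_server.py        # MCP server for Claude
├── src/
│   └── hybrid_rag/
│       └── document_loader.py
├── chroma_db/               # persisted vector store (created on ingest)
├── requirements.txt
└── setup.py
```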
Sample Data (UCSC Extension Project)
This repository includes 13 sample data files for demonstration and testing purposes. These files represent a realistic business scenario for TechVision Electronics and are designed to showcase the system's capabilities across multiple document types.
📁 Included Sample Files
Structured Data (CSV) - 7 files:
- `product_catalog.csv` - Product inventory with specifications (5,000 rows)
- `inventory_levels.csv` - Stock levels and warehouse data (10,000 rows)
- `sales_orders_november.csv` - Monthly sales transactions (8,000 rows)
- `warranty_claims_q4.csv` - Customer warranty claims (3,000 rows)
- `production_schedule_dec2024.csv` - Manufacturing schedule (4,000 rows)
- `supplier_pricing.csv` - Vendor pricing information (6,000 rows)
- `shipping_manifests.csv` - Shipping and logistics data (5,000 rows)
Unstructured Data (Markdown) - 5 files:
- `customer_feedback_q4_2024.md` - Customer reviews and feedback (600 chunks)
- `market_analysis_2024.md` - Market research and trends (400 chunks)
- `quality_control_report_nov2024.md` - QC findings and issues (501 chunks)
- `return_policy_procedures.md` - Policy documentation (300 chunks)
- `support_tickets_summary.md` - Technical support summary (700 chunks)
Text Data - 1 file:
- `product_specifications.txt` - Technical specifications (334 chunks)
Total Dataset:
41,000 CSV rows (each row indexed as its own searchable document)
2,835 text/markdown chunks (chunked at 1000 chars with 200 char overlap)
43,835 total searchable document chunks
🎯 Purpose
These sample files are included to:
Demonstrate the system's hybrid search capabilities
Test both semantic (vector) and lexical (keyword) retrieval
Validate document-type-aware retrieval architecture
Provide immediate working examples without additional setup
Showcase cross-document query synthesis
📊 Testing Results
Comprehensive testing results are documented in TESTING_RESULTS.md, showing:
- ✅ 100% retrieval success rate across all document types
- ✅ 17 test queries with detailed results
- ✅ Performance metrics and comparative analysis
- ✅ Semantic vs Lexical vs Hybrid search comparison
💡 Using the Sample Data
Quick Start:
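```bash
# The 13 sample files in data/ are ready to ingest and query:
python scripts/run_demo.py
```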
For Production Use: To use your own data instead:

1. Remove or back up the sample files from `data/`
2. Add your own documents (TXT, PDF, MD, DOCX, CSV)
3. Re-run ingestion
4. Optionally uncomment the data exclusions in `.gitignore`
Configuration

Modify `config/config.yaml` to:
Use different Ollama models
Change the data directory location
Adjust retrieval parameters (k values)
Configure server host/port
Change vector store persistence location
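A minimal sketch of what `config/config.yaml` might contain (key names are illustrative; consult the file shipped in the repo for the exact schema):

```yaml
llm:
  model: llama3.1:latest          # any locally pulled Ollama LLM
embedding:
  model: nomic-embed-text         # any Ollama embedding model
data:
  directory: data/
retrieval:
  vector_k: 4                     # top-k for semantic search
  bm25_k: 4                       # top-k for BM25 keyword search
server:
  host: 0.0.0.0
  port: 8000
vector_store:
  persist_directory: chroma_db/
```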
Usage
Option 1: Command Line Script
Add your documents to the `data/` directory:
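```bash
# Hypothetical examples; any supported format works
cp ~/Documents/report.pdf data/
cp ~/notes/procedures.md data/
```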
Run the script:
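```bash
python scripts/run_demo.py
```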
The script will:
- Load all supported documents from the `data/` directory
- Initialize Ollama embeddings and LLM
- Create vector and BM25 retrievers
- Build the hybrid RAG chain
- Execute sample queries and display results
Option 2: REST API Server
Start the REST API server:
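```bash
# Script name assumed; check scripts/ for the actual REST server entry point
python scripts/api_server.py
```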
The server will start on http://localhost:8000
To stop the server: Press Ctrl+C for graceful shutdown
Ingest documents (do this first):
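```bash
curl -X POST http://localhost:8000/ingest
```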
Response (illustrative; the exact fields may differ):
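```json
{
  "status": "success",
  "documents_loaded": 13,
  "chunks_indexed": 43835
}
```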
Query documents:
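```bash
# The request body schema ("question" field) is an assumption
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What were the top warranty claim issues in Q4?"}'
```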
Response (illustrative; the exact fields may differ):
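```json
{
  "answer": "The most common Q4 warranty claims involved ...",
  "sources": ["data/warranty_claims_q4.csv", "data/quality_control_report_nov2024.md"]
}
```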
Check server status:
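```bash
curl http://localhost:8000/status
```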
API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Health check |
| `/ingest` | POST | Load documents from `data/` directory |
| `/query` | POST | Query documents with hybrid search |
| `/status` | GET | Get system status and configuration |
Option 3: Claude Desktop/API via MCP
The MCP (Model Context Protocol) server allows Claude to directly query your local RAG system.
Setup for Claude Desktop
First, add documents to your data directory:
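```bash
cp ~/Documents/annual_report.pdf data/   # any supported format works
```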
Edit the configuration to use the correct absolute path:
Add this configuration to Claude Desktop:
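A minimal sketch of the entry to add (the server name and paths are placeholders; point `command` at your virtual environment's Python and `args` at the absolute path of the script):

```json
{
  "mcpServers": {
    "hybrid-rag": {
      "command": "/absolute/path/to/.venv/bin/python",
      "args": ["/absolute/path/to/scripts/mcp_server.py"]
    }
  }
}
```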
On macOS:

```bash
# Copy the configuration
mkdir -p ~/Library/Application\ Support/Claude
# Edit the file and add your MCP server configuration
nano ~/Library/Application\ Support/Claude/claude_desktop_config.json
```

On Windows: `%APPDATA%\Claude\claude_desktop_config.json`

On Linux: `~/.config/Claude/claude_desktop_config.json`

Restart Claude Desktop.
In Claude Desktop, you'll now see the MCP tools available. You can ask Claude:
"Use the ingest_documents tool to load my documents"
"Query my documents about [your question]"
"Check the status of the RAG system"
Available MCP Tools
Claude will have access to these tools:
Document Ingestion & Search:
- `ingest_documents`: Start loading and indexing documents asynchronously from the `data/` directory
- `get_ingestion_status`: Monitor the progress of document ingestion (percentage, current file, stage)
- `query_documents`: Query the documents using hybrid search (semantic + keyword)
- `get_status`: Check the RAG system status
Structured Data Queries (for CSV files):
- `list_datasets`: List all available CSV datasets with columns and row counts
- `count_by_field`: Count rows where a field matches a value (e.g., "count people named Michael")
- `filter_dataset`: Get all rows matching field criteria (e.g., "all people from Company X")
- `get_dataset_stats`: Get statistics about a dataset (rows, columns, memory usage)
Async Ingestion with Progress Tracking
The ingestion process now runs asynchronously with real-time progress updates:
- Non-blocking: Ingestion runs in the background
- Progress tracking: See percentage complete (0-100%)
- File-level updates: Know which file is currently being processed
- Stage information: Loading files (0-80%) → Building index (80-100%) → Completed
- Status monitoring: Check progress at any time with `get_ingestion_status`
Example Usage with Claude
Structured Data Queries
For CSV files, use structured query tools for exact counts and filtering:
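Illustrative prompts (the datasets come from the sample data; the field and value names are assumptions about their columns):

"Use list_datasets to see which CSV datasets are loaded"

"Use count_by_field on warranty_claims_q4.csv to count claims where status is 'open'"

"Use filter_dataset on supplier_pricing.csv to show all rows for a given supplier"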
When to use each approach:
- Structured queries (`count_by_field`, `filter_dataset`): for exact counts, filtering, and structured data
- Semantic search (`query_documents`): for conceptual questions, understanding content, summarization
Supported File Formats
The system automatically loads and processes these formats:
- `.txt` - Plain text files
- `.pdf` - PDF documents
- `.md` - Markdown files
- `.docx` - Microsoft Word documents
- `.csv` - CSV files
Simply drop any supported files into the data/ directory!
How It Works
Document Loading
The `DocumentLoaderUtility` class:

- Scans the `data/` directory recursively
- Identifies supported file formats
- Uses appropriate loaders for each format
- Adds metadata (source file, file type) to each document
- Returns a list of `Document` objects ready for indexing
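A minimal sketch of the loading logic (the real implementation lives in src/hybrid_rag/document_loader.py and may differ in details):

```python
from pathlib import Path
from langchain_community.document_loaders import CSVLoader, PyPDFLoader, TextLoader

# Extension -> LangChain loader class (illustrative subset of the supported formats)
LOADER_MAP = {".txt": TextLoader, ".pdf": PyPDFLoader, ".csv": CSVLoader}

def load_documents(data_dir: str = "data"):
    docs = []
    for path in Path(data_dir).rglob("*"):              # scan recursively
        loader_cls = LOADER_MAP.get(path.suffix.lower())
        if loader_cls is None:                           # skip unsupported formats
            continue
        for doc in loader_cls(str(path)).load():
            # Attach source metadata so answers can cite their origin
            doc.metadata.update({"source_file": str(path), "file_type": path.suffix})
            docs.append(doc)
    return docs
```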
Hybrid Retrieval
The EnsembleRetriever uses Reciprocal Rank Fusion (RRF) to:
Retrieve top-k results from vector search (semantic)
Retrieve top-k results from BM25 search (keyword)
Assign reciprocal rank scores to each result
Combine scores to produce a unified ranking
Return the most relevant documents overall
This approach handles:
Semantic queries ("How do I request time off?")
Keyword queries ("PTO form HR-42")
Complex queries benefiting from both methods
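A minimal sketch of wiring the two retrievers together with LangChain's `EnsembleRetriever` (model names follow this README's defaults; the k values are illustrative, and `docs` is the loader output from the sketch above):

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text")
vector_store = Chroma.from_documents(docs, embeddings, persist_directory="chroma_db")

dense = vector_store.as_retriever(search_kwargs={"k": 4})  # semantic side
sparse = BM25Retriever.from_documents(docs)                # keyword side
sparse.k = 4

# Equal weighting; ranks from both retrievers are fused via RRF
hybrid = EnsembleRetriever(retrievers=[dense, sparse], weights=[0.5, 0.5])
results = hybrid.invoke("PTO form HR-42")
```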
Customization
Using Different Models
Edit config/config.yaml to change models:
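```yaml
# Illustrative keys (see the configuration sketch above); any pulled Ollama models work
llm:
  model: mistral:latest
embedding:
  model: mxbai-embed-large
```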
Adjusting Retrieval Parameters
Modify the k values in config/config.yaml:
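```yaml
# Illustrative keys; lower k is faster, higher k casts a wider net
retrieval:
  vector_k: 3
  bm25_k: 3
```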
Adding More File Format Support
Edit src/hybrid_rag/document_loader.py to add more loaders:
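```python
# Hypothetical extension: register an HTML loader alongside the existing ones.
# LOADER_MAP is the extension-to-loader mapping sketched in "Document Loading" above.
from langchain_community.document_loaders import UnstructuredHTMLLoader

LOADER_MAP[".html"] = UnstructuredHTMLLoader
```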
Customizing the Prompt
Edit the prompt template in scripts/run_demo.py or scripts/mcp_server.py:
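```python
# Illustrative template; the actual variable names in the scripts may differ.
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)
```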
Development Workflow
1. Add documents to the `data/` directory
2. Modify configuration in `config/config.yaml` as needed
3. Test with the command line: `python scripts/run_demo.py`
4. Deploy the MCP server: `python scripts/mcp_server.py`
5. Integrate via the API in your applications
Troubleshooting
"Error connecting to Ollama"
- Ensure Ollama is installed and running
- Check that the Ollama service is accessible at the configured URL
- Verify models are downloaded: `ollama list`
"No documents found in data directory"
- Add files to the `data/` directory
- Ensure files have supported extensions (.txt, .pdf, .md, .docx, .csv)
- Check that the data directory path in `config/config.yaml` is correct
"ModuleNotFoundError"
- Ensure the virtual environment is activated: `source .venv/bin/activate`
- Reinstall dependencies: `pip install -r requirements.txt`
Poor Retrieval Results
- Add more relevant documents to the `data/` directory
- Adjust `k` values in `config/config.yaml`
- Try different embedding models
- Ensure query terminology matches document content
API Errors
- Ensure you call `/ingest` before `/query`
- Check server logs for detailed error messages
- Verify Ollama is running and accessible
- Check that documents were successfully loaded
Example: Complete Workflow
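An end-to-end pass, assuming the defaults above (the REST script name is a placeholder, as noted earlier):

```bash
ollama pull llama3.1:latest && ollama pull nomic-embed-text
pip install -r requirements.txt
cp ~/my_docs/*.pdf data/                          # add your own documents
python scripts/run_demo.py                        # command-line demo
# Or drive it through the REST API:
python scripts/api_server.py &
curl -X POST http://localhost:8000/ingest
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "Summarize the Q4 customer feedback"}'
```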
Dependencies
Core libraries:
- `langchain`: Framework for LLM applications
- `langchain-community`: Community integrations
- `langchain-ollama`: Ollama integration
- `chromadb`: Vector database for embeddings
- `rank-bm25`: BM25 implementation for keyword search
- `fastapi`: Web framework for the API
- `uvicorn`: ASGI server
- `pyyaml`: YAML configuration parsing
Document loaders:
- `pypdf`: PDF processing
- `python-docx`: Word document processing
- `unstructured`: Markdown and other formats
Performance Tips
- Vector Store Persistence: The vector store is persisted to disk (`chroma_db/`) after ingestion, making subsequent queries faster.
- Batch Processing: When adding many documents, use the `/ingest` endpoint once rather than multiple times.
- Retrieval Parameters: Lower `k` values (e.g., 2-3) are faster and often sufficient for small document sets.
- Model Selection: Smaller embedding models are faster but may sacrifice some accuracy.
License
This project is provided as-is for educational and demonstration purposes.
Contributing
Feel free to submit issues, fork the repository, and create pull requests for any improvements.
Resources
Changelog
Version 2.0.0
- Generalized system to work with any documents
- Added `data/` directory for document ingestion
- Created `DocumentLoaderUtility` for multi-format support
- Restructured project to follow Python best practices (src layout)
- Moved all configuration to `config/` directory
- Moved all documentation to `docs/` directory
- Created proper Python package structure with `setup.py`
- Organized scripts into `scripts/` directory
- Updated all import paths and documentation
Version 1.0.0
- Initial implementation with sample HR documents
- Basic hybrid search with vector and BM25 retrievers