Performs RAG (Retrieval-Augmented Generation) lookups using vector embeddings and semantic search to query documents and code repositories, with support for PDF text extraction, ChromaDB vector storage, and relevance-scored results.
RooCode-RAG-Lookup
RooCode MCP Server for performing RAG (Retrieval-Augmented Generation) lookups in documents and code repositories using vector embeddings and semantic search.
Example Usage
Ask a question, e.g. "What is the maximum number of entries* in a Word document?", and tell the LLM to "use rag". The LLM is usually a decent judge of when a tool is needed and may decide to use it on its own.
*This refers to the maximum number of XML properties and elements addressable in Word.
Features
Full RAG Implementation: Complete vector-based semantic search using ChromaDB and Haystack
Document Indexing: Automatic text extraction and chunking from PDF documents
Vector Embeddings: Sentence transformer embeddings for semantic similarity
RAG Lookup Tool: Search through documents and code repositories with relevance scoring
Test Tool: Simple hello world tool to verify MCP server connectivity
Async MCP Protocol: Full JSON-RPC 2.0 support via stdio
Installation
Install the Python dependencies listed in requirements.txt, e.g. with pip install -r requirements.txt.
Configure RooCode to use this MCP server by adding the configuration from mcp_config.json to your RooCode settings.
Configuration
Add the contents of mcp_config.json to your RooCode MCP server settings (in MCP tools, under edit global settings). When the tool is ready to use, it shows a green status.
Set the following environment variables:
RAG_LOOKUP_PATH: Path to this project directory
PYTHON_PATH: Path to your Python executable
Configure parameters in parameters.py:
EMBEDDING_MODEL: Sentence transformer model (default: all-mpnet-base-v2)
COLLECTION_NAME: ChromaDB collection name
CHUNK_SIZE: Text chunk size in words (default: 500)
CHUNK_OVERLAP: Overlap between chunks (default: 50)
DEFAULT_TOP_K: Number of results to return (default: 5)
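As a rough illustration of the parameters listed above, a parameters.py could look like the sketch below. Only the documented defaults come from this README. The COLLECTION_NAME value is a hypothetical placeholder, and the final sanity check is an addition for illustration, not necessarily part of the project's file.

```python
# parameters.py (illustrative sketch; only the documented defaults are from the README)

EMBEDDING_MODEL = "all-mpnet-base-v2"  # sentence transformer model (documented default)
COLLECTION_NAME = "rag_documents"      # hypothetical value; the README gives no default
CHUNK_SIZE = 500                       # chunk size in words (documented default)
CHUNK_OVERLAP = 50                     # words shared by consecutive chunks (documented default)
DEFAULT_TOP_K = 5                      # results returned per query (documented default)

# Sanity check (an addition for this sketch): the overlap must be smaller than
# the chunk size, otherwise a sliding-window splitter would never advance.
assert 0 <= CHUNK_OVERLAP < CHUNK_SIZE
```

With these defaults, each new chunk starts 450 words after the previous one.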
Available Tools
1. rag_lookup
Perform semantic search using RAG in documents and code repositories. Returns relevant chunks with similarity scores and metadata.
Parameters:
query (required): The search query
source (optional): Where to search - "documents", "repos", or "both" (default: "both")
Returns:
Relevant text chunks with similarity scores
Source file information and metadata
Statistics on documents searched
Example:
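RooCode normally builds the tool call itself, so the following is only a sketch of what a rag_lookup request looks like on the wire. It uses the MCP tools/call method inside a JSON-RPC 2.0 envelope. The helper name build_tool_call and the request id are assumptions made for this example, while the argument names query and source are the ones documented above.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(
    1,
    "rag_lookup",
    {
        "query": "What is the maximum number of entries in a Word document?",
        "source": "documents",  # one of "documents", "repos", "both"
    },
)

# The stdio transport exchanges newline-delimited JSON messages,
# so each request is serialized onto a single line.
print(json.dumps(request))
```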
Response Format:
2. say_hello
Simple test tool that returns a greeting message with timestamp.
Parameters:
name(optional): Name to include in greeting (default: "World")
Example:
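The behaviour described for say_hello can be approximated with the sketch below. It is not the project's implementation: the exact greeting text and the timestamp format (ISO 8601, UTC) are assumptions, and only the name parameter and its default of "World" come from the description above.

```python
from datetime import datetime, timezone


def say_hello(name: str = "World") -> str:
    """Return a greeting with a timestamp (wording and format are assumptions)."""
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return f"Hello, {name}! (server time: {timestamp})"


print(say_hello())           # uses the default name
print(say_hello("RooCode"))  # uses an explicit name
```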
Usage
1. Extract and Index Documents
Place PDF documents in the Documents/ or Repos/ folders, then run:
2. Query the RAG System
3. Use via MCP Server
Once configured in RooCode, use the rag_lookup tool through the MCP interface. RooCode settings contain an MCP menu; editing the global settings there opens the JSON settings ({"mcpServers":{}}). Copy and paste the contents of mcp_config.json into these global MCP settings.
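To make the copy-and-paste step concrete, the sketch below merges one server entry into a settings object of the form {"mcpServers":{}}. Everything inside the entry is a placeholder: the server name rag-lookup, the script name server.py, and the paths are hypothetical, and the real values should be taken from the project's mcp_config.json. The command, args, and env keys follow the usual shape of MCP server entries.

```python
import json


def merge_mcp_server(global_settings: dict, server_name: str, server_entry: dict) -> dict:
    """Return a copy of the settings with one server entry added or replaced."""
    merged = dict(global_settings)
    servers = dict(merged.get("mcpServers", {}))
    servers[server_name] = server_entry
    merged["mcpServers"] = servers
    return merged


# Hypothetical entry; copy the real one from mcp_config.json.
entry = {
    "command": "/usr/bin/python3",                      # placeholder for PYTHON_PATH
    "args": ["/path/to/RooCode-RAG-Lookup/server.py"],  # placeholder script name and path
    "env": {"RAG_LOOKUP_PATH": "/path/to/RooCode-RAG-Lookup"},
}

original = {"mcpServers": {}}
settings = merge_mcp_server(original, "rag-lookup", entry)
print(json.dumps(settings, indent=2))
```

The helper copies the dictionaries instead of mutating them, so existing server entries in the global settings are preserved.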
Testing
Test the MCP server locally:
Project Structure
Technology Stack
MCP Python SDK: Protocol implementation for RooCode integration
Haystack: Document processing and RAG pipeline framework
ChromaDB: Vector database for embeddings storage
Sentence Transformers: Semantic embeddings (all-mpnet-base-v2)
PDFPlumber: PDF text extraction with layout preservation
Async/Await: Concurrent request handling
JSON-RPC 2.0: Communication protocol
Stdio Transport: RooCode integration
How It Works
Document Extraction: PDFs are parsed using parse_pdf.py, which extracts text and metadata
Text Chunking: Documents are split into overlapping chunks using DocumentSplitter
Embedding Generation: Text chunks are converted to 768-dimensional vectors using sentence transformers
Vector Storage: Embeddings are stored in ChromaDB with metadata for retrieval
Semantic Search: Queries are embedded and matched against stored vectors using cosine similarity
Result Ranking: Top-K most relevant chunks are returned with scores and metadata
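The chunking, embedding, search, and ranking steps above can be illustrated with a dependency-free sketch. It is a simplified stand-in, not the project's pipeline: chunk_words replaces Haystack's DocumentSplitter, a bag-of-words count vector replaces the 768-dimensional sentence transformer embedding, and an in-memory list replaces ChromaDB. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_words(text, chunk_size=500, overlap=50):
    """Split text into chunks of `chunk_size` words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def embed(text):
    """Toy embedding: word counts (stand-in for a sentence transformer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, chunks, top_k=5):
    """Return the top_k (score, chunk) pairs, highest similarity first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


text = (
    "ChromaDB stores embedding vectors. PDF files are parsed into plain text. "
    "Queries are embedded and compared with cosine similarity."
)
chunks = chunk_words(text, chunk_size=8, overlap=2)
for score, chunk in search("cosine similarity of queries", chunks, top_k=2):
    print(f"{score:.3f}  {chunk}")
```

Word counts only match exact tokens, whereas a sentence transformer also captures paraphrases; the ranking logic (embed the query, score every chunk by cosine similarity, keep the top K) is the part this sketch is meant to show.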
Requirements
See requirements.txt for full dependencies. Key packages:
mcp>=1.0.0 - MCP protocol support
haystack-ai - RAG framework
chroma-haystack - ChromaDB integration
sentence-transformers - Embedding models
pdfplumber - PDF extraction
License
MIT