# Retrieval-Augmented Generation (RAG) with Atlas Vector Search

Retrieval-Augmented Generation (RAG) is an architecture that enhances Large Language Models (LLMs) with additional data to generate more accurate responses. By combining an LLM with a retrieval system powered by MongoDB Atlas Vector Search, you can create applications that leverage both the language understanding of LLMs and the specific data stored in your Atlas clusters.

## Why Use RAG?

LLMs face several limitations that can impact the quality of their responses:

- **Stale Data**: LLMs are trained on static datasets, meaning their knowledge is limited to information available at training time.
- **No Access to Local Data**: LLMs cannot access personal or organizational data, limiting their ability to provide domain-specific responses.
- **Hallucinations**: Incomplete or outdated training data can lead to LLMs generating inaccurate information.

RAG addresses these limitations by implementing three core steps: ingestion, retrieval, and generation. During ingestion, custom data is stored as vector embeddings in Atlas. The retrieval phase uses Atlas Vector Search to find semantically similar documents based on user queries. Finally, the generation phase prompts the LLM with these retrieved documents as context, resulting in more accurate and relevant responses.

## RAG with Atlas Vector Search

The RAG implementation with Atlas Vector Search follows a straightforward process. First, data is ingested into Atlas, where it's converted into vector embeddings. Then, when a user asks a question, Atlas Vector Search retrieves semantically similar documents. Finally, an LLM uses this retrieved context to generate a more informed response.

## Ingestion

Data ingestion for RAG involves processing custom data and storing it in a vector database. The process typically includes four key steps:

1. Load the data from various sources (PDFs, documents, etc.)
2. Split the data into manageable chunks (chunking)
3. Convert text into vector embeddings using embedding models
4. Store embeddings alongside the original data in Atlas collections

Here's a Python example showing how to convert text into embeddings:

```python
from sentence_transformers import SentenceTransformer

# nomic-embed-text-v1 ships custom modeling code, so trust_remote_code is required.
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
embedding = model.encode("Your text here").tolist()
```

## Retrieval

The retrieval phase uses Atlas Vector Search to find the documents most similar to a user's query. This is achieved by:

1. Creating an Atlas Vector Search index on collections containing embeddings
2. Converting user queries into vector embeddings
3. Running vector search queries against the database

Example retrieval pipeline:

```python
pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",
            "queryVector": query_embedding,
            "path": "embedding",
            "exact": True,
            "limit": 5
        }
    },
    {
        "$project": {
            "_id": 0,
            "text": 1
        }
    }
]
```

## Generation

In the generation phase, the retrieved documents are used as context for LLM prompts. This approach significantly reduces hallucinations and improves response accuracy. The LLM receives both the user's question and the relevant retrieved documents, allowing it to generate more informed answers. The sketches below illustrate the ingestion, retrieval, and generation phases working together end to end.
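To make the ingestion steps concrete, here is a minimal sketch that chunks a text file, embeds each chunk with the same model as above, and stores the chunks alongside their embeddings in an Atlas collection. The connection string, the `rag_db.documents` namespace, and the naive fixed-size chunking are illustrative assumptions, not prescribed by Atlas.

```python
from pymongo import MongoClient
from sentence_transformers import SentenceTransformer

# Assumed deployment details -- replace with your own Atlas URI, database, and collection.
collection = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")["rag_db"]["documents"]
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

def ingest(path: str, chunk_size: int = 1000, overlap: int = 200) -> None:
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # Naive fixed-size chunking with overlap; swap in a smarter splitter for production use.
    step = chunk_size - overlap
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), step)]
    # Store each chunk together with its embedding so it can be vector-searched later.
    collection.insert_many(
        [{"text": chunk, "embedding": model.encode(chunk).tolist()} for chunk in chunks]
    )
```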
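Retrieval also assumes a Vector Search index on the `embedding` field (step 1 of the retrieval list above). One way to create it programmatically is with PyMongo's `create_search_index`; the index name matches the pipeline shown earlier, while the 768 dimensions and cosine similarity are assumptions based on the nomic embedding model.

```python
from pymongo.operations import SearchIndexModel

# Reuses the `collection` handle from the ingestion sketch above.
index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 768,   # nomic-embed-text-v1 produces 768-dimensional vectors
                "similarity": "cosine",
            }
        ]
    },
    name="vector_index",
    type="vectorSearch",
)
collection.create_search_index(model=index_model)
```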
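Finally, a minimal sketch of retrieval and generation together: embed the user's question, run the aggregation pipeline shown earlier with PyMongo, and pass the retrieved text to an LLM. The use of the OpenAI chat API and the `gpt-4o-mini` model name are assumptions for illustration; any chat-capable LLM can fill this role.

```python
from openai import OpenAI

# Reuses `collection` and `model` from the sketches above; assumes OPENAI_API_KEY is set.
llm = OpenAI()

def answer(question: str) -> str:
    # Retrieval: embed the question and fetch the most similar stored chunks.
    query_embedding = model.encode(question).tolist()
    pipeline = [
        {"$vectorSearch": {"index": "vector_index", "queryVector": query_embedding,
                           "path": "embedding", "exact": True, "limit": 5}},
        {"$project": {"_id": 0, "text": 1}},
    ]
    context = "\n\n".join(doc["text"] for doc in collection.aggregate(pipeline))

    # Generation: prompt the LLM with the retrieved documents as grounding context.
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```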
## Get Started

To implement a basic RAG system with Atlas Vector Search, you would typically:

1. Set up your Atlas cluster and connection string
2. Create a data ingestion pipeline to load and embed your documents
3. Define a Vector Search index on your data
4. Build a retrieval system that matches user queries to relevant documents
5. Implement a generation component that uses LLMs with retrieved context

## Next Steps

For advanced implementations, consider these resources:

- **Integrate with AI Frameworks**: Combine Atlas with popular LLM frameworks for enhanced functionality
- **Local Implementations**: Build RAG systems using local deployments
- **AI Agents**: Implement agentic RAG for more complex workflows

To improve your RAG system's performance, consider using advanced embedding models like Voyage AI, implementing pre-filtering on other data fields, or using hybrid search that combines semantic and full-text search results.

The key to successful RAG implementation lies in choosing appropriate embedding models, optimizing chunking strategies, and carefully evaluating your system's performance using appropriate metrics.
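As one example of the pre-filtering mentioned above, `$vectorSearch` accepts a `filter` clause that restricts candidates to documents matching a query on fields indexed as `filter` fields in the Vector Search index. The `category` field and value below are hypothetical, and approximate (ANN) search with `numCandidates` is shown in place of exact search.

```python
pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",
            "queryVector": query_embedding,
            "path": "embedding",
            # Pre-filter: only documents whose indexed `category` field matches are considered.
            "filter": {"category": "user-guides"},
            "numCandidates": 100,
            "limit": 5,
        }
    },
    {"$project": {"_id": 0, "text": 1}},
]
```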
