# ClientSphere RAG Platform - Complete Implementation Documentation
## Table of Contents
1. [Project Overview](#project-overview)
2. [System Architecture](#system-architecture)
3. [RAG Architecture Deep Dive](#rag-architecture-deep-dive)
4. [Frontend Implementation](#frontend-implementation)
5. [Backend Services](#backend-services)
6. [Database Schema](#database-schema)
7. [Authentication & Security](#authentication--security)
8. [Deployment Architecture](#deployment-architecture)
9. [API Reference](#api-reference)
10. [Widget Implementation](#widget-implementation)
11. [File Processing Pipeline](#file-processing-pipeline)
12. [Vector Database & Embeddings](#vector-database--embeddings)
13. [LLM Integration](#llm-integration)
14. [Billing & Usage Tracking](#billing--usage-tracking)
15. [Configuration & Environment](#configuration--environment)
---
## Project Overview
### Concept
**ClientSphere** is a multi-tenant RAG (Retrieval-Augmented Generation) platform that enables businesses to create AI-powered customer support chatbots trained on their own knowledge base. The platform allows companies to:
- Upload documents (PDFs, DOCX, TXT, Markdown) to build a knowledge base
- Deploy a chat widget on their website
- Provide AI-powered answers with citations and confidence scores
- Track usage and manage billing per tenant
### Key Features
1. **Multi-Tenant Architecture**: Complete data isolation between tenants
2. **RAG Pipeline**: Document ingestion → Chunking → Embedding → Vector Search → LLM Generation
3. **Anti-Hallucination**: Strict prompting and verification to prevent incorrect answers
4. **Embeddable Widget**: JavaScript widget that can be embedded on any website
5. **Usage Tracking**: Token counting, cost estimation, and quota enforcement
6. **Production Ready**: Authentication, rate limiting, health checks, error handling
---
## System Architecture
### High-Level Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Frontend (React) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Dashboard │ │ Knowledge │ │ Chat Test │ │
│ │ │ │ Base Mgmt │ │ Interface │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
└─────────┼──────────────────┼──────────────────┼───────────────┘
│ │ │
│ │ │
┌─────────┼──────────────────┼──────────────────┼───────────────┐
│ │ │ │ │
│ ┌──────▼──────────┐ ┌───▼──────────┐ ┌───▼──────────┐ │
│ │ Node.js API │ │ Node.js API │ │ Node.js API │ │
│ │ (Session Mgmt) │ │ (Widget) │ │ (Chat Proxy) │ │
│ └──────┬──────────┘ └───┬──────────┘ └───┬──────────┘ │
│ │ │ │ │
│ └──────────────────┼──────────────────┘ │
│ │ │
│ ┌───────▼────────┐ │
│ │ SQLite DB │ │
│ │ (Sessions, │ │
│ │ Messages) │ │
│ └────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
│ HTTP Requests
│
┌───────────────────────────▼─────────────────────────────────────┐
│ FastAPI RAG Backend (Python) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Document Upload → Parse → Chunk → Embed → Store │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Query → Embed → Vector Search → LLM → Answer │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ ChromaDB │ │ Embedding │ │ LLM Service │ │
│ │ (Vector DB) │ │ Service │ │ (Gemini/ │ │
│ │ │ │ (Sentence │ │ OpenAI) │ │
│ │ │ │ Transformers│ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ SQLite DB (Billing, Usage Tracking) │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
### Component Breakdown
#### 1. **Frontend (React + Vite)**
- **Location**: `src/`
- **Framework**: React 18 with TypeScript
- **UI Library**: Radix UI components
- **Styling**: Tailwind CSS
- **State Management**: React Context API
- **Deployment**: Cloudflare Pages
#### 2. **Node.js Backend (Session/API Layer)**
- **Location**: `server/`
- **Framework**: Express.js with TypeScript
- **Purpose**:
- Session management for chat widget
- Message storage and history
- Proxy to RAG backend
- Widget script generation
- **Database**: SQLite (sessions, messages, tenants, users)
- **Deployment**: Cloudflare Workers or standalone server
#### 3. **FastAPI RAG Backend (AI Processing)**
- **Location**: `rag-backend/`
- **Framework**: FastAPI (Python 3.11+)
- **Purpose**:
- Document ingestion and processing
- Vector search and retrieval
- LLM integration (Gemini/OpenAI)
- Answer generation with citations
- Billing and usage tracking
- **Database**:
- ChromaDB (vector storage)
- SQLite (billing, usage)
- **Deployment**: Hugging Face Spaces (or Render/Railway)
---
## RAG Architecture Deep Dive
### RAG Pipeline Overview
The RAG (Retrieval-Augmented Generation) system follows a two-phase approach:
1. **Ingestion Phase**: Documents → Chunks → Embeddings → Vector Store
2. **Query Phase**: Question → Embedding → Vector Search → Context → LLM → Answer
### Phase 1: Document Ingestion
#### Step 1: Document Upload (`POST /kb/upload`)
```python
# Location: rag-backend/app/main.py
@app.post("/kb/upload")
async def upload_document(
file: UploadFile,
tenant_id: str,
user_id: str,
kb_id: str
):
# 1. Validate file (type, size)
# 2. Save to disk
# 3. Create document record
# 4. Trigger background processing
```
**File Validation**:
- Allowed types: PDF, DOCX, TXT, Markdown
- Max size: 50MB (configurable)
- Multi-tenant isolation: `tenant_id` required
#### Step 2: Document Parsing (`app/rag/ingest.py`)
```python
class DocumentParser:
    def parse(self, file_path: Path) -> ParsedDocument:
        # Extract text based on file type:
        #   - PDF: PyMuPDF (fitz)
        #   - DOCX: python-docx
        #   - TXT/MD: direct read
        # Preserve page numbers for citations.
        # Returns: text + page_map (char_position -> page_number)
        ...
```
**Key Features**:
- Preserves page numbers for accurate citations
- Handles multiple file formats
- Extracts metadata (file type, size)
#### Step 3: Text Chunking (`app/rag/chunking.py`)
```python
class DocumentChunker:
    def chunk_text(
        self,
        text: str,
        page_numbers: Dict[int, int]
    ) -> List[TextChunk]:
        # 1. Split into paragraphs (natural boundaries)
        # 2. Use tiktoken for accurate token counting
        # 3. Create chunks of ~600 tokens with 150-token overlap
        # 4. Preserve page numbers in chunks
        # 5. Enforce a minimum chunk size (100 tokens)
        ...
```
**Chunking Strategy**:
- **Chunk Size**: 600 tokens (configurable)
- **Overlap**: 150 tokens (prevents context loss at boundaries)
- **Minimum Size**: 100 tokens (filters tiny chunks)
- **Token Counting**: Uses `tiktoken` with `cl100k_base` encoding (GPT-4 compatible)
**Why Overlap?**
- Prevents information loss at chunk boundaries
- Ensures context continuity for retrieval
- Example: If a sentence spans two chunks, it appears in both
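The sliding-window logic can be sketched in a few lines. This is a simplified stand-in for `DocumentChunker` (not its actual code), assuming only that `tiktoken` is installed:

```python
import tiktoken

def chunk_tokens(text: str, chunk_size: int = 600, overlap: int = 150,
                 min_size: int = 100) -> list[str]:
    """Sliding token window with overlap; a simplified stand-in for DocumentChunker."""
    enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-compatible tokenizer
    tokens = enc.encode(text)
    chunks, start = [], 0
    while start < len(tokens):
        window = tokens[start:start + chunk_size]
        if len(window) >= min_size or not chunks:  # drop tiny trailing chunks
            chunks.append(enc.decode(window))
        start += chunk_size - overlap  # step back by `overlap` tokens
    return chunks
```

Because each step advances only `chunk_size - overlap` tokens, the last 150 tokens of one chunk reappear at the start of the next, which is exactly the boundary-continuity property described above.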
#### Step 4: Embedding Generation (`app/rag/embeddings.py`)
```python
class EmbeddingService:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        # Lazy-load the SentenceTransformer model.
        # all-MiniLM-L6-v2: 384 dimensions; fast, good quality,
        # optimized for semantic search.
        ...

    def embed_texts(self, texts: List[str]) -> List[List[float]]:
        # Batch processing for efficiency.
        # Returns a list of 384-dimensional vectors.
        ...
```
**Embedding Model**: `all-MiniLM-L6-v2`
- **Dimensions**: 384
- **Speed**: Fast inference (~100ms per batch)
- **Quality**: Good for semantic search
- **Size**: ~80MB (downloads on first use)
**Why This Model?**
- Optimized for semantic similarity search
- Fast enough for real-time queries
- Good balance between speed and quality
- Can be swapped for larger models (e.g., `all-mpnet-base-v2`) for better quality
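For reference, the underlying sentence-transformers calls look roughly like this (a hedged sketch of what `EmbeddingService` wraps, not its exact code):

```python
from sentence_transformers import SentenceTransformer

# Downloads ~80MB on first use, then cached locally.
model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = ["We offer a 30-day return policy.", "Contact support via email."]
vectors = model.encode(chunks, batch_size=32, normalize_embeddings=True)
print(vectors.shape)  # (2, 384) -- one 384-dim vector per chunk
```

Normalizing the embeddings makes cosine similarity equivalent to a dot product, which pairs naturally with the cosine-configured ChromaDB collection below.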
#### Step 5: Vector Storage (`app/rag/vectorstore.py`)
```python
class VectorStore:
    def __init__(self):
        # Initialize ChromaDB with persistence
        self.client = chromadb.PersistentClient(
            path="data/vectordb",
            settings=ChromaSettings(
                anonymized_telemetry=False,
                allow_reset=True
            )
        )
        self.collection = self.client.get_or_create_collection(
            name="clientsphere_kb",
            metadata={"hnsw:space": "cosine"}  # Cosine similarity
        )

    def add_documents(
        self,
        documents: List[str],
        embeddings: List[List[float]],
        metadatas: List[Dict],
        ids: List[str]
    ):
        # Store in ChromaDB with metadata.
        # Metadata includes: tenant_id, kb_id, user_id, file_name, page_number
        ...
```
**ChromaDB Configuration**:
- **Similarity Metric**: Cosine similarity
- **Persistence**: Local file system (`data/vectordb/`)
- **Collection**: Single collection for all tenants (filtered by metadata)
- **Metadata**: Rich metadata for filtering and citations
**Multi-Tenant Isolation**:
- All chunks include `tenant_id` in metadata
- Vector searches always filter by `tenant_id`
- Zero risk of cross-tenant data leakage
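A tenant-scoped query might look like the following sketch, reusing the `collection` created above and ChromaDB's `where` filter syntax (the `$and` form is needed when combining multiple metadata conditions; the IDs are illustrative):

```python
# Illustrative values; the filters mirror the filter_dict built in RetrievalService.
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=10,
    where={"$and": [
        {"tenant_id": {"$eq": "tenant_123"}},
        {"kb_id": {"$eq": "default"}},
        {"user_id": {"$eq": "user_456"}},
    ]},
)
# Chunks from other tenants can never match the filter, so isolation
# holds even though all tenants share a single collection.
```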
### Phase 2: Query Processing
#### Step 1: Query Embedding
```python
# User asks: "What is your return policy?"
query_embedding = embedding_service.embed_query(query)
# Returns: 384-dimensional vector
```
#### Step 2: Vector Search (`app/rag/retrieval.py`)
```python
class RetrievalService:
def retrieve(
self,
query: str,
tenant_id: str,
kb_id: str,
user_id: str,
top_k: int = 10
) -> Tuple[List[RetrievalResult], float, bool]:
# 1. Generate query embedding
query_embedding = self.embedding_service.embed_query(query)
# 2. Search with filters (CRITICAL: tenant_id isolation)
filter_dict = {
"tenant_id": tenant_id, # Multi-tenant isolation
"kb_id": kb_id,
"user_id": user_id
}
# 3. ChromaDB similarity search
results = self.vector_store.search(
query_embedding=query_embedding,
top_k=top_k,
filter_dict=filter_dict
)
# 4. Filter by similarity threshold (0.15)
# 5. Calculate confidence score
# 6. Check for direct matches (intent-based)
return filtered_results, avg_confidence, has_relevant
```
**Retrieval Parameters**:
- **Top K**: 10 chunks retrieved (configurable)
- **Similarity Threshold**: 0.15 (low threshold to include more results)
- **Strict Threshold**: 0.30 (for answer generation - anti-hallucination; matches `SIMILARITY_THRESHOLD_STRICT`)
- **Confidence Calculation**: Maximum of top 3 results (if >= 0.4) or weighted average
**Confidence Scoring**:
```python
# Heavy-confidence mode: use the maximum similarity from the top results
if max_score >= 0.4:
    avg_confidence = max_score  # trust the best match
else:
    # Weighted average of the top 3 results (weights favor higher-ranked hits)
    avg_confidence = sum(s * w for s, w in zip(scores, weights)) / sum(weights)
```
#### Step 3: Context Formatting
```python
def get_context_for_llm(
    results: List[RetrievalResult],
    max_tokens: int = 2500
) -> Tuple[str, List[Dict]]:
    # Format chunks with source citations, e.g.:
    #   [Source 1: refund_policy.pdf] (Page 2)
    #   We offer a 30-day return policy...
    # Build citation metadata.
    # Returns: (formatted_context, citations_info)
    ...
```
**Context Format**:
```
[Source 1: refund_policy.pdf] (Page 2)
We offer a 30-day return policy on all products. Items must be in original condition...
---
[Source 2: faq.pdf] (Page 5)
Returns can be initiated through your account dashboard or by contacting support...
```
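A minimal sketch of this assembly, assuming each retrieval result exposes `content`, `metadata`, and `token_count` (illustrative field names, not necessarily the exact ones):

```python
def format_context(results, max_tokens: int = 2500) -> tuple[str, list[dict]]:
    """Assemble '[Source N: file] (Page P)' blocks until the token budget is spent."""
    blocks, citations, used = [], [], 0
    for i, r in enumerate(results, start=1):
        if used + r.token_count > max_tokens:
            break  # stop before exceeding the LLM context budget
        blocks.append(f"[Source {i}: {r.metadata['file_name']}] "
                      f"(Page {r.metadata['page_number']})\n{r.content}")
        citations.append({"source": i, **r.metadata})
        used += r.token_count
    return "\n---\n".join(blocks), citations
```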
#### Step 4: LLM Answer Generation (`app/rag/answer.py`)
```python
class AnswerService:
def generate_answer(
self,
question: str,
context: str,
citations_info: List[Dict],
confidence: float,
has_relevant_results: bool
) -> Dict[str, Any]:
# GATE 1: No relevant results → REFUSE
if not has_relevant_results:
return refusal_response()
# GATE 2: Low confidence (< 0.30) → REFUSE
if confidence < 0.30:
return refusal_response()
# GATE 3: Intent-based gating (integration/API questions)
if "integration" in intents and confidence < 0.50:
return refusal_response()
# Generate draft answer
draft_answer = self.provider.generate(
system_prompt=format_draft_prompt(context, question),
user_prompt=question
)
# Verify answer (MANDATORY)
verification = verifier.verify_answer(
draft_answer=draft_answer,
context=context,
citations_info=citations_info
)
if verification["pass"]:
return {
"answer": draft_answer,
"citations": extract_citations(draft_answer, citations_info),
"confidence": confidence,
"from_knowledge_base": True,
"verifier_passed": True
}
else:
# Verifier failed → REFUSE
return refusal_response()
```
**Anti-Hallucination Measures**:
1. **Strict System Prompts** (`app/rag/prompts.py`):
```
You are a helpful assistant that answers questions ONLY from the provided context.
RULES:
1. If the context doesn't contain the answer, say "I don't know"
2. Never make up information
3. Always cite sources using [Source X]
4. If unsure, suggest contacting support
```
2. **Temperature = 0.0**: Maximum determinism (no randomness)
3. **Verifier Service** (`app/rag/verifier.py`):
- Checks if answer claims are supported by context
- Flags unsupported claims
- Refuses to answer if verification fails
4. **Confidence Thresholds**:
- Retrieval threshold: 0.15 (lenient, to include more results)
- Answer threshold: 0.30 (strict, to prevent low-quality answers)
- Integration threshold: 0.50 (very strict for technical questions)
**LLM Providers**:
1. **Gemini** (Default):
- Model: `gemini-1.5-flash` (fast, cost-effective)
- Fallback: `gemini-pro`, `gemini-1.0-pro`
- Auto-detects available models
2. **OpenAI** (Alternative):
- Model: `gpt-3.5-turbo`
- Configurable via `LLM_PROVIDER` env var
---
## Frontend Implementation
### Technology Stack
- **Framework**: React 18.3 with TypeScript
- **Build Tool**: Vite 5.4
- **UI Components**: Radix UI (headless, accessible)
- **Styling**: Tailwind CSS 3.4
- **State Management**: React Context API
- **Routing**: React Router 6.22
- **HTTP Client**: Axios 1.13
### Project Structure
```
src/
├── api/
│ └── ragClient.ts # RAG backend API client
├── components/ # Reusable UI components
│ ├── ChatInterface.tsx # Main chat UI
│ ├── RAGChatInterface.tsx # RAG-specific chat
│ └── ...
├── pages/ # Page components
│ ├── tenant/
│ │ ├── Dashboard.tsx
│ │ ├── KnowledgeBase.tsx
│ │ ├── WidgetCustomization.tsx
│ │ └── ...
│ └── rag/
│ ├── KnowledgeBase.tsx
│ ├── RetrievalTest.tsx
│ ├── Chat.tsx
│ └── Billing.tsx
├── hooks/ # Custom React hooks
├── context/ # React Context providers
├── lib/ # Utility functions
└── types/ # TypeScript types
```
### RAG API Client (`src/api/ragClient.ts`)
**Key Features**:
1. **Environment-Aware URL Resolution**:
```typescript
function getRAGApiUrl(): string {
  // Priority:
  // 1. VITE_RAG_API_URL env var (build-time)
  if (import.meta.env.VITE_RAG_API_URL) return import.meta.env.VITE_RAG_API_URL;
  // 2. Production: Hugging Face Space URL
  if (import.meta.env.PROD) return 'https://chiragpatankar-rag-backend.hf.space';
  // 3. Development: local FastAPI server
  return 'http://localhost:8000';
}
```
2. **Authentication**:
```typescript
// JWT token from localStorage
// X-Tenant-Id and X-User-Id headers (dev mode)
// Auto-logout on 401
```
3. **Error Handling**:
```typescript
// 401 → Logout and redirect
// 402 → Quota exceeded (show upgrade banner)
// Network errors → User-friendly messages
```
**API Methods**:
- `uploadDocument(file, kbId)` - Upload document
- `getKBStats(kbId)` - Get knowledge base statistics
- `searchKB(query, kbId, topK)` - Search without LLM
- `chat(question, kbId, conversationId)` - Full RAG chat
- `getUsage(range)` - Usage statistics
- `getLimits()` - Plan limits
- `getCostReport(range)` - Cost breakdown
### Knowledge Base Page (`src/pages/rag/KnowledgeBase.tsx`)
**Features**:
- Drag & drop file upload
- File validation (type, size)
- Real-time KB statistics
- Document list with status
- Delete documents
- Toast notifications
**Upload Flow**:
```
1. User selects/drops file
2. Validate file (type: PDF/DOCX/TXT/MD, size: < 50MB)
3. Call RAGClient.uploadDocument()
4. Show loading state
5. Poll KB stats until chunks appear
6. Show success/error toast
```
### Chat Interface (`src/pages/rag/Chat.tsx`)
**Features**:
- Real-time chat UI
- Message history
- Confidence score display
- Citation links
- Refusal state handling
- Loading indicators
**Chat Flow**:
```
1. User types question
2. Send to RAGClient.chat()
3. Display answer with citations
4. Show confidence score
5. Handle refusal (low confidence)
6. Show escalation suggestion if needed
```
---
## Backend Services
### Node.js Backend (`server/`)
**Purpose**: Session management and chat widget proxy
**Key Responsibilities**:
1. **Session Management**:
- Create chat sessions for widget users
- Store session tokens
- Track session metadata (domain, IP, user agent)
2. **Message Storage**:
- Store user messages
- Store AI responses
- Maintain conversation history
3. **RAG Proxy**:
- Forward chat requests to RAG backend
- Add authentication headers
- Handle errors and retries
4. **Widget Script**:
- Generate tenant-specific widget JavaScript
- Inject tenant ID and API URL
- Serve as `/api/widget/script/:tenantId`
**Database Schema** (SQLite):
```sql
-- Chat sessions
CREATE TABLE chat_sessions (
id INTEGER PRIMARY KEY,
tenant_id INTEGER NOT NULL,
session_token TEXT UNIQUE NOT NULL,
domain TEXT,
started_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
-- Chat messages
CREATE TABLE chat_messages (
id INTEGER PRIMARY KEY,
session_id INTEGER NOT NULL,
sender TEXT NOT NULL, -- 'user' or 'ai'
message TEXT NOT NULL,
metadata TEXT DEFAULT '{}',
timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
```
**Chat Endpoint** (`server/src/routes/chat.ts`):
```typescript
router.post('/messages', async (req, res) => {
// 1. Get session from token
// 2. Save user message
// 3. Get chat history (last 20 messages)
// 4. Call RAG backend with history
// 5. Save AI response
// 6. Return response with RAG metadata
});
```
### FastAPI RAG Backend (`rag-backend/`)
**Purpose**: AI processing and knowledge base management
**Key Modules**:
1. **`app/main.py`**: FastAPI application and endpoints
2. **`app/config.py`**: Configuration settings
3. **`app/rag/ingest.py`**: Document parsing
4. **`app/rag/chunking.py`**: Text chunking
5. **`app/rag/embeddings.py`**: Embedding generation
6. **`app/rag/vectorstore.py`**: ChromaDB operations
7. **`app/rag/retrieval.py`**: Vector search
8. **`app/rag/answer.py`**: LLM integration
9. **`app/rag/prompts.py`**: Prompt templates
10. **`app/rag/verifier.py`**: Answer verification
11. **`app/billing/`**: Usage tracking and quotas
**Database Schema** (SQLite):
```sql
-- Billing: Tenant plans
CREATE TABLE tenant_plans (
tenant_id TEXT PRIMARY KEY,
plan_name TEXT NOT NULL,
monthly_chat_limit INTEGER,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
-- Billing: Usage tracking
CREATE TABLE usage_monthly (
id INTEGER PRIMARY KEY,
tenant_id TEXT NOT NULL,
year INTEGER NOT NULL,
month INTEGER NOT NULL,
total_requests INTEGER DEFAULT 0,
total_tokens INTEGER DEFAULT 0,
total_cost_usd REAL DEFAULT 0.0,
gemini_requests INTEGER DEFAULT 0,
openai_requests INTEGER DEFAULT 0,
UNIQUE(tenant_id, year, month)
);
```
---
## Database Schema
### Node.js Backend (SQLite)
**Tables**:
- `users`: User accounts (email, password_hash, google_id)
- `tenants`: Tenant/organization records
- `user_tenants`: User-tenant relationships (roles, permissions)
- `chat_sessions`: Chat session tracking
- `chat_messages`: Individual messages
- `widget_configs`: Widget customization settings
- `analytics_events`: Analytics tracking
### RAG Backend (SQLite + ChromaDB)
**SQLite Tables**:
- `tenant_plans`: Subscription plans per tenant
- `usage_monthly`: Monthly usage statistics
- `documents`: Document metadata (status, file info)
**ChromaDB**:
- **Collection**: `clientsphere_kb`
- **Metadata Fields**:
- `tenant_id` (string): Multi-tenant isolation
- `kb_id` (string): Knowledge base ID
- `user_id` (string): User who uploaded
- `file_name` (string): Original filename
- `file_type` (string): PDF, DOCX, TXT, MD
- `chunk_id` (string): Unique chunk identifier
- `chunk_index` (int): Chunk position in document
- `page_number` (int): Page number for citations
- `token_count` (int): Tokens in chunk
- `created_at` (string): ISO timestamp
---
## Authentication & Security
### Multi-Tenant Isolation
**Critical Security Measures**:
1. **Tenant ID Enforcement**:
- All vector searches filter by `tenant_id`
- Never accepts `tenant_id` from request body in production
- Extracted from JWT token or headers (dev mode)
2. **Development vs Production**:
```python
# Development (ENV=dev)
# Allows X-Tenant-Id and X-User-Id headers
# Production (ENV=prod)
# Requires JWT token with tenant_id claim
# Ignores the X-Tenant-Id / X-User-Id headers entirely
```
3. **JWT Authentication**:
```python
# JWT payload structure:
{
"tenant_id": "tenant_123",
"user_id": "user_456",
"sub": "user_456",
"exp": 1234567890
}
```
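Claim extraction can be sketched with PyJWT (a hedged example assuming HS256 signing; the actual middleware may differ):

```python
import jwt  # PyJWT

def tenant_from_token(token: str, secret: str) -> tuple[str, str]:
    """Decode and verify the bearer token; raises jwt.InvalidTokenError on failure."""
    payload = jwt.decode(token, secret, algorithms=["HS256"])  # also checks `exp`
    return payload["tenant_id"], payload["user_id"]
```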
### Rate Limiting
**Per-Tenant Rate Limits**:
- `/chat`: 10 requests/minute
- `/kb/search`: 30 requests/minute
- `/kb/upload`: 20 requests/hour
**Implementation**: `slowapi` with Redis (optional) or in-memory
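With `slowapi`, the wiring looks roughly like this (a sketch keyed on client IP; a true per-tenant limiter would swap in a `key_func` that reads the tenant claim from the JWT):

```python
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

app = FastAPI()
limiter = Limiter(key_func=get_remote_address)  # in-memory storage by default
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/chat")
@limiter.limit("10/minute")  # mirrors the /chat limit above
async def chat(request: Request):  # slowapi requires the Request parameter
    ...
```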
### CORS Configuration
```python
# Development: Allow all origins
# Production: Restrict to frontend domain
ALLOWED_ORIGINS = "https://main.clientsphere.pages.dev"
```
---
## Deployment Architecture
### Frontend (Cloudflare Pages)
**Build Process**:
```bash
npm run build # Vite build
# Output: dist/
```
**Deployment**:
```bash
wrangler pages deploy dist --project-name=clientsphere
```
**Environment Variables** (Cloudflare Pages Dashboard):
- `VITE_API_URL`: Node.js backend URL
- `VITE_RAG_API_URL`: RAG backend URL
- `VITE_GOOGLE_CLIENT_ID`: Google OAuth client ID
**URL**: `https://main.clientsphere.pages.dev`
### Node.js Backend
**Options**:
1. **Cloudflare Workers**: Serverless (if compatible)
2. **Standalone Server**: Express.js on Render/Railway
3. **Docker**: Containerized deployment
**Environment Variables**:
- `RAG_BACKEND_URL`: FastAPI backend URL
- `FRONTEND_URL`: Frontend URL (for widget script)
- `JWT_SECRET`: JWT signing secret
- `DATABASE_PATH`: SQLite database path
### RAG Backend (Hugging Face Spaces)
**Deployment**:
1. Push code to Hugging Face Space
2. Configure environment variables
3. Space auto-builds Docker image
4. Runs `app.py` entry point
**Dockerfile**:
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]
```
**Environment Variables** (HF Spaces):
- `GEMINI_API_KEY`: Google Gemini API key
- `ENV`: `dev` or `prod`
- `JWT_SECRET`: JWT secret (for prod)
- `PORT`: 7860 (auto-set by HF Spaces)
**URL**: `https://chiragpatankar-rag-backend.hf.space`
**Note**: Hugging Face converts underscores to hyphens in URLs!
---
## API Reference
### RAG Backend Endpoints
#### Health Check
```http
GET /health
```
**Response**:
```json
{
"status": "healthy",
"version": "1.0.0",
"vector_db_connected": true,
"llm_configured": true
}
```
#### Upload Document
```http
POST /kb/upload
Content-Type: multipart/form-data
file: <file>
kb_id: default
tenant_id: tenant_123 (dev mode)
user_id: user_456 (dev mode)
```
**Response**:
```json
{
"success": true,
"message": "Document uploaded successfully",
"file_name": "manual.pdf",
"chunks_created": 45,
"document_id": "doc_abc123"
}
```
#### Get Knowledge Base Stats
```http
GET /kb/stats?kb_id=default&tenant_id=tenant_123&user_id=user_456
```
**Response**:
```json
{
"tenant_id": "tenant_123",
"kb_id": "default",
"user_id": "user_456",
"total_documents": 3,
"total_chunks": 127,
"file_names": ["manual.pdf", "faq.pdf", "policy.txt"]
}
```
#### Search Knowledge Base
```http
GET /kb/search?query=return%20policy&kb_id=default&top_k=5
```
**Response**:
```json
{
"success": true,
"results": [
{
"chunk_id": "tenant_123_default_manual.pdf_0_abc123",
"content": "We offer a 30-day return policy...",
"metadata": {
"file_name": "manual.pdf",
"page_number": 2
},
"similarity_score": 0.85
}
],
"confidence": 0.82,
"has_relevant_results": true
}
```
#### Chat with RAG
```http
POST /chat
Content-Type: application/json
{
"kb_id": "default",
"question": "What is your return policy?",
"conversation_id": "conv_abc123",
"history": [
{"role": "user", "content": "Hello"},
{"role": "ai", "content": "Hi! How can I help?"}
]
}
```
**Response**:
```json
{
"success": true,
"answer": "Based on the documentation, we offer a 30-day return policy... [Source 1]",
"citations": [
{
"file_name": "manual.pdf",
"chunk_id": "tenant_123_default_manual.pdf_0_abc123",
"page_number": 2,
"relevance_score": 0.85,
"excerpt": "We offer a 30-day return policy..."
}
],
"confidence": 0.82,
"from_knowledge_base": true,
"escalation_suggested": false,
"refused": false,
"verifier_passed": true,
"conversation_id": "conv_abc123"
}
```
#### Get Usage Statistics
```http
GET /billing/usage?range=month
```
**Response**:
```json
{
"tenant_id": "tenant_123",
"period": "2024-01",
"total_requests": 450,
"total_tokens": 125000,
"total_cost_usd": 0.45,
"gemini_requests": 450,
"openai_requests": 0,
"start_date": "2024-01-01",
"end_date": "2024-01-31"
}
```
---
## Widget Implementation
### Widget Script Generation
**Endpoint**: `GET /api/widget/script/:tenantId`
**Process**:
1. Verify tenant exists
2. Generate self-contained JavaScript
3. Inject tenant ID and API URL
4. Return as JavaScript file
**Widget Features**:
- Floating chat button
- Expandable chat window
- Session management
- Message history
- Typing indicators
- RAG metadata display (citations, confidence)
### Widget Code Structure
```javascript
(function() {
const TENANT_ID = 'tenant_123';
const API_URL = 'https://backend.example.com/api';
// Create UI elements
const button = createChatButton();
const widget = createChatWidget();
// Session management
let sessionToken = null;
  async function initializeSession() {
    const response = await fetch(`${API_URL}/chat/sessions`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' }, // required so Express parses the JSON body
      body: JSON.stringify({ tenantId: TENANT_ID })
    });
    sessionToken = (await response.json()).sessionToken;
  }

  async function sendMessage(message) {
    const response = await fetch(`${API_URL}/chat/messages`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        sessionToken,
        message,
        tenantId: TENANT_ID
      })
    });
    return await response.json();
  }
// Initialize on page load
initializeSession();
})();
```
### Embedding the Widget
**HTML Code** (generated by frontend):
```html
<!-- ClientSphere Chat Widget -->
<script src="https://backend.example.com/api/widget/script/TENANT_ID" async></script>
```
**Placement**: Just before closing `</body>` tag
---
## File Processing Pipeline
### Complete Flow
```
1. User uploads file (PDF/DOCX/TXT/MD)
↓
2. File validation (type, size)
↓
3. Save to disk (data/uploads/)
↓
4. Background task triggered
↓
5. Document parsing
- PDF: PyMuPDF → Extract text + page numbers
- DOCX: python-docx → Extract text
- TXT/MD: Direct read
↓
6. Text chunking
- Split into paragraphs
- Create chunks (~600 tokens)
- Add overlap (150 tokens)
- Preserve page numbers
↓
7. Embedding generation
- Batch process chunks
- Generate 384-dim vectors
↓
8. Vector storage
- Add to ChromaDB with metadata
- Include tenant_id, kb_id, user_id
↓
9. Update document status: "processed"
```
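Step 4 maps naturally onto FastAPI's `BackgroundTasks`; in the sketch below, `save_to_disk` and `process_document` are hypothetical helpers standing in for the pipeline stages:

```python
from fastapi import BackgroundTasks, FastAPI, UploadFile

app = FastAPI()

@app.post("/kb/upload")
async def upload_document(file: UploadFile, background_tasks: BackgroundTasks):
    path = save_to_disk(file)  # step 3 (hypothetical helper)
    # Steps 5-9 run after the response is returned, so uploads feel instant.
    background_tasks.add_task(process_document, path)
    return {"success": True, "message": "Processing started"}
```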
### Error Handling
- **Parsing errors**: Log and mark document as "error"
- **Chunking errors**: Skip document, notify user
- **Embedding errors**: Retry with exponential backoff
- **Storage errors**: Rollback, mark as "error"
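The backoff for embedding errors can be as simple as this sketch (illustrative, not the production retry code):

```python
import time

def embed_with_retry(embed_fn, texts, max_attempts: int = 3):
    """Retry transient embedding failures with exponential backoff (1s, 2s, 4s...)."""
    for attempt in range(max_attempts):
        try:
            return embed_fn(texts)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up; the caller marks the document as "error"
            time.sleep(2 ** attempt)
```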
---
## Vector Database & Embeddings
### ChromaDB Configuration
**Storage**: Local file system (`data/vectordb/`)
**Collection Settings**:
- **Name**: `clientsphere_kb`
- **Similarity Metric**: Cosine similarity
- **Persistence**: Enabled (survives restarts)
**Metadata Schema**:
```python
{
"tenant_id": str, # Multi-tenant isolation
"kb_id": str, # Knowledge base ID
"user_id": str, # User who uploaded
"file_name": str, # Original filename
"file_type": str, # PDF, DOCX, TXT, MD
"chunk_id": str, # Unique chunk ID
"chunk_index": int, # Position in document
"page_number": int, # Page number (for citations)
"token_count": int, # Tokens in chunk
"created_at": str # ISO timestamp
}
```
### Embedding Model
**Model**: `all-MiniLM-L6-v2` (Sentence Transformers)
**Specifications**:
- **Dimensions**: 384
- **Max Sequence Length**: 512 tokens
- **Language**: English (multilingual variants available)
- **Speed**: ~100ms per batch of 32 texts
- **Quality**: Good for semantic search (not best, but fast)
**Alternative Models** (for better quality):
- `all-mpnet-base-v2`: 768 dims, slower, better quality
- `sentence-transformers/all-MiniLM-L12-v2`: 384 dims, better than L6
### Similarity Search
**Algorithm**: Cosine similarity
**Formula**:
```
similarity = dot(query_embedding, chunk_embedding) /
(||query_embedding|| * ||chunk_embedding||)
```
**Range**: -1 to 1 (typically 0 to 1 for normalized embeddings)
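A quick worked example of the formula using NumPy:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

q = np.array([0.1, 0.9, 0.0])   # toy "query" embedding
c = np.array([0.2, 0.8, 0.1])   # toy "chunk" embedding
print(f"{cosine_similarity(q, c):.3f}")  # 0.984 -- near-parallel vectors score close to 1
```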
**Thresholds**:
- **Retrieval**: 0.15 (lenient, to include more results)
- **Answer Generation**: 0.30 (strict, to prevent hallucinations)
- **Integration Questions**: 0.50 (very strict)
---
## LLM Integration
### Gemini Provider
**Model**: `gemini-1.5-flash` (default)
**Features**:
- Fast inference (~1-2 seconds)
- Cost-effective
- Good quality for RAG
- Auto-fallback to other models if unavailable
**Configuration**:
```python
import google.generativeai as genai

genai.configure(api_key=GEMINI_API_KEY)
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(
prompt,
generation_config=genai.types.GenerationConfig(
temperature=0.0, # Maximum determinism
max_output_tokens=1024
)
)
```
**Usage Tracking**:
- Prompt tokens (estimated: 1 token ≈ 4 chars)
- Completion tokens (estimated)
- Actual usage from API response (if available)
### OpenAI Provider
**Model**: `gpt-3.5-turbo` (default)
**Configuration**:
```python
from openai import OpenAI

client = OpenAI(api_key=OPENAI_API_KEY)
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt}
],
temperature=0.0,
max_tokens=1024
)
```
**Usage Tracking**:
- Exact token counts from API response
- Cost calculation based on model pricing
### Prompt Engineering
**System Prompt** (Draft Generation):
```
You are a helpful assistant that answers questions based ONLY on the provided context.
RULES:
1. Answer ONLY from the context provided
2. If the context doesn't contain the answer, say "I don't know"
3. Never make up information
4. Always cite sources using [Source X]
5. Be concise and helpful
6. If unsure, suggest contacting support
```
**Verifier Prompt**:
```
Verify if the following answer is supported by the context:
Answer: {draft_answer}
Context: {context}
Check:
1. Are all claims in the answer supported by the context?
2. Are there any unsupported claims?
3. Does the answer accurately represent the context?
Respond with JSON:
{
"pass": true/false,
"issues": ["list of issues"],
"unsupported_claims": ["list of unsupported claims"]
}
```
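Because the verifier's JSON reply comes from an LLM, parsing should fail closed; a minimal sketch, assuming the raw reply is plain JSON:

```python
import json

def parse_verifier_output(raw: str) -> dict:
    """Fail closed: any unparseable verifier reply counts as a failed check."""
    try:
        result = json.loads(raw)
        return {"pass": bool(result.get("pass")), "issues": result.get("issues", [])}
    except json.JSONDecodeError:
        return {"pass": False, "issues": ["verifier returned non-JSON output"]}
```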
---
## Billing & Usage Tracking
### Subscription Plans
**Starter Plan**:
- Monthly chat limit: 500
- Cost: $X/month
- Features: Basic RAG, standard support
**Growth Plan**:
- Monthly chat limit: 5,000
- Cost: $Y/month
- Features: Advanced RAG, priority support
**Pro Plan**:
- Monthly chat limit: Unlimited
- Cost: $Z/month
- Features: Enterprise RAG, dedicated support
### Usage Tracking
**Per Request Tracking**:
```python
# Tracked on every /chat request:
{
"tenant_id": "tenant_123",
"request_id": "req_abc123",
"timestamp": "2024-01-15T10:30:00Z",
"provider": "gemini",
"model": "gemini-1.5-flash",
"prompt_tokens": 1250,
"completion_tokens": 350,
"total_tokens": 1600,
"estimated_cost_usd": 0.0012
}
```
**Monthly Aggregation**:
- Sum all requests for tenant in current month
- Calculate total tokens and cost
- Compare against plan limits
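The aggregation maps onto a single SQLite upsert keyed on `UNIQUE(tenant_id, year, month)`; a sketch using the standard-library `sqlite3` module (the helper name is illustrative):

```python
import sqlite3

def record_usage(db: sqlite3.Connection, tenant_id: str, year: int, month: int,
                 tokens: int, cost: float) -> None:
    """Upsert into usage_monthly; the UNIQUE(tenant_id, year, month) key drives ON CONFLICT."""
    db.execute(
        """INSERT INTO usage_monthly (tenant_id, year, month,
                                      total_requests, total_tokens, total_cost_usd)
           VALUES (?, ?, ?, 1, ?, ?)
           ON CONFLICT(tenant_id, year, month) DO UPDATE SET
               total_requests = total_requests + 1,
               total_tokens   = total_tokens + excluded.total_tokens,
               total_cost_usd = total_cost_usd + excluded.total_cost_usd""",
        (tenant_id, year, month, tokens, cost),
    )
    db.commit()
```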
### Quota Enforcement
**Check Before LLM Call**:
```python
def check_quota(tenant_id: str) -> bool:
plan = get_tenant_plan(tenant_id)
usage = get_monthly_usage(tenant_id)
if plan.monthly_chat_limit is None:
return True # Unlimited plan
return usage.total_requests < plan.monthly_chat_limit
```
**Response on Quota Exceeded**:
```json
{
"success": false,
"error": "AI quota exceeded. Upgrade your plan.",
"status_code": 402
}
```
### Cost Calculation
**Gemini Pricing** (example):
- Input: $0.00025 per 1K tokens
- Output: $0.0005 per 1K tokens
**OpenAI Pricing** (example):
- GPT-3.5-turbo: $0.0015 per 1K tokens (input), $0.002 per 1K tokens (output)
**Calculation**:
```python
cost = (
    (prompt_tokens / 1000) * input_price
    + (completion_tokens / 1000) * output_price
)
```
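Plugging the sample request above (1,250 prompt tokens, 350 completion tokens) into the example Gemini rates:

```python
# Example Gemini rates: $0.00025/1K input tokens, $0.0005/1K output tokens
cost = (1250 / 1000 * 0.00025) + (350 / 1000 * 0.0005)
print(f"${cost:.5f}")  # $0.00049 -- a fraction of a cent per request
```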
---
## Configuration & Environment
### Frontend Environment Variables
**.env.development**:
```bash
VITE_API_URL=http://localhost:3001
VITE_RAG_API_URL=http://localhost:8000
VITE_GOOGLE_CLIENT_ID=your-client-id
```
**.env.production**:
```bash
VITE_API_URL=https://mcp-backend.example.com
VITE_RAG_API_URL=https://chiragpatankar-rag-backend.hf.space
VITE_GOOGLE_CLIENT_ID=your-client-id
```
### Node.js Backend Environment Variables
**.env**:
```bash
NODE_ENV=production
PORT=3001
RAG_BACKEND_URL=https://chiragpatankar-rag-backend.hf.space
FRONTEND_URL=https://main.clientsphere.pages.dev
JWT_SECRET=your-secret-key
DATABASE_PATH=./database.sqlite
```
### RAG Backend Environment Variables
**.env**:
```bash
# Environment
ENV=dev # or "prod"
# LLM Configuration
LLM_PROVIDER=gemini # or "openai"
GEMINI_API_KEY=your-gemini-key
OPENAI_API_KEY=your-openai-key
# JWT (Production)
JWT_SECRET=your-jwt-secret
# Chunking
CHUNK_SIZE=600
CHUNK_OVERLAP=150
MIN_CHUNK_SIZE=100
# Retrieval
TOP_K=10
SIMILARITY_THRESHOLD=0.15
SIMILARITY_THRESHOLD_STRICT=0.30
# LLM Settings
TEMPERATURE=0.0
MAX_CONTEXT_TOKENS=2500
REQUIRE_VERIFIER=true
# Security
MAX_FILE_SIZE_MB=50
ALLOWED_ORIGINS=*
RATE_LIMIT_ENABLED=true
# Embedding Model
EMBEDDING_MODEL=all-MiniLM-L6-v2
```
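One plausible shape for `app/config.py` is a pydantic-settings class that reads these variables (an assumption about the file's structure, not a copy of it):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """Field names map case-insensitively to the env vars above (chunk_size <- CHUNK_SIZE)."""
    model_config = SettingsConfigDict(env_file=".env")

    env: str = "dev"
    llm_provider: str = "gemini"
    gemini_api_key: str = ""
    chunk_size: int = 600
    chunk_overlap: int = 150
    min_chunk_size: int = 100
    top_k: int = 10
    similarity_threshold: float = 0.15
    similarity_threshold_strict: float = 0.30
    temperature: float = 0.0

settings = Settings()  # settings.chunk_size == 600 unless CHUNK_SIZE overrides it
```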
### Hugging Face Spaces Configuration
**README.md** (YAML frontmatter):
```yaml
---
title: ClientSphere RAG Backend
sdk: docker
app_file: app.py
---
```
**Environment Variables** (HF Spaces Settings):
- `GEMINI_API_KEY`: Your Gemini API key
- `ENV`: `dev` or `prod`
- `JWT_SECRET`: JWT secret (if `ENV=prod`)
---
## Conclusion
This document provides a comprehensive overview of the ClientSphere RAG platform implementation. The system is designed for:
- **Scalability**: Multi-tenant architecture with isolated data
- **Reliability**: Error handling, retries, health checks
- **Security**: JWT authentication, tenant isolation, rate limiting
- **Quality**: Anti-hallucination measures, confidence scoring, citations
- **Production Ready**: Billing, usage tracking, monitoring
For questions or contributions, please refer to the individual component README files or contact the development team.
---
**Last Updated**: January 2024
**Version**: 1.0.0