# LiveKit RAG Assistant

AI-powered semantic search and question-answering system for LiveKit documentation.
## Features
- **Dual Search Modes**: Documentation (Pinecone) + real-time web search (Tavily)
- **Standard MCP Server**: Async LangChain integration with the Model Context Protocol
- **Fast Responses**: Groq LLM (llama-3.3-70b) with ultra-fast inference
- **Semantic Search**: HuggingFace embeddings (384-dim) with vector indexing
- **Source Attribution**: View the exact sources for every answer
- **Chat History**: Persistent conversation tracking with quick access to recent queries
- **Query Validation**: Prevents invalid inputs with helpful error messages
- **Copy-to-Clipboard**: One-click message sharing
## Quick Start

### Prerequisites

- Python 3.10+ with conda
- API keys: GROQ, TAVILY, PINECONE, HuggingFace
### Installation
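The exact install commands aren't reproduced here; a typical conda-based setup looks like the following (the environment name is illustrative, and a `requirements.txt` at the repo root is assumed):

```bash
# Create and activate a Python environment (name is illustrative)
conda create -n livekit-rag python=3.12 -y
conda activate livekit-rag

# Install project dependencies (assumes a requirements.txt at the repo root)
pip install -r requirements.txt

# Create a .env file with your API keys (see Configuration below)
```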
### Running the Application

Run the two processes in separate terminals:
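The entry-point filenames aren't given above, so the ones below are assumptions:

```bash
# Terminal 1 - start the MCP server (filename is an assumption)
python mcp_server.py

# Terminal 2 - start the Streamlit UI (filename is an assumption)
streamlit run app.py
```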
The app opens at http://localhost:8501.
## Architecture
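At a high level, the pieces listed in the tech stack connect roughly like this (a sketch inferred from the features above, not a formal diagram):

```
Streamlit UI ──▶ MCP Server (async, LangChain)
                  ├─▶ Documentation mode: HuggingFace embeddings ─▶ Pinecone vector search
                  ├─▶ Web mode: Tavily API (real-time search)
                  └─▶ Groq LLM (llama-3.3-70b) composes the answer with source attribution
```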
## Tech Stack
| Component | Technology | Purpose |
| --- | --- | --- |
| Frontend | Streamlit 1.28+ | Premium glassmorphism UI |
| Backend | MCP Standard | Async subprocess server |
| LLM | Groq API | Ultra-fast inference (free tier) |
| Embeddings | HuggingFace | sentence-transformers/all-MiniLM-L6-v2 |
| Vector DB | Pinecone Serverless | Ultra-fast similarity search (AWS us-east-1) |
| Web Search | Tavily API | Real-time internet search |
| Framework | LangChain | LLM orchestration & tools |
| Language | Python 3.12 | Modern syntax & features |
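To make the stack concrete, here is a minimal sketch of how these pieces wire together with LangChain's current integration packages; the index name, query, and prompt are assumptions, and API keys are read from the environment:

```python
from langchain_groq import ChatGroq
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore

# 384-dim embeddings, matching the model listed in the tech stack
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Connect to an existing Pinecone index (index name is an assumption)
store = PineconeVectorStore(index_name="livekit-docs", embedding=embeddings)

# Ultra-fast inference via Groq
llm = ChatGroq(model="llama-3.3-70b-versatile")

# Retrieve the top matches and let the LLM answer from them
question = "How do I set up LiveKit?"
docs = store.similarity_search(question, k=4)
context = "\n\n".join(d.page_content for d in docs)
answer = llm.invoke(f"Answer using this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
```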
## Project Structure
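The original layout isn't reproduced here; an illustrative layout consistent with the sections above might be (all file names hypothetical):

```
livekit-rag-assistant/
├── app.py            # Streamlit UI (hypothetical name)
├── mcp_server.py     # MCP server entry point (hypothetical name)
├── ingest.py         # Pinecone ingestion script (hypothetical name)
├── requirements.txt
└── .env              # API keys (see Configuration)
```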
## Usage

### Ask Questions

1. **Choose a search mode**: Documentation or Web Search
2. **Type a question**: Natural-language queries work best
3. **Get an answer**: The AI responds with a detailed 3-5 sentence answer
4. **View sources**: Click "View Sources" to see the cited documents
### Features

- **Copy Messages**: Click the button on any message to copy it
- **Recent Queries**: Quickly re-ask from history
- **Quick Help**: Expandable tips and usage guide
- **Performance Metrics**: Real-time statistics in System Status
- **Error Prevention**: Query validation with helpful feedback
## Performance

- **First query**: ~15-20 s (model initialization)
- **Subsequent queries**: 3-8 s (cached LLM)
- **Copy/history**: Instant (client-side)
- **Metrics update**: Real-time (no overhead)
## Example Queries

- "How do I set up LiveKit?"
- "What are the best practices for video conferencing?"
- "How do I implement real-time communication?"
- "What authentication methods are available?"
- "How do I handle bandwidth optimization?"
- "Deploy to Kubernetes - how does LiveKit handle it?"
## Configuration

Edit the `.env` file:
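The keys listed under Prerequisites map to environment variables like these (the exact variable names are assumptions; values are placeholders):

```env
GROQ_API_KEY=your-groq-key
TAVILY_API_KEY=your-tavily-key
PINECONE_API_KEY=your-pinecone-key
HUGGINGFACEHUB_API_TOKEN=your-huggingface-token
```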
## Ingestion

Populate Pinecone with the LiveKit documentation by running the ingestion script. This creates 3,007 searchable vector chunks from the LiveKit docs.
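A minimal sketch of what that ingestion step does, assuming LangChain's standard splitter and Pinecone integration (the source path and index name are assumptions):

```python
from langchain_community.document_loaders import DirectoryLoader
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the scraped LiveKit docs (path is an assumption)
docs = DirectoryLoader("data/livekit_docs", glob="**/*.md").load()

# Split into overlapping chunks suitable for retrieval
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# Embed with the 384-dim model and upsert into Pinecone (index name is an assumption)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
PineconeVectorStore.from_documents(chunks, embedding=embeddings, index_name="livekit-docs")
print(f"Ingested {len(chunks)} chunks")
```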
## Status Checks

View system status in the Streamlit sidebar.
## Troubleshooting

| Issue | Solution |
| --- | --- |
| "No relevant documentation found" | Try web-search mode or different keywords |
| "MCP Server not found" | Make sure the MCP server is running in Terminal 1 |
| Slow first response | Normal - the model loads on the first query (~15-20 s) |
| API key errors | Check your `.env` file and verify all keys are set |
| Empty Pinecone index | Run the ingestion script to populate it |
## Notes

- All chat history is saved in session state
- Supports semantic search with keyword fallback
- Responses are stored with source attribution
- Query validation prevents invalid inputs
- Performance optimized for fast inference
## Created By

@THENABILMAN on GitHub
## License

Built with ❤️ for developers. Feel free to modify and extend!

**Version**: Enhanced v1.0 | **Status**: ✅ Production Ready | **Date**: November 1, 2025