# MCP-RAG: Agentic AI Orchestration for Business Analytics
A lightweight demonstration of **Model Context Protocol (MCP)** combined with **Retrieval-Augmented Generation (RAG)** to orchestrate multi-agent AI workflows for business analysis.
## 🎯 What This Project Demonstrates
This project showcases how to build **agentic AI systems** that can:
1. **Orchestrate Multiple Agents**: MCP servers coordinate different specialized tools
2. **Retrieve Business Knowledge**: RAG provides context-aware information retrieval
3. **Perform Statistical Analysis**: Automated data analysis with natural language queries
4. **Maintain Modularity**: Easy to swap LLM backends and add new capabilities
## 🚀 Key Features
- **MCP-Based Coordination**: Multiple specialized servers working together
- **Business Analytics Tools**: Mean, standard deviation, correlation, linear regression
- **RAG Knowledge Base**: Business terms, policies, and analysis guidelines
- **Modular Design**: Easy to extend with new tools or swap LLM backends
- **Natural Language Interface**: Ask questions like "What's the average earnings from Q1?"
## 📋 Prerequisites
- Python 3.8+
- Google Gemini API key (free tier available) for the default LLM backend
- Basic understanding of MCP and RAG concepts
## 🛠️ Installation
1. **Clone the repository**:
```bash
git clone https://github.com/ANSH-RIYAL/MCP-RAG.git
cd MCP-RAG
```
2. **Install dependencies**:
```bash
pip install -r requirements.txt
```
3. **Set up environment variables**:
**For Gemini API (default)**:
```bash
export LLM_MODE="gemini"
export GEMINI_API_KEY="your-gemini-api-key"
```
**For Custom Localhost API**:
```bash
export LLM_MODE="custom"
export CUSTOM_API_URL="http://localhost:8000"
export CUSTOM_API_KEY="your-api-key" # Optional
```
## 🎮 Usage
### Quick Demo
Run the demonstration script to see both MCP servers in action:
```bash
python main.py
```
This will show:
- Business analytics tools working with sample data
- RAG knowledge retrieval for business terms
- How the systems can work together
- LLM integration with the selected backend
### LLM Backend Selection
The system supports two LLM backends:
#### Option 1: Google Gemini API (Default)
```bash
export LLM_MODE="gemini"
export GEMINI_API_KEY="your-gemini-api-key"
python main.py
```
#### Option 2: Custom Localhost API
```bash
export LLM_MODE="custom"
export CUSTOM_API_URL="http://localhost:8000"
export CUSTOM_API_KEY="your-api-key" # Optional
python main.py
```
**Custom API Requirements:**
- Must support OpenAI-compatible chat completions endpoint (`/v1/chat/completions`)
- Should accept tool/function calling format
- Expected to run on localhost:8000 (configurable)
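As a rough sketch, a request to such an endpoint follows the standard OpenAI chat-completions shape. The model name and the example tool definition below are placeholders, not values the project prescribes:

```python
import json

def build_chat_request(query: str, model: str = "local-model") -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload.

    The model name and the tool definition are illustrative; a real
    backend may expect different values.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": query}],
        # Optional: advertise a tool in OpenAI function-calling format
        "tools": [{
            "type": "function",
            "function": {
                "name": "calculate_mean",
                "description": "Calculate the mean of a numeric column",
                "parameters": {
                    "type": "object",
                    "properties": {"column": {"type": "string"}},
                    "required": ["column"],
                },
            },
        }],
    }

payload = build_chat_request("What's the average earnings from Q1?")
print(json.dumps(payload, indent=2))
```

A client such as `httpx` would then POST this JSON to `{CUSTOM_API_URL}/v1/chat/completions`.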
### Conversation Scenarios
Run the conversation scenarios to see real-world usage examples:
```bash
python test_scenarios.py
```
This walks through the scenarios from the accompanying LinkedIn post, showing how non-technical users interact with the system.
### Business Analytics Tools
The system provides these analysis capabilities:
- **Data Exploration**: Get dataset information and sample data
- **Statistical Analysis**: Mean, standard deviation with filtering
- **Correlation Analysis**: Find relationships between variables
- **Predictive Modeling**: Linear regression for forecasting
### RAG Knowledge Retrieval
Access business knowledge through:
- **Term Definitions**: Look up business concepts
- **Policy Information**: Retrieve company procedures
- **Analysis Guidelines**: Get context for data interpretation
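A minimal sketch of how such a lookup could work over a plain-text knowledge base. The one-entry-per-line format and matching logic here are illustrative, not the actual `rag_server.py` implementation:

```python
from typing import Optional

def lookup_term(knowledge_text: str, term: str) -> Optional[str]:
    """Return the definition for a term from 'Term: definition' lines.

    Assumes one entry per line; this is an illustrative format only.
    """
    for line in knowledge_text.splitlines():
        key, _, definition = line.partition(":")
        if key.strip().lower() == term.lower():
            return definition.strip()
    return None

knowledge = """Earnings: Total revenue generated in a given period
Profit Margin: Percentage of revenue that remains as profit after expenses"""

print(lookup_term(knowledge, "profit margin"))
```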
## 📖 Scenarios & Use Cases
### Scenario 1: Sales Analysis
```
Manager: "What's the average earnings from Q1?"
MCP-RAG System:
1. Analytics Server: calculate_mean(column='earnings', filter_column='quarter', filter_value='Q1-2024')
→ Mean of earnings: 101666.67
2. RAG Server: get_business_terms(term='earnings')
→ Earnings: Total revenue generated by a department or company in a given period
3. Response: "Average earnings for Q1-2024: $101,667"
```
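Under the hood, a filtered mean like this reduces to a few lines of pandas. The numbers below are made up for illustration; the real values live in `data/sample_business_data.csv`:

```python
import pandas as pd

# Illustrative data, not the project's actual dataset
df = pd.DataFrame({
    "quarter": ["Q1-2024", "Q1-2024", "Q2-2024"],
    "earnings": [100000, 105000, 120000],
})

# Filter to the requested quarter, then average the earnings column
q1_mean = df.loc[df["quarter"] == "Q1-2024", "earnings"].mean()
print(f"Mean of earnings: {q1_mean:.2f}")
```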
### Scenario 2: Performance Correlation
```
Manager: "What's the correlation between sales and expenses?"
MCP-RAG System:
1. Analytics Server: calculate_correlation(column1='sales', column2='expenses')
→ Correlation between sales and expenses: 0.923
2. Response: "Correlation: 0.923 (strong positive relationship)"
```
### Scenario 3: Predictive Modeling
```
Manager: "Build a model to predict earnings from sales and employees"
MCP-RAG System:
1. Analytics Server: linear_regression(target_column='earnings', feature_columns=['sales', 'employees'])
→ Linear Regression Results:
Target: earnings
Features: ['sales', 'employees']
Intercept: 15000.00
sales coefficient: 0.45
employees coefficient: 1250.00
R-squared: 0.987
2. Response: "Model created with R² = 0.987"
```
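A sketch of how such a regression can be computed with NumPy least squares. The data here is synthetic; the coefficients in the scenario above come from the sample dataset, not from this snippet:

```python
import numpy as np

# Synthetic data constructed so earnings = 1000 + 2*sales + 50*employees
sales = np.array([10.0, 20.0, 30.0, 40.0])
employees = np.array([1.0, 2.0, 2.0, 3.0])
earnings = 1000 + 2 * sales + 50 * employees

# Design matrix with an intercept column, then ordinary least squares
X = np.column_stack([np.ones_like(sales), sales, employees])
coef, *_ = np.linalg.lstsq(X, earnings, rcond=None)
intercept, sales_coef, emp_coef = coef
print(intercept, sales_coef, emp_coef)
```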
### Scenario 4: Business Knowledge
```
Manager: "What does profit margin mean?"
MCP-RAG System:
1. RAG Server: get_business_terms(term='profit margin')
→ Profit Margin: Percentage of revenue that remains as profit after expenses, calculated as (earnings - expenses) / earnings
2. Response: "Profit Margin: Percentage of revenue that remains as profit after expenses"
```
### Scenario 5: Policy Information
```
Manager: "What are the budget allocation policies?"
MCP-RAG System:
1. RAG Server: get_company_policies(policy_type='budget')
→ Budget Allocation: Marketing gets 25% of total budget, Engineering gets 30%, Sales gets 45%
2. Response: "Budget Allocation: Marketing gets 25%, Engineering gets 30%, Sales gets 45%"
```
## 🔧 Customization Guide
### For Your Organization
#### Step 1: Replace Sample Data
1. **Update Business Data**: Replace `data/sample_business_data.csv` with your actual data
- Ensure columns are numeric for analysis tools
- Add any categorical columns for filtering
- Include time-based columns for trend analysis
2. **Update Knowledge Base**: Replace `data/business_knowledge.txt` with your organization's:
- Business terms and definitions
- Company policies and procedures
- Analysis guidelines and best practices
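Before pointing the analytics tools at a new CSV, it can help to verify that the columns you intend to analyze are numeric. A small sketch with hypothetical column names:

```python
import io
import pandas as pd

# Inline stand-in for your replacement CSV file
csv_text = """quarter,department,sales,earnings
Q1-2024,Sales,120,100000
Q1-2024,Marketing,80,105000"""

df = pd.read_csv(io.StringIO(csv_text))

# Columns the analysis tools will aggregate must be numeric dtypes
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```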
#### Step 2: Add Custom Analytics Tools
**File to modify**: `src/servers/business_analytics_server.py`
1. **Add New Tools**: In the `handle_list_tools()` function (around line 29), add new tools to the tools list:
```python
@server.list_tools()
async def handle_list_tools() -> ListToolsResult:
return ListToolsResult(
tools=[
# ... existing tools (calculate_mean, calculate_std, calculate_correlation, linear_regression) ...
Tool(
name="your_custom_analysis",
description="Your custom analysis tool",
inputSchema={
"type": "object",
"properties": {
"parameter": {"type": "string"}
},
"required": ["parameter"]
}
)
]
)
```
2. **Implement Tool Logic**: In the `handle_call_tool()` function (around line 140), add the corresponding handler:
```python
elif name == "your_custom_analysis":
parameter = arguments["parameter"]
# Your custom analysis logic here
result = f"Custom analysis result for {parameter}"
return CallToolResult(
content=[TextContent(type="text", text=result)]
)
```
#### Step 3: Extend RAG Capabilities
**File to modify**: `src/servers/rag_server.py`
1. **Add New Knowledge Sources**: Modify the `load_business_knowledge()` function (around line 25) to include:
- Database connections
- Document processing (PDFs, Word docs)
- API integrations (Salesforce, HubSpot, etc.)
2. **Add New RAG Tools**: In the `handle_list_tools()` function (around line 50), add new tools:
```python
Tool(
name="your_custom_rag_tool",
description="Your custom knowledge retrieval tool",
inputSchema={
"type": "object",
"properties": {
"query": {"type": "string"}
},
"required": ["query"]
}
)
```
3. **Implement RAG Tool Logic**: In the `handle_call_tool()` function (around line 90), add the handler:
```python
elif name == "your_custom_rag_tool":
query = arguments["query"]
# Your custom RAG logic here
result = f"Custom RAG result for {query}"
return CallToolResult(
content=[TextContent(type="text", text=result)]
)
```
#### Step 4: Integrate LLM Backend
**File to create** (optional): `src/servers/llm_server.py`
The system already includes a flexible LLM client (`src/core/llm_client.py`) that supports both Gemini and custom localhost APIs.
1. **Using the Existing LLM Client**: The `FlexibleRAGAgent` in `src/core/gemini_rag_agent.py` already supports:
- Google Gemini API
- Custom localhost API (OpenAI-compatible format)
2. **Create Custom LLM Server** (optional): If you need a dedicated MCP server for LLM operations:
```python
import asyncio
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent, CallToolResult, ListToolsResult

server = Server("llm-server")

@server.list_tools()
async def handle_list_tools() -> ListToolsResult:
    return ListToolsResult(
        tools=[
            Tool(
                name="process_natural_language",
                description="Convert natural language to tool calls",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "query": {"type": "string"}
                    },
                    "required": ["query"]
                }
            )
        ]
    )

@server.call_tool()
async def handle_call_tool(name: str, arguments: dict) -> CallToolResult:
    if name == "process_natural_language":
        query = arguments["query"]
        # Integrate with OpenAI, Gemini, or local models here to
        # convert natural language into appropriate tool calls
        return CallToolResult(
            content=[TextContent(type="text", text=f"Processed: {query}")]
        )
    raise ValueError(f"Unknown tool: {name}")

async def main():
    async with stdio_server() as (read_stream, write_stream):
        await server.run(read_stream, write_stream, server.create_initialization_options())

if __name__ == "__main__":
    asyncio.run(main())
```
3. **Add to requirements.txt**:
```txt
openai>=1.0.0
google-genai>=0.3.0
httpx>=0.24.0
```
#### Step 5: Add New Data Sources
**Files to modify**: `src/servers/business_analytics_server.py` and `src/servers/rag_server.py`
1. **Database Connectors**: Add tools to connect to:
- PostgreSQL, MySQL, SQLite
- MongoDB, Redis
- Data warehouses (Snowflake, BigQuery)
2. **API Integrations**: Connect to business systems:
- CRM systems (Salesforce, HubSpot)
- Marketing platforms (Google Analytics, Facebook Ads)
- Financial systems (QuickBooks, Xero)
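As one concrete example, a database-backed tool handler could run a read-only query with the standard-library `sqlite3` module. The table and column names here are hypothetical:

```python
import sqlite3

def query_total(conn: sqlite3.Connection, table: str, column: str) -> float:
    """Sum a numeric column; a tool handler could wrap this in a CallToolResult.

    Table/column names cannot be bound as SQL parameters, so in real code
    they should be validated against an allow-list first.
    """
    row = conn.execute(f"SELECT SUM({column}) FROM {table}").fetchone()
    return row[0] or 0.0

# Demo with an in-memory database standing in for a real one
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?)", [(100.0,), (250.0,)])
print(query_total(conn, "sales", "amount"))
```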
### Current Tool Implementations
**Business Analytics Tools** (`src/servers/business_analytics_server.py`):
- `calculate_mean` - Calculate average of numeric columns
- `calculate_std` - Calculate standard deviation
- `calculate_correlation` - Find relationships between variables
- `linear_regression` - Build predictive models
- `get_data_info` - Get dataset information
**RAG Tools** (`src/servers/rag_server.py`):
- `get_business_terms` - Look up business definitions
- `get_company_policies` - Retrieve policy information
- `search_business_knowledge` - General knowledge search
**LLM Integration** (`src/core/llm_client.py`):
- `FlexibleRAGAgent` - Supports both Gemini and custom localhost APIs
- `LLMClient` - Handles API communication for both backends
- Tool calling and conversation management
### Modular Architecture Benefits
The modular design allows you to:
- **Swap Components**: Replace any server without affecting others
- **Add Capabilities**: Plug in new tools without rewriting existing code
- **Scale Independently**: Run different servers on different machines
- **Customize Per Use Case**: Use only the tools you need
### Example Extensions
#### Adding Sentiment Analysis
**File to create**: `src/servers/sentiment_analysis_server.py`
```python
# In sentiment_analysis_server.py
@server.call_tool()
async def handle_call_tool(name: str, arguments: dict) -> CallToolResult:
    if name == "analyze_sentiment":
        text = arguments["text"]
        # Integrate with a sentiment analysis API or model here,
        # then return sentiment scores and insights
        ...
```
#### Adding Forecasting
**File to modify**: `src/servers/business_analytics_server.py`
```python
# Add to the tools list in handle_list_tools()
Tool(
    name="time_series_forecast",
    description="Forecast future values using time series analysis",
    inputSchema={
        "type": "object",
        "properties": {
            "column": {"type": "string"},
            "periods": {"type": "integer"}
        },
        "required": ["column", "periods"]
    }
)
```
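A matching handler could start with something as simple as a linear-trend extrapolation. This is a sketch of the idea, not a production forecaster:

```python
import numpy as np

def linear_trend_forecast(values, periods):
    """Fit a straight line to the series and extrapolate `periods` steps.

    A placeholder for real time-series methods (ARIMA, Prophet, etc.).
    """
    x = np.arange(len(values))
    slope, intercept = np.polyfit(x, values, 1)
    future_x = np.arange(len(values), len(values) + periods)
    return (slope * future_x + intercept).tolist()

print(linear_trend_forecast([10.0, 20.0, 30.0], 2))
```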
#### Adding Document Processing
**File to create**: `src/servers/document_processor_server.py`
```python
# In document_processor_server.py
@server.call_tool()
async def handle_call_tool(name: str, arguments: dict) -> CallToolResult:
    if name == "process_document":
        file_path = arguments["file_path"]
        # Extract text from PDFs, Word docs, etc.
        # and add it to the knowledge base
        ...
```
## 🏗️ Architecture
### Project Structure
```
MCP-RAG/
├── data/
│   ├── sample_business_data.csv          # Business dataset for analysis
│   └── business_knowledge.txt            # RAG knowledge base
├── src/
│   ├── core/
│   │   ├── llm_client.py                 # LLM backend client (Gemini / custom)
│   │   └── gemini_rag_agent.py           # FlexibleRAGAgent orchestration
│   └── servers/
│       ├── business_analytics_server.py  # Statistical analysis tools
│       └── rag_server.py                 # Knowledge retrieval tools
├── main.py                               # Demo and orchestration script
├── test_scenarios.py                     # Conversation scenarios
├── requirements.txt                      # Dependencies
└── README.md                             # This file
```
### Key Components
1. **Business Analytics Server**: MCP server providing statistical analysis tools
2. **RAG Server**: MCP server for business knowledge retrieval
3. **Orchestration Layer**: Coordinates between servers and the selected LLM backend
4. **Data Layer**: Sample business data and knowledge base
## 🔧 Configuration
### Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `LLM_MODE` | LLM backend mode: "gemini" or "custom" | `gemini` |
| `GEMINI_API_KEY` | Gemini API key for LLM integration | None |
| `GEMINI_MODEL` | Gemini model name | `gemini-2.0-flash-exp` |
| `CUSTOM_API_URL` | Custom localhost API URL | `http://localhost:8000` |
| `CUSTOM_API_KEY` | Custom API key (optional) | None |
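These variables can be read once at startup. The helper below sketches the kind of loader `llm_client.py` might use; the function name and returned keys are illustrative, not the project's actual API:

```python
import os

def load_llm_config() -> dict:
    """Collect LLM backend settings from the environment,
    using the defaults listed in the table above."""
    mode = os.environ.get("LLM_MODE", "gemini")
    if mode == "gemini":
        return {
            "mode": mode,
            "api_key": os.environ.get("GEMINI_API_KEY"),
            "model": os.environ.get("GEMINI_MODEL", "gemini-2.0-flash-exp"),
        }
    return {
        "mode": mode,
        "url": os.environ.get("CUSTOM_API_URL", "http://localhost:8000"),
        "api_key": os.environ.get("CUSTOM_API_KEY"),
    }

print(load_llm_config()["mode"])
```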
### Sample Data
The system includes:
- **Quarterly Business Data**: Sales, Marketing, Engineering metrics across 4 quarters
- **Business Knowledge Base**: Terms, policies, and analysis guidelines
## 🎯 Use Cases
### For Business Leaders
- **No-Code Analytics**: Ask natural language questions about business data
- **Quick Insights**: Get statistical analysis without technical expertise
- **Context-Aware Reports**: Combine data analysis with business knowledge
### For Data Teams
- **Modular Architecture**: Easy to add new analysis tools
- **LLM Integration**: Ready for natural language query processing
- **Extensible Framework**: Build custom agents for specific needs
### For AI Engineers
- **MCP Protocol**: Learn modern AI orchestration patterns
- **RAG Implementation**: Understand knowledge retrieval systems
- **Agentic Design**: Build multi-agent AI workflows
## 🚀 Future Enhancements
### Planned Features
- [ ] **Broader LLM Integration**: Add OpenAI and additional local model backends alongside Gemini
- [ ] **Natural Language Queries**: Process complex business questions
- [ ] **Advanced Analytics**: Time series analysis, clustering, forecasting
- [ ] **Web Interface**: User-friendly dashboard for non-technical users
- [ ] **Real-time Data**: Connect to live data sources
- [ ] **Custom Knowledge Bases**: Upload company-specific documents
### Integration Possibilities
- **Local LLM API**: Use open-source models with [Local LLM API](https://github.com/ANSH-RIYAL/local-llm-api)
- **Database Connectors**: Connect to SQL databases, data warehouses
- **API Integrations**: Salesforce, HubSpot, Google Analytics
- **Document Processing**: PDF, DOCX, email analysis
## 🤝 Contributing
This is a foundation for building agentic AI systems. Contributions welcome:
- **New Analysis Tools**: Add statistical methods, ML models
- **Knowledge Base Expansion**: Business domains, industry-specific content
- **LLM Integrations**: Support for different AI models
- **Documentation**: Tutorials, use cases, best practices
## 📄 License
MIT License - feel free to use and modify for your own projects!
## 🔗 Related Projects
- **[Local LLM API](https://github.com/ANSH-RIYAL/local-llm-api)**: Run open-source LLMs locally
- **MCP Protocol**: [Official documentation](https://modelcontextprotocol.io/)
---
**Ready to build your own agentic AI system?** Start with this foundation and extend it for your specific needs. The modular design makes it easy to add new capabilities while maintaining clean architecture.
#AgenticAI #MCP #RAG #BusinessAnalytics #OpenSourceAI