# MCP-RAG: Agentic AI Orchestration for Business Analytics
A lightweight demonstration of **Model Context Protocol (MCP)** combined with **Retrieval-Augmented Generation (RAG)** to orchestrate multi-agent AI workflows for business analysis.
## 🎯 What This Project Demonstrates
This project showcases how to build **agentic AI systems** that can:
1. **Orchestrate Multiple Agents**: MCP servers coordinate different specialized tools
2. **Retrieve Business Knowledge**: RAG provides context-aware information retrieval
3. **Perform Statistical Analysis**: Automated data analysis with natural language queries
4. **Maintain Modularity**: Easy to swap LLM backends and add new capabilities
## 🚀 Key Features
- **MCP-Based Coordination**: Multiple specialized servers working together
- **Business Analytics Tools**: Mean, standard deviation, correlation, linear regression
- **RAG Knowledge Base**: Business terms, policies, and analysis guidelines
- **Modular Design**: Easy to extend with new tools or swap LLM backends
- **Natural Language Interface**: Ask questions like "What's the average earnings from Q1?"
## 📋 Prerequisites
- Python 3.8+
- Google Gemini API key (free tier available) for the default LLM backend
- Basic understanding of MCP and RAG concepts
## 🛠️ Installation
1. **Clone the repository**:
```bash
git clone https://github.com/ANSH-RIYAL/MCP-RAG.git
cd MCP-RAG
```
2. **Install dependencies**:
```bash
pip install -r requirements.txt
```
3. **Set up environment variables**:
**For Gemini API (default)**:
```bash
export LLM_MODE="gemini"
export GEMINI_API_KEY="your-gemini-api-key"
```
**For Custom Localhost API**:
```bash
export LLM_MODE="custom"
export CUSTOM_API_URL="http://localhost:8000"
export CUSTOM_API_KEY="your-api-key" # Optional
```
## 🎮 Usage
### Quick Demo
Run the demonstration script to see both MCP servers in action:
```bash
python main.py
```
This will show:
- Business analytics tools working with sample data
- RAG knowledge retrieval for business terms
- How the systems can work together
- LLM integration with the selected backend
### LLM Backend Selection
The system supports two LLM backends:
#### Option 1: Google Gemini API (Default)
```bash
export LLM_MODE="gemini"
export GEMINI_API_KEY="your-gemini-api-key"
python main.py
```
#### Option 2: Custom Localhost API
```bash
export LLM_MODE="custom"
export CUSTOM_API_URL="http://localhost:8000"
export CUSTOM_API_KEY="your-api-key" # Optional
python main.py
```
**Custom API Requirements:**
- Must support OpenAI-compatible chat completions endpoint (`/v1/chat/completions`)
- Should accept tool/function calling format
- Expected to run on localhost:8000 (configurable)
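As a rough sketch, a request to such an endpoint follows the standard OpenAI chat-completions shape. The model name and the example tool definition below are placeholders, not values the project prescribes:

```python
import json

def build_chat_request(query: str, model: str = "local-model") -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload.

    The model name and the tool definition are illustrative; a real
    backend may expect different values.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": query}],
        # Optional: advertise a tool in OpenAI function-calling format
        "tools": [{
            "type": "function",
            "function": {
                "name": "calculate_mean",
                "description": "Calculate the mean of a numeric column",
                "parameters": {
                    "type": "object",
                    "properties": {"column": {"type": "string"}},
                    "required": ["column"],
                },
            },
        }],
    }

payload = build_chat_request("What's the average earnings from Q1?")
print(json.dumps(payload, indent=2))
```

A client such as `httpx` would then POST this JSON to `{CUSTOM_API_URL}/v1/chat/completions`.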
### Conversation Scenarios
Run the conversation scenarios to see real-world usage examples:
```bash
python test_scenarios.py
```
This walks through the scenarios from the accompanying LinkedIn post, showing how non-technical users interact with the system.
### Business Analytics Tools
The system provides these analysis capabilities:
- **Data Exploration**: Get dataset information and sample data
- **Statistical Analysis**: Mean, standard deviation with filtering
- **Correlation Analysis**: Find relationships between variables
- **Predictive Modeling**: Linear regression for forecasting
### RAG Knowledge Retrieval
Access business knowledge through:
- **Term Definitions**: Look up business concepts
- **Policy Information**: Retrieve company procedures
- **Analysis Guidelines**: Get context for data interpretation
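A minimal sketch of how such a lookup could work over a plain-text knowledge base. The one-entry-per-line format and matching logic here are illustrative, not the actual `rag_server.py` implementation:

```python
from typing import Optional

def lookup_term(knowledge_text: str, term: str) -> Optional[str]:
    """Return the definition for a term from 'Term: definition' lines.

    Assumes one entry per line; this is an illustrative format only.
    """
    for line in knowledge_text.splitlines():
        key, _, definition = line.partition(":")
        if key.strip().lower() == term.lower():
            return definition.strip()
    return None

knowledge = """Earnings: Total revenue generated in a given period
Profit Margin: Percentage of revenue that remains as profit after expenses"""

print(lookup_term(knowledge, "profit margin"))
```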
## 📖 Scenarios & Use Cases
### Scenario 1: Sales Analysis
```
Manager: "What's the average earnings from Q1?"
MCP-RAG System:
1. Analytics Server: calculate_mean(column='earnings', filter_column='quarter', filter_value='Q1-2024')
→ Mean of earnings: 101666.67
2. RAG Server: get_business_terms(term='earnings')
→ Earnings: Total revenue generated by a department or company in a given period
3. Response: "Average earnings for Q1-2024: $101,667"
```
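Under the hood, a filtered mean like this reduces to a few lines of pandas. The numbers below are made up for illustration; the real values live in `data/sample_business_data.csv`:

```python
import pandas as pd

# Illustrative data, not the project's actual dataset
df = pd.DataFrame({
    "quarter": ["Q1-2024", "Q1-2024", "Q2-2024"],
    "earnings": [100000, 105000, 120000],
})

# Filter to the requested quarter, then average the earnings column
q1_mean = df.loc[df["quarter"] == "Q1-2024", "earnings"].mean()
print(f"Mean of earnings: {q1_mean:.2f}")
```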
### Scenario 2: Performance Correlation
```
Manager: "What's the correlation between sales and expenses?"
MCP-RAG System:
1. Analytics Server: calculate_correlation(column1='sales', column2='expenses')
→ Correlation between sales and expenses: 0.923
2. Response: "Correlation: 0.923 (strong positive relationship)"
```
### Scenario 3: Predictive Modeling
```
Manager: "Build a model to predict earnings from sales and employees"
MCP-RAG System:
1. Analytics Server: linear_regression(target_column='earnings', feature_columns=['sales', 'employees'])
→ Linear Regression Results:
Target: earnings
Features: ['sales', 'employees']
Intercept: 15000.00
sales coefficient: 0.45
employees coefficient: 1250.00
R-squared: 0.987
2. Response: "Model created with R² = 0.987"
```
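A sketch of how such a regression can be computed with NumPy least squares. The data here is synthetic; the coefficients in the scenario above come from the sample dataset, not from this snippet:

```python
import numpy as np

# Synthetic data constructed so earnings = 1000 + 2*sales + 50*employees
sales = np.array([10.0, 20.0, 30.0, 40.0])
employees = np.array([1.0, 2.0, 2.0, 3.0])
earnings = 1000 + 2 * sales + 50 * employees

# Design matrix with an intercept column, then ordinary least squares
X = np.column_stack([np.ones_like(sales), sales, employees])
coef, *_ = np.linalg.lstsq(X, earnings, rcond=None)
intercept, sales_coef, emp_coef = coef
print(intercept, sales_coef, emp_coef)
```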
### Scenario 4: Business Knowledge
```
Manager: "What does profit margin mean?"
MCP-RAG System:
1. RAG Server: get_business_terms(term='profit margin')
→ Profit Margin: Percentage of revenue that remains as profit after expenses, calculated as (earnings - expenses) / earnings
2. Response: "Profit Margin: Percentage of revenue that remains as profit after expenses"
```
### Scenario 5: Policy Information
```
Manager: "What are the budget allocation policies?"
MCP-RAG System:
1. RAG Server: get_company_policies(policy_type='budget')
→ Budget Allocation: Marketing gets 25% of total budget, Engineering gets 30%, Sales gets 45%
2. Response: "Budget Allocation: Marketing gets 25%, Engineering gets 30%, Sales gets 45%"
```
## 🔧 Customization Guide
### For Your Organization
#### Step 1: Replace Sample Data
1. **Update Business Data**: Replace `data/sample_business_data.csv` with your actual data
- Ensure columns are numeric for analysis tools
- Add any categorical columns for filtering
- Include time-based columns for trend analysis
2. **Update Knowledge Base**: Replace `data/business_knowledge.txt` with your organization's:
- Business terms and definitions
- Company policies and procedures
- Analysis guidelines and best practices
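Before pointing the analytics tools at a new CSV, it can help to verify that the columns you intend to analyze are numeric. A small sketch with hypothetical column names:

```python
import io
import pandas as pd

# Inline stand-in for your replacement CSV file
csv_text = """quarter,department,sales,earnings
Q1-2024,Sales,120,100000
Q1-2024,Marketing,80,105000"""

df = pd.read_csv(io.StringIO(csv_text))

# Columns the analysis tools will aggregate must be numeric dtypes
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```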
#### Step 2: Add Custom Analytics Tools
**File to modify**: `src/servers/business_analytics_server.py`
1. **Add New Tools**: In the `handle_list_tools()` function (around line 29), add new tools to the tools list:
```python
@server.list_tools()
async def handle_list_tools() -> ListToolsResult:
return ListToolsResult(
tools=[
# ... existing tools (calculate_mean, calculate_std, calculate_correlation, linear_regression) ...
Tool(
name="your_custom_analysis",
description="Your custom analysis tool",
inputSchema={
"type": "object",
"properties": {
"parameter": {"type": "string"}
},
"required": ["parameter"]
}
)
]
)
```
2. **Implement Tool Logic**: In the `handle_call_tool()` function (around line 140), add the corresponding handler:
```python
elif name == "your_custom_analysis":
parameter = arguments["parameter"]
# Your custom analysis logic here
result = f"Custom analysis result for {parameter}"
return CallToolResult(
content=[TextContent(type="text", text=result)]
)
```
#### Step 3: Extend RAG Capabilities
**File to modify**: `src/servers/rag_server.py`
1. **Add New Knowledge Sources**: Modify the `load_business_knowledge()` function (around line 25) to include:
- Database connections
- Document processing (PDFs, Word docs)
- API integrations (Salesforce, HubSpot, etc.)
2. **Add New RAG Tools**: In the `handle_list_tools()` function (around line 50), add new tools:
```python
Tool(
name="your_custom_rag_tool",
description="Your custom knowledge retrieval tool",
inputSchema={
"type": "object",
"properties": {
"query": {"type": "string"}
},
"required": ["query"]
}
)
```
3. **Implement RAG Tool Logic**: In the `handle_call_tool()` function (around line 90), add the handler:
```python
elif name == "your_custom_rag_tool":
query = arguments["query"]
# Your custom RAG logic here
result = f"Custom RAG result for {query}"
return CallToolResult(
content=[TextContent(type="text", text=result)]
)
```
#### Step 4: Integrate LLM Backend
**File to create** (optional): `src/servers/llm_server.py`
The system already includes a flexible LLM client (`src/core/llm_client.py`) that supports both Gemini and custom localhost APIs.
1. **Using the Existing LLM Client**: The `FlexibleRAGAgent` in `src/core/gemini_rag_agent.py` already supports:
- Google Gemini API
- Custom localhost API (OpenAI-compatible format)
2. **Create Custom LLM Server** (optional): If you need a dedicated MCP server for LLM operations:
```python
import asyncio
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent, CallToolResult, ListToolsResult

server = Server("llm-server")

@server.list_tools()
async def handle_list_tools() -> ListToolsResult:
    return ListToolsResult(
        tools=[
            Tool(
                name="process_natural_language",
                description="Convert natural language to tool calls",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "query": {"type": "string"}
                    },
                    "required": ["query"]
                }
            )
        ]
    )

@server.call_tool()
async def handle_call_tool(name: str, arguments: dict) -> CallToolResult:
    if name == "process_natural_language":
        query = arguments["query"]
        # Integrate with OpenAI, Gemini, or local models here to
        # convert natural language into appropriate tool calls
        return CallToolResult(
            content=[TextContent(type="text", text=f"Processed: {query}")]
        )
    raise ValueError(f"Unknown tool: {name}")

async def main():
    async with stdio_server() as (read_stream, write_stream):
        await server.run(read_stream, write_stream, server.create_initialization_options())

if __name__ == "__main__":
    asyncio.run(main())
```
3. **Add to requirements.txt**:
```txt
openai>=1.0.0
google-genai>=0.3.0
httpx>=0.24.0
```
#### Step 5: Add New Data Sources
**Files to modify**: `src/servers/business_analytics_server.py` and `src/servers/rag_server.py`
1. **Database Connectors**: Add tools to connect to:
- PostgreSQL, MySQL, SQLite
- MongoDB, Redis
- Data warehouses (Snowflake, BigQuery)
2. **API Integrations**: Connect to business systems:
- CRM systems (Salesforce, HubSpot)
- Marketing platforms (Google Analytics, Facebook Ads)
- Financial systems (QuickBooks, Xero)
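As one concrete example, a database-backed tool handler could run a read-only query with the standard-library `sqlite3` module. The table and column names here are hypothetical:

```python
import sqlite3

def query_total(conn: sqlite3.Connection, table: str, column: str) -> float:
    """Sum a numeric column; a tool handler could wrap this in a CallToolResult.

    Table/column names cannot be bound as SQL parameters, so in real code
    they should be validated against an allow-list first.
    """
    row = conn.execute(f"SELECT SUM({column}) FROM {table}").fetchone()
    return row[0] or 0.0

# Demo with an in-memory database standing in for a real one
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?)", [(100.0,), (250.0,)])
print(query_total(conn, "sales", "amount"))
```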
### Current Tool Implementations
**Business Analytics Tools** (`src/servers/business_analytics_server.py`):
- `calculate_mean` - Calculate average of numeric columns
- `calculate_std` - Calculate standard deviation
- `calculate_correlation` - Find relationships between variables
- `linear_regression` - Build predictive models
- `get_data_info` - Get dataset information
**RAG Tools** (`src/servers/rag_server.py`):
- `get_business_terms` - Look up business definitions
- `get_company_policies` - Retrieve policy information
- `search_business_knowledge` - General knowledge search
**LLM Integration** (`src/core/llm_client.py`):
- `FlexibleRAGAgent` - Supports both Gemini and custom localhost APIs
- `LLMClient` - Handles API communication for both backends
- Tool calling and conversation management
### Modular Architecture Benefits
The modular design allows you to:
- **Swap Components**: Replace any server without affecting others
- **Add Capabilities**: Plug in new tools without rewriting existing code
- **Scale Independently**: Run different servers on different machines
- **Customize Per Use Case**: Use only the tools you need
### Example Extensions
#### Adding Sentiment Analysis
**File to create**: `src/servers/sentiment_analysis_server.py`
```python
# In sentiment_analysis_server.py
@server.call_tool()
async def handle_call_tool(name: str, arguments: dict) -> CallToolResult:
    if name == "analyze_sentiment":
        text = arguments["text"]
        # Integrate with a sentiment analysis API or model here,
        # then return sentiment scores and insights
        ...
```
#### Adding Forecasting
**File to modify**: `src/servers/business_analytics_server.py`
```python
# Add to the tools list in handle_list_tools()
Tool(
    name="time_series_forecast",
    description="Forecast future values using time series analysis",
    inputSchema={
        "type": "object",
        "properties": {
            "column": {"type": "string"},
            "periods": {"type": "integer"}
        },
        "required": ["column", "periods"]
    }
)
```
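A matching handler could start with something as simple as a linear-trend extrapolation. This is a sketch of the idea, not a production forecaster:

```python
import numpy as np

def linear_trend_forecast(values, periods):
    """Fit a straight line to the series and extrapolate `periods` steps.

    A placeholder for real time-series methods (ARIMA, Prophet, etc.).
    """
    x = np.arange(len(values))
    slope, intercept = np.polyfit(x, values, 1)
    future_x = np.arange(len(values), len(values) + periods)
    return (slope * future_x + intercept).tolist()

print(linear_trend_forecast([10.0, 20.0, 30.0], 2))
```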
#### Adding Document Processing
**File to create**: `src/servers/document_processor_server.py`
```python
# In document_processor_server.py
@server.call_tool()
async def handle_call_tool(name: str, arguments: dict) -> CallToolResult:
    if name == "process_document":
        file_path = arguments["file_path"]
        # Extract text from PDFs, Word docs, etc.
        # and add it to the knowledge base
        ...
```
## 🏗️ Architecture
### Project Structure
```
MCP-RAG/
├── data/
│   ├── sample_business_data.csv          # Business dataset for analysis
│   └── business_knowledge.txt            # RAG knowledge base
├── src/
│   ├── core/
│   │   ├── llm_client.py                 # LLM backend client (Gemini / custom)
│   │   └── gemini_rag_agent.py           # FlexibleRAGAgent orchestration
│   └── servers/
│       ├── business_analytics_server.py  # Statistical analysis tools
│       └── rag_server.py                 # Knowledge retrieval tools
├── main.py                               # Demo and orchestration script
├── test_scenarios.py                     # Conversation scenarios
├── requirements.txt                      # Dependencies
└── README.md                             # This file
```
### Key Components
1. **Business Analytics Server**: MCP server providing statistical analysis tools
2. **RAG Server**: MCP server for business knowledge retrieval
3. **Orchestration Layer**: Coordinates between servers and the selected LLM backend
4. **Data Layer**: Sample business data and knowledge base
## 🔧 Configuration
### Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `LLM_MODE` | LLM backend mode: "gemini" or "custom" | `gemini` |
| `GEMINI_API_KEY` | Gemini API key for LLM integration | None |
| `GEMINI_MODEL` | Gemini model name | `gemini-2.0-flash-exp` |
| `CUSTOM_API_URL` | Custom localhost API URL | `http://localhost:8000` |
| `CUSTOM_API_KEY` | Custom API key (optional) | None |
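These variables can be read once at startup. The helper below sketches the kind of loader `llm_client.py` might use; the function name and returned keys are illustrative, not the project's actual API:

```python
import os

def load_llm_config() -> dict:
    """Collect LLM backend settings from the environment,
    using the defaults listed in the table above."""
    mode = os.environ.get("LLM_MODE", "gemini")
    if mode == "gemini":
        return {
            "mode": mode,
            "api_key": os.environ.get("GEMINI_API_KEY"),
            "model": os.environ.get("GEMINI_MODEL", "gemini-2.0-flash-exp"),
        }
    return {
        "mode": mode,
        "url": os.environ.get("CUSTOM_API_URL", "http://localhost:8000"),
        "api_key": os.environ.get("CUSTOM_API_KEY"),
    }

print(load_llm_config()["mode"])
```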
### Sample Data
The system includes:
- **Quarterly Business Data**: Sales, Marketing, Engineering metrics across 4 quarters
- **Business Knowledge Base**: Terms, policies, and analysis guidelines
## 🎯 Use Cases
### For Business Leaders
- **No-Code Analytics**: Ask natural language questions about business data
- **Quick Insights**: Get statistical analysis without technical expertise
- **Context-Aware Reports**: Combine data analysis with business knowledge
### For Data Teams
- **Modular Architecture**: Easy to add new analysis tools
- **LLM Integration**: Ready for natural language query processing
- **Extensible Framework**: Build custom agents for specific needs
### For AI Engineers
- **MCP Protocol**: Learn modern AI orchestration patterns
- **RAG Implementation**: Understand knowledge retrieval systems
- **Agentic Design**: Build multi-agent AI workflows
## 🚀 Future Enhancements
### Planned Features
- [ ] **Broader LLM Integration**: Add OpenAI and additional local model backends alongside Gemini
- [ ] **Natural Language Queries**: Process complex business questions
- [ ] **Advanced Analytics**: Time series analysis, clustering, forecasting
- [ ] **Web Interface**: User-friendly dashboard for non-technical users
- [ ] **Real-time Data**: Connect to live data sources
- [ ] **Custom Knowledge Bases**: Upload company-specific documents
### Integration Possibilities
- **Local LLM API**: Use open-source models with [Local LLM API](https://github.com/ANSH-RIYAL/local-llm-api)
- **Database Connectors**: Connect to SQL databases, data warehouses
- **API Integrations**: Salesforce, HubSpot, Google Analytics
- **Document Processing**: PDF, DOCX, email analysis
## 🤝 Contributing
This is a foundation for building agentic AI systems. Contributions welcome:
- **New Analysis Tools**: Add statistical methods, ML models
- **Knowledge Base Expansion**: Business domains, industry-specific content
- **LLM Integrations**: Support for different AI models
- **Documentation**: Tutorials, use cases, best practices
## 📄 License
MIT License - feel free to use and modify for your own projects!
## 🔗 Related Projects
- **[Local LLM API](https://github.com/ANSH-RIYAL/local-llm-api)**: Run open-source LLMs locally
- **MCP Protocol**: [Official documentation](https://modelcontextprotocol.io/)
---
**Ready to build your own agentic AI system?** Start with this foundation and extend it for your specific needs. The modular design makes it easy to add new capabilities while maintaining clean architecture.
#AgenticAI #MCP #RAG #BusinessAnalytics #OpenSourceAI