# True MCP Server Integration Guide
## **What We Now Have**
You now have **BOTH**:
1. **FastAPI Server** (`main.py`) - For direct API calls and automation
2. **True MCP Server** (`mcp_server.py`) - For LLM integration via Model Context Protocol
## **MCP Server Features**
### **Available Tools for LLMs:**
- `extract_dataset_metadata` - Analyze dataset structure and statistics
- `generate_data_quality_rules` - Create intelligent quality rules
- `process_complete_dataset` - Run full onboarding pipeline
- `list_catalog_files` - Show cataloged datasets
- `list_processed_datasets` - Show locally processed datasets
- `get_dataset_summary` - Get detailed dataset information
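Under MCP, an LLM client invokes tools like these with JSON-RPC 2.0 `tools/call` requests (tool discovery uses `tools/list`). A sketch of the message for the first tool, with a placeholder file ID:

```python
import json

# MCP messages are JSON-RPC 2.0; the transport (stdio for Claude Desktop)
# carries the serialized request to mcp_server.py.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "extract_dataset_metadata",
        "arguments": {"file_id": "<google-drive-file-id>"},  # placeholder
    },
}
print(json.dumps(request, indent=2))
```

The server replies with a result whose `content` holds the tool's output; the MCP SDK builds and parses these messages for you, so this is only what goes over the wire.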
## **How to Use with Claude Desktop**
### **Step 1: Install MCP Dependencies**
```bash
pip install mcp==1.0.0
```
### **Step 2: Configure Claude Desktop**
Add this to your Claude Desktop MCP configuration:
```json
{
  "mcpServers": {
    "dataset-onboarding": {
      "command": "python",
      "args": ["C:/path/to/your/mcp/mcp_server.py"],
      "cwd": "C:/path/to/your/mcp",
      "env": {
        "GOOGLE_SERVICE_ACCOUNT_KEY_PATH": "mcp1-467108-702d9a41627c.json",
        "MCP_SERVER_FOLDER_ID": "1roPPn6-sQHKyQDw8rVkmnlTyao3KZSBQ",
        "MCP_CLIENT_FOLDER_ID": "1lJ5OKMqbSKuz_7aAAjcWSKyxPb-yMvXP"
      }
    }
  }
}
```
### **Step 3: Test the Integration**
```bash
# Test the MCP server
python test_mcp_server.py
```
## **Example LLM Conversations**
Once integrated with Claude Desktop, you can have conversations like:
### **Example 1: Dataset Analysis**
**You:** "I uploaded a customer data CSV to my Google Drive. The file ID is `1ABC123XYZ`. Can you analyze it for me?"
**Claude:** *Uses `extract_dataset_metadata` tool*
"I've analyzed your customer dataset! It has 1,250 rows and 8 columns including customer_id, email, age, etc. Here are the key insights..."
### **Example 2: Quality Assessment**
**You:** "What data quality issues should I watch out for in this dataset?"
**Claude:** *Uses `generate_data_quality_rules` tool*
"I've generated 12 data quality rules for your dataset. Critical issues include: 3 columns with null values, email uniqueness concerns..."
### **Example 3: Complete Processing**
**You:** "Please process this dataset completely and prepare it for our data catalog."
**Claude:** *Uses `process_complete_dataset` tool*
"I've completed the full processing pipeline! Generated contract, metadata, and quality reports. All files are organized in processed_datasets/customer_data_2024/"
## **Integration with Other LLMs**
### **OpenAI GPT with Function Calling**
```python
import openai
from mcp.client.session import ClientSession

async def analyze_with_gpt(mcp_session: ClientSession):
    # Discover the MCP server's tools and expose them as OpenAI functions.
    # convert_mcp_tools_to_openai_format is a helper you write yourself.
    tools = await mcp_session.list_tools()
    openai_functions = convert_mcp_tools_to_openai_format(tools)

    response = openai.ChatCompletion.create(  # legacy (pre-1.0) openai API
        model="gpt-4",
        messages=[{"role": "user", "content": "Analyze my dataset"}],
        functions=openai_functions,
    )
    return response
```
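`convert_mcp_tools_to_openai_format` is not part of either SDK; it is a helper you supply. A minimal sketch, assuming each MCP tool definition carries `name`, `description`, and a JSON-Schema `inputSchema` (as MCP tool definitions do):

```python
def convert_mcp_tools_to_openai_format(tools):
    """Map MCP tool definitions onto OpenAI function-calling schemas."""
    functions = []
    for tool in tools:
        functions.append({
            "name": tool.name,
            "description": tool.description or "",
            # MCP's inputSchema is already JSON Schema, which is exactly
            # what OpenAI's `functions` parameter expects as `parameters`.
            "parameters": tool.inputSchema
                or {"type": "object", "properties": {}},
        })
    return functions
```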
### **Custom LLM Integration**
```python
# Any LLM can use the MCP server; `mcp_session` is an open
# mcp.client ClientSession created elsewhere.
async def use_mcp_with_custom_llm(user_query, file_id):
    async with mcp_session:
        if "analyze" in user_query.lower():
            result = await mcp_session.call_tool(
                "extract_dataset_metadata", {"file_id": file_id}
            )
            return result.content[0].text
```
## **Architecture Overview**
```
┌───────────────┐     ┌──────────────────┐     ┌───────────────┐
│  LLM Client   │     │    MCP Server    │     │ Google Drive  │
│ (Claude, GPT) │────►│ (mcp_server.py)  │────►│   Datasets    │
└───────────────┘     └──────────────────┘     └───────────────┘
                               │
                               ▼
                    ┌─────────────────────┐
                    │  Processing Logic   │
                    │ (dataset_processor, │
                    │    utils, etc.)     │
                    └─────────────────────┘
                               │
                               ▼
                    ┌─────────────────────┐
                    │  Organized Output   │
                    │ (processed_datasets)│
                    └─────────────────────┘
```
## **Use Cases with LLMs**
### **Data Analyst Assistant**
- "Analyze this sales data and tell me about data quality issues"
- "Generate a summary report for the marketing dataset"
- "What columns have missing values in the customer file?"
### **Data Engineer Copilot**
- "Process all datasets in my folder and create contracts"
- "Check if the new dataset follows our quality standards"
- "Generate metadata for the quarterly reports"
### **Business User Helper**
- "I uploaded a spreadsheet, can you tell me what's in it?"
- "Are there any problems with my data that I should fix?"
- "Create a professional summary of this dataset"
## **Testing Your MCP Server**
### **Manual Test**
```bash
# Test the MCP server directly
python test_mcp_server.py
```
### **With Claude Desktop**
1. Configure the MCP server in Claude Desktop settings
2. Restart Claude Desktop
3. Ask: "What tools do you have available for dataset processing?"
4. Claude should list your MCP tools
### **Debug Mode**
```bash
# Run MCP server with debug output
python mcp_server.py --debug
```
## **Production Deployment**
### **As a Service**
```bash
# Run MCP server as a background service
nohup python mcp_server.py > mcp_server.log 2>&1 &
```
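On Linux, a systemd unit is sturdier than `nohup`. A sketch with placeholder paths (the install directory, interpreter path, and unit name are assumptions):

```ini
# /etc/systemd/system/mcp-dataset-onboarding.service (hypothetical path)
[Unit]
Description=Dataset onboarding MCP server
After=network.target

[Service]
WorkingDirectory=/opt/mcp
ExecStart=/usr/bin/python3 /opt/mcp/mcp_server.py
Environment=GOOGLE_SERVICE_ACCOUNT_KEY_PATH=mcp1-467108-702d9a41627c.json
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Then `systemctl enable --now mcp-dataset-onboarding` starts it at boot and restarts it on failure.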
### **With Docker**
```dockerfile
FROM python:3.11-slim
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["python", "mcp_server.py"]
```
### **Load Balancing**
For high-traffic scenarios, run multiple MCP server instances behind a load balancer. Note that the stdio transport used by Claude Desktop spawns one server process per client, so load balancing only applies when you expose the tools over a network transport (such as MCP's SSE transport).
## **Benefits of True MCP Integration**
### **Before (FastAPI Only)**
- ❌ Manual API calls required
- ❌ No LLM integration
- ❌ Complex integration setup
### **After (True MCP Server)**
- ✅ Natural language interaction with LLMs
- ✅ Automatic tool discovery
- ✅ Seamless Claude Desktop integration
- ✅ Standardized protocol
- ✅ Easy to extend with new tools
## **Future Enhancements**
- **Multi-modal Support**: Handle images, PDFs, etc.
- **Streaming Responses**: Real-time processing updates
- **Tool Chaining**: Automatic multi-step workflows
- **Custom Prompts**: Domain-specific instructions
- **Webhook Integration**: Event-driven processing
Your dataset onboarding system is now a **true MCP server** that can be used by any MCP-compatible LLM!