
MCP Dataset Onboarding Server

by Magenta91
MCP_INTEGRATION_GUIDE.md • 6.94 kB
# 🤖 True MCP Server Integration Guide

## 🎯 **What We Now Have**

You now have **BOTH**:

1. **FastAPI Server** (`main.py`) - For direct API calls and automation
2. **True MCP Server** (`mcp_server.py`) - For LLM integration via the Model Context Protocol

## 🔌 **MCP Server Features**

### **Available Tools for LLMs:**

- `extract_dataset_metadata` - Analyze dataset structure and statistics
- `generate_data_quality_rules` - Create intelligent quality rules
- `process_complete_dataset` - Run the full onboarding pipeline
- `list_catalog_files` - Show cataloged datasets
- `list_processed_datasets` - Show locally processed datasets
- `get_dataset_summary` - Get detailed dataset information

## 🚀 **How to Use with Claude Desktop**

### **Step 1: Install MCP Dependencies**

```bash
pip install mcp==1.0.0
```

### **Step 2: Configure Claude Desktop**

Add this to your Claude Desktop MCP configuration:

```json
{
  "mcpServers": {
    "dataset-onboarding": {
      "command": "python",
      "args": ["C:/path/to/your/mcp/mcp_server.py"],
      "cwd": "C:/path/to/your/mcp",
      "env": {
        "GOOGLE_SERVICE_ACCOUNT_KEY_PATH": "mcp1-467108-702d9a41627c.json",
        "MCP_SERVER_FOLDER_ID": "1roPPn6-sQHKyQDw8rVkmnlTyao3KZSBQ",
        "MCP_CLIENT_FOLDER_ID": "1lJ5OKMqbSKuz_7aAAjcWSKyxPb-yMvXP"
      }
    }
  }
}
```

### **Step 3: Test the Integration**

```bash
# Test the MCP server
python test_mcp_server.py
```

## 💬 **Example LLM Conversations**

Once integrated with Claude Desktop, you can have conversations like:

### **Example 1: Dataset Analysis**

**You:** "I uploaded a customer data CSV to my Google Drive. The file ID is `1ABC123XYZ`. Can you analyze it for me?"

**Claude:** *Uses `extract_dataset_metadata` tool* "I've analyzed your customer dataset! It has 1,250 rows and 8 columns including customer_id, email, age, etc. Here are the key insights..."

### **Example 2: Quality Assessment**

**You:** "What data quality issues should I watch out for in this dataset?"
**Claude:** *Uses `generate_data_quality_rules` tool* "I've generated 12 data quality rules for your dataset. Critical issues include: 3 columns with null values, email uniqueness concerns..."

### **Example 3: Complete Processing**

**You:** "Please process this dataset completely and prepare it for our data catalog."

**Claude:** *Uses `process_complete_dataset` tool* "I've completed the full processing pipeline! Generated contract, metadata, and quality reports. All files are organized in processed_datasets/customer_data_2024/"

## 🔧 **Integration with Other LLMs**

### **OpenAI GPT with Function Calling**

```python
import openai
from mcp.client.session import ClientSession

# Expose the MCP tools to the model as OpenAI functions
async def analyze_with_gpt(mcp_session: ClientSession):
    tools = await mcp_session.list_tools()
    openai_functions = convert_mcp_tools_to_openai_format(tools)

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Analyze my dataset"}],
        functions=openai_functions,
    )
    return response
```

### **Custom LLM Integration**

```python
# Any LLM can use the MCP server
async def use_mcp_with_custom_llm(user_query, file_id):
    async with mcp_session:
        if "analyze" in user_query.lower():
            result = await mcp_session.call_tool(
                "extract_dataset_metadata", {"file_id": file_id}
            )
            return result.content[0].text
```

## 🏗️ **Architecture Overview**

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   LLM Client    │    │    MCP Server    │    │  Google Drive   │
│  (Claude, GPT)  │◄──►│ (mcp_server.py)  │◄──►│    Datasets     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                                ▼
                      ┌────────────────────┐
                      │  Processing Logic  │
                      │(dataset_processor, │
                      │   utils, etc.)     │
                      └────────────────────┘
                                │
                                ▼
                      ┌────────────────────┐
                      │  Organized Output  │
                      │(processed_datasets)│
                      └────────────────────┘
```

## 🎯 **Use Cases with LLMs**

### **Data Analyst Assistant**

- "Analyze this sales data and tell me about data quality issues"
- "Generate a summary report for the marketing dataset"
- "What columns have missing values in the customer file?"

### **Data Engineer Copilot**

- "Process all datasets in my folder and create contracts"
- "Check if the new dataset follows our quality standards"
- "Generate metadata for the quarterly reports"

### **Business User Helper**

- "I uploaded a spreadsheet, can you tell me what's in it?"
- "Are there any problems with my data that I should fix?"
- "Create a professional summary of this dataset"

## 🔍 **Testing Your MCP Server**

### **Manual Test**

```bash
# Test the MCP server directly
python test_mcp_server.py
```

### **With Claude Desktop**

1. Configure the MCP server in Claude Desktop settings
2. Restart Claude Desktop
3. Ask: "What tools do you have available for dataset processing?"
4. Claude should list your MCP tools

### **Debug Mode**

```bash
# Run the MCP server with debug output
python mcp_server.py --debug
```

## 🚀 **Production Deployment**

### **As a Service**

```bash
# Run the MCP server as a background service
nohup python mcp_server.py > mcp_server.log 2>&1 &
```

### **With Docker**

```dockerfile
FROM python:3.11-slim
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["python", "mcp_server.py"]
```

### **Load Balancing**

For high-traffic scenarios, run multiple MCP server instances behind a load balancer.
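The OpenAI function-calling example above references a `convert_mcp_tools_to_openai_format` helper without showing it. A minimal sketch, assuming each MCP tool (as returned by `session.list_tools()`) exposes `name`, `description`, and a JSON-Schema `inputSchema`, which is how MCP tool listings are shaped:

```python
def convert_mcp_tools_to_openai_format(tools):
    """Map MCP tool definitions to OpenAI function-calling schemas.

    Assumes each tool has .name, .description, and .inputSchema
    (a JSON Schema dict describing the tool's arguments).
    """
    return [
        {
            "name": tool.name,
            "description": tool.description or "",
            # OpenAI expects a JSON Schema object under "parameters"
            "parameters": tool.inputSchema
            or {"type": "object", "properties": {}},
        }
        for tool in tools
    ]
```

Because both sides of the mapping are plain JSON Schema, the conversion is a field rename rather than a semantic translation.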
## 🎉 **Benefits of True MCP Integration**

### **Before (FastAPI Only)**

- ❌ Manual API calls required
- ❌ No LLM integration
- ❌ Complex integration setup

### **After (True MCP Server)**

- ✅ Natural language interaction with LLMs
- ✅ Automatic tool discovery
- ✅ Seamless Claude Desktop integration
- ✅ Standardized protocol
- ✅ Easy to extend with new tools

## 🔮 **Future Enhancements**

- **Multi-modal Support**: Handle images, PDFs, etc.
- **Streaming Responses**: Real-time processing updates
- **Tool Chaining**: Automatic multi-step workflows
- **Custom Prompts**: Domain-specific instructions
- **Webhook Integration**: Event-driven processing

Your dataset onboarding system is now a **true MCP server** that can be used by any MCP-compatible LLM! 🚀
