Provides comprehensive management of Google Cloud Dataproc clusters and jobs, including cluster creation/deletion, Spark/PySpark/Hive job submission and monitoring, service account impersonation, and real-time analytics with intelligent parameter defaults and semantic search capabilities.
Dataproc MCP Server
A production-ready Model Context Protocol (MCP) server for Google Cloud Dataproc operations with intelligent parameter injection, enterprise-grade security, and comprehensive tooling. Designed for seamless integration with Roo (VS Code).
Quick Start
Recommended: Roo (VS Code) Integration
Add this to your Roo MCP settings:
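As a rough illustration of what such an entry could contain, the sketch below builds a settings object in the common `mcpServers` layout and prints it as JSON. The package name comes from the install command in this README; the server key `dataproc`, launching through `npx`, and the `DATAPROC_CONFIG_PATH` variable are assumptions, so check them against the project's own documentation.

```typescript
// Sketch only: the "dataproc" key, the npx launch, and the environment
// variable name are assumptions, not confirmed settings for this server.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface McpSettings {
  mcpServers: Record<string, McpServerEntry>;
}

function buildMcpSettings(configPath?: string): McpSettings {
  const entry: McpServerEntry = {
    command: "npx",
    args: ["@dipseth/dataproc-mcp-server@latest"],
  };
  if (configPath !== undefined) {
    // Hypothetical variable name for pointing at a custom config file.
    entry.env = { DATAPROC_CONFIG_PATH: configPath };
  }
  return { mcpServers: { dataproc: entry } };
}

// Print the JSON that would be pasted into the client's MCP settings.
console.log(JSON.stringify(buildMcpSettings(), null, 2));
```

Passing a path, for example `buildMcpSettings("/path/to/server.json")`, adds the `env` section for a custom config file.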
With Custom Config File
Alternative: Global Installation
5-Minute Setup
Install the package:
    npm install -g @dipseth/dataproc-mcp-server@latest
Run the setup:
    dataproc-mcp --setup
Configure authentication:
    # Edit the generated config file
    nano config/server.json
Start the server:
    dataproc-mcp
Claude.ai Web App Compatibility
PRODUCTION-READY: Full Claude.ai Integration with HTTPS Tunneling & OAuth
The Dataproc MCP Server is fully compatible with the Claude.ai web app, with a working setup that exposes all 22 MCP tools.
Working Solution (Tested & Verified)
Terminal 1 - Start MCP Server:
Terminal 2 - Start Cloudflare Tunnel:
Result: Claude.ai can see and use all tools successfully!
Key Features:
Complete Tool Access - All 22 MCP tools available in Claude.ai
HTTPS Tunneling - Cloudflare tunnel for secure external access
OAuth Authentication - GitHub OAuth for secure authentication
Trusted Certificates - No browser warnings or connection issues
WebSocket Support - Full WebSocket compatibility with Claude.ai
Production Ready - Tested and verified working solution
Quick Setup:
Set up GitHub OAuth (5 minutes)
Generate SSL certificates:
    npm run ssl:generate
Start services (2 terminals as shown above)
Connect Claude.ai to your tunnel URL
Complete Guide: See docs/claude-ai-integration.md for detailed setup instructions, troubleshooting, and advanced features.
Certificate Setup: See docs/trusted-certificates.md for SSL certificate configuration.
Features
Core Capabilities
22 Production-Ready MCP Tools - Complete Dataproc management suite
Knowledge Base Semantic Search - Natural language queries with optional Qdrant integration
Response Optimization - 60-96% token reduction with Qdrant storage
Generic Type Conversion System - Automatic, type-safe data transformations
60-80% Parameter Reduction - Intelligent default injection
Multi-Environment Support - Dev/staging/production configurations
Service Account Impersonation - Enterprise authentication
Real-time Job Monitoring - Comprehensive status tracking
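The parameter reduction listed above comes from filling in values the caller leaves out. A minimal sketch of that idea follows, assuming defaults such as project and region come from configuration; the `ClusterParams` shape and the `injectDefaults` helper are hypothetical and are not the server's actual schema or API.

```typescript
// Hypothetical parameter shape; the real tool schemas may differ.
interface ClusterParams {
  projectId: string;
  region: string;
  clusterName: string;
}

type Defaults = Partial<ClusterParams>;

// Merge caller input over configured defaults, then check that nothing
// required is still missing.
function injectDefaults(input: Partial<ClusterParams>, defaults: Defaults): ClusterParams {
  const merged: Partial<ClusterParams> = { ...defaults };
  for (const key of Object.keys(input) as Array<keyof ClusterParams>) {
    const value = input[key];
    if (value !== undefined) {
      merged[key] = value; // explicit values always win over defaults
    }
  }
  const required: Array<keyof ClusterParams> = ["projectId", "region", "clusterName"];
  const missing = required.filter((k) => merged[k] === undefined);
  if (missing.length > 0) {
    throw new Error(`Missing required parameters: ${missing.join(", ")}`);
  }
  return merged as ClusterParams;
}

// The caller supplies one parameter instead of three.
const exampleDefaults: Defaults = { projectId: "my-project", region: "us-central1" };
console.log(injectDefaults({ clusterName: "analytics" }, exampleDefaults));
```

Letting explicit input override defaults keeps the behaviour predictable: a default never silently replaces a value the caller passed.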
Response Optimization
96.2% Token Reduction - list_clusters: 7,651 → 292 tokens
Automatic Qdrant Storage - Full data preserved and searchable
Resource URI Access - dataproc://responses/clusters/list/abc123
Graceful Fallback - Works without Qdrant, falls back to full responses
9.95ms Processing - Lightning-fast optimization with <1MB memory usage
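The following sketch illustrates the pattern described in this list: return a compact summary, keep the full payload behind a resource URI when a store is available, and fall back to the full response otherwise. The `ResponseStore` interface is a hypothetical stand-in for the Qdrant-backed storage, and the summary format is an assumption; only the URI scheme and the token counts are taken from the list above.

```typescript
interface ClusterInfo {
  clusterName: string;
  status: string;
  labels: Record<string, string>;
}

interface OptimizedResponse {
  summary: string;
  resourceUri?: string; // set when the full data was stored
  full?: ClusterInfo[]; // set when falling back to the full response
}

// Hypothetical storage interface standing in for a vector store such as Qdrant.
interface ResponseStore {
  save(id: string, data: unknown): boolean;
}

// Percentage reduction, rounded to one decimal place.
function tokenReductionPercent(before: number, after: number): number {
  return Math.round(((before - after) / before) * 1000) / 10;
}

function optimizeListResponse(
  clusters: ClusterInfo[],
  id: string,
  store?: ResponseStore
): OptimizedResponse {
  const summary = clusters.map((c) => `${c.clusterName}: ${c.status}`).join("\n");
  if (store !== undefined && store.save(id, clusters)) {
    return { summary, resourceUri: `dataproc://responses/clusters/list/${id}` };
  }
  // Graceful fallback: no store, so the full data travels with the response.
  return { summary, full: clusters };
}

console.log(tokenReductionPercent(7651, 292)); // 96.2
```

The arithmetic matches the headline figure: (7,651 - 292) / 7,651 is about 0.9618, which rounds to 96.2%.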
Generic Type Conversion System
75% Code Reduction - Eliminates manual conversion logic across services
Type-Safe Transformations - Automatic field detection and mapping
Intelligent Compression - Field-level compression with configurable thresholds
0.50ms Conversion Times - Lightning-fast processing with 100% compression ratios
Zero-Configuration - Works automatically with existing TypeScript types
Backward Compatible - Seamless integration with existing functionality
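One way to picture a type-safe field mapping is a generic converter driven by a map of extractor functions, as sketched below. This is an illustration of the general technique, not the project's converter: `FieldMap`, `convert`, and the two example interfaces are hypothetical names, and compression is left out.

```typescript
// A mapping supplies one extractor per target field. The compiler rejects
// a map that omits a field or returns the wrong type for it.
type FieldMap<S, T> = { [K in keyof T]: (source: S) => T[K] };

function convert<S, T extends object>(source: S, map: FieldMap<S, T>): T {
  const out = {} as T;
  for (const key of Object.keys(map) as Array<keyof T>) {
    out[key] = map[key](source);
  }
  return out;
}

// Illustrative source and target shapes.
interface ApiCluster {
  clusterName: string;
  status: { state: string };
  config: { workerConfig: { numInstances: number } };
}

interface ClusterSummary {
  name: string;
  state: string;
  workers: number;
}

const toSummary: FieldMap<ApiCluster, ClusterSummary> = {
  name: (c) => c.clusterName,
  state: (c) => c.status.state,
  workers: (c) => c.config.workerConfig.numInstances,
};

const exampleCluster: ApiCluster = {
  clusterName: "analytics",
  status: { state: "RUNNING" },
  config: { workerConfig: { numInstances: 4 } },
};

console.log(convert<ApiCluster, ClusterSummary>(exampleCluster, toSummary));
```

Because the mapping is data rather than hand-written conversion code, adding a field to `ClusterSummary` produces a compile error until the map is updated.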
Enterprise Security
Input Validation - Zod schemas for all 16 tools
Rate Limiting - Configurable abuse prevention
Credential Management - Secure handling and rotation
Audit Logging - Comprehensive security event tracking
Threat Detection - Injection attack prevention
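To make the rate-limiting item concrete, here is a minimal token-bucket sketch. The README does not say which algorithm the server uses, so the token bucket is a swapped-in illustration; the class name, the capacity and refill parameters, and the injectable clock are all assumptions.

```typescript
// Token bucket: each request consumes one token; tokens refill over time
// up to a fixed capacity. The clock is injectable so it can be tested.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;
  private tokens: number;
  private lastRefill: number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = now();
  }

  // Returns true if the request is allowed, false if it should be rejected.
  tryConsume(): boolean {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = current;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Allow bursts of 5 requests, refilling at 1 request per second.
const limiter = new TokenBucket(5, 1);
console.log(limiter.tryConsume()); // true
```

A bucket per client or per tool would let limits be tuned independently, which is one way to read "configurable" here.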
Quality Assurance
90%+ Test Coverage - Comprehensive test suite
Performance Monitoring - Configurable thresholds
Multi-Environment Testing - Cross-platform validation
Automated Quality Gates - CI/CD integration
Security Scanning - Vulnerability management
Developer Experience
5-Minute Setup - Quick start guide
Interactive Documentation - HTML docs with examples
Comprehensive Examples - Multi-environment configs
Troubleshooting Guides - Common issues and solutions
IDE Integration - TypeScript support
Complete MCP Tools Suite (22 Tools)
Enhanced with Generic Type Conversion: All tools now benefit from automatic, type-safe data transformations with intelligent compression and field mapping.
Cluster Management (8 Tools)
| Tool | Description | Smart Defaults | Key Features |
|------|-------------|----------------|--------------|
| | Create and start new clusters | 80% fewer params | Profile-based, auto-config |
| | Create from YAML configuration | Project/region injection | Template-driven setup |
| | Create using predefined profiles | 85% fewer params | 8 built-in profiles |
| | List all clusters with filtering | No params needed | Semantic queries, pagination |
| | List MCP-created clusters | Profile filtering | Creation tracking |
| | Get detailed cluster information | 75% fewer params | Semantic data extraction |
| | Delete existing clusters | Project/region defaults | Safe deletion |
| | Get Zeppelin notebook URL | Auto-discovery | Web interface access |
Job Management (7 Tools)
| Tool | Description | Smart Defaults | Key Features |
|------|-------------|----------------|--------------|
| | Submit Hive queries to clusters | 70% fewer params | Async support, timeouts |
| | Submit Spark/PySpark/Presto jobs | 75% fewer params | Multi-engine support, local file staging |
| | Cancel running or pending jobs | JobID only needed | Emergency cancellation, cost control |
| | Get job execution status | JobID only needed | Real-time monitoring |
| | Get job outputs and results | Auto-pagination | Result formatting |
| | Get Hive query status | Minimal params | Query tracking |
| | Get Hive query results | Smart pagination | Enhanced async support |
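Status monitoring of an asynchronously submitted job usually reduces to polling until the job reaches a terminal state. The sketch below shows that loop using the job state names from the Dataproc API; `getState` is a hypothetical callback standing in for a status tool call, and the loop is synchronous with no real waiting so that it stays short. A real client would await `nextDelayMs` between polls.

```typescript
// Job states as named in the Dataproc API's JobStatus.
type JobState =
  | "PENDING" | "SETUP_DONE" | "RUNNING"
  | "CANCEL_PENDING" | "CANCEL_STARTED" | "CANCELLED"
  | "DONE" | "ERROR" | "ATTEMPT_FAILURE";

// States after which the job will not change again.
const TERMINAL_STATES: ReadonlySet<JobState> = new Set<JobState>(["DONE", "ERROR", "CANCELLED"]);

function isTerminal(state: JobState): boolean {
  return TERMINAL_STATES.has(state);
}

// Exponential backoff capped at maxMs; attempt numbering starts at 0.
function nextDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

interface PollResult {
  state: JobState;
  attempts: number;
  finished: boolean;
}

// Polls at least once, then until a terminal state or maxAttempts is reached.
function pollJob(getState: () => JobState, maxAttempts: number): PollResult {
  let state = getState();
  let attempts = 1;
  while (!isTerminal(state) && attempts < maxAttempts) {
    state = getState();
    attempts += 1;
  }
  return { state, attempts, finished: isTerminal(state) };
}

// Simulated job that finishes on the third status check.
const simulated: JobState[] = ["PENDING", "RUNNING", "DONE"];
let cursor = 0;
console.log(pollJob(() => simulated[Math.min(cursor++, simulated.length - 1)] ?? "DONE", 10));
```

`ATTEMPT_FAILURE` is deliberately left out of the terminal set, since a failed attempt may still be retried by the service.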
Configuration & Profiles (3 Tools)
| Tool | Description | Smart Defaults | Key Features |
|------|-------------|----------------|--------------|
| | List available cluster profiles | Category filtering | 8 production profiles |
| | Get detailed profile configuration | Profile ID only | Template access |
| | Query stored cluster data | Natural language | Semantic search |
Analytics & Insights (4 Tools)
| Tool | Description | Smart Defaults | Key Features |
|------|-------------|----------------|--------------|
| | Quick status of all active jobs | No params needed | Multi-project view |
| | Comprehensive cluster analytics | Auto-discovery | Machine types, components |
| | Job performance analytics | Success rates | Error patterns, metrics |
| | Query comprehensive knowledge base | Natural language | Clusters, jobs, errors |
Key Capabilities
Semantic Search: Natural language queries with Qdrant integration
Smart Defaults: 60-80% parameter reduction through intelligent injection
Response Optimization: 96% token reduction with full data preservation
Async Support: Non-blocking job submission and monitoring
Profile System: 8 production-ready cluster templates
Analytics: Comprehensive insights and performance tracking
Configuration
Project-Based Configuration
The server supports a project-based configuration format:
Authentication Methods
Service Account Impersonation (Recommended)
Direct Service Account Key
Application Default Credentials
Hybrid Authentication with fallbacks
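Hybrid authentication with fallbacks can be pictured as trying each method in priority order and using the first one that yields a credential. The sketch below shows only that control flow: the provider objects are hypothetical stand-ins, the ordering is an assumption based on the list above, and the code is synchronous for brevity where real credential calls would be asynchronous.

```typescript
type AuthMethod = "impersonation" | "service-account-key" | "application-default";

interface AuthResult {
  method: AuthMethod;
  token: string;
}

// Each provider either returns a token or throws. These are placeholders,
// not calls into a real Google authentication library.
interface AuthProvider {
  method: AuthMethod;
  getToken: () => string;
}

// Try providers in order; collect failures so the final error is informative.
function authenticateWithFallback(providers: AuthProvider[]): AuthResult {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      return { method: provider.method, token: provider.getToken() };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.method}: ${reason}`);
    }
  }
  throw new Error(`All authentication methods failed: ${errors.join("; ")}`);
}

// Impersonation fails here, so the key-file provider is used instead.
const exampleProviders: AuthProvider[] = [
  { method: "impersonation", getToken: () => { throw new Error("not permitted"); } },
  { method: "service-account-key", getToken: () => "token-from-key-file" },
  { method: "application-default", getToken: () => "token-from-adc" },
];

console.log(authenticateWithFallback(exampleProviders).method);
```

Collecting the per-method failure reasons makes a total failure easier to diagnose than reporting only the last error.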
Documentation
Quick Start Guide - Get started in 5 minutes
Knowledge Base Semantic Search - Natural language queries and setup
Generic Type Conversion System - Architectural design and implementation
Generic Converter Migration Guide - Migration from manual conversions
API Reference - Complete tool documentation
Configuration Examples - Real-world configurations
Security Guide - Best practices and compliance
Installation Guide - Detailed setup instructions
MCP Client Integration
Claude Desktop
Roo (VS Code)
Architecture
Generic Type Conversion System Architecture
Performance
Response Time Achievements
Schema Validation: ~2ms (target: <5ms)
Parameter Injection: ~1ms (target: <2ms)
Generic Type Conversion: ~0.50ms (target: <2ms)
Credential Validation: ~25ms (target: <50ms)
MCP Tool Call: ~50ms (target: <100ms)
Throughput Achievements
Schema Validation: ~2000 ops/sec
Parameter Injection: ~5000 ops/sec
Generic Type Conversion: ~2000 ops/sec
Credential Validation: ~200 ops/sec
MCP Tool Call: ~100 ops/sec
Compression Achievements
Field-Level Compression: Up to 100% compression ratios
Memory Optimization: 30-60% reduction in memory usage
Type Safety: Zero runtime type errors with automatic validation
Testing
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
GitHub Issues: Report bugs and request features
Documentation: Complete documentation
NPM Package: Package information
Acknowledgments
Model Context Protocol - The protocol that makes this possible
Google Cloud Dataproc - The service we're integrating with
Qdrant - High-performance vector database powering our semantic search and knowledge indexing
TypeScript - For type safety and developer experience
Made with ❤️ for the MCP and Google Cloud communities