Provides comprehensive management of Google Cloud Dataproc clusters and jobs, including cluster creation/deletion, Spark/PySpark/Hive job submission and monitoring, service account impersonation, and real-time analytics with intelligent parameter defaults and semantic search capabilities.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Dataproc MCP Server list all running Dataproc clusters in us-central1"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
Dataproc MCP Server
A production-ready Model Context Protocol (MCP) server for Google Cloud Dataproc operations with intelligent parameter injection, enterprise-grade security, and comprehensive tooling. Designed for seamless integration with Roo (VS Code).
Quick Start
Recommended: Roo (VS Code) Integration
Add this to your Roo MCP settings:
{
  "mcpServers": {
    "dataproc": {
      "command": "npx",
      "args": ["@dipseth/dataproc-mcp-server@latest"],
      "env": {
        "LOG_LEVEL": "info"
      }
    }
  }
}
With Custom Config File
{
  "mcpServers": {
    "dataproc": {
      "command": "npx",
      "args": ["@dipseth/dataproc-mcp-server@latest"],
      "env": {
        "LOG_LEVEL": "info",
        "DATAPROC_CONFIG_PATH": "/path/to/your/config.json"
      }
    }
  }
}
Alternative: Global Installation
# Install globally
npm install -g @dipseth/dataproc-mcp-server
# Start the server
dataproc-mcp-server
# Or run directly
npx @dipseth/dataproc-mcp-server@latest
5-Minute Setup
Install the package:
npm install -g @dipseth/dataproc-mcp-server@latest
Run the setup:
dataproc-mcp --setup
Configure authentication:
# Edit the generated config file
nano config/server.json
Start the server:
dataproc-mcp
Claude.ai Web App Compatibility
PRODUCTION-READY: Full Claude.ai Integration with HTTPS Tunneling & OAuth
The Dataproc MCP Server now provides complete Claude.ai web app compatibility with a working solution that includes all 22 MCP tools!
Working Solution (Tested & Verified)
Terminal 1 - Start MCP Server:
DATAPROC_CONFIG_PATH=config/github-oauth-server.json npm start -- --http --oauth --port 8080
Terminal 2 - Start Cloudflare Tunnel:
cloudflared tunnel --url https://localhost:8443 --origin-server-name localhost --no-tls-verify
Result: Claude.ai can see and use all tools successfully!
Key Features:
Complete Tool Access - All 22 MCP tools available in Claude.ai
HTTPS Tunneling - Cloudflare tunnel for secure external access
OAuth Authentication - GitHub OAuth for secure authentication
Trusted Certificates - No browser warnings or connection issues
WebSocket Support - Full WebSocket compatibility with Claude.ai
Production Ready - Tested and verified working solution
Quick Setup:
Set up GitHub OAuth (5 minutes)
Generate SSL certificates:
npm run ssl:generate
Start services (2 terminals as shown above)
Connect Claude.ai to your tunnel URL
Complete Guide: See docs/claude-ai-integration.md for detailed setup instructions, troubleshooting, and advanced features.
Certificate Setup: See docs/trusted-certificates.md for SSL certificate configuration.
Features
Core Capabilities
22 Production-Ready MCP Tools - Complete Dataproc management suite
Knowledge Base Semantic Search - Natural language queries with optional Qdrant integration
Response Optimization - 60-96% token reduction with Qdrant storage
Generic Type Conversion System - Automatic, type-safe data transformations
60-80% Parameter Reduction - Intelligent default injection
Multi-Environment Support - Dev/staging/production configurations
Service Account Impersonation - Enterprise authentication
Real-time Job Monitoring - Comprehensive status tracking
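The default injection mentioned above can be pictured roughly as follows. This is a minimal sketch, not the server's actual code: the `DefaultParams` and `ListClustersParams` shapes and the `injectDefaults` function are hypothetical names chosen for illustration, and the project ID is borrowed from the configuration example in this README.

```typescript
// Hypothetical shapes; the real server's types and names may differ.
interface DefaultParams {
  projectId?: string;
  region?: string;
}

interface ListClustersParams {
  projectId: string;
  region: string;
  filter?: string;
}

// Fill in any parameter the caller omitted from the configured defaults.
// Values passed explicitly by the caller always win over defaults.
function injectDefaults(
  input: Partial<ListClustersParams>,
  configured: DefaultParams
): ListClustersParams {
  const projectId = input.projectId ?? configured.projectId;
  const region = input.region ?? configured.region;
  if (projectId === undefined || region === undefined) {
    throw new Error("projectId and region must be supplied or configured as defaults");
  }
  return { ...input, projectId, region };
}

const configuredDefaults: DefaultParams = {
  projectId: "my-company-analytics-prod-1234",
  region: "us-central1",
};

// A call with no parameters picks up both defaults.
const noArgs = injectDefaults({}, configuredDefaults);
// An explicit region overrides the configured one.
const overridden = injectDefaults({ region: "europe-west1" }, configuredDefaults);
console.log(noArgs, overridden);
```

With defaults in place, a tool call only needs the parameters that differ from the configured environment, which is where the claimed parameter reduction would come from.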
Response Optimization
96.2% Token Reduction - list_clusters: 7,651 → 292 tokens
Automatic Qdrant Storage - Full data preserved and searchable
Resource URI Access - dataproc://responses/clusters/list/abc123
Graceful Fallback - Works without Qdrant, falls back to full responses
9.95ms Processing - Lightning-fast optimization with <1MB memory usage
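One way to read the optimization and fallback behavior listed above is sketched below. Everything here is an assumption made for illustration: the `ResponseStore` interface, the summary format, and the synchronous `save` call are hypothetical (a real Qdrant client would be asynchronous), and only the `dataproc://responses/clusters/list/` URI prefix comes from this README.

```typescript
// Hypothetical types; not taken from the project source.
interface ClusterInfo {
  clusterName: string;
  status: string;
  config: { [key: string]: unknown };
}

interface ResponseStore {
  // Persists the full payload and returns an ID; throws if storage is unavailable.
  save(payload: unknown): string;
}

// Return a short summary plus a resource URI when storage works,
// and the full JSON response when no store is configured or saving fails.
function optimizeListResponse(clusters: ClusterInfo[], store?: ResponseStore): string {
  const full = JSON.stringify(clusters);
  if (store === undefined) {
    return full;
  }
  try {
    const id = store.save(clusters);
    const summary = clusters.map((c) => c.clusterName + ": " + c.status).join("\n");
    return summary + "\nFull data: dataproc://responses/clusters/list/" + id;
  } catch {
    return full; // storage failed: degrade to the unoptimized response
  }
}

// In-memory stand-in for the vector store, used only for this example.
const savedPayloads: unknown[] = [];
const inMemoryStore: ResponseStore = {
  save(payload: unknown): string {
    savedPayloads.push(payload);
    return "r" + savedPayloads.length;
  },
};
const failingStore: ResponseStore = {
  save(): string {
    throw new Error("store unreachable");
  },
};

const sampleClusters: ClusterInfo[] = [
  { clusterName: "etl-nightly", status: "RUNNING", config: { workers: 4 } },
];
const optimized = optimizeListResponse(sampleClusters, inMemoryStore);
const fallback = optimizeListResponse(sampleClusters, failingStore);
console.log(optimized);
console.log(fallback);
```

The token savings come from returning the short summary while the full payload stays retrievable through the resource URI.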
Generic Type Conversion System
75% Code Reduction - Eliminates manual conversion logic across services
Type-Safe Transformations - Automatic field detection and mapping
Intelligent Compression - Field-level compression with configurable thresholds
0.50ms Conversion Times - Lightning-fast processing with up to 100% compression ratios
Zero-Configuration - Works automatically with existing TypeScript types
Backward Compatible - Seamless integration with existing functionality
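The field-level compression with configurable thresholds described above might look something like this sketch. The `toPayload` function, the `ConvertedField` shape, and the character-based threshold are hypothetical, and `placeholderCompress` is only a stand-in for a real compression routine, not the project's compression service.

```typescript
type Compressor = (text: string) => string;

interface ConvertedField {
  value: unknown; // original value, or the compressed string
  compressed: boolean;
}

// Walk every field of the source object; fields whose serialized form exceeds
// the threshold are passed through the compressor, the rest are copied as-is.
function toPayload(
  source: { [field: string]: unknown },
  thresholdChars: number,
  compress: Compressor
): { [field: string]: ConvertedField } {
  const payload: { [field: string]: ConvertedField } = {};
  for (const field of Object.keys(source)) {
    const value = source[field];
    // At runtime JSON.stringify yields undefined for functions and undefined values.
    const serialized = JSON.stringify(value);
    if (typeof serialized === "string" && serialized.length > thresholdChars) {
      payload[field] = { value: compress(serialized), compressed: true };
    } else {
      payload[field] = { value, compressed: false };
    }
  }
  return payload;
}

// Placeholder only: replaces the text with a short marker instead of compressing it.
const placeholderCompress: Compressor = (text) => "<" + text.length + " chars compressed>";

const clusterData = {
  clusterName: "etl-nightly",
  status: "RUNNING",
  initScript: "#!/bin/bash\n" + new Array(50).join("echo setup step\n"),
};
const converted = toPayload(clusterData, 64, placeholderCompress);
console.log(converted);
```

Injecting the compressor keeps the field analysis independent of any particular compression library.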
Enterprise Security
Input Validation - Zod schemas for all 22 tools
Rate Limiting - Configurable abuse prevention
Credential Management - Secure handling and rotation
Audit Logging - Comprehensive security event tracking
Threat Detection - Injection attack prevention
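The README does not say which rate-limiting algorithm the server uses. As one common possibility, a token bucket with a configurable burst size and refill rate is sketched here; the `TokenBucket` class and its parameters are assumptions for illustration only. The clock is passed in explicitly so the behavior is easy to follow.

```typescript
// Token bucket: allows bursts up to `capacity`, then refills at a steady rate.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  // Returns true if the call is allowed, false if the caller should be throttled.
  tryConsume(nowMs: number): boolean {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Burst of 2 calls, refilling 1 token per second.
const bucket = new TokenBucket(2, 1, 0);
const decisions = [
  bucket.tryConsume(0), // allowed (burst)
  bucket.tryConsume(0), // allowed (burst)
  bucket.tryConsume(0), // throttled: bucket is empty
  bucket.tryConsume(1000), // allowed again after one second of refill
];
console.log(decisions);
```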
Quality Assurance
90%+ Test Coverage - Comprehensive test suite
Performance Monitoring - Configurable thresholds
Multi-Environment Testing - Cross-platform validation
Automated Quality Gates - CI/CD integration
Security Scanning - Vulnerability management
Developer Experience
5-Minute Setup - Quick start guide
Interactive Documentation - HTML docs with examples
Comprehensive Examples - Multi-environment configs
Troubleshooting Guides - Common issues and solutions
IDE Integration - TypeScript support
Complete MCP Tools Suite (22 Tools)
Enhanced with Generic Type Conversion: All tools now benefit from automatic, type-safe data transformations with intelligent compression and field mapping.
Cluster Management (8 Tools)
Tool | Description | Smart Defaults | Key Features |
| Create and start new clusters | 80% fewer params | Profile-based, auto-config |
| Create from YAML configuration | Project/region injection | Template-driven setup |
| Create using predefined profiles | 85% fewer params | 8 built-in profiles |
| List all clusters with filtering | No params needed | Semantic queries, pagination |
| List MCP-created clusters | Profile filtering | Creation tracking |
| Get detailed cluster information | 75% fewer params | Semantic data extraction |
| Delete existing clusters | Project/region defaults | Safe deletion |
| Get Zeppelin notebook URL | Auto-discovery | Web interface access |
Job Management (7 Tools)
Tool | Description | Smart Defaults | Key Features |
| Submit Hive queries to clusters | 70% fewer params | Async support, timeouts |
| Submit Spark/PySpark/Presto jobs | 75% fewer params | Multi-engine support, local file staging |
| Cancel running or pending jobs | JobID only needed | Emergency cancellation, cost control |
| Get job execution status | JobID only needed | Real-time monitoring |
| Get job outputs and results | Auto-pagination | Result formatting |
| Get Hive query status | Minimal params | Query tracking |
| Get Hive query results | Smart pagination | Enhanced async support |
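Status tracking for submitted jobs usually comes down to polling until the job reaches a terminal state. The sketch below assumes the job state names used by the Dataproc API (`DONE`, `ERROR`, and `CANCELLED` being terminal); the `pollJob` function and the injected `fetchState` callback are hypothetical, and a real client would call the server's status tool asynchronously and wait between polls.

```typescript
type JobState =
  | "PENDING"
  | "SETUP_DONE"
  | "RUNNING"
  | "CANCEL_PENDING"
  | "CANCEL_STARTED"
  | "CANCELLED"
  | "DONE"
  | "ERROR"
  | "ATTEMPT_FAILURE";

const TERMINAL_STATES: JobState[] = ["CANCELLED", "DONE", "ERROR"];

function isTerminal(state: JobState): boolean {
  return TERMINAL_STATES.indexOf(state) !== -1;
}

interface PollResult {
  state: JobState;
  polls: number;
  finished: boolean;
}

// Poll until the job reaches a terminal state or the poll budget is used up.
function pollJob(fetchState: () => JobState, maxPolls: number): PollResult {
  let state: JobState = "PENDING";
  let polls = 0;
  while (polls < maxPolls) {
    state = fetchState();
    polls += 1;
    if (isTerminal(state)) {
      return { state, polls, finished: true };
    }
  }
  return { state, polls, finished: false };
}

// Scripted sequence of states standing in for real status calls.
const scriptedStates: JobState[] = ["PENDING", "RUNNING", "RUNNING", "DONE"];
const nextState = (): JobState => scriptedStates.shift() ?? "DONE";
const outcome = pollJob(nextState, 10);
console.log(outcome);

// A job that never finishes stops after the poll budget is exhausted.
const stuck = pollJob(() => "RUNNING", 3);
console.log(stuck);
```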
Configuration & Profiles (3 Tools)
Tool | Description | Smart Defaults | Key Features |
| List available cluster profiles | Category filtering | 8 production profiles |
| Get detailed profile configuration | Profile ID only | Template access |
| Query stored cluster data | Natural language | Semantic search |
Analytics & Insights (4 Tools)
Tool | Description | Smart Defaults | Key Features |
| Quick status of all active jobs | No params needed | Multi-project view |
| Comprehensive cluster analytics | Auto-discovery | Machine types, components |
| Job performance analytics | Success rates | Error patterns, metrics |
| Query comprehensive knowledge base | Natural language | Clusters, jobs, errors |
Key Capabilities
Semantic Search: Natural language queries with Qdrant integration
Smart Defaults: 60-80% parameter reduction through intelligent injection
Response Optimization: 96% token reduction with full data preservation
Async Support: Non-blocking job submission and monitoring
Profile System: 8 production-ready cluster templates
Analytics: Comprehensive insights and performance tracking
Configuration
Project-Based Configuration
The server supports a project-based configuration format:
# profiles/@analytics-workloads.yaml
my-company-analytics-prod-1234:
  region: us-central1
  tags:
    - DataProc
    - analytics
    - production
  labels:
    service: analytics-service
    owner: data-team
    environment: production
  cluster_config:
    # ... cluster configuration
Authentication Methods
Service Account Impersonation (Recommended)
Direct Service Account Key
Application Default Credentials
Hybrid Authentication with fallbacks
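Hybrid authentication with fallbacks can be thought of as trying each method in priority order and using the first one that yields credentials. The sketch below is an assumption about that flow, not the server's implementation: `CredentialProvider`, `resolveCredentials`, and the provider names are hypothetical, and real Google Cloud credential loading is asynchronous.

```typescript
// Hypothetical types for illustration only.
interface ResolvedCredentials {
  token: string;
  source: string;
}

interface CredentialProvider {
  name: string;
  // Returns a token, undefined if this method is not available, or throws on error.
  load: () => string | undefined;
}

// Try each provider in order; collect the reasons for failure so the final
// error explains why every method was rejected.
function resolveCredentials(providers: CredentialProvider[]): ResolvedCredentials {
  const failures: string[] = [];
  for (const provider of providers) {
    try {
      const token = provider.load();
      if (token) {
        return { token, source: provider.name };
      }
      failures.push(provider.name + ": no credentials found");
    } catch (err) {
      failures.push(provider.name + ": " + String(err));
    }
  }
  throw new Error("All authentication methods failed:\n" + failures.join("\n"));
}

// Example chain in the priority order listed above; all values are placeholders.
const authChain: CredentialProvider[] = [
  {
    name: "impersonation",
    load: () => {
      throw new Error("impersonation not configured");
    },
  },
  { name: "service-account-key", load: () => undefined },
  { name: "application-default", load: () => "placeholder-token" },
];

const resolved = resolveCredentials(authChain);
console.log(resolved.source);
```

Keeping the failure reasons makes misconfiguration easier to diagnose than a bare "authentication failed" message.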
Documentation
Quick Start Guide - Get started in 5 minutes
Knowledge Base Semantic Search - Natural language queries and setup
Generic Type Conversion System - Architectural design and implementation
Generic Converter Migration Guide - Migration from manual conversions
API Reference - Complete tool documentation
Configuration Examples - Real-world configurations
Security Guide - Best practices and compliance
Installation Guide - Detailed setup instructions
MCP Client Integration
Claude Desktop
{
  "mcpServers": {
    "dataproc": {
      "command": "npx",
      "args": ["@dipseth/dataproc-mcp-server@latest"],
      "env": {
        "LOG_LEVEL": "info"
      }
    }
  }
}
Roo (VS Code)
{
  "mcpServers": {
    "dataproc-server": {
      "command": "npx",
      "args": ["@dipseth/dataproc-mcp-server@latest"],
      "disabled": false,
      "alwaysAllow": [
        "list_clusters",
        "get_cluster",
        "list_profiles"
      ]
    }
  }
}
Architecture
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   MCP Client    │────│  Dataproc MCP    │────│  Google Cloud   │
│  (Claude/Roo)   │    │     Server       │    │    Dataproc     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                         ┌──────┴──────┐
                         │  Features   │
                         ├─────────────┤
                         │ • Security  │
                         │ • Profiles  │
                         │ • Validation│
                         │ • Monitoring│
                         │ • Generic   │
                         │   Converter │
                         └─────────────┘
Generic Type Conversion System Architecture
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Source Types   │────│ Generic Converter│────│ Qdrant Payloads │
│ • ClusterData   │    │     System       │    │ • Compressed    │
│ • QueryResults  │    │                  │    │ • Type-Safe     │
│ • JobData       │    │ ┌──────────────┐ │    │ • Optimized     │
└─────────────────┘    │ │Field Analyzer│ │    └─────────────────┘
                       │ │Transformation│ │
                       │ │Engine        │ │
                       │ │Compression   │ │
                       │ │Service       │ │
                       │ └──────────────┘ │
                       └──────────────────┘
Performance
Response Time Achievements
Schema Validation: ~2ms (target: <5ms)
Parameter Injection: ~1ms (target: <2ms)
Generic Type Conversion: ~0.50ms (target: <2ms)
Credential Validation: ~25ms (target: <50ms)
MCP Tool Call: ~50ms (target: <100ms)
Throughput Achievements
Schema Validation: ~2000 ops/sec
Parameter Injection: ~5000 ops/sec
Generic Type Conversion: ~2000 ops/sec
Credential Validation: ~200 ops/sec
MCP Tool Call: ~100 ops/sec
Compression Achievements
Field-Level Compression: Up to 100% compression ratios
Memory Optimization: 30-60% reduction in memory usage
Type Safety: Zero runtime type errors with automatic validation
Testing
# Run all tests
npm test
# Run specific test suites
npm run test:unit
npm run test:integration
npm run test:performance
# Run with coverage
npm run test:coverage
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
# Clone the repository
git clone https://github.com/dipseth/dataproc-mcp.git
cd dataproc-mcp
# Install dependencies
npm install
# Build the project
npm run build
# Run tests
npm test
# Start development server
npm run dev
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
GitHub Issues: Report bugs and request features
Documentation: Complete documentation
NPM Package: Package information
Acknowledgments
Model Context Protocol - The protocol that makes this possible
Google Cloud Dataproc - The service we're integrating with
Qdrant - High-performance vector database powering our semantic search and knowledge indexing
TypeScript - For type safety and developer experience
Made with ❤️ for the MCP and Google Cloud communities