Dataproc MCP Server
Provides comprehensive management of Google Cloud Dataproc clusters and jobs, including cluster creation/deletion, Spark/PySpark/Hive job submission and monitoring, service account impersonation, and real-time analytics with intelligent parameter defaults and semantic search capabilities.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@Dataproc MCP Serverlist all running Dataproc clusters in us-central1"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
Dataproc MCP Server
A production-ready Model Context Protocol (MCP) server for Google Cloud Dataproc operations with intelligent parameter injection, enterprise-grade security, and comprehensive tooling. Designed for seamless integration with Roo (VS Code).
π Quick Start
Recommended: Roo (VS Code) Integration
Add this to your Roo MCP settings:
{
"mcpServers": {
"dataproc": {
"command": "npx",
"args": ["@dipseth/dataproc-mcp-server@latest"],
"env": {
"LOG_LEVEL": "info"
}
}
}
}With Custom Config File
{
"mcpServers": {
"dataproc": {
"command": "npx",
"args": ["@dipseth/dataproc-mcp-server@latest"],
"env": {
"LOG_LEVEL": "info",
"DATAPROC_CONFIG_PATH": "/path/to/your/config.json"
}
}
}
}Alternative: Global Installation
# Install globally
npm install -g @dipseth/dataproc-mcp-server
# Start the server
dataproc-mcp-server
# Or run directly
npx @dipseth/dataproc-mcp-server@latest5-Minute Setup
Install the package:
npm install -g @dipseth/dataproc-mcp-server@latestRun the setup:
dataproc-mcp --setupConfigure authentication:
# Edit the generated config file nano config/server.jsonStart the server:
dataproc-mcp
π Claude.ai Web App Compatibility
β PRODUCTION-READY: Full Claude.ai Integration with HTTPS Tunneling & OAuth
The Dataproc MCP Server now provides complete Claude.ai web app compatibility with a working solution that includes all 22 MCP tools!
π Working Solution (Tested & Verified)
Terminal 1 - Start MCP Server:
DATAPROC_CONFIG_PATH=config/github-oauth-server.json npm start -- --http --oauth --port 8080Terminal 2 - Start Cloudflare Tunnel:
cloudflared tunnel --url https://localhost:8443 --origin-server-name localhost --no-tls-verifyResult: Claude.ai can see and use all tools successfully! π
Key Features:
β Complete Tool Access - All 22 MCP tools available in Claude.ai
β HTTPS Tunneling - Cloudflare tunnel for secure external access
β OAuth Authentication - GitHub OAuth for secure authentication
β Trusted Certificates - No browser warnings or connection issues
β WebSocket Support - Full WebSocket compatibility with Claude.ai
β Production Ready - Tested and verified working solution
Quick Setup:
Setup GitHub OAuth (5 minutes)
Generate SSL certificates:
npm run ssl:generateStart services (2 terminals as shown above)
Connect Claude.ai to your tunnel URL
π Complete Guide: See
docs/claude-ai-integration.mdfor detailed setup instructions, troubleshooting, and advanced features.
π Certificate Setup: See
docs/trusted-certificates.mdfor SSL certificate configuration.
β¨ Features
π― Core Capabilities
22 Production-Ready MCP Tools - Complete Dataproc management suite
π§ Knowledge Base Semantic Search - Natural language queries with optional Qdrant integration
π Response Optimization - 60-96% token reduction with Qdrant storage
π Generic Type Conversion System - Automatic, type-safe data transformations
60-80% Parameter Reduction - Intelligent default injection
Multi-Environment Support - Dev/staging/production configurations
Service Account Impersonation - Enterprise authentication
Real-time Job Monitoring - Comprehensive status tracking
π Response Optimization
96.2% Token Reduction -
list_clusters: 7,651 β 292 tokensAutomatic Qdrant Storage - Full data preserved and searchable
Resource URI Access -
dataproc://responses/clusters/list/abc123Graceful Fallback - Works without Qdrant, falls back to full responses
9.95ms Processing - Lightning-fast optimization with <1MB memory usage
π Generic Type Conversion System
75% Code Reduction - Eliminates manual conversion logic across services
Type-Safe Transformations - Automatic field detection and mapping
Intelligent Compression - Field-level compression with configurable thresholds
0.50ms Conversion Times - Lightning-fast processing with 100% compression ratios
Zero-Configuration - Works automatically with existing TypeScript types
Backward Compatible - Seamless integration with existing functionality
οΏ½ Enterprise Security
Input Validation - Zod schemas for all 16 tools
Rate Limiting - Configurable abuse prevention
Credential Management - Secure handling and rotation
Audit Logging - Comprehensive security event tracking
Threat Detection - Injection attack prevention
π Quality Assurance
90%+ Test Coverage - Comprehensive test suite
Performance Monitoring - Configurable thresholds
Multi-Environment Testing - Cross-platform validation
Automated Quality Gates - CI/CD integration
Security Scanning - Vulnerability management
π Developer Experience
5-Minute Setup - Quick start guide
Interactive Documentation - HTML docs with examples
Comprehensive Examples - Multi-environment configs
Troubleshooting Guides - Common issues and solutions
IDE Integration - TypeScript support
π οΈ Complete MCP Tools Suite (22 Tools)
π Enhanced with Generic Type Conversion: All tools now benefit from automatic, type-safe data transformations with intelligent compression and field mapping.
π Cluster Management (8 Tools)
Tool | Description | Smart Defaults | Key Features |
| Create and start new clusters | β 80% fewer params | Profile-based, auto-config |
| Create from YAML configuration | β Project/region injection | Template-driven setup |
| Create using predefined profiles | β 85% fewer params | 8 built-in profiles |
| List all clusters with filtering | β No params needed | Semantic queries, pagination |
| List MCP-created clusters | β Profile filtering | Creation tracking |
| Get detailed cluster information | β 75% fewer params | Semantic data extraction |
| Delete existing clusters | β Project/region defaults | Safe deletion |
| Get Zeppelin notebook URL | β Auto-discovery | Web interface access |
πΌ Job Management (7 Tools)
Tool | Description | Smart Defaults | Key Features |
| Submit Hive queries to clusters | β 70% fewer params | Async support, timeouts |
| Submit Spark/PySpark/Presto jobs | β 75% fewer params | Multi-engine support, Local file staging |
| Cancel running or pending jobs | β JobID only needed | Emergency cancellation, cost control |
| Get job execution status | β JobID only needed | Real-time monitoring |
| Get job outputs and results | β Auto-pagination | Result formatting |
| Get Hive query status | β Minimal params | Query tracking |
| Get Hive query results | β Smart pagination | Enhanced async support |
π Configuration & Profiles (3 Tools)
Tool | Description | Smart Defaults | Key Features |
| List available cluster profiles | β Category filtering | 8 production profiles |
| Get detailed profile configuration | β Profile ID only | Template access |
| Query stored cluster data | β Natural language | Semantic search |
π Analytics & Insights (4 Tools)
Tool | Description | Smart Defaults | Key Features |
| Quick status of all active jobs | β No params needed | Multi-project view |
| Comprehensive cluster analytics | β Auto-discovery | Machine types, components |
| Job performance analytics | β Success rates | Error patterns, metrics |
| Query comprehensive knowledge base | β Natural language | Clusters, jobs, errors |
π― Key Capabilities
π§ Semantic Search: Natural language queries with Qdrant integration
β‘ Smart Defaults: 60-80% parameter reduction through intelligent injection
π Response Optimization: 96% token reduction with full data preservation
π Async Support: Non-blocking job submission and monitoring
π·οΈ Profile System: 8 production-ready cluster templates
π Analytics: Comprehensive insights and performance tracking
π Configuration
Project-Based Configuration
The server supports a project-based configuration format:
# profiles/@analytics-workloads.yaml
my-company-analytics-prod-1234:
region: us-central1
tags:
- DataProc
- analytics
- production
labels:
service: analytics-service
owner: data-team
environment: production
cluster_config:
# ... cluster configurationAuthentication Methods
Service Account Impersonation (Recommended)
Direct Service Account Key
Application Default Credentials
Hybrid Authentication with fallbacks
π Documentation
Quick Start Guide - Get started in 5 minutes
Knowledge Base Semantic Search - Natural language queries and setup
Generic Type Conversion System - Architectural design and implementation
Generic Converter Migration Guide - Migration from manual conversions
API Reference - Complete tool documentation
Configuration Examples - Real-world configurations
Security Guide - Best practices and compliance
Installation Guide - Detailed setup instructions
π§ MCP Client Integration
Claude Desktop
{
"mcpServers": {
"dataproc": {
"command": "npx",
"args": ["@dataproc/mcp-server"],
"env": {
"LOG_LEVEL": "info"
}
}
}
}Roo (VS Code)
{
"mcpServers": {
"dataproc-server": {
"command": "npx",
"args": ["@dataproc/mcp-server"],
"disabled": false,
"alwaysAllow": [
"list_clusters",
"get_cluster",
"list_profiles"
]
}
}
}ποΈ Architecture
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β MCP Client ββββββ Dataproc MCP ββββββ Google Cloud β
β (Claude/Roo) β β Server β β Dataproc β
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β
ββββββββ΄βββββββ
β Features β
βββββββββββββββ€
β β’ Security β
β β’ Profiles β
β β’ Validationβ
β β’ Monitoringβ
β β’ Generic β
β Converter β
βββββββββββββββπ Generic Type Conversion System Architecture
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β Source Types ββββββ Generic Converter ββββββ Qdrant Payloads β
β β’ ClusterData β β System β β β’ Compressed β
β β’ QueryResults β β β β β’ Type-Safe β
β β’ JobData β β ββββββββββββββββ β β β’ Optimized β
βββββββββββββββββββ β βField Analyzerβ β βββββββββββββββββββ
β βTransformationβ β
β βEngine β β
β βCompression β β
β βService β β
β ββββββββββββββββ β
ββββββββββββββββββββπ¦ Performance
Response Time Achievements
Schema Validation: ~2ms (target: <5ms) β
Parameter Injection: ~1ms (target: <2ms) β
Generic Type Conversion: ~0.50ms (target: <2ms) β
Credential Validation: ~25ms (target: <50ms) β
MCP Tool Call: ~50ms (target: <100ms) β
Throughput Achievements
Schema Validation: ~2000 ops/sec β
Parameter Injection: ~5000 ops/sec β
Generic Type Conversion: ~2000 ops/sec β
Credential Validation: ~200 ops/sec β
MCP Tool Call: ~100 ops/sec β
Compression Achievements
Field-Level Compression: Up to 100% compression ratios β
Memory Optimization: 30-60% reduction in memory usage β
Type Safety: Zero runtime type errors with automatic validation β
π§ͺ Testing
# Run all tests
npm test
# Run specific test suites
npm run test:unit
npm run test:integration
npm run test:performance
# Run with coverage
npm run test:coverageπ€ Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
# Clone the repository
git clone https://github.com/dipseth/dataproc-mcp.git
cd dataproc-mcp
# Install dependencies
npm install
# Build the project
npm run build
# Run tests
npm test
# Start development server
npm run devπ License
This project is licensed under the MIT License - see the LICENSE file for details.
π Support
GitHub Issues: Report bugs and request features
Documentation: Complete documentation
NPM Package: Package information
π Acknowledgments
Model Context Protocol - The protocol that makes this possible
Google Cloud Dataproc - The service we're integrating with
Qdrant - High-performance vector database powering our semantic search and knowledge indexing
TypeScript - For type safety and developer experience
Made with β€οΈ for the MCP and Google Cloud communities
This server cannot be installed
Maintenance
Latest Blog Posts
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/dipseth/dataproc-mcp'
If you have feedback or need assistance with the MCP directory API, please join our Discord server