Skip to main content
Glama

Dataproc MCP Server

npm version npm downloads Build Status Release Status Coverage Status License: MIT Node.js Version TypeScript MCP Compatible semantic-release

A production-ready Model Context Protocol (MCP) server for Google Cloud Dataproc operations with intelligent parameter injection, enterprise-grade security, and comprehensive tooling. Designed for seamless integration with Roo (VS Code).

πŸš€ Quick Start

Add this to your Roo MCP settings:

{ "mcpServers": { "dataproc": { "command": "npx", "args": ["@dipseth/dataproc-mcp-server@latest"], "env": { "LOG_LEVEL": "info" } } } }

With Custom Config File

{ "mcpServers": { "dataproc": { "command": "npx", "args": ["@dipseth/dataproc-mcp-server@latest"], "env": { "LOG_LEVEL": "info", "DATAPROC_CONFIG_PATH": "/path/to/your/config.json" } } } }

Alternative: Global Installation

# Install globally npm install -g @dipseth/dataproc-mcp-server # Start the server dataproc-mcp-server # Or run directly npx @dipseth/dataproc-mcp-server@latest

5-Minute Setup

  1. Install the package:

    npm install -g @dipseth/dataproc-mcp-server@latest
  2. Run the setup:

    dataproc-mcp --setup
  3. Configure authentication:

    # Edit the generated config file nano config/server.json
  4. Start the server:

    dataproc-mcp

🌐 Claude.ai Web App Compatibility

βœ… PRODUCTION-READY: Full Claude.ai Integration with HTTPS Tunneling & OAuth

The Dataproc MCP Server now provides complete Claude.ai web app compatibility with a working solution that includes all 22 MCP tools!

πŸš€ Working Solution (Tested & Verified)

Terminal 1 - Start MCP Server:

DATAPROC_CONFIG_PATH=config/github-oauth-server.json npm start -- --http --oauth --port 8080

Terminal 2 - Start Cloudflare Tunnel:

cloudflared tunnel --url https://localhost:8443 --origin-server-name localhost --no-tls-verify

Result: Claude.ai can see and use all tools successfully! πŸŽ‰

Key Features:

  • βœ… Complete Tool Access - All 22 MCP tools available in Claude.ai

  • βœ… HTTPS Tunneling - Cloudflare tunnel for secure external access

  • βœ… OAuth Authentication - GitHub OAuth for secure authentication

  • βœ… Trusted Certificates - No browser warnings or connection issues

  • βœ… WebSocket Support - Full WebSocket compatibility with Claude.ai

  • βœ… Production Ready - Tested and verified working solution

Quick Setup:

  1. Setup GitHub OAuth (5 minutes)

  2. Generate SSL certificates: npm run ssl:generate

  3. Start services (2 terminals as shown above)

  4. Connect Claude.ai to your tunnel URL

πŸ“– Complete Guide: See docs/claude-ai-integration.md for detailed setup instructions, troubleshooting, and advanced features.

πŸ“– Certificate Setup: See docs/trusted-certificates.md for SSL certificate configuration.

✨ Features

🎯 Core Capabilities

  • 22 Production-Ready MCP Tools - Complete Dataproc management suite

  • 🧠 Knowledge Base Semantic Search - Natural language queries with optional Qdrant integration

  • πŸš€ Response Optimization - 60-96% token reduction with Qdrant storage

  • πŸ”„ Generic Type Conversion System - Automatic, type-safe data transformations

  • 60-80% Parameter Reduction - Intelligent default injection

  • Multi-Environment Support - Dev/staging/production configurations

  • Service Account Impersonation - Enterprise authentication

  • Real-time Job Monitoring - Comprehensive status tracking

πŸš€ Response Optimization

  • 96.2% Token Reduction - list_clusters: 7,651 β†’ 292 tokens

  • Automatic Qdrant Storage - Full data preserved and searchable

  • Resource URI Access - dataproc://responses/clusters/list/abc123

  • Graceful Fallback - Works without Qdrant, falls back to full responses

  • 9.95ms Processing - Lightning-fast optimization with <1MB memory usage

πŸ”„ Generic Type Conversion System

  • 75% Code Reduction - Eliminates manual conversion logic across services

  • Type-Safe Transformations - Automatic field detection and mapping

  • Intelligent Compression - Field-level compression with configurable thresholds

  • 0.50ms Conversion Times - Lightning-fast processing with 100% compression ratios

  • Zero-Configuration - Works automatically with existing TypeScript types

  • Backward Compatible - Seamless integration with existing functionality

οΏ½ Enterprise Security

  • Input Validation - Zod schemas for all 16 tools

  • Rate Limiting - Configurable abuse prevention

  • Credential Management - Secure handling and rotation

  • Audit Logging - Comprehensive security event tracking

  • Threat Detection - Injection attack prevention

πŸ“Š Quality Assurance

  • 90%+ Test Coverage - Comprehensive test suite

  • Performance Monitoring - Configurable thresholds

  • Multi-Environment Testing - Cross-platform validation

  • Automated Quality Gates - CI/CD integration

  • Security Scanning - Vulnerability management

πŸš€ Developer Experience

  • 5-Minute Setup - Quick start guide

  • Interactive Documentation - HTML docs with examples

  • Comprehensive Examples - Multi-environment configs

  • Troubleshooting Guides - Common issues and solutions

  • IDE Integration - TypeScript support

πŸ› οΈ Complete MCP Tools Suite (22 Tools)

πŸ”„ Enhanced with Generic Type Conversion: All tools now benefit from automatic, type-safe data transformations with intelligent compression and field mapping.

πŸš€ Cluster Management (8 Tools)

Tool

Description

Smart Defaults

Key Features

start_dataproc_cluster

Create and start new clusters

βœ… 80% fewer params

Profile-based, auto-config

create_cluster_from_yaml

Create from YAML configuration

βœ… Project/region injection

Template-driven setup

create_cluster_from_profile

Create using predefined profiles

βœ… 85% fewer params

8 built-in profiles

list_clusters

List all clusters with filtering

βœ… No params needed

Semantic queries, pagination

list_tracked_clusters

List MCP-created clusters

βœ… Profile filtering

Creation tracking

get_cluster

Get detailed cluster information

βœ… 75% fewer params

Semantic data extraction

delete_cluster

Delete existing clusters

βœ… Project/region defaults

Safe deletion

get_zeppelin_url

Get Zeppelin notebook URL

βœ… Auto-discovery

Web interface access

πŸ’Ό Job Management (7 Tools)

Tool

Description

Smart Defaults

Key Features

submit_hive_query

Submit Hive queries to clusters

βœ… 70% fewer params

Async support, timeouts

submit_dataproc_job

Submit Spark/PySpark/Presto jobs

βœ… 75% fewer params

Multi-engine support, Local file staging

cancel_dataproc_job

Cancel running or pending jobs

βœ… JobID only needed

Emergency cancellation, cost control

get_job_status

Get job execution status

βœ… JobID only needed

Real-time monitoring

get_job_results

Get job outputs and results

βœ… Auto-pagination

Result formatting

get_query_status

Get Hive query status

βœ… Minimal params

Query tracking

get_query_results

Get Hive query results

βœ… Smart pagination

Enhanced async support

πŸ“‹ Configuration & Profiles (3 Tools)

Tool

Description

Smart Defaults

Key Features

list_profiles

List available cluster profiles

βœ… Category filtering

8 production profiles

get_profile

Get detailed profile configuration

βœ… Profile ID only

Template access

query_cluster_data

Query stored cluster data

βœ… Natural language

Semantic search

πŸ“Š Analytics & Insights (4 Tools)

Tool

Description

Smart Defaults

Key Features

check_active_jobs

Quick status of all active jobs

βœ… No params needed

Multi-project view

get_cluster_insights

Comprehensive cluster analytics

βœ… Auto-discovery

Machine types, components

get_job_analytics

Job performance analytics

βœ… Success rates

Error patterns, metrics

query_knowledge

Query comprehensive knowledge base

βœ… Natural language

Clusters, jobs, errors

🎯 Key Capabilities

  • 🧠 Semantic Search: Natural language queries with Qdrant integration

  • ⚑ Smart Defaults: 60-80% parameter reduction through intelligent injection

  • πŸ“Š Response Optimization: 96% token reduction with full data preservation

  • πŸ”„ Async Support: Non-blocking job submission and monitoring

  • 🏷️ Profile System: 8 production-ready cluster templates

  • πŸ“ˆ Analytics: Comprehensive insights and performance tracking

πŸ“‹ Configuration

Project-Based Configuration

The server supports a project-based configuration format:

# profiles/@analytics-workloads.yaml my-company-analytics-prod-1234: region: us-central1 tags: - DataProc - analytics - production labels: service: analytics-service owner: data-team environment: production cluster_config: # ... cluster configuration

Authentication Methods

  1. Service Account Impersonation (Recommended)

  2. Direct Service Account Key

  3. Application Default Credentials

  4. Hybrid Authentication with fallbacks

πŸ“š Documentation

πŸ”§ MCP Client Integration

Claude Desktop

{ "mcpServers": { "dataproc": { "command": "npx", "args": ["@dataproc/mcp-server"], "env": { "LOG_LEVEL": "info" } } } }

Roo (VS Code)

{ "mcpServers": { "dataproc-server": { "command": "npx", "args": ["@dataproc/mcp-server"], "disabled": false, "alwaysAllow": [ "list_clusters", "get_cluster", "list_profiles" ] } } }

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ MCP Client │────│ Dataproc MCP │────│ Google Cloud β”‚ β”‚ (Claude/Roo) β”‚ β”‚ Server β”‚ β”‚ Dataproc β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β” β”‚ Features β”‚ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚ β€’ Security β”‚ β”‚ β€’ Profiles β”‚ β”‚ β€’ Validationβ”‚ β”‚ β€’ Monitoringβ”‚ β”‚ β€’ Generic β”‚ β”‚ Converter β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ”„ Generic Type Conversion System Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Source Types │────│ Generic Converter │────│ Qdrant Payloads β”‚ β”‚ β€’ ClusterData β”‚ β”‚ System β”‚ β”‚ β€’ Compressed β”‚ β”‚ β€’ QueryResults β”‚ β”‚ β”‚ β”‚ β€’ Type-Safe β”‚ β”‚ β€’ JobData β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β€’ Optimized β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚Field Analyzerβ”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚Transformationβ”‚ β”‚ β”‚ β”‚Engine β”‚ β”‚ β”‚ β”‚Compression β”‚ β”‚ β”‚ β”‚Service β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

🚦 Performance

Response Time Achievements

  • Schema Validation: ~2ms (target: <5ms) βœ…

  • Parameter Injection: ~1ms (target: <2ms) βœ…

  • Generic Type Conversion: ~0.50ms (target: <2ms) βœ…

  • Credential Validation: ~25ms (target: <50ms) βœ…

  • MCP Tool Call: ~50ms (target: <100ms) βœ…

Throughput Achievements

  • Schema Validation: ~2000 ops/sec βœ…

  • Parameter Injection: ~5000 ops/sec βœ…

  • Generic Type Conversion: ~2000 ops/sec βœ…

  • Credential Validation: ~200 ops/sec βœ…

  • MCP Tool Call: ~100 ops/sec βœ…

Compression Achievements

  • Field-Level Compression: Up to 100% compression ratios βœ…

  • Memory Optimization: 30-60% reduction in memory usage βœ…

  • Type Safety: Zero runtime type errors with automatic validation βœ…

πŸ§ͺ Testing

# Run all tests npm test # Run specific test suites npm run test:unit npm run test:integration npm run test:performance # Run with coverage npm run test:coverage

🀝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

# Clone the repository git clone https://github.com/dipseth/dataproc-mcp.git cd dataproc-mcp # Install dependencies npm install # Build the project npm run build # Run tests npm test # Start development server npm run dev

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ†˜ Support

πŸ† Acknowledgments


Made with ❀️ for the MCP and Google Cloud communities

-
security - not tested
A
license - permissive license
-
quality - not tested

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/dipseth/dataproc-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server