Dataproc MCP Server

# Enhanced Built-in Prompt System Architecture

## Executive Summary

This document outlines the architecture for an enhanced built-in prompt system that replaces the current static prompts in `src/prompts/dataproc-prompts.ts` with a sophisticated, context-aware system that leverages the existing MCP infrastructure including RFC 6570 templating, Qdrant knowledge base, profile management, and parameter injection.

## System Architecture Overview

```mermaid
graph TB
    subgraph "Enhanced Prompt System"
        DPG[DataprocPromptGenerator]
        PC[PromptContext]
        PT[PromptTemplate]
        PB[PromptBuilder]
    end

    subgraph "Existing Infrastructure"
        TI[TemplatingIntegration]
        TM[TemplateManager]
        TR[TemplateResolver]
        PI[ParameterInjector]
        KI[KnowledgeIndexer]
        PM[ProfileManager]
        DPM[DefaultParameterManager]
        DR[DynamicResolver]
    end

    subgraph "MCP Server"
        MS[McpServer]
        PH[PromptHandlers]
    end

    DPG --> PC
    DPG --> PT
    DPG --> PB
    DPG --> TI
    TI --> TM
    TI --> TR
    TI --> PI
    DPG --> KI
    DPG --> PM
    DPG --> DPM
    PI --> DR
    MS --> PH
    PH --> DPG

    classDef enhanced fill:#e1f5fe
    classDef existing fill:#f3e5f5
    classDef mcp fill:#e8f5e8

    class DPG,PC,PT,PB enhanced
    class TI,TM,TR,PI,KI,PM,DPM,DR existing
    class MS,PH mcp
```

## Core Components Design

### 1. DataprocPromptGenerator Class

The central orchestrator that coordinates all prompt generation activities.
```typescript
interface DataprocPromptGeneratorConfig {
  enableTemplating: boolean;
  enableKnowledgeIntegration: boolean;
  enableProfileIntegration: boolean;
  cacheConfig: {
    enableCaching: boolean;
    defaultTtlMs: number;
    maxCacheSize: number;
  };
  dynamicResolution: {
    executeAtGenerationTime: boolean;
    timeoutMs: number;
  };
}

class DataprocPromptGenerator {
  private templatingIntegration: TemplatingIntegration;
  private knowledgeIndexer: KnowledgeIndexer;
  private profileManager: ProfileManager;
  private parameterInjector: ParameterInjector;
  private dynamicResolver: DynamicResolver;
  private promptCache: Map<string, CachedPrompt>;

  // Core methods
  async generatePrompt(promptId: string, context: PromptContext): Promise<EnhancedPrompt>
  async resolvePromptTemplate(template: PromptTemplate, context: PromptContext): Promise<ResolvedPrompt>
  private async executeKnowledgeQueries(template: PromptTemplate, context: PromptContext): Promise<KnowledgeResults>
  private async resolveProfileParameters(context: PromptContext): Promise<ProfileParameters>
  private async resolveDynamicFunctions(content: string, context: PromptContext): Promise<string>
}
```

### 2. PromptContext Interface

Comprehensive context information for prompt generation.
```typescript
interface PromptContext {
  // Core context
  promptId: string;
  toolName: string;
  userParameters: Record<string, unknown>;

  // Environment context
  environment?: string;
  profileId?: string;
  projectId?: string;
  region?: string;

  // Knowledge context
  knowledgeQueries?: KnowledgeQuery[];
  semanticContext?: string[];

  // Template context
  templateOverrides?: Record<string, unknown>;
  dynamicFunctions?: DynamicFunction[];

  // Security context
  securityContext: SecurityContext;

  // Metadata
  requestId?: string;
  timestamp: Date;
  metadata?: Record<string, unknown>;
}

interface KnowledgeQuery {
  type: 'semantic' | 'tag-based' | 'hybrid';
  query: string;
  filters?: Record<string, unknown>;
  limit?: number;
  minConfidence?: number;
}

interface DynamicFunction {
  name: 'job_output' | 'qdrant_query' | 'profile_config' | 'cluster_status';
  args: string[];
  cacheKey?: string;
  ttl?: number;
}
```

### 3. Enhanced Prompt Template System

Prompts are categorized into four types based on their integration complexity:

```mermaid
graph LR
    subgraph "Prompt Categories"
        TE[Template-Enhanced]
        KD[Knowledge-Driven]
        RI[Resource-Integrated]
        FI[Fully-Integrated]
    end

    subgraph "Template Features"
        RFC[RFC 6570 Templating]
        PI[Parameter Injection]
        DF[Dynamic Functions]
        KS[Knowledge Search]
        RC[Resource Access]
    end

    TE --> RFC
    TE --> PI
    KD --> KS
    KD --> DF
    RI --> RC
    RI --> RFC
    FI --> RFC
    FI --> PI
    FI --> DF
    FI --> KS
    FI --> RC
```

## Prompt Generation Flow

```mermaid
sequenceDiagram
    participant Client
    participant PH as PromptHandlers
    participant DPG as DataprocPromptGenerator
    participant TI as TemplatingIntegration
    participant KI as KnowledgeIndexer
    participant DR as DynamicResolver
    participant PM as ProfileManager

    Client->>PH: getPrompt(promptId, args)
    PH->>DPG: generatePrompt(promptId, context)
    DPG->>DPG: buildPromptContext(args)
    DPG->>TI: resolveResourceUri(toolName, parameters)
    TI->>DPG: resourceUri + resolvedParams
    DPG->>PM: getProfileParameters(profileId)
    PM->>DPG: profileParams
    DPG->>KI: queryKnowledge(semanticQueries)
    KI->>DPG: knowledgeResults
    DPG->>DR: resolveDynamicFunctions(templateContent)
    DR->>DPG: resolvedContent
    DPG->>DPG: buildFinalPrompt(template, context, knowledge, params)
    DPG->>PH: enhancedPrompt
    PH->>Client: promptResponse
```

## Enhanced Prompt Specifications

### 1. analyze-dataproc-query (Knowledge-Driven)

**Purpose**: Analyze Hive/Spark queries with knowledge base examples and optimization insights.

**Key Features**:
- Semantic search for similar successful queries
- Performance optimization recommendations from knowledge base
- Context-aware analysis based on cluster configuration

```typescript
const analyzeDataprocQueryTemplate: PromptTemplate = {
  id: 'analyze-dataproc-query',
  category: 'knowledge-driven',
  description: 'Analyze Hive/Spark queries with knowledge base examples',
  template: `Analyze this {{queryType || 'SQL'}} query for optimization and best practices:

**Query to Analyze:**
\`\`\`sql
{{query}}
\`\`\`

**Context:**
- Target Cluster: {{clusterName || 'any'}}
- Optimization Level: {{optimizationLevel || 'basic'}}
- Query Type: {{queryType || 'auto-detect'}}

**Similar Query Examples from Knowledge Base:**
{{qdrant_query("similar queries " + queryType, "limit:3 type:job")}}

**Performance Insights:**
{{qdrant_query("query optimization " + queryType, "limit:2 type:cluster")}}

**Analysis Requirements:**
1. Query performance analysis with specific bottlenecks
2. Optimization suggestions based on similar successful queries
3. Best practices recommendations from knowledge base
4. Potential issues or warnings with suggested fixes
5. Alternative query approaches if applicable`,
  parameters: [
    { name: 'query', type: 'string', required: true, source: 'tool' },
    { name: 'queryType', type: 'string', required: false, source: 'tool' },
    { name: 'clusterName', type: 'string', required: false, source: 'profile' },
    { name: 'optimizationLevel', type: 'string', required: false, source: 'tool' }
  ],
  knowledgeQueries: [
    {
      type: 'semantic',
      query: 'similar queries {{queryType}}',
      filters: { type: 'job' },
      limit: 3
    },
    {
      type: 'semantic',
      query: 'query optimization {{queryType}}',
      filters: { type: 'cluster' },
      limit: 2
    }
  ],
  // Args use the same {{placeholder}} convention as knowledgeQueries above
  dynamicFunctions: [
    { name: 'qdrant_query', args: ['similar queries {{queryType}}', 'limit:3 type:job'] },
    { name: 'qdrant_query', args: ['query optimization {{queryType}}', 'limit:2 type:cluster'] }
  ]
};
```

### 2. design-dataproc-cluster (Resource-Integrated)

**Purpose**: Generate cluster configuration recommendations with profile integration and regional best practices.

**Key Features**:
- Profile-based configuration templates
- Regional optimization recommendations
- Cost and performance optimization strategies

```typescript
const designDataprocClusterTemplate: PromptTemplate = {
  id: 'design-dataproc-cluster',
  category: 'resource-integrated',
  description: 'Generate cluster configuration with profile integration',
  template: `Design an optimal Dataproc cluster configuration for the following requirements:

**Workload Requirements:**
- Type: {{workloadType}}
- Data Size: {{dataSize}}
- Budget: {{budget}}
- Region: {{region || 'flexible'}}
- Additional Requirements: {{requirements || 'none specified'}}

**Profile-Based Recommendations:**
{{profile_config(profileId, "cluster-templates")}}

**Similar Cluster Configurations:**
{{qdrant_query("workload " + workloadType + " " + dataSize, "limit:3 type:cluster")}}

**Regional Best Practices:**
{{qdrant_query("region " + region + " cluster configuration", "limit:2")}}

**Design Requirements:**
1. Recommended cluster configuration (machine types, disk sizes, node counts)
2. Software components and versions based on workload type
3. Networking and security recommendations for {{region}}
4. Cost optimization strategies for {{budget}} budget
5. Performance tuning suggestions for {{dataSize}} data
6. Complete YAML configuration file
7. Scaling recommendations and auto-scaling policies`,
  parameters: [
    { name: 'workloadType', type: 'string', required: true, source: 'tool' },
    { name: 'dataSize', type: 'string', required: true, source: 'tool' },
    { name: 'budget', type: 'string', required: true, source: 'tool' },
    { name: 'region', type: 'string', required: false, source: 'gcp' },
    { name: 'requirements', type: 'string', required: false, source: 'tool' },
    { name: 'profileId', type: 'string', required: false, source: 'profile' }
  ],
  resourceAccess: [
    { uri: 'dataproc://profile/{{category}}/{{profileId}}/config', purpose: 'cluster-templates' }
  ],
  knowledgeQueries: [
    {
      type: 'semantic',
      query: 'workload {{workloadType}} {{dataSize}}',
      filters: { type: 'cluster' },
      limit: 3
    },
    {
      type: 'semantic',
      query: 'region {{region}} cluster configuration',
      limit: 2
    }
  ]
};
```

### 3. troubleshoot-dataproc-issue (Fully-Integrated)

**Purpose**: Comprehensive troubleshooting with semantic error matching, job output analysis, and historical resolution patterns.
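
The `qdrant_query` calls in these templates take a compact option string such as `"limit:5 type:error"`, while the `knowledgeQueries` entries express the same intent as `limit` and `filters` fields. The following is a minimal sketch of how one might translate between the two, assuming the option string is a list of space-separated `key:value` tokens. `parseQueryOptions` and `ParsedQueryOptions` are hypothetical names introduced here for illustration and are not taken from the project.

```typescript
// Hypothetical shape for parsed options; mirrors the `limit` and `filters`
// fields of the KnowledgeQuery interface defined earlier in this document.
interface ParsedQueryOptions {
  limit?: number;
  filters: Record<string, string>;
}

// Assumption: options are space-separated `key:value` tokens, where `limit`
// is a positive integer and every other key is treated as a filter.
function parseQueryOptions(options: string): ParsedQueryOptions {
  const parsed: ParsedQueryOptions = { filters: {} };
  for (const token of options.split(/\s+/).filter(Boolean)) {
    const separator = token.indexOf(':');
    if (separator <= 0) continue; // skip tokens that are not key:value pairs
    const key = token.slice(0, separator);
    const value = token.slice(separator + 1);
    if (key === 'limit') {
      const limit = Number.parseInt(value, 10);
      if (Number.isInteger(limit) && limit > 0) parsed.limit = limit;
    } else {
      parsed.filters[key] = value;
    }
  }
  return parsed;
}

const errorOptions = parseQueryOptions('limit:5 type:error');
console.log(errorOptions);
```

Invalid or missing `limit` values are simply dropped in this sketch, leaving the caller free to apply its own default.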

**Key Features**:
- Job output analysis for error context
- Semantic matching of similar error patterns
- Historical resolution strategies
- Cluster configuration analysis

```typescript
const troubleshootDataprocIssueTemplate: PromptTemplate = {
  id: 'troubleshoot-dataproc-issue',
  category: 'fully-integrated',
  description: 'Comprehensive troubleshooting with semantic error matching',
  template: `Troubleshoot this Dataproc issue with comprehensive analysis:

**Issue Details:**
- Type: {{issueType}}
- Error Message: {{errorMessage || 'not provided'}}
- Job ID: {{jobId || 'not applicable'}}
- Cluster: {{clusterName || 'not specified'}}
- Timeline: {{timeline || 'not specified'}}
- Context: {{context || 'not provided'}}

**Job Output Analysis:**
{{job_output(jobId, "error-analysis")}}

**Similar Error Patterns:**
{{qdrant_query("error " + issueType + " " + errorMessage, "limit:5 type:error")}}

**Cluster Configuration Analysis:**
{{qdrant_query("cluster " + clusterName + " configuration issues", "limit:3 type:cluster")}}

**Historical Resolution Patterns:**
{{qdrant_query("resolved " + issueType + " troubleshooting", "limit:3")}}

**Troubleshooting Analysis:**
1. **Root Cause Analysis**: Based on error patterns and job output
2. **Step-by-Step Troubleshooting Guide**: Prioritized diagnostic steps
3. **Diagnostic Commands**: Specific commands to gather more information
4. **Resolution Strategies**: Multiple approaches based on similar cases
5. **Prevention Recommendations**: Avoid future occurrences
6. **Escalation Path**: When to involve Google Cloud Support`,
  parameters: [
    { name: 'issueType', type: 'string', required: true, source: 'tool' },
    { name: 'errorMessage', type: 'string', required: false, source: 'tool' },
    { name: 'jobId', type: 'string', required: false, source: 'tool' },
    { name: 'clusterName', type: 'string', required: false, source: 'tool' },
    { name: 'timeline', type: 'string', required: false, source: 'tool' },
    { name: 'context', type: 'string', required: false, source: 'tool' }
  ],
  dynamicFunctions: [
    { name: 'job_output', args: ['{{jobId}}', 'error-analysis'] },
    { name: 'qdrant_query', args: ['error {{issueType}} {{errorMessage}}', 'limit:5 type:error'] },
    { name: 'qdrant_query', args: ['cluster {{clusterName}} configuration issues', 'limit:3 type:cluster'] },
    { name: 'qdrant_query', args: ['resolved {{issueType}} troubleshooting', 'limit:3'] }
  ]
};
```

### 4. generate-dataproc-query (Template-Enhanced)

**Purpose**: Generate optimized queries with template-driven optimization and cluster-specific recommendations.
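
The templates in this document rely on placeholders of the form `{{name}}` and `{{name || 'default'}}`. The sketch below shows one way such placeholders could be resolved against a parameter map. It is an illustration under stated assumptions, not the project's templating code: `renderPlaceholders` and `TemplateParams` are hypothetical names, only single-quoted fallbacks are supported, and function-style placeholders such as `{{qdrant_query(...)}}` are deliberately left untouched for a separate dynamic-function step.

```typescript
// Hypothetical parameter map; values are stringified when substituted.
type TemplateParams = Record<string, unknown>;

// Matches {{name}} and {{name || 'fallback'}}. Function-call placeholders
// such as {{qdrant_query(...)}} do not match and are therefore preserved.
const PLACEHOLDER = /\{\{\s*([A-Za-z_]\w*)\s*(?:\|\|\s*'([^']*)')?\s*\}\}/g;

function renderPlaceholders(template: string, params: TemplateParams): string {
  return template.replace(
    PLACEHOLDER,
    (match: string, name: string, fallback?: string): string => {
      const value = params[name];
      // Use the supplied value when it is present and non-empty.
      if (value !== undefined && value !== null && value !== '') {
        return String(value);
      }
      // Otherwise use the inline fallback, or keep the placeholder as-is
      // so that a missing required parameter stays visible.
      return fallback !== undefined ? fallback : match;
    }
  );
}

const rendered = renderPlaceholders(
  "Target Cluster: {{clusterName || 'any'}}, Dialect: {{dialect}}",
  { dialect: 'hive' }
);
console.log(rendered);
```

Leaving unresolved placeholders in the output, rather than replacing them with an empty string, makes a missing required parameter easy to spot during validation.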

**Key Features**:
- Template-based parameter resolution
- Cluster-specific optimization hints
- Performance level-based recommendations
- Query pattern examples from knowledge base

```typescript
const generateDataprocQueryTemplate: PromptTemplate = {
  id: 'generate-dataproc-query',
  category: 'template-enhanced',
  description: 'Generate optimized queries with template-driven optimization',
  template: `Generate an optimized {{dialect}} query for the following requirements:

**Requirements:**
- Purpose: {{queryPurpose}}
- Tables: {{tableNames || 'to be determined'}}
- Query Type: {{queryType}}
- Dialect: {{dialect}}
- Performance Level: {{performanceLevel}}
- Target Cluster: {{clusterName || 'any'}}

**Cluster-Specific Optimizations:**
{{cluster_status(clusterName, "optimization-hints")}}

**Query Pattern Examples:**
{{qdrant_query("successful " + queryType + " " + dialect + " queries", "limit:3 type:job")}}

**Performance Best Practices:**
{{qdrant_query(dialect + " " + performanceLevel + " optimization", "limit:2")}}

**Generation Requirements:**
1. **Complete, Runnable Query**: Fully functional {{dialect}} query
2. **Query Explanation**: Logic and approach explanation
3. **Performance Optimizations**: Specific to {{performanceLevel}} level
4. **Alternative Approaches**: Different strategies if applicable
5. **Execution Considerations**: Memory, parallelism, and resource usage
6. **Monitoring Recommendations**: Key metrics to track during execution`,
  parameters: [
    { name: 'queryPurpose', type: 'string', required: true, source: 'tool' },
    { name: 'tableNames', type: 'string', required: false, source: 'tool' },
    { name: 'queryType', type: 'string', required: true, source: 'tool' },
    { name: 'dialect', type: 'string', required: true, source: 'tool' },
    { name: 'performanceLevel', type: 'string', required: true, source: 'tool' },
    { name: 'clusterName', type: 'string', required: false, source: 'profile' }
  ],
  templateResolution: {
    enableParameterInjection: true,
    enableDynamicFunctions: true,
    cacheResults: true
  }
};
```

## Integration Points

### 1. TemplateManager and TemplateResolver Integration

- **RFC 6570 URI templating**: Leverage existing template parsing and expansion
- **Parameter inheritance chains**: Use existing parameter injection hierarchy
- **Template validation**: Utilize existing validation framework

### 2. KnowledgeIndexer Integration

- **Semantic search capabilities**: Query Qdrant for relevant examples and patterns
- **Dynamic insights**: Access field analysis and pattern detection
- **Result formatting**: Use existing result formatting and confidence scoring

### 3. ResourceAccessor Integration

- **MCP resource access**: Access profile configurations and cluster data
- **Resource URI resolution**: Use templating system for resource URIs
- **Caching**: Leverage existing resource caching mechanisms

### 4. ParameterInjector Integration

- **Dynamic parameter resolution**: Resolve parameters at generation time
- **Inheritance chain**: GCP defaults → Profile → Template → Tool overrides
- **Validation**: Use existing parameter validation framework

### 5. ProfileManager Integration

- **Profile-based configurations**: Access cluster templates and defaults
- **Environment-specific settings**: Load environment-appropriate configurations
- **Parameter inheritance**: Integrate with parameter injection system

## Implementation Plan

### Phase 1: Core Infrastructure (Week 1-2)

**Deliverables:**
- `DataprocPromptGenerator` class with basic structure
- `PromptContext` interface and context building logic
- Integration with existing services (TemplatingIntegration, KnowledgeIndexer, ProfileManager)
- Basic caching mechanism
- Updated prompt handlers to use new system

**Key Tasks:**
1. Create core class structure and configuration
2. Implement service integration points
3. Build context resolution logic
4. Add caching and performance monitoring
5. Update MCP prompt registration system

### Phase 2: Template System (Week 2-3)

**Deliverables:**
- Template-Enhanced prompts (basic RFC 6570 templating)
- Knowledge-Driven prompts (Qdrant integration)
- Dynamic function resolution system
- Parameter injection integration

**Key Tasks:**
1. Implement template parsing and resolution
2. Add knowledge base query execution
3. Create dynamic function system
4. Integrate parameter inheritance chains
5. Add template validation and error handling

### Phase 3: Advanced Features (Week 3-4)

**Deliverables:**
- Resource-Integrated prompts (MCP resource access)
- Fully-Integrated prompts (all systems combined)
- Advanced caching and optimization
- Performance monitoring and metrics

**Key Tasks:**
1. Implement MCP resource access integration
2. Create complex prompt templates
3. Add advanced caching strategies
4. Implement performance optimization
5. Add comprehensive error handling and fallbacks

### Phase 4: Testing & Optimization (Week 4-5)

**Deliverables:**
- Comprehensive test suite
- Performance benchmarks
- Documentation and examples
- Migration guide from static prompts

**Key Tasks:**
1. Unit tests for all components
2. Integration tests with existing services
3. Performance testing and optimization
4. Documentation and usage examples
5. Migration strategy and backward compatibility

## File Structure

```
src/
├── prompts/
│   ├── enhanced-dataproc-prompts.ts    # New enhanced prompt system
│   ├── prompt-generator.ts             # DataprocPromptGenerator class
│   ├── prompt-context.ts               # PromptContext interface
│   ├── prompt-templates.ts             # Enhanced template definitions
│   └── template-functions.ts           # Dynamic function implementations
├── types/
│   ├── enhanced-prompts.ts             # Type definitions
│   └── prompt-templates.ts             # Template type definitions
└── handlers/
    └── enhanced-prompt-handlers.ts     # Updated prompt handlers
```

## Benefits and Expected Outcomes

### 1. Enhanced User Experience

- **Context-aware prompts**: Prompts adapt to user's environment, profile, and historical data
- **Intelligent recommendations**: Leverage knowledge base for relevant examples and best practices
- **Reduced cognitive load**: Pre-populated with relevant information and examples

### 2. Improved System Integration

- **Unified infrastructure**: Leverages all existing MCP services in a cohesive manner
- **Consistent parameter handling**: Uses established parameter inheritance chains
- **Scalable architecture**: Easily extensible for new prompt types and features

### 3. Performance Optimization

- **Intelligent caching**: Cache dynamic function results with appropriate TTLs
- **Parallel execution**: Execute knowledge queries and resource access concurrently
- **Optimized generation**: Execute dynamic functions at generation time for faster LLM processing

### 4. Maintainability and Extensibility

- **Modular design**: Clear separation of concerns and well-defined interfaces
- **Type safety**: Comprehensive TypeScript interfaces and validation
- **Testable architecture**: Each component can be tested independently

This architecture provides a sophisticated foundation for intelligent, context-aware prompts that leverage the full power of the existing MCP infrastructure while delivering a superior user experience through dynamic, personalized prompt generation.
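
The parameter inheritance chain named under the ParameterInjector integration (GCP defaults → Profile → Template → Tool overrides) lends itself to a short illustration. The sketch below assumes the arrow reads from lowest to highest precedence, so that tool overrides win over everything else; that reading is an assumption, as are the names `ParamSources` and `resolveParameters`, which do not come from the project.

```typescript
type ParamLayer = Record<string, unknown>;

// Hypothetical container for the four layers named in the inheritance chain.
interface ParamSources {
  gcpDefaults?: ParamLayer;
  profile?: ParamLayer;
  template?: ParamLayer;
  toolOverrides?: ParamLayer;
}

// Merge layers from lowest to highest precedence (assumed order).
function resolveParameters(sources: ParamSources): ParamLayer {
  const layers = [
    sources.gcpDefaults,
    sources.profile,
    sources.template,
    sources.toolOverrides,
  ];
  const resolved: ParamLayer = {};
  for (const layer of layers) {
    if (!layer) continue;
    for (const [key, value] of Object.entries(layer)) {
      // An explicit `undefined` never masks a value from a lower layer.
      if (value !== undefined) resolved[key] = value;
    }
  }
  return resolved;
}

const params = resolveParameters({
  gcpDefaults: { region: 'us-central1', projectId: 'example-project' },
  profile: { region: 'europe-west1', clusterName: 'analytics' },
  toolOverrides: { clusterName: 'adhoc', region: undefined },
});
console.log(params);
```

Here `region` resolves to the profile value because the tool layer leaves it undefined, while `clusterName` takes the tool override.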
