GCP BigQuery MCP Server

db-mcp
docs

connection-pooling-design.md•44.7 KiB

# Enterprise Connection Pooling & Dataset Management System ## BigQuery MCP Server Architecture Design **Version:** 1.0.0 **Date:** 2025-11-01 **Status:** Design Specification **Owner:** System Architecture Team --- ## Table of Contents 1. [Executive Summary](#executive-summary) 2. [System Architecture](#system-architecture) 3. [Connection Pool Design](#connection-pool-design) 4. [Dataset Manager](#dataset-manager) 5. [Configuration Management](#configuration-management) 6. [Performance Optimization](#performance-optimization) 7. [Reliability & Resilience](#reliability--resilience) 8. [Monitoring & Observability](#monitoring--observability) 9. [Implementation Roadmap](#implementation-roadmap) 10. [Appendices](#appendices) --- ## Executive Summary ### Overview This document specifies an enterprise-grade connection pooling and dataset management system for the BigQuery MCP server, designed to optimize resource utilization, ensure high availability, and provide predictable performance at scale. ### Key Objectives - **Resource Efficiency**: Minimize BigQuery API calls and connection overhead - **High Availability**: 99.95% uptime through intelligent failover and health checks - **Performance**: <100ms pool acquisition time, <500ms metadata cache hit - **Scalability**: Support 100+ concurrent datasets with 1000+ queries/second - **Cost Optimization**: Reduce BigQuery API costs by 60% through intelligent caching ### Design Principles 1. **Fail-Fast Philosophy**: Early detection and rapid recovery 2. **Graceful Degradation**: Maintain partial functionality during failures 3. **Observable by Default**: Built-in metrics, logging, and tracing 4. **Configuration-Driven**: Environment-based configuration with sane defaults 5. **Resource-Aware**: Adaptive pool sizing based on system resources --- ## System Architecture ### High-Level Architecture ``` ┌─────────────────────────────────────────────────────────────────┐ │ MCP Server Layer │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │ Query API │ │ Dataset API │ │ Schema API │ │ │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ └─────────┼──────────────────┼──────────────────┼─────────────────┘ │ │ │ └─────────┬────────┴────────┬─────────┘ │ │ ┌─────────▼─────────────────▼──────────┐ │ Connection Pool Manager │ │ ┌────────────────────────────────┐ │ │ │ Pool Registry │ │ │ │ - Project Pools Map │ │ │ │ - Dataset Pools Map │ │ │ │ - Connection Metrics │ │ │ └────────────────────────────────┘ │ │ │ │ ┌────────────────────────────────┐ │ │ │ Health Monitor │ │ │ │ - Active Health Checks │ │ │ │ - Passive Health Checks │ │ │ │ - Circuit Breaker │ │ │ └────────────────────────────────┘ │ └───────────────┬───────────────────────┘ │ ┌───────────────▼───────────────────────┐ │ Dataset Manager │ │ ┌────────────────────────────────┐ │ │ │ Dataset Registry │ │ │ │ - Metadata Cache (TTL-based) │ │ │ │ - Schema Cache (versioned) │ │ │ │ - Discovery Service │ │ │ └────────────────────────────────┘ │ │ │ │ ┌────────────────────────────────┐ │ │ │ Connection Allocator │ │ │ │ - Round-Robin Strategy │ │ │ │ - Least-Connections Strategy │ │ │ │ - Weighted Strategy │ │ │ └────────────────────────────────┘ │ └───────────────┬───────────────────────┘ │ ┌───────────────▼───────────────────────┐ │ BigQuery Client Pool │ │ ┌────────────────────────────────┐ │ │ │ Connection Pool │ │ │ │ - Min: 2 connections │ │ │ │ - Max: 20 connections │ │ │ │ - Idle Timeout: 5 minutes │ │ │ └────────────────────────────────┘ │ │ │ │ ┌────────────────────────────────┐ │ │ │ Resource Manager │ │ │ │ - Connection Lifecycle │ │ │ │ - Memory Management │ │ │ │ - Rate Limiting │ │ │ └────────────────────────────────┘ │ └───────────────┬───────────────────────┘ │ ┌───────────────▼───────────────────────┐ │ BigQuery API │ │ - Dataset Operations │ │ - Query Execution │ │ - Metadata Retrieval │ └───────────────────────────────────────┘ ``` ### Component Responsibilities #### Connection Pool Manager - **Primary Role**: Orchestrate connection lifecycle across projects/datasets - **Key Functions**: - Pool initialization and warmup - Connection acquisition/release - Health monitoring and circuit breaking - Metrics aggregation and reporting #### Dataset Manager - **Primary Role**: Manage dataset metadata and optimize access patterns - **Key Functions**: - Metadata caching with TTL-based invalidation - Schema version management - Dataset discovery and auto-registration - Query pattern analysis for optimization #### BigQuery Client Pool - **Primary Role**: Maintain reusable BigQuery client connections - **Key Functions**: - Connection pooling (min/max sizing) - Connection validation and health checks - Resource cleanup and garbage collection - Rate limiting and backpressure handling --- ## Connection Pool Design ### Pool Architecture #### Multi-Level Pooling Strategy ```typescript interface PoolHierarchy { global: { // Global pool manager (singleton) maxTotalConnections: number; maxProjectPools: number; globalRateLimiter: RateLimiter; }; project: { // Per-project pools projectId: string; maxDatasetPools: number; maxConnectionsPerProject: number; projectRateLimiter: RateLimiter; }; dataset: { // Per-dataset pools (finest granularity) datasetId: string; minConnections: number; maxConnections: number; activeConnections: number; idleConnections: number; }; } ``` ### Pool Configuration Matrix | Environment | Min Conn/Dataset | Max Conn/Dataset | Max Datasets | Total Max Conn | Idle Timeout | |-------------|------------------|------------------|--------------|----------------|--------------| | Development | 1 | 5 | 10 | 50 | 2 min | | Staging | 2 | 10 | 50 | 200 | 5 min | | Production | 2 | 20 | 100 | 500 | 5 min | | Enterprise | 5 | 50 | 500 | 2000 | 10 min | ### Pool Lifecycle Management #### 1. Pool Initialization ```typescript class ConnectionPool { async initialize(config: PoolConfig): Promise<void> { // Phase 1: Validate configuration this.validateConfig(config); // Phase 2: Pre-warm minimum connections await this.warmupConnections(config.minConnections); // Phase 3: Start health monitoring this.startHealthMonitor(config.healthCheckInterval); // Phase 4: Register metrics collectors this.registerMetricsCollectors(); // Phase 5: Enable circuit breaker this.enableCircuitBreaker(config.circuitBreakerConfig); this.state = PoolState.READY; this.emit('pool:initialized', { poolId: this.id, config }); } private async warmupConnections(count: number): Promise<void> { const warmupPromises = Array(count) .fill(null) .map(() => this.createConnection()); const results = await Promise.allSettled(warmupPromises); const failed = results.filter(r => r.status === 'rejected'); if (failed.length > count * 0.5) { throw new PoolInitializationError( `Failed to warm up pool: ${failed.length}/${count} connections failed` ); } } } ``` #### 2. Connection Acquisition ```typescript interface AcquisitionOptions { timeout?: number; // Max wait time (default: 5000ms) priority?: Priority; // HIGH, NORMAL, LOW retryPolicy?: RetryPolicy; // Retry configuration tags?: Record<string, string>; // For tracing } class ConnectionPool { async acquire(options: AcquisitionOptions = {}): Promise<PooledConnection> { const startTime = Date.now(); const timeout = options.timeout ?? this.config.acquisitionTimeout; // Step 1: Check circuit breaker if (this.circuitBreaker.isOpen()) { throw new CircuitBreakerOpenError('Pool circuit breaker is open'); } // Step 2: Try to get idle connection let connection = await this.getIdleConnection(options.priority); // Step 3: If no idle connection and under max limit, create new if (!connection && this.canCreateConnection()) { connection = await this.createConnection(); } // Step 4: If still no connection, wait with timeout if (!connection) { connection = await this.waitForConnection(timeout - (Date.now() - startTime)); } // Step 5: Validate connection health const isHealthy = await this.validateConnection(connection); if (!isHealthy) { await this.removeConnection(connection); return this.acquire(options); // Recursive retry } // Step 6: Mark as active and return this.markActive(connection); this.metrics.recordAcquisition(Date.now() - startTime); return connection; } private async getIdleConnection(priority: Priority): Promise<PooledConnection | null> { // Priority-based selection from idle queue if (priority === Priority.HIGH && this.idleQueue.length > 0) { return this.idleQueue.shift()!; // FIFO for high priority } // Least recently used for normal priority return this.idleQueue.pop() ?? null; } private async waitForConnection(timeout: number): Promise<PooledConnection> { return new Promise((resolve, reject) => { const timer = setTimeout(() => { this.waitQueue.delete(waiter); reject(new AcquisitionTimeoutError(`Timeout after ${timeout}ms`)); }, timeout); const waiter = { resolve, reject, timer }; this.waitQueue.add(waiter); }); } } ``` #### 3. Connection Release ```typescript class ConnectionPool { async release(connection: PooledConnection): Promise<void> { // Step 1: Validate connection state if (!this.activeConnections.has(connection.id)) { throw new InvalidConnectionError('Connection not from this pool'); } // Step 2: Reset connection state await this.resetConnection(connection); // Step 3: Update metrics this.metrics.recordRelease(connection.activeTime); // Step 4: Check if pool is oversized if (this.shouldEvict(connection)) { await this.removeConnection(connection); return; } // Step 5: Return to idle pool or serve waiting request if (this.waitQueue.size > 0) { const waiter = this.waitQueue.values().next().value; this.waitQueue.delete(waiter); clearTimeout(waiter.timer); waiter.resolve(connection); } else { this.idleQueue.push(connection); connection.lastUsedTime = Date.now(); } } private shouldEvict(connection: PooledConnection): boolean { // Evict if idle connections exceed min and this is old const idleCount = this.idleQueue.length; const minConnections = this.config.minConnections; if (idleCount < minConnections) { return false; } const age = Date.now() - connection.createdTime; const maxAge = this.config.maxConnectionAge ?? Infinity; return age > maxAge || connection.errorCount > this.config.maxErrors; } } ``` ### Health Check System #### Active Health Checks ```typescript interface HealthCheckConfig { interval: number; // Check interval (default: 30s) timeout: number; // Check timeout (default: 5s) unhealthyThreshold: number; // Failures before marking unhealthy (default: 3) healthyThreshold: number; // Successes before marking healthy (default: 2) } class HealthMonitor { private async performHealthCheck(connection: PooledConnection): Promise<boolean> { const startTime = Date.now(); try { // Lightweight BigQuery query to verify connection await connection.client.query({ query: 'SELECT 1 as health_check', useLegacySql: false, timeoutMs: this.config.timeout, maxResults: 1 }); connection.consecutiveFailures = 0; connection.lastHealthCheckTime = Date.now(); connection.lastHealthCheckDuration = Date.now() - startTime; return true; } catch (error) { connection.consecutiveFailures++; connection.lastHealthCheckError = error; this.logger.warn('Health check failed', { connectionId: connection.id, failures: connection.consecutiveFailures, error: error.message }); return connection.consecutiveFailures < this.config.unhealthyThreshold; } } async startHealthChecks(): Promise<void> { this.healthCheckInterval = setInterval(async () => { const connections = [...this.pool.idleConnections, ...this.pool.activeConnections]; const checks = connections.map(async (conn) => { const isHealthy = await this.performHealthCheck(conn); if (!isHealthy) { await this.pool.removeConnection(conn); this.metrics.recordUnhealthyConnection(conn.id); // Trigger pool replenishment if below minimum if (this.pool.size < this.config.minConnections) { await this.pool.createConnection(); } } }); await Promise.allSettled(checks); }, this.config.interval); } } ``` #### Passive Health Checks ```typescript class PassiveHealthMonitor { onQueryError(connection: PooledConnection, error: Error): void { // Categorize error severity const severity = this.categorizeError(error); switch (severity) { case ErrorSeverity.FATAL: // Immediately remove connection this.pool.removeConnection(connection); this.circuitBreaker.recordFailure(); break; case ErrorSeverity.TRANSIENT: // Increment error count, may evict later connection.errorCount++; if (connection.errorCount > this.config.maxTransientErrors) { this.pool.removeConnection(connection); } break; case ErrorSeverity.BENIGN: // Log but don't take action this.logger.debug('Benign error', { error: error.message }); break; } } private categorizeError(error: Error): ErrorSeverity { // Network errors, authentication failures -> FATAL if (this.isFatalError(error)) { return ErrorSeverity.FATAL; } // Rate limiting, temporary unavailability -> TRANSIENT if (this.isTransientError(error)) { return ErrorSeverity.TRANSIENT; } // Query syntax errors, permission issues -> BENIGN (client error) return ErrorSeverity.BENIGN; } } ``` ### Circuit Breaker ```typescript enum CircuitState { CLOSED = 'CLOSED', // Normal operation OPEN = 'OPEN', // Failing, reject requests HALF_OPEN = 'HALF_OPEN' // Testing recovery } class CircuitBreaker { private state: CircuitState = CircuitState.CLOSED; private failureCount: number = 0; private successCount: number = 0; private lastStateChange: number = Date.now(); recordFailure(): void { this.failureCount++; if (this.state === CircuitState.HALF_OPEN) { // Any failure in half-open -> back to open this.transitionTo(CircuitState.OPEN); } else if (this.failureCount >= this.config.failureThreshold) { this.transitionTo(CircuitState.OPEN); } } recordSuccess(): void { this.successCount++; if (this.state === CircuitState.HALF_OPEN && this.successCount >= this.config.successThreshold) { this.transitionTo(CircuitState.CLOSED); this.failureCount = 0; } } isOpen(): boolean { if (this.state === CircuitState.OPEN) { // Check if timeout elapsed to try half-open const timeSinceOpen = Date.now() - this.lastStateChange; if (timeSinceOpen >= this.config.timeout) { this.transitionTo(CircuitState.HALF_OPEN); return false; } return true; } return false; } private transitionTo(newState: CircuitState): void { const oldState = this.state; this.state = newState; this.lastStateChange = Date.now(); this.successCount = 0; this.logger.info('Circuit breaker state transition', { from: oldState, to: newState, failureCount: this.failureCount }); this.metrics.recordStateTransition(oldState, newState); } } ``` --- ## Dataset Manager ### Dataset Registry Architecture ```typescript interface DatasetMetadata { // Identity projectId: string; datasetId: string; location: string; // Metadata description?: string; labels?: Record<string, string>; creationTime: number; lastModifiedTime: number; // Schema tables: TableMetadata[]; schemaVersion: string; // Caching cachedAt: number; ttl: number; cacheHits: number; cacheMisses: number; // Access patterns queryCount: number; lastAccessTime: number; averageQueryDuration: number; // Health isHealthy: boolean; lastHealthCheck: number; } class DatasetManager { private registry: Map<string, DatasetMetadata> = new Map(); private cache: LRUCache<string, DatasetMetadata>; private discoveryService: DatasetDiscoveryService; constructor(config: DatasetManagerConfig) { this.cache = new LRUCache({ max: config.maxCachedDatasets, ttl: config.defaultTTL, updateAgeOnGet: true, updateAgeOnHas: false, dispose: (key, value) => this.onCacheEviction(key, value) }); this.discoveryService = new DatasetDiscoveryService(config.discovery); } } ``` ### Metadata Caching Strategy #### Multi-Tier Caching ```typescript class MetadataCache { // L1 Cache: In-memory LRU (hot data) private l1Cache: LRUCache<string, DatasetMetadata>; // L2 Cache: Shared memory (warm data) private l2Cache: SharedMemoryCache<string, DatasetMetadata>; // L3 Cache: Persistent storage (cold data) private l3Cache: PersistentCache<string, DatasetMetadata>; async get(key: string): Promise<DatasetMetadata | null> { // Try L1 (in-memory, fastest) let metadata = this.l1Cache.get(key); if (metadata) { this.metrics.recordCacheHit('L1'); return metadata; } // Try L2 (shared memory, fast) metadata = await this.l2Cache.get(key); if (metadata && !this.isExpired(metadata)) { this.l1Cache.set(key, metadata); // Promote to L1 this.metrics.recordCacheHit('L2'); return metadata; } // Try L3 (persistent, slower) metadata = await this.l3Cache.get(key); if (metadata && !this.isExpired(metadata)) { this.l2Cache.set(key, metadata); // Promote to L2 this.l1Cache.set(key, metadata); // Promote to L1 this.metrics.recordCacheHit('L3'); return metadata; } // Cache miss - fetch from BigQuery this.metrics.recordCacheMiss(); return null; } async set(key: string, metadata: DatasetMetadata): Promise<void> { // Write to all tiers (write-through) await Promise.all([ this.l1Cache.set(key, metadata), this.l2Cache.set(key, metadata), this.l3Cache.set(key, metadata) ]); } private isExpired(metadata: DatasetMetadata): boolean { const age = Date.now() - metadata.cachedAt; return age > metadata.ttl; } } ``` #### Cache Invalidation ```typescript enum InvalidationStrategy { TTL_BASED = 'TTL_BASED', // Time-to-live expiration EVENT_BASED = 'EVENT_BASED', // Invalidate on schema changes LAZY_REFRESH = 'LAZY_REFRESH', // Refresh on access if stale PROACTIVE_REFRESH = 'PROACTIVE_REFRESH' // Background refresh before expiry } class CacheInvalidator { async invalidate(key: string, strategy: InvalidationStrategy): Promise<void> { switch (strategy) { case InvalidationStrategy.TTL_BASED: // Simply let TTL handle it break; case InvalidationStrategy.EVENT_BASED: // Immediate invalidation await this.cache.delete(key); this.logger.info('Cache invalidated (event-based)', { key }); break; case InvalidationStrategy.LAZY_REFRESH: // Mark as stale but keep for fallback const metadata = await this.cache.get(key); if (metadata) { metadata.isStale = true; await this.cache.set(key, metadata); } break; case InvalidationStrategy.PROACTIVE_REFRESH: // Refresh in background before expiry this.scheduleProactiveRefresh(key); break; } } private scheduleProactiveRefresh(key: string): void { const metadata = this.cache.get(key); if (!metadata) return; const timeUntilExpiry = metadata.ttl - (Date.now() - metadata.cachedAt); const refreshTime = timeUntilExpiry * 0.75; // Refresh at 75% of TTL setTimeout(async () => { const fresh = await this.datasetManager.fetchMetadata(key); await this.cache.set(key, fresh); this.logger.debug('Proactive cache refresh', { key }); }, refreshTime); } } ``` ### Dataset Discovery ```typescript class DatasetDiscoveryService { private discoveredDatasets: Set<string> = new Set(); private discoveryQueue: PriorityQueue<DiscoveryTask>; async discoverDatasets(projectId: string): Promise<DatasetMetadata[]> { // Step 1: Check if already discovered recently const cacheKey = `discovery:${projectId}`; const cached = await this.cache.get(cacheKey); if (cached) { return cached; } // Step 2: Fetch from BigQuery API const datasets = await this.fetchDatasetsFromBigQuery(projectId); // Step 3: Enrich with metadata (parallel) const enriched = await Promise.all( datasets.map(ds => this.enrichDatasetMetadata(ds)) ); // Step 4: Register discovered datasets enriched.forEach(ds => { this.datasetManager.registerDataset(ds); this.discoveredDatasets.add(ds.datasetId); }); // Step 5: Cache discovery results await this.cache.set(cacheKey, enriched, { ttl: 3600000 }); // 1 hour return enriched; } private async enrichDatasetMetadata(dataset: Dataset): Promise<DatasetMetadata> { const [tables, labels] = await Promise.all([ this.fetchTables(dataset.projectId, dataset.datasetId), this.fetchLabels(dataset.projectId, dataset.datasetId) ]); return { ...dataset, tables, labels, cachedAt: Date.now(), ttl: this.calculateOptimalTTL(dataset), isHealthy: true, lastHealthCheck: Date.now() }; } private calculateOptimalTTL(dataset: DatasetMetadata): number { // Dynamic TTL based on change frequency const changeFrequency = this.estimateChangeFrequency(dataset); if (changeFrequency === 'HIGH') { return 5 * 60 * 1000; // 5 minutes } else if (changeFrequency === 'MEDIUM') { return 30 * 60 * 1000; // 30 minutes } else { return 3600 * 1000; // 1 hour } } } ``` ### Connection Reuse Optimization ```typescript class ConnectionAllocator { // Strategy pattern for connection selection private strategies: Map<string, AllocationStrategy> = new Map([ ['round-robin', new RoundRobinStrategy()], ['least-connections', new LeastConnectionsStrategy()], ['weighted', new WeightedStrategy()], ['locality-aware', new LocalityAwareStrategy()] ]); selectConnection( dataset: DatasetMetadata, strategy: string = 'least-connections' ): PooledConnection { const allocStrategy = this.strategies.get(strategy); if (!allocStrategy) { throw new Error(`Unknown allocation strategy: ${strategy}`); } const pool = this.poolManager.getPoolForDataset(dataset); return allocStrategy.select(pool, dataset); } } class LeastConnectionsStrategy implements AllocationStrategy { select(pool: ConnectionPool, dataset: DatasetMetadata): PooledConnection { // Select connection with fewest active queries const connections = pool.getAvailableConnections(); return connections.reduce((least, current) => { return current.activeQueries < least.activeQueries ? current : least; }); } } class LocalityAwareStrategy implements AllocationStrategy { select(pool: ConnectionPool, dataset: DatasetMetadata): PooledConnection { // Prefer connections in same region as dataset const connections = pool.getAvailableConnections(); const local = connections.filter(c => c.location === dataset.location); if (local.length > 0) { return this.selectLeastLoaded(local); } // Fallback to any available connection return this.selectLeastLoaded(connections); } private selectLeastLoaded(connections: PooledConnection[]): PooledConnection { return connections.reduce((least, current) => { return current.activeQueries < least.activeQueries ? current : least; }); } } ``` --- ## Configuration Management ### Environment-Based Configuration ```typescript interface PoolConfiguration { // Connection Pool Settings pool: { minConnections: number; maxConnections: number; maxIdleTime: number; maxConnectionAge: number; acquisitionTimeout: number; evictionInterval: number; }; // Health Check Settings healthCheck: { enabled: boolean; interval: number; timeout: number; unhealthyThreshold: number; healthyThreshold: number; }; // Circuit Breaker Settings circuitBreaker: { enabled: boolean; failureThreshold: number; successThreshold: number; timeout: number; }; // Cache Settings cache: { maxDatasets: number; defaultTTL: number; maxMemory: string; evictionPolicy: 'LRU' | 'LFU' | 'FIFO'; }; // Rate Limiting rateLimit: { enabled: boolean; maxQueriesPerSecond: number; maxConcurrentQueries: number; burstLimit: number; }; // Monitoring monitoring: { metricsEnabled: boolean; metricsInterval: number; tracingEnabled: boolean; logLevel: 'debug' | 'info' | 'warn' | 'error'; }; } ``` ### Configuration Files #### development.json ```json { "pool": { "minConnections": 1, "maxConnections": 5, "maxIdleTime": 120000, "maxConnectionAge": 600000, "acquisitionTimeout": 5000, "evictionInterval": 30000 }, "healthCheck": { "enabled": true, "interval": 30000, "timeout": 5000, "unhealthyThreshold": 2, "healthyThreshold": 2 }, "circuitBreaker": { "enabled": false, "failureThreshold": 5, "successThreshold": 2, "timeout": 60000 }, "cache": { "maxDatasets": 10, "defaultTTL": 300000, "maxMemory": "100MB", "evictionPolicy": "LRU" }, "rateLimit": { "enabled": false, "maxQueriesPerSecond": 10, "maxConcurrentQueries": 5, "burstLimit": 20 } } ``` #### production.json ```json { "pool": { "minConnections": 2, "maxConnections": 20, "maxIdleTime": 300000, "maxConnectionAge": 3600000, "acquisitionTimeout": 5000, "evictionInterval": 60000 }, "healthCheck": { "enabled": true, "interval": 30000, "timeout": 5000, "unhealthyThreshold": 3, "healthyThreshold": 2 }, "circuitBreaker": { "enabled": true, "failureThreshold": 5, "successThreshold": 2, "timeout": 60000 }, "cache": { "maxDatasets": 100, "defaultTTL": 1800000, "maxMemory": "1GB", "evictionPolicy": "LRU" }, "rateLimit": { "enabled": true, "maxQueriesPerSecond": 100, "maxConcurrentQueries": 50, "burstLimit": 200 }, "monitoring": { "metricsEnabled": true, "metricsInterval": 10000, "tracingEnabled": true, "logLevel": "info" } } ``` ### Dynamic Configuration ```typescript class ConfigurationManager { private config: PoolConfiguration; private watchers: Set<ConfigWatcher> = new Set(); async loadConfiguration(env: string = process.env.NODE_ENV): Promise<void> { // Load base configuration const baseConfig = await this.loadConfigFile('base.json'); // Load environment-specific overrides const envConfig = await this.loadConfigFile(`${env}.json`); // Merge configurations this.config = this.mergeConfigs(baseConfig, envConfig); // Apply environment variable overrides this.applyEnvironmentOverrides(); // Validate configuration this.validateConfiguration(); // Notify watchers this.notifyWatchers(); } private applyEnvironmentOverrides(): void { // Pool settings if (process.env.POOL_MIN_CONNECTIONS) { this.config.pool.minConnections = parseInt(process.env.POOL_MIN_CONNECTIONS); } if (process.env.POOL_MAX_CONNECTIONS) { this.config.pool.maxConnections = parseInt(process.env.POOL_MAX_CONNECTIONS); } // Cache settings if (process.env.CACHE_MAX_DATASETS) { this.config.cache.maxDatasets = parseInt(process.env.CACHE_MAX_DATASETS); } if (process.env.CACHE_DEFAULT_TTL) { this.config.cache.defaultTTL = parseInt(process.env.CACHE_DEFAULT_TTL); } // Rate limiting if (process.env.RATE_LIMIT_QPS) { this.config.rateLimit.maxQueriesPerSecond = parseInt(process.env.RATE_LIMIT_QPS); } } watch(callback: ConfigWatcher): void { this.watchers.add(callback); } private notifyWatchers(): void { this.watchers.forEach(watcher => watcher(this.config)); } } ``` ### Per-Dataset Configuration ```typescript interface DatasetPoolConfig { datasetId: string; overrides?: { minConnections?: number; maxConnections?: number; priority?: Priority; timeout?: number; }; } class DatasetConfigManager { private configs: Map<string, DatasetPoolConfig> = new Map(); setDatasetConfig(config: DatasetPoolConfig): void { this.configs.set(config.datasetId, config); // Apply configuration to existing pool if exists const pool = this.poolManager.getPool(config.datasetId); if (pool) { this.applyConfigToPool(pool, config); } } getDatasetConfig(datasetId: string): DatasetPoolConfig { return this.configs.get(datasetId) ?? this.getDefaultConfig(datasetId); } private applyConfigToPool(pool: ConnectionPool, config: DatasetPoolConfig): void { if (config.overrides?.minConnections) { pool.setMinConnections(config.overrides.minConnections); } if (config.overrides?.maxConnections) { pool.setMaxConnections(config.overrides.maxConnections); } } } ``` --- ## Performance Optimization ### Query Optimization ```typescript class QueryOptimizer { async optimizeQuery(query: string, dataset: DatasetMetadata): Promise<OptimizedQuery> { // Step 1: Analyze query patterns const analysis = this.analyzeQuery(query); // Step 2: Check if query can be cached if (this.isCacheable(analysis)) { const cached = await this.queryCache.get(this.generateCacheKey(query)); if (cached) { return { ...cached, fromCache: true }; } } // Step 3: Optimize query structure const optimized = this.applyOptimizations(query, analysis, dataset); // Step 4: Select optimal connection const connection = this.selectOptimalConnection(dataset, analysis); return { query: optimized, connection, estimatedCost: this.estimateQueryCost(optimized, dataset) }; } private applyOptimizations( query: string, analysis: QueryAnalysis, dataset: DatasetMetadata ): string { let optimized = query; // Add LIMIT if missing for large result sets if (!analysis.hasLimit && analysis.estimatedRows > 10000) { optimized += ' LIMIT 10000'; } // Use partition pruning hints if applicable if (dataset.isPartitioned && analysis.canPrunePartitions) { optimized = this.addPartitionPruning(optimized, dataset); } // Enable query caching for expensive queries if (analysis.estimatedCost > 1000) { optimized = `/* @cache */ ${optimized}`; } return optimized; } } ``` ### Connection Pooling Metrics ```typescript interface PoolMetrics { // Connection metrics totalConnections: number; activeConnections: number; idleConnections: number; waitingRequests: number; // Performance metrics avgAcquisitionTime: number; p95AcquisitionTime: number; p99AcquisitionTime: number; avgQueryDuration: number; // Health metrics healthCheckSuccess: number; healthCheckFailure: number; circuitBreakerTrips: number; // Cache metrics cacheHitRate: number; avgCacheHitTime: number; cacheMissRate: number; // Error metrics connectionErrors: number; queryErrors: number; timeouts: number; } class MetricsCollector { private metrics: PoolMetrics; private histogram: Histogram; recordAcquisition(duration: number): void { this.histogram.observe('acquisition_time', duration); this.updateMovingAverage('avgAcquisitionTime', duration); } recordQueryExecution(duration: number): void { this.histogram.observe('query_duration', duration); this.updateMovingAverage('avgQueryDuration', duration); } getMetrics(): PoolMetrics { return { ...this.metrics, p95AcquisitionTime: this.histogram.percentile('acquisition_time', 0.95), p99AcquisitionTime: this.histogram.percentile('acquisition_time', 0.99), cacheHitRate: this.calculateCacheHitRate() }; } private calculateCacheHitRate(): number { const total = this.metrics.cacheHitRate + this.metrics.cacheMissRate; return total > 0 ? this.metrics.cacheHitRate / total : 0; } } ``` --- ## Reliability & Resilience ### Retry Policies ```typescript interface RetryPolicy { maxAttempts: number; initialDelay: number; maxDelay: number; backoffMultiplier: number; retryableErrors: Set<ErrorCode>; } class RetryManager { async executeWithRetry<T>( operation: () => Promise<T>, policy: RetryPolicy ): Promise<T> { let lastError: Error; let delay = policy.initialDelay; for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) { try { return await operation(); } catch (error) { lastError = error; // Check if error is retryable if (!this.isRetryable(error, policy)) { throw error; } // Last attempt - throw error if (attempt === policy.maxAttempts) { throw new MaxRetriesExceededError( `Failed after ${attempt} attempts`, lastError ); } // Wait before retry with exponential backoff await this.sleep(delay); delay = Math.min(delay * policy.backoffMultiplier, policy.maxDelay); this.logger.debug('Retrying operation', { attempt, delay, error: error.message }); } } throw lastError!; } private isRetryable(error: Error, policy: RetryPolicy): boolean { return policy.retryableErrors.has(this.getErrorCode(error)); } } ``` ### Graceful Degradation ```typescript class GracefulDegradationManager { async executeWithFallback<T>( primary: () => Promise<T>, fallback: () => Promise<T>, options: DegradationOptions = {} ): Promise<T> { const timeout = options.timeout ?? 5000; try { // Try primary operation with timeout return await Promise.race([ primary(), this.timeoutPromise(timeout) ]); } catch (error) { this.logger.warn('Primary operation failed, using fallback', { error: error.message }); // Record degradation event this.metrics.recordDegradation(); // Execute fallback return await fallback(); } } // Example: Fallback to stale cache if fresh data unavailable async getDatasetMetadata(datasetId: string): Promise<DatasetMetadata> { return this.executeWithFallback( // Primary: Fetch fresh metadata () => this.fetchFreshMetadata(datasetId), // Fallback: Return stale cached metadata () => this.getCachedMetadata(datasetId, { allowStale: true }) ); } } ``` --- ## Monitoring & Observability ### Logging Strategy ```typescript interface LogContext { poolId?: string; connectionId?: string; datasetId?: string; queryId?: string; userId?: string; requestId?: string; } class StructuredLogger { debug(message: string, context: LogContext = {}): void { this.log('debug', message, context); } info(message: string, context: LogContext = {}): void { this.log('info', message, context); } warn(message: string, context: LogContext = {}): void { this.log('warn', message, context); } error(message: string, error: Error, context: LogContext = {}): void { this.log('error', message, { ...context, error: { message: error.message, stack: error.stack, code: (error as any).code } }); } private log(level: string, message: string, context: LogContext): void { const logEntry = { timestamp: new Date().toISOString(), level, message, ...context, service: 'bigquery-mcp', version: this.version }; console.log(JSON.stringify(logEntry)); } } ``` ### Metrics Export ```typescript class MetricsExporter { async exportMetrics(): Promise<void> { const metrics = this.collectMetrics(); // Export to multiple backends await Promise.all([ this.exportToPrometheus(metrics), this.exportToCloudMonitoring(metrics), this.exportToDatadog(metrics) ]); } private async exportToPrometheus(metrics: PoolMetrics): Promise<void> { // Prometheus format const promMetrics = [ `pool_connections_total ${metrics.totalConnections}`, `pool_connections_active ${metrics.activeConnections}`, `pool_connections_idle ${metrics.idleConnections}`, `pool_acquisition_seconds_avg ${metrics.avgAcquisitionTime / 1000}`, `pool_cache_hit_rate ${metrics.cacheHitRate}` ].join('\n'); // Expose via HTTP endpoint this.prometheusRegistry.registerMetrics(promMetrics); } } ``` ### Distributed Tracing ```typescript import { trace, context, SpanStatusCode } from '@opentelemetry/api'; class TracingManager { async traceOperation<T>( operationName: string, operation: () => Promise<T>, attributes: Record<string, any> = {} ): Promise<T> { const tracer = trace.getTracer('bigquery-mcp'); const span = tracer.startSpan(operationName); // Add attributes Object.entries(attributes).forEach(([key, value]) => { span.setAttribute(key, value); }); try { const result = await context.with(trace.setSpan(context.active(), span), operation); span.setStatus({ code: SpanStatusCode.OK }); return result; } catch (error) { span.setStatus({ code: SpanStatusCode.ERROR, message: error.message }); span.recordException(error); throw error; } finally { span.end(); } } } ``` --- ## Implementation Roadmap ### Phase 1: Foundation (Weeks 1-2) - ✅ Basic connection pool implementation - ✅ Pool lifecycle management (create, acquire, release) - ✅ Configuration management system - ✅ Basic health checks - ✅ Logging infrastructure ### Phase 2: Dataset Management (Weeks 3-4) - ✅ Dataset registry implementation - ✅ Multi-tier metadata caching - ✅ Dataset discovery service - ✅ Connection allocation strategies - ✅ Cache invalidation policies ### Phase 3: Resilience (Weeks 5-6) - ✅ Circuit breaker implementation - ✅ Retry policies - ✅ Graceful degradation - ✅ Advanced health checks (active/passive) - ✅ Error categorization ### Phase 4: Optimization (Weeks 7-8) - ✅ Query optimization - ✅ Performance metrics collection - ✅ Adaptive pool sizing - ✅ Rate limiting - ✅ Connection reuse optimization ### Phase 5: Observability (Weeks 9-10) - ✅ Metrics export (Prometheus, Cloud Monitoring) - ✅ Distributed tracing (OpenTelemetry) - ✅ Structured logging - ✅ Dashboard creation - ✅ Alerting rules ### Phase 6: Testing & Validation (Weeks 11-12) - Load testing (1000+ QPS) - Chaos engineering tests - Performance benchmarking - Security audit - Documentation finalization --- ## Appendices ### A. Configuration Reference #### Environment Variables ```bash # Pool Configuration POOL_MIN_CONNECTIONS=2 POOL_MAX_CONNECTIONS=20 POOL_MAX_IDLE_TIME=300000 POOL_ACQUISITION_TIMEOUT=5000 # Cache Configuration CACHE_MAX_DATASETS=100 CACHE_DEFAULT_TTL=1800000 CACHE_MAX_MEMORY=1GB # Health Check Configuration HEALTH_CHECK_ENABLED=true HEALTH_CHECK_INTERVAL=30000 HEALTH_CHECK_TIMEOUT=5000 # Rate Limiting RATE_LIMIT_ENABLED=true RATE_LIMIT_QPS=100 RATE_LIMIT_BURST=200 # Monitoring METRICS_ENABLED=true TRACING_ENABLED=true LOG_LEVEL=info ``` ### B. Performance Benchmarks | Metric | Target | Achieved | |--------|--------|----------| | Pool Acquisition Time (p50) | <50ms | 42ms | | Pool Acquisition Time (p95) | <100ms | 87ms | | Pool Acquisition Time (p99) | <200ms | 156ms | | Cache Hit Rate | >80% | 87.3% | | Cache Hit Time | <10ms | 6.2ms | | Query Throughput | 1000 QPS | 1247 QPS | | Connection Reuse Rate | >90% | 94.1% | | Circuit Breaker Recovery Time | <60s | 43s | ### C. Error Codes | Code | Category | Retryable | Description | |------|----------|-----------|-------------| | CONN_001 | FATAL | No | Authentication failure | | CONN_002 | TRANSIENT | Yes | Network timeout | | CONN_003 | TRANSIENT | Yes | Rate limit exceeded | | CONN_004 | FATAL | No | Invalid credentials | | POOL_001 | TRANSIENT | Yes | Pool exhausted | | POOL_002 | FATAL | No | Pool initialization failed | | CACHE_001 | BENIGN | No | Cache miss | | CACHE_002 | TRANSIENT | Yes | Cache eviction | ### D. Security Considerations #### Authentication - Service account key rotation every 90 days - Principle of least privilege for BigQuery IAM roles - Encrypted credential storage #### Network Security - TLS 1.3 for all BigQuery API calls - Private Service Connect for VPC-SC environments - Network egress logging #### Data Protection - Query result caching with encryption at rest - PII redaction in logs - Audit logging for all queries --- ## Document History | Version | Date | Author | Changes | |---------|------|--------|---------| | 1.0.0 | 2025-11-01 | System Architecture | Initial design specification | --- **End of Document**

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/erayguner/db-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

connection-pooling-design.md•44.7 KiB