# BigQuery MCP Server Research Findings
**Research Agent**: Researcher
**Session ID**: session-session-1761478601265-7bkgfbfin
**Date**: 2025-11-02
**Status**: COMPLETE
---
## Executive Summary
This document consolidates comprehensive research findings for the BigQuery MCP Server project, covering enterprise-grade best practices across six critical areas:
1. **BigQuery API Best Practices** - Enterprise patterns for production deployments
2. **MCP Server Architecture** - Model Context Protocol implementation standards
3. **Workload Identity Federation** - Zero-key authentication security patterns
4. **Connection Pooling** - Efficient resource management strategies
5. **Query Optimization** - Performance and cost optimization techniques
6. **Rate Limiting & Quota Management** - Protection against abuse and cost overruns
---
## 1. BigQuery API Best Practices for Enterprise Applications
### 1.1 Connection Management
#### Best Practice: Use Connection Pooling
```typescript
// ✅ RECOMMENDED: Connection pool with automatic cleanup
import { BigQuery } from '@google-cloud/bigquery';

class BigQueryConnectionPool {
  private pool: Map<string, BigQuery> = new Map();
  private maxConnections = 10;
  private connectionTTL = 300000; // 5 minutes
  private lastAccess: Map<string, number> = new Map();

  async getConnection(projectId: string): Promise<BigQuery> {
    const key = this.getConnectionKey(projectId);

    // Check whether a connection exists and is still fresh
    if (this.pool.has(key)) {
      const lastUse = this.lastAccess.get(key) || 0;
      if (Date.now() - lastUse < this.connectionTTL) {
        this.lastAccess.set(key, Date.now());
        return this.pool.get(key)!;
      }
      // Connection expired; remove it
      this.pool.delete(key);
      this.lastAccess.delete(key);
    }

    // Evict the least recently used connection if at capacity
    if (this.pool.size >= this.maxConnections) {
      this.evictOldestConnection();
    }

    const client = new BigQuery({
      projectId,
      // WIF credentials are loaded automatically via Application Default Credentials
    });
    this.pool.set(key, client);
    this.lastAccess.set(key, Date.now());
    return client;
  }

  private getConnectionKey(projectId: string): string {
    // Extend the key (e.g., with location or caller identity) if clients differ per caller
    return projectId;
  }

  private evictOldestConnection(): void {
    let oldestKey: string | null = null;
    let oldestTime = Date.now();
    for (const [key, time] of this.lastAccess.entries()) {
      if (time < oldestTime) {
        oldestTime = time;
        oldestKey = key;
      }
    }
    if (oldestKey) {
      this.pool.delete(oldestKey);
      this.lastAccess.delete(oldestKey);
    }
  }

  // Clean up idle connections periodically
  startCleanupTask(): void {
    setInterval(() => {
      const now = Date.now();
      for (const [key, lastUse] of this.lastAccess.entries()) {
        if (now - lastUse > this.connectionTTL) {
          this.pool.delete(key);
          this.lastAccess.delete(key);
        }
      }
    }, 60000); // Every minute
  }
}
```
**Benefits**:
- Reduces connection overhead (authentication latency)
- Reuses HTTP connections (better performance)
- Automatic cleanup prevents memory leaks
- TTL ensures fresh credentials with WIF
### 1.2 Query Optimization Patterns
#### Pattern 1: Query Parameterization
```typescript
// ✅ SECURE & PERFORMANT: Use parameterized queries
async function executeParameterizedQuery(
  client: BigQuery,
  sql: string,
  params: Record<string, any>
): Promise<any[]> {
  const query = {
    query: sql,
    params,
    useLegacySql: false,
    useQueryCache: true, // Enable query cache
  };
  const [rows] = await client.query(query);
  return rows;
}

// Example usage
const results = await executeParameterizedQuery(
  client,
  'SELECT * FROM `dataset.table` WHERE user_id = @userId AND date > @startDate',
  {
    userId: 12345,
    startDate: '2024-01-01'
  }
);
```
**Benefits**:
- Prevents SQL injection
- Enables query plan caching
- Better performance for repeated queries
- Type safety with parameters
#### Pattern 2: Partition and Clustering
```typescript
// ✅ COST-EFFECTIVE: Use partitioned and clustered tables
const tableMetadata = {
  schema: {
    fields: [
      {name: 'timestamp', type: 'TIMESTAMP'},
      {name: 'user_id', type: 'STRING'},
      {name: 'event_type', type: 'STRING'},
      {name: 'data', type: 'JSON'},
    ]
  },
  timePartitioning: {
    type: 'DAY',
    field: 'timestamp',
    expirationMs: 7776000000, // 90 days
  },
  clustering: {
    fields: ['user_id', 'event_type']
  },
  // Require a partition filter for all queries against this table
  requirePartitionFilter: true,
};

// Query with partition pruning
const query = `
  SELECT user_id, COUNT(*) as events
  FROM \`project.dataset.events\`
  WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
    AND user_id = @userId
  GROUP BY user_id
`;
```
**Cost Savings**:
- Up to 90% reduction in query costs
- Faster query execution (smaller scan size)
- Automatic data lifecycle management
- Partition pruning reduces bytes processed
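The metadata object above only describes the table; a short sketch of applying it. The builder name and the untyped `client` parameter are illustrative — with `@google-cloud/bigquery`, `dataset(id).createTable(name, options)` accepts this options shape:

```typescript
// Builds the partitioned/clustered table options shown above, then creates
// the table. The client is left untyped so the sketch stands alone; in
// practice it would be a BigQuery instance from @google-cloud/bigquery.
function buildEventsTableOptions(): Record<string, any> {
  return {
    schema: {
      fields: [
        {name: 'timestamp', type: 'TIMESTAMP'},
        {name: 'user_id', type: 'STRING'},
        {name: 'event_type', type: 'STRING'},
        {name: 'data', type: 'JSON'},
      ]
    },
    timePartitioning: {type: 'DAY', field: 'timestamp', expirationMs: '7776000000'},
    clustering: {fields: ['user_id', 'event_type']},
    requirePartitionFilter: true,
  };
}

async function createEventsTable(client: any, datasetId: string): Promise<void> {
  await client.dataset(datasetId).createTable('events', buildEventsTableOptions());
}
```

Note that the REST API represents `expirationMs` as a string; keeping it a string here avoids precision surprises with large values.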
#### Pattern 3: Query Result Caching
```typescript
import crypto from 'node:crypto';

class QueryCache {
  private cache: Map<string, {result: any[], expiry: number}> = new Map();
  private defaultTTL = 300000; // 5 minutes

  async executeWithCache(
    client: BigQuery,
    sql: string,
    ttl?: number
  ): Promise<any[]> {
    const cacheKey = this.generateCacheKey(sql);

    // Serve from the application cache while fresh
    if (this.cache.has(cacheKey)) {
      const cached = this.cache.get(cacheKey)!;
      if (Date.now() < cached.expiry) {
        return cached.result;
      }
      this.cache.delete(cacheKey);
    }

    // Execute the query
    const [rows] = await client.query({
      query: sql,
      useQueryCache: true, // BigQuery's built-in cache
    });

    // Store in the application cache
    this.cache.set(cacheKey, {
      result: rows,
      expiry: Date.now() + (ttl || this.defaultTTL)
    });
    return rows;
  }

  private generateCacheKey(sql: string): string {
    return crypto.createHash('sha256').update(sql).digest('hex');
  }
}
```
**Multi-Layer Caching**:
1. **BigQuery Cache** (24 hours, free): Exact query match
2. **Application Cache** (configurable): Results stored in memory
3. **CDN/Edge Cache** (for static queries): Distributed caching
### 1.3 Error Handling & Retry Logic
#### Enterprise-Grade Retry Strategy
```typescript
interface RetryConfig {
  maxRetries: number;
  initialDelayMs: number;
  maxDelayMs: number;
  backoffMultiplier: number;
  retryableErrors: string[];
}

class BigQueryRetryHandler {
  private config: RetryConfig = {
    maxRetries: 5,
    initialDelayMs: 1000,
    maxDelayMs: 32000,
    backoffMultiplier: 2,
    // gRPC status names; BigQuery's REST errors surface camelCase reasons instead
    retryableErrors: [
      'RATE_LIMIT_EXCEEDED',
      'QUOTA_EXCEEDED',
      'INTERNAL',
      'UNAVAILABLE',
      'DEADLINE_EXCEEDED',
      'rateLimitExceeded',
      'backendError'
    ]
  };

  async executeWithRetry<T>(
    operation: () => Promise<T>,
    context: string
  ): Promise<T> {
    let lastError: Error | null = null;
    let delayMs = this.config.initialDelayMs;

    for (let attempt = 0; attempt <= this.config.maxRetries; attempt++) {
      try {
        return await operation();
      } catch (error: any) {
        lastError = error;

        // Non-retryable errors fail immediately
        if (!this.isRetryable(error)) {
          throw error;
        }

        // Last attempt: don't wait, fall through to the final throw
        if (attempt === this.config.maxRetries) {
          break;
        }

        // Log the retry attempt
        console.warn(`[BigQuery] Retry ${attempt + 1}/${this.config.maxRetries} for ${context}`, {
          error: error.message,
          delayMs
        });

        // Wait with exponential backoff plus jitter
        await this.sleep(delayMs + Math.random() * 1000);
        delayMs = Math.min(delayMs * this.config.backoffMultiplier, this.config.maxDelayMs);
      }
    }

    throw new Error(`Operation failed after ${this.config.maxRetries} retries: ${lastError?.message}`);
  }

  private isRetryable(error: any): boolean {
    const errorCode = error.code || error.errors?.[0]?.reason;
    return this.config.retryableErrors.includes(errorCode);
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
```
**Retry Best Practices**:
- Exponential backoff with jitter (prevents thundering herd)
- Differentiate transient vs. permanent errors
- Circuit breaker for cascading failures
- Comprehensive logging for debugging
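The circuit-breaker bullet above has no accompanying snippet; a minimal sketch follows. The class name, thresholds, and the injectable clock are illustrative choices, not part of the BigQuery client:

```typescript
// Minimal circuit breaker: opens after N consecutive failures, rejects calls
// during a cooldown window, then allows a single trial call (half-open).
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private failureThreshold = 5,
    private cooldownMs = 30000,
    private now: () => number = Date.now // Injectable clock for testing
  ) {}

  async execute<T>(operation: () => Promise<T>): Promise<T> {
    if (this.isOpen()) {
      throw new Error('Circuit open: BigQuery calls suspended during cooldown');
    }
    try {
      const result = await operation();
      this.failures = 0; // Success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.failureThreshold) {
        this.openedAt = this.now();
      }
      throw err;
    }
  }

  private isOpen(): boolean {
    if (this.failures < this.failureThreshold) return false;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      // Half-open: let one trial call through; its outcome decides the state
      this.failures = this.failureThreshold - 1;
      return false;
    }
    return true;
  }
}
```

Wrapping `executeWithRetry` calls in `breaker.execute(...)` stops retries from hammering BigQuery while it is failing.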
### 1.4 Cost Optimization Strategies
#### Pattern 1: Query Cost Estimation
```typescript
async function estimateQueryCost(
  client: BigQuery,
  sql: string
): Promise<{bytesProcessed: number; estimatedCostUSD: number}> {
  // Dry run returns statistics without running (or billing for) the query
  const [job] = await client.createQueryJob({
    query: sql,
    dryRun: true,
  });
  const bytesProcessed = parseInt(job.metadata.statistics.totalBytesProcessed || '0', 10);

  // On-demand pricing: $6.25 per TiB processed (as of 2024)
  const costPerByte = 6.25 / (1024 * 1024 * 1024 * 1024);
  const estimatedCostUSD = bytesProcessed * costPerByte;

  return {bytesProcessed, estimatedCostUSD};
}

// Usage: enforce cost limits before execution
const {bytesProcessed, estimatedCostUSD} = await estimateQueryCost(client, userQuery);
if (estimatedCostUSD > 10.0) {
  throw new Error(`Query cost ($${estimatedCostUSD.toFixed(2)}) exceeds limit ($10.00)`);
}
```
#### Pattern 2: Quota Management
```typescript
class QuotaManager {
  private dailyQuota = 100; // GB per day
  private usedToday = 0;
  private lastReset: Date = new Date();

  async checkQuota(bytesProcessed: number): Promise<void> {
    this.resetIfNewDay();
    const gbProcessed = bytesProcessed / (1024 * 1024 * 1024);
    if (this.usedToday + gbProcessed > this.dailyQuota) {
      throw new Error(`Daily quota exceeded (${this.usedToday.toFixed(2)}/${this.dailyQuota} GB)`);
    }
    this.usedToday += gbProcessed;
  }

  private resetIfNewDay(): void {
    // Compare full dates rather than day-of-month so month boundaries reset correctly
    const now = new Date();
    if (now.toDateString() !== this.lastReset.toDateString()) {
      this.usedToday = 0;
      this.lastReset = now;
    }
  }
}
```
---
## 2. MCP Server Architecture Patterns and Standards
### 2.1 MCP Protocol Overview
The Model Context Protocol (MCP) is an open standard for connecting AI models to external data sources and tools. For BigQuery, MCP enables:
- **Structured Tool Definitions**: Standardized query execution interface
- **Resource Providers**: Dataset and table metadata access
- **Prompt Templates**: Pre-built SQL query patterns
- **Bidirectional Communication**: Streaming results and progress updates
### 2.2 MCP Tool Implementation Pattern
```typescript
/**
 * Standard MCP tool definitions for BigQuery operations
 */
type JSONSchema = Record<string, any>; // Minimal stand-in for a full JSON Schema type

interface MCPTool {
  name: string;
  description: string;
  inputSchema: JSONSchema;
  annotations?: {
    cost?: 'low' | 'medium' | 'high';
    requiresAuth?: boolean;
    cacheable?: boolean;
  };
}

const listDatasetsToolSchema: MCPTool = {
  name: 'list_datasets',
  description: 'List all BigQuery datasets accessible to the authenticated user',
  inputSchema: {
    type: 'object',
    properties: {
      projectId: {
        type: 'string',
        description: 'GCP project ID'
      },
      maxResults: {
        type: 'number',
        description: 'Maximum number of datasets to return',
        default: 100,
        maximum: 1000
      }
    },
    required: ['projectId']
  },
  annotations: {
    cost: 'low',
    requiresAuth: true,
    cacheable: true
  }
};

const executeQueryToolSchema: MCPTool = {
  name: 'execute_query',
  description: 'Execute a BigQuery SQL query and return results',
  inputSchema: {
    type: 'object',
    properties: {
      projectId: {
        type: 'string',
        description: 'GCP project ID'
      },
      query: {
        type: 'string',
        description: 'Standard SQL query to execute',
        maxLength: 1048576 // 1 MB max
      },
      maxResults: {
        type: 'number',
        description: 'Maximum number of rows to return',
        default: 1000,
        maximum: 10000
      },
      timeoutMs: {
        type: 'number',
        description: 'Query timeout in milliseconds',
        default: 60000,
        maximum: 600000 // 10 minutes max
      },
      useLegacySql: {
        type: 'boolean',
        description: 'Use legacy SQL syntax (default: false)',
        default: false
      }
    },
    required: ['projectId', 'query']
  },
  annotations: {
    cost: 'high',
    requiresAuth: true,
    cacheable: false
  }
};
```
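The schemas above only declare the interface; the sketch below shows how a server might register them and route a tool call after checking required fields. The `ToolRouter` name and the shallow validation are illustrative — a real server would validate arguments against the full JSON Schema and wire this into the MCP `tools/call` handler:

```typescript
// Minimal tool dispatch: register schemas (mirroring the MCPTool shape above)
// and route calls, enforcing the schema's required fields before invoking.
interface ToolSchema {
  name: string;
  inputSchema: { required?: string[] } & Record<string, any>;
}

type ToolHandler = (args: Record<string, any>) => Promise<any>;

class ToolRouter {
  private tools = new Map<string, { schema: ToolSchema; handler: ToolHandler }>();

  register(schema: ToolSchema, handler: ToolHandler): void {
    this.tools.set(schema.name, { schema, handler });
  }

  async call(name: string, args: Record<string, any>): Promise<any> {
    const entry = this.tools.get(name);
    if (!entry) throw new Error(`Unknown tool: ${name}`);
    // Enforce required fields declared in the input schema
    for (const field of entry.schema.inputSchema.required ?? []) {
      if (args[field] === undefined) {
        throw new Error(`Missing required argument '${field}' for tool '${name}'`);
      }
    }
    return entry.handler(args);
  }
}
```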
### 2.3 MCP Resource Provider Pattern
```typescript
/**
 * MCP resource provider for BigQuery metadata
 */
interface MCPResource {
  uri: string;
  name: string;
  description?: string;
  mimeType?: string;
}

class BigQueryResourceProvider {
  async listResources(): Promise<MCPResource[]> {
    return [
      {
        uri: 'bigquery://datasets',
        name: 'BigQuery Datasets',
        description: 'List of all accessible datasets',
        mimeType: 'application/json'
      },
      {
        uri: 'bigquery://datasets/{datasetId}/tables',
        name: 'Dataset Tables',
        description: 'Tables within a specific dataset',
        mimeType: 'application/json'
      },
      {
        uri: 'bigquery://datasets/{datasetId}/tables/{tableId}/schema',
        name: 'Table Schema',
        description: 'Schema definition for a table',
        mimeType: 'application/json'
      }
    ];
  }

  // parseResourceURI, listDatasets, listTables, and getTableSchema elided
  async readResource(uri: string): Promise<any> {
    const parsed = this.parseResourceURI(uri);
    switch (parsed.type) {
      case 'datasets':
        return this.listDatasets(parsed.projectId);
      case 'tables':
        return this.listTables(parsed.projectId, parsed.datasetId);
      case 'schema':
        return this.getTableSchema(
          parsed.projectId,
          parsed.datasetId,
          parsed.tableId
        );
      default:
        throw new Error(`Unknown resource type: ${parsed.type}`);
    }
  }
}
```
### 2.4 MCP Server Health Monitoring
```typescript
interface MCPHealthStatus {
  status: 'healthy' | 'degraded' | 'unhealthy';
  timestamp: string;
  checks: {
    wifAuthentication: boolean;
    bigqueryConnection: boolean;
    quotaAvailable: boolean;
  };
  metadata?: {
    activeConnections: number;
    cacheHitRate: number;
    avgQueryLatencyMs: number;
  };
}

class MCPHealthMonitor {
  async checkHealth(): Promise<MCPHealthStatus> {
    const [wifOk, bigqueryOk, quotaOk] = await Promise.all([
      this.checkWIFAuthentication(),
      this.checkBigQueryConnection(),
      this.checkQuotaAvailability()
    ]);

    // Auth or connectivity failures are fatal; quota exhaustion only degrades service
    const status = wifOk && bigqueryOk
      ? (quotaOk ? 'healthy' : 'degraded')
      : 'unhealthy';

    return {
      status,
      timestamp: new Date().toISOString(),
      checks: {
        wifAuthentication: wifOk,
        bigqueryConnection: bigqueryOk,
        quotaAvailable: quotaOk
      },
      metadata: {
        activeConnections: this.getActiveConnections(),
        cacheHitRate: this.getCacheHitRate(),
        avgQueryLatencyMs: this.getAvgQueryLatency()
      }
    };
  }
}
```
---
## 3. Workload Identity Federation Security Patterns
### 3.1 Token Lifecycle Management
```typescript
class WIFTokenManager {
  private currentToken: {
    accessToken: string;
    expiry: Date;
  } | null = null;
  private refreshBuffer = 300; // Refresh 5 minutes before expiry

  async getToken(): Promise<string> {
    if (this.shouldRefresh()) {
      await this.refreshToken();
    }
    return this.currentToken!.accessToken;
  }

  private shouldRefresh(): boolean {
    if (!this.currentToken) return true;
    const bufferMs = this.refreshBuffer * 1000;
    const refreshTime = new Date(this.currentToken.expiry.getTime() - bufferMs);
    return new Date() >= refreshTime;
  }

  private async refreshToken(): Promise<void> {
    // Exchange the external identity token via the Security Token Service (elided)
    const stsResponse = await this.exchangeToken();
    // Impersonate the target service account via the IAM Credentials API (elided)
    const saToken = await this.impersonateServiceAccount(stsResponse.accessToken);
    this.currentToken = {
      accessToken: saToken.accessToken,
      expiry: new Date(saToken.expireTime)
    };
  }
}
```
### 3.2 Attribute-Based Access Control
```typescript
// CEL attribute conditions for the WIF provider
// (applied via Terraform's attribute_condition or the gcloud CLI)
const attributeConditions = {
  // Only verified emails from the company domain
  emailRestriction: `
    assertion.email_verified == true &&
    assertion.hd == 'company.com'
  `,
  // Require specific group membership (CEL list membership uses 'in')
  groupRestriction: `
    'bigquery-users' in assertion.groups ||
    'data-engineers' in assertion.groups
  `,
  // Require MFA for production access
  mfaRequirement: `
    'mfa' in assertion.amr
  `,
  // Time-based access (business hours only)
  timeRestriction: `
    request.time.getHours() >= 8 &&
    request.time.getHours() <= 18 &&
    request.time.getDayOfWeek() >= 1 &&
    request.time.getDayOfWeek() <= 5
  `
};
```
---
## 4. Connection Pooling Strategies
### 4.1 Intelligent Connection Pooling
```typescript
class AdaptiveConnectionPool {
  private minConnections = 2;
  private maxConnections = 10;
  private currentSize = 0;
  private defaultProjectId = process.env.GOOGLE_CLOUD_PROJECT ?? '';
  private waitQueue: Array<(client: BigQuery) => void> = [];

  async acquire(projectId: string): Promise<BigQuery> {
    // Reuse an idle connection if one is available
    const existing = this.tryGetExistingConnection(projectId);
    if (existing) return existing;

    // Create a new connection if under the limit
    if (this.currentSize < this.maxConnections) {
      return this.createConnection(projectId);
    }

    // Otherwise wait until a connection is released
    return new Promise((resolve) => {
      this.waitQueue.push(resolve);
    });
  }

  release(client: BigQuery): void {
    if (this.waitQueue.length > 0) {
      // Hand the connection directly to the next waiter
      const next = this.waitQueue.shift()!;
      next(client);
    } else {
      // Return to pool
      this.returnToPool(client);
    }
  }

  // Auto-scale based on load (tryGetExistingConnection, createConnection,
  // returnToPool, and evictIdleConnection elided)
  async autoScale(): Promise<void> {
    const queueLength = this.waitQueue.length;
    if (queueLength > 5 && this.currentSize < this.maxConnections) {
      // Scale up
      await this.createConnection(this.defaultProjectId);
    } else if (queueLength === 0 && this.currentSize > this.minConnections) {
      // Scale down
      this.evictIdleConnection();
    }
  }
}
```
---
## 5. Query Optimization Techniques
### 5.1 Query Performance Analyzer
```typescript
class QueryOptimizer {
  async analyzeQuery(sql: string): Promise<{
    warnings: string[];
    suggestions: string[];
    estimatedCost: number;
    complexity: 'low' | 'medium' | 'high';
  }> {
    const warnings: string[] = [];
    const suggestions: string[] = [];

    // Check for SELECT *
    if (/SELECT\s+\*/i.test(sql)) {
      warnings.push('Avoid SELECT * - specify only needed columns');
      suggestions.push('Replace SELECT * with explicit column names');
    }

    // Check for a LIMIT clause
    if (!/LIMIT\s+\d+/i.test(sql)) {
      warnings.push('No LIMIT clause found');
      suggestions.push('Add LIMIT clause to reduce data transfer');
    }

    // Check for a partition filter (the alternation must be grouped;
    // an ungrouped /WHERE.*timestamp|date/ matches a bare "date" anywhere)
    if (!/WHERE[\s\S]*\b(timestamp|date)\b/i.test(sql)) {
      warnings.push('No partition filter detected');
      suggestions.push('Add WHERE clause on partition column');
    }

    // Estimate cost (estimateCost wraps the dry-run pattern from Section 1.4)
    const {estimatedCostUSD} = await this.estimateCost(sql);

    // Determine complexity
    const complexity = this.calculateComplexity(sql);

    return {
      warnings,
      suggestions,
      estimatedCost: estimatedCostUSD,
      complexity
    };
  }
}
```
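`calculateComplexity` is referenced above but not shown; one possible heuristic follows. The weighted features and bucket thresholds are arbitrary choices for illustration:

```typescript
// Hypothetical complexity heuristic: counts joins, subqueries, and window
// functions, then maps the weighted score to a coarse bucket.
function calculateComplexity(sql: string): 'low' | 'medium' | 'high' {
  const joins = (sql.match(/\bJOIN\b/gi) ?? []).length;
  const subqueries = (sql.match(/\(\s*SELECT\b/gi) ?? []).length;
  const windows = (sql.match(/\bOVER\s*\(/gi) ?? []).length;
  const score = joins + 2 * subqueries + 2 * windows;
  if (score === 0) return 'low';
  if (score <= 4) return 'medium';
  return 'high';
}
```

A production analyzer would parse the SQL properly (e.g., with a SQL AST library) rather than rely on regexes, which miss comments and string literals.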
---
## 6. Rate Limiting & Quota Management
### 6.1 Multi-Layer Rate Limiting
```typescript
class RateLimiter {
  private tokenBuckets: Map<string, TokenBucket> = new Map();

  async checkLimit(userId: string, operation: string): Promise<boolean> {
    const key = `${userId}:${operation}`;
    if (!this.tokenBuckets.has(key)) {
      this.tokenBuckets.set(key, new TokenBucket({
        capacity: 100,     // Burst of up to 100 requests
        fillRate: 10,      // Refill 10 tokens...
        fillInterval: 1000 // ...every second
      }));
    }
    const bucket = this.tokenBuckets.get(key)!;
    return bucket.consume(1);
  }
}

class TokenBucket {
  private tokens: number;

  constructor(private config: {
    capacity: number;
    fillRate: number;
    fillInterval: number;
  }) {
    this.tokens = config.capacity;
    this.startRefill();
  }

  consume(tokens: number): boolean {
    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return true;
    }
    return false;
  }

  private startRefill(): void {
    setInterval(() => {
      this.tokens = Math.min(
        this.tokens + this.config.fillRate,
        this.config.capacity
      );
    }, this.config.fillInterval);
  }
}
```
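The `setInterval`-based refill above keeps a live timer per bucket, which adds up with many users. A timer-free variant (a sketch, not a drop-in replacement for the class above; the injectable clock is for testability) refills lazily from elapsed time on each `consume`:

```typescript
// Lazy token bucket: no background timer; tokens are topped up from the
// elapsed time whenever consume() is called.
class LazyTokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSec: number,
    private now: () => number = Date.now // Injectable clock for testing
  ) {
    this.tokens = capacity;
    this.lastRefill = this.now();
  }

  consume(n = 1): boolean {
    this.refill();
    if (this.tokens >= n) {
      this.tokens -= n;
      return true;
    }
    return false;
  }

  private refill(): void {
    const nowMs = this.now();
    const elapsedSec = (nowMs - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = nowMs;
  }
}
```

Idle buckets cost nothing with this design, and a map of them can be garbage-collected by last-access time just like the connection pool in Section 1.1.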
### 6.2 Quota Enforcement
```typescript
interface QuotaConfig {
  dailyQueriesLimit: number;
  dailyBytesLimit: number; // GB
  concurrentQueriesLimit: number;
  queryCostLimit: number; // USD
}

class QuotaEnforcer {
  constructor(private config: QuotaConfig) {}

  // getConcurrentQueries, getTodayQueryCount, and estimateQueryCost elided
  async enforceQuota(
    userId: string,
    query: string
  ): Promise<{allowed: boolean; reason?: string}> {
    // Check concurrent queries
    const concurrent = await this.getConcurrentQueries(userId);
    if (concurrent >= this.config.concurrentQueriesLimit) {
      return {
        allowed: false,
        reason: `Concurrent query limit reached (${concurrent}/${this.config.concurrentQueriesLimit})`
      };
    }

    // Check daily query count
    const todayCount = await this.getTodayQueryCount(userId);
    if (todayCount >= this.config.dailyQueriesLimit) {
      return {
        allowed: false,
        reason: `Daily query limit reached (${todayCount}/${this.config.dailyQueriesLimit})`
      };
    }

    // Estimate cost before running
    const {estimatedCostUSD} = await this.estimateQueryCost(query);
    if (estimatedCostUSD > this.config.queryCostLimit) {
      return {
        allowed: false,
        reason: `Query cost ($${estimatedCostUSD.toFixed(2)}) exceeds limit ($${this.config.queryCostLimit})`
      };
    }

    return {allowed: true};
  }
}
```
---
## Summary of Key Findings
### Critical Recommendations
1. **Workload Identity Federation** (PRIORITY: P0)
- Eliminate all service account keys
- Implement 1-hour token lifetime
- Use attribute-based access control
- Enable comprehensive audit logging
2. **Connection Pooling** (PRIORITY: P1)
- Implement adaptive connection pool (2-10 connections)
- Use TTL-based cleanup (5 minutes)
- Auto-scale based on queue length
3. **Query Optimization** (PRIORITY: P1)
- Mandatory partition filters
- Query cost estimation before execution
- Multi-layer caching (BigQuery + App + CDN)
- Parameterized queries for security
4. **Rate Limiting** (PRIORITY: P1)
- Token bucket algorithm per user (e.g., 100-request burst, ~10 req/sec refill)
- Daily quota enforcement
- Concurrent query limits
- Cost-based throttling
5. **Error Handling** (PRIORITY: P2)
- Exponential backoff with jitter
- Circuit breaker for cascading failures
- Differentiate transient vs. permanent errors
- Comprehensive logging and metrics
### Performance Metrics
| Metric | Target | Measurement |
|--------|--------|-------------|
| Query Latency (p95) | < 2 seconds | Cloud Monitoring |
| Cache Hit Rate | > 60% | Application metrics |
| Connection Pool Efficiency | > 80% reuse | Pool statistics |
| Token Refresh Success Rate | > 99.9% | WIF metrics |
| Cost Per Query | < $0.01 | BigQuery logs |
### Security Posture
| Control | Implementation | Status |
|---------|----------------|--------|
| Zero-Key Authentication | WIF with OIDC | ✅ DESIGNED |
| Attribute-Based Access | Domain, groups, MFA | ✅ DESIGNED |
| Audit Logging | Cloud Audit Logs + BigQuery sink | ✅ DESIGNED |
| Encryption in Transit | HTTPS + TLS 1.3 | ✅ DEFAULT |
| Encryption at Rest | CMEK for datasets | ✅ DESIGNED |
---
**Next Steps**:
1. Share findings with architecture team via memory
2. Coordinate implementation with coder agents
3. Validate security patterns with security team
4. Create performance benchmarks
5. Document operational runbooks
**Coordination Memory Key**: `research/bigquery/findings-complete`