# API Reference
## Tools
MCP BigQuery provides eleven tools for SQL validation, query analysis, and schema discovery.
### bq_validate_sql
Validates BigQuery SQL syntax without executing the query.
**Input Schema:**
```json
{
"type": "object",
"properties": {
"sql": {
"type": "string",
"description": "The SQL query to validate"
},
"params": {
"type": "object",
"description": "Optional query parameters",
"additionalProperties": true
}
},
"required": ["sql"]
}
```
**Parameters:**
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `sql` | string | Yes | The SQL query to validate |
| `params` | object | No | Query parameters as key-value pairs |
**Response (Success):**
```json
{
"isValid": true
}
```
**Response (Error):**
```json
{
"isValid": false,
"error": {
"code": "INVALID_SQL",
"message": "Error description",
"location": {
"line": 1,
"column": 15
}
}
}
```
### bq_dry_run_sql
Performs a dry-run analysis to estimate query cost and preview the result schema.
**Input Schema:**
```json
{
"type": "object",
"properties": {
"sql": {
"type": "string",
"description": "The SQL query to analyze"
},
"params": {
"type": "object",
"description": "Optional query parameters",
"additionalProperties": true
},
"pricePerTiB": {
"type": "number",
"description": "Price per TiB for cost estimation",
"default": 5.0
}
},
"required": ["sql"]
}
```
**Parameters:**
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `sql` | string | Yes | - | The SQL query to analyze |
| `params` | object | No | {} | Query parameters |
| `pricePerTiB` | number | No | 5.0 | USD price per TiB |
**Response (Success):**
```json
{
"totalBytesProcessed": 1073741824,
"usdEstimate": 0.005,
"referencedTables": [
{
"project": "project-id",
"dataset": "dataset_name",
"table": "table_name"
}
],
"schemaPreview": [
{
"name": "column_name",
"type": "STRING",
"mode": "NULLABLE",
"description": "Column description"
}
]
}
```
**Response (Error):**
```json
{
"error": {
"code": "INVALID_SQL",
"message": "Error description",
"details": []
}
}
```
### bq_analyze_query_structure
Analyzes SQL query structure and complexity.
**Input Schema:**
```json
{
"type": "object",
"properties": {
"sql": {
"type": "string",
"description": "The SQL query to analyze"
},
"params": {
"type": "object",
"description": "Optional query parameters"
}
},
"required": ["sql"]
}
```
**Response:**
```json
{
"query_type": "SELECT",
"has_joins": true,
"has_subqueries": false,
"has_cte": true,
"has_aggregations": true,
"has_window_functions": false,
"table_count": 2,
"complexity_score": 25,
"join_types": ["LEFT", "INNER"],
"functions_used": ["COUNT", "SUM"]
}
```
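**Example:** a minimal sketch, assuming a connected MCP `client` object as used in the Usage Examples section below; the query is illustrative.
```python
# Sketch: analyze a query that mixes a CTE, a join, and an aggregation.
result = client.call_tool(
    "bq_analyze_query_structure",
    {
        "sql": """
            WITH recent AS (
                SELECT * FROM analytics.orders WHERE order_date >= '2024-01-01'
            )
            SELECT customer_id, COUNT(*) AS order_count
            FROM recent
            LEFT JOIN analytics.customers USING (customer_id)
            GROUP BY customer_id
        """
    }
)
print(f"Complexity score: {result['complexity_score']}")
print(f"Join types: {result['join_types']}")
```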
### bq_extract_dependencies
Extracts table and column dependencies from SQL.
**Response:**
```json
{
"tables": [
{
"project": "my-project",
"dataset": "my_dataset",
"table": "my_table",
"full_name": "my-project.my_dataset.my_table"
}
],
"columns": ["id", "name", "created_at"],
"dependency_graph": {
"my_dataset.my_table": ["id", "name", "created_at"]
},
"table_count": 1,
"column_count": 3
}
```
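**Example:** a sketch assuming the tool accepts the same `sql` input as the other analysis tools and a connected MCP `client`:
```python
result = client.call_tool(
    "bq_extract_dependencies",
    {"sql": "SELECT id, name, created_at FROM `my-project.my_dataset.my_table`"}
)
for table in result["tables"]:
    print(table["full_name"])
print(f"{result['table_count']} table(s), {result['column_count']} column(s)")
```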
### bq_validate_query_syntax
Enhanced syntax validation with detailed error reporting.
**Response:**
```json
{
"is_valid": true,
"issues": [
{
"type": "performance",
"message": "SELECT * may impact performance",
"severity": "warning"
}
],
"suggestions": [
"Specify exact columns needed instead of using SELECT *"
],
"bigquery_specific": {
"uses_legacy_sql": false,
"has_array_syntax": false,
"has_struct_syntax": true
}
}
```
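**Example:** a sketch that surfaces warnings and suggestions from the response shown above (assumes a connected MCP `client`):
```python
result = client.call_tool(
    "bq_validate_query_syntax",
    {"sql": "SELECT * FROM analytics.users"}
)
for issue in result["issues"]:
    print(f"[{issue['severity']}] {issue['type']}: {issue['message']}")
for hint in result["suggestions"]:
    print(f"Suggestion: {hint}")
```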
### bq_list_datasets
Lists all datasets in the BigQuery project.
**Input Schema:**
```json
{
"type": "object",
"properties": {
"project_id": {
"type": "string",
"description": "GCP project ID (uses default if not provided)"
},
"max_results": {
"type": "integer",
"description": "Maximum number of datasets to return"
}
}
}
```
**Response:**
```json
{
"project": "my-project",
"dataset_count": 2,
"datasets": [
{
"dataset_id": "analytics",
"project": "my-project",
"location": "US",
"created": "2024-01-01T00:00:00",
"modified": "2024-06-01T00:00:00",
"description": "Analytics data",
"labels": {"env": "production"}
}
]
}
```
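**Example:** a sketch listing datasets in the default project (assumes a connected MCP `client`):
```python
result = client.call_tool(
    "bq_list_datasets",
    {"max_results": 50}  # project_id omitted: falls back to the default project
)
for ds in result["datasets"]:
    print(f"{ds['dataset_id']} ({ds['location']})")
```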
### bq_list_tables
Lists all tables in a BigQuery dataset.
**Input Schema:**
```json
{
"type": "object",
"properties": {
"dataset_id": {
"type": "string",
"description": "The dataset ID"
},
"project_id": {
"type": "string",
"description": "GCP project ID"
},
"max_results": {
"type": "integer"
},
"table_type_filter": {
"type": "array",
"items": {"type": "string"},
"description": "Filter by table types (TABLE, VIEW, EXTERNAL, MATERIALIZED_VIEW)"
}
},
"required": ["dataset_id"]
}
```
**Response:**
```json
{
"dataset_id": "analytics",
"project": "my-project",
"table_count": 3,
"tables": [
{
"table_id": "users",
"table_type": "TABLE",
"num_bytes": 1048576,
"num_rows": 10000,
"partitioning": {
"type": "DAY",
"field": "created_at"
},
"clustering_fields": ["user_id"]
}
]
}
```
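**Example:** a sketch filtering to physical and materialized tables (assumes a connected MCP `client`):
```python
result = client.call_tool(
    "bq_list_tables",
    {
        "dataset_id": "analytics",
        "table_type_filter": ["TABLE", "MATERIALIZED_VIEW"]
    }
)
for table in result["tables"]:
    print(f"{table['table_id']} ({table['table_type']}): {table['num_rows']:,} rows")
```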
### bq_describe_table
Gets detailed table schema and metadata.
**Input Schema:**
```json
{
"type": "object",
"properties": {
"table_id": {"type": "string"},
"dataset_id": {"type": "string"},
"project_id": {"type": "string"},
"format_output": {
"type": "boolean",
"description": "Format schema as table"
}
},
"required": ["table_id", "dataset_id"]
}
```
**Response:**
```json
{
"table_id": "users",
"dataset_id": "analytics",
"table_type": "TABLE",
"schema": [
{
"name": "user_id",
"type": "INTEGER",
"mode": "REQUIRED",
"description": "Unique identifier"
},
{
"name": "address",
"type": "RECORD",
"mode": "NULLABLE",
"fields": [
{
"name": "street",
"type": "STRING",
"mode": "NULLABLE"
}
]
}
],
"statistics": {
"num_bytes": 1048576,
"num_rows": 10000
}
}
```
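**Example:** a sketch printing the top-level schema (assumes a connected MCP `client`; see the Schema Field section below for walking nested fields):
```python
result = client.call_tool(
    "bq_describe_table",
    {"table_id": "users", "dataset_id": "analytics"}
)
for field in result["schema"]:
    print(f"{field['name']}: {field['type']} ({field['mode']})")
```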
### bq_get_table_info
Gets comprehensive table metadata, including storage statistics, time-travel settings, and partitioning.
**Response:**
```json
{
"table_id": "users",
"dataset_id": "analytics",
"full_table_id": "my-project.analytics.users",
"table_type": "TABLE",
"created": "2024-01-15T00:00:00",
"statistics": {
"num_bytes": 1048576,
"num_rows": 10000,
"num_active_logical_bytes": 786432
},
"time_travel": {
"max_time_travel_hours": 168
},
"partitioning": {
"type": "DAY",
"time_partitioning": {
"field": "created_at",
"require_partition_filter": false
}
}
}
```
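**Example:** a sketch assuming the same `table_id`/`dataset_id` inputs as `bq_describe_table` and a connected MCP `client`:
```python
result = client.call_tool(
    "bq_get_table_info",
    {"table_id": "users", "dataset_id": "analytics"}
)
stats = result["statistics"]
print(f"{result['full_table_id']}: {stats['num_rows']:,} rows, {stats['num_bytes']:,} bytes")
print(f"Partition field: {result['partitioning']['time_partitioning']['field']}")
```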
### bq_query_info_schema
Queries INFORMATION_SCHEMA views for metadata.
**Input Schema:**
```json
{
"type": "object",
"properties": {
"query_type": {
"type": "string",
"enum": ["tables", "columns", "table_storage", "partitions", "views", "routines", "custom"]
},
"dataset_id": {"type": "string"},
"project_id": {"type": "string"},
"table_filter": {"type": "string"},
"custom_query": {"type": "string"},
"limit": {"type": "integer", "default": 100}
},
"required": ["query_type", "dataset_id"]
}
```
**Response:**
```json
{
"query_type": "columns",
"dataset_id": "analytics",
"query": "SELECT ... FROM INFORMATION_SCHEMA.COLUMNS ...",
"schema": [
{
"name": "column_name",
"type": "STRING",
"mode": "NULLABLE"
}
],
"metadata": {
"total_bytes_processed": 1024,
"estimated_cost_usd": 0.000005
}
}
```
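**Example:** a sketch querying column metadata for a single table (assumes a connected MCP `client`):
```python
result = client.call_tool(
    "bq_query_info_schema",
    {
        "query_type": "columns",
        "dataset_id": "analytics",
        "table_filter": "users",
        "limit": 50
    }
)
print(result["query"])  # the generated INFORMATION_SCHEMA query
print(f"Bytes processed: {result['metadata']['total_bytes_processed']}")
```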
### bq_analyze_query_performance
Analyzes query performance and provides optimization suggestions.
**Input Schema:**
```json
{
"type": "object",
"properties": {
"sql": {
"type": "string",
"description": "The SQL query to analyze"
},
"project_id": {"type": "string"}
},
"required": ["sql"]
}
```
**Response:**
```json
{
"query_analysis": {
"bytes_processed": 107374182400,
"gigabytes_processed": 100.0,
"estimated_cost_usd": 0.5,
"table_count": 1
},
"performance_score": 65,
"performance_rating": "GOOD",
"optimization_suggestions": [
{
"type": "SELECT_STAR",
"severity": "MEDIUM",
"message": "Query uses SELECT *",
"recommendation": "Select only needed columns"
}
]
}
```
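**Example:** a sketch printing the score and optimization suggestions (assumes a connected MCP `client`):
```python
result = client.call_tool(
    "bq_analyze_query_performance",
    {"sql": "SELECT * FROM analytics.events"}
)
print(f"Score: {result['performance_score']} ({result['performance_rating']})")
for s in result["optimization_suggestions"]:
    print(f"[{s['severity']}] {s['type']}: {s['recommendation']}")
```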
## Response Schemas
### Error Object
All error responses follow this structure:
```typescript
interface Error {
code: "INVALID_SQL" | "AUTHENTICATION_ERROR" | "PERMISSION_DENIED";
message: string;
location?: {
line: number;
column: number;
};
details?: Array<{
reason: string;
location: string;
message: string;
}>;
}
```
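A sketch of handling this shape from `bq_validate_sql` (assumes a connected MCP `client`; `location` is optional, so guard the access):
```python
result = client.call_tool("bq_validate_sql", {"sql": "SELEC 1"})
if not result["isValid"]:
    err = result["error"]
    loc = err.get("location")
    where = f" at line {loc['line']}, column {loc['column']}" if loc else ""
    print(f"{err['code']}: {err['message']}{where}")
```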
### Table Reference
```typescript
interface TableReference {
project: string;
dataset: string;
table: string;
}
```
### Schema Field
```typescript
interface SchemaField {
name: string;
type: "STRING" | "INT64" | "FLOAT64" | "BOOL" | "TIMESTAMP" | "DATE" | "TIME" | "DATETIME" | "GEOGRAPHY" | "NUMERIC" | "BIGNUMERIC" | "BYTES" | "STRUCT" | "ARRAY" | "JSON";
mode: "NULLABLE" | "REQUIRED" | "REPEATED";
description?: string;
fields?: SchemaField[]; // For STRUCT types
}
```
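Because `fields` nests arbitrarily for STRUCT/RECORD columns, a recursive walk is convenient. A sketch, assuming a connected MCP `client` and the `bq_describe_table` response shown earlier:
```python
def flatten_fields(fields, prefix=""):
    """Yield a dotted path for every field, descending into nested STRUCT/RECORD fields."""
    for f in fields:
        path = f"{prefix}{f['name']}"
        yield path, f["type"], f["mode"]
        yield from flatten_fields(f.get("fields", []), prefix=f"{path}.")

schema = client.call_tool(
    "bq_describe_table",
    {"table_id": "users", "dataset_id": "analytics"}
)["schema"]
for path, field_type, mode in flatten_fields(schema):
    print(f"{path}: {field_type} ({mode})")
```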
## Error Codes
| Code | Description | Common Causes |
|------|-------------|---------------|
| `INVALID_SQL` | SQL syntax or semantic error | Typos, invalid references, wrong syntax |
| `AUTHENTICATION_ERROR` | Failed to authenticate with BigQuery | Missing credentials, expired tokens |
| `PERMISSION_DENIED` | Insufficient permissions | Missing IAM roles, dataset access |
## Parameter Types
Due to BigQuery dry-run limitations, all parameters are treated as STRING type. Use explicit casting in your queries:
```sql
-- Correct way to use numeric parameters
SELECT * FROM orders
WHERE amount > CAST(@min_amount AS FLOAT64)
AND created_date = CAST(@date AS DATE)
```
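The same pattern from the client side: pass every parameter value as a string and let the query cast it. A sketch, assuming a connected MCP `client`:
```python
result = client.call_tool(
    "bq_dry_run_sql",
    {
        "sql": """
            SELECT * FROM orders
            WHERE amount > CAST(@min_amount AS FLOAT64)
              AND created_date = CAST(@date AS DATE)
        """,
        "params": {"min_amount": "100.50", "date": "2024-06-01"},  # strings only
    }
)
```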
## Rate Limits
MCP BigQuery inherits BigQuery's API quotas:
| Quota | Limit | Scope |
|-------|-------|-------|
| API requests | 100/second | Per project |
| Concurrent API requests | 300 | Per project |
| Query complexity | 50 MB uncompressed | Per query |
## Authentication
The server uses Google Cloud Application Default Credentials (ADC). See [Authentication](installation.md#authentication) for setup details.
## Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `BQ_PROJECT` | Default GCP project ID | From ADC |
| `BQ_LOCATION` | Default dataset location | None |
| `SAFE_PRICE_PER_TIB` | Default price per TiB | 5.0 |
| `GOOGLE_APPLICATION_CREDENTIALS` | Path to service account key | None |
## Usage Examples
### Basic Validation
```python
# Python example using an MCP client
result = client.call_tool(
"bq_validate_sql",
{
"sql": "SELECT * FROM dataset.table"
}
)
if result["isValid"]:
print("Query is valid")
else:
print(f"Error: {result['error']['message']}")
```
### Cost Estimation
```python
result = client.call_tool(
"bq_dry_run_sql",
{
"sql": "SELECT * FROM large_table",
"pricePerTiB": 6.0 # Custom pricing
}
)
print(f"Estimated cost: ${result['usdEstimate']:.2f}")
print(f"Bytes to scan: {result['totalBytesProcessed']:,}")
```
### Parameter Validation
```python
result = client.call_tool(
"bq_validate_sql",
{
"sql": "SELECT * FROM users WHERE id = @user_id",
"params": {"user_id": "12345"}
}
)
```
## Limitations
1. **Parameters are STRING only** - Cast to appropriate types in SQL
2. **No query execution** - All operations are dry-run only
3. **No data returned** - Only metadata and cost estimates
4. **Cache disabled** - Always returns fresh estimates
5. **No UDF creation** - Can validate UDF usage but not create them
## Next Steps
- See [Usage Guide](usage.md) for detailed examples
- Check [Examples](examples.md) for real-world scenarios
- Read [Development](development.md) for contributing