BigQuery Validator

development.md•9.97 KiB

# Development Guide

Guide for contributors and developers working on MCP BigQuery.

## Development Setup

### Prerequisites

- Python 3.10+
- Git
- Google Cloud SDK with BigQuery API enabled

### Clone and Install

```bash
# Clone repository
git clone https://github.com/caron14/mcp-bigquery.git
cd mcp-bigquery

# Install with development dependencies
pip install -e ".[dev]"

# Or using uv
uv pip install -e ".[dev]"
```

### Environment Setup

```bash
# Set up Google Cloud authentication
gcloud auth application-default login

# Configure project
export BQ_PROJECT="your-test-project"
export BQ_LOCATION="US"

# Install pre-commit hooks
pre-commit install

# Run development server
python -m mcp_bigquery
```

### Pre-commit Setup

This project uses pre-commit hooks to ensure code quality:

```bash
# Install pre-commit hooks (one-time setup)
pre-commit install

# Run all hooks manually
pre-commit run --all-files

# Update hook versions
pre-commit autoupdate
```

Configured hooks:

- **isort**: Sorts Python imports
- **black**: Formats Python code (line length: 100)
- **flake8**: Checks Python code style
- **ruff**: Fast Python linter
- **mypy**: Type checking for Python

## Project Structure

```
mcp-bigquery/
├── src/mcp_bigquery/
│   ├── __init__.py           # Version + exports
│   ├── __main__.py           # CLI entry point (logging flags added in v0.4.2)
│   ├── server.py             # MCP server implementation
│   ├── config.py             # Environment/config resolution
│   ├── logging_config.py     # Central log formatting + level helpers
│   ├── cache.py              # In-memory caches (clients + schema metadata)
│   ├── clients/
│   │   ├── __init__.py
│   │   └── factory.py        # Shared BigQuery client creation
│   ├── schema_explorer/
│   │   ├── __init__.py
│   │   ├── datasets.py       # Dataset listing flows
│   │   ├── tables.py         # Table metadata aggregation
│   │   ├── describe.py       # Schema inspection helpers
│   │   └── _formatters.py    # Formatter helpers for schema/table views
│   ├── info_schema/
│   │   ├── __init__.py
│   │   ├── queries.py        # INFORMATION_SCHEMA query builders
│   │   ├── performance.py    # Performance analysis heuristics
│   │   └── _templates.py     # SQL template catalog
│   ├── sql_analyzer.py       # SQL analysis engine (v0.3.0)
│   ├── validators.py         # Input validation utilities
│   ├── exceptions.py         # Custom exception types
│   └── constants.py          # Shared constants/env defaults
├── tests/
│   ├── conftest.py
│   ├── test_features.py
│   ├── test_quality_improvements.py
│   ├── test_min.py
│   ├── test_imports.py
│   └── test_integration.py
├── docs/
├── examples/
└── pyproject.toml
```

See also [Module Responsibility Map](module_map.md) for per-file responsibilities captured during the v0.4.2 refactor.

## Testing

### Run All Tests

```bash
# Run all tests
pytest tests/

# Run with coverage
pytest --cov=mcp_bigquery tests/

# Run specific test file
pytest tests/test_min.py -v
```

### Test Categories

1. **Unit Tests** - No BigQuery credentials required

   ```bash
   pytest tests/test_min.py::TestWithoutCredentials
   ```

2. **Integration Tests** - Require BigQuery access

   ```bash
   pytest tests/test_integration.py
   ```

### Writing Tests

```python
# Example unit test
import pytest

# dry_run_sql is assumed to live alongside validate_sql in server.py
from mcp_bigquery.server import dry_run_sql, validate_sql


@pytest.mark.asyncio
async def test_validate_simple_query():
    result = await validate_sql({"sql": "SELECT 1"})
    assert result["isValid"] is True


# Example integration test (async tests also need the asyncio marker)
@pytest.mark.requires_credentials
@pytest.mark.asyncio
async def test_public_dataset_query():
    sql = "SELECT * FROM `bigquery-public-data.samples.shakespeare`"
    result = await dry_run_sql({"sql": sql})
    assert result["totalBytesProcessed"] > 0
```

## Code Style

### Formatting

```bash
# Format with black
black src/ tests/

# Check with ruff
ruff check src/ tests/

# Type checking with mypy
mypy src/
```

### Style Guidelines

1. Follow PEP 8
2. Use type hints for all functions
3. Add docstrings to public functions
4. Keep functions small and focused
5. Use descriptive variable names

## Making Changes

### 1. Create Feature Branch

```bash
git checkout -b feature/your-feature-name
```

### 2. Make Changes

Follow the existing code patterns:

```python
async def your_new_function(params: dict) -> dict:
    """
    Brief description of function.

    Args:
        params: Dictionary with 'sql' and optional 'params'

    Returns:
        Dictionary with result or error
    """
    try:
        # Implementation
        return {"success": True}
    except Exception as e:
        return {"error": {"code": "ERROR_CODE", "message": str(e)}}
```

### 3. Test Your Changes

```bash
# Run tests
pytest tests/

# Test manually
python -m mcp_bigquery
```

### 4. Update Documentation

Update relevant documentation:

- Add new features to README.md
- Update API documentation
- Add examples if applicable

### 5. Submit Pull Request

```bash
# Commit changes
git add .
git commit -m "feat: add new feature"

# Push to GitHub
git push origin feature/your-feature-name
```

## Building and Publishing

### Build Package

```bash
# Clean previous builds
rm -rf dist/ build/ *.egg-info

# Build distribution
python -m build

# Check package contents
tar -tzf dist/mcp-bigquery-*.tar.gz | head -20
```

### Test Package Locally

```bash
# Install from local build
pip install dist/mcp-bigquery-*.whl

# Test installation
mcp-bigquery --version
```

### Publish to PyPI

```bash
# Test on TestPyPI first
python -m twine upload --repository testpypi dist/*

# Publish to PyPI
python -m twine upload dist/*
```

## Logging and Debugging

### CLI Controls (v0.4.2)

`python -m mcp_bigquery` now delegates to `logging_config` so log levels are consistent across tools. Logs default to `WARNING` and stream to stderr.

```bash
mcp-bigquery --verbose      # INFO
mcp-bigquery -vv            # DEBUG
mcp-bigquery --quiet        # ERROR
mcp-bigquery --json-logs    # Structured JSON logs
mcp-bigquery --no-color     # Disable ANSI colors
```

These switches stack with the `DEBUG=true` environment variable or the `config.log_level` default resolved in `mcp_bigquery.config`.

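
To make the flag-to-level mapping above concrete, here is a minimal sketch of one way verbosity counts could be turned into a log level. The function name `sketch_resolve_level` and its stepping logic are assumptions for illustration only; the project's real `resolve_log_level` may behave differently (for example in how it treats `DEBUG=true`).

```python
import logging

# Ordered from most to least verbose; each -v moves left, each --quiet moves right.
LEVELS = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL]


def sketch_resolve_level(default_level: str = "WARNING", verbose: int = 0, quiet: int = 0) -> int:
    """Hypothetical helper: shift the default level by verbose/quiet counts."""
    base = getattr(logging, default_level.upper())
    index = LEVELS.index(base) - verbose + quiet
    # Clamp so that extra flags never run off either end of the scale.
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


if __name__ == "__main__":
    for flags in ({}, {"verbose": 1}, {"verbose": 2}, {"quiet": 1}):
        print(flags, logging.getLevelName(sketch_resolve_level(**flags)))
```

With a `WARNING` default, this sketch yields `INFO` for one `-v`, `DEBUG` for `-vv`, and `ERROR` for `--quiet`, matching the table of switches documented above.
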
### Programmatic Setup

```python
from mcp_bigquery.logging_config import setup_logging, resolve_log_level
from mcp_bigquery.config import get_config

config = get_config()
level = resolve_log_level(default_level=config.log_level, verbose=1, quiet=0)
setup_logging(level=level, format_json=True, colored=False)
```

### Common Issues

1. **Import errors**

   ```bash
   # Ensure package is installed in editable mode
   pip install -e .
   ```

2. **Authentication errors**

   ```bash
   # Check credentials
   gcloud auth application-default print-access-token
   ```

3. **Test failures**

   ```bash
   # Run single test with verbose output
   pytest tests/test_min.py::test_name -vvs
   ```

## Architecture Notes

### MCP Server Implementation

The server follows MCP protocol standards:

1. **Tool Registration** - Eleven tools registered in `handle_list_tools()`
2. **Tool Execution** - Requests handled in `handle_call_tool()`
3. **Error Handling** - Consistent error format across all tools
4. **Async Support** - All operations are async for performance

### Core Modules

#### Client Factory (`clients/factory.py`)

- Single place for constructing cached BigQuery clients with retries and ADC handling.
- Respects `BQ_PROJECT`, `BQ_LOCATION`, and `SAFE_PRICE_PER_TIB` via `config.get_config()`.
- Legacy `mcp_bigquery.bigquery_client` remains as a thin façade that delegates to the factory.

#### Logging (`logging_config.py`)

- Provides `setup_logging()` and `resolve_log_level()` used by the CLI and server during startup.
- Routes logs to stderr by default, enables JSON/colored formatting toggles, and exposes decorators for measuring performance of dry-run helpers.

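
As an illustration of the performance-measuring decorators mentioned in the logging bullets above, the following is a rough sketch of a timing decorator for async helpers. The names `log_duration` and `fake_dry_run` are hypothetical and do not come from `logging_config.py`; the actual decorators may have different names, signatures, and output.

```python
import asyncio
import functools
import logging
import time

logger = logging.getLogger("mcp_bigquery.sketch")  # hypothetical logger name


def log_duration(func):
    """Hypothetical decorator: log how long an async function takes."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return await func(*args, **kwargs)
        finally:
            # Runs on success and on error, so failures are timed too.
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s finished in %.1f ms", func.__name__, elapsed_ms)

    return wrapper


@log_duration
async def fake_dry_run(sql: str) -> dict:
    """Stand-in for a dry-run helper; it never contacts BigQuery."""
    await asyncio.sleep(0)
    return {"isValid": True, "sql": sql}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(asyncio.run(fake_dry_run("SELECT 1")))
```

Using `functools.wraps` keeps the wrapped function's name and docstring intact, which matters when log lines or tool registries refer to functions by name.
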
#### SQL Analyzer (`sql_analyzer.py`) - v0.3.0

- SQLAnalyzer class for static SQL analysis
- Uses sqlparse for AST parsing
- Complexity scoring algorithm
- BigQuery-specific syntax support

#### Schema Explorer Package (`schema_explorer/`) - updated v0.4.2

- `datasets.py`, `tables.py`, and `describe.py` split responsibilities for dataset listing, table aggregation, and schema formatting.
- `_formatters.py` centralizes shared serializers (timestamps, partitions, nested schema trees).
- Modules rely on the client factory plus `validators`/`exceptions` and never import each other, preserving clean boundaries.

#### Information Schema Package (`info_schema/`) - updated v0.4.2

- `_templates.py` stores INFORMATION_SCHEMA SQL patterns.
- `queries.py` handles templating, dry-runs, dependency extraction, and error normalization.
- `performance.py` inspects query plans to emit heuristics and `optimization_suggestions`.

### Error Handling

Standard error format:

```python
{
    "error": {
        "code": "INVALID_SQL",
        "message": "Human-readable error",
        "location": {"line": 1, "column": 10},
        "details": []  # Optional
    }
}
```

## Contributing Guidelines

1. **Open an issue first** - Discuss major changes before implementing
2. **Follow existing patterns** - Maintain consistency with current code
3. **Add tests** - All new features need test coverage
4. **Update docs** - Keep documentation in sync with code
5. **One feature per PR** - Keep pull requests focused

## Release Process

1. Update version in `pyproject.toml` and `src/mcp_bigquery/__init__.py`
2. Update CHANGELOG in README.md
3. Create and push git tag
4. Build and publish to PyPI
5. Create GitHub release

## Getting Help

- **Issues**: [GitHub Issues](https://github.com/caron14/mcp-bigquery/issues)
- **Discussions**: [GitHub Discussions](https://github.com/caron14/mcp-bigquery/discussions)
- **Documentation**: This guide and API reference

## License

MIT License - See LICENSE file for details
