# Development Guide

Guide for contributors and developers working on MCP BigQuery.

## Development Setup

### Prerequisites

- Python 3.10+
- Git
- Google Cloud SDK with the BigQuery API enabled

### Clone and Install

```bash
# Clone repository
git clone https://github.com/caron14/mcp-bigquery.git
cd mcp-bigquery

# Install with development dependencies
pip install -e ".[dev]"

# Or using uv
uv pip install -e ".[dev]"
```

### Environment Setup

```bash
# Set up Google Cloud authentication
gcloud auth application-default login

# Configure project
export BQ_PROJECT="your-test-project"
export BQ_LOCATION="US"

# Install pre-commit hooks
pre-commit install

# Run development server
python -m mcp_bigquery
```

### Pre-commit Setup

This project uses pre-commit hooks to ensure code quality:

```bash
# Install pre-commit hooks (one-time setup)
pre-commit install

# Run all hooks manually
pre-commit run --all-files

# Update hook versions
pre-commit autoupdate
```

Configured hooks:

- **isort**: Sorts Python imports
- **black**: Formats Python code (line length: 100)
- **flake8**: Checks Python code style
- **ruff**: Fast Python linter
- **mypy**: Type checking for Python

## Project Structure

```
mcp-bigquery/
├── src/mcp_bigquery/
│   ├── __init__.py           # Version + exports
│   ├── __main__.py           # CLI entry point (logging flags added in v0.4.2)
│   ├── server.py             # MCP server implementation
│   ├── config.py             # Environment/config resolution
│   ├── logging_config.py     # Central log formatting + level helpers
│   ├── cache.py              # In-memory caches (clients + schema metadata)
│   ├── clients/
│   │   ├── __init__.py
│   │   └── factory.py        # Shared BigQuery client creation
│   ├── schema_explorer/
│   │   ├── __init__.py
│   │   ├── datasets.py       # Dataset listing flows
│   │   ├── tables.py         # Table metadata aggregation
│   │   ├── describe.py       # Schema inspection helpers
│   │   └── _formatters.py    # Formatter helpers for schema/table views
│   ├── info_schema/
│   │   ├── __init__.py
│   │   ├── queries.py        # INFORMATION_SCHEMA query builders
│   │   ├── performance.py    # Performance analysis heuristics
│   │   └── _templates.py     # SQL template catalog
│   ├── sql_analyzer.py       # SQL analysis engine (v0.3.0)
│   ├── validators.py         # Input validation utilities
│   ├── exceptions.py         # Custom exception types
│   └── constants.py          # Shared constants/env defaults
├── tests/
│   ├── conftest.py
│   ├── test_features.py
│   ├── test_quality_improvements.py
│   ├── test_min.py
│   ├── test_imports.py
│   └── test_integration.py
├── docs/
├── examples/
└── pyproject.toml
```

See also the [Module Responsibility Map](module_map.md) for per-file responsibilities captured during the v0.4.2 refactor.

## Testing

### Run All Tests

```bash
# Run all tests
pytest tests/

# Run with coverage
pytest --cov=mcp_bigquery tests/

# Run specific test file
pytest tests/test_min.py -v
```

### Test Categories

1. **Unit Tests** - No BigQuery credentials required

   ```bash
   pytest tests/test_min.py::TestWithoutCredentials
   ```

2. **Integration Tests** - Requires BigQuery access

   ```bash
   pytest tests/test_integration.py
   ```

### Writing Tests

```python
# Example unit test
import pytest
from mcp_bigquery.server import validate_sql, dry_run_sql

@pytest.mark.asyncio
async def test_validate_simple_query():
    result = await validate_sql({"sql": "SELECT 1"})
    assert result["isValid"] is True

# Example integration test
@pytest.mark.asyncio
@pytest.mark.requires_credentials
async def test_public_dataset_query():
    sql = "SELECT * FROM `bigquery-public-data.samples.shakespeare`"
    result = await dry_run_sql({"sql": sql})
    assert result["totalBytesProcessed"] > 0
```

## Code Style

### Formatting

```bash
# Format with black
black src/ tests/

# Check with ruff
ruff check src/ tests/

# Type checking with mypy
mypy src/
```

### Style Guidelines

1. Follow PEP 8
2. Use type hints for all functions
3. Add docstrings to public functions
4. Keep functions small and focused
5. Use descriptive variable names

## Making Changes

### 1. Create Feature Branch

```bash
git checkout -b feature/your-feature-name
```

### 2. Make Changes

Follow the existing code patterns:

```python
async def your_new_function(params: dict) -> dict:
    """
    Brief description of function.

    Args:
        params: Dictionary with 'sql' and optional 'params'

    Returns:
        Dictionary with result or error
    """
    try:
        # Implementation
        return {"success": True}
    except Exception as e:
        return {"error": {"code": "ERROR_CODE", "message": str(e)}}
```

### 3. Test Your Changes

```bash
# Run tests
pytest tests/

# Test manually
python -m mcp_bigquery
```

### 4. Update Documentation

Update the relevant documentation:

- Add new features to README.md
- Update the API documentation
- Add examples if applicable

### 5. Submit Pull Request

```bash
# Commit changes
git add .
git commit -m "feat: add new feature"

# Push to GitHub
git push origin feature/your-feature-name
```

## Building and Publishing

### Build Package

```bash
# Clean previous builds
rm -rf dist/ build/ *.egg-info

# Build distribution
python -m build

# Check package contents (build tools normalize the name to underscores)
tar -tzf dist/mcp_bigquery-*.tar.gz | head -20
```

### Test Package Locally

```bash
# Install from local build
pip install dist/mcp_bigquery-*.whl

# Test installation
mcp-bigquery --version
```

### Publish to PyPI

```bash
# Test on TestPyPI first
python -m twine upload --repository testpypi dist/*

# Publish to PyPI
python -m twine upload dist/*
```

## Logging and Debugging

### CLI Controls (v0.4.2)

`python -m mcp_bigquery` now delegates to `logging_config`, so log levels are consistent across tools. Logs default to `WARNING` and stream to stderr.

```bash
mcp-bigquery --verbose     # INFO
mcp-bigquery -vv           # DEBUG
mcp-bigquery --quiet       # ERROR
mcp-bigquery --json-logs   # Structured JSON logs
mcp-bigquery --no-color    # Disable ANSI colors
```

These switches stack with the `DEBUG=true` environment variable or the `config.log_level` default resolved in `mcp_bigquery.config`.
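Under the hood, stacked verbosity flags usually map onto the standard-library logging levels. The sketch below illustrates that mapping; the names `resolve_level` and the flag wiring are invented for illustration and are not the project's actual `resolve_log_level` implementation:

```python
import argparse
import logging

def resolve_level(verbose: int, quiet: bool) -> int:
    """Map stacked -v flags and --quiet to a stdlib logging level."""
    if quiet:
        return logging.ERROR
    if verbose >= 2:
        return logging.DEBUG
    if verbose == 1:
        return logging.INFO
    return logging.WARNING  # documented default

parser = argparse.ArgumentParser()
parser.add_argument("-v", "--verbose", action="count", default=0)
parser.add_argument("-q", "--quiet", action="store_true")

# "-vv" stacks to verbose=2, which resolves to DEBUG
args = parser.parse_args(["-vv"])
print(logging.getLevelName(resolve_level(args.verbose, args.quiet)))  # DEBUG
```

The `action="count"` idiom is what lets `-vv` raise verbosity one step per repetition.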
### Programmatic Setup

```python
from mcp_bigquery.logging_config import setup_logging, resolve_log_level
from mcp_bigquery.config import get_config

config = get_config()
level = resolve_log_level(default_level=config.log_level, verbose=1, quiet=0)
setup_logging(level=level, format_json=True, colored=False)
```

### Common Issues

1. **Import errors**

   ```bash
   # Ensure the package is installed in editable mode
   pip install -e .
   ```

2. **Authentication errors**

   ```bash
   # Check credentials
   gcloud auth application-default print-access-token
   ```

3. **Test failures**

   ```bash
   # Run a single test with verbose output
   pytest tests/test_min.py::test_name -vvs
   ```

## Architecture Notes

### MCP Server Implementation

The server follows MCP protocol standards:

1. **Tool Registration** - Eleven tools registered in `handle_list_tools()`
2. **Tool Execution** - Requests handled in `handle_call_tool()`
3. **Error Handling** - Consistent error format across all tools
4. **Async Support** - All operations are async for performance

### Core Modules

#### Client Factory (`clients/factory.py`)

- Single place for constructing cached BigQuery clients with retries and ADC handling.
- Respects `BQ_PROJECT`, `BQ_LOCATION`, and `SAFE_PRICE_PER_TIB` via `config.get_config()`.
- Legacy `mcp_bigquery.bigquery_client` remains as a thin façade that delegates to the factory.

#### Logging (`logging_config.py`)

- Provides `setup_logging()` and `resolve_log_level()`, used by the CLI and server during startup.
- Routes logs to stderr by default, offers JSON/colored formatting toggles, and exposes decorators for measuring the performance of dry-run helpers.
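A timing decorator of the kind described above can be pictured as a thin async wrapper. This is a hedged sketch: the name `log_duration` and the exact log message are invented for illustration, not the module's real API:

```python
import functools
import logging
import time

logger = logging.getLogger("mcp_bigquery")

def log_duration(func):
    """Log how long an awaited helper took, at DEBUG level, on stderr."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return await func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.debug("%s took %.1f ms", func.__name__, elapsed_ms)
    return wrapper

@log_duration
async def dry_run_example():
    # Stand-in for a real dry-run helper
    return {"isValid": True}
```

Because the timing happens in a `finally` block, the duration is logged even when the wrapped helper raises.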
#### SQL Analyzer (`sql_analyzer.py`) - v0.3.0

- `SQLAnalyzer` class for static SQL analysis
- Uses sqlparse for AST parsing
- Complexity scoring algorithm
- BigQuery-specific syntax support

#### Schema Explorer Package (`schema_explorer/`) - updated v0.4.2

- `datasets.py`, `tables.py`, and `describe.py` split responsibilities for dataset listing, table aggregation, and schema formatting.
- `_formatters.py` centralizes shared serializers (timestamps, partitions, nested schema trees).
- Modules rely on the client factory plus `validators`/`exceptions` and never import each other, preserving clean boundaries.

#### Information Schema Package (`info_schema/`) - updated v0.4.2

- `_templates.py` stores INFORMATION_SCHEMA SQL patterns.
- `queries.py` handles templating, dry runs, dependency extraction, and error normalization.
- `performance.py` inspects query plans to emit heuristics and `optimization_suggestions`.

### Error Handling

Standard error format:

```python
{
    "error": {
        "code": "INVALID_SQL",
        "message": "Human-readable error",
        "location": {"line": 1, "column": 10},
        "details": []  # Optional
    }
}
```

## Contributing Guidelines

1. **Open an issue first** - Discuss major changes before implementing
2. **Follow existing patterns** - Maintain consistency with current code
3. **Add tests** - All new features need test coverage
4. **Update docs** - Keep documentation in sync with code
5. **One feature per PR** - Keep pull requests focused

## Release Process

1. Update the version in `pyproject.toml` and `src/mcp_bigquery/__init__.py`
2. Update the CHANGELOG in README.md
3. Create and push a git tag
4. Build and publish to PyPI
5. Create a GitHub release

## Getting Help

- **Issues**: [GitHub Issues](https://github.com/caron14/mcp-bigquery/issues)
- **Discussions**: [GitHub Discussions](https://github.com/caron14/mcp-bigquery/discussions)
- **Documentation**: This guide and the API reference

## License

MIT License - see the LICENSE file for details.
