# Development Guide
Guide for contributors and developers working on MCP BigQuery.
## Development Setup
### Prerequisites
- Python 3.10+
- Git
- Google Cloud SDK (`gcloud`) and a Google Cloud project with the BigQuery API enabled
### Clone and Install
```bash
# Clone repository
git clone https://github.com/caron14/mcp-bigquery.git
cd mcp-bigquery
# Install with development dependencies
pip install -e ".[dev]"
# Or using uv
uv pip install -e ".[dev]"
```
### Environment Setup
```bash
# Set up Google Cloud authentication
gcloud auth application-default login
# Configure project
export BQ_PROJECT="your-test-project"
export BQ_LOCATION="US"
# Install pre-commit hooks
pre-commit install
# Run development server
python -m mcp_bigquery
```
### Pre-commit Setup
This project uses pre-commit hooks to ensure code quality:
```bash
# Install pre-commit hooks (one-time setup)
pre-commit install
# Run all hooks manually
pre-commit run --all-files
# Update hook versions
pre-commit autoupdate
```
Configured hooks:
- **isort**: Sorts Python imports
- **black**: Formats Python code (line length: 100)
- **flake8**: Checks Python code style
- **ruff**: Fast Python linter
- **mypy**: Type checking for Python
## Project Structure
```
mcp-bigquery/
├── src/mcp_bigquery/
│ ├── __init__.py # Version + exports
│ ├── __main__.py # CLI entry point (logging flags added in v0.4.2)
│ ├── server.py # MCP server implementation
│ ├── config.py # Environment/config resolution
│ ├── logging_config.py # Central log formatting + level helpers
│ ├── cache.py # In-memory caches (clients + schema metadata)
│ ├── clients/
│ │ ├── __init__.py
│ │ └── factory.py # Shared BigQuery client creation
│ ├── schema_explorer/
│ │ ├── __init__.py
│ │ ├── datasets.py # Dataset listing flows
│ │ ├── tables.py # Table metadata aggregation
│ │ ├── describe.py # Schema inspection helpers
│ │ └── _formatters.py # Formatter helpers for schema/table views
│ ├── info_schema/
│ │ ├── __init__.py
│ │ ├── queries.py # INFORMATION_SCHEMA query builders
│ │ ├── performance.py # Performance analysis heuristics
│ │ └── _templates.py # SQL template catalog
│ ├── sql_analyzer.py # SQL analysis engine (v0.3.0)
│ ├── validators.py # Input validation utilities
│ ├── exceptions.py # Custom exception types
│ └── constants.py # Shared constants/env defaults
├── tests/
│ ├── conftest.py
│ ├── test_features.py
│ ├── test_quality_improvements.py
│ ├── test_min.py
│ ├── test_imports.py
│ └── test_integration.py
├── docs/
├── examples/
└── pyproject.toml
```
See also [Module Responsibility Map](module_map.md) for per-file responsibilities captured during the v0.4.2 refactor.
## Testing
### Run All Tests
```bash
# Run all tests
pytest tests/
# Run with coverage
pytest --cov=mcp_bigquery tests/
# Run specific test file
pytest tests/test_min.py -v
```
### Test Categories
1. **Unit Tests** - No BigQuery credentials required
   ```bash
   pytest tests/test_min.py::TestWithoutCredentials
   ```
2. **Integration Tests** - Requires BigQuery access
   ```bash
   pytest tests/test_integration.py
   ```
### Writing Tests
```python
import pytest

from mcp_bigquery.server import dry_run_sql, validate_sql


# Example unit test (no credentials needed)
@pytest.mark.asyncio
async def test_validate_simple_query():
    result = await validate_sql({"sql": "SELECT 1"})
    assert result["isValid"] is True


# Example integration test (needs BigQuery access)
@pytest.mark.requires_credentials
@pytest.mark.asyncio
async def test_public_dataset_query():
    sql = "SELECT * FROM `bigquery-public-data.samples.shakespeare`"
    result = await dry_run_sql({"sql": sql})
    assert result["totalBytesProcessed"] > 0
```
## Code Style
### Formatting
```bash
# Format with black
black src/ tests/
# Check with ruff
ruff check src/ tests/
# Type checking with mypy
mypy src/
```
### Style Guidelines
1. Follow PEP 8
2. Use type hints for all functions
3. Add docstrings to public functions
4. Keep functions small and focused
5. Use descriptive variable names
## Making Changes
### 1. Create Feature Branch
```bash
git checkout -b feature/your-feature-name
```
### 2. Make Changes
Follow the existing code patterns:
```python
async def your_new_function(params: dict) -> dict:
    """
    Brief description of function.

    Args:
        params: Dictionary with 'sql' and optional 'params'

    Returns:
        Dictionary with result or error
    """
    try:
        # Implementation
        return {"success": True}
    except Exception as e:
        return {"error": {"code": "ERROR_CODE", "message": str(e)}}
```
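As an illustration of this pattern, here is a hypothetical tool handler, `count_statements`, which is not part of the codebase. It follows the same signature and docstring layout, validates its input, and returns errors in the standard shape described under Error Handling; the `INVALID_PARAMS` code and the `statementCount` key are assumptions made for the example.

```python
import asyncio


async def count_statements(params: dict) -> dict:
    """
    Count semicolon-separated SQL statements (hypothetical example tool).

    Args:
        params: Dictionary with 'sql' and optional 'params'

    Returns:
        Dictionary with result or error
    """
    try:
        sql = params.get("sql")
        if not isinstance(sql, str) or not sql.strip():
            # Hypothetical error code for missing or empty input
            return {"error": {"code": "INVALID_PARAMS", "message": "'sql' must be a non-empty string"}}
        # Naive split: semicolons inside string literals would be miscounted
        statements = [s for s in sql.split(";") if s.strip()]
        return {"success": True, "statementCount": len(statements)}
    except Exception as e:
        return {"error": {"code": "ERROR_CODE", "message": str(e)}}
```

Calling `asyncio.run(count_statements({"sql": "SELECT 1; SELECT 2;"}))` returns `{"success": True, "statementCount": 2}`, while an empty payload yields an `error` object instead of raising.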
### 3. Test Your Changes
```bash
# Run tests
pytest tests/
# Test manually
python -m mcp_bigquery
```
### 4. Update Documentation
Update relevant documentation:
- Add new features to README.md
- Update API documentation
- Add examples if applicable
### 5. Submit Pull Request
```bash
# Commit changes
git add .
git commit -m "feat: add new feature"
# Push to GitHub
git push origin feature/your-feature-name
```
Then open a pull request from your branch on GitHub.
## Building and Publishing
### Build Package
```bash
# Clean previous builds
rm -rf dist/ build/ *.egg-info
# Build distribution
python -m build
# Check package contents
tar -tzf dist/*.tar.gz | head -20
```
### Test Package Locally
```bash
# Install from local build
pip install dist/*.whl
# Test installation
mcp-bigquery --version
```
### Publish to PyPI
```bash
# Test on TestPyPI first
python -m twine upload --repository testpypi dist/*
# Publish to PyPI
python -m twine upload dist/*
```
## Logging and Debugging
### CLI Controls (v0.4.2)
`python -m mcp_bigquery` now delegates to `logging_config` so log levels are consistent across tools. Logs default to `WARNING` and stream to stderr.
```bash
mcp-bigquery --verbose # INFO
mcp-bigquery -vv # DEBUG
mcp-bigquery --quiet # ERROR
mcp-bigquery --json-logs # Structured JSON logs
mcp-bigquery --no-color # Disable ANSI colors
```
These switches stack with the `DEBUG=true` environment variable or the `config.log_level` default resolved in `mcp_bigquery.config`.
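The documented mapping (default `WARNING`, `--verbose` for `INFO`, `-vv` for `DEBUG`, `--quiet` for `ERROR`) can be sketched as a small resolver. This is not the project's `resolve_log_level`: the name `pick_log_level`, the rule that `--quiet` beats `--verbose`, and the choice to honor `DEBUG=true` only when no flag is given are all assumptions.

```python
import logging
import os
from typing import Mapping, Optional


def pick_log_level(
    verbose: int = 0,
    quiet: int = 0,
    env: Optional[Mapping[str, str]] = None,
    default: int = logging.WARNING,
) -> int:
    """Hypothetical resolver mapping CLI flag counts to a logging level."""
    env = os.environ if env is None else env
    if quiet > 0:
        # Assumption: --quiet wins over any --verbose flags
        return logging.ERROR
    if verbose >= 2:
        return logging.DEBUG
    if verbose == 1:
        return logging.INFO
    if env.get("DEBUG", "").lower() == "true":
        # Assumption: DEBUG=true applies only when no CLI flag is given
        return logging.DEBUG
    return default
```

Passing `env` explicitly keeps the function testable without touching the real process environment.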
### Programmatic Setup
```python
from mcp_bigquery.logging_config import setup_logging, resolve_log_level
from mcp_bigquery.config import get_config
config = get_config()
level = resolve_log_level(default_level=config.log_level, verbose=1, quiet=0)
setup_logging(level=level, format_json=True, colored=False)
```
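To show what structured JSON output involves, here is a minimal standard-library formatter that renders each record as one JSON object and attaches it to a stderr handler. It is an illustration only, not the formatter that `setup_logging(format_json=True)` installs; the field names and the `attach_json_handler` helper are assumptions.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object (illustrative)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def attach_json_handler(logger: logging.Logger, level: int = logging.WARNING) -> logging.Handler:
    # stderr, because an MCP server on the stdio transport reserves stdout for protocol messages
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(level)
    return handler
```

One JSON object per line keeps the output easy to ship to log collectors that parse newline-delimited JSON.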
### Common Issues
1. **Import errors**
   ```bash
   # Ensure package is installed in editable mode
   pip install -e .
   ```
2. **Authentication errors**
   ```bash
   # Check credentials
   gcloud auth application-default print-access-token
   ```
3. **Test failures**
   ```bash
   # Run a single test with verbose output
   pytest tests/test_min.py::test_name -vvs
   ```
## Architecture Notes
### MCP Server Implementation
The server follows MCP protocol standards:
1. **Tool Registration** - Eleven tools registered in `handle_list_tools()`
2. **Tool Execution** - Requests handled in `handle_call_tool()`
3. **Error Handling** - Consistent error format across all tools
4. **Async Support** - All operations are async for performance
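The split between registration, dispatch, and a shared error format can be illustrated with a framework-free sketch: a dictionary of async handlers, a listing function, and a dispatcher that turns unknown tools and exceptions into the standard error shape. It does not use the MCP SDK; `TOOLS`, `list_tools`, `call_tool`, `echo_sql`, and the `UNKNOWN_TOOL`/`INTERNAL_ERROR` codes are hypothetical stand-ins for `handle_list_tools()` and `handle_call_tool()`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]


async def echo_sql(params: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical tool used only for this sketch
    return {"sql": params["sql"]}


TOOLS: Dict[str, Handler] = {"echo_sql": echo_sql}


def list_tools() -> List[str]:
    """Return registered tool names (stand-in for handle_list_tools())."""
    return sorted(TOOLS)


async def call_tool(name: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch to a handler (stand-in for handle_call_tool())."""
    handler = TOOLS.get(name)
    if handler is None:
        return {"error": {"code": "UNKNOWN_TOOL", "message": f"Unknown tool: {name}"}}
    try:
        return await handler(params)
    except Exception as exc:
        # Keep every failure in the shared error format
        return {"error": {"code": "INTERNAL_ERROR", "message": str(exc)}}
```

Keeping one dictionary as the source of truth means the listing and the dispatcher cannot drift apart when a tool is added.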
### Core Modules
#### Client Factory (`clients/factory.py`)
- Single place for constructing cached BigQuery clients with retries and ADC handling.
- Respects `BQ_PROJECT`, `BQ_LOCATION`, and `SAFE_PRICE_PER_TIB` via `config.get_config()`.
- Legacy `mcp_bigquery.bigquery_client` remains as a thin façade that delegates to the factory.
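A cached factory of this kind can be sketched without BigQuery itself: clients are keyed by `(project, location)`, built once through an injectable constructor, and fall back to `BQ_PROJECT`/`BQ_LOCATION`. In real code the constructor might wrap `google.cloud.bigquery.Client(project=..., location=...)`; the `ClientCache` class and its methods are hypothetical, and retry and ADC handling are omitted.

```python
import os
import threading
from typing import Any, Callable, Dict, Optional, Tuple


class ClientCache:
    """Build one client per (project, location) pair and reuse it (illustrative)."""

    def __init__(self, constructor: Callable[[Optional[str], Optional[str]], Any]) -> None:
        self._constructor = constructor
        self._clients: Dict[Tuple[Optional[str], Optional[str]], Any] = {}
        self._lock = threading.Lock()

    def get(self, project: Optional[str] = None, location: Optional[str] = None) -> Any:
        # Fall back to the environment variables this guide documents
        project = project or os.environ.get("BQ_PROJECT")
        location = location or os.environ.get("BQ_LOCATION")
        key = (project, location)
        with self._lock:
            if key not in self._clients:
                self._clients[key] = self._constructor(project, location)
            return self._clients[key]

    def clear(self) -> None:
        """Drop cached clients, e.g. between tests."""
        with self._lock:
            self._clients.clear()
```

Injecting the constructor lets tests pass a fake instead of creating real BigQuery clients.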
#### Logging (`logging_config.py`)
- Provides `setup_logging()` and `resolve_log_level()` used by the CLI and server during startup.
- Routes logs to stderr by default, enables JSON/colored formatting toggles, and exposes decorators for measuring performance of dry-run helpers.
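The performance-measuring decorators mentioned above could look roughly like this async-aware timer, which logs elapsed wall-clock time at `DEBUG` even when the wrapped call raises. The names `log_duration`, `fake_dry_run`, and the logger name are hypothetical; check `logging_config.py` for the real decorators.

```python
import asyncio
import functools
import logging
import time
from typing import Any, Awaitable, Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("mcp_bigquery.timing")  # hypothetical logger name


def log_duration(func: Callable[..., Awaitable[T]]) -> Callable[..., Awaitable[T]]:
    """Log how long an async function takes, even when it raises."""

    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> T:
        start = time.perf_counter()
        try:
            return await func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.debug("%s took %.1f ms", func.__qualname__, elapsed_ms)

    return wrapper


@log_duration
async def fake_dry_run(sql: str) -> dict:
    await asyncio.sleep(0)  # stands in for a BigQuery dry-run call
    return {"sql": sql}
```

`functools.wraps` preserves the wrapped function's name and docstring, so tool introspection and tracebacks stay readable.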
#### SQL Analyzer (`sql_analyzer.py`) - v0.3.0
- SQLAnalyzer class for static SQL analysis
- Uses sqlparse to tokenize and parse SQL into a token tree
- Complexity scoring algorithm
- BigQuery-specific syntax support
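For a rough picture of what a complexity score can measure, here is a toy scorer that counts JOINs, nested SELECTs, GROUP BY clauses, and window functions with regular expressions and combines them with weights. It is not the `SQLAnalyzer` algorithm, which uses sqlparse; the features, weights, and the `complexity_score` name are invented for illustration, and regexes will miscount keywords inside strings or comments.

```python
import re
from typing import Dict

# Hypothetical weights, chosen only for illustration
WEIGHTS: Dict[str, int] = {"join": 2, "nested_select": 3, "group_by": 1, "window": 2}

PATTERNS: Dict[str, re.Pattern] = {
    "join": re.compile(r"\bJOIN\b", re.IGNORECASE),
    "nested_select": re.compile(r"\(\s*SELECT\b", re.IGNORECASE),  # subqueries and CTE bodies
    "group_by": re.compile(r"\bGROUP\s+BY\b", re.IGNORECASE),
    "window": re.compile(r"\bOVER\s*\(", re.IGNORECASE),
}


def complexity_score(sql: str) -> Dict[str, int]:
    """Return per-feature counts plus a weighted 'score' total."""
    counts = {name: len(pattern.findall(sql)) for name, pattern in PATTERNS.items()}
    counts["score"] = sum(WEIGHTS[name] * n for name, n in counts.items())
    return counts
```

For `SELECT a, COUNT(*) FROM t JOIN u ON t.id = u.id WHERE a IN (SELECT a FROM v) GROUP BY a`, this yields one JOIN, one nested SELECT, and one GROUP BY, for a score of 6.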
#### Schema Explorer Package (`schema_explorer/`) - updated v0.4.2
- `datasets.py`, `tables.py`, and `describe.py` split responsibilities for dataset listing, table aggregation, and schema formatting.
- `_formatters.py` centralizes shared serializers (timestamps, partitions, nested schema trees).
- Modules rely on the client factory plus `validators`/`exceptions` and never import each other, preserving clean boundaries.
#### Information Schema Package (`info_schema/`) - updated v0.4.2
- `_templates.py` stores INFORMATION_SCHEMA SQL patterns.
- `queries.py` handles templating, dry-runs, dependency extraction, and error normalization.
- `performance.py` inspects query plans to emit heuristics and `optimization_suggestions`.
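The template-plus-validation approach can be sketched as follows. Identifiers are checked against conservative patterns before being interpolated into the FROM clause, which cannot take query parameters, while the optional table filter is passed as a named parameter. The function name, the regexes, and the returned `{"sql", "params"}` shape are assumptions, not the API of `queries.py`.

```python
import re
from typing import Any, Dict, Optional

# Conservative patterns (assumptions); they reject some exotic but valid IDs
PROJECT_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")
DATASET_RE = re.compile(r"[A-Za-z0-9_]{1,1024}")

COLUMNS_TEMPLATE = (
    "SELECT table_name, column_name, data_type, is_nullable\n"
    "FROM `{project}`.{dataset}.INFORMATION_SCHEMA.COLUMNS\n"
    "{where}"
    "ORDER BY table_name, ordinal_position"
)


def build_columns_query(project: str, dataset: str, table: Optional[str] = None) -> Dict[str, Any]:
    """Return SQL plus named parameters for a dataset's column metadata."""
    if not PROJECT_RE.fullmatch(project):
        raise ValueError(f"invalid project id: {project!r}")
    if not DATASET_RE.fullmatch(dataset):
        raise ValueError(f"invalid dataset id: {dataset!r}")
    where = "WHERE table_name = @table_name\n" if table else ""
    params = {"table_name": table} if table else {}
    sql = COLUMNS_TEMPLATE.format(project=project, dataset=dataset, where=where)
    return {"sql": sql, "params": params}
```

With the BigQuery client, `params` would typically become `bigquery.ScalarQueryParameter("table_name", "STRING", table)` entries in the job config, keeping user-supplied table names out of the SQL text.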
### Error Handling
Standard error format:
```python
{
    "error": {
        "code": "INVALID_SQL",
        "message": "Human-readable error",
        "location": {"line": 1, "column": 10},
        "details": []  # Optional
    }
}
```
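BigQuery error messages typically end with a position such as `at [1:10]`. A small helper that builds the standard error payload and fills `location` from that suffix might look like this; the `make_error` name and the regex are assumptions, not project code.

```python
import re
from typing import Any, Dict, List, Optional

# Matches the "[line:column]" suffix BigQuery usually appends to syntax errors
LOCATION_RE = re.compile(r"\[(\d+):(\d+)\]")


def make_error(code: str, message: str, details: Optional[List[Any]] = None) -> Dict[str, Any]:
    """Build the standard error payload, adding a location when one is found."""
    error: Dict[str, Any] = {"code": code, "message": message}
    match = LOCATION_RE.search(message)
    if match:
        error["location"] = {"line": int(match.group(1)), "column": int(match.group(2))}
    if details:
        error["details"] = details
    return {"error": error}
```

Centralizing this in one helper is what keeps the error format consistent across tools.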
## Contributing Guidelines
1. **Open an issue first** - Discuss major changes before implementing
2. **Follow existing patterns** - Maintain consistency with current code
3. **Add tests** - All new features need test coverage
4. **Update docs** - Keep documentation in sync with code
5. **One feature per PR** - Keep pull requests focused
## Release Process
1. Update version in `pyproject.toml` and `src/mcp_bigquery/__init__.py`
2. Update CHANGELOG in README.md
3. Create and push a git tag
4. Build and publish to PyPI
5. Create GitHub release
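Step 1 is easy to get wrong. A pre-release check can compare the two version strings before tagging. The sketch below assumes `pyproject.toml` has a static `version = "..."` line and `__init__.py` defines `__version__ = "..."`; it does not apply if the project uses dynamic versioning, and the function names are hypothetical. The regex is naive: it takes the first line starting with `version =`, wherever it appears.

```python
import re
from pathlib import Path
from typing import Optional

PYPROJECT_RE = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)
INIT_RE = re.compile(r'^__version__\s*=\s*["\']([^"\']+)["\']', re.MULTILINE)


def extract(pattern: re.Pattern, text: str) -> Optional[str]:
    match = pattern.search(text)
    return match.group(1) if match else None


def versions_match(pyproject_text: str, init_text: str) -> bool:
    """True when both files declare the same, non-missing version."""
    project_version = extract(PYPROJECT_RE, pyproject_text)
    return project_version is not None and project_version == extract(INIT_RE, init_text)


def check_repo(root: Path = Path(".")) -> bool:
    """Read both files from a checkout and compare their versions."""
    return versions_match(
        (root / "pyproject.toml").read_text(encoding="utf-8"),
        (root / "src" / "mcp_bigquery" / "__init__.py").read_text(encoding="utf-8"),
    )
```

Running `check_repo()` from the repository root before creating the tag catches a forgotten bump in either file.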
## Getting Help
- **Issues**: [GitHub Issues](https://github.com/caron14/mcp-bigquery/issues)
- **Discussions**: [GitHub Discussions](https://github.com/caron14/mcp-bigquery/discussions)
- **Documentation**: This guide and API reference
## License
MIT License. See the LICENSE file for details.