# MCP Development Guidelines
## Model Context Protocol (MCP) Architecture
### Core Concepts
- **MCP Server**: FastAPI application with integrated MCP server via FastMCP
- **Tools**: Python functions that expose Databricks capabilities to AI assistants
- **Prompts**: Markdown files that become MCP prompts for AI context
- **Authentication**: OAuth flow through Databricks Apps for secure access
### Project Structure
- [server/app.py](mdc:server/app.py) - Main FastAPI app with integrated MCP server
- [server/tools/](mdc:server/tools/) - MCP tools organized by functionality
- [prompts/](mdc:prompts/) - Markdown files that become MCP prompts
- [dba_mcp_proxy/](mdc:dba_mcp_proxy/) - Local proxy for Claude CLI integration
### MCP Tool Development
#### Simple Tool Function Pattern
```python
from typing import Dict, Any

from databricks.sdk import WorkspaceClient
from fastmcp import MCPServer


def load_module_tools(mcp_server: MCPServer):
    """Register tools from this module with the MCP server."""

    @mcp_server.tool
    def databricks_tool(param1: str, param2: int = 10) -> Dict[str, Any]:
        """
        Brief description of what this tool does.

        Args:
            param1: Description of parameter
            param2: Description with default value

        Returns:
            Dictionary with result or error information
        """
        try:
            # Direct Databricks SDK call
            client = WorkspaceClient()

            # Perform Databricks operation
            result = client.some_api.method(param1, param2)

            return {
                "success": True,
                "data": result,
                "message": "Operation completed successfully",
            }
        except Exception as e:
            return {
                "success": False,
                "error": str(e),
                "message": "Operation failed",
            }
```
#### Tool System Architecture
The modular tools system (`server/tools/`) is organized into specialized modules:
- `core.py` - Health checks and basic operations
- `sql_operations.py` - SQL warehouse and query tools
- `unity_catalog.py` - Unity Catalog operations (catalogs, schemas, tables)
- `jobs_pipelines.py` - Job and DLT pipeline management
- `workspace_files.py` - Workspace file operations
- `dashboards.py` - **Comprehensive dashboard management tools** (Lakeview dashboards only; legacy dashboards are not supported)
- `repositories.py` - Git repository integration
- `data_management.py` - DBFS and data operations (commented out)
- `governance.py` - Governance tools (commented out)
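For orientation, here is a hedged sketch of what a tool inside `unity_catalog.py` might look like (the `list_catalogs` name and the returned fields are illustrative, not a description of the existing code); it follows the same `load_module_tools` convention used throughout:
```python
from typing import Any, Dict

from databricks.sdk import WorkspaceClient


def load_module_tools(mcp_server):
    """Register Unity Catalog tools with the MCP server."""

    @mcp_server.tool
    def list_catalogs() -> Dict[str, Any]:
        """List Unity Catalog catalogs visible to the current user."""
        try:
            client = WorkspaceClient()
            # client.catalogs.list() yields CatalogInfo objects
            catalogs = [{"name": c.name, "comment": c.comment} for c in client.catalogs.list()]
            return {"success": True, "data": catalogs}
        except Exception as e:
            return {"success": False, "error": str(e)}
```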
### Adding New Tools
Tools are registered automatically when their module's `load_module_tools` function runs at startup, so adding a new tool is just a matter of adding a function to the right module. Follow the existing patterns:
```python
def load_module_tools(mcp_server):
    """Register tools from this module."""

    @mcp_server.tool
    def your_new_tool(param: str) -> dict:
        """Tool description for Claude."""
        # Direct Databricks SDK implementation
        return {"result": "data"}
```
**Key principles:**
- Direct Databricks SDK calls (no wrappers)
- Simple error handling with try/except
- Return dictionaries with consistent structure
- No decorators, no abstractions, no magic
### MCP Server Configuration
#### FastAPI Integration
```python
from fastapi import FastAPI
from fastmcp import MCPServer

from server.tools import core, sql_operations, unity_catalog

app = FastAPI(title="Databricks MCP Server")

# Initialize MCP server
mcp_server = MCPServer()

# Load tools from modules
core.load_module_tools(mcp_server)
sql_operations.load_module_tools(mcp_server)
unity_catalog.load_module_tools(mcp_server)

# Mount MCP server
app.mount("/mcp", mcp_server.app)
```
### Prompt Development
#### Markdown Prompt Pattern
````markdown
# Databricks SQL Operations
You can help users execute SQL queries and manage SQL warehouses in their Databricks workspace.
## Available Operations
### Execute SQL Query
- **Tool**: `execute_sql_query`
- **Parameters**:
  - `query`: SQL query to execute
  - `warehouse_id`: ID of the SQL warehouse to use
- **Returns**: Query results and execution status
### List SQL Warehouses
- **Tool**: `list_sql_warehouses`
- **Parameters**: None
- **Returns**: List of available SQL warehouses
## Usage Examples
1. To execute a simple query:
```
execute_sql_query(
    query="SELECT * FROM my_table LIMIT 10",
    warehouse_id="1234567890"
)
```
2. To list available warehouses:
```
list_sql_warehouses()
```
## Best Practices
- Always specify a warehouse_id when executing queries
- Use LIMIT clauses for large result sets
- Handle query errors gracefully and report them clearly to the user
````
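At startup the markdown files in `prompts/` have to be registered with the MCP server. Here is a minimal hedged sketch of one way to do that, assuming the server object exposes a `prompt` decorator analogous to `tool` (the `load_prompts` helper and its behavior are illustrative, not the project's actual loader):
```python
from pathlib import Path


def load_prompts(mcp_server, prompts_dir: str = "prompts") -> None:
    """Register each markdown file in prompts/ as an MCP prompt."""
    for md_file in sorted(Path(prompts_dir).glob("*.md")):
        content = md_file.read_text(encoding="utf-8")

        def make_prompt(text: str):
            # Factory avoids the late-binding pitfall of closures in a loop
            def prompt_fn() -> str:
                return text
            return prompt_fn

        # Register under the file stem, e.g. "sql_operations" for sql_operations.md
        mcp_server.prompt(name=md_file.stem)(make_prompt(content))
```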
### Authentication Flow
#### Direct OAuth Integration
```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import DatabricksError


def get_authenticated_client() -> WorkspaceClient:
    """
    Get authenticated Databricks workspace client.

    The client automatically uses OAuth tokens from Databricks Apps.
    """
    try:
        return WorkspaceClient()
    except DatabricksError as e:
        raise Exception(f"Authentication failed: {e}")
```
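Inside a Databricks App, `WorkspaceClient()` picks up the OAuth credentials injected by the platform. For local development the client falls back to standard Databricks SDK configuration; a small sketch using environment variables (the values are placeholders, and this path is only for running the server outside Databricks Apps):
```python
import os

from databricks.sdk import WorkspaceClient

# Local development only: explicit host and personal access token.
# In production, plain WorkspaceClient() lets Databricks Apps supply OAuth tokens.
client = WorkspaceClient(
    host=os.environ["DATABRICKS_HOST"],    # e.g. https://<workspace>.cloud.databricks.com
    token=os.environ["DATABRICKS_TOKEN"],  # personal access token for local testing
)

print(client.current_user.me().user_name)
```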
### Error Handling
#### Simple Error Responses
```python
from datetime import datetime, timezone
from typing import Any, Dict


def handle_databricks_error(operation: str, error: Exception) -> Dict[str, Any]:
    """Standard error handling for Databricks operations."""
    return {
        "success": False,
        "error": str(error),
        "operation": operation,
        "message": f"Failed to {operation}",
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```
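A tool would then delegate its `except` branch to this helper. A short sketch (the `list_warehouses` tool is illustrative and assumes `handle_databricks_error` is importable from the same module):
```python
from typing import Any, Dict

from databricks.sdk import WorkspaceClient


def list_warehouses() -> Dict[str, Any]:
    """List SQL warehouses, delegating failures to the shared error handler."""
    try:
        client = WorkspaceClient()
        warehouses = [{"id": w.id, "name": w.name} for w in client.warehouses.list()]
        return {"success": True, "data": warehouses}
    except Exception as e:
        return handle_databricks_error("list SQL warehouses", e)
```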
### Testing MCP Tools
#### Simple Unit Test Pattern
```python
from unittest.mock import Mock, patch

from server.tools.core import health_check


def test_health_check_success():
    """Test successful health check."""
    with patch("server.tools.core.WorkspaceClient") as mock_client:
        mock_client.return_value.current_user.me.return_value = Mock(
            user_name="test_user",
            display_name="Test User",
        )

        result = health_check()

        assert result["success"] is True
        assert "status" in result["data"]
        assert result["data"]["user"] == "test_user"


def test_health_check_failure():
    """Test health check with authentication failure."""
    with patch("server.tools.core.WorkspaceClient") as mock_client:
        mock_client.side_effect = Exception("Authentication failed")

        result = health_check()

        assert result["success"] is False
        assert "error" in result
        assert "Authentication failed" in result["error"]
```
### Best Practices
#### Tool Design
- **Single Responsibility**: Each tool should do one thing well
- **Clear Documentation**: Include comprehensive docstrings
- **Type Safety**: Use type hints for all parameters and return values
- **Simple Error Handling**: Always handle exceptions and return structured responses
- **Idempotency**: Tools should be safe to call multiple times
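To make the idempotency point concrete, here is a hedged sketch of a create-style tool that is safe to call repeatedly (the `ensure_schema` name is illustrative, and it assumes an `mcp_server` in scope as in the earlier examples):
```python
from typing import Any, Dict

from databricks.sdk import WorkspaceClient


@mcp_server.tool
def ensure_schema(catalog_name: str, schema_name: str) -> Dict[str, Any]:
    """Create a Unity Catalog schema only if it does not already exist."""
    try:
        client = WorkspaceClient()
        existing = {s.name for s in client.schemas.list(catalog_name=catalog_name)}
        if schema_name in existing:
            # Re-running the tool is a no-op rather than an error
            return {"success": True, "data": {"created": False}}
        client.schemas.create(name=schema_name, catalog_name=catalog_name)
        return {"success": True, "data": {"created": True}}
    except Exception as e:
        return {"success": False, "error": str(e)}
```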
#### Performance
- **Direct SDK Calls**: Call Databricks SDK directly, no wrapper layers
- **Simple Operations**: Keep operations straightforward and focused
- **Resource Cleanup**: Properly close connections and clean up resources
#### Security
- **Input Validation**: Validate all input parameters (see the sketch after this list)
- **Error Sanitization**: Don't expose sensitive information in error messages
- **Authentication**: Always verify authentication before operations
- **Authorization**: Check permissions before performing operations
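A minimal sketch combining the input-validation and error-sanitization points above (the `get_table_details` tool and its checks are illustrative; it assumes an `mcp_server` in scope as in the earlier examples):
```python
from typing import Any, Dict

from databricks.sdk import WorkspaceClient


@mcp_server.tool
def get_table_details(full_table_name: str) -> Dict[str, Any]:
    """Describe a Unity Catalog table given a catalog.schema.table name."""
    # Input validation: fail fast with a clear, non-sensitive message
    if full_table_name.count(".") != 2:
        return {"success": False, "error": "Expected a fully qualified catalog.schema.table name"}
    try:
        client = WorkspaceClient()
        table = client.tables.get(full_name=full_table_name)
        return {"success": True, "data": {"name": table.full_name, "type": str(table.table_type)}}
    except Exception as e:
        # Error sanitization: return only the message, never tokens or credentials
        return {"success": False, "error": str(e)}
```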
### Forbidden MCP Patterns (DO NOT ADD THESE)
❌ **Complex tool abstractions** or wrapper layers around Databricks SDK
❌ **Custom authentication systems** - use Databricks OAuth only
❌ **Complex error handling systems** - keep error handling simple
❌ **Tool factories** or complex tool registration patterns
❌ **Custom MCP extensions** - use standard MCP patterns only
❌ **Complex prompt generation** - keep prompts simple and direct
### Required MCP Patterns (ALWAYS USE THESE)
✅ **Direct SDK calls** - call Databricks SDK directly
✅ **Simple tool functions** - one function per tool
✅ **Basic error handling** - try/except with simple return dictionaries
✅ **Clear documentation** - simple docstrings for each tool
✅ **Standard MCP patterns** - follow MCP specification exactly
✅ **Simple prompts** - clear, direct markdown files
### Code Review Questions
Before adding any MCP tool, ask yourself:
- "Is this the simplest way to expose this Databricks functionality?"
- "Would a new developer understand this tool immediately?"
- "Am I adding abstraction for a real need or hypothetical flexibility?"
- "Can I solve this with direct Databricks SDK calls?"
- "Does this follow the existing MCP patterns in the codebase?"
### Examples of Good vs Bad MCP Tools
**❌ BAD (Over-engineered):**
```python
class AbstractDatabricksTool(ABC):
    @abstractmethod
    def execute(self, params: Dict[str, Any]) -> Dict[str, Any]: ...


class SQLQueryTool(AbstractDatabricksTool):
    def __init__(self, client_factory: ClientFactory): ...

    def execute(self, params: Dict[str, Any]) -> Dict[str, Any]: ...
```
**✅ GOOD (Simple):**
```python
@mcp_server.tool
def execute_sql_query(query: str, warehouse_id: str) -> Dict[str, Any]:
    """Execute a SQL query on Databricks."""
    try:
        client = WorkspaceClient()
        result = client.statement_execution.execute_statement(
            statement=query, warehouse_id=warehouse_id
        )
        return {"success": True, "data": result}
    except Exception as e:
        return {"success": False, "error": str(e)}
```
## Summary: MCP Development Principles
✅ **Readable**: Any developer can understand the MCP tool immediately
✅ **Maintainable**: Simple patterns that are easy to modify
✅ **Focused**: Each tool has a single, clear purpose
✅ **Direct**: No unnecessary abstractions or indirection
✅ **Practical**: Exposes Databricks functionality without over-engineering
When in doubt, choose the **simpler** MCP tool. Your future self (and your teammates) will thank you.