🤖 Databricks Custom MCP Demo
Databricks MCP Server
A Model Context Protocol (MCP) server for Databricks that provides access to Databricks functionality via the MCP protocol. This allows LLM-powered tools to interact with Databricks clusters, jobs, notebooks, and more.
Credit for the initial version goes to @JustTryAI and Markov
Features
MCP Protocol Support: Implements the MCP protocol to allow LLMs to interact with Databricks
Databricks API Integration: Provides access to Databricks REST API functionality
Tool Registration: Exposes Databricks functionality as MCP tools
Async Support: Built with asyncio for efficient operation
Available Tools
The Databricks MCP Server exposes the following tools:
Cluster Management
list_clusters: List all Databricks clusters
create_cluster: Create a new Databricks cluster
terminate_cluster: Terminate a Databricks cluster
get_cluster: Get information about a specific Databricks cluster
start_cluster: Start a terminated Databricks cluster
Job Management
list_jobs: List all Databricks jobs
run_job: Run a Databricks job
run_notebook: Submit and wait for a one-time notebook run
create_job: Create a new Databricks job
delete_job: Delete a Databricks job
get_run_status: Get status information for a job run
list_job_runs: List recent runs for a job
cancel_run: Cancel a running job
Workspace Files
list_notebooks: List notebooks in a workspace directory
export_notebook: Export a notebook from the workspace
import_notebook: Import a notebook into the workspace
delete_workspace_object: Delete a notebook or directory
get_workspace_file_content: Retrieve content of any workspace file (JSON, notebooks, scripts, etc.)
get_workspace_file_info: Get metadata about workspace files
File System
list_files: List files and directories in a DBFS path
dbfs_put: Upload a small file to DBFS
dbfs_delete: Delete a DBFS file or directory
Cluster Libraries
install_library: Install libraries on a cluster
uninstall_library: Remove libraries from a cluster
list_cluster_libraries: Check installed libraries on a cluster
Repos
create_repo: Clone a Git repository
update_repo: Update an existing repo
list_repos: List repos in the workspace
pull_repo: Pull the latest commit for a Databricks repo
Unity Catalog
list_catalogs: List catalogs
create_catalog: Create a catalog
list_schemas: List schemas in a catalog
create_schema: Create a schema
list_tables: List tables in a schema
create_table: Execute a CREATE TABLE statement
get_table_lineage: Fetch lineage information for a table
Composite
sync_repo_and_run_notebook: Pull a repo and execute a notebook in one call
SQL Execution
execute_sql: Execute a SQL statement (warehouse_id optional if DATABRICKS_WAREHOUSE_ID env var is set)
Manual Installation
Prerequisites
Python 3.10 or higher
uv package manager (recommended for MCP servers)
Setup
Install uv if you don't have it already:

```bash
# MacOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (in PowerShell)
irm https://astral.sh/uv/install.ps1 | iex
```

Restart your terminal after installation.
Clone the repository:

```bash
git clone https://github.com/robkisk/databricks-mcp.git
cd databricks-mcp
```

Run the setup script:

```bash
# Linux/Mac
./scripts/setup.sh

# Windows (PowerShell)
.\scripts\setup.ps1
```

The setup script will:
Install uv if not already installed
Create a virtual environment
Install all project dependencies
Verify the installation works
Alternative manual setup:

```bash
# Create and activate virtual environment
uv venv

# On Windows
.\.venv\Scripts\activate

# On Linux/Mac
source .venv/bin/activate

# Install dependencies in development mode
uv pip install -e .

# Install development dependencies
uv pip install -e ".[dev]"
```

Set up environment variables:

```bash
# Required variables
# Windows
set DATABRICKS_HOST=https://your-databricks-instance.azuredatabricks.net
set DATABRICKS_TOKEN=your-personal-access-token

# Linux/Mac
export DATABRICKS_HOST=https://your-databricks-instance.azuredatabricks.net
export DATABRICKS_TOKEN=your-personal-access-token

# Optional: Set default SQL warehouse (makes warehouse_id optional in execute_sql)
export DATABRICKS_WAREHOUSE_ID=sql_warehouse_12345
```

You can also create an .env file based on the .env.example template.
Running the MCP Server
Standalone
To start the MCP server directly for testing or development, run the start_mcp_server.sh script from the project root (the same script the client configurations below point at).
This is useful for seeing direct output and logs.
Integrating with AI Clients
To use this server with AI clients like Cursor or Claude CLI, you need to register it.
Cursor Setup
Open your global MCP configuration file located at ~/.cursor/mcp.json (create it if it doesn't exist).

Add the following entry within the mcpServers object, replacing placeholders with your actual values and ensuring the path to start_mcp_server.sh is correct:

```json
{
  "mcpServers": {
    // ... other servers ...
    "databricks-mcp-local": {
      "command": "/absolute/path/to/your/project/databricks-mcp-server/start_mcp_server.sh",
      "args": [],
      "env": {
        "DATABRICKS_HOST": "https://your-databricks-instance.azuredatabricks.net",
        "DATABRICKS_TOKEN": "dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
        "DATABRICKS_WAREHOUSE_ID": "sql_warehouse_12345",
        "RUNNING_VIA_CURSOR_MCP": "true"
      }
    }
    // ... other servers ...
  }
}
```

Important: Replace /absolute/path/to/your/project/databricks-mcp-server/ with the actual absolute path to this project directory on your machine, and replace the DATABRICKS_HOST and DATABRICKS_TOKEN values with your credentials.

Save the file and restart Cursor. You can now invoke tools using databricks-mcp-local:<tool_name> (e.g., databricks-mcp-local:list_jobs).
Claude CLI Setup
Use the claude mcp add command to register the server. Provide your credentials using the -e flag for environment variables and point the command to the start_mcp_server.sh script using -- followed by the absolute path:

```bash
claude mcp add databricks-mcp-local \
  -s user \
  -e DATABRICKS_HOST="https://your-databricks-instance.azuredatabricks.net" \
  -e DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
  -e DATABRICKS_WAREHOUSE_ID="sql_warehouse_12345" \
  -- /absolute/path/to/your/project/databricks-mcp-server/start_mcp_server.sh
```

Important: Replace /absolute/path/to/your/project/databricks-mcp-server/ with the actual absolute path to this project directory on your machine, and replace the DATABRICKS_HOST and DATABRICKS_TOKEN values with your credentials.

You can now invoke tools using databricks-mcp-local:<tool_name> in your Claude interactions.
Querying Databricks Resources
The repository includes utility scripts for quickly viewing Databricks resources.
Usage Examples
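The sketches below use the MCP Python SDK (the mcp package) to call the server's tools over stdio. They are illustrative only: the start script path, tool argument names, and example values are assumptions, so verify them against the schemas the server reports via list_tools.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumed path and credentials; substitute your own values.
SERVER_PARAMS = StdioServerParameters(
    command="/absolute/path/to/your/project/databricks-mcp-server/start_mcp_server.sh",
    env={
        "DATABRICKS_HOST": "https://your-databricks-instance.azuredatabricks.net",
        "DATABRICKS_TOKEN": "dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
        "DATABRICKS_WAREHOUSE_ID": "sql_warehouse_12345",
    },
)


async def main() -> None:
    # Launch the server as a subprocess and open an MCP session over stdio.
    async with stdio_client(SERVER_PARAMS) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Print the registered tool names and their input schemas.
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, tool.inputSchema)


if __name__ == "__main__":
    asyncio.run(main())
```

The per-tool snippets that follow assume a session opened this way.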
SQL Execution with Default Warehouse
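A minimal sketch of calling the execute_sql tool through a session like the one above. The statement argument name is an assumption drawn from typical SQL-tool schemas; warehouse_id is omitted on the assumption that DATABRICKS_WAREHOUSE_ID is set in the server's environment.

```python
from mcp import ClientSession


async def run_sql(session: ClientSession) -> None:
    # warehouse_id is left out because the server falls back to
    # DATABRICKS_WAREHOUSE_ID when it is set (see the tool list above).
    result = await session.call_tool(
        "execute_sql",
        arguments={"statement": "SELECT current_catalog(), current_schema()"},
    )
    print(result.content)
```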
Workspace File Content Retrieval
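A sketch of fetching a file's contents with get_workspace_file_content. The workspace_path argument name and the example path are hypothetical; check the tool's input schema for the real parameter names.

```python
from mcp import ClientSession


async def read_workspace_file(session: ClientSession) -> None:
    # Hypothetical argument name and path, for illustration only.
    result = await session.call_tool(
        "get_workspace_file_content",
        arguments={"workspace_path": "/Users/someone@example.com/config.json"},
    )
    print(result.content)
```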
Repo Sync and Notebook Execution
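A sketch of the composite sync_repo_and_run_notebook tool, which pulls a repo and then runs a notebook in one call. The argument names and values here are hypothetical.

```python
from mcp import ClientSession


async def sync_and_run(session: ClientSession) -> None:
    # Hypothetical repo ID and notebook path, for illustration only.
    result = await session.call_tool(
        "sync_repo_and_run_notebook",
        arguments={
            "repo_id": 123456789,
            "notebook_path": "/Repos/someone@example.com/my-repo/etl/daily_load",
        },
    )
    print(result.content)
```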
Create Nightly ETL Job
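A sketch of defining a scheduled job with create_job. The payload shape below mirrors the Databricks Jobs API; whether the tool accepts exactly these argument names is an assumption, so confirm against its schema.

```python
from mcp import ClientSession


async def create_nightly_etl(session: ClientSession) -> None:
    # Job settings modeled on the Databricks Jobs API; the field names passed
    # to the tool are assumptions for illustration only.
    result = await session.call_tool(
        "create_job",
        arguments={
            "name": "nightly-etl",
            "schedule": {
                "quartz_cron_expression": "0 0 2 * * ?",  # 02:00 every night
                "timezone_id": "UTC",
            },
            "tasks": [
                {
                    "task_key": "etl",
                    "notebook_task": {
                        "notebook_path": "/Repos/someone@example.com/my-repo/etl/daily_load"
                    },
                    "existing_cluster_id": "1234-567890-abcde123",
                }
            ],
        },
    )
    print(result.content)
```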
Project Structure
See docs/project_structure.md for a more detailed view of the project structure.
Development
Code Standards
Python code follows PEP 8 style guide with a maximum line length of 100 characters
Use 4 spaces for indentation (no tabs)
Use double quotes for strings
All classes, methods, and functions should have Google-style docstrings
Type hints are required for all code except tests
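A short illustration of these conventions, using a hypothetical helper (not part of the codebase): double-quoted strings, 4-space indentation, type hints, and a Google-style docstring.

```python
def normalize_cluster_name(name: str, max_length: int = 100) -> str:
    """Normalize a cluster name for display.

    Args:
        name: Raw cluster name from the Databricks API.
        max_length: Maximum length of the returned string.

    Returns:
        The trimmed, whitespace-normalized cluster name.
    """
    normalized = " ".join(name.split())
    return normalized[:max_length]
```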
Linting
Linting tools are installed with the project's development dependencies (uv pip install -e ".[dev]").
Testing
The project uses pytest for testing. Run the tests with pytest from the activated virtual environment.
A minimum code coverage of 80% is the goal for the project.
Documentation
API documentation is generated using Sphinx and can be found in the docs/api directory
All code includes Google-style docstrings
See the examples/ directory for usage examples
Examples
Check the examples/ directory for usage examples. Run them from the activated virtual environment with the Databricks environment variables set.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Ensure your code follows the project's coding standards
Add tests for any new functionality
Update documentation as necessary
Verify all tests pass before submitting