Databricks MCP Server

by OliveriGuido

Databricks MCP Server — Natural-Language Analytics POC

A small Model Context Protocol server that lets an LLM client (e.g. Claude Desktop) answer business questions in natural language over a Databricks dataset — without writing SQL by hand.

It runs against the public samples.nyctaxi.trips dataset that ships with every Databricks workspace, so it's reproducible by anyone.

What it exposes (the three MCP primitives)

| Primitive | Name | Purpose |
|---|---|---|
| Tool | run_query | Executes a read-only SQL query against samples.nyctaxi.trips and returns the rows. |
| Resource | schema://nyctaxi | Curated schema + metric definitions and gotchas: the context layer that makes the generated SQL correct. |
| Prompts | revenue_by_month, busiest_pickup_zones, trips_by_hour, fare_distance_summary | Ready-made business questions. |

Safety / governance

Two layers, on purpose:

  1. App-level guard (is_read_only): only a single SELECT/WITH statement is accepted; any write/DDL keyword (INSERT, UPDATE, DROP, ...) is rejected, and a LIMIT 1000 is appended when missing.

  2. The real guarantee: connect with a Databricks token whose grants are read-only on the catalog. App guards reduce footguns; permissions are what actually protect the data. Never give an LLM a write-capable credential.
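A minimal sketch of what an app-level guard along these lines could look like. This is not the repo's actual is_read_only. The keyword list beyond INSERT, UPDATE and DROP, and the helper name with_default_limit, are assumptions made for illustration. The check is purely textual and deliberately conservative: a forbidden keyword or a semicolon inside a string literal is also rejected.

```python
import re

# Assumed keyword list: the README names only INSERT, UPDATE, DROP, "...".
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|CREATE|TRUNCATE|GRANT|REVOKE)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Return True for a single SELECT/WITH statement with no write/DDL keyword."""
    body = sql.strip().rstrip(";").strip()
    if not body or ";" in body:  # empty input, or more than one statement
        return False
    if not re.match(r"(SELECT|WITH)\b", body, re.IGNORECASE):
        return False
    return FORBIDDEN.search(body) is None


def with_default_limit(sql: str, limit: int = 1000) -> str:
    """Append a LIMIT clause when the query text has none (hypothetical helper)."""
    body = sql.strip().rstrip(";").strip()
    if re.search(r"\bLIMIT\s+\d+\b", body, re.IGNORECASE):
        return body
    return f"{body} LIMIT {limit}"
```

A textual guard like this can be bypassed or can misfire, which is why the second layer (a read-only token) is the one that matters.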

Architecture

Claude Desktop  ──stdio──►  MCP server (this repo)  ──Databricks SQL connector──►  samples.nyctaxi.trips
   (client)                  tool · resource · prompts                              (read-only)

run_query doesn't open the connection in-process — it shells out to query_runner.py (subprocess.run(..., stdin=subprocess.DEVNULL, capture_output=True)). See the note below for why.

Implementation note: why run_query uses a subprocess

Both points were reproduced and verified on Windows + the FastMCP stdio transport (Claude Desktop and the MCP Inspector). Symptom in both: the tool call hangs and the client returns MCP error -32001: Request timed out at ~60s, even though the same query runs in ~4s with the connector directly.

  1. sql.connect() stalls ~60s when called inside the server process. From a clean child process it connects in ~2s; inside the FastMCP process it blocks until the client's request times out. It stalls whether it is called on the event-loop thread or on a worker thread, so it is a process-level interaction with the connector, not just a blocked event loop. Running the query in a child process avoids it. (Disabling telemetry or use_cloud_fetch does not help.)

  2. stdin=subprocess.DEVNULL is required on the child. A stdio MCP server's own stdin is the JSON-RPC pipe from the client. A child started with the default stdin=None inherits that pipe handle and hangs until the client gives up (~60s). Detaching stdin makes it return at query speed. capture_output=True already detaches stdout/stderr; stdin is the one that's easy to miss, so moving the query into a subprocess without stdin=subprocess.DEVNULL does not fix the hang.
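The two points above can be sketched as a small wrapper around subprocess.run. This is an illustration, not the repo's code. The function name run_child and the timeout value are assumptions, and how the query text is handed to query_runner.py is not stated in this README, so the example uses a stand-in child that just reads its own stdin.

```python
import subprocess
import sys


def run_child(args: list[str], timeout: float = 55.0) -> str:
    """Run a child process detached from the server's stdio and return its stdout."""
    result = subprocess.run(
        args,
        stdin=subprocess.DEVNULL,  # do not inherit the JSON-RPC pipe from the client
        capture_output=True,       # stdout/stderr go to fresh pipes, not the server's
        text=True,
        timeout=timeout,           # assumed value, chosen to fail before the ~60s client timeout
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip() or f"child exited with {result.returncode}")
    return result.stdout


if __name__ == "__main__":
    # Stand-in child: the real server would invoke query_runner.py instead.
    # With stdin detached, the read returns an empty string immediately.
    child = [sys.executable, "-c", "import sys; print(repr(sys.stdin.read()))"]
    print(run_child(child))
```

Because the child's stdin is the null device, a child that reads stdin sees end-of-file right away instead of waiting on the client's pipe.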

Gotcha: don't launch the Inspector from Git Bash on Windows. MSYS2 rewrites the POSIX-looking DATABRICKS_HTTP_PATH (/sql/1.0/warehouses/… → C:/Program Files/Git/sql/1.0/warehouses/…), so the server gets a 404, not a timeout. Use PowerShell or cmd. Claude Desktop passes env vars directly and is unaffected.
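One way to catch a rewritten path early is a startup check on the environment. The sketch below is a hypothetical helper, not part of the repo. It assumes a SQL Warehouse HTTP path always begins with /sql/, as in the examples in this README.

```python
import os

REQUIRED = ("DATABRICKS_HOST", "DATABRICKS_HTTP_PATH", "DATABRICKS_TOKEN")


def check_env(env) -> list[str]:
    """Return a list of human-readable problems; an empty list means the env looks sane."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    http_path = env.get("DATABRICKS_HTTP_PATH", "")
    # Assumption: warehouse paths start with "/sql/". A drive-letter prefix
    # suggests the shell rewrote the value before the server saw it.
    if http_path and not http_path.startswith("/sql/"):
        problems.append(
            f"DATABRICKS_HTTP_PATH does not start with /sql/ (got {http_path!r}); "
            "it may have been rewritten by the shell"
        )
    return problems


if __name__ == "__main__":
    for problem in check_env(os.environ):
        print(problem)
```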

Run it

Prereqs: Python 3.11+, uv, a Databricks workspace with a running SQL Warehouse and the samples catalog.

Windows / PowerShell (recommended on Windows — see the Git Bash gotcha above):

cd "C:\path\to\databricks-mcp"
uv sync                                    # first time only

# from SQL Warehouses -> Connection details, plus a personal access token.
# These live only in THIS PowerShell window (nothing is written to disk):
$env:DATABRICKS_HOST      = "dbc-xxxxxxxx-xxxx.cloud.databricks.com"
$env:DATABRICKS_HTTP_PATH = "/sql/1.0/warehouses/xxxxxxxxxxxxxxxx"
$env:DATABRICKS_TOKEN     = "dapixxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# launch the browser inspector, then run a query from its UI:
npx @modelcontextprotocol/inspector uv run server.py

macOS / Linux (bash or zsh):

uv sync
export DATABRICKS_HOST="adb-....azuredatabricks.net"
export DATABRICKS_HTTP_PATH="/sql/1.0/warehouses/...."
export DATABRICKS_TOKEN="dapi...."
npx @modelcontextprotocol/inspector uv run server.py

Connect to Claude Desktop

You can reach the config file in two ways:

  • Via the UI (recommended): in Claude Desktop go to Settings → Developer → Edit Config. This opens (and creates, if missing) claude_desktop_config.json in the right folder.

  • By path: edit it directly at %APPDATA%\Claude\claude_desktop_config.json (Windows) or ~/Library/Application Support/Claude/claude_desktop_config.json (macOS).

Copy the contents of claude_desktop_config.example.json into that file, fill in your real values, and restart Claude Desktop. Then ask things like:

"What were the busiest pickup zones, and how does monthly revenue trend?"
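For orientation, a Claude Desktop entry for a stdio server generally has the shape produced below. This is a sketch and not a copy of claude_desktop_config.example.json, whose contents are not shown here. The server name databricks-nyctaxi, the function build_config, and the uv --directory invocation are assumptions; prefer the repo's example file where they differ.

```python
import json


def build_config(repo_dir: str, host: str, http_path: str, token: str) -> dict:
    """Build a Claude Desktop config dict with one stdio MCP server entry."""
    return {
        "mcpServers": {
            "databricks-nyctaxi": {  # hypothetical server name
                "command": "uv",
                # Assumed launch command: run server.py from the repo directory.
                "args": ["--directory", repo_dir, "run", "server.py"],
                "env": {
                    "DATABRICKS_HOST": host,
                    "DATABRICKS_HTTP_PATH": http_path,
                    "DATABRICKS_TOKEN": token,
                },
            }
        }
    }


if __name__ == "__main__":
    # Placeholder values only; substitute your own before use.
    config = build_config(
        repo_dir="/path/to/databricks-mcp",
        host="<workspace host>",
        http_path="<warehouse http path>",
        token="<personal access token>",
    )
    print(json.dumps(config, indent=2))
```

Because the token sits in this file in plain text, the read-only grant described under Safety / governance still applies.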

Notes

  • samples.nyctaxi.trips is a public Databricks dataset; no private data is used.

  • Secrets live in env vars or the Claude Desktop config; neither is committed to the repository.
