# Configuration Guide
This guide covers all configuration options for the Databricks MCP Server.
## Table of Contents
- [Environment Variables](#environment-variables)
- [Authentication Methods](#authentication-methods)
- [MCP Client Configuration](#mcp-client-configuration)
- [Advanced Configuration](#advanced-configuration)
- [Troubleshooting](#troubleshooting)
## Environment Variables
The server is configured primarily through environment variables.
### Required Variables
```env
# Databricks workspace URL (required)
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
# Authentication (one method required)
DATABRICKS_TOKEN=dapi1234567890abcdef
```
### Optional Variables
```env
# Service Principal OAuth (alternative to token)
DATABRICKS_CLIENT_ID=your-client-id
DATABRICKS_CLIENT_SECRET=your-client-secret
# Azure AD (for Azure Databricks)
DATABRICKS_AZURE_TENANT_ID=your-tenant-id
```
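A server consuming these variables might validate them at startup along these lines. This is a minimal sketch, not the server's actual code; the variable names match the tables above, but the validation logic and `DatabricksConfig` shape are illustrative:

```typescript
// Illustrative startup validation: DATABRICKS_HOST must be set,
// plus at least one authentication method (PAT or service principal OAuth).

type DatabricksConfig = {
  host: string;
  auth:
    | { kind: "pat"; token: string }
    | { kind: "oauth"; clientId: string; clientSecret: string };
};

function loadConfig(env: Record<string, string | undefined>): DatabricksConfig {
  const host = env.DATABRICKS_HOST;
  if (!host || !host.startsWith("https://")) {
    throw new Error("DATABRICKS_HOST must be set and include https://");
  }
  if (env.DATABRICKS_TOKEN) {
    return { host, auth: { kind: "pat", token: env.DATABRICKS_TOKEN } };
  }
  if (env.DATABRICKS_CLIENT_ID && env.DATABRICKS_CLIENT_SECRET) {
    return {
      host,
      auth: {
        kind: "oauth",
        clientId: env.DATABRICKS_CLIENT_ID,
        clientSecret: env.DATABRICKS_CLIENT_SECRET,
      },
    };
  }
  throw new Error(
    "Set DATABRICKS_TOKEN, or DATABRICKS_CLIENT_ID and DATABRICKS_CLIENT_SECRET"
  );
}
```

Failing fast like this surfaces configuration mistakes in the MCP client's logs instead of as opaque tool errors later.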
## Authentication Methods
### Method 1: Personal Access Token (Recommended for Development)
Personal Access Tokens are the easiest way to get started.
#### Creating a Token
1. Log into your Databricks workspace
2. Click on your profile icon (top right)
3. Select **User Settings**
4. Go to **Access tokens** tab
5. Click **Generate new token**
6. Give it a description and optional lifetime
7. Copy the token immediately (you won't see it again)
#### Configuration
```env
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=dapi1234567890abcdef
```
**Pros:**
- Simple to set up
- Good for individual development
- Works across all Databricks clouds
**Cons:**
- Tied to individual user account
- Manual rotation required
- Not recommended for production
### Method 2: Service Principal OAuth (Recommended for Production)
Service principals provide machine-to-machine authentication without user credentials.
#### Creating a Service Principal
**AWS & GCP Databricks:**
1. Go to **Admin Console** → **Service Principals**
2. Click **Add Service Principal**
3. Enter application ID and display name
4. Generate a client secret
5. Grant necessary permissions
**Azure Databricks:**
1. Create an Azure AD application
2. Create a client secret
3. Add the application to Databricks workspace
4. Grant necessary permissions
#### Configuration
```env
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_CLIENT_ID=your-client-id
DATABRICKS_CLIENT_SECRET=your-client-secret
```
**Pros:**
- Not tied to individual users
- Better for automation
- Recommended for production
- Can be rotated programmatically
**Cons:**
- More complex setup
- Requires admin privileges to create
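Under the hood, service-principal OAuth exchanges the client ID and secret for a short-lived access token using the OAuth client-credentials grant. Databricks documents a workspace token endpoint at `POST {host}/oidc/v1/token` with the `all-apis` scope; the sketch below builds such a request without sending it, so the shape is easy to inspect. Treat the endpoint path and scope as assumptions to verify against your workspace:

```typescript
// Build (but don't send) a client-credentials token request.
// Endpoint path and "all-apis" scope follow Databricks' documented M2M OAuth flow.

function buildTokenRequest(host: string, clientId: string, clientSecret: string) {
  const url = `${host.replace(/\/$/, "")}/oidc/v1/token`;
  const body = new URLSearchParams({
    grant_type: "client_credentials",
    scope: "all-apis",
  });
  // HTTP Basic auth carries the client ID and secret
  const basic = Buffer.from(`${clientId}:${clientSecret}`).toString("base64");
  return {
    url,
    method: "POST" as const,
    headers: {
      Authorization: `Basic ${basic}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: body.toString(),
  };
}
```

Sending it is then a single `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`, and the JSON response carries the `access_token` used as a bearer token.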
### Method 3: Azure AD Token (Azure Databricks Only)
For Azure Databricks workspaces with Azure AD integration.
#### Configuration
```env
DATABRICKS_HOST=https://adb-1234567890123456.7.azuredatabricks.net
DATABRICKS_AZURE_TENANT_ID=your-tenant-id
DATABRICKS_TOKEN=azure-ad-token
```
**Note:** Azure AD tokens expire after 1 hour. Implement token refresh in production.
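One way to handle that expiry is to cache the token and refresh it a few minutes before it lapses. The sketch below shows only the generic caching logic; the token-fetching function is injected and hypothetical (for Azure AD it would wrap an MSAL or `/oauth2/v2.0/token` call, not shown here):

```typescript
// Cache a short-lived token and refresh it slightly before expiry.
// fetchToken is any async function returning a token and its lifetime in seconds.

type Token = { value: string; expiresInSeconds: number };

function makeTokenCache(
  fetchToken: () => Promise<Token>,
  skewSeconds = 300,            // refresh 5 minutes early
  now: () => number = Date.now  // injectable clock, handy for testing
) {
  let cached: { value: string; expiresAt: number } | undefined;
  return async function getToken(): Promise<string> {
    if (cached && now() < cached.expiresAt) return cached.value;
    const t = await fetchToken();
    cached = {
      value: t.value,
      expiresAt: now() + (t.expiresInSeconds - skewSeconds) * 1000,
    };
    return cached.value;
  };
}
```

The skew margin avoids sending a token that expires mid-request; 5 minutes is a common choice, not a Databricks requirement.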
## MCP Client Configuration
### Claude Desktop
Add this configuration to your Claude Desktop config file:
**macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`
**Windows:** `%APPDATA%\Claude\claude_desktop_config.json`
**Linux:** `~/.config/Claude/claude_desktop_config.json`
```json
{
"mcpServers": {
"databricks": {
"command": "databricks-mcp-server",
"env": {
"DATABRICKS_HOST": "https://your-workspace.cloud.databricks.com",
"DATABRICKS_TOKEN": "your-token"
}
}
}
}
```
### Using with .env File
If you prefer to use a `.env` file:
```json
{
"mcpServers": {
"databricks": {
"command": "sh",
"args": [
"-c",
"cd /path/to/databricks-mcp-server && node dist/index.js"
]
}
}
}
```
Then create `.env` in the server directory:
```env
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=your-token
```
### Other MCP Clients
For other MCP clients, refer to their documentation for server configuration. The general pattern is:
1. Set the command to run: `databricks-mcp-server`
2. Provide environment variables for authentication
3. Ensure the server can communicate via stdio
## Advanced Configuration
### Multiple Workspaces
To connect to multiple Databricks workspaces, create separate server instances:
```json
{
"mcpServers": {
"databricks-prod": {
"command": "databricks-mcp-server",
"env": {
"DATABRICKS_HOST": "https://prod-workspace.cloud.databricks.com",
"DATABRICKS_TOKEN": "prod-token"
}
},
"databricks-dev": {
"command": "databricks-mcp-server",
"env": {
"DATABRICKS_HOST": "https://dev-workspace.cloud.databricks.com",
"DATABRICKS_TOKEN": "dev-token"
}
}
}
}
```
### Custom Installation Path
If you installed from source or to a custom location:
```json
{
"mcpServers": {
"databricks": {
"command": "node",
"args": ["/custom/path/to/databricks-mcp-server/dist/index.js"],
"env": {
"DATABRICKS_HOST": "https://your-workspace.cloud.databricks.com",
"DATABRICKS_TOKEN": "your-token"
}
}
}
}
```
### Proxy Configuration
If your network requires a proxy:
```json
{
"mcpServers": {
"databricks": {
"command": "databricks-mcp-server",
"env": {
"DATABRICKS_HOST": "https://your-workspace.cloud.databricks.com",
"DATABRICKS_TOKEN": "your-token",
"HTTP_PROXY": "http://proxy.company.com:8080",
"HTTPS_PROXY": "http://proxy.company.com:8080"
}
}
}
}
```
## Permission Requirements
### Workspace Permissions
The token or service principal needs these permissions:
**Minimum (read-only):**
- Workspace access
- Can view clusters, jobs, notebooks
**Recommended:**
- Workspace User
- Cluster create permissions
- Job create/manage permissions
**Full functionality:**
- Workspace Admin (for user/group management)
- Unity Catalog permissions (for data operations)
### Unity Catalog Permissions
For Unity Catalog operations, ensure your principal has:
- `USAGE` on catalogs and schemas
- `SELECT` on tables (for queries)
- `CREATE` permissions (to create catalogs/schemas/tables)
- `MODIFY` permissions (to update/delete objects)
### API Access
Ensure API access is enabled:
1. Go to **Admin Console** → **Workspace Settings**
2. Under **Advanced**, enable **Personal Access Tokens**
3. Enable relevant APIs for your use case
## Workspace URL Formats
Different Databricks cloud providers use different URL formats:
**AWS:**
```
https://dbc-12345678-abcd.cloud.databricks.com
```
**Azure:**
```
https://adb-1234567890123456.7.azuredatabricks.net
```
**GCP:**
```
https://12345678901234.5.gcp.databricks.com
```
Always include `https://` in the `DATABRICKS_HOST` variable.
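Because the hostname patterns above are distinctive, tooling can branch on them when it needs to know which cloud a workspace lives on. A convenience sketch based only on those patterns (not something the server requires):

```typescript
// Infer the Databricks cloud provider from a workspace URL,
// using the hostname suffixes listed above.

type Cloud = "aws" | "azure" | "gcp" | "unknown";

function detectCloud(workspaceUrl: string): Cloud {
  const host = new URL(workspaceUrl).hostname;
  if (host.endsWith(".azuredatabricks.net")) return "azure";
  if (host.endsWith(".gcp.databricks.com")) return "gcp";
  if (host.endsWith(".cloud.databricks.com")) return "aws";
  return "unknown";
}
```

Note that `new URL(...)` throws if the `https://` scheme is missing, which doubles as a check for the rule above.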
## Security Best Practices
### Token Security
1. **Never commit tokens to version control**
```gitignore
.env
.env.local
*.env
```
2. **Use environment-specific tokens**
- Separate tokens for dev, staging, production
- Different tokens for different applications
3. **Rotate tokens regularly**
- Set expiration dates on tokens
- Rotate at least every 90 days
- Have a rotation procedure in place
4. **Use minimal permissions**
- Grant only the permissions needed
- Use separate tokens for different functions
- Review permissions periodically
### Service Principal Security
1. **Protect client secrets**
- Store in secure secret management system
- Never log or expose in error messages
- Rotate regularly
2. **Implement least privilege**
- Grant minimal necessary permissions
- Use separate SPs for different applications
- Review and audit permissions
3. **Monitor usage**
- Enable audit logging
- Monitor for unusual activity
- Set up alerts for suspicious behavior
## Troubleshooting
### Connection Issues
**Problem:** Cannot connect to Databricks workspace
**Solutions:**
1. Verify `DATABRICKS_HOST` is correct
2. Ensure URL includes `https://`
3. Check network connectivity
4. Verify proxy settings if applicable
5. Check firewall rules
### Authentication Errors
**Problem:** "Authentication failed"
**Solutions:**
1. Verify token is valid and not expired
2. Check token has correct format (starts with `dapi`)
3. Ensure API access is enabled in workspace
4. Verify service principal credentials if using OAuth
5. Check token permissions
### Permission Denied
**Problem:** "Permission denied" or "403 Forbidden"
**Solutions:**
1. Verify token has necessary workspace permissions
2. Check Unity Catalog grants for data operations
3. Ensure user/SP has access to specific resources
4. Review workspace access controls
5. Contact workspace admin for permission review
### Tool Execution Errors
**Problem:** Tool calls fail or return errors
**Solutions:**
1. Check tool input parameters
2. Verify resource exists (cluster ID, job ID, etc.)
3. Check resource state (can't start a running cluster)
4. Review error message for specific issue
5. Check Databricks workspace audit logs
### Configuration Not Loading
**Problem:** Environment variables not being read
**Solutions:**
1. Verify `.env` file is in correct directory
2. Check file permissions (must be readable)
3. Ensure no syntax errors in `.env` file
4. Restart MCP client after configuration changes
5. Check MCP client logs for startup errors
### Rate Limiting
**Problem:** "Too many requests" or 429 errors
**Solutions:**
1. The server automatically handles rate limiting with backoff
2. If persistent, reduce request frequency
3. Contact Databricks support to increase limits
4. Implement request batching where possible
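The retry-with-backoff behavior described in step 1 looks roughly like this: exponential delays, doubled per attempt and capped. The parameters here (5 retries, 500 ms base, 30 s cap) are illustrative defaults, not the server's actual values:

```typescript
// Generic retry with exponential backoff for rate-limited operations.
// RateLimitError stands in for a 429 response; sleep is injectable for testing.

class RateLimitError extends Error {}

async function withBackoff<T>(
  op: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms))
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Only retry rate-limit errors, and only up to maxRetries times
      if (!(err instanceof RateLimitError) || attempt >= maxRetries) throw err;
      // 500ms, 1s, 2s, 4s, ... capped at 30s
      await sleep(Math.min(baseDelayMs * 2 ** attempt, 30_000));
    }
  }
}
```

Production implementations usually also honor the `Retry-After` header when the API supplies one, rather than relying on the computed delay alone.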
### SQL Warehouse Required
**Problem:** "SQL warehouse required" for table queries
**Solutions:**
1. Create a SQL warehouse in Databricks
2. Note the warehouse ID
3. Pass warehouse ID to `query_table` tool
4. Ensure warehouse is running
5. Verify permissions on warehouse
## Getting Help
If you're still experiencing issues:
1. Check the [README](./README.md) for general information
2. Review [API.md](./API.md) for tool-specific documentation
3. See [EXAMPLES.md](./EXAMPLES.md) for usage examples
4. Search existing GitHub issues
5. Create a new issue with:
- Detailed problem description
- Configuration (redact sensitive info)
- Error messages
- Steps to reproduce
## Configuration Examples
### Development Setup
```env
# .env file for local development
DATABRICKS_HOST=https://dev-workspace.cloud.databricks.com
DATABRICKS_TOKEN=dapi_dev_token_12345
```
### Production Setup
```env
# .env file for production (use secret management in real production)
DATABRICKS_HOST=https://prod-workspace.cloud.databricks.com
DATABRICKS_CLIENT_ID=prod-service-principal-id
DATABRICKS_CLIENT_SECRET=prod-service-principal-secret
```
### Multi-Cloud Setup
For organizations using multiple cloud providers:
```json
{
"mcpServers": {
"databricks-aws": {
"command": "databricks-mcp-server",
"env": {
"DATABRICKS_HOST": "https://aws-workspace.cloud.databricks.com",
"DATABRICKS_TOKEN": "aws-token"
}
},
"databricks-azure": {
"command": "databricks-mcp-server",
"env": {
"DATABRICKS_HOST": "https://adb-12345.azuredatabricks.net",
"DATABRICKS_TOKEN": "azure-token"
}
},
"databricks-gcp": {
"command": "databricks-mcp-server",
"env": {
"DATABRICKS_HOST": "https://12345.gcp.databricks.com",
"DATABRICKS_TOKEN": "gcp-token"
}
}
}
}
```