list_clusters | List all Databricks clusters |
create_cluster | Create a new Databricks cluster |
terminate_cluster | Terminate a Databricks cluster |
get_cluster | Get information about a specific Databricks cluster |
start_cluster | Start a terminated Databricks cluster |
list_jobs | List Databricks jobs with pagination and filtering. Args: limit (default: 25; keeps the response under token limits), offset (default: 0; use pagination_info.next_offset for the next page), created_by (optional, case-insensitive filter by creator email, e.g. 'user@company.com'), include_run_status (default: true; set to false for a faster response). Returns JSON with a jobs array and pagination_info; each job includes latest_run with state, duration_minutes, etc. Use pagination_info.next_offset to fetch the next page; the total job count is in pagination_info.total_jobs. |
list_job_runs | List recent job runs with detailed status and duration information. Args: job_id (optional; omit to see runs across all jobs), limit (default: 10, most recent first). Returns JSON with a runs array; each run includes state (RUNNING/SUCCESS/FAILED), result_state, duration_minutes for completed runs, and current_duration_minutes for runs still in progress. |
run_job | Trigger a run of an existing Databricks job |
list_notebooks | List notebooks in a workspace directory |
export_notebook | Export a notebook from the workspace |
list_files | List files and directories in DBFS |
execute_sql | Execute a SQL statement and wait for completion (blocking) |
execute_sql_nonblocking | Start SQL statement execution and return immediately with a statement_id (non-blocking); see the usage sketch after this table |
get_sql_status | Get the status and results of a SQL statement by statement_id |
create_notebook | Create a new notebook in the Databricks workspace |
create_job | Create a new Databricks job to run a notebook (uses serverless by default) |
upload_file_to_volume | Upload a local file to a Databricks Unity Catalog volume. Args: local_file_path (e.g. './data/products.json'), volume_path (full volume path, e.g. '/Volumes/catalog/schema/volume/file.json'), overwrite (default: False). Returns JSON with upload results including success status, file size in MB, and upload time. Example: upload_file_to_volume(local_file_path='./stark_export/products_full.json', volume_path='/Volumes/kbqa/stark_mas_eval/stark_raw_data/products_full.json', overwrite=True). Handles large (multi-GB) files with progress tracking and error handling; suited to uploading extracted datasets to Unity Catalog volumes for processing. |
upload_file_to_dbfs | Upload a local file to the Databricks File System (DBFS). Args: local_file_path (e.g. './data/notebook.py'), dbfs_path (e.g. '/tmp/uploaded/notebook.py'), overwrite (default: True). Returns JSON with upload results including success status, file size, and upload time. Example: upload_file_to_dbfs(local_file_path='./scripts/analysis.py', dbfs_path='/tmp/analysis.py', overwrite=True). Files larger than 10 MB are uploaded in chunks with retry logic; DBFS is best suited to temporary files, scripts, and smaller datasets. |
list_volume_files | List files and directories in a Unity Catalog volume. Args: volume_path (e.g. '/Volumes/catalog/schema/volume/directory'). Returns JSON with a directory listing including file names, sizes, and modification times. Example: list_volume_files('/Volumes/kbqa/stark_mas_eval/stark_raw_data/'). Returns detailed file information, including sizes, useful for managing large datasets. |