upload_file_to_dbfs
Transfer local files to Databricks File System (DBFS) for temporary storage, scripts, or smaller datasets. Handles large files via chunked upload and provides JSON output with success status, file size, and upload time.
Instructions
Upload a local file to Databricks File System (DBFS).
Args:
- local_file_path: Path to local file (e.g. './data/notebook.py')
- dbfs_path: DBFS path (e.g. '/tmp/uploaded/notebook.py')
- overwrite: Whether to overwrite existing file (default: True)
Returns:
JSON with upload results including success status, file size, and upload time.
Example:

```python
# Upload script to DBFS
result = upload_file_to_dbfs(
    local_file_path='./scripts/analysis.py',
    dbfs_path='/tmp/analysis.py',
    overwrite=True
)
```
Note: For large files (>10MB), uses chunked upload with proper retry logic.
DBFS is good for temporary files, scripts, and smaller datasets.
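Since the tool returns a JSON string rather than a parsed object, callers decode it themselves. A minimal sketch, assuming the tool function is called in-process from async Python; the field names follow the documented response (`success`, `file_size_mb`, `upload_time_seconds`, `dbfs_path`, and `error` on failure):

```python
import asyncio
import json

async def main() -> None:
    # upload_file_to_dbfs returns a JSON string as documented above.
    result_json = await upload_file_to_dbfs(
        local_file_path='./scripts/analysis.py',
        dbfs_path='/tmp/analysis.py',
        overwrite=True,
    )
    result = json.loads(result_json)
    if result["success"]:
        print(f"Uploaded {result['file_size_mb']} MB to {result['dbfs_path']} "
              f"in {result['upload_time_seconds']}s")
    else:
        print(f"Upload failed for {result['dbfs_path']}: {result['error']}")

asyncio.run(main())
```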
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| dbfs_path | Yes | Destination path in DBFS (e.g. '/tmp/uploaded/notebook.py') | |
| local_file_path | Yes | Path to the local file to upload (e.g. './data/notebook.py') | |
| overwrite | No | Whether to overwrite an existing file | True |
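For illustration, a tool call's arguments expressed as a Python dict consistent with the schema above (the paths are placeholders, not from the source):

```python
# Illustrative tool-call arguments matching the input schema above.
# dbfs_path and local_file_path are required; overwrite is optional and defaults to True.
arguments = {
    "local_file_path": "./scripts/analysis.py",
    "dbfs_path": "/tmp/analysis.py",
    "overwrite": True,
}
```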
Implementation Reference
- The primary MCP tool handler for `upload_file_to_dbfs`, decorated with `@mcp.tool()` for registration. It defines the input schema (`local_file_path: str`, `dbfs_path: str`, `overwrite: bool = True`), chooses the upload method based on file size using the DBFS helpers, computes upload metrics, and returns a JSON response.

```python
@mcp.tool()
async def upload_file_to_dbfs(
    local_file_path: str,
    dbfs_path: str,
    overwrite: bool = True
) -> str:
    """
    Upload a local file to Databricks File System (DBFS).

    Args:
        local_file_path: Path to local file (e.g. './data/notebook.py')
        dbfs_path: DBFS path (e.g. '/tmp/uploaded/notebook.py')
        overwrite: Whether to overwrite existing file (default: True)

    Returns:
        JSON with upload results including success status, file size, and upload time.

    Example:
        # Upload script to DBFS
        result = upload_file_to_dbfs(
            local_file_path='./scripts/analysis.py',
            dbfs_path='/tmp/analysis.py',
            overwrite=True
        )

    Note: For large files (>10MB), uses chunked upload with proper retry logic.
    DBFS is good for temporary files, scripts, and smaller datasets.
    """
    logger.info(f"Uploading file from {local_file_path} to DBFS: {dbfs_path}")

    try:
        import os
        import time

        if not os.path.exists(local_file_path):
            raise FileNotFoundError(f"Local file not found: {local_file_path}")

        # Get file info
        start_time = time.time()
        file_size = os.path.getsize(local_file_path)
        file_size_mb = file_size / (1024 * 1024)

        # Choose upload method based on file size
        if file_size > 10 * 1024 * 1024:  # > 10MB
            result = await dbfs.upload_large_file(
                dbfs_path=dbfs_path,
                local_file_path=local_file_path,
                overwrite=overwrite
            )
        else:
            # Read and upload small file
            with open(local_file_path, 'rb') as f:
                file_content = f.read()

            result = await dbfs.put_file(
                dbfs_path=dbfs_path,
                file_content=file_content,
                overwrite=overwrite
            )

        end_time = time.time()
        upload_time = end_time - start_time

        return json.dumps({
            "success": True,
            "file_size_mb": round(file_size_mb, 1),
            "upload_time_seconds": round(upload_time, 1),
            "dbfs_path": dbfs_path,
            "file_size_bytes": file_size
        })
    except Exception as e:
        logger.error(f"Error uploading file to DBFS: {str(e)}")
        return json.dumps({
            "success": False,
            "error": str(e),
            "dbfs_path": dbfs_path
        })
```
- src/api/dbfs.py:51-134 (helper): Supporting helper function called by the tool handler for large files (>10MB). Implements chunked upload via the DBFS API endpoints `/create`, `/add-block`, and `/close`, sending base64-encoded chunks.

```python
async def upload_large_file(
    dbfs_path: str,
    local_file_path: str,
    overwrite: bool = True,
    buffer_size: int = 1024 * 1024,  # 1MB chunks
) -> Dict[str, Any]:
    """
    Upload a large file to DBFS in chunks.

    Args:
        dbfs_path: The path where the file should be stored in DBFS
        local_file_path: Local path to the file to upload
        overwrite: Whether to overwrite an existing file
        buffer_size: Size of chunks to upload

    Returns:
        Empty response on success

    Raises:
        DatabricksAPIError: If the API request fails
        FileNotFoundError: If the local file does not exist
    """
    logger.info(f"Uploading large file from {local_file_path} to DBFS path: {dbfs_path}")

    if not os.path.exists(local_file_path):
        raise FileNotFoundError(f"Local file not found: {local_file_path}")

    # Create a handle for the upload
    create_response = make_api_request(
        "POST",
        "/api/2.0/dbfs/create",
        data={
            "path": dbfs_path,
            "overwrite": overwrite,
        },
    )

    handle = create_response.get("handle")

    try:
        with open(local_file_path, "rb") as f:
            chunk_index = 0
            while True:
                chunk = f.read(buffer_size)
                if not chunk:
                    break

                # Convert chunk to base64
                chunk_base64 = base64.b64encode(chunk).decode("utf-8")

                # Add to handle
                make_api_request(
                    "POST",
                    "/api/2.0/dbfs/add-block",
                    data={
                        "handle": handle,
                        "data": chunk_base64,
                    },
                )

                chunk_index += 1
                logger.debug(f"Uploaded chunk {chunk_index}")

        # Close the handle
        return make_api_request(
            "POST",
            "/api/2.0/dbfs/close",
            data={"handle": handle},
        )
    except Exception as e:
        # Attempt to abort the upload on error
        try:
            make_api_request(
                "POST",
                "/api/2.0/dbfs/close",
                data={"handle": handle},
            )
        except Exception:
            pass

        logger.error(f"Error uploading file: {str(e)}")
        raise
```
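For context, the same create/add-block/close handshake can be driven directly against the DBFS REST API. The sketch below is illustrative only: it assumes `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables and uses `requests` in place of the project's `make_api_request` wrapper.

```python
import base64
import os

import requests

# Assumed configuration (not part of the source): workspace URL and personal access token.
HOST = os.environ["DATABRICKS_HOST"].rstrip("/")
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}


def dbfs_api(endpoint: str, payload: dict) -> dict:
    """POST to a DBFS API endpoint and return the parsed JSON response."""
    resp = requests.post(f"{HOST}/api/2.0/dbfs/{endpoint}", headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()


def upload_in_chunks(local_path: str, dbfs_path: str, chunk_size: int = 1024 * 1024) -> None:
    """Open a streaming handle, append base64-encoded blocks, then close the handle."""
    handle = dbfs_api("create", {"path": dbfs_path, "overwrite": True})["handle"]
    with open(local_path, "rb") as f:
        while chunk := f.read(chunk_size):
            dbfs_api("add-block", {"handle": handle, "data": base64.b64encode(chunk).decode("utf-8")})
    dbfs_api("close", {"handle": handle})
```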
- src/api/dbfs.py:16-49 (helper): Supporting helper function for small files (<=10MB). Base64-encodes the provided file content and uploads it via the DBFS `/put` API endpoint.

```python
async def put_file(
    dbfs_path: str,
    file_content: bytes,
    overwrite: bool = True,
) -> Dict[str, Any]:
    """
    Upload a file to DBFS.

    Args:
        dbfs_path: The path where the file should be stored in DBFS
        file_content: The content of the file as bytes
        overwrite: Whether to overwrite an existing file

    Returns:
        Empty response on success

    Raises:
        DatabricksAPIError: If the API request fails
    """
    logger.info(f"Uploading file to DBFS path: {dbfs_path}")

    # Convert bytes to base64
    content_base64 = base64.b64encode(file_content).decode("utf-8")

    return make_api_request(
        "POST",
        "/api/2.0/dbfs/put",
        data={
            "path": dbfs_path,
            "contents": content_base64,
            "overwrite": overwrite,
        },
    )
```
- src/server/simple_databricks_mcp_server.py:457 (registration): The `@mcp.tool()` decorator registers the `upload_file_to_dbfs` function as an MCP tool.