Databricks MCP Server

by samhavens

list_jobs

Retrieve Databricks job listings with pagination and filtering options to manage and monitor scheduled workflows efficiently.

Instructions

List Databricks jobs with pagination and filtering.

Args:
    limit: Number of jobs to return (default: 25, keeps response under token limits)
    offset: Starting position for pagination (default: 0, use pagination_info.next_offset for next page)
    created_by: Filter by creator email (e.g. 'user@company.com'), case-insensitive, optional
    include_run_status: Include latest run status and duration (default: true, set false for faster response)

Returns:
    JSON with jobs array and pagination_info. Each job includes latest_run with state, duration_minutes, etc.
    Use pagination_info.next_offset for next page. Total jobs shown in pagination_info.total_jobs.
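
A sketch of consuming this return value; the payload below is illustrative only, shaped to match the description above (all field values are made up):

```python
import json

# Illustrative payload matching the documented shape (not real tool output)
raw = json.dumps({
    "jobs": [
        {"job_id": 101,
         "settings": {"name": "nightly_etl"},
         "latest_run": {"state": "TERMINATED",
                        "result_state": "SUCCESS",
                        "duration_minutes": 12}},
    ],
    "pagination_info": {"total_jobs": 60, "returned": 1, "limit": 25,
                        "offset": 0, "has_more": True, "next_offset": 25},
})

result = json.loads(raw)
for job in result["jobs"]:
    print(job["job_id"], job["latest_run"].get("result_state"))

# Pass pagination_info.next_offset as the next call's offset (None when done)
next_offset = result["pagination_info"]["next_offset"]
```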

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| limit | No | | |
| offset | No | | |
| created_by | No | | |
| include_run_status | No | | |

Implementation Reference

  • The primary MCP tool handler for 'list_jobs'. Registers the tool via @mcp.tool(), implements client-side pagination, creator filtering, and enriches each job with latest run status by calling jobs.list_runs(job_id, limit=1). Returns formatted JSON.
    @mcp.tool()
    async def list_jobs(
        limit: int = 25, 
        offset: int = 0, 
        created_by: Optional[str] = None,
        include_run_status: bool = True
    ) -> str:
        """List Databricks jobs with pagination and filtering.
        
        Args:
            limit: Number of jobs to return (default: 25, keeps response under token limits)
            offset: Starting position for pagination (default: 0, use pagination_info.next_offset for next page)
            created_by: Filter by creator email (e.g. 'user@company.com'), case-insensitive, optional
            include_run_status: Include latest run status and duration (default: true, set false for faster response)
        
        Returns:
            JSON with jobs array and pagination_info. Each job includes latest_run with state, duration_minutes, etc.
            Use pagination_info.next_offset for next page. Total jobs shown in pagination_info.total_jobs.
        """
        logger.info(f"Listing jobs (limit={limit}, offset={offset}, created_by={created_by})")
        try:
            # Fetch all jobs from API
            result = await jobs.list_jobs()
            
            if "jobs" in result:
                all_jobs = result["jobs"]
                
                # Filter by creator if specified
                if created_by:
                    all_jobs = [job for job in all_jobs 
                               if job.get("creator_user_name", "").lower() == created_by.lower()]
                
                total_jobs = len(all_jobs)
                
                # Apply client-side pagination
                start_idx = offset
                end_idx = offset + limit
                paginated_jobs = all_jobs[start_idx:end_idx]
                
                # Enhance jobs with run status if requested
                if include_run_status and paginated_jobs:
                    enhanced_jobs = []
                    for job in paginated_jobs:
                        enhanced_job = job.copy()
                        
                        # Get most recent run for this job
                        try:
                            runs_result = await jobs.list_runs(job_id=job["job_id"], limit=1)
                            if "runs" in runs_result and runs_result["runs"]:
                                latest_run = runs_result["runs"][0]
                                
                                # Add run status info
                                enhanced_job["latest_run"] = {
                                    "run_id": latest_run.get("run_id"),
                                    "state": latest_run.get("state", {}).get("life_cycle_state"),
                                    "result_state": latest_run.get("state", {}).get("result_state"),
                                    "start_time": latest_run.get("start_time"),
                                    "end_time": latest_run.get("end_time"),
                                }
                                
                                # Calculate duration if both times available
                                start_time = latest_run.get("start_time")
                                end_time = latest_run.get("end_time")
                                if start_time and end_time:
                                    duration_ms = end_time - start_time
                                    enhanced_job["latest_run"]["duration_seconds"] = duration_ms // 1000
                                    enhanced_job["latest_run"]["duration_minutes"] = duration_ms // 60000
                            else:
                                enhanced_job["latest_run"] = {"status": "no_runs"}
                                
                        except Exception as e:
                            enhanced_job["latest_run"] = {"error": f"Failed to get run info: {str(e)}"}
                        
                        enhanced_jobs.append(enhanced_job)
                    
                    paginated_jobs = enhanced_jobs
                
                # Create paginated response
                paginated_result = {
                    "jobs": paginated_jobs,
                    "pagination_info": {
                        "total_jobs": total_jobs,
                        "returned": len(paginated_jobs),
                        "limit": limit,
                        "offset": offset,
                        "has_more": end_idx < total_jobs,
                        "next_offset": end_idx if end_idx < total_jobs else None,
                        "filtered_by": {"created_by": created_by} if created_by else None
                    }
                }
                
                return json.dumps(paginated_result)
            else:
                return json.dumps(result)
                
        except Exception as e:
            logger.error(f"Error listing jobs: {str(e)}")
            return json.dumps({"error": str(e)})
  • Low-level helper function that makes the actual Databricks API call to /api/2.0/jobs/list. Called by the MCP handler.
    async def list_jobs(limit: Optional[int] = None, page_token: Optional[str] = None) -> Dict[str, Any]:
        """
        List jobs with optional pagination.
        
        Args:
            limit: Maximum number of jobs to return (1-100, default: 20)
            page_token: Token for pagination (from previous response's next_page_token)
        
        Returns:
            Response containing a list of jobs and optional next_page_token
            
        Raises:
            DatabricksAPIError: If the API request fails
        """
        params = {}
        if limit is not None:
            # Databricks API limits: 1-100 for jobs list
            if limit < 1:
                limit = 1
            elif limit > 100:
                limit = 100
            params["limit"] = limit
        if page_token is not None:
            params["page_token"] = page_token
            
        logger.info(f"Listing jobs (limit={limit}, page_token={'***' if page_token else None})")
        return make_api_request("GET", "/api/2.0/jobs/list", params=params if params else None)
  • Supporting helper for listing job runs (/api/2.0/jobs/runs/list), used by the handler to fetch latest run status for each job.
    async def list_runs(job_id: Optional[int] = None, limit: Optional[int] = None) -> Dict[str, Any]:
        """
        List job runs, optionally filtered by job_id.
        
        Args:
            job_id: ID of the job to list runs for (optional)
            limit: Maximum number of runs to return (optional)
            
        Returns:
            Response containing a list of job runs
            
        Raises:
            DatabricksAPIError: If the API request fails
        """
        params = {}
        if job_id is not None:
            params["job_id"] = job_id
        if limit is not None:
            params["limit"] = limit
            
        logger.info(f"Listing runs (job_id={job_id}, limit={limit})")
        return make_api_request("GET", "/api/2.0/jobs/runs/list", params=params if params else None) 
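
Because the handler paginates client-side via `offset`, a caller can walk every page by following `pagination_info.next_offset`. The sketch below simulates the tool with a hypothetical `call_list_jobs` stub serving a fixed 60-job catalog; a real invocation would go through an MCP client instead:

```python
import json

def call_list_jobs(limit=25, offset=0):
    """Hypothetical stand-in for invoking the list_jobs tool.

    Returns a JSON string shaped like the documented response, backed by
    a fixed in-memory catalog of 60 jobs.
    """
    all_jobs = [{"job_id": i} for i in range(60)]
    page = all_jobs[offset:offset + limit]
    end = offset + limit
    return json.dumps({
        "jobs": page,
        "pagination_info": {
            "total_jobs": len(all_jobs),
            "returned": len(page),
            "limit": limit,
            "offset": offset,
            "has_more": end < len(all_jobs),
            "next_offset": end if end < len(all_jobs) else None,
        },
    })

# Walk every page by following pagination_info.next_offset
offset, collected = 0, []
while offset is not None:
    result = json.loads(call_list_jobs(limit=25, offset=offset))
    collected.extend(result["jobs"])
    offset = result["pagination_info"]["next_offset"]

print(len(collected))  # 60
```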
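
The per-job enrichment step in the handler boils down to reading the first entry of a `jobs/runs/list`-style response and deriving durations from epoch-millisecond timestamps. A minimal standalone sketch with made-up sample data:

```python
# Sample runs/list-style payload (illustrative values; Databricks
# start_time/end_time are epoch milliseconds)
runs_result = {"runs": [{
    "run_id": 7,
    "state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"},
    "start_time": 1_700_000_000_000,
    "end_time": 1_700_000_330_000,
}]}

# Take the most recent run and summarize it, mirroring the handler's logic
latest = runs_result["runs"][0]
duration_ms = latest["end_time"] - latest["start_time"]
summary = {
    "run_id": latest["run_id"],
    "state": latest["state"]["life_cycle_state"],
    "result_state": latest["state"]["result_state"],
    "duration_seconds": duration_ms // 1000,
    "duration_minutes": duration_ms // 60000,
}
print(summary)
```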
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does well by disclosing key behavioral traits: it mentions pagination mechanics, token limit considerations ('keeps response under token limits'), performance trade-offs ('set false for faster response'), and case-insensitive filtering. It also details the return structure, including pagination info and job data with latest run details. However, it lacks information on rate limits, authentication requirements, or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and well-structured: it starts with a clear purpose statement, then details arguments with explanations, and concludes with return value information. Every sentence adds value—no fluff or repetition. The use of sections (Args, Returns) enhances readability without unnecessary verbosity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (4 parameters, no annotations, no output schema), the description is largely complete: it covers purpose, parameters, return format, and pagination behavior. However, it lacks context on authentication, error cases, or rate limits, which are important for a listing tool in a cloud service like Databricks. The absence of an output schema is mitigated by the detailed return description.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, so the description must compensate fully, which it does excellently. It adds meaningful semantics for all 4 parameters: explains 'limit' default and token limit rationale, describes 'offset' usage with pagination guidance, specifies 'created_by' format and case-insensitivity, and clarifies 'include_run_status' impact on performance. This goes well beyond the basic schema to provide practical usage context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'List Databricks jobs with pagination and filtering.' It specifies the verb ('List') and resource ('Databricks jobs'), and distinguishes it from siblings like 'list_job_runs' by focusing on jobs rather than runs. However, it doesn't explicitly contrast with other listing tools like 'list_clusters' or 'list_notebooks' beyond the resource type.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for listing jobs with optional filtering and pagination, but doesn't explicitly state when to use this tool versus alternatives. For example, it doesn't clarify whether this should be preferred over 'list_job_runs' for job metadata, or when filtering by creator is appropriate. The guidance is limited to functional parameters rather than contextual decision-making.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
