Dataproc MCP Server

by warrenzhu25

get_job

Retrieve detailed information about a specific Dataproc job, including status, configuration, and execution details, by providing project ID, region, and job ID.

Instructions

Get details of a specific job.

Args:
    project_id: Google Cloud project ID
    region: Dataproc region
    job_id: Job ID

Input Schema

Name        Required  Description  Default
project_id  Yes
region      Yes
job_id      Yes
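A hypothetical example of the arguments object an MCP client would send for this tool; the project, region, and job values below are placeholders, not real resources:

```python
# Hypothetical arguments payload for the get_job tool; all values are placeholders.
args = {
    "project_id": "my-gcp-project",  # Google Cloud project ID
    "region": "us-central1",         # Dataproc region
    "job_id": "job-1234-abcd",       # Job ID
}

# All three fields are required by the input schema.
missing = {"project_id", "region", "job_id"} - args.keys()
print(sorted(missing))  # → []
```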

Output Schema

Name    Required  Description  Default
result  Yes

Implementation Reference

  • MCP tool handler function for 'get_job', decorated with @mcp.tool(). It instantiates DataprocClient and calls its get_job method to retrieve and return job details as a string.
    @mcp.tool()
    async def get_job(project_id: str, region: str, job_id: str) -> str:
        """Get details of a specific job.
    
        Args:
            project_id: Google Cloud project ID
            region: Dataproc region
            job_id: Job ID
        """
        client = DataprocClient()
        try:
            result = await client.get_job(project_id, region, job_id)
            return str(result)
        except Exception as e:
            logger.error("Failed to get job", error=str(e))
            return f"Error: {str(e)}"
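The handler's error convention (returning an "Error: …" string rather than raising, so the MCP client always receives text) can be sketched in isolation. `safe_call` and `boom` below are illustrative names, not part of the server:

```python
import asyncio

async def safe_call(coro_fn, *args) -> str:
    """Mirror the handler's pattern: stringify success, return 'Error: ...' on failure."""
    try:
        return str(await coro_fn(*args))
    except Exception as e:
        return f"Error: {e}"

async def boom():
    raise RuntimeError("job not found")

print(asyncio.run(safe_call(boom)))  # → Error: job not found
```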
  • Core implementation of get_job in the DataprocClient class. Uses the google.cloud.dataproc_v1 JobControllerClient to fetch job details from the API and formats the response as a dictionary.
    async def get_job(
        self, project_id: str, region: str, job_id: str
    ) -> dict[str, Any]:
        """Get details of a specific job."""
        try:
            loop = asyncio.get_running_loop()
            client = self._get_job_client(region)
    
            request = types.GetJobRequest(
                project_id=project_id, region=region, job_id=job_id
            )
    
            job = await loop.run_in_executor(None, client.get_job, request)
    
            return {
                "job_id": job.reference.job_id,
                "cluster_name": job.placement.cluster_name,
                "status": job.status.state.name,
                "status_detail": job.status.details,
                "job_type": self._get_job_type(job),
                "submission_time": job.status_history[0].state_start_time.isoformat()
                if job.status_history
                else None,
                "start_time": job.status.state_start_time.isoformat()
                if job.status.state_start_time
                else None,
                "end_time": job.status.state_start_time.isoformat()
                if job.status.state.name in ("DONE", "ERROR", "CANCELLED")
                else None,
                "driver_output_uri": job.driver_output_resource_uri,
                "driver_control_files_uri": job.driver_control_files_uri,
            }
    
        except Exception as e:
            logger.error("Failed to get job", error=str(e))
            raise
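The `run_in_executor` pattern above, which wraps the blocking Google client call so it does not stall the event loop, can be demonstrated with a stub. `StubJobClient` is a stand-in for illustration, not the real `JobControllerClient`:

```python
import asyncio

class StubJobClient:
    """Stand-in for the blocking JobControllerClient (illustration only)."""
    def get_job(self, request):
        return {"job_id": request["job_id"], "status": "DONE"}

async def fetch_job(job_id: str) -> dict:
    loop = asyncio.get_running_loop()
    client = StubJobClient()
    # The blocking call runs in the default thread pool, keeping the loop responsive.
    return await loop.run_in_executor(None, client.get_job, {"job_id": job_id})

result = asyncio.run(fetch_job("job-1234"))
print(result)  # → {'job_id': 'job-1234', 'status': 'DONE'}
```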
  • Helper method to determine the type of a Dataproc job based on which job config is present.
    def _get_job_type(self, job: types.Job) -> str:
        """Extract job type from job object."""
        if job.spark_job:
            return "spark"
        elif job.pyspark_job:
            return "pyspark"
        elif job.spark_sql_job:
            return "spark_sql"
        elif job.hive_job:
            return "hive"
        elif job.pig_job:
            return "pig"
        elif job.hadoop_job:
            return "hadoop"
        else:
            return "unknown"
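The first-match-wins dispatch in `_get_job_type` can be exercised with a lightweight stand-in for `types.Job` (here a `SimpleNamespace` carrying the same attribute names; the real proto object is assumed, not required):

```python
from types import SimpleNamespace

def get_job_type(job) -> str:
    # Same first-match-wins order as _get_job_type above.
    for attr, name in [
        ("spark_job", "spark"), ("pyspark_job", "pyspark"),
        ("spark_sql_job", "spark_sql"), ("hive_job", "hive"),
        ("pig_job", "pig"), ("hadoop_job", "hadoop"),
    ]:
        if getattr(job, attr, None):
            return name
    return "unknown"

print(get_job_type(SimpleNamespace(pyspark_job={"main_python_file_uri": "gs://x.py"})))  # → pyspark
print(get_job_type(SimpleNamespace()))  # → unknown
```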
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden for behavioral disclosure. While 'Get details' implies a read-only operation, the description doesn't specify authentication requirements, rate limits, error conditions, or what format/details are returned. For a tool with no annotation coverage, this leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately brief with a clear purpose statement followed by parameter documentation. The structure is logical and front-loaded. However, the parameter documentation is somewhat redundant since parameter names are already in the schema, though necessary given 0% schema description coverage.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that an output schema exists, the description doesn't need to explain return values. However, with no annotations, 3 parameters at 0% schema coverage, and multiple similar sibling tools, the description should provide more context about what type of job this retrieves and how it differs from other get/list tools to be complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the schema provides no parameter descriptions. The description lists the three parameters with brief labels but doesn't explain what values are expected, format requirements, or where to find these IDs. It adds minimal semantic value beyond what's already evident from parameter names in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Get') and resource ('details of a specific job'), making the purpose immediately understandable. However, it doesn't differentiate this tool from similar siblings like 'get_batch_job' or 'get_cluster', which would require more specificity about what type of job this retrieves.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. With siblings like 'get_batch_job', 'list_jobs', and 'submit_job' available, there's no indication of whether this tool is for Dataproc jobs specifically, batch jobs, or general jobs, nor when one should use this versus the other get/list tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
