| Tool | Description |
| --- | --- |
| `list_clusters` | List Dataproc clusters in a project and region.<br>Args:<br>`project_id`: Google Cloud project ID (optional, uses gcloud config default)<br>`region`: Dataproc region (optional, uses gcloud config default) |
| `create_cluster` | Create a new Dataproc cluster.<br>Args:<br>`cluster_name`: Name for the new cluster<br>`project_id`: Google Cloud project ID (optional, uses gcloud config default)<br>`region`: Dataproc region (optional, uses gcloud config default)<br>`num_instances`: Number of worker instances<br>`machine_type`: Machine type for cluster nodes<br>`disk_size_gb`: Boot disk size in GB<br>`image_version`: Dataproc image version |
| `delete_cluster` | Delete a Dataproc cluster.<br>Args:<br>`cluster_name`: Name of the cluster to delete<br>`project_id`: Google Cloud project ID (optional, uses gcloud config default)<br>`region`: Dataproc region (optional, uses gcloud config default) |
| `get_cluster` | Get details of a specific Dataproc cluster.<br>Args:<br>`cluster_name`: Name of the cluster<br>`project_id`: Google Cloud project ID (optional, uses gcloud config default)<br>`region`: Dataproc region (optional, uses gcloud config default) |
| Tool | Description |
| --- | --- |
| `submit_job` | Submit a job to a Dataproc cluster.<br>Args:<br>`project_id`: Google Cloud project ID<br>`region`: Dataproc region<br>`cluster_name`: Target cluster name<br>`job_type`: Type of job (spark, pyspark, spark_sql, hive, pig, hadoop)<br>`main_file`: Main file/class for the job<br>`args`: Job arguments<br>`jar_files`: JAR files to include<br>`properties`: Job properties |
| `list_jobs` | List jobs in a Dataproc cluster.<br>Args:<br>`project_id`: Google Cloud project ID<br>`region`: Dataproc region<br>`cluster_name`: Cluster name (optional)<br>`job_states`: Filter by job states |
| `get_job` | Get details of a specific job.<br>Args:<br>`project_id`: Google Cloud project ID<br>`region`: Dataproc region<br>`job_id`: Job ID |
| `cancel_job` | Cancel a running job.<br>Args:<br>`project_id`: Google Cloud project ID<br>`region`: Dataproc region<br>`job_id`: Job ID to cancel |
| Tool | Description |
| --- | --- |
| `create_batch_job` | Create a Dataproc batch job.<br>Args:<br>`project_id`: Google Cloud project ID<br>`region`: Dataproc region<br>`batch_id`: Unique identifier for the batch job<br>`job_type`: Type of batch job (spark, pyspark, spark_sql)<br>`main_file`: Main file/class for the job<br>`args`: Job arguments<br>`jar_files`: JAR files to include<br>`properties`: Job properties<br>`service_account`: Service account email<br>`network_uri`: Network URI<br>`subnetwork_uri`: Subnetwork URI |
| `list_batch_jobs` | List Dataproc batch jobs.<br>Args:<br>`project_id`: Google Cloud project ID<br>`region`: Dataproc region<br>`page_size`: Number of results per page |
| `get_batch_job` | Get details of a specific batch job.<br>Args:<br>`project_id`: Google Cloud project ID<br>`region`: Dataproc region<br>`batch_id`: Batch job ID |
| `delete_batch_job` | Delete a batch job.<br>Args:<br>`project_id`: Google Cloud project ID<br>`region`: Dataproc region<br>`batch_id`: Batch job ID to delete |
| `compare_batch_jobs` | Compare two Dataproc batch jobs and return detailed differences.<br>Args:<br>`batch_id_1`: First batch job ID to compare<br>`batch_id_2`: Second batch job ID to compare<br>`project_id`: Google Cloud project ID (optional, uses gcloud config default)<br>`region`: Dataproc region (optional, uses gcloud config default) |
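The exact shape of the "detailed differences" that `compare_batch_jobs` returns is not specified above. A minimal recursive diff over two batch-job descriptions might look like this; `diff_batches` and the field names in the usage example are illustrative, not the tool's actual implementation.

```python
def diff_batches(batch_1: dict, batch_2: dict, prefix: str = "") -> dict:
    """Collect dotted paths of fields that differ between two batch descriptions.

    Returns a mapping of "path.to.field" -> (value_in_1, value_in_2);
    a missing field shows up as None on the side that lacks it.
    """
    diffs: dict = {}
    for key in sorted(set(batch_1) | set(batch_2)):
        path = f"{prefix}{key}"
        a, b = batch_1.get(key), batch_2.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            # Recurse into nested config blocks.
            diffs.update(diff_batches(a, b, prefix=path + "."))
        elif a != b:
            diffs[path] = (a, b)
    return diffs
```

Usage, with made-up batch descriptions:

```python
old = {"runtimeConfig": {"version": "2.1"}, "state": "SUCCEEDED"}
new = {"runtimeConfig": {"version": "2.2"}, "state": "SUCCEEDED"}
diff_batches(old, new)  # -> {"runtimeConfig.version": ("2.1", "2.2")}
```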