# create_batch_job

Submit batch jobs to Google Cloud Dataproc. Choose a job type (Spark, PySpark, or Spark SQL), point the job at its main file or class, and optionally supply JAR files, arguments, runtime properties, and service-account or network settings.
## Instructions
Create a Dataproc batch job.
Args:
- `project_id`: Google Cloud project ID
- `region`: Dataproc region
- `batch_id`: Unique identifier for the batch job
- `job_type`: Type of batch job (`spark`, `pyspark`, `spark_sql`)
- `main_file`: Main file/class for the job
- `args`: Job arguments
- `jar_files`: JAR files to include
- `properties`: Job properties
- `service_account`: Service account email
- `network_uri`: Network URI
- `subnetwork_uri`: Subnetwork URI
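
The tool's implementation isn't shown here, but its arguments map naturally onto the Dataproc Serverless Batches API. Below is a minimal sketch of what such a tool plausibly does, assuming the `google-cloud-dataproc` Python client (`pip install google-cloud-dataproc`); the function name mirrors the tool, and the mapping of `main_file` to a main class for `spark` jobs and to a query file URI for `spark_sql` jobs is an inference from the API's shape, not the tool's actual source.

```python
# A minimal sketch, assuming the google-cloud-dataproc client library.
# The argument-to-API mapping below is an assumption, not the tool's source.
from typing import Dict, List, Optional

from google.cloud import dataproc_v1


def create_batch_job(
    project_id: str,
    region: str,
    batch_id: str,
    job_type: str,
    main_file: str,
    args: Optional[List[str]] = None,
    jar_files: Optional[List[str]] = None,
    properties: Optional[Dict[str, str]] = None,
    service_account: Optional[str] = None,
    network_uri: Optional[str] = None,
    subnetwork_uri: Optional[str] = None,
) -> dataproc_v1.Batch:
    # Batches are a regional resource, so the client must target the
    # regional service endpoint.
    client = dataproc_v1.BatchControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    batch = dataproc_v1.Batch()
    if job_type == "pyspark":
        batch.pyspark_batch.main_python_file_uri = main_file  # e.g. gs://bucket/job.py
        batch.pyspark_batch.args = args or []
        batch.pyspark_batch.jar_file_uris = jar_files or []
    elif job_type == "spark":
        batch.spark_batch.main_class = main_file  # assumed: fully qualified main class
        batch.spark_batch.args = args or []
        batch.spark_batch.jar_file_uris = jar_files or []
    elif job_type == "spark_sql":
        batch.spark_sql_batch.query_file_uri = main_file  # assumed: gs:// URI of SQL script
        batch.spark_sql_batch.jar_file_uris = jar_files or []
    else:
        raise ValueError(f"Unsupported job_type: {job_type!r}")

    if properties:
        batch.runtime_config.properties = properties  # e.g. spark.* settings
    if service_account:
        batch.environment_config.execution_config.service_account = service_account
    if network_uri:
        batch.environment_config.execution_config.network_uri = network_uri
    if subnetwork_uri:
        batch.environment_config.execution_config.subnetwork_uri = subnetwork_uri

    # create_batch returns a long-running operation; result() blocks until it
    # completes and returns the finished Batch (or raises on failure).
    operation = client.create_batch(
        parent=f"projects/{project_id}/locations/{region}",
        batch=batch,
        batch_id=batch_id,
    )
    return operation.result()
```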
## Input Schema
Name | Required | Description | Default |
---|---|---|---|
args | No | Job arguments | `null` |
batch_id | Yes | Unique identifier for the batch job | |
jar_files | No | JAR files to include | `null` |
job_type | Yes | Type of batch job (`spark`, `pyspark`, `spark_sql`) | |
main_file | Yes | Main file/class for the job | |
network_uri | No | Network URI | `null` |
project_id | Yes | Google Cloud project ID | |
properties | No | Job properties | `null` |
region | Yes | Dataproc region | |
service_account | No | Service account email | `null` |
subnetwork_uri | No | Subnetwork URI | `null` |
## Input Schema (JSON Schema)
{
  "properties": {
    "args": {
      "default": null,
      "items": {
        "type": "string"
      },
      "title": "Args",
      "type": "array"
    },
    "batch_id": {
      "title": "Batch Id",
      "type": "string"
    },
    "jar_files": {
      "default": null,
      "items": {
        "type": "string"
      },
      "title": "Jar Files",
      "type": "array"
    },
    "job_type": {
      "title": "Job Type",
      "type": "string"
    },
    "main_file": {
      "title": "Main File",
      "type": "string"
    },
    "network_uri": {
      "default": null,
      "title": "Network Uri",
      "type": "string"
    },
    "project_id": {
      "title": "Project Id",
      "type": "string"
    },
    "properties": {
      "additionalProperties": {
        "type": "string"
      },
      "default": null,
      "title": "Properties",
      "type": "object"
    },
    "region": {
      "title": "Region",
      "type": "string"
    },
    "service_account": {
      "default": null,
      "title": "Service Account",
      "type": "string"
    },
    "subnetwork_uri": {
      "default": null,
      "title": "Subnetwork Uri",
      "type": "string"
    }
  },
  "required": [
    "project_id",
    "region",
    "batch_id",
    "job_type",
    "main_file"
  ],
  "title": "create_batch_jobArguments",
  "type": "object"
}
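
As a concrete illustration, the payload below satisfies the schema's required fields. All values (project, bucket path, batch ID) are hypothetical placeholders, not real resources.

```python
# A hypothetical argument payload conforming to the schema above; every
# value here is an illustrative placeholder.
example_arguments = {
    "project_id": "my-project",
    "region": "us-central1",
    "batch_id": "nightly-etl-20240101",  # must be unique per project/region
    "job_type": "pyspark",
    "main_file": "gs://my-bucket/jobs/etl.py",
    "args": ["--date", "2024-01-01"],
    "properties": {"spark.executor.instances": "4"},
}
```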