submit_spark_application
Submit a Spark application to a specified engine for execution, with configurable arguments, properties, and environment variables.
Instructions
Submit a Spark application for execution on a Spark engine.
Args:
- engine_id: Spark engine identifier
- application: Application file path (JAR, Python, or R file)
- arguments: Application arguments array
- conf: Spark configuration properties (e.g., {"spark.executor.memory": "2g"})
- env: Environment variables
- name: Application name (will be added to conf as spark.app.name)
- job_endpoint: External job endpoint
- service_instance_id: Service instance ID - "iae" or "emr"
- type: Engine type - "spark" or "gluten"
- context_type: Context type - "project", "git_project", or "space"
- volumes: Volume mounts (watsonx.data software only). List of dicts with:
  - name: volume name
  - mount_path: path in the Spark cluster (e.g., "/mount/path")
  - source_sub_path: path in the volume to mount (e.g., "/source/path")
  - read_only: boolean flag
Returns: Dict with application_id, state, and submission details
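The arguments described above can be assembled and sanity-checked before the tool is called. The following is a minimal sketch, not part of the tool itself: `build_submit_args` and the `ALLOWED_*` constants are hypothetical names, and the permitted values are taken only from the Args description. How the resulting dict is passed to the tool depends on the client and is not shown.

```python
from typing import Any, Dict, List, Optional

# Permitted values, copied from the Args description (assumed to be exhaustive).
ALLOWED_SERVICE_INSTANCE_IDS = {"iae", "emr"}
ALLOWED_TYPES = {"spark", "gluten"}
ALLOWED_CONTEXT_TYPES = {"project", "git_project", "space"}


def _check(label: str, value: Optional[str], allowed: set) -> None:
    """Raise ValueError if value is set but is not one of the allowed strings."""
    if value is not None and value not in allowed:
        raise ValueError(f"{label} must be one of {sorted(allowed)}, got {value!r}")


def build_submit_args(
    engine_id: str,
    application: str,
    *,
    arguments: Optional[List[str]] = None,
    conf: Optional[Dict[str, str]] = None,
    env: Optional[Dict[str, str]] = None,
    name: Optional[str] = None,
    job_endpoint: Optional[str] = None,
    service_instance_id: Optional[str] = None,
    type: Optional[str] = None,  # mirrors the tool's argument name
    context_type: Optional[str] = None,
    volumes: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build the argument dict, omitting unset options."""
    if not engine_id or not application:
        raise ValueError("engine_id and application are required")
    _check("service_instance_id", service_instance_id, ALLOWED_SERVICE_INSTANCE_IDS)
    _check("type", type, ALLOWED_TYPES)
    _check("context_type", context_type, ALLOWED_CONTEXT_TYPES)

    optional = {
        "arguments": arguments,
        "conf": conf,
        "env": env,
        "name": name,
        "job_endpoint": job_endpoint,
        "service_instance_id": service_instance_id,
        "type": type,
        "context_type": context_type,
        "volumes": volumes,
    }
    args: Dict[str, Any] = {"engine_id": engine_id, "application": application}
    # Leave out anything not supplied so engine defaults can apply.
    args.update({key: value for key, value in optional.items() if value is not None})
    return args
```

Omitting unset keys rather than sending nulls keeps the payload close to the minimal examples in this document.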
Examples:
Minimal configuration for IBM Cloud Object Storage using the cos:// protocol:
{
  "engine_id": "spark398",
  "application": "cos://bucket.instance/app.py",
  "arguments": ["cos://bucket.instance/data.csv"],
  "conf": {
    "spark.hadoop.fs.cos.instance.endpoint": "s3.direct.us-east.cloud-object-storage.appdomain.cloud",
    "spark.hadoop.fs.cos.instance.access.key": "your-access-key",
    "spark.hadoop.fs.cos.instance.secret.key": "your-secret-key"
  }
}
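In the cos:// example, the token after the bucket name in the URL (`instance`) is the same token that appears in the `spark.hadoop.fs.cos.instance.*` keys. The sketch below generates matching URLs and keys for an arbitrary service name. `cos_url` and `cos_conf` are hypothetical helpers, and the key pattern is inferred from the example above rather than taken from a specification.

```python
from typing import Dict


def cos_url(bucket: str, service: str, path: str) -> str:
    """Build a cos:// URL of the form used in the example: cos://<bucket>.<service>/<path>."""
    return f"cos://{bucket}.{service}/{path.lstrip('/')}"


def cos_conf(service: str, endpoint: str, access_key: str, secret_key: str) -> Dict[str, str]:
    """Build the three conf keys from the example for the given service name."""
    prefix = f"spark.hadoop.fs.cos.{service}"
    return {
        f"{prefix}.endpoint": endpoint,
        f"{prefix}.access.key": access_key,
        f"{prefix}.secret.key": secret_key,
    }
```

The service name passed to `cos_conf` has to match the one used in `cos_url`; otherwise the credentials would be registered under a name the URL never refers to.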
Minimal configuration for IBM Cloud Object Storage using the s3a:// protocol:
{
  "engine_id": "spark398",
  "application": "s3a://bucket/app.py",
  "arguments": ["s3a://bucket/data.csv"],
  "conf": {
    "spark.hadoop.fs.s3a.bucket.bucket.access.key": "your-access-key",
    "spark.hadoop.fs.s3a.bucket.bucket.secret.key": "your-secret-key",
    "spark.hadoop.fs.s3a.bucket.bucket.aws.credentials.provider": "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
  }
}
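The doubled `bucket.bucket` in the s3a:// keys is easy to misread: the first `bucket` is the literal segment of Hadoop S3A's per-bucket option pattern (`fs.s3a.bucket.<name>.<option>`), and the second is the bucket name used in the example URL. A minimal sketch that makes the substitution explicit follows; `s3a_bucket_conf` is a hypothetical helper, not part of the tool.

```python
from typing import Dict

# Credentials provider class named in the example above.
SIMPLE_PROVIDER = "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"


def s3a_bucket_conf(bucket: str, access_key: str, secret_key: str) -> Dict[str, str]:
    """Build per-bucket S3A conf keys matching the example for the given bucket name."""
    prefix = f"spark.hadoop.fs.s3a.bucket.{bucket}"
    return {
        f"{prefix}.access.key": access_key,
        f"{prefix}.secret.key": secret_key,
        f"{prefix}.aws.credentials.provider": SIMPLE_PROVIDER,
    }
```

For a bucket named `my-data`, for instance, the access key would land under `spark.hadoop.fs.s3a.bucket.my-data.access.key`.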
Optional conf parameters (uses engine defaults if not specified):
- spark.app.name: Custom application name
- ae.spark.driver.log.level / ae.spark.executor.log.level: Log levels
- spark.driver.cores / spark.driver.memory: Driver resources
- spark.executor.cores / spark.executor.memory: Executor resources
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| engine_id | Yes | ||
| application | Yes | ||
| arguments | No | ||
| conf | No | ||
| env | No | ||
| name | No | ||
| job_endpoint | No | ||
| service_instance_id | No | ||
| type | No | ||
| context_type | No | ||
| volumes | No | ||
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | ||