SparkSubmit Operator

The SparkSubmit operator connects to a Spark instance and invokes a local spark-submit to schedule a job.
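
To make "invokes a local spark-submit" concrete, the following is a minimal sketch of how the arguments documented on this page could be turned into a spark-submit command line. It is not the operator's actual implementation. The function name `build_spark_submit_command` and the example path are hypothetical, and the mapping from each argument to a spark-submit flag is an assumption based on the standard spark-submit options. The arguments envVars, sparkHome, and component do not correspond to command-line flags and are left out.

```python
import shlex


def build_spark_submit_command(arguments):
    """Translate operator-style arguments into a spark-submit argv list (illustrative only)."""
    # Assumption: fall back to whatever "spark-submit" is found on the PATH.
    cmd = [arguments.get("sparkSubmitExec", "spark-submit")]
    cmd += ["--master", arguments["master"]]
    # The documented default for deployMode is "cluster".
    cmd += ["--deploy-mode", arguments.get("deployMode", "cluster")]

    # Single-valued options and the spark-submit flag each is assumed to map to.
    for key, flag in [("className", "--class"), ("name", "--name"), ("proxyUser", "--proxy-user")]:
        if key in arguments:
            cmd += [flag, str(arguments[key])]

    # List-valued options are passed to spark-submit as comma-separated values.
    for key, flag in [("jars", "--jars"), ("pyFiles", "--py-files"), ("files", "--files")]:
        if arguments.get(key):
            cmd += [flag, ",".join(arguments[key])]

    # Each conf entry becomes its own "--conf key=value" pair.
    for conf_key, conf_value in arguments.get("conf", {}).items():
        cmd += ["--conf", f"{conf_key}={conf_value}"]

    # The documented default for verbose is true.
    if arguments.get("verbose", True):
        cmd.append("--verbose")

    # The application file comes last, followed by its own arguments.
    cmd.append(arguments["file"])
    cmd += [str(a) for a in arguments.get("args", [])]
    return cmd


example = {
    "file": "/opt/jobs/pi.py",  # hypothetical path
    "args": [100],
    "master": "yarn",
    "conf": {"spark.executor.memory": "2g"},
}
print(shlex.join(build_spark_submit_command(example)))
```

Returning a list rather than a single string keeps each value as one argument, so paths or configuration values containing spaces need no manual quoting when the list is handed to a process launcher.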

Operator arguments

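Each argument is described by its name, type, value restrictions, whether it is required, its default value, and a description. Fields that do not apply to an argument are omitted.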

file
  Type: String
  Value restrictions: Must be a path to a valid Java or Python Spark application file
  Required: Yes
  Description: Spark application to submit (JAR or Python file)

args
  Type: List of Strings, Numbers or Booleans
  Required: Yes
  Description: Additional application arguments

master
  Type: String
  Value restrictions: Must be a valid supported Master URL:
    • local
    • local[K]
    • local[K,F]
    • local[*]
    • local[*,F]
    • spark://HOST:PORT
    • spark://HOST1:PORT1,HOST2:PORT2
    • yarn
  Required: Yes
  Description: The master URL for the Spark cluster

deployMode
  Value restrictions: client or cluster
  Required: No
  Default: cluster
  Description: Whether to launch the driver locally (client) or in the cluster (cluster)

className
  Type: String
  Required: No
  Description: The entry point of the Java application. Required for Java applications.

name
  Type: String
  Required: No
  Description: Name of the task. On submission, the IDs of the study, experiment, and trial are appended.

jars
  Type: List of Strings
  Value restrictions: Each item must be a path to an existing JAR file
  Required: No
  Description: A list of JARs to be added to the classpath

pyFiles
  Type: List of Strings
  Value restrictions: Each item must be a path to an existing Python file
  Required: No
  Description: A list of Python scripts to be added to the PYTHONPATH

files
  Type: List of Strings
  Value restrictions: Each item must be a path to an existing file
  Required: No
  Description: A list of files to be made available to the submitted application

conf
  Type: Object (key-value pairs)
  Required: No
  Description: Mapping containing additional Spark configurations. See the Spark documentation.

envVars
  Type: Object (key-value pairs)
  Required: No
  Description: Environment variables set when running the spark-submit command

sparkSubmitExec
  Type: String
  Value restrictions: Must be a path to an existing executable
  Required: No
  Default: The default for the Spark installation
  Description: The path of the spark-submit executable

sparkHome
  Type: String
  Value restrictions: Must be a path to an existing directory
  Required: No
  Default: The default for the Spark installation
  Description: The path to use as SPARK_HOME

proxyUser
  Type: String
  Required: No
  Description: The user to be used to execute Spark applications

verbose
  Type: Boolean
  Required: No
  Default: true
  Description: Whether additional debugging output should be displayed

component
  Type: String
  Value restrictions: Must match the name of an existing Component of the System under test
  Required: Yes
  Description: The name of the component whose properties can be used as arguments of the operator
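
The Master URL forms accepted by the master argument can be checked before a job is submitted. The sketch below is one possible validator, not the check the operator itself performs. The name `is_supported_master` is hypothetical, K and F are assumed to be non-negative integers, K is assumed to also accept `*`, and host names are matched loosely.

```python
import re

# HOST:PORT pair; the host is matched loosely (assumption), the port must be numeric.
_HOST_PORT = r"[^\s,:/]+:\d+"

# One alternative per Master URL form listed for the master argument.
MASTER_URL = re.compile(
    r"local"                                     # local
    r"|local\[(?:\d+|\*)(?:,\d+)?\]"             # local[K], local[K,F], local[*], local[*,F]
    rf"|spark://{_HOST_PORT}(?:,{_HOST_PORT})*"  # spark://HOST:PORT[,HOST2:PORT2...]
    r"|yarn"                                     # yarn
)


def is_supported_master(url):
    """Return True when the whole string matches one of the listed forms."""
    return MASTER_URL.fullmatch(url) is not None


for candidate in ["local[*]", "spark://host1:7077,host2:7077", "mesos://host:5050"]:
    print(candidate, is_supported_master(candidate))
```

Using `fullmatch` rather than `match` rejects strings that merely start with a valid form, such as `yarn-client`.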
