Spark optimization pack

The Spark optimization pack lets you tune applications running on the Apache Spark framework. With this pack, Akamas explores the space of Spark parameters to find the configurations that best optimize resource allocation or execution time.

To achieve these goals the optimization pack provides parameters that focus on the following areas:

Driver and executor resource allocation
Parallelism
Shuffling
Spark SQL
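As a rough illustration of these four areas, the sketch below groups a few standard Spark configuration properties by area and flattens them into `spark-submit` arguments. The property names are regular Spark settings, but the grouping, the values, the `build_submit_command` helper, and the `my_job.py` application name are assumptions made for this example; the parameter names exposed by the optimization pack itself may differ.

```python
import shlex

# Illustrative values only. The property names are standard Spark settings,
# but the grouping and the numbers are assumptions made for this sketch.
TUNABLE_AREAS = {
    "resources": {
        "spark.driver.memory": "2g",
        "spark.executor.memory": "4g",
        "spark.executor.cores": "2",
        "spark.executor.instances": "4",
    },
    "parallelism": {"spark.default.parallelism": "16"},
    "shuffling": {"spark.shuffle.compress": "true"},
    "sql": {"spark.sql.shuffle.partitions": "64"},
}


def build_submit_command(app, areas):
    """Flatten the per-area settings into spark-submit --conf arguments."""
    args = ["spark-submit"]
    for settings in areas.values():
        for key, value in settings.items():
            args += ["--conf", f"{key}={value}"]
    args.append(app)  # hypothetical application file
    return args


command = build_submit_command("my_job.py", TUNABLE_AREAS)
print(shlex.join(command))
```

Each candidate configuration explored during a study would amount to a different set of values in a mapping like this one.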

Similarly, the bundled metrics provide visibility into the following statistics from the Spark History Server:

Execution time
Executors' resource usage
Garbage collection time
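To give a feel for where such statistics come from, the sketch below aggregates executor records shaped like the JSON returned by the Spark History Server REST endpoint `/api/v1/applications/{app-id}/executors`. It assumes the `id`, `totalDuration`, `totalGCTime`, and `memoryUsed` fields of that response and uses hand-written sample data; it is not how Akamas itself collects the bundled metrics.

```python
def summarize_executors(executors):
    """Aggregate task time, GC time and memory usage across executors."""
    # The driver is reported with id "driver"; keep only real executors.
    workers = [e for e in executors if e["id"] != "driver"]
    total_duration_ms = sum(e["totalDuration"] for e in workers)
    total_gc_ms = sum(e["totalGCTime"] for e in workers)
    return {
        "executors": len(workers),
        "task_time_ms": total_duration_ms,
        "gc_time_ms": total_gc_ms,
        # Share of task time spent in garbage collection.
        "gc_fraction": total_gc_ms / total_duration_ms if total_duration_ms else 0.0,
        "memory_used_bytes": sum(e["memoryUsed"] for e in workers),
    }


# Hand-written sample standing in for a parsed History Server response.
sample = [
    {"id": "driver", "totalDuration": 0, "totalGCTime": 0, "memoryUsed": 1024},
    {"id": "1", "totalDuration": 8000, "totalGCTime": 400, "memoryUsed": 2048},
    {"id": "2", "totalDuration": 12000, "totalGCTime": 600, "memoryUsed": 4096},
]
summary = summarize_executors(sample)
print(summary)
```

A high `gc_fraction` is one common hint that executor memory settings deserve attention.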

Component Types

Component Type

Description

Installing

Here’s the command to install the Spark optimization pack using the Akamas CLI:

akamas install optimization-pack Spark

Spark Application 2.2.0

This page describes the Optimization Pack for Spark Application 2.2.0.

Metrics

Duration

Metric

Unit

Description

Driver

Metric

Unit

Description

Executors

Metric

Unit

Description

Stages and Tasks

Metric

Unit

Description

Parameters

Execution

Parameter

Unit

Type

Default value

Domain

Restart

Description

CPU and Memory allocation

Parameter

Unit

Type

Default value

Domain

Restart

Description

Shuffling

Parameter

Unit

Type

Default value

Domain

Restart

Description

Dynamic allocation

Parameter

Unit

Type

Default value

Domain

Restart

Description

SQL

Parameter

Unit

Type

Default value

Domain

Restart

Description

Compression and Serialization

Parameter

Unit

Type

Default value

Domain

Restart

Description

Constraints

The following tables list the constraints that may be required in the study definition, depending on which parameters are tuned:

Cluster size

The overall resources allocated to the application should be constrained by a maximum and, sometimes, a minimum value:

the maximum value could be the sum of resources physically available in the cluster, or a lower limit to allow the concurrent execution of other applications
an optional minimum value could be useful to avoid configurations that allocate executors that are both small and scarce
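The cluster-size bounds described above can be sketched as a simple check on the overall allocation. The `SparkAllocation` class, its field names, and the numeric limits are hypothetical and chosen only for illustration; they are not the parameter names or the constraint syntax used in an Akamas study.

```python
from dataclasses import dataclass


@dataclass
class SparkAllocation:
    # Hypothetical field names; they mirror common Spark settings,
    # not the parameter names defined by the optimization pack.
    driver_cores: int
    driver_memory_gb: float
    executor_cores: int
    executor_memory_gb: float
    executor_instances: int

    def total_cores(self):
        return self.driver_cores + self.executor_cores * self.executor_instances

    def total_memory_gb(self):
        return self.driver_memory_gb + self.executor_memory_gb * self.executor_instances


def within_cluster_size(alloc, max_cores, max_memory_gb, min_cores=0, min_memory_gb=0.0):
    """Return True if the overall allocation respects the cluster-size bounds."""
    cores_ok = min_cores <= alloc.total_cores() <= max_cores
    memory_ok = min_memory_gb <= alloc.total_memory_gb() <= max_memory_gb
    return cores_ok and memory_ok


candidate = SparkAllocation(1, 2, 4, 8, 5)  # 21 cores and 42 GB overall
too_small = SparkAllocation(1, 1, 1, 1, 1)  # 2 cores and 2 GB overall
fits = within_cluster_size(candidate, max_cores=32, max_memory_gb=64, min_cores=8)
rejected = within_cluster_size(too_small, max_cores=32, max_memory_gb=64, min_cores=8)
print(fits, rejected)
```

Here the maximum keeps the application within the resources of the cluster, while the optional minimum filters out configurations with few, small executors.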

Spark Application 2.4.0

This page describes the Optimization Pack for Spark Application 2.4.0.

Metrics

Duration

Metric

Unit

Description

Driver

Metric

Unit

Description

Executors

Metric

Unit

Description

Stages and Tasks

Metric

Unit

Description

Parameters

Execution

Parameter

Unit

Type

Default value

Domain

Restart

Description

CPU and Memory allocation

Parameter

Unit

Type

Default value

Domain

Restart

Description

Shuffling

Parameter

Unit

Type

Default value

Domain

Restart

Description

Dynamic allocation

Parameter

Unit

Type

Default value

Domain

Restart

Description

SQL

Parameter

Unit

Type

Default value

Domain

Restart

Description

Compression and Serialization

Parameter

Unit

Type

Default value

Domain

Restart

Description

Constraints

The following tables list the constraints that may be required in the study definition, depending on which parameters are tuned:

Cluster size

The overall resources allocated to the application should be constrained by a maximum and, sometimes, a minimum value:

the maximum value could be the sum of resources physically available in the cluster, or a lower limit to allow the concurrent execution of other applications
an optional minimum value could be useful to avoid configurations that allocate executors that are both small and scarce

Spark Application 2.3.0

This page describes the Optimization Pack for Spark Application 2.3.0.

Metrics

Duration

Metric

Unit

Description

Driver

Metric

Unit

Description

Executors

Metric

Unit

Description

Stages and Tasks

Metric

Unit

Description

Parameters

Execution

Parameter

Unit

Type

Default value

Domain

Restart

Description

CPU and Memory allocation

Parameter

Unit

Type

Default value

Domain

Restart

Description

Shuffling

Parameter

Unit

Type

Default value

Domain

Restart

Description

Dynamic allocation

Parameter

Unit

Type

Default value

Domain

Restart

Description

SQL

Parameter

Unit

Type

Default value

Domain

Restart

Description

Compression and Serialization

Parameter

Unit

Type

Default value

Domain

Restart

Description

Constraints

The following tables list the constraints that may be required in the study definition, depending on which parameters are tuned:

Cluster size

The overall resources allocated to the application should be constrained by a maximum and, sometimes, a minimum value:

the maximum value could be the sum of resources physically available in the cluster, or a lower limit to allow the concurrent execution of other applications
an optional minimum value could be useful to avoid configurations that allocate executors that are both small and scarce