Spark Application 2.3.0

This page describes the Optimization Pack for Spark Application 2.3.0.
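
To use this pack, a component representing the application must reference its component type. Below is a minimal sketch of such a component definition; the component name and description are illustrative placeholders, and the later examples on this page assume a component named `spark`.

```yaml
# Minimal component definition sketch; the componentType value binds the
# component to the Spark Application 2.3.0 optimization pack.
name: spark
description: The Spark application under optimization
componentType: Spark Application 2.3.0
```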

Metrics

Duration

| Metric | Unit | Description |
| --- | --- | --- |
| `spark_application_duration` | milliseconds | The duration of the Spark application |
| `spark_job_duration` | milliseconds | The duration of the job |
| `spark_stage_duration` | milliseconds | The duration of the stage |
| `spark_task_duration` | milliseconds | The duration of the task |
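
These duration metrics are the usual candidates for the goal of an offline study. As a sketch, assuming the `spark` component introduced above, a goal minimizing the end-to-end application duration could look like this:

```yaml
# Study goal sketch: minimize the overall application duration.
goal:
  objective: minimize
  function:
    formula: spark.spark_application_duration
```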

Driver

| Metric | Unit | Description |
| --- | --- | --- |
| `spark_driver_rdd_blocks` | blocks | The total number of persisted RDD blocks for the driver |
| `spark_driver_mem_used` | bytes | The total amount of memory used by the driver |
| `spark_driver_disk_used` | bytes | The total amount of disk used for RDDs by the driver |
| `spark_driver_cores` | cores | The total number of concurrent tasks that can be run by the driver |
| `spark_driver_total_input_bytes` | bytes | The total number of bytes read from RDDs or persisted data by the driver |
| `spark_driver_total_tasks` | tasks | The total number of tasks run by the driver |
| `spark_driver_total_duration` | milliseconds | The total amount of time spent by the driver running tasks |
| `spark_driver_max_mem_used` | bytes | The maximum amount of memory used by the driver |
| `spark_driver_total_jvm_gc_duration` | milliseconds | The total amount of time spent by the driver's JVM doing garbage collection across all tasks |
| `spark_driver_total_shuffle_read` | bytes | The total number of bytes read during a shuffle by the driver |
| `spark_driver_total_shuffle_write` | bytes | The total number of bytes written in shuffle operations by the driver |
| `spark_driver_used_on_heap_storage_memory` | bytes | The amount of on-heap memory used by the driver |
| `spark_driver_used_off_heap_storage_memory` | bytes | The amount of off-heap memory used by the driver |
| `spark_driver_total_on_heap_storage_memory` | bytes | The total amount of available on-heap memory for the driver |
| `spark_driver_total_off_heap_storage_memory` | bytes | The total amount of available off-heap memory for the driver |

Executors

| Metric | Unit | Description |
| --- | --- | --- |
| `spark_executor_max_count` | executors | The maximum number of executors used for the application |
| `spark_executor_rdd_blocks` | blocks | The total number of persisted RDD blocks for each executor |
| `spark_executor_mem_used` | bytes | The total amount of memory used by each executor |
| `spark_executor_disk_used` | bytes | The total amount of disk used for RDDs by each executor |
| `spark_executor_cores` | cores | The number of cores used by each executor |
| `spark_executor_total_input_bytes` | bytes | The total number of bytes read from RDDs or persisted data by each executor |
| `spark_executor_total_tasks` | tasks | The total number of tasks run by each executor |
| `spark_executor_total_duration` | milliseconds | The total amount of time spent by each executor running tasks |
| `spark_executor_max_mem_used` | bytes | The maximum amount of memory used by each executor |
| `spark_executor_total_jvm_gc_duration` | milliseconds | The total amount of time spent by each executor's JVM doing garbage collection across all tasks |
| `spark_executor_total_shuffle_read` | bytes | The total number of bytes read during a shuffle by each executor |
| `spark_executor_total_shuffle_write` | bytes | The total number of bytes written in shuffle operations by each executor |
| `spark_executor_used_on_heap_storage_memory` | bytes | The amount of on-heap memory used by each executor |
| `spark_executor_used_off_heap_storage_memory` | bytes | The amount of off-heap memory used by each executor |
| `spark_executor_total_on_heap_storage_memory` | bytes | The total amount of available on-heap memory for each executor |
| `spark_executor_total_off_heap_storage_memory` | bytes | The total amount of available off-heap memory for each executor |

Stages and Tasks

| Metric | Unit | Description |
| --- | --- | --- |
| `spark_stage_shuffle_read_bytes` | bytes | The total number of bytes read in shuffle operations by each stage |
| `spark_task_jvm_gc_duration` | milliseconds | The total duration of JVM garbage collections for each task |
| `spark_task_peak_execution_memory` | bytes | The sum of the peak sizes across internal data structures created for each task |
| `spark_task_result_size` | bytes | The size of the result of the computation of each task |
| `spark_task_result_serialization_time` | milliseconds | The time spent by each task serializing the computation result |
| `spark_task_shuffle_read_fetch_wait_time` | milliseconds | The time spent by each task waiting for remote shuffle blocks |
| `spark_task_shuffle_read_local_blocks_fetched` | blocks | The total number of local blocks fetched in shuffle operations by each task |
| `spark_task_shuffle_read_local_bytes` | bytes | The total number of bytes read in shuffle operations from local disk by each task |
| `spark_task_shuffle_read_remote_blocks_fetched` | blocks | The total number of remote blocks fetched in shuffle operations by each task |
| `spark_task_shuffle_read_remote_bytes` | bytes | The total number of remote bytes read in shuffle operations by each task |
| `spark_task_shuffle_read_remote_bytes_to_disk` | bytes | The total number of remote bytes read to disk in shuffle operations by each task |
| `spark_task_shuffle_write_time` | nanoseconds | The time spent by each task writing data to disk or buffer caches during shuffle operations |
| `spark_task_executor_deserialize_time` | nanoseconds | The time spent by the executor deserializing the task |
| `spark_task_executor_deserialize_cpu_time` | nanoseconds | The CPU time spent by the executor deserializing the task |
| `spark_task_stage_shuffle_write_records` | records | The total number of records written in shuffle operations, broken down by task and stage |
| `spark_task_stage_shuffle_write_bytes` | bytes | The total number of bytes written in shuffle operations, broken down by task and stage |
| `spark_task_stage_shuffle_read_records` | records | The total number of records read in shuffle operations, broken down by task and stage |
| `spark_task_stage_disk_bytes_spilled` | bytes | The total number of bytes spilled to disk, broken down by task and stage |
| `spark_task_stage_memory_bytes_spilled` | bytes | The total number of bytes spilled to memory, broken down by task and stage |
| `spark_task_stage_input_bytes_read` | bytes | The total number of bytes read, broken down by task and stage |
| `spark_task_stage_input_records_read` | records | The total number of records read, broken down by task and stage |
| `spark_task_stage_output_bytes_written` | bytes | The total number of bytes written, broken down by task and stage |
| `spark_task_stage_output_records_written` | records | The total number of records written, broken down by task and stage |
| `spark_task_stage_executor_run_time` | nanoseconds | The time spent by each executor actually running tasks (including fetching shuffle data), broken down by task, stage, and executor |
| `spark_task_stage_executor_cpu_time` | nanoseconds | The CPU time spent by each executor actually running each task (including fetching shuffle data), broken down by task and stage |

Parameters

Execution

| Parameter | Type | Unit | Default value | Domain | Restart | Description |
| --- | --- | --- | --- | --- | --- | --- |
| `driverCores` | integer | cores | You should select your own default | You should select your own domain | yes | The number of CPU cores assigned to the driver in cluster deploy mode |
| `numExecutors` | integer | executors | You should select your own default | You should select your own domain | yes | Number of executors to use. YARN only |
| `totalExecutorCores` | integer | cores | You should select your own default | You should select your own domain | yes | Total number of cores for the application. Spark standalone and Mesos only |
| `executorCores` | integer | cores | You should select your own default | You should select your own domain | yes | Number of CPU cores for an executor. Spark standalone and YARN only |
| `defaultParallelism` | integer | partitions | You should select your own default | You should select your own domain | yes | Default number of partitions in RDDs returned by transformations like join, reduceByKey, and parallelize when not set by the user |
| `broadcastBlockSize` | integer | kilobytes | 4096 | 256 → 131072 | yes | Size of each piece of a block for TorrentBroadcastFactory |
| `schedulerMode` | categorical | | FIFO | FIFO, FAIR | yes | Defines the scheduling strategy across jobs |
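
In a study, only the selected subset of these parameters is tuned, and parameters whose default and domain are left to the user need domains sized for the target cluster. A hypothetical selection for this group, assuming the `spark` component (the domains below are illustrative, not recommendations):

```yaml
# Hypothetical parameter selection for the Execution group; size the
# domains according to the resources actually available in your cluster.
parametersSelection:
  - name: spark.driverCores
    domain: [1, 4]
  - name: spark.executorCores
    domain: [1, 8]
  - name: spark.numExecutors
    domain: [2, 16]
  - name: spark.defaultParallelism
    domain: [8, 256]
```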

CPU and Memory allocation

| Parameter | Type | Unit | Default value | Domain | Restart | Description |
| --- | --- | --- | --- | --- | --- | --- |
| `driverMemory` | integer | megabytes | You should select your own default | You should select your own domain | yes | Amount of memory to use for the driver process |
| `yarnDriverMemoryOverhead` | integer | megabytes | 384 | 384 → 65536 | yes | Off-heap memory to be allocated per driver in cluster mode. Currently supported in YARN and Kubernetes |
| `executorMemory` | integer | megabytes | You should select your own default | You should select your own domain | yes | Amount of memory to use per executor |
| `yarnExecutorMemoryOverhead` | integer | megabytes | 384 | 384 → 65536 | yes | Off-heap memory to be allocated per executor. Currently supported in YARN and Kubernetes |
| `memoryOffHeapEnabled` | categorical | | false | true, false | yes | If true, Spark will attempt to use off-heap memory for certain operations |
| `memoryOffHeapSize` | integer | megabytes | 0 | 0 → 16384 | yes | The absolute amount of memory which can be used for off-heap allocation |
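
In a workflow, memory and CPU settings typically reach the application through a templated spark-submit script rendered by the FileConfigurator operator, which replaces `${component.parameter}` tokens with the configuration under test. A sketch, with illustrative hostnames, usernames, and paths:

```yaml
# Hypothetical workflow task: render the launch script from a template.
# submit.template might contain, for example:
#   spark-submit --driver-memory ${spark.driverMemory}m \
#                --executor-memory ${spark.executorMemory}m \
#                --num-executors ${spark.numExecutors} ...
tasks:
  - name: Configure spark-submit script
    operator: FileConfigurator
    arguments:
      source:
        hostname: edge-node.example.com
        username: hadoop
        path: /opt/spark-app/submit.template
      target:
        hostname: edge-node.example.com
        username: hadoop
        path: /opt/spark-app/submit.sh
```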

Shuffling

| Parameter | Type | Unit | Default value | Domain | Restart | Description |
| --- | --- | --- | --- | --- | --- | --- |
| `reducerMaxSizeInFlight` | integer | megabytes | 48 | 1 → 1024 | yes | Maximum size of map outputs to fetch simultaneously from each reduce task |
| `shuffleFileBuffer` | integer | kilobytes | 32 | 1 → 2048 | yes | Size of the in-memory buffer for each shuffle file output stream |
| `shuffleCompress` | categorical | | true | true, false | yes | Whether to compress map output files |
| `shuffleServiceEnabled` | categorical | | true | true, false | yes | Enables the external shuffle service, which preserves the shuffle files written by executors so that executors can be safely removed |

Dynamic allocation

| Parameter | Type | Unit | Default value | Domain | Restart | Description |
| --- | --- | --- | --- | --- | --- | --- |
| `dynamicAllocationEnabled` | categorical | | true | true, false | yes | Whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload. Requires spark.shuffle.service.enabled to be set |
| `dynamicAllocationExecutorIdleTimeout` | integer | seconds | 60 | 1 → 3600 | yes | If dynamic allocation is enabled and an executor has been idle for more than this duration, the executor is removed |
| `dynamicAllocationInitialExecutors` | integer | executors | You should select your own default | You should select your own domain | yes | Initial number of executors to run if dynamic allocation is enabled |
| `dynamicAllocationMinExecutors` | integer | executors | You should select your own default | You should select your own domain | yes | Lower bound for the number of executors if dynamic allocation is enabled |
| `dynamicAllocationMaxExecutors` | integer | executors | You should select your own default | You should select your own domain | yes | Upper bound for the number of executors if dynamic allocation is enabled |
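
When the minimum, initial, and maximum executor counts are tuned together, Spark expects min ≤ initial ≤ max, so ordering constraints in the study help prevent invalid configurations. A sketch, assuming the `spark` component:

```yaml
# Hypothetical ordering constraints for the dynamic allocation bounds:
# Spark requires min <= initial <= max executors.
parameterConstraints:
  - name: min_leq_initial
    formula: spark.dynamicAllocationMinExecutors <= spark.dynamicAllocationInitialExecutors
  - name: initial_leq_max
    formula: spark.dynamicAllocationInitialExecutors <= spark.dynamicAllocationMaxExecutors
```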

SQL

| Parameter | Type | Unit | Default value | Domain | Restart | Description |
| --- | --- | --- | --- | --- | --- | --- |
| `sqlInMemoryColumnarStorageCompressed` | categorical | | true | true, false | yes | When set to true, Spark SQL automatically selects a compression codec for each column based on statistics of the data |
| `sqlInMemoryColumnarStorageBatchSize` | integer | records | 1000 | 1 → 100000 | yes | Controls the size of batches for columnar caching. Larger batch sizes can improve memory utilization and compression, but risk out-of-memory errors when caching data |
| `sqlFilesMaxPartitionBytes` | integer | bytes | 134217728 | 1024 → 1073741824 | yes | The maximum number of bytes to pack into a single partition when reading files |
| `sqlFilesOpenCostInBytes` | integer | bytes | 4194304 | 262144 → 67108864 | yes | The estimated cost to open a file, measured by the number of bytes that could be scanned in the same time. Used when putting multiple files into a partition |

Compression and Serialization

| Parameter | Type | Unit | Default value | Domain | Restart | Description |
| --- | --- | --- | --- | --- | --- | --- |
| `compressionLz4BlockSize` | integer | bytes | 32 | 8 → 1024 | yes | Block size used in LZ4 compression |
| `serializer` | categorical | | org.apache.spark.serializer.KryoSerializer | org.apache.spark.serializer.JavaSerializer, org.apache.spark.serializer.KryoSerializer | yes | Class to use for serializing objects that will be sent over the network or need to be cached in serialized form |
| `kryoserializerBuffer` | integer | bytes | 64 | 8 → 1024 | yes | Initial size of Kryo's serialization buffer. Note that there will be one buffer per core on each worker |
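
Categorical parameters such as `serializer` are selected in a study the same way as numeric ones. The sketch below assumes the selection accepts a `categories` list to restrict the values the optimizer may try:

```yaml
# Hypothetical selection of a categorical parameter, limiting the
# optimizer to the two serializers listed in the table above.
parametersSelection:
  - name: spark.serializer
    categories:
      - org.apache.spark.serializer.KryoSerializer
      - org.apache.spark.serializer.JavaSerializer
```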

Constraints

The following table lists constraints that may be required in the definition of the study, depending on the tuned parameters:

Cluster size

The overall resources allocated to the application should be constrained by a maximum and, sometimes, a minimum value:

  • the maximum value could be the sum of the resources physically available in the cluster, or a lower value that leaves room for other applications to run concurrently

  • an optional minimum value can be useful to avoid configurations that allocate too few executors, each with too little memory or CPU

| Constraint | Description |
| --- | --- |
| `driverMemory + executorMemory * numExecutors < MEMORY_CAP` | The overall allocated memory must not exceed the specified limit |
| `driverCores + executorCores * numExecutors < CPU_CAP` | The overall number of allocated CPUs must not exceed the specified limit |
| `driverMemory + executorMemory * numExecutors > MIN_MEMORY` | The overall allocated memory must not fall below the specified minimum |
| `driverCores + executorCores * numExecutors > MIN_CPUS` | The overall number of allocated CPUs must not fall below the specified minimum |
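
Expressed as study constraints, the caps above could be sketched as follows, with the numeric limits replaced by your actual cluster capacity (the values and the `spark` component name are illustrative; memory is in megabytes, matching the parameters' unit):

```yaml
# Hypothetical cluster-size constraints capping the total resources
# allocated to the driver plus all executors.
parameterConstraints:
  - name: memory_cap
    formula: spark.driverMemory + spark.executorMemory * spark.numExecutors < 65536
  - name: cpu_cap
    formula: spark.driverCores + spark.executorCores * spark.numExecutors < 64
```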
