The Spark History Server provider collects metrics from a Spark History Server instance and makes them available to Akamas.
Prerequisites
This section lists the minimum requirements you must meet before using the Spark History Server telemetry provider.
Apache Spark 2.3
The Spark History Server API must be reachable at the provided address and port (the default port is 18080).
Component type: spark-application
Check the Spark History Server provider metrics mapping to see how metrics for this component type are extracted by this provider.
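One quick way to confirm the network prerequisite is to query the History Server's monitoring REST API, which Spark serves under /api/v1 on the same port as the web UI. The following is a minimal Python sketch. It assumes the server is exposed over plain HTTP (not HTTPS) and needs no authentication. The function name history_server_reachable is hypothetical and is not part of Akamas.

```python
import json
import urllib.error
import urllib.request


def history_server_reachable(address: str, port: int = 18080, timeout: float = 5.0) -> bool:
    """Return True if the Spark History Server REST API answers at address:port."""
    # /api/v1/applications lists the applications known to the History Server.
    # Assumption: plain HTTP, no authentication in front of the server.
    url = f"http://{address}:{port}/api/v1/applications"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # A healthy server answers with a JSON array (possibly empty).
            return isinstance(json.load(response), list)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, HTTP error, or a non-JSON body.
        return False
```

For example, history_server_reachable("sparkmaster.example.com") (a placeholder hostname) should return True before you create the telemetry instance.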
Versions < 2.0.0 of this provider are compatible with Akamas up to version 1.8.0.
Versions >= 2.0.0 of this provider are compatible with Akamas from version 1.9.0.
This section lists the workflow operators this provider depends on:
Akamas uses components to identify specific elements of the system to be monitored and optimized. Your system might contain multiple components to model, for example, a Spark application and each host of the cluster. To point Akamas to the right component when extracting metrics, add a property called sparkApplication to your Spark Application component. The provider only extracts metrics for components where this property is specified.
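The selection rule can be sketched as a simple filter over component definitions. This illustrates the rule only and does not show how Akamas implements it. The component dictionaries, the host component names and types, and the "true" value given to sparkApplication are hypothetical. The sketch only checks that the property is present.

```python
from typing import Any, Dict, List


def components_to_monitor(components: List[Dict[str, Any]]) -> List[str]:
    """Return the names of components that carry the sparkApplication property."""
    # Components without the property are skipped: the provider only extracts
    # metrics for components where sparkApplication is specified.
    return [
        component["name"]
        for component in components
        if "sparkApplication" in component.get("properties", {})
    ]


# Hypothetical system: one Spark application and two cluster hosts.
system_components = [
    {"name": "sparkJob", "componentType": "spark-application",
     "properties": {"sparkApplication": "true"}},
    {"name": "worker1", "componentType": "host", "properties": {}},
    {"name": "worker2", "componentType": "host"},
]
print(components_to_monitor(system_components))  # ['sparkJob']
```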
To install the Spark History Server provider, create a YAML file (called provider.yml in this example) with the definition of the provider:
Then you can install the provider using the Akamas CLI:
The installed provider is shared with all users of your Akamas installation and can monitor many different systems through appropriately configured telemetry provider instances.
To create an instance of the Spark History Server provider, build a YAML file (instance.yml in this example) with the definition of the instance:
Then you can create the instance for the system spark-system using the Akamas CLI:
When you create an instance of the Spark History Server provider, you must specify some configuration information so that the provider can correctly extract and process metrics from the Spark History Server.
You can specify configuration information within the config section of the YAML instance definition.
address: hostname of the Spark History Server instance
The following YAML file describes the definition of a telemetry instance.
The following table reports the reference for the config section within the definition of the Spark History Server provider instance:
This section reports common use cases addressed by this provider.
Check the Spark Application page for a list of all Spark application metrics available in Akamas.
This example shows how to configure a Spark History Server provider to collect performance metrics about a Spark application submitted to the cluster using the Spark SSH Submit operator.
As a first step, you need to create a YAML file (spark_instance.yml) containing the configuration the provider needs to connect to the Spark History Server, plus the filter on the desired level of granularity for the imported metrics:
Then create the telemetry instance using the Akamas CLI:
Finally, you will need to define for your study a workflow that includes the submission of the Spark application to the cluster, in this case using the Spark SSH Submit operator:
This section reports common best practices you can adopt to ease the use of this telemetry provider.
configure metrics granularity: to reduce the collection time, set importLevel so that metrics are imported with a granularity no finer than the study requires.
wait for metrics publication: make sure the workflow leaves a few minutes between the end of the Spark application and the execution of the Spark telemetry instance, since the Spark History Server may take some time to finish publishing the metrics.
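Instead of a fixed pause, a workflow step could poll the History Server until it lists the application as completed. The following Python sketch assumes the application ID is known and uses the Spark REST endpoint /api/v1/applications?status=completed. The function names completed_app_ids and wait_for_completed_app are hypothetical. A completed status is only a proxy for full metric publication, so a short extra margin may still be useful.

```python
import json
import time
import urllib.request
from typing import Callable, Set


def completed_app_ids(address: str, port: int = 18080) -> Set[str]:
    """Return the IDs of applications the History Server reports as completed."""
    url = f"http://{address}:{port}/api/v1/applications?status=completed"
    with urllib.request.urlopen(url, timeout=10) as response:
        return {app["id"] for app in json.load(response)}


def wait_for_completed_app(
    fetch_completed_ids: Callable[[], Set[str]],
    app_id: str,
    timeout_s: float = 300.0,
    poll_interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until app_id is listed as completed; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        if app_id in fetch_completed_ids():
            return True
        if clock() >= deadline:
            return False
        # Not published yet: wait before asking the History Server again.
        sleep(poll_interval_s)
```

A call could look like wait_for_completed_app(lambda: completed_app_ids("sparkmaster.example.com"), "app-20240101000000-0001"), where both the hostname and the application ID are placeholders. The fetch function, sleep, and clock are passed in so that the polling logic can be tested without a live server.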
Field | Type | Description | Default value | Restriction | Required |
---|---|---|---|---|---|
address | URL | Spark History Server address | | | Yes |
importLevel | String | Granularity of the imported metrics | job | Allowed values: job, stage, task | No |
port | Integer | Spark History Server listening port | 18080 | | No |
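The rules in the table above can be restated as a small validation function. These rules are: address is required, port defaults to 18080, and importLevel defaults to job and is restricted to job, stage, or task. This is a hypothetical Python sketch for checking an instance config before creating it. It does not reproduce Akamas's own validation.

```python
from typing import Any, Dict

ALLOWED_IMPORT_LEVELS = ("job", "stage", "task")


def resolve_instance_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the documented defaults and restrictions to a config section."""
    if "address" not in config:
        # address is the only required field.
        raise ValueError("address is required")
    resolved = {
        "address": config["address"],
        "port": int(config.get("port", 18080)),           # default listening port
        "importLevel": config.get("importLevel", "job"),  # default granularity
    }
    if resolved["importLevel"] not in ALLOWED_IMPORT_LEVELS:
        raise ValueError(
            f"importLevel must be one of {ALLOWED_IMPORT_LEVELS}, "
            f"got {resolved['importLevel']!r}"
        )
    return resolved


print(resolve_instance_config({"address": "sparkmaster.example.com"}))
# {'address': 'sparkmaster.example.com', 'port': 18080, 'importLevel': 'job'}
```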