# Create Spark History Server telemetry instances

## Create a telemetry instance <a href="#create-a-telemetry-instance" id="create-a-telemetry-instance"></a>

To create an instance of the Spark History Server provider, build a YAML file (`instance.yml` in this example) with the definition of the instance:

```yaml
provider: SparkHistoryServer
config:
  address: spark_master_node
  port: 18080
  importLevel: stage
```

Then you can create the instance for the system `spark-system` using the Akamas CLI:

```bash
akamas create telemetry-instance instance.yml spark-system
```

### Configuration options <a href="#configuration-options" id="configuration-options"></a>

When you create an instance of the Spark History Server provider, you need to specify some configuration information that allows the provider to correctly extract and process metrics from the Spark History Server.

You can specify configuration information within the `config` part of the YAML of the instance definition.

#### Required properties <a href="#required-properties" id="required-properties"></a>

* `address` - hostname of the Spark History Server instance
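
Since `port` and `importLevel` have default values (see the reference table below), a minimal instance definition only needs the address. A minimal sketch, reusing the `spark_master_node` hostname from the examples above:

```yaml
provider: SparkHistoryServer
config:
  address: spark_master_node  # hostname of the Spark History Server; port and importLevel fall back to their defaults
```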

### Telemetry instance reference <a href="#telemetry-instance-reference" id="telemetry-instance-reference"></a>

The following YAML file describes the definition of a telemetry instance.

```yaml
provider: SparkHistoryServer  # This is an instance of the Spark History Server provider
config:
  address: spark_master_node # The address of Spark History Server
  port: 18080   # The port of Spark History Server
  importLevel: job  # The granularity of the imported metrics
```

The following table reports the reference for the `config` section within the definition of the Spark History Server provider instance:

| Field         | Type    | Description                         | Default value | Restriction                            | Required |
| ------------- | ------- | ----------------------------------- | ------------- | -------------------------------------- | -------- |
| `address`     | URL     | Spark History Server address        |               |                                        | Yes      |
| `importLevel` | String  | Granularity of the imported metrics | `job`         | Allowed values: `job`, `stage`, `task` | No       |
| `port`        | Integer | Spark History Server listening port | `18080`       |                                        | No       |

## Use cases <a href="#use-cases" id="use-cases"></a>

This section reports common use cases addressed by this provider.

### Collect stage metrics of a Spark Application <a href="#collect-stage-metrics-of-a-spark-application" id="collect-stage-metrics-of-a-spark-application"></a>

Check the [Spark Application page](https://docs.akamas.io/akamas-docs/3.6/reference/optimization-packs/spark-pack) for the list of all Spark application metrics available in Akamas.

This example shows how to configure a Spark History Server provider in order to collect performance metrics about a Spark application submitted to the cluster using the [Spark SSH Submit](https://docs.akamas.io/akamas-docs/3.6/reference/workflow-operators/sparksshsubmit-operator) operator.

As a first step, you need to create a YAML file (`spark_instance.yml`) containing the configuration the provider needs to connect to the Spark History Server, along with the desired level of granularity for the imported metrics:

```yaml
provider: SparkHistoryServer
config:
  address: spark_master_node
  port: 18080
  importLevel: stage
```

and then create the telemetry instance using the Akamas CLI:

```bash
akamas create telemetry-instance spark_instance.yml
```

Finally, you need to define a workflow for your study that includes the submission of the Spark application to the cluster, in this case using the [Spark SSH Submit operator](https://docs.akamas.io/akamas-docs/3.6/reference/workflow-operators/sparksshsubmit-operator):

```yaml
name: spark_workflow
tasks:
  - name: Run Spark application
    operator: SSHSparkSubmit
    arguments:
      component: spark
```

## Best practices <a href="#best-practices" id="best-practices"></a>

This section reports common best practices you can adopt to ease the use of this telemetry provider.

* **Configure metrics granularity**: to reduce the collection time, set `importLevel` so that metrics are imported with a granularity no finer than the study requires.
* **Wait for metrics publication**: make sure the workflow leaves a few minutes between the end of the Spark application and the execution of the Spark telemetry instance, since the Spark History Server may take some time to complete the publication of the metrics; see the workflow sketch after this list.
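
As a sketch of the second practice, the following workflow pauses after the Spark run so the Spark History Server has time to publish the metrics before collection starts; it assumes that a `Sleep` operator accepting a `seconds` argument is available in your Akamas installation:

```yaml
name: spark_workflow
tasks:
  - name: Run Spark application
    operator: SSHSparkSubmit
    arguments:
      component: spark
  - name: Wait for metrics publication
    # Assumption: the Sleep operator and its seconds argument are available in this Akamas version
    operator: Sleep
    arguments:
      seconds: 300
```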
