# Optimizing Spark

When optimizing applications running on the Apache Spark framework, the goal is to find the configuration that makes the best use of the allocated resources or minimizes the execution time.

Please refer to the [Spark optimization pack](/akamas-docs/3.1.2/akamas-reference/optimization-packs/spark-pack.md) for the list of component types, parameters, metrics, and constraints.

### Workflows <a href="#workflow-design" id="workflow-design"></a>

#### Applying parameters <a href="#applying-parameters" id="applying-parameters"></a>

Akamas offers several operators that you can use to apply the parameters to the tuned Spark application. In particular, we suggest using the [Spark SSH Submit operator](/akamas-docs/3.1.2/akamas-reference/workflow-operators/sparksshsubmit-operator.md), which connects to a target instance and submits the application with the configuration parameters under test.

Other solutions include:

* the [Spark Livy Operator](/akamas-docs/3.1.2/akamas-reference/workflow-operators/sparklivy-operator.md), which allows submitting the application along with the configuration parameters using the [Livy REST interface](https://livy.incubator.apache.org/docs/latest/rest-api.html)
* the standard [Executor operator](/akamas-docs/3.1.2/akamas-reference/workflow-operators/executor-operator.md), which allows running a custom command or script once the [FileConfigurator operator](/akamas-docs/3.1.2/akamas-reference/workflow-operators/fileconfigurator-operator.md) has updated the default Spark configuration file or a custom one using a template.
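
The file-based approach in the last bullet amounts to rendering a configuration file from a template before the application is launched. The following Python sketch illustrates that idea only; it is not how the FileConfigurator operator is implemented, and the placeholder names, the `render_spark_conf` helper, and the sample values are hypothetical.

```python
from string import Template

# Hypothetical template for a spark-defaults.conf-style file.
# The placeholder names are illustrative, not Akamas template syntax.
SPARK_CONF_TEMPLATE = Template(
    "spark.executor.memory ${executor_memory}\n"
    "spark.executor.cores ${executor_cores}\n"
)


def render_spark_conf(params: dict) -> str:
    """Fill the template with the parameter values under test."""
    # substitute() raises KeyError when a placeholder has no value,
    # which surfaces an incomplete configuration before the run starts.
    return SPARK_CONF_TEMPLATE.substitute(params)


if __name__ == "__main__":
    print(render_spark_conf({"executor_memory": "4g", "executor_cores": 2}))
```

Failing on a missing value, rather than writing a partially filled file, keeps a misconfigured experiment from silently running with Spark defaults.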

#### A typical workflow <a href="#a-typical-workflow" id="a-typical-workflow"></a>

You can organize a typical workflow to optimize a Spark application in three parts:

1. Set up the test environment
   1. prepare any required input data
   2. apply the Spark configuration parameters, if you are going for a file-based solution
2. Execute the Spark application
3. Perform cleanup

Here’s an example of a typical workflow where Akamas executes the Spark application using the [Spark SSH Submit operator](/akamas-docs/3.1.2/akamas-reference/workflow-operators/sparksshsubmit-operator.md):

{% code lineNumbers="true" %}

```yaml
name: Spark workflow
tasks:
   - name: cwspark
     operator: SparkSSHSubmit
     arguments:
        master: yarn
        deployMode: cluster
        file: /home/hadoop/scripts/pi.py
        args: [ 100 ]
```

{% endcode %}
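
To make the task arguments above more concrete, the Python sketch below assembles a `spark-submit` command line from the same values. It assumes the arguments map onto the standard `--master`, `--deploy-mode`, and `--conf` options of `spark-submit`; how the operator actually builds its command is not documented on this page, and the `build_spark_submit` helper and the sample `conf` value are hypothetical.

```python
import shlex


def build_spark_submit(master, deploy_mode, file, args, conf=None):
    """Return a spark-submit argument list for the given settings."""
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    # Each tuned parameter becomes a --conf key=value pair;
    # sorting keeps the generated command stable between runs.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(file)
    # Application arguments follow the application file.
    cmd += [str(a) for a in args]
    return cmd


if __name__ == "__main__":
    cmd = build_spark_submit(
        master="yarn",
        deploy_mode="cluster",
        file="/home/hadoop/scripts/pi.py",
        args=[100],
        conf={"spark.executor.memory": "4g"},  # hypothetical value
    )
    print(shlex.join(cmd))
```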

### Telemetry Providers <a href="#telemetry-providers" id="telemetry-providers"></a>

Akamas can access [Spark History Server](/akamas-docs/3.1.2/integrating-akamas/integrating-telemetry-providers/spark-history-server-provider.md) statistics using the [Spark History Server Provider](/akamas-docs/3.1.2/integrating-akamas/integrating-telemetry-providers/spark-history-server-provider.md). This provider maps the metrics in this optimization pack to the statistics provided by the History Server endpoint.

Here’s a configuration example for a telemetry provider instance:

{% code lineNumbers="true" %}

```yaml
provider: SparkHistoryServer
config:
  address: sparkmaster.akamas.io
  port: 18080
```

{% endcode %}
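
For orientation, the Python sketch below shows the kind of data a History Server exposes at the address and port configured above. It relies on the `/api/v1/applications` endpoint of Spark's monitoring REST API; whether the Akamas provider queries this exact endpoint is an assumption, as are the `http` scheme, the helper names, and the sample payload, which is invented for illustration.

```python
import json


def applications_url(address: str, port: int) -> str:
    """URL of the History Server endpoint that lists applications."""
    return f"http://{address}:{port}/api/v1/applications"


def completed_durations_ms(apps: list) -> dict:
    """Map each application id to the durations (ms) of its completed attempts."""
    return {
        app["id"]: [
            attempt["duration"]
            for attempt in app.get("attempts", [])
            if attempt.get("completed")
        ]
        for app in apps
    }


if __name__ == "__main__":
    print(applications_url("sparkmaster.akamas.io", 18080))
    # Invented sample payload shaped like the applications listing.
    sample = json.loads(
        '[{"id": "app-1", "name": "pi",'
        ' "attempts": [{"duration": 42000, "completed": true}]}]'
    )
    print(completed_durations_ms(sample))
```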

### Examples

See this [page](/akamas-docs/3.1.2/knowledge-base/optimizing-a-spark-application.md) for an example of a study leveraging the Spark pack.

