# Optimize cost of a Kubernetes deployment subject to Horizontal Pod Autoscaler

In this guide, you optimize the cost (or resource footprint) of a Kubernetes deployment whose number of replicas is controlled by a Horizontal Pod Autoscaler (HPA). The study tunes pod resource settings (CPU and memory requests and limits) and HPA options (target CPU utilization) at the same time, while also taking into account your application performance and reliability requirements (SLOs). This optimization happens in production, leveraging Akamas live optimization capabilities.

## Prerequisites

* an Akamas instance
* a Kubernetes cluster, with a deployment to be optimized
* a Horizontal Pod Autoscaler working on the desired deployment
* a supported telemetry data source configured to collect metrics from the target Kubernetes cluster (see [here](https://docs.akamas.io/akamas-docs/3.6/integrating/integrating-telemetry-providers) for the full list)
* a way to apply the configuration changes recommended by Akamas to the target deployment and HPA. In this guide, Akamas interacts directly with the Kubernetes APIs via `kubectl`, so you need a service account with permissions to update your deployment and HPA (see below for other integration options).

## Optimization setup

In this guide, we assume the following setup:

* the Kubernetes deployment to be optimized is called *frontend* (in the *hipster-shop* namespace)
* in the deployment, there is a container named *server*, where the app runs
* the HPA is called *frontend-hpa*
* both Dynatrace and Prometheus are used as observability tools

Let's set up the Akamas optimization for this use case.

### System

For this optimization, you need the following components to model the frontend tech stack:

* The Kubernetes Workload, Container, and Pod components, which contain metrics such as the CPU used by the different objects, and parameters to be tuned such as the CPU limits at the container level (from the [kubernetes-pack](https://docs.akamas.io/akamas-docs/3.6/reference/optimization-packs/kubernetes-pack "mention"))
* An HPA component, which contains HPA parameters such as the target CPU utilization
* A Web Application component, which contains service-level metrics such as the throughput and response time of the microservice (from the [web-application](https://docs.akamas.io/akamas-docs/3.6/reference/optimization-packs/web-application-pack/web-application "mention") optimization pack)

Let's start by creating the system, which represents the Kubernetes deployment to be optimized. To create it, write a `system.yaml` manifest like this:

```
name: frontend-2
description: The frontend Kubernetes deployment
```

Then run:

```bash
akamas create system system.yaml
```

Now create the three Kubernetes components. Create a `workload.yaml` manifest like the following:

```
name: workload_frontend
description: The frontend Kubernetes workload
componentType: Kubernetes Workload
properties:
  prometheus:
    namespace: hipster-shop
    deployment: frontend
```

Then create a `container.yaml` manifest like the following:

```
name: server
description: The server Kubernetes container
componentType: Kubernetes Container
properties:
  prometheus:
    namespace: hipster-shop
    pod: frontend.*
    container: server
```

And a `pod.yaml` manifest like the following:

```
name: pod_frontend
description: The frontend Kubernetes pod
componentType: Kubernetes Pod
properties:
  prometheus:
    namespace: hipster-shop
    pod: frontend.*
```

Now create the entities by running:

```bash
akamas create component workload.yaml frontend-2
akamas create component container.yaml frontend-2
akamas create component pod.yaml frontend-2
```

Now create an `application.yaml` manifest like the following:

```
name: web_application
description: The web application of frontend deployment
componentType: Web Application
properties:
  dynatrace:
    id: SERVICE-80258F7AA97F2E4D
  prometheus:
    namespace: hipster-shop
    pod: frontend.*
    container: server

```

{% hint style="info" %}
Notice that the component includes properties that specify how the telemetry providers (Dynatrace and Prometheus) look up this service in the Kubernetes cluster.

These properties depend on the telemetry provider you are using. See the reference for the full list of supported providers and their configurations.
{% endhint %}

Then run:

```bash
akamas create component application.yaml frontend-2
```

Finally, create an `hpa.yaml` manifest like the following:

```
name: frontend_hpa
description: The HPA for the frontend
componentType: HPA
```

{% hint style="info" %}
The HPA component does not provide any metrics, so you do not need to specify any telemetry properties for it.
{% endhint %}

Then run:

```bash
akamas create component hpa.yaml frontend-2
```

### Workflow

To optimize a Kubernetes microservice in production, you need to create a workflow that defines how the new configuration recommended by Akamas will be deployed in production.

Let's explore the high-level tasks required in this scenario and the options you have to adapt it to your environment:

<details>

<summary>1) Update the Kubernetes deployment and HPA configurations</summary>

The first step is to update the Kubernetes deployment and HPA with the new configuration. This can be done in several ways depending on your environment and processes:

* A simple option is to let Akamas directly update the Kubernetes entities leveraging the Kubernetes APIs via `kubectl` commands.
* Another option is to follow an Infrastructure-as-code approach, where the configuration change is managed via pull requests to a Git repository, leveraging your pipelines to deploy the change in production.

In this guide, we take the first option and use the `kubectl patch` and `kubectl apply` commands to configure the new deployment and the HPA.

These commands are executed from the toolbox, an Akamas utility that can be enabled in an Akamas installation on Kubernetes. Make sure that `kubectl` is configured correctly to connect to your Kubernetes cluster and can update your target deployment. See [here](https://docs.akamas.io/akamas-docs/3.6/installing/management-container-pod) for more details.
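The deployment template used by the workflow (`ak-frontend.sh.templ`) is not included in this guide. As a minimal sketch, assuming CPU values in millicores and memory in MiB, a script could build a strategic merge patch for the *server* container and set the memory request equal to the memory limit, as recommended in the Parameters section. The `build_patch` function is hypothetical; inside the template, its arguments would presumably be Akamas parameter placeholders such as `${server.cpu_request}` (check the FileConfigurator reference for the exact syntax).

```bash
#!/bin/sh
# Hypothetical helper, not part of Akamas: prints a strategic merge patch
# that sets requests and limits of the "server" container.
# Assumed units: CPU in millicores, memory in MiB.
# Usage: build_patch CPU_REQUEST CPU_LIMIT MEMORY_LIMIT
build_patch() {
  cpu_request="$1"
  cpu_limit="$2"
  memory_limit="$3"
  # The memory request is set equal to the memory limit.
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"server","resources":{"requests":{"cpu":"%sm","memory":"%sMi"},"limits":{"cpu":"%sm","memory":"%sMi"}}}]}}}}\n' \
    "$cpu_request" "$memory_limit" "$cpu_limit" "$memory_limit"
}

# Print the patch for the baseline values used in the study.
build_patch 200 400 128
```

The *apply frontend* task passes the namespace and the deployment name as the script's first and second arguments, so the template could end with something like `kubectl patch deployment "$2" -n "$1" -p "$(build_patch ...)"`. `kubectl patch` uses a strategic merge patch by default, so the container is matched by its `name` and other containers in the pod are left unchanged.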

</details>

<details>

<summary>2) Wait for the new deployment to be rolled out in production</summary>

In a live optimization, Akamas needs to understand when the new deployment rollout is complete and whether it was completed successfully or not. This is key information for Akamas AI to observe and optimize your applications safely.

This task can be done in several ways depending on how you manage changes, as discussed in the previous task:

* A simple option is to use the `kubectl rollout` command to wait for the deployment rollout to complete. This is the approach used in this guide.
* Another option is to follow an Infrastructure-as-code approach, where a change is managed via pull requests to a Git repository, leveraging your pipelines to deploy in production. In this situation, the deployment process is executed externally and is not controlled by Akamas. Hence, the workflow task will periodically poll the Kubernetes deployment to recognize when the new deployment has landed in production.
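For the Infrastructure-as-code option, the polling can be sketched as a generic retry loop. The `poll_until` function below is a hypothetical helper: it runs a check command up to a maximum number of attempts, sleeping between attempts, and succeeds as soon as the check does.

```bash
#!/bin/sh
# Hypothetical helper: poll_until MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if it never succeeds.
poll_until() {
  max_attempts="$1"
  interval="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the last failed attempt.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

As an illustration, the check could compare the CPU request currently in the deployment spec with the value recommended by Akamas, for example via `kubectl get deployment frontend -n hipster-shop -o jsonpath='{.spec.template.spec.containers[?(@.name=="server")].resources.requests.cpu}'`, and then run `kubectl rollout status` once the change has landed.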

</details>

<details>

<summary>3) Wait for the appropriate time to start the experiment</summary>

When dealing with the HPA, it is important that each experiment observes a comparable timeframe, because the number of replicas follows the workload.

If the configuration change takes too long (e.g., because it requires a manual step), Akamas experiments will observe a different workload pattern (e.g., night instead of day traffic). This would make the analysis considerably more complex, especially for humans.

Although Akamas can handle different workload patterns, it is always better to run each experiment in the same time slot, so that each configuration is evaluated against a similar workload pattern.

In this example, we assume that we want to evaluate a new configuration every hour, so we insert a workflow task that waits for the end of the current hour.

The right evaluation interval depends on how long the configuration process of your application takes.
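The waiting step boils down to some clock arithmetic. Here is a minimal sketch using a hypothetical `seconds_to_wait` function that follows the 55-minute cutoff used in the workflow: before minute 55 it waits until the top of the next hour, otherwise it starts right away. Note that `date +%M` returns zero-padded values such as `08`, which shell arithmetic rejects as an invalid octal number, so the leading zero is stripped first.

```bash
#!/bin/sh
# Hypothetical helper: seconds_to_wait MINUTE SECOND
# MINUTE and SECOND may be zero-padded, as printed by `date +%M` and `date +%S`.
seconds_to_wait() {
  # Strip one leading zero: "08" -> "8", "00" -> "0" ($((08)) is invalid octal).
  minute="${1#0}"
  second="${2#0}"
  minute="${minute:-0}"
  second="${second:-0}"
  if [ "$minute" -lt 55 ]; then
    # Sleep until the top of the next hour.
    echo $((3600 - 60 * minute - second))
  else
    # Within the last 5 minutes of the hour: start immediately.
    echo 0
  fi
}

# Seconds to wait right now.
seconds_to_wait "$(date +%M)" "$(date +%S)"
```

A workflow task could then run `sleep "$(seconds_to_wait "$(date +%M)" "$(date +%S)")"`.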

</details>

<details>

<summary>4) Observe how the application behaves with the new configuration</summary>

In a live optimization, Akamas simply needs to wait for a given observation interval, while the application works in production with the new configuration. Telemetry metrics will be collected during this observation period and will be analyzed by Akamas AI to recommend the next configuration.

Since we decided to evaluate a configuration every hour, we use a 55-minute observation interval, leaving 5 minutes for the configuration process.
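The time budget of each experiment works out as follows. The 1-minute trims come from the `windowing` section of the study manifest, which discards the first and last minute of the observation task, and the 3300 seconds are the argument of the final Sleep task:

$$
\begin{aligned}
T_{\text{config}} + T_{\text{observe}} &= 5\ \text{min} + 55\ \text{min} = 60\ \text{min} \\
T_{\text{observe}} &= 55 \times 60\ \text{s} = 3300\ \text{s} \\
T_{\text{analyzed}} &= 55\ \text{min} - 1\ \text{min} - 1\ \text{min} = 53\ \text{min}
\end{aligned}
$$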

</details>

Let's now create a `workflow.yaml` manifest like the following:

```
name: frontend-11-delayedApproval-hpa-1hour-system2
tasks:
  - name: configure frontend
    operator: FileConfigurator
    arguments:
      source:
        hostname: toolbox
        username: akamas
        key: /home/stefano/tmp_ak_key
        path: /work/examples/hipstershop-hpa/hipstershop-2/ak-frontend.sh.templ
      target:
        hostname: toolbox
        username: akamas
        key: /home/stefano/tmp_ak_key
        path: /work/ak-frontend-2.sh

  - name: apply frontend
    operator: Executor
    arguments:
      timeout: 5m
      host:
        hostname: toolbox
        username: akamas
        key: /home/stefano/tmp_ak_key
      command: sh /work/ak-frontend-2.sh hipster-shop frontend

  - name: verify frontend
    operator: Executor
    arguments:
      timeout: 5m
      host:
        hostname: toolbox
        username: akamas
        key: /home/stefano/tmp_ak_key
      command: kubectl rollout status --timeout=5m deployment/frontend -n hipster-shop

  - name: configure hpa
    operator: FileConfigurator
    arguments:
      source:
        hostname: toolbox
        username: akamas
        key: /home/stefano/tmp_ak_key
        path: /work/examples/hipstershop-hpa/hipstershop-2/frontend-hpa-v2.yaml.templ
      target:
        hostname: toolbox
        username: akamas
        key: /home/stefano/tmp_ak_key
        path: /work/frontend-hpa-v2-2.yaml

  - name: apply hpa
    operator: Executor
    arguments:
      timeout: 5m
      host:
        hostname: toolbox
        username: akamas
        key: /home/stefano/tmp_ak_key
      command: kubectl apply -f /work/frontend-hpa-v2-2.yaml -n hipster-shop

  - name: check if we are in time or wait for start of next hour
    operator: Executor
    arguments:
      host:
        hostname: toolbox
        username: akamas
        key: /home/stefano/tmp_ak_key
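      # expr strips the leading zero of minutes like "08", which $((...)) would reject as invalid octal
      command: m=$(expr $(date +%M) + 0); if [ $m -lt 55 ]; then sleep $((60*(60 - m))); else sleep 0; fi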

  - name: observe 55 minutes
    operator: Sleep
    arguments:
      seconds: 3300
```

Then run:

```bash
akamas create workflow workflow.yaml
```
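The HPA template (`frontend-hpa-v2.yaml.templ`) is not shown in this guide either. Here is a minimal sketch, assuming the `autoscaling/v2` API (as the file name suggests) and hypothetical `minReplicas`/`maxReplicas` values; the study parameter `frontend_hpa.metrics_resource_target_averageUtilization` presumably maps to the `averageUtilization` field. The hypothetical `render_hpa` function prints the manifest for a given target.

```bash
#!/bin/sh
# Hypothetical helper: render_hpa TARGET_CPU_UTILIZATION
# Prints an autoscaling/v2 HPA manifest for the frontend deployment.
# minReplicas and maxReplicas are placeholder values: adjust them to your setup.
render_hpa() {
  cat <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: $1
EOF
}

render_hpa 60
```

The output could then be applied as the *apply hpa* task does, e.g. `render_hpa 60 > frontend-hpa.yaml && kubectl apply -f frontend-hpa.yaml -n hipster-shop`.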

### Telemetry

To collect metrics from your target Kubernetes deployment, create one telemetry instance per observability tool. In this guide, both Dynatrace and Prometheus are used.

Create a `dynatrace.yaml` manifest like the following:

```
provider: Dynatrace
config:
  url: <YOUR_DYNATRACE_URL>
  token: <YOUR_DYNATRACE_TOKEN>
  pushEvents: false
```

Then run:

```bash
akamas create telemetry-instance dynatrace.yaml frontend-2
```

Create a `prometheus.yaml` manifest like the following:

```
provider: Prometheus
config:
  address: prom-kube-prometheus-stack-prometheus.monitoring
  port: 9090
  duration: 60
  logLevel: DETAILED
metrics:
  - metric: cost
    datasourceMetric: 'sum(kube_pod_container_resource_requests{resource="cpu" %FILTERS%})*29 + sum(kube_pod_container_resource_requests{resource="memory" %FILTERS%})/1024/1024/1024*3.2'
```

Then run:

```bash
akamas create telemetry-instance prometheus.yaml frontend-2
```
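Before creating the telemetry instance, it can help to try the cost query directly in Prometheus. Akamas fills the `%FILTERS%` placeholder from the component properties; the label matchers used below (`namespace`, `pod`, `container`) are an assumption about that expansion, chosen to mirror the properties defined in this guide. The hypothetical `expand_filters` function performs the substitution so that the query can be pasted into the Prometheus UI.

```bash
#!/bin/sh
# Hypothetical helper: expand_filters QUERY MATCHERS
# Replaces every %FILTERS% placeholder in QUERY with MATCHERS.
# awk -v interprets backslash escapes and gsub treats "&" specially,
# so MATCHERS must not contain "\" or "&".
expand_filters() {
  awk -v q="$1" -v f="$2" 'BEGIN { gsub(/%FILTERS%/, f, q); print q }'
}

COST_QUERY='sum(kube_pod_container_resource_requests{resource="cpu" %FILTERS%})*29 + sum(kube_pod_container_resource_requests{resource="memory" %FILTERS%})/1024/1024/1024*3.2'
# Assumed matchers, mirroring the component properties used in this guide.
MATCHERS=', namespace="hipster-shop", pod=~"frontend.*", container="server"'

expand_filters "$COST_QUERY" "$MATCHERS"
```

The expanded query can also be sent to the Prometheus HTTP API, e.g. `curl -G http://<prometheus-host>:9090/api/v1/query --data-urlencode "query=$(expand_filters "$COST_QUERY" "$MATCHERS")"`.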

### Study

It's now time to create the Akamas study to achieve your optimization objectives.

Let's explore how the study is designed by going through the main concepts. The complete study manifest is available at the bottom.

<details>

<summary>Goal</summary>

Your overall objective is to reduce the cost (or resource footprint) of a Kubernetes deployment. To do that, you need to define the goal, which is a metric (or combination of metrics) representing the deployment cost to be minimized.

There are different approaches to measuring the cost of Kubernetes deployments:

* A simple approach is to consider that Kubernetes allocates infrastructure resources based on pod resource **requests** (CPU and memory). Hence, the cost of a deployment can be derived from the aggregate CPU and memory requests of its pods. In this guide, we use this approach and define the study goal as the weighted sum of the CPU and memory requests of the deployment's containers.
* Alternatively, the cost of a Kubernetes deployment can also be collected from external data sources that provide actual cost metrics like OpenCost. In this case, the study goal can be defined by leveraging the cost metric. See here for more information on how to integrate cost metrics.

Notice that weighting factors can be used in the cost formula to specify the relative importance of CPU and memory. For example, the cloud price of 1 CPU is roughly 9 times that of 1 GB of RAM, which matches the weights of 29 and 3.2 used in the Prometheus `cost` query (29 / 3.2 ≈ 9). You can customize these weights based on your requirements so that Akamas can find the most cost-efficient configuration in your specific context.
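To see how these weights translate into numbers, here is a minimal sketch that evaluates the same formula as the Prometheus `cost` query (29 per CPU core, 3.2 per GiB of memory) for a given amount of requested resources. The `cost` function name and the replica counts are illustrative only.

```bash
#!/bin/sh
# Hypothetical helper: cost CPU_CORES MEMORY_BYTES
# Same weights as the Prometheus "cost" metric: 29 per CPU core, 3.2 per GiB.
cost() {
  awk -v cpu="$1" -v mem="$2" 'BEGIN { printf "%.2f\n", cpu * 29 + mem / 1024 / 1024 / 1024 * 3.2 }'
}

# One replica with the baseline requests: 200m CPU and 128 MiB (134217728 bytes).
cost 0.2 134217728   # 5.80 (CPU) + 0.40 (memory) = 6.20
# Three replicas of the same pod: the query sums over all matching pods.
cost 0.6 402653184   # 18.60
```

Because the query sums over all pods, the cost grows with the number of replicas the HPA decides to run, not only with the per-container requests.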

</details>

<details>

<summary>Constraints</summary>

When optimizing for cost reduction (or resource footprint), it's key not to impact application response time or introduce risks of availability and reliability issues. To ensure this, you can define your performance and reliability requirements (SLOs) as metric constraints.

In this study:

* to ensure **application performance**, constraints are specified on application response times and error rate
* to ensure **application reliability**, constraints are specified on container peak CPU and memory utilization, and container out-of-memory kills
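As a worked example of how these constraints read, here is a minimal sketch that checks a set of observed values against the thresholds used in the study manifest: response time of at most 60 (presumably milliseconds), error rate of at most 2%, CPU utilization below 80%, and peak memory below 70% of the limit. With the baseline 128 MB memory limit, peak memory must therefore stay below 0.7 × 128 = 89.6 MB. The `meets_slos` function is hypothetical and only mirrors the formulas; Akamas evaluates the constraints itself.

```bash
#!/bin/sh
# Hypothetical helper: meets_slos RT_P90 ERROR_RATE_P90 CPU_UTIL_P90 MEMORY_USED_MAX MEMORY_LIMIT
# Mirrors the absolute constraints of the study; returns 0 if all of them hold.
meets_slos() {
  awk -v rt="$1" -v err="$2" -v cpu="$3" -v mem="$4" -v lim="$5" 'BEGIN {
    rt += 0; err += 0; cpu += 0; mem += 0; lim += 0   # force numeric comparisons
    ok = (rt <= 60) && (err <= 0.02) && (cpu < 0.8) && (mem / lim < 0.7)
    exit (ok ? 0 : 1)
  }'
}

meets_slos 45 0.01 0.65 85 128 && echo "85 MB peak: constraints met"
meets_slos 45 0.01 0.65 90 128 || echo "90 MB peak: memory constraint violated"
```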

</details>

<details>

<summary>Parameters</summary>

To achieve cost-efficient and reliable microservices, Kubernetes container resources and HPA scaling options must be configured optimally and tuned jointly, as they are heavily interconnected.

To do that, the study includes the following parameters:

* Kubernetes container: CPU and memory requests and limits
* HPA target CPU utilization
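These parameters are interconnected because the HPA measures CPU utilization relative to the container CPU **request**. The Kubernetes documentation gives the core scaling rule as desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). Here is a minimal sketch of that rule, ignoring the default 10% tolerance, the min/max replica bounds, and stabilization windows; `desired_replicas` is a hypothetical helper.

```bash
#!/bin/sh
# Hypothetical helper:
#   desired_replicas REPLICAS AVG_CPU_USAGE_MILLICORES CPU_REQUEST_MILLICORES TARGET_PERCENT
# Core HPA rule: ceil(replicas * currentUtilization / targetUtilization),
# with currentUtilization = usage / request.
# Tolerance, min/max bounds and stabilization are ignored.
desired_replicas() {
  replicas="$1"; usage="$2"; request="$3"; target="$4"
  num=$((replicas * usage * 100))
  den=$((request * target))
  # Integer ceiling of num / den.
  echo $(( (num + den - 1) / den ))
}

# 4 replicas using 150m each on average, target 60%:
desired_replicas 4 150 200 60   # request 200m -> utilization 75%  -> 5 replicas
desired_replicas 4 150 100 60   # request 100m -> utilization 150% -> 10 replicas
```

In this example, halving the CPU request doubles the measured utilization, so the HPA doubles the replicas: 5 × 200m and 10 × 100m both amount to 1000m of requested CPU. This is why requests and the HPA target have to be tuned together rather than one at a time.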

The study also includes parameter constraints to ensure that recommended configurations are safe and comply with best practices. In particular:

* CPU requests must be less than or equal to CPU limits.
* CPU limits must be at most twice the CPU requests, to avoid excessive over-commitment of CPU limits in the cluster.

Notice that the parameters and constraints may differ depending on your policies. For example, it is a best practice to set memory requests equal to limits to avoid pod eviction; hence, the study tunes only the memory limit, and the memory request is set to the same value in the deployment configuration.
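The CPU parameter constraints can be checked on a candidate configuration with simple integer comparisons. Here is a minimal sketch using the domains and constraints from the study manifest (CPU values presumably in millicores); `is_valid_cpu` is a hypothetical helper.

```bash
#!/bin/sh
# Hypothetical helper: is_valid_cpu CPU_REQUEST CPU_LIMIT
# Returns 0 if the pair satisfies the domains and parameter constraints of the study.
is_valid_cpu() {
  request="$1"; limit="$2"
  # Domains: both values in [10, 500].
  [ "$request" -ge 10 ] && [ "$request" -le 500 ] || return 1
  [ "$limit" -ge 10 ] && [ "$limit" -le 500 ] || return 1
  # Constraints: request <= limit <= 2 * request.
  [ "$request" -le "$limit" ] && [ "$limit" -le $((request * 2)) ]
}

is_valid_cpu 200 400 && echo "baseline (200, 400) is valid"
is_valid_cpu 200 450 || echo "(200, 450) violates the 2x limit constraint"
```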

</details>

<details>

<summary>Workload</summary>

Akamas live optimization takes the application workload into account to recommend new configurations that are optimal for the goal (e.g., reducing cost) while meeting all metric constraints (e.g., latency and error rates).

For Kubernetes microservices, the workload is typically the throughput (requests/sec) of the microservice API endpoints. This is the approach used in this guide.

</details>

<details>

<summary>Approval mode</summary>

In this live optimization, manual approval is set to false, meaning that as soon as a new configuration is generated, the workflow is executed without any human involvement.

You can set it to true so that Akamas asks for user approval whenever a new configuration is generated. Once you approve it, the workflow is executed and the new configuration is deployed to production according to the integration strategy defined above.

</details>

You can now create a `study.yaml` manifest like the following:

```
name: ak-frontend - live - system 2
system: frontend-2
workflow: frontend-11-delayedApproval-hpa-1hour-system2

goal:
  name: Cost
  objective: minimize
  function:
    formula: web_application.cost
  constraints:
    absolute:
      - name: Application response time degradation
        formula: web_application.requests_response_time_p50:p90 <= 60
      - name: Application error rate degradation
        formula: web_application.requests_error_rate:p90 <= 0.02
      - name: Container CPU saturation
        formula: server.container_cpu_util_max:p90 < 0.8
      - name: Container memory saturation
        formula: server.container_memory_used:max / server.container_memory_limit < 0.7

windowing:
  type: trim
  trim: [1m, 1m]
  task: observe 55 minutes

parametersSelection:
  - name: server.cpu_request
    domain: [10, 500]
  - name: server.cpu_limit
    domain: [10, 500]
  - name: server.memory_limit
    domain: [16, 640]
  - name: frontend_hpa.metrics_resource_target_averageUtilization
    domain: [10, 90]

parameterConstraints:
  - name: CPU request less or equal to limits
    formula: server.cpu_request <= server.cpu_limit
  - name: CPU limit within a given factor of request
    formula: server.cpu_limit <= server.cpu_request * 2

workloadsSelection:
  - name: web_application.requests_throughput:max
  - name: web_application.requests_throughput

numberOfTrials: 1
steps:
  - name: baseline
    type: baseline
    numberOfTrials: 3
    values:
      server.cpu_request: 200
      server.cpu_limit: 400
      server.memory_limit: 128
      frontend_hpa.metrics_resource_target_averageUtilization: 60
    renderParameters: [frontend_hpa.metrics_resource_target_averageUtilization]

  - name: optimize
    type: optimize
    numberOfExperiments: 300

```

Then run:

```bash
akamas create study study.yaml
```

You can now follow the live optimization progress and explore the results using the Akamas UI.
