In this guide, you optimize the cost (or resource footprint) of a Kubernetes deployment whose number of replicas is controlled by the Horizontal Pod Autoscaler (HPA). The study tunes both the pod resource settings (CPU and memory requests and limits) and the HPA options (the target CPU utilization) at the same time, while also taking into account your application performance and reliability requirements (SLOs). The optimization happens in production, leveraging Akamas live optimization capabilities.
To run this optimization, you need:
an Akamas instance
a Kubernetes cluster, with a deployment to be optimized
a Horizontal Pod Autoscaler working on the desired deployment
a supported telemetry data source configured to collect metrics from the target Kubernetes cluster (see the Akamas documentation for the full list of supported providers)
a way to apply the configuration changes recommended by Akamas to the target deployment and HPA. In this guide, Akamas interacts directly with the Kubernetes APIs via kubectl, so you need a service account with permissions to update your deployment and HPA (see below for other integration options); a sample RBAC setup is sketched right after this list.
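The following is a minimal sketch of the RBAC objects such a service account might need, assuming the hipster-shop namespace used in this guide; the akamas-optimizer names are illustrative, and the rules should be adapted to your deployment process and cluster policies.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: akamas-optimizer      # illustrative name
  namespace: hipster-shop
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: akamas-optimizer
  namespace: hipster-shop
rules:
  # read and update the deployment (pod resource requests and limits)
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "patch", "update"]
  # read and update the HPA (target CPU utilization)
  - apiGroups: ["autoscaling"]
    resources: ["horizontalpodautoscalers"]
    verbs: ["get", "list", "watch", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: akamas-optimizer
  namespace: hipster-shop
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: akamas-optimizer
subjects:
  - kind: ServiceAccount
    name: akamas-optimizer
    namespace: hipster-shop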
In this guide, we assume the following setup:
the Kubernetes deployment to be optimized is called frontend (in the hipster-shop namespace)
in the deployment, there is a container named server, where the app runs
the HPA is called frontend-hpa
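You can quickly verify this setup with kubectl (adjust the names if your environment differs):

kubectl get deployment frontend -n hipster-shop
kubectl get hpa frontend-hpa -n hipster-shop
# the output of the following command should include the "server" container
kubectl get deployment frontend -n hipster-shop -o jsonpath='{.spec.template.spec.containers[*].name}'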
Let's set up the Akamas optimization for this use case.
For this optimization, you need the following components to model the frontend tech stack:
The Kubernetes Workload, Container, and Pod components, which contain metrics (such as the CPU usage of the different objects) and parameters to be tuned (such as CPU limits at the container level), from the Kubernetes optimization pack
An HPA component, which contains HPA parameters like the target CPU utilization
A Web Application component, which contains service-level metrics like the throughput and response time of the microservice (from the Web Application optimization pack)
Let's start by creating the system, which represents the Kubernetes deployment to be optimized. To create it, write a system.yaml manifest and create the system with the akamas create system command. This manifest, like all the manifests and commands used in this guide, is listed at the bottom of the page.
Now create the three Kubernetes components from a workload.yaml, a container.yaml, and a pod.yaml manifest, then the Web Application component from an application.yaml manifest, and finally the HPA component from an hpa.yaml manifest, creating each of them with the akamas create component command.
To optimize a Kubernetes microservice in production, you need to create a workflow that defines how the new configuration recommended by Akamas will be deployed in production.
Let's explore the high-level tasks required in this scenario and the options you have to adapt them to your environment: the workflow applies the new pod resource settings to the deployment, verifies that the rollout completes, applies the new HPA configuration, and then observes the application under live production traffic for the duration of the observation window (about one hour per experiment in this example). The options for applying the configuration in your environment are discussed further below.
Let's now create the workflow from a workflow.yaml manifest with the akamas create workflow command.
To collect the metrics of your target Kubernetes deployment, create one or more telemetry instances based on your observability setup. In this guide, Dynatrace provides the application metrics (throughput and response time), while Prometheus provides the Kubernetes metrics and the cost metric.
Create a dynatrace.yaml manifest and a prometheus.yaml manifest, then create the corresponding telemetry instances with the akamas create telemetry-instance command.
It's now time to create the Akamas study to achieve your optimization objectives.
Let's explore how the study is designed by going through the main concepts. The complete study manifest is available at the bottom.
Now create the study from a study.yaml manifest with the akamas create study command.
You can now follow the live optimization progress and explore the results using the Akamas UI.
In this guide, we take the first option and use the kubectl patch and kubectl apply commands to configure the new deployment and the HPA.
These commands are executed from the toolbox, an Akamas utility that can be enabled in an Akamas installation on Kubernetes. Make sure that kubectl is configured correctly to connect to your Kubernetes cluster and can update your target deployment. See the Akamas documentation on the toolbox for more details.
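The template script rendered by the workflow (ak-frontend.sh.templ) is not shown in this guide. As a minimal sketch, assuming Akamas substitutes the ${...} placeholders with the recommended values when rendering the template (check the template syntax and parameter units of your Akamas version), it could look like this:

#!/bin/sh
# Hypothetical sketch of ak-frontend.sh.templ: patch the resources of the
# "server" container in the target deployment. The ${...} placeholders are
# assumed to be replaced by Akamas before the rendered script is executed.
NAMESPACE=$1
DEPLOYMENT=$2

# Default (strategic merge) patch: containers are merged by name, so only the
# "server" container resources are changed. The memory request is set equal to
# the tuned memory limit, as discussed later in this guide.
kubectl patch deployment "$DEPLOYMENT" -n "$NAMESPACE" --patch "
spec:
  template:
    spec:
      containers:
        - name: server
          resources:
            requests:
              cpu: ${server.cpu_request}m
              memory: ${server.memory_limit}Mi
            limits:
              cpu: ${server.cpu_limit}m
              memory: ${server.memory_limit}Mi
"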
Another option is to follow an Infrastructure-as-code approach, where a change is managed via pull requests to a Git repository, leveraging your pipelines to deploy in production. In this situation, the deployment process is executed externally and is not controlled by Akamas. Hence, the workflow task will periodically poll the Kubernetes deployment to recognize when the new deployment has landed in production.
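In that case, the polling task could be as simple as a loop run from the toolbox; here is a sketch, assuming the tuned CPU request is enough to recognize the new configuration and that ${server.cpu_request} is rendered by Akamas as in the previous example:

# Hypothetical polling loop: wait until the live deployment exposes the CPU
# request recommended for this experiment, then let the workflow continue.
until [ "$(kubectl get deployment frontend -n hipster-shop \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="server")].resources.requests.cpu}')" = "${server.cpu_request}m" ]; do
  echo "waiting for the new configuration to reach production..."
  sleep 60
done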
Alternatively, the cost of a Kubernetes deployment can also be collected from external data sources that provide actual cost metrics, like OpenCost. In this case, the study goal can be defined by leveraging the cost metric. See the Akamas documentation for more information on how to integrate cost metrics.
Notice that weighting factors can be used in the goal formula to specify the importance of CPU vs memory resources. For example, the cloud price of 1 CPU is about 9 times that of 1 GB of RAM. You can customize those weights based on your requirements so that Akamas knows how to truly reach the most cost-efficient configuration in your specific context.
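As a concrete example, the cost metric defined in the Prometheus telemetry instance at the bottom of this guide applies weights with roughly that 9:1 ratio, which you can read as unit prices for CPU and memory requests:

cost = (total CPU requests, in cores) * 29 + (total memory requests, in GB) * 3.2

where 29 / 3.2 ≈ 9.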
CPU limits must be at most 2x CPU requests, to avoid excessive over-commitment of CPU limits in the cluster.
Notice that the parameters and constraints may change depending on your policies. For example, it is a best practice to set memory requests equal to memory limits to avoid pod eviction; hence, in this study we only tune the memory limit and set the memory request to the same value in the deployment template.
For reference, here are the complete manifests and commands used in this guide.

system.yaml:

name: frontend
description: The frontend Kubernetes deployment

akamas create system system.yaml

workload.yaml:

name: workload_frontend
description: The frontend Kubernetes workload
componentType: Kubernetes Workload
properties:
  prometheus:
    namespace: hipster-shop
    deployment: frontend

container.yaml:

name: server
description: The server Kubernetes container
componentType: Kubernetes Container
properties:
  prometheus:
    namespace: hipster-shop
    pod: frontend.*
    container: server

pod.yaml:

name: pod_frontend
description: The frontend Kubernetes pod
componentType: Kubernetes Pod
properties:
  prometheus:
    namespace: hipster-shop
    pod: frontend.*

akamas create component workload.yaml frontend
akamas create component container.yaml frontend
akamas create component pod.yaml frontend

application.yaml:

name: web_application
description: The web application of the frontend deployment
componentType: Web Application
properties:
  dynatrace:
    id: SERVICE-80258F7AA97F2E4D
  prometheus:
    namespace: hipster-shop
    pod: frontend.*
    container: server

akamas create component application.yaml frontend

hpa.yaml:

name: frontend_hpa
description: The HPA for the frontend
componentType: HPA

akamas create component hpa.yaml frontend

workflow.yaml:

name: frontend-11-delayedApproval-hpa-1hour-system2
tasks:
  - name: configure frontend
    operator: FileConfigurator
    arguments:
      source:
        hostname: toolbox
        username: akamas
        key: /home/stefano/tmp_ak_key
        path: /work/examples/hipstershop-hpa/hipstershop-2/ak-frontend.sh.templ
      target:
        hostname: toolbox
        username: akamas
        key: /home/stefano/tmp_ak_key
        path: /work/ak-frontend-2.sh
  - name: apply frontend
    operator: Executor
    arguments:
      timeout: 5m
      host:
        hostname: toolbox
        username: akamas
        key: /home/stefano/tmp_ak_key
      command: sh /work/ak-frontend-2.sh hipster-shop frontend
  - name: verify frontend
    operator: Executor
    arguments:
      timeout: 5m
      host:
        hostname: toolbox
        username: akamas
        key: /home/stefano/tmp_ak_key
      command: kubectl rollout status --timeout=5m deployment/frontend -n hipster-shop
  - name: configure hpa
    operator: FileConfigurator
    arguments:
      source:
        hostname: toolbox
        username: akamas
        key: /home/stefano/tmp_ak_key
        path: /work/examples/hipstershop-hpa/hipstershop-2/frontend-hpa-v2.yaml.templ
      target:
        hostname: toolbox
        username: akamas
        key: /home/stefano/tmp_ak_key
        path: /work/frontend-hpa-v2-2.yaml
  - name: apply hpa
    operator: Executor
    arguments:
      timeout: 5m
      host:
        hostname: toolbox
        username: akamas
        key: /home/stefano/tmp_ak_key
      command: kubectl apply -f /work/frontend-hpa-v2-2.yaml -n hipster-shop
  - name: check if we are in time or wait for start of next hour
    operator: Executor
    arguments:
      host:
        hostname: toolbox
        username: akamas
        key: /home/stefano/tmp_ak_key
      command: if [ $(date +%M) -lt 55 ]; then sleep $((60*(60 - $(date +%M)))); else sleep 0; fi
  - name: observe 55 minutes
    operator: Sleep
    arguments:
      seconds: 3300

akamas create workflow workflow.yaml

dynatrace.yaml:

provider: Dynatrace
config:
  url: <YOUR_DYNATRACE_URL>
  token: <YOUR_DYNATRACE_TOKEN>
  pushEvents: false

akamas create telemetry-instance dynatrace.yaml frontend

prometheus.yaml:

provider: Prometheus
config:
  address: prom-kube-prometheus-stack-prometheus.monitoring
  port: 9090
  duration: 60
  logLevel: DETAILED
metrics:
  - metric: cost
    datasourceMetric: 'sum(kube_pod_container_resource_requests{resource="cpu" %FILTERS%})*29 + sum(kube_pod_container_resource_requests{resource="memory" %FILTERS%})/1024/1024/1024*3.2'

akamas create telemetry-instance prometheus.yaml frontend

study.yaml:

name: ak-frontend - live - system 2
system: frontend
workflow: frontend-11-delayedApproval-hpa-1hour-system2
goal:
  name: Cost
  objective: minimize
  function:
    formula: web_application.cost
  constraints:
    absolute:
      - name: Application response time degradation
        formula: web_application.requests_response_time_p50:p90 <= 60
      - name: Application error rate degradation
        formula: web_application.requests_error_rate:p90 <= 0.02
      - name: Container CPU saturation
        formula: server.container_cpu_util_max:p90 < 0.8
      - name: Container memory saturation
        formula: server.container_memory_used:max / server.container_memory_limit < 0.7
windowing:
  type: trim
  trim: [1m, 1m]
  task: observe 55 minutes
parametersSelection:
  - name: server.cpu_request
    domain: [10, 500]
  - name: server.cpu_limit
    domain: [10, 500]
  - name: server.memory_limit
    domain: [16, 640]
  - name: frontend_hpa.metrics_resource_target_averageUtilization
    domain: [10, 90]
parameterConstraints:
  - name: CPU request less or equal to limits
    formula: server.cpu_request <= server.cpu_limit
  - name: CPU limit within a given factor of request
    formula: server.cpu_limit <= server.cpu_request * 2
workloadsSelection:
  - name: web_application.requests_throughput:max
  - name: web_application.requests_throughput
numberOfTrials: 1
steps:
  - name: baseline
    type: baseline
    numberOfTrials: 3
    values:
      server.cpu_request: 200
      server.cpu_limit: 400
      server.memory_limit: 128
      frontend_hpa.metrics_resource_target_averageUtilization: 60
    renderParameters: [frontend_hpa.metrics_resource_target_averageUtilization]
  - name: optimize
    type: optimize
    numberOfExperiments: 300

akamas create study study.yaml