Optimizing cost of a Kubernetes application while preserving SLOs in production
In this example, you will use Akamas live optimization to minimize the cost of a Kubernetes deployment, while preserving application performance and reliability requirements.
To follow this example, you need:
- an Akamas instance
- a Kubernetes cluster, with a deployment to be optimized
- the `kubectl` command installed on the Akamas instance, configured to access the target Kubernetes cluster and with privileges to get and update the deployment configurations
- a supported telemetry data source (e.g., Prometheus or Dynatrace) configured to collect metrics from the target Kubernetes cluster
This example leverages the Kubernetes and Web Application optimization packs.
The system represents the Kubernetes deployment to be optimized (let's call it "frontend"). You can create a `system.yaml` manifest like this:

```yaml
name: frontend
description: Kubernetes frontend deployment
```
Create the new system resource:

```shell
akamas create system system.yaml
```
The system will then have two components:
- A Kubernetes container component, which contains container-level metrics like CPU usage and parameters like CPU limits
- A Web Application component, which contains service-level metrics like throughput and response time
In this example, we assume the deployment to be optimized is called frontend, with a container named server, and is located within the boutique namespace. We also assume that Dynatrace is used as a telemetry provider.
Create a `component-container.yaml` manifest like the following:

```yaml
name: container
description: Kubernetes container, part of the frontend deployment
componentType: Kubernetes Container
properties:
  dynatrace:
    type: CONTAINER_GROUP_INSTANCE
  kubernetes:
    namespace: boutique
    containerName: server
    basePodName: frontend-*
```
Then run:

```shell
akamas create component component-container.yaml frontend
```
Now create a `component-webapp.yaml` manifest like the following:

```yaml
name: webapp
description: The service related to the frontend deployment
componentType: Web Application
properties:
  dynatrace:
    id: <TELEMETRY_DYNATRACE_WEBAPP_ID>
```

Then run:

```shell
akamas create component component-webapp.yaml frontend
```
The workflow in this example is composed of four main steps:

1. Update the Kubernetes deployment manifest with the Akamas-recommended deployment parameters (CPU and memory limits)
2. Apply the new parameters (`kubectl apply`)
3. Wait for the rollout to complete
4. Sleep for 30 minutes (observation interval)
Create a `workflow.yaml` manifest like the following:

```yaml
name: frontend
tasks:
  - name: configure
    operator: FileConfigurator
    arguments:
      source:
        hostname: mymachine
        username: user
        key: /home/user/.ssh/key
        path: frontend.yaml.templ
      target:
        hostname: mymachine
        username: user
        key: /home/user/.ssh/key
        path: frontend.yaml

  - name: apply
    operator: Executor
    arguments:
      timeout: 5m
      host:
        hostname: mymachine
        username: user
        key: /home/user/.ssh/key
      command: kubectl apply -f frontend.yaml

  - name: verify
    operator: Executor
    arguments:
      timeout: 5m
      host:
        hostname: mymachine
        username: user
        key: /home/user/.ssh/key
      command: kubectl rollout status --timeout=5m deployment/frontend -n boutique

  - name: observe
    operator: Sleep
    arguments:
      seconds: 1800
```
Then run:

```shell
akamas create workflow workflow.yaml
```
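The `configure` task reads a `frontend.yaml.templ` deployment template in which the tunable parameters appear as tokens that the FileConfigurator substitutes with the recommended values. The template itself is not shown in this example; the following is a hypothetical sketch, assuming the `${component.parameter}` token syntax and the `boutique`/`server` names introduced above, with millicore and MiB units matching the parameter domains:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: boutique
spec:
  template:
    spec:
      containers:
        - name: server
          resources:
            # Tokens below are replaced by the FileConfigurator with the
            # values recommended by Akamas; requests are kept equal to
            # limits, as assumed by the cost model in this example
            requests:
              cpu: ${container.cpu_limit}m
              memory: ${container.memory_limit}Mi
            limits:
              cpu: ${container.cpu_limit}m
              memory: ${container.memory_limit}Mi
```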
Create a `telemetry.yaml` manifest like the following:

```yaml
provider: Dynatrace
config:
  url: <YOUR_DYNATRACE_URL>
  token: <YOUR_DYNATRACE_TOKEN>
  pushEvents: false
```

Then run:

```shell
akamas create telemetry-instance telemetry.yaml frontend
```
In this live optimization:
- the goal is to reduce the cost of the Kubernetes deployment; in this example, the cost is based on the CPU and memory limits (assuming requests = limits)
- the approval mode is set to manual, and a new recommendation is generated daily
- to avoid impacting application performance, constraints are specified on desired response times and error rates
- to avoid impacting application reliability, constraints are specified on peak resource usage and out-of-memory kills
- the parameters to be tuned are the container CPU and memory limits (we assume requests = limits in the deployment file)
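To make the cost model concrete: the goal formula prices each CPU core at 3 cost units and each GiB of memory at 1 cost unit. A minimal sketch of the same arithmetic (the 3:1 price ratio comes from the goal formula; the metric values are assumed to be in millicores and bytes, as in that formula):

```python
def deployment_cost(cpu_limit_millicores: float, memory_limit_bytes: float) -> float:
    """Cost model from the study goal:
    (container_cpu_limit/1000) * 3 + container_memory_limit/(1024*1024*1024)
    i.e., 3 units per CPU core plus 1 unit per GiB of memory.
    """
    cpu_cores = cpu_limit_millicores / 1000
    memory_gib = memory_limit_bytes / (1024 ** 3)
    return cpu_cores * 3 + memory_gib

# Baseline configuration: 1000 millicores of CPU, 1536 MiB of memory
baseline = deployment_cost(1000, 1536 * 1024 * 1024)
print(baseline)  # 4.5 (3 units for 1 core + 1.5 units for 1.5 GiB)

# A candidate at the lower end of the tuning domains below
candidate = deployment_cost(300, 800 * 1024 * 1024)
print(candidate)  # 1.68125, roughly a 63% cost reduction vs. baseline
```

The optimizer searches within the parameter domains for the cheapest configuration that still satisfies all the constraints.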
Create a `study.yaml` manifest like the following:

```yaml
name: frontend
system: frontend
workflow: frontend
requireApproval: true

goal:
  objective: minimize
  function:
    formula: (((container.container_cpu_limit/1000) * 3) + (container.container_memory_limit/(1024*1024*1024)))
  constraints:
    absolute:
      - name: Response Time
        formula: webapp.requests_response_time <= 300
      - name: Error Rate
        formula: webapp.service_error_rate:max <= 0.05
      - name: Container CPU saturation
        formula: container.container_cpu_util:p95 < 0.8
      - name: Container memory saturation
        formula: container.container_memory_util:max < 0.7
      - name: Container out-of-memory kills
        formula: container.container_oom_kills_count == 0

parametersSelection:
  - name: container.cpu_limit
    domain: [300, 1000]
  - name: container.memory_limit
    domain: [800, 1536]

windowing:
  type: trim
  trim: [5m, 0m]
  task: observe

workloadsSelection:
  - name: webapp.requests_throughput

steps:
  - name: baseline
    type: baseline
    numberOfTrials: 48
    values:
      container.cpu_limit: 1000
      container.memory_limit: 1536

  - name: optimize
    type: optimize
    numberOfTrials: 48
    numberOfExperiments: 100
    numberOfInitExperiments: 0
    maxFailedExperiments: 50
```
Then run:

```shell
akamas create study study.yaml
```
You can now follow the live optimization progress and explore the results using the Akamas UI for Live optimizations.