Optimizing a Kubernetes application
In this example, we’ll optimize Online Boutique, a demo e-commerce application running on microservices, by tuning the resources allocated to a selection of pods. This is a common use case where we want to minimize the cost associated with running an application without impacting the SLO.
The test environment includes the following instances:
- Akamas: the instance running Akamas.
- Cluster: an instance hosting a Minikube cluster.
You can configure the Minikube cluster using the scripts provided in the public repository.

To gather metrics about the application we will use Prometheus. It will be automatically configured by applying the artifacts in the repository with the following command:

```shell
kubectl apply -f kubernetes-online-boutique/kube/prometheus.yaml
```
The targeted system is Online Boutique, a microservice-based demo application. In the same namespace, a deployment running the load generator will stress the boutique and forward the performance metrics to Prometheus.
To configure the application and the load generator on your Minikube cluster, apply the definitions provided in the public repository by running the following command:

```shell
kubectl apply -f kubernetes-online-boutique/kube/
```
In this section, we will guide you through the steps required to set up the optimization on Akamas.
If you have not installed the Kubernetes optimization pack yet, take a look at the Kubernetes optimization pack page to proceed with the installation.
Notice: the artifacts to create the Akamas entities can be found in the public repository, under the `akamas` directory.

Here's the definition of the system containing our components and telemetry instances for this example:

```yaml
name: Online Boutique
description: The Online Boutique by Google
```
To create the system run the following command:

```shell
akamas create system system.yaml
```
We'll use a component of type Web Application to represent the Online Boutique application at a high level. To identify the related Prometheus metrics, the configuration requires the `prometheus` property for the telemetry service, detailed later in this guide.

Here's the definition of the component:
```yaml
name: online_boutique
description: The Online Boutique application
componentType: Web Application
properties:
  prometheus:
    instance: .*
    job: .*
    namespace: akamas-demo
    container: server|redis
```
To create the component in the system run the following command:

```shell
akamas create component application.yaml 'Online Boutique'
```
The public repository contains the definition of all the services that compose Online Boutique. In this guide, for the sake of simplicity, we’ll only tune the resources of the containers in the frontend and the product-catalog pods, defined as components of type Kubernetes Container.
Here's their definition:

```yaml
name: frontend
description: The frontend of the online boutique by Google
componentType: Kubernetes Container
properties:
  prometheus:
    job: .*
    instance: .*
    name: .*
    pod: ak-frontend.*
    container: server
```

```yaml
name: productcatalogservice
description: The productcatalogservice of the online boutique by Google
componentType: Kubernetes Container
properties:
  prometheus:
    job: .*
    instance: .*
    name: .*
    pod: ak-productcatalogservice.*
    container: server
```
To create the components in the system run the following commands:

```shell
akamas create component frontend.yaml 'Online Boutique'
akamas create component productcatalogservice.yaml 'Online Boutique'
```
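The `prometheus` properties attached to these components (`pod`, `container`, and so on) are what scope the Prometheus queries to the right containers; they are rendered as label matchers and substituted into the query templates (the `%FILTERS%` placeholder that appears in the telemetry definition later in this guide). As a rough sketch of the idea — a hypothetical helper, not the actual Akamas implementation:

```python
def build_filters(props: dict) -> str:
    """Render component properties as PromQL regex label matchers."""
    return ", ".join(f'{label}=~"{pattern}"' for label, pattern in props.items())

# Properties taken from the frontend component above
props = {"pod": "ak-frontend.*", "container": "server"}
print(build_filters(props))  # pod=~"ak-frontend.*", container=~"server"
```

A query such as `kube_pod_container_resource_requests{resource="cpu" %FILTERS%}` would then only match the series belonging to the tuned containers.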
The workflow is divided into the following steps:
- Create the YAML artifacts with the updated resource limits for the tuned containers.
- Apply the updated definitions to the cluster.
- Wait for the rollout to complete.
- Start the load generator.
- Let the test run for a fixed amount of time.
- Stop the test and reset the load generator.
The following is the definition of the workflow:
```yaml
name: boutique
tasks:
  - name: Configure Online Boutique
    operator: FileConfigurator
    arguments:
      source:
        hostname: CLUSTER_INSTANCE_IP
        username: akamas
        password: akamas
        path: boutique.yaml.templ
      target:
        hostname: cluster
        username: akamas
        password: akamas
        path: boutique.yaml

  - name: Apply new configuration to the Online Boutique
    operator: Executor
    arguments:
      host:
        hostname: CLUSTER_INSTANCE_IP
        username: akamas
        password: akamas
      command: kubectl apply -f boutique.yaml

  - name: Check Online Boutique is up
    operator: Executor
    arguments:
      retries: 0
      host:
        hostname: CLUSTER_INSTANCE_IP
        username: akamas
        password: akamas
      command: kubectl wait --for=condition=available deploy/ak-frontend deploy/ak-productcatalogservice --timeout=30s

  - name: Start Locust Test
    operator: Executor
    arguments:
      host:
        hostname: CLUSTER_INSTANCE_IP
        username: akamas
        password: akamas
      command: bash load-test.sh start

  - name: Test
    operator: Sleep
    arguments:
      seconds: 150

  - name: Stop Locust test
    operator: Executor
    arguments:
      host:
        hostname: CLUSTER_INSTANCE_IP
        username: akamas
        password: akamas
      command: bash load-test.sh stop
```
To better illustrate the process, here is a snippet of the template file used to update the resource limits for the frontend deployment.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ak-frontend
  namespace: akamas-demo
spec:
  selector:
    matchLabels:
      app: ak-frontend
  template:
    metadata:
      labels:
        app: ak-frontend
      # other definitions...
    spec:
      containers:
        - name: server
          image: gcr.io/google-samples/microservices-demo/frontend:v0.2.2
          # other definitions...
          resources:
            requests:
              cpu: ${frontend.cpu_limit}
              memory: ${frontend.memory_limit}
            limits:
              cpu: ${frontend.cpu_limit}
              memory: ${frontend.memory_limit}
          # other definitions...
```
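The FileConfigurator task in the workflow resolves the `${component.parameter}` tokens in this template with the values Akamas selects for each experiment. As an illustrative sketch of the substitution (not the actual Akamas implementation; the parameter values shown are hypothetical):

```python
import re

def render_template(template: str, params: dict) -> str:
    """Replace ${component.parameter} tokens with the experiment's values."""
    return re.sub(r"\$\{([\w.]+)\}", lambda m: str(params[m.group(1)]), template)

template = """\
resources:
  requests:
    cpu: ${frontend.cpu_limit}
    memory: ${frontend.memory_limit}
"""

# Values Akamas could pick for one experiment (hypothetical)
params = {"frontend.cpu_limit": "250m", "frontend.memory_limit": "384Mi"}

print(render_template(template, params))
```

The rendered file is then a plain Kubernetes manifest that the next workflow task applies with `kubectl apply`.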
The following are, respectively, the complete template file and the script used to start and stop the load generator:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ak-frontend
  namespace: akamas-demo
spec:
  selector:
    matchLabels:
      app: ak-frontend
  replicas: 1
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: ak-frontend
    spec:
      serviceAccountName: default
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: akamas/node
                    operator: In
                    values:
                      - akamas
      containers:
        - name: server
          image: gcr.io/google-samples/microservices-demo/frontend:v0.2.2
          ports:
            - containerPort: 8080
          readinessProbe:
            initialDelaySeconds: 10
            httpGet:
              path: "/_healthz"
              port: 8080
              httpHeaders:
                - name: "Cookie"
                  value: "shop_session-id=x-readiness-probe"
          livenessProbe:
            initialDelaySeconds: 10
            httpGet:
              path: "/_healthz"
              port: 8080
              httpHeaders:
                - name: "Cookie"
                  value: "shop_session-id=x-liveness-probe"
          env:
            - name: PORT
              value: "8080"
            - name: PRODUCT_CATALOG_SERVICE_ADDR
              value: "ak-productcatalogservice:3550"
            - name: CURRENCY_SERVICE_ADDR
              value: "ak-currencyservice:7000"
            - name: CART_SERVICE_ADDR
              value: "ak-cartservice:7070"
            - name: RECOMMENDATION_SERVICE_ADDR
              value: "ak-recommendationservice:8080"
            - name: SHIPPING_SERVICE_ADDR
              value: "ak-shippingservice:50051"
            - name: CHECKOUT_SERVICE_ADDR
              value: "ak-checkoutservice:5050"
            - name: AD_SERVICE_ADDR
              value: "ak-adservice:9555"
            - name: ENV_PLATFORM
              value: "aws"
            - name: DISABLE_TRACING
              value: "1"
            - name: DISABLE_PROFILER
              value: "1"
          resources:
            requests:
              cpu: ${frontend.cpu_limit}
              memory: ${frontend.memory_limit}
            limits:
              cpu: ${frontend.cpu_limit}
              memory: ${frontend.memory_limit}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ak-productcatalogservice
  namespace: akamas-demo
spec:
  selector:
    matchLabels:
      app: ak-productcatalogservice
  replicas: 1
  template:
    metadata:
      labels:
        app: ak-productcatalogservice
    spec:
      serviceAccountName: default
      terminationGracePeriodSeconds: 5
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: akamas/node
                    operator: In
                    values:
                      - akamas
      containers:
        - name: server
          image: gcr.io/google-samples/microservices-demo/productcatalogservice:v0.2.2
          ports:
            - containerPort: 3550
          env:
            - name: PORT
              value: "3550"
            - name: DISABLE_STATS
              value: "1"
            - name: DISABLE_TRACING
              value: "1"
            - name: DISABLE_PROFILER
              value: "1"
          readinessProbe:
            exec:
              command: ["/bin/grpc_health_probe", "-addr=:3550"]
          livenessProbe:
            exec:
              command: ["/bin/grpc_health_probe", "-addr=:3550"]
          resources:
            requests:
              cpu: ${productcatalogservice.cpu_limit}
              memory: ${productcatalogservice.memory_limit}
            limits:
              cpu: ${productcatalogservice.cpu_limit}
              memory: ${productcatalogservice.memory_limit}
```
```bash
#!/bin/bash

ACTION=$1

LOCUST_ENDPOINT="$(minikube service -n akamas-demo ak-loadgenerator | awk '/web-ui.*http/ {print $8}')"

case $ACTION in
  start)
    curl -X POST -d 'user_count=100' -d 'spawn_rate=3' -d 'host=http://ak-frontend:80' "${LOCUST_ENDPOINT}/swarm"
    ;;
  stop)
    curl "${LOCUST_ENDPOINT}/stop"
    curl "${LOCUST_ENDPOINT}/stats/reset"
    ;;
  *)
    echo "Unrecognized option '${ACTION}'"
    exit 1
    ;;
esac
```
If you have not installed the Prometheus telemetry provider yet, take a look at the Prometheus provider page to proceed with the installation.
With the definition of the telemetry instance shown below, we import the end-user performance metrics provided by the load-generator, along with a custom definition of "cost" given by a weighted sum of the CPU and memory allocated for the pods in the cluster:
```yaml
provider: Prometheus
config:
  address: CLUSTER_IP
  port: PROM_PORT
metrics:
  - metric: users
    datasourceMetric: "locust_users"
  - metric: transactions_throughput
    datasourceMetric: 'rate(locust_requests_num_requests{name="Aggregated"}[30s]) - rate(locust_requests_num_failures{name="Aggregated"}[30s])'
  - metric: transactions_error_throughput
    datasourceMetric: 'rate(locust_requests_num_failures{name="Aggregated"}[30s])'
  - metric: transactions_error_rate
    datasourceMetric: "locust_requests_fail_ratio"
  - metric: transactions_response_time
    datasourceMetric: 'locust_requests_avg_response_time{name="Aggregated"}'
  - metric: transactions_response_time_p50
    datasourceMetric: 'locust_requests_current_response_time_percentile_50'
  - metric: transactions_response_time_p95
    datasourceMetric: 'locust_requests_current_response_time_percentile_95'

  - metric: cost
    datasourceMetric: 'sum(kube_pod_container_resource_requests{resource="cpu" %FILTERS%})*29 + sum(kube_pod_container_resource_requests{resource="memory" %FILTERS%})/1024/1024/1024*3.2'
```
To create the telemetry instance execute the following command:

```shell
akamas create telemetry-instance prometheus.yml 'Online Boutique'
```
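As a sanity check of the cost metric, note that it prices each CPU core at 29 cost units and each GiB of requested memory at 3.2. Applied by hand to a configuration with two containers at 300 millicores and 256 MiB each (the baseline used later in the study), the weighted sum works out as follows:

```python
CPU_PRICE = 29   # cost units per CPU core, from the PromQL expression above
MEM_PRICE = 3.2  # cost units per GiB of requested memory

def cost(cpu_millicores: float, memory_mib: float) -> float:
    """Weighted sum matching the PromQL cost metric for one container."""
    return (cpu_millicores / 1000) * CPU_PRICE + (memory_mib / 1024) * MEM_PRICE

# Baseline: frontend and productcatalogservice both at 300m CPU / 256Mi memory
baseline = cost(300, 256) + cost(300, 256)
print(baseline)  # 19.0
```

Any configuration the optimizer proposes with smaller requests lowers this number, which is exactly what the study's goal minimizes.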
With this study, we want to minimize the "cost" of running the application which, according to the definition described in the previous section, means reducing the resources allocated to the tuned pods in the cluster. At the same time, we want the application to stay within the expected SLO: this is achieved by defining constraints on the response time and error rate recorded by the load generator.
```yaml
name: Minimize Kubernetes Online Boutique cost while matching SLOs
system: Online Boutique
workflow: boutique

goal:
  objective: minimize
  constraints:
    absolute:
      - name: response_time
        formula: online_boutique.transactions_response_time <= 500
      - name: error_rate
        formula: online_boutique.transactions_error_rate <= 0.02
  function:
    formula: online_boutique.cost

windowing:
  type: trim
  trim: [1m, 30s]
  task: Test

metricsSelection:
  - online_boutique.cost
  - online_boutique.transactions_throughput
  - online_boutique.transactions_error_rate
  - online_boutique.transactions_response_time
  - online_boutique.transactions_response_time_p95
  - online_boutique.users
  - frontend.container_cpu_used
  - frontend.container_cpu_util
  - frontend.container_cpu_limit
  - frontend.container_cpu_throttle_time
  - frontend.container_memory_used
  - frontend.container_memory_util
  - frontend.container_memory_limit
  - productcatalogservice.container_cpu_used
  - productcatalogservice.container_cpu_util
  - productcatalogservice.container_cpu_limit
  - productcatalogservice.container_cpu_throttle_time
  - productcatalogservice.container_memory_used
  - productcatalogservice.container_memory_util
  - productcatalogservice.container_memory_limit

parametersSelection:
  - name: frontend.cpu_limit
    domain: [100, 300]
  - name: frontend.memory_limit
    domain: [64, 512]
  - name: productcatalogservice.cpu_limit
    domain: [100, 500]
  - name: productcatalogservice.memory_limit
    domain: [64, 512]

steps:
  - name: baseline
    type: baseline
    values:
      frontend.cpu_limit: 300
      frontend.memory_limit: 256
      productcatalogservice.cpu_limit: 300
      productcatalogservice.memory_limit: 256

  - name: optimize
    type: optimize
    numberOfExperiments: 50
```
To create and run the study execute the following commands:

```shell
akamas create study study.yaml
akamas start study 'Minimize Kubernetes Online Boutique cost while matching SLOs'
```
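A note on the study's windowing: with `type: trim` and `trim: [1m, 30s]`, the first 60 seconds (warm-up) and the last 30 seconds of the 150-second Test task are discarded, so only the steady-state middle of the run is scored. A minimal sketch of the idea, assuming a hypothetical per-10-second sampling of some metric:

```python
def trim_window(samples, trim_start_s=60, trim_end_s=30):
    """Keep only samples inside [trim_start, duration - trim_end].

    samples: list of (timestamp_seconds, value) tuples covering the test,
    with timestamps relative to the start of the Test task.
    """
    duration = samples[-1][0]
    return [(t, v) for t, v in samples if trim_start_s <= t <= duration - trim_end_s]

# A 150s test sampled every 10s
samples = [(t, 100 + t) for t in range(0, 151, 10)]
window = trim_window(samples)
print(window[0][0], window[-1][0])  # 60 120
```

Trimming this way keeps the rollout, Locust ramp-up (`spawn_rate=3` takes time to reach 100 users), and tear-down out of the cost and response-time scores.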