Optimizing a Kubernetes application
In this example, we’ll optimize Online Boutique, a microservices-based demo e-commerce application, by tuning the resources allocated to a selection of its pods. This is a common use case: minimizing the cost of running an application without impacting its SLOs.
Notice: all the required artifacts are published in this public repository.
Environment setup
The test environment includes the following instances:
Akamas: the instance running Akamas.
Cluster: an instance hosting a Minikube cluster.
You can configure the Minikube cluster using the scripts provided in the public repository.
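As a minimal sketch (assuming Minikube and kubectl are already installed on the Cluster instance, and with illustrative resource sizes), you can bring up the cluster and the akamas-demo namespace used throughout this guide:
# Start a local Kubernetes cluster (sizes are illustrative)
minikube start --cpus 4 --memory 8192
# Create the target namespace, if the repository manifests do not already define it
kubectl create namespace akamas-demo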
Telemetry Infrastructure setup
To gather metrics about the application we will use Prometheus. It will be automatically configured by applying the artifacts in the repository with the following command:
kubectl apply -f kubernetes-online-boutique/kube/prometheus.yaml
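Before proceeding, you can quickly verify that the Prometheus pod is running (the exact namespace depends on how the repository manifests are organized):
kubectl get pods --all-namespaces | grep -i prometheus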
Application and Test tool
The targeted system is Online Boutique, a microservice-based demo application. In the same namespace, a deployment running the load generator will stress the boutique and forward the performance metrics to Prometheus.
To configure the application and the load generator on your (Minikube) cluster, apply the definitions provided in the public repository by running the following command:
kubectl apply -f kubernetes-online-boutique/kube/
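Once applied, you can check that the application pods come up correctly; the deployment names below are the ones defined in the repository manifests and referenced later in the workflow:
kubectl -n akamas-demo get pods
kubectl -n akamas-demo wait --for=condition=available deploy/ak-frontend deploy/ak-productcatalogservice --timeout=120s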
Optimization setup
In this section, we will guide you through the steps required to set up the optimization on Akamas.
If you have not installed the Kubernetes optimization pack yet, take a look at the Kubernetes optimization pack page to proceed with the installation.
Notice: the artifacts to create the Akamas entities can be found in the public repository, under the akamas directory.
System
System Online Boutique
Here’s the definition of the system containing the components and telemetry instances for this example:
name: Online Boutique
description: The Online Boutique by Google
To create the system run the following command:
akamas create system system.yaml
Component online_boutique
We’ll use a component of type WebApplication to represent the Online Boutique application at a high level. To map the related Prometheus metrics, the component configuration includes the prometheus property, which is used by the telemetry service detailed later in this guide.
Here’s the definition of the component:
name: online_boutique
description: The Online Boutique application
componentType: WebApplication
properties:
  prometheus:
    instance: .*
    job: .*
    namespace: akamas-demo
    container: server|redis
To create the component in the system run the following command:
akamas create component application.yaml 'Online Boutique'
Components frontend and productcatalogservice
The public repository contains the definition of all the services that compose Online Boutique. In this guide, for the sake of simplicity, we’ll only tune the resources of the containers in the frontend and productcatalogservice pods, defined as components of type Kubernetes Container.
Here’s their definition:
name: frontend
description: The frontend of the online boutique by Google
componentType: Kubernetes Container
properties:
  prometheus:
    job: .*
    instance: .*
    name: .*
    pod: ak-frontend.*
    container: server

name: productcatalogservice
description: The productcatalogservice of the online boutique by Google
componentType: Kubernetes Container
properties:
  prometheus:
    job: .*
    instance: .*
    name: .*
    pod: ak-productcatalogservice.*
    container: server
To create the components in the system run the following commands:
akamas create component frontend.yaml 'Online Boutique'
akamas create component productcatalogservice.yaml 'Online Boutique'
Workflow
The workflow is divided into the following steps:
Create the YAML artifacts with the updated resource limits for the tuned containers.
Apply the updated definitions to the cluster.
Wait for the rollout to complete.
Start the load generator.
Let the test run for a fixed amount of time.
Stop the test and reset the load generator.
The following is the definition of the workflow:
name: boutique
tasks:
  - name: Configure Online Boutique
    operator: FileConfigurator
    arguments:
      source:
        hostname: CLUSTER_INSTANCE_IP
        username: akamas
        password: akamas
        path: boutique.yaml.templ
      target:
        hostname: CLUSTER_INSTANCE_IP
        username: akamas
        password: akamas
        path: boutique.yaml
  - name: Apply new configuration to the Online Boutique
    operator: Executor
    arguments:
      host:
        hostname: CLUSTER_INSTANCE_IP
        username: akamas
        password: akamas
      command: kubectl apply -f boutique.yaml
  - name: Check Online Boutique is up
    operator: Executor
    arguments:
      retries: 0
      host:
        hostname: CLUSTER_INSTANCE_IP
        username: akamas
        password: akamas
      command: kubectl wait --for=condition=available deploy/ak-frontend deploy/ak-productcatalogservice --timeout=30s
  - name: Start Locust Test
    operator: Executor
    arguments:
      host:
        hostname: CLUSTER_INSTANCE_IP
        username: akamas
        password: akamas
      command: bash load-test.sh start
  - name: Test
    operator: Sleep
    arguments:
      seconds: 150
  - name: Stop Locust test
    operator: Executor
    arguments:
      host:
        hostname: CLUSTER_INSTANCE_IP
        username: akamas
        password: akamas
      command: bash load-test.sh stop
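To create the workflow run the following command (assuming the definition above is saved as workflow.yaml):
akamas create workflow workflow.yaml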
To better illustrate the process, here is a snippet of the template file (boutique.yaml.templ) used to update the resource limits for the frontend deployment. The ${...} placeholders are replaced by the FileConfigurator with the parameter values selected for each experiment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ak-frontend
  namespace: akamas-demo
spec:
  selector:
    matchLabels:
      app: ak-frontend
  template:
    metadata:
      labels:
        app: ak-frontend
      # other definitions...
    spec:
      containers:
        - name: server
          image: gcr.io/google-samples/microservices-demo/frontend:v0.2.2
          # other definitions...
          resources:
            requests:
              cpu: ${frontend.cpu_limit}
              memory: ${frontend.memory_limit}
            limits:
              cpu: ${frontend.cpu_limit}
              memory: ${frontend.memory_limit}
# other definitions...
For reference, the following is the complete template for the two tuned deployments:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ak-frontend
  namespace: akamas-demo
spec:
  selector:
    matchLabels:
      app: ak-frontend
  replicas: 1
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: ak-frontend
    spec:
      serviceAccountName: default
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: akamas/node
                    operator: In
                    values:
                      - akamas
      containers:
        - name: server
          image: gcr.io/google-samples/microservices-demo/frontend:v0.2.2
          ports:
            - containerPort: 8080
          readinessProbe:
            initialDelaySeconds: 10
            httpGet:
              path: "/_healthz"
              port: 8080
              httpHeaders:
                - name: "Cookie"
                  value: "shop_session-id=x-readiness-probe"
          livenessProbe:
            initialDelaySeconds: 10
            httpGet:
              path: "/_healthz"
              port: 8080
              httpHeaders:
                - name: "Cookie"
                  value: "shop_session-id=x-liveness-probe"
          env:
            - name: PORT
              value: "8080"
            - name: PRODUCT_CATALOG_SERVICE_ADDR
              value: "ak-productcatalogservice:3550"
            - name: CURRENCY_SERVICE_ADDR
              value: "ak-currencyservice:7000"
            - name: CART_SERVICE_ADDR
              value: "ak-cartservice:7070"
            - name: RECOMMENDATION_SERVICE_ADDR
              value: "ak-recommendationservice:8080"
            - name: SHIPPING_SERVICE_ADDR
              value: "ak-shippingservice:50051"
            - name: CHECKOUT_SERVICE_ADDR
              value: "ak-checkoutservice:5050"
            - name: AD_SERVICE_ADDR
              value: "ak-adservice:9555"
            - name: ENV_PLATFORM
              value: "aws"
            - name: DISABLE_TRACING
              value: "1"
            - name: DISABLE_PROFILER
              value: "1"
          resources:
            requests:
              cpu: ${frontend.cpu_limit}
              memory: ${frontend.memory_limit}
            limits:
              cpu: ${frontend.cpu_limit}
              memory: ${frontend.memory_limit}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ak-productcatalogservice
  namespace: akamas-demo
spec:
  selector:
    matchLabels:
      app: ak-productcatalogservice
  replicas: 1
  template:
    metadata:
      labels:
        app: ak-productcatalogservice
    spec:
      serviceAccountName: default
      terminationGracePeriodSeconds: 5
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: akamas/node
                    operator: In
                    values:
                      - akamas
      containers:
        - name: server
          image: gcr.io/google-samples/microservices-demo/productcatalogservice:v0.2.2
          ports:
            - containerPort: 3550
          env:
            - name: PORT
              value: "3550"
            - name: DISABLE_STATS
              value: "1"
            - name: DISABLE_TRACING
              value: "1"
            - name: DISABLE_PROFILER
              value: "1"
          readinessProbe:
            exec:
              command: ["/bin/grpc_health_probe", "-addr=:3550"]
          livenessProbe:
            exec:
              command: ["/bin/grpc_health_probe", "-addr=:3550"]
          resources:
            requests:
              cpu: ${productcatalogservice.cpu_limit}
              memory: ${productcatalogservice.memory_limit}
            limits:
              cpu: ${productcatalogservice.cpu_limit}
              memory: ${productcatalogservice.memory_limit}
The following is the load-test.sh script used to start and stop the load generator:
#!/bin/bash
ACTION=$1
LOCUST_ENDPOINT="$(minikube service -n akamas-demo ak-loadgenerator | awk '/web-ui.*http/ {print $8}')"

case $ACTION in
  start)
    curl -X POST -d 'user_count=100' -d 'spawn_rate=3' -d 'host=http://ak-frontend:80' "${LOCUST_ENDPOINT}/swarm"
    ;;
  stop)
    curl "${LOCUST_ENDPOINT}/stop"
    curl "${LOCUST_ENDPOINT}/stats/reset"
    ;;
  *)
    echo "Unrecognized option '${ACTION}'"
    exit 1
    ;;
esac
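You can also run the script manually on the cluster instance to verify that the load generator responds before launching the study:
bash load-test.sh start
bash load-test.sh stop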
Telemetry
If you have not installed the Prometheus telemetry provider yet, take a look at the Prometheus provider page to proceed with the installation.
With the definition of the telemetry instance shown below, we import the end-user performance metrics provided by the load generator, along with a custom "cost" metric defined as a weighted sum of the CPU and memory allocated to the pods in the cluster:
provider: Prometheus
config:
  address: CLUSTER_IP
  port: PROM_PORT
metrics:
  - metric: users
    datasourceMetric: "locust_users"
  - metric: transactions_throughput
    datasourceMetric: 'rate(locust_requests_num_requests{name="Aggregated"}[30s]) - rate(locust_requests_num_failures{name="Aggregated"}[30s])'
  - metric: transactions_error_throughput
    datasourceMetric: 'rate(locust_requests_num_failures{name="Aggregated"}[30s])'
  - metric: transactions_error_rate
    datasourceMetric: "locust_requests_fail_ratio"
  - metric: transactions_response_time
    datasourceMetric: 'locust_requests_avg_response_time{name="Aggregated"}'
  - metric: transactions_response_time_p50
    datasourceMetric: 'locust_requests_current_response_time_percentile_50'
  - metric: transactions_response_time_p95
    datasourceMetric: 'locust_requests_current_response_time_percentile_95'
  - metric: cost
    datasourceMetric: 'sum(kube_pod_container_resource_requests{resource="cpu" %FILTERS%})*29 + sum(kube_pod_container_resource_requests{resource="memory" %FILTERS%})/1024/1024/1024*3.2'
To create the telemetry instance execute the following command:
akamas create telemetry-instance prometheus.yml 'Online Boutique'
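To get a sense of the cost metric: the query weighs each requested CPU core by 29 and each gibibyte of requested memory by 3.2 (illustrative monthly unit prices), where %FILTERS% is a placeholder that Akamas expands with the label filters defined on the component (here, namespace akamas-demo and container server|redis). For instance, the two tuned containers at their baseline configuration (300 millicores and 256 MiB each) contribute roughly 0.6 × 29 + 0.5 × 3.2 ≈ 19 to this sum; the other matching containers in the namespace add their own requests on top. As a sanity check, you can run the query by hand against the Prometheus HTTP API, substituting the actual address and port and inlining the filters:
# Hypothetical manual check of the cost query (replace CLUSTER_IP and PROM_PORT)
curl -s "http://CLUSTER_IP:PROM_PORT/api/v1/query" --data-urlencode \
  'query=sum(kube_pod_container_resource_requests{resource="cpu",namespace="akamas-demo",container=~"server|redis"})*29 + sum(kube_pod_container_resource_requests{resource="memory",namespace="akamas-demo",container=~"server|redis"})/1024/1024/1024*3.2'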
Study
With this study, we want to minimize the "cost" of running the application which, according to the definition described in the previous section, means reducing the resources allocated to the tuned pods in the cluster. At the same time, we want the application to stay within the expected SLOs: this is obtained by defining constraints on the response time and error rate recorded by the load generator. Note that the parameter domains below are expressed in the units used by the Kubernetes optimization pack, millicores for cpu_limit and mebibytes for memory_limit.
name: Minimize Kubernetes Online Boutique cost while matching SLOs
system: Online Boutique
workflow: boutique

goal:
  objective: minimize
  constraints:
    absolute:
      - name: response_time
        formula: online_boutique.transactions_response_time <= 500
      - name: error_rate
        formula: online_boutique.transactions_error_rate <= 0.02
  function:
    formula: online_boutique.cost

windowing:
  type: trim
  trim: [1m, 30s]
  task: Test

metricsSelection:
  - online_boutique.cost
  - online_boutique.transactions_throughput
  - online_boutique.transactions_error_rate
  - online_boutique.transactions_response_time
  - online_boutique.transactions_response_time_p95
  - online_boutique.users
  - frontend.container_cpu_used
  - frontend.container_cpu_util
  - frontend.container_cpu_limit
  - frontend.container_cpu_throttle_time
  - frontend.container_memory_used
  - frontend.container_memory_util
  - frontend.container_memory_limit
  - productcatalogservice.container_cpu_used
  - productcatalogservice.container_cpu_util
  - productcatalogservice.container_cpu_limit
  - productcatalogservice.container_cpu_throttle_time
  - productcatalogservice.container_memory_used
  - productcatalogservice.container_memory_util
  - productcatalogservice.container_memory_limit

parametersSelection:
  - name: frontend.cpu_limit
    domain: [100, 300]
  - name: frontend.memory_limit
    domain: [64, 512]
  - name: productcatalogservice.cpu_limit
    domain: [100, 500]
  - name: productcatalogservice.memory_limit
    domain: [64, 512]

steps:
  - name: baseline
    type: baseline
    values:
      frontend.cpu_limit: 300
      frontend.memory_limit: 256
      productcatalogservice.cpu_limit: 300
      productcatalogservice.memory_limit: 256
  - name: optimize
    type: optimize
    numberOfExperiments: 50
To create and run the study execute the following commands:
akamas create study study.yaml
akamas start study 'Minimize Kubernetes Online Boutique cost while matching SLOs'