
Optimizing a Kubernetes application

In this example, we’ll optimize Online Boutique, a demo e-commerce application running on microservices, by tuning the resources allocated to a selection of pods. This is a common use case where we want to minimize the cost associated with running an application without impacting the SLO.

Notice: all the required artifacts are published in this public repository.

Environment setup

The test environment includes the following instances:

  • Akamas: the instance running Akamas.

  • Cluster: an instance hosting a Minikube cluster.

You can configure the Minikube cluster using the scripts provided in the public repository.

Telemetry Infrastructure setup

To gather metrics about the application we will use Prometheus. It will be automatically configured by applying the artifacts in the repository with the following command:

Application and Test tool

The targeted system is Online Boutique, a microservice-based demo application. In the same namespace, a deployment running the load generator will stress the boutique and forward the performance metrics to Prometheus.

To configure the application and the load generator on your (Minikube) cluster, apply the definitions provided in the public repository by running the following command:

Optimization setup

In this section, we will guide you through the steps required to set up the optimization on Akamas.

If you have not installed the Kubernetes optimization pack yet, take a look at the optimization pack page to proceed with the installation.

Notice: the artifacts to create the Akamas entities can be found in the public repository, under the akamas directory.

System

System Online Boutique

Here’s the definition of the system containing our components and telemetry-instances for this example:

To create the system run the following command:

Component online_boutique

We’ll use a component of type Web Application to represent the Online Boutique application at a high level. To identify the related Prometheus metrics, the configuration requires the prometheus property for the telemetry service, detailed later in this guide.

Here’s the definition of the component:

To create the component in the system run the following command:

Components frontend and productcatalogservice

The public repository contains the definition of all the services that compose Online Boutique. In this guide, for the sake of simplicity, we’ll only tune the resources of the containers in the frontend and the product-catalog pods, defined as components of type Kubernetes Container.
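As a side note, the values of the prometheus properties are regular expressions: every metric whose pod label matches ak-frontend.* is attributed to the frontend component. A quick illustration of the matching, with made-up pod names:

```shell
# Made-up pod names: only the one matching the ak-frontend.* pattern
# from the component definition is selected.
printf '%s\n' \
  ak-frontend-5d7f9c6b8-abcde \
  ak-productcatalogservice-7f4b5-xyz12 \
  ak-cartservice-6c9d8-qwert \
  | grep -E '^ak-frontend.*'
```

Note that Prometheus fully anchors label regexes, so ak-frontend.* must match the label value from its start, as in the grep above.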

Here’s their definition:

To create the components in the system run the following commands:

Workflow

The workflow is divided into the following steps:

  • Create the YAML artifacts with the updated resource limits for the tuned containers.

  • Apply the updated definitions to the cluster.

  • Wait for the rollout to complete.

  • Start the load generator.

  • Let the test run for a fixed amount of time.

  • Stop the test and reset the load generator.

The following is the definition of the workflow:

To better illustrate the process, here is a snippet of the template file used to update the resource limits for the frontend deployment.
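The FileConfigurator operator replaces each ${component.parameter} token in the template with the value selected for the current experiment. Here is a minimal sketch of that substitution using sed, with illustrative values (250m CPU, 256Mi memory):

```shell
# Create a tiny template with the same placeholder syntax used in the guide,
# then substitute illustrative values the way the FileConfigurator would.
cat > boutique.yaml.templ <<'EOF'
resources:
  limits:
    cpu: ${frontend.cpu_limit}
    memory: ${frontend.memory_limit}
EOF

sed -e 's/\${frontend\.cpu_limit}/250m/' \
    -e 's/\${frontend\.memory_limit}/256Mi/' \
    boutique.yaml.templ > boutique.yaml

cat boutique.yaml
```

The actual operator resolves the values (and their units) from the study parameters; the sed commands only illustrate the token replacement.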

The following script is used to start and stop the load generator:

Telemetry

If you have not installed the Prometheus telemetry provider yet, take a look at the telemetry provider page to proceed with the installation.

With the definition of the telemetry instance shown below, we import the end-user performance metrics provided by the load-generator, along with a custom definition of "cost" given by a weighted sum of the CPU and memory allocated for the pods in the cluster:

To create the telemetry instance execute the following command:

Study

With this study, we want to minimize the "cost" of running the application which, according to the definition described in the previous section, means reducing the resources allocated to the tuned pods in the cluster. At the same time, we want the application to stay within the expected SLO, which we enforce by defining constraints on the response time and error rate recorded by the load generator.
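To make the cost function concrete, assuming the cpu_limit parameters are expressed in millicores and counting only the two tuned containers (the real query sums the requests of every container matched by the filters), the baseline configuration of 300m CPU and 256Mi of memory per container evaluates as follows:

```shell
# Baseline cost: (CPU cores summed) * 29 + (memory in GiB summed) * 3.2,
# mirroring the weights used in the telemetry instance definition.
awk 'BEGIN {
  cpu_cores = 0.300 + 0.300        # frontend + productcatalogservice
  mem_gib   = (256 + 256) / 1024   # MiB -> GiB
  printf "%.1f\n", cpu_cores * 29 + mem_gib * 3.2
}'
# prints 19.0
```

Each experiment’s goal value is this sum evaluated on the metrics scraped during the test window, so reducing the CPU requests mostly shrinks the dominant first term.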

To create and run the study execute the following commands:


Snippets

The snippets below contain, in the order they are referenced in this guide, the commands and artifacts used in this example.

Command to configure Prometheus:

    kubectl apply -f kubernetes-online-boutique/kube/prometheus.yaml

Command to deploy the Online Boutique and the load generator:

    kubectl apply -f kubernetes-online-boutique/kube/

Definition of the system:

    name: Online Boutique
    description: The Online Boutique by Google

Command to create the system:

    akamas create system system.yaml

Definition of the online_boutique component:

    name: online_boutique
    description: The Online Boutique application
    componentType: Web Application
    properties:
      prometheus:
        instance: .*
        job: .*
        namespace: akamas-demo
        container: server|redis

Command to create the component in the system:

    akamas create component application.yaml 'Online Boutique'

Definition of the frontend and productcatalogservice components:

    name: frontend
    description: The frontend of the online boutique by Google
    componentType: Kubernetes Container
    properties:
      prometheus:
        job: .*
        instance: .*
        name: .*
        pod: ak-frontend.*
        container: server
    name: productcatalogservice
    description: The productcatalogservice of the online boutique by Google
    componentType: Kubernetes Container
    properties:
      prometheus:
        job: .*
        instance: .*
        name: .*
        pod: ak-productcatalogservice.*
        container: server

Commands to create the components in the system:

    akamas create component frontend.yaml 'Online Boutique'
    akamas create component productcatalogservice.yaml 'Online Boutique'

Definition of the workflow:

    name: boutique
    tasks:
      - name: Configure Online Boutique
        operator: FileConfigurator
        arguments:
          source:
            hostname: CLUSTER_INSTANCE_IP
            username: akamas
            password: akamas
            path: boutique.yaml.templ
          target:
            hostname: cluster
            username: akamas
            password: akamas
            path: boutique.yaml
    
      - name: Apply new configuration to the Online Boutique
        operator: Executor
        arguments:
          host:
            hostname: CLUSTER_INSTANCE_IP
            username: akamas
            password: akamas
          command: kubectl apply -f boutique.yaml
    
      - name: Check Online Boutique is up
        operator: Executor
        arguments:
          retries: 0
          host:
            hostname: CLUSTER_INSTANCE_IP
            username: akamas
            password: akamas
          command: kubectl wait --for=condition=available deploy/ak-frontend deploy/ak-productcatalogservice --timeout=30s
    
      - name: Start Locust Test
        operator: Executor
        arguments:
          host:
            hostname: CLUSTER_INSTANCE_IP
            username: akamas
            password: akamas
          command: bash load-test.sh start
    
      - name: Test
        operator: Sleep
        arguments:
          seconds: 150
    
      - name: Stop Locust test
        operator: Executor
        arguments:
          host:
            hostname: CLUSTER_INSTANCE_IP
            username: akamas
            password: akamas
          command: bash load-test.sh stop

Snippet of the template file used to update the resource limits for the frontend deployment:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ak-frontend
      namespace: akamas-demo
    spec:
      selector:
        matchLabels:
          app: ak-frontend
      template:
        metadata:
          labels:
            app: ak-frontend
        # other definitions...
        spec:
          containers:
            - name: server
              image: gcr.io/google-samples/microservices-demo/frontend:v0.2.2
              # other definitions...
              resources:
                requests:
                  cpu: ${frontend.cpu_limit}
                  memory: ${frontend.memory_limit}
                limits:
                  cpu: ${frontend.cpu_limit}
                  memory: ${frontend.memory_limit}
    # other definitions...

Complete template file (boutique.yaml.templ) with the definitions of the tuned deployments:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ak-frontend
      namespace: akamas-demo
    spec:
      selector:
        matchLabels:
          app: ak-frontend
      replicas: 1
      strategy:
        rollingUpdate:
          maxSurge: 1
          maxUnavailable: 1
      template:
        metadata:
          labels:
            app: ak-frontend
        spec:
          serviceAccountName: default
          affinity:
            nodeAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 1
                  preference:
                    matchExpressions:
                      - key: akamas/node
                        operator: In
                        values:
                          - akamas
          containers:
            - name: server
              image: gcr.io/google-samples/microservices-demo/frontend:v0.2.2
              ports:
                - containerPort: 8080
              readinessProbe:
                initialDelaySeconds: 10
                httpGet:
                  path: "/_healthz"
                  port: 8080
                  httpHeaders:
                    - name: "Cookie"
                      value: "shop_session-id=x-readiness-probe"
              livenessProbe:
                initialDelaySeconds: 10
                httpGet:
                  path: "/_healthz"
                  port: 8080
                  httpHeaders:
                    - name: "Cookie"
                      value: "shop_session-id=x-liveness-probe"
              env:
                - name: PORT
                  value: "8080"
                - name: PRODUCT_CATALOG_SERVICE_ADDR
                  value: "ak-productcatalogservice:3550"
                - name: CURRENCY_SERVICE_ADDR
                  value: "ak-currencyservice:7000"
                - name: CART_SERVICE_ADDR
                  value: "ak-cartservice:7070"
                - name: RECOMMENDATION_SERVICE_ADDR
                  value: "ak-recommendationservice:8080"
                - name: SHIPPING_SERVICE_ADDR
                  value: "ak-shippingservice:50051"
                - name: CHECKOUT_SERVICE_ADDR
                  value: "ak-checkoutservice:5050"
                - name: AD_SERVICE_ADDR
                  value: "ak-adservice:9555"
                - name: ENV_PLATFORM
                  value: "aws"
                - name: DISABLE_TRACING
                  value: "1"
                - name: DISABLE_PROFILER
                  value: "1"
              resources:
                requests:
                  cpu: ${frontend.cpu_limit}
                  memory: ${frontend.memory_limit}
                limits:
                  cpu: ${frontend.cpu_limit}
                  memory: ${frontend.memory_limit}
    
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ak-productcatalogservice
      namespace: akamas-demo
    spec:
      selector:
        matchLabels:
          app: ak-productcatalogservice
      replicas: 1
      template:
        metadata:
          labels:
            app: ak-productcatalogservice
        spec:
          serviceAccountName: default
          terminationGracePeriodSeconds: 5
          affinity:
            nodeAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 1
                  preference:
                    matchExpressions:
                      - key: akamas/node
                        operator: In
                        values:
                          - akamas
          containers:
            - name: server
              image: gcr.io/google-samples/microservices-demo/productcatalogservice:v0.2.2
              ports:
                - containerPort: 3550
              env:
                - name: PORT
                  value: "3550"
                - name: DISABLE_STATS
                  value: "1"
                - name: DISABLE_TRACING
                  value: "1"
                - name: DISABLE_PROFILER
                  value: "1"
              readinessProbe:
                exec:
                  command: ["/bin/grpc_health_probe", "-addr=:3550"]
              livenessProbe:
                exec:
                  command: ["/bin/grpc_health_probe", "-addr=:3550"]
              resources:
                requests:
                  cpu: ${productcatalogservice.cpu_limit}
                  memory: ${productcatalogservice.memory_limit}
                limits:
                  cpu: ${productcatalogservice.cpu_limit}
                  memory: ${productcatalogservice.memory_limit}

Script (load-test.sh) to start and stop the load generator:

    #!/bin/bash

    ACTION=$1

    LOCUST_ENDPOINT="$(minikube service -n akamas-demo ak-loadgenerator | awk '/web-ui.*http/ {print $8}')"

    case $ACTION in
      start)
        curl -X POST -d 'user_count=100' -d 'spawn_rate=3' -d 'host=http://ak-frontend:80' "${LOCUST_ENDPOINT}/swarm"
        ;;
      stop)
        curl "${LOCUST_ENDPOINT}/stop"
        curl "${LOCUST_ENDPOINT}/stats/reset"
        ;;
      *)
        echo "Unrecognized option '${ACTION}'"
        exit 1
        ;;
    esac

Definition of the telemetry instance:

    provider: Prometheus
    config:
      address: CLUSTER_IP
      port: PROM_PORT
    metrics:
      - metric: users
        datasourceMetric: "locust_users"
      - metric: transactions_throughput
        datasourceMetric: 'rate(locust_requests_num_requests{name="Aggregated"}[30s]) - rate(locust_requests_num_failures{name="Aggregated"}[30s])'
      - metric: transactions_error_throughput
        datasourceMetric: 'rate(locust_requests_num_failures{name="Aggregated"}[30s])'
      - metric: transactions_error_rate
        datasourceMetric: "locust_requests_fail_ratio"
      - metric: transactions_response_time
        datasourceMetric: 'locust_requests_avg_response_time{name="Aggregated"}'
      - metric: transactions_response_time_p50
        datasourceMetric: 'locust_requests_current_response_time_percentile_50'
      - metric: transactions_response_time_p95
        datasourceMetric: 'locust_requests_current_response_time_percentile_95'
    
      - metric: cost
        datasourceMetric: 'sum(kube_pod_container_resource_requests{resource="cpu" %FILTERS%})*29 + sum(kube_pod_container_resource_requests{resource="memory" %FILTERS%})/1024/1024/1024*3.2'

Command to create the telemetry instance:

    akamas create telemetry-instance prometheus.yml 'Online Boutique'

Definition of the study:

    name: Minimize Kubernetes Online Boutique cost while matching SLOs
    system: Online Boutique
    workflow: boutique
    
    goal:
      objective: minimize
      constraints:
        absolute:
          - name: response_time
            formula: online_boutique.transactions_response_time <= 500
          - name: error_rate
            formula: online_boutique.transactions_error_rate <= 0.02
      function:
        formula: online_boutique.cost
    
    windowing:
      type: trim
      trim: [1m, 30s]
      task: Test
    
    metricsSelection:
      - online_boutique.cost
      - online_boutique.transactions_throughput
      - online_boutique.transactions_error_rate
      - online_boutique.transactions_response_time
      - online_boutique.transactions_response_time_p95
      - online_boutique.users
      - frontend.container_cpu_used
      - frontend.container_cpu_util
      - frontend.container_cpu_limit
      - frontend.container_cpu_throttle_time
      - frontend.container_memory_used
      - frontend.container_memory_util
      - frontend.container_memory_limit
      - productcatalogservice.container_cpu_used
      - productcatalogservice.container_cpu_util
      - productcatalogservice.container_cpu_limit
      - productcatalogservice.container_cpu_throttle_time
      - productcatalogservice.container_memory_used
      - productcatalogservice.container_memory_util
      - productcatalogservice.container_memory_limit
    
    parametersSelection:
      - name: frontend.cpu_limit
        domain: [100, 300]
      - name: frontend.memory_limit
        domain: [64, 512]
      - name: productcatalogservice.cpu_limit
        domain: [100, 500]
      - name: productcatalogservice.memory_limit
        domain: [64, 512]
    
    steps:
      - name: baseline
        type: baseline
        values:
          frontend.cpu_limit: 300
          frontend.memory_limit: 256
          productcatalogservice.cpu_limit: 300
          productcatalogservice.memory_limit: 256
    
      - name: optimize
        type: optimize
        numberOfExperiments: 50
    

Commands to create and start the study:

    akamas create study study.yaml
    akamas start study 'Minimize Kubernetes Online Boutique cost while matching SLOs'