Optimize cost of a Java microservice on Kubernetes while preserving SLOs in production

In this guide, you optimize the cost (or resource footprint) of a Java microservice running on Kubernetes. The study tunes both pod resource settings (CPU and memory requests and limits) and JVM options (max heap size, garbage collection algorithm, etc.) at the same time, while also taking into account your application performance and reliability requirements (SLOs). This optimization happens in production, leveraging Akamas live optimization capabilities.

Prerequisites

  • an Akamas instance

  • a Kubernetes cluster, with a Java-based deployment to be optimized

  • a supported telemetry data source configured to collect metrics from the target Kubernetes cluster (see the Akamas documentation for the full list of supported providers)

  • a way to apply configuration changes recommended by Akamas to the target deployment. In this guide, Akamas interacts directly with the Kubernetes APIs via kubectl. You need a service account with permission to update your deployment (see below for other integration options)

Optimization setup

In this guide, we assume the following setup:

  • the Kubernetes deployment to be optimized is called adservice (in the boutique namespace)

  • in the deployment, there is a container named server, where the application JVM runs

  • Dynatrace is used as an observability tool

Let's set up the Akamas optimization for this use case.

System

For this optimization, you need the following components to model the adservice tech stack:

  • A Kubernetes container component, which contains container-level metrics like CPU usage and parameters to be tuned like CPU limits (from the Kubernetes optimization pack)

  • A Java OpenJDK component, which contains JVM-level metrics like heap memory usage and parameters to be tuned like the garbage collector algorithm (from the Java OpenJDK optimization pack)

  • A Web Application component, which contains service-level metrics like throughput and response time of the microservice (from the Web application optimization pack)

Let's start by creating the system that represents the Kubernetes deployment to be optimized. To create it, write a system.yaml manifest like this:

name: adservice
description: The Adservice deployment

Then run:

akamas create system system.yaml

Now create a component-container.yaml manifest like the following:

name: server
description: Kubernetes container in the adservice deployment
componentType: Kubernetes Container
properties:
  dynatrace:
    type: CONTAINER_GROUP_INSTANCE
    kubernetes:
      namespace: boutique
      containerName: server
      basePodName: adservice-*

Notice the component includes properties that specify how Dynatrace telemetry will look up this container in the Kubernetes cluster (the same will happen for the following components).

These properties are dependent upon the telemetry provider you are using.

Then run:

akamas create component component-container.yaml adservice

Next, create a component-jvm.yaml manifest like the following:

name: jvm
description: JVM of the adservice deployment
componentType: java-openjdk-17
properties:
  dynatrace:
    type: PROCESS
    tags:
     akamas: adservice-jvm

Then run:

akamas create component component-jvm.yaml adservice

Now create a component-webapp.yaml manifest like the following:

name: web_application
description: The HTTP service of the adservice deployment
componentType: Web Application
properties:
  dynatrace:
    type: SERVICE
    name: adservice

Then run:

akamas create component component-webapp.yaml adservice

Workflow

To optimize a Kubernetes microservice in production, you need to create a workflow that defines how to deploy the new configuration recommended by Akamas to production.

Let's explore the high-level tasks required in this scenario and the options you have to adapt it to your environment:

1) Update the Kubernetes deployment configuration

The first step is to update the Kubernetes deployment with the new configuration. This can be done in several ways depending on your environment and processes:

  • A simple option is to let Akamas directly update the deployment leveraging the Kubernetes APIs via kubectl commands

  • Another option is to follow an Infrastructure-as-code approach, where the configuration change is managed via pull requests to a Git repository, leveraging your pipelines to deploy the change in production

In this guide, we take the first option and use the kubectl apply command to apply the new deployment configuration. These commands are executed from the toolbox, an Akamas utility that can be enabled in an Akamas installation on Kubernetes. Make sure that kubectl is configured correctly to connect to your Kubernetes cluster and can update your target deployment. See the Akamas documentation for more details.

2) Wait for the new deployment to be rolled out in production

In a live optimization, Akamas needs to understand when the new deployment rollout is complete and whether it was completed successfully or not. This is key information for Akamas AI to observe and optimize your applications safely.

This task can be done in several ways depending on how you manage changes, as discussed in the previous task:

  • A simple option is to use the kubectl rollout command to wait for the deployment rollout completion. This is the approach used in this guide

  • Another option is to follow an Infrastructure-as-code approach, where a change is managed via pull requests to a Git repository, leveraging your pipelines to deploy in production. In this situation, the deployment process is executed externally and is not controlled by Akamas. Hence, the workflow task will periodically poll the Kubernetes deployment to recognize when the new deployment has landed in production
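
The polling approach described in the second option can be sketched as a generic wait loop. This is only an illustration, not part of the Akamas workflow operators: the function name wait_for_rollout and the is_rolled_out callable are hypothetical, and how you detect that the new configuration has landed (for example, by comparing the deployed resource values with the recommended ones) depends on your environment.

```python
import time
from typing import Callable


def wait_for_rollout(
    is_rolled_out: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until is_rolled_out() returns True or the timeout expires.

    is_rolled_out is a hypothetical, user-supplied check that reports
    whether the new configuration is live in the cluster.
    """
    deadline = clock() + timeout_s
    while True:
        if is_rolled_out():
            return True  # the new deployment has landed
        if clock() >= deadline:
            return False  # give up: the workflow task should fail
        sleep(interval_s)
```

A workflow task wrapping a loop like this would exit with a non-zero status when the function returns False, so that Akamas can tell the rollout did not complete.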

3) Observe how the application behaves with the new configuration

In a live optimization, Akamas simply needs to wait for a given observation interval, while the application works in production with the new configuration. Telemetry metrics will be collected during this observation period and will be analyzed by Akamas AI to recommend the next configuration.

A 30-minute observation interval is recommended for most situations.

Let's now create a workflow.yaml manifest like the following:

name: adservice
tasks:
  - name: configure
    operator: FileConfigurator
    arguments:
      source:
        hostname: toolbox
        username: akamas
        password: <your-toolbox-password>
        path: adservice.yaml.templ
      target:
        hostname: toolbox
        username: akamas
        password: <your-toolbox-password>
        path: adservice.yaml

  - name: apply
    operator: Executor
    arguments:
      timeout: 5m
      host:
        hostname: toolbox
        username: akamas
        password: <your-toolbox-password>
      command: kubectl apply -f adservice.yaml -n boutique

  - name: verify
    operator: Executor
    arguments:
      timeout: 5m
      host:
        hostname: toolbox
        username: akamas
        password: <your-toolbox-password>
      command: kubectl rollout status --timeout=5m deployment/adservice -n boutique

  - name: observe
    operator: Sleep
    arguments:
      seconds: 1800

In the configure task, Akamas will apply the container CPU/memory limits and JVM options recommended by Akamas AI to the deployment file. To do that, copy your deployment manifest to a template file (here called adservice.yaml.templ), and substitute the current values with Akamas parameter placeholders as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: adservice
spec:
  selector:
    matchLabels:
      app: adservice
  replicas: 1
  template:
    metadata:
      labels:
        app: adservice
    spec:
      containers:
        - name: server
          image: gcr.io/google-samples/microservices-demo/adservice:v0.3.8
          ports:
            - containerPort: 9555
          env:
            - name: PORT
              value: "9555"
            - name: JAVA_OPTS
              value: "${jvm.*}"
          resources:
            limits:
              cpu: ${server.cpu_limit}
              memory: ${server.memory_limit}

Whenever a configuration recommended by Akamas is applied, the configure task creates the actual adservice.yaml deployment file, replacing the parameter placeholders with the values recommended by Akamas AI. The apply task then deploys it via kubectl apply.
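
To make the effect of placeholder substitution concrete, here is a minimal Python sketch. It is not the FileConfigurator implementation: the render function is hypothetical, the sample values (including their units and the JVM flag strings) are made up for illustration, and the assumption that the ${jvm.*} wildcard expands to all JVM parameters joined by spaces should be checked against the Akamas documentation.

```python
import re

# Matches placeholders such as ${server.cpu_limit} or ${jvm.*}
PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_]+)\.([A-Za-z0-9_*]+)\}")


def render(template, values):
    """Replace ${component.parameter} placeholders with the given values."""

    def substitute(match):
        component, parameter = match.group(1), match.group(2)
        params = values[component]
        if parameter == "*":
            # Assumed wildcard behavior: join all values of the component
            return " ".join(params.values())
        return params[parameter]

    return PLACEHOLDER.sub(substitute, template)


template = (
    'value: "${jvm.*}"\n'
    "cpu: ${server.cpu_limit}\n"
    "memory: ${server.memory_limit}"
)

# Hypothetical recommended values, already formatted for the manifest
values = {
    "jvm": {"jvm_maxHeapSize": "-Xmx512m", "jvm_gcType": "-XX:+UseSerialGC"},
    "server": {"cpu_limit": "500m", "memory_limit": "1536Mi"},
}

rendered = render(template, values)
print(rendered)
```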

To create the workflow, run:

akamas create workflow workflow.yaml

Telemetry

Create a telemetry instance based on your observability setup to collect your target Kubernetes deployment metrics.

Create a telemetry.yaml manifest like the following:

provider: Dynatrace
config:
  url: <YOUR_DYNATRACE_URL>
  token: <YOUR_DYNATRACE_TOKEN>

Then run:

akamas create telemetry-instance telemetry.yaml adservice

Study

It's time to create the Akamas study to achieve your optimization objectives.

Let's explore how the study is designed by going through the main concepts. The complete study manifest is available at the bottom.

Goal

Your overall objective is to reduce the cost (or resource footprint) of a Kubernetes deployment. To do that, you need to define the goal, which is a metric (or combination of metrics) representing the deployment cost to be minimized.

There are different approaches to measuring the cost of Kubernetes deployments:

  • A simple approach is to consider that Kubernetes allocates infrastructure resources based on pod resource requests (CPU and memory). Hence, the cost of a deployment can be derived from the deployment aggregate CPU and memory requests. In this guide, we use this approach and define the study goal as the sum of CPU and memory requests of the container to be optimized

  • Alternatively, the cost of a Kubernetes deployment can also be collected from external data sources that provide actual cost metrics like OpenCost. In this case, the study goal can be defined by leveraging the cost metric

Notice that weighting factors can be used in the goal formula to specify the relative importance of CPU and memory resources. For example, the cloud price of 1 CPU is about 9 times that of 1 GB of RAM. You can customize these weights based on your requirements so that Akamas converges on the configuration that is most cost-efficient in your specific context.
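
As a worked example of how such a weighted goal behaves, the following Python sketch mirrors the structure of the goal formula used in the study manifest of this guide (weights 29 for CPU and 3 for memory). It assumes that the CPU metric is expressed in millicores and the memory metric in bytes, as the divisions in that formula suggest. The function name deployment_cost and the candidate values are hypothetical.

```python
def deployment_cost(cpu_millicores, memory_bytes, cpu_weight=29.0, memory_weight=3.0):
    """Weighted cost score: cores * cpu_weight + GiB * memory_weight."""
    cpu_cores = cpu_millicores / 1000
    memory_gib = memory_bytes / 1024 / 1024 / 1024
    return cpu_cores * cpu_weight + memory_gib * memory_weight


# 1 CPU and 2 GiB of memory
baseline = deployment_cost(1000, 2048 * 1024 * 1024)

# Hypothetical candidate: 0.5 CPU and 1 GiB of memory
candidate = deployment_cost(500, 1024 * 1024 * 1024)

print(baseline, candidate)  # 35.0 17.5
```

With these weights, halving the CPU lowers the score by 14.5, while halving the memory lowers it by only 3, so the optimizer is pushed to save CPU first.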

Constraints

When optimizing for cost reduction (or resource footprint), it's key not to impact application response time or introduce risks of availability and reliability issues. To ensure this, you can define your performance and reliability requirements (SLOs) as metric constraints.

In this study:

  • to ensure application performance, constraints are specified on application response times and error rate

  • to ensure application reliability, constraints are specified on:

    • container peak CPU and memory utilization, and container out-of-memory kills

    • JVM garbage collection time %, to prevent out-of-memory in the JVM heap memory

Parameters

To achieve cost-efficient and reliable Java-based microservices, Kubernetes container resources and JVM runtime options must be configured optimally and tuned jointly, as they are heavily interconnected.

To do that, the study includes the following parameters:

  • Kubernetes container: CPU and memory requests and limits

  • JVM: heap size and garbage collection (GC) algorithms

The study also includes parameter constraints to ensure that recommended configurations are safe and comply with best practices. In particular:

  • Kubernetes container memory limit must be higher than JVM heap size, plus a buffer to account for JVM off-heap memory usage

  • CPU limits must be at most 2x CPU requests, to avoid excessive over-commitment of CPU limits in the cluster

Notice that the parameters and constraints can change depending on your policies. For example, it is a common best practice to set memory requests equal to memory limits to reduce the risk of pod eviction. In this case, you include only memory requests in the study and set limits to the same value in the deployment file.
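
The two parameter constraints above can be read as simple inequalities. The following Python sketch checks a candidate configuration against them. It assumes memory values in MB and CPU values in millicores, and a 1000 MB off-heap buffer as in the study manifest of this guide. The function name violated_constraints and the sample values are hypothetical.

```python
def violated_constraints(cpu_request, cpu_limit, memory_limit, max_heap, off_heap_buffer=1000):
    """Return the names of the parameter constraints a configuration violates."""
    violations = []
    # The memory limit must exceed the JVM heap plus the off-heap buffer
    if not (max_heap + off_heap_buffer < memory_limit):
        violations.append("JVM off-heap safety buffer")
    # The CPU limit must be at most twice the CPU request
    if not (cpu_limit <= cpu_request * 2):
        violations.append("CPU limit at most 2x of requests")
    return violations


# Satisfies both constraints: 1024 + 1000 < 2048 and 180 <= 200
ok = violated_constraints(cpu_request=100, cpu_limit=180, memory_limit=2048, max_heap=1024)

# Violates both: 1024 + 1000 >= 1536 and 250 > 200
bad = violated_constraints(cpu_request=100, cpu_limit=250, memory_limit=1536, max_heap=1024)

print(ok, bad)
```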

Workload

Akamas live optimization considers the application's workload to recommend new configurations that are optimal for the goal (e.g., reduce cost) while meeting all metric constraints (e.g., latency and error rates).

For Kubernetes microservices, the workload is typically the throughput (requests/sec) of the microservice API endpoints. This is the approach used in this guide.

Approval mode and recommendation frequency

In this live optimization, manual approval is required, meaning that Akamas asks for user approval whenever a new configuration is generated. Once you approve it, the workflow is executed and the new configuration is deployed to production according to the integration strategy you defined above.

You can disable manual approval to enable fully autonomous optimization: in this case, as soon as a new configuration is generated, the workflow is executed without any human involvement.

The recommendation frequency is controlled by the numberOfTrials parameter. Since each workflow run lasts about 30 minutes, setting the number of trials to 48 (48 × 30 minutes = 24 hours) generates a new configuration once a day.
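
The same arithmetic applies to other recommendation frequencies. Here is a small Python sketch, assuming that each trial takes one full workflow run. The function name trials_between_recommendations is hypothetical.

```python
from datetime import timedelta


def trials_between_recommendations(recommendation_every, workflow_duration):
    """Number of trials needed so that a new configuration is generated
    once per recommendation_every interval."""
    trials, remainder = divmod(recommendation_every, workflow_duration)
    if remainder:
        raise ValueError("the interval must be a multiple of the workflow duration")
    return trials


daily = trials_between_recommendations(timedelta(days=1), timedelta(minutes=30))
twice_a_day = trials_between_recommendations(timedelta(hours=12), timedelta(minutes=30))

print(daily, twice_a_day)  # 48 24
```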

You can now create a study.yaml manifest like the following:

name: adservice - optimize costs tuning K8s and JVM
system: adservice
workflow: adservice

goal:
  name: Cost
  objective: minimize
  function:
    formula: ((server.container_cpu_limit)/1000)*29 + ((((server.container_memory_limit)/1024)/1024)/1024)*3
  constraints:
    absolute: 
      - name: Application response time degradation
        formula: web_application.requests_response_time:max <= 5
      - name: Application error rate degradation
        formula: web_application.requests_error_rate:max <= 0.02
      - name: Container CPU saturation
        formula: server.container_cpu_util_max:p95 < 1
      - name: Container memory saturation
        formula: server.container_memory_util_max:max < 1
      - name: Container out-of-memory
        formula: server.container_restarts == 0
      - name: JVM heap saturation
        formula: jvm.jvm_gc_time:max < 0.05

windowing:
  type: trim
  trim: [2m, 0s]
  task: observe

parametersSelection:
  - name: server.cpu_request
    domain: [10, 181]
  - name: server.cpu_limit
    domain: [10, 181]
  - name: server.memory_request
    domain: [16, 2048]
  - name: jvm.jvm_maxHeapSize
    domain: [16, 1024]
  - name: jvm.jvm_gcType

parameterConstraints:
  - name: JVM off-heap safety buffer
    formula: jvm.jvm_maxHeapSize + 1000 < server.memory_limit
  - name: CPU limit at most 2x of requests
    formula: server.cpu_limit <= server.cpu_request * 2

workloadsSelection:
  - name: web_application.requests_throughput

numberOfTrials: 48
steps:
  - name: baseline
    type: baseline
    values:
      server.cpu_limit: 1000
      server.memory_limit: 2048
      jvm.jvm_maxHeapSize: 1024
      jvm.jvm_gcType: Serial

  - name: optimize
    type: optimize
    numberOfExperiments: 21

Then run:

akamas create study study.yaml

You can now follow the live optimization progress and explore the results using the Akamas UI.

Artifact templates

To quickly set up this optimization, download the Akamas template manifests and update the values file to match your needs. Then, create your optimization using the Akamas scaffolding.
