2 of 20

Knowledge Base

This guide describes how to apply the Akamas approach to the optimization of some real-world cases and how to set up a test environment for experimenting with Akamas.

Setting up a Konakart environment for testing Akamas

This page describes how to setup a simple yet complete performance testing environment that you can use for running Akamas offline optimization study:

The target application is Konakart, a real-world Java-based e-commerce application
JMeter will be used to execute stress tests, with built-in sample scenarios
Prometheus will be used to monitor the environment (load test, JVM and OS metrics)

Reference Architecture

The following picture describes the high-level Akamas architecture that is enabled by this setup.

Setup Docker Swarm

First of all, install Docker on your Linux box:

sudo apt update && sudo apt install docker.io

Now enable the user ubuntu to run docker without sudo:

sudo usermod -aG docker $USER
newgrp docker

In this environment we leverage Docker Swarm, a Docker native container orchestration system. Even though in this scenario everything runs on a single machine, Swarm is handy as it provides the ability to specify container resource limits (e.g. how many CPUs and how much memory the container can use) which can be very useful from a tuning perspective.

At this point, we can initialize Docker Swarm with this simple command:

docker swarm init

You should see a message stating that Docker Swarm has been setup.

You now create a Docker network that all the containers will use to communicate:

docker network create -d overlay --attachable akamas_lab

Setup Konakart

In order to setup Konakart, first clone this repository:

git clone https://github.com/akamaslabs/konakart-docker.git

Now you can start Konakart by running:

docker stack deploy --compose-file konakart-docker/konakart/docker-compose.yml sut

You can now verify that Konakart is up and running by accessing your instance on port 8780:

Setup JMeter (optional)

Unless you plan to use a different load testing tool (such as LoadRunner Enterprise or Neotys NeoLoad), JMeter is a great choice to start using Akamas.

Setting up JMeter is straightforward as a JMeter container comes already configured with test plans built for performance testing Konakart. Moreover, JMeter is already configured with the Prometheus integration so that performance test metrics (e.g. transaction throughput and transaction response time) are collected via Prometheus Listener for JMeter. You just need to verify the ability to launch a performance test in this environment.

You can launch a first manual performance test by running the following command, where you need to replace YOUR_INSTANCE_ADDRESS with the address of your Konakart instance:

docker run --net=akamas_lab --rm --name jmeter -i -v $(pwd)/konakart-docker/jmeter:/tmp -w /tmp -p 9270:9270 chiabre/jmeter_plugins -t ramp_test_plan.jmx -JTARGET_HOST=YOUR_INSTANCE_ADDRESS

You should see an output similar to the one displayed here below, indicating that JMeter is running successfully:

START Running Jmeter on Fri Apr  2 06:35:35 UTC 2021
jmeter log level = OFF
jmeter defautl args -n -Dlog_level.jmeter=OFF -Jprometheus.ip=0.0.0.0
jmeter user args -t ramp_test_plan.jmx -JTARGET_HOST=konakart.dev.akamas.io -JTHREADS=40
Creating summariser <summary>
Created the tree successfully using ramp_test_plan.jmx
Starting standalone test @ Fri Apr 02 06:35:37 GMT 2021 (1617345337225)
Waiting for possible Shutdown/StopTestNow/HeapDump/ThreadDump message on port 4445
summary +    153 in 00:00:23 =    6.8/s Avg:    44 Min:     9 Max:  1300 Err:     0 (0.00%) Active: 5 Started: 5 Finished: 0
summary +    612 in 00:00:30 =   20.4/s Avg:    25 Min:     5 Max:   324 Err:     0 (0.00%) Active: 11 Started: 11 Finished: 0
summary =    765 in 00:00:53 =   14.5/s Avg:    29 Min:     5 Max:  1300 Err:     0 (0.00%)
summary +   1071 in 00:00:30 =   35.8/s Avg:    26 Min:     4 Max:   335 Err:     0 (0.00%) Active: 17 Started: 17 Finished: 0
summary =   1836 in 00:01:23 =   22.2/s Avg:    27 Min:     4 Max:  1300 Err:     0 (0.00%)
summary +   1408 in 00:00:30 =   46.9/s Avg:    60 Min:     4 Max:  1087 Err:     0 (0.00%) Active: 23 Started: 23 Finished: 0
summary =   3244 in 00:01:53 =   28.8/s Avg:    41 Min:     4 Max:  1300 Err:     0 (0.00%)
summary +   1831 in 00:00:30 =   61.0/s Avg:    60 Min:     4 Max:  1634 Err:     0 (0.00%) Active: 29 Started: 29 Finished: 0
summary =   5075 in 00:02:23 =   35.6/s Avg:    48 Min:     4 Max:  1634 Err:     0 (0.00%)
summary +   1913 in 00:00:30 =   63.7/s Avg:   137 Min:     5 Max:  2024 Err:     0 (0.00%) Active: 35 Started: 35 Finished: 0
summary =   6988 in 00:02:53 =   40.5/s Avg:    72 Min:     4 Max:  2024 Err:     0 (0.00%)
summary +   1785 in 00:00:28 =   63.3/s Avg:   225 Min:     8 Max:  2833 Err:     0 (0.00%) Active: 0 Started: 40 Finished: 40
summary =   8773 in 00:03:21 =   43.7/s Avg:   103 Min:     4 Max:  2833 Err:     0 (0.00%)
Tidying up ...    @ Fri Apr 02 06:38:58 GMT 2021 (1617345538215)
... end of run
END Running Jmeter on Fri Apr  2 06:38:58 UTC 2021

In case you see any errors (check out the Err: column), chances are that JMeter cannot contact Konakart. Please verify that your instance address is correct and relaunch the manual test until you are sure JMeter is running correctly.

Test plans

The JMeter docker image includes a couple of test plans described here below:

Ramp test plan

This test allows you to stress test Konakart with a ramp load profile. This profile is included in the ramp_test_plan.jmx file.

You can customize the profile by setting the following JMeter variables (see the example above):

THREADS, the maximum number of virtual users (default 20)
RAMP_SEC, the load test duration in seconds (default 200)

Plateau test plan

This test allows you to do a performance test of Konakart with an initial ramp-up and then with a constant load. This profile is included in the plateau_test_plan.jmx file.

You can customize the scenario by setting the following JMeter variables (see the example above):

THREADS, the maximum number of virtual users (default 20)
RAMP_UP_MIN, the ramp-up duration in minutes (default 1)
RAMP_UP_COUNT, the number of steps in the ramp (default 5)
HOLD_MIN, the plateau duration in minutes (default 5)

Setup Prometheus (optional)

Unless you plan to use a different monitoring tool (such as Dynatrace), Prometheus is a great choice to start using Akamas.

Now that Konakart and JMeter are up and running, the last step is to setup Prometheus. In this scenario, Prometheus allows you to gather any metrics you will need for your Akamas optimizations, for example, the performance test metrics measured by JMeter (e.g. transaction throughput and response time) or related to application resource usage.

This environment also includes a number of useful dashboards that you can use to monitor the application, infrastructure and load testing key metrics.

By running the following command you can launch Prometheus and Grafana, plus a set of preconfigured dashboards to monitor your load tests:

docker stack deploy -c konakart-docker/monitoring/docker-compose.yml monitoring

Explore Grafana dashboards (optional)

The following is a quick overview of the preconfigured dashboards that you can use to monitor the application, infrastructure and load-testing key metrics. These dashboards are available as part of the Prometheus installation.

You can view this dashboard by accessing Grafana on port 3000 of your Konakart instance.

JMeter Exporter

The JMeter dashboard allows you to monitor your performance tests.

For example, run again the JMeter performance test described before and see the results in the JMeter dashboard:

Docker Container

The Docker dashboard allows you to see the resource consumption of your containers, including the Konakart application:

JMX Exporter

The Node dashboard allows you to see the OS-level Linux performance metrics of your instance:

At this point, you have a simple test environment for your Akamas optimizations.

Modeling a sample Java-based e-commerce application (Konakart)

This page provides code snippets for each Akamas construct by considering the Konakart, a Java-based e-commerce application (https://www.konakart.com/), as a reference and modeling it as a 2-tier application, with an application server and a database (MySQL) layers.

For simplicity's sake, the optimization use case is defined as follows:

Optimization scope: JVM parameters
Optimization goal: reduce the application memory footprint (i.e. heap max that can be allocated by the JVM)
Optimization constraints: no impact on service level (i.e. response times, throughput, and error rate have to be the same before/after the optimization)

System

This is the YAML file providing the system definition for the reference use case:

name: Konakart
description: Konakart e-commerce application

Component

Since the optimization scope only considers JVM parameters, only a java component needs to be modeled.

The following snippet defines a Konakart java component based on a java-openjdk-11 component type.

name: Konakart jvm
description: jvm layer of Konakart e-commerce
componentType: java-openjdk-11

A different optimization scope would have required a different modeling approach. For instance, a broader optimization scope including the Linux layers of both application and database and the database layer would have required 3 additional components: 2 distinct components (of the same component type) for the 2 Linux layers and 1 component for the database layer.

From a monitoring perspective, there are only 2 mandatory data sources: the JVM layer, which will provide the goal metric needed to evaluate the score (heap size), and the web application layer, which will provide the metrics needed to evaluate optimization constraints (response time, throughput and error rate). The web-application component is based on a particular component type that aims to ease the collection of end-users metrics and has no parameters attached. Generally speaking the definition of a component based on the web application component type can be handy every time an optimization foresees the execution of a performance test and it is required to evaluate the end-to-end metrics.

The following snippet defines a Konakart component based on a web-application component type.

name: Konakart
description: Web Application layer of Konakart e-commerce
componentType: Web Application

A more comprehensive approach to telemetries could include additional metrics and data sources to provide a better understanding of the system behavior. The example provided only focuses on the mandatory metrics and the components needed to model them.

Component Type

Here is a (simplified) component types definition for the reference use case.

name: java-openjdk-11
description: The component type of Java OpenJDK and Oracle HotSpot version 11

parameters:
  - name: jvm_maxHeapSize
    domain:
      type: integer
      domain: [16, 102400]
    defaultValue: 1024
    operators:
      FileConfigurator:
        confTemplate: -Xmx${value}M

  - name: jvm_gcType
    domain:
      type: categorical
      categories: [Serial, Parallel, ConcMarkSweep, G1]
    defaultValue: G1
    operators:
      FileConfigurator:
        confTemplate: -XX:+Use${value}GC

metrics:
  - name: jvm_heap_size
  - name: jvm_gc_time

name: Web Application
description: Component-type containing the metrics representing a web application.

parameters: []

metrics:
  - name: transactions_response_time
  - name: transactions_throughput
  - name: transactions_error_rate

These component types are included in the "Java" and "Web Application" Optimization Packs and are available in any Akamas installation.

Parameters

Here is a (simplified) definition of the java parameters related to the java component type used for the reference use case.

parameters:
  - name: jvm_maxHeapSize
    description: Maximum heap size
    unit: megabytes

  - name: jvm_gcType
    description: Type of the garbage collection algorithm
    unit: ""

These parameters are included in the "Java" and "Web Application" Optimization Packs and are available in any Akamas installation.

Metrics

Here is a (simplified) definition of the web-application metrics related to the web-application component type used for the reference use case.

metrics:
  - name: transactions_response_time
    description: The average transaction response time
    unit: milliseconds

  - name: transactions_throughput
    description: The number of transactions executed per second
    unit: transactions/s

 - name: transactions_error_rate
    description: The percentage of transactions flagged as error
    unit: percent

These parameters are included in the "Web Application" Optimization Pack and are available in any Akamas installation.

Optimizing a web application

In this study, Akamas will optimize a web application by tuning the JVM parameters. The workflow leverages NeoLoad’s load generator through the dedicated NeoLoad Web operator and the NeoLoad Web provider to gather the metrics.

Optimization setup

System

The following snippets contain the definition of the system composed by a JVM running the petstore web application.

System: _webapplication_

name: webapplication
description: Petstore server

Component: _jvm_

name: jvm
description: JVM underlying the petstore application
componentType: openjdk-11
properties:
  instance: "petstore"

Component: _webapp_

name: webapp
description: Petstore web-application
componentType: Web Application

Telemetry instance: _NeoLoadWeb_

provider: NeoLoadWeb
config:
  accountToken: NLW_TOKEN

Workflow

Here’s a workflow that creates a new configuration file by interpolating the tuned parameters in a template file, restarts the application to apply the parameters, and triggers the execution of a load test:

name: neoloadweb_wf
tasks:
  - name: Set Java parameters
    operator: FileConfigurator
    arguments:
      source:
        hostname: app.petstore.com
        username: ubuntu
        key: # ...
        path: /home/ubuntu/akamas/conf_template
      target:
        hostname: app.petstore.com
        username: ubuntu
        key: # ...
        path: /home/ubuntu/akams/jvm.conf

  - name: Restart JVM
    operator: Executor
    arguments:
      command: bash /home/ubuntu/akamas/configure_and_restart.sh
      host:
        # script location ...

  - name: run NeoLoadWeb load test
    operator: NeoLoadWeb
    arguments:
      accountToken: NLW_TOKEN
      projectFile:
        # projectfile location ...

Study

Here’s a study in which Akamas tries to minimize to minimize the Java memory consumption by acting only on the heap size and on the type of garbage collector.

The web application metrics are used in the constraints in order to ensure the configuration does not degrade the service performances (throughput, response time, and error rate) below the acceptance level.

name: optimize_webapp_memory
system: webapplication
workflow: neoloadweb_wf
goal:
  objective: minimize
  function:
    formula: heap
    variables:
      heap:
        metric: jvm.jvm_maxHeapSize

  constraints:
    - metric: webapp.transactions_throughput
      greaterThan: 300
    - metric: webapp.transactions_response_time_max
      lowerThan: 6500
    - metric: webapp.transactions_response_time
      lowerThan: 600
    - metric: webapp.transactions_error_rate
      lowerThan: 0.1

parametersSelection:
  - name: jvm.jvm_maxHeapSize
    domain: [16, 2000]
  - name: jvm.jvm_gcType
    categories: [-XX:+UseG1GC, -XX:+UseParallelGC, -XX:+UseConcMarkSweepGC, -XX:+UseSerialGC]

numberOfTrials: 3
steps:
  - name: baseline
    type: baseline
    values:
      jvm.jvm_maxHeapSize: 2000
      jvm.jvm_gcType: -XX:+UseG1GC

  - name: optimization
    type: optimize
    numberOfExperiments: 99
    maxFailedExperiments: 25

Optimizing a sample Java OpenJ9 application

In this example study we’ll tune the parameters of PageRank, one of the benchmarks available in the Renaissance suite, with the goal of minimizing its memory usage. Application monitoring is provided by Prometheus, leveraging a JMX exporter.

Environment setup

The test environment includes the following instances:

Akamas: instance running Akamas
PageRank: instance running the PageRank benchmark and the Prometheus monitoring service

Telemetry Infrastructure setup

To gather metrics about PageRank we will use a Prometheus and a JMX exporter. Here’s the scraper to add to the Prometheus configuration to extract the metrics from the exporter:

- job_name: jmx-exporter
  static_configs:
    - targets: ['pagerank.akamas.io:5556']
      labels:
        instance: jvm

Application and Test tool

To run and monitor the benchmark we’ll require on the PageRank instance:

The Renaissance jar
The JMX exporter agent, plus a configuration file to expose the required classes

Here’s the snippet of code to configure the instance as required for this guide:

mkdir renaissance; cd renaissance
wget -O renaissance.jar https://github.com/renaissance-benchmarks/renaissance/releases/download/v0.10.0/renaissance-gpl-0.10.0.jar
wget -O jmx_exporter.jar https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.14.0/jmx_prometheus_javaagent-0.14.0.jar
echo -e '--\nwhitelistObjectNames: ["java.lang:*"]' > conf.yaml

Optimization setup

In this section, we will guide you through the steps required to set up the optimization on Akamas.

If you have not installed the Eclipse OpenJ9 optimization pack yet, take a look at the optimization pack page Eclipse OpenJ9 to proceed with the installation.

System

System pagerank

Here’s the definition of the system we will use to group our components and telemetry-instances for this example:

name: pagerank
description: A system to tune the pagerank benchmark

To create the system run the following command:

akamas create system pagerank.yaml

Component jvm

We’ll use a component of type IBM J9 VM 8 to represent the JVM underlying the PageRank benchmark. To identify the JMX-related metrics in Prometheus the configuration requires the prometheus property for the telemetry service, detailed later in this guide.

Here’s the definition of the component:

name: jvm
componentType: java-ibm-j9vm-8
properties:
  prometheus:
    instance: jvm
    job: jmx-exporter

To create the component in the system run the following command:

akamas create component jvm.yaml pagerank

Workflow

The workflow used for this study consists of two main stages:

generate the configuration file containing the tested OpenJ9 parameters
run the execution using previously written parameters

Here’s the definition of the workflow:

name: run-pagerank
tasks:
  - name: Configure parameters
    operator: FileConfigurator
    arguments:
      source:
        hostname: pagerank.akamas.io
        username: ubuntu
        path: /home/ubuntu/renaissance/j9_opts.template
        key: key
      target:
        hostname: pagerank.akamas.io
        username: ubuntu
        path: /home/ubuntu/renaissance/j9_opts
        key: key

  - name: Run benchmark
    operator: Executor
    arguments:
      command: "cd renaissance; java -javaagent:./jmx_exporter.jar=5556:conf.yaml $(cat j9_opts) -jar renaissance.jar -r 2 page-rank"
      host:
        hostname: pagerank.akamas.io
        username: ubuntu
        key: key

Where the configuration template is j9_opts.template is defined as follows:

 ${jvm.j9vm_gcPolicy} ${jvm.j9vm_maxHeapSize} ${jvm.j9vm_newSpaceFixed} ${jvm.j9vm_minFreeHeap} ${jvm.j9vm_maxFreeHeap} ${jvm.j9vm_gcThreads}

To create the workflow run the following command:

akamas create workflow workflow.yaml

Telemetry

The following is the definition of the telemetry instance that fetches metrics from the Prometheus service:

provider: Prometheus
config:
  address: pagerank.akamas.io
  port: 9090

To create the telemetry instance in the system run the following command:

akamas create telemetry-instance prometheus.yaml pagerank

This telemetry instance will be able to bind the fetched metrics to the related jvm component thanks to the prometheus attribute we previously added in its definition.

Study

The goal of this study is to find a JVM configuration that minimizes the peak memory used by the benchmark.

The optimized parameters are the maximum heap size, the garbage collector used and several other parameters managing the new and old heap areas. We also specify a constraint stating that the GC regions can’t exceed the total heap available, to avoid experimenting with parameter configurations that can’t start in the first place.

Here’s the definition of the study:

name: Optimize PageRank
description: Tweaking the Eclipse OpenJ9 parameters to optimize the page-rank benchmark.
system: pagerank
workflow: run-pagerank

goal:
  objective: minimize
  function:
    formula: max_memory
    variables:
      max_memory:
        metric: jvm.jvm_memory_used
        aggregation: max

parametersSelection:
  - name: jvm.j9vm_gcPolicy
  - name: jvm.j9vm_maxHeapSize
    domain: [1250, 2000]
  - name: jvm.j9vm_newSpaceFixed
    domain: [350, 2000]
  - name: jvm.j9vm_minFreeHeap
  - name: jvm.j9vm_maxFreeHeap
  - name: jvm.j9vm_gcThreads

parameterConstraints:
  - name: Max heap must always be greater than new size
    formula: jvm.j9vm_maxHeapSize > jvm.j9vm_newSpaceFixed
  - name: Max free always greater than min free
    formula: jvm.j9vm_minFreeHeap + 0.05 < jvm.j9vm_maxFreeHeap

steps:
  - name: baseline
    type: baseline
    values:
      jvm.jvm_gcType: gencon
      jvm.jvm_maxHeapSize: 2000

  - name: optimize
    type: optimize
    numberOfExperiments: 30

To create and run the study execute the following commands:

akamas create study study.yaml
akamas start study 'Optimize PageRank'

Optimizing a sample Java OpenJDK application

Environment setup

The test environment includes the following instances:

Akamas: instance running Akamas
PageRank: instance running the PageRank benchmark and the Prometheus monitoring service

Telemetry Infrastructure setup

To gather metrics about PageRank we will use a Prometheus and a JMX exporter. Here’s the scraper to add to the Prometheus configuration to extract the metrics from the exporter:

- job_name: jmx-exporter
  static_configs:
    - targets: ['pagerank.akamas.io:5556']
      labels:
      instance: jvm

Application and Test tool

To run and monitor the benchmark we’ll require on the PageRank instance:

The Renaissance jar
The JMX exporter agent, plus a configuration file to expose the required classes

Here’s the snippet of code to configure the instance as required for this guide:

mkdir renaissance; cd renaissance
wget -O renaissance.jar https://github.com/renaissance-benchmarks/renaissance/releases/download/v0.10.0/renaissance-gpl-0.10.0.jar
wget -O jmx_exporter.jar https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.14.0/jmx_prometheus_javaagent-0.14.0.jar
echo -e '--\nwhitelistObjectNames: ["java.lang:*"]' > conf.yaml

Optimization setup

In this section, we will guide you through the steps required to set up the optimization on Akamas.

If you have not installed the Java OpenJDK optimization pack yet, take a look at the optimization pack page Java OpenJDK to proceed with the installation.

System

System pagerank

Here’s the definition of the system we will use to group our components and telemetry instances for this example:

name: pagerank
description: A system to tune the pagerank benchmark

To create the system run the following command:

akamas create system pagerank.yaml

Component jvm

We’ll use a component of type Java OpenJDK 11 to represent the JVM underlying the PageRank benchmark. To identify the JMX-related metrics in Prometheus the configuration requires the prometheus property for the telemetry service, detailed later in this guide.

Here’s the definition of the component:

name: jvm
componentType: openjdk-11
properties:
  prometheus:
    instance: jvm
    job: jmx-exporter

To create the component in the system run the following command:

akamas create component jvm.yaml pagerank

Workflow

The workflow used for this study consists of two main stages:

generate the configuration file containing the tested Java parameters
run the execution using previously written parameters

Here’s the definition of the workflow:

name: run-pagerank
tasks:
  - name: Configure parameters
    operator: FileConfigurator
    arguments:
      source:
        hostname: pagerank.akamas.io
        username: ubuntu
        path: /home/ubuntu/renaissance/java_opts.template
        key: key
      target:
        hostname: pagerank.akamas.io
        username: ubuntu
        path: /home/ubuntu/renaissance/java_opts
        key: key

  - name: Run benchmark
    operator: Executor
    arguments:
      command: "cd renaissance; java -javaagent:./jmx_exporter.jar=5556:conf.yaml $(cat java_opts) -jar renaissance.jar -r 2 page-rank"
      host:
        hostname: pagerank.akamas.io
        username: ubuntu
        key: key

Where the configuration template is java_opts.template is defined as follows:

 ${jvm.jvm_gcType} ${jvm.jvm_maxHeapSize} ${jvm.jvm_newSize} ${jvm.jvm_survivorRatio} ${jvm.jvm_maxTenuringThreshold}

To create the workflow run the following command:

akamas create workflow workflow.yaml

Telemetry

The following is the definition of the telemetry instance that fetches metrics from the Prometheus service:

provider: Prometheus
config:
  address: pagerank.akamas.io
  port: 9090

To create the telemetry instance in the system run the following command:

akamas create telemetry-instance prometheus.yaml pagerank

This telemetry instance will be able to bind the fetched metrics to the related jvm component thanks to the prometheus attribute we previously added in its definition.

Study

The goal of this study is to find a JVM configuration that minimizes the peak memory used by the benchmark.

Here’s the definition of the study:

name: Optimize PageRank
description: Tweaking the JVM parameters to optimize the page-rank benchmark.
system: pagerank
workflow: run-pagerank

goal:
  objective: minimize
  function:
    formula: max_memory
    variables:
      max_memory:
        metric: jvm.jvm_memory_used
        aggregation: max

parametersSelection:
  - name: jvm.jvm_gcType
  - name: jvm.jvm_maxHeapSize
    domain: [1250, 2000]
  - name: jvm.jvm_newSize
    domain: [350, 2000]
  - name: jvm.jvm_survivorRatio
  - name: jvm.jvm_maxTenuringThreshold

parameterConstraints:
  - name: Max heap must always be greater than new size
    formula: jvm.jvm_maxHeapSize > jvm.jvm_newSize

steps:
  - name: baseline
    type: baseline
    values:
      jvm.jvm_gcType: G1
      jvm.jvm_maxHeapSize: 2000

  - name: optimize
    type: optimize
    numberOfExperiments: 30

To create and run the study execute the following commands:

akamas create study study.yaml
akamas start study 'Optimize PageRank'

Optimizing a sample Linux system

In this study, Akamas is tasked with the optimization of Linux1, a Linux-based system (Ubuntu). The study's goal is to maximize the throughput of the computations of the Sysbench CPU benchmark.

Sysbench is a suite of benchmarks for CPU, file system, memory, threads, etc… typically used for testing the performance of databases.

System1 comes with a Node Exporter that collects system metrics that Akamas consumes through a Prometheus provider. Concerning Sysbench metrics, The study uses the CSV provider to make them available to Akamas.

The study uses Sysbench to execute a performance test against System1.

Telemetry

Setup a Prometheus and a Node Exporter to monitor the System
Install the Prometheus provider
Create a provider instance:

provider: "Prometheus"
config:
  address: "linux1" # address of the Prometheus of system1
  port: 9090 # port of the Prometheus of system1
  component: "linux1-linux"

4. Install the CSV File provider.

5. Create a provider instance:

provider: "CSV"
config:
 address: "linux1"
 authType: "password"
 username: "ubuntu"
 auth: "[INSERT PASSWORD HERE]"
 protocol: scp
 remoteFilePattern: "/home/ubuntu/benchmark_log.csv" # the remote path of the csv with the metrics of the benchmark
 componentColumn: "component" # which column of the csv should contain the name of the component
 csvFormat: "horizontal"
metrics:
- metric: "throughput"
  datasourceMetric: "events_per_second"

Workflow

The study uses a four-task workflow to test System1 with a new configuration:

Task Configure OS, which leverages the LinuxConfigurator operator to apply a new set of Linux configuration parameters
Task Start benchmark, which leverages the Executor operator to launch the benchmark

The following YAML file represents the complete workflow definition:

name: "workflow for linux 1"
tasks
- name: "Configure OS"
  operator: "LinuxConfigurator"
  arguments:
    component: linux1-linux

- name: "Start benchmark"
  operator: "Executor"
  arguments:
    command: "bash /home/ubuntu/benchmark.sh"
    host:
      hostname: "linux1"
      username: "ubuntu"
      password: "[INSERT_HERE_PASSWORD]"

System

Within Akamas, System1 is modeled by a system of two components:

system1-linux, which represents the actual Linux system with its metrics and parameters and is of type Ubuntu 16.04
system1-bench, which represents the Sysbench with its metrics and is of type Sysbench

The following YAML file represents the definition of the Sysbench component type:

name: "Sysbench"
description: "A component-type for Sysbench"
metrics:
- "throughput" # only one metric

Study

Goal: minimize the throughput of the benchmark
Windowing: take the default (compute the score for the entire duration of a trial)
Parameters selection: select only CPU scheduling parameters
Metrics selection: select only the throughput of the benchmark
Trials: 3
Steps: one baseline and one optimize

The following YAML file represents the definition of the study:

system: "system for linux1"
workflow: "workflow ofr linux1"
name: "linux optimization with sysbench"
description: "Optimizing an Ubuntu instance with a CPU intensive benchmark: sysbench"
goal:
  objective: minimize
  function:
    formula: "linux1-benchmark.throughput"
metricsSelection:
  - "linux1-benchmark.throughput"
parametersSelection:
  - name: "linux1-linux.os_cpuSchedMinGranularity"
  - name: "linux1-linux.os_cpuSchedWakeupGranularity"
  - name: "linux1-linux.os_CPUSchedMigrationCost"
  - name: "linux1-linux.os_CPUSchedChildRunsFirst"
  - name: "linux1-linux.os_CPUSchedLatency"
  - name: "linux1-linux.os_CPUSchedAutogroupEnabled"
  - name: "linux1-linux.os_CPUSchedNrMigrate"
numberOfTrials: 3
steps:
  - name: "baseline"
    type: "baseline"
    values:
      linux1-linux.os_cpuSchedMinGranularity: 2250000
      linux1-linux.os_cpuSchedWakeupGranularity: 3000000
      linux1-linux.os_CPUSchedMigrationCost: 500000
      linux1-linux.os_CPUSchedChildRunsFirst: 0
      linux1-linux.os_CPUSchedLatency: 18000000
      linux1-linux.os_CPUSchedAutogroupEnabled: 1
      linux1-linux.os_CPUSchedNrMigrate: 32
  - name: "optimization"
    type: "optimize"
    numberOfExperiments: 99
    maxFailedExperiments: 25

Optimizing a MongoDB server instance

In this example study, we are going to optimize a MongoDB single server instance by setting the performance goal of maximizing the throughput of operations toward the database.

Concerning performance tests, we are going to employ YCSB, a popular benchmark created by Yahoo for testing various NoSQL databases.

To extract MongoDB metrics, we are going to spin up a Prometheus instance and we are going to use the MongoDB Prometheus exporter.

Environment setup

Hosts and ports

You can use a single host for both MongoDB and YCSB but, in the following example, we replicate a common pattern in performance engineering by externalizing the load injection tool into a separate instance to avoid performance issues and measurement noise.

mongo.mycompany.com for the MongoDB server instance (port 27017) and the MongoDB Prometheus exporter (port 9100)
ycsb.mycompany.com for YCSB and Prometheus (port 9090)

Notice: in the following, the assumption is to be working with Linux hosts.

Prometheus and exporters

To correctly extract MongoDB metrics we can leverage a solution like Prometheus, paired with the MongoDB Prometheus exporter. To do so we would need to:

Install the MongoDB Prometheus exporter on mongo.mycompany.com
Install and configure Prometheus on ycsb.mycompany.com

Install the MongoDB Prometheus exporter

You can check how to install the exporter here. On Ubuntu you can use the system package manager:

apt-get install prometheus-mongodb-provider

By default, the exporter will expose MongoDB metrics on port 9100

Install and configure Prometheus

You can check how to configure Prometheus here; by default, it will run on port 9090.

The following YAML fileprometheus.yaml is an example of the Prometheus configuration that you can use.

global:
  scrape_interval:     15s  # Set the scrape interval to every 15 seconds. The default is every 1 minute.
  evaluation_interval: 15s  # Evaluate rules every 15 seconds. The default is every 1 minute.
  scrape_timeout: 15s

scrape_configs:
  # Mongo exporter
  - job_name: 'mongo_exporter'
    scrape_interval: 15s
    static_configs:
      - targets: ['mongo.mycompany.com:9001']
    relabel_configs:
      - source_labels: ["__address__"]
        regex: ".*"
        target_label: "instance"
        # this replacement should match the name of the akamas component of MongoDB
        replacement: "mongo"

System

Since we are interested in tuning MongoDB by acting on its configuration parameters and by observing its throughput measured using YCSB, we need two components:

A mongo component which represents the MongoDB instance all the configuration parameters and maps directly to mongo.mycompany.com
A ycsb component which represents YCSB, in particular, it "houses" the metrics of the performance test, which can be used as parts of the goal of a study. This component maps directly to ucsb.mycompany.com

Here’s the definition of our system (system.yaml):

name: mongodb system
description: reference system

Here’s the definition of our mongo component (mongo.yaml):

name: mongo
description: The MongoDB server
componentType: MongoDB-4 # MongoDB version 4
properties:
  hostname: mongo.mycompany.com
  prometheus:
    # we are telling akamas that this component should be monitored using prometheus, and each data-point with a label instance=mongo should be mapped to this component
    instance: mongo
  sshPort: 22
  username: myusername
  key: ... RSA KEY ...

Here’s the definition of our ycsb component (ycsb.yaml):

name: ycsb
description: The YCSB client
componentType: YCSB
properties:
  hostname: ycsb.mycompany.io
  instance: ycsb
  sshPort: 22
  username: myusername
  key: ... RSA KEY ...

We can create the system by running:

akamas create system system.yaml

We can then create the components by running:

akamas create component mongo.yaml "mongodb system"
akamas create component ycsb.yaml "mongodb system"

Workflow

As described in the MongoDB optimization pack page, a workflow for optimizing MongoDB can be structured in three main steps:

Configure MongoDB with the configuration parameters decided by Akamas
Test the performance of the application
Prepare test results

Notice: here we have omitted the Cleanup step because it is not necessary for the context of this study.

Configure MongoDB

We can define a workflow task that uses the FileConfigurator operator to interpolate Akamas parameters into a MongoDB configuration script:

name: configure mongo
operator: FileConfigurator
arguments:
  sourcePath: /home/myusername/mongo/templates/mongo_launcher.sh.templ # MongoDB configuration script with placeholders for Akamas parameters
  targetPath: /home/myusername/mongo/launcher.sh # configuration script with interpolated Akamas parameters
  component: mongo # mongo should match the component of your system that represents your MongoDB instance

Here’s an example of a templated configuration script for MongoDB:

#!/bin/sh

cd "$(dirname "$0")" || exit

CACHESIZE=${mongo.mongodb_cache_size}
SYNCDELAY=${mongo.mongodb_syncdelay}
EVICTION_DIRTY_TRIGGER=${mongo.mongodb_eviction_dirty_trigger}
EVICTION_DIRTY_TARGET=${mongo.mongodb_eviction_dirty_target}
EVICTION_THREADS_MIN=${mongo.mongodb_eviction_threads_min}
EVICTION_THREADS_MAX=${mongo.mongodb_eviction_threads_max}
EVICTION_TRIGGER=${mongo.mongodb_eviction_trigger}
EVICTION_TARGET=${mongo.mongodb_eviction_target}
USE_NOATIME=${mongo.mongodb_datafs_use_noatime}

# Here we have to remount the disk mongodb uses for data, to take advantage of the USE_NOATIME parameter

sudo service mongod stop
sudo umount /mnt/mongodb
if [ "$USE_NOATIME" = true ]; then
        sudo mount /dev/nvme0n1 /mnt/mongodb -o noatime
else
        sudo mount /dev/nvme0n1 /mnt/mongodb
fi
sudo service mongod start

# flush logs
echo -n | sudo tee /mnt/mongodb/log/mongod.log
sudo service mongod restart

until grep -q "waiting for connections on port 27017" /mnt/mongodb/log/mongod.log
do
        echo "waiting MongoDB..."
        sleep 60
done

sleep 5
sudo service prometheus-mongodb-exporter restart
# set knobs
mongo --quiet --eval "db.adminCommand({setParameter:1, 'wiredTigerEngineRuntimeConfig': 'cache_size=${CACHESIZE}m, eviction=(threads_min=$EVICTION_THREADS_MIN,threads_max=$EVICTION_THREADS_MAX), eviction_dirty_trigger=$EVICTION_DIRTY_TRIGGER, eviction_dirty_target=$EVICTION_DIRTY_TARGET', eviction_trigger=$EVICTION_TRIGGER, eviction_target=$EVICTION_TARGET})"
mongo --quiet --eval "db = db.getSiblingDB('admin'); db.runCommand({ setParameter : 1, syncdelay: $SYNCDELAY})"

sleep 3Shell

We can add a workflow task that actually executes the MongoDB configuration script produced by the FileConfigurator:

name: launch mongo
operator: Executor
arguments:
  command: bash /home/myusername/mongo/launcher.sh
  component: mongo  # we can take all the ssh connection parameters from the properties of the mongo component

In each task, we leveraged the reference to the "mongo" component to fetch from its properties all the authentication info to ssh into the right machine e let the FileConfigurator and Executor do their work

Test the performance of the application

We can define a workflow task that uses the Executor operator to launch the YCSB benchmark against MongoDB:

name: launch ycsb
operator: Executor
arguments:
  command: bash /home/myusername/ycsb/launch_load.sh
  component: ycsb # we can take all the ssh connection parameters from the properties of the ycsb component

Here’s an example of a launch script for YCSB:

#!/bin/bash

MONGODB_SERVER_IP="mongo.mycompany.com"
RECORDCOUNT=30000000
RUN_THREADS=10
LOAD_THREADS=10
DURATION=1800 # 30 minutes
WORKLOAD="a"

cd "$(dirname "$0")" || exit

# here we use the db_records file to check if we have already loaded the db with data
# if not we run a load script
db_records=$(cat db_records)
if [ "$RECORDCOUNT" != "$db_records" ]; then
        bash scripts/create_db_mongo.sh ${MONGODB_SERVER_IP} "$RECORDCOUNT" "$LOAD_THREADS" "$WORKLOAD"
        echo "$RECORDCOUNT" > db_records
fi

cd /home/myuser/ycsb-0.15.0 || exit
# launch task in background
./bin/ycsb run mongodb-async -s -P workloads/workload"$WORKLOAD" -threads "$RUN_THREADS"  -p recordcount="$RECORDCOUNT" -p operationcount=0 -p maxexecutiontime="$DURATION" -p mongodb.url=mongodb://"$MONGODB_SERVER_IP":27017 &> /home/myuser/ycsb/outputRun.txt &
PID=$!

while kill -0 "$PID" >/dev/null 2>&1; do
        echo running

        if grep -q "java.net.ConnectException: Connection refused (Connection refused)" /home/myuser/ycsb/outputRun.txt; then
                echo "No connection, killing time!"
                echo -n > /home/myuser/ycsb/outputRun.txt
                ps -ef | grep -i com.yahoo.ycsb.Client | awk '{print $2}' | xargs -I{} kill -9 {}
                echo "Let's wait sometime... maybe Mongo is recovering data??"
                sleep 900
		            exit 1
        fi

        if grep -Fxq "Could not create a connection to the server" /home/myuser/ycsb/outputRun.txt; then
                echo "Unable to connect to server!"
                kill -9 ${PID}
                rm /home/myuser/ycsb/outputRun.txt
                rm /home/myuser/ycsb/db_records
        else
                sleep 10
        fi
done

Prepare test results

We can define a workflow task that launches a script that parses the YCSB results into a CSV file (Akamas will process the CSV file and then extract performance test metrics):

name: parse ycsb results
operator: Executor
arguments:
  command: python /home/myusername/ycsb/parser.py
  component: ycsb # we can take all the ssh connection parameters from the properties of the ycsb component

Complete workflow

By putting together all the tasks defined above we come up with the following workflow definition (workflow.yaml):

name: mongo workflow
tasks:
  - name: configure mongo
    operator: FileConfigurator
    arguments:
      sourcePath: /home/myusername/mongo/templates/mongo_launcher.sh.templ # MongoDB configuration script with placeholders for Akamas parameters
      targetPath: /home/myusername/mongo/launcher.sh # configuration script with interpolated Akamas parameters
      component: mongo # we can take all the ssh connection parameters from the properties of the mongo component

  - name: launch mongo
    operator: Executor
    arguments:
      command: bash /home/myusername/mongo/launcher.sh
      component: mongo # mongo should match the component of your system that represents your MongoDB instance

  - name: launch ycsb
    operator: Executor
    arguments:
      command: bash /home/myusername/ycsb/launch_load.sh
      component: ycsb # we can take all the ssh connection parameters from the properties of the ycsb component

  - name: parse ycsb results
    operator: Executor
    arguments:
      command: python /home/myuser/ycsb/parser.py
      component: ycsb # we can take all the ssh connection parameters from the properties of the ycsb component

We can create the workflow by running:

akamas create workflow workflow.yaml

Telemetry

Prometheus

Since we are employing Prometheus to extract MongoDB metrics, we can leverage the Prometheus provider to start ingesting data-points into Akamas. To use the Prometheus provider we need to define a telemetry-instance (prom.yaml):

provider: Prometheus # we are using Prometheus
config:
  address: ycsb.mycompany.com # address of Prometheus
  port: 9090

metrics:
  - metric: mongodb_connections_current
    datasourceMetric: mongodb_connections{instance="$INSTANCE$"}
    labels:
      - state
  - metric: mongodb_heap_used
    datasourceMetric: mongodb_extra_info_heap_usage_bytes{instance="$INSTANCE$"}
  - metric: mongodb_page_faults_total
    datasourceMetric: rate(mongodb_extra_info_page_faults_total{instance="$INSTANCE$"}[$DURATION$])
  - metric: mongodb_global_lock_current_queue
    datasourceMetric: mongodb_global_lock_current_queue{instance="$INSTANCE$"}
    labels:
      - type
  - metric: mongodb_mem_used
    datasourceMetric: mongodb_memory{instance="$INSTANCE$"}
    labels:
      - type
  - metric: mongodb_documents_inserted
    datasourceMetric: rate(mongodb_metrics_document_total{instance="$INSTANCE$", state="inserted"}[$DURATION$])
  - metric: mongodb_documents_updated
    datasourceMetric: rate(mongodb_metrics_document_total{instance="$INSTANCE$", state="updated"}[$DURATION$])
  - metric: mongodb_documents_deleted
    datasourceMetric: rate(mongodb_metrics_document_total{instance="$INSTANCE$", state="deleted"}[$DURATION$])
  - metric: mongodb_documents_returned
    datasourceMetric: rate(mongodb_metrics_document_total{instance="$INSTANCE$", state="returned"}[$DURATION$])

Notice: the fact that the instance definition contains the specification of Prometheus queries to map to Akamas metrics is temporary. In the next release, these queries will be embedded in Akamas.

By default, $DURATION$ will be replaced with 30s. You can override it to your needs by setting a duration property under prometheus within your mongo component

We can now create the telemetry instance and attach it to our system by running:

akamas create telemetry-instance prom.yaml "mongodb system"

CSV

Beyond MongoDB metrics, it is important to ingest into Akamas metrics related to the performance tests run with YCSB, in particular the throughput of operations. To achieve this we can leverage the CSV Provider which parses a CSV file to extract relevant metrics. The CSV file we are going to parse with the help of the provider is the one produced in the last task of the workflow of the study.

To start using the provider, we need to define a telemetry instance (csv.yaml):

provider: CSV
config:
  protocol: scp
  address: ycsb.mycompany.com
  username: myuser
  authType: key
  auth: ... RSA KEY ...
  remoteFilePattern: /home/ubuntu/ycsb/output.csv
  csvFormat: horizontal
  componentColumn: Component
  timestampColumn: timestamp
  timestampFormat: yyyy-MM-dd HH:mm:ss

metrics:
 # here we put which metric found in the csv provider should be mapped to which akamas metrics
 # we are only interested in the throughput, but you can add other metrics if you want
  - metric: throughput
    datasourceMetric: throughput
 ....

We can create the telemetry instance and attach it to our system by running:

akamas create telemetry-instance csv.yaml "mongodb system"

Study

Our goal for optimizing MongoDB is to maximize its throughput, measured using a performance test executed with YCSB.

Goal

Here’s the definition of the goal of our study, to maximize the throughput:

goal:
  objective: maximize
  function:
    formula: ycsb.throughput

Windowing

It is important that the throughput of our MongoDB instance should be considered valid only when it is stable, for this reason, we can use the stability windowing policy. This policy identifies a period of time with at least 100 samples with a standard deviation lower than 200 when the application throughput is maximum.

windowing:
  type: stability
  stability:
    # measure the goal function where the throughput has stdDev <= 200 for 100 consecutive data points
    metric: throughput
    labels:
      componentName: ycsb
    width: 100
    maxStdDev: 200
    # take only the temporal window when the throughput is maximum
    when:
      metric: throughput
      is: max
      labels:
        componentName: ycsb

Parameters to optimize

We are going to optimize every MongoDB parameter:

parametersSelection:
  - name: mongo.mongodb_syncdelay
  - name: mongo.mongodb_eviction_dirty_trigger
  - name: mongo.mongodb_eviction_dirty_target
  - name: mongo.mongodb_eviction_target
  - name: mongo.mongodb_eviction_trigger
  - name: mongo.mongodb_eviction_threads_min
  - name: mongo.mongodb_eviction_threads_max
  - name: mongo.mongodb_cache_size
    # here we have changed the domain of the cache size since we suppose our mongo.mycompany.com host has 32gb of RAM, you should adapt to your own instance
    domain: [500, 32000]

Steps

We are going to add to our study two steps:

A baseline step, in which we set a cache size of 1GB and use the default values for all the other MongoDB parameters
An optimize step, in which we perform 100 experiments to generate the best configuration for MongoDB

Here’s what these steps look like:

steps:
- name: baseline
  type: baseline
  values:
    mongo.mongodb_cache_size: 1024
  renderParameters:
  # use also all the other MongoDB parameters at their default value
  - mongo.*
- name: optimize mongo
  type: optimize
  numberOfExperiments: 100

Complete study

Here’s the study definition (study.yaml) for optimizing MongoDB:

name: study to tune MongoDB
description: study to tune MongoDB with YCSB perf test
system: mongodb system
workflow: mongo workflow
# Goal
goal:
  objective: maximize
  function:
    formula: ycsb.throughput
# Windowing
windowing:
  type: stability
  stability:
    metric: throughput
    labels:
      componentName: ycsb
    width: 100
    maxStdDev: 200
    when:
      metric: throughput
      is: max
      labels:
        componentName: ycsb
# parameters selection
parametersSelection:
  - name: mongo.mongodb_syncdelay
  - name: mongo.mongodb_eviction_dirty_trigger
  - name: mongo.mongodb_eviction_dirty_target
  - name: mongo.mongodb_eviction_target
  - name: mongo.mongodb_eviction_trigger
  - name: mongo.mongodb_eviction_threads_min
  - name: mongo.mongodb_eviction_threads_max
  - name: mongo.mongodb_cache_size
    # here we have changed the domain of the cache size since we suppose our mongo.mycompany.com host has 32gb of RAM
    domain: [500, 32000]
  - name: mongo.mongodb_datafs_use_noatime
parameterConstraints:
- name: c1
  formula: mongo.mongodb_eviction_threads_min <= mongo.mongodb_eviction_threads_max
- name: c2
  formula: mongodb_eviction_dirty_target <= mongodb_eviction_target
- name: c3
  formula: mongodb_eviction_dirty_trigger <= mongodb_eviction_trigger
# steps
steps:
- name: baseline
  type: baseline
  values:
    mongo.mongodb_cache_size: 1024
  renderParameters:
  # use also all the other MongoDB parameters at their default value
  - mongo.*
- name: optimize mongo
  type: optimize
  numberOfExperiments: 100

You can create the study by running:

akamas create study study.yaml

You can then start it by running:

akamas start study "study to tune MongoDB"

Optimizing a Kubernetes application

In this example, we’ll optimize Online Boutique, a demo e-commerce application running on microservices, by tuning the resources allocated to a selection of pods. This is a common use case where we want to minimize the cost associated with running an application without impacting the SLO.

Notice: all the required artifacts are published in this public repository.

Environment setup

The test environment includes the following instances:

Akamas: the instance running Akamas.
Cluster: an instance hosting a Minikube cluster.

You can configure the Minikube cluster using the scripts provided in the public repository by running the command

kubectl apply -f kubernetes-online-boutique/kube/prometheus.yaml

Telemetry Infrastructure setup

To gather metrics about the application we will use Prometheus. It will be automatically configured by applying the artifacts in the repository with the following command:

kubectl apply -f kubernetes-online-boutique/kube/

Application and Test tool

The targeted system is Online Boutique, a microservice-based demo application. In the same namespace, a deployment running the load generator will stress the boutique and forward the performance metrics to Prometheus.

To configure the application and the load generator on your (Minikube) cluster, apply the definitions provided in the public repository by running the following command:

kubectl apply -f kubernetes-online-boutique/kube/

Optimization setup

In this section, we will guide you through the steps required to set up the optimization on Akamas.

If you have not installed the Kubernetes optimization pack yet, take a look at the Kubernetes optimization pack page to proceed with the installation.

Notice: the artifacts to create the Akamas entities can be found in the public repository, under the akamas directory.

System

System Online Boutique

Here’s the definition of the system containing our components and telemetry-instances for this example:

name: Online Boutique
description: The Online Boutique by Google

To create the system run the following command:

akamas create component application.yaml 'Online Boutique'

Component online_boutique

We’ll use a component of type WebApplication to represent at a high level the Online Boutique application. To identify the related Prometheus metrics the configuration requires the prometheus property for the telemetry service, detailed later in this guide.

Here’s the definition of the component:

name: online_boutique
description: The Online Boutique application
componentType: Web Application
properties:
  prometheus:
    instance: .*
    job: .*
    namespace: akamas-demo
    container: server|redis

To create the component in the system run the following command:

akamas create component application.yaml 'Online Boutique'

Component frontend and productcatalogservice

The public repository contains the definition of all the services that compose Online Boutique. In this guide, for the sake of simplicity, we’ll only tune the resources of the containers in the frontend and the product-catalog pods, defined as components of type Kubernetes Container.

Here’s their definition:

name: frontend
description: The frontend of the online boutique by Google
componentType: Kubernetes Container
properties:
  prometheus:
    job: .*
    instance: .*
    name: .*
    pod: ak-frontend.*
    container: server

name: productcatalogservice
description: The productcatalogservice of the online boutique by Google
componentType: Kubernetes Container
properties:
  prometheus:
    job: .*
    instance: .*
    name: .*
    pod: ak-productcatalogservice.*
    container: server

To create the component in the system run the following command:

akamas create component frontend.yaml 'Online Boutique'
akamas create component productcatalogservice.yaml 'Online Boutique'

Workflow

The workflow is divided into the following steps:

Create the YAML artifacts with the updated resource limits for the tuned containers.
Apply the updated definitions to the cluster.
Wait for the rollout to complete.
Start the load generator
Let the test run for a fixed amount of time
Stop the test and reset the load generator

The following is the definition of the workflow:

name: boutique
tasks:
  - name: Configure Online Boutique
    operator: FileConfigurator
    arguments:
      source:
        hostname: CLUSTER_INSTANCE_IP
        username: akamas
        password: akamas
        path: boutique.yaml.templ
      target:
        hostname: cluster
        username: akamas
        password: akamas
        path: boutique.yaml

  - name: Apply new configuration to the Online Boutique
    operator: Executor
    arguments:
      host:
        hostname: CLUSTER_INSTANCE_IP
        username: akamas
        password: akamas
      command: kubectl apply -f boutique.yaml

  - name: Check Online Boutique is up
    operator: Executor
    arguments:
      retries: 0
      host:
        hostname: CLUSTER_INSTANCE_IP
        username: akamas
        password: akamas
      command: kubectl wait --for=condition=available deploy/ak-frontend deploy/ak-productcatalogservice --timeout=30s

  - name: Start Locust Test
    operator: Executor
    arguments:
      host:
        hostname: CLUSTER_INSTANCE_IP
        username: akamas
        password: akamas
      command: bash load-test.sh start

  - name: Test
    operator: Sleep
    arguments:
      seconds: 150

  - name: Stop Locust test
    operator: Executor
    arguments:
      host:
        hostname: CLUSTER_INSTANCE_IP
        username: akamas
        password: akamas
      command: bash load-test.sh stop

To better illustrate the process, here is a snippet of the template file used to update the resource limits for the frontend deployment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ak-frontend
  namespace: akamas-demo
spec:
  selector:
    matchLabels:
      app: ak-frontend
  template:
    metadata:
      labels:
        app: ak-frontend
    # other definitions...
    spec:
      containers:
        - name: server
          image: gcr.io/google-samples/microservices-demo/frontend:v0.2.2
          # other definitions...
          resources:
            requests:
              cpu: ${frontend.cpu_limit}
              memory: ${frontend.memory_limit}
            limits:
              cpu: ${frontend.cpu_limit}
              memory: ${frontend.memory_limit}
# other definitions...

The following are respectively the script to start and stop the load generator:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ak-frontend
  namespace: akamas-demo
spec:
  selector:
    matchLabels:
      app: ak-frontend
  replicas: 1
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: ak-frontend
    spec:
      serviceAccountName: default
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: akamas/node
                    operator: In
                    values:
                      - akamas
      containers:
        - name: server
          image: gcr.io/google-samples/microservices-demo/frontend:v0.2.2
          ports:
            - containerPort: 8080
          readinessProbe:
            initialDelaySeconds: 10
            httpGet:
              path: "/_healthz"
              port: 8080
              httpHeaders:
                - name: "Cookie"
                  value: "shop_session-id=x-readiness-probe"
          livenessProbe:
            initialDelaySeconds: 10
            httpGet:
              path: "/_healthz"
              port: 8080
              httpHeaders:
                - name: "Cookie"
                  value: "shop_session-id=x-liveness-probe"
          env:
            - name: PORT
              value: "8080"
            - name: PRODUCT_CATALOG_SERVICE_ADDR
              value: "ak-productcatalogservice:3550"
            - name: CURRENCY_SERVICE_ADDR
              value: "ak-currencyservice:7000"
            - name: CART_SERVICE_ADDR
              value: "ak-cartservice:7070"
            - name: RECOMMENDATION_SERVICE_ADDR
              value: "ak-recommendationservice:8080"
            - name: SHIPPING_SERVICE_ADDR
              value: "ak-shippingservice:50051"
            - name: CHECKOUT_SERVICE_ADDR
              value: "ak-checkoutservice:5050"
            - name: AD_SERVICE_ADDR
              value: "ak-adservice:9555"
            - name: ENV_PLATFORM
              value: "aws"
            - name: DISABLE_TRACING
              value: "1"
            - name: DISABLE_PROFILER
              value: "1"
          resources:
            requests:
              cpu: ${frontend.cpu_limit}
              memory: ${frontend.memory_limit}
            limits:
              cpu: ${frontend.cpu_limit}
              memory: ${frontend.memory_limit}

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ak-productcatalogservice
  namespace: akamas-demo
spec:
  selector:
    matchLabels:
      app: ak-productcatalogservice
  replicas: 1
  template:
    metadata:
      labels:
        app: ak-productcatalogservice
    spec:
      serviceAccountName: default
      terminationGracePeriodSeconds: 5
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: akamas/node
                    operator: In
                    values:
                      - akamas
      containers:
        - name: server
          image: gcr.io/google-samples/microservices-demo/productcatalogservice:v0.2.2
          ports:
            - containerPort: 3550
          env:
            - name: PORT
              value: "3550"
            - name: DISABLE_STATS
              value: "1"
            - name: DISABLE_TRACING
              value: "1"
            - name: DISABLE_PROFILER
              value: "1"
          readinessProbe:
            exec:
              command: ["/bin/grpc_health_probe", "-addr=:3550"]
          livenessProbe:
            exec:
              command: ["/bin/grpc_health_probe", "-addr=:3550"]
          resources:
            requests:
              cpu: ${productcatalogservice.cpu_limit}
              memory: ${productcatalogservice.memory_limit}
            limits:
              cpu: ${productcatalogservice.cpu_limit}
              memory: ${productcatalogservice.memory_limit}

#/bin/bash

ACTION=$1

LOCUST_ENDPOINT="$(minikube service -n akamas-demo ak-loadgenerator | awk '/web-ui.*http/ {print $8}')"

case $ACTION in
  start)
    echo curl -X POST -d 'user_count=100' -d 'spawn_rate=3' -d 'host=http://ak-frontend:80'  "${LOCUST_ENDPOINT}/swarm"
    ;;
  stop)
    echo curl "${LOCUST_ENDPOINT}/stop"
    echo curl "${LOCUST_ENDPOINT}/stats/reset"
    ;;
  *)
    echo "Unrecognized option '${ACTION}'"
    exit 1
    ;;
esac

Telemetry

If you have not installed the Prometheus telemetry provider yet, take a look at the telemetry provider page Prometheus provider to proceed with the installation.

With the definition of the telemetry instance shown below, we import the end-user performance metrics provided by the load-generator, along with a custom definition of "cost" given by a weighted sum of the CPU and memory allocated for the pods in the cluster:

provider: Prometheus
config:
  address: CLUSTER_IP
  port: PROM_PORT
metrics:
  - metric: users
    datasourceMetric: "locust_users"
  - metric: transactions_throughput
    datasourceMetric: 'rate(locust_requests_num_requests{name="Aggregated"}[30s]) - rate(locust_requests_num_failures{name="Aggregated"}[30s])'
  - metric: transactions_error_throughput
    datasourceMetric: 'rate(locust_requests_num_failures{name="Aggregated"}[30s])'
  - metric: transactions_error_rate
    datasourceMetric: "locust_requests_fail_ratio"
  - metric: transactions_response_time
    datasourceMetric: 'locust_requests_avg_response_time{name="Aggregated"}'
  - metric: transactions_response_time_p50
    datasourceMetric: 'locust_requests_current_response_time_percentile_50'
  - metric: transactions_response_time_p95
    datasourceMetric: 'locust_requests_current_response_time_percentile_95'

  - metric: cost
    datasourceMetric: 'sum(kube_pod_container_resource_requests{resource="cpu" %FILTERS%})*29 + sum(kube_pod_container_resource_requests{resource="memory" %FILTERS%})/1024/1024/1024*3.2'

To create the telemetry instance execute the following command:

akamas create telemetry-instance prometheus.yml 'Online Boutique'

Study

With this study, we want to minimize the "cost" of running the application, which, according to the definition described in the previous section, it means reducing the resources allocated to the tuned pods in the cluster. At the same time, we want the application to stay within the expected SLO, and that is obtained by defining a constraint on the response time and error rate recorded by the load generator.

name: Minimize Kubernetes Online Boutique cost while matching SLOs
system: Online Boutique
workflow: boutique

goal:
  objective: minimize
  constraints:
    absolute:
      - name: response_time
        formula: online_boutique.transactions_response_time <= 500
      - name: error_rate
        formula: online_boutique.transactions_error_rate <= 0.02
  function:
    formula: online_boutique.cost

windowing:
  type: trim
  trim: [1m, 30s]
  task: Test

metricsSelection:
  - online_boutique.cost
  - online_boutique.transactions_throughput
  - online_boutique.transactions_error_rate
  - online_boutique.transactions_response_time
  - online_boutique.transactions_response_time_p95
  - online_boutique.users
  - frontend.container_cpu_used
  - frontend.container_cpu_util
  - frontend.container_cpu_limit
  - frontend.container_cpu_throttle_time
  - frontend.container_memory_used
  - frontend.container_memory_util
  - frontend.container_memory_limit
  - productcatalogservice.container_cpu_used
  - productcatalogservice.container_cpu_util
  - productcatalogservice.container_cpu_limit
  - productcatalogservice.container_cpu_throttle_time
  - productcatalogservice.container_memory_used
  - productcatalogservice.container_memory_util
  - productcatalogservice.container_memory_limit

parametersSelection:
  - name: frontend.cpu_limit
    domain: [100, 300]
  - name: frontend.memory_limit
    domain: [64, 512]
  - name: productcatalogservice.cpu_limit
    domain: [100, 500]
  - name: productcatalogservice.memory_limit
    domain: [64, 512]

steps:
  - name: baseline
    type: baseline
    values:
      frontend.cpu_limit: 300
      frontend.memory_limit: 256
      productcatalogservice.cpu_limit: 300
      productcatalogservice.memory_limit: 256

  - name: optimize
    type: optimize
    numberOfExperiments: 50

To create and run the study execute the following commands:

akamas create study study.yaml
akamas start study 'Minimize Kubernetes Online Boutique cost while matching SLOs'

Leveraging Ansible to automate AWS instance management

Ansible is an open-source software automation tool suited for instance configuration and provisioning, enabling an Infrastructure as Code approach to the Cloud. In this page we provide a set of ansible-playbooks templates to perform the most common task to tune EC2 instance types with Akamas, such as:

EC2 instance creation
EC2 instance termination
EC2 instance resizing

Refer to the Ansible documentation and to the Ansible ec2 module for more details, and make sure to check the concepts behind inventory management to build a robust automation.

The orchestrator requires access to an account or role linked to the correct policies; this requires managing AWS Policies and having access to the required security groups.

Instance Creation

The following example playbook provisions an EC2 instance using the latest Ubuntu 18-04 LTS image and then waits for it to be available. The playbook requires the following set of arguments:

key: the name of the SSH key pair to use
Name: the instance name
security_group: the name of the AWS security group
region: the selected AWS region

You can update the ec2_ami_info task to query for a different AMI family or specify directly the id under ec2.image.

When executing the script we must assign the following arguments as extra-vars:

intance_type: type of instance to provision
volume_size: size of the attached volume

# Launch an ubuntu instance and wait for ssh

- name: Create an instance request
  hosts: localhost
  gather_facts: False

  tasks:
    - name: query api
      ec2_ami_info:
        filters:
          name: "ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-*"
          owner-id: "099720109477"  # Canonical Group Limited
      register: amis
    - name: sort by creation date
      set_fact:
        sorted_amis: "{{ amis.images | sort(attribute='creation_date') }}"
    - name: get latest
      set_fact:
        latest_ami: "{{ sorted_amis | last }}"

    - name: Launch instance
      ec2:
         key_name: "{{ key }}"
         instance_type: "{{ instance_type | default('m5.xlarge') }}"
         group:
          - <your-security-groups>
         image: "{{ latest_ami.image_id }}"
         count: "{{ count | default('1') }}"
         wait: yes
         wait_timeout: 500
         region: "{{ region }}"
         spot_wait_timeout: 600
         instance_initiated_shutdown_behavior: terminate
         ebs_optimized: yes
         volumes:
           - device_name: /dev/sda1
             volume_type: gp2
             volume_size: "{{ volume_size | default('20') }}"
             delete_on_termination: yes
         instance_tags:
           Name: "{{ Name }}"
           CNAME: "{{ Name }}.<your-domain>"
      register: ec2

    - name: Wait for SSH to come up
      wait_for:
        host: "{{ item.public_dns_name }}"
        port: 22
        delay: 60
        timeout: 320
        state: started
      with_items: "{{ ec2.instances }}"

To apply the EC2 parameters from the AWS Optimization Pack selected by the Akamas engine you can generate the playbook arguments through a template like the following one, where ec2 is the name of the component:

ansible-playbook -i inventory --extra-vars "instance_type=${ec2.aws_ec2_instance_type}.${ec2.aws_ec2_instance_size}" provision.yaml

Instance Termination

The following playbook terminates all instances with the specified name (or any other tag). It requires the following arguments:

instance_name: the name instances name
region: the selected AWS region

# Terminate an aws instance

- name: Terminate instance
  hosts: localhost
  gather_facts: False
  tasks:
  - name: retrieve instance info
    ec2_instance_info:
      filters:
        "tag:Name": "{{ instance_name }}"
    register: ec2

  - name: terminate the instance
    ec2:
      state: absent
      instance_ids:
        - "{{ item.instance_id }}"
      region: "{{ region }}"
    with_items: "{{ ec2.instances }}"

Instance Resizing

Instance resizing is a little trickier to deploy as it requires you to install AWS CLI and setup the required credentials. The following playbook provides a simple way to stop, update and restart your instance: it is intended as a building block for more elaborate workflows.

It makes use of a list of arguments:

instance_name: your instance name
region: the selected AWS region

For a successful workflow it requires:

The instance to exist
The instance to be unique

# Change instance type, requires AWS CLI

- name: Resize the instance
  hosts: localhost
  gather_facts: no
  connection: local
  tasks:
  - name: save instance info
    ec2_instance_info:
      filters:
        "tag:Name": "{{ instance_name }}"
    register: ec2
  - name: stop the instance
    ec2:
      region: "{{ region | default('us-east-2') }}"
      state: stopped
      instance_ids:
        - "{{ ec2.instances[0].instance_id }}"
      instance_type: "{{ ec2.instances[0].instance_type }}"
      wait: True
  - name: Change the instances ec2 type
    shell: >
       aws ec2 modify-instance-attribute --instance-id "{{ ec2.instances[0].instance_id }}"
       --instance-type "{{ new_instance_type }}"
    delegate_to: localhost
  - name: restart the instance
    ec2:
      region: "{{ region }}"
      state: running
      instance_ids:
        - "{{ ec2.instances[0].instance_id }}"
      wait: True
    register: ec2
  - name: wait for SSH to come up
    wait_for:
      host: "{{ item.public_dns_name }}"
      port: 22
      delay: 60
      timeout: 500
      state: started
    with_items: "{{ ec2.instances }}"

To apply the EC2 parameters from the AWS Optimization Pack selected by the Akamas engine you can generate the playbook arguments through a template like the following, where ec2 is the name of the component:

ansible-playbook -i inventory --extra-vars "new_instance_type=${ec2.aws_ec2_instance_type}.${ec2.aws_ec2_instance_size}" resize.yaml

Guidelines for optimizing AWS EC2 instances

This page provides a list of best practices when optimizing applications on AWS EC2 instances with Akamas.

Environment setup

Before planning out an AWS study you must first ensure you have all the rights to effectively launch it. This means you have to:

Check your policies
Check your security groups

Policies

Policies allow instance manipulation and access by being attached to users, group or AWS resources.

The suggested best practice to avoid impacting other critical environments, like production, is to create multiple accounts in order to isolate the Akamas instance and the tested resources.

Whether the case, you’re also required to comply with the AWS Policies for instance manipulation. In the following, we show a standard, tag and resource-based policy.

Tags are indeed a robust way to scope your Akamas optimizations: you can invariantly refer to instances and sets of instances across generations and stack them for more elaborate conditions.

Tailoring your policies on your EC2 resources neatly avoids collateral effects during your experiments.

AWS Policies

In order to correctly enable the provisioning and manipulation of EC2 instances you have to set the correct AWS policies. Notice that AWS offers several pre-made policies.

Here we provide some basic resource-based permissions required in our scope:

EC2 instance start
EC2 instance stop
EC2 instance reboot
EC2 instance termination
EC2 instance description
EC2 instance manipulation

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:StartInstances",
                "ec2:StopInstances",
                "ec2:RebootInstances",
                "ec2:TerminateInstances",
                "ec2:ModifyInstanceAttribute"
            ],
            "Resource": "arn:aws:ec2:*:*:instance/*",
            "Condition": {
                "StringEquals": {
                    "ec2:ResourceTag/Owner": "${aws:username}"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": "ec2:DescribeInstances",
            "Resource": "*"
        }
    ]
}

Consider using tags in order to enforce finer access control for the automation tools, as shown in the example above.

Refer to the official reference for a complete list of AWS EC2 policies.

Security Groups

Security groups control inbound and outbound traffic to and from your AWS instances. In your typical optimization scenario the security group should allow inbound connections on SSH and any other relevant port, including the ones used to gather the telemetry, from the Akamas instance to the ones in the tuned system.

Workflows setup

Akamas workflows managing EC2 instances usually expect you to either create throwaway instances or resize already existing ones. This page provides an example for both cases: all you need to do is to apply the FileConfigurator operator by making the instance type and instance size parameters tunable.

Notice that:

The Sleep operator comes in handy in-between instance provisioning. Creating an instance and more so waiting for its DNS to come up may take a while, so forcing a few minutes to wait is usually worth it. This operator is a viable option if you can’t force it through an automation tool.
The Executor operator is better suited for launching benchmarks and applications rather than for setting up your instance. It’s better to use automation tools or ready-to-use AMIs to set up all required packages and dependencies, as the workflow should cover your actual study case.
The study environment has to be restored to its original configuration between experiments. While that is quite simple to achieve when creating and terminating instances, this may require additional steps in a resizing scenario.
The deletion and creation of new instances determine changes in ECDSA host keys. This may be interpreted as DNS spoofing from the Akamas instance, so consider overriding the default settings in such contexts.

Telemetry setup

While the CloudWatch Exporter is a natural choice for EC2 instance monitoring, EC2 instances are often Linux instances, so it’s useful to use the Linux optimization pack through the Prometheus provider paired with the Node exporter every time you can directly access the instance.

Optimizing a sample application running on AWS

In this example, you will go through the optimization of a Spark based PageRank algorithm on AWS instances. We’ll be using a PageRank implementation included in Renaissance, an industry-standard Java benchmarking suite developed by Oracle Labs, tweaking both Java and AWS parameters to improve the performance of our application.

Environment setup

For this example, you’re expected to use two dedicated machines:

an Akamas instance
a Linux-based AWS EC2 instance

The Akamas instance requires provisioning and manipulating instances, therefore it requires to be enabled to do so by setting AWS Policies, integrating with orchestration tools (such as Ansible) and an inventory linked to your AWS EC2 environment.

The Linux-based instance will run the application benchmark, so it requires the latest open-jdk11 release

sudo apt install openjdk-11-jre

Telemetry Infrastructure setup

For this study you’re going to require the following telemetry providers:

CSV Provider to parse the results of the benchmark
Prometheus provider to monitor the instance
AWS Telemetry provider to extract instance price

Application and Test tool

The renaissance suite provides the benchmark we’re going to optimize.

Since the application consists of a jar file only, the setup is rather straightforward; just download the binary in the ~/renaissance/ folder:

mkdir ~/renaissance
cd ~/renaissance
wget -O renaissance.jar https://github.com/renaissance-benchmarks/renaissance/releases/download/v0.10.0/renaissance-gpl-0.10.0.jar

In the same folder upload the template file launch.benchmark.sh.temp, containing the script that executes the benchmark using the provided parameters and parses the results:

#!/bin/bash
java -XX:MaxRAMPercentage=60 ${jvm.*} -jar renaissance.jar -r 50 --csv renaissance.csv page-rank

total_time=$(awk -F"," '{total_time+=$2}END{print total_time}' ./renaissance.csv)
first_line=$(head -n 1 renaissance.csv)
end_time=$(tail -n 1 renaissance.csv | cut -d',' -f3)
start_time=$(sed '2q;d' renaissance.csv | cut -d',' -f4)
echo $first_line,"TS,COMPONENT" > renaissance-parsed.csv
ts=$(date -d @$(($start_time/1000)) "+%Y-%m-%d %H:%M:%S")

echo -e "page-rank,$total_time,$end_time,$start_time,$ts,pagerank" >> renaissance-parsed.csv

You may find further info about the suite and its benchmarks in the official doc.

Optimization setup

In this section, we will guide you through the steps required to set up the optimization on Akamas.

Optimization packs

This example requires the installation of the following optimization packs:

System

Our system could be named renaissance after its application, so you’ll have a system.yaml file like this:

name: jvm
description: The JVM running the benchmark
componentType: java-openjdk-11
properties:
    prometheus:
      job: jmx
      instance: jmx_instance

Then create the new system resource:

akamas create component component-jvm.yaml renaissance

The renaissance system will then have three components:

A benchmark component
A Java component
An EC2 component, i.e. the underlying instance

Java component

Create a component-jvm.yaml file like the following:

name: jvm
description: The JVM running the benchmark
componentType: java-openjdk-11
properties:
    prometheus:
      job: jmx
      instance: jmx_instance

Then type:

akamas create component component-jvm.yaml renaissance

Benchmark component

Since there is no optimization pack associated with this component, you have to create some extra resources.

A metrics.yaml file for a new metric tracking execution time:

metrics:
  - name: elapsed
    unit: nanoseconds
    description: The duration of the benchmark execution

A component-type benchmark.yaml:

name: benchmark
description: A component type for the Renaissance Java benchmarking suite
metrics:
  - name: elapsed
parameters: []

The component pagerank.yaml:

name: pagerank
description: The pagerank application included in Renaissance benchmarks
componentType: benchmark

Create your new resources, by typing in your terminal the following commands:

akamas create metrics metrics.yaml
akamas create component-type benchmark.yaml
akamas create component pagerank.yaml renaissance

EC2 component

Create a component-ec2.yaml file like the following:

name: instance
description: The ec2 instance the benchmark runs on
componentType: ec2
properties:
  hostname: renaissance.akamas.io
  sshPort: 22
  instance: ec2_instance
  username:  ubuntu
  key: # SSH KEY
  ec2:
    region: us-east-2 # This is just a reference

Then create its resource by typing in your terminal:

akamas create component component-ec2.yaml renaissance

Workflow

The workflow in this example is composed by three main steps:

Update the instance type
Run the application benchmark
Stop the instance

To manage the instance we are going to integrate a very simple Ansible in our workflow: the FileConfigurator operator will replace the parameters in the template file in order to generate the code run by the Executor operator, as explained in the Ansible page.

In detail:

Update the instance size
1. Generate the the playbook file from the template
2. Update the instance using the playbook
3. Wait for the instance to be available
Run the application benchmark
1. Configure the benchmark Java launch script
2. Execute the launch script
3. Parse PageRank output to make it consumable by the CSV telemetry instance
Stop the instance
1. Configure the playbook to stop an instance with a specific instance id
2. Run the playbook to stop the instance

The following is the template of the Ansible playbook:

# Change instance type, requires AWS CLI

- name: Resize the instance
  hosts: localhost
  gather_facts: no
  connection: local
  tasks:
  - name: save instance info
    ec2_instance_info:
      filters:
        "tag:Name": <your-instance-name>
    register: ec2
  - name: Stop the instance
    ec2:
      region: <your-aws-region>
      state: stopped
      instance_ids:
        - "{{ ec2.instances[0].instance_id }}"
      instance_type: "{{ ec2.instances[0].instance_type }}"
      wait: True
  - name: Change the instances ec2 type
    shell: >
       aws ec2 modify-instance-attribute --instance-id "{{ ec2.instances[0].instance_id }}"
       --instance-type "${ec2.aws_ec2_instance_type}.${ec2.aws_ec2_instance_size}"
    delegate_to: localhost
  - name: restart the instance
    ec2:
      region: <your-aws-region>
      state: running
      instance_ids:
        - "{{ ec2.instances[0].instance_id }}"
      wait: True
    register: ec2
  - name: wait for SSH to come up
    wait_for:
      host: "{{ item.public_dns_name }}"
      port: 22
      delay: 60
      timeout: 320
      state: started
    with_items: "{{ ec2.instances }}"

The following is the workflow configuration file:

name: Pagerank AWS optimization
tasks:

  # Creating the EC2 instance
  - name: Configure provisioning
    operator: FileConfigurator
    arguments:
      sourcePath: /home/ubuntu/ansible/resize.yaml.templ
      targetPath: /home/ubuntu/ansible/resize.yaml
      host:
        hostname: bastion.akamas.io
        username: ubuntu
        key: # SSH KEY

  - name: Execute Provisioning
    operator: Executor
    arguments:
      command: ansible-playbook /home/akamas/ansible/resize.yaml
      host:
        hostname: bastion.akamas.io
        username: akamas
        key: # SSH KEY

  # Waiting for the instance to come up and set up its DNS
  - name: Pause
    operator: Sleep
    arguments:
      seconds: 120

  # Running the benchmark
  - name: Configure Benchmark
    operator: FileConfigurator
    arguments:
        source:
            hostname: renaissance.akamas.io
            username: ubuntu
            path: /home/ubuntu/renaissance/launch_benchmark.sh.templ
            key: # SSH KEY
        target:
            hostname: renaissance.akamas.io
            username: ubuntu
            path: /home/ubuntu/renaissance/launch_benchmark.sh
            key: # SSH KEY

  - name: Launch Benchmark
    operator: Executor
    arguments:
      command: bash /home/ubuntu/renaissance/launch_benchmark.sh
      host:
        hostname: renaissance.akamas.io
        username: ubuntu
        key: # SSH KEYCreate the workflow resource by typing in your terminal:

Telemetry

If you have not installed the Prometheus telemetry provider or the CSV telemetry provider yet, take a look at the telemetry provider pages Prometheus provider and CSV Provider to proceed with the installation.

Prometheus

Prometheus allows us to gather jvm execution metrics through the jmx exporter: download the java agent required to gather metrics from here, then update the two following files:

The prometheus.yml file, located in your Prometheus folder:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: prometheus
    static_configs:
    - targets: ['localhost:9090']

  - job_name: jmx
    static_configs:
    - targets: ["localhost:9110"]
    relabel_configs:
    - source_labels: ["__address__"]
      regex: "(.*):.*"
      target_label: instance
      replacement: jmx_instanc

The config.yml file you have to create in the ~/renaissance folder:

startDelaySeconds: 0
username:
password:
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
# using the property above we are telling the export to export only relevant java metrics
whitelistObjectNames:
  - "java.lang:*"
  - "jvm:*"

Now you can create a prometheus-instance.yaml file:

provider: Prometheus
config:
  address: renaissance.akamas.io
  port: 9090

Then you can install the telemetry instance:

akamas create telemetry-instance prometheus-instance.yaml renaissance

You may find further info on exporting java metrics to Prometheus here.

CSV - Telemetry instance

Create a telemetry-csv.yaml file to read the benchmark output:

provider: CSV
config:
  protocol: scp
  address: renaissance.akamas.io
  username: ubuntu
  authType: key
  auth: # SSH KEY
  remoteFilePattern: /home/ubuntu/renaissance/renaissance-parsed.csv
  csvFormat: horizontal
  componentColumn: COMPONENT
  timestampColumn: TS
  timestampFormat: yyyy-MM-dd HH:mm:ss

metrics:
  - metric: elapsed
    datasourceMetric: nanos

Then create the resource by typing in your terminal:

akamas create telemetry-instance renaissance

Study

Here we provide a reference study for AWS. As we’ve anticipated, the goal of this study is to optimize a sample java application, the PageRank benchmark you may find in the renaissance benchmark suite by Oracle.

Our goal is rather simple: minimizing the product between the benchmark execution time and the instance price, that is, finding the most cost-effective instance for our application.

Create a study.yaml file with the following content:

name: aws
description: Tweaking aws and the JVM to optimize the page-rank application.
system: renaissance

goal:
  objective: minimize
  function:
    formula: benchmark.elapsed * aws.aws_ec2_price

workflow: workflow-aws

parametersSelection:
  - name: aws.aws_ec2_instance_type
    categories: [c5,c5d,c5a,m5,m5d,m5a,r5,r5d,r5a]
  - name: aws.aws_ec2_instance_size
    categories: [large,xlarge,2xlarge,4xlarge]
  - name: jvm.jvm_gcType
  - name: jvm.jvm_newSize
  - name: jvm.jvm_maxHeapSize
  - name: jvm.jvm_minHeapSize
  - name: jvm.jvm_survivorRatio
  - name: jvm.jvm_maxTenuringThreshold

steps:
  - name: baseline
    type: baseline
    numberOfTrials: 2
    values:
     aws.aws_ec2_instance_type: c5
     aws.aws_ec2_instance_size: 2xlarge
     jvm.jvm_gcType: G1
  - name: optimize
    type: optimize
    numberOfExperiments: 60

Then create the corresponding Akamas resource and start the study:

akamas create study study.yaml
akamas start study aws

Optimizing a Spark application

In this example study we’ll tune the parameters of SparkPi, one of the examples applications provided by most of the Apache Spark distributions, to minimize its execution time. Application monitoring is provided by the Spark History Server APIs.

Environment setup

The test environment includes the following instances:

Akamas: instance running Akamas
Spark cluster: composed of instances with 16 vCPUs and 64 GB of memory, where the Spark binaries are installed under /usr/lib/spark. In particular, the roles are:
- 1x master instance: the Spark node running the resource manager and Spark History Server (host: sparkmaster.akamas.io)
- 2x worker instances: the other instances in the cluster

Telemetry Infrastructure setup

To gather metrics about the application we will leverage the Spark History Server. If it is not already running, start it on the master instance with the following command:

/usr/lib/spark/sbin/start-history-server.sh

Application and Test tools

To make sure the tested application is available on your cluster and runs correctly, execute the following commands:

file /usr/lib/spark/examples/jars/spark-examples.jar
spark-submit \
  --master yarn --deploy-mode client \
  --class 'org.apache.spark.examples.SparkPi' \
  /usr/lib/spark/examples/jars/spark-examples.jar 100

Optimization setup

In this section, we will guide you through the steps required to set up on Akamas the optimization of the Spark application execution.

System

System spark

Here’s the definition of the system we will use to group our components and telemetry instances for this example:

name: spark
description: A system to tune the Spark Pi example application

To create the system run the following command:

akamas create system system.yaml

Component sparkPi

We’ll use a component of type Spark Application 2.3.0 to represent the application running on the Apache Spark framework 2.3.

In the snippet shown below, we specify:

the field properties required by Akamas to connect via SSH to the cluster master instance
the parameters required by spark-submit to execute the application
the sparkApplication flag required by the telemetry instance to associate the metrics from the History Server to this component

name: sparkPi
description: The Spark Application used to calculate KPIs for ContentWise Analytics
componentType: Spark Application 2.3.0

properties:
  hostname: sparkmaster.akamas.io
  username: hadoop
  key: ssh_key

  master: yarn
  deployMode: client
  className: org.apache.spark.examples.SparkPi
  file: /usr/lib/spark/examples/jars/spark-examples.jar
  args: [ 1000 ]

  sparkApplication: 'true'

To create the component in the system run the following command:

akamas create component sparkPi.yaml spark

Workflow

The workflow used for this study contains only a single stage, where the operator submits the application along with the Spark parameters under test.

Here’s the definition of the workflow:

name: Run SparkPi
tasks:
- name: run application
  operator: SSHSparkSubmit
  arguments:
    component: sparkPi
    retries: 0

To create the workflow run the following command:

akamas create workflow workflow.yaml

Telemetry

If you have not installed the Spark History Server telemetry provider yet, take a look at the telemetry provider page Spark History Server Provider to proceed with the installation.

Here’s the definition of the component, specifying the History Server endpoint:

provider: SparkHistoryServer
config:
  address: sparkmaster.akamas.io
  port: 18080

  importLevel: job

To create the telemetry instance in the system run the following command:

akamas create telemetry-instance telemetry.yaml spark

This telemetry instance will be able to bind the fetched metrics to the related sparkPi component thanks to the sparkApplication attribute we previously added in its definition.

Study

The goal of this study is to find a Spark configuration that minimizes the execution time for the example application.

To achieve this goal we’ll operate on the number of executor processes available to run the application job, and the memory and CPUs allocated for both driver and executors. The domains are configured so that the single driver/executor process does not exceed the size of the underlying instance, and the constraints make it so that the application overall does not require more resources than the ones available in the cluster, also taking into account that some resources must be reserved for other services such as the cluster manager.

Note that this study uses two constraints on the total number of resources to be used by the spark application. This example refers to a cluster of three nodes with 16 cores and 64 GB of memory each, and at least one core per instance should be reserved for the system.

Here’s the definition of the study:

name: Speedup SparkPi execution
system: spark
workflow: Run SparkPi

goal:
  objective: minimize
  function:
    formula: sparkPi.spark_application_duration

parametersSelection:
- name: sparkPi.driverCores
  domain: [1, 10]
- name: sparkPi.driverMemory
  domain: [32, 2048]
- name: sparkPi.executorCores
  domain: [1, 15]
- name: sparkPi.executorMemory
  domain: [32, 2048]
- name: sparkPi.numExecutors
  domain: [1, 45]

parameterConstraints:
- name: cap_total_allocated_cpus
  formula: (spark.driverCores + spark.executorCores*spark.numExecutors) <= 15*3

- name: cap_total_allocated_memory
  formula: (spark.driverMemory + spark.executorMemory*spark.numExecutors) <= 60*3

steps:
- name: baseline
  type: baseline

- name: tune
  type: optimize
  numberOfExperiments: 200
  maxFailedExperiments: 200

To create and run the study execute the following commands:

akamas create study study.yaml
akamas start study 'Speedup SparkPi execution'

Optimizing an Oracle Database server instance

In this example, we are going to tune the initialization parameters of an Oracle Database server instance in order to maximize its throughput while stressed by a load generator.

For the workload, we’ll use the OLTPBench's implementation of TPC-C, a popular transaction processing benchmarking suite, while to extract the metrics we are going to leverage the Oracle Prometheus exporter.

Environment setup

Environment

For the purpose of this experiment we are going to use two dedicated machines:

oraxe.mycompany.com, hosting a single Oracle 18c XE instance running inside a docker container (provisioned using the scripts on the official Oracle GitHub repository)
oltpbench.mycompany.com, that generates the workload using OLTPBench's TPC-C and will host the OracleDB Prometheus exporter instance

We assume to be working with Linux hosts

Prometheus and exporters

Install the OracleDB Prometheus exporter

The OracleDB Prometheus exporter publishes as metrics the results of the queries defined in the configuration file. In our case, we’ll use it to extract valuable performance metrics from Oracle’s Dynamic Performance (V$) Views.

We can spin up the exporter using the official Docker image using the following command, where cust-metrics.toml is our custom metrics file:

docker run -d --name orabench_exporter --restart always \
  -p 9161:9161 \
  -v ~/oracledb_exporter/cust-metrics.toml:/cust-metrics.toml \
  -e CUSTOM_METRICS=/cust-metrics.toml \
  -e DATA_SOURCE_NAME='system/passwd@//oraxe.mycompany.com:1521/XE' \
  iamseth/oracledb_exporter

The exporter will publish the metrics on port 9161.

Here’s the example metrics file used to run the exporter:

[[metric]]
context= "memory"
labels= [ "component" ]
metricsdesc= { size="Component memory extracted from v$memory_dynamic_components in Oracle." }
request = '''
SELECT component, current_size as "size"
FROM V$MEMORY_DYNAMIC_COMPONENTS
UNION
SELECT name, bytes as "size"
FROM V$SGAINFO
WHERE name in ('Free SGA Memory Available', 'Redo Buffers', 'Maximum SGA Size')
'''

[[metric]]
context = "activity"
metricsdesc = { value="Generic counter metric from v$sysstat view in Oracle." }
fieldtoappend = "name"
request = '''
SELECT name, value
FROM V$SYSSTAT WHERE name IN (
  'execute count',
  'user commits', 'user rollbacks',
  'db block gets from cache', 'consistent gets from cache', 'physical reads cache', /* CACHE */
  'redo log space requests'
 )
 '''

[[metric]]
context = "system_event"
labels = [ "event", "wait_class" ]
request = '''
SELECT
  event, wait_class,
  total_waits, time_waited
FROM V$SYSTEM_EVENT
'''
[metric.metricsdesc]
  total_waits= "Total number of waits for the event as per V$SYSTEM_EVENT in Oracle."
  time_waited= "Total time waited for the event (in hundredths of seconds) as per V$SYSTEM_EVENT in Oracle."

Install and configure Prometheus

You can check how to configure Prometheus here; by default, it will run on port 9090.

In order to configure the OracleDB exporter you can add the following snippet to the configuration file:

scrape_configs:
  - job_name: oraxe-exporter
    scrape_interval: 15s
    static_configs:
      - targets: [oltpbench.mycompany.com:9161]
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*)
        target_label: instance
        replacement: oraxe

Optimization setup

System

In order to model the system composed of the tuned database and the workload generator we need two different components:

An oracle component that represents the Oracle Database instance and maps directly to oraxe.mycompany.com.
A tpcc component that represents the TPC-C workload from the OLTPBench suite and maps to oltpbench.mycompany.com.

For the tpcc component, we’ll need first to define some custom metrics and a new component-type. The following is the definition of the metrics (tpcc-metrics.yaml):

metrics:
  - name: throughput
    description: throughput
    unit: requests/s

  - name: resp_time
    description: resp_time
    unit: milliseconds

  - name: resp_time_min
    description: resp_time_min
    unit: milliseconds

  - name: resp_time90th
    description: resp_time90th
    unit: milliseconds

  - name: resp_time_max
    description: resp_time_max
    unit: milliseconds

The following is the definition of the new component-type (tpcc-ctype.yaml):

name: TPCC Benchmarck
description: OLTP TPCC Benchmarck

parameters: []

metrics:
  - name: throughput
  - name: resp_time
  - name: resp_time_min
  - name: resp_time90th
  - name: resp_time_max

We can then create the new component type running the commands:

akamas create metrics tpcc-metrics.yaml
akamas create component-type tpcc-ctype.yaml

As a next step, we can proceed then with the definition of our system (system.yaml):

name: oracle system
description: oracle system

Here’s the definition of our oracle component (oracle.yaml):

name: oracle
description: Oracle DB
componentType: Oracle Database 18c
properties:
  instance: oraxe

  connection:
    user: system
    password: passwd
    dsn: oraxe.mycompany.com:1521/XE

  hostname: oraxe.mycompany.com   # needed to run docker restart
  username: ubuntu
  sshPort: 22
  key: rsa_key_file

Here’s the definition of the tpcc component (tpcc.yaml):

name: tpcc
description: OLTP TPC-C load benchmarck
componentType: TPC-C Benchmarck
properties:
  hostname: oltpbench.mycompany.com
  username: ubuntu
  sshPort: 22
  key: rsa_key_file

We can create the system by running:

akamas create system system.yaml

We can then create the components by running:

akamas create component oracle.yaml 'oracle system'
akamas create component tpcc.yaml 'oracle system'

Telemetry

Prometheus

Since we are using Prometheus to extract the database metrics we can leverage the Prometheus provider, which already includes the queries needed for the Oracle metrics we need. To use the Prometheus provider we need to define a telemetry instance (prom.yaml):

provider: Prometheus
config:
  address: prometheus
  port: 9090

We can now create the telemetry instance and attach it to our system by running:

akamas create telemetry-instance prom.yaml 'oracle system'

CSV

Other than the telemetry of the Oracle instance, we need also the metrics in the output CSVs from the TPC-C workload runs. To ingest these metrics we can leverage the CSV Provider, defining the following telemetry instance (csv.yaml):

provider: csv
config:
  address: oltpbench.mycompany.com
  port: 22
  username: ubuntu
  protocol: scp
  authType: key
  auth: rsa_key_file

  remoteFilePattern: /home/ubuntu/oltpbench/results/output.csv
  componentColumn: component
  timestampColumn: ts
  timestampFormat: yyyy-MM-dd HH:mm:ss

metrics:
  - metric: throughput
    datasourceMetric: throughput
    staticLabels: {}

  - metric: resp_time
    datasourceMetric: avg_lat
    staticLabels: {}

  - metric: resp_time_min
    datasourceMetric: min_lat
    staticLabels: {}

  - metric: resp_time90th
    datasourceMetric: 90th_lat
    staticLabels: {}

  - metric: resp_time_max
    datasourceMetric: max_lat
    staticLabels: {}

We can create the telemetry instance and attach it to our system by running:

akamas create telemetry-instance csv.yaml 'oracle system'

Workflow

Remove previous executions' data

Using an Executor operator we run a command to clean the results folder that may contain files from previous executions

name: Clean results
operator: Executor
arguments:
  command: rm -f ~/oltpbench/results/*
  component: tpcc

Configure the Oracle instance

We define a task that uses the OracleConfigurator operator to update the Oracle initialization parameters:

name: Update parameters
operator: OracleConfigurator
arguments:
  component: oracle

Restart the instance

We define a task that uses the Executor operator that reboots the Oracle container for the parameters that need a restart to take effect:

name: Restart Oracle container
operator: Executor
arguments:
  command: docker restart oraxe
  component: oracle

Run the workload

We define a task that uses the Executor operator to launch the TPC-C benchmark against the Oracle instance:

name: Execute load test
operator: Executor
arguments:
  command: cd ~/oltpbench ; ./oltpbenchmark --bench tpcc --config tpcc_conf.xml --execute=true -s 5 --output out
  component: tpcc

Prepare test results

We define a workflow task that runs a script that parses the TPC-C output files and generates a file compatible with the CSV Provider:

name: Parse TPC-C results
operator: Executor
arguments:
  command: cd ~/oltpbench ; ./tpcc_parse_csv.sh
  component: tpcc

Where tpcc_parse_csv.sh is the following script:

#!/bin/bash

OUTFILE=output.csv

COMP_NAME=tpcc

BASETS=`tail -n+2 results/out.csv | head -n1 | cut -d',' -f3`
echo 'component,ts,throughput,avg_lat,min_lat,90th_lat,max_lat' > $OUTFILE
awk -F, "BEGIN{OFS=\",\"} NR>1 {\$1=strftime(\"%F %T\", ${BASETS}+\$1); print \"${COMP_NAME}\",\$0}" < results/out.res | cut -d',' -f1-5,9,12 >> $OUTFILE

Complete workflow

By putting together all the tasks defined above we come up with the following workflow definition (workflow.yaml):

name: oracle workflow
tasks:
  - name: Clean results
    operator: Executor
    arguments:
      command: rm -f ~/oltpbench/results/*
      component: tpcc

  - name: Update parameters
    operator: OracleConfigurator
    arguments:
      component: oracle

  - name: Restart Oracle container
    operator: Executor
    arguments:
      command: docker restart oraxe
      component: oracle

  - name: Execute load test
    operator: Executor
    arguments:
      command: cd ~/oltpbench ; ./oltpbenchmark --bench tpcc --config tpcc_conf.xml --execute=true -s 5 --output out
      component: tpcc

  - name: Parse TPC-C results
    operator: Executor
    arguments:
      command: cd ~/oltpbench ; ./tpcc_parse_csv.sh
      component: tpcc

We can create the workflow by running:

akamas create workflow workflow.yaml

Study

The objective of this study is to maximize the transaction throughput while stressed by the TPC-C load generator, and to achieve this goal the study will tune the size of the most important areas of the Oracle instance.

Goal

Here’s the definition of the goal for our study, which is to maximize the tpcc.throughput metric:

goal:
  objective: maximize
  function:
    formula: tpcc.throughput

Windowing

We define a window in order to consider only the data points after the ramp-up time of the load test:

windowing:
  type: trim
  trim: [4m, 1m]
  task: Execute load test

Parameters to optimize

For this study, we are trying to achieve our goal by tuning the size of several areas in the memory of the database instance. In particular, we will tune the overall size of the Program Global Area (containing the work area of the active sessions) and the size of the components of the Shared Global Area.

The domains are configured to explore, for each parameter, the values around the default values.

parametersSelection:
  - name: oracle.pga_aggregate_target
    domain: [1128, 4512]
  - name: oracle.db_cache_size
    domain: [512, 6144]
  - name: oracle.java_pool_size
    domain: [1, 1024]
  - name: oracle.large_pool_size
    domain: [1, 256]
  - name: oracle.log_buffer
    domain: [2, 256]
  - name: oracle.shared_pool_size
    domain: [128, 1024]
  - name: oracle.streams_pool_size
    domain: [1, 1024]

Constraints

The following constraint allows the study to explore different size configurations without exceeding the maximum overall memory available for the instance:

parameterConstraints:
  - name: Cap total memory to 10G
    formula: oracle.db_cache_size + oracle.java_pool_size + oracle.large_pool_size + oracle.log_buffer + oracle.shared_pool_size + oracle.streams_pool_size + oracle.pga_aggregate_target < 10240

Steps

We are going to add to our study two steps:

A baseline step, in which we configure the default values for the memory parameters as discovered from previous manual executions.
An optimization step, where we perform 200 experiments to search the set of parameters that best satisfies our goal.

The baseline step contains some additional parameters (oracle.memory_target, oracle.sga_target) that are required by Oracle in order to disable the automatic management of the SGA components.

Here’s what these steps look like:

steps:
  - name: baseline
    type: baseline
    values:
      oracle.pga_aggregate_target: 1128
      oracle.db_cache_size: 2496
      oracle.java_pool_size: 16
      oracle.large_pool_size: 16
      oracle.log_buffer: 13
      oracle.shared_pool_size: 640
      oracle.streams_pool_size: 0
      oracle.memory_target: 0
      oracle.sga_target: 0

  - name: optimization
    type: optimize
    numberOfExperiments: 200
    maxFailedExperiments: 200

Complete study

Here’s the study definition (study.yaml) for optimizing the Oracle instance:

name: Oracle: tune memory
description: Tune memory minimizing reponse
system: oracle system
workflow: oracle workflow

goal:
  objective: maximize
  function:
    formula: throughput
    variables:
      throughput:
        metric: tpcc.throughput

windowing:
  type: trim
  trim: [4m, 1m]
  task: Execute load test

parametersSelection:
  - name: oracle.pga_aggregate_target
    domain: [1128, 4512]
  - name: oracle.db_cache_size
    domain: [1024, 6144]
  - name: oracle.java_pool_size
    domain: [1, 1024]
  - name: oracle.large_pool_size
    domain: [1, 256]
  - name: oracle.log_buffer
    domain: [2, 256]
  - name: oracle.shared_pool_size
    domain: [128, 1024]
  - name: oracle.streams_pool_size
    domain: [1, 1024]

parameterConstraints:
  - name: Cap total memory to 10G
    formula: oracle.db_cache_size + oracle.java_pool_size + oracle.large_pool_size + oracle.log_buffer + oracle.shared_pool_size + oracle.streams_pool_size + oracle.pga_aggregate_target < 10240

steps:
  - name: baseline
    type: baseline
    values:
      oracle.pga_aggregate_target: 1128
      oracle.db_cache_size: 2496
      oracle.java_pool_size: 16
      oracle.large_pool_size: 16
      oracle.log_buffer: 13
      oracle.shared_pool_size: 640
      oracle.streams_pool_size: 0
      oracle.memory_target: 0
      oracle.sga_target: 0

  - name: optimization
    type: optimize
    numberOfExperiments: 200
    maxFailedExperiments: 200

You can create the study by running:

akamas create study study.yaml

You can then start it by running:

akamas start study 'Oracle: tune memory'

Optimizing an Oracle Database for an e-commerce service

In this example study, we will tune the initialization parameters of an Oracle Database server instance to minimize the memory required for running KonaKart, a popular Java e-commerce service, without significantly impacting the responsiveness of the whole system.

We’ll use Apache JMeter to stress the system for the test, while we will leverage the Oracle Prometheus exporter to extract the metrics.

Environment Setup

Environment

For this study, we will use three dedicated machines:

oradb.mycompany.com, hosting an Oracle Database 19c instance
konakart.mycompany.com, running the KonaKart Community Edition service
akamas.mycompany.com, which generates the workload using JMeter and will host the OracleDB Prometheus exporter instance

Refer to the following links to install and configure KonaKart Community Edition:

Install KonaKart: install and configure the service
Manual Installation: install the demo dataset

For this use case, we provisioned the database on a VM on Oracle Cloud, which allows us to easily provision licensed instances on demand.

Prometheus and exporters

Install the OracleDB Prometheus exporter

Through the OracleDB Prometheus exporter, we can publish as metrics the results of the arbitrary queries defined in the configuration file. In our case, we’ll use it to extract valuable performance metrics from Oracle’s Dynamic Performance (V$) Views.

We can spin up the exporter using the official Docker image using the following command, where cust-metrics.toml is our custom metrics file:

docker run -d --name orabench_exporter --restart always \
  -p 9161:9161 \
  -v ~/oracledb_exporter/cust-metrics.toml:/cust-metrics.toml \
  -e CUSTOM_METRICS=/cust-metrics.toml \
  -e DATA_SOURCE_NAME='user/passwd@//oradb.mycompany.com:1521/konakart' \
  iamseth/oracledb_exporter

The exporter will publish the metrics on the specified port 9161.

Here’s the metrics file used to run the exporter:

[[metric]]
context= "memory"
labels= [ "component" ]
metricsdesc= { size="Component memory extracted from v$memory_dynamic_components in Oracle." }
request = '''
SELECT component, current_size as "size"
FROM V$MEMORY_DYNAMIC_COMPONENTS
UNION
SELECT name, bytes as "size"
FROM V$SGAINFO
WHERE name in ('Free SGA Memory Available', 'Redo Buffers', 'Maximum SGA Size')
'''

[[metric]]
context = "activity"
metricsdesc = { value="Generic counter metric from v$sysstat view in Oracle." }
fieldtoappend = "name"
request = '''
SELECT name, value
FROM V$SYSSTAT WHERE name IN (
  'execute count',
  'user commits', 'user rollbacks',
  'db block gets from cache', 'consistent gets from cache', 'physical reads cache', /* CACHE */
  'redo log space requests'
 )
 '''

[[metric]]
context = "system_event"
labels = [ "event", "wait_class" ]
request = '''
SELECT
  event, wait_class,
  total_waits, time_waited
FROM V$SYSTEM_EVENT
'''
[metric.metricsdesc]
  total_waits= "Total number of waits for the event as per V$SYSTEM_EVENT in Oracle."
  time_waited= "Total time waited for the event (in hundredths of seconds) as per V$SYSTEM_EVENT in Oracle."

Install and configure Prometheus

Using the following snippet we configure Prometheus to fetch metrics from:

the JMeter exporter exposing the load-generator stats
the OracleDB exporter monitoring the database

scrape_configs:
  - job_name: jmeter
    static_configs:
      - targets: ['akamas.mycompany.com:9270']
    relabel_configs:
      - source_labels: [__address__]
        action: replace
        regex: "(.*):.*"
        target_label: instance

  - job_name: oracle-exp
    static_configs:
      - targets: ['akamas.mycompany.com:9161']
        labels:
          instance: oradb

For a complete guide on how to configure and manage Prometheus refer to the official documentation.

Setup the load-generator

The load generator runs containerized on the akamas.mycomopany.com instance using the attached Konakart_optimizePerf.jmx configuration file.

<?xml version="1.0" encoding="UTF-8"?>
<jmeterTestPlan version="1.2" properties="5.0" jmeter="5.3">
  <hashTree>
    <TestPlan guiclass="TestPlanGui" testclass="TestPlan" testname="Test Plan" enabled="true">
      <stringProp name="TestPlan.comments"></stringProp>
      <boolProp name="TestPlan.functional_mode">false</boolProp>
      <boolProp name="TestPlan.tearDown_on_shutdown">true</boolProp>
      <boolProp name="TestPlan.serialize_threadgroups">false</boolProp>
      <elementProp name="TestPlan.user_defined_variables" elementType="Arguments" guiclass="ArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
        <collectionProp name="Arguments.arguments">
          <elementProp name="HOST" elementType="Argument">
            <stringProp name="Argument.name">HOST</stringProp>
            <stringProp name="Argument.value">${__P(HOST,konakartoraclecloud.lab.akamas.io)}</stringProp>
            <stringProp name="Argument.metadata">=</stringProp>
          </elementProp>
          <elementProp name="VUSER" elementType="Argument">
            <stringProp name="Argument.name">VUSER</stringProp>
            <stringProp name="Argument.value">${__P(VUSER,60)}</stringProp>
            <stringProp name="Argument.metadata">=</stringProp>
          </elementProp>
        </collectionProp>
      </elementProp>
      <stringProp name="TestPlan.user_define_classpath"></stringProp>
    </TestPlan>
    <hashTree>
      <kg.apc.jmeter.threads.UltimateThreadGroup guiclass="kg.apc.jmeter.threads.UltimateThreadGroupGui" testclass="kg.apc.jmeter.threads.UltimateThreadGroup" testname=" Thread Group" enabled="true">
        <collectionProp name="ultimatethreadgroupdata">
          <collectionProp name="682260423">
            <stringProp name="-1375672883">${__P(VUSER,10)}</stringProp>
            <stringProp name="0">0</stringProp>
            <stringProp name="49586">200</stringProp>
            <stringProp name="53430">600</stringProp>
            <stringProp name="48">0</stringProp>
          </collectionProp>
        </collectionProp>
        <elementProp name="ThreadGroup.main_controller" elementType="LoopController" guiclass="LoopControlPanel" testclass="LoopController" testname="Loop Controller" enabled="true">
          <boolProp name="LoopController.continue_forever">false</boolProp>
          <intProp name="LoopController.loops">-1</intProp>
        </elementProp>
        <stringProp name="ThreadGroup.on_sample_error">startnextloop</stringProp>
      </kg.apc.jmeter.threads.UltimateThreadGroup>
      <hashTree>
        <RandomVariableConfig guiclass="TestBeanGUI" testclass="RandomVariableConfig" testname="Random catId" enabled="true">
          <stringProp name="maximumValue">30</stringProp>
          <stringProp name="minimumValue">1</stringProp>
          <stringProp name="outputFormat"></stringProp>
          <boolProp name="perThread">true</boolProp>
          <stringProp name="randomSeed">666</stringProp>
          <stringProp name="variableName">catId</stringProp>
        </RandomVariableConfig>
        <hashTree/>
        <HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="Welcome" enabled="true">
          <elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
            <collectionProp name="Arguments.arguments"/>
          </elementProp>
          <stringProp name="HTTPSampler.domain">${HOST}</stringProp>
          <stringProp name="HTTPSampler.port">8780</stringProp>
          <stringProp name="HTTPSampler.protocol">http</stringProp>
          <stringProp name="HTTPSampler.contentEncoding"></stringProp>
          <stringProp name="HTTPSampler.path">konakart/Welcome.action</stringProp>
          <stringProp name="HTTPSampler.method">GET</stringProp>
          <boolProp name="HTTPSampler.follow_redirects">false</boolProp>
          <boolProp name="HTTPSampler.auto_redirects">true</boolProp>
          <boolProp name="HTTPSampler.use_keepalive">true</boolProp>
          <boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
          <boolProp name="HTTPSampler.BROWSER_COMPATIBLE_MULTIPART">true</boolProp>
          <boolProp name="HTTPSampler.image_parser">true</boolProp>
          <boolProp name="HTTPSampler.concurrentDwn">true</boolProp>
          <stringProp name="HTTPSampler.embedded_url_re"></stringProp>
          <stringProp name="HTTPSampler.connect_timeout"></stringProp>
          <stringProp name="HTTPSampler.response_timeout"></stringProp>
        </HTTPSamplerProxy>
        <hashTree/>
        <HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="SelectCat" enabled="true">
          <elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
            <collectionProp name="Arguments.arguments"/>
          </elementProp>
          <stringProp name="HTTPSampler.domain">${HOST}</stringProp>
          <stringProp name="HTTPSampler.port">8780</stringProp>
          <stringProp name="HTTPSampler.protocol">http</stringProp>
          <stringProp name="HTTPSampler.contentEncoding"></stringProp>
          <stringProp name="HTTPSampler.path">konakart/SelectCat.action?catId=${catId}</stringProp>
          <stringProp name="HTTPSampler.method">GET</stringProp>
          <boolProp name="HTTPSampler.follow_redirects">false</boolProp>
          <boolProp name="HTTPSampler.auto_redirects">true</boolProp>
          <boolProp name="HTTPSampler.use_keepalive">true</boolProp>
          <boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
          <boolProp name="HTTPSampler.BROWSER_COMPATIBLE_MULTIPART">true</boolProp>
          <boolProp name="HTTPSampler.image_parser">true</boolProp>
          <boolProp name="HTTPSampler.concurrentDwn">true</boolProp>
          <stringProp name="HTTPSampler.embedded_url_re"></stringProp>
          <stringProp name="HTTPSampler.connect_timeout"></stringProp>
          <stringProp name="HTTPSampler.response_timeout"></stringProp>
        </HTTPSamplerProxy>
        <hashTree/>
        <com.github.johrstrom.listener.PrometheusListener guiclass="com.github.johrstrom.listener.gui.PrometheusListenerGui" testclass="com.github.johrstrom.listener.PrometheusListener" testname="Prometheus Listener" enabled="true">
          <collectionProp name="prometheus.collector_definitions">
            <elementProp name="" elementType="com.github.johrstrom.listener.ListenerCollectorConfig">
              <stringProp name="collector.help">Sampler Response Time</stringProp>
              <stringProp name="collector.metric_name">ResponseTime</stringProp>
              <stringProp name="collector.type">SUMMARY</stringProp>
              <collectionProp name="collector.labels">
                <stringProp name="102727412">label</stringProp>
                <stringProp name="3059181">code</stringProp>
              </collectionProp>
              <stringProp name="collector.quantiles_or_buckets">0.5,0.01|0.85,0.01|0.9,0.01|0.99,0.01;60</stringProp>
              <stringProp name="listener.collector.listen_to">samples</stringProp>
              <stringProp name="listener.collector.measuring">ResponseTime</stringProp>
            </elementProp>
            <elementProp name="" elementType="com.github.johrstrom.listener.ListenerCollectorConfig">
              <stringProp name="collector.help">Success and failure ratio</stringProp>
              <stringProp name="collector.metric_name">Ratio</stringProp>
              <stringProp name="collector.type">SUCCESS_RATIO</stringProp>
              <collectionProp name="collector.labels">
                <stringProp name="102727412">label</stringProp>
                <stringProp name="3059181">code</stringProp>
              </collectionProp>
              <stringProp name="collector.quantiles_or_buckets"></stringProp>
              <stringProp name="listener.collector.listen_to">samples</stringProp>
              <stringProp name="listener.collector.measuring">SuccessRatio</stringProp>
            </elementProp>
          </collectionProp>
        </com.github.johrstrom.listener.PrometheusListener>
        <hashTree/>
        <ResultCollector guiclass="ViewResultsFullVisualizer" testclass="ResultCollector" testname="View Results Tree" enabled="false">
          <boolProp name="ResultCollector.error_logging">false</boolProp>
          <objProp>
            <name>saveConfig</name>
            <value class="SampleSaveConfiguration">
              <time>true</time>
              <latency>true</latency>
              <timestamp>true</timestamp>
              <success>true</success>
              <label>true</label>
              <code>true</code>
              <message>true</message>
              <threadName>true</threadName>
              <dataType>true</dataType>
              <encoding>false</encoding>
              <assertions>true</assertions>
              <subresults>true</subresults>
              <responseData>false</responseData>
              <samplerData>false</samplerData>
              <xml>false</xml>
              <fieldNames>true</fieldNames>
              <responseHeaders>false</responseHeaders>
              <requestHeaders>false</requestHeaders>
              <responseDataOnError>false</responseDataOnError>
              <saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
              <assertionsResultsToSave>0</assertionsResultsToSave>
              <bytes>true</bytes>
              <sentBytes>true</sentBytes>
              <url>true</url>
              <threadCounts>true</threadCounts>
              <idleTime>true</idleTime>
              <connectTime>true</connectTime>
            </value>
          </objProp>
          <stringProp name="filename"></stringProp>
        </ResultCollector>
        <hashTree/>
        <ResultCollector guiclass="SummaryReport" testclass="ResultCollector" testname="Summary Report" enabled="false">
          <boolProp name="ResultCollector.error_logging">false</boolProp>
          <objProp>
            <name>saveConfig</name>
            <value class="SampleSaveConfiguration">
              <time>true</time>
              <latency>true</latency>
              <timestamp>true</timestamp>
              <success>true</success>
              <label>true</label>
              <code>true</code>
              <message>true</message>
              <threadName>true</threadName>
              <dataType>true</dataType>
              <encoding>false</encoding>
              <assertions>true</assertions>
              <subresults>true</subresults>
              <responseData>false</responseData>
              <samplerData>false</samplerData>
              <xml>false</xml>
              <fieldNames>true</fieldNames>
              <responseHeaders>false</responseHeaders>
              <requestHeaders>false</requestHeaders>
              <responseDataOnError>false</responseDataOnError>
              <saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
              <assertionsResultsToSave>0</assertionsResultsToSave>
              <bytes>true</bytes>
              <sentBytes>true</sentBytes>
              <url>true</url>
              <threadCounts>true</threadCounts>
              <idleTime>true</idleTime>
              <connectTime>true</connectTime>
            </value>
          </objProp>
          <stringProp name="filename"></stringProp>
        </ResultCollector>
        <hashTree/>
      </hashTree>
      <kg.apc.jmeter.threads.UltimateThreadGroup guiclass="kg.apc.jmeter.threads.UltimateThreadGroupGui" testclass="kg.apc.jmeter.threads.UltimateThreadGroup" testname="jp@gc - Ultimate Thread Group - slow slope" enabled="false">
        <collectionProp name="ultimatethreadgroupdata">
          <collectionProp name="-811668396">
            <stringProp name="50">2</stringProp>
            <stringProp name="48">0</stringProp>
            <stringProp name="1722">60</stringProp>
            <stringProp name="1508508">1140</stringProp>
            <stringProp name="0"></stringProp>
          </collectionProp>
          <collectionProp name="1110318546">
            <stringProp name="50">2</stringProp>
            <stringProp name="50547">300</stringProp>
            <stringProp name="1722">60</stringProp>
            <stringProp name="55476">840</stringProp>
            <stringProp name="0"></stringProp>
          </collectionProp>
          <collectionProp name="-277728215">
            <stringProp name="50">2</stringProp>
            <stringProp name="53430">600</stringProp>
            <stringProp name="1722">60</stringProp>
            <stringProp name="52593">540</stringProp>
            <stringProp name="0"></stringProp>
          </collectionProp>
          <collectionProp name="1889215593">
            <stringProp name="50">2</stringProp>
            <stringProp name="56313">900</stringProp>
            <stringProp name="1722">60</stringProp>
            <stringProp name="49710">240</stringProp>
            <stringProp name="0"></stringProp>
          </collectionProp>
        </collectionProp>
        <elementProp name="ThreadGroup.main_controller" elementType="LoopController" guiclass="LoopControlPanel" testclass="LoopController" testname="Loop Controller" enabled="true">
          <boolProp name="LoopController.continue_forever">false</boolProp>
          <intProp name="LoopController.loops">-1</intProp>
        </elementProp>
        <stringProp name="ThreadGroup.on_sample_error">continue</stringProp>
      </kg.apc.jmeter.threads.UltimateThreadGroup>
      <hashTree>
        <RandomVariableConfig guiclass="TestBeanGUI" testclass="RandomVariableConfig" testname="Random catId" enabled="true">
          <stringProp name="maximumValue">30</stringProp>
          <stringProp name="minimumValue">1</stringProp>
          <stringProp name="outputFormat"></stringProp>
          <boolProp name="perThread">true</boolProp>
          <stringProp name="randomSeed">666</stringProp>
          <stringProp name="variableName">catId</stringProp>
        </RandomVariableConfig>
        <hashTree/>
        <HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="Welcome" enabled="true">
          <elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
            <collectionProp name="Arguments.arguments"/>
          </elementProp>
          <stringProp name="HTTPSampler.domain">${__P(HOST,konakartoraclecloud.lab.akamas.io)}</stringProp>
          <stringProp name="HTTPSampler.port">8780</stringProp>
          <stringProp name="HTTPSampler.protocol">http</stringProp>
          <stringProp name="HTTPSampler.contentEncoding"></stringProp>
          <stringProp name="HTTPSampler.path">konakart/Welcome.action</stringProp>
          <stringProp name="HTTPSampler.method">GET</stringProp>
          <boolProp name="HTTPSampler.follow_redirects">false</boolProp>
          <boolProp name="HTTPSampler.auto_redirects">true</boolProp>
          <boolProp name="HTTPSampler.use_keepalive">true</boolProp>
          <boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
          <boolProp name="HTTPSampler.BROWSER_COMPATIBLE_MULTIPART">true</boolProp>
          <boolProp name="HTTPSampler.image_parser">true</boolProp>
          <boolProp name="HTTPSampler.concurrentDwn">true</boolProp>
          <stringProp name="HTTPSampler.embedded_url_re"></stringProp>
          <stringProp name="HTTPSampler.connect_timeout"></stringProp>
          <stringProp name="HTTPSampler.response_timeout"></stringProp>
        </HTTPSamplerProxy>
        <hashTree/>
        <TestAction guiclass="TestActionGui" testclass="TestAction" testname="think time" enabled="true">
          <intProp name="ActionProcessor.action">1</intProp>
          <intProp name="ActionProcessor.target">0</intProp>
          <stringProp name="ActionProcessor.duration">100</stringProp>
        </TestAction>
        <hashTree/>
        <HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="SelectCat" enabled="true">
          <elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
            <collectionProp name="Arguments.arguments"/>
          </elementProp>
          <stringProp name="HTTPSampler.domain">${__P(HOST,konakartoraclecloud.lab.akamas.io)}</stringProp>
          <stringProp name="HTTPSampler.port">8780</stringProp>
          <stringProp name="HTTPSampler.protocol">http</stringProp>
          <stringProp name="HTTPSampler.contentEncoding"></stringProp>
          <stringProp name="HTTPSampler.path">konakart/SelectCat.action?catId=${catId}</stringProp>
          <stringProp name="HTTPSampler.method">GET</stringProp>
          <boolProp name="HTTPSampler.follow_redirects">false</boolProp>
          <boolProp name="HTTPSampler.auto_redirects">true</boolProp>
          <boolProp name="HTTPSampler.use_keepalive">true</boolProp>
          <boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
          <boolProp name="HTTPSampler.BROWSER_COMPATIBLE_MULTIPART">true</boolProp>
          <boolProp name="HTTPSampler.image_parser">true</boolProp>
          <boolProp name="HTTPSampler.concurrentDwn">true</boolProp>
          <stringProp name="HTTPSampler.embedded_url_re"></stringProp>
          <stringProp name="HTTPSampler.connect_timeout"></stringProp>
          <stringProp name="HTTPSampler.response_timeout"></stringProp>
        </HTTPSamplerProxy>
        <hashTree/>
        <TestAction guiclass="TestActionGui" testclass="TestAction" testname="think time" enabled="true">
          <intProp name="ActionProcessor.action">1</intProp>
          <intProp name="ActionProcessor.target">0</intProp>
          <stringProp name="ActionProcessor.duration">100</stringProp>
        </TestAction>
        <hashTree/>
        <com.github.johrstrom.listener.PrometheusListener guiclass="com.github.johrstrom.listener.gui.PrometheusListenerGui" testclass="com.github.johrstrom.listener.PrometheusListener" testname="Prometheus Listener" enabled="true">
          <collectionProp name="prometheus.collector_definitions">
            <elementProp name="" elementType="com.github.johrstrom.listener.ListenerCollectorConfig">
              <stringProp name="collector.help">Sampler Response Time</stringProp>
              <stringProp name="collector.metric_name">ResponseTime</stringProp>
              <stringProp name="collector.type">SUMMARY</stringProp>
              <collectionProp name="collector.labels">
                <stringProp name="102727412">label</stringProp>
                <stringProp name="3059181">code</stringProp>
              </collectionProp>
              <stringProp name="collector.quantiles_or_buckets">0.5,0.01|0.85,0.01|0.9,0.01|0.99,0.01;60</stringProp>
              <stringProp name="listener.collector.listen_to">samples</stringProp>
              <stringProp name="listener.collector.measuring">ResponseTime</stringProp>
            </elementProp>
            <elementProp name="" elementType="com.github.johrstrom.listener.ListenerCollectorConfig">
              <stringProp name="collector.help">Success and failure ratio</stringProp>
              <stringProp name="collector.metric_name">Ratio</stringProp>
              <stringProp name="collector.type">SUCCESS_RATIO</stringProp>
              <collectionProp name="collector.labels">
                <stringProp name="102727412">label</stringProp>
                <stringProp name="3059181">code</stringProp>
              </collectionProp>
              <stringProp name="collector.quantiles_or_buckets"></stringProp>
              <stringProp name="listener.collector.listen_to">samples</stringProp>
              <stringProp name="listener.collector.measuring">SuccessRatio</stringProp>
            </elementProp>
          </collectionProp>
        </com.github.johrstrom.listener.PrometheusListener>
        <hashTree/>
        <ResultCollector guiclass="ViewResultsFullVisualizer" testclass="ResultCollector" testname="View Results Tree" enabled="false">
          <boolProp name="ResultCollector.error_logging">false</boolProp>
          <objProp>
            <name>saveConfig</name>
            <value class="SampleSaveConfiguration">
              <time>true</time>
              <latency>true</latency>
              <timestamp>true</timestamp>
              <success>true</success>
              <label>true</label>
              <code>true</code>
              <message>true</message>
              <threadName>true</threadName>
              <dataType>true</dataType>
              <encoding>false</encoding>
              <assertions>true</assertions>
              <subresults>true</subresults>
              <responseData>false</responseData>
              <samplerData>false</samplerData>
              <xml>false</xml>
              <fieldNames>true</fieldNames>
              <responseHeaders>false</responseHeaders>
              <requestHeaders>false</requestHeaders>
              <responseDataOnError>false</responseDataOnError>
              <saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
              <assertionsResultsToSave>0</assertionsResultsToSave>
              <bytes>true</bytes>
              <sentBytes>true</sentBytes>
              <url>true</url>
              <threadCounts>true</threadCounts>
              <idleTime>true</idleTime>
              <connectTime>true</connectTime>
            </value>
          </objProp>
          <stringProp name="filename"></stringProp>
        </ResultCollector>
        <hashTree/>
        <ResultCollector guiclass="SummaryReport" testclass="ResultCollector" testname="Summary Report" enabled="false">
          <boolProp name="ResultCollector.error_logging">false</boolProp>
          <objProp>
            <name>saveConfig</name>
            <value class="SampleSaveConfiguration">
              <time>true</time>
              <latency>true</latency>
              <timestamp>true</timestamp>
              <success>true</success>
              <label>true</label>
              <code>true</code>
              <message>true</message>
              <threadName>true</threadName>
              <dataType>true</dataType>
              <encoding>false</encoding>
              <assertions>true</assertions>
              <subresults>true</subresults>
              <responseData>false</responseData>
              <samplerData>false</samplerData>
              <xml>false</xml>
              <fieldNames>true</fieldNames>
              <responseHeaders>false</responseHeaders>
              <requestHeaders>false</requestHeaders>
              <responseDataOnError>false</responseDataOnError>
              <saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
              <assertionsResultsToSave>0</assertionsResultsToSave>
              <bytes>true</bytes>
              <sentBytes>true</sentBytes>
              <url>true</url>
              <threadCounts>true</threadCounts>
              <idleTime>true</idleTime>
              <connectTime>true</connectTime>
            </value>
          </objProp>
          <stringProp name="filename"></stringProp>
        </ResultCollector>
        <hashTree/>
      </hashTree>
      <kg.apc.jmeter.threads.UltimateThreadGroup guiclass="kg.apc.jmeter.threads.UltimateThreadGroupGui" testclass="kg.apc.jmeter.threads.UltimateThreadGroup" testname="jp@gc - Ultimate Thread Group  - steep slope" enabled="false">
        <collectionProp name="ultimatethreadgroupdata">
          <collectionProp name="-1293805452">
            <stringProp name="1696">55</stringProp>
            <stringProp name="48">0</stringProp>
            <stringProp name="50547">300</stringProp>
            <stringProp name="1507521">1035</stringProp>
            <stringProp name="0"></stringProp>
          </collectionProp>
          <collectionProp name="-1729043283">
            <stringProp name="52">4</stringProp>
            <stringProp name="52593">540</stringProp>
            <stringProp name="1598">20</stringProp>
            <stringProp name="54613">775</stringProp>
            <stringProp name="0"></stringProp>
          </collectionProp>
          <collectionProp name="110060986">
            <stringProp name="53">5</stringProp>
            <stringProp name="55352">800</stringProp>
            <stringProp name="1603">25</stringProp>
            <stringProp name="52500">510</stringProp>
            <stringProp name="0"></stringProp>
          </collectionProp>
          <collectionProp name="-1520762789">
            <stringProp name="54">6</stringProp>
            <stringProp name="1507614">1065</stringProp>
            <stringProp name="1629">30</stringProp>
            <stringProp name="49710">240</stringProp>
            <stringProp name="0"></stringProp>
          </collectionProp>
        </collectionProp>
        <elementProp name="ThreadGroup.main_controller" elementType="LoopController" guiclass="LoopControlPanel" testclass="LoopController" testname="Loop Controller" enabled="true">
          <boolProp name="LoopController.continue_forever">false</boolProp>
          <intProp name="LoopController.loops">-1</intProp>
        </elementProp>
        <stringProp name="ThreadGroup.on_sample_error">continue</stringProp>
      </kg.apc.jmeter.threads.UltimateThreadGroup>
      <hashTree>
        <RandomVariableConfig guiclass="TestBeanGUI" testclass="RandomVariableConfig" testname="Random catId" enabled="true">
          <stringProp name="maximumValue">30</stringProp>
          <stringProp name="minimumValue">1</stringProp>
          <stringProp name="outputFormat"></stringProp>
          <boolProp name="perThread">true</boolProp>
          <stringProp name="randomSeed">666</stringProp>
          <stringProp name="variableName">catId</stringProp>
        </RandomVariableConfig>
        <hashTree/>
        <HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="Welcome" enabled="true">
          <elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
            <collectionProp name="Arguments.arguments"/>
          </elementProp>
          <stringProp name="HTTPSampler.domain">${__P(HOST,konakartoraclecloud.lab.akamas.io)}</stringProp>
          <stringProp name="HTTPSampler.port">8780</stringProp>
          <stringProp name="HTTPSampler.protocol">http</stringProp>
          <stringProp name="HTTPSampler.contentEncoding"></stringProp>
          <stringProp name="HTTPSampler.path">konakart/Welcome.action</stringProp>
          <stringProp name="HTTPSampler.method">GET</stringProp>
          <boolProp name="HTTPSampler.follow_redirects">false</boolProp>
          <boolProp name="HTTPSampler.auto_redirects">true</boolProp>
          <boolProp name="HTTPSampler.use_keepalive">true</boolProp>
          <boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
          <boolProp name="HTTPSampler.BROWSER_COMPATIBLE_MULTIPART">true</boolProp>
          <boolProp name="HTTPSampler.image_parser">true</boolProp>
          <boolProp name="HTTPSampler.concurrentDwn">true</boolProp>
          <stringProp name="HTTPSampler.embedded_url_re"></stringProp>
          <stringProp name="HTTPSampler.connect_timeout"></stringProp>
          <stringProp name="HTTPSampler.response_timeout"></stringProp>
        </HTTPSamplerProxy>
        <hashTree/>
        <HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="SelectCat" enabled="true">
          <elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
            <collectionProp name="Arguments.arguments"/>
          </elementProp>
          <stringProp name="HTTPSampler.domain">${__P(HOST,konakartoraclecloud.lab.akamas.io)}</stringProp>
          <stringProp name="HTTPSampler.port">8780</stringProp>
          <stringProp name="HTTPSampler.protocol">http</stringProp>
          <stringProp name="HTTPSampler.contentEncoding"></stringProp>
          <stringProp name="HTTPSampler.path">konakart/SelectCat.action?catId=${catId}</stringProp>
          <stringProp name="HTTPSampler.method">GET</stringProp>
          <boolProp name="HTTPSampler.follow_redirects">false</boolProp>
          <boolProp name="HTTPSampler.auto_redirects">true</boolProp>
          <boolProp name="HTTPSampler.use_keepalive">true</boolProp>
          <boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
          <boolProp name="HTTPSampler.BROWSER_COMPATIBLE_MULTIPART">true</boolProp>
          <boolProp name="HTTPSampler.image_parser">true</boolProp>
          <boolProp name="HTTPSampler.concurrentDwn">true</boolProp>
          <stringProp name="HTTPSampler.embedded_url_re"></stringProp>
          <stringProp name="HTTPSampler.connect_timeout"></stringProp>
          <stringProp name="HTTPSampler.response_timeout"></stringProp>
        </HTTPSamplerProxy>
        <hashTree/>
        <com.github.johrstrom.listener.PrometheusListener guiclass="com.github.johrstrom.listener.gui.PrometheusListenerGui" testclass="com.github.johrstrom.listener.PrometheusListener" testname="Prometheus Listener" enabled="true">
          <collectionProp name="prometheus.collector_definitions">
            <elementProp name="" elementType="com.github.johrstrom.listener.ListenerCollectorConfig">
              <stringProp name="collector.help">Sampler Response Time</stringProp>
              <stringProp name="collector.metric_name">ResponseTime</stringProp>
              <stringProp name="collector.type">SUMMARY</stringProp>
              <collectionProp name="collector.labels">
                <stringProp name="102727412">label</stringProp>
                <stringProp name="3059181">code</stringProp>
              </collectionProp>
              <stringProp name="collector.quantiles_or_buckets">0.5,0.01|0.85,0.01|0.9,0.01|0.99,0.01;60</stringProp>
              <stringProp name="listener.collector.listen_to">samples</stringProp>
              <stringProp name="listener.collector.measuring">ResponseTime</stringProp>
            </elementProp>
            <elementProp name="" elementType="com.github.johrstrom.listener.ListenerCollectorConfig">
              <stringProp name="collector.help">Success and failure ratio</stringProp>
              <stringProp name="collector.metric_name">Ratio</stringProp>
              <stringProp name="collector.type">SUCCESS_RATIO</stringProp>
              <collectionProp name="collector.labels">
                <stringProp name="102727412">label</stringProp>
                <stringProp name="3059181">code</stringProp>
              </collectionProp>
              <stringProp name="collector.quantiles_or_buckets"></stringProp>
              <stringProp name="listener.collector.listen_to">samples</stringProp>
              <stringProp name="listener.collector.measuring">SuccessRatio</stringProp>
            </elementProp>
          </collectionProp>
        </com.github.johrstrom.listener.PrometheusListener>
        <hashTree/>
        <ResultCollector guiclass="ViewResultsFullVisualizer" testclass="ResultCollector" testname="View Results Tree" enabled="false">
          <boolProp name="ResultCollector.error_logging">false</boolProp>
          <objProp>
            <name>saveConfig</name>
            <value class="SampleSaveConfiguration">
              <time>true</time>
              <latency>true</latency>
              <timestamp>true</timestamp>
              <success>true</success>
              <label>true</label>
              <code>true</code>
              <message>true</message>
              <threadName>true</threadName>
              <dataType>true</dataType>
              <encoding>false</encoding>
              <assertions>true</assertions>
              <subresults>true</subresults>
              <responseData>false</responseData>
              <samplerData>false</samplerData>
              <xml>false</xml>
              <fieldNames>true</fieldNames>
              <responseHeaders>false</responseHeaders>
              <requestHeaders>false</requestHeaders>
              <responseDataOnError>false</responseDataOnError>
              <saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
              <assertionsResultsToSave>0</assertionsResultsToSave>
              <bytes>true</bytes>
              <sentBytes>true</sentBytes>
              <url>true</url>
              <threadCounts>true</threadCounts>
              <idleTime>true</idleTime>
              <connectTime>true</connectTime>
            </value>
          </objProp>
          <stringProp name="filename"></stringProp>
        </ResultCollector>
        <hashTree/>
        <ResultCollector guiclass="SummaryReport" testclass="ResultCollector" testname="Summary Report" enabled="false">
          <boolProp name="ResultCollector.error_logging">false</boolProp>
          <objProp>
            <name>saveConfig</name>
            <value class="SampleSaveConfiguration">
              <time>true</time>
              <latency>true</latency>
              <timestamp>true</timestamp>
              <success>true</success>
              <label>true</label>
              <code>true</code>
              <message>true</message>
              <threadName>true</threadName>
              <dataType>true</dataType>
              <encoding>false</encoding>
              <assertions>true</assertions>
              <subresults>true</subresults>
              <responseData>false</responseData>
              <samplerData>false</samplerData>
              <xml>false</xml>
              <fieldNames>true</fieldNames>
              <responseHeaders>false</responseHeaders>
              <requestHeaders>false</requestHeaders>
              <responseDataOnError>false</responseDataOnError>
              <saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
              <assertionsResultsToSave>0</assertionsResultsToSave>
              <bytes>true</bytes>
              <sentBytes>true</sentBytes>
              <url>true</url>
              <threadCounts>true</threadCounts>
              <idleTime>true</idleTime>
              <connectTime>true</connectTime>
            </value>
          </objProp>
          <stringProp name="filename"></stringProp>
        </ResultCollector>
        <hashTree/>
      </hashTree>
      <kg.apc.jmeter.threads.UltimateThreadGroup guiclass="kg.apc.jmeter.threads.UltimateThreadGroupGui" testclass="kg.apc.jmeter.threads.UltimateThreadGroup" testname="jp@gc - Ultimate Thread Group - Infinite slope" enabled="false">
        <collectionProp name="ultimatethreadgroupdata">
          <collectionProp name="-477682307">
            <stringProp name="1722">60</stringProp>
            <stringProp name="0">0</stringProp>
            <stringProp name="1572771">3600</stringProp>
            <stringProp name="0"></stringProp>
            <stringProp name="0"></stringProp>
          </collectionProp>
        </collectionProp>
        <elementProp name="ThreadGroup.main_controller" elementType="LoopController" guiclass="LoopControlPanel" testclass="LoopController" testname="Loop Controller" enabled="true">
          <boolProp name="LoopController.continue_forever">false</boolProp>
          <intProp name="LoopController.loops">-1</intProp>
        </elementProp>
        <stringProp name="ThreadGroup.on_sample_error">continue</stringProp>
      </kg.apc.jmeter.threads.UltimateThreadGroup>
      <hashTree>
        <RandomVariableConfig guiclass="TestBeanGUI" testclass="RandomVariableConfig" testname="Random catId" enabled="true">
          <stringProp name="maximumValue">30</stringProp>
          <stringProp name="minimumValue">1</stringProp>
          <stringProp name="outputFormat"></stringProp>
          <boolProp name="perThread">true</boolProp>
          <stringProp name="randomSeed">666</stringProp>
          <stringProp name="variableName">catId</stringProp>
        </RandomVariableConfig>
        <hashTree/>
        <HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="Welcome" enabled="true">
          <elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
            <collectionProp name="Arguments.arguments"/>
          </elementProp>
          <stringProp name="HTTPSampler.domain">${__P(HOST,konakartoraclecloud.lab.akamas.io)}</stringProp>
          <stringProp name="HTTPSampler.port">8780</stringProp>
          <stringProp name="HTTPSampler.protocol">http</stringProp>
          <stringProp name="HTTPSampler.contentEncoding"></stringProp>
          <stringProp name="HTTPSampler.path">konakart/Welcome.action</stringProp>
          <stringProp name="HTTPSampler.method">GET</stringProp>
          <boolProp name="HTTPSampler.follow_redirects">false</boolProp>
          <boolProp name="HTTPSampler.auto_redirects">true</boolProp>
          <boolProp name="HTTPSampler.use_keepalive">true</boolProp>
          <boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
          <boolProp name="HTTPSampler.BROWSER_COMPATIBLE_MULTIPART">true</boolProp>
          <boolProp name="HTTPSampler.image_parser">true</boolProp>
          <boolProp name="HTTPSampler.concurrentDwn">true</boolProp>
          <stringProp name="HTTPSampler.embedded_url_re"></stringProp>
          <stringProp name="HTTPSampler.connect_timeout"></stringProp>
          <stringProp name="HTTPSampler.response_timeout"></stringProp>
        </HTTPSamplerProxy>
        <hashTree/>
        <HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="SelectCat" enabled="true">
          <elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
            <collectionProp name="Arguments.arguments"/>
          </elementProp>
          <stringProp name="HTTPSampler.domain">${__P(HOST,konakartoraclecloud.lab.akamas.io)}</stringProp>
          <stringProp name="HTTPSampler.port">8780</stringProp>
          <stringProp name="HTTPSampler.protocol">http</stringProp>
          <stringProp name="HTTPSampler.contentEncoding"></stringProp>
          <stringProp name="HTTPSampler.path">konakart/SelectCat.action?catId=${catId}</stringProp>
          <stringProp name="HTTPSampler.method">GET</stringProp>
          <boolProp name="HTTPSampler.follow_redirects">false</boolProp>
          <boolProp name="HTTPSampler.auto_redirects">true</boolProp>
          <boolProp name="HTTPSampler.use_keepalive">true</boolProp>
          <boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
          <boolProp name="HTTPSampler.BROWSER_COMPATIBLE_MULTIPART">true</boolProp>
          <boolProp name="HTTPSampler.image_parser">true</boolProp>
          <boolProp name="HTTPSampler.concurrentDwn">true</boolProp>
          <stringProp name="HTTPSampler.embedded_url_re"></stringProp>
          <stringProp name="HTTPSampler.connect_timeout"></stringProp>
          <stringProp name="HTTPSampler.response_timeout"></stringProp>
        </HTTPSamplerProxy>
        <hashTree/>
        <com.github.johrstrom.listener.PrometheusListener guiclass="com.github.johrstrom.listener.gui.PrometheusListenerGui" testclass="com.github.johrstrom.listener.PrometheusListener" testname="Prometheus Listener" enabled="true">
          <collectionProp name="prometheus.collector_definitions">
            <elementProp name="" elementType="com.github.johrstrom.listener.ListenerCollectorConfig">
              <stringProp name="collector.help">Sampler Response Time</stringProp>
              <stringProp name="collector.metric_name">ResponseTime</stringProp>
              <stringProp name="collector.type">SUMMARY</stringProp>
              <collectionProp name="collector.labels">
                <stringProp name="102727412">label</stringProp>
                <stringProp name="3059181">code</stringProp>
              </collectionProp>
              <stringProp name="collector.quantiles_or_buckets">0.5,0.01|0.85,0.01|0.9,0.01|0.99,0.01;60</stringProp>
              <stringProp name="listener.collector.listen_to">samples</stringProp>
              <stringProp name="listener.collector.measuring">ResponseTime</stringProp>
            </elementProp>
            <elementProp name="" elementType="com.github.johrstrom.listener.ListenerCollectorConfig">
              <stringProp name="collector.help">Success and failure ratio</stringProp>
              <stringProp name="collector.metric_name">Ratio</stringProp>
              <stringProp name="collector.type">SUCCESS_RATIO</stringProp>
              <collectionProp name="collector.labels">
                <stringProp name="102727412">label</stringProp>
                <stringProp name="3059181">code</stringProp>
              </collectionProp>
              <stringProp name="collector.quantiles_or_buckets"></stringProp>
              <stringProp name="listener.collector.listen_to">samples</stringProp>
              <stringProp name="listener.collector.measuring">SuccessRatio</stringProp>
            </elementProp>
          </collectionProp>
        </com.github.johrstrom.listener.PrometheusListener>
        <hashTree/>
        <ResultCollector guiclass="ViewResultsFullVisualizer" testclass="ResultCollector" testname="View Results Tree" enabled="false">
          <boolProp name="ResultCollector.error_logging">false</boolProp>
          <objProp>
            <name>saveConfig</name>
            <value class="SampleSaveConfiguration">
              <time>true</time>
              <latency>true</latency>
              <timestamp>true</timestamp>
              <success>true</success>
              <label>true</label>
              <code>true</code>
              <message>true</message>
              <threadName>true</threadName>
              <dataType>true</dataType>
              <encoding>false</encoding>
              <assertions>true</assertions>
              <subresults>true</subresults>
              <responseData>false</responseData>
              <samplerData>false</samplerData>
              <xml>false</xml>
              <fieldNames>true</fieldNames>
              <responseHeaders>false</responseHeaders>
              <requestHeaders>false</requestHeaders>
              <responseDataOnError>false</responseDataOnError>
              <saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
              <assertionsResultsToSave>0</assertionsResultsToSave>
              <bytes>true</bytes>
              <sentBytes>true</sentBytes>
              <url>true</url>
              <threadCounts>true</threadCounts>
              <idleTime>true</idleTime>
              <connectTime>true</connectTime>
            </value>
          </objProp>
          <stringProp name="filename"></stringProp>
        </ResultCollector>
        <hashTree/>
        <ResultCollector guiclass="SummaryReport" testclass="ResultCollector" testname="Summary Report" enabled="false">
          <boolProp name="ResultCollector.error_logging">false</boolProp>
          <objProp>
            <name>saveConfig</name>
            <value class="SampleSaveConfiguration">
              <time>true</time>
              <latency>true</latency>
              <timestamp>true</timestamp>
              <success>true</success>
              <label>true</label>
              <code>true</code>
              <message>true</message>
              <threadName>true</threadName>
              <dataType>true</dataType>
              <encoding>false</encoding>
              <assertions>true</assertions>
              <subresults>true</subresults>
              <responseData>false</responseData>
              <samplerData>false</samplerData>
              <xml>false</xml>
              <fieldNames>true</fieldNames>
              <responseHeaders>false</responseHeaders>
              <requestHeaders>false</requestHeaders>
              <responseDataOnError>false</responseDataOnError>
              <saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
              <assertionsResultsToSave>0</assertionsResultsToSave>
              <bytes>true</bytes>
              <sentBytes>true</sentBytes>
              <url>true</url>
              <threadCounts>true</threadCounts>
              <idleTime>true</idleTime>
              <connectTime>true</connectTime>
            </value>
          </objProp>
          <stringProp name="filename"></stringProp>
        </ResultCollector>
        <hashTree/>
      </hashTree>
      <kg.apc.jmeter.threads.UltimateThreadGroup guiclass="kg.apc.jmeter.threads.UltimateThreadGroupGui" testclass="kg.apc.jmeter.threads.UltimateThreadGroup" testname="jp@gc - Ultimate Thread Group - gibbo slope" enabled="false">
        <collectionProp name="ultimatethreadgroupdata">
          <collectionProp name="-80734132">
            <stringProp name="1572">15</stringProp>
            <stringProp name="48">0</stringProp>
            <stringProp name="56313">900</stringProp>
            <stringProp name="1509345">1200</stringProp>
            <stringProp name="0"></stringProp>
          </collectionProp>
          <collectionProp name="-1213623823">
            <stringProp name="50">2</stringProp>
            <stringProp name="1509345">1200</stringProp>
            <stringProp name="48687">120</stringProp>
            <stringProp name="54639">780</stringProp>
            <stringProp name="0"></stringProp>
          </collectionProp>
          <collectionProp name="-1560134527">
            <stringProp name="51">3</stringProp>
            <stringProp name="1513251">1620</stringProp>
            <stringProp name="48873">180</stringProp>
            <stringProp name="50547">300</stringProp>
            <stringProp name="0"></stringProp>
          </collectionProp>
        </collectionProp>
        <elementProp name="ThreadGroup.main_controller" elementType="LoopController" guiclass="LoopControlPanel" testclass="LoopController" testname="Loop Controller" enabled="true">
          <boolProp name="LoopController.continue_forever">false</boolProp>
          <intProp name="LoopController.loops">-1</intProp>
        </elementProp>
        <stringProp name="ThreadGroup.on_sample_error">continue</stringProp>
      </kg.apc.jmeter.threads.UltimateThreadGroup>
      <hashTree>
        <RandomVariableConfig guiclass="TestBeanGUI" testclass="RandomVariableConfig" testname="Random catId" enabled="true">
          <stringProp name="maximumValue">30</stringProp>
          <stringProp name="minimumValue">1</stringProp>
          <stringProp name="outputFormat"></stringProp>
          <boolProp name="perThread">true</boolProp>
          <stringProp name="randomSeed">666</stringProp>
          <stringProp name="variableName">catId</stringProp>
        </RandomVariableConfig>
        <hashTree/>
        <HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="Welcome" enabled="true">
          <elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
            <collectionProp name="Arguments.arguments"/>
          </elementProp>
          <stringProp name="HTTPSampler.domain">${__P(HOST,konakartoraclecloud.lab.akamas.io)}</stringProp>
          <stringProp name="HTTPSampler.port">8780</stringProp>
          <stringProp name="HTTPSampler.protocol">http</stringProp>
          <stringProp name="HTTPSampler.contentEncoding"></stringProp>
          <stringProp name="HTTPSampler.path">konakart/Welcome.action</stringProp>
          <stringProp name="HTTPSampler.method">GET</stringProp>
          <boolProp name="HTTPSampler.follow_redirects">false</boolProp>
          <boolProp name="HTTPSampler.auto_redirects">true</boolProp>
          <boolProp name="HTTPSampler.use_keepalive">true</boolProp>
          <boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
          <boolProp name="HTTPSampler.BROWSER_COMPATIBLE_MULTIPART">true</boolProp>
          <boolProp name="HTTPSampler.image_parser">true</boolProp>
          <boolProp name="HTTPSampler.concurrentDwn">true</boolProp>
          <stringProp name="HTTPSampler.embedded_url_re"></stringProp>
          <stringProp name="HTTPSampler.connect_timeout"></stringProp>
          <stringProp name="HTTPSampler.response_timeout"></stringProp>
        </HTTPSamplerProxy>
        <hashTree/>
        <TestAction guiclass="TestActionGui" testclass="TestAction" testname="think time" enabled="true">
          <intProp name="ActionProcessor.action">1</intProp>
          <intProp name="ActionProcessor.target">0</intProp>
          <stringProp name="ActionProcessor.duration">100</stringProp>
        </TestAction>
        <hashTree/>
        <HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="SelectCat" enabled="true">
          <elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
            <collectionProp name="Arguments.arguments"/>
          </elementProp>
          <stringProp name="HTTPSampler.domain">${__P(HOST,konakartoraclecloud.lab.akamas.io)}</stringProp>
          <stringProp name="HTTPSampler.port">8780</stringProp>
          <stringProp name="HTTPSampler.protocol">http</stringProp>
          <stringProp name="HTTPSampler.contentEncoding"></stringProp>
          <stringProp name="HTTPSampler.path">konakart/SelectCat.action?catId=${catId}</stringProp>
          <stringProp name="HTTPSampler.method">GET</stringProp>
          <boolProp name="HTTPSampler.follow_redirects">false</boolProp>
          <boolProp name="HTTPSampler.auto_redirects">true</boolProp>
          <boolProp name="HTTPSampler.use_keepalive">true</boolProp>
          <boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
          <boolProp name="HTTPSampler.BROWSER_COMPATIBLE_MULTIPART">true</boolProp>
          <boolProp name="HTTPSampler.image_parser">true</boolProp>
          <boolProp name="HTTPSampler.concurrentDwn">true</boolProp>
          <stringProp name="HTTPSampler.embedded_url_re"></stringProp>
          <stringProp name="HTTPSampler.connect_timeout"></stringProp>
          <stringProp name="HTTPSampler.response_timeout"></stringProp>
        </HTTPSamplerProxy>
        <hashTree/>
        <TestAction guiclass="TestActionGui" testclass="TestAction" testname="think time" enabled="true">
          <intProp name="ActionProcessor.action">1</intProp>
          <intProp name="ActionProcessor.target">0</intProp>
          <stringProp name="ActionProcessor.duration">100</stringProp>
        </TestAction>
        <hashTree/>
        <com.github.johrstrom.listener.PrometheusListener guiclass="com.github.johrstrom.listener.gui.PrometheusListenerGui" testclass="com.github.johrstrom.listener.PrometheusListener" testname="Prometheus Listener" enabled="true">
          <collectionProp name="prometheus.collector_definitions">
            <elementProp name="" elementType="com.github.johrstrom.listener.ListenerCollectorConfig">
              <stringProp name="collector.help">Sampler Response Time</stringProp>
              <stringProp name="collector.metric_name">ResponseTime</stringProp>
              <stringProp name="collector.type">SUMMARY</stringProp>
              <collectionProp name="collector.labels">
                <stringProp name="102727412">label</stringProp>
                <stringProp name="3059181">code</stringProp>
              </collectionProp>
              <stringProp name="collector.quantiles_or_buckets">0.5,0.01|0.85,0.01|0.9,0.01|0.99,0.01;60</stringProp>
              <stringProp name="listener.collector.listen_to">samples</stringProp>
              <stringProp name="listener.collector.measuring">ResponseTime</stringProp>
            </elementProp>
            <elementProp name="" elementType="com.github.johrstrom.listener.ListenerCollectorConfig">
              <stringProp name="collector.help">Success and failure ratio</stringProp>
              <stringProp name="collector.metric_name">Ratio</stringProp>
              <stringProp name="collector.type">SUCCESS_RATIO</stringProp>
              <collectionProp name="collector.labels">
                <stringProp name="102727412">label</stringProp>
                <stringProp name="3059181">code</stringProp>
              </collectionProp>
              <stringProp name="collector.quantiles_or_buckets"></stringProp>
              <stringProp name="listener.collector.listen_to">samples</stringProp>
              <stringProp name="listener.collector.measuring">SuccessRatio</stringProp>
            </elementProp>
          </collectionProp>
        </com.github.johrstrom.listener.PrometheusListener>
        <hashTree/>
        <ResultCollector guiclass="ViewResultsFullVisualizer" testclass="ResultCollector" testname="View Results Tree" enabled="false">
          <boolProp name="ResultCollector.error_logging">false</boolProp>
          <objProp>
            <name>saveConfig</name>
            <value class="SampleSaveConfiguration">
              <time>true</time>
              <latency>true</latency>
              <timestamp>true</timestamp>
              <success>true</success>
              <label>true</label>
              <code>true</code>
              <message>true</message>
              <threadName>true</threadName>
              <dataType>true</dataType>
              <encoding>false</encoding>
              <assertions>true</assertions>
              <subresults>true</subresults>
              <responseData>false</responseData>
              <samplerData>false</samplerData>
              <xml>false</xml>
              <fieldNames>true</fieldNames>
              <responseHeaders>false</responseHeaders>
              <requestHeaders>false</requestHeaders>
              <responseDataOnError>false</responseDataOnError>
              <saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
              <assertionsResultsToSave>0</assertionsResultsToSave>
              <bytes>true</bytes>
              <sentBytes>true</sentBytes>
              <url>true</url>
              <threadCounts>true</threadCounts>
              <idleTime>true</idleTime>
              <connectTime>true</connectTime>
            </value>
          </objProp>
          <stringProp name="filename"></stringProp>
        </ResultCollector>
        <hashTree/>
        <ResultCollector guiclass="SummaryReport" testclass="ResultCollector" testname="Summary Report" enabled="false">
          <boolProp name="ResultCollector.error_logging">false</boolProp>
          <objProp>
            <name>saveConfig</name>
            <value class="SampleSaveConfiguration">
              <time>true</time>
              <latency>true</latency>
              <timestamp>true</timestamp>
              <success>true</success>
              <label>true</label>
              <code>true</code>
              <message>true</message>
              <threadName>true</threadName>
              <dataType>true</dataType>
              <encoding>false</encoding>
              <assertions>true</assertions>
              <subresults>true</subresults>
              <responseData>false</responseData>
              <samplerData>false</samplerData>
              <xml>false</xml>
              <fieldNames>true</fieldNames>
              <responseHeaders>false</responseHeaders>
              <requestHeaders>false</requestHeaders>
              <responseDataOnError>false</responseDataOnError>
              <saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
              <assertionsResultsToSave>0</assertionsResultsToSave>
              <bytes>true</bytes>
              <sentBytes>true</sentBytes>
              <url>true</url>
              <threadCounts>true</threadCounts>
              <idleTime>true</idleTime>
              <connectTime>true</connectTime>
            </value>
          </objProp>
          <stringProp name="filename"></stringProp>
        </ResultCollector>
        <hashTree/>
      </hashTree>
    </hashTree>
  </hashTree>
</jmeterTestPlan>

The provided run_test.sh wraps the command to execute the test, and requires as an argument the URL of the target KonaKart instance.

#!/bin/bash

set -e

TARGET="${1:-konakartoracle.lab.akamas.io}"
IMAGE=chiabre/jmeter_plugins

THIS_DIR=`dirname "${BASH_SOURCE[0]}"`
NOW=`date +'%Y%m%d%H%M%S'`

docker run --rm --name jmeter -i \
  --network akamas \
  -v /home/ubuntu/konakart-oracle/jmeter:/tmp \
  -w /tmp \
  -p 9270:9270 \
  $IMAGE \
  -t Konakart_optimizePerf.jmx -JTARGET_HOST=$TARGET

cp "${THIS_DIR}/jmeter.log" "${THIS_DIR}/jmeter.log.$NOW"

Optimization Setup

System

Our modeled system includes the following components:

The oracle component that models the Oracle Database instance on oradb.mycompany.com, whose parameters are the targets of our optimization
The webapp component that models the KonaKart service running on konakart.mycompany.com, providing the performance metrics used to validate the system’s SLOs

The first step is defining the system (system.yaml):

name: oracle system
description: Multi-tier application model featuring Java technology and Oracle Database on cloud

Here’s the definition of our oracle component (oracle.yaml), including the parameters needed to connect to the database instances and the filters to fetch metrics from Prometheus.

name: oracle
description: Oracle DB for konakart
componentType: Oracle Database 19c
properties:
  connection:
    user: user
    password: password
    host: oradb.mycompany.com
    service: konakart
    port: 1521

  prometheus:
    instance: oradb

Notice: in order to update the init parameter the user requires the ALTER SYSTEM privilege.

Here’s the definition of the konakart component (konakart.yaml), containing the filters to fetch the metrics from Prometheus:

name: konakart
description: Web application component for e2e metrics
componentType: Web Application
properties:
  prometheus:
    instance: jmeter
    job: jmeter

We can create the system by running the following command:

akamas create system system.yaml

We can then create the components by running the following commands:

akamas create component oracle.yaml 'oracle system'
akamas create component konakart.yaml 'oracle system'

Telemetry

Since we are using Prometheus to extract the database metrics we can leverage the Prometheus provider, which already includes the queries needed for the Oracle and JMetric queries for the metrics we need. To use the Prometheus provider we need to define a telemetry instance (prom.yaml):

provider: Prometheus
config:
  address: akamas.mycompany.com
  port: 9090

  logLevel: DETAILED

We can now create the telemetry instance and attach it to our system by running:

akamas create telemetry-instance prom.yaml 'oracle system'

Workflow

This section outlines the steps performed during the execution of the experiments.

Stop KonaKart

Using an Executor operator we run a command to stop the KonaKart instance using the script provided with the installation, then check the service is not running anymore with a custom script:

- name: stop konakart
  operator: Executor
  arguments:
    retries: 0
    command: bash /opt/konakart/bin/stopkonakart.sh
    host:
      hostname: konakart.mycompany.com
      username: ubuntu
      key: keyfile

- name: check konakart stop
  operator: Executor
  arguments:
    retries: 0
    command: bash /opt/scripts/check_konakart_stop.sh
    host:
      hostname: konakart.mycompany.com
      username: ubuntu
      key: keyfile

Attached you can find the referenced script check_konakart_stop.sh:

#! /bin/bash
numOfKonakartRunning=$(ps aux | grep "/opt/konakart/bin" | grep -v "grep" | wc -l)

if  [ $numOfKonakartRunning -eq 0 ] ;
then
    echo "konakart not running"
    exit 0
else
    echo "konakart is still running"
    exit 1
fi

Configure the Oracle instance

Using the OracleConfigurator operator to update the Oracle initialization parameters with the new configuration. Then with the Executor operator, we run some custom scripts to restart the database instance to apply the new parameters and check for a successful startup. Additionally, in case of a failed startup, the script of the last task restores a backup of the default configuration file (spfile), restarts the database, and returns an error code to notify Akamas that the tested configuration is invalid:

- name: update oracle parameters
  operator: OracleConfigurator
  arguments:
    retries: 0
    component: oracle

- name: restart oracle
  operator: Executor
  arguments:
    retries: 0
    command: sudo su -c 'bash /opt/scripts/restart_db.sh' - oracle
    host:
      hostname: oradb.mycompany.com
      username: opc
      key: oraclekey

- name: check oracle status
  operator: Executor
  arguments:
    retries: 0
    command: sudo su -c 'bash /opt/scripts/check_db.sh' - oracle
    host:
      hostname: oradb.mycompany.com
      username: opc
      key: oraclekey

Attached you can find the referenced script check_db.sh:

#!/bin/bash

ORA_OWNER=oracle

DB_NAME=kona_kart
BAK_FILE=~/akamas/spfilekona.ora

# [o] to avoid self-grep
PMON=[o]ra_pmon_kona

if ps -ef | grep -q $PMON ; then
  echo "[INFO] - Database running"

else
  if sudo test -f "$BAK_FILE"; then
    echo "[INFO] - Trying to restore Oracle database using file $BAK_FILE"
    echo "[INFO] - ORACLE_HOME value is: $ORACLE_HOME"

    cp "$BAK_FILE" "$ORACLE_HOME/dbs/"

    echo "[INFO] - Trying to restart oracle"

    echo "[INFO] - Stopping the database"
    $ORACLE_HOME/bin/dbshut $ORACLE_HOME
    # $ORACLE_HOME/bin/srvctl stop database -d $DB_NAME
    cat $ORACLE_HOME/rdbms/log/shutdown.log

    echo "[INFO] - Starting the database"
    $ORACLE_HOME/bin/dbstart $ORACLE_HOME
    # $ORACLE_HOME/bin/srvctl start database -d $DB_NAME
    cat $ORACLE_HOME/rdbms/log/startup.log

    ps -ef | grep $PMON

  else
    echo "[ERROR] - The spfile does not exist in current folder!"
  fi

  exit 255
fi

exit 0

and restart_db.sh:

#!/bin/bash

ORA_OWNER=oracle

DB_NAME=kona_kart

# [o] to avoid self-grep
PMON=[o]ra_pmon_kona

echo "[INFO] - Stopping the database"
$ORACLE_HOME/bin/dbshut $ORACLE_HOME
# $ORACLE_HOME/bin/srvctl stop database -d $DB_NAME
cat $ORACLE_HOME/rdbms/log/shutdown.log
echo "[INFO] - Starting the database"
$ORACLE_HOME/bin/dbstart $ORACLE_HOME
# $ORACLE_HOME/bin/srvctl start database -d $DB_NAME
cat $ORACLE_HOME/rdbms/log/startup.log

ps -ef | grep $PMON

exit 0

Restart the KonaKart instance

We then define the Executor operator tasks that restart the KonaKart service and check it is running correctly:

- name: start konakart
  operator: Executor
  arguments:
    retries: 0
    command: /opt/konakart/bin/startkonakart.sh
    host:
      hostname: konakart.mycompany.com
      username: ubuntu
      key: keyfile

- name: wait for konakart
  operator: Sleep
  arguments:
    retries: 0
    seconds: 30

- name: check konakart service status
  operator: Executor
  arguments:
    retries: 0
    command: bash /opt/scripts/check_konakart_start.sh
    host:
      hostname: konakart.mycompany.com
      username: ubuntu
      key: keyfile

Attached you can find the referenced script:

#! /bin/bash
numOfKonakartRunning=$(ps aux | grep "/opt/konakart/bin" | grep -v "grep" | wc -l)

if  [ $numOfKonakartRunning -eq 1 ] ;
then
    echo "konakart running"
    exit 0
else
    echo "konakart not running"
    exit 1
fi

Run the workload

Finally, we define a task that uses the Executor operator to run the JMeter load test against the KonaKart instance:

- name: test
  operator: Executor
  arguments:
    retries: 0
    command: bash /opt/scripts/run_test.sh konakart.mycompany.com
    host:
      hostname: akamas.mycompany.com
      username: ubuntu
      key: keyfile

Complete workflow

By putting together all the tasks defined above we come up with the following workflow definition (workflow.yaml):

name: konakart_workfl
tasks:

- name: stop konakart
  operator: Executor
  arguments:
    retries: 0
    command: bash /opt/konakart/bin/stopkonakart.sh
    host:
      hostname: konakart.mycompany.com
      username: ubuntu
      key: keyfile

- name: check konakart stop
  operator: Executor
  arguments:
    retries: 0
    command: bash /opt/scripts/check_konakart_stop.sh
    host:
      hostname: konakart.mycompany.com
      username: ubuntu
      key: keyfile

- name: update oracle parameters
  operator: OracleConfigurator
  arguments:
    retries: 0
    component: oracle

- name: restart oracle
  operator: Executor
  arguments:
    retries: 0
    command: sudo su -c 'bash /opt/scripts/restart_db.sh' - oracle
    host:
      hostname: oradb.mycompany.com
      username: opc
      key: oraclekey

- name: wait for oracle
  operator: Sleep
  arguments:
    retries: 0
    seconds: 60

- name: check oracle status
  operator: Executor
  arguments:
    retries: 0
    command: sudo su -c 'bash /opt/scripts/check_db.sh' - oracle
    host:
      hostname: oradb.mycompany.com
      username: opc
      key: oraclekey

- name: start konakart
  operator: Executor
  arguments:
    retries: 0
    command: /opt/konakart/bin/startkonakart.sh
    host:
      hostname: konakart.mycompany.com
      username: ubuntu
      key: keyfile

- name: wait for konakart
  operator: Sleep
  arguments:
    retries: 0
    seconds: 30

- name: check konakart service status
  operator: Executor
  arguments:
    retries: 0
    command: bash /opt/scripts/check_konakart_start.sh
    host:
      hostname: konakart.mycompany.com
      username: ubuntu
      key: keyfile

- name: test
  operator: Executor
  arguments:
    retries: 0
    command: bash /opt/scripts/run_test.sh konakart.mycompany.com
    host:
      hostname: akamas.mycompany.com
      username: ubuntu
      key: keyfile

We can create the workflow by running:

akamas create workflow workflow.yaml

Study

This study aims to minimize the memory allocated for the Oracle database while under a simulated load of the typical traffic, without impacting the SLOs.

This section provides a step-by-step description of the study definition.

Goal

Here’s the definition of the goal for our study, which is to minimize the memory allocated by Oracle to the SGA and PGA memory areas. The constraints ensure that any tested configuration that does not operate within the defined SLOs is flagged as not valid. In particular, the followings are required:

the peak error rate must not exceed 5 errors per seconds
the transaction throughput must not decrease more than 10% with respect to the baseline
the response time must not increase more than 20% with respect to the baseline

goal:
  objective: minimize
  function:
    formula: oracle.oracle_sga_total_size + oracle.oracle_pga_target_size
  constraints:
    absolute:
    - konakart.transactions_error_rate <= 5.0
    relativeToBaseline:
    - konakart.transactions_throughput >= -10%
    - konakart.transactions_response_time <= +20%

Windowing

We define a window to consider only the data points after the ramp-up time of the load test:

windowing:
  type: trim
  trim: [200s, 30s]
  task: testde

Parameters to optimize

For this study, we are trying to optimize the size of the two main memory areas, meaning the Program Global Area and the Shared Global Area.

Given our goal, we set the domains of the parameters to explore only sizes smaller than the baseline.

parametersSelection:
- name: oracle.sga_target
  domain: [256, 1536]
- name: oracle.sga_max_size
  domain: [256, 1536]
- name: oracle.pga_aggregate_target
  domain: [256, 512]

The following constraint prevents Akamas from exploring configurations that we already know Oracle won’t validate:

parameterConstraints:
- name:  SGA
  formula: oracle.sga_target <= oracle.sga_max_size

Steps

We are going to add to our study two steps:

A baseline step, in which we configure the default values for the memory parameters as discovered from previous manual executions.
An optimization step, where we perform 200 experiments to search the set of parameters that best satisfies our goal.

Here’s what these steps look like:

steps:
- name: Baseline step
  type: baseline
  values:
    oracle.sga_target: 1536
    oracle.pga_aggregate_target: 512
    oracle.sga_max_size: 1536
    oracle.pga_aggregate_limit: 2048

- name: Optimization step
  type: optimize
  numberOfExperiments: 200
  maxFailedExperiments: 200

Complete study

Here’s the study definition (study.yaml) for optimizing the Oracle instance:

name: Minimize Oracle memory for KonaKart
system: oracle system
workflow: konakart_workfl
goal:
  objective: minimize
  function:
    formula: oracle.oracle_sga_total_size + oracle.oracle_pga_target_size
  constraints:
    absolute:
    - konakart.transactions_error_rate <= 5.0
    relativeToBaseline:
    - konakart.transactions_throughput >= -10%
    - konakart.transactions_response_time <= +20%

windowing:
  type: trim
  trim: [200s, 30s]
  task: test

parametersSelection:
- name: oracle.sga_target
  domain: [256, 1536]
- name: oracle.sga_max_size
  domain: [256, 1536]
- name: oracle.pga_aggregate_target
  domain: [256, 512]

parameterConstraints:
- name:  SGA
  formula: oracle.sga_target <= oracle.sga_max_size

trialAggregation: AVG
numberOfTrials: 1

steps:
- name: Baseline step
  type: baseline
  values:
    oracle.sga_target: 1536
    oracle.pga_aggregate_target: 512
    oracle.sga_max_size: 1536
    oracle.pga_aggregate_limit: 2048

- name: Optimization step
  type: optimize
  numberOfExperiments: 200
  maxFailedExperiments: 200

You can create the study by running:

akamas create study study.yaml

You can then start it by running:

akamas start study 'Minimize Oracle memory for KonaKart'

Guidelines for optimizing Oracle RDS

This page provides a list of best practices when optimizing an Oracle RDS with Akamas.

Optimization setup

System setup

Every RDS instance fetches the initialization parameters from the definition of the DB parameter group it is bound to. A best practice is to create a dedicated copy of the baseline group for the target database, in order to avoid impacting any other database that may share the same configuration object.

Workflow setup

DB parameter groups must be configured through the the the dedicated Amazon RDS API interface. A simple way to implement this step in the Akamas workflow is to save the tested configuration in a configuration file and submit it through a custom executor leveraging the AWS Command Line Interface. The following snippets show an example tuning an instance with id oracletest, bound to configuration group named test-oracle:

name: tune RDS Oracle
tasks:
  - name: Generate Oracle configuration
    operator: FileConfigurator
    arguments:
      sourcePath: oracle/rdsscripts/oraconf.template
      targetPath: oracle/oraconf
      component: oracle

  - name: Update conf
    operator: Executor
    arguments:
      command: bash ~/oracle/rdsscripts/rds_update.sh test-oracle ~/oracle/
      component: oracle

  - name: Reboot Oracle
    operator: Executor
    arguments:
      command: bash ~/oracle/rdsscripts/rds_reboot.sh oracletest
      component: oracle

# rest of the workflow...

Where the following is an example of the configuration template oraconf.template:

pga_aggregate_target	${oracle.pga_aggregate_target}
pga_aggregate_limit	${oracle.pga_aggregate_target}
db_cache_size	${oracle.db_cache_size}
java_pool_size	${oracle.java_pool_size}
large_pool_size	${oracle.large_pool_size}
log_buffer	${oracle.log_buffer}

The following script rds_update.sh updates the configuration. It requires the name of the target DB parameter group and path of the temporary folder containing the generated configuration:

#!/bin/bash

set -euo pipefail

GROUP_NAME=$1
TMPFLD=$2

TS=`date +'%y%m%d%H%M%S'`

cd ${TMPFLD}

cp oraconf conf.$TS

AWK_CODE='{n=$2} $2~/[0-9]+m$/ {gsub(/m$/,"",n);n=n*1024*1024} $2~/[0-9]+k$/ {gsub(/k$/,"",n);n=n*1024} {print "ParameterName="$1",ParameterValue="n",ApplyMethod=pending-reboot"}'

echo Applying params: ; awk "${AWK_CODE}" oraconf

aws rds modify-db-parameter-group \
  --db-parameter-group-name ${GROUP_NAME} \
  --parameters `awk "${AWK_CODE}" oraconf`

# dump full new conf
aws rds describe-db-parameters \
  --db-parameter-group-name ${GROUP_NAME} | jq -c '.Parameters[] | {ParameterName, ParameterValue}' > full_pars_dump.$TS.jsonl

# if configuration changed wrt last one (ie: this is the first trial of the new experiment) wait for update propagation
diff -q `ls conf\.* | tail -n2` || (echo 'Configuration changed. Waiting for propagation.' && sleep 420 )

The following script rds_reboot.sh restarts the RDS instance with the provided id:

#!/bin/bash

set -u

INST_ID=$1

DELAY_SEC=30
RETRIES=60

aws rds reboot-db-instance --db-instance-identifier $INST_ID | jq -c '.DBInstance | {DBInstanceIdentifier, Engine, DBInstanceStatus}'

echo "Waiting for ${INST_ID}"

for i in `seq $RETRIES`; do
    sleep $DELAY_SEC
    status=`aws rds describe-db-instances --db-instance-identifier $INST_ID | jq -r '.DBInstances[].DBInstanceStatus'`
    echo "${INST_ID}: ${status}"
    [ "${status}" = 'available' ] && exit 0
    [ "${status}" = 'incompatible-parameters' ] && exit 255
done

exit 255

Optimizing a MySQL server database running Sysbench

In this example study, we are going to optimize a MySQL instance by setting the performance goal of maximizing the throughput of operations towards the database.

As regards the workload generation, in this example we are going to use Sysbench, a popular open-source benchmarking suite.

To import the results of the benchmark into Akamas, we are going to use a custom script to convert its output to a CSV file that can be parsed by the CSV provider.

Environment Setup

In order to run the Sysbench suite against a MySQL installation, you need to first install and configure the two software. In the following , we will assume that both MySQL and Sysbench will run on the same machine, to obtain more significant results in terms of performance you might want to run them on separate hosts.

A set of scripts is provided to support all the setup steps.

MySQL Installation

To install MySQL please follow the official documentation. In the following, we will make a few assumptions on the location of the configuration files, the user running the server, and the location of the datafiles. These assumptions are based on a default installation of MySQL on an Ubuntu instance performed via apt.

Configuration file: /etc/mysql/conf.d/mysql.cnf
MySQL user: mysql
MySQL root user password: root

This is a template for the configuration file mysql.cnf.template

[mysqld]
socket=/tmp/mysql.sock
ssl=0
innodb_buffer_pool_size     = ${mysql.mysql_innodb_buffer_pool_size}
innodb_thread_sleep_delay   = ${mysql.mysql_innodb_thread_sleep_delay}
innodb_flush_method         = ${mysql.mysql_innodb_flush_method}
innodb_log_file_size        = ${mysql.mysql_innodb_log_file_size}
innodb_thread_concurrency   = ${mysql.mysql_innodb_thread_concurrency}
innodb_max_dirty_pages_pct  = ${mysql.mysql_innodb_max_dirty_pages_pct}
innodb_read_ahead_threshold = ${mysql.mysql_innodb_read_ahead_threshold}

If your installation of MySQL has different default values for these parameters please update the provided scripts accordingly.

Sysbench Installation

To install Sysbench on an ubuntu machine run the following command

sudo apt install sysbench

To verify your installation of Sysbench and initialize the database you can use the scripts provided here below and place them in the /home/ubuntu/scripts folder. Move in the folder, make sure MySQL is already running, and run the init-db.sh script.

This is the init-db.sh script:

#!/bin/bash
set -e
cd "$(dirname "$0")"
mysql -u root -proot -e "CREATE DATABASE IF NOT EXISTS sbtest"

HOST="--mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-password=root"
sysbench oltp_read_only --tables=10 --table_size=1000000 --threads=100 $HOST --time=300 --max-requests=0 --report-interval=1 --rand-type=uniform --db-driver=mysql --mysql-db=sbtest --mysql-ssl=off prepare| tee -a res.warmup.ro.txt

#sleep 5
#sudo systemctl stop mysql
#
##Create the backup
#echo "Backing up the database"
#sudo rm -rf /tmp/backup
#sudo mkdir /tmp/backup
#sudo rsync -r --progress /var/lib/mysql /tmp/backup/
#sleep 2
#
#sudo systemctl start mysql
#sudo systemctl status mysql

This script will:

connect to your MySQL installation
create a sbtest database for the test
run the Sysbench data generation phase to populate the database

The init-db.sh script contains some information on the amount of data to generate. The default setting is quite small and should be used for testing purposes. You can then modify the test to suit your benchmarking needs. If you update the script please also update the run_benchmark.sh script accordingly.

Optimization Setup

Here follow a step by step explanation of all the required configuration for this example. You can find attached a zip file that contains all of the YAML files for your convenience.

System

In this example, we are interested in optimizing MySQL settings and measuring the peak throughput measured using Sysbench. Hence, we are going to create two components:

A mysql component which represents the MySQL instance, including all the configuration parameters
A Sysbenchcomponent which represents Sysbench and contains the custom metrics reported by the benchmark

The Sysbench component

MySQL is a widespread technology and Akamas provides a specific Optimization Pack to support its optimization. Sysbench, on the other hand, is a benchmark application and is not yet supported by a specific optimization pack. In order to use it in our study, we will need to define its metrics first. This operation can be done once and the created component type can be used across many systems.

First, build a metrics.yamlfile with the following content:

---
metrics:
  - name: throughput
    description: The throughput of the database
    unit: tps

  - name: response_time_avg
    description: The average response time of the database
    unit: milliseconds

  - name: response_time_95th
    description: The response time 95th percentile of the database
    unit: milliseconds

  - name: duration
    description: The duration of the task (load or benchmark execution)
    unit: seconds

You can now create the metrics by issuing the following command:

akamas create metrics metrics.yaml

Finally, create a file named sysbench.yaml with the following definition of the component:

name: Sysbench
description: >
  Sysbench benchmark. It is a purely synthetic benchmark that can create isolated contention on system  resources. Each of the benchmark’s transaction imposes some load on three specific resources: CPU, disk I/O, and locks. It is also used to simulate a database workload.
parameters: []
metrics:
  - name: throughput
  - name: response_time_avg
  - name: response_time_95th
  - name: duration

You can now create the component by issuing the following command:

akamas create component-type sysbench.yaml

Model the system

Here’s the definition of our system (system.yaml):

name: MySQL-Sysbench
description: A system for optimizing MySQL with Sysbench

Here’s the definition of our mysql component (mysql.yaml):

name: mysql
description: MySQL
componentType: MySQL 8.0
properties:
    hostname: gibbo.dev.akamas.io
    sshPort: "22"
    username: ubuntu
    sourcePath: /home/ubuntu/scripts/my.cnf.template
    targetPath: /home/ubuntu/scripts/my.cnf
    prometheus:
      instance: gibbo
      job: mysql_exporter
    key: |
      -----BEGIN RSA PRIVATE KEY-----

      -----END RSA PRIVATE KEY-----

Please make sure the component properties are correct for your environment (e.g. hostname, username, key, file paths, etc.).

Here’s the definition of our Sysbench component (sysbench.yaml):

name: Sysbench
description: Sysbench Benchmark for database systems
componentType: Sysbench

We can create the system by running:

akamas create system system.yaml

We can then create the components by running the following commands:

akamas create component mysql.yaml MySQL-Sysbench
akamas create component sysbench.yaml MySQL-Sysbench

Workflow

A workflow for optimizing MySQL can be structured in 6 tasks:

Reset Sysbench data
Configure MySQL
Restart MySQL
Launch the benchmark
Parse the benchmark results

Here below you can find the scripts that codify these tasks.

This is the restart-mysql.sh script:

#!/usr/bin/env bash
set -e

cd "$(dirname "$0")"

#Stop the DB
echo "Stopping MySQL"
sudo systemctl stop mysql &> /dev/null
#sudo systemctl status mysql

#Apply Configuration
echo "Copying the configuration"
sudo cp my.cnf /etc/mysql/conf.d/mysql.cnf

sync; sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"; sync

#Restart DB
echo "Restarting the database"
sudo systemctl start mysql &> /dev/null
#sudo systemctl status mysql
sleep 2

This is the clean_bench.sh script:

#!/usr/bin/env bash
set -e
cd "$(dirname "$0")"

if ! test -d results || [[ -z "$(ls -A results)" ]]; then
    echo "First iteration"
    mkdir -p results
    exit 0
fi

rm -rf results
mkdir -p results

This is the run_test.sh script:

#!/bin/bash
set -e

cd "$(dirname "$0")"

HOST="--mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-password=root"
sysbench oltp_read_only --tables=10 --table_size=1000000 --threads=100 $HOST --time=60 --max-requests=0 --report-interval=1 --rand-type=uniform --db-driver=mysql --mysql-db=sbtest --mysql-ssl=off run | tee -a results/res.txt

This file parse_csv.sh script:

#!/bin/bash
set -e
cd "$(dirname "$0")"
OUTFILE=$(pwd)/results/output.csv
INFILE=$(pwd)/results/res.txt
COMPONENT=Sysbench
epoch_now=$(date +"%s")
num_samples=$(grep -c "thds" ${INFILE})
epoch_start=$(($epoch_now - $num_samples))
cat $INFILE | while read line
do
        ts_sysbench=$(echo $line | cut -d' ' -f2)
        # CSV header
        [ "$ts_sysbench" == "started!" ] && echo "component,ts,throughput,response_time_95pct" > ${OUTFILE} && continue
        # CSV body
        tps=$(echo $line | cut -d' ' -f7)
        lat_95pct=$(echo $line | cut -d' ' -f14)
        # skip unless it's a metric line
        echo $line | grep -q "thds" || continue
        ts_seconds=$(echo $ts_sysbench | sed 's/s//')
        epoch_current=$(($epoch_start + $ts_seconds))
        ts=$(date -d @$(($epoch_current)) "+%Y-%m-%d %H:%M:%S")
        echo "${COMPONENT},$ts,$tps,$lat_95pct" >> ${OUTFILE}
done

Here is the complete Akamas workflow for this example (workflow.yaml):

name: MySQL-Sysbench
tasks:

  - name: Reset Sysbench data
    operator: Executor
    arguments:
      command: "bash /home/ubuntu/scripts/clean_bench.sh"
      component: mysql

  - name: Configure MySQL
    operator: FileConfigurator
    arguments:
      component: mysql

  - name: Restart MySQL
    operator: Executor
    arguments:
      command: "bash /home/ubuntu/scripts/restart-mysql.sh"
      component: mysql

  - name: test
    operator: Executor
    arguments:
      command: "bash /home/ubuntu/scripts/run_test.sh"
      component: mysql

  - name: Parse CSV results
    operator: Executor
    arguments:
      command: "bash /home/ubuntu/scripts/parse_csv.sh"
      component: mysql

You can create the workflow by running:

akamas create workflow workflow.yaml

Telemetry

We are going to use Akamas telemetry capability to import the metrics related to Sysbench benchmark results, in particular, the transaction throughput and latency. To achieve this we can leverage the Akamas CSV provider, which extracts metrics from CSV files. The CSV file is the one produced in the last task of the workflow of the study.

This telemetry provider can be installed running:

akamas install telemetry-provider telemetry/providers/csv.yaml

To start using the provider, we need to define a telemetry instance (csv.yaml):

provider: csv
config:
  protocol: scp
  address: gibbo.dev.akamas.io
  username: ubuntu
  authType: key
  auth: |
    -----BEGIN RSA PRIVATE KEY-----

    -----END RSA PRIVATE KEY-----
  remoteFilePattern: /home/ubuntu/scripts/results/output.csv
  csvFormat: horizontal
  componentColumn: component
  timestampColumn: ts
  timestampFormat: yyyy-MM-dd HH:mm:ss

metrics:
- metric: throughput
  datasourceMetric: throughput
- metric: response_time_95th
  datasourceMetric: response_time_95pct

Please make sure the telemetry configuration is correct for your environment (e.g. hostname, username, key, file paths, etc.).

You can create the telemetry instance and attach it to the system by running:

akamas create telemetry-instance csv.yaml MySQL-Sysbench

Study

In this example, we are going to leverage Akamas AI-driven optimization capabilities to maximize MySQL database transaction throughput, as measured by the Sysbench benchmark.

Here is the Akamas study definition (study.yaml):

name: MySQL Sysbench Tuning
description: Tuning of mysql-8 with Sysbench benchmark
system: MySQL-Sysbench
workflow: MySQL-Sysbench

goal:
  objective: maximize
  function:
    formula: Sysbench.throughput
  constraints: []

# Akamas score automatically trim 1m of warm-up and 1m of tear-down
windowing:
  task: test
  type: trim
  trim: [1m, 1m]

# We optimize some common MySQL parameters
parametersSelection:
  - name: mysql.mysql_innodb_buffer_pool_size
    domain: [5242880, 10485760]
  - name: mysql.mysql_innodb_thread_sleep_delay
    domain: [1,3000]
  - name: mysql.mysql_innodb_flush_method
  - name: mysql.mysql_innodb_log_file_size
  - name: mysql.mysql_innodb_thread_concurrency
    domain: [0, 4]
  - name: mysql.mysql_innodb_max_dirty_pages_pct
  - name: mysql.mysql_innodb_read_ahead_threshold

# The metrics we are interested in
metricsSelection:
  - Sysbench.throughput
  - Sysbench.response_time_95th

# Each experiment can run multiple trials to evaluate stability
numberOfTrials: 1

steps:
# We first run a baseline experiment with default values
  - name: baseline
    type: baseline
    renderParameters: ["mysql.*"]

# We then optimize for 200 experiments
  - name: optimize
    type: optimize
    optimizer: AKAMAS
    numberOfExperiments: 200
    maxFailedExperiments: 200
    renderParameters: ["mysql.*"]

You may need to update some parameter domains based on your environment (e.g. InnoDB buffer pool size maximum value depends on your server available memory)

You can create the study by running:

akamas create study study.yaml

You can then start it by running:

akamas start study "MySQL Sysbench Tuning"

You can now follow the study progress using the UI and explore the results using the Analysis and Metrics tabs.

Optimizing a MySQL server database running OLTPBench

In this example study, we are going to optimize a MySQL instance by setting the performance goal of maximizing the throughput of operations towards the database.

As regards the workload generation, in this example we are going to use OLTPBench, a popular open-source benchmarking suite for databases. OLTPBench supports several benchmarks, in this example we will be using Synthetic Resource Stresser.

To import the results of the benchmark into Akamas, we are going to use a custom script to convert its output to a CSV file that can be parsed by the CSV provider.

Environment Setup

In order to run the OLTP Benchmark suite against a MySQL installation, you need to first install and configure the two software. In the following, we will assume that both MySQL and OLTP will run on the same machine, to obtain more significant results in terms of performance you might want to run them on separate hosts.

MySQL Installation

Datafile location: /var/lib/mysql
Configuration file: /etc/mysql/conf.d/mysql.cnf
MySQL user: mysql
MySQL root user password: root

This is a template for the configuration file mysql.cnf.template

[mysqld]

innodb_buffer_pool_size     = 134217728
innodb_thread_sleep_delay   = 10000
innodb_flush_method         = fsync
innodb_log_file_size        = 50331648
innodb_thread_concurrency   = 0
innodb_max_dirty_pages_pct  = 10.00000
innodb_read_ahead_threshold = 56

If your installation of MySQL has different default values for these parameters please update the provided scripts accordingly.

OLTP Installation

To install OLTP you can download a pre-built version here or build it from the official repository. In the following, we will assume that OLTP is installed in the /home/ubuntu/oltp folder.

To verify your installation of OLTP and initialize the database you can download the following set of scripts and place them in the /home/ubuntu/scripts folder. Move in the folder and run the init-db.sh script.

This is the init-db.sh script:

#!/bin/bash
set -e


cd "$(dirname "$0")"
cd ../oltp
mysql -u root -proot -e "CREATE DATABASE resourcestresser"
./oltpbenchmark --bench resourcestresser --config scripts/resourcestresser.xml --create=true --load=true

sleep 5
sudo systemctl stop mysql

#Create the backup
echo "Backing up the database"
sudo rm -rf /tmp/backup
sudo mkdir /tmp/backup
sudo rsync -r --progress /var/lib/mysql /tmp/backup/
sleep 2

sudo systemctl start mysql
sudo systemctl status mysql

This script will:

connect to your MySQL installation
create a resourcestresser database for the test
run the OLTP data generation phase to populate the database
backup the initialized database under /tmp/backup

The resourcestresser.xml file contains the workload for the application. The default setting is quite small and should be used for testing purposes. You can then modify the test to suit your benchmarking needs.

Optimization Setup

Here follow a step-by-step explanation of all the required configurations for this example.

System

In this example, we are interested in optimizing MySQL settings and measuring the peak throughput measured using OLTPBench. Hence, we are going to create two components:

A mysql component which represents the MySQL instance, including all the configuration parameters
An OLTP component which represents OLTPBench and contains the custom metrics reported by the benchmark

The OLTP component

MySQL is a widespread technology and Akamas provides a specific Optimization Pack to support its optimization. OLTP, on the other hand, is a benchmark application and is not yet supported by a specific optimization pack. In order to use it in our study, we will need to define its metrics first. This operation can be done once and the created component type can be used across many systems.

First, build a metrics.yamlfile with the following content:

---
metrics:
  - name: throughput
    description: The throughput of the database
    unit: tps

  - name: response_time_avg
    description: The average response time of the database
    unit: milliseconds

  - name: response_time_min
    description: The minimum response time of the database
    unit: milliseconds

  - name: response_time_25th
    description: The response time 25th percentile of the database
    unit: milliseconds

  - name: response_time_median
    description: The response time median of the database
    unit: milliseconds

  - name: response_time_75th
    description: The response time 75th percentile of the database
    unit: milliseconds

  - name: response_time_90th
    description: The response time 90th percentile of the database
    unit: milliseconds

  - name: response_time_95th
    description: The response time 95th percentile of the database
    unit: milliseconds

  - name: response_time_99th
    description: The response time 99th percentile of the database
    unit: milliseconds

  - name: response_time_max
    description: The maximum response time of the database
    unit: milliseconds

  - name: duration
    description: The duration of the task (load or benchmark execution)
    unit: seconds

You can now create the metrics by issuing the following command:

akamas create metrics metrics.yaml

Finally, create a file named resourcestresser.yaml with the following definition of the component:

name: ResourceStresser
description: >
  ResourceStresser benchmark from OLTPBench for database systems. It is a
  purely synthetic benchmark that can create isolated contention on the system
  resources. Each of the benchmark’s transactions imposes some load on three
  specific resources: CPU, disk I/O, and locks.
parameters: []
metrics:
  - name: throughput
  - name: response_time_avg
  - name: response_time_max
  - name: response_time_min
  - name: response_time_25th
  - name: response_time_median
  - name: response_time_75th
  - name: response_time_90th
  - name: response_time_95th
  - name: response_time_99th
  - name: duration

You can now create the metrics by issuing the following command:

akamas create component-type resourcestresser.yaml

Model the system

Here’s the definition of our system (system.yaml):

name: MySQL-ResourceStresser
description: A system for evaluating MySQL with OLTP Benchmark

Here’s the definition of our mysql component (mysql.yaml):

name: mysql
description: MySQL
componentType: MySQL 8.0
properties:
    hostname: gibbo.dev.akamas.io
    sshPort: "22"
    username: ubuntu
    sourcePath: /home/ubuntu/scripts/my.cnf.template
    targetPath: /home/ubuntu/scripts/my.cnf
    prometheus:
      instance: gibbo
      job: mysql_exporter
    key: |
            -----BEGIN RSA PRIVATE KEY-----

            -----END RSA PRIVATE KEY-----

Here’s the definition of our OLTP component (oltp.yaml):

name: OLTP
description: OLTP Benchmark for database systems
componentType: ResourceStresser

We can create the system by running:

akamas create system system.yaml

We can then create the components by running:

akamas create component mysql.yaml MySQL-ResourceStresser
akamas create component oltp.yaml MySQL-ResourceStresser

Workflow

A workflow for optimizing MySQL can be structured into 6 tasks:

Reset OLTPBench data
Configure MySQL
Restart MySQL
Launch the benchmark
Parse the benchmark results

Here below you can find the scripts that codify these tasks.

This is the restart-mysql.sh script:

#!/usr/bin/env bash
set -e

cd "$(dirname "$0")"

#Stop the DB
echo "Stopping MySQL"
sudo systemctl stop mysql &> /dev/null
#sudo systemctl status mysql

#Apply Configuration
echo "Copying the configuration"
sudo cp my.cnf /etc/mysql/conf.d/mysql.cnf

#Drop data
echo "Dropping the data"
sudo rm -rf /var/lib/mysql
#Create the backup
# sudo rsync -r --progress /var/lib/mysql /tmp/backup/

#Restore the backup data
echo "Restoring the DB"
sudo rsync -r --progress /tmp/backup/mysql /var/lib/
sudo chown -R mysql: /var/lib/mysql

sync; sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"; sync

#Restart DB
echo "Restarting the database"
sudo systemctl start mysql &> /dev/null
#sudo systemctl status mysql
sleep 2

This is the clean_bench.sh script:

#!/usr/bin/env bash
set -e
cd "$(dirname "$0")"


if ! test -d results || [[ -z "$(ls -A results)" ]]; then
    echo "First iteration"
    mkdir -p results
    exit 0
fi

rm -rf results
mkdir -p results

This is the run_test.sh script:

#!/bin/bash
set -e

cd "$(dirname "$0")"

HOST="--mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-password=root"
sysbench oltp_read_only --tables=10 --table_size=1000000 --threads=100 $HOST --time=60 --max-requests=0 --report-interval=1 --rand-type=uniform --db-driver=mysql --mysql-db=sbtest --mysql-ssl=off run | tee -a results/res.txt

This file parse_csv.sh script:

#!/bin/bash
set -e
cd "$(dirname "$0")"
OUTFILE=$(pwd)/results/output.csv
INFILE=$(pwd)/results/res.txt
COMPONENT=Sysbench
epoch_now=$(date +"%s")
num_samples=$(grep -c "thds" ${INFILE})
epoch_start=$(($epoch_now - $num_samples))
cat $INFILE | while read line
do
        ts_sysbench=$(echo $line | cut -d' ' -f2)
        # CSV header
        [ "$ts_sysbench" == "started!" ] && echo "component,ts,throughput,response_time_95pct" > ${OUTFILE} && continue
        # CSV body
        tps=$(echo $line | cut -d' ' -f7)
        lat_95pct=$(echo $line | cut -d' ' -f14)
        # skip unless it's a metric line
        echo $line | grep -q "thds" || continue
        ts_seconds=$(echo $ts_sysbench | sed 's/s//')
        epoch_current=$(($epoch_start + $ts_seconds))
        ts=$(date -d @$(($epoch_current)) "+%Y-%m-%d %H:%M:%S")
        echo "${COMPONENT},$ts,$tps,$lat_95pct" >> ${OUTFILE}
done

Here is the complete Akamas workflow for this example (workflow.yaml):

name: MySQL-ResourceStresser
tasks:

  - name: Reset OLTP data
    operator: Executor
    arguments:
      command: "bash /home/ubuntu/scripts/clean_bench.sh"
      component: mysql

  - name: Configure MySQL
    operator: FileConfigurator
    arguments:
      component: mysql

  - name: Restart MySQL
    operator: Executor
    arguments:
      command: "/home/ubuntu/scripts/restart-mysql.sh"
      component: mysql

  - name: test
    operator: Executor
    arguments:
      command: "cd /home/ubuntu/oltp && ./oltpbenchmark --bench resourcestresser --config /home/ubuntu/scripts/resourcestresser.xml --execute=true -s 5 --output out"
      component: mysql

  - name: Parse csv results
    operator: Executor
    arguments:
      command: "bash /home/ubuntu/scripts/parse_csv.sh"
      component: mysql

You can create the workflow by running:

akamas create workflow workflow.yaml

Telemetry

We are going to use Akamas telemetry capability to import the metrics related to OLTPBench benchmark results, in particular the throughput of operations. To achieve this we can leverage the Akamas CSV provider, which extracts metrics from CSV files. The CSV file is the one produced in the last task of the workflow of the study.

This telemetry provider can be installed by running:

akamas install telemetry-provider telemetry/providers/csv.yaml

To start using the provider, we need to define a telemetry instance (csv.yaml):

provider: csv
config:
  protocol: scp
  address: gibbo.dev.akamas.io
  username: ubuntu
  authType: key
  auth: |
    -----BEGIN RSA PRIVATE KEY-----

    -----END RSA PRIVATE KEY-----
  remoteFilePattern: /home/ubuntu/output.csv
  csvFormat: horizontal
  componentColumn: component
  timestampColumn: ts
  timestampFormat: yyyy-MM-dd HH:mm:ss

metrics:
- metric: throughput
  datasourceMetric: throughput
- metric: response_time_avg
  datasourceMetric: response_time_avg
- metric: response_time_max
  datasourceMetric: response_time_max
- metric: response_time_min
  datasourceMetric: response_time_min
- metric: response_time_25th
  datasourceMetric: response_time_25th
- metric: response_time_median
  datasourceMetric: response_time_median
- metric: response_time_75th
  datasourceMetric: response_time_75th
- metric: response_time_90th
  datasourceMetric: response_time_90th
- metric: response_time_95th
  datasourceMetric: response_time_95th
- metric: response_time_99th
  datasourceMetric: response_time_99th

Please make sure the telemetry configuration is correct for your environment (e.g. hostname, username, key, file paths, etc.).

You can create the telemetry instance and attach it to the system by running:

akamas create telemetry-instance csv.yaml MySQL-ResourceStresser

Study

In this example, we are going to leverage Akamas AI-driven optimization capabilities to maximize MySQL database query throughput, as measured by the OLTPBench benchmark.

Here is the Akamas study definition (study.yaml):

name: MySQL Tuning
description: Tuning of mysql-8 with OLTPBenchmark using benchmark resourcestresser
system: MySQL-ResourceStresser
workflow: MySQL-ResourceStresser

goal:
  objective: maximize
  function:
    formula: max(1, OLTP.throughput)
  constraints: []

windowing:
  task: test
  type: trim
  trim: [1m, 1m]


parametersSelection:
  # postgres
  - name: mysql.mysql_innodb_buffer_pool_size
    domain: [5242880, 10485760]
  - name: mysql.mysql_innodb_thread_sleep_delay
  - name: mysql.mysql_innodb_flush_method
  - name: mysql.mysql_innodb_log_file_size
  - name: mysql.mysql_innodb_thread_concurrency
    domain: [0, 4]
  - name: mysql.mysql_innodb_max_dirty_pages_pct
  - name: mysql.mysql_innodb_read_ahead_threshold


trialAggregation: AVG
numberOfTrials: 1

steps:
  - name: baseline
    type: baseline
    renderParameters: ["mysql.*"]

  - name: optimize
    type: optimize
    optimizer: AKAMAS
    numberOfExperiments: 200
    maxFailedExperiments: 200
    renderParameters: ["mysql.*"]

You can create the study by running:

akamas create study study.yaml

You can then start it by running:

akamas start study "MySql Tuning"

You can now follow the study progress using the UI and explore the results using the Analysis and Metrics tabs.

Optimizing cost of a Kubernetes application while preserving SLOs in production

In this example, you will use Akamas live optimization to minimize the cost of a Kubernetes deployment, while preserving application performance and reliability requirements.

Environment setup

In this example, you need:

an Akamas instance
a Kubernetes cluster, with a deployment to be optimized
the kubectl command installed in the Akamas instance, configured to access the target Kubernetes and with privileges to get and update the deployment configurations
a supported telemetry data source (e.g. Prometheus or Dynatrace) configured to collect metrics from the target Kubernetes cluster

Optimization setup

Optimization packs

This example leverages the following optimization packs:

System

The system represents the Kubernetes deployment to be optimized (let's call it "frontend"). You can create a system.yaml manifest like this:

name: frontend
description: Kubernetes frontend deployment

Create the new system resource:

akamas create system system.yaml

The system will then have two components:

A Kubernetes container component, which contains container-level metrics like CPU usage and parameters like CPU limits
A Web Application component, which contains service-level metrics like throughput and response time

In this example, we assume the deployment to be optimized is called frontend, with a container named server, and is located within the boutique namespace. We also assume that Dynatrace is used as a telemetry provider.

Kubernetes component

Create a component-container.yaml manifest like the following:

name: container
description: Kubernetes container, part of the frontend deployment
componentType: Kubernetes Container
properties:
  dynatrace:
    type: CONTAINER_GROUP_INSTANCE
    kubernetes:
      namespace: boutique
      containerName: server
      basePodName: frontend-*

Then run:

akamas create component component-container.yaml frontend

Now create a component-webapp.yaml manifest like the following:

name: webapp
description: The service related to the frontend deployment
componentType: Web Application
properties:
  dynatrace:
    id: <TELEMETRY_DYNATRACE_WEBAPP_ID>

Then run:

akamas create component component-webapp.yaml frontend

Workflow

The workflow in this example is composed of three main steps:

Update the Kubernetes deployment manifest with the Akamas recommended deployment parameters (CPU and memory limits)
Apply the new parameters (kubectl apply)
Wait for the rollout to complete
Sleep for 30 minutes (observation interval)

Create a workflow.yaml manifest like the following:

name: frontend
tasks:
  - name: configure
    operator: FileConfigurator
    arguments:
      source:
        hostname: mymachine
        username: user
        key: /home/user/.ssh/key
        path: frontend.yaml.templ
      target:
        hostname: mymachine
        username: user
        key: /home/user/.ssh/key
        path: frontend.yaml

  - name: apply
    operator: Executor
    arguments:
      timeout: 5m
      host:
        hostname: mymachine
        username: user
        key: /home/user/.ssh/key
      command: kubectl apply -f frontend.yaml

  - name: verify
    operator: Executor
    arguments:
      timeout: 5m
      host:
        hostname: mymachine
        username: user
        key: /home/user/.ssh/key
      command: kubectl rollout status --timeout=5m deployment/frontend -n boutique;

  - name: observe
    operator: Sleep
    arguments:
      seconds: 1800

Then run:

akamas create workflow workflow.yaml

Telemetry

Create the telemetry.yamlmanifest like the following:

provider: Dynatrace
config:
  url: <YOUR_DYNATRACE_URL>
  token: <YOUR_DYNATRACE_TOKEN>
  pushEvents: false

Then run:

akamas create telemetry-instance telemetry.yaml frontend

Study

In this live optimization:

the goal is to reduce the cost of the Kubernetes deployment. In this example, the cost is based on the amount of CPU and memory limits (assuming requests = limits).
the approval mode is set to manual, a new recommendation is generated daily
to avoid impacting application performance, constraints are specified on desired response times and error rates
to avoid impacting application reliability, constraints are specified on peak resource usage and out-of-memory kills
the parameters to be tuned are the container CPU and memory limits (we assume requests=limits in the deployment file)

Create a study.yaml manifest like the following:

name: frontend
system: frontend
workflow: frontend
requireApproval: true

goal:
  objective: minimize
  function:
    formula: (((container.container_cpu_limit/1000) * 3) + (container.container_memory_limit/(1024*1024*1024)))
  constraints:
    absolute:
      - name: Response Time
        formula: webapp.requests_response_time <= 300
      - name: Error Rate
        formula: webapp.service_error_rate:max <= 0.05
      - name: Container CPU saturation
        formula: container.container_cpu_util:p95 < 0.8
      - name: Container memory saturation
        formula: container.container_memory_util:max < 0.7
      - name: Container out-of-memory kills
        formula: container.container_oom_kills_count == 0

parametersSelection:
  - name: container.cpu_limit
    domain: [300, 1000]
  - name: container.memory_limit
    domain: [800, 1536]

windowing:
  type: trim
  trim: [5m, 0m]
  task: observe

workloadsSelection:
  - name: webapp.requests_throughput

steps:
  - name: baseline
    type: baseline
    numberOfTrials: 48
    values:
      container.cpu_limit: 1000
      container.memory_limit: 1536

  - name: optimize
    type: optimize
    numberOfTrials: 48
    numberOfExperiments: 100
    numberOfInitExperiments: 0
    maxFailedExperiments: 50

Then run:

akamas create study study.yaml

You can now follow the live optimization progress and explore the results using the Akamas UI for Live optimizations.

Optimizing a live full-stack deployment (K8s + JVM)

The following study shows how to optimize the cost of a Kubernetes deployment considering also some JMV parameters in the optimization. Notice that, except for the JVM portion, the study is the same as the study in the example Optimizing a live K8s deployment.

name: Optimize Kubernetes Container + JVM
system: My Deployment + JVM
workflow: workflow

goal:
  objective: minimize
  function:
    formula: cost_csv.cost
  constraints:
    absolute:
      - name: adservice-response-time-slo
        formula: adservice_istio.istio_incoming_response_time_90_ms <= 10
      - name: adservice-restart-slo
        formula: adservice_pod.k8s_pod_container_restarts:max == 0
      - name: adservice-failures-slo
        formula: adservice_istio.istio_incoming_failed_transactions / adservice_istio.istio_incoming_success_transactions <= 0.1

workloadsSelection:
  - name: adservice_istio.istio_incoming_service_throughput

optimizerOptions:
  onlineMode: RECOMMEND
  experimentsWithBeta: 0
  safetyFactor: 0.6
  explorationFactor: 0.1
  optimizerEngineVersion: 1.5.3rc5


windowing:
  type: trim
  trim: [1m, 0s]
  task: Test

parametersSelection:
  - name: adservice_jvm.jvm_inlineSmallCode
  - name: adservice_jvm.jvm_maxInlineSize
    domain: [1, 500]
  - name: adservice_jvm.jvm_maxHeapFreeRatio
  - name: adservice_jvm.jvm_minHeapFreeRatio
  - name: adservice_jvm.jvm_gcType
  - name: adservice_jvm.jvm_parallelGCThreads
    domain: [1, 2]
  - name: adservice_jvm.jvm_concurrentGCThreads
    domain: [1, 2]
  - name: adservice_jvm.jvm_maxTenuringThreshold
  - name: adservice_jvm.jvm_compilationThreads
    domain: [2, 4]
  - name: adservice_jvm.jvm_newSize
    domain: [16, 300]
  - name: adservice_jvm.jvm_survivorRatio
  - name: adservice_jvm.jvm_minHeapSize
    domain: [16, 300]
  - name: adservice_jvm.jvm_maxHeapSize
    domain: [16, 300]
  - name: adservice_jvm.jvm_alwaysPreTouch
  - name: adservice_jvm.jvm_useTransparentHugePages
  - name: adservice.cpu_request
    domain: [10, 300]
  - name: adservice.memory_request
    domain: [64, 512]

parameterConstraints:
  - name: concGC_below_parGC
    formula: adservice_jvm.jvm_concurrentGCThreads <= adservice_jvm.jvm_parallelGCThreads
  - name: newSize_10Of_maxHeap
    formula: adservice_jvm.jvm_newSize >= 0.1 * adservice_jvm.jvm_maxHeapSize
  - name: newSize_90Of_maxHeap
    formula: adservice_jvm.jvm_newSize <= 0.9 * adservice_jvm.jvm_maxHeapSize
  - name: minHeap_10Of_maxHeap
    formula: adservice_jvm.jvm_minHeapSize >= 0.1 * adservice_jvm.jvm_maxHeapSize
  - name: minHeap_below_maxHeap
    formula: adservice_jvm.jvm_minHeapSize <= adservice_jvm.jvm_maxHeapSize

steps:
  - name: baseline
    type: baseline
    numberOfTrials: 14
    values:
      adservice.cpu_request: 300
      adservice.memory_request: 200
      adservice_jvm.jvm_gcType: "Parallel"

  - name: in_domain_baseline
    type: preset
    numberOfTrials: 14
    values:
      adservice.cpu_request: 300
      adservice.memory_request: 200
      adservice_jvm.jvm_gcType: "Parallel"
      adservice_jvm.jvm_maxHeapSize: 96
      adservice_jvm.jvm_minHeapSize: 16
      adservice_jvm.jvm_inlineSmallCode: 2048
      adservice_jvm.jvm_maxInlineSize: 35
      adservice_jvm.jvm_maxHeapFreeRatio: 100
      adservice_jvm.jvm_minHeapFreeRatio: 1
      adservice_jvm.jvm_parallelGCThreads: 1
      adservice_jvm.jvm_concurrentGCThreads: 1
      adservice_jvm.jvm_maxTenuringThreshold: 15
      adservice_jvm.jvm_compilationThreads: 2
      adservice_jvm.jvm_newSize: 16
      adservice_jvm.jvm_survivorRatio: 8
      adservice_jvm.jvm_alwaysPreTouch: '-AlwaysPreTouch'
      adservice_jvm.jvm_useTransparentHugePages: '-UseTransparentHugePages'

  - name: optimize
    type: optimize
    numberOfTrials: 14
    numberOfExperiments: 5
    numberOfInitExperiments: 0
    maxFailedExperiments: 1000

  - name: serial_GC_pretouch_thp
    type: preset
    numberOfTrials: 14
    values:
      adservice.cpu_request: 300
      adservice.memory_request: 200
      adservice_jvm.jvm_maxHeapSize: 96
      adservice_jvm.jvm_minHeapSize: 16
      adservice_jvm.jvm_inlineSmallCode: 2048
      adservice_jvm.jvm_maxInlineSize: 35
      adservice_jvm.jvm_maxHeapFreeRatio: 100
      adservice_jvm.jvm_minHeapFreeRatio: 1
      adservice_jvm.jvm_parallelGCThreads: 1
      adservice_jvm.jvm_concurrentGCThreads: 1
      adservice_jvm.jvm_maxTenuringThreshold: 15
      adservice_jvm.jvm_compilationThreads: 2
      adservice_jvm.jvm_newSize: 16
      adservice_jvm.jvm_survivorRatio: 8
      adservice_jvm.jvm_gcType: "Serial"
      adservice_jvm.jvm_alwaysPreTouch: '+AlwaysPreTouch'
      adservice_jvm.jvm_useTransparentHugePages: '+UseTransparentHugePages'

  - name: optimize2
    type: optimize
    numberOfTrials: 14
    numberOfExperiments: 5
    numberOfInitExperiments: 0
    maxFailedExperiments: 1000

  - name: ConcMarkSweep_GC
    type: preset
    numberOfTrials: 14
    values:
      adservice.cpu_request: 300
      adservice.memory_request: 200
      adservice_jvm.jvm_maxHeapSize: 96
      adservice_jvm.jvm_minHeapSize: 16
      adservice_jvm.jvm_inlineSmallCode: 2048
      adservice_jvm.jvm_maxInlineSize: 35
      adservice_jvm.jvm_maxHeapFreeRatio: 100
      adservice_jvm.jvm_minHeapFreeRatio: 1
      adservice_jvm.jvm_parallelGCThreads: 1
      adservice_jvm.jvm_concurrentGCThreads: 1
      adservice_jvm.jvm_maxTenuringThreshold: 15
      adservice_jvm.jvm_compilationThreads: 2
      adservice_jvm.jvm_newSize: 16
      adservice_jvm.jvm_survivorRatio: 8
      adservice_jvm.jvm_gcType: "ConcMarkSweep"
      adservice_jvm.jvm_alwaysPreTouch: '+AlwaysPreTouch'
      adservice_jvm.jvm_useTransparentHugePages: '+UseTransparentHugePages'

  - name: optimize3
    type: optimize
    numberOfTrials: 14
    numberOfExperiments: 5
    numberOfInitExperiments: 0
    maxFailedExperiments: 1000

  - name: G1_GC
    type: preset
    numberOfTrials: 14
    values:
      adservice.cpu_request: 300
      adservice.memory_request: 200
      adservice_jvm.jvm_maxHeapSize: 96
      adservice_jvm.jvm_minHeapSize: 16
      adservice_jvm.jvm_inlineSmallCode: 2048
      adservice_jvm.jvm_maxInlineSize: 35
      adservice_jvm.jvm_maxHeapFreeRatio: 100
      adservice_jvm.jvm_minHeapFreeRatio: 1
      adservice_jvm.jvm_parallelGCThreads: 1
      adservice_jvm.jvm_concurrentGCThreads: 1
      adservice_jvm.jvm_maxTenuringThreshold: 15
      adservice_jvm.jvm_compilationThreads: 2
      adservice_jvm.jvm_newSize: 16
      adservice_jvm.jvm_survivorRatio: 8
      adservice_jvm.jvm_gcType: "G1"
      adservice_jvm.jvm_alwaysPreTouch: '+AlwaysPreTouch'
      adservice_jvm.jvm_useTransparentHugePages: '+UseTransparentHugePages'

  - name: optimize4
    type: optimize
    numberOfTrials: 14
    numberOfExperiments: 5
    numberOfInitExperiments: 0
    maxFailedExperiments: 1000

  - name: ParNew_GC
    type: preset
    numberOfTrials: 14
    values:
      adservice.cpu_request: 300
      adservice.memory_request: 200
      adservice_jvm.jvm_maxHeapSize: 96
      adservice_jvm.jvm_minHeapSize: 16
      adservice_jvm.jvm_inlineSmallCode: 2048
      adservice_jvm.jvm_maxInlineSize: 35
      adservice_jvm.jvm_maxHeapFreeRatio: 100
      adservice_jvm.jvm_minHeapFreeRatio: 1
      adservice_jvm.jvm_parallelGCThreads: 1
      adservice_jvm.jvm_concurrentGCThreads: 1
      adservice_jvm.jvm_maxTenuringThreshold: 15
      adservice_jvm.jvm_compilationThreads: 2
      adservice_jvm.jvm_newSize: 16
      adservice_jvm.jvm_survivorRatio: 8
      adservice_jvm.jvm_gcType: "ParNew"
      adservice_jvm.jvm_alwaysPreTouch: '+AlwaysPreTouch'
      adservice_jvm.jvm_useTransparentHugePages: '+UseTransparentHugePages'

  - name: optimize5
    type: optimize
    numberOfTrials: 14
    numberOfExperiments: 1000
    numberOfInitExperiments: 0
    maxFailedExperiments: 1000

Note that in this study a present step has been specified with the specific value (category) of categorical parameters, as otherwise, the optimizer would only consider a category that has already been seen in the configurations history. For more details, please refer to the Optimize step page of the reference guide.