This page provides code snippets for each Akamas construct, using Konakart, a Java-based e-commerce application (https://www.konakart.com/), as a reference and modeling it as a 2-tier application with an application server layer and a database (MySQL) layer.
For simplicity's sake, the optimization use case is defined as follows:
Optimization scope: JVM parameters
Optimization goal: reduce the application memory footprint (i.e. the maximum heap that can be allocated by the JVM)
Optimization constraints: no impact on service level (i.e. response times, throughput, and error rate have to be the same before/after the optimization)
This is the YAML file providing the system definition for the reference use case:
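As a reference, a minimal sketch of what such a system definition could look like is shown below (the name and description are purely illustrative; refer to the System template for the exact schema):

```yaml
# Illustrative system manifest: an Akamas system is essentially a named container
# for components and telemetry instances.
name: konakart
description: Konakart e-commerce application, modeled as a 2-tier system (application server + MySQL)
```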
Since the optimization scope only considers JVM parameters, only a Java component needs to be modeled.
The following snippet defines a Konakart Java component based on a java-openjdk-11 component type.
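A hedged sketch of such a component definition (the name and description are illustrative; see the Component template for the exact schema):

```yaml
# Illustrative component manifest for the JVM layer
name: konakart-jvm
description: JVM of the Konakart application server
componentType: java-openjdk-11
```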
A different optimization scope would have required a different modeling approach. For instance, a broader optimization scope also including the Linux layer of both the application and database servers, plus the database layer, would have required 3 additional components: 2 distinct components (of the same component type) for the 2 Linux layers and 1 component for the database layer.
From a monitoring perspective, only two data sources are mandatory: the JVM layer, which provides the goal metric needed to evaluate the score (heap size), and the web application layer, which provides the metrics needed to evaluate the optimization constraints (response time, throughput, and error rate). The web application component is based on a component type designed to ease the collection of end-user metrics and has no parameters attached. Generally speaking, defining a component based on the web application component type is handy whenever an optimization involves the execution of a performance test and end-to-end metrics need to be evaluated.
The following snippet defines a Konakart component based on a web application component type.
A more comprehensive approach to telemetries could include additional metrics and data sources to provide a better understanding of the system behavior. The example provided only focuses on the mandatory metrics and the components needed to model them.
Here is a (simplified) component types definition for the reference use case.
These component types are included in the "Java" and "Web Application" Optimization Packs and are available in any Akamas installation.
Here is a (simplified) definition of the Java parameters related to the Java component type used for the reference use case.
These parameters are included in the "Java" and "Web Application" Optimization Packs and are available in any Akamas installation.
Here is a (simplified) definition of the web application metrics related to the web application component type used for the reference use case.
These metrics are included in the "Web Application" Optimization Pack and are available in any Akamas installation.
To create a custom optimization pack, the following fixed directory structure and several YAML manifests need to be created.
The optimizationPack.yaml file is the manifest of the optimization pack to be created; it should always be named optimizationPack and have the following structure (an example manifest is provided after the field list):
name (string, required): the name of the optimization pack. It should not contain spaces.
description (string, required): a description to characterize the optimization pack.
weight (integer, required, must satisfy weight > 0): a weight to be associated with the optimization pack. This field is used for licensing purposes.
version (string, required, must match the regexp \d.\d.\d): the version of the optimization pack.
tags (array of strings, optional, default: an empty array): a set of tags to make the optimization pack more easily searchable and discoverable.
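Putting these fields together, an illustrative optimizationPack.yaml manifest (all values are examples) could look like this:

```yaml
name: MyTechnology
description: Optimization pack for MyTechnology
weight: 1
version: 1.0.0
tags:
  - mytechnology
  - example
```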
The component-types directory should contain the manifests of the component types to be included in the optimization pack. No particular naming constraint is enforced on those manifests. See Component Types template for details on the structure of those manifests.
The metrics directory should contain the manifests of the groups of metrics to be included in the optimization pack. No particular naming constraint is enforced on those manifests. See Metric template for details on the structure of those manifests.
The parameters directory should contain the manifests of the groups of parameters to be included in the optimization pack. No particular naming constraint is enforced on those manifests. See Parameter template for details on the structure of those manifests.
The following command needs to be executed to produce the final JSON descriptor:
After this, the optimization pack can be installed and used.
This guide describes how to apply the Akamas approach to the optimization of some real-world cases and how to set up a test environment for experimenting with Akamas.
In this example, we’ll optimize Online Boutique, a demo e-commerce application running on microservices, by tuning the resources allocated to a selection of pods. This is a common use case where we want to minimize the cost associated with running an application without impacting the SLO.
Notice: all the required artifacts are published in this public repository.
The test environment includes the following instances:
Akamas: the instance running Akamas.
Cluster: an instance hosting a Minikube cluster.
You can configure the Minikube cluster using the scripts provided in the public repository by running the command
To gather metrics about the application we will use Prometheus. It will be automatically configured by applying the artifacts in the repository with the following command:
The targeted system is Online Boutique, a microservice-based demo application. In the same namespace, a deployment running the load generator will stress the boutique and forward the performance metrics to Prometheus.
To configure the application and the load generator on your (Minikube) cluster, apply the definitions provided in the public repository by running the following command:
In this section, we will guide you through the steps required to set up the optimization on Akamas.
If you have not installed the Kubernetes optimization pack yet, take a look at the Kubernetes optimization pack page to proceed with the installation.
Notice: the artifacts to create the Akamas entities can be found in the public repository, under the akamas
directory.
Here’s the definition of the system containing our components and telemetry-instances for this example:
To create the system run the following command:
We’ll use a component of type WebApplication to represent the Online Boutique application at a high level. To identify the related Prometheus metrics, the configuration requires the prometheus property for the telemetry service, detailed later in this guide.
Here’s the definition of the component:
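A hedged sketch of such a component could look like the following, where the keys under the prometheus property (e.g. the job label used to filter metrics) are assumptions that depend on how the load generator exposes its metrics:

```yaml
# Illustrative component manifest for the Online Boutique web application
name: webapp
description: The Online Boutique demo e-commerce application
componentType: WebApplication
properties:
  prometheus:
    job: loadgenerator   # illustrative label filter matching the load-generator metrics
```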
To create the component in the system run the following command:
The public repository contains the definition of all the services that compose Online Boutique. In this guide, for the sake of simplicity, we’ll only tune the resources of the containers in the frontend and the product-catalog pods, defined as components of type Kubernetes Container.
Here’s their definition:
To create the components in the system run the following command:
The workflow is divided into the following steps:
Create the YAML artifacts with the updated resource limits for the tuned containers.
Apply the updated definitions to the cluster.
Wait for the rollout to complete.
Start the load generator
Let the test run for a fixed amount of time
Stop the test and reset the load generator
The following is the definition of the workflow:
To better illustrate the process, here is a snippet of the template file used to update the resource limits for the frontend deployment.
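As an illustration, the relevant fragment of such a template could look like the following, where the ${...} placeholders are assumptions (they follow the component names used above) and are interpolated by the workflow with the values chosen for each experiment:

```yaml
# Fragment of the frontend Deployment template (illustrative)
spec:
  template:
    spec:
      containers:
        - name: server
          resources:
            requests:
              cpu: ${frontend.cpu_limit}        # replaced with the tuned CPU value
              memory: ${frontend.memory_limit}  # replaced with the tuned memory value
            limits:
              cpu: ${frontend.cpu_limit}
              memory: ${frontend.memory_limit}
```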
The following are, respectively, the scripts to start and stop the load generator:
If you have not installed the Prometheus telemetry provider yet, take a look at the telemetry provider page Prometheus provider to proceed with the installation.
With the definition of the telemetry instance shown below, we import the end-user performance metrics provided by the load-generator, along with a custom definition of "cost" given by a weighted sum of the CPU and memory allocated for the pods in the cluster:
To create the telemetry instance execute the following command:
With this study, we want to minimize the "cost" of running the application, which, according to the definition described in the previous section, means reducing the resources allocated to the tuned pods in the cluster. At the same time, we want the application to stay within the expected SLO, and that is obtained by defining a constraint on the response time and error rate recorded by the load generator.
To create and run the study execute the following commands:
This page provides a short compendium of general performance engineering best practices for any load testing exercise. The focus is on ensuring that realistic performance tests are designed and implemented to be successfully leveraged for optimization initiatives.
The goal of ensuring realistic performance tests boils down to two aspects:
sound test environments;
realistic workloads.
The pre-production environment (Test Env from now on) needs to represent the production environment (ProdEnv from now on) as closely as possible.
The most representative test environment would be a perfect replica of the production environment from both infrastructure (hardware) and architecture perspectives. The following criteria and guidelines can help design a TestEnv suitable for performance testing supporting optimization initiatives.
The hardware specifications of the physical or virtual servers running in TestEnv and ProdEnv must be identical. This is because any differences in the available resources (e.g. amount of RAM) or specification (e.g. CPU vendor and/or type) may affect both service performance and system configuration.
This general guideline can only be relaxed for servers/clusters running container(s) or container orchestration platforms (e.g. Kubernetes or OpenShift). Indeed, it is possible to safely execute most of the related optimization cases if the TestEnv guarantees enough spare/residual capacity (number of cores or amount of RAM) to allocate all the needed resources.
While for monolithic architectures this may translate into significant HW requirements, with microservices this might not be the case, for two main reasons:
microservices are typically smaller than monoliths and designed for horizontal scalability: this means that optimizing the configuration of the single instance (pod/container resources and runtime settings) becomes easier as they typically have smaller HW requirements;
modern approaches like Infrastructure as Code (IaC), typically used with cloud-native applications, allow for easily setting up cluster infrastructure that can mimic production environments.
Test Envs are typically downscaled/downsized with respect to Prod Envs. If this is the case, then optimizations can be safely executed provided it is possible to generate a "production-like" workload on each of the nodes/elements of the architecture.
This can be usually achieved if all the architectural layers have the same scale ratio between the two environments and the generated workload is scaled accordingly. For example, if the ProdEnvs has 4 nodes at the front-end layer, 4 at the backend layer, and 2 at the database layer, then a TestEnv can have 2 nodes, 2 nodes, and 1 node respectively.
From a performance testing perspective, the existence of a load balancing among multiple nodes can be ignored, if the load balancing relies on an external component that ensures a uniform distribution of the load across all nodes.
On the contrary, if an application-level balancing is in place, it might be required to include at least two nodes in the testing scenario so as to take into account the impact of such a mechanism on the performance of the cluster.
The TestEnv should also replicate the application ecosystem, including dependencies from external or downstream services.
External or downstream services should emulate the production behavior from both functional (e.g. response size and error rate) and performance (e.g. throughput and response times) perspectives. In case of constraints or limitations on the ability to leverage external/downstream services for testing purposes, the production behavior needs to be simulated via stubs/mock services.
In the case of microservices applications, it is also required to replicate dependencies within an application. Several approaches can be taken for this purpose, such as:
replicating interacting microservices;
mocking these microservices and simulating realistic response times using simulation tools such as https://github.com/spectolabs/hoverfly;
disregarding dependencies with nonrelevant services (e.g. messages produced during a test are simply left published in a queue without being dequeued).
The most representative performance test script would provide 100% coverage of all the possible test cases. Of course, this is very unlikely to be the case in performance testing. The following criteria and guidelines can be considered to establish the required test coverage.
Statistical relevance
The test cases included in the test script must cover at least 80% of the production workload.
Business relevance
The test cases included in the test script must cover all the business-critical functionalities that are known (or expected) to represent a significant load in the production environment
Technical relevance
The test cases included in the test script must cover all the functionalities that at the code level involve:
Large objects/data structure allocation and management
Long living objects/data structure allocation and management
Intensive CPU, data, or network utilization
"one-of-a-kind" implementations, such as connections to a data source, ad-hoc object allocation/management, etc.
The virtual user paths and behavior coded in the test script must be representative of the workload generated by production users. The most representative test script would account for the production users in terms of a mix of the different user paths, associated think times, and session length perspectives.
When single user paths cannot be easily identified, the best practice is to consider, for each of them, the most comprehensive user journey. In general, a worst-case approach is recommended.
The task of reproducing realistic workloads is easier for microservice architectures. On the contrary, for monolithic architectures, this task could become hard as it may not be easy to observe all of the workloads, due to custom frameworks, etc. With microservices, the workload can be completely decomposed in terms of APIs/endpoints and APM tools can provide full observability of production workload traffic and performance characteristics for each single API. This guarantees that the replicated workload can reproduce the production traffic as closely as possible.
Both test script data (the datasets used in the test script) and test environment data (the datasets in any involved databases/datastores) have to be characterized in terms of size and variance so as to reproduce the production performance.
The test script data has to be characterized in order to guarantee production-like performance (e.g. cache behavior). In case this characterization is difficult, the best practice is to adopt a worst-case approach.
The test data must be sized and have an adequate variance to guarantee production-like performance in the interaction with databases/datastores (e.g. query response times).
Most performance test tools provide the ability to easily define and modify the test scenarios on top of already defined test cases/scripts, test case-mix, and test data. This is especially useful in the Akamas context where it might be required to execute a specific test scenario, based on the specific optimization goal defined. The most common (and useful, in the Akamas context) test scenarios are described here below.
A load test aims at measuring system performance against a specified workload level, typically the one experienced or expected in production. Usually, the workload level is defined in terms of virtual user concurrency or request throughput.
In the load test, after an initial ramp-up, the target load level is maintained constant for a steady state until the end of the test.
When validating a load test, the following two key factors have to be considered:
The steady-state concurrency/throughput level: a good practice is to apply a worst-case approach by emulating at least 110% of the production throughput;
The steady-state duration: in general defining the length for steady-state is a complex task because it is strictly dependent on the technologies under test and also because phenomena such as bootstraps, warm-ups, and caching can affect the performance and behavior of the system only before or after a certain amount of time; as a general guide to validate the steady-state duration, it is useful to:
execute a long-run test by keeping the defined steady-state for at least 2h to 3h;
analyze test results by looking for any variation in the performance and behavior of the system over time;
in case no variation is observed, the steady state can be shortened, keeping it to at least 30 minutes.
A Stress test is all about pushing the system under test to its limit.
Stress tests are useful to identify the maximum throughput that an application can cope with while working within its SLOs. Identifying the breaking point of an application is also useful to highlight the bottleneck(s) of the application.
A stress test also makes it possible to understand how the system reacts to excessive load, thus validating the architectural expectations. For example, it can be useful to discover that the application crashes when reaching the limit, instead of simply enqueuing requests and slowing down processing them.
An endurance test aims at validating the system's performance over an extended period of time. This type of test scenario is useful to validate that the best configuration found in an Akamas study provides stable results over an extended period of time (i.e., several hours).
The first validation is provided by utilization metrics (e.g. CPU, RAM, I/O), which in the test environment should closely match the behavior observed in the production environment. If the delta is significant, go back and review your test cases and the environment to close the gap and gain confidence in the test results.
In this study, Akamas will optimize a web application by tuning the JVM parameters. The workflow leverages NeoLoad’s load generator to gather metrics through the dedicated NeoLoad Web operator and the NeoLoad Web provider.
The following snippets contain the system definition composed by a JVM running the petstore web application.
System: webapplication
Component: jvm
Component: webapp
Telemetry instance: NeoLoadWeb
Here’s a workflow that creates a new configuration file by interpolating the tuned parameters in a template file, restarts the application to apply the parameters, and triggers the execution of a load test:
Here’s a study in which Akamas tries to minimize the Java memory consumption by acting only on the heap size and on the type of garbage collector.
The web application metrics are used in the constraints to ensure the configuration does not degrade the service performance (throughput, response time, and error rate) below the acceptance level.
This page describes how to set up a simple yet complete performance testing environment that you can use for running Akamas offline optimization studies:
The target application is Konakart, a real-world Java-based e-commerce application
JMeter will be used to execute stress tests, with built-in sample scenarios
Prometheus will be used to monitor the environment (load test, JVM and OS metrics)
The following picture describes the high-level Akamas architecture that is enabled by this setup.
First of all, install Docker on your Linux box:
Now enable the user ubuntu to run docker without sudo:
In this environment we leverage Docker Swarm, a Docker native container orchestration system. Even though in this scenario everything runs on a single machine, Swarm is handy as it provides the ability to specify container resource limits (e.g. how many CPUs and how much memory the container can use) which can be very useful from a tuning perspective.
At this point, we can initialize Docker Swarm with this simple command:
You should see a message stating that Docker Swarm has been set up.
Now create a Docker network that all the containers will use to communicate:
In order to setup Konakart, first clone this repository:
Now you can start Konakart by running:
You can now verify that Konakart is up and running by accessing your instance on port 8780:
Unless you plan to use a different load testing tool (such as LoadRunner Enterprise or Neotys NeoLoad), JMeter is a great choice to start using Akamas.
Setting up JMeter is straightforward as a JMeter container comes already configured with test plans built for performance testing Konakart. Moreover, JMeter is already configured with the Prometheus integration so that performance test metrics (e.g. transaction throughput and transaction response time) are collected via Prometheus Listener for JMeter. You just need to verify the ability to launch a performance test in this environment.
You can launch a first manual performance test by running the following command, where you need to replace YOUR_INSTANCE_ADDRESS with the address of your Konakart instance:
You should see an output similar to the one displayed here below, indicating that JMeter is running successfully:
In case you see any errors (check out the Err: column), chances are that JMeter cannot contact Konakart. Please verify that your instance address is correct and relaunch the manual test until you are sure JMeter is running correctly.
The JMeter docker image includes a couple of test plans described here below:
Ramp test plan
This test allows you to stress test Konakart with a ramp load profile. This profile is included in the ramp_test_plan.jmx file.
You can customize the profile by setting the following JMeter variables (see the example above):
THREADS, the maximum number of virtual users (default 20)
RAMP_SEC, the load test duration in seconds (default 200)
Plateau test plan
This test allows you to do a performance test of Konakart with an initial ramp-up and then with a constant load. This profile is included in the plateau_test_plan.jmx file.
You can customize the scenario by setting the following JMeter variables (see the example above):
THREADS, the maximum number of virtual users (default 20)
RAMP_UP_MIN, the ramp-up duration in minutes (default 1)
RAMP_UP_COUNT, the number of steps in the ramp (default 5)
HOLD_MIN, the plateau duration in minutes (default 5)
Unless you plan to use a different monitoring tool (such as Dynatrace), Prometheus is a great choice to start using Akamas.
Now that Konakart and JMeter are up and running, the last step is to set up Prometheus. In this scenario, Prometheus allows you to gather any metrics you will need for your Akamas optimizations, for example, the performance test metrics measured by JMeter (e.g. transaction throughput and response time) or those related to application resource usage.
This environment also includes a number of useful dashboards that you can use to monitor the application, infrastructure and load testing key metrics.
By running the following command you can launch Prometheus and Grafana, plus a set of preconfigured dashboards to monitor your load tests:
The following is a quick overview of the preconfigured dashboards that you can use to monitor the application, infrastructure and load-testing key metrics. These dashboards are available as part of the Prometheus installation.
You can view these dashboards by accessing Grafana on port 3000 of your Konakart instance.
The JMeter dashboard allows you to monitor your performance tests.
For example, run again the JMeter performance test described before and see the results in the JMeter dashboard:
The Docker dashboard allows you to see the resource consumption of your containers, including the Konakart application:
The Node dashboard allows you to see the OS-level Linux performance metrics of your instance:
At this point, you have a simple test environment for your Akamas optimizations.
In this example study, we are going to optimize a MySQL instance by setting the performance goal of maximizing the throughput of operations towards the database.
As regards the workload generation, in this example we are going to use Sysbench, a popular open-source benchmarking suite.
To import the results of the benchmark into Akamas, we are going to use a custom script to convert its output to a CSV file that can be parsed by the CSV File provider.
In order to run the Sysbench suite against a MySQL installation, you first need to install and configure both pieces of software. In the following, we will assume that both MySQL and Sysbench run on the same machine; to obtain more significant results in terms of performance you might want to run them on separate hosts.
A set of scripts is provided to support all the setup steps.
To install MySQL please follow the official documentation. In the following, we will make a few assumptions on the location of the configuration files, the user running the server, and the location of the datafiles. These assumptions are based on a default installation of MySQL on an Ubuntu instance performed via apt.
Configuration file: /etc/mysql/conf.d/mysql.cnf
MySQL user: mysql
MySQL root user password: root
This is a template for the configuration file mysql.cnf.template
If your installation of MySQL has different default values for these parameters please update the provided scripts accordingly.
To install Sysbench on an Ubuntu machine run the following command:
To verify your installation of Sysbench and initialize the database you can use the scripts provided here below and place them in the /home/ubuntu/scripts folder. Move into the folder, make sure MySQL is already running, and run the init-db.sh script.
This is the init-db.sh script:
This script will:
connect to your MySQL installation
create a sbtest database for the test
run the Sysbench data generation phase to populate the database
The init-db.sh script contains some information on the amount of data to generate. The default setting is quite small and should be used for testing purposes. You can then modify the test to suit your benchmarking needs. If you update the script please also update the run_benchmark.sh script accordingly.
Here follows a step-by-step explanation of all the required configurations for this example. You can find attached a zip file that contains all of the YAML files for your convenience.
In this example, we are interested in optimizing MySQL settings and measuring the peak throughput measured using Sysbench. Hence, we are going to create two components:
A mysql component which represents the MySQL instance, including all the configuration parameters
A Sysbench component which represents Sysbench and contains the custom metrics reported by the benchmark
MySQL is a widespread technology and Akamas provides a specific Optimization Pack to support its optimization. Sysbench, on the other hand, is a benchmark application and is not yet supported by a specific optimization pack. To use it in our study, we will need to define its metrics first. This operation can be done once and the created component type can be used across many systems.
First, build a metrics.yaml file with the following content:
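For instance, a minimal metrics definition covering the Sysbench throughput could look like this (metric names, units, and descriptions are illustrative; see the Metric template for the exact schema):

```yaml
# Illustrative metrics manifest for the Sysbench benchmark
metrics:
  - name: throughput
    description: Transactions per second reported by Sysbench
    unit: tps
```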
You can now create the metrics by issuing the following command:
Finally, create a file named sysbench.yaml with the following definition of the component:
You can now create the component by issuing the following command:
Here’s the definition of our system (system.yaml):
Here’s the definition of our mysql component (mysql.yaml):
Please make sure the component properties are correct for your environment (e.g. hostname, username, key, file paths, etc.).
Here’s the definition of our Sysbench component (sysbench.yaml):
We can create the system by running:
We can then create the components by running the following commands:
A workflow for optimizing MySQL can be structured into the following tasks:
Reset Sysbench data
Configure MySQL
Restart MySQL
Launch the benchmark
Parse the benchmark results
Here below you can find the scripts that codify these tasks.
This is the restart-mysql.sh script:
This is the clean_bench.sh script:
This is the run_test.sh script:
This is the parse_csv.sh script:
Here is the complete Akamas workflow for this example (workflow.yaml):
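A condensed sketch of what such a workflow could look like is shown below (hostnames, key paths, and operator argument names are assumptions to adapt to your environment; refer to the FileConfigurator and Executor operator reference for the exact schema):

```yaml
# Illustrative workflow sketch: configure MySQL, restart it, then run and parse the benchmark
name: mysql-sysbench-workflow
tasks:
  # Interpolate the tuned parameters into the MySQL configuration file
  - name: Configure MySQL
    operator: FileConfigurator
    arguments:
      source:
        hostname: mysql.mycompany.com        # illustrative host
        username: ubuntu
        key: /home/akamas/.ssh/id_rsa        # illustrative key path
        path: /home/ubuntu/scripts/mysql.cnf.template
      target:
        hostname: mysql.mycompany.com
        username: ubuntu
        key: /home/akamas/.ssh/id_rsa
        path: /etc/mysql/conf.d/mysql.cnf
  # Restart MySQL so the new configuration takes effect
  - name: Restart MySQL
    operator: Executor
    arguments:
      command: bash /home/ubuntu/scripts/restart-mysql.sh
      host:
        hostname: mysql.mycompany.com
        username: ubuntu
        key: /home/akamas/.ssh/id_rsa
  # The remaining tasks (reset the Sysbench data, run the benchmark, parse the results)
  # follow the same Executor pattern, invoking clean_bench.sh, run_test.sh, and parse_csv.sh.
```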
You can create the workflow by running:
This telemetry provider can be installed by running:
To start using the provider, we need to define a telemetry instance (csv.yaml):
Please make sure the telemetry configuration is correct for your environment (e.g. hostname, username, key, file paths, etc.).
You can create the telemetry instance and attach it to the system by running:
In this example, we are going to leverage Akamas AI-driven optimization capabilities to maximize MySQL database transaction throughput, as measured by the Sysbench benchmark.
Here is the Akamas study definition (study.yaml):
You may need to update some parameter domains based on your environment (e.g. the maximum value of the InnoDB buffer pool size depends on the memory available on your server).
You can create the study by running:
You can then start it by running:
You can now follow the study progress using the UI and explore the results using the Analysis and Metrics tabs.
In this example study we’ll tune the parameters of PageRank, one of the available benchmarks, to minimize its memory usage. Application monitoring is provided by Prometheus, leveraging a JMX exporter.
The test environment includes the following instances:
Akamas: instance running Akamas
PageRank: instance running the PageRank benchmark and the Prometheus monitoring service
To gather metrics about PageRank we will use a Prometheus and a JMX exporter. Here’s the scraper to add to the Prometheus configuration to extract the metrics from the exporter:
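For reference, a minimal scrape job could look like the following (the job name and exporter port are assumptions to align with your JMX exporter setup and with the telemetry instance defined later):

```yaml
scrape_configs:
  - job_name: jmx                      # job label later referenced by the telemetry instance
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:9110"]    # host:port where the JMX exporter publishes metrics
```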
To run and monitor the benchmark, the PageRank instance requires:
The benchmark itself
The JMX exporter, plus a configuration file to expose the required classes
Here’s the snippet of code to configure the instance as required for this guide:
In this section, we will guide you through the steps required to set up the optimization on Akamas.
Here’s the definition of the system we will use to group our components and telemetry-instances for this example:
To create the system run the following command:
Here’s the definition of the component:
To create the component in the system run the following command:
The workflow used for this study consists of two main stages:
generate the configuration file containing the tested OpenJ9 parameters
run the benchmark using the previously written parameters
Here’s the definition of the workflow:
Where the configuration template j9_opts.template is defined as follows:
To create the workflow run the following command:
The following is the definition of the telemetry instance that fetches metrics from the Prometheus service:
To create the telemetry instance in the system run the following command:
This telemetry instance will be able to bind the fetched metrics to the related jvm component thanks to the prometheus attribute we previously added in its definition.
The goal of this study is to find a JVM configuration that minimizes the peak memory used by the benchmark.
The optimized parameters are the maximum heap size, the garbage collector used and several other parameters managing the new and old heap areas. We also specify a constraint stating that the GC regions can’t exceed the total heap available, to avoid experimenting with parameter configurations that can’t start in the first place.
Here’s the definition of the study:
To create and run the study execute the following commands:
We are going to use the Akamas telemetry capability to import the metrics related to the Sysbench benchmark results, in particular the transaction throughput and latency. To achieve this we can leverage the Akamas CSV File provider, which extracts metrics from CSV files. The CSV file is the one produced in the last task of the workflow of the study.
If you have not installed the Eclipse OpenJ9 optimization pack yet, take a look at the optimization pack page to proceed with the installation.
We’ll use a component to represent the JVM underlying the PageRank benchmark, based on a component type from the OpenJ9 optimization pack. To identify the JMX-related metrics in Prometheus, the configuration requires the prometheus property for the telemetry service, detailed later in this guide.
In this example, we are going to tune the initialization parameters of an Oracle Database server instance in order to maximize its throughput while stressed by a load generator.
For the workload, we’ll use OLTPBench’s implementation of TPC-C, a popular transaction processing benchmarking suite, while to extract the metrics we are going to leverage the OracleDB Prometheus exporter.
For the purpose of this experiment we are going to use two dedicated machines:
oraxe.mycompany.com, hosting a single Oracle 18c XE instance running inside a docker container (provisioned using the scripts on the official Oracle GitHub repository)
oltpbench.mycompany.com, which generates the workload using OLTPBench’s TPC-C and will host the OracleDB Prometheus exporter instance
We assume we are working with Linux hosts.
The OracleDB Prometheus exporter publishes as metrics the results of the queries defined in the configuration file. In our case, we’ll use it to extract valuable performance metrics from Oracle’s Dynamic Performance (V$) Views.
We can spin up the exporter using the official Docker image with the following command, where cust-metrics.toml is our custom metrics file:
The exporter will publish the metrics on port 9161.
Here’s the example metrics file used to run the exporter:
You can check how to configure Prometheus here; by default, it will run on port 9090.
To configure the OracleDB exporter you can add the following snippet to the configuration file:
To model the system composed of the tuned database and the workload generator we need two different components:
An oracle component that represents the Oracle Database instance and maps directly to oraxe.mycompany.com.
A tpcc component that represents the TPC-C workload from the OLTPBench suite and maps to oltpbench.mycompany.com.
For the tpcc component, we’ll first need to define some custom metrics and a new component type. The following is the definition of the metrics (tpcc-metrics.yaml):
The following is the definition of the new component-type (tpcc-ctype.yaml):
We can then create the new component type by running the commands:
As a next step, we can proceed with the definition of our system (system.yaml):
Here’s the definition of our oracle component (oracle.yaml):
Here’s the definition of the tpcc component (tpcc.yaml):
We can create the system by running:
We can then create the components by running:
Since we are using Prometheus to extract the database metrics, we can leverage the Prometheus provider, which already includes the queries for the Oracle metrics we need. To use the Prometheus provider we need to define a telemetry instance (prom.yaml):
We can now create the telemetry instance and attach it to our system by running:
Besides the telemetry for the Oracle instance, we also need the metrics contained in the output CSVs from the TPC-C workload runs. To ingest these metrics we can leverage the CSV Provider, defining the following telemetry instance (csv.yaml):
We can create the telemetry instance and attach it to our system by running:
Using an Executor operator we run a command to clean the results folder that may contain files from previous executions
We define a task that uses the OracleConfigurator operator to update the Oracle initialization parameters:
We define a task that uses the Executor operator that reboots the Oracle container for the parameters that need a restart to take effect:
We define a task that uses the Executor operator to launch the TPC-C benchmark against the Oracle instance:
We define a workflow task that runs a script that parses the TPC-C output files and generates a file compatible with the CSV Provider:
Where tpcc_parse_csv.sh is the following script:
By putting together all the tasks defined above, we come up with the following workflow definition (workflow.yaml):
We can create the workflow by running:
The objective of this study is to maximize the transaction throughput while stressed by the TPC-C load generator, and to achieve this goal the study will tune the size of the most important areas of the Oracle instance.
Here’s the definition of the goal of our study, which is to maximize the tpcc.throughput metric:
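In sketch form, the goal section could look like this (the keys follow the study template; verify them against your Akamas version):

```yaml
goal:
  objective: maximize
  function:
    formula: tpcc.throughput   # the throughput metric exposed by the tpcc component
```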
We define a window to consider only the data points after the ramp-up time of the load test:
For this study, we are trying to achieve our goal by tuning the size of several areas in the memory of the database instance. In particular, we will tune the overall size of the Program Global Area (containing the work area of the active sessions) and the size of the components of the Shared Global Area.
The domains are configured to explore, for each parameter, the values around the default values.
The following constraint allows the study to explore different size configurations without exceeding the maximum overall memory available for the instance:
We are going to add to our study two steps:
A baseline step, in which we configure the default values for the memory parameters as discovered from previous manual executions.
An optimization step, where we perform 200 experiments to search the set of parameters that best satisfies our goal.
The baseline step contains some additional parameters (oracle.memory_target, oracle.sga_target) that are required by Oracle in order to disable the automatic management of the SGA components.
Here’s what these steps look like:
Here’s the study definition (study.yaml) for optimizing the Oracle instance:
You can create the study by running:
You can then start it by running:
This page provides a list of best practices when optimizing an Oracle RDS with Akamas.
Every RDS instance fetches the initialization parameters from the definition of the DB parameter group it is bound to. A best practice is to create a dedicated copy of the baseline group for the target database, to avoid impacting any other database that may share the same configuration object.
DB parameter groups must be configured through the dedicated Amazon RDS API interface. A simple way to implement this step in the Akamas workflow is to save the tested configuration in a configuration file and submit it through a custom executor leveraging the AWS Command Line Interface. The following snippets show an example of tuning an instance with id oracletest, bound to the configuration group named test-oracle:
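A condensed sketch of these workflow tasks could look like the following (hostnames, key paths, and operator argument names are assumptions; the actual AWS calls are performed by the rds_update.sh and rds_reboot.sh scripts shown below):

```yaml
tasks:
  # Render the DB parameter group configuration from the template with the tested values
  - name: Generate RDS configuration
    operator: FileConfigurator
    arguments:
      source:
        hostname: bastion.mycompany.com     # illustrative host with AWS CLI access
        username: ubuntu
        key: /home/akamas/.ssh/id_rsa
        path: /home/ubuntu/templates/oraconf.template
      target:
        hostname: bastion.mycompany.com
        username: ubuntu
        key: /home/akamas/.ssh/id_rsa
        path: /tmp/oraconf/parameters.json
  # Apply the configuration to the DB parameter group and reboot the instance
  - name: Update and reboot the RDS instance
    operator: Executor
    arguments:
      command: bash rds_update.sh test-oracle /tmp/oraconf && bash rds_reboot.sh oracletest
      host:
        hostname: bastion.mycompany.com
        username: ubuntu
        key: /home/akamas/.ssh/id_rsa
```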
Where the following is an example of the configuration template oraconf.template:
The following script rds_update.sh updates the configuration. It requires the name of the target DB parameter group and the path of the temporary folder containing the generated configuration:
The following script rds_reboot.sh restarts the RDS instance with the provided ID:
The following study is similar to Optimizing a live K8s deployment but adds some JVM parameters in the optimization.
Note that this study includes a preset step with a specific value (category) of categorical parameters, since otherwise, the optimizer would only consider a category that has already been seen in the configuration history. For more details, please refer to the Optimize step page of the reference guide.
In this example study, we will tune the initialization parameters of an Oracle Database server instance to minimize the memory required for running KonaKart, a popular Java e-commerce service, without significantly impacting the responsiveness of the whole system.
We’ll use Apache JMeter to stress the system for the test, while we will leverage the Oracle Prometheus exporter to extract the metrics.
For this study, we will use three dedicated machines:
oradb.mycompany.com, hosting an Oracle Database 19c instance
konakart.mycompany.com, running the KonaKart Community Edition service
akamas.mycompany.com, which generates the workload using JMeter and will host the OracleDB Prometheus exporter instance
Refer to the following links to install and configure KonaKart Community Edition:
Install KonaKart: install and configure the service
Manual Installation: install the demo dataset
For this use case, we provisioned the database on a VM on Oracle Cloud, which allows us to easily provision licensed instances on demand.
Through the OracleDB Prometheus exporter, we can publish as metrics the results of the arbitrary queries defined in the configuration file. In our case, we’ll use it to extract valuable performance metrics from Oracle’s Dynamic Performance (V$) Views.
We can spin up the exporter using the official Docker image with the following command, where cust-metrics.toml is our custom metrics file:
The exporter will publish the metrics on the specified port 9161.
Here’s the metrics file used to run the exporter:
Using the following snippet we configure Prometheus to fetch metrics from:
the JMeter exporter exposing the load-generator stats
the OracleDB exporter monitoring the database
For a complete guide on how to configure and manage Prometheus refer to the official documentation.
The load generator runs containerized on the akamas.mycompany.com instance using the attached Konakart_optimizePerf.jmx configuration file.
The provided run_test.sh script wraps the command to execute the test, and requires as an argument the URL of the target KonaKart instance.
Our modeled system includes the following components:
The oracle component, which models the Oracle Database instance on oradb.mycompany.com, whose parameters are the targets of our optimization
The webapp component, which models the KonaKart service running on konakart.mycompany.com, providing the performance metrics used to validate the system’s SLOs
The first step is defining the system (system.yaml):
Here’s the definition of our oracle component (oracle.yaml), including the parameters needed to connect to the database instances and the filters to fetch metrics from Prometheus.
Notice: to update the init parameters the user requires the ALTER SYSTEM privilege.
Here’s the definition of the konakart component (konakart.yaml), containing the filters to fetch the metrics from Prometheus:
We can create the system by running the following command:
We can then create the components by running the following commands:
Since we are using Prometheus to extract the database metrics, we can leverage the Prometheus provider, which already includes the Oracle and JMeter queries for the metrics we need. To use the Prometheus provider we need to define a telemetry instance (prom.yaml):
We can now create the telemetry instance and attach it to our system by running:
This section outlines the steps performed during the execution of the experiments.
Using an Executor operator we run a command to stop the KonaKart instance using the script provided with the installation, then check the service is not running anymore with a custom script:
Attached you can find the referenced script check_konakart_stop.sh:
We use the OracleConfigurator operator to update the Oracle initialization parameters with the new configuration. Then, with the Executor operator, we run some custom scripts to restart the database instance to apply the new parameters and check for a successful startup. Additionally, in case of a failed startup, the script of the last task restores a backup of the default configuration file (spfile), restarts the database, and returns an error code to notify Akamas that the tested configuration is invalid:
Attached you can find the referenced script check_db.sh:
and restart_db.sh:
We then define the Executor operator tasks that restart the KonaKart service and check it is running correctly:
Attached you can find the referenced script:
Finally, we define a task that uses the Executor operator to run the JMeter load test against the KonaKart instance:
By putting together all the tasks defined above, we come up with the following workflow definition (workflow.yaml):
We can create the workflow by running:
This study aims to minimize the memory allocated for the Oracle database while under a simulated load of the typical traffic, without impacting the SLOs.
This section provides a step-by-step description of the study definition.
Here’s the definition of the goal for our study, which is to minimize the memory allocated by Oracle to the SGA and PGA memory areas (a sketch of such a goal section is provided after the list below). The constraints ensure that any tested configuration that does not operate within the defined SLOs is flagged as not valid. In particular, the following are required:
the peak error rate must not exceed 5 errors per second
the transaction throughput must not decrease more than 10% with respect to the baseline
the response time must not increase more than 20% with respect to the baseline
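A hedged sketch of such a goal section is shown below; the metric names and thresholds are illustrative (the relative thresholds would be derived from the baseline measurements), and the exact constraint syntax should be checked against the study reference:

```yaml
goal:
  objective: minimize
  function:
    formula: oracle.sga_target + oracle.pga_aggregate_target   # illustrative: total memory allocated to SGA + PGA
  constraints:
    absolute:
      - webapp.transactions_error_rate <= 5        # at most 5 errors per second
      - webapp.transactions_throughput >= 90       # illustrative: 90% of the baseline throughput
      - webapp.transactions_response_time <= 1.2   # illustrative: 120% of the baseline response time
```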
We define a window to consider only the data points after the ramp-up time of the load test:
For this study, we are trying to optimize the size of the two main memory areas, meaning the Program Global Area and the Shared Global Area.
Given our goal, we set the domains of the parameters to explore only sizes smaller than the baseline.
The following constraint prevents Akamas from exploring configurations that we already know Oracle won’t validate:
We are going to add to our study two steps:
A baseline step, in which we configure the default values for the memory parameters as discovered from previous manual executions.
An optimization step, where we perform 200 experiments to search the set of parameters that best satisfies our goal.
Here’s what these steps look like:
Here’s the study definition (study.yaml) for optimizing the Oracle instance:
You can create the study by running:
You can then start it by running:
Locust (https://locust.io/) is a popular Python-based load-testing tool. If you use Locust to run your load tests you might want to follow these guidelines to import its metrics (throughput, errors, and response time) using the CSV file telemetry provider.
Locust can export the results of a test in a variety of formats, including CSV files.
To generate a CSV file from Locust, add the --csv results/results argument to the locust command line used to invoke the test, as in this example.
This will make Locust generate some CSV files in the results folder; we are interested in the file named results_stats_history.csv, which contains time series with the core performance metrics.
We also suggest adding the following lines at the beginning of your locust file to reduce the sampling frequency reported in the CSV from the default of 1 second to 30 seconds as described here.
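This is typically a couple of lines overriding Locust’s stats settings at the top of the locustfile (the setting names below assume a recent Locust version):

```python
# Reduce how often Locust appends rows to the *_stats_history.csv file
# (the default is one sample per second).
import locust.stats

locust.stats.CSV_STATS_INTERVAL_SEC = 30        # write a stats row every 30 seconds
locust.stats.CSV_STATS_FLUSH_INTERVAL_SEC = 60  # flush the file to disk at least every 60 seconds
```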
To import the CSV into Akamas we still need to do a bit of pre-processing to:
Update the timestamp in a more friendly format
Add a column with the name of the Akamas component
This can be done by running the following script; make sure to change application on line 10 with the name of your Web Application component.
You can easily add this as an operator to the Akamas workflow so that it gets executed at the end of every test run, or integrate it into the script that launches your Locust test.
Now you can create a telemetry instance such as the one sketched below to import the metrics.
Save this snippet in a YAML file by editing the following sections:
Host, username, and authentication to connect to the instance where the CSV file is hosted (lines 7-11)
remotefilePattern with the path to the CSV file to load on the instance
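As a reference, a minimal sketch of such an instance could look like the following (hostnames, paths, and column names are placeholders to adjust as described above; the configuration keys follow the CSV File provider reference):

```yaml
provider: csv
config:
  address: loadgen.mycompany.com       # host where the processed CSV file is available
  username: ubuntu
  authType: key
  auth: /home/akamas/.ssh/id_rsa       # SSH key used to fetch the file
  remotefilePattern: /home/ubuntu/results/results_stats_history_processed.csv
  componentColumn: component           # column added by the pre-processing script
  timestampColumn: timestamp
  timestampFormat: yyyy-MM-dd HH:mm:ss
```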
Now you can use the imported metrics in your study goal and constraints and explore them from the UI.
Here you can find a collection of sample artifacts to be used to set up a workflow that runs the test and prepares the CSV file using the toolbox as the target host.
In this study, Akamas is tasked with the optimization of Linux1, a Linux-based system (Ubuntu). The study's goal is to maximize the throughput of the computations of the CPU benchmark.
Sysbench is a suite of benchmarks for CPU, file system, memory, threads, etc., typically used for testing the performance of databases.
System1 comes with a Prometheus Node Exporter that collects system metrics, which Akamas consumes through the Prometheus provider. Concerning Sysbench metrics, the study uses the CSV File provider to make them available to Akamas.
The study uses Sysbench to execute a performance test against System1.
1. Set up Prometheus and a Node Exporter to monitor the system
2. Install the Prometheus provider
3. Create a provider instance:
4. Install the CSV File provider.
5. Create a provider instance:
The study uses a four-task workflow to test System1 with a new configuration:
The following YAML file represents the complete workflow definition:
Within Akamas, System1 is modeled by a system of two components:
system1-linux, which represents the actual Linux system with its metrics and parameters and is of type Ubuntu 16.04
system1-bench, which represents Sysbench with its metrics and is of type Sysbench
The following YAML file represents the definition of the Sysbench component type:
Goal: maximize the throughput of the benchmark
Windowing: take the default (compute the score for the entire duration of a trial)
Parameters selection: select only CPU scheduling parameters
Metrics selection: select only the throughput of the benchmark
Trials: 3
Steps: one baseline and one optimize
The following YAML file represents the definition of the study:
This page provides a list of best practices when optimizing applications on AWS EC2 instances with Akamas.
Before planning out an AWS study you must first ensure you have all the rights to effectively launch it. This means you have to:
Check your IAM policies
Check your security groups
Policies allow instance manipulation and access by being attached to users, groups, or AWS resources.
The suggested best practice to avoid impacting other critical environments, like production, is to isolate the Akamas instance and the tested resources.
In any case, you’re also required to comply with the policies required for instance manipulation. In the following, we show a standard, tag- and resource-based policy.
Tags are indeed a robust way to scope your Akamas optimizations: you can invariantly refer to instances and sets of instances across generations and stack them for more elaborate conditions.
Enforcing such conditions on your EC2 resources neatly avoids collateral effects during your experiments.
In order to correctly enable the provisioning and manipulation of EC2 instances you have to set the correct permissions; notice that AWS offers several ways to manage them.
Here we provide some basic resource-based permissions required in our scope:
EC2 instance start
EC2 instance stop
EC2 instance reboot
EC2 instance termination
EC2 instance description
EC2 instance manipulation
Security groups control inbound and outbound traffic to and from your AWS instances. In your typical optimization scenario, the security group should allow inbound connections on SSH and any other relevant port, including the ones used to gather the telemetry, from the Akamas instance to the ones in the tuned system.
Notice that:
The study environment has to be restored to its original configuration between experiments. While that is quite simple to achieve when creating and terminating instances, this may require additional steps in a resizing scenario.
The deletion and creation of new instances determine changes in ECDSA host keys. This may be interpreted as DNS spoofing from the Akamas instance, so consider overriding the default settings in such contexts.
Ansible is an open-source software automation tool suited for instance configuration and provisioning, enabling an Infrastructure as Code approach to the cloud. On this page we provide a set of templates to perform the most common tasks to tune EC2 instance types with Akamas, such as:
EC2 instance creation
EC2 instance termination
EC2 instance resizing
Refer to the Ansible documentation and the AWS collection documentation for more details, and make sure to check the underlying concepts to build a robust automation.
The orchestrator requires access to an account or role linked to the correct policies; this requires managing and having access to the required security groups.
The following example playbook provisions an EC2 instance using the latest Ubuntu 18-04 LTS image and then waits for it to be available. The playbook requires the following set of arguments:
key: the name of the SSH key pair to use
Name: the instance name
security_group: the name of the AWS security group
region: the selected AWS region
You can update the ec2_ami_info task to query for a different image family, or specify the ID directly under ec2.image.
When executing the script we must assign the following arguments as extra variables (an illustrative playbook sketch is provided below):
instance_type: the type of instance to provision
volume_size: the size of the attached volume
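For reference, a sketch of such a playbook is shown below; module and parameter names come from the amazon.aws collection, but verify them against the collection version installed in your environment:

```yaml
# Illustrative playbook: provision an EC2 instance from the latest Ubuntu 18.04 LTS AMI
- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Find the latest Ubuntu 18.04 LTS AMI
      amazon.aws.ec2_ami_info:
        region: "{{ region }}"
        owners: ["099720109477"]   # Canonical
        filters:
          name: "ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-*"
      register: ubuntu_amis

    - name: Launch the instance and wait for it to be available
      amazon.aws.ec2_instance:
        region: "{{ region }}"
        name: "{{ Name }}"
        key_name: "{{ key }}"
        security_group: "{{ security_group }}"
        instance_type: "{{ instance_type }}"
        image_id: "{{ (ubuntu_amis.images | sort(attribute='creation_date') | last).image_id }}"
        volumes:
          - device_name: /dev/sda1
            ebs:
              volume_size: "{{ volume_size }}"
        wait: true
```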
The following playbook terminates all instances with the specified name (or any other tag). It requires the following arguments:
instance_name: the name of the instance
region: the selected AWS region
It makes use of a list of arguments:
instance_name: your instance name
region: the selected AWS region
For a successful workflow, it requires:
The instance to exist
The instance to be unique
In this example study, we are going to optimize a MySQL instance by setting the performance goal of maximizing the throughput of operations towards the database.
As regards workload generation, in this example we are going to use OLTPBench, a popular open-source benchmarking suite for databases. OLTPBench supports several benchmarks; in this example we will be using Synthetic Resource Stresser.
To import the results of the benchmark into Akamas, we are going to use a custom script to convert its output to a CSV file that can be parsed by the CSV File provider.
In order to run the OLTP Benchmark suite against a MySQL installation, you first need to install and configure both pieces of software. In the following, we will assume that both MySQL and OLTP run on the same machine; to obtain more significant results in terms of performance you might want to run them on separate hosts.
To install MySQL please follow the official documentation. In the following, we will make a few assumptions on the location of the configuration files, the user running the server, and the location of the datafiles. These assumptions are based on a default installation of MySQL on an Ubuntu instance performed via apt.
Datafile location: /var/lib/mysql
Configuration file: /etc/mysql/conf.d/mysql.cnf
MySQL user: mysql
MySQL root user password: root
This is a template for the configuration file mysql.cnf.template
If your installation of MySQL has different default values for these parameters please update the provided scripts accordingly.
To verify your installation of OLTP and initialize the database you can download the following set of scripts and place them in the /home/ubuntu/scripts folder. Move into the folder and run the init-db.sh script.
This is the init-db.sh script:
This script will:
connect to your MySQL installation
create a resourcestresser database for the test
run the OLTP data generation phase to populate the database
backup the initialized database under /tmp/backup
The resourcestresser.xml file contains the workload for the application. The default setting is quite small and should be used for testing purposes. You can then modify the test to suit your benchmarking needs.
Here is a step-by-step explanation of all the required configurations for this example.
In this example, we are interested in optimizing MySQL settings and measuring the peak throughput measured using OLTPBench. Hence, we are going to create two components:
A mysql component which represents the MySQL instance, including all the configuration parameters
An OLTP component which represents OLTPBench and contains the custom metrics reported by the benchmark
MySQL is a widespread technology and Akamas provides a specific Optimization Pack to support its optimization. OLTP, on the other hand, is a benchmark application and is not yet supported by a specific optimization pack. To use it in our study, we will need to define its metrics first. This operation can be done once and the created component type can be used across many systems.
First, build a metrics.yaml file with the following content:
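A minimal sketch of such a file, assuming we only need a throughput and a response-time metric (names and units are illustrative), might be:

```yaml
# metrics.yaml - illustrative metrics for the ResourceStresser benchmark
metrics:
  - name: throughput
    description: The number of operations per second processed by the benchmark
    unit: ops/s
  - name: response_time
    description: The average response time of the benchmark operations
    unit: milliseconds
```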
You can now create the metrics by issuing the following command:
Finally, create a file named resourcestresser.yaml with the following definition of the component type:
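A minimal sketch of such a component-type definition, assuming it only exposes the metrics defined above and no tunable parameters (field layout is indicative), could be:

```yaml
# resourcestresser.yaml - illustrative component type for the benchmark
name: ResourceStresser
description: Component type modeling the OLTPBench Synthetic Resource Stresser benchmark
parameters: []
metrics:
  - name: throughput
  - name: response_time
```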
You can now create the component type by issuing the following command:
Here’s the definition of our system (system.yaml):
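A minimal sketch of the system definition (the name is just an example) might be:

```yaml
# system.yaml - illustrative system definition
name: mysql-oltpbench
description: MySQL instance benchmarked with the OLTPBench Synthetic Resource Stresser
```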
Here’s the definition of our mysql component (mysql.yaml):
Here’s the definition of our OLTP component (oltp.yaml):
We can create the system by running:
We can then create the components by running:
A workflow for optimizing MySQL can be structured into the following tasks:
Reset OLTPBench data
Configure MySQL
Restart MySQL
Launch the benchmark
Parse the benchmark results
Below you can find the scripts that implement these tasks.
This is the restart-mysql.sh script:
This is the clean_bench.sh script:
This is the run_test.sh script:
This is the parse_csv.sh script:
Here is the complete Akamas workflow for this example (workflow.yaml):
You can create the workflow by running:
We are going to import the benchmark results with the CSV telemetry provider, which can be installed by running:
To start using the provider, we need to define a telemetry instance (csv.yaml):
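A minimal sketch of such a telemetry instance, assuming the results CSV is produced on the benchmark host and contains a component, a timestamp, and a throughput column (field names, host, key, and paths are indicative and must be adapted to your environment):

```yaml
# csv.yaml - illustrative CSV telemetry instance
provider: CSV                               # provider name as registered in your installation
config:
  address: mysql.mycompany.com              # host where the results CSV is produced
  username: ubuntu
  authType: key
  auth: /home/akamas/.ssh/benchmark.key
  remoteFilePattern: /tmp/oltp/result.csv   # file written by parse_csv.sh
  componentColumn: component                # column holding the Akamas component name
  timestampColumn: timestamp
  timestampFormat: yyyy-MM-dd HH:mm:ss
metrics:
  - metric: throughput                      # Akamas metric of the OLTP component
    datasourceMetric: throughput            # column name in the CSV file
```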
Please make sure the telemetry configuration is correct for your environment (e.g. hostname, username, key, file paths, etc.).
You can create the telemetry instance and attach it to the system by running:
In this example, we are going to leverage Akamas AI-driven optimization capabilities to maximize MySQL database query throughput, as measured by the OLTPBench benchmark.
Here is the Akamas study definition (study.yaml):
You can create the study by running:
You can then start it by running:
You can now follow the study progress using the UI and explore the results using the Analysis and Metrics tabs.
In this example study, we are going to optimize a MongoDB single server instance by setting the performance goal of maximizing the throughput of operations toward the database.
Concerning performance tests, we are going to employ YCSB, a popular benchmark created by Yahoo for testing various NoSQL databases.
To extract MongoDB metrics, we are going to spin up a Prometheus instance and use the Prometheus telemetry provider together with the MongoDB Prometheus exporter.
You can use a single host for both MongoDB and YCSB; in the following example, however, we replicate a common pattern in performance engineering by running the load injection tool on a separate instance, to avoid performance interference and measurement noise. The example assumes two hosts:
mongo.mycompany.com for the MongoDB server instance (port 27017) and the MongoDB Prometheus exporter (port 9100)
ycsb.mycompany.com for YCSB and Prometheus (port 9090)
Notice: in the following, we assume Linux hosts.
To correctly extract MongoDB metrics we can leverage a solution like Prometheus, paired with the MongoDB Prometheus exporter. To do so we would need to:
Install the MongoDB Prometheus exporter on mongo.mycompany.com
Install and configure Prometheus on ycsb.mycompany.com
By default, the exporter will expose MongoDB metrics on port 9100
The following YAML file (prometheus.yaml) is an example of the Prometheus configuration that you can use:
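A minimal configuration scraping the MongoDB exporter on the host assumed above might look like this:

```yaml
# prometheus.yaml - minimal scrape configuration for the MongoDB exporter
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: mongodb
    static_configs:
      - targets: ["mongo.mycompany.com:9100"]   # MongoDB Prometheus exporter
```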
Since we are interested in tuning MongoDB by acting on its configuration parameters and by observing its throughput measured using YCSB, we need two components:
Here’s the definition of our system (system.yaml):
Here’s the definition of our mongo component (mongo.yaml):
Here’s the definition of our ycsb component (ycsb.yaml):
We can create the system by running:
We can then create the components by running:
The workflow for this example is structured in three main steps:
Configure MongoDB with the configuration parameters decided by Akamas
Test the performance of the application
Prepare test results
Notice: here we have omitted the Cleanup step because it is not necessary for the context of this study.
We can define a workflow task that uses the FileConfigurator operator to interpolate Akamas parameters into a MongoDB configuration script:
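A sketch of such a task, where the template and target paths are placeholders and the argument layout is indicative, might be:

```yaml
# Sketch of the "configure mongodb" workflow task
- name: configure mongodb
  operator: FileConfigurator
  arguments:
    source:
      path: /home/ubuntu/templates/mongod-start.sh.templ   # template with Akamas placeholders
    target:
      path: /home/ubuntu/mongod-start.sh                   # configuration script to be executed
    component: mongo   # SSH connection details are resolved from the mongo component properties
```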
Here’s an example of a templated configuration script for MongoDB:
We can add a workflow task that actually executes the MongoDB configuration script produced by the FileConfigurator:
In each task, we leveraged the reference to the "mongo" component to fetch from its properties all the authentication info to SSH into the right machine and let the FileConfigurator and Executor do their work.
We can define a workflow task that uses the Executor operator to launch the YCSB benchmark against MongoDB:
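A sketch of such a task (the launch script path is a placeholder and the argument layout is indicative) might be:

```yaml
# Sketch of the "run ycsb" workflow task
- name: run ycsb benchmark
  operator: Executor
  arguments:
    command: bash /home/ubuntu/scripts/run-ycsb.sh   # YCSB launch script
    component: ycsb   # SSH connection details are resolved from the ycsb component properties
```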
Here’s an example of a launch script for YCSB:
We can define a workflow task that launches a script that parses the YCSB results into a CSV file (Akamas will process the CSV file and then extract performance test metrics):
By putting together all the tasks defined above, we come up with the following workflow definition (workflow.yaml):
We can create the workflow by running:
Notice: the fact that the instance definition contains the specification of Prometheus queries to map to Akamas metrics is temporary. In the next release, these queries will be embedded in Akamas.
By default, $DURATION$ will be replaced with 30s. You can override it to your needs by setting a duration property under prometheus within your mongo component.
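For instance, a mongo component overriding the default duration might be sketched as follows (the component type name and property layout are indicative):

```yaml
# mongo.yaml - illustrative mongo component with a custom Prometheus query duration
name: mongo
description: The MongoDB instance under optimization
componentType: MongoDB
properties:
  hostname: mongo.mycompany.com
  username: ubuntu
  key: /home/akamas/.ssh/mongo.key
  prometheus:
    duration: 60s   # overrides the default 30s used for $DURATION$
```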
We can now create the telemetry instance and attach it to our system by running:
To start using the provider, we need to define a telemetry instance (csv.yaml):
We can create the telemetry instance and attach it to our system by running:
Our goal for optimizing MongoDB is to maximize its throughput, measured using a performance test executed with YCSB.
Here’s the definition of the goal of our study, to maximize the throughput:
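Assuming the YCSB throughput metric is simply named throughput, the goal section might be sketched as:

```yaml
goal:
  objective: maximize
  function:
    formula: ycsb.throughput   # throughput metric of the ycsb component
```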
It is important that the throughput of our MongoDB instance is considered valid only when it is stable; for this reason, we use the stability windowing policy. This policy identifies a period of time with at least 100 samples and a standard deviation lower than 200, when the application throughput is at its maximum.
We are going to optimize every MongoDB parameter:
We are going to add to our study two steps:
A baseline step, in which we set a cache size of 1GB and use the default values for all the other MongoDB parameters
An optimize step, in which we perform 100 experiments to generate the best configuration for MongoDB
Here’s what these steps look like:
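A sketch of these two steps, where the cache-size parameter name is illustrative, might be:

```yaml
steps:
  - name: baseline
    type: baseline
    values:
      mongo.cache_size: 1024   # 1 GB cache; all other parameters keep their default values
  - name: optimize
    type: optimize
    numberOfExperiments: 100
```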
Here’s the study definition (study.yaml) for optimizing MongoDB:
You can create the study by running:
You can then start it by running:
Akamas leverages the CSV telemetry provider to integrate a variety of data sources such as Instana.
All integrations based on this provider consist of two phases:
Metric extraction from Instana
Metric import via CSV provider
The first phase is composed of a set of scripts launched by a workflow task that interacts with the Instana API and saves the metrics of interest for the experiment in a CSV file with a proper format.
The second phase is executed by the CSV telemetry provider that imports the metrics from the CSV file.
To set up the integration you need:
A host (or a container) that can be accessed via SSH from Akamas to run the extraction scripts and host the generated CSV file.
The host must have the following packages installed:
python 3.10+
5.4.1
2.25.1
1.26.16
The host must be able to connect to Instana APIs
A token to authenticate to your Instana account and extract the metrics
The scripts required to set up this integration are not currently publicly available. To obtain them, please contact support@akamas.io.
You can deploy the scripts once and then re-use them for multiple studies, as all the required configurations can be provided as arguments, which can be changed directly in the Akamas workflow YAML or from the UI.
To deploy the scripts, extract the archive to a location of your choice on the host. You can verify that the script can be executed correctly by running the following command, substituting these placeholders:
<my-environment> with your environment ID
<my-service-id> with the ID of one of your services on Instana
The script will extract the application metrics and save them to /tmp/instana/metrics.
These are the main parameters that can be used with the script along with their description.
token: the environment token
window_size: the size, in ms, of the window for which the metrics are collected; the script collects metrics from now-window_size to now
rollup: depending on the selected time frame, it is possible to select the rollup. The API returns at most 600 data points per call, so if you select a window size of 1 hour the most accurate rollup you can query for is 5s. The valid rollup values are listed in the table further below.
granularity: granularity of the application metrics (services and endpoints)
max_attempts: maximum number of attempts before considering the API call failed
timeshift: fixed time to add to the metrics' timestamps
timezone: timezone to use in the timestamps
output_dir: directory in which to save the output files
timestamp_format: format in which the epoch timestamp is converted in the final CSV file
filename: output file name
component: Akamas component name
type: the available types are infrastructure, service, and endpoint
plugin: plugin type; the available plugins are kubernetesPod, containerd, process, and jvmRuntimePlatform
query: query to select the correct entity; used for infrastructure entities
id: entity ID of the selected service or endpoint
Once the scripts have been deployed you can use them across multiple studies.
To generate the CSV with the required metrics, add the following task to the workflow of your study, taking care of substituting the variables listed below (a sketch of such a task is shown after the list). Please note that all these variables can also be updated via the UI once the workflow has been created.
<my-host> with the hostname or IP of the instance hosting the scripts
<my-user> with the username used to access the instance via SSH
<my-key> with an SSH key to access the instance
<my-environment> with your environment ID
<my-application-component> with the name of the component of type Web Application in your system
<my-jvm-component> with the name of the component of type open-jdk in your system
<my-container-component> with the name of the component of type container in your system
<my-instana-process> with the ID of the Instana process you want to extract
<my-instana-jvm> with the ID of the Instana JVM you want to extract
<my-instana-pod-name> with the name of the pod on Instana you want to extract
<my-instana-container-name> with the name of the container on Instana you want to extract
<my-instana-endpoint-id> with the ID of the endpoint on Instana you want to extract
<my-instana-service-id> with the ID of the service on Instana you want to extract
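A sketch of such a workflow task for a service entity (the script path and name are hypothetical, while the command-line flags correspond to the script parameters documented in this section) might look like this:

```yaml
# Sketch of a workflow task invoking the Instana extraction scripts for a service entity
- name: extract instana metrics
  operator: Executor
  arguments:
    command: >
      python3 /home/<my-user>/instana-scripts/extract_metrics.py   # hypothetical script name
      --endpoint <my-environment> --token <my-token>
      --output_directory /tmp/instana/metrics --filename service.csv
      --component <my-application-component> --type service --id <my-instana-service-id>
    host:
      hostname: <my-host>
      username: <my-user>
      key: <my-key>
```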
Note that if your system does not include all these components, you can just omit the corresponding commands, as described in the YAML file.
Please note that the script will produce its results in the /tmp/instana/metrics folder. If you wish to run more studies in parallel, you might need to change this folder as well.
To set up the CSV telemetry provider create a new telemetry instance for each of your system components.
Here you can find the configuration for each supported component type.
Take care of substituting the following variables.
<my-host> with the hostname or IP of the instance hosting the scripts
<my-user> with the username used to access the instance via SSH
<my-key> with an SSH key to access the instance
Here you can find the list of supported metrics for the kubernetesPod, containerd, and jvmRuntimePlatform plugins. Metrics from Instana are mapped to metrics from the Akamas optimization packs. As an example, the memoryRequests metric on the kubernetesPod entity in Instana is mapped to the container_memory_request metric of the Kubernetes Container component type.
Task Configure OS, which leverages the LinuxConfigurator operator to apply a new set of Linux configuration parameters
Task Start benchmark, which leverages the Executor operator to launch the benchmark
Consider using dedicated IAM policies in order to enforce finer access control for the automation tools, as shown in the example above.
Refer to the AWS documentation for a complete list of AWS EC2 policies.
Akamas workflows managing EC2 instances usually expect you to either create throwaway instances or resize already existing ones. This provides an example of both cases: all you need to do is make the instance type and instance size parameters tunable.
The Sleep operator comes in handy during instance provisioning: creating an instance, and even more so waiting for its DNS to come up, may take a while, so forcing a few minutes of wait is usually worth it. This operator is a viable option if you can't enforce the wait through an automation tool.
The Executor operator is better suited for launching benchmarks and applications than for setting up your instance. It's better to use automation tools or ready-to-use AMIs to set up all the required packages and dependencies, as the workflow should cover your actual study case.
While the CloudWatch Exporter is a natural choice for EC2 instance monitoring, EC2 instances are often Linux instances, so it's useful to rely on the Prometheus telemetry provider paired with the Node exporter whenever you can directly access the instance.
To apply the EC2 parameters from the configuration selected by the Akamas engine, you can generate the playbook arguments through a template like the following one, where ec2 is the name of the component:
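A sketch of such a template, assuming FileConfigurator-style ${component.parameter} placeholders (the parameter name, file name, and fixed values are illustrative), might be:

```yaml
# provision-args.yaml.templ - hypothetical template for the playbook arguments
# ${ec2.instance_type} is replaced by Akamas with the value selected for the experiment
instance_name: akamas-benchmark
region: us-east-1
instance_type: ${ec2.instance_type}
volume_size: 20
```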
Instance resizing is a little trickier to deploy, as it requires some additional setup. The following playbook provides a simple way to stop, update, and restart your instance: it is intended as a building block for more elaborate workflows.
To apply the EC2 parameters from the configuration selected by the Akamas engine, you can generate the playbook arguments through a template like the following, where ec2 is the name of the component:
To install OLTPBench, you can download a pre-built version or build it from the sources. In the following, we will assume that OLTPBench is installed in the /home/ubuntu/oltp folder.
We are going to use the Akamas telemetry capability to import the metrics related to the OLTPBench benchmark results, in particular the throughput of operations. To achieve this, we can leverage the Akamas CSV telemetry provider, which extracts metrics from CSV files. The CSV file is the one produced in the last task of the workflow of the study.
You can check how to install the exporter in its official documentation. On Ubuntu, you can use the system package manager:
You can check how to configure Prometheus in the official documentation; by default, it will run on port 9090.
A mongo component, which represents the MongoDB instance, including all its configuration parameters, and maps directly to the corresponding MongoDB component type
A ycsb component, which represents YCSB; in particular, it "houses" the metrics of the performance test, which can be used as part of the goal of a study, and maps directly to the corresponding YCSB component type
As described above, a workflow for optimizing MongoDB can be structured in three main steps:
Since we are employing Prometheus to extract MongoDB metrics, we can leverage the Prometheus telemetry provider to start ingesting data points into Akamas. To use the Prometheus provider, we need to define a telemetry instance (prom.yaml):
Beyond MongoDB metrics, it is important to ingest into Akamas the metrics related to the performance tests run with YCSB, in particular the throughput of operations. To achieve this, we can leverage the CSV telemetry provider, which parses a CSV file to extract the relevant metrics. The CSV file we are going to parse with the help of the provider is the one produced in the last task of the workflow of the study.
<my-token> with the token you generated from Instana. You can read more about API tokens in the Instana documentation.
endpoint: URL of the Instana environment
Valid rollup values:

| Resolution | Rollup value |
| --- | --- |
| 1 second | 1 |
| 5 seconds | 5 |
| 1 minute | 60 |
| 5 minutes | 300 |
| 1 hour | 3600 |
Script command-line arguments:

| Parameter | Argument | Required | Default |
| --- | --- | --- | --- |
| Endpoint | -e, --endpoint | True | - |
| Token | -t, --token | True | - |
| Output Directory | -o, --output_directory | True | - |
| Window Size | -w, --window_size | False | 3600000 |
| Rollup | -r, --rollup | False | 60 |
| Granularity | -g, --granularity | False | 60 |
| Filename | -f, --filename | True | - |
| Component | -c, --component | True | - |
| Type | -tp, --type | True | - |
| Plugin | -p, --plugin | False | - |
| Query | -q, --query | False | - |
| Id | -id, --id | False | - |
| Max Attempts | -ma, --max_attempts | False | 5 |
| Timezone | -tz, --timezone | False | UTC |
| Timeshift | -ts, --timeshift | False | 0 |
| Timestamp Format | -tf, --timestamp_format | False | %Y-%m-%d %H:%M:00 |
| Start Timestamp | -st, --start_timestamp | False | - |
Instana metrics mapping:

| Instana metric | Component type | Akamas metric |
| --- | --- | --- |
| cpuRequests * 1000 | Kubernetes Container | container_cpu_request |
| cpuLimits * 1000 | Kubernetes Container | container_cpu_limit |
| memoryRequests | Kubernetes Container | container_memory_request |
| memoryLimits | Kubernetes Container | container_memory_limit |
| cpu.total_usage | Kubernetes Container | container_cpu_util |
| memory.usage | Kubernetes Container | container_memory_util |
| memory.total_rss | Kubernetes Container | container_memory_working_set |
| cpu.throttling_time | Kubernetes Container | container_cpu_throttle_time |
| threads.blocked | java-openjdk-XX | jvm_threads_deadlocked |
| jvm.heap.maxSize | java-openjdk-XX | jvm_heap_size |
| memory.used | java-openjdk-XX | jvm_memory_used |
| suspension.time | java-openjdk-XX | jvm_gc_duration |
| calls | Web Application | transactions_throughput |
| erroneousCalls | Web Application | transactions_error_throughput |
| latency - MEAN | Web Application | transactions_response_time |
| latency - P90 | Web Application | transactions_response_time_p90 |
| latency - P99 | Web Application | transactions_response_time_p99 |
| calls | Web Application | requests_throughput |
| erroneousCalls | Web Application | requests_error_throughput |
| latency - MEAN | Web Application | requests_response_time |
| latency - P90 | Web Application | requests_response_time_p90 |
| latency - P99 | Web Application | requests_response_time_p99 |