Optimizing performance of a Node.js application with V8 runtime tuning leveraging performance tests
Optimizing performance of a Java application with JVM tuning leveraging performance tests
COMING SOON! Please reach out to us at support@akamas.io if interested.
In this guide, you optimize the cost (or resource footprint) of a Java microservice running on Kubernetes. The study tunes both pod resource settings (CPU and memory requests and limits) and JVM options (max heap size, garbage collection algorithm, etc.) at the same time, while also taking into account your application performance and reliability requirements (SLOs). This optimization happens in production, leveraging Akamas live optimization capabilities.
an Akamas instance
a Kubernetes cluster, with a Java-based deployment to be optimized
a supported telemetry data source configured to collect metrics from the target Kubernetes cluster (see the telemetry providers reference for the full list)
a way to apply configuration changes recommended by Akamas to the target deployment. In this guide, Akamas interacts directly with the Kubernetes APIs via kubectl.
You need a service account with permission to update your deployment (see below for other integration options)
In this guide, we assume the following setup:
the Kubernetes deployment to be optimized is called adservice (in the boutique namespace)
in the deployment, there is a container named server, where the application JVM runs
Dynatrace is used as an observability tool
Let's set up the Akamas optimization for this use case.
For this optimization, you need the following components to model the adservice tech stack:
A Kubernetes container component, which contains container-level metrics like CPU usage and parameters to be tuned like CPU limits (from the Kubernetes optimization pack)
A Java OpenJDK component, which contains JVM-level metrics like heap memory usage and parameters to be tuned like the garbage collector algorithm (from the Java OpenJDK optimization pack)
A Web Application component, which contains service-level metrics like throughput and response time of the microservice (from the Web application optimization pack)
Let's start by creating the system, which represents the Kubernetes deployment to be optimized. To create it, write a system.yaml manifest like this:
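A minimal sketch of such a manifest (the field layout follows the usual Akamas system schema; names are assumptions for this example, not a verbatim excerpt):

```yaml
# system.yaml -- minimal system definition (sketch)
name: adservice
description: The adservice deployment in the boutique namespace
```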
Then run:
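Assuming the Akamas CLI is installed and configured, the command is typically:

```bash
akamas create system system.yaml
```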
Now create a component-container.yaml manifest like the following:
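The sketch below is indicative only: the componentType name and the dynatrace lookup properties depend on the versions of the Kubernetes optimization pack and of the Dynatrace telemetry provider you have installed, so double-check them against the respective reference pages.

```yaml
# component-container.yaml -- Kubernetes Container component (sketch)
name: adservice-container
description: The server container of the adservice deployment
componentType: Kubernetes Container
properties:
  dynatrace:                         # how the Dynatrace telemetry looks up this container
    type: CONTAINER_GROUP_INSTANCE   # hypothetical entity type, adjust to your setup
    kubernetes:
      namespace: boutique
      containerName: server
      basePodName: adservice-*
```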
Notice the component includes properties that specify how Dynatrace telemetry will look up this container in the Kubernetes cluster (the same will happen for the following components).
These properties are dependent upon the telemetry provider you are using.
Then run:
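The component is created within the system, so the CLI command typically takes the system name as an additional argument:

```bash
akamas create component component-container.yaml adservice
```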
Next, create a component-jvm.yaml manifest like the following:
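A sketch of the JVM component, again with indicative componentType and telemetry lookup properties:

```yaml
# component-jvm.yaml -- Java OpenJDK component (sketch)
name: adservice-jvm
description: The JVM running in the server container of adservice
componentType: java-openjdk-11       # use the type name provided by your Java OpenJDK pack version
properties:
  dynatrace:
    type: PROCESS_GROUP_INSTANCE     # hypothetical entity type used to match the JVM process
    kubernetes:
      namespace: boutique
      containerName: server
```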
Then run:
Now create a component-webapp.yaml manifest like the following:
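A sketch of the Web Application component; the dynatrace property would typically reference the Dynatrace service entity exposing the microservice metrics (the id below is a placeholder):

```yaml
# component-webapp.yaml -- Web Application component (sketch)
name: adservice-webapp
description: The adservice microservice
componentType: Web Application
properties:
  dynatrace:
    id: SERVICE-0000000000000000     # placeholder Dynatrace service entity id
```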
Then run:
To optimize a Kubernetes microservice in production, you need to create a workflow that defines how the new configuration recommended by Akamas is deployed to production.
Let's explore the high-level tasks required in this scenario and the options you have to adapt it to your environment:
Let's now create a workflow.yaml manifest like the following:
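A sketch of such a workflow, assuming the standard Akamas FileConfigurator and Executor operators and that the template file and kubectl are available on the toolbox; operator argument names are indicative, so check them against the workflow operators reference:

```yaml
# workflow.yaml -- live optimization workflow (sketch)
name: adservice-workflow
tasks:
  # 1. Render the deployment manifest from the template, substituting Akamas placeholders
  - name: configure
    operator: FileConfigurator
    arguments:
      source:
        hostname: toolbox
        username: akamas
        path: /work/adservice.yaml.templ
        key: /home/akamas/.ssh/toolbox_key   # hypothetical SSH key path
      target:
        hostname: toolbox
        username: akamas
        path: /work/adservice.yaml
        key: /home/akamas/.ssh/toolbox_key

  # 2. Apply the rendered manifest to the cluster
  - name: apply
    operator: Executor
    arguments:
      command: kubectl apply -f /work/adservice.yaml -n boutique
      host:
        hostname: toolbox
        username: akamas
        key: /home/akamas/.ssh/toolbox_key

  # 3. Wait for the rollout of the new configuration to complete
  - name: verify
    operator: Executor
    arguments:
      command: kubectl rollout status deployment/adservice -n boutique --timeout=10m
      host:
        hostname: toolbox
        username: akamas
        key: /home/akamas/.ssh/toolbox_key
```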
In the configure task, Akamas will apply the container CPU/memory limits and JVM options recommended by Akamas AI to the deployment file. To do that, copy your deployment manifest to a template file (here called adservice.yaml.templ), and substitute the current values with Akamas parameter placeholders as follows:
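An excerpt of what the template could look like; the ${component.parameter} placeholders follow the Akamas template convention, but the exact parameter names and units come from the Kubernetes and Java OpenJDK optimization packs, so treat the ones below as illustrative:

```yaml
# adservice.yaml.templ -- deployment template with Akamas placeholders (excerpt, sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: adservice
  namespace: boutique
spec:
  template:
    spec:
      containers:
        - name: server
          resources:
            requests:
              cpu: ${adservice-container.cpu_limit}        # requests kept equal to limits
              memory: ${adservice-container.memory_limit}
            limits:
              cpu: ${adservice-container.cpu_limit}
              memory: ${adservice-container.memory_limit}
          env:
            - name: JAVA_TOOL_OPTIONS
              # illustrative JVM parameter placeholders (max heap size, GC algorithm)
              value: "-Xmx${adservice-jvm.jvm_maxHeapSize}m ${adservice-jvm.jvm_gcType}"
```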
Whenever an Akamas recommended configuration is applied, the configure task creates the actual adservice.yaml deployment file, with the parameter placeholders replaced by the values recommended by Akamas AI, and the new deployment is then applied via kubectl apply.
These commands are executed from the toolbox, an Akamas utility that can be enabled in an Akamas installation on Kubernetes. Make sure that kubectl is configured correctly to connect to your Kubernetes cluster and can update your target deployment. See the reference documentation for more details.
To create the workflow, run:
Create a telemetry instance based on your observability setup to collect your target Kubernetes deployment metrics.
Create a telemetry.yaml manifest like the following:
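A sketch of a Dynatrace telemetry instance; the provider name and config fields should be checked against the Dynatrace provider reference, and the URL and token below are placeholders:

```yaml
# telemetry.yaml -- Dynatrace telemetry instance (sketch)
provider: Dynatrace
config:
  url: https://your-tenant.live.dynatrace.com   # placeholder tenant URL
  token: dt0c01.XXXX                            # placeholder API token
```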
Then run:
It's time to create the Akamas study to achieve your optimization objectives.
Let's explore how the study is designed by going through the main concepts. The complete study manifest is available at the bottom.
You can now create a study.yaml manifest like the following:
Then run:
You can now follow the live optimization progress and explore the results using the Akamas UI.
To quickly set up this optimization, download the Akamas template manifests and update the values file to match your needs. Then, create your optimization using the Akamas scaffolding.
In this guide, you optimize the cost (or resource footprint) of a Kubernetes deployment where the number of replicas is controlled by the HPA. The study tunes both pod resource settings (CPU and memory requests and limits) and HPA options (target CPU utilization) at the same time, while also taking into account your application performance and reliability requirements (SLOs). This optimization happens in production, leveraging Akamas live optimization capabilities.
an Akamas instance
a Kubernetes cluster, with a deployment to be optimized
a Horizontal Pod Autoscaler working on the desired deployment
a supported telemetry data source configured to collect metrics from the target Kubernetes cluster (see the telemetry providers reference for the full list)
a way to apply configuration changes recommended by Akamas to the target deployment and HPA. In this guide, Akamas interacts directly with the Kubernetes APIs via kubectl.
You need a service account with permissions to update your deployment (see below for other integration options).
In this guide, we assume the following setup:
the Kubernetes deployment to be optimized is called frontend (in the hipster-shop namespace)
in the deployment, there is a container named server, where the app runs
the HPA is called frontend-hpa
both Dynatrace and Prometheus are used as observability tools
Let's set up the Akamas optimization for this use case.
For this optimization, you need the following components to model the frontend tech stack:
The Kubernetes Workload, Container and Pod components, which contain metrics like CPU usage for the different objects and parameters to be tuned like CPU limits at the container level (from the Kubernetes optimization pack)
A Web Application component, which contains service-level metrics like throughput and response time of the microservice (from the Web Application optimization pack)
An HPA component, which contains HPA parameters like the target CPU utilization
Let's start by creating the system, which represents the Kubernetes deployment to be optimized. To create it, write a system.yaml manifest like this:
Then run:
Now create the three Kubernetes components. Create a workload.yaml manifest like the following:
Then create a container.yaml manifest like the following:
And a pod.yaml manifest like the following:
Now create the entities by running:
Now create an application.yaml manifest like the following:
Notice the component includes properties that specify how Dynatrace telemetry will look up this container in the Kubernetes cluster.
These properties depend on the telemetry provider you are using. See the reference documentation for the full list of supported providers and their respective configurations.
Then run:
Finally, create an hpa.yaml manifest like the following:
The HPA component does not provide any metric, so we do not need to specify anything about the workload.
Then run:
To optimize a Kubernetes microservice in production, you need to create a workflow that defines how the new configuration recommended by Akamas will be deployed in production.
Let's explore the high-level tasks required in this scenario and the options you have to adapt it to your environment:
Let's now create a workflow.yaml manifest like the following:
Then run:
To collect metrics from your target Kubernetes deployment, create a telemetry instance based on your observability setup.
Create a dynatrace.yaml manifest like the following:
Then run:
Create a prometheus.yaml manifest like the following:
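A sketch of a Prometheus telemetry instance; the config field names are indicative and the endpoint is a placeholder:

```yaml
# prometheus.yaml -- Prometheus telemetry instance (sketch)
provider: Prometheus
config:
  address: prometheus.example.com   # placeholder Prometheus host
  port: 9090
```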
Then run:
It's now time to create the Akamas study to achieve your optimization objectives.
Let's explore how the study is designed by going through the main concepts. The complete study manifest is available at the bottom.
You can now create a study.yaml manifest like the following:
Then run:
You can now follow the live optimization progress and explore the results using the Akamas UI.
In this example study we’ll tune the parameters of PageRank, one of the benchmarks available in the Renaissance suite, with the goal of minimizing its memory usage. Application monitoring is provided by Prometheus, leveraging a JMX exporter.
The test environment includes the following instances:
Akamas: instance running Akamas
PageRank: instance running the PageRank benchmark and the Prometheus monitoring service
To gather metrics about PageRank we will use Prometheus and a JMX exporter. Here’s the scrape configuration to add to Prometheus in order to extract the metrics from the exporter:
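A minimal example of such a scrape job (the target host and port are assumptions for this environment):

```yaml
# Addition to prometheus.yml -- scrape the JMX exporter exposing the PageRank JVM metrics
scrape_configs:
  - job_name: jmx
    scrape_interval: 15s
    static_configs:
      - targets: ['pagerank.akamas.io:9110']   # placeholder host:port of the JMX exporter
```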
To run and monitor the benchmark, the following are required on the PageRank instance:
The Renaissance benchmark jar
The Prometheus JMX exporter Java agent, plus a configuration file to expose the required classes
Here’s the snippet of code to configure the instance as required for this guide:
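A sketch of that setup, assuming the artifacts are downloaded into ~/renaissance; the release versions below are examples, not the ones pinned by this guide:

```bash
# Prepare the PageRank instance (sketch; adjust versions as needed)
mkdir -p ~/renaissance && cd ~/renaissance

# Renaissance benchmark suite (contains the PageRank benchmark); pick your release from
# https://github.com/renaissance-benchmarks/renaissance/releases
wget https://github.com/renaissance-benchmarks/renaissance/releases/download/v0.14.2/renaissance-gpl-0.14.2.jar

# Prometheus JMX exporter Java agent, attached later to the benchmark JVM
wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.19.0/jmx_prometheus_javaagent-0.19.0.jar
```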
In this section, we will guide you through the steps required to set up the optimization on Akamas.
Here’s the definition of the system we will use to group our components and telemetry instances for this example:
To create the system run the following command:
If you have not installed the Java OpenJDK optimization pack yet, take a look at the Java OpenJDK optimization pack page to proceed with the installation.
We’ll use a component of type Java OpenJDK to represent the JVM underlying the PageRank benchmark. To identify the JMX-related metrics in Prometheus, the configuration requires the prometheus property for the telemetry service, detailed later in this guide.
Here’s the definition of the component:
To create the component in the system run the following command:
The workflow used for this study consists of two main stages:
generate the configuration file containing the tested Java parameters
run the benchmark using the previously written parameters
Here’s the definition of the workflow:
where the configuration template java_opts.template is defined as follows:
To create the workflow run the following command:
The following is the definition of the telemetry instance that fetches metrics from the Prometheus service:
To create the telemetry instance in the system run the following command:
This telemetry instance will be able to bind the fetched metrics to the related jvm component thanks to the prometheus attribute we previously added in its definition.
The goal of this study is to find a JVM configuration that minimizes the peak memory used by the benchmark.
The optimized parameters are the maximum heap size, the garbage collector used and several other parameters managing the new and old heap areas. We also specify a constraint stating that the GC regions can’t exceed the total heap available, to avoid experimenting with parameter configurations that can’t start in the first place.
Here’s the definition of the study:
To create and run the study execute the following commands:
In this example, you will use Akamas live optimization to minimize the cost of a Kubernetes deployment, while preserving application performance and reliability requirements.
In this example, you need:
an Akamas instance
a Kubernetes cluster, with a deployment to be optimized
the kubectl command installed in the Akamas instance, configured to access the target Kubernetes cluster and with privileges to get and update the deployment configurations
a supported telemetry data source (e.g. Prometheus or Dynatrace) configured to collect metrics from the target Kubernetes cluster
This example leverages the following optimization packs:
The system represents the Kubernetes deployment to be optimized (let's call it "frontend"). You can create a system.yaml manifest like this:
Create the new system resource:
The system will then have two components:
A Kubernetes container component, which contains container-level metrics like CPU usage and parameters to be tuned like CPU limits
A Web Application component, which contains service-level metrics like throughput and response time
In this example, we assume the deployment to be optimized is called frontend, with a container named server, and is located within the boutique namespace. We also assume that Dynatrace is used as a telemetry provider.
Create a component-container.yaml manifest like the following:
Then run:
Now create a component-webapp.yaml manifest like the following:
Then run:
The workflow in this example is composed of the following main steps:
Update the Kubernetes deployment manifest with the parameters (CPU and memory limits) recommended by Akamas
Apply the new parameters via kubectl apply (see the sketch after this list)
Wait for the rollout to complete
Sleep for 30 minutes (observation interval)
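Steps 2 and 3 usually map to plain kubectl commands run by an Executor task; a minimal sketch, using the deployment and namespace names assumed in this example:

```bash
# Apply the rendered deployment manifest and wait for the rollout to complete
kubectl apply -f frontend.yaml -n boutique
kubectl rollout status deployment/frontend -n boutique --timeout=10m
```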
Create a workflow.yaml manifest like the following:
Then run:
Create the telemetry.yaml manifest like the following:
Then run:
In this live optimization:
the goal is to reduce the cost of the Kubernetes deployment. In this example, the cost is based on the amount of CPU and memory limits (assuming requests = limits).
the approval mode is set to manual, and a new recommendation is generated daily
to avoid impacting application performance, constraints are specified on desired response times and error rates
to avoid impacting application reliability, constraints are specified on peak resource usage and out-of-memory kills
the parameters to be tuned are the container CPU and memory limits (we assume requests=limits in the deployment file)
Create a study.yaml manifest like the following:
Then run:
You can now follow the live optimization progress and explore the results using the Akamas UI for Live optimizations.
In this example, you will go through the optimization of a Spark application running on AWS instances. We’ll be using a PageRank implementation included in Renaissance, an industry-standard Java benchmarking suite, tuning both Java and AWS parameters to improve the performance of our application.
For this example, you’re expected to use two dedicated machines:
an Akamas instance
a Linux-based AWS EC2 instance
The Akamas instance needs to provision and manipulate AWS instances; it must therefore be enabled to do so by setting the appropriate AWS policies, integrating with orchestration tools (such as Ansible), and configuring an inventory linked to your AWS EC2 environment.
The Linux-based instance will run the application benchmark, so it requires the latest OpenJDK 11 release.
For this study you’re going to require the following telemetry providers:
CSV Provider to parse the results of the benchmark
Prometheus provider to monitor the instance
AWS Telemetry provider to extract instance price
The Renaissance suite provides the benchmark we’re going to optimize.
Since the application consists of a jar file only, the setup is rather straightforward; just download the binary into the ~/renaissance/ folder:
In the same folder, upload the template file launch.benchmark.sh.temp, containing the script that executes the benchmark using the provided parameters and parses the results:
You may find further info about the suite and its benchmarks in the official doc.
In this section, we will guide you through the steps required to set up the optimization on Akamas.
This example requires the installation of the following optimization packs:
Our system could be named renaissance after its application, so you’ll have a system.yaml file like this:
Then create the new system resource:
The renaissance system will then have three components:
A benchmark component
A Java component
An EC2 component, i.e. the underlying instance
Java component
Create a component-jvm.yaml file like the following:
Then type:
Benchmark component
Since there is no optimization pack associated with this component, you have to create some extra resources.
A metrics.yaml file for a new metric tracking execution time:
A component-type benchmark.yaml:
The component pagerank.yaml:
Create your new resources, by typing in your terminal the following commands:
EC2 component
Create a component-ec2.yaml file like the following:
Then create its resource by typing in your terminal:
The workflow in this example is composed of three main steps:
Update the instance type
Run the application benchmark
Stop the instance
To manage the instance we are going to integrate a very simple Ansible playbook into our workflow: the FileConfigurator operator will replace the parameters in the template file in order to generate the code run by the Executor operator, as explained in the Ansible page.
In detail:
Update the instance size
Generate the playbook file from the template
Update the instance using the playbook
Wait for the instance to be available
Run the application benchmark
Configure the benchmark Java launch script
Execute the launch script
Parse PageRank output to make it consumable by the CSV telemetry instance
Stop the instance
Configure the playbook to stop an instance with a specific instance id
Run the playbook to stop the instance
The following is the template of the Ansible playbook:
The following is the workflow configuration file:
If you have not installed the Prometheus telemetry provider or the CSV telemetry provider yet, take a look at the telemetry provider pages Prometheus provider and CSV Provider to proceed with the installation.
Prometheus
Prometheus allows us to gather JVM execution metrics through the JMX exporter: download the Java agent required to gather metrics, then update the following two files:
The prometheus.yml file, located in your Prometheus folder:
The config.yml file you have to create in the ~/renaissance folder:
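A minimal JMX exporter configuration is usually enough here; the catch-all rule below simply exposes all available MBeans, which covers the JVM memory metrics used in this study:

```yaml
# config.yml -- minimal JMX exporter configuration (sketch)
startDelaySeconds: 0
lowercaseOutputName: true
lowercaseOutputLabelNames: true
rules:
  - pattern: ".*"   # expose all available MBeans; narrow the pattern if needed
```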
Now you can create a prometheus-instance.yaml file:
Then you can install the telemetry instance:
You may find further info on exporting Java metrics to Prometheus here.
CSV - Telemetry instance
Create a telemetry-csv.yaml file to read the benchmark output:
Then create the resource by typing in your terminal:
Here we provide a reference study for AWS. As we’ve anticipated, the goal of this study is to optimize a sample Java application, the PageRank benchmark you may find in the Renaissance benchmark suite by Oracle.
Our goal is rather simple: minimizing the product between the benchmark execution time and the instance price, that is, finding the most cost-effective instance for our application.
Create a study.yaml file with the following content:
Then create the corresponding Akamas resource and start the study:
In this example study we’ll tune the parameters of SparkPi, one of the example applications provided by most of the Apache Spark distributions, to minimize its execution time. Application monitoring is provided by the Spark History Server APIs.
The test environment includes the following instances:
Akamas: instance running Akamas
Spark cluster: composed of instances with 16 vCPUs and 64 GB of memory, where the Spark binaries are installed under /usr/lib/spark. In particular, the roles are:
1x master instance: the Spark node running the resource manager and Spark History Server (host: sparkmaster.akamas.io)
2x worker instances: the other instances in the cluster
To gather metrics about the application we will leverage the Spark History Server. If it is not already running, start it on the master instance with the following command:
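With the Spark binaries installed under /usr/lib/spark, the command is typically:

```bash
# Start the Spark History Server on the master node
sudo /usr/lib/spark/sbin/start-history-server.sh
```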
To make sure the tested application is available on your cluster and runs correctly, execute the following commands:
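A typical check is submitting the bundled SparkPi example; the examples jar path, Scala/Spark versions, and cluster manager below are assumptions based on a Spark 2.3 installation under /usr/lib/spark:

```bash
# Submit the SparkPi example to verify that the cluster runs applications correctly
/usr/lib/spark/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  /usr/lib/spark/examples/jars/spark-examples_2.11-2.3.0.jar 100
```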
In this section, we will guide you through the steps required to set up the optimization of the Spark application on Akamas.
Here’s the definition of the system we will use to group our components and telemetry instances for this example:
To create the system run the following command:
We’ll use a component of type Spark Application 2.3.0 to represent the application running on the Apache Spark framework 2.3.
In the snippet shown below, we specify:
the field properties required by Akamas to connect via SSH to the cluster master instance
the parameters required by spark-submit to execute the application
the sparkApplication flag required by the telemetry instance to associate the metrics from the History Server to this component
To create the component in the system run the following command:
The workflow used for this study contains only a single stage, where the operator submits the application along with the Spark parameters under test.
Here’s the definition of the workflow:
To create the workflow run the following command:
If you have not installed the Spark History Server telemetry provider yet, take a look at the telemetry provider page Spark History Server Provider to proceed with the installation.
Here’s the definition of the telemetry instance, specifying the History Server endpoint:
To create the telemetry instance in the system run the following command:
This telemetry instance will be able to bind the fetched metrics to the related sparkPi component thanks to the sparkApplication attribute we previously added in its definition.
The goal of this study is to find a Spark configuration that minimizes the execution time for the example application.
To achieve this goal we’ll operate on the number of executor processes available to run the application job, and the memory and CPUs allocated for both driver and executors. The domains are configured so that the single driver/executor process does not exceed the size of the underlying instance, and the constraints make it so that the application overall does not require more resources than the ones available in the cluster, also taking into account that some resources must be reserved for other services such as the cluster manager.
Note that this study uses two constraints on the total number of resources to be used by the spark application. This example refers to a cluster of three nodes with 16 cores and 64 GB of memory each, and at least one core per instance should be reserved for the system.
Here’s the definition of the study:
To create and run the study execute the following commands: