3.2.1 How to use this documentation

This page is intended as your entry point to the Akamas documentation.

Getting started with Akamas

This guide introduces Akamas and covers various fundamental topics such as licensing and deployment models, security topics, and maintenance & support services.

It is recommended to read this guide before moving to other guides on how to install, integrate, and use Akamas. The section of the Reference guide can help in reviewing Akamas key concepts.

Introduction to Akamas

A quick introduction to Akamas

Akamas is the AI-powered optimization platform designed to maximize service quality and cost efficiency without compromising on application performance. Akamas supports both production environments under live, dynamic workloads and test/pre-production environments against any what-if scenario and workload.

Thanks to Akamas, performance engineers, DevOps, CloudOps, FinOps and SRE teams can keep complex applications, such as Kubernetes microservices applications, optimized to avoid any unnecessary cost and any performance risks.

Akamas Optimization platform

The Akamas optimization platform leverages patented AI techniques that can autonomously identify optimal full-stack configurations driven by any custom-defined goals and constraints (SLOs), without any human intervention, any agents, and any code or byte-code changes.

Licensing

Software Licenses

Akamas software licensing model is subscription-based (typically on a yearly basis). For more information on Akamas' cost model and software licensing costs, please contact .

Deployment

Akamas is an on-premise product running on a dedicated machine within the customer environment:

on a virtual or physical machine in your data center
on a virtual machine running in the cloud, managed by any cloud provider (e.g. AWS EC2)
on your own laptop

Akamas also provides a Free Trial option which can be requested .

Cloud Hosting

Refer to your Cloud Provider website for information about cloud hosting options and related cost information.

AWS EC2

For AWS EC2 costs visit the EC2 Pricing page and use the AWS Pricing Calculator to estimate the cost for your architecture.

Maintenance & Support (M&S) Services

This page is intended as a first introduction to Akamas Maintenance & Support (M&S) Services.

Please refer to the specific contract in place with your Company.

Akamas M&S Services include:

access to Software versions released as major and minor versions, service packs, patches, and hotfixes according to Support levels for software versions.
assistance from Akamas Customer Support for inquiries about the Akamas product and issues encountered while using Akamas products where there is a reasonable expectation that issues are caused by Akamas products, according to

Akamas M&S Services do not include any installation and upgrade services, creation of any custom optimization packs, telemetry providers, or workflow operators, or implementation of any custom features and integrations that are not provided out-of-the-box by the Akamas products.

Customer Support Services

Akamas Customer Support Services are delivered by Akamas support engineers, also called Support Agents, who will work remotely with Customer to provide a temporary remedy for the incident and, ultimately, a permanent resolution. Akamas Support Agents automatically escalate issues to the appropriate technical group within Akamas and notify Customers of any relevant progress. Akamas provides Customers with the ability to escalate issues when appropriate.

Please note that Customer Support services are not an alternative to product documentation and training, or to professional and consulting services, so adequate knowledge of Akamas products is assumed when interacting with Akamas Customer Support. During the resolution of a reported issue, Support Agents may therefore redirect Customer to training or professional services (which are not part of the scope of this service).

Support levels for Customer Support Services

Akamas Customer Support Services provides different standard levels of support. Please verify the level of support specified in the contract in place with your Company.

Severity levels

The following table describes the different severity levels for Customer Support.

Severity level | Description | Impact

Support levels for software versions

Different levels of support are provided for software versions of Akamas products, starting from their general availability (GA) date and depending on the release of subsequent software versions.

Version Numbering

Akamas adopts a three-place numbering scheme MA.MI.SP to designate released versions of its Software, where MA is the major version, MI is the minor version, and SP is the service pack.
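As an illustration of the MA.MI.SP scheme, the following sketch splits a version string into its three fields. It assumes that the fields stand for major version, minor version, and service pack, in line with the release types listed under M&S Services. The parse_version function is a hypothetical helper written for this example and is not part of any Akamas tool.

```shell
# Split an MA.MI.SP version string into its three fields.
# parse_version is a hypothetical helper, not an Akamas command.
parse_version() {
  # IFS=. makes read split the input line on dots
  IFS=. read -r ma mi sp <<EOF
$1
EOF
  echo "MA=$ma MI=$mi SP=$sp"
}

parse_version "3.2.1"
```

For the input 3.2.1 this prints MA=3 MI=2 SP=1.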

Support levels with Akamas

Based on the , the following table describes the level of support for Akamas versions after the version 3.2 GA date (May 1st, 2023).

Version | Support Level

Installing Akamas

Architecture

Akamas is based on a microservices architecture where each service is deployed as a container and communicates with other services via REST APIs. Akamas can be deployed on a dedicated machine (Akamas Server) or on a Kubernetes cluster.

The following figure represents the high-level Akamas architecture.

Interact with Akamas

Users can interact with Akamas via the Graphical User Interface (GUI), the Command-Line Interface (CLI), or the Application Programming Interface (API).

Docker compose installation

This section describes how to install Akamas on Docker.

Preliminary steps

Please make sure to read the section before installing Akamas.

Before installing Akamas, please follow these steps:

Prerequisites

Before installing the Akamas Server please make sure to review all the following requirements:

Hardware Requirements

Running in your data center

The following table provides the minimal hardware requirements for the virtual or physical machine used to install the Akamas server in your data center.

Resource

Requirement

CPU

Running on AWS EC2

To run Akamas on an AWS Instance you need to create a new virtual machine based on one of the supported operating systems. You can refer to for step-by-step instructions on creating the instance.

As shown in the following diagram, you can create the Akamas instance in the same AWS region, Virtual Private Cloud (VPC), and private subnet as your existing EC2 machines, and then create or configure a new security group that allows communication between your application instances and the Akamas instance. The inbound/outbound rules of this security group must be configured as explained in the Networking Requirements section of this page.

It is recommended to use an m6a.xlarge instance with at least 70 GB of disk of type gp2 or gp3 and to select the latest LTS version of Ubuntu.

Supported AWS Regions

Akamas can be run in any EC2 region.

You can find the latest version supported for your preferred region .

AWS Service Limits

Before installing Akamas on an AWS Instance please make sure to meet your AWS service limits (please refer to the official AWS documentation ).

Running on a laptop

This special case is also referred to as "Akamas-in-a-box" and is covered by the installation guide.

Network requirements

This section lists all the connectivity settings required to operate and manage Akamas.

Internet access

Internet access is required for the Akamas online installation and update procedures, and allows retrieving the latest Akamas container images from the Akamas private Amazon Elastic Container Registry (ECR).

If internet access is not available for policy or security reasons, Akamas installation and updates can be executed offline.

Internet access from the Akamas server is not mandatory but it’s strongly recommended.

Ports

The following table provides a list of the ports on the Akamas server that have to be reachable by Akamas administrators and users to properly operate the system.

In the specific case of AWS instance and customer instances sharing the same VPC/Subnet inside AWS, you should:

open all of the ports listed in the table above for all inbound addresses (0.0.0.0/0) on your AWS security group
open outbound rules to all traffic and then attach this AWS security group (which must reside inside a private subnet) to the Akamas machine and all customer application AWS machines

Install Akamas dependencies

This page will guide you through the installation of software components that are required to get the Akamas Server installed on a machine. Please read the Akamas dependencies for a detailed list of these software components for each specific OS.

While some links to official documentation and installation resources are provided here, please make sure to refer to your internal system engineering department to ensure that your company deployment processes and best practices are correctly matched.

Dependencies Setup

As a preliminary step before installing any dependency, it is strongly suggested to create a user named akamas on your machine hosting Akamas Server.

Docker

Follow the reference documentation to install Docker on your system.

Docker installation guide:

Docker Compose is already included in Docker 23 and later. To install it on earlier versions of Docker, follow this installation guide:

AWS CLI v2:

To run docker with a non-root user, such as the akamas user, you should add it to the docker group. You can follow the guide at:

Verify dependencies

As a quick check to verify that all dependencies have been correctly installed, you can run the following commands:

Docker:

For offline installations, you can check Docker with the docker ps command.

Docker compose:

Docker versions older than 23 must use the docker-compose command instead of docker compose.

AWS CLI:
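As a complement to these checks, here is a minimal sketch that only reports whether the required command-line tools are available on the PATH. It does not replace the version commands of each tool. The check_tool function is a hypothetical helper written for this example.

```shell
# Report whether each required command-line tool is on the PATH.
# check_tool is a hypothetical helper written for this sketch.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# docker and aws are the executables of the dependencies listed above
for tool in docker aws; do
  check_tool "$tool"
done
```

When a tool is reported as found, its version can then be inspected with the tool's own version command, such as docker --version, docker compose version, or aws --version.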

Install the Akamas Server

Akamas is deployed as a set of containerized services running on Docker and managed via Docker Compose. The latest version of the Akamas Docker Compose file and all the images required by Docker can be downloaded from the AWS ECR repository.

Two installation modes are available:

online installation mode, in case the Akamas Server has access to the Internet (installation behind a proxy server is also supported).
offline installation mode, in case the Akamas Server does not have access to the Internet.

Online installation mode

Akamas is deployed as a set of containerized services running on Docker and managed via Docker Compose. In the online installation mode, the latest version of the Akamas Docker Compose file and all the images required by Docker can be downloaded from the AWS ECR repository.

In case the Akamas Server is behind a proxy server, please also read how to set up Akamas behind a Proxy.

Get Akamas Docker artifacts

It is suggested first to create a directory akamas in the home directory of your user, and then run the following command to get the latest compose file:

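cd ~
mkdir -p akamas   # -p avoids an error if the directory already exists
cd akamas
# Read the latest stable version from stable.txt, then download the matching compose file
curl -O https://s3.us-east-2.amazonaws.com/akamas/compose/$(curl -s https://s3.us-east-2.amazonaws.com/akamas/compose/stable.txt)/docker-compose.yml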

Configure Akamas environment variables

To configure Akamas, you should set the following environment variables:

AKAMAS_CUSTOMER: the customer name matching the one referenced in the Akamas license.
AKAMAS_BASE_URL: the endpoint of the Akamas APIs that the CLI will use to interact with the server, typically http://<akamas server dns address>:8000

You can export the variables using the following snippet:

It is recommended to save these exported variables in your ~/.bashrc file for convenience.
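A minimal sketch of such exports is shown here. The customer name acme and the host akamas.example.com are made-up placeholders and must be replaced with the values that apply to your license and server. Only the variable names and the port 8000 come from the description above.

```shell
# Placeholder values: replace them with your own customer name and server address.
export AKAMAS_CUSTOMER="acme"                             # hypothetical customer name
export AKAMAS_BASE_URL="http://akamas.example.com:8000"   # hypothetical server address

# Print the values to confirm they are set in the current shell
echo "AKAMAS_CUSTOMER=$AKAMAS_CUSTOMER"
echo "AKAMAS_BASE_URL=$AKAMAS_BASE_URL"
```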

Start Akamas

To log in to AWS ECR and pull the most recent Akamas container images, you also need to set the AWS authentication variables to the appropriate values provided by Akamas Customer Support Services by running the following command:

At this point, you can start installing Akamas server by running the following AWS CLI commands:

Offline installation mode

Akamas is deployed as a set of containerized services running on Docker and managed via Docker Compose. In the offline installation mode, the latest version of the Akamas Docker Compose file and all the images required by Docker cannot be downloaded from the AWS ECR repository.

Get Akamas Docker artifacts

Get in contact with Akamas Customer Services to get the latest versions of the Akamas artifacts uploaded to a location of your choice on the dedicated Akamas Server.

Akamas installation artifacts will include:

Changing UI Ports

By default, Akamas uses the following ports for its UI:

80 (HTTP)
443 (HTTPS)

Depending on the configuration of your environment, you may want to change the default settings: to do so, you’ll have to update the Akamas docker-compose file.

Inside the docker-compose.yml file, scroll down until you come across the akamas-ui service. There you will find a specification as follows:

Setup HTTPS configuration

Akamas APIs and UI use plain HTTP when they are first installed. To enable the use of HTTPS you will need to:

Ask your security team to provide you with a valid certificate for your server. The certificate usually consists of two files with ".key" and ".pem" extensions. You will need to provide the Akamas server DNS name.
Create a folder named "certs" in the same directory as the Akamas docker-compose file.
Copy the ".key" and ".pem" files into the created "certs" folder and rename them to "akamas.key" and "akamas.pem" respectively. Make sure that the files belong to the same user and group you use to run Akamas.
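The folder creation and copy steps above can be sketched as follows. The install_certs function and all paths are hypothetical and only illustrate the file layout described in the steps. The copied files are owned by the user running the commands, so the sketch should be run as the user that runs Akamas.

```shell
# install_certs is a hypothetical helper: it copies a key/certificate pair
# into a "certs" folder next to the compose file, using the names from the steps above.
# Usage: install_certs <key file> <pem file> <directory of docker-compose.yml>
install_certs() {
  src_key=$1; src_pem=$2; compose_dir=$3
  mkdir -p "$compose_dir/certs"
  cp "$src_key" "$compose_dir/certs/akamas.key"
  cp "$src_pem" "$compose_dir/certs/akamas.pem"
}

# Example usage (paths are placeholders):
# install_certs /path/to/server.key /path/to/server.pem ~/akamas
```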

Kubernetes installation

This section describes how to install Akamas on a Kubernetes cluster.

Preliminary steps

Before installing Akamas, please follow these steps:

Prerequisites

Before installing Akamas, please make sure to review all the following requirements:

Software Requirements

This page describes the requirements that should be fulfilled by the user when installing or managing an Akamas installation on Kubernetes. The software listed below is usually installed on the user's workstation or laptop.

Kubectl

Kubectl must be installed and configured to interact with the desired cluster. Refer to the to set up the client.

To interact with the Kubernetes API server you will need , preferably with a version matching the cluster. To check both the client and cluster versions, run:

Install Akamas

Akamas is deployed on your Kubernetes cluster through a Helm chart, and all the required images can be downloaded from the AWS ECR repository.

Two installation modes are available:

online installation, in case the Kubernetes cluster can access the Internet.
offline installation, in case the Kubernetes cluster does not have access to the Internet or you need to use a private image registry.

HTTPS configuration

HTTPS configuration can be set in the Akamas services (UI and Kong) or in the Ingress definition.

Certificate in the Ingress

Declare the certificate secret by adding a tls section to the Ingress definition:

ui:
  ingress:
    enabled: true
    className: "<class-name>"    # ingress class name

    hosts:
      - host: "example.company.com"
        paths:
          - path: /
            pathType: Prefix

    tls:
      - secretName: "<secret name>"  # secret name containing the certificate and key data
        hosts:
          - "example.company.com"

You can apply the same configuration to the kong service to add a certificate to the API Gateway.

For more information regarding the TLS definition refer to the .

Certificate in the Akamas services

To add a certificate to both the UI and API Gateway you need to generate the akamas.key and akamas.pem files, and create a secret in Akamas' namespace with the following command:

To complete the update, restart the deployments:

Install the CLI

This section describes how to install an Akamas workstation

The Akamas CLI allows users to invoke commands against the Akamas dedicated machine (Akamas Server). The Akamas CLI can also be installed on a different system than the Akamas Server.

Prerequisites

Linux and Windows operating systems are supported for installing Akamas CLI.

Installation steps

The Akamas CLI can be installed and configured in two simple steps:

Refer to the section to modify the CLI ports the Akamas Server is listening to. Section provides instructions on how to interact with Akamas via a proxy server.

Initialize the CLI

The CLI is used to interact with an Akamas server. To initialize the configuration of the Akamas CLI you can run the command:

and follow the wizard to provide the required information such as the server IP.

Here is a summary of the configuration wizard options.

This configuration can be changed at any time (see how to ).

After this step, the Akamas CLI can be used to log in to the Akamas server, by issuing the following command:

and providing the credentials as requested.

Change CLI configuration

The CLI configuration contains the information required to communicate with the Akamas server. It can be easily created and updated with a configuration wizard. This page describes the main options of the Akamas CLI and how to modify them. If your Akamas instance is installed with Kubernetes, ensure the UI service is .

API Address

The CLI, as well as the UI, interacts with the Akamas server via APIs. The apiAddress configuration contains the information required to communicate with the server.

Use a proxy server

The Akamas CLI supports interacting with the API server through an HTTP/HTTPS proxy server.

To enable access via an HTTP proxy, set the environment variable HTTP_PROXY. In the following snippet, replace proxy_ip and proxy_port with the desired values.

export HTTP_PROXY="http://<proxy_ip>:<proxy_port>"

Then, run an akamas command to verify access.

akamas status --debug

Access through an HTTPS proxy can be set by using environment variable HTTPS_PROXY, instead of HTTP_PROXY.

Verify the installation

Run the following command to verify the correct startup and initialization of Akamas:

akamas status

When all services have been started, this command will return an "OK" message. Please note that it might take a few minutes for Akamas to start all services.

To check that the UI is also working properly, access the following URL:
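Since startup can take a few minutes, the status check can be repeated until it succeeds. The following sketch assumes only what is stated above, namely that the command output contains "OK" once all services are up. The wait_for_ok function is a hypothetical helper, and matching on the text "OK" is a simplification that may need adjusting to the actual output.

```shell
# wait_for_ok is a hypothetical helper: it reruns a command until its output
# contains "OK" or the attempts run out.
# Usage: wait_for_ok <attempts> <delay-seconds> <command...>
wait_for_ok() {
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" 2>/dev/null | grep -q "OK"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $attempts attempt(s)"
  return 1
}

# Example: poll every 30 seconds, for up to 10 minutes
# wait_for_ok 20 30 akamas status
```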

http://<akamas server name here>

You will see the Akamas login form:

Please note that it is not possible to log into Akamas before a license has been installed. See the Install the license section below.

Install the license

Logging into Akamas requires a valid Akamas license.

To install a license get in touch with Akamas Customer Service to receive:

the Akamas license file
your "customer name" to configure in the variable AKAMAS_CUSTOMER for Docker installations or akamasCustomer for Kubernetes installations

Manage anonymous data collection

Akamas might collect anonymized usage information on running optimizations. Collection and tracking are disabled by default and can be manually enabled.

Docker installation

External tracking is managed through the following environment variables:

AKAMAS_TRACKER_URL: the target URL for all tracking info.

Manage Akamas

This section is a collection of different topics related to how to manage the Akamas Server.

This section covers some topics on how to manage the Akamas Server:

Audit logs

Akamas audit logs

Akamas stores all its logs into an internal Elasticsearch instance: some of these logs are reported to the user in the GUI in order to ease the monitoring of workflow executions, while other logs are only accessible via CLI and are mostly used to provide more context and information to support requests.

Audit access can be performed by using the CLI in order to extract logs related to UI or API access. For instance, to extract audit logs from the last hour use the following commands:

Monitor the Akamas Server

External tools

You can use any monitoring tool to check the availability of the Akamas instance.

Checking Akamas services

To check the status of the Akamas services, run akamas status -d to identify which service is not able to start up correctly.

Here is an example of output:

Backup & Recover of the Akamas Server

Akamas server backup

The process of backing up an Akamas server can be divided into two parts: system backup and user data backup. Backup can be performed in any way you see fit: these are just regular files, so you can use any backup tool.

System backup

System services are hosted on the AWS ECR repository, so the only thing that fully defines a working Akamas application is the docker-compose.yml file. Performing a backup of the Akamas application is as simple as copying this single file to your backup location. You may schedule any script that performs this weekly or at any frequency you see fit.
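A minimal sketch of such a backup script is shown here. The backup_compose function, the paths, and the script name in the crontab line are all hypothetical. The only fact taken from the text is that copying docker-compose.yml is sufficient for the system backup.

```shell
# backup_compose is a hypothetical helper: it copies the compose file into a
# backup folder under a date-stamped name and prints the path of the copy.
# Usage: backup_compose <compose file> <backup directory>
backup_compose() {
  compose_file=$1; backup_dir=$2
  mkdir -p "$backup_dir"
  target="$backup_dir/docker-compose-$(date +%Y%m%d).yml"
  cp "$compose_file" "$target"
  echo "$target"
}

# Example usage (paths are placeholders):
# backup_compose ~/akamas/docker-compose.yml /backup/akamas
# A weekly crontab entry (Sundays at 03:00) could call a script wrapping this function:
# 0 3 * * 0 /usr/local/bin/akamas-backup.sh
```

The date-stamped file name keeps one copy per day, so a weekly schedule never overwrites an earlier backup.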

User data backup

You may list all existing Akamas studies via the Akamas CLI command:

Then you can export all existing studies one by one via the CLI command:

where UUID is the UUID of a single study. This command exports into a single archive file (tar.gz). These archive files can be backed up to your favorite backup folder.

Akamas server recovery

Akamas server recovery involves recovering the system backup, restarting the Akamas service then re-importing the studies.

System Restore

To restore the system you must recover the original docker-compose.yml then launch the command

from the folder where you placed this YAML file and then wait for the system to come up, by checking it with the command

User data restore

All studies can be re-imported one by one with the CLI command (referring to the correct pathname of the archive):

Using Akamas

This section describes how to use Akamas

This guide introduces the optimization process and methodology with Akamas and then provides a step-by-step description of how to prepare, run and analyze Akamas optimization studies:

General optimization process
Preparing optimization studies
Running optimization studies

and also provides some technology-specific guidelines and examples on:

Modeling components

After identifying the components that are required to model a system, the following step is to model each identified key component.

Akamas provides the corresponding for their specific technology (and possibly version) and describing all the tunable parameters and metrics of interest. The full list of Akamas optimization packs is available on the page of the Akamas reference guide.

The section of the reference guide describes the template required to define a system component, while the commands for creating a system component are listed on the page.

While the optimization process does not necessarily require component types and optimization packs to be defined, it is recommended to leverage this construct to facilitate modularization and reuse.

This is possible as the Akamas optimization pack model is extensible: custom optimization packs can be easily created without any programming to allow Akamas optimization capabilities to be applied to virtually any technology.

Managing optimization packs

Whether out-of-the-box or custom, optimization packs need to be installed on an Akamas installation before being used.

Since optimization packs are global resources that are shared across all the workspaces on the same Akamas installation, an account with administrative privileges is required to manage them.

Optimization packs that are not yet installed are displayed as grayed out in the Akamas UI (this is the case for the AWS and Docker packs in the following figure).

An Akamas installation comes with the latest optimization packs already loaded in the store and is able to check the central repository for updates.

Creating automation workflows

After modeling the system and its components and ensuring that appropriate telemetry instances are defined, the following step (see the following figure) is to define a .

A workflow automates all the tasks to be executed in sequence (see the following figure) during the optimization study, in particular those leveraging integrations with external entities, such as telemetry providers or configuration management tools. Akamas provides a number of general-purpose and specialized workflow operators (see page).

The section of the reference guide describes the template required to define a workflow, while the commands for creating a workflow are listed on the page.

Since a workflow is an Akamas resource defined at the level and that can be used by multiple studies, it might be the case that a convenient workflow is already available or can be used to create a new workflow for the specific target system and integrations, by adding/removing some workflow tasks, changing the task sequence or the values assigned to task parameters.

Creating workflows for live optimizations

A workflow for a live optimization study automates all the actions required to interface with configuration management tools. Notice that metrics collection is an implicit action that does not need to be coded as part of the workflow.

More in detail, a typical workflow includes the following types of tasks:

Applying the configuration, by preparing and then applying the parameter configuration that has been recommended and/or approved to the target environment - this may require interfacing configuration management tools or pushing configuration to a repository

Depending on the complexity of the system, the workflow might be composed of multiple actions of the same type, each operating on separate components of the target system.

As expected, with respect to

Defining optimization goal & constraints

The first fundamental step in creating a study is to define the study goal & constraints. While this step might be perceived as somewhat straightforward (e.g. constraints could be simply translated from SLOs already in place), defining the optimization goal really requires carefully balancing complexity and effectiveness, also as part of the general (iterative) optimization process. Please also read the Best Practices section here below.

In general, any performance engineering, tuning, and optimization activity involves complex tradeoffs among different - and potentially conflicting - goals and system performance metrics, such as:

Maximizing the business volume an application can support, while not making the single transaction slower or increasing errors above a desired threshold
Minimizing the duration of a batch processing task, while not increasing the cloud costs by more than 20% or using more than 8 CPUs

Akamas supports all these (and other) scenarios by means of the optimization goal, that is, the single metric or the formula combining multiple metrics that has to be either minimized or maximized, and one or more constraints among metrics of the system.

In general, constraints can be defined either as absolute constraints (e.g. app.response_time < 200 ms) or as relative constraints with respect to a baseline (e.g. app.response_time < +20% of the baseline). The baseline is the current configuration in place, typically corresponding to the very first experiment in an offline optimization study. Therefore, relative constraints are only applicable to offline optimization studies, while absolute constraints are applicable to both offline and live optimization studies.
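To make the difference concrete, the following sketch evaluates one measured response time against an absolute bound and against a relative bound derived from a baseline. All numbers are made up, and the relative_bound and satisfies functions are hypothetical helpers. This is plain arithmetic for illustration, not the syntax Akamas uses to declare constraints.

```shell
# Hypothetical helpers illustrating absolute vs relative constraints.
# All values are integers in milliseconds.

# relative_bound <baseline> <percent>: upper bound equal to baseline + percent%
relative_bound() {
  echo $(( $1 * (100 + $2) / 100 ))
}

# satisfies <value> <bound>: succeeds when value is strictly below bound
satisfies() {
  [ "$1" -lt "$2" ]
}

baseline=180                              # made-up baseline response time
bound=$(relative_bound "$baseline" 20)    # +20% of the baseline
echo "relative bound: ${bound} ms"

measured=210                              # made-up measured response time
if satisfies "$measured" 200; then echo "absolute: pass"; else echo "absolute: fail"; fi
if satisfies "$measured" "$bound"; then echo "relative: pass"; else echo "relative: fail"; fi
```

With a baseline of 180 ms the relative bound is 216 ms, so a measurement of 210 ms violates the absolute constraint (< 200 ms) but satisfies the relative one.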

Please notice that when defining constraints for an optimization study, it is required to also include those constraints listed in the Constraints section of the respective Optimization Packs which express internal constraints among parameters. For example, in case OpenJDK 11 components are to be tuned, the reference section is .

The page of the in the reference guide describes the corresponding structures. For offline optimization studies only, the Akamas UI allows the optimization goal and constraints to be defined as part of the visual procedure activated by the "Create a study" button (see the following figure).

Please notice that any experiment that does not respect the constraints is marked by Akamas as failed, even if correctly executed. The reason for this failure can be inspected in the experiment status. Similarly to workflow failures (see below), the Akamas AI engine automatically takes any failure due to constraint violations into account when searching the optimization space to identify the parameter configurations that might improve the goal metrics while matching constraints.

Best Practices

There are no general guidelines and best practices on how to best define goals & constraints, as this is where experience, knowledge, and processes meet.

Please refer to the section for a number of examples related to a variety of technologies and the guide for real-world examples.

Defining KPIs

While the optimization goal drives the Akamas AI toward optimal configurations, other sub-optimal configurations might be of interest when they not only match the optimization constraints but also improve on some Key Performance Indicators (KPIs).

For example:

for a Kubernetes microservice Java-based application, a typical optimization goal is to reduce the overall (infrastructure or cloud) cost by tuning both Kubernetes and JVM parameters while keeping SLOs in terms of application response time and error rate under control
among different configurations that provide similar cost reduction in addition to matching all SLOs, a configuration that would also significantly reduce the application response time might be worth considering with respect to an optimal configuration that does not improve on this KPI

Akamas automatically considers any metric referred to in the defined optimization goal and constraints for an offline optimization study as a KPI. Moreover, any other metrics of the system component can be specified as a KPI for an offline optimization study.

The page of the section in the reference guide describes how to define the corresponding structure. Specifying the KPIs can be done while first defining the study or from the Akamas UI, at either study creation time or afterward (see the following figures).

Once KPIs are defined, Akamas will represent the results of the optimization in the Insights section of the Akamas UI. Moreover, the corresponding suboptimal configuration associated with a specific KPI is highlighted in the Akamas UI by a textual badge "Best <KPI name>".

Please notice that KPIs can also be re-defined after an offline optimization study has been completed as their definition does not affect the optimization process, only the evaluation of its results. See the section and the page.

Defining optimization steps

A final step in defining an optimization study is to specify the sequence of steps executed while running the study.

The following four types of steps are available:

Baseline: performs an experiment and sets it as a baseline for all the other ones
Bootstrap: imports experiments from other studies
Preset: performs an experiment with a specific configuration
Optimize: performs experiments and generates optimized configurations

Please note that at least one baseline step is always required in any optimization study. This applies not only to offline optimization studies but also to live optimization studies, where it is used to suggest changes to parameter values starting from the default values.

The page in the section in the reference guide describes how to define the corresponding structures for each of the different types of steps allowed by Akamas. For offline optimization studies only, the Akamas UI allows the optimization steps to be defined as part of the visual procedure activated by the "Create a study" button (see the following figure).

In addition to the best practices here below, please refer to the section for a number of examples related to a variety of technologies and the guide for real-world examples.

Best Practices

The following sections provide some best practices on how to define the baseline step.

Ensure the baseline configuration is correct

In an optimization study, the baseline is an important experiment as it represents the system performance with the current configuration, and serves as a reference to assess the relative improvements the optimization achieved.

Therefore, it is important to make sure the baseline configuration of the study correctly reflects the current configuration - be it the vendor default or the result of a manual tuning exercise.

Evaluate which parameters to include in the baseline configuration

When defining the study baseline configuration it is important to evaluate which parameters to include. Indeed, several technologies have default values assigned to most of their configuration parameters. However, the runtime behavior can be different depending on whether the parameter is set to the default value or it is not set at all.

Therefore, it is recommended to review the current configuration (e.g. the one in place in production) and identify which parameters and values have been set (e.g. JVM maxHeapSize = 2GB, gcType = Parallel, etc.), and then to only set those parameters with their corresponding values, without adding any other parameters. This ensures that the specified baseline is consistent with the real production setup.
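As a minimal sketch of this recommendation, the following Python snippet builds a baseline from only those parameters that are explicitly set in the current configuration. The parameter names (jvm.maxHeapSize, jvm.gcType, jvm.newSize) and the use of None to mean "not set" are assumptions for this illustration.

```python
def build_baseline(current_settings):
    """Split settings into explicitly set values and parameters left unset.

    A value of None means the parameter is not set in the current
    configuration, so the technology applies its own implicit behavior.
    """
    baseline = {k: v for k, v in current_settings.items() if v is not None}
    unset = sorted(k for k, v in current_settings.items() if v is None)
    return baseline, unset


# Hypothetical production settings: only two parameters are explicitly set.
production = {
    "jvm.maxHeapSize": "2GB",
    "jvm.gcType": "Parallel",
    "jvm.newSize": None,  # not set in production: leave it out of the baseline
}

baseline, unset = build_baseline(production)
print(baseline)  # prints {'jvm.maxHeapSize': '2GB', 'jvm.gcType': 'Parallel'}
print(unset)     # prints ['jvm.newSize']
```

Keeping the unset parameters in a separate list makes it easy to confirm that none of them was added to the baseline by mistake.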

Setting safety policies

While Akamas leverages similar AI methods for both live optimizations and optimization studies, the way these methods are applied is radically different. Indeed, for optimization studies running in pre-production environments, the approach is to explore the configuration space by also accepting potential failed experiments, to identify regions that do not correspond to viable configurations. Of course, this approach cannot be accepted for live optimization running in production environments. For this purpose, Akamas live optimization uses observations of configuration changes combined with the automatic detection of workload contexts and provides several customizable safety policies when recommending configurations to be approved, revisited, and applied.

Akamas provides a few customizable optimizer options (refer to the options described on the page of the reference guide) that should be configured so as to make configurations recommended in live optimization and applied to production environments as safe as possible.

Exploration factor

Running optimization studies

Once all the preparatory steps for creating a study are done, running a study is straightforward: an optimization study can be started from either the Akamas UI (see the following figures) or the command line (refer to the page).

Before actually running an optimization study, it is highly recommended to read the following sections:

Before running optimization studies

The following provides some best practices that can be adopted before launching optimization studies, in particular for offline optimization studies.

Dry-running the optimization study

It is recommended to execute a dry-run of the study to verify that the workflow works as expected and in particular that the telemetry and configuration management steps are correctly executed.

Verify that workflow actually works

It is important to verify that all the steps of the workflow complete successfully and produce the expected results.

Analyzing results of offline optimization studies

Since an offline optimization study lasts for at most the number of configured experiments and typically runs in a test or pre-production environment, results can safely be analyzed after the study has completely finished.

However, it is good practice to analyze partial results while the study is still running, as this may provide useful early insights about both the system being optimized (e.g. understanding of the system dynamics and sub-optimal configurations that could be immediately applied) and the optimization study itself (e.g. how to re-design a workflow or change constraints).

The Akamas UI displays the results of an offline optimization study in different visual areas:

the Best Configuration section provides the optimal configuration identified by Akamas, as a list of recommended values for the optimization parameters compared to the baseline and ranked according to their relevance;

the Progress tab (see the following figures) displays the progression of the study with respect to the study steps, the status of each experiment (and trial), its associated score, and the parameter values of the corresponding configurations; this area is mostly used for study monitoring (e.g. identifying failing workflows) and troubleshooting purposes;

the Analysis tab (see the following figures) displays how the baseline and experiments score with respect to the optimization goal, and the values of metrics and parameters for the corresponding configurations; this area supports the analysis of the different configurations;

the Metrics tab (see the following figure) displays the behavior of the metrics for all executed experiments (and trials); this area supports both study validation activities and deeper analysis of the system behavior;

the Insights section (see the following figure) displays any suboptimal configurations that have been identified for the study KPIs, and also allows making comparisons among them and the best configuration - the page describes in further detail the Insights section and the insights tags displayed in other areas of the Akamas UI.

Analyzing results of live optimization studies

Even for live optimization studies, it is a good practice to analyze how the optimization is being executed with respect to the defined goal & constraints, and workloads.

This analysis may provide useful insights about the system being optimized (e.g. understanding of the system dynamics) and about the optimization study itself (e.g. how to adjust optimizer options or change constraints). Since this is more challenging for an environment that is being optimized live, a common practice is to adopt a recommendation mode before possibly switching to a fully autonomous mode.

The Akamas UI displays the results of a live optimization study in the following areas:

the Metrics section (see the following figures) displays the behavior of the metrics as configurations are recommended and applied (possibly after being reviewed and approved by users); this area supports the analysis of how the optimizer is driven by the configured safety and exploration factors.

Before applying optimization results

The following best practices should be considered before applying a configuration identified by an offline optimization study from a test or pre-production environment to a production environment.

Most of these best practices are general and refer to any configuration change and application rollout, not only to Akamas-related scenarios.

Validating the study results

Any configuration identified by Akamas in a test or pre-production environment is the result of a number of experiments and trials executed in a limited timeframe. Before being promoted to production, it should first be validated for its ability to consistently deliver the expected performance over time.

Guidelines for choosing optimization parameters

In this section, some guidelines on how to choose optimization parameters are provided for the following specific technologies:

Kubernetes
JVM (OpenJDK)
JVM (OpenJ9)

These guidelines also provide an example of how to approach the selection of parameters (and how to define the associated domains and constraints) in an optimization study.

Guidelines for PostgreSQL

Suggested Parameters

When running a PostgreSQL optimization, consider starting from these recommendations:

Parameter

Recommendation

pg_max_connections

Performing load testing to support optimization activities

This page provides a short compendium of general performance engineering best practices to be applied in any load testing exercise. The focus is on how to ensure that realistic performance tests are designed and implemented to be successfully leveraged for optimization initiatives.

The goal of ensuring realistic performance tests boils down to two aspects:

sound test environments;
realistic workloads.

Test environments

A test or pre-production environment (TestEnv from now on) needs to represent as closely as possible the production environment (ProdEnv from now on).

The most representative test environment would be a perfect replica of the production environment from both infrastructure (hardware) and architecture perspectives. The following criteria and guidelines can help design a TestEnv that is suitable for performance testing supporting optimization initiatives.

Hardware specifications

The hardware specifications of the physical or virtual servers running in TestEnv and ProdEnv must be identical. This is because any difference in the available resources (e.g. amount of RAM) or specifications (e.g. CPU vendor and/or type) may affect both service performance and system configuration.

This general guideline can only be relaxed for servers/clusters running container(s) or container orchestration platforms (e.g. Kubernetes or OpenShift). Indeed, it is possible to safely execute most of the related optimization cases if the TestEnv guarantees enough spare/residual capacity (number of cores or amount of RAM) to allocate all the needed resources.

While for monolithic architectures this may translate into significant HW requirements, with microservices this might not be the case, for two main reasons:

microservices are typically smaller than monoliths and designed for horizontal scalability: this means that optimizing the configuration of the single instance (pod/container resources and runtime settings) becomes easier as they typically have smaller HW requirements;
approaches like Infrastructure-as-Code (IaC), typically used with cloud-native applications, allow for easily setting up cluster infrastructure (on-prem or on the cloud) that can mimic production environments.

Downscaled/downsized architecture

TestEnvs are typically downscaled/downsized with respect to ProdEnvs. If this is the case, then optimizations can be safely executed provided it is possible to generate a "production-like" workload on each of the nodes/elements of the architecture.

This can usually be achieved if all the architectural layers have the same scale ratio between the two environments and the generated workload is scaled accordingly. For example, if the ProdEnv has 4 nodes at the front-end layer, 4 at the backend layer, and 2 at the database layer, then a TestEnv can have 2 nodes, 2 nodes, and 1 node respectively.
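The scale-ratio check can be sketched in a few lines of Python. The node counts are those of the example above; the production throughput of 1000 requests per second is a made-up figure used only to show how the workload would be scaled.

```python
from fractions import Fraction


def scale_ratio(prod_nodes, test_nodes):
    """Return the common TestEnv/ProdEnv ratio, or raise if layers differ."""
    ratios = {
        layer: Fraction(test_nodes[layer], prod_nodes[layer])
        for layer in prod_nodes
    }
    distinct = set(ratios.values())
    if len(distinct) != 1:
        raise ValueError(f"layers are not scaled uniformly: {ratios}")
    return distinct.pop()


prod = {"front-end": 4, "backend": 4, "database": 2}
test = {"front-end": 2, "backend": 2, "database": 1}

ratio = scale_ratio(prod, test)
prod_throughput = 1000  # hypothetical production load, in requests per second
test_throughput = float(prod_throughput * ratio)

print(ratio, test_throughput)  # prints 1/2 500.0
```

If one layer were scaled differently (for instance, 2 database nodes in the TestEnv), the function would raise an error, signaling that a single workload scaling factor would not load every layer in a production-like way.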

Load balancing among nodes

From a performance testing perspective, the existence of load balancing among multiple nodes can be ignored if the load balancing relies on an external component that ensures a uniform distribution of the load across all nodes.

On the contrary, if an application-level balancing is in place, it might be required to include at least two nodes in the testing scenario so as to take into account the impact of such a mechanism on the performance of the cluster.

External/downstream services

The TestEnv should also replicate the application ecosystem, including dependencies on external or downstream services.

External or downstream services should emulate the production behavior from both functional (e.g. response size and error rate) and performance (e.g. throughput and response times) perspectives. In case of constraints or limitations on the ability to leverage external/downstream services for testing purposes, the production behavior needs to be simulated via stubs/mock services.

In the case of microservices applications, it is also required to replicate dependencies within an application. Several approaches can be taken for this purpose, such as:

replicating interacting microservices;
mocking these microservices and simulating realistic response times using simulation tools such as ;
disregarding dependencies with nonrelevant services (e.g. a post-processing service running on a mainframe whose messages are simply left published in a queue without being dequeued).

Test cases

The most representative performance test script would provide 100% coverage of all the possible test cases. Of course, this is very unlikely to be the case in performance testing. The following criteria and guidelines can be considered to establish the required test coverage.

Statistical relevance

The test cases included in the test script must cover at least 80% of the production workload.
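One possible way to apply this criterion is to rank the production transactions by request volume and keep adding them to the test script until the 80% threshold is reached. The sketch below assumes that per-transaction request counts are available (e.g. from an APM tool); the transaction names and counts are invented for illustration.

```python
def select_test_cases(request_counts, target_percent=80):
    """Pick the highest-volume transactions until the target coverage is met."""
    total = sum(request_counts.values())
    if total == 0:
        raise ValueError("no production traffic to cover")
    selected, covered = [], 0
    ranked = sorted(request_counts.items(), key=lambda kv: kv[1], reverse=True)
    for name, count in ranked:
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= target_percent * total:
            break
        selected.append(name)
        covered += count
    return selected, covered / total


# Hypothetical production request counts per transaction.
counts = {"search": 500, "browse": 300, "checkout": 120, "login": 60, "profile": 20}

cases, coverage = select_test_cases(counts)
print(cases, coverage)  # prints ['search', 'browse'] 0.8
```

Volume alone is not sufficient: the business and technical relevance criteria described in this section may require adding lower-volume transactions (such as checkout in this example) to the script.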

Business relevance

The test cases included in the test script must cover all the business-critical functionalities that are known (or expected) to represent a significant load in the production environment.

Technical relevance

The test cases included in the test script must cover all the functionalities that at the code level involve:

Large objects/data structure allocation and management
Long living objects/data structure allocation and management
Intensive CPU, data, or network utilization
"one-of-a-kind" implementations, such as connections to a data source, ad-hoc objects allocation/management, etc.

Test user paths and behavior

The virtual user paths and behavior coded in the test script must be representative of the workload generated by production users. The most representative test script would account for the production users in terms of a mix of the different user paths, associated think times, and session length perspectives.

When individual user paths cannot be easily identified, the best practice is to model the most comprehensive user journey. In general, a worst-case approach is recommended.

The task of reproducing realistic workloads is easier for microservice architectures. On the contrary, for monolithic architectures, this task could become hard as it may not be easy to observe all of the workloads, due to custom frameworks, etc. With microservices, the workload can be completely decomposed in terms of APIs/endpoints and APM tools can provide full observability of production workload traffic and performance characteristics for each single API. This guarantees that the replicated workload can reproduce the production traffic as closely as possible.

Test data

Both test script data, that is datasets used in the test script, and test environment data, that is datasets in any involved databases/datastores, have to be characterized both in terms of size and variance so as to reproduce the production performance.

Test script data

The test script data has to be characterized in order to guarantee production-like performance (e.g. cache behavior). In case this characterization is difficult, the best practice is to adopt a worst-case approach.

Test environment data

The test data must be sized and have an adequate variance to guarantee production-like performance in the interaction with databases/datastores (e.g. query response times).

Test scenarios

Most performance test tools provide the ability to easily define and modify the test scenarios on top of already defined test cases/scripts, test case-mix, and test data. This is especially useful in the Akamas context where it might be required to execute a specific test scenario, based on the specific optimization goal defined. The most common (and useful, in the Akamas context) test scenarios are described here below.

Load tests

A load test aims at measuring system performance against a specified workload level, typically the one experienced or expected in production. Usually, the workload level is defined in terms of virtual user concurrency or request throughput.

In the load test, after an initial ramp-up, the target load level is maintained constant for a steady state until the end of the test.

When validating a load test, the following two key factors have to be considered:

The steady-state concurrency/throughput level: a good practice is to apply a worst-case approach by emulating at least 110% of the production throughput;
The steady-state duration: in general defining the length for steady-state is a complex task because it is strictly dependent on the technologies under test and also because phenomena such as bootstraps, warm-ups, and caching can affect the performance and behavior of the system only before or after a certain amount of time; as a general guide to validate the steady-state duration, it is useful to:
1. execute a long-run test by keeping the defined steady-state for at least 2h to 3h;
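The shape of such a load test can be sketched as a list of (time, target throughput) points: a linear ramp-up followed by a steady state at 110% of the production throughput. The production throughput (200 requests per second), the ramp-up duration, and the sampling step below are assumptions chosen for illustration; most load testing tools express the same profile in their own scenario format.

```python
def load_profile(prod_throughput, ramp_up_s, steady_s, step_s=300, margin_percent=110):
    """Return (time_s, target_throughput) points for ramp-up plus steady state."""
    target = prod_throughput * margin_percent / 100
    profile = []
    for t in range(0, ramp_up_s + steady_s + 1, step_s):
        # Linear ramp-up, then hold the target level until the end of the test.
        fraction = min(t / ramp_up_s, 1.0) if ramp_up_s else 1.0
        profile.append((t, round(target * fraction, 1)))
    return profile


# Hypothetical figures: 200 req/s in production, 5 min ramp-up, 2 h steady state.
profile = load_profile(prod_throughput=200, ramp_up_s=300, steady_s=7200)

print(profile[0], profile[1], profile[-1])  # prints (0, 0.0) (300, 220.0) (7500, 220.0)
```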

Stress tests

A stress test is all about pushing the system under test to its limit.

Stress tests are useful to identify the maximum throughput that an application can cope with while working within its SLOs. Identifying the breaking point of an application is also useful to highlight the bottleneck(s) of the application.

A stress test also makes it possible to understand how the system reacts to excessive load, thus validating the architectural expectations. For example, it can be useful to discover that the application crashes when reaching the limit, instead of simply enqueuing requests and slowing down their processing.

Endurance tests

An endurance test aims at validating the system's performance over an extended period of time.

Validating tests vs production

The first validation is provided by utilization metrics (e.g. CPU, RAM, I/O), which in the test environment should closely match the behavior observed in the production environment. If the delta is significant, some refinements of the test case and environment might be required to close the gap and gain confidence in the test results.
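A simple way to make this comparison repeatable is to compute the relative delta of each utilization metric and flag those exceeding a tolerance. In the sketch below, the metric names, the sample values, and the 10% tolerance are all assumptions: what counts as a "significant" delta depends on the system and has to be agreed upon case by case.

```python
def utilization_gaps(prod, test, tolerance=0.10):
    """Return metrics whose relative test-vs-production delta exceeds tolerance."""
    gaps = {}
    for metric, prod_value in prod.items():
        delta = abs(test[metric] - prod_value) / prod_value
        if delta > tolerance:
            gaps[metric] = round(delta, 3)
    return gaps


# Hypothetical average utilization observed under comparable load.
prod = {"cpu_util": 0.60, "mem_util": 0.70, "disk_iops": 1200}
test = {"cpu_util": 0.63, "mem_util": 0.50, "disk_iops": 1150}

print(utilization_gaps(prod, test))  # prints {'mem_util': 0.286}
```

In this example only memory utilization deviates by more than the tolerance, suggesting that the test data or the test case mix may need refinement before trusting the results.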