
3.2.2

Deployment

Akamas is an on-premise product running on a dedicated machine within the customer environment:

  • on a virtual or physical machine in your data center

  • on a virtual machine running in a cloud managed by any cloud provider (e.g. AWS EC2)

  • on your own laptop

Akamas also provides a Free Trial option which can be requested here.

How to use this documentation

This page is intended as your entry point to the Akamas documentation.

Getting started with Akamas

This guide introduces Akamas and covers various fundamental topics such as licensing and deployment models, security topics, and maintenance & support services.

Maintenance & Support (M&S) Services

This page is intended as a first introduction to Akamas Maintenance & Support (M&S) Services.

Please refer to the specific contract in place with your Company.

Akamas M&S Services include:

  • access to Software versions released as major and minor versions, service packs, patches, and hotfixes, according to Support levels for software versions;

  • assistance from Akamas Customer Support for inquiries about the Akamas product and issues encountered while using Akamas products, where there is a reasonable expectation that issues are caused by Akamas products, according to Support levels for Customer Support Services.

Akamas M&S Services do not include any installation and upgrade services, creation of any custom optimization packs, telemetry providers, or workflow operators, or implementation of any custom features and integrations that are not provided out-of-the-box by the Akamas products.

It is recommended to read this guide before moving to other guides on how to install, integrate, and use Akamas. The Glossary section of the Reference guide can help in reviewing Akamas key concepts.

Getting started with Akamas

  • provides a very first introduction to AI-powered optimization

  • covers Akamas licensing, deployment, security topics

  • describes Akamas maintenance and support services.

This guide provides some preliminary knowledge required to purchase, implement, and use Akamas.

User personas: All roles

Installing Akamas

  • describes the Akamas architecture

  • provides the hardware, software and network prerequisites

  • describes the steps to install an Akamas Server and CLI

This guide provides the knowledge required to install and manage an Akamas installation.

User personas: Akamas Admin

Using Akamas

  • describes the Akamas optimization process and methodology

  • provides guidelines for optimizing some specific technologies

  • provides examples of optimization studies

This guide provides the methodology to define an optimization process and the knowledge to leverage Akamas.

User personas: Analyst / Practitioner teams

Integrating Akamas

  • describes how to integrate Akamas with the telemetry providers and configuration management tools

  • describes how to integrate Akamas with load testing tools

  • describes how to integrate Akamas with CI/CD tools

This guide provides the knowledge required to integrate Akamas with the ecosystem.

User personas: Akamas Admin, DevOps team

Akamas Reference

  • provides a glossary of Akamas key concepts with references to construct templates and commands

  • provides a reference to Akamas construct templates

  • provides a reference to Akamas command-line commands

  • describes Akamas optimization packs and telemetry providers

User personas: Akamas Admin, DevOps team, Analyst / Practitioner teams

Knowledge Base

  • describes how to set up a test environment for experimenting with Akamas

  • describes how to apply the Akamas approach to the optimization of some real-world cases

  • provides examples of Akamas templates and commands for the real-world cases

User personas: Analyst / Practitioner teams


Cloud Hosting

Refer to your Cloud Provider website for information about cloud hosting options and related cost information.

AWS EC2

For AWS EC2 costs, visit the EC2 Pricing page and use the AWS Pricing Calculator to estimate the cost for your architecture.

Customer Support Services

Akamas Customer Support Services are delivered by Akamas support engineers, also called Support Agents, who work remotely with the Customer to provide a temporary remedy for the incident and, ultimately, a permanent resolution. Akamas Support Agents automatically escalate issues to the appropriate technical group within Akamas and notify Customers of any relevant progress. Akamas provides Customers with the ability to escalate issues when appropriate.

Please notice that Customer Support services are not to be considered alternatives to product documentation and training, or to professional and consulting services, so adequate knowledge of Akamas products is assumed when interacting with Akamas Customer Support. Thus, during the resolution of a reported issue, Support Agents may redirect the Customer to training or professional services (which are not part of the scope of this service).

Licensing

Software Licenses

Akamas software licensing model is subscription-based (typically on a yearly basis). For more information on Akamas' cost model and software licensing costs, please contact info@akamas.io.

Maintenance & Support Services

Akamas software licenses include Maintenance & Support Services, which also include access to Customer Support Services.

Other billable services

Akamas also provides optional professional services for deployment, training, and integration activities. For more information about Akamas professional services, please contact info@akamas.io.

Security

Akamas takes security seriously and provides enterprise-grade software where customer data is kept safe at all times. This page describes some of the most important security aspects of Akamas software and information related to process and tools used by the Akamas company (Akamas S.p.A) to develop its software products.

Information managed by Akamas

Akamas manages the following types of information:

  • System configuration and performance metrics: technical data related to the systems being optimized. Examples of such data include the number of CPUs available in a virtual machine or the memory usage of a Java application server;

  • User accounts: accounts assigned to users to securely access the Akamas platform. For each user account, Akamas currently requires an account name and a password. Akamas does not collect any other personal identifying information;

  • Service Credentials: credentials used by Akamas to automate manual tasks and to integrate with external tools. In particular, Akamas leverages the following types of interaction:

    • Integration with monitoring and orchestration tools, e.g. to collect IT performance metrics and system configuration. As a best practice, Akamas recommends using dedicated service accounts with minimal read-only privileges.

    • Integration with the target systems to apply changes to configuration parameters. As a best practice, Akamas recommends using dedicated service accounts with minimal privileges to read/write identified parameters.

GDPR compliance

Akamas is a fully GDPR compliant product.

Akamas is a company owned by the Moviri Group. The Moviri Group and all its companies are fully compliant with GDPR. Moviri Group Data Privacy Policy and Data Breach Incident Response Plan which apply to all the owned companies can be requested from Akamas Customer Support.

Security certifications

Akamas is an on-premises product and does not transmit any data outside the customer network. Considering the kind of data that is managed within Akamas (see section "Information managed by Akamas"), specific security certifications like PCI or HIPAA are not required, as the platform does not manage payment or health-related information.

Data encryption

Akamas takes the need for security seriously and understands the importance of encrypting data to keep it safe at-rest and in-flight.

In-Flight encryption

All the communications between Akamas UI and CLI and the back-end services are encrypted via HTTPS. The customer can configure Akamas to use customer-provided SSL certificates in all communications.

Communications between Akamas services and other integrated tools within the customer network rely on the security configuration requirements of the integrated tool (e.g. HTTPS calls to interact with REST services).

At-Rest encryption

Akamas is an on-premises product and runs on dedicated virtual machines within the customer environment. At-Rest Encryption can be achieved following customer policies and best practices, for example leveraging operating system-level techniques.

Akamas also provides an application-level encryption layer aimed at extending the scope of at-Rest encryption. With this increased level of security, sensitive data managed by Akamas (e.g. passwords, tokens, or keys required to interact with external systems) are safely stored in Akamas databases using industry-standard AES 256-bit encryption.
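As a purely illustrative sketch of what AES 256-bit encryption of a secret looks like at the command line (the passphrase, file names, and use of OpenSSL below are examples, not Akamas internals):

```shell
# Illustration of AES-256 encryption of a secret with OpenSSL.
# The passphrase and file names are examples only.
printf 'my-api-token' > secret.txt
openssl enc -aes-256-cbc -pbkdf2 -salt \
  -in secret.txt -out secret.enc -pass pass:example-passphrase
# Decrypting with the same passphrase recovers the original secret:
openssl enc -aes-256-cbc -pbkdf2 -d \
  -in secret.enc -pass pass:example-passphrase
```

Without the passphrase, secret.enc is unreadable, which is the property that protects credentials at rest.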

Encryption option for Akamas on EC2

If Akamas is hosted on an AWS EC2 instance, you may optionally create the instance with an encrypted EBS volume before installing the OS and Akamas, in order to achieve a higher level of security.

Password management

Password Security

Passwords are securely stored using a one-way hash algorithm.
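The one-way property can be illustrated with a salted hash computed via OpenSSL (the algorithm and salt below are examples, not the actual Akamas implementation): verification recomputes the hash and compares it, while the original password cannot be recovered from the stored value.

```shell
# Illustrative salted one-way hash (SHA-512 crypt via openssl passwd -6);
# not the actual algorithm used by Akamas.
hash=$(openssl passwd -6 -salt examplesalt 'MyS3cret!Pass')
echo "$hash"
# Verification: recompute with the same salt and compare.
check=$(openssl passwd -6 -salt examplesalt 'MyS3cret!Pass')
[ "$hash" = "$check" ] && echo "password verified"
```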

Password complexity

Akamas comes with a default password policy requiring that each password:

  • has a minimum length of 12 characters;

  • contains at least 1 uppercase and 1 lowercase character;

  • contains at least 1 special character;

  • is different from the username;

  • is different from the last password set.

Customers can modify this policy by providing a custom one that matches their internal security policies.
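The default policy above can be sketched as a simple check (a minimal illustration, not the actual Akamas validation code):

```shell
# Minimal sketch of the default password policy checks (illustrative only).
check_password() {
  pw="$1"; user="$2"; last="$3"
  [ "${#pw}" -ge 12 ]                        || return 1  # minimum length of 12
  printf '%s' "$pw" | grep -q '[A-Z]'        || return 1  # at least 1 uppercase
  printf '%s' "$pw" | grep -q '[a-z]'        || return 1  # at least 1 lowercase
  printf '%s' "$pw" | grep -q '[^A-Za-z0-9]' || return 1  # at least 1 special character
  [ "$pw" != "$user" ]                       || return 1  # different from the username
  [ "$pw" != "$last" ]                       || return 1  # different from the last password
}
check_password 'Str0ng!Passw0rd' alice 'OldP@ssword11' && echo "policy satisfied"
```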

Password rotation

Akamas enforces no password rotation mechanism.

Credential storage

  • When running on a Linux installation with KDE's KWallet or GNOME's Keyring enabled, the credentials will be stored in the default wallet/keyring.

  • When running on Windows, the credentials will be stored in the Windows Credential Locker.

  • When running on macOS, the credentials will be stored in Keychain.

  • When running on a headless Linux installation, the credentials will be stored in CLEAR TEXT in a file in the current Akamas configuration folder.
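Because the headless Linux case stores credentials in clear text, it is prudent to restrict the configuration folder to the owning user (the path below is a hypothetical placeholder; use your actual Akamas configuration folder):

```shell
# Restrict the configuration folder to the current user only.
# $HOME/.akamas is a hypothetical placeholder path.
CONF_DIR="$HOME/.akamas"
mkdir -p "$CONF_DIR"
chmod 700 "$CONF_DIR"   # owner-only read/write/execute
```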

Resources visibility model

Akamas provides fine granularity control over resources managed within the platform. In particular, Akamas features two kinds of resources:

  • Workspace resources: entities bound to one of the isolated virtual environments (named workspaces) that can only be accessed in reading or writing mode by users to whom the administrators explicitly granted the required privileges. Such resources typically include sensitive data (e.g. passwords, API tokens). Examples of such resources include the system to be optimized, the set of configurations, optimization studies, etc.

  • Shared resources: entities that can be installed and updated by administrators and are available to all Akamas users. Such resources only contain technology-related information (e.g. the set of performance metrics for a Java application server). Examples of such resources include Optimization Packs, which are libraries of technology components that Akamas can optimize, such as a Java application server.

Akamas Logs

Akamas logs traffic from the UI and APIs. Application-level logs include user access via APIs and UI and any action taken by Akamas on integrated systems.

Akamas logs are retained on the dedicated virtual machine within the customer environment, by default, for 7 days. The retention period can be configured according to customer policies. Logs can be accessed either via UI or via log dump within the retention period. Additionally, logs have a format that can be easily integrated with external systems like log engines and SIEM to support forensic analysis.

Code scanning policy

Akamas is developed according to security best practices and the code is scanned regularly (at least daily).

The Akamas development process leverages modern continuous integration approaches and the development pipeline includes SonarQube, a leading security scanning product that includes comprehensive support for established security standards including CWE, SANS, and OWASP. Code scanning is automatically triggered in case of a new build, a release, and every night.

Vulnerability scanning and patch management policy

Akamas features modern micro-service architecture and is delivered as a set of docker containers whose images are hosted on a private Elastic Container Registry (ECR) repository on the AWS cloud. Akamas leverages the vulnerability scanning capabilities of AWS ECR to identify vulnerabilities within the product container images. AWS ECR uses the Common Vulnerabilities and Exposures (CVEs) database from the open-source Clair project.

If a vulnerability is detected, Akamas will perform a security assessment of the security risk in terms of the impact of the vulnerability, and evaluate the necessary steps (e.g. dependency updates) required to fix the vulnerability within a timeline related to the outcome of the security assessment.

After the assessment, the vulnerability can be fixed either by recommending the upgrade to a new product version or by delivering a patch or a hotfix for the current version.

Support levels with Akamas

Based on the Support levels for software versions, the following table describes the level of support of the Akamas versions after the version 3.2 GA date (May 1st, 2023).

Version
Support Level

3.2

Full Support

Notice: this will change once the following major version is released

3.1

Full Support

Notice: this will change once the following major version is released

3.0

Full Support

Notice: this will change once the following major version is released

2.x

Limited Support until 12 months after the 3.0 GA date, that is September 13th, 2023 (see Support Levels with Akamas 3.0)

1.x

No Support


Introduction to Akamas

A quick introduction to Akamas

Akamas is the AI-powered optimization platform designed to maximize service quality and cost efficiency without compromising on application performance. Akamas supports both production environments under live, dynamic workloads, and in test/pre-production environments against any what-if scenario and workload.

Thanks to Akamas, performance engineers, DevOps, CloudOps, FinOps and SRE teams can keep complex applications, such as Kubernetes microservices applications, optimized to avoid any unnecessary cost and any performance risks.

Akamas Optimization platform

The Akamas optimization platform leverages patented AI techniques that can autonomously identify optimal full-stack configurations driven by any custom-defined goals and constraints (SLOs), without any human intervention, any agents, and any code or byte-code changes.

Akamas optimal configurations can be applied either i) under human approval (human-in-the-loop mode) or ii) automatically, as a continuous optimization step in a CI/CD pipeline (in-the-pipe) or iii) autonomously by Akamas (autopilot).

Akamas coverage

Akamas can optimize any system with respect to any set of parameters chosen from the application, middleware, database, cloud, and any other underlying layers.

Akamas provides dozens of out-of-the-box Optimization Packs available for key technologies such as JVM, Go, Kubernetes, Docker, Oracle, MongoDB, ElasticSearch, PostgreSQL, Spark, AWS EC2 and Lambda, and more. Each Optimization Pack provides parameters, relationships, and metrics to accelerate the optimization process setup and support company-wide best practices. Custom Optimization Packs can be easily created without any coding.

The following figure is illustrative of Akamas coverage for both managed technologies and integrated components of the ecosystem.

Akamas integrations

Akamas can integrate with any ecosystem thanks to out-of-the-box and custom integrations with the following components:

  • telemetry & monitoring tools and other sources of KPIs and cost data, such as Dynatrace, Prometheus, CloudWatch, and CSV files

  • configuration management tools, repositories and interfaces to apply configurations, such as Ansible, Openshift, and Git

  • value stream delivery tools to support a continuous optimization process, such as Jenkins, Dynatrace Cloud Automation, and GitLab

  • load testing tools to generate simulated workloads in test/pre-production, such as LoadRunner, NeoLoad, and JMeter

Akamas has been designed around Infrastructure-as-Code (IaC) and DevOps principles. Thanks to a comprehensive set of APIs and integration mechanisms, it is possible to extend the Akamas optimization platform to manage any system and integrate with any ecosystem.

Use Cases

Akamas optimization platform supports a variety of use cases, including:

  • Improve Service Quality: optimize application performance (e.g. maximize throughput, minimize response time and job execution time) and stability (lower fluctuations and peaks);

  • Increase Business Agility: identify resource bottlenecks in early stages of the delivery cycle, avoid delays due to manual remediations - release higher quality services and reduce production incidents;

  • Increase Service Resilience: improve service resilience under higher workloads (e.g. expected business growth) or failure scenarios identified by chaos engineering practices - improve SRE practice;

  • Reduce IT Cost / Cloud Bill: reduce on-premise infrastructure cost and cloud bills due to resource over-provisioning - improve cost efficiency of Kubernetes microservices applications;

  • Optimize Cloud Migration: safely migrate on-premise applications to cloud environments for optimal cost efficiency; evaluate options to migrate to managed services (e.g. AWS Fargate);

  • Improve Operational Efficiency: save engineering time spent on manual tuning tasks and enable Performance Engineering teams to do more in less time (and with less external consulting).

Support levels for software versions

Different levels of support are provided for software versions of Akamas products, starting from the general availability (GA) date of each version, and depending on the release of subsequent software versions.

Version Numbering

Akamas adopts a three-place numbering scheme MA.MI.SP to designate released versions of its Software, where:

  • MA is the Major Version

  • MI is the Minor Version

  • SP is the Service Pack or Patch number
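For example, an MA.MI.SP version string such as 3.2.2 can be split with shell parameter expansion (an illustrative snippet, e.g. for upgrade scripts):

```shell
# Split an Akamas-style MA.MI.SP version string into its components.
version="3.2.2"
MA=${version%%.*}   # major version
SP=${version##*.}   # service pack / patch number
rest=${version#*.}
MI=${rest%%.*}      # minor version
echo "major=$MA minor=$MI service-pack=$SP"
```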

Support levels

The following table describes the three levels of support for a software version.

Support level
Description

Full Support

Akamas provides full support for one previous (either major or minor) version in addition to the latest available GA version.

For Software version in Full Support level: Akamas Support Agents provide service packs, patches, hotfixes, or workarounds to make the Software operate in substantial conformity with its then-current operating documentation.

Limited Support

Following the Full Support period, Akamas provides Limited Support for an additional 12 months.

For Software versions in Limited Support level:

  • No new enhancements will be made to a version in "Limited Support"; Akamas Support Agents will direct Customers to existing fixes, patches, or workarounds applicable to the reported case, if any;

  • Akamas Support Agents will provide hot fixes for problems of high technical impact or business exposure for customers;

  • Based on Customer input, Akamas Support Agents will determine the degree of impact and exposure and the consequent activities;

  • Akamas Support Agents will direct Customers to upgrade to a more current version of the Software.

No Support

Following the Limited Support period, Akamas provides no support for any Software version.

For Software versions in No Support level: No new maintenance releases, enhancements, patches, or hot fixes will be made available. Akamas Support Agents will direct Customers to upgrade to a more current version of the Software.

End-of-Life (EOL)

At any time, Akamas reserves the right to "end of life" (EOL) a software product and to terminate any Maintenance & Support Services for such product, provided that Licensor has notified the Licensee at least 12 months prior to the above-mentioned termination.

The period of time occurring between the "end of life" notification and the actual termination of Maintenance & Support Services is provided as follows:

  • No new enhancements will be introduced.

  • No enhancements will be made to support new or updated versions of the platform on which the product runs or which it integrates.

  • New hotfixes for problems of high technical impact or business exposure for customers may still be developed. Based on customer input, Akamas Support Agents will determine the degree of impact and exposure and the consequent activities.

  • Reasonable efforts will be made to inform the Customer of any fixes, service packs, patches, or workarounds applicable to the reported case, if any.

Support levels for Customer Support Services

Akamas Customer Support Services provides different standard levels of support. Please verify the level of support specified in the contract in place with your Company.

Severity levels

The following table describes the different severity levels for Customer Support.

Severity level
Description
Impact

S1

Blocking: production Customer system is severely impacted.

Notice: this severity level only applies to production environments

Catastrophic business impact (e.g. complete loss of a core business process, where work cannot reasonably continue, such as all final users being unable to access the Customer application)

S2

Critical: one major Akamas functionality is unavailable

Significant loss or degradation of the Akamas services (e.g. Akamas is down or Akamas is not generating recommendations)

S3

Severe: limitation in accessing one major Akamas functionality

Moderate business impact and moderate loss or degradation of services, but work can reasonably continue in an impaired manner (e.g. only some specific functions are not working properly)

S4

Informational: Any other request

Minimum business impact.

Substantially functioning with minor or no impediments of services.

Support conditions

The contract in place with the Customer specifies the level of support provided by Akamas Agents, according at least to the following items:

  • Maximum number of support seats: this is the maximum number of named users within the Customer organization who can request Akamas Customer Support.

  • Language(s): these are the languages that can be used for interacting with Akamas Support Agents - the default is English.

  • Channel(s): these are the different communication channels that can be used to interact with Akamas Agents - these may include one or more options among web ticketing, email, phone, and Slack channel.

  • Max Initial Response Time: this refers to the time interval between the moment a request is opened by the Customer to Customer Support and the moment a Support Agent responds with a first notification (acknowledgment).

  • Severity: this is the level of severity associated with a reported issue, which initially corresponds to the severity level originally indicated by the Customer. Notice that the severity level may change, for example as new information becomes available or if Support Agents and Customer agree to re-evaluate it. Please notice that the severity level may be downgraded by Support Agents if Customer is not able to provide adequate resources or responses to enable Akamas to continue with its resolution efforts.

  • Initial Remedy: this refers to any operation aimed at addressing a reported issue by restoring a minimal level of operations, even if it may cause some performance degradation of the Customer service or operations. A workaround is to be considered a valid Initial Remedy.

Please notice that Support Agents may refuse to serve a Customer Support request either in case the Customer does not have a valid Maintenance & Support subscription or in case the above-mentioned conditions or other conditions stated in the contract in place are not met. In any case, the Customer is expected to provide all the information required by Support Agents in order to serve Customer Support requests.

Architecture

Akamas is based on a microservices architecture where each service is deployed as a container and communicates with other services via REST APIs. Akamas can be deployed on a dedicated machine (Akamas Server) or on a Kubernetes cluster.

The following figure represents the high-level Akamas architecture.

Interact with Akamas

Users can interact with Akamas via the Graphical User Interface (GUI), the Command-Line Interface (CLI), or the Application Programming Interface (API).

Both the GUI and CLI leverage HTTP/S APIs which pass through an API gateway (based on Kong), which also takes care of authenticating users by interacting with Akamas access management and routing requests to the different services.

The Akamas CLI can be invoked on either the Akamas Server itself or on a different machine (e.g. a laptop or another server) where the Akamas CLI has been installed.

Repositories

Akamas data is securely stored in different databases:

  • time series data gathered from telemetry providers are stored in Elasticsearch;

  • application logs are also stored in Elasticsearch;

  • data related to systems, studies, workflows, and other user-provided data are stored in a Postgres database.

Notice: Postgres, Elasticsearch, and any other service included within Akamas are provided as part of the Akamas installation package.

Services

Core Services

The following Spring-based microservices represent Akamas core services:

  • System Service: holds information about metrics, parameters, and systems that are being optimized

  • Campaign Service: holds information about optimization studies, including configurations and experiments

  • Metrics Service: stores raw performance metrics (in Elasticsearch)

  • Analyzer Service: automates the analysis of load tests and provides related functionalities such as smart windowing

  • Telemetry Service: takes care of integrating different data sources by supporting multiple Telemetry Providers

  • Optimizer Service: combines different optimization engines to generate optimized configurations using ML techniques

  • Orchestrator Service: manages the execution of user-defined workflows to drive load tests

  • User Service: takes care of user management activities such as user creation or password changes

  • License Service: takes care of license management activities, optimization pack, and study export.

Ancillary Services

Akamas also provides advanced management features like logging, self-monitoring, licensing, user management, and more.

Prerequisites

Before installing the Akamas Server please make sure to review all the following requirements:

Hardware requirements
Software requirements
Network requirements

Offline installation mode

Akamas is deployed as a set of containerized services running on Docker and managed via Docker Compose. In the offline installation mode, the latest version of the Akamas Docker Compose file and all the images required by Docker cannot be downloaded from the AWS ECR repository.

Get Akamas Docker artifacts

Get in contact with Akamas Customer Support Services to have the latest versions of the Akamas artifacts uploaded to a location of your choice on the dedicated Akamas Server.

Akamas installation artifacts will include:

  • images.tar.gz: a tarball containing Akamas main images.

  • docker-compose.yml: docker-compose file for Akamas.

  • akamas: the binary file of the Akamas CLI that will be used to verify the installation.

Import Docker images

A preliminary step in the offline installation mode is to import the shipped Docker images by running the following commands in the same directory where the tar files have been stored:

cd <your bundle files location>
docker image load -i images.tar.gz

Notice that this import procedure could take quite some time!

Configure Akamas environment variables

To configure Akamas, the following environment variables are required to be set:

  • AKAMAS_CUSTOMER: the customer name matching the one referenced in the Akamas license.

  • AKAMAS_BASE_URL: the endpoint in the Akamas APIs that will be used to interact with the CLI, typically http://<akamas server dns address>:8000

The environment variables can be exported using the snippet below:

# add double quotes ("xx xx") if the name contains white spaces
export AKAMAS_CUSTOMER=<your name or your organization name>
export AKAMAS_BASE_URL=http://<akamas server DNS address>:8000

It is recommended to save these exported variables in your ~/.bashrc file for convenience.
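For example, the exports can be appended to ~/.bashrc (the values below are placeholders to be replaced with your own):

```shell
# Persist the Akamas variables for future shell sessions.
# "Example Corp" and the URL are placeholder values.
cat >> "$HOME/.bashrc" <<'EOF'
export AKAMAS_CUSTOMER="Example Corp"
export AKAMAS_BASE_URL=http://akamas.example.com:8000
EOF
```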

Run installation

To start Akamas, you can now simply navigate into the akamas folder and run docker compose as follows:

cd <your docker-compose file location>
docker compose up -d

You may get the following error:

Error saving credentials: error storing credentials - err: exit status 1, out: Cannot autolaunch D-Bus without X11 $DISPLAY

This is a documented Docker bug (see this link) that can be solved by installing the "pass" package:

  • Ubuntu

sudo apt-get install -y pass

  • RHEL

yum install pass

Install the Akamas Server

Akamas is deployed as a set of containerized services running on Docker and managed via Docker Compose. The latest version of the Akamas Docker Compose file and all the images required by Docker can be downloaded from the AWS ECR repository.

Two installation modes are available:

  • online installation mode, in case the Akamas Server has access to the Internet;

  • offline installation mode, in case the Akamas Server does not have access to the Internet.

Online installation mode

Akamas is deployed as a set of containerized services running on Docker and managed via Docker Compose. In the online installation mode, the latest version of the Akamas Docker Compose file and all the images required by Docker can be downloaded from the AWS ECR repository.

Get Akamas Docker artifacts

It is suggested to first create an akamas directory in your user's home directory, and then run the following commands to get the latest compose file:

cd ~
mkdir akamas
cd akamas
curl -O https://s3.us-east-2.amazonaws.com/akamas/compose/$(curl https://s3.us-east-2.amazonaws.com/akamas/compose/stable.txt)/docker-compose.yml

Configure Akamas environment variables

To configure Akamas, you should set the following environment variables:

  • AKAMAS_CUSTOMER: the customer name matching the one referenced in the Akamas license.

  • AKAMAS_BASE_URL: the endpoint of the Akamas APIs that the CLI will use to interact with the server, typically http://<akamas server dns address>:8000

You can export the variables using the following snippet:

# Add double quotes ("xx xx") if the name contains white spaces
export AKAMAS_CUSTOMER=<your name or your organization name>
export AKAMAS_BASE_URL=http://<akamas server DNS address>:8000

It is recommended to save these exported variables in your ~/.bashrc file for convenience.

Start Akamas

In order to log into AWS ECR and pull the most recent Akamas container images, you also need to set the AWS authentication variables to the appropriate values provided by Akamas Customer Support Services, by running the following commands:

export AWS_ACCESS_KEY_ID=<your access key id>
export AWS_SECRET_ACCESS_KEY=<your secret access key>
export AWS_DEFAULT_REGION=us-east-2

At this point, you can start the Akamas server by logging into the ECR registry and launching the services with the following commands:

aws ecr get-login-password --region us-east-2 | docker login -u AWS --password-stdin https://485790562880.dkr.ecr.us-east-2.amazonaws.com
docker compose up -d
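After docker compose up -d returns, it is worth confirming that no service exited right away. The sketch below shows one way to scan service states: the here-document stands in for live output, and with a running server you would pipe in docker compose ps --format '{{.Name}} {{.State}}' instead (the format flag assumes a recent Compose version).

```shell
# Flag any service whose state is not "running"; exit non-zero if one is found.
check_states() {
  awk '$2 != "running" { print $1 " is " $2; bad = 1 }
       END { if (!bad) print "all services running"; exit bad }'
}

# Sample output standing in for: docker compose ps --format '{{.Name}} {{.State}}'
check_states <<'EOF'
kong running
license running
optimizer running
EOF
```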

  • Online installation mode, in case the Akamas Server has access to the Internet (installation behind a proxy server is also supported).

  • Offline installation mode, in case the Akamas Server does not have access to the Internet.

In case the Akamas Server is behind a proxy server please also read how to setup Akamas behind a Proxy.

Network requirements

This section lists all the connectivity settings required to operate and manage Akamas.

Internet access

Internet access is required by the Akamas online installation and update procedures, and allows retrieving the most up-to-date Akamas container images from the Akamas private Amazon Elastic Container Registry (ECR).

If internet access is not available for policies or security reasons, Akamas installation and updates can be executed offline.

Internet access from the Akamas server is not mandatory but it’s strongly recommended.

Ports

The following table provides a list of the ports on the Akamas server that have to be reachable by Akamas administrators and users to properly operate the system.

Source              Destination     Port        Reason
Akamas admin        Akamas server   22          SSH
Akamas admin/user   Akamas server   80, 443     Akamas web UI access
Akamas admin/user   Akamas server   8000, 8443  Akamas API access

In the specific case of AWS instance and customer instances sharing the same VPC/Subnet inside AWS, you should:

  • open all of the ports listed in the table above to all inbound sources (0.0.0.0/0) on your AWS security group

  • open outbound rules to all traffic and then attach this AWS security group (which must reside inside a private subnet) to the Akamas machine and all customer application AWS machines
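A quick way to confirm these ports are reachable from a user's machine is a TCP probe like the sketch below. AKAMAS_HOST is a placeholder that defaults to localhost, and the /dev/tcp redirection is a bash feature.

```shell
# Probe each Akamas port from the table above with a 2-second TCP timeout.
AKAMAS_HOST=${AKAMAS_HOST:-127.0.0.1}
for port in 22 80 443 8000 8443; do
  if timeout 2 bash -c "exec 3<>/dev/tcp/${AKAMAS_HOST}/${port}" 2>/dev/null; then
    echo "port ${port}: reachable"
  else
    echo "port ${port}: not reachable"
  fi
done
```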

Software Requirements

Operating System

The following table provides a list of the supported operating systems and their versions.

Operating System          Version
Ubuntu Linux              18.04+
CentOS                    7.6+
RedHat Enterprise Linux   7.6+

On RHEL systems Akamas containers might need to be run in privileged mode depending on how Docker was installed on the system.

Software packages

The following table provides a list of the required Software Packages (also referred to as Akamas dependencies) together with their versions.

  • Docker: Akamas is deployed as a set of containerized services running on Docker. During its operation, Akamas launches different containers, so access to the Docker socket with enough permissions to run containers is required.

  • Docker Compose: Akamas containerized services are managed via Docker Compose. Docker Compose is usually already shipped with Docker starting from version 23.

  • AWS CLI: Akamas container images are published in a private Amazon Elastic Container Registry (ECR) and are automatically downloaded during the online installation procedure. The AWS CLI is required only during the installation phase if the server has internet access, and can be skipped during an offline installation.

The exact version of these prerequisites is listed in the following table:

Software Package   Ubuntu   CentOS   RHEL
Docker             19.03+   1.13+    1.13+
Docker Compose     2.7.0+   2.7.0+   2.7.0+
AWS CLI            2.0.0+   2.0.0+   2.0.0+

Akamas user

To install and run Akamas it is recommended to create a dedicated user (usually "akamas"). The Akamas user is not required to be in the sudoers list, but should be added to the docker (or dockerroot) group so it can run docker and docker compose commands.

Make sure that the Akamas user has the read, write, and execute permissions on /tmp. If your environment does not allow writing to the whole /tmp folder, please create a folder /tmp/build and assign read and write permission to the Akamas user on that folder.

Install Akamas dependencies

This page will guide you through the installation of the software components required to get the Akamas Server installed on a machine. Please read the Akamas dependencies section for a detailed list of these software components for each specific OS.

While some links to official documentation and installation resources are provided here, please make sure to refer to your internal system engineering department to ensure that your company deployment processes and best practices are correctly matched.

Dependencies Setup

As a preliminary step before installing any dependency, it is strongly suggested to create a user named akamas on the machine hosting the Akamas Server.

Docker

Follow the reference documentation to install Docker on your system: https://docs.docker.com/engine/install

To run Docker with a non-root user, such as the akamas user, you should add the user to the docker group. You can follow the guide at: https://docs.docker.com/engine/install/linux-postinstall/

Docker Compose

Docker Compose is already included since Docker 23+. To install it on previous versions of Docker follow this installation guide: https://docs.docker.com/compose/install/

AWS CLI

Read more about how to set up the AWS CLI in the official guide: https://docs.aws.amazon.com/cli/latest/userguide

Verify dependencies

As a quick check to verify that all dependencies have been correctly installed, you can run the following commands:

  • Docker:

docker run hello-world

For offline installations, you can check Docker with the docker ps command.

  • Docker Compose:

docker compose --version

Docker versions older than 23 must use the docker-compose command instead of docker compose.

  • AWS CLI:

aws --version

Troubleshoot Docker installation issues

This section describes some of the most common issues found during the Akamas installation.

Issues when installing Docker

CentOS 7 and RHEL 7

Notice: these distributions feature a known issue: the default Docker execution group is named dockerroot instead of docker. To make Docker work, edit (or create) /etc/docker/daemon.json to include the following fragment:

{
  "group": "dockerroot"
}

After editing or creating the file, please restart Docker and then check the group permission of the Docker socket (/var/run/docker.sock), which should show dockerroot as a group:

srw-rw----. 1 root dockerroot 0 Jul  4 09:57 /var/run/docker.sock

Then, add the newly created akamas user to the dockerroot group so that it can run docker containers:

sudo usermod -aG dockerroot <user_name>

and check that the akamas user has been correctly added to the dockerroot group by running:

lid -g dockerroot
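Note that lid ships with the libuser package and is not available on every distribution; getent offers a portable alternative for the same check:

```shell
# List the members of the dockerroot group; prints "<none>" when the group
# is absent or empty (getent works on any glibc-based system).
GROUP=dockerroot
members=$(getent group "$GROUP" | cut -d: -f4)
echo "members of ${GROUP}: ${members:-<none>}"
```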

Issues when running AWS CLI

In case of issues logging in through the AWS CLI when executing the following command:

aws ecr get-login-password --region us-east-2

Please check that:

  • Environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION are correctly set

  • AWS CLI version is 2.0+

Issue when starting Akamas services

Akamas failed to start some services

Please notice that the very first time Akamas is started, up to 30 minutes might be required to initialize the environment.

If the issue persists, you can run the following command to identify which service is not able to start up correctly:

akamas status -d

License service unable to access docker socket

In some systems, the Docker socket, usually located at /var/run/docker.sock, cannot be accessed from within a container. Akamas signals this behavior by reporting an Access Denied error in the license service logs.

To overcome this limitation edit the docker-compose.yaml file adding the line privileged: true to the following services:

  • License

  • Optimizer

  • Telemetry

  • Airflow

The following is a sample configuration where this change is applied to the license service:

license:
  image: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/license_service:2.3.0
  container_name: license
  privileged: true

Finally, you can issue the following command to apply these changes:

docker-compose up -d

Missing Akamas Customer variable

You can easily inspect which value of the AKAMAS_CUSTOMER variable has been used when starting Akamas by running the following command on the Akamas server:

docker inspect license | grep AKAMAS_CUSTOMER

If you find out that the value is not the one you expect you can update it by running the following command on the Akamas server:

AKAMAS_CUSTOMER=<your-new-value> docker-compose up -d license

Once Akamas is up and running you can re-install your license.

Other issues

Hardware Requirements

Running in your data center

The following table provides the minimal hardware requirements for the virtual or physical machine used to install the Akamas server in your data center.

Resource     Requirement
CPU          4 cores @ 2 GHz
Memory       16 GB
Disk Space   70 GB

Running on AWS EC2

You can create the Akamas instance in the same AWS region, Virtual Private Cloud (VPC), and private subnet as your existing EC2 machines, and create/configure a new security group that allows communication between your application instances and the Akamas instance. The inbound/outbound rules of this security group must be configured as explained in the Network requirements section of this page.

It is recommended to use an m6a.xlarge instance with at least 70GB of disks of type GP2 or GP3 and select the latest LTS version of Ubuntu.

Supported AWS Regions

Akamas can be run in any EC2 region.

AWS Service Limits

Docker compose installation

This section describes how to install Akamas on Docker.

Preliminary steps

Before installing Akamas, please follow these steps:

Installation steps

Please follow these steps to install the Akamas Server:

Online installation behind a Proxy server

This section describes how to set up an Akamas Server behind a proxy server and allow Docker to connect to the Akamas repository on AWS ECR.

Configure Docker daemon

First, create the /etc/systemd/system/docker.service.d directory if it does not already exist. Then create or update the /etc/systemd/system/docker.service.d/http-proxy.conf file with the variables listed below, taking care to replace <PROXY> with the address and port (and credentials if needed) of your target proxy server:

[Service]
Environment="HTTP_PROXY=<PROXY>"
Environment="HTTPS_PROXY=<PROXY>"

Once configured, flush the changes and restart Docker with the following commands:

sudo systemctl daemon-reload
sudo systemctl restart docker

For more details, refer to the official documentation page: Control Docker with systemd.

Configure the Akamas containers

To allow the Akamas services to connect to addresses outside your intranet, the Docker instance needs to be configured to forward the proxy configuration to the Akamas containers.

Update the ~/.docker/config.json file adding the following field to the JSON, taking care to replace <PROXY> with the address (and credentials if needed) of your target proxy server:

{
  ...
  "proxies": {
    "default": {
      "httpProxy": "<PROXY>",
      "httpsProxy": "<PROXY>",
      "ftpProxy": "<PROXY>",
      "noProxy": "localhost,127.0.0.1,/var/run/docker.sock,database,optimizer,campaign,analyzer,telemetry,log,elasticsearch,metrics,system,license,store,orchestrator,airflow-db,airflow-webserver,kong-database,kong,user-service,keycloak,logstash,kibana,akamas-ui,grafana,prometheus,node-exporter,cadvisor,konga,benchmark"
    }
  }
}

For more details, refer to the official documentation page: Configure Docker to use a proxy server.

Run Akamas

Set the following variables to configure your working environment, taking care to replace <PROXY> with the address (and credentials if needed) of your target proxy server:

export HTTP_PROXY='<PROXY>'
export HTTPS_PROXY='<PROXY>'

Once configured, you can log into the ECR repository through the AWS CLI and start the Akamas services manually. We recommend using the official AWS CLI installation guide for a smoother experience.

When installing Akamas it's mandatory to provide the AKAMAS_CUSTOMER variable as illustrated in the installation guide. This variable must match the one provided by Akamas representatives when issuing a license. If the variable is not properly exported, license installation will fail with an error message indicating that the name of the customer installation does not match the one provided in the license.

For any other issues please contact Akamas Customer Support Services.

To run Akamas on an AWS Instance you need to create a new virtual machine based on one of the supported operating systems. You can refer to the AWS documentation for step-by-step instructions on creating the instance. You can find the latest version supported for your preferred region here.

Before installing Akamas on an AWS Instance please make sure to meet your AWS service limits (please refer to the official AWS documentation here).

Please make sure to read the Getting Started section before installing Akamas.

Please also read the sections on how to troubleshoot the installation and how to manage the Akamas Server. Finally, read the relevant sections of Integrating Akamas to integrate Akamas into your specific ecosystem.

Preliminary steps:

  • Review hardware, software, and network prerequisites

  • Install all Akamas dependencies

Installation steps:

  • Install the Akamas Server

  • Install the Akamas CLI

  • Verify the Akamas Server

  • Install an Akamas license

Setup HTTPS configuration

Akamas APIs and UI use plain HTTP when they are first installed. To enable the use of HTTPS you will need to:

  1. Ask your security team to provide you with a valid certificate for your server. The certificate usually consists of two files with ".key" and ".pem" extensions. You will need to provide the Akamas server DNS name.

  2. Create a folder named "certs" in the same directory as Akamas' docker-compose file;

  3. Copy the ".key" and ".pem" files in the created "certs" folder and rename them to "akamas.key" and "akamas.pem" respectively. Make sure that the files belong to the same user and group you use to run Akamas.

  4. Restart two Akamas services by running the following commands:

    cd <Akamas docker-compose file folder>
    docker-compose restart akamas-ui kong

Once the containers have restarted, you will be able to access the UI over HTTPS from your browser:

https://<akamas server name here>
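Before restarting the services, it can save a round of debugging to verify that the key and certificate actually belong together. The sketch below demonstrates the check on a throwaway self-signed pair; with real files, point KEY and PEM at certs/akamas.key and certs/akamas.pem (assumes the openssl CLI is available).

```shell
# Generate a throwaway pair only for demonstration purposes.
tmp=$(mktemp -d)
KEY="$tmp/akamas.key"; PEM="$tmp/akamas.pem"
openssl req -x509 -newkey rsa:2048 -nodes -keyout "$KEY" -out "$PEM" \
  -days 1 -subj "/CN=akamas.example.com" 2>/dev/null

# A key and certificate match when their RSA moduli are identical.
m1=$(openssl x509 -noout -modulus -in "$PEM" | openssl md5)
m2=$(openssl rsa  -noout -modulus -in "$KEY" | openssl md5)
[ "$m1" = "$m2" ] && echo "key and certificate match"
```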

Setup CLI to use HTTPS

Now that your Akamas server is configured to use HTTPS you can update the Akamas CLI configuration in order to use the secure protocol.

If you have not yet installed the Akamas CLI, follow the CLI installation guide in order to install it. If you already have the CLI available, you can run the following command:

akamas init config

You will be prompted to enter some input, please value it as follows:

Api address [http://localhost:8000]: https://<akamas server dns address>:443/akapi
Workspace [default]: default
Verify SSL: [True]: True

You can test the connection by running:

akamas status

It should return 'OK', meaning that Akamas has been properly configured to work over HTTPS.

Software Requirements

This page describes the requirements that should be fulfilled by the user when installing or managing an Akamas installation on Kubernetes. The software listed below is usually installed on the user's workstation or laptop.

Kubectl

To interact with the Kubernetes API server you will need kubectl, preferably with a version matching the cluster. Kubectl must be installed and configured to interact with the desired cluster: refer to the official kubectl documentation to set up the client. To check both the client and cluster versions, run:

kubectl version --short

Helm

Installing Akamas requires Helm 3.0 or higher. To check the version run:

helm version --short

Privileged access

Akamas uses Elasticsearch to store logs and time series. When running Akamas on Kubernetes, Elasticsearch is installed automatically using the official Elasticsearch helm chart. This chart requires running an init container with privileged access to set up a configuration on the host running the Elasticsearch pod. If running such a container is not permitted in your environment, you can add the following snippet to the akamas.yaml file when installing Akamas to disable this feature:

# Disable ES privileged initialization container.
elasticsearch:
  sysctlInitContainer:
    enabled: false

Changing UI Ports

By default, Akamas uses the following ports for its UI:

  • 80 (HTTP)

  • 443 (HTTPS)

Depending on the configuration of your environment, you may want to change the default settings: to do so, you'll have to update the Akamas docker-compose file.

Inside the docker-compose.yml file, scroll down until you come across the akamas-ui service. There you will find a specification as follows:

  akamas-ui:
    ports:
      - "443:443"
      - "80:80"

Update the YAML file by remapping the UI ports to the desired ports of the host:

  akamas-ui:
    ports:
      - "<YOUR_HTTPS_PORT_OF_CHOICE>:443"
      - "<YOUR_HTTP_PORT_OF_CHOICE>:80"

In case you are running Akamas with host networking, you can also bind different ports in the container itself. In order to do so you can expand the docker-compose service by adding a couple of environment variables like this:

  akamas-ui:
    environment:
      - HTTP_PORT=<HTTP_CONTAINER_PORT>
      - HTTPS_PORT=<HTTPS_CONTAINER_PORT>
    ports:
      - "<YOUR_HTTPS_PORT_OF_CHOICE>:<HTTPS_CONTAINER_PORT>"
      - "<YOUR_HTTP_PORT_OF_CHOICE>:<HTTP_CONTAINER_PORT>"

Kubernetes installation

This section describes how to install Akamas on a Kubernetes cluster.

Preliminary steps

Before installing Akamas, please follow these steps:

  • Review the cluster requirements

  • Install the software requirements

Installation steps

Please follow these steps to install the Akamas application:

  • Install the application

  • Install the CLI

  • Verify the installation

  • Install the license

Please also read the section on how to manage Akamas. Finally, read the relevant sections of Integrating Akamas to integrate Akamas into your specific ecosystem.

Install Akamas

Akamas is deployed on your Kubernetes cluster through a Helm chart, and all the required images can be downloaded from the AWS ECR repository.

Two installation modes are available:

  • online installation, in case the Kubernetes cluster can access the Internet

  • offline installation, in case the Kubernetes cluster does not have access to the Internet or you need to use a private image registry

Prerequisites

Before installing Akamas please make sure to review all the following requirements:

  • Cluster requirements

  • Software requirements

Cluster Requirements

Kubernetes version

Running Akamas requires a cluster running Kubernetes version 1.23 or higher.

Resources requirements

Akamas can be deployed in three different sizes depending on the number of concurrent optimization studies that will be executed. If you are unsure about which size is appropriate for your environment, we suggest you start with the small one and upgrade to bigger ones as you expand the optimization activity to more applications.

The tables below report the required resources, both for requests and limits, that should be available in the cluster to use Akamas.

Small

The small tier is suited for environments that need to support up to 10 concurrent optimization studies.

Resource     Requests   Limits
CPU          3 Cores    6 Cores
Memory       16 GB      18 GB
Disk Space   70 GB      70 GB

Storage requirements

The cluster must provide the definition of a Storage Class so that the application installation can leverage Persistent Volume Claims to dynamically provision the volumes required to persist data. For more information on this topic refer to Kubernetes' official documentation.

Permissions

To work properly, Akamas needs to manage some resources inside the Namespace. For this reason, it is recommended to run Akamas in a dedicated Namespace.

To manage resources, Akamas uses a ServiceAccount bound to the application's pods, which must be created either manually by the cluster administrator or automatically by the provided Helm chart.

This snippet describes the namespaced permissions required for the service account:

- apiGroups: ["batch"]
  resources:
    - jobs
  verbs: ["get", "list", "create", "delete", "patch", "update"]
- apiGroups: [""]
  resources:
    - configmaps
  verbs: ["get", "list", "create", "delete", "patch", "update"]
- apiGroups: [""]
  resources:
    - pods
  verbs: ["get", "list", "patch", "update", "watch"]
- apiGroups: [""]
  resources:
    - pods/log
  verbs: ["get"]
- apiGroups: [""]
  resources:
    - secrets
  verbs: ["get", "create", "patch"]

Networking

Networking requirements depend on how users interact with Akamas. Services can be exposed via Ingress, LoadBalancers, NodePorts, or using kubectl as a proxy. Refer to Accessing Akamas for a more detailed description of the available options.

Online Installation

Create the configuration file

To proceed with the installation, you need to create a file, called akamas.yaml in this guide, containing the mandatory configuration values required to customize your application. The following template contains the minimal set of values required to install Akamas:

# AWS credentials to fetch ECR images (required)
awsAccessKeyId: <AWS_ACCESS_KEY_ID>
awsSecretAccessKey: <AWS_SECRET_ACCESS_KEY>

# Akamas customer name. Must match the value in the license (required)
akamasCustomer: <CUSTOMER_NAME>

# Akamas administrator password. If not set a random password will be generated
akamasAdminPassword: <ADMIN_PASSWORD>

You can also download the template file running the following snippet:

curl -so akamas.yaml  http://helm.akamas.io/templates/1.0.2/akamas.yaml.template
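Any other chart default shown by helm show values can be overridden in the same file. For instance, a hypothetical akamas.yaml that also exposes the UI through a Load Balancer (all values below are illustrative placeholders) could look like:

```yaml
awsAccessKeyId: <AWS_ACCESS_KEY_ID>
awsSecretAccessKey: <AWS_SECRET_ACCESS_KEY>
akamasCustomer: "Acme Corp"

# Optional override: expose the UI via a Load Balancer instead of a Cluster IP
ui:
  service:
    type: "LoadBalancer"
```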

Start the installation

Add the Akamas' repository to the Helm client with the following command:

helm repo add akamas http://helm.akamas.io/charts

If you wish to see the values that Helm will use to install Akamas and override some of them, you may execute the following command:

helm show values akamas/akamas

Now, with the configuration file you just created (and the new variables you added to override the defaults), you can start the installation with the following command:

helm upgrade --install \
  --create-namespace --namespace akamas \
  -f akamas.yaml \
  akamas akamas/akamas

This command will create the Akamas resources within the specified namespace. You can define a different namespace by changing the argument --namespace <your-namespace>

An example output of a successful installation is the following:

Release "akamas" does not exist. Installing it now.
NAME: akamas
LAST DEPLOYED: Wed Apr  5 11:40:19 2023
NAMESPACE: akamas
STATUS: deployed
REVISION: 1
NOTES:
Akamas has been installed


To get the initial password use the following command:

kubectl get secret akamas-admin-credentials -o go-template='{{ .data.password | base64decode }}'
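The password is stored base64-encoded inside the Secret, and the go-template above decodes it on the fly. The same decoding can be reproduced locally, shown here on a sample value rather than a live Secret:

```shell
# Decode a base64-encoded Secret value ("c2VjcmV0LXBhc3M=" is just a sample).
printf 'c2VjcmV0LXBhc3M=' | base64 -d
echo
```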

Check the installation

To monitor the application startup, run the command kubectl get pods. After a few minutes, the expected output should be similar to the following:

NAME                           READY   STATUS    RESTARTS   AGE
airflow-6ffbbf46d8-dqf8m       3/3     Running   0          5m
analyzer-67cf968b48-jhxvd      1/1     Running   0          5m
campaign-666c5db96-xvl2z       1/1     Running   0          5m
database-0                     1/1     Running   0          5m
elasticsearch-master-0         1/1     Running   0          5m
keycloak-66f748d54-7l6wb       1/1     Running   0          5m
kibana-6d86b8cbf5-6nz9v        1/1     Running   0          5m
kong-7d6fdd97cf-c2xc9          1/1     Running   0          5m
license-54ff5cc5d8-tr64l       1/1     Running   0          5m
log-5974b5c86b-4q7lj           1/1     Running   0          5m
logstash-8697dd69f8-9bkts      1/1     Running   0          5m
metrics-577fb6bf8d-j7cl2       1/1     Running   0          5m
optimizer-5b7576c6bb-96w8n     1/1     Running   0          5m
orchestrator-95c57fd45-lh4m6   1/1     Running   0          5m
store-5489dd65f4-lsk62         1/1     Running   0          5m
system-5877d4c89b-h8s6v        1/1     Running   0          5m
telemetry-8cf448bf4-x68tr      1/1     Running   0          5m
ui-7f7f4c4f44-55lv5            1/1     Running   0          5m
users-966f8f78-wv4zj           1/1     Running   0          5m

At this point, you should be able to access the Akamas UI at http://localhost:8000 and the Akamas CLI at http://localhost:8000/akapi by running Kubectl's port-forwarding command:

kubectl port-forward service/ui 8000:http

Accessing Akamas

To interact with your Akamas instance you need the UI and API Gateway to be accessible from outside the cluster. This means you need to expose the ui and kong services respectively (although a minimal configuration only requires exposing the ui service, since it can forward requests to the API Gateway through the path /akapi).

Kubernetes offers different options to expose a service outside of the cluster. The following is a list of the supported ones, with examples of how to configure them to work in your chart release:

Port Forwarding

By default, Akamas uses Cluster IPs for its services, which only allow communication inside the cluster. Still, you can leverage Kubectl's port-forward to create a private connection and expose any internal service on your local machine.

This solution is suggested to perform quick tests without the need of exposing the application, or in scenarios where cluster access to the public is not allowed.

To make the Akamas UI accessible on http://localhost:8000, run the following command:

kubectl port-forward service/ui 8000:http

To interact with the Akamas CLI you can use the URL http://localhost:8000/akapi, or expose the kong service in the same way.

Load Balancer

Load Balancers expose services outside the cluster. This solution is often used with clusters managed by cloud providers such as Amazon EKS or Google Kubernetes Engine (GKE).

ui:
  service:
    type: "LoadBalancer"

To get the address of the load balancer run the command:

kubectl get svc ui -o 'custom-columns=NAME:metadata.name,TYPE:spec.type,HOSTNAME:status.loadBalancer.ingress[].hostname'

Ingress

An Ingress is a Kubernetes object that provides service access, load balancing, and SSL termination to kubernetes services.

ui:
  ingress:
    enabled: true
    className: "<class-name>"
    hosts:
      - host: "<dns-name>"
        paths:
          - path: /
            pathType: Prefix

NodePort

Node Ports make services accessible on specific ports of any node of the cluster.

ui:
  service:
    type: "NodePort"
    http:
      nodePort: "30010"

The Akamas UI will be accessible on any cluster node at http://<cluster-node>:30010. You can also omit the http.nodePort field and let Kubernetes automatically select a random port.

Offline Installation

Configure the registry

If your cluster is in an air-gapped network or is unable to reach the default repository, you need to mirror the required images on a private repository.

The procedure described here leverages your local environment to upload the images, so it requires that Docker is installed and configured to interact with the private registry.

Obtain the Docker images

Get in contact with Akamas Customer Services to get the latest versions of the Akamas artifacts. This will include:

  • images.tar.gz: a tarball containing the Akamas images.

  • akamas: the binary file of the Akamas CLI that will be used to verify the installation.

Upload the Docker images

The offline installation mode requires importing the shipped Docker images into your local environment. Run the following command in the same directory where the tar file is stored:

Once the import is complete, you need to re-tag and upload the images. Run the following snippet, replacing <REGISTRY_URL> with the actual URL of the private registry:
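The exact import and re-tag commands are shipped together with the artifacts; the sketch below only illustrates the flow under stated assumptions: the image names and registry URL are placeholders, and DRY_RUN makes it print the commands instead of executing them.

```shell
REGISTRY_URL=registry.example.com    # placeholder: your private registry
SOURCE=485790562880.dkr.ecr.us-east-2.amazonaws.com
DRY_RUN=1                            # unset to actually run the commands

run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }

# 1. import the shipped images into the local Docker environment
run docker load -i images.tar.gz

# 2. re-tag each image for the private registry and push it
#    (the image list is illustrative: use the list in your artifacts)
for image in akamas/license_service:2.3.0 akamas/optimizer:2.3.0; do
  run docker tag "${SOURCE}/${image}" "${REGISTRY_URL}/${image}"
  run docker push "${REGISTRY_URL}/${image}"
done
```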

Once the upload is complete, you can proceed with the next steps.

Create the configuration file

To proceed with the installation, you need to create a file, called akamas.yaml in this guide, containing the mandatory configuration values required to customize your application. The following template contains the minimal set of values required to install Akamas:

Configure the authentication

To authenticate to your private registry, you must manually create the Secret required to pull the images. If the registry uses basic authentication, you can create the credentials in the namespace by running the following command:

Otherwise, you can leverage any credential already configured on your machine by running the following command:

Start the installation

From a machine that can reach the endpoint, add the Akamas' repository to the Helm client with the following command:

If you can not reach helm.akamas.io from the machine where the installation will be run, pull the chart by running:

The command downloads the latest chart version as an archive named akamas-<version>.tgz. The file can be transferred to the machine where the installation will be run. Replace akamas/akamas with the downloaded package in the following commands.

If you wish to see and override the values that Helm will use to install Akamas, you may execute the following command.

Now, with the configuration file you just created (and the new variables you added to override the defaults), you can start the installation with the following command:

This command will create the Akamas resources within the specified namespace. You can define a different namespace by changing the argument --namespace <your-namespace>

An example output of a successful installation is the following:

Check the installation

To monitor the application startup, run the command kubectl get pods. After a few minutes, the expected output should be similar to the following:

At this point, you should be able to access the Akamas UI on http://localhost:8000 and the Akamas CLI http://localhost:8000/akapi by running Kubectl's port forwarding command:

Before starting the installation, make sure the are met.

Akamas on Kubernetes is provided as a set of templates packaged in a chart archive managed by .

This minimal configuration is enough to have Akamas up and running on your cluster, even though the endpoint will only be accessible through Kubectl's .

The page provides some configuration examples using different types of services: edit the akamas.yaml file using the strategy that best suits your needs, or continue directly with the next sections and configure the endpoints at a later time.

Mind that, before logging in, you need to and .

If you haven't already, you can update your configuration file to use a different type of service to expose Akamas' endpoints. To do so, pick from the the configuration snippet for the service type of your choice, add it to the akamas.yaml file, and re-run the installation command to update your Helm release.

Refer to the official for more details about port-forwarding.

You can expose the Akamas UI through a Load Balancer by adding the snippet below to the akamas.yaml file from the previous section, and re-running to update the configuration.

For more details on Load Balancers, refer to the official Kubernetes documentation.

You can expose the Akamas UI through an Ingress by adding the snippet below to the akamas.yaml file from the previous section. After setting className to one of the ingress controllers available on the cluster, re-run the install command to update the configuration.

You can also configure a certificate on the Ingress: refer to the HTTPS configuration section for instructions.

Refer to the official Kubernetes documentation for more details on Ingresses.

You can expose the Akamas UI through a NodePort by adding the snippet below to the akamas.yaml file from the previous section, and re-running the install command to update the configuration.

Refer to the official Kubernetes documentation for more information on Node Ports.

This section describes how to configure the authentication to your private registry. If your registry does not require any authentication, skip directly to the installation section.

cd '<IMAGES_ARCHIVE_LOCATION>'
# Load the Akamas images from the archive into the local Docker daemon
docker image load -i images.tar.gz
REGISTRY='<REGISTRY_URL>'
# Retag every Akamas image to point to your private registry, then push it
for image in `docker images | awk '/485790562880.dkr.ecr.us-east-2.amazonaws.com/ {print $1 ":" $2}'`; do
  newImage="${image/485790562880.dkr.ecr.us-east-2.amazonaws.com/${REGISTRY}}"
  docker tag ${image} ${newImage}
  docker push ${newImage}
done
# Akamas customer name. Must match the value in the license (required)
akamasCustomer: <CUSTOMER_NAME>

# Akamas administrator password. If not set a random password will be generated
akamasAdminPassword: <ADMIN_PASSWORD>

registry: <REGISTRY_URL>
postgresql:
  image:
    registry: <REGISTRY_URL>
elasticsearch:
  image: <REGISTRY_URL>/elasticsearch
kibana:
  image: <REGISTRY_URL>/kibana
kubectl create secret docker-registry registry-token \
  --namespace akamas \
  --docker-server=<REGISTRY_URL> \
  --docker-username=<USER> \
  --docker-password=<PASSWORD>
kubectl create secret docker-registry registry-token \
  --namespace akamas \
  --from-file=.dockerconfigjson=<PATH/TO/.docker/config.json>
helm repo add akamas http://helm.akamas.io/charts
helm pull akamas/akamas
helm show values akamas/akamas
helm upgrade --install \
  --create-namespace --namespace akamas \
  -f akamas.yaml \
  akamas akamas/akamas
Release "akamas" does not exist. Installing it now.
NAME: akamas
LAST DEPLOYED: Wed Apr  5 11:40:19 2023
NAMESPACE: akamas
STATUS: deployed
REVISION: 1
NOTES:
Akamas has been installed

To get the initial password use the following command:

kubectl get secret akamas-admin-credentials -o go-template='{{ .data.password | base64decode }}'
NAME                           READY   STATUS    RESTARTS   AGE
airflow-6ffbbf46d8-dqf8m       3/3     Running   0          5m
analyzer-67cf968b48-jhxvd      1/1     Running   0          5m
campaign-666c5db96-xvl2z       1/1     Running   0          5m
database-0                     1/1     Running   0          5m
elasticsearch-master-0         1/1     Running   0          5m
keycloak-66f748d54-7l6wb       1/1     Running   0          5m
kibana-6d86b8cbf5-6nz9v        1/1     Running   0          5m
kong-7d6fdd97cf-c2xc9          1/1     Running   0          5m
license-54ff5cc5d8-tr64l       1/1     Running   0          5m
log-5974b5c86b-4q7lj           1/1     Running   0          5m
logstash-8697dd69f8-9bkts      1/1     Running   0          5m
metrics-577fb6bf8d-j7cl2       1/1     Running   0          5m
optimizer-5b7576c6bb-96w8n     1/1     Running   0          5m
orchestrator-95c57fd45-lh4m6   1/1     Running   0          5m
store-5489dd65f4-lsk62         1/1     Running   0          5m
system-5877d4c89b-h8s6v        1/1     Running   0          5m
telemetry-8cf448bf4-x68tr      1/1     Running   0          5m
ui-7f7f4c4f44-55lv5            1/1     Running   0          5m
users-966f8f78-wv4zj           1/1     Running   0          5m
kubectl port-forward service/ui 8000:http

Use a proxy server

The Akamas CLI supports interacting with the API server through an HTTP/HTTPS proxy server.

To enable access via an HTTP proxy, set the environment variable HTTP_PROXY. In the following snippet, replace proxy_ip and proxy_port with the desired values.

export HTTP_PROXY="http://<proxy_ip>:<proxy_port>"

Then, run an akamas command to verify access.

akamas status --debug

Access through an HTTPS proxy can be set by using environment variable HTTPS_PROXY, instead of HTTP_PROXY.


Initialize the CLI

The CLI is used to interact with an Akamas server. To initialize the configuration of the Akamas CLI, run the command:

akamas init config

and follow the wizard to provide the required information such as the server IP.

Here is a summary of the configuration wizard options.

Api address [http://localhost:8000]: https://<akamas-hostname>:<ui-port>/akapi
Workspace [default]: default
Verify SSL: [True]: True
Is external certificate CA required? [y/N]: N

After this step, the Akamas CLI can be used to log in to the Akamas server by issuing the following command:

akamas login

and providing the credentials as requested.

Install the CLI

This section describes how to install the Akamas CLI on a workstation.

The Akamas CLI allows users to invoke commands against the Akamas dedicated machine (Akamas Server). The Akamas CLI can also be installed on a different system than the Akamas Server.

Prerequisites

Linux and Windows operating systems are supported for installing Akamas CLI.

Installation steps

The Akamas CLI can be installed and configured in two simple steps:

Setup the CLI

Linux

To get Akamas CLI installed on Linux, run the following commands:

curl -o akamas_cli -O https://akamas.s3.us-east-2.amazonaws.com/cli/$(curl https://akamas.s3.us-east-2.amazonaws.com/cli/stable.txt)/linux_64/akamas
sudo mv akamas_cli /usr/local/bin/akamas
chmod 755 /usr/local/bin/akamas

You can now run the Akamas CLI by running the akamas command.

In some installations, the /usr/local/bin folder is not present in the PATH environment variable. This prevents you from using akamas without specifying the complete file location. To fix this issue, add the folder to the PATH environment variable, or move the executable to another folder already in your PATH.
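For example, a quick way to make the command visible in the current shell session (append the same line to ~/.bashrc to make the change permanent):

```shell
# Make /usr/local/bin visible to the current shell session;
# append this line to ~/.bashrc to make it permanent
export PATH="$PATH:/usr/local/bin"
```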

Auto-completion

To enable auto-completion on Linux systems with a bash shell (requires bash 4.4+), run the following commands:

curl -O https://s3.us-east-2.amazonaws.com/akamas/cli/$(curl https://s3.us-east-2.amazonaws.com/akamas/cli/stable.txt)/linux_64/akamas_autocomplete.sh
mkdir -p ~/.akamas
mv akamas_autocomplete.sh ~/.akamas
echo '. ~/.akamas/akamas_autocomplete.sh' >> ~/.bashrc
source ~/.bashrc

Windows

To install the Akamas CLI on Windows run the following command from the Powershell:

Invoke-WebRequest "https://s3.us-east-2.amazonaws.com/akamas/cli/$($(Invoke-WebRequest https://s3.us-east-2.amazonaws.com/akamas/cli/stable.txt | Select-Object -Expand Content) -replace '\n', '')/win_64/akamas.exe" -OutFile akamas.exe

You can now run the Akamas CLI by running .\akamas in the same folder.

To invoke the Akamas CLI from any folder, create an akamas folder (such as C:\Program Files\akamas) and move the akamas.exe file there. Then, add an entry to the PATH system environment variable with the value C:\Program Files\akamas. Now you can invoke the CLI from any folder by simply running the akamas command.

Verify the CLI

You can verify that the CLI was installed correctly by running this command:

akamas --version

which should show an output similar to this one

Akamas CLI, version 2.7.2

At any time, you can see available commands and options with:

akamas --help

Change CLI configuration

API Address

The CLI, as well as the UI, interacts with the Akamas server via APIs. The apiAddress configuration contains the information required to communicate with the server.

Docker

The Akamas Server provides different listeners to interact with APIs:

  • an HTTP listener on port 80 under the path /akapi

  • an HTTP listener on port 8000

  • an HTTPS listener on port 443 under the path /akapi

  • an HTTPS listener on port 8443

Depending on your networking setup, you can either use the listeners on ports 80 and 443, which are also used for the UI, or directly interact with the API gateway on ports 8000 and 8443. If you are not sure about your network setup, we suggest starting with the HTTPS listener on port 443.

For improved security, it is recommended to configure CLI communications with the Akamas Server over HTTPS. Note that you need a valid certificate installed on your Akamas Server (at least a self-signed one) to enable HTTPS communication between the CLI and the Akamas Server.

Changing CLI protocol

The CLI can be configured either directly via the CLI itself or via the YAML configuration file akamasconf.

Using the CLI

Issue the following command to change the configuration of the Akamas CLI:

akamas init config

and then follow the wizard to provide the required CLI configuration:

  • enable HTTPS communications:

Api address [http://localhost:8000]: https://<akamas server dns name>:443/akapi
Workspace [default]: Workspace1
Verify SSL: [True]: True
Is external certificate CA required? [y/N]: N
  • enable HTTP communications:

Api address [http://localhost:8000]: http://<akamas server dns name>:80
Workspace [default]: Workspace1

Please note that by default the Akamas CLI expects a valid SSL certificate. If you are using a self-signed or otherwise invalid certificate, you can set the Verify SSL variable to false. This mimics the behavior of accepting an invalid HTTPS certificate in your browser.

Using the akamasconf file

Create a file named akamasconf at the following location:

  • Linux: ~/.akamas/akamasconf

  • Windows: C:\Users\<username>\.akamas (where C: is the drive where the OS is installed)

The file location can be customized by setting the AKAMASCONF environment variable.
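For example, to point the CLI at a configuration file stored in a custom location (the path below is purely illustrative):

```shell
# Tell the Akamas CLI to read its configuration from a custom path
# (hypothetical location shown)
export AKAMASCONF="$HOME/configs/akamasconf"
```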

Here is an example akamasconf file provided as a sample:

apiAddress: http[s]://<akamas server dns name>:80[443]/akapi
verifySsl: [true|false]
organization: akamas
workspace: default

HTTPS configuration

HTTPS configuration can be set in the Akamas services (UI and Kong) or in the Ingress definition.

Certificate in the Ingress

Declare the certificate secret by adding a tls section to the Ingress definition:

ui:
  ingress:
    enabled: true
    className: "<class-name>"    # ingress class name

    hosts:
      - host: "example.company.com"
        paths:
          - path: /
            pathType: Prefix

    tls:
      - secretName: "<secret name>"  # secret name containing the certificate and key data
        hosts:
          - "example.company.com"

You can apply the same configuration to the kong service to add a certificate to the API Gateway.

Certificate in the Akamas services

To add a certificate to both the UI and API Gateway you need to generate the akamas.key and akamas.pem files, and create a secret in Akamas' namespace with the following command:

kubectl create secret generic certificate --from-file=akamas.key --from-file=akamas.pem
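If you do not have the akamas.key and akamas.pem files yet, a self-signed pair suitable for testing can be generated with openssl. This is only a sketch: for production, use a certificate issued by your CA, and replace the example hostname with your own.

```shell
# Generate a self-signed certificate and key valid for one year
# (replace example.company.com with your actual hostname)
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -keyout akamas.key -out akamas.pem \
  -subj "/CN=example.company.com"
```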

To complete the update, restart the deployments:

kubectl rollout restart deployment ui kong

This configuration can be changed at any time (see how to change the CLI config).

Logging into Akamas requires a valid license. If you have not installed your license yet, refer to the Install the Akamas license page.

Refer to the Change CLI config section to modify the CLI ports the Akamas Server is listening to. The Use a proxy server section provides instructions on how to interact with Akamas via a proxy server.

For the full list of Akamas commands, please refer to the CLI reference section.

The CLI configuration contains the information required to communicate with the Akamas server. It can be easily created and updated with a configuration wizard. This page describes the main options of the Akamas CLI and how to modify them. If your Akamas instance is installed on Kubernetes, ensure the UI service is properly configured.

For more information regarding the TLS definition, refer to the official documentation.


Verify the installation

Run the following command to verify the correct startup and initialization of Akamas:

akamas status

When all services have been started, this command will return an "OK" message. Please note that it might take a few minutes for Akamas to start all services.

To check that the UI is also working properly, please access the following URL:

http://<akamas server name here>

You will see the Akamas login form:

Install the license

Logging into Akamas requires a valid Akamas license.

To install a license get in touch with Akamas Customer Service to receive:

  • the Akamas license file

  • your "customer name" to configure in the variable AKAMAS_CUSTOMER for Docker installations or akamasCustomer for Kubernetes installations

  • the URL to configure in the AKAMAS_BASE_URL variable for Docker installations

  • login credentials

Once you have this information, you can issue the following commands:

To get the administrator's initial password for Kubernetes installations, run the following command:

Akamas logs

Akamas allows dumping log entries from a specific service, workspace, workflow, study, trial, and experiment, for a specific timeframe and at different log levels.

Akamas CLI for logs

Akamas logs can be dumped via the following CLI command:

This command provides many filters which can be retrieved with the following command:

which should return

For example, to get the list of the most recent Akamas errors:

which should return something similar to:

Please note that it is impossible to log into Akamas before a license has been installed. Read how to install an Akamas license.

cd <your bundle files location>

akamas install license <license file you have been provided>

akamas login
# prompt for user and password
kubectl get secret -n <NAMESPACE> akamas-admin-credentials -o go-template='{{.data.password | base64decode}}'
akamas log
akamas log --help
Usage: akamas log [OPTIONS] [MESSAGE]

  Show Akamas logs

Options:
  -d, --debug                     Show extended error messages if present.
  --page-size INTEGER             Number of log's lines to be retrieved NOTE:
                                  This argument is mutually exclusive with
                                  arguments: [dump, no_pagination].
  --no-pagination                 Disable pagination and print all logs NOTE:
                                  This argument is mutually exclusive with
                                  arguments: [dump, page_size].
  --dump                          Print the logs without pagination and
                                  formatting NOTE: This argument is mutually
                                  exclusive with arguments: [page_size,
                                  no_pagination].
  -f, --from [%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d %H:%M:%S|%Y-%m-%dT%H:%M:%S.%f|%Y-%m-%d %H:%M:%S.%f|[-]nw|[-]nd|[-]nh|[-]nm|[-]ns]
                                  The start timestamp of the logs
  -t, --to [%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d %H:%M:%S|%Y-%m-%dT%H:%M:%S.%f|%Y-%m-%d %H:%M:%S.%f|[-]nw|[-]nd|[-]nh|[-]nm|[-]ns]
                                  The end timestamp of the logs
  -s, --study TEXT                UUID or name of the Study
  -e, --exp INTEGER               Number of the experiment
  --trial INTEGER                 Number of the trial
  -y, --system TEXT               UUID or name of the System
  -W, --workflow TEXT             UUID or name of the Workflow
  -l, --log-level TEXT            Log level
  -S, --service TEXT              Akamas service
  --without-metadata              Hide metadata
  --sorting [ASC|DESC]            Sorting order of the timestamps
  -ws, --workspace TEXT           UUID or name of the Workspace to visualize.
                                  When empty, system logs will be returned
                                  instead
  --help                          Show this message and exit.
akamas log -l ERROR
       timestamp                         system                  provider    service                                                                                   message
==============================================================================================================================================================================================================================================================
2022-05-02T15:51:26.88    -                                      -          airflow     Task failed with exception
2022-05-02T15:51:26.899   -                                      -          airflow     Failed to execute job 2 for task Akamas_LogCurator_Task
2022-05-02T15:56:29.195   -                                      -          airflow     Task failed with exception
2022-05-02T15:56:29.215   -                                      -          airflow     Failed to execute job 3 for task Akamas_LogCurator_Task
2022-05-02T16:01:55.587   -                                      -          license     2022-05-02 16:01:47.426 ERROR 1 --- [           main] c.a.m.utils.rest.RestHandlers            :  has failed with returning a response:
                                                                                        {"httpStatus":400,"timestamp":"2022-05-02T16:01:47.413638","error":"Bad Request","message":"The following metrics: 'spark.spark_application_duration' were not found
                                                                                        in any of the components of the system 'analytics_cluster'","path":null}
2022-05-02T16:01:55.587   -                                      -          license     2022-05-02 16:01:47.434 ERROR 1 --- [           main] c.a.m.MigrationApplication               : Unable to complete operation. Mode: RESTORE. Cause: A request to a
                                                                                        downstream service CampaignService has failed: 400 : [{"httpStatus":400,"timestamp":"2022-05-02T16:01:47.413638","error":"Bad Request","message":"The following
                                                                                        metrics: 'spark.spark_application_duration' were not found in any of the components of the system 'analytics_cluster'","path":null}]
2022-05-02T16:01:55.678   -                                      -          license     2022-05-02 16:01:47.434 ERROR 1 --- [           main] c.a.m.MigrationApplication               : Unable to complete operation. Mode: RESTORE. Cause: A request to a
                                                                                        downstream service CampaignService has failed: 400 : [{"httpStatus":400,"timestamp":"2022-05-02T16:01:47.413638","error":"Bad Request","message":"The following
                                                                                        metrics: 'spark.spark_application_duration' were not found in any of the components of the system 'analytics_cluster'","path":null}]
2022-05-02T16:01:55.678   -                                      -          license     2022-05-02 16:01:47.426 ERROR 1 --- [           main] c.a.m.utils.rest.RestHandlers            :  has failed with returning a response:
                                                                                        {"httpStatus":400,"timestamp":"2022-05-02T16:01:47.413638","error":"Bad Request","message":"The following metrics: 'spark.spark_application_duration' were not found
                                                                                        in any of the components of the system 'analytics_cluster'","path":null}
2022-05-02T16:12:10.261   -                                      -          license     2022-05-02 16:05:53.209 ERROR 1 --- [           main] c.a.m.services.CampaignService           : de9f5ff9-418e-4e25-ae2c-12fc8e72cafc
2022-05-02T16:32:07.216   -                                      -          license     2022-05-02 16:31:37.330 ERROR 1 --- [           main] c.a.m.services.CampaignService           : 06c4b858-8353-429c-bacd-0cc56cc44634
2022-05-02T16:38:18.522   -                                      -          campaign    Internal Server Error: Object of class [com.akamas.campaign_service.entities.campaign.experiment.Experiment] with identifier
                                                                                        [ExperimentIdentifier(workspace=ac8481d3-d031-4b6a-8ae9-c7b366f027e8, study=de9f5ff9-418e-4e25-ae2c-12fc8e72cafc, id=2)]: optimistic locking failed; nested exception
                                                                                        is org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) :
                                                                                        [com.akamas.campaign_service.entities.campaign.experiment.Experiment#ExperimentIdentifier(workspace=ac8481d3-d031-4b6a-8ae9-c7b366f027e8,
                                                                                        study=de9f5ff9-418e-4e25-ae2c-12fc8e72cafc, id=2)]

Manage anonymous data collection

Akamas might collect anonymized usage information on running optimizations. Collection and tracking are disabled by default and can be manually enabled.

Docker installation

External tracking is managed through the following environment variables:

  • AKAMAS_TRACKER_URL: the target URL for all tracking info.

  • AKAMAS_TRACKING_OPT_OUT: when set to 1, disables anonymous data collection.

Tracking for a running instance can be enabled by editing the AKAMAS_TRACKING_OPT_OUT variable in the docker-compose.yaml file.

To enable tracking set the variable to the following value:

AKAMAS_TRACKING_OPT_OUT=0

Then issue the command:

docker-compose up -d

Kubernetes installation

External tracking is managed through the field trackingOptOut in the Values file. To enable tracking set trackingOptOut to 0 as in the following example and upgrade the installation:

awsAccessKeyId: "YOUR_ACCESSKEY_ID"
awsSecretAccessKey: "YOUR_SECRET_ACCESS_KEY"

trackingOptOut: 0

Manage Akamas

This section is a collection of different topics related to how to manage the Akamas Server.

This section covers some topics on how to manage the Akamas Server:

Akamas logs
Audit logs
Install upgrades and patches
Backup & Recovery of the Akamas Server
Monitor the Akamas Server

Management container/pod

Akamas provides a Management Container (called Management Pod on Kubernetes deployments) that contains the Akamas CLI executable and other popular command-line tools for developing custom scripts.

On Docker, it runs in the same network as the Akamas services; on Kubernetes, it runs in the Akamas namespace. The purposes of this management container are:

  • Allow technical troubleshooting/maintenance from inside the Kubernetes cluster.

  • Allow customers to launch/control Akamas without the need to install Akamas CLI on their systems. The akamas executable is configured to connect to the correct endpoint.

  • Provide an environment for the Akamas workflow to execute custom scripts.

The following is the list of the installed tools:

  • akamas-cli

  • curl, ping, wget

  • docker, docker-cli, docker-compose

  • git

  • gzip, zip

  • jq, yq, vim

  • kubectl, Helm

  • openjdk 11

  • openssh-client, openssh-server, ssh-keygen

Docker compose installation

To run the management container on your docker installation, add the following code block to the list of services of your docker-compose file.

  management-container:
    image: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/management-container:1.0.1
    container_name: management-container
    environment:
      - BASH_ENV=/home/akamas/.bashrc
    expose:
      - 22
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - akamas
    restart: unless-stopped

and launch docker-compose up -d as explained in Start Akamas (online) or Run installation (offline).

Kubernetes installation

To run the management pod in the Akamas' namespace, update the following variable in the Values file of the Akamas' Helm chart:

managementPod:
  enabled: true

Then you can issue the helm upgrade --install ... command to launch the pods, as described in Start the installation (online) or Start the installation (offline).

Accessing Management Pod on Kubernetes

When it's deployed to Kubernetes, you may access this management pod in two ways:

  • via kubectl exec -it management-pod

  • via SSH command

NOTE: both methods require kubectl to be installed and configured for this cluster.

Kubectl access

Accessing is as simple as:

kubectl exec -it deployment/management-pod -- bash

SSH access

For this type of access, you need to retrieve the password for the akamas user. You should issue the following command to read it from management-pod logs:

kubectl logs service/management-pod

# example response is:
# Container started
# You can ssh into this container with user 'akamas' and password 'd48020ab71be6a07'

A similar result could be obtained by reading the file akamas_password in the work folder:

kubectl exec -it deployment/management-pod -- cat /work/akamas_password

# example response is:
# d48020ab71be6a07

At this point, launch this command to port-forward the management port to your local terminal (2300 can be any unused port on your machine):

kubectl port-forward service/management-pod 2300:22 &

then, on another terminal, you may launch:

ssh akamas@localhost -p 2300

and answer yes to the question, then enter the akamas password to SSH into the management pod (see example below):

my_user@my_machine:~$  ssh akamas@localhost -p 2300
The authenticity of host '[localhost]:2300 ([127.0.0.1]:2300)' can't be established.
ED25519 key fingerprint is SHA256:34GXnmRz1YjWr2TTpUpJmRoHYck0NzeAxni2L857Exs.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[localhost]:2300' (ED25519) to the list of known hosts.
akamas@localhost's password:
Welcome to Ubuntu 20.04.6 LTS (GNU/Linux 5.10.178-162.673.amzn2.x86_64 x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

akamas@management-pod-6dd8b7f898-8xwzf:~$

Work folder

If you need to store Akamas artifacts, scripts, or any other file that needs persistence, you can use the /work directory, which persists across restarts. This is the default folder at login time. It contains the akamas_password file mentioned above, as well as the Kubernetes and SSH configuration files, which are symlinked to your home folder.

Audit logs

Akamas audit logs

Akamas stores all its logs in an internal Elasticsearch instance: some of these logs are shown to the user in the GUI to ease the monitoring of workflow executions, while others are accessible only via the CLI and are mostly used to provide more context and information for support requests.

Audit access can be performed using the CLI to extract logs related to UI or API access. For instance, to extract audit logs from the last hour, use the following commands:

  • UI Logs

akamas logs --no-pagination -S kong -f -1h
  • API Logs

akamas logs --no-pagination -S kong -f -1h

Notice: to visualize the system logs unrelated to the execution of workflows bound to workspaces, you need an account with administrative privileges.

Storing audit logs into files

To ease the integration with external logging systems, Akamas can be configured to store access logs into files. To enable this feature you should:

  1. Create a logs folder next to the Akamas docker-compose.yml file

  2. Edit the docker-compose.yml file by modifying the line FILE_LOG: "false" to FILE_LOG: "true"

  3. If Akamas is already running issue the following command

docker-compose up -d logstash

otherwise, start Akamas first.

When a user interacts with the UI or the API, Akamas reports detailed access logs both in the internal database and in a file in the logs folder. To ease log rolling and management, every day Akamas creates a new file named according to the pattern access-%{+YYYY-MM-dd}.log.
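For example, a script shipping these logs to an external system could compute the name of today's file like this (a sketch, assuming the logs folder sits next to docker-compose.yml as described above):

```shell
# Build today's access log file name following the
# access-%{+YYYY-MM-dd}.log naming pattern
LOG_FILE="logs/access-$(date +%Y-%m-%d).log"
echo "$LOG_FILE"
# e.g. follow today's entries with: tail -f "$LOG_FILE"
```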

Backup & Recovery of the Akamas Server

Akamas server backup

The process of backing up an Akamas server can be divided into two parts: system backup and user data backup. Backups can be performed in any way you see fit: they are just regular files, so you can use any backup tool.

System backup

Service images are hosted on the AWS ECR repository, so the only file that fully defines a working Akamas application is the docker-compose.yml file. Performing a backup of the Akamas application is as simple as copying this single file to your backup location. You may schedule a script that performs this weekly, or at any other frequency you see fit.

User data backup

You may list all existing Akamas studies via the Akamas CLI command:

akamas list study

Then you can export all existing studies one by one via the CLI command

akamas export study <UUID>

where UUID is the UUID of a single study. This command exports the study into a single archive file (tar.gz). These archive files can be backed up to your favorite backup folder.
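The two commands above can be combined into a simple backup script. The sketch below assumes that `akamas list study` prints the study UUID in the first column after a header line; adjust the parsing to match the actual output of your CLI version:

```shell
# Extract UUIDs from the CLI's tabular output (assumes the UUID is the
# first column and the first line is a header -- adjust if needed)
list_uuids() { awk 'NR>1 {print $1}'; }

# Export every study, one archive per study, into the current folder
backup_all_studies() {
  akamas list study | list_uuids | while read -r uuid; do
    akamas export study "$uuid"
  done
}
```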

Akamas server recovery

Akamas server recovery involves recovering the system backup, restarting the Akamas services, and then re-importing the studies.

System Restore

To restore the system, recover the original docker-compose.yml, then launch the command

docker-compose up &

from the folder where you placed this YAML file and then wait for the system to come up, by checking it with the command

akamas status -d

User data restore

All studies can be re-imported one by one with the CLI command (referring to the correct pathname of the archive):

akamas import study archive.tgz

Monitor the Akamas Server

External tools

You can use any monitoring tool to check the availability of the Akamas instance.

Checking Akamas services

To check the status of the Akamas services, run akamas status -d: in case of issues, the output identifies which service is not able to start up correctly.

Here is an example of output:

Checking Akamas services on http://localhost:8000
 service	 status
=========================
analyzer       	UP
campaign       	UP
metrics        	UP
optimizer      	UP
orchestrator   	UP
system         	UP
telemetry      	UP
license        	UP
log            	UP
users          	UP
OK

Install upgrades and patches

Akamas patches and upgrades need to be installed by following the specific instructions specified in the package provided. In case of new releases, it is recommended to read the related Release Notes. Under normal circumstances, this usually requires the user to update the docker-compose configuration, as described in the next section.

Docker compose Configuration

When using docker compose to install Akamas, there is a folder (usually named akamas, in the user home folder) that contains a docker-compose.yml file. This is a YAML text file listing the docker services, with image URLs and versions pointing to the ECR repository that hosts all the docker images needed to launch Akamas.

Here’s an excerpt of such a docker-compose.yml file (this example contains 3 services only):

services:
  #####################
  # Database Service #
  #####################
  database:
    image: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/master-db:1.7.0
    container_name: database2
    restart: always
    command: postgres -c max_connections=200

  #####################
  # Optimizer Service #
  #####################
  optimizer:
    image: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/optimizer_service:2.3.0
    container_name: optimizer
    restart: always
    networks:
      - akamas2
    depends_on:
      - database
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /tmp/build/engine_input:/tmp/build/engine_input

  ####################
  # Campaign Service #
  ####################
  campaign:
    image: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/campaign_service:2.3.0
    container_name: campaign
    restart: always
    volumes:
      - config:/config
    networks:
      - akamas2
    depends_on:
      - database
      - optimizer
      - analyzer

The relevant lines that usually have to be patched during an upgrade are the lines with key "image" like:

image: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/master-db:1.7.0
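Replacing these version tags by hand is error-prone, so it can be scripted. The helper below is an unofficial sketch (not an Akamas tool) that rewrites every akamas/&lt;service&gt; image tag matching an old version:

```shell
# Rewrite every akamas/<service>:<old> image tag to <new> in a compose file
# (a .bak copy of the original file is kept).
bump_image_version() {
  file="$1"; old="$2"; new="$3"
  sed -i.bak "s|\(akamas/[a-z_-]*\):$old|\1:$new|g" "$file"
}

# Demo on a throwaway excerpt (versions are illustrative):
printf 'image: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/master-db:1.7.0\n' > /tmp/compose-demo.yml
bump_image_version /tmp/compose-demo.yml 1.7.0 1.8.0
```

Always confirm the target version of each service with Akamas support before editing the file.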

In order to update to a new version, replace the versions (1.7.0 or 2.3.0 in the excerpt above) after the colon with the new ones (ask Akamas support for the correct service versions for a specific Akamas release), then restart Akamas with the following console commands.

First, log in to the Akamas CLI with:

akamas login

and enter your username and password as in the example below:

ubuntu@ak_machine:~/akamas/ $ akamas login
User: akamas
Password:
User akamas logged in. Welcome.

Now make sure the following AWS variables are set with the proper values in your Linux user environment:

AWS_DEFAULT_REGION
AWS_SECRET_ACCESS_KEY
AWS_ACCESS_KEY_ID
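For example (placeholder values shown; use the credentials provided with your Akamas installation):

```shell
# Placeholder credentials: replace with the values for your installation.
export AWS_DEFAULT_REGION="us-east-2"
export AWS_ACCESS_KEY_ID="AKIA...EXAMPLE"
export AWS_SECRET_ACCESS_KEY="example-secret-key"
```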

Then log in to AWS with the following command:

aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin 485790562880.dkr.ecr.us-east-2.amazonaws.com
Login Succeeded

Then pull the new ECR images for the service versions you just changed (this should be done from inside the folder where the docker-compose.yml file resides, usually $HOME/akamas/) with the following command:

docker-compose pull

It should return an output like the following:

Pulling database                ... done
Pulling optimizer               ... done
Pulling elasticsearch           ... done
Pulling log                     ... done
Pulling metrics                 ... done
Pulling telemetry               ... done
Pulling analyzer                ... done
Pulling campaign                ... done
Pulling system                  ... done
Pulling license                 ... done
Pulling store                   ... done
Pulling airflow-db              ... done
Pulling benchmark               ... done
Pulling kong-database           ... done
Pulling kong                    ... done
Pulling user-service            ... done
Pulling keycloak                ... done
Pulling logstash                ... done
Pulling kibana                  ... done
Pulling kong-consumer-init      ... done
Pulling kong-migration          ... done
Pulling keycloak-initializer    ... done
Pulling telemetry-init          ... done
Pulling curator-only-pull-image ... done
Pulling airflow                 ... done
Pulling orchestrator            ... done
Pulling akamas-init             ... done
Pulling akamas-ui               ... done
Pulling pg-admin                ... done
Pulling grafana                 ... done
Pulling prometheus              ... done
Pulling node-exporter           ... done
Pulling cadvisor                ... done
Pulling konga                   ... done

Finally, relaunch all services with:

docker-compose up -d

(usage example below)

ubuntu@ak_machine:~/akamas/ $ docker-compose up -d
pgadmin4 is up-to-date
prometheus is up-to-date
benchmark is up-to-date
kibana is up-to-date
node-exporter is up-to-date
store is up-to-date
grafana is up-to-date
cadvisor is up-to-date
Starting telemetry-init ...
Starting curator-only-pull-image ...
Recreating database2             ...
Recreating airflow-db            ...
Starting kong-initializer        ...
akamas-ui is up-to-date
elasticsearch is up-to-date
Recreating kong-db               ...
Recreating metrics               ...
logstash is up-to-date
Recreating log                   ...
...(some logging follows)

Wait for a few minutes and check the Akamas services are back up by running the command:

akamas status -d

The expected output should be like the following (repeat the command after a minute or two if the last line is not "OK" as expected):

Checking Akamas services on http://localhost:8000
 service	 status
=========================
analyzer       	UP
campaign       	UP
metrics        	UP
optimizer      	UP
orchestrator   	UP
system         	UP
telemetry      	UP
license        	UP
log            	UP
users          	UP
OK

Using Akamas

This section describes how to use Akamas.

This guide introduces the optimization process and methodology with Akamas and then provides a step-by-step description of how to prepare, run and analyze Akamas optimization studies:

General optimization process
Preparing optimization studies
Running optimization studies

and also provides some technology-specific guidelines and examples on:

Guidelines for choosing optimization parameters
Guidelines for defining optimization studies

Preparing optimization studies

Preparing an optimization study requires several steps, as illustrated by the following figure:

and described in the following sections:

modeling systems
modeling components
creating telemetry instances
creating automation workflow
creating optimization study

Notice that while these steps apply to both offline optimization studies and live optimization studies, some of these steps differ depending on which type of optimization is being prepared.

General optimization process and methodology

Akamas has been designed and implemented to effectively support organizations in implementing their own approach to optimization, in particular, thanks to its Infrastructure as Code (IaC) design, modular and reusable constructs, and delegation-of-duty features to support multiple teams.

While an optimization process can also be a one-shot exercise aiming at optimizing a specific critical application, either to remediate performance issues or to address a cost reduction initiative, in general optimization is conceived as a continuous and iterative process. This process can be seen as composed of multiple optimization campaigns (each typically involving a single application) being executed at the same time (see the following figure).

At any given timeframe, for a specific application, there could be multiple studies being executed either in parallel or in sequence (see the following figure):

  • multiple live optimizations running for the microservices of each critical application; typically, a live optimization focuses on an application microservice supporting a specific business function, with respect to specific optimization goals and constraints: for some microservices the optimization could aim at improving performance, while for others at keeping performance within the SLOs while reducing infrastructure or cloud cost;

  • multiple offline optimization studies, which may correspond to the different layers of the target system being optimized in several stages (typically starting with the backend layer, then the middleware, and finally the front-end layer); to application releases with a different resource footprint (e.g. higher memory usage); to technology changes in the application stack (e.g. moving from Oracle to MongoDB) or migrations to a different cloud provider (or cloud managed service); or to studies required to sustain a higher workload (e.g. due to a marketing campaign) or to ensure application resilience under failure scenarios (identified by chaos engineering).

The following figure intends to illustrate the variety of scenarios in a real optimization process:

For example (with reference to the previous figure):

  • the optimization campaign for the microservices-based application App-1 runs an offline optimization study for the App-1-1 microservice in Q1 and the App-1-2 microservice in Q2, before running live optimizations for both these microservices in parallel starting from Q3; notice that in Q4, possibly to anticipate a workload growth and assess the required infrastructure, an offline optimization for App-1-2 (possibly the most resource-demanding microservice) is also executed;

  • the optimization campaign for the standalone application App-2 runs several offline optimizations in sequence: in Q1 and Q2, first separately on the frontend and backend layers of App-2 (respectively App-2-FE and App-2-BE) and then in Q3 for the entire application; in Q4, in addition to the quarterly optimization for App-2 with respect to the goal Goal-2-1 that was used in the previous optimizations, also another offline optimization is executed with respect to a different goal Goal-2-2, which could either be a refinement of the previous goal (e.g. with tighter SLOs) or reflecting a completely different goal (e.g. a cost-reduction goal with respect to a performance improvement goal);

  • the optimization campaign for the microservices-based application App-3 first runs a live optimization starting at some point in Q2 (for example, as the application is first released) for the most critical microservice App-3-1, and then in Q3 also for another microservice App-3-2, possibly as a refinement of the modeling of App-3 based on the observed optimization results.

In Akamas, an optimization campaign is structured into one or more optimization studies, each representing an optimization initiative aimed at optimizing a target system with respect to defined goals and constraints.

These studies can be either offline optimization studies, which are typically executed in test or pre-production environments (also to validate planned changes or what-if scenarios), or live optimization studies, which run directly in production environments.

More complex scenarios may result in the case of multiple teams working (jointly or separately) on the same or different applications, which in Akamas can be organized in different workspaces.


Creating custom optimization packs

To create a custom optimization pack, the following fixed directory structure and several YAML manifests need to be created.

Optimization pack directory structure

my_dir
|_ optimizationPack.yaml
|_ component-types
|  |_ componentType1.yaml
|
|_ metrics
|  |_ metricsGroup1.yaml
|
|_ parameters
|  |_ parametersGroup1.yaml
|
|_ telemetry-providers
|  |_ provider1.yaml

Optimization pack manifest

The optimizationPack.yaml file is the manifest of the optimization pack to be created, which should always be named optimizationPack and have the following structure:

name: Java_8_Optimization_Pack
description: An optimization pack for the Java Hotspot JVM version 8
weight: 1
version: 1.0.0
tags:
- java
- jvm

where:

  • name (string, required): the name of the optimization pack; it should not contain spaces.

  • description (string, required): a description to characterize the optimization pack.

  • weight (integer, required, must be > 0): a weight associated with the optimization pack; this field is used for licensing purposes.

  • version (string, required): the version of the optimization pack; it should match the regexp \d.\d.\d.

  • tags (array of string, optional, default: empty array): a set of tags to make the optimization pack more easily searchable and discoverable.

Component types

The component-types directory should contain the manifests of the component types to be included in the optimization pack. No particular naming constraint is enforced on those manifests.

Metrics

The metrics directory should contain the manifests of the groups of metrics to be included in the optimization pack. No particular naming constraint is enforced on those manifests.

Parameters

The parameters directory should contain the manifests of the groups of parameters to be included in the optimization pack. No particular naming constraint is enforced on those manifests.

Telemetry providers

The telemetry-providers directory should contain the manifests of the telemetry providers to be included in the optimization pack. No particular naming constraint is enforced on those manifests.

Building optimization pack descriptor

The following command needs to be executed in order to produce the final JSON descriptor:

akamas build optimization-pack PATH_TO_THE_DIRECTORY

Modeling systems

The following figure shows a system corresponding to a Java-based application, where the Java Virtual Machine (JVM) and Kubernetes containers have been identified as key components.

As shown in this figure, a supported component is the "web application", representing the end user perspective of the modeled system (e.g. response time). As expected, this component type only provides measured metrics and no tunable parameters.

Best Practices

Properly modeling the application or service to be optimized by identifying the components and their parameters to tune is the first important step in the optimization process. Some best practices are described here below.

Modeling only relevant components

When defining the system and its components, it is convenient to focus only on those components that are either providing tunable parameters or key metrics (or KPIs).

Key metrics are those used to:

  • support the analysis of the optimization results, as metrics that are useful to measure the impact of parameter tuning on the performance, efficiency, or reliability of the system. For example, a Linux OS component could be used to assess the impact of the optimization on the system-level metrics such as CPU utilization.

Please note that the metrics used to define the optimization goal and constraints are mandatory as they are used by the Akamas AI engine to validate and score each tested configuration against the goal. Other metrics that are not related to the optimization goal and constraints can be considered optional from a pure optimization implementation perspective.

When defining the optimization study, it is always possible to select which parameters and metrics to consider, thus which components are modeled in the system. Therefore, a system could be modeled by all components that at some point are going to be optimized, even if not used in the current optimization study. However, the recommended approach is to model the system only with components whose parameters (and relevant metrics) are to be tuned by the current study.

Reusing systems whenever possible

Whenever possible, it is recommended to model systems and their components by considering how these could be reused for multiple optimization studies in different contexts.

For example, it might be useful to create a simple system containing only one component (e.g. the JVM) for a first optimization study. A new system might then be created to include other components (e.g. the application server) for more advanced optimization studies.

Modeling systems with horizontal scalability

A typical optimization target is a cluster, i.e. a system made of multiple instances that provide horizontal scalability (e.g. a Kubernetes deployment with several replicas), where all the instances are supposed to be identical from both a code and a configuration perspective. In this scenario, the recommended approach is to create only one component that represents a generic instance of the cluster. This way, all the instances will be tuned in exactly the same way.

Notice that in order for this approach to work correctly, it is also important to verify that the cluster is correctly monitored by the telemetry providers. Depending on the telemetry technology in use, the clustered system may be presented as either a single entity, with aggregated metrics (e.g. a Kubernetes deployment with the total CPU usage of all the replica pods), or as multiple entities, each corresponding to the different instances in the cluster:

  • in case aggregated metrics are provided by the telemetry provider for the cluster, these metrics can be simply assigned to the component modeling the whole cluster;

  • in case only instance-level metrics are made available by the telemetry provider, telemetry instances need to be configured in Akamas so as to aggregate the metrics of the cluster instances (e.g. averaging CPU utilization, summing memory usage, etc.), depending on how each specific metric is expected to be used in the goal and constraints or in the study results.

See the Component Types template section for details on the structure of those manifests.

See the Metric template section for details on the structure of those manifests.

See the Parameter template section for details on the structure of those manifests.

See the Telemetry Provider template section for details on the structure of those manifests.

After this, the optimization pack can be installed (and then used) as described on the Managing optimization packs page.

The very first preparatory step is to model the system representing an application or a service that needs to be optimized (also known as the optimization target).

Modeling a system translates into identifying the components representing the key technology elements to be included in the optimization. Each component is associated with a set of tunable parameters, i.e. configurable properties that impact the performance, efficiency, or reliability of the system, and with a set of metrics, i.e. measurable properties that are used to evaluate the performance, efficiency, or reliability of the system. Typically, key system components are identified by considering which elements and parameters need to be tuned.

Akamas provides several out-of-the-box component types to support system and component modeling. Moreover, it is also possible to define new component types to model other components (see Modeling components).

The System template section of the reference guide describes the template required to define a system, while the commands for creating a system are listed on the Resource Management command page.

Key metrics are also those used to define the optimization goal and constraints, either as metrics that are expected to be improved by the optimization or as metrics representing constraints. For example, a typical goal is to optimize the application throughput: in this case, a Web Application component should include service metrics such as transaction throughput or transaction response time.

Please also notice that systems (and other Akamas artifacts) can be shared with different teams thanks to the definition of Akamas workspaces.

In the horizontal scalability scenario described above, the associated automation workflow needs to be configured to ensure that each configuration is applied to the whole cluster, by propagating the parameter configuration to all of the cluster instances, not just to the single instance represented by the modeled component, whose metrics are collected and used to evaluate the overall cluster behavior under that configuration.


Modeling components

After identifying the components that are required to model a system, the following step is to model each identified key component.

While the optimization process does not necessarily require component types and optimization packs to be defined, it is recommended to leverage this construct to facilitate modularization and reuse.

This is possible as the Akamas optimization pack model is extensible: custom optimization packs can be easily created without any programming to allow Akamas optimization capabilities to be applied to virtually any technology.

Managing optimization packs

Whether out-of-the-box or custom, optimization packs need to be installed on an Akamas installation before being used.

Since optimization packs are global resources that are shared across all the workspaces on the same Akamas installation, an account with administrative privileges is required to manage them.

Optimization packs that are not yet installed are displayed as grayed out in the Akamas UI (this is the case for the AWS and Docker packs in the following figure).

An Akamas installation comes with the latest optimization packs already loaded in the store and is able to check the central repository for updates.

Installing

There are two ways of installing an optimization pack:

  • online installation - this is the general case when the optimization pack is already in the store

  • offline installation - this applies to custom optimization packs available as a JSON file (refer to the Creating custom optimization pack page)

Only in the first case can an optimization pack be installed from the UI. The following sections describe the command-line commands to install an optimization pack.

Online installation

Execute the following command by specifying the name of the optimization pack that is already available in the store:

akamas install optimization-pack OPTIMIZATION_PACK_NAME

Offline installation

Execute the following command to install an optimization pack, specifying the full path to its JSON descriptor file:

akamas install optimization-pack PATH_TO_JSON_DESCRIPTOR

For example, the following commands download the JSON descriptor of version 1.3.0 of the Linux optimization pack and then install it:

curl -O https://akamas.s3.us-east-2.amazonaws.com/optimization-packs/Linux/1.3.0/Linux_1-3-0.json
akamas install optimization-pack Linux_1-3-0.json

Forcing installation

When installing an optimization pack, the following checks are executed to identify potential clashes with already existing resources:

  • name of the optimization pack

  • metrics

  • parameters

  • component types

  • telemetry providers

In case one of those checks is positive (i.e. a clash exists), the installation fails and a message notifies that a "force" option needs to be used to get the optimization pack installed anyway:

akamas install -f optimization-pack OPTIMIZATION_PACK_NAME

Please be aware that when forcing the installation of an optimization pack, Akamas replaces (or merges) all the conflicting resources; however, if at least one conflicting resource is a custom resource, the installation is stopped. In this case, the custom resource needs to be manually removed first in order to proceed.

Uninstalling

The following command uninstalls an optimization pack:

akamas uninstall --force OPTIMIZATION_PACK_NAME

Notice that this also deletes all the components built using that optimization pack.

Updating

In case a new version of an optimization pack needs to be installed from a descriptor, the procedure is the following:

  • uninstall the optimization pack;

  • remove the old version of the optimization pack descriptor file from the store container;

  • install the new optimization pack with the new JSON descriptor.

Akamas provides the corresponding component types for each specific technology (and possibly version), describing all the tunable parameters and metrics of interest. The full list of Akamas optimization packs is available on the Optimization packs page of the Akamas reference guide.

The Component template section of the reference guide describes the template required to define a system component, while the commands for creating a system component are listed on the Resource Management command page.

The Creating custom optimization pack page describes how to create a new optimization pack (possibly by reusing an already existing one), while the Component Type template page in the Akamas reference guide describes how to define a custom component type (if required).

Notice that optimization packs, even if provided out-of-the-box by Akamas, need to be installed (as described on the Managing optimization packs page) in case they have not been used before in the Akamas installation by other users. Indeed, optimization packs are global resources that are shared across all the workspaces of the same Akamas installation.




Performing load testing to support optimization activities

This page provides a short compendium of general performance engineering best practices to be applied in any load testing exercise. The focus is on how to ensure that realistic performance tests are designed and implemented to be successfully leveraged for optimization initiatives.

The goal of ensuring realistic performance tests boils down to two aspects:

  • sound test environments;

  • realistic workloads.

Test environments

A test or pre-production environment (TestEnv from now on) needs to represent as closely as possible the production environment (ProdEnv from now on).

The most representative test environment would be a perfect replica of the production environment from both infrastructure (hardware) and architecture perspectives. The following criteria and guidelines can help design a TestEnv that is suitable for performance testing supporting optimization initiatives.

Hardware specifications

The hardware specifications of the physical or virtual servers running in the TestEnv and ProdEnv must be identical. This is because any difference in the available resources (e.g. amount of RAM) or specifications (e.g. CPU vendor and/or type) may affect both service performance and system configuration.

This general guideline can only be relaxed for servers/clusters running container(s) or container orchestration platforms (e.g. Kubernetes or OpenShift). Indeed, it is possible to safely execute most of the related optimization cases if the TestEnv guarantees enough spare/residual capacity (number of cores or amount of RAM) to allocate all the needed resources.

While for monolithic architectures this may translate into significant HW requirements, with microservices this might not be the case, for two main reasons:

  • microservices are typically smaller than monoliths and designed for horizontal scalability: this means that optimizing the configuration of the single instance (pod/container resources and runtime settings) becomes easier as they typically have smaller HW requirements;

  • approaches like Infrastructure as Code (IaC), typically used with cloud-native applications, allow for easily setting up cluster infrastructure (on-premises or in the cloud) that can mimic production environments.

Downscaled/downsized architecture

TestEnvs are typically downscaled/downsized with respect to ProdEnvs. If this is the case, optimizations can still be safely executed, provided it is possible to generate a "production-like" workload on each of the nodes/elements of the architecture.

This can usually be achieved if all the architectural layers have the same scale ratio between the two environments and the generated workload is scaled accordingly. For example, if the ProdEnv has 4 nodes at the front-end layer, 4 at the backend layer, and 2 at the database layer, then a TestEnv can have 2 nodes, 2 nodes, and 1 node respectively.
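The same-scale-ratio guideline can be illustrated with the numbers above (a uniform 2:1 ratio between ProdEnv and TestEnv; the values are the example's, not a recommendation):

```shell
# Every layer is scaled by the same factor, and so is the generated workload.
RATIO=2
for layer in "frontend:4" "backend:4" "database:2"; do
  name=${layer%%:*}
  prod=${layer##*:}
  echo "$name: prod=$prod test=$(( prod / RATIO ))"
done
```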

Load balancing among nodes

From a performance testing perspective, load balancing among multiple nodes can be ignored if it relies on an external component that ensures a uniform distribution of the load across all nodes.

On the contrary, if an application-level balancing is in place, it might be required to include at least two nodes in the testing scenario so as to take into account the impact of such a mechanism on the performance of the cluster.

External/downstream services

The TestEnv should also replicate the application ecosystem, including dependencies from external or downstream services.

External or downstream services should emulate the production behavior from both functional (e.g. response size and error rate) and performance (e.g. throughput and response times) perspectives. In case of constraints or limitations on the ability to leverage external/downstream services for testing purposes, the production behavior needs to be simulated via stubs/mock services.

In the case of microservices applications, it is also required to replicate dependencies within an application. Several approaches can be taken for this purpose, such as:

  • replicating interacting microservices;

  • disregarding dependencies with nonrelevant services (e.g. a post-processing service running on a mainframe whose messages are simply left published in a queue without being dequeued).

Test cases

The most representative performance test script would provide 100% coverage of all the possible test cases. Of course, this is very unlikely to be the case in performance testing. The following criteria and guidelines can be considered to establish the required test coverage.

Statistical relevance

The test cases included in the test script must cover at least 80% of the production workload.

Business relevance

The test cases included in the test script must cover all the business-critical functionalities that are known (or expected) to represent a significant load in the production environment.

Technical relevance

The test cases included in the test script must cover all the functionalities that at the code level involve:

  • Large objects/data structure allocation and management

  • Long living objects/data structure allocation and management

  • Intensive CPU, data, or network utilization

  • "one of-a-kind" implementations, such as connections to a data source, ad-hoc objects allocation/management, etc.

Test user paths and behavior

The virtual user paths and behavior coded in the test script must be representative of the workload generated by production users. The most representative test script would account for the production users in terms of the mix of different user paths, associated think times, and session lengths.

When single user paths cannot be easily identified, the best practice is to model the most comprehensive user journey. In general, a worst-case approach is recommended.

The task of reproducing realistic workloads is easier for microservice architectures. On the contrary, for monolithic architectures, this task could become hard as it may not be easy to observe all of the workloads, due to custom frameworks, etc. With microservices, the workload can be completely decomposed in terms of APIs/endpoints and APM tools can provide full observability of production workload traffic and performance characteristics for each single API. This guarantees that the replicated workload can reproduce the production traffic as closely as possible.

Test data

Both test script data (the datasets used in the test script) and test environment data (the datasets in any involved databases/datastores) have to be characterized in terms of size and variance so as to reproduce the production performance.

Test script data

The test script data has to be characterized in order to guarantee production-like performance (e.g. cache behavior). If this characterization proves difficult, the best practice is to adopt a worst-case approach.

Test environment data

The test data must be sized and have an adequate variance to guarantee production-like performance in the interaction with databases/datastores (e.g. query response times).

Test scenarios

Most performance test tools provide the ability to easily define and modify the test scenarios on top of already defined test cases/scripts, test case-mix, and test data. This is especially useful in the Akamas context where it might be required to execute a specific test scenario, based on the specific optimization goal defined. The most common (and useful, in the Akamas context) test scenarios are described here below.

Load tests

A load test aims at measuring system performance against a specified workload level, typically the one experienced or expected in production. Usually, the workload level is defined in terms of virtual user concurrency or request throughput.

In the load test, after an initial ramp-up, the target load level is maintained constant for a steady state until the end of the test.

When validating a load test, the following two key factors have to be considered:

  • The steady-state concurrency/throughput level: a good practice is to apply a worst-case approach by emulating at least 110% of the production throughput;

  • The steady-state duration: defining the length of the steady state is in general a complex task, because it strictly depends on the technologies under test and because phenomena such as bootstraps, warm-ups, and caching may affect the performance and behavior of the system only before or after a certain amount of time; as a general guide to validate the steady-state duration, it is useful to:

    1. execute a long-run test by keeping the defined steady-state for at least 2h to 3h;

    2. analyze test results by looking for any variation in the performance and behavior of the system over time;

    3. In case no variation is observed, shorten the steady state, keeping a duration of at least 30 minutes.
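As a sketch of step 2 above, per-interval means of a response-time series can be compared against the overall mean to spot drift during the long run. The sample data and the 5% drift tolerance are illustrative assumptions:

```python
# Look for drift in a long-run steady state by comparing per-interval means
# against the overall mean (sample data and tolerance are illustrative).

from statistics import mean

def interval_means(samples, intervals):
    """Split the series into equal intervals and return each interval's mean."""
    size = len(samples) // intervals
    return [mean(samples[i * size:(i + 1) * size]) for i in range(intervals)]

def max_drift(samples, intervals=6):
    """Largest relative deviation of an interval mean from the overall mean."""
    overall = mean(samples)
    return max(abs(m - overall) / overall for m in interval_means(samples, intervals))

flat = [100, 102, 98, 101, 99, 100] * 20   # stable response times
degrading = [100 + i for i in range(120)]  # e.g. a slow memory leak

print(max_drift(flat) < 0.05)       # True: no variation over time
print(max_drift(degrading) < 0.05)  # False: behavior changes over time
```

If the drift stays within the tolerance over the whole long run, the steady state can likely be shortened as described in step 3.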

Stress tests

A stress test is all about pushing the system under test to its limit.

Stress tests are useful to identify the maximum throughput that an application can cope with while working within its SLOs. Identifying the breaking point of an application is also useful to highlight the bottleneck(s) of the application.

A stress test also makes it possible to understand how the system reacts to excessive load, thus validating the architectural expectations. For example, it can be useful to discover that the application crashes when reaching the limit, instead of simply enqueuing requests and slowing down processing them.

Endurance tests

An endurance test aims at validating the system's performance over an extended period of time.

Validating tests vs production

The first validation is provided by utilization metrics (e.g. CPU, RAM, I/O), which in the test environment should closely match the behavior observed in production. If the delta is significant, some refinements of the test case and environment might be required to close the gap and gain confidence in the test results.

Creating workflows for live optimizations

In more detail, a typical workflow includes the following types of tasks:

  • Applying the configuration, by preparing and then applying the parameter configuration that has been recommended and/or approved to the target environment - this may require interfacing configuration management tools or pushing configuration to a repository

Depending on the complexity of the system, the workflow might be composed of multiple actions of the same type, each operating on a separate component of the target system.

Creating telemetry instances

Telemetry providers are shared across all the workspaces in the same Akamas installation and require an account with administrative privileges to manage them. Any number of telemetry instances (even of the same type) can be specified. For example, the following figure shows two Prometheus telemetry instances associated with the Adservice system.

Best Practices

The following sections provide guidelines on how to create telemetry instances.

Verify metrics provided by the telemetry provider

A seemingly obvious, yet fundamental, best practice when choosing a telemetry provider is to check whether the required metrics:

  • are supported by the original data source or can be added (e.g. as it is in the case of Prometheus)

  • are available and can be effectively gathered in the specific implementation

  • are supported by the telemetry provider itself or whether it needs to be extended, as in the case of custom metrics such as those made available by the application itself (this is the case for the Prometheus telemetry provider)

Creating automation workflows

Creating optimization studies

The final preparatory step before running a study is to actually create the study, which also requires several substeps.

Offline optimization studies

For offline optimization studies, there are some additional (optional) steps:

Live optimization studies

For live optimization studies, there are some additional steps - including a mandatory one:

Creating workflows for offline studies

In more detail, a typical workflow includes the following types of tasks:

  • Preparing the application, by executing all cleaning or reset actions that are required to prepare the load testing phase and ensuring that each experiment is executed under exactly the same conditions - for example, this may involve cleaning caches, uploading test data, etc

  • Applying the configuration, by preparing and then applying the parameter configuration under test to the target environment - this may require interfacing configuration management tools or pushing configuration to a repository, restarting the entire application or some of its components to ensure that some parameters are effectively applied, and then checking that after restarting the application is up & running before the workflow execution continues, and checking whether the configuration has been correctly applied

  • Applying the workload, by launching a load test to assess the behavior of the system under the applied configuration and synthetic workload defined in the load testing scenarios - of course, a preliminary step is to design a load testing scenario and synthetic workload that ensures that optimized configurations resulting from the offline optimization can be applied to the target system under the real or expected workload

Failing workflows

A workflow is interrupted whenever any of its steps fails, and a failing workflow causes the experiment or trial to fail. This is a different situation from a specific configuration not matching the optimization constraints or causing the system under test to fail to run. For example, if the configured maximum memory is too low, the application may fail to start.

When an experiment fails, the Akamas AI engine takes this information into account and learns that the parameter configuration was bad. This way, the AI engine automatically tries to avoid the regions of the parameter space that can lead to low scores or failures.

Best Practices

Creating effective workflows is essential to ensure that Akamas can automatically identify the optimal configuration in a reliable and efficient way. Some best practices on how to build robust workflows are described here below.

Reusing workflows as much as possible

Since Akamas workflows are first-class entities that can be used by multiple studies, it might be useful to avoid creating (and maintaining) multiple workflows and instead define workflows that can be easily reused, by factoring all differences into specific action parameters.

Of course, this general guideline should be balanced against other requirements, such as avoiding potential conflicts due to different teams modifying the same workflow for different uses and potentially impacting optimization results.

Building robust workflows

Akamas takes into account the exit code of each workflow task, and the whole workflow fails if a task exits with an error. Therefore, the best practice is to make use of exit codes in each task, so that task failures can only happen in case of a bad parameter configuration.

For example, it is important to always check that the application has correctly started and is up and running (after a new configuration has been applied). This can be done by:

  • including a workflow task that tests the application is up and running after the tasks where the configuration is applied;

  • making sure that this task exits with an error in case the application has not correctly started (typically after a timeout).
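A minimal sketch of such a check, assuming an HTTP health endpoint (the URL, timeout, and polling interval below are hypothetical and should be adapted to the actual environment):

```python
# Sketch of a "check application is up" workflow task. The health URL and
# timings are assumptions to adapt; the key point is the process exit code.

import time
import urllib.request

def wait_until_healthy(url, timeout_s=120, poll_s=5, _probe=None):
    """Poll a health endpoint until it answers 200 or timeout_s elapses.

    `_probe` allows injecting a fake prober (e.g. for testing); by default
    the endpoint is fetched with urllib.
    """
    probe = _probe or (lambda u: urllib.request.urlopen(u, timeout=5).status == 200)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if probe(url):
                return True
        except OSError:
            pass  # application not up yet, keep polling
        time.sleep(poll_s)
    return False

# In the actual task script, propagate the result as the exit code, so that
# Akamas marks the task (and hence the workflow) as failed on timeout:
#   import sys
#   sys.exit(0 if wait_until_healthy("http://app.example.com/health") else 1)
```

Exiting non-zero on timeout is what lets the workflow engine distinguish "the configuration broke the application" from "the test simply ran".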

Another example is when the underlying environment incurs issues during the optimization (e.g. a database might be mistakenly shut down by another team). As much as possible, all these environmental transient issues should be carefully avoided. Akamas also provides the ability to execute multiple task retries (default is twice, configurable) to compensate for these transient issues, provided they only last for a short time (the retry time and delay are also configurable).

Building workflows that ensure reproducible experiments

As for any other performance evaluation activity, Akamas experiments should be designed to be reproducible: if the same experiment (hence, the same parameter configuration) is executed multiple times (i.e. in multiple trials), the same performance results should be found for each trial.

Therefore, it is fundamental that workflows include all the necessary tasks to realize reproducible experiments. Particular care needs to be taken to correctly manage the system state across the experiments and trials. System state can include:

  • Application caches

  • Operating system cache and buffers (e.g. Linux filesystem page cache)

  • Database tables that fill up during the optimization process

All experiments should always start with a clean and well-known state. If the state is not properly managed, it may happen that the performance of the system is observed to change (whether higher or lower) not because of the effect of the applied parameters, but due to other effects (e.g. warming of caches).

Best practices to consistently manage system state across experiments include:

  • Restoring the system state at the beginning of each experiment - this may involve restarting the application, clearing caches, restoring DB tables, etc;

  • Allowing for a sufficient warm-up period in the performance tests, so as to ensure that application performance has reached stability. See also the recommended best practices about properly managing warm-up periods in the following section about creating an optimization study.

Another common cause that can impact the reproducibility of experiments is an unstable infrastructure or environment. Therefore, it is important to ensure that the underlying infrastructure is stable and that no other workload that might impact the optimization results is running on it. For example, beware of scheduled system jobs (e.g. backups), automatic software updates or anti-virus systems that might not explicitly be considered as part of the environment but that may unexpectedly alter its performance behavior.

Taking into account workflow duration

When designing workflows, it is important to take into account the potential duration of their tasks. Indeed, the task duration impacts the duration of the overall optimization and might impact the ability to execute a sufficient number of experiments within the overall time interval or specific time windows allowed for the optimization study.

Typically, the longest task in a workflow is the one related to applying the workload (e.g. launching a load test or a batch job): such tasks can last for dozens of minutes, if not hours. However, a workflow may also include other ancillary tasks that can contribute nontrivially to the overall duration (e.g. checking the status to ensure that the application is up & running).

Making workflows fail fast

As general guidance, it is better to fail fast by performing quick checks executed as early as possible. For example, it is better to do a status check before launching a load test instead of possibly waiting for it to complete (maybe after 1h) just to discover that the application did not even start.

mocking these microservices and simulating realistic response times using simulation tools such as Hoverfly (https://github.com/spectolabs/hoverfly);

A workflow for a live optimization study automates all the actions required to interface with the configuration management tools. Notice that metrics collection is an implicit action that does not need to be coded as part of the workflow.

As expected, with respect to workflows for offline optimization studies, there are no actions to apply synthetic workloads as part of a load-testing scenario.

After modeling the system and its components, the following step (see the following figure) is to ensure that all the metrics required to define goals and constraints and to analyze the behavior of the target system can be collected from one of the data sources available in the environment, which in Akamas are called telemetry providers.

Akamas provides a number of out-of-the-box telemetry providers, including industry-standard monitoring platforms (e.g. Prometheus or Dynatrace), performance testing tools (e.g. LoadRunner or JMeter), or simple CSV files. The Integrating Telemetry Providers section lists all the out-of-the-box telemetry providers and how to get them integrated by Akamas, while the Telemetry metric mapping section describes the mapping of the specific data source metrics to Akamas metrics.

Since several instances of a data source type might be available, the specific data source instance needs to be specified; that is, a corresponding telemetry instance needs to be defined for the modeled system and its components.

The Telemetry instance template section of the reference guide describes the template required to define a telemetry instance, while the commands for creating a telemetry instance are listed on the Resource Management commands page.

Akamas makes it possible to validate whether a telemetry setup works correctly by first executing dry runs. This is discussed in the context of the recommended practices to run optimization studies (see the Running optimization studies section).

After modeling the system and its components and ensuring that appropriate telemetry instances are defined, the following step (see the following figure) is to define a workflow.

A workflow automates all the tasks to be executed in sequence (see the following figure) during the optimization study, in particular those leveraging integrations with external entities, such as telemetry providers or configuration management tools. Akamas provides a number of general-purpose and specialized workflow operators (see the Workflow Operator page).

The Workflow template section of the reference guide describes the template required to define a workflow, while the commands for creating a workflow are listed on the Resource Management commands page.

Since a workflow is an Akamas resource defined at the workspace level that can be used by multiple studies, it might be the case that a convenient workflow is already available or can be adapted to create a new workflow for the specific target system and integrations, by adding/removing some workflow tasks, or by changing the task sequence or the values assigned to task parameters.

Notice that since the structures of the workflows defined for a live optimization study and for an offline optimization study are very different, these cases are described by specific pages: creating workflows for offline optimization studies and creating workflows for live optimization studies.

Most of the substeps are common to both a live optimization study and an offline optimization study, even if they might need to be conceived differently in these two different contexts: defining the optimization goal & constraints, defining the optimization parameters & metrics, and defining the optimization steps.

Other optional and mandatory steps are specific to offline optimization studies (see below) and live optimization studies (see below).

The Study template section of the reference guide describes the template for creating a study, while the commands for creating a study are listed on the Resource Management commands page. For offline optimization studies only, the Akamas UI displays the "Create a study" button that provides a visual step-by-step procedure for creating a new optimization study (see the following figure).

Defining windowing policies (optional - typically after defining the goal & constraints)

Defining KPIs (optional - typically after defining the goal & constraints)

Notice that Akamas also allows existing offline optimization studies to be duplicated either from the Akamas UI (see the following figure) or from the command line (refer to the Resource management commands page).

Defining workloads (mandatory - typically after defining the goal & constraints)

Setting safety policies (optional - typically when defining the optimization steps)

A workflow for an offline optimization study automates all the actions required to interface with the configuration management and load testing tools (see the following figure) at each experiment or trial. Notice that metrics collection is an implicit action that does not need to be coded as part of the workflow.

The Optimization examples section provides some examples of how to define workflows for a specific technology. In a complex application, a workflow may include multiple actions of the same type, each operating on separate components of the target system. The Knowledge base guide provides some real-world examples of how to create workflows and optimization studies.

This explains why it is important to build robust workflows that ensure experiments only fail in case bad configurations are tested. See the Building robust workflows entry in the best practices section below.

Some additional best practices related to the design and implementation of load testing are described in the Performing load testing to support optimization activities page.


Defining parameters & metrics

As illustrated by the previous and following figures, during this step it is also possible to edit the range of values associated with each optimization parameter with respect to the default domain provided by either the original or a custom optimization pack in use for the respective technology.

Parameter rendering

By default, all parameters specified in the parameter selection of a study are applied ("rendered"). Akamas allows specifying which configuration parameters should be applied in the optimization steps. More precisely:

  • parameter rendering is available at the step level for baseline, preset, and optimize steps

  • parameter rendering is not available for bootstrap steps (bootstrapped experiments are not executed)

This feature can be useful to deal with the different strategies through which applications and systems accept configuration parameters.

Best Practices

The following sections provide some best practices on how to best approach the step of defining optimization parameters.

Configure parameters domains based on environment specs

Since the parameter domain defines the range of values that the Akamas AI engine can assign to the parameter, when defining the system parameters to be optimized, it is important to review the parameter domains and adjust them based on the system characteristics of the target system, environment and best practices in place.

Akamas optimization packs already provide parameter domains that are correct for most situations. For example, the OpenJDK 11 JVM gcType is a categorical parameter that already includes all the possible garbage collectors that can be set for this JVM version.

For other parameters, there are no sensible default domains as they depend on the environment. For example, the OpenJDK 11 maxHeapSize JVM parameter dictates how much memory the JVM can use. This obviously depends on the environment in which the JVM runs. For example, the upper bound might be 90% of the memory of the virtual machine or container in which the JVM runs.

Configure parameter constraints based on Optimization Pack best practices

Depending on the specific technology under optimization, the configuration parameters may have relationships among themselves. For example, in a JVM the newSize parameter defines the size of a region of the JVM heap, and hence its value should be always less than the maxHeapSize parameter.

Akamas AI engine supports the definition of constraints among parameters as this is a frequent need when optimizing real-life applications.

It is important to define the parameter constraints when creating a new study. The optimization pack documentation provides guidelines on what are the most important parameter constraints for the specific technology.

When optimizing a new or custom technology, it may happen that some experiments fail due to unknown parameter constraints being violated. For example, the application may fail to start and only by analyzing the application error logs, the reason for the failure can be understood. For a Java application, the JVM error message (e.g. "new size cannot be larger than max heap size") could provide useful hints. This would reveal that some constraints need to be added to the parameter constraints in the study.

While the Akamas AI engine has been designed to learn from failures, including those due to relationships among parameters that were not explicitly set as constraints, setting parameter constraints may help avoid unnecessary failures and thus speed up the optimization process.
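As an illustration, a candidate configuration can be validated against known relationships before being applied. The constraint set and values below are purely illustrative: the first mirrors the JVM example in the text (newSize must not exceed maxHeapSize), the second mirrors the 90%-of-container-memory guideline mentioned earlier:

```python
# Illustrative sketch: validate a candidate configuration against known
# parameter constraints before applying it. All values are in MB and are
# purely illustrative.

def violated_constraints(config, constraints):
    """Return the labels of the constraints that the candidate violates."""
    return [label for label, check in constraints if not check(config)]

constraints = [
    ("newSize <= maxHeapSize", lambda c: c["newSize"] <= c["maxHeapSize"]),
    ("maxHeapSize <= 90% of containerMemory",
     lambda c: c["maxHeapSize"] <= 0.9 * c["containerMemory"]),
]

good = {"newSize": 512, "maxHeapSize": 2048, "containerMemory": 4096}
bad = {"newSize": 3000, "maxHeapSize": 2048, "containerMemory": 2048}

print(violated_constraints(good, constraints))  # []
print(violated_constraints(bad, constraints))   # both constraints violated
```

A pre-check of this kind surfaces the same hidden relationships that would otherwise only show up as experiment failures in the application error logs.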

Defining windowing policies

For both offline and live optimization studies, it is possible to define how to identify the time windows that Akamas needs to consider for assessing the result of an experiment. Defining a windowing policy helps achieve reliable optimizations by excluding metrics data points that should not influence the score of an experiment.

The following two windowing policies are available:

  • Trim windowing: discards the initial and final parts of an experiment - e.g. to exclude warm-up and tear-down phases; the trim windowing policy is the default (with the entire interval selected when no trimming is specified)

  • Stability windowing: discards those parts that do not correspond to the most stable window - this leverages the Akamas feature of automatically identifying the most stable window based on user-specified criteria
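The two policies can be sketched as follows on a series of per-interval measurements from a load test (ramp-up, steady state, ramp-down). The data and thresholds are illustrative, and the actual experiment scoring depends on the study goal:

```python
# Sketch of the two windowing policies on per-interval measurements of a
# load test (ramp-up, steady state, ramp-down). Data and thresholds are
# illustrative; the actual experiment scoring depends on the study goal.

from statistics import mean, stdev

def trim_window(samples, warmup, teardown):
    """Trim windowing: discard the first `warmup` and last `teardown` samples."""
    return samples[warmup:len(samples) - teardown]

def most_stable_window(samples, size, max_std):
    """Stability windowing: among windows of `size` samples whose standard
    deviation is below `max_std`, return the one with the highest mean."""
    windows = [samples[i:i + size] for i in range(len(samples) - size + 1)]
    stable = [w for w in windows if stdev(w) <= max_std]
    return max(stable, key=mean) if stable else None

throughput = [10, 40, 80, 100, 101, 99, 100, 60, 20]

print(trim_window(throughput, 2, 2))           # [80, 100, 101, 99, 100]
print(most_stable_window(throughput, 4, 5.0))  # [100, 101, 99, 100]
```

In the sketch, trim windowing simply drops a fixed number of ramp samples, while stability windowing searches for the highest stable plateau, matching the stress-test use case described below.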

Best Practices

The following sections provide general best practices on how to define a suitable windowing policy.

Define windowing based on the optimization goal

In order to make the optimization process fully automated and unattended, Akamas automatically analyzes the time series of the collected metrics of each experiment and calculates the experiment score (all the system metrics will also be aggregated).

Based on the optimization goal, it is important to instruct Akamas on how to perform this experiment analysis, in particular, by also leveraging Akamas windowing policies.

For example, when optimizing an online or transactional application, there are two common scenarios:

  1. Increase system performance (i.e. minimize response time) or reduce system costs (i.e. decrease resource footprint or cloud costs) while processing a given and fixed transaction volume (i.e. a load test);

  2. Increase the maximum throughput a system can support (i.e., system capacity) while processing an increasing amount of load (e.g. a stress test).

In the first scenario, a load test scenario is typically used: the injected load (e.g. virtual users) ramps up for a period, followed by a steady state, with a final ramp-down period. From a performance engineering standpoint, since the goal is to assess the system performance during the steady state, the warm-up and tear-down periods can be discarded. This analysis can be automated by applying a windowing policy of type "trim" upon creating the optimization study, which makes Akamas automatically compute the experiment score by discarding a configurable warm-up and tear-down period.

In the second scenario, a stress test is typically used: the injected load follows a ramp with increasing levels of users, designed to stress the system up to its limit. In this case, a performance engineer is most likely interested in the maximum throughput the system can sustain before breaking down (possibly while matching a response time constraint). This analysis can be automated by applying a windowing policy of type "stability", which makes Akamas automatically compute the experiment score in the time window where the throughput was maximized but stable for a configurable period of time.

When optimizing a batch application, windowing is typically not required. In such scenarios, a typical goal is to minimize batch duration or aggregate resource utilization. Hence, there is no need to define any windowing policy: by default, the whole experiment timeframe is considered.

Finding an effective stability window

Setting up an effective stability window requires some knowledge of the test scenario and the variability of the environment.

As a general guideline, it is recommended to run a baseline study with the standard deviation threshold set to a low value (e.g. close to 0, or half of the expected mean of the metric), and then to inspect the results of the baseline to see which window has been identified and update the standard deviation threshold accordingly. When using a continuous ramp, the test has no plateaus, so the standard deviation threshold should be a bit higher to account for the increase of traffic within the windowing period. On the contrary, when running a staircase test with many plateaus, the standard deviation threshold can be smaller, so as to identify a period of time with the same number of users.

Applying the standard deviation filter to very stable metrics, such as the number of users, simplifies the definition of the standard deviation threshold but might hide some instability of the environment when subject to constant traffic. On the other hand, applying the threshold to a more direct measure of the performance, such as the throughput, makes it easier to identify the stability period of the application but might require more baseline experiments to identify the proper threshold value. The logs of the scoring phase provide useful insights into the maximum standard deviation found and the number of candidate windows that have been identified given a threshold value, which can be used to refine the threshold in a few baseline experiments.
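The threshold-refinement loop described above can also be emulated offline: for each candidate threshold, count how many windows of the baseline run would qualify as stable, mirroring the candidate-window counts reported in the scoring-phase logs. The staircase-shaped throughput series below is illustrative:

```python
# Emulate the threshold-refinement loop offline: for each candidate standard
# deviation threshold, count how many windows of the baseline run qualify as
# stable. The staircase-shaped throughput series is illustrative.

from statistics import stdev

def stable_window_count(samples, size, max_std):
    """Number of windows of `size` consecutive samples with std dev <= max_std."""
    return sum(
        1 for i in range(len(samples) - size + 1)
        if stdev(samples[i:i + size]) <= max_std
    )

# Three plateaus of four samples each (e.g. a staircase test).
throughput = [50, 51, 49, 50, 100, 101, 99, 100, 151, 150, 149, 150]

for threshold in (0.5, 1.0, 5.0):
    print(threshold, stable_window_count(throughput, 4, threshold))
```

A threshold that yields zero candidate windows is too strict, while one that accepts windows spanning two plateaus is too loose; the useful range sits in between.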

After defining the goal and its constraints, the following substep in creating an optimization study is specifying the optimization parameters and metrics. In particular, selecting the parameters that are going to be tuned to optimize the system is a critical decision that requires carefully balancing complexity and effectiveness. As for goals & constraints, this step too may require adopting an iterative approach. See also the Best Practices section here below.

The Parameter selection and Metric selection pages of the Study template section in the reference guide describe how to define the corresponding structures. For offline optimization studies only, the Akamas UI allows the parameters and metrics to be defined as part of the visual procedure activated by the "Create a study" button (see the following figure).

Please also refer to the Guidelines for choosing optimization parameters for a number of selected technologies. Some examples provided in the Knowledge Base guide may also provide useful guidance.

Please refer to the Parameter rendering page to see how to configure parameter rendering.

The parameter jvm_gcType as displayed in the OpenJDK 11 optimization pack

Defining good parameter domains is important to ensure the parameter configurations suggested by the Akamas AI engine are as good as possible. Notice that an incorrectly defined domain may cause experiment failures (e.g. the JVM cannot start if the maxHeapSize is higher than the container size). As discussed as part of the best practices for defining robust workflows, the Akamas AI engine has been designed to learn which configurations may lead to failures and to automatically discover any hidden constraints found in the environment.

The Windowing policy page of the Study template section in the reference guide describes the corresponding structures. For offline optimization studies only, the Akamas UI allows the windowing policies to be defined as part of the visual procedure activated by the "Create a study" button (see the following figures).


Defining optimization steps

A final step in defining an optimization study is to specify the sequence of steps executed while running the study.

The following four types of steps are available:

  • Baseline: performs an experiment and sets it as a baseline for all the other ones

  • Bootstrap: imports experiments from other studies

  • Preset: performs an experiment with a specific configuration

  • Optimize: performs experiments and generates optimized configurations

Please notice that at least one baseline step is always required in any optimization study. This applies not only to offline optimization studies but also to live optimization studies, as the baseline is used as the starting point from which changes to parameter values are suggested.

Best Practices

The following sections provide some best practices on how to best approach the step of defining the baseline step.

Ensure the baseline configuration is correct

In an optimization study, the baseline is an important experiment as it represents the system performance with the current configuration, and serves as a reference to assess the relative improvements the optimization achieved.

Therefore, it is important to make sure the baseline configuration of the study correctly reflects the current configuration - be it the vendor default or the result of a manual tuning exercise.

Evaluate which parameters to include in the baseline configuration

When defining the study baseline configuration it is important to evaluate which parameters to include. Indeed, several technologies have default values assigned to most of their configuration parameters. However, the runtime behavior can be different depending on whether a parameter is set to its default value or not set at all.

Therefore, it is recommended to review the current configuration (e.g. the one in place in production) and identify which parameters and values have been set (e.g. JVM maxHeapSize = 2GB, gcType = Parallel, etc.), and then to only set those parameters with their corresponding values, without adding any other parameters. This ensures that the specified baseline is consistent with the real production setup.
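For example, with a JVM, a baseline of maxHeapSize = 2GB and gcType = Parallel would be rendered into just the two corresponding flags. The sketch below uses standard OpenJDK option names, but the mapping table itself is illustrative:

```python
# Sketch: render only the explicitly set baseline parameters into JVM flags,
# leaving all other parameters to the technology defaults. The mapping uses
# standard OpenJDK option names; the baseline values are illustrative.

FLAG_TEMPLATES = {
    "maxHeapSize": lambda v: f"-Xmx{v}",
    "newSize": lambda v: f"-Xmn{v}",
    "gcType": lambda v: f"-XX:+Use{v}GC",
}

def render_jvm_args(baseline):
    """Build the JVM argument list from the baseline parameters only."""
    return [FLAG_TEMPLATES[name](value) for name, value in baseline.items()]

# Current production configuration: only these two parameters are set.
baseline = {"maxHeapSize": "2g", "gcType": "Parallel"}

print(" ".join(render_jvm_args(baseline)))  # -Xmx2g -XX:+UseParallelGC
```

Note that newSize is deliberately absent from the rendered command line: since it is not set in production, the JVM default applies, keeping the baseline consistent with the real setup.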

Defining workloads

For a live optimization study, it is required to specify which component metrics represent the different workloads observed on the target system. A workload could be represented by either a metric directly measuring that workload, such as the application throughput, or a proxy metric, such as the percentage of reads and writes in your database.

Akamas features automatic detection of workload contexts, corresponding to different patterns for the same workload. For example, a workload context could correspond to the peak or idle load, or to the weekend or weekday traffic. This allows Akamas to recommend safe configurations based on the observed behavior of the system under similar workload conditions.

Moreover, Akamas also provides customizable safety policies that drive the Akamas optimizer in evaluating candidate configurations with respect to defined goal constraints.

Online mode

Live optimizations can operate in one of the following online modes:

  • recommendation (or manual) mode (the default mode): Akamas does not immediately apply a configuration identified by the Akamas AI: a new configuration is first recommended to the user, who needs to approve it, possibly after modifying it, before it gets applied - this is also referred to as the human-in-the-loop scenario;

  • fully autonomous (or automatic) mode: new configurations are immediately applied by Akamas as soon as they are generated by the Akamas AI, without being first recommended to (and approved by) the user.

It is worth noting that under recommendation mode, there might be a significant delay between the time a configuration is identified by Akamas and the time the recommended changes get applied. Therefore, the Akamas AI leverages the workload information differently when looking for a new configuration, depending on the defined online mode:

  • in the recommendation mode, Akamas takes into account all the defined workloads and looks for the configuration that best satisfies the goal constraints for all the observed workloads and provides the best improvements for all of them

  • in the fully autonomous mode, Akamas works on a single workload at each iteration (based on a customizable workload strategy - see below) and looks for an optimized configuration for that specific workload to be immediately applied in the next iteration, even if it might not be the best for the different workloads

Notice that the online mode can be changed at any time, that is while the optimization study is running, and becomes immediately effective. For example, a live optimization could initially operate in recommendation mode and then be changed to fully autonomous mode afterward.
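
Purely as an illustrative sketch, such a setting in the study definition might look as follows. The field name and values here are assumptions, not the authoritative syntax; the Study template page of the reference guide defines the exact structure.

```yaml
# Illustrative only: field name and values are assumptions
optimization:
  onlineMode: FULLY_AUTONOMOUS   # or RECOMMENDATION (the default)
```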

Example of optimization study with two steps: baselining and optimize

The Steps page of the Study template section in the reference guide describes how to define the corresponding structures for each of the different types of steps allowed by Akamas. For offline optimization studies only, the Akamas UI allows the optimization steps to be defined as part of the visual procedure activated by the "Create a study" button (see the following figure).

In addition to the best practices here below, please refer to the Optimization examples section for a number of examples related to a variety of technologies and the Knowledge Base guide for real-world examples.

The Workload selection page of the Study template section in the reference guide describes how to define the corresponding structure.

Akamas provides several parameters governing how the Akamas optimizer operates and leverages the workload information while a live optimization study is being executed. The most important parameter is the online mode (see the Online mode section), as it relates to whether the human user is part of the approval loop when the Akamas AI recommends a configuration to be applied.

The online mode can be specified at the study level and can also be overridden at the step level (only for steps of type "optimize" - see the Optimize step section). The Study template page in the reference guide describes how to define the corresponding structure. This can be done either from the Akamas command line (see the Optimizer option commands page) or from the Akamas UI (see the following figure).


Defining KPIs

While the optimization goal drives the Akamas AI toward optimal configurations, there might be other sub-optimal configurations of interest, in case they do not simply match the optimization constraints but also improve on some Key Performance Indicators (KPIs).

For example:

  • for a Kubernetes microservice Java-based application, a typical optimization goal is to reduce the overall (infrastructure or cloud) cost by tuning both Kubernetes and JVM parameters while keeping SLOs in terms of application response time and error rate under control

  • among different configurations that provide a similar cost reduction in addition to matching all SLOs, a configuration that also significantly improves the application response time might be worth considering with respect to an optimal configuration that does not improve on this KPI

Akamas automatically considers any metric referred to in the defined optimization goal and constraints for an offline optimization study as a KPI. Moreover, any other metrics of the system component can be specified as a KPI for an offline optimization study.
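
For instance, the two KPIs used in the example on the Insights tags page could be declared along these lines. This is a sketch based on the name/formula/direction attributes mentioned there; the KPIs page of the reference guide describes the exact structure.

```yaml
# Illustrative KPI definitions for an offline optimization study
kpis:
  - name: CPU
    formula: renaissance.cpu_used
    direction: minimize
  - name: MEM
    formula: renaissance.mem_used
    direction: minimize
```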

Once KPIs are defined, Akamas displays the results of the optimization in the Insights section of the Akamas UI. Moreover, the suboptimal configuration associated with a specific KPI is highlighted in the Akamas UI by a textual badge "Best <KPI name>".

The KPIs page of the Study template section in the reference guide describes how to define the corresponding structure. Specifying the KPIs can be done while first defining the study or from the Akamas UI, at either study creation time or afterward (see the following figures).

Please notice that KPIs can also be re-defined after an offline optimization study has been completed, as their definition does not affect the optimization process, only the evaluation of its results. See the Analyzing offline optimization studies section and the Optimization Insights page.


Setting safety policies

While Akamas leverages similar AI methods for both live optimizations and optimization studies, the way these methods are applied is radically different. Indeed, for optimization studies running in pre-production environments, the approach is to explore the configuration space while also accepting potentially failed experiments, so as to identify regions that do not correspond to viable configurations. Of course, this approach cannot be accepted for live optimizations running in production environments. For this purpose, Akamas live optimization uses observations of configuration changes, combined with the automatic detection of workload contexts, and provides several customizable safety policies when recommending configurations to be reviewed, approved, and applied.

Exploration factor

Akamas provides an optimizer option known as the exploration factor that only allows gradual changes to the parameters. This gradual optimization allows Akamas to observe how these changes impact the system behavior before applying further gradual changes.

By properly configuring the optimizer, Akamas can gradually explore regions of the configuration space and slowly approach any potentially risky regions, thus avoiding recommending any configurations that may negatively impact the system. Gradual optimization takes into account the maximum recommended change for each parameter. This is defined as a percentage (default is 5%) with respect to the baseline value. For example, in the case of a container whose CPU limit is 1000 millicores, the corresponding maximum allowed change is 50 millicores. It is important to notice that this does not represent an absolute cap, as Akamas also takes into account any good configurations observed. For example, in the event of a traffic peak, Akamas would recommend a good configuration that was observed working fine for a similar workload in the past, even if the change is higher than 5% of the current configuration value.
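
The maximum recommended change described above is a simple percentage of the baseline value. The following is a sketch of that rule only (an illustration, not Akamas source code):

```python
def max_gradual_change(baseline_value, exploration_factor=0.05):
    """Maximum recommended per-step change for a numeric parameter,
    computed as a percentage of the baseline value (sketch of the
    gradual-optimization rule described above)."""
    return baseline_value * exploration_factor

# For a container with a CPU limit of 1000 millicores and the default
# 5% exploration factor, the maximum recommended change is 50 millicores.
print(max_gradual_change(1000))  # 50.0
```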

Notice that this feature does not work for categorical parameters (e.g. JVM GC Type), as their values do not change incrementally. Therefore, for these parameters, Akamas by default takes the conservative approach of only recommending configurations whose categorical parameters take values that have already been observed. This still allows never-before-observed values to be introduced, as users are allowed to modify values of categorical parameters when operating in human-in-the-loop mode. Once Akamas has observed that the resulting configuration works fine, the corresponding value can then be recommended. For example, a user might modify the recommended configuration for GC Type from Serial to Parallel. Once Parallel has been observed working fine, Akamas would consider it for future recommendations of GC Type, while other values (e.g. G1) would not be considered until verified as safe.

The exploration factor can be customized for each live optimization individually and changed while live optimizations are running.

Safety factor

Akamas provides an optimizer option known as the safety factor designed to prevent Akamas from selecting configurations (even if slowly approaching them) that may impact the ability to match defined SLOs. For example, when optimizing container CPU limits, lower and lower CPU limits might be recommended, up to the point where the limit becomes so low that the application performance degrades.

Akamas takes into account the magnitude of constraint breaches: a severe breach is considered more negative than a minor breach. For example, in the case of an SLO of 200 ms on response time, a configuration causing a 1 sec response time is assigned a very different penalty than a configuration causing a 210 ms response time. Moreover, Akamas leverages the smart constraint evaluation feature that takes into account if a configuration is causing constraints to approach their corresponding thresholds. For example, in the case of an SLO of 200 ms on response time, a configuration changing response time from 170 ms to 190 ms is considered more problematic than one causing a change from 100 ms to 120 ms. The first one is considered by Akamas as corresponding to a gray area that should not be explored.

The safety factor is also used when starting the study in order to validate the behavior of the baseline and assess whether it is safe to explore configurations close to it. If the baseline presents some constraint violations, then even exploring configurations close to the baseline might pose a risk. If Akamas identifies that more than (safety_factor * number_of_trials) baseline trials manifest constraint violations, then the optimization is stopped.
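
The baseline validation rule above can be sketched as follows (an illustration of the stated condition, not Akamas source code):

```python
def baseline_is_unsafe(number_of_trials, violating_trials, safety_factor=0.5):
    """Sketch of the baseline validation rule: the optimization is stopped
    when more than safety_factor * number_of_trials baseline trials
    manifest constraint violations."""
    return violating_trials > safety_factor * number_of_trials

# With the default safety factor of 0.5 and 4 baseline trials, the study
# stops if 3 or more trials violate constraints.
print(baseline_is_unsafe(4, 3))  # True
print(baseline_is_unsafe(4, 2))  # False
```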

If your baseline has some trials failing constraint validation, we suggest analyzing them before proceeding with the optimization.

The safety factor is set by default to 0.5 and can be customized for each live optimization individually and changed while live optimizations are running.

Outlier detection

It is also worth mentioning that Akamas also features an outlier detection capability to compensate for production environments typically being noisy and much less stable than staging environments, thus displaying highly fluctuating performance metrics. As a consequence, constraints may fail from time to time, even for perfectly good configurations. This may be due to a variety of causes, such as shared infrastructure on the cloud, slowness of external systems, etc.

Akamas provides a few customizable optimizer options (refer to the options described on the Optimize step page of the reference guide) that should be configured so as to make configurations recommended in live optimizations, and applied to production environments, as safe as possible.


Before running optimization studies

The following provides some best practices that can be adopted before launching optimization studies, in particular for offline optimization studies.

Dry-running the optimization study

It is recommended to execute a dry-run of the study to verify that the workflow works as expected and in particular that the telemetry and configuration management steps are correctly executed.

Verify that workflow actually works

It is important to verify that all the steps of the workflow complete successfully and produce the expected results.

Verify that parameters are applied and effective

When approaching the optimization of new applications or technologies, it is important to make sure all the parameters that are being set are actually applied and used by the system.

Depending on the specific technology at hand, the following issues can be found:

  • parameters are set but not applied - for example, they were set in the wrong configuration file or the file path is not correct;

  • some automatic (corrective) mechanisms are in place that override the values applied for the parameters.

Therefore, it is important to always verify the actual values of the parameters once the system is up & running with a new configuration, and make sure they match the values applied by Akamas. This is typically done by leveraging:

  • monitoring tools, when the parameters are available as metrics or properties of the system;

  • native administration tools, which are typically available for introspection or troubleshooting activities (e.g. jcmd for the JVM).

Verify that load testing works

It is important to verify that the integration with load testing tools actually executes the intended load test scenarios.

Verify that telemetry collects all the relevant metrics

It is important to make sure that the integration with telemetry providers works correctly and that all the relevant metrics of the system are correctly collected.

Data-gathering from the telemetry data sources is launched at the end of the workflow tasks. The status of the telemetry process can be inspected in the Progress tab, where it is also possible to inspect the telemetry logs in case of failures.

Please notice that the telemetry process fails if the key metrics of the study cannot be gathered. This includes metrics defined in the goal function or constraints.

Baselining the system

Before running the optimization study, it is important to make sure the system and the environment where the optimization is running provide stable and reproducible performance.

Make sure the system performance is stable

In order to ensure a successful optimization, it is important to make sure that the target system displays stable and predictable performance and does not suffer from random variations.

To make sure this is the case, it is recommended to create a study that only runs a single baseline experiment. In order to assess the performance of the system, Akamas trials can be used to execute the same experiment (hence, the same configuration) multiple times (e.g. three times). Once the experiment is completed, the resulting performance metrics can be analyzed to assess their stability. The analysis can be done either by leveraging aggregate metrics in the Analysis tab, or at a deeper level on the actual time series by accessing the Metrics tab in the Akamas UI.

Ideally, no significant performance variation should be observed in the different trials, for the key system performance metrics. Otherwise, it is strongly recommended to identify the root cause before proceeding with the actual optimization activity.

If you are running a live optimization, any constraint violation in the baseline will halt the study. In order to recommend safe configurations, the optimization process requires that the baseline does not violate constraints for the entire observation period.

Backing up the original configuration

Before launching the optimization it might be a good idea to take note of (or backup) the original configuration. This is very important in the case of Linux OS parameters optimization.

Inspecting the data gathering logs from the Akamas UI

Running optimization studies

Before actually running an optimization study, it is highly recommended to read the Before running optimization studies and Before applying optimization results sections.

Offline optimization studies

This can be useful for multiple reasons, including the case of an error (e.g. a misconfigured workflow) that requires "restarting" the study.

Live optimization studies

For live optimization studies, it is possible to stop a study and restart it. However, please notice that this is an irreversible action that deletes all the executed experiments: restarting a live study means starting it from scratch.

Defining optimization goal & constraints

In general, any performance engineering, tuning, and optimization activity involves complex tradeoffs among different - and potentially conflicting - goals and system performance metrics, such as:

  • Maximizing the business volume an application can support, while not making the single transaction slower or increasing errors above a desired threshold

  • Minimizing the duration of a batch processing task, while not increasing the cloud costs by more than 20% or using more than 8 CPUs

Akamas supports all these (and other) scenarios by means of the optimization goal, that is, the single metric or the formula combining multiple metrics that has to be either minimized or maximized, together with one or more constraints among the metrics of the system.

In general, constraints can be defined either as absolute constraints (e.g. app.response_time < 200 ms) or as relative constraints with respect to the baseline (e.g. app.response_time < +20% of the baseline), that is the current configuration in place, typically corresponding to the very first experiment in an offline optimization study. Relative constraints are only applicable to offline optimization studies, while absolute constraints are applicable to both offline and live optimization studies.
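
As an illustrative sketch, a goal combining both kinds of constraints could be expressed along these lines (metric names and field layout are examples only; the Goal & Constraint page of the reference guide defines the exact structure):

```yaml
# Illustrative goal with absolute and relative constraints
goal:
  objective: minimize
  function:
    formula: app.response_time
  constraints:
    absolute:
      - app.error_rate <= 0.01
    relative:
      - app.response_time <= +20%
```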

Please notice that any experiment that does not respect the constraints is marked by Akamas as failed, even if correctly executed. The reason for this failure can be inspected in the experiment status. Similarly to workflow failures (see below), the Akamas AI engine automatically takes any failure due to constraint violations into account when searching the optimization space to identify the parameter configurations that might improve the goal metrics while matching constraints.

Best Practices

There are no general guidelines and best practices on how to best define goals & constraints, as this is where experience, knowledge, and processes meet.

Analyzing results of offline optimization studies

Since an offline optimization study lasts for at most the number of configured experiments and typically runs in a test or pre-production environment, results can be safely analyzed after the study has completely finished.

However, it is a good practice to analyze partial results while the study is still running as this may provide useful insights about both the system being optimized (e.g. understanding of the system dynamics and sub-optimal configurations that could be immediately applied) and about the optimization study itself (e.g. how to re-design a workflow or change constraints), early-on.

The Akamas UI displays the results of an offline optimization study in different visual areas:

  • the Best Configuration section provides the optimal configuration identified by Akamas, as a list of recommended values for the optimization parameters compared to the baseline and ranked according to their relevance;

  • the Progress tab (see the following figures) displays the progression of the study with respect to the study steps, the status of each experiment (and trial), its associated score, and the parameter values of the corresponding configurations; this area is mostly used for study monitoring (e.g. identifying failing workflows) and troubleshooting purposes;

  • the Analysis tab (see the following figures) displays how the baseline and experiments score with respect to the optimization goal, and the values of metrics and parameters for the corresponding configurations; this area supports the analysis of the different configurations;

  • the Metrics tab (see the following figure) displays the behavior of the metrics for all executed experiments (and trials); this area supports both study validation activities and deeper analysis of the system behavior;

Once all the preparatory steps for creating a study are done, running a study is straightforward: an optimization study can be started from either the Akamas UI (see the following figures) or the command line (refer to the Resource management commands page).


Once started, managing studies is different for offline optimization studies (see Analyzing results of offline optimization studies) and live optimization studies (see Analyzing results of live optimization studies).

Notice that once an offline optimization study has started, it can only be stopped or let finish; it cannot be restarted. However, it is possible to reuse experiments executed in another study that has finished (successfully or not) - this is called bootstrapping and is illustrated by the following figure (also refer to the Bootstrap Step page of the reference guide).

The first fundamental step in creating a study is to define the study goal & constraints. While this step might be perceived as somewhat straightforward (e.g. constraints could simply be translated from SLOs already in place), defining the optimization goal really requires carefully balancing complexity and effectiveness, also as part of the general (iterative) optimization process. Please also read the Best Practices section here below.

Please notice that when defining constraints for an optimization study, it is required to also include the constraints listed in the Constraints section of the respective Optimization Packs, which express internal constraints among parameters. For example, in case OpenJDK 11 components are to be tuned, the reference is the Constraints section of the corresponding optimization pack.

The Goal & Constraint page of the Study template section in the reference guide describes the corresponding structures. For offline optimization studies only, the Akamas UI allows the optimization goal and constraints to be defined as part of the visual procedure activated by the "Create a study" button (see the following figure).

Please refer to the Optimization examples section for a number of examples related to a variety of technologies and the Knowledge Base guide for real-world examples.

the Insights section (see the following figure) displays any suboptimal configurations that have been identified for the study KPIs, and also allows making comparisons among them and the best configuration - the Optimization Insights page describes in further detail the Insights section and the insights tags displayed in other areas of the Akamas UI.


Optimization Insights

While the main result of an optimization study is to identify the optimal configuration with respect to the defined goal & constraints, any suboptimal configuration that is improving on one of the defined KPIs can be also very valuable.

These configurations are displayed in a dedicated section of the Akamas UI and also displayed in other areas of the Akamas UI as textual badges "Best <KPI name>" referred to as (insights) tags.

Insights section

The following figures show the Insights section displayed on the study page and the Insights pages that can be drilled down to.

The following figure shows the insights tags in the Analysis tab:

Please notice that "Best", "Best Memory Limit" and any other KPI-related tags are displayed in the Akamas UI while the study progresses, and thus may be reassigned as new experiments get executed and their configurations are scored against the defined study KPIs.

Insights tags

After starting a study, any finished experiment is labeled by one or more insights tags "Best <KPI name>" in case the corresponding configuration provides the best result so far for those KPIs. Notice that for experiments involving multiple trials, tags are only assigned after all their trials have finished.

Of course, after the very first experiment (i.e. the baseline) finishes, all tags are assigned to the corresponding configuration. This is displayed by the following figure for a study with two KPIs: one named CPU, with formula renaissance.cpu_used and direction minimize, and one named MEM, with formula renaissance.mem_used and direction minimize:

When the following experiments finish, tags are reevaluated with respect to the computed goal score and the achieved results for each single KPI. In this study, experiment #2 provided a better result for both the CPU KPI and the study goal, so it got both the tag Best CPU and the tag Best renaissance.response_time (which is defined as the goal of the study). Notice that a blue star is displayed by Akamas (except for the baseline) to highlight the fact that the tag was automatically generated by Akamas and not assigned by a user.

Afterward, experiment #3 got the goal tag as the best configuration, while experiment #4 got the tag Best CPU, as it improved on experiment #2. Therefore, two configurations displayed the blue star.

A number of experiments later, experiment #7 provided better memory usage than the baseline, so it got the tag Best MEM. At this point, three configurations have the blue star, making it evident that there are tradeoffs when trying to optimize with respect to the goal and the KPIs.

Guidelines for Kubernetes

When starting a new Kubernetes optimization, the following is a list of recommended parameters. After having selected the parameters, always make sure to add/review the corresponding parameter domains and constraints based on your environment. Please refer to the optimization pack reference page for more information.

Suggested Parameters for Kubernetes Containers

  • cpu_request

  • cpu_limit

  • memory_request

  • memory_limit
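
As a sketch, these parameters could be selected with illustrative domains and ordering constraints, as follows. The component name (app), the domains, and the field layout are examples only, to be adapted to your environment and checked against the Study template and optimization pack reference pages.

```yaml
# Illustrative parameter selection for a container component named "app"
parametersSelection:
  - name: app.cpu_request
    domain: [100, 2000]    # millicores
  - name: app.cpu_limit
    domain: [100, 2000]
  - name: app.memory_request
    domain: [128, 4096]    # MB
  - name: app.memory_limit
    domain: [128, 4096]
# Requests should never exceed the corresponding limits
parameterConstraints:
  - name: cpu_request_le_limit
    formula: app.cpu_request <= app.cpu_limit
  - name: memory_request_le_limit
    formula: app.memory_request <= app.memory_limit
```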

Before applying optimization results

The following best practices should be considered before applying a configuration identified by an offline optimization study from a test or pre-production environment to a production environment.

Most of these best practices are general and refer to any configuration change and application rollout, not only to Akamas-related scenarios.

Validating the study results

Any configuration identified by Akamas in a test or pre-production environment, by executing a number of experiments and trials in a limited timeframe, should be first validated before being promoted to production in its ability to consistently deliver the expected performance over time.

Running endurance tests

An endurance test typically lasts for several hours and can either mimic the specific load profile of production environments (e.g. morning peaks or low load phases during the night) or a simple constant high load (flat load). A specific Akamas study can be implemented for this purpose.

Applying results of optimization studies

When applying a new configuration to a production environment it is important to reduce the risk of severely impacting the supported services and allowing time to backtrack if required.

Adopt gradual rollouts

With a gradual rollout approach, a new configuration is applied to only a subset of the target system, so that the system can be observed for a period of time without impacting it entirely.

Several strategies are possible, including:

  • Canary deployment, where a small percentage of the traffic is served by the instance with the new configuration;

  • Shadow traffic, where traffic is mirrored and redirected to the instance with the new configuration, and responses are not impacting the user.

Assess the impact on the infrastructure and other applications

In the case of an application sharing entire layers or single components (e.g. microservices) with other applications, it is important to assess in advance the potential impact on other applications before applying a configuration identified by only considering SLOs related to a single application.

The following general considerations may help in assessing the impact on the infrastructure:

  • if the new configuration is more efficient (i.e. it is less demanding in terms of resources) or it does not require changes to resource requirements (e.g. does not change K8s requests and limits), then the configuration can be expected to be beneficial, as resources will be freed and become available for additional applications;

  • If the new configuration is less efficient (i.e. it requires more resources), then appropriate checks of whether the additional capacity is available in the infrastructure (e.g. in the K8s cluster or namespace) should be done, as when allocating new applications.

As far as the other applications are concerned:

  • Just reducing the operational cost of a service does not have any impact on other applications that are calling or using the service;

  • While tuning a service for performance may put the caller system under back-pressure fatigue, this is not the typical behavior of enterprise systems, where the most susceptible systems are on the backend side:

    • Tuning most external services will not increase the throughput much, which is typically business-driven, thus the risk to overwhelm the backends is low;

    • Tuning the backends allows the caller systems to handle faster connections, thus reducing the memory footprint and increasing the resilience of the entire system;

  • Especially in the case of highly distributed systems, such as microservices, the number of in-flight packages for a given period of time is something to be minimized;

  • A latency reduction for a microservice implies fewer in-flight packages throughout the system, leading to better performance, faster failures, and fewer pending transactions to be rolled back in case of incidents.

Guidelines for JVM layer (OpenJDK)

When starting a new JVM optimization, the following is a list of recommended parameters. After having selected the parameters, always make sure to add/review the corresponding parameter domains and constraints based on your environment. Please refer to the optimization pack reference page for more information.

Suggested Parameters for Open JDK 11

  • jvm_maxHeapSize

  • jvm_minHeapSize

  • jvm_gcType

  • jvm_newSize

  • jvm_survivorRatio

  • jvm_maxTenuringThreshold

  • jvm_parallelGCThreads

  • jvm_concurrentGCThreads

  • jvm_maxInlineSize

  • jvm_inlineSmallCode

  • jvm_useTransparentHugePages

  • jvm_alwaysPreTouch

Suggested Parameters for Open JDK 8

When starting a new JVM optimization, the following is a list of recommended parameters to include in your study:

  • jvm_gcType

  • jvm_maxHeapSize

  • jvm_newSize

  • jvm_survivorRatio

  • jvm_maxTenuringThreshold

  • jvm_parallelGCThreads

  • jvm_concurrentGCThreads

Guidelines for choosing optimization parameters

In this section, some guidelines on how to choose optimization parameters are provided for the following specific technologies:

These guidelines also provide an example of how to approach the selection of parameters (and how to define the associated domains and constraints) in an optimization study.

Analyzing results of live optimization studies

Even for live optimization studies, it is a good practice to analyze how the optimization is being executed with respect to the defined goal & constraints, and workloads.

This analysis may provide useful insights about the system being optimized (e.g. understanding of the system dynamics) and about the optimization study itself (e.g. how to adjust optimizer options or change constraints). Since this is more challenging for an environment that is being optimized live, a common practice is to adopt the recommendation mode before possibly switching to the fully autonomous mode.

The Akamas UI displays the results of a live optimization study in the following areas:

  • The Metrics section (see the following figures) displays the behavior of the metrics as configurations are recommended and applied (possibly after being reviewed and approved by users); this area supports the analysis of how the optimizer is driven by the configured safety and exploration factors.

  • The All Configurations section provides the list of all the recommended configurations, possibly as modified by the user, as well as the detail of each applied configuration (see the following figures).

  • The Pending configuration section (shown when operating in recommendation mode, see the following figure) shows the configuration that is being recommended, to allow users to review it (see the EDIT toggle) and approve it:

Guidelines for JVM (OpenJ9)

Suggested Parameters for OpenJ9

  • j9vm_minFreeHeap

  • j9vm_maxFreeHeap

  • j9vm_minHeapSize

  • j9vm_maxHeapSize

  • j9vm_gcCompact

  • j9vm_gcThreads

  • j9vm_gcPolicy

  • j9vm_codeCacheTotal

  • j9vm_compilationThreads

  • j9vm_aggressiveOpts

The following sections describe how to approach tuning the JVM in these areas: JVM heap, JVM garbage collection, JVM compilation, and JVM aggressive optimizations.

Tuning JVM Heap

The most relevant JVM parameters are the ones defining the boundaries of the allocated heap (j9vm_minHeapSize, j9vm_maxHeapSize). The upper bound to configure for this domain strongly depends on the memory (in megabytes) available on the host instance, or on how much of it we are willing to allocate, while the lower bound depends on the minimum requirements to run the application.

The free heap parameters (j9vm_minFreeHeap, j9vm_maxFreeHeap) define some boundaries for the free space target ratio, which impacts the trigger thresholds of the garbage collector. The suggested starting ranges go from 0.1 to 0.6 for the minimum free ratio, and from 0.3 to 0.9 for the maximum.

The following represents a sample snippet of the parametersSelection section in the study definition:

parametersSelection:
  - name: jvm.j9vm_maxHeapSize
    domain: [<LOWER_BOUND>, <UPPER_BOUND>]
  - name: jvm.j9vm_minHeapSize
    domain: [<LOWER_BOUND>, <UPPER_BOUND>]

  - name: jvm.j9vm_minFreeHeap
    domain: [0.1, 0.6]
  - name: jvm.j9vm_maxFreeHeap
    domain: [0.3, 0.9]

It is also recommended to define the following constraints:

  • min heap size lower than or equal to the max heap size:

jvm.j9vm_minHeapSize <= jvm.j9vm_maxHeapSize

  • max free heap ratio at least 5 percentage points higher than the min free heap ratio:

jvm.j9vm_minFreeHeap + 0.05 < jvm.j9vm_maxFreeHeap

Tuning JVM Garbage Collection

The following JVM parameters define the behavior of the garbage collector:

  • j9vm_gcPolicy

  • j9vm_gcThreads

  • j9vm_gcCompact

The garbage collection policy (j9vm_gcPolicy) defines the collection strategy used by the JVM. This parameter is key for the performance of the application: the default garbage collector (gencon) is the best solution for most scenarios, but some specific kinds of applications may benefit from one of the alternative options.

The number of GC threads (j9vm_gcThreads) defines the level of parallelism available to the collector. This value can range from 1 to the maximum number of CPUs that are available or that we are willing to allocate.

The GC compaction (j9vm_gcCompact) selects whether garbage collections perform full compactions always, never, or based on internal policies.

The following represents a sample snippet of the parametersSelection section in the study definition:

parametersSelection:
  - name: jvm.j9vm_gcPolicy
    categories: [balanced, gencon, metronome, optavgpause, optthruput]

  - name: jvm.j9vm_gcThreads
    domain: [1, <MAX_CPUS>]

  - name: jvm.j9vm_gcCompact

Tuning JVM compilation

The following JVM parameters define the behavior of the compilation:

  • j9vm_compilationThreads

  • j9vm_codeCacheTotal

The compilation threads parameter (j9vm_compilationThreads) defines the number of threads available for the JIT compiler. Its range depends on the available CPUs.

The code cache parameter (j9vm_codeCacheTotal) defines the maximum size limit for the JIT code cache. Higher values may benefit complex server-type applications at the expense of the memory footprint, which should be taken into account in the overall memory requirements.

The following represents a sample snippet of the parametersSelection section in the study definition:

parametersSelection:
  - name: jvm.j9vm_compilationThreads
    domain: [1, <MAX_CPUS>]

  - name: jvm.j9vm_codeCacheTotal
    domain: [2, <UPPER_BOUND>]

Tuning JVM aggressive optimizations

The following JVM parameter defines the behavior of aggressive optimization:

  • j9vm_aggressiveOpts

The aggressive optimizations parameter (j9vm_aggressiveOpts) enables some experimental features that usually lead to performance gains.

The following represents a sample snippet of the parametersSelection section in the study definition:

parametersSelection:
  - name: jvm.j9vm_aggressiveOpts

Guidelines for Oracle Database

This page provides a list of best practices when optimizing an Oracle database with Akamas.

Memory Allocation Sub-spaces

This section provides some guidelines on the most relevant memory-related parameters and how to configure them to perform a high-level optimization of a generic Oracle Database instance.

Oracle DBAs can choose, depending on their needs or expertise, the desired level of granularity when configuring the memory allocated to the database areas and components, and let the Oracle instance automatically manage the lower layers. In the same way, Akamas can tune a target instance with different levels of granularity.

In particular, we can configure an Akamas study so that it simply tunes the overall memory of the instance, letting Oracle automatically manage how to allocate it between the shared memory (SGA) and program memory (PGA); alternatively, we can tune the target values of both of these areas and let Oracle take care of their components, or go even deeper and take total control of the sizing of every single component.

Notice: running the queries in this guide requires a user with the ALTER SYSTEM, SELECT ON V_$PARAMETER, and SELECT ON V_$OSSTAT privileges.

Also notice that to define the domain of some of the parameters you need to know the physical memory of the instance. You can find the value in MiB by running the query select round(value/1024/1024)||'M' "physical_memory" from v$osstat where stat_name='PHYSICAL_MEMORY_BYTES'. Otherwise, if you have access to the underlying machine, you can run the bash command free -m.

Tuning the Total Memory

This is the simplest memory-optimization scenario, where the study configures only the overall memory available for the instance and lets Oracle’s Automatic Memory Management (AMM) dynamically assign the space to the SGA and PGA. This is useful for simple studies where you want to minimize the overall used memory, usually coupled with constraints to make sure the performance of the overall system remains within acceptable values.

  • memory_target: this parameter specifies the total memory used by the Oracle instance. When AMM is enabled, you can find the default value with the query select display_value "memory_target" from v$parameter where name='memory_target'. Otherwise, you can get an estimate by summing the configured SGA size, found by running select display_value "sga_target" from v$parameter where name LIKE 'sga_target', and the size of the PGA, found with select ceil(value/1024/1024)||'M' "max_pga" from v$pgastat where name='maximum PGA allocated'. The explored domain strongly depends on your application and hardware, but an acceptable range goes from 152M (the minimum configurable value) to the physical size of your instance. Over time, Akamas will automatically learn to avoid configurations without enough memory.

To configure Automatic Memory Management you also need to make sure that the parameters sga_target and pga_aggregate_limit are set to 0, either by configuring them among the default values of the study or by manually running the configuration queries.

The following snippet shows the parameter selection to tune the total memory of the instance. The domain is configured to go from the minimum value to the maximum physical memory (7609M in our example).

parametersSelection:
- name: ora.memory_target
  domain: [152, 7609]

Tuning the Shared and Program Memory Global Areas

With the following set of parameters, Akamas tunes the individual sizes of the SGA and PGA, letting Oracle’s Automatic Shared Memory Management (ASMM) dynamically size their underlying SGA components. You can leverage these parameters for studies where, like the previous scenario, you want to find the configuration with the lowest memory allocation that still performs within your SLOs. Another possible scenario is to find the balance in the allocation of the memory available that best fits your optimization goals.

  • sga_target: this parameter specifies the target SGA size. When ASMM is configured, you can find the default value with the query select display_value "sga_target" from v$parameter where name='sga_target'. The explored domain strongly depends on your application and hardware, but an acceptable range goes from 64M (the minimum configurable value) to the physical size of your instance minus a reasonable size for the PGA (usually up to 80% of physical memory).

  • pga_aggregate_target: this parameter specifies the target PGA size. You can find the default value with the query select display_value "pga_aggregate_target" from v$parameter where name='pga_aggregate_target'. The explored domain strongly depends on your application and hardware, but an acceptable range goes from 10M (the minimum configurable value) to the physical size of your instance minus a reasonable size for the SGA.

To tune the SGA and PGA, you also must set memory_target to 0 to disable AMM, either by configuring it among the default values of the study or by manually running the configuration queries. ASMM will dynamically tune all the SGA components whose size is not specified, so set all the component parameters (db_cache_size, log_buffer, java_pool_size, large_pool_size, shared_pool_size, and streams_pool_size) to 0 unless you have any specific requirements.

The following snippet shows the parameter selection to tune both SGA and PGA sizes. Each parameter is configured to go from the minimum value to 90% of the maximum physical memory (6848M in our example), allowing Akamas to explore all the possible ways to partition the space between the two areas and find the best configuration for our use case:

parametersSelection:
- name: ora.sga_target
  domain: [64, 6848]
- name: ora.pga_aggregate_target
  domain: [10, 6848]

The following code snippet forces Akamas to explore configuration spaces where the total memory, expressed in MiB, does not exceed the total memory available (7609M in our example). This speeds up the optimization by avoiding configurations that won’t work correctly.

parameterConstraints:
- name: Limit total memory
  formula: ora.sga_target + ora.pga_aggregate_target <= 7609

Tuning the Shared Memory

With the following set of parameters, Akamas tunes the space allocated to one or more of the components that make up the System Global Area, along with the size of the Program Global Area. This scenario is useful for studies where you want to find the memory distribution that best fits your optimization goals.

  • pga_aggregate_target: this parameter specifies the size of the PGA. You can find the default value with the query select display_value "pga_aggregate_target" from v$parameter where name='pga_aggregate_target'. The explored domain strongly depends on your application and hardware, but an acceptable range goes from 10M (the minimum configurable value) to the physical size of your instance.

  • db_cache_size: this parameter specifies the size of the default buffer pool. You can find the default value with the query select * from v$sgainfo where name='Buffer Cache Size'.

  • log_buffer: this parameter specifies the size of the log buffer. You can find the default value with the query select * from v$sgainfo where name='Redo Buffers'.

  • java_pool_size: this parameter specifies the size of the java pool. You can find the default value with the query select * from v$sgainfo where name='Java Pool Size'.

  • large_pool_size: this parameter specifies the size of the large pool. You can find the default value with the query select * from v$sgainfo where name='Large Pool Size'.

  • streams_pool_size: this parameter specifies the size of the streams pool. You can find the default value with the query select * from v$sgainfo where name='Streams Pool Size'.

  • shared_pool_size: this parameter specifies the size of the shared pool. You can find the default value with the query select * from v$sgainfo where name='Shared Pool Size'.

The explored domains of the SGA components strongly depend on your application and hardware; one approach is to scale the baseline value both up and down by a reasonable factor to define the domain boundaries (e.g. from 20% to 500% of the baseline).

To tune all the components set both the memory_target and sga_target parameters to 0 by configuring them among the default values of a study, or manually running the configuration queries.

Notice: if your system leverages non-standard block-size buffers you should consider tuning also the db_Nk_cache_size parameters.

The following snippet shows the parameter selection to tune the size of the PGA and the SGA components. The PGA parameter is configured to go from the minimum value to 90% of the maximum physical memory (6848M in our example), while the domains for the SGA components are configured by scaling their default value by approximately a factor of 10. Along with the constraint defined below, these domains give Akamas great flexibility while exploring how to distribute the available memory space:

parametersSelection:
- name: ora.pga_aggregate_target
  domain: [10, 6848]
- name: ora.db_cache_size
  domain: [128, 6848]
- name: ora.log_buffer
  domain: [1, 128]
- name: ora.java_pool_size
  domain: [4, 240]
- name: ora.large_pool_size
  domain: [12, 1024]
- name: ora.shared_pool_size
  domain: [12, 1024]

The following code snippet forces Akamas to explore configuration spaces where the total memory, expressed in MiB, does not exceed the total memory available (7609M in our example).

parameterConstraints:
- name: Limit total memory
  formula: ora.db_cache_size + ora.log_buffer + ora.java_pool_size + ora.large_pool_size + ora.shared_pool_size + ora.pga_aggregate_target <= 7609

You should also add to the equation any db_Nk_cache_size parameter tuned in the study.
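For instance, assuming the study also tunes a hypothetical ora.db_16k_cache_size parameter for a 16k non-standard block size, the total-memory constraint could be extended as follows, still using the 7609M total of the examples above:

```yaml
parameterConstraints:
- name: Limit total memory
  # db_16k_cache_size is a hypothetical extra parameter, shown for illustration
  formula: ora.db_cache_size + ora.db_16k_cache_size + ora.log_buffer + ora.java_pool_size + ora.large_pool_size + ora.shared_pool_size + ora.pga_aggregate_target <= 7609
```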

Guidelines for PostgreSQL

Suggested Parameters

When running a PostgreSQL optimization, consider starting from these recommendations:

  • pg_max_connections: keep its value under 1000 connections.

  • pg_effective_cache_size: 75% of the physical memory available to PostgreSQL.

  • pg_maintenance_work_mem: 12% of the physical memory available to PostgreSQL.

  • pg_max_wal_senders: the maximum number of replicas you expect to have, doubled.

  • pg_max_parallel_workers: the number of cores divided by 2.

  • pg_shared_buffers: 25% of the physical memory available to PostgreSQL.
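As a worked example, these recommendations could translate into the following parametersSelection sketch for a host with 16 GiB of memory (16384 MB) and 8 cores dedicated to PostgreSQL. The component name postgres, the MB units, and all bounds are assumptions to adapt to your setup; pg_max_wal_senders is omitted since it depends on the number of replicas:

```yaml
# Sketch only: the component name "postgres" and all values are assumptions
# for a host with 16 GiB of memory (16384 MB) and 8 cores.
parametersSelection:
  - name: postgres.pg_max_connections
    domain: [100, 1000]          # keep under 1000 connections
  - name: postgres.pg_shared_buffers
    domain: [1024, 4096]         # up to 25% of 16384 MB
  - name: postgres.pg_effective_cache_size
    domain: [4096, 12288]        # up to 75% of 16384 MB
  - name: postgres.pg_maintenance_work_mem
    domain: [64, 1966]           # up to ~12% of 16384 MB
  - name: postgres.pg_max_parallel_workers
    domain: [1, 4]               # cores divided by 2
```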

Guidelines for defining optimization studies

This section provides some guidelines on how to define optimization studies by means of a few examples related to single-technology/layer systems, in particular on how to define workflows and telemetry providers.

More complex real-world examples are provided by the Knowledge Base guide.


Optimizing Linux

When optimizing Linux systems, typically the goal is to allow cost savings or improve performance and the quality of service, such as sustaining higher levels of traffic or enabling transactions with lower latency.

Workflows

Applying parameters

A typical workflow

You can organize a typical workflow to optimize Linux in three parts:

  1. Configure Linux

  2. Test the performance of the system

  3. Perform some cleanup

Here’s an example of a typical workflow for a Linux system:

name: "linux workflow"
tasks:
- name: "set linux parameters"
  operator: "LinuxConfigurator"
  arguments:
    component: "mylinuxcomponent"

- name: "execute performance test"
  operator: "Executor"
  arguments:
    host:
      hostname: "perf.mycompany.com"
      key: "..."
      username: "perf"
    command: "/home/perf/start.sh"

Please refer to the Linux optimization pack reference page for the list of component types, parameters, metrics, and constraints.

Akamas provides the LinuxConfigurator operator as the preferred way to apply Linux parameters to a system to be optimized. The operator connects via SSH to your Linux components and employs different strategies to apply Linux parameters. Notice that this operator allows you to exclude some block/network devices from being configured.

  • Use the LinuxConfigurator operator to apply configuration parameters to the operating system; no restart is required.

  • Use the available workflow operators to execute a performance test against the system.

  • Use the available workflow operators to perform any clean-up needed to guarantee that any subsequent execution of the workflow will run without problems.

Telemetry Providers

Akamas does not provide any specialized telemetry solution to gather Linux metrics, as these metrics can be collected in a variety of ways, leveraging a plethora of existing solutions. For example, the Prometheus provider supports Linux system metrics.

Optimizing Java OpenJDK

When optimizing Java applications based on OpenJDK, typically the goal is to tune the JVM from both the point of view of cost savings and quality of service.

Workflows

Applying parameters

The following is an example of a templatized execution string:

#!/bin/bash
cd "$(dirname "$0")"
java ${jvm.jvm_gcType} ${jvm.jvm_minHeapSize} ${jvm.jvm_maxHeapSize} ${jvm.jvm_newSize} ${jvm.jvm_survivorRatio} -jar renaissance.jar -r 20 --csv renaissance.csv page-rank

A typical workflow

A typical workflow to optimize a Java application can be structured in two parts:

  1. Configure the Java arguments

  2. Run the Java application

Here’s an example of a typical workflow where Akamas executes the script containing the command string generated by the file configurator:

name: optimize-java-app
tasks:
  - name: Configure Parameters
    operator: FileConfigurator
    arguments:
        source:
            hostname: app.akamas.io
            username: akamas
            path: /home/akamas/app/run.sh.templ
            key: rsa-key
        target:
          hostname: app.akamas.io
          username: akamas
          path: /home/akamas/app/run.sh
          key: rsa-key

  - name: Launch Test
    operator: Executor
    arguments:
      command: bash /home/akamas/app/run.sh
      host:
        hostname: app.akamas.io
        username: akamas
        key: rsa-key

Telemetry Providers

Here’s a configuration example for a telemetry provider instance that uses Prometheus to extract all the JMX metrics defined in this optimization pack:

provider: Prometheus
config:
  address: monitoring.akamas.io
  port: 9090

where the configuration of the monitored component provides the additional references as in the following snippet:

name: jvm
description: target JVM
componentType: openjdk-11
properties:
    prometheus:
        instance: jvm
        job: jmx-exporter

Please refer to the Java OpenJDK optimization pack reference page for the list of component types, parameters, metrics, and constraints.

Akamas offers many operators that you can use to apply the parameters for the tuned JVM. In particular, it is suggested to use the FileConfigurator operator to create a configuration file or inject the arguments directly in the command string using a template:

  • Generate a configuration file or a command string containing the selected JVM parameters using a FileConfigurator operator.

  • Use the available operators to execute a performance test against the application.

Akamas can access JMX metrics using the Prometheus provider. This provider comes out of the box with a set of default queries to interrogate a Prometheus instance configured to fetch data from a JMX Exporter.

Examples

See this page for an example of a study leveraging the Java OpenJDK pack.

Optimizing OpenJ9

When optimizing Java applications based on OpenJ9, typically the goal is to tune the JVM from both the point of view of cost savings and quality of service.

Workflows

Applying parameters

The following is an example of a templatized execution string:

#!/bin/bash
cd "$(dirname "$0")"
java ${jvm.*} -jar myApp.jar

A typical workflow

A typical workflow to optimize a Java application can be structured in two parts:

  1. Configure the Java arguments

  2. Run the Java application

Here’s an example of a typical workflow where Akamas executes the script containing the command string generated by the file configurator:

name: optimize-java-app
tasks:
  - name: Configure Parameters
    operator: FileConfigurator
    arguments:
      source:
        hostname: app.akamas.io
        username: akamas
        path: /home/akamas/app/run.sh.templ
        key: rsa-key
      target:
        hostname: app.akamas.io
        username: akamas
        path: /home/akamas/app/run.sh
        key: rsa-key

  - name: Launch Test
    operator: Executor
    arguments:
      command: bash /home/akamas/app/run.sh
      host:
        hostname: app.akamas.io
        username: akamas
        key: rsa-key

Telemetry Providers

Here’s a configuration example for a telemetry provider instance that uses Prometheus to extract all the JMX metrics defined in this optimization pack:

provider: Prometheus
config:
  address: monitoring.akamas.io
  port: 9090

where the configuration of the monitored component provides the additional references as in the following snippet:

name: jvm
description: target JVM
componentType: java-ibm-j9vm-8
properties:
  prometheus:
    instance: jvm
    job: jmx-exporter

Please refer to the OpenJ9 optimization pack reference page for the list of component types, parameters, metrics, and constraints.

Akamas offers many operators that you can use to apply the parameters for the tuned JVM. In particular, it is suggested to leverage the FileConfigurator operator to create a configuration file or inject the arguments directly in the command string using a template:

  • Generate a configuration file or a command string containing the selected JVM parameters using a FileConfigurator operator.

  • Use the available operators to execute a performance test against the application.

Akamas can access JMX metrics using the Prometheus provider. This provider comes out of the box with a set of default queries to interrogate a Prometheus instance configured to fetch data from a JMX Exporter.

Examples

See this page for an example of a study leveraging the Eclipse OpenJ9 pack.

Optimizing Spark

When optimizing applications running on the Apache Spark framework, the goal is to find the configurations that best optimize the allocated resources or the execution time.

Workflows

Applying parameters


A typical workflow

You can organize a typical workflow to optimize a Spark application in three parts:

  1. Setup the test environment

    1. prepare any required input data

    2. apply the Spark configuration parameters, if you are going for a file-based solution

  2. Execute the Spark application

  3. Perform cleanup

name: Spark workflow
tasks:
   - name: cwspark
     arguments:
        master: yarn
        deployMode: cluster
        file: /home/hadoop/scripts/pi.py
        args: [ 100 ]

Telemetry Providers

Here’s a configuration example for a telemetry provider instance:

provider: SparkHistoryServer
config:
  address: sparkmaster.akamas.io
  port: 18080

Please refer to the Spark optimization pack reference page for the list of component types, parameters, metrics, and constraints.

Akamas offers several operators that you can use to apply the parameters for the tuned Spark application. In particular, we suggest using the Spark SSH Submit operator, which connects to a target instance to submit the application using the configuration parameters to test; the typical workflow shown above executes the Spark application this way. Other solutions include:

  • the Spark Livy operator, which allows submitting the application along with the configuration parameters using the Livy REST interface;

  • the standard Executor operator, which allows running a custom command or script once the FileConfigurator operator has updated the default Spark configuration file, or a custom one, using a template.

Akamas can access statistics using the Spark History Server provider. This provider maps the metrics in this optimization pack to the statistics provided by the Spark History Server endpoint.

Examples

See this page for an example of a study leveraging the Spark pack.

Optimizing Kubernetes

When optimizing Kubernetes applications, typically the goal is to find the configuration that assigns resources to containerized applications so as to minimize waste and ensure the quality of service.

Workflows

Applying parameters

The following example is the definition of a deployment, where the replicas and resources are templatized in order to work with the FileConfigurator:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: ${deployment.k8s_workload_replicas}
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: ${container.cpu_request}
              memory: ${container.memory_request}
            limits:
              cpu: ${container.cpu_limit}
              memory: ${container.memory_limit}

A typical workflow

A typical workflow to optimize a Kubernetes application is usually structured as follows:

  1. Configure the deployment parameters: generate the updated definition files starting from a template.

  2. Apply the parameters: apply the updated definitions to the cluster.

  3. Wait for the application to be ready: run a custom script to wait until the rollout is complete.

  4. Run the test: execute the benchmark.

Here’s an example of a typical workflow for a Kubernetes system:

name: Kubernetes workflow
tasks:
  - name: Configure deployment parameters
    operator: FileConfigurator
    arguments:
      source:
        path: nginx-deployment.yaml.templ
        hostname: app.akamas.io
        username: akamas
        key: rsa-key
      target:
        path: nginx-deployment.yaml
        hostname: app.akamas.io
        username: akamas
        password: akamas

  - name: Apply parameters
    operator: Executor
    arguments:
      command: kubectl apply -f nginx-deployment.yaml
      host:
        hostname: app.akamas.io
        username: akamas
        password: akamas

  - name: Wait application ready
    operator: Executor
    arguments:
      command: bash /home/akamas/app/check-status.sh
      host:
        hostname: app.akamas.io
        username: akamas
        password: akamas

  - name: Run test
    operator: Executor
    arguments:
      command: bash /home/akamas/app/run-test.sh
      host:
        hostname: app.akamas.io
        username: akamas
        password: akamas

Telemetry Providers

Here’s a configuration example for a telemetry provider instance that uses Prometheus to extract all the Kubernetes metrics defined in this optimization pack:

provider: Prometheus
config:
  address: monitoring.akamas.io
  port: 9090

where the configuration of the monitored component provides the additional filters as in the following snippet:

name: nginx_pod
description: Pod running Nginx
componentType: Kubernetes Pod

properties:
  prometheus:
    job: 'kubernetes-cadvisor|kube-state-metrics'
    namespace: akamas
    pod: nginx-*
name: cluster
description: Cluster
componentType: Kubernetes Cluster

properties:
  prometheus:
    job: 'kubernetes-cadvisor|kube-state-metrics'

Please keep in mind that some resources, such as pods belonging to deployments, require wildcards in order to match the auto-generated names.

Examples

Optimizing Oracle Database

When optimizing a MongoDB instance, typically the goal is to maximize the throughput of an Oracle-backed application or to minimize its resource consumption, thus reducing costs.

Workflows

Applying parameters

Oracle Configurator

File Configurator and Executor

A typical workflow

The optimization of an Oracle database usually includes the following tasks in the workflow, as implemented in the example below:

  1. Apply the Oracle configuration suggested by Akamas and restart the instance if needed (Update parameters task).

  2. Perform any additional warm-up task that may be required to bring the database up at the operating regime (Execute warmup task).

  3. Execute the workload targeting the database or the front-end in front of it (Execute performance test task).

  4. Restore the original state of the database in order to guarantee the consistency of further tests, removing any dirty data added by the workload and possibly flushing the database caches (Cleanup task).

The following is the complete YAML configuration file of the workflow described above:

Telemetry Providers

The following example shows how to configure a telemetry instance for a Prometheus provider in order to query the data points extracted from the exporter described above:

Examples

Please refer to the for the list of component types, parameters, metrics, and constraints.

Akamas offers different operators to configure Kubernetes entities. In particular, you can use the to update the definition file of a resource and apply it with the .

Configure the Kubernetes artifacts: use the to create the definition files starting from a template.

Apply the new parameters: apply the updated definitions using the .

Akamas can access Kubernetes metrics using the This provider comes out of the box with a set of default queries to interrogate a Prometheus instance configured to fetch data from and .

See this for an example of a study leveraging the Kubernetes pack.

Please refer to the for the list of component types, parameters, metrics, and constraints.

One common way to configure Oracle parameters is through the execution ALTER SYSTEM statements on the database instance: to automate the execution of this task Akamas provides the . For finer control, Akamas provides the , which allows building custom statements in a script file that can be executed by the .

The allows the workflow to configure an on-premise instance with minimal configuration. The following snippet is an example of a configuration task, where all the connection arguments are already defined in the referenced component:

Most cloud providers offer web APIs as the only way to configure database services. In this case, the can submit an API request through a custom executable using a configuration file generated by a . The following is an example workflow where a FileConfigurator task generates a configuration file (oraconf), followed by an Executor task that parses and submits the configuration to the API endpoint through a custom script (api_update_db_conf.sh):

Akamas offers many telemetry providers to extract Oracle Database metrics; one of them is the , which we can use to query Oracle Database metrics collected by a Prometheus instance via the .

The snippet below shows a configuration example for the Oracle Exporter extracting metrics regarding the Oracle sessions:

See and for examples of studies leveraging the Oracle Database pack.

name: Update Oracle parameters
operator: OracleConfigurator
arguments:
  component: oracledb
tasks:
  - name: Generate Oracle configuration
    operator: FileConfigurator
    arguments:
      sourcePath: /home/akamas/oraconf.template
      targetPath: /home/akamas/oraconf
      component: oracledb

  - name: Update conf
    operator: Executor
    arguments:
      command: bash /home/akamas/oraconf/api_update_db_conf.sh /home/akamas/oraconf
      component: oracleML
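The custom script invoked in the last task is site-specific; its typical job is to read the generated oraconf file and turn it into an API request. The sketch below only illustrates the idea: the key=value file format, the JSON payload shape, and the endpoint are assumptions, not part of the Akamas product.

```shell
# Sketch of api_update_db_conf.sh: read "key=value" lines from the
# configuration file generated by the FileConfigurator task and build
# a JSON payload for a hypothetical cloud database API.

CONF=${1:-oraconf}
# demo input standing in for the FileConfigurator output
printf 'db_block_size=8192\nsga_target=4G\n' > "$CONF"

# build {"key":"value",...} from the key=value pairs
PAYLOAD=$(awk -F= '{printf "%s\"%s\":\"%s\"", sep, $1, $2; sep=","}' "$CONF")
PAYLOAD="{$PAYLOAD}"
printf '%s' "$PAYLOAD" > payload.json
cat payload.json

# submission step (endpoint is an assumption, not executed here):
# curl -X PATCH "https://api.example.com/v1/databases/mydb/parameters" \
#      -H 'Content-Type: application/json' -d @payload.json
```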
name: workflow
description: Test Oracle instance configuration.
tasks:

  - name: Update parameters
    operator: OracleConfigurator
    arguments:
      component: oracledb

  - name: Execute warmup
    operator: Executor
    arguments:
      host:
        hostname: perf.mycompany.com
        key: ...
        username: perf
      command: /home/perf/warmup.sh

  - name: Execute performance test
    operator: Executor
    arguments:
      host:
        hostname: perf.mycompany.com
        key: ...
        username: perf
      command: /home/perf/start.sh

  - name: Cleanup
    operator: OracleExecutor
    arguments:
      sql:
        - TRUNCATE TABLE user_actions
      component: oracledb
[[metric]]
context = "sessions"
labels = [ "status", "type" ]
metricsdesc = { value= "Gauge metric with count of sessions by status and type." }
request = "SELECT status, type, COUNT(*) as value FROM v$session GROUP BY status, type"
provider: Prometheus
config:
  address: akamas.mycompany.com
  port: 9090

metrics:
  - metric: sessions_active_user
    datasourceMetric: oracledb_sessions_value{instance='$INSTANCE$', type='USER', status='ACTIVE', %FILTERS%}

  - metric: sessions_inactive_user
    datasourceMetric: oracledb_sessions_value{instance='$INSTANCE$', type='USER', status='INACTIVE', %FILTERS%}
Oracle Database optimization pack
OracleConfigurator operator
FileConfigurator operator
Executor operator
OracleConfigurator operator
Executor operator
FileConfigurator operator
Prometheus provider
Prometheus Oracle Exporter
toml
Optimizing an Oracle Database server instance
Optimizing an Oracle Database for an e-commerce service

Optimizing MySQL Database

When optimizing a MySQL instance, typically the goal is one of the following:

  • Throughput optimization: increasing the capacity of a MySQL deployment to serve clients

  • Cost optimization: decreasing the size of a MySQL deployment while guaranteeing the same service level

Workflows

Applying parameters

Usually, MySQL parameters are configured by writing them in the MySQL configuration file, typically called my.cnf, and located under /etc/mysql/ on most Linux systems.

In order to preserve the original config file intact, it is best practice to use additional configuration files, located in /etc/mysql/conf.d to override the default parameters. These files are automatically read by MySQL.
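For instance, a FileConfigurator template dropped into /etc/mysql/conf.d might look like the following sketch; the parameter names after the mysql. component prefix are assumptions about how the component is defined:

```ini
# /etc/mysql/conf.d/akamas.cnf.template -- hypothetical override file;
# each ${...} placeholder is replaced by Akamas with the tuned value
[mysqld]
innodb_buffer_pool_size = ${mysql.innodb_buffer_pool_size}
innodb_log_file_size    = ${mysql.innodb_log_file_size}
max_connections         = ${mysql.max_connections}
```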

FileConfigurator and Executor operator

A typical workflow

A typical workflow to optimize a MySQL deployment can be structured in four parts:

  1. Configure MySQL

  2. Restart MySQL

  3. Test the performance of the application

  4. Prepare test results

Finally, when running performance experiments on databases, it is common practice to perform some cleanup tasks at the end of the test to restore the database's initial condition and avoid impacting subsequent tests.

Here’s an example of a typical workflow for MySQL, which uses the OLTP Resourcestresser benchmark to run performance tests:

name: OptimizeMySQL
tasks:

  - name: Configure MySQL
    operator: FileConfigurator
    arguments:
      component: mysql

  - name: Restart MySQL
    operator: Executor
    arguments:
      command: "/mysql/restart-mysql-container.sh"
      component: mysql

  - name: test
    operator: Executor
    arguments:
      command: "cd /home/ubuntu/oltp/oltpbench && ./oltpbenchmark --bench resourcestresser --config /home/ubuntu/oltp/resourcestresser.xml --execute=true -s 5 --output out"
      component: OLTP

  - name: Parse csv results
    operator: Executor
    arguments:
      command: "bash /home/ubuntu/oltp/scripts/parse_csv.sh"
      component: OLTP
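The final "Parse csv results" task above is site-specific; its job is typically to reshape the benchmark output into a CSV that the CSV provider can import. A minimal sketch of such a script follows; the input format (epoch_seconds,throughput lines), file names, and the OLTP component name are assumptions:

```shell
# Sketch of a "prepare test results" step: reshape benchmark samples
# into the TS/COMPONENT CSV layout that Akamas can import.

# demo input standing in for the benchmark output
printf '1609459200,100\n1609459205,104\n' > out.res

echo 'TS,COMPONENT,throughput' > result.csv
while IFS=, read -r ts thr; do
  # convert epoch seconds to the timestamp format configured in the provider
  printf '%s,OLTP,%s\n' "$(date -u -d "@$ts" +%Y-%m-%dT%H:%M:%S)" "$thr" >> result.csv
done < out.res

cat result.csv
```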

Telemetry providers

Here’s an example of a telemetry provider instance that uses Prometheus to extract all the MySQL metrics defined in this optimization pack:

provider: prometheus
config:
  address: mysql.mydomain.com
  port: 9090
  job: mysql_exporter

Examples

Optimizing MongoDB

When optimizing a MongoDB instance, typically the goal is one of the following:

  • Throughput optimization - increasing the capacity of a MongoDB deployment to serve clients

  • Cost optimization - decreasing the size of a MongoDB deployment while guaranteeing the same service level

To reach such goals, it is recommended to tune the parameters that manage the cache, which is one of the elements that impact performance the most, in particular those parameters that control the lifecycle and the size of MongoDB's cache.

Useful throughput indicators include, for example:

  • The number of documents inserted in the database per second

  • The number of active connections

Workflows

Applying parameters

FileConfigurator and Executor operator

You can leverage the FileConfigurator by creating a template file on a remote host that contains some scripts to configure MongoDB with placeholders that will be replaced with the values of parameters tuned by Akamas.

Here’s an example of the aforementioned template file:

Once the FileConfigurator has replaced all the tokens, you can use the Executor operator to actually execute the script to configure MongoDB.

A typical workflow

A typical workflow to optimize a MongoDB deployment can be structured in four parts:

  1. Configure MongoDB

  2. Test the performance of the application

  3. Prepare test results (optional)

  4. Cleanup

Finally, when running performance experiments on a database, it is common practice to execute some cleanup tasks at the end of the test to restore the database's initial condition and avoid impacting subsequent tests.

Here’s an example of a typical workflow for a MongoDB deployment, which uses the YCSB benchmark to run performance tests:

Telemetry providers

Here’s an example of a telemetry provider instance that uses Prometheus to extract all the MongoDB metrics defined in this optimization pack:

Examples

Please refer to the MySQL optimization pack page for the list of component types, parameters, metrics, and constraints.

You can leverage the FileConfigurator operator by creating a template file on a remote host that contains some scripts to configure MySQL, with placeholders that will be replaced with the values of parameters tuned by Akamas. When all the placeholders get replaced, the Executor operator can be used to actually execute the script to configure and restart the database.

Use the FileConfigurator operator to specify an input and an output template file. The input template file is used to specify how to interpolate MySQL parameters into a configuration file, and the output file is used to contain the result of the interpolation.

Use the Executor operator to restart MySQL, allowing it to load the new configuration file produced in the previous step.

Optionally, use the Executor operator to verify that the application is up and running and has finished any initialization logic.

Use any of the workflow operators to perform a performance test against the application.

Use any of the workflow operators to organize test results so that they can be imported into Akamas using the supported telemetry providers (see also the section here below).

Akamas can access MySQL metrics using the Prometheus provider. This provider can be leveraged to query MySQL metrics collected by a Prometheus instance via the MySQL Prometheus exporter.

This page and this page describe examples of how to leverage the MySQL optimization pack.

Even though it is possible to evaluate performance improvements of MongoDB by looking at the business application that uses it as its database, at the end-to-end throughput or response time, or by using a performance test like YCSB, the optimization pack provides internal MongoDB metrics that can also shed light on how MongoDB is performing, in particular in terms of throughput.

Please refer to the MongoDB optimization pack page for the list of component types, parameters, metrics, and constraints.

Akamas offers many operators that you can use to apply freshly tuned configuration parameters to your MongoDB deployment. In particular, we suggest using the FileConfigurator operator to create a configuration script file and the Executor operator to execute it and thus apply the parameters.

Use the FileConfigurator operator to specify an input and an output template file. The input template file is used to specify how to interpolate MongoDB parameters into a script, and the output file contains the actual configuration.

Use the Executor operator to reconfigure MongoDB exploiting the output file produced in the previous step. You may need to restart MongoDB depending on the configuration parameters you want to optimize.

Either use the Sleep operator or the Executor operator to verify that the application is up and running and has finished any initialization logic (this step may not be necessary).

Use any of the available operators to execute a performance test against the application.

If Akamas does not already automatically import performance test metrics, you can use the available operators to extract test results and make them available to Akamas (for example, you can use an Executor operator to launch a script that produces a CSV of the test results that Akamas can consume using the CSV provider).

Use any of the available operators to bring MongoDB back into a clean state to avoid impacting subsequent tests.

Akamas offers many telemetry providers to extract MongoDB metrics; one of them is the Prometheus provider, which we can use to query MongoDB metrics collected by a Prometheus instance via the MongoDB Prometheus exporter.

See the Optimizing a MongoDB server instance page for an example of a study leveraging the MongoDB pack.

#!/bin/sh

cd "$(dirname "$0")" || exit

CACHESIZE=${mongo.mongodb_cache_size}
SYNCDELAY=${mongo.mongodb_syncdelay}
EVICTION_DIRTY_TRIGGER=${mongo.mongodb_eviction_dirty_trigger}
EVICTION_DIRTY_TARGET=${mongo.mongodb_eviction_dirty_target}
EVICTION_THREADS_MIN=${mongo.mongodb_eviction_threads_min}
EVICTION_THREADS_MAX=${mongo.mongodb_eviction_threads_max}
EVICTION_TRIGGER=${mongo.mongodb_eviction_trigger}
EVICTION_TARGET=${mongo.mongodb_eviction_target}
USE_NOATIME=${mongo.mongodb_datafs_use_noatime}

# Here we have to remount the disk mongodb uses for data, to take advantage of the USE_NOATIME parameter

sudo service mongod stop
sudo umount /mnt/mongodb
if [ "$USE_NOATIME" = true ]; then
        sudo mount /dev/nvme0n1 /mnt/mongodb -o noatime
else
        sudo mount /dev/nvme0n1 /mnt/mongodb
fi
sudo service mongod start

# flush logs
echo -n | sudo tee /mnt/mongodb/log/mongod.log
sudo service mongod restart

until grep -q "waiting for connections on port 27017" /mnt/mongodb/log/mongod.log
do
        echo "waiting MongoDB..."
        sleep 60
done

sleep 5
sudo service prometheus-mongodb-exporter restart
# set knobs
mongo --quiet --eval "db.adminCommand({setParameter:1, 'wiredTigerEngineRuntimeConfig': 'cache_size=${CACHESIZE}m, eviction=(threads_min=$EVICTION_THREADS_MIN,threads_max=$EVICTION_THREADS_MAX), eviction_dirty_trigger=$EVICTION_DIRTY_TRIGGER, eviction_dirty_target=$EVICTION_DIRTY_TARGET', eviction_trigger=$EVICTION_TRIGGER, eviction_target=$EVICTION_TARGET})"
mongo --quiet --eval "db = db.getSiblingDB('admin'); db.runCommand({ setParameter : 1, syncdelay: $SYNCDELAY})"

sleep 3
name: "ycsb_mongo_workflow"
tasks:
  - name: "configure mongo"
    operator: "FileConfigurator"
    arguments:
      sourcePath: "/home/ubuntu/mongo/templates/mongo_launcher.sh.templ"
      targetPath: "/home/ubuntu/mongo/launcher.sh"
      component: "mongo"

  - name: "launch mongo"
    operator: "Executor"
    arguments:
      command: "bash /home/ubuntu/mongo/launcher.sh 2>&1 | tee -a /tmp/log"
      component: "mongo"

  - name: "launch ycsb"
    operator: "Executor"
    arguments:
      command: "bash /home/ubuntu/ycsb/launch_load.sh 2>&1 | tee -a /tmp/log"
      component: "mongo_ycsb"

  - name: "parse ycsb"
    operator: "Executor"
    arguments:
      command: "python /home/ubuntu/ycsb/parser.py"
      component: "mongo_ycsb"
  - name: "clean mongo"
    operator: "Executor"
    arguments:
      command: "bash /home/ubuntu/clean_mongodb.sh"
      component: "mongo"
provider: "Prometheus"
config:
  address: "prometheus.mycompany.com"
  port: 9090

metrics:
  - metric: "mongodb_connections_current"
    datasourceMetric: "mongodb_connections{instance='$INSTANCE$'}"
    labels: ["state"]
  - metric: "mongodb_heap_used"
    datasourceMetric: "mongodb_extra_info_heap_usage_bytes{instance='$INSTANCE$'}"
  - metric: "mongodb_page_faults_total"
    datasourceMetric: "rate(mongodb_extra_info_page_faults_total{instance='$INSTANCE$'}[$DURATION$])"
  - metric: "mongodb_global_lock_current_queue"
    datasourceMetric: "mongodb_global_lock_current_queue{instance='$INSTANCE$'}"
    labels: ["type"]
  - metric: "mongodb_mem_used"
    datasourceMetric: "mongodb_memory{instance='$INSTANCE$'}"
    labels: ["type"]
  - metric: "mongodb_documents_inserted"
    datasourceMetric: "rate(mongodb_metrics_document_total{instance='$INSTANCE$', state='inserted'}[$DURATION$])"
  - metric: "mongodb_documents_updated"
    datasourceMetric: "rate(mongodb_metrics_document_total{instance='$INSTANCE$', state='updated'}[$DURATION$])"
  - metric: "mongodb_documents_deleted"
    datasourceMetric: "rate(mongodb_metrics_document_total{instance='$INSTANCE$', state='deleted'}[$DURATION$])"
  - metric: "mongodb_documents_returned"
    datasourceMetric: "rate(mongodb_metrics_document_total{instance='$INSTANCE$', state='returned'}[$DURATION$])"

Integrating Akamas

Akamas provides the following areas of integration with your ecosystem, which may apply or not depending on whether you are running live optimization studies or offline optimization studies:

  • Telemetry Providers tools providing time series for metrics of interest for the system to be optimized (see also Telemetry Providers) - this integration applies to both offline and live optimization studies;

  • Configuration Management tools providing the ability to set tunable parameters for the system to be optimized - this integration applies to both offline and live optimization studies;

  • Value Stream Delivery tools to implement a continuous optimization process as part of a CI/CD pipeline - this integration applies to both offline and live optimization studies;

  • Load Testing tools used to reproduce a synthetic workload on the system to be optimized; notice that these tools may also act as Telemetry Providers (e.g. for end-user metrics) - this integration only applies to offline optimization studies.

These integrations may require some setup on both the tool and the Akamas side and may also involve defining workflows and making use of workflow operators.

Install CSV provider

To install the CSV File provider, create a YAML file (called provider.yml in this example) with the specification of the provider:

# CSV File Telemetry Provider
name: CSV File
description: Telemetry Provider that enables the import of metrics from a remote CSV file
dockerImage: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/telemetry-providers/csv-file-provider:3.1.2

Then, you can install the provider with the Akamas CLI:

akamas install telemetry-provider provider.yml

Optimizing PostgreSQL

When optimizing a PostgreSQL instance, typically the goal is one of the following:

  • Throughput optimization: increasing the number of transactions

  • Cost optimization: minimize resource consumption according to a typical workload, thus cutting costs

Workflow

Applying parameters

A typical optimization process involves the following steps:

  1. Configure PostgreSQL parameters

  2. Restore DB data

  3. Restart PostgreSQL and wait for the initialization

  4. Run benchmark

  5. Parse results

Please note that most PostgreSQL parameters do not need an application restart.
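The steps above can be sketched as a workflow; the component names, paths, and scripts below are assumptions used only to illustrate the structure:

```yaml
name: optimize-postgresql
tasks:
  - name: Configure PostgreSQL            # step 1: interpolate tuned parameters
    operator: FileConfigurator
    arguments:
      sourcePath: /home/ubuntu/postgresql.conf.template
      targetPath: /home/ubuntu/postgresql.conf
      component: postgres

  - name: Restore DB data                 # step 2: reload a known dataset
    operator: Executor
    arguments:
      command: bash /home/ubuntu/scripts/restore_db.sh
      component: postgres

  - name: Restart PostgreSQL              # step 3: apply the new configuration
    operator: Executor
    arguments:
      command: bash /home/ubuntu/scripts/restart_postgres.sh
      component: postgres

  - name: Run benchmark                   # step 4
    operator: Executor
    arguments:
      command: bash /home/ubuntu/scripts/run_benchmark.sh
      component: benchmark

  - name: Parse results                   # step 5
    operator: Executor
    arguments:
      command: bash /home/ubuntu/scripts/parse_results.sh
      component: benchmark
```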

CSV provider

The CSV provider collects metrics from CSV files and makes them available to Akamas. It offers a very versatile way to integrate custom data sources.

Prerequisites

This section provides the minimum requirements that you should match before using the CSV File telemetry provider.

Network requirements

The following requirements should be met to enable the provider to gather CSV files from remote hosts:

  • Port 22 (or a custom one) should be open from Akamas installation to the host where the files reside.

  • The host where the files reside should support SCP or SFTP protocols.

Permissions

  • Read access to the CSV files target of the integration

Akamas supported version

  • Versions < 2.0.0 are compatible with Akamas up to version 1.8.0

  • Versions >= 2.0.0 are compatible with Akamas from version 1.9.0

Supported component types

The CSV File provider is generic and allows integration with any data source, therefore it does not come with support for a specific component type.

Setup the data source

To operate properly, the CSV file provider expects the presence of four fields in each processed CSV file:

  • A timestamp field used to identify the point in time a certain sample refers to.

  • A component field used to identify the Akamas entity.

  • A metric field used to identify the name of the metric.

  • A value field used to store the actual value of the metric.

These fields can have custom names in the CSV file; you can specify them in the provider configuration.

Integrating Telemetry Providers

Akamas supports the integration with virtually any telemetry and observability tool.

Supported Telemetry Providers

The following table describes the supported Telemetry Providers, which are created automatically at installation time.

Notice that Telemetry Providers are shared across all the workspaces within the same Akamas installation, and only users with administrative privileges can manage them.

Optimizing Web Applications

Telemetry configuration

No specialized telemetry solution to gather Web Application metrics is included. The following providers however can integrate with the provided metrics:

Workflows

Applying parameters

The provided component type does not define any parameter. The workflow will optimize parameters defined in other component types representing the underlying technological stack.

A typical workflow

A typical workflow to optimize a web application is structured in three parts:

  1. Configure and restart the application

  2. Run the test

  3. Perform the cleanup

Here's an example workflow to perform a test on a Java web application using NeoLoad as a load generator:

name: "webapp workflow"
tasks:
  - name: Set Java parameters
    operator: FileConfigurator
    arguments:
      source:
        hostname: myapp.mycompany.com
        username: ubuntu
        key: # ...
        path: /home/ubuntu/conf_template
      target:
        hostname: myapp.mycompany.com
        username: ubuntu
        key: # ...
        path: /home/ubuntu/conf

  - name: Restart application
    operator: Executor
    arguments:
      command: "/home/ubuntu/myapp_down.sh; /home/ubuntu/myapp_sh -opts '/home/ubuntu/conf'"
      host:
        hostname: myapp.mycompany.com
        username: ubuntu
        key: # ...

  - name: Run NeoLoadWeb load test
    operator: NeoLoadWeb
    arguments:
      accountToken: NLW_TOKEN
      projectFile:
        # NeoLoad projectfile location ...

Examples

Create CSV telemetry instances

To create an instance of the CSV provider, build a YAML file (instance.yml in this example) with the definition of the instance:

Then you can create the instance for the system using the Akamas CLI:

timestampFormat format

Notice that the week-year format YYYY is compliant with the ISO-8601 specification, but you should replace it with the year-of-era format yyyy if you are specifying a timestampFormat different from the ISO one. For example:

  • Correct: yyyy-MM-dd HH:mm:ss

  • Wrong: YYYY-MM-dd HH:mm:ss
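The same pitfall can be demonstrated with GNU date, where %G is the ISO week-based year (the analog of Java's uppercase YYYY) and %Y is the calendar year: for 2019-12-30, which falls in ISO week 1 of 2020, the two differ.

```shell
# Week-year pitfall: 2019-12-30 belongs to ISO week 1 of 2020,
# so the week-based year (%G) is 2020 while the calendar year (%Y) is 2019.
date -u -d '2019-12-30' +'calendar year: %Y, week-year: %G'
```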

Configuration options

When you create an instance of the CSV provider, you should specify some configuration information to allow the provider to correctly extract and process metrics from your CSV files.

You can specify configuration information within the config part of the YAML of the instance definition.

Required properties

  • address - a URL or IP identifying the address of the host where CSV files reside

  • username - the username used when connecting to the host

  • authType - the type of authentication to use when connecting to the file host; either password or key

  • auth - the authentication credential; either a password or a key according to authType. When using keys, the value can either be the value of the key or the path of the file to import from

  • remoteFilePattern - a list of remote files to be imported

Optional properties

  • protocol - the protocol to use to retrieve files; either scp or sftp. Default is scp

  • fieldSeparator - the character used as a field separator in the CSV files. Default is ,

  • componentColumn - the header of the column containing the name of the component. Default is COMPONENT

  • timestampColumn - the header of the column containing the timestamp. Default is TS

  • timestampFormat - the format of the timestamp (e.g. yyyy-MM-dd HH:mm:ss zzz). Default is YYYY-MM-ddTHH:mm:ss

You should also specify the mapping between the metrics available in your CSV files and those provided by Akamas. This can be done in the metrics section of the telemetry instance configuration. To map a custom metric you should specify at least the following properties:

  • metric - the name of a metric in Akamas

  • datasourceMetric - the header of a column that contains the metric in the CSV file

The provider ignores any column not present as datasourceMetric in this section.

The sample configuration reported in this section would import the metric cpu_util from CSV files formatted as in the example below:

Telemetry instance reference

The following represents the complete configuration reference for the telemetry provider instance.

The following table reports the configuration reference for the config section

The following table reports the configuration reference for the metrics section

Use cases

Here you can find common use cases addressed by this provider.

Linux SAR

Note that the SAR metrics are percentages (between 0 and 100), while Akamas accepts percentages as values between 0 and 1, therefore each metric in this configuration has a scale factor of 0.001.

You can import the two CPU metrics and the memory metric from a SAR log using the following telemetry instance configuration.

Using the configured instance, the CSV File provider will perform the following operations to import the metrics:

  1. Retrieve the file "/csv/sar.csv" from the server "127.0.0.1" using the SCP protocol authenticating with the provided password.

  2. Use the column hostname to lookup components by name.

  3. Use the column timestamp to find the timestamps of the samples (that are expected to be in the format specified by timestampFormat).

  4. Collect the metrics (two with the same name, but different labels, and one with a different name):

    • cpu_util: in the CSV file is in the column %user and attach to its samples the label "mode" with value "user".

    • cpu_util: in the CSV file is in the column %system and attach to its samples the label "mode" with value "system".

    • mem_util: in the CSV file is in the column %memory.

Please refer to the PostgreSQL optimization pack page for the list of component types, parameters, metrics, and constraints.

Akamas offers many operators that you can use to apply the parameters for the tuned PostgreSQL instances. In particular, we suggest using the FileConfigurator operator for parameter templating and configuration, and the Executor operator for restoring DB data and launching scripts.

The Install CSV provider page describes how to get this Telemetry Provider installed. Once installed, this provider is shared with all users of your Akamas installation and can be used to monitor many different systems, by configuring appropriate telemetry provider instances as described in the Create a CSV provider instance page.

| Telemetry Provider | Description |
| --- | --- |
| CSV provider | collects metrics from CSV files |
| Dynatrace | collects metrics from Dynatrace |
| Prometheus | collects metrics from Prometheus |
| Spark History Server | collects metrics from Spark History Server |
| NeoLoad Web | collects metrics from Tricentis NeoLoad Web |
| LoadRunner Professional | collects metrics from Micro Focus LoadRunner Professional |
| LoadRunner Enterprise | collects metrics from Micro Focus LoadRunner Enterprise |
| AWS | collects price metrics for Amazon Elastic Compute Cloud (EC2) from Amazon's own APIs |

This page intends to provide some guidance in optimizing web applications. Please refer to the Web Application optimization pack page for the list of component types, parameters, metrics, and constraints.

CSV File Provider: this provider can be configured to ingest data points generated by any monitoring application able to export the data in CSV format.

Integrations leveraging NeoLoad Web, LoadRunner Professional, or LoadRunner Enterprise as a load generator can use the ad-hoc provider that comes out of the box and uses the metrics defined in this optimization pack.

Use the FileConfigurator operator to interpolate the tuned parameters into the configuration files of the underlying stack.

Restart the application using an Executor operator.

Wait for the application to come up using the Sleep or the Executor operator.

Use any of the available operators to trigger the execution of the performance test against the application.

Use any of the available operators to restore the application to its original state.

See this page for an example of a study leveraging the Web Application pack.

You can find detailed information on timestamp patterns in the Patterns for Formatting and Parsing section of the DateTimeFormatter (Java Platform SE 8) page.


In this use case, you are going to import some metrics coming from SAR, a popular UNIX tool to monitor system resources. SAR can export CSV files in the following format.

# CSV Telemetry Provider Instance
provider: CSV File
config:
  address: host1.example.com
  authType: password
  username: akamas
  auth: akamas
  remoteFilePattern: /monitoring/result-*.csv
  componentColumn: COMPONENT
  timestampColumn: TS
  timestampFormat: yyyy-MM-dd'T'HH:mm:ss
metrics:
  - metric: cpu_util
    datasourceMetric: user%
akamas create telemetry-instance instance.yml system
TS,                   COMPONENT,  user%
2020-04-17T09:46:30,  host,       20
2020-04-17T09:46:35,  host,       23
2020-04-17T09:46:40,  host,       32
2020-04-17T09:46:45,  host,       21
provider: CSV File             # this is an instance of the CSV provider
config:
  address: host1.example.com   # the address of the host with the CSV files
  port: 22                     # the port used to connect
  authType: password           # the authentication method
  username: akamas             # the username used to connect
  auth: akamas                 # the authentication credential
  protocol: scp                # the protocol used to retrieve the file
  fieldSeparator: ","          # the character used as field separator in the CSV files
  remoteFilePattern: /monitoring/result-*.csv    # the path of the CSV files to import
  componentColumn: COMPONENT                     # the header of the column with component names
  timestampColumn: TS                            # the header of the column with the time stamp
  timestampFormat: yyyy-MM-dd'T'HH:mm:ss           # the format of the timestamp
metrics:
  - metric: cpu_util                             # the name of the Akamas metric
    datasourceMetric: user%                      # the header of the column with the original metric
    staticLabels:
      mode: user                                 # (optional) additional labels to add to the metric

| Field | Type | Description | Restrictions | Required |
| --- | --- | --- | --- | --- |
| metric | String | The name of the metric in Akamas | An existing Akamas metric | Yes |
| datasourceMetric | String | The name (header) of the column that contains the specific metric | An existing column in the CSV file | Yes |
| scale | Decimal number | The scale factor to apply when importing the metric | | No |
| staticLabels | List of key-value pairs | A list of key-value pairs that will be attached to the specific metric sample | | No |

hostname, interval,     timestamp, 		        %user,	%system,      %memory
machine1, 600,		2018-08-07 06:45:01 UTC,	30.01,	20.77,		96.21
machine1, 600,		2018-08-07 06:55:01 UTC,	40.07,	13.00,		84.55
machine1, 600,		2018-08-07 07:05:01 UTC,	5.00,	90.55,		89.23
provider: CSV File
config:
  remoteFilePattern: /csv/sar.csv
  address: 127.0.0.1
  port: 22
  username: user123
  auth: password123
  authType: password
  protocol: scp
  componentColumn: hostname
  timestampColumn: timestamp
  timestampFormat: yyyy-MM-dd HH:mm:ss zzz
metrics:
  - metric: cpu_util
    datasourceMetric: "%user"
    scale: 0.001
    staticLabels:
      mode: user
  - metric: cpu_util
    datasourceMetric: "%system"
    scale: 0.001
    staticLabels:
      mode: system
  - metric: mem_util
    scale: 0.001
    datasourceMetric: "%memory"

| Field | Type | Description | Default Value | Restrictions | Required |
| --- | --- | --- | --- | --- | --- |
| address | String | The address of the machine where the CSV file resides | | A valid URL or IP | Yes |
| port | Number (integer) | The port to connect to, in order to retrieve the file | 22 | 1 ≤ port ≤ 65536 | No |
| username | String | The username to use in order to connect to the remote machine | | | Yes |
| protocol | String | The protocol used to connect to the remote machine: SCP or SFTP | scp | scp, sftp | No |
| authType | String | Specify which method is used to authenticate against the remote machine: password (use the value of the parameter auth as a password) or key (use the value of the parameter auth as a private key; supported formats are RSA and DSA) | | password, key | Yes |
| auth | String | A password or an RSA/DSA key (as YAML multi-line string, keeping new lines) | | | Yes |
| remoteFilePattern | String | The path of the remote file(s) to be analyzed. The path can contain GLOB expressions | | A list of valid Linux paths | Yes |
| componentColumn | String | The CSV column containing the name of the component. The column's values must match (case sensitive) the name of a component specified in the System | COMPONENT | The column must exist in the CSV file | Yes |
| timestampColumn | String | The CSV column containing the timestamps of the samples | TS | The column must exist in the CSV file | No |
| timestampFormat | String | The format of the timestamps. Must be specified using Java syntax | YYYY-mm-ddTHH:MM:ss | | No |
| fieldSeparator | String | The field separator of the CSV | , | , or ; | No |

Dynatrace provider

The Dynatrace provider collects metrics from Dynatrace and makes them available to Akamas.

This provider includes support for several technologies. In any case, custom queries can be defined to gather the desired metrics.

Supported versions

Dynatrace SaaS/Managed version 1.187 or later

Supported component types:

  • Kubernetes and Docker

  • Web Application

  • Ubuntu-16.04, Rhel-7.6

  • java-openjdk-8, java-openjdk-11

  • java-ibm-j9vm-6, java-ibm-j9vm-8, java-eclipse-openj9-11

Prerequisites

This section provides the minimum requirements that you should match before using the Dynatrace provider.

  • Dynatrace SaaS/Managed version 1.187 or later

  • A valid Dynatrace license

  • Dynatrace OneAgent installed on the servers where the Dynatrace entities to be monitored are running

  • Connectivity between Akamas and the Dynatrace server on port 443

Dynatrace Token

The Dynatrace provider needs a Dynatrace API token with the following privileges:

  • metrics.read (Read metrics)

  • entities.read (Read entities and tags)

  • DataExport (Access problem and event feed, metrics, and topology)

  • ReadSyntheticData (Read synthetic monitors, locations, and nodes)

  • DataImport (Data ingest, e.g.: metrics and events). This permission is used to inform Dynatrace about configuration changes.

Component configuration

To instruct Akamas from which Dynatrace entities (e.g. Workloads, Services, Process Groups) metrics should be collected, you can set some specific properties on components.

Different strategies can be used to map Dynatrace entities to Akamas components:

  • By id

  • By name

  • By tags

  • By Kubernetes properties

By id

You can map a component to a Dynatrace entity by leveraging the unique id of the entity, which you should put under the id property in the component. This strategy is best used for long-lived instances whose ID does not change during the optimization such as Hosts, Process Groups or Services.

Here is an example of how to setup host monitoring via id:

name: My Host
properties:
 dynatrace:
  id: HOST-12345YUAB1

You can find the id of a Dynatrace entity by looking at the URL of the Dynatrace dashboard for that entity. Note that the "host" key is valid only for Linux components; other components (e.g. the JVM) require drilling down into the host entities to get the PROCESS_GROUP_INSTANCE or PROCESS_GROUP id.

By name

You can map a component to a Dynatrace entity by leveraging the entity’s display name. This strategy is similar to mapping by id but provides a friendlier way to identify the mapped entity. Beware that if multiple entities in your Dynatrace installation share the same name, they will all be mapped to the same component. The Dynatrace display name should be put under the name property in the component definition:

name: MyComponent
properties:
 dynatrace:
  name: host-1

By tags

You can map a component to a Dynatrace entity by leveraging the Dynatrace tags that match the entity; these tags should be put under the tags property in the component definition.

If multiple tags are specified, instances matching any of the specified tags will be selected.

This sample configuration maps to the component all Dynatrace entities with the tag environment: test or [AWS]dynatrace-monitored: true:

name: MyComponent
properties:
 dynatrace:
  tags:
     environment: test
     "[AWS]dynatrace-monitored": true

Dynatrace supports both key-value and key-only tags. Key-only tags can be specified as key-value tags with an empty value, as in the following example:

name: MyComponent
properties:
 dynatrace:
  tags:
     myKeyOnlyTag: ""

By Kubernetes properties

You can map a component to a Dynatrace entity belonging to a Kubernetes cluster (e.g. a Pod or a Container) by leveraging dedicated properties.

Container

In order to properly identify the set of containers to be mapped, you can specify the following properties. Any container matching all the properties will be mapped to the component.

| Akamas property | Dynatrace property | Location |
| --- | --- | --- |
| namespace | Kubernetes namespace | Container dashboard |
| containerName | Kubernetes container name | Container dashboard |
| basePodName | Kubernetes base pod name | Container dashboard |

You can retrieve all the information needed to set up these properties at the top of the Dynatrace container dashboard.

The following example shows how to map a component to a container running in Kubernetes:

dynatrace:
  type: CONTAINER_GROUP_INSTANCE
  kubernetes:
    namespace: boutique
    containerName: server
    basePodName: ak-frontend-*

Pod

In order to properly identify the set of pods to be mapped, you can specify the following properties. Any pod matching all the properties will be mapped to the component.

| Akamas property | Dynatrace property | Location |
| --- | --- | --- |
| state | State | Pod dashboard |
| namespace | Namespace | Pod dashboard |
| workload | Workload | Pod dashboard |

If you need to further narrow your pod selection you can also specify a set of tags as described in the By tags section. Note that tags for Kubernetes resources are called Labels in the Dynatrace dashboard.

Labels are specified as key-value pairs in the Akamas configuration. In Dynatrace’s dashboard, key and value are separated by a colon (:).

Example

The following example shows how to map a component to a pod running in Kubernetes:

dynatrace:
  type: CLOUD_APPLICATION_INSTANCE
  namePrefix: ak-frontend-
  kubernetes:
    labels:
      workload: ak-frontend
      product: hipstershop

Container, Pod, or Workload?

Please note that when you are mapping components to Kubernetes entities, the property type is required to instruct Akamas on which type of entity you want to map. Dynatrace maps Kubernetes entities to the following types:

| Kubernetes type | Dynatrace type |
| --- | --- |
| Docker container | CONTAINER_GROUP_INSTANCE |
| Pod | CLOUD_APPLICATION_INSTANCE |
| Workload | CLOUD_APPLICATION |
| Namespace | CLOUD_APPLICATION_NAMESPACE |
| Cluster | KUBERNETES_CLUSTER |
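For example, to map a component to an entire Kubernetes Workload you would combine the type from the table above with one of the mapping strategies already described. The component and workload names below are hypothetical:

```yaml
name: MyComponent
properties:
 dynatrace:
  type: CLOUD_APPLICATION   # a Kubernetes Workload, per the mapping above
  name: ak-frontend         # hypothetical workload display name (mapping by name)
```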

Improve component mapping with type

You can improve the matching of components with Dynatrace entities by adding a type property in the component definition; this property helps the provider match only those Dynatrace entities of the given type.

name: MyComponent
properties:
 dynatrace:
  type: SERVICE     # here the type helps the mapping by tags by filtering down entities that are only services
  tags:
     environment: test
     "[AWS]dynatrace-monitored": true

The type of an entity can be retrieved from the URL of the entity’s dashboard.

Available entities types can be retrieved, from your Dynatrace instance, with the following command:

curl 'https://<Your Dynatrace host>/api/v2/entityTypes/?pageSize=500' --header 'Authorization: Api-Token <API-TOKEN>'
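As a sketch of how the returned list could be reduced to the bare type identifiers, assuming the endpoint returns a JSON object with a types array (the sample payload below is a made-up excerpt, not real API output):

```python
import json

# Hypothetical excerpt of a response from the /api/v2/entityTypes endpoint;
# the real payload contains paging fields and more attributes per entity type.
response_body = json.dumps({
    "types": [
        {"type": "HOST"},
        {"type": "SERVICE"},
        {"type": "CLOUD_APPLICATION"},
    ]
})

def extract_type_names(body: str) -> list:
    """Return the entity type identifiers found in an entityTypes response."""
    payload = json.loads(body)
    return [entry["type"] for entry in payload.get("types", [])]

print(extract_type_names(response_body))
# → ['HOST', 'SERVICE', 'CLOUD_APPLICATION']
```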

Mapping multiple entities in one component

In some circumstances, you might want to map multiple Dynatrace entities (e.g. a set of hosts) to the same Akamas component and import aggregated metrics.

This can be easily done by using tags. If Akamas detects that multiple entities have been mapped to the same component, it will try to aggregate metrics; some metrics, however, cannot be automatically aggregated.

To force aggregation on all available metrics, you can add the mergeable: true property to the component under the dynatrace element.

name: MyComponent
properties:
 dynatrace:
  mergeable: true
  tags:
     environment: test
     "[AWS]dynatrace-monitored": true

Refer to Dynatrace provider metrics mapping to see how component-type metrics are extracted by this provider.

A Dynatrace API token with the privileges described here is required.

To generate an API Token for your Dynatrace installation you can follow these steps.
Constraints