
Cloud Hosting

Refer to your cloud provider's website for information about cloud hosting options and related costs.

AWS EC2

For AWS EC2 costs, visit the EC2 Pricing page and use the AWS Pricing Calculator to estimate the cost for your architecture.

Deployment

Akamas is an on-premise product running on a dedicated machine within the customer environment:

  • on a virtual or physical machine in your data center

  • on a virtual machine running in the cloud, managed by any cloud provider (e.g. AWS EC2)

  • on your own laptop

Akamas also provides a Free Trial option, which can be requested here.

Licensing

Software Licenses

Maintenance & Support Services

Other billable services

Akamas software licensing model is subscription-based (typically on a yearly basis). For more information on Akamas' cost model and software licensing costs, please contact info@akamas.io.

Akamas software licenses include Maintenance & Support Services, which also include access to Customer Support Services.

Akamas also provides optional professional services for deployment, training, and integration activities. For more information about Akamas professional services, please contact info@akamas.io.

Security

Akamas takes security seriously and provides enterprise-grade software where customer data is kept safe at all times. This page describes some of the most important security aspects of the Akamas software, as well as the processes and tools used by the Akamas company (Akamas S.p.A.) to develop its software products.

Information managed by Akamas

Akamas manages the following types of information:

  • System configuration and performance metrics: technical data related to the systems being optimized. Examples of such data include the number of CPUs available in a virtual machine or the memory usage of a Java application server;

  • User accounts: accounts assigned to users to securely access the Akamas platform. For each user account, Akamas currently requires an account name and a password. Akamas does not collect any other personal identifying information;

  • Service Credentials: credentials used by Akamas to automate manual tasks and to integrate with external tools. In particular, Akamas leverages the following types of interaction:

    • Integration with monitoring and orchestration tools, e.g. to collect IT performance metrics and system configuration. As a best practice, Akamas recommends using dedicated service accounts with minimal read-only privileges.

    • Integration with the target systems to apply changes to configuration parameters. As a best practice, Akamas recommends using dedicated service accounts with minimal privileges to read/write identified parameters.

GDPR compliance

Akamas is a fully GDPR-compliant product.

Akamas is a company owned by the Moviri Group. The Moviri Group and all its companies are fully compliant with GDPR. Moviri Group Data Privacy Policy and Data Breach Incident Response Plan which apply to all the owned companies can be requested from Akamas Customer Support.

Security certifications

Akamas is an on-premises product and does not transmit any data outside the customer network. Considering the kind of data that is managed within Akamas (see section "Information managed by Akamas"), specific security certifications like PCI or HIPAA are not required, as the platform does not manage payment or health-related information.

Data encryption

Akamas takes the need for security seriously and understands the importance of encrypting data to keep it safe at-rest and in-flight.

In-Flight encryption

All communications between the Akamas UI and CLI and the back-end services are encrypted via HTTPS. The customer can configure Akamas to use customer-provided SSL certificates in all communications.

Communications between Akamas services and other integrated tools within the customer network rely on the security configuration requirements of the integrated tool (e.g. HTTPS calls to interact with REST services).

At-Rest encryption

Akamas is an on-premises product and runs on dedicated virtual machines within the customer environment. At-Rest Encryption can be achieved following customer policies and best practices, for example leveraging operating system-level techniques.
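As an illustration, one common operating system-level technique is full-volume encryption with LUKS. The following is a minimal sketch, not an Akamas-specific procedure; it assumes a dedicated, still-empty data volume at the placeholder device path /dev/xvdf and a placeholder mount point:

# format the volume with LUKS encryption (destroys any existing data)
sudo cryptsetup luksFormat /dev/xvdf
# open the encrypted volume under a mapped name
sudo cryptsetup open /dev/xvdf akamas_data
# create a filesystem on the decrypted mapping and mount it
sudo mkfs.ext4 /dev/mapper/akamas_data
sudo mount /dev/mapper/akamas_data /opt/akamas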

Akamas also provides an application-level encryption layer aimed at extending the scope of at-rest encryption. With this increased level of security, sensitive data managed by Akamas (e.g. passwords, tokens, or keys required to interact with external systems) is safely stored in Akamas databases using industry-standard AES 256-bit encryption.

Encryption option for Akamas on EC2

If Akamas is hosted on an AWS machine, you may optionally create an EC2 instance with an encrypted EBS volume before installing the OS and Akamas, in order to achieve a higher level of security.
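A minimal sketch of launching such an instance with the AWS CLI; the AMI ID, subnet ID, and device name below are placeholders to be replaced with your own values (the instance type and disk size follow the hardware recommendations given later in this guide):

aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type m6a.xlarge \
  --subnet-id subnet-0123456789abcdef0 \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":70,"VolumeType":"gp3","Encrypted":true}}]'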

Password management

Password Security

Passwords are securely stored using a one-way hash algorithm.

Password complexity

Akamas comes with a default password policy with the following requirements. Passwords must:

  • have a minimum length of 12 characters;

  • contain at least 1 uppercase and 1 lowercase character;

  • contain at least 1 special character;

  • be different from the username;

  • be different from the last password set.

Customers can modify this policy by providing a custom one that matches their internal security policies.

Password rotation

Akamas does not enforce any password rotation mechanism.

Credential storage

  • When running on a Linux installation with KDE's KWallet or GNOME's Keyring enabled, the credentials will be stored in the default wallet/keyring.

  • When running on Windows, the credentials will be stored in the Windows Credential Locker.

  • When running on macOS, the credentials will be stored in the Keychain.

  • When running on a headless Linux installation, the credentials will be stored in CLEAR TEXT in a file in the current Akamas configuration folder.

Resources visibility model

Akamas provides fine granularity control over resources managed within the platform. In particular, Akamas features two kinds of resources:

  • Workspace resources: entities bound to one of the isolated virtual environments (named workspaces) that can only be accessed in reading or writing mode by users to whom the administrators explicitly granted the required privileges. Such resources typically include sensitive data (e.g. passwords, API tokens). Examples of such resources include the system to be optimized, the set of configurations, optimization studies, etc.

  • Shared resources: entities that can be installed and updated by administrators and are available to all Akamas users. Such resources only contain technology-related information (e.g. the set of performance metrics for a Java application server). Examples of such resources include Optimization Packs, which are libraries of technology components that Akamas can optimize, such as a Java application server.

Akamas Logs

Akamas logs traffic from both the UI and the APIs. Application-level logs include user access via APIs and UI and any action taken by Akamas on integrated systems.

Akamas logs are retained on the dedicated virtual machine within the customer environment, by default, for 7 days. The retention period can be configured according to customer policies. Logs can be accessed either via UI or via log dump within the retention period. Additionally, logs have a format that can be easily integrated with external systems like log engines and SIEM to support forensic analysis.

Code scanning policy

Akamas is developed according to security best practices and the code is scanned regularly (at least daily).

The Akamas development process leverages modern continuous integration approaches and the development pipeline includes SonarQube, a leading security scanning product that includes comprehensive support for established security standards including CWE, SANS, and OWASP. Code scanning is automatically triggered in case of a new build, a release, and every night.

Vulnerability scanning and patch management policy

Akamas features modern micro-service architecture and is delivered as a set of docker containers whose images are hosted on a private Elastic Container Registry (ECR) repository on the AWS cloud. Akamas leverages the vulnerability scanning capabilities of AWS ECR to identify vulnerabilities within the product container images. AWS ECR uses the Common Vulnerabilities and Exposures (CVEs) database from the open-source Clair project.

If a vulnerability is detected, Akamas will assess the associated security risk in terms of the impact of the vulnerability, and evaluate the necessary steps (e.g. dependency updates) required to fix it, within a timeline related to the outcome of the security assessment.

After the assessment, the vulnerability can be fixed either by recommending the upgrade to a new product version or by delivering a patch or a hotfix for the current version.

Customer Support Services

Akamas Customer Support Services are delivered by Akamas support engineers, also called Support Agents, who work remotely with the Customer to provide a temporary remedy for the incident and, ultimately, a permanent resolution. Akamas Support Agents automatically escalate issues to the appropriate technical group within Akamas and notify Customers of any relevant progress. Akamas also provides Customers with the ability to escalate issues when appropriate.

Please notice that Customer Support services are not an alternative to product documentation and training, or to professional and consulting services; adequate knowledge of Akamas products is therefore assumed when interacting with Akamas Customer Support. During the resolution of a reported issue, Support Agents may thus redirect the Customer to training or professional services (which are not part of the scope of this service).

How to use this documentation

This page is intended as your entry point to the Akamas documentation.

Introduction to Akamas

A quick introduction to Akamas

Akamas is the AI-powered optimization platform designed to maximize service quality and cost efficiency without compromising on application performance. Akamas supports both production environments, under live and dynamic workloads, and test/pre-production environments, against any what-if scenario and workload.

Thanks to Akamas, performance engineers, DevOps, CloudOps, FinOps and SRE teams can keep complex applications, such as Kubernetes microservices applications, optimized to avoid any unnecessary cost and any performance risks.

Akamas Optimization platform

The Akamas optimization platform leverages patented AI techniques that can autonomously identify optimal full-stack configurations driven by any custom-defined goals and constraints (SLOs), without any human intervention, any agents, and any code or byte-code changes.

Akamas optimal configurations can be applied i) under human approval (human-in-the-loop mode), ii) automatically, as a continuous optimization step in a CI/CD pipeline (in-the-pipe), or iii) autonomously by Akamas (autopilot).

Akamas coverage

Akamas can optimize any system with respect to any set of parameters chosen from the application, middleware, database, cloud, and any other underlying layers.

Akamas provides dozens of out-of-the-box Optimization Packs available for key technologies such as JVM, Go, Kubernetes, Docker, Oracle, MongoDB, ElasticSearch, PostgreSQL, Spark, AWS EC2 and Lambda, and more. Each Optimization Pack provides parameters, relationships, and metrics that accelerate the optimization process setup and support company-wide best practices. Custom Optimization Packs can be easily created without any coding.

The following figure is illustrative of Akamas coverage for both managed technologies and integrated components of the ecosystem.

Akamas integrations

Akamas can integrate with any ecosystem thanks to out-of-the-box and custom integrations with the following components:

  • telemetry & monitoring tools and other sources of KPIs and cost data, such as Dynatrace, Prometheus, CloudWatch, and CSV files

  • configuration management tools, repositories and interfaces to apply configurations, such as Ansible, Openshift, and Git

  • value stream delivery tools to support a continuous optimization process, such as Jenkins, Dynatrace Cloud Automation, and GitLab

  • load testing tools to generate simulated workloads in test/pre-production, such as LoadRunner, NeoLoad, and JMeter

Akamas has been designed around Infrastructure-as-Code (IaC) and DevOps principles. Thanks to a comprehensive set of APIs and integration mechanisms, it is possible to extend the Akamas optimization platform to manage any system and integrate with any ecosystem.

Use Cases

Akamas optimization platform supports a variety of use cases, including:

  • Improve Service Quality: optimize application performance (e.g. maximize throughput, minimize response time and job execution time) and stability (lower fluctuations and peaks);

  • Increase Business Agility: identify resource bottlenecks in early stages of the delivery cycle, avoid delays due to manual remediations - release higher quality services and reduce production incidents;

  • Increase Service Resilience: improve service resilience under higher workloads (e.g. expected business growth) or failure scenarios identified by chaos engineering practices - improve SRE practice;

  • Reduce IT Cost / Cloud Bill: reduce on-premise infrastructure cost and cloud bills due to resource over-provisioning - improve cost efficiency of Kubernetes microservices applications;

  • Optimize Cloud Migration: safely migrate on-premise applications to cloud environments for optimal cost efficiency, and evaluate options to migrate to managed services (e.g. AWS Fargate);

  • Improve Operational Efficiency: save engineering time spent on manual tuning tasks and enable Performance Engineering teams to do more in less time (and with less external consulting).

Getting started with Akamas

This guide introduces Akamas and covers various fundamental topics such as licensing and deployment models, security topics, and maintenance & support services. It is recommended to read this guide before moving to other guides on how to install, integrate, and use Akamas. The Glossary section of the Reference guide can help in reviewing Akamas key concepts.

Maintenance & Support (M&S) Services

This page is intended as a first introduction to Akamas Maintenance & Support (M&S) Services.

Please refer to the specific contract in place with your Company.

Akamas M&S Services include:

  • access to Software versions released as major and minor versions, service packs, patches, and hotfixes, according to Support levels for software versions;

  • assistance from Akamas Customer Support for inquiries about the Akamas product and issues encountered while using Akamas products, where there is a reasonable expectation that issues are caused by Akamas products, according to Support levels for Customer Support Services.

Akamas M&S Services do not include any installation and upgrade services, creation of any custom optimization packs, telemetry providers, or workflow operators, or implementation of any custom features and integrations that are not provided out-of-the-box by the Akamas products.

Support levels with Akamas 3.1

Support levels for software versions

Different levels of support are provided for software versions of Akamas products, starting from their general availability (GA) date, and depending on the release of subsequent software versions.

Version Numbering

Akamas adopts a three-place numbering scheme MA.MI.SP to designate released versions of its Software, where:

  • MA is the Major Version

  • MI is the Minor Version

  • SP is the Service Pack or Patch number
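For example, for version 3.1.2: MA is 3, MI is 1, and SP is 2.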

Support levels

The following table describes the three levels of support for a software version.

Full Support

Akamas provides full support for one previous (either major or minor) version in addition to the latest available GA version.

For Software versions in Full Support level: Akamas Support Agents provide service packs, patches, hotfixes, or workarounds to make the Software operate in substantial conformity with its then-current operating documentation.

Limited Support

Following the Full Support period, Akamas provides Limited Support for an additional 12 months.

For Software versions in Limited Support level:

  • No new enhancements will be made to a version in "Limited Support"; Akamas Support Agents will direct Customers to existing fixes, patches, or workarounds applicable to the reported case, if any;

  • Akamas Support Agents will provide hotfixes for problems of high technical impact or business exposure for customers;

  • Based on Customer input, Akamas Support Agents will determine the degree of impact and exposure and the consequent activities;

  • Akamas Support Agents will direct Customers to upgrade to a more current version of the Software.

No Support

Following the Limited Support period, Akamas provides no support for any Software version.

For Software versions in No Support level: no new maintenance releases, enhancements, patches, or hotfixes will be made available. Akamas Support Agents will direct Customers to upgrade to a more current version of the Software.

Based on the support levels for software versions described above, the following table describes the level of support of the Akamas versions after the version 3.1 GA date (November 11th, 2022).

Version | Support Level

3.1 | Full Support (notice: this will change once the following major version is released)

3.0 | Full Support (notice: this will change once the following major version is released)

2.x | Limited Support until 12 months after the 3.0 GA date, that is September 13th, 2023 (see Support Levels with Akamas 3.0)

1.x | No Support

End-of-Life (EOL)

At any time, Akamas reserves the right to declare a software product "end of life" (EOL) and to terminate any Maintenance & Support Services for such product, provided that the Licensor has notified the Licensee at least 12 months prior to the above-mentioned termination.

During the period between the "end of life" notification and the actual termination of Maintenance & Support Services:

  • No new enhancements will be introduced.

  • No enhancements will be made to support new or updated versions of the platform on which the product runs or with which it integrates.

  • New hotfixes for problems of high technical impact or business exposure for customers may still be developed. Based on customer input, Akamas Support Agents will determine the degree of impact and exposure and the consequent activities.

  • Reasonable efforts will be made to inform the Customer of any fixes, service packs, patches, or workarounds applicable to the reported case, if any.

The Akamas documentation is organized in the following guides.

Getting started with Akamas

  • provides a very first introduction to AI-powered optimization

  • covers Akamas licensing, deployment, and security topics

  • describes Akamas maintenance and support services.

This guide provides some preliminary knowledge required to purchase, implement, and use Akamas.

User personas: All roles

Installing Akamas

  • describes the Akamas architecture

  • provides the hardware, software, and network prerequisites

  • describes the steps to install an Akamas Server and CLI

This guide provides the knowledge required to install and manage an Akamas installation.

User personas: Akamas Admin

Using Akamas

  • describes the Akamas optimization process and methodology

  • provides guidelines for optimizing some specific technologies

  • provides examples of optimization studies

This guide provides the methodology to define an optimization process and the knowledge to leverage Akamas.

User personas: Analyst / Practitioner teams

Integrating Akamas

  • describes how to integrate Akamas with telemetry providers and configuration management tools

  • describes how to integrate Akamas with load testing tools

  • describes how to integrate Akamas with CI/CD tools

This guide provides the knowledge required to integrate Akamas with the ecosystem.

User personas: Akamas Admin, DevOps team

Akamas Reference

  • provides a glossary of Akamas key concepts with references to construct templates and commands

  • provides a reference to Akamas construct templates

  • provides a reference to Akamas command-line commands

  • describes Akamas optimization packs and telemetry providers

User personas: Akamas Admin, DevOps team, Analyst / Practitioner teams

Knowledge Base

  • describes how to set up a test environment for experimenting with Akamas

  • describes how to apply the Akamas approach to the optimization of some real-world cases

  • provides examples of Akamas templates and commands for the real-world cases

User personas: Analyst / Practitioner teams

Install Akamas dependencies

This page will guide you through the installation of the software components that are required to get the Akamas Server installed on a machine. Please read the Akamas dependencies section for a detailed list of these software components for each specific OS.

While some links to official documentation and installation resources are provided here, please make sure to refer to your internal system engineering department to ensure that your company's deployment processes and best practices are correctly followed.

Dependencies Setup

As a preliminary step before installing any dependency, it is strongly suggested to create a user named akamas on the machine hosting the Akamas Server. To run Docker with a non-root user, such as the akamas user, you should add it to the docker group: you can follow the guide at https://docs.docker.com/engine/install/linux-postinstall/.

Docker

Follow the reference documentation to install Docker on your system: https://docs.docker.com/engine/install

Docker Compose

Docker Compose is already installed since Docker 23+. To install it on previous versions of Docker, follow this installation guide: https://docs.docker.com/compose/install/

AWS CLI

Install AWS CLI v2 following the official guide: https://docs.aws.amazon.com/cli/latest/userguide

Verify dependencies

As a quick check to verify that all dependencies have been correctly installed, you can run the following commands:

  • Docker:

    docker run hello-world

For offline installations, you can check Docker with the docker ps command.

  • Docker Compose:

    docker compose --version

Docker versions older than 23 must use the docker-compose command instead of docker compose.

  • AWS CLI:

    aws --version
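As a sketch of the user-creation step above (assuming sudo privileges; on some distributions the group is named dockerroot instead of docker, as discussed in the troubleshooting section):

# create the dedicated akamas user
sudo useradd -m -s /bin/bash akamas
# let it run docker commands without root
sudo usermod -aG docker akamas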

Install the Akamas Server

Akamas is deployed as a set of containerized services running on Docker and managed via Docker Compose. The latest version of the Akamas Docker Compose file and all the images required by Docker can be downloaded from the AWS ECR repository.

Two installation modes are available:

  • online installation mode, in case the Akamas Server has access to the Internet - installation behind a proxy server is also supported;

  • offline installation mode, in case the Akamas Server does not have access to the Internet.

Network requirements

This section lists all the connectivity settings required to operate and manage Akamas

Internet access

Internet access is required by the Akamas online installation and update procedures, and allows retrieving the most updated Akamas container images from the Akamas private Amazon Elastic Container Registry (ECR).

If internet access is not available due to policy or security reasons, Akamas installation and updates can be executed offline.

Internet access from the Akamas server is not mandatory but it’s strongly recommended.

Ports

The following table provides a list of the ports on the Akamas server that have to be reachable by Akamas administrators and users to properly operate the system.

In the specific case of AWS instance and customer instances sharing the same VPC/Subnet inside AWS, you should:

  • open all of the ports listed in the table below for all inbound traffic (0.0.0.0/0) on your AWS security group

  • open outbound rules to all traffic and then attach this AWS security group (which must reside inside a private subnet) to the Akamas machine and all customer application AWS machines


Source | Destination | Port | Reason

Akamas admin | Akamas server | 22 | SSH

Akamas admin/user | Akamas server | 80, 443 | Akamas web UI access

Akamas admin/user | Akamas server | 8000, 8443 | Akamas API access
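As an illustration, the inbound rules above could be created with the AWS CLI; a sketch assuming a placeholder security group ID and an admin subnet of 10.0.0.0/24:

# allow the Akamas SSH/UI/API ports from the admin subnet (placeholder values)
SG_ID=sg-0123456789abcdef0
for PORT in 22 80 443 8000 8443; do
  aws ec2 authorize-security-group-ingress \
    --group-id "$SG_ID" --protocol tcp --port "$PORT" --cidr 10.0.0.0/24
done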

Offline installation mode

Akamas is deployed as a set of containerized services running on Docker and managed via Docker Compose. In offline installation mode, the latest version of the Akamas Docker Compose file and all the images required by Docker cannot be downloaded from the AWS ECR repository.

Get Akamas Docker artifacts

Get in contact with Akamas Customer Services to get the latest versions of the Akamas artifacts to be uploaded to a location of your choice on the dedicated Akamas Server.

Akamas installation artifacts will include:

  • images.tar.gz: Akamas main images

  • docker-compose.yml: docker-compose file for Akamas

  • a binary file named akamas: this is the binary file of the akamas CLI that will be used to verify the installation.

Import Docker images

A preliminary step in offline installation mode is to import the shipped Docker images by running the following commands in the same directory where the tar files have been stored:

cd <your bundle files location>
docker image load -i images.tar.gz

Notice that this import procedure could take quite some time!

Configure Akamas environment variables

To configure Akamas, the following environment variables are required to be set:

  • AKAMAS_CUSTOMER: this is the customer name matching the one referenced in the Akamas license.

  • AKAMAS_BASE_URL: this is the endpoint of the Akamas APIs that will be used to interact with the CLI, typically http://<akamas server dns address>:8000

Environment variables creation is performed by the snippet below:

# add double quotes ("xx xx") if the name contains white spaces
export AKAMAS_CUSTOMER=<your name or your organization name>
export AKAMAS_BASE_URL=http://<akamas server dns address>:8000

It is recommended to save these exported variables in your ~/.bashrc file for convenience.
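For example, assuming a bash shell:

echo 'export AKAMAS_CUSTOMER=<your name or your organization name>' >> ~/.bashrc
echo 'export AKAMAS_BASE_URL=http://<akamas server dns address>:8000' >> ~/.bashrc
source ~/.bashrc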

Run installation

To start Akamas you can now simply navigate into the akamas folder and run Docker Compose as follows:

cd <your docker-compose file location>
docker compose up -d

Notice that you may get the following error:

Error saving credentials: error storing credentials - err: exit status 1, out: Cannot autolaunch D-Bus without X11 $DISPLAY

This is a documented Docker bug (see this link) that can be solved by installing the "pass" package:

  • Ubuntu

sudo apt-get install -y pass

  • RHEL

yum install pass

Installing Akamas

This section describes how to get Akamas installed.

Preliminary steps

Before installing Akamas, please follow these steps:

  • Review hardware, software, and network prerequisites

  • Install all Akamas dependencies

Installation steps

Please follow these steps to install the Akamas Server:

  • Install the Akamas Server

  • Install the Akamas CLI

  • Verify the Akamas Server

  • Install an Akamas license

Please make sure to read the Getting Started section before installing Akamas.

Please also read the sections on how to troubleshoot the installation and how to manage the Akamas Server. Finally, read the relevant sections of Integrating Akamas to integrate Akamas into your specific ecosystem.

Online installation mode

Akamas is deployed as a set of containerized services running on Docker and managed via Docker Compose. In online installation mode, the latest version of the Akamas Docker Compose file and all the images required by Docker can be downloaded from the AWS ECR repository.

Get Akamas Docker artifacts

It is suggested to first create a directory akamas in the home directory of your user, and then run the following command to get the latest compose file:

cd ~
mkdir akamas
cd akamas
curl -O https://s3.us-east-2.amazonaws.com/akamas/compose/$(curl https://s3.us-east-2.amazonaws.com/akamas/compose/stable.txt)/docker-compose.yml

Configure Akamas environment variables

To configure Akamas, you should set the following environment variables:

  • AKAMAS_CUSTOMER: this is the customer name matching the one referenced in the Akamas license.

  • AKAMAS_BASE_URL: this is the endpoint of the Akamas APIs that will be used to interact with the CLI, typically http://<akamas server dns address>:8000

You can export the variables using the following snippet:

# add double quotes ("xx xx") if the name contains white spaces
export AKAMAS_CUSTOMER=<your name or your organization name>
export AKAMAS_BASE_URL=http://<akamas server dns address>:8000

It is recommended to save these exported variables in your ~/.bashrc file for convenience.

Start Akamas

In order to log in to AWS ECR and pull the most recent Akamas container images, you also need to set the AWS authentication variables to the appropriate values provided by Akamas Customer Support Services, by running the following commands:

export AWS_ACCESS_KEY_ID=<your access key id>
export AWS_SECRET_ACCESS_KEY=<your secret access key>
export AWS_DEFAULT_REGION=us-east-2

At this point, you can start installing the Akamas Server by running the following commands:

aws ecr get-login-password --region us-east-2 | docker login -u AWS --password-stdin https://485790562880.dkr.ecr.us-east-2.amazonaws.com
docker compose up -d
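You can then check that the Akamas containers are being created and started, for example with:

docker compose ps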

In case the Akamas Server is behind a proxy server, please also read how to setup Akamas behind a Proxy.

Support levels for Customer Support Services

Akamas Customer Support Services provides different standard levels of support. Please verify the level of support specified in the contract in place with your Company.

Severity levels

The following table describes the different severity levels for Customer Support.

Severity level | Description | Impact

S1 | Blocking: the production Customer system is severely impacted. Notice: this severity level only applies to production environments | Catastrophic business impact, e.g. complete loss of a core business process, and work cannot reasonably continue (e.g. all final users are unable to access the Customer application)

S2 | Critical: one major Akamas functionality is unavailable | Significant loss or degradation of the Akamas services (e.g. Akamas is down or Akamas is not generating recommendations)

S3 | Severe: limitation in accessing one major Akamas functionality | Moderate business impact and moderate loss or degradation of services, but work can reasonably continue in an impaired manner (e.g. only some specific functions are not working properly)

S4 | Informational: any other request | Minimum business impact; the product is substantially functioning with minor or no impediments of services

Support conditions

The contract in place with the Customer specifies the level of support provided by Akamas Agents, according to at least the following items:

  • Maximum number of support seats: this is the maximum number of named users within the Customer organization who can request Akamas Customer Support.

  • Language(s): these are the languages that can be used for interacting with Akamas Support Agents - the default is English.

  • Channel(s): these are the different communication channels that can be used to interact with Akamas Agents - these may include one or more options among web ticketing, email, phone, and Slack channel.

  • Max Initial Response Time: this refers to the time interval occurring from the time a request is opened by Customer to Customer Support and the time a Support Agent responds with a first notification (acknowledgment).

  • Severity: this is the level of severity associated with a reported issue, which initially corresponds to the severity level originally indicated by the Customer. Notice that the severity level may change, for example as new information becomes available or if Support Agents and Customer agree to re-evaluate it. Please notice that the severity level may be downgraded by Support Agents if Customer is not able to provide adequate resources or responses to enable Akamas to continue with its resolution efforts.

  • Initial Remedy: this refers to any operation aimed at addressing a reported issue by restoring a minimal level of operations, even if it may cause some performance degradation of the Customer service or operations. A workaround is to be considered a valid Initial Remedy.

Please notice that Support Agents may refuse to serve a Customer Support request either in case the Customer does not have a valid Maintenance & Support subscription, or in case the above-mentioned conditions or other conditions stated in the contract in place are not met. In any case, the Customer is expected to provide all the information required by Support Agents in order to serve Customer Support requests.

Hardware Requirements

Running in your data center

The following table provides the minimal hardware requirements for the virtual or physical machine used to install the Akamas server in your data center.

Resource | Requirement

CPU | 4 cores @ 2 GHz

Memory | 16 GB

Disk Space | 70 GB

Running on AWS EC2

As shown in the following diagram, you can create the Akamas instance in the same AWS region, Virtual Private Cloud (VPC), and private subnet as your existing EC2 machines, and create/configure a new security group that allows communication between your application instances and the Akamas instance. The inbound/outbound rules of this security group must be configured as explained in the Network requirements section.

To run Akamas on an AWS instance you need to create a new virtual machine based on one of the supported operating systems; you can refer to the AWS documentation for step-by-step instructions on creating the instance. It is recommended to use an m6a.xlarge instance with at least 70GB of disk of type GP2 or GP3, and to select the latest LTS version of Ubuntu.

Supported AWS Regions

Akamas can be run in any EC2 region. You can find the latest version supported for your preferred region here.

AWS Service Limits

Before installing Akamas on an AWS instance please make sure to meet your AWS service limits (please refer to the official AWS documentation here).

Running on a laptop

This special case is also referred to as "Akamas-in-a-box" and is covered by the akamas-in-a-box installation guide.

Prerequisites

Before installing the Akamas Server please make sure to review all the following requirements:

  • Hardware requirements

  • Software requirements

  • Network requirements

  • Akamas dependencies

Software Requirements

Operating System

The following table provides a list of the supported operating systems and their versions.

Operating System | Version

Ubuntu Linux | 18.04+

CentOS | 7.6+

RedHat Enterprise Linux | 7.6+

On RHEL systems, Akamas containers might need to be run in privileged mode, depending on how Docker was installed on the system.

Software packages

The following table provides a list of the required Software Packages (also referred to as Akamas dependencies) together with some notes on their role.

Software Package | Notes

Docker | Akamas is deployed as a set of containerized services running on Docker. During its operation, Akamas launches different containers, so access to the Docker socket with enough permissions to run containers is required.

Docker Compose | Akamas containerized services are managed via Docker Compose. Docker Compose is usually already shipped with Docker starting from version 23.

AWS CLI | Akamas container images are published in a private Amazon Elastic Container Registry (ECR) and are automatically downloaded during the online installation procedure. AWS CLI is required only during the installation phase if the server has internet access, and can be skipped during an offline installation.

The exact versions of these prerequisites are listed in the following table:

Software Package | Ubuntu | CentOS | RHEL

Docker | 19.03+ | 1.13+ | 1.13+

Docker Compose | 2.0+ | 2.0+ | 2.0+

AWS CLI | 2.0.0+ | 2.0.0+ | 2.0.0+

Akamas user

To install and run Akamas it is recommended to create a dedicated user (usually "akamas"). The Akamas user is not required to be in the sudoers list, but it can be added to the docker (or dockerroot) group so that it can run docker and docker-compose commands.

Make sure that the Akamas user has read, write, and execute permissions on /tmp. If your environment does not allow writing to the whole /tmp folder, please create a folder /tmp/build and assign read and write permissions to the Akamas user on that folder.

Changing UI Ports

By default, Akamas uses the following ports for its UI:

  • 80 (HTTP)

  • 443 (HTTPS)

Depending on the configuration of your environment, you may want to change the default settings: in order to do so, you’ll have to update the Akamas docker-compose file.

Inside the docker-compose.yml file, scroll down until you come across the akamas-ui service. There you will find a specification as follows:

  akamas-ui:
    ports:
      - "443:443"
      - "80:80"

Update the YAML by remapping the UI ports to the desired ports of the host:

  akamas-ui:
    ports:
      - "<YOUR_HTTPS_PORT_OF_CHOICE>:443"
      - "<YOUR_HTTP_PORT_OF_CHOICE>:80"

In case you are running Akamas with host networking, you can also change the ports bound inside the container itself. In order to do so you can expand the docker-compose service by adding a couple of environment variables like this:

  akamas-ui:
    environment:
      - HTTP_PORT=<HTTP_CONTAINER_PORT>
      - HTTPS_PORT=<HTTPS_CONTAINER_PORT>
    ports:
      - "<YOUR_HTTPS_PORT_OF_CHOICE>:<HTTPS_CONTAINER_PORT>"
      - "<YOUR_HTTP_PORT_OF_CHOICE>:<HTTP_CONTAINER_PORT>"

Online installation behind a Proxy server

This section describes how to set up an Akamas Server behind a proxy server and allow Docker to connect to the Akamas repository on AWS ECR.

Configure Docker daemon

First, create the /etc/systemd/system/docker.service.d directory if it does not already exist. Then create or update the /etc/systemd/system/docker.service.d/http-proxy.conf file with the variables listed below, taking care to replace <PROXY> with the address and port (and credentials if needed) of your target proxy server:

[Service]
Environment="HTTP_PROXY=<PROXY>"
Environment="HTTPS_PROXY=<PROXY>"

Once configured, flush the changes and restart Docker with the following commands:

sudo systemctl daemon-reload
sudo systemctl restart docker

For more details, refer to the official documentation page: Control Docker with systemd.

Configure the Akamas containers

To allow the Akamas services to connect to addresses outside your intranet, the Docker instance needs to be configured to forward the proxy configuration to the Akamas containers.

Update the ~/.docker/config.json file adding the following field to the JSON, taking care to replace <PROXY> with the address (and credentials if needed) of your target proxy server:

{
  ...
  "proxies": {
    "default": {
      "httpProxy": "<PROXY>",
      "httpsProxy": "<PROXY>",
      "ftpProxy": "<PROXY>",
      "noProxy": "localhost,127.0.0.1,/var/run/docker.sock,database,optimizer,campaign,analyzer,telemetry,log,elasticsearch,metrics,system,license,store,orchestrator,airflow-db,airflow-webserver,kong-database,kong,user-service,keycloak,logstash,kibana,akamas-ui,grafana,prometheus,node-exporter,cadvisor,konga,benchmark"
    }
  }
}

For more details, refer to the official documentation page: Configure Docker to use a proxy server.

Run Akamas

Set the following variables to configure your working environment, taking care to replace <PROXY> with the address (and credentials if needed) of your target proxy server:

export HTTP_PROXY='<PROXY>'
export HTTPS_PROXY='<PROXY>'

Once configured, you can log into the ECR repository through the AWS CLI and start the Akamas services manually.

Akamas Architecture

Akamas is based on a microservices architecture where each service is deployed as a Docker container and communicates with other services via REST APIs on a dedicated machine (Akamas Server).

The following figure represents the high-level Akamas architecture.

Interact with Akamas

Users can interact with Akamas via the Graphical User Interface (GUI), the Command-Line Interface (CLI), or the Application Programming Interface (API).

Both the GUI and the CLI leverage HTTP/S APIs that pass through an API gateway (based on Kong), which authenticates users by interacting with Akamas access management and routes requests to the different services.

The Akamas CLI can be invoked on either the Akamas Server itself or on a different machine (e.g. a laptop or another server) where the Akamas CLI has been installed.

Repositories

Akamas data is securely stored in different databases:

  • time series data gathered from telemetry providers are stored in Elasticsearch;

  • application logs are also stored in Elasticsearch;

  • data related to systems, studies, workflows, and other user-provided data are stored in a Postgres database.

Notice: Postgres, Elasticsearch, and any other service included within Akamas are provided by Akamas as Docker container images as part of the Akamas installation package.

Services

Core Services

The following Spring-based microservices represent Akamas core services:

  • System Service: holds information about metrics, parameters, and systems that are being optimized

  • Campaign Service: holds information about optimization studies, including configurations and experiments

  • Metrics Service: stores raw performance metrics (in Elasticsearch)

  • Analyzer Service: automates the analysis of load tests and provides related functionalities such as smart windowing

  • Telemetry Service: takes care of integrating different data sources by supporting multiple Telemetry Providers

  • Optimizer Service: combines different optimization engines to generate optimized configurations using ML techniques

  • Orchestrator Service: manages the execution of user-defined workflows to drive load tests

  • User Service: takes care of user management activities such as user creation or password changes

  • License Service: takes care of license management activities, optimization pack, and study export.

Ancillary Services

Akamas also provides advanced management features like logging, self-monitoring, licensing, user management, and more.

Setup HTTPS configuration

Akamas APIs and UI use plain HTTP when they are first installed. To enable the use of HTTPS you will need to:

  1. Ask your security team to provide you with a valid certificate for your server; you will need to provide them with the Akamas Server DNS name. The certificate usually consists of two files with ".key" and ".pem" extensions.

  2. Create a folder named "certs" in the same directory of Akamas docker-compose file;

  3. Copy the ".key" and ".pem" files in the created "certs" folder and rename them to "akamas.key" and "akamas.pem" respectively. Make sure that the files belong to the same user and group you use to run Akamas.

  4. Restart two Akamas services by running the following commands:

    cd <Akamas docker-compose file folder>
    docker-compose restart akamas-ui kong

After the containers reboot is complete you will be able to access the UI over https from your browser:

https://<akamas server name here>
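If a certificate from your security team is not available, a self-signed certificate can be generated instead; a minimal sketch using OpenSSL, with the CN as a placeholder (remember that self-signed certificates require Verify SSL to be set to False, as noted in the Change CLI configuration section):

openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
  -keyout certs/akamas.key -out certs/akamas.pem \
  -subj "/CN=<akamas server dns name>"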

Setup CLI to use HTTPS

Now that your Akamas server is configured to use HTTPS you can update the Akamas CLI configuration in order to use the secure protocol.

akamas init config

You will be prompted to enter some input; please fill it in as follows:

Api address [http://localhost:8000]: https://<akamas server dns address>:8443
Workspace [default]: default
Verify SSL: [True]: True

You can test the connection by running:

akamas status

It should return ‘OK’, meaning that Akamas has been properly configured to work over HTTPS.

Install the Akamas CLI

This section describes how to install the Akamas CLI on a workstation.

The Akamas CLI allows users to invoke commands against the Akamas dedicated machine (Akamas Server). The Akamas CLI can also be installed on a different system than the Akamas Server.

Prerequisites

Linux and Windows operating systems are supported for installing the Akamas CLI.

Installation steps

The Akamas CLI can be installed and configured in three simple steps:

  1. Setup the Akamas CLI

  2. Verify the Akamas CLI

  3. Initialize the Akamas CLI

If you have not yet installed the Akamas CLI, follow the CLI installation guide in order to install it. If you already have the CLI available, you can proceed to initialize it. You can also read the Change CLI config section to modify the configuration, such as the Akamas Server ports the CLI connects to.

Verify the Akamas CLI

The Akamas CLI can be accessed by simply running the akamas command.

You can verify that the CLI has been installed by running this command:
akamas --version

which should show an output similar to this one

Akamas CLI, version 2.7.0

At any time, you can use the following command to see available commands and options.

akamas --help

For the full list of Akamas commands please refer to the CLI reference section of the Akamas Reference guide.

Change CLI configuration

The CLI configuration contains the information required to communicate with the Akamas Server. It can be easily created and updated with a configuration wizard. This page describes the main options of the Akamas CLI and how to modify them.

API Address

The CLI, as well as the UI, interacts with the Akamas Server via APIs. The apiAddress configuration contains the information required in order to communicate with the server.

The Akamas Server provides two different listeners to interact with APIs:

  • an HTTP listener on port 8000

  • an HTTPS listener on port 8443

For improved security, it is recommended to configure CLI communications with the Akamas Server over HTTPS. Notice that you need to have a valid certificate installed on your Akamas server (at least a self-signed one) in order to enable HTTPS communication between CLI and the Akamas Server.

Changing CLI protocol

The CLI can be configured either directly via the CLI itself or via the YAML configuration file akamasconf.

Using the CLI

Issue the following command to change the configuration of the Akamas CLI:

akamas init config

and then follow the wizard to provide the required CLI configuration:

  • enable HTTPS communications:

Api address [http://localhost:8000]: https://<akamas server dns name>:8443
Workspace [default]: Workspace1
Verify SSL: [True]: True
Is external certificate CA required? [y/N]: N
  • enable HTTP communications:

Api address [http://localhost:8000]: http://<akamas server dns name>:8000
Workspace [default]: Workspace1

Please notice that Verify SSL must be set to True only if you are using a valid certificate. If you are using a self-signed one, please set it to False. This mimics the behavior of accepting an invalid HTTPS certificate in your favorite browser.

Using the akamasconf file

Create a file named akamasconf in the following location:

  • Linux: ~/.akamas/akamasconf

  • Windows: C:\Users\<username>\.akamas (where C: is the drive where the OS is installed)

The file location can be customized by setting an $AKAMASCONF environment variable.

Here is an example akamasconf file provided as a sample:

apiAddress: http[s]://<akamas server dns name>:8000[8443]
verifySsl: [true|false]
organization: akamas
workspace: default

An SSL certificate is only required if verifySsl is set to true; in this case, the SSL certificate requires an external CA to be validated.
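For instance, a filled-in configuration could be created as follows (the hostname akamas.example.com is a placeholder):

mkdir -p ~/.akamas
cat > ~/.akamas/akamasconf <<'EOF'
apiAddress: https://akamas.example.com:8443
verifySsl: true
organization: akamas
workspace: default
EOF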

Setup the Akamas CLI

Linux

To get Akamas CLI installed on Linux, run the following commands:

curl -o akamas_cli https://s3.us-east-2.amazonaws.com/akamas/cli/$(curl https://s3.us-east-2.amazonaws.com/akamas/cli/stable.txt)/linux_64/akamas
sudo mv akamas_cli /usr/local/bin/akamas
chmod 755 /usr/local/bin/akamas

You can now run the Akamas CLI by running the akamas command.

In some installations, the /usr/local/bin folder is not present in the PATH environment variable. This prevents you from using akamas without specifying the complete file location. To fix this issue you can add an entry to the PATH system environment variable or move the executable to another folder in your PATH.

Auto-completion

To enable auto-completion on Linux systems with a bash shell (requires bash 4.4+), run the following commands:

curl -O https://s3.us-east-2.amazonaws.com/akamas/cli/$(curl https://s3.us-east-2.amazonaws.com/akamas/cli/stable.txt)/linux_64/akamas_autocomplete.sh
mkdir -p ~/.akamas
mv akamas_autocomplete.sh ~/.akamas
echo '. ~/.akamas/akamas_autocomplete.sh' >> ~/.bashrc
source ~/.bashrc

Windows

To install the Akamas CLI on Windows run the following command from the Powershell:

Invoke-WebRequest "https://s3.us-east-2.amazonaws.com/akamas/cli/$($(Invoke-WebRequest https://s3.us-east-2.amazonaws.com/akamas/cli/stable.txt | Select-Object -Expand Content) -replace '\n', '')/win_64/akamas.exe" -OutFile akamas.exe

You can now run the Akamas CLI by running .\akamas in the same folder.

To invoke the Akamas CLI from any folder, create an akamas folder (such as C:\Program Files\akamas) and move the akamas.exe file there. Then, add an entry to the PATH system environment variable with the value C:\Program Files\akamas. Now, you can invoke the CLI from any folder by simply running the akamas command.

Verify the Akamas Server

Run the following command to verify the correct startup and initialization of Akamas:

akamas status

When all services have been started this command will return an "OK" message. Please notice that it might take a few minutes for Akamas to start all services.

To check that the UI is also working properly, please access the following URL:

http://<akamas server name here>

You will see the Akamas login form:

Please notice that it is impossible to log into Akamas before a license has been installed. Read here how to install an Akamas license.

Manage anonymous data collection

Akamas might collect anonymized usage information on running optimizations. Collection and tracking are enabled by default and can be manually disabled.

External tracking is managed through the following environment variables:

  • AKAMAS_TRACKER_URL: the target URL for all tracking info.

  • AKAMAS_TRACKING_OPT_OUT: when set to 1, disables anonymous data collection.

Tracking for a running instance can be disabled by executing this simple command in the folder where the Akamas compose file is located:

AKAMAS_TRACKING_OPT_OUT=1 docker-compose up -d

As usual with environment variables, it is strongly suggested to export the desired value in your ~/.bashrc file to ensure persistence:

echo "export AKAMAS_TRACKING_OPT_OUT=1" >> ~/.bashrc
source ~/.bashrc

Install the Akamas license

Logging into Akamas requires a valid Akamas license.

To install a license get in touch with Akamas Customer Service to receive:

  • the Akamas license file

  • your assigned values for the AKAMAS_CUSTOMER and AKAMAS_BASE_URL variables referenced in the license file

  • login credentials

Once you have this information, you can issue the following commands:

cd <your bundle files location>

akamas install license <license file you have been provided>

akamas login
# prompt for user and password

Manage the Akamas Server

This section covers several topics related to how to manage the Akamas Server:

Akamas logs
Audit logs
Install upgrades and patches
Backup & Recovery of the Akamas Server
Monitor the Akamas Server

Audit logs

Akamas audit logs

Akamas stores all its logs into an internal Elasticsearch instance: some of these logs are reported to the user in the GUI in order to ease the monitoring of workflow executions, while other logs are only accessible via CLI and are mostly used to provide more context and information to support requests.

Audit access can be performed by using the CLI to extract logs related to UI and API access, both of which are routed through the kong gateway service. For instance, to extract audit logs from the last hour use the following command:

akamas logs --no-pagination -S kong -f -1h

Notice: to visualize the system logs unrelated to the execution of workflows bound to workspaces, you need an account with administrative privileges.

Storing audit logs into files

To ease the integration with external logging systems, Akamas can be configured to store access logs into files. To enable this feature you should:

  1. Create a logs folder next to the Akamas docker-compose.yml file

  2. Edit the docker-compose.yml file by modifying the line FILE_LOG: "false" to FILE_LOG: "true"

  3. If Akamas is already running issue the following command

docker-compose up -d logstash

otherwise, start Akamas first.

When the user interacts with the UI or the API, Akamas reports detailed access logs both in the internal database and in a file in the logs folder. To ease log rolling and management, every day Akamas creates a new file named according to the pattern access-%{+YYYY-MM-dd}.log.
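For example, access logs collected on January 31st, 2023 would be written to a file named access-2023-01-31.log.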

Troubleshoot install issues

This section describes some of the most common issues found during the Akamas installation.

Issues when installing Docker

Centos 7 and RHEL 7

Notice: these distros feature a known issue: the default Docker execution group is named dockerroot instead of docker. To make Docker work, edit (or create) /etc/docker/daemon.json to include the following fragment:

{
  "group": "dockerroot"
}

After editing or creating the file, please restart Docker and then check the group permissions of the Docker socket (/var/run/docker.sock), which should show dockerroot as the group:

srw-rw----. 1 root dockerroot 0 Jul  4 09:57 /var/run/docker.sock

Then, add the newly created akamas user to the dockerroot group so that it can run docker containers:

sudo usermod -aG dockerroot <user_name>

and check that the akamas user has been correctly added to the dockerroot group by running:

lid -g dockerroot

Issues when running AWS CLI

In case of issues in logging in through AWS CLI, when executing the following command:

aws ecr get-login-password --region us-east-2

Please check that:

  • Environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION are correctly set

  • AWS CLI version is 2.0+ - we recommend using the official AWS CLI installation guide for a smoother experience

Issue when starting Akamas services

Akamas failed to start some services

Please notice that the very first time Akamas is started, up to 30 minutes might be required to initialize the environment.

In case the issue persists, you can run the following command to identify which service is not able to start up correctly:

akamas status -d

License service unable to access docker socket

In some systems, the Docker socket, usually located at /var/run/docker.sock, cannot be accessed from within a container. Akamas signals this behavior by reporting an Access Denied error in the license service logs.

To overcome this limitation edit the docker-compose.yaml file adding the line privileged: true to the following services:

  • License

  • Optimizer

  • Telemetry

  • Airflow

The following is a sample configuration where this change is applied to the license service:

license:
  image: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/license_service:2.3.0
  container_name: license
  privileged: true

Finally, you can issue the following command to apply these changes

docker-compose up -d

Missing Akamas Customer variable

When installing Akamas it is mandatory to export the AKAMAS_CUSTOMER variable as illustrated in the installation guide. This variable must match the one provided by Akamas representatives when issuing a license: if the variable is not properly exported, the license installation will fail with an error message indicating that the name of the customer installation does not match the one provided in the license.

You can easily inspect which value of this variable has been used when starting Akamas by running the following command on the Akamas server:

docker inspect license | grep AKAMAS_CUSTOMER

If you find out that the value is not the one you expect you can change it by running the following command on the Akamas server:

AKAMAS_CUSTOMER=<your-new-value> docker-compose up -d license

Once Akamas is up and running you can re-install your license.

Other issues

Initialize Akamas CLI

The CLI is used to interact with an Akamas server. To initialize the configuration of the Akamas CLI, run the command:

akamas init config

and follow the wizard to provide the required information such as the server IP.

Here is a summary of the configuration wizard options.

Api address [http://localhost:8000]: https://<akamas server dns name>:8443
Workspace [default]: default
Verify SSL: [True]: True
Is external certificate CA required? [y/N]: N

This configuration can be changed at any time (see how to change the CLI config).

After this step, the Akamas CLI can be used to login to the Akamas server, by issuing the following command:

akamas login

and providing the credentials as requested.

Logging into Akamas requires a valid license. If you have not installed your license yet, refer to the Install the Akamas license page.

For any other issues please contact Akamas Customer Support Services.

Akamas logs

Akamas allows dumping log entries for a specific service, workspace, workflow, study, trial, or experiment, for a specific timeframe, and at different log levels.

Akamas CLI for logs

Akamas logs can be dumped via the following CLI command:

akamas log

This command provides many filters which can be retrieved with the following command:

akamas log --help

which should return

Usage: akamas log [OPTIONS] [MESSAGE]

  Show Akamas logs

Options:
  -d, --debug                     Show extended error messages if present.
  --page-size INTEGER             Number of log's lines to be retrieved NOTE:
                                  This argument is mutually exclusive with
                                  arguments: [dump, no_pagination].
  --no-pagination                 Disable pagination and print all logs NOTE:
                                  This argument is mutually exclusive with
                                  arguments: [dump, page_size].
  --dump                          Print the logs without pagination and
                                  formatting NOTE: This argument is mutually
                                  exclusive with arguments: [page_size,
                                  no_pagination].
  -f, --from [%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d %H:%M:%S|%Y-%m-%dT%H:%M:%S.%f|%Y-%m-%d %H:%M:%S.%f|[-]nw|[-]nd|[-]nh|[-]nm|[-]ns]
                                  The start timestamp of the logs
  -t, --to [%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d %H:%M:%S|%Y-%m-%dT%H:%M:%S.%f|%Y-%m-%d %H:%M:%S.%f|[-]nw|[-]nd|[-]nh|[-]nm|[-]ns]
                                  The end timestamp of the logs
  -s, --study TEXT                UUID or name of the Study
  -e, --exp INTEGER               Number of the experiment
  --trial INTEGER                 Number of the trial
  -y, --system TEXT               UUID or name of the System
  -W, --workflow TEXT             UUID or name of the Workflow
  -l, --log-level TEXT            Log level
  -S, --service TEXT              Akamas service
  --without-metadata              Hide metadata
  --sorting [ASC|DESC]            Sorting order of the timestamps
  -ws, --workspace TEXT           UUID or name of the Workspace to visualize.
                                  When empty, system logs will be returned
                                  instead
  --help                          Show this message and exit.

For example, to get the list of the most recent Akamas errors:

akamas log -l ERROR

which should return something similar to:

       timestamp                         system                  provider    service                                                                                   message
==============================================================================================================================================================================================================================================================
2022-05-02T15:51:26.88    -                                      -          airflow     Task failed with exception
2022-05-02T15:51:26.899   -                                      -          airflow     Failed to execute job 2 for task Akamas_LogCurator_Task
2022-05-02T15:56:29.195   -                                      -          airflow     Task failed with exception
2022-05-02T15:56:29.215   -                                      -          airflow     Failed to execute job 3 for task Akamas_LogCurator_Task
2022-05-02T16:01:55.587   -                                      -          license     2022-05-02 16:01:47.426 ERROR 1 --- [           main] c.a.m.utils.rest.RestHandlers            :  has failed with returning a response:
                                                                                        {"httpStatus":400,"timestamp":"2022-05-02T16:01:47.413638","error":"Bad Request","message":"The following metrics: 'spark.spark_application_duration' were not found
                                                                                        in any of the components of the system 'analytics_cluster'","path":null}
2022-05-02T16:01:55.587   -                                      -          license     2022-05-02 16:01:47.434 ERROR 1 --- [           main] c.a.m.MigrationApplication               : Unable to complete operation. Mode: RESTORE. Cause: A request to a
                                                                                        downstream service CampaignService has failed: 400 : [{"httpStatus":400,"timestamp":"2022-05-02T16:01:47.413638","error":"Bad Request","message":"The following
                                                                                        metrics: 'spark.spark_application_duration' were not found in any of the components of the system 'analytics_cluster'","path":null}]
2022-05-02T16:01:55.678   -                                      -          license     2022-05-02 16:01:47.434 ERROR 1 --- [           main] c.a.m.MigrationApplication               : Unable to complete operation. Mode: RESTORE. Cause: A request to a
                                                                                        downstream service CampaignService has failed: 400 : [{"httpStatus":400,"timestamp":"2022-05-02T16:01:47.413638","error":"Bad Request","message":"The following
                                                                                        metrics: 'spark.spark_application_duration' were not found in any of the components of the system 'analytics_cluster'","path":null}]
2022-05-02T16:01:55.678   -                                      -          license     2022-05-02 16:01:47.426 ERROR 1 --- [           main] c.a.m.utils.rest.RestHandlers            :  has failed with returning a response:
                                                                                        {"httpStatus":400,"timestamp":"2022-05-02T16:01:47.413638","error":"Bad Request","message":"The following metrics: 'spark.spark_application_duration' were not found
                                                                                        in any of the components of the system 'analytics_cluster'","path":null}
2022-05-02T16:12:10.261   -                                      -          license     2022-05-02 16:05:53.209 ERROR 1 --- [           main] c.a.m.services.CampaignService           : de9f5ff9-418e-4e25-ae2c-12fc8e72cafc
2022-05-02T16:32:07.216   -                                      -          license     2022-05-02 16:31:37.330 ERROR 1 --- [           main] c.a.m.services.CampaignService           : 06c4b858-8353-429c-bacd-0cc56cc44634
2022-05-02T16:38:18.522   -                                      -          campaign    Internal Server Error: Object of class [com.akamas.campaign_service.entities.campaign.experiment.Experiment] with identifier
                                                                                        [ExperimentIdentifier(workspace=ac8481d3-d031-4b6a-8ae9-c7b366f027e8, study=de9f5ff9-418e-4e25-ae2c-12fc8e72cafc, id=2)]: optimistic locking failed; nested exception
                                                                                        is org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) :
                                                                                        [com.akamas.campaign_service.entities.campaign.experiment.Experiment#ExperimentIdentifier(workspace=ac8481d3-d031-4b6a-8ae9-c7b366f027e8,
                                                                                        study=de9f5ff9-418e-4e25-ae2c-12fc8e72cafc, id=2)]
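These filters can also be combined. For example, to dump all the ERROR-level entries of the last two hours for a hypothetical study named my-study, without pagination:

akamas log -l ERROR -f -2h -s my-study --no-pagination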

Backup & Recovery of the Akamas Server

Akamas server backup

The process of backing up an Akamas server can be divided into two parts: system backup and user data backup. Backups can be performed in any way you see fit: since they are just regular files, you can use any backup tool.

System backup

System service images are hosted on the AWS ECR repository, so the only thing that fully defines a working Akamas application is the docker-compose.yml file. Performing a backup of the Akamas application is as simple as copying this single file to your backup location. You may schedule a script that performs this copy weekly or at any other frequency you see fit.
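For example, a crontab entry along the following lines would back up the file every Sunday at 3 AM (paths are placeholders; note that the % character must be escaped in crontab):

0 3 * * 0 cp $HOME/akamas/docker-compose.yml /backup/akamas/docker-compose.yml.$(date +\%F)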

User data backup

You may list all existing Akamas studies via the Akamas CLI command:

akamas list study

Then you can export all existing studies one by one via the CLI command

akamas export study <UUID>

where UUID is the UUID of a single study. This command exports the study into a single archive file (tar.gz). These archive files can be backed up to your favorite backup folder.
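As a sketch, the whole export can be scripted; the parsing below assumes that the UUID is the first column of the akamas list study output (after a header line) and that the archives are written to the current directory, so adapt it to your CLI version:

# export every study, then copy the archives to the backup folder
for uuid in $(akamas list study | awk 'NR>1 {print $1}'); do
  akamas export study "$uuid"
done
cp ./*.tar.gz /backup/akamas/studies/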

Akamas server recovery

Akamas server recovery involves restoring the system backup, restarting the Akamas services, and then re-importing the studies.

System Restore

To restore the system, you must recover the original docker-compose.yml file and then launch the command

docker-compose up -d

from the folder where you placed this YAML file and then wait for the system to come up, by checking it with the command

akamas status -d

User data restore

All studies can be re-imported one by one with the CLI command (referring to the correct pathname of the archive):

akamas import study archive.tgz

Install upgrades and patches

Akamas patches and upgrades need to be installed by following the specific instructions included in the provided package. In the case of new releases, it is recommended to read the related Release Notes. Under normal circumstances, this usually requires the user to update the docker-compose configuration, as described in the next section.

Docker compose Configuration

When using Docker Compose to install Akamas, there is a folder (usually named akamas, in the user home folder) that contains a docker-compose.yml file. This is a YAML text file that contains the list of Docker services, with the image URLs/versions pointing to the ECR repository hosting all the Docker images needed to launch Akamas.

Here’s an excerpt of such a docker-compose.yml file (this example contains 3 services only):

services:
  #####################
  # Database Service #
  #####################
  database:
    image: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/master-db:1.7.0
    container_name: database2
    restart: always
    command: postgres -c max_connections=200

  #####################
  # Optimizer Service #
  #####################
  optimizer:
    image: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/optimizer_service:2.3.0
    container_name: optimizer
    restart: always
    networks:
      - akamas2
    depends_on:
      - database
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /tmp/build/engine_input:/tmp/build/engine_input

  ####################
  # Campaign Service #
  ####################
  campaign:
    image: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/campaign_service:2.3.0
    container_name: campaign
    restart: always
    volumes:
      - config:/config
    networks:
      - akamas2
    depends_on:
      - database
      - optimizer
      - analyzer

The relevant lines that usually have to be patched during an upgrade are the lines with key "image" like:

image: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/master-db:1.7.0
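For example, a version bump can be scripted with sed (1.8.0 here is just an illustrative target version; always ask Akamas support for the correct one):

sed -i 's|akamas/master-db:1.7.0|akamas/master-db:1.8.0|' docker-compose.yml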

In order to update to a new version, replace the versions (1.7.0 or 2.3.0) after the colon with the new ones (ask Akamas support for the correct service versions for a specific Akamas release), then restart Akamas with the following console commands.

First, log in to the Akamas CLI with:

akamas login

and type username and password as in the example below

ubuntu@ak_machine:~/akamas/ $ akamas login
User: akamas
Password:
User akamas logged in. Welcome.

Now make sure you have the following AWS variables with the proper value in your Linux user environment:

AWS_DEFAULT_REGION
AWS_SECRET_ACCESS_KEY
AWS_ACCESS_KEY_ID

Then log in to AWS with the following command:

aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin 485790562880.dkr.ecr.us-east-2.amazonaws.com
Login Succeeded

Then pull all the new ECR images for the service versions you just changed (this should be done from within the same folder where the docker-compose.yml file resides, usually $HOME/akamas/) with the following command:

docker-compose pull

It should return an output like the following:

Pulling database                ... done
Pulling optimizer               ... done
Pulling elasticsearch           ... done
Pulling log                     ... done
Pulling metrics                 ... done
Pulling telemetry               ... done
Pulling analyzer                ... done
Pulling campaign                ... done
Pulling system                  ... done
Pulling license                 ... done
Pulling store                   ... done
Pulling airflow-db              ... done
Pulling benchmark               ... done
Pulling kong-database           ... done
Pulling kong                    ... done
Pulling user-service            ... done
Pulling keycloak                ... done
Pulling logstash                ... done
Pulling kibana                  ... done
Pulling kong-consumer-init      ... done
Pulling kong-migration          ... done
Pulling keycloak-initializer    ... done
Pulling telemetry-init          ... done
Pulling curator-only-pull-image ... done
Pulling airflow                 ... done
Pulling orchestrator            ... done
Pulling akamas-init             ... done
Pulling akamas-ui               ... done
Pulling pg-admin                ... done
Pulling grafana                 ... done
Pulling prometheus              ... done
Pulling node-exporter           ... done
Pulling cadvisor                ... done
Pulling konga                   ... done

Finally, relaunch all services with:

docker-compose up -d

(usage example below)

ubuntu@ak_machine:~/akamas/ $ docker compose up -d
pgadmin4 is up-to-date
prometheus is up-to-date
benchmark is up-to-date
kibana is up-to-date
node-exporter is up-to-date
store is up-to-date
grafana is up-to-date
cadvisor is up-to-date
Starting telemetry-init ...
Starting curator-only-pull-image ...
Recreating database2             ...
Recreating airflow-db            ...
Starting kong-initializer        ...
akamas-ui is up-to-date
elasticsearch is up-to-date
Recreating kong-db               ...
Recreating metrics               ...
logstash is up-to-date
Recreating log                   ...
...(some logging follows)

Wait for a few minutes and check the Akamas services are back up by running the command:

akamas status -d

The expected output should be like the following (repeat the command after a minute or two if the last line is not "OK" as expected):

Checking Akamas services on http://localhost:8000
 service	 status
=========================
analyzer UP
campaign UP
metrics UP
optimizer UP
orchestrator UP
system UP
telemetry UP
license UP
log UP
users UP
OK

Using Akamas

This section describes how to use Akamas

This guide introduces the optimization process and methodology with Akamas and then provides a step-by-step description of how to prepare, run and analyze Akamas optimization studies:

  • General optimization process

  • Preparing optimization studies

  • Running optimization studies

and also provides some technology-specific guidelines and examples on:

  • Guidelines for choosing optimization parameters

  • Guidelines for defining optimization studies

Monitor the Akamas Server

External tools

You can use any monitoring tool to check the availability of the Akamas instance.

Checking Akamas services

To check the status of the Akamas services, run akamas status -d; its output also helps identify any service that is not able to start up correctly.

Here is an example of output:

Checking Akamas services on http://localhost:8000
 service	 status
=========================
analyzer       	UP
campaign       	UP
metrics        	UP
optimizer      	UP
orchestrator   	UP
system         	UP
telemetry      	UP
license        	UP
log            	UP
users          	UP
OK
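For example, a minimal probe suitable for an external scheduler or monitoring agent could check the final OK line (a sketch only; wire the failure branch to your alerting tool):

akamas status -d | tail -n 1 | grep -q '^OK$' || echo "Akamas services degraded"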


General optimization process and methodology

Akamas has been designed and implemented to effectively support organizations in implementing their own approach to optimization, in particular, thanks to its Infrastructure as Code (IaC) design, modular and reusable constructs, and delegation-of-duty features to support multiple teams.

While an optimization process can also be a one-shot exercise aiming at optimizing a specific critical application to remediate performance issues or to address a cost reduction initiative, in general, optimization is conceived as a continuous and iterative process. This process can be seen as composed of multiple optimization campaigns (each typically involving a single application) being executed at the same time (see the following figure).

At any given timeframe, for a specific application, there could be multiple studies being executed either in parallel or in sequence (see the following figure):

  • multiple live optimizations running for each critical application microservice; typically, a live optimization focuses on an application microservice supporting a specific business function, with its own optimization goals and constraints: the optimization could be aimed, for some microservices, at improving performance while trading off cost, and, for others, at keeping performance within the SLOs while reducing infrastructure or cloud costs;

  • multiple offline optimization studies, which may correspond to: the different layers of the target system being optimized in several stages (typically starting with the backend layer, then the middleware, and finally the front-end layer); several application releases with a different resource footprint (e.g. higher memory usage), involving technology changes in the application stack (e.g. moving from Oracle to MongoDB), or migrating to a different cloud provider (or cloud managed service); or studies required to sustain a higher workload (e.g. due to a marketing campaign) or to ensure application resilience under failure scenarios (identified by chaos engineering).

The following figure intends to illustrate the variety of scenarios in a real optimization process:

For example (with reference to the previous figure):

  • the optimization campaign for the microservices-based application App-1 runs an offline optimization study for the App-1-1 microservice in Q1 and the App-1-2 microservice in Q2, before running live optimizations for both these microservices in parallel starting from Q3; notice that in Q4, possibly to anticipate a workload growth and assess the required infrastructure, an offline optimization for App-1-2 (possibly the most resource-demanding microservice) is also executed;

  • the optimization campaign for the standalone application App-2 runs several offline optimizations in sequence: in Q1 and Q2, first separately on the frontend and backend layers of App-2 (respectively App-2-FE and App-2-BE) and then in Q3 for the entire application; in Q4, in addition to the quarterly optimization for App-2 with respect to the goal Goal-2-1 that was used in the previous optimizations, also another offline optimization is executed with respect to a different goal Goal-2-2, which could either be a refinement of the previous goal (e.g. with tighter SLOs) or reflecting a completely different goal (e.g. a cost-reduction goal with respect to a performance improvement goal);

  • the optimization campaign for the microservices-based application App-3 first runs a live optimization starting at some point in Q2 (for example, as the application is first released) for the most critical microservice App-3-1 and then, in Q3, also for the other microservice App-3-2, possibly as a refinement of the modeling of App-3 based on the observed optimization results.

Preparing optimization studies

In Akamas, an optimization campaign is structured into one or more optimization studies, which represent an optimization initiative aimed at optimizing a target system with respect to defined goals and constraints. These studies can be either offline optimization studies, which are typically executed in test or pre-production environments, also to validate planned changes or what-if scenarios, or live optimization studies, which run directly in production environments.

More complex scenarios may result in the case of multiple teams working (jointly or separately) on the same or different applications, which in Akamas can be organized in different workspaces.

Preparing an optimization study requires several steps, as illustrated by the following figure:

and described in the following sections:

  • modeling systems

  • modeling components

  • creating telemetry instances

  • creating automation workflows

  • creating optimization studies

Notice that while these steps apply to both offline optimization studies and live optimization studies, some of these steps are different depending on which optimization is being prepared.

Modeling systems

The very first preparatory step is to model the system representing an application or a service that needs to be optimized (also known as the optimization target).

Modeling a system translates into identifying the components representing the key technology elements to be included in the optimization. Each component is associated with a set of tunable parameters, i.e. configurable properties that impact the performance, efficiency, or reliability of the system, and with a set of metrics, i.e. measurable properties that are used to evaluate the performance, efficiency, or reliability of the system. Typically, key system components are identified by considering which elements and parameters need to be tuned.

Akamas provides several out-of-the-box component types to support system and component modeling. Moreover, it is also possible to define new component types to model other components (see Modeling components).

The following figure shows a system corresponding to a Java-based application, where the Java Virtual Machine (JVM) and Kubernetes containers have been identified as key components.

As shown in this figure, a supported component type is the "web application", representing the end-user perspective of the modeled system (e.g. response time). As expected, this component type only provides measured metrics and no tunable parameters.

The System template section of the reference guide describes the template required to define a system, while the commands for creating a system are listed on the Resource Management command page.

Please also notice that systems (and other Akamas artifacts) can be shared with different teams thanks to the definition of Akamas workspaces.

Best Practices

Properly modeling the application or service to be optimized by identifying the components and their parameters to tune is the first important step in the optimization process. Some best practices are described here below.

Modeling only relevant components

When defining the system and its components, it is convenient to focus only on those components that are either providing tunable parameters or key metrics (or KPIs).

Key metrics are those used to:

  • define the optimization goal and constraints, either as metrics that are expected to be improved by the optimization or as metrics representing constraints; for example, a typical goal is to optimize the application throughput, in which case a Web Application component should include service metrics such as transaction throughput or transaction response time;

  • support the analysis of the optimization results, as metrics that are useful to measure the impact of parameter tuning on the performance, efficiency, or reliability of the system. For example, a Linux OS component could be used to assess the impact of the optimization on system-level metrics such as CPU utilization.

Please note that the metrics used to define the optimization goal and constraints are mandatory as they are used by the Akamas AI engine to validate and score each tested configuration against the goal. Other metrics that are not related to the optimization goal and constraints can be considered optional from a pure optimization implementation perspective.

When defining the optimization study, it is always possible to select which parameters and metrics to consider, thus which components are modeled in the system. Therefore, a system could be modeled by all components that at some point are going to be optimized, even if not used in the current optimization study. However, the recommended approach is to model the system only with components whose parameters (and relevant metrics) are to be tuned by the current study.

Reusing systems whenever possible

Whenever possible, it is recommended to model systems and their components by considering how these could be reused for multiple optimization studies in different contexts.

For example, it might be useful to create a simple system containing only one component (e.g. the JVM) for a first optimization study. A new system might then be created to include other components (e.g. the application server) for more advanced optimization studies.

Modeling systems with horizontal scalability

A typical optimization target is a cluster, i.e. a system made of multiple instances that provide horizontal scalability (e.g. a Kubernetes deployment with several replicas). In this scenario, all the instances are supposed to be identical both from a code and a configuration perspective, so the recommended approach is to create only one component that represents a generic instance of the cluster. This way, all the instances will be tuned in exactly the same way.

In this scenario, the associated automation workflow needs to be configured to ensure that each configuration is applied to the whole cluster, by propagating the parameter configuration to all of the cluster instances, not just to the single instance represented by the modeled component, whose metrics are collected and used to evaluate the overall cluster behavior under that configuration.

Notice that in order for this approach to work correctly, it is also important to verify that the cluster is correctly monitored by the telemetry providers. Depending on the telemetry technology in use, the clustered system may be presented as either a single entity, with aggregated metrics (e.g. a Kubernetes deployment with the total CPU usage of all the replica pods), or as multiple entities, each corresponding to the different instances in the cluster:

  • in case aggregated metrics are provided by the telemetry provider for the cluster, these metrics can be simply assigned to the component modeling the whole cluster;

  • in case only instance-level metrics are made available by the telemetry provider, telemetry instances need to be configured in Akamas so as to aggregate the metrics of the cluster instances (e.g. averaging CPU utilization, summing memory usage, etc.), depending on how each specific metric is expected to be used in the goal and constraints or in the study results.

Modeling components

After identifying the components that are required to model a system, the following step is to model each identified key component.

Akamas provides the corresponding component types for each specific technology (and possibly version), together with optimization packs describing all the tunable parameters and metrics of interest. The full list of Akamas optimization packs is available on the Optimization packs page of the Akamas reference guide.

While the optimization process does not necessarily require component types and optimization packs to be defined, it is recommended to leverage these constructs to facilitate modularization and reuse. This is possible as the Akamas optimization pack model is extensible: custom optimization packs can be easily created without any programming to allow Akamas optimization capabilities to be applied to virtually any technology.

The Component template section of the reference guide describes the template required to define a system component, while the commands for creating a system component are listed on the Resource Management command page.

The Creating custom optimization packs page describes how to create a new optimization pack (possibly by reusing an already existing one), while the Component Type template page in the Akamas reference guide describes how to define a custom component type (if required).

Notice that optimization packs, even if provided out-of-the-box by Akamas, need to be installed (as described on the Managing optimization packs page) in case they have not yet been used in the Akamas installation by other users. Indeed, optimization packs are global resources that are shared across all the workspaces on the same Akamas installation.

Creating custom optimization packs

To create a custom optimization pack, the following fixed directory structure and several YAML manifests need to be created.

Optimization pack directory structure

my_dir
|_ optimizationPack.yaml
|_ component-types
|  |_ componentType1.yaml
|
|_ metrics
|  |_ metricsGroup1.yaml
|
|_ parameters
|  |_ parametersGroup1.yaml
|
|_ telemetry-providers
|  |_ provider1.yaml

Optimization pack manifest

The optimizationPack.yaml file is the manifest of the optimization pack to be created, which should always be named optimizationPack and have the following structure:

name: Java_8_Optimization_Pack
description: An optimization pack for the Java Hotspot JVM version 8
weight: 1
version: 1.0.0
tags:
- java
- jvm

where:

Field       | Type            | Value restrictions                   | Is required | Default value  | Description
name        | string          | It should not contain spaces.        | TRUE        | -              | The name of the optimization pack.
description | string          | -                                    | TRUE        | -              | A description to characterize the optimization pack.
weight      | integer         | weight > 0                           | TRUE        | -              | A weight to be associated with the optimization pack. This field is used for licensing purposes.
version     | string          | It should match the regexp: \d.\d.\d | TRUE        | -              | The version of the optimization pack.
tags        | array of string | -                                    | FALSE       | An empty array | A set of tags to make the optimization pack more easily searchable and discoverable.

Component types

The component-types directory should contain the manifests of the component types to be included in the optimization pack. No particular naming constraint is enforced on those manifests. See the Component Types template for details on the structure of those manifests.

Metrics

The metrics directory should contain the manifests of the groups of metrics to be included in the optimization pack. No particular naming constraint is enforced on those manifests. See the Metric template for details on the structure of those manifests.

Parameters

The parameters directory should contain the manifests of the groups of parameters to be included in the optimization pack. No particular naming constraint is enforced on those manifests. See the Parameter template for details on the structure of those manifests.

Telemetry providers

The telemetry-providers directory should contain the manifests of the telemetry providers to be included in the optimization pack. No particular naming constraint is enforced on those manifests. See the Telemetry Provider template for details on the structure of those manifests.

Building optimization pack descriptor

The following command needs to be executed in order to produce the final JSON descriptor:

akamas build optimization-pack PATH_TO_THE_DIRECTORY
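For example, with the my_dir layout shown above:

akamas build optimization-pack my_dir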

After this, the optimization pack can be installed (and then used) as described on the Managing optimization packs page.

Creating telemetry instances

After modeling the system and its components, the following step (see the following figure) is to ensure that all the metrics required to define goals and constraints and to analyze the behavior of the target system can be collected from one of the data sources available in the environment, which in Akamas are called telemetry providers.

Akamas provides a number of out-of-the-box telemetry providers, covering industry-standard monitoring platforms (e.g. Prometheus or Dynatrace), performance testing tools (e.g. LoadRunner or JMeter), and simple CSV files. The Integrating Telemetry Providers section lists all the out-of-the-box telemetry providers and how to get them integrated with Akamas, while the Telemetry metric mapping section describes the mapping of the specific data source metrics to Akamas metrics.

Since several instances of a data source type might be available, the specific data source instance needs to be specified, that is, a corresponding telemetry instance needs to be defined for the modeled system and its components.

The Telemetry instance template section of the reference guide describes the template required to define a telemetry instance, while the commands for creating a telemetry instance are listed on the Resource Management command page.

Akamas makes it possible to validate whether a telemetry setup works correctly by first executing dry runs. This is discussed in the context of the recommended practices to run optimization studies (see the Running optimization studies section).

Telemetry Providers are shared across all the workspaces in the same Akamas installation and require an account with administrative privileges to manage them. Any number of telemetry instances (even of the same type) can be specified. For example, the following figure shows two Prometheus telemetry instances associated with the Adservice system.

Best Practices

The following sections provide guidelines on how to create telemetry instances.

Verify metrics provided by the telemetry provider

A seemingly obvious, yet fundamental, best practice when choosing a telemetry provider is to check whether the required metrics:

  • are supported by the original data source or can be added (e.g. as it is in the case of Prometheus)

  • are available and can be effectively gathered in the specific implementation

  • are supported by the telemetry provider itself or whether it needs to be extended (as is the case for a Prometheus telemetry provider with custom metrics, such as those made available by the application itself)

Creating workflows for offline studies

A workflow for an offline optimization study automates all the actions required to interface the configuration management and load testing tools (see the following figure) at each experiment or trial. Notice that metrics collection is an implicit action that does not need to be coded as part of the workflow.

More in detail, a typical workflow includes the following types of tasks:

  • Preparing the application, by executing all cleaning or reset actions that are required to prepare the load testing phase and ensuring that each experiment is executed under exactly the same conditions - for example, this may involve cleaning caches, uploading test data, etc

  • Applying the configuration, by preparing and then applying the parameter configuration under test to the target environment - this may require interfacing configuration management tools or pushing configuration to a repository, restarting the entire application or some of its components to ensure that some parameters are effectively applied, and then checking that after restarting the application is up & running before the workflow execution continues, and checking whether the configuration has been correctly applied

  • Applying the workload, by launching a load test to assess the behavior of the system under the applied configuration and the synthetic workload defined in the load testing scenarios - of course, a preliminary step is to design a load testing scenario and synthetic workload ensuring that optimized configurations resulting from the offline optimization can be applied to the target system under the real or expected workload

The Optimization examples section provides some examples of how to define workloads for specific technologies. In a complex application, a workflow may include multiple actions of the same type, each operating on separate components of the target system. The Knowledge base guide provides some real-world examples of how to create workflows and optimization studies.

Some additional best practices related to the design and implementation of load testing are described on the Performing load testing to support optimization activities page.

Failing workflows

A workflow is interrupted if any of its steps fails, and a failing workflow causes the experiment or trial to fail. Note that this is a different situation from a specific configuration not matching the optimization constraints or causing the system under test to fail to run: for example, if the configured maximum memory is too low, the application may fail to start.

When an experiment fails, the Akamas AI engine takes this information into account and learns that that parameter configuration was bad. This way, the AI engine automatically tries to avoid the regions of the parameter space that can lead to low scores or failures. This also explains why it is important to build robust workflows that ensure experiments only fail when bad configurations are tested (see the Building robust workflows entry in the best practices section below).

Best Practices

Creating effective workflows is essential to ensure that Akamas can automatically identify the optimal configuration in a reliable and efficient way. Some best practices on how to build robust workflows are described here below.

Reusing workflows as much as possible

Since Akamas workflows are first-class entities that can be used by multiple studies, it might be useful to avoid creating (and maintaining) multiple workflows and instead define workflows that can be easily reused, by factoring all differences into specific action parameters.

Of course, this general guideline should be balanced against other requirements, such as avoiding potential conflicts due to different teams modifying the same workflow for different uses, thus potentially impacting optimization results.

Building robust workflows

Akamas takes into account the exit code of each of the workflow tasks, and the whole workflow fails if a task exits with an error. Therefore, the best practice is to make use of exit codes in each task, to ensure that task failures can only happen in case of bad parameter configuration.

For example, it is important to always check that the application has correctly started and is up and running (after a new configuration has been applied). This can be done by:

  • including a workflow task that tests the application is up and running after the tasks where the configuration is applied;

  • making sure that this task exits with an error in case the application has not correctly started (typically after a timeout).

Another example is when the underlying environment incurs issues during the optimization (e.g. a database might be mistakenly shut down by another team). As much as possible, all these environmental transient issues should be carefully avoided. Akamas also provides the ability to execute multiple task retries (default is twice, configurable) to compensate for these transient issues, provided they only last for a short time (the retry time and delay are also configurable).

Building workflows that ensure reproducible experiments

As for any other performance evaluation activity, Akamas experiments should be designed to be reproducible: if the same experiment (hence, the same parameter configuration) is executed multiple times (i.e. in multiple trials), the same performance results should be found for each trial.

Therefore, it is fundamental that workflows include all the necessary tasks to realize reproducible experiments. Particular care needs to be taken to correctly manage the system state across the experiments and trials. System state can include:

  • Application caches

  • Operating system cache and buffers (e.g. Linux filesystem page cache)

  • Database tables that fill up during the optimization process

All experiments should always start with a clean and well-known state. If the state is not properly managed, it may happen that the performance of the system is observed to change (whether higher or lower) not because of the effect of the applied parameters, but due to other effects (e.g. warming of caches).

Best practices to consistently manage system state across experiments include:

  • Restoring the system state at the beginning of each experiment - this may involve restarting the application, clearing caches, restoring DB tables, etc;

  • Allowing for a sufficient warm-up period in the performance tests, so to ensure application performance has reached stability. See also the recommended best practices about properly managing warm-up periods in the following section about creating an optimization study.

Another common cause that can impact the reproducibility of experiments is an unstable infrastructure or environment. Therefore, it is important to ensure that the underlying infrastructure is stable and that no other workload that might impact the optimization results is running on it. For example, beware of scheduled system jobs (e.g. backups), automatic software updates or anti-virus systems that might not explicitly be considered as part of the environment but that may unexpectedly alter its performance behavior.

Taking into account workflow duration

When designing workflows, it is important to take into account the potential duration of their tasks. Indeed, the task duration impacts the duration of the overall optimization and might impact the ability to execute a sufficient number of experiments within the overall time interval or specific time windows allowed for the optimization study.

Typically, the longest task in a workflow is the one related to applying workload (e.g. launching a load test or a batch job): such tasks can last for dozens of minutes if not hours. However, a workflow may also include other ancillary tasks that may provide nontrivial contributions to the task durations (e.g. checking the status to ensure that the application is up & running).

Making workflows fail fast

As general guidance, it is better to fail fast by performing quick checks executed as early as possible. For example, it is better to do a status check before launching a load test instead of possibly waiting for it to complete (maybe after 1h) just to discover that the application did not even start.


Creating automation workflows

After modeling the system and its components and ensuring that appropriate telemetry instances are defined, the following step (see the following figure) is to define a workflow.

A workflow automates all the tasks to be executed in sequence (see the following figure) during the optimization study, in particular those leveraging integrations with external entities, such as telemetry providers or configuration management tools. Akamas provides a number of general-purpose and specialized workflow operators (see the Workflow Operator page).

The Workflow template section of the reference guide describes the template required to define a workflow, while the commands for creating a workflow are listed on the Resource Management command page.

Since a workflow is an Akamas resource defined at the workspace level that can be used by multiple studies, it might be the case that a convenient workflow is already available or can be used as a basis to create a new workflow for the specific target system and integrations, by adding or removing workflow tasks, changing the task sequence, or changing the values assigned to task parameters.

Notice that since the structure of workflows defined for a live optimization study and for an offline optimization study are very different, these cases are described on specific pages: creating workflows for offline optimization studies and creating workflows for live optimization studies.

Managing optimization packs

Whether out-of-the-box or custom, optimization packs need to be installed on an Akamas installation before they can be used.

Since optimization packs are global resources that are shared across all the workspaces on the same Akamas installation, an account with administrative privileges is required to manage them.

Optimization packs that are not yet installed are displayed as grayed out in the Akamas UI (this is the case for the AWS and Docker packs in the following figure).

The content of the store can also be inspected from the store container on the Akamas server:

docker exec -it store bash
root@ff16b64e84db:/store# cd /store_data/
root@ff16b64e84db:/store_data# ls -l
total 120
-rw-r--r--. 1 root  root   3136 Jul 21 07:11 Docker_1-1-0.json
-rw-r--r--. 1 root  root  19130 Jul 21 07:11 Java-OpenJDK_1-2-6.json
-rw-r--r--. 1 root  root  70391 Jul 21 07:11 Linux_1-2-1.json
-rw-r--r--. 1 root  root   4661 Jul 21 07:11 MongoDB_1-0-0.json
-rw-r--r--. 1 root  root   6432 Jul 21 07:11 MySQL_1-2-0.json
-rw-r--r--. 1 39073 11537  5633 Sep  7 11:31 web-application_1-1-0.json

which also provides the list of the associated JSON files (the optimization pack descriptors).

Downloading descriptors

An Akamas installation comes with the latest optimization packs already loaded in the store and is able to check the central repository for updates.

For example, the following commands show how to download the descriptor file for version 1.3.0 of the Linux optimization pack and then install it:

curl -O https://akamas.s3.us-east-2.amazonaws.com/optimization-packs/Linux/1.3.0/Linux_1-3-0.json
akamas install optimization-pack Linux_1-3-0.json

Installing

There are two ways of installing an optimization pack:

  • online installation - this is the general case when the optimization pack is already in the store

  • offline installation - this may apply to custom optimization packs available as a JSON file (refer to the Creating custom optimization pack page)

Only in the first case can an optimization pack be installed from the UI. The command-line alternatives are described here below.

Online installation

Execute the following command by specifying the name of the optimization pack that is already available in the store:

akamas install optimization-pack OPTIMIZATION_PACK_NAME

Offline installation

Execute the following command to install an optimization pack by specifying the name of the optimization pack and the full path to the JSON descriptor file:

akamas install optimization-pack PATH_TO_JSON_DESCRIPTOR

Forcing installation

When installing an optimization pack, the following checks are executed to identify potential clashes with already existing resources:

  • name of the optimization pack

  • metrics

  • parameters

  • component types

  • telemetry providers

In case one of those checks is positive (i.e. a clash exists), the installation fails and a message notifies you that a "force" option needs to be used to get the optimization pack installed anyway:

akamas install -f optimization-pack OPTIMIZATION_PACK_NAME

Please be aware that when forcing the installation of an optimization pack, Akamas replaces (or merges) all the conflicting resources, except when at least one custom resource is involved: in that case, the installation is stopped and the custom resource needs to be manually removed first in order to proceed.

Uninstalling

The following command uninstalls an optimization pack:

akamas uninstall --force OPTIMIZATION_PACK_NAME

Notice that this also deletes all the components built using that optimization pack.

Updating

In case a new optimization pack needs to be installed from a descriptor, the procedure is the following:

  • uninstall the optimization pack

  • remove the old version of the optimization pack descriptor file from the store container;

  • install the new optimization pack with the new JSON descriptor


Creating optimization studies

The final preparatory step before running a study is to actually create the study, which also requires several substeps.

Offline optimization studies

For offline optimization studies, there are some additional (optional) steps:

Live optimization studies

For live optimization studies, there are some additional steps - including a mandatory one:

Performing load testing to support optimization activities

This page provides a short compendium of general performance engineering best practices to be applied in any load testing exercise. The focus is on how to ensure that realistic performance tests are designed and implemented to be successfully leveraged for optimization initiatives.

The goal of ensuring realistic performance tests boils down to two aspects:

  • sound test environments;

  • realistic workloads.

Test environments

A test or pre-production environment (TestEnv from now on) needs to represent as closely as possible the production environment (ProdEnv from now on).

The most representative test environment would be a perfect replica of the production environment from both infrastructure (hardware) and architecture perspectives. The following criteria and guidelines can help design a TestEnv that is suitable for performance testing supporting optimization initiatives.

Hardware specifications

The hardware specifications of the physical or virtual servers running in TestEnv and ProdEnv must be identical. This is because any differences in the available resources (e.g. amount of RAM) or specification (e.g. CPU vendor and/or type) may affect both services performance and system configuration.

This general guideline can only be relaxed for servers/clusters running container(s) or container orchestration platforms (e.g. Kubernetes or OpenShift). Indeed, it is possible to safely execute most of the related optimization cases if the TestEnv guarantees enough spare/residual capacity (number of cores or amount of RAM) to allocate all the needed resources.

While for monolithic architectures this may translate into significant HW requirements, with microservices this might not be the case, for two main reasons:

  • microservices are typically smaller than monoliths and designed for horizontal scalability: this means that optimizing the configuration of the single instance (pod/container resources and runtime settings) becomes easier as they typically have smaller HW requirements;

  • approaches like Infrastructure as Code (IaC), typically used with cloud-native applications, allow for easily setting up cluster infrastructure (on-prem or on the cloud) that can mimic production environments.

Downscaled/downsized architecture

TestEnvs are typically downscaled/downsized with respect to ProdEnvs. If this is the case, then optimizations can be safely executed provided it is possible to generate a "production-like" workload on each of the nodes/elements of the architecture.

This can usually be achieved if all the architectural layers have the same scale ratio between the two environments and the generated workload is scaled accordingly. For example, if the ProdEnv has 4 nodes at the front-end layer, 4 at the backend layer, and 2 at the database layer, then a TestEnv can have 2 nodes, 2 nodes, and 1 node respectively.

Load balancing among nodes

From a performance testing perspective, the presence of load balancing among multiple nodes can be ignored if the load balancing relies on an external component that ensures a uniform distribution of the load across all nodes.

On the contrary, if an application-level balancing is in place, it might be required to include at least two nodes in the testing scenario so as to take into account the impact of such a mechanism on the performance of the cluster.

External/downstream services

The TestEnv should also replicate the application ecosystem, including dependencies from external or downstream services.

External or downstream services should emulate the production behavior from both functional (e.g. response size and error rate) and performance (e.g. throughput and response times) perspectives. In case of constraints or limitations on the ability to leverage external/downstream services for testing purposes, the production behavior needs to be simulated via stubs/mock services.

In the case of microservices applications, it is also required to replicate dependencies within an application. Several approaches can be taken for this purpose, such as:

  • replicating interacting microservices;

  • disregarding dependencies with nonrelevant services (e.g. a post-processing service running on a mainframe whose messages are simply left published in a queue without being dequeued).

Test cases

The most representative performance test script would provide 100% coverage of all the possible test cases. Of course, this is very unlikely to be the case in performance testing. The following criteria and guidelines can be considered to establish the required test coverage.

Statistical relevance

The test cases included in the test script must cover at least 80% of the production workload.

Business relevance

The test cases included in the test script must cover all the business-critical functionalities that are known (or expected) to represent a significant load in the production environment.

Technical relevance

The test cases included in the test script must cover all the functionalities that at the code level involve:

  • Large objects/data structure allocation and management

  • Long living objects/data structure allocation and management

  • Intensive CPU, data, or network utilization

  • "one of-a-kind" implementations, such as connections to a data source, ad-hoc objects allocation/management, etc.

Test user paths and behavior

The virtual user paths and behavior coded in the test script must be representative of the workload generated by production users. The most representative test script would account for the production users in terms of a mix of the different user paths, associated think times, and session length perspectives.

When single-user paths cannot be easily identified, the best practice is to consider the most comprehensive user journey for each of them. In general, a worst-case approach is recommended.

The task of reproducing realistic workloads is easier for microservice architectures. On the contrary, for monolithic architectures, this task could become hard as it may not be easy to observe all of the workloads, due to custom frameworks, etc. With microservices, the workload can be completely decomposed in terms of APIs/endpoints and APM tools can provide full observability of production workload traffic and performance characteristics for each single API. This guarantees that the replicated workload can reproduce the production traffic as closely as possible.

Test data

Both test script data (the datasets used in the test script) and test environment data (the datasets in any involved databases/datastores) have to be characterized in terms of size and variance, so as to reproduce the production performance.

Test script data

The test script data has to be characterized in order to guarantee production-like performance (e.g. cache behavior). In case this characterization is difficult, the best practice is to adopt a worst-case approach.

Test environment data

The test data must be sized and have an adequate variance to guarantee production-like performance in the interaction with databases/datastores (e.g. query response times).

Test scenarios

Most performance test tools provide the ability to easily define and modify the test scenarios on top of already defined test cases/scripts, test case-mix, and test data. This is especially useful in the Akamas context where it might be required to execute a specific test scenario, based on the specific optimization goal defined. The most common (and useful, in the Akamas context) test scenarios are described here below.

Load tests

A load test aims at measuring system performance against a specified workload level, typically the one experienced or expected in production. Usually, the workload level is defined in terms of virtual user concurrency or request throughput.

In a load test, after an initial ramp-up, the target load level is kept constant for a steady-state period until the end of the test.

When validating a load test, the following two key factors have to be considered:

  • The steady-state concurrency/throughput level: a good practice is to apply a worst-case approach by emulating at least 110% of the production throughput;

  • The steady-state duration: in general, defining the length of the steady state is a complex task, because it is strictly dependent on the technologies under test and because phenomena such as bootstraps, warm-ups, and caching can affect the performance and behavior of the system only before or after a certain amount of time; as a general guide, to validate the steady-state duration it is useful to:

    1. execute a long-run test by keeping the defined steady-state for at least 2h to 3h;

    2. analyze test results by looking for any variation in the performance and behavior of the system over time;

    3. in case no variation is observed, shorten the steady state, keeping it to at least 30 minutes.

Stress tests

A stress test is all about pushing the system under test to its limit.

Stress tests are useful to identify the maximum throughput that an application can cope with while working within its SLOs. Identifying the breaking point of an application is also useful to highlight the bottleneck(s) of the application.

A stress test also makes it possible to understand how the system reacts to excessive load, thus validating the architectural expectations. For example, it can be useful to discover that the application crashes when reaching the limit, instead of simply enqueuing requests and slowing down processing them.

Endurance tests

An endurance test aims at validating the system's performance over an extended period of time.

Validating tests vs production

The first validation is provided by utilization metrics (e.g. CPU, RAM, I/O), which should display in the test environment the same behavior as in the production environment. If the delta is significant, some refinements of the test cases and environment might be required to close the gap and gain confidence in the test results.

Creating workflows for live optimizations

A workflow for a live optimization study automates all the actions required to interface the configuration management. Notice that metrics collection is an implicit action that does not need to be coded as part of the workflow. As expected, with respect to workflows for offline optimization studies, there are no actions to apply synthetic workloads as part of a load-testing scenario.

More in detail, a typical workflow includes the following types of tasks:

  • Applying the configuration, by preparing and then applying the parameter configuration that has been recommended and/or approved to the target environment - this may require interfacing with configuration management tools or pushing the configuration to a repository

Depending on the complexity of the system, the workflow might be composed of multiple actions of the same type, each operating on a separate component of the target system.

Defining workloads

For a live optimization study, it is required to specify which component metrics represent the different workloads observed on the target system. A workload could be represented either by a metric directly measuring that workload, such as the application throughput, or by a proxy metric, such as the percentage of reads and writes in your database. The Workload selection page of the Study template section in the reference guide describes how to define the corresponding structure.
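As an illustration, a minimal workload selection might look like the following sketch (the metric name app.throughput is a hypothetical example; refer to the Workload selection page for the exact schema):

# Sketch: declare which component metric represents the workload
workloadsSelection:
- name: app.throughput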

Akamas features automatic detection of workload contexts, corresponding to different patterns for the same workload. For example, a workload context could correspond to the peak or idle load, or to the weekend or weekday traffic. This allows Akamas to recommend safe configurations based on the observed behavior of the system under similar workload conditions.

Moreover, Akamas also provides customizable safety policies that drive the Akamas optimizer in evaluating candidate configurations with respect to defined goal constraints.

Online mode

Akamas provides several parameters governing how the Akamas optimizer operates and leverages the workload information while a live optimization study is being executed. The most important parameter is the online mode (see here below), as it relates to whether the human user is part of the approval loop when the Akamas AI recommends a configuration to be applied.

Live optimizations can operate in one of the following online modes:

  • recommendation (or manual) mode (the default mode): Akamas does not immediately apply a configuration identified by the Akamas AI: a new configuration is first recommended to the user, who needs to approve it (possibly after modifying it) before it gets applied - this is also referred to as a human-in-the-loop scenario;

  • fully autonomous (or automatic) mode: new configurations are immediately applied by Akamas as soon as they are generated by the Akamas AI, without being first recommended to (and approved by) the user.

It is worth noticing that under a recommendation mode, there might be a significant delay between the time a configuration is identified by Akamas and the time the recommended changes get applied. Therefore, the Akamas AI leverages the workload information differently when looking for a new configuration, depending on the defined online mode:

  • in the recommendation mode, Akamas takes into account all the defined workloads and looks for the configuration that best satisfies the goal constraints for all the observed workloads and provides the best improvements for all of them

  • in the fully autonomous mode, Akamas works on a single workload at each iteration (based on a customizable workload strategy - see below) and looks for an optimized configuration for that specific workload to be immediately applied in the next iteration, even if it might not be the best for the different workloads

Notice that the online mode can be changed at any time, that is, while the optimization study is running, and becomes immediately effective. For example, a live optimization could initially operate in recommendation mode and then be changed to fully autonomous mode afterward.

The online mode can be specified at the study level and can also be overridden at the step level (only for steps of type "optimize" - see the section Defining optimization steps). The Optimize step page of the Study template section in the reference guide describes how to define the corresponding structure. This can be done either from the Akamas command line (see the Optimizer option commands page) or from the Akamas UI (see the following figure).

Defining windowing policies

For both offline and live optimization studies, it is possible to define how to identify the time windows that Akamas needs to consider for assessing the result of an experiment. Defining a windowing policy helps achieve reliable optimizations by excluding metrics data points that should not influence the score of an experiment.

The following two windowing policies are available:

  • Trim windowing: discards the initial and final part of an experiment - e.g. to exclude warm-up and tear-down phases - the trim windowing policy is the default (with the entire interval selected when no trimming is specified)

  • Stability windowing: discards those parts that do not correspond to the most stable window - this leverages the Akamas feature of automatically identifying the most stable window based on user-specified criteria

The Windowing policy page of the Study template section in the reference guide describes the corresponding structures. For offline optimization studies only, the Akamas UI allows the windowing policies to be defined as part of the visual procedure activated by the "Create a study" button (see the following figures).

Best Practices

The following sections provide general best practices on how to define suitable windowing policy.

Define windowing based on the optimization goal

In order to make the optimization process fully automated and unattended, Akamas automatically analyzes the time series of the collected metrics of each experiment and calculates the experiment score (all the system metrics will also be aggregated).

Based on the optimization goal, it is important to instruct Akamas on how to perform this experiment analysis, in particular, by also leveraging Akamas windowing policies.

For example, when optimizing an online or transactional application, there are two common scenarios:

  1. Increase system performance (i.e. minimize response time) or reduce system costs (i.e. decrease resource footprint or cloud costs) while processing a given and fixed transaction volume (i.e. a load test);

  2. Increase the maximum throughput a system can support (i.e., system capacity) while processing an increasing amount of load (e.g. a stress test).

In the first scenario, a load test scenario is typically used: the injected load (e.g. virtual users) ramps up for a period, followed by a steady state, with a final ramp-down period. From a performance engineering standpoint, since the goal is to assess the system performance during the steady state, the warm-up and tear-down periods can be discarded. This analysis can be automated by applying a windowing policy of type "trim" upon creating the optimization study, which makes Akamas automatically compute the experiment score by discarding a configurable warm-up and tear-down period.
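For example, assuming the windowing structure described on the Windowing policy reference page, a trim policy discarding a 4-minute warm-up and a 1-minute tear-down might be sketched as follows (the durations and the task name are illustrative assumptions):

# Sketch: score experiments only on the steady state of the load test
windowing:
  type: trim
  trim: [4m, 1m]        # discard 4 minutes of warm-up and 1 minute of tear-down
  task: Run load test   # workflow task whose timeframe is trimmed (assumed name)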

In the second scenario, a stress test is typically used: the injected load follows a ramp with increasing levels of users, designed to stress the system up to its limit. In this case, a performance engineer is most likely interested in the maximum throughput the system can sustain before breaking down (possibly while matching a response time constraint). This analysis can be automated by applying a windowing policy of type "stability", which makes Akamas automatically compute the experiment score in the time window where the throughput was maximized but stable for a configurable period of time.
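Along the same lines, a stability policy might be sketched as follows (the metric name, window width, and standard deviation threshold are illustrative assumptions):

# Sketch: score experiments on the most stable high-throughput window
windowing:
  type: stability
  stability:
    metric: app.throughput   # metric whose stability is evaluated
    width: 5m                # duration of the window to identify
    maxStdDev: 200           # maximum standard deviation tolerated in the window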

When optimizing a batch application, windowing is typically not required. In such scenarios, a typical goal is to minimize batch duration or aggregate resource utilization. Hence, there is no need to define any windowing policy: by default, the whole experiment timeframe is considered.

Finding an effective stability window

Setting up an effective stability window requires some knowledge of the test scenario and the variability of the environment.

As a general guideline it is recommended to run a baseline study with a stability window set to a low value, such as a value close to 0 or half of the expected mean of the metric, and then to inspect the results of the baseline to identify which window has been identified and update the standard deviation threshold accordingly. When using a continuous ramp the test has no plateaus, so the standard deviation threshold should be a bit higher to account for the increment of the traffic in the windowing period. On the contrary, when running a staircase test with many plateaus, the standard deviation can be smaller to identify a period of time with the same amount of users.

Applying the standard deviation filter to very stable metrics, such as the number of users, simplifies the definition of the standard deviation threshold but might hide some instability of the environment when subject to constant traffic. On the other hand, applying the threshold to a more direct measure of the performance, such as the throughput, makes it easier to identify the stability period of the application but might require more baseline experiments to identify the proper threshold value. The logs of the scoring phase provide useful insights into the maximum standard deviation found and the number of candidate windows that have been identified given a threshold value, which can be used to refine the threshold in a few baseline experiments.

Most of the substeps are common for both a live optimization study and an offline optimization study, even if they might need to be conceived differently in these two different contexts:

  • defining the optimization goal & constraints

  • defining the optimization parameters & metrics

  • defining the optimization steps

Other optional and mandatory steps are specific for offline optimization studies (see below) and live optimization studies (see below):

  • defining windowing policies (optional - typically after defining the goal & constraints)

  • defining KPIs (optional - typically after defining the goal & constraints)

  • defining workloads (mandatory - typically after defining the goal & constraints)

  • setting safety policies (optional - typically when defining the optimization steps)

The Study template section of the reference guide describes the template for creating a study, while the commands for creating a study are on the Resource Management command page. For offline optimization studies only, the Akamas UI displays the "Create a study" button that provides a visual step-by-step procedure for creating a new optimization study (see the following figure).

Notice that Akamas also allows existing offline optimization studies to be duplicated, either from the Akamas UI (see the following figure) or from the command line (refer to the Resource management commands page).

Optimization Insights

While the main result of an optimization study is the optimal configuration identified with respect to the defined goal & constraints, any suboptimal configuration that improves on one of the defined KPIs can also be very valuable.

These configurations are displayed in a dedicated section of the Akamas UI and are also highlighted in other areas of the Akamas UI by textual badges "Best <KPI name>", referred to as insights tags.

Insights section

The following figures show the Insights section displayed on the study page and the Insights pages that can be drilled down to.

Insights section for an offline optimization study

Insights details with comparison among selected configurations

Insights details for a specific configuration

The following figure shows the insights tags in the Analysis tab:

Please notice that "Best", "Best Memory Limit" and any other KPI-related tags are displayed in the Akamas UI while the study progresses and thus may be reassigned as new experiments get executed and their configurations are scored and provide their results for the defined study KPIs. See

Insights tags

After starting a study, any finished experiment is labeled by one or more insights tags "Best <KPI name>" in case the corresponding configuration provides the best result so far for those KPIs. Notice that for experiments involving multiple trials, tags are only assigned after all their trials have finished.

Of course, after the very first experiment (i.e. the baseline) finishes, all tags are assigned to the corresponding configuration. This is displayed by the following figure for a study with two KPIs: CPU, with formula renaissance.cpu_used and direction minimize, and MEM, with formula renaissance.mem_used and direction minimize:

When the following experiments finish, tags are reevaluated with respect to the computed goal score and the results achieved for each single KPI. In this study, experiment #2 provided a better result for both the CPU KPI and the study goal, so it got both the tag Best CPU and the tag Best renaissance.response_time (which is defined as the goal of the study). Notice that the blue star is displayed by Akamas (except for the baseline) to highlight the fact that a tag was automatically generated by Akamas and not assigned by a user.

Afterward, experiment #3 got the goal-related tag as the new best configuration, while experiment #4 got the tag Best CPU, as it improved on experiment #2. Therefore, two configurations displayed the blue star.

A number of experiments later, experiment #7 provided better memory usage than the baseline, so it got the tag Best MEM assigned. At this point, three configurations have the blue star, thus making it evident that there are tradeoffs when trying to optimize with respect to the goal and the KPIs.

Running optimization studies

Before actually running an optimization study, it is highly recommended to read the following sections:

  • Before running optimization studies

  • Analyzing results of offline optimization studies

  • Analyzing results of live optimization studies

  • Before applying optimization results

Once all the preparatory steps for creating a study are done, running a study is straightforward: an optimization study can be started from either the Akamas UI (see the following figures) or the command line (refer to the Resource management commands page).

Once started, managing studies is different for offline optimization studies and live optimization studies (see here below).

Offline optimization studies

Notice that once an offline optimization study has started, it can only be stopped or let finish; it cannot be restarted. However, it is possible to reuse experiments executed in another (successfully or not) finished study - this is called bootstrapping (also refer to the Bootstrap Step page in the reference guide). This can be useful for multiple reasons, including the case of an error (e.g. a misconfigured workflow) that requires "restarting" the study.

Live optimization studies

For live optimization studies, it is possible to stop a study and restart it. However, please notice that restarting is an irreversible action that deletes all the executed experiments: restarting a live study basically means starting it from scratch.

Analyzing results of offline optimization studies

Since an offline optimization study lasts at most for the number of configured experiments, and typically runs in a test or pre-production environment, results can be safely analyzed after the study has completely finished.

However, it is a good practice to analyze partial results while the study is still running as this may provide useful insights about both the system being optimized (e.g. understanding of the system dynamics and sub-optimal configurations that could be immediately applied) and about the optimization study itself (e.g. how to re-design a workflow or change constraints), early-on.

The Akamas UI displays the results of an offline optimization study in different visual areas:

  • the Best Configuration section provides the optimal configuration identified by Akamas, as a list of recommended values for the optimization parameters compared to the baseline and ranked according to their relevance;

  • the Progress tab (see the following figures) displays the progression of the study with respect to the study steps, the status of each experiment (and trial), its associated score, and the parameter values of the corresponding configurations; this area is mostly used for study monitoring (e.g. identifying failing workflows) and troubleshooting purposes;

  • the Analysis tab (see the following figures) displays how the baseline and experiments score with respect to the optimization goal, and the values of metrics and parameters for the corresponding configurations; this area supports the analysis of the different configurations;

  • the Metrics tab (see the following figure) displays the behavior of the metrics for all executed experiments (and trials); this area supports both study validation activities and deeper analysis of the system behavior;

  • the Insights section (see the following figure) displays any suboptimal configurations that have been identified for the study KPIs, and also allows making comparisons among them and the best configuration - the Optimization Insights page describes in further detail the Insights section and the insights tags displayed in other areas of the Akamas UI.

Before running optimization studies

The following provides some best practices that can be adopted before launching optimization studies, in particular for offline optimization studies.

Dry-running the optimization study

It is recommended to execute a dry-run of the study to verify that the workflow works as expected and in particular that the telemetry and configuration management steps are correctly executed.

Verify that workflow actually works

It is important to verify that all the steps of the workflow complete successfully and produce the expected results.

Verify that parameters are applied and effective

When approaching the optimization of new applications or technologies, it is important to make sure all the parameters that are being set are actually applied and used by the system.

Depending on the specific technology at hand, the following issues can be found:

  • parameters were set but they are not applied - for example parameters were set in the wrong configuration file or the path is not correct;

  • some automatic (corrective) mechanisms are in place that override the values applied for the parameters.

Therefore, it is important to always verify the actual values of the parameters once the system is up & running with a new configuration, and make sure they match the values applied by Akamas. This is typically done by leveraging:

  • monitoring tools, when the parameters are available as metrics or properties of the system;

  • native administration tools, which are typically available for introspection or troubleshooting activities (e.g. jcmd for the JVM).

Verify that load testing works

It is important to verify that the integration with load testing tools actually executes the intended load test scenarios.

Verify that telemetry collects all the relevant metrics

It is important to make sure that the integration with telemetry providers works correctly and that all the relevant metrics of the system are correctly collected.

Data gathering from the telemetry data sources is launched at the end of the workflow tasks. The status of the telemetry process can be inspected in the Progress tab, where it is also possible to inspect the telemetry logs in case of failures.

Inspecting the data gathering logs from the Akamas UI

Please notice that the telemetry process fails if the key metrics of the study cannot be gathered. This includes metrics defined in the goal function or constraints.

Baselining the system

Before running the optimization study, it is important to make sure the system and the environment where the optimization is running provide stable and reproducible performance.

Make sure the system performance is stable

In order to ensure a successful optimization, it is important to make sure that the target system displays stable and predictable performance and does not suffer from random variations.

To make sure this is the case, it is recommended to create a study that only runs a single baseline experiment. In order to assess the performance of the system, Akamas trials can be used to execute the same experiment (hence, the same configuration) multiple times (e.g. three times). Once the experiment is completed, the resulting performance metrics can be analyzed to assess stability. The analysis can be done either by leveraging the aggregate metrics in the Analysis tab, or at a deeper level on the actual time series, by accessing the Metrics tab of the Akamas UI.

Ideally, no significant performance variation should be observed in the different trials, for the key system performance metrics. Otherwise, it is strongly recommended to identify the root cause before proceeding with the actual optimization activity.

Backing up the original configuration

Before launching the optimization it might be a good idea to take note of (or backup) the original configuration. This is very important in the case of Linux OS parameters optimization.

Defining optimization goal & constraints

The first fundamental step in creating a study is to define the study goal & constraints. While this step might be perceived as somewhat straightforward (e.g. constraints could simply be translated from SLOs already in place), defining the optimization goal really requires carefully balancing complexity and effectiveness, also as part of the general (iterative) optimization process. Please also read the Best Practices section here below.

In general, any performance engineering, tuning, and optimization activity involves complex tradeoffs among different - and potentially conflicting - goals and system performance metrics, such as:

  • Maximizing the business volume an application can support, while not making the single transaction slower or increasing errors above a desired threshold

  • Minimizing the duration of a batch processing task, while not increasing the cloud costs by more than 20% or using more than 8 CPUs

Akamas supports all these (and other) scenarios by means of the optimization goal, that is, the single metric or the formula combining multiple metrics that has to be either minimized or maximized, and of one or more constraints among the metrics of the system.

In general, constraints can be defined either as absolute constraints (e.g. app.response_time < 200 ms) or as relative constraints with respect to the baseline (e.g. app.response_time < +20% of the baseline), that is, the current configuration in place, typically corresponding to the very first experiment in an offline optimization study. Therefore, relative constraints are only applicable to offline optimization studies, while absolute constraints are applicable to both offline and live optimization studies.

Please notice that when defining constraints for an optimization study, it is required to also include the constraints listed in the Constraints section of the respective Optimization Packs, which express internal constraints among parameters (for example, in case OpenJDK 11 components are to be tuned, the reference is the Constraints section of the OpenJDK optimization pack).

The Goal & Constraint page of the Study template in the reference guide describes the corresponding structures. For offline optimization studies only, the Akamas UI allows the optimization goal and constraints to be defined as part of the visual procedure activated by the "Create a study" button (see the following figure).
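As an illustrative sketch only (the exact structure is described on the Goal & Constraint page; the field names and metric names here are assumptions), a goal minimizing response time with one absolute and one relative constraint might look like:

# Sketch: minimize response time, keep errors below 5% and memory
# within +20% of the baseline (component/metric names are hypothetical)
goal:
  objective: minimize
  function:
    formula: app.response_time
  constraints:
    absolute:
    - app.error_rate <= 0.05
    relative:
    - app.mem_used <= +20%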

Please notice that any experiment that does not respect the constraints is marked by Akamas as failed, even if correctly executed. The reason for this failure can be inspected in the experiment status. Similarly to workflow failures (see below), the Akamas AI engine automatically takes any failure due to constraint violations into account when searching the optimization space to identify the parameter configurations that might improve the goal metrics while matching constraints.

Best Practices

There are no general guidelines and best practices on how to best define goals & constraints, as this is where experience, knowledge, and processes meet. Please refer to the Optimization examples section for a number of examples related to a variety of technologies and to the Knowledge Base guide for real-world examples.

Defining KPIs

While the optimization goal drives the Akamas AI toward optimal configurations, there might be other sub-optimal configurations of interest: configurations that not only match the optimization constraints but also improve on some Key Performance Indicators (KPIs).

For example:

  • for a Kubernetes microservice Java-based application, a typical optimization goal is to reduce the overall (infrastructure or cloud) cost by tuning both Kubernetes and JVM parameters while keeping SLOs in terms of application response time and error rate under control

  • among different configurations that provide a similar cost reduction in addition to matching all SLOs, a configuration that also significantly improves the application response time might be worth considering with respect to an optimal configuration that does not improve on this KPI

Akamas automatically considers any metric referred to in the defined optimization goal and constraints for an offline optimization study as a KPI. Moreover, any other metrics of the system component can be specified as a KPI for an offline optimization study.
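For instance, the CPU and MEM KPIs used as examples in the Optimization Insights section might be declared as in the following sketch (refer to the KPIs reference page for the exact schema):

# Sketch: two KPIs tracking CPU and memory usage besides the goal metric
kpis:
- name: CPU
  formula: renaissance.cpu_used
  direction: minimize
- name: MEM
  formula: renaissance.mem_used
  direction: minimize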

Once KPIs are defined, Akamas represents the results of the optimization in the Insights section of the Akamas UI. Moreover, the suboptimal configuration associated with a specific KPI is highlighted in the Akamas UI by a textual badge "Best <KPI name>".

The KPIs page of the Study template section in the reference guide describes how to define the corresponding structure. Specifying the KPIs can be done while first defining the study or from the Akamas UI, at either study creation time or afterward (see the following figures).

Please notice that KPIs can also be re-defined after an offline optimization study has been completed, as their definition does not affect the optimization process, only the evaluation of its results. See the Analyzing offline optimization studies section and the Optimization Insights page.

Setting safety policies

While Akamas leverages similar AI methods for both live optimizations and optimization studies, the way these methods are applied is radically different. Indeed, for optimization studies running in pre-production environments, the approach is to explore the configuration space by also accepting potential failed experiments, so as to identify regions that do not correspond to viable configurations. Of course, this approach cannot be accepted for live optimizations running in production environments. For this purpose, Akamas live optimization uses observations of configuration changes, combined with the automatic detection of workload contexts, and provides several customizable safety policies when recommending configurations to be approved, revised, and applied.

Akamas provides a few customizable optimizer options (refer to the options described on the Optimize step page of the reference guide) that should be configured so as to make the configurations recommended in a live optimization and applied to production environments as safe as possible.

Exploration factor

Akamas provides an optimizer option known as the exploration factor, which only allows gradual changes to the parameters. This gradual optimization allows Akamas to observe how these changes impact the system behavior before applying further gradual changes.

By properly configuring the optimizer, Akamas can gradually explore regions of the configuration space and slowly approach any potentially risky regions, thus avoiding recommending any configurations that may negatively impact the system. Gradual optimization takes into account the maximum recommended change for each parameter. This is defined as a percentage (default is 5%) with respect to the baseline value. For example, in the case of a container whose CPU limit is 1000 millicores, the corresponding maximum allowed change is 50 millicores. It is important to notice that this does not represent an absolute cap, as Akamas also takes into account any good configurations observed. For example, in the event of a traffic peak, Akamas would recommend a good configuration that was observed working fine for a similar workload in the past, even if the change is higher than 5% of the current configuration value.

Notice that this feature does not apply to categorical parameters (e.g. the JVM GC type), as their values do not change incrementally. Therefore, when it comes to these parameters, Akamas by default takes the conservative approach of only recommending configurations whose categorical parameters take values that have already been observed. Never-observed values can still enter the picture, as users are allowed to modify values for categorical parameters when operating in human-in-the-loop mode: once Akamas has observed that the modified configuration is working fine, the corresponding value can then be recommended. For example, a user might modify the recommended configuration for GC Type from Serial to Parallel. Once Parallel has been observed as working fine, Akamas would consider it for future recommendations of GC Type, while other values (e.g. G1) would not be considered until verified as safe.

The exploration factor can be customized for each live optimization individually and changed while live optimizations are running.

Safety factor

Akamas provides an optimizer option known as the safety factor, designed to prevent Akamas from selecting configurations (even if slowly approaching them) that may impact the ability to match the defined SLOs. For example, when optimizing container CPU limits, lower and lower CPU limits might be recommended, up to the point that the limit becomes so low that the application performance degrades.

Akamas takes into account the magnitude of constraint breaches: a severe breach is considered more negative than a minor breach. For example, in the case of an SLO of 200 ms on response time, a configuration causing a 1 sec response time is assigned a very different penalty than a configuration causing a 210 ms response time. Moreover, Akamas leverages the smart constraint evaluation feature that takes into account if a configuration is causing constraints to approach their corresponding thresholds. For example, in the case of an SLO of 200 ms on response time, a configuration changing response time from 170 ms to 190 ms is considered more problematic than one causing a change from 100 ms to 120 ms. The first one is considered by Akamas as corresponding to a gray area that should not be explored.

The safety factor is also used when starting the study to validate the behavior of the baseline and assess whether it is safe to explore configurations close to it. If the baseline presents some constraint violations, then even exploring configurations close to the baseline might pose a risk. If Akamas identifies that, in the baseline configuration, more than (safety_factor × number_of_trials) trials manifest constraint violations, then the optimization is stopped.

If your baseline has some trials failing constraint validation, we suggest you analyze them before proceeding with the optimization.

The safety factor is set by default to 0.5 and can be customized for each live optimization individually and changed while live optimizations are running.
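As a sketch, assuming option keys modeled on the factors described above (the actual names are listed on the Optimize step reference page), these safety policies might be customized within the optimize step of a live optimization study:

# Sketch: customize the optimizer safety policies (key names are assumptions;
# the values shown match the defaults described above)
- name: optimize
  type: optimize
  optimizerOptions:
    explorationFactor: 0.05   # max gradual change per parameter (5% of baseline)
    safetyFactor: 0.5         # tolerance to constraint violations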

Outlier detection

It is also worth mentioning that Akamas also features an outlier detection capability to compensate for production environments typically being noisy and much less stable than staging environments, thus displaying highly fluctuating performance metrics. As a consequence, constraints may fail from time to time, even for perfectly good configurations. This may be due to a variety of causes, such as shared infrastructure on the cloud, slowness of external systems, etc.

Defining optimization steps

A final step in defining an optimization study is to specify the sequence of steps executed while running the study.

The following four types of steps are available:

  • Baseline: performs an experiment and sets it as a baseline for all the other ones

  • Bootstrap: imports experiments from other studies

  • Preset: performs an experiment with a specific configuration

  • Optimize: performs experiments and generates optimized configurations

Please notice that at least one baseline step is always required in any optimization study. This applies not only to offline optimization studies but also to live optimization studies, as the baseline is used to suggest changes to parameter values starting from the default values.

Example of optimization study with two steps: baselining and optimize

The Steps page in the Study template section in the reference guide describes how to define the corresponding structures for each of the different types of steps allowed by Akamas. For offline optimization studies only, the Akamas UI allows the optimization steps to be defined as part of the visual procedure activated by the "Create a study" button (see the following figure).

In addition to the best practices here below, please refer to the Optimization examples section for a number of examples related to a variety of technologies and to the Knowledge Base guide for real-world examples.
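For example, a minimal study with a baseline step followed by an optimize step might be defined as in the following sketch (the number of experiments is illustrative):

# Sketch: run the baseline first, then 30 AI-driven experiments
steps:
- name: baseline
  type: baseline
- name: optimize
  type: optimize
  numberOfExperiments: 30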

Best Practices

The following sections provide some best practices on how to best approach the step of defining the baseline step.

Ensure the baseline configuration is correct

In an optimization study, the baseline is an important experiment as it represents the system performance with the current configuration, and serves as a reference to assess the relative improvements the optimization achieved.

Therefore, it is important to make sure the baseline configuration of the study correctly reflects the current configuration - be it the vendor default or the result of a manual tuning exercise.

Evaluate which parameters to include in the baseline configuration

When defining the study baseline configuration it is important to evaluate which parameters to include. Indeed, several technologies have default values assigned to most of their configuration parameters. However, the runtime behavior can be different depending on whether a parameter is set to its default value or not set at all.

Therefore, it is recommended to review the current configuration (e.g. the one in place in production) and identify which parameters and values have been set (e.g. JVM maxHeapSize = 2GB, gcType = Parallel, etc.), and then to only set those parameters with their corresponding values, without adding any other parameters. This ensures that the specified baseline is consistent with the real production setup.
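Following the example above, a baseline step that sets only the parameters actually configured in production might be sketched as follows (parameter names are from the OpenJDK optimization pack; values are illustrative):

# Sketch: baseline reflecting only the parameters explicitly set in production
- name: baseline
  type: baseline
  values:
    jvm.jvm_maxHeapSize: 2048   # MB, corresponding to maxHeapSize = 2GB
    jvm.jvm_gcType: Parallel    # the GC currently configured in production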

Defining parameters & metrics

After defining the goal and its constraints, the following substep in creating an optimization study is specifying the optimization parameters and metrics. In particular, selecting the parameters that are going to be tuned to optimize the system is a critical decision that requires carefully balancing complexity and effectiveness. As for goals & constraints, this step may also require adopting an iterative approach. See also the Best Practices section here below.

The Parameter selection and Metric selection pages of the Study template section in the reference guide describe how to define the corresponding structure. For offline optimization studies only, the Akamas UI allows the parameters and metrics to be defined as part of the visual procedure activated by the "Create a study" button (see the following figure).

As illustrated by the previous and following figures, during this step it is also possible to edit the range of values associated with each optimization parameter with respect to the default domain provided by either the original or a custom optimization pack in use for the respective technology. Please also refer to the Parameter rendering page to see how to configure parameter rendering.

Parameter rendering

By default, all parameters specified in the parameters selection of a study are applied ("rendered"). Akamas allows specifying which configuration parameters should be applied in the optimization steps. More precisely:

  • parameter rendering is available at the step level for baseline, preset, and optimize steps

  • parameter rendering is not available for bootstrap steps (bootstrapped experiments are not executed)

This feature can be useful to deal with the different strategies through which applications and systems accept configuration parameters.

Best Practices

The following sections provide some best practices on how to best approach the step of defining the optimization parameters. Please also refer to the Guidelines for choosing optimization parameters for a number of selected technologies; some examples provided in the Knowledge Base guide may also offer useful guidance.

Configure parameters domains based on environment specs

Since the parameter domain defines the range of values that the Akamas AI engine can assign to the parameter, when defining the system parameters to be optimized it is important to review the parameter domains and adjust them based on the characteristics of the target system and environment, and on any best practices in place.

Akamas optimization packs already provide parameter domains that are correct for most situations. For example, the OpenJDK 11 JVM gcType is a categorical parameter that already includes all the possible garbage collectors that can be set for this JVM version.

The parameter jvm_gcType as displayed in the OpenJDK 11 optimization pack

For other parameters, there are no sensible default domains as they depend on the environment. For example, the OpenJDK 11 maxHeapSize JVM parameter dictates how much memory the JVM can use. This obviously depends on the environment in which the JVM runs: for example, the upper bound might be 90% of the memory of the virtual machine or container in which the JVM runs.

Defining good parameter domains is important to ensure the parameter configurations suggested by the Akamas AI engine are as good as possible. Notice that if the domain is not defined correctly, this may cause experiment failures (e.g. the JVM could not start if the maxHeapSize is higher than the container size). As discussed as part of the best practices for defining robust workflows, the Akamas AI engine has been designed to learn which configurations may lead to failures and to automatically discover any hidden constraints found in the environment.
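For instance, following the style of the snippets used elsewhere in this guide, the maxHeapSize domain for a JVM running in a 4096MB container might be restricted as in this sketch (the component name and values are illustrative assumptions):

# Sketch: cap the explored heap size at about 90% of a 4096MB container
parametersSelection:
- name: jvm.jvm_maxHeapSize
  domain: [256, 3686]   # MB, where 3686 is roughly 90% of 4096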

Configure parameter constraints based on Optimization Pack best practices

Depending on the specific technology under optimization, the configuration parameters may have relationships among themselves. For example, in a JVM the newSize parameter defines the size of a region of the JVM heap, and hence its value should be always less than the maxHeapSize parameter.

Akamas AI engine supports the definition of constraints among parameters as this is a frequent need when optimizing real-life applications.

It is important to define the parameter constraints when creating a new study. The optimization pack documentation provides guidelines on what are the most important parameter constraints for the specific technology.
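For example, the newSize/maxHeapSize relationship mentioned above can be expressed using the same parameterConstraints structure shown in the Oracle guidelines below (a sketch; the component name jvm is an assumption):

# Sketch: prevent configurations where the young generation exceeds the heap
parameterConstraints:
- name: New size within max heap
  formula: jvm.jvm_newSize <= jvm.jvm_maxHeapSize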

When optimizing a new or custom technology, it may happen that some experiments fail due to unknown parameter constraints being violated. For example, the application may fail to start, and only by analyzing the application error logs can the reason for the failure be understood. For a Java application, the JVM error message (e.g. "new size cannot be larger than max heap size") could provide useful hints, revealing that some constraints need to be added to the parameter constraints of the study.

While the Akamas AI engine has been designed to learn from failures, including those due to relationships among parameters that were not explicitly set as constraints, setting parameter constraints may help avoid unnecessary failures and thus speed up the optimization process.


Guidelines for PostgreSQL

Suggested Parameters

When running a PostgreSQL optimization, consider starting from these recommendations:

  • pg_max_connections: keep its value under 1000 connections.

  • pg_effective_cache_size: 75% of the physical memory available to PostgreSQL.

  • pg_maintenance_work_mem: 12% of the physical memory available to PostgreSQL.

  • pg_max_wal_senders: the maximum number of replicas you expect to have, doubled.

  • pg_max_parallel_workers: the number of cores divided by 2.

  • pg_shared_buffers: 25% of the physical memory available to PostgreSQL.
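Putting these recommendations together, the parameter selection for an instance with, say, 8192MB of memory and 8 cores dedicated to PostgreSQL might be sketched as follows (the component name pg and all domain boundaries are illustrative assumptions):

# Sketch for an instance with 8192MB of memory and 8 cores
parametersSelection:
- name: pg.pg_max_connections
  domain: [100, 1000]    # keep under 1000 connections
- name: pg.pg_shared_buffers
  domain: [128, 2048]    # explore up to 25% of memory
- name: pg.pg_effective_cache_size
  domain: [1024, 6144]   # explore up to 75% of memory
- name: pg.pg_maintenance_work_mem
  domain: [64, 983]      # explore up to 12% of memory
- name: pg.pg_max_parallel_workers
  domain: [1, 4]         # number of cores divided by 2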

Guidelines for Oracle Database

This page provides a list of best practices when optimizing an Oracle database with Akamas.

Memory Allocation Sub-spaces

This section provides some guidelines on the most relevant memory-related parameters and how to configure them to perform a high-level optimization of a generic Oracle Database instance.

Oracle DBAs can choose, depending on their needs or expertise, the desired level of granularity when configuring the memory allocated to the database areas and components, and let the Oracle instance automatically manage the lower layers. In the same way, Akamas can tune a target instance with different levels of granularity.

In particular, we can configure an Akamas study so that it simply tunes the overall memory of the instance, letting Oracle automatically manage how to allocate it between the shared memory (SGA) and the program memory (PGA); alternatively, we can tune the target values of both of these areas and let Oracle take care of their components, or go even deeper and take total control of the sizing of every single component.

Notice: running the queries in this guide requires a user with the ALTER SYSTEM, SELECT ON V_$PARAMETER, and SELECT ON V_$OSSTAT privileges.

Also notice that, to define the domain of some of the parameters, you need to know the physical memory of the instance. You can find the value in MiB by running the query select round(value/1024/1024)||'M' "physical_memory" from v$osstat where stat_name='PHYSICAL_MEMORY_BYTES'. Otherwise, if you have access to the underlying machine, you can run the bash command free -m.

Tuning the Total Memory

This is the simplest memory-optimization set of parameters: the study configures only the overall memory available for the instance and lets Oracle's Automatic Memory Management (AMM) dynamically assign the space to the SGA and PGA. This is useful for simple studies where you want to minimize the overall used memory, usually coupled with constraints to make sure the performance of the overall system remains within acceptable values.

  • memory_target: this parameter specifies the total memory used by the Oracle instance. When AMM is enabled, you can find the default value with the query select display_value "memory_target" from v$parameter where name='memory_target'. Otherwise, you can get an estimate by summing the configured SGA size, found by running select display_value "sga_target" from v$parameter where name LIKE 'sga_target', and the size of the PGA, found with select ceil(value/1024/1024)||'M' "physical_memory" from v$pgastat where name='maximum PGA allocated'. The explored domain strongly depends on your application and hardware, but an acceptable range goes from 152M (the minimum configurable value) to the physical memory size of your instance. Over time, Akamas will automatically learn to avoid configurations without enough memory.

To configure Automatic Memory Management you also need to make sure that the parameters sga_target and pga_aggregate_limit are set to 0, by configuring them among the default values of the study or by manually running the configuration queries.

The following snippet shows the parameter selection to tune the total memory of the instance. The domain is configured to go from the minimum value to the maximum physical memory (7609M in our example).

parametersSelection:
- name: ora.memory_target
  domain: [152, 7609]

Tuning the Shared and Program Memory Global Areas

With the following set of parameters, Akamas tunes the individual sizes of the SGA and PGA, letting Oracle's Automatic Shared Memory Management (ASMM) dynamically size the underlying SGA components. You can leverage these parameters for studies where, as in the previous scenario, you want to find the configuration with the lowest memory allocation that still performs within your SLOs. Another possible scenario is finding the balance in the allocation of the available memory that best fits your optimization goals.

  • sga_target: this parameter specifies the target SGA size. When ASMM is configured, you can find the default value with the query select display_value "sga_target" from v$parameter where name='sga_target'. The explored domain strongly depends on your application and hardware, but an acceptable range goes from 64M (the minimum configurable value) to the physical memory size of your instance minus a reasonable size for the PGA (usually up to 80% of physical memory).

  • pga_aggregate_target: this parameter specifies the target PGA size. You can find the default value with the query select display_value "pga_aggregate_target" from v$parameter where name='pga_aggregate_target'. The explored domain strongly depends on your application and hardware, but an acceptable range goes from 10M (the minimum configurable value) to the physical size of your instance minus a reasonable size for the SGA.

To tune the SGA and PGA you must also set memory_target to 0 to disable AMM, by configuring it among the default values of the study or by manually running the configuration queries. ASMM will dynamically tune all the SGA components whose size is not specified, so set all the component parameters (db_cache_size, log_buffer, java_pool_size, large_pool_size, shared_pool_size, and streams_pool_size) to 0 unless you have specific requirements.

The following snippet shows the parameter selection to tune both SGA and PGA sizes. Each parameter is configured to go from the minimum value to 90% of the maximum physical memory (6848M in our example), allowing Akamas to explore all the possible ways to partition the space between the two areas and find the best configuration for our use case:

parametersSelection:
- name: ora.sga_target
  domain: [64, 6848]
- name: ora.pga_aggregate_target
  domain: [10, 6848]

The following code snippet forces Akamas to explore configuration spaces where the total memory, expressed in MiB, does not exceed the total memory available (7609M in our example). This speeds up the optimization by avoiding configurations that won't work correctly.

parameterConstraints:
- name: Limit total memory
  formula: ora.sga_target + ora.pga_aggregate_target <= 7609

Tuning the Shared Memory

With the following set of parameters, Akamas tunes the space allocated to one or more of the components that make up the System Global Area, along with the size of the Program Global Area. This scenario is useful for studies where you want to find the memory distribution that best fits your optimization goals.

  • pga_aggregate_target: this parameter specifies the size of the PGA. You can find the default value with the query select display_value "pga_aggregate_target" from v$parameter where name='pga_aggregate_target'. The explored domain strongly depends on your application and hardware, but an acceptable range goes from 10M (the minimum configurable value) to the physical size of your instance.

  • db_cache_size: this parameter specifies the size of the default buffer pool. You can find the default value with the query select * from v$sgainfo where name='Buffer Cache Size'.

  • log_buffer: this parameter specifies the size of the log buffer. You can find the default value with the query select * from v$sgainfo where name='Redo Buffers'.

  • java_pool_size: this parameter specifies the size of the java pool. You can find the default value with the query select * from v$sgainfo where name='Java Pool Size'.

  • large_pool_size: this parameter specifies the size of the large pool. You can find the default value with the query select * from v$sgainfo where name='Large Pool Size'.

  • streams_pool_size: this parameter specifies the size of the streams pool. You can find the default value with the query select * from v$sgainfo where name='Streams Pool Size'.

  • shared_pool_size: this parameter specifies the size of the shared pool. You can find the default value with the query select * from v$sgainfo where name='Shared Pool Size'.

The explored domains of the SGA components strongly depend on your application and hardware; a reasonable approach is to scale the baseline value both up and down by a given factor to define the domain boundaries (e.g. from 20% to 500% of the baseline).

To tune all the components, set both the memory_target and sga_target parameters to 0 by configuring them among the default values of the study, or by manually running the configuration queries.

Notice: if your system leverages non-standard block-size buffers, you should also consider tuning the db_Nk_cache_size parameters.

The following snippet shows the parameter selection to tune the size of the PGA and of the SGA components. The PGA parameter is configured to go from the minimum value to 90% of the maximum physical memory (6848M in our example), while the domains for the SGA components are configured by scaling their default values by approximately a factor of 10. Along with the constraint defined below, these domains give Akamas great flexibility while exploring how to distribute the available memory space:

parametersSelection:
- name: ora.pga_aggregate_target
  domain: [10, 6848]
- name: ora.db_cache_size
  domain: [128, 6848]
- name: ora.log_buffer
  domain: [1, 128]
- name: ora.java_pool_size
  domain: [4, 240]
- name: ora.large_pool_size
  domain: [12, 1024]
- name: ora.shared_pool_size
  domain: [12, 1024]

The following code snippet forces Akamas to explore configuration spaces where the total memory, expressed in MiB, does not exceed the total memory available (7609M in our example).

parameterConstraints:
- name: Limit total memory
  formula: ora.db_cache_size + ora.log_buffer + ora.java_pool_size + ora.large_pool_size + ora.shared_pool_size + ora.pga_aggregate_target <= 7609

You should also add to the formula any db_Nk_cache_size parameter tuned in the study.

Before applying optimization results

The following best practices should be considered before applying a configuration identified by an offline optimization study from a test or pre-production environment to a production environment.

Most of these best practices are general and refer to any configuration change and application rollout, not only to Akamas-related scenarios.

Validating the study results

Any configuration identified by Akamas in a test or pre-production environment, by executing a number of experiments and trials in a limited timeframe, should first be validated in its ability to consistently deliver the expected performance over time, before being promoted to production.

Running endurance tests

An endurance test typically lasts for several hours and can either mimic the specific load profile of production environments (e.g. morning peaks or low-load phases during the night) or apply a simple constant high load (flat load). A specific Akamas study can be implemented for this purpose.

Applying results of optimization studies

When applying a new configuration to a production environment it is important to reduce the risk of severely impacting the supported services and allowing time to backtrack if required.

Adopt gradual rollouts

With a gradual rollout approach, a new configuration is applied to only a subset of the target system, so that the system can be observed for a period of time without impacting the entire system.

Several strategies are possible, including:

  • Canary deployment, where a small percentage of the traffic is served by the instance with the new configuration;

  • Shadow traffic, where traffic is mirrored and redirected to the instance with the new configuration, and responses do not impact the users.

Assess the impact on the infrastructure and other applications

In the case of an application sharing entire layers or single components (e.g. microservices) with other applications, it is important to assess in advance the potential impact on other applications before applying a configuration identified by only considering SLOs related to a single application.

The following general considerations may help in assessing the impact on the infrastructure:

  • If the new configuration is more efficient (i.e. it is less demanding in terms of resources) or it does not require changes to resource requirements (e.g. it does not change K8s requests and limits), then the configuration can be expected to be beneficial, as resources will be freed and become available for additional applications;

  • If the new configuration is less efficient (i.e. it requires more resources), then appropriate checks of whether the additional capacity is available in the infrastructure (e.g. in the K8s cluster or namespace) should be done, as when allocating new applications.

As far as the other applications are concerned:

  • Just reducing the operational cost of a service does not have any impact on other applications that are calling or using the service;

  • While tuning a service for performance may, in principle, put the systems it calls under back-pressure, this is not the typical behavior of enterprise systems, where the most susceptible systems are on the backend side:

    • Tuning most external services will not increase the throughput much, which is typically business-driven, thus the risk to overwhelm the backends is low;

    • Tuning the backends allows the caller systems to handle faster connections, thus reducing the memory footprint and increasing the resilience of the entire system;

  • Especially in the case of highly distributed systems, such as microservices, the number of in-flight requests over a given period of time is something to be minimized;

  • A latency reduction for a microservice implies fewer in-flight requests throughout the system, leading to better performance, faster failures, and fewer pending transactions to be rolled back in case of incidents.

Guidelines for JVM layer (OpenJDK)

Suggested Parameters (OpenJDK 8)

When starting a new JVM optimization, the following is a list of recommended parameters to include in your study:

  • jvm_gcType

  • jvm_maxHeapSize

  • jvm_newSize

  • jvm_survivorRatio

  • jvm_maxTenuringThreshold

  • jvm_parallelGCThreads

  • jvm_concurrentGCThreads

Suggested Parameters (OpenJDK 11)

When starting a new JVM optimization, the following is a list of recommended parameters to include in your study:

  • jvm_gcType

  • jvm_maxHeapSize

  • jvm_newSize

  • jvm_minHeapSize

  • jvm_activeProcessorCount

  • jvm_survivorRatio

  • jvm_maxTenuringThreshold

  • jvm_parallelGCThreads

  • jvm_concurrentGCThreads
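For reference, these tunables correspond to standard OpenJDK command-line options (e.g. jvm_maxHeapSize maps to -Xmx). A rendered configuration might look like the following illustrative command line with hand-picked values; the actual flags are generated by the optimization pack:

java -XX:+UseG1GC -Xms512m -Xmx2g -XX:NewSize=512m -XX:SurvivorRatio=8 \
     -XX:MaxTenuringThreshold=15 -XX:ParallelGCThreads=4 -XX:ConcGCThreads=2 \
     -XX:ActiveProcessorCount=4 -jar myApp.jar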

Guidelines for choosing optimization parameters

In this section, some guidelines on how to choose optimization parameters are provided for the following specific technologies:

  • JVM (OpenJ9)

  • JVM (OpenJDK8 and OpenJDK11)

  • Oracle Database

  • PostgreSQL

These guidelines also provide an example of how to approach the selection of parameters (and how to define the associated domains and constraints) in an optimization study.

Optimizing OpenJ9

When optimizing Java applications based on OpenJ9, typically the goal is to tune the JVM from both the point of view of cost savings and quality of service.

Please refer to the OpenJ9 optimization pack for the list of component types, parameters, metrics, and constraints.

Workflows

Applying parameters

Akamas offers many operators that you can use to apply the parameters for the tuned JVM. In particular, it is suggested to leverage the FileConfigurator operator to create a configuration file or inject the arguments directly in the command string using a template.

The following is an example of a templatized execution string:

#!/bin/bash
cd "$(dirname "$0")"
java ${jvm.*} -jar myApp.jar

A typical workflow

A typical workflow to optimize a Java application can be structured in two parts:

  1. Configure the Java arguments: generate a configuration file or a command string containing the selected JVM parameters using a FileConfigurator operator.

  2. Run the Java application: use the available operators to execute a performance test against the application.

Here’s an example of a typical workflow where Akamas executes the script containing the command string generated by the file configurator:

name: optimize-java-app
tasks:
  - name: Configure Parameters
    operator: FileConfigurator
    arguments:
      source:
        hostname: app.akamas.io
        username: akamas
        path: /home/akamas/app/run.sh.templ
        key: rsa-key
      target:
        hostname: app.akamas.io
        username: akamas
        path: /home/akamas/app/run.sh
        key: rsa-key

  - name: Launch Test
    operator: Executor
    arguments:
      command: bash /home/akamas/app/run.sh
      host:
        hostname: app.akamas.io
        username: akamas
        key: rsa-key

Telemetry Providers

Akamas can access JMX metrics using the Prometheus provider. This provider comes out of the box with a set of default queries to interrogate a Prometheus instance configured to fetch data from a JMX Exporter.

Here’s a configuration example for a telemetry provider instance that uses Prometheus to extract all the JMX metrics defined in this optimization pack:

provider: Prometheus
config:
  address: monitoring.akamas.io
  port: 9090

where the configuration of the monitored component provides the additional references as in the following snippet:

name: jvm
description: target JVM
componentType: java-ibm-j9vm-8
properties:
  prometheus:
    instance: jvm
    job: jmx-exporter

Examples

See this page for an example of a study leveraging the Eclipse OpenJ9 pack.

Optimizing Linux

When optimizing Linux systems, typically the goal is to allow cost savings or improve performance and the quality of service, such as sustaining higher levels of traffic or enabling transactions with lower latency.

Please refer to the Linux optimization pack for the list of component types, parameters, metrics, and constraints.

Workflows

Applying parameters

Akamas provides the LinuxConfigurator operator as the preferred way to apply Linux parameters to a system to be optimized. The operator connects via SSH to your Linux components and employs different strategies to apply Linux parameters. Notice that this operator allows you to exclude some block/network devices from being configured.

A typical workflow

You can organize a typical workflow to optimize Linux in three parts:

  1. Configure Linux: use the LinuxConfigurator operator to apply configuration parameters to the operating system; no restart is required.

  2. Test the performance of the system: use the available workflow operators to execute a performance test against the system.

  3. Perform some cleanup: use the available workflow operators to perform any clean-up needed to guarantee that any subsequent execution of the workflow will run without problems.

Here’s an example of a typical workflow for a Linux system:

name: "linux workflow"
tasks:
- name: "set linux parameters"
  operator: "LinuxConfigurator"
  arguments:
    component: "mylinuxcomponent"

- name: "execute performance test"
  operator: "Executor"
  arguments:
    host:
      hostname: "perf.mycompany.com"
      key: "..."
      username: "perf"
    command: "/home/perf/start.sh"

Telemetry Providers

Akamas does not provide any specialized telemetry solution to gather Linux metrics, as these metrics can be collected in a variety of ways, leveraging a plethora of existing solutions. For example, the Prometheus provider supports Linux system metrics.

Optimizing Java OpenJDK

When optimizing Java applications based on OpenJDK, typically the goal is to tune the JVM from both the point of view of cost savings and quality of service.

Please refer to the Java OpenJDK optimization pack for the list of component types, parameters, metrics, and constraints.

Workflows

Applying parameters

Akamas offers many operators that you can use to apply the parameters for the tuned JVM. In particular, it is suggested to use the FileConfigurator operator to create a configuration file or inject the arguments directly in the command string using a template.

The following is an example of a templatized execution string:

#!/bin/bash
cd "$(dirname "$0")"
java ${jvm.jvm_gcType} ${jvm.jvm_minHeapSize} ${jvm.jvm_maxHeapSize} ${jvm.jvm_newSize} ${jvm.jvm_survivorRatio} -jar renaissance.jar -r 20 --csv renaissance.csv page-rank

A typical workflow

A typical workflow to optimize a Java application can be structured in two parts:

  1. Configure the Java arguments: generate a configuration file or a command string containing the selected JVM parameters using a FileConfigurator operator.

  2. Run the Java application: use the available operators to execute a performance test against the application.

Here’s an example of a typical workflow where Akamas executes the script containing the command string generated by the file configurator:

name: optimize-java-app
tasks:
  - name: Configure Parameters
    operator: FileConfigurator
    arguments:
      source:
        hostname: app.akamas.io
        username: akamas
        path: /home/akamas/app/run.sh.templ
        key: rsa-key
      target:
        hostname: app.akamas.io
        username: akamas
        path: /home/akamas/app/run.sh
        key: rsa-key

  - name: Launch Test
    operator: Executor
    arguments:
      command: bash /home/akamas/app/run.sh
      host:
        hostname: app.akamas.io
        username: akamas
        key: rsa-key

Telemetry Providers

Akamas can access JMX metrics using the Prometheus provider. This provider comes out of the box with a set of default queries to interrogate a Prometheus instance configured to fetch data from a JMX Exporter.

Here’s a configuration example for a telemetry provider instance that uses Prometheus to extract all the JMX metrics defined in this optimization pack:

provider: Prometheus
config:
  address: monitoring.akamas.io
  port: 9090

where the configuration of the monitored component provides the additional references as in the following snippet:

name: jvm
description: target JVM
componentType: openjdk-11
properties:
  prometheus:
    instance: jvm
    job: jmx-exporter

Examples

See this page for an example of a study leveraging the Java OpenJDK pack.

Optimizing Web Applications

This page intends to provide some guidance in optimizing web applications. Please refer to the Web Application optimization pack for the list of component types, parameters, metrics, and constraints.

Telemetry configuration

No specialized telemetry solution to gather Web Application metrics is included. The following providers, however, can integrate with the provided metrics:

  • CSV File Provider: this provider can be configured to ingest data points generated by any monitoring application able to export the data in CSV format.

  • Integrations leveraging NeoLoad Web, LoadRunner Professional, or LoadRunner Enterprise as a load generator can use the ad-hoc provider that comes out of the box and uses the metrics defined in this optimization pack.

Workflows

Applying parameters

The provided component type does not define any parameter. The workflow will optimize parameters defined in other component types representing the underlying technological stack. Use the FileConfigurator operator to interpolate the tuned parameters in the configuration files of the underlying stack.

A typical workflow

A typical workflow to optimize a web application is structured in three parts:

  1. Configure and restart the application: restart the application using an Executor operator, then wait for the application to come up using the Sleep or Executor operator.

  2. Run the test: use any of the available operators to trigger the execution of the performance test against the application.

  3. Perform the cleanup: use any of the available operators to restore the application to the original state.

Here's an example workflow to perform a test on a Java web application using NeoLoad as a load generator:

name: "webapp workflow"
tasks:
  - name: Set Java parameters
    operator: FileConfigurator
    arguments:
      source:
        hostname: myapp.mycompany.com
        username: ubuntu
        key: # ...
        path: /home/ubuntu/conf_template
      target:
        hostname: myapp.mycompany.com
        username: ubuntu
        key: # ...
        path: /home/ubuntu/conf

  - name: Restart application
    operator: Executor
    arguments:
      command: "/home/ubuntu/myapp_down.sh; /home/ubuntu/myapp_sh -opts '/home/ubuntu/conf'"
      host:
        hostname: myapp.mycompany.com
        username: ubuntu
        key: # ...

  - name: Run NeoLoadWeb load test
    operator: NeoLoadWeb
    arguments:
      accountToken: NLW_TOKEN
      projectFile:
        # NeoLoad projectfile location ...

Examples

See this page for an example of a study leveraging the Web Application pack.

Optimizing Kubernetes

When optimizing Kubernetes applications, typically the goal is to find the configuration that assigns resources to containerized applications so as to minimize waste and ensure the quality of service.

Please refer to the Kubernetes optimization pack for the list of component types, parameters, metrics, and constraints.

Workflows

Applying parameters

Akamas offers different operators to configure Kubernetes entities. In particular, you can use the FileConfigurator operator to update the definition file of a resource and apply it with the Executor operator.

The following example is the definition of a deployment, where the replicas and resources are templatized in order to work with the FileConfigurator:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: ${deployment.k8s_workload_replicas}
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: ${container.cpu_request}
              memory: ${container.memory_request}
            limits:
              cpu: ${container.cpu_limit}
              memory: ${container.memory_limit}

A typical workflow

A typical workflow to optimize a Kubernetes application is usually structured as follows:

  1. Configure the Kubernetes artifacts: use the FileConfigurator operator to create the definition files starting from a template.

  2. Apply the new parameters: apply the updated definitions using the Executor operator.

  3. Wait for the application to be ready: run a custom script to wait until the rollout is complete.

  4. Run the test: execute the benchmark.

Here’s an example of a typical workflow for a system:

name: Kubernetes workflow
tasks:
  - name: Configure deployment parameters
    operator: FileConfigurator
    arguments:
      source:
        path: nginx-deployment.yaml.templ
        hostname: app.akamas.io
        username: akamas
        key: rsa-key
      target:
        path: nginx-deployment.yaml
        hostname: app.akamas.io
        username: akamas
        password: akamas

  - name: Apply parameters
    operator: Executor
    arguments:
      command: kubectl apply -f nginx-deployment.yaml
      host:
        hostname: app.akamas.io
        username: akamas
        password: akamas

  - name: Wait application ready
    operator: Executor
    arguments:
      command: bash /home/akamas/app/check-status.sh
      host:
        hostname: app.akamas.io
        username: akamas
        password: akamas

  - name: Run test
    operator: Executor
    arguments:
      command: bash /home/akamas/app/run-test.sh
      host:
        hostname: app.akamas.io
        username: akamas
        password: akamas

Telemetry Providers

Akamas can access Kubernetes metrics using the Prometheus provider. This provider comes out of the box with a set of default queries to interrogate a Prometheus instance configured to fetch data from cAdvisor and kube-state-metrics.

Here’s a configuration example for a telemetry provider instance that uses Prometheus to extract all the Kubernetes metrics defined in this optimization pack:

provider: Prometheus
config:
  address: monitoring.akamas.io
  port: 9090

where the configuration of the monitored component provides the additional filters as in the following snippet:

name: nginx_pod
description: Pod running Nginx
componentType: Kubernetes Pod

properties:
  prometheus:
    job: 'kubernetes-cadvisor|kube-state-metrics'
    namespace: akamas
    pod: nginx-*

name: cluster
description: Cluster
componentType: Kubernetes Cluster

properties:
  prometheus:
    job: 'kubernetes-cadvisor|kube-state-metrics'

Please keep in mind that some resources, such as pods belonging to deployments, require wildcards in order to match the auto-generated names.

Examples

See this page for an example of a study leveraging the Kubernetes pack.

Guidelines for defining optimization studies

This section provides some guidelines on how to define optimization studies by means of a few examples related to single-technology/layer systems, in particular on how to define workflows and telemetry providers.

More complex real-world examples are provided by the Knowledge Base guide.

Guidelines for JVM (OpenJ9)

Suggested Parameters for OpenJ9

When starting a new JVM optimization, the following is a list of recommended parameters to include in your study:

  • j9vm_minFreeHeap

  • j9vm_maxFreeHeap

  • j9vm_minHeapSize

  • j9vm_maxHeapSize

  • j9vm_gcCompact

  • j9vm_gcThreads

  • j9vm_gcPolicy

  • j9vm_codeCacheTotal

  • j9vm_compilationThreads

  • j9vm_aggressiveOpts

The following describes how to approach tuning the JVM in the following areas:

  • JVM Heap

  • JVM Garbage Collection

  • JVM compilation

  • JVM aggressive optimization

Tuning JVM Heap

The most relevant JVM parameters are the ones defining the boundaries of the allocated heap (j9vm_minHeapSize, j9vm_maxHeapSize). The upper bound to configure for this domain strongly depends on the memory in megabytes available on the host instance or on how much we are willing to allocate, while the lower bound depends on the minimum requirements to run the application.

The free heap parameters (j9vm_minFreeHeap, j9vm_maxFreeHeap) define some boundaries for the free space target ratio, which impacts the trigger thresholds of the garbage collector. The suggested starting ranges are from 0.1 to 0.6 for the minimum free ratio, and from 0.3 to 0.9 for the maximum.

The following represents a sample snippet of the parametersSelection section in the study definition:

parametersSelection:
  - name: jvm.j9vm_maxHeapSize
    domain: [<LOWER_BOUND>, <UPPER_BOUND>]
  - name: jvm.j9vm_minHeapSize
    domain: [<LOWER_BOUND>, <UPPER_BOUND>]

  - name: jvm.j9vm_minFreeHeap
    domain: [0.1, 0.6]
  - name: jvm.j9vm_maxFreeHeap
    domain: [0.3, 0.9]

It is also recommended to define the following constraints:

  • min heap size lower than or equal to the max heap size:

jvm.j9vm_minHeapSize <= jvm.j9vm_maxHeapSize

  • upper bound of the free heap ratio at least 5 percentage points higher than the lower bound:

jvm.j9vm_minFreeHeap + 0.05 < jvm.j9vm_maxFreeHeap

Tuning JVM Garbage Collection

The following JVM parameters define the behavior of the garbage collector:

  • j9vm_gcPolicy

  • j9vm_gcThreads

  • j9vm_gcCompact

The garbage collection policy (j9vm_gcPolicy) defines the collection strategy used by the JVM. This parameter is key for the performance of the application: the default garbage collector (gencon) is the best solution for most scenarios, but some specific kinds of applications may benefit from one of the alternative options.

The number of GC threads (j9vm_gcThreads) defines the level of parallelism available to the collector. This value can range from 1 to the maximum number of CPUs that are available or we are willing to allocate.

The GC compaction parameter (j9vm_gcCompact) selects whether garbage collections perform full compactions always, never, or based on internal policies.

The following represents a sample snippet of the parametersSelection section in the study definition:

parametersSelection:
  - name: jvm.j9vm_gcPolicy
    categories: [balanced, gencon, metronome, optavgpause, optthruput]

  - name: jvm.j9vm_gcThreads
    domain: [1, <MAX_CPUS>]

  - name: jvm.j9vm_gcCompact

Tuning JVM compilation

The following JVM parameters define the behaviors of the compilation:

  • j9vm_compilationThreads

  • j9vm_codeCacheTotal

The compilation threads parameter (j9vm_compilationThreads) defines the number of threads available for the JIT compiler. Its range depends on the available CPUs.

The code cache parameter (j9vm_codeCacheTotal) defines the maximum size limit for the JIT code cache. Higher values may benefit complex server-type applications at the expense of a larger memory footprint, which should be taken into account in the overall memory requirements.

The following represents a sample snippet of the parametersSelection section in the study definition:

parametersSelection:
  - name: jvm.j9vm_compilationThreads
    domain: [1, <MAX_CPUS>]

  - name: jvm.j9vm_codeCacheTotal
    domain: [2, <UPPER_BOUND>]

Tuning JVM aggressive optimizations

The following JVM parameter defines the behavior of aggressive optimization:

  • j9vm_aggressiveOpts

Aggressive optimizations (j9vm_aggressiveOpts) enables some experimental features that usually lead to performance gains.

The following represents a sample snippet of the parametersSelection section in the study definition:

parametersSelection:
  - name: j9vm_aggressiveOpts


Analyzing results of live optimization studies

Even for live optimization studies, it is a good practice to analyze how the optimization is being executed with respect to the defined goal & constraints, and workloads.

This analysis may provide useful insights about the system being optimized (e.g. an understanding of the system dynamics) and about the optimization study itself (e.g. how to adjust optimizer options or change constraints). Since this is more challenging for an environment that is being optimized live, a common practice is to adopt the recommendation mode before possibly switching to the fully autonomous mode.

The Akamas UI displays the results of a live optimization study in the following areas:

  • The Metrics section (see the following figures) displays the behavior of the metrics as configurations are recommended and applied (possibly after being reviewed and approved by users); this area supports the analysis of how the optimizer is driven by the configured safety and exploration factors.

  • The All Configurations section provides the list of all the recommended configurations, possibly as modified by the user, as well as the details of each applied configuration (see the following figures).

  • In the case of the recommendation mode, the Pending configuration section (see the following figure) shows the configuration that is being recommended, allowing users to review it (see the EDIT toggle) and approve it.

Optimizing PostgreSQL

When optimizing a PostgreSQL instance, typically the goal is one of the following:

  • Throughput optimization: increasing the number of transactions

  • Cost optimization: minimize resource consumption according to a typical workload, thus cutting costs

Please refer to the PostgreSQL optimization pack for the list of component types, parameters, metrics, and constraints.

Workflow

Applying parameters

Akamas offers many operators that you can use to apply the parameters for the tuned PostgreSQL instances. In particular, we suggest using the FileConfigurator operator for parameters templating and configuration, and the Executor operator for restoring DB data and launching scripts.

A typical optimization process involves the following steps (see the workflow sketch after this list):

  1. Configure PostgreSQL parameters

  2. Restore DB data

  3. Restart PostgreSQL and wait for the initialization

  4. Run benchmark

  5. Parse results

Please note that most PostgreSQL parameters do not require a restart to be applied.
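The following is a minimal workflow sketch following these steps; the SSH host details are omitted, and all names, paths, and scripts are illustrative:

name: postgres-workflow
tasks:
  - name: Configure PostgreSQL parameters
    operator: FileConfigurator
    arguments:
      sourcePath: /home/akamas/postgresql.conf.templ  # template with parameter placeholders
      targetPath: /home/akamas/postgresql.conf
      component: postgres

  - name: Restore DB data and restart PostgreSQL
    operator: Executor
    arguments:
      command: bash /home/akamas/restore-and-restart.sh
      component: postgres

  - name: Run benchmark
    operator: Executor
    arguments:
      command: bash /home/akamas/run-benchmark.sh
      component: benchmark

  - name: Parse results
    operator: Executor
    arguments:
      command: python /home/akamas/parse-results.py
      component: benchmark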

Integrating Akamas

Akamas provides the following areas of integration with your ecosystem, which may apply or not depending on whether you are running live optimization studies or offline optimization studies:

  • Telemetry Providers tools providing time series for metrics of interest for the system to be optimized (see also Telemetry Providers) - this integration applies to both offline and live optimization studies;

  • Configuration Management tools providing the ability to set tunable parameters for the system to be optimized - this integration applies to both offline and live optimization studies;

  • Value Stream Delivery tools to implement a continuous optimization process as part of a CI/CD pipeline - this integration applies to both offline and live optimization studies;

  • Load Testing tools used to reproduce a synthetic workload on the system to be optimized; notice that these tools may also act as Telemetry Providers (e.g. for end-user metrics) - this integration only applies to offline optimization studies.

These integrations may require some setup on both the tool and the Akamas side and may also involve defining workflows and making use of workflow operators.

Optimizing Spark

When optimizing applications running on the Apache Spark framework, the goal is to find the configurations that best optimize the allocated resources or the execution time.

Please refer to the Spark optimization pack for the list of component types, parameters, metrics, and constraints.

Workflows

Applying parameters

Akamas offers several operators that you can use to apply the parameters for the tuned Spark application. In particular, we suggest using the Spark SSH Submit operator, which connects to a target instance to submit the application using the configuration parameters to test.

Other solutions include:

  • the Spark Livy Operator, which allows submitting the application along with the configuration parameters using the Livy Rest interface;

  • the standard Executor operator, which allows running a custom command or script once the FileConfigurator operator has updated the default Spark configuration file or a custom one using a template.

A typical workflow

You can organize a typical workflow to optimize a Spark application in three parts:

  1. Setup the test environment

    1. prepare any required input data

    2. apply the Spark configuration parameters, if you are going for a file-based solution

  2. Execute the Spark application

  3. Perform cleanup

Here’s an example of a typical workflow where Akamas executes the Spark application using the Spark SSH Submit operator:

name: Spark workflow
tasks:
   - name: cwspark
     arguments:
        master: yarn
        deployMode: cluster
        file: /home/hadoop/scripts/pi.py
        args: [ 100 ]

Telemetry Providers

Akamas can access statistics using the Spark History Server Provider. This provider maps the metrics in this optimization pack to the statistics provided by the Spark History Server endpoint.

Here’s a configuration example for a telemetry provider instance:

provider: SparkHistoryServer
config:
  address: sparkmaster.akamas.io
  port: 18080

Examples

See this page for an example of a study leveraging the Spark pack.

Optimizing Oracle Database

When optimizing an Oracle Database instance, typically the goal is to maximize the throughput of an Oracle-backed application or to minimize its resource consumption, thus reducing costs.

Please refer to the Oracle Database optimization pack for the list of component types, parameters, metrics, and constraints.

Workflows

Applying parameters

One common way to configure Oracle parameters is through the execution of ALTER SYSTEM statements on the database instance: to automate this task Akamas provides the OracleConfigurator operator. For finer control, Akamas provides the FileConfigurator operator, which allows building custom statements in a script file that can be executed by the Executor operator.

Oracle Configurator

The OracleConfigurator operator allows the workflow to configure an on-premise instance with minimal configuration. The following snippet is an example of a configuration task, where all the connection arguments are already defined in the referenced component:

name: Update Oracle parameters
operator: OracleConfigurator
arguments:
  component: oracledb

File Configurator and Executor

Most cloud providers offer web APIs as the only way to configure database services. In this case, the Executor operator can submit an API request through a custom executable using a configuration file generated by a FileConfigurator operator. The following is an example workflow where a FileConfigurator task generates a configuration file (oraconf), followed by an Executor task that parses and submits the configuration to the API endpoint through a custom script (api_update_db_conf.sh):

tasks:
  - name: Generate Oracle configuration
    operator: FileConfigurator
    arguments:
      sourcePath: /home/akamas/oraconf.template
      targetPath: /home/akamas/oraconf
      component: oracledb

  - name: Update conf
    operator: Executor
    arguments:
      command: bash /home/akamas/oraconf/api_update_db_conf.sh /home/akamas/oraconf
      component: oracleML

A typical workflow

The optimization of an Oracle database usually includes the following tasks in the workflow, as implemented in the example below:

  1. Apply the Oracle configuration suggested by Akamas and restart the instance if needed (Update parameters task).

  2. Perform any additional warm-up task that may be required to bring the database up at the operating regime (Execute warmup task).

  3. Execute the workload targeting the database or the front-end in front of it (Execute performance test task).

  4. Restore the original state of the database in order to guarantee the consistency of further tests, removing any dirty data added by the workload and possibly flushing the database caches (Cleanup task).

The following is the complete YAML configuration file of the workflow described above:

name: workflow
description: Test Oracle instance configuration.
tasks:

  - name: Update parameters
    operator: OracleConfigurator
    arguments:
      component: oracledb

  - name: Execute warmup
    operator: Executor
    arguments:
      host:
        hostname: perf.mycompany.com
        key: ...
        username: perf
      command: /home/perf/warmup.sh

  - name: Execute performance test
    operator: Executor
    arguments:
      host:
        hostname: perf.mycompany.com
        key: ...
        username: perf
      command: /home/perf/start.sh

  - name: Cleanup
    operator: OracleExecutor
    arguments:
      sql:
        - TRUNCATE TABLE user_actions
      component: oracledb

Telemetry Providers

Akamas offers many telemetry providers to extract Oracle Database metrics; one of them is the Prometheus provider, which can be used to query Oracle Database metrics collected by a Prometheus instance via the Prometheus Oracle Exporter.

The toml snippet below shows a configuration example for the Oracle Exporter extracting metrics regarding the Oracle sessions:

[[metric]]
context = "sessions"
labels = [ "status", "type" ]
metricsdesc = { value= "Gauge metric with count of sessions by status and type." }
request = "SELECT status, type, COUNT(*) as value FROM v$session GROUP BY status, type"

The following example shows how to configure a telemetry instance for a Prometheus provider in order to query the data points extracted from the exporter described above:

provider: Prometheus
config:
  address: akamas.mycompany.com
  port: 9090

metrics:
  - metric: sessions_active_user
    datasourceMetric: oracledb_sessions_value{instance='$INSTANCE$', type='USER', status='ACTIVE', %FILTERS%}

  - metric: sessions_inactive_user
    datasourceMetric: oracledb_sessions_value{instance='$INSTANCE$', type='USER', status='INACTIVE', %FILTERS%}

Examples

See Optimizing an Oracle Database server instance and Optimizing an Oracle Database for an e-commerce service for examples of studies leveraging the Oracle Database pack.

Optimizing MongoDB

When optimizing a MongoDB instance, typically the goal is one of the following:

  • Throughput optimization - increasing the capacity of a MongoDB deployment to serve clients

  • Cost optimization - decreasing the size of a MongoDB deployment while guaranteeing the same service level

To reach such goals, it is recommended to tune the parameters that manage the cache, which is one of the elements that impact performance the most, in particular those parameters that control the lifecycle and the size of MongoDB’s cache.

Even though it is possible to evaluate performance improvements of MongoDB by looking at the business application that uses it as its database, looking at the end-to-end throughput or response time, or using a performance test like YCSB, the optimization pack provides internal MongoDB metrics that can also shed light on how MongoDB is performing, in particular in terms of throughput, for example:

  • The number of documents inserted in the database per second

  • The number of active connections

Please refer to the MongoDB optimization pack for the list of component types, parameters, metrics, and constraints.

Workflows

Applying parameters

FileConfigurator and Executor operator

Akamas offers many operators that you can use to apply freshly tuned configuration parameters to your MongoDB deployment. In particular, we suggest using the FileConfigurator operator to create a configuration script file and the Executor operator to execute it and thus apply the parameters.

You can leverage the FileConfigurator by creating a template file on a remote host that contains some scripts to configure MongoDB, with placeholders that will be replaced with the values of the parameters tuned by Akamas. Once the FileConfigurator has replaced all the tokens, you can use the Executor operator to actually execute the script to configure MongoDB.

Here’s an example of the aforementioned template file:

#!/bin/sh

cd "$(dirname "$0")" || exit

CACHESIZE=${mongo.mongodb_cache_size}
SYNCDELAY=${mongo.mongodb_syncdelay}
EVICTION_DIRTY_TRIGGER=${mongo.mongodb_eviction_dirty_trigger}
EVICTION_DIRTY_TARGET=${mongo.mongodb_eviction_dirty_target}
EVICTION_THREADS_MIN=${mongo.mongodb_eviction_threads_min}
EVICTION_THREADS_MAX=${mongo.mongodb_eviction_threads_max}
EVICTION_TRIGGER=${mongo.mongodb_eviction_trigger}
EVICTION_TARGET=${mongo.mongodb_eviction_target}
USE_NOATIME=${mongo.mongodb_datafs_use_noatime}

# Here we have to remount the disk mongodb uses for data, to take advantage of the USE_NOATIME parameter

sudo service mongod stop
sudo umount /mnt/mongodb
if [ "$USE_NOATIME" = true ]; then
        sudo mount /dev/nvme0n1 /mnt/mongodb -o noatime
else
        sudo mount /dev/nvme0n1 /mnt/mongodb
fi
sudo service mongod start

# flush logs
echo -n | sudo tee /mnt/mongodb/log/mongod.log
sudo service mongod restart

until grep -q "waiting for connections on port 27017" /mnt/mongodb/log/mongod.log
do
        echo "waiting MongoDB..."
        sleep 60
done

sleep 5
sudo service prometheus-mongodb-exporter restart
# set knobs
mongo --quiet --eval "db.adminCommand({setParameter:1, 'wiredTigerEngineRuntimeConfig': 'cache_size=${CACHESIZE}m, eviction=(threads_min=$EVICTION_THREADS_MIN,threads_max=$EVICTION_THREADS_MAX), eviction_dirty_trigger=$EVICTION_DIRTY_TRIGGER, eviction_dirty_target=$EVICTION_DIRTY_TARGET, eviction_trigger=$EVICTION_TRIGGER, eviction_target=$EVICTION_TARGET'})"
mongo --quiet --eval "db = db.getSiblingDB('admin'); db.runCommand({ setParameter : 1, syncdelay: $SYNCDELAY})"

sleep 3

A typical workflow

A typical workflow to optimize a MongoDB deployment can be structured in four parts:

  1. Configure MongoDB: use the FileConfigurator operator to specify an input and an output template file. The input template file specifies how to interpolate MongoDB parameters into a script, and the output file contains the actual configuration. Then use the Executor operator to reconfigure MongoDB using the output file produced in the previous step (you may need to restart MongoDB depending on the configuration parameters you want to optimize). If needed, use the Sleep operator or the Executor operator to verify that the application is up and running and has finished any initialization logic.

  2. Test the performance of the application: use the available operators to execute a performance test against the application.

  3. Prepare test results (optional): if Akamas does not already automatically import performance test metrics, you can use the available operators to extract test results and make them available to Akamas (for example, you can use an Executor operator to launch a script that produces a CSV of the test results that Akamas can consume using the CSV provider).

  4. Cleanup: when running performance experiments on a database, it is common practice to execute some cleanup tasks at the end of the test to restore the database's initial condition and avoid impacting subsequent tests.

Here’s an example of a typical workflow for a MongoDB deployment, which uses the YCSB benchmark to run performance tests:

name: "ycsb_mongo_workflow"
tasks:
  - name: "configure mongo"
    operator: "FileConfigurator"
    arguments:
      sourcePath: "/home/ubuntu/mongo/templates/mongo_launcher.sh.templ"
      targetPath: "/home/ubuntu/mongo/launcher.sh"
      component: "mongo"

  - name: "launch mongo"
    operator: "Executor"
    arguments:
      command: "bash /home/ubuntu/mongo/launcher.sh 2>&1 | tee -a /tmp/log"
      component: "mongo"

  - name: "launch ycsb"
    operator: "Executor"
    arguments:
      command: "bash /home/ubuntu/ycsb/launch_load.sh 2>&1 | tee -a /tmp/log"
      component: "mongo_ycsb"

  - name: "parse ycsb"
    operator: "Executor"
    arguments:
      command: "python /home/ubuntu/ycsb/parser.py"
      component: "mongo_ycsb"

  - name: "clean mongo"
    operator: "Executor"
    arguments:
      command: "bash /home/ubuntu/clean_mongodb.sh"
      component: "mongo"

Telemetry providers

Akamas offers many telemetry providers to extract MongoDB metrics; one of them is the Prometheus provider, which can be used to query MongoDB metrics collected by a Prometheus instance via the MongoDB exporter.

Here’s an example of a telemetry provider instance that uses Prometheus to extract all the MongoDB metrics defined in this optimization pack:

provider: "Prometheus"
config:
  address: "prometheus.mycompany.com"
  port: 9090

metrics:
  - metric: "mongodb_connections_current"
    datasourceMetric: "mongodb_connections{instance='$INSTANCE$'}"
    labels: ["state"]
  - metric: "mongodb_heap_used"
    datasourceMetric: "mongodb_extra_info_heap_usage_bytes{instance='$INSTANCE$'}"
  - metric: "mongodb_page_faults_total"
    datasourceMetric: "rate(mongodb_extra_info_page_faults_total{instance='$INSTANCE$'}[$DURATION$])"
  - metric: "mongodb_global_lock_current_queue"
    datasourceMetric: "mongodb_global_lock_current_queue{instance='$INSTANCE$'}"
    labels: ["type"]
  - metric: "mongodb_mem_used"
    datasourceMetric: "mongodb_memory{instance='$INSTANCE$'}"
    labels: ["type"]
  - metric: "mongodb_documents_inserted"
    datasourceMetric: "rate(mongodb_metrics_document_total{instance='$INSTANCE$', state='inserted'}[$DURATION$])"
  - metric: "mongodb_documents_updated"
    datasourceMetric: "rate(mongodb_metrics_document_total{instance='$INSTANCE$', state='updated'}[$DURATION$])"
  - metric: "mongodb_documents_deleted"
    datasourceMetric: "rate(mongodb_metrics_document_total{instance='$INSTANCE$', state='deleted'}[$DURATION$])"
  - metric: "mongodb_documents_returned"
    datasourceMetric: "rate(mongodb_metrics_document_total{instance='$INSTANCE$', state='returned'}[$DURATION$])"

Examples

See this page for an example of a study leveraging the MongoDB pack.

Install CSV provider

To install the CSV File provider, create a YAML file (called provider.yml in this example) with the specification of the provider:

# CSV File Telemetry Provider
name: CSV File
description: Telemetry Provider that enables the import of metrics from a remote CSV file
dockerImage: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/telemetry-providers/csv-file-provider:3.0.2

Then you can install the provider with the Akamas CLI:

akamas install telemetry-provider provider.yml

Dynatrace provider

The Dynatrace provider collects metrics from Dynatrace and makes them available to Akamas.

This provider includes support for several technologies. In any case, custom queries can be defined to gather the desired metrics.

Supported versions

Dynatrace SaaS/Managed version 1.187 or later

Supported component types:

  • Kubernetes and Docker

  • Web Application

  • Ubuntu-16.04, Rhel-7.6

  • java-openjdk-8, java-openjdk-11

  • java-ibm-j9vm-6, java-ibm-j9vm-8, java-eclipse-openj9-11

Refer to each optimization pack page to see how component-type metrics are extracted by this provider.

Prerequisites

This section provides the minimum requirements that you should match before using the Dynatrace provider.

  • Dynatrace SaaS/Managed version 1.187 or later

  • A valid Dynatrace license

  • Dynatrace OneAgent installed on the servers where the Dynatrace entities to be monitored are running

  • Connectivity between Akamas and the Dynatrace server on port 443

  • A Dynatrace API token with the privileges described in the Dynatrace Token section below

Dynatrace Token

The Dynatrace provider needs a Dynatrace API token with the following privileges:

  • metrics.read (Read metrics)

  • entities.read (Read entities and tags)

  • DataExport (Access problem and event feed, metrics, and topology)

  • ReadSyntheticData (Read synthetic monitors, locations, and nodes)

  • DataImport (Data ingest, e.g.: metrics and events). This permission is used to inform Dynatrace about configuration changes.

To generate an API token for your Dynatrace installation, you can follow the official Dynatrace documentation.

Component configuration

To instruct Akamas about which Dynatrace entities (e.g. Workloads, Services, Process Groups) metrics should be collected from, you can set some specific properties on components.

Different strategies can be used to map Dynatrace entities to Akamas components:

  • By id

  • By name

  • By tags

  • By Kubernetes properties

By id

You can map a component to a Dynatrace entity by leveraging the unique id of the entity, which you should put under the id property in the component. This strategy is best used for long-lived instances whose ID does not change during the optimization, such as Hosts, Process Groups, or Services.

Here is an example of how to set up host monitoring via id:

name: My Host
properties:
 dynatrace:
  id: HOST-12345YUAB1

You can find the id of a Dynatrace entity by looking at the URL of the Dynatrace dashboard relative to the entity. Watch out: the "host" key is valid only for Linux components; other components (e.g. the JVM) require drilling down into the host entities to get the PROCESS_GROUP_INSTANCE or PROCESS_GROUP id.

By name

You can map a component to a Dynatrace entity by leveraging the entity’s display name. This strategy is similar to the mapping by id but provides a more friendly way to identify the mapped entity. Beware that if multiple entities in your Dynatrace installation share the same name, they will all be mapped to the same component. The Dynatrace display name should be put under the name property in the component definition:

name: MyComponent
properties:
 dynatrace:
  name: host-1

By tags

You can map a component to a Dynatrace entity by leveraging the Dynatrace tags that match the entity, which you should put under the tags property in the component definition.

If multiple tags are specified, instances matching any of the specified tags will be selected.

This sample configuration maps to the component all Dynatrace entities with tag environment: test or [AWS]dynatrace-monitored: true:

name: MyComponent
properties:
 dynatrace:
  tags:
     environment: test
     [AWS]dynatrace-monitored: true

Dynatrace supports both key-value and key-only tags. Key-only tags can be specified as key-value tags with an empty value, as in the following example:

name: MyComponent
properties:
 dynatrace:
  tags:
     myKeyOnlyTag: ""

By Kubernetes properties

You can map a component to a Dynatrace entity referring to a Kubernetes cluster (e.g. a Pod or a Container) by leveraging dedicated properties.

Container

In order to properly identify the set of containers to be mapped, you can specify the following properties. Any container matching all the properties will be mapped to the component.

Akamas property   Dynatrace property          Location
namespace         Kubernetes namespace        Container dashboard
containerName     Kubernetes container name   Container dashboard
basePodName       Kubernetes base pod name    Container dashboard

You can retrieve all the information to set up these properties at the top of the Dynatrace container dashboard.

The following example shows how to map a component to a container running in Kubernetes:

dynatrace:
  type: CONTAINER_GROUP_INSTANCE
  kubernetes:
    namespace: boutique
    containerName: server
    basePodName: ak-frontend-*

Pod

In order to properly identify the set of pods to be mapped, you can specify the following properties. Any pod matching all the properties will be mapped to the component.

Akamas property   Dynatrace property   Location
state             State                Pod dashboard
namespace         Namespace            Pod dashboard
workload          Workload             Pod dashboard

If you need to further narrow your pod selection, you can also specify a set of tags as described in the by-tags strategy above. Note that tags for Kubernetes resources are called Labels in the Dynatrace dashboard.

Labels are specified as key-value pairs in the Akamas configuration. In Dynatrace’s dashboard, key and value are separated by a colon (:).

Example

The following example shows how to map a component to a pod running in Kubernetes:

dynatrace:
  type: CLOUD_APPLICATION_INSTANCE
  namePrefix: ak-frontend-
  kubernetes:
    labels:
      workload: ak-frontend
      product: hipstershop

Container, Pod, or Workload?

Please note that when you are mapping components to Kubernetes entities, the type property is required to instruct Akamas on which type of entity you want to map. Dynatrace maps Kubernetes entities to the following types:

Kubernetes type    Dynatrace type
Docker container   CONTAINER_GROUP_INSTANCE
Pod                CLOUD_APPLICATION_INSTANCE
Workload           CLOUD_APPLICATION
Namespace          CLOUD_APPLICATION_NAMESPACE
Cluster            KUBERNETES_CLUSTER

Improve component mapping with type

You can improve the matching of components with Dynatrace by adding a type property in the component definition; this property helps the provider match only those Dynatrace entities of the given type:

name: MyComponent
properties:
 dynatrace:
  type: SERVICE     # here the type helps the mapping by tags by filtering down entities that are only services
  tags:
     environment: test
     "[AWS]dynatrace-monitored": true

The type of an entity can be retrieved from the URL of the entity’s dashboard.

Available entity types can be retrieved from your Dynatrace instance with the following command:

curl 'https://<Your Dynatrace host>/api/v2/entityTypes/?pageSize=500' --header 'Authorization: Api-Token <API-TOKEN>'

Integrating Telemetry Providers

Akamas supports the integration with virtually any telemetry and observability tool.

Supported Telemetry Providers

The following table describes the supported Telemetry Providers, which are created automatically at installation time.

Notice that Telemetry Providers are shared across all the workspaces within the same Akamas installation, and only users with administrative privileges can manage them.

CSV provider

The CSV provider collects metrics from CSV files and makes them available to Akamas. It offers a very versatile way to integrate custom data sources.

The installation page describes how to get this Telemetry Provider installed. Once installed, this provider is shared with all users of your Akamas installation and can be used to monitor many different systems, by configuring appropriate telemetry provider instances as described in the instances page.

Prerequisites

This section provides the minimum requirements that you should match before using the CSV File telemetry provider.

Network requirements

The following requirements should be met to enable the provider to gather CSV files from remote hosts:

  • Port 22 (or a custom one) should be open from the Akamas installation to the host where the files reside.

  • The host where the files reside should support SCP or SFTP protocols.

Permissions

  • Read access to the CSV files target of the integration

Akamas supported version

  • Versions < 2.0.0 are compatible with Akamas up to version 1.8.0

  • Versions >= 2.0.0 are compatible with Akamas from version 1.9.0

Supported component types

The CSV File provider is generic and allows integration with any data source, therefore it does not come with support for a specific component type.

Setup the data source

To operate properly, the CSV file provider expects the presence of four fields in each processed CSV file:

  • A timestamp field used to identify the point in time a certain sample refers to.

  • A component field used to identify the Akamas entity.

  • A metric field used to identify the name of the metric.

  • A value field used to store the actual value of the metric.

These fields can have custom names in the CSV file; you can specify them in the provider configuration.
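As an illustration, a CSV file using the default column names (TS and COMPONENT; see the configuration options below) and carrying one metric per column might look like this; the cpu_util column name is just an example, mapped via the metrics section:

TS,COMPONENT,cpu_util
2024-03-01 10:00:00,server1,0.35
2024-03-01 10:01:00,server1,0.42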

Optimizing MySQL Database

When optimizing a MySQL instance, typically the goal is one of the following:

  • Throughput optimization: increasing the capacity of a MySQL deployment to serve clients

  • Cost optimization: decreasing the size of a MySQL deployment while guaranteeing the same service level

Please refer to the MySQL optimization pack for the list of component types, parameters, metrics, and constraints.

Workflows

Applying parameters

Usually, MySQL parameters are configured by writing them in the MySQL configuration file, typically called my.cnf, and located under /etc/mysql/ on most Linux systems.

In order to preserve the original config file intact, it is best practice to use additional configuration files, located in /etc/mysql/conf.d to override the default parameters. These files are automatically read by MySQL.
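For instance, a template for such an override file might look like the following sketch (the file name, parameter names, and placeholders are illustrative):

# /etc/mysql/conf.d/akamas.cnf - rendered by the FileConfigurator from a template
[mysqld]
innodb_buffer_pool_size = ${mysql.mysql_innodb_buffer_pool_size}
max_connections = ${mysql.mysql_max_connections}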

FileConfigurator and Executor operator

You can leverage the FileConfigurator operator by creating a template file on a remote host that contains some scripts to configure MySQL, with placeholders that will be replaced with the values of the parameters tuned by Akamas. Once all the placeholders have been replaced by the FileConfigurator, the Executor operator can be used to actually execute the script to configure and restart the database.

A typical workflow

A typical workflow to optimize a MySQL deployment can be structured in four parts:

  1. Configure MySQL: use the FileConfigurator operator to specify an input and an output template file. The input template file specifies how to interpolate MySQL parameters into a configuration file, and the output file contains the result of the interpolation.

  2. Restart MySQL: use the Executor operator to restart MySQL so that it loads the new configuration file produced in the previous step. Optionally, use the Executor operator to verify that the application is up and running and has finished any initialization logic.

  3. Test the performance of the application: use any of the available operators to perform a performance test against the application.

  4. Prepare test results: use any of the available operators to organize test results so that they can be imported into Akamas using the supported telemetry providers (see also the section here below).

Finally, when running performance experiments on databases, it is common practice to execute some cleanup tasks at the end of the test to restore the database's initial condition and avoid impacting subsequent tests.

Here’s an example of a typical workflow for MySQL, which uses the OLTP Resourcestresser benchmark to run performance tests:

name: OptimizeMySQL
tasks:

  - name: Configure MySQL
    operator: FileConfigurator
    arguments:
      component: mysql

  - name: Restart MySQL
    operator: Executor
    arguments:
      command: "/mysql/restart-mysql-container.sh"
      component: mysql

Telemetry providers

Akamas can access MySQL metrics using the Prometheus provider. This provider can be leveraged to query MySQL metrics collected by a Prometheus instance via the Prometheus MySQL exporter.

Here’s an example of a telemetry provider instance that uses Prometheus to extract all the MySQL metrics defined in this optimization pack:

Examples

These pages describe examples of how to leverage the MySQL optimization pack.

Create Dynatrace telemetry instances

The installed provider is shared with all users of your Akamas installation and can monitor many different systems, by configuring appropriate telemetry provider instances.

To create an instance of the Dynatrace provider, build a YAML file (instance.yml in this example) with the definition of the instance:
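A minimal sketch of such a definition, based on the url and token properties described below (the values are illustrative):

provider: Dynatrace
config:
  url: https://<your-environment>.live.dynatrace.com  # URL of your Dynatrace installation API
  token: <DYNATRACE_API_TOKEN>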

Then you can create the instance for the system using the Akamas CLI:
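Assuming the usual Akamas resource-creation syntax (the exact command may vary across versions):

akamas create telemetry-instance instance.yml <system-name>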

Configuration options

When you create an instance of the Dynatrace provider, you should specify some configuration information to allow the provider to correctly extract and process metrics from Dynatrace.

You can specify configuration information within the config part of the YAML of the instance definition.

Required properties

  • url - the URL of the Dynatrace installation API (see the Dynatrace documentation to retrieve the URL of your installation)

  • token - a Dynatrace API token with the privileges listed in the Prerequisites section

Collect additional metrics

You can collect additional metrics with the Dynatrace provider by using the metrics field:
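A sketch of this field, assuming the same metric/datasourceMetric mapping structure used by the other providers in this guide (the Dynatrace metric selector is just an example):

metrics:
  - metric: cpu_util
    datasourceMetric: builtin:host.cpu.usage  # Dynatrace metric selector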

Configure a proxy for Dynatrace

In the case in which Akamas cannot reach directly your Dynatrace installation, you can configure an HTTP proxy by using the proxy field:
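A sketch of this field (the address and port property names are assumptions; adjust to your proxy):

config:
  proxy:
    address: proxy.mycompany.com
    port: 3128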

Telemetry instance reference

This section reports the complete reference for the definition of a telemetry instance.

This table shows the reference for the config section within the definition of the Dynatrace provider instance:

Proxy options reference

This table reports the reference for the config → proxy section within the definition of the Dynatrace provider instance:

Metrics options reference

This table reports the reference for the metrics section within the definition of the Dynatrace provider instance. The section contains a collection of objects with the following properties:

Use cases

This section reports common use cases addressed by this provider.

Collect system metrics

Check the Linux optimization pack for a list of all the system metrics available in Akamas.

As a first step to start extracting metrics from Dynatrace, create an API token and make sure it has the right permissions.

As a second step, choose a strategy to map your Linux component (MyLinuxComponent) to the corresponding Dynatrace entity.

Let’s assume you want to map your Dynatrace entity by id: you can find the id in the URL bar of the Dynatrace dashboard of the entity.

Grab the id and add it to the Linux component definition:
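A sketch of the resulting component definition (the id value is illustrative):

name: MyLinuxComponent
properties:
 dynatrace:
  id: HOST-12345YUAB1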

You can leverage the name of the entity as well:
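For example (the entity name is illustrative):

name: MyLinuxComponent
properties:
 dynatrace:
  name: host-1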

As a third and final step, once the component is all set, you can create an instance of the Dynatrace provider and then build your first studies:

Create CSV telemetry instances

To create an instance of the CSV provider, build a YAML file (instance.yml in this example) with the definition of the instance:
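A minimal sketch of such a definition, based on the required properties documented below (all values are illustrative):

provider: CSV File
config:
  address: host.mycompany.com
  username: akamas
  authType: password
  auth: <password>
  remoteFilePattern:
    - /monitoring/result.csv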

Then you can create the instance for the system using the Akamas CLI:
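Assuming the usual Akamas resource-creation syntax (the exact command may vary across versions):

akamas create telemetry-instance instance.yml <system-name>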

timestampFormat format

Regarding the timestamp format, please notice that while the week-year format YYYY is compliant with the ISO-8601 specification, you should replace it with the year-of-era format yyyy if you are specifying a timestampFormat different from the ISO one. For example:

  • Correct: yyyy-MM-dd HH:mm:ss

  • Wrong: YYYY-MM-dd HH:mm:ss

You can find detailed information on timestamp patterns in the Patterns for Formatting and Parsing section of the Java DateTimeFormatter documentation page.

Configuration options

When you create an instance of the CSV provider, you should specify some configuration information to allow the provider to correctly extract and process metrics from your CSV files.

You can specify configuration information within the config part of the YAML of the instance definition.

Required properties

  • address - a URL or IP identifying the address of the host where CSV files reside

  • username - the username used when connecting to the host

  • authType - the type of authentication to use when connecting to the file host; either password or key

  • auth - the authentication credential; either a password or a key according to authType. When using keys, the value can either be the value of the key or the path of the file to import from

  • remoteFilePattern - a list of remote files to be imported

Optional properties

  • protocol - the protocol to use to retrieve files; either scp or sftp. Default is scp

  • fieldSeparator - the character used as a field separator in the csv files. Default is ,

  • componentColumn - the header of the column containing the name of the component. Default is COMPONENT

  • timestampColumn - the header of the column containing the timestamp. Default is TS

  • timestampFormat - the format of the timestamp (e.g. yyyy-MM-dd HH:mm:ss zzz). Default is YYYY-MM-ddTHH:mm:ss

You should also specify the mapping between the metrics available in your CSV files and those provided by Akamas. This can be done in the metrics section of the telemetry instance configuration. To map a custom metric you should specify at least the following properties:

  • metric - the name of a metric in Akamas

  • datasourceMetric - the header of a column that contains the metric in the CSV file

The provider ignores any column not present as datasourceMetric in this section.

The sample configuration reported in this section would import the metric cpu_util from CSV files formatted as in the example below:
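A minimal illustrative sample, assuming the default column names:

TS,COMPONENT,cpu_util
2024-03-01 10:00:00,server1,0.35
2024-03-01 10:01:00,server1,0.42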

Telemetry instance reference

The following represents the complete configuration reference for the telemetry provider instance.

The following table reports the configuration reference for the config section

The following table reports the configuration reference for the metrics section

Use cases

Here you can find common use cases addressed by this provider.

Linux SAR

In this use case, you are going to import some metrics coming from SAR, a popular UNIX tool to monitor system resources. SAR can export CSV files in the following format.

Note that the metrics are percentages (between 1 and 100), while Akamas accepts percentages as values between 0 and 1; therefore each metric in this configuration has a scale factor of 0.01.
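An illustrative sample of such an export (real sadf exports may use a different field separator, which can be handled via the fieldSeparator option):

hostname,interval,timestamp,%user,%system,%memory
server1,60,2024-03-01 10:00:00,35.20,12.10,58.30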

You can import the two CPU metrics and the memory metric from a SAR log using a telemetry instance configured as follows.
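A sketch of such an instance, reconstructed from the steps described below (the scale and labels property names are assumptions):

provider: CSV File
config:
  address: 127.0.0.1
  username: akamas
  authType: password
  auth: <password>
  protocol: scp
  remoteFilePattern:
    - /csv/sar.csv
  componentColumn: hostname
  timestampColumn: timestamp
  timestampFormat: yyyy-MM-dd HH:mm:ss

metrics:
  - metric: cpu_util
    datasourceMetric: "%user"
    scale: 0.01          # percentages rescaled to the 0-1 range
    labels:
      mode: user
  - metric: cpu_util
    datasourceMetric: "%system"
    scale: 0.01
    labels:
      mode: system
  - metric: mem_util
    datasourceMetric: "%memory"
    scale: 0.01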

Using the configured instance, the CSV File provider will perform the following operations to import the metrics:

  1. Retrieve the file "/csv/sar.csv" from the server "127.0.0.1" using the SCP protocol authenticating with the provided password.

  2. Use the column hostname to lookup components by name.

  3. Use the column timestamp to find the timestamps of the samples (which are expected to be in the format specified by timestampFormat).

  4. Collect the metrics (two with the same name but different labels, and one with a different name):

    • cpu_util: read from the column %user, attaching to its samples the label "mode" with value "user".

    • cpu_util: read from the column %system, attaching to its samples the label "mode" with value "system".

    • mem_util: read from the column %memory.

Please refer to the for the list of component types, parameters, metrics, and constraints.

One common way to configure Oracle parameters is through the execution ALTER SYSTEM statements on the database instance: to automate the execution of this task Akamas provides the . For finer control, Akamas provides the , which allows building custom statements in a script file that can be executed by the .

The allows the workflow to configure an on-premise instance with minimal configuration. The following snippet is an example of a configuration task, where all the connection arguments are already defined in the referenced component:

Most cloud providers offer web APIs as the only way to configure database services. In this case, the can submit an API request through a custom executable using a configuration file generated by a . The following is an example workflow where a FileConfigurator task generates a configuration file (oraconf), followed by an Executor task that parses and submits the configuration to the API endpoint through a custom script (api_update_db_conf.sh):

Akamas offers many telemetry providers to extract Oracle Database metrics; one of them is the , which we can use to query Oracle Database metrics collected by a Prometheus instance via the .

The snippet below shows a configuration example for the Oracle Exporter extracting metrics regarding the Oracle sessions:

See and for examples of studies leveraging the Oracle Database pack.

Even though it is possible to evaluate performance improvements of MongoDB by looking at the business application that uses it as its database, looking at the end-to-end throughput or response time, or using a performance test like , the optimization pack provides internal MongoDB metrics that can shed a light too on how MongoDB is performing, in particular in terms of throughput, for example:

Please refer to the for the list of component types, parameters, metrics, and constraints.

Akamas offers many operators that you can use to apply freshly tuned configuration parameters to your MongoDB deployment. In particular, we suggest using the to create a configuration script file and the ExecutorOperator to execute it and thus apply the parameters.

Use the to specify an input and an output template file. The input template file is used to specify how to interpolate MongoDB parameters into a script, and the output file contains the actual configuration.

Use the operator to reconfigure MongoDB exploiting the output file produced in the previous step. You may need to restart MongoDB depending on the configuration parameters you want to optimize.

Either use the operator or the operator to verify that the application is up and running and has finished any initialization logic (this step may not be necessary)

Use available to execute a performance test against the application

If Akamas does not already automatically import performance test metrics, then you can use available to extract test results and make them available to Akamas (for example, you can use an to launch a script that produces a CSV of the test results that Akamas can consume using the )

Use available to bring back MongoDB into a clean state to avoid impacting subsequent tests

Akamas offers many telemetry providers to extract MongoDB metrics; one of them is the which we can use to query MongoDB metrics collected by a Prometheus instance via the .

See the page for an example of a study leveraging the MongoDB pack.

Refer to to see how component-types metrics are extracted by this provider.

A Dynatrace API token with the privileges described .

To generate an API Token for your Dynatrace installation you can follow .

Akamas property
Dynatrace property
Location
Akamas property
Dynatrace property
Location
Kubernetes type
Dynatrace type
Telemetry Provider
Description

The page describes how to get this Telemetry Provider installed. Once installed, this provider is shared with all users of your Akamas installation and can be used to monitor many different systems, by configuring appropriate telemetry provider instances as described in the page.

Please refer to the for the list of component types, parameters, metrics, and constraints.

You can leverage the by creating a template file on a remote host that contains some scripts to configure MySQL with placeholders that will be replaced with the values of parameters tuned by Akamas. When all the placeholders in FileConfigurator get replaced, the operator can be used to actually execute the script to configure and restart the database

Use the to specify an input and an output template file. The input template file is used to specify how to interpolate MySQL parameters into a configuration file, and the output file is used to contain the result of the interpolation.

Use the to restart MySQL allowing it to load the new configuration file produced in the previous step.

Optionally, use the to verify that the application is up and running and has finished any initialization logic.

Use any of the to perform a performance test against the application.

Use any of the to organize test results so that they can be imported into Akamas using the supported (see also section here below).

Akamas can access MySQL metrics using the This provider can be leveraged to query MySQL metrics collected by a Prometheus instance via the .

This and this describe an example of how to leverage the MySQL optimization pack.

url - URL of the Dynatrace installation API (see to retrieve the URL of your installation)

token - A Dynatrace API Token with the

Field
Type
Value restrictions
Required
Default Value
Description
Field
Type
Value restrictions
Required
Default value
Description
Field
Type
Value Restrictions
Required
Default value
Description

As a first step to start extracting metrics from Dyntrace, and make sure it has the right permissions.

You can find detailed information on timestamp patterns in the Patterns for Formatting and Parsing section on the page.

Field
Type
Description
Default Value
Restrictions
Required
Field
Type
Description
Restrictions
Required

In this use case, you are going to import some metrics coming from , a popular UNIX tool to monitor system resources. SAR can export CSV files in the following format.

Oracle Database optimization pack
OracleConfigurator operator
FileConfigurator operator
Executor operator
OracleConfigurator operator
Executor operator
FileConfigurator operator
Prometheus provider
Prometheus Oracle Exporter
toml
Optimizing an Oracle Database server instance
Optimizing an Oracle Database for an e-commerce service
#!/bin/sh

cd "$(dirname "$0")" || exit

CACHESIZE=${mongo.mongodb_cache_size}
SYNCDELAY=${mongo.mongodb_syncdelay}
EVICTION_DIRTY_TRIGGER=${mongo.mongodb_eviction_dirty_trigger}
EVICTION_DIRTY_TARGET=${mongo.mongodb_eviction_dirty_target}
EVICTION_THREADS_MIN=${mongo.mongodb_eviction_threads_min}
EVICTION_THREADS_MAX=${mongo.mongodb_eviction_threads_max}
EVICTION_TRIGGER=${mongo.mongodb_eviction_trigger}
EVICTION_TARGET=${mongo.mongodb_eviction_target}
USE_NOATIME=${mongo.mongodb_datafs_use_noatime}

# Here we have to remount the disk mongodb uses for data, to take advantage of the USE_NOATIME parameter

sudo service mongod stop
sudo umount /mnt/mongodb
if [ "$USE_NOATIME" = true ]; then
        sudo mount /dev/nvme0n1 /mnt/mongodb -o noatime
else
        sudo mount /dev/nvme0n1 /mnt/mongodb
fi
sudo service mongod start

# flush logs
echo -n | sudo tee /mnt/mongodb/log/mongod.log
sudo service mongod restart

until grep -q "waiting for connections on port 27017" /mnt/mongodb/log/mongod.log
do
        echo "waiting MongoDB..."
        sleep 60
done

sleep 5
sudo service prometheus-mongodb-exporter restart
# set knobs
mongo --quiet --eval "db.adminCommand({setParameter:1, 'wiredTigerEngineRuntimeConfig': 'cache_size=${CACHESIZE}m, eviction=(threads_min=$EVICTION_THREADS_MIN,threads_max=$EVICTION_THREADS_MAX), eviction_dirty_trigger=$EVICTION_DIRTY_TRIGGER, eviction_dirty_target=$EVICTION_DIRTY_TARGET', eviction_trigger=$EVICTION_TRIGGER, eviction_target=$EVICTION_TARGET})"
mongo --quiet --eval "db = db.getSiblingDB('admin'); db.runCommand({ setParameter : 1, syncdelay: $SYNCDELAY})"

sleep 3
name: "ycsb_mongo_workflow"
tasks:
  - name: "configure mongo"
    operator: "FileConfigurator"
    arguments:
      sourcePath: "/home/ubuntu/mongo/templates/mongo_launcher.sh.templ"
      targetPath: "/home/ubuntu/mongo/launcher.sh"
      component: "mongo"

  - name: "launch mongo"
    operator: "Executor"
    arguments:
      command: "bash /home/ubuntu/mongo/launcher.sh 2>&1 | tee -a /tmp/log"
      component: "mongo"

  - name: "launch ycsb"
    operator: "Executor"
    arguments:
      command: "bash /home/ubuntu/ycsb/launch_load.sh 2>&1 | tee -a /tmp/log"
      component: "mongo_ycsb"

  - name: "parse ycsb"
    operator: "Executor"
    arguments:
      command: "python /home/ubuntu/ycsb/parser.py"
      component: "mongo_ycsb"
  - name: "clean mongo"
    operator: "Executor"
    arguments:
      command: "bash /home/ubuntu/clean_mongodb.sh"
      component: "mongo"
provider: "Prometheus"
config:
  address: "prometheus.mycompany.com"
  port: 9090

metrics:
  - metric: "mongodb_connections_current"
    datasourceMetric: 'mongodb_connections{instance="$INSTANCE$"}'
    labels: ["state"]
  - metric: "mongodb_heap_used"
    datasourceMetric: 'mongodb_extra_info_heap_usage_bytes{instance="$INSTANCE$"}'
  - metric: "mongodb_page_faults_total"
    datasourceMetric: 'rate(mongodb_extra_info_page_faults_total{instance="$INSTANCE$"}[$DURATION$])'
  - metric: "mongodb_global_lock_current_queue"
    datasourceMetric: 'mongodb_global_lock_current_queue{instance="$INSTANCE$"}'
    labels: ["type"]
  - metric: "mongodb_mem_used"
    datasourceMetric: 'mongodb_memory{instance="$INSTANCE$"}'
    labels: ["type"]
  - metric: "mongodb_documents_inserted"
    datasourceMetric: 'rate(mongodb_metrics_document_total{instance="$INSTANCE$", state="inserted"}[$DURATION$])'
  - metric: "mongodb_documents_updated"
    datasourceMetric: 'rate(mongodb_metrics_document_total{instance="$INSTANCE$", state="updated"}[$DURATION$])'
  - metric: "mongodb_documents_deleted"
    datasourceMetric: 'rate(mongodb_metrics_document_total{instance="$INSTANCE$", state="deleted"}[$DURATION$])'
  - metric: "mongodb_documents_returned"
    datasourceMetric: 'rate(mongodb_metrics_document_total{instance="$INSTANCE$", state="returned"}[$DURATION$])'
# CSV File Telemetry Provider
name: CSV File
description: Telemetry Provider that enables the import of metrics from a remote CSV file
dockerImage: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/telemetry-providers/csv-file-provider:3.0.2
akamas install telemetry-provider provider.yml
name: My Host
properties:
 dynatrace:
  id: HOST-12345YUAB1
name: MyComponent
properties:
 dynatrace:
  name: host-1
name: MyComponent
properties:
 dynatrace:
  tags:
     environment: test
     "[AWS]dynatrace-monitored": true
name: MyComponent
properties:
 dynatrace:
  tags:
     myKeyOnlyTag: ""

  • namespace: Kubernetes namespace (Container dashboard)

  • containerName: Kubernetes container name (Container dashboard)

  • basePodName: Kubernetes base pod name (Container dashboard)

dynatrace:
  type: CONTAINER_GROUP_INSTANCE
  kubernetes:
    namespace: boutique
    containerName: server
    basePodName: ak-frontend-*

  • state: State (Pod dashboard)

  • namespace: Namespace (Pod dashboard)

  • workload: Workload (Pod dashboard)

dynatrace:
  type: CLOUD_APPLICATION_INSTANCE
  namePrefix: ak-frontend-
  kubernetes:
    labels:
      workload: ak-frontend
      product: hipstershop

  • Docker container: CONTAINER_GROUP_INSTANCE

  • Pod: CLOUD_APPLICATION_INSTANCE

  • Workload: CLOUD_APPLICATION

  • Namespace: CLOUD_APPLICATION_NAMESPACE

  • Cluster: KUBERNETES_CLUSTER
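As a minimal sketch following the same pattern as the pod-level example above (the namespace and label values are illustrative), a component matching a whole workload could set the type accordingly:

dynatrace:
  type: CLOUD_APPLICATION
  kubernetes:
    namespace: boutique
    labels:
      workload: ak-frontend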

name: MyComponent
properties:
 dynatrace:
  type: SERVICE     # here the type helps the mapping by tags by filtering down entities that are only services
  tags:
     environment: test
     "[AWS]dynatrace-monitored": true
curl 'https://<Your Dynatrace host>/api/v2/entityTypes/?pageSize=500' --header 'Authorization: Api-Token <API-TOKEN>'
name: OptimizeMySQL
tasks:

  - name: Configure MySQL
    operator: FileConfigurator
    arguments:
      component: mysql

  - name: Restart MySQL
    operator: Executor
    arguments:
      command: "/mysql/restart-mysql-container.sh"
      component: mysql

  - name: test
    operator: Executor
    arguments:
      command: "cd /home/ubuntu/oltp/oltpbench && ./oltpbenchmark --bench resourcestresser --config /home/ubuntu/oltp/resourcestresser.xml --execute=true -s 5 --output out"
      component: OLTP

  - name: Parse csv results
    operator: Executor
    arguments:
      command: "bash /home/ubuntu/oltp/scripts/parse_csv.sh"
      component: OLTP
provider: Prometheus
config:
  address: mysql.mydomain.com
  port: 9090
  job: mysql_exporter
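If you need MySQL metrics beyond those included in the optimization pack, you can add custom queries to this instance. A minimal sketch, assuming the standard mysqld_exporter metric names (the Akamas metric names here are illustrative):

provider: Prometheus
config:
  address: mysql.mydomain.com
  port: 9090
  job: mysql_exporter
metrics:
  - metric: mysql_threads_connected   # illustrative Akamas metric name
    datasourceMetric: 'mysql_global_status_threads_connected{instance=~"$INSTANCE$"}'
  - metric: mysql_queries_rate        # illustrative Akamas metric name
    datasourceMetric: 'rate(mysql_global_status_queries{instance=~"$INSTANCE$"}[$DURATION$])'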
# Dynatrace Telemetry Provider Instance
provider: Dynatrace
config:
  url: https://wuy711522.live.dynatrace.com
  token: XbERgThisIsAnExampleToken
akamas create telemetry-instance instance.yml system
config:
  url: https://wuy71982.live.dynatrace.com
  token: XbERgkKeLgVfDI2SDwI0h
metrics:
- metric: "akamas_metric"                     # extra akamas metrics to monitor
  datasourceMetric: builtin:host:new_metric   # query to execute to extract the metric
  labels:
  - "method"      # the "method" label will be retained within akamas
config:
  url: https://wuy71982.live.dynatrace.com
  token: XbERgkKeLgVfDI2SDwI0h
  proxy:
    address: https://dynaproxy  # the URL of the HTTP proxy
    port: 9999                  # the port the proxy listens to
provider: Dynatrace  # this is an instance of the Dynatrace provider
config:
  url: https://wuy71982.live.dynatrace.com
  token: XbERgkKeLgVfDI2SDwI0h
  proxy:
    address: https://dynaproxy # the URL of the HTTP proxy
    port: 9999            # the port the proxy listens to
    username: myusername  # http basic auth username if necessary
    password: mypassword  # http basic auth password if necessary
  tags:
    Environment: Test       # dynatrace tags to be matched for every component

metrics:
- metric: "cpu_usage"  # this is the name of the metric within Akamas
  # The dynatrace metric name
  datasourceMetric: "builtin:host.cpu.usage"
  extras:
      mergeEntities: true  # instruct the telemetry to aggregate the metric over multiple entities
  aggregation: avg  # The aggregation to perform if the mergeEntities property is set to true

Proxy options reference

address
  Type: String
  Value restrictions: it should be a valid URL
  Required: Yes
  Description: the URL of the HTTP proxy to use to communicate with the Dynatrace installation API

port
  Type: Number (integer)
  Value restrictions: 1 < port < 65535
  Required: Yes
  Description: the port at which the HTTP proxy listens for connections

username
  Type: String
  Required: No
  Description: the username to use when authenticating against the HTTP proxy, if necessary

password
  Type: String
  Required: No
  Description: the password to use when authenticating against the HTTP proxy, if necessary

Metrics options reference

metric
  Type: String
  Value restrictions: it must be an Akamas metric
  Required: Yes
  Description: the name of the Akamas metric that should map to the new metric you want to gather

datasourceMetric
  Type: String
  Value restrictions: a valid Dynatrace metric
  Required: Yes
  Description: the Dynatrace query to use to extract the metric

labels
  Type: Array of strings
  Required: No
  Description: the list of Dynatrace labels that should be retained when gathering the metric

staticLabels
  Type: Key-value pairs
  Required: No
  Description: static labels that will be attached to metric samples

name: MyLinuxComponent
description: this is a Linux component
properties:
  dynatrace:
    id: HOST-A987D45512ABCEEE
name: MyLinuxComponent
description: this is a Linux component
properties:
  dynatrace:
    name: Host1
provider: Dynatrace
config:
  url: https://my_dyna_installation_url
  token: MY_DYNA_TOKEN
# CSV Telemetry Provider Instance
provider: CSV File
config:
 address: host1.example.com
 authType: password
 username: akamas
 auth: akamas
 remoteFilePattern: /monitoring/result-*.csv
 componentColumn: COMPONENT
 timestampColumn: TS
 timestampFormat: yyyy-MM-dd'T'HH:mm:ss
metrics:
 - metric: cpu_util
   datasourceMetric: user%
akamas create telemetry-instance instance.yml system
TS,                   COMPONENT,  user%
2020-04-17T09:46:30,  host,       20
2020-04-17T09:46:35,  host,       23
2020-04-17T09:46:40,  host,       32
2020-04-17T09:46:45,  host,       21
provider: CSV File            # this is an instance of the CSV File provider
config:
 address: host1.example.com   # the address of the host with the csv files
 port: 22                     # the port used to connect
 authType: password           # the authentication method
 username: akamas             # the username used to connect
 auth: akamas                 # the authentication credential
 protocol: scp                # the protocol used to retrieve the file
 fieldSeparator: ","          # the character used as field separator in the csv files
 remoteFilePattern: /monitoring/result-*.csv    # the path of the csv files to import
 componentColumn: COMPONENT                     # the header of the column with component names
 timestampColumn: TS                            # the header of the column with the time stamp
 timestampFormat: yyyy-MM-dd'T'HH:mm:ss        # the format of the timestamp (Java syntax)
metrics:
 - metric: cpu_util                             # the name of the Akamas metric
   datasourceMetric: user%                      # the header of the column with the original metric
   staticLabels:
    mode: user                                  # (optional) additional labels to add to the metric

Metrics options reference

metric
  Type: String
  Description: the name of the metric in Akamas
  Restrictions: an existing Akamas metric
  Required: Yes

datasourceMetric
  Type: String
  Description: the name (header) of the column that contains the specific metric
  Restrictions: an existing column in the CSV file
  Required: Yes

scale
  Type: Decimal number
  Description: the scale factor to apply when importing the metric
  Required: No

staticLabels
  Type: List of key-value pairs
  Description: a list of key-value pairs that will be attached to the specific metric sample
  Required: No

hostname, interval, timestamp,                %user, %system, %memory
machine1, 600,      2018-08-07 06:45:01 UTC,  30.01, 20.77,   96.21
machine1, 600,      2018-08-07 06:55:01 UTC,  40.07, 13.00,   84.55
machine1, 600,      2018-08-07 07:05:01 UTC,  5.00,  90.55,   89.23
provider: CSV File
config:
  remoteFilePattern: /csv/sar.csv
  address: 127.0.0.1
  port: 22
  username: user123
  auth: password123
  authType: password
  protocol: scp
  componentColumn: hostname
  timestampColumn: timestamp
  timestampFormat: yyyy-MM-dd HH:mm:ss zzz
metrics:
- metric: cpu_util
  datasourceMetric: %user
  scale: 0.001
  staticLabels:
    mode: user
- metric: cpu_util
  datasourceMetric: %system
  scale: 0.001
  staticLabels:
    mode: system
- metric: mem_util
  scale: 0.001
  datasourceMetric: %memory

Install Prometheus provider

To install the Prometheus provider, create a YAML file (provider.yml in this example) with the definition of the provider:

name: Prometheus
description: Telemetry Provider that enables the import of metrics from Prometheus
dockerImage: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/telemetry-providers/prometheus-provider:3.3.0

Then you can install the provider using the Akamas CLI:

akamas install telemetry-provider provider.yml

The installed provider is shared with all users of your Akamas installation and can monitor many different systems, by configuring appropriate telemetry provider instances.


The following is the reference for the config section of the Dynatrace provider instance:

url
  Type: String
  Value restrictions: it should be a valid URL
  Required: Yes
  Description: the URL of the Dynatrace installation API (see the official reference)

token
  Type: String
  Required: Yes
  Description: the Dynatrace API Token the provider should use to interact with Dynatrace. The token should have sufficient permissions.

proxy
  Type: Object
  Value restrictions: see the Proxy options reference
  Required: No
  Description: the specification of the HTTP proxy to use to communicate with Dynatrace

pushEvents
  Type: String
  Value restrictions: true, false
  Default value: true
  Required: No
  Description: if set to true, the provider will inform Dynatrace of the configuration change event, which will be visible in the Dynatrace UI

tags
  Type: Object
  Required: No
  Description: a set of global tags to match Dynatrace entities. The provider uses these tags to apply a default filtering of Dynatrace entities for every component.

The following is the reference for the config section of the CSV File provider instance:

address
  Type: String
  Description: the address of the machine where the CSV file resides
  Restrictions: a valid URL or IP
  Required: Yes

port
  Type: Number (integer)
  Description: the port to connect to in order to retrieve the file
  Default value: 22
  Restrictions: 1 ≤ port ≤ 65535
  Required: No

username
  Type: String
  Description: the username to use in order to connect to the remote machine
  Required: Yes

protocol
  Type: String
  Description: the protocol used to connect to the remote machine: SCP or SFTP
  Default value: scp
  Restrictions: scp, sftp
  Required: No

authType
  Type: String
  Description: specifies which method is used to authenticate against the remote machine:
    • password: use the value of the parameter auth as a password
    • key: use the value of the parameter auth as a private key. Supported formats are RSA and DSA
  Restrictions: password, key
  Required: Yes

auth
  Type: String
  Description: a password or an RSA/DSA key (as a YAML multi-line string, keeping new lines)
  Required: Yes

remoteFilePattern
  Type: String
  Description: the path of the remote file(s) to be analyzed; the path can contain GLOB expressions
  Restrictions: a list of valid Linux paths
  Required: Yes

componentColumn
  Type: String
  Description: the CSV column containing the name of the component. The column's values must match (case sensitive) the name of a component specified in the System
  Default value: COMPONENT
  Restrictions: the column must exist in the CSV file
  Required: Yes

timestampColumn
  Type: String
  Description: the CSV column containing the timestamps of the samples
  Default value: TS
  Restrictions: the column must exist in the CSV file
  Required: No

timestampFormat
  Type: String
  Description: the format of the timestamps; must be specified using Java syntax
  Default value: yyyy-MM-dd'T'HH:mm:ss
  Required: No

fieldSeparator
  Type: String
  Description: the field separator of the CSV
  Default value: ,
  Restrictions: , or ;
  Required: No
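For key-based authentication, a minimal sketch (the key material below is a placeholder):

provider: CSV File
config:
 address: host1.example.com
 username: akamas
 authType: key       # use the value of auth as a private key
 auth: |             # multi-line YAML string, keeping new lines
   -----BEGIN RSA PRIVATE KEY-----
   ...
   -----END RSA PRIVATE KEY-----
 remoteFilePattern: /monitoring/result-*.csv
 componentColumn: COMPONENT
 timestampColumn: TS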

Create Prometheus telemetry instances

To create an instance of the Prometheus provider, edit a YAML file (instance.yml in this example) with the definition of the instance:

# Prometheus Telemetry Provider Instance
provider: Prometheus

config:
 address: host1  # URL or IP of the Prometheus instance from which to extract metrics
 port: 9090      # Port of the Prometheus instance from which to extract metrics

Then you can create the instance for the system using the Akamas CLI:

akamas create telemetry-instance instance.yml system

Configuration options

When you create an instance of the Prometheus provider, you should specify some configuration information to allow the provider to extract and process metrics from Prometheus correctly.

You can specify configuration information within the config part of the YAML of the instance definition.

Required properties

  • address, a URL or IP identifying the address of the host where Prometheus is installed

  • port, the port exposed by Prometheus

Optional properties

  • user, the username for the Prometheus service

  • password, the user password for the Prometheus service

  • job, a string to specify the scraping job name. The default is ".*" for all scraping jobs

  • logLevel, set this to "DETAILED" for some extra logs when searching for metrics (default value is "INFO")

  • headers, to specify additional custom headers e.g: headers: "custom_key": "custom_value"

  • namespace, a string to specify the namespace

  • duration, integer to determine the duration in seconds for data collection (use a number between 1 and 3600)

  • enableHttps, boolean to enable HTTPS in Prometheus (since 3.2.6)

  • ignoreCertificates, boolean to ignore SSL certificates

  • disableConnectionCheck, boolean to disable initial connection check to Prometheus
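A minimal sketch exercising some of these optional properties (values are illustrative):

provider: Prometheus
config:
  address: prometheus.mycompany.com
  port: 9090
  user: monitoring            # username for the Prometheus service
  password: secret            # password for the Prometheus service
  job: node                   # restrict collection to a single scraping job
  logLevel: DETAILED          # extra logs when searching for metrics
  duration: 60                # data collection duration, in seconds
  enableHttps: true           # requires version 3.2.6 or later
  ignoreCertificates: true    # skip SSL certificate validation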

Custom queries

The Prometheus provider allows defining additional queries to populate custom metrics or redefine the default ones according to your use case. You can configure additional metrics using the metrics field as shown in the configuration below:

config:
  address: host1
  port: 9090

metrics:
- metric: cust_metric   # extra akamas metric to monitor
  datasourceMetric: 'http_requests_total{environment=~"staging|testing|development", method!="GET"}' # query to execute to extract the metric
  labels:
  - method   # The "method" label will be retained within akamas

In this example, the telemetry instance will populate cust_metric with the results of the query specified in datasourceMetric, maintaining the value of the labels listed under labels. Please refer to Querying basics | Prometheus for a complete reference of PromQL.

Akamas placeholders

Akamas pre-processes the queries before running them, replacing special-purpose placeholders with the fields provided in the components. For example, given the following component definition:

name: jvm1
description: jvm1 for payment services
properties:
  prometheus:
    instance: service01
    job: jmx

the query sum(jvm_memory_used_bytes{instance=~"$INSTANCE$", job=~"$JOB$"}) will be expanded for this component into sum(jvm_memory_used_bytes{instance=~"service01", job=~"jmx"}). This provides greater flexibility through the templatization of the queries, allowing the same query to select the correct data sources for different components.

The following is the list of available placeholders:

Placeholder: $INSTANCE$, $JOB$
  Usage example: node_load1{instance=~"$INSTANCE$", job=~"$JOB$"}
  Expanded query: node_load1{instance=~"frontend", job=~"node"}
  Component definition example: see the Example below
  Description: these placeholders are replaced respectively with the instance and job fields configured in the component's prometheus configuration.

Placeholder: %FILTERS%
  Usage example: container_memory_usage_bytes{job=~"$JOB$" %FILTERS%}
  Expanded query: container_memory_usage_bytes{job=~"advisor", name=~"db-.*"}
  Description: this placeholder is replaced with a list containing any additional filter in the component's definition (other than instance and job), where each field is expanded as field_name=~"field_value". This is useful to define additional label matches in the query without the need to hardcode them.

Placeholder: $DURATION$
  Usage example: rate(http_client_requests_seconds_count[$DURATION$])
  Expanded query: rate(http_client_requests_seconds_count[30s])
  Description: if not set in the component properties, this placeholder is replaced with the duration field configured in the telemetry instance. You should use it with range vectors instead of hardcoding a fixed value.

Placeholder: $NAMESPACE$, $POD$, $CONTAINER$
  Usage example: 1e3 * avg(kube_pod_container_resource_limits{resource="cpu", namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%})
  Expanded query: 1e3 * avg(kube_pod_container_resource_limits{resource="cpu", namespace=~"boutique", pod=~"adservice.*", container=~"server"})
  Component definition example: see Collect Kubernetes metrics below
  Description: these placeholders are used within Kubernetes environments.

Example

The following component definition provides the prometheus properties referenced in the expanded queries above:

prometheus:
  instance: frontend
  job: node

Use cases

This section reports common use cases addressed by this provider.

Collect Kubernetes metrics

To gather Kubernetes metrics, the following exporters are required:

  • kube-state-metrics

  • cadvisor

As an example, you can define a component with type Kubernetes Container in this way:

name: adservice
description: The adservice of the online boutique by Google
componentType: Kubernetes Container
properties:
  prometheus:
    namespace: boutique
    pod: adservice.*
    container: server
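With such a component in place, custom queries can use the Kubernetes placeholders. A minimal sketch, assuming the cAdvisor metric container_cpu_usage_seconds_total is available and mapping it to an illustrative Akamas metric name:

metrics:
- metric: container_cpu_used   # illustrative Akamas metric name
  datasourceMetric: 'rate(container_cpu_usage_seconds_total{namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$"}[$DURATION$])'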

Collect Java metrics

Check the Java OpenJDK page for a list of all the Java metrics available in Akamas.

You can leverage the Prometheus provider to collect Java metrics by using the JMX Exporter, a collector of Java metrics for Prometheus that can be run as an agent for any Java application. Once downloaded, you execute it alongside a Java application with this command:

java -javaagent:the_downloaded_jmx_exporter_jar.jar=9100:config.yaml -jar yourJar.jar

The command will expose the Java metrics of yourJar.jar on localhost port 9100, where they can be scraped by Prometheus.

config.yaml is the configuration file for this exporter. The following configuration is suggested for an optimal experience with the Prometheus provider:

startDelaySeconds: 0
username:
password:
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
# with whitelistObjectNames below we tell the exporter to export only relevant Java metrics
whitelistObjectNames:
- "java.lang:*"
- "jvm:*"

As a next step, add a new scraping target in the configuration of the Prometheus used by the provider:

...
scrape_configs:
# JMX Exporter
- job_name: "jmx"
  static_configs:
  - targets: ["jmx_exporter_host:9100"]

You can then create a YAML file with the definition of a telemetry instance (prom_instance.yml) of the Prometheus provider:

name: Prometheus
config:
  address: prometheus_host
  port: 9090

And you can create the telemetry instance using the Akamas CLI:

akamas create telemetry-instance prom_instance.yml

Finally, to bind the extracted metrics to the related component, you should add the following field to the properties of the component’s definition:

prometheus:
  job: jmx

Collect system metrics

Check the Linux page for a list of all the system metrics available in Akamas.

You can leverage the Prometheus provider to collect system metrics (Linux) by using the Node exporter, a collector of system metrics for Prometheus that can be run as a standalone executable or a service within a Linux machine to be monitored. Once downloaded, schedule it as a service using, for example, systemd. Here's the manifest of the node_exporter service:

[Unit]
Description=Node Exporter

[Service]
ExecStart=/path/to/node_exporter/executable

[Install]
WantedBy=default.target

Then start the service:

systemctl start node_exporter

The service will expose system metrics on localhost port 9100, where they can be scraped by Prometheus.

As a final step, add a new scraping target in the configuration of the Prometheus used by the provider:

...
scrape_configs:
# Node Exporter
- job_name: "node"
  static_configs:
  - targets: ["node_exporter_host:9100"]
  relabel_configs:
  - source_labels: ["__address__"]
    regex: "(.*):.*"
    # set "instance" to the name of the component the metrics refer to
    target_label: "instance"
    replacement: "linux_component_name"

You can then create a YAML file with the definition of a telemetry instance (prom_instance.yml) of the Prometheus provider:

provider: Prometheus
config:
  address: prometheus_host
  port: 9090

And you can create the telemetry instance using the Akamas CLI:

akamas create telemetry-instance prom_instance.yml

Finally, to bind the extracted metrics to the related component, you should add the following field to the properties of the component’s definition:

prometheus:
  instance: linux_component_name
  job: node

Spark History Server provider

The Spark History Server provider collects metrics from a Spark History Server instance and makes them available to Akamas.

Prerequisites

This section provides the minimum requirements that you should match before using the Spark History Server telemetry provider.

Supported versions

  • Apache Spark 2.3

Network requirements

  • Spark History Server API must be reachable at the provided address and port (the default port is 18080).

Supported component types

  • spark-application

Akamas supported version

  • Versions < 2.0.0 are compatible with Akamas until version 1.8.0

  • Versions >= 2.0.0 are compatible with Akamas from version 1.9.0

Workflow requirements

This section lists the workflow operators this provider depends on:

  • SparkSubmit operator

  • SparkSSHSubmit operator

  • SparkLivy operator

Components configuration

Akamas uses components to identify specific elements of the system to be monitored and optimized. Your system might contain multiple components to model, for example, a Spark application and each host of the cluster. To point Akamas to the right component when extracting metrics you need to add a property called sparkApplication to your Spark Application component. The provider will only extract metrics for components for which this property has been specified:

name: My Application
properties:
  sparkApplication: "true"

Install Dynatrace provider

Install the Telemetry Provider

Skip this part if the Telemetry Provider is already installed.

To install the Dynatrace provider, create a YAML file (called provider.yml in this example) with the definition of the provider:

# Dynatrace Telemetry Provider
name: Dynatrace
description: Telemetry Provider that enables the import of metrics from Dynatrace installations
dockerImage: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/telemetry-providers/dynatrace-provider:3.2.0

Then you can install the provider using the Akamas CLI:

akamas install telemetry-provider provider.yml

You can check the Dynatrace provider metrics mapping to see how component-type metrics are extracted by this provider.

Install Spark History Server provider

To install the Spark History Server provider, create a YAML file (called provider.yml in this example) with the definition of the provider:

# Spark History Server Telemetry Provider
name: SparkHistoryServer
description: Telemetry Provider that enables the import of metrics from Spark History Server instances
dockerImage: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/telemetry-providers/spark-history-server-provider:2.0.0

Then you can install the provider using the Akamas CLI:

akamas install telemetry-provider provider.yml

The installed provider is shared with all users of your Akamas installation and can monitor many different systems, by configuring appropriate telemetry provider instances.

OracleDB Exporter

This page describes how to set up an OracleDB exporter in order to gather metrics regarding an Oracle Database instance through the Prometheus provider.

Installation

The OracleDB exporter repository is available on the official project page. The suggested deploy mode is through a Docker image, since the Prometheus instance can easily access the running container through the Akamas network.

Use the following command line to run the container, where cust-metrics.toml is your configuration file defining the queries for additional custom metrics (see the paragraph below) and DATA_SOURCE_NAME is an environment variable containing the Oracle EasyConnect string:

docker run -d --name oracledb_exporter --restart always \
  --network akamas -p 9161:9161 \
  -v ~/oracledb_exporter/cust-metrics.toml:/cust-metrics.toml \
  -e CUSTOM_METRICS=/cust-metrics.toml \
  -e DATA_SOURCE_NAME="username/password@//oracledb.mycompany.com/service" \
  iamseth/oracledb_exporter

You can refer to the official guide for more details or alternative deployment modes.

Custom queries

It is possible to define additional queries to expose custom metrics using any data in the database instance that is readable by the monitoring user (see the guide for more details about the syntax).

Custom Configuration file

The following is an example of exporting system metrics from the Dynamic Performance (V$) Views used by the Prometheus provider default queries for the Oracle Database optimization pack:

[[metric]]
context= "memory"
labels= [ "component" ]
metricsdesc= { size="Component memory extracted from v$memory_dynamic_components in Oracle." }
request = '''
SELECT component, current_size as "size"
FROM V$MEMORY_DYNAMIC_COMPONENTS
UNION
SELECT name, bytes as "size"
FROM V$SGAINFO
WHERE name in ('Free SGA Memory Available', 'Redo Buffers', 'Maximum SGA Size')
'''

[[metric]]
context = "activity"
metricsdesc = { value="Generic counter metric from v$sysstat view in Oracle." }
fieldtoappend = "name"
request = '''
SELECT name, value
FROM V$SYSSTAT WHERE name IN (
  'execute count',
  'user commits', 'user rollbacks',
  'db block gets from cache', 'consistent gets from cache', 'physical reads cache', /* CACHE */
  'redo log space requests'
 )
 '''

[[metric]]
context = "system_event"
labels = [ "event", "wait_class" ]
request = '''
SELECT
  event, wait_class,
  total_waits, time_waited
FROM V$SYSTEM_EVENT
'''
[metric.metricsdesc]
  total_waits= "Total number of waits for the event as per V$SYSTEM_EVENT in Oracle."
  time_waited= "Total time waited for the event (in hundredths of seconds) as per V$SYSTEM_EVENT in Oracle."

Prometheus provider

The Prometheus provider collects metrics from a Prometheus instance and makes them available to Akamas. Refer to the Prometheus provider metrics mapping to see how component-type metrics are extracted by this provider.

This provider includes support for several technologies (Prometheus exporters). In any case, custom queries can be defined to gather the desired metrics.

Prerequisites

This section provides the minimum requirements that you should match before using the Prometheus provider.

Supported Prometheus versions

Akamas supports Prometheus starting from version 2.26.

Using the prometheus-operator also requires version 0.47 or greater; this version is bundled with the kube-prometheus-stack since version 15.

Connectivity between the Akamas server and the Prometheus server is also required. By default, Prometheus runs on port 9090.

Supported Prometheus exporters

The Prometheus provider supports the following exporters:

  • Node exporter (Linux system metrics)

  • JMX exporter (Java metrics)

  • cAdvisor (Docker container metrics)

  • CloudWatch exporter (AWS resources metrics)

  • JMeter (Web application metrics)

The Prometheus provider includes queries for most of the monitoring use cases these exporters cover. If you need to specify custom queries or make use of exporters not currently supported, you can specify them as described in creating Prometheus telemetry instances.

Supported Akamas component types

  • Kubernetes (Pod, Container, Workload, Namespace)

  • Web Application

  • Java (java-ibm-j9vm-6, java-ibm-j9vm-8, java-eclipse-openj9-11, java-openjdk-8, java-openjdk-11)

  • Linux (Ubuntu-16.04, Rhel-7.6)

Component configuration

Akamas reasons in terms of a system to be optimized and in terms of parameters and metrics of components of that system. To understand which metrics collected from Prometheus should be mapped to a component, the Prometheus provider looks up some properties in the components of a system, grouped under the prometheus property. These properties depend on the exporter and the component type.

Nested under this property you can also include any additional field your use case may require to filter the imported metrics further. These fields will be appended in queries to the list of label matches in the form field_name=~'field_value', and can specify either exact values or patterns.

It is important that you add the instance and, optionally, the job properties to the components of a system so that the Prometheus provider can gather metrics from them:

# Specification for a component, whose metrics should be collected by the Prometheus Provider
name: jvm1  # name of the component
description: jvm1 for payment services  # description of the component
properties:
  prometheus:
    instance: service0001  # instance of the component: where the component is located relative to Prometheus
    job: jmx               # job of the component: which prom exporter is gathering metrics from the component

Prometheus configuration

The Prometheus provider does not usually require a specific configuration of the Prometheus instance it uses.

When gathering metrics for hosts it's usually convenient to set the value of the instance label so that it matches the value of the instance property in a component; in this way, the Prometheus provider knows which system component each data point refers to.

Notice: you should configure your Prometheus instances so that the Prometheus provider can leverage the instance property of components, as described above.

Here’s an example configuration for Prometheus that sets the instance label:

# Custom global config
global:
  scrape_interval:     5s   # Set the scrape interval to every 5 seconds. The default is every 1 minute.
  evaluation_interval: 5s   # Evaluate rules every 5 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# A scrape configuration containing exactly one endpoint to scrape:
scrape_configs:
# Node Exporter
- job_name: 'node'
  static_configs:
  - targets: ["localhost:9100"]
  relabel_configs:
  - source_labels: ["__address__"]
    regex: "(.*):.*"
    target_label: instance
    replacement: value_of_instance_property_in_the_component_the_data_points_should_refer_to

CloudWatch Exporter

This page describes how to set up a CloudWatch exporter in order to gather AWS metrics through the Prometheus provider. This is especially useful to monitor system metrics when you don’t have direct SSH access to AWS resources like EC2 instances, or if you want to gather AWS-specific metrics not available in the guest OS.

AWS policies

In order to fetch metrics from CloudWatch, the exporter requires an IAM user or role with the following privileges:

  • cloudwatch:GetMetricData

  • cloudwatch:GetMetricStatistics

  • cloudwatch:ListMetrics

  • tag:GetResources

You can assign AWS-managed policies CloudWatchReadOnlyAccess and ResourceGroupsandTagEditorReadOnlyAccess to the desired user to enable these permissions.
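For example, a hedged sketch using the AWS CLI (the user name cloudwatch-exporter is illustrative):

aws iam attach-user-policy --user-name cloudwatch-exporter \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchReadOnlyAccess
aws iam attach-user-policy --user-name cloudwatch-exporter \
  --policy-arn arn:aws:iam::aws:policy/ResourceGroupsandTagEditorReadOnlyAccess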

Exporter configuration

The CloudWatch exporter repository is available on the official project page. It requires a minimal configuration to fetch metrics from the desired AWS instances. Below is a short list of the parameters needed for a minimal configuration:

  • region: AWS region of the monitored resource

  • metrics: a list of objects containing filters for the exported metrics

    • aws_namespace: the namespace of the monitored resource

    • aws_metric_name: the name of the AWS metric to fetch

    • aws_dimensions: the dimension to expose as labels

    • aws_dimension_select: the dimension to filter over

    • aws_statistics: the list of metric statistics to expose

    • aws_tag_select: optional tags to filter on

      • tag_selections: map containing the list of values to select for each tag

      • resource_type_selection: resource type to fetch the tags from (see Resource Types)

      • resource_id_dimension: dimension to use for the resource id (see Resource Types)

For a complete list of possible values for namespaces, metrics, and dimensions please refer to the official AWS CloudWatch User Guide.

The suggested deployment mode for the exporter is through a Docker image. The following snippet provides a command line example to run the container (remember to provide your AWS credentials if needed and the path of the configuration file):

docker run -d --name cloudwatch_exporter \
  -p 9106:9106 \
  -v $(pwd)/cloudwatch-exporter.yaml:/config/config.yml \
  -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
  prom/cloudwatch-exporter

You can refer to the official guide for more details or alternative deployment modes.

Notice: AWS bills CloudWatch usage in batches of 1 million requests, where every metric counts as a single request. To avoid unnecessary expenses configure only the metrics you need.

Prometheus configuration

In order to scrape the newly created exporter, add a new job to the Prometheus configuration file. You will also need to define some relabeling rules in order to add the instance label required by Akamas to properly filter the incoming metrics. In the example below the instance label is copied from the instance’s Name tag:

scrape_configs:
  - job_name: cloudwatch_exporter
    scrape_interval: 60s
    scrape_timeout: 30s
    static_configs:
      - targets: [cloudwatch_exporter:9106]
    metric_relabel_configs:
      - source_labels: [tag_Name]
        regex: '(.+)'
        target_label: instance

Notice: AWS bills CloudWatch usage in batches of 1 million requests, where every metric counts as a single request. To avoid unnecessary expenses configure an appropriate scraping interval.

Additional workflow task

Once you have configured the exporter in the Prometheus configuration you can start fetching metrics using the Prometheus provider. The following sections describe some scripts you can add as tasks in your workflow.

Wait for metrics

CloudWatch may require some minutes to aggregate the stats according to the configured granularity, causing the telemetry provider to fail while trying to fetch data points that are not available yet. To avoid such issues you can add at the end of your workflow a task using an Executor operator to wait for the CloudWatch metrics to be ready. The following script is an example implementation:

METRIC=aws_rds_cpuutilization_sum   # metric to check for
DELAY_SEC=15
RETRIES=60

NOW=`date +'%FT%T.%3NZ'`

for i in `seq $RETRIES`; do
  sleep $DELAY_SEC
  curl -sS "http://prometheus_host/api/v1/query?query=${METRIC}&time=${NOW}" | jq -ce '.data.result[]' && exit 0
done

exit 255

Start/stop the exporter as needed

Since Amazon bills your CloudWatch queries, it is wise to run the exporter only when needed. The following script allows you to manage the exporter from the workflow by adding the following tasks:

  • start the container right before the beginning of the load test (command: bash script.sh start)

  • stop the container after the metrics publication (command: bash script.sh stop)

#!/bin/bash

set -e

CMD=$1
CONT_NAME=cloudwatch_exporter

stop_cont() {
  [ -z `docker ps -aq -f "name=${CONT_NAME}"` ] || (echo Removing ${CONT_NAME} && docker rm -f ${CONT_NAME})
}

case $CMD in
  stop|remove)
    stop_cont
    ;;

  start)
    stop_cont

    AWS_ACCESS_KEY_ID=`awk 'BEGIN { FS = "=" } /aws_access_key_id/ {print $2 }' ~/.aws/credentials | tr -d '[:space:]'`
    AWS_SECRET_ACCESS_KEY=`awk 'BEGIN { FS = "=" } /aws_secret_access_key/ {print $2 }' ~/.aws/credentials | tr -d '[:space:]'`

    echo Starting container $CONT_NAME
    docker run -d --name $CONT_NAME \
      -p 9106:9106 \
      -v ~/oracle-database/utils/cloudwatch-exporter.yaml:/config/config.yml \
      -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
      prom/cloudwatch-exporter
    ;;

  *)
    echo Unrecognized option $CMD
    exit 255
    ;;
esac
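The wait script can then be attached to the workflow as an Executor task; a minimal sketch (the script path and component name are illustrative):

- name: Wait for CloudWatch metrics
  operator: Executor
  arguments:
    command: "bash /home/ubuntu/scripts/wait_cloudwatch.sh"
    component: mycomponent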

Custom Configuration file

The example below is the Akamas-supported configuration, fetching metrics of EC2 instances named server1 and server2:

region: us-east-2
metrics:
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_statistics: [Average]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: NetworkIn
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: NetworkOut
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: NetworkPacketsIn
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: NetworkPacketsOut
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: CPUCreditUsage
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: CPUCreditBalance
    aws_statistics: [Average]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSReadOps
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSWriteOps
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSReadBytes
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSWriteBytes
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSIOBalance%
    aws_statistics: [Average]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSByteBalance%
    aws_statistics: [Average]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

Create Spark History Server telemetry instances

Create a telemetry instance

To create an instance of the Spark History Server provider, build a YAML file (instance.yml in this example) with the definition of the instance:

provider: SparkHistoryServer
config:
  address: spark_master_node
  port: 18080
  importLevel: stage

Then you can create the instance for the system spark-system using the Akamas CLI:

akamas create telemetry-instance instance.yml spark-system

Configuration options

When you create an instance of the Spark History Server provider, you should specify some configuration information to allow the provider to correctly extract and process metrics from the Spark History server.

You can specify configuration information within the config part of the YAML of the instance definition.

Required properties

  • address - hostname of the Spark History Server instance

Telemetry instance reference

The following YAML file describes the definition of a telemetry instance:

provider: SparkHistoryServer  # This is an instance of the Spark History Server provider
config:
  address: spark_master_node  # The address of Spark History Server
  port: 18080                 # The port of Spark History Server
  importLevel: job            # The granularity of the imported metrics

The following reports the reference for the config section within the definition of the Spark History Server provider instance:

address
  Type: URL
  Description: Spark History Server address
  Required: Yes

importLevel
  Type: String
  Description: granularity of the imported metrics
  Default value: job
  Restrictions: allowed values are job, stage, task
  Required: No

port
  Type: Integer
  Description: Spark History Server listening port
  Default value: 18080
  Required: No

Use cases

This section reports common use cases addressed by this provider.

Collect stage metrics of a Spark Application

Check the Spark Application page for a list of all Spark application metrics available in Akamas.

This example shows how to configure a Spark History Server provider instance in order to collect performance metrics about a Spark application submitted to the cluster using the Spark SSH Submit operator.

As a first step, you need to create a YAML file (spark_instance.yml) containing the configuration the provider needs to connect to the Spark History Server, plus the filter on the desired level of granularity for the imported metrics:

provider: SparkHistoryServer
config:
  address: spark_master_node
  port: 18080
  importLevel: stage

and then create the telemetry instance using the Akamas CLI:

akamas create telemetry-instance spark_instance.yml

Finally, you will need to define for your study a workflow that includes the submission of the Spark application to the cluster, in this case using the Spark SSH Submit operator:

name: spark_workflow
tasks:
  - name: Run Spark application
    operator: SSHSparkSubmit
    arguments:
      component: spark

Best practices

This section reports common best practices you can adopt to ease the use of this telemetry provider.

  • configure metrics granularity: in order to reduce the collection time, configure the importLevel to import metrics with a granularity no finer than the study requires.

  • wait for metrics publication: make sure in the workflow there is a few-minute interval between the end of the Spark application and the execution of the Spark telemetry instance, since the Spark History Server may take some time to complete the publication of the metrics.
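For instance, a hedged sketch of such a wait step, assuming the Sleep operator accepts a seconds argument:

- name: Wait for Spark History Server to publish metrics
  operator: Sleep
  arguments:
    seconds: 300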

NeoLoadWeb provider

The NeoLoad Web provider collects metrics from a NeoLoad Web instance and makes them available to Akamas. You can check the NeoLoadWeb provider metrics mapping to see how component-type metrics are extracted by this provider.

Prerequisites

This section provides the minimum requirements that you should match before using the NeoLoad Web telemetry provider.

Supported versions

  • NeoLoad Web SaaS or managed version 7.1 or later.

Network requirements

  • The NeoLoad Web API must be reachable at the provided address and port (by default https://neoload-api.saas.neotys.com).

Permissions

  • NeoLoad Web API access token.

Akamas supported version

  • Versions < 2.0.0 are compatible with Akamas until version 1.8.0

  • Versions >= 2.0.0 are compatible with Akamas from version 1.9.0

Supported component types

  • Web Application

Workflow requirements

This section lists the workflow operators this provider depends on:

  • NeoLoadWeb operator

Components configuration

Akamas reasons in terms of a system to be optimized and in terms of parameters and metrics of components of that system. To understand which metrics collected from NeoloadWeb should refer to which component, the NeoloadWeb provider looks up the property neoloadweb in the components of a system:

name: MyComponent
properties:
 neoloadweb: "true" # The presence of this property helps akamas discriminate metrics imported using neoloadweb from the ones imported by other providers for the same component

The following telemetry providers are available:

  • CSV provider: collects metrics from CSV files

  • Dynatrace: collects metrics from Dynatrace

  • Prometheus: collects metrics from Prometheus

  • Spark History Server: collects metrics from Spark History Server

  • NeoloadWeb: collects metrics from Tricentis Neoload Web

  • Load Runner Professional: collects metrics from MicroFocus Load Runner Professional

  • Load Runner Enterprise: collects metrics from MicroFocus Load Runner Enterprise

  • AWS: collects price metrics for Amazon Elastic Compute Cloud (EC2) from Amazon’s own APIs


Install NeoLoadWeb telemetry provider

To install the NeoLoad Web provider, create a YAML file (called provider.yml in this example) with the definition of the provider:

# NeoLoad Web Telemetry Provider
name: NeoLoadWeb
description: Telemetry Provider that enables the import of metrics from NeoLoad Web instances
dockerImage: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/telemetry-providers/neoload-web-provider:2.0.1

Then you can install the provider using the Akamas CLI:

akamas install telemetry-provider provider.yml

The installed provider is shared with all users of your Akamas installation and can monitor many different systems, by configuring appropriate telemetry provider instances.
