Akamas software licensing model is subscription-based (typically on a yearly basis). For more information on Akamas' cost model and software licensing costs, please contact info@akamas.io.
Akamas software licenses include Maintenance & Support Services which also include access to Customer Support Services.
Akamas also provides optional professional services for deployment, training, and integration activities. For more information about Akamas professional services, please contact info@akamas.io.
Akamas is an on-premise product running on a dedicated machine within the customer environment:
on a virtual or physical machine in your data center
on a virtual machine running in the cloud, managed by any cloud provider (e.g. AWS EC2)
on your own laptop
Akamas also provides a Free Trial option which can be requested here.
This guide introduces Akamas and covers various fundamental topics such as licensing and deployment models, security topics, and maintenance & support services.
It is recommended to read this guide before moving to other guides on how to install, integrate, and use Akamas. The section of the Reference guide can help in reviewing Akamas key concepts.
This page is intended as your entry point to the Akamas documentation.
provides a very first introduction to AI-powered optimization
covers Akamas licensing, deployment, security topics
describes Akamas maintenance and support services.
This guide provides some preliminary knowledge required to purchase, implement, and use Akamas.
User personas: All roles
describes the Akamas architecture
provides the hardware, software and network prerequisites
describes the steps to install an Akamas Server and CLI
This guide provides the knowledge required to install and manage an Akamas installation.
User personas: Akamas Admin
describes the Akamas optimization process and methodology
provides guidelines for optimizing some specific technologies
provides examples of optimization studies
This guide provides the methodology to define an optimization process and knowledge to leverage Akamas
User personas: Analyst / Practitioner teams
describes how to integrate Akamas with the telemetry providers and configuration management tools
describes how to integrate Akamas with load testing tools
describes how to integrate Akamas with CI/CD tools
This guide provides the knowledge required to integrate Akamas with the ecosystem
User personas: Akamas Admin, DevOps team
provides a glossary of Akamas key concepts with references to construct templates and commands
provides a reference to Akamas construct templates
provides a reference to Akamas command-line commands
describes Akamas optimization packs and telemetry providers
User personas: Akamas Admin, DevOps team, Analyst / Practitioner teams
describes how to set up a test environment for experimenting with Akamas
describes how to apply the Akamas approach to the optimization of some real-world cases
provides examples of Akamas templates and commands for the real-world cases
User personas: Analyst / Practitioner teams
A quick introduction to Akamas
Akamas is the AI-powered optimization platform designed to maximize service quality and cost efficiency without compromising on application performance. Akamas supports optimization both in production environments under live, dynamic workloads and in test/pre-production environments against any what-if scenario and workload.
Thanks to Akamas, performance engineers, DevOps, CloudOps, FinOps and SRE teams can keep complex applications, such as Kubernetes microservices applications, optimized to avoid any unnecessary cost and any performance risks.
The Akamas optimization platform leverages patented AI techniques that can autonomously identify optimal full-stack configurations driven by any custom-defined goals and constraints (SLOs), without any human intervention, any agents, and any code or byte-code changes.
Akamas optimal configurations can be applied either i) under human approval (human-in-the-loop mode) or ii) automatically, as a continuous optimization step in a CI/CD pipeline (in-the-pipe) or iii) autonomously by Akamas (autopilot).
Akamas can optimize any system with respect to any set of parameters chosen from the application, middleware, database, cloud, and any other underlying layers.
Akamas provides dozens of out-of-the-box Optimization Packs available for key technologies such as JVM, Go, Kubernetes, Docker, Oracle, MongoDB, Elasticsearch, PostgreSQL, Spark, AWS EC2 and Lambda, and more. Each Optimization Pack provides parameters, relationships, and metrics to accelerate the optimization process setup and support company-wide best practices. Custom Optimization Packs can be easily created without any coding.
The following figure is illustrative of Akamas coverage for both managed technologies and integrated components of the ecosystem.
Akamas can integrate with any ecosystem thanks to out-of-the-box and custom integrations with the following components:
telemetry & monitoring tools and other sources of KPIs and cost data, such as Dynatrace, Prometheus, CloudWatch, and CSV files
configuration management tools, repositories and interfaces to apply configurations, such as Ansible, OpenShift, and Git
value stream delivery tools to support a continuous optimization process, such as Jenkins, Dynatrace Cloud Automation, and GitLab
load testing tools to generate simulated workloads in test/pre-production, such as LoadRunner, NeoLoad, and JMeter
Akamas has been designed around Infrastructure-as-Code (IaC) and DevOps principles. Thanks to a comprehensive set of APIs and integration mechanisms, it is possible to extend the Akamas optimization platform to manage any system and integrate with any ecosystem.
Akamas optimization platform supports a variety of use cases, including:
Improve Service Quality: optimize application performance (e.g. maximize throughput, minimize response time and job execution time) and stability (lower fluctuations and peaks);
Increase Business Agility: identify resource bottlenecks in early stages of the delivery cycle, avoid delays due to manual remediations - release higher quality services and reduce production incidents;
Increase Service Resilience: improve service resilience under higher workloads (e.g. expected business growth) or failure scenarios identified by chaos engineering practices - improve SRE practice;
Reduce IT Cost / Cloud Bill: reduce on-premise infrastructure cost and cloud bills due to resource over-provisioning - improve cost efficiency of Kubernetes microservices applications;
Optimize Cloud Migration: safely migrate on-premise applications to cloud environments for optimal cost efficiency and evaluate options to migrate to managed services (e.g. AWS Fargate);
Improve Operational Efficiency: save engineering time spent on manual tuning tasks and enable Performance Engineering teams to do more in less time (and with less external consulting).
Akamas takes security seriously and provides enterprise-grade software where customer data is kept safe at all times. This page describes some of the most important security aspects of Akamas software and information related to process and tools used by the Akamas company (Akamas S.p.A) to develop its software products.
Akamas manages the following types of information:
System configuration and performance metrics: technical data related to the systems being optimized. Examples of such data include the number of CPUs available in a virtual machine or the memory usage of a Java application server;
User accounts: accounts assigned to users to securely access the Akamas platform. For each user account, Akamas currently requires an account name and a password. Akamas does not collect any other personal identifying information;
Service Credentials: credentials used by Akamas to automate manual tasks and to integrate with external tools. In particular, Akamas leverages the following types of interaction:
Integration with monitoring and orchestration tools, e.g. to collect IT performance metrics and system configuration. As a best practice, Akamas recommends using dedicated service accounts with minimal read-only privileges.
Integration with the target systems to apply changes to configuration parameters. As a best practice, Akamas recommends using dedicated service accounts with minimal privileges to read/write identified parameters.
Akamas is a fully GDPR compliant product.
Akamas is a company owned by the Moviri Group. The Moviri Group and all its companies are fully compliant with GDPR. Moviri Group Data Privacy Policy and Data Breach Incident Response Plan which apply to all the owned companies can be requested from Akamas Customer Support.
Akamas is an on-premises product and does not transmit any data outside the customer network. Considering the kind of data that is managed within Akamas (see section "Which information is managed by Akamas"), specific security certifications like PCI or HIPAA are not required as the platform does not manage payment or health-related information.
Akamas takes the need for security seriously and understands the importance of encrypting data to keep it safe at-rest and in-flight.
All the communications between Akamas UI and CLI and the back-end services are encrypted via HTTPS. The customer can configure Akamas to use customer-provided SSL certificates in all communications.
Communications between Akamas services and other integrated tools within the customer network rely on the security configuration requirements of the integrated tool (e.g. HTTPS calls to interact with REST services).
Akamas is an on-premises product and runs on dedicated virtual machines within the customer environment. At-Rest Encryption can be achieved following customer policies and best practices, for example leveraging operating system-level techniques.
Akamas also provides an application-level encryption layer aimed at extending the scope of at-Rest encryption. With this increased level of security, sensitive data managed by Akamas (e.g. passwords, tokens, or keys required to interact with external systems) are safely stored in Akamas databases using industry-standard AES 256-bit encryption.
In case of Akamas hosted on an AWS machine you may optionally create an EC2 instance with an encrypted EBS volume before installing OS and Akamas, in order to achieve a higher level of security.
Passwords are securely stored using a one-way hash algorithm.
Akamas comes with a default password policy, which requires that a password:
has a minimum length of 12 characters.
contains at least 1 uppercase and 1 lowercase character.
contains at least 1 special character.
is different from the username.
is different from the last password set.
Customers can modify this policy by providing a custom one that matches their internal security policies.
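The default password policy listed above can be expressed as a small validation routine. The following is a minimal sketch in Python and is not Akamas code: the function name check_default_policy is hypothetical, "special character" is assumed to mean any character that is neither alphanumeric nor whitespace, and the comparisons with the username and the last password are assumed to be exact and case-sensitive.

```python
from typing import List, Optional


def check_default_policy(password: str, username: str,
                         last_password: Optional[str] = None) -> List[str]:
    """Return the violated rules; an empty list means the password passes."""
    violations = []
    if len(password) < 12:
        violations.append("minimum length of 12 characters")
    if not any(c.isupper() for c in password):
        violations.append("at least 1 uppercase character")
    if not any(c.islower() for c in password):
        violations.append("at least 1 lowercase character")
    # Assumption: "special" means neither a letter, a digit, nor whitespace.
    if not any(not c.isalnum() and not c.isspace() for c in password):
        violations.append("at least 1 special character")
    # Assumption: exact, case-sensitive comparison.
    if password == username:
        violations.append("different from the username")
    if last_password is not None and password == last_password:
        violations.append("different from the last password set")
    return violations
```

Returning the list of violated rules, rather than a single boolean, makes it possible to tell a user exactly which requirement is not met.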
Akamas enforces no password rotation mechanism.
When running on a Linux installation with KDE's KWallet enabled or with GNOME's Keyring enabled, the credentials will be stored in the default wallet/keyring.
When running on Windows, the credentials will be stored in Windows Credential Locker.
When running on macOS, the credentials will be stored in Keychain.
When running on a Linux headless installation, the credentials will be stored in CLEAR TEXT in a file in the current Akamas configuration folder.
Akamas provides fine granularity control over resources managed within the platform. In particular, Akamas features two kinds of resources:
Workspace resources: entities bound to one of the isolated virtual environments (named workspaces) that can only be accessed in reading or writing mode by users to whom the administrators explicitly granted the required privileges. Such resources typically include sensitive data (e.g. passwords, API tokens). Examples of such resources include the system to be optimized, the set of configurations, optimization studies, etc.
Shared resources: entities that can be installed and updated by administrators and are available to all Akamas users. Such resources only contain technology-related information (e.g. the set of performance metrics for a Java application server). Examples of such resources include Optimization Packs, which are libraries of technology components that Akamas can optimize, such as a Java application server.
Akamas logs traffic from UI and APIs. Application level logs include user access via APIs and UI and any action taken by Akamas on integrated systems.
Akamas logs are retained on the dedicated virtual machine within the customer environment, by default, for 7 days. The retention period can be configured according to customer policies. Logs can be accessed either via UI or via log dump within the retention period. Additionally, logs have a format that can be easily integrated with external systems like log engines and SIEM to support forensic analysis.
Akamas is developed according to security best practices and the code is scanned regularly (at least daily).
The Akamas development process leverages modern continuous integration approaches and the development pipeline includes SonarQube, a leading security scanning product that includes comprehensive support for established security standards including CWE, SANS, and OWASP. Code scanning is automatically triggered in case of a new build, a release, and every night.
Akamas features a modern microservices architecture and is delivered as a set of Docker containers whose images are hosted on a private Elastic Container Registry (ECR) repository on the AWS cloud. Akamas leverages the vulnerability scanning capabilities of AWS ECR to identify vulnerabilities within the product container images. AWS ECR uses the Common Vulnerabilities and Exposures (CVEs) database from the open-source Clair project.
If a vulnerability is detected, Akamas will assess the security risk in terms of the impact of the vulnerability and evaluate the steps (e.g. dependency updates) required to fix it within a timeline that depends on the outcome of the assessment.
After the assessment, the vulnerability can be fixed either by recommending the upgrade to a new product version or by delivering a patch or a hotfix for the current version.
This page is intended as a first introduction to Akamas Maintenance & Support (M&S) Services.
Please refer to the specific contract in place with your Company.
Akamas M&S Services include:
access to Software versions released as major and minor versions, service packs, patches, and hotfixes according to Support levels for software versions.
assistance from Akamas Customer Support for inquiries about the Akamas product and issues encountered while using Akamas products where there is a reasonable expectation that issues are caused by Akamas products, according to Support levels for Customer Support Services
Akamas M&S Services do not include any installation and upgrade services, creation of any custom optimization packs, telemetry providers, or workflow operators, or implementation of any custom features and integrations that are not provided out-of-the-box by the Akamas products.
Akamas Customer Support Services are delivered by Akamas support engineers, also called Support Agents, who will work remotely with Customer to provide a temporary remedy for the incident and, ultimately, a permanent resolution. Akamas Support Agents automatically escalate issues to the appropriate technical group within Akamas and notify Customers of any relevant progress. Akamas provides Customers with the ability to escalate issues when appropriate.
Please note that Customer Support services are not to be considered as alternatives to product documentation and training, or to professional and consulting services, so adequate knowledge of Akamas products is assumed when interacting with Akamas Customer Support. Thus, during the resolution of a reported issue, Support Agents may redirect Customer to training or professional services (which are not part of the scope of this service).
Akamas Customer Support Services provides different standard levels of support. Please verify the level of support specified in the contract in place with your Company.
The following table describes the different severity levels for Customer Support.
| Severity | Description | Impact |
| --- | --- | --- |
| S1 | Blocking: production Customer system is severely impacted. Notice: this severity level only applies to production environments | Catastrophic business impacts, e.g. complete loss of a core business process so that work cannot reasonably continue (e.g. all final users are unable to access the Customer application) |
| S2 | Critical: one major Akamas functionality is unavailable | Significant loss or degradation of the Akamas services (e.g. Akamas is down or Akamas is not generating recommendations) |
| S3 | Severe: limitation in accessing one major Akamas functionality | Moderate business impact and moderate loss or degradation of services, but work can reasonably continue in an impaired manner (e.g. only some specific functions are not working properly) |
| S4 | Informational: Any other request | Minimum business impact. Substantially functioning with minor or no impediments of services. |
The contract in place with the Customer specifies the level of support provided by Akamas Agents, according at least to the following items:
Maximum number of support seats: this is the maximum number of named users within the Customer organization who can request Akamas Customer Support.
Language(s): these are the languages that can be used for interacting with Akamas Support Agents - the default is English.
Channel(s): these are the different communication channels that can be used to interact with Akamas Agents - these may include one or more options among web ticketing, email, phone, and Slack channel.
Max Initial Response Time: this refers to the time interval between the time a request is opened by Customer with Customer Support and the time a Support Agent responds with a first notification (acknowledgment).
Severity: this is the level of severity associated with a reported issue, which initially corresponds to the severity level originally indicated by the Customer. Notice that the severity level may change, for example as new information becomes available or if Support Agents and Customer agree to re-evaluate it. Please notice that the severity level may be downgraded by Support Agents if Customer is not able to provide adequate resources or responses to enable Akamas to continue with its resolution efforts.
Initial Remedy: this refers to any operation aimed at addressing a reported issue by restoring a minimal level of operations, even if it may cause some performance degradation of the Customer service or operations. A workaround is to be considered a valid Initial Remedy.
Please note that Support Agents may refuse to serve a service request to Customer Support either in case Customer does not have a valid Maintenance & Support subscription or in case the above-mentioned conditions or other conditions stated in the contract in place are not met. In any case, the Customer is expected to provide all the information required by Support Agents in order to serve service requests to Customer Support.
Different levels of support are provided for software versions of Akamas products, starting from its general availability (GA) date, and depending on the release of following software versions.
Akamas adopts a three-place numbering scheme MA.MI.SP to designate released versions of its Software, where:
MA is the Major Version
MI is the Minor Version
SP is the Service Pack or Patch number
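The MA.MI.SP scheme described above maps naturally onto a three-field tuple. The following is a minimal sketch in Python, assuming each of the three places is a non-negative integer; the names AkamasVersion and parse_version are hypothetical and not part of any Akamas tool.

```python
import re
from typing import NamedTuple


class AkamasVersion(NamedTuple):
    major: int         # MA: Major Version
    minor: int         # MI: Minor Version
    service_pack: int  # SP: Service Pack or Patch number


# Assumption: each place is a plain non-negative integer.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)", re.ASCII)


def parse_version(text: str) -> AkamasVersion:
    """Parse a version string written as MA.MI.SP."""
    match = _VERSION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"expected MA.MI.SP, got {text!r}")
    return AkamasVersion(*(int(group) for group in match.groups()))
```

Because a named tuple compares field by field, parsed versions sort numerically rather than alphabetically, so 2.10.0 is correctly treated as newer than 2.9.5.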
The following table describes the three levels of support for a software version.
Full Support
Akamas provides full support for one previous (either major or minor) version in addition to the latest available GA version.
For Software version in Full Support level: Akamas Support Agents provide service packs, patches, hotfixes, or workarounds to make the Software operate in substantial conformity with its then-current operating documentation.
Limited Support
Following the Full Support period, Akamas provides Limited Support for an additional 12 months.
For Software versions in Limited Support level:
No new enhancements will be made to a version in "Limited Support";
Akamas Support Agents will direct Customers to existing fixes, patches, or workarounds applicable to the reported case, if any;
Akamas Support Agents will provide hot fixes for problems of high technical impact or business exposure for customers;
Based on Customer input, Akamas Support Agents will determine the degree of impact and exposure and the consequent activities;
Akamas Support Agents will direct Customers to upgrade to a more current version of the Software.
No Support
Following the Limited Support period, Akamas provides no support for any Software version.
For Software versions in No Support level: No new maintenance releases, enhancements, patches, or hot fixes will be made available. Akamas Support Agents will direct Customers to upgrade to a more current version of the Software.
At any time, Akamas reserves the right to "end of life" (EOL) a software product and to terminate any Maintenance & Support Services for such product, provided that Licensor has notified the Licensee at least 12 months prior to the above-mentioned termination.
During the period between the "end of life" notification and the actual termination of Maintenance & Support Services, the following applies:
No new enhancements will be introduced.
No enhancements will be made to support new or updated versions of the platform on which the product runs or with which it integrates.
New hotfixes for problems of high technical impact or business exposure for customers may still be developed. Based on customer input, Akamas Support Agents will determine the degree of impact and exposure and the consequent activities.
Reasonable efforts will be made to inform the Customer of any fixes, service packs, patches, or workarounds applicable to the reported case, if any.
Based on the Support levels for software versions, the following table describes the level of support of the Akamas versions after the version 3.1 GA date (November 11th, 2022).
| Version | Support level |
| --- | --- |
| 3.1 | Full Support. Notice: this will change once the following major version is released |
| 3.0 | Full Support. Notice: this will change once the following major version is released |
| 2.x | Limited Support until 12 months after 3.0 GA date, that is September 13th, 2023 |
| 1.x | No Support |
The following table provides a list of the supported operating systems and their versions.
On RHEL systems Akamas containers might need to be run in privileged mode depending on how Docker was installed on the system.
The following table provides a list of the required Software Packages (also referred to as Akamas dependencies) together with their versions.
The exact version of these prerequisites is listed in the following table:
To install and run Akamas it is recommended to create a dedicated user (usually "akamas"). The Akamas user is not required to be in the sudoers list but can be added to the docker (dockerroot) group so it can run docker and docker-compose commands.
Make sure that the Akamas user has the read, write, and execute permissions on /tmp. If your environment does not allow writing to the whole /tmp folder, please create a folder /tmp/build and assign read and write permission to the Akamas user on that folder.
Read more about how to set up.
| Operating System | Version |
| --- | --- |
| Ubuntu Linux | 18.04+ |
| CentOS | 7.6+ |
| RedHat Enterprise Linux | 7.6+ |
| Software Package | Notes |
| --- | --- |
| Docker | Akamas is deployed as a set of containerized services running on Docker. During its operation, Akamas launches different containers so access to the docker socket with enough permissions to run the container is required. |
| Docker Compose | Akamas containerized services are managed via Docker Compose. Docker compose is usually already shipped with Docker starting from version 23. |
| AWS CLI | Akamas container images are published in a private Amazon Elastic Container Registry (ECR) and are automatically downloaded during the online installation procedure. AWS CLI is required only during the installation phase if the server has internet access and can be skipped during an offline installation. |
| Software Package | Ubuntu | CentOS | RHEL |
| --- | --- | --- | --- |
| Docker | 19.03+ | 1.13+ | 1.13+ |
| Docker-compose | 2.0+ | 2.0+ | 2.0+ |
| AWS CLI | 2.0.0+ | 2.0.0+ | 2.0.0+ |
Before installing the Akamas Server please make sure to review all the following requirements:
This section lists all the connectivity settings required to operate and manage Akamas.
Internet access is required for the Akamas online installation and update procedures, as it allows retrieving the most up-to-date Akamas container images from the Akamas private Amazon Elastic Container Registry (ECR).
If internet access is not available for policies or security reasons, Akamas installation and updates can be executed offline.
Internet access from the Akamas server is not mandatory but it’s strongly recommended.
The following table provides a list of the ports on the Akamas server that have to be reachable by Akamas administrators and users to properly operate the system.
| Source | Destination | Port | Reason |
| --- | --- | --- | --- |
| Akamas admin | Akamas server | 22 | ssh |
| Akamas admin/user | Akamas server | 80, 443 | Akamas web UI access |
| Akamas admin/user | Akamas server | 8000, 8443 | Akamas API access |
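The reachability of the ports listed in the table above can be verified from an administrator or user workstation with a plain TCP connection test. The following is a minimal sketch in Python using only the standard library; the function name check_ports and the 2-second default timeout are assumptions made for illustration. A successful connection only shows that the port is open, not that the expected Akamas service is answering on it.

```python
import socket
from typing import Dict, Iterable

# Ports from the table above that must be reachable on the Akamas server.
AKAMAS_PORTS = (22, 80, 443, 8000, 8443)


def check_ports(host: str, ports: Iterable[int] = AKAMAS_PORTS,
                timeout: float = 2.0) -> Dict[int, bool]:
    """Map each port to True if a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            # The connection is closed as soon as the "with" block exits.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and name resolution errors.
            results[port] = False
    return results
```

For example, check_ports("akamas.example.com") returns one entry per port, where the host name is a placeholder for the address of your own Akamas server.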
In the specific case of AWS instance and customer instances sharing the same VPC/Subnet inside AWS, you should:
open all of the ports listed in the table above for all inbound sources (0.0.0.0/0) on your AWS security group
open outbound rules to all traffic and then attach this AWS security group (which must reside inside a private subnet) to the Akamas machine and all customer application AWS machines
This section describes how to get Akamas installed.
Please make sure to read the Getting Started section before installing Akamas.
Before installing Akamas, please follow these steps:
Please follow these steps to install the Akamas Server:
Please also read the section on how to troubleshoot the installation and how to manage the Akamas Server. Finally, read the relevant sections of Integrating Akamas to integrate Akamas into your specific ecosystem.
The following table provides the minimal hardware requirements for the virtual or physical machine used to install the Akamas server in your data center.
| Resource | Requirement |
| --- | --- |
| CPU | 4 cores @ 2 GHz |
| Memory | 16 GB |
| Disk Space | 70 GB |
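The minimal requirements above can be checked on a candidate machine before installation. The following is a minimal sketch in Python, not an Akamas tool: the function names are hypothetical, local_resources is assumed to run on Linux (it relies on os.sysconf), the disk requirement is assumed to refer to the total size of the filesystem, and the 2 GHz clock speed is not checked.

```python
import os
import shutil
from typing import Dict, Tuple

# Minimal requirements from the table above.
MIN_CPU_CORES = 4
MIN_MEMORY_GB = 16
MIN_DISK_GB = 70


def check_hardware(cpu_cores: int, memory_gb: float,
                   disk_gb: float) -> Dict[str, bool]:
    """Map each resource to True if it meets the minimal requirement."""
    return {
        "cpu": cpu_cores >= MIN_CPU_CORES,
        "memory": memory_gb >= MIN_MEMORY_GB,
        "disk": disk_gb >= MIN_DISK_GB,
    }


def local_resources(path: str = "/") -> Tuple[int, float, float]:
    """Return (cores, memory in GiB, disk in GiB) for this machine (Linux)."""
    cores = os.cpu_count() or 0
    # Physical memory = page size * number of physical pages.
    memory = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024 ** 3
    # Assumption: the requirement refers to the total size of the filesystem.
    disk = shutil.disk_usage(path).total / 1024 ** 3
    return cores, memory, disk
```

A typical use is check_hardware(*local_resources()). Note that a machine sold as having 16 GB of memory usually reports slightly less than 16 GiB to the operating system, so a failed memory check close to the threshold should be reviewed manually.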
To run Akamas on an AWS Instance you need to create a new virtual machine based on one of the supported operating systems. You can refer to AWS documentation for step-by-step instructions on creating the instance.
As shown in the following diagram, you can create the Akamas instance in the same AWS region, Virtual Private Cloud (VPC), and private subnet as your own already existing EC2 machines and by creating/configuring a new security group that allows communication between your application instances and Akamas instance. The inbound/outbound rules of this security group must be configured as explained in the Networking Requirements section of this page.
It is recommended to use an m6a.xlarge instance with at least 70GB of disks of type GP2 or GP3 and select the latest LTS version of Ubuntu.
Akamas can be run in any EC2 region.
You can find the latest version supported for your preferred region here.
Before installing Akamas on an AWS Instance please make sure to meet your AWS service limits (please refer to the official AWS documentation here).
This special case is also referred to as "Akamas-in-a-box" and is covered by the akamas-in-a-box installation guide.
Akamas is based on a microservices architecture where each service is deployed as a Docker container and communicates with other services via REST APIs on a dedicated machine (Akamas Server).
The following figure represents the high-level Akamas architecture.
Users can interact with Akamas via the Graphical User Interface (GUI), the Command-Line Interface (CLI), or the Application Programming Interface (API).
Both the GUI and CLI leverage HTTP/S APIs which pass through an API gateway (based on Kong), which also takes care of authenticating users by interacting with Akamas access management and routing requests to the different services.
The Akamas CLI can be invoked on either the Akamas Server itself or on a different machine (e.g. a laptop or another server) where the Akamas CLI has been installed.
Akamas data is securely stored in different databases:
time series data gathered from telemetry providers are stored in Elasticsearch;
application logs are also stored in Elasticsearch;
data related to systems, studies, workflows, and other user-provided data are stored in a Postgres database.
Notice: both Postgres and Elasticsearch and any other service included within Akamas are provided by Akamas as a Docker container image as part of the Akamas installation package.
The following Spring-based microservices represent Akamas core services:
System Service: holds information about metrics, parameters, and systems that are being optimized
Campaign Service: holds information about optimization studies, including configurations and experiments
Metrics Service: stores raw performance metrics (in Elasticsearch)
Analyzer Service: automates the analysis of load tests and provides related functionalities such as smart windowing
Telemetry Service: takes care of integrating different data sources by supporting multiple Telemetry Providers
Optimizer Service: combines different optimization engines to generate optimized configurations using ML techniques
Orchestrator Service: manages the execution of user-defined workflows to drive load tests
User Service: takes care of user management activities such as user creation or password changes
License Service: takes care of license management activities, optimization pack, and study export.
Akamas also provides advanced management features like logging, self-monitoring, licensing, user management, and more.
This page will guide you through the installation of software components that are required to get the Akamas Server installed on a machine. Please read the for a detailed list of these software components for each specific OS.
While some links to official documentation and installation resources are provided here, please make sure to refer to your internal system engineering department to ensure that your company deployment processes and best practices are correctly matched.
As a preliminary step before installing any dependency, it is strongly suggested to create a user named akamas on your machine hosting Akamas Server.
Follow the reference documentation to install docker on your system.
Docker installation guide:
Docker compose is already installed since Docker 23+. To install it on previous versions of Docker follow this installation guide:
AWS CLI v2:
To run docker with a non-root user, such as the akamas user, you should add it to the docker group. You can follow the guide at:
As a quick check to verify that all dependencies have been correctly installed, you can run the following commands
Docker:
For offline installations, you can check docker with the docker ps command.
Docker compose :
Docker versions older than 23 must use the docker-compose command instead of docker compose.
AWS CLI:
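A minimal sketch of such checks, assuming a POSIX shell. The check_cmd helper is a hypothetical name introduced here for illustration; the version commands listed in the comments are the standard ones shipped with each tool.

```shell
#!/bin/sh
# check_cmd is a hypothetical helper: it reports whether a command is on the PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Report every dependency without stopping at the first missing one.
for dep in docker aws; do
    check_cmd "$dep" || true
done

# Once the binaries are found, each tool can print its own version:
#   docker --version
#   docker compose version      # docker-compose --version on Docker older than 23
#   aws --version
```

Checking for the binaries first keeps the output readable on machines where one of the tools has not been installed yet.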
Akamas is deployed as a set of containerized services running on Docker and managed via Docker Compose. The latest version of the Akamas Docker Compose file and all the images required by Docker can be downloaded from the AWS ECR repository.
Two installation modes are available:
Online installation mode, in case the Akamas Server has access to the Internet - installation behind a proxy server is also supported.
Offline installation mode, in case the Akamas Server does not have access to the Internet.
Akamas is deployed as a set of containerized services running on Docker and managed via Docker Compose. In offline installation mode, the latest version of the Akamas Docker Compose file and all the images required by Docker cannot be downloaded from the AWS ECR repository.
Get in contact with Akamas Customer Services to get the latest versions of the Akamas artifacts to be uploaded to a location of your choice on the dedicated Akamas Server.
Akamas installation artifacts will include:
images.tar.gz: Akamas main images
docker-compose.yml: docker-compose file for Akamas
a binary file named akamas: this is the binary file of the Akamas CLI that will be used to verify the installation.
A preliminary step in offline installation mode is to import the shipped Docker images by running the following commands in the same directory where the tar files have been stored:
Notice that this import procedure could take quite some time!
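As a sketch of the import step, the script below first checks that the three artifacts listed above are present and only then loads the image archive. The missing_artifacts helper and the bundle_dir variable are hypothetical names; docker load is the standard Docker command for importing a (possibly gzip-compressed) image archive.

```shell
#!/bin/sh
# missing_artifacts is a hypothetical helper: it prints each expected
# artifact that is not present in the given directory.
missing_artifacts() {
    dir=$1
    for f in images.tar.gz docker-compose.yml akamas; do
        [ -e "$dir/$f" ] || echo "$f"
    done
}

# Directory holding the shipped artifacts (defaults to the current one).
bundle_dir=${1:-.}
missing=$(missing_artifacts "$bundle_dir")
if [ -n "$missing" ]; then
    echo "missing artifacts in $bundle_dir:" $missing
elif command -v docker >/dev/null 2>&1; then
    # Import the shipped images into the local Docker image store.
    docker load -i "$bundle_dir/images.tar.gz"
fi
```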
To configure Akamas, the following environment variables are required to be set:
AKAMAS_CUSTOMER: this is the customer name matching the one referenced in the Akamas license.
AKAMAS_BASE_URL: this is the endpoint of the Akamas APIs that the CLI will use to interact with the server, typically http://<akamas server dns address>:8000
Environment variables creation is performed by the snippet below:
It is recommended to save these exported variables in your ~/.bashrc file for convenience.
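As a sketch of the exports described above: the customer name and host name below are placeholders, not real values, and must be replaced with the name referenced in your license and the DNS address of your own Akamas Server.

```shell
#!/bin/sh
# Placeholder values: replace them with the customer name from your license
# and the DNS name of your own Akamas Server.
AKAMAS_CUSTOMER="example-customer"
AKAMAS_BASE_URL="http://akamas.example.com:8000"
export AKAMAS_CUSTOMER AKAMAS_BASE_URL

# Show what child processes (such as docker compose) will inherit.
env | grep '^AKAMAS_'
```

Exporting matters here: a variable that is only assigned, but not exported, is not visible to the commands started from the shell.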
To start Akamas you can now simply navigate into the akamas folder and run a docker-compose command as follows:
Notice that you may get the following error:
This is a documented docker bug (see this link) that can be solved by installing the "pass" package:
Ubuntu
RHEL
Akamas is deployed as a set of containerized services running on Docker and managed via Docker Compose. In online installation mode, the latest version of the Akamas Docker Compose file and all the images required by Docker can be downloaded from the AWS ECR repository.
In case the Akamas Server is behind a proxy server please also read how to setup Akamas behind a Proxy.
It is suggested to first create a directory akamas in the home directory of your user, and then run the following command to get the latest compose file:
To configure Akamas, you should set the following environment variables:
AKAMAS_CUSTOMER: this is the customer name matching the one referenced in the Akamas license.
AKAMAS_BASE_URL: this is the endpoint of the Akamas APIs that the CLI will use to interact with the server, typically http://<akamas server dns address>:8000
You can export the variables using the following snippet:
It is recommended to save these exported variables in your ~/.bashrc file for convenience.
In order to log in to AWS ECR and pull the most recent Akamas container images, you also need to set the AWS authentication variables to the appropriate values provided by Akamas Customer Support Services by running the following command:
At this point, you can start installing Akamas server by running the following AWS CLI commands:
By default, Akamas uses the following ports for its UI:
80 (HTTP)
443 (HTTPS)
Depending on the configuration of your environment, you may want to change the default settings: in order to do so, you’ll have to update the Akamas docker-compose file.
Inside the docker-compose.yml file, scroll down until you come across the akamas-ui service.
There you will find a specification as follows:
Update the yaml by remapping the UI ports to the desired ports of the host.
In case you were running Akamas with host networking, you are allowed to bind different ports in the container itself. In order to do so you can expand the docker-compose service by adding a couple of environment variables like this:
This section describes how to setup an Akamas Server behind a proxy server and to allow Docker to connect to the Akamas repository on AWS ECR.
First, create the /etc/systemd/system/docker.service.d directory if it does not already exist. Then create or update the /etc/systemd/system/docker.service.d/http-proxy.conf file with the variables listed below, taking care to replace <PROXY> with the address and port (and credentials if needed) of your target proxy server:
Once configured, flush the changes and restart Docker with the following commands:
For more details, refer to the official documentation page: Control Docker with systemd.
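The daemon-side configuration can be sketched as follows. This assumes that the variables to set are HTTP_PROXY and HTTPS_PROXY, as in the Docker documentation for systemd; the render_proxy_dropin helper and the proxy.example.com address are hypothetical.

```shell
#!/bin/sh
# render_proxy_dropin is a hypothetical helper: it prints a systemd drop-in
# that passes the proxy settings to the Docker daemon.
render_proxy_dropin() {
    proxy=$1
    printf '[Service]\n'
    printf 'Environment="HTTP_PROXY=%s"\n' "$proxy"
    printf 'Environment="HTTPS_PROXY=%s"\n' "$proxy"
}

# http://proxy.example.com:3128 is a placeholder for <PROXY>.
render_proxy_dropin "http://proxy.example.com:3128"

# To apply it on the Docker host (requires root):
#   sudo mkdir -p /etc/systemd/system/docker.service.d
#   render_proxy_dropin "<PROXY>" | sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf
#   sudo systemctl daemon-reload
#   sudo systemctl restart docker
```

Printing the file before installing it makes it easy to review the proxy address, including any embedded credentials, before it is written under /etc.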
To allow the Akamas services to connect to addresses outside your intranet, the Docker instance needs to be configured to forward the proxy configuration to the Akamas containers.
Update the ~/.docker/config.json file adding the following field to the JSON, taking care to replace <PROXY> with the address (and credentials if needed) of your target proxy server:
For more details, refer to the official documentation page: Configure Docker to use a proxy server.
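A sketch of the client-side field, assuming the "proxies" layout described in the Docker documentation. The render_docker_client_proxy helper and the proxy address are hypothetical; if ~/.docker/config.json already exists, the "proxies" field should be merged into it rather than replacing the whole file.

```shell
#!/bin/sh
# render_docker_client_proxy is a hypothetical helper: it prints a JSON
# document holding only the "proxies" field for the Docker client.
render_docker_client_proxy() {
    proxy=$1
    cat <<EOF
{
  "proxies": {
    "default": {
      "httpProxy": "$proxy",
      "httpsProxy": "$proxy"
    }
  }
}
EOF
}

# http://proxy.example.com:3128 is a placeholder for <PROXY>.
render_docker_client_proxy "http://proxy.example.com:3128"
```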
Set the following variables to configure your working environment, taking care to replace <PROXY> with the address (and credentials if needed) of your target proxy server:
Once configured, you can log into the ECR repository through the AWS CLI and start the Akamas services manually.
This section describes how to install an Akamas workstation
The Akamas CLI allows users to invoke commands against the Akamas dedicated machine (Akamas Server). The Akamas CLI can also be installed on a different system than the Akamas Server.
Linux and Windows operating systems are supported for installing Akamas CLI.
The Akamas CLI can be installed and configured in three simple steps:
You can also read the related section to change the ports the Akamas Server is listening on for CLI connections.
Akamas APIs and UI use plain HTTP when they are first installed. To enable the use of HTTPS you will need to:
Ask your security team to provide you with a valid certificate for your server. The certificate usually consists of two files with ".key" and ".pem" extension. You will need to provide the Akamas server DNS name.
Create a folder named "certs" in the same directory of Akamas docker-compose file;
Copy the ".key" and ".pem" files in the created "certs" folder and rename them to "akamas.key" and "akamas.pem" respectively. Make sure that the files belong to the same user and group you use to run Akamas.
Restart two Akamas services by running the following commands:
After the containers have restarted, you will be able to access the UI over HTTPS from your browser:
Now that your Akamas server is configured to use HTTPS you can update the Akamas CLI configuration in order to use the secure protocol.
If you have not yet installed the Akamas CLI, follow the CLI installation guide in order to install it. If you already have the CLI available, you can run the following command:
You will be prompted to enter some input; please fill it in as follows:
You can test the connection by running:
It should return ‘OK’, meaning that Akamas has been properly configured to work over HTTPS.
The CLI configuration contains the information required to communicate with the Akamas Server. It can be easily created and updated with a configuration wizard. This page describes the main options of the Akamas CLI and how to modify them.
The CLI, as well as the UI, interacts with the Akamas Server via APIs. The apiAddress configuration contains the information required in order to communicate with the server.
The Akamas Server provides two different listeners to interact with APIs:
an HTTP listener on port 8000
an HTTPS listener on port 8443
For improved security, it is recommended to configure CLI communications with the Akamas Server over HTTPS. Notice that you need to have a valid certificate installed on your Akamas server (at least a self-signed one) in order to enable HTTPS communication between CLI and the Akamas Server.
The CLI can be configured either directly via the CLI itself or via the YAML configuration file akamasconf.
Issue the following command to change the configuration of the Akamas CLI:
and then follow the wizard to provide the required CLI configuration:
enable HTTPS communications:
enable HTTP communications:
Please notice that Verify SSL must be set to True only if you are using a valid certificate. If you are using a self-signed one, please set it to False. This mimics the behavior of accepting an invalid HTTPS certificate in your favorite browser.
akamasconf file
Create a file named akamasconf in the following location:
Linux: ~/.akamas/akamasconf
Windows: C:\Users\<username>\.akamas (where C: is the drive where the OS is installed)
The file location can be customized by setting an $AKAMASCONF environment variable.
Here is an example akamasconf file provided as a sample:
The SSL certificate is only required if verifySsl is set to true. In this case the SSL certificate requires an external CA to be validated.
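As a hedged sketch of creating the file from a script: it assumes akamasconf is a flat YAML file and writes only the two keys named in this guide (apiAddress and verifySsl); a real file may need further keys. The write_akamasconf helper and the server address are hypothetical.

```shell
#!/bin/sh
# write_akamasconf is a hypothetical helper. It assumes akamasconf is a flat
# YAML file; only the two keys named in this guide are written.
write_akamasconf() {
    conf_path=$1
    mkdir -p "$(dirname "$conf_path")"
    printf 'apiAddress: %s\nverifySsl: %s\n' "$2" "$3" > "$conf_path"
}

# Demo on a scratch location so that no real configuration is overwritten.
# On Linux the default target would be "${AKAMASCONF:-$HOME/.akamas/akamasconf}".
demo_conf="$(mktemp -d)/akamasconf"
write_akamasconf "$demo_conf" "https://akamas.example.com:8443" "false"
cat "$demo_conf"
```

Setting verifySsl to false in the demo matches the self-signed certificate case described earlier; with a certificate issued by an external CA it would be set to true.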
The Akamas CLI can be accessed by simply running the akamas command.
You can verify that the CLI has been installed by running this command:
which should show an output similar to this one:
At any time, you can use the following command to see available commands and options.
For the full list of Akamas commands please refer to the section of the Akamas Reference guide.
The CLI is used to interact with an akamas server. To initialize the configuration of the Akamas CLI you can run the command:
and follow the wizard to provide the required information such as the server IP.
Here is a summary of the configuration wizard options.
This configuration can be changed at any time (see how to change the CLI configuration).
After this step, the Akamas CLI can be used to login to the Akamas server, by issuing the following command:
and providing the credentials as requested.
Logging into Akamas requires a valid license. If you have not installed your license yet, refer to the page Install an Akamas license.
To get Akamas CLI installed on Linux, run the following commands:
You can now run the Akamas CLI by running the akamas command.
In some installations, the /usr/local/bin folder is not present in the PATH environment variable. This prevents you from using akamas without specifying the complete file location. To fix this issue you can add an entry to the PATH system environment variable or move the executable to another folder in your PATH.
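The PATH fix can be sketched as below for a POSIX shell. The path_contains helper is a hypothetical name; the snippet appends /usr/local/bin only when it is not already one of the PATH entries.

```shell
#!/bin/sh
# path_contains is a hypothetical helper: it succeeds when the directory
# is already one of the colon-separated entries of PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Append the folder only when it is missing, to avoid duplicate entries.
if ! path_contains /usr/local/bin; then
    PATH="$PATH:/usr/local/bin"
    export PATH
fi
```

Adding the same lines to ~/.bashrc makes the change persistent across sessions.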
To enable auto-completion on Linux systems with a bash shell (requires bash 4.4+), run the following commands:
To install the Akamas CLI on Windows run the following command from the Powershell:
You can now run the Akamas CLI by running .\akamas in the same folder.
To invoke the akamas CLI from any folder, create an akamas folder (such as C:\Program Files\akamas), and move the akamas.exe file there. Then, add an entry to the PATH system environment variable with the value C:\Program Files\akamas. Now, you can invoke the CLI from any folder, by simply running the akamas command.
This section describes some of the most common issues found during the Akamas installation.
Notice: this distro features a known issue since the Docker default execution group is named dockerroot instead of docker. To make docker work, edit (or create) /etc/docker/daemon.json to include the following fragment:
After editing or creating the file, please restart Docker and then check the group permission of the Docker socket (/var/run/docker.sock), which should show dockerroot as a group:
Then, add the newly created akamas user to the dockerroot group so that it can run docker containers:
and check that the akamas user has been correctly added to the dockerroot group by running:
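A sketch of the whole procedure, assuming a systemd-based host. The privileged steps are shown as comments since they require root; the "group" key is the Docker daemon option that sets the group owning the socket. The user_in_group helper is a hypothetical name, and an existing daemon.json should be merged with the fragment rather than overwritten.

```shell
#!/bin/sh
# user_in_group is a hypothetical helper: it succeeds when the user
# belongs to the group, according to `id -nG`.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# The steps described above, as commented commands (they require root):
#   echo '{ "group": "dockerroot" }' | sudo tee /etc/docker/daemon.json
#   sudo systemctl restart docker
#   ls -l /var/run/docker.sock          # the group column should read dockerroot
#   sudo usermod -aG dockerroot akamas

# Final check on the group membership of the akamas user.
if user_in_group akamas dockerroot; then
    echo "akamas is in dockerroot"
else
    echo "akamas is not in dockerroot"
fi
```

Note that a new group membership only applies to sessions opened after the change, so the akamas user may need to log out and back in.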
In case of issues in logging in through AWS CLI, when executing the following command:
Please check that:
Environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION are correctly set
AWS CLI version is 2.0+
We recommend using the official AWS CLI installation guide for a smoother experience.
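The first check can be sketched as a small script. The missing_aws_vars helper is a hypothetical name; it relies on printenv, so it also catches variables that were assigned in the shell but never exported.

```shell
#!/bin/sh
# missing_aws_vars is a hypothetical helper: it prints the name of each
# required AWS variable that is unset, empty, or not exported.
missing_aws_vars() {
    for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION; do
        [ -n "$(printenv "$name")" ] || echo "$name"
    done
}

# Prints nothing when all three variables are available to child processes.
missing_aws_vars

# The AWS CLI major version can be read from the first line of:
#   aws --version        # expected to start with aws-cli/2
```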
Please notice that the very first time Akamas is started, up to 30 minutes might be required to initialize the environment.
In case the issue persists you can run the following command to identify which service is not able to start up correctly
In some systems, the Docker socket, usually located in /var/run/docker.sock, cannot be accessed within a container. Akamas signals this condition by reporting an Access Denied error in the license service logs.
To overcome this limitation, edit the docker-compose.yaml file, adding the line privileged: true to the following services:
License
Optimizer
Telemetry
Airflow
The following is a sample configuration where this change is applied to the license service:
Finally, you can issue the following command to apply these changes
When installing Akamas it’s mandatory to export the AKAMAS_CUSTOMER variable as illustrated in the installation guide. This variable must match the one provided by Akamas representatives when issuing a license. If the variable is not properly exported license installation will fail with an error message indicating that the name of the customer installation does not match the one provided in the license.
You can easily inspect which value of this variable has been used when starting Akamas by running the following command on the Akamas server:
If you find out that the value is not the one you expect you can change it by running the following command on the Akamas server:
Once Akamas is up and running you can re-install your license.
For any other issues please contact Akamas Customer Support Services.
Akamas allows dumping log entries from a specific service, workspace, workflow, study, trial, and experiment, for a specific timeframe and at different log levels.
Akamas logs can be dumped via the following CLI command:
This command provides many filters which can be retrieved with the following command:
which should return
For example, to get the list of the most recent Akamas errors:
which should return something similar to:
Logging into Akamas requires a valid Akamas license.
To install a license get in touch with Akamas Customer Service to receive:
the Akamas license file
your assigned values for the AKAMAS_CUSTOMER and AKAMAS_BASE_URL variables referenced in the license file
login credentials
Once you have this information, you can issue the following commands:
Akamas might collect anonymized usage information on running optimizations. Collection and tracking are enabled by default and can be manually disabled.
External tracking is managed through the following environment variables:
AKAMAS_TRACKER_URL: the target URL for all tracking info.
AKAMAS_TRACKING_OPT_OUT: when set to 1, disables anonymous data collection.
Tracking for a running instance can be disabled by executing this simple command in the folder where the Akamas compose file is located:
As usual with environment variables, it is strongly suggested to export the desired value to your ~/.bashrc file to ensure persistence.
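A minimal sketch of the opt-out and of its persistence. It assumes that the compose file reads AKAMAS_TRACKING_OPT_OUT from the environment, so the services would then need to be recreated from the compose folder (for example with docker compose up -d) to pick up the new value. The persist_line helper is a hypothetical name.

```shell
#!/bin/sh
# Opt out of anonymous data collection for processes started from this shell.
AKAMAS_TRACKING_OPT_OUT=1
export AKAMAS_TRACKING_OPT_OUT

# persist_line is a hypothetical helper: it appends a line to a file only
# when that exact line is not already there, so reruns do not duplicate it.
persist_line() {
    grep -qxF "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Demo on a scratch file; in practice the target would be ~/.bashrc.
rc_file="$(mktemp)"
persist_line "$rc_file" 'export AKAMAS_TRACKING_OPT_OUT=1'
persist_line "$rc_file" 'export AKAMAS_TRACKING_OPT_OUT=1'
cat "$rc_file"
```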
This section is a collection of different topics related to how to manage the Akamas Server.
This section covers some topics on how to manage the Akamas Server:
Run the following command to verify the correct startup and initialization of Akamas:
When all services have been started this command will return an "OK" message. Please notice that it might take a few minutes for Akamas to start all services.
To check that the UI is also working properly, please access the following URL:
You will see the Akamas login form:
Please notice that it is impossible to log into Akamas before a license has been installed. Read here how to Install an Akamas license.
Akamas has been designed and implemented to effectively support organizations in implementing their own approach to optimization, in particular, thanks to its Infrastructure as Code (IaC) design, modular and reusable constructs, and delegation-of-duty features to support multiple teams.
While an optimization process can also be a one-shot exercise aiming at optimizing a specific critical application to remediate performance issues or to address a cost reduction initiative, in general, optimization is conceived as a continuous and iterative process. This process can be seen as composed of multiple optimization campaigns (each typically involving a single application) that are executed in parallel (see the following figure).
In Akamas, an optimization campaign is structured into one or more studies, each representing an optimization initiative aimed at optimizing a target system with respect to defined goals and constraints.
At any given timeframe, for a specific application, there could be multiple studies being executed either in parallel or in sequence (see the following figure):
multiple live optimizations running for each of the critical application microservices; typically, a live optimization focuses on an application microservice supporting a specific business function with respect to specific optimization goals and constraints: for some microservices the optimization could aim at improving performance while trading off cost savings, while for others at keeping performance within the SLOs while reducing infrastructure or cloud costs;
multiple offline optimization studies, which may correspond to the different layers of the target system that are being optimized in several stages (typically starting with the backend layer, then the middleware, and finally the front-end layer), or to several application releases with different resource footprints (e.g. higher memory usage), or that involve technology changes in the application stack (e.g. moving from Oracle to MongoDB) or migration to a different cloud provider (or cloud managed service), or that are required to sustain a higher workload (e.g. due to a marketing campaign) or to ensure application resilience under failure scenarios (identified by chaos engineering).
The following figure intends to illustrate the variety of scenarios in a real optimization process:
For example (with reference to the previous figure):
the optimization campaign for the microservices-based application App-1 runs an offline optimization study for the App-1-1 microservice in Q1 and the App-1-2 microservice in Q2, before running live optimizations for both these microservices in parallel starting from Q3; notice that in Q4, possibly to anticipate a workload growth and assess the required infrastructure, an offline optimization for App-1-2 (possibly the most resource-demanding microservice) is also executed;
the optimization campaign for the standalone application App-2 runs several offline optimizations in sequence: in Q1 and Q2, first separately on the frontend and backend layers of App-2 (respectively App-2-FE and App-2-BE) and then in Q3 for the entire application; in Q4, in addition to the quarterly optimization for App-2 with respect to the goal Goal-2-1 that was used in the previous optimizations, also another offline optimization is executed with respect to a different goal Goal-2-2, which could either be a refinement of the previous goal (e.g. with tighter SLOs) or reflecting a completely different goal (e.g. a cost-reduction goal with respect to a performance improvement goal);
the optimization campaign for the microservices-based application App-3 runs first a live optimization starting at some point in Q2 (for example as the application is first released) for most-critical microservice App-3-1 and then in Q3 also for other microservice App-3-2, possibly as a refinement of the modeling of App-3 based on the observed optimization results.
This section describes how to use Akamas
This guide introduces the optimization process and methodology with Akamas and then provides a step-by-step description of how to prepare, run and analyze Akamas optimization studies:
and also provides some technology-specific guidelines and examples on:
The process of backing up an Akamas server can be divided into two parts: the system backup and the backup of the studies. Backups can be performed in any way you see fit: the artifacts are just regular files, so you can use any backup tool.
System services are hosted on the AWS ECR repo, so the only thing that fully defines a working Akamas application is the docker-compose.yml file. Performing a backup of the Akamas application is as simple as copying this single file to your backup location. You may schedule a script that performs this copy weekly or at any frequency you see fit.
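A sketch of such a backup script, assuming a POSIX shell. The backup_compose helper and the date-stamped file name are illustrative choices, not an Akamas convention.

```shell
#!/bin/sh
# backup_compose is a hypothetical helper: it copies the compose file into
# the backup folder under a date-stamped name and prints the copy's path.
backup_compose() {
    src=$1
    dest_dir=$2
    mkdir -p "$dest_dir"
    target="$dest_dir/docker-compose.$(date +%Y%m%d).yml"
    cp "$src" "$target" && echo "$target"
}

# Demo with scratch folders; a real run would point at your Akamas folder
# (usually $HOME/akamas) and at your own backup location.
work=$(mktemp -d)
echo "services: {}" > "$work/docker-compose.yml"
backup_compose "$work/docker-compose.yml" "$work/backup"
```

A script like this can be scheduled with cron at the frequency that suits your backup policy; keeping the date in the file name preserves earlier copies instead of overwriting them.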
You may list all existing Akamas studies via the Akamas CLI command:
Then you can export all existing studies one by one via the CLI command
where UUID is the UUID of a single study. This command exports into a single archive file (tar.gz). These archive files can be backed up to your favorite backup folder.
Akamas server recovery involves recovering the system backup, restarting the Akamas service then re-importing the studies.
To restore the system, you must recover the original docker-compose.yml file, then launch the command
from the folder where you placed this YAML file and then wait for the system to come up, by checking it with the command
All studies can be re-imported individually with the CLI command (referring to the correct pathname of the archive):
These studies can be either offline optimization studies, which are typically executed in test or pre-production environments, also to validate planned changes or what-if scenarios, or live optimization studies, which run directly in production environments.
More complex scenarios may result in the case of multiple teams (working jointly or separately) on the same or different applications, which in Akamas can be organized in different workspaces.
Akamas patches and upgrades need to be installed by following the specific instructions specified in the package provided. In case of new releases, it is recommended to read the related Release Notes. Under normal circumstances, this usually requires the user to update the docker-compose configuration, as described in the next section.
When using docker compose to install Akamas, there’s a folder usually named akamas in the user home folder that contains a docker-compose.yml file. This is a YAML text file that contains a list of docker services with the URLs/version pointing to the ECR repo hosting all docker images needed to launch Akamas.
Here’s an excerpt of such a docker-compose.yml file (this example contains 3 services only):
The relevant lines that usually have to be patched during an upgrade are the lines with key "image" like:
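For illustration, such lines and the replacement of their version can be sketched as follows. The registry host, the service names, and the set_image_tag helper are all hypothetical; the real image names are the ones found in your own docker-compose.yml, and the helper assumes the service name contains no characters that are special in a sed pattern.

```shell
#!/bin/sh
# set_image_tag is a hypothetical helper: it rewrites the tag of every
# "image:" line whose repository path ends with the given name.
set_image_tag() {
    file=$1
    name=$2
    tag=$3
    sed "s|^\([[:space:]]*image:[[:space:]]*.*/$name\):.*\$|\1:$tag|" "$file" > "$file.new" &&
        mv "$file.new" "$file"
}

# Demo on a scratch excerpt; registry.example.com is a placeholder.
compose_file=$(mktemp)
cat > "$compose_file" <<EOF
services:
  ui:
    image: registry.example.com/akamas/ui:1.7.0
  store:
    image: registry.example.com/akamas/store:2.3.0
EOF

set_image_tag "$compose_file" ui 1.8.0
grep 'image:' "$compose_file"
```

Only the line of the named service is touched, which makes it easy to review the change with a diff against the backup copy of the file before restarting.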
In order to update to a new version, replace the versions (1.7.0 or 2.3.0) after the colon with the new versions (ask your Akamas support for the correct service versions for a specific Akamas release). Then restart Akamas with the console commands described in the following steps. First, log in to the Akamas CLI with:
and type username and password as in the example below
Now make sure you have the following AWS variables with the proper value in your Linux user environment:
Then log in to AWS with the following command:
Then pull all new ECR images for the new service versions you just changed (this should be done from inside the same folder where the docker-compose.yml file resides, usually $HOME/akamas/) with the following command:
It should return an output like the following:
Finally, relaunch all services with:
(usage example below)
Wait for a few minutes and check the Akamas services are back up by running the command:
The expected output should be like the following (repeat the command after a minute or two if the last line is not "OK" as expected):
Akamas stores all its logs into an internal Elasticsearch instance: some of these logs are reported to the user in the GUI in order to ease the monitoring of workflow executions, while other logs are only accessible via CLI and are mostly used to provide more context and information to support requests.
Audit access can be performed by using the CLI in order to extract logs related to UI or API access. For instance, to extract audit logs from the last hour use the following commands:
UI Logs
API Logs
Notice: to visualize the system logs unrelated to the execution of workflows bound to workspaces, you need an account with administrative privileges.
To ease the integration with external logging systems, Akamas can be configured to store access logs into files. To enable this feature you should:
Create a logs folder next to the Akamas docker-compose.yml file
Edit the docker-compose.yml file by modifying the line FILE_LOG: "false" to FILE_LOG: "true"
If Akamas is already running issue the following command
otherwise, start Akamas first.
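The first two steps above can be sketched as a script. The enable_file_log helper is a hypothetical name, and the two-line compose excerpt used in the demo is made up for illustration.

```shell
#!/bin/sh
# enable_file_log is a hypothetical helper: it creates the logs folder next
# to the compose file and flips FILE_LOG from "false" to "true".
enable_file_log() {
    compose=$1
    mkdir -p "$(dirname "$compose")/logs"
    sed 's/FILE_LOG: "false"/FILE_LOG: "true"/' "$compose" > "$compose.new" &&
        mv "$compose.new" "$compose"
}

# Demo on a scratch copy; the excerpt below is made up for illustration.
demo_dir=$(mktemp -d)
printf 'environment:\n  FILE_LOG: "false"\n' > "$demo_dir/docker-compose.yml"
enable_file_log "$demo_dir/docker-compose.yml"
grep FILE_LOG "$demo_dir/docker-compose.yml"
```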
When the user interacts with the UI or the API, Akamas will report detailed access logs both in the internal database and in a file in the logs folder. To ease log rolling and management, every day Akamas will create a new file named according to the pattern access-%{+YYYY-MM-dd}.log.
To create a custom optimization pack, the following fixed directory structure and several YAML manifests need to be created.
The optimizationPack.yaml file is the manifest of the optimization pack to be created, which should always be named optimizationPack and have the following structure:
where:
name (string, required): the name of the optimization pack. It should not contain spaces.
description (string, required): a description to characterize the optimization pack.
weight (integer, required): a weight to be associated with the optimization pack; it must be greater than 0. This field is used for licensing purposes.
version (string, required): the version of the optimization pack. It should match the regexp \d.\d.\d
tags (array of string, optional, default: an empty array): a set of tags to make the optimization pack more easily searchable and discoverable.
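The constraints on name and version can be checked before building the pack, as in the sketch below. It reads the version regexp as three single digits separated by literal dots and anchored at both ends, which is an assumption about the intended meaning; both helper names are hypothetical.

```shell
#!/bin/sh
# valid_pack_version is a hypothetical helper: it checks a version string
# against the digit.digit.digit shape, with the dots taken literally.
valid_pack_version() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]\.[0-9]\.[0-9]$'
}

# valid_pack_name is a hypothetical helper: it rejects empty names and
# names containing a space.
valid_pack_name() {
    case "$1" in
        ''|*" "*) return 1 ;;
        *) return 0 ;;
    esac
}

# Try a few sample values.
for v in 1.0.0 1.2 v1.0.0; do
    if valid_pack_version "$v"; then
        echo "$v: accepted"
    else
        echo "$v: rejected"
    fi
done
```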
The component-types directory should contain the manifests of the component types to be included in the optimization pack. No particular naming constraint is enforced on those manifests.
See Component Types template for details on the structure of those manifests.
The metrics directory should contain the manifests of the groups of metrics to be included in the optimization pack. No particular naming constraint is enforced on those manifests.
See Metric template for details on the structure of those manifests.
The parameters directory should contain the manifests of the groups of parameters to be included in the optimization pack. No particular naming constraint is enforced on those manifests.
See Parameter template for details on the structure of those manifests.
The telemetry-providers directory should contain the manifests of the telemetry providers to be included in the optimization pack. No particular naming constraint is enforced on those manifests.
See Telemetry Provider template for details on the structure of those manifests.
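The directory layout described above can be scaffolded with a short script. This is a sketch under assumptions: the four folders are taken to sit next to optimizationPack.yaml, the manifest is taken to be a flat YAML file with the fields listed earlier, and the sample values and the scaffold_pack helper are hypothetical.

```shell
#!/bin/sh
# scaffold_pack is a hypothetical helper: it creates the folders described
# above plus a starter manifest. The flat key layout of the manifest and the
# sample values are assumptions based on the field list in this guide.
scaffold_pack() {
    root=$1
    for d in component-types metrics parameters telemetry-providers; do
        mkdir -p "$root/$d"
    done
    cat > "$root/optimizationPack.yaml" <<EOF
name: example-pack
description: An example optimization pack
weight: 1
version: 1.0.0
tags: []
EOF
}

# Demo in a scratch folder.
pack_dir="$(mktemp -d)/example-pack"
scaffold_pack "$pack_dir"
ls "$pack_dir"
```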
The following command needs to be executed in order to produce the final JSON descriptor:
After this, the optimization pack can be installed (and then used) as described on the Managing optimization packs page.
Preparing an optimization study requires several steps, as illustrated by the following figure:
and described in the following sections:
Notice that while these steps apply to both offline optimization studies and live optimization studies, some of these steps are different depending on which optimization is being prepared.
The very first preparatory step is to model the system representing an application or a service that needs to be optimized (also known as the optimization target).
Modeling a system translates into identifying the components representing the key technology elements to be included in the optimization. Each component is associated with a set of tunable parameters, i.e. configurable properties that impact the performance, efficiency, or reliability of the system, and with a set of metrics, i.e. measurable properties that are used to evaluate the performance, efficiency, or reliability of the system. Typically, key system components are identified by considering which elements and their parameters need to be tuned.
The following figure shows a system corresponding to a Java-based application, where the Java Virtual Machine (JVM) and Kubernetes containers have been identified as key components.
As shown in this figure, a supported component is the "web application", representing the end user perspective of the modeled system (e.g. response time). As expected, this component type only provides measured metrics and no tunable parameters.
Akamas provides several out-of-the-box component types to support system and component modeling. Moreover, it is also possible to define new component types to model other components (see Modeling components).
The System template section of the reference guide describes the template required to define a system, while the commands for creating a system are listed on the Resource Management command page.
Properly modeling the application or service to be optimized by identifying the components and their parameters to tune is the first important step in the optimization process. Some best practices are described here below.
Modeling only relevant components
When defining the system and its components, it is convenient to focus only on those components that are either providing tunable parameters or key metrics (or KPIs).
Key metrics are those used to:
define the optimization goal and constraints, either as metrics that are expected to be improved by the optimization or as metrics representing constraints. For example, a typical goal is to optimize the application throughput. In this case, a Web Application component should include service metrics such as transaction throughput or transaction response time.
support the analysis of the optimization results, as metrics that are useful to measure the impact of parameter tuning on the performance, efficiency, or reliability of the system. For example, a Linux OS component could be used to assess the impact of the optimization on the system-level metrics such as CPU utilization.
Please note that the metrics used to define the optimization goal and constraints are mandatory as they are used by the Akamas AI engine to validate and score each tested configuration against the goal. Other metrics that are not related to the optimization goal and constraints can be considered optional from a pure optimization implementation perspective.
When defining the optimization study, it is always possible to select which parameters and metrics to consider, thus which components are modeled in the system. Therefore, a system could be modeled by all components that at some point are going to be optimized, even if not used in the current optimization study. However, the recommended approach is to model the system only with components whose parameters (and relevant metrics) are to be tuned by the current study.
Reusing systems whenever possible
Whenever possible, it is recommended to model systems and their components by considering how these could be reused for multiple optimization studies in different contexts.
For example, it might be useful to create a simple system containing only one component (e.g. the JVM) for a first optimization study. A new system might then be created to include other components (e.g. the application server) for more advanced optimization studies.
Please also notice that systems (and other Akamas artifacts) can be shared with different teams thanks to Akamas workspaces.
Modeling systems with horizontal scalability
A typical optimization target is a cluster, i.e. a system made of multiple instances that provide horizontal scalability (e.g. a Kubernetes deployment with several replicas). In this scenario, all the instances are supposed to be identical from both a code and a configuration perspective, so the recommended approach is to create only one component that represents a generic instance of the cluster. This way, all the instances will be tuned in exactly the same way.
In this scenario, the associated automation workflow needs to be configured so that each configuration is applied to the whole cluster, by propagating the parameter configuration to all of the cluster instances and not just to a single one. The modeled component represents a generic instance, and its metrics are collected and used to evaluate the overall cluster behavior under that configuration.
Notice that in order for this approach to work correctly, it is also important to verify that the cluster is correctly monitored by the telemetry providers. Depending on the telemetry technology in use, the clustered system may be presented as either a single entity, with aggregated metrics (e.g. a Kubernetes deployment with the total CPU usage of all the replica pods), or as multiple entities, each corresponding to the different instances in the cluster:
in case aggregated metrics are provided by the telemetry provider for the cluster, these metrics can be simply assigned to the component modeling the whole cluster;
in case only instance-level metrics are made available by the telemetry provider, telemetry instances need to be configured in Akamas so as to aggregate the metrics of the cluster instances (e.g. averaging CPU utilization, summing memory usage, etc.), depending on how each specific metric is expected to be used in the goal and constraints or in the study results.
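The aggregation described in the last case can be illustrated with a minimal Python sketch. In Akamas the aggregation is configured in the telemetry instance, not coded by hand; the pod names, metric names, and values below are hypothetical and only show why different metrics call for different aggregations.

```python
from statistics import mean

# Hypothetical instance-level samples, one entry per cluster instance (e.g. a pod).
instance_metrics = {
    "pod-a": {"cpu_util": 0.60, "mem_used_mb": 512.0},
    "pod-b": {"cpu_util": 0.40, "mem_used_mb": 768.0},
    "pod-c": {"cpu_util": 0.50, "mem_used_mb": 640.0},
}

# Aggregation chosen per metric: utilization is averaged, memory usage is summed.
AGGREGATIONS = {"cpu_util": mean, "mem_used_mb": sum}


def aggregate_cluster_metrics(per_instance, aggregations):
    """Collapse instance-level samples into one value per metric."""
    result = {}
    for metric, aggregate in aggregations.items():
        values = [sample[metric] for sample in per_instance.values()]
        result[metric] = aggregate(values)
    return result


cluster_metrics = aggregate_cluster_metrics(instance_metrics, AGGREGATIONS)
print(cluster_metrics)
```

The aggregated values are the ones that would be assigned to the single component modeling the cluster.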
After identifying the components that are required to model a system, the following step is to model each identified key component.
Akamas provides the corresponding component types for their specific technology (and possibly version) and optimization packs describing all the tunable parameters and metrics of interest. The full list of Akamas optimization packs is available on the optimization packs page of the Akamas reference guide.
The Component template section of the reference guide describes the template required to define a system component, while the commands for creating a system component are listed on the Resource Management command page.
While the optimization process does not necessarily require component types and optimization packs to be defined, it is recommended to leverage this construct to facilitate modularization and reuse.
This is possible as the Akamas optimization pack model is extensible: custom optimization packs can be easily created without any programming to allow Akamas optimization capabilities to be applied to virtually any technology.
The Creating custom optimization pack page describes how to create a new optimization pack (possibly by reusing an already existing one) while the Component Type template page in the Akamas reference guide describes how to define a custom component type (if required).
Notice that optimization packs, even if provided out-of-the-box by Akamas, need to be installed (as described on the Managing optimization packs page) if they have not already been installed on the Akamas installation by other users. Indeed, optimization packs are global resources that are shared across all the workspaces on the same Akamas installation.
Whether out-of-the-box or custom, optimization packs need to be installed on an Akamas installation before being used.
Since optimization packs are global resources that are shared across all the workspaces on the same Akamas installation, an account with administrative privileges is required to manage them.
Optimization packs that are not yet installed are displayed as grayed out in the Akamas UI (this is the case for the AWS and Docker packs in the following figure).
The content of the store can also be inspected from the store container on the Akamas server:
which also provides the list of the associated JSON files (the optimization pack descriptors).
An Akamas installation comes with the latest optimization packs already loaded in the store and is able to check the central repository for updates.
There are two ways of installing an optimization pack:
online installation - this is the general case when the optimization pack is already in the store
An optimization pack can be installed from the UI only in the first case. The command-line commands for installing an optimization pack are described below.
Online installation
Execute the following command by specifying the name of the optimization pack that is already available in the store:
Offline installation
Execute the following command to install an optimization pack by specifying the name of the optimization pack and the full path to the JSON descriptor file:
When installing an optimization pack, the following checks are executed to identify potential clashes with already existing resources:
name of the optimization pack
metrics
parameters
component types
telemetry providers
In case one of those checks is positive (i.e. a clash exists), the installation fails and a message notifies that the "force" option needs to be used to get the optimization pack installed anyway.
Please be aware that when forcing the installation of an optimization pack, Akamas replaces (or merges) all the conflicting resources. The only exception is when at least one of the conflicting resources is a custom resource: in this case the installation is stopped, and the custom resource needs to be manually removed first in order to proceed.
The following command uninstalls an optimization pack:
Notice that this also deletes all the components built using that optimization pack.
In case a new optimization pack needs to be installed from a descriptor, the procedure is the following:
uninstall the optimization pack
remove the old version of the optimization pack descriptor file from the store container;
install the new optimization pack with the new JSON descriptor
After modeling the system and its components and ensuring that appropriate telemetry instances are defined, the following step (see the following figure) is to define a workflow.
A workflow automates all the tasks to be executed in sequence (see the following figure) during the optimization study, in particular those leveraging integrations with external entities, such as telemetry providers or configuration management tools. Akamas provides a number of general-purpose and specialized workflow operators (see page).
The following command describes how to download the file descriptor related to the version 1.3.0 of the Linux optimization pack:
offline installation - this may apply to custom optimization packs available as a JSON file (refer to the page)
The section of the reference guide describes the template required to define a workflow, while the commands for creating a workflow are listed on the page.
Since a workflow is an Akamas resource defined at the level and that can be used by multiple studies, it might be the case that a convenient workflow is already available or can be used to create a new workflow for the specific target system and integrations, by adding/removing some workflow tasks, changing the task sequence or the values assigned to task parameters.
Notice that since the structure of workflows defined for a live optimization study and for an offline optimization study is very different, each of these cases is described on a specific page:
This page provides a short compendium of general performance engineering best practices to be applied in any load testing exercise. The focus is on how to ensure that realistic performance tests are designed and implemented to be successfully leveraged for optimization initiatives.
The goal of ensuring realistic performance tests boils down to two aspects:
sound test environments;
realistic workloads.
A test or pre-production environment (TestEnv from now on) needs to represent the production environment (ProdEnv from now on) as closely as possible.
The most representative test environment would be a perfect replica of the production environment from both infrastructure (hardware) and architecture perspectives. The following criteria and guidelines can help design a TestEnv that is suitable for performance testing supporting optimization initiatives.
Hardware specifications
The hardware specifications of the physical or virtual servers running in TestEnv and ProdEnv must be identical. This is because any difference in the available resources (e.g. amount of RAM) or specifications (e.g. CPU vendor and/or type) may affect both service performance and system configuration.
This general guideline can only be relaxed for servers/clusters running container(s) or container orchestration platforms (e.g. Kubernetes or OpenShift). Indeed, it is possible to safely execute most of the related optimization cases if the TestEnv guarantees enough spare/residual capacity (number of cores or amount of RAM) to allocate all the needed resources.
While for monolithic architectures this may translate into significant HW requirements, with microservices this might not be the case, for two main reasons:
microservices are typically smaller than monoliths and designed for horizontal scalability: this means that optimizing the configuration of the single instance (pod/container resources and runtime settings) becomes easier as they typically have smaller HW requirements;
approaches like Infrastructure-as-Code (IaC), typically used with cloud-native applications, allow for easily setting up cluster infrastructure (on-prem or on the cloud) that can mimic production environments.
Downscaled/downsized architecture
TestEnvs are typically downscaled/downsized with respect to ProdEnvs. If this is the case, then optimizations can be safely executed provided it is possible to generate a "production-like" workload on each of the nodes/elements of the architecture.
This can usually be achieved if all the architectural layers have the same scale ratio between the two environments and the generated workload is scaled accordingly. For example, if the ProdEnv has 4 nodes at the front-end layer, 4 at the backend layer, and 2 at the database layer, then a TestEnv can have 2 nodes, 2 nodes, and 1 node respectively.
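The following Python sketch works through the example above: the same scale ratio is applied to every layer and to the workload. The production throughput of 1000 requests per second and the function name are assumptions made for illustration only.

```python
from fractions import Fraction


def scale_environment(prod_nodes, prod_throughput, ratio):
    """Apply the same scale ratio to every architectural layer and to the workload."""
    test_nodes = {}
    for layer, count in prod_nodes.items():
        scaled = count * ratio
        # A layer that does not scale to a whole number of nodes breaks the uniform ratio.
        if scaled.denominator != 1:
            raise ValueError(f"layer {layer!r} cannot be scaled by {ratio}")
        test_nodes[layer] = int(scaled)
    return test_nodes, float(prod_throughput * ratio)


prod_nodes = {"front-end": 4, "backend": 4, "database": 2}
# 1000 requests/s is a hypothetical production throughput.
test_nodes, test_throughput = scale_environment(prod_nodes, 1000, Fraction(1, 2))
print(test_nodes, test_throughput)
```

With a ratio of 1/2 each node in the TestEnv receives the same per-node load as in the ProdEnv, which is the condition for a "production-like" workload.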
Load balancing among nodes
From a performance testing perspective, the existence of load balancing among multiple nodes can be ignored if the load balancing relies on an external component that ensures a uniform distribution of the load across all nodes.
On the contrary, if an application-level balancing is in place, it might be required to include at least two nodes in the testing scenario so as to take into account the impact of such a mechanism on the performance of the cluster.
External/downstream services
The TestEnv should also replicate the application ecosystem, including dependencies from external or downstream services.
External or downstream services should emulate the production behavior from both functional (e.g. response size and error rate) and performance (e.g. throughput and response times) perspectives. In case of constraints or limitations on the ability to leverage external/downstream services for testing purposes, the production behavior needs to be simulated via stubs/mock services.
In the case of microservices applications, it is also required to replicate dependencies within an application. Several approaches can be taken for this purpose, such as:
replicating interacting microservices;
mocking these microservices and simulating realistic response times using simulation tools such as https://github.com/spectolabs/hoverfly;
disregarding dependencies with nonrelevant services (e.g. a post-processing service running on a mainframe whose messages are simply left published in a queue without being dequeued).
The most representative performance test script would provide 100% coverage of all the possible test cases. Of course, this is very unlikely to be the case in performance testing. The following criteria and guidelines can be considered to establish the required test coverage.
Statistical relevance
The test cases included in the test script must cover at least 80% of the production workload.
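As a minimal sketch of how this criterion could be checked, the following Python snippet computes the share of production requests covered by the transactions included in a test script. The transaction names and request counts are hypothetical; in practice they would come from production monitoring or access logs.

```python
def workload_coverage(production_counts, covered_transactions):
    """Return the fraction of production requests covered by the test script."""
    total = sum(production_counts.values())
    if total == 0:
        raise ValueError("no production traffic recorded")
    covered = sum(
        count
        for name, count in production_counts.items()
        if name in covered_transactions
    )
    return covered / total


# Hypothetical production request counts per transaction type.
production_counts = {
    "search": 5000,
    "view_item": 3000,
    "add_to_cart": 1200,
    "checkout": 500,
    "edit_profile": 300,
}
covered = {"search", "view_item", "add_to_cart"}

coverage = workload_coverage(production_counts, covered)
print(f"coverage: {coverage:.0%}, meets 80% threshold: {coverage >= 0.80}")
```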
Business relevance
The test cases included in the test script must cover all the business-critical functionalities that are known (or expected) to represent a significant load in the production environment.
Technical relevance
The test cases included in the test script must cover all the functionalities that at the code level involve:
Large objects/data structure allocation and management
Long living objects/data structure allocation and management
Intensive CPU, data, or network utilization
"One-of-a-kind" implementations, such as connections to a data source, ad-hoc objects allocation/management, etc.
The virtual user paths and behavior coded in the test script must be representative of the workload generated by production users. The most representative test script would account for the production users in terms of a mix of the different user paths, associated think times, and session length perspectives.
When single-user paths cannot be easily identified, the best practice is to consider for each of them the most comprehensive user journey. In general, a worst-case approach is recommended.
The task of reproducing realistic workloads is easier for microservice architectures. On the contrary, for monolithic architectures, this task could become hard as it may not be easy to observe all of the workloads, due to custom frameworks, etc. With microservices, the workload can be completely decomposed in terms of APIs/endpoints and APM tools can provide full observability of production workload traffic and performance characteristics for each single API. This guarantees that the replicated workload can reproduce the production traffic as closely as possible.
Both test script data, that is datasets used in the test script, and test environment data, that is datasets in any involved databases/datastores, have to be characterized in terms of both size and variance so as to reproduce the production performance.
Test script data
The test script data has to be characterized in order to guarantee production-like performance (e.g. cache behavior). In case this characterization is difficult, the best practice is to adopt a worst-case approach.
Test environment data
The test data must be sized and have an adequate variance to guarantee production-like performance in the interaction with databases/datastores (e.g. query response times).
Most performance test tools provide the ability to easily define and modify the test scenarios on top of already defined test cases/scripts, test case-mix, and test data. This is especially useful in the Akamas context where it might be required to execute a specific test scenario, based on the specific optimization goal defined. The most common (and useful, in the Akamas context) test scenarios are described here below.
Load tests
A load test aims at measuring system performance against a specified workload level, typically the one experienced or expected in production. Usually, the workload level is defined in terms of virtual user concurrency or request throughput.
In a load test, after an initial ramp-up, the target load level is kept constant in a steady state until the end of the test.
When validating a load test, the following two key factors have to be considered:
The steady-state concurrency/throughput level: a good practice is to apply a worst-case approach by emulating at least 110% of the production throughput;
The steady-state duration: in general defining the length for steady-state is a complex task because it is strictly dependent on the technologies under test and also because phenomena such as bootstraps, warm-ups, and caching can affect the performance and behavior of the system only before or after a certain amount of time; as a general guide to validate the steady-state duration, it is useful to:
execute a long-run test by keeping the defined steady-state for at least 2h to 3h;
analyze test results by looking for any variation in the performance and behavior of the system over time;
in case no variation is observed, shorten the defined steady state, while keeping it at least 30 minutes long.
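The analysis step above (looking for variations over time during a long-run test) can be sketched in Python as follows. This is one possible approach, not a method prescribed by Akamas: the samples are split into consecutive windows and each window average is compared with the first one. The 5% tolerance, the window size, and the sample values are assumptions.

```python
from statistics import mean


def window_means(samples, window_size):
    """Average consecutive, non-overlapping windows of samples."""
    last_start = len(samples) - window_size
    return [
        mean(samples[i:i + window_size])
        for i in range(0, last_start + 1, window_size)
    ]


def is_steady(samples, window_size, tolerance=0.05):
    """True if every window average stays within `tolerance` of the first one."""
    means = window_means(samples, window_size)
    if len(means) < 2:
        raise ValueError("need at least two full windows")
    reference = means[0]
    return all(abs(m - reference) <= tolerance * abs(reference) for m in means)


# Hypothetical response-time samples (ms) taken at regular intervals.
flat = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 100]
drifting = [100, 101, 99, 100, 110, 112, 111, 109, 125, 124, 126, 125]

print(is_steady(flat, window_size=4))
print(is_steady(drifting, window_size=4))
```

A series that drifts over time, like the second one, suggests that the steady state should not be shortened yet.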
Stress tests
A stress test is all about pushing the system under test to its limit.
Stress tests are useful to identify the maximum throughput that an application can cope with while working within its SLOs. Identifying the breaking point of an application is also useful to highlight the bottleneck(s) of the application.
A stress test also makes it possible to understand how the system reacts to excessive load, thus validating the architectural expectations. For example, it can be useful to discover that the application crashes when reaching the limit, instead of simply enqueuing requests and slowing down their processing.
Endurance tests
An endurance test aims at validating the system's performance over an extended period of time.
The first validation is provided by utilization metrics (e.g. CPU, RAM, I/O), which in the test environment should closely match the behavior observed in the production environment. If the delta is significant, some refinements of the test case and environment might be required to close the gap and gain confidence in the test results.
The following provides some best practices that can be adopted before launching optimization studies, in particular for offline optimization studies.
It is recommended to execute a dry-run of the study to verify that the workflow works as expected and in particular that the telemetry and configuration management steps are correctly executed.
Verify that workflow actually works
It is important to verify that all the steps of the workflow complete successfully and produce the expected results.
Verify that parameters are applied and effective
When approaching the optimization of new applications or technologies, it is important to make sure all the parameters that are being set are actually applied and used by the system.
Depending on the specific technology at hand, the following issues can be found:
parameters were set but they are not applied - for example parameters were set in the wrong configuration file or the path is not correct;
some automatic (corrective) mechanisms are in place that override the values applied for the parameters.
Therefore, it is important to always verify the actual values of the parameters once the system is up & running with a new configuration, and make sure they match the values applied by Akamas. This is typically done by leveraging:
monitoring tools, when the parameters are available as metrics or properties of the system;
native administration tools, which are typically available for introspection or troubleshooting activities (e.g. jcmd for the JVM).
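The comparison between the values applied by Akamas and the values actually in effect can be sketched as follows. The parameter names and values are hypothetical, and how the actual values are retrieved (from a monitoring tool or from an administration tool) is left out of the sketch.

```python
def find_mismatches(expected, actual):
    """Return {name: (expected, actual)} for parameters that differ or are missing."""
    mismatches = {}
    for name, expected_value in expected.items():
        actual_value = actual.get(name)  # None when the parameter is not reported
        if actual_value != expected_value:
            mismatches[name] = (expected_value, actual_value)
    return mismatches


# Hypothetical values applied by the optimization study.
expected = {"MaxHeapSize": "2g", "UseG1GC": "true", "ParallelGCThreads": "4"}
# Hypothetical values read back from the running system.
actual = {"MaxHeapSize": "2g", "UseG1GC": "true", "ParallelGCThreads": "8"}

mismatches = find_mismatches(expected, actual)
for name, (want, got) in mismatches.items():
    print(f"{name}: expected {want}, found {got}")
```

A non-empty result points to one of the two issues listed above: either the parameter was never applied or something overrode it.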
Verify that load testing works
It is important to verify that the integration with load testing tools actually executes the intended load test scenarios.
Verify that telemetry collects all the relevant metrics
It is important to make sure that the integration with telemetry providers works correctly and that all the relevant metrics of the system are correctly collected.
Data-gathering from the telemetry data sources is launched at the end of the workflow tasks. The status of the telemetry process can be inspected in the Progress tab, where it is also possible to inspect the telemetry logs in case of failures.
Please notice that the telemetry process fails if the key metrics of the study cannot be gathered. This includes metrics defined in the goal function or constraints.
Before running the optimization study, it is important to make sure the system and the environment where the optimization is running provide stable and reproducible performance.
Make sure the system performance is stable
In order to ensure a successful optimization, it is important to make sure that the target system displays stable and predictable performance and does not suffer from random variations.
To make sure this is the case, it is recommended to create a study that only runs a single baseline experiment. In order to assess the performance of the system, Akamas trials can be used to execute the same experiment (hence, the same configuration) multiple times (e.g. three times). Once the experiment is completed, the resulting performance metrics can be analyzed to assess the stability. The analysis can be done either on the aggregate metrics in the Analysis tab or, at a deeper level, on the actual time series in the Metrics tab of the Akamas UI.
Ideally, no significant performance variation should be observed in the different trials, for the key system performance metrics. Otherwise, it is strongly recommended to identify the root cause before proceeding with the actual optimization activity.
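One simple way to quantify the variation across trials is the coefficient of variation (standard deviation divided by mean). The sketch below assumes three hypothetical throughput values, one per trial, and a 5% threshold; both are illustrative choices and not Akamas defaults.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation relative to the mean."""
    if len(values) < 2:
        raise ValueError("at least two trials are required")
    return stdev(values) / mean(values)


# Hypothetical throughput (requests/s) measured in three trials of the baseline.
trial_throughput = [1010.0, 990.0, 1000.0]

cv = coefficient_of_variation(trial_throughput)
stable_enough = cv <= 0.05  # assumed threshold
print(f"coefficient of variation: {cv:.1%}, stable: {stable_enough}")
```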
Before launching the optimization, it might be a good idea to take note of (or back up) the original configuration. This is very important in the case of Linux OS parameters optimization.
A workflow for a live optimization study automates all the actions required to interface with configuration management tools. Notice that metrics collection is an implicit action that does not need to be coded as part of the workflow.
More in detail, a typical workflow includes the following types of tasks:
Applying the configuration, by preparing and then applying the parameter configuration that has been recommended and/or approved to the target environment - this may require interfacing configuration management tools or pushing configuration to a repository
Depending on the complexity of the system, the workflow might be composed of multiple actions of the same type, each operating on separate components of the target system.
As expected, with respect to workflows for offline optimization studies, there are no actions to apply synthetic workloads as part of a load-testing scenario.
After modeling the system and its components, the following step (see the following figure) is to ensure that all the metrics that are required to define goals and constraints and to analyze the behavior of the target system can be collected from one of the data sources available in the environment, which in Akamas are called telemetry providers.
Akamas provides a number of out-of-the-box telemetry providers, including industry-standard monitoring platforms (e.g. Prometheus or Dynatrace), performance testing tools (e.g. LoadRunner or JMeter), or simple CSV files. The section Integrating Telemetry Providers lists all the out-of-the-box telemetry providers and how to get them integrated by Akamas, while the section Telemetry metric mapping describes the mapping of the specific data source metrics to Akamas metrics.
Since several instances of a data source type might be available, the specific data source instance needs to be specified, that is a corresponding telemetry instance needs to be defined for the modeled system and its components.
The Telemetry instance template section of the reference guide describes the template required to define a telemetry instance, while the commands for creating a telemetry instance are listed on the Resource Management command page.
Telemetry providers are shared across all the workspaces in the same Akamas installation and require an account with administrative privileges to manage them. Any number of telemetry instances (even of the same type) can be specified. For example, the following figure shows two Prometheus telemetry instances associated with the Adservice system.
The following sections provide guidelines on how to create telemetry instances.
Verify metrics provided by the telemetry provider
A seemingly obvious, yet fundamental, best practice when choosing a telemetry provider is to check whether the required metrics:
are supported by the original data source or can be added (e.g. as it is in the case of Prometheus)
are available and can be effectively gathered in the specific implementation
are supported by the telemetry provider itself or whether the provider needs to be extended, as in the case of custom metrics such as those made available by the application itself (this is the case for a Prometheus telemetry provider)
Akamas makes it possible to validate whether a telemetry setup works correctly by first executing dry runs. This is discussed in the context of the recommended practices to run optimization studies (section Running optimization studies).
The final preparatory step before running a study is to actually create the study, which also requires several substeps.
Most of the substeps are common for both a live optimization study and an offline optimization study, even if they might need to be conceived differently in these two different contexts:
Other optional and mandatory steps are specific for offline optimization studies (see below) and live optimization studies (see below).
The Study template section of the reference guide describes the template for creating a study, while the commands for creating a study are on the Resource Management command page. For offline optimization studies only, the Akamas UI displays the "Create a study" button that provides a visual step-by-step procedure for creating a new optimization study (see the following figure).
For offline optimization studies, there are some additional (optional) steps:
defining windowing policies (optional - typically after defining the goal & constraints)
defining KPIs (optional - typically after defining the goal & constraints)
Notice that Akamas also allows existing offline optimization studies to be duplicated either from the Akamas UI (see the following figure) or from the command line (refer to the Resource management commands page).
For live optimization studies, there are some additional steps - including a mandatory one:
defining workloads (mandatory - typically after defining the goal & constraints)
setting safety policies (optional - typically when defining the optimization steps)
A workflow for an offline optimization study automates all the actions required to interface with the configuration management and load testing tools (see the following figure) at each experiment or trial. Notice that metrics collection is an implicit action that does not need to be coded as part of the workflow.
More in detail, a typical workflow includes the following types of tasks:
Preparing the application, by executing all cleaning or reset actions that are required to prepare the load testing phase and ensuring that each experiment is executed under exactly the same conditions - for example, this may involve cleaning caches, uploading test data, etc
Applying the configuration, by preparing and then applying the parameter configuration under test to the target environment - this may require interfacing with configuration management tools or pushing the configuration to a repository, restarting the entire application or some of its components to ensure that the parameters are effectively applied, checking that the application is up & running after the restart before the workflow execution continues, and checking that the configuration has been correctly applied
Applying the workload, by launching a load test to assess the behavior of the system under the applied configuration and synthetic workload defined in the load testing scenarios - of course, a preliminary step is to design a load testing scenario and synthetic workload that ensures that optimized configurations resulting from the offline optimization can be applied to the target system under the real or expected workload
The Optimization examples section provides some examples of how to define workload for a specific technology. In a complex application, a workflow may include multiple actions of the same type, each operating on separate components of the target system. The Knowledge base guide provides some real-world examples of how to create workflows and optimization studies.
A workflow is interrupted if any of its steps fails. A failing workflow causes the experiment or trial to fail. This should be considered a different situation from a specific configuration not matching the optimization constraints or causing the system under test to fail to run. For example, if the configured maximum memory is too low, the application may fail to start.
When an experiment fails, the Akamas AI engine takes this information into account and thus learns that the corresponding parameter configuration was bad. This way, the AI engine automatically tries to avoid the regions of the parameter space which can lead to low scores or failures.
This explains why it is important to build robust workflows that ensure experiments only fail in case bad configurations are tested. See the specific entry Building robust workflow in the best practices section below.
Creating effective workflows is essential to ensure that Akamas can automatically identify the optimal configuration in a reliable and efficient way. Some best practices on how to build robust workflows are described here below.
Some additional best practices related to the design and implementation of load testing are described in the Performing load testing to support optimization activities page.
Since Akamas workflows are first-class entities that can be used by multiple studies, it might be useful to avoid creating (and maintaining) multiple workflows and instead define workflows that can be easily reused, by factoring all differences into specific action parameters.
Of course, this general guideline should be balanced with respect to other requirements, such as avoiding potential conflicts due to different teams modifying the same workflow for different uses and potentially impacting optimization results.
Akamas takes into account the exit code of each of the workflow tasks, and the whole workflow fails if a task exits with an error. Therefore, the best practice is to make use of exit codes in each task, to ensure that task failures can only happen in case of bad parameter configuration.
For example, it is important to always check that the application has correctly started and is up and running (after a new configuration has been applied). This can be done by:
including a workflow task that tests the application is up and running after the tasks where the configuration is applied;
making sure that this task exits with an error in case the application has not correctly started (typically after a timeout).
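A readiness check of this kind can be sketched in Python as follows. The function names and the `is_ready` probe are hypothetical: in a real task the probe would query the application (e.g. a health endpoint), and the script would end by passing the result of `exit_code_for` to `sys.exit` so that the workflow sees the failure.

```python
import time


def wait_until_ready(is_ready, timeout_s, interval_s=5.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or timeout_s seconds have elapsed."""
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def exit_code_for(ready):
    """Map the check result to a process exit code: 0 on success, 1 on failure."""
    return 0 if ready else 1


# Demo with a fake probe that succeeds on the third attempt.
attempts = iter([False, False, True])
ready = wait_until_ready(lambda: next(attempts), timeout_s=60, interval_s=0)
print(ready, exit_code_for(ready))
```

Injecting `clock` and `sleep` as parameters keeps the check easy to test without waiting for real timeouts.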
Another example is when the underlying environment incurs issues during the optimization (e.g. a database might be mistakenly shut down by another team). As much as possible, all these environmental transient issues should be carefully avoided. Akamas also provides the ability to execute multiple task retries (default is twice, configurable) to compensate for these transient issues, provided they only last for a short time (the retry time and delay are also configurable).
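The retry behavior described above is provided by Akamas itself and does not need to be coded. The following Python sketch only illustrates the semantics and is not how Akamas implements it: a task that fails is attempted again a limited number of times, with a delay between attempts. The function name and default delay are assumptions.

```python
import time


def run_with_retries(task, retries=2, delay_s=10.0, sleep=time.sleep):
    """Run task(); after a failure, retry up to `retries` more times.

    Returns (succeeded, number_of_attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        if task():
            return True, attempts
        if attempts > retries:
            return False, attempts
        sleep(delay_s)


# Demo: a transient issue makes the first attempt fail, the second one succeeds.
outcomes = iter([False, True])
succeeded, attempts_used = run_with_retries(lambda: next(outcomes), delay_s=0)
print(succeeded, attempts_used)
```

A transient issue that outlasts all the attempts still fails the task, which is why such issues should be avoided rather than relied upon being absorbed by retries.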
Building workflows that ensure reproducible experiments
As for any other performance evaluation activity, Akamas experiments should be designed to be reproducible: if the same experiment (hence, the same parameter configuration) is executed multiple times (i.e. in multiple trials), the same performance results should be found for each trial.
Therefore, it is fundamental that workflows include all the necessary tasks to realize reproducible experiments. Particular care needs to be taken to correctly manage the system state across the experiments and trials. System state can include:
Application caches
Operating system cache and buffers (e.g. Linux filesystem page cache)
Database tables that fill up during the optimization process
All experiments should always start with a clean and well-known state. If the state is not properly managed, it may happen that the performance of the system is observed to change (whether higher or lower) not because of the effect of the applied parameters, but due to other effects (e.g. warming of caches).
Best practices to consistently manage system state across experiments include:
Restoring the system state at the beginning of each experiment - this may involve restarting the application, clearing caches, restoring DB tables, etc;
Allowing for a sufficient warm-up period in the performance tests, so as to ensure that application performance has reached stability. See also the recommended best practices about properly managing warm-up periods in the following section about creating an optimization study.
Another common cause that can impact the reproducibility of experiments is an unstable infrastructure or environment. Therefore, it is important to ensure that the underlying infrastructure is stable and that no other workload that might impact the optimization results is running on it. For example, beware of scheduled system jobs (e.g. backups), automatic software updates or anti-virus systems that might not explicitly be considered as part of the environment but that may unexpectedly alter its performance behavior.
Taking into account workflow duration
When designing workflows, it is important to take into account the potential duration of their tasks. Indeed, the task duration impacts the duration of the overall optimization and might impact the ability to execute a sufficient number of experiments within the overall time interval or specific time windows allowed for the optimization study.
Typically, the longest task in a workflow is the one related to applying workload (e.g. launching a load test or a batch job): such tasks can last for dozens of minutes if not hours. However, a workflow may also include other ancillary tasks that can contribute significantly to the overall workflow duration (e.g. checking the status to ensure that the application is up & running).
Making workflows fail fast
As general guidance, it is better to fail fast by performing quick checks executed as early as possible. For example, it is better to do a status check before launching a load test instead of possibly waiting for it to complete (maybe after 1h) just to discover that the application did not even start.
While the optimization goal drives the Akamas AI toward optimal configurations, other sub-optimal configurations might also be of interest when they not only match the optimization constraints but also improve on some Key Performance Indicators (KPIs).
For example:
for a Kubernetes microservice Java-based application, a typical optimization goal is to reduce the overall (infrastructure or cloud) cost by tuning both Kubernetes and JVM parameters while keeping SLOs in terms of application response time and error rate under control
among different configurations that provide similar cost reduction in addition to matching all SLOs, a configuration that also significantly improves the application response time might be worth considering over an optimal configuration that does not improve on this KPI
Akamas automatically considers any metric referred to in the defined optimization goal and constraints for an offline optimization study as a KPI. Moreover, any other metrics of the system component can be specified as a KPI for an offline optimization study.
The KPIs page of the Study template section in the reference guide describes how to define the corresponding structure. Specifying the KPIs can be done while first defining the study or from the Akamas UI, at either study creation time or afterward (see the following figures).
Once KPIs are defined, Akamas will represent the results of the optimization in the Insights section of the Akamas UI. Moreover, the corresponding suboptimal configuration associated with a specific KPI is highlighted in the Akamas UI by a textual badge "Best <KPI name>".
Please notice that KPIs can also be re-defined after an offline optimization study has been completed as their definition does not affect the optimization process, only the evaluation of its results. See the section Analyzing offline optimization studies and the Optimization Insights page.
After defining the goal and its constraints, the following substep in creating an optimization study is specifying the optimization parameters and metrics. In particular, selecting the parameters that are going to be tuned to optimize the system is a critical decision that requires carefully balancing complexity and effectiveness. As for goals & constraints, this step may also require adopting an iterative approach. See also the Best Practices section here below.
The Parameter selection and Metric selection pages of the Study template section in the reference guide describe how to define the corresponding structure. For offline optimization studies only, the Akamas UI allows the parameters and metrics to be defined as part of the visual procedure activated by the "Create a study" button (see the following figure).
As illustrated by the previous and following figures, during this step it is also possible to edit the range of values associated with each optimization parameter with respect to the default domain provided by either the original or custom optimization pack in use for the respective technology.
Please also refer to the Guidelines for choosing optimization parameters for a number of selected technologies. Some examples provided in the Knowledge Base guide may also provide useful guidance.
By default, all parameters specified in the parameters selection of a study are applied ("rendered"). Akamas allows specifying which configuration parameters should be applied in the optimization steps. More precisely:
parameter rendering is available at the step level for baseline, preset, and optimize steps
parameter rendering is not available for bootstrap steps (bootstrapped experiments are not executed)
This feature can be useful to deal with the different strategies through which applications and systems accept configuration parameters.
Please refer to the Parameter rendering page to see how to configure parameter rendering.
The following sections provide some best practices on how to best approach the step of defining optimization parameters.
Configure parameters domains based on environment specs
Since the parameter domain defines the range of values that the Akamas AI engine can assign to the parameter, when defining the system parameters to be optimized, it is important to review the parameter domains and adjust them based on the system characteristics of the target system, environment and best practices in place.
Akamas optimization packs already provide parameter domains that are correct for most situations. For example, the OpenJDK 11 JVM gcType is a categorical parameter that already includes all the possible garbage collectors that can be set for this JVM version.
For other parameters, there are no sensible default domains as they depend on the environment. For example, the OpenJDK 11 maxHeapSize JVM parameter dictates how much memory the JVM can use. This obviously depends on the environment in which the JVM runs. For example, the upper bound might be 90% of the memory of the virtual machine or container in which the JVM runs.
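The arithmetic behind such an environment-dependent domain can be sketched as follows. The function name, the lower bound, and the unit (MB) are illustrative assumptions; the 90% fraction is the example given above, not a fixed rule.

```python
from typing import Tuple


def max_heap_domain_mb(container_memory_mb: int,
                       lower_bound_mb: int = 256,
                       memory_fraction: float = 0.9) -> Tuple[int, int]:
    """Derive a [lower, upper] domain for maxHeapSize from the container memory.

    The upper bound is a fraction of the memory available to the VM or
    container; the lower bound is a hypothetical floor chosen for illustration.
    """
    upper_bound_mb = int(container_memory_mb * memory_fraction)
    if upper_bound_mb < lower_bound_mb:
        raise ValueError("container memory is too small for the chosen lower bound")
    return lower_bound_mb, upper_bound_mb
```

For a container with 4096 MB of memory, this yields an upper bound of 3686 MB for the maxHeapSize domain.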
Defining good parameter domains is important to ensure the parameter configurations suggested by the Akamas AI engine will be as good as possible. Notice that if the domain is not defined correctly, this may cause experiment failures (e.g. the JVM could not start if the maxHeapSize is higher than the container size). As discussed as part of the best practices for defining robust workflows, the Akamas AI engine has been designed to learn configurations that may lead to failures and to automatically discover any hidden constraints found in the environment.
Configure parameter constraints based on Optimization Pack best practices
Depending on the specific technology under optimization, the configuration parameters may have relationships among themselves. For example, in a JVM the newSize parameter defines the size of a region of the JVM heap, and hence its value should be always less than the maxHeapSize parameter.
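To make the relationship concrete, the following sketch checks the newSize/maxHeapSize constraint on a candidate configuration. It is plain Python for illustration only and does not reproduce the Akamas constraint syntax; the dictionary layout and the MB unit are assumptions.

```python
from typing import Dict, List


def violated_constraints(config: Dict[str, int]) -> List[str]:
    """Return descriptions of the parameter constraints violated by config.

    Values are assumed to be expressed in the same unit (e.g. MB).
    """
    violations = []
    # The new generation is a region of the heap, so it must be smaller.
    if config["newSize"] >= config["maxHeapSize"]:
        violations.append("newSize must be less than maxHeapSize")
    return violations
```

A configuration such as `{"newSize": 512, "maxHeapSize": 2048}` passes the check, while one with newSize larger than maxHeapSize is reported as a violation.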
Akamas AI engine supports the definition of constraints among parameters as this is a frequent need when optimizing real-life applications.
It is important to define the parameter constraints when creating a new study. The optimization pack documentation provides guidelines on what are the most important parameter constraints for the specific technology.
When optimizing a new or custom technology, it may happen that some experiments fail due to unknown parameter constraints being violated. For example, the application may fail to start and only by analyzing the application error logs, the reason for the failure can be understood. For a Java application, the JVM error message (e.g. "new size cannot be larger than max heap size") could provide useful hints. This would reveal that some constraints need to be added to the parameter constraints in the study.
While the Akamas AI engine has been designed to learn from failures, including those due to relationships among parameters that were not explicitly set as constraints, setting parameter constraints may help avoid unnecessary failures and thus speed up the optimization process.
For both offline and live optimization studies, it is possible to define how to identify the time windows that Akamas needs to consider for assessing the result of an experiment. Defining a windowing policy helps achieve reliable optimizations by excluding metrics data points that should not influence the score of an experiment.
The following two windowing policies are available:
Trim windowing: discards the initial and final part of an experiment - e.g. to exclude warm-up and tear-down phases - trim windowing policy is the default (with the entire interval selected when no trimming is specified)
Stability windowing: discards those parts that do not correspond to the most stable window - this leverages the Akamas feature of automatically identifying the most stable window based on user-specified criteria
The Windowing policy page of the Study template section in the reference guide describes the corresponding structures. For offline optimization studies only, the Akamas UI allows the windowing policies to be defined as part of the visual procedure activated by the "Create a study" button (see the following figures).
The following sections provide general best practices on how to define a suitable windowing policy.
Define windowing based on the optimization goal
In order to make the optimization process fully automated and unattended, Akamas automatically analyzes the time series of the collected metrics of each experiment and calculates the experiment score (all the system metrics will also be aggregated).
Based on the optimization goal, it is important to instruct Akamas on how to perform this experiment analysis, in particular, by also leveraging Akamas windowing policies.
For example, when optimizing an online or transactional application, there are two common scenarios:
Increase system performance (i.e. minimize response time) or reduce system costs (i.e. decrease resource footprint or cloud costs) while processing a given and fixed transaction volume (i.e. a load test);
Increase the maximum throughput a system can support (i.e., system capacity) while processing an increasing amount of load (e.g. a stress test).
In the first scenario, a load test scenario is typically used: the injected load (e.g. virtual users) ramps up for a period, followed by a steady state, with a final ramp-down period. From a performance engineering standpoint, since the goal is to assess the system performance during the steady state, the warm-up and tear-down periods can be discarded. This analysis can be automated by applying a windowing policy of type "trim" upon creating the optimization study, which makes Akamas automatically compute the experiment score by discarding a configurable warm-up and tear-down period.
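The idea of trimming can be illustrated with a minimal sketch that discards the warm-up and tear-down samples before aggregating a metric. This is not the Akamas implementation; the sample layout (timestamp in seconds, value) and the function names are assumptions.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, metric value)


def trim_window(samples: Sequence[Sample], warmup_s: float,
                teardown_s: float) -> List[Sample]:
    """Keep only the samples between the warm-up and tear-down periods."""
    if not samples:
        return []
    start = samples[0][0] + warmup_s
    end = samples[-1][0] - teardown_s
    return [(t, v) for t, v in samples if start <= t <= end]


def trimmed_mean(samples: Sequence[Sample], warmup_s: float,
                 teardown_s: float) -> float:
    """Average the metric over the steady-state part of the experiment."""
    kept = trim_window(samples, warmup_s, teardown_s)
    if not kept:
        raise ValueError("trimming removed all samples")
    return mean(v for _, v in kept)
```

With response-time samples taken every 60 seconds, a 120-second warm-up and a 60-second tear-down, only the steady-state samples contribute to the score.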
In the second scenario, a stress test is typically used: the injected load follows a ramp with increasing levels of users, designed to stress the system up to its limit. In this case, a performance engineer is most likely interested in the maximum throughput the system can sustain before breaking down (possibly while matching a response time constraint). This analysis can be automated by applying a windowing policy of type "stability", which makes Akamas automatically compute the experiment score in the time window where the throughput was maximized but stable for a configurable period of time.
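A simplified illustration of this kind of selection is sketched below: among all fixed-size windows whose standard deviation stays below a threshold, it picks the one with the highest mean throughput. The actual Akamas algorithm is not described here, so the function, its parameters, and the selection rule are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence, Tuple


def best_stable_window(values: Sequence[float], window_size: int,
                       max_stdev: float) -> Optional[Tuple[int, float]]:
    """Return (start index, mean) of the stable window with the highest mean.

    A window is considered stable when its population standard deviation
    does not exceed max_stdev. Returns None if no window is stable.
    """
    best = None
    for start in range(len(values) - window_size + 1):
        window = values[start:start + window_size]
        if pstdev(window) > max_stdev:
            continue  # too much variability: not a stable window
        avg = mean(window)
        if best is None or avg > best[1]:
            best = (start, avg)
    return best
```

On a throughput series that ramps up, plateaus around 400 and then collapses, the function selects the first window on the plateau.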
When optimizing a batch application, windowing is typically not required. In such scenarios, a typical goal is to minimize batch duration or aggregate resource utilization. Hence, there is no need to define any windowing policy: by default, the whole experiment timeframe is considered.
Setting up an effective stability window requires some knowledge of the test scenario and the variability of the environment.
As a general guideline, it is recommended to run a baseline study with a stability window set to a low value, such as a value close to 0 or half of the expected mean of the metric, and then to inspect the results of the baseline to identify which window has been identified and update the standard deviation threshold accordingly. When using a continuous ramp the test has no plateaus, so the standard deviation threshold should be a bit higher to account for the increment of the traffic in the windowing period. On the contrary, when running a staircase test with many plateaus, the standard deviation can be smaller to identify a period of time with the same amount of users.
Applying the standard deviation filter to very stable metrics, such as the number of users, simplifies the definition of the standard deviation threshold but might hide some instability of the environment when subject to constant traffic. On the other hand, applying the threshold to a more direct measure of the performance, such as the throughput, makes it easier to identify the stability period of the application but might require more baseline experiments to identify the proper threshold value. The logs of the scoring phase provide useful insights into the maximum standard deviation found and the number of candidate windows that have been identified given a threshold value, which can be used to refine the threshold in a few baseline experiments.
The first fundamental step in creating a study is to define the study goal & constraints. While this step might be perceived as somewhat straightforward (e.g. constraints could be simply translated from SLOs already in place), defining the optimization goal really requires carefully balancing complexity and effectiveness, also as part of the general (iterative) optimization process. Please also read the Best Practices section here below.
In general, any performance engineering, tuning, and optimization activity involves complex tradeoffs among different - and potentially conflicting - goals and system performance metrics, such as:
Maximizing the business volume an application can support, while not making the single transaction slower or increasing errors above a desired threshold
Minimizing the duration of a batch processing task, while not increasing the cloud costs by more than 20% or using more than 8 CPUs
Akamas supports all these (and other) scenarios by means of the optimization goal, that is, the single metric or the formula combining multiple metrics that has to be either minimized or maximized, and one or more constraints among metrics of the system.
In general, constraints can be defined as either absolute constraints (e.g. app.response_time < 200 ms) or as relative constraints with respect to a baseline (e.g. app.response_time < +20% of the baseline), that is the current configuration in place, typically corresponding to the very first experiment in an offline optimization study. Therefore, relative constraints are only applicable to offline optimization studies, while absolute constraints are applicable to both offline and live optimization studies.
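The difference between the two kinds of constraints can be sketched with two small checks. This is plain Python for illustration, not the Akamas constraint syntax; the function names are assumptions and the values mirror the response-time examples above.

```python
def satisfies_absolute(value: float, threshold: float) -> bool:
    """Absolute constraint: the metric must stay below a fixed threshold."""
    return value < threshold


def satisfies_relative(value: float, baseline: float,
                       max_increase_pct: float) -> bool:
    """Relative constraint: the metric must stay below baseline + a percentage."""
    return value < baseline * (1 + max_increase_pct / 100)
```

With a baseline response time of 200 ms and a +20% relative constraint, an experiment measuring 230 ms satisfies the constraint (the limit is 240 ms), while one measuring 250 ms does not.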
Please notice that when defining constraints for an optimization study, it is required to also include those constraints listed in the Constraints section of the respective Optimization Packs which express internal constraints among parameters. For example, in case OpenJDK 11 components are to be tuned, the reference section is Constraints.
The Goal & Constraint page of the Study template in the reference guide describes the corresponding structures. For offline optimization studies only, the Akamas UI allows the optimization goal and constraints to be defined as part of the visual procedure activated by the "Create a study" button (see the following figure).
Please notice that any experiment that does not respect the constraints is marked by Akamas as failed, even if correctly executed. The reason for this failure can be inspected in the experiment status. Similarly to workflow failures (see below), the Akamas AI engine automatically takes any failure due to constraint violations into account when searching the optimization space to identify the parameter configurations that might improve the goal metrics while matching constraints.
There are no general guidelines and best practices on how to best define goals & constraints, as this is where experience, knowledge, and processes meet.
Please refer to the section Optimization examples for a number of examples related to a variety of technologies and the Knowledge Base guide for real-world examples.
While Akamas leverages similar AI methods for both live optimizations and optimization studies, the way these methods are applied is radically different. Indeed, for optimization studies running in pre-production environments, the approach is to explore the configuration space by also accepting potential failed experiments, to identify regions that do not correspond to viable configurations. Of course, this approach cannot be accepted for live optimization running in production environments. For this purpose, Akamas live optimization uses observations of configuration changes combined with the automatic detection of workload contexts and provides several customizable safety policies when recommending configurations to be approved, revisited, and applied.
Akamas provides a few customizable optimizer options (refer to the options described on the page of the reference guide) that should be configured so as to make configurations recommended in live optimization and applied to production environments as safe as possible.
Akamas provides an optimizer option known as the exploration factor that only allows gradual changes to the parameters. This gradual optimization allows Akamas to observe how these changes impact the system behavior before applying the following gradual changes.
By properly configuring the optimizer, Akamas can gradually explore regions of the configuration space and slowly approach any potentially risky regions, thus avoiding recommending any configurations that may negatively impact the system. Gradual optimization takes into account the maximum recommended change for each parameter. This is defined as a percentage (default is 5%) with respect to the baseline value. For example, in the case of a container whose CPU limit is 1000 millicores, the corresponding maximum allowed change is 50 millicores. It is important to notice that this does not represent an absolute cap, as Akamas also takes into account any good configurations observed. For example, in the event of a traffic peak, Akamas would recommend a good configuration that was observed working fine for a similar workload in the past, even if the change is higher than 5% of the current configuration value.
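The arithmetic of the maximum recommended change can be sketched as follows. The function names are hypothetical, and limiting a candidate to a band around the current value is an assumption made for illustration; as noted above, Akamas does not treat this as an absolute cap.

```python
def max_change(baseline_value: float, max_change_pct: float = 5.0) -> float:
    """Maximum recommended change, as a percentage of the baseline value."""
    return baseline_value * max_change_pct / 100


def limit_candidate(current_value: float, candidate_value: float,
                    step: float) -> float:
    """Limit a candidate value to at most one step away from the current value."""
    lower, upper = current_value - step, current_value + step
    return min(max(candidate_value, lower), upper)
```

For a container whose CPU limit is 1000 millicores, `max_change(1000)` gives 50 millicores, so a candidate of 800 millicores would be limited to 950 in this sketch.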
Notice that this feature would not work for categorical parameters (e.g. JVM GC Type) as their values do not change incrementally. Therefore, when it comes to these parameters, Akamas by default takes a conservative approach of only recommending configurations in which categorical parameters take values that have already been observed. This still allows some never observed values to be recommended, as users are allowed to modify values also for categorical parameters when operating in human-in-the-loop mode. Once Akamas has observed that a specific configuration is working fine, the corresponding value can then be recommended. For example, a user might modify the recommended configuration for GC Type from Serial to Parallel. Once Parallel has been observed as working fine, Akamas would consider it for future recommendations of GC Type, while other values (e.g. G1) would not be considered until verified as safe recommendations.
The exploration factor can be customized for each live optimization individually and changed while live optimizations are running.
Akamas provides an optimizer option known as the safety factor designed to prevent Akamas from selecting configurations (even if slowly approaching them) that may impact the ability to match defined SLOs. For example, when optimizing container CPU limits, lower and lower CPU limits might be recommended, up to the point that the limit becomes so low that the application performance degrades.
Akamas takes into account the magnitude of constraint breaches: a severe breach is considered more negative than a minor breach. For example, in the case of an SLO of 200 ms on response time, a configuration causing a 1 sec response time is assigned a very different penalty than a configuration causing a 210 ms response time. Moreover, Akamas leverages the smart constraint evaluation feature that takes into account if a configuration is causing constraints to approach their corresponding thresholds. For example, in the case of an SLO of 200 ms on response time, a configuration changing response time from 170 ms to 190 ms is considered more problematic than one causing a change from 100 ms to 120 ms. The first one is considered by Akamas as corresponding to a gray area that should not be explored.
The safety factor is also used when starting the study in order to validate the behavior of the baseline to identify the safety of exploring configurations close to the baseline. If the baseline presents some constraint violations, then even exploring configurations close to the baseline might cause a risk. If Akamas identifies that, in the baseline configuration, more than (safety_factor*number_of_trials) trials manifest constraint violations, then the optimization is stopped.
If your baseline has some trials failing constraint validation, we suggest you analyze them before proceeding with the optimization.
The safety factor is set by default to 0.5 and can be customized for each live optimization individually and changed while live optimizations are running.
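The baseline validation rule described above can be expressed as a one-line check. The function name and the input layout (one boolean per baseline trial) are assumptions for illustration; the threshold and the default value are the ones stated in the text.

```python
from typing import Sequence


def baseline_stops_optimization(trial_violations: Sequence[bool],
                                safety_factor: float = 0.5) -> bool:
    """Return True if the baseline has too many trials violating constraints.

    trial_violations holds one flag per baseline trial (True = violation).
    The optimization stops when the number of violating trials exceeds
    safety_factor * number_of_trials.
    """
    number_of_trials = len(trial_violations)
    return sum(trial_violations) > safety_factor * number_of_trials
```

With the default safety factor of 0.5 and four baseline trials, the optimization stops only when at least three trials violate the constraints.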
It is also worth mentioning that Akamas features an outlier detection capability to compensate for production environments typically being noisy and much less stable than staging environments, thus displaying highly fluctuating performance metrics. As a consequence, constraints may fail from time to time, even for perfectly good configurations. This may be due to a variety of causes, such as shared infrastructure on the cloud, slowness of external systems, etc.
A final step in defining an optimization study is to specify the sequence of steps executed while running the study.
The following four types of steps are available:
Baseline: performs an experiment and sets it as a baseline for all the other ones
Bootstrap: imports experiments from other studies
Preset: performs an experiment with a specific configuration
Optimize: performs experiments and generates optimized configurations
Please notice that at least one baseline step is always required in any optimization study. This applies not only to offline optimization studies, but also to live optimization studies as it is being used to suggest changes to parameter values starting from the default values.
The following sections provide some best practices on how to best approach the step of defining the baseline step.
Ensure the baseline configuration is correct
In an optimization study, the baseline is an important experiment as it represents the system performance with the current configuration, and serves as a reference to assess the relative improvements the optimization achieved.
Therefore, it is important to make sure the baseline configuration of the study correctly reflects the current configuration - be it the vendor default or the result of a manual tuning exercise.
Evaluate which parameters to include in the baseline configuration
When defining the study baseline configuration it is important to evaluate which parameters to include. Indeed, several technologies have default values assigned to most of their configuration parameters. However, the runtime behavior can be different depending on whether the parameter is set to the default value or it is not set at all.
Therefore, it is recommended to review the current configuration (e.g. the one in place in production) and identify which parameters and values have been set (e.g. JVM maxHeapSize = 2GB, gcType = Parallel, etc.), and then to only set those parameters with their corresponding values, without adding any other parameters. This ensures that the specified baseline is consistent with the real production setup.
For a live optimization study, it is required to specify which component metrics represent the different workloads observed on the target system. A workload could be represented by either a metric directly measuring that workload, such as the application throughput, or a proxy metric, such as the percentage of reads and writes in your database.
The page of the section in the reference guide describes how to define the corresponding structure.
Akamas features automatic detection of workload contexts, corresponding to different patterns for the same workload. For example, workload context could correspond to the peak or idle load, or to the weekend or weekday traffic. This allows Akamas to recommend safe configurations based on the observed behavior of the system under similar workload conditions.
Akamas provides several parameters governing how the Akamas optimizer operates and leverages the workload information while a live optimization study is being executed. The most important parameter is the online mode (see ) as it relates to whether the human user is part of the approval loop when the Akamas AI recommends a configuration to be applied.
Moreover, Akamas also provides customizable safety policies that drive the Akamas optimizer in evaluating candidate configurations with respect to defined goal constraints.
Live optimizations can operate in one of the following online modes:
recommendation (or manual) mode (the default mode): Akamas does not immediately apply a configuration identified by Akamas AI: a new configuration is first recommended to the user, who needs to approve it, possibly after modifying it, before it gets applied - this is also referred to as the human-in-the-loop scenario;
fully autonomous (or automatic) mode: new configurations are immediately applied by Akamas as soon as they are generated by the Akamas AI, without being first recommended to (and approved by) the user.
It is worth noticing that under a recommendation mode, there might be a significant delay between the time a configuration is identified by Akamas and the time the recommended changes get applied. Therefore, the Akamas AI leverages the workload information differently when looking for a new configuration, depending on the defined online mode:
in the recommendation mode, Akamas takes into account all the defined workloads and looks for the configuration that best satisfies the goal constraints for all the observed workloads and provides the best improvements for all of them
in the fully autonomous mode, Akamas works on a single workload at each iteration (based on a customizable workload strategy - see below) and looks for an optimized configuration for that specific workload to be immediately applied in the next iteration, even if it might not be the best for the different workloads
The online mode can be specified at the study level and can also be overridden at the step level (only for steps of type "optimize" - see section ). The page of the section in the reference guide describes how to define the corresponding structure. This can be done either from the Akamas command line (see page ) or from the Akamas UI (see the following figure).
Notice that the online mode can be changed at any time, that is while the optimization study is running, to become immediately effective. For example, a live optimization could initially operate in recommendation mode and then be changed to fully autonomous mode afterward.
Since an offline optimization study lasts for at most the number of configured experiments and typically runs in a test or pre-production environment, results can safely be analyzed after the study has completely finished.
However, it is a good practice to analyze partial results while the study is still running as this may provide useful insights about both the system being optimized (e.g. understanding of the system dynamics and sub-optimal configurations that could be immediately applied) and about the optimization study itself (e.g. how to re-design a workflow or change constraints), early-on.
The Akamas UI displays the results of an offline optimization study in different visual areas:
the Best Configuration section provides the optimal configuration identified by Akamas, as a list of recommended values for the optimization parameters compared to the baseline and ranked according to their relevance;
the Progress tab (see the following figures) displays the progression of the study with respect to the study steps, the status of each experiment (and trial), its associated score, and the parameter values of the corresponding configurations; this area is mostly used for study monitoring (e.g. identifying failing workflows) and troubleshooting purposes;
the Analysis tab (see the following figures) displays how the baseline and experiments score with respect to the optimization goal, and the values of metrics and parameters for the corresponding configurations; this area supports the analysis of the different configurations;
the Metrics tab (see the following figure) displays the behavior of the metrics for all executed experiments (and trials); this area supports both study validation activities and deeper analysis of the system behavior;
Once all the preparatory steps for creating a study are done, running a study is straightforward: An optimization study can be started from either the Akamas UI (see the following figures) or the command line (refer to the page).
Before actually running an optimization study, it is highly recommended to read the following sections:
or
This can be useful for multiple reasons, including the case of an error (e.g. a misconfigured workflow) that requires "restarting" the study.
For live optimization studies, it is possible to stop a study and restart it. However, please notice that this is an irreversible action, that would delete all the executed experiments, so basically, restarting a live study means starting it from scratch.
The page in the section in the reference guide describes how to define the corresponding structures for each of the different types of steps allowed by Akamas. For offline optimization studies only, the Akamas UI allows the optimization steps to be defined as part of the visual procedure activated by the "Create a study" button (see the following figure).
In addition to the best practices here below, please refer to the section for a number of examples related to a variety of technologies and the guide for real-world examples.
the Insights section (see the following figure) displays any suboptimal configurations that have been identified for the study KPIs, and also allows making comparisons among them and the best configuration - the page describes in further detail the Insight section and the insights tags displayed in other areas of the Akamas UI.
Once started, managing studies is different for offline optimization studies (see ) and live optimization studies (see ).
Notice that once an offline optimization study has started, it can only be stopped or left to finish: it cannot be restarted. However, it is possible to reuse in a new study the experiments executed in another finished study (whether it finished successfully or not) - this is called bootstrapping and is illustrated by the following figure (also refer to the page on the reference page).
The following best practices should be considered before applying a configuration identified by an offline optimization study from a test or pre-production environment to a production environment.
Most of these best practices are general and refer to any configuration change and application rollout, not only to Akamas-related scenarios.
Any configuration identified by Akamas in a test or pre-production environment, by executing a number of experiments and trials in a limited timeframe, should first be validated in its ability to consistently deliver the expected performance over time before being promoted to production.
An endurance test typically lasts for several hours and can either mimic the specific load profile of production environments (e.g. morning peaks or low load phases during the night) or a simple constant high load (flat load). A specific Akamas study can be implemented for this purpose.
When applying a new configuration to a production environment, it is important to reduce the risk of severely impacting the supported services and to allow time to backtrack if required.
With a gradual rollout approach, a new configuration is applied to only a subset of the target system, so that the system can be observed for a period of time without impacting the entire system.
Several strategies are possible, including:
Canary deployment, where a small percentage of the traffic is served by the instance with the new configuration;
Shadow traffic, where traffic is mirrored and redirected to the instance with the new configuration, and its responses do not impact the user.
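As an illustration of the canary strategy, the following Python sketch assigns a fixed percentage of requests to the instance with the new configuration. It is only a minimal example under stated assumptions: the function name, the use of a request key (e.g. a user ID), and the hash-based bucketing are hypothetical choices and are not part of Akamas; in practice the split is usually handled by a load balancer or service mesh.

```python
import hashlib


def routes_to_canary(request_key: str, canary_percent: int) -> bool:
    """Return True if the request should be served by the canary instance.

    The key (hypothetical, e.g. a user ID) is hashed into one of 100
    buckets, so the same key is always routed to the same instance.
    """
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < canary_percent


# Example: send roughly 10% of users to the new configuration.
canary_users = [f"user-{i}" for i in range(20) if routes_to_canary(f"user-{i}", 10)]
```

Keeping the assignment deterministic makes it easier to compare the behavior of the two groups over the observation period.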
In the case of an application sharing entire layers or single components (e.g. microservices) with other applications, it is important to assess in advance the potential impact on other applications before applying a configuration identified by only considering SLOs related to a single application.
The following general considerations may help in assessing the impact on the infrastructure:
If the new configuration is more efficient (i.e. it is less demanding in terms of resources) or it does not require changes to resource requirements (e.g. it does not change K8s requests and limits), then the configuration can be expected to be beneficial, as resources will be freed and become available for additional applications;
If the new configuration is less efficient (i.e. it requires more resources), then it should be checked whether the additional capacity is available in the infrastructure (e.g. in the K8s cluster or namespace), as is done when allocating new applications.
As far as the other applications are concerned:
Just reducing the operational cost of a service does not have any impact on other applications that are calling or using the service;
While tuning a service for performance may cause back-pressure fatigue in the caller system, this is not the typical behavior of enterprise systems, where the most susceptible systems are on the backend side:
Tuning most external services will not increase the throughput much, as throughput is typically business-driven, thus the risk of overwhelming the backends is low;
Tuning the backends allows the caller systems to handle faster connections, thus reducing the memory footprint and increasing the resilience of the entire system;
Especially in the case of highly distributed systems, such as microservices, the number of in-flight packages for a given period of time is something to be minimized;
A latency reduction for a microservice implies fewer in-flight packages throughout the system, leading to better performance, faster failures, and fewer pending transactions to be rolled back in case of incidents.
When starting a new JVM optimization, the following is a list of recommended parameters to include in your study:
jvm_gcType
jvm_maxHeapSize
jvm_newSize
jvm_survivorRatio
jvm_maxTenuringThreshold
jvm_parallelGCThreads
jvm_concurrentGCThreads
When starting a new JVM optimization, the following is a list of recommended parameters to include in your study:
jvm_gcType
jvm_maxHeapSize
jvm_newSize
jvm_minHeapSize
jvm_activeProcessorCount
jvm_survivorRatio
jvm_maxTenuringThreshold
jvm_parallelGCThreads
jvm_concurrentGCThreads
In this section, some guidelines on how to choose optimization parameters are provided for the following specific technologies:
These guidelines also provide an example of how to approach the selection of parameters (and how to define the associated domains and constraints) in an optimization study.
While the main result of an optimization study is to identify the optimal configuration with respect to the defined goal & constraints, any suboptimal configuration that improves on one of the defined KPIs can also be very valuable.
These configurations are displayed in a dedicated section of the Akamas UI and also displayed in other areas of the Akamas UI as textual badges "Best <KPI name>" referred to as (insights) tags.
The following figures show the Insights section displayed on the study page and the Insights pages that can be drilled down to.
The following figure shows the insights tags in the Analysis tab:
Please notice that "Best", "Best Memory Limit" and any other KPI-related tags are displayed in the Akamas UI while the study progresses and thus may be reassigned as new experiments get executed and their configurations are scored and provide their results for the defined study KPIs. See
After starting a study, any finished experiment is labeled by one or more insights tags "Best <KPI name>" in case the corresponding configuration provides the best result so far for those KPIs. Notice that for experiments involving multiple trials, tags are only assigned after all their trials have finished.
Of course, after the very first experiment (i.e. a baseline) finishes, all tags are assigned to the corresponding configuration. This is displayed by the following figure for a study with the KPIs named CPU (with formula renaissance.cpu_used and direction minimize) and MEM (with formula renaissance.mem_used and direction minimize):
When the following experiments finish, tags are reevaluated with respect to the computed goal score and the results achieved for each single KPI. In this study, experiment #2 provided a better result for both the CPU KPI and the study goal, so it got both the tags Best CPU and Best renaissance.response_time (which is defined as the goal of the study). Notice that the blue star is displayed by Akamas (except for the baseline) to highlight that the tag was automatically generated by Akamas and not assigned by a user.
Afterward, experiment #3 got the tag as the best configuration, while experiment #4 got the tag Best CPU as improving on experiment #2. Therefore, two configurations displayed the blue star.
A number of experiments later, experiment #7 provided better memory usage than the baseline, so it got the tag Best MEM assigned. At this point, three configurations have the blue star, which makes evident that there are tradeoffs when trying to optimize with respect to the goal and the KPIs.
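The way tags move between experiments in the walkthrough above can be sketched with a few lines of Python. This is only an illustration of the reassignment logic as described on this page, not the Akamas implementation: the class and function names are hypothetical, the numeric results are invented to mirror the narrative, and the goal is assumed to be minimized.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str       # e.g. "CPU"
    direction: str  # "minimize" or "maximize"


def is_better(candidate: float, best: float, direction: str) -> bool:
    """Return True when candidate strictly improves on best."""
    return candidate < best if direction == "minimize" else candidate > best


def assign_best_tags(kpis, experiments):
    """Replay finished experiments in order and return {tag: experiment id}.

    `experiments` is a list of (experiment_id, {kpi_name: value}) pairs.
    A tag moves to a later experiment only on a strict improvement.
    """
    holders = {}  # tag -> id of the experiment currently holding it
    best = {}     # KPI name -> best value seen so far
    for exp_id, results in experiments:
        for kpi in kpis:
            value = results[kpi.name]
            if kpi.name not in best or is_better(value, best[kpi.name], kpi.direction):
                best[kpi.name] = value
                holders[f"Best {kpi.name}"] = exp_id
    return holders


# Invented values; experiment 1 is the baseline and gets every tag at first.
KPIS = [
    Kpi("CPU", "minimize"),
    Kpi("MEM", "minimize"),
    Kpi("renaissance.response_time", "minimize"),  # the study goal (assumed direction)
]
EXPERIMENTS = [
    (1, {"CPU": 2.0, "MEM": 1000, "renaissance.response_time": 100}),
    (2, {"CPU": 1.8, "MEM": 1100, "renaissance.response_time": 90}),
    (3, {"CPU": 1.9, "MEM": 1200, "renaissance.response_time": 80}),
    (4, {"CPU": 1.6, "MEM": 1150, "renaissance.response_time": 85}),
    (7, {"CPU": 1.7, "MEM": 900, "renaissance.response_time": 95}),
]
tags = assign_best_tags(KPIS, EXPERIMENTS)
```

With these invented values, the replay ends with the goal tag on experiment 3, Best CPU on experiment 4, and Best MEM on experiment 7, which is the three-way tradeoff described above.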
Even for live optimization studies, it is a good practice to analyze how the optimization is being executed with respect to the defined goal & constraints, and workloads.
This analysis may provide useful insights about the system being optimized (e.g. understanding of the system dynamics) and about the optimization study itself (e.g. how to adjust optimizer options or change constraints). Since this is more challenging for an environment that is being optimized live, a common practice is to adopt a recommendation mode before possibly switching to a fully autonomous mode.
The Akamas UI displays the results of an offline optimization study in the following areas:
the Metrics section (see the following figures) displays the behavior of the metrics as configurations are recommended and applied (possibly after being reviewed and approved by users); this area supports the analysis of how the optimizer is driven by the configured safety and exploration factors.
The All Configurations section provides the list of all the recommended configurations, possibly as modified by the user, as well as the detail of each applied configuration (see the following figures).
in the case of a recommendation mode, the Pending configuration section (see the following figure) shows the configuration that is being recommended to allow users to review it (see the EDIT toggle) and approve it:
j9vm_minFreeHeap
j9vm_maxFreeHeap
j9vm_minHeapSize
j9vm_maxHeapSize
j9vm_gcCompact
j9vm_gcThreads
j9vm_gcPolicy
j9vm_codeCacheTotal
j9vm_compilationThreads
j9vm_aggressiveOpts
The following describes how to approach tuning JVM in the following areas:
The most relevant JVM parameters are the ones defining the boundaries of the allocated heap (j9vm_minHeapSize, j9vm_maxHeapSize). The upper bound to configure for this domain strongly depends on the memory in megabytes available on the host instance or on how much we are willing to allocate, while the lower bound depends on the minimum requirements to run the application.
The free heap parameters (j9vm_minFreeHeap, j9vm_maxFreeHeap) define the boundaries for the free space target ratio, which impacts the trigger thresholds of the garbage collector. The suggested starting ranges are from 0.1 to 0.6 for the minimum free ratio, and from 0.3 to 0.9 for the maximum.
The following represents a sample snippet of the section parametersSelection in the study definition:
It is also recommended to define the following constraints:
min heap size lower than or equal to the max heap size:
upper bound to be at least 5 percentage points higher than the lower bound
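The two constraints above can be checked on a candidate configuration with a short Python sketch. This is not the Akamas constraint syntax; it only illustrates the arithmetic. It assumes heap sizes are expressed in megabytes, that the free heap parameters are ratios between 0 and 1 (as in the suggested ranges), and that the second constraint compares j9vm_maxFreeHeap with j9vm_minFreeHeap. The function name is hypothetical.

```python
def satisfies_heap_constraints(config: dict) -> bool:
    """Check the two recommended constraints on a candidate configuration."""
    # Constraint 1: min heap size lower than or equal to max heap size.
    heap_ok = config["j9vm_minHeapSize"] <= config["j9vm_maxHeapSize"]

    # Constraint 2: the max free ratio is at least 5 percentage points above
    # the min free ratio. Rounding avoids floating-point noise at the boundary.
    gap_points = round((config["j9vm_maxFreeHeap"] - config["j9vm_minFreeHeap"]) * 100, 6)
    free_ok = gap_points >= 5

    return heap_ok and free_ok


candidate = {
    "j9vm_minHeapSize": 256,
    "j9vm_maxHeapSize": 1024,
    "j9vm_minFreeHeap": 0.3,
    "j9vm_maxFreeHeap": 0.6,
}
is_valid = satisfies_heap_constraints(candidate)
```

Defining such constraints in the study prevents the optimizer from spending experiments on configurations that the JVM would reject or that make no sense.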
The following JVM parameters define the behavior of the garbage collector:
j9vm_gcPolicy
j9vm_gcThreads
j9vm_gcCompact
The garbage collection policy (j9vm_gcPolicy) defines the collection strategy used by the JVM. This parameter is key for the performance of the application: the default garbage collector (gencon) is the best solution for most scenarios, but some specific kinds of applications may benefit from one of the alternative options.
The number of GC threads (j9vm_gcThreads) defines the level of parallelism available to the collector. This value can range from 1 to the maximum number of CPUs that are available or that we are willing to allocate.
The GC compaction (j9vm_gcCompact) selects whether garbage collections perform full compactions always, never, or based on internal policies.
The following represents a sample snippet of the section parametersSelection in the study definition:
The following JVM parameters define the behaviors of the compilation:
j9vm_compilationThreads
j9vm_codeCacheTotal
The compilation threads parameter (j9vm_compilationThreads) defines the number of threads available to the JIT compiler. Its range depends on the available CPUs.
The code cache parameter (j9vm_codeCacheTotal) defines the maximum size limit for the JIT code cache. Higher values may benefit complex server-type applications at the expense of the memory footprint, so this parameter should be taken into account in the overall memory requirements.
The following represents a sample snippet of the section parametersSelection in the study definition:
The following JVM parameter defines the behavior of aggressive optimization:
j9vm_aggressiveOpts
The aggressive optimizations parameter (j9vm_aggressiveOpts) enables some experimental features that usually lead to performance gains.
The following represents a sample snippet of the section parametersSelection in the study definition:
When optimizing Linux systems, typically the goal is to allow cost savings or improve performance and the quality of service, such as sustaining higher levels of traffic or enabling transactions with lower latency.
Please refer to the for the list of component types, parameters, metrics, and constraints.
Akamas provides the as the preferred way to apply Linux parameters to a system to be optimized. The operator connects via SSH to your Linux components and will employ different strategies to apply Linux parameters. Notice that this operator allows you to exclude some block/network devices from being configured.
You can organize a typical workflow to optimize Linux in three parts:
Configure Linux
Use the to apply configuration parameters to the operating system, no restart is required
Test the performance of the system
Use to execute a performance test against the system
Perform some cleanup
Use to perform any clean-up to guarantee any subsequent execution of the workflow will run without problems
Here’s an example of a typical workflow for a Linux system:
This section provides some guidelines on how to define optimization studies by means of a few examples related to single-technology/layer systems, in particular on how to define workflows and telemetry providers.
More complex real-world examples are provided by the guide.
This page provides a list of best practices when optimizing an Oracle database with Akamas.
This section provides some guidelines on the most relevant memory-related parameters and how to configure them to perform a high-level optimization of a generic Oracle Database instance.
Oracle DBAs can choose, depending on their needs or expertise, the desired level of granularity when configuring the memory allocated to the database areas and components, and let the Oracle instance automatically manage the lower layers. In the same way, Akamas can tune a target instance with different levels of granularity.
In particular, we can configure an Akamas study so that it simply tunes the overall memory of the instance, letting Oracle automatically manage how to allocate it between shared memory (SGA) and program memory (PGA); alternatively, we can tune the target values of both of these areas and let Oracle take care of their components, or go even deeper and have total control of the sizing of every single component.
Notice: running the queries in this guide requires a user with the ALTER SYSTEM, SELECT ON V_$PARAMETER, and SELECT ON V_$OSSTAT privileges.
Also notice that to define the domain of some of the parameters you need to know the physical memory of the instance. You can find the value in MiB by running the query select round(value/1024/1024)||'M' "physical_memory" from v$osstat where stat_name='PHYSICAL_MEMORY_BYTES'. Otherwise, if you have access to the underlying machine, you can run the bash command free -m.
This is the simplest of the memory-optimization sets of parameters, where the study configures only the overall memory available for the instance and lets Oracle’s Automatic Memory Management (AMM) dynamically assign the space to the SGA and PGA. This is useful for simple studies where you want to minimize the overall used memory, usually coupled with constraints to make sure the performance of the overall system remains within acceptable values.
memory_target: this parameter specifies the total memory used by the Oracle instance.
When AMM is enabled, you can find the default value with the query select display_value "memory_target" from v$parameter where name='memory_target'. Otherwise, you can get an estimate by summing the configured SGA size, found by running select display_value "sga_target" from v$parameter where name LIKE 'sga_target', and the size of the PGA, found with select ceil(value/1024/1024)||'M' "physical_memory" from v$pgastat where name='maximum PGA allocated'.
The explored domain strongly depends on your application and hardware, but an acceptable range goes from 152M (the minimum configurable value) to the physical size of your instance. Over time, Akamas will automatically learn to avoid configurations with insufficient memory.
To configure Automatic Memory Management you also need to make sure that the parameters sga_target and pga_aggregate_limit are set to 0, either by configuring them among the default values of a study or by manually running the configuration queries.
The following snippet shows the parameter selection to tune the total memory of the instance. The domain is configured to go from the minimum value to the maximum physical memory (7609M in our example).
With the following set of parameters, Akamas tunes the individual sizes of the SGA and PGA, letting Oracle’s Automatic Shared Memory Management (ASMM) dynamically size their underlying SGA components. You can leverage these parameters for studies where, like the previous scenario, you want to find the configuration with the lowest memory allocation that still performs within your SLOs. Another possible scenario is to find the balance in the allocation of the memory available that best fits your optimization goals.
sga_target: this parameter specifies the target SGA size.
When ASMM is configured, you can find the default value with the query select display_value "sga_target" from v$parameter where name='sga_target'.
The explored domain strongly depends on your application and hardware, but an acceptable range goes from 64M (the minimum configurable value) to the physical size of your instance minus a reasonable size for the PGA (usually up to 80% of physical memory).
pga_aggregate_target: this parameter specifies the target PGA size.
You can find the default value with the query select display_value "pga_aggregate_target" from v$parameter where name='pga_aggregate_target'.
The explored domain strongly depends on your application and hardware, but an acceptable range goes from 10M (the minimum configurable value) to the physical size of your instance minus a reasonable size for the SGA.
To tune the SGA and PGA, you must also set memory_target to 0 to disable AMM, either by configuring it among the default values of a study or by manually running the configuration queries.
ASMM will dynamically tune all the SGA components whose size is not specified, so set to 0 all the parameters (db_cache_size, log_buffer, java_pool_size, large_pool_size, shared_pool_size, and streams_pool_size) unless you have any specific requirements.
The following snippet shows the parameter selection to tune both SGA and PGA sizes. Each parameter is configured to go from the minimum value to 90% of the maximum physical memory (6848M in our example), allowing Akamas to explore all the possible ways to partition the space between the two areas and find the best configuration for our use case:
The following code snippet forces Akamas to explore configuration spaces where the total memory, expressed in MiB, does not exceed the total memory available (7609M in our example). This speeds up the optimization by avoiding configurations that won’t work correctly.
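The arithmetic behind this constraint and behind the 90% upper bound can be sketched in Python as follows. This is not the Akamas constraint syntax, only an illustration of the check it expresses; the helper names are hypothetical, and sizes are assumed to use the M or G suffix.

```python
def parse_mib(size: str) -> int:
    """Convert a size such as '7609M' or '2G' to MiB."""
    units = {"M": 1, "G": 1024}
    return int(size[:-1]) * units[size[-1].upper()]


def fits_in_memory(sga_target: str, pga_aggregate_target: str, physical: str) -> bool:
    """Return True if SGA plus PGA does not exceed the physical memory."""
    total = parse_mib(sga_target) + parse_mib(pga_aggregate_target)
    return total <= parse_mib(physical)


PHYSICAL = "7609M"  # physical memory of the example instance

# Upper bound of each domain: 90% of the physical memory, rounded down.
upper_bound_mib = parse_mib(PHYSICAL) * 90 // 100

# Each parameter alone may reach the upper bound, but not both at once.
ok = fits_in_memory("4096M", "2048M", PHYSICAL)
too_large = fits_in_memory("6848M", "6848M", PHYSICAL)
```

The second case shows why the constraint is needed: both values are inside their own domains, yet together they would exceed the memory of the instance.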
With the following set of parameters, Akamas tunes the space allocated to one or more of the components that make up the System Global Area, along with the size of the Program Global Area. This scenario is useful for studies where you want to find the memory distribution that best fits your optimization goals.
pga_aggregate_target: this parameter specifies the size of the PGA.
You can find the default value with the query select display_value "pga_aggregate_target" from v$parameter where name='pga_aggregate_target'.
The explored domain strongly depends on your application and hardware, but an acceptable range goes from 10M (the minimum configurable value) to the physical size of your instance.
db_cache_size: this parameter specifies the size of the default buffer pool.
You can find the default value with the query select * from v$sgainfo where name='Buffer Cache Size'.
log_buffer: this parameter specifies the size of the log buffer.
You can find the default value with the query select * from v$sgainfo where name='Redo Buffers'.
java_pool_size: this parameter specifies the size of the Java pool.
You can find the default value with the query select * from v$sgainfo where name='Java Pool Size'.
large_pool_size: this parameter specifies the size of the large pool.
You can find the default value with the query select * from v$sgainfo where name='Large Pool Size'.
streams_pool_size: this parameter specifies the size of the streams pool.
You can find the default value with the query select * from v$sgainfo where name='Streams Pool Size'.
shared_pool_size: this parameter specifies the size of the shared pool.
You can find the default value with the query select * from v$sgainfo where name='Shared Pool Size'.
The explored domains of the SGA components strongly depend on your application and hardware; one approach is to scale the baseline value both up and down by a reasonable factor to define the domain boundaries (e.g. from 20% to 500% of the baseline).
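The scaling approach can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, the baseline values are invented, and the 20% and 500% factors are the example values mentioned above rather than a recommendation for every system.

```python
def scaled_domain(baseline_mib: int, low_pct: int = 20, high_pct: int = 500):
    """Return (lower, upper) domain bounds in MiB around a baseline value.

    Integer arithmetic is used so that the bounds are exact whole MiB.
    """
    lower = max(1, baseline_mib * low_pct // 100)
    upper = baseline_mib * high_pct // 100
    return lower, upper


# Invented baseline sizes in MiB, e.g. as read from v$sgainfo.
baselines = {"db_cache_size": 1000, "shared_pool_size": 400, "large_pool_size": 64}
domains = {name: scaled_domain(size) for name, size in baselines.items()}
```

The resulting bounds should still be reviewed against the physical memory of the instance, since scaling every component up at once may exceed it.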
To tune all the components, set both the memory_target and sga_target parameters to 0, either by configuring them among the default values of a study or by manually running the configuration queries.
Notice: if your system leverages non-standard block-size buffers you should also consider tuning the db_Nk_cache_size parameters.
The following snippet shows the parameter selection to tune the size of the PGA and the SGA components. The PGA parameter is configured to go from the minimum value to 90% of the maximum physical memory (6848M in our example), while the domains for the SGA components are configured by scaling their default value by approximately a factor of 10. Along with the constraint defined below, these domains give Akamas great flexibility while exploring how to distribute the available memory space:
The following code snippet forces Akamas to explore configuration spaces where the total memory, expressed in MiB, does not exceed the total memory available (7609M in our example).
You should also add to the equation any db_Nk_cache_size tuned in the study.
When optimizing Java applications based on OpenJDK, typically the goal is to tune the JVM from both the point of view of cost savings and quality of service.
Please refer to the for the list of component types, parameters, metrics, and constraints.
Akamas offers many operators that you can use to apply the parameters for the tuned JVM. In particular, it is suggested to use the to create a configuration file or inject the arguments directly in the command string using a template.
The following is an example of a templatized execution string:
A typical workflow to optimize a Java application can be structured in two parts:
Configure the Java arguments
Generate a configuration file or a command string containing the selected JVM parameters using a .
Run the Java application
Use available to execute a performance test against the application.
Here’s an example of a typical workflow where Akamas executes the script containing the command string generated by the file configurator:
Here’s a configuration example for a telemetry provider instance that uses Prometheus to extract all the JMX metrics defined in this optimization pack:
where the configuration of the monitored component provides the additional references as in the following snippet:
Akamas does not provide any specialized telemetry solution to gather Linux metrics as these metrics can be collected in a variety of ways, leveraging a plethora of existing solutions. For example, the supports Linux system metrics.
Akamas can access JMX metrics using the . This provider comes out of the box with a set of default queries to interrogate a Prometheus instance configured to fetch data from a .
See this for an example of a study leveraging the Java OpenJDK pack.
pg_max_connections: keep its value under 1000 connections.
pg_effective_cache_size: 75% of the physical memory available to PostgreSQL.
pg_maintenance_work_mem: 12% of the physical memory available to PostgreSQL.
pg_max_wal_senders: twice the maximum number of replicas you expect to have.
pg_max_parallel_workers: the number of cores divided by 2.
pg_shared_buffers: 25% of the physical memory available to PostgreSQL.
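The rules of thumb above can be turned into starting values with a short Python sketch. This is only an illustration: the function name is hypothetical, memory is assumed to be expressed in MiB, and the example host (16 GiB, 8 cores, 2 expected replicas) is invented.

```python
MAX_CONNECTIONS_CEILING = 1000  # pg_max_connections should stay under this value


def postgres_starting_values(memory_mib: int, cores: int, expected_replicas: int) -> dict:
    """Derive starting values from the memory and cores available to PostgreSQL."""
    return {
        "pg_effective_cache_size": memory_mib * 75 // 100,  # 75% of memory
        "pg_maintenance_work_mem": memory_mib * 12 // 100,  # 12% of memory
        "pg_shared_buffers": memory_mib * 25 // 100,        # 25% of memory
        "pg_max_wal_senders": expected_replicas * 2,        # replicas, doubled
        "pg_max_parallel_workers": max(1, cores // 2),      # half the cores
    }


values = postgres_starting_values(memory_mib=16384, cores=8, expected_replicas=2)
```

These values are a reasonable center for the parameter domains in a study; the optimizer can then explore around them.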
When optimizing Java applications based on OpenJ9, typically the goal is to tune the JVM from both the point of view of cost savings and quality of service.
Please refer to the OpenJ9 optimization pack for the list of component types, parameters, metrics, and constraints.
Akamas offers many operators that you can use to apply the parameters for the tuned JVM. In particular, it is suggested to leverage the FileConfigurator Operator to create a configuration file or inject the arguments directly in the command string using a template.
The following is an example of a templatized execution string:
A typical workflow to optimize a Java application can be structured in two parts:
Configure the Java arguments
Generate a configuration file or a command string containing the selected JVM parameters using a FileConfigurator Operator.
Run the Java application
Use available operators to execute a performance test against the application.
Here’s an example of a typical workflow where Akamas executes the script containing the command string generated by the file configurator:
Akamas can access JMX metrics using the Prometheus provider. This provider comes out of the box with a set of default queries to interrogate a Prometheus instance configured to fetch data from a JMX Exporter.
Here’s a configuration example for a telemetry provider instance that uses Prometheus to extract all the JMX metrics defined in this optimization pack:
where the configuration of the monitored component provides the additional references as in the following snippet:
See this page for an example of a study leveraging the Eclipse OpenJ9 pack.
When optimizing a MySQL instance, typically the goal is one of the following:
Throughput optimization: increasing the capacity of a MySQL deployment to serve clients
Cost optimization: decreasing the size of a MySQL deployment while guaranteeing the same service level
Please refer to the MySQL optimization pack for the list of component types, parameters, metrics, and constraints.
Usually, MySQL parameters are configured by writing them in the MySQL configuration file, typically called my.cnf and located under /etc/mysql/ on most Linux systems.
In order to preserve the original config file intact, it is best practice to use additional configuration files, located in /etc/mysql/conf.d, to override the default parameters. These files are automatically read by MySQL.
FileConfigurator and Executor operator
You can leverage the FileConfigurator operator by creating a template file on a remote host that contains some scripts to configure MySQL, with placeholders that will be replaced with the values of the parameters tuned by Akamas. Once all the placeholders have been replaced by the FileConfigurator, the Executor operator can be used to actually execute the script to configure and restart the database.
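The idea of a template with placeholders can be illustrated with Python's standard string.Template class. This sketch does not use the FileConfigurator placeholder syntax, which is described in the operator documentation; the ${...} notation here is Python's, and the dictionary keys and values are hypothetical. The two settings shown are standard MySQL system variables.

```python
from string import Template

# A template for an override file, e.g. one placed under /etc/mysql/conf.d.
TEMPLATE = Template(
    "[mysqld]\n"
    "innodb_buffer_pool_size = ${innodb_buffer_pool_size}\n"
    "max_connections = ${max_connections}\n"
)


def render_config(parameters: dict) -> str:
    """Replace every placeholder with the value chosen for this experiment.

    substitute() raises KeyError when a placeholder has no value, so an
    incomplete configuration is never written silently.
    """
    return TEMPLATE.substitute(parameters)


rendered = render_config({"innodb_buffer_pool_size": "2G", "max_connections": 200})
```

Failing on a missing placeholder is a deliberate choice in this sketch: a partially rendered file could start the database with unintended defaults and invalidate the experiment.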
A typical workflow to optimize a MySQL deployment can be structured in four parts:
Configure MySQL
Use the FileConfigurator operator to specify an input and an output template file. The input template file is used to specify how to interpolate MySQL parameters into a configuration file, and the output file is used to contain the result of the interpolation.
Restart MySQL
Use the Executor operator to restart MySQL allowing it to load the new configuration file produced in the previous step.
Optionally, use the Executor operator to verify that the application is up and running and has finished any initialization logic.
Test the performance of the application
Use any of the workflow operators to perform a performance test against the application.
Prepare test results
Use any of the workflow operators to organize test results so that they can be imported into Akamas using the supported telemetry providers (see also section here below).
Finally, when running performance experiments on databases, it is common practice to do some cleanup tasks at the end of the test to restore the database's initial condition and avoid impacting subsequent tests.
Here’s an example of a typical workflow for MySQL, which uses the OLTP Resourcestresser benchmark to run performance tests
Akamas can access MySQL metrics using the Prometheus provider. This provider can be leveraged to query MySQL metrics collected by a Prometheus instance via the MySQL Prometheus exporter.
Here’s an example of a telemetry provider instance that uses Prometheus to extract all the MySQL metrics defined in this optimization pack:
This page and this page describe an example of how to leverage the MySQL optimization pack.
This page intends to provide some guidance in optimizing web applications. Please refer to the Web Application optimization pack for the list of component types, parameters, metrics, and constraints.
No specialized telemetry solution to gather Web Application metrics is included. The following providers however can integrate with the provided metrics:
CSV File Provider: this provider can be configured to ingest data points generated by any monitoring application able to export the data in CSV format.
integrations leveraging NeoLoad Web, LoadRunner Professional or LoadRunner Enterprise as a load generator can use this ad-hoc provider that comes out of the box and uses the metrics defined in this optimization pack.
The provided component type does not define any parameter. The workflow will optimize parameters defined in other component types representing the underlying technological stack.
A typical workflow to optimize a web application is structured in three parts:
Configure and restart the application
Use the FileConfigurator operator to interpolate the tuned parameters in the configuration files of the underlying stack.
Restart the application using an Executor operator.
Wait for the application to come up using the Sleep or Executor operator.
Run the test
Use any of the available operators to trigger the execution of the performance test against the application.
Perform the cleanup
Use any of the available operators to restore the application to the original state.
Here's an example workflow to perform a test on a Java web application using NeoLoad as a load generator:
See this page for an example of a study leveraging the Web Application pack.
When optimizing Kubernetes applications, typically the goal is to find the configuration that assigns resources to containerized applications so as to minimize waste and ensure the quality of service.
Please refer to the Kubernetes optimization pack for the list of component types, parameters, metrics, and constraints.
Akamas offers different operators to configure Kubernetes entities. In particular, you can use the FileConfigurator operator to update the definition file of a resource and apply it with the Executor operator.
The following example is the definition of a deployment, where the replicas and resources are templatized in order to work with the FileConfigurator:
A typical workflow to optimize a Kubernetes application is usually structured as follows:
Configure the Kubernetes artifacts: use the FileConfigurator operator to create the definition files starting from a template.
Apply the new parameters: apply the updated definitions using the Executor operator.
Wait for the application to be ready: run a custom script to wait until the rollout is complete.
Run the test: execute the benchmark.
Here’s an example of a typical workflow for a system:
Akamas can access Kubernetes metrics using the Prometheus provider. This provider comes out of the box with a set of default queries to interrogate a Prometheus instance configured to fetch data from cAdvisor and kube-state-metrics.
Here’s a configuration example for a telemetry provider instance that uses Prometheus to extract all the Kubernetes metrics defined in this optimization pack:
where the configuration of the monitored component provides the additional filters as in the following snippet:
Please keep in mind that some resources, such as pods belonging to deployments, require wildcards in order to match the auto-generated names.
See this page for an example of a study leveraging the Kubernetes pack.
When optimizing a MongoDB instance, typically the goal is one of the following:
Throughput optimization - increasing the capacity of a MongoDB deployment to serve clients
Cost optimization - decreasing the size of a MongoDB deployment while guaranteeing the same service level
To reach such goals, it is recommended to tune the parameters that manage the cache, which is one of the elements that impact performance the most, in particular those parameters that control the lifecycle and the size of MongoDB’s cache.
It is possible to evaluate the performance improvements of MongoDB by looking at the business application that uses it as its database (for example, at its end-to-end throughput or response time) or by using a performance test like YCSB. In addition, the optimization pack provides internal MongoDB metrics that shed light on how MongoDB is performing, in particular in terms of throughput, for example:
The number of documents inserted in the database per second
The number of active connections
Please refer to the MongoDB optimization pack for the list of component types, parameters, metrics, and constraints.
Akamas offers many operators that you can use to apply freshly tuned configuration parameters to your MongoDB deployment. In particular, we suggest using the FileConfigurator operator to create a configuration script file and the Executor operator to execute it and thus apply the parameters.
FileConfigurator and Executor operator
You can leverage the FileConfigurator by creating a template file on a remote host that contains some scripts to configure MongoDB with placeholders that will be replaced with the values of parameters tuned by Akamas.
Here’s an example of the aforementioned template file:
Once the FileConfigurator has replaced all the tokens, you can use the Executor operator to actually execute the script to configure MongoDB.
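To make the token replacement idea concrete, here is a minimal sketch of what rendering a template amounts to. It is not the FileConfigurator implementation: the ${component.parameter} placeholder syntax, the parameter names, and the script content are all assumptions made for illustration.

```python
import re

# Assumed placeholder syntax: ${component.parameter}
TOKEN = re.compile(r"\$\{([\w.]+)\}")


def render_template(template, values):
    """Replace every ${name} token with its value; fail on unknown tokens."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value provided for placeholder {name!r}")
        return str(values[name])
    return TOKEN.sub(substitute, template)


# Hypothetical template content and parameter values.
template = "cache_size_gb=${mongo.cache_size_gb}\neviction_target=${mongo.eviction_target}\n"
values = {"mongo.cache_size_gb": 2, "mongo.eviction_target": 80}
script = render_template(template, values)
```

Failing on an unknown token, rather than leaving it in place, avoids running a configuration script that still contains unresolved placeholders.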
A typical workflow to optimize a MongoDB deployment can be structured in four parts:
Configure MongoDB
Use the FileConfigurator to specify an input and an output template file. The input template file is used to specify how to interpolate MongoDB parameters into a script, and the output file contains the actual configuration.
Use the Executor operator to reconfigure MongoDB exploiting the output file produced in the previous step. You may need to restart MongoDB depending on the configuration parameters you want to optimize.
Test the performance of the application
Use available operators to execute a performance test against the application
Prepare test results (optional)
If Akamas does not already automatically import performance test metrics, then you can use available operators to extract test results and make them available to Akamas (for example, you can use an Executor to launch a script that produces a CSV of the test results that Akamas can consume using the CSV provider)
Cleanup
Use available operators to bring back MongoDB into a clean state to avoid impacting subsequent tests
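As a sketch of the optional "prepare test results" step, the script below serializes test results into a CSV layout with one timestamp column, one component column, and one column per metric. The TS and COMPONENT headers and the timestamp layout follow the CSV provider defaults; the metric names, the component name, and the sample values are hypothetical.

```python
import csv
import io
from datetime import datetime


def results_to_csv(samples, component):
    """Serialize (timestamp, metrics) samples as CSV text with TS and COMPONENT columns."""
    metric_names = sorted({name for _, metrics in samples for name in metrics})
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["TS", "COMPONENT"] + metric_names)
    for timestamp, metrics in samples:
        row = [timestamp.strftime("%Y-%m-%dT%H:%M:%S"), component]
        # Leave the cell empty when a sample lacks a metric.
        row += [metrics.get(name, "") for name in metric_names]
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical results of one test run.
samples = [(datetime(2024, 1, 1, 10, 0, 0), {"throughput": 1250.0, "response_time": 12.5})]
csv_text = results_to_csv(samples, component="mongo")
```

In a real workflow the text would be written to a file on a host that the CSV provider can reach.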
Finally, when running performance experiments on a database, it is common practice to execute some cleanup tasks at the end of the test to restore the database to its initial condition and avoid impacting subsequent tests.
Here’s an example of a typical workflow for a MongoDB deployment, which uses the YCSB benchmark to run performance tests:
Akamas offers many telemetry providers to extract MongoDB metrics; one of them is the Prometheus provider which we can use to query MongoDB metrics collected by a Prometheus instance via the MongoDB Prometheus exporter.
Here’s an example of a telemetry provider instance that uses Prometheus to extract all the MongoDB metrics defined in this optimization pack:
See the page Optimizing a MongoDB server instance for an example of a study leveraging the MongoDB pack.
When optimizing applications running on the Apache Spark framework, the goal is to find the configurations that best optimize the allocated resources or the execution time.
Please refer to the Spark optimization pack for the list of component types, parameters, metrics, and constraints.
Akamas offers several operators that you can use to apply the parameters for the tuned Spark application. In particular, we suggest using the Spark SSH Submit operator, which connects to a target instance to submit the application using the configuration parameters to test.
Other solutions include:
the Spark Livy Operator, which allows submitting the application along with the configuration parameters using the Livy REST interface
the standard Executor operator, which allows running a custom command or script once the FileConfigurator operator has updated the default Spark configuration file or a custom one using a template.
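For the Executor-based approach, a custom script could assemble the spark-submit command line itself. The following is a minimal sketch that passes each tuned parameter through the standard --conf option of spark-submit; the jar name, main class, and property values are hypothetical.

```python
import shlex


def spark_submit_command(app_jar, main_class, conf):
    """Build a spark-submit command line from a dict of Spark properties."""
    command = ["spark-submit", "--class", main_class]
    # Sort the properties so the generated command is reproducible.
    for key, value in sorted(conf.items()):
        command += ["--conf", f"{key}={value}"]
    command.append(app_jar)
    return shlex.join(command)


# Hypothetical application and configuration under test.
command_line = spark_submit_command(
    "app.jar",
    "com.example.Main",
    {"spark.executor.memory": "4g", "spark.executor.cores": 2},
)
```

Using shlex.join quotes any value that contains spaces or shell metacharacters, so the result can be pasted into a shell script safely.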
You can organize a typical workflow to optimize a Spark application in three parts:
Set up the test environment
prepare any required input data
apply the Spark configuration parameters, if you are going for a file-based solution
Execute the Spark application
Perform cleanup
Here’s an example of a typical workflow where Akamas executes the Spark application using the Spark SSH Submit operator:
Akamas can access Spark History Server statistics using the Spark History Server Provider. This provider maps the metrics in this optimization pack to the statistics provided by the History Server endpoint.
Here’s a configuration example for a telemetry provider instance:
See this page for an example of a study leveraging the Spark pack.
When optimizing an Oracle Database instance, typically the goal is to maximize the throughput of an Oracle-backed application or to minimize its resource consumption, thus reducing costs.
Please refer to the Oracle Database optimization pack for the list of component types, parameters, metrics, and constraints.
One common way to configure Oracle parameters is through the execution of ALTER SYSTEM statements on the database instance. To automate the execution of this task, Akamas provides the OracleConfigurator operator. For finer control, Akamas provides the FileConfigurator operator, which allows building custom statements in a script file that can be executed by the Executor operator.
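To illustrate what building custom statements in a script file can look like, the sketch below generates one ALTER SYSTEM statement per parameter. It is only an example of the statement format, not what the OracleConfigurator does internally; the parameter values are hypothetical.

```python
def alter_system_statements(parameters, scope="SPFILE"):
    """Return one ALTER SYSTEM statement per parameter."""
    if scope not in ("MEMORY", "SPFILE", "BOTH"):
        raise ValueError(f"unsupported scope: {scope}")
    return [
        f"ALTER SYSTEM SET {name}={value} SCOPE={scope};"
        for name, value in parameters.items()
    ]


# Hypothetical values for two Oracle initialization parameters.
statements = alter_system_statements(
    {"open_cursors": 500, "db_file_multiblock_read_count": 64}
)
```

Changes made with SCOPE=SPFILE only take effect after the instance is restarted, which is why a restart step may be needed after applying them.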
Oracle Configurator
The OracleConfigurator operator allows the workflow to configure an on-premise instance with minimal configuration. The following snippet is an example of a configuration task, where all the connection arguments are already defined in the referenced component:
File Configurator and Executor
Most cloud providers offer web APIs as the only way to configure database services. In this case, the Executor operator can submit an API request through a custom executable using a configuration file generated by a FileConfigurator operator.
The following is an example workflow where a FileConfigurator task generates a configuration file (oraconf), followed by an Executor task that parses and submits the configuration to the API endpoint through a custom script (api_update_db_conf.sh):
The optimization of an Oracle database usually includes the following tasks in the workflow, as implemented in the example below:
Apply the Oracle configuration suggested by Akamas and restart the instance if needed (Update parameters task).
Perform any additional warm-up task that may be required to bring the database to its operating regime (Execute warmup task).
Execute the workload targeting the database or the front-end in front of it (Execute performance test task).
Restore the original state of the database in order to guarantee the consistency of further tests, removing any dirty data added by the workload and possibly flushing the database caches (Cleanup task).
The following is the complete YAML configuration file of the workflow described above:
Akamas offers many telemetry providers to extract Oracle Database metrics; one of them is the Prometheus provider, which we can use to query Oracle Database metrics collected by a Prometheus instance via the Prometheus Oracle Exporter.
The snippet below shows a TOML configuration example for the Oracle Exporter extracting metrics regarding the Oracle sessions:
The following example shows how to configure a telemetry instance for a Prometheus provider in order to query the data points extracted from the exporter described above:
See Optimizing an Oracle Database server instance and Optimizing an Oracle Database for an e-commerce service for examples of studies leveraging the Oracle Database pack.
The CSV provider collects metrics from CSV files and makes them available to Akamas. It offers a very versatile way to integrate custom data sources.
This section provides the minimum requirements that you should meet before using the CSV File telemetry provider.
The following requirements should be met to enable the provider to gather CSV files from remote hosts:
Port 22 (or a custom one) should be open from Akamas installation to the host where the files reside.
The host where the files reside should support SCP or SFTP protocols.
Read access to the CSV files target of the integration
Versions < 2.0.0 are compatible with Akamas until version 1.8.0
Versions >= 2.0.0 are compatible with Akamas from version 1.9.0
The CSV File provider is generic and allows integration with any data source, therefore it does not come with support for a specific component type.
To operate properly, the CSV file provider expects the presence of four fields in each processed CSV file:
A timestamp field used to identify the point in time a certain sample refers to.
A component field used to identify the Akamas entity.
A metric field used to identify the name of the metric.
A value field used to store the actual value of the metric.
These fields can have custom names in the CSV file; you can specify them in the provider configuration.
When optimizing a PostgreSQL instance, typically the goal is one of the following:
Throughput optimization: increasing the number of transactions
Cost optimization: minimize resource consumption according to a typical workload, thus cutting costs
Please refer to the PostgreSQL optimization pack for the list of component types, parameters, metrics, and constraints.
Akamas offers many operators that you can use to apply the parameters for the tuned PostgreSQL instances. In particular, we suggest using the FileConfigurator operator for parameters templating and configuration, and the Executor operator for restoring DB data and launching scripts.
A typical optimization process involves the following steps:
Configure PostgreSQL parameters
Restore DB data
Restart PostgreSQL and wait for the initialization
Run benchmark
Parse results
Please note that most PostgreSQL parameters do not need an application restart.
To install the CSV File provider, create a YAML file (called provider.yml in this example) with the specification of the provider:
Then you can install the provider with the Akamas CLI:
Akamas supports the integration with virtually any telemetry and observability tool.
The following table describes the supported Telemetry Providers, which are created automatically at installation time.
Notice that Telemetry Providers are shared across all the workspaces within the same Akamas installation, and only users with administrative privileges can manage them.
Akamas provides the following areas of integration with your ecosystem, which may apply or not depending on whether you are running offline or live optimization studies:
Telemetry Provider tools providing time series for metrics of interest for the system to be optimized - this integration applies to both offline and live optimization studies;
Configuration Management tools providing the ability to set tunable parameters for the system to be optimized - this integration applies to both offline and live optimization studies;
Value Stream Delivery tools to implement a continuous optimization process as part of a CI/CD pipeline - this integration applies to both offline and live optimization studies;
Load Testing tools used to reproduce a synthetic workload on the system to be optimized; notice that these tools may also act as Telemetry Providers (e.g. for end-user metrics) - this integration only applies to offline optimization studies.
These integrations may require some setup on both the tool and the Akamas side and may also involve defining workflows and making use of workflow operators.
To create an instance of the CSV provider, build a YAML file (instance.yml in this example) with the definition of the instance:
Then you can create the instance for the system using the Akamas CLI:
timestampFormat format
Regarding the timestamp format, please note that while the week-year format YYYY is compliant with the ISO-8601 specification, you should replace it with the year-of-era format yyyy if you are specifying a timestampFormat different from the ISO one. For example:
Correct: yyyy-MM-dd HH:mm:ss
Wrong: YYYY-MM-dd HH:mm:ss
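The reason the week-year pattern is risky is that, around the turn of the year, the week-based year can differ from the calendar year. The sketch below illustrates this with Python's ISO calendar support. It is only an analogy: the provider uses Java pattern syntax, where the week-year also depends on the locale, and the date is an arbitrary example.

```python
from datetime import date

# Monday 30 December 2024 already belongs to ISO week 1 of 2025.
sample_day = date(2024, 12, 30)

calendar_year = sample_day.year                # what a yyyy-style pattern prints
iso_week_year = sample_day.isocalendar()[0]    # what a week-year pattern prints

print(calendar_year, iso_week_year)
```

A timestamp parsed or printed with the week-year would therefore land in the wrong year for such dates.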
When you create an instance of the CSV provider, you should specify some configuration information to allow the provider to correctly extract and process metrics from your CSV files.
You can specify configuration information within the config part of the YAML of the instance definition.
address - a URL or IP identifying the address of the host where CSV files reside
username - the username used when connecting to the host
authType - the type of authentication to use when connecting to the file host; either password or key
auth - the authentication credential; either a password or a key according to authType. When using keys, the value can either be the value of the key or the path of the file to import from
remoteFilePattern - a list of remote files to be imported
protocol - the protocol to use to retrieve files; either scp or sftp. Default is scp
fieldSeparator - the character used as a field separator in the CSV files. Default is ,
componentColumn - the header of the column containing the name of the component. Default is COMPONENT
timestampColumn - the header of the column containing the timestamp. Default is TS
timestampFormat - the format of the timestamp (e.g. yyyy-MM-dd HH:mm:ss zzz). Default is YYYY-MM-ddTHH:mm:ss
You should also specify the mapping between the metrics available in your CSV files and those provided by Akamas. This can be done in the metrics section of the telemetry instance configuration. To map a custom metric you should specify at least the following properties:
metric - the name of a metric in Akamas
datasourceMetric - the header of a column that contains the metric in the CSV file
The provider ignores any column not present as datasourceMetric in this section.
The sample configuration reported in this section would import the metric cpu_util from CSV files formatted as in the example below:
The following represents the complete configuration reference for the telemetry provider instance.
The following table reports the configuration reference for the config section.
The following table reports the configuration reference for the metrics section.
Here you can find common use cases addressed by this provider.
Note that the metrics are percentages (between 0 and 100), while Akamas accepts percentages as values between 0 and 1, therefore each metric in this configuration has a scale factor of 0.01.
You can import the two CPU metrics and the memory metric from a SAR log using the following telemetry instance configuration.
Using the configured instance, the CSV File provider will perform the following operations to import the metrics:
Retrieve the file "/csv/sar.csv" from the server "127.0.0.1" using the SCP protocol authenticating with the provided password.
Use the column hostname to look up components by name.
Use the column timestamp to find the timestamps of the samples (which are expected to be in the format specified by timestampFormat).
Collect the metrics (two with the same name, but different labels, and one with a different name):
cpu_util: read from the column %user, attaching to its samples the label "mode" with value "user".
cpu_util: read from the column %system, attaching to its samples the label "mode" with value "system".
mem_util: read from the column %memory.
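The operations listed above can be sketched in a few lines of Python. This is an illustration of the mapping logic only, not the provider's code: the sample rows, the ; separator, the timestamp layout (expressed here in Python rather than Java syntax), and the 0.01 scale used to bring 0-100 percentages into the 0-1 range are assumptions.

```python
import csv
import io
from datetime import datetime

# Hypothetical SAR-like export with one sample.
SAR_CSV = """hostname;timestamp;%user;%system;%memory
server1;2024-01-01 10:00:00;20.0;5.0;40.0
"""

# Mapping between CSV columns and Akamas metrics, mirroring the metrics section.
METRICS = [
    {"metric": "cpu_util", "datasourceMetric": "%user", "scale": 0.01, "staticLabels": {"mode": "user"}},
    {"metric": "cpu_util", "datasourceMetric": "%system", "scale": 0.01, "staticLabels": {"mode": "system"}},
    {"metric": "mem_util", "datasourceMetric": "%memory", "scale": 0.01},
]


def import_samples(text, metrics, component_column="hostname",
                   timestamp_column="timestamp",
                   timestamp_format="%Y-%m-%d %H:%M:%S", separator=";"):
    """Turn each CSV row into one sample per mapped metric."""
    samples = []
    for row in csv.DictReader(io.StringIO(text), delimiter=separator):
        timestamp = datetime.strptime(row[timestamp_column], timestamp_format)
        for mapping in metrics:
            samples.append({
                "component": row[component_column],
                "timestamp": timestamp,
                "metric": mapping["metric"],
                "labels": mapping.get("staticLabels", {}),
                "value": float(row[mapping["datasourceMetric"]]) * mapping.get("scale", 1.0),
            })
    return samples


samples = import_samples(SAR_CSV, METRICS)
```

Columns that do not appear as a datasourceMetric in the mapping are simply never read, which mirrors how unmapped columns are ignored.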
The page describes how to get this Telemetry Provider installed. Once installed, this provider is shared with all users of your Akamas installation and can be used to monitor many different systems, by configuring appropriate telemetry provider instances as described in the page.
You can find detailed information on timestamp patterns in the Patterns for Formatting and Parsing section on the page.
In this use case, you are going to import some metrics coming from SAR, a popular UNIX tool to monitor system resources. SAR can export CSV files in the following format.
collects metrics from CSV files
collects metrics from Dynatrace
collects metrics from Prometheus
collects metrics from Spark History Server
collects metrics from Tricentis NeoLoad Web
collects metrics from Micro Focus LoadRunner Professional
collects metrics from Micro Focus LoadRunner Enterprise
collects price metrics for Amazon Elastic Compute Cloud (EC2) from Amazon’s own APIs
address (String, required) - The address of the machine where the CSV file resides. Restrictions: a valid URL or IP.
port (Number (integer), optional) - The port to connect to, in order to retrieve the file. Default: 22. Restrictions: 1≤port≤65536.
username (String, required) - The username to use in order to connect to the remote machine.
protocol (String, optional) - Default: scp. Restrictions: scp, sftp.
authType (String, required) - Specify which method is used to authenticate against the remote machine. password: use the value of the parameter auth as a password. key: use the value of the parameter auth as a private key; supported formats are RSA and DSA. Restrictions: password, key.
auth (String, required) - A password or an RSA/DSA key (as YAML multi-line string, keeping new lines).
remoteFilePattern (String, required) - The path of the remote file(s) to be analyzed. The path can contain GLOB expressions. Restrictions: a list of valid Linux paths.
componentColumn (String, required) - The CSV column containing the name of the component. The column's values must match (case sensitive) the name of a component specified in the System. Default: COMPONENT. Restrictions: the column must exist in the CSV file.
timestampColumn (String, optional) - The CSV column containing the timestamps of the samples. Default: TS. Restrictions: the column must exist in the CSV file.
timestampFormat (String, optional) - The format of the timestamps. Default: YYYY-MM-ddTHH:mm:ss. Restrictions: must be specified using Java syntax.
fieldSeparator (String, optional) - The field separator of the CSV. Default: , Restrictions: , or ;
metric (String, required) - The name of the metric in Akamas. Restrictions: an existing Akamas metric.
datasourceMetric (String, required) - The name (header) of the column that contains the specific metric. Restrictions: an existing column in the CSV file.
scale (Decimal number) - The scale factor to apply when importing the metric.
staticLabels (List of key-value pairs, optional) - A list of key-value pairs that will be attached to the specific metric sample.
The Prometheus provider collects metrics from a Prometheus instance and makes them available to Akamas.
This provider includes support for several technologies (Prometheus exporters). In any case, custom queries can be defined to gather the desired metrics.
This section provides the minimum requirements that you should meet before using the Prometheus provider.
Akamas supports Prometheus starting from version 2.26.
When also using the prometheus-operator, version 0.47 or greater of the operator is required. This version is bundled with the kube-prometheus-stack since version 15.
Connectivity between the Akamas server and the Prometheus server is also required. By default, Prometheus runs on port 9090.
Node exporter (Linux system metrics)
JMX exporter (Java metrics)
cAdvisor (Docker container metrics)
CloudWatch exporter (AWS resources metrics)
JMeter (Web application metrics)
The Prometheus provider includes queries for most of the monitoring use cases these exporters cover. If you need to specify custom queries or make use of exporters not currently supported you can specify them as described in creating Prometheus telemetry instances.
Kubernetes (Pod, Container, Workload, Namespace)
Web Application
Java (java-ibm-j9vm-6, java-ibm-j9vm-8, java-eclipse-openj9-11, java-openjdk-8, java-openjdk-11)
Linux (Ubuntu-16.04, Rhel-7.6)
Refer to Prometheus provider metrics mapping to see how component-type metrics are extracted by this provider.
Akamas reasons in terms of a system to be optimized and in terms of parameters and metrics of components of that system. To understand which metrics collected from Prometheus should be mapped to a component, the Prometheus provider looks up some properties in the components of a system, grouped under the prometheus property. These properties depend on the exporter and the component type.
Nested under this property you can also include any additional field your use case may require to filter the imported metrics further. These fields will be appended in queries to the list of label matches in the form field_name=~'field_value', and can specify either exact values or patterns.
Notice: you should configure your Prometheus instances so that the Prometheus provider can leverage the instance property of components, as described in the Setup datasource section here above.
It is important that you add the instance and, optionally, the job properties to the components of a system so that the Prometheus provider can gather metrics from them:
The Prometheus provider does not usually require a specific configuration of the Prometheus instance it uses.
When gathering metrics for hosts it's usually convenient to set the value of the instance label so that it matches the value of the instance property in a component; in this way, the Prometheus provider knows which system component each data point refers to.
Here’s an example configuration for Prometheus that sets the instance label:
To install the Prometheus provider, create a YAML file (provider.yml in this example) with the definition of the provider:
Then you can install the provider using the Akamas CLI:
The installed provider is shared with all users of your Akamas installation and can monitor many different systems, by configuring appropriate telemetry provider instances.
To create an instance of the Prometheus provider, edit a YAML file (instance.yml in this example) with the definition of the instance:
Then you can create the instance for the system using the Akamas CLI:
When you create an instance of the Prometheus provider, you should specify some configuration information to allow the provider to extract and process metrics from Prometheus correctly.
You can specify configuration information within the config part of the YAML of the instance definition.
address - a URL or IP identifying the address of the host where Prometheus is installed
port - the port exposed by Prometheus
user - the username for the Prometheus service
password - the user password for the Prometheus service
job - a string to specify the scraping job name. The default is ".*" for all scraping jobs
logLevel - set this to "DETAILED" for some extra logs when searching for metrics (default value is "INFO")
headers - additional custom headers, given as key-value pairs under headers (e.g. "custom_key": "custom_value")
namespace - a string to specify the namespace
duration - an integer to determine the duration in seconds for data collection (use a number between 1 and 3600)
enableHttps - a boolean to enable HTTPS in Prometheus (since 3.2.6)
ignoreCertificates - a boolean to ignore SSL certificates
disableConnectionCheck - a boolean to disable the initial connection check to Prometheus
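To show how some of these fields fit together, the sketch below builds the URL of an instant query against the standard Prometheus HTTP API (/api/v1/query) from a configuration dictionary. How the provider actually issues its requests is not documented here, so treat this as an assumption; the host name is hypothetical and 9090 is used as a fallback because it is the default Prometheus port.

```python
from urllib.parse import urlencode


def instant_query_url(config, promql):
    """Build the URL of a Prometheus instant query from an instance-like config."""
    duration = config.get("duration")
    if duration is not None and not 1 <= duration <= 3600:
        raise ValueError("duration must be between 1 and 3600 seconds")
    scheme = "https" if config.get("enableHttps", False) else "http"
    base = f"{scheme}://{config['address']}:{config.get('port', 9090)}"
    # urlencode escapes braces, quotes, and other PromQL characters.
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical instance configuration.
url = instant_query_url({"address": "prometheus.example.com", "port": 9090}, "up")
```

Opening such a URL from the Akamas host is also a quick way to verify connectivity before creating the telemetry instance.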
The Prometheus provider allows defining additional queries to populate custom metrics or redefine the default ones according to your use case. You can configure additional metrics using the metrics field as shown in the configuration below:
In this example, the telemetry instance will populate cust_metric with the results of the query specified in datasource, maintaining the value of the labels listed under labels.
Please refer to Querying basics | Prometheus for a complete reference of PromQL.
Akamas pre-processes the queries before running them, replacing special-purpose placeholders with the fields provided in the components. For example, given the following component definition:
the query sum(jvm_memory_used_bytes{instance=~"$INSTANCE$", job=~"$JOB$"}) will be expanded for this component into sum(jvm_memory_used_bytes{instance=~"service01", job=~"jmx"}). This provides greater flexibility through the templatization of the queries, allowing the same query to select the correct data sources for different components.
The following is the list of available placeholders:
$INSTANCE$, $JOB$ - These placeholders are replaced respectively with the instance and job fields configured in the component’s prometheus configuration. Usage example: node_load1{instance=~"$INSTANCE$", job=~"$JOB$"} Expanded query: node_load1{instance=~"frontend", job=~"node"}
%FILTERS% - This placeholder is replaced with a list containing any additional filter in the component’s definition (other than instance and job), where each field is expanded as field_name=~"field_value". This is useful to define additional label matches in the query without the need to hardcode them. Usage example: container_memory_usage_bytes{job=~"$JOB$" %FILTERS%} Expanded query: container_memory_usage_bytes{job=~"advisor", name=~"db-.*"}
$DURATION$ - Usage example: rate(http_client_requests_seconds_count[$DURATION$]) Expanded query: rate(http_client_requests_seconds_count[30s])
$NAMESPACE$, $POD$, $CONTAINER$ - These placeholders are used within Kubernetes environments. Usage example: 1e3 * avg(kube_pod_container_resource_limits{resource="cpu", namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%}) Expanded query: 1e3 * avg(kube_pod_container_resource_limits{resource="cpu", namespace=~"boutique", pod=~"adservice.*", container=~"server"})
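The expansion of the $INSTANCE$, $JOB$, $DURATION$, and %FILTERS% placeholders can be sketched as a simple string substitution. This is an approximation written for illustration, not the provider's implementation: the fallback to ".*" for a missing instance or job and the 30-second default duration are assumptions, and the Kubernetes placeholders are left out.

```python
import re


def expand_query(query, prometheus_props, duration_seconds=30):
    """Expand placeholders in a PromQL template using a component's prometheus fields."""
    props = dict(prometheus_props)
    # Assumption: a missing instance or job falls back to ".*" (match everything).
    instance = props.pop("instance", ".*")
    job = props.pop("job", ".*")
    # Every remaining field becomes an extra label matcher.
    filters = "".join(f', {name}=~"{value}"' for name, value in props.items())
    query = re.sub(r"\s*%FILTERS%", lambda _match: filters, query)
    return (query.replace("$INSTANCE$", instance)
                 .replace("$JOB$", job)
                 .replace("$DURATION$", f"{duration_seconds}s"))


expanded = expand_query(
    'container_memory_usage_bytes{job=~"$JOB$" %FILTERS%}',
    {"job": "advisor", "name": "db-.*"},
)
```

Because the filters are derived from the component definition, the same query template can be reused across components that differ only in their label values.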
This section reports common use cases addressed by this provider.
To gather Kubernetes metrics, the following exporters are required:
kube-state-metrics
cAdvisor
As an example, you can define a component with type Kubernetes Container in this way:
Check Java OpenJDK page for a list of all the Java metrics available in Akamas
You can leverage the Prometheus provider to collect Java metrics by using the JMX Exporter. The JMX Exporter is a collector of Java metrics for Prometheus that can be run as an agent for any Java application. Once downloaded, you execute it alongside a Java application with this command:
The command exposes the Java metrics of youJar.jar on localhost on port 9100, where they can be scraped by Prometheus.
config.yaml is the configuration file used by this exporter. It is suggested to use this configuration for an optimal experience with the Prometheus provider:
As a next step, add a new scraping target in the configuration of the Prometheus used by the provider:
You can then create a YAML file with the definition of a telemetry instance (prom_instance.yml) of the Prometheus provider:
And you can create the telemetry instance using the Akamas CLI:
Finally, to bind the extracted metrics to the related component, you should add the following field to the properties of the component’s definition:
Check the Linux page for a list of all the system metrics available in Akamas
You can leverage the Prometheus provider to collect system metrics (Linux) by using the Node exporter. The Node exporter is a collector of system metrics for Prometheus that can be run as a standalone executable or a service within a Linux machine to be monitored. Once downloaded, schedule it as a service using, for example, systemd:
Here’s the manifest of the node_exporter service:
The service exposes system metrics on localhost on port 9100, where they can be scraped by Prometheus.
As a final step, add a new scraping target in the configuration of the Prometheus used by the provider:
You can then create a YAML file with the definition of a telemetry instance (prom_instance.yml) of the Prometheus provider:
And you can create the telemetry instance using the Akamas CLI:
Finally, to bind the extracted metrics to the related component, you should add the following field to the properties of the component’s definition:
The Dynatrace provider collects metrics from Dynatrace and makes them available to Akamas.
This provider includes support for several technologies. In any case, custom queries can be defined to gather the desired metrics.
Dynatrace SaaS/Managed version 1.187 or later
Kubernetes and Docker
Web Application
Ubuntu-16.04, Rhel-7.6
java-openjdk-8, java-openjdk-11
java-ibm-j9vm-6, java-ibm-j9vm-8, java-eclipse-openj9-11
Refer to Dynatrace provider metrics mapping to see how component-types metrics are extracted by this provider.
This section provides the minimum requirements that you should meet before using the Dynatrace provider.
Dynatrace SaaS/Managed version 1.187 or later
A valid Dynatrace license
Dynatrace OneAgent installed on the servers where the Dynatrace entities to be monitored are running
Connectivity between Akamas and the Dynatrace server on port 443
A Dynatrace API token with the privileges described here.
The Dynatrace provider needs a Dynatrace API token with the following privileges:
metrics.read (Read metrics)
entities.read (Read entities and tags)
DataExport (Access problem and event feed, metrics, and topology)
ReadSyntheticData (Read synthetic monitors, locations, and nodes)
DataImport (Data ingest, e.g.: metrics and events). This permission is used to inform Dynatrace about configuration changes.
To generate an API Token for your Dynatrace installation you can follow these steps.
To instruct Akamas from which Dynatrace entities (e.g. Workloads, Services, Process Groups) metrics should be collected, you can set some specific properties on components.
Different strategies can be used to map Dynatrace entities to Akamas components:
By id
By name
By tags
By Kubernetes properties
You can map a component to a Dynatrace entity by leveraging the unique id of the entity, which you should put under the id property in the component. This strategy is best used for long-lived instances whose ID does not change during the optimization such as Hosts, Process Groups or Services.
Here is an example of how to set up host monitoring via id:
You can find the id of a Dynatrace entity by looking at the URL of a Dynatrace dashboard relative to the entity. Note that the "host" key is valid only for Linux components; other components (e.g. the JVM) require drilling down into the host entities to get the PROCESS_GROUP_INSTANCE or PROCESS_GROUP id.
You can map a component to a Dynatrace entity by leveraging the entity’s display name. This strategy is similar to the map by id but provides a more friendly way to identify the mapped entity. Beware that if multiple entities in your Dynatrace installation share the same name, they will all be mapped to the same component. The Dynatrace display name should be put under the name property in the component definition:
You can map a component to a Dynatrace entity by leveraging Dynatrace tags that match the entity. You should put these tags under the tags property in the component definition.
If multiple tags are specified, instances matching any of the specified tags will be selected.
This sample configuration maps to the component all Dynatrace entities with tag environment: test or [AWS]dynatrace-monitored: true.
Dynatrace supports both key-value and key-only tags. Key-only tags can be specified as key-value tags with an empty value, as in the following example:
You can map a component to a Dynatrace entity referring to a Kubernetes cluster (e.g. a Pod or a Container) by leveraging dedicated properties.
In order to properly identify the set of containers to be mapped, you can specify the following properties. Any container matching all the properties will be mapped to the component.
namespace: Kubernetes namespace (found in the Container dashboard)
containerName: Kubernetes container name (found in the Container dashboard)
basePodName: Kubernetes base pod name (found in the Container dashboard)
You can find all the information needed to set up these properties at the top of the Dynatrace container dashboard.
The following example shows how to map a component to a container running in Kubernetes:
In order to properly identify the set of pods to be mapped, you can specify the following properties. Any pod matching all the properties will be mapped to the component.
state: State (found in the Pod dashboard)
namespace: Namespace (found in the Pod dashboard)
workload: Workload (found in the Pod dashboard)
If you need to further narrow your pod selection, you can also specify a set of tags as described in the "By tags" strategy. Note that tags for Kubernetes resources are called Labels in the Dynatrace dashboard.
Labels are specified as key-value pairs in the Akamas configuration. In the Dynatrace dashboard, key and value are separated by a colon (:).
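The selection rules for pods can be summarised in a short Python sketch: every listed property must match, and labels (if any) narrow the selection further. How labels combine with properties is an assumption here: the sketch follows the "any tag" rule of the by-tags strategy. The function names and the pod dictionary layout are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


def parse_label(text: str) -> Tuple[str, str]:
    """Split a dashboard label written as 'key:value' at the first colon."""
    key, _, value = text.partition(":")
    return key.strip(), value.strip()


def pod_matches(pod: Dict[str, Any],
                properties: Dict[str, str],
                labels: Optional[Dict[str, str]] = None) -> bool:
    """All properties must match; if labels are given, at least one must match."""
    if any(pod.get(key) != value for key, value in properties.items()):
        return False
    if not labels:
        return True
    pod_labels = pod.get("labels", {})
    return any(pod_labels.get(key) == value for key, value in labels.items())


# Hypothetical pod as it might be described in a dashboard.
pod = {"state": "Running", "namespace": "shop", "workload": "frontend",
       "labels": dict([parse_label("app:frontend")])}

print(pod_matches(pod, {"namespace": "shop", "workload": "frontend"}, {"app": "frontend"}))
```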
The following example shows how to map a component to a pod running in Kubernetes:
Please note that when you map components to Kubernetes entities, the type property is required to tell Akamas which type of entity you want to map.
Dynatrace maps Kubernetes entities to the following types:
Docker container: CONTAINER_GROUP_INSTANCE
Pod: CLOUD_APPLICATION_INSTANCE
Workload: CLOUD_APPLICATION
Namespace: CLOUD_APPLICATION_NAMESPACE
Cluster: KUBERNETES_CLUSTER
You can improve the matching of components with Dynatrace by adding a type property in the component definition. This property helps the provider match only those Dynatrace entities of the given type.
The type of an entity can be retrieved from the URL of the entity’s dashboard.
Available entity types can be retrieved from your Dynatrace instance with the following command:
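As a hedged Python equivalent, the sketch below queries the Dynatrace Environment API v2 entity types endpoint (`/api/v2/entityTypes`) with an `Api-Token` authorization header. It assumes the base URL is the root of your environment and that the token is allowed to read entities; the URL and token shown are placeholders, and only the first page of results is read.

```python
import json
import urllib.request
from typing import List


def build_entity_types_request(base_url: str, api_token: str) -> urllib.request.Request:
    """Build the GET request for the entity types endpoint (no network access here)."""
    endpoint = base_url.rstrip("/") + "/api/v2/entityTypes"
    return urllib.request.Request(endpoint, headers={"Authorization": f"Api-Token {api_token}"})


def list_entity_types(base_url: str, api_token: str) -> List[str]:
    """Return the entity type names from the first page of the response."""
    request = build_entity_types_request(base_url, api_token)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Further pages (if any) are not followed in this sketch.
    return [item["type"] for item in payload.get("types", [])]


# Placeholder values; replace them with your own environment URL and token.
request = build_entity_types_request("https://example.live.dynatrace.com/", "dt0c01.sample")
print(request.full_url)
```

Calling `list_entity_types` with real credentials performs the actual request.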
The installed provider is shared with all users of your Akamas installation and can monitor many different systems, by configuring appropriate telemetry provider instances.
To create an instance of the Dynatrace provider, build a YAML file (instance.yml in this example) with the definition of the instance:
Then you can create the instance for the system using the Akamas CLI:
When you create an instance of the Dynatrace provider, you should specify some configuration information to allow the provider to correctly extract and process metrics from Dynatrace.
You can specify configuration information within the config part of the YAML of the instance definition.
url - URL of the Dynatrace installation API (see https://www.dynatrace.com/support/help/extend-dynatrace/dynatrace-api/ to retrieve the URL of your installation)
token - A Dynatrace API Token with the proper permissions
You can collect additional metrics with the Dynatrace provider by using the metrics field:
If Akamas cannot reach your Dynatrace installation directly, you can configure an HTTP proxy by using the proxy field:
This section reports the complete reference for the definition of a telemetry instance.
This is the reference for the config section within the definition of the Dynatrace provider instance:
url: String; it should be a valid URL; required.
token: String; required.
proxy: Object (see Proxy options reference); required. The specification of the HTTP proxy to use to communicate with Dynatrace.
pushEvents: String; allowed values true, false; not required; default true. If set to true, the provider informs Dynatrace of the configuration change event, which will be visible in the Dynatrace UI.
tags: Object; not required. A set of global tags to match Dynatrace entities. The provider uses these tags to apply a default filtering of Dynatrace entities for every component.
This is the reference for the config → proxy section within the definition of the Dynatrace provider instance:
address: String; it should be a valid URL; required. The URL of the HTTP proxy to use to communicate with the Dynatrace installation API
port: Number (integer); 1 < port < 65535; required. The port at which the HTTP proxy listens for connections
username: String; not required. The username to use when authenticating against the HTTP proxy, if necessary
password: String; not required. The password to use when authenticating against the HTTP proxy, if necessary
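Before creating an instance, it can help to check a configuration against the restrictions listed in these references. The following Python sketch is one possible pre-check, not part of Akamas: the function names are hypothetical, and the proxy block is validated only when present, which is an assumption based on the proxy being described as something you configure when Akamas cannot reach Dynatrace directly.

```python
from typing import Any, Dict, List
from urllib.parse import urlparse


def _is_url(value: Any) -> bool:
    """A loose URL check: a string with both a scheme and a host part."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return bool(parsed.scheme and parsed.netloc)


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the config section (empty if none)."""
    errors: List[str] = []
    if not _is_url(config.get("url")):
        errors.append("url must be a valid URL")
    token = config.get("token")
    if not isinstance(token, str) or not token:
        errors.append("token is required")
    proxy = config.get("proxy")
    if proxy is not None:  # assumption: checked only when a proxy is configured
        if not _is_url(proxy.get("address")):
            errors.append("proxy.address must be a valid URL")
        port = proxy.get("port")
        if isinstance(port, bool) or not isinstance(port, int) or not 1 < port < 65535:
            errors.append("proxy.port must be an integer with 1 < port < 65535")
    return errors


# Placeholder values for illustration.
sample = {
    "url": "https://example.live.dynatrace.com",
    "token": "dt0c01.sample",
    "proxy": {"address": "http://proxy.example.com", "port": 70000},
}
print(validate_config(sample))
```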
This is the reference for the metrics section within the definition of the Dynatrace provider instance. The section contains a collection of objects with the following properties:
metric: String; it must be an Akamas metric; required. The name of an Akamas metric that should map to the new metric you want to gather
datasourceMetric: String; a valid Dynatrace metric; required. The Dynatrace query to use to extract the metric
labels: Array of strings; not required. The list of Dynatrace labels that should be retained when gathering the metric
staticLabels: Key-Value; not required. Static labels that will be attached to metric samples
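To make the difference between labels and staticLabels concrete, here is a small Python sketch of how the two could act on the labels of a single sample. It is an interpretation of the descriptions above, not the provider's implementation: the function name is hypothetical, and letting static labels win when a key appears in both places is an assumption.

```python
from typing import Dict, List


def apply_label_rules(sample_labels: Dict[str, str],
                      retained: List[str],
                      static_labels: Dict[str, str]) -> Dict[str, str]:
    """Keep only the retained labels, then attach the static ones."""
    kept = {key: value for key, value in sample_labels.items() if key in retained}
    kept.update(static_labels)  # assumption: static labels override on conflict
    return kept


# Hypothetical labels coming from the data source.
incoming = {"host": "server1", "zone": "eu-1", "build": "42"}
print(apply_label_rules(incoming, ["host", "zone"], {"team": "perf"}))
```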
This section reports common use cases addressed by this provider.
Check the Linux optimization pack for a list of all the system metrics available in Akamas.
As a first step to start extracting metrics from Dynatrace, generate your API token and make sure it has the right permissions.
As a second step, choose a strategy to map your Linux component (MyLinuxComponent) to the corresponding Dynatrace entity.
Let’s assume you want to map your Dynatrace entity by id. You can find the id in the URL bar of the Dynatrace dashboard of the entity:
Grab the id and add it to the Linux component definition:
You can leverage the name of the entity as well:
As a third and final step, once the component is all set, you can create an instance of the Dynatrace provider and then build your first studies:
To install the Spark History Server provider, create a YAML file (called provider.yml in this example) with the definition of the provider:
Then you can install the provider using the Akamas CLI:
The installed provider is shared with all users of your Akamas installation and can monitor many different systems, by configuring appropriate telemetry provider instances.
To create an instance of the Spark History Server provider, build a YAML file (instance.yml in this example) with the definition of the instance:
Then you can create the instance for the system spark-system using the Akamas CLI:
When you create an instance of the Spark History Server provider, you should specify some configuration information to allow the provider to correctly extract and process metrics from the Spark History server.
You can specify configuration information within the config part of the YAML of the instance definition.
address - hostname of the Spark History Server instance
The following YAML file describes the definition of a telemetry instance.
The following table reports the reference for the config section within the definition of the Spark History Server provider instance:
This section reports common use cases addressed by this provider.
As a first step, you need to create a YAML file (spark_instance.yml) containing the configuration the provider needs to connect to the Spark History Server, plus the filter on the desired level of granularity for the imported metrics:
Then create the telemetry instance using the Akamas CLI:
This section reports common best practices you can adopt to ease the use of this telemetry provider.
configure metrics granularity: in order to reduce the collection time, configure the importLevel to import metrics with a granularity no finer than the study requires.
wait for metrics publication: make sure in the workflow there is a few-minute interval between the end of the Spark application and the execution of the Spark telemetry instance, since the Spark History Server may take some time to complete the publication of the metrics.
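One way to implement the waiting step is to poll the Spark History Server until the application is reported as completed. The Python sketch below shows only the polling logic; it assumes the application info has the shape returned by the Spark REST API (`/api/v1/applications/<app-id>`, with an `attempts` list carrying a `completed` flag) and treats that flag as a proxy for "metrics published", which is an assumption. The fetch function is injected so the example runs without a server.

```python
import time
from typing import Any, Callable, Dict


def application_completed(app_info: Dict[str, Any]) -> bool:
    """True when the application has attempts and all of them are completed."""
    attempts = app_info.get("attempts", [])
    return bool(attempts) and all(a.get("completed", False) for a in attempts)


def wait_until_completed(fetch_app_info: Callable[[], Dict[str, Any]],
                         timeout_s: float = 300.0,
                         poll_s: float = 15.0,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the application is completed or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if application_completed(fetch_app_info()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)


# Simulated responses: not completed on the first poll, completed on the second.
responses = iter([
    {"id": "app-1", "attempts": [{"completed": False}]},
    {"id": "app-1", "attempts": [{"completed": True}]},
])
print(wait_until_completed(lambda: next(responses), sleep=lambda _seconds: None))
```

In a real workflow, `fetch_app_info` would issue the HTTP request to the History Server address and port configured for the telemetry instance.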
The Spark History Server provider collects metrics from a Spark History Server instance and makes them available to Akamas.
Prerequisites
This section provides the minimum requirements that you should match before using the Spark History Server telemetry provider.
Apache Spark 2.3
Spark History Server API must be reachable at the provided address and port (the default port is 18080).
spark-application
You can check to see how component-types metrics are extracted by this provider.
Versions < 2.0.0 are compatible with Akamas until version 1.8.0
Versions >= 2.0.0 are compatible with Akamas from version 1.9.0
This section lists the workflow operators this provider depends on:
Akamas uses components to identify specific elements of the system to be monitored and optimized. Your system might contain multiple components to model, for example, a Spark application and each host of the cluster. To point Akamas to the right component when extracting metrics, you need to add a property called sparkApplication to your Spark Application component. The provider will only extract metrics for components for which this property has been specified.
The NeoLoad Web provider collects metrics from a NeoLoad Web instance and makes them available to Akamas.
This section provides the minimum requirements that you should match before using the NeoLoad Web telemetry provider.
NeoLoad Web SaaS or managed version 7.1 or later.
The NeoLoad Web API must be reachable at a provided address and port (by default ).
NeoLoad Web API access token.
Versions < 2.0.0 are compatible with Akamas until version 1.8.0
Versions >= 2.0.0 are compatible with Akamas from version 1.9.0
Web Application
This section lists the workflow operators this provider depends on.
Akamas reasons in terms of a system to be optimized and in terms of parameters and metrics of components of that system. To understand which metrics collected from NeoLoad Web should refer to which component, the NeoLoad Web provider looks up the property neoloadweb in the components of a system:
This page describes how to set up an OracleDB exporter in order to gather metrics regarding an Oracle Database instance through the Prometheus provider.
The OracleDB exporter repository is available on the . The suggested deploy mode is through a , since the Prometheus instance can easily access the running container through the Akamas network.
Use the following command line to run the container, where cust-metrics.toml is your configuration file defining the queries for additional custom metrics (see paragraph below) and DATA_SOURCE_NAME is an environment variable containing the Oracle EasyConnect string:
You can refer to the for more details or alternative deployment modes.
It is possible to define additional queries to expose custom metrics using any data in the database instance that is readable by the monitoring user (see for more details about the syntax).
The following is an example of exporting system metrics from the Dynamic Performance (V$) Views used by the Prometheus provider default queries for the :
This page describes how to set up a CloudWatch exporter in order to gather AWS metrics through the Prometheus provider. This is especially useful to monitor system metrics when you don’t have direct SSH access to AWS resources like EC2 Instances or if you want to gather AWS-specific metrics not available in the guest OS.
In order to fetch metrics from CloudWatch, the exporter requires an IAM user or role with the following privileges:
cloudwatch:GetMetricData
cloudwatch:GetMetricStatistics
cloudwatch:ListMetrics
tag:GetResources
You can assign AWS-managed policies CloudWatchReadOnlyAccess and ResourceGroupsandTagEditorReadOnlyAccess to the desired user to enable these permissions.
The CloudWatch exporter repository is available on the . It requires a minimal configuration to fetch metrics from the desired AWS instances. Below is a short list of the parameters needed for a minimal configuration:
region: AWS region of the monitored resource
metrics: a list of objects containing filters for the exported metrics
aws_namespace: the namespace of the monitored resource
aws_metric_name: the name of the AWS metric to fetch
aws_dimensions: the dimension to expose as labels
aws_dimension_select: the dimension to filter over
aws_statistics: the list of metric statistics to expose
aws_tag_select: optional tags to filter on
tag_selections: map containing the list of values to select for each tag
resource_type_selection: resource type to fetch the tags from (see: )
resource_id_dimension: dimension to use for the resource id (see: )
Notice: AWS bills CloudWatch usage in batches of 1 million requests, where every metric counts as a single request. To avoid unnecessary expenses configure only the metrics you need.
Notice: AWS bills CloudWatch usage in batches of 1 million requests, where every metric counts as a single request. To avoid unnecessary expenses configure an appropriate scraping interval.
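To see how the number of metrics and the scraping interval drive the request count, a back-of-the-envelope estimate can be written in a few lines of Python. The sketch takes the statement above at face value (one request per configured metric per scrape); the real exporter may issue more requests depending on dimensions and configuration, so treat the result as a lower-bound illustration. The function name is hypothetical.

```python
def estimated_requests(metric_count: int, scrape_interval_s: int, duration_s: int) -> int:
    """Requests issued over a period, assuming one request per metric per scrape."""
    if metric_count < 0 or scrape_interval_s <= 0 or duration_s < 0:
        raise ValueError("counts must be non-negative and the interval positive")
    scrapes = duration_s // scrape_interval_s
    return metric_count * scrapes


THIRTY_DAYS_S = 30 * 24 * 3600

# 10 metrics scraped every minute versus every five minutes, over 30 days.
print(estimated_requests(10, 60, THIRTY_DAYS_S))
print(estimated_requests(10, 300, THIRTY_DAYS_S))
```

Under these assumptions, moving from a 60-second to a 300-second interval cuts the request count by a factor of five.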
Once you have configured the exporter in the Prometheus configuration, you can start to fetch metrics using the Prometheus provider. The following sections describe some scripts you can add as tasks in your workflow.
Since Amazon bills your CloudWatch queries, it is wise to run the exporter only when needed. The following script allows you to manage the exporter from the workflow by adding the following tasks:
start the container right before the beginning of the load test (command: bash script.sh start)
The example below is the Akamas-supported configuration, fetching metrics of EC2 instances named server1 and server2.
If not set in the component properties, this placeholder is replaced with the duration field configured in the telemetry-instance. You should use it with instead of hardcoding a fixed value.
The URL of the Dynatrace installation API (see the )
The Dynatrace API Token the provider should use to interact with Dynatrace. The token should have .
Check for a list of all Spark application metrics available in Akamas
This example shows how to configure a Spark History Server provider in order to collect performance metrics about a Spark application submitted to the cluster using the operator.
Finally, you will need to define for your study a workflow that includes the submission of the Spark application to the cluster, in this case using the :
For a complete list of possible values for namespaces, metrics, and dimensions please refer to the official .
The suggested deployment mode for the exporter is through a . The following snippet provides a command line example to run the container (remember to provide your AWS credentials if needed and the path of the configuration file):
You can refer to the for more details or alternative deployment modes.
In order to scrape the newly created exporter add a new job to the configuration file. You will also need to define some in order to add the instance label required by Akamas to properly filter the incoming metrics.
In the example below the instance label is copied from the instance’s Name tag:
It’s worth noting that CloudWatch may require some minutes to aggregate the stats according to the configured granularity, causing the telemetry provider to fail while trying to fetch data points not available yet. To avoid such issues you can add at the end of your workflow a task using an to wait for the CloudWatch metrics to be ready. The following script is an example of implementation:
stop the container after the metrics publication, as described in the (command: bash script.sh stop).
address: URL; required. Spark History Server address
importLevel: String; allowed values: job, stage, task; not required; default job. Granularity of the imported metrics
port: Integer; not required; default 18080. Spark History Server listening port
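The defaults and allowed values in this reference can be captured in a short Python sketch that fills in missing fields and rejects an unknown importLevel. This is an illustration rather than Akamas code: the function name and the sample hostname are hypothetical, and reading job as the default importLevel is an interpretation of the reference.

```python
from typing import Any, Dict

ALLOWED_IMPORT_LEVELS = ("job", "stage", "task")


def resolve_spark_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the documented defaults and check the allowed importLevel values."""
    if not config.get("address"):
        raise ValueError("address is required")
    # Defaults come first so that explicit values in config override them.
    resolved = {"port": 18080, "importLevel": "job", **config}
    if resolved["importLevel"] not in ALLOWED_IMPORT_LEVELS:
        raise ValueError(f"importLevel must be one of {ALLOWED_IMPORT_LEVELS}")
    return resolved


# Hypothetical address, for illustration only.
print(resolve_spark_config({"address": "sparkmaster.example.com", "importLevel": "stage"}))
```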
To install the NeoLoad Web provider, create a YAML file (called provider.yml in this example) with the definition of the provider:
Then you can install the provider using the Akamas CLI:
The installed provider is shared with all users of your Akamas installation and can monitor many different systems, by configuring appropriate telemetry provider instances.