Akamas Docs
3.1.3
3.1.3
  • How to use this documentation
  • Getting started with Akamas
    • Introduction to Akamas
    • Licensing
    • Deployment
      • Cloud Hosting
    • Security
    • Maintenance & Support (M&S) Services
      • Customer Support Services
      • Support levels for Customer Support Services
      • Support levels for software versions
      • Support levels with Akamas 3.1
  • Installing Akamas
    • Akamas Architecture
    • Prerequisites
      • Hardware Requirements
      • Software Requirements
      • Network requirements
    • Install Akamas dependencies
    • Install the Akamas Server
      • Online installation mode
        • Online installation behind a Proxy server
      • Offline installation mode
      • Changing UI Ports
      • Setup HTTPS configuration
    • Install the Akamas CLI
      • Setup the Akamas CLI
      • Verify the Akamas CLI
      • Initialize Akamas CLI
      • Change CLI configuration
    • Verify the Akamas Server
    • Install the Akamas license
    • Manage anonymous data collection
    • Troubleshoot install issues
    • Manage the Akamas Server
      • Akamas logs
      • Audit logs
      • Install upgrades and patches
      • Monitor the Akamas Server
      • Backup & Recover of the Akamas Server
  • Using Akamas
    • General optimization process and methodology
    • Preparing optimization studies
      • Modeling systems
      • Modeling components
        • Creating custom optimization packs
        • Managing optimization packs
      • Creating telemetry instances
      • Creating automation workflows
        • Creating workflows for offline studies
        • Performing load testing to support optimization activities
        • Creating workflows for live optimizations
      • Creating optimization studies
        • Defining optimization goal & constraints
        • Defining windowing policies
        • Defining KPIs
        • Defining parameters & metrics
        • Defining workloads
        • Defining optimization steps
        • Setting safety policies
    • Running optimization studies
      • Before running optimization studies
      • Analyzing results of offline optimization studies
        • Optimization Insights
      • Analyzing results of live optimization studies
      • Before applying optimization results
    • Guidelines for choosing optimization parameters
      • Guidelines for JVM (OpenJ9)
      • Guidelines for JVM layer (OpenJDK)
      • Guidelines for Oracle Database
      • Guidelines for PostgreSQL
    • Guidelines for defining optimization studies
      • Optimizing Linux
      • Optimizing Java OpenJDK
      • Optimizing OpenJ9
      • Optimizing Web Applications
      • Optimizing Kubernetes
      • Optimizing Spark
      • Optimizing Oracle Database
      • Optimizing MongoDB
      • Optimizing MySQL Database
      • Optimizing PostgreSQL
  • Integrating Akamas
    • Integrating Telemetry Providers
      • CSV provider
        • Install CSV provider
        • Create CSV telemetry instances
      • Dynatrace provider
        • Install Dynatrace provider
        • Create Dynatrace telemetry instances
      • Prometheus provider
        • Install Prometheus provider
        • Create Prometheus telemetry instances
        • CloudWatch Exporter
        • OracleDB Exporter
      • Spark History Server provider
        • Install Spark History Server provider
        • Create Spark History Server telemetry instances
      • NeoLoadWeb provider
        • Install NeoLoadWeb telemetry provider
        • Create NeoLoadWeb telemetry instances
      • LoadRunner Professional provider
        • Install LoadRunner Professional provider
        • Create LoadRunner Professional telemetry instances
      • LoadRunner Enterprise provider
        • Install LoadRunner Enterprise provider
        • Create LoadRunner Enterprise telemetry instances
      • AWS provider
        • Install AWS provider
        • Create AWS telemetry instances
    • Integrating Configuration Management
    • Integrating Value Stream Delivery
    • Integrating Load Testing
      • Integrating NeoLoad
      • Integrating Load Runner Professional
      • Integrating LoadRunner Enterprise
  • Akamas Reference
    • Glossary
      • System
      • Component
      • Metric
      • Parameter
      • Component Type
      • Workflow
      • Telemetry Provider
      • Telemetry Instance
      • Optimization Pack
      • Goals & Constraints
      • KPI
      • Optimization Study
      • Offline Optimization Study
      • Live Optimization Study
      • Workspace
    • Construct templates
      • System template
      • Component template
      • Parameter template
      • Metric template
      • Component Types template
      • Telemetry Provider template
      • Telemetry Instance template
      • Workflows template
      • Study template
        • Goal & Constraints
        • Windowing policy
          • Trim windowing
          • Stability windowing
        • Parameter selection
        • Metric selection
        • Workload selection
        • KPIs
        • Steps
          • Baseline step
          • Bootstrap step
          • Preset step
          • Optimize step
        • Parameter rendering
    • Workflow Operators
      • General operator arguments
      • Executor Operator
      • FileConfigurator Operator
      • LinuxConfigurator Operator
      • WindowsExecutor Operator
      • WindowsFileConfigurator Operator
      • Sleep Operator
      • OracleExecutor Operator
      • OracleConfigurator Operator
      • SparkSSHSubmit Operator
      • SparkSubmit Operator
      • SparkLivy Operator
      • NeoLoadWeb Operator
      • LoadRunner Operator
      • LoadRunnerEnteprise Operator
    • Telemetry metric mapping
      • Dynatrace metrics mapping
      • Prometheus metrics mapping
      • NeoLoadWeb metrics mapping
      • Spark History Server metrics mapping
      • LoadRunner metrics mapping
    • Optimization Packs
      • Linux optimization pack
        • Amazon Linux
        • Amazon Linux 2
        • Amazon Linux 2022
        • CentOS 7
        • CentOS 8
        • RHEL 7
        • RHEL 8
        • Ubuntu 16.04
        • Ubuntu 18.04
        • Ubuntu 20.04
      • DotNet optimization pack
        • DotNet Core 3.1
      • Java OpenJDK optimization pack
        • Java OpenJDK 8
        • Java OpenJDK 11
      • OpenJ9 optimization pack
        • IBM J9 VM 6
        • IBM J9 VM 8
        • Eclipse Open J9 11
      • NodeJS optimization pack
        • NodeJS
      • GO optimization pack
        • GO 1
      • Web Application optimization pack
        • Web Application
      • Docker optimization pack
        • Container
      • Kubernetes optimization pack
        • Kubernetes Pod
        • Kubernetes Container
        • Kubernetes Workload
        • Kubernetes Namespace
        • Kubernetes Cluster
      • WebSphere optimization pack
        • WebSphere 8.5
        • WebSphere Liberty ND
      • AWS optimization pack
        • EC2
        • Lambda
      • PostgreSQL optimization pack
        • PostgreSQL 11
        • PostgreSQL 12
      • Cassandra optimization pack
        • Cassandra
      • MySQL Database optimization pack
        • MySQL 8.0
      • Oracle Database optimization pack
        • Oracle Database 12c
        • Oracle Database 18c
        • Oracle Database 19c
        • RDS Oracle Database 11g
        • RDS Oracle Database 12c
      • MongoDB optimization pack
        • MongoDB 4
        • MongoDB 5
      • Elasticsearch optimization pack
        • Elasticsearch 6
      • Spark optimization pack
        • Spark Application 2.2.0
        • Spark Application 2.3.0
        • Spark Application 2.4.0
    • Command Line commands
      • Administration commands
      • User and Workspace management commands
      • Authentication commands
      • Resource management commands
      • Optimizer options commands
    • Release Notes
  • Knowledge Base
    • Setting up a Konakart environment for testing Akamas
    • Modeling a sample Java-based e-commerce application (Konakart)
    • Optimizing a web application
    • Optimizing a sample Java OpenJ9 application
    • Optimizing a sample Java OpenJDK application
    • Optimizing a sample Linux system
    • Optimizing a MongoDB server instance
    • Optimizing a Kubernetes application
    • Leveraging Ansible to automate AWS instance management
    • Guidelines for optimizing AWS EC2 instances
    • Optimizing a sample application running on AWS
    • Optimizing a Spark application
    • Optimizing an Oracle Database server instance
    • Optimizing an Oracle Database for an e-commerce service
    • Guidelines for optimizing Oracle RDS
    • Optimizing a MySQL server database running Sysbench
    • Optimizing a MySQL server database running OLTPBench
    • Optimizing cost of a Kubernetes application while preserving SLOs in production
    • Optimizing a live full-stack deployment (K8s + JVM)
  • Akamas Free Trial
Powered by GitBook
On this page
  • Failing workflows
  • Best Practices

Was this helpful?

Export as PDF
  1. Using Akamas
  2. Preparing optimization studies
  3. Creating automation workflows

Creating workflows for offline studies

Last updated 2 years ago

Was this helpful?

A workflow for an automates all the actions required to interface the configuration management and load testing tools (see the following figure) at each experiment or trial. Notice that metrics collection is an implicit action that does not need to be coded as part of the workflow.

More in detail, a typical workflow includes the following types of tasks:

  • Preparing the application, by executing all cleaning or reset actions that are required to prepare the load testing phase and ensuring that each experiment is executed under exactly the same conditions - for example, this may involve cleaning caches, uploading test data, etc

  • Applying the configuration, by preparing and then applying the parameter configuration under test to the target environment - this may require interfacing configuration management tools or pushing configuration to a repository, restarting the entire application or some of its components to ensure that some parameters are effectively applied, and then checking that after restarting the application is up & running before the workflow execution continues, and checking whether the configuration has been correctly applied

  • Applying the workload, by launching a load test to assess the behavior of the system under the applied configuration and synthetic workload defined in the load testing scenarios - of course, a preliminary step is to design a load testing scenario and synthetic workload that ensures that optimized configurations resulting from the offline optimization can be applied to the target system under the real or expected workload

Failing workflows

A workflow interrupts in case any of its steps does. A failing workflow causes the experiment or trial to fail. This should be considered as a different situation than a specific configuration not matching optimization constraints or causing the system under test to fail to run. For example, if the amount of max memory configured was too low, the application may fail to start.

When an experiment fails, the Akamas AI engine takes this information into account and thus learns that that parameter configuration was bad. This way, the AI engine automatically tries to avoid the regions of the parameter space which can lead to low scores or failures.

Best Practices

Creating effective workflows is essential to ensure that Akamas can automatically identify the optimal configuration in a reliable and efficient way. Some best practices on how to build robust workflows are described here below.

Reusing workload as much as possible

Since Akamas workflows are first-class entities that can be used by multiple studies, it might be useful to avoid creating (and maintaining) multiple workflows and instead define workflows that can be easily reused, by factoring all differences into specific action parameters.

Of course, this general guideline should be balanced with respect to other requirements, such as avoiding potential conflicts due to different teams modifying the same workload for different uses and potentially impacting optimization results.

Building robust workflows

Akamas takes into account the exit code of each of the workflow tasks, and the whole workflow fails if a task exits with an error. Therefore, the best practice is to make use of exit codes in each task, to ensure that task failures can only happen in case of bad parameter configuration.

For example, it is important to always check that the application has correctly started and is up and running (after a new configuration has been applied). This can be done by:

  • including a workflow task that tests the application is up and running after the tasks where the configuration is applied;

  • making sure that this task exits with an error in case the application has not correctly started (typically after a timeout).

Another example is when the underlying environment incurs issues during the optimization (e.g. a database might be mistakenly shut down by another team). As much as possible, all these environmental transient issues should be carefully avoided. Akamas also provides the ability to execute multiple task retries (default is twice, configurable) to compensate for these transient issues, provided they only last for a short time (the retry time and delay are also configurable).

Building workflows that ensure reproducible experiments

As for any other performance evaluation activity, Akamas experiments should be designed to be reproducible: if the same experiment (hence, the same parameter configuration) is executed multiple times (i.e. in multiple trials), the same performance results should be found for each trial.

Therefore, it is fundamental that workflows include all the necessary tasks to realize reproducible experiments. Particular care needs to be taken to correctly manage the system state across the experiments and trials. System state can include:

  • Application caches

  • Operating system cache and buffers (e.g. Linux filesystem page cache)

  • Database tables that fill up during the optimization process

All experiments should always start with a clean and well-known state. If the state is not properly managed, it may happen that the performance of the system is observed to change (whether higher or lower) not because of the effect of the applied parameters, but due to other effects (e.g. warming of caches).

Best practices to consistently manage system state across experiments include:

  • Restoring the system state at the beginning of each experiment - this may involve restarting the application, clearing caches, restoring DB tables, etc;

  • Allowing for a sufficient warm-up period in the performance tests, so to ensure application performance has reached stability. See also the recommended best practices about properly managing warm-up periods in the following section about creating an optimization study.

Another common cause that can impact the reproducibility of experiments is an unstable infrastructure or environment. Therefore, it is important to ensure that the underlying infrastructure is stable and that no other workload that might impact the optimization results is running on it. For example, beware of scheduled system jobs (e.g. backups), automatic software updates or anti-virus systems that might not explicitly be considered as part of the environment but that may unexpectedly alter its performance behavior.

Taking into account workflow duration

When designing workflows, it is important to take into account the potential duration of their tasks. Indeed, the task duration impacts the duration of the overall optimization and might impact the ability to execute a sufficient number of experiments within the overall time interval or specific time windows allowed for the optimization study.

Typically, the longest task in a workflow is the one related to applying workload (e.g. launching a load test or a batch job): such tasks can last for dozens of minutes if not hours. However, a workflow may also include other ancillary tasks that may provide nontrivial contributions to the task durations (e.g. checking the status to ensure that the application is up & running).

Making workflows fail fast

As general guidance, it is better to fail fast by performing quick checks executed as early as possible. For example, it is better to do a status check before launching a load test instead of possibly waiting for it to complete (maybe after 1h) just to discover that the application did not even start.

The section provides some examples of how to define workload for a specific technology. In a complex application, a workflow may include multiple actions of the same type, each operating on separate components of the target system. The guide provides some real-world examples of how to create workflows and optimization studies.

This explains why it is important to build robust workflows that ensure experiments only fail in case bad configurations are tested. See the specific entry in the best practices section below.

Some additional best practices related to the design and implementation of load testing are described in the page.

Optimization examples
Knowledge base
Performing load testing to support optimization activities
Building robust workflow
offline optimization study
Automated workflow for an offline optimization study
Workflow reuse by two different studies