This section provides a definition of Akamas' key concepts and terms and also provides references to the related construct properties, commands, and user interfaces.
System: systems targeted by optimization studies
Component: elements of the system
Component type: types associated to a system component
Optimization pack: objects encapsulating knowledge about component types
Metric: a measured metric, collected via telemetry providers
Parameter: tunable parameters, set via native or other interfaces
Telemetry provider: general definition of providers of collected metrics
Telemetry instance: specific instances of telemetry providers
Workflow: automation workflow to set parameters, collect metrics and run load testing
Goal & constraints: goal and constraints defined for an optimization study
Study: optimization studies for a target system
Offline optimization study: optimization studies for a non-live system
Live optimization study: optimization studies for a live system
Workspace: virtual environments to organize and isolate resources
This guide provides a Glossary describing Akamas key concepts with their associated construct templates, command line commands, and user interfaces.
This guide also provides references to:
Workflow Operators structure and operators
Telemetry Providers metric mapping
Optimization Packs metrics and parameters
Commands to administer Akamas, manage users, authenticate and manage its resources
A telemetry provider is a software object that represents a data source of metrics. A telemetry instance is a specific instance of a telemetry provider that refers to a specific data source.
Examples of telemetry providers are:
monitoring tools (e.g. Prometheus or Dynatrace)
load testing tools (e.g. LoadRunner or Neoload)
CSV files
A telemetry provider is a platform-wide entity that can be reused across systems to ease the integration with metrics sources.
Akamas provides a number of out-of-the-box telemetry providers. Custom telemetry providers can also be created.
The construct to be used to define a telemetry provider is described on the Telemetry Provider template page.
A telemetry provider is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI shows telemetry providers in a specific top-level menu.
A component type is a blueprint for a component that describes the type of entity the component refers to. In Akamas, a component needs to be associated with a component type, from which the component inherits its metrics and parameters.
Component types are platform entities (i.e.: shared among all the users) usually provided off the shelf and shipped within the Optimization Packs. Typically, different component types within the same optimization pack are used to model different versions/releases of the same technology.
Akamas users with appropriate privileges can create custom component types and optimization packs, as described on the Creating custom optimization pack page.
A component type is described by the following mandatory properties (other properties can be defined but are not mandatory):
a name that uniquely identifies the component type within the system
a description that clarifies what the component type refers to
a parameter definitions array (more on Parameters later)
a metrics array (more on Metrics later)
The construct to be used to define a component type is described on the Component type template page.
A component type is an Akamas resource that can be managed via CLI using the resource management commands.
When visualizing system components the component type is displayed.
The following figure shows the out-of-the-box JVM component types related to the JVM optimization pack.
A workflow is a set of tasks that needs to be executed in sequence to evaluate a configuration as part of an optimization study. A task is a single action performed within a workflow.
Workflows allow you to automate Akamas optimization studies, by automatically executing a sequence of tasks such as initializing an environment, triggering load testing, restoring a database, applying configurations, and much more.
These are examples of common actions that can be performed by a task:
Launch remote commands via SSH
Apply parameters values in configuration files
Execute Spark jobs via spark-submit API
Start performance tests by integrating with external tools such as Neoload
Workflows are first-class entities that can be defined globally and then used in multiple optimization studies.
Akamas provides several workflow operators that can be used to perform tasks in a workflow. Some operators are general-purpose, such as those executing a command or script on a specific host, while others provide native integrations with specific technologies and tools, such as Spark History Server or load testing tools.
The construct to be used to define a workflow is described on the Workflow template page.
A workflow is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI shows workflows in a specific top-level menu.
The list of tasks is displayed when drilling down to each specific workflow.
The optimization goal defines the objective of an optimization study: the result to be achieved by changing the system parameters (and thus the system behavior), while also satisfying any optimization constraints defined on the system metrics, possibly representing SLOs.
A goal is defined by:
an optimization objective: either maximize or minimize
a scoring function (scalar): either a single metric or a formula defined by one or more metrics
One or more constraints can also be associated with a goal, each being:
a formula defined on one or more metrics, referring to either absolute values (absolute constraints) or relative to a baseline value (relative constraints)
Notice that relative constraints are only supported by offline optimization studies, while absolute constraints are supported by both offline and live optimization studies.
Goals and constraints are not an Akamas resource, as they are always defined as part of an optimization study. The construct to be used to define a goal and its constraints is described on the Goal & Constraints page of the Study template section.
Goals and constraints are displayed in the Akamas UI when drilling down each optimization study.
The detail of the formula used to define the goal may also be displayed:
A KPI is a metric that is worth considering when analyzing the result of an offline optimization study, looking for (sub)optimal configurations generated by Akamas AI to be applied.
Akamas automatically considers any metric referred to in the defined optimization goal and constraints for an offline optimization study as a KPI. Moreover, any other metrics of the system component can be specified as a KPI for an offline optimization study.
A KPI is defined as follows (from the UI or the CLI):
Name: the name used for the KPI in UI labels.
Formula: must be defined as <component_name>.<metric_name>.
Direction: must be either 'minimize' or 'maximize'.
Aggregation: a valid metric aggregation such as min, max, avg, sum, p95, etc. If unspecified, the default is avg.
KPIs are not an Akamas resource, as they are always defined as part of an optimization study. The construct to define KPIs is described on the KPIs page of the Study template section.
The number of KPIs and the first few KPIs are displayed in the Akamas UI in the header of each offline optimization study.
The full list of KPIs is displayed by drilling down to the KPIs section.
From this section, it is possible to modify the list of KPIs and change their names and other attributes.
A parameter is a property of the system that can be applied and tuned to change the system's behavior. Akamas optimizes systems by changing parameters to achieve the stated goal while respecting the defined constraints.
Examples of a parameter include:
Configuration knobs (e.g. JVM garbage collection type)
Resource settings (e.g. amount of memory allocated to a Spark job)
Algorithms settings (e.g. learning rate of a neural network)
Architectural properties (e.g. how many caching layers in an enterprise application)
Type of resources (e.g. AWS EC2 instance or EBS volume type)
Any other thing (e.g. amount of sugar in your cookies)
The following table describes the parameter types:
REAL: real values. Akamas normalizes the values, e.g. [0.0, 10.0] → [0.0, 1.0].
INTEGER: integer values. Akamas converts the integer into a real number and then normalizes the values, e.g. [0, 3] → [0.0, 3.0] → [0.0, 1.0].
ORDINAL: integer values. Akamas converts the category into a real number and then normalizes the values, e.g. ['a', 'b', 'c'] → [0, 2] → [0.0, 2.0] → [0.0, 1.0].
CATEGORICAL: categorical values. Akamas converts each parameter value into a new parameter that may be either 1.0 (active) or 0.0 (inactive); only one of these new parameters can be active during each experiment, e.g. ['a', 'b', 'c'] → [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]].
A parameter is described by the following properties:
a name that uniquely identifies the parameter
a description that clarifies the semantics of the parameter
a unit that defines the unit of measurement used by the parameter
Although users can create parameters with any name, we suggest using the naming convention context_parameter, where:
context refers to the technology or more general environment in which the parameter is defined (e.g. elasticsearch, jvm, mysql, spark)
parameter is the parameter name in the original context (e.g. gcType, numberOfExecutors)
This makes it possible to identify parameters more easily and avoid any potential name clash.
The construct to be used to define a parameter is described on the Parameter template page.
Parameters are displayed in the Akamas UI when drilling down to each system component.
For each optimization study, the optimization scope is the set of parameters that Akamas can change to achieve the defined optimization goal.
A system represents the entire system that is the target of an optimization.
A system is a single object irrespective of the number or type of entities or layers that are in the scope of the optimization. It can be used to model and describe a wide set of entities, such as:
An N-layer application
The full microservices stack of an application
A single microservice
A single batch job (or a collection of batch jobs)
A system is made of one or more components. Each component represents one of the elements in the system, whose parameters are involved in the optimization or whose metrics are collected to evaluate the results of such optimization.
A system is described by the following properties:
a name that uniquely identifies the system
a description that clarifies what the system refers to
The construct to be used to define a system is described on the System template page.
A system is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI displays systems (depending on the user privileges on the defined workspaces) in a specific top-level menu.
A component represents an element of a system. Typically, systems are made up of different entities and layers which can be modeled by components. In other words, a system can be considered a collection of related components.
Notice that a component is a black-box definition of each entity involved in an optimization study, so detailed modeling of the entities being involved in the optimization is not required. The only relevant elements are the parameters that are involved in the optimization and the metrics that are collected to evaluate the results of such an optimization.
Notice that only the entities that are directly involved in the optimization need to be modeled and defined within Akamas. An entity is involved in an optimization study if it is optimized or monitored by Akamas, where "optimized" means that Akamas is optimizing at least one of its parameters, and "monitored" means that Akamas is monitoring at least one of its metrics.
A component is described by the following mandatory properties (other properties can be defined but are not mandatory):
a name that uniquely identifies the component within the system
a description that clarifies what the component refers to
a component type that identifies the technology of the component (see component type)
In general, a component contains a set of each of the following:
parameter(s) in the scope of the optimization
metric(s) needed to define the optimization goal
metric(s) needed to define the optimization constraints
metric(s) that are needed neither for the optimization goal nor for the constraints, and hence are not used by Akamas to perform the optimization, but are collected to support the analysis (and can possibly be added later to the optimization goal or constraints when refining the optimization).
The construct to be used to define a component is described on the Component template page.
A component is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI shows more details about components by drilling down their respective system.
Offline optimization studies are optimization studies where the workload is simulated by leveraging a load-testing tool.
Offline optimization studies are typically used to optimize systems in pre-production environments, with respect to planned and what-if scenarios that cannot be directly run in production. Scenarios include new application releases, planned technology changes (e.g. new JVM or DB), cloud migration or new provider, expected workload growth, and resilience under failure scenarios (from chaos engineering).
The following figure represents the iterative process associated with offline optimizations:
The following 5 phases can be identified for each iteration (also known as experiment):
Apply Conf: Akamas applies the parameter configuration (one or more parameters) to the target system by leveraging a set of workflow operators.
Apply workload: Akamas triggers a workload on the target system, also by leveraging a set of workflow operators.
Collect KPIs: Akamas collects the metrics related to the target system - only those metrics that are specified by each telemetry instance defined in the system.
Score vs goal: Akamas scores the applied parameter configuration against the defined goal and constraints - the score is the value of the goal function.
Recommend Conf: the Akamas AI engine identifies the configuration for the next iteration, until a termination condition for the study is met (e.g. number of experiments).
Thanks to its patented AI (reinforcement learning) algorithms, Akamas can find the optimal configuration without having to explore all the possible configurations.
For each experiment, Akamas allows multiple trials to be executed. A trial is a repetition of the same experiment to reduce the impact of noise on the result of an experiment.
Environments can be noisy for several reasons such as:
External conditions (e.g. background jobs, "noisy neighbors" in the cloud)
Measurement errors (e.g. monitoring tools are not always 100% accurate)
This approach is consistent with scientific and engineering practices, where the strategy to minimize the impact of noise is to repeat the same experiment multiple times.
An offline optimization study can include multiple steps.
Typically there are at least two steps:
Baseline step: a single experiment run by applying the configuration already deployed before the Akamas optimization. The results of this experiment are used as a reference (baseline) for assessing the optimization; as such, this is a mandatory step for each study.
Optimize step: a defined number of experiments used to identify the optimal configuration by leveraging Akamas AI.
Other steps are:
Bootstrap step: imported experiments from other optimization studies
Preset step: a single experiment with a defined configuration
The steps to be executed can be specified when defining an offline optimization study.
An offline optimization study is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI shows offline optimization studies in a specific top-level menu.
The details and results of an offline optimization study are displayed when drilling down (there are multiple tabs and sections).
A workspace is a virtual environment that groups systems, workflows, and studies to restrict user access to them: a user can access these resources only when granted the required permissions to that workspace.
Akamas defines two user roles according to the assigned permission on the workspace:
Contributors (write permission) can create and manage workspace resources (studies, telemetry instances, systems, and workflows) and can also do exports/imports, view all global resources (Optimization Packs, and Telemetry Providers), and see remaining credits;
Viewers (read permission) can only access optimization results but cannot create or modify workspace resources.
Workspaces and accesses are managed by users with administrative privileges. A user with administrative privileges can manage licenses, users, and workspaces, and can install/uninstall Optimization Packs and Telemetry Providers.
Workspaces can be defined according to different criteria, such as:
By department (e.g. Performance, Development)
By initiative (e.g. PoC, Training)
By application (e.g. Registry, Banking)
A workspace is described by the following property:
a name that uniquely identifies the workspace
A workspace is an Akamas resource that can be managed via CLI using the resource management commands. See also the page devoted to workspace commands.
The workspace a study belongs to is always displayed in the Akamas UI. Filters can be used to select only studies belonging to specific workspaces.
Systems are defined using a YAML manifest with the following structure:
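A minimal sketch of such a manifest (values are illustrative):
```yaml
name: mysystem
description: A system to be optimized
```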
with the following properties:
name (string, required): the name of the system
description (string, required): a description to characterize the system
The following example represents a system definition for a Cassandra-based system:
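For instance (name and description are illustrative):
```yaml
name: cassandra
description: A Cassandra-based system to be optimized
```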
This section describes all the structures that can be used to define resources and objects in Akamas.
Parameters are defined using a YAML manifest with the following structure:
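A minimal sketch, assuming parameters are listed under a top-level parameters key (the properties below are the ones documented here):
```yaml
parameters:
  - name: jvm_maxHeapSize
    description: The maximum heap size of the JVM
    unit: megabytes
    restart: true
```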
with the following properties:
name (string, required): the name of the parameter. It should contain only lowercase/uppercase letters, numbers, or underscores; it should start with a letter; no spaces are allowed.
description (string, required): a description characterizing the parameter.
unit (string, optional, default: empty unit): the unit of measure of the parameter, either a supported unit or a custom unit (see supported units of measure).
restart (boolean, optional, default: FALSE): whether applying the parameter to change the configuration of a system requires the system to be restarted.
Notice that parameter definitions are shared across all the workspaces of the same Akamas installation and require an account with administrative privileges to manage them.
The following represents a set of parameters for a JVM component:
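A sketch (parameter names follow the suggested context_parameter convention; the exact names in the JVM optimization pack may differ):
```yaml
parameters:
  - name: jvm_maxHeapSize
    description: Maximum heap size of the JVM
    unit: megabytes
    restart: true
  - name: jvm_gcType
    description: Type of garbage collector used by the JVM
    restart: true
```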
The following represents a set of CPU-related parameters for the Linux operating system:
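A sketch (parameter names are illustrative):
```yaml
parameters:
  - name: os_cpuSchedMinGranularity
    description: Minimal preemption granularity of the CPU scheduler
    unit: nanoseconds
    restart: false
```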
An optimization pack is a software object that provides a convenient facility for encapsulating all the knowledge (e.g. metrics, parameters with their default values and domain ranges) required to apply Akamas optimizations to a set of entities associated with the same technology.
Notice that while optimization packs are very convenient for modeling systems and creating studies, it is not required for these entities to be covered by an optimization pack.
Akamas provides a library of out-of-the-box optimization packs, and new custom optimization packs can be easily added (no coding is required).
An optimization pack needs to include the entities that encapsulate technology-specific information related to the supported component types:
supported component types
parameters and metrics for each component type
supported telemetry providers (optional)
An optimization pack is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI shows optimization packs in a specific top-level menu.
An optimization pack encapsulates one or more of the following technology-specific elements:
Component Types: these represent the type of the component(s) included, each with its associated parameters and metrics
Telemetry Providers: that define where to collect metrics
An optimization pack enables Akamas users to optimize a technology without necessarily being an expert in it, and to encode knowledge about a technology or a specific application so that it can be reused across multiple optimization studies, easing the modeling process.
Telemetry instances are defined using a YAML manifest with the following structure:
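A sketch assuming a Prometheus provider (the config keys are provider-specific and illustrative here):
```yaml
provider: Prometheus
config:
  address: prometheus.mycompany.com
  port: 9090
metrics:
  - name: cpu_util
    datasourceName: node_cpu_utilization
    labels: [instance]
```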
with the following properties for the global section:
provider (string, required): the name of the Telemetry Provider
config (object, required): provider-specific configuration in a key-value format (see the specific provider documentation for details)
metrics (object, optional): the specification of the metrics to extract; this section is specific to each Telemetry Provider (see the specific provider documentation for details)
and the following properties for the metrics section:
name (string, required): the name of the metric in Akamas. This metric must exist in at least one of the component types referred to by the system associated with the telemetry instance.
datasourceName (string, required): the name of the metric (or extraction query) in the data source; the value of this property is specific to the data source.
labels (list of strings, optional): a list of labels; for the specific usage of this property, see the documentation of the specific Telemetry Provider.
staticLabels (list of key-value pairs, optional): key-value pairs interpreted as label name/value pairs; these "static labels" are copied directly into each sample of the specific metric and sent to the Metric Service.
Components are defined using a YAML manifest with the following structure:
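A minimal sketch (the componentType value is illustrative and must match an installed component type):
```yaml
name: jvm1
description: The JVM underlying the service
componentType: java-openjdk-11
properties:
  hostname: server01.mycompany.com
```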
and the following properties:
name (string, required): the name of the component. It should match the regexp ^[a-zA-Z][a-zA-Z0-9_]*$, that is, only letters, numbers, and underscores, with no initial number or underscore. Notice: it must not match the name of another component.
description (string, required): a description to characterize the component.
componentType (string, required): the name of the component type that defines the type of the component. Notice: it should match the name of an existing component type.
properties (object, optional): general custom properties of the component. These properties can be defined freely and usually expose information useful for configuring the component.
Example of a component for OpenJDK11:
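A sketch (the exact component-type name is defined by the optimization pack):
```yaml
name: jvm1
description: The OpenJDK 11 JVM running the service
componentType: java-openjdk-11
properties:
  hostname: server01.mycompany.com
  username: akamas
```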
Example of a component for the Linux operating system:
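A sketch (component-type name and properties are illustrative):
```yaml
name: os1
description: The Linux OS of the target server
componentType: linux
properties:
  hostname: server01.mycompany.com
  username: akamas
  key: /home/akamas/.ssh/id_rsa
```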
Telemetry Providers are defined using a YAML manifest with the following structure:
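A minimal sketch (the image name is illustrative):
```yaml
name: MyCSVTelemetryProvider
description: A telemetry provider reading metrics from CSV files
dockerImage: myregistry/csv-telemetry-provider:1.0.0
```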
with the following properties:
name (string, required): the name of the Telemetry Provider. This name is used to reference the Telemetry Provider in telemetry instances and is unique within an Akamas installation.
description (string, required): a description of the Telemetry Provider.
dockerImage (string, required): the docker image of the Telemetry Provider.
Please refer to the page Integrating Telemetry Providers which describes the out-of-the-box Telemetry Providers that are created automatically at Akamas install time.
Optimization studies are defined using a YAML manifest with the following structure:
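A sketch combining the mandatory properties (the numberOfExperiments argument of the optimize step is shown for illustration only):
```yaml
name: Maximize throughput
system: mysystem
workflow: myworkflow
goal:
  objective: maximize
  function:
    formula: mysystem.throughput
steps:
  - name: baseline
    type: baseline
  - name: optimize
    type: optimize
    numberOfExperiments: 50
```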
with the following mandatory and optional properties:
system (object reference, required): the system the study refers to.
name (string, required): the name of the study.
goal (object, required): the goal and constraints description - see Goal & Constraints.
kpis (list, optional): the KPIs description - see KPIs.
numberOfTrials (integer, optional, default: 1): the number of trials for each experiment - see below.
trialAggregation (string, one of MAX, MIN, or AVG; optional, default: AVG): the aggregation used to calculate the score across multiple trials - see below.
parametersSelection (list, optional, default: all): the list of parameters to be tuned - see Parameters selection.
metricsSelection (list, optional, default: all): the list of metrics to be tracked - see Metrics selection.
workloadsSelection (object array, optional): the list of defined workloads - this only applies to live optimization studies - see Workloads selection.
windowing (string, optional, default: trim): the windowing strategy - this only applies to offline optimization studies - see Windowing.
workflow (object reference, required): the workflow the study refers to.
steps (list, required): the description of the steps - see Steps.
Some of these optional properties depend on whether the study is an offline or live optimization study.
It is possible to perform more than one trial per experiment in order to validate the score of a configuration under test, e.g. to take into account noisy environments.
The following fragment of the YAML definition of a study sets the number of trials to 3:
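```yaml
numberOfTrials: 3
```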
Notice: this is a global property of the study, which can be overridden for each step.
The trial aggregation policy defines how trial scores are aggregated to form experiment scores.
There are three different types of strategies to aggregate trial scores:
AVG: the score of an experiment is the average of the scores of its trials - this is the default
MIN: the score of an experiment is the minimum among the scores of its trials
MAX: the score of an experiment is the maximum among the scores of its trials
The following fragment of the YAML definition of a study sets the trial aggregation to MAX:
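```yaml
trialAggregation: MAX
```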
The following refers to an offline optimization study for a system modeling an e-commerce service, where a windowing strategy is specified:
The following offline study refers to a tuning initiative for a Cassandra-based system (ID 2)
The following offline study is for tuning another Cassandra-based system (ID 3) by acting only on JVM and Linux parameters
Component types are defined using a YAML manifest with the following structure:
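A sketch (the nesting of the domain fields follows the tables below; the operators specification is abbreviated, and all names are illustrative):
```yaml
name: MyAppServer
description: A custom application server
parameters:
  - name: myapp_threads
    domain:
      type: integer
      domain: [1, 64]
    defaultValue: 8
    operators: {}
metrics:
  - name: myapp_response_time
```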
and the following properties for the general section:
name (string, required): the name of the component type. It should match the regexp ^[a-zA-Z][a-zA-Z0-9_]*$, that is, only letters, numbers, and underscores, with no initial number or underscore. Notice: it must not match the name of another component type.
description (string, required): a description to characterize the component type.
parameters (array, required): the definitions of the parameters associated with the component type (see below).
metrics (array, required): the definitions of the metrics associated with the component type (see below).
The parameter section describes the relationship between the component type and already defined parameters with the following properties:
name (string, required): the name of the parameter to be related to the component type. It should match the name of an existing parameter.
domain->type (string, one of {real, integer, categorical}; required): the type of domain to be set for the parameter in relationship with the component type.
domain->domain (array of numbers, optional): the bounds used to define the domain of the parameter; these bounds are inclusive. The array must have size 2 and contain either all integers or all real numbers (do not omit the "."), depending on domain->type.
domain->categories (array of strings, optional): the possible categories that the parameter can take.
defaultValue (string, integer, or real; required): the default value of the parameter. The value must be included in the domain for real and integer types, and must be a value included in the categories for categorical types.
decimals (integer, in [0-255]; optional, default: 5): the number of decimal digits rendered for this parameter.
operators (object, required): the operators that can be used to apply the parameter.
The metric section describes the relationship between the component type and already defined metrics with the following properties:
name (string, required): the name of the metric to be related to the component type. It should match the name of an existing metric.
Notice that component type definitions are shared across all the workspaces on the same Akamas installation, and require an account with administrative privileges to manage them.
The following is an example of a component type definition, here sketched for Cassandra (parameter and metric names are illustrative); a component type for the Linux operating system would follow the same structure:
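```yaml
name: cassandra
description: Component type for Apache Cassandra
parameters:
  - name: cassandra_concurrentReads
    domain:
      type: integer
      domain: [2, 128]
    defaultValue: 32
    operators: {}
metrics:
  - name: cassandra_read_latency
```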
Metrics are defined using a YAML manifest with the following structure:
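A minimal sketch, assuming metrics are listed under a top-level metrics key:
```yaml
metrics:
  - name: response_time
    description: The response time of the service
    unit: milliseconds
```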
and the following properties:
name (string, required): the name of the metric; no spaces are allowed.
unit (string): the unit of measure of the metric, either a supported unit or a custom unit (see the supported units of measure below).
description (string, required): a description characterizing the metric.
The supported units of measure for metrics are:
Temporal units: nanoseconds, microseconds, milliseconds, seconds, minutes, hours
Units of information: bits, kilobits, megabits, gigabits, terabits, petabits, bytes, kilobytes, megabytes, gigabytes, terabytes, petabytes
Others: percent
Notice that supported units of measure are automatically scaled for visualization purposes. In particular, for units of information, Akamas uses a base 2 scaling for bytes, i.e., 1 kilobyte = 1024 bytes, 1 megabyte = 1024 kilobytes, and so on. Other units of measure are only scaled up using millions or billions (e.g., 124000000 custom units become 124 Mln custom units).
Live Optimization studies are optimization studies where the workload is real: the system that needs to be optimized operates with respect to varying workloads observed while running live.
Live optimization studies are typically used to optimize systems in production environments. For example, a microservices application can be optimized while running in production by having Kubernetes and JVM parameters dynamically tuned for multiple microservices so as to minimize costs while matching response time objectives.
The following figure represents the iterative process associated with live optimizations:
The following 5 phases can be identified for each iteration:
Collect KPIs: Akamas collects the metrics of the system required to observe its behavior under the current parameter configuration by leveraging the associated telemetry provider - here Akamas is also observing and categorizing the different workload contexts that are used to recommend configurations that are appropriate for each specific workload context
Score vs Goal: Akamas scores the applied parameter configuration under the specific workload context against the defined goal and constraints
Recommend Conf: Akamas provides a recommendation for parameter configuration based on the observed behavior under the specific workload context and leveraging the Akamas AI
Human Approval: this is an optional step as there are two operational modes:
autonomous mode: no human intervention is required
human-approval mode: recommendations need to be approved by users before configuration changes get applied - recommendations can be changed by users
Apply Conf: Akamas applies the recommended (and possibly revisited) configuration, by leveraging the defined workflow.
Notice that configurations can be applied by Akamas via integrations with native interfaces (e.g. Kubectl), by leveraging any orchestration and automation tool in place (e.g. OpenShift), or by triggering a pull request to a configuration repository (e.g. Git). This can be applied to either the entire target system or to a canary deployment.
Akamas provides several customizable policies for live optimization studies to ensure that recommended configuration changes to production environments are as safe as possible. Akamas safety policies include gradual optimization, smart constraints, and outlier detection.
A live optimization study is an Akamas resource that can be managed via CLI using resource management commands.
The Akamas UI shows live optimization studies in a specific top-level menu.
The details and results of a live optimization study are displayed when drilling down.
The windowing field in a study specifies the windowing policy adopted to score the experiments of an optimization study.
The two available windowing strategies have different structures:
trim: trims the temporal interval of a trial, both from the start and the end, by a specified temporal amount - this is the default strategy
stability: discards temporal intervals in which a given metric is not stable and selects the temporal interval in which another metric is maximized or minimized
In case the windowing strategy is not specified, the entire time window is considered.
A windowing policy of type trim trims the temporal interval of a trial by a specified temporal amount (e.g., 3 seconds), both from the start and from the end.
The trim windowing has the following structure:
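A sketch (field names per the reference tables later in this section; the task field is optional):
```yaml
windowing:
  type: trim
  trim: ["30s", "30s"]
  task: run-load-test
```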
In case a windowing policy is not specified, the default windowing corresponding to trim ["0s", "0s"] is considered.
The following fragment shows a windowing strategy of type "trim" where the time window is specified to start 10s after the beginning of the trial and to end immediately before the end of the trial:
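```yaml
windowing:
  type: trim
  trim: ["10s", "0s"]
```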
A windowing policy of type stability discards temporal intervals in which a given metric is not stable, and selects, among the remaining intervals, the ones in which another target metric is maximized or minimized. Stability windowing can be sample-based or time-frame based.
The stability windowing has the following structure, comprising a stability section and a comparison metric section. The following fragment is an example of stability windowing (time-frame based):
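A sketch (metric names and the exact nesting of the comparison metric section are assumptions inferred from the field tables later in this section):
```yaml
windowing:
  type: stability
  stability:
    metric: throughput
    labels:
      componentName: webapp
    resolution: 30s
    width: 5m
    maxStdDev: 10.0
  metric: response_time
  labels:
    componentName: webapp
  is: min
```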
The parametersSelection field in a study specifies which parameters of the system should be tuned while running the optimization study.
In case this selection is not specified, all parameters are considered.
A parameter selection can either assume the value all, to indicate that all the available parameters of the system of the study should be tuned, or be a list with items of the shape shown below:
Notice that, by default, every parameter specified in the parameters selection of a study is applied. This can be modified by leveraging the renderParameters and doNotRenderParameters options.
The following fragment is an example:
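A sketch (component and parameter names are illustrative):
```yaml
parametersSelection:
  - name: jvm.jvm_maxHeapSize
    domain: [512, 4096]
  - name: jvm.jvm_gcType
    categories: ["G1", "Parallel"]
```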
Optimization goals and constraints are defined using a YAML manifest with the following structure:
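A sketch of the overall structure (metric names are illustrative; constraints are shown as expressions, see the constraints description below for the exact form):
```yaml
goal:
  objective: maximize
  function:
    formula: webapp.throughput
  constraints:
    - webapp.error_rate <= 0.05
```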
where:
The function field of the goal of a study details the characteristics of the function Akamas should minimize or maximize to reach the desired performance objective.
The function field has the following structure:
Where:
The formula field represents the mathematical expression of the performance objective for the study and contains variables and operators with the following characteristics:
Valid operators are: +, -, *, /, ^, sqrt(variable), log(variable), max(variable1, variable2), min(variable1, variable2)
Valid variables are in the form:
<component_name>.<metric_name>, which correspond directly to metrics of Components of the System under test
<variable_name>, which should match variables specified in the variables field
The variables field contains the specification of additional variables present in the formula; such variables can offer more flexibility than directly specifying each metric of each component in the formula.
The variable subfield has the following structure:
It is possible to use the notation <component_name>.<metric_name> in the metric field, so that a filter on the metric's data points by that component name is automatically applied.
The constraints field specifies constraints on the metrics of the components of the system under test that need to be satisfied for a configuration to be valid with respect to the defined goal.
Each constraint has the form: mathematical_operation comparison_operator value_to_compare
where valid mathematical operations include: +, -, *, /, ^, min, max, sqrt, log (log is the natural logarithm)
valid comparison operators include: >, <, <=, >=, ==, != (equality, inequality)
and valid values to compare include:
absolute values (e.g., 104343)
percentage values relative to the baseline (e.g., 20%)
The following example refers to a study whose goal is to optimize the throughput of a Java service (jpetstore), that is to maximize the throughput (measured as elements_per_second) while keeping errors (error_rate) and latency (avg_duration, max_duration) under control (absolute values):
The following example refers to a study whose goal is to optimize the memory consumption of Docker containers in a microservices application, that is to minimize the average memory consumption of Docker containers within the application of appId="app1" by observing memory limits, also normalizing by the maximum duration of a benchmark (container_benchmark_duration).
Each metric that is directly or indirectly part of the formula of the goal function is aggregated by default by average; more specifically, Akamas computes the average of each metric within the time window specified by the windowing policy of the study.
Constraints always consider the average value of the specified metrics within the time window specified by the windowing policy of the study.
The function field has the following properties:
formula (string, required): the mathematical expression of what to minimize or maximize to reach the objective of the study (see formula above).
variables (object, optional): the specification of additional variables present in the formula (see below).
Each variable subfield has the following properties:
metric (string, required): the name of the metric of the components of the system under test that maps to the variable; it should match the name of a metric defined for the components of the system under test.
labels (set of key-value pairs, optional): a set of filters based on the values of the labels attached to the different data points of the metric. One of these labels is componentName, which contains the name of the component the metric refers to.
aggregation (string, one of MAX, MIN, AVG; optional, default: AVG): the strategy through which data points of the metric are aggregated within the window produced by the selected windowing strategy. By default, an average is taken.
The trim windowing policy has the following properties:
type (string, {trim}; required): the type of windowing strategy.
trim (array of strings, required): how to trim the temporal interval of a trial to get the window. The array must have length two, and valid values have the form of a whole number followed by "s", "m", or "h". ["0s", "10m"] means trim 0 seconds from the start of the interval and 10 minutes from the end; ["0s", "1h"] means trim 0 seconds from the start and 1 hour from the end.
task (string, optional): the name of a task of the workflow associated with the study. If this field is specified, the trim offset calculation for the window is applied from the start time of the assigned task; otherwise, it is calculated from the start time of the trial.
The stability windowing policy has the following properties:
type (string, {stability}; required): the type of windowing.
stability->metric (string, required): the metric whose stability is verified to exclude some temporal intervals over the duration of a trial. It should match the name of an existing metric monitored by Akamas.
stability->labels (set of key-value pairs, optional): filtering conditions for retrieving the value of the metric. These conditions can be used to consider the right metric of the right component: you can filter by componentName or by other custom properties defined in the components of the system of the study.
stability->resolution (string, optional, default: 0s): the temporal resolution at which Akamas aggregates data points to determine feasible windows. Valid values are in the form 30s, 40m, 2h, where s refers to seconds, m to minutes, and h to hours.
stability->width (integer or string, required): the width of the temporal intervals over the duration of the trial that are checked for the stability of the metric. The width can be sample-based (an integer > 1) or time-frame-based (a string such as 30s, 40m, or 2h, as specified for stability->resolution).
stability->maxStdDev (double, required): the stability condition, i.e., the maximum standard deviation among the values of the data points of the metric tolerated for a temporal interval of size width; otherwise, the temporal interval is discarded.
The comparison metric section has the following properties:
metric (string, required): the metric whose value is analyzed to include or exclude temporal intervals over the duration of a trial, when the reference metric is stable. It should match the name of an existing metric monitored by Akamas.
labels (set of key-value pairs, optional): filtering conditions for retrieving the value of the metric, as for stability->labels.
is (string, {min, max}; required): whether the value of the metric should be maximum or minimum in the temporal intervals to be included.
A workflow task is described by the following properties:
name (string, required): the name of the task.
operator (string, required): the operator the task implements; the chosen operator affects the available arguments.
critical (boolean, optional, default: true): when set to true, task failure determines workflow failure.
alwaysRun (boolean, optional, default: false): when set to true, the task is executed regardless of workflow failure.
collectMetricsOnFailure (boolean, optional, default: false): when set to true, failure of the task does not prevent metrics collection.
arguments (list, required): the arguments required by the operator to run; these are determined by the operator choice.
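As a sketch, a workflow manifest listing tasks with these properties might look like the following (task names and commands are illustrative; the Executor operator is documented later in this section):
```yaml
name: my-workflow
tasks:
  - name: apply configuration
    operator: Executor
    arguments:
      command: ./apply-config.sh
      component: webapp
  - name: run load test
    operator: Executor
    critical: true
    arguments:
      command: ./run-load.sh
      component: loadgen
```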
Each item of a parameters selection has the following properties:
name (string, required): the name of the parameter to be tuned, including the name of the component it refers to. It should match the syntax component_name.parameter_name, where component_name is an existing component and parameter_name is an existing parameter associated with the component type of that component.
domain (array of numbers, optional): a custom domain for the parameter, used only for the study. It should have size 2, contain either all integers or all real numbers (do not omit the "."), be set only if the parameter has a domain of type integer or real, and be compatible with the domain defined in the component type the component refers to.
categories (array of strings, optional): a custom set of categories for the parameter, used only for the study. It should be set only if the parameter has a domain of type categorical, and be compatible with the domain defined in the component type the component refers to.
The goal has the following properties:
objective (string, one of minimize or maximize; required): how Akamas should evaluate the goodness of a generated configuration: whether a configuration that maximizes the function should be considered good, or one that minimizes it.
function (object, required): the specification of the function to be evaluated to assess the goodness of a configuration generated by Akamas, structured as described in Goal function. This function is a function of the metrics of the different components of the system under test.
constraints (list of objects, optional): a list of constraints on aggregated metrics of the components of the system under test, structured as described in Goal constraints, outside of which a generated configuration is not considered valid.
The metricsSelection field in a study specifies which metrics of the system need to be tracked while running the study; this selection does not affect the optimization.
In case this selection is not specified, all metrics are considered.
A metrics selection can either assume the value all, to indicate that all the available metrics of the system of the study should be tracked, or be a list of the names of the metrics to be tracked, each prepended with the name of the component.
The following fragment is an example:
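A sketch (component and metric names are illustrative):
```yaml
metricsSelection:
  - webapp.throughput
  - webapp.response_time
```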
The workloadsSelection field is a structure used to define the metrics used by Akamas to model workloads as part of a live optimization study, with the following fields:
name (string, required): the metric of the component that represents the workload. It should match the syntax component_name.metric_name, where component_name is an existing component and metric_name is an existing metric associated with the component type of that component.
domain (array of numbers, optional): a custom domain for the metric, to be used by the optimization study. It should have size 2, contain either all integers or all real numbers (do not omit the "."), be set only if the metric has a domain of type integer or real, and be compatible with the domain defined in the component type the component refers to.
Notice that, while the domain can be used to specify the maximum and minimum values for a workload metric, specifying static workload domains is not recommended, as Akamas can define them dynamically by using the observed minimum and maximum values of the metric.
Also notice that workload metrics must have been defined in the metricsSelection.
The following refers to a workload represented by the metric transactions_throughput of the konakart component:
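```yaml
workloadsSelection:
  - name: konakart.transactions_throughput
```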
The steps field in a study specifies the sequence of steps executed while running the study. The steps are run in exactly the order in which they are listed.
The following types of steps are available:
Baseline: performs an experiment and sets it as a baseline for all the other ones
Bootstrap: imports experiments from other studies
Preset: performs an experiment with a specific configuration
Optimize: performs experiments and generates optimized configurations
Notice that the structure corresponding to the steps is different for the different types of steps.
The kpis field in a study specifies which metrics should be considered as KPIs for an offline optimization study.
In case this selection is not specified, all metrics mentioned in the goal and constraints of the optimization study are considered.
A KPI is defined as follows:
name (string, optional, default: <metric_name>): the label that will be used in the UI; it should match a component metric.
formula (string, required): the metric name associated with a component; it must be defined as <component_name>.<metric_name>.
direction (string, one of minimize or maximize; required): the direction corresponding to the metric.
aggregation (string, a valid metric aggregation such as min, max, avg, sum, p95, etc.; optional, default: avg): the aggregation applied to the metric.
Notice that the textual badge displayed in the Akamas UI uses "Best <name>".
The following fragment is an example of a list of KPIs:
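A sketch (names and metrics are illustrative):
```yaml
kpis:
  - name: Response Time
    formula: konakart.response_time
    direction: minimize
    aggregation: avg
  - name: Throughput
    formula: konakart.transactions_throughput
    direction: maximize
```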
The renderParameters and doNotRenderParameters fields can be used to specify which configuration parameters should be rendered when doing experiments within a step.
Parameter rendering can be defined at the step level for baseline, preset, and optimize steps. This is not possible for bootstrap steps as bootstrapped experiments are not executed.
renderParameters (array of strings, optional): which configuration parameters should be rendered or applied when doing experiments/trials, in addition to the ones in the parameters selection or in the values if the step is of type baseline or preset. Strings have the form component.parameter or component. (component. means every parameter of the component).
doNotRenderParameters (array of strings, optional): which configuration parameters should not be rendered or applied when doing experiments/trials. Strings have the same form as for renderParameters.
The following baseline step specifies that every parameter of the component 'os' should not be rendered while the parameter 'cpu_limit' of the component 'docker' should be rendered:
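A sketch (the configuration values of the baseline step are omitted):
```yaml
- name: baseline
  type: baseline
  renderParameters: ["docker.cpu_limit"]
  doNotRenderParameters: ["os."]
```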
The following preset step specifies that the parameter 'cpu_limit' of the component 'docker' should be rendered:
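A sketch (the configuration values of the preset step are omitted):
```yaml
- name: preset
  type: preset
  renderParameters: ["docker.cpu_limit"]
```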
The following optimize step specifies that every parameter of the component 'os' should not be rendered:
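A sketch:
```yaml
- name: optimize
  type: optimize
  doNotRenderParameters: ["os."]
```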
This page introduces the OracleConfigurator operator, a workflow operator that allows configuring the optimized parameters of an Oracle instance.
This section provides the minimum requirements that you should meet in order to use the OracleConfigurator operator.
Oracle 12c or later
The Oracle operator must be able to connect to the Oracle URL or IP address and port (default port: 1521).
The user used to log into the database must have ALTER SYSTEM privileges.
In order to configure the tuned parameters, the OracleConfigurator operator must be bound to a component with one of the following types:
Oracle Database 12c
Oracle Database 18c
Oracle Database 19c
Databases hosted on Amazon RDS are not supported.
When you define an OracleConfigurator task in the workflow, you should specify some configuration information to allow the operator to connect to the Oracle instance.
You can specify configuration information within the config part of the YAML of the instance definition. The operator can also inherit some specific arguments from the properties of a bound component when they are not specified in the task.
The following table describes all the properties for the definition of a task using the OracleConfigurator operator.
Each of the following properties can be defined either in the task or in the bound component, except where noted:
connection.dsn (string): DSN or EasyConnect string. It is possible to define only one of the following sets of connection properties: dsn; or host, service, and optionally port; or host, sid, and optionally port.
connection.host (string): address of the database instance.
connection.port (integer, default: 1521): listening port of the database instance.
connection.service (string): database service name.
connection.sid (string): database SID.
connection.user (string, required): user name.
connection.password (string, required): user password.
connection.mode (string, one of sysdba or sysoper): connection mode.
component (string, required, task only): name of the component from which to fetch properties and parameters.
In the following example, the workflow leverages the OracleConfigurator operator to update the database parameters before triggering the execution of the load test for a component oracledb:
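A sketch of such a workflow (task names and the load-test command are illustrative):
```yaml
name: oracle-workflow
tasks:
  - name: configure oracle
    operator: OracleConfigurator
    arguments:
      component: oracledb
  - name: run load test
    operator: Executor
    arguments:
      command: ./run-load-test.sh
      component: loadgen
```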
The WindowsExecutor operator executes a command on a target Windows machine using WinRM.
The command can be anything that runs on a Windows Command Prompt.
command (string, required): the command to be executed on the remote machine.
host (object, optional): information relative to the target machine on which the command has to be executed; it should have the structure described below.
component (string, optional): the name of the component whose properties can be used as arguments of the operator; it should match the name of an existing component of the system under test.
host structure and arguments
Here follows the structure of the host argument, with its arguments:
protocol (string, one of https or http; default: https): the protocol used to connect to the Windows machine with WinRM. Required if the component defined in component has no property named host->protocol.
hostname (string, a valid FQDN or IP address): the Windows machine's hostname. Required if the component has no property named host->hostname.
port (number, 1 ≤ port ≤ 65532; default: 5863): the WinRM port. Required if the component has no property named host->port.
path (string, default: /wsman): the path where WinRM is listening. Required if the component has no property named host->path.
username (string, in the form username, domain\username, or username@domain): the user login (domain or local). Required if the component has no property named host->username.
password (string): the login password. Required if the component has no property named host->password.
authType (string, one of ntlm or ssl; default: ntlm): the authentication method to use against the Windows machine. Required if the component has no property named host->authType.
validateCertificate (boolean, default: false): whether or not to validate the server certificate. Required if the component has no property named host->validateCertificate.
ca (string, a valid CA certificate): the CA required to validate the server certificate. Required if the component has no property named host->ca.
operationTimeoutSec (integer, > 0; optional): the amount of time in seconds after which the execution of the command is considered failed. Notice that the output of the command doesn't reset the timeout.
readTimeoutSec (integer, > operationTimeoutSec; optional): the number of seconds to wait before an HTTP connect/read times out.
component
The component argument can refer to a component by name and use its properties as the arguments of the operator. In case the mapped arguments are already provided to the operator, there is no override.
Here is an example of a component that overrides the host and the command arguments:
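A sketch (component-type name and all values are illustrative):
```yaml
name: winserver
description: A Windows target machine
componentType: Windows
properties:
  host:
    hostname: win01.mycompany.com
    username: mydomain\akamas
    password: mypassword
  command: C:\scripts\apply.bat
```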
The SSHSparkSubmit operator connects to a Spark instance invoking a spark-submit on a machine reachable via SSH.
The SSHSparkSubmit operator accepts the following arguments:
file (string, required): the Spark application to submit (jar or python file); it should be a path to a valid Java or Python Spark application file.
args (list of strings, numbers, or booleans; required): additional application arguments.
master (string, required): the master URL for the Spark cluster. It should be a valid supported master URL: local, local[K], local[K,F], local[*], local[*,F], spark://HOST:PORT, spark://HOST1:PORT1,HOST2:PORT2, yarn.
deployMode (string, one of client or cluster; optional, default: cluster): whether to launch the driver locally (client) or in the cluster (cluster).
className (string, optional): the entry point of the Java application; required for Java applications.
name (string, optional): the name of the task. When submitted, the ids of the study, experiment, and trial will be appended.
jars (list of strings, optional): a list of jars to be added to the classpath; each item should be a path matching an existing jar file.
pyFiles (list of strings, optional): a list of Python scripts to be added to the PYTHONPATH; each item should be a path matching an existing Python file.
files (list of strings, optional): a list of files to be added to the context of the spark-submit command; each item should be a path matching an existing file.
conf (object of key-value pairs, optional): a mapping containing additional Spark configurations; see the Spark documentation.
envVars (object of key-value pairs, optional): environment variables set when running the spark-submit command.
verbose (boolean, optional, default: true): whether additional debugging output should be produced.
sparkSubmitExec (string, optional, default: the default for the Spark installation): the path of the spark-submit executable; it should be a path matching an existing executable.
sparkHome (string, optional, default: the default for the Spark installation): the path of the SPARK_HOME; it should be a path matching an existing directory.
proxyUser (string, optional): the user to be used to execute Spark applications.
hostname (string, a valid SSH host address): the SSH host address; not required if the component defined in component has a property named hostname.
username (string): the SSH login username; not required if the component has a property named username.
sshPort (number, 1 ≤ sshPort ≤ 65532; optional, default: 22): the SSH port.
password (string): the SSH login password; it cannot be set if key is already set; not required if the component has a property named password.
key (string): the SSH login key, provided either directly as its value or as the path of the file to import it from; the operator supports RSA and DSA keys; it cannot be set if password is already set; not required if the component has a property named key.
component (string, required): the name of the component whose properties can be used as arguments of the operator; it should match the name of an existing component of the system under test.
component
This operator automatically maps some properties of its component to operator arguments, as listed below. In case the mapped arguments are already provided to the operator, there is no override.
Component property → operator argument: hostname → hostname, username → username, sshPort → sshPort, password → password, key → key.
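A sketch of a task using this operator (paths and class name are illustrative):
```yaml
- name: submit spark application
  operator: SSHSparkSubmit
  arguments:
    component: sparkmaster
    master: yarn
    deployMode: cluster
    file: /opt/jobs/myapp.jar
    className: com.mycompany.MySparkApp
    args: [100]
```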
The Executor Operator can be used to execute a shell command on a target machine using SSH.
command (string, required): the shell command to be executed on the remote machine.
host (object, optional): information relative to the target machine on which the command has to be executed using SSH; see the structure documented below.
component (string, optional): the name of the component whose properties can be used as arguments of the operator; it should match the name of an existing component of the system under test.
detach (boolean, optional, default: false): the execution mode of the shell command. With the default (false) the execution is synchronous; with detached execution (true) it is asynchronous and returns immediately.
Host structure and arguments

Here follows the structure of the host argument (a minimal sketch is shown below), followed by its arguments:
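A minimal sketch of this structure (all values are hypothetical):

```yaml
host:
  hostname: server.mycompany.com   # SSH endpoint
  username: akamas                 # SSH login username
  sshPort: 22
  key: /home/akamas/.ssh/id_rsa    # alternatively, set password instead of key
```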
| Name | Type | Values restrictions | Required | Default | Description |
| --- | --- | --- | --- | --- | --- |
| hostname | String | It should be a valid SSH host address | No, if the Component whose name is defined in component has a property named hostname | | SSH endpoint |
| username | String | | No, if the Component whose name is defined in component has a property named username | | SSH login username |
| password | String | Cannot be set if key is already set | No, if the Component whose name is defined in component has a property named password | | SSH login password |
| sshPort | Number | 1 ≤ sshPort ≤ 65532 | No | 22 | SSH port |
| key | String | Cannot be set if password is already set | No, if the Component whose name is defined in component has a property named key | | SSH login key. Either provide the key value directly or specify the path of the file (local to the CLI executing the create command) to read the key from. The operator supports RSA and DSA keys. |
component structure and arguments

The component argument can be used to refer to a component by name and use its properties as the arguments of the operator (see the mapping here below). In case the mapped arguments are already provided to the operator, there is no override.
| Component property | Operator argument |
| --- | --- |
| hostname | host->hostname |
| username | host->username |
| sshPort | host->sshPort |
| password | host->password |
| key | host->key |
Let's assume you want to run a script on a remote host, expecting it to complete successfully within 30 seconds but to fail from time to time. In this case you can launch the script, wait for its completion, and in case of failure or timeout retry 3 times, waiting 10 seconds between retries:
Execute a uname command with explicit host information (explicit SSH key)
Execute a uname command with explicit host information (imported SSH key)
Execute a uname command with host information taken from a Component (see the sketch below)
Start a load-testing script and keep it running in the background during the workflow
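As an illustration of the third example above, here is a minimal sketch of a task executing uname with host information taken from a Component (task and component names are hypothetical):

```yaml
- name: check kernel
  operator: Executor
  arguments:
    command: uname -a
    component: webserver   # component providing hostname, username and key/password
```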
Due to the stderr configuration, invoking a bash script on a server may yield a different result than running the same script from the Akamas Executor Operator. This is quite common with Tomcat startup scripts like $HOME/tomcat/apache-tomcat_1299/bin/startup.sh.

To avoid this issue, simply create a wrapper bash file on the target server that adds the set -m instruction before the sh command, and then configure the Executor Operator to run the wrapper script, as in the sketch below.
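A minimal sketch of this setup, with the wrapper content shown as comments; paths and the component name are hypothetical:

```yaml
- name: start tomcat
  operator: Executor
  arguments:
    # the wrapper script on the target server contains:
    #   set -m
    #   sh $HOME/tomcat/apache-tomcat_1299/bin/startup.sh
    command: sh $HOME/tomcat_wrapper.sh
    component: tomcat   # component providing the SSH connection properties
```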
You can emulate the behavior of Akamas running scripts over SSH by invoking the script through a non-interactive SSH session from another machine (e.g. ssh user@host 'sh myscript.sh').
There are cases in which you would like to keep a script running for the whole duration of the test. Some examples could be:
A script applying load to your system for the duration of the workflow
The manual start of an application to be tested
The setup of a listener that gathers logs, metrics or data
In all the instances where you need to keep a task running beyond the task that started it, you must use the detach: true property.

Note that a detached executor task returns immediately, so you should run only the background task in detached mode, and keep all tasks requiring synchronous (standard) behavior out of the detached task.
Example:
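A minimal sketch, with hypothetical script paths and component names; the load test runs detached while the next task executes synchronously:

```yaml
- name: start load test
  operator: Executor
  arguments:
    command: sh /opt/load/run_load.sh   # hypothetical load-testing script
    component: loadgenerator            # hypothetical component
    detach: true                        # returns immediately, the script keeps running
- name: run benchmark window
  operator: Executor
  arguments:
    command: sleep 300                  # synchronous task executed while the load runs
    component: loadgenerator
```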
Library references
The library used to execute scripts remotely is Fabric, a high-level Python library designed to execute shell commands remotely over SSH, yielding useful Python objects in return.
The Fabric library uses a connection object to execute scripts remotely (see connection — Fabric documentation). The option of a dedicated detach mode comes from the implementation of the more robust disown property of the Invoke Runner underlying the Connection (see runners — Invoke documentation). This is the reason why you should rely on detach whenever possible, instead of running background processes straight from the script.
In the Frequently Asked/Answered Questions (FAQ) — Fabric documentation you may find further information about typical hanging problems with background processes and their solutions.
This page introduces the OracleExecutor operator, a workflow operator that allows executing custom queries on an Oracle instance.
This section provides the minimum requirements that you should meet in order to use the Oracle Executor operator.
Oracle 12c or later
The OracleExecutor operator must be able to connect to the Oracle URL or IP address and port (default port is 1521)
The user used to log into the database must have enough privilege to perform the required queries
When you define a task that uses the Oracle Executor operator you should specify some configuration information to allow the operator to connect to the Oracle instance and execute queries.
The operator inherits the connection arguments from the properties of the component when it is referenced in the task definition. The Akamas user can also override the properties of the component, or not reference it at all, by defining the connection fields directly in the configuration of the task.
The following table provides the list of all properties required to define a task that uses the OracleExecutor operator.
| Field | Type | Description | Values restrictions | Default | Required | Can be set in |
| --- | --- | --- | --- | --- | --- | --- |
| connection.dsn | String | The DSN or EasyConnect string | It is possible to define only one of the following sets of configurations: dsn; host, service and optionally port; host, sid and optionally port | | | task, component |
| connection.host | String | The address of the database instance | | | | task, component |
| connection.port | Integer | The listening port of the database instance | | 1521 | | task, component |
| connection.service | String | The database service name | | | | task, component |
| connection.sid | String | The database SID | | | | task, component |
| connection.user | String | The user name | | | Yes | task, component |
| connection.password | String | The user password | | | Yes | task, component |
| connection.mode | String | The connection mode | sysdba, sysoper | | | task, component |
| sql | List[String] | The list of queries to update the database status before or after the workload execution. Queries can be templatized, containing tokens referencing parameters of any component in the system. | | | Yes | task |
| autocommit | Boolean | A flag to enable the auto-commit feature | | False | No | task |
| component | String | The name of the component to fetch properties from | | | No | task |
Notice: it is a good practice to define only queries that update the state of the database. It is not possible to use SELECT queries to extract data from the database.
In the following example the operator performs a cleanup action on a table of the database:
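A minimal sketch of such a task; the table name and component are hypothetical:

```yaml
- name: cleanup table
  operator: OracleExecutor
  arguments:
    component: oracledb                  # component providing the connection properties
    sql:
      - TRUNCATE TABLE myschema.orders   # hypothetical cleanup query
```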
In the following example, the operator leverages its templating features to update a table:
The referenced oracledb component contains properties that specify how to connect to the Oracle database instance:
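A minimal sketch of such a component definition; the component type name and all connection values are hypothetical:

```yaml
name: oracledb
description: Oracle database instance under test
componentType: Oracle Database   # hypothetical component type
properties:
  connection:
    host: oracle.mycompany.com
    port: 1521
    service: ORCLPDB1
    user: system
    password: myPassword
```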
A baseline step performs an experiment (a baseline experiment) and marks it as the initial experiment of a study. The purpose of this step is to build a reference configuration that Akamas can use to measure the effectiveness of an optimization conducted on a system.
A baseline step offers three options when it comes to selecting the configuration of the baseline experiment:
Use a configuration made with the default values of the parameters taken from the system of the study
Use a configuration taken from an experiment of another study
Use a custom configuration
The baseline step has the following structure:
| Name | Type | Values restrictions | Required | Default | Description |
| --- | --- | --- | --- | --- | --- |
| type | string | baseline | yes | | The type of the step, in this case baseline |
| name | string | | yes | | The name of the step |
| runOnFailure | boolean | true, false | no | false | The execution policy of the step: false prevents the step from running in case the previous step failed; true allows the step to run even if the previous step failed |
| from | array of objects | Each object should have the structure described below. The from and experiments fields are defined as arrays, but can only contain one element. This can be set only if values is not set. | no | | The study and the experiment from which to take the configuration of the baseline experiment |
| values | object | The keys should match existing parameters. This can be set only if from is not set. | no | | The configuration with which to execute the baseline experiment |
| doNotRenderParameters | string | This cannot be used with the from option, since no experiment is actually executed | no | | Parameters not to be rendered |
| renderParameters | string | This cannot be used with the from option, since no experiment is actually executed | no | | Parameters to be rendered |
where the from field should have the following structure:
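A minimal sketch of this structure, with a hypothetical study name and experiment number:

```yaml
from:
  - study: another-study   # name or id of the source study
    experiments: [1]       # number of the experiment to take the configuration from
```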
with:
- study: contains the name or id of the study from which to take the configuration
- experiments: contains the number of the experiment from which to take the configuration
Using default values for the baseline configuration only requires setting the name and type fields:
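A minimal sketch of such a step:

```yaml
- type: baseline
  name: baseline
```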
Using a configuration taken from another study as a baseline only requires setting the from field:
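A minimal sketch, with a hypothetical study name and experiment number:

```yaml
- type: baseline
  name: baseline
  from:
    - study: another-study
      experiments: [3]
```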
Notice: the from and experiments fields are defined as arrays, but can only contain one element.
Using a custom configuration for the baseline only requires setting the values field:
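A minimal sketch, with hypothetical parameter names and values:

```yaml
- type: baseline
  name: baseline
  values:
    jvm.jvm_maxHeapSize: 1024
    jvm.jvm_gcType: G1
```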
A preset step performs a single experiment with a specific configuration. The purpose of this step is to help you quickly understand how good a particular configuration is.
A preset step offers two options when selecting the configuration of the experiment to be executed:
Use a configuration taken from an experiment of a study (can be the same study)
Use a custom configuration
The preset step has the following structure:
| Name | Type | Values restrictions | Required | Default | Description |
| --- | --- | --- | --- | --- | --- |
| type | string | preset | yes | | The type of the step, in this case preset |
| name | string | | yes | | The name of the step |
| runOnFailure | boolean | true, false | no | false | The execution policy of the step: false prevents the step from running in case the previous step failed; true allows the step to run even if the previous step failed |
| from | array of objects | Each object should have the structure described below. The from and experiments fields are defined as arrays, but can only contain one element. This can be set only if values is not set. | no | | The study and the experiment from which to take the configuration of the experiment |
| values | object | The keys should match existing parameters. This can be set only if from is not set. | no | | The configuration with which to execute the experiment |
| doNotRenderParameters | string | This cannot be used with the from option, since no experiment is actually executed | no | | Parameters not to be rendered |
| renderParameters | string | This cannot be used with the from option, since no experiment is actually executed | no | | Parameters to be rendered |
where the from field should have the following structure:
with:
- study: contains the name or id of the study from which to take the configuration; if omitted, configurations are taken from experiments of the same study the step belongs to
- experiments: contains the number of the experiment from which to take the configuration
You can provide a custom configuration by setting values:
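A minimal sketch, with a hypothetical parameter name and value:

```yaml
- type: preset
  name: try candidate configuration
  values:
    jvm.jvm_maxHeapSize: 2048
```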
You can select a configuration taken from another study by setting from:
You can select a configuration taken from the same study by setting from but omitting the study field:
Notice: the from and experiments fields are defined as arrays, but can only contain one element.
An optimize step generates optimized configurations according to the defined optimization strategy. During this step, Akamas AI is used to generate such optimized configurations.
The optimize step has the following structure:
| Name | Type | Values restrictions | Required | Default | Description |
| --- | --- | --- | --- | --- | --- |
| type | string | optimize | yes | | The type of the step, in this case optimize |
| name | string | | yes | | The name of the step |
| runOnFailure | boolean | true, false | no | false | The execution policy of the step: false prevents the step from running in case the previous step failed; true allows the step to run even if the previous step failed |
| numberOfExperiments | integer | numberOfExperiments > 0 and numberOfExperiments >= numberOfInitExperiments | yes | | The number of experiments to execute - see below |
| numberOfTrials | integer | numberOfTrials > 0 | no | 1 | The number of trials to execute for each experiment |
| numberOfInitExperiments | integer | numberOfInitExperiments < numberOfExperiments | no | 10 | The number of initialization experiments to execute - see below |
| maxFailedExperiments | integer | maxFailedExperiments > 1 | no | 30 | The number of experiment failures (as either workflow errors or constraint violations) to accept before the step is marked as failed |
| optimizer | string | AKAMAS, SOBOL, RANDOM | no | AKAMAS | The type of optimizer to use to generate the configuration of the experiments - see below |
| optimizerOptions | object | see below | no | | Some options for the AKAMAS optimizer - see below |
| doNotRenderParameters | string | | no | | Parameters not to be rendered |
| renderParameters | string | | no | | Parameters to be rendered |
The optimizer field allows selecting the desired optimizer:
- AKAMAS identifies the standard AI optimizer used by Akamas
- SOBOL identifies an optimizer that generates configurations using Sobol sequences
- RANDOM identifies an optimizer that generates configurations using random numbers

Notice that the SOBOL and RANDOM optimizers do not perform initialization experiments, hence the field numberOfInitExperiments is ignored.
For offline optimization studies, the optimizerOptions field can be used to specify whether beta-warping optimization (a more sophisticated optimization that requires a longer time) should be used, and for how many experiments (as a percentage):
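A minimal sketch of this option:

```yaml
optimizerOptions:
  experimentsWithBeta: 50%   # a percentage, or an absolute number of experiments
```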
where experimentsWithBeta can be:
- a percentage between 0 and 100%
- a number less than or equal to numberOfExperiments

Please notice that the safetyFactor option, discussed below in the context of live optimization studies, can also be applied to offline optimization studies.
For live optimization studies, the optimizerOptions field can be used to specify several important parameters governing the live optimization, which can be defined at the study level and also overridden at the step level (only for steps of type optimize):
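A minimal sketch of these options; the values shown are illustrative only:

```yaml
optimizerOptions:
  onlineMode: RECOMMEND                  # or FULLY_AUTONOMOUS
  safetyMode: GLOBAL                     # or LOCAL
  safetyFactor: 0.6
  explorationFactor: 0.05                # illustrative value
  workloadOptimizedForStrategy: MAXIMIN  # or MEDIAN, LAST, MOST_VIOLATED
```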
Notice that, while available as independent options, the optimizer options onlineMode, workloadOptimizedForStrategy, and safetyMode (each described below) work in conjunction according to the following schema:

| onlineMode | safetyMode | workloadOptimizedForStrategy |
| --- | --- | --- |
| RECOMMEND | GLOBAL | MAXIMIN |
| FULLY_AUTONOMOUS | LOCAL | LAST |
All these optimizer options can be changed at any time, even while the optimization study is running, and become immediately effective. The Optimizer options commands page in the reference guide provides these specific update commands.
The onlineMode field specifies how the Akamas optimizer should operate:
- RECOMMEND: configurations are recommended to the user by Akamas and are only applied after having been approved (and possibly modified) by the user;
- FULLY_AUTONOMOUS: configurations are immediately applied by Akamas.

The safetyMode field describes how the Akamas optimizer should evaluate the goal constraints on a candidate configuration for that configuration to be considered valid:
- GLOBAL: the constraints must be satisfied by the configuration under all observed workloads in the configuration history - this is the value taken in case onlineMode is set to RECOMMEND;
- LOCAL: the constraints are evaluated only under the workload selected according to the workload strategy - this should be used with onlineMode set to FULLY_AUTONOMOUS.
Notice that when setting the safetyMode to LOCAL, the recommended configuration is only expected to be good for the specific workload selected under the defined workload strategy, but it might violate constraints under another workload.
The workloadOptimizedForStrategy field specifies the workload strategy that drives how Akamas leverages the workload information when looking for the next configuration:
- MAXIMIN: the optimizer looks for a configuration that maximizes the minimum improvement across all the already observed workloads;
- MEDIAN: for each workload, the median of all its values is considered - this works well to find a configuration that is good for the median of all the workloads;
- LAST: for each workload, the last observed value is considered - this works well to find a configuration that is good for the most recent workloads, and is often used in conjunction with a LOCAL safety mode (see above);
- MOST_VIOLATED: for each workload, the workload of the configuration which results in the most violations is considered.
The safetyFactor field specifies how much the optimizer should stay on the safe side when evaluating a candidate configuration with respect to the goal constraints. A higher safety factor corresponds to a safer configuration, that is, a configuration that is less likely to violate goal constraints.

Acceptable values are all the real values ranging between 0 and 1, with (safetyFactor - 0.5) representing the allowed margin for staying within the defined constraint:
- 0 means "no safety": the optimizer totally ignores goal constraint violations;
- 0.5 means "safe, but no margin": the optimizer only tries configurations that do not violate the goal constraints, while remaining as close as possible to them;
- 1 means "super safe": the optimizer only tries configurations that are very far from the goal constraints.

For live optimization studies the default value is 0.6, while for offline optimization studies it is 0.5.
The explorationFactor field specifies how much the optimizer explores the (unknown) optimization space when looking for new configurations. For any parameter, this factor measures the delta between already tried values and the value of a new candidate configuration. A higher exploration factor corresponds to a broader exploration of never-tried-before parameter values.

Acceptable values are all the real values ranging between 0 and 1, plus the special string FULL_EXPLORATION:
- 0 means "no exploration": the optimizer chooses a value among the previously seen values for each parameter;
- 1 means "full exploration, except for categories": for a non-categorical parameter any value among all its domain values can be chosen, while for a categorical parameter only values (categories) that have already been seen in previous configurations are chosen;
- FULL_EXPLORATION means "full exploration, including categories": the optimizer chooses any value among all domain values, including categories never seen in previous configurations.

In case the desired explorationFactor is 1 but some specific parameters also need to be explored with respect to all of their categories, then PRESET steps (refer to the Preset step page) can be used to run an optimization study with these values. For an example of a live optimization study where this approach is adopted, see Optimizing a live full-stack deployment (K8s + JVM).
An optimize step is fault-tolerant and tries to relaunch experiments when they fail. Nevertheless, the step imposes a limit on the number of failed experiments: if too many experiments fail, then the entire step fails too. By default, at most 30 experiments can fail while Akamas is optimizing systems. An experiment is considered failed when it either failed to run (i.e., there is an error in the workflow) or violated some constraint.
An optimize step launches some initialization experiments that do not use the AI optimizer and are meant to identify good configurations to bootstrap the optimization. By default, the step performs 10 initialization experiments. Initialization experiments take into account bootstrapped experiments, experiments executed in preset steps, and baseline experiments.
The following fragment refers to an optimization study that runs 50 experiments using the SOBOL optimizer:
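A minimal sketch of such a step:

```yaml
- type: optimize
  name: optimize
  numberOfExperiments: 50
  optimizer: SOBOL
```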
The following fragment refers to an optimization where 50% of the experiments need to use the beta-warping option, which enables a more sophisticated but longer optimization:
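A minimal sketch of such a step:

```yaml
- type: optimize
  name: optimize
  numberOfExperiments: 50
  optimizerOptions:
    experimentsWithBeta: 50%
```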
The NeoLoadWeb operator allows piloting performance tests on a target system by leveraging the Tricentis NeoLoad Web solution.
Once triggered, this operator configures and starts the execution of a NeoLoad test run on the remote endpoint. If the test is unable to run, the operator blocks the Akamas workflow by issuing an error.
This operator requires five pieces of information to successfully pilot performance tests within Akamas:
The location of a .zip archive (project file) containing the definition of the performance test. This location can be a URL accessible via HTTP/HTTPS or a file path accessible via SFTP. Otherwise, the unique identifier of a previously uploaded project must be provided.
The name of the scenario to be used for the test
The URL of the NeoLoad Web API (either on-premise or SaaS)
The URL of the NeoLoad Web API for uploading project files
The account token used to access the NeoLoad Web APIs
When a projectFile is specified, the operator uploads the provided project to NeoLoad and launches the specified scenario; after the execution of the scenario, the project is deleted from NeoLoad. When a projectId is specified, the operator expects the project to be already available on NeoLoad. Please refer to the official NeoLoad documentation on how to upload a project and obtain a project ID.
| Name | Type | Values restrictions | Required | Description |
| --- | --- | --- | --- | --- |
| scenarioName | String | It should match an existing scenario in the project file; it can be retrieved from the "runtime" section of your NeoLoad controller | No, if the component whose name is defined in component has a property that maps to scenarioName | The name of the scenario to be used for the performance test piloted by Akamas |
| projectId | String | It should be a valid UUID | No, if a projectFile is already defined | The identifier of a previously uploaded project file. Has precedence over projectFile |
| projectFile | Object | It should have a structure like the one described here below | No, if a projectId is already defined | The specification of the strategy to be used to get the archive containing the specification of the performance test to be piloted by Akamas. When defined, projectId has precedence. |
| neoloadProjectFilesApi | String | It should be a valid URL or IP | No | The address of the API to be used to upload project files to NeoLoad Web |
| neoloadApi | String | It should be a valid URL or IP | No | The address of the Neotys NeoLoad Web API |
| lgZones | String | Comma-separated list of zones and number of LGs | No | The list of LG zone ids with the number of LGs. Example: "ZoneId1:10,ZoneId2:5". If empty, the default zone will be used with one LG. |
| controllerZoneId | String | A controller zone id | No | The controller zone id. If empty, the default zone will be used. |
| component | String | It should match the name of an existing component of the System under test | No | The name of the component whose properties can be used as arguments of the operator. |
| accountToken | String | It should match an existing access token registered with NeoLoad Web | No, if specified in the component (see example below) | The token to be used to authenticate requests against the NeoLoad Web APIs |
ProjectFile structure and arguments

The projectFile argument needs to be specified differently depending on the protocol used to get the specification of the performance test: HTTP/HTTPS or SSH (SFTP).

Here follows the structure of the projectFile argument in the case in which HTTP/HTTPS is used, with its arguments:
| Name | Type | Values restrictions | Required | Default | Description |
| --- | --- | --- | --- | --- | --- |
| url | String | It should be a valid URL or IP | Yes | | The URL of the project file |
| verifySSL | Boolean | | No | true | Whether the HTTPS connection should be verified using the certificates available on the machine on which the operator is running |
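A minimal sketch of this structure, with a hypothetical URL:

```yaml
projectFile:
  http:
    url: https://repo.mycompany.com/neoload/project.zip
    verifySSL: true
```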
Here follows the structure of the projectFile argument in the case in which SFTP is used to get the specification of the performance test, with its arguments:
| Name | Type | Values restrictions | Required | Default | Description |
| --- | --- | --- | --- | --- | --- |
| hostname | String | It should be a valid SSH host address | Yes | | SSH host address |
| username | String | | Yes | | SSH login username |
| password | String | | No; either password or key should be provided | | SSH login password |
| sshPort | Number (integer) | 1 ≤ sshPort ≤ 65532 | No | 22 | SSH port |
| key | String | | No; either password or key should be provided | | SSH login key, provided either directly as its value or as the path of the file to import it from. The operator supports RSA and DSA keys. |
| path | String | It should be a valid path on the SSH host machine | Yes | | The path of the project file |
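A minimal sketch of this structure, with hypothetical host, credentials, and path:

```yaml
projectFile:
  ssh:
    hostname: controller.mycompany.com
    username: akamas
    key: /home/akamas/.ssh/id_rsa   # alternatively, set password
    path: /opt/neoload/project.zip
```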
component structure and arguments

The component argument can be used to refer to a component by name and use its properties as the arguments of the operator, according to the following mapping:
| Component property | Operator argument |
| --- | --- |
| neoloadProjectFilesApi | neoloadProjectFilesApi |
| neoloadApi | neoloadApi |
| accountToken | accountToken |
| scenarioName | scenarioName |
| controllerZoneId | controllerZoneId |
| lgZones | lgZones |
| deleteProjectAfterTest | deleteProjectAfterTest |
| url | projectFile->http->url |
| verifySSL | projectFile->http->verifySSL |
| hostname | projectFile->ssh->hostname |
| username | projectFile->ssh->username |
| password | projectFile->ssh->password |
| key | projectFile->ssh->key |
| sshPort | projectFile->ssh->sshPort |
| path | projectFile->ssh->path |
This page describes the mapping between the metrics provided by Dynatrace and the Akamas metrics for each supported component type.
cpu_load_avg
builtin:host.cpu.load
cpu_num
N/A
cpu_util
builtin:host.cpu.usage
0.01
cpu_util_details (mode: idle, user, system, iowait)
builtin:host.cpu.idle (mode=idle), builtin:host.cpu.system (mode=system), builtin:host.cpu.user (mode=user), builtin:host.cpu.iowait (mode=iowait)
0.01
mem_util
N/A
mem_util_nocache
builtin:host.mem.usage
0.01
mem_util_details
N/A
mem_used
N/A
mem_used_nocache
builtin:host.mem.used
mem_total
N/A
mem_fault
builtin:host.mem.avail.pfps
mem_fault_minor
N/A
mem_fault_major
N/A
mem_swapins
N/A
mem_swapouts
N/A
disk_swap_util
N/A
disk_swap_used
N/A
filesystem_util
Disk
builtin:host.disk.usedPct
filesystem_used
N/A
filesystem_size
N/A
disk_util_details
Disk
builtin:host.disk.free
0.01
disk_iops_writes
N/A
disk_iops_reads
N/A
disk_iops
N/A
disk_iops_details
N/A
disk_response_time_worst
N/A
disk_response_time
N/A
disk_io_inflight_details
N/A
0.01
disk_write_bytes
N/A
disk_read_bytes
N/A
disk_read_write_bytes
N/A
disk_write_bytes_details
Disk
builtin:host.disk.bytesWritten
disk_read_bytes_details
Disk
builtin:host.disk.bytesRead
disk_response_time_details
Disk
builtin:host.disk.readTime
0.001
proc_blocked
N/A
os_context_switch
N/A
network_tcp_retrans
N/A
network_in_bytes_details
Network interface
builtin:host.net.nic.bytesRx
network_out_bytes_details
Network interface
builtin:host.net.nic.bytesTx
jvm_gc_count
builtin:tech.jvm.memory.pool.collectionCount:merge(poolname,gcname):sum
1/60
Yes
avg
jvm_gc_time
builtin:tech.jvm.memory.gc.suspensionTime
0.01
Yes
avg
jvm_heap_size
builtin:tech.jvm.memory.runtime.max
Yes
avg
jvm_heap_committed
Yes
avg
jvm_heap_used
Yes
avg
jvm_off_heap_used
Yes
avg
jvm_heap_old_gen_size
Yes
avg
jvm_heap_old_gen_used
Yes
avg
jvm_heap_young_gen_size
Yes
avg
jvm_heap_young_gen_used
Yes
avg
jvm_threads_current
builtin:tech.jvm.threads.count
Yes
avg
transactions_response_time
N/A
pages_response_time
N/A
requests_response_time
builtin:service.response.time
0.000001
transactions_response_time_min
N/A
pages_response_time_min
N/A
requests_response_time_min
builtin:service.response.time:min
0.000001
transactions_response_time_max
N/A
pages_response_time_max
N/A
requests_response_time_max
builtin:service.response.time:max
0.000001
transactions_throughput
N/A
pages_throughput
N/A
requests_throughput
builtin:service.errors.total.successCount
1/60
pages_error_rate
N/A
requests_error_rate
N/A
transactions_error_throughput
N/A
pages_error_throughput
N/A
requests_error_throughput
N/A
users
N/A
container_cpu_limit
builtin:containers.cpu.limit
Yes
avg
container_cpu_util
builtin:containers.cpu.usagePercent
0.01
Yes
avg
container_cpu_throttled_millicores
builtin:containers.cpu.throttledMilliCores
Yes
avg
container_cpu_throttle_time
builtin:containers.cpu.throttledTime
1 / 10^9 / 60
Yes
avg
container_cpu_used
builtin:containers.cpu.usageMilliCores
Yes
avg
container_cpu_used_max
builtin:containers.cpu.usageMilliCores:max
Yes
max
container_memory_limit
builtin:containers.memory.limitBytes
Yes
avg
container_memory_used
builtin:containers.memory.residentSetBytes
Yes
avg
container_memory_used_max
builtin:containers.memory.residentSetBytes:max
Yes
max
container_memory_util
builtin:containers.memory.usagePercent
0.01
Yes
avg
container_oom_kills_count
builtin:containers.memory.outOfMemoryKills
1/60
Yes
avg
k8s_pod_cpu_limit
builtin:cloud.kubernetes.pod.cpuLimits
Yes
avg
k8s_pod_cpu_request
builtin:cloud.kubernetes.pod.cpuRequests
Yes
avg
k8s_pod_memory_limit
builtin:cloud.kubernetes.pod.memoryLimits
Yes
avg
k8s_pod_memory_request
builtin:cloud.kubernetes.pod.memoryRequests
Yes
avg
k8s_pod_container_restarts
builtin:cloud.kubernetes.pod.containerRestarts
Yes
avg
k8s_workload_desired_pods
builtin:cloud.kubernetes.workload.desiredPods
Yes
sum
This section documents the mapping between the metrics provided by Telemetry Providers and the Akamas metrics for each supported component type.
There is no predefined mapping for the CSV provider, as it is extensible.
The LinuxConfigurator operator allows configuring systems tuned by Akamas by applying parameters related to the Linux kernel using different strategies.
The operator can configure the provided Components, or it can configure every Component which has parameters related to the Linux kernel.
The parameters are applied via the SSH protocol.
If no component is provided, this operator will try to configure every parameter defined for the Components of the System under test.
In this example, the Operator is used to automatically configure the OS parameters of the specified component.
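A minimal sketch of such a task; the task and component names are hypothetical:

```yaml
- name: configure os
  operator: LinuxConfigurator
  arguments:
    component: webserver   # component exposing Linux kernel parameters
```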
component structure and arguments

The following table describes the component structure and arguments.
blockDevices and networkDevices structure and arguments

The properties blockDevices and networkDevices allow specifying which parameters to apply to each block and network device associated with the Component, as well as which block and network devices should be left untouched by the LinuxConfigurator operator.

If these properties are omitted, then all block and network devices associated with the Component will be configured with all the available related parameters.

Notice: all block devices called loopN (where N is an integer number greater than or equal to 0) are automatically excluded from the Component's block devices.

The properties blockDevices and networkDevices are lists of objects with the following structure:
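A minimal sketch of this structure, matching the first example described below:

```yaml
blockDevices:
  - name: xvd[a-z]                 # regular expression matching the devices
    parameters:                    # parameters to apply to the matched devices
      - os_StorageReadAhead
      - os_StorageQueueScheduler
```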
In the following example, only the parameters os_StorageReadAhead and os_StorageQueueScheduler are applied to all the devices that match the regex xvd[a-z] (i.e. xvda, xvdb, …, xvdz):
In the following example, only the parameter os_StorageMaxSectorKb is applied to the block devices xvdb and loop0:

Also notice that in this example the parameter is applied to the block device loop0 as well, since it is explicitly specified in the name filter: this overrides the default behavior by which loopN devices are excluded by the Linux Optimization Pack.
In the following example, no parameters are applied to the wlp4s0 network device, which is therefore excluded from the optimization:
This operator supports several configuration strategies which, depending on the specific strategy, can be specified at the ComponentType level and at the level of each parameter.

With the sysctl-based strategy, a parameter is configured by leveraging the sysctl utility. The sysctl variable to map to the parameter that needs to be configured is specified using the key argument.

With the file-based strategy, a parameter is configured by echoing and piping its value into a provided file. The path of the file is specified using the file argument.

With the command-mapping strategy, each possible value of a parameter is mapped to a command to be executed on the machine the LinuxConfigurator operates on (this is especially useful for categorical parameters).

With the command-interpolation strategy, a parameter is configured by executing a command into which the parameter value is interpolated.
This page introduces the LoadRunner operator, a workflow operator that allows piloting performance tests on a target system by leveraging Micro Focus LoadRunner. This page assumes you are familiar with the definition of a workflow and its tasks; if that is not the case, please check the workflow documentation first.
This section provides the minimum requirements that you should meet in order to use this operator.
Micro Focus LoadRunner 12.60 or 2020
Microsoft Windows Server 2016 or 2019
Powershell version 5.1 or greater
To configure WinRM to allow Akamas to launch tests, please read the dedicated page.
All LoadRunner test files (VuGen scripts and folders, lrs files) and their parent folders must be readable and writable by the user account used by Akamas.
When you define a task that uses the LoadRunner operator you should specify some configuration information to allow the operator to connect to the LoadRunner controller and execute a provided test scenario.
You can specify configuration information within the arguments that are part of a task in the YAML definition of a workflow. You can also avoid specifying each piece of configuration information at the task level by including a component property with the name of a component; in this way, the operator will take any configuration information from the properties of the referenced component:

- controller - the set of information required to connect to the LoadRunner controller
- scenarioFile - the path to the scenario file within the LoadRunner controller to execute the performance test
- resultFolder - the path to the performance test results folder within the LoadRunner controller
To make it possible for the operator to connect to a LoadRunner controller to execute a performance test, you can use the controller property within the workflow task definition:
This table reports the configuration reference for the arguments section.

Important notice: remember to escape your path with 4 backslashes (e.g. C:\\\\Users\\\\…)
Controller arguments

This table reports the configuration reference for the controller section, which is an object.

Important notice: remember to escape your path with 4 backslashes (e.g. C:\\\\Users\\\\…)
This page introduces the LoadRunnerEnterprise operator, a workflow operator that allows piloting performance tests on a target system by leveraging Micro Focus LoadRunner Enterprise (formerly known as Performance Center).
This section provides the minimum requirements that you should meet in order to use this operator.
Micro Focus Performance Center 12.60 or 12.63
LoadRunner Enterprise 2020 SP3
When you define a task that uses the LoadRunnerEnterprise operator you should specify some configuration information to allow the operator to connect to LoadRunner Enterprise and execute a provided test scenario.
You can specify configuration information within the arguments that are part of a task in the YAML definition of a workflow. You can also avoid specifying each piece of configuration information at the task level by including a component property with the name of a component; in this way, the operator will take any configuration information from the properties of the referenced component.

This table reports the configuration reference for the arguments section.
testId value

The following screenshot from Performance Center shows the testId value highlighted.

testSet value

The following screenshot from Performance Center shows the testSet name highlighted.

To retrieve the testId value from LoadRunner Enterprise, select test management from the main menu.
LinuxConfigurator component structure and arguments:

| Name | Type | Values restrictions | Required | Default | Description |
| --- | --- | --- | --- | --- | --- |
| username | String | | Yes | | SSH login username |
| key | Multiline string | | Yes | | SSH login key, provided either directly as its value or as the path of the file to import it from. The operator supports RSA and DSA keys |
| blockDevices | List of objects | It should have a structure like the one described here below | No | | Allows the user to restrict and specify to which block devices the block-device-related parameters are applied |
| networkDevices | List of objects | It should have a structure like the one described here below | No | | Allows the user to restrict and specify to which network devices the network-device-related parameters are applied |

Structure of the blockDevices and networkDevices items:

| Name | Type | Values restrictions | Required | Default | Description |
| --- | --- | --- | --- | --- | --- |
| name | String | It should be a valid regular expression matching block/network devices | Yes | | A regular expression that matches the block/network devices to configure with the related parameters of the Component |
| parameters | List of strings | It should contain the names of matching parameters of the Component | No | | The list of parameters to be configured for the specified block/network devices. If the list is empty, then no parameter will be applied for the block/network devices matched by name |
LoadRunner operator arguments:

| Name | Type | Values restrictions | Required | Default | Description |
| --- | --- | --- | --- | --- | --- |
| controller | Object | | Yes | | The information required to connect to the LoadRunner controller machine |
| component | String | | No | | The name of the component from which the operator will take its configuration options |
| scenarioFile | String | Matches an existing file within the LoadRunner controller | Yes | | The LoadRunner scenario file to execute the performance test |
| resultFolder | String | | Yes | | The folder, on the controller, where LoadRunner will put the results of a performance test. You can use the placeholders {study}, {exp}, {trial} to generate a path that is unique for the running Akamas trial. It can be a local path on the controller or on a network share. |
| loadrunnerResOverride | String | A valid name for a Windows folder | No | res | The folder name where LoadRunner saves the analysis results. The default value can be changed in the LoadRunner controller. |
| timeout | String | The string must contain a numeric value followed by a suffix (s, m, h, d) | No | 2h | The timeout for the LoadRunner scenario. If LoadRunner doesn't finish the scenario within the specified amount of time, Akamas will consider the workflow as failed. |
| checkFrequency | String | The string must contain a numeric value followed by a suffix (s, m, h, d) | No | 1m | The interval at which Akamas checks the status of the LoadRunner scenario |
| executable | String | A valid Windows path | No | C:\Program Files (x86)\Micro Focus\LoadRunner\bin\Wlrun.exe | The LoadRunner executable path |
LoadRunnerEnterprise operator arguments:

| Name | Type | Values restrictions | Required | Default | Description |
| --- | --- | --- | --- | --- | --- |
| address | String | A valid URL, i.e. http://loadrunner-enterprise.yourdomain.com | Yes | - | The address used to connect to LoadRunner Enterprise |
| username | String | - | Yes | - | The username used to connect to LoadRunner Enterprise |
| password | String | - | Yes | - | The password for the specified user |
| tenantID | String | - | No | - | The id of the tenant (only for LR2020) |
| domain | String | - | Yes | - | The domain of your load test projects |
| project | String | - | Yes | - | The project name of your load test projects |
| testId | Number | - | Yes | - | The id of the load test. See above how to retrieve this from LoadRunner. |
| testSet | String | - | Yes | - | The name of the TestSet. See above how to retrieve this from LoadRunner. |
| timeSlot | String | A number followed by the time unit; values must be multiples of 15m and greater than 30m; valid units are m (minutes) and h (hours) | Yes | - | The reserved time slot for the test. Examples: 1h, 45m, 1h30m |
| component | String | A valid component name | No | - | The name of the component from which the operator will take its configuration options |
| pollingInterval | Number | A positive integer number | No | 30 | The frequency (in seconds) at which Akamas checks the load test status |
| verifySSL | String | True, False | No | True | Whether to validate the certificate provided by the LRE server when using an HTTPS connection |
| Name | Type | Values restrictions | Required | Default | Description |
| --- | --- | --- | --- | --- | --- |
| file | String | It should be a path to a valid Java or Python Spark application file | Yes | | Spark application to submit (jar or python file) |
| args | List of Strings, Numbers or Booleans | | Yes | | Additional application arguments |
| master | String | It should be a valid supported master URL: local, local[K], local[K,F], local[*], local[*,F], spark://HOST:PORT, spark://HOST1:PORT1,HOST2:PORT2, yarn | Yes | | The master URL for the Spark cluster |
| deployMode | String | client, cluster | No | cluster | Whether to launch the driver locally (client) or in the cluster (cluster) |
| className | String | | No | | The entry point of the Java application. Required for Java applications. |
| name | String | | No | | Name of the task. When submitted, the ids of the study, experiment and trial will be appended. |
| jars | List of Strings | Each item of the list should be a path that matches an existing jar file | No | | A list of jars to be added to the classpath. |
| pyFiles | List of Strings | Each item of the list should be a path that matches an existing Python file | No | | A list of Python scripts to be added to the PYTHONPATH |
| files | List of Strings | Each item of the list should be a path that matches an existing file | No | | A list of files to be added to the context of the spark-submit command |
| conf | Object (key-value pairs) | | No | | Mapping containing additional Spark configurations. See the Spark documentation. |
| envVars | Object (key-value pairs) | | No | | Environment variables set when running the spark-submit command |
| sparkSubmitExec | String | It should be a path that matches an existing executable | No | The default for the Spark installation | The path of the spark-submit executable command |
| sparkHome | String | It should be a path that matches an existing directory | No | The default for the Spark installation | The path of SPARK_HOME |
| proxyUser | String | | No | | The user to be used to execute Spark applications |
| verbose | Boolean | | No | true | Whether additional debugging output should be displayed |
| component | String | It should match the name of an existing Component of the System under test | Yes | | The name of the component whose properties can be used as arguments of the operator |
The default metrics in this table are based on the cadvisor and kube-state-metrics
The default metrics in this table are based on the cadvisor and kube-state-metrics
The default metrics in this table are based on the CloudWatch Exporter, configured with the attached custom configuration file
The default metrics in this table are based on the OracleDB Exporter, extending the default queries with the attached custom configuration file
The default metrics in this table are based on the Prometheus Listener for Jmeter
cpu_load_avg
node_load1{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}
cpu_num
count(node_cpu_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$", mode="system" %FILTERS%})
cpu_used
sum by (job) (sum by (cpu, job) (rate(node_cpu_seconds_total{instance=~"$INSTANCE$", mode=~"user|system|softirq|irq|nice", job=~"$JOB$" %FILTERS%}[$DURATION$])))
cpu_util
avg by (job) (sum by (cpu, job) (rate(node_cpu_seconds_total{instance=~"$INSTANCE$", mode=~"user|system|softirq|irq|nice", job=~"$JOB$" %FILTERS%}[$DURATION$])))
cpu_util_details
avg by (instance, cpu, mode, job) (sum by (instance, cpu, mode, job) (rate(node_cpu_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])))
disk_io_inflight_details
node_disk_io_now{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}
disk_iops
sum by (instance, job) (rate(node_disk_writes_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])) + sum by (instance, job) (rate(node_disk_reads_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_iops_details
sum by (instance, device, job) (rate(node_disk_writes_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_iops_details
sum by (instance, device, job) (rate(node_disk_reads_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_iops_details
sum by (instance, device, job) (rate(node_disk_writes_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])) + sum by (instance, device, job) (rate(node_disk_reads_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_iops_reads
sum by (instance, job) (rate(node_disk_reads_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_iops_writes
sum by (instance, job) (rate(node_disk_writes_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_read_bytes
sum by (instance, device, job) (rate(node_disk_read_bytes_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_read_bytes_details
sum by (instance, device, job) (rate(node_disk_read_bytes_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_read_write_bytes
sum by (instance, device, job) (rate(node_disk_written_bytes_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]) + rate(node_disk_read_bytes_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_response_time
avg by (instance, job) ((rate(node_disk_read_time_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]) + rate(node_disk_write_time_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])) / (rate(node_disk_reads_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]) + rate(node_disk_writes_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]) > 0 ))
disk_response_time_details
avg by (instance, device, job) ((rate(node_disk_read_time_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]) + rate(node_disk_write_time_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])) / ((rate(node_disk_reads_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]) + rate(node_disk_writes_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])) > 0))
disk_response_time_read
rate(node_disk_read_time_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])/ rate(node_disk_reads_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
disk_response_time_worst
max by (instance, job) ((rate(node_disk_read_time_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]) + rate(node_disk_write_time_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])) / (rate(node_disk_reads_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]) + rate(node_disk_writes_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]) > 0 ))
disk_response_time_write
rate(node_disk_write_time_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])/ rate(node_disk_writes_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
disk_swap_used
node_memory_SwapTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_memory_SwapFree_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}
disk_swap_util
((node_memory_SwapTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_memory_SwapFree_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}) / (node_memory_SwapTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} > 0)) or ((node_memory_SwapTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_memory_SwapFree_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}))
disk_util_details
rate(node_disk_io_time_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
disk_write_bytes
sum by (instance, device, job) (rate(node_disk_written_bytes_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_write_bytes_details
sum by (instance, device, job) (rate(node_disk_written_bytes_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
filesystem_size
node_filesystem_size_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}
filesystem_used
node_filesystem_size_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_filesystem_free_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}
filesystem_util
((node_filesystem_size_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_filesystem_free_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}) / node_filesystem_size_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%})
mem_fault_major
rate(node_vmstat_pgmajfault{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
mem_fault_minor
rate(node_vmstat_pgfault{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
mem_swapins
rate(node_vmstat_pswpin{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
mem_swapouts
rate(node_vmstat_pswpout{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
mem_total
node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}
mem_used
(node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_memory_MemFree_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%})
mem_util
(node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_memory_MemFree_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}) / node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}
mem_util_details
(node_memory_Active_file_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} / node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%})
mem_util_details
(node_memory_Active_anon_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} / node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%})
mem_util_details
(node_memory_Inactive_file_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} / node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%})
mem_util_details
(node_memory_Inactive_anon_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} / node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%})
mem_util_nocache
(node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_memory_Buffers_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_memory_Cached_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_memory_MemFree_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}) / node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}
network_in_bytes_details
rate(node_network_receive_bytes_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
network_out_bytes_details
rate(node_network_transmit_bytes_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
network_tcp_retrans
rate(node_netstat_Tcp_RetransSegs{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
os_context_switch
rate(node_context_switches_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
proc_blocked
node_procs_blocked{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}
transactions_response_time
avg(rate(ResponseTime_sum{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])/rate(ResponseTime_count{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])>0)
transactions_response_time_max
max(rate(ResponseTime_sum{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])/rate(ResponseTime_count{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])>0)
transactions_response_time_min
min(rate(ResponseTime_sum{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])/rate(ResponseTime_count{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])>0)
transactions_response_time_p50
ResponseTime{quantile="0.5", code="200", job=~"$JOB$" %FILTERS%}
transactions_response_time_p85
ResponseTime{quantile="0.85", code="200", job=~"$JOB$" %FILTERS%}
transactions_response_time_p90
ResponseTime{quantile="0.9", code="200", job=~"$JOB$" %FILTERS%}
transactions_response_time_p99
ResponseTime{quantile="0.99", code="200", job=~"$JOB$" %FILTERS%}
transactions_throughput
sum(rate(Ratio_success{job=~"$JOB$" %FILTERS%}[$DURATION$]))
transactions_error_throughput
sum(rate(Ratio_failure{job=~"$JOB$" %FILTERS%}[$DURATION$]))
transactions_error_rate
(avg(rate(Ratio_failure{job=~"$JOB$" %FILTERS%}[$DURATION$]))/avg(rate(Ratio_total{job=~"$JOB$" %FILTERS%}[$DURATION$])))*100
users
sum(jmeter_threads{state="active", job=~"$JOB$" %FILTERS%})
container_cpu_used
sum(rate(container_cpu_usage_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$", name=~"$NAME$" %FILTERS%}[$DURATION$]))
container_cpu_throttle_time
sum(rate(container_cpu_cfs_throttled_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$", name=~"$NAME$" %FILTERS%}[$DURATION$]))
container_cpu_limit
sum(container_spec_cpu_quota{instance=~"$INSTANCE$", job=~"$JOB$", name=~"$NAME$" %FILTERS%}/container_spec_cpu_period{instance=~"$INSTANCE$", job=~"$JOB$", name=~"$NAME$" %FILTERS%})
container_cpu_util
sum(rate(container_cpu_usage_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$", name=~"$NAME$" %FILTERS%}[$DURATION$])) /sum(container_spec_cpu_quota{instance=~"$INSTANCE$", job=~"$JOB$", name=~"$NAME$" %FILTERS%}/container_spec_cpu_period{instance=~"$INSTANCE$", job=~"$JOB$", name=~"$NAME$" %FILTERS%})
container_mem_used
container_memory_usage_bytes{instance=~"$INSTANCE$", job=~"$JOB$", name=~"$NAME$" %FILTERS%}
container_mem_limit
container_spec_memory_limit_bytes{instance=~"$INSTANCE$", job=~"$JOB$", name=~"$NAME$" %FILTERS%}
container_mem_util_nocache
container_memory_working_set_bytes{instance=~"$INSTANCE$", job=~"$JOB$", name=~"$NAME$" %FILTERS%}/container_spec_memory_limit_bytes{instance=~"$INSTANCE$", job=~"$JOB$", name=~"$NAME$" %FILTERS%}
container_mem_util
container_memory_usage_bytes{instance=~"$INSTANCE$", job=~"$JOB$", name=~"$NAME$" %FILTERS%}/container_spec_memory_limit_bytes{instance=~"$INSTANCE$", job=~"$JOB$", name=~"$NAME$" %FILTERS%}
container_mem_working_set
container_memory_working_set_bytes{instance=~"$INSTANCE$", job=~"$JOB$", name=~"$NAME$" %FILTERS%}
container_mem_limit_hits
sum(rate(container_memory_failcnt{instance=~"$INSTANCE$", job=~"$JOB$", name=~"$NAME$" %FILTERS%}[$DURATION$]))
k8s_pod_cpu_used
1e3 * sum by(namespace, pod) (rate(container_cpu_usage_seconds_total{pod=~"$POD$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
k8s_pod_cpu_util
sum by(namespace, pod) (rate(container_cpu_usage_seconds_total{pod=~"$POD$", job=~"$JOB$" %FILTERS%}[$DURATION$])) / sum by(namespace, pod) (kube_pod_container_resource_limits{resource="cpu", pod=~"$POD$", job=~"$JOB$" %FILTERS%})
k8s_pod_cpu_request
1e3 * sum by(namespace, pod) (kube_pod_container_resource_requests{resource="cpu", pod=~"$POD$", job=~"$JOB$" %FILTERS%})
k8s_pod_cpu_limit
1e3 * sum by(namespace, pod) (kube_pod_container_resource_limits{resource="cpu", pod=~"$POD$", job=~"$JOB$" %FILTERS%})
k8s_pod_memory_used
sum by(namespace, pod) (container_memory_usage_bytes{pod=~"$POD$", job=~"$JOB$" %FILTERS%})
k8s_pod_memory_util
sum by(namespace, pod) (container_memory_usage_bytes{pod=~"$POD$", job=~"$JOB$" %FILTERS%}) / sum by(namespace, pod) (container_spec_memory_limit_bytes{pod=~"$POD$", job=~"$JOB$" %FILTERS%})
k8s_pod_memory_util_nocache
sum by(namespace, pod) (container_memory_working_set_bytes{pod=~"$POD$", job=~"$JOB$" %FILTERS%}) / sum by(namespace, pod) (container_spec_memory_limit_bytes{pod=~"$POD$", job=~"$JOB$" %FILTERS%})
k8s_pod_memory_working_set
sum by(namespace, pod) (container_memory_working_set_bytes{pod=~"$POD$", job=~"$JOB$" %FILTERS%})
k8s_pod_memory_request
sum by(namespace, pod) (kube_pod_container_resource_requests{resource="memory", pod=~"$POD$", job=~"$JOB$" %FILTERS%})
k8s_pod_memory_limit
sum by(namespace, pod) (container_spec_memory_limit_bytes{pod=~"$POD$", job=~"$JOB$" %FILTERS%})
k8s_pod_container_restarts
sum by(namespace, pod) (kube_pod_container_status_restarts_total{pod=~"$POD$", job=~"$JOB$" %FILTERS%})
k8s_pod_desired_containers
count(sum by(namespace, pod) (kube_pod_container_info{pod=~"$POD$", job=~"$JOB$" %FILTERS%}))
k8s_workload_desired_pods
sum by(namespace, deployment) (kube_deployment_spec_replicas{deployment=~"$DEPLOYMENT$", job=~"$JOB$" %FILTERS%})
k8s_workload_running_pods
sum by(namespace, deployment) (kube_deployment_status_replicas_available{deployment=~"$DEPLOYMENT$", job=~"$JOB$" %FILTERS%})
k8s_cluster_cpu
1e3 * sum(kube_node_status_capacity{resource="cpu", job=~"$JOB$" %FILTERS%})
container_cpu_used
1e3 * avg by(namespace, container) (rate(container_cpu_usage_seconds_total{container=~"$CONTAINER$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
container_cpu_util
sum by(namespace, container) (rate(container_cpu_usage_seconds_total{container=~"$CONTAINER$", job=~"$JOB$" %FILTERS%}[$DURATION$])) / sum by(namespace, container) (container_spec_cpu_quota{container=~"$CONTAINER$", job=~"$JOB$" %FILTERS%} / container_spec_cpu_period{container=~"$CONTAINER$", job=~"$JOB$" %FILTERS%})
container_cpu_throttle_time
sum by(namespace, container) (rate(container_cpu_cfs_throttled_seconds_total{container=~"$CONTAINER$", job=~"$JOB$" %FILTERS%}[$DURATION$])) * 1e-9
container_memory_used
avg by(namespace, container) (container_memory_usage_bytes{container=~"$CONTAINER$", job=~"$JOB$" %FILTERS%})
container_memory_working_set
sum by(namespace, container) (container_memory_working_set_bytes{container=~"$CONTAINER$", job=~"$JOB$" %FILTERS%})
container_memory_util_nocache
sum by(namespace, container) (container_memory_working_set_bytes{container=~"$CONTAINER$", job=~"$JOB$" %FILTERS%}) / sum by(namespace, container) (container_spec_memory_limit_bytes{container=~"$CONTAINER$", job=~"$JOB$" %FILTERS%})
container_memory_limit_util
sum by(namespace, container) (container_memory_usage_bytes{container=~"$CONTAINER$", job=~"$JOB$" %FILTERS%}) / sum by(namespace, container) (container_spec_memory_limit_bytes{container=~"$CONTAINER$", job=~"$JOB$" %FILTERS%})
container_cpu_request
1e3 * sum by (namespace, container) (kube_pod_container_resource_requests{resource="cpu", container=~"$CONTAINER$", job=~"$JOB$" %FILTERS%})
container_cpu_limit
1e3 * kube_pod_container_resource_limits{resource="cpu", job=~"$JOB$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%}
container_memory_request
sum by (namespace, container) (kube_pod_container_resource_requests{resource="memory", container=~"$CONTAINER$", job=~"$JOB$" %FILTERS%})
container_memory_limit
kube_pod_container_resource_limits{resource="memory", job=~"$JOB$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%}
cost
sum(kube_pod_container_resource_requests{resource="cpu", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%})*29 + sum(kube_pod_container_resource_requests{resource="memory", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%})/1024/1024/1024*8
cpu_util
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() aws_ec2_cpuutilization_average{job='$JOB$'}/100
network_in_bytes_details
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() (aws_ec2_network_in_sum{job='$JOB$'} * count_over_time(aws_ec2_network_in_sum{job='$JOB$'}[300s]) / 300)
network_out_bytes_details
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() (aws_ec2_network_out_sum{job='$JOB$'} * count_over_time(aws_ec2_network_out_sum{job='$JOB$'}[300s]) / 300)
aws_ec2_credits_cpu_available
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() aws_ec2_cpucredit_balance_average{job='$JOB$'}
aws_ec2_credits_cpu_used
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() aws_ec2_cpucredit_usage_sum{job='$JOB$'}
disk_read_bytes
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() (aws_ec2_ebsread_bytes_sum{job='$JOB$'} * count_over_time(aws_ec2_ebsread_bytes_sum{job='$JOB$'}[300s]) / 300)
disk_write_bytes
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() (aws_ec2_ebswrite_bytes_sum{job='$JOB$'} * count_over_time(aws_ec2_ebswrite_bytes_sum{job='$JOB$'}[300s]) / 300)
aws_ec2_disk_iops
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() ((aws_ec2_ebsread_ops_sum{job='$JOB$'} + aws_ec2_ebswrite_ops_sum{job='$JOB$'}) * count_over_time(aws_ec2_ebsread_ops_sum{job='$JOB$'}[300s])/300)
aws_ec2_disk_iops_reads
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() (aws_ec2_ebsread_ops_sum{job='$JOB$'} * count_over_time(aws_ec2_ebsread_ops_sum{job='$JOB$'}[300s]) / 300)
aws_ec2_disk_iops_writes
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() (aws_ec2_ebswrite_ops_sum{job='$JOB$'} * count_over_time(aws_ec2_ebswrite_ops_sum{job='$JOB$'}[300s]) / 300)
aws_ec2_ebs_credits_io_util
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() aws_ec2_ebsiobalance__average{job='$JOB$'} / 100
aws_ec2_ebs_credits_bytes_util
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() aws_ec2_ebsbyte_balance__average{job='$JOB$'} / 100
oracle_sga_total_size
oracledb_memory_size{component='SGA Target', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_sga_free_size
oracledb_memory_size{component='Free SGA Memory Available', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_sga_max_size
oracledb_memory_size{component='Maximum SGA Size', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_pga_target_size
oracledb_memory_size{component='PGA Target', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_redo_buffers_size
oracledb_memory_size{component='Redo Buffers', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_default_buffer_cache_size
oracledb_memory_size{component='DEFAULT buffer cache', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_default_2k_buffer_cache_size
oracledb_memory_size{component='DEFAULT 2K buffer cache', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_default_4k_buffer_cache_size
oracledb_memory_size{component='DEFAULT 4K buffer cache', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_default_8k_buffer_cache_size
oracledb_memory_size{component='DEFULT 8K buffer cache', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_default_16k_buffer_cache_size
oracledb_memory_size{component='DEFAULT 16K buffer cache', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_default_32k_buffer_cache_size
oracledb_memory_size{component='DEFAULT 32K buffer cache', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_keep_buffer_cache_size
oracledb_memory_size{component='KEEP buffer cache', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_recycle_buffer_cache_size
oracledb_memory_size{component='RECYCLE buffer cache', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_asm_buffer_cache_size
oracledb_memory_size{component='ASM Buffer Cache', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_shared_io_pool_size
oracledb_memory_size{component='Shared IO Pool', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_java_pool_size
oracledb_memory_size{component='java pool', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_large_pool_size
oracledb_memory_size{component='large pool', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_shared_pool_size
oracledb_memory_size{component='shared pool', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_streams_pool_size
oracledb_memory_size{component='streams pool', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_sessions_active_user
oracledb_sessions_value{type='USER', status='ACTIVE', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_sessions_inactive_user
oracledb_sessions_value{type='USER', status='INACTIVE', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_sessions_active_background
oracledb_sessions_value{type='BACKGROUND', status='ACTIVE', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_sessions_inactive_background
oracledb_sessions_value{type='BACKGROUND', status='INACTIVE', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_buffer_cache_hit_ratio
ttps://docs.oracle.com/database/121/TGDBA/tune_buffer_cache.htm#TGDBA533
oracle_redo_log_space_requests
rate(oracledb_activity_redo_log_space_requests{instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])
oracle_wait_event_log_file_sync
rate(oracledb_system_event_time_waited{event='log file sync', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_log_file_parallel_write
rate(oracledb_system_event_time_waited{event='log file sequential read', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_log_file_sequential_read
rate(oracledb_system_event_time_waited{event='log file parallel write', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_enq_tx_contention
rate(oracledb_system_event_time_waited{event='enq: TX - contention', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_enq_tx_row_lock_contention
rate(oracledb_system_event_time_waited{event='enq: TX - row lock contention', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_latch_row_cache_objects
rate(oracledb_system_event_time_waited{event='latch: row cache objects', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_latch_shared_pool
rate(oracledb_system_event_time_waited{event='latch: shared pool', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_resmgr_cpu_quantum
rate(oracledb_system_event_time_waited{event='resmgr:cpu quantum', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_sql_net_message_from_client
rate(oracledb_system_event_time_waited{event='SQL*Net message from client', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_rdbms_ipc_message
rate(oracledb_system_event_time_waited{event='rdbms ipc message', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_db_file_sequential_read
rate(oracledb_system_event_time_waited{event='db file sequential read', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_log_file_switch_checkpoint_incomplete
rate(oracledb_system_event_time_waited{event='log file switch (checkpoint incomplete)', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_row_cache_lock
rate(oracledb_system_event_time_waited{event='row cache lock', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_buffer_busy_waits
rate(oracledb_system_event_time_waited{event='buffer busy waits', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_db_file_async_io_submit
rate(oracledb_system_event_time_waited{event='db file async I/O submit', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_class_commit
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='Commit', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
oracle_wait_class_concurrency
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='Concurrency', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
oracle_wait_class_system_io
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='System I/O', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
oracle_wait_class_user_io
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='User I/O', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
oracle_wait_class_other
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='Other', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
oracle_wait_class_scheduler
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='Scheduler', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
oracle_wait_class_idle
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='Idle', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
oracle_wait_class_application
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='Application', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
oracle_wait_class_network
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='Network', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
oracle_wait_class_configuration
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='Configuration', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
transactions_response_time
avg(rate(ResponseTime_sum{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])/rate(ResponseTime_count{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])>0)
transactions_response_time_max
max(rate(ResponseTime_sum{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])/rate(ResponseTime_count{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])>0)
transactions_response_time_min
min(rate(ResponseTime_sum{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])/rate(ResponseTime_count{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])>0)
transactions_response_time_p50
ResponseTime{quantile="0.5", code="200", job=~"$JOB$" %FILTERS%}
transactions_response_time_p85
ResponseTime{quantile="0.85", code="200", job=~"$JOB$" %FILTERS%}
transactions_response_time_p90
ResponseTime{quantile="0.9", code="200", job=~"$JOB$" %FILTERS%}
transactions_response_time_p99
ResponseTime{quantile="0.99", code="200", job=~"$JOB$" %FILTERS%}
transactions_throughput
sum(rate(Ratio_success{job=~"$JOB$" %FILTERS%}[$DURATION$]))
transactions_error_throughput
sum(rate(Ratio_failure{job=~"$JOB$" %FILTERS%}[$DURATION$]))
transactions_error_rate
(avg(rate(Ratio_failure{job=~"$JOB$" %FILTERS%}[$DURATION$]))/avg(rate(Ratio_total{job=~"$JOB$" %FILTERS%}[$DURATION$])))*100
users
sum(jmeter_threads{state="active", job=~"$JOB$" %FILTERS%})
component
String
It should match the name of an existing Component of the System under test
No
The name of the Component for which available Linux kernel parameters will be configured
This page describes the Optimization Pack for the component type Amazon Linux 2.
cpu_load_avg
tasks
The system load average (i.e., the number of active tasks in the system)
cpu_num
CPUs
The number of CPUs available in the system (physical and logical)
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
cpu_used
CPUs
The average number of CPUs used in the system (physical and logical)
cpu_util_details
percent
The average CPU utilization % broken down by usage type and CPU number (e.g., cp1 user, cp2 system, cp3 soft-irq)
mem_fault
faults/s
The number of memory faults (minor+major)
mem_fault_major
faults/s
The number of major memory faults (i.e., faults that cause disk access) per second
mem_fault_minor
faults/s
The number of minor memory faults (i.e., faults that do not cause disk access) per second
mem_swapins
pages/s
The number of memory pages swapped in per second
mem_swapouts
pages/s
The number of memory pages swapped out per second
mem_total
bytes
The total amount of installed memory
mem_used
bytes
The total amount of memory used
mem_used_nocache
bytes
The total amount of memory used without considering memory reserved for caching purposes
mem_util
percent
The memory utilization % (i.e, the % of memory used)
mem_util_details
percent
The memory utilization % (i.e., the % of memory used) broken down by usage type (e.g., active memory)
mem_util_nocache
percent
The memory utilization % (i.e., the % of memory used) without considering memory reserved for caching purposes
disk_io_inflight_details
ops
The number of IO disk operations in progress (outstanding) broken down by disk (e.g., disk /dev/nvme01)
disk_iops
ops/s
The average number of IO disk operations per second across all disks
disk_iops_details
ops/s
The number of IO disk-write operations per second broken down by disk (e.g., disk /dev/nvme01)
disk_iops_reads
ops/s
The average number of IO disk-read operations per second across all disks
disk_iops_writes
ops/s
The average number of IO disk-write operations per second across all disks
disk_read_bytes
bytes/s
The number of bytes per second read across all disks
disk_read_bytes_details
bytes/s
The average response time of IO disk operations broken down by disk (e.g., disk C://)
disk_read_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_response_time_details
seconds
The average response time of IO disk operations broken down by disk (e.g., disk C://)
disk_response_time_read
seconds
The average response time of read disk operations
disk_response_time_worst
seconds
The average response time of IO disk operations of the slowest disk
disk_response_time_write
seconds
The average response time of write on disk operations
disk_swap_used
bytes
The total amount of space used by swap disks
disk_swap_util
percent
The average space utilization % of swap disks
disk_util_details
percent
The utilization % of disk, i.e how much time a disk is busy doing work broken down by disk (e.g., disk D://)
disk_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_write_bytes_details
bytes/s
The number of bytes per second written from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation WRITE)
filesystem_size
bytes
The size of filesystems broken down by type and device (e.g., filesystem of type ext4 for device /dev/nvme01)
filesystem_used
bytes
The amount of space used on the filesystems broken down by type and device (e.g., filesystem of type zfs on device /dev/nvme01)
filesystem_util
percent
The space utilization % of filesystems broken down by type and device (e.g., filesystem of type overlayfs on device /dev/loop1)
network_in_bytes_details
bytes/s
The number of inbound network packets in bytes per second broken down by network device (e.g., wlp4s0)
network_out_bytes_details
bytes/s
The number of outbound network packets in bytes per second broken down by network device (e.g., eth01)
network_tcp_retrans
retrans/s
The number of network TCP retransmissions per second
os_context_switch
switches/s
The number of context switches per second
proc_blocked
processes
The number of processes blocked (e.g, for IO or swapping reasons)
os_cpuSchedMinGranularity
integer
nanoseconds
1500000
300000 → 30000000
no
Minimal preemption granularity (in nanoseconds) for CPU bound tasks
os_cpuSchedWakeupGranularity
integer
nanoseconds
2000000
400000 → 40000000
no
Scheduler Wakeup Granularity (in nanoseconds)
os_CPUSchedMigrationCost
integer
nanoseconds
500000
100000 → 5000000
no
Amount of time (in nanoseconds) after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated to another CPU, so increasing this variable reduces task migrations
os_CPUSchedChildRunsFirst
integer
0
0
, 1
no
A freshly forked child runs before the parent continues execution
os_CPUSchedLatency
integer
nanoseconds
12000000
2400000 → 240000000
no
Targeted preemption latency (in nanoseconds) for CPU bound tasks
os_CPUSchedAutogroupEnabled
integer
0
0
, 1
no
Enables the Linux task auto-grouping feature, where the kernel assigns related tasks to groups and schedules them together on CPUs to achieve higher performance for some workloads
os_CPUSchedNrMigrate
integer
32
3 → 320
no
Scheduler NR Migrate
os_MemorySwappiness
integer
percent
60
0 → 100
no
The percentage of RAM free space for which the kernel will start swapping pages to disk
os_MemoryVmVfsCachePressure
integer
100
10 → 100
no
VFS Cache Pressure
os_MemoryVmCompactionProactiveness
integer
20
0 → 100
Determines how aggressively compaction is done in the background
os_MemoryVmPageLockUnfairness
integer
5
0 → 1000
no
Set the level of unfairness in the page lock queue.
os_MemoryVmWatermarkScaleFactor
integer
10
0 → 1000
no
The amount of memory, expressed as fractions of 10'000, left in a node/system before kswapd is woken up and how much memory needs to be free before kswapd goes back to sleep
os_MemoryVmWatermarkBoostFactor
integer
15000
0 → 30000
no
The level of reclaim when the memory is being fragmented, expressed as fractions of 10'000 of a zone's high watermark
os_MemoryVmMinFree
integer
67584
10240 → 1024000
no
Minimum Free Memory (in kbytes)
os_MemoryTransparentHugepageEnabled
categorical
madvise
always
, never
, madvise
no
Transparent Hugepage Enablement Flag
os_MemoryTransparentHugepageDefrag
categorical
madvise
always
, never
, defer+madvise
, madvise
, defer
no
Transparent Hugepage Enablement Defrag
os_MemorySwap
categorical
swapon
swapon
, swapoff
no
Memory Swap
os_MemoryVmDirtyRatio
integer
20
1 → 99
no
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyBackgroundRatio
integer
10
1 → 99
no
When the dirty memory pages exceed this percentage of the total memory, the kernel begins to write them asynchronously in the background
os_MemoryVmDirtyExpire
integer
centiseconds
3000
300 → 30000
no
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyWriteback
integer
centiseconds
500
50 → 5000
no
Memory Dirty Writeback (in centisecs)
os_NetworkNetCoreSomaxconn
integer
megabytes
128
12 → 8192
no
Network Max Connections
os_NetworkNetCoreNetdevMaxBacklog
integer
megabytes/s
1000
100 → 10000
no
Network Max Backlog
os_NetworkNetIpv4TcpMaxSynBacklog
integer
milliseconds
256
52 → 5120
no
Network IPV4 Max Sync Backlog
os_NetworkNetCoreNetdevBudget
integer
300
30 → 30000
no
Network Budget
os_NetworkNetCoreRmemMax
integer
212992
21299 → 2129920
no
Maximum network receive buffer size that applications can request
os_NetworkNetCoreWmemMax
integer
212992
21299 → 2129920
no
Maximum network transmit buffer size that applications can request
os_NetworkNetIpv4TcpSlowStartAfterIdle
integer
1
0
, 1
no
Network Slow Start After Idle Flag
os_NetworkNetIpv4TcpFinTimeout
integer
60
6 → 600
no
Network TCP timeout
os_NetworkRfs
integer
0
0 → 131072
no
If enabled increases datacache hitrate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running
os_StorageReadAhead
integer
kilobytes
128
0 → 4096
no
Read-ahead speeds up file access by pre-fetching data and loading it into the page cache so that it can be available earlier in memory instead of from disk
os_StorageNrRequests
integer
32
12 → 1280
no
Storage Number of Requests
os_StorageRqAffinity
integer
1
1
, 2
no
Storage Requests Affinity
os_StorageQueueScheduler
integer
none
none
, kyber
, mq-deadline
, bfq
no
Storage Queue Scheduler Type
os_StorageNomerges
integer
0
0 → 2
no
Enables the user to disable the lookup logic involved with IO merging requests in the block layer. By default (0) all merges are enabled. With 1 only simple one-hit merges will be tried. With 2 no merge algorithms will be tried
os_StorageMaxSectorsKb
integer
kilobytes
256
32 → 256
no
The largest IO size that the OS can issue to a block device
This page describes the Optimization Pack for the component type Amazon Linux.
cpu_load_avg
tasks
The system load average (i.e., the number of active tasks in the system)
cpu_num
CPUs
The number of CPUs available in the system (physical and logical)
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
cpu_used
CPUs
The average number of CPUs used in the system (physical and logical)
cpu_util_details
percent
The average CPU utilization % broken down by usage type and CPU number (e.g., cp1 user, cp2 system, cp3 soft-irq)
mem_fault
faults/s
The number of memory faults (minor+major)
mem_fault_major
faults/s
The number of major memory faults (i.e., faults that cause disk access) per second
mem_fault_minor
faults/s
The number of minor memory faults (i.e., faults that do not cause disk access) per second
mem_swapins
pages/s
The number of memory pages swapped in per second
mem_swapouts
pages/s
The number of memory pages swapped out per second
mem_total
bytes
The total amount of installed memory
mem_used
bytes
The total amount of memory used
mem_used_nocache
bytes
The total amount of memory used without considering memory reserved for caching purposes
mem_util
percent
The memory utilization % (i.e, the % of memory used)
mem_util_details
percent
The memory utilization % (i.e., the % of memory used) broken down by usage type (e.g., active memory)
mem_util_nocache
percent
The memory utilization % (i.e., the % of memory used) without considering memory reserved for caching purposes
disk_io_inflight_details
ops
The number of IO disk operations in progress (outstanding) broken down by disk (e.g., disk /dev/nvme01)
disk_iops
ops/s
The average number of IO disk operations per second across all disks
disk_iops_details
ops/s
The number of IO disk-write operations per second broken down by disk (e.g., disk /dev/nvme01)
disk_iops_reads
ops/s
The average number of IO disk-read operations per second across all disks
disk_iops_writes
ops/s
The average number of IO disk-write operations per second across all disks
disk_read_bytes
bytes/s
The number of bytes per second read across all disks
disk_read_bytes_details
bytes/s
The average response time of IO disk operations broken down by disk (e.g., disk C://)
disk_read_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_response_time_details
seconds
The average response time of IO disk operations broken down by disk (e.g., disk C://)
disk_response_time_read
seconds
The average response time of read disk operations
disk_response_time_worst
seconds
The average response time of IO disk operations of the slowest disk
disk_response_time_write
seconds
The average response time of write on disk operations
disk_swap_used
bytes
The total amount of space used by swap disks
disk_swap_util
percent
The average space utilization % of swap disks
disk_util_details
percent
The utilization % of disk, i.e how much time a disk is busy doing work broken down by disk (e.g., disk D://)
disk_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_write_bytes_details
bytes/s
The number of bytes per second written from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation WRITE)
filesystem_size
bytes
The size of filesystems broken down by type and device (e.g., filesystem of type ext4 for device /dev/nvme01)
filesystem_used
bytes
The amount of space used on the filesystems broken down by type and device (e.g., filesystem of type zfs on device /dev/nvme01)
filesystem_util
percent
The space utilization % of filesystems broken down by type and device (e.g., filesystem of type overlayfs on device /dev/loop1)
network_in_bytes_details
bytes/s
The number of inbound network packets in bytes per second broken down by network device (e.g., wlp4s0)
network_out_bytes_details
bytes/s
The number of outbound network packets in bytes per second broken down by network device (e.g., eth01)
network_tcp_retrans
retrans/s
The number of network TCP retransmissions per second
os_context_switch
switches/s
The number of context switches per second
proc_blocked
processes
The number of processes blocked (e.g, for IO or swapping reasons)
os_cpuSchedMinGranularity
integer
nanoseconds
1500000
300000 → 30000000
no
Minimal preemption granularity (in nanoseconds) for CPU bound tasks
os_cpuSchedWakeupGranularity
integer
nanoseconds
2000000
400000 → 40000000
no
Scheduler Wakeup Granularity (in nanoseconds)
os_CPUSchedMigrationCost
integer
nanoseconds
500000
100000 → 5000000
no
Amount of time (in nanoseconds) after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated to another CPU, so increasing this variable reduces task migrations
os_CPUSchedChildRunsFirst
integer
0
0
, 1
no
A freshly forked child runs before the parent continues execution
os_CPUSchedLatency
integer
nanoseconds
12000000
2400000 → 240000000
no
Targeted preemption latency (in nanoseconds) for CPU bound tasks
os_CPUSchedAutogroupEnabled
integer
0
0
, 1
no
Enables the Linux task auto-grouping feature, where the kernel assigns related tasks to groups and schedules them together on CPUs to achieve higher performance for some workloads
os_CPUSchedNrMigrate
integer
32
3 → 320
no
Scheduler NR Migrate
os_MemorySwappiness
integer
percent
60
0 → 100
no
The percentage of RAM free space for which the kernel will start swapping pages to disk
os_MemoryVmVfsCachePressure
integer
100
10 → 100
no
VFS Cache Pressure
os_MemoryVmCompactionProactiveness
integer
Determines how aggressively compaction is done in the background
os_MemoryVmMinFree
integer
67584
10240 → 1024000
no
Minimum Free Memory (in kbytes)
os_MemoryTransparentHugepageEnabled
categorical
madvise
always
, never
, madvise
no
Transparent Hugepage Enablement Flag
os_MemoryTransparentHugepageDefrag
categorical
madvise
always
, never
, defer+madvise
, madvise
, defer
no
Transparent Hugepage Enablement Defrag
os_MemorySwap
categorical
swapon
swapon
, swapoff
no
Memory Swap
os_MemoryVmDirtyRatio
integer
20
1 → 99
no
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyBackgroundRatio
integer
10
1 → 99
no
When the dirty memory pages exceed this percentage of the total memory, the kernel begins to write them asynchronously in the background
os_MemoryVmDirtyExpire
integer
centiseconds
3000
300 → 30000
no
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyWriteback
integer
centiseconds
500
50 → 5000
no
Memory Dirty Writeback (in centisecs)
os_NetworkNetCoreSomaxconn
integer
megabytes
128
12 → 8192
no
Network Max Connections
os_NetworkNetCoreNetdevMaxBacklog
integer
megabytes/s
1000
100 → 10000
no
Network Max Backlog
os_NetworkNetIpv4TcpMaxSynBacklog
integer
milliseconds
256
52 → 5120
no
Network IPV4 Max Sync Backlog
os_NetworkNetCoreNetdevBudget
integer
300
30 → 30000
no
Network Budget
os_NetworkNetCoreRmemMax
integer
212992
21299 → 2129920
no
Maximum network receive buffer size that applications can request
os_NetworkNetCoreWmemMax
integer
212992
21299 → 2129920
no
Maximum network transmit buffer size that applications can request
os_NetworkNetIpv4TcpSlowStartAfterIdle
integer
1
0
, 1
no
Network Slow Start After Idle Flag
os_NetworkNetIpv4TcpFinTimeout
integer
60
6 → 600
no
Network TCP timeout
os_NetworkRfs
integer
0
0 → 131072
no
If enabled increases datacache hitrate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running
os_StorageReadAhead
integer
kilobytes
128
0 → 4096
no
Read-ahead speeds up file access by pre-fetching data and loading it into the page cache so that it can be available earlier in memory instead of from disk
os_StorageNrRequests
integer
32
12 → 1280
no
Storage Number of Requests
os_StorageRqAffinity
integer
1
1
, 2
no
Storage Requests Affinity
os_StorageNomerges
integer
0
0 → 2
no
Enables the user to disable the lookup logic involved with IO merging requests in the block layer. By default (0) all merges are enabled. With 1 only simple one-hit merges will be tried. With 2 no merge algorithms will be tried
os_StorageMaxSectorsKb
integer
kilobytes
256
32 → 256
no
The largest IO size that the OS can issue to a block device
This page describes the Optimization Pack for the component type RHEL 7.
cpu_num
CPUs
The number of CPUs available in the system (physical and logical)
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
cpu_util_details
percent
The average CPU utilization % broken down by usage type and cpu number (e.g., cp1 user, cp2 system, cp3 soft-irq)
cpu_load_avg
tasks
The system load average (i.e., the number of active tasks in the system)
mem_util
percent
The memory utilization % (i.e, the % of memory used)
mem_util_nocache
percent
The memory utilization % (i.e., the % of memory used) without considering memory reserved for caching purposes
mem_util_details
percent
The memory utilization % (i.e., the % of memory used) broken down by usage type (e.g., active memory)
mem_used
bytes
The total amount of memory used
mem_used_nocache
bytes
The total amount of memory used without considering memory reserved for caching purposes
mem_total
bytes
The total amount of installed memory
mem_fault_minor
faults/s
The number of minor memory faults (i.e., faults that do not cause disk access) per second
mem_fault_major
faults/s
The number of major memory faults (i.e., faults that cause disk access) per second
mem_fault
faults/s
The number of memory faults (major + minor)
mem_swapins
pages/s
The number of memory pages swapped in per second
mem_swapouts
pages/s
The number of memory pages swapped out per second
network_tcp_retrans
retrans/s
The number of network TCP retransmissions per second
network_in_bytes_details
bytes/s
The number of inbound network packets in bytes per second broken down by network device (e.g., wlp4s0)
network_out_bytes_details
bytes/s
The number of outbound network packets in bytes per second broken down by network device (e.g., eth01)
Notice: you can use a device
custom filter to monitor a specific disk with Prometheus. You can find more information on Prometheus queries and the %FILTERS%
placeholder here: Prometheus provider and here: Prometheus provider metrics mapping.
disk_swap_util
percent
The average space utilization % of swap disks
disk_swap_used
bytes
The total amount of space used by swap disks
disk_util_details
percent
The utilization % of disk, i.e how much time a disk is busy doing work broken down by disk (e.g., disk D://)
disk_iops_writes
ops/s
The average number of IO disk-write operations per second across all disks
disk_iops_reads
ops/s
The average number of IO disk-read operations per second across all disks
disk_iops
ops/s
The average number of IO disk operations per second across all disks
disk_response_time_read
seconds
The average response time of IO read-disk operations
disk_response_time_worst
seconds
The average response time of IO disk operations of the slowest disk
disk_response_time_write
seconds
The average response time of IO write-disk operations
disk_response_time_details
ops/s
The average response time of IO disk operations broken down by disk (e.g., disk /dev/nvme01 )
disk_iops_details
ops/s
The number of IO disk-write operations of per second broken down by disk (e.g., disk /dev/nvme01)
disk_io_inflight_details
ops
The number of IO disk operations in progress (outstanding) broken down by disk (e.g., disk /dev/nvme01)
disk_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_read_bytes
bytes/s
The number of bytes per second read across all disks
disk_read_write_bytes
bytes/s
The number of bytes per second read and written across all disks
disk_write_bytes_details
bytes/s
The number of bytes per second written from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation WRITE)
disk_read_bytes_details
bytes/s
The number of bytes per second read from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation READ)
filesystem_util
percent
The space utilization % of filesystems broken down by type and device (e.g., filesystem of type overlayfs on device /dev/loop1)
filesystem_used
bytes
The amount of space used on the filesystems broken down by type and device (e.g., filesystem of type zfs on device /dev/nvme01)
filesystem_size
bytes
The size of filesystems broken down by type and device (e.g., filesystem of type ext4 for device /dev/nvme01)
proc_blocked
processes
The number of processes blocked (e.g, for IO or swapping reasons)
os_context_switch
switches/s
The number of context switches per second
os_cpuSchedMinGranularity
2250000 ns
300000→30000000 ns
Minimal preemption granularity (in nanoseconds) for CPU bound tasks
os_cpuSchedWakeupGranularity
3000000 ns
400000→40000000 ns
Scheduler Wakeup Granularity (in nanoseconds)
os_CPUSchedMigrationCost
500000 ns
100000→5000000 ns
Amount of time (in nanoseconds) after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated to another CPU, so increasing this variable reduces task migrations
os_CPUSchedChildRunsFirst
0
0→1
A freshly forked child runs before the parent continues execution
os_CPUSchedLatency
18000000 ns
2400000→240000000 ns
Targeted preemption latency (in nanoseconds) for CPU bound tasks
os_CPUSchedAutogroupEnabled
1
0→1
Enables the Linux task auto-grouping feature, where the kernel assigns related tasks to groups and schedules them together on CPUs to achieve higher performance for some workloads
os_CPUSchedNrMigrate
32
3→320
Scheduler NR Migrate
os_MemorySwappiness
1
0→100
Memory Swappiness
os_MemoryVmVfsCachePressure
100 %
10→100 %
VFS Cache Pressure
os_MemoryVmMinFree
67584 KB
10240→1024000 KB
Minimum Free Memory
os_MemoryVmDirtyRatio
20 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyBackgroundRatio
10 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, the kernel begins to write them asynchronously in the background
os_MemoryTransparentHugepageEnabled
always
always
never
Transparent Hugepage Enablement
os_MemoryTransparentHugepageDefrag
always
always
never
Transparent Hugepage Enablement Defrag
os_MemorySwap
swapon
swapon
swapoff
Memory Swap
os_MemoryVmDirtyExpire
3000 centisecs
300→30000 centisecs
Memory Dirty Expiration Time
os_MemoryVmDirtyWriteback
500 centisecs
50→5000 centisecs
Memory Dirty Writeback
os_NetworkNetCoreSomaxconn
128 connections
12→1200 connections
Network Max Connections
os_NetworkNetCoreNetdevMaxBacklog
1000 packets
100→10000 packets
Network Max Backlog
os_NetworkNetIpv4TcpMaxSynBacklog
1024 packets
52→15120 packets
Network IPV4 Max Sync Backlog
os_NetworkNetCoreNetdevBudget
300 packets
30→3000 packets
Network Budget
os_NetworkNetCoreRmemMax
212992 bytes
21299→2129920 bytes
Maximum network receive buffer size that applications can request
os_NetworkNetCoreWmemMax
21299→2129920 bytes
21299→2129920 bytes
Maximum network transmit buffer size that applications can request
os_NetworkNetIpv4TcpSlowStartAfterIdle
1
0→1
Network Slow Start After Idle Flag
os_NetworkNetIpv4TcpFinTimeout
60
6 →600 seconds
Network TCP timeout
os_NetworkRfs
0
0→131072
If enabled increases datacache hitrate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running
os_StorageReadAhead
128 KB
0→1024 KB
Read-ahead speeds up file access by pre-fetching data and loading it into the page cache so that it can be available earlier in memory instead of from disk
os_StorageNrRequests
1000 packets
100→10000 packets
Network Max Backlog
os_StorageRqAffinity
1
1→2
Storage Requests Affinity
os_StorageQueueScheduler
none
none
kyber
Storage Queue Scheduler Type
os_StorageNomerges
0
0→2
Enables the user to disable the lookup logic involved with IO merging requests in the block layer. By default (0) all merges are enabled. With 1 only simple one-hit merges will be tried. With 2 no merge algorithms will be tried
os_StorageMaxSectorsKb
128 KB
32→128 KB
The largest IO size that the OS c
This page describes the Optimization Pack for the component type Amazon Linux 2022.
cpu_load_avg
tasks
The system load average (i.e., the number of active tasks in the system)
cpu_num
CPUs
The number of CPUs available in the system (physical and logical)
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
cpu_used
CPUs
The average number of CPUs used in the system (physical and logical)
cpu_util_details
percent
The average CPU utilization % broken down by usage type and CPU number (e.g., cp1 user, cp2 system, cp3 soft-irq)
mem_fault
faults/s
The number of memory faults (minor+major)
mem_fault_major
faults/s
The number of major memory faults (i.e., faults that cause disk access) per second
mem_fault_minor
faults/s
The number of minor memory faults (i.e., faults that do not cause disk access) per second
mem_swapins
pages/s
The number of memory pages swapped in per second
mem_swapouts
pages/s
The number of memory pages swapped out per second
mem_total
bytes
The total amount of installed memory
mem_used
bytes
The total amount of memory used
mem_used_nocache
bytes
The total amount of memory used without considering memory reserved for caching purposes
mem_util
percent
The memory utilization % (i.e, the % of memory used)
mem_util_details
percent
The memory utilization % (i.e., the % of memory used) broken down by usage type (e.g., active memory)
mem_util_nocache
percent
The memory utilization % (i.e., the % of memory used) without considering memory reserved for caching purposes
disk_io_inflight_details
ops
The number of IO disk operations in progress (outstanding) broken down by disk (e.g., disk /dev/nvme01)
disk_iops
ops/s
The average number of IO disk operations per second across all disks
disk_iops_details
ops/s
The number of IO disk-write operations per second broken down by disk (e.g., disk /dev/nvme01)
disk_iops_reads
ops/s
The average number of IO disk-read operations per second across all disks
disk_iops_writes
ops/s
The average number of IO disk-write operations per second across all disks
disk_read_bytes
bytes/s
The number of bytes per second read across all disks
disk_read_bytes_details
bytes/s
The average response time of IO disk operations broken down by disk (e.g., disk C://)
disk_read_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_response_time_details
seconds
The average response time of IO disk operations broken down by disk (e.g., disk C://)
disk_response_time_read
seconds
The average response time of read disk operations
disk_response_time_worst
seconds
The average response time of IO disk operations of the slowest disk
disk_response_time_write
seconds
The average response time of write on disk operations
disk_swap_used
bytes
The total amount of space used by swap disks
disk_swap_util
percent
The average space utilization % of swap disks
disk_util_details
percent
The utilization % of disk, i.e how much time a disk is busy doing work broken down by disk (e.g., disk D://)
disk_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_write_bytes_details
bytes/s
The number of bytes per second written from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation WRITE)
filesystem_size
bytes
The size of filesystems broken down by type and device (e.g., filesystem of type ext4 for device /dev/nvme01)
filesystem_used
bytes
The amount of space used on the filesystems broken down by type and device (e.g., filesystem of type zfs on device /dev/nvme01)
filesystem_util
percent
The space utilization % of filesystems broken down by type and device (e.g., filesystem of type overlayfs on device /dev/loop1)
network_in_bytes_details
bytes/s
The number of inbound network packets in bytes per second broken down by network device (e.g., wlp4s0)
network_out_bytes_details
bytes/s
The number of outbound network packets in bytes per second broken down by network device (e.g., eth01)
network_tcp_retrans
retrans/s
The number of network TCP retransmissions per second
os_context_switch
switches/s
The number of context switches per second
proc_blocked
processes
The number of processes blocked (e.g, for IO or swapping reasons)
os_cpuSchedMinGranularity
integer
nanoseconds
1500000
300000 → 30000000
no
Minimal preemption granularity (in nanoseconds) for CPU bound tasks
os_cpuSchedWakeupGranularity
integer
nanoseconds
2000000
400000 → 40000000
no
Scheduler Wakeup Granularity (in nanoseconds)
os_CPUSchedMigrationCost
integer
nanoseconds
500000
100000 → 5000000
no
Amount of time (in nanoseconds) after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated to another CPU, so increasing this variable reduces task migrations
os_CPUSchedChildRunsFirst
integer
0
0
, 1
no
A freshly forked child runs before the parent continues execution
os_CPUSchedLatency
integer
nanoseconds
12000000
2400000 → 240000000
no
Targeted preemption latency (in nanoseconds) for CPU bound tasks
os_CPUSchedAutogroupEnabled
integer
0
0
, 1
no
Enables the Linux task auto-grouping feature, where the kernel assigns related tasks to groups and schedules them together on CPUs to achieve higher performance for some workloads
os_CPUSchedNrMigrate
integer
32
3 → 320
no
Scheduler NR Migrate
os_MemorySwappiness
integer
percent
60
0 → 100
no
The percentage of RAM free space for which the kernel will start swapping pages to disk
os_MemoryVmVfsCachePressure
integer
100
10 → 100
no
VFS Cache Pressure
os_MemoryVmCompactionProactiveness
integer
20
10 → 100
no
Determines how aggressively compaction is done in the background
os_MemoryVmPageLockUnfairness
integer
5
0 → 1000
no
Set the level of unfairness in the page lock queue.
os_MemoryVmWatermarkScaleFactor
integer
10
0 → 1000
no
The amount of memory, expressed as fractions of 10'000, left in a node/system before kswapd is woken up and how much memory needs to be free before kswapd goes back to sleep
os_MemoryVmWatermarkBoostFactor
integer
15000
0 → 30000
no
The level of reclaim when the memory is being fragmented, expressed as fractions of 10'000 of a zone's high watermark
os_MemoryVmMinFree
integer
67584
10240 → 1024000
no
Minimum Free Memory (in kbytes)
os_MemoryTransparentHugepageEnabled
categorical
madvise
always
, never
, madvise
no
Transparent Hugepage Enablement Flag
os_MemoryTransparentHugepageDefrag
categorical
madvise
always
, never
, defer+madvise
, madvise
, defer
no
Transparent Hugepage Enablement Defrag
os_MemorySwap
categorical
swapon
swapon
, swapoff
no
Memory Swap
os_MemoryVmDirtyRatio
integer
20
1 → 99
no
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyBackgroundRatio
integer
10
1 → 99
no
When the dirty memory pages exceed this percentage of the total memory, the kernel begins to write them asynchronously in the background
os_MemoryVmDirtyExpire
integer
centiseconds
3000
300 → 30000
no
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyWriteback
integer
centiseconds
500
50 → 5000
no
Memory Dirty Writeback (in centisecs)
os_NetworkNetCoreSomaxconn
integer
megabytes
128
12 → 8192
no
Network Max Connections
os_NetworkNetCoreNetdevMaxBacklog
integer
megabytes/s
1000
100 → 10000
no
Network Max Backlog
os_NetworkNetIpv4TcpMaxSynBacklog
integer
milliseconds
256
52 → 5120
no
Network IPV4 Max Sync Backlog
os_NetworkNetCoreNetdevBudget
integer
300
30 → 30000
no
Network Budget
os_NetworkNetCoreRmemMax
integer
212992
21299 → 2129920
no
Maximum network receive buffer size that applications can request
os_NetworkNetCoreWmemMax
integer
212992
21299 → 2129920
no
Maximum network transmit buffer size that applications can request
os_NetworkNetIpv4TcpSlowStartAfterIdle
integer
1
0
, 1
no
Network Slow Start After Idle Flag
os_NetworkNetIpv4TcpFinTimeout
integer
60
6 → 600
no
Network TCP timeout
os_NetworkRfs
integer
0
0 → 131072
no
If enabled increases datacache hitrate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running
os_StorageReadAhead
integer
kilobytes
128
0 → 4096
no
Read-ahead speeds up file access by pre-fetching data and loading it into the page cache so that it can be available earlier in memory instead of from disk
os_StorageNrRequests
integer
32
12 → 1280
no
Storage Number of Requests
os_StorageRqAffinity
integer
1
1
, 2
no
Storage Requests Affinity
os_StorageQueueScheduler
integer
none
none
, kyber
, mq-deadline
, bfq
no
Storage Queue Scheduler Type
os_StorageNomerges
integer
0
0 → 2
no
Enables the user to disable the lookup logic involved with IO merging requests in the block layer. By default (0) all merges are enabled. With 1 only simple one-hit merges will be tried. With 2 no merge algorithms will be tried
os_StorageMaxSectorsKb
integer
kilobytes
256
32 → 256
no
The largest IO size that the OS can issue to a block device
This page describes the Optimization Pack for the component type CentOS 7.
cpu_num
CPUs
The number of CPUs available in the system (physical and logical)
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
cpu_util_details
percent
The average CPU utilization % broken down by usage type and cpu number (e.g., cp1 user, cp2 system, cp3 soft-irq)
cpu_load_avg
tasks
The system load average (i.e., the number of active tasks in the system)
mem_util
percent
The memory utilization % (i.e, the % of memory used)
mem_util_nocache
percent
The memory utilization % (i.e., the % of memory used) without considering memory reserved for caching purposes
mem_util_details
percent
The memory utilization % (i.e., the % of memory used) broken down by usage type (e.g., active memory)
mem_used
bytes
The total amount of memory used
mem_used_nocache
bytes
The total amount of memory used without considering memory reserved for caching purposes
mem_total
bytes
The total amount of installed memory
mem_fault_minor
faults/s
The number of minor memory faults (i.e., faults that do not cause disk access) per second
mem_fault_major
faults/s
The number of major memory faults (i.e., faults that cause disk access) per second
mem_fault
faults/s
The number of memory faults (major + minor)
mem_swapins
pages/s
The number of memory pages swapped in per second
mem_swapouts
pages/s
The number of memory pages swapped out per second
network_tcp_retrans
retrans/s
The number of network TCP retransmissions per second
network_in_bytes_details
bytes/s
The number of inbound network packets in bytes per second broken down by network device (e.g., wlp4s0)
network_out_bytes_details
bytes/s
The number of outbound network packets in bytes per second broken down by network device (e.g., eth01)
Notice: you can use a device
custom filter to monitor a specific disk with Prometheus. You can find more information on Prometheus queries and the %FILTERS%
placeholder here: Prometheus provider and here: Prometheus provider metrics mapping.
disk_swap_util
percent
The average space utilization % of swap disks
disk_swap_used
bytes
The total amount of space used by swap disks
disk_util_details
percent
The utilization % of disk, i.e how much time a disk is busy doing work broken down by disk (e.g., disk D://)
disk_iops_writes
ops/s
The average number of IO disk-write operations per second across all disks
disk_iops_reads
ops/s
The average number of IO disk-read operations per second across all disks
disk_iops
ops/s
The average number of IO disk operations per second across all disks
disk_response_time_read
seconds
The average response time of IO read-disk operations
disk_response_time_worst
seconds
The average response time of IO disk operations of the slowest disk
disk_response_time_write
seconds
The average response time of IO write-disk operations
disk_response_time_details
ops/s
The average response time of IO disk operations broken down by disk (e.g., disk /dev/nvme01 )
disk_iops_details
ops/s
The number of IO disk-write operations of per second broken down by disk (e.g., disk /dev/nvme01)
disk_io_inflight_details
ops
The number of IO disk operations in progress (outstanding) broken down by disk (e.g., disk /dev/nvme01)
disk_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_read_bytes
bytes/s
The number of bytes per second read across all disks
disk_read_write_bytes
bytes/s
The number of bytes per second read and written across all disks
disk_write_bytes_details
bytes/s
The number of bytes per second written from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation WRITE)
disk_read_bytes_details
bytes/s
The number of bytes per second read from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation READ)
filesystem_util
percent
The space utilization % of filesystems broken down by type and device (e.g., filesystem of type overlayfs on device /dev/loop1)
filesystem_used
bytes
The amount of space used on the filesystems broken down by type and device (e.g., filesystem of type zfs on device /dev/nvme01)
filesystem_size
bytes
The size of filesystems broken down by type and device (e.g., filesystem of type ext4 for device /dev/nvme01)
proc_blocked
processes
The number of processes blocked (e.g, for IO or swapping reasons)
os_context_switch
switches/s
The number of context switches per second
os_cpuSchedMinGranularity
2250000 ns
300000→30000000 ns
Minimal preemption granularity (in nanoseconds) for CPU bound tasks
os_cpuSchedWakeupGranularity
3000000 ns
400000→40000000 ns
Scheduler Wakeup Granularity (in nanoseconds)
os_CPUSchedMigrationCost
500000 ns
100000→5000000 ns
Amount of time (in nanoseconds) after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated to another CPU, so increasing this variable reduces task migrations
os_CPUSchedChildRunsFirst
0
0→1
A freshly forked child runs before the parent continues execution
os_CPUSchedLatency
18000000 ns
2400000→240000000 ns
Targeted preemption latency (in nanoseconds) for CPU bound tasks
os_CPUSchedAutogroupEnabled
1
0→1
Enables the Linux task auto-grouping feature, where the kernel assigns related tasks to groups and schedules them together on CPUs to achieve higher performance for some workloads
os_CPUSchedNrMigrate
32
3→320
Scheduler NR Migrate
os_MemorySwappiness
1
0→100
Memory Swappiness
os_MemoryVmVfsCachePressure
100 %
10→100 %
VFS Cache Pressure
os_MemoryVmMinFree
67584 KB
10240→1024000 KB
Minimum Free Memory
os_MemoryVmDirtyRatio
20 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyBackgroundRatio
10 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, the kernel begins to write them asynchronously in the background
os_MemoryTransparentHugepageEnabled
always
always
never
Transparent Hugepage Enablement
os_MemoryTransparentHugepageDefrag
always
always
never
Transparent Hugepage Enablement Defrag
os_MemorySwap
swapon
swapon
swapoff
Memory Swap
os_MemoryVmDirtyExpire
3000 centisecs
300→30000 centisecs
Memory Dirty Expiration Time
os_MemoryVmDirtyWriteback
500 centisecs
50→5000 centisecs
Memory Dirty Writeback
os_NetworkNetCoreSomaxconn
128 connections
12→1200 connections
Network Max Connections
os_NetworkNetCoreNetdevMaxBacklog
1000 packets
100→10000 packets
Network Max Backlog
os_NetworkNetIpv4TcpMaxSynBacklog
1024 packets
52→15120 packets
Network IPV4 Max Sync Backlog
os_NetworkNetCoreNetdevBudget
300 packets
30→3000 packets
Network Budget
os_NetworkNetCoreRmemMax
212992 bytes
21299→2129920 bytes
Maximum network receive buffer size that applications can request
os_NetworkNetCoreWmemMax
21299→2129920 bytes
21299→2129920 bytes
Maximum network transmit buffer size that applications can request
os_NetworkNetIpv4TcpSlowStartAfterIdle
1
0→1
Network Slow Start After Idle Flag
os_NetworkNetIpv4TcpFinTimeout
60
6 →600 seconds
Network TCP timeout
os_NetworkRfs
0
0→131072
If enabled increases datacache hitrate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running
os_StorageReadAhead
128 KB
0→1024 KB
Read-ahead speeds up file access by pre-fetching data and loading it into the page cache so that it can be available earlier in memory instead of from disk
os_StorageNrRequests
1000 packets
100→10000 packets
Network Max Backlog
os_StorageRqAffinity
1
1→2
Storage Requests Affinity
os_StorageQueueScheduler
none
none
kyber
Storage Queue Scheduler Type
os_StorageNomerges
0
0→2
Enables the user to disable the lookup logic involved with IO merging requests in the block layer. By default (0) all merges are enabled. With 1 only simple one-hit merges will be tried. With 2 no merge algorithms will be tried
os_StorageMaxSectorsKb
128 KB
32→128 KB
The largest IO size that the OS c
There are no general constraints among CentOS 7 parameters.
This page describes the Optimization Pack for the component type CentOS 8.
cpu_num
CPUs
The number of CPUs available in the system (physical and logical)
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
cpu_util_details
percent
The average CPU utilization % broken down by usage type and cpu number (e.g., cp1 user, cp2 system, cp3 soft-irq)
cpu_load_avg
tasks
The system load average (i.e., the number of active tasks in the system)
mem_util
percent
The memory utilization % (i.e, the % of memory used)
mem_util_nocache
percent
The memory utilization % (i.e., the % of memory used) without considering memory reserved for caching purposes
mem_util_details
percent
The memory utilization % (i.e., the % of memory used) broken down by usage type (e.g., active memory)
mem_used
bytes
The total amount of memory used
mem_used_nocache
bytes
The total amount of memory used without considering memory reserved for caching purposes
mem_total
bytes
The total amount of installed memory
mem_fault_minor
faults/s
The number of minor memory faults (i.e., faults that do not cause disk access) per second
mem_fault_major
faults/s
The number of major memory faults (i.e., faults that cause disk access) per second
mem_fault
faults/s
The number of memory faults (major + minor)
mem_swapins
pages/s
The number of memory pages swapped in per second
mem_swapouts
pages/s
The number of memory pages swapped out per second
network_tcp_retrans
retrans/s
The number of network TCP retransmissions per second
network_in_bytes_details
bytes/s
The inbound network traffic in bytes per second broken down by network device (e.g., wlp4s0)
network_out_bytes_details
bytes/s
The outbound network traffic in bytes per second broken down by network device (e.g., eth01)
Notice: you can use a custom device filter to monitor a specific disk with Prometheus. You can find more information on Prometheus queries and the %FILTERS% placeholder on the Prometheus provider and Prometheus provider metrics mapping pages.
disk_swap_util
percent
The average space utilization % of swap disks
disk_swap_used
bytes
The total amount of space used by swap disks
disk_util_details
percent
The utilization % of a disk (i.e., how much time the disk is busy doing work) broken down by disk (e.g., disk D://)
disk_iops_writes
ops/s
The average number of IO disk-write operations per second across all disks
disk_iops_reads
ops/s
The average number of IO disk-read operations per second across all disks
disk_iops
ops/s
The average number of IO disk operations per second across all disks
disk_response_time_read
seconds
The average response time of IO read-disk operations
disk_response_time_worst
seconds
The average response time of IO disk operations of the slowest disk
disk_response_time_write
seconds
The average response time of IO write-disk operations
disk_response_time_details
seconds
The average response time of IO disk operations broken down by disk (e.g., disk /dev/nvme01)
disk_iops_details
ops/s
The number of IO disk operations per second broken down by disk (e.g., disk /dev/nvme01)
disk_io_inflight_details
ops
The number of IO disk operations in progress (outstanding) broken down by disk (e.g., disk /dev/nvme01)
disk_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_read_bytes
bytes/s
The number of bytes per second read across all disks
disk_read_write_bytes
bytes/s
The number of bytes per second read and written across all disks
disk_write_bytes_details
bytes/s
The number of bytes per second written from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation WRITE)
disk_read_bytes_details
bytes/s
The number of bytes per second read from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation READ)
filesystem_util
percent
The space utilization % of filesystems broken down by type and device (e.g., filesystem of type overlayfs on device /dev/loop1)
filesystem_used
bytes
The amount of space used on the filesystems broken down by type and device (e.g., filesystem of type zfs on device /dev/nvme01)
filesystem_size
bytes
The size of filesystems broken down by type and device (e.g., filesystem of type ext4 for device /dev/nvme01)
proc_blocked
processes
The number of processes blocked (e.g., for IO or swapping reasons)
os_context_switch
switches/s
The number of context switches per second
os_cpuSchedMinGranularity
2250000 ns
300000→30000000 ns
Minimal preemption granularity (in nanoseconds) for CPU bound tasks
os_cpuSchedWakeupGranularity
3000000 ns
400000→40000000 ns
Scheduler Wakeup Granularity (in nanoseconds)
os_CPUSchedMigrationCost
500000 ns
100000→5000000 ns
Amount of time (in nanoseconds) after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated to another CPU, so increasing this variable reduces task migrations
os_CPUSchedChildRunsFirst
0
0→1
A freshly forked child runs before the parent continues execution
os_CPUSchedLatency
18000000 ns
2400000→240000000 ns
Targeted preemption latency (in nanoseconds) for CPU bound tasks
os_CPUSchedAutogroupEnabled
1
0→1
Enables the Linux task auto-grouping feature, where the kernel assigns related tasks to groups and schedules them together on CPUs to achieve higher performance for some workloads
os_CPUSchedNrMigrate
32
3→320
Scheduler NR Migrate
os_MemorySwappiness
1
0→100
Memory Swappiness
os_MemoryVmVfsCachePressure
100 %
10→100 %
VFS Cache Pressure
os_MemoryVmMinFree
67584 KB
10240→1024000 KB
Minimum Free Memory
os_MemoryVmDirtyRatio
20 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyBackgroundRatio
10 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, the kernel begins to write them asynchronously in the background
os_MemoryTransparentHugepageEnabled
never
always
never
madvise
Transparent Hugepage Enablement
os_MemoryTransparentHugepageDefrag
always
always
never
madvise
defer
defer+madvise
Transparent Hugepage Enablement Defrag
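Both settings are exposed under sysfs; a minimal sketch of reading and changing them (the value written must be one of those listed above):

```
# The currently active value is shown in brackets, e.g. [always] madvise never
cat /sys/kernel/mm/transparent_hugepage/enabled
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo defer+madvise | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
```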
os_MemorySwap
swapon
swapon
swapoff
Memory Swap
os_MemoryVmDirtyExpire
3000 centisecs
300→30000 centisecs
Memory Dirty Expiration Time
os_MemoryVmDirtyWriteback
500 centisecs
50→5000 centisecs
Memory Dirty Writeback
os_NetworkNetCoreSomaxconn
128 connections
12→1200 connections
Network Max Connections
os_NetworkNetCoreNetdevMaxBacklog
1000 packets
100→10000 packets
Network Max Backlog
os_NetworkNetIpv4TcpMaxSynBacklog
512 packets
52→15120 packets
Network IPV4 Max Sync Backlog
os_NetworkNetCoreNetdevBudget
300 packets
30→3000 packets
Network Budget
os_NetworkNetCoreRmemMax
212992 bytes
21299→2129920 bytes
Maximum network receive buffer size that applications can request
os_NetworkNetCoreWmemMax
212992 bytes
21299→2129920 bytes
Maximum network transmit buffer size that applications can request
os_NetworkNetIpv4TcpSlowStartAfterIdle
1
0→1
Network Slow Start After Idle Flag
os_NetworkNetIpv4TcpFinTimeout
60 seconds
6→600 seconds
Network TCP FIN timeout
os_NetworkRfs
0
0→131072
If enabled increases datacache hitrate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running
os_StorageReadAhead
128 KB
0→1024 KB
Read-ahead speeds up file access by pre-fetching data and loading it into the page cache so that it can be available earlier in memory instead of from disk
os_StorageNrRequests
1000 requests
100→10000 requests
Storage Number of Requests
os_StorageRqAffinity
1
1→2
Storage Requests Affinity
os_StorageQueueScheduler
none
none
kyber
mq-deadline
bfq
Storage Queue Scheduler Type
os_StorageNomerges
0
0→2
Enables the user to disable the lookup logic involved with IO merging requests in the block layer. By default (0) all merges are enabled. With 1 only simple one-hit merges will be tried. With 2 no merge algorithms will be tried
os_StorageMaxSectorsKb
128 KB
32→128 KB
The largest IO size that the OS can issue to the block device
There are no general constraints among RHEL 8 parameters.
This page describes the Optimization Pack for Java OpenJDK 8 JVM.
mem_used
bytes
The total amount of memory used
requests_throughput
requests/s
The number of requests performed per second
requests_response_time
milliseconds
The average request response time
jvm_heap_size
bytes
The size of the JVM heap memory
jvm_heap_used
bytes
The amount of heap memory used
jvm_heap_util
percent
The utilization % of heap memory
jvm_memory_used
bytes
The total amount of memory used across all the JVM memory pools
jvm_memory_used_details
bytes
The total amount of memory used broken down by pool (e.g., code-cache, compressed-class-space)
jvm_memory_buffer_pool_used
bytes
The total amount of bytes used by buffers within the JVM buffer memory pool
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
cpu_used
CPUs
The total amount of CPUs used
jvm_gc_time
percent
The % of wall clock time the JVM spent doing stop the world garbage collection activities
jvm_gc_time_details
percent
The % of wall clock time the JVM spent doing stop the world garbage collection activities broken down by type of garbage collection algorithm (e.g., ParNew)
jvm_gc_count
collections/s
The total number of stop the world JVM garbage collections that have occurred per second
jvm_gc_count_details
collections/s
The total number of stop the world JVM garbage collections that have occurred per second, broken down by type of garbage collection algorithm (e.g., G1, CMS)
jvm_gc_duration
seconds
The average duration of a stop the world JVM garbage collection
jvm_gc_duration_details
seconds
The average duration of a stop the world JVM garbage collection broken down by type of garbage collection algorithm (e.g., G1, CMS)
jvm_threads_current
threads
The total number of active threads within the JVM
jvm_threads_deadlocked
threads
The total number of deadlocked threads within the JVM
jvm_compilation_time
milliseconds
The total time spent by the JVM JIT compiler compiling bytecode
jvm_minHeapSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The minimum heap size.
jvm_maxHeapSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The maximum heap size.
jvm_maxRAM
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The maximum amount of memory used by the JVM.
jvm_initialRAMPercentage
real
percent
1.563
0.1
→ 100
yes
The initial percentage of memory used by the JVM.
jvm_maxRAMPercentage
real
percent
25.0
0.1
→ 100.0
yes
The percentage of memory used for maximum heap size. Requires Java 10, Java 8 Update 191 or later.
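As an illustration, these memory-sizing parameters translate into standard HotSpot flags on the java command line (app.jar is a placeholder; the values are the defaults listed above):

```
java -XX:InitialRAMPercentage=1.563 \
     -XX:MaxRAMPercentage=25.0 \
     -jar app.jar
```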
jvm_alwaysPreTouch
categorical
-AlwaysPreTouch
+AlwaysPreTouch
, -AlwaysPreTouch
yes
Pretouch pages during initialization.
jvm_metaspaceSize
integer
megabytes
20
You should select your own domain.
yes
The initial size of the allocated class metadata space.
jvm_maxMetaspaceSize
integer
megabytes
20
You should select your own domain.
yes
The maximum size of the allocated class metadata space.
jvm_useTransparentHugePages
categorical
-UseTransparentHugePages
+UseTransparentHugePages
, -UseTransparentHugePages
yes
Enables the use of large pages that can dynamically grow or shrink.
jvm_allocatePrefetchInstr
integer
0
0
→ 3
yes
Prefetch ahead of the allocation pointer.
jvm_allocatePrefetchDistance
integer
bytes
0
0
→ 512
yes
Distance to prefetch ahead of the allocation pointer. -1 uses the system-specific value (automatically determined).
jvm_allocatePrefetchLines
integer
lines
3
1
→ 64
yes
The number of lines to prefetch ahead of array allocation pointer.
jvm_allocatePrefetchStyle
integer
1
0
→ 3
yes
Selects the prefetch instruction to generate.
jvm_useLargePages
categorical
+UseLargePages
+UseLargePages
, -UseLargePages
yes
Enable the use of large page memory.
jvm_newSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Sets the initial and maximum size of the heap for the young generation (nursery).
jvm_maxNewSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Specifies the upper bound for the young generation size.
jvm_survivorRatio
integer
8
1
→ 100
yes
The ratio between the Eden and each Survivor-space within the JVM. For example, a jvm_survivorRatio of 6 means that the Eden-space is 6 times the size of one Survivor-space.
jvm_useAdaptiveSizePolicy
categorical
+UseAdaptiveSizePolicy
+UseAdaptiveSizePolicy
, -UseAdaptiveSizePolicy
yes
Enables adaptive generation sizing. Disable it when tuning jvm_targetSurvivorRatio.
jvm_adaptiveSizePolicyWeight
integer
percent
10
1 → 100
yes
The weighting given to the current Garbage Collection time versus previous GC times when checking the timing goal.
jvm_targetSurvivorRatio
integer
50
1
→ 100
yes
The desired percentage of Survivor-space used after young garbage collection.
jvm_minHeapFreeRatio
integer
percent
40
1
→ 99
yes
The minimum percentage of heap free after garbage collection to avoid shrinking.
jvm_maxHeapFreeRatio
integer
percent
70
1
→ 100
yes
The maximum percentage of heap free after garbage collection to avoid shrinking.
jvm_maxTenuringThreshold
integer
15
0
→ 15
yes
The maximum value for the tenuring threshold.
jvm_gcType
categorical
Parallel
Serial
, Parallel
, ConcMarkSweep
, G1
yes
Type of the garbage collection algorithm.
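Each jvm_gcType category corresponds to a standard HotSpot collector-selection flag; a sketch of the mapping (the rest of the command line is omitted):

```
java -XX:+UseSerialGC ...         # Serial
java -XX:+UseParallelGC ...       # Parallel
java -XX:+UseConcMarkSweepGC ...  # ConcMarkSweep
java -XX:+UseG1GC ...             # G1
```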
jvm_useParallelOldGC
categorical
-UseParallelOldGC
+UseParallelOldGC
, -UseParallelOldGC
yes
Enables Parallel Mark and Compact Garbage Collection in Old/Tenured generations.
jvm_concurrentGCThreads
integer
threads
You should select your own default value.
You should select your own domain.
yes
The number of threads concurrent garbage collection will use.
jvm_parallelGCThreads
integer
threads
You should select your own default value.
You should select your own domain.
yes
The number of threads garbage collection will use for parallel phases.
jvm_maxGCPauseMillis
integer
milliseconds
200
1
→ 32767
yes
Adaptive size policy maximum GC pause time goal in milliseconds.
jvm_resizePLAB
categorical
+ResizePLAB
+ResizePLAB
, -ResizePLAB
yes
Enables the dynamic resizing of promotion LABs.
jvm_GCTimeRatio
integer
99
0
→ 100
yes
The target fraction of time that can be spent in garbage collection before increasing the heap, computed as 1 / (1 + GCTimeRatio).
jvm_initiatingHeapOccupancyPercent
integer
45
0
→ 100
yes
Sets the percentage of the heap occupancy at which to start a concurrent GC cycle.
jvm_youngGenerationSizeIncrement
integer
percent
20
0
→ 100
yes
The increment size for Young Generation adaptive resizing.
jvm_tenuredGenerationSizeIncrement
integer
percent
20
0
→ 100
yes
The increment size for Old/Tenured Generation adaptive resizing.
jvm_adaptiveSizeDecrementScaleFactor
integer
percent
4
1
→ 1024
yes
Specifies the scale factor for goal-driven generation resizing.
jvm_CMSTriggerRatio
integer
80
0
→ 100
yes
The percentage of MinHeapFreeRatio allocated before CMS GC starts.
jvm_CMSInitiatingOccupancyFraction
integer
-1
-1
→ 99
yes
Configure oldgen occupancy fraction threshold for CMS GC. Negative values default to CMSTriggerRatio.
jvm_CMSClassUnloadingEnabled
categorical
+CMSClassUnloadingEnabled
+CMSClassUnloadingEnabled
, -CMSClassUnloadingEnabled
yes
Enables class unloading when using CMS.
jvm_useCMSInitiatingOccupancyOnly
categorical
-UseCMSInitiatingOccupancyOnly
+UseCMSInitiatingOccupancyOnly
, -UseCMSInitiatingOccupancyOnly
yes
Uses the occupancy value as the only criterion for initiating the CMS collector.
jvm_G1HeapRegionSize
integer
megabytes
8
1
→32
yes
Sets the size of the regions for G1.
jvm_G1ReservePercent
integer
10
0
→ 50
yes
Sets the percentage of the heap that is reserved as a false ceiling to reduce the possibility of promotion failure for the G1 collector.
jvm_G1NewSizePercent
integer
5
0
→ 100
yes
Sets the percentage of the heap to use as the minimum for the young generation size.
jvm_G1MaxNewSizePercent
integer
60
0
→ 100
yes
Sets the percentage of the heap size to use as the maximum for young generation size.
jvm_G1MixedGCLiveThresholdPercent
integer
85
0
→ 100
yes
Sets the occupancy threshold for an old region to be included in a mixed garbage collection cycle.
jvm_G1HeapWastePercent
integer
5
0
→ 100
yes
The maximum percentage of the reclaimable heap before starting mixed GC.
jvm_G1MixedGCCountTarget
integer
collections
8
0
→ 100
yes
Sets the target number of mixed garbage collections after a marking cycle to collect old regions with at most G1MixedGCLiveThresholdPercent live data. The default is 8 mixed garbage collections.
jvm_G1OldCSetRegionThresholdPercent
integer
10
0
→ 100
yes
The upper limit on the number of old regions to be collected during mixed GC.
jvm_reservedCodeCacheSize
integer
megabytes
240
3
→ 2048
yes
The maximum size of the compiled code cache pool.
jvm_tieredCompilation
categorical
+TieredCompilation
+TieredCompilation
, -TieredCompilation
yes
Enables tiered compilation.
jvm_tieredCompilationStopAtLevel
integer
4
0
→ 4
yes
Sets the highest compilation tier used by tiered compilation.
jvm_compilationThreads
integer
threads
You should select your own default value.
You should select your own domain.
yes
The number of compilation threads.
jvm_backgroundCompilation
categorical
+BackgroundCompilation
+BackgroundCompilation
, -BackgroundCompilation
yes
Allow async interpreted execution of a method while it is being compiled.
jvm_inline
categorical
+Inline
+Inline
, -Inline
yes
Enable inlining.
jvm_maxInlineSize
integer
bytes
35
1
→ 2097152
yes
The bytecode size limit (in bytes) of the inlined methods.
jvm_inlineSmallCode
integer
bytes
2000
1
→ 16384
yes
The maximum compiled code size limit (in bytes) of the inlined methods.
jvm_aggressiveOpts
categorical
-AggressiveOpts
+AggressiveOpts
, -AggressiveOpts
yes
Turn on point performance compiler optimizations.
jvm_usePerfData
categorical
+UsePerfData
+UsePerfData
, -UsePerfData
yes
Enable monitoring of performance data.
jvm_useNUMA
categorical
-UseNUMA
+UseNUMA
, -UseNUMA
yes
Enable NUMA.
jvm_useBiasedLocking
categorical
+UseBiasedLocking
+UseBiasedLocking
, -UseBiasedLocking
yes
Manage the use of biased locking.
jvm_activeProcessorCount
integer
CPUs
1
1
→ 512
yes
Overrides the number of detected CPUs that the VM will use to calculate the size of thread pools.
The following parameters require their ranges or default values to be updated according to the described rules:
Parameter
Default value
Domain
jvm_minHeapSize
Depends on the instance available memory
jvm_maxHeapSize
Depends on the instance available memory
jvm_newSize
Depends on the configured heap
jvm_maxNewSize
Depends on the configured heap
jvm_concurrentGCThreads
Depends on the available CPU cores
Depends on the available CPU cores
jvm_parallelGCThreads
Depends on the available CPU cores
Depends on the available CPU cores
jvm_compilationThreads
Depends on the available CPU cores
Depends on the available CPU cores
The following list shows the constraints that may be required in the definition of the study, depending on the tuned parameters:
jvm.jvm_minHeapSize <= jvm.jvm_maxHeapSize
jvm.jvm_minHeapFreeRatio <= jvm.jvm_maxHeapFreeRatio
jvm.jvm_maxNewSize < jvm.jvm_maxHeapSize
jvm.jvm_concurrentGCThreads <= jvm.jvm_parallelGCThreads
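As a purely illustrative sketch, such constraints would appear in the study definition roughly as follows; the field names are assumptions for illustration, so check the Study template page for the exact schema:

```
# Hypothetical outline; see the Study template page for the actual schema.
parameterConstraints:
  - name: heap_bounds
    formula: jvm.jvm_minHeapSize <= jvm.jvm_maxHeapSize
  - name: gc_threads
    formula: jvm.jvm_concurrentGCThreads <= jvm.jvm_parallelGCThreads
```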
This page describes the Optimization Pack for the component type Ubuntu 16.04.
cpu_num
CPUs
The number of CPUs available in the system (physical and logical)
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
cpu_util_details
percent
The average CPU utilization % broken down by usage type and cpu number (e.g., cp1 user, cp2 system, cp3 soft-irq)
cpu_load_avg
tasks
The system load average (i.e., the number of active tasks in the system)
mem_util
percent
The memory utilization % (i.e., the % of memory used)
mem_util_nocache
percent
The memory utilization % (i.e., the % of memory used) without considering memory reserved for caching purposes
mem_util_details
percent
The memory utilization % (i.e., the % of memory used) broken down by usage type (e.g., active memory)
mem_used
bytes
The total amount of memory used
mem_used_nocache
bytes
The total amount of memory used without considering memory reserved for caching purposes
mem_total
bytes
The total amount of installed memory
mem_fault_minor
faults/s
The number of minor memory faults (i.e., faults that do not cause disk access) per second
mem_fault_major
faults/s
The number of major memory faults (i.e., faults that cause disk access) per second
mem_fault
faults/s
The number of memory faults (major + minor)
mem_swapins
pages/s
The number of memory pages swapped in per second
mem_swapouts
pages/s
The number of memory pages swapped out per second
network_tcp_retrans
retrans/s
The number of network TCP retransmissions per second
network_in_bytes_details
bytes/s
The inbound network traffic in bytes per second broken down by network device (e.g., wlp4s0)
network_out_bytes_details
bytes/s
The outbound network traffic in bytes per second broken down by network device (e.g., eth01)
Notice: you can use a custom device filter to monitor a specific disk with Prometheus. You can find more information on Prometheus queries and the %FILTERS% placeholder on the Prometheus provider and Prometheus provider metrics mapping pages.
disk_swap_util
percent
The average space utilization % of swap disks
disk_swap_used
bytes
The total amount of space used by swap disks
disk_util_details
percent
The utilization % of a disk (i.e., how much time the disk is busy doing work) broken down by disk (e.g., disk D://)
disk_iops_writes
ops/s
The average number of IO disk-write operations per second across all disks
disk_iops_reads
ops/s
The average number of IO disk-read operations per second across all disks
disk_iops
ops/s
The average number of IO disk operations per second across all disks
disk_response_time_read
seconds
The average response time of IO read-disk operations
disk_response_time_worst
seconds
The average response time of IO disk operations of the slowest disk
disk_response_time_write
seconds
The average response time of IO write-disk operations
disk_response_time_details
seconds
The average response time of IO disk operations broken down by disk (e.g., disk /dev/nvme01)
disk_iops_details
ops/s
The number of IO disk operations per second broken down by disk (e.g., disk /dev/nvme01)
disk_io_inflight_details
ops
The number of IO disk operations in progress (outstanding) broken down by disk (e.g., disk /dev/nvme01)
disk_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_read_bytes
bytes/s
The number of bytes per second read across all disks
disk_read_write_bytes
bytes/s
The number of bytes per second read and written across all disks
disk_write_bytes_details
bytes/s
The number of bytes per second written from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation WRITE)
disk_read_bytes_details
bytes/s
The number of bytes per second read from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation READ)
filesystem_util
percent
The space utilization % of filesystems broken down by type and device (e.g., filesystem of type overlayfs on device /dev/loop1)
filesystem_used
bytes
The amount of space used on the filesystems broken down by type and device (e.g., filesystem of type zfs on device /dev/nvme01)
filesystem_size
bytes
The size of filesystems broken down by type and device (e.g., filesystem of type ext4 for device /dev/nvme01)
proc_blocked
processes
The number of processes blocked (e.g., for IO or swapping reasons)
os_context_switch
switches/s
The number of context switches per second
os_cpuSchedMinGranularity
2250000 ns
300000→30000000 ns
Minimal preemption granularity (in nanoseconds) for CPU bound tasks
os_cpuSchedWakeupGranularity
3000000 ns
400000→40000000 ns
Scheduler Wakeup Granularity (in nanoseconds)
os_CPUSchedMigrationCost
500000 ns
100000→5000000 ns
Amount of time (in nanoseconds) after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated to another CPU, so increasing this variable reduces task migrations
os_CPUSchedChildRunsFirst
0
0→1
A freshly forked child runs before the parent continues execution
os_CPUSchedLatency
18000000 ns
2400000→240000000 ns
Targeted preemption latency (in nanoseconds) for CPU bound tasks
os_CPUSchedAutogroupEnabled
1
0→1
Enables the Linux task auto-grouping feature, where the kernel assigns related tasks to groups and schedules them together on CPUs to achieve higher performance for some workloads
os_CPUSchedNrMigrate
32
3→320
Scheduler NR Migrate
os_MemorySwappiness
1
0→100
Memory Swappiness
os_MemoryVmVfsCachePressure
100 %
10→100 %
VFS Cache Pressure
os_MemoryVmMinFree
67584 KB
10240→1024000 KB
Minimum Free Memory
os_MemoryVmDirtyRatio
20 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyBackgroundRatio
10 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, the kernel begins to write them asynchronously in the background
os_MemoryTransparentHugepageEnabled
always
always
never
Transparent Hugepage Enablement
os_MemoryTransparentHugepageDefrag
always
always
never
Transparent Hugepage Enablement Defrag
os_MemorySwap
swapon
swapon
swapoff
Memory Swap
os_MemoryVmDirtyExpire
3000 centisecs
300→30000 centisecs
Memory Dirty Expiration Time
os_MemoryVmDirtyWriteback
500 centisecs
50→5000 centisecs
Memory Dirty Writeback
os_NetworkNetCoreSomaxconn
128 connections
12→1200 connections
Network Max Connections
os_NetworkNetCoreNetdevMaxBacklog
1000 packets
100→10000 packets
Network Max Backlog
os_NetworkNetIpv4TcpMaxSynBacklog
1024 packets
52→15120 packets
Network IPV4 Max Sync Backlog
os_NetworkNetCoreNetdevBudget
300 packets
30→3000 packets
Network Budget
os_NetworkNetCoreRmemMax
212992 bytes
21299→2129920 bytes
Maximum network receive buffer size that applications can request
os_NetworkNetCoreWmemMax
212992 bytes
21299→2129920 bytes
Maximum network transmit buffer size that applications can request
os_NetworkNetIpv4TcpSlowStartAfterIdle
1
0→1
Network Slow Start After Idle Flag
os_NetworkNetIpv4TcpFinTimeout
60 seconds
6→600 seconds
Network TCP FIN timeout
os_NetworkRfs
0
0→131072
If enabled increases datacache hitrate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running
os_StorageReadAhead
128 KB
0→1024 KB
Read-ahead speeds up file access by pre-fetching data and loading it into the page cache so that it can be available earlier in memory instead of from disk
os_StorageNrRequests
1000 requests
100→10000 requests
Storage Number of Requests
os_StorageRqAffinity
1
1→2
Storage Requests Affinity
os_StorageQueueScheduler
none
none
kyber
Storage Queue Scheduler Type
os_StorageNomerges
0
0→2
Enables the user to disable the lookup logic involved with IO merging requests in the block layer. By default (0) all merges are enabled. With 1 only simple one-hit merges will be tried. With 2 no merge algorithms will be tried
os_StorageMaxSectorsKb
128 KB
32→128 KB
The largest IO size that the OS can issue to the block device
This page describes the Optimization Pack for the component type DotNet Core 3.1.
gc_count
collections/s
The total number of garbage collections
gc_duration
seconds
The garbage collection duration
heap_hard_limit
bytes
The size of the heap
csproj_System_GC_Server
categorical
boolean
false
true
, false
yes
The main flavor of the GC: set it to false for workstation GC or true for server GC. To be set in the csproj file; requires a rebuild.
csproj_System_GC_Concurrent
categorical
boolean
true
true
, false
yes
Configures whether background (concurrent) garbage collection is enabled (when set to true). To be set in the csproj file; requires a rebuild.
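A minimal sketch of how the two csproj-level parameters above are set in a project file, using the standard MSBuild property names for these GC options (values are the defaults listed above):

```
<!-- GC settings in the .csproj file; a rebuild is required for changes to take effect -->
<PropertyGroup>
  <ServerGarbageCollection>false</ServerGarbageCollection>        <!-- System.GC.Server -->
  <ConcurrentGarbageCollection>true</ConcurrentGarbageCollection> <!-- System.GC.Concurrent -->
</PropertyGroup>
```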
runtime_System_GC_Server
categorical
boolean
false
true
, false
yes
The main flavor of the GC: set it to false for workstation GC or true for server GC. To be set in runtimeconfig.json in runtimeOptions: configProperties.
runtime_System_GC_Concurrent
categorical
boolean
true
true
, false
yes
Configures whether background (concurrent) garbage collection is enabled (when set to true). To be set in runtimeconfig.json in runtimeOptions: configProperties.
runtime_System_GC_HeapCount
integer
heapcount
8
1
→ 1000
no
Limits the number of heaps created by the garbage collector. To be set in runtimeconfig.json in runtimeOptions: configProperties
runtime_System_GC_CpuGroup
categorical
boolean
0
1
, 0
no
Configures whether the garbage collector uses CPU groups or not. Default is false. To be set in runtimeconfig.json
runtime_System_GC_NoAffinitize
categorical
boolean
false
true
, false
no
Specifies whether to affinitize garbage collection threads with processors. To affinitize a GC thread means that it can only run on its specific CPU. To be set in runtimeconfig.json in runtimeOptions: configProperties
runtime_System_GC_HeapHardLimit
integer
bytes
20971520
16777216
→ 1099511627776
no
Specifies the maximum commit size, in bytes, for the GC heap and GC bookkeeping. To be set in runtimeconfig.json in runtimeOptions: configProperties
runtime_System_GC_HeapHardLimitPercent
real
percent
0.75
0.1
→ 100.0
no
Specifies the allowable GC heap usage as a percentage of the total physical memory. To be set in runtimeconfig.json in runtimeOptions: configProperties.
runtime_System_GC_HighMemoryPercent
integer
bytes
20971520
16777216
→ 1099511627776
no
Specifies the memory threshold that triggers the execution of a garbage collection. To be set in runtimeconfig.json.
runtime_System_GC_RetainVM
categorical
boolean
false
true
, false
no
Configures whether segments that should be deleted are put on a standby list for future use or are released back to the operating system (OS). Default is false. To be set in runtimeconfig.json in runtimeOptions: configProperties
runtime_System_GC_LOHThreshold
integer
bytes
85000
85000
→ 1099511627776
no
Specifies the threshold size, in bytes, that causes objects to go on the large object heap (LOH). To be set in runtimeconfig.json in runtimeOptions: configProperties
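A minimal sketch of how the runtime_* parameters above land in runtimeconfig.json under runtimeOptions: configProperties (values are illustrative):

```
{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Server": false,
      "System.GC.Concurrent": true,
      "System.GC.HeapCount": 8,
      "System.GC.HeapHardLimit": 209715200
    }
  }
}
```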
webconf_maxconnection
integer
connections
2
2
→ 1000
no
This setting controls the maximum number of outgoing HTTP connections that you can initiate from a client. To be set in web.config (target app only) or machine.config (global)
webconf_maxIoThreads
integer
threads
20
20
→ 1000
no
Controls the maximum number of I/O threads in the .NET thread pool. Automatically multiplied by the number of available CPUs. To be set in web.config (target app only) or machine.config (global). It requires autoConfig=false
webconf_minIoThreads
integer
threads
20
20
→ 1000
no
The minIoThreads setting enables you to configure a minimum number of worker threads and I/O threads for load conditions. To be set in web.config (target app only) or machine.config (global). It requires autoConfig=false
webconf_maxWorkerThreads
integer
threads
20
20
→ 1000
no
This setting controls the maximum number of worker threads in the thread pool. This number is then automatically multiplied by the number of available CPUs. To be set in web.config (target app only) or machine.config (global). It requires autoConfig=false
webconf_minWorkerThreads
integer
threads
20
20
→ 1000
no
The minWorkerThreads setting enables you to configure a minimum number of worker threads and I/O threads for load conditions. To be set in web.config (target app only) or machine.config (global). It requires autoConfig=false
webconf_minFreeThreads
integer
threads
8
8
→ 800
no
Used by the worker process to queue all the incoming requests if the number of available threads in the thread pool falls below its value. To be set in web.config (target app only) or machine.config (global). It requires autoConfig=false
webconf_minLocalRequestFreeThreads
integer
threads
4
4
→ 7600
no
Used to queue requests from localhost (where a Web application sends requests to a local Web service) if the number of available threads falls below it. To be set in web.config (target app only) or machine.config (global). It requires autoConfig=false
webconf_autoConfig
categorical
boolean
true
true
, false
no
Enables setting the system.web configuration parameters. To be set in web.config (target app only) or machine.config (global)
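A minimal sketch of where the webconf_* parameters above live in web.config or machine.config, using the standard ASP.NET elements (values are the defaults listed above; note that maxconnection belongs to system.net):

```
<configuration>
  <system.net>
    <connectionManagement>
      <add address="*" maxconnection="2" />
    </connectionManagement>
  </system.net>
  <system.web>
    <!-- autoConfig must be false for the explicit thread settings to apply -->
    <processModel autoConfig="false"
                  maxWorkerThreads="20" minWorkerThreads="20"
                  maxIoThreads="20" minIoThreads="20" />
    <httpRuntime minFreeThreads="8" minLocalRequestFreeThreads="4" />
  </system.web>
</configuration>
```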
This page describes the Optimization Pack for Java OpenJDK 11 JVM.
mem_used
bytes
The total amount of memory used
requests_throughput
requests/s
The number of requests performed per second
requests_response_time
milliseconds
The average request response time
jvm_heap_size
bytes
The size of the JVM heap memory
jvm_heap_used
bytes
The amount of heap memory used
jvm_heap_util
percent
The utilization % of heap memory
jvm_memory_used
bytes
The total amount of memory used across all the JVM memory pools
jvm_memory_used_details
bytes
The total amount of memory used broken down by pool (e.g., code-cache, compressed-class-space)
jvm_memory_buffer_pool_used
bytes
The total amount of bytes used by buffers within the JVM buffer memory pool
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
cpu_used
CPUs
The total amount of CPUs used
jvm_gc_time
percent
The % of wall clock time the JVM spent doing stop the world garbage collection activities
jvm_gc_time_details
percent
The % of wall clock time the JVM spent doing stop the world garbage collection activities broken down by type of garbage collection algorithm (e.g., ParNew)
jvm_gc_count
collections/s
The total number of stop the world JVM garbage collections that have occurred per second
jvm_gc_count_details
collections/s
The total number of stop the world JVM garbage collections that have occurred per second, broken down by type of garbage collection algorithm (e.g., G1, CMS)
jvm_gc_duration
seconds
The average duration of a stop the world JVM garbage collection
jvm_gc_duration_details
seconds
The average duration of a stop the world JVM garbage collection broken down by type of garbage collection algorithm (e.g., G1, CMS)
jvm_threads_current
threads
The total number of active threads within the JVM
jvm_threads_deadlocked
threads
The total number of deadlocked threads within the JVM
jvm_compilation_time
milliseconds
The total time spent by the JVM JIT compiler compiling bytecode
jvm_minHeapSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The minimum heap size.
jvm_maxHeapSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The maximum heap size.
jvm_maxRAM
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The maximum amount of memory used by the JVM.
jvm_initialRAMPercentage
real
percent
1.563
0.1
→ 100
yes
The initial percentage of memory used by the JVM.
jvm_maxRAMPercentage
real
percent
25.0
0.1
→ 100.0
yes
The percentage of memory used for maximum heap size. Requires Java 10, Java 8 Update 191 or later.
jvm_alwaysPreTouch
categorical
-AlwaysPreTouch
+AlwaysPreTouch
, -AlwaysPreTouch
yes
Pretouch pages during initialization.
jvm_metaspaceSize
integer
megabytes
20
You should select your own domain.
yes
The initial size of the allocated class metadata space.
jvm_maxMetaspaceSize
integer
megabytes
20
You should select your own domain.
yes
The maximum size of the allocated class metadata space.
jvm_useTransparentHugePages
categorical
-UseTransparentHugePages
+UseTransparentHugePages
, -UseTransparentHugePages
yes
Enables the use of large pages that can dynamically grow or shrink.
jvm_allocatePrefetchInstr
integer
0
0
→ 3
yes
Prefetch ahead of the allocation pointer.
jvm_allocatePrefetchDistance
integer
bytes
0
0
→ 512
yes
Distance to prefetch ahead of the allocation pointer. -1 uses the system-specific value (automatically determined).
jvm_allocatePrefetchLines
integer
lines
3
1
→ 64
yes
The number of lines to prefetch ahead of array allocation pointer.
jvm_allocatePrefetchStyle
integer
1
0
→ 3
yes
Selects the prefetch instruction to generate.
jvm_useLargePages
categorical
+UseLargePages
+UseLargePages
, -UseLargePages
yes
Enable the use of large page memory.
jvm_aggressiveHeap
categorical
-AggressiveHeap
-AggressiveHeap
, +AggressiveHeap
yes
Optimize heap options for long-running memory intensive apps.
jvm_newSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Sets the initial and maximum size of the heap for the young generation (nursery).
jvm_maxNewSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Specifies the upper bound for the young generation size.
jvm_survivorRatio
integer
8
1
→ 100
yes
The ratio between the Eden and each Survivor-space within the JVM. For example, a jvm_survivorRatio of 6 means that the Eden-space is 6 times the size of one Survivor-space.
jvm_useAdaptiveSizePolicy
categorical
+UseAdaptiveSizePolicy
+UseAdaptiveSizePolicy
, -UseAdaptiveSizePolicy
yes
Enables adaptive generation sizing. Disable it when tuning jvm_targetSurvivorRatio.
jvm_adaptiveSizePolicyWeight
integer
percent
10
1 → 100
yes
The weighting given to the current Garbage Collection time versus previous GC times when checking the timing goal.
jvm_targetSurvivorRatio
integer
50
1
→ 100
yes
The desired percentage of Survivor-space used after young garbage collection.
jvm_minHeapFreeRatio
integer
percent
40
1
→ 99
yes
The minimum percentage of heap free after garbage collection to avoid shrinking.
jvm_maxHeapFreeRatio
integer
percent
70
1
→ 100
yes
The maximum percentage of heap free after garbage collection to avoid shrinking.
jvm_maxTenuringThreshold
integer
15
0
→ 15
yes
The maximum value for the tenuring threshold.
jvm_gcType
categorical
Parallel
Serial
, Parallel
, ConcMarkSweep
, G1
yes
Type of the garbage collection algorithm.
jvm_useParallelOldGC
categorical
-UseParallelOldGC
+UseParallelOldGC
, -UseParallelOldGC
yes
Enables Parallel Mark and Compact Garbage Collection in Old/Tenured generations.
jvm_concurrentGCThreads
integer
threads
You should select your own default value.
You should select your own domain.
yes
The number of threads concurrent garbage collection will use.
jvm_parallelGCThreads
integer
threads
You should select your own default value.
You should select your own domain.
yes
The number of threads garbage collection will use for parallel phases.
jvm_maxGCPauseMillis
integer
milliseconds
200
1
→ 32767
yes
Adaptive size policy maximum GC pause time goal in milliseconds.
jvm_resizePLAB
categorical
+ResizePLAB
+ResizePLAB
, -ResizePLAB
yes
Enables the dynamic resizing of promotion LABs.
jvm_GCTimeRatio
integer
99
0
→ 100
yes
The target fraction of time that can be spent in garbage collection before increasing the heap, computed as 1 / (1 + GCTimeRatio).
jvm_initiatingHeapOccupancyPercent
integer
45
0
→ 100
yes
Sets the percentage of the heap occupancy at which to start a concurrent GC cycle.
jvm_youngGenerationSizeIncrement
integer
percent
20
0
→ 100
yes
The increment size for Young Generation adaptive resizing.
jvm_tenuredGenerationSizeIncrement
integer
percent
20
0
→ 100
yes
The increment size for Old/Tenured Generation adaptive resizing.
jvm_adaptiveSizeDecrementScaleFactor
integer
percent
4
1
→ 1024
yes
Specifies the scale factor for goal-driven generation resizing.
jvm_CMSTriggerRatio
integer
80
0
→ 100
yes
The percentage of MinHeapFreeRatio allocated before CMS GC starts.
jvm_CMSInitiatingOccupancyFraction
integer
-1
-1
→ 99
yes
Configure oldgen occupancy fraction threshold for CMS GC. Negative values default to CMSTriggerRatio.
jvm_CMSClassUnloadingEnabled
categorical
+CMSClassUnloadingEnabled
+CMSClassUnloadingEnabled
, -CMSClassUnloadingEnabled
yes
Enables class unloading when using CMS.
jvm_useCMSInitiatingOccupancyOnly
categorical
-UseCMSInitiatingOccupancyOnly
+UseCMSInitiatingOccupancyOnly
, -UseCMSInitiatingOccupancyOnly
yes
Uses the occupancy value as the only criterion for initiating the CMS collector.
jvm_G1HeapRegionSize
integer
megabytes
8
1
→32
yes
Sets the size of the regions for G1.
jvm_G1ReservePercent
integer
10
0
→ 50
yes
Sets the percentage of the heap that is reserved as a false ceiling to reduce the possibility of promotion failure for the G1 collector.
jvm_G1NewSizePercent
integer
5
0
→ 100
yes
Sets the percentage of the heap to use as the minimum for the young generation size.
jvm_G1MaxNewSizePercent
integer
60
0
→ 100
yes
Sets the percentage of the heap size to use as the maximum for young generation size.
jvm_G1MixedGCLiveThresholdPercent
integer
85
0
→ 100
yes
Sets the occupancy threshold for an old region to be included in a mixed garbage collection cycle.
jvm_G1HeapWastePercent
integer
5
0
→ 100
yes
The maximum percentage of the reclaimable heap before starting mixed GC.
jvm_G1MixedGCCountTarget
integer
collections
8
0
→ 100
yes
Sets the target number of mixed garbage collections after a marking cycle to collect old regions with at most G1MixedGCLiveThresholdPercent live data. The default is 8 mixed garbage collections.
jvm_G1OldCSetRegionThresholdPercent
integer
10
0
→ 100
yes
The upper limit on the number of old regions to be collected during mixed GC.
jvm_G1AdaptiveIHOPNumInitialSamples
integer
3
1
→2097152
yes
The number of completed time periods from initial mark to first mixed GC required to use the input values for prediction of the optimal occupancy to start marking.
jvm_G1UseAdaptiveIHOP
categorical
+G1UseAdaptiveIHOP
+G1UseAdaptiveIHOP
, -G1UseAdaptiveIHOP
yes
Adaptively adjust the initiating heap occupancy from the initial value of InitiatingHeapOccupancyPercent.
jvm_reservedCodeCacheSize
integer
megabytes
240
3
→ 2048
yes
The maximum size of the compiled code cache pool.
jvm_tieredCompilation
categorical
+TieredCompilation
+TieredCompilation
, -TieredCompilation
yes
Enables tiered compilation.
jvm_tieredCompilationStopAtLevel
integer
4
0
→ 4
yes
Sets the highest compilation tier used by tiered compilation.
jvm_compilationThreads
integer
threads
You should select your own default value.
You should select your own domain.
yes
The number of compilation threads.
jvm_backgroundCompilation
categorical
+BackgroundCompilation
+BackgroundCompilation
, -BackgroundCompilation
yes
Allow async interpreted execution of a method while it is being compiled.
jvm_inline
categorical
+Inline
+Inline
, -Inline
yes
Enable inlining.
jvm_maxInlineSize
integer
bytes
35
1
→ 2097152
yes
The bytecode size limit (in bytes) of the inlined methods.
jvm_inlineSmallCode
integer
bytes
2000
1
→ 16384
yes
The maximum compiled code size limit (in bytes) of the inlined methods.
jvm_aggressiveOpts
categorical
-AggressiveOpts
+AggressiveOpts
, -AggressiveOpts
yes
Turn on point performance compiler optimizations.
jvm_usePerfData
categorical
+UsePerfData
+UsePerfData
, -UsePerfData
yes
Enable monitoring of performance data.
jvm_useNUMA
categorical
-UseNUMA
+UseNUMA
, -UseNUMA
yes
Enable NUMA.
jvm_useBiasedLocking
categorical
+UseBiasedLocking
+UseBiasedLocking
, -UseBiasedLocking
yes
Manage the use of biased locking.
jvm_activeProcessorCount
integer
CPUs
1
1
→ 512
yes
Overrides the number of detected CPUs that the VM will use to calculate the size of thread pools.
The following parameters require their ranges or default values to be updated according to the described rules:
Parameter
Default value
Domain
jvm_minHeapSize
Depends on the instance available memory
jvm_maxHeapSize
Depends on the instance available memory
jvm_newSize
Depends on the configured heap
jvm_maxNewSize
Depends on the configured heap
jvm_concurrentGCThreads
Depends on the available CPU cores
Depends on the available CPU cores
jvm_parallelGCThreads
Depends on the available CPU cores
Depends on the available CPU cores
jvm_compilationThreads
Depends on the available CPU cores
Depends on the available CPU cores
The following list shows the constraints that may be required in the definition of the study, depending on the tuned parameters:
jvm.jvm_minHeapSize <= jvm.jvm_maxHeapSize
jvm.jvm_minHeapFreeRatio <= jvm.jvm_maxHeapFreeRatio
jvm.jvm_maxNewSize < jvm.jvm_maxHeapSize * 0.8
jvm.jvm_concurrentGCThreads <= jvm.jvm_parallelGCThreads
jvm_activeProcessorCount < container.cpu_limits + 1
This page describes the Optimization Pack for the component type RHEL 8.
cpu_num
CPUs
The number of CPUs available in the system (physical and logical)
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
cpu_util_details
percent
The average CPU utilization % broken down by usage type and cpu number (e.g., cp1 user, cp2 system, cp3 soft-irq)
cpu_load_avg
tasks
The system load average (i.e., the number of active tasks in the system)
mem_util
percent
The memory utilization % (i.e., the % of memory used)
mem_util_nocache
percent
The memory utilization % (i.e., the % of memory used) without considering memory reserved for caching purposes
mem_util_details
percent
The memory utilization % (i.e., the % of memory used) broken down by usage type (e.g., active memory)
mem_used
bytes
The total amount of memory used
mem_used_nocache
bytes
The total amount of memory used without considering memory reserved for caching purposes
mem_total
bytes
The total amount of installed memory
mem_fault_minor
faults/s
The number of minor memory faults (i.e., faults that do not cause disk access) per second
mem_fault_major
faults/s
The number of major memory faults (i.e., faults that cause disk access) per second
mem_fault
faults/s
The number of memory faults (major + minor)
mem_swapins
pages/s
The number of memory pages swapped in per second
mem_swapouts
pages/s
The number of memory pages swapped out per second
network_tcp_retrans
retrans/s
The number of network TCP retransmissions per second
network_in_bytes_details
bytes/s
The inbound network traffic in bytes per second broken down by network device (e.g., wlp4s0)
network_out_bytes_details
bytes/s
The outbound network traffic in bytes per second broken down by network device (e.g., eth01)
Notice: you can use a custom device filter to monitor a specific disk with Prometheus. You can find more information on Prometheus queries and the %FILTERS% placeholder on the Prometheus provider and Prometheus provider metrics mapping pages.
disk_swap_util
percent
The average space utilization % of swap disks
disk_swap_used
bytes
The total amount of space used by swap disks
disk_util_details
percent
The utilization % of a disk (i.e., how much time the disk is busy doing work) broken down by disk (e.g., disk D://)
disk_iops_writes
ops/s
The average number of IO disk-write operations per second across all disks
disk_iops_reads
ops/s
The average number of IO disk-read operations per second across all disks
disk_iops
ops/s
The average number of IO disk operations per second across all disks
disk_response_time_read
seconds
The average response time of IO read-disk operations
disk_response_time_worst
seconds
The average response time of IO disk operations of the slowest disk
disk_response_time_write
seconds
The average response time of IO write-disk operations
disk_response_time_details
seconds
The average response time of IO disk operations broken down by disk (e.g., disk /dev/nvme01)
disk_iops_details
ops/s
The number of IO disk operations per second broken down by disk (e.g., disk /dev/nvme01)
disk_io_inflight_details
ops
The number of IO disk operations in progress (outstanding) broken down by disk (e.g., disk /dev/nvme01)
disk_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_read_bytes
bytes/s
The number of bytes per second read across all disks
disk_read_write_bytes
bytes/s
The number of bytes per second read and written across all disks
disk_write_bytes_details
bytes/s
The number of bytes per second written from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation WRITE)
disk_read_bytes_details
bytes/s
The number of bytes per second read from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation READ)
filesystem_util
percent
The space utilization % of filesystems broken down by type and device (e.g., filesystem of type overlayfs on device /dev/loop1)
filesystem_used
bytes
The amount of space used on the filesystems broken down by type and device (e.g., filesystem of type zfs on device /dev/nvme01)
filesystem_size
bytes
The size of filesystems broken down by type and device (e.g., filesystem of type ext4 for device /dev/nvme01)
proc_blocked
processes
The number of processes blocked (e.g., for IO or swapping reasons)
os_context_switch
switches/s
The number of context switches per second
os_cpuSchedMinGranularity
2250000 ns
300000→30000000 ns
Minimal preemption granularity (in nanoseconds) for CPU bound tasks
os_cpuSchedWakeupGranularity
3000000 ns
400000→40000000 ns
Scheduler Wakeup Granularity (in nanoseconds)
os_CPUSchedMigrationCost
500000 ns
100000→5000000 ns
Amount of time (in nanoseconds) after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated to another CPU, so increasing this variable reduces task migrations
os_CPUSchedChildRunsFirst
0
0→1
A freshly forked child runs before the parent continues execution
os_CPUSchedLatency
18000000 ns
2400000→240000000 ns
Targeted preemption latency (in nanoseconds) for CPU bound tasks
os_CPUSchedAutogroupEnabled
1
0→1
Enables the Linux task auto-grouping feature, where the kernel assigns related tasks to groups and schedules them together on CPUs to achieve higher performance for some workloads
os_CPUSchedNrMigrate
32
3→320
Scheduler NR Migrate
os_MemorySwappiness
30
0→100
Memory Swappiness
os_MemoryVmVfsCachePressure
100 %
10→100 %
VFS Cache Pressure
os_MemoryVmMinFree
67584 KB
10240→1024000 KB
Minimum Free Memory
os_MemoryVmDirtyRatio
30 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyBackgroundRatio
10 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, the kernel begins to write them asynchronously in the background
os_MemoryTransparentHugepageEnabled
never
always
never
madvise
Transparent Hugepage Enablement
os_MemoryTransparentHugepageDefrag
always
always
never
madvise
defer
defer+madvise
Transparent Hugepage Enablement Defrag
os_MemorySwap
swapon
swapon
swapoff
Memory Swap
os_MemoryVmDirtyExpire
3000 centisecs
300→30000 centisecs
Memory Dirty Expiration Time
os_MemoryVmDirtyWriteback
500 centisecs
50→5000 centisecs
Memory Dirty Writeback
os_NetworkNetCoreSomaxconn
128 connections
12→1200 connections
Network Max Connections
os_NetworkNetCoreNetdevMaxBacklog
1000 packets
100→10000 packets
Network Max Backlog
os_NetworkNetIpv4TcpMaxSynBacklog
512 packets
52→15120 packets
Network IPV4 Max Sync Backlog
os_NetworkNetCoreNetdevBudget
300 packets
30→3000 packets
Network Budget
os_NetworkNetCoreRmemMax
212992 bytes
21299→2129920 bytes
Maximum network receive buffer size that applications can request
os_NetworkNetCoreWmemMax
212992 bytes
21299→2129920 bytes
Maximum network transmit buffer size that applications can request
os_NetworkNetIpv4TcpSlowStartAfterIdle
1
0→1
Network Slow Start After Idle Flag
os_NetworkNetIpv4TcpFinTimeout
60 seconds
6→600 seconds
Network TCP FIN timeout
os_NetworkRfs
0
0→131072
If enabled increases datacache hitrate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running
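Receive Flow Steering is enabled through the net.core.rps_sock_flow_entries sysctl together with a per-receive-queue flow count; a sketch (eth0, the queue name, and the values are placeholders):

```
sudo sysctl -w net.core.rps_sock_flow_entries=131072
echo 2048 | sudo tee /sys/class/net/eth0/queues/rx-0/rps_flow_cnt
```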
os_StorageReadAhead
128 KB
0→1024 KB
Read-ahead speeds up file access by pre-fetching data and loading it into the page cache so that it can be available earlier in memory instead of from disk
os_StorageNrRequests
1000 requests
100→10000 requests
Storage Number of Requests
os_StorageRqAffinity
1
1→2
Storage Requests Affinity
os_StorageQueueScheduler
none
none
kyber
mq-deadline
bfq
Storage Queue Scheduler Type
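The scheduler is selected per block device through sysfs; a sketch (sda is a placeholder device name):

```
# The active scheduler is shown in brackets, e.g. [none] kyber mq-deadline bfq
cat /sys/block/sda/queue/scheduler
echo kyber | sudo tee /sys/block/sda/queue/scheduler
```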
os_StorageNomerges
0
0→2
Enables the user to disable the lookup logic involved with IO merging requests in the block layer. By default (0) all merges are enabled. With 1 only simple one-hit merges will be tried. With 2 no merge algorithms will be tried
os_StorageMaxSectorsKb
256 KB
32→256 KB
The largest IO size that the OS can issue to the block device
An optimization study (or study for short) represents an optimization initiative aimed at optimizing a goal on a target system. A study instructs Akamas about the space to explore and the KPIs used to evaluate whether a configuration is good or bad.
Akamas supports two types of optimizations:
Offline Optimization Studies are optimization studies where the workload is simulated by leveraging a load-testing tool.
Live Optimization Studies are applied to systems that need to be optimized in production with respect to varying workloads observed while running live. For example, a microservices application can be optimized live by having Kubernetes and JVM parameters dynamically tuned for multiple microservices so as to minimize costs while matching response time objectives.
A study is described by the following properties:
system: the system under optimization
parameters: the set of parameters being optimized
metrics: the set of metrics to be collected
workflow: the workflow describing tasks to perform experiments/trials
goal: the desired optimization goal to be achieved
constraints: the optimization constraints that any configuration needs to satisfy
steps: the steps that are executed to run specific configurations (e.g. the baseline) and run the optimization
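For orientation, the sketch below shows how these properties might fit together in a single YAML definition; the field names are assumptions for illustration only, and the Study template page remains the authoritative reference:

```
# Hypothetical outline of a study definition; field names are assumptions.
name: sample-study
system: sample-system
workflow: sample-workflow
goal:
  objective: minimize
  function:
    formula: web.cpu_util
parametersSelection:
  - name: jvm.jvm_maxHeapSize
steps:
  - name: baseline
    type: baseline
  - name: optimize
    type: optimize
```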
The construct to be used to define an optimization is described on the Study template page.
An optimization study is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI shows optimization studies in 2 specific top-level menus: one for offline optimization studies and another for live optimization studies.
The Linux optimization pack helps you optimize Linux-based systems. The optimization pack provides component types for various Linux distributions, thus enabling performance improvements across a wide range of configurations.
Through this optimization pack, Akamas is able to tackle the performance of Linux-based systems both from the point of view of cost savings and from that of quality and level of service: the included component types bring in parameters that act on the memory footprint of systems, on their ability to sustain higher levels of traffic, on their capacity to leverage all the available resources, and on their potential for lower-latency transactions.
Each component type provides parameters that cover four main areas of tuning:
CPU task scheduling (for example, whether to auto-group similar tasks and schedule them together)
Memory (for example, the memory usage threshold at which pages start being swapped to disk)
Network (for example, the size of the buffers used to write/read network packets)
Storage (for example, the type of storage scheduler)
The optimization pack provides component types for the following Linux distributions:
Amazon Linux AMI
Amazon Linux 2 AMI
Amazon Linux 2022 AMI
CentOS Linux distribution version 7.x
CentOS Linux distribution version 8.x
Red Hat Enterprise Linux distribution version 7.x
Red Hat Enterprise Linux distribution version 8.x
Ubuntu Linux distribution by Canonical version 16.04 (LTS)
Ubuntu Linux distribution by Canonical version 18.04 (LTS)
Ubuntu Linux distribution by Canonical version 20.04 (LTS)
Here’s the command to install the Linux optimization pack using the Akamas CLI:
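```
# Assuming the optimization pack file is named linux.json
akamas install optimization-pack linux.json
```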
For more information on the process of installing or upgrading an optimization pack refer to Install Optimization Packs.
This page describes the mapping between the metrics provided by the Spark History Server and Akamas metrics for each supported component type.
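The endpoint paths in the table below are relative to the History Server REST API root (api/v1/applications); as a sketch, with host, port (18080 is the default), and IDs as placeholders, a job record can be fetched with:

```
curl http://{history-server}:18080/api/v1/applications/{appId}/1/jobs/{jobId}
```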
spark_duration
job
/{appId}/1/jobs/{jobId}
.duration
spark_completed_tasks
job
/{appId}/1/jobs/{jobId}
.numCompletedTasks
spark_active_tasks
job
/{appId}/1/jobs/{jobId}
.numActiveTasks
spark_skipped_tasks
job
/{appId}/1/jobs/{jobId}
.numSkippedTasks
spark_failed_tasks
job
/{appId}/1/jobs/{jobId}
.numFailedTasks
spark_killed_tasks
job
/{appId}/1/jobs/{jobId}
.numKilledTasks
spark_completed_stages
job
/{appId}/1/jobs/{jobId}
.numCompletedStages
spark_failed_stages
job
/{appId}/1/jobs/{jobId}
.numFailedStages
spark_skipped_stages
job
/{appId}/1/jobs/{jobId}
.numSkippedStages
spark_active_stages
job
/{appId}/1/jobs/{jobId}
.numActiveStages
spark_duration
stage
/{appId}/1/stages/{stageId}
.getDuration
spark_task_stage_executor_run_time
stage
/{appId}/1/stages/{stageId}
.getExecutorRunTime
spark_task_stage_executor_cpu_time
stage
/{appId}/1/stages/{stageId}
.getExecutorCpuTime
spark_active_tasks
stage
/{appId}/1/stages/{stageId}
.getNumActiveTasks
spark_completed_tasks
stage
/{appId}/1/stages/{stageId}
.getNumCompleteTasks
spark_failed_tasks
stage
/{appId}/1/stages/{stageId}
.getNumFailedTasks
spark_killed_tasks
stage
/{appId}/1/stages/{stageId}
.getNumKilledTasks
spark_task_stage_input_bytes_read
stage
/{appId}/1/stages/{stageId}
.getInputBytes
spark_task_stage_input_records_read
stage
/{appId}/1/stages/{stageId}
.getInputRecords
spark_task_stage_output_bytes_written
stage
/{appId}/1/stages/{stageId}
.getOutputBytes
spark_task_stage_output_records_written
stage
/{appId}/1/stages/{stageId}
.getOutputRecords
spark_stage_shuffle_read_bytes
stage
/{appId}/1/stages/{stageId}
.getShuffleReadBytes
spark_task_stage_shuffle_read_records
stage
/{appId}/1/stages/{stageId}
.getShuffleReadRecords
spark_task_stage_shuffle_write_bytes
stage
/{appId}/1/stages/{stageId}
.getShuffleWriteBytes
spark_task_stage_shuffle_write_records
stage
/{appId}/1/stages/{stageId}
.getShuffleWriteRecords
spark_task_stage_memory_bytes_spilled
stage
/{appId}/1/stages/{stageId}
.getMemoryBytesSpilled
spark_task_stage_disk_bytes_spilled
stage
/{appId}/1/stages/{stageId}
.getDiskBytesSpilled
spark_duration
task
/{appId}/1/stages/{stageId}
.tasks[].duration
spark_task_executor_deserialize_time
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.executorDeserializeTime
spark_task_executor_deserialize_cpu_time
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.executorDeserializeCpuTime
spark_task_stage_executor_run_time
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.executorRunTime
spark_task_stage_executor_cpu_time
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.executorCpuTime
spark_task_result_size
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.resultSize
spark_task_jvm_gc_duration
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.jvmGcTime
spark_task_result_serialization_time
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.resultSerializationTime
spark_task_stage_memory_bytes_spilled
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.memoryBytesSpilled
spark_task_stage_disk_bytes_spilled
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.diskBytesSpilled
spark_task_peak_execution_memory
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.peakExecutionMemory
spark_task_stage_input_bytes_read
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.inputMetrics.bytesRead
spark_task_stage_input_records_read
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.inputMetrics.recordsRead
spark_task_stage_output_bytes_written
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.outputMetrics.bytesWritten
spark_task_stage_output_records_written
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.outputMetrics.recordsWritten
spark_task_shuffle_read_remote_blocks_fetched
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleReadMetrics.remoteBlocksFetched
spark_task_shuffle_read_local_blocks_fetched
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleReadMetrics.localBlocksFetched
spark_task_shuffle_read_fetch_wait_time
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleReadMetrics.fetchWaitTime
spark_task_shuffle_read_remote_bytes
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleReadMetrics.remoteBytesRead
spark_task_shuffle_read_remote_bytes_to_disk
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleReadMetrics.remoteBytesReadToDisk
spark_task_shuffle_read_local_bytes
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleReadMetrics.localBytesRead
spark_task_stage_shuffle_read_records
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleReadMetrics.recordsRead
spark_task_stage_shuffle_write_bytes
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleWriteMetrics.bytesWritten
spark_task_shuffle_write_time
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleWriteMetrics.writeTime
spark_task_stage_shuffle_write_records
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleWriteMetrics.recordsWritten
spark_executor_rdd_blocks
executor
/{appId}/1/allexecutors
select(.id!='driver) | .rddBlocks
spark_executor_mem_used
executor
/{appId}/1/allexecutors
select(.id!='driver) | .memoryUsed
spark_executor_disk_used
executor
/{appId}/1/allexecutors
select(.id!='driver) | .diskUsed
spark_executor_cores
executor
/{appId}/1/allexecutors
select(.id!='driver) | .totalCores
spark_active_tasks
executor
/{appId}/1/allexecutors
select(.id!='driver) | .activeTasks
spark_failed_tasks
executor
/{appId}/1/allexecutors
select(.id!='driver) | .failedTasks
spark_completed_tasks
executor
/{appId}/1/allexecutors
select(.id!='driver) | .completedTasks
spark_executor_total_tasks
executor
/{appId}/1/allexecutors
select(.id!='driver) | .totalTasks
spark_executor_total_duration
executor
/{appId}/1/allexecutors
select(.id!='driver) | .totalDuration
spark_executor_total_jvm_gc_duration
executor
/{appId}/1/allexecutors
select(.id!='driver) | .totalGCTime
spark_executor_total_input_bytes
executor
/{appId}/1/allexecutors
select(.id!='driver) | .totalInputBytes
spark_executor_total_shuffle_read
executor
/{appId}/1/allexecutors
select(.id!='driver) | .totalShuffleRead
spark_executor_total_shuffle_write
executor
/{appId}/1/allexecutors
select(.id!='driver) | .totalShuffleWrite
spark_executor_max_mem_used
executor
/{appId}/1/allexecutors
select(.id!='driver) | .maxMemory
spark_executor_used_on_heap_storage_memory
executor
/{appId}/1/allexecutors
select(.id!='driver) | .memoryMetrics.usedOnHeapStorageMemory
spark_executor_used_off_heap_storage_memory
executor
/{appId}/1/allexecutors
select(.id!='driver) | .memoryMetrics.usedOffHeapStorageMemory
spark_executor_total_on_heap_storage_memory
executor
/{appId}/1/allexecutors
select(.id!='driver) | .memoryMetrics.totalOnHeapStorageMemory
spark_executor_total_off_heap_storage_memory
executor
/{appId}/1/allexecutors
select(.id!='driver) | .memoryMetrics.totalOffHeapStorageMemory
spark_driver_rdd_blocks
driver
/{appId}/1/allexecutors
select(.id=='driver') | .rddBlocks
spark_driver_mem_used
driver
/{appId}/1/allexecutors
select(.id=='driver') | .memoryUsed
spark_driver_disk_used
driver
/{appId}/1/allexecutors
select(.id=='driver') | .diskUsed
spark_driver_cores
driver
/{appId}/1/allexecutors
select(.id=='driver') | .totalCores
spark_driver_total_duration
driver
/{appId}/1/allexecutors
select(.id=='driver') | .totalDuration
spark_driver_total_jvm_gc_duration
driver
/{appId}/1/allexecutors
select(.id=='driver') | .totalGCTime
spark_driver_total_input_bytes
driver
/{appId}/1/allexecutors
select(.id=='driver') | .totalInputBytes
spark_driver_total_shuffle_read
driver
/{appId}/1/allexecutors
select(.id=='driver') | .totalShuffleRead
spark_driver_total_shuffle_write
driver
/{appId}/1/allexecutors
select(.id=='driver') | .totalShuffleWrite
spark_driver_max_mem_used
driver
/{appId}/1/allexecutors
select(.id=='driver') | .maxMemory
spark_driver_used_on_heap_storage_memory
driver
/{appId}/1/allexecutors
select(.id=='driver') | .memoryMetrics.usedOnHeapStorageMemory
spark_driver_used_off_heap_storage_memory
driver
/{appId}/1/allexecutors
select(.id=='driver') | .memoryMetrics.usedOffHeapStorageMemory
spark_driver_total_on_heap_storage_memory
driver
/{appId}/1/allexecutors
select(.id=='driver') | .memoryMetrics.totalOnHeapStorageMemory
spark_driver_total_off_heap_storage_memory
driver
/{appId}/1/allexecutors
select(.id=='driver') | .memoryMetrics.totalOffHeapStorageMemory
This page describes the mapping between the metrics provided by LoadRunner and the Akamas metrics for each supported component type.

| Metric | Description |
| --- | --- |
| users | The average number of users active in a specific timeframe |
| transactions_throughput | The average throughput of LoadRunner transactions (requests), per second |
| transactions_response_time_min | The minimum response time of LoadRunner transactions (requests) |
| transactions_response_time_max | The maximum response time of LoadRunner transactions (requests) |
| transactions_response_time | The response time of LoadRunner transactions (requests) |
| transactions_response_time_p50 | The 50th percentile (weighted median) of the response time of LoadRunner transactions (requests) |
| transactions_response_time_p85 | The 85th percentile of the response time of LoadRunner transactions (requests) |
| transactions_response_time_p95 | The 95th percentile of the response time of LoadRunner transactions (requests) |
| transactions_response_time_p99 | The 99th percentile of the response time of LoadRunner transactions (requests) |
| pages_throughput | The average throughput of LoadRunner pages (transactions breakdown, first level), per second |
| pages_response_time_min | The minimum response time of LoadRunner pages (transactions breakdown, first level) |
| pages_response_time_max | The maximum response time of LoadRunner pages (transactions breakdown, first level) |
| pages_response_time | The response time of LoadRunner pages (transactions breakdown, first level) |
| pages_response_time_p50 | The 50th percentile (weighted median) of the response time of LoadRunner pages (transactions breakdown, first level) |
| pages_response_time_p85 | The 85th percentile of the response time of LoadRunner pages (transactions breakdown, first level) |
| pages_response_time_p95 | The 95th percentile of the response time of LoadRunner pages (transactions breakdown, first level) |
| pages_response_time_p99 | The 99th percentile of the response time of LoadRunner pages (transactions breakdown, first level) |
| requests_throughput | The average throughput of LoadRunner requests (transactions breakdown, second level), per second |
| requests_response_time_min | The minimum response time of LoadRunner requests (transactions breakdown, second level) |
| requests_response_time_max | The maximum response time of LoadRunner requests (transactions breakdown, second level) |
| requests_response_time | The response time of LoadRunner requests (transactions breakdown, second level) |
| requests_response_time_p50 | The 50th percentile (weighted median) of the response time of LoadRunner requests (transactions breakdown, second level) |
| requests_response_time_p85 | The 85th percentile of the response time of LoadRunner requests (transactions breakdown, second level) |
| requests_response_time_p95 | The 95th percentile of the response time of LoadRunner requests (transactions breakdown, second level) |
| requests_response_time_p99 | The 99th percentile of the response time of LoadRunner requests (transactions breakdown, second level) |
| requests_error_throughput | The number of LoadRunner requests (transactions breakdown, second level) flagged as errors, per second |
A metric is a measured property of a system.
Examples of a metric include:
the response time of an application
the utilization of a CPU
the amount of time spent in garbage collection
the cost of a cloud service
Metrics are used both to specify the optimization goal and constraints (e.g. minimize the heap size while keeping response time < 1000 and error rate <= 10% of a baseline value), and to assess the behavior of the system with respect to each specific configuration applied.
A metric is described by the following properties:
a name that uniquely identifies the metric
a description that clarifies the semantics of the metric
a unit that defines the unit of measurement used by the metric
The construct to be used to define a metric is described on the Metric template page.
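As a minimal sketch, a metric definition built from the properties listed above might look like the following (the names and values are illustrative; refer to the Metric template page for the exact schema):

```yaml
name: response_time
description: The response time of the application
unit: milliseconds
```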
Metrics are displayed in the Akamas UI when drilling down into each system component, and are represented in metric charts for each specific optimization study.
Notice that for a metric to be displayed in the Akamas UI, it has to be collected from a telemetry provider by means of a specific telemetry instance defined for each target system.
The Java-OpenJDK optimization pack supports optimizing Java applications based on the OpenJDK and Oracle HotSpot JVM. Through this optimization pack, Akamas is able to tackle the performance of JVM-based applications from the point of view of both cost savings and quality of service.
To achieve these goals the optimization pack provides parameters that focus on the following areas:
Garbage collection
Heap
JIT
Similarly, the bundled metrics provide visibility on the following aspects of tuned applications:
Heap and memory utilization
Garbage-collection
Execution threads
The optimization pack supports the most used versions of OpenJDK and Oracle HotSpot JVM.
Java OpenJDK 8 JVM
Java OpenJDK 11 JVM
Here’s the command to install the Java OpenJDK optimization pack using the Akamas CLI:
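A minimal sketch (the pack identifier Java-OpenJDK is an assumption; check the Install Optimization Packs page for the exact name):

```
akamas install optimization-pack Java-OpenJDK
```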
For more information on the process of installing or upgrading an optimization pack refer to Install Optimization Packs.
This section documents Akamas out-of-the-box optimization packs.
based on Linux operating system
based on MS .Net technology
based on OpenJDK and Oracle HotSpot JVM
based on Eclipse OpenJ9 VM (formerly known as IBM J9)
based on NodeJS
based on the Go runtime (aka Golang)
exposed as web applications
based on Docker containers
based on Kubernetes containers
based on WebSphere middleware
based on Apache Spark middleware
based on PostgreSQL database
based on Cassandra database
based on MySQL database
based on Oracle database
based on MongoDB database
based on Elasticsearch database
based on AWS EC2 or Lambda resources
A telemetry instance is an instance of a telemetry provider that collects data from a specific data source: it provides the required information on how to connect to that data source and which set of metrics to collect from it.
While telemetry providers are platform-wide entities, telemetry instances are defined at each system level.
The construct to be used to define a telemetry instance is described on the Telemetry Instance template page.
A telemetry instance is an Akamas resource that can be managed via CLI using the resource management commands.
Telemetry instances are displayed in the Akamas UI when drilling down into each system component.
The Web Application optimization pack provides a component type suited to monitoring the performance of a generic web application from the end-user perspective, in order to evaluate the configuration of the technologies in the underlying stack.
The bundled component type provides Akamas with performance metrics representing concepts like throughput, response time, error rate, and user load, split into different levels of detail such as transaction, page, and single request.
Here’s the command to install the Web Application optimization pack using the Akamas CLI:
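A minimal sketch (the pack identifier Web-Application is an assumption; check the Install Optimization Packs page for the exact name):

```
akamas install optimization-pack Web-Application
```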
This page describes the Optimization Pack for Eclipse OpenJ9 (formerly known as IBM J9) Virtual Machine version 8.
The following parameters require their ranges or default values to be updated according to the described rules:
Notice that the value nocompressedreferences for j9vm_compressedReferences can only be specified for JVMs compiled with the proper --with-noncompressedrefs flag. If this is not the case, compressed references cannot be actively disabled, meaning:
for Xmx <= 57 GB it is useless to tune this parameter, since compressed references are active by default and cannot be explicitly disabled
for Xmx > 57 GB, since compressed references are disabled by default (blank value), Akamas can try to enable them; this requires removing the value nocompressedreferences from the domain
The following tables show a list of constraints that may be required in the definition of the study, depending on the tuned parameters:
Notice that:
j9vm_newSpaceFixed is mutually exclusive with j9vm_minNewSpace and j9vm_maxNewSpace
j9vm_oldSpaceFixed is mutually exclusive with j9vm_minOldSpace and j9vm_maxOldSpace
the sum of j9vm_minNewSpace and j9vm_minOldSpace must be equal to j9vm_minHeapSize, so it is useless to tune all of them together; the relationship among the corresponding maximum values is more complex
This page describes the Optimization Pack for Eclipse OpenJ9 (formerly known as IBM J9) version 11.
The following parameters require their ranges or default values to be updated according to the described rules:
Notice that the value nocompressedreferences for j9vm_compressedReferences can only be specified for JVMs compiled with the proper --with-noncompressedrefs flag. If this is not the case, compressed references cannot be actively disabled, meaning:
for Xmx <= 57 GB it is useless to tune this parameter, since compressed references are active by default and cannot be explicitly disabled
for Xmx > 57 GB, since compressed references are disabled by default (blank value), Akamas can try to enable them; this requires removing the value nocompressedreferences from the domain
The following tables show a list of constraints that may be required in the definition of the study, depending on the tuned parameters:
Notice that:
j9vm_newSpaceFixed is mutually exclusive with j9vm_minNewSpace and j9vm_maxNewSpace
j9vm_oldSpaceFixed is mutually exclusive with j9vm_minOldSpace and j9vm_maxOldSpace
the sum of j9vm_minNewSpace and j9vm_minOldSpace must be equal to j9vm_minHeapSize, so it is useless to tune all of them together; the relationship among the corresponding maximum values is more complex
A baseline step performs an experiment (a baseline experiment) and marks it as the initial experiment of a study. The purpose of the step is to build a reference configuration that Akamas can use to measure the effectiveness of an optimization conducted on a system.
When a bootstrap step imports an experiment from another study, the step copies not only the experiment but also its trials and the system metrics generated during its execution.
The bootstrap step has the following structure:
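A minimal sketch of this structure (field names follow the table later in this section; values are placeholders):

```yaml
name: import-baseline
type: bootstrap
from:
  - study: a-previous-study
    experiments: [1]
```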
where the from
field should have the following structure:
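As a sketch, each object of the from array looks like this (values are placeholders):

```yaml
study: a-previous-study   # name or id of the study to import from
experiments: [1, 2]       # numbers of the experiments to import
```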
with:
study
contains the name or id of the study from which to import experiments
experiments
contains the numbers of the experiments to import
The following is an example of a bootstrap step that imports four experiments from two studies:
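A sketch of such a step (study names and experiment numbers are illustrative):

```yaml
name: import-experiments
type: bootstrap
from:
  - study: study-a
    experiments: [1, 2, 3]
  - study: study-b
    experiments: [7]
```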
You can also import all the experiments of a study by omitting the experiments
field:
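For example (the study name is illustrative):

```yaml
name: import-all-experiments
type: bootstrap
from:
  - study: study-a
```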
Metrics

| Metric | Unit | Description |
| --- | --- | --- |
| cpu_used | CPUs | The total amount of CPUs used |
| cpu_util | percent | The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work) |
| go_heap_size | bytes | The largest size reached by the Go heap memory |
| go_heap_used | bytes | The amount of heap memory used |
| go_heap_util | percent | The utilization % of heap memory |
| go_memory_used | bytes | The total amount of memory used by Go |
| go_gc_time | percent | The % of wall clock time Go spent doing stop-the-world garbage collection activities |
| go_gc_duration | seconds | The average duration of a stop-the-world Go garbage collection |
| go_gc_count | collections/s | The total number of stop-the-world Go garbage collections that have occurred per second |
| go_threads_current | threads | The total number of active Go threads |
| go_goroutines_current | goroutines | The total number of active goroutines |

Parameters

| Parameter | Type | Default | Domain | Restart | Description |
| --- | --- | --- | --- | --- | --- |
| go_gcTargetPercentage | integer | 100 | 0 → 25000 | yes | Sets the GOGC variable which controls the aggressiveness of the garbage collector |
| go_maxProcs | integer | 8 | 0 → 100 | yes | Limits the number of operating system threads that can execute user-level code simultaneously |
Metrics

| Metric | Unit | Description |
| --- | --- | --- |
| cpu_used | CPUs | The total amount of CPUs used |
| cpu_util | percent | The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work) |
| memory_used | bytes | The total amount of memory used |
| memory_util | percent | The average memory utilization % |
| nodejs_gc_heap_used | bytes | GC heap used |
| nodejs_rss | bytes | Process Resident Set Size (RSS) |
| nodejs_v8_heap_total | bytes | V8 heap total |
| nodejs_v8_heap_used | bytes | V8 heap used |
| nodejs_number_active_threads | threads | Number of active threads |

Parameters

| Parameter | Type | Unit | Default | Domain | Restart | Description |
| --- | --- | --- | --- | --- | --- | --- |
| v8_allocation_size_pretenuring | categorical | - | true | true, false | yes | Pretenure with allocation sites |
| v8_min_semi_space_size | integer | megabytes | 0 | 0 → 1048576 | yes | Min size of a semi-space; the new space consists of two semi-spaces |
| v8_max_semi_space_size | integer | megabytes | 0 | 0 → 1048576 | yes | Max size of a semi-space; the new space consists of two semi-spaces |
| v8_semi_space_grouth_factor | integer | - | 2 | 0 → 100 | yes | Factor by which to grow the new space |
| v8_max_old_space_size | integer | megabytes | 0 | 0 → 1048576 | yes | Max size of the old space |
| v8_max_heap_size | integer | megabytes | 0 | 0 → 1048576 | yes | Max size of the heap; both max_semi_space_size and max_old_space_size take precedence. All three flags cannot be specified at the same time |
| v8_initial_heap_size | integer | megabytes | 0 | 0 → 1048576 | yes | Initial size of the heap |
| v8_initial_old_space_size | integer | megabytes | 0 | 0 → 1048576 | yes | Initial old space size |
| v8_parallel_scavenge | categorical | - | true | true, false | yes | Parallel scavenge |
| v8_scavenge_task_trigger | integer | - | 80 | 1 → 100 | yes | Scavenge task trigger in percent of the current heap limit |
| v8_scavenge_separate_stack_scanning | categorical | - | false | true, false | yes | Use a separate phase for stack scanning in scavenge |
| v8_concurrent_marking | categorical | - | true | true, false | yes | Use concurrent marking |
| v8_parallel_marking | categorical | - | true | true, false | yes | Use parallel marking in atomic pause |
| v8_concurrent_sweeping | categorical | - | true | true, false | yes | Use concurrent sweeping |
| v8_heap_growing_percent | integer | - | 0 | 0 → 99 | yes | Specifies heap growing factor as (1 + heap_growing_percent/100) |
| v8_os_page_size | integer | kilobytes | 0 | 0 → 1048576 | yes | Override OS page size |
| v8_stack_size | integer | kilobytes | 984 | 16 → 1048576 | yes | Default size of stack region v8 is allowed to use |
| v8_single_threaded | categorical | - | false | true, false | yes | Disable the use of background tasks |
| v8_single_threaded_gc | categorical | - | false | true, false | yes | Disable the use of background GC tasks |
Metrics

| Metric | Unit | Description |
| --- | --- | --- |
| jvm_heap_size | bytes | The size of the JVM heap memory |
| jvm_heap_used | bytes | The amount of heap memory used |
| jvm_heap_util | percent | The utilization % of heap memory |
| jvm_memory_used | bytes | The total amount of memory used across all the JVM memory pools |
| jvm_memory_used_details | bytes | The total amount of memory used broken down by pool (e.g., code-cache, compressed-class-space) |
| jvm_memory_buffer_pool_used | bytes | The total amount of bytes used by buffers within the JVM buffer memory pool |
| jvm_gc_time | percent | The % of wall clock time the JVM spent doing stop-the-world garbage collection activities |
| jvm_gc_time_details | percent | The % of wall clock time the JVM spent doing stop-the-world garbage collection activities, broken down by type of garbage collection algorithm (e.g., ParNew) |
| jvm_gc_count | collections/s | The total number of stop-the-world JVM garbage collections that have occurred per second |
| jvm_gc_count_details | collections/s | The total number of stop-the-world JVM garbage collections that have occurred per second, broken down by type of garbage collection algorithm (e.g., G1, CMS) |
| jvm_gc_duration | seconds | The average duration of a stop-the-world JVM garbage collection |
| jvm_gc_duration_details | seconds | The average duration of a stop-the-world JVM garbage collection, broken down by type of garbage collection algorithm (e.g., G1, CMS) |
| jvm_threads_current | threads | The total number of active threads within the JVM |
| jvm_threads_deadlocked | threads | The total number of deadlocked threads within the JVM |
| jvm_compilation_time | milliseconds | The total time spent by the JVM JIT compiler compiling bytecode |

Parameters

| Parameter | Type | Unit | Default | Domain | Restart | Description |
| --- | --- | --- | --- | --- | --- | --- |
| j9vm_minHeapSize | integer | megabytes | You should select your own default value | You should select your own domain | yes | Minimum heap size (in megabytes) |
| j9vm_maxHeapSize | integer | megabytes | You should select your own default value | You should select your own domain | yes | Maximum heap size (in megabytes) |
| j9vm_minFreeHeap | real | percent | 0.3 | 0.1 → 0.5 | yes | Specify the minimum % free heap required after global GC |
| j9vm_maxFreeHeap | real | percent | 0.6 | 0.4 → 0.9 | yes | Specify the maximum % free heap required after global GC |
| j9vm_gcPolicy | categorical | - | gencon | gencon, subpool, optavgpause, optthruput, nogc | yes | GC policy to use |
| j9vm_gcThreads | integer | threads | You should select your own default value | 1 → 64 | yes | Number of threads the garbage collector uses for parallel operations |
| j9vm_scvTenureAge | integer | - | 10 | 1 → 14 | yes | Set the initial tenuring threshold for the generational concurrent GC policy |
| j9vm_scvAdaptiveTenureAge | categorical | - | blank | blank, -Xgc:scvNoAdaptiveTenure | yes | Enable the adaptive tenure age for the generational concurrent GC policy |
| j9vm_newSpaceFixed | integer | megabytes | You should select your own default value | You should select your own domain | yes | The fixed size of the new area when using the gencon GC policy. Must not be set alongside min or max |
| j9vm_minNewSpace | integer | megabytes | You should select your own default value | You should select your own domain | yes | The initial size of the new area when using the gencon GC policy |
| j9vm_maxNewSpace | integer | megabytes | You should select your own default value | You should select your own domain | yes | The maximum size of the new area when using the gencon GC policy |
| j9vm_oldSpaceFixed | integer | megabytes | You should select your own default value | You should select your own domain | yes | The fixed size of the old area when using the gencon GC policy. Must not be set alongside min or max |
| j9vm_minOldSpace | integer | megabytes | You should select your own default value | You should select your own domain | yes | The initial size of the old area when using the gencon GC policy |
| j9vm_maxOldSpace | integer | megabytes | You should select your own default value | You should select your own domain | yes | The maximum size of the old area when using the gencon GC policy |
| j9vm_concurrentScavenge | categorical | - | concurrentScavenge | concurrentScavenge, noConcurrentScavenge | yes | Support pause-less garbage collection mode with gencon |
| j9vm_gcPartialCompact | categorical | - | nopartialcompactgc | nopartialcompactgc, partialcompactgc | yes | Enable partial compaction |
| j9vm_concurrentMeter | categorical | - | soa | soa, loa, dynamic | yes | Determine which area is monitored by the concurrent mark |
| j9vm_concurrentBackground | integer | - | 0 | 0 → 128 | yes | The number of background threads assisting the mutator threads in concurrent mark |
| j9vm_concurrentSlack | integer | megabytes | 0 | You should select your own domain | yes | The target size of free heap space for concurrent collectors |
| j9vm_concurrentLevel | integer | percent | 8 | 0 → 100 | yes | The ratio between the amount of heap allocated and the amount of heap marked |
| j9vm_gcCompact | categorical | - | blank | blank, -Xcompactgc, -Xnocompactgc | yes | Enables full compaction on all garbage collections (system and global) |
| j9vm_minGcTime | real | percent | 0.05 | 0.0 → 1.0 | yes | The minimum percentage of time to be spent in garbage collection, triggering the resize of the heap to meet the specified values |
| j9vm_maxGcTime | real | percent | 0.13 | 0.0 → 1.0 | yes | The maximum percentage of time to be spent in garbage collection, triggering the resize of the heap to meet the specified values |
| j9vm_loa | categorical | - | loa | loa, noloa | yes | Enable the allocation of the large object area during garbage collection |
| j9vm_loa_initial | real | - | 0.05 | 0.0 → 0.95 | yes | The initial portion of the tenure area allocated to the large object area |
| j9vm_loa_minimum | real | - | 0.01 | 0.0 → 0.95 | yes | The minimum portion of the tenure area allocated to the large object area |
| j9vm_loa_maximum | real | - | 0.5 | 0.0 → 0.95 | yes | The maximum portion of the tenure area allocated to the large object area |
| j9vm_jitOptlevel | ordinal | - | noOpt | noOpt, cold, warm, hot, veryHot, scorching | yes | Force the JIT compiler to compile all methods at a specific optimization level |
| j9vm_compilationThreads | integer | threads | You should select your own default value | 1 → 7 | yes | Number of JIT threads |
| j9vm_codeCacheTotal | integer | megabytes | You should select your own default value | You should select your own domain | yes | Maximum size limit in MB for the JIT code cache |
| j9vm_jit_count | integer | - | 10000 | 0 → 1000000 | yes | The number of times a method is called before it is compiled |
| j9vm_lockReservation | categorical | - | blank | blank, -XlockReservation | no | Enables an optimization that presumes a monitor is owned by the thread that last acquired it |
| j9vm_compressedReferences | categorical | - | blank | blank, -Xcompressedrefs, -Xnocompressedrefs | yes | Enable/disable the use of compressed references |
| j9vm_aggressiveOpts | categorical | - | blank | blank, -Xaggressive | yes | Enable the use of aggressive performance optimization features, which are expected to become default in upcoming releases |
| j9vm_virtualized | categorical | - | blank | blank, -Xtune:virtualized | yes | Optimize the VM for virtualized environments, reducing CPU usage when idle |
| j9vm_shareclasses | categorical | - | blank | blank, -Xshareclasses | yes | Enable class sharing |
| j9vm_quickstart | categorical | - | blank | blank, -Xquickstart | yes | Run the JIT with only a subset of optimizations, improving the performance of short-running applications |
| j9vm_minimizeUserCpu | categorical | - | blank | blank, -Xthr:minimizeUserCPU | yes | Minimizes user-mode CPU usage in thread synchronization where possible |

Suggested defaults

| Parameter | Suggested default | Note |
| --- | --- | --- |
| j9vm_minNewSpace | 25% of j9vm_minHeapSize | must not exceed j9vm_minHeapSize |
| j9vm_maxNewSpace | 25% of j9vm_maxHeapSize | must not exceed j9vm_maxHeapSize |
| j9vm_minOldSpace | 75% of j9vm_minHeapSize | must not exceed j9vm_minHeapSize |
| j9vm_maxOldSpace | same as j9vm_maxHeapSize | must not exceed j9vm_maxHeapSize |
| j9vm_gcthreads | number of CPUs - 1, up to a maximum of 64 | capped to the default, as there is no benefit in exceeding that value |
| j9vm_compressedReferences | enabled for j9vm_maxHeapSize <= 57 GB | - |

Constraints
jvm.j9vm_minHeapSize < jvm.j9vm_maxHeapSize
jvm.j9vm_minNewSpace < jvm.j9vm_maxNewSpace && jvm.j9vm_minNewSpace < jvm.j9vm_minHeapSize && jvm.j9vm_maxNewSpace < jvm.j9vm_maxHeapSize
jvm.j9vm_minOldSpace < jvm.j9vm_maxOldSpace && jvm.j9vm_minOldSpace < jvm.j9vm_minHeapSize && jvm.j9vm_maxOldSpace < jvm.j9vm_maxHeapSize
jvm.j9vm_loa_minimum <= jvm.j9vm_loa_initial && jvm.j9vm_loa_initial <= jvm.j9vm_loa_maximum
jvm.j9vm_minFreeHeap + 0.05 < jvm.j9vm_maxFreeHeap
jvm.j9vm_minGcTime < jvm.j9vm_maxGcTime
Metrics

| Metric | Unit | Description |
| --- | --- | --- |
| jvm_heap_size | bytes | The size of the JVM heap memory |
| jvm_heap_used | bytes | The amount of heap memory used |
| jvm_heap_util | percent | The utilization % of heap memory |
| jvm_memory_used | bytes | The total amount of memory used across all the JVM memory pools |
| jvm_memory_used_details | bytes | The total amount of memory used broken down by pool (e.g., code-cache, compressed-class-space) |
| jvm_memory_buffer_pool_used | bytes | The total amount of bytes used by buffers within the JVM buffer memory pool |
| jvm_gc_time | percent | The % of wall clock time the JVM spent doing stop-the-world garbage collection activities |
| jvm_gc_time_details | percent | The % of wall clock time the JVM spent doing stop-the-world garbage collection activities, broken down by type of garbage collection algorithm (e.g., ParNew) |
| jvm_gc_count | collections/s | The total number of stop-the-world JVM garbage collections that have occurred per second |
| jvm_gc_count_details | collections/s | The total number of stop-the-world JVM garbage collections that have occurred per second, broken down by type of garbage collection algorithm (e.g., G1, CMS) |
| jvm_gc_duration | seconds | The average duration of a stop-the-world JVM garbage collection |
| jvm_gc_duration_details | seconds | The average duration of a stop-the-world JVM garbage collection, broken down by type of garbage collection algorithm (e.g., G1, CMS) |
| jvm_threads_current | threads | The total number of active threads within the JVM |
| jvm_threads_deadlocked | threads | The total number of deadlocked threads within the JVM |
| jvm_compilation_time | milliseconds | The total time spent by the JVM JIT compiler compiling bytecode |

Parameters

| Parameter | Type | Unit | Default | Domain | Restart | Description |
| --- | --- | --- | --- | --- | --- | --- |
| j9vm_minHeapSize | integer | megabytes | You should select your own default value | You should select your own domain | yes | Minimum heap size (in megabytes) |
| j9vm_maxHeapSize | integer | megabytes | You should select your own default value | You should select your own domain | yes | Maximum heap size (in megabytes) |
| j9vm_minFreeHeap | real | percent | 0.3 | 0.1 → 0.5 | yes | Specify the minimum % free heap required after global GC |
| j9vm_maxFreeHeap | real | percent | 0.6 | 0.4 → 0.9 | yes | Specify the maximum % free heap required after global GC |
| j9vm_gcPolicy | categorical | - | gencon | gencon, subpool, optavgpause, optthruput, nogc | yes | GC policy to use |
| j9vm_gcThreads | integer | threads | You should select your own default value | 1 → 64 | yes | Number of threads the garbage collector uses for parallel operations |
| j9vm_scvTenureAge | integer | - | 10 | 1 → 14 | yes | Set the initial tenuring threshold for the generational concurrent GC policy |
| j9vm_scvAdaptiveTenureAge | categorical | - | blank | blank, -Xgc:scvNoAdaptiveTenure | yes | Enable the adaptive tenure age for the generational concurrent GC policy |
| j9vm_newSpaceFixed | integer | megabytes | You should select your own default value | You should select your own domain | yes | The fixed size of the new area when using the gencon GC policy. Must not be set alongside min or max |
| j9vm_minNewSpace | integer | megabytes | You should select your own default value | You should select your own domain | yes | The initial size of the new area when using the gencon GC policy |
| j9vm_maxNewSpace | integer | megabytes | You should select your own default value | You should select your own domain | yes | The maximum size of the new area when using the gencon GC policy |
| j9vm_oldSpaceFixed | integer | megabytes | You should select your own default value | You should select your own domain | yes | The fixed size of the old area when using the gencon GC policy. Must not be set alongside min or max |
| j9vm_minOldSpace | integer | megabytes | You should select your own default value | You should select your own domain | yes | The initial size of the old area when using the gencon GC policy |
| j9vm_maxOldSpace | integer | megabytes | You should select your own default value | You should select your own domain | yes | The maximum size of the old area when using the gencon GC policy |
| j9vm_concurrentScavenge | categorical | - | concurrentScavenge | concurrentScavenge, noConcurrentScavenge | yes | Support pause-less garbage collection mode with gencon |
| j9vm_gcPartialCompact | categorical | - | nopartialcompactgc | nopartialcompactgc, partialcompactgc | yes | Enable partial compaction |
| j9vm_concurrentMeter | categorical | - | soa | soa, loa, dynamic | yes | Determine which area is monitored by the concurrent mark |
| j9vm_concurrentBackground | integer | - | 0 | 0 → 128 | yes | The number of background threads assisting the mutator threads in concurrent mark |
| j9vm_concurrentSlack | integer | megabytes | 0 | You should select your own domain | yes | The target size of free heap space for concurrent collectors |
| j9vm_concurrentLevel | integer | percent | 8 | 0 → 100 | yes | The ratio between the amount of heap allocated and the amount of heap marked |
| j9vm_gcCompact | categorical | - | blank | blank, -Xcompactgc, -Xnocompactgc | yes | Enables full compaction on all garbage collections (system and global) |
| j9vm_minGcTime | real | percent | 0.05 | 0.0 → 1.0 | yes | The minimum percentage of time to be spent in garbage collection, triggering the resize of the heap to meet the specified values |
| j9vm_maxGcTime | real | percent | 0.13 | 0.0 → 1.0 | yes | The maximum percentage of time to be spent in garbage collection, triggering the resize of the heap to meet the specified values |
| j9vm_loa | categorical | - | loa | loa, noloa | yes | Enable the allocation of the large object area during garbage collection |
| j9vm_loa_initial | real | - | 0.05 | 0.0 → 0.95 | yes | The initial portion of the tenure area allocated to the large object area |
| j9vm_loa_minimum | real | - | 0.01 | 0.0 → 0.95 | yes | The minimum portion of the tenure area allocated to the large object area |
| j9vm_loa_maximum | real | - | 0.5 | 0.0 → 0.95 | yes | The maximum portion of the tenure area allocated to the large object area |
| j9vm_jitOptlevel | ordinal | - | noOpt | noOpt, cold, warm, hot, veryHot, scorching | yes | Force the JIT compiler to compile all methods at a specific optimization level |
| j9vm_compilationThreads | integer | threads | You should select your own default value | 1 → 7 | yes | Number of JIT threads |
| j9vm_codeCacheTotal | integer | megabytes | You should select your own default value | You should select your own domain | yes | Maximum size limit in MB for the JIT code cache |
| j9vm_jit_count | integer | - | 10000 | 0 → 1000000 | yes | The number of times a method is called before it is compiled |
| j9vm_lockReservation | categorical | - | blank | blank, -XlockReservation | no | Enables an optimization that presumes a monitor is owned by the thread that last acquired it |
| j9vm_compressedReferences | categorical | - | blank | blank, -Xcompressedrefs, -Xnocompressedrefs | yes | Enable/disable the use of compressed references |
| j9vm_aggressiveOpts | categorical | - | blank | blank, -Xaggressive | yes | Enable the use of aggressive performance optimization features, which are expected to become default in upcoming releases |
| j9vm_virtualized | categorical | - | blank | blank, -Xtune:virtualized | yes | Optimize the VM for virtualized environments, reducing CPU usage when idle |
| j9vm_shareclasses | categorical | - | blank | blank, -Xshareclasses | yes | Enable class sharing |
| j9vm_quickstart | categorical | - | blank | blank, -Xquickstart | yes | Run the JIT with only a subset of optimizations, improving the performance of short-running applications |
| j9vm_minimizeUserCpu | categorical | - | blank | blank, -Xthr:minimizeUserCPU | yes | Minimizes user-mode CPU usage in thread synchronization where possible |

Suggested defaults

| Parameter | Suggested default | Note |
| --- | --- | --- |
| j9vm_minNewSpace | 25% of j9vm_minHeapSize | must not exceed j9vm_minHeapSize |
| j9vm_maxNewSpace | 25% of j9vm_maxHeapSize | must not exceed j9vm_maxHeapSize |
| j9vm_minOldSpace | 75% of j9vm_minHeapSize | must not exceed j9vm_minHeapSize |
| j9vm_maxOldSpace | same as j9vm_maxHeapSize | must not exceed j9vm_maxHeapSize |
| j9vm_gcthreads | number of CPUs - 1, up to a maximum of 64 | capped to the default, as there is no benefit in exceeding that value |
| j9vm_compressedReferences | enabled for j9vm_maxHeapSize <= 57 GB | - |

Constraints
jvm.j9vm_minHeapSize < jvm.j9vm_maxHeapSize
jvm.j9vm_minNewSpace < jvm.j9vm_maxNewSpace && jvm.j9vm_minNewSpace < jvm.j9vm_minHeapSize && jvm.j9vm_maxNewSpace < jvm.j9vm_maxHeapSize
jvm.j9vm_minOldSpace < jvm.j9vm_maxOldSpace && jvm.j9vm_minOldSpace < jvm.j9vm_minHeapSize && jvm.j9vm_maxOldSpace < jvm.j9vm_maxHeapSize
jvm.j9vm_loa_minimum <= jvm.j9vm_loa_initial && jvm.j9vm_loa_initial <= jvm.j9vm_loa_maximum
jvm.j9vm_minFreeHeap + 0.05 < jvm.j9vm_maxFreeHeap
jvm.j9vm_minGcTime < jvm.j9vm_maxGcTime
| Field | Type | Values | Required | Default | Description |
| --- | --- | --- | --- | --- | --- |
| type | string | bootstrap | yes | - | The type of the step, in this case bootstrap |
| name | string | - | yes | - | The name of the step |
| runOnFailure | boolean | true, false | no | false | The execution policy of the step: false prevents the step from running in case the previous step failed, while true allows the step to run even if the previous step failed |
| from | array of objects | Each object should have the structure described below | yes | - | The experiments to import in the current study. If experiments is not set, this step imports every experiment of the study |
This page describes the Optimization Pack for the Kubernetes Container component type.
Metrics

| Metric | Unit | Description |
| --- | --- | --- |
| container_cpu_used | millicores | The CPUs used by the container |
| container_cpu_throttle_time | percent | The percentage of time the CPU has been throttled |
| container_cpu_request | millicores | The CPUs requested for the container |
| container_cpu_throttled_millicores | millicores | The CPU throttling per container in millicores |
| container_cpu_limit | millicores | The CPUs allowed for the container |
| container_cpu_util | percent | The percentage of CPUs used with respect to the limit |
| container_memory_util | percent | The percentage of memory used with respect to the limit. Memory used includes all types of memory, including file system cache |
| container_memory_util_nocache | percent | The percentage of working set memory used with respect to the limit |
| container_memory_used | bytes | The total amount of memory used by the container. Memory used includes all types of memory, including file system cache |
| container_memory_request | bytes | The memory requested for the container |
| container_memory_limit | bytes | The memory limit for the container |
| container_memory_working_set | bytes | The current working set in bytes |
| container_memory_limit_hits | hits/s | The number of times per second the used memory hit the limit |
| container_memory_limit_util | percent | Percent memory limit per container relative to the total physical memory of the host |
| container_host_memory_total | bytes | Total physical memory on the host |

Parameters

| Parameter | Type | Unit | Default | Domain | Restart | Description |
| --- | --- | --- | --- | --- | --- | --- |
| cpu_request | integer | millicores | You should select your own default value | You should select your own domain | yes | Amount of CPU resources requests in CPU units (millicores) |
| cpu_limit | integer | millicores | You should select your own default value | You should select your own domain | yes | Limits on the amount of CPU resources usage in CPU units (millicores) |
| memory_request | integer | megabytes | You should select your own default value | You should select your own domain | yes | Amount of memory resources requests in megabytes |
| memory_limit | integer | megabytes | You should select your own default value | You should select your own domain | yes | Limits on the amount of memory resources usage in megabytes |
The following tables show a list of constraints that may be required in the definition of the study, depending on the tuned parameters:
component_name.cpu_request <= component_name.cpu_limit
component_name.memory_request <= component_name.memory_limit
This page describes the Optimization Pack for the Kubernetes Pod component type.
| Metric | Unit | Description |
| --- | --- | --- |
| k8s_pod_cpu_used | millicores | The CPUs used by the pod |
| k8s_pod_cpu_throttle_time | percent | The percentage of time the CPU has been throttled |
| k8s_pod_cpu_request | millicores | The CPUs requested for the pod |
| k8s_pod_cpu_limit | millicores | The CPUs allowed for the pod |
| k8s_pod_cpu_util | percent | The percentage of CPUs used with respect to the limit |
| k8s_pod_memory_util | percent | The percentage of memory used with respect to the limit. Memory used includes all types of memory, including file system cache |
| k8s_pod_memory_util_nocache | percent | The percentage of working set memory used with respect to the limit |
| k8s_pod_memory_used | bytes | The total amount of memory used by the pod. Memory used includes all types of memory, including file system cache |
| k8s_pod_memory_request | bytes | The memory requested for the pod |
| k8s_pod_memory_limit | bytes | The memory limit for the pod |
| k8s_pod_memory_working_set | bytes | The current working set in bytes |
| k8s_pod_memory_limit_hits | hits/s | The number of times per second the used memory hit the limit |
| k8s_pod_container_restarts | events | The number of pod restarts for a container (Labels: container) |
| k8s_pod_desired_containers | containers | The number of desired containers |
There are no parameters for the Kubernetes Pod component type.
This page describes the Optimization Pack for the Kubernetes Namespace component type.
| Metric | Unit | Description |
| --- | --- | --- |
| k8s_namespace_cpu_limit | millicores | The CPU limit for the namespace |
| k8s_namespace_cpu_request | millicores | The CPUs requested for the namespace |
| k8s_namespace_memory_limit | bytes | The memory limit for the namespace |
| k8s_namespace_memory_request | bytes | The memory requested for the namespace |
| k8s_namespace_running_pods | pods | The number of running pods in the namespace |
There are no parameters for the Kubernetes Namespace component type.
This page describes the Optimization Pack for the component type (Docker) Container.
Metrics

| Metric | Unit | Description |
| --- | --- | --- |
| container_cpu_util | percent | The percentage of CPUs used with respect to the limit |
| container_cpu_used | CPUs | The CPUs used by the container |
| container_cpu_throttle_time | percent | The percentage of time the CPU has been throttled |
| container_cpu_limit | CPUs | The number of CPUs (or fraction of CPUs) allowed for a container |
| container_mem_util | percent | Percentage of memory used with respect to the limit. Memory used includes all types of memory, including file system cache |
| container_mem_util_nocache | percent | Percentage of working set memory used with respect to the limit |
| container_mem_used | bytes | The total amount of memory used by the container. Memory used includes all types of memory, including file system cache |
| container_mem_limit | bytes | Memory limit for the container |
| container_mem_working_set | bytes | Current working set in bytes |
| container_mem_limit_hits | hits/s | Number of times memory usage hits the memory limit per second |

Parameters

| Parameter | Type | Unit | Default | Domain | Description |
| --- | --- | --- | --- | --- | --- |
| limits_cpu | real | CPUs | 0.7 | 0.1 → 100.0 | Limits on the amount of CPU resources usage in CPU units |
| requests_cpu | real | CPUs | 0.7 | 0.1 → 100.0 | Amount of CPU resources requests in CPU units |
| limits_memory | integer | megabytes | 128 | 64 → 64000 | Limits on the amount of memory resources usage in megabytes |
| requests_memory | integer | megabytes | 128 | 64 → 64000 | Amount of memory resources requests in megabytes |
This page describes the Optimization Pack for the Kubernetes Workload component type.
Metrics

| Metric | Unit | Description |
| --- | --- | --- |
| k8s_workload_desired_pods | pods | The number of desired pods per workload |
| k8s_workload_pods | pods | The number of pods per workload and phase |
| k8s_workload_running_pods | pods | The number of running pods per workload |
| k8s_workload_cpu_used | millicores | The total amount of CPUs used by the entire workload |
| k8s_workload_memory_used | bytes | The total amount of memory used by the entire workload |

Parameters

| Parameter | Type | Unit | Default | Domain | Restart | Description |
| --- | --- | --- | --- | --- | --- | --- |
| k8s_workload_replicas | integer | pods | 1 | 0 → 1024 | yes | Number of desired pods in the deployment |
This page describes the Optimization Pack for the Kubernetes Cluster component type.
| Metric | Unit | Description |
| --- | --- | --- |
| k8s_cluster_cpu | millicores | The CPUs in the cluster |
| k8s_cluster_cpu_available | millicores | The CPUs available for additional pods in the cluster |
| k8s_cluster_cpu_util | percent | The percentage of used CPUs in the cluster |
| k8s_cluster_cpu_request | millicores | The total CPUs requested in the cluster |
| k8s_cluster_memory | bytes | The overall memory in the cluster |
| k8s_cluster_memory_available | bytes | The amount of memory available for additional pods in the cluster |
| k8s_cluster_memory_util | percent | The percentage of used memory in the cluster |
| k8s_cluster_memory_request | bytes | The total memory requested in the cluster |
| k8s_cluster_nodes | nodes | The number of nodes in the cluster |
There are no parameters for the Kubernetes Cluster component type.
This page describes the Optimization Pack for Web Applications.
| Metric | Unit | Description |
| --- | --- | --- |
| transactions_throughput | transactions/s | The number of transactions executed per second |
| transactions_response_time | milliseconds | The average transaction response time |
| transactions_response_time_max | milliseconds | The maximum recorded transaction response time |
| transactions_response_time_min | milliseconds | The minimum recorded transaction response time |
| pages_throughput | pages/s | The number of pages requested per second |
| pages_response_time | milliseconds | The average page response time |
| pages_response_time_max | milliseconds | The maximum recorded page response time |
| pages_response_time_min | milliseconds | The minimum recorded page response time |
| requests_throughput | requests/s | The number of requests performed per second |
| requests_response_time | milliseconds | The average request response time |
| requests_response_time_max | milliseconds | The maximum recorded request response time |
| requests_response_time_min | milliseconds | The minimum recorded request response time |
| transactions_error_rate | percent | The percentage of transactions flagged as errors |
| transactions_error_throughput | transactions/s | The number of transactions flagged as errors per second |
| pages_error_rate | percent | The percentage of pages flagged as errors |
| pages_error_throughput | pages/s | The number of pages flagged as errors per second |
| requests_error_rate | percent | The percentage of requests flagged as errors |
| requests_error_throughput | requests/s | The number of requests flagged as errors per second |
| users | users | The number of users performing requests on the web application |
There are no parameters for Web Applications.
All operators accept some common, optional, arguments that allow you to control how the operator is executed within your workflow.
The following table reports all the arguments that can be used with any operator.
| Argument | Type | Values | Required | Default | Description |
| --- | --- | --- | --- | --- | --- |
| retries | integer | - | no | 1 | How many times a task can be re-executed in case of failures. If a task reaches the maximum number of retries and fails, the entire workflow execution is aborted and the trial is considered failed |
| retry_delay | string | string (supporting seconds, minutes and hours) or int (seconds only) | no | 5m | How much time to wait before retrying a failed task |
| timeout | string | string (supporting seconds, minutes and hours) or int (seconds only) | no | Infinite | The maximum time a task can run. If the timeout is exceeded, the task is considered failed |
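A sketch of these arguments on a workflow task (the Sleep operator is listed among the out-of-the-box operators; its seconds argument and the placement of the common arguments alongside the operator-specific ones are assumptions):

```yaml
name: wait-for-warmup
operator: Sleep
arguments:
  seconds: 60      # operator-specific argument (assumed name)
  retries: 3       # re-execute up to 3 times on failure
  retry_delay: 30s # wait 30 seconds between retries
  timeout: 10m     # fail the task if it runs longer than 10 minutes
```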
The SparkLivy operator uses Livy to run Spark applications on a Spark instance.
| Argument | Type | Constraints | Required | Default | Description |
| --- | --- | --- | --- | --- | --- |
| file | String | It should be a path to a valid Java or Python Spark application file | Yes | - | Spark application to submit (jar or Python file) |
| args | List of Strings, Numbers or Booleans | - | Yes | - | Additional application arguments |
| className | String | - | No (required for Java applications) | - | The entry point of the Java application |
| name | String | - | No | - | Name of the task. When submitted, the ids of the study, experiment and trial are appended |
| queue | String | - | No | - | The name of the YARN queue to which the Spark application is submitted |
| pyFiles | List of Strings | Each item of the list should be a path that matches an existing Python file | No | - | A list of Python scripts to be added to the PYTHONPATH |
| proxyUser | String | - | No | - | The user to be used to launch Spark applications |
| pollingInterval | Number | pollingInterval > 0 | No | 10 | The number of seconds to wait before checking if a launched Spark application has finished |
| component | String | It should match the name of an existing Component of the System under test | Yes | - | The name of the component whose properties can be used as arguments of the operator |
The operator fetches the following parameters from the current Experiment to apply them to the System under test.
| Parameter | Description | Note |
| --- | --- | --- |
| spark_driver_memory | Memory for the driver | - |
| spark_executor_memory | Memory per executor | - |
| spark_total_executor_cores | Total cores used by the application | Spark standalone and Mesos only |
| spark_executor_cores | Cores per executor | Spark standalone and YARN only |
| spark_num_executors | The number of executors | YARN only |
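A sketch of a SparkLivy task (the file path, class name, arguments, and component name are all illustrative):

```yaml
name: run spark application
operator: SparkLivy
arguments:
  file: /opt/apps/my-spark-app.jar
  className: com.example.MyApp
  args: ["--iterations", 10]
  component: spark
```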
The FileConfigurator operator allows configuring systems tuned by Akamas by interpolating configuration parameters into files on remote machines.
The operator performs the following operations:
It reads an input file from a remote machine containing templates for interpolating the configuration parameters generated by Akamas
It replaces the values of configuration parameters in the input file
It writes the file with replaced configuration parameters on a specified path on another remote machine
Access on remote machines is performed using SFTP (SSH).
The FileConfigurator allows writing templates for configuration parameters in two ways (see the sketch after this list):
specify that a parameter should be interpolated directly
specify that all parameters of a component should be interpolated
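A sketch of the two token forms, assuming the usual ${component.parameter} token syntax; the wildcard form for all parameters of a component is an assumption:

```
${component1.param1}
${component2.*}
```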
It is possible to add a prefix or suffix to interpolated configuration parameters by acting at the component-type level:
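A sketch of what such a component-type parameter definition might look like (the exact schema, in particular the value attribute, is an assumption; refer to the optimization pack pages for the real one):

```yaml
- name: x1
  operators:
    FileConfigurator:
      value: "PREFIX${value}SUFFIX"  # assumed attribute carrying the prefix/suffix
```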
Notice that any parameter that does not include the FileConfigurator element in its operators attribute is ignored and not written.
In the example above, the parameter x1 will be interpolated with the prefix PREFIX and the suffix SUFFIX; ${value} will be replaced with the actual value of the parameter at each experiment.
Let's assume we want to apply the following configuration:
where component1
is of type MyComponentType
and MyComponentType
is defined as follows:
A template file to interpolate only parameter component1.param1
and all parameters from component2
would look like this:
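A sketch of such a template file (the command and option names are illustrative, as is the ${component2.*} wildcard form):

```
#!/bin/bash
./launch_app.sh --param1=${component1.param1} ${component2.*}
```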
The file after the configuration parameters are interpolated would look like this:
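Continuing the sketch above with hypothetical parameter values:

```
#!/bin/bash
./launch_app.sh --param1=42 --opt1=foo --opt2=bar
```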
Note that the file in this example contains a bash command whose arguments are constructed by interpolating configuration parameters. This represents a typical use case for the File Configurator: to construct the right bash commands that will configure a system with the new configuration parameters computed by Akamas.
| Argument | Type | Constraints | Required | Default | Description |
| --- | --- | --- | --- | --- | --- |
| source | Object | should have a structure like the one defined in the next section | no, if the Component whose name is defined in component has properties that map to the ones defined within source | - | Information relative to the source/input file to be used to interpolate the optimal configuration parameters discovered by Akamas |
| target | Object | should have a structure like the one defined in the next section | no, if the Component whose name is defined in component has properties that map to the ones defined within target | - | Information relative to the target/output file to be used to interpolate the optimal configuration parameters discovered by Akamas |
| component | String | should match the name of an existing Component of the System under test | no | - | The name of the Component whose properties can be used as arguments of the operator |
| ignoreUnsubstitutedTokens | Boolean | - | no | False | Behavior of the operator regarding leftover tokens in the target file: when False, the FileConfigurator fails; when True, it succeeds regardless of leftover tokens |
source and target structures and arguments
Here follows the structure of either the source or the target operator argument:
| Field | Type | Constraints | Required | Default | Description |
| --- | --- | --- | --- | --- | --- |
| hostname | String | should be a valid SSH host address | yes | - | SSH endpoint |
| username | String | - | yes | - | SSH login username |
| password | String | cannot be set if key is already set | no | - | SSH login password |
| sshPort | Number | 1 ≤ sshPort ≤ 65532 | no | 22 | SSH port |
| key | String | cannot be set if password is already set | no | - | SSH login key, provided directly as its value or as the path of the file to import it from. The operator supports RSA and DSA keys |
| path | String | should be a valid path | yes | - | The path of the file to be used either as the source or the target of the activity of applying the configuration parameters computed by Akamas using files |
component
The component argument can be used to refer to a component by name and use its properties as the arguments of the operator. In case the mapped arguments are already provided to the operator, there is no override.
In this case, the operator replaces in the template file only tokens referring to the specified component; a token bound to any other component will cause the substitution to fail.
| Component property | Mapped operator arguments |
| --- | --- |
| hostname | source → hostname, target → hostname |
| username | source → username, target → username |
| sshPort | source → sshPort, target → sshPort |
| password | source → password, target → password |
| key | source → key, target → key |
| sourcePath | source → path |
| targetPath | target → path |
where the apache-server-1
component is defined as:
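A sketch of such a component definition (the component type and all property values are illustrative; the properties map to the operator arguments per the table above):

```yaml
name: apache-server-1
description: The Apache server hosting the application  # illustrative
componentType: Web Application                          # assumed component type
properties:
  hostname: apache1.example.com
  username: akamas
  key: /home/akamas/.ssh/id_rsa
  sourcePath: /templates/httpd.conf.templ
  targetPath: /etc/httpd/conf/httpd.conf
```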
This page describes the Optimization Pack for the component type WebSphere 8.5.
There are no metrics for this component type.
| Parameter | Type | Unit | Default | Domain | Restart | Description |
| --- | --- | --- | --- | --- | --- | --- |
| was_tcp_maxppenconnections_tcp2 | integer | connections | 20000 | 1 → 128000 | yes | Maximum number of connections that are available for a server to use (TCP_2) |
| was_tcp_listenBacklog_tcp2 | integer | connections | 511 | 1 → 1024 | yes | Maximum number of outstanding connection requests that the operating system can buffer while it waits for the application server to accept the connections (TCP_2) |
| was_tcp_maxppenconnections_tcp4 | integer | connections | 20000 | 1 → 128000 | yes | Maximum number of connections that are available for a server to use (TCP_4) |
| was_tcp_listenBacklog_tcp4 | integer | connections | 511 | 1 → 1024 | yes | Maximum number of outstanding connection requests that the operating system can buffer while it waits for the application server to accept the connections (TCP_4) |
| was_http_maximumPersistentRequests_http2 | integer | requests | 10000 | 1 → 20000 | yes | Maximum number of persistent requests that are allowed on a single HTTP connection (HTTP_2) |
| was_http_maximumPersistentRequests_http4 | integer | requests | 10000 | 1 → 20000 | yes | Maximum number of persistent requests that are allowed on a single HTTP connection (HTTP_4) |
| was_threadpools_minimumSize_webcontainer | integer | threads | 50 | 1 → 100 | yes | Minimum number of threads to allow in the pool (Web Container) |
| was_threadpools_maximumsize_webcontainer | integer | threads | 50 | 1 → 500 | yes | Maximum number of threads to maintain in the thread pool (Web Container) |
| was_threadpools_minimumSize_default | integer | threads | 20 | 1 → 100 | yes | Minimum number of threads to allow in the pool (default) |
| was_threadpools_maximumsize_default | integer | threads | 20 | 1 → 500 | yes | Maximum number of threads to maintain in the default thread pool (default) |
| was_threadpools_minimumSize_threadpoolmanager_orb | integer | threads | 10 | 1 → 100 | yes | Minimum number of threads to allow in the pool (ThreadPoolManager ORB) |
| was_threadpools_maximumsize_threadpoolmanager_orb | integer | threads | 50 | 1 → 500 | yes | Maximum number of threads to maintain in the thread pool (ThreadPoolManager ORB) |
| was_threadpools_minimumSize_objectrequestbroker_orb | integer | threads | 10 | 1 → 100 | yes | Minimum number of threads to allow in the pool (ObjectRequestBroker ORB) |
| was_threadpools_maximumsize_objectrequestbroker_orb | integer | threads | 50 | 1 → 500 | yes | Maximum number of threads to maintain in the thread pool (ObjectRequestBroker ORB) |
| was_threadpools_minimumSize_custom_TCPChannel_DCS | integer | threads | 20 | 1 → 100 | yes | Minimum number of threads to allow in the pool (TCPChannel.DCS) |
| was_threadpools_maximumsize_custom_TCPChannel_DCS | integer | threads | 100 | 1 → 500 | yes | Maximum number of threads to maintain in the thread pool (TCPChannel.DCS) |
| was_auth_cacheTimeout | integer | milliseconds | 600 | 0 → 7200 | yes | The time period after which the authenticated credential in the cache expires |
| was_webserverplugin_serverIOtimeout | integer | milliseconds | 900 | -1 → 1800 | yes | How long the plug-in should wait for a response from the application |
| was_Server_provisionComponents | categorical | - | false | true, false | yes | Select this property if you want the server components started as they are needed by an application that is running on this server |
| was_ObjectRequestBroker_noLocalCopies | categorical | - | false | true, false | yes | Specifies how the ORB passes parameters. If enabled, the ORB passes parameters by reference instead of by value, to avoid making an object copy. If disabled, a copy of the parameter passes rather than the parameter object itself |
| was_PMIService_statisticSet | categorical | - | basic | none, basic, extended, all | yes | When the PMI service is enabled, the monitoring of individual components can be enabled or disabled dynamically. PMI provides four predefined statistic sets that can be used to enable a set of statistics |
This section documents the out-of-the-box workflow operators.
Executes a shell command on a machine using SSH
Interpolates configuration parameters values into a file with templates and saves this file on a machine using SSH
Configures Linux kernel parameters using different strategies
Executes a command on a target Windows machine using WinRM
Interpolates configuration parameters into files on remote Windows machines
Pauses the execution of the workflow for a certain time
Executes custom queries on Oracle database instances
Configures Oracle database instances
Executes a Spark application using spark-submit on a machine using SSH
Executes a Spark application using spark-submit locally
Executes a Spark application using the Livy web service
Triggers the execution of performance tests using NeoLoad Web
Runs a performance test with LoadRunner Professional
Runs a performance test with LoadRunner Enterprise
The WebSphere optimization pack provides support for optimizing WebSphere middleware.
The following component types are supported for WebSphere middleware.
IBM WebSphere Application Server 8.5
IBM WebSphere Liberty ND
Here’s the command to install the WebSphere optimization pack using the Akamas CLI:
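A minimal sketch of the command, assuming the standard akamas install optimization-pack syntax and that the pack is referenced by name (the name WebSphere is an assumption):

```
akamas install optimization-pack WebSphere
```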
The OpenJ9 optimization pack enables optimizing Java applications based on the Eclipse OpenJ9 VM, formerly known as IBM J9. Through this optimization pack, Akamas is able to tackle the performance of JVM-based applications from the standpoint of both cost savings and quality of service.
To achieve these goals the optimization pack provides parameters that focus on the following areas:
Garbage collection
Heap
JIT
Similarly, the bundled metrics provide visibility on the following aspects of tuned applications:
Heap and memory utilization
Garbage Collection
Execution threads
The optimization pack supports the most used versions of JVM.
Here’s the command to install the Eclipse OpenJ9 optimization pack using the Akamas CLI:
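A minimal sketch, assuming the pack is referenced by name (the name eclipse-openj9 is an assumption):

```
akamas install optimization-pack eclipse-openj9
```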
The WindowsFileConfigurator operator allows configuring systems tuned by Akamas by interpolating configuration parameters into files on remote Windows machines.
The operator performs the following operations:
It reads an input file from a remote machine containing templates for interpolating the configuration parameters generated by Akamas
It replaces the values of configuration parameters in the input file
It writes the file with replaced configuration parameters on a specified path on another remote machine
Access to remote machines is performed using WinRM.
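As a sketch, a workflow task using this operator might be defined as follows; the source and target arguments are documented later on this page, while host names, credentials, and paths are illustrative:

```yaml
name: configure-app
operator: WindowsFileConfigurator
arguments:
  source:                    # the file containing the templates to interpolate
    hostname: win-host-1     # illustrative Windows host
    username: Administrator
    password: mypassword
    path: C:\akamas\templates\app.conf.templ
  target:                    # the file written with the interpolated values
    hostname: win-host-1
    username: Administrator
    password: mypassword
    path: C:\app\conf\app.conf
```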
The Windows File Configurator allows writing templates for configuration parameters in two ways:
a single parameter is specified to be interpolated:
all parameters of a component to be interpolated:
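As an illustration, assuming Akamas' ${...} token syntax, the first form references a single parameter of a component and the second one all of its parameters (component and parameter names are hypothetical):

```
${component1.param1}
${component1.*}
```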
It is possible to add a prefix or suffix to interpolated configuration parameters by acting at the component-type level:
In the example above, the parameter x1 will be interpolated with the prefix PREFIX and the suffix SUFFIX, while ${value} will be replaced with the actual value of the parameter at each experiment.
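A sketch of such a component-type definition, assuming prefix and suffix attributes can be attached to a parameter under the operator's key (the key and attribute names are assumptions):

```yaml
parameters:
  - name: x1
    operators:
      FileConfigurator:    # assumed operator key
        prefix: "PREFIX"   # assumed attribute: prepended to the interpolated value
        suffix: "SUFFIX"   # assumed attribute: appended to the interpolated value
```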
Suppose that, for experiment 1 of a study, Akamas computed a configuration of parameters for two components, component1 and component2, where component1 is of type MyComponentType.
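A minimal sketch of such a configuration and of the MyComponentType definition (all parameter names and values are hypothetical):

```yaml
# Configuration computed by Akamas for experiment 1 (hypothetical):
#   component1.param1 = 42
#   component2.paramA = 1
#   component2.paramB = 2

# Hypothetical definition of MyComponentType:
name: MyComponentType
description: An example component type
parameters:
  - name: param1
  - name: param2
```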
A template file to interpolate only parameter component1.param1 and all parameters from component2 would look like this:
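For instance (the command and argument names are hypothetical):

```
mycommand --param1 ${component1.param1} ${component2.*}
```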
The file after the configuration parameters are interpolated would look like this:
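Assuming the wildcard token expands each parameter of the component as a name=value pair, the interpolated file would read:

```
mycommand --param1 42 paramA=1 paramB=2
```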
Note that the file in this example contains a command whose arguments are constructed by interpolating configuration parameters. This represents a typical use case for the WindowsFileConfigurator: constructing the commands that configure a system with the new configuration parameters computed by Akamas.
source and target structure and arguments
Here follows the structure of either the source or the target operator argument.
component
The component argument can be used to refer to a Component by name and use its properties as the arguments of the operator. If the mapped arguments are already provided to the operator, they are not overridden.
Notice that in this case the operator replaces in the template file only the tokens referring to the specified component; a parameter bound to any other component causes the substitution to fail.
where the apache-server-1 component is defined as:
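A sketch of such a component definition, with properties matching the operator-argument mapping documented below (the component type and all values are illustrative):

```yaml
name: apache-server-1
description: An Apache server running on a Windows machine
componentType: Web Server                     # illustrative component type
properties:
  hostname: win-host-1                        # maps to source->hostname and target->hostname
  username: Administrator                     # maps to source->username and target->username
  password: mypassword                        # maps to source->password and target->password
  sourcePath: C:\templates\httpd.conf.templ   # maps to source->path
  targetPath: C:\Apache\conf\httpd.conf       # maps to target->path
```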
This page describes the Optimization Pack for Eclipse OpenJ9 (formerly known as IBM J9) Virtual Machine version 6.
The following parameters require their ranges or default values to be updated according to the described rules:
Notice that the value nocompressedreferences for j9vm_compressedReferences can only be specified for JVMs compiled with the --with-noncompressedrefs flag. If this is not the case, you cannot actively disable compressed references, meaning:
for Xmx <= 57GB, it is useless to tune this parameter, since compressed references are active by default and cannot be explicitly disabled
for Xmx > 57GB, compressed references are disabled by default (blank value), so Akamas can try to enable them; this requires removing nocompressedreferences from the domain
The following tables show a list of constraints that may be required in the definition of the study, depending on the tuned parameters:
Notice that:
j9vm_newSpaceFixed is mutually exclusive with j9vm_minNewSpace and j9vm_maxNewSpace
j9vm_oldSpaceFixed is mutually exclusive with j9vm_minOldSpace and j9vm_maxOldSpace
the sum of j9vm_minNewSpace and j9vm_minOldSpace must be equal to j9vm_minHeapSize, so it is redundant to tune all three together; the corresponding maximum values are not bound by such a simple relationship
For more information on the process of installing or upgrading an optimization pack, refer to the optimization pack installation page.
Eclipse OpenJ9 (formerly known as IBM J9) Virtual Machine version 6
Eclipse OpenJ9 (formerly known as IBM J9) Virtual Machine version 8
Eclipse OpenJ9 (formerly known as IBM J9) Virtual Machine version 11
transactions_response_time
transactions
AVG_DURATION
pages_response_time
pages
AVG_DURATION
requests_response_time
requests
AVG_DURATION
transactions_response_time_min
transactions
MIN_DURATION
pages_response_time_min
pages
MIN_DURATION
requests_response_time_min
requests
MIN_DURATION
transactions_response_time_max
transactions
MAX_DURATION
pages_response_time_max
pages
MAX_DURATION
requests_response_time_max
requests
MAX_DURATION
transactions_throughput
transactions
THROUGHPUT
pages_throughput
pages
THROUGHPUT
requests_throughput
requests
THROUGHPUT
transactions_error_rate
transactions
ERROR_RATE
pages_error_rate
pages
ERROR_RATE
requests_error_rate
requests
ERROR_RATE
transactions_error_throughput
transactions
ERRORS_PER_SECOND
pages_error_throughput
pages
ERRORS_PER_SECOND
requests_error_throughput
requests
ERRORS_PER_SECOND
users
Controller/User Load
AVG
source
Object
It should have a structure like the one defined in the next section
No, if the Component whose name is defined in component has properties that map to the ones defined within source
Information relative to the source/input file to be used to interpolate optimal configuration parameters discovered by Akamas
target
Object
It should have a structure like the one defined in the next section
No, if the Component whose name is defined in component has properties that map to the ones defined within target
Information relative to the target/output file to be used to interpolate optimal configuration parameters discovered by Akamas
component
String
It should match the name of an existing Component of the System under test
No
The name of the Component whose properties can be used as arguments of the operator
hostname
String
It should be a valid host address
Yes
Windows host
username
String
Yes
Login username
password
String
Windows password for the specified user
Yes
Login password
path
String
It should be a valid path
Yes
The path of the file to be used as either the source or the target of the activity of applying the Akamas-computed configuration parameters using files
Component property
Operator argument
hostname
source->hostname
target->hostname
username
source->username
target->username
password
source->password
target->password
sourcePath
source->path
targetPath
target->path
jvm_heap_size
bytes
The size of the JVM heap memory
jvm_heap_used
bytes
The amount of heap memory used
jvm_heap_util
percent
The utilization % of heap memory
jvm_memory_used
bytes
The total amount of memory used across all the JVM memory pools
jvm_memory_used_details
bytes
The total amount of memory used broken down by pool (e.g., code-cache, compressed-class-space)
jvm_memory_buffer_pool_used
bytes
The total amount of bytes used by buffers within the JVM buffer memory pool
jvm_gc_time
percent
The % of wall clock time the JVM spent doing stop the world garbage collection activities
jvm_gc_time_details
percent
The % of wall clock time the JVM spent doing stop the world garbage collection activities broken down by type of garbage collection algorithm (e.g., ParNew)
jvm_gc_count
collections/s
The total number of stop the world JVM garbage collections that have occurred per second
jvm_gc_count_details
collections/s
The total number of stop the world JVM garbage collections that have occurred per second, broken down by type of garbage collection algorithm (e.g., G1, CMS)
jvm_gc_duration
seconds
The average duration of a stop the world JVM garbage collection
jvm_gc_duration_details
seconds
The average duration of a stop the world JVM garbage collection broken down by type of garbage collection algorithm (e.g., G1, CMS)
jvm_threads_current
threads
The total number of active threads within the JVM
jvm_threads_deadlocked
threads
The total number of deadlocked threads within the JVM
jvm_compilation_time
milliseconds
The total time spent by the JVM JIT compiler compiling bytecode
j9vm_minHeapSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Minimum heap size (in megabytes)
j9vm_maxHeapSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Maximum heap size (in megabytes)
j9vm_minFreeHeap
real
percent
0.3
0.1
→ 0.5
yes
Specify the minimum % free heap required after global GC
j9vm_maxFreeHeap
real
percent
0.6
0.4
→ 0.9
yes
Specify the maximum % free heap required after global GC
j9vm_gcPolicy
categorical
gencon
gencon
, subpool
, optavgpause
, optthruput
, nogc
yes
GC policy to use
j9vm_gcThreads
integer
threads
You should select your own default value.
1
→ 64
yes
Number of threads the garbage collector uses for parallel operations
j9vm_scvTenureAge
integer
10
1
→ 14
yes
Set the initial tenuring threshold for generational concurrent GC policy
j9vm_scvAdaptiveTenureAge
categorical
blank
blank, -Xgc:scvNoAdaptiveTenure
yes
Enable the adaptive tenure age for generational concurrent GC policy
j9vm_newSpaceFixed
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The fixed size of the new area when using the gencon GC policy. Must not be set alongside min or max
j9vm_minNewSpace
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The initial size of the new area when using the gencon GC policy
j9vm_maxNewSpace
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The maximum size of the new area when using the gencon GC policy
j9vm_oldSpaceFixed
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The fixed size of the old area when using the gencon GC policy. Must not be set alongside min or max
j9vm_minOldSpace
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The initial size of the old area when using the gencon GC policy
j9vm_maxOldSpace
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The maximum size of the old area when using the gencon GC policy
j9vm_concurrentScavenge
categorical
concurrentScavenge
concurrentScavenge
, noConcurrentScavenge
yes
Support pause-less garbage collection mode with gencon
j9vm_gcPartialCompact
categorical
nopartialcompactgc
nopartialcompactgc
, partialcompactgc
yes
Enable partial compaction
j9vm_concurrentMeter
categorical
soa
soa
, loa
, dynamic
yes
Determine which area is monitored by the concurrent mark
j9vm_concurrentBackground
integer
0
0
→ 128
yes
The number of background threads assisting the mutator threads in concurrent mark
j9vm_concurrentSlack
integer
megabytes
0
You should select your own domain.
yes
The target size of free heap space for concurrent collectors
j9vm_concurrentLevel
integer
percent
8
0
→ 100
yes
The ratio between the amount of heap allocated and the amount of heap marked
j9vm_gcCompact
categorical
blank
blank, -Xcompactgc
, -Xnocompactgc
yes
Enables full compaction on all garbage collections (system and global)
j9vm_minGcTime
real
percent
0.05
0.0
→ 1.0
yes
The minimum percentage of time to be spent in garbage collection, triggering the resize of the heap to meet the specified values
j9vm_maxGcTime
real
percent
0.13
0.0
→ 1.0
yes
The maximum percentage of time to be spent in garbage collection, triggering the resize of the heap to meet the specified values
j9vm_loa
categorical
loa
loa
, noloa
yes
Enable the allocation of the large area object during garbage collection
j9vm_loa_initial
real
0.05
0.0
→ 0.95
yes
The initial portion of the tenure area allocated to the large area object
j9vm_loa_minimum
real
0.01
0.0
→ 0.95
yes
The minimum portion of the tenure area allocated to the large area object
j9vm_loa_maximum
real
0.5
0.0
→ 0.95
yes
The maximum portion of the tenure area allocated to the large area object
j9vm_jitOptlevel
ordinal
noOpt
noOpt
, cold
, warm
, hot
, veryHot
, scorching
yes
Force the JIT compiler to compile all methods at a specific optimization level
j9vm_codeCacheTotal
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Maximum size limit in MB for the JIT code cache
j9vm_jit_count
integer
10000
0
→ 1000000
yes
The number of times a method is called before it is compiled
j9vm_compressedReferences
categorical
blank
blank, -Xcompressedrefs
, -Xnocompressedrefs
yes
Enable/disable the use of compressed references
j9vm_aggressiveOpts
categorical
blank
blank, -Xaggressive
yes
Enable the use of aggressive performance optimization features, which are expected to become default in upcoming releases
j9vm_virtualized
categorical
blank
blank, -Xtune:virtualized
yes
Optimize the VM for virtualized environment, reducing CPU usage when idle
j9vm_shareclasses
categorical
blank
blank, -Xshareclasses
yes
Enable class sharing
j9vm_quickstart
categorical
blank
blank, -Xquickstart
yes
Run JIT with only a subset of optimizations, improving the performance of short-running applications
j9vm_minimizeUserCpu
categorical
blank
blank, -Xthr:minimizeUserCPU
yes
Minimizes user-mode CPU usage in thread synchronization where possible
j9vm_minNewSpace
25% of j9vm_minHeapSize
must not exceed j9vm_minHeapSize
j9vm_maxNewSpace
25% of j9vm_maxHeapSize
must not exceed j9vm_maxHeapSize
j9vm_minOldSpace
75% of j9vm_minHeapSize
must not exceed j9vm_minHeapSize
j9vm_maxOldSpace
same as j9vm_maxHeapSize
must not exceed j9vm_maxHeapSize
j9vm_gcThreads
number of CPUs - 1, up to a maximum of 64
capped to default, no benefit in exceeding that value
j9vm_compressedReferences
enabled for j9vm_maxHeapSize <= 57 GB
jvm.j9vm_minHeapSize < jvm.j9vm_maxHeapSize
jvm.j9vm_minNewSpace < jvm.j9vm_maxNewSpace && jvm.j9vm_minNewSpace < jvm.j9vm_minHeapSize && jvm.j9vm_maxNewSpace < jvm.j9vm_maxHeapSize
jvm.j9vm_minOldSpace < jvm.j9vm_maxOldSpace && jvm.j9vm_minOldSpace < jvm.j9vm_minHeapSize && jvm.j9vm_maxOldSpace < jvm.j9vm_maxHeapSize
jvm.j9vm_loa_minimum <= jvm.j9vm_loa_initial && jvm.j9vm_loa_initial <= jvm.j9vm_loa_maximum
jvm.j9vm_minFreeHeap + 0.05 < jvm.j9vm_maxFreeHeap
jvm.j9vm_minGcTime < jvm.j9vm_maxGcTime
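As a sketch, such constraints could be declared in the study definition as formulas; the parameterConstraints section name is an assumption, while the formulas are the ones listed above:

```yaml
parameterConstraints:
  - name: heap_bounds
    formula: jvm.j9vm_minHeapSize < jvm.j9vm_maxHeapSize
  - name: free_heap_gap
    formula: jvm.j9vm_minFreeHeap + 0.05 < jvm.j9vm_maxFreeHeap
  - name: gc_time_bounds
    formula: jvm.j9vm_minGcTime < jvm.j9vm_maxGcTime
```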
The Kubernetes optimization pack allows optimizing containerized applications running on a Kubernetes cluster. Through this optimization pack, Akamas is able to tackle the problem of distributing resources to containerized applications in order to minimize waste and ensure the quality of service.
To achieve these goals the optimization pack provides parameters that focus on the following areas:
Memory allocation
CPU allocation
Number of replicas
Similarly, the bundled metrics provide visibility on the following aspects of tuned applications:
Memory utilization
CPU utilization
The component types provided in this optimization pack allow modeling the entities found in a Kubernetes-based application, optimizing their parameters, and monitoring the key performance metrics.
Kubernetes Container
Kubernetes Pod
Kubernetes Workload
Kubernetes Namespace
Kubernetes Cluster
Here’s the command to install the Kubernetes optimization pack using the Akamas CLI:
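A minimal sketch, assuming the pack is referenced by name (the name Kubernetes is an assumption):

```
akamas install optimization-pack Kubernetes
```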