This guide provides a glossary describing Akamas key concepts with their associated construct templates, command-line commands, and user interfaces.
This guide also provides references to:
Workflow operators: the structure of workflows and the available operators
Metrics mapping: the mapping between telemetry metrics and Akamas metrics
Optimization pack metrics and parameters
Administration: how to administer Akamas, manage users, authenticate, and manage its resources
This section provides a definition of Akamas' key concepts and terms and also provides references to the related construct properties, commands, and user interfaces.
Systems: systems targeted by optimization studies
The Windowing field in a study specifies the windowing policy to be adopted to score the experiments of an optimization study.
The two available windowing strategies have different structures:
Trim windowing: trims the temporal interval of a trial, both from the start and from the end, by a specified temporal amount - this is the default strategy
Stability windowing: discards temporal intervals in which a given metric is not stable and selects the temporal interval in which a metric is maximized or minimized
In case the windowing strategy is not specified, the entire time window is considered.
The metricsSelection field in a study specifies the metrics of the system to be tracked. This selection only determines which metrics are tracked while running the study and does not affect the optimization.
In case this selection is not specified, all metrics are considered.
A metrics selection can either assume the value all, to indicate that all the available metrics of the system should be tracked, or a list of the names of the metrics to be tracked, each prepended with the name of its component.
The following fragment is an example:
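As a minimal sketch (the component and metric names below are illustrative):

```yaml
metricsSelection:
  - application.response_time
  - jvm1.heap_used
```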
A metric is a measured property of a system.
Examples of a metric include:
the response time of an application
the utilization of a CPU
A system represents the entire system which is the target of the optimization.
A system is a single object irrespective of the number or type of entities or layers that are in the scope of the optimization. It can be used to model and describe a wide set of entities like:
An N-layers application
A single micro-service
A telemetry provider is a software object that represents a data source of metrics. A telemetry instance is a specific instance of a telemetry provider that refers to a specific data source.
Examples of telemetry providers are:
monitoring tools (e.g. Prometheus or Dynatrace)
load testing tools (e.g. LoadRunner or Neoload)
Telemetry Providers are defined using a YAML manifest with the following structure:
with the following properties:
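For instance, a provider definition might look like the following sketch (the name and Docker image are illustrative assumptions, not an actual Akamas-shipped provider):

```yaml
name: Prometheus
description: A telemetry provider that collects metrics from Prometheus
dockerImage: akamas/telemetry-prometheus:latest
```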
The workloadsSelection is a structure used to define the metrics that are used by Akamas to model workloads as part of a live optimization study.
with the following fields:
A component type is a blueprint for a component that describes the type of entity the component refers to. In Akamas, a component needs to be associated with a component type, from which the component inherits its metrics and parameters.
Component types are platform entities (i.e. shared among all users), usually provided off the shelf and shipped within optimization packs. Typically, different component types within the same optimization pack are used to model different versions/releases of the same technology.
Akamas users with appropriate privileges can create custom component types and optimization packs, as described on the dedicated page.
A workspace is a virtual environment that groups systems, workflows, and studies to restrict user access to them: a user can access these resources only when granted the required permissions to that workspace.
Akamas defines two user roles according to the assigned permission on the workspace:
Contributors (write permission) can create and manage workspace resources (studies, telemetry instances, systems, and workflows), can do exports/imports, view all global resources (optimization packs and telemetry providers), and see the remaining credits.
All operators accept some common, optional, arguments that allow you to control how the operator is executed within your workflow.
The following table reports all the arguments that can be used with any operator.
A telemetry instance is an instance of a that collects data from a specific instance of the data source.
A telemetry instance is an instance of a telemetry provider, providing the required information on how to connect and collect a given set of metrics from a specific data source.
While telemetry providers are platform-wide entities, telemetry instances are defined at each system level.
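As a sketch, a telemetry instance manifest typically pairs a provider reference with connection details; the field names and values below are assumptions for illustration only:

```yaml
provider: Prometheus
config:
  address: prometheus.mycompany.com
  port: 9090
```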
retries (integer, optional, default: 1) - How many times a task can be re-executed in case of failure. If a task reaches the maximum number of retries and still fails, the entire workflow execution is aborted and the trial is considered failed.
retry_delay (string supporting seconds, minutes, and hours, or integer in seconds; optional, default: 5m) - How much time to wait before retrying a failed task.
timeout (string supporting seconds, minutes, and hours, or integer in seconds; optional, default: infinite) - The maximum time a task can run before being considered failed. If the timeout is exceeded, the task is considered failed.
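For example, these common arguments could be set on a single task as follows (the operator and task names are illustrative, and the exact placement of the arguments may vary by workflow schema):

```yaml
- name: apply-configuration
  operator: Executor
  retries: 3
  retry_delay: 30s
  timeout: 1h
```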
Components: elements of the system
Component types: types associated with a system component
Optimization packs: objects encapsulating knowledge about component types
Metrics: measured properties, collected via telemetry providers
Parameters: tunable parameters, set via native or other interfaces
Telemetry providers: general definitions of providers of collected metrics
Telemetry instances: specific instances of telemetry providers
Workflows: automation workflows to set parameters, collect metrics, and run load tests
Goals & constraints: goal and constraints defined for an optimization study
Studies: optimization studies for a target system
Offline optimization studies: optimization studies for a non-live system
Live optimization studies: optimization studies for a live system
Workspaces: virtual environments to organize and isolate resources
While Akamas leverages similar AI methods for both live optimizations and optimization studies, the way these methods are applied is radically different. Indeed, for optimization studies running in pre-production environments, the approach is to explore the configuration space by also accepting potential failed experiments, to identify regions that do not correspond to viable configurations. Of course, this approach cannot be accepted for live optimization running in production environments. For this purpose, Akamas live optimization uses observations of configuration changes combined with the automatic detection of workload contexts and provides several customizable safety policies when recommending configurations to be approved, revisited, and applied.
Akamas provides a few customizable optimizer options (refer to the options described on the Optimize step page of the reference guide) that should be configured so as to make configurations recommended in live optimization and applied to production environments as safe as possible.
Akamas provides an optimizer option known as the exploration factor that only allows gradual changes to the parameters. This gradual optimization allows Akamas to observe how these changes impact the system behavior before applying the following gradual changes.
By properly configuring the optimizer, Akamas can gradually explore regions of the configuration space and slowly approach any potentially risky regions, thus avoiding recommending any configurations that may negatively impact the system. Gradual optimization takes into account the maximum recommended change for each parameter. This is defined as a percentage (default is 5%) with respect to the baseline value. For example, in the case of a container whose CPU limit is 1000 millicores, the corresponding maximum allowed change is 50 millicores. It is important to notice that this does not represent an absolute cap, as Akamas also takes into account any good configurations observed. For example, in the event of a traffic peak, Akamas would recommend a good configuration that was observed working fine for a similar workload in the past, even if the change is higher than 5% of the current configuration value.
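The maximum recommended change described above can be expressed as simple arithmetic. The following snippet is only an illustration of the computation (the function name is invented, not Akamas code):

```python
def max_recommended_change(baseline_value, exploration_factor=0.05):
    # Maximum gradual change for a parameter, as a fraction of its baseline value.
    # The default of 5% mirrors the documented default maximum recommended change.
    return baseline_value * exploration_factor

# For a container with a CPU limit of 1000 millicores, the maximum
# recommended change per step is 50 millicores.
print(max_recommended_change(1000))
```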
Notice that this feature does not apply to categorical parameters (e.g. JVM GC type), as their values do not change incrementally. For these parameters, Akamas by default takes the conservative approach of only recommending configurations whose categorical parameters take values that have already been observed. Never-observed values can still be recommended over time, as users operating in human-in-the-loop mode are allowed to modify the values of categorical parameters. Once Akamas has observed that such a configuration works fine, the corresponding value can then be recommended. For example, a user might modify the recommended configuration for GC type from Serial to Parallel. Once Parallel has been observed working fine, Akamas would consider it for future recommendations of GC type, while other values (e.g. G1) would not be considered until verified as safe.
The exploration factor can be customized for each live optimization individually and changed while live optimizations are running.
Akamas provides an optimizer option known as the safety factor, designed to prevent Akamas from selecting configurations (even if slowly approaching them) that may impact the ability to match defined SLOs. For example, when optimizing container CPU limits, lower and lower CPU limits might be recommended, to the point where the limit becomes so low that the application performance degrades.
Akamas takes into account the magnitude of constraint breaches: a severe breach is considered more negative than a minor breach. For example, in the case of an SLO of 200 ms on response time, a configuration causing a 1 sec response time is assigned a very different penalty than a configuration causing a 210 ms response time. Moreover, Akamas leverages the smart constraint evaluation feature that takes into account if a configuration is causing constraints to approach their corresponding thresholds. For example, in the case of an SLO of 200 ms on response time, a configuration changing response time from 170 ms to 190 ms is considered more problematic than one causing a change from 100 ms to 120 ms. The first one is considered by Akamas as corresponding to a gray area that should not be explored.
The safety factor is also used when starting the study to validate the behavior of the baseline and assess whether it is safe to explore configurations close to it. If the baseline presents some constraint violations, then even exploring configurations close to the baseline might pose a risk. If Akamas identifies that, in the baseline configuration, more than (safety_factor * number_of_trials) trials manifest constraint violations, the optimization is stopped.
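The stopping rule can be illustrated as follows (an invented helper for illustration only, not the actual Akamas implementation):

```python
def baseline_is_unsafe(violating_trials, number_of_trials, safety_factor=0.5):
    # The optimization is stopped when more than safety_factor * number_of_trials
    # baseline trials manifest constraint violations.
    return violating_trials > safety_factor * number_of_trials

# With the default safety factor of 0.5 and 4 baseline trials, the study
# stops only if more than 2 trials violate the constraints.
print(baseline_is_unsafe(2, 4))
print(baseline_is_unsafe(3, 4))
```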
If your baseline has some trials failing constraint validation, we suggest analyzing them before proceeding with the optimization.
The safety factor is set by default to 0.5 and can be customized for each live optimization individually and changed while live optimizations are running.
It is also worth mentioning that Akamas also features an outlier detection capability to compensate for production environments typically being noisy and much less stable than staging environments, thus displaying highly fluctuating performance metrics. As a consequence, constraints may fail from time to time, even for perfectly good configurations. This may be due to a variety of causes, such as shared infrastructure on the cloud, slowness of external systems, etc.
```yaml
metricsSelection:
  - application.response_time
  - application.error_rate
  - jvm1.gc_time
```

name (string, required) - The name of the telemetry provider. This name is used to reference the telemetry provider in telemetry instances and is unique within an Akamas installation.
description (string, required) - A description of the telemetry provider.
dockerImage (string, required) - The Docker image of the telemetry provider.
Refer to the page Integrating Telemetry Providers which describes the out-of-the-box Telemetry Providers that are created automatically at Akamas install time.
name (string, required) - The metric of the component that represents the workload. It should match the following syntax:
component_name.metric_name<:aggregation>
where component_name is an existing component, metric_name is an existing metric associated with the component type of the component component_name, and aggregation is an optional aggregation (default: avg).
Notice that workload metrics must be defined in the metricsSelection. The name field can include an aggregation; the available aggregations are: avg, min, max, sum, p90, p95, p99.
The following refers to a workload represented by the metric transactions_throughput of the konakart component with multiple aggregations:
```yaml
name: "<string>"
description: "<string>"
dockerImage: "<string>"
```

```yaml
workloadsSelection:
  - name: component1.metric1
  - name: component2.metric2:p95
```

```yaml
workloadsSelection:
  - name: konakart.transactions_throughput
  - name: konakart.transactions_throughput:p95
```

the amount of time spent in garbage collection
the cost of a cloud service
Metrics are used to both specify the optimization goal and constraints (e.g. minimize the heap size while keeping response time < 1000 and error rate <= 10% of a baseline value), and to assess the behavior of the system with respect to each specific configuration applied.
A metric is described by the following properties:
a name that uniquely identifies the metric
a description that clarifies the semantics of the metric
a unit that defines the unit of measurement used by the metric
The construct to be used to define a metric is described on the Metric template page.
Metrics are displayed in the Akamas UI when drilling down to each system component, and are represented in metric charts for each specific optimization study.
Please notice that in order for a metric to be displayed in the Akamas UI, it has to be collected from a Telemetry Provider by means of a specific Telemetry Instance defined for each specific target system.
A single (or a collection of) batch job(s)
A System is made of one or more components. Each component represents one of the elements in the system, whose parameters are involved in the optimization or whose metrics are collected to evaluate the results of such optimization.
A system is described by the following properties:
The full micro-services stack of an application
a name that uniquely identifies the system
a description that clarifies what the system refers to
The construct to be used to define a system is described on the System template page.
A system is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI displays systems (depending on the user privileges on the defined workspaces) in a specific top-level menu.
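As a sketch, managing a system via the CLI follows the usual resource-management pattern (the exact subcommand names may differ across Akamas versions; the file and system names are illustrative):

```shell
akamas create system system.yaml
akamas list systems
akamas delete system mysystem
```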
CSV files
A telemetry provider is a platform-wide entity that can be reused across systems to ease the integration with metrics sources.
Akamas provides a number of out-of-the-box telemetry providers. Custom telemetry providers can also be created.
The construct to be used to define a telemetry provider is described on the Telemetry Provider template page.
A telemetry provider is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI shows telemetry providers in a specific top-level menu.
A component type is described by the following mandatory properties (other properties can be defined but are not mandatory):
a name that uniquely identifies the component type within the system
a description that clarifies what the component type refers to
a parameter definitions array (more on Parameters later)
a metrics array (more on Metrics later)
The construct to be used to define a component type is described on the Component type template page.
A component type is an Akamas resource that can be managed via CLI using the resource management commands.
When visualizing system components the component type is displayed.
The following figure shows the out-of-the-box JVM component types related to the JVM optimization pack.
an optimization objective: either maximize or minimize
a scoring function (scalar): either a single metric or a formula defined by one or more metrics
One or more constraints can be associated with a goal: a formula defined on one or more metrics, referring to either absolute values (absolute constraints) or values relative to a baseline (relative constraints).
Notice that relative constraints are only supported by offline optimization studies, while absolute constraints are supported by both offline and live optimization studies.
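A goal with absolute constraints might be sketched as follows (the field layout is an assumption based on typical study manifests, and the metric names are illustrative):

```yaml
goal:
  objective: minimize
  function:
    formula: jvm1.heap_used
  constraints:
    absolute:
      - application.response_time <= 1000
      - application.error_rate <= 0.05
```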
Goals and constraints are not an Akamas resource as they are defined as part of an optimization study. The construct to be used to define a goal and its constraints are described in the Goal & Constraint page of the Study template section.
Goals and constraints are displayed in the Akamas UI when drilling down each optimization study.
The detail of the formula used to define the goal may also be displayed:
Workspaces and accesses are managed by users with administrative privileges. A user with administrator privileges can manage licenses, users, and workspaces, and install/uninstall optimization packs and telemetry providers.
Workspaces can be defined according to different criteria, such as:
By department (e.g. Performance, Development)
By initiative (e.g. PoC, Training)
By application (e.g. Registry, Banking)
A workspace is described by the following property:
a name that uniquely identifies the workspace
A workspace is an Akamas resource that can be managed via CLI using the resource management commands. See also the page on the commands for defining users and workspaces.
The workspace a study belongs to is always displayed. Filters can be used to select only studies belonging to specific workspaces.
The construct to be used to define a telemetry instance is described on the Telemetry Instance template page.
A telemetry instance is an Akamas resource that can be managed via CLI using the resource management commands.
Telemetry instances are displayed in the Akamas UI when drilling down each system component.
MS .NET 3.1
Here’s the command to install the DotNet optimization pack using the Akamas CLI:
```shell
$ akamas install optimization-pack DotNet
```

A component represents an element of a system. Typically, systems are made up of different entities and layers, which can be modeled as components. In other words, a system can be considered a collection of related components.
Notice that a component is a black-box definition of each entity involved in an optimization study, so detailed modeling of the entities being involved in the optimization is not required. The only relevant elements are the parameters that are involved in the optimization and the metrics that are collected to evaluate the results of such an optimization.
Notice that only the entities that are directly involved in the optimization need to be modeled and defined within Akamas. An entity is involved in an optimization study if it is optimized or monitored by Akamas, where "optimized" means that Akamas is optimizing at least one of its parameters, and "monitored" means that Akamas is monitoring at least one of its metrics.
A component is described by the following mandatory properties (other properties can be defined but are not mandatory):
a name that uniquely identifies the component within the system
a description that clarifies what the component refers to
a component type that identifies the technology of the component
In general, a component contains a set of each of the following:
parameter(s) in the scope of the optimization
metric(s) needed to define the optimization goal
metric(s) needed to define the optimization constraints
The construct to be used to define a component is described on the Component template page.
A component is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI shows more details about components by drilling down their respective system.
A workflow is a set of tasks that run in sequence to evaluate a configuration as part of an optimization study. A task is a single action performed within a workflow.
Workflows allow you to automate Akamas optimization studies, by automatically executing a sequence of tasks such as initializing an environment, triggering load testing, restoring a database, applying configurations, and much more.
These are examples of common tasks:
Launch remote commands via SSH
Apply parameter values in configuration files
Execute Spark jobs via spark-submit API
Start performance tests by integrating with external tools such as Neoload
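A minimal workflow running two tasks in sequence might look like the following sketch (the operator name and commands are illustrative assumptions):

```yaml
name: optimize-myapp
tasks:
  - name: apply-configuration
    operator: Executor
    arguments:
      command: bash /opt/scripts/apply_config.sh
  - name: run-load-test
    operator: Executor
    arguments:
      command: bash /opt/scripts/run_test.sh
```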
Workflows are first-class entities that can be defined globally and then used in multiple optimization studies.
Akamas provides several workflow operators that can be used to perform tasks in a workflow. Some operators are general-purpose, such as those executing a command or script on a specific host, while others provide native integrations with specific technologies and tools, such as Spark History Server or load testing tools.
The construct to be used to define a workflow is described on the Workflow template page.
A workflow is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI shows workflows in a specific top-level menu.
The list of tasks is displayed when drilling down to each specific workflow.
A windowing policy of type trim trims the temporal interval of a trial, both from the start and from the end of a specified temporal amount (e.g., 3 seconds).
The trim windowing has the following structure:
In case a windowing policy is not specified, the default windowing corresponding to trim[0s,0s] is considered.
The following fragment shows a windowing strategy of type "trim" where the time window is specified to start 10s after the beginning of the trial and to end immediately before the end of the trial:
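Based on the fields of the trim policy, such a fragment might be written as:

```yaml
windowing:
  type: trim
  trim: ["10s", "0s"]
```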
An optimization study (or study for short) represents an optimization initiative aimed at optimizing a goal on a target system. A study instructs Akamas about the space to explore and the KPIs used to evaluate whether a configuration is good or bad.
Akamas supports two types of optimizations:
Offline Optimization Studies are optimization studies where the workload is simulated by leveraging a load-testing tool.
Live Optimization Studies are applied to systems that need to be optimized in production with respect to varying workloads observed while running live. For example, a microservices application can be optimized live by having Kubernetes and JVM parameters dynamically tuned for multiple microservices so as to minimize costs while matching response time objectives.
A study is described by the following properties
system: the system under optimization
parameters: the set of parameters being optimized
metrics: the set of metrics to be collected
The construct to be used to define an optimization study is described on the Study template page.
An optimization study is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI shows optimization studies in 2 specific top-level menus: one for offline optimization studies and another for live optimization studies.
The parametersSelection field in a study specifies which parameters of the system should be tuned while running the optimization study.
In case this selection is not specified, all parameters are considered.
A parameter selection can either assume the value all, to indicate that all the available parameters of the system should be tuned, or a list with items shaped like the one below:
Notice that, by default, every parameter specified in the parameters selection of a study is applied. This can be modified by leveraging the renderParameters and doNotRenderParameters options.
The following fragment is an example:
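As a sketch, using the fields described in the parameter selection reference (the component and parameter names are illustrative):

```yaml
parametersSelection:
  - name: jvm1.max_heap_size
    domain: [512, 4096]
  - name: jvm1.gc_type
    categories: ["Serial", "Parallel"]
```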
A KPI is a metric that is worth considering when analyzing the result of an offline optimization study, looking for (sub)optimal configurations generated by Akamas AI to be applied.
Akamas automatically considers any metric referred to in the defined optimization goal and constraints for an offline optimization study as a KPI. Moreover, any other metrics of the system component can be specified as a KPI for an offline optimization study.
A KPI is defined as follows (from the UI or the CLI):
KPIs are not an Akamas resource, as they are always defined as part of an optimization study. The construct to define KPIs is described in the Study template section.
The number of KPIs, and the first few of them, are displayed in the Akamas UI in the header of each offline optimization study.
The full list of KPIs is displayed by drilling down to the KPIs section.
From this section, it is possible to modify the list of KPIs and change their names and other attributes.
An optimization pack is a software object that provides a convenient facility for encapsulating all the knowledge (e.g. metrics, parameters with their default values and domain ranges) required to apply Akamas optimizations to a set of entities associated with the same technology.
Notice that while optimization packs are very convenient for modeling systems and creating studies, entities are not required to be covered by an optimization pack.
Akamas provides a library of out-of-the-box optimization packs, and new custom optimization packs can be easily added (no coding is required).
An optimization pack includes the entities that encapsulate technology-specific information related to the supported component types:
supported component types
parameters and metrics for each component type
supported telemetry providers (optional)
An optimization pack is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI shows optimization packs in a specific top-level menu.
An optimization pack encapsulates one or more of the following technology-specific elements:
Component Types: these represent the type of the component(s) included, each with its associated parameters and metrics
Telemetry Providers: that define where to collect metrics
An optimization pack enables Akamas users to optimize a technology without necessarily being an expert in that technology and to code their knowledge about a technology or a specific application to be reused in multiple optimization studies to ease the modeling process.
The Steps field in a study specifies the sequence of steps executed while running the study; steps run in exactly the order in which they are defined.
The following types of steps are available:
Baseline: performs an experiment and sets it as the baseline for all the other ones
Bootstrap: imports experiments from other studies
Preset: performs an experiment with a specific configuration
Optimize: performs experiments and generates optimized configurations
Notice that the structure corresponding to the steps is different for the different types of steps.
The Web Application optimization pack provides a component type suited for monitoring the performance of a generic web application from the end-user perspective, in order to evaluate the configuration of the technologies in the underlying stack.
The bundled component type provides Akamas with performance metrics representing concepts like throughput, response time, error rate, and user load, split into different levels of detail such as transaction, page, and single request.
Here’s the command to install the Web Application optimization pack using the Akamas CLI:
The Node JS optimization pack enables optimizing applications based on Node.js running on the V8 engine.
The following component types are supported for NodeJS applications.
Node JS 18 runtime
Here’s the command to install the Node JS optimization pack using the Akamas CLI:
For more information on the process of installing or upgrading an optimization pack, refer to the dedicated page.
The renderParameters and doNotRenderParameters options can be used to specify which configuration parameters should be rendered when running experiments within a step.
Parameter rendering can be defined at the step level for baseline, preset, and optimize steps. This is not possible for bootstrap steps as bootstrapped experiments are not executed.
The Java-OpenJDK optimization pack enables optimizing Java applications based on the OpenJDK and Oracle HotSpot JVM. Through this optimization pack, Akamas can tackle the performance of JVM-based applications from the perspective of both cost savings and quality of service.
To achieve these goals the optimization pack provides parameters that focus on the following areas:
Garbage collection
The OpenJ9 optimization pack enables optimizing Java applications based on the Eclipse OpenJ9 VM, formerly known as IBM J9. Through this optimization pack, Akamas can tackle the performance of JVM-based applications from the perspective of both cost savings and quality of service.
To achieve these goals the optimization pack provides parameters that focus on the following areas:
Garbage collection
The Kubernetes optimization pack allows optimizing containerized applications running on a Kubernetes cluster. Through this optimization pack, Akamas is able to tackle the problem of distributing resources to containerized applications in order to minimize waste and ensure the quality of service.
To achieve these goals the optimization pack provides parameters that focus on the following areas:
Memory allocation
This section documents the mapping between the metrics provided by Telemetry Providers and the Akamas metrics for each supported component type.
Systems are defined using a YAML manifest with the following structure:
with the following properties:
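For example, a minimal system manifest might look like this (the names are illustrative):

```yaml
name: online-boutique
description: The e-commerce application targeted by the optimization
```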
```yaml
name: branin
description: The branin analytical function
componentType: function_branin
properties:
  hostname: function-server
```

workflow: the workflow describing the tasks to perform experiments/trials
goal: the desired optimization goal to be achieved
constraints: the optimization constraints that any configuration needs to satisfy
steps: the steps that are executed to run specific configurations (e.g. the baseline) and run the optimization
metric(s) that are not needed to define the optimization goal or constraints, and hence not used by Akamas to perform the optimization, but are collected to support the analysis (and can possibly be added later to the optimization goal or constraints when refining the optimization).
Name - The name of the KPI, used in UI labels.
Formula - Must be defined as <Component_name>.<metric_name>.
Direction - Must be either minimize or maximize.
Aggregation - A valid metric aggregation such as min, max, avg, sum, p95, etc. If unspecified, the default is avg.
k8s_namespace_memory_request (bytes) - The memory requested for the namespace
k8s_namespace_running_pods (pods) - The number of running pods in the namespace
k8s_namespace_cpu_limit (millicores) - The CPU limit for the namespace
k8s_namespace_cpu_request (millicores) - The CPUs requested for the namespace
k8s_namespace_memory_limit (bytes) - The memory limit for the namespace
How to trim the temporal interval of a trial to get the window. ["0s", "10m"] means trim 0 seconds from the start of the interval, 10 minutes from the end. ["0s", "1h"] means trim 0 seconds from the start, 1 hour from the end
task
string
The name of a task of the workflow associated with the study
FALSE
If the field is specified, the trim offset calculation for the window will be applied from the start time of the assigned task. Otherwise, it will be calculated from the start time of the trial.
type
string
{trim}
TRUE
the type of windowing strategy
trim
array of strings
The length of the array should be two.
Valid values should have the form of a whole number followed by either "s", "m", or "h"
TRUE
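Putting the fields above together, a trim windowing policy can be sketched as follows (the offsets and task name are illustrative values, not defaults):

```yaml
windowing:
  type: trim                # type of windowing strategy
  trim: ["30s", "1m"]       # trim 30 seconds from the start and 1 minute from the end of each trial
  task: "Launch Benchmark"  # optional: compute trim offsets from the start of this workflow task
```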
name
string
should match the following regexp:
^[a-zA-Z][a-zA-Z0-9_]*$
that is, only letters, numbers, and underscores, with no initial number or underscore
Notice: this should not match the name of another component
TRUE
The name of the component.
description
string
TRUE
A description to characterize the component.
componentType
string
notice: this should match the name of an existing component-type
TRUE
The name of the component-type that defines the type of the component.
properties
object
FALSE
General custom properties of the component. These properties can be defined freely and usually have the purpose to expose information useful for configuring the component.
A custom domain for the parameter to be used only for the study
categories
array of string
should be set only if the parameter has a domain of type categorical, and be compatible with the domain defined in the component-type the component_name refers to.
FALSE
A custom set of categories for the parameter to be used only for the study.
name
string
should match the following syntax:
component_name.parameter_name
where component_name is an existing component and parameter_name is an existing parameter associated with the component-type of the component component_name
TRUE
The name of the parameter to be tuned including the name of the component it refers to
domain
array of numbers
should be of size 2, contain either all integers or all real numbers (do not omit the "."), be set only if the parameter has a domain of type integer or real, and be compatible with the domain defined in the component-type the component_name refers to
FALSE
seconds
Number (integer)
seconds > 0
Yes
The number of seconds for which to pause the workflow
Web Application
akamas install optimization-pack Web-Application
akamas install optimization-pack NodeJS
IBM WebSphere Application Server 8.5
IBM WebSphere Liberty ND
akamas install optimization-pack WebSphere
Docker container
akamas install optimization-pack Docker
renderParameters
Array of strings
should contain strings in the form component.parameter or component.*. The latter means every parameter of the component
No
Which configuration parameters should be rendered or applied when doing experiments/trials, in addition to the ones in the parameters selection, or in the values field if the step is of type baseline or preset
doNotRenderParameters
Array of strings
should contain strings in the form component.parameter or component.*. The latter means every parameter of the component
No
Which configuration parameters should not be rendered or applied when doing experiments/trials
The following baseline step specifies that every parameter of the component 'os' should not be rendered while the parameter 'cpu_limit' of the component 'docker' should be rendered:
The following preset step specifies that the parameter 'cpu_limit' of the component 'docker' should be rendered:
The following optimize step specifies that every parameter of the component 'os' should not be rendered:
string
TRUE
The name of the system
description
string
TRUE
A description to characterize the system
The following represents a system (for Cassandra related system)
name
windowing: # the temporal window in which to compute the score of a trial
type: "trim" # type of windowing is trim
trim: [10s, 0s]
name: JVM_1
description: The first jvm of the system
componentType: java-openjdk-11
properties:
hostname: ycsb1.dev.akamas.io
username: ubuntu
name: os_1
description: The operating system of team 1
componentType: Ubuntu-20.04
properties:
hostname: ycsb1.dev.akamas.io
username: ubuntu
parametersSelection:
- name: jvm.jvm_gcType
- name: jvm.jvm_maxG1NewSizePercent
domain: [10, 90]
- name: webserver.aws_ec2_instance_size
categories: ["large", "x.large", "2x.large"]
name: Pause
operator: Sleep
arguments:
seconds: 30
name: "mybaseline"
type: "baseline"
values: # every parameter in 'values' is rendered by default
jvm.jvm_compilation_threads: 10
jvm.jvm_gcType: -XX:+UseG1GC
doNotRenderParameters: ["os.*"] # every parameter of the component 'os' will not be rendered
renderParameters: ["docker.cpu_limit"] # the parameter 'cpu_limit' of the component 'docker' will be rendered
name: "mypreset"
type: "preset"
values:
jvm.jvm_compilation_threads: 10
jvm.jvm_gcType: -XX:+UseG1GC
renderParameters: ["docker.cpu_limit"] # the parameter 'cpu_limit' of the component 'docker' will be rendered
name: "myoptimize"
type: "optimize"
numberOfExperiments: 200
doNotRenderParameters: ["os.*"] # every parameter of the component 'os' will not be rendered
# General section
name: Analytical functions
description: A collection of analytical functions
name: system1
description: my system with 3 nodes of cassandra
JIT
Similarly, the bundled metrics provide visibility on the following aspects of tuned applications:
Heap and memory utilization
Garbage collection
Execution threads
The optimization pack supports the most used versions of OpenJDK and Oracle HotSpot JVM.
Java OpenJDK 8 JVM
Java OpenJDK 11 JVM
Java OpenJDK 17 JVM
Here’s the command to install the Java OpenJDK optimization pack using the Akamas CLI:
For more information on the process of installing or upgrading an optimization pack refer to Install Optimization Packs.
JIT
Similarly, the bundled metrics provide visibility on the following aspects of tuned applications:
Heap and memory utilization
Garbage Collection
Execution threads
The optimization pack supports the most used versions of JVM.
Eclipse OpenJ9 (formerly known as IBM J9) Virtual Machine version 6
Eclipse OpenJ9 (formerly known as IBM J9) Virtual Machine version 8
Eclipse OpenJ9 (formerly known as IBM J9) version 11
Here’s the command to install the Eclipse OpenJ9 optimization pack using the Akamas CLI:
For more information on the process of installing or upgrading an optimization pack refer to Install Optimization Packs.
k8s_cluster_cpu
millicores
The CPUs in the cluster
k8s_cluster_cpu_available
millicores
The CPUs available for additional pods in the cluster
k8s_cluster_cpu_util
percent
The percentage of used CPUs in the cluster
There are no parameters for the Kubernetes Cluster component type.
k8s_pod_cpu_used
millicores
The CPUs used by all containers of the pod
k8s_pod_memory_used
bytes
The total amount of memory used as sum of all containers in a pod
k8s_pod_cpu_request
millicores
The CPUs requested for the pod as sum of all container cpu requests
There are no parameters for the Kubernetes Pod component type.
Number of replicas
Similarly, the bundled metrics provide visibility on the following aspects of tuned applications:
Memory utilization
CPU utilization
The component types provided in this optimization pack allow modeling the entities found in a Kubernetes-based application, optimizing their parameters, and monitoring the key performance metrics.
Kubernetes Container
Kubernetes Pod
Kubernetes Workload
Here’s the command to install the Kubernetes optimization pack using the Akamas CLI:
no predefined mapping as CSV provider is extensible
The Golang runtime 1
Here’s the command to install the Go optimization pack using the Akamas CLI:
akamas install optimization-pack Go
Telemetry instances are defined using a YAML manifest with the following structure:
with the following properties for the global section
provider
and the metrics section
The kpis field in a study specifies which metrics should be considered as KPI for an offline optimization study.
In case this selection is not specified, all metrics mentioned in the goal and constraint of the optimization study are considered.
A KPI is defined as follows:
Notice that the textual badge displayed in the Akamas UI uses "Best name".
The following fragment is an example of a list of KPIs:
This section describes all the structures that can be used to define resources and objects in Akamas.
A baseline step performs an experiment (a baseline experiment) and marks it as the initial experiment of a study. The purpose of the step is to build a reference configuration that Akamas can use to measure the effectiveness of an optimization conducted towards a system.
When a bootstrap step imports an experiment from another study, the step copies not only the experiment but also its trials and the system metrics generated during its execution.
The bootstrap step has the following structure:
where the from field should have the following structure:
with:
study contains the name or ID of the study from which to import experiments
experiments contains the numbers of the experiments from which to import
The following is an example of a bootstrap step that imports four experiments from two studies:
You can also import all the experiments of a study by omitting the experiments field:
A parameter is a property of the system that can be applied and tuned to change the system's behavior. Akamas optimizes systems by changing parameters to achieve the stated goal while respecting the defined constraints.
Examples of a parameter include:
Configuration knobs (e.g. JVM garbage collection type)
Parameters are defined using a YAML manifest with the following structure:
with the following properties:
A baseline step performs an experiment (a baseline experiment) and marks it as the initial experiment of a study. The purpose of the step is to build a reference configuration that Akamas can use to measure the effectiveness of an optimization conducted towards a system.
A baseline step offers three options when it comes to selecting the configuration of the baseline experiment:
Use a configuration made with the default values of the parameters taken from the system of the study
An optimize step generates optimized configurations according to the defined optimization strategy. During this step, Akamas AI is used to generate such optimized configurations.
The optimize step has the following structure:
Workflows are defined using a YAML manifest with the following structure:
with the following properties:
The Linux optimization pack helps you optimize Linux-based systems. The optimization pack provides component types for various Linux distributions, thus enabling performance improvements on a plethora of different configurations.
Through this optimization pack, Akamas is able to tackle the problem of performance of Linux-based systems both from the point of view of cost savings and of quality and level of service: the included component types bring in parameters that act on the memory footprint of systems, on their ability to sustain higher levels of traffic, on their capacity to leverage all the available resources, and on their potential for lower latency transactions.
Each component type provides parameters that cover four main areas of tuning:
CPU task scheduling (for example, whether to auto-group and schedule similar tasks together)
akamas install optimization-pack Java-OpenJDK
akamas install optimization-pack OpenJ9
akamas install optimization-pack Kubernetes
provider: Provider Name
name: My Telemetry
config:
providerSpecificConfig1: "<value>"
providerSpecificConfig2: 123
metrics:
- name: metric_name
datasourceName: datasource_metric_name
defaultValue: 1.23
labels:
- label1
- label2
staticLabels:
staticLabel1: staticValue1
staticLabel2: staticValue2
k8s_cluster_cpu_request
millicores
The total CPUs requested in the cluster
k8s_cluster_memory
bytes
The overall memory in the cluster
k8s_cluster_memory_available
bytes
The amount of memory available for additional pods in the cluster
k8s_cluster_memory_util
percent
The percentage of used memory in the cluster
k8s_cluster_memory_request
bytes
The total memory requested in the cluster
k8s_cluster_nodes
nodes
The number of nodes in the cluster
k8s_pod_cpu_limit
millicores
The CPUs allowed for the pod as sum of all container cpu limits
k8s_pod_memory_request
bytes
The memory requested for the pod as sum of all container memory requests
k8s_pod_memory_limit
bytes
The memory limit for the pod as sum of all container memory limits
k8s_pod_restarts
events
The number of container restarts in a pod
Kubernetes Namespace
Kubernetes Cluster
Configures Linux kernel parameters using different strategies
Executes a command on a target Windows machine using WinRM
Interpolates configuration parameters into files on remote Windows machines
Pauses the execution of the workflow for a certain time
Executes custom queries on Oracle database instances
Configures Oracle database instances
Executes a Spark application using spark-submit on a machine using SSH
Executes a Spark application using spark-submit locally
Executes a Spark application using the Livy web service
Triggers the execution of performance tests using NeoLoad Web
Runs a performance test with LoadRunner Professional
Runs a performance test with LoadRunner Enterprise
millicores
The total amount of CPUs used by the entire workload
k8s_workload_memory_used
bytes
The total amount of memory used by the entire workload
k8s_workload_cpu_request
millicores
The total amount of CPUs requests for the workload
k8s_workload_cpu_limit
millicores
The total amount of CPUs limits for the entire workload
k8s_workload_memory_request
bytes
The total amount of memory requests for the workload
k8s_workload_memory_limit
bytes
The total amount of memory limits for the entire workload
0 → 1024
yes
Number of desired pods in the deployment
k8s_workload_desired_pods
pods
Number of desired pods per workload
k8s_workload_running_pods
pods
The number of running pods per workload
k8s_workload_ready_pods
pods
The number of ready pods per workload
k8s_workload_replicas
integer
pods
k8s_workload_cpu_used
1
The metric name associated with a component
direction
string
minimize, maximize
yes
The direction corresponding to the metric
aggregation
string
avg, min, max, sum, p90, p95, p99
no
avg
A valid metric aggregation
name
string
should match a component metric
no
<metric_name>
Label that will be used in the UI
formula
string
Must be defined as <component_name>.<metric_name>
yes
runOnFailure
boolean
true
false
no
false
The execution policy of the step:
false prevents the step from running in case the previous step failed
true allows the step to run even if the previous step failed
from
array of objects
Each object should have the structure described below
yes
The experiments to import in the current study
In case this is not set, this step imports every experiment of a study
type
string
bootstrap
yes
The type of the step, in this case, bootstrap
name
string
yes
The name of the step
string
-
yes
-
The name of the task.
operator
string
-
yes
-
The operator the task implements: the chosen operator affects available arguments.
critical
boolean
-
no
true
When set to true, task failure will determine workflow failure.
alwaysRun
boolean
-
no
false
When set to true, task will be executed regardless of workflow failure.
collectMetricsOnFailure
boolean
-
no
false
When set to true, failure of the task will not prevent metrics collection.
arguments
list
Determined by operator choice
yes
-
Arguments list required by operators to run.
The full list of Operators and related options is provided on the Workflow Operators pages.
A workflow for the java-based renaissance benchmark application
name
kpis:
- name: "Response time"
formula: renaissance.response_time
direction: minimize
- name: "CPU used"
formula: renaissance.cpu_used
direction: minimize
- name: "Memory used"
formula: renaissance.mem_used
direction: minimizestudy: "study_bootstrap_1"
experiments: [1,2,3]name: "my_bootstrap" # name of the step
type: "bootstrap" # type of the step (bootstrap)
from:
- study: "study_bootstap_1" # name or ID of the study from which to import
experiments: [1, 2, 4] # the numbers of the experiments to import
- study: "study_bootstrap_2"
experiments: [1]name: "my_bootstrap" # name of the step
type: "bootstrap" # type of the step (bootstrap)
from:
- study: "study_bootstrap_1" # name or ID of the study from which to importname: "insert_workflow_name_here"
tasks:
# these are the tasks that will be executed sequentially to complete a trial (configure the system under test with the parameters optimized by Akamas)
- name: "insert_here_name_of_task"
# an operator specifies which type of task should be used
operator: "insert_here_which_operator_to_use_for_the_task"
# each operator accepts different arguments necessary to specify how it should behave
arguments:
...name: renaissance-optimize
tasks:
- name: Configure Benchmark
operator: FileConfigurator
arguments:
source:
hostname: benchmark
username: akamas
password: akamas
path: launch_benchmark.sh.templ
target:
hostname: benchmark
username: akamas
password: akamas
path: launch_benchmark.sh
- name: Launch Benchmark
operator: Executor
arguments:
command: "bash launch_benchmark.sh"
host:
hostname: benchmark
username: akamas
password: akamas
- name: Parse Output
operator: Executor
arguments:
command: "bash parse_output.sh"
host:
hostname: benchmark
username: akamas
password: akamas
Yes
defaultValue
double
Default value that, if specified, is used to create metrics in time-intervals where no other valid datapoint is available.
No
labels
List of strings
List of labels. For the specific usage of this parameter, see the documentation of the specific Telemetry Provider
No
staticLabels
List of key-value pairs
List of key-value pairs that are interpreted as pairs of label name and value. These "static labels" are copied directly into each sample of the specific metric and sent to the Metric Service
No
aggregation
String
see
No
extras
Object
Only the parameter mergeEntities can be defined to either true or false
No
string
The name of the Telemetry Provider
Yes
config
object
Provider-specific configuration in a key-value format (see specific provider documentation for details)
Yes
name
string
Custom telemetry instance name
No
metrics
object
This section is used to specify the metrics to extract. This section is specific for each Telemetry Provider (see specific provider documentation for details)
No
name
string
Name of the metric in Akamas.
This metric must exist in at least one of the component types referred to by the System associated with the Telemetry Provider Instance
Yes
datasourceName
string
Name of the metric (or extraction query) in the data source. The value of this parameter is specific to the data source.
Algorithms settings (e.g. learning rate of a neural network)
Architectural properties (e.g. how many caching layers in an enterprise application)
Type of resources (e.g. AWS EC2 instance or EBS volume type)
Any other thing (e.g. amount of sugar in your cookies)
The following table describes the parameter types:
REAL
real values
Akamas normalizes the values
[0.0, 10.0] → [0.0, 1.0]
INTEGER
integer values
Akamas converts the integer into real and then normalizes the values
[0, 3] → [0.0, 3.0] → [0.0, 1.0]
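In both cases, the normalization maps a value x with domain bounds lo and hi onto the unit interval (illustrative formulation of the mappings shown above):

```latex
x_{\text{norm}} = \frac{x - \text{lo}}{\text{hi} - \text{lo}} \in [0.0, 1.0]
```

For instance, the value 2 in the integer domain [0, 3] maps to 2/3 once converted to real and normalized.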
A parameter is described by the following properties:
a name that uniquely identifies the parameter
a description that clarifies the semantics of the parameter
a unit that defines the unit of measurement used by the parameter
Although users can create parameters with any name, we suggest using the naming convention context_parameter where
context refers to the technology or more general environment in which that metric is defined (e.g. elasticsearch, jvm, mysql, spark)
parameter is the parameter name in the original context (e.g. gcType, numberOfExecutors)
This makes it possible to identify parameters more easily and avoid any potential name clash.
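Following this convention, a couple of parameter definitions could look like the following (a minimal sketch; the names reuse parameters mentioned elsewhere in this guide):

```yaml
parameters:
  - name: jvm_gcType                # 'gcType' in the JVM context
    description: Type of the garbage collection algorithm
  - name: spark_numberOfExecutors   # 'numberOfExecutors' in the Spark context
    description: Number of Spark executors
```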
The construct to be used to define a parameter is described on the Parameter template page.
Parameters are displayed in the Akamas UI when drilling down to each system component.
For each optimization study, the optimization scope is the set of parameters that Akamas can change to achieve the defined optimization goal.
string
It should contain only lower/uppercase letters, numbers or underscores. It should start only with a letter. No spaces are allowed.
TRUE
The name of the parameter
description
string
TRUE
A description characterizing the parameter
unit
string
A supported unit or a custom unit (see )
FALSE
empty unit
The unit of measure of the parameter
restart
boolean
FALSE
FALSE
If the use of the parameters for changing the configuration of a system should cause the system to be restarted.
Notice that parameter definitions are shared across all the workspaces on the same Akamas installation, and require an account with administrative privileges to manage them.
The following represents a set of parameters for a JVM component
The following represents a set of CPU-related parameters for the Linux operating system
name
Use a custom configuration
The baseline step has the following structure:
type
string
baseline
yes
where the from field should have the following structure:
with
study contains the name or ID of the study from which to take the configuration
experiments contains the number of the experiment from which to take the configuration
Default values for the baseline configuration only require setting the name and type fields:
The configuration taken from another study to be used as a baseline only requires setting the from field:
Notice: the from and experiments fields are defined as an array, but can only contain one element.
The custom configuration for the baseline only requires setting the values field:
type
string
optimize
yes
The type of the step, in this case, optimize
name
string
yes
The name of the step
runOnFailure
boolean
true
false
no
false
The execution policy of the step:
false prevents the step from running in case the previous step failed
true allows the step to run even if the previous step failed
numberOfExperiments
integer
numberOfExperiments > 0 and
numberOfExperiments >= numberOfInitExperiments
yes
The number of experiments to execute - see below
numberOfTrials
integer
numberOfTrials > 0
no
1
The number of trials to execute for each experiment
numberOfInitExperiments
integer
numberOfInitExperiments < numberOfExperiments
no
10
The number of initialization experiments to execute - see below.
maxFailedExperiments
integer
maxFailedExperiments > 1
no
30
The number of experiment failures (as either workflow errors or constraint violations) to accept before the step is marked as failed
optimizer
string
AKAMAS
SOBOL
RANDOM
no
AKAMAS
The type of optimizer to use to generate the configuration of the experiments - see below
doNotRenderParameters
string
no
Parameters not to be rendered. - see
renderParameters
string
no
Parameters to be rendered. - see
The optimizer field allows selecting the desired optimizer:
AKAMAS identifies the standard AI optimizer used by Akamas
SOBOL identifies an optimizer that generates configurations using Sobol sequences
RANDOM identifies an optimizer that generates configurations using random numbers
Notice that SOBOL and RANDOM optimizers do not perform initialization experiments, hence the field numberOfInitExperiments is ignored.
Refer to the page Optimizer Options for more configuration options for the optimizer
The optimize step is fault-tolerant and tries to relaunch experiments on failure. Nevertheless, the step limits the number of failed experiments: if too many experiments fail, then the entire step fails too. By default, at most 30 experiments can fail while Akamas is optimizing systems. An experiment is considered failed when it fails to run (i.e., there is an error in the workflow) or violates some constraint.
The optimize step launches some initialization experiments (by default 10) that do not apply the AI optimizer and are used to find good starting configurations.
Initialization experiments take into account bootstrapped experiments, experiments executed in preset steps, and baseline experiments.
The following snippet shows an optimization step that runs 50 experiments using the SOBOL optimizer:
Memory (for example, the limit on memory usage for which start swapping pages on disk)
Network (for example, the size of the buffers used to write/read network packets)
Storage (for example, the type of storage scheduler)
Amazon Linux AMI
Amazon Linux 2 AMI
Amazon Linux 2022 AMI
Here’s the command to install the Linux optimization pack using the Akamas CLI:
For more information on the process of installing or upgrading an optimization pack refer to Install Optimization Packs.
This section documents Akamas out-of-the-box optimization packs.
based on Linux operating system
A preset step performs a single experiment with a specific configuration. The purpose of this step is to help you quickly understand how good is a particular configuration.
A preset step offers two options when selecting the configuration of the experiment to be executed:
Use a configuration taken from an experiment of a study (can be the same study)
Use a custom configuration
The preset step has the following structure:
where the from field should have the following structure:
with
study contains the name or ID of the study from which to take the configuration. If omitted, the configuration is taken from an experiment of the same study as the step
experiments contains the number of the experiment from which to take the configuration
You can provide a custom configuration by setting values:
You can select a configuration taken from another study by setting from:
You can select a configuration taken from the same study by setting from but by omitting the study field:
Notice: the from and experiments fields are defined as a list, but can only contain one element.
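As a sketch (the study name and experiment number are placeholders), a preset step taking its configuration from another study could look like:

```yaml
name: "my_preset"   # name of the step
type: "preset"      # type of the step (preset)
from:
  - study: "study_from_which_to_take_the_configuration"  # omit this field to use the same study
    experiments: [1]  # the number of the experiment from which to take the configuration
```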
A windowing policy of type stability discards temporal intervals in which a given metric is not stable, and selects, among the remaining intervals, the ones in which another target metric is maximized or minimized. Stability windowing can be sample-based or time-frame based.
The stability windowing has the following structure:
parameters:
- name: jvm_heap_size
description: the size of the heap of the jvm
unit: megabytes
restart: false
- name: jvm_survival_ratio
description: the ratio of the two survivor spaces in the JVM GC
parameters:
- name: jvm_maxHeapSize
description: Maximum heap size
unit: megabytes
restart: true
- name: jvm_newRatio
description: Ratio of old/new generation sizes
restart: true
- name: jvm_maxTenuringThreshold
description: Maximum value for tenuring threshold
restart: true
- name: jvm_survivorRatio
description: Ratio of eden/survivor space size
restart: true
- name: jvm_concurrentGCThreads
description: Number of threads concurrent garbage collection will use
unit: threads
restart: true
- name: jvm_gcType
description: Type of the garbage collection algorithm
restart: true
parameters:
# CPU Related
- name: os_cpu_sched_min_granularity_ns
description: Target minimum scheduler period in which a single task will run
unit: nanoseconds
restart: false
- name: os_cpu_sched_wakeup_granularity_ns
unit: nanoseconds
description: desc
restart: false
- name: os_cpu_sched_migration_cost_ns
unit: nanoseconds
description: desc
restart: false
- name: os_cpu_sched_child_runs_first
description: desc
restart: false
- name: os_cpu_sched_latency_ns
unit: nanoseconds
description: desc
restart: false
- name: os_cpu_sched_autogroup_enabled
description: desc
restart: false
- name: os_cpu_sched_nr_migrate
description: desc
restart: false
study: "study_from_which_to_take_the_baseline"
experiments: [1]
name: "my_baseline" # name of the step
type: "baseline" # type of the step (baseline)
name: "my_baseline" # name of the step
type: "baseline" # type of the step (baseline)
from:
- study: "study_from_which_take_the_baseline" # name or id of the study from which to take the configuration
experiments: [1] # the number of the experiment from which to take the configuration
name: "my_baseline" # name of the step
type: "baseline" # type of the step (baseline)
values:
jvm1.maxHeapSize: 1024 # parameter maxHeapSize of jvm1 is set to 1024
jvm2.maxHeapSize: 2048 # parameter maxHeapSize of jvm2 is set to 2048name: "my_optimize" # name of the step
type: "optimize" # type of the step (optimize)
optimizer: "SOBOL"
numberOfExperiments: 50 # amount of experiments to execute
numberOfTrials: 2 # amount of trials for each experiment
akamas install optimization-pack Linux
The type of the step, in this case, baseline
name
string
yes
The name of the step
runOnFailure
boolean
true
false
no
false
The execution policy of the step:
false prevents the step from running in case the previous step failed
true allows the step to run even if the previous step failed
from
array of objects
Each object should have the structure described below
no
The study and the experiment from which to take the configuration of the baseline experiment
The from and experiments fields are defined as arrays, but can only contain one element
This can be set only if values is not set
values
object
The keys should match existing parameters
no
The configuration with which execute the baseline experiment
This can be set only if from is not set
doNotRenderParameters
string
this cannot be used when using a from option since no experiment is actually executed
no
Parameters not to be rendered - see Parameter rendering
renderParameters
string
this cannot be used when using a from option since no experiment is actually executed
no
Parameters to be rendered - see Parameter rendering
transactions_response_time_max
milliseconds
The maximum recorded transaction response time
transactions_response_time_min
milliseconds
The minimum recorded transaction response time
pages_throughput
pages/s
The number of pages requested per second
pages_response_time
milliseconds
The average page response time
pages_response_time_max
milliseconds
The maximum recorded page response time
pages_response_time_min
milliseconds
The minimum recorded page response time
requests_throughput
requests/s
The number of requests performed per second
requests_response_time
milliseconds
The average request response time
requests_response_time_max
milliseconds
The maximum recorded request response time
requests_response_time_min
milliseconds
The minimum recorded request response time
pages_error_rate
percent
The percentage of pages flagged as error
pages_error_throughput
pages/s
The number of pages flagged as error per second
requests_error_rate
percent
The percentage of requests flagged as error
requests_error_throughput
requests/s
The number of requests flagged as error per second
transactions_throughput
transactions/s
The number of transactions executed per second
transactions_response_time
milliseconds
The average transaction response time
transactions_error_rate
percent
The percentage of transactions flagged as error
transactions_error_throughput
transactions/s
The number of transactions flagged as error per second
users
users
The number of users performing requests on the web
CentOS Linux distribution version 7.x
CentOS Linux distribution version 8.x
Red Hat Enterprise Linux distribution version 7.x
Red Hat Enterprise Linux distribution version 8.x
Ubuntu Linux distribution by Canonical version 16.04 (LTS)
Ubuntu Linux distribution by Canonical version 18.04 (LTS)
Ubuntu Linux distribution by Canonical version 20.04 (LTS)
transactions_response_time_max
The max response time of LoadRunner transaction (requests)
transactions_response_time
The response time of LoadRunner transaction (requests)
transactions_response_time_p50
The 50th percentile (weighted median) of the response time of LoadRunner transaction (requests)
transactions_response_time_p85
The 85th percentile of the response time of LoadRunner transaction (requests)
transactions_response_time_p95
The 95th percentile of the response time of LoadRunner transaction (requests)
transactions_response_time_p99
The 99th percentile of the response time of LoadRunner transaction (requests)
pages_throughput
The average throughput of LoadRunner pages (transactions breakdown, first level), per second
pages_response_time_min
The min response time of LoadRunner pages (transactions breakdown, first level)
pages_response_time_max
The max response time of LoadRunner pages (transactions breakdown, first level)
pages_response_time
The response time of LoadRunner pages (transactions breakdown, first level)
pages_response_time_p50
The 50th percentile (weighted median) of the response time of LoadRunner pages
pages_response_time_p85
The 85th percentile of the response time of LoadRunner transaction breakdown, first level (pages)
pages_response_time_p95
The 95th percentile of the response time of LoadRunner transaction breakdown, first level (pages)
pages_response_time_p99
The 99th percentile of the response time of LoadRunner transaction breakdown, first level (pages)
requests_throughput
The average throughput of LoadRunner requests, per second
requests_response_time_min
The min response time of LoadRunner requests
requests_response_time_max
The max response time of LoadRunner requests
requests_response_time
The response time of LoadRunner requests
requests_response_time_p50
The 50th percentile (weighted median) of the response time of LoadRunner requests
requests_response_time_p85
The 85th percentile of the response time of LoadRunner transaction breakdown, second level (requests)
requests_response_time_p95
The 95th percentile of the response time of LoadRunner transaction breakdown, second level (requests)
requests_response_time_p99
The 99th percentile of the response time of LoadRunner transaction breakdown, second level (requests)
requests_error_throughput
The number of requests (transactions breakdown, second level) flagged as error by LoadRunner, per second
users
The average number of users active in a specific timeframe.
transactions_throughput
The average throughput of LoadRunner transaction (requests), per second
transactions_response_time_min
The min response time of LoadRunner transaction (requests)
based on MS .Net technology
based on OpenJDK and Oracle HotSpot JVM
based on Eclipse OpenJ9 VM (formerly known as IBM J9)
based on NodeJS
based on GO runtime (aka Golang)
exposed as web applications
based on Docker containers
based on Kubernetes containers
based on WebSphere middleware
based on Apache Spark middleware
based on PostgreSQL database
based on Cassandra database
based on MySQL database
based on Oracle database
based on MongoDB database
based on Elasticsearch database
based on AWS EC2 or Lambda resources
ORDINAL
integer values
Akamas converts the categories into real values and then normalizes them
['a', 'b', 'c'] → [0, 2] → [0.0, 2.0] → [0.0, 1.0]
CATEGORICAL
categorical values
Akamas converts each param value into a new param that may be either 1.0 (active) or 0.0 (inactive); only one of these new params can be active during each experiment:
['a', 'b', 'c'] → [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
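The two renderings can be mimicked in a few lines of Python (an illustrative sketch, not Akamas internals; the function names are hypothetical):

```python
def encode_ordinal(categories, value):
    """Map an ordinal category to its index, then normalize into [0.0, 1.0]."""
    idx = categories.index(value)   # 'a' -> 0, 'b' -> 1, 'c' -> 2
    span = len(categories) - 1      # highest index, e.g. 2
    return idx / span               # normalize: 0 -> 0.0, 2 -> 1.0

def encode_categorical(categories, value):
    """One-hot encode: one new param per category, exactly one set to 1.0."""
    return [1.0 if c == value else 0.0 for c in categories]

print(encode_ordinal(['a', 'b', 'c'], 'b'))      # 0.5
print(encode_categorical(['a', 'b', 'c'], 'b'))  # [0.0, 1.0, 0.0]
```

Note that the one-hot encoding preserves the "exactly one active" property: the sum of the new params is always 1.0.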


bytes
The amount of heap memory used
go_heap_util
bytes
The amount of heap memory used
go_memory_used
bytes
The total amount of memory used by Go
go_gc_time
percent
The % of wall clock time the Go runtime spent doing stop-the-world garbage collection activities
go_gc_duration
seconds
The average duration of a stop the world Go garbage collection
go_gc_count
collections/s
The total number of stop the world Go garbage collections that have occurred per second
go_threads_current
threads
The total number of active Go threads
go_goroutines_current
goroutines
The total number of active Goroutines
0 → 25000
yes
Sets the GOGC variable which controls the aggressiveness of the garbage collector
go_maxProcs
integer
threads
8
0 → 100
yes
Limits the number of operating system threads that can execute user-level code simultaneously
go_memLimit
integer
megabytes
100
0 → 1048576
yes
Sets a soft memory limit for the runtime. Available since Go 1.19
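The three tunables above map onto standard Go runtime environment variables: go_gcTargetPercentage sets GOGC, go_maxProcs maps to GOMAXPROCS, and go_memLimit to GOMEMLIMIT (available since Go 1.19). A minimal sketch of launching a service with tuned values; the binary name is hypothetical:

```shell
# go_gcTargetPercentage -> GOGC, go_maxProcs -> GOMAXPROCS, go_memLimit -> GOMEMLIMIT
export GOGC=100          # GC target percentage
export GOMAXPROCS=8      # max OS threads executing user-level Go code
export GOMEMLIMIT=100MiB # soft memory limit (Go >= 1.19)
# ./myservice            # hypothetical binary picking up the tuned runtime settings
echo "$GOGC $GOMAXPROCS $GOMEMLIMIT"
```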
cpu_used
CPUs
The total amount of CPUs used
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
go_heap_size
bytes
The largest size reached by the Go heap memory
go_gcTargetPercentage
integer
percent
100
name
string
yes
The name of the step
runOnFailure
boolean
true
false
no
false
The execution policy of the step:
false prevents the step from running in case the previous step failed
true allows the step to run even if the previous step failed
from
array of objects
Each object should have the structure described below
no
The study and the experiment from which to take the configuration of the experiment
The from and experiments fields are defined as arrays, but each can only contain one element
This can be set only if values is not set
values
object
The keys should match existing parameters
no
The configuration with which to execute the experiment
This can be set only if from is not set
doNotRenderParameters
string
this cannot be used when using a from option since no experiment is actually executed
no
Parameters not to be rendered. - see
renderParameters
string
this cannot be used when using a from option since no experiment is actually executed
no
Parameters to be rendered. - see
type
string
preset
yes
The type of the step, in this case, preset
type
string
{stability}
TRUE
The type of windowing.
stability->metric
string
It should match the name of an existing metric monitored by Akamas
TRUE
and for the comparison metric section
metric
string
It should match the name of an existing metric monitored by Akamas
TRUE
The following fragment is an example of stability windowing (time-frame based):
container_cpu_util
percent
The percentage of CPU used with respect to the limit
container_cpu_used
CPUs
CPUs used by the container per second
container_cpu_throttle_time
seconds
The amount of time the CPU has been throttled
limits_cpu
integer
CPUs
The Optimizer Options are a set of parameters used to fine-tune the study optimization strategy during the optimize step.
Optimizer options have the following structure:
onlineMode
The safetyFactor field specifies how much the optimizer should stay on the safe side in evaluating a candidate configuration with respect to the goal constraints. A higher safety factor corresponds to a safer configuration, that is a configuration that is less likely to violate goal constraints.
Acceptable values are all the real values ranging between 0 and 1, with (safetyFactor - 0.5) representing the allowed margin for staying within the defined constraint:
0 means "no safety", as with this value the optimizer totally ignores goal constraint violations;
0.5 means "safe, but no margin", as with this value the optimizer only tries configurations that do not violate the goal constraints, by remaining as close as possible to them;
1 means "super safe", as with this value the optimize only tries configurations that are very far from goal constraints.
For live optimization studies, 0.6 is the default value, while for offline optimization studies, the default value is 0.5.
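As a purely illustrative reading of the semantics above (not the actual optimizer logic), (safetyFactor - 0.5) can be treated as the extra relative margin kept from the boundary of a "metric <= threshold" goal constraint; the acceptance rule below is an assumption for illustration:

```python
def accepts(predicted, threshold, safety_factor):
    """Illustrative only: accept a candidate for a 'metric <= threshold'
    goal constraint, treating (safety_factor - 0.5) as the extra relative
    margin kept from the constraint boundary."""
    if safety_factor == 0.0:
        return True                          # "no safety": violations ignored
    margin = max(safety_factor - 0.5, 0.0)   # 0.5 -> no margin, 1.0 -> 50% margin
    return predicted <= threshold * (1.0 - margin)

print(accepts(95, 100, 0.5))  # True: within the constraint, no margin required
print(accepts(95, 100, 0.6))  # False: a 10% margin shrinks the bound to 90
```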
For offline optimization studies, the optimizerOptions field can be used to specify whether beta-warping optimization (a more sophisticated optimization that requires a longer time) should be used and for how many experiments (as a percentage):
where experimentsWithBeta can be:
A percentage between 0 and 100%
A number less than or equal to numberOfExperiments
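A sketch of how the experimentsWithBeta value might be resolved into an actual experiment count (illustrative, not Akamas internals):

```python
def resolve_experiments_with_beta(value, number_of_experiments):
    """Resolve the experimentsWithBeta option to an experiment count:
    either a percentage string such as '50%' or a plain number."""
    if isinstance(value, str) and value.endswith('%'):
        pct = float(value.rstrip('%'))
        if not 0 <= pct <= 100:
            raise ValueError("percentage must be between 0% and 100%")
        return round(number_of_experiments * pct / 100)
    if value > number_of_experiments:
        raise ValueError("must be less than or equal to numberOfExperiments")
    return int(value)

print(resolve_experiments_with_beta("50%", 100))  # 50
print(resolve_experiments_with_beta(30, 100))     # 30
```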
For live optimization studies, the optimizerOptions field can be used to specify several important parameters governing the live optimization:
Notice that, while available as independent options, the optimizer options onlineMode, workloadOptimizedForStrategy, and safetyFactor work in conjunction according to the following schema:
All these optimizer options can be changed at any time, that is while the optimization study is running, to become immediately effective. The page in the reference guide provides these specific update commands.
The onlineMode field specifies how the Akamas optimizer should operate:
RECOMMEND: configurations are recommended to the user by Akamas and are only applied after having been approved (and possibly modified) by the user;
FULLY_AUTONOMOUS: configurations are immediately applied by Akamas.
The safetyMode field describes how the Akamas optimizer should evaluate the goal constraints on a candidate configuration for that configuration to be considered valid:
GLOBAL: the constraints must be satisfied by the configuration under all observed workloads in the configuration history - this is the value taken in case onlineMode is set to RECOMMEND;
LOCAL: the constraints are evaluated only under the workload selected according to the workload strategy - this should be used with onlineMode set to FULLY_AUTONOMOUS.
Notice that when setting the safetyMode to LOCAL, the recommended configuration is only expected to be good for the specific workload selected under the defined workload strategy, but it might violate constraints under another workload.
The workloadOptimizedForStrategy field specifies the workload strategy that drives how Akamas leverages the workload information when looking for the next configuration:
MAXIMIN: the optimizer looks for a configuration that maximizes the minimum improvements for all the already observed workloads;
LAST: for each workload, the last observed workload is considered - this works well to find a configuration that is good for the last workloads - it is often used in conjunction with a LOCAL safety mode (see );
The explorationFactor field specifies how much the optimizer explores the (unknown) optimization space when looking for new configurations. For any parameter, this factor measures the difference between already tried values and the value of a new possible configuration. A higher exploration factor corresponds to a broader exploration of never-tried-before parameter values.
Acceptable values are all the real values ranging between 0 and 1, plus the special string FULL_EXPLORATION:
0 means "no exploration", as with this value the optimizer chooses a value among the previously seen values for each parameter;
1 means "full exploration, except for categories", as with this value the optimizer for a non-categorical parameter any value among all its domain values can be chosen, while only values (categories) that have already been seen in previous configurations are chosen for a categorical parameter;
FULL_EXPLORATION
In case the desired explorationFactor is 1 but some specific parameters also need to be explored with respect to all their categories, then PRESET steps (refer to the page) can be used to run an optimization study with these values. An example of a live optimization study adopting this approach is available in the reference guide.
The following fragment refers to an optimization study that runs 100 experiments using the SOBOL optimizer and forces 50% of the experiments to use the beta-warping option, enabling a more sophisticated but longer optimization:
This page introduces the OracleConfigurator operator, a workflow operator that allows configuring the optimized parameters of an Oracle instance.
This section provides the minimum requirements that you should meet in order to use the OracleConfigurator operator.
Oracle 12c or later
The Oracle operator must be able to connect to the Oracle URL or IP address and port (default port: 1521).
The user used to log into the database must have ALTER SYSTEM privileges.
In order to configure the tuned parameters, the OracleConfigurator operator must be bound to a component with one of the following types:
Oracle Database 12c
Oracle Database 18c
Oracle Database 19c
Databases hosted on Amazon RDS are not supported.
When you define an OracleConfigurator task in the workflow you should specify some configuration information to allow the operator to connect to the Oracle instance.
You can specify configuration information within the config part of the YAML of the instance definition. The operator can also inherit some specific arguments from the properties of a bound component when not specified in the task.
The following table describes all the properties for the definition of a task using the OracleConfigurator operator.
In the following example, the workflow leverages the OracleConfigurator operator to update the database parameters before triggering the execution of the load test for a component oracledb:
The SparkLivy operator uses Livy to run Spark applications on a Spark instance.
The operator fetches the following parameters from the current Experiment to apply them to the System under test.
study: "study_preset_1"
experiments: [1]name: "my_preset" # name of the step
type: "preset" # type of the step (preset)
values:
jvm1.maxHeapSize: 1024 # parameter maxHeapSize of jvm1 is set to 1024
jvm2.maxHeapSize: 2048 # parameter maxHeapSize of jvm2 is set to 2048name: "my_preset" # name of the step
type: "preset" # type of the step (preset)
from:
- study: "preset_study_1" # name or ID of the study from which to take the configuration
experiments: [1] # the number of the experiment from which to take the configurationname: "my_preset" # name of the step
type: "preset" # type of the step (preset)
from:
- experiments: [1] # the step will take the configuration of the experiment #1 of the same study of the steptype: stability
stability:
metric: throughput
labels:
componentName: DB
resolution: "30s"
width: 10
maxStdDev: .6
# Comparison metric section
when:
metric: response_time
labels:
componentName: FE
is: min
The metric whose stability is going to be verified to exclude some temporal intervals over the duration of a trial.
stability->labels
set of key-value pairs
FALSE
A set of key-value pairs that represent filtering conditions for retrieving the value of the metric. These conditions can be used to consider the right metric of the right component: you can filter by componentName or by other custom properties defined in the components of the system of the study.
stability->resolution
string
Valid values are in the form 30s 40m 2h
where s refers to seconds, m to minutes, h to hours
FALSE
0s
The temporal resolution at which Akamas aggregates data points to determine feasible windows.
stability->width
integer or string
stability->width > 1
Valid values are in the form 30s 40m 2h as specified in stability->resolution
TRUE
The width of the temporal intervals over the duration of the trial which are checked for the stability of the metric.
Width can be sample-based (integer) or time frame-based (string).
stability->maxStdDev
double
TRUE
The stability condition, i.e., the maximum standard deviation among the values of the data points of the metric tolerated for a temporal interval of size width; otherwise, the temporal interval is discarded
The metric whose value is analyzed to include or exclude temporal intervals over the duration of a trial, when another reference metric is stable.
labels
set of key-value pairs
FALSE
A set of key-value pairs that represent filtering conditions for retrieving the value of the metric. These conditions can be used to consider the right metric of the right component: you can filter by componentName or by other custom properties defined in the components of the system of the study.
is
string
{min,max}
TRUE
If the value of the metric should be maximum or minimum to include or exclude temporal intervals over the duration of a trial when another reference metric is stable.
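A minimal sketch of how the stability windowing policy described above could work, assuming evenly spaced samples and a sample-based width (function names are illustrative, not Akamas internals):

```python
from statistics import mean, pstdev

def stable_windows(samples, width, max_std_dev):
    """Slide a window of `width` samples over the trial and keep only the
    intervals where the stability metric's standard deviation is within
    maxStdDev (illustrative sketch of the stability condition)."""
    wins = []
    for start in range(len(samples) - width + 1):
        window = samples[start:start + width]
        if pstdev(window) <= max_std_dev:
            wins.append((start, start + width))
    return wins

def pick_window(windows, comparison, is_min=True):
    """Among the stable intervals, select the one where the comparison
    metric (e.g. response_time with is: min) is minimized or maximized."""
    key = lambda w: mean(comparison[w[0]:w[1]])
    return min(windows, key=key) if is_min else max(windows, key=key)

throughput    = [10, 10.2, 9.9, 10.1, 30, 5, 10, 10.1]  # stability metric
response_time = [120, 115, 118, 119, 400, 80, 90, 95]   # comparison metric
wins = stable_windows(throughput, width=3, max_std_dev=0.6)
best = pick_window(wins, response_time, is_min=True)
```

The spike in throughput excludes the intervals around it; among the remaining stable windows, the one with the lowest average response time is scored.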
container_cpu_limit
CPUs
The number of CPUs (or fraction of CPUs) allowed for a container
container_mem_util_nocache
percent
Percentage of memory used with respect to the limit, excluding the file system cache
container_mem_util
percent
Percentage of working set memory used with respect to the limit
container_mem_used
bytes
The total amount of memory used by the container. Memory used includes all types of memory, including file system cache
container_mem_limit
bytes
Memory limit for the container
container_mem_working_set
bytes
Current working set in bytes
container_mem_limit_hits
hits/s
Number of times memory usage hits memory limit per second
0.7
0.1 → 100.0
Limits on the amount of CPU resources usage in CPU units
requests_cpu
integer
CPUs
0.7
0.1 → 100.0
Amount of CPU resources requests in CPU units
limits_memory
integer
megabytes
128
64 → 64000
Limits on the amount of memory resources usage
requests_memory
integer
megabytes
128
64 → 64000
Amount of memory resources requests
AVG_DURATION
requests_response_time
requests
AVG_DURATION
transactions_response_time_min
transactions
MIN_DURATION
pages_response_time_min
pages
MIN_DURATION
requests_response_time_min
requests
MIN_DURATION
transactions_response_time_max
transactions
MAX_DURATION
pages_response_time_max
pages
MAX_DURATION
requests_response_time_max
requests
MAX_DURATION
transactions_throughput
transactions
THROUGHPUT
pages_throughput
pages
THROUGHPUT
requests_throughput
requests
THROUGHPUT
transactions_error_rate
transactions
ERROR_RATE
pages_error_rate
pages
ERROR_RATE
requests_error_rate
requests
ERROR_RATE
transactions_error_throughput
transactions
ERRORS_PER_SECOND
pages_error_throughput
pages
ERRORS_PER_SECOND
requests_error_throughput
requests
ERRORS_PER_SECOND
users
Controller/User Load
AVG
transactions_response_time
transactions
AVG_DURATION
pages_response_time
pages
AVG_DURATION
MOST_VIOLATED: for each workload, the workload of the configuration with more violations is considered.
string
RECOMMEND, FULLY_AUTONOMOUS
Changes are approved automatically or must be edited/approved by the user
safetyMode
string
LOCAL, GLOBAL
Defines how Akamas optimizer evaluates goal constraints
safetyFactor
decimal
between 0 and 1
Parameter that impacts the distance from goal constraints for new configurations
workloadOptimizedForStrategy
string
LAST, MOST_VIOLATED, MAXIMIN
Selects the computation strategy used to generate future configurations
explorationFactor
decimal, string
between 0 and 1 or FULL_EXPLORATION
Sets the tendency to explore unexplored configuration values
experimentsWithBeta
decimal / string
if string must be a percentage between 0% and 100%. If numeric must be less than or equal to numberOfExperiments
Percentage/number of experiments that will be computed with beta-warping optimization
RECOMMEND
GLOBAL
MAXIMIN
FULLY_AUTONOMOUS
LOCAL
LAST
Additional application arguments
className
String
No. Required for Java applications.
The entry point of the Java application.
name
String
No
Name of the task. When submitted, the IDs of the study, experiment, and trial will be appended.
queue
String
No
The name of the YARN queue to which to submit a Spark application
pyFiles
List of Strings
Each item of the list should be a path that matches an existing python file
No
A list of python scripts to be added to the PYTHONPATH
proxyUser
String
No
The user to be used to launch Spark applications
pollingInterval
Number
pollingInterval > 0
No
10
The number of seconds to wait before checking if a launched Spark application has finished
component
String
It should match the name of an existing Component of the System under test
Yes
The name of the component whose properties can be used as arguments of the operator
spark_total_executor_cores
Total cores used by the application
Spark standalone and Mesos only
spark_executor_cores
Cores per executor
Spark standalone and YARN only
spark_num_executors
The number of executors
YARN only
file
String
It should be a path to a valid java or python spark application file
Yes
Spark application to submit (jar or python file)
args
List of Strings, Numbers or Booleans
Yes
spark_driver_memory
Memory for the driver
spark_executor_memory
Memory per executor
# Half the experiments should be done with beta-warping
experimentsWithBeta: "50%"

optimizerOptions:
  onlineMode: RECOMMEND # [RECOMMEND|FULLY_AUTONOMOUS]
  safetyMode: GLOBAL # [GLOBAL|LOCAL]
  workloadOptimizedForStrategy: MAXIMIN # [MAXIMIN|LAST|MOST_VIOLATED]
  safetyFactor: 0.55 # 0 <= safetyFactor <= 1
  explorationFactor: 0.05 # 0 <= explorationFactor <= 1 or FULL_EXPLORATION

name: "my_optimize" # name of the step
type: "optimize" # type of the step (optimize)
optimizer: "SOBOL"
numberOfExperiments: 100 # amount of experiments to execute
numberOfTrials: 2 # amount of trials for each experiment
optimizerOptions:
  experimentsWithBeta: "50%"

- name: Run spark application
  operator: SparkLivy
  arguments:
    component: sparkemr
    file: /spark-examples.jar

It is possible to define only one of the following sets of configurations:
dsn
host, service and optionally port
task, component
connection.host
String
Address of the database instance
task, component
connection.port
Integer
Listening port of the database instance
1521
task, component
connection.service
String
Database service name
task, component
connection.sid
String
Database SID
task, component
connection.user
String
User name
Yes
task, component
connection.password
String
User password
Yes
task, component
connection.mode
String
Connection mode
sysdba, sysoper
task, component
component
String
Name of the component to fetch properties and parameters from
Yes
task
connection.dsn
String
DSN or EasyConnect string
Additional application arguments
master
String
It should be a valid supported Master URL:
local
local[K]
local[K,F]
Yes
The master URL for the Spark cluster
deployMode
client cluster
No
cluster
Whether to launch the driver locally (client) or in the cluster (cluster)
className
String
No
The entry point of the java application. Required for java applications.
name
String
No
Name of the task. When submitted, the IDs of the study, experiment, and trial will be appended.
jars
List of Strings
Each item of the list should be a path that matches an existing jar file
No
A list of jars to be added in the classpath.
pyFiles
List of Strings
Each item of the list should be a path that matches an existing python file
No
A list of python scripts to be added to the PYTHONPATH
files
List of Strings
Each item of the list should be a path that matches an existing file
No
A list of files to be added to the context of the spark-submit
conf
Object (key-value pairs)
No
Mapping containing additional Spark configurations. See Spark documentation.
envVars
Object (key-value pairs)
No
Env variables when running the spark-submit command
sparkSubmitExec
String
It should be a path that matches an existing executable
No
The default for the Spark installation
The path of the spark-submit executable command
sparkHome
String
It should be a path that matches an existing directory
No
The default for the Spark installation
The path of the SPARK_HOME
proxyUser
String
No
The user to be used to execute Spark applications
verbose
Boolean
No
true
If additional debugging output should be displayed
component
String
It should match the name of an existing Component of the System under test
Yes
The name of the component whose properties can be used as arguments of the operator
file
String
It should be a path to a valid java or python spark application file
Yes
Spark application to submit (jar or python file)
args
List of Strings, Numbers or Booleans
Yes
container_cpu_used
millicores
The CPUs used by the container
container_cpu_used_max
millicores
The maximum CPUs used by the container among all container replicas
container_memory_used
bytes
The total amount of memory used by the container
container_memory_used_max
bytes
The maximum memory used by the container among all container replicas
cpu_request
integer
millicores
The following tables show a list of constraints that may be required in the definition of the study, depending on the tuned parameters:
component_name.cpu_request <= component_name.cpu_limit
component_name.memory_request <= component_name.memory_limit
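A quick way to sanity-check a candidate configuration against these two constraints (parameter names mirror the component parameters above; values are illustrative, in millicores and megabytes):

```python
def check_constraints(config):
    """Check the two study constraints above for a candidate configuration:
    cpu_request <= cpu_limit and memory_request <= memory_limit."""
    ok_cpu = config["cpu_request"] <= config["cpu_limit"]
    ok_mem = config["memory_request"] <= config["memory_limit"]
    return ok_cpu and ok_mem

candidate = {"cpu_request": 500, "cpu_limit": 1000,
             "memory_request": 256, "memory_limit": 512}
print(check_constraints(candidate))  # True
```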
This page introduces the LoadRunner operator, a workflow operator that allows piloting performance tests on a target system by leveraging Micro Focus LoadRunner. This page assumes you are familiar with the definition of a workflow and its tasks. If this is not the case, then check Creating automation workflows.
This section provides the minimum requirements that you should meet to use this operator.
Micro Focus LoadRunner 12.60 or 2020
Microsoft Windows Server 2016 or 2019
Powershell version 5.1 or greater
To configure WinRM to allow Akamas to launch tests please read the page.
All LoadRunner test files (VuGen scripts and folder, lrs files) and their parent folders, must be readable and writable by the user account used by Akamas.
When you define a task that uses the LoadRunner operator you should specify some configuration information to allow the operator to connect to the LoadRunner controller and execute a provided test scenario.
You can specify configuration information within the arguments that are part of a task in the YAML of the definition of a workflow.
You can avoid specifying each configuration information at the task level, by including a component property with the name of a component; in this way, the operator will take any configuration information from the properties of the referenced component
controller - a set of pieces of information useful for connecting to the LoadRunner controller
scenarioFile - the path to the scenario file within the LoadRunner controller to execute the performance test
To make it possible for the operator to connect to a LoadRunner controller to execute a performance test you can use the controller property within the workflow task definition:
This table reports the configuration reference for the arguments section.
Important notice: remember to escape your path with four backslashes (e.g. C:\\\\Users\\\\...)
Controller arguments
This table reports the configuration reference for the controller section, which is an object with the following fields:
Important notice: remember to escape your path with four backslashes (e.g. C:\\\\Users\\\\...)
This page introduces the LoadRunnerEnterprise operator, a workflow operator that allows piloting performance tests on a target system by leveraging Micro Focus LoadRunner Enterprise (formerly known as Performance Center).
This section provides the minimum requirements that you should meet to use this operator.
name: oracledb
componentType: Oracle Database 18c
properties:
  connection:
    user: application
    password: password
    host: oradb.dev.akamas.io
    service: XE

tasks:
  - name: update parameters
    operator: OracleConfigurator
    arguments:
      component: oracledb
  - name: run load test
    operator: Executor
    arguments:
      command: sh run_test.sh
      component: generator

host, sid and optionally port
local[*]
local[*,F]
spark://HOST:PORT
spark://HOST1:PORT1, HOST2:PORT2
yarn
container_cpu_util
percent
The percentage of CPUs used with respect to the limit
container_cpu_util_max
percent
The maximum percentage of CPUs used with respect to the limit among all container replicas
container_cpu_throttle_time
percent
The amount of time the CPU has been throttled
container_cpu_throttled_millicores
millicores
The CPUs throttling per container in millicores
container_cpu_request
millicores
The CPUs requested for the container
container_cpu_limit
millicores
The CPUs limit for the container
container_memory_util
percent
The percentage of memory used with respect to the limit
container_memory_util_max
percent
The maximum percentage of memory used with respect to the limit among all container replicas
container_memory_working_set
bytes
The working set usage in bytes
container_memory_resident_set
bytes
The resident set usage in bytes
container_memory_cache
bytes
The memory cache usage in bytes
container_memory_request
bytes
The memory requested for the container
container_memory_limit
bytes
The memory limit for the container
You should select your own default value.
You should select your own domain.
yes
Amount of CPU resources requests in CPU units (millicores)
cpu_limit
integer
millicores
You should select your own default value.
You should select your own domain.
yes
Limits on the amount of CPU resources usage in CPU units (millicores)
memory_request
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Amount of memory resources requests in megabytes
memory_limit
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Limits on the amount of memory resources usage in megabytes
component
String
No
The name of the component from which the operator will take its configuration options
scenarioFile
String
Matches an existing file within the LoadRunner controller
Yes
The LoadRunner scenario file to execute the performance test.
resultFolder
String
Yes
The folder, on the controller, where Loadrunner will put the results of a performance test.
You can use the placeholders {study}, {exp}, {trial} to generate a path that is unique for the running Akamas trial.
It can be a local path on the controller or on a network share
loadrunnerResOverride
String
A valid name for a Windows folder
No
res
The folder name where LoadRunner saves the analysis results.
The default value can be changed in the LoadRunner controller.
timeout
String
The string must contain a numeric value followed by a suffix (s, m, h, d).
No
2h
The timeout for the LoadRunner scenario. If LoadRunner doesn’t finish the scenario within the specified amount of time, Akamas will consider the workflow as failed.
checkFrequency
String
The string must contain a numeric value followed by a suffix (s, m, h, d).
No
1m
The interval at which Akamas checks the status of the LoadRunner scenario.
executable
String
A valid windows path
No
C:\Program Files (x86)\Micro Focus\LoadRunner\bin\Wlrun.exe
The LoadRunner executable path
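The timeout and checkFrequency values follow the same "numeric value plus s/m/h/d suffix" format. A sketch of how such strings resolve to seconds, and how the defaults (2h, 1m) bound the number of status checks before the workflow is considered failed:

```python
def parse_duration(value):
    """Parse duration strings such as '30s', '1m', '2h', '1d' (a numeric
    value followed by an s/m/h/d suffix) into seconds."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    suffix = value[-1]
    if suffix not in units:
        raise ValueError(f"unsupported suffix in {value!r}")
    return int(value[:-1]) * units[suffix]

timeout = parse_duration("2h")       # default timeout: 7200 seconds
check_every = parse_duration("1m")   # default checkFrequency: 60 seconds
max_checks = timeout // check_every  # up to 120 status checks before failing
```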
scenarioFile
String
Matches an existing file within the LoadRunner controller
Yes
The LoadRunner scenario file to execute the performance test.
resultFolder
String
Yes
The folder, on the controller, where Loadrunner will put the results of a performance test.
You can use the placeholders {study}, {exp}, {trial} to generate a path that is unique for the running Akamas trial.
It can be a local path on the controller or on a network share.
loadrunnerResOverride
String
A valid name for a Windows folder
No
res
The folder name where LoadRunner saves the analysis results.
The default value can be changed in the LoadRunner controller.
timeout
String
The string must contain a numeric value followed by a suffix (s, m, h, d).
No
2h
The timeout for the LoadRunner scenario. If LoadRunner doesn’t finish the scenario within the specified amount of time, Akamas will consider the workflow as failed.
checkFrequency
String
The string must contain a numeric value followed by a suffix (s, m, h, d).
No
1m
The interval at which Akamas checks the status of the LoadRunner scenario.
executable
String
A valid windows path
No
C:\Program Files (x86)\Micro Focus\LoadRunner\bin\Wlrun.exe
The LoadRunner executable path.
controller
Object
Yes
component
String
No
The information required to connect to LoadRunner controller machine.
The name of the component from which the operator will take its configuration options.
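The {study}, {exp}, {trial} placeholders accepted by resultFolder can be thought of as simple string substitutions producing a per-trial path; a sketch (the study name is hypothetical):

```python
def render_result_folder(template, study, exp, trial):
    """Render the {study}/{exp}/{trial} placeholders supported by
    resultFolder into a path unique to the running trial."""
    return (template.replace("{study}", str(study))
                    .replace("{exp}", str(exp))
                    .replace("{trial}", str(trial)))

path = render_result_folder(r"c:\Temp\{study}\{exp}\{trial}", "mystudy", 3, 1)
print(path)  # c:\Temp\mystudy\3\1
```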
controller:
  hostname: loadrunner.example.com
  username: Domain\LoadRunnerUser
  password: j(sBdH5fsG9.I56P%7n2XPjmgO6!ARm=

name: "task1"
operator: "LoadRunner"
arguments:
  controller:
    hostname: loadrunner.example.com
    username: Domain\LoadRunnerUser
    password: j(sBdH5fsG9.I56P%7n2XPjmgO6!ARm=
  scenarioFile: 'C:\Users\LoadRunnerUser\Desktop\test\scenario\Scenario1.lrs'
  resultFolder: 'c:\Temp\{study}\{exp}\{trial}'
  timeout: 15m
  checkFrequency: 30s

command
String
Yes
The command to be executed on the remote machine
host
Object
It should have a structure like the one described here below
No
Here follows the structure of the host argument
with its arguments:
protocol
String
https
http
Yes, if the Component whose name is defined in component doesn't have a property named host->protocol
The component argument can refer to a Component by name and use its properties as the arguments of the operator. In case the mapped arguments are already provided to the operator, there is no override.
Here is an example of a component that overrides the host and the command arguments:
Micro Focus Performance Center 12.60 or 12.63
LoadRunner Enterprise 2020 SP3
When you define a task that uses the LoadRunnerEnterprise operator you should specify some configuration information to allow the operator to connect to LoadRunner Enterprise and execute a provided test scenario.
You can specify configuration information within the arguments that are part of a task in the YAML of the definition of a workflow.
You can avoid specifying each configuration item at the task level by including a component property with the name of a component; in this way, the operator takes any configuration information from the properties of the referenced component.
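As an illustrative sketch only (the component and task names here are hypothetical, not taken from this guide), a task that delegates most of its configuration to a component could look like this:

```yaml
# Hypothetical sketch: the task names a component; the operator reads address,
# credentials, domain, project, testId, testSet, etc. from its properties.
name: run load test
operator: LoadRunnerEnterprise
arguments:
  component: lre-controller   # hypothetical component holding the LRE settings
  timeSlot: '30m'             # task-level arguments can still be given explicitly
```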
This table reports the configuration reference for the arguments section
address
String
A valid URL
Yes
The following screenshot from Performance Center shows the testId value highlighted.
The following screenshot from Performance Center shows the testSet name highlighted.
How to retrieve the testId value from LoadRunner Enterprise
then test management from the main menu
Oracle 12c or later
The OracleExecutor operator must be able to connect to the Oracle URL or IP address and port (default port is 1521)
The user used to log into the database must have sufficient privileges to perform the required queries
When you define a task that uses the Oracle Executor operator you should specify some configuration information to allow the operator to connect to the Oracle instance and execute queries.
The operator inherits the connection arguments from the properties of the component referenced in the task definition. You can also override individual properties of the component, or not reference a component at all and define the connection fields directly in the task configuration.
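For illustration only (the component name reuses the guide's oracledb example; the user and table names are hypothetical), a task that inherits the connection from a component while overriding one field might look like:

```yaml
- name: prepare database
  operator: OracleExecutor
  arguments:
    component: oracledb        # connection properties inherited from this component
    connection:
      user: maintenance        # hypothetical: overrides the user defined in the component
    sql:
      - TRUNCATE TABLE session_log   # hypothetical table
```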
The following table provides the list of all properties required to define a task that uses the OracleExecutor operator.
connection.dsn
String
The DSN or EasyConnect string
Notice: it is good practice to define only queries that update the state of the database. It is not possible to use SELECT queries to extract data from the database.
In the following example, the operator performs a cleanup action on a table of the database:
In the following example, the operator leverages its templating features to update a table:
The referenced oracledb component contains properties that specify how to connect to the Oracle database instance:
The WindowsFileConfigurator operator allows configuring systems tuned by Akamas by interpolating configuration parameters into files on remote Windows machines.
The operator performs the following operations:
It reads an input file from a remote machine containing templates for interpolating the configuration parameters generated by Akamas
It replaces the values of configuration parameters in the input file
It writes the file with replaced configuration parameters on a specified path on another remote machine
Access on remote machines is performed using WinRM
The Windows File Configurator allows writing templates for configuration parameters in two ways:
a single parameter is specified to be interpolated:
all parameters of a component to be interpolated:
It is possible to add a prefix or suffix to interpolated configuration parameters by acting at the component-type level:
In the example above, the parameter x1 will be interpolated with the prefix PREFIX and the suffix SUFFIX; ${value} will be replaced with the actual value of the parameter at each experiment.
Suppose we have the configuration of the following parameters for experiment 1 of a study:
where component1 is of type MyComponentType defined as follows:
A template file to interpolate only parameter component1.param1 and all parameters from component2 would look like this:
The file after the configuration parameters are interpolated would look like this:
Note that the file in this example contains a bash command whose arguments are constructed by interpolating configuration parameters. This represents a typical use case for the WindowsFileConfigurator: to construct the right bash commands that configure a system with the new configuration parameters computed by Akamas.
source and target structure and arguments
Here follows the structure of either the source or target operator argument
component
The component argument can be used to refer to a Component by name and use its properties as the arguments of the operator. In case the mapped arguments are already provided to the operator, there is no override.
Notice that in this case, the operator replaces in the template file only tokens referring to the specified component. A parameter bound to any other component causes the substitution to fail.
Where the apache-server-1 component is defined as:
The Executor Operator can be used to execute a shell command on a target machine using SSH.
Host structure and arguments
Here follows the structure of the host argument:
with its arguments:
component
The component argument can refer to a component by name and use its properties as the arguments of the operator (see mapping here below). In case the mapped arguments are already provided to the operator, there is no override.
Let's assume you want to run a script on a remote host and expect the script to be executed successfully within 30 seconds but might fail occasionally.
Launch a script, wait for its completion, and in case of failures or timeout retry 3 times by waiting 10 seconds between retries:
Execute a uname command with explicit host information (explicit SSH key)
Execute a uname command with explicit host information (imported SSH key)
Execute a uname command with host information taken from a Component
Start a load-testing script and keep it running in the background during the workflow
Due to how stderr is handled, invoking a bash script on a server may produce a different result than running the same script through the Akamas Executor operator. This is quite common with Tomcat startup scripts like $HOME/tomcat/apache-tomcat_1299/bin/startup.sh.
To avoid this issue, simply create a wrapper bash file on the target server that adds the set -m instruction before the sh command, e.g.:
and then configure the Executor Operator to run the wrapper script like:
You can run the following to emulate the same behavior of Akamas running scripts over SSH:
There are cases in which you would like to keep a script running for the whole duration of the test. Some examples could be:
A script applying load to your system for the duration of the workflow
The manual start of an application to be tested
The setup of a listener that gathers logs, metrics, or data
In all the instances where you need to keep a task running beyond the task that started it, you must use the detach: true property.
Note that a detached executor task returns immediately, so you should run only the background task in detached mode.
Remember to keep all tasks requiring synchronous (standard) behavior out of the detached task.
Example:
Library references
The library used to execute scripts remotely is Fabric, a high-level Python library designed to execute shell commands remotely over SSH, yielding useful Python objects in return.
The Fabric library uses a Connection object to execute scripts remotely. The dedicated detach mode is implemented on top of the more robust disown property of the Invoke Runner underlying the Connection. This is the reason you should rely on detach whenever possible instead of running background processes straight from the script.
The Fabric documentation also provides further information about the typical hanging problems of background processes and their solutions.
host:
protocol: [https|http]
hostname: this_is_a_hostname
port: 5863
path: /wsman
username: this_is_a_username
password: this_is_a_password
validateCertificate: false
name: LoadRunnerMachine
componentType: WebApplication
properties:
command: "dir c:\"
host:
hostname: lr.mydomain.com
username: MyLoadRunnerUser
password: MyPassword
name: TestConnectivity
operator: WindowsExecutor
arguments:
command: "dir c:\"
host:
hostname: frontend.akamas.io
username: administrator
password: MyPassword
name: TestConnectivity
operator: WindowsExecutor
arguments:
command: "dir c:\"
component: frontend1
name: test
operator: LoadRunnerEnterprise
arguments:
address: "http://lr-pc.dev.akamas.io"
username: akamas
password: akamas
tenantID: cf59c1a8-ad2d-4c9a-9324-edadaae5b8b9
domain: AKAMASDOMAIN
project: akamasproject
testId: 1
testSet: testsetname
timeSlot: '30m'
tasks:
- name: clean database
operator: OracleExecutor
arguments:
sql:
- TRUNCATE TABLE user_action
- DELETE FROM user WHERE id LIKE 'test%'
connection:
user: application
password: password
dsn: oradb.dev.akamas.io/XE
tasks:
- name: set value
operator: OracleExecutor
arguments:
sql:
- UPDATE rs_component_pros SET value='${app.max_connections}' WHERE property='maxconn'
component: oracledb
name: oracledb
componentType: Oracle Database 18c
properties:
connection:
user: application
password: password
host: oradb.dev.akamas.io
service: XE
Information relative to the target machine onto which the command has to be executed
component
String
It should match the name of an existing Component of the System under test
No
The name of the Component whose properties can be used as arguments of the operator
https
The protocol to use to connect to the Windows machine with WinRM
hostname
String
Valid FQDN or ip address
Yes, if the Component whose name is defined in component doesn't have a property named host->hostname
-
Windows machine’s hostname
port
Number
1≤port≤65532
Yes, if the Component whose name is defined in component doesn't have a property named host->port
5863
WinRM port
path
String
-
Yes, if the Component whose name is defined in component doesn't have a property named host->path
/wsman
The path where WinRM is listening
username
String
username
domain\username
username@domain
Yes, if the Component whose name is defined in component doesn't have a property named host->username
-
User login (domain or local)
password
String
-
Yes, if the Component whose name is defined in component doesn't have a property named host->password
-
Login password
authType
String
ntlm
ssl
Yes, if the Component whose name is defined in component doesn't have a property named host->authType
ntlm
The authentication method to use against the Windows machine
validateCertificate
Boolean
true
false
Yes, if the Component whose name is defined in component doesn't have a property named host->validateCertificate
False
Whether or not to validate the server certificate
ca
String
A valid CA certificate
Yes, if the Component whose name is defined in component doesn't have a property named host->ca
-
The CA certificate that is required to validate the server certificate
operationTimeoutSec
Integer
Must be greater than 0
No
The amount of time, in seconds, after which the execution of the command is considered failed.
Notice that the output of the command doesn't reset the timeout.
readTimeoutSec
Integer
Must be greater than operationTimeoutSec
No
The number of seconds to wait before an HTTP connect/read times out
It is possible to define only one of the following sets of configurations:
dsn
host, service and optionally port
host, sid and optionally port
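As a sketch, the three mutually exclusive connection forms could look like this (host and service names reuse the example values found elsewhere in this guide; treat them as placeholders):

```yaml
# Option 1: a single DSN / EasyConnect string
connection:
  dsn: oradb.dev.akamas.io/XE

# Option 2: host, service, and optionally port
connection:
  host: oradb.dev.akamas.io
  port: 1521
  service: XE

# Option 3: host, sid, and optionally port
connection:
  host: oradb.dev.akamas.io
  sid: XE
```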
task, component
connection.host
String
The address of the database instance
task, component
connection.port
Integer
The listening port of the database instance
1521
task, component
connection.service
String
The database service name
task, component
connection.sid
String
The database SID
task, component
connection.user
String
The user name
Yes
task, component
connection.password
String
The user password
Yes
task, component
connection.mode
String
The connection mode
sysdba, sysoper
task, component
sql
List[String]
The list of queries to update the database status before or after the workload execution. Queries can be templatized, containing tokens referencing parameters of any component in the system.
Yes
task
autocommit
boolean
A flag to enable the auto-commit feature
False
No
task
component
String
The name of the component to fetch properties from
No
task
-
The information required to connect to LoadRunner Enterprise.
username
String
-
Yes
-
The username used to connect to LoadRunner Enterprise
password
String
-
Yes
-
The password for the specified user
tenantID
String
-
No
-
The ID of the tenant (only for LoadRunner Enterprise 2020)
domain
String
-
Yes
The Domain of your load test projects.
project
String
-
Yes
The Project name of your load test projects
testId
Number
-
Yes
The id of the load test. See here below how to retrieve this from LoadRunner.
testSet
String
-
Yes
-
The name of the TestSet. See here below how to retrieve this from LoadRunner.
timeSlot
String
A number followed by the time unit.
Values must be multiples of 15m and greater than 30m
Valid units are:
m: minutes
h: hours
Yes
-
The reserved time slot for the test.
Examples:
1h
45m
1h30m
component
String
A valid component name
No
-
The name of the component from which the operator will take its configuration options
pollingInterval
Number
A positive integer number
No
30
The frequency (in seconds) at which Akamas checks for the load test status
verifySSL
String
True, False
No
True
Whether to validate the certificate provided by the LRE server when using an HTTPS connection



Information relative to the source/input file to be used to interpolate optimal configuration parameters discovered by Akamas
target
Object
It should have a structure like the one defined in the next section
No, if the Component whose name is defined in component has properties that map to the ones defined within target
Information relative to the target/output file to be used to interpolate optimal configuration parameters discovered by Akamas
component
String
It should match the name of an existing Component of the System under test
No
The name of the Component whose properties can be used as arguments of the operator
Windows host
username
String
Yes
Login username
password
String
Windows password for the specified user
Yes
Login password
path
String
It should be a valid path
Yes
The path of the file to be used either as the source or target of the activity of applying Akamas-computed configuration parameters using files
sourcePath
source->path
targetPath
target->path
source
Object
It should have a structure like the one defined in the next section
No, if the Component whose name is defined in component has properties that map to the ones defined within source
hostname
String
It should be a valid host address
Yes
Component property
Operator argument
hostname
source->hostname target->hostname
username
source->username target->username
password
source->password target->password
component
String
It should match the name of an existing Component of the System under test
no
The name of the Component whose properties can be used as arguments of the operator
detach
Boolean
no
False
The execution mode of the shell command.
Default (False) execution is synchronous; detached (True) execution is asynchronous and returns immediately
username
String
no, if the Component whose name is defined in component has a property named username
SSH login username
password
String
cannot be set if key is already set
no, if the Component whose name is defined in component has a property named password
SSH login password
sshPort
Number
1≤sshPort≤65532
no
22
SSH port
key
String
cannot be set if password is already set
no, if the Component whose name is defined in component has a property named key
SSH login key. Either provide the key value directly or specify the path of the file (local to the CLI executing the create command) to read the key from. The operator supports RSA and DSA keys.
host->password
key
host->key
command
String
yes
The shell command to be executed on the remote machine
host
Object
See structure documented below
no
hostname
String
should be a valid SSH host address
no, if the Component whose name is defined in component has a property named hostname
hostname
host->hostname
username
host->username
sshPort
host->sshPort
Information relative to the target machine onto which the command has to be executed using SSH
SSH endpoint
password
${component_name.parameter_name}
${component_name.*}
name: Component Type 1
description: My Component type
parameters:
- name: x1
domain:
type: real
domain: [-5.0, 10.0]
defaultValue: -5.0
# Under this section, the operator to be used to configure the parameters is defined
operators:
WindowsFileConfigurator:
# using this OPTIONAL confTemplate property is possible to interpolate the parameter value with a prefix and a suffix
confTemplate: "PREFIX${value}SUFFIX"
component1.param1: 1024
component1.param2: Category1
component2.param3: 7
component2.param4: 35.4
name: MyComponentType
description: "MyComponentType"
parameters:
- name: param1
domain:
type: real
domain: [-5.0, 10.0]
defaultValue: -5.0
# Under this section, the operator to be used to configure the parameters is defined
operators:
WindowsFileConfigurator:
# using this OPTIONAL confTemplate property is possible to interpolate the parameter value with a prefix and a suffix
confTemplate: "X1:${value}MB"
# ...
myexecutable.exe /PARAM ${component1.param1} /PARAMS ${component2.*}
myexecutable.exe /PARAM X1:1024MB /PARAMS 7 35.4
name: RemoteConfOperatorTestStandalone
operator: WindowsFileConfigurator
arguments:
source:
hostname: template-server
username: akamas-user1
password: akamas-password1
path: C:\templates\frontend-httpd.conf
target:
hostname: frontend-server
username: akamas-user2
password: akamas-password22
path: c:\httpd\httpd.conf
name: RemoteConfOperatorTestStandalone
operator: WindowsFileConfigurator
arguments:
component: apache-server-1
name: apache-server-1
description: The Apache server instance
componentType: Apache Server 2.4
properties:
hostname: apache.akamas.io
username: administrator
sourcePath: c:\template\httpd.conf.template
targetPath: c:\httpd\httpd.conf
host:
hostname: this_is_a_hostname
username: this_is_a_username
password: this_is_a_password
sshPort: 22
key: this_is_a_key
name: Run Script
operator: Executor
arguments:
timeout: 30s
retries: 3
retry_delay: 10s
command: bash /tmp/myscript.sh
host:
hostname: frontend.akamas.io
username: akamas
key: secret.key
name: TestConnectivity
operator: Executor
arguments:
command: bash uname -a
host:
hostname: frontend.akamas.io
username: akamas
key: |-
-----BEGIN RSA PRIVATE KEY-----
RSA KEY HERE
-----END RSA PRIVATE KEY-----
name: TestConnectivity
operator: Executor
arguments:
command: bash uname -a
host:
hostname: frontend.akamas.io
username: akamas
key: path/to/key
name: TestConnectivity
operator: Executor
arguments:
command: bash uname -a
component: frontend1
name: TestConnectivity
operator: Executor
arguments:
command: bash start_load.sh
component: tester
detach: true
#!/bin/bash
set -m;
$HOME/tomcat/apache-tomcat_1299/bin/startup.sh
command: "bash $HOME/akamasScript/tomcatStart.sh"
ssh -t <user>@<server> <your command here>
switch on machine and wait for SSH
run application test in background → detached mode
execute test run
Component types are defined using a YAML manifest with the following structure:
and properties for the general section:
The parameter section describes the relationship between the component type and already defined parameters with the following properties:
The metric section describes the relationship between the component type and already defined metrics with the following properties:
Notice that component type definitions are shared across all the workspaces on the same Akamas installation, and require an account with administrative privileges to manage them.
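As an illustrative skeleton (all names below are hypothetical, not from this guide), a component type manifest combines the general, parameters, and metrics sections described above:

```yaml
# General section
name: MyComponentType
description: A short description of the component type
# Parameters section: relationship with already defined parameters
parameters:
  - name: param1
    domain:
      type: integer
      domain: [1, 1024]
    defaultValue: 128
# Metrics section: relationship with already defined metrics
metrics:
  - name: metric1
```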
Example of a component for the Cassandra component type:
Example of a component for the Linux operating system component type:
The FileConfigurator operator allows configuring systems tuned by Akamas by interpolating configuration parameters into files on remote machines.
The operator performs the following operations:
It reads an input file from a remote machine containing templates for interpolating the configuration parameters generated by Akamas
It replaces the values of configuration parameters in the input file
It writes the file with replaced configuration parameters on a specified path on another remote machine
Access on remote machines is performed using SFTP (SSH).
The FileConfigurator allows writing templates for configuration parameters in two ways:
specify that a parameter should be interpolated directly:
specify that all parameters of a component should be interpolated:
It is possible to add a prefix or suffix to interpolated configuration parameters by acting at the component-type level:
Notice that any parameter that does not contain the FileConfigurator element in its operators attribute is ignored and not written.
In the example above, the parameter x1 will be interpolated with the prefix PREFIX and the suffix SUFFIX; ${value} will be replaced with the actual value of the parameter at each experiment.
Let's assume we want to apply the following configuration:
where component1 is of type MyComponentType and MyComponentType is defined as follows:
A template file to interpolate only parameter component1.param1 and all parameters from component2 would look like this:
The file after the configuration parameters are interpolated would look like this:
Note that the file in this example contains a bash command whose arguments are constructed by interpolating configuration parameters. This represents a typical use case for the File Configurator: to construct the right bash commands that will configure a system with the new configuration parameters computed by Akamas.
source and target structures and arguments
Here follows the structure of either the source or target operator argument
component
The component argument can be used to refer to a component by name and use its properties as the arguments of the operator. In case the mapped arguments are already provided to the operator, there is no override.
In this case, the operator replaces in the template file only tokens referring to the specified component. A parameter bound to any other component will cause the substitution to fail.
where the apache-server-1 component is defined as:
Optimization goals and constraints are defined using a YAML manifest with the following structure:
where:
The function field of the Goal of a Study details the characteristics of the function Akamas should minimize or maximize to reach the desired performance objective.
The function field has the following structure:
Where:
The formula field represents the mathematical expression of the performance objective for the Study and contains variables and operators with the following characteristics:
Valid operators are: +, -, *, /, ^, sqrt(variable), log(variable), max(variable1, variable2), and min(variable1, variable2)
Each metric that is directly or indirectly part of the formula of the function of the Goal is aggregated by average by default; more specifically, Akamas computes the average of each metric within the time window selected by the windowing policy of the Study. Variables in the formula can be extended with an aggregation in the form <variable>:<aggregation>. The list of available aggregations is provided below.
The variables field contains the specification of additional variables present in the formula, variables that can offer more flexibility compared to directly specifying each metric of each Component in the formula.
Notice: each subfield of variables specifies a variable with its characteristics, the name of the subfield is the name of the variable.
The variable subfield has the following structure:
It is possible to use the notation <component_name>.<metric_name> in the metric field so that the metric's data points are automatically filtered by that component name.
The constraints field specifies constraints on the metrics of the system under test. For a configuration to be valid for the defined goal, such constraints must be satisfied. Constraints can be defined as absolute or relativeToBaseline.
Each constraint has the form of:
mathematical_operation comparison_operator value_to_compare
where valid mathematical operations include:
+ - * / ^
min max
valid comparison operators include:
> < <= >=
== != (equality, inequality)
and valid values to compare include:
absolute values (e.g., 104343)
percentage values relative to the baseline (e.g., 20%)
As an example, you could define an absolute constraint with the following snippet:
Relative constraints can be defined by adding constraints under the relativeToBaseline section. In the example below, for the configuration to be considered valid, the metric jvm.memory_used must not exceed 80% of the value measured in the baseline.
Variables used in the study formula specification and in the constraints definition can include an aggregation. The following aggregations are available: avg, min, max, sum, p90, p95, p99.
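For example (the component name jvm1 is hypothetical), a goal formula can request the 90th percentile of a metric instead of the default average:

```yaml
goal:
  objective: "minimize"
  function:
    # aggregate response_time by its 90th percentile within the selected window
    formula: "jvm1.response_time:p90"
```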
The following example refers to a study whose goal is to optimize the throughput of a Java service (jpetstore), that is to maximize the throughput (measured as elements_per_second) while keeping errors (error_rate) and latency (avg_duration, max_duration) under control (absolute values):
The following example refers to a study whose goal is to optimize the memory consumption of Docker containers in a microservices application, that is to minimize the average memory consumption of Docker containers within the application of appId="app1" by observing memory limits, also normalizing by the maximum duration of a benchmark (containers_benchmark_duration).
The LinuxConfigurator operator allows configuring systems tuned by Akamas by applying parameters related to the Linux kernel using different strategies.
The operator can configure specific Components, or every Component that has parameters related to the Linux kernel.
The parameters are applied via the SSH protocol.
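A minimal sketch, assuming a component argument analogous to the other configurator operators (the component name is hypothetical):

```yaml
name: configure linux parameters
operator: LinuxConfigurator
arguments:
  component: linux-host   # hypothetical; omit to configure every Component with Linux kernel parameters
```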
# General section
name: function_branin
description: A component type for the branin analytical function
# Parameters section
parameters:
- name: x1
domain:
type: real
domain: [-5.0, 10.0]
defaultValue: -5.0
decimals: 3
operators:
FileConfigurator:
confTemplate: "${value}"
- name: x2
domain:
type: real
domain: [0.0, 15.0]
defaultValue: 0.0
- name: x3
domain:
type: categorical
categories: [cat1,cat2,cat3]
operators:
LinuxConfigurator:
echo:
file: /sys/class/block/nvme0n1/queue/scheduler
# Metrics section
metrics:
- name: function_value
goal:
objective: "minimize"
function:
formula: "jvm1.response_time + jvm2.response_time"
constraints:
absolute:
- name: heap_used
formula: jvm1.heap_used <= 3221225472
relativeToBaseline:
- name: memory_used
formula: jvm1.memory_used <= 80%
Information relative to the source/input file to be used to interpolate optimal configuration parameters discovered by Akamas
target
Object
should have a structure like the one defined in the next section
no, if the Component whose name is defined in component has properties that map to the ones defined within target
Information relative to the target/output file to be used to interpolate optimal configuration parameters discovered by Akamas
component
String
should match the name of an existing Component of the System under test
no
The name of the Component whose properties can be used as arguments of the operator
ignoreUnsubstitutedTokens
Boolean
no
False
Behavior of the operator regarding leftover tokens in the target file.
When False, the FileConfigurator fails.
When True, the FileConfigurator succeeds regardless of leftover tokens
SSH endpoint
username
String
yes
SSH login username
password
String
cannot be set if key is already set
no
SSH login password
sshPort
Number
1≤sshPort≤65532
no
22
SSH port
key
String
cannot be set if password is already set
no
SSH login key: either provide the key value directly or specify the path of the file to import it from. The operator supports RSA and DSA keys
path
String
should be a valid path
yes
The path of the file to be used either as the source or target of the activity of applying Akamas-computed configuration parameters using files
password
source->password target->password
key
source->key target->key
sourcePath
source->path
targetPath
target->path
source
Object
should have a structure like the one defined in the next section
no, if the Component whose name is defined in component has properties that map to the ones defined within source
hostname
String
should be a valid SSH host address
yes
hostname
source->hostname target->hostname
username
source->username target->username
sshPort
source->sshPort target->sshPort
The mathematical expression of what to minimize or maximize to reach the objective of the Study.
variables
Object
See below
No
The specification of additional variables present in the formula.
Valid variables are in the form:
<component_name>.<metric_name>, which correspond directly to metrics of Components of the System under test
<variable_name>, which should match variables specified in the variables field
The name of the metric of the Components of the System under test that maps to the variable.
labels
A set of key-value pairs
No
A set of filters based on the values of the labels that are attached to the different data points of the metric. One of these labels is componentName, which contains the name of the Component the metric refers to.
aggregation
String
MAX MIN AVG
No
AVG
The strategy through which data points of the metric should be aggregated within the window produced by the application of the selected windowing strategy. By default, an average is taken.
sqrt log (log is a natural logarithm)
objective
String
minimize maximize
Yes
How Akamas should evaluate the goodness of a generated configuration: whether a configuration is considered good when it maximizes the function or when it minimizes it.
function
Object
It should have a structure like the one described in Goal function
Yes
The specification of the function to be evaluated to assess the goodness of a configuration generated by Akamas. This function is a function of the metrics of the different Components of the System under test.
constraints
List of objects
It should have a structure like the one described in Goal constraints
No
A list of constraints on aggregated metrics of the Components of the System under test; a generated configuration that violates any of them is not considered valid.
formula
String
See formula
Yes
metric
String
should match the name of a metric defined for the Components of the System under test
Yes
${component_name.parameter_name}
${component_name.*}
name: Component Type 1
description: My Component type
parameters:
- name: x1
domain:
type: real
domain: [-5.0, 10.0]
defaultValue: -5.0
# Under this section, the operator to be used to configure the parameters is defined
operators:
FileConfigurator:
# using this OPTIONAL confTemplate property is possible to interpolate the parameter value with a prefix and a suffix
confTemplate: "PREFIX${value}SUFFIX"
component1.param1: 1024
component1.param2: Category1
component2.param3: 7
component2.param4: 35.4
name: MyComponentType
description: "MyComponentType"
parameters:
- name: param1
domain:
type: real
domain: [-5.0, 10.0]
defaultValue: -5.0
# Under this section, the operator to be used to configure the parameters is defined
operators:
FileConfigurator:
# using this OPTIONAL confTemplate property is possible to interpolate the parameter value with a prefix and a suffix
confTemplate: "X1:${value}MB"
...
myexecutable.sh -PARAM ${component1.param1} -PARAMS ${component2.*}
myexecutable.sh -PARAM X1:1024MB -PARAMS 7 35.4
name: RemoteConfOperatorTestStandalone
operator: FileConfigurator
arguments:
source:
hostname: template-server
username: akamas-user1
password: akamas-password1
path: /templates/frontend-httpd.conf
target:
hostname: frontend-server
username: akamas-user2
password: akamas-password22
path: /etc/httpd/httpd.conf

name: RemoteConfOperatorTestStandalone
operator: FileConfigurator
arguments:
component: apache-server-1

name: apache-server-1
description: The Apache server instance
componentType: Apache Server 2.4
properties:
hostname: apache.akamas.io
username: ubuntu
key: key.pem
sourcePath: templates/httpd.conf.template
targetPath: /etc/httpd/httpd.conf
function:
formula: "jvm1.response_time / sqrt(x:max)"
variables:
x:
metric: "throughput"
labels:
componentName: "jvm2"

goal:
objective: "minimize"
function:
formula: "jvm.response_time"
constraints:
absolute:
- name: heap_used
formula: jvm.heap_used <= 3221225472

goal:
objective: "minimize"
function:
formula: "jvm.response_time"
constraints:
absolute:
- name: heap_used
formula: jvm.heap_used <= 3221225472
relativeToBaseline:
- name: memory_used
formula: jvm.memory_used <= 80%

goal:
objective: "maximize"
function:
formula: "jpetstore.elements_per_second"
constraints:
absolute:
- name: elements_per_second
formula: "jpetstore.elements_per_second > 55"
- name: max_duration
formula: "jpetstore.max_duration < 800"
- name: avg_duration
formula: "jpetstore.avg_duration < 70"
- name: error_rate
formula: "jpetstore.error_rate < 0.01"

goal:
objective: "minimize"
function:
formula: "containers_memory_limit/containers_benchmark_duration:max"
variables:
containers_memory_limit:
metric: "memory_limit"
labels:
appId: "app1"
containers_benchmark_duration:
metric: "benchmark_duration"
labels:
appId: "app1"

domain->type
string
{real, integer, categorical}
Yes
-
The type of the domain to be set for the parameter defined in the component-type
domain->domain
array of numbers
The numbers should be either all integers or all reals (do not omit the decimal point), depending on domain->type.
The array must contain exactly 2 elements.
No
-
The bounds to be used to define the domain of the parameter. These bounds are inclusive
domain->categories
array of strings
No
-
The possible categories that the parameter could possess
defaultValue
string, integer, real
For real and integer types, the value must be included in the domain; for categorical types, it must be one of the categories
Yes
-
The default value of the parameter
decimals
integer
[0-255]
No
5
The number of decimal digits rendered for this parameter
operators
object
The name and the parameters of a supported operator
Yes
-
Specify what operators can be used to apply the parameter
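As an illustration of the parameter properties above, here is a hedged sketch combining them (the parameter name and template are illustrative, not taken from an Optimization Pack):

```yaml
parameters:
  - name: cache_ratio        # illustrative parameter name
    domain:
      type: real
      domain: [0.0, 1.0]     # inclusive bounds, all reals
    defaultValue: 0.5
    decimals: 2              # render two decimal digits for this parameter
    operators:
      FileConfigurator:
        confTemplate: "cache_ratio=${value}"
```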
name
string
should match the following regexp:
^[a-zA-Z][a-zA-Z0-9_]*$
that is, only letters, numbers, and underscores, not starting with a number or an underscore
Notice: this should not match the name of another component
Yes
The name of the component.
description
string
Yes
A description to characterize the component.
componentType
string
notice: this should match the name of an existing component-type
Yes
The name of the component-type that defines the type of the component.
properties
object
No
General custom properties of the component. These properties can be defined freely and usually have the purpose to expose information useful for configuring the component.
name
string
It should match the name of an existing parameter.
Yes
-
name
string
It should match the name of an existing metric
Yes
The name of the parameter that should be related to the component-type
The name of the metric that should be related to the component type
Values restrictions
Required
Default
Description
file
String
It should be a path to a valid Java or Python Spark application file
Yes
This operator automatically maps some properties of its component to some arguments. If the mapped arguments are already provided to the operator, they are not overridden.
hostname
hostname
username
username
sshPort
sshPort
password
password
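As a sketch of this mapping, a task can rely on the component to supply the SSH details (the operator name, component names, and paths below are assumptions for illustration):

```yaml
# Component properties are mapped automatically to the operator arguments,
# so hostname/username/sshPort need not be repeated in the task.
name: spark-submit-task
operator: SparkSSHSubmit           # operator name assumed for illustration
arguments:
  component: spark-node            # a component defining hostname, username, sshPort
  file: /opt/spark-apps/app.jar    # illustrative path to the Spark application
  className: com.example.Main
  master: yarn
```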
Name
Type
The operator makes use of properties specified in the component to identify which instance should be configured, how to access it, and any other information required to apply the configuration.
Name
Type
Value restrictions
Required
Default
Description
component
String
It should match the name of an existing Component of the System under test
If no component is provided, this operator will try to configure every parameter defined for the Components of the System under test
The following table highlights the properties that can be specified on components and are used by this operator.
Name
Type
Value restrictions
Required
Default
Description
hostname
String
It should be a valid SSH host address
The properties blockDevices and networkDevices allow specifying which parameters to apply to each block/network-device associated with the Component, as well as which block/network-device should be left untouched by the LinuxConfigurator.
If the properties are omitted, then all block/network-devices associated with the Component will be configured with all the available related parameters.
All block-devices called loopN (where N is an integer greater than or equal to 0) are automatically excluded from the Component's block-devices
The properties blockDevices and networkDevices are lists of objects with the following structure:
Name
Type
Value restrictions
Required
Default
Description
name
String
It should be a valid regular expression to match block/network-devices
In this example, only the parameters os_StorageReadAhead and os_StorageQueueScheduler are applied to all the devices that match the regex "xvd[a-z]" (i.e. xvda, xvdb, …, xvdz).
In this example, only the parameter os_StorageMaxSectorsKb is applied to the block devices xvdb and loop0.
Note that the parameter is also applied to the block device loop0 because it is explicitly specified in the name filter; this overrides the default behavior by which loopN devices are excluded by the Linux Optimization Pack
In this example, no parameters are applied to the wlp4s0 network device, which is therefore excluded from the optimization.
Some configuration parameters related to the Linux kernel may be applied using the strategies supported by this operator, while others may be applied with different strategies (e.g., using a file written to a remote machine). To support this scenario, it is necessary to specify at the ComponentType level which parameters should be applied with the LinuxConfigurator, and which strategy should be used to configure each parameter. This information is already embedded in the Linux Optimization Pack and, usually, no customization is required.
With this strategy, a parameter is configured by leveraging the sysctl utility. The sysctl variable to map to the parameter that needs to be configured is specified using the key argument.
With this strategy, a parameter is configured by echoing and piping its value into a provided file. The path of the file is specified using the file argument.
With this strategy, each possible value of a parameter is mapped to a command to be executed on the machine the LinuxConfigurator operates on (this is especially useful for categorical parameters).
With this strategy, a parameter is configured by executing a command into which the parameter value is interpolated.
gc_count
collections/s
The total number of garbage collections
gc_duration
seconds
The garbage collection duration
heap_hard_limit
bytes
The size of the heap
csproj_System_GC_Server
categorical
CPUs
Optimization studies are defined using a YAML manifest with the following structure:
with the following mandatory properties:
name: Cassandra
description: The Cassandra NoSQL database version 3
parameters:
- name: cassandra_compactionStrategy
domain:
type: categorical
categories: [A, B]
defaultValue: A
metrics:
- name: total_rate
- name: read_rate
- name: write_rate
- name: read_response_time_avg
- name: read_response_time_p90
- name: read_response_time_p99
- name: read_response_time_max
- name: write_response_time_avg
- name: write_response_time_p90
- name: write_response_time_p99
- name: write_response_time_max

name: Linux OS
description: A component type for the Linux Operating System
parameters:
#CPU Related
- name: os_cpuSchedMinGranularity
domain:
type: integer
domain: [300000, 30000000]
defaultValue: 3000000
- name: os_cpuSchedWakeupGranularity
domain:
type: integer
domain: [400000, 40000000]
defaultValue: 4000000
- name: os_CPUSchedMigrationCost
domain:
type: integer
domain: [100000, 5000000]
defaultValue: 500000
- name: os_CPUSchedChildRunsFirst
domain:
type: integer
domain: [0, 1]
defaultValue: 0
- name: os_CPUSchedLatency
domain:
type: integer
domain: [2400000, 240000000]
defaultValue: 24000000
- name: os_CPUSchedAutogroupEnabled
domain:
type: integer
domain: [0, 1]
defaultValue: 1
- name: os_CPUSchedNrMigrate
domain:
type: integer
domain: [3, 320]
defaultValue: 32
#Memory Related
- name: os_MemorySwappiness
domain:
type: integer
domain: [0, 100]
defaultValue: 60
- name: os_MemoryVmVfsCachePressure
domain:
type: integer
domain: [10, 100]
defaultValue: 100
- name: os_MemoryVmMinFree
domain:
type: integer
domain: [10240, 1024000]
defaultValue: 67584
- name: os_MemoryVmDirtyRatio
domain:
type: integer
domain: [1, 99]
defaultValue: 10
- name: os_MemoryTransparentHugepageEnabled
domain:
type: categorical
categories: ['True', 'False']
defaultValue: 'True'
- name: os_MemoryTransparentHugepageDefrag
domain:
type: categorical
categories: ['True', 'False']
defaultValue: 'True'
- name: os_MemorySwap
domain:
type: categorical
categories: ['True', 'False']
defaultValue: 'True'
- name: os_MemoryVmDirtyExpire
domain:
type: integer
domain: [300, 30000]
defaultValue: 3000
- name: os_MemoryVmDirtyWriteback
domain:
type: integer
domain: [50, 5000]
defaultValue: 500
metrics:
- name: cpu_num
- name: cpu_util
- name: mem_util
- name: load_avg
- name: swapins
- name: swapouts
- name: disk_iops_writes
- name: disk_iops_reads
- name: disk_iops_total
- name: disk_await_worst
- name: proc_blocked
- name: context_switch
- name: tcp_retrans
- name: tcp_tozerowin
- name: net_band_rx_bits
- name: net_band_tx_bits
- name: network_in_byte_rate
- name: network_out_byte_rate
- name: mem_fault_minor
- name: mem_fault_major
- name: mem_active_file
- name: mem_active_anon
- name: mem_inactive_file
- name: mem_inactive_anon

- name: LinuxConf
operator: LinuxConfigurator
arguments:
component: ComponentName

blockDevices:
- name: "xvd[a-z]"
parameters:
- os_StorageReadAhead
- os_StorageQueueScheduler

blockDevices:
- name: "xvdb|loop0"
parameters:
- os_StorageMaxSectorsKb

networkDevices:
- name: wlp4s0
parameters: []

name: Component Type 1
description: My Component type
parameters:
- name: net_forwarding
domain:
type: integer
domain: [0, 1]
defaultValue: 1
operators:
# the parameter is configured using LinuxConfigurator
LinuxConfigurator:
sysctl:
key: net.ipv4.forwarding

name: Component Type 1
description: My Component type
parameters:
- name: os_MemoryTransparentHugepageEnabled
domain:
type: categorical
categories: [always, never]
defaultValue: always
operators:
LinuxConfigurator:
echo:
file: /sys/kernel/mm/transparent_hugepage/enabled

name: Component Type 1
description: My Component type
parameters:
- name: os_MemorySwap
domain:
type: categorical
categories: [swapon, swapoff]
defaultValue: swapon
operators:
LinuxConfigurator:
map:
swapon: command1
swapoff: command2

name: Component Type 1
description: My Component type
parameters:
- name: os_MemorySwap
domain:
type: categorical
categories: [swapon, swapoff]
defaultValue: swapon
operators:
LinuxConfigurator:
command:
cmd: sudo ${value} -a

Spark application to submit (jar or python file)
args
List of Strings, Numbers or Booleans
Yes
Additional application arguments
master
String
It should be a valid supported Master URL:
local
local[K]
local[K,F]
local[*]
local[*,F]
spark://HOST:PORT
spark://HOST1:PORT1,HOST2:PORT2
yarn
Yes
The master URL for the Spark cluster
deployMode
client cluster
No
cluster
Whether to launch the driver locally (client) or in the cluster (cluster)
className
String
No
The entry point of the Java application. Required for Java applications.
name
String
No
Name of the task. When submitted, the IDs of the study, experiment, and trial will be appended.
jars
List of Strings
Each item of the list should be a path that matches an existing jar file
No
A list of jars to be added in the classpath.
pyFiles
List of Strings
Each item of the list should be a path that matches an existing python file
No
A list of python scripts to be added to the PYTHONPATH
files
List of Strings
Each item of the list should be a path that matches an existing file
No
A list of files to be added to the context of the spark-submit command
conf
Object (key-value pairs)
No
Mapping containing additional Spark configurations. See Spark documentation.
envVars
Object (key-value pairs)
No
Env variables when running the spark-submit command
verbose
Boolean
No
true
Whether additional debugging output should be printed
sparkSubmitExec
String
It should be a path that matches an existing executable
No
The default for the Spark installation
The path of the spark-submit executable command
sparkHome
String
It should be a path that matches an existing directory
No
The default for the Spark installation
The path of the SPARK_HOME
proxyUser
String
No
The user to be used to execute Spark applications
hostname
String
It should be a valid SSH host address
No, if the Component whose name is defined in component has a property named hostname
SSH host address
username
String
No, if the Component whose name is defined in component has a property named username
SSH login username
sshPort
Number
1≤sshPort≤65532
No
22
SSH port
password
String
Cannot be set if key is already set
No, if the Component whose name is defined in component has a property named password
SSH login password
key
String
Cannot be set if password is already set
No, if the Component whose name is defined in component has a property named key
SSH login key, provided either directly as its value or as the path of the file to import it from. The operator supports RSA and DSA keys.
component
String
It should match the name of an existing Component of the System under test
Yes
The name of the Component whose properties can be used as arguments of the operator
key
key
No
The name of the Component for which available Linux kernel parameters will be configured
Yes
SSH host address
sshPort
Integer
1≤sshPort≤65532
Yes
22
SSH port
username
String
Yes
SSH login username
key
Multiline string
Either key or password is required
SSH login key, provided either directly as its value or as the path of the file to import it from. The operator supports RSA and DSA keys
password
String
Either key or password is required
blockDevices
List of objects
It should have a structure like the one described in the next section
No
Allows the user to restrict and specify to which block-devices the block-device-related parameters are applied
networkDevices
List of objects
It should have a structure like the one described in the next section
No
Allows the user to restrict and specify to which network-devices the network-device-related parameters are applied
Yes
A regular expression that matches block/network-devices to configure with related parameters of the Component
parameters
List of strings
It should contain the names of matching parameters of the Component
No
The list of parameters to be configured for the specified block/network-devices. If the list is empty, then no parameter will be applied for the block/network-devices matched by name
false
true, false
yes
The main flavor of the GC: set it to false for workstation GC or true for server GC. To be set in csproj file and requires rebuild.
csproj_System_GC_Concurrent
categorical
boolean
true
true, false
yes
Configures whether background (concurrent) garbage collection is enabled (setting to true). To be set in csproj file and requires rebuild.
runtime_System_GC_Server
categorical
boolean
false
true, false
yes
The main flavor of the GC: set it to false for workstation GC or true for server GC. To be set in csproj file and requires rebuild.
runtime_System_GC_Concurrent
categorical
boolean
true
true, false
yes
Configures whether background (concurrent) garbage collection is enabled (setting to true). To be set in csproj file and requires rebuild.
runtime_System_GC_HeapCount
integer
heapcount
8
1 → 1000
no
Limits the number of heaps created by the garbage collector. To be set in runtimeconfig.json in runtimeOptions: configProperties
runtime_System_GC_CpuGroup
categorical
boolean
0
1, 0
no
Configures whether the garbage collector uses CPU groups or not. Default is false. To be set in runtimeconfig.json
runtime_System_GC_NoAffinitize
categorical
boolean
false
true, false
no
Specifies whether to affinitize garbage collection threads with processors. To affinitize a GC thread means that it can only run on its specific CPU. To be set in runtimeconfig.json in runtimeOptions: configProperties
runtime_System_GC_HeapHardLimit
integer
bytes
20971520
16777216 → 1099511627776
no
Specifies the maximum commit size, in bytes, for the GC heap and GC bookkeeping. To be set in runtimeconfig.json in runtimeOptions: configProperties
runtime_System_GC_HeapHardLimitPercent
real
percent
0.75
0.1 → 100.0
no
Specifies the allowable GC heap usage as a percentage of the total physical memory. To be set in runtimeconfig.json in runtimeOptions: configProperties.
runtime_System_GC_HighMemoryPercent
integer
bytes
20971520
16777216 → 1099511627776
no
Specifies the memory threshold that triggers the execution of a garbage collection. To be set in runtimeconfig.json.
runtime_System_GC_RetainVM
categorical
boolean
false
true, false
no
Configures whether segments that should be deleted are put on a standby list for future use or are released back to the operating system (OS). Default is false. To be set in runtimeconfig.json in runtimeOptions: configProperties
runtime_System_GC_LOHThreshold
integer
bytes
85000
850000 → 1099511627776
no
Specifies the threshold size, in bytes, that causes objects to go on the large object heap (LOH). To be set in runtimeconfig.json in runtimeOptions: configProperties
webconf_maxconnection
integer
connections
2
2 → 1000
no
This setting controls the maximum number of outgoing HTTP connections that you can initiate from a client. To be set in web.config (target app only) or machine.config (global)
webconf_maxIoThreads
integer
threads
20
20 → 1000
no
Controls the maximum number of I/O threads in the .NET thread pool. Automatically multiplied by the number of available CPUs. To be set in web.config (target app only) or machine.config (global). It requires autoConfig=false
webconf_minIoThreads
integer
threads
20
20 → 1000
no
The minIoThreads setting enables you to configure a minimum number of worker threads and I/O threads for load conditions. To be set in web.config (target app only) or machine.config (global). It requires autoConfig=false
webconf_maxWorkerThreads
integer
threads
20
20 → 1000
no
This setting controls the maximum number of worker threads in the thread pool. This number is then automatically multiplied by the number of available CPUs. To be set in web.config (target app only) or machine.config (global). It requires autoConfig=false
webconf_minWorkerThreads
integer
threads
20
20 → 1000
no
The minWorkerThreads setting enables you to configure a minimum number of worker threads and I/O threads for load conditions. To be set in web.config (target app only) or machine.config (global). It requires autoConfig=false
webconf_minFreeThreads
integer
threads
8
8 → 800
no
Used by the worker process to queue all the incoming requests if the number of available threads in the thread pool falls below its value. To be set in web.config (target app only) or machine.config (global). It requires autoConfig=false
webconf_minLocalRequestFreeThreads
integer
threads
4
4 → 7600
no
Used to queue requests from localhost (where a Web application sends requests to a local Web service) if the number of available threads falls below it. To be set in web.config (target app only) or machine.config (global). It requires autoConfig=false
webconf_autoConfig
categorical
boolean
true
true, false
no
Enables setting the system.web configuration parameters. To be set in web.config (target app only) or machine.config (global)
system
object reference
TRUE
The system the study refers to
name
string
TRUE
The name of the study
goal
object
TRUE
The goal and constraint description - see
kpis
list
FALSE
The KPIs description - see
numberOfTrials
integer
FALSE
1
The number of trials for each experiment - see below
trialAggregation
string
MAX, MIN, AVG
FALSE
AVG
The aggregation used to calculate the score across multiple trials - see below
parametersSelection
list
FALSE
all
The list of parameters to be tuned - see
metricsSelection
list
FALSE
all
The list of metrics - see
workloadsSelection
object array
FALSE
The list of defined workloads - this only applies to live optimization studies - see
windowing
string
FALSE
trim
The windowing strategy - this only applies to offline optimization studies - see
workflow
object reference
TRUE
The workflow the study refers to
steps
list
TRUE
The description of the steps - see
Some of these optional properties depend on whether the study is an offline or live optimization study.
It is possible to perform more than one trial per experiment to validate the score of a configuration under test, e.g., to consider noisy environments.
The following fragment of the YAML definition of a study sets the number of trials to 3:
Notice: This is a global property of the study which can be overwritten for each step.
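For instance, assuming the per-step override works as stated above, a study fragment might look like the following sketch (parameter names are illustrative):

```yaml
numberOfTrials: 3          # study-level default
steps:
  - name: baseline
    type: baseline
    numberOfTrials: 1      # assumed per-step override, applied to the baseline only
    values:
      jvm.maxHeap: 2048
  - name: optimize
    type: optimize
    numberOfExperiments: 20
```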
The trial aggregation policy defines how trial scores are aggregated to form experiment scores.
There are three different types of strategies to aggregate trial scores:
AVG: the score of an experiment is the average of the scores of its trials - this is the default
MIN: the score of an experiment is the minimum among the scores of its trials
MAX: the score of an experiment is the maximum among the scores of its trials
The following fragment of the YAML definition of a study sets the trial aggregation to MAX:
The following system refers to an offline optimization study for a system modeling an e-commerce service, where a windowing strategy is specified:
The following offline study refers to a tuning initiative for a Cassandra-based system (ID 2)
The following offline study is for tuning another Cassandra-based system (ID 3) by acting only on JVM and Linux parameters
The NeoLoadWeb operator allows piloting performance tests on a target system by leveraging the Tricentis NeoLoad Web solution.
Once triggered, this operator configures and starts the execution of a NeoLoad test run on the remote endpoint. If the test is unable to run, the operator blocks the Akamas workflow and issues an error.
This operator requires five pieces of information to pilot successfully performance tests within Akamas:
The location of a .zip archive (project file) containing the definition of the performance test. This location can be a URL accessible via HTTP/HTTPS or a file path accessible via SFTP. Otherwise, the unique identifier of a previously uploaded project must be provided.
The name of the scenario to be used for the test
The URL of the NeoLoad Web API (either on-premise or SaaS)
When a projectFile is specified, the operator uploads the provided project to NeoLoad and launches the specified scenario. After the execution of the scenario, the project is deleted from NeoLoad. When a projectId is specified, the operator expects the project to be already available on NeoLoad. Please refer to the NeoLoad documentation on how to upload a project and obtain a project ID.
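For instance, a task referencing a previously uploaded project could be sketched as follows (the UUID and zone ids are placeholders, not real values):

```yaml
name: task1
operator: NeoLoadWeb
arguments:
  projectId: "123e4567-e89b-12d3-a456-426614174000"  # placeholder id of an uploaded project
  scenarioName: scenario1
  accountToken: "ACCOUNT TOKEN HERE"
  lgZones: "ZoneId1:10,ZoneId2:5"                    # optional: LG zones with LG counts
  controllerZoneId: "ZoneId1"                        # optional: controller zone
```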
ProjectFile structure and arguments

The projectFile argument needs to be specified differently depending on the protocol used to get the specification of the performance test:
HTTP/HTTPS
SSH (SFTP)
Here follows the structure of the projectFile argument in the case in which HTTP/HTTPS is used to get the specification of the performance test:
with its arguments:
Here follows the structure of the projectFile argument in the case in which SFTP is used to get the specification of the performance test.
with its arguments
component structure and arguments

The component argument can be used to refer to a component by name and use its properties as the arguments of the operator.
system: 1
name: Optimizing the e-shop application
goal:
objective: maximize
function:
formula: payments_per_sec
variables:
payments_per_sec:
metric: eshop_payments
labels:
componentName: eshop
workflow: eshop_jmeter_test
steps:
- name: baseline
type: baseline
values:
tomcat.maxThreads: 1024
jvm.maxHeap: 2048
jvm.garbageCollectorType: G1GC
postgres.shared_buffers: 4096

numberOfTrials: 3

trialAggregation: MAX # Other possible values are AVG, MIN

system: "bde4f259-9a51-4c67-87aa-3c5bc599c6b9" # id of the system to optimize with the actions defined in this study
workflow: "eshop_jmeter_test" # name of the workflow to use to perform trials
name: Optimizing the e-shop application # name of the study
goal: # the performance goal to achieve
objective: "maximize"
function:
formula: "eshop.payments_per_second"
windowing: # the temporal window in which to compute the score of a trial
type: "trim"
trim: ["10s", "0s"] # use the duration of the trial minus 0s from start and end to compute the score of the trial
parametersSelection: "all" # use all available configuration parameters
metricsSelection: "all" # gather all metrics
steps: # the steps to conduct to perform experiments and trials
- name: "my_baseline" # do first a baseline with the provided configuration
type: "baseline"
values:
jvm.maxHeap: 2048
jvm.gcType: "-XX:+UseParallelGC"
- name: my_optimization # then do 200 optimization experiments of 2 trials each
type: optimize
numberOfExperiments: 200
numberOfTrials: 2

system: 2
name: Optimizing the cassandra - team 2
goal:
objective: minimize
function:
formula: read_response_time_p90
variables:
read_response_time_p90:
metric: read_response_time_p90
labels:
componentName: cassandra
windowing:
type: trim
trim: [5m, 1m]
workflow: cassandra_workflow
parametersSelection:
- name: cassandra_jvm.jvm_maxHeapSize
- name: cassandra.cassandra_concurrentReads
- name: cassandra.cassandra_concurrentWrites
- name: cassandra.cassandra_fileCacheSizeInMb
- name: cassandra.cassandra_memtableCleanupThreshold
- name: cassandra.cassandra_concurrentCompactors
steps:
- name: baseline_step
type: baseline
values:
cassandra_jvm.jvm_maxHeapSize: 1024
cassandra.cassandra_concurrentReads: 32
cassandra.cassandra_concurrentWrites: 32
cassandra.cassandra_fileCacheSizeInMb: 512
cassandra.cassandra_memtableCleanupThreshold: 0.11
cassandra.cassandra_concurrentCompactors: 2
- name: optimization_step
type: optimize
optimizer: CALABI
numberOfExperiments: 50

system: 3
name: Optimizing a Cassandra NoSQL database version 3 (jvm + os parameters)
goal:
objective: minimize
function:
formula: (x1+x2)/2
variables:
x1:
metric: write_response_time_p90
labels:
componentName: cassandra_team1
x2:
metric: read_response_time_p90
labels:
componentName: cassandra_team1
windowing:
type: trim
trim: [8m,2m]
numberOfTrials: 2
workflow: cassandra_workflow_jvm_os
parametersSelection:
- name: JVM1.jvm_maxHeapSize
- name: JVM1.jvm_newRatio
- name: JVM1.jvm_survivorRatio
- name: JVM1.jvm_maxTenuringThreshold
- name: JVM1.jvm_gcType
- name: JVM1.jvm_concurrentGCThreads
- name: os1.os_cpuSchedMinGranularity
- name: os1.os_cpuSchedWakeupGranularity
- name: os1.os_CPUSchedMigrationCost
- name: os1.os_CPUSchedChildRunsFirst
- name: os1.os_CPUSchedLatency
steps:
- name: baseline_step
type: baseline
values:
JVM_team1.jvm_maxHeapSize: 1024
JVM_team1.jvm_newRatio: 2
JVM_team1.jvm_survivorRatio: 8
JVM_team1.jvm_maxTenuringThreshold: 15
JVM_team1.jvm_gcType: UseConcMarkSweepGC
JVM_team1.jvm_concurrentGCThreads: 8
os_team1.os_cpuSchedMinGranularity: 3000000
os_team1.os_cpuSchedWakeupGranularity: 4000000
os_team1.os_CPUSchedMigrationCost: 500000
os_team1.os_CPUSchedChildRunsFirst: 0
os_team1.os_CPUSchedLatency: 24000000
- name: optimization_sobol
type: optimize
optimizer: SOBOL
numberOfExperiments: 3
- name: optimization_calabi
type: optimize
optimizer: CALABI
numberOfExperiments: 50

The account token used to access the NeoLoad Web APIs
The name of the scenario to be used for the performance test piloted by Akamas
projectId
String
It should be a valid UUID
No, if a projectFile is already defined
The identifier of a previously uploaded project file. Has precedence over projectFile
projectFile
Object
It should have a structure like the one described here below
No, if a projectId is already defined
The specification of the strategy to be used to get the archive containing the specification of the performance test to be piloted by Akamas. When both are defined, projectId takes precedence.
neoloadProjectFilesApi
String
It should be a valid URL or IP
No
The address of the API to be used to upload project files to NeoLoad Web
neoloadApi
String
It should be a valid URL or IP
No
The address of the Neotys' NeoLoad Web API
lgZones
String
Comma-separated list of zones and number of LG
No
The list of LG zones id with the number of the LGs. Example: "ZoneId1:10,ZoneId2:5". If empty, the default zone will be used with one LG.
controllerZoneId
String
A controller zone Id
No
The controller zone Id. If empty, the default zone will be used.
component
String
It should match the name of an existing component of the System under test
No
The name of the component whose properties can be used as arguments of the operator.
accountToken
String
It should match an existing access token registered with NeoLoad Web
No, if specified in the component. See example below
The token to be used to authenticate requests against the NeoLoad Web APIs
The URL of the project file
verifySSL
Boolean
No
true
If the https connection should be verified using the certificates available on the machine in which the operator is running
SSH host address
username
String
Yes
SSH login username
password
String
No. Either password or key should be provided
SSH login password
sshPort
Number (integer)
1≤sshPort≤65532
22
SSH port
key
String
No, Either password or key should be provided
SSH login key, provided either directly as its value or as the path of the file to import it from. The operator supports RSA and DSA keys.
path
String
It should be a valid path on the SSH host machine
Yes
The path of the project file
scenarioName
scenarioName
controllerZoneId
controllerZoneId
lgZones
lgZones
deleteProjectAfterTest
deleteProjectAfterTest
url
projectFile->http->url
verifySSL
projectFile->http->verifySSL
hostname
projectFile->ssh->hostname
username
projectFile->ssh->username
password
projectFile->ssh->password
key
projectFile->ssh->key
sshPort
projectFile->ssh->sshPort
path
projectFile->ssh->path
scenarioName
String
It should match an existing scenario in the project file. It can be retrieved from the "runtime" section of your NeoLoad controller.
No, if the component whose name is defined in component has a property that maps to scenarioName
url
String
It should be a valid URL or IP
Yes
hostname
String
It should be a valid SSH host address
Yes
neoloadProjectFilesApi
neoloadProjectFilesApi
neoloadApi
neoloadApi
accountToken
accountToken
# ...
projectFile:
http:
url: http://url_of_project_file

projectFile:
ssh:
hostname: this_is_a_hostname
username: this_is_a_username
sshPort: 22
key: this_is_a_key
path: /path/to/project/file

name: task1
operator: NeoLoadWeb
arguments:
projectFile:
ssh:
hostname: akamas-machine-1
username: akamas
key: |-
-----BEGIN RSA PRIVATE KEY-----
RSA KEY HERE
-----END RSA PRIVATE KEY-----
path: projects/project1.zip
scenarioName: scenario1
accountToken: "ACCOUNT TOKEN HERE"

name: task1
operator: NeoLoadWeb
arguments:
component: component1
accountToken: "ACCOUNT TOKEN HERE"

percent
The average memory utilization %
nodejs_gc_heap_used
bytes
GC heap used
nodejs_rss
bytes
Process Resident Set Size (RSS)
nodejs_v8_heap_total
bytes
V8 heap total
nodejs_v8_heap_used
bytes
V8 heap used
nodejs_number_active_threads
threads
Number of active threads
nodejs_suspension_time
percent
Suspension time %
nodejs_active_handles
handles
Number of active libuv handles grouped by handle type. Each handle type is a C++ class name
nodejs_active_handles_total
handles
Total number of active handles
nodejs_active_requests
requests
Number of active libuv requests grouped by request type. Each request type is a C++ class name
nodejs_active_requests_total
requests
Total number of active requests
nodejs_eventloop_lag_max_seconds
seconds
The maximum recorded event loop delay
nodejs_eventloop_lag_mean_seconds
seconds
The mean of the recorded event loop delays
nodejs_eventloop_lag_min_seconds
seconds
The minimum recorded event loop delay
nodejs_eventloop_lag_p50_seconds
seconds
The 50th percentile of the recorded event loop delays
nodejs_eventloop_lag_p90_seconds
seconds
The 90th percentile of the recorded event loop delays
nodejs_eventloop_lag_p99_seconds
seconds
The 99th percentile of the recorded event loop delays
nodejs_eventloop_lag_seconds
seconds
Lag of event loop in seconds
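As an illustration of how the percentile metrics above relate to the raw event-loop delays, the following sketch computes nearest-rank percentiles over a hypothetical set of recorded delays (the helper function and sample values are illustrative, not part of the optimization pack):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the ceil(p/100 * N)-th smallest sample."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical recorded event-loop delays, in seconds
delays = [0.001, 0.002, 0.002, 0.003, 0.010]

print(percentile(delays, 50))  # 0.002
print(percentile(delays, 99))  # 0.010
```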
nodejs_external_memory_bytes
bytes
NodeJS external memory size in bytes
nodejs_gc_duration_seconds_bucket
seconds
The cumulative count of observations per histogram bucket of garbage collection duration, by kind (one of major, minor, incremental, or weakcb)
nodejs_gc_duration_seconds_count
seconds
The total number of observations of garbage collection duration, by kind (one of major, minor, incremental, or weakcb)
nodejs_gc_duration_seconds_sum
seconds
The total sum of observed garbage collection durations, by kind (one of major, minor, incremental, or weakcb)
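As with any Prometheus histogram, the `_sum` and `_count` series can be combined to derive an average GC pause; a minimal sketch (the scraped values below are made up for illustration):

```python
# Hypothetical values scraped from the two series for one GC kind
gc_duration_sum_seconds = 0.125   # nodejs_gc_duration_seconds_sum
gc_duration_count = 50            # nodejs_gc_duration_seconds_count

# Mean GC pause = total observed duration / number of observations
mean_gc_pause_seconds = gc_duration_sum_seconds / gc_duration_count
print(mean_gc_pause_seconds)  # 0.0025
```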
nodejs_heap_size_total_bytes
bytes
Process heap size from NodeJS in bytes
nodejs_heap_size_used_bytes
bytes
Process heap size used from NodeJS in bytes
nodejs_heap_space_size_available_bytes
bytes
Process heap size available from NodeJS in bytes
nodejs_heap_space_size_total_bytes
bytes
Process heap space size total from NodeJS in bytes
nodejs_heap_space_size_used_bytes
bytes
Process heap space size used from NodeJS in bytes
process_cpu_seconds_total
seconds
Total user and system CPU time spent in seconds
process_cpu_system_seconds_total
seconds
Total system CPU time spent in seconds
process_cpu_user_seconds_total
seconds
Total user CPU time spent in seconds
process_heap_bytes
bytes
Process heap size in bytes
process_max_fds
fds
Maximum number of open file descriptors
process_open_fds
fds
Number of open file descriptors
process_resident_memory_bytes
bytes
Resident memory size in bytes
process_virtual_memory_bytes
bytes
Virtual memory size in bytes
--allocation-site-pretenuring, --no-allocation-site-pretenuring
yes
Pretenure with allocation sites
v8_min_semi_space_size
integer
megabytes
0
0 → 1048576
yes
Min size of a semi-space (in MBytes), the new space consists of two semi-spaces
v8_max_semi_space_size
integer
megabytes
0
0 → 1048576
yes
Max size of a semi-space (in MBytes), the new space consists of two semi-spaces. This parameter is equivalent to v8_max_semi_space_size_ordinal.
v8_max_semi_space_size_ordinal
ordinal
megabytes
16
2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048
yes
Max size of a semi-space (in MBytes), the new space consists of two semi-spaces. This parameter is equivalent to v8_max_semi_space_size but forces power of 2 values.
v8_semi_space_growth_factor
integer
2
0 → 100
yes
Factor by which to grow the new space
v8_max_old_space_size
integer
megabytes
0
0 → 1048576
yes
Max size of the old space (in Mbytes)
v8_max_heap_size
integer
megabytes
0
0 → 1048576
yes
Max size of the heap (in MBytes); both max_semi_space_size and max_old_space_size take precedence. All three flags cannot be specified at the same time.
v8_initial_heap_size
integer
megabytes
0
0 → 1048576
yes
Initial size of the heap (in Mbytes)
v8_initial_old_space_size
integer
megabytes
0
0 → 1048576
yes
Initial old space size (in Mbytes)
v8_parallel_scavenge
categorical
--parallel-scavenge
--parallel-scavenge, --no-parallel-scavenge
yes
Parallel scavenge
v8_scavenge_task_trigger
integer
80
1 → 100
yes
Scavenge task trigger in percent of the current heap limit
v8_scavenge_separate_stack_scanning
categorical
--no-scavenge-separate-stack-scanning
--scavenge-separate-stack-scanning, --no-scavenge-separate-stack-scanning
yes
Use a separate phase for stack scanning in scavenge
v8_concurrent_marking
categorical
--concurrent-marking
--concurrent-marking, --no-concurrent-marking
yes
Use concurrent marking
v8_parallel_marking
categorical
--parallel-marking
--parallel-marking, --no-parallel-marking
yes
Use parallel marking in atomic pause
v8_concurrent_sweeping
categorical
--concurrent-sweeping
--concurrent-sweeping, --no-concurrent-sweeping
yes
Use concurrent sweeping
v8_heap_growing_percent
integer
0
0 → 99
yes
Specifies heap growing factor as (1 + heap_growing_percent/100)
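The formula above can be checked directly; for example, setting heap_growing_percent to 10 yields a growing factor of 1.1 (the helper name is illustrative):

```python
def heap_growing_factor(heap_growing_percent):
    # Growing factor as defined by the parameter: 1 + heap_growing_percent / 100
    return 1 + heap_growing_percent / 100

print(heap_growing_factor(10))  # 1.1
print(heap_growing_factor(0))   # 1.0
```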
v8_os_page_size
integer
kilobytes
0
0 → 1048576
yes
Override OS page size (in KBytes)
v8_stack_size
integer
kilobytes
984
16 → 1048576
yes
Default size of stack region v8 is allowed to use (in kBytes)
v8_single_threaded
categorical
--no-single-threaded
--single-threaded, --no-single-threaded
yes
Disable the use of background tasks
v8_single_threaded_gc
categorical
--no-single-threaded-gc
--single-threaded-gc, --no-single-threaded-gc
yes
Disable the use of background gc tasks
cpu_used
CPUs
The total number of CPUs used
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
memory_used
bytes
The total amount of memory used
v8_allocation_site_pretenuring
categorical
--allocation-site-pretenuring
This page describes the Optimization Pack for the component type CentOS 7.
Notice: you can use a custom device filter to monitor a specific disk with Prometheus. You can find more information on Prometheus queries and the %FILTERS% placeholder in the Prometheus provider documentation.
There are no general constraints among CentOS 7 parameters.
This page describes the Optimization Pack for the component type Ubuntu 18.04.
Notice: you can use a custom device filter to monitor a specific disk with Prometheus. You can find more information on Prometheus queries and the %FILTERS% placeholder in the Prometheus provider documentation.
This page describes the Optimization Pack for the component type RHEL 8.
Notice: you can use a custom device filter to monitor a specific disk with Prometheus. You can find more information on Prometheus queries and the %FILTERS% placeholder in the Prometheus provider documentation.
This page describes the Optimization Pack for the component type Ubuntu 16.04.
Notice: you can use a custom device filter to monitor a specific disk with Prometheus. You can find more information on Prometheus queries and the %FILTERS% placeholder in the Prometheus provider documentation.
cpu_util_details
percent
The average CPU utilization % broken down by usage type and cpu number (e.g., cp1 user, cp2 system, cp3 soft-irq)
cpu_load_avg
tasks
The system load average (i.e., the number of active tasks in the system)
mem_util_details
percent
The memory utilization % (i.e., the % of memory used) broken down by usage type (e.g., active memory)
mem_used
bytes
The total amount of memory used
mem_used_nocache
bytes
The total amount of memory used without considering memory reserved for caching purposes
mem_total
bytes
The total amount of installed memory
mem_fault_minor
faults/s
The number of minor memory faults (i.e., faults that do not cause disk access) per second
mem_fault_major
faults/s
The number of major memory faults (i.e., faults that cause disk access) per second
mem_fault
faults/s
The number of memory faults (major + minor)
mem_swapins
pages/s
The number of memory pages swapped in per second
mem_swapouts
pages/s
The number of memory pages swapped out per second
network_out_bytes_details
bytes/s
The number of outbound network packets in bytes per second broken down by network device (e.g., eth01)
disk_util_details
percent
The utilization % of disk (i.e., how much time a disk is busy doing work) broken down by disk (e.g., disk /dev/sda1)
disk_iops_writes
ops/s
The average number of IO disk-write operations per second across all disks
disk_iops_reads
ops/s
The average number of IO disk-read operations per second across all disks
disk_iops
ops/s
The average number of IO disk operations per second across all disks
disk_response_time_read
seconds
The average response time of IO read-disk operations
disk_response_time_worst
seconds
The average response time of IO disk operations of the slowest disk
disk_response_time_write
seconds
The average response time of IO write-disk operations
disk_response_time_details
seconds
The average response time of IO disk operations broken down by disk (e.g., disk /dev/nvme01)
disk_iops_details
ops/s
The number of IO disk operations per second broken down by disk (e.g., disk /dev/nvme01)
disk_io_inflight_details
ops
The number of IO disk operations in progress (outstanding) broken down by disk (e.g., disk /dev/nvme01)
disk_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_read_bytes
bytes/s
The number of bytes per second read across all disks
disk_read_write_bytes
bytes/s
The number of bytes per second read and written across all disks
disk_write_bytes_details
bytes/s
The number of bytes per second written from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation WRITE)
disk_read_bytes_details
bytes/s
The number of bytes per second read from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation READ)
filesystem_size
bytes
The size of filesystems broken down by type and device (e.g., filesystem of type ext4 for device /dev/nvme01)
400000→40000000 ns
Scheduler Wakeup Granularity (in nanoseconds)
os_CPUSchedMigrationCost
500000 ns
100000→5000000 ns
Amount of time (in nanoseconds) after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated to another CPU, so increasing this variable reduces task migrations
os_CPUSchedChildRunsFirst
0
0→1
A freshly forked child runs before the parent continues execution
os_CPUSchedLatency
18000000 ns
2400000→240000000 ns
Targeted preemption latency (in nanoseconds) for CPU bound tasks
os_CPUSchedAutogroupEnabled
1
0→1
Enables the Linux task auto-grouping feature, where the kernel assigns related tasks to groups and schedules them together on CPUs to achieve higher performance for some workloads
os_CPUSchedNrMigrate
32
3→320
Scheduler NR Migrate
10→100 %
VFS Cache Pressure
os_MemoryVmMinFree
67584 KB
10240→1024000 KB
Minimum Free Memory
os_MemoryVmDirtyRatio
20 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyBackgroundRatio
10 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, the kernel begins to write them asynchronously in the background
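These two parameters govern the same thresholds exposed by the standard Linux vm.dirty_ratio and vm.dirty_background_ratio sysctls; assuming that mapping, a hypothetical /etc/sysctl.d fragment pinning the defaults shown above might look like:

```ini
# Dirty-page writeback thresholds (values match the defaults listed above)
vm.dirty_ratio = 20
vm.dirty_background_ratio = 10
```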
os_MemoryTransparentHugepageEnabled
always
always never
Transparent Hugepage Enablement
os_MemoryTransparentHugepageDefrag
always
always never
Transparent Hugepage Enablement Defrag
os_MemorySwap
swapon
swapon swapoff
Memory Swap
os_MemoryVmDirtyExpire
3000 centisecs
300→30000 centisecs
Memory Dirty Expiration Time
os_MemoryVmDirtyWriteback
500 centisecs
50→5000 centisecs
Memory Dirty Writeback
100→10000 packets
Network Max Backlog
os_NetworkNetIpv4TcpMaxSynBacklog
1024 packets
52→15120 packets
Network IPV4 Max Sync Backlog
os_NetworkNetCoreNetdevBudget
300 packets
30→3000 packets
Network Budget
os_NetworkNetCoreRmemMax
212992 bytes
21299→2129920 bytes
Maximum network receive buffer size that applications can request
os_NetworkNetCoreWmemMax
212992 bytes
21299→2129920 bytes
Maximum network transmit buffer size that applications can request
os_NetworkNetIpv4TcpSlowStartAfterIdle
1
0→1
Network Slow Start After Idle Flag
os_NetworkNetIpv4TcpFinTimeout
60 seconds
6→600 seconds
Network TCP timeout
os_NetworkRfs
0
0→131072
If enabled, increases the data cache hit rate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running
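Receive Flow Steering is configured through standard kernel sysctls; assuming this parameter maps to the net.core.rps_sock_flow_entries sysctl (as its 0→131072 range suggests), a hypothetical sysctl fragment enabling it with the maximum table size would be:

```ini
# Global flow-table size for Receive Flow Steering (0 disables RFS)
net.core.rps_sock_flow_entries = 131072
```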
100→10000 packets
Network Max Backlog
os_StorageRqAffinity
1
1→2
Storage Requests Affinity
os_StorageQueueScheduler
none
none kyber
Storage Queue Scheduler Type
os_StorageNomerges
0
0→2
Enables the user to disable the lookup logic involved with IO merging requests in the block layer. By default (0) all merges are enabled. With 1 only simple one-hit merges will be tried. With 2 no merge algorithms will be tried
os_StorageMaxSectorsKb
128 KB
32→128 KB
The largest IO size that the OS can issue to a block device
cpu_num
CPUs
The number of CPUs available in the system (physical and logical)
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
mem_util
percent
The memory utilization % (i.e., the % of memory used)
mem_util_nocache
percent
The memory utilization % (i.e., the % of memory used) without considering memory reserved for caching purposes
network_tcp_retrans
retrans/s
The number of network TCP retransmissions per second
network_in_bytes_details
bytes/s
The number of inbound network packets in bytes per second broken down by network device (e.g., wlp4s0)
disk_swap_util
percent
The average space utilization % of swap disks
disk_swap_used
bytes
The total amount of space used by swap disks
filesystem_util
percent
The space utilization % of filesystems broken down by type and device (e.g., filesystem of type overlayfs on device /dev/loop1)
filesystem_used
bytes
The amount of space used on the filesystems broken down by type and device (e.g., filesystem of type zfs on device /dev/nvme01)
proc_blocked
processes
The number of processes blocked (e.g., for IO or swapping reasons)
os_context_switch
switches/s
The number of context switches per second
os_cpuSchedMinGranularity
2250000 ns
300000→30000000 ns
Minimal preemption granularity (in nanoseconds) for CPU bound tasks
os_cpuSchedWakeupGranularity
os_MemorySwappiness
1
0→100
Memory Swappiness
os_MemoryVmVfsCachePressure
os_NetworkNetCoreSomaxconn
128 connections
12→1200 connections
Network Max Connections
os_NetworkNetCoreNetdevMaxBacklog
os_StorageReadAhead
128 KB
0→1024 KB
Read-ahead speeds up file access by pre-fetching data and loading it into the page cache so that it can be available earlier in memory instead of from disk
os_StorageNrRequests
3000000 ns
100 %
1000 packets
1000 packets
cpu_util_details
percent
The average CPU utilization % broken down by usage type and cpu number (e.g., cp1 user, cp2 system, cp3 soft-irq)
cpu_load_avg
tasks
The system load average (i.e., the number of active tasks in the system)
mem_util_details
percent
The memory utilization % (i.e., the % of memory used) broken down by usage type (e.g., active memory)
mem_used
bytes
The total amount of memory used
mem_used_nocache
bytes
The total amount of memory used without considering memory reserved for caching purposes
mem_total
bytes
The total amount of installed memory
mem_fault_minor
faults/s
The number of minor memory faults (i.e., faults that do not cause disk access) per second
mem_fault_major
faults/s
The number of major memory faults (i.e., faults that cause disk access) per second
mem_fault
faults/s
The number of memory faults (major + minor)
mem_swapins
pages/s
The number of memory pages swapped in per second
mem_swapouts
pages/s
The number of memory pages swapped out per second
network_out_bytes_details
bytes/s
The number of outbound network packets in bytes per second broken down by network device (e.g., eth01)
disk_util_details
percent
The utilization % of disk (i.e., how much time a disk is busy doing work) broken down by disk (e.g., disk /dev/sda1)
disk_iops_writes
ops/s
The average number of IO disk-write operations per second across all disks
disk_iops_reads
ops/s
The average number of IO disk-read operations per second across all disks
disk_iops
ops/s
The average number of IO disk operations per second across all disks
disk_response_time_read
seconds
The average response time of IO read-disk operations
disk_response_time_worst
seconds
The average response time of IO disk operations of the slowest disk
disk_response_time_write
seconds
The average response time of IO write-disk operations
disk_response_time_details
seconds
The average response time of IO disk operations broken down by disk (e.g., disk /dev/nvme01)
disk_iops_details
ops/s
The number of IO disk operations per second broken down by disk (e.g., disk /dev/nvme01)
disk_io_inflight_details
ops
The number of IO disk operations in progress (outstanding) broken down by disk (e.g., disk /dev/nvme01)
disk_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_read_bytes
bytes/s
The number of bytes per second read across all disks
disk_read_write_bytes
bytes/s
The number of bytes per second read and written across all disks
disk_write_bytes_details
bytes/s
The number of bytes per second written from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation WRITE)
disk_read_bytes_details
bytes/s
The number of bytes per second read from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation READ)
filesystem_size
bytes
The size of filesystems broken down by type and device (e.g., filesystem of type ext4 for device /dev/nvme01)
400000→40000000 ns
Scheduler Wakeup Granularity (in nanoseconds)
os_CPUSchedMigrationCost
500000 ns
100000→5000000 ns
Amount of time (in nanoseconds) after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated to another CPU, so increasing this variable reduces task migrations
os_CPUSchedChildRunsFirst
0
0→1
A freshly forked child runs before the parent continues execution
os_CPUSchedLatency
18000000 ns
2400000→240000000 ns
Targeted preemption latency (in nanoseconds) for CPU bound tasks
os_CPUSchedAutogroupEnabled
1
0→1
Enables the Linux task auto-grouping feature, where the kernel assigns related tasks to groups and schedules them together on CPUs to achieve higher performance for some workloads
os_CPUSchedNrMigrate
32
3→320
Scheduler NR Migrate
10→100 %
VFS Cache Pressure
os_MemoryVmMinFree
67584 KB
10240→1024000 KB
Minimum Free Memory
os_MemoryVmDirtyRatio
20 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyBackgroundRatio
10 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, the kernel begins to write them asynchronously in the background
os_MemoryTransparentHugepageEnabled
madvise
always never madvise
Transparent Hugepage Enablement
os_MemoryTransparentHugepageDefrag
madvise
always never madvise defer defer+madvise
Transparent Hugepage Enablement Defrag
os_MemorySwap
swapon
swapon swapoff
Memory Swap
os_MemoryVmDirtyExpire
3000 centisecs
300→30000 centisecs
Memory Dirty Expiration Time
os_MemoryVmDirtyWriteback
500 centisecs
50→5000 centisecs
Memory Dirty Writeback
100→10000 packets
Network Max Backlog
os_NetworkNetIpv4TcpMaxSynBacklog
1024 packets
52→15120 packets
Network IPV4 Max Sync Backlog
os_NetworkNetCoreNetdevBudget
300 packets
30→3000 packets
Network Budget
os_NetworkNetCoreRmemMax
212992 bytes
21299→2129920 bytes
Maximum network receive buffer size that applications can request
os_NetworkNetCoreWmemMax
212992 bytes
21299→2129920 bytes
Maximum network transmit buffer size that applications can request
os_NetworkNetIpv4TcpSlowStartAfterIdle
1
0→1
Network Slow Start After Idle Flag
os_NetworkNetIpv4TcpFinTimeout
60 seconds
6→600 seconds
Network TCP timeout
os_NetworkRfs
0
0→131072
If enabled, increases the data cache hit rate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running
100→10000 packets
Network Max Backlog
os_StorageRqAffinity
1
1→2
Storage Requests Affinity
os_StorageQueueScheduler
none
none mq-deadline
Storage Queue Scheduler Type
os_StorageNomerges
0
0→2
Enables the user to disable the lookup logic involved with IO merging requests in the block layer. By default (0) all merges are enabled. With 1 only simple one-hit merges will be tried. With 2 no merge algorithms will be tried
os_StorageMaxSectorsKb
128 KB
32→128 KB
The largest IO size that the OS can issue to a block device
cpu_num
CPUs
The number of CPUs available in the system (physical and logical)
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
mem_util
percent
The memory utilization % (i.e., the % of memory used)
mem_util_nocache
percent
The memory utilization % (i.e., the % of memory used) without considering memory reserved for caching purposes
network_tcp_retrans
retrans/s
The number of network TCP retransmissions per second
network_in_bytes_details
bytes/s
The number of inbound network packets in bytes per second broken down by network device (e.g., wlp4s0)
disk_swap_util
percent
The average space utilization % of swap disks
disk_swap_used
bytes
The total amount of space used by swap disks
filesystem_util
percent
The space utilization % of filesystems broken down by type and device (e.g., filesystem of type overlayfs on device /dev/loop1)
filesystem_used
bytes
The amount of space used on the filesystems broken down by type and device (e.g., filesystem of type zfs on device /dev/nvme01)
proc_blocked
processes
The number of processes blocked (e.g., for IO or swapping reasons)
os_context_switch
switches/s
The number of context switches per second
os_cpuSchedMinGranularity
2250000 ns
300000→30000000 ns
Minimal preemption granularity (in nanoseconds) for CPU bound tasks
os_cpuSchedWakeupGranularity
os_MemorySwappiness
1
0→100
Memory Swappiness
os_MemoryVmVfsCachePressure
os_NetworkNetCoreSomaxconn
128 connections
12→1200 connections
Network Max Connections
os_NetworkNetCoreNetdevMaxBacklog
os_StorageReadAhead
128 KB
0→1024 KB
Read-ahead speeds up file access by pre-fetching data and loading it into the page cache so that it can be available earlier in memory instead of from disk
os_StorageNrRequests
3000000 ns
100 %
1000 packets
1000 packets
cpu_util_details
percent
The average CPU utilization % broken down by usage type and cpu number (e.g., cp1 user, cp2 system, cp3 soft-irq)
cpu_load_avg
tasks
The system load average (i.e., the number of active tasks in the system)
mem_util_details
percent
The memory utilization % (i.e., the % of memory used) broken down by usage type (e.g., active memory)
mem_used
bytes
The total amount of memory used
mem_used_nocache
bytes
The total amount of memory used without considering memory reserved for caching purposes
mem_total
bytes
The total amount of installed memory
mem_fault_minor
faults/s
The number of minor memory faults (i.e., faults that do not cause disk access) per second
mem_fault_major
faults/s
The number of major memory faults (i.e., faults that cause disk access) per second
mem_fault
faults/s
The number of memory faults (major + minor)
mem_swapins
pages/s
The number of memory pages swapped in per second
mem_swapouts
pages/s
The number of memory pages swapped out per second
network_out_bytes_details
bytes/s
The number of outbound network packets in bytes per second broken down by network device (e.g., eth01)
disk_util_details
percent
The utilization % of disk (i.e., how much time a disk is busy doing work) broken down by disk (e.g., disk /dev/sda1)
disk_iops_writes
ops/s
The average number of IO disk-write operations per second across all disks
disk_iops_reads
ops/s
The average number of IO disk-read operations per second across all disks
disk_iops
ops/s
The average number of IO disk operations per second across all disks
disk_response_time_read
seconds
The average response time of IO read-disk operations
disk_response_time_worst
seconds
The average response time of IO disk operations of the slowest disk
disk_response_time_write
seconds
The average response time of IO write-disk operations
disk_response_time_details
seconds
The average response time of IO disk operations broken down by disk (e.g., disk /dev/nvme01)
disk_iops_details
ops/s
The number of IO disk operations per second broken down by disk (e.g., disk /dev/nvme01)
disk_io_inflight_details
ops
The number of IO disk operations in progress (outstanding) broken down by disk (e.g., disk /dev/nvme01)
disk_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_read_bytes
bytes/s
The number of bytes per second read across all disks
disk_read_write_bytes
bytes/s
The number of bytes per second read and written across all disks
disk_write_bytes_details
bytes/s
The number of bytes per second written from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation WRITE)
disk_read_bytes_details
bytes/s
The number of bytes per second read from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation READ)
filesystem_size
bytes
The size of filesystems broken down by type and device (e.g., filesystem of type ext4 for device /dev/nvme01)
400000→40000000 ns
Scheduler Wakeup Granularity (in nanoseconds)
os_CPUSchedMigrationCost
500000 ns
100000→5000000 ns
Amount of time (in nanoseconds) after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated to another CPU, so increasing this variable reduces task migrations
os_CPUSchedChildRunsFirst
0
0→1
A freshly forked child runs before the parent continues execution
os_CPUSchedLatency
18000000 ns
2400000→240000000 ns
Targeted preemption latency (in nanoseconds) for CPU bound tasks
os_CPUSchedAutogroupEnabled
1
0→1
Enables the Linux task auto-grouping feature, where the kernel assigns related tasks to groups and schedules them together on CPUs to achieve higher performance for some workloads
os_CPUSchedNrMigrate
32
3→320
Scheduler NR Migrate
10→100 %
VFS Cache Pressure
os_MemoryVmMinFree
67584 KB
10240→1024000 KB
Minimum Free Memory
os_MemoryVmDirtyRatio
30 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyBackgroundRatio
10 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, the kernel begins to write them asynchronously in the background
os_MemoryTransparentHugepageEnabled
never
always never madvise
Transparent Hugepage Enablement
os_MemoryTransparentHugepageDefrag
always
always never madvise defer defer+madvise
Transparent Hugepage Enablement Defrag
os_MemorySwap
swapon
swapon swapoff
Memory Swap
os_MemoryVmDirtyExpire
3000 centisecs
300→30000 centisecs
Memory Dirty Expiration Time
os_MemoryVmDirtyWriteback
500 centisecs
50→5000 centisecs
Memory Dirty Writeback
100→10000 packets
Network Max Backlog
os_NetworkNetIpv4TcpMaxSynBacklog
512 packets
52→15120 packets
Network IPV4 Max Sync Backlog
os_NetworkNetCoreNetdevBudget
300 packets
30→3000 packets
Network Budget
os_NetworkNetCoreRmemMax
212992 bytes
21299→2129920 bytes
Maximum network receive buffer size that applications can request
os_NetworkNetCoreWmemMax
212992 bytes
21299→2129920 bytes
Maximum network transmit buffer size that applications can request
os_NetworkNetIpv4TcpSlowStartAfterIdle
1
0→1
Network Slow Start After Idle Flag
os_NetworkNetIpv4TcpFinTimeout
60 seconds
6→600 seconds
Network TCP timeout
os_NetworkRfs
0
0→131072
If enabled, increases the data cache hit rate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running
100→10000 packets
Network Max Backlog
os_StorageRqAffinity
1
1→2
Storage Requests Affinity
os_StorageQueueScheduler
none
none kyber mq-deadline bfq
Storage Queue Scheduler Type
os_StorageNomerges
0
0→2
Enables the user to disable the lookup logic involved with IO merging requests in the block layer. By default (0) all merges are enabled. With 1 only simple one-hit merges will be tried. With 2 no merge algorithms will be tried
os_StorageMaxSectorsKb
256 KB
32→256 KB
The largest IO size that the OS can issue to a block device
cpu_num
CPUs
The number of CPUs available in the system (physical and logical)
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
mem_util
percent
The memory utilization % (i.e., the % of memory used)
mem_util_nocache
percent
The memory utilization % (i.e., the % of memory used) without considering memory reserved for caching purposes
network_tcp_retrans
retrans/s
The number of network TCP retransmissions per second
network_in_bytes_details
bytes/s
The number of inbound network packets in bytes per second broken down by network device (e.g., wlp4s0)
disk_swap_util
percent
The average space utilization % of swap disks
disk_swap_used
bytes
The total amount of space used by swap disks
filesystem_util
percent
The space utilization % of filesystems broken down by type and device (e.g., filesystem of type overlayfs on device /dev/loop1)
filesystem_used
bytes
The amount of space used on the filesystems broken down by type and device (e.g., filesystem of type zfs on device /dev/nvme01)
proc_blocked
processes
The number of processes blocked (e.g., for IO or swapping reasons)
os_context_switch
switches/s
The number of context switches per second
os_cpuSchedMinGranularity
2250000 ns
300000→30000000 ns
Minimal preemption granularity (in nanoseconds) for CPU bound tasks
os_cpuSchedWakeupGranularity
os_MemorySwappiness
30
0→100
Memory Swappiness
os_MemoryVmVfsCachePressure
os_NetworkNetCoreSomaxconn
128 connections
12→1200 connections
Network Max Connections
os_NetworkNetCoreNetdevMaxBacklog
os_StorageReadAhead
128 KB
0→1024 KB
Read-ahead speeds up file access by pre-fetching data and loading it into the page cache so that it can be available earlier in memory instead of from disk
os_StorageNrRequests
3000000 ns
100 %
1000 packets
1000 packets
cpu_util_details
percent
The average CPU utilization % broken down by usage type and cpu number (e.g., cp1 user, cp2 system, cp3 soft-irq)
cpu_load_avg
tasks
The system load average (i.e., the number of active tasks in the system)
mem_util_details
percent
The memory utilization % (i.e., the % of memory used) broken down by usage type (e.g., active memory)
mem_used
bytes
The total amount of memory used
mem_used_nocache
bytes
The total amount of memory used without considering memory reserved for caching purposes
mem_total
bytes
The total amount of installed memory
mem_fault_minor
faults/s
The number of minor memory faults (i.e., faults that do not cause disk access) per second
mem_fault_major
faults/s
The number of major memory faults (i.e., faults that cause disk access) per second
mem_fault
faults/s
The number of memory faults (major + minor)
mem_swapins
pages/s
The number of memory pages swapped in per second
mem_swapouts
pages/s
The number of memory pages swapped out per second
network_out_bytes_details
bytes/s
The number of outbound network packets in bytes per second broken down by network device (e.g., eth01)
disk_util_details
percent
The utilization % of disk (i.e., how much time a disk is busy doing work) broken down by disk (e.g., disk /dev/sda1)
disk_iops_writes
ops/s
The average number of IO disk-write operations per second across all disks
disk_iops_reads
ops/s
The average number of IO disk-read operations per second across all disks
disk_iops
ops/s
The average number of IO disk operations per second across all disks
disk_response_time_read
seconds
The average response time of IO read-disk operations
disk_response_time_worst
seconds
The average response time of IO disk operations of the slowest disk
disk_response_time_write
seconds
The average response time of IO write-disk operations
disk_response_time_details
seconds
The average response time of IO disk operations broken down by disk (e.g., disk /dev/nvme01)
disk_iops_details
ops/s
The number of IO disk operations per second broken down by disk (e.g., disk /dev/nvme01)
disk_io_inflight_details
ops
The number of IO disk operations in progress (outstanding) broken down by disk (e.g., disk /dev/nvme01)
disk_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_read_bytes
bytes/s
The number of bytes per second read across all disks
disk_read_write_bytes
bytes/s
The number of bytes per second read and written across all disks
disk_write_bytes_details
bytes/s
The number of bytes per second written from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation WRITE)
disk_read_bytes_details
bytes/s
The number of bytes per second read from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation READ)
filesystem_size
bytes
The size of filesystems broken down by type and device (e.g., filesystem of type ext4 for device /dev/nvme01)
400000→40000000 ns
Scheduler Wakeup Granularity (in nanoseconds)
os_CPUSchedMigrationCost
500000 ns
100000→5000000 ns
Amount of time (in nanoseconds) after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated to another CPU, so increasing this variable reduces task migrations
os_CPUSchedChildRunsFirst
0
0→1
A freshly forked child runs before the parent continues execution
os_CPUSchedLatency
18000000 ns
2400000→240000000 ns
Targeted preemption latency (in nanoseconds) for CPU bound tasks
os_CPUSchedAutogroupEnabled
1
0→1
Enables the Linux task auto-grouping feature, where the kernel assigns related tasks to groups and schedules them together on CPUs to achieve higher performance for some workloads
os_CPUSchedNrMigrate
32
3→320
Scheduler NR Migrate
10→100 %
VFS Cache Pressure
os_MemoryVmMinFree
67584 KB
10240→1024000 KB
Minimum Free Memory
os_MemoryVmDirtyRatio
20 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyBackgroundRatio
10 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, the kernel begins to write them asynchronously in the background
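To make the interplay of these two thresholds concrete, the following Python sketch computes the byte values they imply for a given amount of RAM (illustrative only; `dirty_thresholds` is a hypothetical helper, not an Akamas or kernel API):

```python
def dirty_thresholds(total_mem_bytes, dirty_ratio, dirty_background_ratio):
    """Byte thresholds implied by vm.dirty_ratio / vm.dirty_background_ratio."""
    # background writeback starts here (kernel flushes dirty pages asynchronously)
    background = total_mem_bytes * dirty_background_ratio // 100
    # writers are throttled here (forced to write during their own time slice)
    foreground = total_mem_bytes * dirty_ratio // 100
    return background, foreground

# Example: 16 GiB of RAM with the defaults shown above (20% / 10%)
bg, fg = dirty_thresholds(16 * 1024**3, 20, 10)
```

With these defaults, asynchronous background writeback kicks in well before writers are throttled, which is why the background ratio is normally kept below the foreground ratio.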
os_MemoryTransparentHugepageEnabled
always
always never
Transparent Hugepage Enablement
os_MemoryTransparentHugepageDefrag
always
always never
Transparent Hugepage Enablement Defrag
os_MemorySwap
swapon
swapon swapoff
Memory Swap
os_MemoryVmDirtyExpire
3000 centisecs
300→30000 centisecs
Memory Dirty Expiration Time
os_MemoryVmDirtyWriteback
500 centisecs
50→5000 centisecs
Memory Dirty Writeback
100→10000 packets
Network Max Backlog
os_NetworkNetIpv4TcpMaxSynBacklog
1024 packets
52→15120 packets
Network IPv4 Max SYN Backlog
os_NetworkNetCoreNetdevBudget
300 packets
30→3000 packets
Network Budget
os_NetworkNetCoreRmemMax
212992 bytes
21299→2129920 bytes
Maximum network receive buffer size that applications can request
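This cap is visible from any application: a request for a buffer larger than net.core.rmem_max is silently reduced rather than rejected. A standard-library Python sketch (on Linux the kernel also reports roughly double the granted size, to account for its own bookkeeping overhead):

```python
import socket

# Request a 4 MiB receive buffer on a UDP socket; the kernel grants at most
# what net.core.rmem_max allows, without raising an error
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)
granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
s.close()
```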
os_NetworkNetCoreWmemMax
212992 bytes
21299→2129920 bytes
Maximum network transmit buffer size that applications can request
os_NetworkNetIpv4TcpSlowStartAfterIdle
1
0→1
Network Slow Start After Idle Flag
os_NetworkNetIpv4TcpFinTimeout
60
6→600 seconds
Network TCP timeout
os_NetworkRfs
0
0→131072
If enabled, increases the data cache hit rate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running
100→10000 packets
Network Max Backlog
os_StorageRqAffinity
1
1→2
Storage Requests Affinity
os_StorageQueueScheduler
none
none kyber
Storage Queue Scheduler Type
os_StorageNomerges
0
0→2
Allows disabling the lookup logic involved with IO request merging in the block layer. By default (0) all merges are enabled; with 1 only simple one-hit merges are tried; with 2 no merge algorithms are tried
os_StorageMaxSectorsKb
128 KB
32→128 KB
The largest IO size (in KB) that the OS can issue to a block device
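The OS parameters in this table correspond to standard Linux sysctl and sysfs knobs. A minimal Python sketch for inspecting current values via /proc/sys (the name-to-parameter mapping in the comments is an assumption to verify for your distribution and kernel version):

```python
from pathlib import Path

def read_sysctl(name):
    """Read a sysctl value via /proc/sys, e.g. 'vm.swappiness'; None if absent."""
    path = Path("/proc/sys") / name.replace(".", "/")
    return path.read_text().strip() if path.exists() else None

# Standard Linux knobs assumed to back some of the parameters above
swappiness = read_sysctl("vm.swappiness")              # os_MemorySwappiness
dirty_ratio = read_sysctl("vm.dirty_ratio")            # os_MemoryVmDirtyRatio
backlog = read_sysctl("net.core.netdev_max_backlog")   # os_NetworkNetCoreNetdevMaxBacklog
```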
cpu_num
CPUs
The number of CPUs available in the system (physical and logical)
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
mem_util
percent
The memory utilization % (i.e., the % of memory used)
mem_util_nocache
percent
The memory utilization % (i.e., the % of memory used) without considering memory reserved for caching purposes
network_tcp_retrans
retrans/s
The number of network TCP retransmissions per second
network_in_bytes_details
bytes/s
The inbound network traffic in bytes per second broken down by network device (e.g., wlp4s0)
disk_swap_util
percent
The average space utilization % of swap disks
disk_swap_used
bytes
The total amount of space used by swap disks
filesystem_util
percent
The space utilization % of filesystems broken down by type and device (e.g., filesystem of type overlayfs on device /dev/loop1)
filesystem_used
bytes
The amount of space used on the filesystems broken down by type and device (e.g., filesystem of type zfs on device /dev/nvme01)
proc_blocked
processes
The number of processes blocked (e.g., for IO or swapping reasons)
os_context_switch
switches/s
The number of context switches per second
os_cpuSchedMinGranularity
2250000 ns
300000→30000000 ns
Minimal preemption granularity (in nanoseconds) for CPU bound tasks
os_cpuSchedWakeupGranularity
os_MemorySwappiness
1
0→100
Memory Swappiness
os_MemoryVmVfsCachePressure
os_NetworkNetCoreSomaxconn
128 connections
12→1200 connections
Network Max Connections
os_NetworkNetCoreNetdevMaxBacklog
os_StorageReadAhead
128 KB
0→1024 KB
Read-ahead speeds up file access by pre-fetching data and loading it into the page cache, so that it is already available in memory rather than having to be read from disk
os_StorageNrRequests
3000000 ns
100 %
1000 packets
1000 packets
cpu_num
CPUs
The number of CPUs available in the system (physical and logical)
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
mem_util
percent
The memory utilization % (i.e., the % of memory used)
mem_util_nocache
percent
The memory utilization % (i.e., the % of memory used) without considering memory reserved for caching purposes
network_tcp_retrans
retrans/s
The number of network TCP retransmissions per second
network_in_bytes_details
bytes/s
The inbound network traffic in bytes per second broken down by network device (e.g., wlp4s0)
Notice: you can use a custom device filter to monitor a specific disk with Prometheus. You can find more information on Prometheus queries and the %FILTERS% placeholder here: Prometheus provider and here: Prometheus provider metrics mapping.
disk_swap_util
percent
The average space utilization % of swap disks
disk_swap_used
bytes
The total amount of space used by swap disks
filesystem_util
percent
The space utilization % of filesystems broken down by type and device (e.g., filesystem of type overlayfs on device /dev/loop1)
filesystem_used
bytes
The amount of space used on the filesystems broken down by type and device (e.g., filesystem of type zfs on device /dev/nvme01)
proc_blocked
processes
The number of processes blocked (e.g., for IO or swapping reasons)
os_context_switch
switches/s
The number of context switches per second
os_cpuSchedMinGranularity
2250000 ns
300000→30000000 ns
Minimal preemption granularity (in nanoseconds) for CPU bound tasks
os_cpuSchedWakeupGranularity
os_MemorySwappiness
1
0→100
Memory Swappiness
os_MemoryVmVfsCachePressure
os_NetworkNetCoreSomaxconn
128 connections
12→1200 connections
Network Max Connections
os_NetworkNetCoreNetdevMaxBacklog
os_StorageReadAhead
128 KB
0→1024 KB
Read-ahead speeds up file access by pre-fetching data and loading it into the page cache, so that it is already available in memory rather than having to be read from disk
os_StorageNrRequests
There are no general constraints among RHEL 8 parameters.
cpu_num
CPUs
The number of CPUs available in the system (physical and logical)
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
mem_util
percent
The memory utilization % (i.e., the % of memory used)
mem_util_nocache
percent
The memory utilization % (i.e., the % of memory used) without considering memory reserved for caching purposes
network_tcp_retrans
retrans/s
The number of network TCP retransmissions per second
network_in_bytes_details
bytes/s
The inbound network traffic in bytes per second broken down by network device (e.g., wlp4s0)
Notice: you can use a custom device filter to monitor a specific disk with Prometheus. You can find more information on Prometheus queries and the %FILTERS% placeholder here: Prometheus provider and here: Prometheus provider metrics mapping.
disk_swap_util
percent
The average space utilization % of swap disks
disk_swap_used
bytes
The total amount of space used by swap disks
filesystem_util
percent
The space utilization % of filesystems broken down by type and device (e.g., filesystem of type overlayfs on device /dev/loop1)
filesystem_used
bytes
The amount of space used on the filesystems broken down by type and device (e.g., filesystem of type zfs on device /dev/nvme01)
proc_blocked
processes
The number of processes blocked (e.g., for IO or swapping reasons)
os_context_switch
switches/s
The number of context switches per second
os_cpuSchedMinGranularity
2250000 ns
300000→30000000 ns
Minimal preemption granularity (in nanoseconds) for CPU bound tasks
os_cpuSchedWakeupGranularity
os_MemorySwappiness
1
0→100
Memory Swappiness
os_MemoryVmVfsCachePressure
os_NetworkNetCoreSomaxconn
128 connections
12→1200 connections
Network Max Connections
os_NetworkNetCoreNetdevMaxBacklog
os_StorageReadAhead
128 KB
0→1024 KB
Read-ahead speeds up file access by pre-fetching data and loading it into the page cache, so that it is already available in memory rather than having to be read from disk
os_StorageNrRequests
cpu_util_details
percent
The average CPU utilization % broken down by usage type and CPU number (e.g., cpu1 user, cpu2 system, cpu3 soft-irq)
cpu_load_avg
tasks
The system load average (i.e., the number of active tasks in the system)
mem_util_details
percent
The memory utilization % (i.e., the % of memory used) broken down by usage type (e.g., active memory)
mem_used
bytes
The total amount of memory used
mem_used_nocache
bytes
The total amount of memory used without considering memory reserved for caching purposes
mem_total
bytes
The total amount of installed memory
mem_fault_minor
faults/s
The number of minor memory faults (i.e., faults that do not cause disk access) per second
mem_fault_major
faults/s
The number of major memory faults (i.e., faults that cause disk access) per second
mem_fault
faults/s
The number of memory faults (major + minor)
mem_swapins
pages/s
The number of memory pages swapped in per second
mem_swapouts
pages/s
The number of memory pages swapped out per second
network_out_bytes_details
bytes/s
The outbound network traffic in bytes per second broken down by network device (e.g., eth01)
disk_util_details
percent
The utilization % of each disk (i.e., how much time a disk is busy doing work), broken down by disk (e.g., disk D://)
disk_iops_writes
ops/s
The average number of IO disk-write operations per second across all disks
disk_iops_reads
ops/s
The average number of IO disk-read operations per second across all disks
disk_iops
ops/s
The average number of IO disk operations per second across all disks
disk_response_time_read
seconds
The average response time of IO read-disk operations
disk_response_time_worst
seconds
The average response time of IO disk operations of the slowest disk
disk_response_time_write
seconds
The average response time of IO write-disk operations
disk_response_time_details
seconds
The average response time of IO disk operations broken down by disk (e.g., disk /dev/nvme01)
disk_iops_details
ops/s
The number of IO disk operations per second broken down by disk (e.g., disk /dev/nvme01)
disk_io_inflight_details
ops
The number of IO disk operations in progress (outstanding) broken down by disk (e.g., disk /dev/nvme01)
disk_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_read_bytes
bytes/s
The number of bytes per second read across all disks
disk_read_write_bytes
bytes/s
The number of bytes per second read and written across all disks
disk_write_bytes_details
bytes/s
The number of bytes per second written from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation WRITE)
disk_read_bytes_details
bytes/s
The number of bytes per second read from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation READ)
filesystem_size
bytes
The size of filesystems broken down by type and device (e.g., filesystem of type ext4 for device /dev/nvme01)
3000000 ns
400000→40000000 ns
Scheduler Wakeup Granularity (in nanoseconds)
os_CPUSchedMigrationCost
500000 ns
100000→5000000 ns
Amount of time (in nanoseconds) after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated to another CPU, so increasing this variable reduces task migrations
os_CPUSchedChildRunsFirst
0
0→1
A freshly forked child runs before the parent continues execution
os_CPUSchedLatency
18000000 ns
2400000→240000000 ns
Targeted preemption latency (in nanoseconds) for CPU bound tasks
os_CPUSchedAutogroupEnabled
1
0→1
Enables the Linux task auto-grouping feature, where the kernel assigns related tasks to groups and schedules them together on CPUs to achieve higher performance for some workloads
os_CPUSchedNrMigrate
32
3→320
Scheduler NR Migrate
100 %
10→100 %
VFS Cache Pressure
os_MemoryVmMinFree
67584 KB
10240→1024000 KB
Minimum Free Memory
os_MemoryVmDirtyRatio
20 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyBackgroundRatio
10 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, the kernel begins to write them asynchronously in the background
os_MemoryTransparentHugepageEnabled
never
always never madvise
Transparent Hugepage Enablement
os_MemoryTransparentHugepageDefrag
always
always never madvise defer defer+madvise
Transparent Hugepage Enablement Defrag
os_MemorySwap
swapon
swapon swapoff
Memory Swap
os_MemoryVmDirtyExpire
3000 centisecs
300→30000 centisecs
Memory Dirty Expiration Time
os_MemoryVmDirtyWriteback
500 centisecs
50→5000 centisecs
Memory Dirty Writeback
1000 packets
100→10000 packets
Network Max Backlog
os_NetworkNetIpv4TcpMaxSynBacklog
512 packets
52→15120 packets
Network IPv4 Max SYN Backlog
os_NetworkNetCoreNetdevBudget
300 packets
30→3000 packets
Network Budget
os_NetworkNetCoreRmemMax
212992 bytes
21299→2129920 bytes
Maximum network receive buffer size that applications can request
os_NetworkNetCoreWmemMax
212992 bytes
21299→2129920 bytes
Maximum network transmit buffer size that applications can request
os_NetworkNetIpv4TcpSlowStartAfterIdle
1
0→1
Network Slow Start After Idle Flag
os_NetworkNetIpv4TcpFinTimeout
60
6→600 seconds
Network TCP timeout
os_NetworkRfs
0
0→131072
If enabled, increases the data cache hit rate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running
1000 packets
100→10000 packets
Network Max Backlog
os_StorageRqAffinity
1
1→2
Storage Requests Affinity
os_StorageQueueScheduler
none
none kyber mq-deadline bfq
Storage Queue Scheduler Type
os_StorageNomerges
0
0→2
Allows disabling the lookup logic involved with IO request merging in the block layer. By default (0) all merges are enabled; with 1 only simple one-hit merges are tried; with 2 no merge algorithms are tried
os_StorageMaxSectorsKb
128 KB
32→128 KB
The largest IO size (in KB) that the OS can issue to a block device
cpu_util_details
percent
The average CPU utilization % broken down by usage type and CPU number (e.g., cpu1 user, cpu2 system, cpu3 soft-irq)
cpu_load_avg
tasks
The system load average (i.e., the number of active tasks in the system)
mem_util_details
percent
The memory utilization % (i.e., the % of memory used) broken down by usage type (e.g., active memory)
mem_used
bytes
The total amount of memory used
mem_used_nocache
bytes
The total amount of memory used without considering memory reserved for caching purposes
mem_total
bytes
The total amount of installed memory
mem_fault_minor
faults/s
The number of minor memory faults (i.e., faults that do not cause disk access) per second
mem_fault_major
faults/s
The number of major memory faults (i.e., faults that cause disk access) per second
mem_fault
faults/s
The number of memory faults (major + minor)
mem_swapins
pages/s
The number of memory pages swapped in per second
mem_swapouts
pages/s
The number of memory pages swapped out per second
network_out_bytes_details
bytes/s
The outbound network traffic in bytes per second broken down by network device (e.g., eth01)
disk_util_details
percent
The utilization % of each disk (i.e., how much time a disk is busy doing work), broken down by disk (e.g., disk D://)
disk_iops_writes
ops/s
The average number of IO disk-write operations per second across all disks
disk_iops_reads
ops/s
The average number of IO disk-read operations per second across all disks
disk_iops
ops/s
The average number of IO disk operations per second across all disks
disk_response_time_read
seconds
The average response time of IO read-disk operations
disk_response_time_worst
seconds
The average response time of IO disk operations of the slowest disk
disk_response_time_write
seconds
The average response time of IO write-disk operations
disk_response_time_details
seconds
The average response time of IO disk operations broken down by disk (e.g., disk /dev/nvme01)
disk_iops_details
ops/s
The number of IO disk operations per second broken down by disk (e.g., disk /dev/nvme01)
disk_io_inflight_details
ops
The number of IO disk operations in progress (outstanding) broken down by disk (e.g., disk /dev/nvme01)
disk_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_read_bytes
bytes/s
The number of bytes per second read across all disks
disk_read_write_bytes
bytes/s
The number of bytes per second read and written across all disks
disk_write_bytes_details
bytes/s
The number of bytes per second written from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation WRITE)
disk_read_bytes_details
bytes/s
The number of bytes per second read from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation READ)
filesystem_size
bytes
The size of filesystems broken down by type and device (e.g., filesystem of type ext4 for device /dev/nvme01)
3000000 ns
400000→40000000 ns
Scheduler Wakeup Granularity (in nanoseconds)
os_CPUSchedMigrationCost
500000 ns
100000→5000000 ns
Amount of time (in nanoseconds) after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated to another CPU, so increasing this variable reduces task migrations
os_CPUSchedChildRunsFirst
0
0→1
A freshly forked child runs before the parent continues execution
os_CPUSchedLatency
18000000 ns
2400000→240000000 ns
Targeted preemption latency (in nanoseconds) for CPU bound tasks
os_CPUSchedAutogroupEnabled
1
0→1
Enables the Linux task auto-grouping feature, where the kernel assigns related tasks to groups and schedules them together on CPUs to achieve higher performance for some workloads
os_CPUSchedNrMigrate
32
3→320
Scheduler NR Migrate
100 %
10→100 %
VFS Cache Pressure
os_MemoryVmMinFree
67584 KB
10240→1024000 KB
Minimum Free Memory
os_MemoryVmDirtyRatio
20 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyBackgroundRatio
10 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, the kernel begins to write them asynchronously in the background
os_MemoryTransparentHugepageEnabled
always
always never
Transparent Hugepage Enablement
os_MemoryTransparentHugepageDefrag
always
always never
Transparent Hugepage Enablement Defrag
os_MemorySwap
swapon
swapon swapoff
Memory Swap
os_MemoryVmDirtyExpire
3000 centisecs
300→30000 centisecs
Memory Dirty Expiration Time
os_MemoryVmDirtyWriteback
500 centisecs
50→5000 centisecs
Memory Dirty Writeback
1000 packets
100→10000 packets
Network Max Backlog
os_NetworkNetIpv4TcpMaxSynBacklog
1024 packets
52→15120 packets
Network IPv4 Max SYN Backlog
os_NetworkNetCoreNetdevBudget
300 packets
30→3000 packets
Network Budget
os_NetworkNetCoreRmemMax
212992 bytes
21299→2129920 bytes
Maximum network receive buffer size that applications can request
os_NetworkNetCoreWmemMax
212992 bytes
21299→2129920 bytes
Maximum network transmit buffer size that applications can request
os_NetworkNetIpv4TcpSlowStartAfterIdle
1
0→1
Network Slow Start After Idle Flag
os_NetworkNetIpv4TcpFinTimeout
60
6→600 seconds
Network TCP timeout
os_NetworkRfs
0
0→131072
If enabled, increases the data cache hit rate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running
1000 packets
100→10000 packets
Network Max Backlog
os_StorageRqAffinity
1
1→2
Storage Requests Affinity
os_StorageQueueScheduler
none
none kyber
Storage Queue Scheduler Type
os_StorageNomerges
0
0→2
Allows disabling the lookup logic involved with IO request merging in the block layer. By default (0) all merges are enabled; with 1 only simple one-hit merges are tried; with 2 no merge algorithms are tried
os_StorageMaxSectorsKb
128 KB
32→128 KB
The largest IO size (in KB) that the OS can issue to a block device
Metrics are defined using a YAML manifest with the following structure and properties:
metrics:
  - name: "cpu_util"
    description: "cpu utilization"
    unit: "percent"
  - name: "mem_util"
    description: "memory utilization"
    unit: "percent"
name
string
No spaces are allowed
TRUE
The name of the metric
unit
string
A supported unit or a custom unit (see supported units of measure)
The unit of measure of the metric
description
string
TRUE
A description characterizing the metric
The supported units of measure for metrics are:
Temporal units
nanoseconds
microseconds
milliseconds
seconds
minutes
hours
Units of information
bits
kilobits
megabits
gigabits
terabits
petabits
bytes
kilobytes
megabytes
gigabytes
terabytes
petabytes
Others
percent
Notice that supported units of measure are automatically scaled for visualization purposes. In particular, for units of information, Akamas uses a base 2 scaling for bytes, i.e., 1 kilobyte = 1024 bytes, 1 megabyte = 1024 kilobytes, and so on. Other units of measure are only scaled up using millions or billions (e.g., 124000000 custom units become 124 Mln custom units).
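The scaling rules described above (base 2 for units of information, Mln/Bln for other units) can be sketched in Python; this is an illustration of the stated rules, not Akamas code:

```python
def scale_bytes(value):
    """Scale a byte value using base-2 units (1 kilobyte = 1024 bytes)."""
    units = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes", "petabytes"]
    i = 0
    while value >= 1024 and i < len(units) - 1:
        value /= 1024
        i += 1
    return f"{value:g} {units[i]}"

def scale_custom(value):
    """Custom units are only scaled up by millions (Mln) or billions (Bln)."""
    if value >= 1_000_000_000:
        return f"{value / 1_000_000_000:g} Bln"
    if value >= 1_000_000:
        return f"{value / 1_000_000:g} Mln"
    return f"{value:g}"
```

For example, 1048576 bytes renders as 1 megabytes, and 124000000 custom units render as 124 Mln, matching the example above.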
/{appId}/1/jobs/{jobId}
.numCompletedTasks
spark_active_tasks
job
/{appId}/1/jobs/{jobId}
.numActiveTasks
spark_skipped_tasks
job
/{appId}/1/jobs/{jobId}
.numSkippedTasks
spark_failed_tasks
job
/{appId}/1/jobs/{jobId}
.numFailedTasks
spark_killed_tasks
job
/{appId}/1/jobs/{jobId}
.numKilledTasks
spark_completed_stages
job
/{appId}/1/jobs/{jobId}
.numCompletedStages
spark_failed_stages
job
/{appId}/1/jobs/{jobId}
.numFailedStages
spark_skipped_stages
job
/{appId}/1/jobs/{jobId}
.numSkippedStages
spark_active_stages
job
/{appId}/1/jobs/{jobId}
.numActiveStages
spark_duration
stage
/{appId}/1/stages/{stageId}
.getDuration
spark_task_stage_executor_run_time
stage
/{appId}/1/stages/{stageId}
.getExecutorRunTime
spark_task_stage_executor_cpu_time
stage
/{appId}/1/stages/{stageId}
.getExecutorCpuTime
spark_active_tasks
stage
/{appId}/1/stages/{stageId}
.getNumActiveTasks
spark_completed_tasks
stage
/{appId}/1/stages/{stageId}
.getNumCompleteTasks
spark_failed_tasks
stage
/{appId}/1/stages/{stageId}
.getNumFailedTasks
spark_killed_tasks
stage
/{appId}/1/stages/{stageId}
.getNumKilledTasks
spark_task_stage_input_bytes_read
stage
/{appId}/1/stages/{stageId}
.getInputBytes
spark_task_stage_input_records_read
stage
/{appId}/1/stages/{stageId}
.getInputRecords
spark_task_stage_output_bytes_written
stage
/{appId}/1/stages/{stageId}
.getOutputBytes
spark_task_stage_output_records_written
stage
/{appId}/1/stages/{stageId}
.getOutputRecords
spark_stage_shuffle_read_bytes
stage
/{appId}/1/stages/{stageId}
.getShuffleReadBytes
spark_task_stage_shuffle_read_records
stage
/{appId}/1/stages/{stageId}
.getShuffleReadRecords
spark_task_stage_shuffle_write_bytes
stage
/{appId}/1/stages/{stageId}
.getShuffleWriteBytes
spark_task_stage_shuffle_write_records
stage
/{appId}/1/stages/{stageId}
.getShuffleWriteRecords
spark_task_stage_memory_bytes_spilled
stage
/{appId}/1/stages/{stageId}
.getMemoryBytesSpilled
spark_task_stage_disk_bytes_spilled
stage
/{appId}/1/stages/{stageId}
.getDiskBytesSpilled
spark_duration
task
/{appId}/1/stages/{stageId}
.tasks[].duration
spark_task_executor_deserialize_time
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.executorDeserializeTime
spark_task_executor_deserialize_cpu_time
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.executorDeserializeCpuTime
spark_task_stage_executor_run_time
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.executorRunTime
spark_task_stage_executor_cpu_time
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.executorCpuTime
spark_task_result_size
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.resultSize
spark_task_jvm_gc_duration
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.jvmGcTime
spark_task_result_serialization_time
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.resultSerializationTime
spark_task_stage_memory_bytes_spilled
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.memoryBytesSpilled
spark_task_stage_disk_bytes_spilled
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.diskBytesSpilled
spark_task_peak_execution_memory
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.peakExecutionMemory
spark_task_stage_input_bytes_read
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.inputMetrics.bytesRead
spark_task_stage_input_records_read
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.inputMetrics.recordsRead
spark_task_stage_output_bytes_written
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.outputMetrics.bytesWritten
spark_task_stage_output_records_written
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.outputMetrics.recordsWritten
spark_task_shuffle_read_remote_blocks_fetched
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleReadMetrics.remoteBlocksFetched
spark_task_shuffle_read_local_blocks_fetched
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleReadMetrics.localBlocksFetched
spark_task_shuffle_read_fetch_wait_time
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleReadMetrics.fetchWaitTime
spark_task_shuffle_read_remote_bytes
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleReadMetrics.remoteBytesRead
spark_task_shuffle_read_remote_bytes_to_disk
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleReadMetrics.remoteBytesReadToDisk
spark_task_shuffle_read_local_bytes
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleReadMetrics.localBytesRead
spark_task_stage_shuffle_read_records
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleReadMetrics.recordsRead
spark_task_stage_shuffle_write_bytes
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleWriteMetrics.bytesWritten
spark_task_shuffle_write_time
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleWriteMetrics.writeTime
spark_task_stage_shuffle_write_records
task
/{appId}/1/stages/{stageId}
.tasks[].taskMetrics.shuffleWriteMetrics.recordsWritten
spark_executor_rdd_blocks
executor
/{appId}/1/allexecutors
select(.id!='driver') | .rddBlocks
spark_executor_mem_used
executor
/{appId}/1/allexecutors
select(.id!='driver') | .memoryUsed
spark_executor_disk_used
executor
/{appId}/1/allexecutors
select(.id!='driver') | .diskUsed
spark_executor_cores
executor
/{appId}/1/allexecutors
select(.id!='driver') | .totalCores
spark_active_tasks
executor
/{appId}/1/allexecutors
select(.id!='driver') | .activeTasks
spark_failed_tasks
executor
/{appId}/1/allexecutors
select(.id!='driver') | .failedTasks
spark_completed_tasks
executor
/{appId}/1/allexecutors
select(.id!='driver') | .completedTasks
spark_executor_total_tasks
executor
/{appId}/1/allexecutors
select(.id!='driver') | .totalTasks
spark_executor_total_duration
executor
/{appId}/1/allexecutors
select(.id!='driver') | .totalDuration
spark_executor_total_jvm_gc_duration
executor
/{appId}/1/allexecutors
select(.id!='driver') | .totalGCTime
spark_executor_total_input_bytes
executor
/{appId}/1/allexecutors
select(.id!='driver') | .totalInputBytes
spark_executor_total_shuffle_read
executor
/{appId}/1/allexecutors
select(.id!='driver') | .totalShuffleRead
spark_executor_total_shuffle_write
executor
/{appId}/1/allexecutors
select(.id!='driver') | .totalShuffleWrite
spark_executor_max_mem_used
executor
/{appId}/1/allexecutors
select(.id!='driver') | .maxMemory
spark_executor_used_on_heap_storage_memory
executor
/{appId}/1/allexecutors
select(.id!='driver') | .memoryMetrics.usedOnHeapStorageMemory
spark_executor_used_off_heap_storage_memory
executor
/{appId}/1/allexecutors
select(.id!='driver') | .memoryMetrics.usedOffHeapStorageMemory
spark_executor_total_on_heap_storage_memory
executor
/{appId}/1/allexecutors
select(.id!='driver') | .memoryMetrics.totalOnHeapStorageMemory
spark_executor_total_off_heap_storage_memory
executor
/{appId}/1/allexecutors
select(.id!='driver') | .memoryMetrics.totalOffHeapStorageMemory
spark_driver_rdd_blocks
driver
/{appId}/1/allexecutors
select(.id=='driver') | .rddBlocks
spark_driver_mem_used
driver
/{appId}/1/allexecutors
select(.id=='driver') | .memoryUsed
spark_driver_disk_used
driver
/{appId}/1/allexecutors
select(.id=='driver') | .diskUsed
spark_driver_cores
driver
/{appId}/1/allexecutors
select(.id=='driver') | .totalCores
spark_driver_total_duration
driver
/{appId}/1/allexecutors
select(.id=='driver') | .totalDuration
spark_driver_total_jvm_gc_duration
driver
/{appId}/1/allexecutors
select(.id=='driver') | .totalGCTime
spark_driver_total_input_bytes
driver
/{appId}/1/allexecutors
select(.id=='driver') | .totalInputBytes
spark_driver_total_shuffle_read
driver
/{appId}/1/allexecutors
select(.id=='driver') | .totalShuffleRead
spark_driver_total_shuffle_write
driver
/{appId}/1/allexecutors
select(.id=='driver') | .totalShuffleWrite
spark_driver_max_mem_used
driver
/{appId}/1/allexecutors
select(.id=='driver') | .maxMemory
spark_driver_used_on_heap_storage_memory
driver
/{appId}/1/allexecutors
select(.id=='driver') | .memoryMetrics.usedOnHeapStorageMemory
spark_driver_used_off_heap_storage_memory
driver
/{appId}/1/allexecutors
select(.id=='driver') | .memoryMetrics.usedOffHeapStorageMemory
spark_driver_total_on_heap_storage_memory
driver
/{appId}/1/allexecutors
select(.id=='driver') | .memoryMetrics.totalOnHeapStorageMemory
spark_driver_total_off_heap_storage_memory
driver
/{appId}/1/allexecutors
select(.id=='driver') | .memoryMetrics.totalOffHeapStorageMemory
spark_duration
job
/{appId}/1/jobs/{jobId}
.duration
spark_completed_tasks
job
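The expressions above are jq-style filters applied to the JSON returned by the Spark monitoring REST API endpoints listed alongside them (e.g., `/{appId}/1/allexecutors`). A minimal Python sketch of the executor-side filter; the sample payload below is illustrative, not real Spark output:

```python
# Equivalent of the jq expression select(.id!='driver') | .<field>:
# drop the driver entry from the allexecutors payload, keep one field
# per remaining executor.

def executor_metric(executors, field):
    """Return the requested field for every non-driver executor."""
    return [e[field] for e in executors if e["id"] != "driver"]

# Illustrative payload shaped like the allexecutors response
sample = [
    {"id": "driver", "totalShuffleWrite": 0},
    {"id": "1", "totalShuffleWrite": 1024},
    {"id": "2", "totalShuffleWrite": 2048},
]

print(executor_metric(sample, "totalShuffleWrite"))  # [1024, 2048]
```

The driver-side expressions work the same way, with `.id == "driver"` instead.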
This page describes the Optimization Pack for Eclipse OpenJ9 (formerly known as IBM J9) Virtual Machine version 6.
The following parameters require their ranges or default values to be updated according to the described rules:
Notice that the value nocompressedreferences for j9vm_compressedReferences can only be specified for JVMs compiled with the --with-noncompressedrefs flag. If this is not the case, you cannot actively disable compressed references, meaning:
for Xmx <= 57GB it is useless to tune this parameter, since compressed references are active by default and cannot be explicitly disabled
for Xmx > 57GB, since compressed references are disabled by default (blank value), Akamas can try to enable them; this requires removing nocompressedreferences from the domain
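On a JVM build that does not support disabling compressed references, the categorical domain can be restricted so the study never tries that value. A sketch of the parameter selection; the exact study syntax may differ from your Akamas version:

```yaml
# Illustrative sketch: drop nocompressedreferences from the explored
# categories so Akamas never attempts to disable compressed references
# on a standard JVM build.
parametersSelection:
  - name: jvm.j9vm_compressedReferences
    categories: [blank, -Xcompressedrefs]
```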
The following tables show a list of constraints that may be required in the definition of the study, depending on the tuned parameters:
Notice that
j9vm_newSpaceFixed is mutually exclusive with j9vm_minNewSpace and j9vm_maxNewSpace
j9vm_oldSpaceFixed is mutually exclusive with j9vm_minOldSpace and j9vm_maxOldSpace
jvm_heap_util
percent
The utilization % of heap memory
jvm_memory_used
bytes
The total amount of memory used across all the JVM memory pools
jvm_memory_used_details
bytes
The total amount of memory used broken down by pool (e.g., code-cache, compressed-class-space)
jvm_memory_buffer_pool_used
bytes
The total amount of bytes used by buffers within the JVM buffer memory pool
jvm_gc_time
percent
The % of wall clock time the JVM spent doing stop the world garbage collection activities
jvm_gc_time_details
percent
The % of wall clock time the JVM spent doing stop the world garbage collection activities broken down by type of garbage collection algorithm (e.g., ParNew)
jvm_gc_count
collections/s
The total number of stop the world JVM garbage collections that have occurred per second
jvm_gc_count_details
collections/s
The total number of stop the world JVM garbage collections that have occurred per second, broken down by type of garbage collection algorithm (e.g., G1, CMS)
jvm_gc_duration
seconds
The average duration of a stop the world JVM garbage collection
jvm_gc_duration_details
seconds
The average duration of a stop the world JVM garbage collection broken down by type of garbage collection algorithm (e.g., G1, CMS)
jvm_threads_current
threads
The total number of active threads within the JVM
jvm_threads_deadlocked
threads
The total number of deadlocked threads within the JVM
jvm_compilation_time
milliseconds
The total time spent by the JVM JIT compiler compiling bytecode
You should select your own domain.
yes
Minimum heap size (in megabytes)
j9vm_maxHeapSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Maximum heap size (in megabytes)
j9vm_minFreeHeap
real
percent
0.3
0.1 → 0.5
yes
Specify the minimum % free heap required after global GC
j9vm_maxFreeHeap
real
percent
0.6
0.4 → 0.9
yes
Specify the maximum % free heap required after global GC
gencon, subpool, optavgpause, optthruput, nogc
yes
GC policy to use
j9vm_gcThreads
integer
threads
You should select your own default value.
1 → 64
yes
Number of threads the garbage collector uses for parallel operations
j9vm_scvTenureAge
integer
10
1 → 14
yes
Set the initial tenuring threshold for generational concurrent GC policy
j9vm_scvAdaptiveTenureAge
categorical
blank
blank, -Xgc:scvNoAdaptiveTenure
yes
Enable the adaptive tenure age for generational concurrent GC policy
j9vm_newSpaceFixed
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The fixed size of the new area when using the gencon GC policy. Must not be set alongside min or max
j9vm_minNewSpace
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The initial size of the new area when using the gencon GC policy
j9vm_maxNewSpace
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The maximum size of the new area when using the gencon GC policy
j9vm_oldSpaceFixed
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The fixed size of the old area when using the gencon GC policy. Must not be set alongside min or max
j9vm_minOldSpace
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The initial size of the old area when using the gencon GC policy
j9vm_maxOldSpace
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The maximum size of the old area when using the gencon GC policy
j9vm_concurrentScavenge
categorical
concurrentScavenge
concurrentScavenge, noConcurrentScavenge
yes
Support pause-less garbage collection mode with gencon
j9vm_gcPartialCompact
categorical
nopartialcompactgc
nopartialcompactgc, partialcompactgc
yes
Enable partial compaction
j9vm_concurrentMeter
categorical
soa
soa, loa, dynamic
yes
Determine which area is monitored by the concurrent mark
j9vm_concurrentBackground
integer
0
0 → 128
yes
The number of background threads assisting the mutator threads in concurrent mark
j9vm_concurrentSlack
integer
megabytes
0
You should select your own domain.
yes
The target size of free heap space for concurrent collectors
j9vm_concurrentLevel
integer
percent
8
0 → 100
yes
The ratio between the amount of heap allocated and the amount of heap marked
j9vm_gcCompact
categorical
blank
blank, -Xcompactgc, -Xnocompactgc
yes
Enables full compaction on all garbage collections (system and global)
j9vm_minGcTime
real
percent
0.05
0.0 → 1.0
yes
The minimum percentage of time to be spent in garbage collection, triggering the resize of the heap to meet the specified values
j9vm_maxGcTime
real
percent
0.13
0.0 → 1.0
yes
The maximum percentage of time to be spent in garbage collection, triggering the resize of the heap to meet the specified values
j9vm_loa
categorical
loa
loa, noloa
yes
Enable the allocation of the large area object during garbage collection
j9vm_loa_initial
real
0.05
0.0 → 0.95
yes
The initial portion of the tenure area allocated to the large area object
j9vm_loa_minimum
real
0.01
0.0 → 0.95
yes
The minimum portion of the tenure area allocated to the large area object
j9vm_loa_maximum
real
0.5
0.0 → 0.95
yes
The maximum portion of the tenure area allocated to the large area object
noOpt, cold, warm, hot, veryHot, scorching
yes
Force the JIT compiler to compile all methods at a specific optimization level
j9vm_codeCacheTotal
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Maximum size limit in MB for the JIT code cache
j9vm_jit_count
integer
10000
0 → 1000000
yes
The number of times a method is called before it is compiled
blank, -Xcompressedrefs, -Xnocompressedrefs
yes
Enable/disable the use of compressed references
j9vm_aggressiveOpts
categorical
blank
blank, -Xaggressive
yes
Enable the use of aggressive performance optimization features, which are expected to become default in upcoming releases
j9vm_virtualized
categorical
blank
blank, -Xtune:virtualized
yes
Optimize the VM for virtualized environment, reducing CPU usage when idle
j9vm_shareclasses
categorical
blank
blank, -Xshareclasses
yes
Enable class sharing
j9vm_quickstart
categorical
blank
blank, -Xquickstart
yes
Run JIT with only a subset of optimizations, improving the performance of short-running applications
j9vm_minimizeUserCpu
categorical
blank
blank, -Xthr:minimizeUserCPU
yes
Minimizes user-mode CPU usage in thread synchronization where possible
j9vm_minOldSpace
75% of j9vm_minHeapSize
must not exceed j9vm_minHeapSize
j9vm_maxOldSpace
same as j9vm_maxHeapSize
must not exceed j9vm_maxHeapSize
j9vm_gcThreads
number of CPUs - 1, up to a maximum of 64
capped at the default value, as exceeding it brings no benefit
j9vm_compressedReferences
enabled for j9vm_maxHeapSize <= 57 GB
jvm.j9vm_minFreeHeap + 0.05 < jvm.j9vm_maxFreeHeap
jvm.j9vm_minGcTime < jvm.j9vm_maxGcTime
the sum of j9vm_minNewSpace and j9vm_minOldSpace must equal j9vm_minHeapSize, so it is redundant to tune all three together; the relationship among the maximum values is more complex
jvm_heap_size
bytes
The size of the JVM heap memory
jvm_heap_used
bytes
The amount of heap memory used
j9vm_minHeapSize
integer
megabytes
j9vm_gcPolicy
categorical
j9vm_jitOptlevel
ordinal
j9vm_compressedReferences
categorical
j9vm_minNewSpace
25% of j9vm_minHeapSize
must not exceed j9vm_minHeapSize
j9vm_maxNewSpace
25% of j9vm_maxHeapSize
must not exceed j9vm_maxHeapSize
jvm.j9vm_minHeapSize < jvm.j9vm_maxHeapSize
jvm.j9vm_minNewSpace < jvm.j9vm_maxNewSpace && jvm.j9vm_minNewSpace < jvm.j9vm_minHeapSize && jvm.j9vm_maxNewSpace < jvm.j9vm_maxHeapSize
jvm.j9vm_minOldSpace < jvm.j9vm_maxOldSpace && jvm.j9vm_minOldSpace < jvm.j9vm_minHeapSize && jvm.j9vm_maxOldSpace < jvm.j9vm_maxHeapSize
You should select your own default value.
gencon
noOpt
blank
jvm.j9vm_loa_minimum <= jvm.j9vm_loa_initial && jvm.j9vm_loa_initial <= jvm.j9vm_loa_maximum
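Formulas like the ones above are declared in the study definition. A sketch assuming a parameterConstraints section (the syntax may differ from your Akamas version; the names are arbitrary labels):

```yaml
# Illustrative study fragment combining two of the ordering
# constraints listed on this page.
parameterConstraints:
  - name: heap_bounds
    formula: jvm.j9vm_minHeapSize < jvm.j9vm_maxHeapSize
  - name: free_heap_gap
    formula: jvm.j9vm_minFreeHeap + 0.05 < jvm.j9vm_maxFreeHeap
```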
cpu_load_avg
tasks
The system load average (i.e., the number of active tasks in the system)
cpu_num
CPUs
The number of CPUs available in the system (physical and logical)
mem_fault
faults/s
The number of memory faults (minor+major)
mem_fault_major
faults/s
The number of major memory faults (i.e., faults that cause disk access) per second
disk_io_inflight_details
ops
The number of IO disk operations in progress (outstanding) broken down by disk (e.g., disk /dev/nvme01)
disk_iops
ops/s
The average number of IO disk operations per second across all disks
network_in_bytes_details
bytes/s
The number of inbound network packets in bytes per second broken down by network device (e.g., wlp4s0)
network_out_bytes_details
bytes/s
The number of outbound network packets in bytes per second broken down by network device (e.g., eth01)
os_context_switch
switches/s
The number of context switches per second
proc_blocked
processes
The number of processes blocked (e.g., for IO or swapping reasons)
os_cpuSchedMinGranularity
integer
nanoseconds
os_MemorySwappiness
integer
percent
os_NetworkNetCoreSomaxconn
integer
connections
os_StorageReadAhead
integer
kilobytes
jvm_heap_size
bytes
The size of the JVM heap memory
jvm_heap_used
bytes
The amount of heap memory used
j9vm_minHeapSize
integer
megabytes
j9vm_gcPolicy
categorical
j9vm_jitOptlevel
ordinal
j9vm_lockReservation
categorical
The following parameters require their ranges or default values to be updated according to the described rules:
j9vm_minNewSpace
25% of j9vm_minHeapSize
must not exceed j9vm_minHeapSize
j9vm_maxNewSpace
25% of j9vm_maxHeapSize
must not exceed j9vm_maxHeapSize
Notice that the value nocompressedreferences for j9vm_compressedReferences can only be specified for JVMs compiled with the --with-noncompressedrefs flag. If this is not the case, you cannot actively disable compressed references, meaning:
for Xmx <= 57GB it is useless to tune this parameter, since compressed references are active by default and cannot be explicitly disabled
for Xmx > 57GB, since compressed references are disabled by default (blank value), Akamas can try to enable them; this requires removing nocompressedreferences from the domain
The following tables show a list of constraints that may be required in the definition of the study, depending on the tuned parameters:
jvm.j9vm_minHeapSize < jvm.j9vm_maxHeapSize
jvm.j9vm_minNewSpace < jvm.j9vm_maxNewSpace && jvm.j9vm_minNewSpace < jvm.j9vm_minHeapSize && jvm.j9vm_maxNewSpace < jvm.j9vm_maxHeapSize
jvm.j9vm_minOldSpace < jvm.j9vm_maxOldSpace && jvm.j9vm_minOldSpace < jvm.j9vm_minHeapSize && jvm.j9vm_maxOldSpace < jvm.j9vm_maxHeapSize
Notice that
j9vm_newSpaceFixed is mutually exclusive with j9vm_minNewSpace and j9vm_maxNewSpace
j9vm_oldSpaceFixed is mutually exclusive with j9vm_minOldSpace and j9vm_maxOldSpace
the sum of j9vm_minNewSpace and j9vm_minOldSpace must be equal to j9vm_minHeapSize, so it is redundant to tune all three together; the relationship among the maximum values is more complex
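The gencon minimum-sizing rule (minNewSpace + minOldSpace must equal minHeapSize) can be sanity-checked before launching a study. A small sketch; the parameter values below are illustrative, following the 25%/75% split suggested on this page:

```python
def valid_gencon_min_sizing(min_new, min_old, min_heap):
    """For gencon, j9vm_minNewSpace + j9vm_minOldSpace must equal
    j9vm_minHeapSize (all values in megabytes)."""
    return min_new + min_old == min_heap

# 25% new area / 75% old area of a 1024 MB minimum heap
print(valid_gencon_min_sizing(256, 768, 1024))  # True
```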
cpu_num
CPUs
The number of CPUs available in the system (physical and logical)
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
mem_util
percent
The memory utilization % (i.e., the % of memory used)
mem_util_nocache
percent
The memory utilization % (i.e., the % of memory used) without considering memory reserved for caching purposes
network_tcp_retrans
retrans/s
The number of network TCP retransmissions per second
network_in_bytes_details
bytes/s
The number of inbound network packets in bytes per second broken down by network device (e.g., wlp4s0)
Notice: you can use a custom device filter to monitor a specific disk with Prometheus. More information on Prometheus queries and the %FILTERS% placeholder is available on the Prometheus provider and Prometheus provider metrics mapping pages.
disk_swap_util
percent
The average space utilization % of swap disks
disk_swap_used
bytes
The total amount of space used by swap disks
filesystem_util
percent
The space utilization % of filesystems broken down by type and device (e.g., filesystem of type overlayfs on device /dev/loop1)
filesystem_used
bytes
The amount of space used on the filesystems broken down by type and device (e.g., filesystem of type zfs on device /dev/nvme01)
proc_blocked
processes
The number of processes blocked (e.g., for IO or swapping reasons)
os_context_switch
switches/s
The number of context switches per second
os_cpuSchedMinGranularity
2250000 ns
300000→30000000 ns
Minimal preemption granularity (in nanoseconds) for CPU bound tasks
os_cpuSchedWakeupGranularity
os_MemorySwappiness
1
0→100
Memory Swappiness
os_MemoryVmVfsCachePressure
os_NetworkNetCoreSomaxconn
128 connections
12→1200 connections
Network Max Connections
os_NetworkNetCoreNetdevMaxBacklog
os_StorageReadAhead
128 KB
0→1024 KB
Read-ahead speeds up file access by pre-fetching data and loading it into the page cache so that it can be available earlier in memory instead of from disk
os_StorageNrRequests
This page describes the Optimization Pack for Eclipse OpenJ9 (formerly known as IBM J9) Virtual Machine version 8.
The following parameters require their ranges or default values to be updated according to the described rules:
Notice that the value nocompressedreferences for j9vm_compressedReferences can only be specified for JVMs compiled with the --with-noncompressedrefs flag. If this is not the case, you cannot actively disable compressed references, meaning:
for Xmx <= 57GB it is useless to tune this parameter, since compressed references are active by default and cannot be explicitly disabled
for Xmx > 57GB, since compressed references are disabled by default (blank value), Akamas can try to enable them; this requires removing nocompressedreferences from the domain
The following tables show a list of constraints that may be required in the definition of the study, depending on the tuned parameters:
Notice that
j9vm_newSpaceFixed is mutually exclusive with j9vm_minNewSpace and j9vm_maxNewSpace
j9vm_oldSpaceFixed is mutually exclusive with j9vm_minOldSpace and j9vm_maxOldSpace
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
cpu_used
CPUs
The average number of CPUs used in the system (physical and logical)
cpu_util_details
percent
The average CPU utilization % broken down by usage type and CPU number (e.g., cp1 user, cp2 system, cp3 soft-irq)
mem_fault_minor
faults/s
The number of minor memory faults (i.e., faults that do not cause disk access) per second
mem_swapins
pages/s
The number of memory pages swapped in per second
mem_swapouts
pages/s
The number of memory pages swapped out per second
mem_total
bytes
The total amount of installed memory
mem_used
bytes
The total amount of memory used
mem_used_nocache
bytes
The total amount of memory used without considering memory reserved for caching purposes
mem_util
percent
The memory utilization % (i.e., the % of memory used)
mem_util_details
percent
The memory utilization % (i.e., the % of memory used) broken down by usage type (e.g., active memory)
mem_util_nocache
percent
The memory utilization % (i.e., the % of memory used) without considering memory reserved for caching purposes
disk_iops_details
ops/s
The number of IO disk operations per second broken down by disk (e.g., disk /dev/nvme01)
disk_iops_reads
ops/s
The average number of IO disk-read operations per second across all disks
disk_iops_writes
ops/s
The average number of IO disk-write operations per second across all disks
disk_read_bytes
bytes/s
The number of bytes per second read across all disks
disk_read_bytes_details
bytes/s
The number of bytes per second read from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation READ)
disk_read_write_bytes
bytes/s
The number of bytes per second read and written across all disks
disk_response_time_details
seconds
The average response time of IO disk operations broken down by disk (e.g., disk C://)
disk_response_time_read
seconds
The average response time of read disk operations
disk_response_time_worst
seconds
The average response time of IO disk operations of the slowest disk
disk_response_time_write
seconds
The average response time of write disk operations
disk_swap_used
bytes
The total amount of space used by swap disks
disk_swap_util
percent
The average space utilization % of swap disks
disk_util_details
percent
The utilization % of disk, i.e., how much time a disk is busy doing work, broken down by disk (e.g., disk D://)
disk_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_write_bytes_details
bytes/s
The number of bytes per second written from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation WRITE)
filesystem_size
bytes
The size of filesystems broken down by type and device (e.g., filesystem of type ext4 for device /dev/nvme01)
filesystem_used
bytes
The amount of space used on the filesystems broken down by type and device (e.g., filesystem of type zfs on device /dev/nvme01)
filesystem_util
percent
The space utilization % of filesystems broken down by type and device (e.g., filesystem of type overlayfs on device /dev/loop1)
network_tcp_retrans
retrans/s
The number of network TCP retransmissions per second
1500000
300000 → 30000000
no
Minimal preemption granularity (in nanoseconds) for CPU bound tasks
os_cpuSchedWakeupGranularity
integer
nanoseconds
2000000
400000 → 40000000
no
Scheduler Wakeup Granularity (in nanoseconds)
os_CPUSchedMigrationCost
integer
nanoseconds
500000
100000 → 5000000
no
Amount of time (in nanoseconds) after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated to another CPU, so increasing this variable reduces task migrations
os_CPUSchedChildRunsFirst
integer
0
0, 1
no
A freshly forked child runs before the parent continues execution
os_CPUSchedLatency
integer
nanoseconds
12000000
2400000 → 240000000
no
Targeted preemption latency (in nanoseconds) for CPU bound tasks
os_CPUSchedAutogroupEnabled
integer
0
0, 1
no
Enables the Linux task auto-grouping feature, where the kernel assigns related tasks to groups and schedules them together on CPUs to achieve higher performance for some workloads
os_CPUSchedNrMigrate
integer
32
3 → 320
no
Scheduler NR Migrate
60
0 → 100
no
Defines the kernel's tendency to swap memory pages to disk (0 = avoid swapping, 100 = swap aggressively)
os_MemoryVmVfsCachePressure
integer
100
10 → 100
no
VFS Cache Pressure
os_MemoryVmCompactionProactiveness
integer
Determines how aggressively compaction is done in the background
os_MemoryVmMinFree
integer
67584
10240 → 1024000
no
Minimum Free Memory (in kbytes)
os_MemoryTransparentHugepageEnabled
categorical
madvise
always, never, madvise
no
Transparent Hugepage Enablement Flag
os_MemoryTransparentHugepageDefrag
categorical
madvise
always, never, defer+madvise, madvise, defer
no
Transparent Hugepage Enablement Defrag
os_MemorySwap
categorical
swapon
swapon, swapoff
no
Memory Swap
os_MemoryVmDirtyRatio
integer
20
1 → 99
no
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyBackgroundRatio
integer
10
1 → 99
no
When the dirty memory pages exceed this percentage of the total memory, the kernel begins to write them asynchronously in the background
os_MemoryVmDirtyExpire
integer
centiseconds
3000
300 → 30000
no
The age (in centiseconds) after which dirty memory pages become eligible to be written to disk by the kernel flusher threads
os_MemoryVmDirtyWriteback
integer
centiseconds
500
50 → 5000
no
Memory Dirty Writeback (in centisecs)
128
12 → 8192
no
Network Max Connections
os_NetworkNetCoreNetdevMaxBacklog
integer
packets
1000
100 → 10000
no
Network Max Backlog
os_NetworkNetIpv4TcpMaxSynBacklog
integer
connections
256
52 → 5120
no
Network IPV4 Max Sync Backlog
os_NetworkNetCoreNetdevBudget
integer
300
30 → 30000
no
Network Budget
os_NetworkNetCoreRmemMax
integer
212992
21299 → 2129920
no
Maximum network receive buffer size that applications can request
os_NetworkNetCoreWmemMax
integer
212992
21299 → 2129920
no
Maximum network transmit buffer size that applications can request
os_NetworkNetIpv4TcpSlowStartAfterIdle
integer
1
0, 1
no
Network Slow Start After Idle Flag
os_NetworkNetIpv4TcpFinTimeout
integer
60
6 → 600
no
Network TCP timeout
os_NetworkRfs
integer
0
0 → 131072
no
If enabled, increases the data cache hit rate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running
128
0 → 4096
no
Read-ahead speeds up file access by pre-fetching data and loading it into the page cache so that it can be available earlier in memory instead of from disk
os_StorageNrRequests
integer
32
12 → 1280
no
Storage Number of Requests
os_StorageRqAffinity
integer
1
1, 2
no
Storage Requests Affinity
os_StorageNomerges
integer
0
0 → 2
no
Enables the user to disable the lookup logic involved with IO merging requests in the block layer. By default (0) all merges are enabled. With 1 only simple one-hit merges will be tried. With 2 no merge algorithms will be tried
os_StorageMaxSectorsKb
integer
kilobytes
256
32 → 256
no
The largest IO size that the OS can issue to a block device
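Many of the OS parameters above correspond directly to Linux sysctl keys. The mapping below is assumed for illustration (the actual application of these values is handled by the workflow operator, not necessarily via `sysctl -w`):

```python
# Illustrative mapping from a few Akamas OS parameters to the Linux
# sysctl keys they correspond to (assumed for illustration).
SYSCTL_KEYS = {
    "os_MemorySwappiness": "vm.swappiness",
    "os_MemoryVmDirtyRatio": "vm.dirty_ratio",
    "os_NetworkNetCoreSomaxconn": "net.core.somaxconn",
    "os_NetworkNetCoreNetdevMaxBacklog": "net.core.netdev_max_backlog",
}

def render_sysctl(param, value):
    """Render the sysctl command that would apply a parameter value."""
    return f"sysctl -w {SYSCTL_KEYS[param]}={value}"

print(render_sysctl("os_MemorySwappiness", 1))
# sysctl -w vm.swappiness=1
```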
jvm_heap_util
percent
The utilization % of heap memory
jvm_memory_used
bytes
The total amount of memory used across all the JVM memory pools
jvm_memory_used_details
bytes
The total amount of memory used broken down by pool (e.g., code-cache, compressed-class-space)
jvm_memory_buffer_pool_used
bytes
The total amount of bytes used by buffers within the JVM buffer memory pool
jvm_gc_time
percent
The % of wall clock time the JVM spent doing stop the world garbage collection activities
jvm_gc_time_details
percent
The % of wall clock time the JVM spent doing stop the world garbage collection activities broken down by type of garbage collection algorithm (e.g., ParNew)
jvm_gc_count
collections/s
The total number of stop the world JVM garbage collections that have occurred per second
jvm_gc_count_details
collections/s
The total number of stop the world JVM garbage collections that have occurred per second, broken down by type of garbage collection algorithm (e.g., G1, CMS)
jvm_gc_duration
seconds
The average duration of a stop the world JVM garbage collection
jvm_gc_duration_details
seconds
The average duration of a stop the world JVM garbage collection broken down by type of garbage collection algorithm (e.g., G1, CMS)
jvm_threads_current
threads
The total number of active threads within the JVM
jvm_threads_deadlocked
threads
The total number of deadlocked threads within the JVM
jvm_compilation_time
milliseconds
The total time spent by the JVM JIT compiler compiling bytecode
You should select your own default value.
You should select your own domain.
yes
Minimum heap size (in megabytes)
j9vm_maxHeapSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Maximum heap size (in megabytes)
j9vm_minFreeHeap
real
percent
0.3
0.1 → 0.5
yes
Specify the minimum % free heap required after global GC
j9vm_maxFreeHeap
real
percent
0.6
0.4 → 0.9
yes
Specify the maximum % free heap required after global GC
gencon
gencon, subpool, optavgpause, optthruput, nogc
yes
GC policy to use
j9vm_gcThreads
integer
threads
You should select your own default value.
1 → 64
yes
Number of threads the garbage collector uses for parallel operations
j9vm_scvTenureAge
integer
10
1 → 14
yes
Set the initial tenuring threshold for generational concurrent GC policy
j9vm_scvAdaptiveTenureAge
categorical
blank
blank, -Xgc:scvNoAdaptiveTenure
yes
Enable the adaptive tenure age for generational concurrent GC policy
j9vm_newSpaceFixed
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The fixed size of the new area when using the gencon GC policy. Must not be set alongside min or max
j9vm_minNewSpace
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The initial size of the new area when using the gencon GC policy
j9vm_maxNewSpace
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The maximum size of the new area when using the gencon GC policy
j9vm_oldSpaceFixed
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The fixed size of the old area when using the gencon GC policy. Must not be set alongside min or max
j9vm_minOldSpace
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The initial size of the old area when using the gencon GC policy
j9vm_maxOldSpace
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The maximum size of the old area when using the gencon GC policy
j9vm_concurrentScavenge
categorical
concurrentScavenge
concurrentScavenge, noConcurrentScavenge
yes
Support pause-less garbage collection mode with gencon
j9vm_gcPartialCompact
categorical
nopartialcompactgc
nopartialcompactgc, partialcompactgc
yes
Enable partial compaction
j9vm_concurrentMeter
categorical
soa
soa, loa, dynamic
yes
Determine which area is monitored by the concurrent mark
j9vm_concurrentBackground
integer
0
0 → 128
yes
The number of background threads assisting the mutator threads in concurrent mark
j9vm_concurrentSlack
integer
megabytes
0
You should select your own domain.
yes
The target size of free heap space for concurrent collectors
j9vm_concurrentLevel
integer
percent
8
0 → 100
yes
The ratio between the amount of heap allocated and the amount of heap marked
j9vm_gcCompact
categorical
blank
blank, -Xcompactgc, -Xnocompactgc
yes
Enables full compaction on all garbage collections (system and global)
j9vm_minGcTime
real
percent
0.05
0.0 → 1.0
yes
The minimum percentage of time to be spent in garbage collection, triggering the resize of the heap to meet the specified values
j9vm_maxGcTime
real
percent
0.13
0.0 → 1.0
yes
The maximum percentage of time to be spent in garbage collection, triggering the resize of the heap to meet the specified values
j9vm_loa
categorical
loa
loa, noloa
yes
Enable the allocation of the large area object during garbage collection
j9vm_loa_initial
real
0.05
0.0 → 0.95
yes
The initial portion of the tenure area allocated to the large area object
j9vm_loa_minimum
real
0.01
0.0 → 0.95
yes
The minimum portion of the tenure area allocated to the large area object
j9vm_loa_maximum
real
0.5
0.0 → 0.95
yes
The maximum portion of the tenure area allocated to the large area object
noOpt
noOpt, cold, warm, hot, veryHot, scorching
yes
Force the JIT compiler to compile all methods at a specific optimization level
j9vm_compilationThreads
integer
threads
You should select your own default value.
1 → 7
yes
Number of JIT threads
j9vm_codeCacheTotal
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Maximum size limit in MB for the JIT code cache
j9vm_jit_count
integer
10000
0 → 1000000
yes
The number of times a method is called before it is compiled
blank
blank, -XlockReservation
no
Enables an optimization that presumes a monitor is owned by the thread that last acquired it
j9vm_compressedReferences
categorical
blank
blank, -Xcompressedrefs, -Xnocompressedrefs
yes
Enable/disable the use of compressed references
j9vm_aggressiveOpts
categorical
blank
blank, -Xaggressive
yes
Enable the use of aggressive performance optimization features, which are expected to become default in upcoming releases
j9vm_virtualized
categorical
blank
blank, -Xtune:virtualized
yes
Optimize the VM for virtualized environment, reducing CPU usage when idle
j9vm_shareclasses
categorical
blank
blank, -Xshareclasses
yes
Enable class sharing
j9vm_quickstart
categorical
blank
blank, -Xquickstart
yes
Run JIT with only a subset of optimizations, improving the performance of short-running applications
j9vm_minimizeUserCpu
categorical
blank
blank, -Xthr:minimizeUserCPU
yes
Minimizes user-mode CPU usage in thread synchronization where possible
j9vm_minOldSpace
75% of j9vm_minHeapSize
must not exceed j9vm_minHeapSize
j9vm_maxOldSpace
same as j9vm_maxHeapSize
must not exceed j9vm_maxHeapSize
j9vm_gcThreads
number of CPUs - 1, up to a maximum of 64
capped at the default value, as exceeding it brings no benefit
j9vm_compressedReferences
enabled for j9vm_maxHeapSize <= 57 GB
jvm.j9vm_loa_minimum <= jvm.j9vm_loa_initial && jvm.j9vm_loa_initial <= jvm.j9vm_loa_maximum
jvm.j9vm_minFreeHeap + 0.05 < jvm.j9vm_maxFreeHeap
jvm.j9vm_minGcTime < jvm.j9vm_maxGcTime
cpu_util_details
percent
The average CPU utilization % broken down by usage type and CPU number (e.g., cp1 user, cp2 system, cp3 soft-irq)
cpu_load_avg
tasks
The system load average (i.e., the number of active tasks in the system)
mem_util_details
percent
The memory utilization % (i.e., the % of memory used) broken down by usage type (e.g., active memory)
mem_used
bytes
The total amount of memory used
mem_used_nocache
bytes
The total amount of memory used without considering memory reserved for caching purposes
mem_total
bytes
The total amount of installed memory
mem_fault_minor
faults/s
The number of minor memory faults (i.e., faults that do not cause disk access) per second
mem_fault_major
faults/s
The number of major memory faults (i.e., faults that cause disk access) per second
mem_fault
faults/s
The number of memory faults (major + minor)
mem_swapins
pages/s
The number of memory pages swapped in per second
mem_swapouts
pages/s
The number of memory pages swapped out per second
network_out_bytes_details
bytes/s
The number of outbound network packets in bytes per second broken down by network device (e.g., eth01)
disk_util_details
percent
The utilization % of disk, i.e., how much time a disk is busy doing work, broken down by disk (e.g., disk D://)
disk_iops_writes
ops/s
The average number of IO disk-write operations per second across all disks
disk_iops_reads
ops/s
The average number of IO disk-read operations per second across all disks
disk_iops
ops/s
The average number of IO disk operations per second across all disks
disk_response_time_read
seconds
The average response time of IO read-disk operations
disk_response_time_worst
seconds
The average response time of IO disk operations of the slowest disk
disk_response_time_write
seconds
The average response time of IO write-disk operations
disk_response_time_details
seconds
The average response time of IO disk operations broken down by disk (e.g., disk /dev/nvme01)
disk_iops_details
ops/s
The number of IO disk-write operations per second broken down by disk (e.g., disk /dev/nvme01)
disk_io_inflight_details
ops
The number of IO disk operations in progress (outstanding) broken down by disk (e.g., disk /dev/nvme01)
disk_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_read_bytes
bytes/s
The number of bytes per second read across all disks
disk_read_write_bytes
bytes/s
The number of bytes per second read and written across all disks
disk_write_bytes_details
bytes/s
The number of bytes per second written to the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation WRITE)
disk_read_bytes_details
bytes/s
The number of bytes per second read from the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation READ)
filesystem_size
bytes
The size of filesystems broken down by type and device (e.g., filesystem of type ext4 for device /dev/nvme01)
os_cpuSchedWakeupGranularity
3000000 ns
400000→40000000 ns
Scheduler Wakeup Granularity (in nanoseconds)
os_CPUSchedMigrationCost
500000 ns
100000→5000000 ns
Amount of time (in nanoseconds) after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated to another CPU, so increasing this variable reduces task migrations
os_CPUSchedChildRunsFirst
0
0→1
A freshly forked child runs before the parent continues execution
os_CPUSchedLatency
18000000 ns
2400000→240000000 ns
Targeted preemption latency (in nanoseconds) for CPU bound tasks
os_CPUSchedAutogroupEnabled
1
0→1
Enables the Linux task auto-grouping feature, where the kernel assigns related tasks to groups and schedules them together on CPUs to achieve higher performance for some workloads
os_CPUSchedNrMigrate
32
3→320
Scheduler NR Migrate
os_MemoryVmVfsCachePressure
100 %
10→100 %
VFS Cache Pressure
os_MemoryVmMinFree
67584 KB
10240→1024000 KB
Minimum Free Memory
os_MemoryVmDirtyRatio
20 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyBackgroundRatio
10 %
1→99 %
When the dirty memory pages exceed this percentage of the total memory, the kernel begins to write them asynchronously in the background
os_MemoryTransparentHugepageEnabled
madvise
always never madvise
Transparent Hugepage Enablement
os_MemoryTransparentHugepageDefrag
madvise
always never madvise defer defer+madvise
Transparent Hugepage Enablement Defrag
os_MemorySwap
swapon
swapon swapoff
Memory Swap
os_MemoryVmDirtyExpire
3000 centisecs
300→30000 centisecs
Memory Dirty Expiration Time
os_MemoryVmDirtyWriteback
500 centisecs
50→5000 centisecs
Memory Dirty Writeback
os_NetworkNetCoreNetdevMaxBacklog
1000 packets
100→10000 packets
Network Max Backlog
os_NetworkNetIpv4TcpMaxSynBacklog
1024 packets
52→15120 packets
Network IPv4 Max SYN Backlog
os_NetworkNetCoreNetdevBudget
300 packets
30→3000 packets
Network Budget
os_NetworkNetCoreRmemMax
212992 bytes
21299→2129920 bytes
Maximum network receive buffer size that applications can request
os_NetworkNetCoreWmemMax
212992 bytes
21299→2129920 bytes
Maximum network transmit buffer size that applications can request
os_NetworkNetIpv4TcpSlowStartAfterIdle
1
0→1
Network Slow Start After Idle Flag
os_NetworkNetIpv4TcpFinTimeout
60
6 → 600 seconds
Network TCP timeout
os_NetworkRfs
0
0→131072
If enabled, increases data cache hit rate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running
1000 packets
100→10000 packets
Network Max Backlog
os_StorageRqAffinity
1
1→2
Storage Requests Affinity
os_StorageQueueScheduler
none
none mq-deadline
Storage Queue Scheduler Type
os_StorageNomerges
0
0→2
Enables the user to disable the lookup logic involved with IO merging requests in the block layer. By default (0) all merges are enabled. With 1 only simple one-hit merges will be tried. With 2 no merge algorithms will be tried
os_StorageMaxSectorsKb
128 KB
32→128 KB
The largest IO size that the OS can issue to a block device
jvm_heap_util
percent
The utilization % of heap memory
jvm_memory_used
bytes
The total amount of memory used across all the JVM memory pools
jvm_memory_used_details
bytes
The total amount of memory used broken down by pool (e.g., code-cache, compressed-class-space)
jvm_memory_buffer_pool_used
bytes
The total amount of bytes used by buffers within the JVM buffer memory pool
jvm_gc_time
percent
The % of wall clock time the JVM spent doing stop the world garbage collection activities
jvm_gc_time_details
percent
The % of wall clock time the JVM spent doing stop the world garbage collection activities broken down by type of garbage collection algorithm (e.g., ParNew)
jvm_gc_count
collections/s
The total number of stop the world JVM garbage collections that have occurred per second
jvm_gc_count_details
collections/s
The total number of stop the world JVM garbage collections that have occurred per second, broken down by type of garbage collection algorithm (e.g., G1, CMS)
jvm_gc_duration
seconds
The average duration of a stop the world JVM garbage collection
jvm_gc_duration_details
seconds
The average duration of a stop the world JVM garbage collection broken down by type of garbage collection algorithm (e.g., G1, CMS)
jvm_threads_current
threads
The total number of active threads within the JVM
jvm_threads_deadlocked
threads
The total number of deadlocked threads within the JVM
jvm_compilation_time
milliseconds
The total time spent by the JVM JIT compiler compiling bytecode
You should select your own domain.
yes
Minimum heap size (in megabytes)
j9vm_maxHeapSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Maximum heap size (in megabytes)
j9vm_minFreeHeap
real
percent
0.3
0.1 → 0.5
yes
Specify the minimum % free heap required after global GC
j9vm_maxFreeHeap
real
percent
0.6
0.4 → 0.9
yes
Specify the maximum % free heap required after global GC
gencon, subpool, optavgpause, optthruput, nogc
yes
GC policy to use
j9vm_gcThreads
integer
threads
You should select your own default value.
1 → 64
yes
Number of threads the garbage collector uses for parallel operations
j9vm_scvTenureAge
integer
10
1 → 14
yes
Set the initial tenuring threshold for generational concurrent GC policy
j9vm_scvAdaptiveTenureAge
categorical
blank
blank, -Xgc:scvNoAdaptiveTenure
yes
Enable the adaptive tenure age for generational concurrent GC policy
j9vm_newSpaceFixed
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The fixed size of the new area when using the gencon GC policy. Must not be set alongside min or max
j9vm_minNewSpace
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The initial size of the new area when using the gencon GC policy
j9vm_maxNewSpace
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The maximum size of the new area when using the gencon GC policy
j9vm_oldSpaceFixed
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The fixed size of the old area when using the gencon GC policy. Must not be set alongside min or max
j9vm_minOldSpace
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The initial size of the old area when using the gencon GC policy
j9vm_maxOldSpace
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The maximum size of the old area when using the gencon GC policy
j9vm_concurrentScavenge
categorical
concurrentScavenge
concurrentScavenge, noConcurrentScavenge
yes
Support pause-less garbage collection mode with gencon
j9vm_gcPartialCompact
categorical
nopartialcompactgc
nopartialcompactgc, partialcompactgc
yes
Enable partial compaction
j9vm_concurrentMeter
categorical
soa
soa, loa, dynamic
yes
Determine which area is monitored by the concurrent mark
j9vm_concurrentBackground
integer
0
0 → 128
yes
The number of background threads assisting the mutator threads in concurrent mark
j9vm_concurrentSlack
integer
megabytes
0
You should select your own domain.
yes
The target size of free heap space for concurrent collectors
j9vm_concurrentLevel
integer
percent
8
0 → 100
yes
The ratio between the amount of heap allocated and the amount of heap marked
j9vm_gcCompact
categorical
blank
blank, -Xcompactgc, -Xnocompactgc
yes
Enables full compaction on all garbage collections (system and global)
j9vm_minGcTime
real
percent
0.05
0.0 → 1.0
yes
The minimum percentage of time to be spent in garbage collection, triggering the resize of the heap to meet the specified values
j9vm_maxGcTime
real
percent
0.13
0.0 → 1.0
yes
The maximum percentage of time to be spent in garbage collection, triggering the resize of the heap to meet the specified values
j9vm_loa
categorical
loa
loa, noloa
yes
Enable the allocation of the large area object during garbage collection
j9vm_loa_initial
real
0.05
0.0 → 0.95
yes
The initial portion of the tenure area allocated to the large area object
j9vm_loa_minimum
real
0.01
0.0 → 0.95
yes
The minimum portion of the tenure area allocated to the large area object
j9vm_loa_maximum
real
0.5
0.0 → 0.95
yes
The maximum portion of the tenure area allocated to the large area object
noOpt, cold, warm, hot, veryHot, scorching
yes
Force the JIT compiler to compile all methods at a specific optimization level
j9vm_compilationThreads
integer
threads
You should select your own default value.
1 → 7
yes
Number of JIT threads
j9vm_codeCacheTotal
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Maximum size limit in MB for the JIT code cache
j9vm_jit_count
integer
10000
0 → 1000000
yes
The number of times a method is called before it is compiled
blank, -XlockReservation
no
Enables an optimization that presumes a monitor is owned by the thread that last acquired it
j9vm_compressedReferences
categorical
blank
blank, -Xcompressedrefs, -Xnocompressedrefs
yes
Enable/disable the use of compressed references
j9vm_aggressiveOpts
categorical
blank
blank, -Xaggressive
yes
Enable the use of aggressive performance optimization features, which are expected to become default in upcoming releases
j9vm_virtualized
categorical
blank
blank, -Xtune:virtualized
yes
Optimize the VM for virtualized environment, reducing CPU usage when idle
j9vm_shareclasses
categorical
blank
blank, -Xshareclasses
yes
Enable class sharing
j9vm_quickstart
categorical
blank
blank, -Xquickstart
yes
Run JIT with only a subset of optimizations, improving the performance of short-running applications
j9vm_minimizeUserCpu
categorical
blank
blank, -Xthr:minimizeUserCPU
yes
Minimizes user-mode CPU usage in thread synchronization where possible
j9vm_minOldSpace
75% of j9vm_minHeapSize
must not exceed j9vm_minHeapSize
j9vm_maxOldSpace
same as j9vm_maxHeapSize
must not exceed j9vm_maxHeapSize
j9vm_gcthreads
number of CPUs - 1, up to a maximum of 64
capped to default, no benefit in exceeding that value
j9vm_compressedReferences
enabled for j9vm_maxHeapSize <= 57 GB
jvm.j9vm_minFreeHeap + 0.05 < jvm.j9vm_maxFreeHeap
jvm.j9vm_minGcTime < jvm.j9vm_maxGcTime
the sum of j9vm_minNewSpace and j9vm_minOldSpace must equal j9vm_minHeapSize, so tuning all three together is redundant. The relationship among the maximum values is more complex.
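The sizing relationship in the note above can be sketched numerically. This is an illustration only, using the default proportions listed in this pack (new area = 25% of the minimum heap, old area = the remaining 75%); the function name is hypothetical:

```python
# Sketch: derive consistent gencon area sizes from a target minimum heap,
# using the default proportions from this pack (new = 25%, old = 75%).
# minNewSpace + minOldSpace must equal minHeapSize, per the note above.
def gencon_min_sizes(min_heap_mb):
    min_new = min_heap_mb // 4          # 25% of j9vm_minHeapSize
    min_old = min_heap_mb - min_new     # remainder, i.e. 75%
    assert min_new + min_old == min_heap_mb
    return {"j9vm_minNewSpace": min_new, "j9vm_minOldSpace": min_old}

print(gencon_min_sizes(1024))
# {'j9vm_minNewSpace': 256, 'j9vm_minOldSpace': 768}
```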
jvm_heap_size
bytes
The size of the JVM heap memory
jvm_heap_used
bytes
The amount of heap memory used
j9vm_minHeapSize
integer
megabytes
j9vm_gcPolicy
categorical
j9vm_jitOptlevel
ordinal
j9vm_lockReservation
categorical
j9vm_minNewSpace
25% of j9vm_minHeapSize
must not exceed j9vm_minHeapSize
j9vm_maxNewSpace
25% of j9vm_maxHeapSize
must not exceed j9vm_maxHeapSize
jvm.j9vm_minHeapSize < jvm.j9vm_maxHeapSize
jvm.j9vm_minNewSpace < jvm.j9vm_maxNewSpace && jvm.j9vm_minNewSpace < jvm.j9vm_minHeapSize && jvm.j9vm_maxNewSpace < jvm.j9vm_maxHeapSize
jvm.j9vm_minOldSpace < jvm.j9vm_maxOldSpace && jvm.j9vm_minOldSpace < jvm.j9vm_minHeapSize && jvm.j9vm_maxOldSpace < jvm.j9vm_maxHeapSize
You should select your own default value.
gencon
noOpt
categorical
jvm.j9vm_loa_minimum <= jvm.j9vm_loa_initial && jvm.j9vm_loa_initial <= jvm.j9vm_loa_maximum
This page describes the Optimization Pack for the component type Amazon Linux 2.
This page describes the Optimization Pack for the component type Amazon Linux 2022.
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
cpu_used
CPUs
The average number of CPUs used in the system (physical and logical)
cpu_util_details
percent
The average CPU utilization % broken down by usage type and CPU number (e.g., cp1 user, cp2 system, cp3 soft-irq)
mem_fault_minor
faults/s
The number of minor memory faults (i.e., faults that do not cause disk access) per second
mem_swapins
pages/s
The number of memory pages swapped in per second
mem_swapouts
pages/s
The number of memory pages swapped out per second
mem_total
bytes
The total amount of installed memory
mem_used
bytes
The total amount of memory used
mem_used_nocache
bytes
The total amount of memory used without considering memory reserved for caching purposes
mem_util
percent
The memory utilization % (i.e., the % of memory used)
mem_util_details
percent
The memory utilization % (i.e., the % of memory used) broken down by usage type (e.g., active memory)
mem_util_nocache
percent
The memory utilization % (i.e., the % of memory used) without considering memory reserved for caching purposes
disk_iops_details
ops/s
The number of IO disk-write operations per second broken down by disk (e.g., disk /dev/nvme01)
disk_iops_reads
ops/s
The average number of IO disk-read operations per second across all disks
disk_iops_writes
ops/s
The average number of IO disk-write operations per second across all disks
disk_read_bytes
bytes/s
The number of bytes per second read across all disks
disk_read_bytes_details
bytes/s
The number of bytes per second read from the disks broken down by disk (e.g., disk C://)
disk_read_write_bytes
bytes/s
The number of bytes per second read and written across all disks
disk_response_time_details
seconds
The average response time of IO disk operations broken down by disk (e.g., disk C://)
disk_response_time_read
seconds
The average response time of read disk operations
disk_response_time_worst
seconds
The average response time of IO disk operations of the slowest disk
disk_response_time_write
seconds
The average response time of write disk operations
disk_swap_used
bytes
The total amount of space used by swap disks
disk_swap_util
percent
The average space utilization % of swap disks
disk_util_details
percent
The disk utilization % (i.e., how much time a disk is busy doing work) broken down by disk (e.g., disk D://)
disk_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_write_bytes_details
bytes/s
The number of bytes per second written to the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation WRITE)
filesystem_size
bytes
The size of filesystems broken down by type and device (e.g., filesystem of type ext4 for device /dev/nvme01)
filesystem_used
bytes
The amount of space used on the filesystems broken down by type and device (e.g., filesystem of type zfs on device /dev/nvme01)
filesystem_util
percent
The space utilization % of filesystems broken down by type and device (e.g., filesystem of type overlayfs on device /dev/loop1)
network_tcp_retrans
retrans/s
The number of network TCP retransmissions per second
300000 → 30000000
no
Minimal preemption granularity (in nanoseconds) for CPU bound tasks
os_cpuSchedWakeupGranularity
integer
nanoseconds
2000000
400000 → 40000000
no
Scheduler Wakeup Granularity (in nanoseconds)
os_CPUSchedMigrationCost
integer
nanoseconds
500000
100000 → 5000000
no
Amount of time (in nanoseconds) after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated to another CPU, so increasing this variable reduces task migrations
os_CPUSchedChildRunsFirst
integer
0
0, 1
no
A freshly forked child runs before the parent continues execution
os_CPUSchedLatency
integer
nanoseconds
12000000
2400000 → 240000000
no
Targeted preemption latency (in nanoseconds) for CPU bound tasks
os_CPUSchedAutogroupEnabled
integer
0
0, 1
no
Enables the Linux task auto-grouping feature, where the kernel assigns related tasks to groups and schedules them together on CPUs to achieve higher performance for some workloads
os_CPUSchedNrMigrate
integer
32
3 → 320
no
Scheduler NR Migrate
0 → 100
no
Defines how aggressively the kernel swaps memory pages to disk: higher values increase the preference for swapping over reclaiming page cache
os_MemoryVmVfsCachePressure
integer
100
10 → 100
no
VFS Cache Pressure
os_MemoryVmCompactionProactiveness
integer
20
0 → 100
no
Determines how aggressively compaction is done in the background
os_MemoryVmPageLockUnfairness
integer
5
0 → 1000
no
Set the level of unfairness in the page lock queue.
os_MemoryVmWatermarkScaleFactor
integer
10
0 → 1000
no
The amount of memory, expressed as fractions of 10'000, left in a node/system before kswapd is woken up and how much memory needs to be free before kswapd goes back to sleep
os_MemoryVmWatermarkBoostFactor
integer
15000
0 → 30000
no
The level of reclaim when the memory is being fragmented, expressed as fractions of 10'000 of a zone's high watermark
os_MemoryVmMinFree
integer
67584
10240 → 1024000
no
Minimum Free Memory (in kbytes)
os_MemoryTransparentHugepageEnabled
categorical
madvise
always, never, madvise
no
Transparent Hugepage Enablement Flag
os_MemoryTransparentHugepageDefrag
categorical
madvise
always, never, defer+madvise, madvise, defer
no
Transparent Hugepage Enablement Defrag
os_MemorySwap
categorical
swapon
swapon, swapoff
no
Memory Swap
os_MemoryVmDirtyRatio
integer
20
1 → 99
no
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyBackgroundRatio
integer
10
1 → 99
no
When the dirty memory pages exceed this percentage of the total memory, the kernel begins to write them asynchronously in the background
os_MemoryVmDirtyExpire
integer
centiseconds
3000
300 → 30000
no
The age (in centiseconds) after which dirty memory pages are old enough to be written out by the kernel flusher threads
os_MemoryVmDirtyWriteback
integer
centiseconds
500
50 → 5000
no
Memory Dirty Writeback (in centisecs)
12 → 8192
no
Network Max Connections
os_NetworkNetCoreNetdevMaxBacklog
integer
packets
1000
100 → 10000
no
Network Max Backlog
os_NetworkNetIpv4TcpMaxSynBacklog
integer
packets
256
52 → 5120
no
Network IPv4 Max SYN Backlog
os_NetworkNetCoreNetdevBudget
integer
300
30 → 30000
no
Network Budget
os_NetworkNetCoreRmemMax
integer
212992
21299 → 2129920
no
Maximum network receive buffer size that applications can request
os_NetworkNetCoreWmemMax
integer
212992
21299 → 2129920
no
Maximum network transmit buffer size that applications can request
os_NetworkNetIpv4TcpSlowStartAfterIdle
integer
1
0, 1
no
Network Slow Start After Idle Flag
os_NetworkNetIpv4TcpFinTimeout
integer
60
6 → 600
no
Network TCP timeout
os_NetworkRfs
integer
0
0 → 131072
no
If enabled, increases data cache hit rate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running
0 → 4096
no
Read-ahead speeds up file access by pre-fetching data and loading it into the page cache so that it can be available earlier in memory instead of from disk
os_StorageNrRequests
integer
32
12 → 1280
no
Storage Number of Requests
os_StorageRqAffinity
integer
1
1, 2
no
Storage Requests Affinity
os_StorageQueueScheduler
categorical
none
none, kyber, mq-deadline, bfq
no
Storage Queue Scheduler Type
os_StorageNomerges
integer
0
0 → 2
no
Enables the user to disable the lookup logic involved with IO merging requests in the block layer. By default (0) all merges are enabled. With 1 only simple one-hit merges will be tried. With 2 no merge algorithms will be tried
os_StorageMaxSectorsKb
integer
kilobytes
256
32 → 256
no
The largest IO size that the OS can issue to a block device
cpu_load_avg
tasks
The system load average (i.e., the number of active tasks in the system)
cpu_num
CPUs
The number of CPUs available in the system (physical and logical)
mem_fault
faults/s
The number of memory faults (minor+major)
mem_fault_major
faults/s
The number of major memory faults (i.e., faults that cause disk access) per second
disk_io_inflight_details
ops
The number of IO disk operations in progress (outstanding) broken down by disk (e.g., disk /dev/nvme01)
disk_iops
ops/s
The average number of IO disk operations per second across all disks
network_in_bytes_details
bytes/s
The inbound network traffic in bytes per second broken down by network device (e.g., wlp4s0)
network_out_bytes_details
bytes/s
The outbound network traffic in bytes per second broken down by network device (e.g., eth01)
os_context_switch
switches/s
The number of context switches per second
proc_blocked
processes
The number of processes blocked (e.g, for IO or swapping reasons)
os_cpuSchedMinGranularity
integer
nanoseconds
1500000
os_MemorySwappiness
integer
percent
60
os_NetworkNetCoreSomaxconn
integer
connections
128
os_StorageReadAhead
integer
kilobytes
128
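The OS parameters above correspond to Linux kernel tunables. As a rough illustration, the sketch below maps a few of them to the sysctl keys they plausibly tune; the mapping is my assumption based on the parameter descriptions, not an Akamas-provided table, and the helper function is hypothetical:

```python
# Illustrative sketch: a plausible mapping from some Akamas OS parameters
# to the Linux sysctl keys they tune. The mapping is an assumption based
# on the parameter descriptions, not taken from Akamas documentation.
SYSCTL_MAP = {
    "os_MemorySwappiness": "vm.swappiness",
    "os_MemoryVmVfsCachePressure": "vm.vfs_cache_pressure",
    "os_MemoryVmDirtyRatio": "vm.dirty_ratio",
    "os_MemoryVmDirtyBackgroundRatio": "vm.dirty_background_ratio",
    "os_NetworkNetCoreNetdevMaxBacklog": "net.core.netdev_max_backlog",
    "os_NetworkNetIpv4TcpMaxSynBacklog": "net.ipv4.tcp_max_syn_backlog",
    "os_NetworkNetCoreRmemMax": "net.core.rmem_max",
    "os_NetworkNetIpv4TcpFinTimeout": "net.ipv4.tcp_fin_timeout",
}

def sysctl_command(param, value):
    """Render the sysctl -w command that would apply one parameter."""
    return f"sysctl -w {SYSCTL_MAP[param]}={value}"

print(sysctl_command("os_MemoryVmDirtyRatio", 20))
# sysctl -w vm.dirty_ratio=20
```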
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
cpu_used
CPUs
The average number of CPUs used in the system (physical and logical)
cpu_util_details
percent
The average CPU utilization % broken down by usage type and CPU number (e.g., cp1 user, cp2 system, cp3 soft-irq)
mem_fault_minor
faults/s
The number of minor memory faults (i.e., faults that do not cause disk access) per second
mem_swapins
pages/s
The number of memory pages swapped in per second
mem_swapouts
pages/s
The number of memory pages swapped out per second
mem_total
bytes
The total amount of installed memory
mem_used
bytes
The total amount of memory used
mem_used_nocache
bytes
The total amount of memory used without considering memory reserved for caching purposes
mem_util
percent
The memory utilization % (i.e., the % of memory used)
mem_util_details
percent
The memory utilization % (i.e., the % of memory used) broken down by usage type (e.g., active memory)
mem_util_nocache
percent
The memory utilization % (i.e., the % of memory used) without considering memory reserved for caching purposes
disk_iops_details
ops/s
The number of IO disk-write operations per second broken down by disk (e.g., disk /dev/nvme01)
disk_iops_reads
ops/s
The average number of IO disk-read operations per second across all disks
disk_iops_writes
ops/s
The average number of IO disk-write operations per second across all disks
disk_read_bytes
bytes/s
The number of bytes per second read across all disks
disk_read_bytes_details
bytes/s
The number of bytes per second read from the disks broken down by disk (e.g., disk C://)
disk_read_write_bytes
bytes/s
The number of bytes per second read and written across all disks
disk_response_time_details
seconds
The average response time of IO disk operations broken down by disk (e.g., disk C://)
disk_response_time_read
seconds
The average response time of read disk operations
disk_response_time_worst
seconds
The average response time of IO disk operations of the slowest disk
disk_response_time_write
seconds
The average response time of write disk operations
disk_swap_used
bytes
The total amount of space used by swap disks
disk_swap_util
percent
The average space utilization % of swap disks
disk_util_details
percent
The disk utilization % (i.e., how much time a disk is busy doing work) broken down by disk (e.g., disk D://)
disk_write_bytes
bytes/s
The number of bytes per second written across all disks
disk_write_bytes_details
bytes/s
The number of bytes per second written to the disks broken down by disk and type of operation (e.g., disk /dev/nvme01 and operation WRITE)
filesystem_size
bytes
The size of filesystems broken down by type and device (e.g., filesystem of type ext4 for device /dev/nvme01)
filesystem_used
bytes
The amount of space used on the filesystems broken down by type and device (e.g., filesystem of type zfs on device /dev/nvme01)
filesystem_util
percent
The space utilization % of filesystems broken down by type and device (e.g., filesystem of type overlayfs on device /dev/loop1)
network_tcp_retrans
retrans/s
The number of network TCP retransmissions per second
300000 → 30000000
no
Minimal preemption granularity (in nanoseconds) for CPU bound tasks
os_cpuSchedWakeupGranularity
integer
nanoseconds
2000000
400000 → 40000000
no
Scheduler Wakeup Granularity (in nanoseconds)
os_CPUSchedMigrationCost
integer
nanoseconds
500000
100000 → 5000000
no
Amount of time (in nanoseconds) after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated to another CPU, so increasing this variable reduces task migrations
os_CPUSchedChildRunsFirst
integer
0
0, 1
no
A freshly forked child runs before the parent continues execution
os_CPUSchedLatency
integer
nanoseconds
12000000
2400000 → 240000000
no
Targeted preemption latency (in nanoseconds) for CPU bound tasks
os_CPUSchedAutogroupEnabled
integer
0
0, 1
no
Enables the Linux task auto-grouping feature, where the kernel assigns related tasks to groups and schedules them together on CPUs to achieve higher performance for some workloads
os_CPUSchedNrMigrate
integer
32
3 → 320
no
Scheduler NR Migrate
0 → 100
no
Defines how aggressively the kernel swaps memory pages to disk: higher values increase the preference for swapping over reclaiming page cache
os_MemoryVmVfsCachePressure
integer
100
10 → 100
no
VFS Cache Pressure
os_MemoryVmCompactionProactiveness
integer
20
0 → 100
no
Determines how aggressively compaction is done in the background
os_MemoryVmPageLockUnfairness
integer
5
0 → 1000
no
Set the level of unfairness in the page lock queue.
os_MemoryVmWatermarkScaleFactor
integer
10
0 → 1000
no
The amount of memory, expressed as fractions of 10'000, left in a node/system before kswapd is woken up and how much memory needs to be free before kswapd goes back to sleep
os_MemoryVmWatermarkBoostFactor
integer
15000
0 → 30000
no
The level of reclaim when the memory is being fragmented, expressed as fractions of 10'000 of a zone's high watermark
os_MemoryVmMinFree
integer
67584
10240 → 1024000
no
Minimum Free Memory (in kbytes)
os_MemoryTransparentHugepageEnabled
categorical
madvise
always, never, madvise
no
Transparent Hugepage Enablement Flag
os_MemoryTransparentHugepageDefrag
categorical
madvise
always, never, defer+madvise, madvise, defer
no
Transparent Hugepage Enablement Defrag
os_MemorySwap
categorical
swapon
swapon, swapoff
no
Memory Swap
os_MemoryVmDirtyRatio
integer
20
1 → 99
no
When the dirty memory pages exceed this percentage of the total memory, processes are forced to write dirty buffers during their time slice instead of continuing to write
os_MemoryVmDirtyBackgroundRatio
integer
10
1 → 99
no
When the dirty memory pages exceed this percentage of the total memory, the kernel begins to write them asynchronously in the background
os_MemoryVmDirtyExpire
integer
centiseconds
3000
300 → 30000
no
The age (in centiseconds) after which dirty memory pages are old enough to be written out by the kernel flusher threads
os_MemoryVmDirtyWriteback
integer
centiseconds
500
50 → 5000
no
Memory Dirty Writeback (in centisecs)
12 → 8192
no
Network Max Connections
os_NetworkNetCoreNetdevMaxBacklog
integer
packets
1000
100 → 10000
no
Network Max Backlog
os_NetworkNetIpv4TcpMaxSynBacklog
integer
packets
256
52 → 5120
no
Network IPv4 Max SYN Backlog
os_NetworkNetCoreNetdevBudget
integer
300
30 → 30000
no
Network Budget
os_NetworkNetCoreRmemMax
integer
212992
21299 → 2129920
no
Maximum network receive buffer size that applications can request
os_NetworkNetCoreWmemMax
integer
212992
21299 → 2129920
no
Maximum network transmit buffer size that applications can request
os_NetworkNetIpv4TcpSlowStartAfterIdle
integer
1
0, 1
no
Network Slow Start After Idle Flag
os_NetworkNetIpv4TcpFinTimeout
integer
60
6 → 600
no
Network TCP timeout
os_NetworkRfs
integer
0
0 → 131072
no
If enabled, increases data cache hit rate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running
0 → 4096
no
Read-ahead speeds up file access by pre-fetching data and loading it into the page cache so that it can be available earlier in memory instead of from disk
os_StorageNrRequests
integer
32
12 → 1280
no
Storage Number of Requests
os_StorageRqAffinity
integer
1
1, 2
no
Storage Requests Affinity
os_StorageQueueScheduler
categorical
none
none, kyber, mq-deadline, bfq
no
Storage Queue Scheduler Type
os_StorageNomerges
integer
0
0 → 2
no
Enables the user to disable the lookup logic involved with IO merging requests in the block layer. By default (0) all merges are enabled. With 1 only simple one-hit merges will be tried. With 2 no merge algorithms will be tried
os_StorageMaxSectorsKb
integer
kilobytes
256
32 → 256
no
The largest IO size that the OS can issue to a block device
cpu_load_avg
tasks
The system load average (i.e., the number of active tasks in the system)
cpu_num
CPUs
The number of CPUs available in the system (physical and logical)
mem_fault
faults/s
The number of memory faults (minor+major)
mem_fault_major
faults/s
The number of major memory faults (i.e., faults that cause disk access) per second
disk_io_inflight_details
ops
The number of IO disk operations in progress (outstanding) broken down by disk (e.g., disk /dev/nvme01)
disk_iops
ops/s
The average number of IO disk operations per second across all disks
network_in_bytes_details
bytes/s
The inbound network traffic in bytes per second broken down by network device (e.g., wlp4s0)
network_out_bytes_details
bytes/s
The outbound network traffic in bytes per second broken down by network device (e.g., eth01)
os_context_switch
switches/s
The number of context switches per second
proc_blocked
processes
The number of processes blocked (e.g, for IO or swapping reasons)
os_cpuSchedMinGranularity
integer
nanoseconds
1500000
os_MemorySwappiness
integer
percent
60
os_NetworkNetCoreSomaxconn
integer
connections
128
os_StorageReadAhead
integer
kilobytes
128
This page describes the mapping between metrics provided by Dynatrace to Akamas metrics for each supported component type.
cpu_load_avg
builtin:host.cpu.load
jvm_gc_count
builtin:tech.jvm.memory.pool.collectionCount:merge(poolname,gcname):sum
1/60
Yes
requests_response_time
builtin:service.response.time
0
0.000001
requests_response_time_min
container_cpu_limit
builtin:containers.cpu.limit
Yes
k8s_pod_cpu_limit
builtin:cloud.kubernetes.pod.cpuLimits
Yes
k8s_workload_desired_pods
builtin:kubernetes.workload.pods_desired
No
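Some rows in the mapping table carry a scale factor: for example, jvm_gc_count shows 1/60, consistent with converting Dynatrace's per-minute collection counts into the collections/s unit Akamas uses. A trivial sketch of applying such a factor (the function is illustrative, not an Akamas API):

```python
# Minimal sketch of applying a metric mapping's scale factor: e.g. a
# per-minute GC collection count from Dynatrace times 1/60 yields the
# collections/s value expected for the Akamas jvm_gc_count metric.
def to_akamas_value(source_value, scale_factor):
    return source_value * scale_factor

print(to_akamas_value(120, 1 / 60))  # 2.0 collections/s
```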
This page describes the mapping between metrics provided by Prometheus to Akamas metrics for each supported component type
The following metrics are configured to work for Kubernetes. When using the Docker optimization pack, override the required metrics in the telemetry instance configuration.
cpu_util
avg by (job) (sum by (cpu, job) (rate(node_cpu_seconds_total{instance=~"$INSTANCE$", mode=~"user|system|softirq|irq|nice", job=~"$JOB$" %FILTERS%}[$DURATION$])))
cpu_util_details
avg by (instance, cpu, mode, job) (sum by (instance, cpu, mode, job) (rate(node_cpu_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])))
disk_io_inflight_details
node_disk_io_now{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}
disk_iops
sum by (instance, job) (rate(node_disk_writes_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])) + sum by (instance, job) (rate(node_disk_reads_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_iops_details
sum by (instance, device, job) (rate(node_disk_writes_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_iops_details
sum by (instance, device, job) (rate(node_disk_reads_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_iops_details
sum by (instance, device, job) (rate(node_disk_writes_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])) + sum by (instance, device, job) (rate(node_disk_reads_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_iops_reads
sum by (instance, job) (rate(node_disk_reads_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_iops_writes
sum by (instance, job) (rate(node_disk_writes_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_read_bytes
sum by (instance, device, job) (rate(node_disk_read_bytes_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_read_bytes_details
sum by (instance, device, job) (rate(node_disk_read_bytes_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_read_write_bytes
sum by (instance, device, job) (rate(node_disk_written_bytes_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]) + rate(node_disk_read_bytes_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_response_time
avg by (instance, job) ((rate(node_disk_read_time_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]) + rate(node_disk_write_time_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])) / (rate(node_disk_reads_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]) + rate(node_disk_writes_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]) > 0 ))
disk_response_time_details
avg by (instance, device, job) ((rate(node_disk_read_time_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]) + rate(node_disk_write_time_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])) / ((rate(node_disk_reads_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]) + rate(node_disk_writes_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])) > 0))
disk_response_time_read
rate(node_disk_read_time_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])/ rate(node_disk_reads_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
disk_response_time_worst
max by (instance, job) ((rate(node_disk_read_time_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]) + rate(node_disk_write_time_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])) / (rate(node_disk_reads_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]) + rate(node_disk_writes_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]) > 0 ))
disk_response_time_write
rate(node_disk_write_time_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])/ rate(node_disk_writes_completed_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
disk_swap_used
node_memory_SwapTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_memory_SwapFree_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}
disk_swap_util
((node_memory_SwapTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_memory_SwapFree_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}) / (node_memory_SwapTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} > 0)) or ((node_memory_SwapTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_memory_SwapFree_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}))
disk_util_details
rate(node_disk_io_time_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
disk_write_bytes
sum by (instance, device, job) (rate(node_disk_written_bytes_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
disk_write_bytes_details
sum by (instance, device, job) (rate(node_disk_written_bytes_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$]))
filesystem_size
node_filesystem_size_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}
filesystem_used
node_filesystem_size_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_filesystem_free_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}
filesystem_util
((node_filesystem_size_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_filesystem_free_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}) / node_filesystem_size_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%})
mem_fault_major
rate(node_vmstat_pgmajfault{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
mem_fault_minor
rate(node_vmstat_pgfault{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
mem_swapins
rate(node_vmstat_pswpin{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
mem_swapouts
rate(node_vmstat_pswpout{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
mem_total
node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}
mem_used
(node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_memory_MemFree_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%})
mem_util
(node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_memory_MemFree_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}) / node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}
mem_util_details
(node_memory_Active_file_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} / node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%})
mem_util_details
(node_memory_Active_anon_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} / node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%})
mem_util_details
(node_memory_Inactive_file_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} / node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%})
mem_util_details
(node_memory_Inactive_anon_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} / node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%})
mem_util_nocache
(node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_memory_Buffers_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_memory_Cached_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%} - node_memory_MemFree_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}) / node_memory_MemTotal_bytes{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}
network_in_bytes_details
rate(node_network_receive_bytes_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
network_out_bytes_details
rate(node_network_transmit_bytes_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
network_tcp_retrans
rate(node_netstat_Tcp_RetransSegs{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
os_context_switch
rate(node_context_switches_total{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}[$DURATION$])
proc_blocked
node_procs_blocked{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}
jvm_off_heap_used
avg(jvm_memory_bytes_used{area="nonheap" %FILTERS%})
jvm_heap_util
avg(jvm_memory_bytes_used{area="heap" %FILTERS%} / jvm_memory_bytes_max{area="heap" %FILTERS%})
jvm_memory_used
avg(sum by (instance) (jvm_memory_bytes_used))
jvm_heap_young_gen_size
avg(sum by (instance) (jvm_memory_pool_bytes_max{pool=~".*Eden Space|.*Survivor Space" %FILTERS%}))
jvm_heap_young_gen_used
avg(sum by (instance) (jvm_memory_pool_bytes_used{pool=~".*Eden Space|.*Survivor Space" %FILTERS%}))
jvm_heap_old_gen_size
avg(sum by (instance) (jvm_memory_pool_bytes_max{pool=~".*Tenured Gen|.*Old Gen" %FILTERS%}))
jvm_heap_old_gen_used
avg(sum by (instance) (jvm_memory_pool_bytes_used{pool=~".*Tenured Gen|.*Old Gen" %FILTERS%}))
jvm_memory_buffer_pool_used
avg(sum by (instance) (jvm_buffer_pool_used_bytes))
jvm_gc_time
avg(sum by (instance) (rate(jvm_gc_collection_seconds_sum[$DURATION$])))
jvm_gc_count
avg(sum by (instance) (rate(jvm_gc_collection_seconds_count[$DURATION$])))
jvm_gc_duration
(sum(rate(jvm_gc_collection_seconds_sum[$DURATION$])) / sum(rate(jvm_gc_collection_seconds_count[$DURATION$])) > 0 ) or sum(rate(jvm_gc_collection_seconds_count[$DURATION$]))
jvm_threads_current
avg(jvm_threads_current)
jvm_threads_deadlocked
avg(jvm_threads_deadlocked)
transactions_response_time
avg(rate(ResponseTime_sum{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])/rate(ResponseTime_count{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])>0)
transactions_response_time_max
max(rate(ResponseTime_sum{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])/rate(ResponseTime_count{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])>0)
transactions_response_time_min
min(rate(ResponseTime_sum{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])/rate(ResponseTime_count{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])>0)
transactions_response_time_p50
ResponseTime{quantile="0.5", code="200", job=~"$JOB$" %FILTERS%}
transactions_response_time_p85
ResponseTime{quantile="0.85", code="200", job=~"$JOB$" %FILTERS%}
transactions_response_time_p90
ResponseTime{quantile="0.9", code="200", job=~"$JOB$" %FILTERS%}
transactions_response_time_p99
ResponseTime{quantile="0.99", code="200", job=~"$JOB$" %FILTERS%}
transactions_throughput
sum(rate(Ratio_success{job=~"$JOB$" %FILTERS%}[$DURATION$]))
transactions_error_throughput
sum(rate(Ratio_failure{job=~"$JOB$" %FILTERS%}[$DURATION$]))
transactions_error_rate
(avg(rate(Ratio_failure{job=~"$JOB$" %FILTERS%}[$DURATION$]))/avg(rate(Ratio_total{job=~"$JOB$" %FILTERS%}[$DURATION$])))*100
users
sum(jmeter_threads{state="active", job=~"$JOB$" %FILTERS%})
k8s_workload_cpu_used
1e3 * sum(rate(container_cpu_usage_seconds_total{container="", namespace=~"$NAMESPACE$", pod=~"$DEPLOYMENT$.*" %FILTERS%}[$DURATION$]))
k8s_workload_memory_used
sum(last_over_time(container_memory_usage_bytes{container="", namespace=~"$NAMESPACE$", pod=~"$DEPLOYMENT$.*" %FILTERS%}[$DURATION$]))
k8s_workload_cpu_request
1e3 * sum(kube_pod_container_resource_requests{resource="cpu", namespace=~"$NAMESPACE$", pod=~"$DEPLOYMENT$.*" %FILTERS%})
k8s_workload_cpu_limit
1e3 * sum(kube_pod_container_resource_limits{resource="cpu", namespace=~"$NAMESPACE$", pod=~"$DEPLOYMENT$.*" %FILTERS%})
k8s_workload_memory_request
sum(kube_pod_container_resource_requests{resource="memory", namespace=~"$NAMESPACE$", pod=~"$DEPLOYMENT$.*" %FILTERS%})
k8s_workload_memory_limit
sum(kube_pod_container_resource_limits{resource="memory", namespace=~"$NAMESPACE$", pod=~"$DEPLOYMENT$.*" %FILTERS%})
k8s_pod_memory_used
avg(last_over_time(container_memory_usage_bytes{container="", namespace=~"$NAMESPACE$", pod=~"$POD$" %FILTERS%}[$DURATION$]))
k8s_pod_memory_working_set
avg(container_memory_working_set_bytes{container="", namespace=~"$NAMESPACE$", pod=~"$POD$" %FILTERS%})
k8s_pod_memory_request
avg(sum by (pod) (kube_pod_container_resource_requests{resource="memory", namespace=~"$NAMESPACE$", pod=~"$POD$" %FILTERS%}))
k8s_pod_memory_limit
avg(sum by (pod) (kube_pod_container_resource_limits{resource="memory", namespace=~"$NAMESPACE$", pod=~"$POD$" %FILTERS%}))
k8s_pod_restarts
avg(sum by (pod) (increase(kube_pod_container_status_restarts_total{namespace=~"$NAMESPACE$", pod=~"$POD$" %FILTERS%}[$DURATION$])))
container_cpu_util_max
max(rate(container_cpu_usage_seconds_total{namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%}[$DURATION$]) / on (pod) group_left kube_pod_container_resource_limits{resource="cpu", namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%})
container_cpu_throttled_millicores
1e3 * avg(rate(container_cpu_cfs_throttled_seconds_total{namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%}[$DURATION$]))
container_cpu_throttle_time
avg(last_over_time(container_cpu_cfs_throttled_periods_total{namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%}[$DURATION$]) / container_cpu_cfs_periods_total{namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%})
container_memory_used
avg(last_over_time(container_memory_working_set_bytes{namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%}[$DURATION$]))
container_memory_used_max
max(last_over_time(container_memory_working_set_bytes{namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%}[$DURATION$]))
container_memory_util
avg(last_over_time(container_memory_working_set_bytes{namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%}[$DURATION$]) / on (pod) group_left kube_pod_container_resource_limits{resource="memory", namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%})
container_memory_util_max
max(last_over_time(container_memory_working_set_bytes{namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%}[$DURATION$]) / on (pod) group_left kube_pod_container_resource_limits{resource="memory", namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%})
container_memory_resident_set_used
avg(last_over_time(container_memory_rss{namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%}[$DURATION$]))
container_memory_cache
avg(last_over_time(container_memory_cache{namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%}[$DURATION$]))
container_cpu_request
1e3 * avg(kube_pod_container_resource_requests{resource="cpu", namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%})
container_cpu_limit
1e3 * avg(kube_pod_container_resource_limits{resource="cpu", namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%})
container_memory_request
avg(kube_pod_container_resource_requests{resource="memory", namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%})
container_memory_limit
avg(kube_pod_container_resource_limits{resource="memory", namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%})
container_restarts
avg(increase(kube_pod_container_status_restarts_total{namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%}[$DURATION$]))
container_oom_kills_count
avg(increase(container_oom_events_total{namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%}[$DURATION$]))
cost
sum(kube_pod_container_resource_requests{resource="cpu", namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%})*29 + sum(kube_pod_container_resource_requests{resource="memory", namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%})/1024/1024/1024*8
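The cost query above applies a flat pricing model to resource requests. A short sketch of the same arithmetic in Python, where the 29 (per CPU core per month) and 8 (per GiB of memory per month) coefficients are simply the constants hard-coded in the query, not universal prices:

```python
def monthly_cost(cpu_request_cores: float, memory_request_bytes: float) -> float:
    """Replicate the arithmetic of the `cost` query:
    CPU requests (cores) * 29 + memory requests (GiB) * 8.
    The 29 and 8 coefficients come from the query itself; adjust
    them to match your own cloud pricing.
    """
    gib = memory_request_bytes / 1024 / 1024 / 1024
    return cpu_request_cores * 29 + gib * 8

# Example: a workload requesting 2 cores and 4 GiB of memory
print(monthly_cost(2, 4 * 1024**3))  # 2*29 + 4*8 = 90
```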
aws_ec2_credits_cpu_available
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() aws_ec2_cpucredit_balance_average{job='$JOB$'}
aws_ec2_credits_cpu_used
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() aws_ec2_cpucredit_usage_sum{job='$JOB$'}
disk_read_bytes
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() (aws_ec2_ebsread_bytes_sum{job='$JOB$'} * count_over_time(aws_ec2_ebsread_bytes_sum{job='$JOB$'}[300s]) / 300)
disk_write_bytes
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() (aws_ec2_ebswrite_bytes_sum{job='$JOB$'} * count_over_time(aws_ec2_ebswrite_bytes_sum{job='$JOB$'}[300s]) / 300)
aws_ec2_disk_iops
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() ((aws_ec2_ebsread_ops_sum{job='$JOB$'} + aws_ec2_ebswrite_ops_sum{job='$JOB$'}) * count_over_time(aws_ec2_ebsread_ops_sum{job='$JOB$'}[300s])/300)
aws_ec2_disk_iops_reads
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() (aws_ec2_ebsread_ops_sum{job='$JOB$'} * count_over_time(aws_ec2_ebsread_ops_sum{job='$JOB$'}[300s]) / 300)
aws_ec2_disk_iops_writes
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() (aws_ec2_ebswrite_ops_sum{job='$JOB$'} * count_over_time(aws_ec2_ebswrite_ops_sum{job='$JOB$'}[300s]) / 300)
aws_ec2_ebs_credits_io_util
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() aws_ec2_ebsiobalance__average{job='$JOB$'} / 100
aws_ec2_ebs_credits_bytes_util
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() aws_ec2_ebsbyte_balance__average{job='$JOB$'} / 100
oracle_pga_target_size
oracledb_memory_size{component='PGA Target', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_redo_buffers_size
oracledb_memory_size{component='Redo Buffers', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_default_buffer_cache_size
oracledb_memory_size{component='DEFAULT buffer cache', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_default_2k_buffer_cache_size
oracledb_memory_size{component='DEFAULT 2K buffer cache', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_default_4k_buffer_cache_size
oracledb_memory_size{component='DEFAULT 4K buffer cache', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_default_8k_buffer_cache_size
oracledb_memory_size{component='DEFAULT 8K buffer cache', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_default_16k_buffer_cache_size
oracledb_memory_size{component='DEFAULT 16K buffer cache', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_default_32k_buffer_cache_size
oracledb_memory_size{component='DEFAULT 32K buffer cache', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_keep_buffer_cache_size
oracledb_memory_size{component='KEEP buffer cache', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_recycle_buffer_cache_size
oracledb_memory_size{component='RECYCLE buffer cache', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_asm_buffer_cache_size
oracledb_memory_size{component='ASM Buffer Cache', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_shared_io_pool_size
oracledb_memory_size{component='Shared IO Pool', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_java_pool_size
oracledb_memory_size{component='java pool', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_large_pool_size
oracledb_memory_size{component='large pool', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_shared_pool_size
oracledb_memory_size{component='shared pool', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_streams_pool_size
oracledb_memory_size{component='streams pool', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_sessions_active_user
oracledb_sessions_value{type='USER', status='ACTIVE', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_sessions_inactive_user
oracledb_sessions_value{type='USER', status='INACTIVE', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_sessions_active_background
oracledb_sessions_value{type='BACKGROUND', status='ACTIVE', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_sessions_inactive_background
oracledb_sessions_value{type='BACKGROUND', status='INACTIVE', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_buffer_cache_hit_ratio
https://docs.oracle.com/database/121/TGDBA/tune_buffer_cache.htm#TGDBA533
oracle_redo_log_space_requests
rate(oracledb_activity_redo_log_space_requests{instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])
oracle_wait_event_log_file_sync
rate(oracledb_system_event_time_waited{event='log file sync', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_log_file_parallel_write
rate(oracledb_system_event_time_waited{event='log file parallel write', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_log_file_sequential_read
rate(oracledb_system_event_time_waited{event='log file sequential read', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_enq_tx_contention
rate(oracledb_system_event_time_waited{event='enq: TX - contention', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_enq_tx_row_lock_contention
rate(oracledb_system_event_time_waited{event='enq: TX - row lock contention', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_latch_row_cache_objects
rate(oracledb_system_event_time_waited{event='latch: row cache objects', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_latch_shared_pool
rate(oracledb_system_event_time_waited{event='latch: shared pool', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_resmgr_cpu_quantum
rate(oracledb_system_event_time_waited{event='resmgr:cpu quantum', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_sql_net_message_from_client
rate(oracledb_system_event_time_waited{event='SQL*Net message from client', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_rdbms_ipc_message
rate(oracledb_system_event_time_waited{event='rdbms ipc message', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_db_file_sequential_read
rate(oracledb_system_event_time_waited{event='db file sequential read', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_log_file_switch_checkpoint_incomplete
rate(oracledb_system_event_time_waited{event='log file switch (checkpoint incomplete)', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_row_cache_lock
rate(oracledb_system_event_time_waited{event='row cache lock', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_buffer_busy_waits
rate(oracledb_system_event_time_waited{event='buffer busy waits', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_event_db_file_async_io_submit
rate(oracledb_system_event_time_waited{event='db file async I/O submit', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$])/100
oracle_wait_class_commit
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='Commit', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
oracle_wait_class_concurrency
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='Concurrency', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
oracle_wait_class_system_io
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='System I/O', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
oracle_wait_class_user_io
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='User I/O', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
oracle_wait_class_other
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='Other', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
oracle_wait_class_scheduler
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='Scheduler', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
oracle_wait_class_idle
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='Idle', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
oracle_wait_class_application
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='Application', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
oracle_wait_class_network
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='Network', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
oracle_wait_class_configuration
sum without(event) (rate(oracledb_system_event_time_waited{wait_class='Configuration', instance='$INSTANCE$', job='$JOB$' %FILTERS%}[$DURATION$]))/100
transactions_response_time_p50
ResponseTime{quantile="0.5", code="200", job=~"$JOB$" %FILTERS%}
transactions_response_time_p85
ResponseTime{quantile="0.85", code="200", job=~"$JOB$" %FILTERS%}
transactions_response_time_p90
ResponseTime{quantile="0.9", code="200", job=~"$JOB$" %FILTERS%}
transactions_response_time_p99
ResponseTime{quantile="0.99", code="200", job=~"$JOB$" %FILTERS%}
transactions_throughput
sum(rate(Ratio_success{job=~"$JOB$" %FILTERS%}[$DURATION$]))
transactions_error_throughput
sum(rate(Ratio_failure{job=~"$JOB$" %FILTERS%}[$DURATION$]))
transactions_error_rate
(avg(rate(Ratio_failure{job=~"$JOB$" %FILTERS%}[$DURATION$]))/avg(rate(Ratio_total{job=~"$JOB$" %FILTERS%}[$DURATION$])))*100
users
sum(jmeter_threads{state="active", job=~"$JOB$" %FILTERS%})
The default metrics in this table are based on cAdvisor and kube-state-metrics
The default metrics in this table are based on the CloudWatch Exporter, configured with the attached custom configuration file
The default metrics in this table are based on the OracleDB Exporter, extending the default queries with the attached custom configuration file
The default metrics in this table are based on the Prometheus Listener for JMeter
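The queries in these tables are templates: at query time, Akamas substitutes placeholders such as $INSTANCE$, $JOB$, $NAMESPACE$, $POD$, $CONTAINER$ and $DURATION$ with values from the component and telemetry configuration, and %FILTERS% with any additional label filters. A minimal sketch of that substitution in Python (the resolution logic shown is illustrative, not Akamas' actual implementation):

```python
def render_query(template: str, values: dict, extra_filters: str = "") -> str:
    """Resolve $NAME$ placeholders and the %FILTERS% marker in a query
    template, mimicking in spirit how the templates in the tables above
    become concrete PromQL. The real substitution is performed by the
    telemetry provider, not by user code.
    """
    query = template
    for name, value in values.items():
        query = query.replace(f"${name}$", value)
    # %FILTERS% is prefixed with ", " when extra filters are given, so it
    # composes with the label matchers already present in the template.
    filters = f", {extra_filters}" if extra_filters else ""
    return query.replace(" %FILTERS%", filters)

template = 'node_procs_blocked{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}'
print(render_query(template, {"INSTANCE": "host-1", "JOB": "node"}))
# → node_procs_blocked{instance=~"host-1", job=~"node"}
```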
cpu_load_avg
node_load1{instance=~"$INSTANCE$", job=~"$JOB$" %FILTERS%}
cpu_num
count(node_cpu_seconds_total{instance=~"$INSTANCE$", job=~"$JOB$", mode="system" %FILTERS%})
cpu_used
sum by (job) (sum by (cpu, job) (rate(node_cpu_seconds_total{instance=~"$INSTANCE$", mode=~"user|system|softirq|irq|nice", job=~"$JOB$" %FILTERS%}[$DURATION$])))
jvm_heap_size
avg(jvm_memory_bytes_max{area="heap" %FILTERS%})
jvm_heap_committed
avg(jvm_memory_bytes_committed{area="heap" %FILTERS%})
jvm_heap_used
avg(jvm_memory_bytes_used{area="heap" %FILTERS%})
k8s_workload_desired_pods
kube_deployment_spec_replicas{namespace=~"$NAMESPACE$", deployment=~"$DEPLOYMENT$" %FILTERS%}
k8s_workload_running_pods
kube_deployment_status_replicas_available{namespace=~"$NAMESPACE$", deployment=~"$DEPLOYMENT$" %FILTERS%}
k8s_workload_ready_pods
kube_deployment_status_replicas_ready{namespace=~"$NAMESPACE$", deployment=~"$DEPLOYMENT$" %FILTERS%}
k8s_pod_cpu_used
1e3 * avg(rate(container_cpu_usage_seconds_total{container="", namespace=~"$NAMESPACE$", pod=~"$POD$" %FILTERS%}[$DURATION$]))
k8s_pod_cpu_request
1e3 * avg(sum by (pod) (kube_pod_container_resource_requests{resource="cpu", namespace=~"$NAMESPACE$", pod=~"$POD$" %FILTERS%}))
k8s_pod_cpu_limit
1e3 * avg(sum by (pod) (kube_pod_container_resource_limits{resource="cpu", namespace=~"$NAMESPACE$", pod=~"$POD$" %FILTERS%}))
container_cpu_used
1e3 * avg(rate(container_cpu_usage_seconds_total{namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%}[$DURATION$]))
container_cpu_used_max
1e3 * max(rate(container_cpu_usage_seconds_total{namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%}[$DURATION$]))
container_cpu_util
avg(rate(container_cpu_usage_seconds_total{namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%}[$DURATION$]) / on (pod) group_left kube_pod_container_resource_limits{resource="cpu", namespace=~"$NAMESPACE$", pod=~"$POD$", container=~"$CONTAINER$" %FILTERS%})
cpu_util
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() aws_ec2_cpuutilization_average{job='$JOB$'}/100
network_in_bytes_details
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() (aws_ec2_network_in_sum{job='$JOB$'} * count_over_time(aws_ec2_network_in_sum{job='$JOB$'}[300s]) / 300)
network_out_bytes_details
aws_resource_info{instance='$INSTANCE$', job='$JOB$' %FILTERS%} * on(instance_id) group_left() (aws_ec2_network_out_sum{job='$JOB$'} * count_over_time(aws_ec2_network_out_sum{job='$JOB$'}[300s]) / 300)
oracle_sga_total_size
oracledb_memory_size{component='SGA Target', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_sga_free_size
oracledb_memory_size{component='Free SGA Memory Available', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
oracle_sga_max_size
oracledb_memory_size{component='Maximum SGA Size', instance='$INSTANCE$', job='$JOB$' %FILTERS%}
transactions_response_time
avg(rate(ResponseTime_sum{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])/rate(ResponseTime_count{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])>0)
transactions_response_time_max
max(rate(ResponseTime_sum{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])/rate(ResponseTime_count{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])>0)
transactions_response_time_min
min(rate(ResponseTime_sum{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])/rate(ResponseTime_count{code="200", job=~"$JOB$" %FILTERS%}[$DURATION$])>0)
cpu_num
N/A
cpu_util
builtin:host.cpu.usage
0.01
cpu_util_details
mode:
idle
user
system
iowait
builtin:host.cpu.idle (mode=idle)
builtin:host.cpu.system (mode=system)
builtin:host.cpu.user (mode=user)
builtin:host.cpu.iowait (mode=iowait)
0.01
mem_util
N/A
mem_util_nocache
builtin:host.mem.usage
0.01
mem_util_details
N/A
mem_used
N/A
mem_used_nocache
builtin:host.mem.used
mem_total
N/A
mem_fault
builtin:host.mem.avail.pfps
mem_fault_minor
N/A
mem_fault_major
N/A
mem_swapins
N/A
mem_swapouts
N/A
disk_swap_util
N/A
disk_swap_used
N/A
filesystem_util
Disk
builtin:host.disk.usedPct
filesystem_used
N/A
filesystem_size
N/A
disk_util_details
Disk
builtin:host.disk.free
0.01
disk_iops_writes
N/A
disk_iops_reads
N/A
disk_iops
N/A
disk_iops_details
N/A
disk_response_time_worst
N/A
disk_response_time
N/A
disk_io_inflight_details
N/A
0.01
disk_write_bytes
N/A
disk_read_bytes
N/A
disk_read_write_bytes
N/A
disk_write_bytes_details
Disk
builtin:host.disk.bytesWritten
disk_read_bytes_details
Disk
builtin:host.disk.bytesRead
disk_response_time_details
Disk
builtin:host.disk.readTime
0.001
proc_blocked
N/A
os_context_switch
N/A
network_tcp_retrans
N/A
network_in_bytes_details
Network interface
builtin:host.net.nic.bytesRx
network_out_bytes_details
Network interface
builtin:host.net.nic.bytesTx
avg
jvm_gc_time
builtin:tech.jvm.memory.gc.suspensionTime
0.01
Yes
avg
jvm_heap_size
builtin:tech.jvm.memory.runtime.max
Yes
avg
jvm_heap_committed
Yes
avg
jvm_heap_used
Yes
avg
jvm_off_heap_used
Yes
avg
jvm_heap_old_gen_size
Yes
avg
jvm_heap_old_gen_used
Yes
avg
jvm_heap_young_gen_size
Yes
avg
jvm_heap_young_gen_used
Yes
avg
jvm_threads_current
builtin:tech.jvm.threads.count
Yes
avg
builtin:service.response.time:min
0
0.000001
requests_response_time_max
builtin:service.response.time:max
0
0.000001
requests_throughput
builtin:service.errors.total.successCount
0
1/60
requests_error_rate
builtin:service.errors.total.rate
0
0.01
requests_response_time_p50
builtin:service.response.time:percentile(50)
0
0.001
requests_response_time_p85
builtin:service.response.time:percentile(85)
0
0.001
requests_response_time_p90
builtin:service.response.time:percentile(90)
0
0.001
requests_response_time_p95
builtin:service.response.time:percentile(95)
0
0.001
requests_response_time_p99
builtin:service.response.time:percentile(99)
0
0.001
avg
container_cpu_util
builtin:containers.cpu.usagePercent
0.01
Yes
avg
container_cpu_util_max
builtin:containers.cpu.usagePercent
0.01
Yes
max
container_cpu_throttled_millicores
builtin:containers.cpu.throttledMilliCores
Yes
avg
container_cpu_throttle_time
builtin:containers.cpu.throttledTime
1 / 10^9 / 60
Yes
avg
container_cpu_used
builtin:containers.cpu.usageMilliCores
Yes
avg
container_cpu_used_max
builtin:containers.cpu.usageMilliCores
Yes
max
container_memory_limit
builtin:containers.memory.limitBytes
Yes
avg
container_memory_used
builtin:containers.memory.residentSetBytes
Yes
avg
container_memory_used_max
builtin:containers.memory.residentSetBytes
Yes
max
container_memory_util
builtin:containers.memory.usagePercent
0.01
Yes
avg
container_memory_util_max
builtin:containers.memory.usagePercent
0.01
Yes
max
container_oom_kills_count
builtin:containers.memory.outOfMemoryKills
1/60
Yes
avg
avg
k8s_pod_cpu_request
builtin:cloud.kubernetes.pod.cpuRequests
Yes
avg
k8s_pod_memory_limit
builtin:cloud.kubernetes.pod.memoryLimits
Yes
avg
k8s_pod_memory_request
builtin:cloud.kubernetes.pod.memoryRequests
Yes
avg
k8s_pod_restarts
builtin:kubernetes.container.restarts:merge(k8s.container.name):sum
0
Yes
avg
k8s_workload_running_pods
builtin:kubernetes.pods:filter(eq(pod_phase,Running))
No
k8s_workload_cpu_limit
builtin:kubernetes.workload.limits_cpu
No
k8s_workload_cpu_request
builtin:kubernetes.workload.requests_cpu
No
k8s_workload_memory_limit
builtin:kubernetes.workload.limits_memory
No
k8s_workload_memory_request
builtin:kubernetes.workload.requests_memory
No
k8s_workload_cpu_used
builtin:containers.cpu.usageMilliCores
Yes
sum
k8s_workload_memory_used
builtin:containers.memory.residentSetBytes
Yes
sum
builtin:tech.jvm.memory.pool.committed:filter(ne(poolname,Metaspace),ne(poolname,Code Cache),ne(poolname,CodeHeap 'non-nmethods'),ne(poolname,CodeHeap 'non-profiled nmethods'),ne(poolname,CodeHeap 'profiled nmethods'),ne(poolname,Compressed Class Space),ne(poolname,class storage),ne(poolname,miscellaneous non-heap storage),ne(poolname,JIT code cache),ne(poolname,JIT data cache)):merge(poolname):sum
builtin:tech.jvm.memory.pool.used:filter(ne(poolname,Metaspace),ne(poolname,Code Cache),ne(poolname,CodeHeap 'non-nmethods'),ne(poolname,CodeHeap 'non-profiled nmethods'),ne(poolname,CodeHeap 'profiled nmethods'),ne(poolname,Compressed Class Space),ne(poolname,class storage),ne(poolname,miscellaneous non-heap storage),ne(poolname,JIT code cache),ne(poolname,JIT data cache)):merge(poolname):sum
builtin:tech.jvm.memory.pool.used:filter(or(eq(poolname,Metaspace),eq(poolname,Code Cache),eq(poolname,CodeHeap 'non-nmethods'),eq(poolname,CodeHeap 'non-profiled nmethods'),eq(poolname,CodeHeap 'profiled nmethods'),eq(poolname,Compressed Class Space),eq(poolname,class storage),eq(poolname,miscellaneous non-heap storage),eq(poolname,JIT code cache),eq(poolname,JIT data cache))):merge(poolname):sum
builtin:tech.jvm.memory.pool.max:filter(or(eq(poolname,CMS Old Gen),eq(poolname,G1 Old Gen),eq(poolname,PS Old Gen),eq(poolname,Tenured Gen),eq(poolname,tenured-LOA),eq(poolname,tenured-SOA))):merge(poolname):sum
builtin:tech.jvm.memory.pool.used:filter(or(eq(poolname,CMS Old Gen),eq(poolname,G1 Old Gen),eq(poolname,PS Old Gen),eq(poolname,Tenured Gen),eq(poolname,tenured-LOA),eq(poolname,tenured-SOA))):merge(poolname):sum
builtin:tech.jvm.memory.pool.max:filter(or(eq(poolname,Eden Space),eq(poolname,G1 Survivor Space),eq(poolname,Par Eden Space),eq(poolname,Par Survivor Space),eq(poolname,PS Eden Space),eq(poolname,PS Survivor Space),eq(poolname,nursery-survivor),eq(poolname,nursery-allocate))):merge(poolname):sum
builtin:tech.jvm.memory.pool.used:filter(or(eq(poolname,Eden Space),eq(poolname,G1 Survivor Space),eq(poolname,Par Eden Space),eq(poolname,Par Survivor Space),eq(poolname,PS Eden Space),eq(poolname,PS Survivor Space),eq(poolname,nursery-survivor),eq(poolname,nursery-allocate))):merge(poolname):sum
This page describes the Optimization Pack for Java OpenJDK 8 JVM.
The following parameters require their ranges or default values to be updated according to the described rules:
The following tables show a list of constraints that may be required in the definition of the study, depending on the tuned parameters:
jvm_heap_used
bytes
The amount of heap memory used
jvm_heap_util
percent
The utilization % of heap memory
jvm_off_heap_used
bytes
The amount of non-heap memory used
jvm_heap_old_gen_used
bytes
The amount of heap memory used (old generation)
jvm_heap_young_gen_used
bytes
The amount of heap memory used (young generation)
jvm_heap_old_gen_size
bytes
The size of the JVM heap memory (old generation)
jvm_heap_young_gen_size
bytes
The size of the JVM heap memory (young generation)
jvm_memory_used
bytes
The total amount of memory used across all the JVM memory pools
jvm_heap_committed
bytes
The size of the JVM committed memory
jvm_memory_buffer_pool_used
bytes
The total amount of bytes used by buffers within the JVM buffer memory pool
jvm_gc_duration
seconds
The average duration of a stop the world JVM garbage collection
jvm_compilation_time
milliseconds
The total time spent by the JVM JIT compiler compiling bytecode
You should select your own domain.
yes
The minimum heap size.
jvm_maxHeapSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The maximum heap size.
jvm_maxRAM
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The maximum amount of memory used by the JVM.
jvm_initialRAMPercentage
real
percent
1.563
0.1 → 100
yes
The percentage of memory used for initial heap size.
jvm_maxRAMPercentage
real
percent
25.0
0.1 → 100.0
yes
The percentage of memory used for maximum heap size, on systems with large physical memory size (more than 512MB). Requires Java 10, or Java 8 Update 191 or later.
jvm_alwaysPreTouch
categorical
-AlwaysPreTouch
+AlwaysPreTouch, -AlwaysPreTouch
yes
Pretouch pages during initialization.
jvm_metaspaceSize
integer
megabytes
20
You should select your own domain within 1 and 1024
yes
The initial size of the allocated class metadata space.
jvm_maxMetaspaceSize
integer
megabytes
20
You should select your own domain within 1 and 1024
yes
The maximum size of the allocated class metadata space.
jvm_useTransparentHugePages
categorical
-UseTransparentHugePages
+UseTransparentHugePages, -UseTransparentHugePages
yes
Enables the use of large pages that can dynamically grow or shrink.
jvm_allocatePrefetchInstr
integer
0
0 → 3
yes
Prefetch ahead of the allocation pointer.
jvm_allocatePrefetchDistance
integer
bytes
0
0 → 512
yes
Distance to prefetch ahead of the allocation pointer; -1 uses a system-specific value (automatically determined).
jvm_allocatePrefetchLines
integer
lines
3
1 → 64
yes
The number of lines to prefetch ahead of array allocation pointer.
jvm_allocatePrefetchStyle
integer
1
0 → 3
yes
Selects the prefetch instruction to generate.
jvm_useLargePages
categorical
+UseLargePages
+UseLargePages, -UseLargePages
yes
Enable the use of large page memory.
0 → 2147483647
yes
The ratio of old/new generation sizes.
jvm_newSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Sets the initial and maximum size of the heap for the young generation (nursery).
jvm_maxNewSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Specifies the upper bound for the young generation size.
jvm_survivorRatio
integer
8
1 → 100
yes
The ratio between the Eden and each Survivor-space within the JVM. For example, a jvm_survivorRatio of 6 would mean that the Eden-space is 6 times one Survivor-space.
jvm_useAdaptiveSizePolicy
categorical
+UseAdaptiveSizePolicy
+UseAdaptiveSizePolicy, -UseAdaptiveSizePolicy
yes
Enable adaptive generation sizing. Disable it when tuning jvm_targetSurvivorRatio.
jvm_adaptiveSizePolicyWeight
integer
10
0 → 100
yes
The weighting given to the current Garbage Collection time versus previous GC times when checking the timing goal.
jvm_targetSurvivorRatio
integer
50
1 → 100
yes
The desired percentage of Survivor-space used after young garbage collection.
jvm_minHeapFreeRatio
integer
40
1 → 99
yes
The minimum percentage of heap free after garbage collection to avoid shrinking.
jvm_maxHeapFreeRatio
integer
70
0 → 100
yes
The maximum percentage of heap free after garbage collection to avoid shrinking.
jvm_maxTenuringThreshold
integer
15
0 → 15
yes
The maximum value for the tenuring threshold.
jvm_gcType
categorical
Parallel
Serial, Parallel, ConcMarkSweep, G1, ParNew
yes
Type of the garbage collection algorithm.
jvm_concurrentGCThreads
integer
threads
You should select your own default value.
You should select your own domain.
yes
The number of threads concurrent garbage collection will use.
jvm_parallelGCThreads
integer
threads
You should select your own default value.
You should select your own domain.
yes
The number of threads garbage collection will use for parallel phases.
jvm_maxGCPauseMillis
integer
milliseconds
200
1 → 1000
yes
Adaptive size policy maximum GC pause time goal in milliseconds.
jvm_resizePLAB
categorical
+ResizePLAB
+ResizePLAB, -ResizePLAB
yes
Enables the dynamic resizing of promotion LABs.
jvm_GCTimeRatio
integer
99
0 → 100
yes
The target fraction of time that can be spent in garbage collection before increasing the heap, computed as 1 / (1 + GCTimeRatio).
jvm_initiatingHeapOccupancyPercent
integer
45
0 → 100
yes
Sets the percentage of the heap occupancy at which to start a concurrent GC cycle.
jvm_youngGenerationSizeIncrement
integer
20
0 → 100
yes
The increment size for Young Generation adaptive resizing.
jvm_tenuredGenerationSizeIncrement
integer
20
0 → 100
yes
The increment size for Old/Tenured Generation adaptive resizing.
jvm_adaptiveSizeDecrementScaleFactor
integer
4
1 → 1024
yes
Specifies the scale factor for goal-driven generation resizing.
jvm_CMSTriggerRatio
integer
80
0 → 100
yes
The percentage of MinHeapFreeRatio allocated before CMS GC starts.
jvm_CMSInitiatingOccupancyFraction
integer
-1
-1 → 99
yes
Configure oldgen occupancy fraction threshold for CMS GC. Negative values default to CMSTriggerRatio.
jvm_CMSClassUnloadingEnabled
categorical
+CMSClassUnloadingEnabled
+CMSClassUnloadingEnabled, -CMSClassUnloadingEnabled
yes
Enables class unloading when using CMS.
jvm_useCMSInitiatingOccupancyOnly
categorical
-UseCMSInitiatingOccupancyOnly
+UseCMSInitiatingOccupancyOnly, -UseCMSInitiatingOccupancyOnly
yes
Use the occupancy value as the only criterion for initiating the CMS collector.
jvm_G1HeapRegionSize
integer
megabytes
8
1 → 32
yes
Sets the size of the regions for G1.
jvm_G1ReservePercent
integer
10
0 → 50
yes
Sets the percentage of the heap that is reserved as a false ceiling to reduce the possibility of promotion failure for the G1 collector.
jvm_G1NewSizePercent
integer
5
0 → 100
yes
Sets the percentage of the heap to use as the minimum for the young generation size.
jvm_G1MaxNewSizePercent
integer
60
0 → 100
yes
Sets the percentage of the heap size to use as the maximum for young generation size.
jvm_G1MixedGCLiveThresholdPercent
integer
85
0 → 100
yes
Sets the occupancy threshold for an old region to be included in a mixed garbage collection cycle.
jvm_G1HeapWastePercent
integer
5
0 → 100
yes
The maximum percentage of the reclaimable heap before starting mixed GC.
jvm_G1MixedGCCountTarget
integer
collections
8
0 → 100
yes
Sets the target number of mixed garbage collections after a marking cycle to collect old regions with at most G1MixedGCLiveThresholdPercent live data. The default is 8 mixed garbage collections.
jvm_G1OldCSetRegionThresholdPercent
integer
10
0 → 100
yes
The upper limit on the number of old regions to be collected during mixed GC.
3 → 2048
yes
The maximum size of the compiled code cache pool.
jvm_tieredCompilation
categorical
+TieredCompilation
+TieredCompilation, -TieredCompilation
yes
Enables tiered compilation.
jvm_tieredCompilationStopAtLevel
integer
4
0 → 4
yes
The highest compilation level to use when tiered compilation is enabled.
jvm_compilationThreads
integer
threads
You should select your own default value.
You should select your own domain.
yes
The number of compilation threads.
jvm_backgroundCompilation
categorical
+BackgroundCompilation
+BackgroundCompilation, -BackgroundCompilation
yes
Allow a method to continue executing in the interpreter while it is being compiled in the background.
jvm_inline
categorical
+Inline
+Inline, -Inline
yes
Enable inlining.
jvm_maxInlineSize
integer
bytes
35
1 → 2097152
yes
The bytecode size limit (in bytes) of the inlined methods.
jvm_inlineSmallCode
integer
bytes
2000
1 → 16384
yes
The maximum compiled code size limit (in bytes) of the inlined methods.
+AggressiveOpts, -AggressiveOpts
yes
Turn on point performance compiler optimizations.
jvm_usePerfData
categorical
+UsePerfData
+UsePerfData, -UsePerfData
yes
Enable monitoring of performance data.
jvm_useNUMA
categorical
-UseNUMA
+UseNUMA, -UseNUMA
yes
Enable NUMA.
jvm_useBiasedLocking
categorical
+UseBiasedLocking
+UseBiasedLocking, -UseBiasedLocking
yes
Manage the use of biased locking.
jvm_activeProcessorCount
integer
CPUs
1
1 → 512
yes
Overrides the number of detected CPUs that the VM will use to calculate the size of thread pools.
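To make the parameters above concrete, the following sketch renders a tuned configuration into the standard HotSpot flags these parameters control. The `to_jvm_flags` helper and its parameter-to-flag pairing are illustrative assumptions, not the pack's actual configuration template:

```python
# Illustrative only: maps optimization-pack parameter names to the
# standard HotSpot flags they control. The real pack applies values
# through its own configuration template.
def to_jvm_flags(config):
    flags = []
    if "jvm_minHeapSize" in config:
        flags.append(f"-Xms{config['jvm_minHeapSize']}m")
    if "jvm_maxHeapSize" in config:
        flags.append(f"-Xmx{config['jvm_maxHeapSize']}m")
    if "jvm_gcType" in config:
        # e.g. "G1" -> -XX:+UseG1GC, "Parallel" -> -XX:+UseParallelGC
        flags.append(f"-XX:+Use{config['jvm_gcType']}GC")
    if "jvm_maxGCPauseMillis" in config:
        flags.append(f"-XX:MaxGCPauseMillis={config['jvm_maxGCPauseMillis']}")
    if "jvm_alwaysPreTouch" in config:
        # categorical values already carry their +/- prefix
        flags.append(f"-XX:{config['jvm_alwaysPreTouch']}")
    return flags

example = to_jvm_flags({"jvm_minHeapSize": 1024, "jvm_maxHeapSize": 4096,
                        "jvm_gcType": "G1", "jvm_maxGCPauseMillis": 200})
# example == ["-Xms1024m", "-Xmx4096m", "-XX:+UseG1GC", "-XX:MaxGCPauseMillis=200"]
```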
jvm_newSize
Depends on the configured heap
jvm_maxNewSize
Depends on the configured heap
jvm_concurrentGCThreads
Depends on the available CPU cores
Depends on the available CPU cores
jvm_parallelGCThreads
Depends on the available CPU cores
Depends on the available CPU cores
jvm_compilationThreads
Depends on the available CPU cores
Depends on the available CPU cores
mem_used
bytes
The total amount of memory used
jvm_heap_size
bytes
The size of the JVM heap memory
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
cpu_used
CPUs
The total amount of CPUs used
jvm_gc_time
percent
The % of wall clock time the JVM spent doing stop the world garbage collection activities
jvm_gc_count
collections/s
The total number of stop the world JVM garbage collections that have occurred per second
jvm_threads_current
threads
The total number of active threads within the JVM
jvm_threads_deadlocked
threads
The total number of deadlocked threads within the JVM
jvm_minHeapSize
integer
megabytes
jvm_newRatio
integer
jvm_reservedCodeCacheSize
integer
megabytes
jvm_aggressiveOpts
categorical
Parameter
Default value
Domain
jvm_minHeapSize
Depends on the available instance memory
jvm_maxHeapSize
jvm.jvm_minHeapSize <= jvm.jvm_maxHeapSize
jvm.jvm_minHeapFreeRatio <= jvm.jvm_maxHeapFreeRatio
jvm.jvm_maxNewSize < jvm.jvm_maxHeapSize
You should select your own default value.
2
240
-AggressiveOpts
Depends on the available instance memory
jvm.jvm_concurrentGCThreads <= jvm.jvm_parallelGCThreads
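In the study definition, relationships like these are expressed as parameter constraints. A minimal YAML sketch follows; the `parameterConstraints` field name and its structure are assumptions based on the Akamas study construct and should be checked against the Study template reference:

```yaml
# Sketch only: verify field names against the Study template reference.
parameterConstraints:
  - name: heap_bounds
    formula: jvm.jvm_minHeapSize <= jvm.jvm_maxHeapSize
  - name: gc_thread_bounds
    formula: jvm.jvm_concurrentGCThreads <= jvm.jvm_parallelGCThreads
```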
mem_used
bytes
The total amount of memory used
jvm_heap_size
bytes
The size of the JVM heap memory
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
cpu_used
CPUs
The total amount of CPUs used
jvm_gc_time
percent
The % of wall clock time the JVM spent doing stop the world garbage collection activities
jvm_gc_count
collections/s
The total number of stop the world JVM garbage collections that have occurred per second
jvm_threads_current
threads
The total number of active threads within the JVM
jvm_threads_deadlocked
threads
The total number of deadlocked threads within the JVM
jvm_minHeapSize
integer
megabytes
jvm_newRatio
integer
jvm_reservedCodeCacheSize
integer
megabytes
jvm_usePerfData
categorical
The following parameters require their ranges or default values to be updated according to the described rules:
Parameter
Default value
Domain
jvm_minHeapSize
Depends on the available instance memory
jvm_maxHeapSize
The following tables show a list of constraints that may be required in the definition of the study, depending on the tuned parameters:
jvm.jvm_minHeapSize <= jvm.jvm_maxHeapSize
jvm.jvm_minHeapFreeRatio <= jvm.jvm_maxHeapFreeRatio
jvm.jvm_maxNewSize < jvm.jvm_maxHeapSize * 0.8
jvm_heap_used
bytes
The amount of heap memory used
jvm_heap_util
percent
The utilization % of heap memory
jvm_off_heap_used
bytes
The amount of non-heap memory used
jvm_heap_old_gen_used
bytes
The amount of heap memory used (old generation)
jvm_heap_young_gen_used
bytes
The amount of heap memory used (young generation)
jvm_heap_old_gen_size
bytes
The size of the JVM heap memory (old generation)
jvm_heap_young_gen_size
bytes
The size of the JVM heap memory (young generation)
jvm_memory_used
bytes
The total amount of memory used across all the JVM memory pools
jvm_heap_committed
bytes
The size of the JVM committed memory
jvm_memory_buffer_pool_used
bytes
The total amount of bytes used by buffers within the JVM buffer memory pool
jvm_gc_duration
seconds
The average duration of a stop the world JVM garbage collection
jvm_compilation_time
milliseconds
The total time spent by the JVM JIT compiler compiling bytecode
You should select your own default value.
You should select your own domain.
yes
The minimum heap size.
jvm_maxHeapSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The maximum heap size.
jvm_maxRAM
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The maximum amount of memory used by the JVM.
jvm_initialRAMPercentage
real
percent
1.563
0.1 → 100
yes
The percentage of memory used for initial heap size.
jvm_maxRAMPercentage
real
percent
25.0
0.1 → 100.0
yes
The percentage of memory used for maximum heap size, on systems with large physical memory size (more than 512MB).
jvm_alwaysPreTouch
categorical
-AlwaysPreTouch
+AlwaysPreTouch, -AlwaysPreTouch
yes
Pretouch pages during initialization.
jvm_metaspaceSize
integer
megabytes
20
You should select your own domain within 1 and 1024
yes
The initial size of the allocated class metadata space.
jvm_maxMetaspaceSize
integer
megabytes
20
You should select your own domain within 1 and 1024
yes
The maximum size of the allocated class metadata space.
jvm_useTransparentHugePages
categorical
-UseTransparentHugePages
+UseTransparentHugePages, -UseTransparentHugePages
yes
Enables the use of large pages that can dynamically grow or shrink.
jvm_allocatePrefetchInstr
integer
0
0 → 3
yes
Prefetch ahead of the allocation pointer.
jvm_allocatePrefetchDistance
integer
bytes
0
0 → 512
yes
Distance to prefetch ahead of the allocation pointer; -1 uses a system-specific value (automatically determined).
jvm_allocatePrefetchLines
integer
lines
3
1 → 64
yes
The number of lines to prefetch ahead of array allocation pointer.
jvm_allocatePrefetchStyle
integer
1
0 → 3
yes
Selects the prefetch instruction to generate.
jvm_useLargePages
categorical
+UseLargePages
+UseLargePages, -UseLargePages
yes
Enable the use of large page memory.
jvm_aggressiveHeap
categorical
-AggressiveHeap
-AggressiveHeap, +AggressiveHeap
yes
Optimize heap options for long-running memory intensive apps.
2
0 → 2147483647
yes
The ratio of old/new generation sizes.
jvm_newSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Sets the initial and maximum size of the heap for the young generation (nursery).
jvm_maxNewSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Specifies the upper bound for the young generation size.
jvm_survivorRatio
integer
8
1 → 100
yes
The ratio between the Eden and each Survivor-space within the JVM. For example, a jvm_survivorRatio of 6 would mean that the Eden-space is 6 times one Survivor-space.
jvm_useAdaptiveSizePolicy
categorical
+UseAdaptiveSizePolicy
+UseAdaptiveSizePolicy, -UseAdaptiveSizePolicy
yes
Enable adaptive generation sizing. Disable it when tuning jvm_targetSurvivorRatio.
jvm_adaptiveSizePolicyWeight
integer
10
0 → 100
yes
The weighting given to the current Garbage Collection time versus previous GC times when checking the timing goal.
jvm_targetSurvivorRatio
integer
50
1 → 100
yes
The desired percentage of Survivor-space used after young garbage collection.
jvm_minHeapFreeRatio
integer
40
1 → 99
yes
The minimum percentage of heap free after garbage collection to avoid shrinking.
jvm_maxHeapFreeRatio
integer
70
0 → 100
yes
The maximum percentage of heap free after garbage collection to avoid shrinking.
jvm_maxTenuringThreshold
integer
15
0 → 15
yes
The maximum value for the tenuring threshold.
jvm_gcType
categorical
G1
Serial, Parallel, ConcMarkSweep, G1
yes
Type of the garbage collection algorithm.
jvm_concurrentGCThreads
integer
threads
You should select your own default value.
You should select your own domain.
yes
The number of threads concurrent garbage collection will use.
jvm_parallelGCThreads
integer
threads
You should select your own default value.
You should select your own domain.
yes
The number of threads garbage collection will use for parallel phases.
jvm_maxGCPauseMillis
integer
milliseconds
200
1 → 1000
yes
Adaptive size policy maximum GC pause time goal in milliseconds.
jvm_resizePLAB
categorical
+ResizePLAB
+ResizePLAB, -ResizePLAB
yes
Enables the dynamic resizing of promotion LABs.
jvm_GCTimeRatio
integer
99
2 → 100
yes
The target fraction of time that can be spent in garbage collection before increasing the heap, computed as 1 / (1 + GCTimeRatio).
jvm_initiatingHeapOccupancyPercent
integer
45
5 → 90
yes
Sets the percentage of the heap occupancy at which to start a concurrent GC cycle.
jvm_youngGenerationSizeIncrement
integer
20
0 → 100
yes
The increment size for Young Generation adaptive resizing.
jvm_tenuredGenerationSizeIncrement
integer
20
0 → 100
yes
The increment size for Old/Tenured Generation adaptive resizing.
jvm_adaptiveSizeDecrementScaleFactor
integer
4
1 → 1024
yes
Specifies the scale factor for goal-driven generation resizing.
jvm_CMSTriggerRatio
integer
80
0 → 100
yes
The percentage of MinHeapFreeRatio allocated before CMS GC starts.
jvm_CMSInitiatingOccupancyFraction
integer
-1
-1 → 99
yes
Configure oldgen occupancy fraction threshold for CMS GC. Negative values default to CMSTriggerRatio.
jvm_CMSClassUnloadingEnabled
categorical
+CMSClassUnloadingEnabled
+CMSClassUnloadingEnabled, -CMSClassUnloadingEnabled
yes
Enables class unloading when using CMS.
jvm_useCMSInitiatingOccupancyOnly
categorical
-UseCMSInitiatingOccupancyOnly
+UseCMSInitiatingOccupancyOnly, -UseCMSInitiatingOccupancyOnly
yes
Use the occupancy value as the only criterion for initiating the CMS collector.
jvm_G1HeapRegionSize
integer
megabytes
8
1 → 32
yes
Sets the size of the regions for G1.
jvm_G1ReservePercent
integer
10
0 → 50
yes
Sets the percentage of the heap that is reserved as a false ceiling to reduce the possibility of promotion failure for the G1 collector.
jvm_G1NewSizePercent
integer
5
0 → 100
yes
Sets the percentage of the heap to use as the minimum for the young generation size.
jvm_G1MaxNewSizePercent
integer
60
0 → 100
yes
Sets the percentage of the heap size to use as the maximum for young generation size.
jvm_G1MixedGCLiveThresholdPercent
integer
85
0 → 100
yes
Sets the occupancy threshold for an old region to be included in a mixed garbage collection cycle.
jvm_G1HeapWastePercent
integer
5
0 → 100
yes
The maximum percentage of the reclaimable heap before starting mixed GC.
jvm_G1MixedGCCountTarget
integer
collections
8
0 → 100
yes
Sets the target number of mixed garbage collections after a marking cycle to collect old regions with at most G1MixedGCLiveThresholdPercent live data. The default is 8 mixed garbage collections.
jvm_G1OldCSetRegionThresholdPercent
integer
10
0 → 100
yes
The upper limit on the number of old regions to be collected during mixed GC.
jvm_G1AdaptiveIHOPNumInitialSamples
integer
3
1 → 2097152
yes
The number of completed time periods from initial mark to first mixed GC required to use the input values for prediction of the optimal occupancy to start marking.
jvm_G1UseAdaptiveIHOP
categorical
+G1UseAdaptiveIHOP
+G1UseAdaptiveIHOP, -G1UseAdaptiveIHOP
yes
Adaptively adjust the initiating heap occupancy from the initial value of InitiatingHeapOccupancyPercent.
240
3 → 2048
yes
The maximum size of the compiled code cache pool.
jvm_tieredCompilation
categorical
+TieredCompilation
+TieredCompilation, -TieredCompilation
yes
Enables tiered compilation.
jvm_tieredCompilationStopAtLevel
integer
4
0 → 4
yes
The highest compilation level to use when tiered compilation is enabled.
jvm_compilationThreads
integer
threads
You should select your own default value.
You should select your own domain.
yes
The number of compilation threads.
jvm_backgroundCompilation
categorical
+BackgroundCompilation
+BackgroundCompilation, -BackgroundCompilation
yes
Allow a method to continue executing in the interpreter while it is being compiled in the background.
jvm_inline
categorical
+Inline
+Inline, -Inline
yes
Enable inlining.
jvm_maxInlineSize
integer
bytes
35
1 → 2097152
yes
The bytecode size limit (in bytes) of the inlined methods.
jvm_inlineSmallCode
integer
bytes
2000
1 → 16384
yes
The maximum compiled code size limit (in bytes) of the inlined methods.
+UsePerfData
+UsePerfData, -UsePerfData
yes
Enable monitoring of performance data.
jvm_useNUMA
categorical
-UseNUMA
+UseNUMA, -UseNUMA
yes
Enable NUMA.
jvm_useBiasedLocking
categorical
+UseBiasedLocking
+UseBiasedLocking, -UseBiasedLocking
yes
Manage the use of biased locking.
jvm_activeProcessorCount
integer
CPUs
1
1 → 512
yes
Overrides the number of detected CPUs that the VM will use to calculate the size of thread pools.
Depends on the available instance memory
jvm_newSize
Depends on the configured heap
jvm_maxNewSize
Depends on the configured heap
jvm_concurrentGCThreads
Depends on the available CPU cores
Depends on the available CPU cores
jvm_parallelGCThreads
Depends on the available CPU cores
Depends on the available CPU cores
jvm_compilationThreads
Depends on the available CPU cores
Depends on the available CPU cores
jvm.jvm_concurrentGCThreads <= jvm.jvm_parallelGCThreads
jvm_activeProcessorCount < container.cpu_limits/1000 + 1
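The last constraint mixes units: `container.cpu_limits` is evidently expressed in millicores, so dividing by 1000 converts it to whole CPUs. A small sketch of the check, with the millicore assumption made explicit:

```python
# container.cpu_limits is assumed to be in millicores, so dividing by
# 1000 yields whole CPUs; the "+ 1" allows rounding up to the next CPU.
def cpu_count_constraint_ok(active_processor_count, cpu_limits_millicores):
    return active_processor_count < cpu_limits_millicores / 1000 + 1

# A container limited to 2000 millicores (2 CPUs) admits at most
# 2 active processors under this constraint.
```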
This page describes the Optimization Pack for Java OpenJDK 17 JVM.
The following parameters require their ranges or default values to be updated according to the described rules:
The following tables show a list of constraints that may be required in the definition of the study, depending on the tuned parameters:
jvm_heap_used
bytes
The amount of heap memory used
jvm_heap_util
percent
The utilization % of heap memory
jvm_off_heap_used
bytes
The amount of non-heap memory used
jvm_heap_old_gen_used
bytes
The amount of heap memory used (old generation)
jvm_heap_young_gen_used
bytes
The amount of heap memory used (young generation)
jvm_heap_old_gen_size
bytes
The size of the JVM heap memory (old generation)
jvm_heap_young_gen_size
bytes
The size of the JVM heap memory (young generation)
jvm_memory_used
bytes
The total amount of memory used across all the JVM memory pools
jvm_heap_committed
bytes
The size of the JVM committed memory
jvm_memory_buffer_pool_used
bytes
The total amount of bytes used by buffers within the JVM buffer memory pool
jvm_gc_duration
seconds
The average duration of a stop the world JVM garbage collection
jvm_compilation_time
milliseconds
The total time spent by the JVM JIT compiler compiling bytecode
You should select your own domain.
yes
The minimum heap size.
jvm_maxHeapSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The maximum heap size.
jvm_maxRAM
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
The maximum amount of memory used by the JVM.
jvm_initialRAMPercentage
real
percent
2
1 → 100
yes
The percentage of memory used for initial heap size.
jvm_maxRAMPercentage
integer
percent
25
1 → 100
yes
The percentage of memory used for maximum heap size, on systems with large physical memory size (more than 512MB).
jvm_minRAMPercentage
integer
percent
25
1 → 100
yes
The percentage of memory used for maximum heap size, on systems with small physical memory size (up to 256MB).
jvm_alwaysPreTouch
categorical
-AlwaysPreTouch
+AlwaysPreTouch, -AlwaysPreTouch
yes
Pretouch pages during initialization.
jvm_metaspaceSize
integer
megabytes
20
You should select your own domain within 1 and 1024
yes
The initial size of the allocated class metadata space.
jvm_maxMetaspaceSize
integer
megabytes
20
You should select your own domain within 1 and 1024
yes
The maximum size of the allocated class metadata space.
jvm_useTransparentHugePages
categorical
-UseTransparentHugePages
+UseTransparentHugePages, -UseTransparentHugePages
yes
Enables the use of large pages that can dynamically grow or shrink.
jvm_allocatePrefetchInstr
integer
0
0 → 3
yes
Prefetch ahead of the allocation pointer.
jvm_allocatePrefetchDistance
integer
bytes
0
0 → 512
yes
Distance to prefetch ahead of the allocation pointer; -1 uses a system-specific value (automatically determined).
jvm_allocatePrefetchLines
integer
lines
3
0 → 64
yes
The number of lines to prefetch ahead of array allocation pointer.
jvm_allocatePrefetchStyle
integer
1
0 → 3
yes
Selects the prefetch instruction to generate.
jvm_useLargePages
categorical
+UseLargePages
+UseLargePages, -UseLargePages
yes
Enable the use of large page memory.
jvm_aggressiveHeap
categorical
-AggressiveHeap
-AggressiveHeap, +AggressiveHeap
yes
Optimize heap options for long-running memory intensive apps.
0 → 2147483647
yes
The ratio of old/new generation sizes.
jvm_newSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Sets the initial and maximum size of the heap for the young generation (nursery).
jvm_maxNewSize
integer
megabytes
You should select your own default value.
You should select your own domain.
yes
Specifies the upper bound for the young generation size.
jvm_survivorRatio
integer
8
1 → 100
yes
The ratio between the Eden and each Survivor-space within the JVM. For example, a jvm_survivorRatio of 6 would mean that the Eden-space is 6 times one Survivor-space.
jvm_useAdaptiveSizePolicy
categorical
+UseAdaptiveSizePolicy
+UseAdaptiveSizePolicy, -UseAdaptiveSizePolicy
yes
Enable adaptive generation sizing. Disable it when tuning jvm_targetSurvivorRatio.
jvm_adaptiveSizePolicyWeight
integer
10
0 → 100
yes
The weighting given to the current Garbage Collection time versus previous GC times when checking the timing goal.
jvm_targetSurvivorRatio
integer
50
1 → 100
yes
The desired percentage of Survivor-space used after young garbage collection.
jvm_minHeapFreeRatio
integer
40
1 → 99
yes
The minimum percentage of heap free after garbage collection to avoid shrinking.
jvm_maxHeapFreeRatio
integer
70
0 → 100
yes
The maximum percentage of heap free after garbage collection to avoid shrinking.
jvm_maxTenuringThreshold
integer
15
0 → 15
yes
The maximum value for the tenuring threshold.
jvm_gcType
categorical
G1
Serial, Parallel, G1, Z, Shenandoah
yes
Type of the garbage collection algorithm.
jvm_concurrentGCThreads
integer
threads
You should select your own default value.
You should select your own domain.
yes
The number of threads concurrent garbage collection will use.
jvm_parallelGCThreads
integer
threads
You should select your own default value.
You should select your own domain.
yes
The number of threads garbage collection will use for parallel phases.
jvm_maxGCPauseMillis
integer
milliseconds
200
1 → 1000
yes
Adaptive size policy maximum GC pause time goal in milliseconds.
jvm_resizePLAB
categorical
+ResizePLAB
+ResizePLAB, -ResizePLAB
yes
Enables the dynamic resizing of promotion LABs.
jvm_GCTimeRatio
integer
99
0 → 100
yes
The target fraction of time that can be spent in garbage collection before increasing the heap, computed as 1 / (1 + GCTimeRatio).
jvm_initiatingHeapOccupancyPercent
integer
45
0 → 100
yes
Sets the percentage of the heap occupancy at which to start a concurrent GC cycle.
jvm_youngGenerationSizeIncrement
integer
20
0 → 100
yes
The increment size for Young Generation adaptive resizing.
jvm_tenuredGenerationSizeIncrement
integer
20
0 → 100
yes
The increment size for Old/Tenured Generation adaptive resizing.
jvm_adaptiveSizeDecrementScaleFactor
integer
4
1 → 1024
yes
Specifies the scale factor for goal-driven generation resizing.
jvm_G1HeapRegionSize
integer
megabytes
8
1 → 32
yes
Sets the size of the regions for G1.
jvm_G1ReservePercent
integer
10
0 → 50
yes
Sets the percentage of the heap that is reserved as a false ceiling to reduce the possibility of promotion failure for the G1 collector.
jvm_G1NewSizePercent
integer
5
0 → 100
yes
Sets the percentage of the heap to use as the minimum for the young generation size.
jvm_G1MaxNewSizePercent
integer
60
0 → 100
yes
Sets the percentage of the heap size to use as the maximum for young generation size.
jvm_G1MixedGCLiveThresholdPercent
integer
85
0 → 100
yes
Sets the occupancy threshold for an old region to be included in a mixed garbage collection cycle.
jvm_G1HeapWastePercent
integer
5
0 → 100
yes
The maximum percentage of the reclaimable heap before starting mixed GC.
jvm_G1MixedGCCountTarget
integer
collections
8
0 → 100
yes
Sets the target number of mixed garbage collections after a marking cycle to collect old regions with at most G1MixedGCLiveThresholdPercent live data. The default is 8 mixed garbage collections.
jvm_G1OldCSetRegionThresholdPercent
integer
10
0 → 100
yes
The upper limit on the number of old regions to be collected during mixed GC.
jvm_G1AdaptiveIHOPNumInitialSamples
integer
3
1 → 2097152
yes
The number of completed time periods from initial mark to first mixed GC required to use the input values for prediction of the optimal occupancy to start marking.
jvm_G1UseAdaptiveIHOP
categorical
+G1UseAdaptiveIHOP
+G1UseAdaptiveIHOP, -G1UseAdaptiveIHOP
yes
Adaptively adjust the initiating heap occupancy from the initial value of InitiatingHeapOccupancyPercent.
jvm_G1PeriodicGCInterval
integer
milliseconds
0
0 → 3600000
yes
The number of milliseconds after a previous GC to wait before triggering a periodic GC. A value of zero disables periodically enforced GC cycles.
jvm_ZProactive
categorical
+ZProactive
+ZProactive, -ZProactive
yes
Enable proactive GC cycles.
jvm_ZUncommit
categorical
+ZUncommit
+ZUncommit, -ZUncommit
yes
Enable uncommit (free) of unused heap memory back to the OS.
jvm_ZAllocationSpikeTolerance
integer
2
1 → 10
yes
The allocation spike tolerance factor for ZGC.
jvm_ZFragmentationLimit
integer
25
10 → 90
yes
The maximum allowed heap fragmentation for ZGC.
jvm_ZCollectionInterval
integer
seconds
0
0 → 3600
yes
Force GC at a fixed time interval (in seconds) for ZGC.
jvm_ZMarkStackSpaceLimit
integer
bytes
8589934592
33554432 → 1099511627776
yes
The maximum number of bytes allocated for mark stacks for ZGC.
jvm_reservedCodeCacheSize
integer
megabytes
240
32 → 2048
yes
The maximum size of the compiled code cache pool.
jvm_tieredCompilation
categorical
+TieredCompilation
+TieredCompilation, -TieredCompilation
yes
Enables tiered compilation.
jvm_tieredCompilationStopAtLevel
integer
4
0 → 4
yes
The highest compilation tier to use; tiered compilation stops at the specified level.
jvm_compilationThreads
integer
threads
You should select your own default value.
You should select your own domain.
yes
The number of compilation threads.
jvm_backgroundCompilation
categorical
+BackgroundCompilation
+BackgroundCompilation, -BackgroundCompilation
yes
Allow async interpreted execution of a method while it is being compiled.
jvm_inline
categorical
+Inline
+Inline, -Inline
yes
Enable inlining.
jvm_maxInlineSize
integer
bytes
35
1 → 2097152
yes
The bytecode size limit (in bytes) of the inlined methods.
jvm_inlineSmallCode
integer
bytes
2000
500 → 5000
yes
The maximum compiled code size limit (in bytes) of the inlined methods.
jvm_maxInlineLevel
integer
15
1 → 64
yes
The maximum number of nested calls that are inlined by the high-tier compiler.
jvm_freqInlineSize
integer
bytes
325
1 → 3250
yes
The bytecode size limit (in bytes) of frequently executed methods to be inlined.
jvm_compilationMode
categorical
default
default, quick-only, high-only, high-only-quick-internal
yes
The JVM compilation mode.
jvm_typeProfileWidth
integer
2
1 → 8
yes
The number of receiver types to record in call/cast profile.
jvm_usePerfData
categorical
+UsePerfData
+UsePerfData, -UsePerfData
yes
Enable monitoring of performance data.
jvm_useNUMA
categorical
-UseNUMA
+UseNUMA, -UseNUMA
yes
Enable NUMA.
jvm_useBiasedLocking
categorical
+UseBiasedLocking
+UseBiasedLocking, -UseBiasedLocking
yes
Manage the use of biased locking.
jvm_activeProcessorCount
integer
CPUs
1
1 → 512
yes
Overrides the number of detected CPUs that the VM will use to calculate the size of thread pools.
jvm_threadStackSize
integer
kilobytes
1024
128 → 16384
yes
The thread stack size (in kilobytes).
jvm_newSize
Depends on the configured heap
jvm_maxNewSize
Depends on the configured heap
jvm_concurrentGCThreads
Depends on the available CPU cores
Depends on the available CPU cores
jvm_parallelGCThreads
Depends on the available CPU cores
Depends on the available CPU cores
jvm_compilation_threads
Depends on the available CPU cores
Depends on the available CPU cores
jvm_activeProcessorCount < container.cpu_limits/1000 + 1
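The constraint above bounds jvm_activeProcessorCount by the container CPU limit, which is expressed in millicores. A small sketch of the implied upper bound (the helper name is mine; the millicore assumption follows the /1000 division in the constraint):

```python
import math

def max_active_processor_count(cpu_limits_millicores: int) -> int:
    # Constraint: jvm_activeProcessorCount < container.cpu_limits / 1000 + 1
    bound = cpu_limits_millicores / 1000 + 1
    # Largest integer strictly below the bound
    return math.ceil(bound) - 1

# A 2000m (2 CPU) limit allows jvm_activeProcessorCount up to 2
```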
mem_used
bytes
The total amount of memory used
jvm_heap_size
bytes
The size of the JVM heap memory
cpu_util
percent
The average CPU utilization % across all the CPUs (i.e., how much time on average the CPUs are busy doing work)
cpu_used
CPUs
The total amount of CPUs used
jvm_gc_time
percent
The % of wall clock time the JVM spent doing stop-the-world garbage collection activities
jvm_gc_count
collections/s
The total number of stop-the-world JVM garbage collections that have occurred per second
jvm_threads_current
threads
The total number of active threads within the JVM
jvm_threads_deadlocked
threads
The total number of deadlocked threads within the JVM
jvm_minHeapSize
integer
megabytes
You should select your own default value.
jvm_newRatio
integer
2
jvm_reservedCodeCacheSize
integer
megabytes
240
jvm_usePerfData
categorical
+UsePerfData
Parameter
Default value
Domain
jvm_minHeapSize
Depends on the instance available memory
jvm_maxHeapSize
Depends on the instance available memory
jvm.jvm_minHeapSize <= jvm.jvm_maxHeapSize
jvm.jvm_minHeapFreeRatio <= jvm.jvm_maxHeapFreeRatio
jvm.jvm_maxNewSize < jvm.jvm_maxHeapSize * 0.8
jvm.jvm_concurrentGCThreads <= jvm.jvm_parallelGCThreads
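The parameter constraints above can be validated before launching an experiment. A minimal sketch, assuming a candidate configuration held in a plain dict keyed by parameter name (the function is illustrative, not an Akamas API):

```python
def satisfies_constraints(cfg: dict) -> bool:
    """Check the JVM parameter constraints listed above."""
    return (
        cfg["jvm_minHeapSize"] <= cfg["jvm_maxHeapSize"]
        and cfg["jvm_minHeapFreeRatio"] <= cfg["jvm_maxHeapFreeRatio"]
        and cfg["jvm_maxNewSize"] < cfg["jvm_maxHeapSize"] * 0.8
        and cfg["jvm_concurrentGCThreads"] <= cfg["jvm_parallelGCThreads"]
    )
```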