Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
A metric is a measured property of a system.
Examples of a metric include:
the response time of an application
the utilization of a CPU
the amount of time spent in garbage collection
the cost of a cloud service
Metrics are used to both specify the optimization goal and constraints (e.g. minimize the heap size while keeping response time < 1000 and error rate <= 10% of a baseline value), and to assess the behavior of the system with respect to each specific configuration applied.
A metric is described by the following properties:
a name that uniquely identifies the metric
a description that clarifies the semantics of the metric
a unit that defines the unit of measurement used by the metric
The construct to be used to define a metric is described on the Metric template page.
Metrics are displayed in the Akamas UI when drilling down to each system component.
and are represented in metric charts for each specific optimization study
Please notice that in order for a metric to be displayed in the Akamas UI, it has to be collected from a Telemetry Provider by means of a specific Telemetry Instance defined for each specific target system.
A workflow is a set of tasks that needs to be executed in sequence to evaluate a configuration as part of an optimization study. A task is a single action performed within a workflow.
Workflows allow you to automate Akamas optimization studies, by automatically executing a sequence of tasks such as initializing an environment, triggering load testing, restoring a database, applying configurations, and much more.
These are examples of common tasks that can be performed by a task
Launch remote commands via SSH
Apply parameters values in configuration files
Execute Spark jobs via spark-submit API
Start performance tests by integrating with external tools such as Neoload
Workflows are first-class entities that can be defined globally and then used in multiple optimization studies.
Akamas provides several workflow operators that can be used to perform tasks in a workflow. Some operators are general-purpose, such as those executing a command or script on a specific host, while others provide native integrations with specific technologies and tools, such as Spark History Server or load testing tools.
The construct to be used to define a workflow is described on the Workflow template page.
A telemetry provider is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI shows systems in a specific top-level menu.
The list of tasks is displayed when drilling down to each specific workflow.
A parameter is a property of the system that can be applied and tuned to change the system's behavior. Akamas optimizes systems by changing parameters to achieve the stated goal while respecting the defined constraints.
Examples of a parameter include:
Configuration knobs (e.g. JVM garbage collection type)
Resource settings (e.g. amount of memory allocated to a Spark job)
Algorithms settings (e.g. learning rate of a neural network)
Architectural properties (e.g. how many caching layers in an enterprise application)
Type of resources (e.g. AWS EC2 instance or EBS volume type)
Any other thing (e.g. amount of sugar in your cookies)
The following table describes the parameter types:
Prameter Type | Domain | Akamas normalized domain |
---|---|---|
A parameter is described by the following properties:
a name that uniquely identifies the parameter
a description that clarifies the semantics of the parameter
a unit that defines the unit of measurement used by the parameter
Although users can create parameters with any name, we suggest using the naming convention context_parameter
where
context
refers to the technology or more general environment in which that metric is defined (e.g. elasticsearch, jvm, mysql, spark)
parameter
is the parameter name in the original context (e.g. gcType, numberOfExecutors)
This makes it possible to identify parameters more easily and avoid any potential name clash.
The construct to be used to define a parameter is described on the Parameter template page.
Parameters are displayed in the Akamas UI when drilling down to each system component.
For each optimization study, the optimization scope is the set of parameters that Akamas can change to achieve the defined optimization goal.
REAL
real values
Akamas normalizes the values
[0.0, 10.0] → [0.0, 1.0]
INTEGER
integer values
Akamas converts the integer into real and then normalizes the values
[0, 3] → [0.0, 3.0] → [0.0, 1.0]
ORDINAL
integer values
Akamas converts the category into real and then normalizes the values
['a', 'b', 'c'] → [0, 2] → [0.0, 2.0] → [0.0, 1.0]
CATEGORICAL
categorical values
Akamas converts each param value into a new param that may be either 1.0 (active) or 0.0 (inactive), only 1 of these new params can be "active" during each exp:
['a', 'b', 'c'] → [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
This section provides a definition of Akamas' key concepts and terms and also provides references to the related construct properties, commands, and user interfaces.
Term | Definition |
---|---|
A telemetry instance is an instance of a that collects data from a specific instance of the data source.
A telemetry instance is an instance of a telemetry provider, providing the required information on how to connect and collect a given set of metrics from a specific data source.
While telemetry providers are platform-wide entities, telemetry instances are defined at each system level.
The construct to be used to define a telemetry instance is described on the page.
A telemetry provider is an that can be managed via CLI using the
Telemetry instances are displayed in the Akamas UI when drilling down each system component.
systems targeted by optimization studies
elements of the system
types associated to a system component
objects encapsulating knowledge about component types
a measured metric, collected via telemetry providers
tunable parameters, set via native or other interfaces
general definition of providers of collected metrics
specific instances of telemetry providers
automation workflow to set parameters, collect metrics and run load testing
goal and constraints defined for an optimization study
optimization studies for a target system
optimization studies for a non-live system
optimization studies for a live system
virtual environments to organize and isolate resources
A component represents an element of a system. Typically, systems are made up of different entities and layers which can be modeled by components. In other words, a system can be considered a collection of related components.
Notice that a component is a black-box definition of each entity involved in an optimization study, so detailed modeling of the entities being involved in the optimization is not required. The only relevant elements are the parameters that are involved in the optimization and the metrics that are collected to evaluate the results of such an optimization.
Notice that only the entities that are directly involved in the optimization need to be modeled and defined within Akamas. An entity is involved in an optimization study if it is optimized or monitored by Akamas, where "optimized" means that Akamas is optimizing at least one of its parameters, and "monitored" means that Akamas is monitoring at least one of its metrics.
A component is described by the following mandatory properties (other properties can be defined but are not mandatory):
a name that uniquely identifies the component within the system
a description that clarifies what the component refers to
a component type that identifies the technology of the component (see component type)
In general, a component contains a set of each of the following:
parameter(s) in the scope of the optimization
metric(s) needed to define the optimization goal
metric(s) needed to define the optimization constraints
metric(s) that are not needed to either define the optimization goal or constraints, and hence not used by Akamas to perform the optimization, but are collected in order to support the analysis (and which can be possibly added at a later time as part of optimization goal or constraint when refining the optimization).
The construct to be used to define a component is described on the Component template page.
A component is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI shows more details about components by drilling down their respective system.
A system represents of the entire system which is the target of optimization.
A system is a single object irrespective of the number or type of entities or layers that are in the scope of the optimization. It can be used to model and describe a wide set of entities like:
An N-layers application
A single micro-service
A single (or a collection of) batch job(s)
A System is made of one or more components. Each component represents one of the elements in the system, whose parameters are involved in the optimization or whose metrics are collected to evaluate the results of such optimization.
A system is described by the following properties:
The full micro-services stack of an application
a name that uniquely identifies the system
a description that clarifies what the system refers to
The construct to be used to define a system is described on the System template page.
A system is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI displays systems (depending on the user privileges on the defined workspaces) in a specific top-level menu.
A component type is a blueprint for a component that describes the type of entity the component refers to. In Akamas, a component needs to be associated with a component type, from which the component inherits its metrics and parameters.
Component types are platform entities (i.e.: shared among all the users) usually provided off the shelf and shipped within the Optimization Packs. Typically, different component types within the same optimization pack are used to model different versions/releases of the same technology.
Akamas users with appropriate privileges can create custom component types and optimization packs, as described on the Creating custom optimization pack page.
A component type is described by the following mandatory properties (other properties can be defined but are not mandatory):
a name that uniquely identifies the component type within the system
a description that clarifies what the component type refers to
a parameter definitions array (more on Parameters later)
a metrics array (more on Metrics later)
The construct to be used to define a component type is described on the Component type template page.
A component type is an Akamas resource that can be managed via CLI using the resource management commands.
When visualizing system components the component type is displayed.
The following figure shows the out-of-the-box JVM component types related to the JVM optimization pack.
An optimization pack is a software object that provides a convenient facility for encapsulating all the knowledge (e.g. metrics, parameters with their default values and domain ranges) required to apply Akamas optimizations to a set of entities associated with the same technology.
Notice that while optimization packs are very convenient for modeling systems and creating studies, it is not required for these entities to be covered by an optimization pack.
Akamas provides a library of out-the-box optimization packs and new custom optimization packs can be easily added (no coding is required).
An optimization pack needs to include the entities that encapsulate technology-specific information related to the supported component types:
supported component types
parameters and metrics for each component type
supported telemetry providers (optional)
An optimization pack is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI shows systems in a specific top-level menu.
An optimization pack encapsulates one or more of the following technology-specific elements:
Component Types: these represent the type of the component(s) included, each with its associated parameters and metrics
Telemetry Providers: that define where to collect metrics
An optimization pack enables Akamas users to optimize a technology without necessarily being an expert in that technology and to code their knowledge about a technology or a specific application to be reused in multiple optimization studies to ease the modeling process.
A KPI is a metric that is worth considering when analyzing the result of an offline optimization study, looking for (sub)optimal configurations generated by Akamas AI to be applied.
Akamas automatically considers any metric referred to in the defined optimization goal and constraints for an offline optimization study as a KPI. Moreover, any other metrics of the system component can be specified as a KPI for an offline optimization study.
A KPI is defined as follows (from the UI or the CLI):
Field name | Field description |
---|
KPIs are not an as they are defined as part of an optimization study. The construct to define KPIs is described on the page of the section.
KPIs are not an and are always defined as part of an optimization study.
The number and first KPIs are displayed in the Akamas UI in the header of each offline optimization study.
The full list of KPIs is displayed by drilling down to the KPIs section.
From this section, it is possible to modify the list of KPIs and change their names and other attributes.
The optimization goal defines the objective of an to be achieved by changing the system parameters to modify the system behavior while also satisfying any defined optimization constraints on the system metrics, possibly representing SLOs.
A goal is defined by:
an optimization objective: either maximize or minimize
a scoring function (scalar): either a single metric or a formula defined by one or more metrics
One or more constraints can be associated with a goal
a formula defined on one or more metrics, referring to either absolute values (absolute constraints) or relative to a baseline value (relative constraints)
Notice that relative constraints are only supported by while absolute constraints are supported by both offline and .
Goals and constraints are not an as they are defined as part of an optimization study. The construct to be used to define a goal and its constraints are described in the page of the section.
Goals and constraints are not an and are always defined as part of an optimization study.
Goals and constraints are displayed in the Akamas UI when drilling down each optimization study.
The detail of the formula used to define the goal may also be displayed:
An optimization study (or study for short) represents an optimization initiative aimed at optimizing a goal on a target system. A study instructs Akamas about the space to explore and the KPIs used to evaluate whether a con configuration is good or bad
Akamas supports two types of optimizations:
are optimization studies where the workload is simulated by leveraging a load-testing tool.
are applied to systems that need to be optimized in production with respect to varying workloads observed while running live. For example, a microservices application can be optimized live by having Kubernetes and JVM parameters dynamically tuned for multiple microservices so as to minimize costs while matching response time objectives.
A study is described by the following properties
system: the under optimization
parameters: the set of being optimized
metrics: the set of to be collected
workflow: the describing tasks to perform experiments/trials
goal: the desired optimization to be achieved
constraints: the optimization that any configuration needs to satisfy
steps: the steps that are executed to run specific configurations (e.g. the baseline) and run the optimization
The construct to be used to define an optimization is described on the page.
The Akamas UI shows optimization studies in 2 specific top-level menus: one for offline optimization studies and another for live optimization studies.
A telemetry provider is a software object that represents a data source of metrics. A is a specific instance of a telemetry provider that refers to a specific data source.
Examples of telemetry providers are:
monitoring tools (e.g. Prometheus or Dynatrace)
load testing tools (e.g. LoadRunner or Neoload)
CSV files
A telemetry provider is a platform-wide entity that can be reused across systems to ease the integration with metrics sources.
Akamas provides a number of out-of-the-box . Custom telemetry providers can also be created.
The construct to be used to define a telemetry provider is described on the page.
A telemetry provider is an that can be managed via CLI using the
The Akamas UI shows systems in a specific top-level menu.
An optimization study is an that can be managed via CLI using the
A workspace is a virtual environment that groups systems, workflows, and studies to restrict user access to them: a user can access these resources only when granted the required permissions to that workspace.
Akamas defines two user roles according to the assigned permission on the workspace:
Contributors (write permission) can create and manage workspace resources (studies, telemetry instances, systems, and workflows) and can also do exports/imports, view all global resources (Optimization Packs, and Telemetry Providers), and see remaining credits;
Viewers (read permission) can only access optimization results but cannot create or modify workspace resources.
Workspaces and accesses are managed by users with administrative privileges. A user with administrator privileges can manage licenses, users, workspaces, and install/deinstall Optimization Packs, and Telemetry Providers.
Workspaces can be defined according to different criteria, such as:
By department (e.g. Performance, Development)
By initiative (e.g. Poc, Training)
By application (e.g. Registry, Banking..)
A workspace is described by the following property:
a name that uniquely identifies the workspace
A workspace is an Akamas resource that can be managed via CLI using the resource management commands. See also this page devoted to commands on how to define users and workspaces.
The workspace a study belongs to is always displayed. Filters can be used to select only studies belonging to specific workspaces
Name | Name of the KPI that will be used on UI labels |
Formula | Must be defined as <Component_name>.<metric_name> |
Direction | Must be 'minimize' or 'maximize' |
Aggregation | A valid metric aggregation such as min, max, avg, sum, p95, etc. If unspecified, default is avg |
Live Optimization studies are optimization studies where the workload is real: the system that needs to be optimized operates with respect to varying workloads observed while running live.
Live optimization studies are typically used to optimize systems in production environments. For example, a microservices application can be optimized while running in production by having Kubernetes and JVM parameters dynamically tuned for multiple microservices so as to minimize costs while matching response time objectives.
The following figure represents the iterative process associated with offline optimizations:
The following 5 phases can be identified for each iteration:
Collect KPIs: Akamas collects the metrics of the system required to observe its behavior under the current parameter configuration by leveraging the associated telemetry provider - here Akamas is also observing and categorizing the different workload contexts that are used to recommend configurations that are appropriate for each specific workload context
Score vs Goal: Akamas scores the applied parameter configuration under the specific workload context against the defined goal and constraints
Recommend Conf: Akamas provides a recommendation for parameter configuration based on the observed behavior under the specific workload context and leveraging the Akamas AI
Human Approval: this is an optional step as there are two operational modes:
autonomous mode: no human intervention is required
human-approval mode: recommendations need to be approved by users before configuration changes get applied - recommendations can be changed by users
Apply Conf: Akamas applies the recommended (and possibly revisited) configuration, by leveraging the defined workflow.
Notice that configurations can be applied by Akamas via integrations with native interfaces (e.g. Kubectl), by leveraging any orchestration and automation tool in place (e.g. OpenShift), or by triggering a pull request to a configuration repository (e.g. Git). This can be applied to either the entire target system or to a canary deployment.
Akamas provides several customizable policies for live optimization studies to ensure that recommended configuration changes to production environments are as safe as possible. Akamas safety policies include gradual optimization, smart constraints, and outlier detection. For more details, see settings safety policies.
A live optimization study is an Akamas resource that can be managed via CLI using resource management commands.
The Akamas UI shows live optimization studies in a specific top-level menu.
The details and results of an offline optimization study are displayed when drilling down.
Offline optimization studies are where the workload is simulated by leveraging a load-testing tool.
Offline optimization studies are typically used to optimize systems in pre-production environments, with respect to planned and what-if scenarios that cannot be directly run in production. Scenarios include new application releases, planned technology changes (e.g. new JVM or DB), cloud migration or new provider, expected workload growth, and resilience under failure scenarios (from chaos engineering).
The following figure represents the iterative process associated with offline optimizations:
The following 5 phases can be identified for each iteration (also known as experiment):
Recommend Conf: Akamas AI engine identifies the configuration for the next iteration until a termination condition for the study is met (e.g. number of experiments).
Thanks to its patented AI (reinforcement learning) algorithms, Akamas can find the optimal configuration without having to explore all the possible configurations.
For each experiment, Akamas allows multiple trials to be executed. A trial is a repetition of the same experiment to reduce the impact of noise on the result of an experiment.
Environments can be noisy for several reasons such as:
External conditions (e.g. background jobs, "noisy neighbors" in the cloud)
Measurement errors (e.g. monitoring tools not always 100% accurate)
This approach is consistent with scientific and engineering practices, where the strategy to minimize the impact of noise is to repeat the same experiment multiple times.
An offline optimization study can include multiple steps.
Typically there are at least two steps:
Baseline step: a single experiment that is run by applying the already deployed configuration before the Akamas optimization is applied - the results of this experiment are used as a reference (baseline) for assessing the optimization and as such is a mandatory step for each study
Optimize step: a defined number of experiments used to identify the optimal configuration by leveraging Akamas AI.
Other steps are:
Bootstrap step: imported experiments from other optimization studies
Preset step: a single experiment with a defined configuration
The steps to be executed can be specified when defining an offline optimization study.
The Akamas UI shows offline optimization studies in a specific top-level menu.
The details and results of an offline optimization study are displayed when drilling down (there are multiple tabs and sections).
Apply configuration: Akamas applies the parameter configuration (one or more ) to the target system by leveraging a set of
Apply workload: Akamas triggers a workload on the target system by also leveraging a set of
Collect KPIs: Akamas collects the related to the target system - only those metrics that are specified by each defined in the system
Score vs goal: Akamas scores the applied parameter configuration against the defined - the score is the value of the goal function
An offline optimization study is an that can be managed via CLI using the