A system represents the entire system that is the target of optimization.
A system is a single object irrespective of the number or type of entities or layers that are in the scope of the optimization. It can be used to model and describe a wide set of entities like:
An N-layers application
A single micro-service
A single (or a collection of) batch job(s)
The full micro-services stack of an application
A System is made of one or more components. Each component represents one of the elements in the system, whose parameters are involved in the optimization or whose metrics are collected to evaluate the results of such optimization.
A system is described by the following properties:
a name that uniquely identifies the system
a description that clarifies what the system refers to
The construct to be used to define a system is described on the System template page.
A system is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI displays systems (depending on the user privileges on the defined workspaces) in a specific top-level menu.
A component represents an element of a system. Typically, systems are made up of different entities and layers which can be modeled by components. In other words, a system can be considered a collection of related components.
Notice that a component is a black-box definition of each entity involved in an optimization study, so detailed modeling of the entities involved in the optimization is not required. The only relevant elements are the parameters that are involved in the optimization and the metrics that are collected to evaluate the results of such an optimization.
Notice that only the entities that are directly involved in the optimization need to be modeled and defined within Akamas. An entity is involved in an optimization study if it is optimized or monitored by Akamas, where "optimized" means that Akamas is optimizing at least one of its parameters, and "monitored" means that Akamas is monitoring at least one of its metrics.
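The rule above can be sketched in a few lines of Python. This is only an illustration: the `Entity` class, its fields, and the example names are hypothetical and are not part of any Akamas construct.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """Hypothetical stand-in for an entity that could be modeled as a component."""
    name: str
    optimized_parameters: List[str] = field(default_factory=list)
    monitored_metrics: List[str] = field(default_factory=list)

    def is_involved(self) -> bool:
        # Involved = optimized (at least one parameter) or monitored (at least one metric).
        return bool(self.optimized_parameters) or bool(self.monitored_metrics)


entities = [
    Entity("jvm", optimized_parameters=["jvm_gcType"]),
    Entity("load_balancer", monitored_metrics=["response_time"]),
    Entity("mail_server"),  # neither optimized nor monitored
]

# Only the involved entities need to be modeled as components.
to_model = [e.name for e in entities if e.is_involved()]
print(to_model)  # ['jvm', 'load_balancer']
```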
A component is described by the following mandatory properties (other properties can be defined but are not mandatory):
a name that uniquely identifies the component within the system
a description that clarifies what the component refers to
a component type that identifies the technology of the component (see Component type)
In general, a component contains a set of each of the following:
parameter(s) in the scope of the optimization
metric(s) needed to define the optimization goal
metric(s) needed to define the optimization constraints
metric(s) that are not needed to either define the optimization goal or constraints, and hence not used by Akamas to perform the optimization, but are collected in order to support the analysis (and which can be possibly added at a later time as part of optimization goal or constraint when refining the optimization).
The construct to be used to define a component is described on the Component template page.
The Akamas UI shows more details about components by drilling down into their respective system.
A component is an Akamas resource that can be managed via CLI using the resource management commands.
A parameter is a property of the system that can be applied and tuned to change the system's behavior. Akamas optimizes systems by changing parameters to achieve the stated goal while respecting the defined constraints.
Examples of a parameter include:
Configuration knobs (e.g. JVM garbage collection type)
Resource settings (e.g. amount of memory allocated to a Spark job)
Algorithms settings (e.g. learning rate of a neural network)
Architectural properties (e.g. how many caching layers in an enterprise application)
Type of resources (e.g. AWS EC2 instance or EBS volume type)
Any other thing (e.g. amount of sugar in your cookies)
The following table describes the parameter types:
Parameter Type | Domain | Akamas normalized domain |
---|---|---|
A parameter is described by the following properties:
a name that uniquely identifies the parameter
a description that clarifies the semantics of the parameter
a unit that defines the unit of measurement used by the parameter
Although users can create parameters with any name, we suggest using the naming convention context_parameter, where:
context refers to the technology or more general environment in which that parameter is defined (e.g. elasticsearch, jvm, mysql, spark)
parameter is the parameter name in the original context (e.g. gcType, numberOfExecutors)
This makes it possible to identify parameters more easily and avoid any potential name clash.
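As a minimal sketch of this convention, the hypothetical helpers below build and split such names. They assume that the context itself contains no underscore, which the text does not state.

```python
from typing import Tuple


def parameter_name(context: str, parameter: str) -> str:
    """Build a name following the suggested context_parameter convention."""
    if not context or "_" in context:
        # Assumption: the context has no underscore, so names split unambiguously.
        raise ValueError("context must be non-empty and contain no underscore")
    if not parameter:
        raise ValueError("parameter must be non-empty")
    return f"{context}_{parameter}"


def split_parameter_name(name: str) -> Tuple[str, str]:
    """Split a context_parameter name at the first underscore."""
    context, separator, parameter = name.partition("_")
    if not (context and separator and parameter):
        raise ValueError(f"{name!r} does not follow context_parameter")
    return context, parameter


print(parameter_name("jvm", "gcType"))                  # jvm_gcType
print(split_parameter_name("spark_numberOfExecutors"))  # ('spark', 'numberOfExecutors')
```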
The construct to be used to define a parameter is described on the Parameter template page.
Parameters are displayed in the Akamas UI when drilling down to each system component.
For each optimization study, the optimization scope is the set of parameters that Akamas can change to achieve the defined optimization goal.
This section provides a definition of Akamas' key concepts and terms and also provides references to the related construct properties, commands, and user interfaces.
Term | Definition |
---|---|
A telemetry instance is an instance of a telemetry provider that collects data from a specific instance of the data source.
A telemetry instance is an instance of a telemetry provider, providing the required information on how to connect and collect a given set of metrics from a specific data source.
While telemetry providers are platform-wide entities, telemetry instances are defined at each system level.
The construct to be used to define a telemetry instance is described on the Telemetry instance template page.
A telemetry instance is an Akamas resource that can be managed via CLI using the resource management commands.
Telemetry instances are displayed in the Akamas UI when drilling down to each system component.
A telemetry provider is a software object that represents a data source of metrics. A telemetry instance is a specific instance of a telemetry provider that refers to a specific data source.
Examples of telemetry providers are:
monitoring tools (e.g. Prometheus or Dynatrace)
load testing tools (e.g. LoadRunner or Neoload)
CSV files
A telemetry provider is a platform-wide entity that can be reused across systems to ease the integration with metrics sources.
Akamas provides a number of out-of-the-box telemetry providers. Custom telemetry providers can also be created.
The construct to be used to define a telemetry provider is described on the Telemetry provider template page.
A telemetry provider is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI shows systems in a specific top-level menu.
An optimization pack is a software object that provides a convenient facility for encapsulating all the knowledge (e.g. metrics, parameters with their default values and domain ranges) required to apply Akamas optimizations to a set of entities associated with the same technology.
Notice that while optimization packs are very convenient for modeling systems and creating studies, it is not required for these entities to be covered by an optimization pack.
Akamas provides a library of out-of-the-box optimization packs, and new custom optimization packs can be easily added (no coding is required).
An optimization pack needs to include the entities that encapsulate technology-specific information related to the supported component types:
supported component types
parameters and metrics for each component type
supported telemetry providers (optional)
An optimization pack is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI shows systems in a specific top-level menu.
An optimization pack encapsulates one or more of the following technology-specific elements:
Component Types: these represent the type of the component(s) included, each with its associated parameters and metrics
Telemetry Providers: that define where to collect metrics
An optimization pack enables Akamas users to optimize a technology without necessarily being an expert in that technology and to code their knowledge about a technology or a specific application to be reused in multiple optimization studies to ease the modeling process.
Parameter type | Domain | Akamas normalized domain |
---|---|---|
REAL | real values | Akamas normalizes the values: [0.0, 10.0] → [0.0, 1.0] |
INTEGER | integer values | Akamas converts the integer into real and then normalizes the values: [0, 3] → [0.0, 3.0] → [0.0, 1.0] |
ORDINAL | integer values | Akamas converts the category into real and then normalizes the values: ['a', 'b', 'c'] → [0, 2] → [0.0, 2.0] → [0.0, 1.0] |
CATEGORICAL | categorical values | Akamas converts each parameter value into a new parameter that may be either 1.0 (active) or 0.0 (inactive); only one of these new parameters can be active during each experiment: ['a', 'b', 'c'] → [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]] |
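The four conversions can be illustrated with a short Python sketch. It assumes plain linear (min-max) scaling between the domain bounds, which matches the endpoints shown above but is not confirmed as the exact method Akamas uses internally. All function names are hypothetical.

```python
from typing import Dict, Sequence


def normalize_real(value: float, low: float, high: float) -> float:
    """Linearly map a value from [low, high] to [0.0, 1.0] (assumed scaling)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return (value - low) / (high - low)


def normalize_integer(value: int, low: int, high: int) -> float:
    """Convert the integer to real, then normalize."""
    return normalize_real(float(value), float(low), float(high))


def normalize_ordinal(value: str, categories: Sequence[str]) -> float:
    """Map the category to its index, then treat it as an integer."""
    index = list(categories).index(value)
    return normalize_integer(index, 0, len(categories) - 1)


def encode_categorical(value: str, categories: Sequence[str]) -> Dict[str, float]:
    """One new parameter per category; exactly one is active (1.0)."""
    if value not in categories:
        raise ValueError(f"{value!r} is not in the domain")
    return {category: (1.0 if category == value else 0.0) for category in categories}


print(normalize_real(2.5, 0.0, 10.0))               # 0.25
print(normalize_integer(3, 0, 3))                   # 1.0
print(normalize_ordinal("b", ["a", "b", "c"]))      # 0.5
print(encode_categorical("b", ["a", "b", "c"]))     # {'a': 0.0, 'b': 1.0, 'c': 0.0}
```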
While Akamas leverages similar AI methods for both live optimizations and optimization studies, the way these methods are applied is radically different. Indeed, for optimization studies running in pre-production environments, the approach is to explore the configuration space by also accepting potential failed experiments, to identify regions that do not correspond to viable configurations. Of course, this approach cannot be accepted for live optimization running in production environments. For this purpose, Akamas live optimization uses observations of configuration changes combined with the automatic detection of workload contexts and provides several customizable safety policies when recommending configurations to be approved, revisited, and applied.
Akamas provides a few customizable optimizer options (refer to the options described on the Optimize step page of the reference guide) that should be configured so as to make configurations recommended in live optimization and applied to production environments as safe as possible.
Akamas provides an optimizer option known as the exploration factor that only allows gradual changes to the parameters. This gradual optimization allows Akamas to observe how these changes impact the system behavior before applying the following gradual changes.
By properly configuring the optimizer, Akamas can gradually explore regions of the configuration space and slowly approach any potentially risky regions, thus avoiding recommending any configurations that may negatively impact the system. Gradual optimization takes into account the maximum recommended change for each parameter. This is defined as a percentage (default is 5%) with respect to the baseline value. For example, in the case of a container whose CPU limit is 1000 millicores, the corresponding maximum allowed change is 50 millicores. It is important to notice that this does not represent an absolute cap, as Akamas also takes into account any good configurations observed. For example, in the event of a traffic peak, Akamas would recommend a good configuration that was observed working fine for a similar workload in the past, even if the change is higher than 5% of the current configuration value.
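The arithmetic in the CPU limit example can be reproduced with a small sketch. The function names are hypothetical, and the symmetric window around the baseline is an assumption made for illustration. As noted above, the result is not an absolute cap.

```python
from typing import Tuple


def max_recommended_change(baseline_value: float, max_change_pct: float = 5.0) -> float:
    """Maximum gradual change, as a percentage of the baseline value."""
    return abs(baseline_value) * max_change_pct / 100.0


def gradual_window(baseline_value: float, max_change_pct: float = 5.0) -> Tuple[float, float]:
    """Assumed symmetric window of values reachable in one gradual step."""
    delta = max_recommended_change(baseline_value, max_change_pct)
    return baseline_value - delta, baseline_value + delta


# Container CPU limit of 1000 millicores with the default 5%.
print(max_recommended_change(1000))  # 50.0
print(gradual_window(1000))          # (950.0, 1050.0)
```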
Notice that this feature would not work for categorical parameters (e.g. JVM GC Type) as their values do not change incrementally. Therefore, for these parameters, Akamas by default takes the conservative approach of only recommending configurations in which categorical parameters take values that have already been observed. Never-observed values can still be introduced, as users are allowed to modify values for categorical parameters as well when operating in human-in-the-loop mode. Once Akamas has observed that a specific configuration is working fine, the corresponding value can then be recommended. For example, a user might modify the recommended configuration for GC Type from Serial to Parallel. Once Parallel has been observed as working fine, Akamas would consider it for future recommendations of GC Type, while other values (e.g. G1) would not be considered until verified as safe recommendations.
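The GC Type example can be sketched as a simple filter over the parameter domain. This is an illustration only: the function and variable names are hypothetical, and the real selection logic is not described in this document.

```python
from typing import Iterable, List, Set


def recommendable_values(domain: Iterable[str], observed_safe: Set[str]) -> List[str]:
    """Keep only the categorical values already observed as working fine."""
    return [value for value in domain if value in observed_safe]


gc_domain = ["Serial", "Parallel", "G1"]
observed_safe = {"Serial"}
print(recommendable_values(gc_domain, observed_safe))  # ['Serial']

# A user switches GC Type to Parallel in human-in-the-loop mode,
# and the configuration is then observed working fine.
observed_safe.add("Parallel")
print(recommendable_values(gc_domain, observed_safe))  # ['Serial', 'Parallel']
```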
The exploration factor can be customized for each live optimization individually and changed while live optimizations are running.
Akamas provides an optimizer option known as the safety factor, designed to prevent Akamas from selecting configurations (even if slowly approaching them) that may impact the ability to match defined SLOs. For example, when optimizing container CPU limits, lower and lower CPU limits might be recommended, up to the point where the limit becomes so low that the application performance degrades.
Akamas takes into account the magnitude of constraint breaches: a severe breach is considered more negative than a minor breach. For example, in the case of an SLO of 200 ms on response time, a configuration causing a 1 sec response time is assigned a very different penalty than a configuration causing a 210 ms response time. Moreover, Akamas leverages the smart constraint evaluation feature that takes into account if a configuration is causing constraints to approach their corresponding thresholds. For example, in the case of an SLO of 200 ms on response time, a configuration changing response time from 170 ms to 190 ms is considered more problematic than one causing a change from 100 ms to 120 ms. The first one is considered by Akamas as corresponding to a gray area that should not be explored.
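To make the numbers above concrete, the sketch below computes how far a response time is beyond, or how close it is to, a 200 ms SLO. The two measures are hypothetical and chosen only for illustration. The actual penalty and gray-area logic used by Akamas is not specified here.

```python
def relative_breach(observed_ms: float, threshold_ms: float) -> float:
    """How far above the threshold the observation is, as a fraction of the threshold."""
    return max(0.0, (observed_ms - threshold_ms) / threshold_ms)


def relative_headroom(observed_ms: float, threshold_ms: float) -> float:
    """Remaining margin below the threshold, as a fraction of the threshold."""
    return max(0.0, (threshold_ms - observed_ms) / threshold_ms)


slo_ms = 200
# Severe versus minor breach: 1 sec and 210 ms against a 200 ms SLO.
print(relative_breach(1000, slo_ms))  # 4.0
print(relative_breach(210, slo_ms))
# Approaching the threshold: 190 ms leaves far less margin than 120 ms.
print(relative_headroom(190, slo_ms))
print(relative_headroom(120, slo_ms))
```

Under these illustrative measures, the 1 sec response time breaches the SLO by four times the threshold while the 210 ms one breaches it by 5%, and the configuration at 190 ms has only 5% of margin left compared with 40% at 120 ms.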
The safety factor is also used when the study starts, to validate the behavior of the baseline and assess how safe it is to explore configurations close to it. If the baseline presents some constraint violations, then even exploring configurations close to the baseline might be risky. If Akamas finds that more than (safety_factor*number_of_trials) of the baseline trials manifest constraint violations, the optimization is stopped.
If your baseline has some trials failing constraint validation, we suggest you analyze them before proceeding with the optimization.
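The baseline check can be written as a one-line comparison. The function name is hypothetical, and the default of 0.5 is the documented default safety factor.

```python
def baseline_stops_optimization(
    violating_trials: int, number_of_trials: int, safety_factor: float = 0.5
) -> bool:
    """True when more than safety_factor * number_of_trials baseline trials violate constraints."""
    return violating_trials > safety_factor * number_of_trials


# With 4 baseline trials and the default factor, the threshold is 2 trials.
print(baseline_stops_optimization(2, 4))  # False: 2 is not more than 2.0
print(baseline_stops_optimization(3, 4))  # True: 3 is more than 2.0
```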
The safety factor is set by default to 0.5 and can be customized for each live optimization individually and changed while live optimizations are running.
It is also worth mentioning that Akamas features an outlier detection capability to compensate for production environments typically being noisy and much less stable than staging environments, and thus displaying highly fluctuating performance metrics. As a consequence, constraints may fail from time to time, even for perfectly good configurations. This may be due to a variety of causes, such as shared infrastructure on the cloud, slowness of external systems, etc.
A workflow is a set of tasks that run in sequence to evaluate a configuration as part of an optimization study. A task is a single action performed within a workflow.
Workflows allow you to automate Akamas optimization studies, by automatically executing a sequence of tasks such as initializing an environment, triggering load testing, restoring a database, applying configurations, and much more.
These are examples of common tasks:
Launch remote commands via SSH
Apply parameter values in configuration files
Execute Spark jobs via spark-submit API
Start performance tests by integrating with external tools such as Neoload
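The idea of a workflow as tasks run in sequence can be sketched as follows. This is not how Akamas workflows are defined (they use their own construct and operators). The `run_workflow` helper and the task names are hypothetical, and stopping at the first failed task is an assumption.

```python
from typing import Callable, List, Tuple

# A task is a name plus an action that reports success (True) or failure (False).
Task = Tuple[str, Callable[[], bool]]


def run_workflow(tasks: List[Task]) -> List[str]:
    """Run tasks in order; stop at the first failure. Returns the completed task names."""
    completed: List[str] = []
    for name, action in tasks:
        if not action():
            break  # assumed behavior: later tasks are skipped after a failure
        completed.append(name)
    return completed


workflow = [
    ("apply configuration", lambda: True),
    ("restart application", lambda: True),
    ("run load test", lambda: True),
]
print(run_workflow(workflow))
```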
Workflows are first-class entities that can be defined globally and then used in multiple optimization studies.
Akamas provides several workflow operators that can be used to perform tasks in a workflow. Some operators are general-purpose, such as those executing a command or script on a specific host, while others provide native integrations with specific technologies and tools, such as Spark History Server or load testing tools.
The construct to be used to define a workflow is described on the Workflow template page.
A workflow is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI shows workflows in a specific top-level menu.
The list of tasks is displayed when drilling down to each specific workflow.
A component type is a blueprint for a component that describes the type of entity the component refers to. In Akamas, a component needs to be associated with a component type, from which the component inherits its metrics and parameters.
Component types are platform entities (i.e. shared among all the users), usually provided off the shelf and shipped within the Optimization Packs. Typically, different component types within the same optimization pack are used to model different versions/releases of the same technology.
Akamas users with appropriate privileges can create custom component types and optimization packs, as described on the Creating custom optimization pack page.
A component type is described by the following mandatory properties (other properties can be defined but are not mandatory):
a name that uniquely identifies the component type within the system
a description that clarifies what the component type refers to
a parameter definitions array (more on Parameters later)
a metrics array (more on Metrics later)
The construct to be used to define a component type is described on the Component type template page.
A component type is an Akamas resource that can be managed via CLI using the resource management commands.
When visualizing system components the component type is displayed.
The following figure shows the out-of-the-box JVM component types related to the JVM optimization pack.
Term | Definition |
---|---|
System | systems targeted by optimization studies |
Component | elements of the system |
Component type | types associated to a system component |
Optimization pack | objects encapsulating knowledge about component types |
Metric | a measured metric, collected via telemetry providers |
Parameter | tunable parameters, set via native or other interfaces |
Telemetry provider | general definition of providers of collected metrics |
Telemetry instance | specific instances of telemetry providers |
Workflow | automation workflow to set parameters, collect metrics and run load testing |
Goal & constraints | goal and constraints defined for an optimization study |
Optimization study | optimization studies for a target system |
Offline optimization study | optimization studies for a non-live system |
Live optimization study | optimization studies for a live system |
Workspace | virtual environments to organize and isolate resources |
The optimization goal defines the objective of an optimization study to be achieved by changing the system parameters to modify the system behavior while also satisfying any defined optimization constraints on the system metrics, possibly representing SLOs.
A goal is defined by:
an optimization objective: either maximize or minimize
a scoring function (scalar): either a single metric or a formula defined by one or more metrics
One or more constraints can be associated with a goal, each defined by:
a formula defined on one or more metrics, referring to either absolute values (absolute constraints) or relative to a baseline value (relative constraints)
Notice that relative constraints are only supported by offline optimization studies, while absolute constraints are supported by both offline and live optimization studies.
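The difference between absolute and relative constraints can be illustrated with a minimal sketch. The classes, field names, and metric values below are hypothetical and do not reflect the Akamas study construct. For brevity, only upper-bound constraints on a single metric are modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Goal:
    objective: str  # "minimize" or "maximize"
    metric: str     # scoring function reduced to a single metric

    def is_better(self, candidate: Dict[str, float], incumbent: Dict[str, float]) -> bool:
        if self.objective == "minimize":
            return candidate[self.metric] < incumbent[self.metric]
        if self.objective == "maximize":
            return candidate[self.metric] > incumbent[self.metric]
        raise ValueError("objective must be 'minimize' or 'maximize'")


@dataclass
class UpperBoundConstraint:
    metric: str
    bound: float            # absolute limit, or multiple of the baseline if relative
    relative: bool = False

    def is_satisfied(self, observed: Dict[str, float],
                     baseline: Optional[Dict[str, float]] = None) -> bool:
        if self.relative:
            if baseline is None:
                raise ValueError("a relative constraint needs baseline values")
            limit = baseline[self.metric] * self.bound
        else:
            limit = self.bound
        return observed[self.metric] <= limit


baseline = {"cost": 100.0, "response_time_ms": 180.0, "error_rate": 0.010}
candidate = {"cost": 80.0, "response_time_ms": 190.0, "error_rate": 0.0105}

goal = Goal("minimize", "cost")
constraints = [
    UpperBoundConstraint("response_time_ms", 200.0),        # absolute
    UpperBoundConstraint("error_rate", 1.1, relative=True),  # at most 110% of baseline
]

valid = all(c.is_satisfied(candidate, baseline) for c in constraints)
print(valid and goal.is_better(candidate, baseline))  # True
```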
Goals and constraints are not an Akamas resource as they are defined as part of an optimization study. The construct to be used to define a goal and its constraints is described on the Goal & Constraint page of the Study template section.
Goals and constraints are not an Akamas resource and are always defined as part of an optimization study.
Goals and constraints are displayed in the Akamas UI when drilling down each optimization study.
The detail of the formula used to define the goal may also be displayed:
An optimization study (or study for short) represents an optimization initiative aimed at optimizing a goal on a target system. A study instructs Akamas about the space to explore and the KPIs used to evaluate whether a configuration is good or bad.
Akamas supports two types of optimizations:
Offline Optimization Studies are optimization studies where the workload is simulated by leveraging a load-testing tool.
Live Optimization Studies are applied to systems that need to be optimized in production with respect to varying workloads observed while running live. For example, a microservices application can be optimized live by having Kubernetes and JVM parameters dynamically tuned for multiple microservices so as to minimize costs while matching response time objectives.
A study is described by the following properties:
system: the system under optimization
parameters: the set of parameters being optimized
metrics: the set of metrics to be collected
workflow: the workflow describing tasks to perform experiments/trials
goal: the desired optimization goal to be achieved
constraints: the optimization constraints that any configuration needs to satisfy
steps: the steps that are executed to run specific configurations (e.g. the baseline) and run the optimization
The construct to be used to define an optimization study is described on the Study template page.
An optimization study is an Akamas resource that can be managed via CLI using the resource management commands.
The Akamas UI shows optimization studies in 2 specific top-level menus: one for offline optimization studies and another for live optimization studies.
A metric is a measured property of a system.
Examples of a metric include:
the response time of an application
the utilization of a CPU
the amount of time spent in garbage collection
the cost of a cloud service
Metrics are used to both specify the optimization (e.g. minimize the heap size while keeping response time < 1000 and error rate <= 10% of a baseline value), and to assess the behavior of the system with respect to each specific configuration applied.
A metric is described by the following properties:
a name that uniquely identifies the metric
a description that clarifies the semantics of the metric
a unit that defines the unit of measurement used by the metric
The construct to be used to define a metric is described on the Metric template page.
Metrics are displayed in the Akamas UI when drilling down to each system component and are represented in metric charts for each specific optimization study.
Please notice that in order for a metric to be displayed in the Akamas UI, it has to be collected from a telemetry provider by means of a specific telemetry instance defined for each specific target system.
A workspace is a virtual environment that groups systems, workflows, and studies to restrict user access to them: a user can access these resources only when granted the required permissions to that workspace.
Akamas defines two user roles according to the assigned permission on the workspace:
Contributors (write permission) can create and manage workspace resources (studies, telemetry instances, systems, and workflows) and can also do exports/imports, view all global resources (Optimization Packs, and Telemetry Providers), and see remaining credits;
Viewers (read permission) can only access optimization results but cannot create or modify workspace resources.
Workspaces and accesses are managed by users with administrative privileges. A user with administrator privileges can manage licenses, users, and workspaces, and can install/uninstall Optimization Packs and Telemetry Providers.
Workspaces can be defined according to different criteria, such as:
By department (e.g. Performance, Development)
By initiative (e.g. PoC, Training)
By application (e.g. Registry, Banking)
A workspace is described by the following property:
a name that uniquely identifies the workspace
A workspace is an Akamas resource that can be managed via CLI using the resource management commands. See also this page devoted to commands on how to define users and workspaces.
The workspace a study belongs to is always displayed. Filters can be used to select only studies belonging to specific workspaces.
A KPI is a metric that is worth considering when analyzing the result of an offline optimization study, looking for (sub)optimal configurations generated by Akamas AI to be applied.
Akamas automatically considers any metric referred to in the defined optimization goal and constraints for an offline optimization study as a KPI. Moreover, any other metrics of the system component can be specified as a KPI for an offline optimization study.
A KPI is defined as follows (from the UI or the CLI):
Field name | Field description |
---|---|
KPIs are not an Akamas resource as they are defined as part of an optimization study. The construct to define KPIs is described on the KPIs page of the Study template section.
KPIs are not an Akamas resource and are always defined as part of an optimization study.
The number of KPIs and the first few of them are displayed in the Akamas UI in the header of each offline optimization study.
The full list of KPIs is displayed by drilling down to the KPIs section.
From this section, it is possible to modify the list of KPIs and change their names and other attributes.
Field name | Field description |
---|---|
Name | Name of the KPI that will be used on UI labels |
Formula | Must be defined as <Component_name>.<metric_name> |
Direction | Must be 'minimize' or 'maximize' |
Aggregation | A valid metric aggregation such as min, max, avg, sum, p95, etc. If unspecified, default is avg |
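The field rules above can be checked with a small validation sketch. The `Kpi` class is hypothetical and only mirrors the four fields listed. The aggregation is not validated because the full list of valid aggregations is not given here, and splitting the formula at the first dot is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    formula: str               # <Component_name>.<metric_name>
    direction: str             # 'minimize' or 'maximize'
    aggregation: str = "avg"   # documented default when unspecified

    def __post_init__(self) -> None:
        # Assumption: the component name ends at the first dot.
        component, separator, metric = self.formula.partition(".")
        if not (component and separator and metric):
            raise ValueError("formula must be <Component_name>.<metric_name>")
        if self.direction not in ("minimize", "maximize"):
            raise ValueError("direction must be 'minimize' or 'maximize'")


kpi = Kpi(name="Response time", formula="webapp.response_time", direction="minimize")
print(kpi.aggregation)  # avg
```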