The following best practices should be considered before applying a configuration identified by an offline optimization study from a test or pre-production environment to a production environment.
Most of these best practices are general and refer to any configuration change and application rollout, not only to Akamas-related scenarios.
Any configuration that Akamas identifies in a test or pre-production environment, by executing a number of experiments and trials in a limited timeframe, should first be validated for its ability to consistently deliver the expected performance over time before it is promoted to production.
An endurance test typically lasts for several hours and can either mimic the specific load profile of the production environment (e.g. morning peaks or low-load phases during the night) or apply a simple constant high load (flat load). A specific Akamas study can be implemented for this purpose.
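One simple way to judge whether performance stays consistent over an endurance test is to compare the start of the run with its end. The following is a minimal sketch, not an Akamas feature: the function name `performance_drift`, the sample format (one measurement per interval, e.g. response time in milliseconds) and the window size are all assumptions made for illustration.

```python
from statistics import mean


def performance_drift(samples, window):
    """Relative change between the first and last `window` samples.

    `samples` is a hypothetical list of per-interval measurements
    (e.g. average response time in ms) collected during the endurance test.
    A result of 0.10 means the last window is 10% higher than the first.
    """
    if window <= 0 or len(samples) < 2 * window:
        raise ValueError("need at least two non-overlapping windows")
    first = mean(samples[:window])
    last = mean(samples[-window:])
    if first == 0:
        raise ValueError("first window averages to zero; drift is undefined")
    return (last - first) / first


# Hypothetical acceptance rule: tolerate at most 5% degradation over the run.
def is_stable(samples, window, max_drift=0.05):
    return performance_drift(samples, window) <= max_drift
```

A steadily growing response time (for example, caused by a slow memory leak) shows up as a positive drift even when each individual sample still looks acceptable, which is exactly what short experiments cannot reveal.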
When applying a new configuration to a production environment, it is important to reduce the risk of severely impacting the supported services and to allow time to backtrack if required.
With a gradual rollout approach, a new configuration is applied to only a subset of the target system, so that the system can be observed for a period of time without impacting the entire system.
Several strategies are possible, including:
Canary deployment, where a small percentage of the traffic is served by the instance with the new configuration;
Shadow traffic, where traffic is mirrored to the instance with the new configuration, and the responses from that instance do not impact users.
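As an illustration of the canary strategy, the sketch below decides, for each request, whether it goes to the instance running the new configuration. It is a minimal example under stated assumptions: the function name `route_to_canary` and the choice of a hash of a request key (such as a user or session ID) are hypothetical, and in practice the split is usually configured in a load balancer or service mesh rather than coded by hand.

```python
import hashlib


def route_to_canary(request_key: str, canary_percent: float) -> bool:
    """Return True if this request should be served by the canary instance.

    `request_key` is a hypothetical stable identifier (e.g. a user ID).
    Hashing it keeps the same key on the same side of the split, so a
    given user consistently sees either the old or the new configuration.
    """
    if not 0.0 <= canary_percent <= 100.0:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    # Map the key to one of 10,000 buckets (0.01% granularity).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < canary_percent * 100


# Example: send roughly 5% of users to the new configuration.
canary_users = [u for u in (f"user-{i}" for i in range(1000))
                if route_to_canary(u, 5.0)]
```

Raising `canary_percent` in small steps while monitoring the canary gives time to backtrack: setting it back to 0 returns all traffic to the current configuration.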
In the case of an application sharing entire layers or single components (e.g. microservices) with other applications, it is important to assess in advance the potential impact on other applications before applying a configuration identified by only considering SLOs related to a single application.
The following general considerations may help in assessing the impact on the infrastructure:
If the new configuration is more efficient (i.e. it is less demanding in terms of resources) or it does not require changes to resource requirements (e.g. it does not change K8s requests or limits), then the configuration can be expected to be beneficial, as any resources it frees become available for additional applications;
If the new configuration is less efficient (i.e. it requires more resources), then it should be checked whether the additional capacity is available in the infrastructure (e.g. in the K8s cluster or namespace), as is done when allocating new applications.
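The capacity check described above can be sketched as a simple headroom calculation. This is an illustrative example only: the `Resources` type, the function `fits_in_headroom` and all figures are assumptions, and the quota and usage values would have to come from the actual cluster or namespace (for instance, from its resource quota).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_millicores: int
    memory_mib: int


def fits_in_headroom(current: Resources, proposed: Resources, replicas: int,
                     quota: Resources, in_use: Resources) -> bool:
    """Check whether the extra resources of the new configuration fit.

    `current` and `proposed` are per-replica requests; `quota` and `in_use`
    describe the namespace (or cluster) as a whole. Each resource is checked
    separately: CPU that is freed does not compensate for extra memory.
    """
    extra_cpu = max(0, proposed.cpu_millicores - current.cpu_millicores) * replicas
    extra_mem = max(0, proposed.memory_mib - current.memory_mib) * replicas
    free_cpu = quota.cpu_millicores - in_use.cpu_millicores
    free_mem = quota.memory_mib - in_use.memory_mib
    return extra_cpu <= free_cpu and extra_mem <= free_mem


# Hypothetical figures: 4 replicas, each growing from 1 GiB to 1.5 GiB of memory.
ok = fits_in_headroom(
    current=Resources(cpu_millicores=500, memory_mib=1024),
    proposed=Resources(cpu_millicores=500, memory_mib=1536),
    replicas=4,
    quota=Resources(cpu_millicores=8000, memory_mib=16384),
    in_use=Resources(cpu_millicores=6000, memory_mib=15000),
)
```

With these figures the rollout needs 2048 MiB of additional memory while only 1384 MiB are free, so `ok` is `False` and the capacity would have to be extended before applying the configuration.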
As far as the other applications are concerned:
Just reducing the operational cost of a service does not have any impact on other applications that are calling or using the service;
While tuning a service for performance may put the caller system under back-pressure fatigue, this is not the typical behavior of enterprise systems, where the most susceptible systems are on the backend side:
Tuning the most external services will not increase throughput much, since throughput is typically business-driven; the risk of overwhelming the backends is therefore low;
Tuning the backends allows the caller systems to complete their connections faster, thus reducing the memory footprint and increasing the resilience of the entire system;
Especially in the case of highly distributed systems, such as microservices, the number of in-flight packages for a given period of time is something to be minimized;
A latency reduction for a microservice implies fewer in-flight packages throughout the system, leading to better performance, faster failures, and fewer pending transactions to be rolled back in case of incidents.
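The link between latency and in-flight work can be made concrete with Little's Law, which states that in a stable system the average number of items in the system equals the arrival rate multiplied by the average time each item spends in it. The sketch below applies it to a service; the function name and the figures are illustrative assumptions, and the law holds for long-run averages at steady state only.

```python
def in_flight(throughput_per_s: float, latency_s: float) -> float:
    """Average number of in-flight items according to Little's Law (L = lambda * W)."""
    if throughput_per_s < 0 or latency_s < 0:
        raise ValueError("throughput and latency must be non-negative")
    return throughput_per_s * latency_s


# Hypothetical service handling 200 requests/s, before and after tuning.
before = in_flight(throughput_per_s=200, latency_s=0.25)  # about 50 in flight
after = in_flight(throughput_per_s=200, latency_s=0.10)   # about 20 in flight
```

Since throughput is typically business-driven and stays the same, cutting latency from 250 ms to 100 ms reduces the average in-flight count by the same proportion, which is why fewer transactions are pending when an incident occurs.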