Optimizing cost of a Kubernetes microservice while preserving SLOs in production
In this example, you will use Akamas live optimization to minimize the cost of a Kubernetes deployment, while preserving application performance and reliability requirements.
Prerequisites
To follow this example, you need:
an Akamas instance
a Kubernetes cluster, with a deployment to be optimized
the kubectl command installed on the Akamas instance, configured to access the target Kubernetes cluster, with privileges to get and update the deployment configuration
a supported telemetry data source (e.g. Prometheus or Dynatrace) configured to collect metrics from the target Kubernetes cluster
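As a rough illustration of the kubectl privileges mentioned above, the following sketch shows a namespaced Role and RoleBinding that allow reading and updating deployments. The Role name, the akamas service account, and the exact list of verbs are assumptions made for illustration; the namespace matches the boutique namespace assumed later in this example. Adapt them to the identity kubectl actually uses on your Akamas instance.

```yaml
# Hypothetical RBAC sketch: names and verbs are illustrative assumptions.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: akamas-deployment-editor    # hypothetical name
  namespace: boutique               # namespace assumed in this example
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    # get/list to read the configuration, patch/update to apply changes
    verbs: ["get", "list", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: akamas-deployment-editor    # hypothetical name
  namespace: boutique
subjects:
  - kind: ServiceAccount
    name: akamas                    # hypothetical service account
    namespace: boutique
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: akamas-deployment-editor
```

With credentials in place, running kubectl auth can-i patch deployments -n boutique from the Akamas instance should answer yes.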
Optimization setup
Optimization packs
This example leverages the following optimization packs:
A Kubernetes container component, which contains container-level metrics such as CPU usage and tunable parameters such as CPU limits
A Web Application component, which contains service-level metrics such as throughput and response time
In this example, we assume the deployment to be optimized is called frontend, with a container named server, and is located within the boutique namespace. We also assume that Dynatrace is used as a telemetry provider.
Kubernetes component
Create a component-container.yaml manifest like the following:
    name: container
    description: Kubernetes container, part of the frontend deployment
    componentType: Kubernetes Container
    properties:
      dynatrace:
        type: CONTAINER_GROUP_INSTANCE
        kubernetes:
          namespace: boutique
          containerName: server
          basePodName: frontend-*
Now create a component-webapp.yaml manifest like the following:
    name: webapp
    description: The service related to the frontend deployment
    componentType: Web Application
    properties:
      dynatrace:
        id: <TELEMETRY_DYNATRACE_WEBAPP_ID>
The optimization in this example is configured as follows:
the goal is to reduce the cost of the Kubernetes deployment. In this example, the cost is based on the amount of CPU and memory limits (assuming requests = limits)
the approval mode is set to manual, and a new recommendation is generated daily
to avoid impacting application performance, constraints are specified on desired response times and error rates
to avoid impacting application reliability, constraints are specified on peak resource usage and out-of-memory kills
the parameters to be tuned are the container CPU and memory limits (we assume requests = limits in the deployment file)
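To make the requests = limits assumption concrete, here is a minimal sketch of how the relevant part of the frontend deployment might look. The deployment, container, and namespace names come from this example; the label, the image placeholder, and the resource values are illustrative assumptions, not recommended settings.

```yaml
# Hypothetical excerpt of the deployment manifest; values are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: boutique
spec:
  selector:
    matchLabels:
      app: frontend                 # label is an assumption
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: server
          image: <FRONTEND_IMAGE>   # placeholder, not from the source
          resources:
            requests:               # kept equal to limits
              cpu: "1"
              memory: 1Gi
            limits:                 # the values tuned by the optimization
              cpu: "1"
              memory: 1Gi
```

Because requests equal limits for both CPU and memory in this single-container pod, Kubernetes assigns it the Guaranteed QoS class, and any change to the limits directly changes the resources reserved on the node, which is why the cost can be expressed in terms of limits alone.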