In this example, you will walk through the optimization of a Spark-based PageRank algorithm on AWS instances. We’ll be using a PageRank implementation included in Renaissance, an industry-standard Java benchmarking suite developed by Oracle Labs, and tweak both Java and AWS parameters to improve the performance of our application.
Environment setup
For this example, you’re expected to use two dedicated machines:
an Akamas instance
a Linux-based AWS EC2 instance
The Akamas instance needs to provision and manipulate EC2 instances, so it must be allowed to do so: set the required AWS policies, integrate it with an orchestration tool (such as Ansible), and provide an inventory linked to your AWS EC2 environment.
The Linux-based instance runs the application benchmark, so it requires the latest OpenJDK 11 release:
```bash
sudo apt install openjdk-11-jre
```
Telemetry Infrastructure setup
For this study you’re going to require the following telemetry providers:
Prometheus Provider to gather JVM metrics through the JMX exporter
CSV Provider to parse the results of the benchmark
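The CSV provider can only consume the benchmark results after they are written to a CSV file. The following Python code is a minimal sketch of that parsing step. It makes two assumptions. First, Renaissance is assumed to print one "iteration N completed (X ms)" line per repetition, so check the pattern against your own log. Second, the column names TS, COMPONENT and elapsed are hypothetical and must match whatever your CSV telemetry instance is configured to read.

```python
import csv
import io
import re

# ASSUMED format of a Renaissance progress line, e.g.
# "... iteration 0 completed (4321.5 ms) ..."; verify it against your own log.
ITERATION = re.compile(r"iteration (\d+) completed \((\d+(?:\.\d+)?) ms\)")


def parse_iterations(log_text):
    """Return the duration in milliseconds of each completed iteration, in log order."""
    return [float(m.group(2)) for m in ITERATION.finditer(log_text)]


def to_csv(timestamp, component, elapsed_ms):
    """Render a header plus one data row; the column names are hypothetical."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["TS", "COMPONENT", "elapsed"])
    writer.writerow([timestamp, component, elapsed_ms])
    return buf.getvalue()


if __name__ == "__main__":
    log = "iteration 0 completed (5000.5 ms)\niteration 1 completed (4200.0 ms)\n"
    total = sum(parse_iterations(log))
    # "benchmark" and "elapsed" mirror the names used in the study's goal formula
    print(to_csv("2021-01-01 10:00:00", "benchmark", total), end="")
```

The component name "benchmark" and the metric name "elapsed" are chosen to match the benchmark.elapsed term used in the study goal.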
In the ~/renaissance folder of the benchmark instance, upload the template file launch_benchmark.sh.templ, which contains the script that runs the benchmark with the provided parameters and parses its results.
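The launch script's main job is to turn the JVM parameters chosen by Akamas into a java command line. The Python sketch below shows that mapping under a few assumptions. Heap and new-generation sizes are taken to be in megabytes. The GC names in GC_FLAGS are hypothetical category values. The jar file names are placeholders. The HotSpot flags themselves are standard (-Xms, -Xmx, -XX:NewSize, -XX:SurvivorRatio, -XX:MaxTenuringThreshold and the -XX:+Use...GC switches), and so is the JMX exporter's -javaagent:<jar>=<port>:<config> syntax. Port 9110 matches the jmx scrape target configured in prometheus.yml.

```python
import shlex

# Hypothetical jvm_gcType category values mapped to real HotSpot GC flags.
GC_FLAGS = {
    "Serial": "-XX:+UseSerialGC",
    "Parallel": "-XX:+UseParallelGC",
    "G1": "-XX:+UseG1GC",
}

# Parameter name -> HotSpot flag; sizes are ASSUMED to be expressed in megabytes.
FLAG_TEMPLATES = {
    "jvm_minHeapSize": "-Xms{}m",
    "jvm_maxHeapSize": "-Xmx{}m",
    "jvm_newSize": "-XX:NewSize={}m",
    "jvm_survivorRatio": "-XX:SurvivorRatio={}",
    "jvm_maxTenuringThreshold": "-XX:MaxTenuringThreshold={}",
}


def build_command(params, agent_jar, agent_port, agent_config, renaissance_jar):
    """Build the java argv for one PageRank run with the given JVM parameters."""
    flags = []
    if "jvm_gcType" in params:
        flags.append(GC_FLAGS[params["jvm_gcType"]])
    for name, template in FLAG_TEMPLATES.items():
        if name in params:
            flags.append(template.format(params[name]))
    # Attach the Prometheus JMX exporter as a Java agent: <jar>=<port>:<config file>
    flags.append(f"-javaagent:{agent_jar}={agent_port}:{agent_config}")
    return ["java", *flags, "-jar", renaissance_jar, "page-rank"]


if __name__ == "__main__":
    cmd = build_command(
        {"jvm_gcType": "G1", "jvm_maxHeapSize": 4096, "jvm_minHeapSize": 1024},
        agent_jar="jmx_prometheus_javaagent.jar",  # placeholder file name
        agent_port=9110,
        agent_config="config.yml",
        renaissance_jar="renaissance.jar",  # placeholder file name
    )
    print(shlex.join(cmd))
```

Returning an argument list instead of a single string avoids quoting problems. shlex.join is only used to print the list in a form you can paste into a shell.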
Create a component-ec2.yaml file like the following:
```yaml
name: instance
description: The ec2 instance the benchmark runs on
componentType: ec2
properties:
  hostname: renaissance.akamas.io
  sshPort: 22
  instance: ec2_instance
  username: ubuntu
  key: # SSH KEY
  ec2:
    region: us-east-2 # This is just a reference
```
Then create its resource by typing in your terminal:
The workflow in this example is composed of three main steps:
Update the instance type
Run the application benchmark
Stop the instance
To manage the instance, we integrate a very simple Ansible playbook into our workflow: the FileConfigurator operator replaces the parameters in the template file to generate the playbook that the Executor operator then runs, as explained in the Ansible page.
In detail:
Update the instance size
Generate the playbook file from the template
Update the instance using the playbook
Wait for the instance to be available
Run the application benchmark
Configure the benchmark Java launch script
Execute the launch script
Parse PageRank output to make it consumable by the CSV telemetry instance
Stop the instance
Configure the playbook to stop an instance with a specific instance id
Run the playbook to stop the instance
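At its core, the FileConfigurator step performs placeholder substitution over a template. The following Python code is a simplified, hypothetical re-implementation for illustration and is not the Akamas operator. It assumes placeholders of the form ${component.parameter}, as in the --instance-type line of the Ansible playbook template. Ansible's own {{ ... }} expressions are left untouched so that Ansible can resolve them later.

```python
import re

# Matches placeholders such as ${ec2.aws_ec2_instance_type} (assumed syntax).
PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_-]+)\.([A-Za-z0-9_]+)\}")


def render(template, values):
    """Replace every ${component.parameter} placeholder with its value.

    `values` maps (component, parameter) tuples to values; a missing value
    raises KeyError instead of silently leaving the placeholder in place.
    """
    def substitute(match):
        key = (match.group(1), match.group(2))
        if key not in values:
            raise KeyError(f"no value for {key[0]}.{key[1]}")
        return str(values[key])

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    line = ('--instance-id "{{ ec2.instances[0].instance_id }}" '
            '--instance-type "${ec2.aws_ec2_instance_type}.${ec2.aws_ec2_instance_size}"')
    values = {("ec2", "aws_ec2_instance_type"): "c5",
              ("ec2", "aws_ec2_instance_size"): "2xlarge"}
    # The Jinja expression is left for Ansible; only ${...} placeholders change.
    print(render(line, values))
```

Raising an error on an unknown placeholder is a deliberate choice in this sketch. It is meant to surface a misspelled parameter immediately, rather than producing a playbook that fails later on the remote host.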
The following is the template of the Ansible playbook:
```yaml
# Change instance type, requires AWS CLI
- name: Resize the instance
  hosts: localhost
  gather_facts: no
  connection: local
  tasks:
    - name: save instance info
      ec2_instance_info:
        filters:
          "tag:Name": <your-instance-name>
      register: ec2

    - name: Stop the instance
      ec2:
        region: <your-aws-region>
        state: stopped
        instance_ids:
          - "{{ ec2.instances[0].instance_id }}"
        instance_type: "{{ ec2.instances[0].instance_type }}"
        wait: True

    - name: Change the instances ec2 type
      shell: >
        aws ec2 modify-instance-attribute
        --instance-id "{{ ec2.instances[0].instance_id }}"
        --instance-type "${ec2.aws_ec2_instance_type}.${ec2.aws_ec2_instance_size}"
      delegate_to: localhost

    - name: restart the instance
      ec2:
        region: <your-aws-region>
        state: running
        instance_ids:
          - "{{ ec2.instances[0].instance_id }}"
        wait: True
      register: ec2

    - name: wait for SSH to come up
      wait_for:
        host: "{{ item.public_dns_name }}"
        port: 22
        delay: 60
        timeout: 320
        state: started
      with_items: "{{ ec2.instances }}"
```
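The playbook's last task polls the restarted instance until its SSH port accepts connections. The Python sketch below approximates that polling with plain sockets. It illustrates the idea only and is not Ansible's wait_for implementation. Starting the timeout only after the initial delay is an assumption of this sketch.

```python
import socket
import time


def wait_for_port(host, port, delay=0.0, timeout=320.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds.

    Sleeps `delay` seconds first, then retries every `interval` seconds.
    Returns True on success, False once `timeout` seconds (counted after
    the initial delay, an assumption of this sketch) have passed.
    """
    time.sleep(delay)
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable or timed out: retry until the deadline
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port(hostname, 22, delay=60, timeout=320) mirrors the values used in the playbook task.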
The following is the workflow configuration file:
```yaml
name: Pagerank AWS optimization
tasks:
  # Creating the EC2 instance
  - name: Configure provisioning
    operator: FileConfigurator
    arguments:
      sourcePath: /home/ubuntu/ansible/resize.yaml.templ
      targetPath: /home/ubuntu/ansible/resize.yaml
      host:
        hostname: bastion.akamas.io
        username: ubuntu
        key: # SSH KEY

  - name: Execute Provisioning
    operator: Executor
    arguments:
      # Run the playbook generated at the targetPath above
      command: ansible-playbook /home/ubuntu/ansible/resize.yaml
      host:
        hostname: bastion.akamas.io
        username: ubuntu
        key: # SSH KEY

  # Waiting for the instance to come up and set up its DNS
  - name: Pause
    operator: Sleep
    arguments:
      seconds: 120

  # Running the benchmark
  - name: Configure Benchmark
    operator: FileConfigurator
    arguments:
      source:
        hostname: renaissance.akamas.io
        username: ubuntu
        path: /home/ubuntu/renaissance/launch_benchmark.sh.templ
        key: # SSH KEY
      target:
        hostname: renaissance.akamas.io
        username: ubuntu
        path: /home/ubuntu/renaissance/launch_benchmark.sh
        key: # SSH KEY

  - name: Launch Benchmark
    operator: Executor
    arguments:
      command: bash /home/ubuntu/renaissance/launch_benchmark.sh
      host:
        hostname: renaissance.akamas.io
        username: ubuntu
        key: # SSH KEY
```

Create the workflow resource by typing in your terminal:
Telemetry
If you have not installed the Prometheus telemetry provider or the CSV telemetry provider yet, see the Prometheus provider and CSV provider pages for installation instructions.
Prometheus
Prometheus gathers JVM execution metrics through the JMX exporter: download the JMX exporter Java agent from the Prometheus jmx_exporter project, then edit or create the following two files:
The prometheus.yml file, located in your Prometheus folder:
```yaml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

# Scrape configurations: Prometheus itself and the JMX exporter agent.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  - job_name: jmx
    static_configs:
      - targets: ["localhost:9110"]
    relabel_configs:
      - source_labels: ["__address__"]
        regex: "(.*):.*"
        target_label: instance
        replacement: jmx_instanc
```
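The relabel_configs entry of the jmx job rewrites the instance label. Prometheus joins the values of source_labels and matches the result against the fully anchored regex. On a match, it writes replacement into target_label. The replacement configured here is a literal rather than a $1 reference, so every series scraped by this job gets the same instance value. The Python sketch below approximates this default replace action. It uses Python's re instead of Go's RE2, which behaves differently in some edge cases, so treat it as a way to reason about a rule rather than an exact emulation.

```python
import re


def relabel_replace(labels, source_labels, regex, target_label, replacement, separator=";"):
    """Approximate Prometheus' default `replace` relabeling action.

    The source label values are joined with `separator`, matched against the
    fully anchored `regex`, and on a match `target_label` is set to
    `replacement` with $1 / ${1} references expanded. Without a match the
    labels are returned unchanged.
    """
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)
    # Translate Prometheus-style $1 / ${1} into Python's \g<1> group syntax.
    template = re.sub(r"\$\{(\d+)\}|\$(\d+)",
                      lambda m: "\\g<" + (m.group(1) or m.group(2)) + ">",
                      replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


if __name__ == "__main__":
    addr = {"__address__": "localhost:9110"}
    print(relabel_replace(addr, ["__address__"], "(.*):.*", "instance", "$1"))
    print(relabel_replace(addr, ["__address__"], "(.*):.*", "instance", "jmx_instanc"))
```

With "$1", the instance label would keep the host part of the target address. With the literal used in prometheus.yml, it is fixed for the whole job.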
The config.yml file you have to create in the ~/renaissance folder:
```yaml
startDelaySeconds: 0
username:
password:
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
# whitelistObjectNames tells the exporter to export only the relevant Java metrics
whitelistObjectNames:
  - "java.lang:*"
  - "jvm:*"
```
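In this file, whitelistObjectNames restricts which MBeans the exporter reads. A JMX ObjectName pattern such as java.lang:* matches every MBean in the java.lang domain, whatever its key properties. The Python sketch below approximates this check with shell-style wildcards via fnmatch. That approximation ignores some JMX pattern rules, such as property-list wildcards combined with explicit keys, so treat it as a rough illustration of which names pass the filter.

```python
from fnmatch import fnmatchcase

# The patterns listed under whitelistObjectNames in config.yml.
WHITELIST = ["java.lang:*", "jvm:*"]


def is_exported(object_name, patterns=WHITELIST):
    """Rough check: does this MBean ObjectName match any whitelisted pattern?

    fnmatch-style wildcards only approximate JMX ObjectName patterns.
    """
    return any(fnmatchcase(object_name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for name in ["java.lang:type=Memory",
                 "java.lang:type=GarbageCollector,name=G1 Young Generation",
                 "java.nio:type=BufferPool,name=direct"]:
        print(name, is_exported(name))
```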
Now you can create a prometheus-instance.yaml file:
Then create the resource by typing in your terminal:
akamas create telemetry-instance renaissance
Study
Here we provide a reference study for AWS.
As anticipated, the goal of this study is to optimize a sample Java application: the PageRank benchmark from the Renaissance benchmark suite by Oracle Labs.
Our goal is simple: minimize the product of the benchmark execution time and the instance price, that is, find the most cost-effective instance for our application.
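Each configuration is scored as benchmark.elapsed * aws.aws_ec2_price. If you assume the elapsed time is in seconds and the price is in dollars per hour, dividing the score by 3600 gives the dollar cost of one run. Because that factor is constant, it does not change which configuration wins. The sketch below uses made-up numbers, not real EC2 prices, for three hypothetical configurations. It shows that a slower but cheaper instance can still minimize the objective.

```python
def cost_score(elapsed_seconds, price_per_hour):
    """The study objective: execution time multiplied by the instance price."""
    return elapsed_seconds * price_per_hour


def run_cost_dollars(elapsed_seconds, price_per_hour):
    """Same quantity in dollars per run, assuming seconds and $/hour."""
    return cost_score(elapsed_seconds, price_per_hour) / 3600


def best_configuration(results):
    """Return the key of `results` (name -> (seconds, $/hour)) with the lowest score."""
    return min(results, key=lambda name: cost_score(*results[name]))


if __name__ == "__main__":
    # Made-up measurements for three hypothetical configurations.
    results = {"A": (120.0, 1.0), "B": (200.0, 0.5), "C": (90.0, 2.0)}
    for name, (secs, price) in results.items():
        print(name, cost_score(secs, price))
    print("best:", best_configuration(results))
```

Here B is the slowest of the three configurations, but it costs half as much per hour as A, so it yields the lowest product.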
Create a study.yaml file with the following content:
```yaml
name: aws
description: Tweaking aws and the JVM to optimize the page-rank application.
system: renaissance

goal:
  objective: minimize
  function:
    formula: benchmark.elapsed * aws.aws_ec2_price

workflow: workflow-aws

parametersSelection:
  - name: aws.aws_ec2_instance_type
    categories: [c5, c5d, c5a, m5, m5d, m5a, r5, r5d, r5a]
  - name: aws.aws_ec2_instance_size
    categories: [large, xlarge, 2xlarge, 4xlarge]
  - name: jvm.jvm_gcType
  - name: jvm.jvm_newSize
  - name: jvm.jvm_maxHeapSize
  - name: jvm.jvm_minHeapSize
  - name: jvm.jvm_survivorRatio
  - name: jvm.jvm_maxTenuringThreshold

steps:
  - name: baseline
    type: baseline
    numberOfTrials: 2
    values:
      aws.aws_ec2_instance_type: c5
      aws.aws_ec2_instance_size: 2xlarge
      jvm.jvm_gcType: G1

  - name: optimize
    type: optimize
    numberOfExperiments: 60
```
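The two categorical AWS parameters combine into 9 × 4 = 36 candidate instance types. The playbook template joins them as <type>.<size>, so the baseline, for example, runs on c5.2xlarge. The sketch below enumerates these candidates. Adding the six JVM parameters multiplies the space further, so the 60 experiments of the optimize step can explore only a subset of it.

```python
from itertools import product

# Categories from the study's parametersSelection.
INSTANCE_TYPES = ["c5", "c5d", "c5a", "m5", "m5d", "m5a", "r5", "r5d", "r5a"]
INSTANCE_SIZES = ["large", "xlarge", "2xlarge", "4xlarge"]


def candidate_instances(types=INSTANCE_TYPES, sizes=INSTANCE_SIZES):
    """List every <type>.<size> combination, as composed by the playbook template."""
    return [f"{t}.{s}" for t, s in product(types, sizes)]


if __name__ == "__main__":
    candidates = candidate_instances()
    print(len(candidates), candidates[:4])
```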
Then create the corresponding Akamas resource and start the study: