Optimizing an Oracle Database server instance
In this example, we are going to tune the initialization parameters of an Oracle Database server instance in order to maximize its throughput while stressed by a load generator.
For the workload, we’ll use the OLTPBench's implementation of TPC-C, a popular transaction processing benchmarking suite, while to extract the metrics we are going to leverage the Oracle Prometheus exporter.
Environment setup
Environment
For the purpose of this experiment we are going to use two dedicated machines:
oraxe.mycompany.com, hosting a single Oracle 18c XE instance running inside a docker container (provisioned using the scripts on the official Oracle GitHub repository)
oltpbench.mycompany.com, that generates the workload using OLTPBench's TPC-C and will host the OracleDB Prometheus exporter instance
We assume to be working with Linux hosts
Prometheus and exporters
Install the OracleDB Prometheus exporter
The OracleDB Prometheus exporter publishes as metrics the results of the queries defined in the configuration file. In our case, we’ll use it to extract valuable performance metrics from Oracle’s Dynamic Performance (V$) Views.
We can spin up the exporter using the official Docker image using the following command, where cust-metrics.toml
is our custom metrics file:
docker run -d --name orabench_exporter --restart always \
-p 9161:9161 \
-v ~/oracledb_exporter/cust-metrics.toml:/cust-metrics.toml \
-e CUSTOM_METRICS=/cust-metrics.toml \
-e DATA_SOURCE_NAME='system/passwd@//oraxe.mycompany.com:1521/XE' \
iamseth/oracledb_exporter
The exporter will publish the metrics on the port 9161
.
Here’s the example metrics file used to run the exporter:
[[metric]]
context= "memory"
labels= [ "component" ]
metricsdesc= { size="Component memory extracted from v$memory_dynamic_components in Oracle." }
request = '''
SELECT component, current_size as "size"
FROM V$MEMORY_DYNAMIC_COMPONENTS
UNION
SELECT name, bytes as "size"
FROM V$SGAINFO
WHERE name in ('Free SGA Memory Available', 'Redo Buffers', 'Maximum SGA Size')
'''
[[metric]]
context = "activity"
metricsdesc = { value="Generic counter metric from v$sysstat view in Oracle." }
fieldtoappend = "name"
request = '''
SELECT name, value
FROM V$SYSSTAT WHERE name IN (
'execute count',
'user commits', 'user rollbacks',
'db block gets from cache', 'consistent gets from cache', 'physical reads cache', /* CACHE */
'redo log space requests'
)
'''
[[metric]]
context = "system_event"
labels = [ "event", "wait_class" ]
request = '''
SELECT
event, wait_class,
total_waits, time_waited
FROM V$SYSTEM_EVENT
'''
[metric.metricsdesc]
total_waits= "Total number of waits for the event as per V$SYSTEM_EVENT in Oracle."
time_waited= "Total time waited for the event (in hundredths of seconds) as per V$SYSTEM_EVENT in Oracle."
Install and configure Prometheus
You can check how to configure Prometheus here; by default, it will run on port 9090.
To configure the OracleDB exporter you can add the following snippet to the configuration file:
scrape_configs:
- job_name: oraxe-exporter
scrape_interval: 15s
static_configs:
- targets: [oltpbench.mycompany.com:9161]
relabel_configs:
- source_labels: [__address__]
regex: (.*)
target_label: instance
replacement: oraxe
Optimization setup
System
To model the system composed of the tuned database and the workload generator we need two different components:
An
oracle
component that represents the Oracle Database instance and maps directly to oraxe.mycompany.com.A
tpcc
component that represents the TPC-C workload from the OLTPBench suite and maps to oltpbench.mycompany.com.
For the tpcc
component, we’ll need first to define some custom metrics and a new component-type. The following is the definition of the metrics (tpcc-metrics.yaml
):
metrics:
- name: throughput
description: throughput
unit: requests/s
- name: resp_time
description: resp_time
unit: milliseconds
- name: resp_time_min
description: resp_time_min
unit: milliseconds
- name: resp_time90th
description: resp_time90th
unit: milliseconds
- name: resp_time_max
description: resp_time_max
unit: milliseconds
The following is the definition of the new component-type (tpcc-ctype.yaml
):
name: TPCC Benchmarck
description: OLTP TPCC Benchmarck
parameters: []
metrics:
- name: throughput
- name: resp_time
- name: resp_time_min
- name: resp_time90th
- name: resp_time_max
We can then create the new component type running the commands:
akamas create metrics tpcc-metrics.yaml
akamas create component-type tpcc-ctype.yaml
As a next step, we can proceed then with the definition of our system (system.yaml
):
name: oracle system
description: oracle system
Here’s the definition of our oracle
component (oracle.yaml
):
name: oracle
description: Oracle DB
componentType: Oracle Database 18c
properties:
instance: oraxe
connection:
user: system
password: passwd
dsn: oraxe.mycompany.com:1521/XE
hostname: oraxe.mycompany.com # needed to run docker restart
username: ubuntu
sshPort: 22
key: rsa_key_file
Here’s the definition of the tpcc
component (tpcc.yaml
):
name: tpcc
description: OLTP TPC-C load benchmarck
componentType: TPC-C Benchmarck
properties:
hostname: oltpbench.mycompany.com
username: ubuntu
sshPort: 22
key: rsa_key_file
We can create the system by running:
akamas create system system.yaml
We can then create the components by running:
akamas create component oracle.yaml 'oracle system'
akamas create component tpcc.yaml 'oracle system'
Telemetry
Prometheus
Since we are using Prometheus to extract the database metrics we can leverage the Prometheus provider, which already includes the queries needed for the Oracle metrics we need. To use the Prometheus provider we need to define a telemetry instance (prom.yaml
):
provider: Prometheus
config:
address: prometheus
port: 9090
We can now create the telemetry instance and attach it to our system by running:
akamas create telemetry-instance prom.yaml 'oracle system'
CSV
Other than the telemetry of the Oracle instance, we need also the metrics in the output CSVs from the TPC-C workload runs. To ingest these metrics we can leverage the CSV Provider, defining the following telemetry instance (csv.yaml
):
provider: csv
config:
address: oltpbench.mycompany.com
port: 22
username: ubuntu
protocol: scp
authType: key
auth: rsa_key_file
remoteFilePattern: /home/ubuntu/oltpbench/results/output.csv
componentColumn: component
timestampColumn: ts
timestampFormat: yyyy-MM-dd HH:mm:ss
metrics:
- metric: throughput
datasourceMetric: throughput
staticLabels: {}
- metric: resp_time
datasourceMetric: avg_lat
staticLabels: {}
- metric: resp_time_min
datasourceMetric: min_lat
staticLabels: {}
- metric: resp_time90th
datasourceMetric: 90th_lat
staticLabels: {}
- metric: resp_time_max
datasourceMetric: max_lat
staticLabels: {}
We can create the telemetry instance and attach it to our system by running:
akamas create telemetry-instance csv.yaml 'oracle system'
Workflow
Remove previous executions' data
Using an Executor operator we run a command to clean the results folder that may contain files from previous executions
name: Clean results
operator: Executor
arguments:
command: rm -f ~/oltpbench/results/*
component: tpcc
Configure the Oracle instance
We define a task that uses the OracleConfigurator operator to update the Oracle initialization parameters:
name: Update parameters
operator: OracleConfigurator
arguments:
component: oracle
Restart the instance
We define a task that uses the Executor operator that reboots the Oracle container for the parameters that need a restart to take effect:
name: Restart Oracle container
operator: Executor
arguments:
command: docker restart oraxe
component: oracle
Run the workload
We define a task that uses the Executor operator to launch the TPC-C benchmark against the Oracle instance:
name: Execute load test
operator: Executor
arguments:
command: cd ~/oltpbench ; ./oltpbenchmark --bench tpcc --config tpcc_conf.xml --execute=true -s 5 --output out
component: tpcc
Prepare test results
We define a workflow task that runs a script that parses the TPC-C output files and generates a file compatible with the CSV Provider:
name: Parse TPC-C results
operator: Executor
arguments:
command: cd ~/oltpbench ; ./tpcc_parse_csv.sh
component: tpcc
Where tpcc_parse_csv.sh
is the following script:
#!/bin/bash
OUTFILE=output.csv
COMP_NAME=tpcc
BASETS=`tail -n+2 results/out.csv | head -n1 | cut -d',' -f3`
echo 'component,ts,throughput,avg_lat,min_lat,90th_lat,max_lat' > $OUTFILE
awk -F, "BEGIN{OFS=\",\"} NR>1 {\$1=strftime(\"%F %T\", ${BASETS}+\$1); print \"${COMP_NAME}\",\$0}" < results/out.res | cut -d',' -f1-5,9,12 >> $OUTFILE
Complete workflow
By putting together all the tasks defined above we come up with the following workflow definition (workflow.yaml
):
name: oracle workflow
tasks:
- name: Clean results
operator: Executor
arguments:
command: rm -f ~/oltpbench/results/*
component: tpcc
- name: Update parameters
operator: OracleConfigurator
arguments:
component: oracle
- name: Restart Oracle container
operator: Executor
arguments:
command: docker restart oraxe
component: oracle
- name: Execute load test
operator: Executor
arguments:
command: cd ~/oltpbench ; ./oltpbenchmark --bench tpcc --config tpcc_conf.xml --execute=true -s 5 --output out
component: tpcc
- name: Parse TPC-C results
operator: Executor
arguments:
command: cd ~/oltpbench ; ./tpcc_parse_csv.sh
component: tpcc
We can create the workflow by running:
akamas create workflow workflow.yaml
Study
The objective of this study is to maximize the transaction throughput while stressed by the TPC-C load generator, and to achieve this goal the study will tune the size of the most important areas of the Oracle instance.
Goal
Here’s the definition of the goal of our study, which is to maximize the tpcc.throughput
metric:
goal:
objective: maximize
function:
formula: tpcc.throughput
Windowing
We define a window to consider only the data points after the ramp-up time of the load test:
windowing:
type: trim
trim: [4m, 1m]
task: Execute load test
Parameters to optimize
For this study, we are trying to achieve our goal by tuning the size of several areas in the memory of the database instance. In particular, we will tune the overall size of the Program Global Area (containing the work area of the active sessions) and the size of the components of the Shared Global Area.
The domains are configured to explore, for each parameter, the values around the default values.
parametersSelection:
- name: oracle.pga_aggregate_target
domain: [1128, 4512]
- name: oracle.db_cache_size
domain: [512, 6144]
- name: oracle.java_pool_size
domain: [1, 1024]
- name: oracle.large_pool_size
domain: [1, 256]
- name: oracle.log_buffer
domain: [2, 256]
- name: oracle.shared_pool_size
domain: [128, 1024]
- name: oracle.streams_pool_size
domain: [1, 1024]
Constraints
The following constraint allows the study to explore different size configurations without exceeding the maximum overall memory available for the instance:
parameterConstraints:
- name: Cap total memory to 10G
formula: oracle.db_cache_size + oracle.java_pool_size + oracle.large_pool_size + oracle.log_buffer + oracle.shared_pool_size + oracle.streams_pool_size + oracle.pga_aggregate_target < 10240
Steps
We are going to add to our study two steps:
A baseline step, in which we configure the default values for the memory parameters as discovered from previous manual executions.
An optimization step, where we perform 200 experiments to search the set of parameters that best satisfies our goal.
The baseline step contains some additional parameters (oracle.memory_target
, oracle.sga_target
) that are required by Oracle in order to disable the automatic management of the SGA components.
Here’s what these steps look like:
steps:
- name: baseline
type: baseline
values:
oracle.pga_aggregate_target: 1128
oracle.db_cache_size: 2496
oracle.java_pool_size: 16
oracle.large_pool_size: 16
oracle.log_buffer: 13
oracle.shared_pool_size: 640
oracle.streams_pool_size: 0
oracle.memory_target: 0
oracle.sga_target: 0
- name: optimization
type: optimize
numberOfExperiments: 200
maxFailedExperiments: 200
Complete study
Here’s the study definition (study.yaml
) for optimizing the Oracle instance:
name: Oracle: tune memory
description: Tune memory minimizing response
system: oracle system
workflow: oracle workflow
goal:
objective: maximize
function:
formula: throughput
variables:
throughput:
metric: tpcc.throughput
windowing:
type: trim
trim: [4m, 1m]
task: Execute load test
parametersSelection:
- name: oracle.pga_aggregate_target
domain: [1128, 4512]
- name: oracle.db_cache_size
domain: [1024, 6144]
- name: oracle.java_pool_size
domain: [1, 1024]
- name: oracle.large_pool_size
domain: [1, 256]
- name: oracle.log_buffer
domain: [2, 256]
- name: oracle.shared_pool_size
domain: [128, 1024]
- name: oracle.streams_pool_size
domain: [1, 1024]
parameterConstraints:
- name: Cap total memory to 10G
formula: oracle.db_cache_size + oracle.java_pool_size + oracle.large_pool_size + oracle.log_buffer + oracle.shared_pool_size + oracle.streams_pool_size + oracle.pga_aggregate_target < 10240
steps:
- name: baseline
type: baseline
values:
oracle.pga_aggregate_target: 1128
oracle.db_cache_size: 2496
oracle.java_pool_size: 16
oracle.large_pool_size: 16
oracle.log_buffer: 13
oracle.shared_pool_size: 640
oracle.streams_pool_size: 0
oracle.memory_target: 0
oracle.sga_target: 0
- name: optimization
type: optimize
numberOfExperiments: 200
maxFailedExperiments: 200
You can create the study by running:
akamas create study study.yaml
You can then start it by running:
akamas start study 'Oracle: tune memory'
Last updated
Was this helpful?