In this example, we are going to tune the initialization parameters of an Oracle Database server instance in order to maximize its throughput while stressed by a load generator.
For the workload, we’ll use the , a popular transaction processing benchmarking suite, while to extract the metrics we are going to leverage the .
Environment setup
Environment
For the purpose of this experiment we are going to use two dedicated machines:
, hosting a single Oracle 18c XE instance running inside a docker container (provisioned using the scripts on the )
, that generates the workload using and will host the instance
We assume to be working with Linux hosts
Prometheus and exporters
Install the OracleDB Prometheus exporter
The OracleDB Prometheus exporter publishes as metrics the results of the queries defined in the configuration file. In our case, we’ll use it to extract valuable performance metrics from Oracle’s Dynamic Performance (V$) Views.
We can spin up the exporter using the using the following command, where cust-metrics.toml is our custom metrics file:
The exporter will publish the metrics on port 9161.
Here’s the example metrics file used to run the exporter:
[[metric]]
context= "memory"
labels= [ "component" ]
metricsdesc= { size="Component memory extracted from v$memory_dynamic_components in Oracle." }
request = '''
SELECT component, current_size as "size"
FROM V$MEMORY_DYNAMIC_COMPONENTS
UNION
SELECT name, bytes as "size"
FROM V$SGAINFO
WHERE name in ('Free SGA Memory Available', 'Redo Buffers', 'Maximum SGA Size')
'''
[[metric]]
context = "activity"
metricsdesc = { value="Generic counter metric from v$sysstat view in Oracle." }
fieldtoappend = "name"
request = '''
SELECT name, value
FROM V$SYSSTAT WHERE name IN (
'execute count',
'user commits', 'user rollbacks',
'db block gets from cache', 'consistent gets from cache', 'physical reads cache', /* CACHE */
'redo log space requests'
)
'''
[[metric]]
context = "system_event"
labels = [ "event", "wait_class" ]
request = '''
SELECT
event, wait_class,
total_waits, time_waited
FROM V$SYSTEM_EVENT
'''
[metric.metricsdesc]
total_waits= "Total number of waits for the event as per V$SYSTEM_EVENT in Oracle."
time_waited= "Total time waited for the event (in hundredths of seconds) as per V$SYSTEM_EVENT in Oracle."
Install and configure Prometheus
In order to configure the OracleDB exporter you can add the following snippet to the configuration file:
In order to model the system composed of the tuned database and the workload generator we need two different components:
For the tpcc component, we’ll need first to define some custom metrics and a new component-type. The following is the definition of the metrics (tpcc-metrics.yaml):
The objective of this study is to maximize the transaction throughput while stressed by the TPC-C load generator, and to achieve this goal the study will tune the size of the most important areas of the Oracle instance.
Goal
Here’s the definition of the goal for our study, which is to maximize the tpcc.throughput metric:
We define a window in order to consider only the data points after the ramp-up time of the load test:
windowing:
type: trim
trim: [4m, 1m]
task: Execute load test
Parameters to optimize
For this study, we are trying to achieve our goal by tuning the size of several areas in the memory of the database instance. In particular, we will tune the overall size of the Program Global Area (containing the work area of the active sessions) and the size of the components of the Shared Global Area.
The domains are configured to explore, for each parameter, the values around the default values.
The following constraint allows the study to explore different size configurations without exceeding the maximum overall memory available for the instance:
parameterConstraints:
- name: Cap total memory to 10G
formula: oracle.db_cache_size + oracle.java_pool_size + oracle.large_pool_size + oracle.log_buffer + oracle.shared_pool_size + oracle.streams_pool_size + oracle.pga_aggregate_target < 10240
Steps
We are going to add to our study two steps:
A baseline step, in which we configure the default values for the memory parameters as discovered from previous manual executions.
An optimization step, where we perform 200 experiments to search the set of parameters that best satisfies our goal.
The baseline step contains some additional parameters (oracle.memory_target, oracle.sga_target) that are required by Oracle in order to disable the automatic management of the SGA components.
You can check how to configure Prometheus ; by default, it will run on port 9090.
An oracle component that represents the Oracle Database instance and maps directly to .
A tpcc component that represents the TPC-C workload from the OLTPBench suite and maps to .
Since we are using Prometheus to extract the database metrics we can leverage the , which already includes the queries needed for the Oracle metrics we need. To use the Prometheus provider we need to define a telemetry instance (prom.yaml):
Other than the telemetry of the Oracle instance, we need also the metrics in the output CSVs from the TPC-C workload runs. To ingest these metrics we can leverage the , defining the following telemetry instance (csv.yaml):
Using an we run a command to clean the results folder that may contain files from previous executions
We define a task that uses the to update the Oracle initialization parameters:
We define a task that uses the that reboots the Oracle container for the parameters that need a restart to take effect:
We define a task that uses the to launch the TPC-C benchmark against the Oracle instance:
We define a workflow task that runs a script that parses the TPC-C output files and generates a file compatible with the :