# Optimizing an Oracle Database server instance In this example, we are going to tune the initialization parameters of an Oracle Database server instance in order to maximize its throughput while stressed by a load generator. For the workload, we’ll use the [OLTPBench's implementation of TPC-C](https://github.com/oltpbenchmark/oltpbench), a popular transaction processing benchmarking suite, while to extract the metrics we are going to leverage the [Oracle Prometheus exporter](https://github.com/iamseth/oracledb_exporter). ## Environment setup ### Environment For the purpose of this experiment we are going to use two dedicated machines: * [oraxe.mycompany.com](http://ycsb.mycompany.com/), hosting a single Oracle 18c XE instance running inside a docker container (provisioned using the scripts on the [official Oracle GitHub repository](https://github.com/oracle/docker-images/tree/master/OracleDatabase/SingleInstance)) * [oltpbench.mycompany.com](http://oltpbench.mycompany.com/), that generates the workload using [OLTPBench's TPC-C](https://github.com/oltpbenchmark/oltpbench) and will host the [OracleDB Prometheus exporter](https://github.com/iamseth/oracledb_exporter) instance We assume to be working with Linux hosts ### Prometheus and exporters #### Install the OracleDB Prometheus exporter The OracleDB Prometheus exporter publishes as metrics the results of the queries defined in the configuration file. In our case, we’ll use it to extract valuable performance metrics from Oracle’s Dynamic Performance (V$) Views. We can spin up the exporter using the [official Docker image](https://github.com/iamseth/oracledb_exporter) using the following command, where `cust-metrics.toml` is our custom metrics file: ```bash docker run -d --name orabench_exporter --restart always \ -p 9161:9161 \ -v ~/oracledb_exporter/cust-metrics.toml:/cust-metrics.toml \ -e CUSTOM_METRICS=/cust-metrics.toml \ -e DATA_SOURCE_NAME='system/passwd@//oraxe.mycompany.com:1521/XE' \ iamseth/oracledb_exporter ``` The exporter will publish the metrics on the port `9161`. Here’s the example metrics file used to run the exporter: ```toml [[metric]] context= "memory" labels= [ "component" ] metricsdesc= { size="Component memory extracted from v$memory_dynamic_components in Oracle." } request = ''' SELECT component, current_size as "size" FROM V$MEMORY_DYNAMIC_COMPONENTS UNION SELECT name, bytes as "size" FROM V$SGAINFO WHERE name in ('Free SGA Memory Available', 'Redo Buffers', 'Maximum SGA Size') ''' [[metric]] context = "activity" metricsdesc = { value="Generic counter metric from v$sysstat view in Oracle." } fieldtoappend = "name" request = ''' SELECT name, value FROM V$SYSSTAT WHERE name IN ( 'execute count', 'user commits', 'user rollbacks', 'db block gets from cache', 'consistent gets from cache', 'physical reads cache', /* CACHE */ 'redo log space requests' ) ''' [[metric]] context = "system_event" labels = [ "event", "wait_class" ] request = ''' SELECT event, wait_class, total_waits, time_waited FROM V$SYSTEM_EVENT ''' [metric.metricsdesc] total_waits= "Total number of waits for the event as per V$SYSTEM_EVENT in Oracle." time_waited= "Total time waited for the event (in hundredths of seconds) as per V$SYSTEM_EVENT in Oracle." ``` #### Install and configure Prometheus You can check how to configure Prometheus [here](https://prometheus.io/docs/introduction/first_steps/); by default, it will run on port 9090. To configure the OracleDB exporter you can add the following snippet to the configuration file: ```yaml scrape_configs: - job_name: oraxe-exporter scrape_interval: 15s static_configs: - targets: [oltpbench.mycompany.com:9161] relabel_configs: - source_labels: [__address__] regex: (.*) target_label: instance replacement: oraxe ``` ## Optimization setup ## System To model the system composed of the tuned database and the workload generator we need two different components: * An `oracle` component that represents the Oracle Database instance and maps directly to [oraxe.mycompany.com](http://oraxe.mycompany.com/). * A `tpcc` component that represents the TPC-C workload from the OLTPBench suite and maps to [oltpbench.mycompany.com](http://oltpbench.mycompany.com/). For the `tpcc` component, we’ll need first to define some custom metrics and a new component-type. The following is the definition of the metrics (`tpcc-metrics.yaml`): ```yaml metrics: - name: throughput description: throughput unit: requests/s - name: resp_time description: resp_time unit: milliseconds - name: resp_time_min description: resp_time_min unit: milliseconds - name: resp_time90th description: resp_time90th unit: milliseconds - name: resp_time_max description: resp_time_max unit: milliseconds ``` The following is the definition of the new component-type (`tpcc-ctype.yaml`): ```yaml name: TPCC Benchmarck description: OLTP TPCC Benchmarck parameters: [] metrics: - name: throughput - name: resp_time - name: resp_time_min - name: resp_time90th - name: resp_time_max ``` We can then create the new component type running the commands: ```bash akamas create metrics tpcc-metrics.yaml akamas create component-type tpcc-ctype.yaml ``` As a next step, we can proceed then with the definition of our system (`system.yaml`): ```yaml name: oracle system description: oracle system ``` Here’s the definition of our `oracle` component (`oracle.yaml`): ```yaml name: oracle description: Oracle DB componentType: Oracle Database 18c properties: instance: oraxe connection: user: system password: passwd dsn: oraxe.mycompany.com:1521/XE hostname: oraxe.mycompany.com # needed to run docker restart username: ubuntu sshPort: 22 key: rsa_key_file ``` Here’s the definition of the `tpcc` component (`tpcc.yaml`): ```yaml name: tpcc description: OLTP TPC-C load benchmarck componentType: TPC-C Benchmarck properties: hostname: oltpbench.mycompany.com username: ubuntu sshPort: 22 key: rsa_key_file ``` We can create the system by running: ```bash akamas create system system.yaml ``` We can then create the components by running: ```bash akamas create component oracle.yaml 'oracle system' akamas create component tpcc.yaml 'oracle system' ``` ## Telemetry ### Prometheus Since we are using Prometheus to extract the database metrics we can leverage the [Prometheus provider](https://docs.akamas.io/akamas-docs/3.6/integrating/integrating-telemetry-providers/prometheus-provider), which already includes the queries needed for the Oracle metrics we need. To use the Prometheus provider we need to define a telemetry instance (`prom.yaml`): ```yaml provider: Prometheus config: address: prometheus port: 9090 ``` We can now create the telemetry instance and attach it to our system by running: ```bash akamas create telemetry-instance prom.yaml 'oracle system' ``` ### CSV Other than the telemetry of the Oracle instance, we need also the metrics in the output CSVs from the TPC-C workload runs. To ingest these metrics we can leverage the [CSV Provider](https://docs.akamas.io/akamas-docs/3.6/integrating/integrating-telemetry-providers/csv-provider), defining the following telemetry instance (`csv.yaml`): ```yaml provider: csv config: address: oltpbench.mycompany.com port: 22 username: ubuntu protocol: scp authType: key auth: rsa_key_file remoteFilePattern: /home/ubuntu/oltpbench/results/output.csv componentColumn: component timestampColumn: ts timestampFormat: yyyy-MM-dd HH:mm:ss metrics: - metric: throughput datasourceMetric: throughput staticLabels: {} - metric: resp_time datasourceMetric: avg_lat staticLabels: {} - metric: resp_time_min datasourceMetric: min_lat staticLabels: {} - metric: resp_time90th datasourceMetric: 90th_lat staticLabels: {} - metric: resp_time_max datasourceMetric: max_lat staticLabels: {} ``` We can create the telemetry instance and attach it to our system by running: ```bash akamas create telemetry-instance csv.yaml 'oracle system' ``` ### Workflow #### Remove previous executions' data Using an [Executor operator](https://docs.akamas.io/akamas-docs/3.6/reference/workflow-operators/executor-operator) we run a command to clean the results folder that may contain files from previous executions ```yaml name: Clean results operator: Executor arguments: command: rm -f ~/oltpbench/results/* component: tpcc ``` #### Configure the Oracle instance We define a task that uses the [OracleConfigurator operator](https://docs.akamas.io/akamas-docs/3.6/reference/workflow-operators/oracleconfigurator-operator) to update the Oracle initialization parameters: ```yaml name: Update parameters operator: OracleConfigurator arguments: component: oracle ``` #### Restart the instance We define a task that uses the [Executor operator](https://docs.akamas.io/akamas-docs/3.6/reference/workflow-operators/executor-operator) that reboots the Oracle container for the parameters that need a restart to take effect: ```yaml name: Restart Oracle container operator: Executor arguments: command: docker restart oraxe component: oracle ``` #### Run the workload We define a task that uses the [Executor operator](https://docs.akamas.io/akamas-docs/3.6/reference/workflow-operators/executor-operator) to launch the TPC-C benchmark against the Oracle instance: ```yaml name: Execute load test operator: Executor arguments: command: cd ~/oltpbench ; ./oltpbenchmark --bench tpcc --config tpcc_conf.xml --execute=true -s 5 --output out component: tpcc ``` #### Prepare test results We define a workflow task that runs a script that parses the TPC-C output files and generates a file compatible with the [CSV Provider](https://docs.akamas.io/akamas-docs/3.6/integrating/integrating-telemetry-providers/csv-provider): ```yaml name: Parse TPC-C results operator: Executor arguments: command: cd ~/oltpbench ; ./tpcc_parse_csv.sh component: tpcc ``` Where `tpcc_parse_csv.sh` is the following script: ```bash #!/bin/bash OUTFILE=output.csv COMP_NAME=tpcc BASETS=`tail -n+2 results/out.csv | head -n1 | cut -d',' -f3` echo 'component,ts,throughput,avg_lat,min_lat,90th_lat,max_lat' > $OUTFILE awk -F, "BEGIN{OFS=\",\"} NR>1 {\$1=strftime(\"%F %T\", ${BASETS}+\$1); print \"${COMP_NAME}\",\$0}" < results/out.res | cut -d',' -f1-5,9,12 >> $OUTFILE ``` #### Complete workflow By putting together all the tasks defined above we come up with the following workflow definition (`workflow.yaml`): ```yaml name: oracle workflow tasks: - name: Clean results operator: Executor arguments: command: rm -f ~/oltpbench/results/* component: tpcc - name: Update parameters operator: OracleConfigurator arguments: component: oracle - name: Restart Oracle container operator: Executor arguments: command: docker restart oraxe component: oracle - name: Execute load test operator: Executor arguments: command: cd ~/oltpbench ; ./oltpbenchmark --bench tpcc --config tpcc_conf.xml --execute=true -s 5 --output out component: tpcc - name: Parse TPC-C results operator: Executor arguments: command: cd ~/oltpbench ; ./tpcc_parse_csv.sh component: tpcc ``` We can create the workflow by running: ```bash akamas create workflow workflow.yaml ``` ### Study The objective of this study is to maximize the transaction throughput while stressed by the TPC-C load generator, and to achieve this goal the study will tune the size of the most important areas of the Oracle instance. #### Goal Here’s the definition of the goal of our study, which is to maximize the `tpcc.throughput` metric: ```yaml goal: objective: maximize function: formula: tpcc.throughput ``` #### Windowing We define a window to consider only the data points after the ramp-up time of the load test: ```yaml windowing: type: trim trim: [4m, 1m] task: Execute load test ``` #### Parameters to optimize For this study, we are trying to achieve our goal by tuning the size of several areas in the memory of the database instance. In particular, we will tune the overall size of the *Program Global Area* (containing the work area of the active sessions) and the size of the components of the *Shared Global Area*. The domains are configured to explore, for each parameter, the values around the default values. ```yaml parametersSelection: - name: oracle.pga_aggregate_target domain: [1128, 4512] - name: oracle.db_cache_size domain: [512, 6144] - name: oracle.java_pool_size domain: [1, 1024] - name: oracle.large_pool_size domain: [1, 256] - name: oracle.log_buffer domain: [2, 256] - name: oracle.shared_pool_size domain: [128, 1024] - name: oracle.streams_pool_size domain: [1, 1024] ``` #### Constraints The following constraint allows the study to explore different size configurations without exceeding the maximum overall memory available for the instance: ```yaml parameterConstraints: - name: Cap total memory to 10G formula: oracle.db_cache_size + oracle.java_pool_size + oracle.large_pool_size + oracle.log_buffer + oracle.shared_pool_size + oracle.streams_pool_size + oracle.pga_aggregate_target < 10240 ``` #### Steps We are going to add to our study two steps: * A baseline step, in which we configure the default values for the memory parameters as discovered from previous manual executions. * An optimization step, where we perform 200 experiments to search the set of parameters that best satisfies our goal. The baseline step contains some additional parameters (`oracle.memory_target`, `oracle.sga_target`) that are required by Oracle in order to disable the automatic management of the SGA components. Here’s what these steps look like: ```yaml steps: - name: baseline type: baseline values: oracle.pga_aggregate_target: 1128 oracle.db_cache_size: 2496 oracle.java_pool_size: 16 oracle.large_pool_size: 16 oracle.log_buffer: 13 oracle.shared_pool_size: 640 oracle.streams_pool_size: 0 oracle.memory_target: 0 oracle.sga_target: 0 - name: optimization type: optimize numberOfExperiments: 200 maxFailedExperiments: 200 ``` #### Complete study Here’s the study definition (`study.yaml`) for optimizing the Oracle instance: ```yaml name: Oracle: tune memory description: Tune memory minimizing response system: oracle system workflow: oracle workflow goal: objective: maximize function: formula: throughput variables: throughput: metric: tpcc.throughput windowing: type: trim trim: [4m, 1m] task: Execute load test parametersSelection: - name: oracle.pga_aggregate_target domain: [1128, 4512] - name: oracle.db_cache_size domain: [1024, 6144] - name: oracle.java_pool_size domain: [1, 1024] - name: oracle.large_pool_size domain: [1, 256] - name: oracle.log_buffer domain: [2, 256] - name: oracle.shared_pool_size domain: [128, 1024] - name: oracle.streams_pool_size domain: [1, 1024] parameterConstraints: - name: Cap total memory to 10G formula: oracle.db_cache_size + oracle.java_pool_size + oracle.large_pool_size + oracle.log_buffer + oracle.shared_pool_size + oracle.streams_pool_size + oracle.pga_aggregate_target < 10240 steps: - name: baseline type: baseline values: oracle.pga_aggregate_target: 1128 oracle.db_cache_size: 2496 oracle.java_pool_size: 16 oracle.large_pool_size: 16 oracle.log_buffer: 13 oracle.shared_pool_size: 640 oracle.streams_pool_size: 0 oracle.memory_target: 0 oracle.sga_target: 0 - name: optimization type: optimize numberOfExperiments: 200 maxFailedExperiments: 200 ``` You can create the study by running: ```bash akamas create study study.yaml ``` You can then start it by running: ```bash akamas start study 'Oracle: tune memory' ```