Prometheus provider

The Prometheus provider collects metrics from a Prometheus instance and makes them available to Akamas.

This provider includes support for several technologies (Prometheus exporters). In any case, custom queries can be defined to gather the desired metrics.

Prerequisites

This section lists the minimum requirements that you should meet before using the Prometheus provider.

Supported Prometheus versions:

Akamas supports Prometheus starting from version 2.26.

If you also use the prometheus-operator, version 0.47 or greater of the operator is required. This version is bundled with the kube-prometheus-stack since version 15.

Connectivity between the Akamas server and the Prometheus server is also required. By default, Prometheus is run on port 9090.
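
Before configuring the provider, it can help to verify that the Akamas server can actually reach Prometheus. The following is a minimal sketch and not part of the Akamas tooling: the function names and the prometheus_host host name are illustrative assumptions, while /-/healthy is the standard Prometheus health-check endpoint.

```shell
#!/bin/bash
# Hypothetical helpers (not part of Akamas) to check connectivity to Prometheus.

# Build the URL of the Prometheus health endpoint.
# $1: host name or IP, $2: port (defaults to 9090, the Prometheus default)
prometheus_health_url() {
  local host=$1 port=${2:-9090}
  echo "http://${host}:${port}/-/healthy"
}

# Return 0 if Prometheus answers on its health endpoint within 5 seconds.
check_prometheus() {
  curl -fsS --max-time 5 -o /dev/null "$(prometheus_health_url "$@")"
}

# Print the URL that would be probed (this line needs no network access).
prometheus_health_url prometheus_host
```

From the Akamas server you could then run, for example, check_prometheus prometheus_host 9090 && echo reachable.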

Supported Prometheus exporters

Node exporter (Linux system metrics)
JMX exporter (Java metrics)
cAdvisor (Docker container metrics)
CloudWatch exporter (AWS resources metrics)
JMeter (Web application metrics)

The Prometheus provider includes queries for most of the monitoring use cases these exporters cover. If you need custom queries, or want to use exporters that are not currently supported, you can specify them as described in creating Prometheus telemetry instances.

Supported Akamas component types

Kubernetes (Pod, Container, Workload, Namespace)
Web Application
Java (java-ibm-j9vm-6, java-ibm-j9vm-8, java-eclipse-openj9-11, java-openjdk-8, java-openjdk-11)
Linux (Ubuntu-16.04, Rhel-7.6)

Refer to Prometheus provider metrics mapping to see how component-type metrics are extracted by this provider.

Component configuration

Akamas reasons in terms of a system to be optimized and of the parameters and metrics of the components of that system. To understand which metrics collected from Prometheus should be mapped to a component, the Prometheus provider looks up some properties of the components of a system, grouped under the prometheus property. These properties depend on the exporter and the component type.

Nested under this property you can also include any additional field your use case may require to filter the imported metrics further. These fields will be appended in queries to the list of label matches in the form field_name=~'field_value', and can specify either exact values or patterns.
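
To make the resulting label matchers concrete, the sketch below renders a few key=value pairs in the field_name=~'field_value' form described above. It is only an illustration and not the provider's actual code; the function name build_label_matchers is hypothetical.

```shell
#!/bin/bash
# Illustrative sketch: render key=value pairs as PromQL regex label matchers,
# in the field_name=~'field_value' form. This is NOT the provider's code.
build_label_matchers() {
  local matchers=() pair
  for pair in "$@"; do
    # Split each argument at the first "=": name before it, value after it.
    matchers+=("${pair%%=*}=~'${pair#*=}'")
  done
  # Join the matchers with commas.
  local IFS=,
  echo "${matchers[*]}"
}

# An exact value, another exact value, and a pattern
build_label_matchers instance=service0001 job=jmx 'pod=adservice.*'
```

Since the matcher is a regular expression match, an exact value such as jmx and a pattern such as adservice.* are handled in the same way.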

Notice: you should configure your Prometheus instances so that the Prometheus provider can leverage the instance property of components, as described in the Setup datasource section here above.

It is important that you add instance and, optionally, the job properties to the components of a system so that the Prometheus provider can gather metrics from them:

# Specification for a component, whose metrics should be collected by the Prometheus Provider
name: jvm1  # name of the component
description: jvm1 for payment services  # description of the component
properties:
  prometheus:
    instance: service0001  # instance of the component: where the component is located relative to Prometheus
    job: jmx               # job of the component: which prom exporter is gathering metrics from the component

Prometheus configuration

The Prometheus provider does not usually require a specific configuration of the Prometheus instance it uses.

When gathering metrics for hosts it's usually convenient to set the value of the instance label so that it matches the value of the instance property in a component; in this way, the Prometheus provider knows which system component each data point refers to.

Here’s an example configuration for Prometheus that sets the instance label:

# Custom global config
global:
  scrape_interval:     5s   # Set the scrape interval to every 5 seconds. The default is every 1 minute.
  evaluation_interval: 5s   # Evaluate rules every 5 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# A scrape configuration containing exactly one endpoint to scrape:
scrape_configs:
# Node Exporter
- job_name: 'node'
  static_configs:
  - targets: ["localhost:9100"]
  relabel_configs:
  - source_labels: ["__address__"]
    regex: "(.*):.*"
    target_label: instance
    replacement: value_of_instance_property_in_the_component_the_data_points_should_refer_to
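
The effect of the relabeling rule above can be sketched as follows. This is an illustration of the behavior under stated assumptions, not Prometheus code: the function name relabel_instance and the sample values are hypothetical.

```shell
#!/bin/bash
# Illustrative sketch of the effect of the relabel rule (not Prometheus code).
# $1: scrape address (__address__), $2: value configured as "replacement"
relabel_instance() {
  local address=$1 replacement=$2
  # Prometheus anchors relabel regexes, so "(.*):.*" must match the whole address.
  local regex='^(.*):.*$'
  if [[ $address =~ $regex ]]; then
    echo "$replacement"   # rule applies: instance takes the configured value
  else
    echo "$address"       # no match: instance keeps its default, the address
  fi
}

relabel_instance "localhost:9100" "service0001"
```

With this rule, every data point scraped from localhost:9100 carries the instance label of the component it should be mapped to, rather than the scrape address.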

Install Prometheus provider

To install the Prometheus provider, create a YAML file (provider.yml in this example) with the definition of the provider:

name: Prometheus
description: Telemetry Provider that enables the import of metrics from Prometheus
dockerImage: 485790562880.dkr.ecr.us-east-2.amazonaws.com/akamas/telemetry-providers/prometheus-provider:3.4.2

Then you can install the provider using the Akamas CLI:

akamas install telemetry-provider provider.yml

The installed provider is shared with all users of your Akamas installation and can monitor many different systems, by configuring appropriate telemetry provider instances.

Create Prometheus telemetry instances

To create an instance of the Prometheus provider, edit a YAML file (instance.yml in this example) with the definition of the instance:

# Prometheus Telemetry Provider Instance
provider: Prometheus

config:
  address: host1  # URL or IP of the Prometheus instance from which to extract metrics
  port: 9090      # Port of the Prometheus instance from which to extract metrics

Then you can create the instance for the system using the Akamas CLI:

akamas create telemetry-instance instance.yml system

Configuration options

When you create an instance of the Prometheus provider, you should specify some configuration information to allow the provider to extract and process metrics from Prometheus correctly.

You can specify configuration information within the config part of the YAML of the instance definition.

Required properties

address, a URL or IP identifying the address of the host where Prometheus is installed
port, the port exposed by Prometheus

Optional properties

user, the username for the Prometheus service
password, the user password for the Prometheus service
job, a string to specify the scraping job name. The default is ".*" for all scraping jobs
logLevel, set this to "DETAILED" for some extra logs when searching for metrics (default value is "INFO")
headers, to specify additional custom headers, e.g. headers: "custom_key": "custom_value"
namespace, a string to specify the namespace
duration, integer to determine the duration in seconds for data collection (use a number between 1 and 3600)
enableHttps, boolean to enable HTTPS in Prometheus (since 3.2.6)
ignoreCertificates, boolean to ignore SSL certificates
disableConnectionCheck, boolean to disable initial connection check to Prometheus
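
As an illustration of how required and optional properties fit together, the sketch below writes an instance definition that uses a few of the options listed above. All values are placeholders chosen for the example, and the helper name write_instance_yaml is hypothetical.

```shell
#!/bin/bash
# Illustrative sketch: write a telemetry instance definition that combines the
# required properties with a few optional ones. All values are placeholders.
write_instance_yaml() {
  local target=$1
  cat > "$target" <<'EOF'
# Prometheus Telemetry Provider Instance
provider: Prometheus

config:
  address: host1            # required
  port: 9090                # required
  job: node                 # only consider the "node" scraping job
  duration: 60              # seconds, between 1 and 3600
  logLevel: DETAILED        # extra logs when searching for metrics
  enableHttps: true         # available since provider 3.2.6
  ignoreCertificates: false
EOF
}

# Write the definition to a temporary file and show it.
tmp_file=$(mktemp)
write_instance_yaml "$tmp_file"
cat "$tmp_file"
```

The generated file can then be passed to the Akamas CLI in the same way as instance.yml.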

Custom queries

The Prometheus provider allows defining additional queries to populate custom metrics or redefine the default ones according to your use case. You can configure additional metrics using the metrics field as shown in the configuration below:

config:
  address: host1
  port: 9090

metrics:
  - metric: cust_metric   # extra akamas metric to monitor
    datasourceMetric: 'http_requests_total{environment=~"staging|testing|development", method!="GET"}' # query to execute to extract the metric
    labels:
    - method   # The "method" label will be retained within akamas

In this example, the telemetry instance will populate cust_metric with the results of the query specified in datasourceMetric, maintaining the value of the labels listed under labels.

Please refer to Querying basics | Prometheus for a complete reference of PromQL.

Akamas placeholders

Akamas pre-processes the queries before running them, replacing special-purpose placeholders with the fields provided in the components. For example, given the following component definition:

name: jvm1
description: jvm1 for payment services
properties:
  prometheus:
    instance: service01
    job: jmx

the query sum(jvm_memory_used_bytes{instance=~"$INSTANCE$", job=~"$JOB$"}) will be expanded for this component into sum(jvm_memory_used_bytes{instance=~"service01", job=~"jmx"}). This provides greater flexibility through the templatization of the queries, allowing the same query to select the correct data sources for different components.
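
The expansion described above can be sketched as a plain string substitution. This is only an illustration of the idea, not Akamas's actual implementation; the function name expand_query is hypothetical.

```shell
#!/bin/bash
# Illustrative sketch of the placeholder expansion
# (not Akamas's actual implementation).
# $1: query template, $2: instance property, $3: job property
expand_query() {
  local query=$1 instance=$2 job=$3
  # \$ makes the dollar signs literal inside the substitution pattern.
  query=${query//\$INSTANCE\$/$instance}
  query=${query//\$JOB\$/$job}
  printf '%s\n' "$query"
}

expand_query 'sum(jvm_memory_used_bytes{instance=~"$INSTANCE$", job=~"$JOB$"})' service01 jmx
```

Calling the same function with the properties of another component yields the query for that component, which is the point of templating the queries.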

The following is the list of available placeholders:

Example

prometheus:
  instance: frontend
  job: node

Use cases

This section reports common use cases addressed by this provider.

Collect Kubernetes metrics

To gather Kubernetes metrics, the following exporters are required:

kube-state-metrics
cadvisor

As an example, you can define a component with type Kubernetes Container in this way:

name: adservice
description: The adservice of the online boutique by Google
componentType: Kubernetes Container
properties:
  prometheus:
    namespace: boutique
    pod: adservice.*
    container: server

Collect Java metrics

Check the Java OpenJDK page for a list of all the Java metrics available in Akamas.

You can leverage the Prometheus provider to collect Java metrics by using the JMX Exporter. The JMX Exporter is a collector of Java metrics for Prometheus that can be run as an agent for any Java application. Once downloaded, you execute it alongside a Java application with this command:

java -javaagent:the_downloaded_jmx_exporter_jar.jar=9100:config.yaml -jar yourJar.jar

The command will expose the Java metrics of yourJar.jar on localhost on port 9100, where they can be scraped by Prometheus.

config.yaml is a configuration file useful for the activity of this exporter. It is suggested to use this configuration for an optimal experience with the Prometheus provider:

startDelaySeconds: 0
username:
password:
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
# using the property below we are telling the exporter to export only relevant Java metrics
whitelistObjectNames:
- "java.lang:*"
- "jvm:*"

As a next step, add a new scraping target in the configuration of the Prometheus used by the provider:

...
scrape_configs:
# JMX Exporter
- job_name: "jmx"
  static_configs:
  - targets: ["jmx_exporter_host:9100"]

You can then create a YAML file with the definition of a telemetry instance (prom_instance.yml) of the Prometheus provider:

provider: Prometheus
config:
  address: prometheus_host
  port: 9090

And you can create the telemetry instance using the Akamas CLI:

akamas create telemetry-instance prom_instance.yml

Finally, to bind the extracted metrics to the related component, you should add the following field to the properties of the component’s definition:

prometheus:
  job: jmx

Collect system metrics

Check the Linux page for a list of all the system metrics available in Akamas.

You can leverage the Prometheus provider to collect system metrics (Linux) by using the Node exporter. The Node exporter is a collector of system metrics for Prometheus that can be run as a standalone executable or a service within a Linux machine to be monitored. Once downloaded, schedule it as a service using, for example, systemd:

systemctl start node_exporter

Here’s the manifest of the node_exporter service:

[Unit]
Description=Node Exporter

[Service]
ExecStart=/path/to/node_exporter/executable

[Install]
WantedBy=default.target

The service will expose system metrics on localhost on port 9100, where they can be scraped by Prometheus.

As a final step, add a new scraping target in the configuration of the Prometheus used by the provider:

scrape_configs:
# Node Exporter
- job_name: "node"
  static_configs:
  - targets: ["node_exporter_host:9100"]
  relabel_configs:
  - source_labels: ["__address__"]
    regex: "(.*):.*"
    # here we put as "instance", the name of the component the metrics refer to
    target_label: "instance"
    replacement: "linux_component_name"

You can then create a YAML file with the definition of a telemetry instance (prom_instance.yml) of the Prometheus provider:

provider: Prometheus
config:
  address: prometheus_host
  port: 9090

And you can create the telemetry instance using the Akamas CLI:

akamas create telemetry-instance prom_instance.yml

Finally, to bind the extracted metrics to the related component, you should add the following field to the properties of the component’s definition:

prometheus:
  instance: linux_component_name
  job: node

CloudWatch Exporter

This page describes how to set up a CloudWatch exporter in order to gather AWS metrics through the Prometheus provider. This is especially useful to monitor system metrics when you don’t have direct SSH access to AWS resources like EC2 Instances or if you want to gather AWS-specific metrics not available in the guest OS.

AWS policies

In order to fetch metrics from CloudWatch, the exporter requires an IAM user or role with the following privileges:

cloudwatch:GetMetricData
cloudwatch:GetMetricStatistics
cloudwatch:ListMetrics
tag:GetResources

You can assign AWS-managed policies CloudWatchReadOnlyAccess and ResourceGroupsandTagEditorReadOnlyAccess to the desired user to enable these permissions.

Exporter configuration

The CloudWatch exporter repository is available on the official project page. It requires a minimal configuration to fetch metrics from the desired AWS instances. Below is a short list of the parameters needed for a minimal configuration:

region: AWS region of the monitored resource
metrics: a list of objects containing filters for the exported metrics
- aws_namespace: the namespace of the monitored resource
- aws_metric_name: the name of the AWS metric to fetch
- aws_dimensions: the dimension to expose as labels
- aws_dimension_select: the dimension to filter over
- aws_statistics: the list of metric statistics to expose
- aws_tag_select: optional tags to filter on
  - tag_selections: map containing the list of values to select for each tag
  - resource_type_selection: resource type to fetch the tags from (see: Resource Types)
  - resource_id_dimension: dimension to use for the resource id (see: Resource Types)

For a complete list of possible values for namespaces, metrics, and dimensions please refer to the official AWS CloudWatch User Guide.

Notice: AWS bills CloudWatch usage in batches of 1 million requests, where every metric counts as a single request. To avoid unnecessary expenses configure only the metrics you need.

region: us-east-2
metrics:
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_statistics: [Average]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: NetworkIn
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: NetworkOut
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: NetworkPacketsIn
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: NetworkPacketsOut
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: CPUCreditUsage
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: CPUCreditBalance
    aws_statistics: [Average]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSReadOps
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSWriteOps
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSReadBytes
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSWriteBytes
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSIOBalance%
    aws_statistics: [Average]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSByteBalance%
    aws_statistics: [Average]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

The suggested deployment mode for the exporter is through a Docker image. The following snippet provides a command line example to run the container (remember to provide your AWS credentials if needed and the path of the configuration file):

docker run -d --name cloudwatch_exporter \
  -p 9106:9106 \
  -v $(pwd)/cloudwatch-exporter.yaml:/config/config.yml \
  -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
  prom/cloudwatch-exporter

You can refer to the official guide for more details or alternative deployment modes.

Prometheus configuration

In order to scrape the newly created exporter, add a new job to the Prometheus configuration file. You will also need to define some relabeling rules in order to add the instance label required by Akamas to properly filter the incoming metrics. In the example below, the instance label is copied from the instance’s Name tag:

Notice: AWS bills CloudWatch usage in batches of 1 million requests, where every metric counts as a single request. To avoid unnecessary expenses configure an appropriate scraping interval.

scrape_configs:
  - job_name: cloudwatch_exporter
    scrape_interval: 60s
    scrape_timeout: 30s
    static_configs:
      - targets: [cloudwatch_exporter:9106]
    metric_relabel_configs:
      - source_labels: [tag_Name]
        regex: '(.+)'
        target_label: instance
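
To get a feel for how the scraping interval affects CloudWatch usage, the sketch below estimates the number of requests over a period. It assumes, as the notice above states, that each configured metric counts as one request per scrape; actual billing may differ, for example when many resources are selected. The function name estimate_requests is hypothetical.

```shell
#!/bin/bash
# Rough estimate of the CloudWatch requests generated by the exporter.
# Assumption (from the notice above): each configured metric counts as
# one request per scrape. Real billing may differ.
# $1: number of configured metrics, $2: scrape interval (s), $3: days
estimate_requests() {
  local metrics=$1 interval_sec=$2 days=$3
  local scrapes_per_day=$(( 86400 / interval_sec ))
  echo $(( metrics * scrapes_per_day * days ))
}

# 13 metrics, 60s scrape interval, 30 days
estimate_requests 13 60 30
```

Under these assumptions, 13 metrics scraped every 60 seconds produce 561600 requests in 30 days, while a 15-second interval would produce four times as many (2246400).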

Additional workflow task

Once you have configured the exporter in the Prometheus configuration, you can start fetching metrics using the Prometheus provider. The following sections describe some scripts you can add as tasks in your workflow.

Wait for metrics

It’s worth noting that CloudWatch may require some minutes to aggregate the stats according to the configured granularity, causing the telemetry provider to fail while trying to fetch data points that are not available yet. To avoid such issues, you can add at the end of your workflow a task using an Executor operator to wait for the CloudWatch metrics to be ready. The following script is an example implementation:

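METRIC=aws_rds_cpuutilization_sum   # metric to check for
DELAY_SEC=15                        # seconds to wait between attempts
RETRIES=60                          # maximum number of attempts

# Reference timestamp in UTC (the trailing Z denotes UTC, hence the -u flag)
NOW=$(date -u +'%FT%T.%3NZ')

for i in $(seq "$RETRIES"); do
  sleep "$DELAY_SEC"
  # jq -e exits with a non-zero status when the query returns no data points
  curl -sS "http://prometheus_host/api/v1/query?query=${METRIC}&time=${NOW}" | jq -ce '.data.result[]' && exit 0
done

# No data points found within the allotted attempts
exit 255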

Start/stop the exporter as needed

Since Amazon bills your CloudWatch queries, it is wise to run the exporter only when needed. The following script allows you to manage the exporter from the workflow by adding the following tasks:

start the container right before the beginning of the load test (command: bash script.sh start)
stop the container after the metrics publication, as described in the previous section (command: bash script.sh stop).

#!/bin/bash

set -e

CMD=$1
CONT_NAME=cloudwatch_exporter

stop_cont() {
  [ -z "$(docker ps -aq -f "name=${CONT_NAME}")" ] || (echo "Removing ${CONT_NAME}" && docker rm -f "${CONT_NAME}")
}

case $CMD in
  stop|remove)
    stop_cont
    ;;

  start)
    stop_cont

    AWS_ACCESS_KEY_ID=`awk 'BEGIN { FS = "=" } /aws_access_key_id/ {print $2 }' ~/.aws/credentials | tr -d '[:space:]'`
    AWS_SECRET_ACCESS_KEY=`awk 'BEGIN { FS = "=" } /aws_secret_access_key/ {print $2 }' ~/.aws/credentials | tr -d '[:space:]'`

    echo Starting container $CONT_NAME
    docker run -d --name $CONT_NAME \
      -p 9106:9106 \
      -v ~/oracle-database/utils/cloudwatch-exporter.yaml:/config/config.yml \
      -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
      prom/cloudwatch-exporter
    ;;

  *)
    echo Unrecognized option $CMD
    exit 255
    ;;
esac

Custom Configuration file

The Akamas-supported configuration, which fetches metrics of EC2 instances named server1 and server2, is the one reported in the Exporter configuration section above.

OracleDB Exporter

This page describes how to set up an OracleDB exporter in order to gather metrics regarding an Oracle Database instance through the Prometheus provider.

Installation

The OracleDB exporter repository is available on the official project page. The suggested deployment mode is through a Docker image, since the Prometheus instance can easily access the running container through the Akamas network.

Use the following command line to run the container, where cust-metrics.toml is your configuration file defining the queries for additional custom metrics (see paragraph below) and DATA_SOURCE_NAME an environment variable containing the Oracle EasyConnect string:

docker run -d --name oracledb_exporter --restart always \
  --network akamas -p 9161:9161 \
  -v ~/oracledb_exporter/cust-metrics.toml:/cust-metrics.toml \
  -e CUSTOM_METRICS=/cust-metrics.toml \
  -e DATA_SOURCE_NAME="username/password@//oracledb.mycompany.com/service" \
  iamseth/oracledb_exporter

You can refer to the official guide for more details or alternative deployment modes.

Custom queries

It is possible to define additional queries to expose custom metrics using any data in the database instance that is readable by the monitoring user (see the guide for more details about the syntax).

Custom Configuration file

The following is an example of exporting system metrics from the Dynamic Performance (V$) Views used by the Prometheus provider default queries for the Oracle Database optimization pack:

[[metric]]
context= "memory"
labels= [ "component" ]
metricsdesc= { size="Component memory extracted from v$memory_dynamic_components in Oracle." }
request = '''
SELECT component, current_size as "size"
FROM V$MEMORY_DYNAMIC_COMPONENTS
UNION
SELECT name, bytes as "size"
FROM V$SGAINFO
WHERE name in ('Free SGA Memory Available', 'Redo Buffers', 'Maximum SGA Size')
'''

[[metric]]
context = "activity"
metricsdesc = { value="Generic counter metric from v$sysstat view in Oracle." }
fieldtoappend = "name"
request = '''
SELECT name, value
FROM V$SYSSTAT WHERE name IN (
  'execute count',
  'user commits', 'user rollbacks',
  'db block gets from cache', 'consistent gets from cache', 'physical reads cache', /* CACHE */
  'redo log space requests'
)
'''

[[metric]]
context = "system_event"
labels = [ "event", "wait_class" ]
request = '''
SELECT
  event, wait_class,
  total_waits, time_waited
FROM V$SYSTEM_EVENT
'''
[metric.metricsdesc]
  total_waits= "Total number of waits for the event as per V$SYSTEM_EVENT in Oracle."
  time_waited= "Total time waited for the event (in hundredths of seconds) as per V$SYSTEM_EVENT in Oracle."

      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSWriteOps
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSReadBytes
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSWriteBytes
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSIOBalance%
    aws_statistics: [Average]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSByteBalance%
    aws_statistics: [Average]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

docker run -d --name cloudwatch_exporter \
  -p 9106:9106 \
  -v $(pwd)/cloudwatch-exporter.yaml:/config/config.yml \
  -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
  prom/cloudwatch-exporter

You can refer to the official guide for more details or alternative deployment modes.
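Once the container is running, it can help to confirm that the exporter actually serves the configured metrics before pointing Prometheus at it. The following is a minimal sketch, not an official procedure: the helper name has_metric is hypothetical, and the metric name aws_ec2_cpuutilization_average is an assumption that follows the naming pattern of aws_rds_cpuutilization_sum used elsewhere on this page.

```shell
# has_metric NAME: read Prometheus exposition text on stdin and succeed
# if at least one sample named NAME is present (with or without labels).
has_metric() {
  grep -Eq "^$1(\{| )"
}

# Usage (assumes the exporter is reachable on localhost:9106):
#   curl -sS http://localhost:9106/metrics \
#     | has_metric aws_ec2_cpuutilization_average \
#     && echo "exporter is serving EC2 metrics"
```

The check reads from standard input, so it can also be run against a saved copy of the exporter output.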

Prometheus configuration

Notice: AWS bills CloudWatch usage in batches of 1 million requests, where every metric counts as a single request. To avoid unnecessary expenses, configure an appropriate scraping interval.

scrape_configs:
  - job_name: cloudwatch_exporter
    scrape_interval: 60s
    scrape_timeout: 30s
    static_configs:
      - targets: [cloudwatch_exporter:9106]
    metric_relabel_configs:
      - source_labels: [tag_Name]
        regex: '(.+)'
        target_label: instance
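To get a feeling for how the scraping interval affects the bill, the request volume can be roughly estimated. The sketch below assumes exactly one CloudWatch request per configured metric per scrape, as stated in the notice above; the real number may differ depending on how many resources match and on the exporter's behavior. The function name estimate_requests is hypothetical.

```shell
# estimate_requests METRICS SCRAPE_INTERVAL_SEC
# Print the estimated number of CloudWatch requests per day, assuming
# one request per configured metric per scrape.
estimate_requests() {
  echo $(( $1 * (86400 / $2) ))
}

# 13 metrics (as in the example exporter configuration) scraped every 60s
estimate_requests 13 60   # prints 18720
```

Under these assumptions, a 60-second interval yields about 18,720 requests per day, or roughly 561,600 over 30 days; doubling the interval halves the figure.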

Additional workflow task

Wait for metrics

METRIC=aws_rds_cpuutilization_sum   # metric to check for
DELAY_SEC=15
RETRIES=60

# Use UTC: the trailing "Z" in the timestamp denotes UTC
NOW=$(date -u +'%FT%T.%3NZ')

for i in $(seq "$RETRIES"); do
  sleep $DELAY_SEC
  curl -sS "http://prometheus_host/api/v1/query?query=${METRIC}&time=${NOW}" | jq -ce '.data.result[]' && exit 0
done

exit 255

Start/stop the exporter as needed

Since Amazon bills your CloudWatch queries, it is wise to run the exporter only when needed. The following script allows you to manage the exporter from the workflow by adding the following tasks:

start the container right before the beginning of the load test (command: bash script.sh start)
stop the container after the metrics publication, as described in the previous section (command: bash script.sh stop).

#!/bin/bash

set -e

CMD=$1
CONT_NAME=cloudwatch_exporter

stop_cont() {
  [ -z "$(docker ps -aq -f "name=${CONT_NAME}")" ] || (echo "Removing ${CONT_NAME}" && docker rm -f "${CONT_NAME}")
}

case $CMD in
  stop|remove)
    stop_cont
    ;;

  start)
    stop_cont

    AWS_ACCESS_KEY_ID=`awk 'BEGIN { FS = "=" } /aws_access_key_id/ {print $2 }' ~/.aws/credentials | tr -d '[:space:]'`
    AWS_SECRET_ACCESS_KEY=`awk 'BEGIN { FS = "=" } /aws_secret_access_key/ {print $2 }' ~/.aws/credentials | tr -d '[:space:]'`

    echo Starting container $CONT_NAME
    docker run -d --name $CONT_NAME \
      -p 9106:9106 \
      -v ~/oracle-database/utils/cloudwatch-exporter.yaml:/config/config.yml \
      -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
      prom/cloudwatch-exporter
    ;;

  *)
    echo "Unrecognized option $CMD"
    exit 255
    ;;
esac

Custom Configuration file

The Akamas-supported configuration is the one listed in full under Exporter configuration above; it fetches metrics of EC2 instances named server1 and server2.

Create Prometheus telemetry instances

To create an instance of the Prometheus provider, edit a YAML file (instance.yml in this example) with the definition of the instance:

# Prometheus Telemetry Provider Instance
provider: Prometheus

config:
  address: host1  # URL or IP of the Prometheus instance from which to extract metrics
  port: 9090      # Port of the Prometheus instance from which to extract metrics

Then you can create the instance for the system using the Akamas CLI:

akamas create telemetry-instance instance.yml system

Configuration options

When you create an instance of the Prometheus provider, you should specify some configuration information to allow the provider to extract and process metrics from Prometheus correctly.

You can specify configuration information within the config part of the YAML of the instance definition.

Required properties

address, a URL or IP identifying the address of the host where Prometheus is installed
port, the port exposed by Prometheus

Optional properties

user, the username for the Prometheus service
password, the user password for the Prometheus service
job, a string to specify the scraping job name. The default is ".*" for all scraping jobs
logLevel, set this to "DETAILED" for some extra logs when searching for metrics (default value is "INFO")
headers, a map of additional custom headers, e.g. headers: {"custom_key": "custom_value"}
namespace, a string to specify the namespace
duration, integer to determine the duration in seconds for data collection (use a number between 1 and 3600)
enableHttps, boolean to enable HTTPS in Prometheus (since 3.2.6)
ignoreCertificates, boolean to ignore SSL certificates
disableConnectionCheck, boolean to disable initial connection check to Prometheus
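As an illustration of how required and optional properties combine, the sketch below writes an instance definition that uses a few of the options listed above and leaves the rest at their defaults. The values (job name, duration, log level) are examples only, not recommendations.

```shell
# Write an example instance definition using some optional properties.
cat > instance.yml <<'EOF'
# Prometheus Telemetry Provider Instance
provider: Prometheus

config:
  address: host1
  port: 9090
  job: node            # only consider the "node" scraping job
  duration: 60         # seconds, must be between 1 and 3600
  logLevel: DETAILED   # extra logs when searching for metrics
EOF

# The instance can then be created as shown earlier:
#   akamas create telemetry-instance instance.yml system
```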

Custom queries

config:
  address: host1
  port: 9090

metrics:
  - metric: cust_metric   # extra akamas metric to monitor
    datasourceMetric: 'http_requests_total{environment=~"staging|testing|development", method!="GET"}' # query to execute to extract the metric
    labels:
    - method   # The "method" label will be retained within akamas

In this example, the telemetry instance will populate cust_metric with the results of the query specified in datasourceMetric, maintaining the value of the labels listed under labels.

Please refer to the Querying basics page of the Prometheus documentation for a complete reference of PromQL.
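Before adding a custom query to a telemetry instance, it may be worth running it directly against Prometheus to check that it returns data. The following is a minimal sketch using the Prometheus HTTP API instant-query endpoint; the function name prom_query is hypothetical, and prometheus_host:9090 stands for your own Prometheus address.

```shell
# prom_query BASE_URL EXPR: run an instant PromQL query through the
# Prometheus HTTP API. curl takes care of URL-encoding the expression.
prom_query() {
  curl -sS -G "$1/api/v1/query" --data-urlencode "query=$2"
}

# Usage (assumes Prometheus is reachable at prometheus_host:9090):
#   prom_query http://prometheus_host:9090 \
#     'http_requests_total{environment=~"staging|testing|development", method!="GET"}'
```

An empty result list in the response usually means the expression matches no series, in which case the telemetry instance would have nothing to collect either.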

Akamas placeholders

Akamas pre-processes the queries before running them, replacing special-purpose placeholders with the fields provided in the components. For example, given the following component definition:

name: jvm1
description: jvm1 for payment services
properties:
  prometheus:
    instance: service01
    job: jmx

The following is the list of available placeholders:

Placeholder | Usage example | Component definition example | Expanded query | Description

Example component definition:

prometheus:
  instance: frontend
  job: node
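The pre-processing step can be pictured as a plain text substitution. The sketch below is only an illustration and not the provider's actual implementation: the placeholder spellings $INSTANCE$ and $JOB$ are assumptions, as are the function name expand_placeholders and the sample query. The values frontend and node come from the component definition example above.

```shell
# expand_placeholders QUERY INSTANCE JOB
# Replace the (assumed) $INSTANCE$ and $JOB$ placeholders in QUERY with
# the values taken from the component's prometheus properties.
# Note: values containing "/" or "&" would need escaping for sed.
expand_placeholders() {
  printf '%s\n' "$1" \
    | sed -e 's/\$INSTANCE\$/'"$2"'/g' -e 's/\$JOB\$/'"$3"'/g'
}

expand_placeholders 'node_load1{instance=~"$INSTANCE$", job=~"$JOB$"}' frontend node
# prints: node_load1{instance=~"frontend", job=~"node"}
```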

Use cases

This section reports common use cases addressed by this provider.

Collect Kubernetes metrics

To gather Kubernetes metrics, the following exporters are required:

kube-state-metrics
cAdvisor

As an example, you can define a component with type Kubernetes Container in this way:

name: adservice
description: The adservice of the online boutique by Google
componentType: Kubernetes Container
properties:
  prometheus:
    namespace: boutique
    pod: adservice.*
    container: server

Collect Java metrics

Check the Java OpenJDK page for a list of all the Java metrics available in Akamas.

java -javaagent:the_downloaded_jmx_exporter_jar.jar=9100:config.yaml -jar yourJar.jar

The command will expose on localhost, on port 9100, the Java metrics of yourJar.jar, which can then be scraped by Prometheus.

config.yaml is the configuration file of the exporter. The following configuration is suggested for an optimal experience with the Prometheus provider:

startDelaySeconds: 0
username:
password:
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
# using the property below we are telling the exporter to export only relevant Java metrics
whitelistObjectNames:
- "java.lang:*"
- "jvm:*"

As a next step, add a new scraping target in the configuration of the Prometheus used by the provider:

...
scrape_configs:
# JMX Exporter
- job_name: "jmx"
  static_configs:
  - targets: ["jmx_exporter_host:9100"]

You can then create a YAML file with the definition of a telemetry instance (prom_instance.yml) of the Prometheus provider:

provider: Prometheus
config:
  address: prometheus_host
  port: 9090

And you can create the telemetry instance using the Akamas CLI:

akamas create telemetry-instance prom_instance.yml

Finally, to bind the extracted metrics to the related component, you should add the following field to the properties of the component’s definition:

prometheus:
  job: jmx

Collect system metrics

Check the Linux page for a list of all the system metrics available in Akamas.

systemctl start node_exporter

Here’s the manifest of the node_exporter service:

[Unit]
Description=Node Exporter

[Service]
ExecStart=/path/to/node_exporter/executable

[Install]
WantedBy=default.target

The service will expose on localhost, on port 9100, the system metrics, which can then be scraped by Prometheus.

As a final step, add a new scraping target in the configuration of the Prometheus used by the provider:

scrape_configs:
# Node Exporter
- job_name: "node"
  static_configs:
  - targets: ["node_exporter_host:9100"]
  relabel_configs:
  - source_labels: ["__address__"]
    regex: "(.*):.*"
    # here we put as "instance", the name of the component the metrics refer to
    target_label: "instance"
    replacement: "linux_component_name"
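The effect of the relabeling rule above can be sketched in a few lines of shell. This is an illustration of the matching logic only, with a hypothetical function name; it relies on the fact that Prometheus anchors relabel regexes at both ends, so the regex must match the whole address.

```shell
# relabel_instance ADDRESS REPLACEMENT: mimic the relabel rule above.
relabel_instance() {
  if printf '%s\n' "$1" | grep -Eq '^(.*):.*$'; then
    printf '%s\n' "$2"   # whole address matched: the label gets the replacement
  else
    printf '%s\n' "$1"   # no match: instance keeps its default (the address)
  fi
}

relabel_instance node_exporter_host:9100 linux_component_name
# prints: linux_component_name
```

Because the replacement is a constant, every target of the job ends up with the same instance label, which is why each Linux component typically needs its own job or its own rule.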

You can then create a YAML file with the definition of a telemetry instance (prom_instance.yml) of the Prometheus provider:

provider: Prometheus
config:
  address: prometheus_host
  port: 9090

And you can create the telemetry instance using the Akamas CLI:

akamas create telemetry-instance prom_instance.yml

Finally, to bind the extracted metrics to the related component, you should add the following field to the properties of the component’s definition:

prometheus:
  instance: linux_component_name
  job: node