CloudWatch Exporter

This page describes how to set up a CloudWatch exporter to gather AWS metrics through the Prometheus provider. This is especially useful for monitoring system metrics when you don’t have direct SSH access to AWS resources such as EC2 instances, or when you want to gather AWS-specific metrics that are not available in the guest OS.

AWS policies

To fetch metrics from CloudWatch, the exporter requires an IAM user or role with the following privileges:

  • cloudwatch:GetMetricData

  • cloudwatch:GetMetricStatistics

  • cloudwatch:ListMetrics

  • tag:GetResources

You can assign the AWS-managed policies CloudWatchReadOnlyAccess and ResourceGroupsandTagEditorReadOnlyAccess to the desired user or role to enable these permissions.
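
As an alternative to the managed policies, a custom policy limited to the four actions listed above could look like the following sketch. The file name `cloudwatch-exporter-policy.json`, the policy name, and the user name `cloudwatch-exporter` are placeholders chosen for illustration; they are not values taken from the exporter documentation.

```shell
#!/bin/bash
# Sketch: write a least-privilege IAM policy for the exporter to a local file.
# POLICY_FILE is a hypothetical name; adjust it to your environment.
POLICY_FILE="${POLICY_FILE:-cloudwatch-exporter-policy.json}"

cat > "$POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricData",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics",
        "tag:GetResources"
      ],
      "Resource": "*"
    }
  ]
}
EOF

echo "Policy written to $POLICY_FILE"

# To attach it as an inline policy (user and policy names are placeholders):
# aws iam put-user-policy --user-name cloudwatch-exporter \
#   --policy-name cloudwatch-exporter-read \
#   --policy-document "file://$POLICY_FILE"
```

The `aws iam put-user-policy` call is left commented out so that the snippet only produces the file; review the policy before attaching it.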

Exporter configuration

The CloudWatch exporter repository is available on the official project page. It requires a minimal configuration to fetch metrics from the desired AWS instances. Below is a short list of the parameters needed for a minimal configuration:

  • region: AWS region of the monitored resource

  • metrics: a list of objects containing filters for the exported metrics

    • aws_namespace: the namespace of the monitored resource

    • aws_metric_name: the name of the AWS metric to fetch

    • aws_dimensions: the dimensions to expose as labels

    • aws_dimension_select: the dimensions to filter on

    • aws_statistics: the list of metric statistics to expose

    • aws_tag_select: optional tags to filter on

      • tag_selections: map containing the list of values to select for each tag

      • resource_type_selection: resource type to fetch the tags from (see: Resource Types)

      • resource_id_dimension: dimension to use for the resource id (see: Resource Types)

For a complete list of possible values for namespaces, metrics, and dimensions, please refer to the official AWS CloudWatch User Guide.

Notice: AWS bills CloudWatch usage in batches of 1 million requests, where every metric counts as a single request. To avoid unnecessary expenses, configure only the metrics you need.

region: us-east-2
metrics:
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_statistics: [Average]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: NetworkIn
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: NetworkOut
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: NetworkPacketsIn
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: NetworkPacketsOut
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: CPUCreditUsage
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: CPUCreditBalance
    aws_statistics: [Average]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSReadOps
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSWriteOps
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSReadBytes
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSWriteBytes
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSIOBalance%
    aws_statistics: [Average]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSByteBalance%
    aws_statistics: [Average]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId
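
Since every entry above differs only in its metric name and statistic, the file can also be generated rather than maintained by hand. The following is a minimal sketch: `emit_metric`, `generate_config`, and the `TAG_NAMES` variable are hypothetical helpers introduced here, and only the first three metrics are listed.

```shell
#!/bin/bash
# Sketch: generate the repetitive exporter configuration from a metric list.
# emit_metric, generate_config and TAG_NAMES are illustrative names,
# not part of the exporter.
TAG_NAMES="${TAG_NAMES:-server1, server2}"

emit_metric() {  # usage: emit_metric <metric name> <statistic>
  cat <<EOF
  - aws_namespace: AWS/EC2
    aws_metric_name: $1
    aws_statistics: [$2]
    aws_dimensions: [InstanceId]
    aws_tag_select:
      tag_selections:
        Name: [$TAG_NAMES]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

EOF
}

generate_config() {
  echo "region: us-east-2"
  echo "metrics:"
  # One "name:statistic" pair per metric; extend the list as needed.
  for pair in CPUUtilization:Average NetworkIn:Sum NetworkOut:Sum; do
    emit_metric "${pair%%:*}" "${pair##*:}"
  done
}

generate_config
```

Redirecting the output to a file (for example `bash generate.sh > cloudwatch-exporter.yaml`, where `generate.sh` is whatever name you save the sketch under) yields a configuration with the same structure as the one above.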

The suggested deployment mode for the exporter is through a Docker image. The following snippet provides a command line example to run the container (remember to provide your AWS credentials if needed and the path of the configuration file):

docker run -d --name cloudwatch_exporter \
  -p 9106:9106 \
  -v "$(pwd)/cloudwatch-exporter.yaml:/config/config.yml" \
  -e AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}" -e AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY}" \
  prom/cloudwatch-exporter

You can refer to the official guide for more details or alternative deployment modes.
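
Once the container is up, it can be useful to confirm that the exporter is actually serving CloudWatch data before wiring it into Prometheus. The sketch below assumes the exporter is reachable on `localhost:9106` (the port published in the command above) and that exported series start with the `aws_` prefix, as the `aws_rds_cpuutilization_sum` metric used later on this page does. `count_aws_series` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: count the CloudWatch series found in a Prometheus exposition payload.
# count_aws_series is an illustrative helper; it reads the payload from stdin
# and skips comment lines (# HELP / # TYPE) because they do not start with aws_.
count_aws_series() {
  # grep -c prints 0 but exits non-zero when nothing matches; keep the exit code clean
  grep -c '^aws_' || true
}

# Example against a running exporter (host and port are assumptions):
# curl -sS http://localhost:9106/metrics | count_aws_series
```

A count of zero usually points to missing permissions, a tag selection that matches no resource, or metrics that CloudWatch has not aggregated yet.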

Prometheus configuration

In order to scrape the newly created exporter, add a new job to the Prometheus configuration file. You will also need to define relabeling rules that add the instance label Akamas requires to filter the incoming metrics.

Notice: AWS bills CloudWatch usage in batches of 1 million requests, where every metric counts as a single request. To avoid unnecessary expenses, configure an appropriate scraping interval.

In the example below, the instance label is copied from the instance’s Name tag:

scrape_configs:
  - job_name: cloudwatch_exporter
    scrape_interval: 60s
    scrape_timeout: 30s
    static_configs:
      - targets: [cloudwatch_exporter:9106]
    metric_relabel_configs:
      - source_labels: [tag_Name]
        regex: '(.+)'
        target_label: instance

Additional workflow task

Once you have configured the exporter in the Prometheus configuration, you can start fetching metrics using the Prometheus provider. The following sections describe some scripts you can add as tasks in your workflow.

Wait for metrics

Note that CloudWatch may need a few minutes to aggregate the statistics at the configured granularity, which can cause the telemetry provider to fail when it tries to fetch data points that are not available yet. To avoid this, you can add a task at the end of your workflow that uses an Executor operator to wait for the CloudWatch metrics to be ready. The following script is an example implementation:

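# Poll Prometheus until the given metric has data points at the reference time.
METRIC=aws_rds_cpuutilization_sum   # metric to check for
DELAY_SEC=15                        # pause between attempts
RETRIES=60                          # give up after this many attempts

# Reference timestamp in UTC, so that the trailing Z is accurate
NOW=$(date -u +'%FT%T.%3NZ')

for i in $(seq "$RETRIES"); do
  sleep "$DELAY_SEC"
  # jq -e exits non-zero when the result list is empty, so the loop keeps polling
  curl -sS "http://prometheus_host/api/v1/query?query=${METRIC}&time=${NOW}" | jq -ce '.data.result[]' && exit 0
done

# No data point showed up within the allotted retries
exit 255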

Start/stop the exporter as needed

Since Amazon bills your CloudWatch queries, it is wise to run the exporter only when needed. The following script lets you manage the exporter from the workflow by adding the following tasks:

  • start the container right before the beginning of the load test (command: bash script.sh start)

  • stop the container after the metrics publication, as described in the previous section (command: bash script.sh stop).

#!/bin/bash

set -e

CMD=$1
CONT_NAME=cloudwatch_exporter

# Remove the exporter container if it exists (running or stopped)
stop_cont() {
  if [ -n "$(docker ps -aq -f "name=${CONT_NAME}")" ]; then
    echo "Removing ${CONT_NAME}"
    docker rm -f "${CONT_NAME}"
  fi
}

case $CMD in
  stop|remove)
    stop_cont
    ;;

  start)
    stop_cont

    # Read the credentials from the AWS credentials file (first matching entry only)
    AWS_ACCESS_KEY_ID=$(awk 'BEGIN { FS = "=" } /aws_access_key_id/ { print $2; exit }' ~/.aws/credentials | tr -d '[:space:]')
    AWS_SECRET_ACCESS_KEY=$(awk 'BEGIN { FS = "=" } /aws_secret_access_key/ { print $2; exit }' ~/.aws/credentials | tr -d '[:space:]')

    echo "Starting container ${CONT_NAME}"
    docker run -d --name "${CONT_NAME}" \
      -p 9106:9106 \
      -v ~/oracle-database/utils/cloudwatch-exporter.yaml:/config/config.yml \
      -e AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}" -e AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY}" \
      prom/cloudwatch-exporter
    ;;

  *)
    echo "Unrecognized option ${CMD}"
    exit 255
    ;;
esac

Custom configuration file

The example below is the Akamas-supported configuration, fetching metrics of EC2 instances named server1 and server2.

region: us-east-2
metrics:
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_statistics: [Average]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: NetworkIn
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: NetworkOut
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: NetworkPacketsIn
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: NetworkPacketsOut
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: CPUCreditUsage
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: CPUCreditBalance
    aws_statistics: [Average]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSReadOps
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSWriteOps
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSReadBytes
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSWriteBytes
    aws_statistics: [Sum]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSIOBalance%
    aws_statistics: [Average]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId

  - aws_namespace: AWS/EC2
    aws_metric_name: EBSByteBalance%
    aws_statistics: [Average]
    aws_dimensions: [InstanceId]
    # aws_dimension_select:
    #   InstanceId: [i-XXXXXXXXXXXXXXXXX]
    aws_tag_select:
      tag_selections:
        Name: [server1, server2]
      resource_type_selection: ec2:instance
      resource_id_dimension: InstanceId
