The product is still in its early stages. Breaking changes may happen within minor releases until the first major release.
The Rusty AWS CloudWatch Exporter is aimed at the intersection of companies that use AWS infrastructure, particularly AWS proprietary tooling, and that use Prometheus-based metrics.
AWS publishes metrics only through its AWS CloudWatch product, which has its own idiosyncrasies and does not interoperate properly with Prometheus.
This product aims to provide a meaningful translation of metrics from AWS CloudWatch that can then be ingested into a Prometheus-based solution.
1 - Overview
How the exporter fits into the ecosystem to provide the translated metrics.
Architecture
For the Rusty AWS CloudWatch Exporter to serve its purpose, a set of AWS components needs to be set up so that the metrics reach it in the expected format.
AWS CloudWatch Stream
The CloudWatch Stream is the source component. It is the mechanism AWS provides to export metrics to other systems as they are generated.
AWS Kinesis Firehose Delivery Stream
The Kinesis Firehose acts as an aggregator of the metrics coming from the metric stream, minimizing the number of requests that reach the Rusty AWS CloudWatch Exporter.
This introduces some delay, which can be tuned through its configuration options.
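The delay comes from batching. As a simplified illustration, the following Python model assumes a size-or-time buffer: a batch is flushed as soon as either the buffered bytes reach `max_bytes` or the oldest buffered record is `max_seconds` old. The class and parameter names are hypothetical and this is a conceptual sketch, not how Firehose is implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BufferModel:
    """Toy size-or-time buffer; an illustration, not the Firehose API."""

    max_bytes: int      # flush once at least this many bytes are buffered
    max_seconds: float  # flush once the oldest buffered record is this old
    _records: List[bytes] = field(default_factory=list)
    _size: int = 0
    _first_ts: Optional[float] = None

    def add(self, payload: bytes, now: float) -> Optional[List[bytes]]:
        """Buffer one record and return a batch if a threshold is now reached."""
        if self._first_ts is None:
            self._first_ts = now
        self._records.append(payload)
        self._size += len(payload)
        return self.tick(now)

    def tick(self, now: float) -> Optional[List[bytes]]:
        """Flush on elapsed time even when no new records arrive."""
        if not self._records:
            return None
        too_big = self._size >= self.max_bytes
        too_old = now - self._first_ts >= self.max_seconds
        if not (too_big or too_old):
            return None
        batch = self._records
        self._records, self._size, self._first_ts = [], 0, None
        return batch


buf = BufferModel(max_bytes=100, max_seconds=60)
print(buf.add(b"a" * 40, now=0))        # None: 40 bytes, 0 s old
print(buf.add(b"b" * 40, now=10))       # None: 80 bytes, 10 s old
print(len(buf.add(b"c" * 40, now=20)))  # 3: size threshold reached
print(buf.add(b"d", now=30))            # None
print(buf.tick(now=95))                 # [b'd']: time threshold reached
```

Larger thresholds mean fewer, larger requests to the exporter at the cost of more delay; smaller thresholds reduce delay but increase the request count.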
Operating Costs
Good Estimate
Each metric updated at a 1-minute frequency costs approximately $0.13176 per month, with lower-frequency metrics costing proportionally less.
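The figure above can be reproduced with simple arithmetic. The following sketch assumes a Metric Streams rate of $0.003 per 1,000 metric updates and a 30.5-day month; neither value comes from this document, they are chosen because together they yield $0.13176, so check current AWS pricing for your region. It only covers metric updates, not Firehose or data-transfer charges.

```python
# Assumed inputs (not stated in this document): $0.003 per 1,000 metric
# updates and a 30.5-day month, which reproduce the $0.13176 figure.
PRICE_PER_1000_UPDATES_USD = 0.003
SECONDS_PER_MONTH = 30.5 * 24 * 60 * 60


def monthly_stream_cost(num_metrics: int, period_seconds: int) -> float:
    """Approximate monthly cost for metrics updated every `period_seconds`."""
    updates_per_metric = SECONDS_PER_MONTH / period_seconds
    total_updates = num_metrics * updates_per_metric
    return total_updates / 1000 * PRICE_PER_1000_UPDATES_USD


print(f"{monthly_stream_cost(1, 60):.5f}")   # 0.13176 (one metric, 1-minute period)
print(f"{monthly_stream_cost(1, 300):.6f}")  # 0.026352 (one metric, 5-minute period)
```

A 5-minute metric is updated a fifth as often, so it costs a fifth as much, which is the proportionality described above.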
The operating costs will depend on several factors:
The number of metrics being processed and how frequently they are updated
The location where the exporter is running and how Firehose is able to reach it
The region where all the services are running
For a more in-depth explanation you can read this article.
2 - Getting Started
Step-by-step setup of the Rusty AWS CloudWatch Exporter
Prerequisites
Free Trial - License
You can obtain the latest published limited free license here.
You need a valid license stored in a file. The default file name is license.json, but a different name can be used.
AWS Policy
The Rusty AWS CloudWatch Exporter uses the AWS API to obtain resource information that is then used to label
the metrics.
To do so, the exporter must authenticate against the API, and the profile it uses needs the permissions required to
call those APIs.
The following is the minimum set of permissions needed in the profile:
ec2:DescribeInstances
dynamodb:DescribeTable
dynamodb:DescribeGlobalTable
dynamodb:ListTagsOfResource
elasticloadbalancing:DescribeLoadBalancers
elasticloadbalancing:DescribeTags
elasticloadbalancing:DescribeTargetGroups
lambda:GetFunction
sqs:GetQueueUrl
sqs:ListQueueTags
We recommend creating a policy with these permissions and attaching it to the role the service authenticates or runs
with.
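As a starting point, the following Python sketch emits an IAM policy document granting exactly the actions listed above. The statement `Sid` is a hypothetical name, and `"Resource": "*"` is an assumption you may want to scope down to specific ARNs.

```python
import json

# The minimum set of actions listed above.
EXPORTER_ACTIONS = [
    "ec2:DescribeInstances",
    "dynamodb:DescribeTable",
    "dynamodb:DescribeGlobalTable",
    "dynamodb:ListTagsOfResource",
    "elasticloadbalancing:DescribeLoadBalancers",
    "elasticloadbalancing:DescribeTags",
    "elasticloadbalancing:DescribeTargetGroups",
    "lambda:GetFunction",
    "sqs:GetQueueUrl",
    "sqs:ListQueueTags",
]


def build_policy(resource: str = "*") -> dict:
    """Return an identity-based IAM policy allowing the exporter's lookups."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "RustyCloudWatchExporterLookups",  # hypothetical name
                "Effect": "Allow",
                "Action": EXPORTER_ACTIONS,
                "Resource": resource,  # assumption: narrow this if possible
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(), indent=2))
```

The printed JSON can be saved as policy.json and registered with `aws iam create-policy --policy-name <your-policy-name> --policy-document file://policy.json`, then attached to the role.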
Rusty AWS CloudWatch Exporter Setup
Configuration File
Create a configuration file, usually named config.yaml, from the following template:
host_url: 'localhost'
host_port: 3000
ingestion_token: 'some random string'
aws_account_id: '1234567890'
aws_region: 'us-west-2'
aws_profile: 'default'
The meaning of each field is:
host_url and host_port: The IP address and port, respectively, that the service binds to in order to listen for
incoming requests. (Required)
ingestion_token: String used to validate incoming metric ingestion requests. It is also needed when setting up the
AWS Firehose. Metric ingestion requests that do not contain the expected ingestion token in the
X-Amz-Firehose-Access-Key header are rejected by the exporter. (Required)
aws_account_id: The AWS account the resources being monitored belong to. It is used for the AWS API. (Required)
aws_region: The default region to use for the AWS API client. (Required)
aws_profile: Name of the profile used to obtain credentials for the AWS API client. (Required)
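The ingestion_token check described above can be illustrated in Python. This is a sketch of the rule as documented, not the exporter's own (Rust) code; both function names are hypothetical. It generates a random token suitable for ingestion_token and compares the X-Amz-Firehose-Access-Key header against it.

```python
import hmac
import secrets
from typing import Mapping


def generate_ingestion_token(num_bytes: int = 32) -> str:
    """Create a random, URL-safe string suitable for `ingestion_token`."""
    return secrets.token_urlsafe(num_bytes)


def is_authorized(headers: Mapping[str, str], expected_token: str) -> bool:
    """Accept a request only if its Firehose access key matches the token."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    received = normalized.get("x-amz-firehose-access-key")
    if received is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(received.encode("utf-8"), expected_token.encode("utf-8"))


token = generate_ingestion_token()
print(is_authorized({"X-Amz-Firehose-Access-Key": token}, token))  # True
print(is_authorized({"X-Amz-Firehose-Access-Key": "wrong"}, token))  # False
```

The same generated string goes into the exporter's config.yaml and into the Firehose Access Key setting.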
Service Setup
The Rusty AWS CloudWatch Exporter is distributed via a Docker image
through the following repository.
The simplest way to get the container up and running is to execute the following command:
If the configuration file or the license file is in a different location, the following command-line arguments change
where the service looks for them:
--config-file <config-file-path>: Configures the location to search for the configuration file. (Optional)
--license-file <license-file-path>: Configures the location to look for the license file. (Optional)
We make no assumptions about how the service is deployed, as that is too specific to each user's environment and
setup.
It can be deployed in multiple ways, such as running it on a specific host or as part of a Kubernetes pod.
AWS API Credentials
The service relies on the AWS API to retrieve resource information. In order to authenticate against it, it uses
the AWS SDK Authentication mechanisms.
Although you can choose whichever option suits you best, we recommend running the application where it directly
inherits a role that already has the appropriate permissions/policy assigned, for example, within an EC2 instance.
Reachable Endpoint
The HTTP endpoint opened by the exporter service needs to be reachable from the Firehose and it needs to be an HTTPS
endpoint.
The service itself DOES NOT support TLS on the endpoint natively. Provisioning TLS is left to the user of the service.
The options available to achieve this depend on how the service is deployed.
For example, if the exporter runs as a Docker container on a specific EC2 instance, this can be achieved with a
reverse proxy such as nginx configured with a proper certificate.
Metric Streaming Setup
This section assumes that the Rusty AWS CloudWatch Exporter is already set up as described above and running.
What remains is to make AWS send metrics to it. For this, we need to set up a pipeline consisting of a
CloudWatch Stream and a Kinesis Firehose Delivery Stream.
AWS Kinesis Firehose Delivery Stream
This is the first component that needs to be created as the CloudWatch Stream needs to reference it in its creation.
This AWS blog post is a good guide on how to create it.
The important details are that the Source must be set to Direct PUT and the Destination to HTTP Endpoint.
For the HTTP Endpoint URL option, use https://your-reachable-exporter-url/ingestion/firehose.
For the Access Key, use the same string that was set as ingestion_token in the exporter configuration.
The rest of the settings are optional and discretionary based on the environment and setup. A few notable mentions are:
Enabling GZIP content encoding is not required but is recommended.
The Buffer Hints can be tuned in order to minimize latency or improve request efficiency.
The selected Service Access role needs to be capable of reaching the endpoint and making the request.
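To make the delivery format concrete, the following Python sketch unwraps a Firehose HTTP endpoint request. It is based on our reading of AWS's HTTP endpoint delivery specification (a JSON envelope with `requestId`, `timestamp` and `records`, each record carrying base64-encoded `data`, optionally GZIP-compressed as a whole), so verify the field names against that specification. The function names are hypothetical and this is not the exporter's implementation.

```python
import base64
import gzip
import json
from typing import List, Optional, Tuple


def decode_firehose_request(
    body: bytes, content_encoding: Optional[str] = None
) -> Tuple[str, List[bytes]]:
    """Unwrap a Firehose HTTP delivery into (request id, raw record payloads)."""
    if content_encoding is not None and content_encoding.lower() == "gzip":
        body = gzip.decompress(body)  # present when GZIP encoding is enabled
    envelope = json.loads(body)
    # Each record's "data" field is base64-encoded; for a CloudWatch Metric
    # Stream it holds metrics in the stream's configured output format.
    payloads = [base64.b64decode(r["data"]) for r in envelope.get("records", [])]
    return envelope["requestId"], payloads


def build_ack(request_id: str, timestamp_ms: int) -> bytes:
    """JSON body for a successful response, echoing the request id."""
    return json.dumps({"requestId": request_id, "timestamp": timestamp_ms}).encode("utf-8")
```

GZIP only changes the transport encoding of the envelope; the decoded records are identical either way, which is why it is optional but saves bandwidth.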
AWS CloudWatch Metric Stream
Once the AWS Kinesis Firehose Delivery Stream has been set up, the metric source itself, the CloudWatch Metric Stream,
needs to be created.
Create it with the Custom setup with Firehose option, selecting the previously created Firehose delivery stream as the
Kinesis Data Firehose stream to use.
The Output Format must be OpenTelemetry 1.0.
The rest of the configuration options are used to control the set of metrics to be sent by the Metric Stream and are
completely discretionary to the end user based on their needs.
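With the OpenTelemetry 1.0 output format, our understanding of the CloudWatch documentation is that each Firehose record contains one or more protobuf ExportMetricsServiceRequest messages, each prefixed with its length as an unsigned varint. The following Python sketch splits a record into those raw messages under that assumption; decoding each message would additionally require the OpenTelemetry protobuf definitions, which are not shown here. Function names are hypothetical.

```python
from typing import List, Tuple


def read_uvarint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned protobuf varint at `pos`; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:               # high bit clear: last byte
            return result, pos
        shift += 7
        if shift >= 64:
            raise ValueError("varint too long")


def split_length_delimited(buf: bytes) -> List[bytes]:
    """Split one record into the length-prefixed messages it contains."""
    messages = []
    pos = 0
    while pos < len(buf):
        length, pos = read_uvarint(buf, pos)
        end = pos + length
        if end > len(buf):
            raise ValueError("truncated message")
        messages.append(buf[pos:end])
        pos = end
    return messages
```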
3 - Generated Metrics
Under this section you will find the metrics that the exporter is able to consume and how they are exposed in Prometheus format.
Tag Set
The tag set is the set of labels generated from the tags associated with a resource.
Each tag associated with the resource is mapped to a single label of the form tag_<tag_name>.
The value of the label is a verbatim copy of the tag's value.
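The mapping above can be sketched in Python. AWS tag keys may contain characters such as `:`, `-` or `.` that are not valid in Prometheus label names (which must match `[a-zA-Z_][a-zA-Z0-9_]*`). This sketch assumes such characters are replaced with `_`; the exporter's actual sanitization rule is not documented here and may differ. The function name is hypothetical.

```python
import re
from typing import Dict

# Anything outside the Prometheus label-name alphabet.
_INVALID_LABEL_CHARS = re.compile(r"[^a-zA-Z0-9_]")


def tags_to_labels(tags: Dict[str, str]) -> Dict[str, str]:
    """Map AWS tags to `tag_<tag_name>` labels, copying values verbatim.

    Assumption: invalid label-name characters become "_". The "tag_" prefix
    guarantees the label name never starts with a digit.
    """
    return {
        "tag_" + _INVALID_LABEL_CHARS.sub("_", name): value
        for name, value in tags.items()
    }


print(tags_to_labels({"Environment": "prod", "aws:cloudformation:stack-name": "api-stack"}))
```

Note that under this assumed rule, distinct tag keys such as `team-name` and `team.name` would collide on the same label name.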
3.1 - DynamoDB Metrics
Generated metrics for AWS DynamoDB
Info Metrics
aws_dynamodb_table_info
Type: Info
Description: Information regarding a DynamoDB Table resource
Description: The average percentage of provisioned read capacity units utilized by the highest provisioned read table
or global secondary index of the account
Based on: DynamoDB/MaxProvisionedTableReadCapacityUtilization
Description: The minimum percentage of provisioned read capacity units utilized by the highest provisioned read table
or global secondary index of the account
Based on: DynamoDB/MaxProvisionedTableReadCapacityUtilization
Description: The maximum percentage of provisioned read capacity units utilized by the highest provisioned read table
or global secondary index of the account
Based on: DynamoDB/MaxProvisionedTableReadCapacityUtilization
Description: The average percentage of provisioned write capacity units utilized by the highest provisioned write table
or global secondary index of the account
Based on: DynamoDB/MaxProvisionedTableWriteCapacityUtilization
Description: The minimum percentage of provisioned write capacity units utilized by the highest provisioned write table
or global secondary index of the account
Based on: DynamoDB/MaxProvisionedTableWriteCapacityUtilization
Description: The maximum percentage of provisioned write capacity units utilized by the highest provisioned write table
or global secondary index of the account
Based on: DynamoDB/MaxProvisionedTableWriteCapacityUtilization
Labels:
aws_account_id
aws_region
aws_dynamodb_user_errors_total
Type: Counter
Description: The number of HTTP 400 errors for DynamoDB or Amazon DynamoDB Streams requests
Based on: DynamoDB/UserErrors
Labels:
aws_account_id
aws_region
aws_dynamodb_table_count_avg
Type: Gauge
Description: The average number of existing DynamoDB tables
Description: The number of records that DynamoDB failed to replicate to your Kinesis data stream
Based on: DynamoDB/FailedToReplicateRecordCount
Labels:
aws_account_id
aws_region
table_name
delegate_operation
aws_dynamodb_throttled_put_records_total
Type: Counter
Description: The number of records that were throttled by your Kinesis data stream due to insufficient Kinesis Data
Streams capacity
Based on: DynamoDB/ThrottledPutRecordCount
Labels:
aws_account_id
aws_region
table_name
delegate_operation
aws_dynamodb_returned_items_total
Type: Counter
Description: The number of items returned by select operations
Based on: DynamoDB/ReturnedItemCount
Labels:
aws_account_id
aws_region
table_name
operation
aws_dynamodb_successful_requests_total
Type: Counter
Description: The number of successful requests to DynamoDB or Amazon DynamoDB Streams
Based on: DynamoDB/SuccessfulRequestLatency
Labels:
aws_account_id
aws_region
table_name
operation
aws_dynamodb_successful_request_latency_min
Type: Gauge
Description: The min latency within the successful requests to DynamoDB or Amazon DynamoDB Streams
Based on: DynamoDB/SuccessfulRequestLatency
Labels:
aws_account_id
aws_region
table_name
operation
aws_dynamodb_successful_request_latency_avg
Type: Gauge
Description: The average latency within the successful requests to DynamoDB or Amazon DynamoDB Streams
Based on: DynamoDB/SuccessfulRequestLatency
Labels:
aws_account_id
aws_region
table_name
operation
aws_dynamodb_successful_request_latency_max
Type: Gauge
Description: The maximum latency within the successful requests to DynamoDB or Amazon DynamoDB Streams
Based on: DynamoDB/SuccessfulRequestLatency
Labels:
aws_account_id
aws_region
table_name
operation
aws_dynamodb_system_errors_total
Type: Counter
Description: The requests to DynamoDB or Amazon DynamoDB Streams that generate an HTTP 500 status code
Based on: DynamoDB/SystemErrors
Labels:
aws_account_id
aws_region
table_name
operation
aws_dynamodb_throttled_requests
Type: Counter
Description: The requests to DynamoDB that exceed the provisioned throughput limits on a resource, such as a table or an
index
Based on: DynamoDB/ThrottledRequests
Labels:
aws_account_id
aws_region
table_name
operation
aws_dynamodb_returned_bytes_total
Type: Counter
Description: The number of bytes returned by GetRecords operations (Amazon DynamoDB Streams)
Based on: DynamoDB/ReturnedBytes
Labels:
aws_account_id
aws_region
table_name
stream_label
operation
aws_dynamodb_returned_records_total
Type: Counter
Description: The number of stream records returned by GetRecords operations (Amazon DynamoDB Streams)
Based on: DynamoDB/ReturnedRecordsCount
Labels:
aws_account_id
aws_region
table_name
stream_label
operation
aws_dynamodb_pending_replication_count
Type: Gauge
Description: The number of item updates that are written to one replica table, but that have not yet been written to
another replica in the global table
Based on: DynamoDB/PendingReplicationCount
Labels:
aws_account_id
aws_region
table_name
receiving_region
aws_dynamodb_replication_latency_avg
Type: Gauge
Description: The average time it takes for an updated item appearing in the DynamoDB stream to be replicated in the
global table
Based on: DynamoDB/ReplicationLatency
Labels:
aws_account_id
aws_region
table_name
receiving_region
aws_dynamodb_replication_latency_max
Type: Gauge
Description: The maximum time it takes for an updated item appearing in the DynamoDB stream to be replicated in the
global table
Based on: DynamoDB/ReplicationLatency
Labels:
aws_account_id
aws_region
table_name
receiving_region
3.2 - EC2 Metrics
Generated metrics for AWS Elastic Compute Cloud
Info Metrics
aws_ec2_instance_info
Type: Info
Description: Information related to the AWS EC2 Instance resource
Description: The percentage of allocated EC2 compute units that are currently in use on the instance
Based on: EC2/CPUUtilization
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_disk_read_ops_total
Type: Counter
Description: Number of completed read operations from all instance store volumes available to the instance
Based on: EC2/DiskReadOps
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_disk_write_ops_total
Type: Counter
Description: Number of completed write operations from all instance store volumes available to the instance
Based on: EC2/DiskWriteOps
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_disk_read_bytes_total
Type: Counter
Description: Volume of data the application reads from the hard disk of the instance
Based on: EC2/DiskReadBytes
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_disk_write_bytes_total
Type: Counter
Description: Volume of data the application writes onto the hard disk of the instance
Based on: EC2/DiskWriteBytes
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_metadata_no_token_total
Type: Counter
Description: The number of times the instance metadata service was successfully accessed using a method that does not
use a token
Based on: EC2/MetadataNoToken
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_network_in_total
Type: Counter
Description: The number of bytes received by the instance on all network interfaces
Based on: EC2/NetworkIn
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_network_out_total
Type: Counter
Description: The number of bytes sent out by the instance on all network interfaces
Based on: EC2/NetworkOut
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_network_packets_in_total
Type: Counter
Description: The number of packets received by the instance on all network interfaces
Based on: EC2/NetworkPacketsIn
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_network_packets_out_total
Type: Counter
Description: The number of packets sent out by the instance on all network interfaces
Based on: EC2/NetworkPacketsOut
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_cpu_credit_usage_total
Type: Counter
Description: The number of CPU credits spent by the instance for CPU utilization
Based on: EC2/CPUCreditUsage
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_cpu_surplus_credit_balance
Type: Gauge
Description: The number of surplus credits that have been spent by an unlimited instance when its CPUCreditBalance
value is zero
Based on: EC2/CPUSurplusCreditBalance
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_cpu_surplus_credits_charged_total
Type: Counter
Description: The number of spent surplus credits that are not paid down by earned CPU credits, and which thus incur an
additional charge
Based on: EC2/CPUSurplusCreditsCharged
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_dedicated_host_cpu_utilization_avg
Type: Gauge
Description: The percentage of allocated compute capacity that is currently in use by the instances running on the
Dedicated Host
Based on: EC2/DedicatedHostCPUUtilization
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_ebs_read_ops_total
Type: Counter
Description: Completed read operations from all Amazon EBS volumes attached to the instance in a specified period of
time
Based on: EC2/EBSReadOps
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_ebs_write_ops_total
Type: Counter
Description: Completed write operations to all EBS volumes attached to the instance in a specified period of time
Based on: EC2/EBSWriteOps
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_ebs_read_bytes_total
Type: Counter
Description: Bytes read from all EBS volumes attached to the instance in a specified period of time
Based on: EC2/EBSReadBytes
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_ebs_write_bytes_total
Type: Counter
Description: Bytes written to all EBS volumes attached to the instance in a specified period of time
Based on: EC2/EBSWriteBytes
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_status_check_failed
Type: Gauge
Description: Reports whether the instance has passed both the instance status check and the system status check in the
last minute (0 = passed, 1 = failed)
Based on: EC2/StatusCheckFailed
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_status_check_failed_instance
Type: Gauge
Description: Reports whether the instance has passed the instance status check in the last minute (0 = passed, 1 = failed)
Based on: EC2/StatusCheckFailed_Instance
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_status_check_failed_system
Type: Gauge
Description: Reports whether the instance has passed the system status check in the last minute (0 = passed, 1 = failed)
Based on: EC2/StatusCheckFailed_System
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_network_mirror_in_total
Type: Counter
Description: The number of bytes received on all network interfaces by the instance that are mirrored
Based on: EC2/NetworkMirrorIn
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_network_mirror_out_total
Type: Counter
Description: The number of bytes sent out on all network interfaces by the instance that are mirrored
Based on: EC2/NetworkMirrorOut
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_network_packets_mirror_in_total
Type: Counter
Description: The number of packets received on all network interfaces by the instance that are mirrored
Based on: EC2/NetworkPacketsMirrorIn
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_network_packets_mirror_out_total
Type: Counter
Description: The number of packets sent out on all network interfaces by the instance that are mirrored
Based on: EC2/NetworkPacketsMirrorOut
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_network_skip_mirror_in_total
Type: Counter
Description: The number of bytes received, that meet the traffic mirror filter rules, that did not get mirrored
because of production traffic taking priority
Based on: EC2/NetworkSkipMirrorIn
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_network_skip_mirror_out_total
Type: Counter
Description: The number of bytes sent out, that meet the traffic mirror filter rules, that did not get mirrored
because of production traffic taking priority
Based on: EC2/NetworkSkipMirrorOut
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_network_packets_skip_mirror_in_total
Type: Counter
Description: The number of packets received, that meet the traffic mirror filter rules, that did not get mirrored
because of production traffic taking priority
Based on: EC2/NetworkPacketsSkipMirrorIn
Labels:
aws_account_id
aws_region
instance_id
aws_ec2_network_packets_skip_mirror_out_total
Type: Counter
Description: The number of packets sent out, that meet the traffic mirror filter rules, that did not get mirrored
because of production traffic taking priority
Based on: EC2/NetworkPacketsSkipMirrorOut
Labels:
aws_account_id
aws_region
instance_id
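CloudWatch reports values such as EC2/NetworkIn as a per-period sum, while Prometheus counters such as aws_ec2_network_in_total are expected to increase monotonically so that functions like rate() work. The following Python sketch shows one way to bridge the two, assuming each incoming value is the sum for one period and is added to a running total per label set. This is an illustrative assumption about the approach, not a description of the exporter's internals; the class name is hypothetical.

```python
from collections import defaultdict
from typing import Dict


class CounterAccumulator:
    """Turn per-period CloudWatch sums into monotonically increasing totals."""

    def __init__(self) -> None:
        # Keyed by (metric name, sorted label pairs) so each series is separate.
        self._totals = defaultdict(float)

    def observe(self, name: str, labels: Dict[str, str], period_sum: float) -> float:
        """Add one period's sum and return the new cumulative total."""
        if period_sum < 0:
            raise ValueError("counter increments must be non-negative")
        key = (name, tuple(sorted(labels.items())))
        self._totals[key] += period_sum
        return self._totals[key]


acc = CounterAccumulator()
labels = {"aws_account_id": "1234567890", "aws_region": "us-west-2", "instance_id": "i-0abc"}
print(acc.observe("aws_ec2_network_in_total", labels, 1200.0))  # 1200.0
print(acc.observe("aws_ec2_network_in_total", labels, 300.0))   # 1500.0
```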
3.3 - Elastic Load Balancer Metrics
Generated metrics for AWS Elastic Load Balancers
Info Metrics
aws_load_balancer_info
Type: Info
Description: Information related to the AWS Elastic Load Balancer resource generated from the AWS API
Description: The number of subscription attempt failures because the consumer is already subscribed or it exceeds the
number of calls per second allowed
Description: The number of times that your function code is invoked using standard concurrency when all provisioned
concurrency is in use
Based on: Lambda/ProvisionedConcurrencySpilloverInvocations
Labels:
aws_account_id
aws_region
function_name
aws_lambda_dropped_recursive_invocations_total
Type: Counter
Description: The number of times that Lambda has stopped invocation of your function because it’s detected that your
function is part of an infinite recursive loop
Based on: Lambda/RecursiveInvocationsDropped
Labels:
aws_account_id
aws_region
function_name
aws_lambda_duration
Type: Summary
Description: The amount of time that your function code spends processing an event in milliseconds
Based on: Lambda/Duration
Labels:
aws_account_id
aws_region
function_name
aws_lambda_post_runtime_extension_duration
Type: Summary
Description: The cumulative amount of time that the runtime spends running code for extensions after the function code
has completed
Based on: Lambda/PostRuntimeExtensionsDuration
Labels:
aws_account_id
aws_region
function_name
aws_lambda_iterator_age_max
Type: Gauge
Description: The max age of the last record in the event
Based on: Lambda/IteratorAge
Labels:
aws_account_id
aws_region
function_name
aws_lambda_offset_lag
Type: Gauge
Description: The difference in offset between the last record written to a topic and the last record that your
function’s consumer group processed
Based on: Lambda/OffsetLag
Labels:
aws_account_id
aws_region
function_name
aws_lambda_concurrent_executions_avg
Type: Gauge
Description: The average number of function instances that are processing events
Based on: Lambda/ConcurrentExecutions
Labels:
aws_account_id
aws_region
function_name
aws_lambda_concurrent_executions_max
Type: Gauge
Description: The maximum number of function instances that are processing events
Based on: Lambda/ConcurrentExecutions
Labels:
aws_account_id
aws_region
function_name
aws_lambda_provisioned_concurrent_executions_max
Type: Gauge
Description: The maximum number of function instances that are processing events using provisioned concurrency
Based on: Lambda/ProvisionedConcurrentExecutions
Labels:
aws_account_id
aws_region
function_name
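Metrics of type Summary, such as aws_lambda_duration, are exposed in Prometheus as quantile series plus _sum and _count. The following Python sketch renders one period's statistics that way, assuming the minimum and maximum map to quantiles 0 and 1. In a strict Prometheus summary, _sum and _count are cumulative, so a real exporter would also need to accumulate them across periods; this sketch does not. The function name is hypothetical, and label values are assumed to need no escaping.

```python
from typing import Dict


def render_summary(name: str, labels: Dict[str, str], minimum: float,
                   maximum: float, total: float, count: float) -> str:
    """Render one period's statistics as a Prometheus summary (text format)."""
    base = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    sep = "," if base else ""
    lines = [
        f"# TYPE {name} summary",
        f'{name}{{{base}{sep}quantile="0"}} {minimum}',  # minimum as quantile 0
        f'{name}{{{base}{sep}quantile="1"}} {maximum}',  # maximum as quantile 1
        f"{name}_sum{{{base}}} {total}",
        f"{name}_count{{{base}}} {count}",
    ]
    return "\n".join(lines) + "\n"


print(render_summary("aws_lambda_duration", {"function_name": "checkout"},
                     minimum=3.0, maximum=250.0, total=1200.0, count=40.0))
```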
3.6 - SQS Metrics
Generated metrics for the AWS SQS subsystem
Info Metrics
aws_sqs_queue_info
Type: Info
Description: Information related to the SQS resource generated from the AWS API
Description: The number of messages available for retrieval from the queue
Based on: SQS/ApproximateNumberOfMessagesVisible
Labels:
aws_account_id
aws_region
queue_name
aws_sqs_empty_receives_total
Type: Counter
Description: The number of ReceiveMessage API calls that did not return a message
Based on: SQS/NumberOfEmptyReceives
Labels:
aws_account_id
aws_region
queue_name
aws_sqs_messages_deleted_total
Type: Counter
Description: The number of messages deleted from the queue
Based on: SQS/NumberOfMessagesDeleted
Labels:
aws_account_id
aws_region
queue_name
aws_sqs_messages_sent_total
Type: Counter
Description: The number of messages added to a queue
Based on: SQS/NumberOfMessagesSent
Labels:
aws_account_id
aws_region
queue_name
aws_sqs_message_size_min
Type: Gauge
Description: The min size of messages added to a queue
Based on: SQS/SentMessageSize
Labels:
aws_account_id
aws_region
queue_name
aws_sqs_message_size_avg
Type: Gauge
Description: The average size of messages added to a queue
Based on: SQS/SentMessageSize
Labels:
aws_account_id
aws_region
queue_name
aws_sqs_message_size_max
Type: Gauge
Description: The max size of messages added to a queue
Based on: SQS/SentMessageSize
Labels:
aws_account_id
aws_region
queue_name
4 - Free Trial License Agreement
Grant of License
Licensor (Rusty Bones Software) grants Licensee a non-exclusive, non-transferable, revocable license to use the
Software (Rusty AWS CloudWatch Exporter) for evaluation purposes only.
Restrictions
Licensee shall not redistribute or modify the Software.
Ownership and Rights
The Software remains the exclusive property of Licensor, protected by copyright and intellectual property laws.
Non-Warranty
The Software is provided “as is” without warranty of any kind, expressed or implied.
Liability Waiver
Licensor shall not be liable for any damages arising from the use of the Software, including but not limited to,
indirect, incidental, or consequential damages.
Term and Termination
This Agreement shall remain in effect for the duration of the free trial period expressed in the license file, unless
terminated earlier.
Entire Agreement
This Agreement constitutes the entire understanding between the parties concerning the Software.