Hermes gathers a large number of metrics that are useful for observing the current state of the system.
Latencies are reported as the 50th, 75th, 95th, 99th and 99.9th percentiles. Rates are averaged over a time window; three windows are measured: 1, 5 and 15 minutes.
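To make the percentile list concrete, here is a minimal Python sketch of how such a summary could be computed from raw latency samples. It uses the nearest-rank method, which is an assumption chosen for illustration; the text does not say how Hermes computes its percentiles. The function names and sample data are hypothetical.

```python
import math
from fractions import Fraction

# Percentiles named in the text, kept as strings so 99.9 stays exact.
PERCENTILES = ("50", "75", "95", "99", "99.9")


def nearest_rank_percentile(samples, percentile):
    """Return the given percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Exact arithmetic avoids float rounding pushing the rank up by one.
    rank = math.ceil(Fraction(percentile) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(samples):
    """Map each percentile label to its value for one batch of samples."""
    return {p: nearest_rank_percentile(samples, p) for p in PERCENTILES}


if __name__ == "__main__":
    latencies_ms = list(range(1, 1001))  # made-up data: 1 ms .. 1000 ms
    print(summarize(latencies_ms))
```

The high percentiles (99 and 99.9) only become meaningful once a batch holds enough samples; with fewer than a thousand samples the 99.9th percentile is simply the maximum.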
Frontend metrics are all prefixed with producer.{hostname}. Most of the metrics are collected in both aggregated and per-topic scope.
Latency metrics are grouped into two categories: ACK-all and ACK-leader.
In those categories it is possible to read both broker latency and Hermes latency. Broker latency measures Kafka response times, while Hermes latency measures the time span from receiving a message to sending the response.
Metrics:
ack-all.broker-latency
ack-all.broker-latency.{groupName}.{topicName}
ack-all.latency
ack-all.latency.{groupName}.{topicName}
ack-leader.broker-latency
ack-leader.broker-latency.{groupName}.{topicName}
ack-leader.latency
ack-leader.latency.{groupName}.{topicName}
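The following Python sketch shows one way the full name of a frontend metric could be assembled from the producer.{hostname} prefix and the {groupName}.{topicName} suffix. It assumes the parts are simply joined with dots; how Hermes escapes dots inside hostnames or group names is not covered here and is left out. The helper name frontend_metric is hypothetical.

```python
def frontend_metric(hostname, metric, group_name=None, topic_name=None):
    """Build a frontend metric path (assumed layout: plain dot-joined parts).

    Without group and topic the aggregated metric name is returned;
    with both, the per-topic variant is returned.
    """
    parts = ["producer", hostname, metric]
    if group_name is not None and topic_name is not None:
        parts += [group_name, topic_name]
    elif group_name is not None or topic_name is not None:
        # A per-topic metric needs both placeholders filled in.
        raise ValueError("group_name and topic_name must be given together")
    return ".".join(parts)


if __name__ == "__main__":
    # Hypothetical host, group and topic names.
    print(frontend_metric("host1", "ack-all.latency"))
    print(frontend_metric("host1", "ack-leader.broker-latency", "orders", "created"))
```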
Metrics:
meter
meter.{groupName}.{topicName}
These metrics measure Hermes response codes globally. They are good candidates for monitoring, as a sudden increase in 202 or 500 status codes might signal an emergency. There are no per-topic metrics for response codes. See the publishing guide for the meaning of each response code.
Metrics:
http-status-codes.code201
http-status-codes.code202
http-status-codes.code408
http-status-codes.code500
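As an illustration of monitoring these codes, the Python sketch below compares two successive readings of the 202 and 500 rates and flags a sudden increase. The threshold values (factor, min_rate), the function name, and the idea of passing readings as dictionaries are all assumptions made for the example, not part of Hermes.

```python
# Status code metrics the text singles out as worth watching.
WATCHED = ("http-status-codes.code202", "http-status-codes.code500")


def sudden_increases(previous, current, factor=3.0, min_rate=1.0):
    """Return watched metrics whose rate jumped between two readings.

    previous/current: dicts mapping metric name to a rate (events per second).
    factor and min_rate are hypothetical tuning knobs: a metric is flagged
    when its current rate is at least min_rate and more than factor times
    the previous rate.
    """
    alerts = []
    for name in WATCHED:
        before = previous.get(name, 0.0)
        now = current.get(name, 0.0)
        if now >= min_rate and now > before * factor:
            alerts.append(name)
    return alerts


if __name__ == "__main__":
    # Made-up readings taken one scrape interval apart.
    earlier = {"http-status-codes.code202": 0.5, "http-status-codes.code500": 0.0}
    later = {"http-status-codes.code202": 5.0, "http-status-codes.code500": 0.2}
    print(sudden_increases(earlier, later))
```

The min_rate floor keeps a jump from near zero to a handful of responses from raising an alert; suitable values depend on the traffic of a given installation.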
There are three metrics related to messages:
Metrics:
parsing-request
parsing-request.{groupName}.{topicName}
message-size
message-size.{groupName}.{topicName}
validation-latency
validation-latency.{groupName}.{topicName}
These metrics report the total and available buffer size for both ACK-all:
everyone-confirms-buffer-total-bytes
everyone-confirms-buffer-available-bytes
and ACK-leader buffers:
leader-confirms-buffer-total-bytes
leader-confirms-buffer-available-bytes
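The total and available byte counts can be combined into a single utilization figure, which is often easier to alert on. The Python sketch below is a minimal example under that assumption; the function name and the sample values are hypothetical.

```python
def buffer_utilization(total_bytes, available_bytes):
    """Return the used fraction of a buffer, between 0.0 and 1.0.

    total_bytes would come from a *-buffer-total-bytes metric and
    available_bytes from the matching *-buffer-available-bytes metric.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if not 0 <= available_bytes <= total_bytes:
        raise ValueError("available_bytes must be between 0 and total_bytes")
    return (total_bytes - available_bytes) / total_bytes


if __name__ == "__main__":
    # Made-up reading: a 256 MiB buffer with 64 MiB still available.
    used = buffer_utilization(256 * 1024 * 1024, 64 * 1024 * 1024)
    print(f"buffer used: {used:.0%}")
```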
When a Kafka compression algorithm is in use, these metrics show the average compression rate of messages:
everyone-confirms-compression-rate
leader-confirms-compression-rate
Consumer metrics are all prefixed with consumer.{hostname}. Most of the metrics are collected in both aggregated and per-subscription scope.
Hermes publishes many metrics that can be useful when reasoning about subscriber health and debugging subscriber issues:
latency
meter
output-rate
status
With tracing enabled, it is possible to observe the tracker queue size and remaining capacity:
tracker-queue-size
tracker-remaining-capacity