Hermes gathers a large number of metrics that are useful for observing the current state of the system.
Latencies are reported as the 50th, 75th, 95th, 99th and 99.9th percentiles. Rates are measured and averaged over a time window. Three time windows are measured: 1, 5 and 15 minutes.
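To make the percentile list concrete, here is a minimal sketch that computes the same five percentiles from a batch of latency samples. It uses the nearest-rank method, which is an assumption made for illustration and not necessarily how Hermes or its metrics library computes them. The names `PERCENTILES`, `nearest_rank` and `summarize` are hypothetical.

```python
# Percentiles from the text, expressed in per-mille so that the rank
# calculation can use exact integer arithmetic.
PERCENTILES = {"p50": 500, "p75": 750, "p95": 950, "p99": 990, "p99.9": 999}


def nearest_rank(samples, permille):
    """Return the nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # 1-based rank: ceil(permille * n / 1000), computed with integers only.
    rank = (permille * len(ordered) + 999) // 1000
    return ordered[max(rank, 1) - 1]


def summarize(samples):
    """Map each percentile label to its value for the given samples."""
    return {label: nearest_rank(samples, pm) for label, pm in PERCENTILES.items()}


if __name__ == "__main__":
    latencies_ms = list(range(1, 1001))  # made-up samples: 1 ms .. 1000 ms
    print(summarize(latencies_ms))
```

With very few samples the high percentiles collapse onto the maximum value, which is worth keeping in mind when reading p99 or p99.9 for a low-traffic topic.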
Frontend metrics are all prefixed with producer.{hostname}. Most metrics are collected in both aggregated and per-topic scope.
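As an illustration of the naming scheme, the sketch below assembles a full frontend metric path from the producer.{hostname} prefix, a metric name, and an optional {groupName}.{topicName} suffix. The helper `frontend_metric` is hypothetical, and the sketch assumes the hostname is used exactly as given; how Hermes escapes characters in hostnames is not covered here.

```python
def frontend_metric(hostname, metric, group=None, topic=None):
    """Build a frontend metric path.

    Without group/topic the aggregated metric path is returned; with both,
    the per-topic path is returned.
    """
    if (group is None) != (topic is None):
        raise ValueError("group and topic must be given together")
    parts = ["producer", hostname, metric]
    if group is not None:
        parts += [group, topic]
    return ".".join(parts)


if __name__ == "__main__":
    # Aggregated scope
    print(frontend_metric("host1", "ack-all.latency"))
    # Per-topic scope (group and topic names are made up)
    print(frontend_metric("host1", "ack-all.latency", "my-group", "orders"))
```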
Latency metrics are grouped into two categories, ack-all and ack-leader, which match the prefixes of the metric names:
In both categories it is possible to read broker latency and Hermes latency. Broker latency measures Kafka response times, while Hermes latency measures the time span between receiving a message and sending the response.
Metrics:
- ack-all.broker-latency
- ack-all.broker-latency.{groupName}.{topicName}
- ack-all.latency
- ack-all.latency.{groupName}.{topicName}
- ack-leader.broker-latency
- ack-leader.broker-latency.{groupName}.{topicName}
- ack-leader.latency
- ack-leader.latency.{groupName}.{topicName}
Metrics:
- meter
- meter.{groupName}.{topicName}
These metrics measure global Hermes response codes. They make good monitoring metrics, as a sudden increase in 202 or 500 status codes might signal an emergency. There are no per-topic metrics for response codes. See the publishing guide for the meaning of response codes.
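The kind of check that the response code metrics enable can be sketched as follows: compare the current rate of the 202 and 500 status code metrics against a baseline and flag a sudden increase. The function `find_spikes`, the `factor` and `min_rate` thresholds, and the sample rates are all hypothetical; suitable values depend on the traffic of a given deployment.

```python
# Status code metrics worth watching, per the text above.
WATCHED = ("http-status-codes.code202", "http-status-codes.code500")


def find_spikes(baseline, current, factor=3.0, min_rate=1.0):
    """Return the watched metrics whose rate rose sharply.

    baseline, current: dicts mapping metric name to a rate (events per second).
    A metric is flagged when its current rate is at least min_rate and more
    than factor times its baseline rate.
    """
    spikes = []
    for name in WATCHED:
        before = baseline.get(name, 0.0)
        now = current.get(name, 0.0)
        if now >= min_rate and now > factor * before:
            spikes.append(name)
    return spikes


if __name__ == "__main__":
    baseline = {"http-status-codes.code202": 2.0, "http-status-codes.code500": 0.1}
    current = {"http-status-codes.code202": 2.5, "http-status-codes.code500": 4.0}
    print(find_spikes(baseline, current))
```

The `min_rate` floor keeps a jump from near zero to a still negligible rate from being reported as a spike.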
Metrics:
- http-status-codes.code201
- http-status-codes.code202
- http-status-codes.code408
- http-status-codes.code500
There are three metrics related to messages:
Metrics:
- parsing-request
- parsing-request.{groupName}.{topicName}
- message-size
- message-size.{groupName}.{topicName}
- validation-latency
- validation-latency.{groupName}.{topicName}
These metrics indicate the total and available buffer size for both ACK-all:
- everyone-confirms-buffer-total-bytes
- everyone-confirms-buffer-available-bytes
and ACK-leader buffers:
- leader-confirms-buffer-total-bytes
- leader-confirms-buffer-available-bytes
When a Kafka compression algorithm is in use, these metrics show the average compression rate of messages:
- everyone-confirms-compression-rate
- leader-confirms-compression-rate
Consumer metrics are all prefixed with consumer.{hostname}. Most metrics are collected in both aggregated and per-subscription scope.
Hermes publishes many metrics that are useful when reasoning about subscriber health and debugging subscriber issues:
- latency
- meter
- output-rate
- status
With tracing enabled, it is possible to observe the tracker queue size and remaining capacity:
- tracker-queue-size
- tracker-remaining-capacity
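The two tracker metrics can be combined into a single fill ratio, as in the rough sketch below. It assumes that the queue's total capacity equals the current size plus the remaining capacity, which holds for a typical bounded queue but is not stated by the source. The function name `tracker_queue_fill` and the sample values are hypothetical.

```python
def tracker_queue_fill(queue_size, remaining_capacity):
    """Return the fraction of the tracker queue in use (0.0 to 1.0).

    Assumes total capacity = queue_size + remaining_capacity.
    """
    if queue_size < 0 or remaining_capacity < 0:
        raise ValueError("metric values must be non-negative")
    capacity = queue_size + remaining_capacity
    if capacity == 0:
        raise ValueError("total capacity must be positive")
    return queue_size / capacity


if __name__ == "__main__":
    # Made-up readings of tracker-queue-size and tracker-remaining-capacity
    print(tracker_queue_fill(250, 750))
```

A ratio that stays close to 1.0 would suggest the queue is filling faster than it is drained.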