Metrics

Hermes gathers a large number of metrics that are useful for observing the current state of the system.

Latencies are reported as the 50th, 75th, 95th, 99th and 99.9th percentiles. Rates are measured and averaged over a time window; three windows are used: 1, 5 and 15 minutes.
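
The sketch below shows one way to compute these statistics: a nearest-rank percentile over collected latency samples, and a per-window rate kept as an exponentially weighted moving average, the scheme used by Dropwizard-style meters. Whether Hermes computes its percentiles and rates exactly this way is an assumption; the 5-second tick interval and all names (`percentile`, `EwmaRate`) are illustrative.

```python
import math
from fractions import Fraction


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("percentile of an empty sample set")
    ordered = sorted(samples)
    # Fraction(str(p)) keeps values such as 99.9 exact, avoiding float rounding in the rank.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


class EwmaRate:
    """Exponentially weighted moving average of an event rate, in events per second."""

    TICK_SECONDS = 5  # assumed interval at which the average is updated

    def __init__(self, window_minutes):
        # Smoothing factor chosen so that old ticks fade out over roughly the window.
        self.alpha = 1.0 - math.exp(-self.TICK_SECONDS / (60.0 * window_minutes))
        self.uncounted = 0
        self.rate = None  # no estimate until the first tick

    def mark(self, n=1):
        """Record n events, for example n published messages."""
        self.uncounted += n

    def tick(self):
        """Fold the events seen since the last tick into the moving average."""
        instant_rate = self.uncounted / self.TICK_SECONDS
        self.uncounted = 0
        if self.rate is None:
            self.rate = instant_rate
        else:
            self.rate += self.alpha * (instant_rate - self.rate)


# Percentiles and rate windows as listed in the text above.
PERCENTILES = (50, 75, 95, 99, 99.9)
WINDOWS_MINUTES = (1, 5, 15)
```

Keeping one `EwmaRate` per window means the 1-minute value reacts quickly to a change in traffic, while the 15-minute value smooths out short bursts.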

Frontend

All frontend metrics are prefixed with producer.{hostname}. Most of them are collected in both an aggregated and a per-topic scope.
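
To illustrate the two scopes, the helper below builds a dotted metric path under the documented producer.{hostname} prefix, appending a topic segment for the per-topic variant. Only the prefix comes from this page: the metric name `latency`, the topic `shop.orders`, the position of the topic segment and the replacement of dots with underscores are hypothetical assumptions, not actual Hermes metric names.

```python
def escape_segment(segment):
    # Assumption: dots inside a value are replaced so that a hostname such as
    # "frontend-1.example.com" stays a single segment of the dotted path.
    return segment.replace(".", "_")


def producer_metric(hostname, metric, topic=None):
    """Return the aggregated path, or the per-topic path when a topic is given."""
    segments = ["producer", escape_segment(hostname), metric]
    if topic is not None:
        # Hypothetical layout: the per-topic variant appends the topic name.
        segments.append(escape_segment(topic))
    return ".".join(segments)


print(producer_metric("frontend-1.example.com", "latency"))
print(producer_metric("frontend-1.example.com", "latency", topic="shop.orders"))
```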

Latency

Latency metrics are grouped into two categories:

Each category exposes both broker latency and Hermes latency. Broker latency measures Kafka response times, while Hermes latency measures the time span from receiving a message until sending the response.
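
A minimal sketch of how the two measurements relate, assuming the response is sent only after Kafka has answered: broker latency covers just the Kafka call, so it is nested inside Hermes latency. `send_to_kafka` is a hypothetical stand-in for the producer call, and the exact timer points are assumptions; the injectable `clock` makes the example deterministic.

```python
import time


def timed_publish(message, send_to_kafka, clock=time.perf_counter):
    """Publish one message and return (ack, hermes_latency, broker_latency).

    `send_to_kafka` is hypothetical and assumed to block until Kafka responds.
    """
    received_at = clock()        # message received by Hermes
    # (request parsing and validation would happen here)
    broker_start = clock()       # message handed to Kafka
    ack = send_to_kafka(message)
    broker_end = clock()         # Kafka responded
    # (building the HTTP response would happen here)
    responded_at = clock()       # response ready to be sent to the publisher
    return ack, responded_at - received_at, broker_end - broker_start


# Fake clock in milliseconds: receive at 0, hand off at 10, Kafka answers at 50, respond at 60.
ticks = iter([0, 10, 50, 60])
ack, hermes_ms, broker_ms = timed_publish(b"payload", lambda m: True, clock=lambda: next(ticks))
print(hermes_ms, broker_ms)  # 60 40
```

The difference between the two values is the time Hermes itself spends around the Kafka call.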

Metrics:

Rate

Metrics:

Response codes

These metrics measure global Hermes response codes. They are good candidates for monitoring, since a sudden increase in 202 or 500 status codes might signal an emergency. There are no per-topic metrics for response codes. See the publishing guide for the meaning of each response code.
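
One way to turn "sudden increase" into a concrete check is to compare each status code's 1-minute rate against its own 15-minute rate. The sketch below assumes the rates have already been read into a dictionary; its shape, the `factor` of 3 and the `min_rate` floor are illustrative assumptions rather than recommended values.

```python
def response_code_alerts(rates, codes=("202", "500"), factor=3.0, min_rate=1.0):
    """Return status codes whose 1-minute rate jumped well above the 15-minute rate.

    `rates` maps a status code to {"m1": ..., "m15": ...} in requests per second.
    The dictionary shape, `factor` and `min_rate` are illustrative assumptions.
    """
    alerts = []
    for code in codes:
        window = rates.get(code)
        if window is None:
            continue  # no rate recorded for this code
        m1, m15 = window["m1"], window["m15"]
        # Ignore tiny absolute rates, then compare the short-term rate to the baseline.
        if m1 >= min_rate and m1 > factor * m15:
            alerts.append(code)
    return alerts


rates = {
    "201": {"m1": 500.0, "m15": 480.0},
    "202": {"m1": 40.0, "m15": 2.0},
    "500": {"m1": 0.5, "m15": 0.1},
}
print(response_code_alerts(rates))  # ['202']
```

Comparing against the code's own 15-minute rate, instead of a fixed threshold, lets the same check work for clusters with very different traffic levels.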

Metrics:

Message

There are three metrics related to messages:

Metrics:

Buffers

These metrics indicate the available buffer size for ACK-all buffers:

and for ACK-leader buffers:
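
An available-bytes reading is easier to act on when expressed as utilization. The sketch below assumes the total buffer size is known from configuration (for a Kafka producer this would be its `buffer.memory` setting); the buffer names `ack-all` and `ack-leader` and the 0.9 threshold are hypothetical.

```python
MIB = 1024 * 1024


def buffer_utilization(available_bytes, total_bytes):
    """Fraction of a buffer in use: 0.0 when empty, 1.0 when exhausted."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return (total_bytes - available_bytes) / total_bytes


def nearly_full(available, total, threshold=0.9):
    """Names of buffers whose utilization exceeds `threshold`."""
    return [name for name, free in available.items()
            if buffer_utilization(free, total[name]) > threshold]


available = {"ack-all": 1 * MIB, "ack-leader": 20 * MIB}
total = {"ack-all": 32 * MIB, "ack-leader": 32 * MIB}
print(nearly_full(available, total))  # ['ack-all']
```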

Compression

When a Kafka compression algorithm is in use, these metrics show the average compression rate of messages:
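
To make "compression rate" concrete, the sketch below defines it as compressed size divided by original size, averaged per message, using gzip (one of the codecs Kafka supports) from the Python standard library. This definition is an assumption; Kafka also compresses whole record batches rather than single messages, so per-message numbers are only an approximation.

```python
import gzip


def compression_ratio(payload):
    """Compressed size divided by original size for one payload (lower is better)."""
    if not payload:
        raise ValueError("payload must not be empty")
    return len(gzip.compress(payload)) / len(payload)


def average_compression_ratio(payloads):
    """Mean per-message ratio, or None when there are no payloads."""
    ratios = [compression_ratio(p) for p in payloads]
    return sum(ratios) / len(ratios) if ratios else None


# Highly repetitive payloads compress very well; small or random ones may even grow.
print(average_compression_ratio([b'{"status": "ok"}' * 200, b"a" * 10000]))
```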

Consumers

All consumer metrics are prefixed with consumer.{hostname}. Most of them are collected in both an aggregated and a per-subscription scope.

Subscription metrics

Hermes publishes many metrics that are useful when reasoning about subscriber health and debugging subscriber issues:

Tracker

With tracing enabled, it is possible to observe the tracer queue size and remaining capacity:
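
Queue size and remaining capacity are the two readings of a bounded queue; in Java, `java.util.concurrent.BlockingQueue` exposes them as `size()` and `remainingCapacity()`. The sketch below mirrors that with Python's `queue.Queue`. The `TracerQueue` class and its drop-when-full behaviour are assumptions for illustration, not the Hermes tracker implementation.

```python
import queue


class TracerQueue:
    """Bounded queue reporting its size and remaining capacity (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._queue = queue.Queue(maxsize=capacity)

    def offer(self, event):
        """Enqueue without blocking; return False (event dropped) when full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            return False

    def size(self):
        return self._queue.qsize()

    def remaining_capacity(self):
        return self.capacity - self._queue.qsize()


tracer = TracerQueue(capacity=3)
results = [tracer.offer(f"event-{i}") for i in range(4)]
print(results, tracer.size(), tracer.remaining_capacity())  # [True, True, True, False] 3 0
```

A remaining capacity that stays near zero means trace events are arriving faster than they are drained, so new events are likely being dropped.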