Add documentation of request and latency metrics, backups to docs #496

damonfeldman · 2023-03-01T16:32:35Z

Dgraph metrics allow counting of successful and errored queries and mutations, as well as computing latency averages. This is not documented, and the relevant metric names contain "latency", which is confusing because the same metrics also provide request counts.

Note that the most reliable list of available metrics is the response from the /prometheus and similar endpoints on a running alpha, which returns all metric names (formatted for Prometheus) along with short descriptions. Dgraph also documents key metrics at https://dgraph.io/docs/deploy/metrics/#activity-metrics, and the latency/count metrics should be documented there.

The specific metrics to know about are:

dgraph_grpc_io_client_roundtrip_latency_count
dgraph_grpc_io_client_roundtrip_latency_sum

Together, these can be used to compute the average latency of all requests (sum divided by count). Both also have a "method" and a "status" property, which distinguish query from mutation and success from error respectively, so they can be used to count queries and errors as well as to measure latency.
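
As a rough sketch of the average-latency calculation, something like the following PromQL could go in the docs. It assumes the property is exposed as a Prometheus label literally named `method` (the exact label name should be checked against the endpoint output on a running alpha), and the 5m window is an arbitrary choice:

```promql
# Average round-trip latency over the last 5 minutes, across all requests.
# Dividing the rate of the _sum series by the rate of the _count series
# gives the mean latency per request in that window.
sum(rate(dgraph_grpc_io_client_roundtrip_latency_sum[5m]))
/
sum(rate(dgraph_grpc_io_client_roundtrip_latency_count[5m]))

# The same average, broken down per method.
# ASSUMPTION: the label is named "method"; verify on your deployment.
sum by (method) (rate(dgraph_grpc_io_client_roundtrip_latency_sum[5m]))
/
sum by (method) (rate(dgraph_grpc_io_client_roundtrip_latency_count[5m]))
```

The result is in whatever unit the _sum metric reports; the short description returned by the metrics endpoint should state it.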

To compute query and error rates, use dgraph_grpc_io_client_roundtrip_latency_count only. Despite "latency" in its name, it is a categorized count of all operations, so it can be used to count operations generally.
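
A possible example for the docs, again only a sketch: it assumes the two properties appear as labels literally named `method` and `status`, and it groups by them rather than filtering on specific label values, since the exact values are not listed here:

```promql
# Operations per second over the last 5 minutes, split by method and status.
# ASSUMPTION: labels are named "method" and "status"; verify on your deployment.
sum by (method, status) (
  rate(dgraph_grpc_io_client_roundtrip_latency_count[5m])
)

# Total operations per second, regardless of method or status.
sum(rate(dgraph_grpc_io_client_roundtrip_latency_count[5m]))
```

An error rate can then be derived by selecting only the series whose status indicates a failure, once the actual status values have been confirmed from the endpoint output.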

--

Also, dgraph_num_backups_total should be used to monitor when backups have happened (typically via PromQL such as rate(dgraph_num_backups_total[5m]) or similar), so that any slow or unusual activity can be correlated with backup activity where relevant.
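
One alternative to rate() that may read more naturally on a dashboard is increase(), which reports how much the counter grew in a window. This is only a sketch; the 1h window is an arbitrary choice:

```promql
# Approximate number of backups recorded in the last hour.
# A non-zero value marks a window in which backup activity occurred.
increase(dgraph_num_backups_total[1h])
```

Plotting this alongside the latency queries makes it easier to see whether a latency spike lines up with a backup.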


damonfeldman added the bug Something isn't working label Mar 1, 2023
