Dgraph metrics make it possible to count successful and failed queries and mutations and to compute average latency, but this is not documented, and the relevant metric names contain "latency", which is confusing.
Note that the most reliable list of available metrics is the response from the /prometheus and similar endpoints on a running Alpha, which return all metric names (formatted for Prometheus) along with short descriptions. Dgraph also documents key metrics at https://dgraph.io/docs/deploy/metrics/#activity-metrics, and the latency/count metrics should be documented there.
Together, these latency/count metrics can be used to compute the average latency of all requests. Both also have a "method" and a "status" label, which distinguish query from mutation and success from error respectively, so they can be used to count queries and errors as well as to measure latency.
To compute query and error rates, use dgraph_grpc_io_client_roundtrip_latency_count only. Despite the "latency" in its name, it is a categorized count of all operations, so it can be used to count operations generally.
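As a rough illustration of the arithmetic described above, here is a minimal Python sketch that parses Prometheus-formatted text and derives an average latency and an error fraction. Several things in it are assumptions, not confirmed Dgraph behavior: the `_sum` series name (assumed to sit next to `_count`, following the usual Prometheus convention; the issue only names the `_count` series), the label values `query`, `mutation`, `ok` and `error`, and all sample numbers. The helper names are hypothetical. The counters are cumulative, so a true rate would need the difference between two scrapes divided by the elapsed time; this sketch only works on a single scrape.

```python
from collections import defaultdict

# Hypothetical scrape output. The _sum name, label values, and numbers are
# illustrative assumptions, not captured from a real Alpha.
SAMPLE = """\
# HELP lines and other comments are skipped by the parser
dgraph_grpc_io_client_roundtrip_latency_sum{method="query",status="ok"} 120.0
dgraph_grpc_io_client_roundtrip_latency_count{method="query",status="ok"} 40
dgraph_grpc_io_client_roundtrip_latency_sum{method="query",status="error"} 30.0
dgraph_grpc_io_client_roundtrip_latency_count{method="query",status="error"} 10
dgraph_grpc_io_client_roundtrip_latency_sum{method="mutation",status="ok"} 250.0
dgraph_grpc_io_client_roundtrip_latency_count{method="mutation",status="ok"} 50
"""

PREFIX = "dgraph_grpc_io_client_roundtrip_latency"


def parse_labels(text):
    """Parse 'a="x",b="y"' into a dict (simplified: no commas inside values)."""
    labels = {}
    for pair in text.split(","):
        if "=" in pair:
            key, value = pair.split("=", 1)
            labels[key.strip()] = value.strip().strip('"')
    return labels


def parse(text):
    """Return {(method, status): {"sum": float, "count": float}}."""
    series = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Simplified: assumes no trailing timestamp after the value.
        name_and_labels, value = line.rsplit(" ", 1)
        name, _, label_text = name_and_labels.partition("{")
        for suffix in ("sum", "count"):
            if name == f"{PREFIX}_{suffix}":
                labels = parse_labels(label_text.rstrip("}"))
                key = (labels.get("method"), labels.get("status"))
                series[key][suffix] = float(value)
    return dict(series)


def average_latency(series):
    """Total latency sum divided by total operation count, or None if empty."""
    total_sum = sum(s.get("sum", 0.0) for s in series.values())
    total_count = sum(s.get("count", 0.0) for s in series.values())
    return total_sum / total_count if total_count else None


def error_fraction(series, error_status="error"):
    """Share of operations whose status label equals error_status."""
    total = sum(s.get("count", 0.0) for s in series.values())
    errors = sum(
        s.get("count", 0.0)
        for (_, status), s in series.items()
        if status == error_status
    )
    return errors / total if total else None


if __name__ == "__main__":
    parsed = parse(SAMPLE)
    print("average latency:", average_latency(parsed))
    print("error fraction:", error_fraction(parsed))
```

Only the `_count` series feeds `error_fraction`, matching the point that the count metric alone is enough for query and error counting; the `_sum` series is needed only for the average.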
--
Also, dgraph_num_backups_total should be used to monitor when backups have happened (typically via PromQL such as rate(dgraph_num_backups_total[5m]) or similar), so that any slow or unusual activity can be correlated with backup activity if that is relevant.
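For correlating slowdowns with backups outside of Prometheus, the same idea can be sketched in Python: given timestamped samples of the counter, report the intervals in which it increased. The function name, the sample format, and the sample values are hypothetical. Unlike PromQL's rate(), this sketch does not handle counter resets (for example after a restart); a drop in the value is simply ignored.

```python
def backup_windows(samples):
    """Return (start, end) time pairs in which the backup counter increased.

    samples: list of (timestamp_seconds, counter_value) tuples sorted by time,
    e.g. successive scraped values of dgraph_num_backups_total.
    """
    windows = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v1 > v0:  # at least one backup completed between t0 and t1
            windows.append((t0, t1))
    return windows


if __name__ == "__main__":
    # Hypothetical scrapes taken every 300 seconds.
    scrapes = [(0, 3), (300, 3), (600, 4), (900, 4)]
    print(backup_windows(scrapes))
```

Any latency spike that falls inside one of the returned windows is a candidate for being backup-related.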