Cluster should always aggregate metrics in a same order #539

matej21 · 2023-02-03T11:57:12Z

The issue with the current metrics aggregation is that it happens in the order in which individual workers respond to the GET_METRICS_REQ message. This can cause the aggregated result to vary when the metrics contain floating-point values, such as those in a histogram.

For example, if workers respond to the first request with values 0.5848208, 0.5479198, 0.3437699 (which sum to 1.4765105), and then respond to a second request with the same values but in a different order (0.3437699 + 0.5848208 + 0.5479198), the result of the aggregation will be 1.4765104999999998 in JS, due to floating-point rounding errors.
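To reproduce the effect outside of a cluster, here is a minimal sketch that sums the same three values in the two arrival orders described above. `sumInOrder` is a hypothetical helper written for this illustration, not part of prom-client; it only mimics adding values in whatever order they arrive.

```javascript
// Hypothetical helper (not a prom-client API): adds values strictly
// left to right, i.e. in the order the responses arrived.
function sumInOrder(values) {
  return values.reduce((acc, v) => acc + v, 0);
}

// Same three values, two different arrival orders.
const firstArrival = [0.5848208, 0.5479198, 0.3437699];
const secondArrival = [0.3437699, 0.5848208, 0.5479198];

const a = sumInOrder(firstArrival);
const b = sumInOrder(secondArrival);

// Floating-point addition is not associative, so the two sums
// may differ in the last bits even though the inputs are identical.
console.log(a, b, a === b);
```

Any difference is tiny, but it is enough for the aggregated value to move slightly down between two scrapes.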

Because the aggregated value can appear to decrease slightly between scrapes, this can trigger a "reset" detection in Prometheus and severely impact the accuracy of the graphs.
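One way to get a stable result is to sort the worker responses by a fixed key before summing. The sketch below is only an illustration of that idea and not how prom-client is implemented: the `{ workerId, value }` response shape and the `aggregateDeterministically` function are assumptions made up for this example.

```javascript
// Assumed response shape (hypothetical): { workerId: number, value: number }.
// Sorting by workerId fixes the order of the additions, so the same set of
// responses always produces the same floating-point sum.
function aggregateDeterministically(responses) {
  const ordered = [...responses].sort((x, y) => x.workerId - y.workerId);
  return ordered.reduce((acc, r) => acc + r.value, 0);
}

const firstScrape = [
  { workerId: 1, value: 0.5848208 },
  { workerId: 2, value: 0.5479198 },
  { workerId: 3, value: 0.3437699 },
];

// Same responses, but arriving in a different order.
const secondScrape = [firstScrape[2], firstScrape[0], firstScrape[1]];

const stableFirst = aggregateDeterministically(firstScrape);
const stableSecond = aggregateDeterministically(secondScrape);

console.log(stableFirst === stableSecond); // true
```

This does not remove the rounding error itself. It only makes the error identical on every scrape, so an unchanged set of worker values can no longer look like a decrease.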

siimon/prom-client#539

matej21 added a commit to contember/engine that referenced this issue Feb 3, 2023

fix(engine-ee): round ms metrics to avoid prom-client agg errors

1b1e4c3

siimon/prom-client#539

matej21 added a commit to contember/engine that referenced this issue Feb 6, 2023

fix(engine-ee): round ms metrics to avoid prom-client agg errors

5580842

siimon/prom-client#539

matej21 added a commit to contember/engine that referenced this issue Feb 6, 2023

fix(engine-ee): round ms metrics to avoid prom-client agg errors

741b39e

siimon/prom-client#539

zbjornson added the bug and help wanted labels Mar 8, 2023
