What might happen if I use Gauge to record the max/min value with a custom implementation? #13425
Unanswered
KoalaBryson
asked this question in
Q&A
Replies: 0 comments
Currently, Prometheus does not support Delta Temporality, whereas OpenTelemetry does. In some cases, the maximum (max) and minimum (min) values of request latencies are quite important, as mentioned in this issue: open-telemetry/opentelemetry-proto#266.
I am trying to find a solution and would appreciate input from developers here. My objective is to record the max, min, sum, and count values of request latencies per minute.
To begin, I have implemented an MMSC (Max/Min/Sum/Count) aggregator to capture measurement data.
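For reference, a minimal sketch of what such an MMSC aggregator could look like. This is not my production code: the names `MMSC`, `Snapshot`, `Record`, and `Flush` are hypothetical, and it assumes one aggregator per label set with a mutex guarding the current window.

```go
package main

import (
	"fmt"
	"math"
	"sync"
)

// Snapshot is a copy of one finished aggregation window (hypothetical type).
type Snapshot struct {
	Max, Min, Sum float64
	Count         uint64
}

// MMSC accumulates max, min, sum, and count for the current window.
type MMSC struct {
	mu  sync.Mutex
	cur Snapshot
}

// emptySnapshot uses -Inf/+Inf so the first recorded value becomes both max and min.
func emptySnapshot() Snapshot {
	return Snapshot{Max: math.Inf(-1), Min: math.Inf(1)}
}

// NewMMSC returns an aggregator with an empty window.
func NewMMSC() *MMSC {
	return &MMSC{cur: emptySnapshot()}
}

// Record folds one measurement into the current window.
func (m *MMSC) Record(v float64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.cur.Max = math.Max(m.cur.Max, v)
	m.cur.Min = math.Min(m.cur.Min, v)
	m.cur.Sum += v
	m.cur.Count++
}

// Flush returns the finished window and starts a new one under the same lock,
// so no measurement is lost or counted twice between windows.
func (m *MMSC) Flush() Snapshot {
	m.mu.Lock()
	defer m.mu.Unlock()
	s := m.cur
	m.cur = emptySnapshot()
	return s
}

func main() {
	agg := NewMMSC()
	for _, latencyMs := range []float64{12, 7, 30} {
		agg.Record(latencyMs)
	}
	s := agg.Flush()
	fmt.Printf("max=%v min=%v sum=%v count=%d\n", s.Max, s.Min, s.Sum, s.Count)
	// prints: max=30 min=7 sum=49 count=3
}
```

Note that a window with no measurements flushes with Max = -Inf and Min = +Inf, so a caller would probably want to check Count before exporting it.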
Solution 1:
I registered an MMSC backed by four GaugeVecs, one each for the max, min, count, and mean values. A cron task resets the GaugeVecs periodically (for example, every 10 seconds), and the Prometheus scrapers are configured to fetch metrics at the same interval.
The PromQL queries would be:
The issue is that the Prometheus scrape interval is tied to the client's reset window: the two must be equal, otherwise a scrape can miss a window entirely or read the same window twice.
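To make the coupling concrete, here is a small simulation, a sketch under simplifying assumptions rather than real Prometheus behaviour: one measurement per tick, a max gauge that is reset every `resetEvery` ticks, and a scraper that reads it every `scrapeEvery` ticks, just before any reset on that tick. The function name `scrapedMax` is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// scrapedMax returns the largest value a scraper ever sees when the gauge
// is reset every resetEvery ticks and scraped every scrapeEvery ticks.
func scrapedMax(values []float64, resetEvery, scrapeEvery int) float64 {
	gauge := math.Inf(-1) // current window's max; -Inf means "empty"
	seen := math.Inf(-1)  // largest value observed by the scraper
	for i, v := range values {
		tick := i + 1
		gauge = math.Max(gauge, v)
		if tick%scrapeEvery == 0 {
			seen = math.Max(seen, gauge) // scrape happens before the reset
		}
		if tick%resetEvery == 0 {
			gauge = math.Inf(-1) // client-side reset starts a new window
		}
	}
	return seen
}

func main() {
	latencies := []float64{5, 90, 5, 5, 5, 5} // the spike lands in the first window
	fmt.Println("scrape interval == reset window:", scrapedMax(latencies, 2, 2))
	fmt.Println("scrape interval  > reset window:", scrapedMax(latencies, 2, 6))
}
```

With equal intervals the scraper sees the spike of 90. When the scrape interval is three times the reset window, the spike is reset away before any scrape and the scraper only ever sees 5.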
Solution 2:
Implement a MetricExporter that translates MMSC metrics into Prometheus TimeSeries and uses the remote write protocol to write the TimeSeries into the Prometheus time series database.
The problem with this approach is that I need to design an appropriate fault-tolerance mechanism for the push action.
A well-established approach is for client agents to collect metrics, with collectors then gathering the metrics and writing them into storage.
Additionally, OpenTelemetry offers a method for applications to write directly to the backend, but it is recommended for use only in test/dev environments.
Most of my metrics are collected using the Prometheus SDK, with OpenTelemetry serving as a supplement. However, I want to keep OpenTelemetry lightweight; for instance, I would prefer not to run its collector.
Solution 3:
Use aggregated metric logs instead of aggregators in Prometheus/OpenTelemetry.
Solution 3 might serve as a plan B if Prometheus truly isn't suitable for this task.
I'm considering what sort of thorny issues might arise with approaches 1 and 2. For example, in approach 2, I would need to design a considerable amount of fault tolerance.