Add float histograms and gauge histograms to proto spec #58

beorn7 · 2022-06-14T18:12:43Z

This is only for the sparsehistogram branch!

@codesome as discussed before. I'm hereby starting the work to support float histograms and gauge histograms in the exposition format (and ultimately in TSDB and federation).

@bboreham to double check if my protobuf handling makes sense here. The idea is that the common case of a normal histogram doesn't look different at all on the wire.

@cstyan & @csmarchbanks Note that this is not yet an update of the remote-write protobuf but merely of the exposition format. However, we have to support float and gauge histograms there as well, so I thought we first complete the exposition proto spec and let it inform the remote-write proto spec. (But keep in mind that the more important source of inspiration is the respective Go types, i.e. https://github.com/prometheus/prometheus/blob/095b6c93dd5ab75f0c9f22f52b4fb5f45b33ff80/model/histogram/histogram.go#L37-L58 and https://github.com/prometheus/prometheus/blob/095b6c93dd5ab75f0c9f22f52b4fb5f45b33ff80/model/histogram/float_histogram.go#L30-L50 .)

Commit description follows:

Note that this is only an extension of the proto spec. Both generators
and consumers of the protobuf still need changes to make use of these
changes.

Gauge histograms measure current distributions. For one, they are
inspired by the GaugeHistogram type introduced by OpenMetrics, see
https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#gaugehistogram

They are also handled in the same way as OpenMetrics does it, by
using a new MetricType enum field GAUGE_HISTOGRAM, but not changing
anything else, i.e. for both regular and gauge histograms, the same
Histogram message type is used.
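
To illustrate the point about reusing the Histogram message, here is a minimal Go sketch. All type and constant names (`MetricType`, `TypeGaugeHistogram`, `HistogramData`, `MetricFamily`) are hypothetical stand-ins written for this example, not the generated protobuf code, and the enum values are not the wire values. It only shows that the payload is identical and the type tag alone carries the distinction.

```go
package main

import "fmt"

// MetricType is a hypothetical stand-in for the MetricType enum.
// The numeric values are illustrative assumptions, not the wire values.
type MetricType int

const (
	TypeCounter MetricType = iota
	TypeGauge
	TypeHistogram
	TypeGaugeHistogram
)

// HistogramData is a hypothetical, heavily simplified stand-in for the
// shared Histogram message.
type HistogramData struct {
	SampleCount uint64
	SampleSum   float64
}

// MetricFamily pairs the type tag with the histogram payload.
type MetricFamily struct {
	Name string
	Type MetricType
	Hist HistogramData
}

// IsGaugeHistogram inspects only the type tag; the payload has the same
// shape for regular and gauge histograms.
func IsGaugeHistogram(mf MetricFamily) bool {
	return mf.Type == TypeGaugeHistogram
}

func main() {
	payload := HistogramData{SampleCount: 3, SampleSum: 4.5}
	regular := MetricFamily{Name: "request_duration_seconds", Type: TypeHistogram, Hist: payload}
	gauge := MetricFamily{Name: "queue_wait_seconds", Type: TypeGaugeHistogram, Hist: payload}
	fmt.Println(IsGaugeHistogram(regular), IsGaugeHistogram(gauge))
}
```

A consumer that does not care about the distinction can process `Hist` the same way in both cases.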

The other reason why we need gauge histograms comes from PromQL: If
you rate a histogram (which is possible with the new sparse
histograms as 1st class data type), the result is a gauge histogram. A
rate'd histogram can be created by a recording rule and then stored in
the TSDB. From there, it can be exposed by federation, so we need to
be able to represent it in the exposition format.

Float histograms are histograms where all counts (count of
observations, counts in each bucket, zero bucket count) are floating
point numbers rather than integer numbers. They are rarely needed for
direct instrumentation. Use cases are weighted histograms or timing
histograms, see kubernetes/kubernetes#109277
for a real-world example.
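
As a rough sketch of why a weighted histogram needs float counts, consider the following Go example. The `FloatHistogram` type and `ObserveWeighted` method are hypothetical and written for this illustration only. The sketch uses fixed upper bounds rather than the sparse bucket layout, and it assumes that the sum is weighted in the same way as the counts.

```go
package main

import "fmt"

// FloatHistogram is a hypothetical, simplified histogram whose counts are
// floats. It uses fixed upper bounds, not the sparse bucket layout.
type FloatHistogram struct {
	UpperBounds []float64 // ascending, inclusive upper bounds
	Buckets     []float64 // non-cumulative counts; the last entry is the +Inf bucket
	Count       float64
	Sum         float64
}

// NewFloatHistogram allocates one bucket per bound plus a +Inf bucket.
func NewFloatHistogram(bounds []float64) *FloatHistogram {
	return &FloatHistogram{
		UpperBounds: bounds,
		Buckets:     make([]float64, len(bounds)+1),
	}
}

// ObserveWeighted adds weight (rather than 1) to the bucket that contains
// value. With fractional weights, integer counts cannot represent the result.
func (h *FloatHistogram) ObserveWeighted(value, weight float64) {
	i := 0
	for i < len(h.UpperBounds) && value > h.UpperBounds[i] {
		i++
	}
	h.Buckets[i] += weight
	h.Count += weight
	h.Sum += value * weight // assumption: the sum is weighted like the counts
}

func main() {
	h := NewFloatHistogram([]float64{1, 5})
	h.ObserveWeighted(0.5, 0.25) // lands in the bucket with upper bound 1
	h.ObserveWeighted(3, 1.5)    // lands in the bucket with upper bound 5
	fmt.Println(h.Buckets, h.Count, h.Sum)
}
```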

However, float histograms happen all the time as results of PromQL
expressions. Following the same line of argument as above, those float
histograms can end up in the TSDB via recording rules, which means
they can be exposed via federation.

Note that float histograms are implicitly supported by the original
Prometheus text format, as this format simply uses floating point
numbers for all sample values. OpenMetrics has avoided this ambiguity
and has specified integers for bucket counts and the count of
observations in a histogram, which means it needs to be extended to
support float histograms, similar to how this commit extends the
original Prometheus protobuf format.

Signed-off-by: beorn7 <beorn@grafana.com>


beorn7 · 2022-06-14T18:14:17Z

What I forgot to mention, but maybe it is obvious anyway: In the way things are designed here, it is no problem at all to represent a histogram that is both a gauge histogram and a float histogram (the typical outcome of a recording rule that rates a histogram).
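
A simplified Go sketch of how rating a histogram yields something that is both a gauge histogram and a float histogram. The types `IntHistogram` and `FloatHistogram` and the function `Rate` are hypothetical and exist only for this example. The sketch assumes both snapshots have the same bucket layout and that no counter reset happened; PromQL's actual `rate` also handles resets and extrapolation, which is left out here.

```go
package main

import "fmt"

// IntHistogram is a hypothetical snapshot of a regular (counter-like)
// histogram with integer counts.
type IntHistogram struct {
	Buckets []uint64
	Count   uint64
}

// FloatHistogram is a hypothetical result type: float counts plus a flag
// marking it as a gauge histogram.
type FloatHistogram struct {
	Gauge   bool
	Buckets []float64
	Count   float64
}

// Rate returns the per-second increase between two snapshots.
// Assumptions: identical bucket layout, no counter resets, seconds > 0.
func Rate(earlier, later IntHistogram, seconds float64) FloatHistogram {
	out := FloatHistogram{
		Gauge:   true, // a rate is a current value, not a cumulative count
		Buckets: make([]float64, len(later.Buckets)),
	}
	for i := range later.Buckets {
		out.Buckets[i] = float64(later.Buckets[i]-earlier.Buckets[i]) / seconds
	}
	out.Count = float64(later.Count-earlier.Count) / seconds
	return out
}

func main() {
	earlier := IntHistogram{Buckets: []uint64{2, 4}, Count: 6}
	later := IntHistogram{Buckets: []uint64{5, 5}, Count: 10}
	r := Rate(earlier, later, 4)
	fmt.Println(r.Gauge, r.Buckets, r.Count)
}
```

The division by the time window is what produces fractional counts, and the result no longer only goes up, which is why it is a gauge.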

bboreham · 2022-06-19T15:31:51Z

"float histograms" is a new concept to me; it seems that much of the discussion is at prometheus/client_golang#796.

beorn7 · 2022-06-19T16:26:27Z

prometheus/client_golang#796 is more the start of a thought that, in the end, arrived at a "scaled" or float histogram, which can be seen in the aforementioned kubernetes/kubernetes#109277. While I see it as a valid use case, it's still fairly niche. I guess federation is a more pressing reason to allow a float histogram in the exposition format. At the end of the day, the reason doesn't matter, though. Float histograms are a thing, even if rare.

Signed-off-by: beorn7 <beorn@grafana.com>

beorn7 · 2022-06-29T11:19:24Z

/cc @marctc

beorn7 · 2022-06-29T14:35:10Z

Since this has been out for a while and it is only for the super experimental sparsehistogram branch, I will merge it now. @marctc plans to work on implementing ingestion (and ultimately storage) for this within Prometheus. Based on the experience, we can then iterate on the proto spec here before seeing it in main (if this ever goes to main).

beorn7 requested a review from codesome June 14, 2022 18:12

beorn7 mentioned this pull request Jun 19, 2022

prompb: Add histograms to remote write/read protobufs prometheus/prometheus#10870

Merged

Explain Span layout better

0da3265

Signed-off-by: beorn7 <beorn@grafana.com>

beorn7 merged commit 421ad2b into sparsehistogram Jun 29, 2022

beorn7 deleted the beorn7/histogram branch June 29, 2022 14:35

beorn7 mentioned this pull request Jun 29, 2022

prompb: Update exposition protobuf to include float and gauge histograms prometheus/prometheus#10932

Merged
