Merge sparsehistogram branch into main #1150

beorn7 · 2022-10-19T17:36:57Z

We plan to soon merge prometheus/prometheus#11447, which will add Native Histogram support to the Prometheus server (behind a feature flag).

To use Native Histograms, instrumentation libraries need to support them. While client_golang doesn't have a notion of feature flags, I think it is the most convenient way to merge the experimental features into main anyway, but clearly mark the respective options as experimental. Note that the changes are fairly transparent to the user. Nothing changes as long as they do not touch any SparseBuckets... options in HistogramOpts. And most of the time, they only have to set SparseBucketsFactor to activate the feature. (In production, setting a SparseBucketsMaxNumber is highly recommended, though. See doc comment.)

IMPORTANT REQUEST: This has been a long-lived feature branch, from which presumably other developers have branched off again. I beg you to properly merge this branch and not squash or rebase it, so that the commits stay compatible with those in other branches. We tried to keep the commits meaningful and not too spammy.

As a final note a bit of reasoning why I kept the nomenclature of "sparse buckets" here: On the Prometheus side, "sparse histograms" turned out to be a misleading word. While the native histograms indeed have sparse buckets, that's only describing one of their many features. It was better to use a name that describes the underlying property that unlocked all those features, namely that histograms aren't emulated anymore by a number of separate float series (for each bucket, count, and sum) but now histograms are 1st class citizens, where the sample of a single series can be a fully fledged (on "native") histogram. client_golang stayed faithful to the original data model of the Prometheus exposition format, which (not by coincidence) always treated histograms as first class citizens. Calling the new histograms "native" is confusing on the client_golang side, because they were indeed always native. That's also the reason why we only need to add a few more fields and "new style" buckets to the existing protobuf exposition format. For the nomenclature, we essentially only need a new word for the "new style" buckets. "Native buckets" doesn't really nail it. "Native histogram buckets" would be correct (as in: those new-style buckets that enable Prometheus to ingest the histogram natively), but it's too long. "Exponential buckets" and "high-res buckets" are not really expressing the important difference because you could configure the conventional buckets to be exponential or high-res already. "Dynamic buckets" and "sparse buckets" are the terms that remain, and I simply opted for the term we are already familiar with.

Printf the structure of it instead of actually encoding it. Signed-off-by: beorn7 <beorn@grafana.com>

Signed-off-by: beorn7 <beorn@grafana.com>

(now using sparsehistogram branch) Signed-off-by: beorn7 <beorn@grafana.com>

Signed-off-by: beorn7 <beorn@grafana.com>

This seem what OTel is converging towards, see open-telemetry/oteps#149 . I see pros and cons with base-10 vs base-2. They are discussed in detail in that OTel PR, and the gist of the discussion is pretty much in line with my design doc. Since the balance is easy to tip here, I think we should go with base-2 if OTel picks base-2. This also seems to be in agreement with several proprietary solution (see again the discussion on that OTel PR.) The idea to make the number of buckets per power of 2 (or formerly 10) a power of 2 itself was also sketched out in the design doc already. It guarantees mergeability of different resolutions. I was undecided between making it a recommendation or mandatory. Now I think it should be mandatory as it has the additional benefit of playing well with OTel's plans. This commit also addresses a number of outstanding TODOs. Signed-off-by: beorn7 <beorn@grafana.com>

Signed-off-by: beorn7 <beorn@grafana.com>

Switch sparse histograms to base-2 buckets

Signed-off-by: beorn7 <beorn@grafana.com>

Implement strategy to limit the sparse bucket count

Signed-off-by: beorn7 <beorn@grafana.com>

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

…to release-1.12

* Histogram: Expose bug with negative schema Signed-off-by: beorn7 <beorn@grafana.com> * Histogram: Fix bug with negative schemas Signed-off-by: beorn7 <beorn@grafana.com>

Note that this is an incompatible change. To scrape this new format, the Prometheus server needs to be updated at the same time. PR incoming. Signed-off-by: beorn7 <beorn@grafana.com>

histograms: Move to new exposition protobuf format

NaN observations now go to no bucket, but increment count (and effectively set sum to NaN, too). ±Inf observations now go to the bucket following the bucket that would have received math.MaxFloat64. The former is now the last bucket that can be created. The getLe is modified to return math.MaxFloat64 for the penultimate possible bucket. Also add a test for getLe. Signed-off-by: beorn7 <beorn@grafana.com>

sparse buckets: Fix handling of +Inf/-Inf/NaN observations

Native histograms are now in a tagged version (v0.3.0). Signed-off-by: beorn7 <beorn@grafana.com>

Signed-off-by: beorn7 <beorn@grafana.com>

beorn7 · 2022-10-26T11:55:54Z

@bwplotka @kakkoyun any thoughts on this? It would be good to address any concerns soon and get this merged before Prometheus v2.40 is released with native histogram support.

bwplotka · 2022-10-27T19:20:41Z

@codesome pinged me about this during KubeCon. I would love to look it through and get it merged this week or so.

I can agree with sparse histograms standpoints, but honestly, it's already a bit confusing for users. I was asked twice today, "is native and sparse histogram same thing"? I would vote for keeping one name - I am fine with any. So perhaps "native histograms" make more sense? Since it's native to Prometheus, it's fine here too. Unless we call the whole feature "sparse histograms" somewhere else and not ONLY in prometheus clients. (e.g in OM, Otel, etc)

beorn7 · 2022-10-27T20:00:44Z

I am also in favor of the term "native histogram". The problem is that the term "native bucket" doesn't make sense.
If we really want to get rid of the word "sparse", we need to call them "native histogram buckets".
We can do that, it's just quite long.

beorn7 · 2022-10-27T20:06:09Z

For same names, "Native Histograms" without "Buckets" makes sense. I'll add another commit, so you can see the effect.

beorn7 · 2022-10-27T20:35:38Z

Added a new commit. Please check it out and let me know how you like it.

bwplotka

Amazing work! I carefully reviewed public APIs and a bit less detailed review for implementation - I think it all makes sense. LGTM, happy with this, modulo naming confusion -> sparse vs native vs exponential histograms. Wonder what we can do here.

@RichiH also mentioned this problem in OM talk - and ppl from community were confused as well. Let's try to pick one name ;p

examples/random/main.go

prometheus/histogram.go

bwplotka · 2022-10-28T16:00:54Z

Ok - so "sparse buckets" for "native histograms" is something I can see as a good naming 👍🏽

bwplotka

After you explanation I think it makes sense - happy with naming 👍🏽

Just commented code then.

prometheus/histogram.go

beorn7 · 2022-10-31T15:21:13Z

Addressed all comments. Will now solve the merge conflict and then merge. 🎉

This intends to avoid confusing users by the subtle difference between a native histogram and a sparse bucket. Signed-off-by: beorn7 <beorn@grafana.com>

beorn7 and others added 30 commits April 7, 2020 23:18

Encode sparse histograms in protobuf

abe540f

Signed-off-by: beorn7 <beorn@grafana.com>

Fix span offset

d1f5366

Signed-off-by: beorn7 <beorn@grafana.com>

Add note about pow-of-10 precision issue

a9d0066

Signed-off-by: beorn7 <beorn@grafana.com>

Minor doc comment fixes

08104a0

Signed-off-by: beorn7 <beorn@grafana.com>

Merge branch 'master' into beorn7/histogram

d698336

Merge branch 'master' into beorn7/histogram

ce36ee3

Update prometheus/client_model

a9df0ba

(now using sparsehistogram branch) Signed-off-by: beorn7 <beorn@grafana.com>

Fix test

b7a540a

Signed-off-by: beorn7 <beorn@grafana.com>

Fix lint warning

553ed73

Signed-off-by: beorn7 <beorn@grafana.com>

Tidy go.sum

97eb041

Signed-off-by: beorn7 <beorn@grafana.com>

Merge branch 'master' into sparsehistogram

5aa8534

Add tests for sparse histogram

6c4e0ef

Signed-off-by: beorn7 <beorn@grafana.com>

Pin client_model to the most recent sparsehistogram commit

5142344

Signed-off-by: beorn7 <beorn@grafana.com>

Merge pull request #886 from prometheus/beorn7/histogram

43f31c2

Switch sparse histograms to base-2 buckets

Add TODO about bucket search optimization

aa6f67a

Signed-off-by: beorn7 <beorn@grafana.com>

Allow a zero threshold of zero

9ef5f90

Signed-off-by: beorn7 <beorn@grafana.com>

Merge branch 'master' into sparsehistogram

84fcaff

Implement strategy to limit the sparse bucket count

2409960

Signed-off-by: beorn7 <beorn@grafana.com>

Refactoring of sparse histograms

263be8d

Signed-off-by: beorn7 <beorn@grafana.com>

Merge pull request #901 from prometheus/beorn7/histogram

dfbcc28

Implement strategy to limit the sparse bucket count

Merge branch 'master' into sparsehistogram

5b19c55

Fix typo in doc comment

70253f4

Signed-off-by: beorn7 <beorn@grafana.com>

Merge branch 'main' into sparsehistogram

294cca4

Fix conflicts

a27b6d7

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

Merge branch 'release-1.12' of github.com:prometheus/client_golang in…

e203144

…to release-1.12

Merge branch 'main' into sparsehistogram

b237230

Histogram: Fix bug with negative schemas (#1054)

eb59a7b

* Histogram: Expose bug with negative schema Signed-off-by: beorn7 <beorn@grafana.com> * Histogram: Fix bug with negative schemas Signed-off-by: beorn7 <beorn@grafana.com>

Merge branch 'main' into sparsehistogram

6ba7871

beorn7 added 11 commits July 7, 2022 16:40

Merge branch 'foo-commit' into sparsehistogram

5a321c7

Merge branch 'main' into sparsehistogram

6141a07

histograms: Move to new exposition protobuf format

8cbcd40

Note that this is an incompatible change. To scrape this new format, the Prometheus server needs to be updated at the same time. PR incoming. Signed-off-by: beorn7 <beorn@grafana.com>

Merge pull request #1092 from prometheus/beorn7/histogram

ec86ef1

histograms: Move to new exposition protobuf format

Merge branch 'main' into sparsehistogram

95cf173

Merge pull request #1144 from prometheus/beorn7/histogram2

25bc188

sparse buckets: Fix handling of +Inf/-Inf/NaN observations

Merge branch 'main' into sparsehistogram

111fae1

Update prometheus/client_model dependency

4e71e6f

Native histograms are now in a tagged version (v0.3.0). Signed-off-by: beorn7 <beorn@grafana.com>

examples: Adjust doc comment for native histograms

58a8ca4

Signed-off-by: beorn7 <beorn@grafana.com>

Add SparseBucketsZeroThresholdZero and groom doc comments

d31f13b

Signed-off-by: beorn7 <beorn@grafana.com>

beorn7 requested review from bwplotka, codesome and kakkoyun October 19, 2022 17:37

beorn7 force-pushed the sparsehistogram branch from 40a70dc to e91b762 Compare October 27, 2022 20:34

beorn7 force-pushed the sparsehistogram branch from e91b762 to 1bad934 Compare October 27, 2022 20:55

bwplotka approved these changes Oct 28, 2022

View reviewed changes

prometheus/histogram.go Show resolved Hide resolved

prometheus/histogram.go Show resolved Hide resolved

beorn7 added 2 commits October 31, 2022 16:23

Avoid the term 'sparse' where possible

e92a8c7

This intends to avoid confusing users by the subtle difference between a native histogram and a sparse bucket. Signed-off-by: beorn7 <beorn@grafana.com>

Merge branch 'main' into sparsehistogram

fffb76c

beorn7 force-pushed the sparsehistogram branch from 1bad934 to fffb76c Compare October 31, 2022 15:31

beorn7 merged commit 5f202ee into main Oct 31, 2022

beorn7 deleted the sparsehistogram branch October 31, 2022 15:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Merge sparsehistogram branch into main #1150

Merge sparsehistogram branch into main #1150

beorn7 commented Oct 19, 2022

Uh oh!

beorn7 commented Oct 26, 2022

Uh oh!

bwplotka commented Oct 27, 2022

Uh oh!

beorn7 commented Oct 27, 2022

Uh oh!

beorn7 commented Oct 27, 2022

Uh oh!

beorn7 commented Oct 27, 2022

Uh oh!

bwplotka left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

bwplotka commented Oct 28, 2022

Uh oh!

bwplotka left a comment

Uh oh!

Uh oh!

Uh oh!

beorn7 commented Oct 31, 2022

Uh oh!

Merge sparsehistogram branch into main #1150

Merge sparsehistogram branch into main #1150

Conversation

beorn7 commented Oct 19, 2022

Uh oh!

beorn7 commented Oct 26, 2022

Uh oh!

bwplotka commented Oct 27, 2022

Uh oh!

beorn7 commented Oct 27, 2022

Uh oh!

beorn7 commented Oct 27, 2022

Uh oh!

beorn7 commented Oct 27, 2022

Uh oh!

bwplotka left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

bwplotka commented Oct 28, 2022

Uh oh!

bwplotka left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

beorn7 commented Oct 31, 2022

Uh oh!

bwplotka left a comment •

edited

Loading