Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge sparsehistogram branch into main #1150

Merged
merged 46 commits into from Oct 31, 2022
Merged

Merge sparsehistogram branch into main #1150

merged 46 commits into from Oct 31, 2022

Conversation

beorn7
Copy link
Member

@beorn7 beorn7 commented Oct 19, 2022

We plan to soon merge prometheus/prometheus#11447, which will add Native Histogram support to the Prometheus server (behind a feature flag).

To use Native Histograms, instrumentation libraries need to support them. While client_golang doesn't have a notion of feature flags, I think it is the most convenient way to merge the experimental features into main anyway, but clearly mark the respective options as experimental. Note that the changes are fairly transparent to the user. Nothing changes as long as they do not touch any SparseBuckets... options in HistogramOpts. And most of the time, they only have to set SparseBucketsFactor to activate the feature. (In production, setting a SparseBucketsMaxNumber is highly recommended, though. See doc comment.)

IMPORTANT REQUEST: This has been a long-lived feature branch, from which presumably other developers have branched off again. I beg you to properly merge this branch and not squash or rebase it, so that the commits stay compatible with those in other branches. We tried to keep the commits meaningful and not too spammy.

As a final note a bit of reasoning why I kept the nomenclature of "sparse buckets" here: On the Prometheus side, "sparse histograms" turned out to be a misleading word. While the native histograms indeed have sparse buckets, that's only describing one of their many features. It was better to use a name that describes the underlying property that unlocked all those features, namely that histograms aren't emulated anymore by a number of separate float series (for each bucket, count, and sum) but now histograms are 1st class citizens, where the sample of a single series can be a fully fledged (on "native") histogram. client_golang stayed faithful to the original data model of the Prometheus exposition format, which (not by coincidence) always treated histograms as first class citizens. Calling the new histograms "native" is confusing on the client_golang side, because they were indeed always native. That's also the reason why we only need to add a few more fields and "new style" buckets to the existing protobuf exposition format. For the nomenclature, we essentially only need a new word for the "new style" buckets. "Native buckets" doesn't really nail it. "Native histogram buckets" would be correct (as in: those new-style buckets that enable Prometheus to ingest the histogram natively), but it's too long. "Exponential buckets" and "high-res buckets" are not really expressing the important difference because you could configure the conventional buckets to be exponential or high-res already. "Dynamic buckets" and "sparse buckets" are the terms that remain, and I simply opted for the term we are already familiar with.

beorn7 and others added 30 commits April 7, 2020 23:18
Printf the structure of it instead of actually encoding it.

Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: beorn7 <beorn@grafana.com>
(now using sparsehistogram branch)

Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: beorn7 <beorn@grafana.com>
This seem what OTel is converging towards, see
open-telemetry/oteps#149 .

I see pros and cons with base-10 vs base-2. They are discussed in
detail in that OTel PR, and the gist of the discussion is pretty much
in line with my design doc. Since the balance is easy to tip here, I
think we should go with base-2 if OTel picks base-2. This also seems
to be in agreement with several proprietary solution (see again the
discussion on that OTel PR.)

The idea to make the number of buckets per power of 2 (or formerly 10)
a power of 2 itself was also sketched out in the design doc
already. It guarantees mergeability of different resolutions. I was
undecided between making it a recommendation or mandatory. Now I think
it should be mandatory as it has the additional benefit of playing
well with OTel's plans.

This commit also addresses a number of outstanding TODOs.

Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: beorn7 <beorn@grafana.com>
Switch sparse histograms to base-2 buckets
Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: beorn7 <beorn@grafana.com>
Implement strategy to limit the sparse bucket count
Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
* Histogram: Expose bug with negative schema

Signed-off-by: beorn7 <beorn@grafana.com>

* Histogram: Fix bug with negative schemas

Signed-off-by: beorn7 <beorn@grafana.com>
beorn7 added 11 commits July 7, 2022 16:40
Note that this is an incompatible change. To scrape this new format,
the Prometheus server needs to be updated at the same time. PR
incoming.

Signed-off-by: beorn7 <beorn@grafana.com>
histograms: Move to new exposition protobuf format
NaN observations now go to no bucket, but increment count (and
effectively set sum to NaN, too).

±Inf observations now go to the bucket following the bucket that would
have received math.MaxFloat64. The former is now the last bucket that
can be created.

The getLe is modified to return math.MaxFloat64 for the penultimate
possible bucket.

Also add a test for getLe.

Signed-off-by: beorn7 <beorn@grafana.com>
sparse buckets: Fix handling of +Inf/-Inf/NaN observations
Native histograms are now in a tagged version (v0.3.0).

Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: beorn7 <beorn@grafana.com>
@beorn7
Copy link
Member Author

beorn7 commented Oct 26, 2022

@bwplotka @kakkoyun any thoughts on this? It would be good to address any concerns soon and get this merged before Prometheus v2.40 is released with native histogram support.

@bwplotka
Copy link
Member

@codesome pinged me about this during KubeCon. I would love to look it through and get it merged this week or so.

I can agree with sparse histograms standpoints, but honestly, it's already a bit confusing for users. I was asked twice today, "is native and sparse histogram same thing"? I would vote for keeping one name - I am fine with any. So perhaps "native histograms" make more sense? Since it's native to Prometheus, it's fine here too. Unless we call the whole feature "sparse histograms" somewhere else and not ONLY in prometheus clients. (e.g in OM, Otel, etc)

@beorn7
Copy link
Member Author

beorn7 commented Oct 27, 2022

I am also in favor of the term "native histogram". The problem is that the term "native bucket" doesn't make sense.
If we really want to get rid of the word "sparse", we need to call them "native histogram buckets".
We can do that, it's just quite long.

@beorn7
Copy link
Member Author

beorn7 commented Oct 27, 2022

For same names, "Native Histograms" without "Buckets" makes sense. I'll add another commit, so you can see the effect.

@beorn7
Copy link
Member Author

beorn7 commented Oct 27, 2022

Added a new commit. Please check it out and let me know how you like it.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work! I carefully reviewed public APIs and a bit less detailed review for implementation - I think it all makes sense. LGTM, happy with this, modulo naming confusion -> sparse vs native vs exponential histograms. Wonder what we can do here.

@RichiH also mentioned this problem in OM talk - and ppl from community were confused as well. Let's try to pick one name ;p

examples/random/main.go Outdated Show resolved Hide resolved
prometheus/histogram.go Outdated Show resolved Hide resolved
prometheus/histogram.go Show resolved Hide resolved
prometheus/histogram.go Show resolved Hide resolved
prometheus/histogram.go Show resolved Hide resolved
prometheus/histogram.go Outdated Show resolved Hide resolved
prometheus/histogram.go Show resolved Hide resolved
@bwplotka
Copy link
Member

Ok - so "sparse buckets" for "native histograms" is something I can see as a good naming 👍🏽

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After you explanation I think it makes sense - happy with naming 👍🏽

Just commented code then.

prometheus/histogram.go Show resolved Hide resolved
prometheus/histogram.go Show resolved Hide resolved
@beorn7
Copy link
Member Author

beorn7 commented Oct 31, 2022

Addressed all comments. Will now solve the merge conflict and then merge. 🎉

This intends to avoid confusing users by the subtle difference between
a native histogram and a sparse bucket.

Signed-off-by: beorn7 <beorn@grafana.com>
@beorn7 beorn7 merged commit 5f202ee into main Oct 31, 2022
@beorn7 beorn7 deleted the sparsehistogram branch October 31, 2022 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants