histograms: Support exemplars in pure Native Histograms #1126

beorn7 · 2022-09-06T15:23:47Z

This depends on prometheus/client_model#61 .

Exemplars can be added to the buckets of conventional histograms but not to native histograms. So far, the way of sending exemplars is by also rendering a conventional histogram. With the issue above fixed, we have a way of sending exemplars independently of conventional buckets. However, without conventional buckets, we don't have a heuristics which exemplars to pick. Native histograms can have so many buckets that adding an examplar to each is infeasible.

This is P2 as explained in the milestone's description.

krajorama · 2023-11-07T12:28:58Z

Idea: keep a schema for the exemplars. At the beginning the exemplar schema would be the same as the buckets schema.

The exemplar schema "buckets" will cover the same range of values as the normal buckets, but at a much lower resolution. We could keep a maximum N exemplars per exemplar schema bucket (M). So max MxN. We'd automatically get selection of sampled exemplars per logarithmic range.

beorn7 · 2023-11-09T13:06:55Z

Note that exemplars directly added to native histograms MUST have a timestamp, see prometheus/client_model#61 and prometheus/prometheus#13021 for details.

fatsheep9146 · 2024-01-24T04:49:20Z

Since prometheus/client_model#61 is solved, I would love to take this issue. If ok, you can assign to me @beorn7 @carrieedwards

beorn7 · 2024-01-24T09:15:07Z

Thank you @fatsheep9146 . I have assigned this issue to you. Feel free to discuss selection heuristics on Slack, if you need inspiration beyond what has been discussed above.

fatsheep9146 · 2024-02-16T15:26:00Z

Before I support exemplars in pure native histogram in client golang, should we release the client_model with a new tag v0.6.0 then update the dependency in client golang to use the new client model? @beorn7

beorn7 · 2024-02-16T15:30:18Z

True. I'll add the new tag ASAP (hopefully later today).

fatsheep9146 · 2024-02-16T15:32:52Z

The dependency in prometheus server should also be updated, I will also do this after the new tag is ready. @beorn7

beorn7 · 2024-02-16T19:56:49Z

The tag is in place! https://github.com/prometheus/client_model/releases/tag/v0.6.0

beorn7 · 2024-02-21T13:01:41Z

Broad idea for the heuristics:

Let the user configure a number of exemplars n to expose per histogram (default could be around 10).
The first n ObserveWithExemplar calls create an exemplar that we keep for now.
Whenever an ObserveWithExemplar call happens after that, first add the new exemplar to the current set of exemplars, and then delete one according to the following priority list:
1. The oldest exemplar that is older than a certain threshold (maybe 5m, could also be configurable).
2. Find the two exemplars that are closest to each other on an exponential scale. Of those, delete the older one.
A reset of the histogram should also delete all exemplars (so that we never expose an exemplar that doesn't fall into any of the populated buckets).

This should lead to an even spread of exemplars, while not keeping any individual exemplar for overly long.

fatsheep9146 · 2024-02-23T01:52:22Z

"Find the two exemplars that are closest to each other on an exponential scale."
How should I understand this sentence?
Does it mean that I should first take the logarithm of the values in all exemplars, and compare logarithm of newly added exemplar to all of existing exemplars, found the exemplar which has the closest logarithm with newly added exemplar?

For example, there exists two exemplars which has value 4, 6, the histogram schema is 0, the max count of exemplar is 2.
When we add a new exemplar which value is 5, we should compare the diff of logarithm with existing exemplar
log2(5) - log2(4) = 0.321
log2(6) - log2(5) = 0.263

So we should drop 6, keep 4, 5. Do I understand this correctly?

@beorn7

beorn7 · 2024-02-23T11:57:04Z

Yes, sounds about right.

A naive implementation of this could be quite expensive, though, especially if many exemplars are to be kept around. However, for one I assume the number of exemplars will be generally small (around 10?). Also, you can probably play many tricks, like the following:

Keep the logarithm of each exemplar around.
Maintain a list of exemplars sorted by value (so finding the two closest exemplars, it's O(n), not O(n²)).
Maintain a list of exemplars sorted by time (so it's easy to find the oldest exemplar).

beorn7 added this to the Native Histograms milestone Sep 6, 2022

kakkoyun added the keep-open label Nov 7, 2022

krajorama mentioned this issue Nov 8, 2023

Prometheus Native Histogram support phase2 grafana/mimir#4173

Open

carrieedwards added the help wanted label Jan 23, 2024

beorn7 assigned fatsheep9146 Jan 24, 2024

fionaliao removed the help wanted label Feb 6, 2024

fatsheep9146 linked a pull request Mar 15, 2024 that will close this issue

add native histogram exemplar support #1471

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

histograms: Support exemplars in pure Native Histograms #1126

histograms: Support exemplars in pure Native Histograms #1126

beorn7 commented Sep 6, 2022 •

edited

krajorama commented Nov 7, 2023

beorn7 commented Nov 9, 2023

fatsheep9146 commented Jan 24, 2024

beorn7 commented Jan 24, 2024

fatsheep9146 commented Feb 16, 2024

beorn7 commented Feb 16, 2024

fatsheep9146 commented Feb 16, 2024

beorn7 commented Feb 16, 2024

beorn7 commented Feb 21, 2024

fatsheep9146 commented Feb 23, 2024

beorn7 commented Feb 23, 2024

histograms: Support exemplars in pure Native Histograms #1126

histograms: Support exemplars in pure Native Histograms #1126

Comments

beorn7 commented Sep 6, 2022 • edited

krajorama commented Nov 7, 2023

beorn7 commented Nov 9, 2023

fatsheep9146 commented Jan 24, 2024

beorn7 commented Jan 24, 2024

fatsheep9146 commented Feb 16, 2024

beorn7 commented Feb 16, 2024

fatsheep9146 commented Feb 16, 2024

beorn7 commented Feb 16, 2024

beorn7 commented Feb 21, 2024

fatsheep9146 commented Feb 23, 2024

beorn7 commented Feb 23, 2024

beorn7 commented Sep 6, 2022 •

edited