Feature Request: Permit histogram's Observe() method to take an observation count #796
Comments
@lavalamp you don't mean "update it with that many zeros"; "zero" here is not operative.
Thanks for opening this.
Sorry, you're right, that was a confusing level of detail, and it looks like I misunderstood the code, too :)
/cc @LogicalShark @brancz
cc @beorn7
Apologies for the late (post-vacation) reply. I don't quite get the context here. If nothing is happening, why do you need to perform n observations when the next thing is happening? Would that next thing just require that one observation? Nothingness should require no observations, no? (o:
We want the effect of making Histogram observations of a certain variable at a highish rate. The implementation is that whenever the variable being observed changes, we call the histogram's Observe method N times, where N is the number of observation-period boundaries that have been crossed since the variable last changed. A little more concretely: suppose we want the effect of calling Observe once per period with the variable's current value.
OK, to rephrase: you want to sample a relatively rarely changing measured value faster than you could reasonably scrape it with Prometheus, right? The Summary and Histogram are meant more for "actual" observations that can happen at any time, rather than "artificially generated" observations that are just regular samples (or "mini scrapes") of what is essentially a Gauge. But let's assume that your use case is really one where you need to do that; there is still a problem: a Prometheus endpoint can be scraped at any time. Let's imagine the following timeline:
From the point of view of the Prometheus server, all those 20,000 observations have happened between t=15s and t=25s. In reality, only 5,000 have happened in that timeframe. 10,000 observations have happened (or should have happened) between t=5s and t=15s, a timeframe in which Prometheus will report exactly zero observations. Before t=5s, the Prometheus server has only seen the observations of 42, but none of the remaining 5,000 observations of 1000.

Prometheus doesn't give you exact values, but the way it gives you imprecise values is well defined. Your "delayed" batch observations would break assumptions here in quite confusing ways.

You might argue that the value of X might change in reality much more often than a scraper scrapes, and you might even suggest a "timeout" to bound how far the observations can lag behind. I'd still find that problematic, in particular because there is no theoretical lower limit to a scrape interval, and even a practical lower limit is hard to set. I have seen the rare but real use case of sub-second scrape intervals. If 100ms scrape intervals are a possibility, you have to set your timeout to 10ms or perhaps even 1ms, which further narrows down where your use case kicks in.
Yes, fundamentally I am trying to use a Histogram to get information about the behavior of a thing that has a value at every point in time, by sampling that thing more frequently than I can rely on scraping to happen.

BTW, I noticed that I forgot to be precise about the update to X. At the time of a change to X, it is the old value of X that is given to Observe. And yes, I have a background process that prevents the observations from falling too far behind. But that is only to prevent performance hiccups (the time to make a huge number of calls to Observe) in rare circumstances. In my use case it is rare for there to be a long period of time without updates to X, so the histogram observations normally do not fall far behind.

I usually do two things with a Histogram: compute quantiles, and compute averages.
That totally makes sense, and I should have seen that before. Revised timeline:
However, the essence of my argument stays the same. (And I think you understood my line of argument in any case.)
In that case, it sounds like a reasonable approximation.
But that's what I'm not sure of. This is the first time that someone has brought up your use case, so I cannot tell whether the "common" case will naturally lead to a sane usage pattern. If we had more use cases, it would both prove that this is not just a bizarre niche feature request and inform a good design for implementing it. My educated guess is that your use case is very rare, though. And that's because all of the following must be true for it to apply:
In fact, I can see a lot of use cases sampling physical measurements. But let's assume we want to support that use case. In that case, we should aim for a clean implementation. For example, we could create a "fixed-frequency sampling histogram" type. It would be configured with a frequency and some callback to signal changes of X. Whenever X changes, it would update the observations, but it would also update the observations whenever the metric is collected, i.e. at scrape time.
When I filed this I wondered if it would be better to have a hook that made the "observations" (if any were needed) when the metric was read. I think it would still lead to the same feature request?
In our case, we have N of these thingies (with different labels, I think?), and we expect many of them to be changing often, but a few of them to possibly change infrequently.
The changes happen at unpredictable times. Some of these variables have very frequent changes, and frankly those are the ones that motivate this request. But the changes are not at regular intervals. It really is a matter of making do with the available instrumentation to get what information we can about fast-varying variables.
@beorn7: I think you have characterized the use case too narrowly. In the use case that @lavalamp and I have, there are several of these variables, and the timing of their updates is very usage-dependent and thus impossible to predict. It is possible that for some variable there are some sampling periods with several updates; it is possible that for some variable there could be intervals much longer than a sampling period with no update. We know of no practical way to get complete information about such a variable through Prometheus metrics, so we settle for sampling at a high rate. (I originally made a different proposal, but reviewers preferred sampling.)

These variables are in Kubernetes kube-apiserver processes. We do not control how often they are scraped; that is a configuration detail left up to cluster admins. But one scrape produces a huge amount of information; IIRC, on the order of 10^4 lines. It would be very impractical to tell admins that they should scrape once per second.

If you are up for a new abstraction, so am I. I will look into the approach you suggested.
BTW, I use vectors of these things. The vector support in prometheus is not public. I will not be able to do vectors decently out-of-tree. Maybe vector support should be made public?
MetricVec was already exported in early versions of this library, but nobody really used it to implement vectors of custom Metric implementations. Now #796 has shown up with a fairly special use case for which I'd prefer a custom implementation of a special "auto-sampling histogram" outside of this library. Therefore, I'd like to reinstate support for creating vectors of custom Metric implementations.

I played around for quite some while with the option of a separate package providing the tools one would need to create vectors of custom Metric implementations. However, with the current structure of the prometheus/client_golang/prometheus package, this leads to a lot of complications with circular dependencies. (The new package would need the primitives from the prometheus package, while the existing metric vectors like GaugeVec need to import the new vector package to not duplicate the implementation. Separating vector types from the main prometheus package is out of the question at this point because that would be a breaking change.)

Signed-off-by: beorn7 <beorn@grafana.com>
Indeed. In fact, it used to be public for a while, but nobody ever used it (at least not in open-source repositories I searched). But now we have a fine use case for them. Please have a look at #803.
So I am pursuing the new abstraction approach in kubernetes/kubernetes#94672. But this thing dumps its data into a Histogram, so I still need the generalization of Histogram::Observe to take a count parameter.
So yeah, it's also not a lightweight change. We cannot simply add the method to the existing interface without breaking its implementations.

What I would suggest is to implement the core histogram functionality in k8s.io/component-base/metrics/prometheusextension/sampling-histogram.go itself, as bad as it might sound. You could copy the code from prometheus/client_golang, but it can be even simpler: the histogram code in prometheus/client_golang is complicated for mostly one reason: to make Observe calls as fast and contention-free as possible.
The related PR kubernetes/kubernetes#94672 has entered the "rotten" state by now. My offer to help has not been responded to so far, so I'm now closing this issue, too. I'm still happy to help, just let me know.
Yes, I would like to get to this. Just a little swamped right now.
I'm facing a related case. For simplicity, imagine the messages in Kafka look like this:

That means the request to /a/b cost 800ms and the sampling ratio is 1%. For a counter, I can compensate for the sampling:

```go
requestTotal.WithLabelValues(...).Add(1 / sampleRatio)
```

But for a histogram, there's no similar method, so I have to do things like:

```go
for i := 0; i < int(1/sampleRatio); i++ {
	apiHistogram.WithLabelValues(...).Observe(totalCost)
}
```

It would be much better if there were a function that took the observation count as a parameter.
@caibirdme that's an interesting use case. Still not sure how often it will come up and whether it justifies first-class treatment directly here in this repository. (You can always implement it yourself and even advertise it via community channels to see if there are more people with the same need. If that turns out to be the case, it's easy to accept it as a contribution here. I should also mention at this point that the maintainers of the repo have changed. By now, the people making the call are @bwplotka and @kakkoyun.)

In different news, to address the original request, I created MikeSpreitzer/kubernetes#2, which is what I proposed above, and it's actually not a big deal. However, @MikeSpreitzer has never responded to it. Thus, I'm not even sure if the original use case is still valid.
Sorry, I have been swamped, but this is still an active issue. I have two additional thoughts.

I realized that what is going on here is not that I want high-frequency sampling; that was just an approximation. For each one of these histograms, it is about observing a variable that changes at times that are very difficult to characterize. What I actually want is to know how much time, cumulatively since start, this variable's value has been in each of the ranges defined by the bucket boundaries.

At the other end, I need PromQL to keep working. I need to be able to apply the usual PromQL functions to the result.
/reopen |
@beorn7, regarding client_golang/prometheus/histogram.go, lines 615 to 616 in 8dfa334:

… or is that possible by just including them in client_golang/prometheus/histogram.go, lines 622 to 623 in 8dfa334?
Yes, the count and sum include all observations.
About the bigger topic here: I see what you are doing. That's really interesting. I think this will result in a Histogram with the interface of a Gauge. A Gauge can be set and can be added to, and under the hood, it would count the duration the Gauge value was in each bucket. It could even take into account the precise duration for the most current value at scrape time. If the changes of the Gauge value happen rarely compared to the scrape frequency, we can already emulate this behavior now with a plain Gauge and existing PromQL.

Interestingly, this results in a Histogram where the "count" in the buckets is not an integer anymore but a float. That would even be tolerated by Prometheus and PromQL because all sample values are floats. The original Prometheus text format would also tolerate it, but OpenMetrics is specified more precisely and explicitly requires that "bucket values MUST be integers". The same is true for the original Prometheus protobuf format, where bucket values are also integers. kubernetes/kubernetes#109094 more or less elegantly works around this by using nanoseconds as the time unit: since Go's time.Duration is an integer count of nanoseconds, the bucket values stay integral.

I'm not sure if the use case is common enough to introduce a "float version" of the current histograms. However, for the new sparse high-res histograms being developed, it's not too late to take this use case into account.

Coming back to the existing histograms: I think it would make sense to file a new issue with the appropriate title rather than reopening and renaming this issue. In that way, we can start with a clean slate and not have newcomers read through the whole discussion above. @MikeSpreitzer, perhaps you could do that, referring to kubernetes/kubernetes#109094 as the experiment that can serve as the role model? Ultimately, @bwplotka and @kakkoyun need to decide how to move forward from here.
(I'm not the client_golang maintainer anymore, but I am leading the effort to get us the new high-res histograms, and I'll definitely take this use case into account; see the previous paragraph.)
We had a case where we in theory update a histogram at some frequency. In actuality, we only update it when things are happening, and when nothing is happening, we just count the amount of time that has passed and then update it with that many zeros [edit: observations] when the next thing happens.

Right now, adding a value to a histogram N times takes N calls of Observe(). It would be very nice if we could pass N as a parameter instead, and if I follow the code, it could be done in constant time.