
Feature Request: Permit histogram's Observe() method to take an observation count #796

Closed
lavalamp opened this issue Aug 31, 2020 · 28 comments

Comments

@lavalamp

lavalamp commented Aug 31, 2020

We had a case where, in theory, we update a histogram at some frequency. In actuality, we only update it when things are happening; when nothing is happening, we just count the amount of time that has passed and then update it with that many zeros [edit: observations] when the next thing happens.

Right now, adding a value to a histogram N times takes N calls to Observe(). It would be very nice if we could pass N as a parameter instead; if I follow the code correctly, that could be done in constant time.

@lavalamp
Author

cc @MikeSpreitzer @logicalhan

@MikeSpreitzer

MikeSpreitzer commented Aug 31, 2020

@lavalamp you don't mean "update it with that many zeros"; the "zero" is not what matters here. We call Observe the desired number of times ("N" in your description).

@MikeSpreitzer

Thanks for opening this.

@lavalamp
Author

Sorry, you're right, that was a confusing level of detail, and it looks like I misunderstood the code, too :)

@logicalhan
Contributor

/cc @LogicalShark @brancz

@brancz
Member

brancz commented Sep 2, 2020

cc @beorn7

@beorn7
Member

beorn7 commented Sep 7, 2020

Apologies for the late (post-vacation) reply.

I don't quite get the context here. If nothing is happening, why do you need to perform n observations when the next thing happens? Wouldn't that next thing require just one observation? Nothingness should require no observations, no? (o:

@MikeSpreitzer

MikeSpreitzer commented Sep 7, 2020

We want the effect of making Histogram Observations of a certain variable at a highish rate. The implementation is that whenever the variable being observed changes, we call the histogram's Observe method N times, where N is the number of observation period boundaries that have been crossed since the variable last changed. A little more concretely: suppose we want the effect of calling Observe(X) every millisecond. The implementation keeps track of X and numObservations. Whenever X changes, calculate newNumObservations = time.Now().UnixNano()/int64(time.Millisecond), then make newNumObservations - numObservations calls to Observe(X), then set numObservations = newNumObservations.
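For concreteness, a minimal Go sketch of that catch-up pattern (the sampler type, its field names, and the fixed one-millisecond period are illustrative, not taken from the actual code; the elapsed periods are credited to the value X held during them):

package sampling

import (
    "sync"
    "time"

    "github.com/prometheus/client_golang/prometheus"
)

// sampler emulates observing x once per millisecond by catching up on the
// elapsed 1ms periods whenever x changes. Illustrative sketch only.
type sampler struct {
    mu       sync.Mutex
    hist     prometheus.Histogram
    x        float64 // the value currently being "sampled"
    numTicks int64   // 1ms periods since the epoch already accounted for (initialized at startup)
}

// Set records a change of x, first making one Observe call per elapsed period.
func (s *sampler) Set(newX float64) {
    s.mu.Lock()
    defer s.mu.Unlock()
    newNumTicks := time.Now().UnixNano() / int64(time.Millisecond)
    for i := s.numTicks; i < newNumTicks; i++ {
        s.hist.Observe(s.x) // the value that held during the elapsed periods
    }
    s.numTicks = newNumTicks
    s.x = newX
}

The loop still costs one Observe call per elapsed period; the feature request is about collapsing it into a single constant-time call that takes the count as a parameter.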

@beorn7
Member

beorn7 commented Sep 7, 2020

OK, to rephrase: You want to sample a relatively rarely changing measured value faster than you could reasonably scrape it with Prometheus, right?

I mean, if you want to call Observe(X) every second, for example, I'd recommend simply exposing X directly as a Gauge and letting Prometheus scrape it every second. You then have the more fundamental time series data in Prometheus and can act on it in any way you want.
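For illustration, that simpler alternative is only a few lines using a GaugeFunc (the metric name and the getX accessor are placeholders):

package metrics

import "github.com/prometheus/client_golang/prometheus"

// getX stands in for however the current value of X is read.
func getX() float64 { return 0 }

func init() {
    // Expose X directly; every Prometheus scrape samples the current value.
    prometheus.MustRegister(prometheus.NewGaugeFunc(
        prometheus.GaugeOpts{
            Name: "my_variable_x", // hypothetical metric name
            Help: "Current value of X.",
        },
        getX,
    ))
}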

The Summary and Histogram are more meant for "actual" observations that can happen at any time rather than "artificially generated" observations that are just regular samples (or "mini scrapes") of what's essentially a Gauge.

But let's assume that your use case is really one where you need to do that, there is still a problem: A Prometheus endpoint can be scraped at any time. Let's imagine the following timeline:

  • t=0s: X changes to 42. Observe(42) gets called n times according to your formula above.
  • t=5s: Endpoint gets scraped by a Prometheus server.
  • t=15s: Endpoint gets scraped by the same Prometheus server again.
  • t=20s: X changes to 1000, Observe(1000) gets called 20,000 times.
  • t=25s: Endpoint gets scraped by the same Prometheus server again.

From the point of view of the Prometheus server, all those 20,000 observations have happened between t=15s and t=25s. In reality, only 5,000 have happened in that timeframe. 10,000 observations have happened (or should have happened) between t=5s and t=15s, in which timeframe Prometheus will report exactly zero observations. Before t=5s, the Prometheus server has only seen the observations of 42, but none of the remaining 5,000 observations of 1000.

Prometheus doesn't give you exact values, but the way it gives you imprecise values is well defined. Your "delayed" batch-observations would break assumptions here in quite confusing ways.

You might argue that the values of X might change in reality much more often than a scraper scrapes, and you might even suggest a "timeout" to guarantee that Observe is called at least once per second to keep the new kind of error low.

I'd still find that problematic, in particular because there is no theoretical lower limit to a scrape interval, and even a practical lower limit is hard to set. I have seen the rare but real use case of sub-second scrape intervals. If 100ms scrape intervals are a possibility, you have to set your timeout to 10ms or perhaps even 1ms, which further narrows down where your use case kicks in: e.g., if you want to Observe once per ms as above, nothing is gained by the new feature, because you have to call Observe at least once per ms anyway.

@MikeSpreitzer

MikeSpreitzer commented Sep 8, 2020

Yes, fundamentally I am trying to use a Histogram to get information about the behavior of a thing that has a value at every point in time, by sampling that thing more frequently than I can rely on scraping to happen.

BTW, I noticed that I forgot to be precise about the update to X. At the time of a change to X, it is the old value of X that is given to Observe.

And yes, I have a background process that prevents the observations from falling too far behind. But that is only to prevent performance hiccups (the time to make a huge number of calls to Observe) in rare circumstances. In my use case it is rare for a long period of time to pass without updates to X, so the histogram observations normally do not fall far behind.

I usually do two things with a Histogram: compute quantiles, and compute averages. Both start with applying rate to range vectors. So the relevant time delta is the one that defines the range vectors, not the individual scrapes. In my own practice so far, the time delta defining the range vectors is bigger than the period of the background process (enough bigger that the noise of plus-or-minus one background iteration is not a big deal), and I think it is reasonable to expect that to be the practice for other people with my use case.

@beorn7
Member

beorn7 commented Sep 9, 2020

BTW, I noticed that I forgot to be precise about the update to X. At the time of a change to X, it is the old value of X that is given to Observe.

That totally makes sense, and I should have seen that before. Revised timeline:

  • t=-10s: X changes to 1971. Observe isn't called yet, as this is the first change detected.
  • t=0s: X changes to 42. Observe(1971) gets called 10,000 times according to your formula above.
  • t=5s: Endpoint gets scraped by a Prometheus server.
  • t=15s: Endpoint gets scraped by the same Prometheus server again.
  • t=20s: X changes to 1000, Observe(42) gets called 20,000 times.
  • t=25s: Endpoint gets scraped by the same Prometheus server again.

However, the essence of my argument stays the same. (And I think you understood my line of argument in any case.)

I usually do two things with a Histogram: compute quantiles, and compute averages. Both start with applying rate to range vectors. So the relevant time delta is the one that defines the range vectors, not the individual scrapes. In my own practice so far, the time delta defining the range vectors is bigger than the period of the background process (enough bigger that the noise of plus-or-minus one background iteration is not a big deal), …

In that case, it sounds like a reasonable approximation.

… and I think it is reasonable to expect that to be the practice for other people with my use case.

But that's what I'm not sure of. Fundamentally, Observe should be called at exactly the time the observation happens. If you don't do it that way, you need to understand very well all the implications of your special call timing to make sure you are not creating nonsensical numbers.

This is the first time that someone has brought up your use case, so I cannot tell if the "common" case will naturally lead to a sane usage pattern. If we had more use cases, it would both prove that this is not just a bizarre niche feature request and inform a good design to implement it.

My educated guess is that your use case is very rare, though. And that's because all of the following must be true for it to apply:

  1. There is a need for high-frequency sampling (higher than ~1Hz) because otherwise you would just expose a Gauge and scrape it normally with Prometheus.
  2. X changes often enough to justify 1. (If X changes once per minute, why do you need high-frequency sampling in the first place?)
  3. But X must not change too often because otherwise the suggested optimization doesn't help in the first place.

In fact, I can see a lot of use cases sampling physical measurements where each Observe call will have a slightly different value. "Infinite frequency of change", if you want (and thus breaking 3.). An X that features discrete changes within a fairly narrow frequency band is already something that I'd expect to be rare. Plus there has to be a need to sample this rare X at high frequency.

But let's assume we want to support that use case. In that case, we should aim for a clean implementation. For example, we could create a "fixed-frequency sampling histogram" type. It would be configured with a frequency and some call-back to signal changes of X. Whenever X changes, it would update the observations, but it would also update the observations whenever Collect is called so that the exposed histogram would safely follow the correct semantics. That would be much cleaner and much easier to use than giving Observe an additional parameter. I believe such a new type of histogram could be easily implemented even outside of this library. It only has to implement the prometheus.Collector interface. And that's what I'd like to suggest. If this new kind of histogram proves to be useful outside of a small niche, I'll happily move it into this library.
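To make the shape of that suggestion concrete, here is a rough sketch of such a type outside the library (every name is hypothetical, the bucket bookkeeping is elided, and this is just one way the suggested semantics could be implemented):

package sampling

import (
    "sync"
    "time"

    "github.com/prometheus/client_golang/prometheus"
)

// samplingHistogram pretends to observe its current value once per period.
type samplingHistogram struct {
    mu       sync.Mutex
    period   time.Duration // sampling period, must be > 0
    value    float64       // the X being sampled
    lastTick time.Time     // end of the last period accounted for; initialized to the creation time
    // ... plus the usual histogram state: bucket bounds and counts, count, sum
}

// tick adds one pretend observation of the current value for every full
// period that has elapsed since lastTick.
func (h *samplingHistogram) tick(now time.Time) {
    for !h.lastTick.Add(h.period).After(now) {
        h.lastTick = h.lastTick.Add(h.period)
        // record one observation of h.value in the histogram state
    }
}

// Set is the callback signalling that X has changed.
func (h *samplingHistogram) Set(v float64) {
    h.mu.Lock()
    defer h.mu.Unlock()
    h.tick(time.Now()) // the elapsed periods saw the old value
    h.value = v
}

// Collect ticks first as well, so a scrape at any point in time sees counts
// that are consistent with fixed-frequency sampling.
func (h *samplingHistogram) Collect(ch chan<- prometheus.Metric) {
    h.mu.Lock()
    defer h.mu.Unlock()
    h.tick(time.Now())
    // expose the state, e.g. via prometheus.MustNewConstHistogram
}

// Describe implements the other half of prometheus.Collector.
func (h *samplingHistogram) Describe(ch chan<- *prometheus.Desc) { /* send the Desc */ }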

@lavalamp
Author

lavalamp commented Sep 9, 2020

When I filed this I wondered if it would be better to have a hook that made the "observations" (if any were needed) when the metric was read. I think it would still lead to the same feature request?

If X changes once per minute, why do you need high-frequency sampling ...

In our case, we have N of these thingies (with different labels, I think?), and we expect many of them to be changing often, but a few of them to possibly change infrequently.

@MikeSpreitzer

MikeSpreitzer commented Sep 9, 2020

The changes happen at unpredictable times. Some of these variables have very frequent changes, and frankly those are the ones that motivate this request. But the changes are not at regular intervals. It really is a matter of making do with the available instrumentation to get what information we can about fast-varying variables.

@MikeSpreitzer

@beorn7: I think you have characterized the use case too narrowly. In the use case that @lavalamp and I have, there are several of these variables, and the timing of their updates is very usage-dependent and thus impossible to predict. For some variable there may be sampling periods with several updates; for some variable there may be intervals much longer than a sampling period with no update at all. We know of no practical way to get complete information about such a variable through Prometheus metrics, so we settle for sampling at a high rate. (I originally made a different proposal, but reviewers preferred sampling.)

These variables are in Kubernetes kube-apiserver processes. We do not control how often they are scraped; that is a configuration detail left up to cluster admins. But one scrape produces a huge amount of information; IIRC, on the order of 10^4 lines. It would be very impractical to tell admins that they should scrape once per second.

If you are up for a new abstraction, so am I. I will look into the approach you suggested.

@MikeSpreitzer

MikeSpreitzer commented Sep 10, 2020

BTW, I use vectors of these things. The vector support in prometheus is not public. I will not be able to do vectors decently out-of-tree. Maybe vector support should be made public?

beorn7 added a commit that referenced this issue Sep 10, 2020
MetricVec was already exported in early versions of this library, but
nobody really used it to implement vectors of custom Metric
implementations. Now #796 has shown up with a fairly special use case
for which I'd prefer a custom implementation of a special
"auto-sampling histogram" outside of this library. Therefore, I'd like
to reinstate support for creating vectors of custom Metric
implementations.

I played around for quite some while with the option of a separate
package providing the tools one would need to create vectors of custom
Metric implementations. However, with the current structure of the
prometheus/client_golang/prometheus package, this leads to a lot of
complications with circular dependencies. (The new package would need
the primitives from the prometheus package, while the existing metric
vectors like GaugeVec need to import the new vector package to not
duplicate the implementation. Separating vector types from the main
prometheus package is out of the question at this point because that
would be a breaking change.)

Signed-off-by: beorn7 <beorn@grafana.com>
@beorn7
Member

beorn7 commented Sep 10, 2020

The vector support in prometheus is not public. […] Maybe vector support should be made public?

Indeed. In fact, it used to be public for a while, but nobody ever used it (at least not in any open-source repositories I searched). But now we have a fine use case. Please have a look at #803.
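For anyone following along, the usage pattern that #803 enables looks roughly like this (a sketch only, assuming the MetricVec API re-exported in #803; SamplingHistogram and newSamplingHistogram are hypothetical stand-ins for the custom Metric implementation discussed above):

package sampling

import "github.com/prometheus/client_golang/prometheus"

// SamplingHistogram stands in for the custom Metric implementation; its
// constructor newSamplingHistogram is hypothetical.
type SamplingHistogram struct{ prometheus.Metric }

func newSamplingHistogram(desc *prometheus.Desc, lvs ...string) *SamplingHistogram {
    return &SamplingHistogram{} // placeholder
}

// SamplingHistogramVec wraps prometheus.MetricVec around the custom Metric,
// mirroring how GaugeVec and friends are built inside the library.
type SamplingHistogramVec struct {
    *prometheus.MetricVec
}

func NewSamplingHistogramVec(desc *prometheus.Desc) *SamplingHistogramVec {
    return &SamplingHistogramVec{
        MetricVec: prometheus.NewMetricVec(desc, func(lvs ...string) prometheus.Metric {
            return newSamplingHistogram(desc, lvs...)
        }),
    }
}

// WithLabelValues fetches (or creates) the child for the given label values
// and type-asserts it back to the concrete type.
func (v *SamplingHistogramVec) WithLabelValues(lvs ...string) *SamplingHistogram {
    m, err := v.GetMetricWithLabelValues(lvs...)
    if err != nil {
        panic(err)
    }
    return m.(*SamplingHistogram)
}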

@MikeSpreitzer

So I am pursuing the new abstraction approach in kubernetes/kubernetes#94672. But this thing dumps its data into a Histogram, so I still need the generalization of Histogram::Observe to take a count parameter.

@beorn7
Member

beorn7 commented Sep 14, 2020

So yeah, I think the problem is that an ObserveNTimes function would really pollute the namespace in client_golang, as it is essentially an internal function: if the SamplingHistogram from kubernetes/kubernetes#94672 proves to be useful for others, too, we will just move it into client_golang, and then it can use the histogram code internally, so no exported ObserveNTimes method will be needed. If, on the other hand, the SamplingHistogram is just a very special case, we don't want an ObserveNTimes method in the first place. And finally, we cannot add ObserveNTimes temporarily, because removing it would be a breaking change.

It's also not a lightweight change. We cannot add the method to the Histogram interface because that would be a breaking change, too. Therefore, we would need a new interface type, and users would have to type-assert to get to call the new method. (See how we added exemplar support to the Observer interface.)

What I would suggest is to implement the core histogram functionality in k8s.io/component-base/metrics/prometheusextension/sampling-histogram.go itself, as bad as it might sound. You could copy the code from prometheus/client_golang, but it can be even simpler: the histogram code in prometheus/client_golang is complicated mostly for one reason, namely to make Observe calls both fast and concurrency-safe at the same time. Since the SamplingHistogram doesn't need fast Observe calls, and the Update method has a mutex anyway, you can just use a naive mutex-based histogram implementation, which is just a handful of LOC. (You could use https://pkg.go.dev/github.com/prometheus/client_golang@v1.7.1/prometheus?tab=doc#NewConstHistogram to do the only remaining heavy lifting, which is the rendering of the protobuf.)
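A sketch of what that naive mutex-based core could look like, with MustNewConstHistogram doing the rendering (all names are illustrative and this is not the actual kubernetes code):

package prometheusextension

import (
    "sync"

    "github.com/prometheus/client_golang/prometheus"
)

// naiveHistogram trades the lock-free speed of client_golang's Histogram for
// the freedom to record many observations in one step under a mutex.
type naiveHistogram struct {
    mu      sync.Mutex
    desc    *prometheus.Desc
    bounds  []float64 // upper bounds, excluding +Inf
    buckets []uint64  // cumulative counts, one per bound
    count   uint64
    sum     float64
}

// observeN records n observations of v in constant time.
func (h *naiveHistogram) observeN(v float64, n uint64) {
    h.mu.Lock()
    defer h.mu.Unlock()
    h.count += n
    h.sum += v * float64(n)
    for i, ub := range h.bounds {
        if v <= ub {
            h.buckets[i] += n
        }
    }
}

// Describe and Collect make this a prometheus.Collector; the protobuf
// rendering is delegated to MustNewConstHistogram.
func (h *naiveHistogram) Describe(ch chan<- *prometheus.Desc) { ch <- h.desc }

func (h *naiveHistogram) Collect(ch chan<- prometheus.Metric) {
    h.mu.Lock()
    defer h.mu.Unlock()
    b := make(map[float64]uint64, len(h.bounds))
    for i, ub := range h.bounds {
        b[ub] = h.buckets[i]
    }
    ch <- prometheus.MustNewConstHistogram(h.desc, h.count, h.sum, b)
}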

beorn7 added a commit that referenced this issue Oct 15, 2020
sbookworm pushed a commit to sbookworm/client_golang that referenced this issue Oct 20, 2020
@beorn7
Member

beorn7 commented Feb 12, 2021

The related PR kubernetes/kubernetes#94672 has entered the "rotten" state by now. My offer to help has not been responded to so far, so I'm now closing this issue, too. I'm still happy to help, just let me know.

@beorn7 beorn7 closed this as completed Feb 12, 2021
@MikeSpreitzer

Yes, I would like to get to this. Just a little swamped right now.

@caibirdme

I'm facing a related case. For simplicity, imagine the messages in Kafka look like this:

{"api": "/a/b", "cost_time": 800, "sample_ratio": 0.01}

That means the request to /a/b cost 800ms and the sampling ratio is 1%.
Since the sample_ratio varies from API to API, when I calculate requests_total I do something like

requestTotal.WithLabelValues(...).Add(1 / sampleRatio)

But for a histogram there's no similar method, so I have to do something like:

for i := 0; i < int(1/sampleRatio); i++ {
   apiHistogram.WithLabelValues(...).Observe(totalCost)
}

If there were a function like ObserveN(value float64, times int), it would be more convenient.

@beorn7
Member

beorn7 commented Nov 22, 2021

@caibirdme that's an interesting use case. Still not sure how often it will come up and whether it justifies first-class treatment directly here in this repository. (You can always implement it yourself and even advertise it via community channels to see if there are more people with the same need. If that turns out to be the case, it's easy to accept it as a contribution here. I should also mention at this point that the maintainers of the repo have changed; by now, the people making the call are @bwplotka and @kakkoyun.)

In other news, to address the original request, I created MikeSpreitzer/kubernetes#2, which is what I proposed above, and it's actually not a big deal. However, @MikeSpreitzer has never responded to it, so I'm not even sure whether the original use case is still valid.

@MikeSpreitzer

MikeSpreitzer commented Mar 28, 2022

Sorry, I have been swamped, but this is still an active issue. I have two additional thoughts.

I realized that what is going on here is not that I want high-frequency sampling; that was just an approximation. For each one of these histograms, it is about observing a variable that changes at times that are very difficult to characterize. What I actually want to know is how much time, cumulatively since start, this variable's value has spent in each of the ranges defined by the bucket boundaries. So what I really want, at a high level, is an Observe(x float64) that maps to a call to Add(prevX float64, secsSincePrevAdd float64) on an underlying histogram.
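A sketch of that accounting, assuming a mutex-protected histogram whose buckets accumulate seconds instead of observation counts (the type and field names are illustrative; nothing here is an existing API):

package timeweighted

import (
    "sync"
    "time"
)

// timeWeightedHistogram accumulates, per bucket, the seconds during which
// the tracked variable's value fell into that bucket's range.
type timeWeightedHistogram struct {
    mu         sync.Mutex
    bounds     []float64 // upper bounds, excluding +Inf
    buckets    []float64 // cumulative seconds, one per bound
    count      float64   // total seconds covered
    sum        float64   // integral of the value over time
    prevX      float64
    lastChange time.Time // initialized to the creation time
}

// Observe(x) means: x is the value as of now. The time elapsed since the
// previous call is credited to the previous value, i.e. it behaves like
// Add(prevX, secsSincePrevAdd) on an underlying histogram.
func (h *timeWeightedHistogram) Observe(x float64) {
    h.mu.Lock()
    defer h.mu.Unlock()
    now := time.Now()
    secs := now.Sub(h.lastChange).Seconds()
    h.count += secs
    h.sum += h.prevX * secs
    for i, ub := range h.bounds {
        if h.prevX <= ub {
            h.buckets[i] += secs
        }
    }
    h.prevX, h.lastChange = x, now
}

Note that the bucket "counts" here are floats, which is exactly the wrinkle discussed further down in the thread.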

At the other end, I need PromQL to keep working. I need to be able to apply histogram_quantile to such a thing.

@MikeSpreitzer

/reopen

@MikeSpreitzer

@beorn7,

// buckets is a map of upper bounds to cumulative counts, excluding the +Inf
// bucket.
is a surprising exclusion. I would normally expect to be able to deliver counts for values above the highest bucket bound.

@MikeSpreitzer

... or is that possible by just including them in count uint64 and sum float64?

@beorn7
Member

beorn7 commented Mar 31, 2022

Yes, the count and sum include all observations. The +Inf bucket is by definition equal to the count and therefore redundant.
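For illustration, here is how that convention looks with MustNewConstHistogram (the metric name and values are made up):

package example

import "github.com/prometheus/client_golang/prometheus"

func constHistogramExample() prometheus.Metric {
    desc := prometheus.NewDesc("example_seconds", "Illustration only.", nil, nil)
    // Three observations (0.2, 0.7, 5.0) with bucket bounds 0.5 and 1.
    // The 5.0 lies above the highest bound: it shows up only in count and
    // sum, and the implicit +Inf bucket is rendered as the count (3).
    return prometheus.MustNewConstHistogram(
        desc,
        3,                                // count of all observations
        0.2+0.7+5.0,                      // sum of all observations
        map[float64]uint64{0.5: 1, 1: 2}, // cumulative counts, excluding +Inf
    )
}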

@beorn7
Member

beorn7 commented Mar 31, 2022

About the bigger topic here:

I see what you are doing. That's really interesting. I think this will result in a Histogram with the interface of a Gauge. A Gauge can be set and can be added to, and under the hood, it would count the duration the Gauge value was in each bucket. It could even take into account the precise duration for the most current value at scrape time.

If the changes of the Gauge value happen rarely compared to the scrape frequency, we can already emulate this behavior now with a plain Gauge and the quantile_over_time function. But obviously, if you need precise sampling of high-frequency changes, that's not enough.

Interestingly, this results in a Histogram where the "count" in the buckets is not an integer anymore but a float. That would even be tolerated by Prometheus and PromQL, because all sample values are floats. The original Prometheus text format would also tolerate it, but OpenMetrics is specified more precisely and explicitly requires that "bucket values MUST be integers". The same is true for the original Prometheus protobuf format, where bucket values are also integers.

kubernetes/kubernetes#109094 more or less elegantly works around this by using nanoseconds as the time unit. Since the Go time.Duration type is inherently an int64 counting nanoseconds, we are back in integer land. However, a Histogram with float buckets would allow more obvious units like seconds, and it would not be limited to nanosecond precision.

I'm not sure if the use case is common enough to introduce a "float version" of the current histograms. However, for the new sparse high-res histograms being developed, it's not too late to take this use case into account. Interestingly, we already introduced a FloatHistogram for internal use (as PromQL will generate "scaled" histograms all the time). We were actually wondering if there was a use case for such a FloatHistogram in the exposition format, and here we have one!

Coming back to the existing histograms: I think it would make sense to file a new issue with the appropriate title rather than reopening and renaming this issue. In that way, we can start with a clean slate and not have newcomers read through the whole discussion above. @MikeSpreitzer perhaps you could do that, referring to kubernetes/kubernetes#109094 as the experiment that can serve as the role model? Ultimately, @bwplotka and @kakkoyun need to decide how to move forward from here. (I'm not the client_golang maintainer anymore, but I am leading the effort to get us the new high-res histograms, and I'll definitely take this use-case into account, see previous paragraph.)
