Some aggregations and functions produce incorrect results for native histograms #13934

NeerajGartia21 · 2024-04-15T21:17:06Z

Currently there are some operators and aggregations that produce either incorrect or inconsistent results for native histograms in some cases. Which are-

	Args	Current Behaviour	Expected Behaviour
`+` and `-`	`float`, `histogram` in any order	returns `float`	no result, add info annotation
`*`	`histogram`, `histogram`	returns `float`	no result, add info annotation
`/`	`float`, `histogram` (in this order! `histogram`, `float` is fine)	returns `float` ignoring histogram	no result, add info annotation
`/`	`histogram`, `0`	returns `histogram` with every value divided by zero	return `histogram` with `sum` and `count` as `NaN` or `Inf` and all buckets removed
`min` and `max`	combination of `histogram` and `float`	returns `scalar` by taking `histogram` as 0	return `min`/`max` of all the floats and add info annotation
`min` and `max`	only `histogram`	returns `scalar` by taking `histogram` as 0	no result, add info annotation

Here I've added the cases which I've found till now. I'll update it with more such cases subsequently and also will fix these issues. If you have encountered any such cases then add them here either by editing it or comment them.

The text was updated successfully, but these errors were encountered:

beorn7 · 2024-04-20T21:28:04Z

Related: #11973

beorn7 · 2024-04-20T21:37:22Z

My thoughts:

I think the (float, histogram) args for arithmetic binary operators needs to be broken up. For example, float + histogram or histogram - float doesn't work, but float * histogram and histogram / float is just fine (but float / histogram is not). histogram + histogram is fine, but histogram * histogram is not.

For min and max, I think it's fine to just return nothing for aggregations of only histograms. The warning should only be issued if the aggregation tries to aggregate histograms and floats into the same key. (For example, if foo is a vector that has both floats and histograms, and all the histograms have a kind="histogram" label and all the floats have a kind="float" label, then min by (kind)(foo) should not give a warning (but the only element in the result vector has kind="float"), but min(foo) should give a warning.)

I think the current behavior of count is fine. count just counts sample, and a float and a histogram should be counted the same. (I think that's even documented.)

histogram / 0 should return sum and count and the zero bucket "normally" (I assume in the way it is done already), but then all (normal) buckets should be removed (because all the non-represented empty buckets should be NaN now, which we cannot represent. So better have no buckets at all. (This case is anyway pathologic.)

beorn7 · 2024-04-20T22:00:00Z

The basic idea is this: Where operations and aggregations make sense (between histogram and histogram, or between histogram and float), they happen. Where they don't make sense, the histogram is treated as if the vector element with the histogram did not exist. This is not an error, but we attach a "warn"-level annotation (or maybe "info"-level only?) if any vector element is label-matched with an ignored histogram. That implies no annotation is added if two ignored histograms are matched or if an ignored histogram is matched with a scalar (which is not label-matching).

The rationale here is that label matching might be complex, and if a label match leads to an impossible combination, it might be accidental and the user should know about it. In contrast to that, 2 / foo essentially means "Divide 2 by all the float elements and ignore the histogram elements" and that's just fine and can deliberately be used for filtering. However, if I write foo + bar, the user's idea is probably that the label-matched elements should also have the same type, and if a float gets matched with a histogram, it's probably an error. (And the "probably" makes me think we should perhaps use "info" level, which implies that false positives are possible, while "warn" level is meant for annotations that are always triggered by something that is "wrong" in some way and could be fixed somehow.)

(This is all just my stream of thoughts. Feel free to express other viewpoints.)

NeerajGartia21 · 2024-04-24T19:04:30Z

I think the (float, histogram) args for arithmetic binary operators needs to be broken up. For example, float + histogram or histogram - float doesn't work, but float * histogram and histogram / float is just fine (but float / histogram is not). histogram + histogram is fine, but histogram * histogram is not.

Yes, sorry, I missed this case. I'll break the operators further with different arguments and update the table.

For min and max, I think it's fine to just return nothing for aggregations of only histograms. The warning should only be issued if the aggregation tries to aggregate histograms and floats into the same key. (For example, if foo is a vector that has both floats and histograms, and all the histograms have a kind="histogram" label and all the floats have a kind="float" label, then min by (kind)(foo) should not give a warning (but the only element in the result vector has kind="float"), but min(foo) should give a warning.)

So the gist here is that after grouping the labels if the vector contains both histogram and float then we'll issue a warning otherwise we will return the min in case of float only vector and nothing in case of histogram only vector. Please correct me if I misunderstood something.

I think the current behavior of count is fine. count just counts sample, and a float and a histogram should be counted the same. (I think that's even documented.)

yeah, my bad, I somehow got confused between count and histogram_count.

I'll make all these changes into the table.

NeerajGartia21 · 2024-04-30T19:20:39Z

The rationale here is that label matching might be complex, and if a label match leads to an impossible combination, it might be accidental and the user should know about it. In contrast to that, 2 / foo essentially means "Divide 2 by all the float elements and ignore the histogram elements" and that's just fine and can deliberately be used for filtering. However, if I write foo + bar, the user's idea is probably that the label-matched elements should also have the same type, and if a float gets matched with a histogram, it's probably an error. (And the "probably" makes me think we should perhaps use "info" level, which implies that false positives are possible, while "warn" level is meant for annotations that are always triggered by something that is "wrong" in some way and could be fixed somehow.)

In my opinion if the operation makes sense and something is returned from that operation but we have ignored histogram in the vector ( like 2/foo ) then we can add an info level annotation stating that histograms are ignored in the operation and if the operation does not makes any sense (like float+histogram ) then I think we should add a warn level annotation that this operation for the given arguments is not possible (or something like that). These are my thoughts pls correct me.

beorn7 · 2024-05-02T14:36:28Z

So the gist here is that after grouping the labels if the vector contains both histogram and float then we'll issue a warning otherwise we will return the min in case of float only vector and nothing in case of histogram only vector.

Yes, that's the idea (but the case of how to deal with histogram-only could still be debated, see your later comment).

beorn7 · 2024-05-02T14:50:24Z

In my opinion if the operation makes sense and something is returned from that operation but we have ignored histogram in the vector ( like 2/foo ) then we can add an info level annotation stating that histograms are ignored in the operation and if the operation does not makes any sense (like float+histogram ) then I think we should add a warn level annotation that this operation for the given arguments is not possible (or something like that). These are my thoughts pls correct me.

That's also an option. If I understand you correctly, you would add an info annotation to the 2 / foo expression if foo contains both histograms and floats (so something is returned) but if foo is a pure histogram vector, you would add a warning annotation (because nothing is returned). That might be hard to implement (you can only decide about the level of the annotation when looking at the whole vector), but maybe it's doable. I'm a bit more concerned about the change of behavior that a "float divided by histogram" operation becomes "more bad" if the histogram is not accompanied by a float in the same vector.

Another thought in this regard: A mix of floats and histograms in the same vector should be a very rare case. So maybe we don't need to think too much about treating that case specially or with concerns about spammy annotations. It shouldn't really happen in practice, and if it does, it's probably a faulty setup or a transition period and will be fixed soon. So maybe let's say for now, whenever we ignore a histogram in an operation because it doesn't make sense, we add an info-level annotation saying that an operation wasn't performed, even if the expression still yields a result from other vector elements. We can do this first, it's easy, and then iterate on it once we see how it plays out in practice.

beorn7 · 2024-05-08T18:01:54Z

I have updated the table above according to my last paragraph above.

NeerajGartia21 · 2024-05-09T18:41:59Z

Thanks for updating the table Beorn. I was busy and couldn't update the table fully.

beorn7 added this to the Native Histograms milestone Apr 20, 2024

beorn7 added kind/bug priority/P2 component/promql labels Apr 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some aggregations and functions produce incorrect results for native histograms #13934

Some aggregations and functions produce incorrect results for native histograms #13934

NeerajGartia21 commented Apr 15, 2024 •

edited by beorn7

beorn7 commented Apr 20, 2024

beorn7 commented Apr 20, 2024 •

edited

beorn7 commented Apr 20, 2024 •

edited

NeerajGartia21 commented Apr 24, 2024

NeerajGartia21 commented Apr 30, 2024 •

edited

beorn7 commented May 2, 2024

beorn7 commented May 2, 2024

beorn7 commented May 8, 2024

NeerajGartia21 commented May 9, 2024

Some aggregations and functions produce incorrect results for native histograms #13934

Some aggregations and functions produce incorrect results for native histograms #13934

Comments

NeerajGartia21 commented Apr 15, 2024 • edited by beorn7

beorn7 commented Apr 20, 2024

beorn7 commented Apr 20, 2024 • edited

beorn7 commented Apr 20, 2024 • edited

NeerajGartia21 commented Apr 24, 2024

NeerajGartia21 commented Apr 30, 2024 • edited

beorn7 commented May 2, 2024

beorn7 commented May 2, 2024

beorn7 commented May 8, 2024

NeerajGartia21 commented May 9, 2024

NeerajGartia21 commented Apr 15, 2024 •

edited by beorn7

beorn7 commented Apr 20, 2024 •

edited

beorn7 commented Apr 20, 2024 •

edited

NeerajGartia21 commented Apr 30, 2024 •

edited