histogram: make timeseries histogram visualization compatible with v3 data format #5392

yatbear · 2021-10-29T22:01:22Z

Change the timeseries histogram visualization logic. Previously, the count of a zero-width bin was split into two halves. Now each zero-width bin's count is fully allocated to a single bin, with the restriction that all redistributed bins (ranges) are closed-open except the last one, which is closed.

Current visualization implementation with v3 histogram data input:
Current visualization implementation with v2 histogram data input:

#histogram #tpu

nfelt

Thanks for taking this on! Code LGTM, just some comments on the code comments and tests. I do think this makes the Time Series visualization more correct, even if it's hard to actually generate a case where it is visually different.

FWIW - I expected to be able to get this to visually look different by changing the demo's single_value example to be ten 0.0 values. This should have triggered the "wrong" treatment (two peaks of count 5, rather than one peak of count 10), because this code would generate 30 intervals from -1.0 to 1.0, and right at the middle, the 0.0 values would fall on the middle boundary and get split 0.5/0.5.

However, that doesn't happen, which I eventually figured out is due to floating point error 🤦‍♂️ The bins generated by the logic in that case actually look like this:

0: {x: -1, dx: 0.06666666666666667, y: 0}
1: {x: -0.9333333333333333, dx: 0.06666666666666667, y: 0}
...
14: {x: -0.06666666666666665, dx: 0.06666666666666667, y: 10}
15: {x: 0, dx: 0.06666666666666667, y: 0}
...
29: {x: 0.9333333333333333, dx: 0.06666666666666667, y: 0}

Critically, although the x for bin 15 is exactly 0, for bin 14 x = -0.0666...5 while dx = 0.0666...7, so x + dx doesn't quite add up to 0.0:

>>> -0.06666666666666665 + 0.06666666666666667
1.3877787807814457e-17

As a result, because the code sees that 0.0 < 1.38777...e-17, it doesn't split the counts evenly after all (it does technically put them in the "wrong" bin, but visually that's less important).
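The arithmetic above can be replayed in TypeScript, since JavaScript numbers are the same IEEE 754 doubles that Python uses. This sketch only reuses the two edge values from the bin listing and assumes, as described above, that bin 14's right edge is obtained as `x + dx`; it does not reproduce the actual bin-generation code.

```typescript
// Edge values copied from the bin listing above (bins 14 and 15).
const bin14X = -0.06666666666666665;
const binDx = 0.06666666666666667;
const bin15X = 0;

// Right edge of bin 14, derived from its {x, dx} pair.
const bin14Right = bin14X + binDx;

console.log(bin14Right); // 1.3877787807814457e-17
// The two "shared" edges differ, so 0.0 lies strictly inside bin 14
// instead of exactly on its boundary with bin 15.
console.log(bin14Right === bin15X); // false
console.log(bin15X < bin14Right); // true
```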

Anyway, maybe at some point I'll try to fix this by ensuring we actually use the same exact values for left/right edges of adjacent bins, since at least in theory having two different calculations could mean we double-count or incorrectly omit counts.
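One way to act on that idea, sketched under the assumption that result bins carry explicit `left`/`right` edges rather than `{x, dx}`: compute every edge exactly once and let adjacent bins share it. The names `buildSharedEdgeBins` and `findBinIndex` are hypothetical, not TensorBoard functions.

```typescript
interface EdgeBin {
  left: number;
  right: number;
}

// Hypothetical helper: computes binCount + 1 edges once, so bin i's right edge
// is the very same number as bin i + 1's left edge.
function buildSharedEdgeBins(min: number, max: number, binCount: number): EdgeBin[] {
  const edges: number[] = [];
  for (let i = 0; i <= binCount; i++) {
    // Interpolate each edge from min instead of accumulating dx, and pin the
    // final edge to max exactly.
    edges.push(i === binCount ? max : min + ((max - min) * i) / binCount);
  }
  const bins: EdgeBin[] = [];
  for (let i = 0; i < binCount; i++) {
    bins.push({left: edges[i], right: edges[i + 1]});
  }
  return bins;
}

// Bins are closed-open [left, right) except the last, which is closed [left, right].
// Returns -1 if the value is outside every bin.
function findBinIndex(bins: EdgeBin[], value: number): number {
  const last = bins.length - 1;
  for (let i = 0; i < bins.length; i++) {
    const bin = bins[i];
    const belowRight = value < bin.right || (i === last && value <= bin.right);
    if (value >= bin.left && belowRight) {
      return i;
    }
  }
  return -1;
}

const bins = buildSharedEdgeBins(-1.0, 1.0, 30);
console.log(bins[15].left); // 0
console.log(findBinIndex(bins, 0.0)); // 15
```

Because bin 14's right edge and bin 15's left edge are the same stored value, a count at exactly 0.0 can only ever land in one of them.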

nfelt · 2021-10-29T23:37:51Z

tensorboard/webapp/widgets/histogram/histogram_util.ts

@@ -149,8 +149,14 @@ function rebuildBins(bins: Bin[], range: Range, binCount: number): Bin[] {

 /**
 * Computes how much of the input bin's 'y' counts should be allocated to a new
- * range. For 0 width input bins, the allocation may be split in half across 2
- * bins.
+ * range. For 0 width input bins, the allocation will be distributed to the last


I think this description isn't quite right - zero width input bins shouldn't always get distributed to the last "result" bin (the one that is closed-closed), it depends on the alignment of the bins.

For example, we can imagine an input which has the following bins, where the final bin in each step has count 10:

Step 1: [0.0, 0.0), [0.0, 0.0]
Step 2: [1.0, 1.0), [1.0, 1.0]

In this case, if we imagine the output also has 2 bins, then it will span the range [0.0, 1.0] (the min and max across all steps), and so the output bins will be [0.0, 0.5) and [0.5, 1.0]. In that case, the count of 10 from step 1 should get distributed to the first output bin, not to the final output bin.

What I would recommend here is actually not to talk about the overall distribution of bins and just to focus on the specific arguments to the function. E.g. something like:

Computes how much of the input bin's 'y' counts should be allocated to this output bin.

Where both bins have non-zero width, this computed by multiplying the input y value by the ratio of the width-wise overlap in the bins to the total width of the output bin. (This can be thought of redistributing the overlapping "area" of the bar in the input histogram across the full width of the output bin.)

When the input bin has zero width (the output bin cannot have zero width by construction), we instead have to consider several cases depending on the open/closed-ness of the underlying intervals. If the zero width input bin has y value 0, the contribution is always 0. Otherwise, if the zero width input bin has y value greater than 0, it must represent the closed interval [x, x]. In this case, it contributes the full value of y if and only if the output bin's interval contains x. This interval is the closed-open interval [resultLeft, resultRight), except if resultHasRightNeighbor is false, in which case it's the closed interval [resultLeft, resultRight].
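A minimal sketch of the rule described above. The signature `getBinContribution(bin, resultLeft, resultRight, resultHasRightNeighbor)` and the `Bin` shape `{x, dx, y}` are inferred from this thread, not copied from histogram_util.ts. For non-zero widths the sketch scales `y` by the overlap divided by the input bin's width, an assumption chosen so that total counts are conserved; when input and output widths are equal, as in the tests discussed here, it gives the same result as the description. The usage lines replay the two-step [0.0, 1.0] example from earlier in this comment.

```typescript
interface Bin {
  x: number; // left edge
  dx: number; // width (0 for a zero-width bin)
  y: number; // count
}

// Hypothetical sketch of the contribution rule; not TensorBoard's actual code.
function getBinContribution(
  bin: Bin,
  resultLeft: number,
  resultRight: number,
  resultHasRightNeighbor: boolean
): number {
  if (bin.dx === 0) {
    // An empty zero-width bin never contributes.
    if (bin.y === 0) {
      return 0;
    }
    // A non-empty zero-width bin is the closed interval [x, x]: it contributes
    // all of y if the output interval contains x, and nothing otherwise.
    const containsX = resultHasRightNeighbor
      ? bin.x >= resultLeft && bin.x < resultRight // [resultLeft, resultRight)
      : bin.x >= resultLeft && bin.x <= resultRight; // [resultLeft, resultRight]
    return containsX ? bin.y : 0;
  }
  // Non-zero width: apportion y by the fraction of the input bin that overlaps.
  const overlap =
    Math.min(bin.x + bin.dx, resultRight) - Math.max(bin.x, resultLeft);
  return overlap > 0 ? (bin.y * overlap) / bin.dx : 0;
}

// Output bins [0.0, 0.5) and [0.5, 1.0]; each step's last bin has count 10.
const step1: Bin = {x: 0.0, dx: 0, y: 10};
const step2: Bin = {x: 1.0, dx: 0, y: 10};
console.log(getBinContribution(step1, 0.0, 0.5, true)); // 10
console.log(getBinContribution(step1, 0.5, 1.0, false)); // 0
console.log(getBinContribution(step2, 0.0, 0.5, true)); // 0
console.log(getBinContribution(step2, 0.5, 1.0, false)); // 10
```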

Changed the description, thanks!

tensorboard/webapp/widgets/histogram/histogram_util_test.ts

nfelt · 2021-10-29T23:46:51Z

tensorboard/webapp/widgets/histogram/histogram_util_test.ts

@@ -276,8 +276,8 @@ describe('histogram util', () => {
          )
        ).toEqual([
          [
-            {x: 0, dx: 5, y: 100},
-            {x: 5, dx: 5, y: 100},
+            {x: 0, dx: 5, y: 0},


FWIW, these input bins aren't a great test case at this point, since they're definitely not valid histogram v3 format (they're non-contiguous because the bins have gaps between them, and zero-width bins should only have a non-zero count if they're the final bin). Ideally with the histogram v3 data_compat.py changes, we should be validating that any histogram data reaching the plugin is valid v3 format, so it shouldn't be possible for data like this to actually reach the frontend.
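The two validity conditions mentioned here (bins are contiguous, and a zero-width bin may be non-empty only if it is the final bin) can be checked per step roughly as follows. `isValidV3Step` is a hypothetical helper, not the data_compat.py validation. It compares edges exactly, which suits the small integer examples in this thread but may be too strict for real floating-point edges.

```typescript
interface Bin {
  x: number;
  dx: number;
  y: number;
}

// Hypothetical per-step check of the two v3 constraints discussed in this review.
function isValidV3Step(bins: Bin[]): boolean {
  for (let i = 0; i < bins.length; i++) {
    const bin = bins[i];
    // Contiguity: each bin must start exactly where the previous one ends.
    if (i > 0 && bins[i - 1].x + bins[i - 1].dx !== bin.x) {
      return false;
    }
    // Only the final bin may be a non-empty zero-width bin.
    const isFinal = i === bins.length - 1;
    if (bin.dx === 0 && bin.y !== 0 && !isFinal) {
      return false;
    }
  }
  return true;
}

// Contiguous, with the only zero-width bin in final position.
console.log(isValidV3Step([{x: 0, dx: 1, y: 10}, {x: 1, dx: 1, y: 10}, {x: 2, dx: 0, y: 10}])); // true
// Gaps between bins, and a non-empty zero-width bin in the middle.
console.log(isValidV3Step([{x: 0, dx: 1, y: 0}, {x: 5, dx: 0, y: 200}, {x: 9, dx: 1, y: 0}])); // false
```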

If you don't mind updating it, bins like this would be more realistic (EDIT: corrected 3rd bin's dx value):

binsToHistogram([ {x: 0, dx: 1, y: 10}, {x: 1, dx: 1, y: 10}, {x: 2, dx: 0, y: 10}, ]),

With the expected 2-bin output being [{x: 0, dx: 1, y:10}, {x:1, dx:1, y:20}]. You could also just combine this with the other test case below IMO.
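A rough sketch of how a rebinning pass could produce that expected output. `rebinSteps` is hypothetical (it is not `buildNormalizedHistograms`). It repeats the contribution rule so the block stands alone, takes the shared range from the min and max edges across all steps, and assumes that range has non-zero width. As with the earlier assumption, non-zero-width bins are apportioned by overlap over the input bin's width.

```typescript
interface Bin {
  x: number;
  dx: number;
  y: number;
}

// Zero-width bins: all-or-nothing; non-zero-width bins: overlap / input width.
function contribution(bin: Bin, left: number, right: number, hasRightNeighbor: boolean): number {
  if (bin.dx === 0) {
    if (bin.y === 0) return 0;
    const inside = bin.x >= left && (hasRightNeighbor ? bin.x < right : bin.x <= right);
    return inside ? bin.y : 0;
  }
  const overlap = Math.min(bin.x + bin.dx, right) - Math.max(bin.x, left);
  return overlap > 0 ? (bin.y * overlap) / bin.dx : 0;
}

// Hypothetical: rebins every step onto binCount output bins over one shared range.
// Assumes at least one input bin overall and a range of non-zero width.
function rebinSteps(steps: Bin[][], binCount: number): Bin[][] {
  let min = Infinity;
  let max = -Infinity;
  for (const bins of steps) {
    for (const bin of bins) {
      min = Math.min(min, bin.x);
      max = Math.max(max, bin.x + bin.dx);
    }
  }
  const width = (max - min) / binCount;
  return steps.map((bins) => {
    const result: Bin[] = [];
    for (let i = 0; i < binCount; i++) {
      const left = min + i * width;
      const isLast = i === binCount - 1;
      // Output bins are [left, right) except the last, which is [left, max].
      const right = isLast ? max : left + width;
      let y = 0;
      for (const bin of bins) {
        y += contribution(bin, left, right, !isLast);
      }
      result.push({x: left, dx: width, y: y});
    }
    return result;
  });
}

const singleStep = rebinSteps(
  [[{x: 0, dx: 1, y: 10}, {x: 1, dx: 1, y: 10}, {x: 2, dx: 0, y: 10}]],
  2
);
console.log(JSON.stringify(singleStep));
// [[{"x":0,"dx":1,"y":10},{"x":1,"dx":1,"y":20}]]
```

The closed last output bin [1, 2] picks up both the [1, 2) bin and the zero-width [2, 2] bin, which is where the count of 20 comes from.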

Changed the example to be zero-width bin input in two steps.

Thanks! That makes sense for the last test case here about multiple steps, but the input bins here could still be changed? Sorry, the GH UI won't let me comment directly, but what I'm referring to is lines 268-272 in this file:

binsToHistogram([ {x: 0, dx: 1, y: 0}, {x: 5, dx: 0, y: 200}, {x: 9, dx: 1, y: 0}, ]),

As described below, binsToHistogram() is supposed to simulate a single step's worth of data, but in the valid v3 format, we should never get a single step with bins like this (where there is a zero-width bin that is not the final bin, but is not empty either). Hence the suggestion to modify this input to something more realistic.

Note that I made a mistake in my original comment in one of the dx values in my example, which I've edited to fix.

Changed the test case.

nfelt · 2021-10-29T23:48:17Z

tensorboard/webapp/widgets/histogram/histogram_util_test.ts

-            {x: 0, dx: 5, y: 100},
-            {x: 5, dx: 5, y: 100},
+            {x: 0, dx: 5, y: 0},
+            {x: 5, dx: 5, y: 200},


Mind also changing the name of this test case? "redistributes 0 width bin evenly over edges of result bins" is no longer quite accurate.

Removed this test case since it's basically a duplicate of the newly edited test suggested above (which produces result bins from multiple 0 width bins in different steps).

Oops, I actually commented in the wrong place - I meant to comment on the test name redistributes 0 width bin evenly over edges of result bins (line 263) but GH won't let me comment there. It's the "evenly" that isn't accurate any more.

Got it! Renamed.

nfelt

Apologies for the long review latency on this, and thank you for the documentation updates! Just added a few more comments on the tests, sorry my original comments weren't all that clear / had some mistakes.

nfelt · 2021-11-03T22:54:55Z

tensorboard/webapp/widgets/histogram/histogram_util.ts

- * bins.
+ * Computes how much of the input bin's 'y' counts should be allocated to this output bin.
+ *
+ * Where both bins have non-zero width, this computed by multiplying the input y value by


Missing word, should be "this is computed" (my bad leaving that out originally)

Done. Thanks for noticing!

nfelt · 2021-11-03T23:04:01Z

tensorboard/webapp/widgets/histogram/histogram_util.ts

+ * [resultLeft, resultRight), except if resultHasRightNeighbor is false, in which case it's
+ * the closed interval [resultLeft, resultRight].
+ *
+ * For example, assuming the middle bin has width 0 and count 20:


I'd probably omit this example, especially since the input bins shown aren't technically a valid set of bins and counts: there is a zero-width bin that isn't the final bin, so it must be a closed-open interval (which is empty), yet it has a non-empty count, which shouldn't be possible.

If we do keep an example, I'd suggest phrasing it in terms of the specific arguments to this function, like a table with columns for the function arguments and the function return value, where each row represents a specific case, to help illustrate the logic in the paragraph of description. But that might also be better implemented as a set of test cases.

nfelt · 2021-11-03T23:05:48Z

tensorboard/webapp/widgets/histogram/histogram_util_test.ts

-            {x: 0, dx: 5, y: 100},
-            {x: 5, dx: 5, y: 100},
+            {x: 0, dx: 5, y: 0},
+            {x: 5, dx: 5, y: 200},


Oops, I actually commented in the wrong place - I meant to comment on the test name redistributes 0 width bin evenly over edges of result bins (line 263) but GH won't let me comment there. It's the "evenly" that isn't accurate any more.

nfelt · 2021-11-03T23:16:45Z

tensorboard/webapp/widgets/histogram/histogram_util_test.ts

-        'produces result bins with full and partial contributions from ' +
-          'multiple 0 width bins',
+        'produces result bins from multiple 0 width bins in different ' +
+          'steps',


To simulate different steps, it would be best if for the input here we call binsToHistogram() more than once, e.g. something like this:

expect( histogramsToBins( buildNormalizedHistograms( [ binsToHistogram([{x: 0, dx: 0, y: 200}]), binsToHistogram([{x: 1.0, dx: 0, y: 100}]), ], 2 ) ) ).toEqual([...]);

That's because each call of binsToHistogram() is meant to represent a single step of data, and in a single step of the valid v3 format, there shouldn't be more than one zero-width bin that has data in it.

The resulting output should be the same; this is just a more accurate representation of how the buildNormalizedHistograms() API is intended to be used.

If I use two binsToHistogram() calls as above, the result becomes:

[ {x: 0, dx: 0.5, y: 200}, {x: 0.5, dx: 0.5, y: 0}, ], [ {x: 0, dx: 0.5, y: 0}, {x: 0.5, dx: 0.5, y: 100}, ]

Hence I moved this case under the multiple histograms category below.

Got it, ok, that makes sense!

nfelt · 2021-11-03T23:20:46Z

tensorboard/webapp/widgets/histogram/histogram_util_test.ts

@@ -276,8 +276,8 @@ describe('histogram util', () => {
          )
        ).toEqual([
          [
-            {x: 0, dx: 5, y: 100},
-            {x: 5, dx: 5, y: 100},
+            {x: 0, dx: 5, y: 0},


Thanks! That makes sense for the last test case here about multiple steps, but the input bins here could still be changed? Sorry, the GH UI won't let me comment directly, but what I'm referring to is lines 268-272 in this file:

binsToHistogram([ {x: 0, dx: 1, y: 0}, {x: 5, dx: 0, y: 200}, {x: 9, dx: 1, y: 0}, ]),

As described below, binsToHistogram() is supposed to simulate a single step's worth of data, but in the valid v3 format, we should never get a single step with bins like this (where there is a zero-width bin that is not the final bin, but is not empty either). Hence the suggestion to modify this input to something more realistic.

Note that I made a mistake in my original comment in one of the dx values in my example, which I've edited to fix.

… data format (tensorflow#5392)

* make timeseries histogram visualization compatible with v3 data format
* improve getBinContribution description (thanks nfelt@!)
* update test for edge case zero-width bin scenarios
* remove duplicate test case
* update doc
* improve test

yatbear added the plugin:histogram label Oct 29, 2021

yatbear requested a review from nfelt October 29, 2021 22:01

google-cla bot added the cla: yes label Oct 29, 2021

nfelt reviewed Oct 30, 2021


yatbear requested a review from nfelt November 2, 2021 02:12

nfelt reviewed Nov 3, 2021


yatbear added 6 commits November 4, 2021 14:10

make timeseries histogram visualization compatible with v3 data format

5aa57bd

improve getBinContribution description (thanks nfelt@!)

190c201

update test for edge case zero-width bin scenarios

857e348

remove duplicate test case

d1e9df4

update doc

baf48ec

improve test

94f3e52

yatbear force-pushed the histo_util branch from 9bb9db7 to 94f3e52 Compare November 4, 2021 14:58

yatbear requested a review from nfelt November 4, 2021 14:59

nfelt approved these changes Nov 4, 2021


yatbear merged commit 7859719 into tensorflow:master Nov 4, 2021

yatbear deleted the histo_util branch November 4, 2021 22:59

