histogram: remove `backwards` buckets in v1 histogram migration #5404

yatbear · 2021-11-05T19:46:01Z

Remove empty buckets on both ends to avoid having backwards (left edge > right edge) buckets.

Ran the change internally: cl/407885499

Visualization change:

before:
after:

#histogram #tpu
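To illustrate the problem this PR addresses, here is a minimal sketch of how "backwards" buckets can arise. It assumes the pre-change migration simply substituted the data min/max for the outermost limits without trimming empty buckets; `naive_edges` and the sample histogram are hypothetical, not code from the PR.

```python
import sys

DBL_MAX = sys.float_info.max


def naive_edges(bucket_limit, data_min, data_max):
    """Swap the outermost limits for data min/max WITHOUT trimming empty buckets.

    Assumption: this mirrors the behavior the PR is fixing; it is not the
    actual TensorBoard code.
    """
    inner = list(bucket_limit[:-1])
    lefts = [data_min] + inner
    rights = inner + [data_max]
    return list(zip(lefts, rights))


# Hypothetical legacy histogram: all data lies in [2.5, 2.9], so only the
# bucket with right-hand limit 3.0 is non-empty.
bucket_limit = [1.0, 2.0, 3.0, DBL_MAX]
bucket_counts = [0, 0, 4, 0]

edges = naive_edges(bucket_limit, data_min=2.5, data_max=2.9)
backwards = [(left, right) for left, right in edges if left > right]
# backwards == [(2.5, 1.0), (3.0, 2.9)]
```

Both offending buckets are empty ones at the ends, which is why dropping leading and trailing empty buckets before substituting min/max avoids the issue.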

nfelt

Thanks for taking this on! The logic itself looks good to me, just some suggestions to make it clearer.

nfelt · 2021-11-09T02:14:30Z

tensorboard/data_compat.py

+    # Find the indices of the leftmost and rightmost non-empty buckets.
+    n = len(bucket_counts)
+    start = next((i for i in range(n) if bucket_counts[i] > 0), n)
+    end = next((i for i in range(n - 1, -1, -1) if bucket_counts[i] > 0), -1)


A bit simpler as reversed(range(n))

Done. Thanks!
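For reference, a sketch of the index search with the suggested `reversed(range(n))` form. The wrapper function `nonempty_range` is a hypothetical name used only to make the snippet self-contained.

```python
def nonempty_range(bucket_counts):
    """Return (start, end): indices of the leftmost and rightmost non-empty buckets.

    If every bucket is empty, returns (n, -1), so that start > end.
    """
    n = len(bucket_counts)
    start = next((i for i in range(n) if bucket_counts[i] > 0), n)
    end = next((i for i in reversed(range(n)) if bucket_counts[i] > 0), -1)
    return start, end


print(nonempty_range([0, 0, 3, 1, 0]))  # (2, 3)
print(nonempty_range([0, 0, 0]))        # (3, -1)
```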

nfelt · 2021-11-09T02:27:56Z

tensorboard/data_compat.py

+    bucket_lefts = histogram_value.bucket_limit[start:end]
+    bucket_rights = histogram_value.bucket_limit[start:end]
+    bucket_counts = bucket_counts[start : end + 1]
+    if start <= end:


This logic might read a little easier if we special-case the `start > end` case instead (to return the [k, 3] array of zeros). Otherwise, it looks like lines 97-98 above are constructing the normal-case bucket lefts and bucket rights, which is confusing because it defines them both to be the same thing. But that's because what it's actually handling there is a special case (all buckets empty). The logic that handles the normal case is down below, and it has to overwrite the bucket_lefts and bucket_rights variables, which is a bit counterintuitive IMO.

E.g.

    if start > end:
        # If all input buckets were empty, treat it as a zero-bucket new-style histogram.
        return np.zeros([0, 3], dtype=np.float32)

    ## normal flow continues here
    bucket_lefts = ...
    bucket_rights = ...

nfelt · 2021-11-09T02:36:02Z

tensorboard/data_compat.py

+    n = len(bucket_counts)
+    start = next((i for i in range(n) if bucket_counts[i] > 0), n)
+    end = next((i for i in range(n - 1, -1, -1) if bucket_counts[i] > 0), -1)
+    # Discard empty buckets on both ends.


It's a bit non-obvious how the indexing works here - even being relatively familiar with this, I had to read the code here a few times to be sure it made sense and there weren't any off-by-one issues :) In particular, I would suggest a little more explanation of why we want start:end for the bucket limits, but start:end+1 for the bucket counts, for example:

    # Discard empty buckets on both ends, and keep only the "inner" edges from
    # the remaining buckets. Note that bucket indices range from `start` to
    # `end` inclusive, but bucket_limit indices are exclusive of `end` - this is
    # because bucket_limit[i] is the right-hand edge for bucket[i].
    bucket_counts = bucket_counts[start : end + 1]
    inner_edges = histogram_value.bucket_limit[start:end]
    # Use min as the left-hand limit for the first non-empty bucket.
    bucket_lefts = [histogram_value.min] + inner_edges
    # Use max as the right-hand limit for the last non-empty bucket.
    bucket_rights = inner_edges + [histogram_value.max]

Done, thanks for improving the readability!
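The off-by-one reasoning in this thread can be checked with a small pure-Python sketch. It uses plain lists instead of the protobuf fields, and `trim_and_edge` is a hypothetical helper name; it assumes at least one non-empty bucket.

```python
import sys

DBL_MAX = sys.float_info.max


def trim_and_edge(bucket_limit, bucket_counts, data_min, data_max):
    """Drop empty end buckets and return (left, right, count) rows.

    Hypothetical helper; assumes at least one bucket is non-empty.
    """
    n = len(bucket_counts)
    start = next((i for i in range(n) if bucket_counts[i] > 0), n)
    end = next((i for i in reversed(range(n)) if bucket_counts[i] > 0), -1)
    if start > end:
        raise ValueError("all buckets are empty")
    # Buckets start..end inclusive are kept.
    counts = list(bucket_counts[start : end + 1])
    # bucket_limit[i] is the right-hand edge of bucket i, so the edges shared
    # between kept buckets are bucket_limit[start] .. bucket_limit[end - 1].
    inner_edges = list(bucket_limit[start:end])
    lefts = [data_min] + inner_edges
    rights = inner_edges + [data_max]
    return list(zip(lefts, rights, counts))


rows = trim_and_edge(
    bucket_limit=[1.0, 2.0, 3.0, 4.0, DBL_MAX],
    bucket_counts=[0, 2, 0, 5, 0],
    data_min=1.5,
    data_max=3.7,
)
# rows == [(1.5, 2.0, 2), (2.0, 3.0, 0), (3.0, 3.7, 5)]
```

Three buckets are kept but only two inner edges, which is why the limits are sliced with `start:end` and the counts with `start : end + 1`.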

nfelt · 2021-11-09T07:45:55Z

tensorboard/data_compat_test.py

+        self.assertEqual(1, buckets[-1][1])
+        self.assertEqual(1024, buckets[0][2])
+
+    def test_histogram_with_empty_buckets_on_both_ends(self):


It might be worth an additional test case using a histogram with data that has extremal values, e.g. [-1e20, 1e20], which should produce counts in the "farthest out" buckets that the legacy histogram format can generate. In particular, in that case the final bucket in the legacy histogram format is non-empty (whereas usually it's empty).
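A rough sketch of why extremal values land in the outermost legacy bucket. It assumes the legacy default limits grow by 10% from 1e-12 up to 1e20, are capped with DBL_MAX, and are mirrored for negative values, and that a value goes into the first bucket whose limit is strictly greater than it. Both are assumptions about TensorFlow's histogram.cc rather than facts stated in this PR, and `default_bucket_limits` and `legacy_counts` are hypothetical names.

```python
import bisect
import sys

DBL_MAX = sys.float_info.max


def default_bucket_limits():
    """Assumed legacy default bucket limits (see the caveats above)."""
    pos = []
    v = 1e-12
    while v < 1e20:
        pos.append(v)
        v *= 1.1
    pos.append(DBL_MAX)
    neg = [-x for x in reversed(pos)]
    return neg + [0.0] + pos


def legacy_counts(values, limits):
    """Count values per bucket; bucket i covers values below limits[i]."""
    counts = [0] * len(limits)
    for x in values:
        # Clamp so that x == DBL_MAX still maps to the last bucket (a choice
        # made for this sketch only).
        idx = min(bisect.bisect_right(limits, x), len(limits) - 1)
        counts[idx] += 1
    return counts


limits = default_bucket_limits()
counts = legacy_counts([-1e20, 1e20], limits)
print(counts[-1])  # 1: the final bucket (limit DBL_MAX) is non-empty
```

Under these assumptions, 1e20 exceeds every finite positive limit, so it falls into the final bucket, the case the suggested test is meant to cover.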

nfelt · 2021-11-09T07:54:33Z

tensorboard/data_compat.py

@@ -81,10 +81,27 @@ def make_summary(tag, metadata, data):


 def _migrate_histogram_value(value):
+    """Convert `old-style` histogram value to `new-style`.
+
+    Since by default min value is DBL_MAX and max value is -DBL_MAX, empty


I might phrase this a little differently - in particular, "min" and "max" here might be easy to confuse with the legacy histogram format's actual min and max fields (which are derived from the data), but I think you're actually referring to the minimum and maximum bucket limits? (Also, I think you have -DBL_MAX and DBL_MAX swapped.)

Maybe something like:

The "old-style" format can have outermost bucket limits of -DBL_MAX and DBL_MAX,
which are problematic for visualization. We replace those here with the actual min and
max values seen in the input data, but then in order to avoid introducing "backwards"
buckets (where left edge > right edge), we first must drop all empty buckets on the
left and right ends.

This refers to https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/lib/histogram/histogram.cc#L93: when _min and _max aren't set (because the buckets are empty), I think the limits could end up being [DBL_MAX, -DBL_MAX]. But I agree that it's very confusing, so I adopted the suggested docstring. Thanks!

Ohhh right I understand now, sorry for the confusion! But yes, since in the all-empty histogram case now, we should be ignoring the legacy format's min and max anyway (since we just produce the [0, 3] empty tensor), I think the fact that those values might be [DBL_MAX, -DBL_MAX] hopefully shouldn't be an issue.

nfelt · 2021-11-09T23:54:26Z

tensorboard/data_compat.py

+    if start > end:
+        # If all input buckets were empty, treat it as a zero-bucket
+        # new-style histogram.
+        return make_summary(


Ah right, I forgot that we need to wrap this in make_summary() in two places if we do an early return here. Alternatively we could do something like:

    if start > end:
        buckets = np.zeros([0, 3], dtype=np.float32)
    else:
        # rest of normal codepath goes here
        buckets = np.array([...], dtype=np.float32).transpose()
    return make_summary(value.tag, summary_metadata, buckets)

Up to you!

Changed the sequence to avoid moving the code blocks around.
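A minimal sketch of the single-return structure discussed in this thread, in pure Python. Plain lists stand in for the NumPy arrays, and `wrap` is a hypothetical stand-in for `make_summary()`; neither is the merged code.

```python
import sys

DBL_MAX = sys.float_info.max


def wrap(buckets):
    """Hypothetical stand-in for make_summary(); just packages the rows."""
    return {"num_buckets": len(buckets), "buckets": buckets}


def migrate_buckets(bucket_limit, bucket_counts, data_min, data_max):
    n = len(bucket_counts)
    start = next((i for i in range(n) if bucket_counts[i] > 0), n)
    end = next((i for i in reversed(range(n)) if bucket_counts[i] > 0), -1)
    if start > end:
        # All buckets empty: zero-bucket result, standing in for
        # np.zeros([0, 3]). data_min and data_max are never read here.
        buckets = []
    else:
        counts = list(bucket_counts[start : end + 1])
        inner_edges = list(bucket_limit[start:end])
        lefts = [data_min] + inner_edges
        rights = inner_edges + [data_max]
        buckets = [list(row) for row in zip(lefts, rights, counts)]
    # Single exit point, so the wrapping call appears only once.
    return wrap(buckets)


# All-empty histogram whose min/max were left at [DBL_MAX, -DBL_MAX].
empty = migrate_buckets([1.0, DBL_MAX], [0, 0], DBL_MAX, -DBL_MAX)
print(empty["num_buckets"])  # 0
```

Because the all-empty branch never touches the min and max arguments, a legacy histogram whose min and max were left at [DBL_MAX, -DBL_MAX] still produces a well-formed empty result.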

…orflow#5404)
* remove empty buckets on both ends to avoid having `backwards` buckets
* fix typo
* use data min/max as bucket left/right hand limit
* grammar fix
* improve readability
  - improve docs
  - test extremal histogram values
  - fix tag names in test
* small change

yatbear added the plugin:histogram label Nov 5, 2021

yatbear requested a review from nfelt November 5, 2021 19:46

google-cla bot added the cla: yes label Nov 5, 2021

yatbear changed the title ~~histogram: make v1 migration compatible with v3 histogram data~~ histogram: remove backwards bucket in v1 histogram migration Nov 5, 2021

yatbear changed the title ~~histogram: remove backwards bucket in v1 histogram migration~~ histogram: remove backwards buckets in v1 histogram migration Nov 5, 2021

yatbear removed the request for review from nfelt November 5, 2021 20:18

yatbear marked this pull request as draft November 5, 2021 20:18

yatbear marked this pull request as ready for review November 8, 2021 16:04

yatbear force-pushed the compat branch from 0add767 to 627521d Compare November 8, 2021 16:07

yatbear requested a review from nfelt November 8, 2021 17:00

nfelt reviewed Nov 9, 2021

yatbear added 5 commits November 9, 2021 18:42

remove empty buckets on both ends to avoid having backwards buckets

6dc689b

fix typo

44ee7f4

use data min/max as bucket left/right hand limit

011b656

grammar fix

21ff50e

improve readability

7cebaaf

- improve docs
- test extremal histogram values
- fix tag names in test

yatbear force-pushed the compat branch from 627521d to 7cebaaf Compare November 9, 2021 18:42

yatbear requested a review from nfelt November 9, 2021 18:44

nfelt reviewed Nov 9, 2021

nfelt approved these changes Nov 9, 2021

small change

ff907e9

yatbear merged commit cec1311 into tensorflow:master Nov 10, 2021

yatbear deleted the compat branch November 10, 2021 14:58
