Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

histogram: make summary_v2.histogram_pb TPU compatible #5409

Merged
merged 4 commits into from Nov 10, 2021

Conversation

yatbear
Copy link
Member

@yatbear yatbear commented Nov 8, 2021

Make sure the output shape wouldn't be dynamic in single value data case.

#histogram #tpu

@@ -175,6 +175,24 @@ class SummaryV2PbTest(SummaryBaseTest, tf.test.TestCase):
def histogram(self, *args, **kwargs):
return summary.histogram_pb(*args, **kwargs)

def test_singleton_input(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add the other 3 test cases for v3 as well?

test_empty_input, test_empty_input_of_high_rank, test_zero_bucket_count

See also other comment about the empty input case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -271,14 +271,17 @@ def histogram_pb(tag, data, buckets=None, description=None):
bucket_count = DEFAULT_BUCKET_COUNT if buckets is None else buckets
data = np.array(data).flatten().astype(float)
if data.size == 0:
buckets = np.array([]).reshape((0, 3))
histogram_buckets = np.array([]).reshape((0, 3))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this case, we should match _buckets_v3 in by returning the all-zeros array of shape [bucket_size, 3]:

def when_empty():
"""When input data is empty or bucket_count is zero.
1. If bucket_count is specified as zero, an empty tensor of shape
(0, 3) will be returned.
2. If the input data is empty, a tensor of shape (bucket_count, 3)
of all zero values will be returned.
"""
return tf.zeros((bucket_count, 3), dtype=tf.float64)

That ensures that the size of the resulting array is always [bucket_size, 3] no matter what the size or shape of data is.

Can you update the docstring accordingly? Specifically where it says "The output will have this many buckets, except in two edge cases." - that sentence is what we're trying to get rid of here :) Now, the output shape will always be [k, 3] unconditionally. Rather than affecting the shape, the edge cases affect the contents - i.e. empty data means the edges and counts are all 0, and only a single value means the edges are all that value and only the last bucket has nonzero count (as you've described).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks!

@@ -21,8 +21,8 @@
In general, the value of `k` (the number of buckets) will be a constant,
like 30. There are two edge cases: if there is no data, then there are
no buckets (the shape is `[0, 3]`); and if there is data but all points
have the same value, then there is one bucket whose left and right
endpoints are the same (the shape is `[1, 3]`).
have the same value, then then all buckets' left and right endpoints are
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the migration is still in progress, I'd recommend changing this overall file-level docstring to distinguish between the two different formats, since the part here is still relevant to the current histogram_v2 logic.

Basically

  • First paragraph w/ just a single line can remain as-is
  • Second paragraph is common to both v2 and v3 formats
  • Third paragraph (this one) we can split into 2 different paragraphs. One addressing v2 format (unchanged from existing), one addressing v3 format.

The one about the v3 format we'll want to clarify that the value of k is indeed always a constant, so the output is always [k, 3], even in the empty data case (as discussed below).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@yatbear yatbear requested a review from nfelt November 9, 2021 20:18
Copy link
Collaborator

@nfelt nfelt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you!

histogram_buckets = np.array([]).reshape((0, 3))
elif data.size == 0:
histogram_buckets = np.zeros((bucket_count, 3))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional, but this implementation also handles the 0-bucket-count case, so you could combine these two conditions into just

if bucket_count == 0 or data.size == 0:
    histogram_buckets = np.zeros((bucket_count, 3))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks!

@yatbear yatbear merged commit 9e842f1 into tensorflow:master Nov 10, 2021
@yatbear yatbear deleted the histo_pb_v3 branch November 10, 2021 14:58
yatbear added a commit to yatbear/tensorboard that referenced this pull request Mar 27, 2023
…5409)

* make histogram_pb tpu compatible

* remove superfluous trailing whitespaces

* fix empty data case & update docs

* merge the empty data and zero bucket count cases
dna2github pushed a commit to dna2fork/tensorboard that referenced this pull request May 1, 2023
…5409)

* make histogram_pb tpu compatible

* remove superfluous trailing whitespaces

* fix empty data case & update docs

* merge the empty data and zero bucket count cases
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants