Add test for optimize_out_slice_nd #543

Open · wants to merge 11 commits into master
Conversation

@jotix16 (Contributor) commented Jun 13, 2021

This is an example case which shows that the slice_nd layer doesn't get properly optimized out of the loop.

This pull request is meant to fix that.

For clarification: we want to generalize SliceNdLayer so that start is not restricted to shape (batch,), but also covers cases with several time axes.

@jotix16 jotix16 marked this pull request as draft June 13, 2021 11:40
@jotix16 jotix16 force-pushed the optimize_out_slice_nd branch 2 times, most recently from a75c364 to bbdbb67, June 13, 2021 17:02
@albertz

This comment has been minimized.

@jotix16 jotix16 force-pushed the optimize_out_slice_nd branch 2 times, most recently from a45b4a5 to e432a28, June 20, 2021 14:12
@jotix16 (Contributor, Author) commented Jun 20, 2021

I've added the TF implementation of the new slice_nd2 in util/basic and some tests. I haven't cleaned out the old ones yet.

Before implementing the RETURNN layer, I wanted to clarify whether we want to allow a slice_axis input parameter to the function, similar to the GatherLayer. If we don't, the user has to make sure that the axes of input and start have the same order.

I.e.

"""
  :param tf.Tensor input: shape (B, T1, ..., Tn, D)
  :param tf.Tensor start: shape (B, T1, ..., Tn-1), int32, which automatically indicates n as the slice axis
"""

returnn/tf/layers/basic.py (outdated review thread, resolved)
@albertz (Member) commented Jun 25, 2021 via email

returnn/tf/layers/basic.py (outdated review thread, resolved)
@albertz (Member) commented Jun 26, 2021

Can you post here what the actual problem/error is when you use these test cases with the original slice_nd? It is not really clear to me why there is actually a problem. (Just post the error with stack trace.)

@jotix16 (Contributor, Author) commented Jun 28, 2021

(Just post the error with stack trace.)

Here is the error stack trace you asked for. The thing is that the slice_nd we had expects start to have only a batch axis [B], but if slice_nd gets pulled out of the loop, so does start, and instead of shape [B], start then has shape [B,T]. So it requires a loop over T; that's what I added with this commit.
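
To illustrate the shape difference (a toy sketch, not taken from the actual test case):

import tensorflow as tf

# Inside the rec loop, the layer sees one start index per batch entry: shape [B].
start_in_loop = tf.constant([0, 1])                  # [B=2]
# When the layer is moved out of the loop, start is stacked over time: shape [B, T].
start_moved_out = tf.constant([[0, 1, 2],
                               [1, 1, 0]])           # [B=2, T=3]
# The old slice_nd only handled the [B] case, so the [B, T] case needs either an
# explicit loop over T or a vectorized slice, which is what this commit adds.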

@jotix16 jotix16 marked this pull request as ready for review June 29, 2021 09:36
assert x.get_dim_tag(start_axis).is_equal(start.get_dim_tag(start_axis), **is_equal_opts)

# Handle the case when the layer is pulled out of the rec loop but the input hasn't changed
if self.optimized_out_of_loop_and_unchanged_input(x, start):
Member

Change the name of the function and the comment. This layer is not related to the rec loop (RecLayer) in any way. No comment should mention anything about rec loop here.

I don't really understand what you are actually testing here.

Contributor Author

If we only expect input_data to come from the base_network, no check is required. We can then always assume that input_data stays the same, whether the layer is pulled out of the loop or not.

Should I follow this logic?

returnn/tf/layers/basic.py (outdated review threads, resolved)
self.output.size_placeholder = x.size_placeholder.copy()
if isinstance(size, tf.Tensor):
  self.output.size_placeholder[0] = tf.maximum(seq_lens - tf.reshape(start, tf.shape(seq_lens)), 0)
self.output.size_placeholder[slice_axis] = size
Member

This is wrong. size is just a scalar. But you need a vector of shape [B] here. What's wrong with the old code?

Contributor Author

But you need a vector of shape [B] here. What's wrong with the old code?

We cannot use start directly to calculate the size, since it has shape [B,T0,...] instead of [B,].
So I am not sure how to set self.output.size_placeholder.

Contributor Author

What is the meaning of self.output.size_placeholder if we have more than one spatial axis? Should every element of self.output.size_placeholder have shape (B,)?

Member

size_placeholder is a dict, mapping each spatial axis (counted without the batch dim) to such a [B] tensor.
So if you have a tensor [B,T1,T2,F], then for each batch entry b you find its length of axis T1 in size_placeholder[0][b], and the one for T2 in size_placeholder[1][b].
However, with this it's not possible for lengths to depend on anything other than the batch entry, i.e. you cannot model that some batch entry should have a different length in axis T2 for different time steps t1.
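
For illustration, a toy example of that mapping (the lengths are made up):

import tensorflow as tf

# For a tensor of shape [B=2, T1, T2, F], size_placeholder maps each spatial
# axis (counted without the batch dim) to a [B] vector of per-batch lengths.
size_placeholder = {
  0: tf.constant([7, 5]),  # lengths along T1 for batch entries 0 and 1
  1: tf.constant([3, 4]),  # lengths along T2 for batch entries 0 and 1
}
# There is exactly one length per batch entry and axis, so the length of T2
# cannot vary over the time steps t1, which is the limitation discussed here.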

Contributor Author

you cannot model that the same batch entry should have a different length in axis T2 for different time steps t1

This is what we have here, though.

What is self.output.size_placeholder used for anyways? Does it cause any problems if set wrong?

Member

The size_placeholder is mostly used for masking, e.g. deciding which entries to ignore when calculating the loss, when reducing a sequence (e.g. taking the average/min/max over the sequence), or in layers which concatenate in time and such.
Many layers take it into account; if it is set wrong then weird things can happen, and your calculations will be wrong and depend on the batching.
So yes, I'd say setting this is kind of important.

I don't know a way to set it the way you want it. Maybe @albertz knows more?
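
For illustration, a small sketch of how such per-batch lengths are typically used for masking (illustrative only, not RETURNN code):

import tensorflow as tf

# Per-batch lengths (what a size_placeholder entry holds) turned into a mask,
# e.g. to exclude padded frames from a loss or from a sequence reduction.
seq_lens = tf.constant([7, 5])                       # [B]
mask = tf.sequence_mask(seq_lens, maxlen=10)         # [B, T], True for valid frames
values = tf.random.normal([2, 10])                   # [B, T]
masked_sum = tf.reduce_sum(tf.where(mask, values, tf.zeros_like(values)), axis=1)  # [B]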

self.output.placeholder = slices

@classmethod
def optimized_out_of_loop_and_unchanged_input(cls, input_data, start):
Member

As explained above (and before), this name must be changed to what this function actually does/checks.

Contributor Author

It is now called input_comes_from_base_network.

def optimized_out_of_loop_and_unchanged_input(cls, input_data, start):
  """
  :rtype: bool
  The idea is to check that the axis after the last common axis is a feature axis instead of spatial.
Member

I don't really understand this. Why do you check this? Why is this relevant?
Also, what does this mean, "the idea"? Is this what this function returns, or how does this idea relate to what this function returns? If this is what the function returns, it should be like :returns: True iff axis after last ... is ... instead of ... or so.

Contributor Author
@jotix16 Jul 13, 2021

I reformulated it. It should be clearer now.

    """
    The idea is to check if the axis after the last common axis is the feature axis instead of another spatial axis.
    Because the input_data should normally have one extra spatial axis compared to start.
    """

slice_idx = tf.tile(tf.expand_dims(start, -1), [1] * len_common_dims + [size]) + tf.range(size)
mask = tf.logical_or(tf.greater(slice_idx, slice_dim - 1), tf.less(slice_idx, 0))  # (B, T1, ..., Tn-1, size)
slice_idx = tf.clip_by_value(slice_idx, 0, slice_dim - 1)  # clipped slice idx
res = tf.gather(x, slice_idx, axis=len_common_dims, batch_dims=len_common_dims)
Member

I think tf.gather with axis and batch_dims is only supported in later TF versions (I don't remember since what version, can you/someone check?).

I'm not sure anymore which is the min TF version we want to support (I think we documented this somewhere; can someone check? @patrick-wilken ? @JackTemaki ?).

Maybe we need our own wrapper for gather. Or you use the more generic gather_nd (which might be slower for this case though).

Contributor Author
@jotix16 Jul 13, 2021

The arguments are present from tf_2.0.0 up to tf_2.3.0 and tf_2.4.0. So it should be fine?
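
For a quick sanity check of the call pattern (illustrative shapes; axis and batch_dims are both set to the number of common leading axes):

import tensorflow as tf

x = tf.reshape(tf.range(2 * 3 * 4 * 5), [2, 3, 4, 5])   # (B, T1, T2, D)
idx = tf.zeros([2, 3, 2], dtype=tf.int32)                # (B, T1, size)
out = tf.gather(x, idx, axis=2, batch_dims=2)            # gathers along T2 per (b, t1)
print(out.shape)  # (2, 3, 2, 5), i.e. (B, T1, size, D)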

returnn/tf/util/basic.py (outdated review thread, resolved)
@jotix16 jotix16 requested a review from a team as a code owner July 13, 2021 17:35
Labels: none yet
Projects: none yet
3 participants