
GatherLayer on batch axis #1089

Open · wants to merge 5 commits into master

Conversation

vieting (Contributor) commented Aug 4, 2022

This PR fixes #1087. Since I face the issue in the context of supervised multilingual training, I also added a more general test case for that scenario, which does not necessarily need to go into the main branch. The fix is similar to how the size_placeholder is modified in the ShiftAxisLayer.
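
For context, here is a minimal plain-TF sketch of what gathering on the batch axis implies for the dynamic sizes (made-up values; the actual change lives inside GatherLayer):

import tensorflow as tf

values = tf.random.normal([4, 7, 5])    # [B, T, F]
seq_lens = tf.constant([7, 3, 5, 6])    # [B], dynamic sizes of the T axis
position = tf.constant([0, 2, 3])       # [B'], indices into the batch axis

gathered = tf.gather(values, position, axis=0)  # [B', T, F]
new_seq_lens = tf.gather(seq_lens, position)    # [B'], the sizes must be gathered as well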

vieting requested review from albertz and a team as code owners on August 4, 2022 09:32
vieting (Contributor Author) commented Aug 10, 2022

Hi @albertz, what do you think about the way the size placeholder and dim tag are modified in general? Right now there is a failing test case where we first do flatten_batch and then gather on the batch axis. I'm not sure what the desired behavior in this case should look like. It would be nice if you could comment on what you think here.

kind=Dim.Types.Spatial, description="%s_gather_axis" % self.name,
dyn_size=new_size, batch=self.output.batch,
src_data=self.output, src_axis=axis, auto_generated=True)
self.output.size_placeholder[axis] = new_size
albertz (Member):

You don't use the Dim object you created?
Instead of assigning size_placeholder, I think it would be better to set the newly created dim tag.

vieting (Contributor Author):

What is the usual way to set dim tags? I can't just reassign self.output.dim_tags. declare_same_as is used elsewhere, but I'm not sure if it applies here.

albertz (Member):

See most other layers. Usually you set dim_tags in get_out_data_from_opts. You should not assign a new dim tag in __init__. In __init__, you might just assign the dyn_size_ext or dyn_size_ext.placeholder of a dim tag which was previously created in get_out_data_from_opts.
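
For illustration, a standalone sketch of the dyn_size_ext pattern on a freestanding Dim, using the Dim/Data API quoted elsewhere in this thread (a rough example under assumed imports, not actual layer code):

import tensorflow as tf
from returnn.tf.util.data import Data, Dim, batch_dim

# the dim tag is created up front (in a layer: in get_out_data_from_opts), without sizes yet
gather_dim = Dim(kind=Dim.Types.Spatial, description="gather_axis", auto_generated=True)

# later (in a layer: in __init__), only the dynamic sizes are filled in, as a Data object,
# instead of assigning dyn_size or size_placeholder directly
gather_dim.dyn_size_ext = Data(
    name="gather_axis_size", dim_tags=[batch_dim], dtype="int32",
    placeholder=tf.constant([7, 3, 5]))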

# gather targets and encoder outputs
"tgt": {"class": "gather", "from": "data", "axis": "B", "position": "idx"}, # B', T (sparse)
"enc_raw": {"class": "gather", "from": "base:encoder", "axis": "B", "position": "idx"}, # B', T, F
"enc": {"class": "reinterpret_data", "size_base": "tgt", "from": "enc_raw"}, # B', T, F
albertz (Member):

Why is this needed?

returnn/tf/layers/basic.py (outdated review thread, resolved)
from ..util.data import Dim
Dim(
kind=Dim.Types.Spatial, description="%s_gather_axis" % self.name,
dyn_size=new_size, batch=self.output.batch,
albertz (Member):

You're missing dyn_size_ext here.

albertz (Member):

You should not set dyn_size in case it is non-standard. Set dyn_size_ext instead.

albertz (Member) commented Aug 10, 2022

When you modify the batch dim, you should create a new BatchInfo object as well, and assign that to output.

@@ -1341,6 +1341,17 @@ def __init__(self, position, axis, **kwargs):
# (BatchAxes.., InputAxesBeforeGatherAxis, PositionAxes.., InputAxesAfterGatherAxis..)
self.output.placeholder = tf.gather(params=params, indices=indices, axis=gather_axis, batch_dims=batch_dims)

if input_data.dim_tags[old_gather_axis].is_batch_dim():
  for axis in self.output.size_placeholder:
    new_size = tf.gather(params=self.output.size_placeholder[axis], indices=position_data.placeholder)
albertz (Member):

You assume that position_data is of shape [new-batch]?

vieting (Contributor Author):

Right, in the case I have in mind, yes. But for the failing test case this is different, and we need to take that into account.

albertz (Member):

What is it in that case?

vieting (Contributor Author):

There it's of shape [B,T,F]; however, in the input, B and T are packed:

>>> input_data
Data{'flat_output', [B&Packed{'time'},F|F'feature'(5)]}
>>> self.output
Data{'output_output', [B,T|'time'[B],'other-spatial'(7),F|F'feature'(5)]}
>>> position_data
Data{'indices_flat_output', [B,T|'time'[B],F|'other-spatial'(7)], dtype='int32'}

albertz (Member):

Which test case is that? The one you added, test_rand_indices?
Why is position_data of this shape? As described, it should have some new-batch dim in it, right? Or basically just the shape [new-batch], when you gather into the batch dim? It definitely should not have the old batch dim in its shape.
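
(For a concrete, made-up example: with an old batch size B = 4 and position = [0, 2, 3], the gather keeps three sequences, so the output gets a new batch dim B' = 3, and position itself only carries that B' dim, not the old B.)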

vieting (Contributor Author):

I see that I need to assign it for output. But it should come from position_data, right?

albertz (Member):

Why is it None for position_data? I don't mean in the test case, I mean in the real case which motivated this test case. In the real case, you would not have such InternalLayer.

albertz (Member):

It should never be done if the data has a batch dim, unless something is wrong. If that happens in the test case, then the test case is buggy.

vieting (Contributor Author):

Right, this is about the test case. However, in the case that I'm interested in, input_data.batch == position_data.batch is still True. This is probably because I'm using an EvalLayer to get the batch indices from a 0/1 vector with shape (B,), and that EvalLayer does not set the output correctly. Then we would need a layer which does that correctly, right?
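
As an aside, deriving such indices from a 0/1 mask could look roughly like this in plain TF (illustration only, not the EvalLayer code from the config):

import tensorflow as tf

mask = tf.constant([1, 0, 1, 1])               # [B], 0/1 per sequence
idx = tf.squeeze(tf.where(mask > 0), axis=-1)  # [B'] int64 indices, e.g. [0, 2, 3]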

albertz (Member):

An EvalLayer should never change the shape. If it does, and you are not very careful in setting the output data, then yes, this is a bug in your config.

vieting (Contributor Author) commented Aug 10, 2022

> When you modify the batch dim, you should create a new BatchInfo object as well, and assign that to output.

As I said, the fix is similar to what is done in the ShiftAxisLayer. Do you have another layer which modifies the batch axis and could serve as a good example?

albertz (Member) commented Aug 10, 2022

> When you modify the batch dim, you should create a new BatchInfo object as well, and assign that to output.
>
> As I said, the fix is similar to what is done in the ShiftAxisLayer. Do you have another layer which modifies the batch axis and could serve as a good example?

ShiftAxisLayer does not modify the batch dim. You probably mean the size adaption. That code in ShiftAxisLayer is a bit ugly/outdated/deprecated/hacky and might not work correctly in all cases (but anyway, it's simpler there because the batch dim is not changed).

albertz (Member) commented Aug 10, 2022

> Do you have another layer which modifies the batch axis and could serve as a good example?

Not many layers do that. I just recall FlattenBatchLayer right now.

kind=Dim.Types.Spatial, description="%s_gather_axis" % self.name,
dyn_size=new_size, batch=self.output.batch,
src_data=self.output, src_axis=axis, auto_generated=True)
self.output.size_placeholder[axis] = new_size
albertz (Member):

You should not assign size_placeholder but rather the dim tags.

from ..util.data import Dim
Dim(
kind=Dim.Types.Spatial, description="%s_gather_axis" % self.name,
dyn_size=new_size, batch=self.output.batch,
albertz (Member):

You should not assign dyn_size but rather dyn_size_ext.

for dim_tag in self.output.dim_tags:
  if dim_tag.is_spatial_dim():
    axis = self.output.get_batch_axis_excluding_batch(self.output.get_axis_by_tag_name(dim_tag.description))
    new_size = tf.gather(params=self.output.size_placeholder[axis], indices=position_data.placeholder)
albertz (Member):

You should not access size_placeholder but rather dim_tag.dyn_size_ext.

if input_data.dim_tags[old_gather_axis].is_batch_dim():
  for dim_tag in self.output.dim_tags:
    if dim_tag.is_spatial_dim():
      axis = self.output.get_batch_axis_excluding_batch(self.output.get_axis_by_tag_name(dim_tag.description))
albertz (Member):

This is:

  • way too complicated: you can simply do for axis, dim_tag in enumerate(self.output.dim_tags)
  • wrong: do not rely on get_axis_by_tag_name and dim_tag.description
  • not necessary: just use dim_tag.dyn_size_ext
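
A minimal sketch of the simplified loop following these points (it refers to the same variables as the diff under review, so it is illustrative only):

for axis, dim_tag in enumerate(self.output.dim_tags):
  if dim_tag.is_spatial_dim() and dim_tag.dyn_size_ext is not None:
    # gather the per-sequence sizes along the (old) batch axis
    new_size = tf.gather(dim_tag.dyn_size_ext.placeholder, indices=position_data.placeholder)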

Comment on lines +5482 to +5488
position = InternalLayer(
  name="position", network=net,
  output=Data(
    name="position",
    placeholder=tf.constant(position_np, dtype=tf.int64),
    batch_dim_axis=0, shape=[], dtype="int64",
  ))
vieting (Contributor Author):

@albertz do I need to change the creation of position in order to make it have a different batch axis dim tag here?

albertz (Member):

Yes. It's actually not so simple because of the special treatment of the batch dim tag. I'm not sure it's really possible currently.

In practice, in your real code, how would you end up with position?

vieting (Contributor Author):

> In practice, in your real code, how would you end up with position?
Do you mean what dim tag I get there?

>>> position.output.dim_tags[0].description
'batch:position'

So it's not actually the global batch dim. I was just confused because I got

>>> position.output.dim_tags[0] == values.output.dim_tags[0]
True

but this is because the check does not cover this case, see comment here: #1089 (comment)

vieting (Contributor Author) commented Aug 30, 2022

As discussed offline, it is possible to get the desired results in my use case using the MaskedComputationLayer. Instead of the indices to gather, we need a boolean mask over the batch axis. In my use case, I have this anyway and only computed the indices from the mask. We can use the mask like this:

network = {
    "encoder": {...},  # B, T, F
    "boolean_mask": {...},  # B
    "encoder_masked": {
        "class": "masked_computation",
        "mask": "boolean_mask",
        "unit": {"class": "copy", "from": "encoder"}
    },  # B', T, F
    ...
}

Since that does exactly what I need, I'll close this PR and the corresponding issue.
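
Conceptually, masking over the batch axis corresponds to tf.boolean_mask on axis 0; a small illustrative sketch with made-up shapes (not what MaskedComputationLayer literally does internally):

import tensorflow as tf

encoder = tf.random.normal([4, 7, 5])                            # [B, T, F]
boolean_mask = tf.constant([True, False, True, True])            # [B]
encoder_masked = tf.boolean_mask(encoder, boolean_mask, axis=0)  # [B', T, F]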

vieting closed this Aug 30, 2022
albertz (Member) commented Aug 30, 2022

Well, GatherLayer on the batch axis may still sometimes be a valid thing someone wants to do. I would leave this PR open.

albertz reopened this Aug 30, 2022