Test ReuseParams with different variable names #448

Zettelkasten · 2021-02-19T23:20:15Z

The test case from #447, currently failing.

albertz · 2021-02-22T10:43:41Z

So, we basically have outlined a possible solution for this in the discussion in #447.

Basically, what we want (self being the layer):

Let's assume you have param = self.add_param(tf.get_variable(name)) (or sth like that).
param is what you would directly use in this code, for whatever purpose, e.g. matmul in LinearLayer.
self.params[name] is param should be true after this call. This implies:
- self.params includes potential non-owned vars (shared/reuse) and/or potential transformations (transpose, weight noise, weight norm, etc) (i.e. not the original variable, but just some tensor).
Weight noise, weight norm and other transformations, and also L2 losses etc, should only be applied if the tensor or variable is owned by the layer (determined by looking at the namespace).

So we need to change the logic of add_param to implement this logic for self.params. The difficulty is to infer the name. This should be doable by extending var_creation_scope:

In case there is no reuse/sharing or transformation, no custom getter, let's not change anything. Then add_param would also keep the code for this simple case.
With a custom getter, we can grab name, and store it somewhere.

We might need a separate way to get a list of variables of a layer. (Or do we really? What would be the use case?) Or we could also simply iterate through all existing variables (via the global collections) and check for the namespace if we need to filter.

Btw, you might ask, why do we need self.add_param(tf.get_variable(name)), and why not simply do sth like self.create_param(name, ...) instead, and not use tf.get_variable. This is because we want to be able to use other TF code, which makes use of tf.get_variable. E.g. when we use the original LSTM by TF, or other things. We want to support that. (Btw, if this was unclear, you might also extend the documentation in this PR for that.)

Actually, when this is external code, note that it could create multiple variables, i.e. multiple calls to get_variable. We should be careful to handle that correctly. Maybe the whole logic of adding it to self.params should be in var_creation_scope. Or we can keep it in add_param but call to that in var_creation_scope.

Are you going to try to implement this? (I'm somewhat short on time right now, not sure when I get to this.)

Zettelkasten · 2021-02-22T15:56:53Z

Are you going to try to implement this? (I'm somewhat short on time right now, not sure when I get to this.)

Thanks for your detailed notes! Yes, I can certainly give it a try implementing this. If I run into issues/open questions I of course will let you know. Thanks!

albertz · 2021-02-23T15:14:57Z

returnn/tf/layers/base.py

@@ -836,14 +836,18 @@ def add_param(self, param, custom_update=None, trainable=None, saveable=None, ax
      custom_update.set_on_var(param)
    if axes_split_info:
      tf_util.set_param_axes_split_info(param, axes_split_info)
-    if self.reuse_params:
-      name_scope_prefix = self.reuse_params.get_absolute_name_scope_prefix(base_layer=self, param=param)
+    if getattr(param, "RETURNN_layer_map_name", None) is not None:


It should be _RETURNN_layer_map_name.

But also, I think you don't need this at all, and this approach only leads to problems. (Edit Or maybe not. Maybe ignore my comment for now.)

albertz · 2021-02-23T15:21:10Z

returnn/tf/layers/base.py

        base_layer=base_layer, reuse_layer=self.reuse_layer, name=param_name, getter=getter, full_name=name, **kwargs)
+      # The name of the variable created by custom_func might not match param_name.
+      # We store it here for LayerBase.add_param.
+      variable.RETURNN_layer_map_name = param_name


I suggested _RETURNN_layer_map_name because this actually should be a map (dict). This cannot be the name directly, because this param might be used by multiple layers, under different names. So I thought about a mapping layer -> name. However, I'm now thinking, it might even be used by only one single layer under different names...

Hm I see, I kind of thought that there are no unrelated calls to custom_getter before the corresponding add_param is called. That's of course not so nice (and probably breaks if e.g. a custom_func calls get_variable multiple times or so).
Maybe we can make it a layer -> list[name] dict (and then always use the last entry in add_param?) I don't really know a good solution.

and then always use the last entry ...

Such heuristics are never nice. They mostly work, but in some rare cases, or when we do sth more crazy, they suddenly fail. And then this is very annoying to debug. (Unfortunately, we have a couple of such heuristics, which have worked for the initial use case, but then later on failed, and caused quite some extra work. "Technical debt" is the keyword for this...)

But I think we don't need that here.

We can know when we are in the direct get_variable call from the layer. (Via var_creation_scope.)
And in there, we have the original name, and we can get the param, and we can directly do self.params[name] = param, and add_param doesn't need to do anything anymore.

And when var_creation_scope does not create a custom getter, this implies there will be no special handling. Then add_param can catch this.

Fixes #447

Zettelkasten marked this pull request as draft February 19, 2021 23:20

Zettelkasten mentioned this pull request Feb 19, 2021

Reusing parameter with different variable name not working #447

Open

Test ReuseParams with different variable names

c04171c

Zettelkasten force-pushed the frithjof-reuseparams-different-names branch from bf0adbd to 561de92 Compare February 23, 2021 15:07

albertz reviewed Feb 23, 2021

View reviewed changes

Zettelkasten added 2 commits February 23, 2021 21:15

ReuseParams: Pass param_name to add_param if using custom_func

e0facfe

Fixes #447

layer.add_param always adds variable to layer.params

caf7313

Zettelkasten force-pushed the frithjof-reuseparams-different-names branch from 561de92 to caf7313 Compare February 23, 2021 20:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test ReuseParams with different variable names #448

Test ReuseParams with different variable names #448

Zettelkasten commented Feb 19, 2021

albertz commented Feb 22, 2021

Zettelkasten commented Feb 22, 2021

albertz Feb 23, 2021

albertz Feb 23, 2021 •

edited

albertz Feb 23, 2021

Zettelkasten Feb 23, 2021

albertz Feb 23, 2021

Test ReuseParams with different variable names #448

Are you sure you want to change the base?

Test ReuseParams with different variable names #448

Conversation

Zettelkasten commented Feb 19, 2021

albertz commented Feb 22, 2021

Zettelkasten commented Feb 22, 2021

albertz Feb 23, 2021

Choose a reason for hiding this comment

albertz Feb 23, 2021 • edited

Choose a reason for hiding this comment

albertz Feb 23, 2021

Choose a reason for hiding this comment

Zettelkasten Feb 23, 2021

Choose a reason for hiding this comment

albertz Feb 23, 2021

Choose a reason for hiding this comment

albertz Feb 23, 2021 •

edited