Different ways to import parameters

There are multiple ways to import parameters from some other model or checkpoint.

Note that you should distinguish between the use cases: whether this is for importing parameters for further training (e.g. via import_model_train_epoch1), or for loading a model with given parameters for recognition only (e.g. load in recognition).

Also be careful that some of the approaches might hide errors (such as typos) and would just ignore some parameters instead of throwing an error.

Custom script generating new TF checkpoint

This is the most flexible option. See the scripts tf_avg_checkpoints.py or tf_inspect_checkpoint.py as examples. It should be straightforward to write your own custom logic.

This is probably also the safest option, as you should notice any errors.

However, this probably also takes the most effort, so it might be overkill, especially for simple cases.
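
For illustration, a minimal such script could look roughly like the following (a sketch only; the paths and the renaming step are placeholders, and a TF1-style graph is assumed):

import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # work in TF1-style graph mode

src_checkpoint = "/path/to/source-checkpoint"  # placeholder
dst_checkpoint = "/path/to/new-checkpoint"     # placeholder

reader = tf.train.load_checkpoint(src_checkpoint)
shapes = reader.get_variable_to_shape_map()

with tf.Graph().as_default() as graph, tf.compat.v1.Session(graph=graph) as session:
  variables = []
  for name in sorted(shapes):
    value = reader.get_tensor(name)  # numpy array
    # Put any custom logic here, e.g. renaming, slicing, averaging, or dropping params.
    new_name = name
    variables.append(tf.compat.v1.get_variable(new_name, initializer=value))
  session.run(tf.compat.v1.global_variables_initializer())
  saver = tf.compat.v1.train.Saver(var_list=variables)
  saver.save(session, dst_checkpoint)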

Config option preload_from_files

This is a dict name -> opts. The name is arbitrary, but the dict will be sorted by it to define the order (for _, opts in sorted(self.preload_from_files.items()):).

The opts is another dict which can contain:

  • init_for_train: bool = False: If True, it will be used for train initialization (first epoch) and ignored in recognition. If False, it will be used for recognition and ignored in training.
  • filename: str: TF checkpoint path
  • prefix: str = "": Prefix in the current model (e.g. a subnetwork path like "model1/", or layer names prefixed like "model1_...")
  • ignore_missing: bool = False
  • ignore_params: list[str] = []
  • ignore_params_prefixes: list[str] = []

This is handled via CustomCheckpointLoader.
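
A config entry might look like this (the entry name, checkpoint path and prefix are just placeholders):

preload_from_files = {
  "base_model": {
    "filename": "/path/to/other-model/epoch.042",
    "prefix": "model1/",       # params of the current model which start with this prefix
    "init_for_train": True,    # only used to initialize training (first epoch)
    "ignore_missing": True,    # params not found in the checkpoint are ignored instead of raising an error
  },
}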

Config option import_model_train_epoch1

In a new training (first epoch), instead of random initialization, it loads the given model checkpoint. The parameters must match exactly.
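
For example, in the config (the path is a placeholder):

import_model_train_epoch1 = "/path/to/pretrained-model/epoch.100"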

Config option load

In training (task="train"), if no existing model is found (the model is specified by the model config option), it would use load (but for this case, you should rather use import_model_train_epoch1 to make it more explicit). In non-training (task != "train", e.g. search, forwarding, eval, etc.), it would use load.
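
For example, for recognition (the path is a placeholder):

task = "search"
load = "/path/to/trained-model/epoch.080"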

Layer option custom_param_importer

If set, RETURNN will use CustomCheckpointLoader, which will call LayerBase.set_param_values_by_dict. If custom_param_importer is a function, this will then call custom_param_importer(layer=self, values_dict=values_dict, session=session). So you can define your own custom function to load the parameters in any way.
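
For example, a custom importer function could look like this (a minimal sketch; it assumes layer.params maps param names to TF variables, and it assigns values via the TF1-style Variable.load; treat the details as illustrative):

def my_param_importer(layer, values_dict, session):
  # layer: the LayerBase instance; values_dict: dict param-name -> numpy array (or None); session: TF session
  for param_name, param in layer.params.items():
    if values_dict and param_name in values_dict:
      value = values_dict[param_name]
      # Any custom transformation (transposing, renaming, slicing, ...) could go here.
      param.load(value, session=session)

"layer": {
  ...,
  "custom_param_importer": my_param_importer,
}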

Note that this functionality is also used in pretraining when the model architecture changes. The parameters of the previous epoch are then stored as NumPy arrays in values_dict, and in the next epoch it will call LayerBase.set_param_values_by_dict.

Layer param init options (forward_weights_init and others)

The param init uses get_initializer and can in principle use any initializer (e.g. load_txt_file_initializer), or even custom initializing code, which could import other parameters from a checkpoint or elsewhere.
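
For example (a sketch; the exact initializer string syntax is an assumption, and the file path is a placeholder):

"linear1": {
  "class": "linear", "n_out": 512, "activation": None, "from": "data",
  "forward_weights_init": "load_txt_file_initializer(filename='/path/to/weights.txt')",
}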

SubnetworkLayer option load_on_init

Note that this is always used, both in training and recognition, and also when the network is reinitialized (e.g. due to pretraining). Thus, with the current logic, this is probably only useful for recognition. The config option preload_from_files is maybe a better and more flexible way.
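
For example (a sketch; the checkpoint path and the subnetwork contents are placeholders):

"external_model": {
  "class": "subnetwork",
  "subnetwork": {
    "output": {"class": "linear", "n_out": 512, "activation": "relu", "from": "data"},
  },
  "load_on_init": "/path/to/other-model/epoch.042",
}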

Layer option reuse_params

This is intended to reuse params from other layers, i.e. to share params. But it can also be used to provide a custom get_variable function which can do arbitrary things, like using a custom name in any custom name scope, setting a custom initializer, or defining a variable as a fixed constant, or whatever. Example:

"layer": {
  ...,
  "reuse_params": {"map": {"W": {"custom": my_custom_variable_creater}}}
}

This is handled by ReuseParams. It will call the function like custom_func(base_layer=base_layer, reuse_layer=self.reuse_layer, name=param_name, getter=getter, full_name=name, **kwargs) where kwargs are other args passed to tf.compat.v1.variable_scope.
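
For example, a custom function could look like this (a sketch only; the constant for "W" and the NumPy file path are placeholders, and the exact kwargs depend on what is passed through):

import numpy
import tensorflow as tf

def my_custom_variable_creator(base_layer, reuse_layer, name, getter, full_name, **kwargs):
  # name: param name within the layer (e.g. "W"); full_name: full TF variable name.
  if name == "W":
    # Define this param as a fixed constant loaded from a NumPy file.
    value = numpy.load("/path/to/W.npy")
    return tf.constant(value, dtype=tf.float32, name="W_const")
  # Otherwise defer to the standard variable getter.
  return getter(name=full_name, **kwargs)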

Layer custom TF name scope

(This is currently not implemented but planned.)

Via returnn-common, layers can define their own custom name scope, thus allowing them to match some other model checkpoint format.