
RETURNN principles


High-level

Flexibility, efficiency, simplicity.

  • Things should be extremely flexible. Everything should be possible (without modifying RETURNN code itself), and not only possible, but also achievable without being too hacky or complicated.

  • Things should be efficient, for training as well as recognition.

  • Things should be straightforward, logical, and easy/simple for the user. This does not necessarily mean the shortest code in the config (or user script), but clear, unambiguous code that is simple to both read and write.

Flexibility and simplicity are achievable together, but combining them with efficiency is difficult. We solve the efficiency problem with automatic optimizations and with native code for core operations.

(Also see our tutorial video on this topic.)

Technical

  • Dimension tags and Data are a core concept in RETURNN. The order of axes in the raw tensor should never matter. All operations, layers and modules make use of Data and dimension tags. (See the first config sketch after this list.)

  • The user usually does not need to think about the batch dim.

  • Axes can be reordered automatically to perform calculations more efficiently, depending on hardware or other circumstances. As per the principle above, this should never matter in any way and should not make a difference in behavior (only that it is faster).

  • The recurrent automatic optimization might move layers out of a recurrent loop, which can greatly increase performance in some cases. The user should never need to think about this; it should be completely opaque to the user. (See the second config sketch after this list.)

    (Also see rec automatic optimization, special behavior of layers.)

  • The user can define the model in a unified way for both training and recognition. This is done via the integrated beam search in the ChoiceLayer (and automatic optimization makes this efficient). (See the last config sketch after this list.)

    (This is technically tricky for things like auto-regressive self-attention, to keep it flexible, generic and efficient. See the issue on generalized self-attention for a long discussion and a solution.)
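
To illustrate the dimension tag principle, here is a minimal, hypothetical config fragment (a sketch, not taken from the RETURNN docs): axes are referenced by their meaning (e.g. "T" for time, "F" for feature), never by their position in the raw tensor, and the batch dim is handled implicitly.

```python
# Hypothetical network dict fragment, only for illustration.
network = {
    # Reduce over the time axis ("T"), wherever that axis happens to be laid
    # out in the raw tensor; the batch dim is never mentioned explicitly.
    "pooled": {"class": "reduce", "mode": "mean", "axes": "T", "from": "data"},
    # Softmax over the feature dim, with cross-entropy loss against "classes".
    "output": {"class": "softmax", "from": "pooled", "loss": "ce", "target": "classes"},
}
```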
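
The following toy sketch (hypothetical layer names, not from this wiki) shows what the rec automatic optimization can work with: "transformed" depends only on the current input frame, so it can be computed outside the loop on the whole sequence at once, while "accum" depends on its own previous step and therefore stays inside the loop. The user writes everything as if it runs per step.

```python
# Hypothetical example, only for illustration.
network = {
    "output": {
        "class": "rec", "from": "data",
        "unit": {
            # Depends only on the current input frame -> can be moved out of
            # the loop and computed over the whole sequence at once.
            "transformed": {"class": "linear", "activation": "tanh", "n_out": 64,
                            "from": "data:source"},
            # Depends on its own previous step ("prev:accum") -> must stay
            # inside the loop.
            "accum": {"class": "combine", "kind": "add",
                      "from": ["prev:accum", "transformed"]},
            "output": {"class": "copy", "from": "accum"},
        },
    },
}
```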
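
And a minimal, hypothetical decoder fragment with the ChoiceLayer: in training, "output" simply feeds back the ground-truth targets; in recognition, the same definition performs beam search with the given beam size. Nothing else in the config needs to change between the two modes.

```python
# Hypothetical example, only for illustration.
network = {
    "output": {
        "class": "rec", "from": [], "target": "classes",
        "unit": {
            # Per-step output distribution (here just from the previous label
            # embedding, to keep the sketch short).
            "output_prob": {"class": "softmax", "from": "prev:embed",
                            "target": "classes", "loss": "ce"},
            # Training: returns the ground-truth label. Search: beam search.
            "output": {"class": "choice", "target": "classes", "beam_size": 12,
                       "from": "output_prob", "initial_output": 0},
            "embed": {"class": "linear", "activation": None, "n_out": 128,
                      "from": "output"},
            # Stop condition for search (label 0 as end-of-sequence here).
            "end": {"class": "compare", "kind": "equal", "from": "output", "value": 0},
        },
    },
}
```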