
Layers with special behavior on dynamic spatial axes


This is a list of layers with special behavior on dynamic spatial axes, i.e. axes with dynamic sequence lengths, where taking the padding or the sequence lengths into account is important for correct behavior. More generally, it covers any layer whose output tensor (placeholder) depends on the sequence lengths. (See the masking sketch after the list.)

  • SoftmaxOverSpatialLayer: makes sure that the padded frames are masked away.
  • BatchSoftmaxLayer
  • ReduceLayer: sum, max, etc. will ignore the padded frames.
  • MathNormLayer: shares code with ReduceLayer internally.
  • DotLayer, when reducing a dynamic spatial axis
  • BatchNormLayer (and batch norm in general on any layer)
  • (NormLayer should actually ignore padded frames, but currently it incorrectly does not (#575))
  • SliceNdLayer
  • SeqLenMaskLayer
  • FlattenBatchLayer
  • PostfixInTimeLayer
  • (CumsumLayer with reverse=True should ignore padded frames but currently does not (#574))
  • LossLayer (deprecated); see below for losses
  • RecLayer with direction=-1
  • SelfAttentionLayer (deprecated)
  • (LengthLayer returns the sequence lengths themselves)

(This list is currently incomplete.)
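To make the masking behavior concrete: RETURNN keeps the sequence lengths of dynamic axes alongside the tensor, and layers like SoftmaxOverSpatialLayer or ReduceLayer use them to exclude padded frames. A minimal sketch of that idea in plain TensorFlow (not RETURNN's actual implementation; shapes and function names here are made up for illustration):

```python
import tensorflow as tf

def masked_softmax(energy, seq_lens):
    """energy: [batch, time]; seq_lens: [batch]. Padded frames get probability 0."""
    mask = tf.sequence_mask(seq_lens, maxlen=tf.shape(energy)[1])  # [batch, time], bool
    # Set padded frames to -inf so softmax assigns them zero probability.
    neg_inf = tf.fill(tf.shape(energy), float("-inf"))
    return tf.nn.softmax(tf.where(mask, energy, neg_inf), axis=-1)

def masked_reduce_sum(x, seq_lens):
    """x: [batch, time, dim]. Padded frames must not contribute to the sum."""
    mask = tf.sequence_mask(seq_lens, maxlen=tf.shape(x)[1], dtype=x.dtype)  # [batch, time]
    return tf.reduce_sum(x * mask[:, :, None], axis=1)  # [batch, dim]
```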

Sequence lengths also matter for the losses. For framewise losses, they matter for the accumulation (padded frames are ignored), and obviously they also matter for all sequence-level losses such as CTC.
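As an illustration, a rough sketch of how a framewise loss accumulation can ignore padded frames (plain TensorFlow; RETURNN's actual loss code differs in details such as normalization):

```python
import tensorflow as tf

def framewise_ce(logits, targets, seq_lens):
    """logits: [batch, time, classes]; targets: [batch, time] int; seq_lens: [batch]."""
    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=targets, logits=logits)  # [batch, time]
    mask = tf.sequence_mask(seq_lens, maxlen=tf.shape(ce)[1], dtype=ce.dtype)
    # Accumulate only over real frames, and normalize by their count.
    return tf.reduce_sum(ce * mask) / tf.reduce_sum(mask)
```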

Somewhat related is the option recurrent on each layer class (or loss). recurrent=False implies that neither the sequence lengths nor the ordering of frames matter. But this is not exactly the same: e.g. ConvLayer has recurrent=True, yet it does not make use of the sequence lengths.
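A small sketch of that distinction (plain TensorFlow, hypothetical shapes): a 1D convolution mixes neighboring frames, so the frame ordering matters, yet the raw op itself has no notion of sequence lengths; near the end of a shorter sequence, padded frames leak into valid output frames unless the input is masked first.

```python
import tensorflow as tf

x = tf.random.normal([2, 10, 4])  # [batch, time, feature]; assume seq 2 is shorter, rest padded
w = tf.random.normal([3, 4, 8])   # [filter_width, in_dim, out_dim]
# Order-dependent (hence recurrent=True), but unaware of seq_lens:
y = tf.nn.conv1d(x, w, stride=1, padding="SAME")  # [2, 10, 8]
```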

The obvious example of a layer where the dynamic spatial axes do not matter is LinearLayer.
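A short sketch of why (hypothetical shapes): the linear transform is applied independently per frame, so a padded frame can never influence a valid one; its meaningless output is simply masked away by whatever consumes it later.

```python
import tensorflow as tf

x = tf.random.normal([2, 10, 4])         # [batch, time, in_dim], trailing frames padded
w = tf.random.normal([4, 8])             # [in_dim, out_dim]
y = tf.tensordot(x, w, axes=[[2], [0]])  # [batch, time, out_dim], purely frame-wise
```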

(Partly related is the list of layers with special behavior for recurrent automatic optimization.)