Old: Theano Attention Parameters

These settings are for the Theano backend and are not compatible with the TensorFlow backend. They apply to the rec layer.

Mandatory parameters

  • base: The layer that the attention mechanism uses as its base. If you do not specify this parameter, the "encoder" layer is taken as the base.
  • recurrent_transform: Setting this parameter to "attention_list" enables attention for the layer (see the sketch below).
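
As a minimal sketch, a network dict enabling attention on a rec layer might look like the following; the layer classes, names, and sizes here are assumptions for illustration, not taken from this page:

    network = {
        # hypothetical encoder layer; its name matches the default base
        "encoder": {"class": "rec", "unit": "lstm", "n_out": 512},
        "output": {
            "class": "rec",
            "unit": "lstm",
            "n_out": 512,
            "base": "encoder",                        # attention base layer
            "recurrent_transform": "attention_list",  # enables attention
        },
    }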

Additional parameters

  • attention_template
    Size of the template vector for attention.
    Default: 128

  • attention_distance
    Distance function used to compute the energy vector from the previous decoder state and the final encoder state (see the sketch below).
    Possible values:

    • "l2" : Euclidean distance
    • "sqr" : Squared distance
    • "dot" : Dot product
    • "l1" : L1 norm
    • "cos" : Cosine similarity
    • "rnn" : Exponential Linear Units [https://arxiv.org/pdf/1511.07289v1.pdf]

    Default : "l2"

  • attention_norm
    Normalization applied to the energies to obtain the alpha weights for attention (see the sketch below).
    Possible values:

    • "exp" : Exponential normalization
    • "sigmoid" : Sigmoid normalization
    • "lstm : Normalization with an LSTM

    Default : "exp"

  • attention_sharpening
    Degree by which to sharpen/scale the attention weights (see the sketch below).
    Default: 1.0
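
One common reading of sharpening, sketched below, is to scale the energies before the exponential normalization, which makes the resulting weights more peaked; the exact placement in the computation is an assumption:

    import numpy as np

    e = np.array([0.5, 1.0, 2.0])   # example energies
    gamma = 2.0                     # attention_sharpening
    w = np.exp(gamma * e - (gamma * e).max())
    alphas = w / w.sum()            # more peaked than with gamma = 1.0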

  • attention_nbest
    Attends only to the states corresponding to the n highest alpha weights, instead of attending to the entire sequence (see the sketch below).
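
A sketch of the idea, assuming the remaining weights are renormalized (not confirmed by this page):

    import numpy as np

    def nbest(alphas, n):
        # Keep only the n largest alpha weights, zero the rest, renormalize.
        idx = np.argsort(alphas)[-n:]
        masked = np.zeros_like(alphas)
        masked[idx] = alphas[idx]
        return masked / masked.sum()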

  • attention_glimpse
    Number of glimpses into previous decoder states.
    Default: 1
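
Putting the parameters together, a sketch of a rec-layer entry in a Theano-backend network dict; the layer names, classes, and parameter values are placeholders chosen for illustration:

    network = {
        "encoder": {"class": "rec", "unit": "lstm", "n_out": 512},
        "output": {
            "class": "rec",
            "unit": "lstm",
            "n_out": 512,
            "base": "encoder",
            "recurrent_transform": "attention_list",  # enable attention
            "attention_template": 128,
            "attention_distance": "dot",
            "attention_norm": "exp",
            "attention_sharpening": 2.0,
            "attention_nbest": 3,
            "attention_glimpse": 1,
        },
    }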