Old: Theano Sequence to Sequence Learning with RETURNN

Albert Zeyer edited this page Aug 12, 2021 · 1 revision

RETURNN is a highly configurable neural network training software. This is a tutorial on configuring sequence-to-sequence learning for speech/handwriting recognition with RETURNN. This is done by making slight changes to the JSON network configuration and adding a few new parameters, as explained below.

An example JSON file from the RETURNN paper shows a simple bidirectional LSTM RNN with 2 layers and 300 nodes in forward and backward directions:

{"fw_0" : { "class":"rec", "n_out":300, "direction":1 },
"bw_0" : { "class":"rec", "n_out":300, "direction":-1 },
"fw_1" : { "class":"rec", "n_out":300, "direction":1, "from" : ["fw_0", "bw_0"]},
"bw_1" : { "class":"rec", "n_out":300, "direction":-1, "from" : ["fw_0", "bw_0"]},
"output" : { "class":"softmax", "from" : ["fw_1", "bw_1"]}}

The main parameters to add or modify to turn this into a seq2seq network are:

  • encoder : This is specified in the decoder layers to denote which layers' final states serve as the encoder, i.e. which states initialize the hidden state of the decoder in the encoder-decoder network. It is given as a list of the layers that form the encoder.
  • from : For the decoder layers, "from" must always be set to "null".

Specifying these gives you the simplest seq2seq system, without attention. It would look like this:

{"fw_0" : { "class":"rec", "n_out":300, "direction":1 },
"bw_0" : { "class":"rec", "n_out":300, "direction":-1 },
"fw_1" : { "class":"rec", "n_out":600, "direction":1, "from" : "null", "encoder":["fw_0","bw_0"]},
"bw_1" : { "class":"rec", "n_out":600, "direction":-1, "from" : "null", "encoder":["fw_0","bw_0"]},
"output" : { "class":"softmax", "from" : ["fw_1", "bw_1"]}}

An important thing to note here is that a decoder layer's "n_out" must be exactly the sum of "n_out" over all layers listed in its "encoder". This is why "fw_1" and "bw_1" have 600 units each (300 + 300 from "fw_0" and "bw_0").
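This width constraint is easy to get wrong when editing configs by hand, so it can help to check it programmatically. The sketch below is a hypothetical standalone helper (not part of RETURNN) that loads a network JSON and verifies the constraint:

```python
import json

def check_decoder_width(net):
    """For every layer that lists an "encoder", assert that its "n_out"
    equals the sum of "n_out" over the listed encoder layers."""
    for name, layer in net.items():
        enc = layer.get("encoder")
        if enc:
            enc_width = sum(net[e]["n_out"] for e in enc)
            assert layer["n_out"] == enc_width, (
                "%s: n_out=%d but encoder width=%d"
                % (name, layer["n_out"], enc_width))

# The seq2seq example network from above.
net = json.loads("""
{"fw_0" : {"class": "rec", "n_out": 300, "direction": 1},
 "bw_0" : {"class": "rec", "n_out": 300, "direction": -1},
 "fw_1" : {"class": "rec", "n_out": 600, "direction": 1,
           "from": "null", "encoder": ["fw_0", "bw_0"]},
 "bw_1" : {"class": "rec", "n_out": 600, "direction": -1,
           "from": "null", "encoder": ["fw_0", "bw_0"]},
 "output": {"class": "softmax", "from": ["fw_1", "bw_1"]}}
""")

check_decoder_width(net)  # passes: 600 == 300 + 300
```

If you change a decoder's "n_out" to anything other than the summed encoder width, the assertion fires and names the offending layer.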

If you would like to include attention as well, the following page details the required parameters:
Theano Attention Parameters