
TF get_sprint_automata_for_batch: RASR segmentation fault in Speech::CTCTopologyGraphBuilder::addLoopTransition #1456

Open
vieting opened this issue Nov 8, 2023 · 41 comments

Comments

@vieting
Contributor

vieting commented Nov 8, 2023

I created an Apptainer image with TF 2.13 and tried to run a training with FastBaumWelchLoss. It crashes in step 0 because the get_sprint_automata_for_batch op is not found:

```
EXCEPTION
Traceback (most recent call last):
  File ".../returnn/tf/network.py", line 4341, in help_on_tf_exception
    line: debug_fetch, fetch_helpers, op_copied = FetchHelper.copy_graph(
              debug_fetch,
              target_op=op,
              fetch_helper_tensors=list(op.inputs),
              stop_at_ts=stop_at_ts,
              verbose_stream=file,
          )
    locals:
      debug_fetch =
      fetch_helpers =
      op_copied =
      FetchHelper =
      FetchHelper.copy_graph = >
      target_op =
      op =
      fetch_helper_tensors =
      list =
      op.inputs = (,)
      stop_at_ts = [, , ,
      file =
  File ".../returnn/tf/util/basic.py", line 7700, in FetchHelper.copy_graph
    line: assert target_op in ops, "target_op %r,\nops\n%s" % (target_op, pformat(ops))
    locals:
      target_op =
      ops = []
      pformat =
AssertionError: target_op , ops []
```

The actual error is this:

```
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
  (0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 164, in _start_child
    ret = self._read()
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 225, in _read
    return Unpickler(p).load()
EOFError: Ran out of input
```
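The final EOFError: Ran out of input is what pickle raises when it tries to load from a stream that is already at EOF. Below is a minimal sketch of how that can happen with a child process. It assumes the parent reads one pickled reply from the child's stdout. RETURNN's error_signals.py uses its own pipe file descriptors, and the helper read_child_reply and the child command are hypothetical stand-ins, not RASR or RETURNN code. If the child dies before writing anything (for example a crash during init), the parent sees only an empty pipe and the unpickler fails in exactly this way.

```python
import pickle
import subprocess
import sys


def read_child_reply(cmd):
    """Start cmd and unpickle one object from its stdout.

    Hypothetical helper for illustration only; it mimics the
    Unpickler(p).load() call seen in the traceback above.
    """
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        try:
            reply = pickle.Unpickler(proc.stdout).load()
            error = None
        except EOFError as exc:
            # The pipe was closed without any data: the child exited or
            # crashed before it could send its reply.
            reply, error = None, str(exc)
    # Leaving the with-block closed the pipe and waited for the child.
    return reply, proc.returncode, error


# Stand-in for a child that dies during init without writing anything
# (assumption: this only imitates a crashing child, it is not RASR).
crashing_child = [sys.executable, "-c", "import os; os._exit(1)"]
reply, returncode, error = read_child_reply(crashing_child)
print(reply, returncode, error)
```

In other words, the EOFError on the Python side is only a symptom. The child's exit status and its own log are what show why it died.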
@albertz
Member

albertz commented Nov 8, 2023

Ah, that's just in help_on_tf_exception, which is not critical: help_on_tf_exception is only a debugging aid that prints some additional information, and for some reason it fails here.

But it means the actual exception happened before that. Can you post the full log?
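The point that a failing debug helper is not the real error can be shown with a small sketch. It assumes a debug-only helper that runs after the real exception has already been raised. The names run_with_debug_info, failing_step and broken_helper are hypothetical, and this is not RETURNN's actual implementation. If the helper itself fails, its error is reported, but the original exception is what gets re-raised and what needs to be investigated.

```python
def run_with_debug_info(step, debug_helper):
    """Run step(); on failure, call a debug-only helper, then re-raise.

    Hypothetical sketch, not RETURNN code.
    """
    try:
        return step()
    except Exception as exc:
        try:
            debug_helper(exc)
        except Exception as helper_exc:
            # The helper only adds information; its own failure must not
            # replace the original error.
            print(f"debug helper failed (ignored): {helper_exc!r}")
        raise  # re-raises the original exception, not helper_exc


def failing_step():
    # Stand-in for the real failure (here: the Sprint child init).
    raise RuntimeError("Sprint init failed")


def broken_helper(exc):
    # Stand-in for a debug helper that itself fails.
    raise AssertionError("target_op not in ops")


try:
    run_with_debug_info(failing_step, broken_helper)
except RuntimeError as err:
    print(f"original error: {err}")
```

This is why the log from before the help_on_tf_exception output is the part that matters.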

@vieting
Contributor Author

vieting commented Nov 8, 2023

Sure, the full log is here:

RETURNN starting up, version 1.20231107.125810+git.dbef0ca0, date/time 2023-11-08-12-17-46 (UTC+0100), pid 1212279, cwd /work/asr4/vieting/tmp/20231108_tf213_sprint_op, Python /usr/bin/python3
RETURNN command line options: ['returnn.config']
Hostname: cn-04
TensorFlow: 2.13.0 (v2.13.0-rc2-7-g1cb1a030a62) (<not-under-git> in /usr/local/lib/python3.8/dist-packages/tensorflow)
Use num_threads=1 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 2, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2}.
CUDA_VISIBLE_DEVICES is not set.
Collecting TensorFlow device list...
Local devices available to TensorFlow:
  1/1: name: "/device:CPU:0"
       device_type: "CPU"
       memory_limit: 268435456
       locality {
       }
       incarnation: 3855380559335333431
       xla_global_id: -1
Train data:
  input: 1 x 1
  output: {'raw': {'dtype': 'string', 'shape': ()}, 'orth': [256, 1], 'data': [1, 2]}
  OggZipDataset, sequences: 249229, frames: unknown
Dev data:
  OggZipDataset, sequences: 300, frames: unknown
RETURNN starting up, version 1.20231107.125810+git.dbef0ca0, date/time 2023-11-08-12-18-11 (UTC+0100), pid 3325131, cwd /work/asr4/vieting/tmp/20231108_tf213_sprint_op, Python /usr/bin/python3
RETURNN command line options: ['returnn.config']
Hostname: cn-285
TensorFlow: 2.13.0 (v2.13.0-rc2-7-g1cb1a030a62) (<not-under-git> in /usr/local/lib/python3.8/dist-packages/tensorflow)
Use num_threads=1 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 2, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2}.
CUDA_VISIBLE_DEVICES is set to '2'.
Collecting TensorFlow device list...
Local devices available to TensorFlow:
  1/2: name: "/device:CPU:0"
       device_type: "CPU"
       memory_limit: 268435456
       locality {
       }
       incarnation: 7046766875533982763
       xla_global_id: -1
  2/2: name: "/device:GPU:0"
       device_type: "GPU"
       memory_limit: 10089005056
       locality {
         bus_id: 1
         links {
         }
       }
       incarnation: 14158601620701111509
       physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:41:00.0, compute capability: 7.5"
       xla_global_id: 416903419
Using gpu device 2: NVIDIA GeForce RTX 2080 Ti
Hostname 'cn-285', GPU 2, GPU-dev-name 'NVIDIA GeForce RTX 2080 Ti', GPU-memory 9.4GB
Train data:
  input: 1 x 1
  output: {'raw': {'dtype': 'string', 'shape': ()}, 'orth': [256, 1], 'data': [1, 2]}
  OggZipDataset, sequences: 249229, frames: unknown
Dev data:
  OggZipDataset, sequences: 300, frames: unknown
Learning-rate-control: file learning_rates.swb.ctc does not exist yet
Setup TF session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
layer /'data': [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)] float32
layer /features/'conv_h_filter': ['conv_h_filter:static:0'(128),'conv_h_filter:static:1'(1),F|F'conv_h_filter:static:2'(150)] float32
layer /features/'conv_h': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_act': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_split': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F'conv_h:channel'(150),F|F'conv_h_split_split_dims1'(1)] float32
DEPRECATION WARNING: Explicitly specify in_spatial_dims when there is more than one spatial dim in the input.
This will be disallowed with behavior_version 8.
layer /features/'conv_l': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel'(150),F|F'conv_l:channel'(5)] float32
layer /features/'conv_l_merge': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
DEPRECATION WARNING: MergeDimsLayer, only keep_order=True is allowed
This will be disallowed with behavior_version 6.
layer /features/'conv_l_act_no_norm': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'conv_l_act': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'features': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'specaug': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'conv_source': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_source_split_dims1'(1)] float32
layer /'conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_1:channel'(32)] float32
layer /'conv_1_pool': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_1:channel'(32)] float32
layer /'conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/32⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_2:channel'(64)] float32
layer /'conv_3': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_3:channel'(64)] float32
layer /'conv_merged': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conv_h:channel*conv_l:channel//2)*conv_3:channel'(24000)] float32
layer /'input_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'input_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_linear_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_1_dropout_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_half_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_1_conv_mod_glu': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conformer_1_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_1_conv_mod_depthwise_conv': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_bn': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
DEPRECATION WARNING: batch_norm masked_time should be specified explicitly
This will be disallowed with behavior_version 12.
layer /'conformer_1_conv_mod_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_relpos_encoding': [T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_1_mhsa_mod_self_attention': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_att_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_linear_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_2_dropout_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_half_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'encoder': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'output:feature-dense'(88)] float32
Network layer topology:
  extern data: data: Tensor{[B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}, seq_tag: Tensor{[B?], dtype='string'}
  used data keys: ['data', 'seq_tag']
  layers:
    layer batch_norm 'conformer_1_conv_mod_bn' #: 512
    layer conv 'conformer_1_conv_mod_depthwise_conv' #: 512
    layer copy 'conformer_1_conv_mod_dropout' #: 512
    layer gating 'conformer_1_conv_mod_glu' #: 512
    layer layer_norm 'conformer_1_conv_mod_ln' #: 512
    layer linear 'conformer_1_conv_mod_pointwise_conv_1' #: 1024
    layer linear 'conformer_1_conv_mod_pointwise_conv_2' #: 512
    layer combine 'conformer_1_conv_mod_res_add' #: 512
    layer activation 'conformer_1_conv_mod_swish' #: 512
    layer copy 'conformer_1_ffmod_1_dropout' #: 512
    layer linear 'conformer_1_ffmod_1_dropout_linear' #: 512
    layer eval 'conformer_1_ffmod_1_half_res_add' #: 512
    layer linear 'conformer_1_ffmod_1_linear_swish' #: 2048
    layer layer_norm 'conformer_1_ffmod_1_ln' #: 512
    layer copy 'conformer_1_ffmod_2_dropout' #: 512
    layer linear 'conformer_1_ffmod_2_dropout_linear' #: 512
    layer eval 'conformer_1_ffmod_2_half_res_add' #: 512
    layer linear 'conformer_1_ffmod_2_linear_swish' #: 2048
    layer layer_norm 'conformer_1_ffmod_2_ln' #: 512
    layer linear 'conformer_1_mhsa_mod_att_linear' #: 512
    layer copy 'conformer_1_mhsa_mod_dropout' #: 512
    layer layer_norm 'conformer_1_mhsa_mod_ln' #: 512
    layer relative_positional_encoding 'conformer_1_mhsa_mod_relpos_encoding' #: 64
    layer combine 'conformer_1_mhsa_mod_res_add' #: 512
    layer self_attention 'conformer_1_mhsa_mod_self_attention' #: 512
    layer layer_norm 'conformer_1_output' #: 512
    layer conv 'conv_1' #: 32
    layer pool 'conv_1_pool' #: 32
    layer conv 'conv_2' #: 64
    layer conv 'conv_3' #: 64
    layer merge_dims 'conv_merged' #: 24000
    layer split_dims 'conv_source' #: 1
    layer source 'data' #: 1
    layer copy 'encoder' #: 512
    layer subnetwork 'features' #: 750
    layer conv 'features/conv_h' #: 150
    layer eval 'features/conv_h_act' #: 150
    layer variable 'features/conv_h_filter' #: 150
    layer split_dims 'features/conv_h_split' #: 1
    layer conv 'features/conv_l' #: 5
    layer layer_norm 'features/conv_l_act' #: 750
    layer eval 'features/conv_l_act_no_norm' #: 750
    layer merge_dims 'features/conv_l_merge' #: 750
    layer copy 'features/output' #: 750
    layer copy 'input_dropout' #: 512
    layer linear 'input_linear' #: 512
    layer softmax 'output' #: 88
    layer eval 'specaug' #: 750
net params #: 18473980
net trainable params: [<tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 
'conformer_1_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_1_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conv_1/W:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'conv_1/bias:0' shape=(32,) dtype=float32>, <tf.Variable 'conv_2/W:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'conv_2/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'conv_3/W:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'conv_3/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'features/conv_h_filter/conv_h_filter:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'features/conv_l/W:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'features/conv_l_act/bias:0' shape=(750,) dtype=float32>, <tf.Variable 'features/conv_l_act/scale:0' shape=(750,) dtype=float32>, <tf.Variable 'input_linear/W:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'output/W:0' shape=(512, 88) dtype=float32>, <tf.Variable 'output/b:0' shape=(88,) dtype=float32>]
start training at epoch 1
using batch size: {'classes': 5000, 'data': 400000}, max seqs: 128
learning rate control: NewbobMultiEpoch(num_epochs=6, update_interval=1, relative_error_threshold=-0.01, relative_error_grow_threshold=-0.01), epoch data: 1: EpochData(learningRate=1.325e-05, error={}), 2: EpochData(learningRate=1.539861111111111e-05, error={}), 3: EpochData(learningRate=1.754722222222222e-05, error={}), ..., 360: EpochData(learningRate=1.4333333333333375e-05, error={}), 361: EpochData(learningRate=1.2166666666666727e-05, error={}), 362: EpochData(learningRate=1e-05, error={}), error key: None
pretrain: None
start epoch 1 with learning rate 1.325e-05 ...
TF: log_dir: output/models/train-2023-11-08-11-18-11
Create optimizer <class 'returnn.tf.updater.NadamOptimizer'> with options {'epsilon': 1e-08, 'learning_rate': <tf.Variable 'learning_rate:0' shape=() dtype=float32>}.
Initialize optimizer (default) with slots ['m', 'v'].
These additional variable were created by the optimizer: [<tf.Variable 'optimize/gradients/conformer_1_conv_mod_bn/batch_norm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_bn/batch_norm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_depthwise_conv/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_depthwise_conv/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_1/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(1024,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_2/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_dropout_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_dropout_linear/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 
'optimize/gradients/conformer_1_ffmod_1_linear_swish/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_linear_swish/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_dropout_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_dropout_linear/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_linear_swish/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_linear_swish/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_att_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 
'optimize/gradients/conformer_1_mhsa_mod_relpos_encoding/Gather_grad/Reshape_accum_grad/var_accum_grad:0' shape=(65, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_self_attention/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_output/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_output/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_h/convolution/ExpandDims_1_grad/Reshape_accum_grad/var_accum_grad:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l/convolution_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 
'optimize/gradients/input_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 88) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(88,) dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta1_power:0' shape=() dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta2_power:0' shape=() dtype=float32>].
SprintSubprocessInstance: exec ['/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:37,p2c_fd:38,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', 
'--*.model-combination.acoustic-model.hmm.early-recombination=no', '--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=no', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']
SprintSubprocessInstance: starting, pid 3325822
SprintSubprocessInstance: Sprint child process (['/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:37,p2c_fd:38,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', 
'--*.model-combination.acoustic-model.hmm.across-word-model=yes', '--*.model-combination.acoustic-model.hmm.early-recombination=no', '--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=no', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']) caused an exception.
TensorFlow exception: Graph execution error:

Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "/u/vieting/setups/swb/20230406_feat/work/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/rnn.py", line 11, in <module>
      main()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 634, in main
      execute_main_task()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 439, in execute_main_task
      engine.init_train_from_config(config, train_data, dev_data, eval_data)
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1149, in init_train_from_config
      self.init_network_from_config(config)
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1234, in init_network_from_config
      self._init_network(net_desc=net_dict, epoch=self.epoch)
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1429, in _init_network
      self.network, self.updater = self.create_network(
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1491, in create_network
      updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/updater.py", line 172, in __init__
      self.loss = network.get_objective()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1552, in get_objective
      self.maybe_construct_objective()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1545, in maybe_construct_objective
      self._construct_objective()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1529, in _construct_objective
      losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1499, in get_losses_initialized
      if loss_obj.get_loss_value_for_objective() is not None:
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
      self._prepare()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 4080, in _prepare
      self._loss_value = self.loss.get_value()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/layers/basic.py", line 13165, in get_value
      fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
      edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
      edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
  (0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 164, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 225, in _read
    return Unpickler(p).load()

EOFError: Ran out of input


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 511, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 417, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 405, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 80, in __init__
    self.init()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 302, in init
    self._start_child()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 169, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


	 [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
	 [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
  (1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 164, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 225, in _read
    return Unpickler(p).load()

EOFError: Ran out of input


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 511, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 417, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 405, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 80, in __init__
    self.init()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 302, in init
    self._start_child()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 169, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


	 [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
  File "/u/vieting/setups/swb/20230406_feat/work/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/rnn.py", line 11, in <module>
    main()
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 634, in main
    execute_main_task()
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 439, in execute_main_task
    engine.init_train_from_config(config, train_data, dev_data, eval_data)
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1149, in init_train_from_config
    self.init_network_from_config(config)
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1234, in init_network_from_config
    self._init_network(net_desc=net_dict, epoch=self.epoch)
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1429, in _init_network
    self.network, self.updater = self.create_network(
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1491, in create_network
    updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/updater.py", line 172, in __init__
    self.loss = network.get_objective()
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1552, in get_objective
    self.maybe_construct_objective()
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1545, in maybe_construct_objective
    self._construct_objective()
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1529, in _construct_objective
    losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1499, in get_losses_initialized
    if loss_obj.get_loss_value_for_objective() is not None:
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    self._prepare()
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 4080, in _prepare
    self._loss_value = self.loss.get_value()
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/layers/basic.py", line 13165, in get_value
    fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
    edges, weights, start_end_states = tf_compat.v1.py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py", line 371, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 678, in py_func
    return py_func_common(func, inp, Tout, stateful, name=name)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 653, in py_func_common
    return _internal_py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 378, in _internal_py_func
    result = gen_script_ops.py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal
    ret = Operation.from_node_def(

Exception UnknownError() in step 0. (pid 3325131)
Failing op: <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
We tried to fetch the op inputs ([<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>]) but got another exception:
target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
EXCEPTION
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1379, in BaseSession._do_call
    line: return fn(*args)
    locals:
      fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7f2192d77d30>
      args = <local> ({<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2422de3eb0>: array([[[-0.05505638],
                             [-0.09610788],
                             [-0.05115783],
                             ...,
                             [ 0.        ],
                             [ 0.        ],
                             [ 0.        ]],
                     
                            [[-0.00226238],
                             [-0.01049833],
                             [-0.00...
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1362, in BaseSession._do_run.<locals>._run_fn
    line: return self._call_tf_sessionrun(options, feed_dict, fetch_list,
                                          target_list, run_metadata)
    locals:
      self = <local> <tensorflow.python.client.session.Session object at 0x7f2571096ac0>
      self._call_tf_sessionrun = <local> <bound method BaseSession._call_tf_sessionrun of <tensorflow.python.client.session.Session object at 0x7f2571096ac0>>
      options = <local> None
      feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2422de3eb0>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],
                          
                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f24250d81b0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2423f96cf0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2423b01830>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f24080fa970>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f24080fa930>]
      run_metadata = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1455, in BaseSession._call_tf_sessionrun
    line: return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
                                                  fetch_list, target_list,
                                                  run_metadata)
    locals:
      tf_session = <global> <module 'tensorflow.python.client.pywrap_tf_session' from '/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/pywrap_tf_session.py'>
      tf_session.TF_SessionRun_wrapper = <global> <built-in method TF_SessionRun_wrapper of PyCapsule object at 0x7f2538137300>
      self = <local> <tensorflow.python.client.session.Session object at 0x7f2571096ac0>
      self._session = <local> <tensorflow.python.client._pywrap_tf_session.TF_Session object at 0x7f2423372a70>
      options = <local> None
      feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2422de3eb0>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],
                          
                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f24250d81b0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2423f96cf0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2423b01830>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f24080fa970>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f24080fa930>]
      run_metadata = <local> None
UnknownError: 2 root error(s) found.
  (0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 164, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 225, in _read
    return Unpickler(p).load()

EOFError: Ran out of input


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 511, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 417, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 405, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 80, in __init__
    self.init()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 302, in init
    self._start_child()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 169, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


	 [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
	 [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
  (1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 164, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 225, in _read
    return Unpickler(p).load()

EOFError: Ran out of input


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 511, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 417, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 405, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 80, in __init__
    self.init()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 302, in init
    self._start_child()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 169, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


	 [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.


During handling of the above exception, another exception occurred:

EXCEPTION
Traceback (most recent call last):
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 744, in Runner.run
    line: fetches_results = sess.run(
              fetches_dict, feed_dict=feed_dict, options=run_options
          )  # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
    locals:
      fetches_results = <not found>
      sess = <local> <tensorflow.python.client.session.Session object at 0x7f2571096ac0>
      sess.run = <local> <bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7f2571096ac0>>
      fetches_dict = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
      feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],
                          
                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      options = <not found>
      run_options = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 969, in BaseSession.run
    line: result = self._run(None, fetches, feed_dict, options_ptr,
                             run_metadata_ptr)
    locals:
      result = <not found>
      self = <local> <tensorflow.python.client.session.Session object at 0x7f2571096ac0>
      self._run = <local> <bound method BaseSession._run of <tensorflow.python.client.session.Session object at 0x7f2571096ac0>>
      fetches = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
      feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],
                          
                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      options_ptr = <local> None
      run_metadata_ptr = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1192, in BaseSession._run
    line: results = self._do_run(handle, final_targets, final_fetches,
                                 feed_dict_tensor, options, run_metadata)
    locals:
      results = <not found>
      self = <local> <tensorflow.python.client.session.Session object at 0x7f2571096ac0>
      self._do_run = <local> <bound method BaseSession._do_run of <tensorflow.python.client.session.Session object at 0x7f2571096ac0>>
      handle = <local> None
      final_targets = <local> [<tf.Operation 'conformer_1_conv_mod_bn/batch_norm/cond/Merge_1' type=Merge>, <tf.Operation 'optim_and_step_incr' type=NoOp>]
      final_fetches = <local> [<tf.Tensor 'objective/add:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss_init/truediv:0' shape=() dtype=float32>, <tf.Tensor 'globals/mem_usage_deviceGPU0:0' shape=() dtype=in...
      feed_dict_tensor = <local> {<Reference wrapping <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>>: array([[[-0.05505638],
                                         [-0.09610788],
                                         [-0.05115783],
                                         ...,
                                         [ 0.        ],
                                         [ 0.        ],
                                         [ 0.        ]],
                                 
                                        [[-0.00226238],
                                         [-0.01049...
      options = <local> None
      run_metadata = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1372, in BaseSession._do_run
    line: return self._do_call(_run_fn, feeds, fetches, targets, options,
                               run_metadata)
    locals:
      self = <local> <tensorflow.python.client.session.Session object at 0x7f2571096ac0>
      self._do_call = <local> <bound method BaseSession._do_call of <tensorflow.python.client.session.Session object at 0x7f2571096ac0>>
      _run_fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7f2192d77d30>
      feeds = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2422de3eb0>: array([[[-0.05505638],
                              [-0.09610788],
                              [-0.05115783],
                              ...,
                              [ 0.        ],
                              [ 0.        ],
                              [ 0.        ]],
                      
                             [[-0.00226238],
                              [-0.01049833],
                              [-0.001...
      fetches = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f24250d81b0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2423f96cf0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2423b01830>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      targets = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f24080fa970>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f24080fa930>]
      options = <local> None
      run_metadata = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1398, in BaseSession._do_call
    line: raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
    locals:
      type = <builtin> <class 'type'>
      e = <not found>
      node_def = <local> name: "objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch"
                         op: "PyFunc"
                         input: "extern_data/placeholders/seq_tag/seq_tag"
                         attr {
                           key: "token"
                           value {
                             s: "pyfunc_0"
                           }
                         }
                         attr {
                           key: "Tout"
                           value {
                             list {
                               type: DT_INT32
                               type: DT_FLOAT
                               type: DT_INT...
      op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      message = <local> 'Graph execution error:\n\nDetected at node \'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch\' defined at (most recent call last):\n    File "/u/vieting/setups/swb/20230406_feat/work/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/rnn.py", line 11, in <..., len = 14876
UnknownError: Graph execution error:

Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "/u/vieting/setups/swb/20230406_feat/work/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/rnn.py", line 11, in <module>
      main()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 634, in main
      execute_main_task()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 439, in execute_main_task
      engine.init_train_from_config(config, train_data, dev_data, eval_data)
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1149, in init_train_from_config
      self.init_network_from_config(config)
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1234, in init_network_from_config
      self._init_network(net_desc=net_dict, epoch=self.epoch)
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1429, in _init_network
      self.network, self.updater = self.create_network(
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1491, in create_network
      updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/updater.py", line 172, in __init__
      self.loss = network.get_objective()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1552, in get_objective
      self.maybe_construct_objective()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1545, in maybe_construct_objective
      self._construct_objective()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1529, in _construct_objective
      losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1499, in get_losses_initialized
      if loss_obj.get_loss_value_for_objective() is not None:
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
      self._prepare()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 4080, in _prepare
      self._loss_value = self.loss.get_value()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/layers/basic.py", line 13165, in get_value
      fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
      edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
      edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "/u/vieting/setups/swb/20230406_feat/work/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/rnn.py", line 11, in <module>
      main()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 634, in main
      execute_main_task()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 439, in execute_main_task
      engine.init_train_from_config(config, train_data, dev_data, eval_data)
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1149, in init_train_from_config
      self.init_network_from_config(config)
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1234, in init_network_from_config
      self._init_network(net_desc=net_dict, epoch=self.epoch)
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1429, in _init_network
      self.network, self.updater = self.create_network(
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1491, in create_network
      updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/updater.py", line 172, in __init__
      self.loss = network.get_objective()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1552, in get_objective
      self.maybe_construct_objective()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1545, in maybe_construct_objective
      self._construct_objective()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1529, in _construct_objective
      losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1499, in get_losses_initialized
      if loss_obj.get_loss_value_for_objective() is not None:
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
      self._prepare()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 4080, in _prepare
      self._loss_value = self.loss.get_value()
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/layers/basic.py", line 13165, in get_value
      fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
      edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
    File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
      edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
  (0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 164, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 225, in _read
    return Unpickler(p).load()

EOFError: Ran out of input


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 511, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 417, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 405, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 80, in __init__
    self.init()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 302, in init
    self._start_child()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 169, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


	 [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
	 [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
  (1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 164, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 225, in _read
    return Unpickler(p).load()

EOFError: Ran out of input


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 511, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 417, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 405, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 80, in __init__
    self.init()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 302, in init
    self._start_child()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 169, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


	 [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
  File "/u/vieting/setups/swb/20230406_feat/work/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/rnn.py", line 11, in <module>
    main()
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 634, in main
    execute_main_task()
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 439, in execute_main_task
    engine.init_train_from_config(config, train_data, dev_data, eval_data)
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1149, in init_train_from_config
    self.init_network_from_config(config)
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1234, in init_network_from_config
    self._init_network(net_desc=net_dict, epoch=self.epoch)
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1429, in _init_network
    self.network, self.updater = self.create_network(
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1491, in create_network
    updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/updater.py", line 172, in __init__
    self.loss = network.get_objective()
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1552, in get_objective
    self.maybe_construct_objective()
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1545, in maybe_construct_objective
    self._construct_objective()
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1529, in _construct_objective
    losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1499, in get_losses_initialized
    if loss_obj.get_loss_value_for_objective() is not None:
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    self._prepare()
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 4080, in _prepare
    self._loss_value = self.loss.get_value()
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/layers/basic.py", line 13165, in get_value
    fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
    edges, weights, start_end_states = tf_compat.v1.py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py", line 371, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 678, in py_func
    return py_func_common(func, inp, Tout, stateful, name=name)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 653, in py_func_common
    return _internal_py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 378, in _internal_py_func
    result = gen_script_ops.py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal
    ret = Operation.from_node_def(



During handling of the above exception, another exception occurred:

EXCEPTION
Traceback (most recent call last):
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 4341, in help_on_tf_exception
    line: debug_fetch, fetch_helpers, op_copied = FetchHelper.copy_graph(
              debug_fetch,
              target_op=op,
              fetch_helper_tensors=list(op.inputs),
              stop_at_ts=stop_at_ts,
              verbose_stream=file,
          )
    locals:
      debug_fetch = <local> <tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>
      fetch_helpers = <not found>
      op_copied = <not found>
      FetchHelper = <local> <class 'returnn.tf.util.basic.FetchHelper'>
      FetchHelper.copy_graph = <local> <bound method FetchHelper.copy_graph of <class 'returnn.tf.util.basic.FetchHelper'>>
      target_op = <not found>
      op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      fetch_helper_tensors = <not found>
      list = <builtin> <class 'list'>
      op.inputs = <local> (<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>,)
      stop_at_ts = <local> [<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>, <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>, <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, <tf.Tensor 'extern_data/placeholders/batch_dim:...
      verbose_stream = <not found>
      file = <local> <returnn.log.Stream object at 0x7f25711ccdf0>
  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/util/basic.py", line 7700, in FetchHelper.copy_graph
    line: assert target_op in ops, "target_op %r,\nops\n%s" % (target_op, pformat(ops))
    locals:
      target_op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      ops = <local> [<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
      pformat = <local> <function pformat at 0x7f2575517c10>
AssertionError: target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]

Step meta information:
{'seq_idx': [0,
             1,
             2,
             3,
             4,
             5,
             6,
             7,
             8,
             9,
             10,
             11,
             12,
             13,
             14,
             15,
             16,
             17,
             18,
             19,
             20,
             21,
             22,
             23,
             24,
             25,
             26,
             27,
             28,
             29,
             30,
             31,
             32,
             33,
             34,
             35,
             36,
             37,
             38],
 'seq_tag': ['switchboard-1/sw02721B/sw2721B-ms98-a-0031',
             'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
             'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
             'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
             'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
             'switchboard-1/sw02145A/sw2145A-ms98-a-0107',
             'switchboard-1/sw02484A/sw2484A-ms98-a-0077',
             'switchboard-1/sw02768A/sw2768A-ms98-a-0064',
             'switchboard-1/sw03312B/sw3312B-ms98-a-0041',
             'switchboard-1/sw02344B/sw2344B-ms98-a-0023',
             'switchboard-1/sw04248B/sw4248B-ms98-a-0017',
             'switchboard-1/sw02762A/sw2762A-ms98-a-0059',
             'switchboard-1/sw03146A/sw3146A-ms98-a-0047',
             'switchboard-1/sw03032A/sw3032A-ms98-a-0065',
             'switchboard-1/sw02288A/sw2288A-ms98-a-0080',
             'switchboard-1/sw02751A/sw2751A-ms98-a-0066',
             'switchboard-1/sw02369A/sw2369A-ms98-a-0118',
             'switchboard-1/sw04169A/sw4169A-ms98-a-0059',
             'switchboard-1/sw02227A/sw2227A-ms98-a-0016',
             'switchboard-1/sw02061B/sw2061B-ms98-a-0170',
             'switchboard-1/sw02862B/sw2862B-ms98-a-0033',
             'switchboard-1/sw03116B/sw3116B-ms98-a-0065',
             'switchboard-1/sw03517B/sw3517B-ms98-a-0038',
             'switchboard-1/sw02360B/sw2360B-ms98-a-0086',
             'switchboard-1/sw02510B/sw2510B-ms98-a-0061',
             'switchboard-1/sw03919A/sw3919A-ms98-a-0017',
             'switchboard-1/sw02965A/sw2965A-ms98-a-0045',
             'switchboard-1/sw03154A/sw3154A-ms98-a-0073',
             'switchboard-1/sw02299A/sw2299A-ms98-a-0005',
             'switchboard-1/sw04572A/sw4572A-ms98-a-0026',
             'switchboard-1/sw02682A/sw2682A-ms98-a-0022',
             'switchboard-1/sw02808A/sw2808A-ms98-a-0014',
             'switchboard-1/sw04526A/sw4526A-ms98-a-0026',
             'switchboard-1/sw03180B/sw3180B-ms98-a-0010',
             'switchboard-1/sw03227A/sw3227A-ms98-a-0029',
             'switchboard-1/sw03891B/sw3891B-ms98-a-0008',
             'switchboard-1/sw03882B/sw3882B-ms98-a-0041',
             'switchboard-1/sw03102B/sw3102B-ms98-a-0027',
             'switchboard-1/sw02454A/sw2454A-ms98-a-0029']}
Feed dict:
  <tf.Tensor 'extern_data/placeholders/batch_dim:0' shape=() dtype=int32>: int(39)
  <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: shape (39, 10208, 1), dtype float32, min/max -1.0/1.0, mean/stddev 0.0014351769/0.11459725, Tensor{'data', [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}
  <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>: shape (39,), dtype int32, min/max 4760/10208, ([ 4760  6246  6372  6861  7296  7499  7534  7622  7824  8031  8295  8431
  8690  8675  8667  8886  9084  9199  9163  9156  9274  9262  9540  9668
  9678  9719  9711  9902  9989 10010 10020 10073 10006 10102 10131 10112
 10130 10178 10208])
  <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>: type <class 'list'>, Tensor{'seq_tag', [B?], dtype='string'}
  <tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>: bool(True)
Save model under output/models/epoch.001.crash_0
Trainer not finalized, quitting. (pid 3325131)

See also /work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn.log for a copy of this log without the broken color codes.

I created a script to reproduce the error: vieting@cn-285:/work/asr4/vieting/tmp/20231108_tf213_sprint_op $ ./run_example.sh
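The broken color codes are ANSI SGR escape sequences (such as `ESC[34m`) whose ESC byte was turned into the replacement character `�` when the log was pasted. As a minimal sketch (a standalone helper, not part of RETURNN; the name `strip_color_codes` is hypothetical), they can be removed from a copy of the log with a regular expression:

```python
import re

# Matches SGR (color/style) sequences such as "\x1b[34;1m" or "\x1b[0m".
# Pasted logs often show the ESC byte as U+FFFD ("�[34m"), so both prefixes match.
_SGR_RE = re.compile(r"[\x1b\ufffd]\[[0-9;]*m")


def strip_color_codes(text: str) -> str:
    """Remove ANSI color/style escape sequences from a log string."""
    return _SGR_RE.sub("", text)
```

For example, `strip_color_codes(open("returnn.log").read())` would yield plain text. The pattern only removes sequences ending in `m`, so other terminal control sequences would pass through unchanged.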

@Marvin84

Marvin84 commented Nov 8, 2023 via email

@Marvin84

Marvin84 commented Nov 8, 2023

diff --git a/returnn/sprint/error_signals.py b/returnn/sprint/error_signals.py
index 735ac363..1c204e68 100644
--- a/returnn/sprint/error_signals.py
+++ b/returnn/sprint/error_signals.py
@@ -130,7 +130,7 @@ class SprintSubprocessInstance:

     def _start_child(self):
         assert self.child_pid is None
-        self.pipe_c2p = self._pipe_open()
+        self.pipe_c2p = self._pipe_open(buffered=True)
         self.pipe_p2c = self._pipe_open()
         args = self._build_sprint_args()
         print("SprintSubprocessInstance: exec", args, file=log.v5)
@@ -169,14 +169,14 @@ class SprintSubprocessInstance:
             raise Exception("SprintSubprocessInstance Sprint init failed")

     # noinspection PyMethodMayBeStatic
-    def _pipe_open(self):
+    def _pipe_open(self, buffered=False):
         readend, writeend = os.pipe()
         if hasattr(os, "set_inheritable"):
             # https://www.python.org/dev/peps/pep-0446/
             os.set_inheritable(readend, True)
             os.set_inheritable(writeend, True)
-        readend = os.fdopen(readend, "rb", 0)
-        writeend = os.fdopen(writeend, "wb", 0)
+        readend = os.fdopen(readend, "rb", -bool(buffered)) # -1 is default for buffered
+        writeend = os.fdopen(writeend, "wb", -bool(buffered))
         return readend, writeend

     @property
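The patch only changes the `buffering` argument passed to `os.fdopen`: `-bool(buffered)` evaluates to `-1` (Python's default buffering) for `True` and to `0` (unbuffered raw I/O, allowed only in binary mode) for `False`. A minimal sketch of the two modes, assuming a pickle round-trip within a single process (the helpers `pipe_open` and `roundtrip` are hypothetical, and this does not reproduce the Apptainer-specific failure):

```python
import os
import pickle


def pipe_open(buffered: bool = False):
    """Hypothetical stand-in for the patched _pipe_open (inheritability omitted)."""
    readend, writeend = os.pipe()
    # -bool(True) == -1 -> default buffering (BufferedReader / BufferedWriter)
    # -bool(False) == 0 -> unbuffered raw io.FileIO objects
    buffering = -bool(buffered)
    return os.fdopen(readend, "rb", buffering), os.fdopen(writeend, "wb", buffering)


def roundtrip(obj, buffered: bool):
    """Pickle obj into the pipe, close the writer, then unpickle it again.

    Keep obj small: writer and reader live in one process, so a payload larger
    than the OS pipe capacity would block inside pickle.dump().
    """
    reader, writer = pipe_open(buffered)
    try:
        pickle.dump(obj, writer)
        writer.close()  # close() flushes any buffered bytes and signals EOF
        return pickle.Unpickler(reader).load()
    finally:
        if not writer.closed:
            writer.close()
        reader.close()
```

With `buffered=True` the writer must be flushed (here via `close()`) before the reader can see the data; the unbuffered writer hands each `write()` straight to the OS.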

@Marvin84

Marvin84 commented Nov 8, 2023

AFAIR, the problem only occurs when running in an apptainer environment. The buffer does not contain all the info, and RETURNN crashes because the RASR automata are truncated/incomplete.

@albertz
Member

albertz commented Nov 8, 2023

So for reference, the actual error is this:

Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
  (0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 164, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 225, in _read
    return Unpickler(p).load()

EOFError: Ran out of input
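This `EOFError` is what the parent gets when the child's end of the pipe is closed before any pickled reply arrives, which would be consistent with the Sprint child crashing during init (the issue title points to a RASR segfault). A minimal sketch of just the pipe/pickle mechanics, assuming nothing about RETURNN beyond the `os.fdopen(..., "rb", 0)` plus `Unpickler(p).load()` pattern visible in this thread (the function name is hypothetical):

```python
import os
import pickle


def read_reply_after_child_died():
    """Simulate the parent reading a reply after the child exited without writing.

    Hypothetical sketch: only the pipe/pickle mechanics are modeled here,
    not RETURNN's actual SprintSubprocessInstance.
    """
    readend, writeend = os.pipe()
    reader = os.fdopen(readend, "rb", 0)  # unbuffered, like the unpatched _pipe_open
    os.close(writeend)  # the "child" is gone: write end closed, nothing was sent
    try:
        return pickle.Unpickler(reader).load()  # empty stream -> EOFError
    finally:
        reader.close()
```

Calling `read_reply_after_child_died()` raises `EOFError: Ran out of input`, the same inner error as above; the actual reason the child exited has to be looked up in the Sprint/RASR log.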

@albertz albertz changed the title from "get_sprint_automata_for_batch op not found when using tf 2.13" to "TF get_sprint_automata_for_batch: EOFError: Ran out of input" on Nov 8, 2023
@vieting
Contributor Author

vieting commented Nov 8, 2023

I just tested the proposed patch, and it does not fix the issue for my example.

@albertz

This comment was marked as resolved.

@Marvin84

This comment was marked as resolved.

@albertz albertz closed this as completed in a3d1094 Nov 8, 2023
@albertz
Member

albertz commented Nov 8, 2023

@vieting I pushed sth which should fix this. Can you try?

@albertz
Member

albertz commented Nov 8, 2023

(For reference, there was also an EOFError in #1363, but I think that was another problem.)

@albertz
Member

albertz commented Nov 8, 2023

Note: I did not actually test my recent change, as I don't have any setup ready to try this. Please try it out and report if it works.

@vieting
Contributor Author

vieting commented Nov 8, 2023

Just tested and I still get the error.

Log:

RETURNN starting up, version 1.20231108.124950+git.a3d1094d, date/time 2023-11-08-14-13-24 (UTC+0100), pid 352402, cwd /work/asr4/vieting/tmp/20231108_tf213_sprint_op, Python /usr/bin/python3
RETURNN command line options: ['returnn.config']
Hostname: cn-283
TensorFlow: 2.13.0 (v2.13.0-rc2-7-g1cb1a030a62) (<not-under-git> in /usr/local/lib/python3.8/dist-packages/tensorflow)
Use num_threads=1 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 2, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2}.
CUDA_VISIBLE_DEVICES is set to '4'.
Collecting TensorFlow device list...
Local devices available to TensorFlow:
  1/2: name: "/device:CPU:0"
       device_type: "CPU"
       memory_limit: 268435456
       locality {
       }
       incarnation: 14088248937803725314
       xla_global_id: -1
  2/2: name: "/device:GPU:0"
       device_type: "GPU"
       memory_limit: 10089005056
       locality {
         bus_id: 2
         numa_node: 1
         links {
         }
       }
       incarnation: 17654959729817767865
       physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:81:00.0, compute capability: 7.5"
       xla_global_id: 416903419
Using gpu device 4: NVIDIA GeForce RTX 2080 Ti
Hostname 'cn-283', GPU 4, GPU-dev-name 'NVIDIA GeForce RTX 2080 Ti', GPU-memory 9.4GB
Train data:
  input: 1 x 1
  output: {'raw': {'dtype': 'string', 'shape': ()}, 'orth': [256, 1], 'data': [1, 2]}
  OggZipDataset, sequences: 249229, frames: unknown
Dev data:
  OggZipDataset, sequences: 300, frames: unknown
Learning-rate-control: file learning_rates.swb.ctc does not exist yet
Setup TF session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
layer /'data': [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)] float32
layer /features/'conv_h_filter': ['conv_h_filter:static:0'(128),'conv_h_filter:static:1'(1),F|F'conv_h_filter:static:2'(150)] float32
layer /features/'conv_h': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_act': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_split': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F'conv_h:channel'(150),F|F'conv_h_split_split_dims1'(1)] float32
DEPRECATION WARNING: Explicitly specify in_spatial_dims when there is more than one spatial dim in the input.
This will be disallowed with behavior_version 8.
layer /features/'conv_l': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel'(150),F|F'conv_l:channel'(5)] float32
layer /features/'conv_l_merge': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
DEPRECATION WARNING: MergeDimsLayer, only keep_order=True is allowed
This will be disallowed with behavior_version 6.
layer /features/'conv_l_act_no_norm': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'conv_l_act': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'features': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'specaug': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'conv_source': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_source_split_dims1'(1)] float32
layer /'conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_1:channel'(32)] float32
layer /'conv_1_pool': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_1:channel'(32)] float32
layer /'conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/32⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_2:channel'(64)] float32
layer /'conv_3': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_3:channel'(64)] float32
layer /'conv_merged': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conv_h:channel*conv_l:channel//2)*conv_3:channel'(24000)] float32
layer /'input_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'input_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_linear_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_1_dropout_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_half_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_1_conv_mod_glu': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conformer_1_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_1_conv_mod_depthwise_conv': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_bn': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
DEPRECATION WARNING: batch_norm masked_time should be specified explicitly
This will be disallowed with behavior_version 12.
layer /'conformer_1_conv_mod_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_relpos_encoding': [T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_1_mhsa_mod_self_attention': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_att_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_linear_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_2_dropout_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_half_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'encoder': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'output:feature-dense'(88)] float32
Network layer topology:
  extern data: data: Tensor{[B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}, seq_tag: Tensor{[B?], dtype='string'}
  used data keys: ['data', 'seq_tag']
  layers:
    layer batch_norm 'conformer_1_conv_mod_bn' #: 512
    layer conv 'conformer_1_conv_mod_depthwise_conv' #: 512
    layer copy 'conformer_1_conv_mod_dropout' #: 512
    layer gating 'conformer_1_conv_mod_glu' #: 512
    layer layer_norm 'conformer_1_conv_mod_ln' #: 512
    layer linear 'conformer_1_conv_mod_pointwise_conv_1' #: 1024
    layer linear 'conformer_1_conv_mod_pointwise_conv_2' #: 512
    layer combine 'conformer_1_conv_mod_res_add' #: 512
    layer activation 'conformer_1_conv_mod_swish' #: 512
    layer copy 'conformer_1_ffmod_1_dropout' #: 512
    layer linear 'conformer_1_ffmod_1_dropout_linear' #: 512
    layer eval 'conformer_1_ffmod_1_half_res_add' #: 512
    layer linear 'conformer_1_ffmod_1_linear_swish' #: 2048
    layer layer_norm 'conformer_1_ffmod_1_ln' #: 512
    layer copy 'conformer_1_ffmod_2_dropout' #: 512
    layer linear 'conformer_1_ffmod_2_dropout_linear' #: 512
    layer eval 'conformer_1_ffmod_2_half_res_add' #: 512
    layer linear 'conformer_1_ffmod_2_linear_swish' #: 2048
    layer layer_norm 'conformer_1_ffmod_2_ln' #: 512
    layer linear 'conformer_1_mhsa_mod_att_linear' #: 512
    layer copy 'conformer_1_mhsa_mod_dropout' #: 512
    layer layer_norm 'conformer_1_mhsa_mod_ln' #: 512
    layer relative_positional_encoding 'conformer_1_mhsa_mod_relpos_encoding' #: 64
    layer combine 'conformer_1_mhsa_mod_res_add' #: 512
    layer self_attention 'conformer_1_mhsa_mod_self_attention' #: 512
    layer layer_norm 'conformer_1_output' #: 512
    layer conv 'conv_1' #: 32
    layer pool 'conv_1_pool' #: 32
    layer conv 'conv_2' #: 64
    layer conv 'conv_3' #: 64
    layer merge_dims 'conv_merged' #: 24000
    layer split_dims 'conv_source' #: 1
    layer source 'data' #: 1
    layer copy 'encoder' #: 512
    layer subnetwork 'features' #: 750
    layer conv 'features/conv_h' #: 150
    layer eval 'features/conv_h_act' #: 150
    layer variable 'features/conv_h_filter' #: 150
    layer split_dims 'features/conv_h_split' #: 1
    layer conv 'features/conv_l' #: 5
    layer layer_norm 'features/conv_l_act' #: 750
    layer eval 'features/conv_l_act_no_norm' #: 750
    layer merge_dims 'features/conv_l_merge' #: 750
    layer copy 'features/output' #: 750
    layer copy 'input_dropout' #: 512
    layer linear 'input_linear' #: 512
    layer softmax 'output' #: 88
    layer eval 'specaug' #: 750
net params #: 18473980
net trainable params: [<tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 
'conformer_1_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_1_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conv_1/W:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'conv_1/bias:0' shape=(32,) dtype=float32>, <tf.Variable 'conv_2/W:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'conv_2/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'conv_3/W:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'conv_3/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'features/conv_h_filter/conv_h_filter:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'features/conv_l/W:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'features/conv_l_act/bias:0' shape=(750,) dtype=float32>, <tf.Variable 'features/conv_l_act/scale:0' shape=(750,) dtype=float32>, <tf.Variable 'input_linear/W:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'output/W:0' shape=(512, 88) dtype=float32>, <tf.Variable 'output/b:0' shape=(88,) dtype=float32>]
start training at epoch 1
using batch size: {'classes': 5000, 'data': 400000}, max seqs: 128
learning rate control: NewbobMultiEpoch(num_epochs=6, update_interval=1, relative_error_threshold=-0.01, relative_error_grow_threshold=-0.01), epoch data: 1: EpochData(learningRate=1.325e-05, error={}), 2: EpochData(learningRate=1.539861111111111e-05, error={}), 3: EpochData(learningRate=1.754722222222222e-05, error={}), ..., 360: EpochData(learningRate=1.4333333333333375e-05, error={}), 361: EpochData(learningRate=1.2166666666666727e-05, error={}), 362: EpochData(learningRate=1e-05, error={}), error key: None
pretrain: None
start epoch 1 with learning rate 1.325e-05 ...
TF: log_dir: output/models/train-2023-11-08-13-13-24
Create optimizer <class 'returnn.tf.updater.NadamOptimizer'> with options {'epsilon': 1e-08, 'learning_rate': <tf.Variable 'learning_rate:0' shape=() dtype=float32>}.
Initialize optimizer (default) with slots ['m', 'v'].
These additional variable were created by the optimizer: [<tf.Variable 'optimize/gradients/conformer_1_conv_mod_bn/batch_norm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_bn/batch_norm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_depthwise_conv/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_depthwise_conv/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_1/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(1024,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_2/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_dropout_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_dropout_linear/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 
'optimize/gradients/conformer_1_ffmod_1_linear_swish/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_linear_swish/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_dropout_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_dropout_linear/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_linear_swish/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_linear_swish/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_att_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 
'optimize/gradients/conformer_1_mhsa_mod_relpos_encoding/Gather_grad/Reshape_accum_grad/var_accum_grad:0' shape=(65, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_self_attention/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_output/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_output/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_h/convolution/ExpandDims_1_grad/Reshape_accum_grad/var_accum_grad:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l/convolution_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 
'optimize/gradients/input_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 88) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(88,) dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta1_power:0' shape=() dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta2_power:0' shape=() dtype=float32>].
SprintSubprocessInstance: exec ['/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:37,p2c_fd:38,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', 
'--*.model-combination.acoustic-model.hmm.early-recombination=no', '--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=no', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']
SprintSubprocessInstance: starting, pid 353093
SprintSubprocessInstance: Sprint child process (['/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:37,p2c_fd:38,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', 
'--*.model-combination.acoustic-model.hmm.early-recombination=no', '--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=no', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']) caused an exception.
TensorFlow exception: Graph execution error:

Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "./returnn/rnn.py", line 11, in <module>
      main()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
      execute_main_task()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
      engine.init_train_from_config(config, train_data, dev_data, eval_data)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
      self.init_network_from_config(config)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
      self._init_network(net_desc=net_dict, epoch=self.epoch)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
      self.network, self.updater = self.create_network(
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
      updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
      self.loss = network.get_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
      self.maybe_construct_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
      self._construct_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
      losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
      if loss_obj.get_loss_value_for_objective() is not None:
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
      self._prepare()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
      self._loss_value = self.loss.get_value()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
      fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
      edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
      edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
  (0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 166, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 235, in _read
    raise EOFError

EOFError


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 533, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 439, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 427, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 82, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 324, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 171, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


	 [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
	 [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
  (1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 166, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 235, in _read
    raise EOFError

EOFError


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 533, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 439, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 427, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 82, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 324, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 171, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


	 [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
  File "./returnn/rnn.py", line 11, in <module>
    main()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
    execute_main_task()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
    engine.init_train_from_config(config, train_data, dev_data, eval_data)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
    self.init_network_from_config(config)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
    self._init_network(net_desc=net_dict, epoch=self.epoch)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
    self.network, self.updater = self.create_network(
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
    updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
    self.loss = network.get_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
    self.maybe_construct_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
    self._construct_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
    losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
    if loss_obj.get_loss_value_for_objective() is not None:
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    self._prepare()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
    self._loss_value = self.loss.get_value()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
    fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
    edges, weights, start_end_states = tf_compat.v1.py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py", line 371, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 678, in py_func
    return py_func_common(func, inp, Tout, stateful, name=name)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 653, in py_func_common
    return _internal_py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 378, in _internal_py_func
    result = gen_script_ops.py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal
    ret = Operation.from_node_def(

Exception UnknownError() in step 0. (pid 352402)
Failing op: <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
We tried to fetch the op inputs ([<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>]) but got another exception:
target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
EXCEPTION
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1379, in BaseSession._do_call
    line: return fn(*args)
    locals:
      fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7f3307fe4f70>
      args = <local> ({<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35983ad7b0>: array([[[-0.05505638],
                             [-0.09610788],
                             [-0.05115783],
                             ...,
                             [ 0.        ],
                             [ 0.        ],
                             [ 0.        ]],
                     
                            [[-0.00226238],
                             [-0.01049833],
                             [-0.00...
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1362, in BaseSession._do_run.<locals>._run_fn
    line: return self._call_tf_sessionrun(options, feed_dict, fetch_list,
                                          target_list, run_metadata)
    locals:
      self = <local> <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>
      self._call_tf_sessionrun = <local> <bound method BaseSession._call_tf_sessionrun of <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>>
      options = <local> None
      feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35983ad7b0>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],
                          
                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35893975b0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35893a4ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f3589379470>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f35917f95b0>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f35917f9770>]
      run_metadata = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1455, in BaseSession._call_tf_sessionrun
    line: return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
                                                  fetch_list, target_list,
                                                  run_metadata)
    locals:
      tf_session = <global> <module 'tensorflow.python.client.pywrap_tf_session' from '/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/pywrap_tf_session.py'>
      tf_session.TF_SessionRun_wrapper = <global> <built-in method TF_SessionRun_wrapper of PyCapsule object at 0x7f36aecb2300>
      self = <local> <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>
      self._session = <local> <tensorflow.python.client._pywrap_tf_session.TF_Session object at 0x7f35986e9470>
      options = <local> None
      feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35983ad7b0>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],
                          
                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35893975b0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35893a4ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f3589379470>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f35917f95b0>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f35917f9770>]
      run_metadata = <local> None
UnknownError: 2 root error(s) found.
  (0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 166, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 235, in _read
    raise EOFError

EOFError


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 533, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 439, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 427, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 82, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 324, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 171, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


	 [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
	 [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
  (1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 166, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 235, in _read
    raise EOFError

EOFError


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 533, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 439, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 427, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 82, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 324, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 171, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


	 [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.


During handling of the above exception, another exception occurred:

EXCEPTION
Traceback (most recent call last):
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 744, in Runner.run
    line: fetches_results = sess.run(
              fetches_dict, feed_dict=feed_dict, options=run_options
          )  # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
    locals:
      fetches_results = <not found>
      sess = <local> <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>
      sess.run = <local> <bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>>
      fetches_dict = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
      feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],
                          
                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      options = <not found>
      run_options = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 969, in BaseSession.run
    line: result = self._run(None, fetches, feed_dict, options_ptr,
                             run_metadata_ptr)
    locals:
      result = <not found>
      self = <local> <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>
      self._run = <local> <bound method BaseSession._run of <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>>
      fetches = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
      feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],
                          
                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      options_ptr = <local> None
      run_metadata_ptr = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1192, in BaseSession._run
    line: results = self._do_run(handle, final_targets, final_fetches,
                                 feed_dict_tensor, options, run_metadata)
    locals:
      results = <not found>
      self = <local> <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>
      self._do_run = <local> <bound method BaseSession._do_run of <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>>
      handle = <local> None
      final_targets = <local> [<tf.Operation 'conformer_1_conv_mod_bn/batch_norm/cond/Merge_1' type=Merge>, <tf.Operation 'optim_and_step_incr' type=NoOp>]
      final_fetches = <local> [<tf.Tensor 'objective/add:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss_init/truediv:0' shape=() dtype=float32>, <tf.Tensor 'globals/mem_usage_deviceGPU0:0' shape=() dtype=in...
      feed_dict_tensor = <local> {<Reference wrapping <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>>: array([[[-0.05505638],
                                         [-0.09610788],
                                         [-0.05115783],
                                         ...,
                                         [ 0.        ],
                                         [ 0.        ],
                                         [ 0.        ]],
                                 
                                        [[-0.00226238],
                                         [-0.01049...
      options = <local> None
      run_metadata = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1372, in BaseSession._do_run
    line: return self._do_call(_run_fn, feeds, fetches, targets, options,
                               run_metadata)
    locals:
      self = <local> <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>
      self._do_call = <local> <bound method BaseSession._do_call of <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>>
      _run_fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7f3307fe4f70>
      feeds = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35983ad7b0>: array([[[-0.05505638],
                              [-0.09610788],
                              [-0.05115783],
                              ...,
                              [ 0.        ],
                              [ 0.        ],
                              [ 0.        ]],
                      
                             [[-0.00226238],
                              [-0.01049833],
                              [-0.001...
      fetches = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35893975b0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35893a4ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f3589379470>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      targets = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f35917f95b0>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f35917f9770>]
      options = <local> None
      run_metadata = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1398, in BaseSession._do_call
    line: raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
    locals:
      type = <builtin> <class 'type'>
      e = <not found>
      node_def = <local> name: "objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch"
                         op: "PyFunc"
                         input: "extern_data/placeholders/seq_tag/seq_tag"
                         attr {
                           key: "token"
                           value {
                             s: "pyfunc_0"
                           }
                         }
                         attr {
                           key: "Tout"
                           value {
                             list {
                               type: DT_INT32
                               type: DT_FLOAT
                               type: DT_INT...
      op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      message = <local> 'Graph execution error:\n\nDetected at node \'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch\' defined at (most recent call last):\n    File "./returnn/rnn.py", line 11, in <module>\n      main()\n    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__mai..., len = 11284
UnknownError: Graph execution error:

Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "./returnn/rnn.py", line 11, in <module>
      main()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
      execute_main_task()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
      engine.init_train_from_config(config, train_data, dev_data, eval_data)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
      self.init_network_from_config(config)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
      self._init_network(net_desc=net_dict, epoch=self.epoch)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
      self.network, self.updater = self.create_network(
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
      updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
      self.loss = network.get_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
      self.maybe_construct_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
      self._construct_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
      losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
      if loss_obj.get_loss_value_for_objective() is not None:
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
      self._prepare()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
      self._loss_value = self.loss.get_value()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
      fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
      edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
      edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
  (0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 166, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 235, in _read
    raise EOFError

EOFError


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 533, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 439, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 427, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 82, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 324, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 171, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


	 [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
	 [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
  (1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 166, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 235, in _read
    raise EOFError

EOFError


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 533, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 439, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 427, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 82, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 324, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 171, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


	 [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
  File "./returnn/rnn.py", line 11, in <module>
    main()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
    execute_main_task()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
    engine.init_train_from_config(config, train_data, dev_data, eval_data)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
    self.init_network_from_config(config)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
    self._init_network(net_desc=net_dict, epoch=self.epoch)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
    self.network, self.updater = self.create_network(
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
    updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
    self.loss = network.get_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
    self.maybe_construct_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
    self._construct_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
    losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
    if loss_obj.get_loss_value_for_objective() is not None:
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    self._prepare()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
    self._loss_value = self.loss.get_value()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
    fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
    edges, weights, start_end_states = tf_compat.v1.py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py", line 371, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 678, in py_func
    return py_func_common(func, inp, Tout, stateful, name=name)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 653, in py_func_common
    return _internal_py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 378, in _internal_py_func
    result = gen_script_ops.py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal
    ret = Operation.from_node_def(



During handling of the above exception, another exception occurred:

EXCEPTION
Traceback (most recent call last):
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4341, in help_on_tf_exception
    line: debug_fetch, fetch_helpers, op_copied = FetchHelper.copy_graph(
              debug_fetch,
              target_op=op,
              fetch_helper_tensors=list(op.inputs),
              stop_at_ts=stop_at_ts,
              verbose_stream=file,
          )
    locals:
      debug_fetch = <local> <tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>
      fetch_helpers = <not found>
      op_copied = <not found>
      FetchHelper = <local> <class 'returnn.tf.util.basic.FetchHelper'>
      FetchHelper.copy_graph = <local> <bound method FetchHelper.copy_graph of <class 'returnn.tf.util.basic.FetchHelper'>>
      target_op = <not found>
      op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      fetch_helper_tensors = <not found>
      list = <builtin> <class 'list'>
      op.inputs = <local> (<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>,)
      stop_at_ts = <local> [<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>, <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>, <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, <tf.Tensor 'extern_data/placeholders/batch_dim:...
      verbose_stream = <not found>
      file = <local> <returnn.log.Stream object at 0x7f36e7695df0>
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/util/basic.py", line 7700, in FetchHelper.copy_graph
    line: assert target_op in ops, "target_op %r,\nops\n%s" % (target_op, pformat(ops))
    locals:
      target_op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      ops = <local> [<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
      pformat = <local> <function pformat at 0x7f36eb9e5c10>
AssertionError: target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]

Step meta information:
{'seq_idx': [0,
             1,
             2,
             3,
             4,
             5,
             6,
             7,
             8,
             9,
             10,
             11,
             12,
             13,
             14,
             15,
             16,
             17,
             18,
             19,
             20,
             21,
             22,
             23,
             24,
             25,
             26,
             27,
             28,
             29,
             30,
             31,
             32,
             33,
             34,
             35,
             36,
             37,
             38],
 'seq_tag': ['switchboard-1/sw02721B/sw2721B-ms98-a-0031',
             'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
             'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
             'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
             'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
             'switchboard-1/sw02145A/sw2145A-ms98-a-0107',
             'switchboard-1/sw02484A/sw2484A-ms98-a-0077',
             'switchboard-1/sw02768A/sw2768A-ms98-a-0064',
             'switchboard-1/sw03312B/sw3312B-ms98-a-0041',
             'switchboard-1/sw02344B/sw2344B-ms98-a-0023',
             'switchboard-1/sw04248B/sw4248B-ms98-a-0017',
             'switchboard-1/sw02762A/sw2762A-ms98-a-0059',
             'switchboard-1/sw03146A/sw3146A-ms98-a-0047',
             'switchboard-1/sw03032A/sw3032A-ms98-a-0065',
             'switchboard-1/sw02288A/sw2288A-ms98-a-0080',
             'switchboard-1/sw02751A/sw2751A-ms98-a-0066',
             'switchboard-1/sw02369A/sw2369A-ms98-a-0118',
             'switchboard-1/sw04169A/sw4169A-ms98-a-0059',
             'switchboard-1/sw02227A/sw2227A-ms98-a-0016',
             'switchboard-1/sw02061B/sw2061B-ms98-a-0170',
             'switchboard-1/sw02862B/sw2862B-ms98-a-0033',
             'switchboard-1/sw03116B/sw3116B-ms98-a-0065',
             'switchboard-1/sw03517B/sw3517B-ms98-a-0038',
             'switchboard-1/sw02360B/sw2360B-ms98-a-0086',
             'switchboard-1/sw02510B/sw2510B-ms98-a-0061',
             'switchboard-1/sw03919A/sw3919A-ms98-a-0017',
             'switchboard-1/sw02965A/sw2965A-ms98-a-0045',
             'switchboard-1/sw03154A/sw3154A-ms98-a-0073',
             'switchboard-1/sw02299A/sw2299A-ms98-a-0005',
             'switchboard-1/sw04572A/sw4572A-ms98-a-0026',
             'switchboard-1/sw02682A/sw2682A-ms98-a-0022',
             'switchboard-1/sw02808A/sw2808A-ms98-a-0014',
             'switchboard-1/sw04526A/sw4526A-ms98-a-0026',
             'switchboard-1/sw03180B/sw3180B-ms98-a-0010',
             'switchboard-1/sw03227A/sw3227A-ms98-a-0029',
             'switchboard-1/sw03891B/sw3891B-ms98-a-0008',
             'switchboard-1/sw03882B/sw3882B-ms98-a-0041',
             'switchboard-1/sw03102B/sw3102B-ms98-a-0027',
             'switchboard-1/sw02454A/sw2454A-ms98-a-0029']}
Feed dict:
  <tf.Tensor 'extern_data/placeholders/batch_dim:0' shape=() dtype=int32>: int(39)
  <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: shape (39, 10208, 1), dtype float32, min/max -1.0/1.0, mean/stddev 0.0014351769/0.11459725, Tensor{'data', [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}
  <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>: shape (39,), dtype int32, min/max 4760/10208, ([ 4760  6246  6372  6861  7296  7499  7534  7622  7824  8031  8295  8431
  8690  8675  8667  8886  9084  9199  9163  9156  9274  9262  9540  9668
  9678  9719  9711  9902  9989 10010 10020 10073 10006 10102 10131 10112
 10130 10178 10208])
  <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>: type <class 'list'>, Tensor{'seq_tag', [B?], dtype='string'}
  <tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>: bool(True)
Save model under output/models/epoch.001.crash_0
Trainer not finalized, quitting. (pid 352402)
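The `Sprint init failed` exception above is raised while handling an `EOFError` in `_start_child`, which suggests the RASR child process closed its pipe before sending anything back. The following is a minimal, hypothetical sketch (not RETURNN code) of how a crashed child can be told apart from a clean exit: on POSIX systems, `subprocess` reports a child killed by a signal as a negative return code, so a segmentation fault shows up as `-11` (`SIGSEGV`). The helper `describe_exit` and the self-crashing demo child are illustrative assumptions standing in for the real RASR process.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a Popen return code into a readable description (POSIX semantics)."""
    if returncode is None:
        return "still running"
    if returncode < 0:
        # A negative code -N means the child was terminated by signal N.
        return "killed by %s" % signal.Signals(-returncode).name
    return "exited with status %d" % returncode


# Hypothetical demo child that crashes itself with SIGSEGV, standing in for a
# segfaulting RASR process. It disables core dumps first so no core file is left.
CRASH_SNIPPET = (
    "import os, resource, signal; "
    "resource.setrlimit(resource.RLIMIT_CORE, (0, 0)); "
    "os.kill(os.getpid(), signal.SIGSEGV)"
)

with subprocess.Popen([sys.executable, "-c", CRASH_SNIPPET], stdout=subprocess.PIPE) as child:
    out = child.stdout.read()  # returns b"" once the child is gone: EOF on the pipe
    child.wait()

print(repr(out), describe_exit(child.returncode))  # b'' killed by SIGSEGV
```

The parent only sees an empty read on the pipe; the exit status is what reveals that the child died from a signal rather than from a protocol error.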

@vieting
Contributor Author

vieting commented Nov 8, 2023

@albertz, see /work/asr4/vieting/tmp/20231108_tf213_sprint_op/run_example.sh if you want to test it yourself.

@Marvin84

Marvin84 commented Nov 8, 2023

@christophmluscher @NeoLegends, could this be related to the RASR build compiled with TF 2.13? Do you recognize this error?

@vieting
Contributor Author

vieting commented Nov 8, 2023

Could it be a problem that RASR was compiled with my old TF 2.8 image? I still use the same RASR binary with the new image. Loading the automata does not require TF, so I assumed I could keep using the same RASR binary.
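One way to check this concern is to look at which TensorFlow shared libraries, if any, the RASR binary resolves to inside the new image. The sketch below is a hypothetical helper, not part of RASR or RETURNN: it runs the standard `ldd` tool and keeps only the dependencies whose name contains `tensorflow`. A `not found` entry, or a path pointing at libraries other than the ones RASR was built against, would be worth ruling out; an empty result means no TensorFlow library is among the binary's dynamic dependencies.

```python
import subprocess


def parse_ldd(ldd_output, pattern="tensorflow"):
    """Return {soname: resolved_path} for ldd lines whose soname contains `pattern`."""
    libs = {}
    for line in ldd_output.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # e.g. the vdso or the dynamic loader line
        soname, _, rest = line.partition("=>")
        soname = soname.strip()
        if pattern in soname:
            # `rest` looks like " /path/to/lib.so (0x...)" or " not found"
            libs[soname] = rest.strip().split(" (")[0]
    return libs


def tf_libs_of(binary):
    """Run ldd on `binary` and keep only the TensorFlow-related libraries."""
    result = subprocess.run(["ldd", binary], capture_output=True, text=True, check=True)
    return parse_ldd(result.stdout)
```

For example, `tf_libs_of("/path/to/rasr-binary")`, where the path is a placeholder for the RASR executable started by the Sprint subprocess. Running it once in the TF 2.8 image and once in the TF 2.13 image makes any difference in the resolved libraries visible.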

albertz added a commit that referenced this issue Nov 8, 2023
@albertz
Member

albertz commented Nov 8, 2023

@vieting I pushed another small change. Can you try again?

albertz added a commit that referenced this issue Nov 8, 2023
Small followup to #1456
albertz added a commit that referenced this issue Nov 8, 2023
Small followup to #1456
@vieting
Contributor Author

vieting commented Nov 8, 2023

> I pushed another small change. Can you try again?

Unfortunately, this still does not fix my example.

Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes
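The traceback above shows where the EOF is detected: `read_pickled_object` first tries to read a 4-byte size header, and `read_bytes_to_new_buffer` raises because the pipe returned no bytes at all. That is what a reader of a length-prefixed stream sees when the writing process dies before replying. Below is a minimal sketch of such a protocol, assuming a little-endian signed 32-bit size header followed by a pickle payload; RETURNN's actual byte layout may differ, the function names are hypothetical, and only the error message is copied from the traceback.

```python
import io
import pickle
import struct


def read_exact(stream, size):
    """Read exactly `size` bytes, raising EOFError if the stream ends early."""
    buf = io.BytesIO()
    read_size = 0
    while read_size < size:
        chunk = stream.read(size - read_size)
        if not chunk:  # writer closed the stream, e.g. the child process died
            raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
        buf.write(chunk)
        read_size += len(chunk)
    return buf.getvalue()


def write_pickled(stream, obj):
    """Write a 4-byte size header (assumed little-endian int32), then the pickle."""
    data = pickle.dumps(obj)
    stream.write(struct.pack("<i", len(data)))
    stream.write(data)


def read_pickled(stream):
    """Read the size header, then exactly that many payload bytes."""
    (size,) = struct.unpack("<i", read_exact(stream, 4))
    return pickle.loads(read_exact(stream, size))


# Round trip on an in-memory stream.
buf = io.BytesIO()
write_pickled(buf, ("automata", 42))
buf.seek(0)
print(read_pickled(buf))  # ('automata', 42)

# A writer that died before sending anything: the very first header read fails.
try:
    read_pickled(io.BytesIO())
except EOFError as exc:
    print(exc)  # expected to read 4 bytes but got EOF after 0 bytes
```

Getting EOF "after 0 bytes" on the header, rather than partway through a payload, points at the child exiting before it wrote any reply, which fits the segmentation fault in RASR named in the issue title.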
RETURNN starting up, version 1.20231108.140626+git.9fe93590, date/time 2023-11-08-15-13-28 (UTC+0100), pid 356353, cwd /work/asr4/vieting/tmp/20231108_tf213_sprint_op, Python /usr/bin/python3
RETURNN command line options: ['returnn.config']
Hostname: cn-283
TensorFlow: 2.13.0 (v2.13.0-rc2-7-g1cb1a030a62) (<not-under-git> in /usr/local/lib/python3.8/dist-packages/tensorflow)
Use num_threads=1 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 2, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2}.
CUDA_VISIBLE_DEVICES is set to '4'.
Collecting TensorFlow device list...
Local devices available to TensorFlow:
  1/2: name: "/device:CPU:0"
       device_type: "CPU"
       memory_limit: 268435456
       locality {
       }
       incarnation: 13595377529408947728
       xla_global_id: -1
  2/2: name: "/device:GPU:0"
       device_type: "GPU"
       memory_limit: 10089005056
       locality {
         bus_id: 2
         numa_node: 1
         links {
         }
       }
       incarnation: 17849739553926303687
       physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:81:00.0, compute capability: 7.5"
       xla_global_id: 416903419
Using gpu device 4: NVIDIA GeForce RTX 2080 Ti
Hostname 'cn-283', GPU 4, GPU-dev-name 'NVIDIA GeForce RTX 2080 Ti', GPU-memory 9.4GB
Train data:
  input: 1 x 1
  output: {'raw': {'dtype': 'string', 'shape': ()}, 'orth': [256, 1], 'data': [1, 2]}
  OggZipDataset, sequences: 249229, frames: unknown
Dev data:
  OggZipDataset, sequences: 300, frames: unknown
Learning-rate-control: file learning_rates.swb.ctc does not exist yet
Setup TF session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
layer /'data': [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)] float32
layer /features/'conv_h_filter': ['conv_h_filter:static:0'(128),'conv_h_filter:static:1'(1),F|F'conv_h_filter:static:2'(150)] float32
layer /features/'conv_h': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_act': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_split': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F'conv_h:channel'(150),F|F'conv_h_split_split_dims1'(1)] float32
DEPRECATION WARNING: Explicitly specify in_spatial_dims when there is more than one spatial dim in the input.
This will be disallowed with behavior_version 8.
layer /features/'conv_l': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel'(150),F|F'conv_l:channel'(5)] float32
layer /features/'conv_l_merge': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
DEPRECATION WARNING: MergeDimsLayer, only keep_order=True is allowed
This will be disallowed with behavior_version 6.
layer /features/'conv_l_act_no_norm': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'conv_l_act': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'features': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'specaug': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'conv_source': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_source_split_dims1'(1)] float32
layer /'conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_1:channel'(32)] float32
layer /'conv_1_pool': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_1:channel'(32)] float32
layer /'conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/32⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_2:channel'(64)] float32
layer /'conv_3': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_3:channel'(64)] float32
layer /'conv_merged': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conv_h:channel*conv_l:channel//2)*conv_3:channel'(24000)] float32
layer /'input_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'input_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_linear_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_1_dropout_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_half_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_1_conv_mod_glu': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conformer_1_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_1_conv_mod_depthwise_conv': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_bn': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
DEPRECATION WARNING: batch_norm masked_time should be specified explicitly
This will be disallowed with behavior_version 12.
layer /'conformer_1_conv_mod_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_relpos_encoding': [T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_1_mhsa_mod_self_attention': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_att_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_linear_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_2_dropout_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_half_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'encoder': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'output:feature-dense'(88)] float32
Network layer topology:
  extern data: data: Tensor{[B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}, seq_tag: Tensor{[B?], dtype='string'}
  used data keys: ['data', 'seq_tag']
  layers:
    layer batch_norm 'conformer_1_conv_mod_bn' #: 512
    layer conv 'conformer_1_conv_mod_depthwise_conv' #: 512
    layer copy 'conformer_1_conv_mod_dropout' #: 512
    layer gating 'conformer_1_conv_mod_glu' #: 512
    layer layer_norm 'conformer_1_conv_mod_ln' #: 512
    layer linear 'conformer_1_conv_mod_pointwise_conv_1' #: 1024
    layer linear 'conformer_1_conv_mod_pointwise_conv_2' #: 512
    layer combine 'conformer_1_conv_mod_res_add' #: 512
    layer activation 'conformer_1_conv_mod_swish' #: 512
    layer copy 'conformer_1_ffmod_1_dropout' #: 512
    layer linear 'conformer_1_ffmod_1_dropout_linear' #: 512
    layer eval 'conformer_1_ffmod_1_half_res_add' #: 512
    layer linear 'conformer_1_ffmod_1_linear_swish' #: 2048
    layer layer_norm 'conformer_1_ffmod_1_ln' #: 512
    layer copy 'conformer_1_ffmod_2_dropout' #: 512
    layer linear 'conformer_1_ffmod_2_dropout_linear' #: 512
    layer eval 'conformer_1_ffmod_2_half_res_add' #: 512
    layer linear 'conformer_1_ffmod_2_linear_swish' #: 2048
    layer layer_norm 'conformer_1_ffmod_2_ln' #: 512
    layer linear 'conformer_1_mhsa_mod_att_linear' #: 512
    layer copy 'conformer_1_mhsa_mod_dropout' #: 512
    layer layer_norm 'conformer_1_mhsa_mod_ln' #: 512
    layer relative_positional_encoding 'conformer_1_mhsa_mod_relpos_encoding' #: 64
    layer combine 'conformer_1_mhsa_mod_res_add' #: 512
    layer self_attention 'conformer_1_mhsa_mod_self_attention' #: 512
    layer layer_norm 'conformer_1_output' #: 512
    layer conv 'conv_1' #: 32
    layer pool 'conv_1_pool' #: 32
    layer conv 'conv_2' #: 64
    layer conv 'conv_3' #: 64
    layer merge_dims 'conv_merged' #: 24000
    layer split_dims 'conv_source' #: 1
    layer source 'data' #: 1
    layer copy 'encoder' #: 512
    layer subnetwork 'features' #: 750
    layer conv 'features/conv_h' #: 150
    layer eval 'features/conv_h_act' #: 150
    layer variable 'features/conv_h_filter' #: 150
    layer split_dims 'features/conv_h_split' #: 1
    layer conv 'features/conv_l' #: 5
    layer layer_norm 'features/conv_l_act' #: 750
    layer eval 'features/conv_l_act_no_norm' #: 750
    layer merge_dims 'features/conv_l_merge' #: 750
    layer copy 'features/output' #: 750
    layer copy 'input_dropout' #: 512
    layer linear 'input_linear' #: 512
    layer softmax 'output' #: 88
    layer eval 'specaug' #: 750
net params #: 18473980
net trainable params: [<tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 
'conformer_1_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_1_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conv_1/W:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'conv_1/bias:0' shape=(32,) dtype=float32>, <tf.Variable 'conv_2/W:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'conv_2/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'conv_3/W:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'conv_3/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'features/conv_h_filter/conv_h_filter:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'features/conv_l/W:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'features/conv_l_act/bias:0' shape=(750,) dtype=float32>, <tf.Variable 'features/conv_l_act/scale:0' shape=(750,) dtype=float32>, <tf.Variable 'input_linear/W:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'output/W:0' shape=(512, 88) dtype=float32>, <tf.Variable 'output/b:0' shape=(88,) dtype=float32>]
start training at epoch 1
using batch size: {'classes': 5000, 'data': 400000}, max seqs: 128
learning rate control: NewbobMultiEpoch(num_epochs=6, update_interval=1, relative_error_threshold=-0.01, relative_error_grow_threshold=-0.01), epoch data: 1: EpochData(learningRate=1.325e-05, error={}), 2: EpochData(learningRate=1.539861111111111e-05, error={}), 3: EpochData(learningRate=1.754722222222222e-05, error={}), ..., 360: EpochData(learningRate=1.4333333333333375e-05, error={}), 361: EpochData(learningRate=1.2166666666666727e-05, error={}), 362: EpochData(learningRate=1e-05, error={}), error key: None
pretrain: None
start epoch 1 with learning rate 1.325e-05 ...
TF: log_dir: output/models/train-2023-11-08-14-13-28
Create optimizer <class 'returnn.tf.updater.NadamOptimizer'> with options {'epsilon': 1e-08, 'learning_rate': <tf.Variable 'learning_rate:0' shape=() dtype=float32>}.
Initialize optimizer (default) with slots ['m', 'v'].
These additional variable were created by the optimizer: [<tf.Variable 'optimize/gradients/conformer_1_conv_mod_bn/batch_norm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_bn/batch_norm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_depthwise_conv/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_depthwise_conv/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_1/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(1024,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_2/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_dropout_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_dropout_linear/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 
'optimize/gradients/conformer_1_ffmod_1_linear_swish/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_linear_swish/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_dropout_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_dropout_linear/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_linear_swish/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_linear_swish/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_att_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 
'optimize/gradients/conformer_1_mhsa_mod_relpos_encoding/Gather_grad/Reshape_accum_grad/var_accum_grad:0' shape=(65, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_self_attention/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_output/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_output/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_h/convolution/ExpandDims_1_grad/Reshape_accum_grad/var_accum_grad:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l/convolution_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 
'optimize/gradients/input_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 88) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(88,) dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta1_power:0' shape=() dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta2_power:0' shape=() dtype=float32>].
SprintSubprocessInstance: exec ['/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:37,p2c_fd:38,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', 
'--*.model-combination.acoustic-model.hmm.early-recombination=no', '--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=no', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']
SprintSubprocessInstance: starting, pid 356974
SprintSubprocessInstance: Sprint child process (['/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:37,p2c_fd:38,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', 
'--*.model-combination.acoustic-model.hmm.early-recombination=no', '--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=no', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']) caused an exception.
TensorFlow exception: Graph execution error:

Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "./returnn/rnn.py", line 11, in <module>
      main()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
      execute_main_task()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
      engine.init_train_from_config(config, train_data, dev_data, eval_data)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
      self.init_network_from_config(config)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
      self._init_network(net_desc=net_dict, epoch=self.epoch)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
      self.network, self.updater = self.create_network(
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
      updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
      self.loss = network.get_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
      self.maybe_construct_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
      self._construct_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
      losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
      if loss_obj.get_loss_value_for_objective() is not None:
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
      self._prepare()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
      self._loss_value = self.loss.get_value()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
      fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
      edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
      edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
  (0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


	 [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
	 [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
  (1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


	 [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
  File "./returnn/rnn.py", line 11, in <module>
    main()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
    execute_main_task()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
    engine.init_train_from_config(config, train_data, dev_data, eval_data)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
    self.init_network_from_config(config)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
    self._init_network(net_desc=net_dict, epoch=self.epoch)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
    self.network, self.updater = self.create_network(
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
    updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
    self.loss = network.get_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
    self.maybe_construct_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
    self._construct_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
    losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
    if loss_obj.get_loss_value_for_objective() is not None:
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    self._prepare()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
    self._loss_value = self.loss.get_value()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
    fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
    edges, weights, start_end_states = tf_compat.v1.py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py", line 371, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 678, in py_func
    return py_func_common(func, inp, Tout, stateful, name=name)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 653, in py_func_common
    return _internal_py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 378, in _internal_py_func
    result = gen_script_ops.py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal
    ret = Operation.from_node_def(

Exception UnknownError() in step 0. (pid 356353)
Failing op: <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
We tried to fetch the op inputs ([<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>]) but got another exception:
target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
EXCEPTION
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1379, in BaseSession._do_call
    line: return fn(*args)
    locals:
      fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7f4267b80c10>
      args = <local> ({<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44f80b9630>: array([[[-0.05505638],
                             [-0.09610788],
                             [-0.05115783],
                             ...,
                             [ 0.        ],
                             [ 0.        ],
                             [ 0.        ]],
                     
                            [[-0.00226238],
                             [-0.01049833],
                             [-0.00...
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1362, in BaseSession._do_run.<locals>._run_fn
    line: return self._call_tf_sessionrun(options, feed_dict, fetch_list,
                                          target_list, run_metadata)
    locals:
      self = <local> <tensorflow.python.client.session.Session object at 0x7f46458c3d60>
      self._call_tf_sessionrun = <local> <bound method BaseSession._call_tf_sessionrun of <tensorflow.python.client.session.Session object at 0x7f46458c3d60>>
      options = <local> None
      feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44f80b9630>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],
                          
                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44f2b68ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44f2b688b0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44ef901eb0>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f44eaac5d70>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f44eaac5db0>]
      run_metadata = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1455, in BaseSession._call_tf_sessionrun
    line: return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
                                                  fetch_list, target_list,
                                                  run_metadata)
    locals:
      tf_session = <global> <module 'tensorflow.python.client.pywrap_tf_session' from '/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/pywrap_tf_session.py'>
      tf_session.TF_SessionRun_wrapper = <global> <built-in method TF_SessionRun_wrapper of PyCapsule object at 0x7f46444243f0>
      self = <local> <tensorflow.python.client.session.Session object at 0x7f46458c3d60>
      self._session = <local> <tensorflow.python.client._pywrap_tf_session.TF_Session object at 0x7f44f83404f0>
      options = <local> None
      feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44f80b9630>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],
                          
                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44f2b68ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44f2b688b0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44ef901eb0>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f44eaac5d70>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f44eaac5db0>]
      run_metadata = <local> None
UnknownError: 2 root error(s) found.
  (0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


	 [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
	 [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
  (1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


	 [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.


During handling of the above exception, another exception occurred:

EXCEPTION
Traceback (most recent call last):
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 744, in Runner.run
    line: fetches_results = sess.run(
              fetches_dict, feed_dict=feed_dict, options=run_options
          )  # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
    locals:
      fetches_results = <not found>
      sess = <local> <tensorflow.python.client.session.Session object at 0x7f46458c3d60>
      sess.run = <local> <bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7f46458c3d60>>
      fetches_dict = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
      feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],
                          
                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      options = <not found>
      run_options = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 969, in BaseSession.run
    line: result = self._run(None, fetches, feed_dict, options_ptr,
                             run_metadata_ptr)
    locals:
      result = <not found>
      self = <local> <tensorflow.python.client.session.Session object at 0x7f46458c3d60>
      self._run = <local> <bound method BaseSession._run of <tensorflow.python.client.session.Session object at 0x7f46458c3d60>>
      fetches = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
      feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],
                          
                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      options_ptr = <local> None
      run_metadata_ptr = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1192, in BaseSession._run
    line: results = self._do_run(handle, final_targets, final_fetches,
                                 feed_dict_tensor, options, run_metadata)
    locals:
      results = <not found>
      self = <local> <tensorflow.python.client.session.Session object at 0x7f46458c3d60>
      self._do_run = <local> <bound method BaseSession._do_run of <tensorflow.python.client.session.Session object at 0x7f46458c3d60>>
      handle = <local> None
      final_targets = <local> [<tf.Operation 'conformer_1_conv_mod_bn/batch_norm/cond/Merge_1' type=Merge>, <tf.Operation 'optim_and_step_incr' type=NoOp>]
      final_fetches = <local> [<tf.Tensor 'objective/add:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss_init/truediv:0' shape=() dtype=float32>, <tf.Tensor 'globals/mem_usage_deviceGPU0:0' shape=() dtype=in...
      feed_dict_tensor = <local> {<Reference wrapping <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>>: array([[[-0.05505638],
                                         [-0.09610788],
                                         [-0.05115783],
                                         ...,
                                         [ 0.        ],
                                         [ 0.        ],
                                         [ 0.        ]],
                                 
                                        [[-0.00226238],
                                         [-0.01049...
      options = <local> None
      run_metadata = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1372, in BaseSession._do_run
    line: return self._do_call(_run_fn, feeds, fetches, targets, options,
                               run_metadata)
    locals:
      self = <local> <tensorflow.python.client.session.Session object at 0x7f46458c3d60>
      self._do_call = <local> <bound method BaseSession._do_call of <tensorflow.python.client.session.Session object at 0x7f46458c3d60>>
      _run_fn �[34;1m=�[0m �[34m<local>�[0m �[34m<�[0mfunction BaseSession�[34m.�[0m_do_run�[34m.�[0m�[34m<�[0mlocals�[34m>�[0m�[34m.�[0m_run_fn at 0x7f4267b80c10�[34m>�[0m
      feeds �[34;1m=�[0m �[34m<local>�[0m �[34m{�[0m�[34m<�[0mtensorflow�[34m.�[0mpython�[34m.�[0mclient�[34m.�[0m_pywrap_tf_session�[34m.�[0mTF_Output object at 0x7f44f80b9630�[34m>�[0m�[34m:�[0m array�[34m(�[0m�[34m[�[0m�[34m[�[0m�[34m[�[0m�[34m-�[0m0�[34m.�[0m05505638�[34m]�[0m�[34m,�[0m
                              �[34m[�[0m�[34m-�[0m0�[34m.�[0m09610788�[34m]�[0m�[34m,�[0m
                              �[34m[�[0m�[34m-�[0m0�[34m.�[0m05115783�[34m]�[0m�[34m,�[0m
                              �[34m.�[0m�[34m.�[0m�[34m.�[0m�[34m,�[0m
                              �[34m[�[0m 0�[34m.�[0m        �[34m]�[0m�[34m,�[0m
                              �[34m[�[0m 0�[34m.�[0m        �[34m]�[0m�[34m,�[0m
                              �[34m[�[0m 0�[34m.�[0m        �[34m]�[0m�[34m]�[0m�[34m,�[0m
                      
                             �[34m[�[0m�[34m[�[0m�[34m-�[0m0�[34m.�[0m00226238�[34m]�[0m�[34m,�[0m
                              �[34m[�[0m�[34m-�[0m0�[34m.�[0m01049833�[34m]�[0m�[34m,�[0m
                              �[34m[�[0m�[34m-�[0m0�[34m.�[0m001...
      fetches �[34;1m=�[0m �[34m<local>�[0m �[34m[�[0m�[34m<�[0mtensorflow�[34m.�[0mpython�[34m.�[0mclient�[34m.�[0m_pywrap_tf_session�[34m.�[0mTF_Output object at 0x7f44f2b68ef0�[34m>�[0m�[34m,�[0m �[34m<�[0mtensorflow�[34m.�[0mpython�[34m.�[0mclient�[34m.�[0m_pywrap_tf_session�[34m.�[0mTF_Output object at 0x7f44f2b688b0�[34m>�[0m�[34m,�[0m �[34m<�[0mtensorflow�[34m.�[0mpython�[34m.�[0mclient�[34m.�[0m_pywrap_tf_session�[34m.�[0mTF_Output object at 0x7f44ef901eb0�[34m>�[0m�[34m,�[0m �[34m<�[0mtensorflow�[34m.�[0mpython�[34m.�[0mclient�[34m.�[0m_pywrap_tf_session�[34m.�[0mTF_Ou...
      targets �[34;1m=�[0m �[34m<local>�[0m �[34m[�[0m�[34m<�[0mtensorflow�[34m.�[0mpython�[34m.�[0mclient�[34m.�[0m_pywrap_tf_session�[34m.�[0mTF_Operation object at 0x7f44eaac5d70�[34m>�[0m�[34m,�[0m �[34m<�[0mtensorflow�[34m.�[0mpython�[34m.�[0mclient�[34m.�[0m_pywrap_tf_session�[34m.�[0mTF_Operation object at 0x7f44eaac5db0�[34m>�[0m�[34m]�[0m
      options �[34;1m=�[0m �[34m<local>�[0m �[34mNone�[0m
      run_metadata �[34;1m=�[0m �[34m<local>�[0m �[34mNone�[0m
  �[34;1mFile�[0m �[36m"/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/�[0m�[36;1msession.py�[0m�[36m"�[0m, �[34mline�[0m �[35m1398�[0m, �[34min�[0m BaseSession._do_call
    �[34mline:�[0m �[34mraise�[0m type�[34m(�[0me�[34m)�[0m�[34m(�[0mnode_def�[34m,�[0m op�[34m,�[0m message�[34m)�[0m  �[37m# pylint: disable=no-value-for-parameter�[0m
    �[34mlocals:�[0m
      type �[34;1m=�[0m �[34m<builtin>�[0m �[34m<�[0m�[34mclass�[0m �[36m'type'�[0m�[34m>�[0m
      e �[34;1m=�[0m �[34m<not found>�[0m
      node_def �[34;1m=�[0m �[34m<local>�[0m name�[34m:�[0m �[36m"objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch"�[0m
                         op�[34m:�[0m �[36m"PyFunc"�[0m
                         input�[34m:�[0m �[36m"extern_data/placeholders/seq_tag/seq_tag"�[0m
                         attr �[34m{�[0m
                           key�[34m:�[0m �[36m"token"�[0m
                           value �[34m{�[0m
                             s�[34m:�[0m �[36m"pyfunc_0"�[0m
                           �[34m}�[0m
                         �[34m}�[0m
                         attr �[34m{�[0m
                           key�[34m:�[0m �[36m"Tout"�[0m
                           value �[34m{�[0m
                             list �[34m{�[0m
                               type�[34m:�[0m DT_INT32
                               type�[34m:�[0m DT_FLOAT
                               type�[34m:�[0m DT_INT...
      op �[34;1m=�[0m �[34m<local>�[0m �[34m<�[0mtf�[34m.�[0mOperation �[36m'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'�[0m type�[34m=�[0mPyFunc�[34m>�[0m
      message �[34;1m=�[0m �[34m<local>�[0m �[36m'Graph execution error:\n\nDetected at node \'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch\' defined at (most recent call last):\n    File "./returnn/rnn.py", line 11, in <module>\n      main()\n    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__mai�[0m..., len �[34m=�[0m 12234
�[31mUnknownError�[0m: Graph execution error:

Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "./returnn/rnn.py", line 11, in <module>
      main()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
      execute_main_task()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
      engine.init_train_from_config(config, train_data, dev_data, eval_data)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
      self.init_network_from_config(config)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
      self._init_network(net_desc=net_dict, epoch=self.epoch)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
      self.network, self.updater = self.create_network(
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
      updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
      self.loss = network.get_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
      self.maybe_construct_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
      self._construct_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
      losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
      if loss_obj.get_loss_value_for_objective() is not None:
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
      self._prepare()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
      self._loss_value = self.loss.get_value()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
      fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
      edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
      edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
  (0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


	 [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
	 [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
  (1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


	 [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
  File "./returnn/rnn.py", line 11, in <module>
    main()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
    execute_main_task()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
    engine.init_train_from_config(config, train_data, dev_data, eval_data)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
    self.init_network_from_config(config)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
    self._init_network(net_desc=net_dict, epoch=self.epoch)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
    self.network, self.updater = self.create_network(
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
    updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
    self.loss = network.get_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
    self.maybe_construct_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
    self._construct_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
    losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
    if loss_obj.get_loss_value_for_objective() is not None:
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    self._prepare()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
    self._loss_value = self.loss.get_value()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
    fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
    edges, weights, start_end_states = tf_compat.v1.py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py", line 371, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 678, in py_func
    return py_func_common(func, inp, Tout, stateful, name=name)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 653, in py_func_common
    return _internal_py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 378, in _internal_py_func
    result = gen_script_ops.py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal
    ret = Operation.from_node_def(



During handling of the above exception, another exception occurred:

EXCEPTION
Traceback (most recent call last):
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4341, in help_on_tf_exception
    line: debug_fetch, fetch_helpers, op_copied = FetchHelper.copy_graph(
              debug_fetch,
              target_op=op,
              fetch_helper_tensors=list(op.inputs),
              stop_at_ts=stop_at_ts,
              verbose_stream=file,
          )
    locals:
      debug_fetch = <local> <tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>
      fetch_helpers = <not found>
      op_copied = <not found>
      FetchHelper = <local> <class 'returnn.tf.util.basic.FetchHelper'>
      FetchHelper.copy_graph = <local> <bound method FetchHelper.copy_graph of <class 'returnn.tf.util.basic.FetchHelper'>>
      target_op = <not found>
      op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      fetch_helper_tensors = <not found>
      list = <builtin> <class 'list'>
      op.inputs = <local> (<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>,)
      stop_at_ts = <local> [<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>, <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>, <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, <tf.Tensor 'extern_data/placeholders/batch_dim:...
      verbose_stream = <not found>
      file = <local> <returnn.log.Stream object at 0x7f4646730e50>
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/util/basic.py", line 7700, in FetchHelper.copy_graph
    line: assert target_op in ops, "target_op %r,\nops\n%s" % (target_op, pformat(ops))
    locals:
      target_op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      ops = <local> [<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
      pformat = <local> <function pformat at 0x7f464aa7ec10>
AssertionError: target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]

Step meta information:
{'seq_idx': [0,
             1,
             2,
             3,
             4,
             5,
             6,
             7,
             8,
             9,
             10,
             11,
             12,
             13,
             14,
             15,
             16,
             17,
             18,
             19,
             20,
             21,
             22,
             23,
             24,
             25,
             26,
             27,
             28,
             29,
             30,
             31,
             32,
             33,
             34,
             35,
             36,
             37,
             38],
 'seq_tag': ['switchboard-1/sw02721B/sw2721B-ms98-a-0031',
             'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
             'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
             'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
             'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
             'switchboard-1/sw02145A/sw2145A-ms98-a-0107',
             'switchboard-1/sw02484A/sw2484A-ms98-a-0077',
             'switchboard-1/sw02768A/sw2768A-ms98-a-0064',
             'switchboard-1/sw03312B/sw3312B-ms98-a-0041',
             'switchboard-1/sw02344B/sw2344B-ms98-a-0023',
             'switchboard-1/sw04248B/sw4248B-ms98-a-0017',
             'switchboard-1/sw02762A/sw2762A-ms98-a-0059',
             'switchboard-1/sw03146A/sw3146A-ms98-a-0047',
             'switchboard-1/sw03032A/sw3032A-ms98-a-0065',
             'switchboard-1/sw02288A/sw2288A-ms98-a-0080',
             'switchboard-1/sw02751A/sw2751A-ms98-a-0066',
             'switchboard-1/sw02369A/sw2369A-ms98-a-0118',
             'switchboard-1/sw04169A/sw4169A-ms98-a-0059',
             'switchboard-1/sw02227A/sw2227A-ms98-a-0016',
             'switchboard-1/sw02061B/sw2061B-ms98-a-0170',
             'switchboard-1/sw02862B/sw2862B-ms98-a-0033',
             'switchboard-1/sw03116B/sw3116B-ms98-a-0065',
             'switchboard-1/sw03517B/sw3517B-ms98-a-0038',
             'switchboard-1/sw02360B/sw2360B-ms98-a-0086',
             'switchboard-1/sw02510B/sw2510B-ms98-a-0061',
             'switchboard-1/sw03919A/sw3919A-ms98-a-0017',
             'switchboard-1/sw02965A/sw2965A-ms98-a-0045',
             'switchboard-1/sw03154A/sw3154A-ms98-a-0073',
             'switchboard-1/sw02299A/sw2299A-ms98-a-0005',
             'switchboard-1/sw04572A/sw4572A-ms98-a-0026',
             'switchboard-1/sw02682A/sw2682A-ms98-a-0022',
             'switchboard-1/sw02808A/sw2808A-ms98-a-0014',
             'switchboard-1/sw04526A/sw4526A-ms98-a-0026',
             'switchboard-1/sw03180B/sw3180B-ms98-a-0010',
             'switchboard-1/sw03227A/sw3227A-ms98-a-0029',
             'switchboard-1/sw03891B/sw3891B-ms98-a-0008',
             'switchboard-1/sw03882B/sw3882B-ms98-a-0041',
             'switchboard-1/sw03102B/sw3102B-ms98-a-0027',
             'switchboard-1/sw02454A/sw2454A-ms98-a-0029']}
Feed dict:
  <tf.Tensor 'extern_data/placeholders/batch_dim:0' shape=() dtype=int32>: int(39)
  <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: shape (39, 10208, 1), dtype float32, min/max -1.0/1.0, mean/stddev 0.0014351769/0.11459725, Tensor{'data', [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}
  <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>: shape (39,), dtype int32, min/max 4760/10208, ([ 4760  6246  6372  6861  7296  7499  7534  7622  7824  8031  8295  8431
  8690  8675  8667  8886  9084  9199  9163  9156  9274  9262  9540  9668
  9678  9719  9711  9902  9989 10010 10020 10073 10006 10102 10131 10112
 10130 10178 10208])
  <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>: type <class 'list'>, Tensor{'seq_tag', [B?], dtype='string'}
  <tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>: bool(True)
Save model under output/models/epoch.001.crash_0
Trainer not finalized, quitting. (pid 356353)
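The `EOFError: expected to read 4 bytes but got EOF after 0 bytes` above is raised while RETURNN reads the first reply from the RASR (Sprint) child process. A minimal sketch of that length-prefixed read, not RETURNN's actual code: it assumes the framing is a 4-byte little-endian length prefix followed by a pickle payload, and `write_pickled_object` is a hypothetical counterpart added only for the demo. It shows why "0 bytes" means the child closed its end of the pipe before sending anything, e.g. because it crashed during init.

```python
import io
import pickle
import struct
from typing import Any, BinaryIO


def read_bytes_to_new_buffer(p: BinaryIO, size: int) -> io.BytesIO:
    """Read exactly `size` bytes from `p`; raise EOFError if the stream ends early."""
    buf = io.BytesIO()
    read_size = 0
    while read_size < size:
        chunk = p.read(size - read_size)
        if not chunk:  # empty read: the writer has closed its end of the stream
            raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
        buf.write(chunk)
        read_size += len(chunk)
    return buf


def read_pickled_object(p: BinaryIO) -> Any:
    # Assumed framing: 4-byte little-endian signed length, then the pickled object.
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
    (size,) = struct.unpack("<i", size_raw)
    return pickle.loads(read_bytes_to_new_buffer(p, size).getvalue())


def write_pickled_object(p: BinaryIO, obj: Any) -> None:
    # Hypothetical writer side, matching the framing assumed above.
    payload = pickle.dumps(obj)
    p.write(struct.pack("<i", len(payload)))
    p.write(payload)
```

If the child never writes its first message, the very first 4-byte read already hits EOF, which is exactly the "after 0 bytes" case in the log.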

@vieting vieting reopened this Nov 8, 2023
@vieting
Contributor Author

vieting commented Nov 8, 2023

I get the same error when using a tf 2.14 image and RASR compiled using that image.

@albertz
Member

albertz commented Nov 8, 2023

Is that the original stdout + stderr, or just the log?

It looks a bit like RASR maybe does not start correctly at all? If it does start, you should e.g. see this on stdout:

print("RETURNN SprintControl[pid %i] Python module load" % os.getpid())

And then:

    print(
        (
            "RETURNN SprintControl[pid %i] init: "
            "name=%r, sprint_unit=%r, version_number=%r, callback=%r, ref=%r, config=%r, kwargs=%r"
        )
        % (os.getpid(), name, sprint_unit, version_number, callback, reference, config, kwargs)
    )

If you don't see that, then my recent fixes and also Tina's patch are not really related to your issue at all.

You should check the RASR log then. There should be some error from RASR, probably Python-related, maybe something like it could not load the module. Maybe some import is missing.
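A minimal sketch for doing this check on a saved stdout dump, assuming the output was written to a text file. The regexes are derived from the two `print()` format strings quoted above; the helper names `sprint_control_markers` and `diagnose` are hypothetical, not part of RETURNN.

```python
import re
from typing import Dict, List

# Derived from "RETURNN SprintControl[pid %i] Python module load"
# and "RETURNN SprintControl[pid %i] init: name=%r, ...".
_MODULE_LOAD = re.compile(r"RETURNN SprintControl\[pid (\d+)\] Python module load")
_INIT = re.compile(r"RETURNN SprintControl\[pid (\d+)\] init: ")


def sprint_control_markers(lines: List[str]) -> Dict[str, List[int]]:
    """Collect the PIDs for which the module-load and init messages were printed."""
    found: Dict[str, List[int]] = {"module_load": [], "init": []}
    for line in lines:
        m = _MODULE_LOAD.search(line)
        if m:
            found["module_load"].append(int(m.group(1)))
        m = _INIT.search(line)
        if m:
            found["init"].append(int(m.group(1)))
    return found


def diagnose(lines: List[str]) -> str:
    """Rough interpretation following the reasoning in the comment above."""
    found = sprint_control_markers(lines)
    if not found["module_load"]:
        return "SprintControl never loaded: check the RASR log for a (probably Python-related) startup error"
    if not found["init"]:
        return "module loaded but init not reached: check the RASR log"
    return "SprintControl started and initialized"
```

Usage: read the dump with `open(path, errors="replace").readlines()` and pass the lines to `diagnose`. If neither marker appears, the problem is on the RASR side before the Python control module is even loaded.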

@vieting
Contributor Author

vieting commented Nov 8, 2023

What I posted before was from the log. The following is copied from stdout and stderr (with the tf 2.14 image, which was also used to compile RASR):

vieting@cn-251:/work/asr4/vieting/tmp/20231108_tf213_sprint_op$ ./run_example_rasr_tf214.sh
RETURNN starting up, version 1.20231108.140626+git.9fe93590.dirty, date/time 2023-11-08-16-43-54 (UTC+0100), pid 2130233, cwd /work/asr4/vieting/tmp/20231108_tf213_sprint_op, Python /usr/bin/python3
RETURNN command line options: ['returnn.tf214.config']
Hostname: cn-251
2023-11-08 16:44:01.024863: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-08 16:44:01.024944: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-08 16:44:01.034051: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-08 16:44:02.271356: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
TensorFlow: 2.14.0 (v2.14.0-rc1-21-g4dacf3f368e) (<not-under-git> in /usr/local/lib/python3.11/dist-packages/tensorflow)
Use num_threads=1 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 2, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2}.
CUDA_VISIBLE_DEVICES is set to '2'.
Collecting TensorFlow device list...
2023-11-08 16:44:23.424846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /device:GPU:0 with 10396 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:81:00.0, compute capability: 6.1
Local devices available to TensorFlow:
  1/2: name: "/device:CPU:0"
       device_type: "CPU"
       memory_limit: 268435456
       locality {
       }
       incarnation: 11581945563073303627
       xla_global_id: -1
  2/2: name: "/device:GPU:0"
       device_type: "GPU"
       memory_limit: 10901061632
       locality {
         bus_id: 2
         numa_node: 1
         links {
         }
       }
       incarnation: 1815047742352363074
       physical_device_desc: "device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:81:00.0, compute capability: 6.1"
       xla_global_id: 416903419
Using gpu device 2: NVIDIA GeForce GTX 1080 Ti
Hostname 'cn-251', GPU 2, GPU-dev-name 'NVIDIA GeForce GTX 1080 Ti', GPU-memory 10.2GB
Train data:
  input: 1 x 1
  output: {'raw': {'dtype': 'string', 'shape': ()}, 'orth': [256, 1], 'data': [1, 2]}
  OggZipDataset, sequences: 249229, frames: unknown
Dev data:
  OggZipDataset, sequences: 300, frames: unknown
Learning-rate-control: file learning_rates.swb.ctc does not exist yet
Setup TF session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
2023-11-08 16:44:31.951062: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10396 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:81:00.0, compute capability: 6.1
layer /'data': [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)] float32
layer /features/'conv_h_filter': ['conv_h_filter:static:0'(128),'conv_h_filter:static:1'(1),F|F'conv_h_filter:static:2'(150)] float32
layer /features/'conv_h': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_act': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_split': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F'conv_h:channel'(150),F|F'conv_h_split_split_dims1'(1)] float32
DEPRECATION WARNING: Explicitly specify in_spatial_dims when there is more than one spatial dim in the input.
This will be disallowed with behavior_version 8.
layer /features/'conv_l': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel'(150),F|F'conv_l:channel'(5)] float32
layer /features/'conv_l_merge': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
DEPRECATION WARNING: MergeDimsLayer, only keep_order=True is allowed
This will be disallowed with behavior_version 6.
layer /features/'conv_l_act_no_norm': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'conv_l_act': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'features': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'specaug': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'conv_source': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_source_split_dims1'(1)] float32
layer /'conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_1:channel'(32)] float32
WARNING:tensorflow:From /work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/util/basic.py:1723: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
layer /'conv_1_pool': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_1:channel'(32)] float32
layer /'conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/32⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_2:channel'(64)] float32
layer /'conv_3': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_3:channel'(64)] float32
layer /'conv_merged': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conv_h:channel*conv_l:channel//2)*conv_3:channel'(24000)] float32
layer /'input_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'encoder': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
2023-11-08 16:44:32.241797: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10396 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:81:00.0, compute capability: 6.1
layer /'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'output:feature-dense'(88)] float32
WARNING:tensorflow:From /work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py:54: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.

Network layer topology:
  extern data: data: Tensor{[B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}, seq_tag: Tensor{[B?], dtype='string'}
  used data keys: ['data', 'seq_tag']
  layers:
    layer conv 'conv_1' #: 32
    layer pool 'conv_1_pool' #: 32
    layer conv 'conv_2' #: 64
    layer conv 'conv_3' #: 64
    layer merge_dims 'conv_merged' #: 24000
    layer split_dims 'conv_source' #: 1
    layer source 'data' #: 1
    layer copy 'encoder' #: 512
    layer subnetwork 'features' #: 750
    layer conv 'features/conv_h' #: 150
    layer eval 'features/conv_h_act' #: 150
    layer variable 'features/conv_h_filter' #: 150
    layer split_dims 'features/conv_h_split' #: 1
    layer conv 'features/conv_l' #: 5
    layer layer_norm 'features/conv_l_act' #: 750
    layer eval 'features/conv_l_act_no_norm' #: 750
    layer merge_dims 'features/conv_l_merge' #: 750
    layer copy 'features/output' #: 750
    layer linear 'input_linear' #: 512
    layer softmax 'output' #: 88
    layer copy 'specaug' #: 750
net params #: 12409788
net trainable params: [<tf.Variable 'conv_1/W:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'conv_1/bias:0' shape=(32,) dtype=float32>, <tf.Variable 'conv_2/W:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'conv_2/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'conv_3/W:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'conv_3/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'features/conv_h_filter/conv_h_filter:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'features/conv_l/W:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'features/conv_l_act/bias:0' shape=(750,) dtype=float32>, <tf.Variable 'features/conv_l_act/scale:0' shape=(750,) dtype=float32>, <tf.Variable 'input_linear/W:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'output/W:0' shape=(512, 88) dtype=float32>, <tf.Variable 'output/b:0' shape=(88,) dtype=float32>]
2023-11-08 16:44:34.658621: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:382] MLIR V1 optimization pass is not enabled
start training at epoch 1
using batch size: {'classes': 5000, 'data': 400000}, max seqs: 128
learning rate control: NewbobMultiEpoch(num_epochs=6, update_interval=1, relative_error_threshold=-0.01, relative_error_grow_threshold=-0.01), epoch data: 1: EpochData(learningRate=1.325e-05, error={}), 2: EpochData(learningRate=1.539861111111111e-05, error={}), 3: EpochData(learningRate=1.754722222222222e-05, error={}), ..., 360: EpochData(learningRate=1.4333333333333375e-05, error={}), 361: EpochData(learningRate=1.2166666666666727e-05, error={}), 362: EpochData(learningRate=1e-05, error={}), error key: None
pretrain: None
start epoch 1 with learning rate 1.325e-05 ...
TF: log_dir: output/models/train-2023-11-08-15-43-53
Create optimizer <class 'returnn.tf.updater.NadamOptimizer'> with options {'epsilon': 1e-08, 'learning_rate': <tf.Variable 'learning_rate:0' shape=() dtype=float32>}.
Initialize optimizer (default) with slots ['m', 'v'].
These additional variable were created by the optimizer: [<tf.Variable 'optimize/gradients/conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_h/convolution/ExpandDims_1_grad/Reshape_accum_grad/var_accum_grad:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l/convolution_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 'optimize/gradients/input_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 88) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(88,) dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta1_power:0' shape=() dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta2_power:0' 
shape=() dtype=float32>].
2023-11-08 16:44:39.517531: W tensorflow/c/c_api.cc:305] Operation '{name:'global_step' id:357 op device:{requested: '/device:CPU:0', assigned: ''} def:{{{node global_step}} = VarHandleOp[_class=["loc:@global_step"], _has_manual_control_dependencies=true, allowed_devices=[], container="", dtype=DT_INT64, shape=[], shared_name="global_step", _device="/device:CPU:0"]()}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
SprintSubprocessInstance: exec ['/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:35,p2c_fd:36,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', '--*.model-combination.acoustic-model.hmm.early-recombination=no', 
'--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=yes', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']
SprintSubprocessInstance: starting, pid 2130824
/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard: Relink `/usr/local/lib/python3.11/dist-packages/tensorflow/libtensorflow_framework.so.2' with `/lib/x86_64-linux-gnu/libz.so.1' for IFUNC symbol `crc32_z'
2023-11-08 16:44:43.478818: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-08 16:44:43.478967: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-08 16:44:43.479063: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
configuration error: failed to open file "neural-network-trainer.config" for reading. (No such file or directory)
RETURNN SprintControl[pid 2130824] Python module load
RETURNN SprintControl[pid 2130824] init: name='Sprint.PythonControl', sprint_unit='NnTrainer.pythonControl', version_number=5, callback=<built-in method callback of PyCapsule object at 0x7f58637e2e80>, ref=<capsule object "Sprint.PythonControl.Internal" at 0x7f58637e2e80>, config={'c2p_fd': '35', 'p2c_fd': '36', 'minPythonControlVersion': '4'}, kwargs={}
RETURNN SprintControl[pid 2130824] PythonControl create {'c2p_fd': 35, 'p2c_fd': 36, 'name': 'Sprint.PythonControl', 'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7f58637e2e80>, 'config': {'c2p_fd': '35', 'p2c_fd': '36', 'minPythonControlVersion': '4'}, 'sprint_unit': 'NnTrainer.pythonControl', 'version_number': 5, 'min_version_number': 4, 'callback': <built-in method callback of PyCapsule object at 0x7f58637e2e80>}
RETURNN SprintControl[pid 2130824] PythonControl init {'name': 'Sprint.PythonControl', 'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7f58637e2e80>, 'config': {'c2p_fd': '35', 'p2c_fd': '36', 'minPythonControlVersion': '4'}, 'sprint_unit': 'NnTrainer.pythonControl', 'version_number': 5, 'min_version_number': 4, 'callback': <built-in method callback of PyCapsule object at 0x7f58637e2e80>}
RETURNN SprintControl[pid 2130824] init for Sprint.PythonControl {'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7f58637e2e80>, 'config': {'c2p_fd': '35', 'p2c_fd': '36', 'minPythonControlVersion': '4'}}
RETURNN SprintControl[pid 2130824] PythonControl run_control_loop: <built-in method callback of PyCapsule object at 0x7f58637e2e80>, {}
RETURNN SprintControl[pid 2130824] PythonControl run_control_loop control: '<version>RWTH ASR 0.9beta (431c74d54b895a2a4c3689bcd5bf641a878bb925)\n</version>'
SprintSubprocessInstance: exec ['/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:36,p2c_fd:38,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', '--*.model-combination.acoustic-model.hmm.early-recombination=no', 
'--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=yes', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']
SprintSubprocessInstance: starting, pid 2130845
/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard: Relink `/usr/local/lib/python3.11/dist-packages/tensorflow/libtensorflow_framework.so.2' with `/lib/x86_64-linux-gnu/libz.so.1' for IFUNC symbol `crc32_z'
2023-11-08 16:44:44.788087: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-08 16:44:44.788217: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-08 16:44:44.788276: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
configuration error: failed to open file "neural-network-trainer.config" for reading. (No such file or directory)
RETURNN SprintControl[pid 2130845] Python module load
RETURNN SprintControl[pid 2130845] init: name='Sprint.PythonControl', sprint_unit='NnTrainer.pythonControl', version_number=5, callback=<built-in method callback of PyCapsule object at 0x7f6940b4ee80>, ref=<capsule object "Sprint.PythonControl.Internal" at 0x7f6940b4ee80>, config={'c2p_fd': '36', 'p2c_fd': '38', 'minPythonControlVersion': '4'}, kwargs={}
RETURNN SprintControl[pid 2130845] PythonControl create {'c2p_fd': 36, 'p2c_fd': 38, 'name': 'Sprint.PythonControl', 'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7f6940b4ee80>, 'config': {'c2p_fd': '36', 'p2c_fd': '38', 'minPythonControlVersion': '4'}, 'sprint_unit': 'NnTrainer.pythonControl', 'version_number': 5, 'min_version_number': 4, 'callback': <built-in method callback of PyCapsule object at 0x7f6940b4ee80>}
RETURNN SprintControl[pid 2130845] PythonControl init {'name': 'Sprint.PythonControl', 'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7f6940b4ee80>, 'config': {'c2p_fd': '36', 'p2c_fd': '38', 'minPythonControlVersion': '4'}, 'sprint_unit': 'NnTrainer.pythonControl', 'version_number': 5, 'min_version_number': 4, 'callback': <built-in method callback of PyCapsule object at 0x7f6940b4ee80>}
RETURNN SprintControl[pid 2130845] init for Sprint.PythonControl {'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7f6940b4ee80>, 'config': {'c2p_fd': '36', 'p2c_fd': '38', 'minPythonControlVersion': '4'}}
RETURNN SprintControl[pid 2130845] PythonControl run_control_loop: <built-in method callback of PyCapsule object at 0x7f6940b4ee80>, {}
RETURNN SprintControl[pid 2130845] PythonControl run_control_loop control: '<version>RWTH ASR 0.9beta (431c74d54b895a2a4c3689bcd5bf641a878bb925)\n</version>'
2023-11-08 16:45:03.663421: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8600
Fatal Python error: Segmentation fault

Current thread 0x00007f69453ea380 (most recent call first):
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/control.py", line 499 in _handle_cmd_export_allophone_state_fsa_by_segment_name
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/control.py", line 509 in _handle_cmd
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/control.py", line 524 in handle_next
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/control.py", line 550 in run_control_loop

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector (total: 37)
<?xml version="1.0" encoding="UTF-8"?>
<sprint>
<?xml version="1.0" encoding="UTF-8"?>
<sprint>


  PROGRAM DEFECTIVE (TERMINATED BY SIGNAL):
  Segmentation fault

  Creating stack trace (innermost first):
  #2  /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f6947720520]
  #3  /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c) [0x7f69477749fc]
  #4  /lib/x86_64-linux-gnu/libc.so.6(raise+0x16) [0x7f6947720476]
  #5  /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f6947720520]
  #6  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK3Ftl13TrimAutomatonIN3Fsa9AutomatonEE8getStateEj+0x3a) [0x55d2626e440a]
  #7  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK3Ftl14CacheAutomatonIN3Fsa9AutomatonEE8getStateEj+0x3a2) [0x55d2626f3c72]
  #8  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(+0x9fb257) [0x55d262675257]
  #9  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(+0x9fe9ac) [0x55d2626789ac]
  #10  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK2Am15TransitionModel5applyEN4Core3RefIKN3Fsa9AutomatonEEEib+0x274) [0x55d262671194]
  #11  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Am24ClassicTransducerBuilder20applyTransitionModelEN4Core3RefIKN3Fsa9AutomatonEEE+0x387) [0x55d262660df7]
  #12  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech26AllophoneStateGraphBuilder17addLoopTransitionEN4Core3RefIKN3Fsa9AutomatonEEE+0x123) [0x55d262482e43]
  #13  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech23CTCTopologyGraphBuilder17addLoopTransitionEN4Core3RefIKN3Fsa9AutomatonEEE+0x53) [0x55d262483183]
  #14  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech23CTCTopologyGraphBuilder15buildTransducerEN4Core3RefIKN3Fsa9AutomatonEEE+0x8f) [0x55d262485cbf]
  #15  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech26AllophoneStateGraphBuilder15buildTransducerERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x66) [0x55d262480516]
  #16  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech26AllophoneStateGraphBuilder5buildERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x2e) [0x55d262480d5e]
  #17  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK2Nn25AllophoneStateFsaExporter23exportFsaForOrthographyERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x54) [0x55d262359054]
  #18  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Nn13PythonControl8Internal32exportAllophoneStateFsaBySegNameEP7_objectS3_+0x133) [0x55d26233e833]
  #19  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Nn13PythonControl8Internal8callbackEP7_objectS3_+0x25d) [0x55d26233ee6d]
  #20  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x1cd073) [0x7f697baa0073]
  #21  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyObject_MakeTpCall+0x87) [0x7f697ba50ff7]
  #22  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x477a) [0x7f697b9de96a]
  #23  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x26bf9a) [0x7f697bb3ef9a]
  #24  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x181058) [0x7f697ba54058]
  #25  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x50ae) [0x7f697b9df29e]
  #26  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x26bf9a) [0x7f697bb3ef9a]
  #27  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x181058) [0x7f697ba54058]
  #28  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x50ae) [0x7f697b9df29e]
  #29  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x26bf9a) [0x7f697bb3ef9a]
  #30  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x1810d8) [0x7f697ba540d8]
  #31  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyObject_Call+0x128) [0x7f697ba53b88]
  #32  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Python8PyCallKwEP7_objectPKcS3_z+0xe6) [0x55d26258c876]
  #33  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Nn13PythonControl16run_control_loopEv+0x5f) [0x55d262332fbf]
  #34  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN9NnTrainer13pythonControlEv+0x167) [0x55d2620df317]
  #35  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN9NnTrainer4mainERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS6_EE+0x303) [0x55d2620b8e13]
  #36  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN4Core11Application3runERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x23) [0x55d26211e413]
  #37  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN4Core11Application4mainEiPPc+0x577) [0x55d2620ba577]
  #38  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(main+0x3d) [0x55d2620b852d]
  #39  /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f6947707d90]
  #40  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f6947707e40]
  #41  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_start+0x25) [0x55d2620dd7a5]

Exception in py_wrap_get_sprint_automata_for_batch:
EXCEPTION
Traceback (most recent call last):
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in get_sprint_automata_for_batch_op.<locals>.py_wrap_get_sprint_automata_for_batch
    line: return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
    locals:
      py_get_sprint_automata_for_batch = <global> <function py_get_sprint_automata_for_batch at 0x7ff04b0351c0>
      sprint_opts = <local> {'sprintExecPath': '/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', 'sprintConfigStr': '--*.configuration.channel=output-channel --*.real-time-factor.channel=output-channel --*.system-info.channel=output-channel --*.time.channel=output-channel --*.version....
      tags = <not found>
      py_tags = <local> array([b'switchboard-1/sw02721B/sw2721B-ms98-a-0031',
                               b'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
                               b'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
                               b'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
                               b'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
                               b'switchboard-1/sw02...
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    line: edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
    locals:
      edges = <not found>
      weights = <not found>
      start_end_states = <not found>
      sprint_instance_pool = <local> <returnn.sprint.error_signals.SprintInstancePool object at 0x7ff04c59c1d0>
      sprint_instance_pool.get_automata_for_batch = <local> <bound method SprintInstancePool.get_automata_for_batch of <returnn.sprint.error_signals.SprintInstancePool object at 0x7ff04c59c1d0>>
      tags = <local> array([b'switchboard-1/sw02721B/sw2721B-ms98-a-0031',
                            b'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
                            b'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
                            b'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
                            b'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
                            b'switchboard-1/sw02...
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in SprintInstancePool.get_automata_for_batch
    line: r = instance._read()
    locals:
      r = <local> ('ok', 9, 22, array([ 1,  2,  3,  4,  5,  6,  7,  0,  1,  2,  3,  4,  5,  6,  0,  2,  4,
                          6,  7,  5,  6,  4,  1,  2,  3,  4,  5,  6,  7,  1,  2,  3,  4,  5,
                          6,  7,  2,  4,  6,  8,  8,  8,  8,  8,  0,  6,  0, 22,  0, 48,  0,
                          0,  6,  0, 22,  0, 48,  0,  6, 22, 48, 48,  0, 48,...
      instance = <local> <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7ff102701d10>
      instance._read = <local> <bound method SprintSubprocessInstance._read of <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7ff102701d10>>
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in SprintSubprocessInstance._read
    line: return util.read_pickled_object(p)
    locals:
      util = <global> <module 'returnn.util.basic' from '/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py'>
      util.read_pickled_object = <global> <function read_pickled_object at 0x7ff17f482b60>
      p = <local> <_io.FileIO name=35 mode='rb' closefd=True>
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    line: size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
    locals:
      size_raw = <not found>
      read_bytes_to_new_buffer = <global> <function read_bytes_to_new_buffer at 0x7ff17f482ac0>
      p = <local> <_io.FileIO name=35 mode='rb' closefd=True>
      getvalue = <not found>
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    line: raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
    locals:
      EOFError = <builtin> <class 'EOFError'>
      size = <local> 4
      read_size = <local> 0
EOFError: expected to read 4 bytes but got EOF after 0 bytes
2023-11-08 16:45:06.805151: W tensorflow/core/framework/op_kernel.cc:1827] UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


2023-11-08 16:45:06.805314: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 4669204044388377120
2023-11-08 16:45:06.805394: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 14394728958513161507
2023-11-08 16:45:06.805423: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 4611900397994247129
2023-11-08 16:45:06.805450: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 11246935140361182411
2023-11-08 16:45:06.805476: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 3527483492372743068
2023-11-08 16:45:06.805500: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 455321662105441778
2023-11-08 16:45:06.805527: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 4997316685218163964
2023-11-08 16:45:06.805550: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 11970666840078253952
TensorFlow exception: Graph execution error:

Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
  (0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
         [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
  (1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/deprecation.py", line 383, in new_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/dispatch.py", line 1260, in op_dispatch_handler
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 798, in py_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 773, in py_func_common
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 380, in _internal_py_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/op_def_library.py", line 796, in _apply_op_helper
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 2657, in _create_op_internal
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 1161, in from_node_def

Exception UnknownError() in step 0. (pid 2130233)
Failing op: <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
We tried to fetch the op inputs ([<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>]) but got another exception:
target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
EXCEPTION
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1402, in BaseSession._do_call
    line: return fn(*args)
    locals:
      fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7ff04bb38860>
      args = <local> ({<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff052ae50f0>: array([[[-0.05505638],
                             [-0.09610788],
                             [-0.05115783],
                             ...,
                             [ 0.        ],
                             [ 0.        ],
                             [ 0.        ]],

                            [[-0.00226238],
                             [-0.01049833],
                             [-0.00...
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1385, in BaseSession._do_run.<locals>._run_fn
    line: return self._call_tf_sessionrun(options, feed_dict, fetch_list,
                                          target_list, run_metadata)
    locals:
      self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
      self._call_tf_sessionrun = <local> <bound method BaseSession._call_tf_sessionrun of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
      options = <local> None
      feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff052ae50f0>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051676ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051674e70>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff05177f870>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7ff04c6b2770>]
      run_metadata = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1478, in BaseSession._call_tf_sessionrun
    line: return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
                                                  fetch_list, target_list,
                                                  run_metadata)
    locals:
      tf_session = <global> <module 'tensorflow.python.client.pywrap_tf_session' from '/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/pywrap_tf_session.py'>
      tf_session.TF_SessionRun_wrapper = <global> <built-in method TF_SessionRun_wrapper of PyCapsule object at 0x7ff14a9916e0>
      self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
      self._session = <local> <tensorflow.python.client._pywrap_tf_session.TF_Session object at 0x7ff052c1fb30>
      options = <local> None
      feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff052ae50f0>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051676ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051674e70>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff05177f870>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7ff04c6b2770>]
      run_metadata = <local> None
UnknownError: 2 root error(s) found.
  (0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
         [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
  (1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.


During handling of the above exception, another exception occurred:

EXCEPTION
Traceback (most recent call last):
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 744, in Runner.run
    line: fetches_results = sess.run(
              fetches_dict, feed_dict=feed_dict, options=run_options
          )  # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
    locals:
      fetches_results = <not found>
      sess = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
      sess.run = <local> <bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
      fetches_dict = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 7
      feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      options = <not found>
      run_options = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 972, in BaseSession.run
    line: result = self._run(None, fetches, feed_dict, options_ptr,
                             run_metadata_ptr)
    locals:
      result = <not found>
      self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
      self._run = <local> <bound method BaseSession._run of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
      fetches = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 7
      feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      options_ptr = <local> None
      run_metadata_ptr = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1215, in BaseSession._run
    line: results = self._do_run(handle, final_targets, final_fetches,
                                 feed_dict_tensor, options, run_metadata)
    locals:
      results = <not found>
      self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
      self._do_run = <local> <bound method BaseSession._do_run of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
      handle = <local> None
      final_targets = <local> [<tf.Operation 'optim_and_step_incr' type=NoOp>]
      final_fetches = <local> [<tf.Tensor 'objective/add:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss_init/truediv:0' shape=() dtype=float32>, <tf.Tensor 'globals/mem_usage_deviceGPU0:0' shape=() dtype=in...
      feed_dict_tensor = <local> {<Reference wrapping <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>>: array([[[-0.05505638],
                                         [-0.09610788],
                                         [-0.05115783],
                                         ...,
                                         [ 0.        ],
                                         [ 0.        ],
                                         [ 0.        ]],

                                        [[-0.00226238],
                                         [-0.01049...
      options = <local> None
      run_metadata = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1395, in BaseSession._do_run
    line: return self._do_call(_run_fn, feeds, fetches, targets, options,
                               run_metadata)
    locals:
      self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
      self._do_call = <local> <bound method BaseSession._do_call of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
      _run_fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7ff04bb38860>
      feeds = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff052ae50f0>: array([[[-0.05505638],
                              [-0.09610788],
                              [-0.05115783],
                              ...,
                              [ 0.        ],
                              [ 0.        ],
                              [ 0.        ]],

                             [[-0.00226238],
                              [-0.01049833],
                              [-0.001...
      fetches = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051676ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051674e70>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff05177f870>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      targets = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7ff04c6b2770>]
      options = <local> None
      run_metadata = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1421, in BaseSession._do_call
    line: raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
    locals:
      type = <builtin> <class 'type'>
      e = <not found>
      node_def = <local> name: "objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch"
                         op: "PyFunc"
                         input: "extern_data/placeholders/seq_tag/seq_tag"
                         attr {
                           key: "token"
                           value {
                             s: "pyfunc_0"
                           }
                         }
                         attr {
                           key: "Tout"
                           value {
                             list {
                               type: DT_INT32
                               type: DT_FLOAT
                               type: DT_INT...
      op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      message = <local> 'Graph execution error:\n\nDetected at node \'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch\' defined at (most recent call last):\n    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>\n    File "/work/asr4/vieting/tmp/20231108_tf2..., len = 8772
UnknownError: Graph execution error:

Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
  (0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
         [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
  (1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/deprecation.py", line 383, in new_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/dispatch.py", line 1260, in op_dispatch_handler
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 798, in py_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 773, in py_func_common
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 380, in _internal_py_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/op_def_library.py", line 796, in _apply_op_helper
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 2657, in _create_op_internal
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 1161, in from_node_def



During handling of the above exception, another exception occurred:

EXCEPTION
Traceback (most recent call last):
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4341, in help_on_tf_exception
    line: debug_fetch, fetch_helpers, op_copied = FetchHelper.copy_graph(
              debug_fetch,
              target_op=op,
              fetch_helper_tensors=list(op.inputs),
              stop_at_ts=stop_at_ts,
              verbose_stream=file,
          )
    locals:
      debug_fetch = <local> <tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>
      fetch_helpers = <not found>
      op_copied = <not found>
      FetchHelper = <local> <class 'returnn.tf.util.basic.FetchHelper'>
      FetchHelper.copy_graph = <local> <bound method FetchHelper.copy_graph of <class 'returnn.tf.util.basic.FetchHelper'>>
      target_op = <not found>
      op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      fetch_helper_tensors = <not found>
      list = <builtin> <class 'list'>
      op.inputs = <local> (<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>,)
      stop_at_ts = <local> [<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>, <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>, <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, <tf.Tensor 'extern_data/placeholders/batch_dim:...
      verbose_stream = <not found>
      file = <local> <returnn.log.Stream object at 0x7ff1800af490>
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/util/basic.py", line 7700, in FetchHelper.copy_graph
    line: assert target_op in ops, "target_op %r,\nops\n%s" % (target_op, pformat(ops))
    locals:
      target_op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      ops = <local> [<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
      pformat = <local> <function pformat at 0x7ff183bc9e40>
AssertionError: target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]

Step meta information:
{'seq_idx': [0,
             1,
             2,
             3,
             4,
             5,
             6,
             7,
             8,
             9,
             10,
             11,
             12,
             13,
             14,
             15,
             16,
             17,
             18,
             19,
             20,
             21,
             22,
             23,
             24,
             25,
             26,
             27,
             28,
             29,
             30,
             31,
             32,
             33,
             34,
             35,
             36,
             37,
             38],
 'seq_tag': ['switchboard-1/sw02721B/sw2721B-ms98-a-0031',
             'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
             'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
             'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
             'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
             'switchboard-1/sw02145A/sw2145A-ms98-a-0107',
             'switchboard-1/sw02484A/sw2484A-ms98-a-0077',
             'switchboard-1/sw02768A/sw2768A-ms98-a-0064',
             'switchboard-1/sw03312B/sw3312B-ms98-a-0041',
             'switchboard-1/sw02344B/sw2344B-ms98-a-0023',
             'switchboard-1/sw04248B/sw4248B-ms98-a-0017',
             'switchboard-1/sw02762A/sw2762A-ms98-a-0059',
             'switchboard-1/sw03146A/sw3146A-ms98-a-0047',
             'switchboard-1/sw03032A/sw3032A-ms98-a-0065',
             'switchboard-1/sw02288A/sw2288A-ms98-a-0080',
             'switchboard-1/sw02751A/sw2751A-ms98-a-0066',
             'switchboard-1/sw02369A/sw2369A-ms98-a-0118',
             'switchboard-1/sw04169A/sw4169A-ms98-a-0059',
             'switchboard-1/sw02227A/sw2227A-ms98-a-0016',
             'switchboard-1/sw02061B/sw2061B-ms98-a-0170',
             'switchboard-1/sw02862B/sw2862B-ms98-a-0033',
             'switchboard-1/sw03116B/sw3116B-ms98-a-0065',
             'switchboard-1/sw03517B/sw3517B-ms98-a-0038',
             'switchboard-1/sw02360B/sw2360B-ms98-a-0086',
             'switchboard-1/sw02510B/sw2510B-ms98-a-0061',
             'switchboard-1/sw03919A/sw3919A-ms98-a-0017',
             'switchboard-1/sw02965A/sw2965A-ms98-a-0045',
             'switchboard-1/sw03154A/sw3154A-ms98-a-0073',
             'switchboard-1/sw02299A/sw2299A-ms98-a-0005',
             'switchboard-1/sw04572A/sw4572A-ms98-a-0026',
             'switchboard-1/sw02682A/sw2682A-ms98-a-0022',
             'switchboard-1/sw02808A/sw2808A-ms98-a-0014',
             'switchboard-1/sw04526A/sw4526A-ms98-a-0026',
             'switchboard-1/sw03180B/sw3180B-ms98-a-0010',
             'switchboard-1/sw03227A/sw3227A-ms98-a-0029',
             'switchboard-1/sw03891B/sw3891B-ms98-a-0008',
             'switchboard-1/sw03882B/sw3882B-ms98-a-0041',
             'switchboard-1/sw03102B/sw3102B-ms98-a-0027',
             'switchboard-1/sw02454A/sw2454A-ms98-a-0029']}
Feed dict:
  <tf.Tensor 'extern_data/placeholders/batch_dim:0' shape=() dtype=int32>: int(39)
  <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: shape (39, 10208, 1), dtype float32, min/max -1.0/1.0, mean/stddev 0.0014351769/0.11459725, Tensor{'data', [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}
  <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>: shape (39,), dtype int32, min/max 4760/10208, ([ 4760  6246  6372  6861  7296  7499  7534  7622  7824  8031  8295  8431
  8690  8675  8667  8886  9084  9199  9163  9156  9274  9262  9540  9668
  9678  9719  9711  9902  9989 10010 10020 10073 10006 10102 10131 10112
 10130 10178 10208])
  <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>: type <class 'list'>, Tensor{'seq_tag', [B?], dtype='string'}
  <tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>: bool(True)
EXCEPTION
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1402, in BaseSession._do_call
    line: return fn(*args)
    locals:
      fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7ff04bb38860>
      args = <local> ({<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff052ae50f0>: array([[[-0.05505638],
                             [-0.09610788],
                             [-0.05115783],
                             ...,
                             [ 0.        ],
                             [ 0.        ],
                             [ 0.        ]],

                            [[-0.00226238],
                             [-0.01049833],
                             [-0.00...
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1385, in BaseSession._do_run.<locals>._run_fn
    line: return self._call_tf_sessionrun(options, feed_dict, fetch_list,
                                          target_list, run_metadata)
    locals:
      self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
      self._call_tf_sessionrun = <local> <bound method BaseSession._call_tf_sessionrun of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
      options = <local> None
      feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff052ae50f0>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051676ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051674e70>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff05177f870>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7ff04c6b2770>]
      run_metadata = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1478, in BaseSession._call_tf_sessionrun
    line: return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
                                                  fetch_list, target_list,
                                                  run_metadata)
    locals:
      tf_session = <global> <module 'tensorflow.python.client.pywrap_tf_session' from '/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/pywrap_tf_session.py'>
      tf_session.TF_SessionRun_wrapper = <global> <built-in method TF_SessionRun_wrapper of PyCapsule object at 0x7ff14a9916e0>
      self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
      self._session = <local> <tensorflow.python.client._pywrap_tf_session.TF_Session object at 0x7ff052c1fb30>
      options = <local> None
      feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff052ae50f0>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051676ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051674e70>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff05177f870>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7ff04c6b2770>]
      run_metadata = <local> None
UnknownError: 2 root error(s) found.
  (0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
         [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
  (1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

EXCEPTION
Traceback (most recent call last):
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 744, in Runner.run
    line: fetches_results = sess.run(
              fetches_dict, feed_dict=feed_dict, options=run_options
          )  # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
    locals:
      fetches_results = <not found>
      sess = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
      sess.run = <local> <bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
      fetches_dict = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 7
      feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      options = <not found>
      run_options = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 972, in BaseSession.run
    line: result = self._run(None, fetches, feed_dict, options_ptr,
                             run_metadata_ptr)
    locals:
      result = <not found>
      self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
      self._run = <local> <bound method BaseSession._run of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
      fetches = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 7
      feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      options_ptr = <local> None
      run_metadata_ptr = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1215, in BaseSession._run
    line: results = self._do_run(handle, final_targets, final_fetches,
                                 feed_dict_tensor, options, run_metadata)
    locals:
      results = <not found>
      self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
      self._do_run = <local> <bound method BaseSession._do_run of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
      handle = <local> None
      final_targets = <local> [<tf.Operation 'optim_and_step_incr' type=NoOp>]
      final_fetches = <local> [<tf.Tensor 'objective/add:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss_init/truediv:0' shape=() dtype=float32>, <tf.Tensor 'globals/mem_usage_deviceGPU0:0' shape=() dtype=in...
      feed_dict_tensor = <local> {<Reference wrapping <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>>: array([[[-0.05505638],
                                         [-0.09610788],
                                         [-0.05115783],
                                         ...,
                                         [ 0.        ],
                                         [ 0.        ],
                                         [ 0.        ]],

                                        [[-0.00226238],
                                         [-0.01049...
      options = <local> None
      run_metadata = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1395, in BaseSession._do_run
    line: return self._do_call(_run_fn, feeds, fetches, targets, options,
                               run_metadata)
    locals:
      self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
      self._do_call = <local> <bound method BaseSession._do_call of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
      _run_fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7ff04bb38860>
      feeds = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff052ae50f0>: array([[[-0.05505638],
                              [-0.09610788],
                              [-0.05115783],
                              ...,
                              [ 0.        ],
                              [ 0.        ],
                              [ 0.        ]],

                             [[-0.00226238],
                              [-0.01049833],
                              [-0.001...
      fetches = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051676ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051674e70>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff05177f870>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      targets = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7ff04c6b2770>]
      options = <local> None
      run_metadata = <local> None
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1421, in BaseSession._do_call
    line: raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
    locals:
      type = <builtin> <class 'type'>
      e = <not found>
      node_def = <local> name: "objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch"
                         op: "PyFunc"
                         input: "extern_data/placeholders/seq_tag/seq_tag"
                         attr {
                           key: "token"
                           value {
                             s: "pyfunc_0"
                           }
                         }
                         attr {
                           key: "Tout"
                           value {
                             list {
                               type: DT_INT32
                               type: DT_FLOAT
                               type: DT_INT...
      op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      message = <local> 'Graph execution error:\n\nDetected at node \'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch\' defined at (most recent call last):\n    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>\n    File "/work/asr4/vieting/tmp/20231108_tf2..., len = 8772
UnknownError: Graph execution error:

Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
  (0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
         [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
  (1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
    r = instance._read()
        ^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/deprecation.py", line 383, in new_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/dispatch.py", line 1260, in op_dispatch_handler
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 798, in py_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 773, in py_func_common
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 380, in _internal_py_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/op_def_library.py", line 796, in _apply_op_helper
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 2657, in _create_op_internal
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 1161, in from_node_def

Save model under output/models/epoch.001.crash_0
Trainer not finalized, quitting. (pid 2130233)
SprintSubprocessInstance: interrupt child proc 2130824
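The traceback shows RETURNN's `read_pickled_object` in `returnn/util/basic.py` first reading a 4-byte size header from the pipe to the RASR subprocess. Getting 0 of those 4 bytes suggests the child closed its end without replying, for example because it exited or crashed. Below is a minimal sketch of that kind of length-prefixed pickle framing. It is not RETURNN's actual code: the little-endian `<i` header and the helper names are assumptions. It reproduces the same `EOFError` message once the writer is gone.

```python
import io
import os
import pickle
import struct


def write_pickled_object(f, obj):
    """Write obj as a 4-byte length header plus its pickle (assumed framing)."""
    data = pickle.dumps(obj)
    f.write(struct.pack("<i", len(data)))
    f.write(data)
    f.flush()


def read_exact(f, size):
    """Read exactly `size` bytes, or raise EOFError if the writer has gone away."""
    buf = io.BytesIO()
    while buf.tell() < size:
        chunk = f.read(size - buf.tell())
        if not chunk:  # writer closed its end, e.g. the child process died
            raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, buf.tell()))
        buf.write(chunk)
    return buf.getvalue()


def read_pickled_object(f):
    """Read the size header first, then the pickled payload."""
    (size,) = struct.unpack("<i", read_exact(f, 4))
    return pickle.loads(read_exact(f, size))


if __name__ == "__main__":
    r, w = os.pipe()
    # Simulated child: sends one reply, then exits (closing the write end).
    with os.fdopen(w, "wb") as wf:
        write_pickled_object(wf, {"edges": [0, 1]})
    with os.fdopen(r, "rb") as rf:
        print(read_pickled_object(rf))  # the reply that was actually sent
        try:
            read_pickled_object(rf)  # no further data will ever arrive
        except EOFError as e:
            print(e)  # expected to read 4 bytes but got EOF after 0 bytes
```

In other words, the Python-side `EOFError` is most likely only a symptom; the root cause has to be looked for in the RASR process output.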


@vieting
Contributor Author

vieting commented Nov 8, 2023

The RASR log of the nn trainer does not contain anything that looks particularly suspicious to me.

@albertz
Member

albertz commented Nov 8, 2023

What about this?

configuration error: failed to open file "neural-network-trainer.config" for reading. (No such file or directory)

@albertz
Member

albertz commented Nov 8, 2023

And in your stdout, you see the actual error:

Fatal Python error: Segmentation fault

Current thread 0x00007f69453ea380 (most recent call first):
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/control.py", line 499 in _handle_cmd_export_allophone_state_fsa_by_segment_name
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/control.py", line 509 in _handle_cmd
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/control.py", line 524 in handle_next
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/control.py", line 550 in run_control_loop

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector (total: 37)
<?xml version="1.0" encoding="UTF-8"?>
<sprint>
<?xml version="1.0" encoding="UTF-8"?>
<sprint>


  PROGRAM DEFECTIVE (TERMINATED BY SIGNAL):
  Segmentation fault

  Creating stack trace (innermost first):
  #2  /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f6947720520]
  #3  /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c) [0x7f69477749fc]
  #4  /lib/x86_64-linux-gnu/libc.so.6(raise+0x16) [0x7f6947720476]
  #5  /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f6947720520]
  #6  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK3Ftl13TrimAutomatonIN3Fsa9AutomatonEE8getStateEj+0x3a) [0x55d2626e440a]
  #7  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK3Ftl14CacheAutomatonIN3Fsa9AutomatonEE8getStateEj+0x3a2) [0x55d2626f3c72]
  #8  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(+0x9fb257) [0x55d262675257]
  #9  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(+0x9fe9ac) [0x55d2626789ac]
  #10  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK2Am15TransitionModel5applyEN4Core3RefIKN3Fsa9AutomatonEEEib+0x274) [0x55d262671194]
  #11  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Am24ClassicTransducerBuilder20applyTransitionModelEN4Core3RefIKN3Fsa9AutomatonEEE+0x387) [0x55d262660df7]
  #12  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech26AllophoneStateGraphBuilder17addLoopTransitionEN4Core3RefIKN3Fsa9AutomatonEEE+0x123) [0x55d262482e43]
  #13  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech23CTCTopologyGraphBuilder17addLoopTransitionEN4Core3RefIKN3Fsa9AutomatonEEE+0x53) [0x55d262483183]
  #14  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech23CTCTopologyGraphBuilder15buildTransducerEN4Core3RefIKN3Fsa9AutomatonEEE+0x8f) [0x55d262485cbf]
  #15  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech26AllophoneStateGraphBuilder15buildTransducerERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x66) [0x55d262480516]
  #16  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech26AllophoneStateGraphBuilder5buildERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x2e) [0x55d262480d5e]
  #17  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK2Nn25AllophoneStateFsaExporter23exportFsaForOrthographyERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x54) [0x55d262359054]
  #18  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Nn13PythonControl8Internal32exportAllophoneStateFsaBySegNameEP7_objectS3_+0x133) [0x55d26233e833]
  #19  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Nn13PythonControl8Internal8callbackEP7_objectS3_+0x25d) [0x55d26233ee6d]
  #20  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x1cd073) [0x7f697baa0073]
  #21  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyObject_MakeTpCall+0x87) [0x7f697ba50ff7]
  #22  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x477a) [0x7f697b9de96a]
  #23  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x26bf9a) [0x7f697bb3ef9a]
  #24  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x181058) [0x7f697ba54058]
  #25  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x50ae) [0x7f697b9df29e]
  #26  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x26bf9a) [0x7f697bb3ef9a]
  #27  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x181058) [0x7f697ba54058]
  #28  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x50ae) [0x7f697b9df29e]
  #29  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x26bf9a) [0x7f697bb3ef9a]
  #30  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x1810d8) [0x7f697ba540d8]
  #31  /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyObject_Call+0x128) [0x7f697ba53b88]
  #32  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Python8PyCallKwEP7_objectPKcS3_z+0xe6) [0x55d26258c876]
  #33  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Nn13PythonControl16run_control_loopEv+0x5f) [0x55d262332fbf]
  #34  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN9NnTrainer13pythonControlEv+0x167) [0x55d2620df317]
  #35  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN9NnTrainer4mainERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS6_EE+0x303) [0x55d2620b8e13]
  #36  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN4Core11Application3runERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x23) [0x55d26211e413]
  #37  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN4Core11Application4mainEiPPc+0x577) [0x55d2620ba577]
  #38  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(main+0x3d) [0x55d2620b852d]
  #39  /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f6947707d90]
  #40  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f6947707e40]
  #41  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_start+0x25) [0x55d2620dd7a5]
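The RASR frames are Itanium-ABI mangled C++ symbols; `c++filt` from GNU binutils decodes them fully. As a rough aid for reading this particular trace, here is a minimal Python sketch that pulls the frame number, binary name, and qualified function name out of one frame line. It only handles the simple nested-name and template-argument forms that appear here (no substitutions, parameter types are ignored), so it is an illustration, not a general demangler. Read innermost first, frames #6 to #13 give the chain `Speech::CTCTopologyGraphBuilder::addLoopTransition` → `Speech::AllophoneStateGraphBuilder::addLoopTransition` → `Am::ClassicTransducerBuilder::applyTransitionModel` → `Am::TransitionModel::apply` → `Ftl::CacheAutomaton<Fsa::Automaton>::getState` → `Ftl::TrimAutomaton<Fsa::Automaton>::getState`.

```python
import re

# One backtrace line: "#13  /path/binary(symbol+0xoff) [0xaddr]"; symbol may be empty.
FRAME_RE = re.compile(r"#(\d+)\s+(\S+?)\((\w*)\+0x[0-9a-f]+\)\s+\[0x[0-9a-f]+\]")
_NUM_RE = re.compile(r"\d+")


def _parse_source_name(s, i):
    """Parse '<length><identifier>' at s[i]; return (identifier, next index)."""
    m = _NUM_RE.match(s, i)
    n = int(m.group())
    return s[m.end():m.end() + n], m.end() + n


def _parse_template_args(s, i):
    """Parse 'I <arg>... E' where each arg is a plain or nested name."""
    i += 1  # skip 'I'
    args = []
    while s[i] != "E":
        if s[i] == "N":
            arg, i = _parse_nested_name(s, i)
        elif s[i].isdigit():
            arg, i = _parse_source_name(s, i)
        else:
            raise ValueError("unsupported template argument at %r" % s[i:])
        args.append(arg)
    return "<" + ", ".join(args) + ">", i + 1  # skip closing 'E'


def _parse_nested_name(s, i):
    """Parse 'N [K] <name>... E' into 'A::B<...>::C'."""
    i += 1  # skip 'N'
    if s[i] == "K":  # const member function qualifier
        i += 1
    parts = []
    while s[i] != "E":
        if s[i].isdigit():
            name, i = _parse_source_name(s, i)
            parts.append(name)
        elif s[i] == "I" and parts:
            targs, i = _parse_template_args(s, i)
            parts[-1] += targs
        else:
            raise ValueError("unsupported mangling at %r" % s[i:])
    return "::".join(parts), i + 1


def qualified_name(symbol):
    """Return the qualified function name of a symbol, ignoring parameter types."""
    if not symbol.startswith("_Z"):
        return symbol  # plain C symbol such as 'main' or 'raise'
    if symbol[2] == "N":
        return _parse_nested_name(symbol, 2)[0]
    if symbol[2].isdigit():
        return _parse_source_name(symbol, 2)[0]
    raise ValueError("unsupported mangling: %r" % symbol)


def summarize_frame(line):
    """Return (frame number, binary file name, function name), or None if no match."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    num, binary, symbol = m.groups()
    name = qualified_name(symbol) if symbol else "??"
    return int(num), binary.rsplit("/", 1)[-1], name


if __name__ == "__main__":
    line = ("  #13  /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/"
            "nn-trainer.linux-x86_64-standard(_ZN6Speech23CTCTopologyGraphBuilder"
            "17addLoopTransitionEN4Core3RefIKN3Fsa9AutomatonEEE+0x53) [0x55d262483183]")
    print(summarize_frame(line))
    # (13, 'nn-trainer.linux-x86_64-standard', 'Speech::CTCTopologyGraphBuilder::addLoopTransition')
```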

@vieting
Contributor Author

vieting commented Nov 8, 2023

> What about this?
> configuration error: failed to open file "neural-network-trainer.config" for reading. (No such file or directory)

I only use "sprint_opts" with "sprintConfigStr" for the fast_bw loss, so I am not sure why "neural-network-trainer.config" is checked at all. I do not define it anywhere in my config.
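For context, a `fast_bw` loss that gets its RASR setup only through `sprint_opts` / `sprintConfigStr` looks roughly like the hypothetical fragment below. All paths and the input layer name are placeholders, not the actual config from this issue, and the key names should be checked against `returnn/sprint/error_signals.py` in the RETURNN version in use.

```python
# Hypothetical RETURNN network fragment; every path below is a placeholder.
network = {
    "output": {
        "class": "softmax",
        "from": "encoder",  # hypothetical input layer name
        "loss": "fast_bw",  # FastBaumWelchLoss
        "loss_opts": {
            "sprint_opts": {
                # RASR binary that is started as a subprocess to build the FSAs
                "sprintExecPath": "/path/to/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard",
                # command-line style RASR options; the only RASR config given here
                "sprintConfigStr": "--config=/path/to/rasr.loss.config",
                "sprintControlConfig": {"verbose": True},
                "usePythonSegmentOrder": False,
                "numInstances": 1,  # number of RASR subprocesses
            },
            "tdp_scale": 0.0,
        },
    },
}
```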

@vieting
Contributor Author

vieting commented Nov 8, 2023

Note that the segmentation fault only occurs with the tf2.14 image and RASR. There might be something wrong on that side as well, see .

With my previous setup (tf2.13, RASR compiled with tf2.8), this is the stdout + stderr:

vieting@cn-251:/work/asr4/vieting/tmp/20231108_tf213_sprint_op$ ./run_example_patch.sh
RETURNN starting up, version 1.20231108.140626+git.9fe93590, date/time 2023-11-08-17-07-35 (UTC+0100), pid 2131331, cwd /work/asr4/vieting/tmp/20231108_tf213_sprint_op, Python /usr/bin/python3
RETURNN command line options: ['returnn.config']
Hostname: cn-251
MEMORY: main proc python3(2131331) initial: rss=40.9MB pss=40.9MB uss=40.9MB shared=4.0KB
MEMORY: total (1 procs): pss=40.9MB uss=40.9MB
2023-11-08 17:07:41.035240: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
MEMORY: main proc python3(2131331) increased RSS: rss=212.4MB pss=212.4MB uss=212.4MB shared=4.0KB
MEMORY: total (1 procs): pss=212.4MB uss=212.4MB
MEMORY: main proc python3(2131331) increased RSS: rss=283.6MB pss=283.6MB uss=283.6MB shared=4.0KB
MEMORY: total (1 procs): pss=283.6MB uss=283.6MB
MEMORY: main proc python3(2131331) increased RSS: rss=420.4MB pss=419.8MB uss=419.4MB shared=0.9MB
MEMORY: total (1 procs): pss=419.8MB uss=419.4MB
/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/util/basic.py:2258: SyntaxWarning: "is not" with a literal. Did you mean "!="?
  if dim is not 1:
/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/util/basic.py:6254: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if start is 0 and stop is None:
TensorFlow: 2.13.0 (v2.13.0-rc2-7-g1cb1a030a62) (<not-under-git> in /usr/local/lib/python3.8/dist-packages/tensorflow)
Use num_threads=1 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 2, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2}.
CUDA_VISIBLE_DEVICES is set to '2'.
Collecting TensorFlow device list...
2023-11-08 17:08:04.048461: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /device:GPU:0 with 10396 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:81:00.0, compute capability: 6.1
Local devices available to TensorFlow:
  1/2: name: "/device:CPU:0"
       device_type: "CPU"
       memory_limit: 268435456
       locality {
       }
       incarnation: 12364557139125826212
       xla_global_id: -1
  2/2: name: "/device:GPU:0"
       device_type: "GPU"
       memory_limit: 10901061632
       locality {
         bus_id: 2
         numa_node: 1
         links {
         }
       }
       incarnation: 14856658680689284311
       physical_device_desc: "device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:81:00.0, compute capability: 6.1"
       xla_global_id: 416903419
Using gpu device 2: NVIDIA GeForce GTX 1080 Ti
Hostname 'cn-251', GPU 2, GPU-dev-name 'NVIDIA GeForce GTX 1080 Ti', GPU-memory 10.2GB
MEMORY: main proc python3(2131331) increased RSS: rss=1.1GB pss=1.0GB uss=1.0GB shared=5.5MB
MEMORY: total (1 procs): pss=1.0GB uss=1.0GB
Train data:
  input: 1 x 1
  output: {'raw': {'dtype': 'string', 'shape': ()}, 'orth': [256, 1], 'data': [1, 2]}
  OggZipDataset, sequences: 249229, frames: unknown
Dev data:
MEMORY: main proc python3(2131331) increased RSS: rss=1.7GB pss=1.7GB uss=1.7GB shared=5.5MB
MEMORY: total (1 procs): pss=1.7GB uss=1.7GB
  OggZipDataset, sequences: 300, frames: unknown
Learning-rate-control: file learning_rates.swb.ctc does not exist yet
Setup TF session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
2023-11-08 17:08:13.177173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10396 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:81:00.0, compute capability: 6.1
layer /'data': [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)] float32
layer /features/'conv_h_filter': ['conv_h_filter:static:0'(128),'conv_h_filter:static:1'(1),F|F'conv_h_filter:static:2'(150)] float32
layer /features/'conv_h': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_act': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_split': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F'conv_h:channel'(150),F|F'conv_h_split_split_dims1'(1)] float32
DEPRECATION WARNING: Explicitly specify in_spatial_dims when there is more than one spatial dim in the input.
This will be disallowed with behavior_version 8.
layer /features/'conv_l': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel'(150),F|F'conv_l:channel'(5)] float32
layer /features/'conv_l_merge': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
DEPRECATION WARNING: MergeDimsLayer, only keep_order=True is allowed
This will be disallowed with behavior_version 6.
layer /features/'conv_l_act_no_norm': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'conv_l_act': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'features': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'specaug': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
WARNING:tensorflow:From /work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py:2462: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
layer /'conv_source': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_source_split_dims1'(1)] float32
layer /'conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_1:channel'(32)] float32
layer /'conv_1_pool': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_1:channel'(32)] float32
layer /'conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/32⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_2:channel'(64)] float32
layer /'conv_3': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_3:channel'(64)] float32
layer /'conv_merged': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conv_h:channel*conv_l:channel//2)*conv_3:channel'(24000)] float32
layer /'input_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'input_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_linear_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_1_dropout_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_half_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_1_conv_mod_glu': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conformer_1_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_1_conv_mod_depthwise_conv': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_bn': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
DEPRECATION WARNING: batch_norm masked_time should be specified explicitly
This will be disallowed with behavior_version 12.
WARNING:tensorflow:From /work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/util/basic.py:1725: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
layer /'conformer_1_conv_mod_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_relpos_encoding': [T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_1_mhsa_mod_self_attention': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_att_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_linear_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_2_dropout_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_half_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'encoder': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
2023-11-08 17:08:14.118488: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10396 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:81:00.0, compute capability: 6.1
layer /'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'output:feature-dense'(88)] float32
WARNING:tensorflow:From /work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py:54: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.

MEMORY: main proc python3(2131331) increased RSS: rss=1.9GB pss=1.9GB uss=1.8GB shared=31.8MB
MEMORY: total (1 procs): pss=1.9GB uss=1.8GB
OpCodeCompiler call: /usr/local/cuda-11.8/bin/nvcc -shared -O2 -std=c++17 -I /usr/local/lib/python3.8/dist-packages/tensorflow/include -I /usr/local/lib/python3.8/dist-packages/tensorflow/include/external/nsync/public -ccbin /usr/bin/gcc -I /usr/local/cuda-11.8/targets/x86_64-linux/include -I /usr/local/cuda-11.8/include -L /usr/local/cuda-11.8/lib64 -x cu -v -DGOOGLE_CUDA=1 -Xcompiler -fPIC -Xcompiler -v -arch compute_61 -I /usr/local/lib/python3.8/dist-packages/tensorflow/include/third_party/gpus/cuda/include -D_GLIBCXX_USE_CXX11_ABI=1 -DNDEBUG=1 -g /var/tmp/vieting/returnn_tf_cache/ops/FastBaumWelchOp/b50a371e1a/FastBaumWelchOp.cc -o /var/tmp/vieting/returnn_tf_cache/ops/FastBaumWelchOp/b50a371e1a/FastBaumWelchOp.so -L/usr/local/lib/python3.8/dist-packages/scipy.libs -l:libopenblasp-r0-41284840.3.18.so -L/usr/local/lib/python3.8/dist-packages/tensorflow -l:libtensorflow_framework.so.2
MEMORY: sub proc nvcc(2131947) initial: rss=3.4MB pss=2.0MB uss=0.9MB shared=2.5MB
MEMORY: total (2 procs): pss=1.9GB uss=1.8GB
MEMORY: sub proc nvcc(2131947) increased RSS: rss=3.5MB pss=2.1MB uss=1.5MB shared=2.0MB
MEMORY: sub proc sh(2131954) initial: rss=1.6MB pss=603.0KB uss=236.0KB shared=1.3MB
MEMORY: sub proc cicc(2131955) initial: rss=257.3MB pss=255.3MB uss=254.2MB shared=3.0MB
MEMORY: total (4 procs): pss=2.1GB uss=2.1GB
MEMORY: sub proc cicc(2131955) increased RSS: rss=1.0GB pss=1.0GB uss=1.0GB shared=3.0MB
MEMORY: total (4 procs): pss=2.9GB uss=2.9GB
MEMORY: proc <unknown-dead>(2131954) exited, old: rss=1.6MB pss=603.0KB uss=236.0KB shared=1.3MB
MEMORY: proc cicc(2131955) exited, old: rss=1.0GB pss=1.0GB uss=1.0GB shared=3.0MB
MEMORY: sub proc sh(2131963) initial: rss=1.6MB pss=605.0KB uss=228.0KB shared=1.4MB
MEMORY: sub proc cudafe++(2131964) initial: rss=229.5MB pss=228.3MB uss=227.8MB shared=1.7MB
MEMORY: total (4 procs): pss=2.1GB uss=2.1GB
MEMORY: sub proc cudafe++(2131964) increased RSS: rss=1.1GB pss=1.1GB uss=1.1GB shared=1.7MB
MEMORY: total (4 procs): pss=3.0GB uss=2.9GB
MEMORY: proc <unknown-dead>(2131963) exited, old: rss=1.6MB pss=605.0KB uss=228.0KB shared=1.4MB
MEMORY: proc cudafe++(2131964) exited, old: rss=1.1GB pss=1.1GB uss=1.1GB shared=1.7MB
MEMORY: sub proc nvcc(2131947) increased RSS: rss=3.6MB pss=2.1MB uss=1.5MB shared=2.0MB
MEMORY: sub proc sh(2131969) initial: rss=1.7MB pss=552.0KB uss=224.0KB shared=1.5MB
MEMORY: sub proc gcc(2131970) initial: rss=2.6MB pss=1.4MB uss=1.0MB shared=1.6MB
MEMORY: sub proc cc1plus(2131971) initial: rss=397.0MB pss=395.4MB uss=394.8MB shared=2.2MB
MEMORY: total (5 procs): pss=2.2GB uss=2.2GB
MEMORY: sub proc cc1plus(2131971) increased RSS: rss=0.8GB pss=0.8GB uss=0.8GB shared=2.2MB
MEMORY: total (5 procs): pss=2.7GB uss=2.7GB
Network layer topology:
  extern data: data: Tensor{[B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}, seq_tag: Tensor{[B?], dtype='string'}
  used data keys: ['data', 'seq_tag']
  layers:
    layer batch_norm 'conformer_1_conv_mod_bn' #: 512
    layer conv 'conformer_1_conv_mod_depthwise_conv' #: 512
    layer copy 'conformer_1_conv_mod_dropout' #: 512
    layer gating 'conformer_1_conv_mod_glu' #: 512
    layer layer_norm 'conformer_1_conv_mod_ln' #: 512
    layer linear 'conformer_1_conv_mod_pointwise_conv_1' #: 1024
    layer linear 'conformer_1_conv_mod_pointwise_conv_2' #: 512
    layer combine 'conformer_1_conv_mod_res_add' #: 512
    layer activation 'conformer_1_conv_mod_swish' #: 512
    layer copy 'conformer_1_ffmod_1_dropout' #: 512
    layer linear 'conformer_1_ffmod_1_dropout_linear' #: 512
    layer eval 'conformer_1_ffmod_1_half_res_add' #: 512
    layer linear 'conformer_1_ffmod_1_linear_swish' #: 2048
    layer layer_norm 'conformer_1_ffmod_1_ln' #: 512
    layer copy 'conformer_1_ffmod_2_dropout' #: 512
    layer linear 'conformer_1_ffmod_2_dropout_linear' #: 512
    layer eval 'conformer_1_ffmod_2_half_res_add' #: 512
    layer linear 'conformer_1_ffmod_2_linear_swish' #: 2048
    layer layer_norm 'conformer_1_ffmod_2_ln' #: 512
    layer linear 'conformer_1_mhsa_mod_att_linear' #: 512
    layer copy 'conformer_1_mhsa_mod_dropout' #: 512
    layer layer_norm 'conformer_1_mhsa_mod_ln' #: 512
    layer relative_positional_encoding 'conformer_1_mhsa_mod_relpos_encoding' #: 64
    layer combine 'conformer_1_mhsa_mod_res_add' #: 512
    layer self_attention 'conformer_1_mhsa_mod_self_attention' #: 512
    layer layer_norm 'conformer_1_output' #: 512
    layer conv 'conv_1' #: 32
    layer pool 'conv_1_pool' #: 32
    layer conv 'conv_2' #: 64
    layer conv 'conv_3' #: 64
    layer merge_dims 'conv_merged' #: 24000
    layer split_dims 'conv_source' #: 1
    layer source 'data' #: 1
    layer copy 'encoder' #: 512
    layer subnetwork 'features' #: 750
    layer conv 'features/conv_h' #: 150
    layer eval 'features/conv_h_act' #: 150
    layer variable 'features/conv_h_filter' #: 150
    layer split_dims 'features/conv_h_split' #: 1
    layer conv 'features/conv_l' #: 5
    layer layer_norm 'features/conv_l_act' #: 750
    layer eval 'features/conv_l_act_no_norm' #: 750
    layer merge_dims 'features/conv_l_merge' #: 750
    layer copy 'features/output' #: 750
    layer copy 'input_dropout' #: 512
    layer linear 'input_linear' #: 512
    layer softmax 'output' #: 88
    layer eval 'specaug' #: 750
net params #: 18473980
net trainable params: [<tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 
'conformer_1_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_1_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conv_1/W:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'conv_1/bias:0' shape=(32,) dtype=float32>, <tf.Variable 'conv_2/W:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'conv_2/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'conv_3/W:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'conv_3/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'features/conv_h_filter/conv_h_filter:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'features/conv_l/W:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'features/conv_l_act/bias:0' shape=(750,) dtype=float32>, <tf.Variable 'features/conv_l_act/scale:0' shape=(750,) dtype=float32>, <tf.Variable 'input_linear/W:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'output/W:0' shape=(512, 88) dtype=float32>, <tf.Variable 'output/b:0' shape=(88,) dtype=float32>]
2023-11-08 17:09:01.409733: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:375] MLIR V1 optimization pass is not enabled
start training at epoch 1
using batch size: {'classes': 5000, 'data': 400000}, max seqs: 128
learning rate control: NewbobMultiEpoch(num_epochs=6, update_interval=1, relative_error_threshold=-0.01, relative_error_grow_threshold=-0.01), epoch data: 1: EpochData(learningRate=1.325e-05, error={}), 2: EpochData(learningRate=1.539861111111111e-05, error={}), 3: EpochData(learningRate=1.754722222222222e-05, error={}), ..., 360: EpochData(learningRate=1.4333333333333375e-05, error={}), 361: EpochData(learningRate=1.2166666666666727e-05, error={}), 362: EpochData(learningRate=1e-05, error={}), error key: None
pretrain: None
MEMORY: proc <unknown-dead>(2131947) exited, old: rss=3.6MB pss=2.1MB uss=1.5MB shared=2.0MB
MEMORY: proc <unknown-dead>(2131969) exited, old: rss=1.7MB pss=552.0KB uss=224.0KB shared=1.5MB
MEMORY: proc <unknown-dead>(2131970) exited, old: rss=2.6MB pss=1.4MB uss=1.0MB shared=1.6MB
MEMORY: proc cc1plus(2131971) exited, old: rss=0.8GB pss=0.8GB uss=0.8GB shared=2.2MB
MEMORY: main proc python3(2131331) increased RSS: rss=2.3GB pss=2.3GB uss=2.3GB shared=6.4MB
MEMORY: total (1 procs): pss=2.3GB uss=2.3GB
start epoch 1 with learning rate 1.325e-05 ...
TF: log_dir: output/models/train-2023-11-08-16-07-34
Create optimizer <class 'returnn.tf.updater.NadamOptimizer'> with options {'epsilon': 1e-08, 'learning_rate': <tf.Variable 'learning_rate:0' shape=() dtype=float32>}.
Initialize optimizer (default) with slots ['m', 'v'].
These additional variable were created by the optimizer: [<tf.Variable 'optimize/gradients/conformer_1_conv_mod_bn/batch_norm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_bn/batch_norm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_depthwise_conv/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_depthwise_conv/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_1/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(1024,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_2/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_dropout_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_dropout_linear/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 
'optimize/gradients/conformer_1_ffmod_1_linear_swish/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_linear_swish/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_dropout_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_dropout_linear/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_linear_swish/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_linear_swish/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_att_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 
'optimize/gradients/conformer_1_mhsa_mod_relpos_encoding/Gather_grad/Reshape_accum_grad/var_accum_grad:0' shape=(65, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_self_attention/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_output/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_output/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_h/convolution/ExpandDims_1_grad/Reshape_accum_grad/var_accum_grad:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l/convolution_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 
'optimize/gradients/input_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 88) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(88,) dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta1_power:0' shape=() dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta2_power:0' shape=() dtype=float32>].
2023-11-08 17:09:08.816918: W tensorflow/c/c_api.cc:304] Operation '{name:'global_step' id:161 op device:{requested: '/device:CPU:0', assigned: ''} def:{{{node global_step}} = VarHandleOp[_class=["loc:@global_step"], _has_manual_control_dependencies=true, allowed_devices=[], container="", dtype=DT_INT64, shape=[], shared_name="global_step", _device="/device:CPU:0"]()}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
OpCodeCompiler call: /usr/local/cuda-11.8/bin/nvcc -shared -O2 -std=c++17 -I /usr/local/lib/python3.8/dist-packages/tensorflow/include -I /usr/local/lib/python3.8/dist-packages/tensorflow/include/external/nsync/public -ccbin /usr/bin/gcc -I /usr/local/cuda-11.8/targets/x86_64-linux/include -I /usr/local/cuda-11.8/include -L /usr/local/cuda-11.8/lib64 -x cu -v -DGOOGLE_CUDA=1 -Xcompiler -fPIC -Xcompiler -v -I /usr/local/lib/python3.8/dist-packages/tensorflow/include/third_party/gpus/cuda/include -D_GLIBCXX_USE_CXX11_ABI=1 -DNDEBUG=1 -g /var/tmp/vieting/returnn_tf_cache/ops/DevMaxBytesInUse/5fd1f0202b/DevMaxBytesInUse.cc -o /var/tmp/vieting/returnn_tf_cache/ops/DevMaxBytesInUse/5fd1f0202b/DevMaxBytesInUse.so -L/usr/local/lib/python3.8/dist-packages/tensorflow -l:libtensorflow_framework.so.2
MEMORY: main proc python3(2131331) increased RSS: rss=2.6GB pss=2.6GB uss=2.5GB shared=8.8MB
MEMORY: sub proc nvcc(2131988) initial: rss=3.5MB pss=2.0MB uss=1.5MB shared=2.1MB
MEMORY: sub proc sh(2131991) initial: rss=1.6MB pss=565.0KB uss=256.0KB shared=1.4MB
MEMORY: sub proc gcc(2131992) initial: rss=2.5MB pss=1.3MB uss=1.0MB shared=1.6MB
MEMORY: sub proc cc1plus(2131993) initial: rss=43.0MB pss=41.4MB uss=40.9MB shared=2.2MB
MEMORY: total (5 procs): pss=2.6GB uss=2.6GB
MEMORY: proc sh(2131991) exited, old: rss=1.6MB pss=565.0KB uss=256.0KB shared=1.4MB
MEMORY: proc gcc(2131992) exited, old: rss=2.5MB pss=1.3MB uss=1.0MB shared=1.6MB
MEMORY: proc cc1plus(2131993) exited, old: rss=43.0MB pss=41.4MB uss=40.9MB shared=2.2MB
MEMORY: sub proc sh(2131994) initial: rss=1.7MB pss=633.0KB uss=232.0KB shared=1.5MB
MEMORY: sub proc cicc(2131995) initial: rss=736.6MB pss=734.8MB uss=733.8MB shared=2.9MB
MEMORY: total (4 procs): pss=3.3GB uss=3.3GB
MEMORY: proc sh(2131994) exited, old: rss=1.7MB pss=633.0KB uss=232.0KB shared=1.5MB
MEMORY: proc cicc(2131995) exited, old: rss=736.6MB pss=734.8MB uss=733.8MB shared=2.9MB
MEMORY: sub proc nvcc(2131988) increased RSS: rss=3.6MB pss=2.2MB uss=1.5MB shared=2.0MB
MEMORY: sub proc sh(2132005) initial: rss=1.6MB pss=613.0KB uss=232.0KB shared=1.4MB
MEMORY: sub proc cudafe++(2132006) initial: rss=242.1MB pss=241.0MB uss=240.5MB shared=1.6MB
MEMORY: total (4 procs): pss=2.8GB uss=2.8GB
MEMORY: proc sh(2132005) exited, old: rss=1.6MB pss=613.0KB uss=232.0KB shared=1.4MB
MEMORY: proc cudafe++(2132006) exited, old: rss=242.1MB pss=241.0MB uss=240.5MB shared=1.6MB
MEMORY: sub proc sh(2132007) initial: rss=1.6MB pss=531.0KB uss=224.0KB shared=1.4MB
MEMORY: sub proc gcc(2132008) initial: rss=2.6MB pss=1.4MB uss=1.0MB shared=1.6MB
MEMORY: sub proc cc1plus(2132009) initial: rss=121.0MB pss=119.5MB uss=119.0MB shared=2.1MB
MEMORY: total (5 procs): pss=2.7GB uss=2.7GB
MEMORY: sub proc cc1plus(2132009) increased RSS: rss=515.9MB pss=514.4MB uss=513.9MB shared=2.1MB
MEMORY: total (5 procs): pss=3.1GB uss=3.1GB
SprintSubprocessInstance: exec ['/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:35,p2c_fd:36,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', 
'--*.model-combination.acoustic-model.hmm.early-recombination=no', '--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=no', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']
SprintSubprocessInstance: starting, pid 2132023
MEMORY: proc <unknown-dead>(2131988) exited, old: rss=3.6MB pss=2.2MB uss=1.5MB shared=2.0MB
MEMORY: proc <unknown-dead>(2132007) exited, old: rss=1.6MB pss=531.0KB uss=224.0KB shared=1.4MB
MEMORY: proc <unknown-dead>(2132008) exited, old: rss=2.6MB pss=1.4MB uss=1.0MB shared=1.6MB
MEMORY: proc cc1plus(2132009) exited, old: rss=515.9MB pss=514.4MB uss=513.9MB shared=2.1MB
/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard: error while loading shared libraries: libtensorflow_cc.so.2: cannot open shared object file: No such file or directory
SprintSubprocessInstance: Sprint child process (['/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:35,p2c_fd:36,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', 
'--*.model-combination.acoustic-model.hmm.early-recombination=no', '--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=no', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']) caused an exception.
MEMORY: total (1 procs): pss=2.6GB uss=2.5GB
EXCEPTION
Traceback (most recent call last):
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in SprintSubprocessInstance._start_child
    line: ret = self._read()
    locals:
      ret = <not found>
      self = <local> <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7fdc896c9e50>
      self._read = <local> <bound method SprintSubprocessInstance._read of <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7fdc896c9e50>>
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in SprintSubprocessInstance._read
    line: return util.read_pickled_object(p)
    locals:
      util = <global> <module 'returnn.util.basic' from '/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py'>
      util.read_pickled_object = <global> <function read_pickled_object at 0x7fddcfbc3d30>
      p = <local> <_io.FileIO name=34 mode='rb' closefd=True>
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    line: size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
    locals:
      size_raw = <not found>
      read_bytes_to_new_buffer = <global> <function read_bytes_to_new_buffer at 0x7fddcfbc3ca0>
      p = <local> <_io.FileIO name=34 mode='rb' closefd=True>
      getvalue = <not found>
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    line: raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
    locals:
      EOFError = <builtin> <class 'EOFError'>
      size = <local> 4
      read_size = <local> 0
EOFError: expected to read 4 bytes but got EOF after 0 bytes
Exception in py_wrap_get_sprint_automata_for_batch:
EXCEPTION
Traceback (most recent call last):
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in SprintSubprocessInstance._start_child
    line: ret = self._read()
    locals:
      ret = <not found>
      self = <local> <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7fdc896c9e50>
      self._read = <local> <bound method SprintSubprocessInstance._read of <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7fdc896c9e50>>
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in SprintSubprocessInstance._read
    line: return util.read_pickled_object(p)
    locals:
      util = <global> <module 'returnn.util.basic' from '/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py'>
      util.read_pickled_object = <global> <function read_pickled_object at 0x7fddcfbc3d30>
      p = <local> <_io.FileIO name=34 mode='rb' closefd=True>
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    line: size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
    locals:
      size_raw = <not found>
      read_bytes_to_new_buffer = <global> <function read_bytes_to_new_buffer at 0x7fddcfbc3ca0>
      p = <local> <_io.FileIO name=34 mode='rb' closefd=True>
      getvalue = <not found>
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    line: raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
    locals:
      EOFError = <builtin> <class 'EOFError'>
      size = <local> 4
      read_size = <local> 0
EOFError: expected to read 4 bytes but got EOF after 0 bytes

During handling of the above exception, another exception occurred:

EXCEPTION
Traceback (most recent call last):
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in get_sprint_automata_for_batch_op.<locals>.py_wrap_get_sprint_automata_for_batch
    line: return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
    locals:
      py_get_sprint_automata_for_batch = <global> <function py_get_sprint_automata_for_batch at 0x7fdc8c7361f0>
      sprint_opts = <local> {'sprintExecPath': '/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', 'sprintConfigStr': '--*.configuration.channel=output-channel --*.real-time-factor.channel=output-channel --*.system-info.channel=output-channel --*.time.channel=output-...
      tags = <not found>
      py_tags = <local> array([b'switchboard-1/sw02721B/sw2721B-ms98-a-0031',
                               b'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
                               b'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
                               b'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
                               b'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
                               b'switchboard-1/sw02...
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    line: edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
    locals:
      edges = <not found>
      weights = <not found>
      start_end_states = <not found>
      sprint_instance_pool = <local> <returnn.sprint.error_signals.SprintInstancePool object at 0x7fdc85e59f70>
      sprint_instance_pool.get_automata_for_batch = <local> <bound method SprintInstancePool.get_automata_for_batch of <returnn.sprint.error_signals.SprintInstancePool object at 0x7fdc85e59f70>>
      tags = <local> array([b'switchboard-1/sw02721B/sw2721B-ms98-a-0031',
                            b'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
                            b'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
                            b'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
                            b'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
                            b'switchboard-1/sw02...
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in SprintInstancePool.get_automata_for_batch
    line: instance = self._get_instance(i)
    locals:
      instance = <not found>
      self = <local> <returnn.sprint.error_signals.SprintInstancePool object at 0x7fdc85e59f70>
      self._get_instance = <local> <bound method SprintInstancePool._get_instance of <returnn.sprint.error_signals.SprintInstancePool object at 0x7fdc85e59f70>>
      i = <local> 0
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in SprintInstancePool._get_instance
    line: self._maybe_create_new_instance()
    locals:
      self = <local> <returnn.sprint.error_signals.SprintInstancePool object at 0x7fdc85e59f70>
      self._maybe_create_new_instance = <local> <bound method SprintInstancePool._maybe_create_new_instance of <returnn.sprint.error_signals.SprintInstancePool object at 0x7fdc85e59f70>>
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in SprintInstancePool._maybe_create_new_instance
    line: self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
    locals:
      self = <local> <returnn.sprint.error_signals.SprintInstancePool object at 0x7fdc85e59f70>
      self.instances = <local> []
      self.instances.append = <local> <built-in method append of list object at 0x7fdc8489e840>
      SprintSubprocessInstance = <global> <class 'returnn.sprint.error_signals.SprintSubprocessInstance'>
      self.sprint_opts = <local> {'sprintExecPath': '/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', 'sprintConfigStr': '--*.configuration.channel=output-channel --*.real-time-factor.channel=output-channel --*.system-info.channel=output-channel --*.time.channel=output-...
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in SprintSubprocessInstance.__init__
    line: self.init()
    locals:
      self = <local> <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7fdc896c9e50>
      self.init = <local> <bound method SprintSubprocessInstance.init of <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7fdc896c9e50>>
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in SprintSubprocessInstance.init
    line: self._start_child()
    locals:
      self = <local> <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7fdc896c9e50>
      self._start_child = <local> <bound method SprintSubprocessInstance._start_child of <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7fdc896c9e50>>
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in SprintSubprocessInstance._start_child
    line: raise Exception("SprintSubprocessInstance Sprint init failed")
    locals:
      Exception = <builtin> <class 'Exception'>
Exception: SprintSubprocessInstance Sprint init failed
2023-11-08 17:09:37.114349: W tensorflow/core/framework/op_kernel.cc:1816] UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


2023-11-08 17:09:37.114515: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 14907759204653744683
2023-11-08 17:09:37.114540: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 11924807411687211681
2023-11-08 17:09:37.114558: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 8498381501270362003
2023-11-08 17:09:37.114592: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 328642183433865367
2023-11-08 17:09:37.114608: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 15509202514790697743
2023-11-08 17:09:37.114638: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 12478617659299189133
2023-11-08 17:09:37.114656: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 17119912705987515863
2023-11-08 17:09:37.114671: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 1116834209094735605
2023-11-08 17:09:37.114687: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 4661036471183676975
2023-11-08 17:09:37.114703: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 17206736268075489981
2023-11-08 17:09:37.114723: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 11940517361119239617
2023-11-08 17:09:37.114737: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 2075000341389533861
2023-11-08 17:09:37.114757: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 1551945598752204051
2023-11-08 17:09:37.114773: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 18024994189871473987
2023-11-08 17:09:37.114787: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 8039025426040121703
2023-11-08 17:09:37.114801: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 12780907590735407947
2023-11-08 17:09:37.114832: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 18105505433626603299
2023-11-08 17:09:37.114848: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 14023509702728807603
2023-11-08 17:09:37.114861: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 4387189208380191869
2023-11-08 17:09:37.114877: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 15290859676350821985
2023-11-08 17:09:37.114891: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 4708683971917804685
2023-11-08 17:09:37.114905: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 782629118718604739
2023-11-08 17:09:37.114915: I tensorflow/core/framework/local_rendezvous.cc:409] Local rendezvous send item cancelled. Key hash: 16178361428648949333
2023-11-08 17:09:37.114930: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 1368360081948114135
2023-11-08 17:09:37.114956: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 9684463615367594434
2023-11-08 17:09:37.114970: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 11191673837626951548
2023-11-08 17:09:37.114986: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 4601451330222918362
2023-11-08 17:09:37.115000: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 14060714862683982606
2023-11-08 17:09:37.115032: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 16737683200926961030
2023-11-08 17:09:37.115046: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 17857287931859718032
2023-11-08 17:09:37.115059: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 5354699002852183842
2023-11-08 17:09:37.115073: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 12547361387349856700
2023-11-08 17:09:37.115087: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 15404591707848971056
2023-11-08 17:09:37.115101: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 7479360675682653368
2023-11-08 17:09:37.115115: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 15409731113398965776
2023-11-08 17:09:37.115131: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 9679296465648687078
2023-11-08 17:09:37.115145: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 9282137006686686836
2023-11-08 17:09:37.115158: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 9017255699680893100
2023-11-08 17:09:37.115172: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 16662337826391890718
2023-11-08 17:09:37.115186: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 6549064369067171100
2023-11-08 17:09:37.115225: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 5592458713738762450
2023-11-08 17:09:37.115243: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 6034280818993323922
2023-11-08 17:09:37.115258: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 18200915710976925794
2023-11-08 17:09:37.115271: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 15218690700986048972
2023-11-08 17:09:37.115284: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 8950560704742236676
2023-11-08 17:09:37.115294: I tensorflow/core/framework/local_rendezvous.cc:409] Local rendezvous send item cancelled. Key hash: 15258328697247900912
2023-11-08 17:09:37.115308: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 5450317640836131402
2023-11-08 17:09:37.115328: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 7607626667450182958
2023-11-08 17:09:37.115342: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 18231059680337670234
2023-11-08 17:09:37.115355: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 15520128163238770216
2023-11-08 17:09:37.115365: I tensorflow/core/framework/local_rendezvous.cc:409] Local rendezvous send item cancelled. Key hash: 6445139679874136070
2023-11-08 17:09:37.115379: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 5004971731649411668
2023-11-08 17:09:37.115409: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 4347196143763668518
MEMORY: main proc python3(2131331) increased RSS: rss=2.7GB pss=2.7GB uss=2.7GB shared=6.4MB
MEMORY: total (1 procs): pss=2.7GB uss=2.7GB
MEMORY: main proc python3(2131331) increased RSS: rss=2.8GB pss=2.8GB uss=2.8GB shared=6.4MB
MEMORY: total (1 procs): pss=2.8GB uss=2.8GB
MEMORY: main proc python3(2131331) increased RSS: rss=3.0GB pss=3.0GB uss=3.0GB shared=6.4MB
MEMORY: total (1 procs): pss=3.0GB uss=3.0GB
2023-11-08 17:09:51.148252: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:432] Loaded cuDNN version 8600
MEMORY: main proc python3(2131331) increased RSS: rss=3.2GB pss=3.2GB uss=3.2GB shared=6.4MB
MEMORY: total (1 procs): pss=3.2GB uss=3.2GB
MEMORY: main proc python3(2131331) increased RSS: rss=3.4GB pss=3.4GB uss=3.4GB shared=6.4MB
MEMORY: total (1 procs): pss=3.4GB uss=3.4GB
TensorFlow exception: Graph execution error:

Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "./returnn/rnn.py", line 11, in <module>
      main()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
      execute_main_task()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
      engine.init_train_from_config(config, train_data, dev_data, eval_data)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
      self.init_network_from_config(config)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
      self._init_network(net_desc=net_dict, epoch=self.epoch)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
      self.network, self.updater = self.create_network(
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
      updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
      self.loss = network.get_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
      self.maybe_construct_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
      self._construct_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
      losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
      if loss_obj.get_loss_value_for_objective() is not None:
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
      self._prepare()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
      self._loss_value = self.loss.get_value()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
      fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
      edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
      edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
  (0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
         [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
  (1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
  File "./returnn/rnn.py", line 11, in <module>
    main()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
    execute_main_task()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
    engine.init_train_from_config(config, train_data, dev_data, eval_data)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
    self.init_network_from_config(config)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
    self._init_network(net_desc=net_dict, epoch=self.epoch)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
    self.network, self.updater = self.create_network(
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
    updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
    self.loss = network.get_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
    self.maybe_construct_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
    self._construct_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
    losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
    if loss_obj.get_loss_value_for_objective() is not None:
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    self._prepare()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
    self._loss_value = self.loss.get_value()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
    fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
    edges, weights, start_end_states = tf_compat.v1.py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py", line 371, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 678, in py_func
    return py_func_common(func, inp, Tout, stateful, name=name)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 653, in py_func_common
    return _internal_py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 378, in _internal_py_func
    result = gen_script_ops.py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal
    ret = Operation.from_node_def(

Exception UnknownError() in step 0. (pid 2131331)
Failing op: <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
We tried to fetch the op inputs ([<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>]) but got another exception:
target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
EXCEPTION
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1379, in BaseSession._do_call
    line: return fn(*args)
    locals:
      fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7fdc85e97b80>
      args = <local> ({<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8b849530>: array([[[-0.05505638],
                             [-0.09610788],
                             [-0.05115783],
                             ...,
                             [ 0.        ],
                             [ 0.        ],
                             [ 0.        ]],

                            [[-0.00226238],
                             [-0.01049833],
                             [-0.00...
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1362, in BaseSession._do_run.<locals>._run_fn
    line: return self._call_tf_sessionrun(options, feed_dict, fetch_list,
                                          target_list, run_metadata)
    locals:
      self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
      self._call_tf_sessionrun = <local> <bound method BaseSession._call_tf_sessionrun of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
      options = <local> None
      feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8b849530>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc879fa270>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc87a00bb0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8c73f530>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b370>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b3b0>]
      run_metadata = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1455, in BaseSession._call_tf_sessionrun
    line: return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
                                                  fetch_list, target_list,
                                                  run_metadata)
    locals:
      tf_session = <global> <module 'tensorflow.python.client.pywrap_tf_session' from '/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/pywrap_tf_session.py'>
      tf_session.TF_SessionRun_wrapper = <global> <built-in method TF_SessionRun_wrapper of PyCapsule object at 0x7fdd96f6a480>
      self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
      self._session = <local> <tensorflow.python.client._pywrap_tf_session.TF_Session object at 0x7fdc8bcf3770>
      options = <local> None
      feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8b849530>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc879fa270>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc87a00bb0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8c73f530>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b370>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b3b0>]
      run_metadata = <local> None
UnknownError: 2 root error(s) found.
  (0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
         [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
  (1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.


During handling of the above exception, another exception occurred:

EXCEPTION
Traceback (most recent call last):
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 744, in Runner.run
    line: fetches_results = sess.run(
              fetches_dict, feed_dict=feed_dict, options=run_options
          )  # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
    locals:
      fetches_results = <not found>
      sess = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
      sess.run = <local> <bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
      fetches_dict = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
      feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      options = <not found>
      run_options = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 969, in BaseSession.run
    line: result = self._run(None, fetches, feed_dict, options_ptr,
                             run_metadata_ptr)
    locals:
      result = <not found>
      self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
      self._run = <local> <bound method BaseSession._run of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
      fetches = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
      feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      options_ptr = <local> None
      run_metadata_ptr = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1192, in BaseSession._run
    line: results = self._do_run(handle, final_targets, final_fetches,
                                 feed_dict_tensor, options, run_metadata)
    locals:
      results = <not found>
      self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
      self._do_run = <local> <bound method BaseSession._do_run of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
      handle = <local> None
      final_targets = <local> [<tf.Operation 'conformer_1_conv_mod_bn/batch_norm/cond/Merge_1' type=Merge>, <tf.Operation 'optim_and_step_incr' type=NoOp>]
      final_fetches = <local> [<tf.Tensor 'objective/add:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss_init/truediv:0' shape=() dtype=float32>, <tf.Tensor 'globals/mem_usage_deviceGPU0:0' shape=() dtype=in...
      feed_dict_tensor = <local> {<Reference wrapping <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>>: array([[[-0.05505638],
                                         [-0.09610788],
                                         [-0.05115783],
                                         ...,
                                         [ 0.        ],
                                         [ 0.        ],
                                         [ 0.        ]],

                                        [[-0.00226238],
                                         [-0.01049...
      options = <local> None
      run_metadata = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1372, in BaseSession._do_run
    line: return self._do_call(_run_fn, feeds, fetches, targets, options,
                               run_metadata)
    locals:
      self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
      self._do_call = <local> <bound method BaseSession._do_call of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
      _run_fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7fdc85e97b80>
      feeds = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8b849530>: array([[[-0.05505638],
                              [-0.09610788],
                              [-0.05115783],
                              ...,
                              [ 0.        ],
                              [ 0.        ],
                              [ 0.        ]],

                             [[-0.00226238],
                              [-0.01049833],
                              [-0.001...
      fetches = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc879fa270>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc87a00bb0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8c73f530>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      targets = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b370>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b3b0>]
      options = <local> None
      run_metadata = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1398, in BaseSession._do_call
    line: raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
    locals:
      type = <builtin> <class 'type'>
      e = <not found>
      node_def = <local> name: "objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch"
                         op: "PyFunc"
                         input: "extern_data/placeholders/seq_tag/seq_tag"
                         attr {
                           key: "token"
                           value {
                             s: "pyfunc_0"
                           }
                         }
                         attr {
                           key: "Tout"
                           value {
                             list {
                               type: DT_INT32
                               type: DT_FLOAT
                               type: DT_INT...
      op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      message = <local> 'Graph execution error:\n\nDetected at node \'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch\' defined at (most recent call last):\n    File "./returnn/rnn.py", line 11, in <module>\n      main()\n    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__mai..., len = 12234
UnknownError: Graph execution error:

Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "./returnn/rnn.py", line 11, in <module>
      main()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
      execute_main_task()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
      engine.init_train_from_config(config, train_data, dev_data, eval_data)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
      self.init_network_from_config(config)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
      self._init_network(net_desc=net_dict, epoch=self.epoch)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
      self.network, self.updater = self.create_network(
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
      updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
      self.loss = network.get_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
      self.maybe_construct_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
      self._construct_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
      losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
      if loss_obj.get_loss_value_for_objective() is not None:
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
      self._prepare()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
      self._loss_value = self.loss.get_value()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
      fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
      edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
      edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "./returnn/rnn.py", line 11, in <module>
      main()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
      execute_main_task()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
      engine.init_train_from_config(config, train_data, dev_data, eval_data)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
      self.init_network_from_config(config)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
      self._init_network(net_desc=net_dict, epoch=self.epoch)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
      self.network, self.updater = self.create_network(
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
      updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
      self.loss = network.get_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
      self.maybe_construct_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
      self._construct_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
      losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
      if loss_obj.get_loss_value_for_objective() is not None:
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
      self._prepare()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
      self._loss_value = self.loss.get_value()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
      fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
      edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
      edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
  (0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
         [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
  (1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
  File "./returnn/rnn.py", line 11, in <module>
    main()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
    execute_main_task()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
    engine.init_train_from_config(config, train_data, dev_data, eval_data)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
    self.init_network_from_config(config)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
    self._init_network(net_desc=net_dict, epoch=self.epoch)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
    self.network, self.updater = self.create_network(
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
    updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
    self.loss = network.get_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
    self.maybe_construct_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
    self._construct_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
    losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
    if loss_obj.get_loss_value_for_objective() is not None:
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    self._prepare()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
    self._loss_value = self.loss.get_value()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
    fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
    edges, weights, start_end_states = tf_compat.v1.py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py", line 371, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 678, in py_func
    return py_func_common(func, inp, Tout, stateful, name=name)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 653, in py_func_common
    return _internal_py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 378, in _internal_py_func
    result = gen_script_ops.py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal
    ret = Operation.from_node_def(



During handling of the above exception, another exception occurred:

EXCEPTION
Traceback (most recent call last):
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4341, in help_on_tf_exception
    line: debug_fetch, fetch_helpers, op_copied = FetchHelper.copy_graph(
              debug_fetch,
              target_op=op,
              fetch_helper_tensors=list(op.inputs),
              stop_at_ts=stop_at_ts,
              verbose_stream=file,
          )
    locals:
      debug_fetch = <local> <tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>
      fetch_helpers = <not found>
      op_copied = <not found>
      FetchHelper = <local> <class 'returnn.tf.util.basic.FetchHelper'>
      FetchHelper.copy_graph = <local> <bound method FetchHelper.copy_graph of <class 'returnn.tf.util.basic.FetchHelper'>>
      target_op = <not found>
      op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      fetch_helper_tensors = <not found>
      list = <builtin> <class 'list'>
      op.inputs = <local> (<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>,)
      stop_at_ts = <local> [<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>, <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>, <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, <tf.Tensor 'extern_data/placeholders/batch_dim:...
      verbose_stream = <not found>
      file = <local> <returnn.log.Stream object at 0x7fddcfa8ee50>
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/util/basic.py", line 7700, in FetchHelper.copy_graph
    line: assert target_op in ops, "target_op %r,\nops\n%s" % (target_op, pformat(ops))
    locals:
      target_op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      ops = <local> [<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
      pformat = <local> <function pformat at 0x7fddd3ddcc10>
AssertionError: target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]

Step meta information:
{'seq_idx': [0,
             1,
             2,
             3,
             4,
             5,
             6,
             7,
             8,
             9,
             10,
             11,
             12,
             13,
             14,
             15,
             16,
             17,
             18,
             19,
             20,
             21,
             22,
             23,
             24,
             25,
             26,
             27,
             28,
             29,
             30,
             31,
             32,
             33,
             34,
             35,
             36,
             37,
             38],
 'seq_tag': ['switchboard-1/sw02721B/sw2721B-ms98-a-0031',
             'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
             'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
             'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
             'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
             'switchboard-1/sw02145A/sw2145A-ms98-a-0107',
             'switchboard-1/sw02484A/sw2484A-ms98-a-0077',
             'switchboard-1/sw02768A/sw2768A-ms98-a-0064',
             'switchboard-1/sw03312B/sw3312B-ms98-a-0041',
             'switchboard-1/sw02344B/sw2344B-ms98-a-0023',
             'switchboard-1/sw04248B/sw4248B-ms98-a-0017',
             'switchboard-1/sw02762A/sw2762A-ms98-a-0059',
             'switchboard-1/sw03146A/sw3146A-ms98-a-0047',
             'switchboard-1/sw03032A/sw3032A-ms98-a-0065',
             'switchboard-1/sw02288A/sw2288A-ms98-a-0080',
             'switchboard-1/sw02751A/sw2751A-ms98-a-0066',
             'switchboard-1/sw02369A/sw2369A-ms98-a-0118',
             'switchboard-1/sw04169A/sw4169A-ms98-a-0059',
             'switchboard-1/sw02227A/sw2227A-ms98-a-0016',
             'switchboard-1/sw02061B/sw2061B-ms98-a-0170',
             'switchboard-1/sw02862B/sw2862B-ms98-a-0033',
             'switchboard-1/sw03116B/sw3116B-ms98-a-0065',
             'switchboard-1/sw03517B/sw3517B-ms98-a-0038',
             'switchboard-1/sw02360B/sw2360B-ms98-a-0086',
             'switchboard-1/sw02510B/sw2510B-ms98-a-0061',
             'switchboard-1/sw03919A/sw3919A-ms98-a-0017',
             'switchboard-1/sw02965A/sw2965A-ms98-a-0045',
             'switchboard-1/sw03154A/sw3154A-ms98-a-0073',
             'switchboard-1/sw02299A/sw2299A-ms98-a-0005',
             'switchboard-1/sw04572A/sw4572A-ms98-a-0026',
             'switchboard-1/sw02682A/sw2682A-ms98-a-0022',
             'switchboard-1/sw02808A/sw2808A-ms98-a-0014',
             'switchboard-1/sw04526A/sw4526A-ms98-a-0026',
             'switchboard-1/sw03180B/sw3180B-ms98-a-0010',
             'switchboard-1/sw03227A/sw3227A-ms98-a-0029',
             'switchboard-1/sw03891B/sw3891B-ms98-a-0008',
             'switchboard-1/sw03882B/sw3882B-ms98-a-0041',
             'switchboard-1/sw03102B/sw3102B-ms98-a-0027',
             'switchboard-1/sw02454A/sw2454A-ms98-a-0029']}
Feed dict:
  <tf.Tensor 'extern_data/placeholders/batch_dim:0' shape=() dtype=int32>: int(39)
  <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: shape (39, 10208, 1), dtype float32, min/max -1.0/1.0, mean/stddev 0.0014351769/0.11459725, Tensor{'data', [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}
  <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>: shape (39,), dtype int32, min/max 4760/10208, ([ 4760  6246  6372  6861  7296  7499  7534  7622  7824  8031  8295  8431
  8690  8675  8667  8886  9084  9199  9163  9156  9274  9262  9540  9668
  9678  9719  9711  9902  9989 10010 10020 10073 10006 10102 10131 10112
 10130 10178 10208])
  <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>: type <class 'list'>, Tensor{'seq_tag', [B?], dtype='string'}
  <tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>: bool(True)
EXCEPTION
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1379, in BaseSession._do_call
    line: return fn(*args)
    locals:
      fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7fdc85e97b80>
      args = <local> ({<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8b849530>: array([[[-0.05505638],
                             [-0.09610788],
                             [-0.05115783],
                             ...,
                             [ 0.        ],
                             [ 0.        ],
                             [ 0.        ]],

                            [[-0.00226238],
                             [-0.01049833],
                             [-0.00...
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1362, in BaseSession._do_run.<locals>._run_fn
    line: return self._call_tf_sessionrun(options, feed_dict, fetch_list,
                                          target_list, run_metadata)
    locals:
      self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
      self._call_tf_sessionrun = <local> <bound method BaseSession._call_tf_sessionrun of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
      options = <local> None
      feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8b849530>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc879fa270>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc87a00bb0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8c73f530>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b370>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b3b0>]
      run_metadata = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1455, in BaseSession._call_tf_sessionrun
    line: return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
                                                  fetch_list, target_list,
                                                  run_metadata)
    locals:
      tf_session = <global> <module 'tensorflow.python.client.pywrap_tf_session' from '/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/pywrap_tf_session.py'>
      tf_session.TF_SessionRun_wrapper = <global> <built-in method TF_SessionRun_wrapper of PyCapsule object at 0x7fdd96f6a480>
      self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
      self._session = <local> <tensorflow.python.client._pywrap_tf_session.TF_Session object at 0x7fdc8bcf3770>
      options = <local> None
      feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8b849530>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc879fa270>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc87a00bb0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8c73f530>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b370>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b3b0>]
      run_metadata = <local> None
UnknownError: 2 root error(s) found.
  (0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
         [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
  (1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

EXCEPTION
Traceback (most recent call last):
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 744, in Runner.run
    line: fetches_results = sess.run(
              fetches_dict, feed_dict=feed_dict, options=run_options
          )  # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
    locals:
      fetches_results = <not found>
      sess = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
      sess.run = <local> <bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
      fetches_dict = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
      feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      options = <not found>
      run_options = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 969, in BaseSession.run
    line: result = self._run(None, fetches, feed_dict, options_ptr,
                             run_metadata_ptr)
    locals:
      result = <not found>
      self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
      self._run = <local> <bound method BaseSession._run of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
      fetches = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
      feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
                                  [-0.09610788],
                                  [-0.05115783],
                                  ...,
                                  [ 0.        ],
                                  [ 0.        ],
                                  [ 0.        ]],

                                 [[-0.00226238],
                                  [-0.01049833],
                                  [-0.001...
      options_ptr = <local> None
      run_metadata_ptr = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1192, in BaseSession._run
    line: results = self._do_run(handle, final_targets, final_fetches,
                                 feed_dict_tensor, options, run_metadata)
    locals:
      results = <not found>
      self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
      self._do_run = <local> <bound method BaseSession._do_run of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
      handle = <local> None
      final_targets = <local> [<tf.Operation 'conformer_1_conv_mod_bn/batch_norm/cond/Merge_1' type=Merge>, <tf.Operation 'optim_and_step_incr' type=NoOp>]
      final_fetches = <local> [<tf.Tensor 'objective/add:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss_init/truediv:0' shape=() dtype=float32>, <tf.Tensor 'globals/mem_usage_deviceGPU0:0' shape=() dtype=in...
      feed_dict_tensor = <local> {<Reference wrapping <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>>: array([[[-0.05505638],
                                         [-0.09610788],
                                         [-0.05115783],
                                         ...,
                                         [ 0.        ],
                                         [ 0.        ],
                                         [ 0.        ]],

                                        [[-0.00226238],
                                         [-0.01049...
      options = <local> None
      run_metadata = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1372, in BaseSession._do_run
    line: return self._do_call(_run_fn, feeds, fetches, targets, options,
                               run_metadata)
    locals:
      self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
      self._do_call = <local> <bound method BaseSession._do_call of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
      _run_fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7fdc85e97b80>
      feeds = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8b849530>: array([[[-0.05505638],
                              [-0.09610788],
                              [-0.05115783],
                              ...,
                              [ 0.        ],
                              [ 0.        ],
                              [ 0.        ]],

                             [[-0.00226238],
                              [-0.01049833],
                              [-0.001...
      fetches = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc879fa270>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc87a00bb0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8c73f530>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      targets = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b370>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b3b0>]
      options = <local> None
      run_metadata = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1398, in BaseSession._do_call
    line: raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
    locals:
      type = <builtin> <class 'type'>
      e = <not found>
      node_def = <local> name: "objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch"
                         op: "PyFunc"
                         input: "extern_data/placeholders/seq_tag/seq_tag"
                         attr {
                           key: "token"
                           value {
                             s: "pyfunc_0"
                           }
                         }
                         attr {
                           key: "Tout"
                           value {
                             list {
                               type: DT_INT32
                               type: DT_FLOAT
                               type: DT_INT...
      op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
      message = <local> 'Graph execution error:\n\nDetected at node \'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch\' defined at (most recent call last):\n    File "./returnn/rnn.py", line 11, in <module>\n      main()\n    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__mai..., len = 12234
UnknownError: Graph execution error:

Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
    File "./returnn/rnn.py", line 11, in <module>
      main()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
      execute_main_task()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
      engine.init_train_from_config(config, train_data, dev_data, eval_data)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
      self.init_network_from_config(config)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
      self._init_network(net_desc=net_dict, epoch=self.epoch)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
      self.network, self.updater = self.create_network(
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
      updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
      self.loss = network.get_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
      self.maybe_construct_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
      self._construct_objective()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
      losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
      if loss_obj.get_loss_value_for_objective() is not None:
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
      self._prepare()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
      self._loss_value = self.loss.get_value()
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
      fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
      edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
    File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
      edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
  (0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
         [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
  (1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
    ret = self._read()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
    return util.read_pickled_object(p)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
    size_raw = read_bytes_to_new_buffer(p, 4).getvalue()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
    raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))

EOFError: expected to read 4 bytes but got EOF after 0 bytes


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
    return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
    edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
    instance = self._get_instance(i)

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
    self._maybe_create_new_instance()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
    self.instances.append(SprintSubprocessInstance(**self.sprint_opts))

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
    self.init()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
    self._start_child()

  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
    raise Exception("SprintSubprocessInstance Sprint init failed")

Exception: SprintSubprocessInstance Sprint init failed


         [[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
  File "./returnn/rnn.py", line 11, in <module>
    main()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
    execute_main_task()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
    engine.init_train_from_config(config, train_data, dev_data, eval_data)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
    self.init_network_from_config(config)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
    self._init_network(net_desc=net_dict, epoch=self.epoch)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
    self.network, self.updater = self.create_network(
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
    updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
    self.loss = network.get_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
    self.maybe_construct_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
    self._construct_objective()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
    losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
    if loss_obj.get_loss_value_for_objective() is not None:
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
    self._prepare()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
    self._loss_value = self.loss.get_value()
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
    fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
    edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
  File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
    edges, weights, start_end_states = tf_compat.v1.py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py", line 371, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 678, in py_func
    return py_func_common(func, inp, Tout, stateful, name=name)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 653, in py_func_common
    return _internal_py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 378, in _internal_py_func
    result = gen_script_ops.py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal
    ret = Operation.from_node_def(

Save model under output/models/epoch.001.crash_0
Trainer not finalized, quitting. (pid 2131331)
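The `EOFError: expected to read 4 bytes but got EOF after 0 bytes` in the traceback above means RETURNN was waiting for the 4-byte size prefix of a pickled reply from the RASR child process, and the pipe was closed before a single byte arrived. In other words, the child died (or never started) before answering. Below is a minimal sketch of this kind of length-prefixed pickle framing. It is not RETURNN's actual `read_pickled_object`; the little-endian unsigned 32-bit size field (`"<I"`) and the helper names are assumptions for illustration.

```python
import io
import pickle
import struct
from typing import Any, BinaryIO

# Assumed framing: 4-byte little-endian unsigned size, then the pickle payload.
SIZE_FMT = "<I"
SIZE_LEN = struct.calcsize(SIZE_FMT)  # 4


def read_exactly(stream: BinaryIO, size: int) -> bytes:
    """Read exactly `size` bytes, raising EOFError if the stream ends early."""
    buf = bytearray()
    while len(buf) < size:
        chunk = stream.read(size - len(buf))
        if not chunk:  # peer closed the pipe, e.g. because the child process crashed
            raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, len(buf)))
        buf += chunk
    return bytes(buf)


def write_framed_pickle(stream: BinaryIO, obj: Any) -> None:
    payload = pickle.dumps(obj)
    stream.write(struct.pack(SIZE_FMT, len(payload)) + payload)


def read_framed_pickle(stream: BinaryIO) -> Any:
    (size,) = struct.unpack(SIZE_FMT, read_exactly(stream, SIZE_LEN))
    return pickle.loads(read_exactly(stream, size))


if __name__ == "__main__":
    ok = io.BytesIO()
    write_framed_pickle(ok, ("ok", 42))
    ok.seek(0)
    print(read_framed_pickle(ok))  # ('ok', 42)

    try:
        read_framed_pickle(io.BytesIO(b""))  # the child wrote nothing before exiting
    except EOFError as exc:
        print(exc)  # expected to read 4 bytes but got EOF after 0 bytes
```

An empty stream reproduces exactly the message seen here. That is why this error on its own says nothing about *why* RASR failed; the RASR log or stderr has to be checked for that.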

@albertz
Member

albertz commented Nov 8, 2023

There it seems that RASR does not start at all. I see:

/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard: error while loading shared libraries: libtensorflow_cc.so.2: cannot open shared object file: No such file or directory
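One way to see which shared libraries a binary cannot resolve is to run `ldd` on it. It lists every dependency and marks unresolved ones with `not found`. Here is a minimal Python sketch that extracts those entries. It assumes a Linux host with `ldd` on `PATH`, and the helper names are made up for illustration.

```python
import subprocess
import sys
from typing import List


def missing_libs_from_ldd_output(ldd_output: str) -> List[str]:
    """Extract the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Typical unresolved line: "\tlibtensorflow_cc.so.2 => not found"
        name, sep, target = line.strip().partition("=>")
        if sep and target.strip() == "not found":
            missing.append(name.strip())
    return missing


def missing_libs(binary_path: str) -> List[str]:
    """Run ldd on a binary and return its unresolved shared libraries."""
    out = subprocess.run(["ldd", binary_path], capture_output=True, text=True, check=False).stdout
    return missing_libs_from_ldd_output(out)


if __name__ == "__main__":
    # Usage: python check_libs.py /path/to/nn-trainer.linux-x86_64-standard
    for path in sys.argv[1:]:
        print(path, missing_libs(path))
```

Running it inside the same container, with the same environment as the training job, matters, because the result depends on the loader search path at that moment.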

@albertz
Member

albertz commented Nov 8, 2023

Btw, the RASR segmentation fault actually looks like a bug in RASR. RASR should never segfault.
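From the parent side, a crashed child can be told apart from an ordinary failure. On POSIX, Python's `subprocess` reports a process that was killed by a signal with a negative return code, so a segfault shows up as `-signal.SIGSEGV` (-11 on Linux). The following sketch turns that into a readable message. The helper names are hypothetical, and this is not how RETURNN currently reports the error.

```python
import signal
import subprocess
import sys
from typing import List


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable description (POSIX semantics)."""
    if returncode < 0:
        # A negative value -N means the child was terminated by signal N.
        return "killed by signal %s" % signal.Signals(-returncode).name
    return "exited with code %i" % returncode


def run_and_describe(cmd: List[str]) -> str:
    proc = subprocess.run(cmd, check=False)
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # A child that deliberately segfaults itself, standing in for a crashing RASR process.
    crash = [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    print(run_and_describe(crash))  # killed by signal SIGSEGV
```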

@Marvin84

Marvin84 commented Nov 8, 2023 via email

@albertz
Member

albertz commented Nov 8, 2023

Whenever RASR gives a segfault, that's a bug in RASR. It should never segfault. Can you link corresponding RASR issues here? Or if this is not reported yet, can you open a corresponding RASR issue?

@vieting
Contributor Author

vieting commented Nov 9, 2023

I created a RASR issue about the segfault with the tf2.14 image and the RASR built for it: rwth-i6/rasr#68

@albertz
Member

albertz commented Nov 9, 2023

> With my previous settings (tf2.13, RASR compiled with tf2.8)

> There it seems that RASR does not start at all. I see:
>
> /work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard: error while loading shared libraries: libtensorflow_cc.so.2: cannot open shared object file: No such file or directory

@vieting Did you look at that? Did you fix it? Maybe it just needs the right LD_LIBRARY_PATH. Or use some other RASR, maybe one without TF.
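If the binary itself is fine and `libtensorflow_cc.so.2` is simply not on the dynamic loader's search path, the usual fix is to prepend the directory that contains it to `LD_LIBRARY_PATH` for the child process. Here is a minimal sketch of building such an environment. The directory is a placeholder, not a path from this setup.

```python
import os
from typing import Dict, Mapping, Optional, Sequence


def env_with_library_dirs(
    extra_dirs: Sequence[str], base_env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Copy base_env (default: os.environ) and prepend extra_dirs to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = [p for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p]
    env["LD_LIBRARY_PATH"] = os.pathsep.join(list(extra_dirs) + existing)
    return env


if __name__ == "__main__":
    # Placeholder: the directory that actually contains libtensorflow_cc.so.2.
    env = env_with_library_dirs(["/path/to/tensorflow/lib"])
    print(env["LD_LIBRARY_PATH"])
    # Pass it on when spawning the binary, e.g. subprocess.Popen(cmd, env=env).
```

Exporting `LD_LIBRARY_PATH` in the container before starting RETURNN should have the same effect, assuming the RASR child inherits RETURNN's environment.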

@vieting
Contributor Author

vieting commented Nov 9, 2023

> > With my previous settings (tf2.13, RASR compiled with tf2.8)
>
> > There it seems that RASR does not start at all. I see:
> >
> > /work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard: error while loading shared libraries: libtensorflow_cc.so.2: cannot open shared object file: No such file or directory
>
> @vieting Did you look at that? Did you fix it? Maybe it just needs the right LD_LIBRARY_PATH. Or use some other RASR, maybe one without TF.

I just tried with the tf2.13 image and a RASR that was compiled without TF. There, I also get a segmentation fault. It looks identical to the one in rwth-i6/rasr#68.

Segmentation fault

Creating stack trace (innermost first):
#2  /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7fe88e9ec420]
#3  /lib/x86_64-linux-gnu/libpthread.so.0(raise+0xcb) [0x7fe88e9ec2ab]
#4  /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7fe88e9ec420]
#5  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Ftl::TrimAutomaton<Fsa::Automaton>::getState(unsigned int) const+0x3a) [0x55e5abccd12a]
#6  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Ftl::CacheAutomaton<Fsa::Automaton>::getState(unsigned int) const+0x373) [0x55e5abcdd653]
#7  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(+0xa0cdd3) [0x55e5abc54dd3]
#8  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(+0xa0f376) [0x55e5abc57376]
#9  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Am::TransitionModel::apply(Core::Ref<Fsa::Automaton const>, int, bool) const+0x25b) [0x55e5abc4fc2b]
#10  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Am::ClassicTransducerBuilder::applyTransitionModel(Core::Ref<Fsa::Automaton const>)+0x34d) [0x55e5abc436dd]
#11  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Speech::AllophoneStateGraphBuilder::addLoopTransition(Core::Ref<Fsa::Automaton const>)+0x11e) [0x55e5abb05cde]
#12  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Speech::CTCTopologyGraphBuilder::addLoopTransition(Core::Ref<Fsa::Automaton const>)+0x45) [0x55e5abb08e05]
#13  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Speech::CTCTopologyGraphBuilder::buildTransducer(Core::Ref<Fsa::Automaton const>)+0x80) [0x55e5abb0a4a0]
#14  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Speech::AllophoneStateGraphBuilder::buildTransducer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x60) [0x55e5abb06d90]
#15  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Speech::AllophoneStateGraphBuilder::build(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x2e) [0x55e5abb087de]
#16  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Nn::AllophoneStateFsaExporter::exportFsaForOrthography(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const+0x4c) [0x55e5ab9a9d1c]
#17  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Nn::PythonControl::Internal::exportAllophoneStateFsaBySegName(_object*, _object*)+0x108) [0x55e5ab990918]
#18  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Nn::PythonControl::Internal::callback(_object*, _object*)+0x297) [0x55e5ab990fb7]
#19  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8748) [0x7fe88a2f4748]
#20  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyObject_MakeTpCall+0xab) [0x7fe88a2f4b2b]
#21  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74df3) [0x7fe88a0c0df3]
#22  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7d86) [0x7fe88a0c8ef6]
#23  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b) [0x7fe88a0cc06b]
#24  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8d37) [0x7fe88a2f4d37]
#25  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyVectorcall_Call+0x60) [0x7fe88a2f4840]
#26  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x590a) [0x7fe88a0c6a7a]
#27  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb) [0x7fe88a216e4b]
#28  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94) [0x7fe88a2f4124]
#29  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8d37) [0x7fe88a2f4d37]
#30  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyVectorcall_Call+0x60) [0x7fe88a2f4840]
#31  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x590a) [0x7fe88a0c6a7a]
#32  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b) [0x7fe88a0cc06b]
#33  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d) [0x7fe88a0c0d6d]
#34  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0xea8) [0x7fe88a0c2018]
#35  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb) [0x7fe88a216e4b]
#36  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94) [0x7fe88a2f4124]
#37  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8d37) [0x7fe88a2f4d37]
#38  /lib/x86_64-linux-gnu/libpython3.8.so.1.0(PyVectorcall_Call+0x60) [0x7fe88a2f4840]
#39  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Python::PyCallKw(_object*, char const*, char const*, ...)+0xe6) [0x55e5abc2ab56]
#40  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Nn::PythonControl::run_control_loop()+0x66) [0x55e5ab984246]
#41  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(NnTrainer::pythonControl()+0x117) [0x55e5ab701ed7]
#42  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(NnTrainer::main(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)+0x304) [0x55e5ab6dcdf4]
#43  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Core::Application::run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)+0x23) [0x55e5ab748913]
#44  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(Core::Application::main(int, char**)+0x5fb) [0x55e5ab6de69b]
#45  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(main+0x3d) [0x55e5ab6dc4ed]
#46  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fe889b32083]
#47  /u/hilmes/dev/rasr_onnx_115/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_start+0x2e) [0x55e5ab7016be]

@albertz
Member

albertz commented Nov 9, 2023

So, on the RETURNN/Python side, the last call before the crash is basically:

    def _handle_cmd_export_allophone_state_fsa_by_segment_name(self, segment_name):
        return self.callback("export_allophone_state_fsa_by_segment_name", segment_name)

Everything after that happens inside RASR (`callback` is RASR's Python API).
Do we know which segment it is? Do you see it in the RASR log? Otherwise, maybe add a print here to show it.

@vieting
Contributor Author

vieting commented Nov 9, 2023

I just noticed that I also get the same segmentation fault with the old TF 2.8 image and the RASR build without TF. So maybe this is caused by some mismatch between image and RASR build. With the TF 2.8 image and RASR compiled in that image, the example I created runs properly.

@Marvin84

Marvin84 commented Nov 9, 2023

@vieting which RASR binary are you using? Is it up to date? There was a memory leak bug once we integrated the FSA bug correction and brought the CTC topology under the same subroutine. @SimBe195 fixed this a few months ago.

@vieting
Contributor Author

vieting commented Nov 9, 2023


You mean this one, right? rwth-i6/rasr#47

The RASR without TF is Bene's build from the add_onnx_support branch, with the latest commits from August 2023. The RASR for TF 2.14 is the current GitHub main branch. Both include the commit from rwth-i6/rasr#47, and so does my RASR build for TF 2.8.


@vieting
Contributor Author

vieting commented Nov 9, 2023

My TF 2.8 image together with RASR compiled in that image works. All other combinations fail, including the TF 2.14 image from rwth-i6/rasr#64 with RASR compiled in that image.

@albertz
Member

albertz commented Nov 9, 2023

> tf 2.8 image and RASR compiled with that image

And this is the same RASR version as in the other cases?

@albertz albertz changed the title from "TF get_sprint_automata_for_batch: EOFError: Ran out of input" to "TF get_sprint_automata_for_batch: RASR segmentation fault in Speech::CTCTopologyGraphBuilder::addLoopTransition" on Nov 9, 2023
@albertz
Member

albertz commented Nov 9, 2023

It seems the RASR bug causing the segmentation fault is fixed by rwth-i6/rasr#50.
