Have I written custom code (as opposed to running examples on an unmodified clone of the repository): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
TensorFlow installed from (our builds, or upstream TensorFlow): pip install tensorflow-gpu==1.15.2
TensorFlow version (use command below): v1.15.0-92-g5d80e1e 1.15.2
Python version: Python 3.6.10 :: Anaconda, Inc.
CUDA/cuDNN version: CUDA Version 10.0.130
Thank you for the great repo.
I am trying to train a German DeepSpeech model. I am using the pre-processing scripts from the bin folder and can train the model successfully on the Common Voice and M-AILABS datasets. However, when I try to train the model on the Tuda-De dataset, I get the exceptions below.
Could you please help me fix this issue?
(deepspeech_v0.7.4) agarwal@LTLab.lan@wika:~/deepspeech_v0.7.4$ python DeepSpeech.py --train_files ../german-speech-corpus/tuda-de/data_prepared_mozilla_v0.7.4/tuda-v2-train.csv --dev_files ../german-speech-corpus/tuda-de/data_prepared_mozilla_v0.7.4/tuda-v2-dev.csv --test_files ../german-speech-corpus/tuda-de/data_prepared_mozilla_v0.7.4/tuda-v2-test.csv --alphabet_config_path ../dependencies_v0.7.4/swiss-german/alphabet.txt --scorer ../dependencies_v0.7.4/swiss-german/kenlm.scorer --test_batch_size 36 --train_batch_size 24 --dev_batch_size 36 --epochs 30 --learning_rate 0.0001 --dropout_rate 0.25 --early_stop True --es_epochs 5 --train_cudnn --checkpoint_dir checkpoints_experiments2/tmp/
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:16 | Steps: 1 | Loss: 191.423950
Traceback (most recent call last):
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 61, 24, 2048]
[[{{node tower_0/cudnn_lstm/CudnnRNNV3_2}}]]
(1) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 61, 24, 2048]
[[{{node tower_0/cudnn_lstm/CudnnRNNV3_2}}]]
[[tower_0/CTCLoss/_115]]
0 successful operations.
2 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "DeepSpeech.py", line 12, in <module>
ds_train.run_script()
File "/media/data/LTLab.lan/agarwal/deepspeech_v0.7.4/training/deepspeech_training/train.py", line 955, in run_script
absl.app.run(main)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/media/data/LTLab.lan/agarwal/deepspeech_v0.7.4/training/deepspeech_training/train.py", line 927, in main
train()
File "/media/data/LTLab.lan/agarwal/deepspeech_v0.7.4/training/deepspeech_training/train.py", line 595, in train
train_loss, _ = run_set('train', epoch, train_init_op)
File "/media/data/LTLab.lan/agarwal/deepspeech_v0.7.4/training/deepspeech_training/train.py", line 560, in run_set
feed_dict=feed_dict)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 61, 24, 2048]
[[node tower_0/cudnn_lstm/CudnnRNNV3_2 (defined at /home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
(1) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 61, 24, 2048]
[[node tower_0/cudnn_lstm/CudnnRNNV3_2 (defined at /home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[tower_0/CTCLoss/_115]]
0 successful operations.
2 derived errors ignored.
Original stack trace for 'tower_0/cudnn_lstm/CudnnRNNV3_2':
File "DeepSpeech.py", line 12, in <module>
ds_train.run_script()
File "/media/data/LTLab.lan/agarwal/deepspeech_v0.7.4/training/deepspeech_training/train.py", line 955, in run_script
absl.app.run(main)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/media/data/LTLab.lan/agarwal/deepspeech_v0.7.4/training/deepspeech_training/train.py", line 927, in main
train()
File "/media/data/LTLab.lan/agarwal/deepspeech_v0.7.4/training/deepspeech_training/train.py", line 473, in train
gradients, loss, non_finite_files = get_tower_results(iterator, optimizer, dropout_rates)
File "/media/data/LTLab.lan/agarwal/deepspeech_v0.7.4/training/deepspeech_training/train.py", line 312, in get_tower_results
avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(iterator, dropout_rates, reuse=i > 0)
File "/media/data/LTLab.lan/agarwal/deepspeech_v0.7.4/training/deepspeech_training/train.py", line 239, in calculate_mean_edit_distance_and_loss
logits, _ = create_model(batch_x, batch_seq_len, dropout, reuse=reuse, rnn_impl=rnn_impl)
File "/media/data/LTLab.lan/agarwal/deepspeech_v0.7.4/training/deepspeech_training/train.py", line 190, in create_model
output, output_state = rnn_impl(layer_3, seq_length, previous_state, reuse)
File "/media/data/LTLab.lan/agarwal/deepspeech_v0.7.4/training/deepspeech_training/train.py", line 128, in rnn_impl_cudnn_rnn
sequence_lengths=seq_length)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/layers/base.py", line 548, in __call__
outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 854, in __call__
outputs = call_fn(cast_inputs, *args, **kwargs)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 234, in wrapper
return converted_call(f, options, args, kwargs)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 439, in converted_call
return _call_unconverted(f, args, kwargs, options)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 330, in _call_unconverted
return f(*args, **kwargs)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 440, in call
training)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 518, in _forward
seed=self._seed)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 1132, in _cudnn_rnn
outputs, output_h, output_c, _, _ = gen_cudnn_rnn_ops.cudnn_rnnv3(**args)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_cudnn_rnn_ops.py", line 2051, in cudnn_rnnv3
time_major=time_major, name=name)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/home/LTLab.lan/agarwal/miniconda3/envs/deepspeech_v0.7.4/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
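The failure happens at step 1 with max_seq_length 61 and batch_size 24, so the failing batch appears to contain very short clips. One thing that may be worth checking (an assumption, the log does not confirm it) is whether some Tuda-De rows have a transcript longer than the number of feature steps in the audio. Below is a rough sketch of such a check in Python. It assumes the CSV has the columns wav_filename, wav_filesize and transcript, that the audio is 16 kHz, 16-bit mono WAV with a 44-byte header, and that the feature step is 20 ms. The function names are hypothetical and not part of DeepSpeech.

```python
import csv

# Assumptions (not taken from the log): 16 kHz, 16-bit mono PCM WAV,
# a 44-byte WAV header, and a 20 ms feature step.
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2
WAV_HEADER_BYTES = 44
FEATURE_STEP_MS = 20
BYTES_PER_STEP = SAMPLE_RATE * BYTES_PER_SAMPLE * FEATURE_STEP_MS // 1000  # 640


def estimated_steps(wav_filesize):
    """Roughly estimate the number of feature time steps from the file size."""
    audio_bytes = max(wav_filesize - WAV_HEADER_BYTES, 0)
    return audio_bytes // BYTES_PER_STEP


def find_suspicious_rows(csv_file):
    """Return (wav_filename, steps, transcript_length) for every row whose
    estimated step count is smaller than its transcript length."""
    suspicious = []
    for row in csv.DictReader(csv_file):
        steps = estimated_steps(int(row["wav_filesize"]))
        transcript_length = len(row["transcript"])
        if steps < transcript_length:
            suspicious.append((row["wav_filename"], steps, transcript_length))
    return suspicious


# Example usage with the training CSV from the command above:
# with open("tuda-v2-train.csv", newline="", encoding="utf-8") as f:
#     for name, steps, length in find_suspicious_rows(f):
#         print(f"{name}: ~{steps} steps for {length} characters")
```

The estimate is only approximate, since the real header size and sample format may differ, but any rows it reports could be inspected or removed from the CSV before retrying.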