E tensorflow/stream_executor/dnn.cc:616] CUDNN_STATUS_EXECUTION_FAILED #41993

Closed
summa-code opened this issue Aug 3, 2020 · 8 comments
Labels: stat:awaiting response, type:bug

@summa-code

Please make sure that this is a bug. As per our
GitHub Policy,
we only address code/doc bugs, performance issues, feature requests and
build/installation issues on GitHub. tag:bug_template

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): 20.04
  • TensorFlow installed from (source or binary): Latest
  • TensorFlow version (use command below): Latest
  • Python version: 3.8.1
  • Bazel version (if compiling from source): 3.1.0
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 11.0
  • GPU model and memory: RTX 2070

https://www.tensorflow.org/tutorials/structured_data/time_series

E tensorflow/stream_executor/dnn.cc:616] CUDNN_STATUS_EXECUTION_FAILED
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1831): 'cudnnRNNForwardTraining( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, input_desc.handles(), input_data.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), rnn_desc.params_handle(), params.opaque(), output_desc.handles(), output_data->opaque(), output_h_desc.handle(), output_h_data->opaque(), output_c_desc.handle(), output_c_data->opaque(), workspace.opaque(), workspace.size(), reserve_space.opaque(), reserve_space.size())'
2020-08-02 22:58:40.548649: W tensorflow/core/framework/op_kernel.cc:1772] OP_REQUIRES failed at cudnn_rnn_ops.cc:1517 : Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 19, 32, 1, 24, 32, 32]
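
For readers decoding the log line above, here is a small sketch that maps the bracketed name lists in the "model config" message to the values that follow them. The helper name `parse_model_config` and the `RNN_MODES` table are illustrative and not part of TensorFlow. Reading `rnn_mode` 2 as LSTM assumes the integer follows the ordering of cuDNN's `cudnnRNNMode_t` enum, which matches the LSTM layer used in the tutorial.

```python
import re

# Assumption: rnn_mode uses the same ordering as cuDNN's cudnnRNNMode_t enum.
RNN_MODES = {0: "CUDNN_RNN_RELU", 1: "CUDNN_RNN_TANH", 2: "CUDNN_LSTM", 3: "CUDNN_GRU"}

# Message text copied from the log in this issue.
MESSAGE = (
    "Failed to call ThenRnnForward with model config: "
    "[rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , "
    "[num_layers, input_size, num_units, dir_count, max_seq_length, "
    "batch_size, cell_num_units]: [1, 19, 32, 1, 24, 32, 32]"
)


def parse_model_config(message):
    """Pair each name in a '[names]: values' group with its integer value."""
    config = {}
    # A group is a bracketed list of names, a colon, then the values,
    # which may or may not be wrapped in brackets themselves.
    pattern = re.compile(r"\[([a-z_,\s]+)\]:\s*\[?([\d,\s]+)\]?")
    for names, values in pattern.findall(message):
        keys = [name.strip() for name in names.split(",")]
        numbers = [int(v) for v in values.split(",") if v.strip()]
        config.update(zip(keys, numbers))
    return config


config = parse_model_config(MESSAGE)
print(RNN_MODES[config["rnn_mode"]], config)
```

Read this way, the failing op is a single-layer, unidirectional LSTM with 32 units, fed batches of 32 sequences of length 24 with 19 features each.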

@Saduf2019
Contributor

@summa-code
Please provide simple standalone code or a Colab gist that reproduces the error so that we can analyse the issue.

Saduf2019 added the stat:awaiting response label Aug 3, 2020

@applied-machinelearning

Since an LSTM is being run on CUDA, this is probably #41630. You can try the workaround of setting the environment variable TF_CUDNN_RESET_RND_GEN_STATE=1.
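
A minimal sketch of applying that workaround from inside a Python script or notebook. It assumes the variable must be in the process environment before TensorFlow initializes cuDNN. The thread does not say when the variable is read, so setting it before the import is simply the cautious choice. The commented-out import stands in for your own training code.

```python
import os

# Workaround suggested in this thread (see #41630).
# Assumption: set it before `import tensorflow` so the flag is already
# present whenever TensorFlow reads its environment.
os.environ["TF_CUDNN_RESET_RND_GEN_STATE"] = "1"

# import tensorflow as tf  # import TensorFlow only after the line above

# Confirm the variable is visible to this process.
print(os.environ["TF_CUDNN_RESET_RND_GEN_STATE"])
```

From a shell, prefixing the command should have the same effect, for example `TF_CUDNN_RESET_RND_GEN_STATE=1 python train.py`, where `train.py` is a hypothetical script name.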

@summa-code
Author

summa-code commented Aug 3, 2020

No, that did not work. Here is the console output:

CUDNN_STATUS_EXECUTION_FAILED
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1936): 'cudnnRNNBackwardData( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, output_desc.handles(), output_data.opaque(), output_desc.handles(), output_backprop_data.opaque(), output_h_desc.handle(), output_h_backprop_data.opaque(), output_c_desc.handle(), output_c_backprop_data.opaque(), rnn_desc.params_handle(), params.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), input_desc.handles(), input_backprop_data->opaque(), input_h_desc.handle(), input_h_backprop_data->opaque(), input_c_desc.handle(), input_c_backprop_data->opaque(), workspace.opaque(), workspace.size(), reserve_space_data->opaque(), reserve_space_data->size())'

On the notebook:

InternalError: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 19, 32, 1, 24, 32, 32]
[[{{node gradients/CudnnRNN_grad/CudnnRNNBackprop}}]]
[[PartitionedCall]] [Op:__inference_train_function_149648]

Function call stack:
train_function -> train_function -> train_function

Saduf2019 removed the stat:awaiting response label Aug 4, 2020

@Saduf2019
Contributor

@summa-code
This seems to be a duplicate of #41987, please confirm.

Saduf2019 added the stat:awaiting response label Aug 4, 2020

@summa-code
Author

Ah, it looks like it. My bad: I have been doing a few things and did not keep track of what I had filed before. Yes, it is the same issue.

@Saduf2019
Contributor

Moving to closed status as it's a duplicate of #41987.

@ThinhNgVhust

TF_CUDNN_RESET_RND_GEN_STATE=1

This helped me.
