
Fix #41630: include max_seq_length in cudnn descriptor cache key #41832

Merged
merged 1 commit into tensorflow:master on Aug 7, 2020

Conversation

lissyx
Contributor

@lissyx lissyx commented Jul 28, 2020

No description provided.
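For context, the bug referenced in the title can be pictured as a descriptor cache keyed on too few fields: without max_seq_length in the key, a step with a different sequence length silently reuses a stale descriptor. The sketch below is illustrative Python, not the actual TensorFlow C++ code; the field names follow the IsCompatibleWith log output quoted later in this thread.

```python
# Toy model of the cuDNN RNN descriptor cache bug (hypothetical names,
# not the real TensorFlow implementation). Field order matches the
# IsCompatibleWith log lines quoted later in the thread.
from typing import NamedTuple, Tuple


class RnnShape(NamedTuple):
    num_layers: int
    input_size: int
    num_units: int
    dir_count: int
    max_seq_length: int
    batch_size: int
    cell_num_units: int


def cache_key_buggy(s: RnnShape) -> Tuple[int, ...]:
    # Omits max_seq_length: shapes that differ only in sequence length
    # collide, so the cached descriptor from step 1 is reused in step 2.
    return (s.num_layers, s.input_size, s.num_units,
            s.dir_count, s.batch_size, s.cell_num_units)


def cache_key_fixed(s: RnnShape) -> Tuple[int, ...]:
    # The fix: max_seq_length is part of the key, so each sequence
    # length gets its own descriptor.
    return (s.num_layers, s.input_size, s.num_units, s.dir_count,
            s.max_seq_length, s.batch_size, s.cell_num_units)


# The two training steps described in issue 41630.
step1 = RnnShape(1, 2048, 2048, 1, 74, 2, 2048)
step2 = RnnShape(1, 2048, 2048, 1, 75, 2, 2048)

assert cache_key_buggy(step1) == cache_key_buggy(step2)   # collision
assert cache_key_fixed(step1) != cache_key_fixed(step2)   # distinct entries
```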

@google-ml-butler google-ml-butler bot added the size:XS CL Change Size: Extra Small label Jul 28, 2020
@googlebot

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@googlebot

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@gbaned gbaned self-assigned this Jul 29, 2020
@gbaned gbaned added the comp:core issues related to core part of tensorflow label Jul 29, 2020
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Jul 29, 2020
@gbaned gbaned requested a review from sanjoy July 29, 2020 11:49
@sanjoy sanjoy requested a review from kaixih July 30, 2020 04:24
Contributor

@sanjoy sanjoy left a comment


Looks fine to me, but I'll wait for @kaixih to LGTM also.

Review comment on tensorflow/core/kernels/cudnn_rnn_ops.cc (outdated, resolved)
PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes Jul 30, 2020
Contributor

@sanjoy sanjoy left a comment


Actually, can you please also add a test case?

@lissyx
Contributor Author

lissyx commented Jul 30, 2020

Actually, can you please also add a test case?

I can't find any existing test file covering what is in tensorflow/core/kernels/cudnn_rnn_ops.cc, and CONTRIBUTING.md's coverage of tests is quite sparse; it only explains how to run them. Would you have some pointers on where I should add a test case?

@lissyx
Contributor Author

lissyx commented Jul 30, 2020

Actually, can you please also add a test case?

Managed to add something as a new tf_cuda_cc_test; it required some new headers and moving the two structs into that. The patch got more invasive than I hoped for at the beginning, but locally the tests run well and catch the regression.

@kaixih
Contributor

kaixih commented Jul 30, 2020

I am wondering if it is possible to limit the test to a Python test, like in lstm_v2_test.py. All we need to do is design a simple training run, as in issue 41630, that uses [1, 2048, 2048, 1, 74, 2, 2048] in the first step and stores it in the cache, but [1, 2048, 2048, 1, 75, 2, 2048] in the second step. If this trains successfully on GPU, we can view it as passed. What do you think?

@lissyx
Contributor Author

lissyx commented Jul 30, 2020

I am wondering if it is possible to limit the test to a Python test, like in lstm_v2_test.py. All we need to do is design a simple training run, as in issue 41630, that uses [1, 2048, 2048, 1, 74, 2, 2048] in the first step and stores it in the cache, but [1, 2048, 2048, 1, 75, 2, 2048] in the second step. If this trains successfully on GPU, we can view it as passed. What do you think?

I can look into that; it sounds like a better option than the current test I wrote.

@lissyx
Contributor Author

lissyx commented Jul 30, 2020

lstm_v2_test.py.

Do you know if any magic is required to run Python tests? I've followed the CONTRIBUTING.md section about running unit tests, but:

bazel test -s --verbose_failures --config=noaws --config=nogcp --config=nohdfs --config=nonccl -c opt --config=cuda -k //tensorflow/python/keras/layers
[...]
INFO: Analyzed target //tensorflow/python/keras/layers:layers (0 packages loaded, 0 targets configured).
INFO: Found 1 target and 0 test targets...
Target //tensorflow/python/keras/layers:layers up-to-date (nothing to build)
INFO: Elapsed time: 0.336s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action

@kaixih
Contributor

kaixih commented Jul 30, 2020

Have you tried to simply run python lstm_v2_test.py?

@lissyx
Contributor Author

lissyx commented Jul 30, 2020

Have you tried to simply run python lstm_v2_test.py?

Even simpler: the //tensorflow/python/keras/layers:lstm_v2_test Bazel target. I assumed passing the directory would run all the tests there.
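Putting the two replies together, the single-test invocation would look like the following (flags copied from the earlier attempt; the exact set depends on the local ./configure step, so treat this as illustrative):

```shell
bazel test -s --verbose_failures \
  --config=noaws --config=nogcp --config=nohdfs --config=nonccl \
  -c opt --config=cuda \
  //tensorflow/python/keras/layers:lstm_v2_test
```

The earlier command built the layers library target rather than a test target, which is why Bazel reported "0 test targets".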

@lissyx
Contributor Author

lissyx commented Jul 31, 2020

Making progress:

  @test_util.run_gpu_only
  def test_lstm_variable_sequence_length(self):
    if test.is_built_with_rocm():
      self.skipTest('Skipping the test as ROCm MIOpen does not '
                    'support padded input yet.')
    input_shape = 2048
    rnn_state_size = 2048
    batch = 2

    timestep = 1

    (x_train, y_train), _ = testing_utils.get_test_data(
        train_samples=batch,
        test_samples=0,
        input_shape=(timestep, input_shape),
        num_classes=rnn_state_size,
        random_seed=random_seed.DEFAULT_GRAPH_SEED)

    timestep = 1

    (x2_train, y2_train), _ = testing_utils.get_test_data(
        train_samples=batch,
        test_samples=0,
        input_shape=(timestep, input_shape),
        num_classes=rnn_state_size,
        random_seed=random_seed.DEFAULT_GRAPH_SEED)

    inputs = keras.layers.Input(
        shape=[timestep, input_shape], dtype=dtypes.float32)
    masked_input = keras.layers.Masking()(inputs)

    with test_util.device(use_gpu=True):
      cudnn_layer = rnn.LSTM(rnn_state_size,
                             activation='tanh',
                             recurrent_activation='sigmoid',
                             recurrent_dropout=0,
                             unroll=False,
                             use_bias=True)
      cudnn_model = keras.models.Model(inputs, cudnn_layer(masked_input))

    cudnn_model.compile('rmsprop', 'mse')
    cudnn_model.fit(x_train, y_train)
    cudnn_model.fit(x2_train, y2_train)

With that, I can confirm it goes through the layers and reaches IsCompatibleWith:


IsCompatibleWith -> [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 1, 2, 2048] 
IsCompatibleWith -> [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 1, 2, 2048] 
IsCompatibleWith -> [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 1, 2, 2048] 
IsCompatibleWith -> [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 1, 2, 2048] 
IsCompatibleWith -> [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 1, 2, 2048] 
IsCompatibleWith -> [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 1, 2, 2048] 

However, changing timestep to 74 for x_train and 75 for x2_train seems to bypass the GPU implementation: I can't see any IsCompatibleWith output in the log, but I do see:

WARNING:tensorflow:Model was constructed with shape (None, 75, 2048) for input Tensor("input_1:0", shape=(None, 75, 2048), dtype=float32), but it was called on an input with incompatible shape (None, 74, 2048).
W0731 11:00:01.229415 139681924948352 functional.py:583] Model was constructed with shape (None, 75, 2048) for input Tensor("input_1:0", shape=(None, 75, 2048), dtype=float32), but it was called on an input with incompatible shape (None, 74, 2048).

Even when directly calling CuDNNLSTM from tensorflow/python/keras/layers/cudnn_recurrent.py, I'm blocked at the Keras level when introducing the 74/75 sequence-length difference.

Forcing eager execution using @test_util.run_in_graph_and_eager_modes gives:

ValueError: Error when checking input: expected input_1 to have shape (75, 64) but got array with shape (74, 64)

And to the best of my knowledge, with current TensorFlow, since tensorflow.contrib is gone, Keras is the only way to build LSTM layers, right? So I'm unsure how to solve this :/
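For what it's worth, the ValueError above comes from a static shape check in which a dimension declared as None acts as a wildcard. A toy re-implementation of that compatibility rule (a hypothetical helper, not the actual Keras code) shows why declaring the time axis as None in the Input shape would let both the 74- and 75-step batches through:

```python
from typing import Optional, Tuple

Shape = Tuple[Optional[int], ...]


def shapes_compatible(expected: Shape, actual: Shape) -> bool:
    """Toy static shape check: a None dimension matches any size.

    Hypothetical helper illustrating the rule; not the Keras
    implementation itself.
    """
    if len(expected) != len(actual):
        return False
    return all(e is None or a is None or e == a
               for e, a in zip(expected, actual))


# Fixed time axis, as in the error message above: 74 steps are rejected
# by a model built for 75.
assert not shapes_compatible((75, 64), (74, 64))

# Wildcard time axis: both sequence lengths pass the check, so the two
# fit() calls would reach the cuDNN path with different max_seq_length.
assert shapes_compatible((None, 64), (74, 64))
assert shapes_compatible((None, 64), (75, 64))
```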

@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Aug 6, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Aug 6, 2020
@gbaned gbaned removed the ready to pull PR ready for merge process label Aug 6, 2020
@gbaned gbaned requested a review from sanjoy August 6, 2020 11:22
PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Aug 6, 2020
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Aug 6, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Aug 6, 2020
@gbaned gbaned added ready to pull PR ready for merge process and removed ready to pull PR ready for merge process labels Aug 7, 2020
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Aug 7, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Aug 7, 2020
@gbaned gbaned added ready to pull PR ready for merge process and removed ready to pull PR ready for merge process labels Aug 7, 2020
@tensorflow-copybara tensorflow-copybara merged commit 0fb8017 into tensorflow:master Aug 7, 2020
PR Queue automation moved this from Approved by Reviewer to Merged Aug 7, 2020
@lissyx
Contributor Author

lissyx commented Aug 7, 2020

Can we hope for a 1.15.4 to include that fix?

@sanjoy
Contributor

sanjoy commented Aug 7, 2020

Can we hope for a 1.15.4 to include that fix?

CC @goldiegadde

@lissyx
Contributor Author

lissyx commented Aug 18, 2020

Can we hope for a 1.15.4 to include that fix?

CC @goldiegadde

Gentle ping: can we hope for a 1.15.4, or should we direct people hitting this issue to the magic reset-state environment variable?

@lissyx
Contributor Author

lissyx commented Aug 24, 2020

@sanjoy @kaixih @goldiegadde Gentle ping? There have been several commits pushed to r1.15, so it would be really nice to include this one if you are preparing a 1.15.4. I can send the PR if you'd like.

@goldiegadde
Contributor

@lissyx Sorry for the delayed response; can you please open a PR against the 1.15 branch? cc @mihaimaruseac as well.

@mihaimaruseac
Collaborator

Apologies, I missed this myself too. Yes please, let's open a PR against the branch.

@lissyx
Contributor Author

lissyx commented Aug 24, 2020

@lissyx Sorry for the delayed response; can you please open a PR against the 1.15 branch? cc @mihaimaruseac as well.

#42634

Labels
cla: yes comp:core issues related to core part of tensorflow ready to pull PR ready for merge process size:XS CL Change Size: Extra Small

9 participants