
Fix #41630: include max_seq_length in cudnn descriptor cache key #41832

Merged
merged 1 commit into tensorflow:master on Aug 7, 2020

Conversation

lissyx
Contributor

@lissyx lissyx commented Jul 28, 2020

No description provided.
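For context, the bug referenced in the title can be pictured as a descriptor cache keyed on too few fields: without max_seq_length in the key, a step with a different sequence length silently reuses a stale descriptor. The sketch below is illustrative Python, not the actual TensorFlow C++ code; the field names follow the IsCompatibleWith log output quoted later in this thread.

```python
# Toy model of the cuDNN RNN descriptor cache bug (hypothetical names,
# not the real TensorFlow implementation). Field order matches the
# IsCompatibleWith log lines quoted later in the thread.
from typing import NamedTuple, Tuple


class RnnShape(NamedTuple):
    num_layers: int
    input_size: int
    num_units: int
    dir_count: int
    max_seq_length: int
    batch_size: int
    cell_num_units: int


def cache_key_buggy(s: RnnShape) -> Tuple[int, ...]:
    # Omits max_seq_length: shapes that differ only in sequence length
    # collide, so the cached descriptor from step 1 is reused in step 2.
    return (s.num_layers, s.input_size, s.num_units,
            s.dir_count, s.batch_size, s.cell_num_units)


def cache_key_fixed(s: RnnShape) -> Tuple[int, ...]:
    # The fix: max_seq_length is part of the key, so each sequence
    # length gets its own descriptor.
    return (s.num_layers, s.input_size, s.num_units, s.dir_count,
            s.max_seq_length, s.batch_size, s.cell_num_units)


# The two training steps described in issue 41630.
step1 = RnnShape(1, 2048, 2048, 1, 74, 2, 2048)
step2 = RnnShape(1, 2048, 2048, 1, 75, 2, 2048)

assert cache_key_buggy(step1) == cache_key_buggy(step2)   # collision
assert cache_key_fixed(step1) != cache_key_fixed(step2)   # distinct entries
```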

@google-ml-butler google-ml-butler bot added the size:XS CL Change Size: Extra Small label Jul 28, 2020
@googlebot

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@googlebot

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@gbaned gbaned self-assigned this Jul 29, 2020
@gbaned gbaned added the comp:core issues related to core part of tensorflow label Jul 29, 2020
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Jul 29, 2020
@gbaned gbaned requested a review from sanjoy July 29, 2020 11:49
@sanjoy sanjoy requested a review from kaixih July 30, 2020 04:24
Contributor

@sanjoy sanjoy left a comment


Looks fine to me, but I'll wait for @kaixih to LGTM also.

Review comment on tensorflow/core/kernels/cudnn_rnn_ops.cc (outdated, resolved)
PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes Jul 30, 2020
Contributor

@sanjoy sanjoy left a comment


Actually, can you please also add a test case?

@lissyx
Contributor Author

lissyx commented Jul 30, 2020

Actually, can you please also add a test case?

I can't find any existing test file covering what is in tensorflow/core/kernels/cudnn_rnn_ops.cc, and CONTRIBUTING.md's coverage of tests is quite sparse; it only explains how to run them. Would you have some pointers on where I should add a test case?

@lissyx
Contributor Author

lissyx commented Jul 30, 2020

Actually, can you please also add a test case?

Managed to add something as a new tf_cuda_cc_test; it required some new headers and moving the two structs into that. The patch got more invasive than I hoped for at the beginning, but locally the tests run well and catch the regression.

@kaixih
Contributor

kaixih commented Jul 30, 2020

I am wondering if it is possible to limit the test to a Python test, like in lstm_v2_test.py. All we need to do is design a simple training run, as in issue 41630, that uses [1, 2048, 2048, 1, 74, 2, 2048] in the first step and stores it in the cache, but [1, 2048, 2048, 1, 75, 2, 2048] in the second step. If this trains successfully on GPU, we can view it as passed. What do you think?

@lissyx
Contributor Author

lissyx commented Jul 30, 2020

I am wondering if it is possible to limit the test to a Python test, like in lstm_v2_test.py. All we need to do is design a simple training run, as in issue 41630, that uses [1, 2048, 2048, 1, 74, 2, 2048] in the first step and stores it in the cache, but [1, 2048, 2048, 1, 75, 2, 2048] in the second step. If this trains successfully on GPU, we can view it as passed. What do you think?

I can look into that; it sounds like a better option than the current test I wrote.

@lissyx
Contributor Author

lissyx commented Jul 30, 2020

lstm_v2_test.py.

Do you know if any magic is required to run Python tests? I've followed the CONTRIBUTING.md section about running unit tests, but:

bazel test -s --verbose_failures --config=noaws --config=nogcp --config=nohdfs --config=nonccl -c opt --config=cuda -k //tensorflow/python/keras/layers
[...]
INFO: Analyzed target //tensorflow/python/keras/layers:layers (0 packages loaded, 0 targets configured).
INFO: Found 1 target and 0 test targets...
Target //tensorflow/python/keras/layers:layers up-to-date (nothing to build)
INFO: Elapsed time: 0.336s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action

@kaixih
Contributor

kaixih commented Jul 30, 2020

Have you tried to simply run python lstm_v2_test.py?

@lissyx
Contributor Author

lissyx commented Jul 30, 2020

Have you tried to simply run python lstm_v2_test.py?

Even simpler: the //tensorflow/python/keras/layers:lstm_v2_test Bazel target. I assumed passing the directory would run all the tests there.
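Putting the two replies together, the single-test invocation would look like the following (flags copied from the earlier attempt; the exact set depends on the local ./configure step, so treat this as illustrative):

```shell
bazel test -s --verbose_failures \
  --config=noaws --config=nogcp --config=nohdfs --config=nonccl \
  -c opt --config=cuda \
  //tensorflow/python/keras/layers:lstm_v2_test
```

The earlier command built the layers library target rather than a test target, which is why Bazel reported "0 test targets".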

@lissyx
Contributor Author

lissyx commented Jul 31, 2020

Making progress:

  @test_util.run_gpu_only
  def test_lstm_variable_sequence_length(self):
    if test.is_built_with_rocm():
      self.skipTest('Skipping the test as ROCm MIOpen does not '
                    'support padded input yet.')
    input_shape = 2048
    rnn_state_size = 2048
    batch = 2

    timestep = 1

    (x_train, y_train), _ = testing_utils.get_test_data(
        train_samples=batch,
        test_samples=0,
        input_shape=(timestep, input_shape),
        num_classes=rnn_state_size,
        random_seed=random_seed.DEFAULT_GRAPH_SEED)

    timestep = 1

    (x2_train, y2_train), _ = testing_utils.get_test_data(
        train_samples=batch,
        test_samples=0,
        input_shape=(timestep, input_shape),
        num_classes=rnn_state_size,
        random_seed=random_seed.DEFAULT_GRAPH_SEED)

    inputs = keras.layers.Input(
        shape=[timestep, input_shape], dtype=dtypes.float32)
    masked_input = keras.layers.Masking()(inputs)

    with test_util.device(use_gpu=True):
      cudnn_layer = rnn.LSTM(rnn_state_size,
                             activation='tanh',
                             recurrent_activation='sigmoid',
                             recurrent_dropout=0,
                             unroll=False,
                             use_bias=True)
      cudnn_model = keras.models.Model(inputs, cudnn_layer(masked_input))

    cudnn_model.compile('rmsprop', 'mse')
    cudnn_model.fit(x_train, y_train)
    cudnn_model.fit(x2_train, y2_train)

With that, I can confirm it goes through the layers and reaches IsCompatibleWith:


IsCompatibleWith -> [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 1, 2, 2048] 
IsCompatibleWith -> [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 1, 2, 2048] 
IsCompatibleWith -> [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 1, 2, 2048] 
IsCompatibleWith -> [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 1, 2, 2048] 
IsCompatibleWith -> [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 1, 2, 2048] 
IsCompatibleWith -> [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 1, 2, 2048] 

However, changing timestep to 74 for x_train and 75 for x2_train seems to bypass the GPU implementation: I can't see any IsCompatibleWith output in the log, but I do see:

WARNING:tensorflow:Model was constructed with shape (None, 75, 2048) for input Tensor("input_1:0", shape=(None, 75, 2048), dtype=float32), but it was called on an input with incompatible shape (None, 74, 2048).
W0731 11:00:01.229415 139681924948352 functional.py:583] Model was constructed with shape (None, 75, 2048) for input Tensor("input_1:0", shape=(None, 75, 2048), dtype=float32), but it was called on an input with incompatible shape (None, 74, 2048).

Even when directly calling CuDNNLSTM from tensorflow/python/keras/layers/cudnn_recurrent.py, I'm blocked at the Keras level when introducing the 74/75 sequence-length difference.

Forcing eager execution using @test_util.run_in_graph_and_eager_modes gives:

ValueError: Error when checking input: expected input_1 to have shape (75, 64) but got array with shape (74, 64)

And to the best of my knowledge, with current TensorFlow, since tensorflow.contrib is gone, Keras is the only way to build LSTM layers, right? So I'm unsure how to solve this :/
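For what it's worth, the ValueError above comes from a static shape check in which a dimension declared as None acts as a wildcard. A toy re-implementation of that compatibility rule (a hypothetical helper, not the actual Keras code) shows why declaring the time axis as None in the Input shape would let both the 74- and 75-step batches through:

```python
from typing import Optional, Tuple

Shape = Tuple[Optional[int], ...]


def shapes_compatible(expected: Shape, actual: Shape) -> bool:
    """Toy static shape check: a None dimension matches any size.

    Hypothetical helper illustrating the rule; not the Keras
    implementation itself.
    """
    if len(expected) != len(actual):
        return False
    return all(e is None or a is None or e == a
               for e, a in zip(expected, actual))


# Fixed time axis, as in the error message above: 74 steps are rejected
# by a model built for 75.
assert not shapes_compatible((75, 64), (74, 64))

# Wildcard time axis: both sequence lengths pass the check, so the two
# fit() calls would reach the cuDNN path with different max_seq_length.
assert shapes_compatible((None, 64), (74, 64))
assert shapes_compatible((None, 64), (75, 64))
```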

@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Aug 6, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Aug 6, 2020
@gbaned gbaned removed the ready to pull PR ready for merge process label Aug 6, 2020
@gbaned gbaned requested a review from sanjoy August 6, 2020 11:22
PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Aug 6, 2020
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Aug 6, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Aug 6, 2020
@gbaned gbaned added ready to pull PR ready for merge process and removed ready to pull PR ready for merge process labels Aug 7, 2020
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Aug 7, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Aug 7, 2020
@gbaned gbaned added ready to pull PR ready for merge process and removed ready to pull PR ready for merge process labels Aug 7, 2020
@tensorflow-copybara tensorflow-copybara merged commit 0fb8017 into tensorflow:master Aug 7, 2020
PR Queue automation moved this from Approved by Reviewer to Merged Aug 7, 2020
@lissyx
Contributor Author

lissyx commented Aug 7, 2020

Can we hope for a 1.15.4 to include that fix?

@sanjoy
Contributor

sanjoy commented Aug 7, 2020

Can we hope for a 1.15.4 to include that fix?

CC @goldiegadde

@lissyx
Contributor Author

lissyx commented Aug 18, 2020

Can we hope for a 1.15.4 to include that fix?

CC @goldiegadde

Gentle ping: can we hope for a 1.15.4, or should we direct people hitting this issue to the magic reset-state environment variable?

@lissyx
Contributor Author

lissyx commented Aug 24, 2020

@sanjoy @kaixih @goldiegadde Gentle ping? There have been several commits pushed to r1.15, so it would be really nice to include this one if you are preparing a 1.15.4. I can send the PR if you'd like.

@goldiegadde
Contributor

@lissyx Sorry for the delayed response; can you please open a PR against the 1.15 branch? cc @mihaimaruseac as well.

@mihaimaruseac
Collaborator

Apologies, I missed this myself too. Yes please, let's open a PR against the branch.

@lissyx
Contributor Author

lissyx commented Aug 24, 2020

@lissyx Sorry for the delayed response; can you please open a PR against the 1.15 branch? cc @mihaimaruseac as well.

#42634

Labels
cla: yes comp:core issues related to core part of tensorflow ready to pull PR ready for merge process size:XS CL Change Size: Extra Small

9 participants