Can't run (TF)BartForConditionalGeneration.generate on GPU; generation speed is very slow #17411

Closed
TheHonestBob opened this issue May 25, 2022 · 9 comments
@TheHonestBob

System Info

transformers==4.19
tensorflow-gpu==2.3
torch==1.11

Who can help?

@patil-suraj, @patrickvonplaten, @Narsil, @gante, @Rocketknight1

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import BertTokenizer, TFBartForConditionalGeneration

tokenizer = BertTokenizer.from_pretrained("fnlp/bart-base-chinese")
# from_pt=True converts the PyTorch checkpoint to TensorFlow weights on load
model = TFBartForConditionalGeneration.from_pretrained("fnlp/bart-base-chinese", from_pt=True)
batch_data = ['北京是[MASK]的首都'] * 64
for i in range(20):
    batch_dict = tokenizer.batch_encode_plus(batch_data, return_token_type_ids=False, return_tensors='tf')
    result = model.generate(**batch_dict, max_length=20)
    result = tokenizer.batch_decode(result, skip_special_tokens=True)
    print(result)

Expected behavior

1. When I run CUDA_VISIBLE_DEVICES=1 python test.py, GPU memory is used but GPU utilization is almost zero, generation is very slow, and CPU utilization is 100%.
2. When I replace TFBartForConditionalGeneration with BartForConditionalGeneration, GPU memory is used, GPU utilization is almost zero, CPU utilization is greater than 100%, and speed is normal, but that means generation is running on the CPU, not the GPU.
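A quick sanity check for reports like this is to confirm that each framework can actually see the GPU; a minimal sketch (not part of the original report, assuming a standard CUDA install):

import tensorflow as tf
import torch

# An empty list here means TensorFlow will silently fall back to the CPU.
print(tf.config.list_physical_devices('GPU'))
# False here means the installed torch build has no usable CUDA device.
print(torch.cuda.is_available())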
@gante gante self-assigned this May 25, 2022
@gante gante removed the bug label May 25, 2022
@gante
Member

gante commented May 25, 2022

Hey @TheHonestBob 👋 We are aware of the generate speed problems with TensorFlow, and will be releasing an update very soon. It is not a bug, but rather how Eager Execution works, sadly. Stay tuned 🤞

@TheHonestBob
Author

Hey @TheHonestBob 👋 We are aware of the generate speed problems with TensorFlow, and will be releasing an update very soon. It is not a bug, but rather how Eager Execution works, sadly. Stay tuned 🤞

Thanks for your reply. What can I do to work around this before the update?

@gante
Member

gante commented May 26, 2022

My advice would be to go with the PyTorch version, if performance is a bottleneck for you and you need something working in the next ~2 weeks. If you can afford to wait ~2 weeks, then you can have a look at the guides we are writing up at the moment :)
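For reference, the PyTorch model only runs on the GPU if the model and the inputs are moved there explicitly, which may explain the CPU-bound behaviour in point 2 of the report. A minimal sketch of that workaround (assuming a single CUDA device; not code posted in this thread):

import torch
from transformers import BertTokenizer, BartForConditionalGeneration

tokenizer = BertTokenizer.from_pretrained("fnlp/bart-base-chinese")
model = BartForConditionalGeneration.from_pretrained("fnlp/bart-base-chinese").to("cuda")
batch_data = ['北京是[MASK]的首都'] * 64
batch_dict = tokenizer.batch_encode_plus(batch_data, return_token_type_ids=False, return_tensors='pt')
# generate only runs on the GPU if the input tensors live on the same device as the model
batch_dict = {k: v.to("cuda") for k, v in batch_dict.items()}
result = model.generate(**batch_dict, max_length=20)
print(tokenizer.batch_decode(result, skip_special_tokens=True))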

@TheHonestBob
Author

My advice would be to go with the PyTorch version, if performance is a bottleneck to you and you need something working in the next ~2 weeks. If you can afford to wait ~2 weeks, then you can have a look at the guides we are writing up at the moment :)

OK, I will continue to pay attention to it.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@gante
Member

gante commented Jun 24, 2022

@TheHonestBob -- some of the functionality needed to speed up generation has been merged recently. If you run a modified version of your script on a GPU, you will see it is much, much faster.

import tensorflow as tf
from transformers import BertTokenizer, TFBartForConditionalGeneration
tokenizer = BertTokenizer.from_pretrained("fnlp/bart-base-chinese")
model = TFBartForConditionalGeneration.from_pretrained("fnlp/bart-base-chinese", from_pt=True)
batch_data = ['北京是[MASK]的首都']*64
# wrap generate in a tf.function with XLA compilation instead of calling it eagerly
xla_generate = tf.function(model.generate, jit_compile=True)
for i in range(20):
    batch_dict = tokenizer.batch_encode_plus(batch_data, return_token_type_ids=False, return_tensors='tf')
    # greedy search; larger num_beams requires the PR mentioned below
    result = xla_generate(**batch_dict, max_length=20, no_repeat_ngram_size=0, num_beams=1)
    result = tokenizer.batch_decode(result, skip_special_tokens=True)
    print(result)

To enable bigger values of num_beams, which should increase the quality of the generation, this PR has to be merged first :)
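One caveat about the snippet above (an assumption based on how tf.function tracing works, not something stated in the comment): with jit_compile=True the function is recompiled whenever the input shapes change, so batches with varying sequence lengths should be padded to a fixed shape, for example:

# pad every batch to the same length so xla_generate is traced and compiled only once
batch_dict = tokenizer(batch_data, padding="max_length", max_length=32,
                       return_token_type_ids=False, return_tensors='tf')
result = xla_generate(**batch_dict, max_length=20, no_repeat_ngram_size=0, num_beams=1)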

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@gante
Member

gante commented Aug 1, 2022

@TheHonestBob The newest release (v4.21) fixes this issue. Check our recent blog post -- https://huggingface.co/blog/tf-xla-generate

@TheHonestBob
Author

@TheHonestBob The newest release (v4.21) fixes this issue. Check our recent blog post -- https://huggingface.co/blog/tf-xla-generate

Thanks a lot, I'll try it.
