Can't run (TF)BartForConditionalGeneration.generate on GPU; generation speed is very slow #17411

Closed
TheHonestBob opened this issue May 25, 2022 · 9 comments
@TheHonestBob

System Info

transformers==4.19
tensorflow-gpu==2.3
torch==1.11

Who can help?

@patil-suraj, @patrickvonplaten, @Narsil, @gante, @Rocketknight1

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import BertTokenizer, TFBartForConditionalGeneration

tokenizer = BertTokenizer.from_pretrained("fnlp/bart-base-chinese")
# from_pt=True converts the PyTorch checkpoint to TensorFlow weights on load
model = TFBartForConditionalGeneration.from_pretrained("fnlp/bart-base-chinese", from_pt=True)
batch_data = ['北京是[MASK]的首都'] * 64
for i in range(20):
    batch_dict = tokenizer.batch_encode_plus(batch_data, return_token_type_ids=False, return_tensors='tf')
    result = model.generate(**batch_dict, max_length=20)
    result = tokenizer.batch_decode(result, skip_special_tokens=True)
    print(result)

Expected behavior

1. When I run CUDA_VISIBLE_DEVICES=1 python test.py, GPU memory is used but GPU utilization is almost zero, generation is very slow, and CPU utilization is 100%.
2. When I replace TFBartForConditionalGeneration with BartForConditionalGeneration, GPU memory is used, GPU utilization is almost zero, CPU utilization is greater than 100%, and speed is normal, but that means generation is running on the CPU, not the GPU.
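A quick sanity check for reports like this is to confirm that each framework can actually see the GPU; a minimal sketch (not part of the original report, assuming a standard CUDA install):

import tensorflow as tf
import torch

# An empty list here means TensorFlow will silently fall back to the CPU.
print(tf.config.list_physical_devices('GPU'))
# False here means the installed torch build has no usable CUDA device.
print(torch.cuda.is_available())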
@gante gante self-assigned this May 25, 2022
@gante gante removed the bug label May 25, 2022
@gante
Member

gante commented May 25, 2022

Hey @TheHonestBob 👋 We are aware of the generate speed problems with TensorFlow, and will be releasing an update very soon. It is not a bug, but rather how Eager Execution works, sadly. Stay tuned 🤞

@TheHonestBob
Author

Hey @TheHonestBob 👋 We are aware of the generate speed problems with TensorFlow, and will be releasing an update very soon. It is not a bug, but rather how Eager Execution works, sadly. Stay tuned 🤞

Thanks for your reply. What can I do to work around this before the update?

@gante
Member

gante commented May 26, 2022

My advice would be to go with the PyTorch version, if performance is a bottleneck for you and you need something working in the next ~2 weeks. If you can afford to wait ~2 weeks, then you can have a look at the guides we are writing up at the moment :)
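For reference, the PyTorch model only runs on the GPU if the model and the inputs are moved there explicitly, which may explain the CPU-bound behaviour in point 2 of the report. A minimal sketch of that workaround (assuming a single CUDA device; not code posted in this thread):

import torch
from transformers import BertTokenizer, BartForConditionalGeneration

tokenizer = BertTokenizer.from_pretrained("fnlp/bart-base-chinese")
model = BartForConditionalGeneration.from_pretrained("fnlp/bart-base-chinese").to("cuda")
batch_data = ['北京是[MASK]的首都'] * 64
batch_dict = tokenizer.batch_encode_plus(batch_data, return_token_type_ids=False, return_tensors='pt')
# generate only runs on the GPU if the input tensors live on the same device as the model
batch_dict = {k: v.to("cuda") for k, v in batch_dict.items()}
result = model.generate(**batch_dict, max_length=20)
print(tokenizer.batch_decode(result, skip_special_tokens=True))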

@TheHonestBob
Author

My advice would be to go with the PyTorch version, if performance is a bottleneck to you and you need something working in the next ~2 weeks. If you can afford to wait ~2 weeks, then you can have a look at the guides we are writing up at the moment :)

OK, I will continue to pay attention to it.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@gante
Member

gante commented Jun 24, 2022

@TheHonestBob -- some of the functionality needed to speed up generation has been merged recently. If you run a modified version of your script on a GPU, you will see it is much, much faster.

import tensorflow as tf
from transformers import BertTokenizer, TFBartForConditionalGeneration
tokenizer = BertTokenizer.from_pretrained("fnlp/bart-base-chinese")
model = TFBartForConditionalGeneration.from_pretrained("fnlp/bart-base-chinese", from_pt=True)
batch_data = ['北京是[MASK]的首都']*64
# wrap generate in a tf.function with XLA compilation instead of calling it eagerly
xla_generate = tf.function(model.generate, jit_compile=True)
for i in range(20):
    batch_dict = tokenizer.batch_encode_plus(batch_data, return_token_type_ids=False, return_tensors='tf')
    # greedy search; larger num_beams requires the PR mentioned below
    result = xla_generate(**batch_dict, max_length=20, no_repeat_ngram_size=0, num_beams=1)
    result = tokenizer.batch_decode(result, skip_special_tokens=True)
    print(result)

To enable bigger values of num_beams, which should increase the quality of the generation, this PR has to be merged first :)
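One caveat about the snippet above (an assumption based on how tf.function tracing works, not something stated in the comment): with jit_compile=True the function is recompiled whenever the input shapes change, so batches with varying sequence lengths should be padded to a fixed shape, for example:

# pad every batch to the same length so xla_generate is traced and compiled only once
batch_dict = tokenizer(batch_data, padding="max_length", max_length=32,
                       return_token_type_ids=False, return_tensors='tf')
result = xla_generate(**batch_dict, max_length=20, no_repeat_ngram_size=0, num_beams=1)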

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@gante
Member

gante commented Aug 1, 2022

@TheHonestBob The newest release (v4.21) fixes this issue. Check our recent blog post -- https://huggingface.co/blog/tf-xla-generate

@TheHonestBob
Author

@TheHonestBob The newest release (v4.21) fixes this issue. Check our recent blog post -- https://huggingface.co/blog/tf-xla-generate

Thanks a lot, I'll try it.
