
TF: XLA generation not working properly in some models #17935

Open · 8 of 12 tasks
gante opened this issue Jun 29, 2022 · 12 comments
@gante
Member

gante commented Jun 29, 2022

This issue is used to track TensorFlow XLA generation issues, arising from #17857. There are three categories of issues, sorted in descending order by severity:

Key model issues

These are heavily-used models, whose quality should be prioritized.

  • T5 -- The quality of the results decreases with max_length. See here.
  • GPT-J -- fails simple generate tests with numerical issues

Models failing basic tests

These models are failing test_xla_generate_fast -- a short greedy generation.

  • LED
  • Speech2Text
  • XLNet
  • XGLM

Models failing complex tests

These models are failing test_xla_generate_slow -- a long beam search generation.

  • Bart
  • Blenderbot
  • Marian
  • mbart
  • OPT
  • Pegasus
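
For context, the gist of these tests looks roughly like the sketch below -- this is illustrative only, not the actual transformers test code, and the checkpoint and generation settings are placeholders:

```python
# Rough sketch of what the XLA generation tests exercise -- illustrative only.
# Checkpoint and generation settings are assumptions, not the exact test setup.
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")

# XLA-compiled generate: same call as eager generate, traced with jit_compile=True
xla_generate = tf.function(model.generate, jit_compile=True)

inputs = tokenizer(["translate English to German: I love TensorFlow"], return_tensors="tf")

# "fast" tests do a short greedy generation; "slow" tests do a longer beam search
eager_out = model.generate(**inputs, max_new_tokens=16, num_beams=4)
xla_out = xla_generate(**inputs, max_new_tokens=16, num_beams=4)

# The expectation: XLA and eager generation should agree on the generated text
assert tokenizer.batch_decode(eager_out, skip_special_tokens=True) == \
       tokenizer.batch_decode(xla_out, skip_special_tokens=True)
```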
@gante added the Good Second Issue and TensorFlow labels on Jun 29, 2022
@gante self-assigned this on Jun 29, 2022
@anmolsjoshi
Contributor

@gante do you require any help with this issue? Happy to contribute

@gante
Member Author

gante commented Jul 12, 2022

Hi @anmolsjoshi 👋

If you are comfortable with debugging XLA, absolutely :) My recommendation would be to pick a model from "Models failing complex tests" (the others might require significant architecture changes) and start debugging. The number one suspect is always the position embeddings, which may not handle the case where the past is padded -- see the sketch below. Let me know if you are up for it, and which model you would like to take!
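
To make the "padded past" point concrete, here is a rough sketch of the usual robust approach (the function name is hypothetical, not taken from any model's code): derive position ids from the attention mask rather than from the cache length, so left-padding does not shift positions.

```python
# Hedged sketch: computing position ids when the past/cache is padded, as it is
# under XLA generation. Counting cache entries over-counts once padding exists;
# counting non-padded tokens via the attention mask does not.
import tensorflow as tf

def position_ids_from_attention_mask(attention_mask: tf.Tensor, past_length: int = 0) -> tf.Tensor:
    # Count only the non-padded tokens seen so far; positions start at 0.
    position_ids = tf.cumsum(attention_mask, axis=-1) - 1
    position_ids = tf.where(attention_mask == 0, tf.zeros_like(position_ids), position_ids)
    # During cached decoding we only need the positions of the new tokens.
    return position_ids[:, past_length:]

# Example: batch of 1, left-padded prompt of length 5 with 2 pad tokens.
attention_mask = tf.constant([[0, 0, 1, 1, 1]])
print(position_ids_from_attention_mask(attention_mask))  # values: [[0, 0, 0, 1, 2]]
```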

@dsuess
Contributor

dsuess commented Jul 25, 2022

Hi @gante, I did have a bit of a poke around. I think the complex tests all fail for the same reason: those models have a max_position_embeddings setting that defaults to 20 during testing, which is too short for the “slow” tests. Here’s a simple fix for those: dsuess@4a3e271. I’ll give the other ones a shot now
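
For anyone following along, this is a minimal illustration of the kind of change being described -- the linked commit is the authoritative fix; the tiny config values below (with BART as an example) are just assumptions:

```python
# Minimal illustration of the idea: make the tiny test config allow sequences at
# least as long as what the "slow" beam-search test generates. Values are illustrative.
from transformers import BartConfig, TFBartForConditionalGeneration

config = BartConfig(
    vocab_size=99,
    d_model=16,
    encoder_layers=2,
    decoder_layers=2,
    encoder_attention_heads=2,
    decoder_attention_heads=2,
    encoder_ffn_dim=32,
    decoder_ffn_dim=32,
    max_position_embeddings=64,  # the tiny-test default of 20 is too short for long generation
)
model = TFBartForConditionalGeneration(config)
```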

@JuheonChu
Contributor

Hello @gante, may I ask if there is anything I can contribute to?

@gante
Member Author

gante commented Feb 11, 2023

Hi @JuheonChu 👋 Actually, yes! I have a few unchecked models at the top, but I wouldn't recommend spending time there unless you plan to use those architectures -- they are infrequently used.

However, two popular models are currently failing their XLA tests with beam search:

  • Marian
  • OPT

You can see the failing test if you install from main (`pip install --upgrade git+https://github.com/huggingface/transformers.git`) and run it, e.g. for OPT: `NVIDIA_TF32_OVERRIDE=0 RUN_SLOW=1 py.test -vv tests/models/opt/test_modeling_tf_opt.py::TFOPTModelTest::test_xla_generate_slow`

I haven't dived in yet, so I don't know the cause of the failure. You'll have to hop into debug mode and see what is breaking :)

@JuheonChu
Contributor

Can @katiele47 and I try working on them?

@gante
Member Author

gante commented Feb 15, 2023

@JuheonChu of course!

@JuheonChu
Contributor

JuheonChu commented Feb 17, 2023

> @JuheonChu of course!
@gante Are we figuring out the cause of the test failures based on the following clues?

Error 1
Error 2
Error 3

@gante
Member Author

gante commented Feb 17, 2023

@JuheonChu yes. My suggestion would be to find where the numerical differences start (between the XLA and non-XLA versions) using a debugger. Please note that you can't print variables with jit_compile=True, so you should set it to False. From there, the root cause is typically apparent.

Be warned, these sorts of tasks are sometimes very time-consuming to complete :)
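
As a very rough starting point (the checkpoint below is just an example, not the exact test setup), you can compare the same forward pass with and without XLA at the function boundary and work inwards from there:

```python
# Hedged sketch of the debugging loop described above -- illustrative, not a
# prescribed transformers workflow. Run the same forward pass with and without
# XLA and bisect where outputs diverge. Remember you cannot print intermediate
# tensors inside a jit_compile=True function, so flip it to False when digging in.
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = TFAutoModelForCausalLM.from_pretrained("facebook/opt-125m")
inputs = tokenizer("Debugging XLA is fun", return_tensors="tf")

def forward(input_ids, attention_mask):
    return model(input_ids=input_ids, attention_mask=attention_mask).logits

eager_logits = forward(**inputs)
xla_logits = tf.function(forward, jit_compile=True)(**inputs)

# Small numerical noise is expected under XLA; large differences point at a bug.
max_diff = tf.reduce_max(tf.abs(eager_logits - xla_logits))
print(f"max |eager - xla| logit difference: {max_diff.numpy():.6f}")
```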

@JuheonChu
Contributor

> @JuheonChu yes. My suggestion would be to find where the numerical differences start (between the XLA and non-XLA versions) using a debugger. Please note that you can't print variables with jit_compile=True, so you should set it to False. From there, the root cause is typically apparent.
>
> Be warned, these sorts of tasks are sometimes very time-consuming to complete :)

Thank you very much for your valuable guidance! We will try and keep you updated!

@katiele47
Contributor

Hi @gante, I've attempted to reproduce the failing XLA test on the OPT model using your suggested commands. The error I got was somewhat different from @JuheonChu's. Would you be able to verify whether the following is the expected failing test output? If not, I assume it could be due to my local repo. Thanks!
[Screenshots of the failing test output attached]

@soma2000-lang
Contributor

@gante working on XLNet
