TF: XLA generation not working properly in some models #17935
Comments
@gante do you require any help with this issue? Happy to contribute.
Hi @anmolsjoshi 👋 If you are comfortable with debugging XLA, absolutely :) My recommendation would be to pick a model from "Models failing complex tests" (the others might require significant architecture changes) and start debugging. The number 1 suspect is always the position embeddings, which may not be handling padded inputs correctly (XLA generation pads everything to a fixed length).
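[Editor's note: the padding pitfall described above can be sketched framework-free. The function names below are hypothetical illustrations, not transformers APIs; the point is that deriving positions from the raw index breaks once inputs are left-padded, while deriving them from the attention mask does not.]

```python
import numpy as np

def naive_position_ids(input_ids):
    # Positions counted from 0 regardless of padding -- this is the
    # pattern that breaks when XLA generation left-pads the inputs.
    batch, seq_len = input_ids.shape
    return np.tile(np.arange(seq_len), (batch, 1))

def mask_aware_position_ids(attention_mask):
    # Cumulative sum over the attention mask assigns position 0 to the
    # first *real* token, so padded slots do not shift the positions.
    return np.cumsum(attention_mask, axis=-1) - 1

mask = np.array([[0, 0, 1, 1, 1]])  # two left-pad tokens, three real tokens
print(naive_position_ids(mask))       # [[0 1 2 3 4]] -- wrong under padding
print(mask_aware_position_ids(mask))  # [[-1 -1  0  1  2]] -- real tokens get 0,1,2
```

The negative positions on pad slots are harmless because those slots are masked out of the loss and attention anyway (some implementations clip them to 0).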
Hi @gante, I did have a bit of a poke around. I think the complex tests all fail for the same reason: those models have a setting …
Hello @gante, may I ask if there is anything that I can contribute?
Hi @JuheonChu 👋 Actually yes! I have a few unchecked models at the top, but I wouldn't recommend spending time there unless you plan to use those architectures -- they are infrequently used. However, two popular models are currently failing their XLA tests with beam search.

You can see the failing tests if you install from source. I haven't dived in yet, so I don't know the cause of the failure. You'll have to hop into debug mode and see what is breaking :)
Can @katiele47 and I try working on them?
@JuheonChu of course!
@JuheonChu yes. My suggestion would be to find where the numerical differences between the XLA and the non-XLA version first appear, using a debugger. Please note that you can't print variables from inside XLA-compiled code with a plain `print`. Be warned, these sorts of tasks are sometimes very time-consuming to complete :)
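[Editor's note: the bisection strategy suggested above can be sketched as a small helper. `first_divergence` is a hypothetical name, not a transformers utility; it assumes you have already captured matching lists of intermediate arrays from an eager run and an XLA run.]

```python
import numpy as np

def first_divergence(eager_steps, xla_steps, atol=1e-5):
    """Return the index of the first captured step whose outputs differ
    beyond `atol`, or None if the two runs match everywhere.

    eager_steps / xla_steps: equally ordered lists of per-layer or
    per-generation-step arrays dumped from the two code paths.
    """
    for i, (a, b) in enumerate(zip(eager_steps, xla_steps)):
        if np.max(np.abs(np.asarray(a) - np.asarray(b))) > atol:
            return i
    return None

eager = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
xla = [np.array([1.0, 2.0]), np.array([3.0, 4.5])]
print(first_divergence(eager, xla))  # 1 -- the second step is where they diverge
```

Once you know the first diverging step, you can set a breakpoint there in the non-compiled path and inspect the inputs that produced the mismatch.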
Thank you very much for your valuable guidance! We will try and keep you updated!
Hi @gante, I've attempted to reproduce the failed XLA test on the OPT model using your suggested commands. The error I got was somewhat different from @JuheonChu's. Would you be able to verify whether the following is the expected failing-test output? If not, I assume it could be an issue with my local repo. Thanks!
@gante working on XLNet.
This issue is used to track TensorFlow XLA generation issues, arising from #17857. There are three categories of issues, sorted in descending order of severity:

Key model issues
These are heavily-used models, whose quality should be prioritized. One known problem involves `max_length`. See here.

Models failing basic tests
These models are failing `test_xla_generate_fast` -- a short greedy generation.

Models failing complex tests
These are models failing `test_xla_generate_slow` -- a long beam search generation.