Llama Model throwing "RuntimeError: expected scalar type BFloat16 but found Float" when using torch.compile and AMP together #30945
Comments
Hi @JackCai1206 I ran your script but didn't encounter the error that you mentioned.
When I run nvidia-smi I get:
@JackCai1206 There are two main APIs of CUDA: the runtime API and the driver API. The NVIDIA CUDA version you have posted is the driver API version, while what we have with PyTorch is the runtime API version, which comes from the CUDA toolkit that is installed automatically alongside PyTorch.
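As a quick illustration (not part of the original thread): the runtime API version that a PyTorch build was compiled against can be read from torch.version.cuda, whereas nvidia-smi reports the driver side, and the two do not have to match exactly:

```python
import torch

# CUDA runtime API version this PyTorch build was compiled against
# (ships with the binary wheels), e.g. "12.1".
print(torch.version.cuda)

# Driver-side details are what `nvidia-smi` reports; the driver
# version may be newer than the runtime version without issue.
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```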
Hi, thanks for the explanation! This is the output of pip list and the torch CUDA version:
Also cc @gante.
@JackCai1206 Oh! I see. What I found could be the reason for the error is this line in the Llama modeling code. If you want to use autocast, then an alternative to try could be to use ...
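The concrete suggestion is cut off in this capture, so the following is only a sketch of one common way to avoid mixing autocast with torch.compile: cast the model weights to bfloat16 up front, so the compiled graph sees a single consistent dtype. The tiny config sizes are hypothetical, not from the original script:

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny hypothetical config, purely for illustration.
config = LlamaConfig(hidden_size=256, intermediate_size=512,
                     num_hidden_layers=2, num_attention_heads=4)

# Cast the weights to bfloat16 instead of relying on autocast,
# so every op inside the compiled region already runs in bf16.
model = LlamaForCausalLM(config).to(torch.bfloat16).cuda()
model = torch.compile(model)

input_ids = torch.randint(0, config.vocab_size, (1, 16), device="cuda")
out = model(input_ids=input_ids, labels=input_ids)
out.loss.backward()
```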
Sounds good. Yeah, I think a warning message there could be useful.
System Info
transformers 4.41.0
torch 2.3.0
GPU: NVIDIA GeForce RTX 4090, CUDA version 12.3
Who can help?
No response
Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
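The reporter's script was not preserved in this capture. A minimal sketch of the described setup (a randomly initialized Llama model run under torch.compile with CUDA autocast; all sizes hypothetical) might look like:

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM
# from transformers import GPT2Config, GPT2LMHeadModel  # reportedly no error with GPT-2

# Hypothetical small config; the original dimensions are unknown.
config = LlamaConfig(hidden_size=256, intermediate_size=512,
                     num_hidden_layers=2, num_attention_heads=4)
model = LlamaForCausalLM(config).cuda()
model = torch.compile(model)

input_ids = torch.randint(0, config.vocab_size, (1, 16), device="cuda")

# Combining torch.compile with AMP autocast triggers the reported error.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    out = model(input_ids=input_ids, labels=input_ids)
    out.loss.backward()
```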
Expected behavior
Running the code snippet above gives me the following error:

RuntimeError: expected scalar type BFloat16 but found Float
This problem does not seem to happen for a GPT-2 model: if I initialize GPT2Config instead of LlamaConfig in the commented-out code in the script, there is no such error.