I fine-tuned llama3-8b with LoRA and followed the tutorial in the repository to convert the final result into model.pth. However, when I try to load the fine-tuned weights into the model using AutoModelForCausalLM.from_pretrained, I am unable to do so correctly. Below is my test:
But I found that the state_dict returned by torch.load doesn't match model.state_dict(), as shown below:
torch.load:
model.state_dict()
I noticed that even though I passed the state_dict, from_pretrained still returns the weights of the model loaded by name. Did I make any mistakes in my code, and how can I solve this? Thanks!
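To check whether the loaded weights actually made it into the model, one can diff the checkpoint against the model's own state_dict. Here is a minimal sketch of that check using a tiny nn.Linear as a stand-in for the real llama3-8b model (the file name and module are hypothetical, not the actual files from the issue):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
finetuned = nn.Linear(4, 2)                     # stand-in for the fine-tuned model
torch.save(finetuned.state_dict(), "diff_demo.pth")

fresh = nn.Linear(4, 2)                         # stand-in for the model from_pretrained built
loaded = torch.load("diff_demo.pth")

# Key sets should agree; any name mismatch means those tensors are
# skipped (or rejected) when loading.
assert set(loaded) == set(fresh.state_dict())

# Compare values tensor by tensor; MISMATCH means the checkpoint
# weights were never copied into the model.
for name, tensor in loaded.items():
    same = torch.equal(tensor, fresh.state_dict()[name])
    print(name, tensor.dtype, "match" if same else "MISMATCH")
```

If the keys match but every tensor mismatches, the model is still holding its originally loaded weights, which is consistent with the behavior described above.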
I can load the weight using the model.load_state_dict(), and then everything will go smoothly, but I really want to know why from_pretrained(state_dict=state_dict) can't work.
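For reference, the working path described above can be sketched as follows. A tiny nn.Linear again stands in for AutoModelForCausalLM; the load_state_dict mechanics are the same:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
finetuned = nn.Linear(4, 2)                     # pretend this is the fine-tuned model
torch.save(finetuned.state_dict(), "load_demo.pth")

fresh = nn.Linear(4, 2)                         # freshly initialized weights
state_dict = torch.load("load_demo.pth")

# strict=True (the default) errors on any key mismatch instead of
# silently keeping the old weights.
missing, unexpected = fresh.load_state_dict(state_dict)
assert not missing and not unexpected

# The fresh model's weights now really are the fine-tuned ones.
assert torch.equal(fresh.weight, finetuned.weight)
```

Unlike the state_dict= keyword of from_pretrained, load_state_dict copies tensors into an already constructed module in place, so there is no second load-by-name step that could overwrite them.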
I could not reproduce this with another model when I gave it a quick try.
I am not sure if it's related because the differences are so big, but I wonder what the precision of the tensors in your current state dict is. Could you print the precision of the state dict, and could you also try loading it without torch_dtype=torch.float16?
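Printing the precision of every tensor in a checkpoint is a one-liner over the state dict. A small sketch, again using a hypothetical stand-in model and file name rather than the actual llama3-8b checkpoint:

```python
import torch
import torch.nn as nn

# Stand-in checkpoint saved in bfloat16, as in the screenshot above.
model = nn.Linear(4, 2).to(torch.bfloat16)
torch.save(model.state_dict(), "dtype_demo.pth")

state_dict = torch.load("dtype_demo.pth")
for name, tensor in state_dict.items():
    print(name, tensor.dtype)

# A quick summary of all dtypes present in the checkpoint.
print({t.dtype for t in state_dict.values()})
```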
EDIT: Nevermind, I can see that the precision is bfloat16 in your screenshot.