125m checkpoint outputting gibberish #73
Comments
This also holds true when running the generation directly in the conversion script from #60.
The same holds true for the 1B3 checkpoint when using #72. Any ideas what the problem could be, @stephenroller @suchenzang? Do you have a script that allows running 125m or 1B3 in inference?
@patrickvonplaten Hello, thanks for sharing. Could you share the dict.txt? How can I get it?
@patrickvonplaten I have a conversion script that our team used to convert the fairseq models to XGLM. You could use it to test this out, if that works. I am stuck on the script, though, since it seems to require 2 GPU instances to fully function.
Will try this in metaseq today. |
For the conversion we use multi-GPU machines, though.
Seems to be addressed in huggingface/transformers#17088 - feel free to re-open this issue if not. |
Converting the sharded checkpoints of 125m to a singleton checkpoint with #60 gives a new consolidated checkpoint file.
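The consolidation step in #60 is the real implementation; as a rough illustration of the idea, each shard holds a slice of every parameter, and the singleton checkpoint concatenates the slices back together in shard order. The sketch below is hypothetical: plain Python lists stand in for tensors, and the key name is made up.

```python
# Hypothetical sketch of consolidating sharded checkpoints into one
# singleton state dict. Plain lists stand in for torch tensors; the
# parameter name below is an assumption, not taken from metaseq.

def consolidate_shards(shards):
    """Merge a list of shard state dicts into one singleton state dict.

    Each shard maps parameter names to a slice (here: a list of floats)
    of that parameter; slices are concatenated in shard order.
    """
    merged = {}
    for shard in shards:
        for name, values in shard.items():
            merged.setdefault(name, []).extend(values)
    return merged

# Two toy shards, each holding half of the same parameter.
shard0 = {"decoder.embed_tokens.weight": [0.1, 0.2]}
shard1 = {"decoder.embed_tokens.weight": [0.3, 0.4]}
singleton = consolidate_shards([shard0, shard1])
print(singleton["decoder.embed_tokens.weight"])  # [0.1, 0.2, 0.3, 0.4]
```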
I then converted the checkpoint into the same format as the 350m checkpoint to test some generation on it.
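Reshaping a checkpoint into another checkpoint's format usually amounts to renaming state-dict keys, e.g. stripping a wrapper prefix left over from sharded training. The sketch below is a guess at that kind of step; the prefix string and key names are assumptions, not the actual mapping used in this thread.

```python
# Hypothetical sketch of flattening a checkpoint's key layout. The
# wrapper prefix and parameter names are assumptions for illustration.

def to_flat_layout(state_dict, wrapper_prefix="decoder._fsdp_wrapped_module."):
    """Strip an FSDP-style wrapper prefix from every parameter name."""
    flat = {}
    for name, tensor in state_dict.items():
        if name.startswith(wrapper_prefix):
            name = "decoder." + name[len(wrapper_prefix):]
        flat[name] = tensor
    return flat

wrapped = {"decoder._fsdp_wrapped_module.layers.0.fc1.weight": [1.0]}
print(to_flat_layout(wrapped))
# {'decoder.layers.0.fc1.weight': [1.0]}
```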
I tried running an inference example on the model to see whether generation works as expected.
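The inference snippet itself was not preserved here. As a minimal, hypothetical stand-in for what such a sanity check does, the sketch below runs greedy decoding: the next-token distribution is repeatedly argmax'ed. A real check would load the converted 125m checkpoint (e.g. via metaseq/fairseq) in place of the toy logits function.

```python
# Hypothetical greedy-decoding sanity check. `toy_logits` stands in for
# a real model's next-token distribution; everything here is illustrative.

def greedy_decode(next_token_logits, prompt, max_new_tokens=5, eos=None):
    """Greedily extend `prompt` (a list of token ids) one token at a time."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)
        tok = max(range(len(logits)), key=logits.__getitem__)  # argmax
        if tok == eos:
            break
        tokens.append(tok)
    return tokens

# Toy "model": most confident about (last_token + 1) mod vocab_size,
# so a healthy run produces a predictable, non-gibberish sequence.
def toy_logits(tokens, vocab_size=10):
    logits = [0.0] * vocab_size
    logits[(tokens[-1] + 1) % vocab_size] = 1.0
    return logits

print(greedy_decode(toy_logits, [3]))  # [3, 4, 5, 6, 7, 8]
```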
This sadly gives gibberish.
Note that this script works perfectly fine with the 350m checkpoint.
@stephenroller - any ideas?