
125m checkpoint outputting gibberish #73

Closed
patrickvonplaten opened this issue May 9, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@patrickvonplaten
Contributor

patrickvonplaten commented May 9, 2022

Converting the sharded checkpoints of 125m to a singleton checkpoint with #60:

    $ ls 125m
    dict.txt
    gpt2-merges.txt
    gpt2-vocab.json
    reshard-model_part-0.pt
    reshard-model_part-1.pt
    $ python -m metaseq.scripts.convert_to_singleton 125m

gives a new restored.pt file.
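
As a quick sanity check, the restored file can be loaded and inspected on CPU (a minimal sketch; it assumes restored.pt holds a flat state dict of parameter tensors, as the conversion code below implies):

import torch

# load the restored checkpoint on CPU; map_location avoids needing a GPU
restored = torch.load("./restored.pt", map_location="cpu")
print(type(restored))
print(list(restored.keys())[:5])  # first few parameter names, assuming a flat state dict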

I then transformed the checkpoint into the same format as 350m to test some generation on it:

import torch
orig_state = torch.load("./reshard-model_part-0.pt")
model = torch.load("./restored.pt")

orig_state["model"] = model  # this format allows one to use the standard `checkpoint_utils.load_model_ensemble_and_task` function
orig_state["cfg"]["model"]._name = "transformer_lm"  # we change the architecture name to "transformer_lm" to be able to run it in a non-CUDA environment
torch.save(orig_state, "./reshard.pt")

I tried running an inference example on the model to see whether the generation works as expected.
Here is the code:

import os

from transformers import GPT2Tokenizer
from metaseq import checkpoint_utils
import torch

path = "/home/patrick/add_opt"

"""
$ ls path
vocab.json
merges.txt
reshard.pt
"""


tokenizer = GPT2Tokenizer.from_pretrained("patrickvonplaten/opt_gpt2_tokenizer")
tokenizer.save_pretrained(path)

paths = [os.path.join(path, "reshard.pt")]

checkpoint = checkpoint_utils.load_model_ensemble_and_task(
    paths,
    arg_overrides={
        "vocab_filename": os.path.join(path, "vocab.json"),
        "merges_filename": os.path.join(path, "merges.txt"),
    }
)

model = checkpoint[0][0].eval()


# forward passes
def single_batch_forward_logits(prompts):
    input_ids = tokenizer(prompts, return_tensors="pt").input_ids
    # prepend token id 2 (</s>), which OPT uses as the beginning-of-sequence token
    input_ids = torch.cat([torch.tensor([[2]]), input_ids], dim=-1)
    logits = model(input_ids)[0]
    return logits


prompts = [
    "Today is a beautiful day and I want to",
    "In the city of",
    "Paris is the capital of France and",
    "Computers and mobile phones have taken",
]


print("Next word generation")
for prompt in prompts:
    print("-------------")
    print(f"Prompt: {prompt}...\n")
    logits = single_batch_forward_logits(prompt)
    pred_next_token = torch.argmax(logits[0, -1], -1)
    next_token = tokenizer.convert_ids_to_tokens([pred_next_token])
    next_token = next_token[0].replace("Ġ", "")  # strip the GPT-2 BPE word-boundary marker
    print(f"Next word: {next_token}")
    print("-------------")

This sadly gives gibberish:

Next word generation
-------------
Prompt: Today is a beautiful day and I want to...

Next word: Robbins
-------------
-------------
Prompt: In the city of...

Next word: of
-------------
-------------
Prompt: Paris is the capital of France and...

Next word: Robbins
-------------
-------------
Prompt: Computers and mobile phones have taken...

Next word: Robbins
-------------

Note that this script works perfectly fine with the 350m checkpoint.

@stephenroller - any ideas?

@patrickvonplaten
Contributor Author

This also holds true when running the generation directly in the conversion script of #60.

@patrickvonplaten
Contributor Author

The same holds true for the 1B3 checkpoint when using #72.

Any ideas what the problem could be, @stephenroller @suchenzang?

Do you have a script that allows using 125m or 1B3 for inference?

@Fengwills

@patrickvonplaten Hello, thanks for sharing. Could you share the dict.txt, and how do you get it?
Can #60 be used on the 350M model?
Many thanks

@mrseeker

mrseeker commented May 9, 2022

@patrickvonplaten I have a conversion script that our team used to convert the fairseq models to XGLM. Could you use it to test whether that works? I am stuck on the script, though, since it seems to require 2 GPU instances to fully function.

@stephenroller
Contributor

Will try this in metaseq today.

@patrickvonplaten
Contributor Author

@mrseeker,

If it helps, here is the "converted" fsq checkpoint. You should be able to run this on CPU.
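
A minimal sketch of loading it on CPU for a quick check (the "cfg" / "model" keys and the reshard.pt filename follow the conversion snippet at the top of this issue; the exact path is an assumption):

import torch

# load the singleton checkpoint entirely onto CPU
state = torch.load("./reshard.pt", map_location="cpu")
print(state["cfg"]["model"]._name)  # "transformer_lm" after the conversion above
print(len(state["model"]))  # number of entries in the flattened state dict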

@patrickvonplaten
Contributor Author

For the conversion we use multi-GPU machines, though.

@suchenzang
Contributor

Seems to be addressed in huggingface/transformers#17088 - feel free to re-open this issue if not.
