
125m checkpoint outputting gibberish #73

Closed
patrickvonplaten opened this issue May 9, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@patrickvonplaten
Contributor

patrickvonplaten commented May 9, 2022

Converting the sharded checkpoints of 125m to a singleton checkpoint with #60:

    $ ls 125m
    dict.txt
    gpt2-merges.txt
    gpt2-vocab.json
    reshard-model_part-0.pt
    reshard-model_part-1.pt
    $ python -m metaseq.scripts.convert_to_singleton 125m

gives a new restored.pt file.
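
As a quick sanity check, the restored file can be loaded and inspected on CPU (a minimal sketch; it assumes restored.pt holds a flat state dict of parameter tensors, as the conversion code below implies):

import torch

# load the restored checkpoint on CPU; map_location avoids needing a GPU
restored = torch.load("./restored.pt", map_location="cpu")
print(type(restored))
print(list(restored.keys())[:5])  # first few parameter names, assuming a flat state dict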

I then transformed the checkpoint into the same format as 350m to test some generation on it:

import torch
orig_state = torch.load("./reshard-model_part-0.pt")
model = torch.load("./restored.pt")

orig_state["model"] = model  # this format allows one to use the standard `checkpoint_utils.load_model_ensemble_and_task` function
orig_state["cfg"]["model"]._name = "transformer_lm"  # we change the architecture name to "transformer_lm" to be able to run it in a non-CUDA environment
torch.save(orig_state, "./reshard.pt")

I tried running an inference example on the model to see whether the generation works as expected.
Here is the code:

import os

from transformers import GPT2Tokenizer
from metaseq import checkpoint_utils
import torch

path = "/home/patrick/add_opt"

"""
$ ls path
vocab.json
merges.txt
reshard.pt
"""


tokenizer = GPT2Tokenizer.from_pretrained("patrickvonplaten/opt_gpt2_tokenizer")
tokenizer.save_pretrained(path)

paths = [os.path.join(path, "reshard.pt")]

checkpoint = checkpoint_utils.load_model_ensemble_and_task(
    paths,
    arg_overrides={
        "vocab_filename": os.path.join(path, "vocab.json"),
        "merges_filename": os.path.join(path, "merges.txt"),
    }
)

model = checkpoint[0][0].eval()


# forward passes
def single_batch_forward_logits(prompts):
    input_ids = tokenizer(prompts, return_tensors="pt").input_ids
    # prepend token id 2 (</s>), which OPT uses as the beginning-of-sequence token
    input_ids = torch.cat([torch.tensor([[2]]), input_ids], dim=-1)
    logits = model(input_ids)[0]
    return logits


prompts = [
    "Today is a beautiful day and I want to",
    "In the city of",
    "Paris is the capital of France and",
    "Computers and mobile phones have taken",
]


print("Next word generation")
for prompt in prompts:
    print("-------------")
    print(f"Prompt: {prompt}...\n")
    logits = single_batch_forward_logits(prompt)
    pred_next_token = torch.argmax(logits[0, -1], -1)
    next_token = tokenizer.convert_ids_to_tokens([pred_next_token])
    next_token = next_token[0].replace("Ġ", "")  # strip the GPT-2 BPE word-boundary marker
    print(f"Next word: {next_token}")
    print("-------------")

This sadly gives gibberish:

Next word generation
-------------
Prompt: Today is a beautiful day and I want to...

Next word: Robbins
-------------
-------------
Prompt: In the city of...

Next word: of
-------------
-------------
Prompt: Paris is the capital of France and...

Next word: Robbins
-------------
-------------
Prompt: Computers and mobile phones have taken...

Next word: Robbins
-------------

Note that this script works perfectly fine with the 350m checkpoint.

@stephenroller - any ideas?

@patrickvonplaten
Contributor Author

This also holds true when running the generation directly in the conversion script of #60.

@patrickvonplaten
Contributor Author

The same holds true for the 1B3 checkpoint when using #72.

Any ideas what the problem could be, @stephenroller @suchenzang?

Do you have a script that allows using 125m or 1B3 for inference?

@Fengwills

@patrickvonplaten Hello, thanks for sharing. Could you share the dict.txt, and how do you get it?
Can #60 be used on the 350M model?
Many thanks

@mrseeker

mrseeker commented May 9, 2022

@patrickvonplaten I have a conversion script that our team used to convert the fairseq models to XGLM. Could you use it to test whether that works? I am stuck on the script, though, since it seems to require 2 GPU instances to fully function.

@stephenroller
Contributor

Will try this in metaseq today.

@patrickvonplaten
Contributor Author

@mrseeker,

If it helps, here is the "converted" fsq checkpoint. You should be able to run this on CPU.
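
A minimal sketch of loading it on CPU for a quick check (the "cfg" / "model" keys and the reshard.pt filename follow the conversion snippet at the top of this issue; the exact path is an assumption):

import torch

# load the singleton checkpoint entirely onto CPU
state = torch.load("./reshard.pt", map_location="cpu")
print(state["cfg"]["model"]._name)  # "transformer_lm" after the conversion above
print(len(state["model"]))  # number of entries in the flattened state dict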

@patrickvonplaten
Contributor Author

For the conversion we use multi-GPU machines, though.

@suchenzang
Contributor

Seems to be addressed in huggingface/transformers#17088 - feel free to re-open this issue if not.
