
AttributeError: 'NoneType' object has no attribute 'from_pretrained' #8864

Closed
louisabraham opened this issue Dec 1, 2020 · 6 comments · Fixed by #8881

Comments

@louisabraham

This code was working yesterday but doesn't work today:

from transformers import AutoTokenizer
AutoTokenizer("Helsinki-NLP/opus-mt-en-fr")
@jacampo

jacampo commented Dec 1, 2020

Same here a couple of hours ago

@LysandreJik
Member

LysandreJik commented Dec 1, 2020

  1. Hi, could you please provide the information related to your environment?

  2. When you say it was working yesterday but isn't working today, do you mean you've upgraded to version v4.0.0, which was released yesterday? If so, you are probably getting the following error message: AttributeError: 'NoneType' object has no attribute 'from_pretrained'. That happens when you do not have sentencepiece installed (a quick install-and-load sketch follows the example below).

  3. Are you sure this worked previously? It should never have worked, as AutoTokenizer cannot be initialized like this; it has to be instantiated via the from_pretrained method:

from transformers import AutoTokenizer
AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")

which works on v4.0.0 and on master, as long as you have SentencePiece installed.
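
A minimal sketch of the fix from point 2, assuming a fresh v4.0.0 install and using the model name from the original report (run pip install sentencepiece first):

from transformers import AutoTokenizer

# With sentencepiece installed, the Marian tokenizer loads as expected on v4.0.0.
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")

# By default the encoded output contains plain Python lists rather than tensors.
print(tokenizer("Hello, world!")["input_ids"])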

@LysandreJik
Member

Putting a better error message in #8881.

@louisabraham
Author

louisabraham commented Dec 2, 2020

Right, I was using

AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")

Thanks, pip install sentencepiece fixed the issue!

It looks like the tokenizer previously output torch tensors and now outputs lists. Is this intended? It breaks existing code.

@LysandreJik
Member

Yes, this was a bug. Tokenizers are framework-agnostic and should not output a specific framework's tensor. The implementation of the Marian tokenizer was not respecting the API in that regard.

Tokenizers can still return torch tensors; you just need to specify that you want them:

tokenizer(xxx, return_tensors="pt")

I guess in your situation it has to do with prepare_seq2seq_batch:

tokenizer.prepare_seq2seq_batch(xxx, return_tensors="pt")
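
A short end-to-end sketch of both calls (the input strings here are made-up examples, not taken from the issue, and PyTorch must be installed for return_tensors="pt" to work):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")

# Ask for PyTorch tensors explicitly; the default is plain Python lists.
encoded = tokenizer(["How are you?"], return_tensors="pt")
print(type(encoded["input_ids"]))  # <class 'torch.Tensor'>

# prepare_seq2seq_batch accepts the same flag for translation-style batches.
batch = tokenizer.prepare_seq2seq_batch(["How are you?"], return_tensors="pt")
print(batch["input_ids"].shape)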

@louisabraham
Author

Thanks!
