
DPR AutoModel loading incorrect architecture for DPRContextEncoders #13670

Closed
joshdevins opened this issue Sep 21, 2021 · 9 comments · Fixed by #13796

Comments

@joshdevins
Contributor

joshdevins commented Sep 21, 2021

Environment info

  • transformers version: 4.10.2
  • Platform: Darwin-20.6.0-x86_64-i386-64bit
  • Python version: 3.7.7
  • PyTorch version (GPU?): 1.9.0 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help

Model type dpr: @LysandreJik @patrickvonplaten @lhoestq

Information

Model I am using: DPR (DPRContextEncoder, facebook/dpr-ctx_encoder-single-nq-base)

To reproduce

Loading a DPR context encoder (DPRContextEncoder) using AutoModel.from_pretrained actually loads a DPRQuestionEncoder instead, and later fails.

Steps to reproduce the behavior:

from transformers import AutoModel

AutoModel.from_pretrained('facebook/dpr-ctx_encoder-single-nq-base')

File "venv/lib/python3.7/site-packages/transformers/modeling_utils.py", line 579, in _init_weights
    raise NotImplementedError(f"Make sure `_init_weigths` is implemented for {self.__class__}")
NotImplementedError: Make sure `_init_weigths` is implemented for <class 'transformers.models.dpr.modeling_dpr.DPRQuestionEncoder'>

Note in the above that it's trying to use the DPRQuestionEncoder even though the config for this context encoder is correct and points to architecture=DPRContextEncoder.

Explicitly using DPRContextEncoder.from_pretrained works just fine, so it looks like the problem is somewhere in AutoModel.

from transformers import DPRContextEncoder

DPRContextEncoder.from_pretrained('facebook/dpr-ctx_encoder-single-nq-base')

Expected behavior

Using AutoModel.from_pretrained should pick the correct architecture for a DPRContextEncoder.

@qqaatw
Contributor

qqaatw commented Sep 23, 2021

Unfortunately, AutoModel and its variants currently only support a 1-to-1 mapping from the model type (e.g. dpr) to a single model class, so in this case the one model that dpr maps to for AutoModel is DPRQuestionEncoder.
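
For reference, a minimal way to see that mapping (assuming transformers 4.10.x; MODEL_MAPPING is an internal table keyed by config class, and its exact layout can change between versions):

from transformers import MODEL_MAPPING, DPRConfig

# AutoModel resolves the checkpoint's config class (DPRConfig) to exactly one
# model class, regardless of what config.architectures says.
print(MODEL_MAPPING[DPRConfig])
# <class 'transformers.models.dpr.modeling_dpr.DPRQuestionEncoder'>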

@joshdevins
Contributor Author

Ok, that kind of makes sense 🙃 Is there an easy way to change that for DPR models so it also looks at the architecture in the config?

@qqaatw
Contributor

qqaatw commented Sep 24, 2021

To the best of my knowledge, this would be a major change to the auto factory, because the mapping file defines all Auto- models together rather than per model. Modifying only the DPR-related models might break their consistency.

patrickvonplaten linked a pull request Sep 29, 2021 that will close this issue
@patrickvonplaten
Contributor

@joshdevins - could you check whether the PR linked above solves the issue?

@joshdevins
Contributor Author

joshdevins commented Sep 30, 2021

@patrickvonplaten Sorry, I realise now that there are two problems. Your PR fixes the fact that these models didn't implement _init_weights, so that error is now gone. The AutoModel problem remains: AutoModel.from_pretrained is selecting DPRQuestionEncoder even when the model architecture (as specified in config.json) is actually DPRContextEncoder.

import torch
import transformers

model_id = "facebook/dpr-ctx_encoder-single-nq-base"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
input_ids = tokenizer("This is an example sentence.", return_tensors="pt")["input_ids"]

auto_model = transformers.AutoModel.from_pretrained(model_id)
context_model = transformers.DPRContextEncoder.from_pretrained(model_id)

auto_output = auto_model(input_ids)
context_output = context_model(input_ids)
> type(auto_model)
transformers.models.dpr.modeling_dpr.DPRQuestionEncoder

> type(context_model)
transformers.models.dpr.modeling_dpr.DPRContextEncoder

> torch.all(torch.eq(auto_output["pooler_output"], context_output["pooler_output"]))
tensor(False)
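
For completeness, the checkpoint's config does name the context encoder, so the architecture information is available (a minimal check; the printed value mirrors the hosted config.json):

from transformers import AutoConfig

# The config correctly lists DPRContextEncoder, but AutoModel ignores it.
config = AutoConfig.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
print(config.architectures)
# ['DPRContextEncoder']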

@joshdevins
Contributor Author

joshdevins commented Sep 30, 2021

Note that my workaround is basically this 🤷

import transformers
from transformers import AutoConfig

config = AutoConfig.from_pretrained(model_id)
getattr(transformers, config.architectures[0]).from_pretrained(model_id)

@patrickvonplaten
Contributor

@joshdevins - ah yeah, I think we can't really do anything about the second problem the way it is implemented now... maybe it would make sense to implement an AutoModel.from_pretrained(...) that relies on config.architectures in the future...
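
Something along those lines could look roughly like the sketch below. This is purely hypothetical and not part of transformers; the name from_pretrained_by_architecture and its behaviour are just an illustration of dispatching on config.architectures, generalising the workaround above.

import transformers
from transformers import AutoConfig, AutoModel


def from_pretrained_by_architecture(model_id, **kwargs):
    # Hypothetical helper: prefer the class named in config.architectures,
    # and fall back to the regular AutoModel mapping when it is missing.
    config = AutoConfig.from_pretrained(model_id)
    architectures = getattr(config, "architectures", None) or []
    if architectures and hasattr(transformers, architectures[0]):
        return getattr(transformers, architectures[0]).from_pretrained(model_id, **kwargs)
    return AutoModel.from_pretrained(model_id, **kwargs)


model = from_pretrained_by_architecture("facebook/dpr-ctx_encoder-single-nq-base")
print(type(model))  # transformers.models.dpr.modeling_dpr.DPRContextEncoder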

@joshdevins
Contributor Author

I guess that makes sense. I wonder if this is the only model with this scenario? The way sentence-transformers does things also makes sense: they have a second config containing all the pooling and normalization layers that come after the transformer.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot closed this as completed Nov 3, 2021
joshdevins added a commit to elastic/eland that referenced this issue Dec 6, 2021
In preparation for an 8.0 release, this updates PyTorch NLP dependencies
to more recent and latest minor versions. Amongst other things, this
introduces a fix from transformers that is helpful for text embedding
tasks with certain DPR models.

See: huggingface/transformers#13670
sethmlarson added a commit to elastic/eland that referenced this issue Dec 6, 2021
In preparation for an 8.0 release, this updates PyTorch NLP dependencies
to more recent and latest minor versions. Amongst other things, this
introduces a fix from transformers that is helpful for text embedding
tasks with certain DPR models.

See: huggingface/transformers#13670

Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>