
DPR AutoModel loading incorrect architecture for DPRContextEncoders #13670

Closed
joshdevins opened this issue Sep 21, 2021 · 9 comments · Fixed by #13796

Comments

@joshdevins
Contributor

joshdevins commented Sep 21, 2021

Environment info

  • transformers version: 4.10.2
  • Platform: Darwin-20.6.0-x86_64-i386-64bit
  • Python version: 3.7.7
  • PyTorch version (GPU?): 1.9.0 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help

Model type dpr: @LysandreJik @patrickvonplaten @lhoestq

Information

Model I am using: DPR (DPRContextEncoder, facebook/dpr-ctx_encoder-single-nq-base)

To reproduce

Loading a DPR context encoder (DPRContextEncoder) using AutoModel.from_pretrained actually loads a DPRQuestionEncoder instead, and later fails.

Steps to reproduce the behavior:

from transformers import AutoModel

AutoModel.from_pretrained('facebook/dpr-ctx_encoder-single-nq-base')

File "venv/lib/python3.7/site-packages/transformers/modeling_utils.py", line 579, in _init_weights
    raise NotImplementedError(f"Make sure `_init_weigths` is implemented for {self.__class__}")
NotImplementedError: Make sure `_init_weigths` is implemented for <class 'transformers.models.dpr.modeling_dpr.DPRQuestionEncoder'>

Note in the above that it's trying to use the DPRQuestionEncoder even though the config for this context encoder is correct and points to architecture=DPRContextEncoder.

Explicitly using DPRContextEncoder.from_pretrained works just fine, so it looks like the problem is somewhere in AutoModel.

from transformers import DPRContextEncoder

DPRContextEncoder.from_pretrained('facebook/dpr-ctx_encoder-single-nq-base')

Expected behavior

Using AutoModel.from_pretrained should pick the correct architecture for a DPRContextEncoder.

@qqaatw
Contributor

qqaatw commented Sep 23, 2021

Unfortunately, AutoModel and its variants currently only support a 1-to-1 mapping from the model type (e.g. dpr) to a single model class, so in this case the one model that dpr maps to for AutoModel is DPRQuestionEncoder.
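
For reference, a minimal way to see that mapping (assuming transformers 4.10.x; MODEL_MAPPING is an internal table keyed by config class, and its exact layout can change between versions):

from transformers import MODEL_MAPPING, DPRConfig

# AutoModel resolves the checkpoint's config class (DPRConfig) to exactly one
# model class, regardless of what config.architectures says.
print(MODEL_MAPPING[DPRConfig])
# <class 'transformers.models.dpr.modeling_dpr.DPRQuestionEncoder'>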

@joshdevins
Contributor Author

Ok, that kind of makes sense 🙃 Is there an easy way to change that for DPR models so it also looks at the architecture in the config?

@qqaatw
Contributor

qqaatw commented Sep 24, 2021

To the best of my knowledge, this would be a major change to the auto factory, because the mapping file defines all Auto- models together rather than per model. Modifying only the DPR-related models might break their consistency.

patrickvonplaten linked a pull request Sep 29, 2021 that will close this issue
@patrickvonplaten
Contributor

@joshdevins - could you check whether the PR linked above solves the issue?

@joshdevins
Contributor Author

joshdevins commented Sep 30, 2021

@patrickvonplaten Sorry, I realise now that there are two problems. Your PR fixes the fact that these models didn't implement _init_weights, so that error is now gone. The AutoModel problem remains: AutoModel.from_pretrained is selecting DPRQuestionEncoder even when the model architecture (as specified in config.json) is actually DPRContextEncoder.

import torch
import transformers

model_id = "facebook/dpr-ctx_encoder-single-nq-base"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
input_ids = tokenizer("This is an example sentence.", return_tensors="pt")["input_ids"]

auto_model = transformers.AutoModel.from_pretrained(model_id)
context_model = transformers.DPRContextEncoder.from_pretrained(model_id)

auto_output = auto_model(input_ids)
context_output = context_model(input_ids)
> type(auto_model)
transformers.models.dpr.modeling_dpr.DPRQuestionEncoder

> type(context_model)
transformers.models.dpr.modeling_dpr.DPRContextEncoder

> torch.all(torch.eq(auto_output["pooler_output"], context_output["pooler_output"]))
tensor(False)
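
For completeness, the checkpoint's config does name the context encoder, so the architecture information is available (a minimal check; the printed value mirrors the hosted config.json):

from transformers import AutoConfig

# The config correctly lists DPRContextEncoder, but AutoModel ignores it.
config = AutoConfig.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
print(config.architectures)
# ['DPRContextEncoder']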

@joshdevins
Contributor Author

joshdevins commented Sep 30, 2021

Note that my workaround is basically this 🤷

import transformers
from transformers import AutoConfig

config = AutoConfig.from_pretrained(model_id)
getattr(transformers, config.architectures[0]).from_pretrained(model_id)

@patrickvonplaten
Contributor

@joshdevins - ah yeah, I think we can't really do anything about the second problem the way it is implemented now... maybe it would make sense to implement an AutoModel.from_pretrained(...) that relies on config.architectures in the future...
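
Something along those lines could look roughly like the sketch below. This is purely hypothetical and not part of transformers; the name from_pretrained_by_architecture and its behaviour are just an illustration of dispatching on config.architectures, generalising the workaround above.

import transformers
from transformers import AutoConfig, AutoModel


def from_pretrained_by_architecture(model_id, **kwargs):
    # Hypothetical helper: prefer the class named in config.architectures,
    # and fall back to the regular AutoModel mapping when it is missing.
    config = AutoConfig.from_pretrained(model_id)
    architectures = getattr(config, "architectures", None) or []
    if architectures and hasattr(transformers, architectures[0]):
        return getattr(transformers, architectures[0]).from_pretrained(model_id, **kwargs)
    return AutoModel.from_pretrained(model_id, **kwargs)


model = from_pretrained_by_architecture("facebook/dpr-ctx_encoder-single-nq-base")
print(type(model))  # transformers.models.dpr.modeling_dpr.DPRContextEncoder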

@joshdevins
Contributor Author

I guess that makes sense. I wonder if this is the only model with this scenario? The way sentence-transformers does things also makes sense: they have a second config containing all the pooling and normalization layers that come after the transformer.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot closed this as completed Nov 3, 2021
joshdevins added a commit to elastic/eland that referenced this issue Dec 6, 2021
In preparation for an 8.0 release, this updates PyTorch NLP dependencies
to more recent and latest minor versions. Amongst other things, this
introduces a fix from transformers that is helpful for text embedding
tasks with certain DPR models.

See: huggingface/transformers#13670
sethmlarson added a commit to elastic/eland that referenced this issue Dec 6, 2021
In preparation for an 8.0 release, this updates PyTorch NLP dependencies
to more recent and latest minor versions. Amongst other things, this
introduces a fix from transformers that is helpful for text embedding
tasks with certain DPR models.

See: huggingface/transformers#13670

Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>