
Add Chinese-CLIP implementation #20368

Merged
merged 123 commits into huggingface:main on Nov 30, 2022

Conversation

@yangapku (Contributor) commented Nov 22, 2022

What does this PR do?

This PR adds the Chinese-CLIP model to the Transformers repo. The Chinese-CLIP model was introduced in Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese. Chinese CLIP is an implementation and adaptation of CLIP (Radford et al., 2021) trained on a large-scale dataset of Chinese image-text pairs. It can perform Chinese-based cross-modal retrieval and can also serve as a vision backbone for tasks such as zero-shot image classification and open-domain object detection. This model was contributed by OFA-Sys; the original GitHub repo of Chinese-CLIP can be found at this link, and we have released our model weights on the Hugging Face Model Hub.

Compared with the original OpenAI CLIP, we changed the text encoder to a Chinese RoBERTa encoder, so we reimplemented the config, modeling, and preprocessor modules of Chinese-CLIP. The necessary unit tests and documentation have been added.
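For reference, a minimal usage sketch (illustrative, not part of the PR itself; it assumes the released OFA-Sys/chinese-clip-vit-base-patch16 checkpoint and the class names this PR introduces):

import torch
import requests
from PIL import Image
from transformers import ChineseCLIPModel, ChineseCLIPProcessor

model = ChineseCLIPModel.from_pretrained("OFA-Sys/chinese-clip-vit-base-patch16")
processor = ChineseCLIPProcessor.from_pretrained("OFA-Sys/chinese-clip-vit-base-patch16")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # any test image works
image = Image.open(requests.get(url, stream=True).raw)
texts = ["一只猫", "一只狗"]  # "a cat", "a dog"

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # image-text similarity over the candidate texts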

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@yangapku (Contributor, Author):

@ydshieh All the comments mentioned above have been addressed.

@sgugger (Collaborator) left a comment


A couple of final comments on my side and it should be good to merge!

yangapku and others added 3 commits November 29, 2022 22:07
…p.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
…p.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
@yangapku (Contributor, Author):

@ydshieh Hi, is the PR ready to be merged now? Thank you very much! ❤️

@ydshieh (Collaborator) commented Nov 29, 2022

@sgugger I feel there must be a very good reason that we have CLIPTextTransformer and CLIPVisionTransformer, and use these components in CLIPTextModel, CLIPVisionModel, and CLIPModel.

(Potentially to avoid nesting a CLIPPreTrainedModel inside another CLIPPreTrainedModel, which might cause some issues, at least if we ever want a TF port.)

Do you think we need to avoid this line

self.text_model = ChineseCLIPTextModel(text_config, add_pooling_layer=False)

and create a ChineseCLIPTextTransformer to use instead?
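(For context, a rough sketch of the two composition patterns under discussion; skeletons with elided bodies, not the actual modeling code:)

import torch.nn as nn
from transformers import PreTrainedModel

# CLIP-style pattern: the inner text component is a plain nn.Module,
# so no PreTrainedModel is nested inside another PreTrainedModel.
class CLIPTextTransformer(nn.Module):
    ...

class CLIPModel(PreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.text_model = CLIPTextTransformer(config.text_config)

# Pattern in this PR: the inner text component is itself a PreTrainedModel.
class ChineseCLIPTextModel(PreTrainedModel):
    def __init__(self, config, add_pooling_layer=True):
        super().__init__(config)
        ...

class ChineseCLIPModel(PreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.text_model = ChineseCLIPTextModel(config.text_config, add_pooling_layer=False)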

@ydshieh (Collaborator) commented Nov 29, 2022

Hi @yangapku, other than the above comment, LGTM! But let's wait for @sgugger's response.

There are a few places where I believe we can still use # Copied from (probably with some tweaks); I can help with this before merge.

@sgugger (Collaborator) commented Nov 29, 2022

@ydshieh I think this was done that way just to stay more aligned with the original checkpoints in the CLIP case. Here it works fine with the checkpoint, so I wouldn't over-complicate things.

@ydshieh (Collaborator) commented Nov 29, 2022

@yangapku Before we merge, could you run

RUN_SLOW=1 python -m pytest -v tests/models/chinese_clip/

I got 5 failures. You can focus on the first and last ones for the moment. Let me know if you need help fixing them 🙏

FAILED tests/models/chinese_clip/test_modeling_chinese_clip.py::ChineseCLIPTextModelTest::test_model_from_pretrained - AttributeError: 'ChineseCLIPConfig' object has no attribute 'vocab_size'
FAILED tests/models/chinese_clip/test_modeling_chinese_clip.py::ChineseCLIPModelTest::test_torchscript_output_attentions - AssertionError: Items in the second set but not the first:
FAILED tests/models/chinese_clip/test_modeling_chinese_clip.py::ChineseCLIPModelTest::test_torchscript_output_hidden_state - AssertionError: Items in the second set but not the first:
FAILED tests/models/chinese_clip/test_modeling_chinese_clip.py::ChineseCLIPModelTest::test_torchscript_simple - AssertionError: Items in the second set but not the first:
FAILED tests/models/chinese_clip/test_modeling_chinese_clip.py::ChineseCLIPModelIntegrationTest::test_inference - OSError: Can't load tokenizer for 'OFA-Sys/chinese-clip-vit-base-patch16'. If you were trying to load it
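(For reference, the first failure points at the composite config: ChineseCLIPConfig follows the CLIP-style layout with text_config/vision_config sub-configs, so text-level attributes are not exposed at the top level. A hypothetical reproduction:)

from transformers import ChineseCLIPConfig

config = ChineseCLIPConfig()
print(config.text_config.vocab_size)  # fine: vocab_size lives on the text sub-config
print(config.vocab_size)  # AttributeError: 'ChineseCLIPConfig' object has no attribute 'vocab_size'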

@yangapku (Contributor, Author) commented Nov 30, 2022


Okay I will try to fix them today.

@yangapku (Contributor, Author) commented Nov 30, 2022


@ydshieh The first and last failing cases have been fixed. Now only the TorchScript test failures remain. Meanwhile, to fix the first case, I had to remove the copied-from comment for ChineseCLIPTextModel, since it has diverged from BertModel with our custom config_class ChineseCLIPTextConfig.
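(A sketch of the change just described, with elided class bodies; the "Copied from" marker is what make fix-copies uses to keep a class in sync with its source, so leaving it in place would revert the custom config_class:)

# Before: fix-copies keeps this class in sync with BertModel, reverting local edits.
# # Copied from transformers.models.bert.modeling_bert.BertModel with Bert->ChineseCLIPText
# class ChineseCLIPTextModel(ChineseCLIPPreTrainedModel):
#     ...

# After: no marker, so the class is free to diverge from BertModel.
class ChineseCLIPTextModel(ChineseCLIPPreTrainedModel):
    config_class = ChineseCLIPTextConfig
    ...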

@yangapku (Contributor, Author):

@ydshieh Hi, can the PR be merged now? Do I have to fix the TorchScript-related test cases? If so, I will need more help, since I am not very familiar with TorchScript 😢.

@ydshieh (Collaborator) commented Nov 30, 2022

I will take a look at those 3 tests, @yangapku.

@ydshieh (Collaborator) commented Nov 30, 2022

@yangapku I pushed the remaining fix. Will merge once the final CI is green 🚀 🚀 🚀

Thank you very much for your work! 💯

@ydshieh (Collaborator) left a comment


Thanks again!

@ydshieh merged commit 7217640 into huggingface:main on Nov 30, 2022
@yangapku (Contributor, Author) commented Dec 1, 2022

Thank you very much for your brilliant support! @ydshieh @sgugger

@ydshieh (Collaborator) commented Dec 1, 2022

Hi @yangapku, just a follow-up. From your branch, I see that the file

convert_chinese_clip_original_pytorch_to_hf.py

was last modified on Nov 22 (the change on Nov 29 doesn't count). However, the modeling file has changed quite a lot since then due to our review comments. I just want to make sure the conversion script still works correctly, and that the original checkpoints and the converted HF checkpoints still produce the same outputs on some test examples.

It would be super nice if you could double-check, but it's your call; it's just a suggestion.
(It's always good to make sure users get the right checkpoints to use :-))
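(A minimal parity check along these lines; the checkpoint path is a placeholder and original_logits is assumed to come from running the same example through the original OFA-Sys codebase:)

import torch
from PIL import Image
from transformers import ChineseCLIPModel, ChineseCLIPProcessor

converted = "path/to/converted_hf_checkpoint"  # output of the conversion script
model = ChineseCLIPModel.from_pretrained(converted).eval()
processor = ChineseCLIPProcessor.from_pretrained(converted)

image = Image.open("test.jpg")  # a fixed test image
inputs = processor(text=["一只猫"], images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    hf_logits = model(**inputs).logits_per_image

# original_logits: the same example run through the original codebase (hypothetical here);
# the two should agree within floating-point tolerance.
torch.testing.assert_close(hf_logits, original_logits, rtol=0.0, atol=1e-4)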

@yangapku (Contributor, Author) commented Dec 1, 2022

@ydshieh Hi, I have verified that the conversion script works correctly 😄. In fact, today we also updated the other 3 model scales (ViT-L/14, ViT-L/14@336px, ViT-H/14) on our HF model hub, and I used this script to convert our original models to HF format. After the conversion, I tested all the converted HF checkpoints (the pytorch_model.bin files); all of them work as expected.

@ydshieh (Collaborator) commented Dec 1, 2022

Thank you, @yangapku!

amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request Dec 7, 2022
* init chinese-clip model from clip

* init model tests and docs

* implement chinese-clip into hf

* implement chinese-clip into hf

* implement chinese-clip into hf

* implement chinese-clip into hf

* implement chinese-clip into hf

* update usecase example in model implementation

* fix codestyle

* fix model_type typo in readme

* add placeholder in doc

* add placeholder in doc

* update the init script

* update usecase

* fix codestyle

* update testcase

* update testcase

* update testcase

* update testcase

* update testcase

* update testcase

* update testcase

* update testcase

* update testcase

* update testcase

* update testcase

* update testcase

* forward the convert_rgb

* update testcase

* update testcase

* update testcase

* merge the recent update from clip about model_input_name property

* update the doc

* update the doc

* update the doc

* update the doc

* remove unused imports

* reformat code style

* update the doc

* fix isort style

* bypass a weird failed unit test which is unrelated with my PR

* update the doc

* implement independent vision config class

* implement independent vision model class

* fix refactor bug

* fix refactor bug

* fix refactor bug

* make style

* fix refactor bug

* make style

* fix refactor bug

* fix refactor bug

* make style

* fix refactor bug

* fix refactor bug

* doc-build restyle

* implement independent text config class

* implement independent text model class

* implement independent text model class

* make style

* make fix-copies

* fix refactor bug

* fix refactor bug

* fix refactor bug

* fix refactor bug

* fix refactor bug

* fix refactor bug

* fix refactor bug

* fix refactor bug

* fix refactor bug

* fix refactor bug

* make style

* update doc

* black and isort

* update doc

* Update src/transformers/models/chinese_clip/configuration_chinese_clip.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/auto/tokenization_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* modify the model type from chinese-clip to chinese_clip

* format the example comment of ChineseCLIPVisionConfig

* correct the copyright comment

* fix the tokenizer specification

* add copied from for loss function

* remove unused class

* update CHINESE_CLIP_TEXT_INPUTS_DOCSTRING

* update CHINESE_CLIP_INPUTS_DOCSTRING

* update doc

* update doc

* update code comment in config

* update copied from statement

* make style

* rename the doc file

* add copied statement

* remove unused attention_mask, causal_attention_mask in ChineseCLIPVisionEncoder

* remove ChineseCLIPTextPreTrainedModel

* fix bug

* fix bug

* fix bug

* update doc

* make style

* Update src/transformers/models/chinese_clip/configuration_chinese_clip.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/chinese_clip/configuration_chinese_clip.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* update ChineseCLIPImageProcessor in image_processing_auto

* fix config_class of chinesecliptextmodel

* fix the test case

* update the docs

* remove the copied from comment for ChineseCLIPTextModel, since it has diverged from BertModel with customed config_class

* update the testcase

* final fix

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
mpierrau pushed a commit to mpierrau/transformers that referenced this pull request Dec 15, 2022
miyu386 pushed a commit to miyu386/transformers that referenced this pull request Feb 9, 2023