Add X-CLIP #18852

NielsRogge · 2022-09-01T12:15:24Z

What does this PR do?

This PR adds X-CLIP, which is a minimal extension of CLIP for video-language pre-training.

To do:

upload all checkpoints to the hub, as part of the microsoft organization

NielsRogge · 2022-09-01T12:48:14Z

Many tests fail due to the following error:

ModuleNotFoundError: No module named 'transformers.models.xclip'

This is probably because I first called the model folder "xclip", which is now called "x_clip". Still, I'm wondering why it keeps looking for the module models.xclip. If anyone has any pointers, that would be greatly appreciated.

sgugger

Thanks for adding this new model! Left a couple of comments.

For the import errors, I think you need to add "xclip" to the SPECIAL_MODEL_TYPE_TO_MODULE_NAME variable in configuration_auto.py, since the module name (x_clip) is not the model type xclip with any - replaced by _.
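
A simplified sketch of the lookup this suggestion refers to. It assumes the auto classes resolve a module name by first checking a special-case table and otherwise replacing `-` with `_` in the model type. The table contents and helper below are a stand-in for illustration, not a copy of configuration_auto.py, and `some-model` is a hypothetical model type.

```python
# Stand-in for the special-case table; only the entry discussed here is shown.
SPECIAL_MODEL_TYPE_TO_MODULE_NAME = {
    "xclip": "x_clip",  # model type "xclip" lives in the folder models/x_clip
}


def model_type_to_module_name(model_type: str) -> str:
    """Map a model type to the name of its module folder (illustrative)."""
    if model_type in SPECIAL_MODEL_TYPE_TO_MODULE_NAME:
        return SPECIAL_MODEL_TYPE_TO_MODULE_NAME[model_type]
    # Default rule: the module name is the model type with "-" replaced by "_".
    return model_type.replace("-", "_")


print(model_type_to_module_name("xclip"))
print(model_type_to_module_name("some-model"))
```

Without the special-case entry, the default rule would return "xclip" unchanged, which matches the module name in the ModuleNotFoundError above.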

src/transformers/__init__.py

src/transformers/models/auto/tokenization_auto.py

src/transformers/models/x_clip/modeling_x_clip.py

src/transformers/models/x_clip/test.py

tests/models/x_clip/test_modeling_x_clip.py

utils/check_config_docstrings.py

utils/check_repo.py

HuggingFaceDocBuilderDev · 2022-09-02T08:48:44Z

The documentation is not available anymore as the PR was closed or merged.

NielsRogge · 2022-09-02T09:10:34Z

@sgugger thanks a lot, that solved the issue. There seems to be another (small) issue with run_tests_hub:

==================================== ERRORS ====================================
_______________ ERROR collecting tests/utils/test_file_utils.py ________________
tests/utils/test_file_utils.py:26: in <module>
    from transformers import *  # noqa F406
src/transformers/utils/import_utils.py:1021: in __getattr__
    value = getattr(module, name)
src/transformers/utils/import_utils.py:1023: in __getattr__
    raise AttributeError(f"module {self.__name__} has no attribute {name}")
E   AttributeError: module transformers.models.clip has no attribute CLIPProcessor

Running RUN_SLOW=yes pytest tests/utils/test_file_utils.py passes locally for me.

sgugger · 2022-09-02T11:53:58Z

That would be because you moved CLIPProcessor to the non-vision-dependent objects in the main init (and rightly so) but did not do the same in models/clip/__init__.py.
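
A toy model of the mismatch described here, under the assumption that models/clip/__init__.py builds a table of exported names and only adds vision-dependent entries when the vision dependencies are installed. The function and its `processor_guarded` flag are hypothetical and exist only to contrast the two layouts; this is not the library's init code.

```python
def clip_exports(vision_available: bool, processor_guarded: bool) -> set:
    """Return the names a toy models/clip/__init__.py would expose."""
    import_structure = {
        "configuration_clip": ["CLIPConfig"],
        "tokenization_clip": ["CLIPTokenizer"],
    }
    vision_only = {"feature_extraction_clip": ["CLIPFeatureExtractor"]}
    processor = {"processing_clip": ["CLIPProcessor"]}

    if processor_guarded:
        # Layout before the fix: the processor sits behind the vision guard.
        vision_only.update(processor)
    else:
        # Layout after the fix: the processor is always exported.
        import_structure.update(processor)

    if vision_available:
        import_structure.update(vision_only)

    return {name for names in import_structure.values() for name in names}


# Environment without vision dependencies, processor still behind the guard:
print("CLIPProcessor" in clip_exports(vision_available=False, processor_guarded=True))
# Same environment once the submodule init matches the main init:
print("CLIPProcessor" in clip_exports(vision_available=False, processor_guarded=False))
```

If the main init advertises CLIPProcessor unconditionally while the submodule only exposes it behind the guard, a `from transformers import *` in an environment without the vision dependencies would ask the submodule for a name it does not have, which is consistent with the AttributeError in the traceback.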

alaradirik

Thank you for adding this! Looks good to me overall, I just left a few comments and questions.

tests/models/x_clip/test_modeling_x_clip.py

src/transformers/models/x_clip/modeling_x_clip.py

alaradirik · 2022-09-05T16:30:13Z

src/transformers/models/x_clip/modeling_x_clip.py

+
+        hidden_states = torch.cat([hidden_states, msg_token], dim=1)
+
+        residual = hidden_states


Just double checking, shouldn't this be residual = hidden_states.clone() instead?

It seems lines 449-462 would alter residual too.

NielsRogge · Sep 7, 2022

Hmm, this seems to work fine; is it possible that residual just refers to the original hidden states?

Just did a quick experiment:

>>> a = "hello"
>>> b = a
>>> a += "niels"
>>> b
'hello'
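
One caveat on the experiment above: Python strings are immutable, so `+=` on a string always rebinds the name and cannot affect `b`. A minimal sketch with plain lists as a stand-in for tensors (an assumption for illustration; torch tensors also treat `+=` as an in-place operation) shows the distinction that matters. `residual = hidden_states` stays safe as long as later lines rebind `hidden_states`, for example `hidden_states = layer(hidden_states)`, rather than mutate it in place. Whether lines 449-462 only rebind is not shown in this thread.

```python
def rebind_example():
    hidden_states = [1.0, 2.0]
    residual = hidden_states  # second name for the same object
    # Rebinding: the right-hand side builds a new list, residual is untouched.
    hidden_states = [x * 2 for x in hidden_states]
    return residual, hidden_states


def inplace_example():
    hidden_states = [1.0, 2.0]
    residual = hidden_states  # second name for the same object
    # In-place: list += mutates the shared object, so residual changes too.
    hidden_states += [3.0]
    return residual, hidden_states


print(rebind_example())
print(inplace_example())
```

Only the in-place case would call for `hidden_states.clone()`.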

src/transformers/models/x_clip/modeling_x_clip.py

NielsRogge · 2022-09-08T11:01:10Z

@sgugger and @alaradirik - the PR is ready for merge. Kindly asking for your approval :)

sgugger

Looking good, thanks again for adding this model!

src/transformers/models/auto/tokenization_auto.py

alaradirik

Looks good to me! Thanks for adding this

* First draft
* Improve conversion script
* Make vision encoder work
* More improvements
* Improve conversion script
* Fix quality
* Add MultiframeIntegrationTransformer
* More improvements
* Make MiT output work
* Fix quality
* Add prompts generator
* Add tests
* Fix some tests
* Fix some more tests
* Fix more tests
* Improve conversion script
* Fix model outputs
* Fix more tests
* Add XClipProcessor
* Use processor in conversion script
* Fix integration test
* Update README, fix docs
* Fix all tests
* Add MIT output to XClipOutput
* Create better variable names
* Rename XClip to XCLIP
* Extend conversion script
* Add support for large models
* Add support for 16 frame models
* Add another model'
* Fix module issue
* Apply suggestions from code review
* Add figure to docs
* Fix CLIPProcessor issue
* Apply suggestions from code review
* Delete file
* Convert more checkpoints
* Convert last checkpoint
* Update nielsr to microsoft

NielsRogge added 30 commits September 1, 2022 09:42

First draft

6204390

Improve conversion script

7e3f4bb

Make vision encoder work

2fa856a

More improvements

51c4c5a

Improve conversion script

8fdc4a1

Fix quality

1bcbedc

Add MultiframeIntegrationTransformer

c6c29d1

More improvements

c324f2d

Make MiT output work

9384c4f

Fix quality

d679bd0

Add prompts generator

533c4e0

Add tests

beaae5a

Fix some tests

0c0fe95

Fix some more tests

f944b49

Fix more tests

a77dfab

Improve conversion script

adad246

Fix model outputs

8c1b600

Fix more tests

6688cc2

Add XClipProcessor

07694d4

Use processor in conversion script

7949831

Fix integration test

4f0aee7

Update README, fix docs

9949171

Fix all tests

5c448e1

Add MIT output to XClipOutput

252ff54

Create better variable names

043704d

Rename XClip to XCLIP

39b2049

Extend conversion script

26f8307

Add support for large models

658027e

Add support for 16 frame models

1c5a560

Add another model'

19cbc88

NielsRogge requested review from sgugger and alaradirik September 1, 2022 12:15

sgugger reviewed Sep 1, 2022

NielsRogge added 2 commits September 1, 2022 16:06

Fix module issue

4b3b1d3

Apply suggestions from code review

c1461cd

Add figure to docs

2ceb582

Fix CLIPProcessor issue

9f4b3dc

NielsRogge mentioned this pull request Sep 2, 2022

Adding X-CLIP to HuggingFace Transformers microsoft/VideoX#61

Closed

alaradirik reviewed Sep 5, 2022

NielsRogge added 5 commits September 7, 2022 13:36

Apply suggestions from code review

a110fe3

Delete file

04d7538

Convert more checkpoints

c5e2d4b

Convert last checkpoint

a04da92

Update nielsr to microsoft

eafedc6

sgugger approved these changes Sep 8, 2022

src/transformers/models/auto/tokenization_auto.py

Add remaining models, apply suggestion

b14228f

NielsRogge merged commit bb6f6d5 into huggingface:main Sep 8, 2022

alaradirik reviewed Sep 8, 2022
