
Add BEiT #12994

Merged
merged 28 commits on Aug 4, 2021

Conversation

@NielsRogge (Contributor) commented Aug 3, 2021

What does this PR do?

It adds BEiT: BERT Pre-Training of Image Transformers to the library. It's the first paper showing that self-supervised pre-trained Vision Transformers (ViTs) can outperform their supervised pre-training counterparts. As a picture says more than a thousand (or 16x16?) words, this is a good summary of the approach:

(Screenshot: overview of the BEiT pre-training approach.)

The authors used the encoder of OpenAI's DALL-E to map images to visual tokens, which the model then needs to predict for the masked patches. Three models are defined: BEiTModel, BEiTForMaskedImageModeling and BEiTForImageClassification.
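For intuition, here is a minimal, library-free sketch of that pre-training objective (the helper names are hypothetical, and uniform random masking stands in for the paper's block-wise masking strategy): a subset of patch positions is masked, and the training targets are the visual tokens the DALL-E encoder assigned to those positions.

```python
import random

def sample_masked_positions(num_patches: int, mask_ratio: float = 0.4, seed: int = 0) -> set:
    """Pick which patch positions to mask (simplified: uniform sampling;
    the paper actually uses block-wise masking)."""
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    return set(rng.sample(range(num_patches), num_masked))

def masked_lm_targets(visual_tokens: list, masked: set) -> dict:
    """Training targets: for each masked patch position, the model must
    predict the visual token assigned to that patch by the image tokenizer."""
    return {pos: visual_tokens[pos] for pos in masked}

# A 224x224 image with 16x16 patches yields 14*14 = 196 patches.
num_patches = 14 * 14
# Dummy token ids; the DALL-E dVAE vocabulary has 8192 visual tokens.
visual_tokens = [i % 8192 for i in range(num_patches)]
masked = sample_masked_positions(num_patches)
targets = masked_lm_targets(visual_tokens, masked)
```

The classification head of BEiTForMaskedImageModeling then scores each masked position against the 8192-token vocabulary, exactly like a masked language model over a text vocabulary.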

This PR also cleans up some scripts in the library, namely those that defined id2label dicts for several datasets. I have removed imagenet_classes.py and coco_classes.py from the utils directory. Instead, the id2label dicts are now defined on the hub in their own repository, from which conversion scripts can fetch them using the huggingface_hub library.
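One detail worth noting: JSON object keys are always strings, so a conversion script fetching such an id2label file has to cast the keys back to integers before storing them in a model config. A minimal sketch of that step (the sample labels are illustrative; downloading the file itself would go through the huggingface_hub library):

```python
import json

def load_id2label(raw_json: str) -> dict:
    """Parse a hub-hosted id2label JSON file and cast its string keys to int."""
    return {int(k): v for k, v in json.loads(raw_json).items()}

# Illustrative excerpt in the style of an ImageNet-1k id2label file.
raw = '{"0": "tench", "1": "goldfish", "2": "great white shark"}'
id2label = load_id2label(raw)
label2id = {v: k for k, v in id2label.items()}  # the reverse mapping for the config
```

Configs typically store both directions, so the reverse label2id dict is built in the same pass.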

To do

  • Add all checkpoints to the hub, under the "Microsoft" namespace. Perhaps discuss the model names, because for example microsoft/beit_base_patch16_224_pt22k_ft22k_to_1k is getting out of hand
  • Would be cool to have a working colab for the BEiTForMaskedImageModeling model. For this, tagging one of the original authors: @donglixp

In a future PR, I also plan to add the semantic segmentation model, which obtains state-of-the-art results on ADE20K.

@sgugger (Collaborator) left a comment


Awesome addition! No big remark on my side, this looks ready to be merged soon (as long as the tests are fixed ;-) ), left a few comments.

Resolved review threads:

  • README.md
  • docs/source/model_doc/beit.rst
  • src/transformers/__init__.py (two threads)
  • src/transformers/image_utils.py
  • src/transformers/models/vit/convert_vit_timm_to_pytorch.py
  • tests/test_modeling_beit.py


@require_torch
class BEiTModelTest(ModelTesterMixin, unittest.TestCase):
A collaborator commented on this snippet:

Quick question. The tests are different for a bunch of vision models now, maybe we should have a special tester class for them and refactor the common tests of vision models there? I'm not familiar enough with how similar those tests are to be sure it's worth it, so tell me if it makes no sense.

(Resolved thread on tests/test_modeling_beit.py)
NielsRogge and others added 3 commits August 4, 2021 09:44
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
@NielsRogge (Contributor, Author) commented Aug 4, 2021

I've uploaded all checkpoints to the hub: https://huggingface.co/models?search=microsoft/beit

I've renamed the checkpoints which are fine-tuned on ImageNet-1k (after being intermediately fine-tuned on ImageNet-22k) to be just microsoft/beit-base-patch16-224, etc.

@donglixp if you're interested, could you write model cards for these models? Model cards are READMEs that describe the models in detail. You can take inspiration from ViT's model card.

Also, I do have a notebook for BEiTForMaskedImageModeling, but it's not working as expected. Could you please take a look? https://colab.research.google.com/drive/1Mjt-3jHw9HYMXECmSdDlbiG59ZAw-Z0T?usp=sharing

@LysandreJik (Member) left a comment

Overall very clean! I think you can safely ignore the error linked to model templates: it comes from make fixup, which is looking for a file that was deleted in this PR.

Left just a nit regarding the naming convention.

(Resolved thread on src/transformers/models/beit/modeling_beit.py)
@NielsRogge NielsRogge merged commit 83e5a10 into huggingface:master Aug 4, 2021
@JStumpp commented Sep 29, 2021

@NielsRogge great work! Any news on the future PR adding the semantic segmentation model pretrained on ADE20K? Thanks!

@NielsRogge (Contributor, Author):
@JStumpp say no more, it's added ;)

4 participants