Add CLIPSeg #20066

Merged
NielsRogge merged 47 commits into huggingface:main on Nov 8, 2022

Conversation

NielsRogge (Contributor) commented Nov 4, 2022

What does this PR do?

This PR adds CLIPSeg, a nice extension of CLIP for zero-shot and one-shot (image-guided) image segmentation.

To do:

  • transfer checkpoints and update code
  • update base_model_prefix

HuggingFaceDocBuilderDev commented Nov 4, 2022

The documentation is not available anymore as the PR was closed or merged.

sgugger (Collaborator) left a comment

Looking good, thanks for adding this model! Make sure you update the paper link in the doc file, and the config should take into account the latest modifications done on the CLIP config.

docs/source/en/model_doc/clipseg.mdx (outdated, resolved)
src/transformers/models/clipseg/configuration_clipseg.py (outdated, resolved)
src/transformers/models/clipseg/configuration_clipseg.py (outdated, resolved)
src/transformers/models/clipseg/modeling_clipseg.py (outdated, resolved)
src/transformers/models/clipseg/test.py (outdated, resolved)
tests/models/clipseg/test_modeling_clipseg.py (outdated, resolved)
Comment on lines 81 to 87
"""This function prepares a list of PIL images, or a list of numpy arrays if one specifies numpify=True,
or a list of PyTorch tensors if one specifies torchify=True.
"""

image_inputs = [np.random.randint(255, size=(3, 30, 400), dtype=np.uint8)]

image_inputs = [Image.fromarray(np.moveaxis(x, 0, -1)) for x in image_inputs]

Contributor

Suggested change
- """This function prepares a list of PIL images, or a list of numpy arrays if one specifies numpify=True,
- or a list of PyTorch tensors if one specifies torchify=True.
- """
- image_inputs = [np.random.randint(255, size=(3, 30, 400), dtype=np.uint8)]
- image_inputs = [Image.fromarray(np.moveaxis(x, 0, -1)) for x in image_inputs]
+ """
+ This function prepares a list of PIL images, or a list of numpy arrays if one specifies numpify=True,
+ or a list of PyTorch tensors if one specifies torchify=True.
+ """
+ image_inputs = [np.random.randint(255, size=(3, 30, 400), dtype=np.uint8)]
+ image_inputs = [Image.fromarray(np.moveaxis(x, 0, -1)) for x in image_inputs]
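
For readers without the full test file at hand, the helper under review can be approximated as the self-contained sketch below. It is not the code from the PR: the function name `prepare_image_inputs` and the parameters `batch_size`, `num_channels`, `height`, `width` and `seed` are assumptions made for illustration, and the `torchify` branch mentioned in the docstring is left out so the example only needs NumPy and Pillow.

```python
import numpy as np
from PIL import Image


def prepare_image_inputs(batch_size=1, num_channels=3, height=30, width=400,
                         numpify=False, seed=None):
    """
    Hypothetical stand-in for the test helper discussed above.

    Returns a list of random PIL images, or a list of channel-first
    numpy arrays if numpify=True. (The torchify option is omitted here.)
    """
    rng = np.random.default_rng(seed)

    # Random channel-first (C, H, W) uint8 arrays with values in [0, 255),
    # matching np.random.randint(255, ...) in the reviewed snippet.
    arrays = [
        rng.integers(0, 255, size=(num_channels, height, width), dtype=np.uint8)
        for _ in range(batch_size)
    ]
    if numpify:
        return arrays

    # PIL expects channel-last (H, W, C) data, so move the channel axis to the end.
    return [Image.fromarray(np.moveaxis(x, 0, -1)) for x in arrays]


images = prepare_image_inputs(batch_size=2, seed=0)
arrays = prepare_image_inputs(batch_size=2, numpify=True, seed=0)
```

Note that PIL reports sizes as (width, height), so each image in `images` has `size == (400, 30)` even though the underlying array is 30 rows by 400 columns.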

alaradirik (Contributor) left a comment

Thanks for the addition! Looks ready to merge once the docstrings are complete and all tests are passing.

NielsRogge merged commit 2589630 into huggingface:main on Nov 8, 2022
polavishnu4444 commented

Has CLIPSeg been released in the latest version yet?

mpierrau pushed a commit to mpierrau/transformers that referenced this pull request Dec 15, 2022
* Add first draft

* Update conversion script

* Improve conversion script

* Improve conversion script some more

* Add conditional embeddings

* Add initial decoder

* Fix activation function of decoder

* Make decoder outputs match original implementation

* Make decoder outputs match original implementation

* Add more copied from statements

* Improve model outputs

* Fix auto tokenizer file

* Fix more tests

* Add test

* Improve README and docs, improve conditional embeddings

* Fix more tests

* Remove print statements

* Remove initial embeddings

* Improve conversion script

* Add interpolation of position embeddings

* Finish addition of interpolation of position embeddings

* Add support for refined checkpoint

* Fix refined checkpoint

* Remove unused parameter

* Improve conversion script

* Add support for training

* Fix conversion script

* Add CLIPSegFeatureExtractor

* Fix processor

* Fix CLIPSegProcessor

* Fix conversion script

* Fix most tests

* Fix equivalence test

* Fix README

* Add model to doc tests

* Use better variable name

* Convert other checkpoint as well

* Update config, add link to paper

* Add docs

* Update organization

* Replace base_model_prefix with clip

* Fix base_model_prefix

* Fix checkpoint of config

* Fix config checkpoint

* Remove file

* Use logits for output

* Fix tests

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

5 participants