Add X-CLIP #18852
Many tests fail due to the following error:
This is probably because I first called the model folder "xclip", which is now called "x_clip". Still, I'm wondering why it keeps looking for the module `models.clip`. If anyone has any pointers, that would be greatly appreciated.
Thanks for adding this new model! Left a couple of comments.
For the import errors, I think you need to add `"xclip"` to the `SPECIAL_MODEL_TYPE_TO_MODULE_NAME` variable in `configuration_auto.py`, since the module name is not the model type `xclip` (with potential `-` replaced by `_`).
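The lookup the reviewer describes can be sketched roughly as follows. This is a simplified illustration, not the library's actual source: the mapping holds only the entry under discussion, and the function body is an assumption based on the comment above (check the special-case mapping first, otherwise replace `-` with `_`).

```python
# Simplified sketch: resolve a model type to the name of its module folder.
# The single entry below is the one discussed in this thread.
SPECIAL_MODEL_TYPE_TO_MODULE_NAME = {
    "xclip": "x_clip",
}


def model_type_to_module_name(model_type: str) -> str:
    """Return the module folder name for a given model type."""
    # Special cases: the folder name cannot be derived from the model type.
    if model_type in SPECIAL_MODEL_TYPE_TO_MODULE_NAME:
        return SPECIAL_MODEL_TYPE_TO_MODULE_NAME[model_type]
    # Default rule: hyphens in the model type become underscores.
    return model_type.replace("-", "_")


print(model_type_to_module_name("xclip"))
print(model_type_to_module_name("some-model"))  # hypothetical model type
```

Without the special-case entry, the default rule would map `xclip` to a module named `xclip`, which no longer exists after the folder was renamed to `x_clip`.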
The documentation is not available anymore as the PR was closed or merged.
@sgugger thanks a lot, that solved the issue. There seems to be another (small) issue with run_tests_hub:
Running |
That would be because you moved |
Thank you for adding this! Looks good to me overall, I just left a few comments and questions.
hidden_states = torch.cat([hidden_states, msg_token], dim=1)

residual = hidden_states
Just double checking, shouldn't this be `residual = hidden_states.clone()` instead? It seems lines 449-462 would alter `residual` too.
Hmm this seems to work fine; is it possible that residual just refers to the original hidden states?
Just did a quick experiment:
>>> a = "hello"
>>> b = a
>>> a += "niels"
>>> b
'hello'
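Note that the string experiment shows rebinding rather than mutation: Python strings are immutable, so `a += ...` creates a new object and leaves `b` untouched. Whether `residual` is affected therefore depends on whether the later lines reassign `hidden_states` or modify it in place. A minimal sketch using plain lists as a stand-in for tensors (an assumption made for illustration; the variable names mirror the snippet under review and the values are made up):

```python
# Case 1: reassignment. A new list is built; residual keeps the original.
hidden_states = [1.0, 2.0]
residual = hidden_states                 # same object, no copy
hidden_states = hidden_states + [3.0]    # rebinds hidden_states to a new list
rebind_result = list(residual)

# Case 2: in-place mutation. Both names still point at the same object.
hidden_states = [1.0, 2.0]
residual = hidden_states
hidden_states += [3.0]                   # list += extends the list in place
mutate_result = list(residual)

# Case 3: explicit copy, analogous to calling .clone() on a tensor.
hidden_states = [1.0, 2.0]
residual = list(hidden_states)           # independent copy
hidden_states += [3.0]
copy_result = list(residual)

print(rebind_result, mutate_result, copy_result)
```

PyTorch tensors behave analogously: an out-of-place assignment such as `hidden_states = hidden_states + 1` leaves `residual` intact, while an in-place operation such as `hidden_states.add_(1)` would also change `residual` unless it was cloned first.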
@sgugger and @alaradirik - the PR is ready for merge. Kindly asking for your approval :)
Looking good, thanks again for adding this model!
Looks good to me! Thanks for adding this
* First draft
* Improve conversion script
* Make vision encoder work
* More improvements
* Improve conversion script
* Fix quality
* Add MultiframeIntegrationTransformer
* More improvements
* Make MiT output work
* Fix quality
* Add prompts generator
* Add tests
* Fix some tests
* Fix some more tests
* Fix more tests
* Improve conversion script
* Fix model outputs
* Fix more tests
* Add XClipProcessor
* Use processor in conversion script
* Fix integration test
* Update README, fix docs
* Fix all tests
* Add MIT output to XClipOutput
* Create better variable names
* Rename XClip to XCLIP
* Extend conversion script
* Add support for large models
* Add support for 16 frame models
* Add another model'
* Fix module issue
* Apply suggestions from code review
* Add figure to docs
* Fix CLIPProcessor issue
* Apply suggestions from code review
* Delete file
* Convert more checkpoints
* Convert last checkpoint
* Update nielsr to microsoft
What does this PR do?
This PR adds X-CLIP, which is a minimal extension of CLIP for video-language pre-training.
To do:
`microsoft` organization