TF/Numpy variants for all DataCollator classes #13105
Great addition! We could also add the NumPy part for the Flax/Jax folks
Left a few more nits to polish the PR.
More updates done - please note that tests will fail until all of the data collators are updated, because I removed the top-level imports. I definitely won't be merging this until that's done, don't worry!
All the classes are in! Thank you to @aromans and @sdwalker62, whose PR #12199 I cannibalized for MLM and its variants. Next step is finishing tests and making sure all of this actually works.
LGTM, thanks a lot for adapting all of those and writing all the tests.
…g import is found
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
…ionLanguageModeling
…hem were making us fail code quality checks
a53734b to 4b9cfb5
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Looks good to me, thank you @Rocketknight1! Thanks for writing such extensive tests.
Hi @aromans and @sdwalker62, we're ready to merge now. I just realized I'll need your GitHub no-reply e-mail addresses to add you as co-authors, though - see the docs here.
Thanks!
It's in, and all authors have been properly credited! If you want to delete the messages with your e-mails (in case of spambot harvesting), feel free.
This is a draft PR again - I've written an example of what a TF variant of one of our data collators would look like. If we're happy with this format, it should be easy to expand it to support NumPy/JAX as well, and to do the same for the other data collators. I'll probably add most of the other data collators to this PR before merging it. Let me know what you think!
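For anyone skimming the thread, here is a rough sketch of the general idea: one padding routine, with the output framework chosen by a switch. This is not the code from this PR. The function name `pad_collate`, its parameters, and the accepted `return_tensors` values are all made up for illustration, and the NumPy/TF branches assume those packages are installed.

```python
from typing import Dict, List, Optional


def pad_collate(
    features: List[Dict[str, List[int]]],
    pad_token_id: int = 0,
    return_tensors: Optional[str] = None,
):
    """Hypothetical helper: pad "input_ids" to the longest example in the batch.

    return_tensors=None returns plain Python lists, "np" returns NumPy arrays,
    "tf" returns TensorFlow tensors. These names are assumptions for this sketch.
    """
    max_len = max(len(f["input_ids"]) for f in features)

    input_ids, attention_mask = [], []
    for f in features:
        ids = list(f["input_ids"])
        n_pad = max_len - len(ids)
        # Right-pad the token ids and mark real tokens with 1, padding with 0.
        input_ids.append(ids + [pad_token_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)

    batch = {"input_ids": input_ids, "attention_mask": attention_mask}

    if return_tensors is None:
        return batch
    if return_tensors == "np":
        import numpy as np  # imported lazily so the list path has no dependencies

        return {k: np.array(v, dtype=np.int64) for k, v in batch.items()}
    if return_tensors == "tf":
        import tensorflow as tf  # imported lazily; only needed for the TF path

        return {k: tf.constant(v, dtype=tf.int64) for k, v in batch.items()}
    raise ValueError(f"Unsupported return_tensors value: {return_tensors!r}")


batch = pad_collate([{"input_ids": [5, 6, 7]}, {"input_ids": [8]}])
```

Keeping the padding logic framework-agnostic and converting only at the end is one possible design; it means a new backend needs only one extra branch rather than a separate collator.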