Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TF/Numpy variants for all DataCollator classes #13105

Merged
merged 36 commits into from Aug 31, 2021
Merged

Conversation

Rocketknight1
Copy link
Member

@Rocketknight1 Rocketknight1 commented Aug 12, 2021

This is a draft PR again - I've written an example of what a TF variant of one of our data collators would look like. If we're happy with this format, it should be easy to expand it to support Numpy/JAX as well, and to do the same for other data collators, and I'll probably add most of the other data collators to this PR before merging it. Let me know what you think!

Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great addition! We could also add the NumPy part for the Flax/Jax folks

src/transformers/data/data_collator.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few more nits to polish the PR.

src/transformers/data/data_collator.py Outdated Show resolved Hide resolved
src/transformers/data/data_collator.py Outdated Show resolved Hide resolved
src/transformers/data/data_collator.py Show resolved Hide resolved
src/transformers/data/data_collator.py Outdated Show resolved Hide resolved
src/transformers/data/data_collator.py Outdated Show resolved Hide resolved
@Rocketknight1
Copy link
Member Author

More updates done - please note that tests will fail until all of the data collators are updated, because I removed the top-level imports. I definitely won't be merging this until that's done, don't worry!

@Rocketknight1 Rocketknight1 changed the title Adding a TF variant of the DataCollatorForTokenClassification TF/Numpy variants for all DataCollator classes Aug 19, 2021
@Rocketknight1
Copy link
Member Author

All the classes are in! Thank you to @aromans and @sdwalker62, whose PR #12199 I cannibalized for MLM and its variants. Next step is finishing tests and making sure all of this actually works.

Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a lot for adapting all of those and writing all the tests.

tests/test_data_collator.py Outdated Show resolved Hide resolved
src/transformers/data/data_collator.py Outdated Show resolved Hide resolved
Rocketknight1 and others added 3 commits August 30, 2021 19:31
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you @Rocketknight1! Thanks for writing such extensive tests.

@Rocketknight1
Copy link
Member Author

Hi @aromans and @sdwalker62, we're ready to merge now. I just realized I'll need your Github no-reply e-mail addresses to add you though - see the docs here.

@sdwalker62
Copy link
Contributor

dalton_walker@icloud.com

@Rocketknight1
Copy link
Member Author

Thanks!

@aromans
Copy link
Contributor

aromans commented Aug 31, 2021

@Rocketknight1 Rocketknight1 merged commit 854260c into master Aug 31, 2021
@Rocketknight1 Rocketknight1 deleted the tf_data_collators branch August 31, 2021 12:06
@Rocketknight1
Copy link
Member Author

It's in, and all authors have been properly credited! If you want to delete the messages with your e-mails (in case of spambot harvesting), feel free.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants