distilbert-flax #13324
Conversation
Great job @kamalkraj! I think the only major thing to update is the docs in the modeling file (at the moment it looks like they are the PyTorch docs, but they should be Flax :-))
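For context, a rough sketch of what a Flax-facing docstring/usage example could look like, as opposed to the PyTorch one (purely illustrative; the checkpoint name and output shapes are assumptions, not the actual docstring in this PR):

```python
# Sketch of a Flax-style usage example for the modeling docstrings
# (illustrative only; the checkpoint name is an assumption).
from transformers import DistilBertTokenizerFast, FlaxDistilBertModel

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
model = FlaxDistilBertModel.from_pretrained("distilbert-base-uncased")

inputs = tokenizer("Hello, my dog is cute", return_tensors="jax")
outputs = model(**inputs)
last_hidden_states = outputs.last_hidden_state  # jnp.ndarray of shape (batch, seq_len, dim)
```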
@patrickvonplaten
Hi @kamalkraj, I'm also really interested in this PR - thanks for adding it 🤗 Do you also plan to add a script for the distillation process (like the "old" script), as I would like to re-distill some of my previous DistilBERT models (I don't have access to multi-GPU setups, only to TPUs at the moment).
Hi @stefan-it, I will go through the scripts and ping you.
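For reference, a minimal sketch of the kind of objective such a distillation script would implement in JAX (soft-target loss against the temperature-softened teacher distribution plus the usual hard-label cross-entropy); the function names, temperature, and weighting below are assumptions, not the actual script, and the original DistilBERT recipe additionally uses a cosine loss on hidden states, which is omitted here for brevity:

```python
# Sketch of a distillation loss in JAX (assumed formulation; names,
# temperature, and the alpha weighting are illustrative only).
import jax
import jax.numpy as jnp


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft-target term: cross-entropy against the temperature-softened teacher
    # distribution (equivalent to the KL term up to a constant in the student).
    teacher_probs = jax.nn.softmax(teacher_logits / temperature, axis=-1)
    student_log_probs = jax.nn.log_softmax(student_logits / temperature, axis=-1)
    soft_loss = -(teacher_probs * student_log_probs).sum(axis=-1).mean() * temperature ** 2

    # Hard-label term: standard cross-entropy against the gold labels.
    log_probs = jax.nn.log_softmax(student_logits, axis=-1)
    hard_loss = -jnp.take_along_axis(log_probs, labels[..., None], axis=-1).squeeze(-1).mean()

    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```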
passes and the code looks good :-) Ready to merge IMO 🎉! @patil-suraj the slow test doesn't pass on TPU since DistilBERT has pretty extreme activations in the forward pass, like a couple of other models. We need to think a bit about how to adapt the slow test depending on whether it's run on TPU or not in general...
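As a rough sketch of one way the slow equivalence check could pick its tolerance based on the backend (purely illustrative; the helper name and threshold values are assumptions, not what the test suite actually does):

```python
# Illustrative sketch: loosen the PT/Flax output-equivalence tolerance when the
# slow test runs on TPU, where large activation magnitudes combined with
# lower-precision matmuls accumulate more numerical error.
import jax
import numpy as np


def assert_outputs_close(flax_output, pt_output):
    # jax.devices()[0].platform is "tpu", "gpu", or "cpu".
    on_tpu = jax.devices()[0].platform == "tpu"
    atol = 1e-2 if on_tpu else 1e-5  # assumed thresholds
    np.testing.assert_allclose(np.asarray(flax_output), np.asarray(pt_output), atol=atol)
```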
Great work @kamalkraj!
What does this PR do?
Adds the Flax implementation of DistilBERT.
Fixes # (issue)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@VictorSanh @patrickvonplaten