Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add BloomForSequenceClassification and BloomForTokenClassification classes #17639

Merged

Conversation

haileyschoelkopf
Copy link
Contributor

@haileyschoelkopf haileyschoelkopf commented Jun 9, 2022

What does this PR do?

This PR adds 2 new classes for the BLOOM model with sequence classification and token classification heads. Mentioned briefly by me in PR #17474 .

We are planning to use the smaller BLOOM models for these tasks downstream in the Bigscience Multilingual Modeling WG, so we need these classes implemented to do so.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). -> NO
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests? -> TODO

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Will tag patrickvonplaten and sgugger when ready--still need to write tests first!

Let me know if anything is wrong with this PR!

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Jun 9, 2022

The documentation is not available anymore as the PR was closed or merged.

@haileyschoelkopf
Copy link
Contributor Author

haileyschoelkopf commented Jun 10, 2022

@patrickvonplaten @sgugger

This PR should be ready for review.

There are tests failing, but I am not sure why--I think I've looked at all the failing ones, and their failure seems to be unrelated to anything I changed (ie. image segmentation tests and tokenization tests fail, and text generation pipeline test_small_model_pt seems to fail because "sshleifer/tiny-ctrl" cannot be downloaded, none of which relate to files I touched in this PR afaik.) Hopefully I'm not mistaken on this end!

@haileyschoelkopf haileyschoelkopf marked this pull request as ready for review June 10, 2022 21:58
Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once all tests pass!
We had an outage on the Hub last Friday and this weekend, relaunched all tests which should remove all flaky failures :-)

@younesbelkada
Copy link
Contributor

younesbelkada commented Jun 14, 2022

Hi @haileyschoelkopf , I have managed to fix some tests that fails on this PR, can I directly push to this PR in case the tests still fail on your side?

@haileyschoelkopf
Copy link
Contributor Author

Ah thank you @younesbelkada , you got to fixing the tests before I had time to do it!

I think that you can push to this PR already (you're marked as a maintainer right?) so feel free to do so :)

- more tests should pass
- one test left
@younesbelkada
Copy link
Contributor

Perfect thanks! Just pushed my commit

@haileyschoelkopf
Copy link
Contributor Author

haileyschoelkopf commented Jun 14, 2022

After a minor change to return type of BloomForTokenClassification , all tests pass!

This should be ready to merge now :)

Copy link
Contributor

@younesbelkada younesbelkada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a very small comment! Otherwise looks very good to me! Thank you very much for helping us implementing these classes

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
@haileyschoelkopf
Copy link
Contributor Author

Comments should be resolved now!

@younesbelkada
Copy link
Contributor

Waiting for the last lights to be green 🟢 and we'll merge !

@younesbelkada younesbelkada changed the title [WIP] Add BloomForSequenceClassification and BloomForTokenClassification classes Add BloomForSequenceClassification and BloomForTokenClassification classes Jun 14, 2022
@younesbelkada younesbelkada merged commit edb672a into huggingface:main Jun 14, 2022
@haileyschoelkopf haileyschoelkopf deleted the bloom_classification_heads branch June 14, 2022 15:18
@stefan-it
Copy link
Collaborator

stefan-it commented Jun 17, 2022

Hi @haileyschoelkopf ,

thanks for adding this! Do you have any recommendations for a good hyper-param configuration (batch size, learning rate) when doing NER? I tried CoNLL-2003 with the bigscience/bloom-350m checkpoint, but results are around 67% on test set, which is very bad. (I tried epoch = 10, learning rate = 5e-06 and batch size = 4, that was working well with XLM-R Large and it took 114 minutes on a V100 using fp16).

@younesbelkada
Copy link
Contributor

I don't know if this is related but, bloom-350m's pre-training has not finished yet! You may want to try it with bloom-1b3 which is a model where the pre-training has been completed! https://huggingface.co/bigscience/bloom-1b3

@haileyschoelkopf
Copy link
Contributor Author

Hi @stefan-it, I agree with what @younesbelkada said, bloom-1b3 is worth trying since it's finished pretraining! We haven't actually gotten to NER experiments but I'll let you know if we do end up finding good hyperparams.

@stefan-it
Copy link
Collaborator

Hi @younesbelkada and @haileyschoelkopf thanks for that hint! I used the bigscience/bloom-1b3 model with DeepSpeed and the result after one epoch of fine-tuning (same hyper-params as mentioned above) are also very bad. So I'm going to tune the hyper-params and please let me know if you found a working setup for your NER task(s) 🤗

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants