[WIP] Add Jukebox model #16875

Closed
ArthurZucker wants to merge 399 commits
Collaborator

@ArthurZucker ArthurZucker commented Apr 21, 2022

This is a draft pull request.

What does this PR do?

This PR will progressively add the Jukebox model to the hub.
It is linked to #16870.

Currently planned steps (WIP)

  • Create template files with transformers-cli add-new-model-like
  • src/transformers/tokenization_jukebox.py
  • src/transformers/test_tokenization_jukebox.py
  • src/transformers/configuration_jukebox.py
  • src/transformers/modeling_jukebox.py
  • docs/source/model_doc/jukebox.rst
  • src/transformers/tokenization_jukebox_fast.py (will most likely use a WordLevel tokenizer). This also requires implementing a converter class, class JukeboxConverter(Converter):
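
For readers unfamiliar with word-level tokenization, the idea behind the planned fast tokenizer can be sketched in plain Python. This is only an illustration of the technique (one vocabulary lookup per whole token, with a fallback id for unknown tokens). The class name WordLevelSketch, the vocabulary, and the "<unk>" token are hypothetical and are not taken from this PR or from the tokenizers library.

```python
# Illustrative sketch only: a pure-Python stand-in for word-level tokenization.
# WordLevelSketch, the vocabulary and "<unk>" are hypothetical names; they are
# not the Jukebox tokenizer and not the `tokenizers` library API.
from typing import Dict, List


class WordLevelSketch:
    """Maps each whole token to an id through a fixed vocabulary lookup."""

    def __init__(self, vocab: Dict[str, int], unk_token: str = "<unk>") -> None:
        if unk_token not in vocab:
            raise ValueError(f"unk_token {unk_token!r} must be in the vocabulary")
        self.vocab = vocab
        self.unk_id = vocab[unk_token]
        # Reverse mapping used for decoding ids back to tokens.
        self.id_to_token = {idx: tok for tok, idx in vocab.items()}

    def encode(self, text: str) -> List[int]:
        # Split on whitespace, then look up each token; unknown tokens get unk_id.
        return [self.vocab.get(tok, self.unk_id) for tok in text.split()]

    def decode(self, ids: List[int]) -> str:
        # Join the tokens for the given ids with single spaces.
        return " ".join(self.id_to_token[i] for i in ids)


vocab = {"<unk>": 0, "rock": 1, "jazz": 2, "piano": 3}
tokenizer = WordLevelSketch(vocab)
ids = tokenizer.encode("jazz piano theremin")
```

Here "theremin" is not in the vocabulary, so it is encoded with the unknown-token id. A real converter would build the equivalent vocabulary and pre-tokenization rules from the slow tokenizer's files.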

@ArthurZucker ArthurZucker linked an issue Apr 21, 2022 that may be closed by this pull request
2 tasks
@ArthurZucker ArthurZucker self-assigned this Apr 26, 2022
@ArthurZucker ArthurZucker added the WIP label Apr 26, 2022
@ArthurZucker ArthurZucker added this to In progress in New model additions via automation Apr 26, 2022
@ArthurZucker
Collaborator Author

The tokenizer and its corresponding test should be done. They still lack a detailed description, and probably something about the arguments in the init that are not data; I don't remember whether I should create setters for them. (@patrickvonplaten, I would love to have your review.)

@patrickvonplaten
Contributor

patrickvonplaten commented May 2, 2022

Cool, nice to see so much progress here!

Feel free to also add a file that shows how you compare OpenAI's original implementation to the current (HF) implementation.
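
A comparison file of this kind could start from something like the sketch below. It assumes both implementations can be run on the same input and that their outputs can be flattened to lists of floats. The function names, the sample values, and the tolerance are hypothetical and are not part of this PR.

```python
# Illustrative sketch only: compare the outputs of two implementations.
# max_abs_diff, outputs_match, the sample values and the tolerance are
# hypothetical; real outputs would come from the original and ported models.
from typing import Sequence


def max_abs_diff(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Return the largest element-wise absolute difference."""
    if len(reference) != len(candidate):
        raise ValueError("outputs differ in length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def outputs_match(
    reference: Sequence[float], candidate: Sequence[float], atol: float = 1e-5
) -> bool:
    """True when every element agrees within the absolute tolerance."""
    return max_abs_diff(reference, candidate) <= atol


# Stand-in values; in practice these would be the two models' outputs.
original_out = [0.12, -1.5, 3.25]
ported_out = [0.12, -1.5000001, 3.25]
match = outputs_match(original_out, ported_out)
```

Reporting the maximum difference, rather than only a pass/fail flag, makes it easier to tell a numerical-precision gap from a real porting bug.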

ydshieh and others added 21 commits May 6, 2022 07:45
* fix report cat path

* fix report cat path

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add onnx configuration for bigbird-pegasus

* Modify docs
* split single_gpu and multi_gpu

* update needs in send_result

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
… in case of overflowing tokens (huggingface#17092)

* add get_overflowing_images function to ensure 1-to-1 mapping between samples and images in LayoutLMv2Processor

* make style

* add test for overflowing_tokens, change assert to ValueError, avoiding unrelated formatting changes

* change line length by passing --preview into black
…ggingface#17123)

* Add type hints for remaining BigBirdPegasus models

Here I added type hints to the BigBirdPegasusForCausalLM class.

* Add missing type hints for Data2VecText models

Added type hints to the Data2VecTextForCausalLM, Data2VecTextForMaskedLM,
Data2VecTextForMultipleChoice, Data2VecTextForQuestionAnswering,
Data2VecTextForSequenceClassification, and
Data2VecTextForTokenClassification classes.
* update docs of length_penalty

* Revert "update docs of length_penalty"

This reverts commit 466bf48.

* add mobilebert onnx config

* address suggestions

* Update auto.mdx

* Update __init__.py

* Update features.py
* PyTorch FSDP integration in Trainer

* reformatting

make style and make quality are now compliant.

* Updating dependency check

* Trigger CI

Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
…ith try-except (huggingface#16578)

* rebase and isort

* modify cookiecutter init

* fix cookiecutter auto imports

* fix clean_frameworks_in_init

* fix add_model_to_main_init

* blackify

* replace unnecessary f-strings

* update yolos imports

* fix roberta import bug

* fix yolos missing dependency

* fix add_model_like and cookiecutter bug

* fix repository consistency error

* modify cookiecutter, fix add_new_model_like

* remove stale line

Co-authored-by: Dom Miketa <dmiketa@exscientia.co.uk>
…uggingface#17068)

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

- Adds auto_batch_size finder 
- Moves training loop to an inner training loop
…huggingface#17130)

* ensure mlflow.end_run() is executed at end of training when mlflow.start_run() was executed by the callback

* add debug msg

* add support for MLFLOW_TAGS, MLFLOW_RUN_ID, and MLFLOW_NESTED_RUN

* update to support python 3.6+

* Validate env variables using ENV_VARS_TRUE_VALUES

* Empty-Commit
* LogSumExp trick `question_answering` pipeline.

* Adding a failing test.
@patrickvonplaten
Contributor

What happened to the git commit history here?

@ArthurZucker
Collaborator Author

I rebased instead of merging 🤕. I will create a new PR to replace this one.

@ArthurZucker
Collaborator Author

See the follow-up in #17826.

@ArthurZucker ArthurZucker removed this from In progress in New model additions Nov 8, 2022
ArthurZucker added a commit that referenced this pull request Nov 10, 2022
amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request Nov 14, 2022
mpierrau pushed a commit to mpierrau/transformers that referenced this pull request Dec 15, 2022
Development

Successfully merging this pull request may close these issues.

OpenAI's Jukebox for music generation