[WIP] Add Jukebox model #16875
Conversation
The tokenizer and its corresponding test should be done. Still lacking some detailed description, and probably something about the arguments in the init that are not data, but I don't remember if I should create setters (@patrickvonplaten would love to have your review).
Cool, nice to see so much progress here! Feel free to also add a file that shows how you compare OpenAI's original implementation to the current (HF) one.
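Such a comparison file typically runs both implementations on the same input and checks numerical agreement. A minimal stand-in sketch of that check, using numpy and hypothetical toy arrays in place of real model outputs (the actual script would feed identical inputs through both models and compare their tensors):

```python
import numpy as np

def compare_outputs(original, ported, atol=1e-5):
    """Return the max absolute difference between two outputs and
    whether they agree within the given tolerance."""
    original = np.asarray(original)
    ported = np.asarray(ported)
    max_diff = float(np.abs(original - ported).max())
    return max_diff, bool(np.allclose(original, ported, atol=atol))

# Hypothetical stand-ins for the two implementations' outputs.
openai_out = np.array([0.1, 0.2, 0.3])
hf_out = openai_out + 1e-7  # tiny numerical drift

diff, ok = compare_outputs(openai_out, hf_out)
```

Reporting `max_diff` alongside the boolean makes it easier to judge whether a mismatch is a real porting bug or just floating-point noise.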
- fix report cat path (Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>)
- Add onnx configuration for bigbird-pegasus; modify docs
- split single_gpu and multi_gpu; update needs in send_result (Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>)
- … in case of overflowing tokens (huggingface#17092): add get_overflowing_images function to ensure a 1-to-1 mapping between samples and images in LayoutLMv2Processor; make style; add test for overflowing_tokens, change assert to ValueError, avoiding unrelated formatting changes; change line length by passing --preview into black
- …ggingface#17123: add type hints for the remaining BigBirdPegasus models (BigBirdPegasusForCausalLM) and the missing Data2VecText models (Data2VecTextForCausalLM, Data2VecTextForMaskedLM, Data2VecTextForMultipleChoice, Data2VecTextForQuestionAnswering, Data2VecTextForSequenceClassification, Data2VecTextForTokenClassification)
- update docs of length_penalty (reverted in 466bf48); add mobilebert onnx config; address suggestions; update auto.mdx, __init__.py, and features.py
- PyTorch FSDP integration in Trainer; reformatting so that make style and make quality pass; update dependency check (Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>)
- …ith try-except (huggingface#16578): rebase and isort; modify cookiecutter init; fix cookiecutter auto imports, clean_frameworks_in_init, and add_model_to_main_init; blackify; replace unnecessary f-strings; update yolos imports; fix roberta import bug; fix yolos missing dependency; fix add_model_like and cookiecutter bug; fix repository consistency error (Co-authored-by: Dom Miketa <dmiketa@exscientia.co.uk>)
- …uggingface#17068: adds auto_batch_size finder; moves training loop to an inner training loop (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
- …huggingface#17130: ensure mlflow.end_run() is executed at the end of training when mlflow.start_run() was executed by the callback; add support for MLFLOW_TAGS, MLFLOW_RUN_ID, and MLFLOW_NESTED_RUN; update to support Python 3.6+; validate env variables using ENV_VARS_TRUE_VALUES
- LogSumExp trick in the question_answering pipeline; add a failing test
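The last commit refers to the log-sum-exp trick for numerical stability. A minimal sketch of the general idea (a generic stable softmax, not the pipeline's actual code): subtracting the per-row maximum before exponentiating keeps the exponentials from overflowing to infinity on large logits.

```python
import numpy as np

def stable_softmax(scores):
    # Subtract the max before exponentiating (the log-sum-exp trick)
    # so that np.exp never sees a large positive argument.
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Naively, np.exp(1000.0) overflows to inf and the softmax becomes nan;
# the shifted version stays finite.
logits = np.array([1000.0, 1001.0, 1002.0])
probs = stable_softmax(logits)
```

The result is mathematically identical to the naive softmax because the subtracted constant cancels in the numerator and denominator.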
What happened to the git commit history here?
I rebased instead of merging 🤕 Will create a new PR to replace this one.
See follow-up in #17826.
This is a draft pull request.
What does this PR do?
This PR will progressively add the Jukebox model to the hub.
It is linked to #16870.
Currently planned steps (WIP):
- `transformers-cli add-new-model-like`
- `src/transformers/tokenization_jukebox.py`
- `tests/test_tokenization_jukebox.py`
- `src/transformers/configuration_jukebox.py`
- `src/transformers/modeling_jukebox.py`
- `docs/source/model_doc/jukebox.rst`
- `src/transformers/tokenization_jukebox_fast.py` (will most probably use a WordLevel tokenizer; also requires implementing a converter, `class JukeboxConverter(Converter):`)
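As a rough illustration of what a WordLevel scheme does, here is a minimal pure-Python sketch with a hypothetical vocabulary (the real converter would subclass `Converter` and return a `tokenizers.Tokenizer` built around a WordLevel model over the actual Jukebox lyrics vocabulary):

```python
# Hypothetical vocabulary; the real one comes from the Jukebox checkpoint.
vocab = {"[UNK]": 0, "hello": 1, "world": 2}

def word_level_encode(text, vocab, unk_token="[UNK]"):
    # WordLevel tokenization: split on whitespace and map each whole word
    # to its vocabulary id, falling back to the unknown token.
    return [vocab.get(word, vocab[unk_token]) for word in text.split()]

ids = word_level_encode("hello world foo", vocab)
# -> [1, 2, 0]  ("foo" is out of vocabulary)
```

Unlike subword schemes (BPE, WordPiece), a WordLevel model never splits a token, which is why the fast-tokenizer conversion for it is comparatively simple.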