
Generate: deprecate generation relying on default max_length #18018

Merged
merged 8 commits on Jul 23, 2022

Conversation


@gante gante commented Jul 4, 2022

What does this PR do?

(EDITED)

This PR applies the outcome of two discussions:

  1. Favor the use of max_new_tokens, as opposed to max_length. This introduces max_new_tokens to TF and Flax (see the usage sketch below the list).
  2. Update the docstring about scores, to make clear that its length depends on max_new_tokens and other factors.
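As a quick, hedged usage sketch (not code from this PR; gpt2 is just an example checkpoint), the difference between the two arguments looks like this:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The quick brown fox", return_tensors="pt")

# Preferred: generate at most 10 *new* tokens, regardless of the prompt length.
out_new = model.generate(**inputs, max_new_tokens=10)

# Equivalent with max_length: the prompt length has to be added in by hand.
out_len = model.generate(**inputs, max_length=inputs["input_ids"].shape[-1] + 10)
```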

Review suggestion: review PT first. TF and Flax are mostly copy/paste.

Closes #17868


HuggingFaceDocBuilderDev commented Jul 4, 2022

The documentation is not available anymore as the PR was closed or merged.

@gante gante changed the title Generate: Deprecate max_length in favor of max_new_tokens Generate: deprecate max_length in favor of max_new_tokens Jul 4, 2022

@sgugger sgugger left a comment


Thanks for adding this. Made comments on the Flax file which apply to the others. I'll let @patrickvonplaten comment on whether we should remove this max_length argument in v5 or not, but max_new_tokens makes way more sense to me!

@gante gante force-pushed the generate_max_new_tokens branch 2 times, most recently from 3d8d1f1 to 5e52144 on July 6, 2022 14:28
@patrickvonplaten

Thanks a lot for opening this very-much needed PR @gante!

Before moving forward, I'd like to discuss a bit whether we should remove max_length in a v5 version or not, because I'm a bit torn, but I don't really think we can or should ever remove max_length.

First, I want to lay out my general opinion about generate and its future, and second, I want to give my opinion on max_new_tokens vs. max_length.

First: My general opinion regarding generate and how it should develop further:

  1. generate is very important, as it's used by all GPT-like models and all encoder-decoder models. There are three methods that are used in 99% of the cases (greedy_search, sample, and beam_search); the other generation methods are somewhat special.
  2. It was a mistake to put the generation parameters in the PretrainedConfig and have the function arguments default to them. This makes it extremely difficult to deal with breaking changes now and is also too much "black magic" for the user.
  3. We have to be extremely careful with adding new features to generate, since the function quickly explodes (see this PR).
  4. generate != generation pipeline. The generate function should limit the amount of "black magic", a.k.a. falling back to certain defaults, as much as possible. E.g., in PyTorch's generate we create the attention_mask automatically from the padding id. I've not copied these things to Flax generate, as I think they should not have been done: they make the method hard to understand and now limit us severely in adding good error messages and keeping the function understandable.

Now from this perspective, if somehow possible, I'd like to long-term remove the "fall-back" mechanism to the config generation arguments and delete the generation parameters from the config fully (I'd be happy with an optional generation config instead though). This would help us enormously to:

  • a) Make future changes more easily, since generate won't rely on thousands of saved configs;
  • b) Very much help the user to better understand what's going on / remove the black magic;
  • c) Keep generate slim by removing many edge cases: what happens if max_length is defined before it falls back to config.max_length, and what happens after? How do we deal with conflicting function arguments and config generation params, etc.?

Second:

Now from that perspective, I think we should not remove max_length, since it has some clear advantages over max_new_tokens, but rather move towards removing the default of max_length=20 and changing all docs to use max_new_tokens, while still allowing the user to use both.

More specifically:

IMO max_new_tokens is definitely better than max_length because of the following:

  • max_length has a different meaning for "decoder-only" models such as GPT and for "encoder-decoder" models such as T5. For "decoder-only" models, max_length is understood as the length of the prompt plus the newly generated tokens (input_ids.shape[-1] + max_new_tokens), whereas for "encoder-decoder" models max_new_tokens is essentially max_length - 1 (-1 because the dummy decoder start token is already counted), regardless of what the user has passed as input_ids (since those are processed only once by the encoder). => Switching to max_new_tokens reduces this knowledge burden, as "decoder-only" and "encoder-decoder" models then behave the same (see the sketch after this list).
  • max_new_tokens is more "natural", as users don't consider the input to the model as part of the generation, and it avoids weird errors like max_length < len(input_ids).
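A small illustration of the semantic difference described in the first bullet, assuming gpt2 and t5-small purely as example checkpoints (a sketch, not part of the PR):

```python
from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM, AutoTokenizer

# Decoder-only: max_length counts the prompt tokens *plus* the generated ones.
gpt_tok = AutoTokenizer.from_pretrained("gpt2")
gpt = AutoModelForCausalLM.from_pretrained("gpt2")
gpt_inputs = gpt_tok("Hello world", return_tensors="pt")
gpt_out = gpt.generate(**gpt_inputs, max_length=10)  # at most 10 tokens, prompt included

# Encoder-decoder: max_length only counts decoder tokens (including the decoder
# start token), so the length of the encoder input does not matter.
t5_tok = AutoTokenizer.from_pretrained("t5-small")
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
t5_inputs = t5_tok("translate English to German: Hello world", return_tensors="pt")
t5_out = t5.generate(**t5_inputs, max_length=10)  # at most 10 decoder tokens
```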

However

  • max_length has some very useful use cases. E.g., if one always wants to generate up to the maximum allowed length of the model, it's very easy to do this with max_length but harder with max_new_tokens, as max_new_tokens needs to be derived dynamically for every input (see the sketch after this list);
  • max_length is simply too widely used to remove, IMO.
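A minimal sketch of the "generate up to the model's maximum length" case, assuming gpt2 as an example and using model.config.max_position_embeddings as a stand-in for the model's maximum supported length:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("A prompt of arbitrary length", return_tensors="pt")

max_model_length = model.config.max_position_embeddings  # 1024 for gpt2

# With max_length: a single constant works for every prompt.
out = model.generate(**inputs, max_length=max_model_length)

# With max_new_tokens: the budget has to be re-derived for each prompt.
prompt_length = inputs["input_ids"].shape[-1]
out = model.generate(**inputs, max_new_tokens=max_model_length - prompt_length)
```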

So moving forward, I would propose to start with a warning if the user passes neither max_length nor max_new_tokens, and state that this won't be allowed anymore in v5. Currently the model would just fall back to a somewhat arbitrary default of 20, which is not great IMO. At the same time, we can strongly advise the user to use max_new_tokens instead of max_length.

Would love to hear your (general) opinions here regarding generate @gante @sgugger @LysandreJik @patil-suraj @Narsil @sanchit-gandhi @thomwolf


sgugger commented Jul 7, 2022

I completely agree with the point of moving away from putting things in the config that are not related to instantiating the model (like I've been doing for training parameters such as gradient accumulation). In general we might need some sort of "pipeline config" for the default use of the model in a pipeline/widget, which could also be re-used for the parameters used by generate.

Fine by me to have the two arguments forever and ever while encouraging max_new_tokens; your arguments make sense.


gante commented Jul 7, 2022

I also agree with the reasoning and the suggested path forward 👍

@gante gante changed the title Generate: deprecate max_length in favor of max_new_tokens Generate: deprecate generation relying on default max_length Jul 11, 2022

gante commented Jul 11, 2022

@patrickvonplaten @sgugger I've implemented the changes from the comments above 👍 Regarding max_length/max_new_tokens, the behavior is now the following (a minimal sketch of the resolution logic follows the list):

  • Neither is set ➡️ raise a deprecation warning (will be an exception in v5), nudge towards max_new_tokens, and use the default for max_length;
  • Both are set ➡️ raise an exception;
  • max_length is set ➡️ nothing happens;
  • max_new_tokens is set ➡️ update max_length according to the prompt length.
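For reference, a minimal sketch of that resolution logic, not the exact transformers implementation; the names resolve_max_length and DEFAULT_MAX_LENGTH are illustrative only:

```python
import warnings

DEFAULT_MAX_LENGTH = 20  # assumed historical default, for illustration only


def resolve_max_length(prompt_length, max_length=None, max_new_tokens=None):
    """Toy resolution of the two arguments, mirroring the behavior listed above."""
    if max_length is None and max_new_tokens is None:
        warnings.warn(
            "Neither `max_length` nor `max_new_tokens` was set. Relying on the default "
            "`max_length` is deprecated and will become an error in v5; please use "
            "`max_new_tokens` to control the generation length.",
            UserWarning,
        )
        return DEFAULT_MAX_LENGTH
    if max_length is not None and max_new_tokens is not None:
        raise ValueError("Pass either `max_length` or `max_new_tokens`, not both.")
    if max_new_tokens is not None:
        # `max_new_tokens` only counts newly generated tokens, so add the prompt length.
        return max_new_tokens + prompt_length
    return max_length  # `max_length` was set explicitly: use it as-is
```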


@sgugger sgugger left a comment


Made some comments that should be applied on all three frameworks.

LGTM. Thanks for iterating on this!


Narsil commented Jul 15, 2022

Your comment @patrickvonplaten is exactly what I think!

Also, not sure about v5, but using neither max_length nor max_new_tokens also makes sense to me as a user. Using neither would mean just keep on generating until I ask you to stop, potentially eternally (if we had such a model, no max length, and infinite compute power). Overriding the StoppingCriteria currently allows that.

Another way would be to express it as a real Python generator, for instance:

for new_token in model.generate(...):
    # do something with new_token

Again, this is pretty theoretical and doesn't map really well to TF or Jax, but I think it's been proposed in other issues and is something to keep in mind (probably at least to address this in the docs and explain why you probably don't want to generate eternally :))


gante commented Jul 18, 2022

@Narsil that seems like an interesting use case 🤔 What would be the default arguments, if setting neither was a valid option? Perhaps a Stopping condition that would target this use case?

From what I've seen, the default arguments are one of the biggest pain points for new users (and even some experienced users), so whatever decisions we make in this PR should also cover that!


@patrickvonplaten patrickvonplaten left a comment


Very nice work @gante! Thanks a lot for transforming the discussion into an actionable PR! Everything looks good to me. Some nitpicks regarding the wording, but apart from this it all looks good!

Successfully merging this pull request may close these issues.

Calling generate on a T5ForConditionalGeneration returns n tokens but n-1 scores