
Generate: deprecate generation relying on default max_length #18018

Merged
merged 8 commits on Jul 23, 2022

Conversation


@gante gante commented Jul 4, 2022

What does this PR do?

(EDITED)

This PR applies the outcome of two discussions:

  1. Favor the use of max_new_tokens, as opposed to max_length. This introduces max_new_tokens to TF and Flax (see the usage sketch below the list).
  2. Update the docstring about scores, to make clear that its length depends on max_new_tokens and other factors.
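As a quick, hedged usage sketch (not code from this PR; gpt2 is just an example checkpoint), the difference between the two arguments looks like this:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The quick brown fox", return_tensors="pt")

# Preferred: generate at most 10 *new* tokens, regardless of the prompt length.
out_new = model.generate(**inputs, max_new_tokens=10)

# Equivalent with max_length: the prompt length has to be added in by hand.
out_len = model.generate(**inputs, max_length=inputs["input_ids"].shape[-1] + 10)
```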

Review suggestion: review PT first. TF and Flax are mostly copy/paste.

Closes #17868


HuggingFaceDocBuilderDev commented Jul 4, 2022

The documentation is not available anymore as the PR was closed or merged.

@gante gante changed the title Generate: Deprecate max_length in favor of max_new_tokens Generate: deprecate max_length in favor of max_new_tokens Jul 4, 2022

@sgugger sgugger left a comment


Thanks for adding this. Made comments on the Flax file which apply to the others. I'll let @patrickvonplaten comment on whether we should remove this max_length argument in v5 or not, but max_new_tokens makes way more sense to me!

@gante gante force-pushed the generate_max_new_tokens branch 2 times, most recently from 3d8d1f1 to 5e52144 on July 6, 2022 14:28
@patrickvonplaten

Thanks a lot for opening this very-much needed PR @gante!

Before moving forward, I'd like to discuss a bit whether we should remove max_length in a v5 version or not, because I'm a bit torn, but I don't really think we can or should ever remove max_length.

First, I want to lay out my general opinion about generate and its future, and second, I want to give my opinion on max_new_tokens vs. max_length.

First: My general opinion regarding generate and how it should develop further:

  1. generate is very important, as it's used by all GPT-like models and all encoder-decoder models. There are three methods that are used in 99% of the cases (greedy_search, sample, and beam_search); the other generation methods are somewhat special.
  2. It was a mistake to put the generation parameters in the PretrainedConfig and have the function arguments default to them. This makes it extremely difficult to deal with breaking changes now and is also too much "black magic" for the user.
  3. We have to be extremely careful with adding new features to generate, since the function quickly explodes (see this PR).
  4. generate != generation pipeline. The generate function should limit the amount of "black magic", a.k.a. falling back to certain defaults, as much as possible. E.g., in PyTorch's generate we create the attention_mask automatically from the padding id. I've not copied these things to Flax generate, as I think they should not have been done: they make the method hard to understand and now limit us severely in adding good error messages and keeping the function understandable.

Now from this perspective, if somehow possible, I'd like to long-term remove the "fall-back" mechanism to the config generation arguments and delete the generation parameters from the config fully (I'd be happy with an optional generation config instead though). This would help us enormously to:

  • a) Make future changes more easily, since generate won't rely on thousands of saved configs;
  • b) Very much help the user to better understand what's going on / remove the black magic;
  • c) Keep generate slim by removing many edge cases: what happens if max_length is defined before it falls back to config.max_length, and what happens after? How do we deal with conflicting function arguments and config generation params, etc.?

Second:

Now from that perspective, I think we should not remove max_length, since it has some clear advantages over max_new_tokens, but rather move towards removing the default of max_length=20 and changing all docs to use max_new_tokens, while still allowing the user to use both.

More specifically:

IMO max_new_tokens is definitely better than max_length because of the following:

  • max_length has a different meaning for "decoder-only" models such as GPT and for "encoder-decoder" models such as T5. For "decoder-only" models, max_length is understood as the length of the prompt plus the newly generated tokens (input_ids.shape[-1] + max_new_tokens), whereas for "encoder-decoder" models max_new_tokens is essentially max_length - 1 (-1 because the dummy decoder start token is already counted), regardless of what the user has passed as input_ids (since those are processed only once by the encoder). => Switching to max_new_tokens reduces this knowledge burden, as "decoder-only" and "encoder-decoder" models then behave the same (see the sketch after this list).
  • max_new_tokens is more "natural", as users don't consider the input to the model as part of the generation, and it avoids weird errors like max_length < len(input_ids).
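A small illustration of the semantic difference described in the first bullet, assuming gpt2 and t5-small purely as example checkpoints (a sketch, not part of the PR):

```python
from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM, AutoTokenizer

# Decoder-only: max_length counts the prompt tokens *plus* the generated ones.
gpt_tok = AutoTokenizer.from_pretrained("gpt2")
gpt = AutoModelForCausalLM.from_pretrained("gpt2")
gpt_inputs = gpt_tok("Hello world", return_tensors="pt")
gpt_out = gpt.generate(**gpt_inputs, max_length=10)  # at most 10 tokens, prompt included

# Encoder-decoder: max_length only counts decoder tokens (including the decoder
# start token), so the length of the encoder input does not matter.
t5_tok = AutoTokenizer.from_pretrained("t5-small")
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
t5_inputs = t5_tok("translate English to German: Hello world", return_tensors="pt")
t5_out = t5.generate(**t5_inputs, max_length=10)  # at most 10 decoder tokens
```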

However

  • max_length has some very useful use cases. E.g., if one always wants to generate up to the maximum allowed length of the model, it's very easy to do this with max_length but harder with max_new_tokens, as max_new_tokens needs to be derived dynamically for every input (see the sketch after this list);
  • max_length is simply too widely used to remove, IMO.
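A minimal sketch of the "generate up to the model's maximum length" case, assuming gpt2 as an example and using model.config.max_position_embeddings as a stand-in for the model's maximum supported length:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("A prompt of arbitrary length", return_tensors="pt")

max_model_length = model.config.max_position_embeddings  # 1024 for gpt2

# With max_length: a single constant works for every prompt.
out = model.generate(**inputs, max_length=max_model_length)

# With max_new_tokens: the budget has to be re-derived for each prompt.
prompt_length = inputs["input_ids"].shape[-1]
out = model.generate(**inputs, max_new_tokens=max_model_length - prompt_length)
```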

So moving forward, I would propose to start with a warning if the user passes neither max_length nor max_new_tokens, and state that this won't be allowed anymore in v5. Currently the model would just fall back to a somewhat arbitrary default of 20, which is not great IMO. At the same time, we can strongly advise the user to use max_new_tokens instead of max_length.

Would love to hear your (general) opinions here regarding generate @gante @sgugger @LysandreJik @patil-suraj @Narsil @sanchit-gandhi @thomwolf


sgugger commented Jul 7, 2022

I completely agree with the point of moving away from putting things in the config that are not related to instantiating the model (like I've been doing for training parameters such as gradient accumulation). In general we might need some sort of "pipeline config" for the default use of the model in a pipeline/widget, which could also be re-used for the parameters used by generate.

Fine by me to have the two arguments forever and ever while encouraging max_new_tokens; your arguments make sense.


gante commented Jul 7, 2022

I also agree with the reasoning and the suggested path forward 👍

@gante gante changed the title Generate: deprecate max_length in favor of max_new_tokens Generate: deprecate generation relying on default max_length Jul 11, 2022

gante commented Jul 11, 2022

@patrickvonplaten @sgugger I've implemented the changes from the comments above 👍 Regarding max_length/max_new_tokens, the behavior is now the following (a minimal sketch of the resolution logic follows the list):

  • Neither is set ➡️ raise a deprecation warning (will be an exception in v5), nudge towards max_new_tokens, and use the default for max_length;
  • Both are set ➡️ raise an exception;
  • max_length is set ➡️ nothing happens;
  • max_new_tokens is set ➡️ update max_length according to the prompt length.
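For reference, a minimal sketch of that resolution logic, not the exact transformers implementation; the names resolve_max_length and DEFAULT_MAX_LENGTH are illustrative only:

```python
import warnings

DEFAULT_MAX_LENGTH = 20  # assumed historical default, for illustration only


def resolve_max_length(prompt_length, max_length=None, max_new_tokens=None):
    """Toy resolution of the two arguments, mirroring the behavior listed above."""
    if max_length is None and max_new_tokens is None:
        warnings.warn(
            "Neither `max_length` nor `max_new_tokens` was set. Relying on the default "
            "`max_length` is deprecated and will become an error in v5; please use "
            "`max_new_tokens` to control the generation length.",
            UserWarning,
        )
        return DEFAULT_MAX_LENGTH
    if max_length is not None and max_new_tokens is not None:
        raise ValueError("Pass either `max_length` or `max_new_tokens`, not both.")
    if max_new_tokens is not None:
        # `max_new_tokens` only counts newly generated tokens, so add the prompt length.
        return max_new_tokens + prompt_length
    return max_length  # `max_length` was set explicitly: use it as-is
```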


@sgugger sgugger left a comment


Made some comments that should be applied on all three frameworks.

LGTM. Thanks for iterating on this!


Narsil commented Jul 15, 2022

Your comment @patrickvonplaten is exactly what I think!

Also, not sure about v5, but using neither max_length nor max_new_tokens also makes sense to me as a user. Using neither would mean just keep on generating until I ask you to stop, potentially eternally (if we had such a model, no max length, and infinite compute power). Overriding the StoppingCriteria currently allows that.

Another way would be to express it as a real Python generator, for instance:

for new_token in model.generate(...):
    # do something with new_token

Again, this is pretty theoretical and doesn't map really well to TF or Jax, but I think it's been proposed in other issues and is something to keep in mind (probably at least to address this in the docs and explain why you probably don't want to generate eternally :))


gante commented Jul 18, 2022

@Narsil that seems like an interesting use case 🤔 What would be the default arguments, if setting neither was a valid option? Perhaps a Stopping condition that would target this use case?

From what I've seen, the default arguments are one of the biggest pain points for new users (and even some experienced users), so whatever decisions we make in this PR should also cover that!


@patrickvonplaten patrickvonplaten left a comment


Very nice work @gante! Thanks a lot for transforming the discussion into an actionable PR! Everything looks good to me. Some nitpicks regarding the wording, but apart from this it all looks good!

Successfully merging this pull request may close these issues.

Calling generate on a T5ForConditionalGeneration returns n tokens but n-1 scores