We would expect the only difference between enabling and disabling the kv-cache during generation to be decoding speed; however, in experiments with commenting out `with device: model.setup_caches()` in our generate.py recipe, the output is garbage.
Needs more investigation.
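For reference, here is a minimal sketch of the equivalence we expect, written in plain Python rather than against torchtune. It assumes a single attention head and toy random inputs; `full_causal`, `cached_decode`, and the other helper names are made up for illustration and are not part of the recipe or the library.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    # Single-head scaled dot-product attention for one query vector.
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def full_causal(qs, ks, vs):
    # No cache: recompute every position; position i only sees keys 0..i.
    return [attend(qs[i], ks[: i + 1], vs[: i + 1]) for i in range(len(qs))]


def cached_decode(qs, ks, vs):
    # KV-cache: append one key/value per step and attend over the cache.
    k_cache, v_cache, outputs = [], [], []
    for q, k, v in zip(qs, ks, vs):
        k_cache.append(k)
        v_cache.append(v)
        outputs.append(attend(q, k_cache, v_cache))
    return outputs


def max_abs_diff(a, b):
    # Largest element-wise difference between two lists of vectors.
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


rng = random.Random(0)
seq_len, dim = 6, 4
qs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
ks = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
vs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]

no_cache = full_causal(qs, ks, vs)
with_cache = cached_decode(qs, ks, vs)
print("max difference:", max_abs_diff(no_cache, with_cache))
```

In this toy setup the two paths agree, so a mismatch in the real recipe would point at something other than the cache itself, such as masking or position handling.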
@joecummings I'm guessing this is because the causal mask is created in setup_caches() here, so without calling this function we're attending to all tokens, resulting in garbage outputs. Maybe we should move this mask initialization into __init__?
Never mind, this line takes care of the causal mask if it's missing.
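To make the fallback concrete, here is a rough sketch of the idea in plain Python. This is not the torchtune code; `causal_mask`, `resolve_mask`, and `visible_positions` are hypothetical names, and the mask is a nested list of booleans where `True` means "may attend".

```python
def causal_mask(seq_len):
    # Lower-triangular mask: row i may attend to columns 0..i only.
    return [[col <= row for col in range(seq_len)] for row in range(seq_len)]


def resolve_mask(mask, seq_len):
    # Hypothetical fallback: keep a provided mask, else build a causal one.
    return mask if mask is not None else causal_mask(seq_len)


def visible_positions(mask, row):
    # Column indices that the given row is allowed to attend to.
    return [col for col, allowed in enumerate(mask[row]) if allowed]


mask = resolve_mask(None, 4)
for row in range(4):
    print(row, visible_positions(mask, row))
```

With a fallback like this, skipping the cache setup should still leave each position attending only to itself and earlier tokens, so a missing mask alone would not explain the garbage output.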