Generate with KV-cache enabled vs. not enabled gives different results #959

Open
joecummings opened this issue May 10, 2024 · 2 comments · May be fixed by #973

@joecummings
Contributor

We would expect the only difference between enabling and not enabling a kv-cache for a model during generation to be decoding speed; however, in experiments where I commented out `with device: model.setup_caches()` in our generate.py recipe, the output was garbage.

Needs more investigation.

@joecummings joecummings self-assigned this May 10, 2024
@rohan-varma
Member

You might need to change the incremental_decode in the generation function?
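
To illustrate the suggestion above: if a generation loop feeds only the newest token at each step, it relies on the model remembering earlier positions through its cache. The sketch below uses a hypothetical `ToyModel` and `generate` (neither is the recipe's actual code) in which a running sum stands in for the kv-cache. Whether this is the cause of the behavior reported here is not confirmed in the thread.

```python
from typing import List, Optional


class ToyModel:
    """Hypothetical stand-in for a decoder: next token = sum(context) % vocab."""

    def __init__(self, vocab_size: int = 50) -> None:
        self.vocab_size = vocab_size
        self.cache: Optional[int] = None  # running sum plays the role of a kv-cache

    def setup_caches(self) -> None:
        self.cache = 0

    def forward(self, tokens: List[int]) -> int:
        if self.cache is not None:
            # Cached path: context from earlier calls is remembered.
            self.cache += sum(tokens)
            total = self.cache
        else:
            # Uncached path: the model only sees what is passed in right now.
            total = sum(tokens)
        return total % self.vocab_size


def generate(
    model: ToyModel, prompt: List[int], max_new_tokens: int, incremental: bool
) -> List[int]:
    tokens = list(prompt)
    tokens.append(model.forward(tokens))  # prefill on the whole prompt
    for _ in range(max_new_tokens - 1):
        # Incremental decoding passes only the newest token;
        # otherwise the full sequence is re-fed every step.
        step_input = tokens[-1:] if incremental else tokens
        tokens.append(model.forward(step_input))
    return tokens


cached_model = ToyModel()
cached_model.setup_caches()
with_cache = generate(cached_model, [3, 5, 7], 4, incremental=True)
no_cache_full = generate(ToyModel(), [3, 5, 7], 4, incremental=False)
no_cache_incremental = generate(ToyModel(), [3, 5, 7], 4, incremental=True)

print(with_cache == no_cache_full)         # cache + incremental matches full re-feed
print(with_cache == no_cache_incremental)  # incremental without a cache loses context
```

In this toy setup, disabling the cache only preserves the output if the loop also switches to re-feeding the full sequence at every step.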

@calvinpelletier

calvinpelletier commented May 15, 2024

@joecummings I'm guessing this is because the causal mask is created in setup_caches() here, so without calling this function we're attending to all tokens, resulting in garbage outputs. Maybe we should move this mask initialization into __init__?

Never mind, this line takes care of the causal mask if it's missing.
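
For context on why the mask was a reasonable first suspect, even though the guess was withdrawn: token-by-token decoding with a cache is implicitly causal, since each query only sees keys cached so far, whereas a full-sequence forward pass needs an explicit causal mask to match it. The following is a minimal pure-Python sketch of single-head attention with q = k = v = x; all function names are hypothetical and it is not the library's implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-query scaled dot-product attention over the given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def full_attention(xs: List[Vector], causal: bool) -> List[Vector]:
    """Process the whole sequence at once, with or without a causal mask."""
    out = []
    for i, q in enumerate(xs):
        # With the mask, position i sees positions 0..i; without it, everything.
        visible = xs[: i + 1] if causal else xs
        out.append(attend(q, visible, visible))
    return out


def cached_attention(xs: List[Vector]) -> List[Vector]:
    """Process one position at a time, appending to a key/value cache."""
    k_cache: List[Vector] = []
    v_cache: List[Vector] = []
    out = []
    for x in xs:
        k_cache.append(x)
        v_cache.append(x)
        out.append(attend(x, k_cache, v_cache))
    return out


xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
causal = full_attention(xs, causal=True)
cached = cached_attention(xs)
unmasked = full_attention(xs, causal=False)

print(causal == cached)            # masked full pass agrees with cached decoding
print(causal[0] == unmasked[0])    # without the mask, early positions see the future
```

In this sketch the last position is identical in all three variants, because it legitimately attends to every token; only the earlier positions change when the mask is dropped.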
