Why memory increases during training #2602
Hi, I'm not sure why the memory consumption increases, but you can lower the batch size (e.g., to 64), which should reduce memory consumption as well.
Hello! I'm also not entirely sure, but I have noticed that memory usage can increase over time. The reason is that during tokenization, each batch is padded to its longest sample, up to the maximum sequence length. So every time you encounter a batch containing a text longer than any text in any previous batch, memory usage goes up, because more values for that batch have to be placed on the GPU. If a particularly long text appears near the end of the training loop, this produces a late memory spike.

In short: once a text in one of your batches reaches the maximum sequence length, that batch is as large as it can possibly be, and that is the maximum memory the training should take.
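The padding behaviour described above can be sketched with a small simulation (the token counts below are hypothetical; real memory also depends on hidden size, dtype, and activations):

```python
# Minimal sketch of per-batch dynamic padding: each batch is padded to its
# own longest text, so peak memory tracks the longest text seen so far.
# Token counts per text are hypothetical.
batch_lengths = [[12, 30, 25], [40, 18, 22], [15, 95, 33]]

peak_elements = 0
for batch in batch_lengths:
    padded_len = max(batch)             # every text padded to the batch max
    elements = len(batch) * padded_len  # size of the padded input-id tensor
    peak_elements = max(peak_elements, elements)

# The third batch contains a 95-token text, so the peak jumps there:
# peak_elements == 3 * 95 == 285
```

Capping `model.max_seq_length` on a `SentenceTransformer` bounds the padded length, which is why lowering the maximum sequence length (or the batch size) caps the worst-case memory.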
I would like to use a larger batch size; it seems the models perform better in that case. Is it possible to train the model on two GPUs?
That makes sense. Thanks.
Only via #2449 at this point. That PR will be merged and released as Sentence Transformers v3 soon.
Hello,
I have an (anchor, positive) unlabeled dataset with around 250,000 examples.
Here is the code I use to fine-tune the sentence-transformers/multi-qa-mpnet-base-cos-v1 model on a subset of the MS MARCO dataset:
When training starts, 22 of my 24 GB of VRAM are consumed. Memory consumption increases during the iterations, and at the very end of the first epoch I get an Out of Memory error.
I then tried using a DataLoader with an IterableDataset, but the result is the same. Why does memory increase towards the end of the epoch, and how can I fine-tune this model?
Regards, Milos