some detail of gector-large #186

Open
liuxin99 opened this issue Mar 21, 2023 · 2 comments

@liuxin99

@MaksTarnavskyi
I am interested in your paper “Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction” and I want to reproduce your results. However, I have some questions about the experimental details of your paper.

In your paper and the GitHub repository, you did not specify the GPU configuration and the hyperparameters for each stage of training. Could you please share this information with me?

Also, I encountered a strange problem when I was training the model. In the first stage of training, the GPU memory usage was very small at first, but then it gradually increased. Even with a V100 32G GPU, I got out-of-memory errors. Do you know what might cause this problem and how to solve it?

@gotutiyan

I'm not the author of the paper, but I can provide some information about the last question.

In general, GECToR trains only the classifier layers at first, so the memory usage is small. However, after a few epochs the BERT-based encoder is also trained, so the memory usage becomes much larger.

I think the solution to the above is to use a smaller batch size. To find a batch size that does not cause out-of-memory errors, you can try several values for the batch size while setting the [--cold_steps_count] option to zero. This option controls how many epochs train only the classifier layers, so setting it to zero means the BERT-based encoder is trained from the first epoch.
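For example, you could probe candidate batch sizes roughly like below. This is only a sketch: I'm assuming this repository's train.py keeps the same command-line interface as the original grammarly/gector train.py ([--cold_steps_count], [--batch_size], [--n_epoch]), and the data paths and model directories are placeholders.

```python
# Rough sketch: try batch sizes with the encoder unfrozen from the very first epoch.
# Flag names follow the original grammarly/gector train.py and may differ slightly here.
import subprocess

for batch_size in (64, 48, 32, 24, 16):
    cmd = [
        "python", "train.py",
        "--train_set", "train.txt",             # placeholder path
        "--dev_set", "dev.txt",                 # placeholder path
        "--model_dir", f"probe_bs{batch_size}", # placeholder output dir
        "--cold_steps_count", "0",              # train the BERT-based encoder from epoch 1
        "--batch_size", str(batch_size),
        "--n_epoch", "1",                       # one epoch is enough to see the peak memory
    ]
    print(f"Trying batch size {batch_size} ...")
    if subprocess.run(cmd).returncode == 0:
        print(f"Batch size {batch_size} fits into GPU memory.")
        break
```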

@liuxin99
Author

@gotutiyan I'm aware of the issue you mentioned, but what's puzzling to me is this: I set the batch size to 64 and the accumulation size to 4, and after two epochs of training, once the encoder's parameters are unfrozen, my GPU memory usage is around 9G. However, later in training the GPU memory usage gradually accumulates until it hits OOM (my GPU is a V100 32G). I'm not sure if there's a GPU memory leak.
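One way I could check whether it is really a leak is to log the allocated / reserved / peak GPU memory at the end of every epoch: if the allocated number keeps climbing across epochs, some tensors are being kept alive (for example, a running loss summed without .item() keeps the whole computation graph); if only the peak grows, it is probably just longer batches. Below is a minimal sketch using standard PyTorch calls; the log_gpu_memory helper and the demo allocation are only for illustration.

```python
import torch

def log_gpu_memory(tag: str) -> None:
    """Print current, reserved, and peak GPU memory in GiB, then reset the peak counter."""
    allocated = torch.cuda.memory_allocated() / 2**30   # tensors currently alive
    reserved = torch.cuda.memory_reserved() / 2**30      # memory held by the caching allocator
    peak = torch.cuda.max_memory_allocated() / 2**30     # peak since the last reset
    print(f"{tag}: allocated={allocated:.2f}G reserved={reserved:.2f}G peak={peak:.2f}G")
    torch.cuda.reset_peak_memory_stats()

if __name__ == "__main__":
    # Tiny demo; during training I would instead call log_gpu_memory(f"epoch {n}")
    # at the end of every epoch and compare the numbers over time.
    x = torch.randn(1024, 1024, device="cuda")
    log_gpu_memory("after allocating a 1024x1024 tensor")
```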
