Write downloaded model files atomically #3247
base: develop
Avoiding scenarios where one process may read a file that another process is still writing, by moving the file to its final location only after it has been fully written.
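For readers unfamiliar with the pattern, here is a minimal sketch of the write-then-move approach described above. It is not the diff from this PR: `write_atomically` and its parameters are hypothetical names chosen for illustration, and it assumes the temporary file can be created in the same directory as the destination so that the final rename stays on one filesystem.

```python
import os
import tempfile


def write_atomically(final_path, chunks):
    """Write an iterable of byte chunks to final_path via a temporary file.

    Hypothetical helper for illustration only; not part of NLTK.
    """
    directory = os.path.dirname(os.path.abspath(final_path))
    # Create the temporary file next to the destination so the final
    # rename does not cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            for chunk in chunks:
                tmp_file.write(chunk)
            # Push the data to disk before the file becomes visible.
            tmp_file.flush()
            os.fsync(tmp_file.fileno())
        # Readers see either the old file (or no file) or the complete
        # new one, never a partially written one.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the partial temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this shape, a second process that checks for the final path while the first is still downloading finds nothing there yet, instead of a truncated file. Two processes may still both download the same package, but each one moves a complete file into place.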
@naktinis, does this PR solve a currently known problem? In particular, it would be great to know whether it solves any of the many open issues concerning NLTK download errors.
@ekaf this is an attempt to solve an issue we are having despite installing directly from The error we get almost every time is Our scenario is that we launch multiple processes simultaneously in an environment where there are no models downloaded yet. So it appears the following sequence is quite likely in such circumstances (and my guess is that it's causing our error):
I couldn't see any directly related issues currently reported, but if you have any that seem relevant, I'm happy to take a look. I can also report this issue separately to have it associated with this PR if that's helpful.
@naktinis, your 5-point sequence above appears very plausible. However, if your assumption is correct, we should expect this sequence to have also caused some of the many issues related to downloading and zip files that have been reported over the past years.
Thanks @naktinis for the issue and the test script. Your scenario seems plausible, because the NLTK downloader dates back to a time when running multiple parallel processes was less common than it is now. But given the large number of users who have had problems downloading NLTK data packages, it seems surprising that none of them reported similar errors. This would mean that you are downloading in a different way from everybody else, and I wonder what that could be? Hopefully, some users with download problems will try your solution and see if it works for them.