Memory leak #180
Perhaps it would be more helpful to write a test case that does not use imagehash, to show that the issue is indeed in pywt.
Does the memory leak occur for all kinds of modes/methods, or only one in particular?
Candidates could be these two functions, which allocate a Wavelet object; one should double-check that these are being de-allocated: wavedec2, idwtn.
Reproduction code snippet without imagehash (files is a list of 200K+ input files):
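(The snippet itself did not survive in this copy of the thread. A minimal sketch of the kind of loop being described, assuming `files` holds the 200K+ image paths and a haar decomposition similar to what whash performs:)

```python
# Hypothetical reconstruction, not the original snippet: decompose each
# image with pywt directly (no imagehash) and watch memory usage grow.
import numpy as np
from PIL import Image
import pywt

for path in files:  # `files`: list of 200K+ small image paths (assumed)
    image = Image.open(path).convert('L').resize((64, 64))
    pixels = np.asarray(image, dtype=np.float64) / 255.0
    coeffs = pywt.wavedec2(pixels, 'haar', level=3)
```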
I saw the leaking issue with the 'haar' and 'db4' wavelets.
Can reproduce, will investigate.
Some trivial poking about shows about 0.4 kB leaked per iteration. It is not affected by the size of the wavelet or data, and affects at least both the 'sym' and 'db' families. Haven't checked any others.
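(For context, a sketch of how such a per-iteration figure could be measured; this is not the commenter's actual script, and ru_maxrss is reported in kilobytes on Linux:)

```python
# Sketch: estimate leaked memory per wavedec2/waverec2 round trip by
# dividing the growth in peak RSS by the number of iterations.
import resource
import numpy as np
import pywt

def rss_kb():
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # kB on Linux

data = np.random.rand(64, 64)
iterations = 100000
start = rss_kb()
for _ in range(iterations):
    coeffs = pywt.wavedec2(data, 'db4', level=2)
    pywt.waverec2(coeffs, 'db4')
print((rss_kb() - start) / iterations, "kB leaked per iteration (approx.)")
```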
Note the following C does not leak:
Gotcha. It's these two lines in
Fix ready, just looking for similar cases in the rest of the code. Seems like an explicit
Is that a known issue in Cython? Could be worth raising it there?
@stuaxo yes, I will raise it upstream; I have a minimal example here. http://trac.cython.org is dead at the moment though. I have posted to cython-users, I think it is pending approval. However, the workaround isn't too bad, so I'm happy using that for now.
Cheers :) Only mentioned it as it gave me a horrible flashback to a similar bug I had hit before, but never tracked down, in a different library.
@dmpetrov could you please double-check that this is fixed now?
I've checked the master branch version. Now it works much better: for one specific scenario it uses 400 MB instead of 1.2 GB of RAM. However, the memory usage is still increasing, although the growth rate is 3-10 times less than it used to be. I'm not sure if it is another issue in the library or just regular Python behavior. Anyway, now we can process millions of images (one by one) with less than 1 GB of memory (less than 700 MB to be precise). So, I think we can close the issue.
That still shouldn't be happening. I'll re-open this so I remember to chase it down. |
@dmpetrov, could you try to do a gc.collect() in the loop?
I'll try tonight.
Tried gc.collect() in the loop - the same result.
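(The experiment being reported amounts to something like this sketch, forcing a full collection on every pass through the decomposition loop; per the comment, memory still grew:)

```python
# Sketch: call gc.collect() each iteration to rule out ordinary
# Python reference cycles as the cause of the growth.
import gc
import numpy as np
import pywt

data = np.random.rand(64, 64)
for _ in range(100000):
    pywt.wavedec2(data, 'haar', level=3)
    gc.collect()
```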
I am unable to reproduce the issue with the following script; with 10,000,000 loop iterations, memory use stays constant. Perhaps the bug is in PIL.
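(The script itself is missing from this copy of the thread. A plausible sketch of a pywt-only loop of that kind, with no image loading, might look like this:)

```python
# Sketch, not the original script: repeatedly decompose and reconstruct
# fixed random data; memory use should stay flat if pywt is leak-free.
import numpy as np
import pywt

data = np.random.rand(64, 64)
for _ in range(10000000):
    coeffs = pywt.wavedec2(data, 'haar', level=3)
    pywt.waverec2(coeffs, 'haar')
```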
@kwohlfahrt you are right. I replaced the image-reading part with an image copy and now the memory usage is constant. Do you know whether https://github.com/python-pillow/Pillow is the correct PIL repository for opening a bug?
That looks like the right place (if you are using Pillow instead of the old PIL). I haven't used it much, to be honest. Closing this now, thanks for reporting and following up :)
Thank you for the fix!
It looks like pywt has a memory leak. When I process more than 200K small images (<10 KB each), the process takes all 16 GB of my memory and stops (or slows down).
whash() function from imagehash lib:
JohannesBuchner/imagehash@da9386d
Reproduction code snippet:
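(The snippet did not survive in this copy of the issue. A minimal sketch of what it presumably looked like, assuming `files` is the list of image paths and that whash calls pywt's wavelet decomposition, as described above:)

```python
# Hypothetical reconstruction of the report's loop: hash many small
# images with imagehash.whash, which uses pywt internally.
from PIL import Image
import imagehash

for path in files:  # `files`: 200K+ image paths (assumed, per the report)
    image = Image.open(path)
    print(path, imagehash.whash(image))
```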
Btw, the imagehash.phash() function doesn't use pywt and doesn't have this issue.
Image subset from Avito competition:
https://www.kaggle.com/c/avito-duplicate-ads-detection