memory leaks in image pipeline #2447
Comments
There is an issue in PIL about memory leaks in Python 3, but I'm seeing this in Python 2.7: python-pillow/Pillow#2019
Additionally, each image needs 2-3x its size in memory because of the format conversion and thumbnailing. Also, if I'm not wrong, the images pipeline bypasses the concurrent requests limit, which leads to a lot of in-flight image requests. I haven't seen memory issues with the images downloader when setting |
See also: #482. The pipeline doesn't bypass website concurrency limits, but its requests are sent directly to the Downloader without going through the Scheduler, which does mean they are all held in memory.
Hi, I have a similar problem.
Past that, I used I've done this testing on a completely fresh project - using |
Okay. The problem for me was related to request caching in the media pipeline ( |
It seems that this PR will solve the problem once it is merged: #2823
Looks like it. Thanks.
I had a significant memory leak too when downloading images with IMAGES_MIN_HEIGHT and IMAGES_MIN_WIDTH set. Every image response that does not meet the min_height and min_width conditions raises ImageException, and memory fills up with these image responses. I deduced |
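The sketch below shows one way rejected responses can stay in memory. It is an assumption about the mechanism, not something confirmed in this thread: if a caught exception is stored somewhere long-lived, its traceback keeps the frames it passed through alive, and with them any local variables such as the response body. `FakeResponse`, `TooSmall`, `failures`, `check_size` and `handle` are hypothetical names used only for illustration; none of them are Scrapy APIs.

```python
import weakref


class FakeResponse:
    """Hypothetical stand-in for a downloaded image response."""

    def __init__(self, size):
        self.body = bytes(size)


class TooSmall(Exception):
    """Hypothetical stand-in for the error raised for undersized images."""


failures = {}  # hypothetical long-lived cache of per-URL failures


def check_size(response):
    # Always reject, to imitate an image below the minimum dimensions.
    raise TooSmall("image below minimum size")


def handle(url, drop_traceback):
    response = FakeResponse(1_000_000)
    probe = weakref.ref(response)  # lets us observe whether the response is freed
    try:
        check_size(response)
    except TooSmall as exc:
        if drop_traceback:
            # Detach the traceback so the stored exception no longer
            # references the frames (and their locals) it passed through.
            exc.__traceback__ = None
        failures[url] = exc
    return probe


leaky_ref = handle("http://example.com/a.jpg", drop_traceback=False)
clean_ref = handle("http://example.com/b.jpg", drop_traceback=True)
print("kept traceback, response still alive:", leaky_ref() is not None)
print("dropped traceback, response still alive:", clean_ref() is not None)
```

On CPython, the first response should remain reachable through the stored exception's traceback, while the second should be released as soon as `handle` returns. Whether this is what happens inside the images pipeline would need to be checked against the pipeline's own code.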
Closing as a duplicate of #939.
It seems to me that the image pipeline is leaking memory in a very significant way. I have a spider that downloads lists of images. There were always memory problems when downloading images, but now that my list of images has grown larger I thought I should open an issue here.
Basically, after some images have been opened, memory usage goes up and stays up (it does not return to its previous value). It might be an issue with PIL, or it might be something we're doing in the pipeline that causes this. In any case this looks worrying, and I think we should consider what steps to take to limit the problem.
The following code reproduces the problem (I know it's long, but this is really the shortest I could get). It relies on the presence of an images.txt file that contains a list of URLs to images.
Sample output on attached image file: images.txt
Notice how memory goes up and stays up, from 44516 to 52700. Notice the delay between the final request and SIGINT (10 seconds). After this delay, memory usage still stays at 52700.
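For anyone trying to check whether memory "goes up and stays up" in their own run, here is a minimal sketch using the standard library's `tracemalloc`. It is not the reproduction script from this issue; `traced_kib` and `measure` are hypothetical helpers, and the byte strings are only stand-ins for downloaded image bodies. Note that `tracemalloc` only sees allocations made through Python's memory allocators, so buffers allocated directly by a C library would not show up here.

```python
import tracemalloc


def traced_kib():
    """Return the memory currently traced by tracemalloc, in KiB."""
    current, _peak = tracemalloc.get_traced_memory()
    return current // 1024


def measure(count=20, size=256 * 1024):
    """Allocate stand-in image bodies, then release them, sampling each stage."""
    tracemalloc.start()
    baseline = traced_kib()
    bodies = [bytes(size) for _ in range(count)]  # stand-ins for image bodies
    held = traced_kib()
    del bodies  # drop the only reference; the memory should be returned
    released = traced_kib()
    tracemalloc.stop()
    return baseline, held, released


baseline, held, released = measure()
print("baseline KiB:", baseline)
print("while holding bodies KiB:", held)
print("after release KiB:", released)
```

If the "after release" figure stays close to the "while holding" figure in a real spider, something is still referencing the image data. If it drops while the process's resident size does not, the memory is more likely being retained outside Python's allocators or by the system allocator.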