Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory leak #132

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Fix memory leak #132

wants to merge 1 commit into from

Conversation

almost
Copy link

@almost almost commented Nov 26, 2021

Creating a new session each time S3Storage is instantiated creates a memory leak. It seems that S3Storage can get created a bunch of times (I'm seeing it get created again and again as my app runs) and a boto3's Session takes loads of memory (see boto/boto3#1670) so my app eventually runs out of memory. This should fix the issue while still avoiding using the same session across different threads.

Creating a new session each time S3Storage is instantiated creates a memory leak. It seems that S3Storage can get created a bunch of times (I'm seeing it get created again and again as my app runs) and a boto3's Session takes loads of memory (see boto/boto3#1670) so my app eventually runs out of memory. This should fix the issue while still avoiding using the same session across different threads.
@etianen
Copy link
Owner

etianen commented Nov 26, 2021

Do you know what's triggering the recreation of the storage object? AFAIK it's created once globally, and only changes if the storage backend setting changes

@etianen
Copy link
Owner

etianen commented Nov 26, 2021

This comment appears to suggest that the session can be shared between multiple threads:

boto/boto3#1670 (comment)

@almost
Copy link
Author

almost commented Nov 26, 2021

This commit (dcf7952) seems to explictely add a per-thread session instead of using the implicit (and presumably shared) session. I don't know if it's needed but I assumed that commit was probably made for a reason. But if not then that's good, it can definitely be changed to either use a module-level variable for the session or just not specify a session at all (as it did before that commit) which is a bit simpler!

@etianen
Copy link
Owner

etianen commented Nov 26, 2021

Ugh, this is a mess. The boto documentation and issue tracker make it very unclear what should be considered thread-safe, and what should be shareable. Maybe it's changed since I put in that fix? Maybe not. I've been distinctly unimpressed with boto3 as a library, ever since porting django-s3-storages over to it.

The core issue seems to be two things:

  1. The boto3 session consumes a lot of resources that seem to inexplicably not clean up correctly when the reference count falls to zero. Possibly this is a cyclic garbage collection problem, possibly something more serious.
  2. Your code seems to be dropping and recreating the storage backend.

We can't do anything about (1). Unfortunately, your solution here isn't useful in an environment that spawns lots of single-use threads, as each thread will create it's own session that'll hang around somehow even when the thread is GC'd and the associated thread locals destroyed.

(2) is worth investigating further. My understanding is that the storage class should be created once at the start of the program, and used forever since. The strategy used by django-s3-storages is (I believe) the same as the one used by django-storages, and I'm not aware of any complaints about memory leaks in either library. This (possibly) implies there is something atypical about your code to be creating and throwing away storage backends so frequently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants