Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reintroduction of AWS_PRELOAD_METADATA functionality #1373

Open
jasongi opened this issue Apr 6, 2024 · 2 comments
Open

Reintroduction of AWS_PRELOAD_METADATA functionality #1373

jasongi opened this issue Apr 6, 2024 · 2 comments

Comments

@jasongi
Copy link

jasongi commented Apr 6, 2024

A few years ago the AWS_PRELOAD_METADATA setting was removed in this PR:
#636

I understand why - this is not something you’d want to turn on when servicing web requests and was probably the source of many complaints. But it was crucial to the performance optimisations that collectfast made. I’ve recently forked collectfast into Collectfasta as collectfast had been archived- but I’ve had to reimplement a lot of the S3Storage methods to add back the preloading functionality in order to maintain the performance improvements. This is not sustainable as the code will need to be kept in sync with each django-storages release.

My proposal: we move this logic back into django-storages (I am happy to contribute) but with some safeguards to ensure it only runs during management commands. Some ideas on this below (not all need to be implemented):

  1. Only allow it for static files storage classes - not other classes. This would be somewhat difficult as all the logic exists in the base classes so it would require a bit of refactoring of the base class to work. It also doesn’t solve it not running when serving requests
  2. Do not allow it to be set through settings at all - within the management command we can manipulate the storage object directly. Kinda feels ick doing this.
  3. Use an ENV var rather than Django settings - management commands can set the env var directly or users can do it when running the command. The reason I suggest this is that you don’t want to have a separate env module just for running collectstatic
  4. Detect the command that is being run somehow. Perhaps by checking sys.argv - I doubt this works with call_command so it probably not my first choice. As far as I know there’s no functionality in Django to tell within an arbitrary piece of code (i.e where you don’t have a request object) if you’re serving requests or running a management command.

If you want to read about why collectstatic performance is so bad I have a write-up on it here - unfortunately the way that the core Django storage API works - in particular ManifestStaticFilesStorage, it makes it very difficult to do a performant collectstatic command with just the methods available in the Storage interface if your storage class deals with remote storage.

@jasongi jasongi changed the title Reintroduction of Preload metadata functionality Reintroduction of AWS_PRELOAD_METADATA functionality Apr 6, 2024
@davidmir
Copy link

I'm experiencing super slow times with collectstatic to S3. A solution would be nice since it costs $.

@jschneier
Copy link
Owner

I will say it was an oversight of mine not to search GH to see how many consumers of the preload metadata API existed before I removed it.

And similarly, you understand why it was removed.

That being said I am happy to work on a solution that improves this for people since it seems there is still significant demand. In general, back when I was working with Django, I had switched to Whitenoise, but I do understand that doesn't work for everyone.

Can you propose a new API that would meet all the needs? It's possible we could upstream it once it was developed out and since this library has BaseStorage it might be okay to do something general.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants