Speeding up generating compressed files #148

Open · edmorley opened this issue Sep 15, 2017 · 6 comments · May be fixed by #484
edmorley (Contributor) commented Sep 15, 2017

In a project I work on, we use both CompressedStaticFilesMixin and the standalone compressor (python -m whitenoise.compress <DIR>) during Heroku deployments.

At the moment these steps are a considerable percentage (30-40%) of our deployment times.

For example using Python 2.7.13, Django 1.11.5, WhiteNoise master, Brotli 0.6.0, a Heroku-16 one-off performance-m dyno (2 cores, 2.5GB RAM, Ubuntu 16.04) with the static files directory cleared (to emulate deployment, since state intentionally isn't carried over):

~ $ time ./manage.py collectstatic --noinput
...
156 static files copied to '/app/treeherder/static', 202 post-processed.

real    0m29.837s
user    0m29.405s
sys     0m0.359s

As a baseline, using the stock Django ManifestStaticFilesStorage results in:

real    0m1.031s
user    0m0.855s
sys     0m0.167s

For the above, the 202 files output from ManifestStaticFilesStorage have a combined file-size of 15MB.

Moving on to the standalone compressor (which we use on the output of a webpack build, for the SPA part of the project):

~ $ find dist/ -type f | wc -l
35
~ $ du -hs dist/
5.2M    dist/
~ $ time python -m whitenoise.compress dist/
...
real    0m11.929s
user    0m11.841s
sys     0m0.084s

Ideas off the top of my head to speed this up:

  1. Use concurrent.futures or similar to take advantage of all cores
  2. See if a scantree() implementation (built on os.scandir()) might be faster than compress.py's os.walk() plus the later stat() calls
  3. Reduce the number of files being compressed (eg WHITENOISE_KEEP_ONLY_HASHED_FILES, and #147, "Try to work around leftover intermediate ManifestStaticFilesStorage files")
  4. Profile both CompressedStaticFilesMixin and the CLI version, to double check that most of the time is indeed being spent in the compiled gzip/brotli code and not somewhere unexpected.
  5. Compare the performance of the gzip stdlib and compiled brotli python package with command line equivalents.
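A minimal sketch of idea 1, using the stdlib gzip module (the compress_one/compress_all names are made up for illustration and are not WhiteNoise's API). Since the compiled zlib/Brotli codecs release the GIL while compressing, even a thread pool buys real parallelism here:

```python
import gzip
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def compress_one(path):
    """Gzip a single file to <path>.gz; Brotli handling would be analogous."""
    data = Path(path).read_bytes()
    Path(str(path) + ".gz").write_bytes(gzip.compress(data, compresslevel=9))
    return path


def compress_all(paths, max_workers=None):
    # zlib/Brotli release the GIL during compression, so threads give
    # real parallelism without the overhead of spawning processes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(compress_one, p) for p in paths]
        for fut in as_completed(futures):
            fut.result()  # re-raises any exception from a worker thread
```

Calling fut.result() on each future is what propagates worker exceptions to the caller, which is the boilerplate concurrent.futures handles for free compared with multiprocessing.dummy.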
edmorley (Contributor Author) commented:

Also worth noting: if I switch from CompressedStaticFilesMixin back to ManifestStaticFilesStorage and instead manually run python -m whitenoise.compress <path to static dir> afterwards, the total time taken is 12% faster, even though more work is now being done (the latter approach also compresses the extra leftover intermediate files, eg the base.5af66c1b1797.css instance in #147).

edmorley (Contributor Author) commented Sep 15, 2017

> Moving on to the standalone compressor (which we use on the output of a webpack build, for the SPA part of the project):

Breakdown of python -m whitenoise.compress dist/ times:

  • Both gzip and brotli: 11.93s
  • Just gzip (via: --no-brotli): 0.35s
  • Just Brotli (via: --no-gzip): 11.66s
  • --no-gzip --no-brotli: 0.05s (this walks filesystem and reads files from disk but no compression/writes)

So the time is almost entirely spent in Brotli, not in the filesystem walking/reading or in gzip. Granted, this standalone-compressor example covered just 35 files, but even for a 10,000-file directory the Brotli compression time would dwarf everything else, even if the filesystem walking happened to be inefficient.
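This imbalance is easy to reproduce outside WhiteNoise with a tiny micro-benchmark. The sketch below uses synthetic data; the brotli package is optional, and real timings will of course vary by machine and input:

```python
import gzip
import time


def bench(fn, data, repeats=3):
    """Return the best wall-clock time (seconds) for one call of fn(data)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best


# Synthetic stand-in for a webpack bundle; real numbers will differ.
data = b"console.log('example payload');\n" * 8192

print("gzip -9   :", bench(lambda d: gzip.compress(d, 9), data))

try:
    import brotli  # optional: pip install brotli
    # quality=11 is Brotli's maximum and its library default; it is the
    # expensive setting. Lower qualities are dramatically faster.
    print("brotli q11:", bench(lambda d: brotli.compress(d, quality=11), data))
    print("brotli q5 :", bench(lambda d: brotli.compress(d, quality=5), data))
except ImportError:
    pass
```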

edmorley (Contributor Author) commented Sep 20, 2017

@evansd I have a multi-threading solution locally that uses multiprocessing.dummy, which is present in the stdlib for both Python 2 and 3; however, it's not great (eg it doesn't re-raise child thread exceptions unless I add a lot more boilerplate).

Would you be open to me adding a dependency for Python 2 only on the futures package (which is a backport of Python 3 concurrent.futures)? The wheel is only 13KB, likely to be used by projects anyway, and I can use a version range specifier in setup.py so it won't be installed under Python 3.
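For reference, such a Python-2-only dependency can be expressed with a PEP 508 environment marker, so it is never installed under Python 3. A sketch of the relevant setup.py fragment (not WhiteNoise's actual packaging):

```python
# setup.py (sketch): install the `futures` backport only on Python 2,
# using a PEP 508 environment marker so Python 3 installs stay
# dependency-free.
from setuptools import setup

setup(
    name="whitenoise",
    # ... other metadata elided ...
    install_requires=[
        'futures; python_version < "3"',
    ],
)
```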

evansd (Owner) commented Sep 21, 2017

@edmorley Thanks a lot for this Ed, and for the other work you've been doing on whitenoise recently. Sorry I haven't responded sooner; things have been a bit busy lately.

Yes, I'd be open to adding a dependency on futures. In general I like the fact that whitenoise is dependency-free, but backports of the Python 3 stdlib are a different case and I don't think it's a problem to add those.

sonthonaxrk commented:

Really, the compression level should be configurable.

https://github.com/evansd/whitenoise/blob/master/whitenoise/compress.py#L84
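A sketch of what configurable levels might look like (the Compressor class and parameter names here are hypothetical, not WhiteNoise's actual API; gzip level 9 and Brotli quality 11 are the maximum settings, which is what makes compression slow):

```python
import gzip


class Compressor:
    """Sketch of a compressor with configurable levels (hypothetical API)."""

    def __init__(self, gzip_level=9, brotli_quality=11):
        # Defaults match the maximum settings; lowering brotli_quality
        # trades a little file size for a large speed-up.
        self.gzip_level = gzip_level
        self.brotli_quality = brotli_quality

    def compress_gzip(self, data):
        return gzip.compress(data, compresslevel=self.gzip_level)

    def compress_brotli(self, data):
        import brotli  # optional dependency: pip install brotli
        return brotli.compress(data, quality=self.brotli_quality)
```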

rik added a commit to rik/whitenoise that referenced this issue Apr 2, 2023

rik linked a pull request Apr 2, 2023 that will close this issue
rik commented Apr 2, 2023

I've taken a stab at processing files in parallel in #484.

akx pushed a commit to valohai/whitenoise that referenced this issue Mar 27, 2024