Setting "no-cache" for 404 files #245

Open
slig opened this issue Feb 17, 2020 · 4 comments

Comments

@slig

slig commented Feb 17, 2020

Hi,

First of all, thank you very much for the time and effort you've put into this project!

I have what I believe is an uncommon setup: during deploy I have both new and old application servers working simultaneously (using k8s), STATIC_URL = '/static/' and everything behind CloudFlare.

During a deploy that changes static files (for example, anything compiled by webpack), old application code is still running while new pods are being brought up. If a page load is routed to a new app server, the HTML references the new version of the file (css-$newhash.css), but some of those static requests end up being served by the old version of the app, which obviously returns 404. CloudFlare then caches those 404s, and every subsequent page load is broken because of that.

I'm not using collectstatic because webpack already generates the files with hashes.

I'm thinking about how to tackle this and I see two possibilities:

  • a custom middleware that checks whether a 404 has STATIC_URL as its URL prefix and sets a no-cache header.
  • updating whitenoise to support this use case if that might be useful for others in the future, maybe through a special config option.

Thanks a lot!

Edit: for reference, here's the answer from CloudFlare recommending no-cache on 404 pages to avoid this issue.

@evansd
Owner

evansd commented Mar 13, 2020

Hi, thanks for your message. Sorry I haven't had time to look at this earlier.

The issue you've identified is one of the things that's bothered me before about the whitenoise model, but I don't see a simple, neat way of fixing it.

My assumption was that this would be quite a rare situation because usually the previous versions of the assets would already have been cached by the time the next deploy happens. But maybe you're clearing caches after you deploy or something like that.

One really hacky approach I thought of would be a bit of middleware which catches 404s under STATIC_URL and then changes them to a redirect to the same URL but with ?attempt=1 appended. If it gets the same request back again it redirects it again with ?attempt=2 and so on. The hope would be that after a retry or two it eventually hits an app server which can serve the request. And if it still hasn't done so after some set number of attempts then you can return 404 and give up. But that really is a nasty hack and probably not worth it.
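To make the retry idea concrete, here is a minimal sketch of just the decision logic, kept framework-free so it can be read on its own. The function name `next_retry_location`, the `MAX_ATTEMPTS` limit, and the default prefix are all hypothetical; none of this exists in whitenoise. A real middleware would call something like it on a 404 and issue a temporary (and itself uncacheable) redirect when a location is returned.

```python
from urllib.parse import parse_qs, urlencode

MAX_ATTEMPTS = 3  # hypothetical limit; tune to the number of app servers


def next_retry_location(path, query_string, static_prefix="/static/",
                        max_attempts=MAX_ATTEMPTS):
    """Return the URL a static 404 should be redirected to, or None to give up."""
    if not path.startswith(static_prefix):
        return None  # only static files are retried
    params = parse_qs(query_string)
    try:
        attempt = int(params.get("attempt", ["0"])[0])
    except ValueError:
        attempt = 0  # treat a garbled counter as a first attempt
    if attempt >= max_attempts:
        return None  # retries exhausted: let the 404 through
    params["attempt"] = [str(attempt + 1)]
    return path + "?" + urlencode(params, doseq=True)
```

The counter lives in the query string, so no server-side state is needed, which matters because each retry may land on a different pod.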

I think if you had a bit of middleware which returned no-cache for any 404s under STATIC_URL, I'd be open to including it with whitenoise. But I don't think I'd want to build that functionality into the main code, as it's quite a niche requirement.
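A rough sketch of what such a middleware could look like, written as a plain WSGI wrapper so that it does not depend on any framework internals. The class name `NoCacheStatic404` and the `STATIC_PREFIX` constant are hypothetical, and the prefix is assumed to match STATIC_URL; this is not part of whitenoise.

```python
STATIC_PREFIX = "/static/"  # assumption: same value as STATIC_URL


class NoCacheStatic404:
    """WSGI wrapper that marks 404 responses under a prefix as uncacheable."""

    def __init__(self, app, prefix=STATIC_PREFIX):
        self.app = app
        self.prefix = prefix

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            if status.startswith("404") and path.startswith(self.prefix):
                # Replace any existing Cache-Control header with one that
                # tells the CDN not to store this response.
                headers = [(k, v) for k, v in headers
                           if k.lower() != "cache-control"]
                headers.append(("Cache-Control", "no-cache, no-store, max-age=0"))
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)
```

Since a Django project exposes a WSGI application object, one way to try this would be to wrap that object in the project's wsgi.py. Inside Django itself, the `add_never_cache_headers` helper from `django.utils.cache` serves the same purpose.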

Thanks.

@slig
Author

slig commented Mar 14, 2020

Hi,

Thanks for replying! No worries about it!

You're right about previous versions being cached, but since the newer versions have different hashes in their filenames, there's no cache entry for them, and the first request for them might end up at a previous Django instance that will return a 404 to CloudFlare.

Thanks for considering adding a new middleware for this special case. I'll try to cook something up, try it on my code, and share it here if it ends up working.

I googled, and Heroku can do rolling deploys / zero downtime, and more and more apps are behind CF, so I believe this issue will become more apparent in the future.

Thanks again!

@rafikdraoui

rafikdraoui commented Apr 19, 2021

One partial solution that could work for some applications would be to fall back to serving the "hash-less" version of the file whenever WhiteNoise returns a 404, making sure that the fallback response doesn't get cached. This relies on WHITENOISE_KEEP_ONLY_HASHED_FILES being False (which is the default value).

I haven't run this in production, and there are many cases where this would not work (see below), but perhaps this can prove useful to someone.

from django.utils.cache import add_never_cache_headers
from whitenoise.middleware import WhiteNoiseMiddleware


class WhiteNoiseWithFallbackMiddleware(WhiteNoiseMiddleware):
    def __call__(self, request):
        response = super().__call__(request)
        # Only intervene when a request under the static prefix was not found.
        if response.status_code == 404 and request.path.startswith(self.static_prefix):
            # Strip the hash segment, e.g. /static/app.<hash>.js -> /static/app.js
            fallback_path = self.get_name_without_hash(request.path)
            request.path = request.path_info = fallback_path
            # Ask WhiteNoise to serve the hash-less file instead.
            fallback_response = self.process_request(request)
            if fallback_response:
                response = fallback_response
                # The fallback may hold a different version's content, so it must not be cached.
                add_never_cache_headers(response)
                # might also need `patch_cache_control(response, private=True)` if using Django < 3
        return response

Updating a static file

Let's say there is a static file called app.js that is referred to in the HTML served at /index.html, and we make a change to it. A request to /index.html will trigger a second request to /static/app.<hash>.js.

The currently deployed version of the application (Version A) has the file referred to as app.old.js, and we are deploying Version B that will have the file referred to as app.new.js.

In the following, we assume that we are in the middle of the deployment, during the short interval where the two versions A and B are serving requests.

If both requests are served by the same version, then everything is fine.

If /index.html is served by Version A, then ideally the subsequent request to /static/app.old.js will get served by the CDN, but on the off-chance it does end up being served by Version B instead, then it will fall back to /static/app.js, which really is app.new.js. This response won't get cached, so we won't end up with app.old.js being cached with app.new.js's content.

If /index.html is served by Version B but the subsequent request to /static/app.new.js is routed to Version A, then it will fall back to /static/app.js, which really is app.old.js. This response won't get cached, so we won't end up with app.new.js being cached with app.old.js's content.
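The scenarios above can be checked with a toy model. This is only a sketch: `VERSION_A`, `VERSION_B`, `serve`, and `name_without_hash` are made-up names that stand in for the two deployed versions and the fallback logic, and file contents are reduced to the strings "old" and "new".

```python
import os


def name_without_hash(path):
    # "/static/app.new.js" -> "/static/app.js" (drops the segment before the extension)
    stem, ext = os.path.splitext(path)
    return os.path.splitext(stem)[0] + ext


# Each version ships its hashed file plus the hash-less copy of the same content.
VERSION_A = {"/static/app.old.js": "old", "/static/app.js": "old"}
VERSION_B = {"/static/app.new.js": "new", "/static/app.js": "new"}


def serve(files, path):
    """Return (status, body, cacheable) for a request handled by one version."""
    if path in files:
        return 200, files[path], True
    fallback = name_without_hash(path)
    if fallback in files:
        # Content may belong to a different version, so never cache it.
        return 200, files[fallback], False
    return 404, None, False


# HTML from B, static request routed to A: old content, not cacheable.
print(serve(VERSION_A, "/static/app.new.js"))
# HTML from A, static request routed to B: new content, not cacheable.
print(serve(VERSION_B, "/static/app.old.js"))
```

In both mixed cases the browser gets a working file, and the CDN never stores mismatched content under a hashed name.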

Adding or removing a static file

If a brand new file is added, then things are a bit trickier. The change will have to be made across two deploys: first a deploy where the new file is added, but isn't referred to anywhere; then a second deploy where the HTML is updated to refer to the new file. The steps are similar if a file is removed (i.e. remove the reference but keep the file, then remove the file in a subsequent deploy).

This is similar to how database migrations have to be handled with rolling deploys, so hopefully the process should feel familiar for people working on such projects.

Let's say a new file new.js is added, and is sourced in /index.html within the same deploy. If the request to /index.html is served by Version B, but the subsequent request to /static/new.js gets routed to Version A, then this will return a 404.

On the other hand, if the change is staggered across two deploys (A -> add new.js, no changes to index.html -> B -> add <script> tag to index.html -> C), then during the deploy from A to B, there will never be any requests to /static/new.js (which would be problematic if routed to A), and during the deploy from B to C, a request to /static/new.js would be successful regardless of which version it is routed to.
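The two-deploy rule can be expressed as a small check. This is a sketch under simplifying assumptions: each version is described only by the set of static files it can serve and the set its HTML refers to, hashing is ignored, and the names `unsafe_references`, `A`, `B`, and `C` are invented for this example.

```python
def unsafe_references(old, new):
    """Static paths referenced by one version's HTML that the other cannot serve."""
    problems = set()
    for html_side, file_side in ((old, new), (new, old)):
        problems |= html_side["references"] - file_side["files"]
    return problems


A = {"files": set(), "references": set()}
B = {"files": {"/static/new.js"}, "references": set()}               # file added, not yet used
C = {"files": {"/static/new.js"}, "references": {"/static/new.js"}}  # <script> tag added

print(unsafe_references(A, C))  # single deploy: new.js can 404 when routed to A
print(unsafe_references(A, B))  # first half of the staggered deploy
print(unsafe_references(B, C))  # second half of the staggered deploy
```

A rolling deploy between two versions is safe in this model when the returned set is empty. Removing a file works the same way with the roles reversed.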

Problems with this approach

This would work best for files that are self-contained JS apps that don't rely too much on the surrounding HTML markup. If on the other hand the program is split across several files, then there could be some breakages when a mix of old and new versions which depend on one another are loaded together. For example, app.new.js could end up being loaded alongside app.old.css, which might or might not be a problem (although the same situation would occur regardless of how static files are served).

Thankfully any such glitches should be short-lived, and perhaps this is a good enough trade-off for your use case.

Having the possibility that a request to /static/file.new.js can (temporarily) return the file /static/file.old.js feels a bit icky... The no-caching headers on such fallback responses add some peace of mind, but still...

@slig
Author

slig commented Apr 19, 2021

@rafikdraoui really interesting take, thanks for sharing! Maybe I'll use your solution on my projects.
