Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Meta: HTTP Gateway cache control improvements #8717

Open
1 of 4 tasks
lidel opened this issue Feb 4, 2022 · 12 comments · Fixed by #8720
Open
1 of 4 tasks

Meta: HTTP Gateway cache control improvements #8717

lidel opened this issue Feb 4, 2022 · 12 comments · Fixed by #8720
Assignees
Labels
epic kind/enhancement A net-new feature or improvement to an existing feature kind/feature A new feature topic/gateway Topic gateway
Milestone

Comments

@lidel
Copy link
Member

lidel commented Feb 4, 2022

This is a meta issue for HTTP cache improvements that we should prioritize in go-ipfs:

cc @thattommyhall & @mathew-cf) if there are more asks/ideas here

@lidel lidel added kind/enhancement A net-new feature or improvement to an existing feature topic/gateway Topic gateway kind/feature A new feature epic labels Feb 4, 2022
@lidel lidel added this to In Progress in Go IPFS Roadmap Feb 4, 2022
@thattommyhall
Copy link
Member

Only thing I would add is a strategy of expiring at "top of the hour" or "end of the day" for the DirIndex pages makes dealing with when the etag does eventually change a bit nicer (you wont see different things on different servers for as long), which nudges me to prefer Expires to Cache-Control (but its not that much extra to calculate it anyway)

@mathew-cf
Copy link
Contributor

mathew-cf commented Feb 8, 2022

An idea in addition to the ones proposed:
A public resolution endpoint for gateways could be useful as an inexpensive, cacheable call. There is no public endpoint (AFAIK) to verify resolutions for IPNS public keys or IPFS subpaths without fetching the whole file from a gateway, so an endpoint or query param exposing resolution could be helpful.

@thattommyhall
Copy link
Member

thattommyhall commented Feb 8, 2022

@mathew-cf thats useful, but you'd have to rewrite to fetch /ipfs/<cid>/<path> or you couldnt be certain that the content you fetched matched. Like if I checked what the dnslink was for blog.ipfs.io then fetched /ipns/blog.ipfs.io it might have changed in the meantime

@mathew-cf
Copy link
Contributor

mathew-cf commented Feb 8, 2022

By cacheable, I meant for a short TTL (~1 min). This resolution response can still be supplemented by DNS resolution for faster cache verification/invalidation. If you have the x-ipfs-roots header, you can check that the resolution and response match and choose whether to cache the response.

I'm thinking in the context of using Cloudflare workers here if that helps clarify the context.

@BigLep BigLep linked a pull request Feb 8, 2022 that will close this issue
4 tasks
@lidel
Copy link
Member Author

lidel commented Feb 11, 2022

@mathew-cf Thoughts on leveraging existing HTTP HEAD responses for these quick checks?

  • They are inexpensive, don't return any payload, only headers, and do not require any new endpoint.

  • HEAD request resolves the content path, and for unixfs DAGs fetch the bare minimum (root block of a file/directory) to get the size for Content-Length header.

  • HEAD response could include additional metadata headers, for example:

    • let client know if the entire DAG is cached locally at the edge gateway, or if GET will require additional bitswap/fetch (enabling clients to decide which gateway to use for specific CID, prioritizing ones which have data in local cache etc)

@mathew-cf
Copy link
Contributor

@lidel That works for me!

@thattommyhall
Copy link
Member

thattommyhall commented Feb 24, 2022

I tested something like your dir-index-html strategy with lua in nginx, but I used X-Accel-Expires to keep it within our gateway.
Nginx will honor it, I think maybe Varnish but I cant find a clear ref (though I am near-certain there is an equivalent)

So configuring the name of the header might be useful is the concrete ask here, but someone else that uses another cache might be able to chime in

@lidel
Copy link
Member Author

lidel commented Feb 24, 2022

If we fix https://github.com/ipfs/go-ipfs/issues/1818#issuecomment-1015849462 (set proper cache-control header of /ipns/ and dir-index-html responses), that would be universal hint for all HTTP caching tools/solutions (in case where gateway operator wants to cache things for longer, the minimal max-age could be raised via config, decreasing the need for custom headers like X-Accel-Expires)

Go IPFS Roadmap automation moved this from In Progress to Done Mar 1, 2022
@BigLep BigLep reopened this Mar 1, 2022
@BigLep BigLep moved this from Done to Next Release Backlog in Go IPFS Roadmap Mar 1, 2022
@BigLep BigLep added this to the TBD milestone Mar 3, 2022
@thattommyhall
Copy link
Member

@lidel did fix/dir-index-html-max-age land somewhere?

@lidel
Copy link
Member Author

lidel commented Mar 28, 2022

@thattommyhall kinda: #8758 fixed a bug and now adds cache-control when a directory has index.html and returns it instead of dir listing response.

Generated dir listings don't have cache-control header, but they have deterministic Etag + will return HTTP 304 Not Modified if client sends matching Etag in If-None-Match header.

@thattommyhall
Copy link
Member

thattommyhall commented Mar 28, 2022

I'd like to advocate for something like top of the hour Expires or something in the DirIndex case too. It's nice not to have to re-ask the backend at least for a short while

@lidel
Copy link
Member Author

lidel commented Apr 7, 2022

@thattommyhall I am leaning towards setting Cache-Control that asks for caching forever (immutable), because dir listing is costly to generate, and we don't change them that often.

Given how people deploy gateway infra, we would still want to revalidate on CDN/caching proxies, so how about:

Cache-Control: public, max-age=31536000, s-maxage=604800, stale-while-revalidate=86400, stale-if-error=86400, immutable

(https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control)

My understanding is the caching proxy will always return a cached version of dir listing, but will try to revalidate if the cached copy is older than a week.

Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
epic kind/enhancement A net-new feature or improvement to an existing feature kind/feature A new feature topic/gateway Topic gateway
Projects
No open projects
Go IPFS Roadmap
  
Next Release Backlog
Status: 🥞 Todo
Development

Successfully merging a pull request may close this issue.

4 participants