Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

{caddy2,h2o}: use hash of path as ETag in Nix store (take 2) #222354

Closed
wants to merge 15 commits into from

Conversation

sephii
Copy link
Contributor

@sephii sephii commented Mar 21, 2023

Description of changes

This is an updated version of #83111 originally submitted by @emilazy with the remaining open points fixed. Here’s what I’ve changed:

I’m not sure if it was better to rebase the commits from the existing PR, or to merge master in the PR branch. I did the latter, but I can rebase instead if that’s a better process.

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.05 Release Notes (or backporting 22.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/prs-ready-for-review/3032/2005

@emilylange
Copy link
Member

Sorry for my silence so far.

I discussed this very briefly over the past two days on the internal Caddy Slack :)
Happy to share, that a fix for this can likely get upstreamed in https://github.com/caddyserver/caddy/ instead.

I'll try to share more details soon™

Can't speak for h2o, though, so that has to probably stay in nixpkgs :^)

@@ -316,3 +316,5 @@ In addition to numerous new and upgraded packages, this release has the followin
- The option `services.prometheus.exporters.pihole.interval` does not exist anymore and has been removed.

- `k3s` can now be configured with an EnvironmentFile for its systemd service, allowing secrets to be provided without ending up in the Nix Store.

- The ETag sent by the Caddy and H2O web servers is now calculated using the Nix store path.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the comment below the headline

@SuperSandro2000
Copy link
Member

SuperSandro2000 commented Mar 31, 2023

I’m not sure if it was better to rebase the commits from the existing PR, or to merge master in the PR branch. I did the latter, but I can rebase instead if that’s a better process.

It is always better to rebase, actually you never want to merge master into a PR because it makes resolving merge conflicts and further rebases sometimes almost impossible

use hash of path as ETag in Nix store

This is not fully reliable. A store entry should be reproducible but it is not 100% guaranteed especially with experimental impure derivations. This would work for ca derivations.

@emilazy
Copy link
Member

emilazy commented May 8, 2023

Thank you @sephii for taking this over, and sorry for never getting around to it myself! I'm happy for you to rebase over my commits and I checked with @yanalunaterra; it's fine to just drop the test commits entirely and replace them with your own, since you rewrote them almost completely. For the patch commits themselves you can add your own Co-authored-by lines or overwrite the commit authors with your own and add Co-authored-by: Emily <vcs@emily.moe>; I'm fine with either. The existing coauthorship on the caddy2 patch should be updated to Co-authored-by: Yana Luna-Terra <yana@yana.moe> as well. It's probably a good idea to remove the git commit headers from the in-tree patch files entirely, since they don't really have any ownership and might be modified arbitrarily in the future.

This is not fully reliable. A store entry should be reproducible but it is not 100% guaranteed especially with experimental impure derivations. This would work for ca derivations.

While this is true, it's not really relevant, as this matches (actually refines) the behaviour used by the nginx patch that has been in the nixpkgs tree for years. Currently ETags for caddy and h2o are based solely on the file size which is extremely unreliable and leads to far more false caching for paths in the Nix store than for regular files, so this is basically a strict improvement. Additionally the kinds of reproducibility failures one might expect (though hope to avoid) for Nix derivations - build timestamps, perhaps some nondeterministic ordering - are also the kinds that are relatively unimportant to invalidate a cache over. Any derivation that produces meaningfully semantically different outputs with the same hash is, I think, hopelessly broken, and anyone serving its output directly out of the Nix store will be running into problems already.

I discussed this very briefly over the past two days on the internal Caddy Slack :) Happy to share, that a fix for this can likely get upstreamed in https://github.com/caddyserver/caddy/ instead.

Unfortunately I don't see how this is possible, unless Caddy want to maintain unportable Nix-specific special cases upstream. The patch fundamentally relies on the knowledge that files underneath /nix/store are generated in special ways and that their path is a reasonable proxy for their content.

@emilazy
Copy link
Member

emilazy commented May 23, 2023

Writing some code to look through caddy adapt output for configurations that look like they might be serving /nix/store contents without any directives modifying the Last-Modified header does strike me as it ought to be a reasonably straightforward addition to the NixOS module for caddy.

I think it could (hopefully) be simpler than this: unconditionally inject a module that does the same path resolution logic the current patch does and have it set/delete the ETag and Last-Modified headers appropriately. Presumably people could override the headers on top of that in their configuration if needed.

So, in terms of what we're looking for from Caddy, we just need a way for a Caddy module to hook into the filename/stat info that fileserver is going to serve so we can determine whether or not it's in the Nix store and override headers appropriately. A way to ensure that module is injected by default would also be useful (something like patching a list in the Caddy source would be okay; we'd be happy to maintain a minimally-invasive patch like that), but we can possibly get away without and just use our own logic at the NixOS level to detect whether the configuration is serving static files or not and inject the Caddy module appropriately.

The current approach in this PR patches the fileserver code directly, which is quite invasive: https://github.com/NixOS/nixpkgs/blob/2a3ac2fbfb23b78181556e61410c0b0fba4d639a/pkgs/servers/caddy/nix-store-etag.patch. The hope is that we can replicate that with a Caddy plugin. (You can think of the Nix store as being like a content-addressed store like IPFS (though there are caveats to that), which is why the mtime is useless but we can still generate a useful ETag ourselves.)

@charles-dyfis-net
Copy link
Contributor

Since we already have the NixOS module run caddy fmt, running caddy adapt instead and injecting a caddy module invocation into the generated JSON config strikes me as very feasible. I may take a shot at it when I have some weekend hours to spare.

@emilazy
Copy link
Member

emilazy commented May 23, 2023

Seems reasonable, as long as it's still possible to override the headers in the configuration even with it injected.

Unfortunately reading Caddy's staticfiles.go makes me unsure how we could write a correct Caddy plugin for this. The resolved file path and fs.Stat result are computed locally within ServeHTTP and never exposed in any way other than to http.ServeContent, so it's unclear to me how we could access them in a plugin to drive the logic the patch currently has. We could potentially recompute the path and stat it again, at the cost of another syscall, a race condition, and the possibility of drift from Caddy's logic. Might not be too bad, but maybe there's something better we could do that I'm missing?

For posterity, this 2009 commit has some explanation of why Nix uses mtime=1: NixOS/nix@14bc3ce (thanks to @DeeUnderscore on Matrix for digging this up). Basically it'd be reasonable if tools treated it as "no meaningful mtime" but at least 14 years ago a lot of them would apparently use it as a sentinel value for nonexistent files or refuse to load them entirely. Even if that's no longer true it's probably too late to change now since it would result in a huge number of tarballs and other artifacts needing to be rebuilt and have their hashes updated.

@mholt
Copy link

mholt commented May 23, 2023

@emilazy Just a thought, is it possible for NixOS to return the actual mod time in the stat syscall? That way the FS can have its mod time at 0 as needed for the hashing, but then at runtime the syscall can return the true value?

@emilazy
Copy link
Member

emilazy commented May 23, 2023

We don't have a real modification time to use (and couldn't alter the stat results on all operating systems; the Nix store is just a normal filesystem). Paths in the Nix store are immutable and look like /nix/store/<hash>-<name>/..., where the hash uniquely identifies the process that produced the files, so what the patch does is calculate an SHA-512 over the path to generate an ETag and leave the Last-Modified header blank. We have had an existing patch for nginx along these lines for years, but unfortunately there's no way around supporting it individually in other web servers. The reason the patch looks at the path and mtime is to identify which paths are in the Nix store in the first place.

@charles-dyfis-net
Copy link
Contributor

@emilazy Just a thought, is it possible for NixOS to return the actual mod time in the stat syscall? That way the FS can have its mod time at 0 as needed for the hashing, but then at runtime the syscall can return the true value?

Nix has a serialization format, .nar, that deliberately doesn't record a mtime field to better support binary-reproducible builds (which it does quite well; the NixOS install CD has been byte-for-byte-reproducible for years now). Every build output gets coerced to that format for hashing, content coming back from a binary cache is in that format -- so the mtimes deliberately don't exist.

@charles-dyfis-net
Copy link
Contributor

@aanderse, quick question -- can you speak to the requirements that led to caddy fmt being conditional on whether we're cross-compiling? (Relevant insofar as the proposal above would replace it with code that needs to run unconditionally for consistent behavior).

@charles-dyfis-net
Copy link
Contributor

charles-dyfis-net commented May 23, 2023

@emilazy: I see your point about needing to duplicate work with the staticfiles module. We don't need the path or stat data to just delete a meaningless Last-Modified header, but we certainly do need it to generate a good etag. (Please accept my apologies for being slow on the draw!)

@mholt: Would you accept a patch to staticfiles having it add details about which file a request resolved to to request context?

@mholt
Copy link

mholt commented May 23, 2023

@charles-dyfis-net

Would you accept a patch to staticfiles having it add details about which file a request resolved to to request context?

Sure, but instead of adding a new value to the context, it might make more sense / be more useful as a var. One line of code should do it:

https://github.com/caddyserver/caddy/blob/cee4441cb1d485b38d728168a315cda5641d84fb/modules/caddyhttp/vars.go#L316

(Pass in the request context)

@emilazy
Copy link
Member

emilazy commented May 23, 2023

Yeah, if we could get filename and info (or even just info.ModTime(), but I suspect it doesn't hurt to include the whole thing for others who want to avoid race conditions) from staticfiles's ServeHTTP stashed into an accessible context like that so we can adjust the headers based on them that sounds like it would be perfect for our needs.

Thanks for being open to modifications to help us achieve this - I know Nix is a very strange system at first sight and it's great to have an upstream willing to help accommodate our peculiar needs :) I assume that since all the plugins get linked into one binary there should be no appreciable performance overhead to doing things this way?

charles-dyfis-net added a commit to charles-dyfis-net/caddy that referenced this pull request May 23, 2023
Adds two variables for requests being handled as static files:

- `caddyhttp.fileserver.path` - filesystem path as used by caddy to open file being served; may be an index file, a compressed proxy, or otherwise something other than the original content.
- `caddyhttp.fileserver.info` - fs.FileInfo structure with basename and stat data.

Per discussion in NixOS/nixpkgs#222354 (with thanks to @emilazy and @mholt)
charles-dyfis-net added a commit to charles-dyfis-net/caddy that referenced this pull request May 23, 2023
Adds two variables for requests being handled as static files:

- `handlers.file_server.path` - filesystem path as used by caddy to open file being served; may be an index file, a compressed proxy, or otherwise something other than the original content.
- `handlers.file_server.info` - fs.FileInfo structure with basename and stat data.

Per discussion in NixOS/nixpkgs#222354 (with thanks to @emilazy and @mholt)
@aanderse
Copy link
Member

@aanderse, quick question -- can you speak to the requirements that led to caddy fmt being conditional on whether we're cross-compiling? (Relevant insofar as the proposal above would replace it with code that needs to run unconditionally for consistent behavior).

Yes, this is the correct behaviour. See

echo "Ignoring validation for cross-compilation"
.

We do something like this in a few modules, but here is another way to handle it that will probably suit your needs:

validationPackage = if pkgs.stdenv.buildPlatform == pkgs.stdenv.hostPlatform

@emilazy
Copy link
Member

emilazy commented May 24, 2023

Okay so I ported the existing patch over to a middleware handler and I was going to write a long comment about the various tradeoffs involved but I'm not certain that we can actually do what we need inside a middleware handler (at least without duplicating a bunch of path resolution logic from Caddy) in the first place.

I was going to say that relying on caddy adapt to inject the module is going to be a pain because Caddyfile parsing depends on the Caddy modules you have bundled, so if someone sets services.caddy.package then maybe we can't get the native host version of that Caddy and even if we can it'd require compiling the whole thing twice. Additionally, it won't work with running caddy(1) manually to test things or to use caddy file-server (but that might be ok-ish if the ETag/Last-Modified removal are handled upstream). But for it to even work at all requires being able to set the ETag header in a module after the static files handler sets the relevant context variables, which I'm not sure how to do (the deferred HeaderOps stuff doesn't seem adequate).

I had two potential alternatives for that. One was to keep a patch around like this, which is gross but probably easy to maintain:

diff --git a/modules/caddyhttp/routes.go b/modules/caddyhttp/routes.go
index 9be3d01a..cc7ad188 100644
--- a/modules/caddyhttp/routes.go
+++ b/modules/caddyhttp/routes.go
@@ -156,6 +156,13 @@ func (r *Route) ProvisionHandlers(ctx caddy.Context, metrics *Metrics) error {
                return fmt.Errorf("loading handler modules: %v", err)
        }
        for _, handler := range handlersIface.([]any) {
+               if handler.(caddy.Module).CaddyModule().ID == "http.handlers.file_server" {
+                       nixHandler, err := ctx.LoadModuleByID("http.handlers.nix_store_etag", nil)
+                       if err != nil {
+                               return fmt.Errorf("loading Nix store ETag handler: %v", err)
+                       }
+                       r.Handlers = append(r.Handlers, nixHandler.(MiddlewareHandler))
+               }
                r.Handlers = append(r.Handlers, handler.(MiddlewareHandler))
        }
 

...and the other was just to keep patching staticfiles.go directly, which I guess I've circled back around to feeling like is probably the best and maybe the only option. It'd be a matter of patching calculateEtag (and deleting the Last-Modified header; done for us if caddyserver/caddy@299c847 goes upstream, otherwise we can just incorporate it into the patch), which should be pretty easy to maintain as that function shouldn't change often (except when we ask for an upstream special case that turns out to not really get us closer to our goal anyway :/).

(Also the current patch is broken when serving precompressed files. Not sure if that's a problem carried over from my previous PR or if the support for that is new.)

@emilazy
Copy link
Member

emilazy commented May 24, 2023

caddyserver/caddy#5556 (comment) points out that we can do this in a middleware handler, so I'll try getting that working. It still seems like we have a choice to make:

  1. Use middleware, post-process caddy adapt: doesn't require patching Caddy at all, awkward with cross-compilation (especially with custom Caddy packages?), doesn't work on the command line
  2. Use middleware, patch routes.go to inject our middleware whenever file_server is used: works everywhere, probably the easier-to-maintain patch, kind of gross
  3. Just patch staticfiles.go: simple, works everywhere, requires more potential maintenance for patch breaks when updating Caddy

(btw I wish our module wrote out the JSON format directly; it doesn't make much sense to me for our structured configuration interface to output string templated Caddyfiles when those are designed for humans, but it'd be a compatibility break to fix that now...)

@emilazy
Copy link
Member

emilazy commented May 24, 2023

It's several times longer than the direct patch and not the most elegant thing in the world, but...

package caddy_nix_store_etag_middleware

import (
	"context"
	"crypto/sha512"
	"encoding/hex"
	"io/fs"
	"net/http"
	"path/filepath"
	"strings"

	"github.com/caddyserver/caddy/v2"
	"github.com/caddyserver/caddy/v2/modules/caddyhttp"
	"github.com/caddyserver/caddy/v2/modules/caddyhttp/fileserver"
)

func init() {
	caddy.RegisterModule(Middleware{})
}

type Middleware struct{}

type responseWriterWrapper struct {
	*caddyhttp.ResponseWriterWrapper
	ctx         context.Context
	wroteHeader bool
}

func (Middleware) CaddyModule() caddy.ModuleInfo {
	return caddy.ModuleInfo{
		ID:  "http.handlers.nix_store_etag",
		New: func() caddy.Module { return new(Middleware) },
	}
}

func (m Middleware) ServeHTTP(w http.ResponseWriter, r *http.Request, next caddyhttp.Handler) error {
	w = &responseWriterWrapper{
		ResponseWriterWrapper: &caddyhttp.ResponseWriterWrapper{ResponseWriter: w},
		ctx:                   r.Context(),
		wroteHeader:           false,
	}
	return next.ServeHTTP(w, r)
}

func (rww *responseWriterWrapper) WriteHeader(status int) {
	if rww.wroteHeader {
		return
	}
	// 1xx responses aren't final; just informational
	if status < 100 || status > 199 {
		rww.wroteHeader = true
	}
	rww.setEtag()
	rww.ResponseWriterWrapper.WriteHeader(status)
}

func (rww *responseWriterWrapper) setEtag() {
	info, _ := caddyhttp.GetVar(rww.ctx, fileserver.StaticFileInfoVarKey).(fs.FileInfo)
	if info == nil {
		return
	}

	// Nix store files have mtime = 1
	if info.ModTime().Unix() != 1 {
		return
	}

	path, ok := caddyhttp.GetVar(rww.ctx, fileserver.StaticFilePathVarKey).(string)
	if !ok {
		return
	}

	var err error

	// avoid running filepath.Abs unless necessary, as it calls
	// filepath.Clean which is (relatively) expensive
	if !filepath.IsAbs(path) {
		path, err = filepath.Abs(path)
		if err != nil {
			return
		}
	}

	const storePrefix = "/nix/store/"

	if !strings.HasPrefix(path, storePrefix) {
		// Since mtime = 1, most likely a link into the store, e.g.:
		//
		//     /var/www/element.example.com
		//       -> /nix/store/00000000000000000000000000000000-element-web
		//
		// Note that filepath.EvalSymlinks is relatively expensive (~15 µs in
		// an HTTP/2 microbenchmark while testing this patch), so you probably
		// want to avoid relying on this codepath if you're optimizing for raw
		// request throughput (i.e., don't serve any mtime = 1 files that
		// aren't explicitly rooted in a derivation or explicit /nix/store
		// path in the Caddy configuration).
		path, err = filepath.EvalSymlinks(path)
		if err != nil || !strings.HasPrefix(path, storePrefix) {
			return
		}
	}

	// Hash the entire path so that ETag changes when switching /foo from:
	//     /nix/store/00000000000000000000000000000000-www/a/foo
	// to:
	//     /nix/store/00000000000000000000000000000000-www/b/foo
	pathDigest := sha512.Sum512_224([]byte(path))
	rww.ResponseWriterWrapper.Header().Set("Etag", `"`+hex.EncodeToString(pathDigest[:])+`"`)
}

func (rww *responseWriterWrapper) Write(d []byte) (int, error) {
	if !rww.wroteHeader {
		rww.WriteHeader(http.StatusOK)
	}
	return rww.ResponseWriterWrapper.Write(d)
}

var (
	_ caddyhttp.MiddlewareHandler = (*Middleware)(nil)
)

If we do decide to go with one of the middleware-using routes we should probably repeat the benchmarks from the last PR just to check that all this additional indirection doesn't make performance worse.

@mholt
Copy link

mholt commented May 25, 2023

That plugin looks pretty good to me. The primary overhead is 1 extra function on the call stack, which should be negligible.

One thing to note, I believe the Go standard library calls IsAbs() for you in Abs()! So you don't have to do it yourself.

I'll be reviewing relevant PR(s) momentarily.

@emilazy
Copy link
Member

emilazy commented May 25, 2023

Yeah, I imagine the overhead is probably mostly just retrieving the information from the context rather than the actual additional handler, and I'm guessing even that probably comes out in the wash. filepath.Abs does check for absolute paths, but then it calls filepath.Clean on the result, which we don't need and which I believe turned out to be expensive enough to reliably measure in requests/sec benchmarks :(

Do you have any thoughts on the routes.go patch in #222354 (comment)? It's not the most elegant thing in the world, for sure, but I'm wondering if there's a better way we can make sure users get a good experience with caddy file-server etc. out of the box other than carrying a patch for that or staticfiles.go.

@NickCao
Copy link
Member

NickCao commented May 26, 2023

Actually instead of adding nix specific logics, we can simply append the inode number to the etag, it works for nix in most cases. It is done as such in actix-files: https://github.com/actix/actix-web/blob/17218dc6c88848938cebc560deafbf1c2184fa56/actix-files/src/named.rs#L379, which is used by miniserve.

@mholt
Copy link

mholt commented May 26, 2023

@emilazy So, hear me out, I have an idea:

Create a caddy nix-file-server command. Plugins can register subcommands -- this is how the file_server handler has its own subcommand, for example.

You could do the same thing as caddy file-server -- but instead of building a config with just file_server, have it add in any other HTTP handlers you need. So add in your nix-specific handler. :)

Maybe a caddy nix-file-server subcommand or similar.

@emilazy
Copy link
Member

emilazy commented May 26, 2023

@mholt

So the main reason we'd prefer to avoid explicit user intervention to get this to work is that we want people to have a good out of the box experience and it's hard for users who aren't Nix experts to tell when you'll run into this problem or what's causing it when you do. For instance, we package frontend web applications that it's perfectly natural to specify directly as a web root in our structured configuration interface: those will end up served out of /nix/store and if you don't know that you have to explicitly change things to handle this properly you'll end up with bad caching behaviour (whether that's things being cached incorrectly or getting no caching at all). It's come up in practice in the past and tracking down the source of it was why we started carrying a patch for to fix it for nginx users. caddy file-server --help does say it's production ready and if we can keep that true for NixOS users we'd like to be able to!

That said, personally I would be OK with sacrificing the command-line case and only handling it when Caddy is being configured with our structured interface (by injecting the handler into our generated configuration), which is how most people will be using Caddy on NixOS. But for various boring complex reasons this would be difficult for us to do in a way that handles cross-compilation scenarios, especially when people are adding on additional Caddy modules.

For that reason I strongly prefer solutions that will let us maintain a local patch to have the correct behaviour in all cases, or have upstream Caddy handle it properly out of the box. Getting this done elegantly is the problem :)

Actually instead of adding nix specific logics, we can simply append the inode number to the etag, it works for nix in most cases. It is done as such in actix-files: https://github.com/actix/actix-web/blob/17218dc6c88848938cebc560deafbf1c2184fa56/actix-files/src/named.rs#L379, which is used by miniserve.

So, this... is interesting? I think that if we're going to have to have Nix-local modules or patches or whatever then there's no real point to doing this over the current approach. But if upstream Caddy can strip off the Last-Modified header when appropriate (which it's not clear will happen, and the rest of this comment is conditional on that, since otherwise we need to do something ourselves regardless), and would be willing to incorporate inode numbers in their ETag calculation, then this seems like it might work fine. For Nix store files this would mean we're effectively keying on (size, inode).

Here's a couple caveats and downsides I can think of:

  • ext4 recycles inode numbers eagerly. It's not impossible that they could be reused between two versions of the same URL with the same size. It doesn't seem too likely, though: even in an "ideal" situation where you have the Nix store on its own ext4 filesystem, single-file webroot A, build single-file webroot B, garbage collect, then build single-file webroot C, it still seems like the writing of .drv files and so on would likely perturb things enough to prevent a collision.

  • This might interact weirdly with load balancing or other multiple-machine setups: two servers serving the exact same file are likely to give different ETags due to differing inode numbers. This could impair caching and I can imagine this being a blocker for Caddy doing this upstream. (The converse - a collision between two identical inodes with different content for the same URL across multiple servers - seems unlikely in any realistic setup.)

I think taking all this into account my preference list from most to least preferred goes something like:

  1. If upstream Caddy is willing to drop the Last-Modified header for mtime=1 and incorporate inodes into ETags, and there aren't any huge limitations or downsides to this approach that I've missed, then just ship as-is and don't do anything special and hopefully everything just works fine.

  2. Maintain a small patch to Caddy ourselves that either injects a module to fix ETag/Last-Modified headers on Nix store files, or handles that directly in staticfiles.go. (I'm fairly agnostic as to which unless the former ends up having some bad performance penalty we've missed; they both have their upsides and downsides in terms of complexity, "grossness", and maintenance costs.)

  3. Inject our module only for users of services.caddy by postprocessing the configuration. Solve the cross-compilation issues with this... somehow. Hope nobody uses caddy file-server directly.

  4. Do nothing, things just suck if you end up serving content out of the Nix store and you maybe never figure out why.

@NickCao
Copy link
Member

NickCao commented May 26, 2023

This might interact weirdly with load balancing or other multiple-machine setups: two servers serving the exact same file are likely to give different ETags due to differing inode numbers. This could impair caching and I can imagine this being a blocker for Caddy doing this upstream. (The converse - a collision between two identical inodes with different content for the same URL across multiple servers - seems unlikely in any realistic setup.)

caddy is already using mtime in etag, which the same reasoning applies.

@mholt
Copy link

mholt commented May 26, 2023

(which it's caddyserver/caddy#5556 (comment), and the rest of this comment is conditional on that, since otherwise we need to do something ourselves regardless),

It can happen, it just needs to be done in a ResponseWriterWrapper.

Anyway, it sounds like we're close to a solution that works for both projects.

@lukateras
Copy link
Member

caddy is already using mtime in etag, which the same reasoning applies.

Modified time is controllable though: for example, see curl -R. Inode numbers, not really.

@emilazy
Copy link
Member

emilazy commented Jun 16, 2023

Anyway, it sounds like we're close to a solution that works for both projects.

Hey @mholt, sorry I left this hanging for a while. If you'd be happy incorporating the inode number as part of the standard ETag calculation in upstream Caddy and dropping the Last-Modified header for mtime=1 then yes I think that'd work fine for us and we could forgo any patching or extension module. Up to you to decide whether you find the caveat discussed above acceptable: inodes are inherently machine/FS-specific so a cluster of Caddy servers saving the same static content will likely calculate different ETags and therefore have inferior caching behaviour in a load-balanced setup. If you do go down this route I would suggest applying a hash function to the input of (file size, mtime, inode) to generate the final ETag as revealing machine-specific inode numbers could be an information leak.

@mholt
Copy link

mholt commented Jun 19, 2023

@emilazy No worries, thanks for working with me/us on this.

I think I'm alright with that caveat -- maybe, ideally the inode is only used if necessary (i.e. if modtime is 0 or 1)?

Is there anything else you need from me?

@emilylange emilylange mentioned this pull request Aug 3, 2023
12 tasks
@sephii
Copy link
Contributor Author

sephii commented Dec 14, 2023

I’ll close this since the ETag fix is shipped in Caddy 2.7 (which is now in nixpkgs). Thanks for the fix!

@sephii sephii closed this Dec 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants