{caddy2,h2o}: use hash of path as ETag in Nix store (take 2) #222354
This is more correct than the behaviour of the nginx/H2O patches in some edge cases (documented in the patch) while having negligible overhead (the cost of resolving symlinks in the path far outweighs the cost of the hash function; remember that HTTPS involves a lot of cryptographic operations to begin with). Co-authored-by: Yegor Timoshenko <yegortimoshenko@riseup.net>
This matches the behaviour of the caddy2 patch; see 3e8b3c7.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/prs-ready-for-review/3032/2005 |
Sorry for my silence so far. I discussed this very briefly over the past two days on the internal Caddy Slack :) I'll try to share more details soon™ Can't speak for h2o, though, so that has to probably stay in nixpkgs :^) |
@@ -316,3 +316,5 @@ In addition to numerous new and upgraded packages, this release has the followin
 - The option `services.prometheus.exporters.pihole.interval` does not exist anymore and has been removed.
 
 - `k3s` can now be configured with an EnvironmentFile for its systemd service, allowing secrets to be provided without ending up in the Nix Store.
+
+- The ETag sent by the Caddy and H2O web servers is now calculated using the Nix store path.
Please see the comment below the headline
It is always better to rebase. You never want to merge master into a PR, because it can make resolving merge conflicts and further rebases almost impossible.
This is not fully reliable. A store entry should be reproducible, but that is not 100% guaranteed, especially with experimental impure derivations. This would work for content-addressed (CA) derivations.
Thank you @sephii for taking this over, and sorry for never getting around to it myself! I'm happy for you to rebase over my commits and I checked with @yanalunaterra; it's fine to just drop the test commits entirely and replace them with your own, since you rewrote them almost completely. For the patch commits themselves you can add your own
While this is true, it's not really relevant, as this matches (actually refines) the behaviour used by the nginx patch that has been in the nixpkgs tree for years. Currently ETags for caddy and h2o are based solely on the file size which is extremely unreliable and leads to far more false caching for paths in the Nix store than for regular files, so this is basically a strict improvement. Additionally the kinds of reproducibility failures one might expect (though hope to avoid) for Nix derivations - build timestamps, perhaps some nondeterministic ordering - are also the kinds that are relatively unimportant to invalidate a cache over. Any derivation that produces meaningfully semantically different outputs with the same hash is, I think, hopelessly broken, and anyone serving its output directly out of the Nix store will be running into problems already.
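To make the collision concrete, here is a minimal sketch. It assumes an ETag built only from mtime and size, both rendered in base 36; `weakETag` is a hypothetical helper written for illustration and is not taken from either server's source. Since every file in the Nix store has mtime = 1, two different builds of a file with the same byte length get the same ETag.

```go
package main

import (
	"fmt"
	"strconv"
)

// weakETag is a hypothetical helper (an assumption for illustration, not
// Caddy's or H2O's actual code). It derives an ETag from mtime and size only.
func weakETag(mtimeUnix, size int64) string {
	return `"` + strconv.FormatInt(mtimeUnix, 36) + strconv.FormatInt(size, 36) + `"`
}

func main() {
	// Two different store paths holding different content of equal length.
	// Both have mtime = 1, as all Nix store files do.
	oldBuild := weakETag(1, 4096) // e.g. the previous build of index.html
	newBuild := weakETag(1, 4096) // e.g. the new build, same size, new content

	// The ETags are identical, so a client revalidating with If-None-Match
	// would be told its stale copy is still current.
	fmt.Println(oldBuild == newBuild) // prints "true"
}
```

For regular files the mtime usually changes on every write, which is why the same scheme is far less of a problem outside the store.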
Unfortunately I don't see how this is possible, unless Caddy want to maintain unportable Nix-specific special cases upstream. The patch fundamentally relies on the knowledge that files underneath |
I think it could (hopefully) be simpler than this: unconditionally inject a module that does the same path resolution logic the current patch does and have it set/delete the ETag and Last-Modified headers appropriately. Presumably people could override the headers on top of that in their configuration if needed. So, in terms of what we're looking for from Caddy, we just need a way for a Caddy module to hook into the filename/stat info that The current approach in this PR patches the |
Since we already have the NixOS module run |
Seems reasonable, as long as it's still possible to override the headers in the configuration even with it injected. Unfortunately reading Caddy's For posterity, this 2009 commit has some explanation of why Nix uses mtime=1: NixOS/nix@14bc3ce (thanks to @DeeUnderscore on Matrix for digging this up). Basically it'd be reasonable if tools treated it as "no meaningful mtime" but at least 14 years ago a lot of them would apparently use it as a sentinel value for nonexistent files or refuse to load them entirely. Even if that's no longer true it's probably too late to change now since it would result in a huge number of tarballs and other artifacts needing to be rebuilt and have their hashes updated. |
@emilazy Just a thought, is it possible for NixOS to return the actual mod time in the stat syscall? That way the FS can have its mod time at 0 as needed for the hashing, but then at runtime the syscall can return the true value? |
We don't have a real modification time to use (and couldn't alter the stat results on all operating systems; the Nix store is just a normal filesystem). Paths in the Nix store are immutable and look like |
Nix has a serialization format, .nar, that deliberately doesn't record a mtime field to better support binary-reproducible builds (which it does quite well; the NixOS install CD has been byte-for-byte-reproducible for years now). Every build output gets coerced to that format for hashing, content coming back from a binary cache is in that format -- so the mtimes deliberately don't exist. |
@aanderse, quick question -- can you speak to the requirements that led to |
@emilazy: I see your point about needing to duplicate work with the staticfiles module. We don't need the path or stat data to just delete a meaningless @mholt: Would you accept a patch to staticfiles having it add details about which file a request resolved to to request context? |
Sure, but instead of adding a new value to the context, it might make more sense / be more useful as a var. One line of code should do it: (Pass in the request context) |
Yeah, if we could get Thanks for being open to modifications to help us achieve this - I know Nix is a very strange system at first sight and it's great to have an upstream willing to help accommodate our peculiar needs :) I assume that since all the plugins get linked into one binary there should be no appreciable performance overhead to doing things this way? |
Adds two variables for requests being handled as static files: - `caddyhttp.fileserver.path` - filesystem path as used by caddy to open file being served; may be an index file, a compressed proxy, or otherwise something other than the original content. - `caddyhttp.fileserver.info` - fs.FileInfo structure with basename and stat data. Per discussion in NixOS/nixpkgs#222354 (with thanks to @emilazy and @mholt)
Adds two variables for requests being handled as static files: - `handlers.file_server.path` - filesystem path as used by caddy to open file being served; may be an index file, a compressed proxy, or otherwise something other than the original content. - `handlers.file_server.info` - fs.FileInfo structure with basename and stat data. Per discussion in NixOS/nixpkgs#222354 (with thanks to @emilazy and @mholt)
Yes, this is the correct behaviour. See
We do something like this in a few modules, but here is another way to handle it that will probably suit your needs:
Okay so I ported the existing patch over to a middleware handler and I was going to write a long comment about the various tradeoffs involved but I'm not certain that we can actually do what we need inside a middleware handler (at least without duplicating a bunch of path resolution logic from Caddy) in the first place. I was going to say that relying on I had two potential alternatives for that. One was to keep a patch around like this, which is gross but probably easy to maintain:

diff --git a/modules/caddyhttp/routes.go b/modules/caddyhttp/routes.go
index 9be3d01a..cc7ad188 100644
--- a/modules/caddyhttp/routes.go
+++ b/modules/caddyhttp/routes.go
@@ -156,6 +156,13 @@ func (r *Route) ProvisionHandlers(ctx caddy.Context, metrics *Metrics) error {
 		return fmt.Errorf("loading handler modules: %v", err)
 	}
 	for _, handler := range handlersIface.([]any) {
+		if handler.(caddy.Module).CaddyModule().ID == "http.handlers.file_server" {
+			nixHandler, err := ctx.LoadModuleByID("http.handlers.nix_store_etag", nil)
+			if err != nil {
+				return fmt.Errorf("loading Nix store ETag handler: %v", err)
+			}
+			r.Handlers = append(r.Handlers, nixHandler.(MiddlewareHandler))
+		}
 		r.Handlers = append(r.Handlers, handler.(MiddlewareHandler))
 	}

...and the other was just to keep patching (Also the current patch is broken when serving precompressed files. Not sure if that's a problem carried over from my previous PR or if the support for that is new.) |
caddyserver/caddy#5556 (comment) points out that we can do this in a middleware handler, so I'll try getting that working. It still seems like we have a choice to make:
(btw I wish our module wrote out the JSON format directly; it doesn't make much sense to me for our structured configuration interface to output string templated Caddyfiles when those are designed for humans, but it'd be a compatibility break to fix that now...) |
It's several times longer than the direct patch and not the most elegant thing in the world, but...

package caddy_nix_store_etag_middleware

import (
	"context"
	"crypto/sha512"
	"encoding/hex"
	"io/fs"
	"net/http"
	"path/filepath"
	"strings"

	"github.com/caddyserver/caddy/v2"
	"github.com/caddyserver/caddy/v2/modules/caddyhttp"
	"github.com/caddyserver/caddy/v2/modules/caddyhttp/fileserver"
)

func init() {
	caddy.RegisterModule(Middleware{})
}

type Middleware struct{}

type responseWriterWrapper struct {
	*caddyhttp.ResponseWriterWrapper
	ctx         context.Context
	wroteHeader bool
}

func (Middleware) CaddyModule() caddy.ModuleInfo {
	return caddy.ModuleInfo{
		ID:  "http.handlers.nix_store_etag",
		New: func() caddy.Module { return new(Middleware) },
	}
}

func (m Middleware) ServeHTTP(w http.ResponseWriter, r *http.Request, next caddyhttp.Handler) error {
	w = &responseWriterWrapper{
		ResponseWriterWrapper: &caddyhttp.ResponseWriterWrapper{ResponseWriter: w},
		ctx:                   r.Context(),
		wroteHeader:           false,
	}

	return next.ServeHTTP(w, r)
}

func (rww *responseWriterWrapper) WriteHeader(status int) {
	if rww.wroteHeader {
		return
	}

	// 1xx responses aren't final; just informational
	if status < 100 || status > 199 {
		rww.wroteHeader = true
	}

	rww.setEtag()
	rww.ResponseWriterWrapper.WriteHeader(status)
}

func (rww *responseWriterWrapper) setEtag() {
	info, _ := caddyhttp.GetVar(rww.ctx, fileserver.StaticFileInfoVarKey).(fs.FileInfo)
	if info == nil {
		return
	}

	// Nix store files have mtime = 1
	if info.ModTime().Unix() != 1 {
		return
	}

	path, ok := caddyhttp.GetVar(rww.ctx, fileserver.StaticFilePathVarKey).(string)
	if !ok {
		return
	}

	var err error

	// avoid running filepath.Abs unless necessary, as it calls
	// filepath.Clean which is (relatively) expensive
	if !filepath.IsAbs(path) {
		path, err = filepath.Abs(path)
		if err != nil {
			return
		}
	}

	const storePrefix = "/nix/store/"
	if !strings.HasPrefix(path, storePrefix) {
		// Since mtime = 1, most likely a link into the store, e.g.:
		//
		//     /var/www/element.example.com
		//     -> /nix/store/00000000000000000000000000000000-element-web
		//
		// Note that filepath.EvalSymlinks is relatively expensive (~15 µs in
		// an HTTP/2 microbenchmark while testing this patch), so you probably
		// want to avoid relying on this codepath if you're optimizing for raw
		// request throughput (i.e., don't serve any mtime = 1 files that
		// aren't explicitly rooted in a derivation or explicit /nix/store
		// path in the Caddy configuration).
		path, err = filepath.EvalSymlinks(path)
		if err != nil || !strings.HasPrefix(path, storePrefix) {
			return
		}
	}

	// Hash the entire path so that ETag changes when switching /foo from:
	//     /nix/store/00000000000000000000000000000000-www/a/foo
	// to:
	//     /nix/store/00000000000000000000000000000000-www/b/foo
	pathDigest := sha512.Sum512_224([]byte(path))
	rww.ResponseWriterWrapper.Header().Set("Etag", `"`+hex.EncodeToString(pathDigest[:])+`"`)
}

func (rww *responseWriterWrapper) Write(d []byte) (int, error) {
	if !rww.wroteHeader {
		rww.WriteHeader(http.StatusOK)
	}

	return rww.ResponseWriterWrapper.Write(d)
}

var (
	_ caddyhttp.MiddlewareHandler = (*Middleware)(nil)
)

If we do decide to go with one of the middleware-using routes we should probably repeat the benchmarks from the last PR just to check that all this additional indirection doesn't make performance worse.
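The hashing step of the handler can be tried in isolation. The sketch below uses only the standard library and assumes the path has already been made absolute and had its symlinks resolved; `storePathETag` is a hypothetical helper name, not part of the module above.

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"strings"
)

const storePrefix = "/nix/store/"

// storePathETag is a hypothetical helper: it returns a quoted ETag derived
// from the full resolved path, or false if the path is outside the store.
func storePathETag(resolvedPath string) (string, bool) {
	if !strings.HasPrefix(resolvedPath, storePrefix) {
		return "", false
	}
	// Hash the whole path, not just the store hash, so that two files in
	// the same store entry get different ETags.
	digest := sha512.Sum512_224([]byte(resolvedPath))
	return `"` + hex.EncodeToString(digest[:]) + `"`, true
}

func main() {
	a, _ := storePathETag("/nix/store/00000000000000000000000000000000-www/a/foo")
	b, _ := storePathETag("/nix/store/00000000000000000000000000000000-www/b/foo")
	_, inStore := storePathETag("/var/www/index.html")

	fmt.Println(a != b)  // prints "true"
	fmt.Println(len(a))  // prints "58": 56 hex digits plus two quotes
	fmt.Println(inStore) // prints "false"
}
```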
That plugin looks pretty good to me. The primary overhead is 1 extra function on the call stack, which should be negligible. One thing to note, I believe the Go standard library calls I'll be reviewing relevant PR(s) momentarily. |
Yeah, I imagine the overhead is probably mostly just retrieving the information from the context rather than the actual additional handler, and I'm guessing even that probably comes out in the wash. Do you have any thoughts on the |
Actually, instead of adding Nix-specific logic, we can simply append the inode number to the ETag; it works for Nix in most cases. This is how actix-files does it: https://github.com/actix/actix-web/blob/17218dc6c88848938cebc560deafbf1c2184fa56/actix-files/src/named.rs#L379, which is used by miniserve.
@emilazy So, hear me out, I have an idea: Create a You could do the same thing as Maybe a |
So the main reason we'd prefer to avoid explicit user intervention to get this to work is that we want people to have a good out of the box experience and it's hard for users who aren't Nix experts to tell when you'll run into this problem or what's causing it when you do. For instance, we package frontend web applications that it's perfectly natural to specify directly as a web root in our structured configuration interface: those will end up served out of That said, personally I would be OK with sacrificing the command-line case and only handling it when Caddy is being configured with our structured interface (by injecting the handler into our generated configuration), which is how most people will be using Caddy on NixOS. But for various boring complex reasons this would be difficult for us to do in a way that handles cross-compilation scenarios, especially when people are adding on additional Caddy modules. For that reason I strongly prefer solutions that will let us maintain a local patch to have the correct behaviour in all cases, or have upstream Caddy handle it properly out of the box. Getting this done elegantly is the problem :)
So, this... is interesting? I think that if we're going to have to have Nix-local modules or patches or whatever then there's no real point to doing this over the current approach. But if upstream Caddy can strip off the Last-Modified header when appropriate (which it's not clear will happen, and the rest of this comment is conditional on that, since otherwise we need to do something ourselves regardless), and would be willing to incorporate inode numbers in their ETag calculation, then this seems like it might work fine. For Nix store files this would mean we're effectively keying on (size, inode). Here's a couple caveats and downsides I can think of:
I think taking all this into account my preference list from most to least preferred goes something like:
Caddy already uses mtime in its ETag, to which the same reasoning applies.
It can happen, it just needs to be done in a ResponseWriterWrapper. Anyway, it sounds like we're close to a solution that works for both projects. |
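As a rough illustration of doing this in a response writer wrapper, the following standard-library sketch drops a `Last-Modified` header whose timestamp is at or before Unix time 1 just before the headers are sent. The names `headerStripper`, `stripMeaninglessLastModified` and `serveWithMtime` are hypothetical, and this is not the change that was made in Caddy.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// headerStripper is a hypothetical wrapper around http.ResponseWriter.
type headerStripper struct {
	http.ResponseWriter
}

// stripMeaninglessLastModified deletes Last-Modified when it encodes an
// mtime of 0 or 1, which carries no information for Nix store files.
func stripMeaninglessLastModified(h http.Header) {
	t, err := http.ParseTime(h.Get("Last-Modified"))
	if err == nil && t.Unix() <= 1 {
		h.Del("Last-Modified")
	}
}

func (hs *headerStripper) WriteHeader(status int) {
	stripMeaninglessLastModified(hs.Header())
	hs.ResponseWriter.WriteHeader(status)
}

func (hs *headerStripper) Write(b []byte) (int, error) {
	// The first Write sends an implicit 200, so strip here as well.
	// Stripping is idempotent, so repeating it on later writes is harmless.
	stripMeaninglessLastModified(hs.Header())
	return hs.ResponseWriter.Write(b)
}

// serveWithMtime simulates a handler that sets Last-Modified from the given
// mtime and returns the header value the client would actually receive.
func serveWithMtime(mtimeUnix int64) string {
	rec := httptest.NewRecorder()
	w := &headerStripper{ResponseWriter: rec}
	w.Header().Set("Last-Modified", time.Unix(mtimeUnix, 0).UTC().Format(http.TimeFormat))
	_, _ = w.Write([]byte("hello"))
	return rec.Result().Header.Get("Last-Modified")
}

func main() {
	fmt.Printf("%q\n", serveWithMtime(1))          // prints ""
	fmt.Printf("%q\n", serveWithMtime(1700000000)) // a real mtime is kept
}
```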
Modified time is controllable though: for example, see |
Hey @mholt, sorry I left this hanging for a while. If you'd be happy incorporating the inode number as part of the standard ETag calculation in upstream Caddy and dropping the Last-Modified header for mtime=1, then yes, I think that'd work fine for us and we could forgo any patching or extension module. Up to you to decide whether you find the caveat discussed above acceptable: inodes are inherently machine/FS-specific, so a cluster of Caddy servers serving the same static content will likely calculate different ETags and therefore have inferior caching behaviour in a load-balanced setup. If you do go down this route, I would suggest applying a hash function to the input of (file size, mtime, inode) to generate the final ETag, as revealing machine-specific inode numbers could be an information leak. |
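A minimal sketch of that suggestion, assuming the three values have already been read from the file's stat data: `hashedETag` is a hypothetical function, and the choice of SHA-256 truncated to 128 bits is an assumption made for the example, not something decided in this thread.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// hashedETag is a hypothetical helper that hashes (size, mtime, inode) into
// an opaque ETag so the raw inode number is not sent to clients.
func hashedETag(size, mtimeUnix int64, inode uint64) string {
	// Fixed-width encoding keeps the three fields unambiguous.
	var buf [24]byte
	binary.BigEndian.PutUint64(buf[0:8], uint64(size))
	binary.BigEndian.PutUint64(buf[8:16], uint64(mtimeUnix))
	binary.BigEndian.PutUint64(buf[16:24], inode)

	sum := sha256.Sum256(buf[:])
	// Truncating to 16 bytes (an arbitrary choice) keeps the header short.
	return `"` + hex.EncodeToString(sum[:16]) + `"`
}

func main() {
	// Same size and mtime = 1, but different inodes: the ETags differ.
	oldFile := hashedETag(4096, 1, 1001)
	newFile := hashedETag(4096, 1, 1002)
	fmt.Println(oldFile != newFile) // prints "true"
}
```

Note that an unkeyed hash over such a small input space only obscures the inode; someone who knows the size and mtime could still recover it by brute force, so a keyed hash might be preferable if the leak is a real concern.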
@emilazy No worries, thanks for working with me/us on this. I think I'm alright with that caveat -- maybe, ideally the inode is only used if necessary (i.e. if modtime is 0 or 1)? Is there anything else you need from me? |
I’ll close this since the ETag fix is shipped in Caddy 2.7 (which is now in nixpkgs). Thanks for the fix! |
Description of changes
This is an updated version of #83111 originally submitted by @emilazy with the remaining open points fixed. Here’s what I’ve changed:
I’m not sure if it was better to rebase the commits from the existing PR, or to merge master in the PR branch. I did the latter, but I can rebase instead if that’s a better process.