Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ETag / 304 unmodified support? #173

Open
onionhammer opened this issue Feb 15, 2023 · 10 comments
Open

ETag / 304 unmodified support? #173

onionhammer opened this issue Feb 15, 2023 · 10 comments
Labels
enhancement New feature or request help wanted Extra attention is needed v2 v2 release

Comments

@onionhammer
Copy link

onionhammer commented Feb 15, 2023

It seems like basically everything right now every html page seems to come back with cache control max age of 86400, if you deploy a new version of your app, most bundlers will hash asset names to cachebust other than your HTML files.

Describe the solution you'd like
Support etags, so people can configure HTML files and other non-hashed files to send not modified responses.

Describe alternatives you've considered
Using a different static web server

@joseluisq
Copy link
Collaborator

Yes, definitely a nice to have feature.

@joseluisq joseluisq added enhancement New feature or request help wanted Extra attention is needed v2 v2 release labels Feb 15, 2023
@onionhammer
Copy link
Author

onionhammer commented Feb 15, 2023

I'm not a rustacean, but I may give it a go for fun next weekend; this source code looks pretty clean & self explanatory; if you have any thoughts or advice on direction/libs to utilize for high-perf file hashing (to generate an etag) or an algorithm, maybe post it here.

@joseluisq
Copy link
Collaborator

I'm not a rustacean, but I may give it a go for fun next weekend; this source code looks pretty clean & self explanatory.

Sure, the code in general is easy to jump-in, so don't hesitate to give it a try.

if you have any thoughts or advice on direction/libs to utilize for high-perf file hashing (to generate an etag) or an algorithm, maybe post it here.

I found interesting this Actix file content, so you could have it a look at https://github.com/actix/actix-web/blob/master/actix-files/src/named.rs#L378

Just let me know here if you need any other assistance on your SWS journey.

@onionhammer
Copy link
Author

I looked into it, I think it would honestly be very little effort to put in defaults just for the common use case of "bundled app with static assets that have cache-busting names", but actually putting in a real etag (not weak etag - with the W/ prefix) could be a lot of work.

The thing that lead me to rethink my decision to plop in a quick-and-easy new setting, was I think adding etag on top of what you have today would just add technical debt on top of the hardcoded cache preferences that are already in the program. Probably prior to etag support comes, the program will need more robust cache configuration options

@joseluisq
Copy link
Collaborator

I looked into it, I think it would honestly be very little effort to put in defaults just for the common use case of "bundled app with static assets that have cache-busting names", but actually putting in a real etag (not weak etag - with the W/ prefix) could be a lot of work.

What exactly could be a lot of work in your opinion? Do you mean an algo to generate/validate strong etags efficiently?
I think that the entity_tag crate could help here.

The thing that lead me to rethink my decision to plop in a quick-and-easy new setting, was I think adding etag on top of what you have today would just add technical debt on top of the hardcoded cache preferences that are already in the program.

The current cache-control preferences are already part of the general section settings offered by SWS as one of several defaults. But, cache-control is behind a simple bool flag. So, it should not be a big deal against a new etag flag.
Also, headers like cache-control can be overwritten by users on demand.

Probably prior to etag support comes, the program will need more robust cache configuration options

I think that the etag feature could have its own advanced section place. So, users can have robust cache configuration options and the freedom to tweak them as needed. Also, SWS advanced options are only available via the config.toml file.

If you have ideas about how to improve the cache mechanism of SWS, please feel free to share them.

@joseluisq
Copy link
Collaborator

Just to let know here that I started to work on a module to compute/check strong Etags via the ahash crate under an advanced TOML entry.
I eventually will integrate that module on SWS in a couple of releases or so, stay tuned.

@joseluisq joseluisq self-assigned this Mar 4, 2023
@joseluisq joseluisq removed the help wanted Extra attention is needed label Mar 4, 2023
@joseluisq
Copy link
Collaborator

Turned out that it got more tricky than I expected. You can find more context and progress on this thread https://users.rust-lang.org/t/how-to-compute-a-byte-for-byte-etag-using-tokio/92481

But besides, until exploring Trailer. We could probably re-think the idea of going with a weak ETag instead as several implementations do. E.g https://github.com/actix/actix-web/blob/master/actix-files/src/named.rs#L377-L407

So I will leave it like this for now until I find time again to probably come back to the topic.
In the meantime, If someone would like to help, please feel free to go ahead.

@joseluisq joseluisq added the help wanted Extra attention is needed label Apr 13, 2023
@joseluisq joseluisq removed their assignment Apr 13, 2023
@dnagir
Copy link

dnagir commented Jan 19, 2024

It is probably possible to generate a weak ETag without having to read the whole body of a file.

It can be done, for example, this way (just adding a patch on top of 8c6ab53 as it's not worth a PR):

diff --git a/src/static_files.rs b/src/static_files.rs
index 6165696..df726e3 100644
--- a/src/static_files.rs
+++ b/src/static_files.rs
@@ -597,9 +597,10 @@ async fn response_body(
     conditionals: Conditionals,
 ) -> Result<Response<Body>, StatusCode> {
     let mut len = meta.len();
-    let modified = meta.modified().ok().map(LastModified::from);
+    let modified = meta.modified().ok();
+    let last_modified_header = modified.map(LastModified::from);
 
-    match conditionals.check(modified) {
+    match conditionals.check(last_modified_header) {
         Cond::NoBody(resp) => Ok(resp),
         Cond::WithBody(range) => {
             let buf_size = optimal_buf_size(meta);
@@ -645,9 +646,12 @@ async fn response_body(
                     resp.headers_mut().typed_insert(ContentType::from(mime));
                     resp.headers_mut().typed_insert(AcceptRanges::bytes());
 
-                    if let Some(last_modified) = modified {
+                    if let Some(last_modified) = last_modified_header {
                         resp.headers_mut().typed_insert(last_modified);
                     }
+                    if let Some(last_modified) = modified {
+                        insert_etag(&mut resp, last_modified);
+                    }
 
                     Ok(resp)
                 })
@@ -663,6 +667,19 @@ async fn response_body(
     }
 }
 
+/// Inserts the ETag header based on the provided modified time. The value adeed is a weak ETag
+/// that also contains a fixed prefix `LMT` (Last Modified Time) to allow for future changes of the
+/// format of the ETag.
+///
+/// An example value might look like: `W/"LMT1705668081589469821"`.
+fn insert_etag(resp: &mut Response<Body>, modified: SystemTime) {
+    if let Ok(elapsed) = modified.duration_since(std::time::UNIX_EPOCH) {
+        let etag = format!(r#"W/"LMT{}""#, elapsed.as_nanos());
+        resp.headers_mut()
+            .typed_insert(headers::ETag::from_str(&etag).unwrap());
+    }
+}
+
 struct BadRange;
 
 fn bytes_range(range: Option<Range>, max_len: u64) -> Result<(u64, u64), BadRange> {
diff --git a/tests/compression_static.rs b/tests/compression_static.rs
index f614c6c..e8b43a0 100644
--- a/tests/compression_static.rs
+++ b/tests/compression_static.rs
@@ -66,6 +66,14 @@ async fn compression_static_file_exists() {
         assert_eq!(headers["content-encoding"], "gzip");
         assert_eq!(headers["accept-ranges"], "bytes");
         assert!(!headers["last-modified"].is_empty());
+        {
+            let etag = headers["etag"].to_str().unwrap_or("");
+            assert!(
+                etag.starts_with(r#"W/MMS""#),
+                "should contain weak etag in {}",
+                etag
+            );
+        }
         assert_eq!(
             &headers["content-type"], "text/html",
             "content-type is not html"

We would still need to extend the Conditionals::check to support the If-None-Match and If-Match headers (on top of the "If-(Un)Modified", "Range") and respond appropriately with a 200 or 304 correctly when a browser requests the resource to be re-validated.

But the biggest problem with the ETag is that the re-validation will never be requested for at least the Cache-Control: max-age duration.

So if the objective is to be able to get a fresh resource when it changes, then just making use of an ETag will not solve this. We would need to probably remove Cache-Control or even set Cache-Control to no-store or no-cache or start doing other combinations of caching mechanisms.

I'd suggest that this complexity might be hard to justify.
At the end of the day, as soon as there is caching, there is always going to be stale resource.

There are only two hard things in Computer Science: cache invalidation and naming things. 😄

@joseluisq
Copy link
Collaborator

@dnagir as I said before. We can adopt a weak Etag using a combination of modified time and file size similar to your patch.

However, I should be also in favor of adjusting the Conditionals::check to evaluate at least If-None-Match, If-Match and If-Unmodified-Since headers. Maybe something like https://github.com/weihanglo/sfz/blob/master/src/http/conditional_requests.rs#L25

PR welcome.

@palant
Copy link
Contributor

palant commented Apr 19, 2024

Note that a weak Etag has performance advantages which is likely why nginx uses it ({last_mod_time}-{file_size} with both parameters in hex). In particular, it’s a prerequisite if one is to use sendfile functionality to send file contents in the most efficient way possible, something that SWS might want to adapt in future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed v2 v2 release
Projects
None yet
Development

No branches or pull requests

4 participants