
Customizing FS filters options #209

Open
arobase-che opened this issue Apr 29, 2019 · 20 comments
Labels
feature New feature or request

Comments

@arobase-che

Hi,
I'm trying to serve a file (a big one).
I tried this code:

extern crate warp;
use warp::Filter;

fn main() {

    let big = warp::get2()
        .and(warp::path::end())
        .and(warp::fs::file("./big"));

    warp::serve(big).run(([127, 0, 0, 1], 3030));
}

But when I try to download the file, it is incredibly slow!

$ wget "http://localhost:3030/" -O /dev/null
~67MB/s

While the same thing using Go:

$ wget "http://localhost:8080/big" -O /dev/null
~440MB/s

Here is the Go source code:

package main

import (
  "log"
  "net/http"
)

func main() {

  http.HandleFunc("/big", func(w http.ResponseWriter, r *http.Request) {
    http.ServeFile(w, r, "big")
  })

  log.Fatal(http.ListenAndServe(":8080", nil))
}

Isn't that a performance issue?

To give you some context: I'm trying to serve big files using Rust. The standard Go server is pretty good at this, and currently I'm not able to implement a Rust web server that can compete with the Go one, but I didn't expect the warp web framework to be so slow!

Did I miss something?

@seanmonstar
Owner

My first assumption is that Go is able to take advantage of sendfile, avoiding the copy from the file into userland and the copy back out to the socket.
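
Roughly, on Linux that looks like this (a minimal sketch using the libc crate with a blocking socket; the hard part would be wiring something like this into hyper's non-blocking, async I/O):

use std::fs::File;
use std::net::TcpStream;
use std::os::unix::io::AsRawFd;

// Sketch only: the kernel copies bytes from the file descriptor straight to the
// socket, skipping the round trip through a userspace buffer. Assumes Linux,
// the `libc` crate, and a blocking socket.
fn send_whole_file(socket: &TcpStream, file: &File, len: usize) -> std::io::Result<()> {
    let mut offset: libc::off_t = 0;
    let mut remaining = len;
    while remaining > 0 {
        // SAFETY: both raw fds stay valid for the duration of the call.
        let sent = unsafe {
            libc::sendfile(socket.as_raw_fd(), file.as_raw_fd(), &mut offset, remaining)
        };
        if sent < 0 {
            return Err(std::io::Error::last_os_error());
        }
        if sent == 0 {
            break; // EOF before `len` bytes; nothing more to send
        }
        remaining -= sent as usize;
    }
    Ok(())
}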

@arobase-che
Author

arobase-che commented Apr 30, 2019

That sounds plausible (and interesting)!
But it doesn't fully explain such low performance.

Even without using sendfile, a simple Stream reading the file in chunks from disk using tokio (or std::fs::File) and hyper (using wrap_stream) is about 3 times faster (on my laptop) than warp's current method of sending a file.


I can't confirm that Go uses sendfile (but it's likely, because a comment in server.go of the http package says so).

I don't think it's currently possible to use sendfile with hyper. How should it be implemented? (Should it be?) A new method on Body? A Body::send_file that takes a file descriptor?

@arobase-che
Author

OK, so I checked how easy it would be to use sendfile the way Go does. (I checked with strace; Go definitely uses sendfile.)
It's clearly hard.

Since the syscalls are made by the standard library, it would have to be implemented first in TcpStream (and/or UDP too), then in hyper, then in warp.

Not for tomorrow then. 😞

Thank you for the hint! I'm happy to understand why Go performs better than Rust here. ^^

Even so, warp's file-serving performance can be improved. I still don't understand why it's so slow.

@seanmonstar
Owner

You say it's slower than doing it through hyper? That's surprising! Care has been taken to try to provide an optimized file stream in warp. What does the code look like for hyper that serves the file faster?

@arobase-che
Author

Sorry, I should have been more responsive! Here is the first code I tried using tokio.

extern crate tokio;
extern crate hyper;
extern crate futures;

use futures::{Poll, Async, Stream, Future};
use std::io::{Error, ErrorKind};
use hyper::{Body, Response, Server, service::service_fn_ok};
use tokio::io::AsyncRead;

const BUF_SIZE : usize = 1<<13;

struct TokioStream {
    file: tokio::fs::File,
    buf: Vec<u8>,
}
impl Stream for TokioStream {
    type Item = Vec<u8>;
    type Error = Error;

    fn poll(&mut self) -> Poll<Option<Self::Item>, Self::Error> {
        match self.file.poll_read(&mut self.buf) {
            Ok(Async::Ready(0)) => {
                Ok(Async::Ready(None))
            },
            Ok(Async::Ready(_size)) => {
                Ok(Async::Ready(Some(Vec::from(&self.buf[..]))))
            },
            Ok(Async::NotReady) => Ok(Async::NotReady),
            Err(_) => Err(Error::new(ErrorKind::Other, "😭")),
        }
    }
}

fn main() {
    let addr = "[::1]:1337".parse().unwrap();

    let service = || {
        service_fn_ok( |_| {
            let task = tokio::fs::File::open("big")
                .and_then( |file|  {
                    Ok(TokioStream{ file: file,
                        buf: vec!(0; BUF_SIZE)
                    })
                });
            match task.wait() {
                Err(_) => Response::new(Body::from("Please (╥_╥)")),
                Ok(s) => Response::new(Body::wrap_stream(s)),
            }
        })
    };

    let server = Server::bind(&addr)
        .serve(service)
        .map_err(|e| eprintln!("Error: {}", e));

    hyper::rt::run(server);
}

I note that the buffer size greatly influences performance: using a buffer size of 1<<12 makes the "hyper code" as slow as the "warp" one.

@seanmonstar
Owner

Ok(Async::Ready(_size)) => {
    Ok(Async::Ready(Some(Vec::from(&self.buf[..]))))
}

Curious, without checking the number of bytes read, won't this sometimes send chunks of zeros (or whatever was left from a previous read)?

What OS are you testing this on? warp will try to use the blocksize from the file system to pick the best buffer size: https://github.com/seanmonstar/warp/blob/master/src/filters/fs.rs#L436
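
For reference, a rough sketch of reading that per-file block size with the standard library on Unix (not warp's exact code):

use std::os::unix::fs::MetadataExt;
use std::path::Path;

// Sketch: the filesystem reports a preferred I/O block size per file, which can
// be used to size the read buffer.
fn preferred_read_buf_size(path: &Path) -> std::io::Result<u64> {
    let meta = std::fs::metadata(path)?;
    Ok(meta.blksize()) // st_blksize; commonly 4096 on ext4
}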

@seanmonstar seanmonstar changed the title Performance problem Performance of fs filters May 3, 2019
@arobase-che
Author

Ok(Async::Ready(_size)) => {
    Ok(Async::Ready(Some(Vec::from(&self.buf[..]))))
}

Curious, without checking the number of bytes read, won't this sometimes send chunks of zeros (or whatever was left from a previous read)?

Yep, it does! The last chunk is padded with zeros.
It should be &self.buf[.._size]. ^^" Sorry, I tried a lot of things.

What OS are you testing this on? warp will try to use the blocksize from the file system to pick the best buffer size: https://github.com/seanmonstar/warp/blob/master/src/filters/fs.rs#L436

Arch Linux - Updated.

$ cat /etc/os-release
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="0;36"
HOME_URL="https://www.archlinux.org/"

I will check the buffer size (and blocksize) as soon as possible. ^^

@arobase-che
Author

arobase-che commented May 3, 2019

So I checked, and my device's block size is actually 1<<12 (4096 B).
However, larger buffer sizes seem to perform better. Why?
I don't know.

The laptop is really old, but there's nothing unusual about it.

@seanmonstar
Owner

So, something I've been thinking of is adding some sort of builder for fs filters, to allow customizing away from the defaults. This could include setting a desired read buffer size to override checking the blocksize. It'd be something like this:

let file = warp::fs::config()
    .read_buffer_size(1 << 13)
    // maybe you know the content-type better
    .content_type("text/x-my-format")
    // force a download (`content-disposition: attachment`)
    .attachment()
    // creates the `impl Filter` with the options set
    .file(path);

@seanmonstar seanmonstar changed the title Performance of fs filters Customizing fs filters read buffer size May 9, 2019
@seanmonstar seanmonstar added the feature New feature or request label May 9, 2019
@arobase-che
Author

Sounds great! :D

I will be happy to implement that. I'm studying for my exams ATM but will work on it ASAP. I may ask some questions.

To anyone: Feel free to start something before I do.

@caolan

caolan commented Jan 20, 2020

I'd just like to add that I also see very slow fs performance on Debian: it takes about 1.5 seconds to transfer a 1.5 MB file on the same machine. I was previously using hyper with hyper-staticfile without any issues.

In addition, is there a suggested way to add etags to files served using fs::dir?

@seanmonstar
Owner

@caolan I've just pushed a commit to master that will check the blocksize, but only use it if it's over 8192. Does that help your case?

Etag configuration isn't yet supported, but it'd be good to add!
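
For what it's worth, a hypothetical sketch of what such an option could compute per file (not an existing warp API): a weak validator derived from size and modification time, similar to what other static-file servers use.

use std::fs::Metadata;
use std::os::unix::fs::MetadataExt;

// Hypothetical sketch, not warp's code: a weak ETag built from mtime and length.
fn weak_etag(meta: &Metadata) -> String {
    format!("W/\"{:x}-{:x}\"", meta.mtime(), meta.len())
}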

@caolan

caolan commented Jan 21, 2020

@seanmonstar no noticeable difference with master (e725bca), unfortunately.

@Virgiel

Virgiel commented Jan 21, 2020

I also see limited performance on Raspbian: I cannot exceed 1.7 MB/s with master.

@seanmonstar
Owner

I just noticed something that was preventing the proper read buffer size from being used; it will be fixed in #409.

@caolan

caolan commented Jan 24, 2020

@seanmonstar I've tested again with master (49ed427) and can confirm the performance has greatly improved. A 1.5 MB file that was taking ~2 seconds to load on my laptop now takes ~100-120 ms.

@shepmaster

I have a setup with files that are precompressed:

app-3b9b685e302fd7e597ba.js
app-3b9b685e302fd7e597ba.js.gz

I'd like to be able to say "if the request supports gzip, and there's a compressed file, then serve it, otherwise the uncompressed version". I also want to add headers based on the creation/modification date of the file.

I don't expect that this is a super common practice, so I don't expect warp to support it out of the box. What would be nice is to have access to some of the pieces to combine them in my own way.

For example, I don't want to re-write the path traversal protection that Warp (presumably) has to get the path from the request, nor do I want to re-write the part of the code that takes a File and returns it.

Is there a possibility of gaining access to these components?
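
For illustration, here's roughly what I'd have to hand-roll today without those pieces — a rough, untested sketch assuming warp 0.3 (which re-exports http and hyper); the route, directory, and blanket content type are placeholders, it reads whole files into memory instead of streaming, and it does not reproduce warp's traversal protection:

use std::path::PathBuf;

use warp::{http::Response, hyper::Body, Filter};

#[tokio::main]
async fn main() {
    let assets = warp::path("assets")
        .and(warp::path::tail())
        .and(warp::header::optional::<String>("accept-encoding"))
        .and_then(|tail: warp::path::Tail, enc: Option<String>| async move {
            let plain = PathBuf::from("./assets").join(tail.as_str());
            let gz = PathBuf::from(format!("{}.gz", plain.display()));
            let wants_gzip = enc.map_or(false, |e| e.contains("gzip"));

            // Prefer the precompressed twin when it exists and the client accepts it.
            let (path, encoding) = if wants_gzip && gz.is_file() {
                (gz, "gzip")
            } else {
                (plain, "identity")
            };

            let bytes = tokio::fs::read(&path)
                .await
                .map_err(|_| warp::reject::not_found())?;

            let resp = Response::builder()
                .header("content-type", "application/octet-stream")
                .header("content-encoding", encoding)
                .body(Body::from(bytes))
                .unwrap();
            Ok::<_, warp::Rejection>(resp)
        });

    warp::serve(assets).run(([127, 0, 0, 1], 3030)).await;
}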

@seanmonstar seanmonstar changed the title Customizing fs filters read buffer size Customizing FS filters options Jan 27, 2020
@seanmonstar
Owner

@shepmaster

Yeah, adding some more building blocks would likely help people build their specific requirements.

I don't want to re-write the path traversal protection that Warp (presumably) has to get the path from the request

Agreed, so what building block makes sense here? A way to join a file Path with warp::path::tail(), maybe?

nor do I want to re-write the part of the code that takes a File and returns it.

I think this would be #171, or close to it.

@shepmaster

what building block makes sense here

The way Rocket does it, which I appreciate, is to make the conversion from URL path to PathBuf perform the check. In warp syntax, maybe something like:

warp::get("assets").and_then(file_path()).map(|path: PathBuf| {
    // `path` does not have traversal problems
})

If someone wanted the path without the check applied, I'd suggest a newtype (e.g. UncheckedPath(PathBuf)) that they can opt into. This way, the default / easy path is the safer path.


As an aside, I think (but have not tested) that the current traversal protection is overly aggressive. For the assets/ example, I think that assets/alpha/../beta should be fine, but from my reading of the code it would be rejected.
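
To illustrate the check-by-construction idea (a rough sketch, not warp's actual code): build the joined path one component at a time and refuse anything that isn't a plain segment. Allowing `..` as long as the result stays under the base would be the relaxed behaviour for the assets/alpha/../beta case above.

use std::path::{Component, Path, PathBuf};

// Sketch only: reject `..`, `.`, roots, and prefixes so the tail can never
// escape the base directory.
fn checked_join(base: &Path, tail: &str) -> Option<PathBuf> {
    let mut out = base.to_path_buf();
    for comp in Path::new(tail).components() {
        match comp {
            Component::Normal(segment) => out.push(segment),
            _ => return None,
        }
    }
    Some(out)
}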

@urkle
Contributor

urkle commented Feb 1, 2024

Some other options I would like to have control over in fs (file and dir):

  1. ability to set/use etags
  2. ability to not set last modified
  3. ability to set Cache-Control header (or any other headers?)

Right now the cache handling for the default fs::dir() is poor: new deployments of the files it serves are rarely detected and require a force refresh in the browser to pick up. So being able to adjust how HTML files are handled separately from, say, assets (JS, CSS, images), which I have cache-busting names for, would make this significantly more usable.
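
As a stop-gap I can wrap the filters myself (rough, untested sketch assuming warp 0.3; the directories and header values are just examples), but handling HTML and assets differently inside fs::dir itself would be cleaner:

use warp::Filter;

#[tokio::main]
async fn main() {
    // HTML should be revalidated on every request...
    let pages = warp::fs::dir("./public/html")
        .with(warp::reply::with::header("cache-control", "no-cache"));

    // ...while cache-busted assets can be cached aggressively.
    let assets = warp::path("assets")
        .and(warp::fs::dir("./public/assets"))
        .with(warp::reply::with::header(
            "cache-control",
            "public, max-age=31536000, immutable",
        ));

    warp::serve(assets.or(pages)).run(([127, 0, 0, 1], 3030)).await;
}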

@seanmonstar, is the contract you suggested in #209 (comment) acceptable? If so, I can start a PR that adds it.
