Support for unbuffered writes #229

nightmared · 2022-08-26T16:20:12Z

Hello,
this PR aims to tackle one problem I had: streaming the output of a program while it is running over HTTP, similarly to the Github/GitLab CI "live" screens.
In fact, the goal is to perform a request to the server with cURL inside a gitlab-ci job, and see the output unfolds as the program is executed:


              web browser
----------    ------------> -------------------
| CLIENT |                  |  GitLab CI job  |
----------   <------------  -------------------
               streams the     |              ∧
                  output       |              |
                               | executes     | prints the
                               |              | output
                               |              | "live"
                               ∨              |
                             -------------------
                             |       cURL      |
                             -------------------
                              |               ∧
                              |               | streaming
                              | HTTP          | (chunked
                              | query         | transfer)
                              |               | of stdin
                              |               | and stdout
                  spawns      ∨               |
-----------  <------------   --------------------
| program |                  | tiny-http server |
-----------   ------------>  --------------------
               sends stdout
               and stdin in
                 a pipe

Without this patch, the program executes in its entirety, and the output is sent all at once
once the pipe where the program writes its output is closed.

To achieve that goal, I submit two "sub-features":

The ability to disable buffering inside Response objects with the Response::with_buffered method.
Enabling this will force the transfer encoding to be TransferEncoding::Chunked and will ask the
chunks encoder to flush to its underlying writer on every write.
To get "instantaneous" write, disabling buffering in the chunks encoder is not enough, as the underlying
writer returned when calling Server::recv() (ClientConnection.sink) is in fact a BufWriter wrapping
the "real" output. The buffered_writer option in ServerConfig, when set to false while instantiating
a new server, omits the BufWriter to write to the TcpStream withotu any buffering. The cost of that
abstraction is that ClientConnection.sink now boxes the writer to be able to choose between
BufWriter<RefinedTcpStream> or RefinedTcpStream dynamically, which means there is now one additional
pointer deference. I do however expect the performance impact to be small as this pointer is then stored
in an Arc<Mutex<>>, and I think locking/unlocking the mutex should be more costly that deferencing the
pointer.

I expect this to decrease performance when sending big files, which is why these two sub-features are disabled
by default, and must explicitly be opted-in (by calling the with_buffered method for the first, and by
instanciating the server with the buffered_writer setting for the second).

Also note that the current iteration of this work breaks the current ABI, as it updates ServerConfig to
add the buffered_writer option, and ServerConfig was already exposed through the Server::new function.

bradfier

Hey @nightmared thanks so much for planning and building this draft.

First of all to give you some confidence to continue, I'd be happy to merge this feature, I think it's a useful extension on the current library behaviour.

If you could include some of your commentary on the change in the commit message when you finalise the PR that would be great too.

bradfier · 2022-08-26T20:03:43Z

src/client.rs

    pub fn new(
        write_socket: RefinedTcpStream,
        mut read_socket: RefinedTcpStream,
+        buffered: bool,


First off, this is a public API and I think we should minimize breakage where possible, so I'd prefer a new_unbuffered method rather than changing the signature of new.

I'd also prefer an Enum BufferMode { Buffered, Unbuffered } or something like that, I feel that boolean behaviour control parameters are an antipattern in Rust where we have such great support for enums.

Agreed for the enum and minizing breakage.

However ClientConnection does not appear to be public API? I do not see how a client can use it, because it is not reexported publicly in lib.rs, and the module is not exposed publicly either.

I will submit a new version shortly, but without a new method for ClientConnection as I don't think there will be any API breakage there (and thus any advantage. If you want me to still add a different method, that's not an issue at all, and I will do so gladly.

As for the breakage in ServerConfig, I will submit a small change where the buffering option is hidden behind an opaque struct, so that it can be further modified in the future without breaking the API (but that won't prevent that initial API breakage, sadly). Please tell me what you think of that. To be honest, I think that method could also be extended to cover all of ServerConfig.

The library in general predates pub(crate) so there's rather a lot of these mistakes to be made, my apologies if it turns out this is really an internal type that's mistakenly marked pub.

I'll re-review your latest version ASAP.

nightmared · 2022-08-26T20:53:01Z

src/lib.rs

+
+impl ServerConfigAdvanced {
+    pub fn new() -> Self {
+        Self::default()


To be honest, I'm not certain providing a default impl of ServerConfigAdvanced is a great idea, maybe I should drop that and explicitly instantiate the configuration in the new method.
Because the Default impl is public-facing, so if we commit to that, we will need to maintain a Default impl on that structure in the future.
That said, we do have to provide default values anyway (wether in the default impl or in the new method), so I don't feel strongly about this either way. But if the idea is to provide some opaque structure with a builder pattern, I guess the less we expose API-wide, the better?

I don't know if providing a Defaul implementation is any more or less public than an opaque new(), in that we would need to preserve any behavioural expectations anyway - I would like a builder pattern configuration mechanism anyway, but it doesn't need to be at the expense of Default.

This is achieved through two different means: - The ability to disable buffering inside `Response` objects with the `Response::with_buffering` method. Enabling this will force the transfer encoding to be `TransferEncoding::Chunked` and will configure the chunks encoder to flush to its underlying writer on every write. - To get "instantaneous" write, disabling buffering in the chunks encoder is not enough, as the underlying writer returned when calling `Server::recv()` (`ClientConnection.sink`) is in fact a `BufWriter` wrapping the "real" output. The `writer_buffering` parameter in `ServerConfig.advanced` can alter the server behavior to omit the BufWriter when writing to the TcpStream. The cost of that abstraction is that `ClientConnection.sink` now boxes the writer to be able to choose between `BufWriter<RefinedTcpStream>` and `RefinedTcpStream` dynamically, which means there is now one additional pointer deference. However, this pointer is then stored in an Arc<Mutex<>>, and locking/unlocking the mutex is probbaly more expensive that deferencing the pointer. This will probably decrease performance significantly when sending big files, which is why these two subfeatures are disabled by default, and must be opted-in (by calling the `with_buffering` method for the first, and by instanciating the server with the `with_writer_buffering_mode` method for the second).

nightmared · 2023-04-28T18:23:10Z

Alright, I have reworked the PR to rebase it on master, and also:

removed all the boolean parameters.
Added a new MaybeBufferedWriter enum to remove the Box<dyn Write> that I have introduced in the previous version. This should lead to slighter better performance (and mainly produce less allocations).

Note that the rustc-1.56 failure is not due to the PR itself, but the need to bump the minimal version in the CI to 1.57 for rustls.

bradfier requested changes Aug 26, 2022

View reviewed changes

nightmared force-pushed the unbuffering_support_pr branch from 2355e58 to 3a4ed4f Compare August 26, 2022 20:46

nightmared commented Aug 26, 2022

View reviewed changes

nightmared force-pushed the unbuffering_support_pr branch from 3a4ed4f to 4968af3 Compare April 28, 2023 18:18

nightmared requested a review from bradfier April 28, 2023 18:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for unbuffered writes #229

Support for unbuffered writes #229

nightmared commented Aug 26, 2022

bradfier left a comment

bradfier Aug 26, 2022

nightmared Aug 26, 2022

bradfier Sep 8, 2022

nightmared Aug 26, 2022

bradfier Sep 8, 2022

nightmared commented Apr 28, 2023

Support for unbuffered writes #229

Are you sure you want to change the base?

Support for unbuffered writes #229

Conversation

nightmared commented Aug 26, 2022

bradfier left a comment

Choose a reason for hiding this comment

bradfier Aug 26, 2022

Choose a reason for hiding this comment

nightmared Aug 26, 2022

Choose a reason for hiding this comment

bradfier Sep 8, 2022

Choose a reason for hiding this comment

nightmared Aug 26, 2022

Choose a reason for hiding this comment

bradfier Sep 8, 2022

Choose a reason for hiding this comment

nightmared commented Apr 28, 2023