Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for unbuffered writes #229

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nightmared
Copy link

Hello,
this PR aims to tackle one problem I had: streaming the output of a program while it is running over HTTP, similarly to the Github/GitLab CI "live" screens.
In fact, the goal is to perform a request to the server with cURL inside a gitlab-ci job, and see the output unfolds as the program is executed:


              web browser
----------    ------------> -------------------
| CLIENT |                  |  GitLab CI job  |
----------   <------------  -------------------
               streams the     |              ∧
                  output       |              |
                               | executes     | prints the
                               |              | output
                               |              | "live"
                               ∨              |
                             -------------------
                             |       cURL      |
                             -------------------
                              |               ∧
                              |               | streaming
                              | HTTP          | (chunked
                              | query         | transfer)
                              |               | of stdin
                              |               | and stdout
                  spawns      ∨               |
-----------  <------------   --------------------
| program |                  | tiny-http server |
-----------   ------------>  --------------------
               sends stdout
               and stdin in
                 a pipe

Without this patch, the program executes in its entirety, and the output is sent all at once
once the pipe where the program writes its output is closed.

To achieve that goal, I submit two "sub-features":

  • The ability to disable buffering inside Response objects with the Response::with_buffered method.
    Enabling this will force the transfer encoding to be TransferEncoding::Chunked and will ask the
    chunks encoder to flush to its underlying writer on every write.
  • To get "instantaneous" write, disabling buffering in the chunks encoder is not enough, as the underlying
    writer returned when calling Server::recv() (ClientConnection.sink) is in fact a BufWriter wrapping
    the "real" output. The buffered_writer option in ServerConfig, when set to false while instantiating
    a new server, omits the BufWriter to write to the TcpStream withotu any buffering. The cost of that
    abstraction is that ClientConnection.sink now boxes the writer to be able to choose between
    BufWriter<RefinedTcpStream> or RefinedTcpStream dynamically, which means there is now one additional
    pointer deference. I do however expect the performance impact to be small as this pointer is then stored
    in an Arc<Mutex<>>, and I think locking/unlocking the mutex should be more costly that deferencing the
    pointer.

I expect this to decrease performance when sending big files, which is why these two sub-features are disabled
by default, and must explicitly be opted-in (by calling the with_buffered method for the first, and by
instanciating the server with the buffered_writer setting for the second).

Also note that the current iteration of this work breaks the current ABI, as it updates ServerConfig to
add the buffered_writer option, and ServerConfig was already exposed through the Server::new function.

Copy link
Member

@bradfier bradfier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @nightmared thanks so much for planning and building this draft.

First of all to give you some confidence to continue, I'd be happy to merge this feature, I think it's a useful extension on the current library behaviour.

If you could include some of your commentary on the change in the commit message when you finalise the PR that would be great too.

src/client.rs Outdated
Comment on lines 54 to 57
pub fn new(
write_socket: RefinedTcpStream,
mut read_socket: RefinedTcpStream,
buffered: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First off, this is a public API and I think we should minimize breakage where possible, so I'd prefer a new_unbuffered method rather than changing the signature of new.

I'd also prefer an Enum BufferMode { Buffered, Unbuffered } or something like that, I feel that boolean behaviour control parameters are an antipattern in Rust where we have such great support for enums.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed for the enum and minizing breakage.

However ClientConnection does not appear to be public API? I do not see how a client can use it, because it is not reexported publicly in lib.rs, and the module is not exposed publicly either.

I will submit a new version shortly, but without a new method for ClientConnection as I don't think there will be any API breakage there (and thus any advantage. If you want me to still add a different method, that's not an issue at all, and I will do so gladly.

As for the breakage in ServerConfig, I will submit a small change where the buffering option is hidden behind an opaque struct, so that it can be further modified in the future without breaking the API (but that won't prevent that initial API breakage, sadly). Please tell me what you think of that. To be honest, I think that method could also be extended to cover all of ServerConfig.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The library in general predates pub(crate) so there's rather a lot of these mistakes to be made, my apologies if it turns out this is really an internal type that's mistakenly marked pub.

I'll re-review your latest version ASAP.


impl ServerConfigAdvanced {
pub fn new() -> Self {
Self::default()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I'm not certain providing a default impl of ServerConfigAdvanced is a great idea, maybe I should drop that and explicitly instantiate the configuration in the new method.
Because the Default impl is public-facing, so if we commit to that, we will need to maintain a Default impl on that structure in the future.
That said, we do have to provide default values anyway (wether in the default impl or in the new method), so I don't feel strongly about this either way. But if the idea is to provide some opaque structure with a builder pattern, I guess the less we expose API-wide, the better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if providing a Defaul implementation is any more or less public than an opaque new(), in that we would need to preserve any behavioural expectations anyway - I would like a builder pattern configuration mechanism anyway, but it doesn't need to be at the expense of Default.

This is achieved through two different means:
- The ability to disable buffering inside `Response` objects with the
  `Response::with_buffering` method. Enabling this will force the transfer
  encoding to be `TransferEncoding::Chunked` and will configure the chunks
  encoder to flush to its underlying writer on every write.
- To get "instantaneous" write, disabling buffering in the chunks encoder is
  not enough, as the underlying writer returned when calling `Server::recv()`
  (`ClientConnection.sink`) is in fact a `BufWriter` wrapping the "real" output.
  The `writer_buffering` parameter in `ServerConfig.advanced` can alter the
  server behavior to omit the BufWriter when writing to the TcpStream.
  The cost of that abstraction is that `ClientConnection.sink` now boxes the
  writer to be able to choose between `BufWriter<RefinedTcpStream>` and
  `RefinedTcpStream` dynamically, which means there is now one additional
   pointer deference. However, this pointer is then stored in an Arc<Mutex<>>,
   and locking/unlocking the mutex is probbaly more expensive that deferencing
   the pointer.

This will probably decrease performance significantly when sending big files,
which is why these two subfeatures are disabled by default, and must be opted-in
(by calling the `with_buffering` method for the first, and by instanciating the
server with the `with_writer_buffering_mode` method for the second).
@nightmared
Copy link
Author

Alright, I have reworked the PR to rebase it on master, and also:

  • removed all the boolean parameters.
  • Added a new MaybeBufferedWriter enum to remove the Box<dyn Write> that I have introduced in the previous version. This should lead to slighter better performance (and mainly produce less allocations).

Note that the rustc-1.56 failure is not due to the PR itself, but the need to bump the minimal version in the CI to 1.57 for rustls.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants