Efficient implementation of vectored writes for BufWriter #3163
Conversation
Thanks for the submission. Could we add some tests for the new functions?
I had some tests written for #3092, but these relied on an async-friendly
Force-pushed from d34f468 to c2aee65
I have rebased the branch and added some tests.
Force-pushed from c2aee65 to f0bf96c
I'm sorry for taking so long to review this. The PR got lost among all the other PRs we get.
tokio/src/io/util/buf_writer.rs (Outdated)
```rust
fn poll_write_vectored(
    mut self: Pin<&mut Self>,
    cx: &mut Context<'_>,
    bufs: &[IoSlice<'_>],
```
In general, I think that this could be simplified quite a lot if you make use of the fact that it is possible to reassign `bufs` to a sub-slice of itself to remove slices from the beginning of the buffer. E.g., to remove the first slice after having used it:

```rust
bufs = &bufs[1..];
```
Thank you for the suggestion. I could not simplify by much, but hopefully the code is more readable now without an explicit iterator.
I gave it a try myself and came up with this. What do you think?
```rust
} else {
    // Remove empty buffers at the beginning of bufs.
    while bufs.first().map(|buf| buf.len()) == Some(0) {
        bufs = &bufs[1..];
    }
    if bufs.is_empty() {
        return Poll::Ready(Ok(0));
    }
    // Flush if the first buffer doesn't fit.
    let first_len = bufs[0].len();
    if first_len >= self.buf.capacity() - self.buf.len() {
        ready!(self.as_mut().flush_buf(cx))?;
        debug_assert!(self.buf.is_empty());
    }
    let me = self.project();
    if first_len >= me.buf.capacity() {
        // The slice is at least as large as the buffering capacity,
        // so it's better to write it directly, bypassing the buffer.
        debug_assert!(me.buf.is_empty());
        return me.inner.poll_write(cx, &*bufs[0]);
    }
    // Append the buffers that fit in the internal buffer.
    // The first buffer is guaranteed to fit due to the flush above.
    let mut total_written = 0;
    for buf in bufs {
        if buf.len() > me.buf.capacity() - me.buf.len() {
            break;
        } else {
            me.buf.extend_from_slice(&*buf);
            total_written += buf.len();
        }
    }
    debug_assert!(total_written != 0);
    Poll::Ready(Ok(total_written))
}
```
@Darksonn Thanks! I have unrolled the first iteration of appending, though: since we're aiming to improve performance, the check in the loop body would otherwise be done redundantly, and there is a suitable `else` branch for it.
Implement AsyncWrite::poll_write_vectored for BufWriter, making use of the buffer to coalesce data from the passed slices. Make exceptions for cases when writing directly to the underlying object is more efficient, which differ depending on whether the underlying writer itself supports vectored output. Change the implementation of AsyncWrite::is_write_vectored to return true.
Forward poll_write_vectored/is_write_vectored to the underlying object for BufReader. Do the same in BufStream, which has the effect of is_write_vectored returning true because BufWriter's implementation now does.
Simplify branching and do what poll_write does: do not buffer slices partially, optimizing for the most likely case where slices are much smaller than the buffer, while retaining special treatment for oversized slices.
In the implementation of AsyncWrite::poll_write_vectored for BufWriter, the total length of the data can technically overflow usize even with safely obtained buffer slices, since slices may overlap.
Improve readability by updating the `IoSlice` slice reference as the iteration consumes its constituent slices, rather than using an explicit iterator.
Force-pushed from f0bf96c to f0d443d
Partially incorporated code suggested by @Darksonn.
Force-pushed from 18b991e to 2dff0ad
Co-authored-by: Alice Ryhl <alice@ryhl.io>
Force-pushed from eddf256 to 598560b
This seems good now, thanks.
Motivation

With vectored writes reintroduced in #3149, `BufWriter` can provide an efficient implementation both with and without vectored write support in the underlying writer, and advertise it with `is_write_vectored`.

Solution

Provide a custom implementation of `AsyncWrite::poll_write_vectored` by coalescing data from the slices in the buffer, unless it is more efficient to pass one or all of them directly to the underlying writer. This needs to be optimized differently depending on whether the underlying writer supports efficient vectored output.