Increase max I/O buffer size to 128 MiB, and DRY the constant in stdio_common tests #4580

Closed

@mcronce commented Mar 24, 2022

Motivation

A very rudimentary solution to #1976 :) The maximum number of bytes we can currently read or write through Tokio's I/O subsystem in a single operation is 16 KiB. This PR increases that maximum to 128 MiB.

Solution

It doesn't give us direct configurability of the maximum buffer size, but since (unless I'm misunderstanding the code in that module) calling code can create and supply smaller buffers, simply increasing the hardcoded number seems like a reasonable starting point.
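
The change itself is just a one-line bump to MAX_BUF in tokio/src/io/blocking.rs, roughly like this (an illustrative sketch; the exact declaration in the source may differ slightly):

-const MAX_BUF: usize = 16 * 1024;
+const MAX_BUF: usize = 128 * 1024 * 1024;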

@Darksonn added the A-tokio (Area: The main tokio crate) and M-fs (Module: tokio/fs) labels on Mar 24, 2022
@Darksonn (Contributor) commented:
Although 16 KiB is too small, 128 MiB is probably too large, at least as a default value. I'd rather not have a few open files take up an entire gigabyte of RAM (eight such buffers is already 1 GiB) unless the user explicitly asks for a buffer size that large.

@mcronce (Author) commented Mar 24, 2022

@Darksonn I don't think it does. As far as I can tell, this constant only specifies the hard maximum number of bytes that io::blocking::Buf will extend itself to, and thus what can be transferred in a single I/O request. Calling code has always been free to allocate and pass in a buffer of any size; MAX_BUF is used in a std::cmp::min() call, so the lesser of MAX_BUF and the size of the passed-in buffer is what gets used.
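
To illustrate the clamping behavior, here's a simplified sketch (the names echo tokio/src/io/blocking.rs, but the body is illustrative rather than the actual Tokio code):

// Simplified sketch: MAX_BUF only caps how many bytes get staged per
// blocking operation; a larger caller-supplied buffer is consumed across
// several operations rather than forcing a MAX_BUF-sized allocation.
const MAX_BUF: usize = 16 * 1024;

fn copy_from(staging: &mut Vec<u8>, src: &[u8]) -> usize {
    let n = std::cmp::min(src.len(), MAX_BUF);
    staging.clear();
    staging.extend_from_slice(&src[..n]);
    n
}

fn main() {
    let caller_buf = vec![0u8; 1024 * 1024]; // caller supplies 1 MiB
    let mut staging = Vec::new();
    let copied = copy_from(&mut staging, &caller_buf);
    assert_eq!(copied, MAX_BUF); // only 16 KiB is staged for this operation
}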

The only uses of MAX_BUF to directly determine allocation size that I can find are in two (currently failing :( ) tests in tokio/src/fs/file/tests.rs

Unless I'm missing something and am completely off base ;) I've been hunting for evidence to the contrary but haven't found any code that allocates a larger buffer. With some added debugging and a quick test binary:

diff --git a/tokio/src/io/blocking.rs b/tokio/src/io/blocking.rs
index 289c32ca..9d431b40 100644
--- a/tokio/src/io/blocking.rs
+++ b/tokio/src/io/blocking.rs
@@ -241,6 +241,8 @@ impl Buf {
         unsafe {
             self.buf.set_len(len);
         }
+
+        dbg!(self.buf.capacity());
     }
 
     pub(crate) fn read_from<T: Read>(&mut self, rd: &mut T) -> io::Result<usize> {

And the test program:

use tokio::fs::File;
use tokio::io::AsyncReadExt;

#[tokio::main]
async fn main() {
	println!("+++ Buffer size 4k");
	let mut buf = [0u8; 4 * 1024];
	let mut f = File::open("test.dat").await.unwrap();
	let count = f.read(&mut buf).await.unwrap();
	println!("--- Read {}\n", count);

	println!("+++ Buffer size 16k");
	let mut buf = [0u8; 16 * 1024];
	let mut f = File::open("test.dat").await.unwrap();
	let count = f.read(&mut buf).await.unwrap();
	println!("--- Read {}\n", count);

	println!("+++ Buffer size 1M");
	let mut buf = vec![0u8; 1 * 1024 * 1024];
	let mut f = File::open("test.dat").await.unwrap();
	let count = f.read(&mut buf).await.unwrap();
	println!("--- Read {}\n", count);

	println!("+++ Buffer size 128M");
	let mut buf = vec![0u8; 128 * 1024 * 1024];
	let mut f = File::open("test.dat").await.unwrap();
	let count = f.read(&mut buf).await.unwrap();
	println!("--- Read {}\n", count);

	println!("+++ Buffer size 512M");
	let mut buf = vec![0u8; 512 * 1024 * 1024];
	let mut f = File::open("test.dat").await.unwrap();
	let count = f.read(&mut buf).await.unwrap();
	println!("--- Read {}\n", count);
}

That behavior seems to be confirmed:

+++ Buffer size 4k
[/home/mc/code/tokio/tokio/src/io/blocking.rs:245] self.buf.capacity() = 4096
--- Read 4096

+++ Buffer size 16k
[/home/mc/code/tokio/tokio/src/io/blocking.rs:245] self.buf.capacity() = 16384
--- Read 16384

+++ Buffer size 1M
[/home/mc/code/tokio/tokio/src/io/blocking.rs:245] self.buf.capacity() = 1048576
--- Read 1048576

+++ Buffer size 128M
[/home/mc/code/tokio/tokio/src/io/blocking.rs:245] self.buf.capacity() = 134217728
--- Read 134217728

+++ Buffer size 512M
[/home/mc/code/tokio/tokio/src/io/blocking.rs:245] self.buf.capacity() = 134217728
--- Read 134217728

Let me know if that seems wrong! If it isn't, maybe it's worth adding a brief doc comment to MAX_BUF explaining all this - I can do that in this PR.
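
Something along these lines is what I have in mind (just a sketch of the wording, not committed code):

/// The largest number of bytes that `io::blocking::Buf` will stage for a
/// single blocking read or write. Calling code may pass in buffers of any
/// size; each individual operation simply transfers at most this many bytes.
pub(crate) const MAX_BUF: usize = 128 * 1024 * 1024;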

@tobz (Member) commented Apr 6, 2022

Personally, I agree with Alice that this is an easy way to allow for a ton of behind-the-scenes allocations when working with files, and in particular, allocations that a user will not be able to easily surface/control.

Separately, though partially related to the above: it's not clear to me why it should be 128 MB. Ostensibly we want to do more I/O per call, that much is clear, but is there not a point of diminishing returns somewhere between a 16 KB and a 128 MB maximum buffer size?

As an example: with the current 16 KB maximum buffer size, I can already read and write at nearly 1 GB/s by doing simple things like using a buffered writer, not constantly fsyncing, etc.
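
For instance, something like this rough sketch (the file name and the 1 MiB capacity are arbitrary choices) already gets good throughput by coalescing small writes in user space before they ever reach the per-operation staging buffer:

use tokio::fs::File;
use tokio::io::{AsyncWriteExt, BufWriter};

async fn write_everything(data: &[u8]) -> std::io::Result<()> {
    let file = File::create("out.dat").await?;
    // Coalesce many small writes into ~1 MiB chunks before they are handed
    // to the file; each underlying blocking write is still capped at MAX_BUF.
    let mut writer = BufWriter::with_capacity(1024 * 1024, file);
    writer.write_all(data).await?;
    writer.flush().await?; // push any remaining buffered bytes to the file
    Ok(())
}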

Overall, I don't see why this needs to change, but I'm open to examining any data/evidence that shows how the change would benefit users, whether it's 128 MB or something smaller, like 1 MB, etc.

@mcronce (Author) commented Apr 6, 2022

@tobz my argument for why it needs to change is that the current value takes control away from the user. We're not setting a default with this constant; we're setting a limit.

Increasing the limit to something much higher (or removing it altogether) gives users the option to figure out a size that works best for their use case.

@tobz (Member) commented Apr 6, 2022

> @tobz my argument for why it needs to change is that the current value takes control away from the user. We're not setting a default with this constant; we're setting a limit.
>
> Increasing the limit to something much higher (or removing it altogether) gives users the option to figure out a size that works best for their use case.

The problem is that the work that has to happen behind the scenes to drive synchronous I/O through an async-based API means we're doing a lot more copying/buffering than is optimal.
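
To make that behind-the-scenes work concrete, here's a rough sketch of its shape (not Tokio's actual implementation): the blocking read runs on another thread, so it can't borrow the caller's buffer across the await, which forces a staging allocation plus a copy back.

use std::io::Read;

// Rough sketch, not Tokio's real code: a borrowed destination buffer forces
// an intermediate allocation on the blocking side and an extra copy back.
async fn read_via_blocking(path: std::path::PathBuf, dst: &mut [u8]) -> std::io::Result<usize> {
    let cap = dst.len().min(16 * 1024); // per-operation cap, like MAX_BUF
    let (n, staging) = tokio::task::spawn_blocking(move || {
        let mut staging = vec![0u8; cap]; // staging buffer the user never sees
        let mut file = std::fs::File::open(path)?;
        let n = file.read(&mut staging)?;
        Ok::<_, std::io::Error>((n, staging))
    })
    .await
    .expect("blocking task panicked")?;
    dst[..n].copy_from_slice(&staging[..n]); // second copy, back into the caller's buffer
    Ok(n)
}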

We're taking away control from the user, for some definition of "control", but again, we're explicitly trying to avoid allocating a large one-off buffer that the user actually has no control over and never will have control over.

This is why I'd personally want to see data along the curve of possible maximum buffer sizes, because I believe the pragmatic view is that we could increase the maximum buffer size so long as we see there's a general performance improvement, and avoid increasing it past the point where there starts to be diminishing returns.

File I/O through the blocking infrastructure is kind of a crutch to begin with: a means to an end that provides a usable API surface for doing file I/O in a way that is natural for async applications. But I think Carl's comments in #1976 point at what we really want: a way to allow the user to actually pass ownership of their buffer(s) rather than having to copy at all.

That's control, and efficiency to boot.
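
As a purely hypothetical sketch of that ownership-passing shape (this is not an existing Tokio API):

use std::io::Read;

// Hypothetical owned-buffer read: the caller hands the Vec to the runtime,
// the blocking side fills it directly and hands it back alongside the
// result, so no staging buffer or copy-back is needed.
async fn read_owned(path: std::path::PathBuf, mut buf: Vec<u8>) -> (std::io::Result<usize>, Vec<u8>) {
    tokio::task::spawn_blocking(move || {
        let res = std::fs::File::open(path).and_then(|mut f| f.read(&mut buf));
        (res, buf)
    })
    .await
    .expect("blocking task panicked")
}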

@Darksonn (Contributor) commented Apr 6, 2022

> a way to allow the user to actually pass ownership of their buffer(s) rather than having to copy at all.

Well, a much simpler option is a way to explicitly set the buffer size, one that allows it to go above the limit. That would not put existing applications at risk of running out of memory just because they keep a few File objects around.
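
Something along these lines, purely as a hypothetical shape for the API (set_max_buf_size is made up for illustration and does not exist):

use tokio::fs::File;
use tokio::io::AsyncReadExt;

async fn read_with_big_buffer(dst: &mut [u8]) -> std::io::Result<usize> {
    let mut file = File::open("test.dat").await?;
    // Hypothetical, explicit per-handle opt-in; the default cap would stay small:
    // file.set_max_buf_size(128 * 1024 * 1024);
    file.read(dst).await
}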

@tobz (Member) commented Apr 6, 2022

I wouldn't disagree, although just visualizing it in my mind, it's not clear where we might expose that, and it also feels like it leaks a bit of the implementation details around how file I/O is driven.

All of that said, yeah, it would indeed preserve much of the "no surprises" aspect of the current maximum buffer size while giving others the ability to exceed it.

@mcronce (Author) commented Apr 6, 2022

This isn't really worth the mental effort of arguing about for me. I'll just have to keep maintaining a fork to control the I/O size.

@mcronce closed this on Apr 6, 2022