File operations don't use the whole buffer #1976

Closed

lnicola opened this issue Dec 17, 2019 · 21 comments

Labels
A-tokio (Area: The main tokio crate), C-bug (Category: This is a bug), E-help-wanted (Call for participation: Help is requested to fix this issue), E-medium (Call for participation: Experience needed to fix: Medium / intermediate), M-fs (Module: tokio/fs), T-performance (Topic: performance and benchmarks)

Comments

@lnicola (Contributor) commented Dec 17, 2019

tokio 0.2.4, see:

pub(crate) const MAX_BUF: usize = 16 * 1024;

There's an unexpected 16 KB limit on I/O operations; e.g., the following will print 16384. It seems intended, but it's somewhat confusing.

use tokio::fs::File;
use tokio::io::AsyncReadExt;

async fn run() -> Result<(), std::io::Error> {
    let mut file = File::open("x.mkv").await?;
    let mut buf = [0u8; 32768];
    // Even with a 32 KiB buffer, a single read returns at most 16384 bytes.
    let size = file.read(&mut buf).await?;
    println!("{}", size);
    Ok(())
}
@carllerche (Member)

Looking at your original issue, I think we can support better performance by working directly with Bytes. In that case, we can avoid the copying and instead send the bytes handle to the remote thread.
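
A rough sketch of that idea (hypothetical helper, not tokio's actual API): the BytesMut handle itself is moved to the blocking thread, filled there, and handed back, so no copy through a separate tokio-owned buffer is needed.

use bytes::BytesMut;
use std::io::Read;
use tokio::task;

// Hypothetical helper, not tokio's API: move the caller's BytesMut into the
// blocking task, read directly into it, and return the same handle, avoiding
// a copy through a separate intermediate buffer.
async fn read_into(path: String, mut buf: BytesMut) -> std::io::Result<BytesMut> {
    task::spawn_blocking(move || {
        let mut file = std::fs::File::open(path)?;
        buf.resize(32 * 1024 * 1024, 0); // read up to 32 MiB in one call
        let n = file.read(&mut buf)?;
        buf.truncate(n);
        Ok(buf)
    })
    .await
    .expect("blocking task panicked")
}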

@Darksonn added the A-tokio, C-bug, E-help-wanted, E-medium, M-fs, and T-performance labels on Jul 25, 2020
blasrodri added a commit to blasrodri/tokio that referenced this issue Aug 3, 2020
As hinted in tokio-rs#1976 (comment), this change replaces the inner buf attribute of the Buf struct.
@blasrodri (Contributor)

I’d like to work on this. Some guidance would be appreciated, especially around:

I think we can support better performance by working directly with Bytes

@grantperry

@carllerche By working with Bytes, do you mean calling AsyncReadExt::read_buf instead of AsyncReadExt::read? Will this fill the Bytes buffer if it is larger than MAX_BUF?

@Darksonn (Contributor) commented Jan 9, 2021

Not with the current implementation. It would probably require a special function for BytesMut specifically.
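
For context, read_buf does accept a BytesMut, but each call still goes through the capped intermediate buffer, so a single call won't return more than MAX_BUF bytes. A hedged illustration:

use bytes::BytesMut;
use tokio::fs::File;
use tokio::io::AsyncReadExt;

async fn demo() -> std::io::Result<()> {
    let mut file = File::open("x.mkv").await?;
    let mut buf = BytesMut::with_capacity(32 * 1024 * 1024);
    // Still limited per call by the intermediate buffer; loop to fill
    // more than MAX_BUF bytes.
    let n = file.read_buf(&mut buf).await?;
    println!("read {} bytes", n);
    Ok(())
}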

@fetchadd

@carllerche Why 16K and not some other size? Is there a special reason, or did testing show better performance with 16K?

@Darksonn (Contributor)

The reason to have a maximum buffer size is that the file API allocates an intermediate buffer separate from the user-provided buffer. I don't think the exact choice of size was benchmarked.
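
A simplified model of that double-buffering (not tokio's actual code): the blocking thread fills a tokio-owned buffer of at most MAX_BUF bytes, which is then copied into the caller's buffer.

use std::io::Read;

const MAX_BUF: usize = 16 * 1024;

// Blocking side: fill an intermediate buffer no larger than MAX_BUF,
// regardless of how large the user's buffer is.
fn fill_intermediate(file: &mut std::fs::File, user_len: usize) -> std::io::Result<Vec<u8>> {
    let mut buf = vec![0u8; user_len.min(MAX_BUF)];
    let n = file.read(&mut buf)?;
    buf.truncate(n);
    Ok(buf)
}

// Async side: copy the intermediate buffer into the user's buffer; the
// caller therefore sees at most MAX_BUF bytes per read call.
fn copy_to_user(intermediate: &[u8], user_buf: &mut [u8]) -> usize {
    let n = intermediate.len().min(user_buf.len());
    user_buf[..n].copy_from_slice(&intermediate[..n]);
    n
}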

@blasrodri (Contributor)

Why not replace the Vec<u8> in Io::blocking::Buf with BytesMut?

@Darksonn (Contributor) commented Jul 4, 2021

Well why would we? If both can be used, it's better to use a Vec<u8>.

@blasrodri (Contributor)

Well why would we? If both can be used, it's better to use a Vec<u8>.

You're right.

Any hints on how to move forward w/ this?

@Darksonn (Contributor) commented Jul 5, 2021

The file operations that are offloaded to the spawn_blocking threadpool will probably continue to have a maximum buffer size. One thing that would be nice is to finish #3821, and operations executed through that setup would not be limited in size. That would involve finding a way to test various kernel versions in CI.

E.g. maybe there is a way to have some machines on AWS with the desired kernel version participate in running CI? I'm not sure about the details.

@mcronce commented Jan 22, 2022

It's an extremely narrow test, but in my quest to optimize I/O performance on something reading/writing whole (1-50GiB) files sequentially, I tested a quick hack that simply changes MAX_BUF to 128MiB and, much to my surprise, it Just Worked(TM): With a 128MiB buffer, I'm getting 128MiB per read() and write() syscall according to strace.

This is an obnoxiously large I/O size, but it does work for this very specific use case: the files are on cephfs, in an erasure-coded pool, with a 128MiB object size. Aligning I/O to read whole objects per request substantially improves performance (from approx. 80MiB/s per task with four tasks up to approx. 220MiB/s in my case).

This is on Fedora with kernel 5.6.13

EDIT: Fixed numbers

@Darksonn (Contributor)

I am open to changing the buffer size used by the File type.

@bartlomieju (Contributor)

@Darksonn is there anything we could help with to get this issue addressed?

At Deno we got several reports regarding poor performance when working on large files (denoland/deno#10157) due to a need to perform thousands of roundtrips between Rust and JavaScript to read the data.

@Darksonn (Contributor)

Well, the options are the following:

  1. We change the default buffer size.
  2. We provide a way to configure the buffer size.
  3. You perform the operations yourself with spawn_blocking instead of going through tokio::fs::File (see the sketch after this comment).

I think I would be ok with all of these. What do you think?
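
For reference, a minimal sketch of option 3, under the assumption that reading the whole file at once is acceptable; std::fs::read issues reads as large as the OS allows, so the MAX_BUF cap never applies:

use tokio::task;

// Option 3: bypass tokio::fs::File and do the blocking I/O yourself on the
// spawn_blocking threadpool.
async fn read_whole_file(path: String) -> std::io::Result<Vec<u8>> {
    task::spawn_blocking(move || std::fs::read(path))
        .await
        .expect("blocking task panicked")
}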

@mcronce commented Mar 21, 2022

@Darksonn I can make a PR with my changes later today. I landed on 32MiB as an optimal size for my use case, but I feel like making MAX_BUF arbitrarily large is probably fine, since it only sets a hardcoded maximum: if calling code passes in a smaller buffer, that smaller size will be used.

@crowlKats

@Darksonn at Deno, we will be going with option 3 for now, but ideally we would like to see option 2 happening.

@mcronce commented Mar 24, 2022

Pushed up #4580

rcgoodfellow added a commit to oxidecomputer/p9fs that referenced this issue Aug 25, 2022
tokio issue:
- tokio-rs/tokio#1976

This means that when we pass a buffer > 16KB to the OS, tokio truncates it to 16KB, blowing up 9pfs msize expectations.
@tp971 commented Nov 16, 2022

It just took me a good hour to find a bug while writing files using tokio::fs::File. Apparently, this issue also applies when writing to a file, which results in anything over 16384 bytes being silently ignored (without any panic or error). If the maximum buffer size is intended, I would suggest at least documenting this behavior somewhere, and maybe returning an error or panicking.

@sfackler (Contributor)

AsyncWrite::write is not guaranteed to write out the entire buffer. You should always check the number of bytes written, which is returned from the call. If you want to write the entire buffer, use write_all.
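
A short illustration of the difference (the file name and buffer are assumptions, for illustration only):

use tokio::fs::File;
use tokio::io::AsyncWriteExt;

async fn demo(buf: &[u8]) -> std::io::Result<()> {
    let mut file = File::create("out.bin").await?;

    // `write` may write only part of the buffer (for tokio files, at most
    // MAX_BUF bytes per call); the returned count must be checked.
    let n = file.write(buf).await?;
    assert!(n <= buf.len());

    // `write_all` loops internally until the entire buffer has been written.
    file.write_all(buf).await?;
    file.flush().await?;
    Ok(())
}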

@matheus-consoli (Contributor)

#5397 increased the buffer size to 2 MB:

pub(crate) const MAX_BUF: usize = 2 * 1024 * 1024;

Is that enough to close this issue, or should some refinement be done?

@lnicola (Contributor, Author) commented Jul 4, 2023

I haven't tested it lately, but I suppose it's fine. @carllerche suggested using Bytes instead of a Vec; I don't know if that's still planned.

@lnicola closed this as completed Jul 5, 2023