fs: add read_at/write_at/seek_read/seek_write for fs::File #6427

SteveLauC · 2024-03-25T05:13:39Z

Motivation

Solution

Implemented using spawn_blocking(), see Feature request: read_at/write_at for tokio::fs::File #1529 (comment)
This implementation supports UNIX and Windows. WASM could be supported as well, but the corresponding standard library interface is still nightly-only, so it is not included in this PR.

About the interface

These 2 methods are added directly to tokio::fs::File. The sync versions of these APIs are actually exposed through a trait, FileExt; I can change this to a trait as well, something like AsyncFileExt, if you want.

SteveLauC · 2024-03-25T06:18:00Z

Well, about the interface on Windows:

Originally, I thought seek_read()/seek_write() were the Windows versions of POSIX pread()/pwrite(), but they actually are not: seek_read()/seek_write() affect the file cursor (this is why the CI tests on Windows are failing).

Given the different behaviors, I think we should expose them through traits? I would like to hear your thoughts before making any further changes :)
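
To make the UNIX half of that difference concrete, here is a minimal std-only sketch (not code from this PR) showing that read_at() from std::os::unix::fs::FileExt leaves the file cursor where it was. The helper name and the scratch file path are hypothetical, and the code only compiles on UNIX targets.

```rust
use std::fs::File;
use std::io::{self, Seek, Write};
use std::os::unix::fs::FileExt; // UNIX only: provides `read_at`

/// Writes `data` to a scratch file, performs one positioned read of up to
/// four bytes at `offset`, and returns
/// (bytes read, cursor position before the read, cursor position after it).
pub fn cursor_around_read_at(data: &[u8], offset: u64) -> io::Result<(Vec<u8>, u64, u64)> {
    // Hypothetical scratch path; the process id keeps separate runs apart.
    let path = std::env::temp_dir().join(format!("read_at_cursor_demo_{}", std::process::id()));
    let mut file = File::options()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(&path)?;
    file.write_all(data)?;
    file.rewind()?; // put the cursor back at the start of the file

    let before = file.stream_position()?;
    let mut buf = vec![0u8; 4];
    let n = file.read_at(&mut buf, offset)?; // pread-style positioned read
    buf.truncate(n);
    let after = file.stream_position()?;

    std::fs::remove_file(&path)?;
    Ok((buf, before, after))
}
```

Calling cursor_around_read_at(b"hello world", 6) should report the same cursor position before and after the read. The Windows seek_read() documentation, by contrast, states that the cursor is moved, which matches the CI failures above.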

Darksonn · 2024-03-26T10:54:45Z

You can provide a unix-specific method for read_at, and a different windows-specific method for seek_read. We usually don't use traits for this, and instead just use cfgs:

#[cfg(unix)]
#[cfg_attr(docsrs, doc(cfg(unix)))]
fn my_unix_specific_method() {}

The second attribute ensures that it is rendered properly in the documentation. Please double check the generated documentation for your code as well. You can do that either by generating it locally via the instructions in CONTRIBUTING.md, or by pushing to GitHub and clicking "details" on the "netlify/tokio-rs/deploy-preview" CI job, which shows rendered documentation for your PR.

SteveLauC · 2024-03-26T11:22:42Z

Thanks for the reminder about the doc cfg, I will do it soon :)

SteveLauC · 2024-03-27T08:49:03Z

Hi, this PR should be ready for review:

Interfaces:
- read_at() write_at() for UNIX
- seek_read() seek_write() for Windows
Documentation: copied from the corresponding standard library interfaces, i.e., these traits:
- std::os::unix::fs::FileExt
- std::os::windows::fs::FileExt
Examples added
Integration test added
Though I am not sure whether the file names are OK: should they be io_xxx.rs or fs_xxx.rs? The methods are added to fs::File, but they are actually I/O interfaces :<

Ping me if you guys need me to squash my commits

Darksonn · 2024-03-30T11:16:36Z

tokio/src/fs/file.rs

+        let std = self.std.clone();
+        let n = buf.len();
+        let bytes_read = asyncify(move || _read_at(&std, n, offset)).await?;
+        let len = bytes_read.len();
+        buf[..len].copy_from_slice(&bytes_read);


The Tokio file already has a bunch of logic for keeping track of and reusing a buffer, but none of these functions use that logic. Is there a reason you're doing it this way?

The Tokio file already has a bunch of logic for keeping track of and reusing a buffer,

Well, I was not aware of this. Do you mean the bytes crate? If not, could you show me some examples?

No, I mean the stuff related to these types:

tokio/tokio/src/fs/file.rs

Lines 96 to 118 in c98e22f

    struct Inner {
        state: State,

        /// Errors from writes/flushes are returned in write/flush calls. If a write
        /// error is observed while performing a read, it is saved until the next
        /// write / flush call.
        last_write_err: Option<io::ErrorKind>,

        pos: u64,
    }

    #[derive(Debug)]
    enum State {
        Idle(Option<Buf>),
        Busy(JoinHandle<(Operation, Buf)>),
    }

    #[derive(Debug)]
    enum Operation {
        Read(io::Result<usize>),
        Write(io::Result<()>),
        Seek(io::Result<u64>),
    }

Thanks for showing me this!

I just took another look at other methods implemented with spawn_blocking(), ~~seems that I should add:~~

let mut inner = self.inner.lock().await; inner.complete_inflight().await;

~~before involving the actual logic, right?~~

Update: Well, this seems more complicated than I thought; I originally thought we could simply implement it using asyncify().

One could probably argue either way about whether it's even desirable. One advantage of read_at is that you can have several calls in parallel, but if we use the state logic, then they will run one-after-the-other.

I'm not sure what the best answer here is.

One advantage of read_at is that you can have several calls in parallel, but if we use the state logic, then they will run one-after-the-other.

Thanks for pointing it out! Enabling concurrent access is indeed the reason pread/pwrite were added, so I lean slightly toward using separate buffers.
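
As a rough illustration of the separate-buffers option, here is a std-only sketch in which each positioned read gets its own buffer and runs on its own thread. Plain threads stand in for spawn_blocking() here, so this is an analogue rather than the PR's implementation; the helper names and the scratch file naming are hypothetical, and the code is UNIX only.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::os::unix::fs::FileExt; // UNIX only: provides `read_at`
use std::sync::Arc;
use std::thread;

/// Reads up to `len` bytes at each offset. Every read runs on its own thread
/// with its own buffer, so no call waits for another one's buffer.
pub fn parallel_reads(file: Arc<File>, offsets: &[u64], len: usize) -> io::Result<Vec<Vec<u8>>> {
    // Spawn every reader first so that they can all be in flight together.
    let handles: Vec<_> = offsets
        .iter()
        .map(|&offset| {
            let file = Arc::clone(&file);
            thread::spawn(move || -> io::Result<Vec<u8>> {
                let mut buf = vec![0u8; len]; // per-call buffer, nothing shared
                let n = file.read_at(&mut buf, offset)?;
                buf.truncate(n);
                Ok(buf)
            })
        })
        .collect();

    // Join in spawn order, so the results line up with `offsets`.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("reader thread panicked"))
        .collect()
}

/// Hypothetical helper: creates a scratch file holding `data`.
pub fn scratch_file(name: &str, data: &[u8]) -> io::Result<Arc<File>> {
    let path = std::env::temp_dir().join(format!("{}_{}", name, std::process::id()));
    let mut file = File::options()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(&path)?;
    file.write_all(data)?;
    // On UNIX the open handle stays usable after the path is unlinked.
    std::fs::remove_file(&path)?;
    Ok(Arc::new(file))
}
```

The cost of this design is one allocation per call, which is exactly what the buffer-reuse logic in tokio::fs::File is meant to avoid.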

One could probably argue either way about whether it's even desirable. One advantage of read_at is that you can have several calls in parallel, but if we use the state logic, then they will run one-after-the-other.

I'm not sure what the best answer here is.

I have a tentative plan: can we use RwLock instead of Mutex, so that the state is shared among multiple readers and they can run in parallel?

tokio/tokio/src/fs/file.rs

Line 92 in c98e22f

inner: Mutex<Inner>,

@Chasing1020 A read/write lock doesn't help. Fundamentally, the shared buffer can only hold one piece of data at a time.
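
A toy model of why, assuming a single shared buffer guarded by std::sync::RwLock (the type and method names below are hypothetical and this is not Tokio's code): a file read has to overwrite the buffer, so it needs the write half of the lock, and a second read cannot start while the first one holds it.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for a file type that owns exactly one buffer.
pub struct OneBufferFile {
    buf: RwLock<Vec<u8>>,
}

impl OneBufferFile {
    pub fn new() -> Self {
        OneBufferFile { buf: RwLock::new(Vec::new()) }
    }

    /// Simulates a file read: the fetched bytes replace the buffer contents,
    /// which requires exclusive (write) access to the buffer.
    pub fn read_into_shared(&self, fetched: &[u8]) -> usize {
        let mut buf = self.buf.write().unwrap();
        buf.clear();
        buf.extend_from_slice(fetched);
        buf.len()
    }

    /// Reports whether a second file read could begin while one is in flight.
    pub fn can_start_second_read_while_busy(&self) -> bool {
        let _in_flight = self.buf.write().unwrap(); // first read holds the buffer
        self.buf.try_write().is_ok() // a second read would need it as well
    }
}
```

In this model, file "readers" are buffer writers, so the read half of the lock never comes into play and the operations still run one at a time.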

asomers · 2024-04-10T16:05:10Z

I think this would be a good feature. But what would make it really great would be if it can have platform-specific integration, for example aio_read on FreeBSD. We need to be very careful with the initial implementation to ensure that acceleration will be possible, for example by allowing errors to be returned in the right places. Here is the existing FreeBSD version: https://docs.rs/tokio-file/latest/tokio_file/

SteveLauC · 2024-04-10T22:41:53Z

But what would make it really great would be if it can have platform-specific integration.

I agree and I am ok to have real async I/O for platforms that are capable of doing it

asomers · 2024-04-27T14:20:48Z

I'm working on the platform-specific part right now. However, even though aio_read is nearly equivalent to an asynchronous read_at, its error-handling behavior is slightly different. For example, aio_read may return EAGAIN if it hits a system resource limitation. The caller must be prepared for that. Therefore, I think the AIO acceleration should be opt-in. How about moving these methods into an extension trait named something like AsyncFileExt and then creating a second extension trait named AioFileExt whose methods have the same signature? Then opting into AIO would just consist of importing the AioFileExt trait instead of the AsyncFileExt trait.
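
A minimal sketch of the opt-in-by-import idea, using synchronous placeholder methods so it compiles with std alone. Only the trait names AsyncFileExt and AioFileExt come from this proposal; DemoFile, the module names, and the read_at_backend method are hypothetical.

```rust
/// Hypothetical stand-in for `tokio::fs::File`; it holds no real file.
pub struct DemoFile;

pub mod blocking {
    use super::DemoFile;

    /// Default extension trait: work is handed to a blocking thread pool.
    pub trait AsyncFileExt {
        fn read_at_backend(&self) -> &'static str;
    }

    impl AsyncFileExt for DemoFile {
        fn read_at_backend(&self) -> &'static str {
            "thread pool"
        }
    }
}

pub mod aio {
    use super::DemoFile;

    /// Opt-in extension trait with the same method signature.
    pub trait AioFileExt {
        fn read_at_backend(&self) -> &'static str;
    }

    impl AioFileExt for DemoFile {
        fn read_at_backend(&self) -> &'static str {
            "POSIX AIO"
        }
    }
}

/// Caller that imports the default trait.
pub fn default_caller(file: &DemoFile) -> &'static str {
    use self::blocking::AsyncFileExt;
    file.read_at_backend()
}

/// Caller that opts into AIO: only the import differs.
pub fn aio_caller(file: &DemoFile) -> &'static str {
    use self::aio::AioFileExt;
    file.read_at_backend()
}
```

The call sites are identical and only the use line changes. One caveat with this shape: importing both traits into the same scope makes the method call ambiguous, so a caller has to pick one per scope.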

Darksonn · 2024-04-30T14:11:04Z

@asomers Can you open a new feature request for an AIO implementation of tokio::fs::File? Then we can discuss that issue there.

Regarding this PR, I still don't know what to do about the question of buffer management. The tokio::fs::File type kind of has the assumption that it will only need one buffer, but these methods violate that assumption.

mox692 added A-tokio Area: The main tokio crate M-fs Module: tokio/fs labels Mar 25, 2024

SteveLauC force-pushed the feat/pread_pwrite branch from 620fcac to ba4c86a Compare March 25, 2024 06:06

fs: read/write_at for unix & seek_read/write for win

3d548d8

SteveLauC force-pushed the feat/pread_pwrite branch from 9ca599d to 3d548d8 Compare March 27, 2024 07:48

SteveLauC added 5 commits March 27, 2024 15:56

test: add test for seek_read/seek_write

0274f3a

style: fmt

ffe6451

test: fix windows test

dbb1b7e

test: fix windows test

488d5f2

test: fix windows test

c98e22f

SteveLauC changed the title ~~fs: add read/write_at for fs::File~~ fs: add read_at/write_at/seek_read/seek_write for fs::File Mar 27, 2024

Darksonn reviewed Mar 30, 2024
