Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

io: Add AsyncFd #2903

Merged
merged 11 commits into from Oct 22, 2020
Merged

io: Add AsyncFd #2903

merged 11 commits into from Oct 22, 2020

Conversation

bdonlan
Copy link
Contributor

@bdonlan bdonlan commented Oct 2, 2020

This adds AsyncFd, a unix-only structure to allow for read/writability states
to be monitored for arbitrary file descriptors.

Issue: #2728

Motivation

See #2728 for the motivation for this change.

Solution

This introduces a new AsyncFd type which can be used to monitor arbitrary file descriptors for readability and writability. I've chosen not to expose a poll_* API as callers can emulate such an API on their own (an example is given in the rustdocs). I've also avoided exposing specific Ready/Interest-like types (this could be added in the future in a backwards-compatible way).

I'm a little bit uncomfortable with how the cfg-macros work out here - I had to touch quite a few places to add the async-fd feature flag. That feels like a refactor for another PR however.

@Darksonn Darksonn added A-tokio Area: The main tokio crate C-enhancement Category: A PR with an enhancement or bugfix. M-io Module: tokio/io labels Oct 3, 2020
@bdonlan
Copy link
Contributor Author

bdonlan commented Oct 5, 2020

macos tests are failing because libc exposes different APIs for getting errno depending on the OS (I don't have access to a macos box and thus can't test locally). Considering pulling in nix as a dev-dependency for its gnarly portable implementation of errno for this test, any thoughts?

@bdonlan bdonlan force-pushed the asyncfd branch 2 times, most recently from 03399d8 to dd728fe Compare October 5, 2020 21:57
@bdonlan
Copy link
Contributor Author

bdonlan commented Oct 5, 2020

Force pushing to rebase on master.

@bdonlan
Copy link
Contributor Author

bdonlan commented Oct 5, 2020

Discussed the API a bit with @carllerche (particularly around whether it should own the FD). I feel that AsyncFd seems to imply that it owns the fd, but there are non-owning use cases as well (and we want to minimize API surface area for now). I think the best approach for now is to rename this to not imply ownership - e.g. something like tokio::io::FdWatcher?

Also, @seanmonstar , do you have any opinions or concerns around this API?

tokio/src/io/async_fd.rs Outdated Show resolved Hide resolved
tokio/src/io/async_fd.rs Outdated Show resolved Hide resolved
tokio/src/io/async_fd.rs Outdated Show resolved Hide resolved
tokio/src/io/async_fd.rs Show resolved Hide resolved
@ipetkov
Copy link
Member

ipetkov commented Oct 6, 2020

Discussed the API a bit with @carllerche (particularly around whether it should own the FD). I feel that AsyncFd seems to imply that it owns the fd, but there are non-owning use cases as well (and we want to minimize API surface area for now). I think the best approach for now is to rename this to not imply ownership - e.g. something like tokio::io::FdWatcher?

@bdonlan maybe I'm lacking some context, but could you elaborate on the non-owned use cases? I'm still worried that having a non-owned-by-default API will lead to various footguns with juggling drop ordering. Consider a struct which holds some owned fd type and an AsyncFd: declaring the AsyncFd second in the struct will cause it's destructor to run after the owned fd is dropped/closed which may lead to problems.

@bdonlan
Copy link
Contributor Author

bdonlan commented Oct 6, 2020

@ipetkov It's possible, for example, for the file descriptor to be owned by some native library, where we're not allowed to close it. There are workarounds of course if we were to implement closing of FDs (eg. duping the file descriptor, or implementing IntoRawFd) - but we need to choose which mode of operation is more natural for the API.

Personally, I'd be more of a fan of the owned API - @carllerche, what do you think about the concerns above?

@ipetkov
Copy link
Member

ipetkov commented Oct 6, 2020

@bdonlan not necessarily suggesting we use generics but AsyncFd could take any type which implements AsRawFd and allow it to handle closing the descriptor. Maybe we also provide a NotOwnedFd wrapper which doesn't close the descriptor.

Alternatively, if we really want to avoid generics or traits, we could have some kind of AsyncFd::new_with_not_owned(fd) constructor which will skip the libc::close call when dropped

@bdonlan
Copy link
Contributor Author

bdonlan commented Oct 7, 2020

Generics feels like it might be the right approach, but one issue is that many IO objects will need mutable access. Consider instantiating an AsyncFd<FileDescriptorWrapper> where FileDescriptorWrapper implements Read + Write + AsRawFd. This works, but you wouldn't be able to implement AsyncRead and AsyncWrite in terms of the inner Read + Write as you would be holding a shared reference to the AsyncFd via the ReadyGuard (and, for that matter, you might want to have both a read and a write in flight at the same time).

@ipetkov
Copy link
Member

ipetkov commented Oct 8, 2020

@bdonlan what if Async{Read, Write} was implemented in terms of accessing the raw file descriptor and not the Read/Write traits themselves? It could still allow for "mutable" access over the descriptor (the only bound needed would be T: AsRawFd)

@bdonlan
Copy link
Contributor Author

bdonlan commented Oct 8, 2020

I don't necessarily want to assume that the fd is meant to be used via read() and write() (and, moreover, AsyncRead/AsyncWrite need mut access to the AsyncFd so they could delegate to the mutable inner anyway)

@Dessix
Copy link

Dessix commented Oct 15, 2020

I think as long as into_raw_fd is implemented in such a way as to properly transfer ownership, this should be fine. Potentially we should advise that derived types owning the FD should either implement into_raw_fd or otherwise present a means of transferring ownership out?

As for preventing duplicate ownership- I think ensuring that an AsyncFD provides the same behaviours as expected from an unmonitored FD is the right way to go, rather than trying to spread ownership across objects; otherwise, we would have to prepare for destruction by the true owner causing issues when the shared owner accesses it. The easiest way to provide these behaviours is implementing AsRawFd, but pretending that the result's lifetime is bound by that of the AsyncFd. Perhaps a tweaked AsRawFd which returns a reference bounded by the lifetime of the owner is in order?

Alternatively, we could look into ensuring that FDs are refcounted, and the only true owner is the refcounter; anything using the FD would "subscribe" to it and increase the refcount until it's ready to release it.

The above two options are the only ones I see being sensible under the single-ownership-principle Rust enforces.

@carllerche
Copy link
Member

After reading the comments, if this is a type that takes ownership of the FD, I like the idea of it being AsyncFd<T: AsRawFd>. This way, the question of "close on drop" can be punted to the internal type.

That said, it would add an API wrinkle. How would you get &mut access to the inner type while holding on to the ReadyGuard?

linkmauve added a commit to linkmauve/conch-runtime that referenced this pull request Oct 16, 2020
This is currently pending on tokio-rs/tokio#2903
in order to support registering a fd as a tokio Source.
@de-vri-es
Copy link
Contributor

de-vri-es commented Oct 17, 2020

I've chosen not to expose a poll_* API as callers can emulate such an API on their own (an example is given in the rustdocs).

Why not expose the poll_* methods directly? Until we get support for async functions in traits, poll_* functions are essential for efficient traits with asynchronous behaviour. That also means that types built on top of AsyncFd will need to expose poll_* functions, for which direct access to AsyncFd::poll_* is desirable.

The suggested work-around of allocating a Pin<Box<dyn Future>> to emulate a poll interface requires dynamic allocation, while direct access to AsyncFd::poll_* is zero-cost.

@Darksonn
Copy link
Contributor

I also think that poll_* methods should be exposed. This is a low-level type, and its predecessor PollEvented was typically used to implement traits like AsyncRead and AsyncWrite, which require access to them.

I'm really not a fan of using Pin<Box<...>> as an alternative to poll_* methods.

@bdonlan
Copy link
Contributor Author

bdonlan commented Oct 19, 2020

That said, it would add an API wrinkle. How would you get &mut access to the inner type while holding on to the ReadyGuard?

After some thought, I don't think it's necessary to get &mut access here. Consider that if the underlying fd is already suitably locked by the operating system, it's better to allow multiple threads to share the underlying object to avoid double locking or other mutual exclusion. If, however, we actually do need mutual exclusion, there are various options: For example, you could use a AsyncFd<RefCell<Whatever>> (or with a suitable adapter) if only one task is to access the object; if multiple tasks are to access the object, there'll need a lock at some level anyway, so AsyncFd<Mutex<Whatever>> works.

What I wonder here is whether we should offer a new trait to extract the fd that can see through common mutexes like this, something like:

trait FdHolder {
  type Output: Future<Output=RawFd>;
  fn get_fd(&self) -> Output;
}

impl<T: AsRawFd> FdHolder for T { ... }
impl<T: AsRawFd> FdHolder for RefCell<T> { ... }
impl<T: AsRawFd> FdHolder for std::mutex::Mutex<T> { ... }
impl<T: FdHolder> FdHolder for tokio::sync::Mutex<T> { ... }

or if it would be better to just accept a conversion function for types not implementing AsRawFd:

impl<T> AsyncFd {
  async fn new(obj: T) where T: AsRawFd { ... }
  async fn new_with_fd_extracter(obj: T, fd_extract: impl FnOnce(&mut T)->RawFd) { ... }
}

or if we should just let the user pass a fd directly:

impl<T> AsyncFd {
  async fn new(obj: T) where T: AsRawFd { ... }
  async fn new_with_fd(obj: T, fd: RawFd) { ... }
}

@carllerche
Copy link
Member

@bdonlan Ok, that sounds good to me. It is also possible to add mut support in the future by adding a readiness_mut fn and providing access to the inner T via the guard.

I'm also fine adding poll_read_ready and poll_write_ready functions, but this can be done in a follow up PR.

So, in conclusion, we are going to change AsyncFd to take T: AsRawFd. We probably should also add get_ref(&self) -> &T and into_inner(self) -> T functions. In 0.2, PollEvented::into_inner() returned Result. This was done in the event deregister failed. This is probably still fine.

@bdonlan
Copy link
Contributor Author

bdonlan commented Oct 19, 2020

I'm working on updating this with the feedback here - before I go too far, what are people's thoughts on the feature flag for this - define a new one or use net?

Previously, there was a race window in which an IO driver shutting down could
fail to notify ScheduledIo instances of this state; in particular, notification
of outstanding ScheduledIo registrations was driven by `Driver::drop`, but
registrations bypass `Driver` and go directly to a `Weak<Inner>`. The `Driver`
holds the `Arc<Inner>` keeping `Inner` alive, but it's possible that a new
handle could be registered (or a new readiness future created for an existing
handle) after the `Driver::drop` handler runs and prior to `Inner` being
dropped.

This change fixes this in two parts: First, notification of outstanding
ScheduledIo handles is pushed down into the drop method of `Inner` instead,
and, second, we add state to ScheduledIo to ensure that we remember that the IO
driver we're bound to has shut down after the initial shutdown notification, so
that subsequent readiness future registrations can immediately return (instead
of potentially blocking indefinitely).

Fixes: tokio-rs#2924
@bdonlan
Copy link
Contributor Author

bdonlan commented Oct 20, 2020

Implemented the owned version, and also ported TcpListener over as a proof-of-concept. In the process I rediscovered and resolved #2924 (I only realized there was already a PR after writing it, oops - but there's a second race in there as well)

@bdonlan
Copy link
Contributor Author

bdonlan commented Oct 20, 2020

Oh. Well. I guess we can't replace PollEvented with AsyncFd :(
[unsupported on windows]

@bdonlan
Copy link
Contributor Author

bdonlan commented Oct 20, 2020

I'll write up an independent test for the shutdown notification issue and remove the tcplistener port tomorrow.

Copy link
Member

@carllerche carllerche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great 👍 thanks for taking this on.

I have some minor change requests, but we're almost there.

tokio/src/io/async_fd.rs Outdated Show resolved Hide resolved
tokio/src/io/async_fd.rs Outdated Show resolved Hide resolved
tokio/src/io/async_fd.rs Outdated Show resolved Hide resolved
tokio/src/io/async_fd.rs Outdated Show resolved Hide resolved
tokio/src/io/async_fd.rs Outdated Show resolved Hide resolved
tokio/src/io/async_fd.rs Outdated Show resolved Hide resolved
tokio/src/io/async_fd.rs Outdated Show resolved Hide resolved
tokio/src/io/mod.rs Show resolved Hide resolved
@carllerche
Copy link
Member

@seanmonstar Could you look at this? You worked in here most recently.

Comment on lines +55 to +58
/// The ownership of this slab is moved into this structure during
/// `Driver::drop`, so that `Inner::drop` can notify all outstanding handles
/// without risking new ones being registered in the meantime.
resources: Mutex<Option<Slab<ScheduledIo>>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have probably thought about this a lot more than me, but I fail to understand how would moving the slab into the Option ensure that no new resources are registered while dropping. Registration is happening through the slab::Allocator<ScheduledIo> not the Slab, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving it into Inner isn't what ensures no further resources are registered. Because Handle holds a weak ref to Inner, we know that, when Inner drops, no further registration can happen. So cleanup happens at that point in Inner::drop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think there probably is a better way to handle shutdown, but this is the quickest way to correct the issues that exist today w/o a bigger refactor. Impact is minimal as well as the Mutex is only used once in the lifecycle of the I/O driver (shutdown).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

God it, that makes sense! Thank you. Out of curiosity, what would that better way look like?

Comment on lines +201 to +212
impl Drop for Inner {
fn drop(&mut self) {
let resources = self.resources.lock().take();

if let Some(mut slab) = resources {
slab.for_each(|io| {
// If a task is waiting on the I/O resource, notify it. The task
// will then attempt to use the I/O resource and fail due to the
// driver being shutdown.
io.shutdown();
});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that changes things a bit. Before the idea was that the moment the driver gets dropped, it wakes all io tasks. Now however, there might be strong arc(s) being held somewhere else. So this logic is being delayed until all these arcs are dropped. Is that desirable? Does that actually solve the original racing issue? Might be missing something here...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not change things materially. The strong refs are only held for small amounts of time to perform registration / deregistration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible to keep the Inner alive with enough traffic on the handle, but this isn't worse than it was before (it's better, since eventually once all the Arc cloning stops and the Inner can be dropped, all the new handles that were registered after the driver was dropped will be properly notified)

Copy link
Member

@carllerche carllerche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate C-enhancement Category: A PR with an enhancement or bugfix. M-io Module: tokio/io
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants