
Give access to subsystem handle from toplevel #55

Closed

Conversation

@Ten0 commented Mar 12, 2023

This allows accessing a more permissive `start` function, as well as starting nested subsystems without introducing one additional level of indirection that requires `Send + 'static`.
@Ten0 force-pushed the give_access_to_subsystem_handle_from_toplevel branch from 041f191 to c435783 on March 12, 2023 10:39

// Obviously these two things are more complex in reality
// thing needs &mut monitoring to initialize, and then that needs to be moved
// to the Monitoring task
let mut monitoring_registry = ();

let tasks = Toplevel::<anyhow::Error>::nested(toplevel.subsystem_handle(), "Tasks");
for task in 0..2 {
    let thing_to_work_on = init_thing(&mut monitoring_registry);
    // ...
        .start(&format!("Task {task}"), move |subsystem| async move {
            // This simulates normal work, but obviously it would normally
            // select on `thing` and on_shutdown_requested
            let _thing = thing_to_work_on;
            // ...
            res
        })
// ...
    .start("Monitoring", move |ss| async move {
        let _monitoring = monitoring_registry;
        // ...

@Finomnis (Owner) commented Mar 13, 2023

I think there are a couple of misconceptions in your example:

  • `ss.request_global_shutdown()` is probably the wrong function in "Tasks"; `ss.request_shutdown()` is probably what you want.
  • `tasks_ss.request_shutdown()` does not shut down `tasks_ss`; it shuts down the `Toplevel` it is contained in. The only way of shutting down a nested subsystem is from its parent.
  • Don't delay returning `res` until the end of the function in "Tasks". Returning an error triggers a shutdown automatically.

Out of curiosity/feedback:

  • Why do you use a nested subsystem in the first place? Because you want to shut down as soon as all the nested tasks are finished? I ask because I'm trying to understand what API I should improve in a potential rework. The current implementation is quite robust, but it is almost impossible to extend any further in functionality. I ask about this specifically because one piece of functionality I already found to be desperately needed is `wait_for_all_children()` inside of a subsystem. That would remove the need to use a nested `Toplevel`, correct?
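A rough sketch of the semantics meant here, expressed with a plain `tokio::task::JoinSet` rather than the crate's actual subsystem machinery (illustrative only, not the crate's API):

```rust
use tokio::task::JoinSet;

async fn parent_subsystem() {
    // Spawn some child tasks that belong to this parent.
    let mut children = JoinSet::new();
    children.spawn(async { /* child 1 */ });
    children.spawn(async { /* child 2 */ });

    // "wait_for_all_children": don't return until every child has finished,
    // which is roughly what the nested Toplevel is used for in the example above.
    while let Some(_result) = children.join_next().await {}
}
```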

I'm not sure I like exposing the SubsystemHandle in the Toplevel object. It's more of an implementation detail, and it's not supposed to be used the way you intend to use it here. I feel like a refactoring in your code could achieve a better result. Maybe I'm wrong, though; I might have to think about it more. Sadly I currently have very little time, so it might have to wait a little :/ I apologize.

@Ten0 (Author) commented Mar 14, 2023

Apart from the awkward syntax when spawning tasks in a loop (which I'd like to avoid) and the impossibility of creating, from the main thread, objects that hold a subsystem handle so that they can spawn new tasks autonomously (a need my example doesn't really showcase, but it goes with the restrictive interface): yes, the way this example is built showcases a need to shut down in a particular order, where the monitoring stops only after all the other tasks have finished shutting down.

While one vision could be that the subsystem handle is an implementation detail, another could be that it's a remote referencing a Toplevel: something that enables starting tasks, stopping that Toplevel, stopping globally, and checking whether one should stop. While this may imply some renaming, it seems like a reasonable public API, and a much less restrictive one.

@Finomnis (Owner) commented Mar 14, 2023

Are you 100% sure this does not break anything? For example, it might allow spawning new tasks while the system is already shutting down. I hid the toplevel subsystem on purpose, because I don't think it's safe to be used directly in all cases.

I think what your problem showcases instead is that a rework is indeed necessary.

I sadly have little time at the moment, though. Maybe someone can use my solution as inspiration for a better API?

@Ten0 (Author) commented Mar 14, 2023

> For example, it might allow spawning new tasks while the system is already shutting down

IIUC all of that is already possible once the subsystem handle is leaked through any task.
I guess if I wanted to circumvent this limitation, I could spawn a task that sends its subsystem handle through a oneshot channel, and await the receiver right after. That just seems like a terribly wrong way to do it compared to opening this PR.
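For concreteness, a minimal sketch of that workaround, assuming the `start`/`SubsystemHandle` closure style used elsewhere in this thread (exact signatures are an assumption, and this is exactly the indirection the PR tries to avoid):

```rust
use tokio::sync::oneshot;

let (tx, rx) = oneshot::channel();
let toplevel = toplevel.start("handle_leaker", move |subsys| async move {
    // Leak a clone of the handle to the outside, then just wait for shutdown.
    let _ = tx.send(subsys.clone());
    subsys.on_shutdown_requested().await;
    Ok::<(), anyhow::Error>(())
});
// Back outside: await the receiver right after, as described above.
let leaked_handle = rx.await.expect("subsystem was started");
```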

> `ss.request_global_shutdown()` is probably the wrong function in "Tasks"; `ss.request_shutdown()` is probably what you want.

IIUC these achieve the same effect in this case, because `ss` at that point is the top-level `Toplevel` (naming ^^). I just wanted to make it very clear that at that point I really wanted to shut down everything and not just the tasks.

> `tasks_ss.request_shutdown()` does not shut down `tasks_ss`; it shuts down the `Toplevel` it is contained in. The only way of shutting down a nested subsystem is from its parent.

Yes, and since that `Toplevel` is `tasks`, it means I'm only requesting shutdown of the tasks, and in particular not of the monitoring.

@Finomnis (Owner)

> For example, it might allow spawning new tasks while the system is already shutting down

I think this might be covered, because there is already a mechanism in place that prevents new tasks from getting spawned once the system is shutting down. There is an option somewhere that gets collected on shutdown. So this might be safe after all.
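In case it helps, a generic sketch of that kind of guard (not the crate's actual internals): registration goes through an `Option` that is taken on shutdown, so late spawn attempts are simply refused:

```rust
use std::sync::Mutex;
use tokio::task::JoinHandle;

struct Spawner {
    // `Some(...)` while running; taken (set to `None`) when shutdown begins.
    handles: Mutex<Option<Vec<JoinHandle<()>>>>,
}

impl Spawner {
    /// Returns `false` if shutdown has already started.
    fn try_spawn<F>(&self, fut: F) -> bool
    where
        F: std::future::Future<Output = ()> + Send + 'static,
    {
        match self.handles.lock().unwrap().as_mut() {
            Some(handles) => {
                handles.push(tokio::spawn(fut));
                true
            }
            None => false, // shutting down: refuse to spawn new tasks
        }
    }

    /// Called once at shutdown: after this, `try_spawn` always fails.
    fn begin_shutdown(&self) -> Vec<JoinHandle<()>> {
        self.handles.lock().unwrap().take().unwrap_or_default()
    }
}
```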

@Finomnis (Owner) commented Mar 14, 2023

I feel like my crate is too opinionated for your use case...

I really don't feel comfortable with leaking the SubsystemHandle. I kind of regret the decision to make it owned/clonable in the first place. The handle was originally meant to stay within the subsystem, to interact with the shutdown system. The value returned from `start` is what should be used as an external handle.

But I think the API got too convoluted and the naming is all wrong. And now people use the internal handle as an external handle and store it in structs and such.

I'd like to perform a rework where the internal token is a scoped `&mut` reference that can't be leaked out of the subsystem, plus an external handle that is more powerful than the current one.
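To make the idea concrete, a hypothetical sketch (not the crate's API) of what such a scoped token could look like: the subsystem body only ever sees a `&mut` borrow, so the borrow checker prevents it from being stored in a `'static` struct or moved into another task:

```rust
// Hypothetical type, for illustration only.
struct SubsystemToken {
    // shutdown signalling, child registration, ...
}

impl SubsystemToken {
    fn request_shutdown(&mut self) { /* ... */ }
}

async fn my_subsystem(token: &mut SubsystemToken) {
    // Fine: the token is used in place, within the subsystem.
    token.request_shutdown();

    // Does NOT compile: the borrow is not `'static`, so it cannot be leaked
    // into a spawned task (or stored in a long-lived struct).
    // tokio::spawn(async move { token.request_shutdown() });
}
```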

@Ten0 (Author) commented Mar 14, 2023

Yet as far as I could tell from searching lib.rs for a library that handles graceful shutdown in the tokio universe, this one is the best suited to my use case ;)

@Finomnis (Owner)

I think I'll reject this for now as it doesn't really fit into existing design decisions.

@Finomnis Finomnis closed this Apr 22, 2023
@Ten0 (Author) commented Jun 12, 2023

> I'd like to perform a rework where the internal token is a scoped `&mut` reference that can't be leaked out of the subsystem, plus an external handle that is more powerful than the current one.

FYI (in case it matches the rework you had in mind): I ran into issues using this crate where it would sometimes freeze when tasks were supposed to time out. That was very hard to investigate, because there is a lot of code and I couldn't extract a minimal repro, so I figured the easiest way to fix it would be to rewrite a simpler version of this crate from scratch.

I've published it as tokio_tasks_shutdown (it has about a third of the code of tokio-graceful-shutdown).

@Finomnis (Owner)

Is the freeze related to #50?

If so, I don't think you will get rid of it in a rewrite; it's deeply rooted in tokio...

But sure, that's why I like the open-source community: if something isn't good enough, people can always fork/rewrite/improve. Kudos for the effort :) I might try it out some day.

@Finomnis (Owner)

But I would indeed be happy about a reproducible hang; a lot of effort was put into making this entire thing as reliable as possible, so if I missed some weird corner case, I'd love to hear about it.

@Ten0 (Author) commented Jun 12, 2023

> I would indeed be happy about a reproducible hang; a lot of effort was put into making this entire thing as reliable as possible, so if I missed some weird corner case, I'd love to hear about it.

I really tried: I did spend two hours trying to extract a repro and didn't manage (that was made even harder by the fact that this would only happen in production). It seemed that either `TaskHandle::abort()` was never called for some tasks, or it had no effect (we had the timeout log), and then we were hanging on

`result = self.attempt_clean_shutdown() => result`

I had a look at #50 back then, and it seemed that wasn't it, since:

  • We didn't get a successful shutdown (it hangs before)
  • Before adding tokio-graceful-shutdown to the program, it wouldn't hang, which led me to think that a hanging task wasn't what was causing the issue

However, it is for this scenario that I added an extra timeout that stops waiting on aborted hanging tasks, so the user can make sure this won't hang even in that case.
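For reference, a minimal sketch of that extra timeout in plain tokio terms (generic, not either crate's actual code): after aborting, don't wait indefinitely for the aborted tasks to actually terminate:

```rust
use std::time::Duration;
use tokio::task::JoinHandle;
use tokio::time::timeout;

async fn abort_with_deadline(handles: Vec<JoinHandle<()>>, deadline: Duration) {
    for handle in &handles {
        handle.abort();
    }
    for handle in handles {
        // An aborted task can still hang if it is stuck in non-cancellable
        // code (e.g. blocking inside `spawn_blocking`); give up after
        // `deadline` instead of hanging the whole shutdown.
        if timeout(deadline, handle).await.is_err() {
            // log and move on
        }
    }
}
```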

Design-wise, I'm using a single control task selecting on channels instead of mutexes. I think that makes it easier to reason about (and of course even simpler, since I didn't include the subsystems concept), so hopefully if there is such a potential corner case I didn't hit it 🤷. And if #50 was indeed the issue, the new timeout should solve it (worst-case scenario, the user can always call `process::exit` as long as the function returns).
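As an illustration of that design (message names and shapes here are made up, not the published crate's code), everything goes through one channel-driven control task, so there are no locks to reason about:

```rust
use tokio::sync::mpsc;
use tokio::task::JoinHandle;

enum ControlMsg {
    Register { name: String, handle: JoinHandle<()> },
    Finished { name: String },
    ShutdownRequested,
}

async fn control_task(mut rx: mpsc::Receiver<ControlMsg>) {
    // All bookkeeping lives in this one task.
    let mut running: Vec<(String, JoinHandle<()>)> = Vec::new();
    let mut shutting_down = false;

    while let Some(msg) = rx.recv().await {
        match msg {
            ControlMsg::Register { name, handle } if !shutting_down => {
                running.push((name, handle));
            }
            ControlMsg::Register { handle, .. } => {
                // Late registration during shutdown: refuse it.
                handle.abort();
            }
            ControlMsg::Finished { name } => {
                running.retain(|(n, _)| n != &name);
                if shutting_down && running.is_empty() {
                    break; // clean shutdown complete
                }
            }
            ControlMsg::ShutdownRequested => {
                shutting_down = true;
                // A real implementation would first signal graceful shutdown
                // (e.g. via a CancellationToken) and only abort after a timeout.
                if running.is_empty() {
                    break;
                }
            }
        }
    }
}
```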

I'll let you know if that doesn't work. 😊
