Give access to subsystem handle from toplevel #55
This allows accessing a more permissive `start` function, as well as starting nested subsystems without introducing one additional level of indirection that requires `Send + 'static`
// Obviously these two things are more complex in reality
// thing needs &mut monitoring to initialize, and then that needs to be moved
// to the Monitoring task
let mut monitoring_registry = ();
Check warning (Code scanning / clippy): this let-binding has unit value

let tasks = Toplevel::<anyhow::Error>::nested(toplevel.subsystem_handle(), "Tasks");
for task in 0..2 {
let thing_to_work_on = init_thing(&mut monitoring_registry);
Check warning (Code scanning / clippy): this let-binding has unit value
.start(&format!("Task {task}"), move |subsystem| async move {
// This simulates normal work with, but obviously it would normally wait on select
// on `thing` and on_shutdown_requested
let _thing = thing_to_work_on;
Check warning (Code scanning / clippy): this let-binding has unit value
res
})
.start("Monitoring", move |ss| async move {
let _monitoring = monitoring_registry;
Check warning (Code scanning / clippy): this let-binding has unit value
I think there are a couple of misconceptions in your example:
Out of curiosity/feedback:
I'm not sure I like exposing the SubsystemHandle in the Toplevel object. It's more of an implementation detail, and it's not supposed to be used the way you intend to use it here. I feel like refactoring your code could achieve a better result. Maybe I'm wrong, though; I might have to think about it more. Sadly, I currently have very little time, so it might have to wait a little :/ I apologize.
Apart from the awkward syntax when spawning tasks in a loop, which I'd like to avoid, and the impossibility of creating, from the main thread, objects that contain a Subsystem so that they can spawn new tasks autonomously (a new need that my example doesn't really showcase, but that goes with the restrictive interface): yes, the way this example is built showcases a need to shut down in a particular order, where the monitoring stops after all the other tasks have finished shutting down. One view is that Subsystem is an implementation detail; another is that it's a remote handle referencing a Toplevel, one that enables starting tasks, stopping it, stopping globally, and checking whether one should stop. While this may imply some renaming, it seems like a reasonable public API, and a much less restrictive one.
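To make the ordering requirement concrete, here is a minimal sketch using plain `std` threads rather than tokio or this crate. The function name `run_ordered_shutdown` and the two stop flags are assumptions made for illustration; the point is only that the monitoring task is told to stop after every other task has been joined.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

/// Hypothetical helper (not part of any crate): runs `workers` worker threads
/// plus one monitoring thread, then shuts down in two phases. Returns the
/// order in which the threads reported that they had stopped.
fn run_ordered_shutdown(workers: usize) -> Vec<String> {
    let stop_workers = Arc::new(AtomicBool::new(false));
    let stop_monitoring = Arc::new(AtomicBool::new(false));
    let (done_tx, done_rx) = mpsc::channel::<String>();

    let worker_handles: Vec<_> = (0..workers)
        .map(|id| {
            let stop = Arc::clone(&stop_workers);
            let done = done_tx.clone();
            thread::spawn(move || {
                // Stand-in for real work: poll the stop flag.
                while !stop.load(Ordering::Acquire) {
                    thread::sleep(Duration::from_millis(1));
                }
                done.send(format!("Task {id}")).unwrap();
            })
        })
        .collect();

    let monitoring = {
        let stop = Arc::clone(&stop_monitoring);
        let done = done_tx.clone();
        thread::spawn(move || {
            while !stop.load(Ordering::Acquire) {
                thread::sleep(Duration::from_millis(1));
            }
            done.send("Monitoring".to_string()).unwrap();
        })
    };
    // Drop the original sender so the receiver ends once all threads finish.
    drop(done_tx);

    // Phase 1: stop the workers and wait until all of them are done.
    stop_workers.store(true, Ordering::Release);
    for handle in worker_handles {
        handle.join().unwrap();
    }

    // Phase 2: only now is the monitoring thread asked to stop.
    stop_monitoring.store(true, Ordering::Release);
    monitoring.join().unwrap();

    done_rx.iter().collect()
}

fn main() {
    let order = run_ordered_shutdown(2);
    println!("{order:?}");
}
```

The workers may report in any order relative to each other, but "Monitoring" is always the last entry, which is the property the nested-subsystem example is after.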
Are you 100% sure this does not break anything? For example, it might allow spawning new tasks while the system is already shutting down. I hid the toplevel subsystem on purpose, because I don't think it's safe to be used directly in all cases. I think what your problem showcases instead is that a rework is indeed necessary, although I sadly have little time at the moment. Maybe someone can use my solution as inspiration for a better API?
IIUC all that is already possible once the subsystem is leaked through any task.
IIUC these achieve the same effect in this case because
Yes, and that top-level being
I think this might be covered, because there is already a mechanism in place that prevents new tasks from getting spawned once the system is shutting down. There is an option somewhere that gets collected on shutdown. So this might be safe after all.
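One way such a guard could work is sketched below, assuming the "option" is an `Option` holding the task list that shutdown takes out. This is not the crate's actual code: `Spawner` and `ShuttingDown` are hypothetical names, and plain threads stand in for tokio tasks.

```rust
use std::sync::Mutex;
use std::thread::{self, JoinHandle};

/// Hypothetical error returned when spawning after shutdown has begun.
#[derive(Debug, PartialEq)]
struct ShuttingDown;

/// Hypothetical spawner, for illustration only.
struct Spawner {
    // `Some(..)` while running; taken (left as `None`) once shutdown begins.
    tasks: Mutex<Option<Vec<JoinHandle<()>>>>,
}

impl Spawner {
    fn new() -> Self {
        Spawner {
            tasks: Mutex::new(Some(Vec::new())),
        }
    }

    /// Spawns `f` unless shutdown has already started.
    fn spawn<F>(&self, f: F) -> Result<(), ShuttingDown>
    where
        F: FnOnce() + Send + 'static,
    {
        let mut guard = self.tasks.lock().unwrap();
        match guard.as_mut() {
            Some(tasks) => {
                tasks.push(thread::spawn(f));
                Ok(())
            }
            // The list was already collected: refuse to start new work.
            None => Err(ShuttingDown),
        }
    }

    /// Collects the task list and joins every task. Returns how many tasks
    /// were joined; a second call finds nothing left and returns 0.
    fn shutdown(&self) -> usize {
        // The lock guard is released at the end of this statement, so tasks
        // are joined without holding the lock.
        let tasks = self.tasks.lock().unwrap().take().unwrap_or_default();
        let count = tasks.len();
        for task in tasks {
            task.join().unwrap();
        }
        count
    }
}

fn main() {
    let spawner = Spawner::new();
    spawner.spawn(|| println!("working")).unwrap();
    println!("joined {} task(s)", spawner.shutdown());
    println!("spawn after shutdown: {:?}", spawner.spawn(|| {}));
}
```

Because the check and the push happen under the same lock, a task cannot slip in between the moment shutdown collects the list and the moment it starts joining.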
I feel like my crate is too opinionated for your use case... I really don't feel comfortable with leaking the SubsystemHandle. I kind of regret the decision to make it owned/cloneable in the first place. The handle was originally meant to stay within the subsystem, to interact with the shutdown system. The thing returned from "start" is the thing that should be used as an external handle. But I think the API got too convoluted and the naming is all wrong. And now people use the internal handle as an external handle and store it in structs and stuff. I'd like to perform a rework where the internal token is a scoped mut reference that can't be leaked out of the subsystem, and where there is also an external handle that is more powerful than the current one.
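The split described here, a borrowed internal token plus an owned external handle, could look roughly like the following. This is a thread-based sketch under stated assumptions, not the planned rework: `ShutdownToken`, `ExternalHandle`, and this `start` are hypothetical, and an async version would need more care around borrowing across `.await` points.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical internal token. It is not `Clone` and is only ever handed to
/// the task body as `&mut`, so the body cannot keep it after returning.
/// (In a real crate the field would also be private to another module.)
struct ShutdownToken {
    requested: Arc<AtomicBool>,
}

impl ShutdownToken {
    fn is_shutdown_requested(&self) -> bool {
        self.requested.load(Ordering::Acquire)
    }
}

/// Hypothetical external handle: the only thing the caller gets to keep.
struct ExternalHandle<T> {
    requested: Arc<AtomicBool>,
    thread: JoinHandle<T>,
}

impl<T> ExternalHandle<T> {
    fn request_shutdown(&self) {
        self.requested.store(true, Ordering::Release);
    }

    fn join(self) -> T {
        self.thread.join().unwrap()
    }
}

/// Starts `body` on a new thread. The higher-ranked bound means the token
/// reference is valid only for the duration of the call, and `T` cannot
/// borrow from it.
fn start<T, F>(body: F) -> ExternalHandle<T>
where
    T: Send + 'static,
    F: for<'a> FnOnce(&'a mut ShutdownToken) -> T + Send + 'static,
{
    let requested = Arc::new(AtomicBool::new(false));
    let mut token = ShutdownToken {
        requested: Arc::clone(&requested),
    };
    let thread = thread::spawn(move || body(&mut token));
    ExternalHandle { requested, thread }
}

fn main() {
    let handle = start(|token| {
        let mut iterations = 0u32;
        while !token.is_shutdown_requested() {
            iterations += 1;
            thread::sleep(Duration::from_millis(1));
        }
        iterations
    });
    handle.request_shutdown();
    println!("worker stopped after {} iteration(s)", handle.join());
}
```

With this shape, storing the token in a struct that outlives the task body is rejected by the borrow checker, while everything a caller needs from outside goes through `ExternalHandle`.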
Yet as far as I could tell from looking on lib.rs for a lib that would handle graceful shutdowns in a tokio universe, this is the one best suited to my use case ;)
I think I'll reject this for now as it doesn't really fit into existing design decisions.
FYI (in case that would match the rework you had in mind), I ran into issues using this crate where it would sometimes freeze when tasks were supposed to time out. That was very hard to investigate because there is a lot of code and I couldn't extract a minimal repro, so I figured the easiest way to fix it would be to just rewrite a simpler version of this crate from scratch. I've published it as
Is the freeze related to #50? If so, I don't think you will get rid of it in a rewrite; it's deeply rooted in tokio... But sure, that's why I like the open source community: if something isn't good enough, people can always fork/rewrite/improve. Kudos for the effort :) I might try it out some day.
But I would indeed be happy about a reproducible hang. A lot of effort was put into making this entire thing as reliable as possible, so if I missed some weird corner case, I'd love to hear about it.
I really tried: I did spend 2 hours trying to extract the repro and didn't manage (that was made even harder by the fact this would only happen in production). It seemed that either tokio-graceful-shutdown/src/toplevel.rs Line 264 in d7a6183
I had a look at #50 back then, and it seemed that wasn't it, since:
However, it is for this scenario that I added an extra timeout to stop waiting on aborted hanging tasks, so users can make sure this wouldn't hang even in that case. Design-wise, I'm using a single control task. I'll let you know if that doesn't work. 😊
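The idea of bounding how long shutdown waits on a task that may never finish can be sketched with `std` alone. This is an analogue, not the rewrite's code: `run_with_join_timeout` and `JoinOutcome` are hypothetical names, and in a tokio program one would more likely wrap the join in `tokio::time::timeout`.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical result of waiting on a task with an upper bound.
#[derive(Debug, PartialEq)]
enum JoinOutcome {
    Finished,
    TimedOut,
    Panicked,
}

/// Runs `task` on its own thread and waits at most `limit` for it to finish.
/// A task that overruns is left detached instead of blocking the caller.
fn run_with_join_timeout<F>(task: F, limit: Duration) -> JoinOutcome
where
    F: FnOnce() + Send + 'static,
{
    let (done_tx, done_rx) = mpsc::channel::<()>();
    thread::spawn(move || {
        task();
        // The waiter may already have given up, so a send error is ignored.
        let _ = done_tx.send(());
    });

    match done_rx.recv_timeout(limit) {
        Ok(()) => JoinOutcome::Finished,
        Err(mpsc::RecvTimeoutError::Timeout) => JoinOutcome::TimedOut,
        // The sender was dropped without sending: the task panicked.
        Err(mpsc::RecvTimeoutError::Disconnected) => JoinOutcome::Panicked,
    }
}

fn main() {
    let quick = run_with_join_timeout(|| {}, Duration::from_secs(1));
    let stuck = run_with_join_timeout(
        || thread::sleep(Duration::from_millis(500)),
        Duration::from_millis(10),
    );
    println!("quick: {quick:?}, stuck: {stuck:?}");
}
```

The trade-off is that a timed-out task keeps running in the background until the process exits, so the timeout guarantees that shutdown returns, not that the task has been cleaned up.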