Domain.join is suspiciously slow when using systhreads #12948

Open
talex5 opened this issue Jan 30, 2024 · 3 comments · May be fixed by #13026

talex5 commented Jan 30, 2024

This program spawns a domain that finishes by returning the current time. The main domain joins it to get this time and reports how much later the join itself finishes. I get around 50ms (the tick thread timeout?):

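(* Requires the unix and threads.posix libraries. *)
let () =
  let domain = Domain.spawn (fun () ->
      (* Create and join a systhread so that this domain uses systhreads. *)
      Thread.join @@ Thread.create (fun () -> ()) ();
      (* The domain's result is the time at which its body finished. *)
      Unix.gettimeofday ()
    )
  in
  (* t0: when the domain finished; t1: when Domain.join returned. *)
  let t0 = Domain.join domain in
  let t1 = Unix.gettimeofday () in
  Printf.printf "Domain join took %f\n" (t1 -. t0)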

I get (with OCaml 5.1.1):

Domain join took 0.051131        

strace -f -tt shows that a timeout appears to be what got it going again:

[pid 252806] 15:31:29.666652 +++ exited with 0 +++
[pid 252807] 15:31:29.716282 <... pselect6 resumed>) = 0 (Timeout)

If the domain doesn't run the sys-thread, then it's much faster:

Domain join took 0.000383        
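
The two measurements above can be reproduced side by side with a small helper. This is only a sketch: `measure_join` is a hypothetical name introduced here, not part of any library, and the numbers it prints will vary by machine and OCaml version. It assumes OCaml 5 with the unix and threads.posix libraries linked.

```ocaml
(* Sketch only. [measure_join] is a hypothetical helper, not a library function.
   Requires OCaml 5 with the unix and threads.posix libraries. *)

(* Run [body] in a fresh domain and return the time (in seconds) between the
   end of the domain's body and the return of Domain.join in the caller. *)
let measure_join (body : unit -> unit) : float =
  let domain = Domain.spawn (fun () ->
      body ();
      Unix.gettimeofday ())
  in
  let t0 = Domain.join domain in
  let t1 = Unix.gettimeofday () in
  t1 -. t0

let () =
  (* Same body as the original report: create and join one systhread. *)
  let with_thread =
    measure_join (fun () -> Thread.join (Thread.create (fun () -> ()) ()))
  in
  (* Comparison case: the domain never creates a systhread. *)
  let without_thread = measure_join (fun () -> ()) in
  Printf.printf "with systhread:    %f\n" with_thread;
  Printf.printf "without systhread: %f\n" without_thread
```

Running both cases in one process keeps the comparison under identical conditions, though it may not match two separate runs exactly.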

This is a (minor) problem for Eio, since the main domain is stuck for up to 50ms joining the domain, when it could be doing other work. For example, here is a trace showing the top domain freezing for a while trying to join the second domain:

[Image: domain-join]

(the yellow background shows areas where work is being done)

abbysmal commented

> I get around 50ms (the tick thread timeout?)

I think this intuition may be reasonable?

https://github.com/ocaml/ocaml/blob/trunk/otherlibs/systhreads/st_stubs.c#L467

When multiple domains are running, if a domain exits while it is running systhreads, it forcefully joins all of its other threads before exiting (to prevent rogue threads from running in the background after the domain has shut down).

It may be possible to avoid this delay, however; I can look at it tomorrow.

abbysmal commented

Never mind my previous comment; this is obviously not related to the forced join on other systhreads. I think we are indeed hitting the tick thread timeout before the tick thread can actually shut down.

Interestingly, there is this old comment here:
https://github.com/ocaml/ocaml/blob/trunk/otherlibs/systhreads/st_stubs.c#L564

/* Cleanup the thread machinery when the runtime is shut down. Joining the tick
   thread take 25ms on average / 50ms in the worst case, so we don't do it on
   program exit. (FIXME: not implemented in OCaml 5 yet) */

I am not sure what is not implemented here?

gadmm commented Feb 19, 2024

Since all FIXMEs in code should have a corresponding issue to track them, this is indeed related to the first issue of #12399, with the suggestion to use a condition variable. (Though I do not think anyone realized that this would result in a delay for Domain.join.)
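
To illustrate why a wakeable wait removes this kind of delay, here is a toy model in OCaml. It is not the runtime's tick thread, and every name in it (`sleeping_ticker`, `wakeable_ticker`, `time_stop_sleeping`, `time_stop_wakeable`) is hypothetical. Because the standard library's Condition module has no timed wait, the sketch swaps in a pipe watched by `Unix.select` with a timeout in place of a condition variable; the principle is the same, namely that the stop request wakes the waiter instead of waiting for the timeout to expire. It assumes the unix and threads.posix libraries.

```ocaml
(* Toy model only: this is NOT the OCaml runtime's tick thread.
   All function names here are hypothetical.
   Requires the unix and threads.posix libraries. *)

let tick_interval = 0.05 (* 50 ms, matching the delay reported in this issue *)

(* Variant A: sleep for a full interval, then check the stop flag.
   A stop request is only noticed when the current sleep ends. *)
let sleeping_ticker stop =
  while not (Atomic.get stop) do
    Unix.sleepf tick_interval
  done

(* Variant B: wait on a pipe with a timeout. Writing a byte to the pipe
   makes it readable, so the wait returns without waiting for the timeout. *)
let wakeable_ticker stop_r =
  let rec loop () =
    match Unix.select [stop_r] [] [] tick_interval with
    | exception Unix.Unix_error (Unix.EINTR, _, _) -> loop ()
    | [], _, _ -> loop () (* timeout: a real ticker would do its tick here *)
    | _ -> ()             (* stop pipe is readable: exit the loop *)
  in
  loop ()

(* Time how long it takes to request a stop and join a sleep-based ticker. *)
let time_stop_sleeping () =
  let stop = Atomic.make false in
  let t = Thread.create sleeping_ticker stop in
  Unix.sleepf 0.01; (* let the ticker enter its sleep *)
  let t0 = Unix.gettimeofday () in
  Atomic.set stop true;
  Thread.join t;
  Unix.gettimeofday () -. t0

(* Same measurement for the pipe-based ticker. *)
let time_stop_wakeable () =
  let stop_r, stop_w = Unix.pipe () in
  let t = Thread.create wakeable_ticker stop_r in
  Unix.sleepf 0.01; (* let the ticker enter its wait *)
  let t0 = Unix.gettimeofday () in
  ignore (Unix.write stop_w (Bytes.of_string "x") 0 1);
  Thread.join t;
  let dt = Unix.gettimeofday () -. t0 in
  Unix.close stop_r;
  Unix.close stop_w;
  dt

let () =
  Printf.printf "sleep-based ticker stopped in %f\n" (time_stop_sleeping ());
  Printf.printf "wakeable ticker stopped in    %f\n" (time_stop_wakeable ())
```

In this model the sleep-based ticker can take up to one full interval to stop, while the wakeable one should stop as soon as the byte is written; whether the actual fix in the runtime takes this shape is not something this sketch claims.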

damiendoligez self-assigned this Mar 7, 2024
damiendoligez added a commit to damiendoligez/ocaml that referenced this issue Mar 13, 2024
damiendoligez added a commit to damiendoligez/ocaml that referenced this issue Mar 13, 2024
damiendoligez linked a pull request Mar 13, 2024 that will close this issue
damiendoligez added a commit to damiendoligez/ocaml that referenced this issue Apr 30, 2024

4 participants