-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #12948 (slow domain join) #13026
base: trunk
Are you sure you want to change the base?
Conversation
int res; | ||
caml_plat_assert_locked(cond->mutex); | ||
res = pthread_cond_timedwait(&cond->cond, cond->mutex, deadline); | ||
if (res != ETIMEDOUT) check_err("timedwait", res); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One documentation for pthread_cond_timedwait
mentions spurious wakeups. In these cases one should probably restart pthread_cond_timedwait
rather than raise an exception. (edit: callers should take spurious wakeups into account)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading more closely this time around, I changed where I think one should deal with spurious wakeups. I also have added a comment about the choice of clock.
caml_plat_assert_locked(cond->mutex); | ||
res = pthread_cond_timedwait(&cond->cond, cond->mutex, deadline); | ||
if (res != ETIMEDOUT) check_err("timedwait", res); | ||
return res; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return res; | |
return res != ETIMEDOUT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, res
is either 0 (signal or spurious wake-up) or ETIMEDOUT (deadline reached), so this is just adding a negation. I would rather keep this function as close as possible to pthread_cond_timedwait
and leave the return value as-is.
@@ -114,8 +114,7 @@ typedef struct { pthread_cond_t cond; caml_plat_mutex* mutex; } caml_plat_cond; | |||
#define CAML_PLAT_COND_INITIALIZER(m) { PTHREAD_COND_INITIALIZER, m } | |||
void caml_plat_cond_init(caml_plat_cond*, caml_plat_mutex*); | |||
void caml_plat_wait(caml_plat_cond*); | |||
/* like caml_plat_wait, but if nanoseconds surpasses the second parameter | |||
without a signal, then this function returns 1. */ | |||
int caml_plat_timedwait(caml_plat_cond*, const struct timespec *); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int caml_plat_timedwait(caml_plat_cond*, const struct timespec *); | |
/* return 0 on timeout */ | |
int caml_plat_timedwait(caml_plat_cond*, const struct timespec *); |
otherlibs/systhreads/st_pthreads.h
Outdated
}else{ | ||
deadline.tv_sec = curtime.tv_sec; | ||
} | ||
(void) caml_plat_timedwait (&Tick_thread_control.cond, &deadline); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correcting my previous comment, I'd see spurious wakeups handled here:
(void) caml_plat_timedwait (&Tick_thread_control.cond, &deadline); | |
while(caml_plat_timedwait (&Tick_thread_control.cond, &deadline)) | |
{ | |
if (Tick_thread_control.state == Tick_stop) goto …; | |
/* In case of spurious wakeup, keep waiting */ | |
}; |
otherlibs/systhreads/st_pthreads.h
Outdated
caml_plat_lock (&Tick_thread_control.mu); | ||
while(1){ | ||
if (Tick_thread_control.state == Tick_stop) break; | ||
gettimeofday (&curtime, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The clock for caml_plat_timedwait
is set inside caml_plat_cond_init_aux
in a platform-dependent manner. Please use e.g. clock_gettime
to make sure the deadline is on the same clock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, clock_gettime
is not available on MacOS, so I had to reproduce the platform-dependent part.
otherlibs/systhreads/st_stubs.c
Outdated
caml_plat_unlock (&Tick_thread_control.mu); | ||
caml_plat_signal (&Tick_thread_control.cond); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
caml_plat_unlock (&Tick_thread_control.mu); | |
caml_plat_signal (&Tick_thread_control.cond); | |
caml_plat_signal (&Tick_thread_control.cond); | |
caml_plat_unlock (&Tick_thread_control.mu); |
The DEBUG_LOCK
seems to enforce that the mutex is locked when entering caml_plat_signal
. But there is a PR currently under review which aims to remove the needless restriction.
However according to some sources online, schedulers are programmed to better handle the case where pthread_cond_signal
is called with the mutex locked. Not an expert on these details though. Feel free to ignore this change if you have a strong opinion about it. I am curious to know the rationale for the best choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no strong opinion and I am ready to believe that schedulers are optimized that way.
a317498
to
e587924
Compare
Many thanks for the thorough review. I changed the code to implement your suggestions. |
Running precheck at https://ci.inria.fr/ocaml/job/precheck/972/ . |
This needs a rebase, it would be useful to get an explicit approval from @gadmm, but also the CI reports a lot of failed tests on Windows:
This probably needs some investigation if the results are consistent after the rebase and CI rerun. |
I think this is what @gadmm had in mind in #12399 (comment).
fixes #12948
Also fixes the first item of #12399