gRPC crash in a forked process (python) #31240
Comments
This is not a gRPC bug. The only reliable solution is to not use If a process has any threads running at the time the fork system call happened, on any platform, all bets are off. The Python runtime is not async-signal-safe (and never will be; that is impossible), so you cannot execute any Python code after fork from a process that had threads. Difficult-to-debug random deadlocks and crashes are normal in that situation. When you see an application working fine despite that, it is running on borrowed time and has gotten lucky. Its luck will run out at some unplanned point in the future. Your
@gpshead gave a great explanation. I don't think there's much we can do here.
@gpshead @gnossen I think the correct direction is that grpcio should NOT create any implicit background threads until the user actually calls a resource-initialization API after forking. I agree with @gpshead's explanation of the general danger of forking threaded processes, but if the user carefully controls the order of forking and threading, we should be able to use grpcio without any problem. How could we track such implicit threads in grpcio holistically?
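A minimal sketch of the "initialise only after forking" ordering described in this comment, assuming the resource can be built lazily. `PerProcess` and its `factory` argument are hypothetical names, not part of grpcio; the factory stands in for whatever creates the client or channel.

```python
import os


class PerProcess:
    """Build a resource lazily and rebuild it when the PID changes (after fork).

    Sketch only: not thread-safe, and it assumes nothing thread-spawning
    has run in the parent before the fork.
    """

    def __init__(self, factory):
        self._factory = factory  # hypothetical: e.g. a function that opens a channel
        self._pid = None
        self._value = None

    def get(self):
        pid = os.getpid()
        if self._pid != pid:
            # First use in this process, or we are in a forked child:
            # build a fresh resource instead of reusing the parent's.
            self._value = self._factory()
            self._pid = pid
        return self._value
```

The holder can be created at import time in the parent, as long as `get()` is first called in the process that will actually use the resource.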
Examine your process and see what started the threads. From python/cpython#77906 it sounds like Apple system APIs themselves are starting background threads. Was grpc even involved at all? While it is polite for libraries to not start threads without an explicit "okay, go" API call, this applies to all transitive dependencies of your application. If you understand and manage all of the API calls made by your entire process up until you fork, to make sure that nothing that spawns threads is called, you can probably do it safely. This is not easy to do, and it becomes harder over time as things change out from underneath you.
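One rough way to "examine your process" from Python is to list live threads just before forking. This is a sketch under a significant assumption: `threading.enumerate()` only reports threads the Python `threading` module knows about, so native threads started by C extensions or system frameworks will not show up. The function names are made up for illustration.

```python
import threading


def extra_python_threads():
    """Return the names of live Python threads other than the main thread."""
    main = threading.main_thread()
    return [t.name for t in threading.enumerate() if t is not main]


def assert_safe_to_fork():
    """Raise if any Python-level thread besides the main one is alive."""
    extra = extra_python_threads()
    if extra:
        raise RuntimeError(f"refusing to fork with live threads: {extra}")
```

An empty result is necessary but not sufficient: it rules out Python-level threads only.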
#33400 might help here.
What version of gRPC and what language are you using?
gRPC v1.48.0
Python
What operating system (Linux, Windows,...) and version?
macOS 12.6
What runtime / compiler are you using (e.g. python version or version of gcc)
Python 3.10.5
What did you do?
Use gRPC client in a forked process, like in a Celery worker when Celery runs in "prefork" pool mode. A gRPC call must fail for the issue to occur.
Here's the minimal reproducible example:
What did you expect to see?
The child process should exit without any issues (the gRPC call is expected to fail, but that doesn't matter).
What did you see instead?
The child process exits due to a SIGABRT signal:
Here's a full backtrace from the breakpoint mentioned in the above message:
Additional context
On macOS, certain types need to be "initialized" before forking (for references see links below) to avoid crashing in forked-off processes. Otherwise, there's a good chance of seeing messages like:
followed by SIGABRT which terminates the process.
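For anyone trying to confirm this symptom from the parent side, the sketch below forks a child and reports whether it exited normally or was killed by a signal. It does not involve gRPC at all; `run_in_fork` is a hypothetical helper, and `os.abort` is used only to simulate a child that dies with SIGABRT.

```python
import os
import signal


def run_in_fork(child_fn):
    """Fork, run child_fn in the child, and return (exit_code, signal_number).

    Exactly one element of the returned tuple is None.
    """
    pid = os.fork()
    if pid == 0:
        # Child: never fall through into the parent's code path.
        try:
            child_fn()
        except BaseException:
            os._exit(1)
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return None, os.WTERMSIG(status)
    return os.WEXITSTATUS(status), None


if __name__ == "__main__":
    code, sig = run_in_fork(os.abort)
    if sig == signal.SIGABRT:
        print("child was terminated by SIGABRT")
```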
This is a known but not well handled issue on macOS 10.13+, often affecting Python, Ruby (and some other scripting languages), typically in combination with 3rd party libraries which rely on threads, where the call to fork() is not followed by exec*() (which according to various sources is not so uncommon in scripting languages; some examples in Python are the "multiprocessing" package or Celery with its "prefork" worker pool).
Some sources recommend exporting OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES but this only hides the problem that still exists and can potentially result in hard-to-diagnose deadlocks.
Related issues:
References:
Possible workarounds/solutions
One possible workaround/solution is to initialise certain types before forking. Below is a snippet of code which we use in our codebase:
If you add a call to fix_grpc_client() in the minimal reproducible example shown at the top of this issue, before calling fork(), the issue will be gone.
However, this works for only one type, NSTimeZone in this case, so it's not an ideal solution: you can't turn it into a generalised fix for all potential types which require "initialisation" prior to forking (or at least I'm not aware of any way of doing so).
Additionally, in our codebase we simply call fix_grpc_client() early during process init, whereas we could potentially use pthread_atfork() (as suggested in https://www.wefearchange.org/2018/11/forkmacos.rst.html, although the author claims it didn't work for him) to do this right before forking, and only if forking at all.