Cloud Spanner with Python multiprocessing hangs #4736
cc @jonparrott |
@snthibaud can you let us know what versions you're using? |
@vkedia @jonparrott Thanks for looking into it! These are the versions in my virtual environment:
|
Are you doing anything with gRPC or using the client at all before starting the pool? Forking after gRPC has been used apparently causes issues. FYI: if you're using multiprocessing to increase write throughput (and not actually doing anything CPU-intensive), I'd recommend using threading, as RPC calls are I/O bound and not CPU bound. |
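A minimal sketch of the threading approach, assuming the google-cloud-spanner `Database.batch()` context manager and its `insert(table, columns, values)` method: one database handle is shared by a `ThreadPoolExecutor`, with one batch (one commit) per chunk of rows. The helper name `write_batches_threaded` is hypothetical. Later in this thread a gRPC segfault with many active threads was traced to grpc/grpc#13327, so a grpcio release that includes that fix is assumed.

```python
from concurrent.futures import ThreadPoolExecutor


def write_batches_threaded(database, table, columns, batches, max_workers=8):
    """Commit each batch of rows from a thread pool sharing one database handle.

    `database` is assumed to behave like a google-cloud-spanner `Database`:
    `database.batch()` returns a context manager whose `insert(table, columns,
    values)` queues mutations that are committed when the `with` block exits.
    """

    def write_one(rows):
        # One batch (and therefore one commit) per chunk of rows.
        with database.batch() as batch:
            batch.insert(table=table, columns=columns, values=rows)
        return len(rows)

    # RPCs are I/O bound, so threads can overlap network waits without
    # forking and without creating a client per worker.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order and re-raises worker exceptions.
        return list(pool.map(write_one, batches))
```

With the real client, `database` would come from something like `spanner.Client().instance(INSTANCE_ID).database(DATABASE_ID)`, where both IDs are placeholders.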
@chemelnucfin Since you've been doing a lot of work on spanner lately, can I task you with trying to put together a simple reproducible case for this? |
@jonparrott It'd certainly be possible to allow swapping out all |
@dhermes not sure what you mean? |
@jonparrott I was trying to describe how one might implement support in gRPC for users to set the type of concurrency in a transparent way. |
Oh - I don't think it's anything in userland. From what I understand this is just a case of trying to use any gRPC-based client after forking - not an issue with any concurrency that our client libraries use. |
@jonparrott I'll take a look. |
Currently getting this:
|
@chemelnucfin try creating a new client in each process, as in the OP. |
@jonparrott So I have chased the multiprocessing to this
The culprit is the
which I have chased to more or less here: Any pointers from here will be appreciated. Also, multiprocessing didn't hang though, snapshots were ok. |
@jonparrott In answer to your question. I am using a different instance of the spanner client before starting the multiprocessing pool. This should be ok, right? |
I am AFK now, but I also believe I had a separate client before using multiprocessing, and it did not hang getting a snapshot. Could you try that? Multiprocessing did not work with insert, but it did not hang.
On Sun, Jan 14, 2018, 11:15 PM Stéphane Thibaud wrote:
@chemelnucfin I am not completely sure if the service account had the right permissions, so maybe it did hang because of a missing permission, although I did try again after changing the permissions and it was still hanging.
|
@chemelnucfin What exactly should I try? I am not sure if it was related to the retrieval of a snapshot. For me, it was hanging when entering the scope of a batch. Does it retrieve a snapshot at that point? |
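If I remember correctly, that is the core of the issue. gRPC's C core cannot function after a fork, so any work with gRPC must be done *after* you fork. Correct me if I'm wrong @kpayson64 @nathanielmanistaatgoogle |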
@jonparrott I have to go back and check, but I believe that that is what I did and multiprocessing worked. I just couldn't use batch.insert function. I'll report back later. |
On Tue, Jan 16, 2018 at 8:43 AM, Jon Wayne Parrott wrote:
If I remember correctly, that is the core of the issue. gRPC's C core cannot function after a fork, so any work with gRPC must be done *after* you fork. Correct me if I'm wrong @kpayson64 @nathanielmanistaatgoogle
Correct. You should delay client creation until after the fork() call in the process pool.
|
@snthibaud Could you try to From what I understand it's once you step into the context manager it's hanging, so the code runs ok without the entire Could you also try some other operations as I did in #4756 and see if they work? |
@chemelnucfin When putting a pass statement there, it still hangs. I traced the hanging back to line 103 of |
@kpayson64 @jonparrott I understand that in general the client cannot be passed to a new process, but in this case, the main process and all forked processes each have their own client (for the forked processes: see code above). These processes should be completely separated from each other, right? Is the grpc core still somehow shared between the main process and child processes? |
@chemelnucfin This traceback when stopping the process might also help:
|
@kpayson64 @jonparrott @chemelnucfin I have tried creating all clients only after forking and that did work. Perhaps the gRPC core is shared between the processes after all. I still feel that it's not completely right that subprocesses are hanging in this case. |
Glad to know. @jonparrott Now I'm confused why my code did work. |
@jonparrott @chemelnucfin I am trying to use threads instead of processes (as you suggested) to increase write throughput as follows:
But then I get |
That is my understanding, so using any gRPC client before forking will cause issues in the forked processes.
Our client should be thread-safe, and in any case, should not cause a segfault. This seems to point to some native code causing issues. @kpayson64 can you help out here? |
Yes, that looks like a core segfault, but without any additional information it's hard to diagnose. It's possible that you are running into grpc/grpc#13327, which as I understand occurs when there are a bunch of active threads. |
I believe @haih-g also ran into this issue while using spanner client across multiple threads. |
@kpayson64 Yes, that's almost certainly it. I am also using Python 3.6 like most users in the issue you mention. |
@snthibaud Since I believe it is a different issue now and also reported in a different repository, I will close this issue. Please feel free to reopen or open another issue if desired. Thanks. |
Actually I would like to reopen it. The underlying issue might be in gRPC but this will definitely impact Spanner customers. At the least we should have some documentation and recommendations around how customers can use spanner client safely across processes and across threads. |
Agreed, we should have some documentation that mentions explicitly that you cannot use this client in a multiprocessing environment unless you defer all calls until after forking. |
@jonparrott @vkedia I was going to follow up in the PR or another issue regarding the multiprocessing issues. Sorry! |
Thanks. What about multi threading? Does the client not work at all with multi threading or it works under certain conditions? |
The client should be thread-safe, if there's any issues with that we should solve them. |
But there seems to be some issue with gRPC which is causing a segmentation fault in the Spanner client when used across multiple threads. How do we plan to tackle that? |
@vkedia that may already be resolved (see grpc/grpc#13327 (comment)), but we need someone to test. So far it seems it's been difficult to find a reproducible case. |
@snthibaud Is the multithreading still segfaulting and multiprocessing still hanging and causing you problems after the fix @jonparrott mentioned? @kpayson64 I wrote a test in #4756 which creates the pool after client creation, and it does not hang. However, @snthibaud and you mention that it should hang. Did I write something wrong? Thank you. |
@snthibaud From this comment here, grpc/grpc#12455 (comment), it seems the problem only occurs on Mac and not Linux. May I ask if you are using a Mac? |
Folks, gRPC issue 13327 was recently fixed and verified to address some known segfault cases involving multithreading. This fix in gRPC is related to a bug in internal refcount logic and NOT related to fork(). If anyone on this thread is able to reproduce this issue, it is worth retrying with gRPC release 1.9.1. I am curious to know if there is a way to reproduce a hang/crash when forking before client creation. |
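A small, hypothetical helper for checking whether the installed grpcio is at least the release recommended for retesting, assuming that release is 1.9.1. The threshold and function names are illustrative; the installed version is read with `importlib.metadata.version("grpcio")` (Python 3.8+).

```python
import re
from importlib import metadata

# Assumption: the minimum release that includes the grpc/grpc#13327 fix.
MIN_GRPC = (1, 9, 1)


def parse_version(text):
    """Return the leading (major, minor, patch) integers of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    # A missing patch component (e.g. "1.8") is treated as 0.
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def grpc_has_fix(version_text, minimum=MIN_GRPC):
    """True if `version_text` is at or above `minimum` (pre-release tags ignored)."""
    return parse_version(version_text) >= minimum


def installed_grpcio_version():
    # Raises importlib.metadata.PackageNotFoundError if grpcio is not installed.
    return metadata.version("grpcio")
```

For example, `grpc_has_fix(installed_grpcio_version())` reports whether the current environment is new enough before trying to reproduce the crash again.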
@srini100 FWIW, our system tests for spanner use multithreading, and don't crash. I just created a multiprocessing testcase which runs cleanly:
So, I'm going to close this issue. Feel free to reopen if you can provide a reproducer which hangs / breaks when using multiprocessing in this way. |
@tseaver link is dead. Also, just curious, do you have a Mac or Linux? |
I've updated the link. Linux. |
Inserting rows turned out to be very slow (80 KB/s), so I am trying multiprocessing to parallelize the process.
I am currently using only one node in Cloud Spanner, the rows are sorted by primary key and there are ~30 batches that have a few hundred rows each.
For this I am using the following code:
The code hangs when it tries to enter the batch scope (it seems to never succeed in creating a session).
Any ideas?
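For reference, a hypothetical helper that splits rows already sorted by primary key into consecutive batches like the ones described above (a few hundred rows each). The batch size is an assumption, not a recommendation from this thread; each resulting list could be passed as `values` to `batch.insert()`.

```python
from itertools import islice


def chunk_rows(rows, batch_size):
    """Yield consecutive lists of at most `batch_size` rows, keeping input order.

    If `rows` is sorted by primary key, each chunk covers a contiguous key range.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk
```

Usage, with a hypothetical `sorted_rows` list: `batches = list(chunk_rows(sorted_rows, 300))`.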