Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential deadlock due to calling callbacks while holding a lock #3084

Closed
zhangkun83 opened this issue Jun 9, 2017 · 8 comments · Fixed by #9319
Closed

Potential deadlock due to calling callbacks while holding a lock #3084

zhangkun83 opened this issue Jun 9, 2017 · 8 comments · Fixed by #9319
Assignees
Labels
Milestone

Comments

@zhangkun83
Copy link
Contributor

zhangkun83 commented Jun 9, 2017

InProcessClientStream and InProcessServerStream are synchronized on their own. InProcessClientStream.serverStreamListener is called under synchronized (InProcessClientStream.this), and vice versa.

If the application tries to call methods on ClientCall or ServerCall from within the callbacks (assuming that it has already taken care of the thread-safety of the method calls on "Call" objects), a deadlock is possible when direct executor is used. For example:

  • Thread1

    1. InProcessClientStream.serverRequested (locks InProcessClientStream.this)
    2. InProcessClientStream.serverStreamListener.messageRead()
    3. Eventually reaches application callback, which calls ServerCall.close()
    4. InProcessServerStream.close() (locks InProcessServerStream.this)
  • Thread2

    1. InProcessServerStream.clientRequested (locks InProcessServerStream.this)
    2. InProcessServerStream.clientStreamListener.messageRead()
    3. Eventually reaches application callback, which calls ClientCall.close()
    4. InProcessClientStream.close() (locks InProcessClientStream.this)

As locks are acquired in reverse orders from two threads, a deadlock is possible.

The fundamental issue is that we should not call into application code while holding a lock, because we don't know what application code can do thus we can't control the order of subsequent locking.

OkHttp has the same issue, because OkHttpClientStream.transportDataReceived(), which will call into application code, is called under lock.

We could use ChannelExecutor (maybe renamed) to prevent calling into callbacks while holding a lock.

@ArvinDevel
Copy link

I recently ran into a problem which is related to InProcessServerStream and InProcessClientStream deadlock. When my server app thread response to client request, it will first lock InProcessServerStream, then try to lock InProcessClientStream which is locked by grpc-thread which wants to lock InProcessServerStream at the same time.
Does this mean I can only use the same grpc thread to response?

@ArvinDevel
Copy link

ArvinDevel commented Jan 6, 2019

Output of jstack related to deadlock is below.

"grpc-default-executor-1":
	at io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream.request(InProcessTransport.java:324)
	- waiting to lock <0x00000007b7f8d6f8> (a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream)
	at io.grpc.internal.ServerCallImpl.request(ServerCallImpl.java:83)
	at io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:252)
	at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:263)
	at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:682)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializeReentrantCallsDirectExecutor.execute(SerializeReentrantCallsDirectExecutor.java:49)
	at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener.messagesAvailable(ServerImpl.java:693)
	at io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream.writeMessage(InProcessTransport.java:563)
	- locked <0x00000007b7f8d3f8> (a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream)
	at io.grpc.internal.ForwardingClientStream.writeMessage(ForwardingClientStream.java:36)
	at io.grpc.internal.ClientCallImpl.sendMessage(ClientCallImpl.java:434)
	at io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37)
	at io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37)
	at org.apache.bookkeeper.common.grpc.proxy.ProxyCall$RequestProxy.onMessage(ProxyCall.java:56)
	at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:263)
	at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:682)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
"storage-scheduler-OrderedScheduler-0-0":
	at io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream.request(InProcessTransport.java:515)
	- waiting to lock <0x00000007b7f8d3f8> (a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream)
	at io.grpc.internal.ForwardingClientStream.request(ForwardingClientStream.java:31)
	at io.grpc.internal.ClientCallImpl.request(ClientCallImpl.java:381)
	at io.grpc.PartialForwardingClientCall.request(PartialForwardingClientCall.java:34)
	at io.grpc.ForwardingClientCall.request(ForwardingClientCall.java:22)
	at io.grpc.ForwardingClientCall$SimpleForwardingClientCall.request(ForwardingClientCall.java:44)
	at io.grpc.PartialForwardingClientCall.request(PartialForwardingClientCall.java:34)
	at io.grpc.ForwardingClientCall.request(ForwardingClientCall.java:22)
	at io.grpc.ForwardingClientCall$SimpleForwardingClientCall.request(ForwardingClientCall.java:44)
	at org.apache.bookkeeper.common.grpc.proxy.ProxyCall$ResponseProxy.onMessage(ProxyCall.java:112)
	- locked <0x00000007b7f8d2c0> (a org.apache.bookkeeper.common.grpc.proxy.ProxyCall$ResponseProxy)
	at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
	at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:526)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializeReentrantCallsDirectExecutor.execute(SerializeReentrantCallsDirectExecutor.java:49)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.messagesAvailable(ClientCallImpl.java:543)
	at io.grpc.internal.ForwardingClientStreamListener.messagesAvailable(ForwardingClientStreamListener.java:44)
	at io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream.writeMessage(InProcessTransport.java:377)
	- locked <0x00000007b7f8d6f8> (a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream)
	at io.grpc.internal.ServerCallImpl.sendMessage(ServerCallImpl.java:139)
	at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:341)
	at org.apache.bookkeeper.stream.storage.impl.stream.WriteSession.accept(WriteSession.java:148)
	at org.apache.bookkeeper.stream.storage.impl.stream.WriteSession.accept(WriteSession.java:40)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)

@ArvinDevel
Copy link

If I use default thread-pool to accompany async call StreamObserver<...>.onNext(...) in server after my async future to client request finish, the problem still exists.
Any suggestions on this situation where server use CompletableFuture to async handle client request?

@ArvinDevel
Copy link

ArvinDevel commented Jan 6, 2019

The new deadlock is below.

Found one Java-level deadlock:

"grpc-default-executor-26":
  waiting to lock monitor 0x00007febea3c9d88 (object 0x00000007ab5fc618, a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream),
  which is held by "grpc-default-executor-14"
"grpc-default-executor-14":
  waiting to lock monitor 0x00007febeb884af8 (object 0x00000007ab5fc918, a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream),
  which is held by "ForkJoinPool.commonPool-worker-3"
"ForkJoinPool.commonPool-worker-3":
  waiting to lock monitor 0x00007febea3c9d88 (object 0x00000007ab5fc618, a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream),
  which is held by "grpc-default-executor-14"

Java stack information for the threads listed above:

"grpc-default-executor-26":
	at io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream.internalCancel(InProcessTransport.java:593)
	- waiting to lock <0x00000007ab5fc618> (a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream)
	at io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream.cancel(InProcessTransport.java:584)
	at io.grpc.internal.ForwardingClientStream.cancel(ForwardingClientStream.java:61)
	at io.grpc.internal.ClientCallImpl$ContextCancellationListener.cancelled(ClientCallImpl.java:115)
	at io.grpc.Context$ExecutableListener.run(Context.java:995)
	at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:399)
	at io.grpc.Context$ExecutableListener.deliver(Context.java:987)
	at io.grpc.Context$ExecutableListener.access$400(Context.java:976)
	at io.grpc.Context.notifyAndClearListeners(Context.java:535)
	at io.grpc.Context$ParentListener.cancelled(Context.java:1006)
	at io.grpc.Context$ExecutableListener.run(Context.java:995)
	at io.grpc.Context$DirectExecutor.execute(Context.java:1023)
	at io.grpc.Context$ExecutableListener.deliver(Context.java:987)
	at io.grpc.Context$ExecutableListener.access$400(Context.java:976)
	at io.grpc.Context.notifyAndClearListeners(Context.java:540)
	at io.grpc.Context$CancellableContext.cancel(Context.java:794)
	at io.grpc.internal.ServerImpl$ContextCloser.run(ServerImpl.java:781)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
"grpc-default-executor-14":
	at io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream.request(InProcessTransport.java:323)
	- waiting to lock <0x00000007ab5fc918> (a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream)
	at io.grpc.internal.ServerCallImpl.request(ServerCallImpl.java:83)
	at io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:252)
	at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:263)
	at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:682)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializeReentrantCallsDirectExecutor.execute(SerializeReentrantCallsDirectExecutor.java:49)
	at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener.messagesAvailable(ServerImpl.java:693)
	at io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream.writeMessage(InProcessTransport.java:563)
	- locked <0x00000007ab5fc618> (a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream)
	at io.grpc.internal.ForwardingClientStream.writeMessage(ForwardingClientStream.java:36)
	at io.grpc.internal.ClientCallImpl.sendMessage(ClientCallImpl.java:434)
	at io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37)
	at io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37)
	at org.apache.bookkeeper.common.grpc.proxy.ProxyCall$RequestProxy.onMessage(ProxyCall.java:56)
	at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:263)
	at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:682)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
"ForkJoinPool.commonPool-worker-3":
	at io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream.request(InProcessTransport.java:514)
	- waiting to lock <0x00000007ab5fc618> (a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream)
	at io.grpc.internal.ForwardingClientStream.request(ForwardingClientStream.java:31)
	at io.grpc.internal.ClientCallImpl.request(ClientCallImpl.java:381)
	at io.grpc.PartialForwardingClientCall.request(PartialForwardingClientCall.java:34)
	at io.grpc.ForwardingClientCall.request(ForwardingClientCall.java:22)
	at io.grpc.ForwardingClientCall$SimpleForwardingClientCall.request(ForwardingClientCall.java:44)
	at io.grpc.PartialForwardingClientCall.request(PartialForwardingClientCall.java:34)
	at io.grpc.ForwardingClientCall.request(ForwardingClientCall.java:22)
	at io.grpc.ForwardingClientCall$SimpleForwardingClientCall.request(ForwardingClientCall.java:44)
	at org.apache.bookkeeper.common.grpc.proxy.ProxyCall$ResponseProxy.onMessage(ProxyCall.java:112)
	- locked <0x00000007ab5fc4e0> (a org.apache.bookkeeper.common.grpc.proxy.ProxyCall$ResponseProxy)
	at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
	at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:526)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializeReentrantCallsDirectExecutor.execute(SerializeReentrantCallsDirectExecutor.java:49)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.messagesAvailable(ClientCallImpl.java:543)
	at io.grpc.internal.ForwardingClientStreamListener.messagesAvailable(ForwardingClientStreamListener.java:44)
	at io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream.writeMessage(InProcessTransport.java:377)
	- locked <0x00000007ab5fc918> (a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream)
	at io.grpc.internal.ServerCallImpl.sendMessage(ServerCallImpl.java:139)
	at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:341)
	at org.apache.bookkeeper.stream.storage.impl.stream.WriteSession.accept(WriteSession.java:147)
	at org.apache.bookkeeper.stream.storage.impl.stream.WriteSession.accept(WriteSession.java:40)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
	at java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:443)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

Found 1 deadlock.

@carl-mastrangelo
Copy link
Contributor

Looks legit.

@ArvinDevel
Copy link

ArvinDevel commented Jan 8, 2019

@carl-mastrangelo Any suggestions? Is my usage wrong? I'm confused by that why InProcessServerStream is coupled with InProcessClientStream so tightly?I think the InProcessServerStream should only responsible for transfer client request response to client.

@carl-mastrangelo
Copy link
Contributor

I think this an actual bug, and there isn't much you can do about it. The tight coupling is to make InProcess as close to being thread-free as possible (i.e. in unit tests, where it may not be possible to have multiple threads). The pattern we use to solve this elsewhere is serializing all transport accesses, but that will take time to implement.

@ejona86
Copy link
Member

ejona86 commented May 15, 2019

When this is fixed we should also fix the references to it in the examples, introduced in #5672.

larry-safran added a commit to larry-safran/grpc-java that referenced this issue Jun 27, 2022
…void deadlocks (Fixes bug grpc#3084)

Also support unary calls returning null values
larry-safran added a commit to larry-safran/grpc-java that referenced this issue Jun 28, 2022
…void deadlocks (Fixes bug grpc#3084)

Also support unary calls returning null values
larry-safran added a commit to larry-safran/grpc-java that referenced this issue Jun 28, 2022
…void deadlocks (Fixes bug grpc#3084)

Also support unary calls returning null values
@ejona86 ejona86 assigned larry-safran and unassigned zhangkun83 and temawi Jun 28, 2022
larry-safran added a commit to larry-safran/grpc-java that referenced this issue Jun 29, 2022
…void deadlocks

Fixes grpc#3084
Also support unary calls returning null values
ejona86 pushed a commit that referenced this issue Jun 30, 2022
…oid deadlocks

Fixes deadlocks caused by client and server listeners being called in a synchronized block

Also support unary calls returning null values

Fixes #3084
@ejona86 ejona86 modified the milestones: Next, 1.49 Jul 22, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
6 participants