Upgrade GRPC to 1.31 to avoid deadlock #8351

Merged
merged 1 commit into from Oct 23, 2020

pkumar-singh (Member) commented Oct 23, 2020

Motivation

The current version of gRPC deadlocks, as shown in the thread dump below.
This deadlock currently appears only in the state store, but it might affect Pulsar in general as well.

Found one Java-level deadlock:
"io-read-scheduler-OrderedScheduler-0-0":
waiting to lock monitor 0x00007f10e804e100 (object 0x00000000e3b8e0a8, a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream),
which is held by "grpc-default-executor-17"
"grpc-default-executor-17":
waiting to lock monitor 0x00007f107000ca00 (object 0x00000000e3b8e1d8, a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream),
which is held by "io-read-scheduler-OrderedScheduler-0-0"
Java stack information for the threads listed above:
===================================================
"io-read-scheduler-OrderedScheduler-0-0":
at io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream.request(InProcessTransport.java:639)
waiting to lock <0x00000000e3b8e0a8> (a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream)
at io.grpc.internal.ForwardingClientStream.request(ForwardingClientStream.java:32)
at io.grpc.internal.ClientCallImpl.request(ClientCallImpl.java:369)
at io.grpc.PartialForwardingClientCall.request(PartialForwardingClientCall.java:34)
at io.grpc.ForwardingClientCall.request(ForwardingClientCall.java:22)
at io.grpc.ForwardingClientCall$SimpleForwardingClientCall.request(ForwardingClientCall.java:44)
at io.grpc.PartialForwardingClientCall.request(PartialForwardingClientCall.java:34)
at io.grpc.ForwardingClientCall.request(ForwardingClientCall.java:22)
at io.grpc.ForwardingClientCall$SimpleForwardingClientCall.request(ForwardingClientCall.java:44)
at io.grpc.PartialForwardingClientCall.request(PartialForwardingClientCall.java:34)
at io.grpc.ForwardingClientCall.request(ForwardingClientCall.java:22)
at io.grpc.ForwardingClientCall$SimpleForwardingClientCall.request(ForwardingClientCall.java:44)
at org.apache.bookkeeper.common.grpc.proxy.ProxyCall$ResponseProxy.onMessage(ProxyCall.java:112)
locked <0x00000000e3b8dfd8> (a org.apache.bookkeeper.common.grpc.proxy.ProxyCall$ResponseProxy)
at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:519)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializeReentrantCallsDirectExecutor.execute(SerializeReentrantCallsDirectExecutor.java:49)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.messagesAvailable(ClientCallImpl.java:536)
at io.grpc.internal.ForwardingClientStreamListener.messagesAvailable(ForwardingClientStreamListener.java:44)
at io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream.writeMessage(InProcessTransport.java:455)
locked <0x00000000e3b8e1d8> (a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream)
at io.grpc.internal.ServerCallImpl.sendMessage(ServerCallImpl.java:139)
at io.grpc.ForwardingServerCall.sendMessage(ForwardingServerCall.java:32)
at org.apache.bookkeeper.common.grpc.stats.MonitoringServerCall.sendMessage(MonitoringServerCall.java:47)
at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:344)
at org.apache.bookkeeper.stream.storage.impl.grpc.handler.ResponseHandler.accept(ResponseHandler.java:49)
at org.apache.bookkeeper.stream.storage.impl.grpc.handler.ResponseHandler.accept(ResponseHandler.java:29)
at java.util.concurrent.CompletableFuture.uniWhenComplete(java.base@11.0.8/CompletableFuture.java:859)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(java.base@11.0.8/CompletableFuture.java:837)
at java.util.concurrent.CompletableFuture.postComplete(java.base@11.0.8/CompletableFuture.java:506)
at java.util.concurrent.CompletableFuture.complete(java.base@11.0.8/CompletableFuture.java:2073)
at org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal.lambda$executeIO$16(AbstractStateStoreWithJournal.java:472)
at org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal$$Lambda$294/0x00000008404b7040.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.8/Executors.java:515)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.8/Executors.java:515)
at java.util.concurrent.FutureTask.run(java.base@11.0.8/FutureTask.java:264)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.8/ScheduledThreadPoolExecutor.java:304)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.8/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.8/ThreadPoolExecutor.java:628)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(java.base@11.0.8/Thread.java:834)
"grpc-default-executor-17":
at io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream.isReady(InProcessTransport.java:466)
waiting to lock <0x00000000e3b8e1d8> (a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream)
at io.grpc.internal.ServerCallImpl.isReady(ServerCallImpl.java:167)
at io.grpc.PartialForwardingServerCall.isReady(PartialForwardingServerCall.java:43)
at io.grpc.ForwardingServerCall.isReady(ForwardingServerCall.java:22)
at io.grpc.ForwardingServerCall$SimpleForwardingServerCall.isReady(ForwardingServerCall.java:39)
at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:173)
at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializeReentrantCallsDirectExecutor.execute(SerializeReentrantCallsDirectExecutor.java:49)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener.halfClosed(ServerImpl.java:722)
at io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream.halfClose(InProcessTransport.java:745)
locked <0x00000000e3b8e0a8> (a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream)
at io.grpc.internal.ForwardingClientStream.halfClose(ForwardingClientStream.java:67)
at io.grpc.internal.ClientCallImpl.halfClose(ClientCallImpl.java:408)
at io.grpc.PartialForwardingClientCall.halfClose(PartialForwardingClientCall.java:44)
at io.grpc.ForwardingClientCall.halfClose(ForwardingClientCall.java:22)
at io.grpc.ForwardingClientCall$SimpleForwardingClientCall.halfClose(ForwardingClientCall.java:44)
at io.grpc.PartialForwardingClientCall.halfClose(PartialForwardingClientCall.java:44)
at io.grpc.ForwardingClientCall.halfClose(ForwardingClientCall.java:22)
at io.grpc.ForwardingClientCall$SimpleForwardingClientCall.halfClose(ForwardingClientCall.java:44)
at io.grpc.PartialForwardingClientCall.halfClose(PartialForwardingClientCall.java:44)
at io.grpc.ForwardingClientCall.halfClose(ForwardingClientCall.java:22)
at io.grpc.ForwardingClientCall$SimpleForwardingClientCall.halfClose(ForwardingClientCall.java:44)
at org.apache.bookkeeper.common.grpc.proxy.ProxyCall$RequestProxy.onHalfClose(ProxyCall.java:68)
at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.8/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.8/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@11.0.8/Thread.java:834)
Found 1 deadlock.
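
The dump above shows a lock-order inversion: one thread holds the `InProcessServerStream` monitor and wants the `InProcessClientStream` monitor, while the other holds them in the opposite order. The following is a minimal sketch of that pattern, not gRPC code. It swaps the `synchronized` monitors for `ReentrantLock.tryLock` with a timeout so that the example terminates instead of hanging; the class and variable names are hypothetical stand-ins.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class LockOrderInversionSketch {

    /** Returns how many of the two threads timed out waiting for their second lock. */
    static int runOnce(long timeoutMillis) {
        // Stand-ins for the two stream monitors named in the thread dump.
        ReentrantLock clientStream = new ReentrantLock();
        ReentrantLock serverStream = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothAttempted = new CountDownLatch(2);
        AtomicInteger timedOut = new AtomicInteger();

        // Like "io-read-scheduler": holds the server stream, then wants the client stream.
        Thread ioThread = new Thread(() ->
                attempt(serverStream, clientStream, bothHoldFirst, bothAttempted, timedOut, timeoutMillis));
        // Like "grpc-default-executor": holds the client stream, then wants the server stream.
        Thread grpcThread = new Thread(() ->
                attempt(clientStream, serverStream, bothHoldFirst, bothAttempted, timedOut, timeoutMillis));
        ioThread.start();
        grpcThread.start();
        try {
            ioThread.join();
            grpcThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return timedOut.get();
    }

    private static void attempt(ReentrantLock first, ReentrantLock second,
                                CountDownLatch bothHoldFirst, CountDownLatch bothAttempted,
                                AtomicInteger timedOut, long timeoutMillis) {
        first.lock();
        try {
            // Wait until both threads hold their first lock, so the cycle is guaranteed.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean acquired = false;
            try {
                // With synchronized this would block forever; tryLock gives up instead.
                acquired = second.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
                if (!acquired) {
                    timedOut.incrementAndGet();
                }
            } finally {
                if (acquired) {
                    second.unlock();
                }
                bothAttempted.countDown();
            }
            // Keep the first lock until the other thread has finished its attempt.
            bothAttempted.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads that timed out on their second lock: " + runOnce(200));
    }
}
```

Both threads time out, which is the bounded equivalent of the two "waiting to lock" entries in the dump.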

Solution

Verified that the issue is fixed with gRPC 1.31.

@merlimat merlimat added the type/bug label Oct 23, 2020
@merlimat merlimat added this to the 2.7.0 milestone Oct 23, 2020
@merlimat merlimat merged commit 647d3c2 into apache:master Oct 23, 2020
@merlimat merlimat changed the title from "PLSR-1240 upgrade GRPC to 1.31 to avoid deadlock" to "Upgrade GRPC to 1.31 to avoid deadlock" Oct 23, 2020
lhotari (Member) commented Oct 23, 2020

I opened #8361 since another PR job failed with this kind of exception:

13:11:31.285 [main] ERROR org.apache.bookkeeper.common.component.AbstractLifecycleComponent - Failed to start Component: storage-service
java.lang.NoSuchMethodError: io.grpc.internal.DnsNameResolverProvider.newNameResolver(Ljava/net/URI;Lio/grpc/Attributes;)Lio/grpc/internal/DnsNameResolver;
	at org.apache.bookkeeper.common.resolver.ServiceNameResolverProvider.newNameResolver(ServiceNameResolverProvider.java:95) ~[org.apache.bookkeeper-stream-storage-java-client-4.10.0.jar:4.10.0]
	at org.apache.bookkeeper.common.resolver.NameResolverProviderFactory.newNameResolver(NameResolverProviderFactory.java:45) ~[org.apache.bookkeeper-stream-storage-java-client-4.10.0.jar:4.10.0]
	at io.grpc.NameResolver$Factory.newNameResolver(NameResolver.java:207) ~[io.grpc-grpc-api-1.31.0.jar:1.31.0]
	at io.grpc.NameResolver$Factory.newNameResolver(NameResolver.java:235) ~[io.grpc-grpc-api-1.31.0.jar:1.31.0]
	at io.grpc.internal.ManagedChannelImpl.getNameResolver(ManagedChannelImpl.java:701) ~[io.grpc-grpc-core-1.31.0.jar:1.31.0]
	at io.grpc.internal.ManagedChannelImpl.<init>(ManagedChannelImpl.java:606) ~[io.grpc-grpc-core-1.31.0.jar:1.31.0]

in https://github.com/apache/pulsar/runs/1297863577?check_suite_focus=true

I'd assume that protoc-gen-grpc-java.version should match grpc.version. I also added a change in the PR to use the grpc-bom in dependencyManagement.

lhotari (Member) commented Oct 23, 2020

@pkumar-singh @merlimat I have opened a PR to revert this grpc upgrade. Please see #8363 for more details. The master branch is currently broken because of the grpc upgrade.

lhotari (Member) commented Oct 26, 2020

@pkumar-singh regarding the deadlock, did you find a reference elsewhere saying that gRPC 1.31 includes a fix for a deadlock bug?

I found an open issue, grpc/grpc-java#3084, which is not resolved. The workaround there seems to be to execute callbacks on a different thread (executor). That is why I'm wondering how the upgrade to gRPC 1.31 fixes the issue. It would be useful to understand the full context.
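
As a rough illustration of that workaround, the sketch below uses a hypothetical `Stream` class (not the gRPC API) that notifies a listener from inside its own critical section. With a direct executor the callback runs while the caller still holds the stream lock; with a separate executor the callback runs on a thread that does not own it, so it cannot extend the caller's lock chain.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

final class CallbackHandoffSketch {

    /** Hypothetical stream that dispatches a listener callback while holding its lock. */
    static final class Stream {
        final ReentrantLock lock = new ReentrantLock();
        private final Executor callbackExecutor;

        Stream(Executor callbackExecutor) {
            this.callbackExecutor = callbackExecutor;
        }

        void writeMessage(Runnable listenerCallback) {
            lock.lock();
            try {
                // A direct executor runs the callback right here, under the lock.
                // Any other executor only enqueues it and returns.
                callbackExecutor.execute(listenerCallback);
            } finally {
                lock.unlock();
            }
        }
    }

    /** Reports whether the thread running the callback owns the stream lock. */
    static boolean callbackRunsUnderLock(Executor callbackExecutor) {
        Stream stream = new Stream(callbackExecutor);
        CompletableFuture<Boolean> observed = new CompletableFuture<>();
        stream.writeMessage(() -> observed.complete(stream.lock.isHeldByCurrentThread()));
        return observed.join();
    }

    public static void main(String[] args) {
        // Runnable::run acts as a direct executor: it runs the task on the calling thread.
        System.out.println("direct executor, callback under lock: "
                + callbackRunsUnderLock(Runnable::run));

        ExecutorService separate = Executors.newSingleThreadExecutor();
        try {
            System.out.println("separate executor, callback under lock: "
                    + callbackRunsUnderLock(separate));
        } finally {
            separate.shutdown();
        }
    }
}
```

The first line prints `true` and the second `false`. Whether this matches what changed between gRPC versions is not established in this thread; the sketch only shows why moving callbacks to another executor breaks the lock nesting.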

pkumar-singh (Member, Author) commented Oct 26, 2020

Well, the context is this: I was running the BookKeeper table service and encountered this deadlock, as can be seen from the call stack: org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal.lambda$executeIO$16(AbstractStateStoreWithJournal.java:472). I also saw the grpc/grpc-java#3084 issue while digging around.
A natural question would be: if the deadlock is reported while running the Apache BookKeeper table service, why upgrade Pulsar?
The reason is that they all run in the same k8s deployment.

How do I confirm that the upgrade to 1.31 fixes the deadlock? I upgraded to 1.31 in the deployment and the deadlock never happened again; before the upgrade it used to happen within 2 minutes.
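
One way to make such a check repeatable, as a sketch that is not part of this PR, is to ask the JVM for monitor deadlocks programmatically through the standard `ThreadMXBean` API instead of taking a thread dump by hand. The class name `DeadlockProbe` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class DeadlockProbe {

    /** Number of threads the JVM currently reports as deadlocked (0 if none). */
    static int countDeadlockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Covers object monitors (synchronized) and ownable synchronizers; null means none.
        long[] ids = threads.findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    /** Human-readable summary of who waits on which lock, similar in spirit to jstack. */
    static String describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads();
        if (ids == null) {
            return "no deadlock detected";
        }
        StringBuilder report = new StringBuilder();
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info == null) {
                continue; // the thread terminated between the two calls
            }
            report.append('"').append(info.getThreadName()).append('"')
                  .append(" waiting on ").append(info.getLockName())
                  .append(" held by \"").append(info.getLockOwnerName()).append("\"\n");
        }
        return report.toString();
    }

    public static void main(String[] args) {
        System.out.println(describeDeadlocks());
    }
}
```

Polling `countDeadlockedThreads()` from a scheduled task inside the affected process would report the cycle as soon as it forms, whereas a run in which no deadlock is observed only shows that it did not occur during that run.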

Besides, sooner or later gRPC has to be updated in Apache Pulsar as well. Apache Pulsar running with gRPC 1.18 does have a deadlock. @lhotari

codelipenghui pushed a commit that referenced this pull request Oct 27, 2020
### Motivation

For managing protobuf libraries there is a maven bom file available for protobuf. The benefit of using this is that it can be imported to the project's pom.xml dependency management to make sure that the versions of the various protobuf libraries are aligned and use the same version.

Besides starting to use protobuf-bom, it was noticed that there are separate settings for protobuf protoc version (`protoc3.version`) and the protoc grpc plugin versions (`protoc-gen-grpc-java.version`). These versions should match the protobuf and grpc versions. The PR also covers an improvement for that.

One motivation of this PR is to prepare for the grpc upgrade that was attempted by #8351 , but reverted. Before doing the grpc upgrade, it would be useful to improve the protobuf & grpc dependency management provided by this PR.

### Modifications

* Use protobuf-bom to manage protobuf library versions
* make `protoc3.version` match `protobuf3.version`
  * there should be no reason that these would be different
* make grpc's `protoc-gen-grpc-java.version` match `grpc.version`
  * there should be no reason that these would be different
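
The modifications listed above could look roughly like the following `pom.xml` fragment. This is an illustrative sketch rather than the actual diff: it assumes `grpc.version` and `protobuf3.version` are already defined elsewhere in the project's properties, and it also shows the `grpc-bom` import mentioned earlier in the conversation.

```xml
<properties>
  <!-- Assumption: grpc.version and protobuf3.version are defined elsewhere in this block. -->
  <!-- Derive the code generator versions from the library versions so they cannot drift. -->
  <protoc3.version>${protobuf3.version}</protoc3.version>
  <protoc-gen-grpc-java.version>${grpc.version}</protoc-gen-grpc-java.version>
</properties>

<dependencyManagement>
  <dependencies>
    <!-- Aligns all com.google.protobuf artifacts on one version. -->
    <dependency>
      <groupId>com.google.protobuf</groupId>
      <artifactId>protobuf-bom</artifactId>
      <version>${protobuf3.version}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
    <!-- Aligns all io.grpc artifacts on one version. -->
    <dependency>
      <groupId>io.grpc</groupId>
      <artifactId>grpc-bom</artifactId>
      <version>${grpc.version}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```

A BOM import only pins versions for artifacts managed by this build. It would not by itself prevent a `NoSuchMethodError` like the one reported above, where an already-compiled third-party jar calls a gRPC internal method that no longer exists in the newer version.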
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Nov 13, 2020
Co-authored-by: Prashant Kumar <prashantk@splunk.com>
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Nov 13, 2020
flowchartsman pushed a commit to flowchartsman/pulsar that referenced this pull request Nov 17, 2020
Co-authored-by: Prashant Kumar <prashantk@splunk.com>
flowchartsman pushed a commit to flowchartsman/pulsar that referenced this pull request Nov 17, 2020
@lhotari lhotari mentioned this pull request Jan 30, 2021