Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrades Netty to 4.1.70.Final #3423

Closed

Conversation

ljnelson
Copy link
Member

Upgrades Netty to 4.1.68.Final and adds apparently necessary initialize-at-runtime statements to the native image properties in the webserver project.

Signed-off-by: Laird Nelson laird.nelson@oracle.com

@ljnelson
Copy link
Member Author

@barchetta @tomas-langer Joe asked for a volunteer to do this. Specifically the mp-1 test was failing with just an "ordinary" Netty upgrade, so I ran it and added relevant initialize-at-runtime lines. I did so by taking the instructions that GraalVM spat out and incorporated them rather blindly into the native-image.properties file. This seemed to make the mp-1 test pass but I stress that this is definitely not an area of expertise so drop some eyes on this please and I'll undraft it if things look OK.

@ljnelson ljnelson self-assigned this Sep 16, 2021
@ljnelson ljnelson added 2.x Issues for 2.x version branch dependencies Pull requests that update a dependency file native-image P3 webserver labels Sep 16, 2021
@ljnelson ljnelson added this to In Progress in Backlog Sep 16, 2021
@ljnelson ljnelson marked this pull request as ready for review September 21, 2021 16:56
@barchetta barchetta mentioned this pull request Sep 21, 2021
23 tasks
@tomas-langer
Copy link
Member

The native-image.properties seems to be updated fine.

@tomas-langer
Copy link
Member

The test failure can be reproduced locally, so it is most likely caused by the upgrade.

[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.314 s <<< FAILURE! - in io.helidon.grpc.server.SslIT
[ERROR] io.helidon.grpc.server.SslIT  Time elapsed: 0.314 s  <<< ERROR!
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: io/netty/internal/tcnative/AsyncSSLPrivateKeyMethod

@ljnelson
Copy link
Member Author

OK; looks like pom.xml (not dependencies/pom.xml for some reason?) needs to be updated to also contain a higher version of netty-tcnative-boringssl-static. I'll make that change.

@ljnelson
Copy link
Member Author

This now results in a different error:

[ERROR] io.helidon.grpc.server.SslIT.shouldConnectWithoutClientCertsFor1Way  Time elapsed: 0.063 s  <<< ERROR!
io.grpc.StatusRuntimeException: UNAVAILABLE: GOAWAY shut down transport. HTTP/2 error code: INTERNAL_ERROR, debug data: io.netty.util.IllegalReferenceCountException: refCnt: 0
	at io.helidon.grpc.server.SslIT.shouldConnectWithoutClientCertsFor1Way(SslIT.java:129)

@ljnelson
Copy link
Member Author

I'll commit the tcnative version update because it needs to be there, and then perhaps @aseovic can take a look at the grpc innards portion of this.

@barchetta
Copy link
Member

FYI, in 4.1.65.Final they disabled tlsv1 by default. Maybe that impacts the test: https://netty.io/news/2021/05/19/4-1-65-Final.html

We should consider the compatibility impact of that on Helidon.

Signed-off-by: Laird Nelson <laird.nelson@oracle.com>
…ersion upgrade; uncovers reference count failure in grpc/server tests

Signed-off-by: Laird Nelson <laird.nelson@oracle.com>
@ljnelson
Copy link
Member Author

ljnelson commented Nov 3, 2021

Rebased this PR and bumped Netty to 4.1.70.Final to see if that helps.

…s so a requires statement had to change

Signed-off-by: Laird Nelson <laird.nelson@oracle.com>
…tty upgrade

Signed-off-by: Laird Nelson <laird.nelson@oracle.com>
@ljnelson
Copy link
Member Author

ljnelson commented Nov 3, 2021

Also had to bump up the tcnative version to conform to https://github.com/netty/netty/blob/f49c94c83a5b2a2625913f57515c8dc6ab8c2639/pom.xml#L511

@ljnelson ljnelson changed the title Upgrades Netty to 4.1.68.Final Upgrades Netty to 4.1.70.Final Nov 3, 2021
@tomas-langer tomas-langer marked this pull request as draft November 8, 2021 21:08
@ljnelson
Copy link
Member Author

ljnelson commented Nov 9, 2021

Keeping an eye on grpc/grpc-java#8605 which seems to be responsible.

@ljnelson
Copy link
Member Author

Notes to myself. Root stack trace:

2021.11.16 15:56:56 INFO io.grpc.netty.NettyServerTransport.connections Thread[nioEventLoopGroup-7-3,5,main]: Transport failed
io.netty.handler.codec.DecoderException: io.netty.util.IllegalReferenceCountException: refCnt: 0
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:477)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.helidon.common.context.Contexts.runInContext(Contexts.java:117)
	at io.helidon.grpc.server.GrpcServerImpl$ContextAwareThreadFactory.lambda$newThread$0(GrpcServerImpl.java:472)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0
	at io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1454)
	at io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1383)
	at io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1379)
	at io.netty.buffer.AbstractByteBuf.getByte(AbstractByteBuf.java:355)
	at io.netty.buffer.AbstractByteBuf.getUnsignedByte(AbstractByteBuf.java:368)
	at io.netty.handler.ssl.SslUtils.getEncryptedPacketLength(SslUtils.java:280)
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1210)
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1284)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446)

@duplexsystem
Copy link
Contributor

duplexsystem commented Nov 17, 2021

I'd recommend updating io_uring to 0.0.10 along with this PR :) as io_uring bump it's netty dependency to 4.1.70 in 0.0.10. No code changes for that but probably best practice to update it as well.

@ljnelson
Copy link
Member Author

I can make the failing test pass by effectively undoing netty/netty#11242. More investigation to come.

@ljnelson
Copy link
Member Author

Filed: netty/netty-tcnative#680

@normanmaurer
Copy link

@ljnelson how can I reproduce this ? I did checkout your branch and run mvn clean package but it passes.

@barchetta
Copy link
Member

@normanmaurer this should reproduce (using Laird's branch):

# java 17.0.1 2021-10-19 LTS
# Apache Maven 3.8.2
# From top of repo
mvn clean install -DskipTests
cd grpc/server
mvn clean install

@normanmaurer
Copy link

@barchetta no luck :(

[INFO]
[INFO] --- maven-compiler-plugin:3.8.1:testCompile (default-testCompile) @ helidon-grpc-server ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 16 source files to /Users/norman/Documents/workspace/helidon/grpc/server/target/test-classes
[INFO]
[INFO] --- maven-surefire-plugin:3.0.0-M5:test (default-test) @ helidon-grpc-server ---
[INFO]
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running io.helidon.grpc.server.GrpcServiceTest
[INFO] Tests run: 28, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.491 s - in io.helidon.grpc.server.GrpcServiceTest
[INFO] Running io.helidon.grpc.server.GrpcServerConfigurationTest
[INFO] Tests run: 21, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.206 s - in io.helidon.grpc.server.GrpcServerConfigurationTest
[INFO] Running io.helidon.grpc.server.HealthServiceImplTest
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.032 s - in io.helidon.grpc.server.HealthServiceImplTest
[INFO] Running io.helidon.grpc.server.MethodDescriptorTest
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.053 s - in io.helidon.grpc.server.MethodDescriptorTest
[INFO] Running io.helidon.grpc.server.ServiceDescriptorTest
[INFO] Tests run: 26, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.166 s - in io.helidon.grpc.server.ServiceDescriptorTest
[INFO] Running io.helidon.grpc.server.ContextSettingServerInterceptorTest
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.033 s - in io.helidon.grpc.server.ContextSettingServerInterceptorTest
[INFO] Running io.helidon.grpc.server.ConstantHealthCheckTest
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 s - in io.helidon.grpc.server.ConstantHealthCheckTest
[INFO] Running io.helidon.grpc.server.BindableServiceImplTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.013 s - in io.helidon.grpc.server.BindableServiceImplTest
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 93, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO]
[INFO] --- maven-jar-plugin:3.0.2:jar (default-jar) @ helidon-grpc-server ---
[INFO] Building jar: /Users/norman/Documents/workspace/helidon/grpc/server/target/helidon-grpc-server-2.4.1-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  8.502 s
[INFO] Finished at: 2021-11-22T18:59:55+01:00
[INFO] ------------------------------------------------------------------------
➜  server git:(upgrade-netty-to-4168Final) ✗ echo $JAVA_HOME
/Users/norman/.sdkman/candidates/java/current
➜  server git:(upgrade-netty-to-4168Final) ✗ /Users/norman/.sdkman/candidates/java/current/bin/java -version
openjdk version "17" 2021-09-14 LTS
OpenJDK Runtime Environment Zulu17.28+13-CA (build 17+35-LTS)
OpenJDK 64-Bit Server VM Zulu17.28+13-CA (build 17+35-LTS, mixed mode, sharing)

git branch:

upgrade-netty-to-4168Final

@ljnelson
Copy link
Member Author

ljnelson commented Nov 22, 2021

I don't know anything about the Zulu OpenJDK distribution. Is it possible it causes some different SSL engine to be used in Netty? If this issue is too hard to use for a reproducer (none of us has had this problem), you may find that reproducing the same issue over in grpc-java land is easier; see grpc/grpc-java#8605. Of course over there you'll need to run on Java 8.

@ljnelson
Copy link
Member Author

ljnelson commented Nov 22, 2021

@normanmaurer Sorry, read everything too fast. package (and install) doesn't go "far" enough; you'll need to do at least mvn verify. The failing test is run by the maven-failsafe-plugin as an integration test during the verify phase.

@normanmaurer
Copy link

@ljnelson thanks a lot! This allows me to reproduce locally now. Will come back to you

@ljnelson
Copy link
Member Author

Here is my (Netbeans) breakpoint that let me find the problem:

Screen Shot 2021-11-22 at 11 00 22 AM

@ljnelson
Copy link
Member Author

Finally, I debugged all this by running this Maven command:

JAVA_HOME=$(/usr/libexec/java_home -v17) /usr/local/mvn/bin/mvn verify -Dmaven.failsafe.debug -Dit.test='SslIT#shouldConnectWithClientCertsFor2WayUseConfig'

I mention this because the maven.failsafe.debug property turned out to be very convenient and together with the it.test property made for comparatively rapid turnarounds.

@normanmaurer
Copy link

I can confirm this fixes it:

diff --git a/grpc/server/src/main/java/io/helidon/grpc/server/GrpcServerImpl.java b/grpc/server/src/main/java/io/helidon/grpc/server/GrpcServerImpl.java
index 227f6f281..179014648 100644
--- a/grpc/server/src/main/java/io/helidon/grpc/server/GrpcServerImpl.java
+++ b/grpc/server/src/main/java/io/helidon/grpc/server/GrpcServerImpl.java
@@ -66,6 +66,7 @@ import io.netty.channel.nio.NioEventLoopGroup;
 import io.netty.channel.socket.nio.NioServerSocketChannel;
 import io.netty.handler.ssl.ClientAuth;
 import io.netty.handler.ssl.JdkSslContext;
+import io.netty.handler.ssl.OpenSslContextOption;
 import io.netty.handler.ssl.SslContext;
 import io.netty.handler.ssl.SslContextBuilder;
 import io.netty.handler.ssl.SslProvider;
@@ -414,7 +415,7 @@ public class GrpcServerImpl implements GrpcServer {
         }

         SslContextBuilder sslContextBuilder = SslContextBuilder.forServer(certResource.stream(), keyResource.stream())
-                .sslProvider(SslProvider.OPENSSL);
+                .sslProvider(SslProvider.OPENSSL).option(OpenSslContextOption.USE_TASKS, false);

         if (aX509Certificates.length > 0) {
             sslContextBuilder.trustManager(aX509Certificates)
diff --git a/grpc/server/src/test/java/io/helidon/grpc/server/SslIT.java b/grpc/server/src/test/java/io/helidon/grpc/server/SslIT.java
index ac6e8d2bf..a24f4f28b 100644
--- a/grpc/server/src/test/java/io/helidon/grpc/server/SslIT.java
+++ b/grpc/server/src/test/java/io/helidon/grpc/server/SslIT.java
@@ -39,6 +39,7 @@ import io.grpc.StatusRuntimeException;
 import io.grpc.netty.GrpcSslContexts;
 import io.grpc.netty.NegotiationType;
 import io.grpc.netty.NettyChannelBuilder;
+import io.netty.handler.ssl.OpenSslContextOption;
 import io.netty.handler.ssl.SslContext;
 import io.netty.handler.ssl.SslContextBuilder;
 import org.junit.AfterClass;
@@ -255,6 +256,7 @@ public class SslIT {
         if (clientCertChainFilePath != null && clientPrivateKeyFilePath != null) {
             builder.keyManager(clientCertChainFilePath.stream(), clientPrivateKeyFilePath.stream());
         }
+        builder.option(OpenSslContextOption.USE_TASKS, false);
         return builder.build();
     }

So definitely a netty bug.

@ljnelson
Copy link
Member Author

Thanks, @normanmaurer; looks like your netty/netty#11854 will fix this.

@ljnelson
Copy link
Member Author

ljnelson commented Dec 9, 2021

Superseded by #3720

@ljnelson ljnelson closed this Dec 9, 2021
Backlog automation moved this from In Progress to Closed Dec 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.x Issues for 2.x version branch dependencies Pull requests that update a dependency file native-image P3 webserver
Projects
Backlog
  
Closed
Development

Successfully merging this pull request may close these issues.

None yet

5 participants