Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add exponential backoff #541

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

PertsevRoman
Copy link

@PertsevRoman PertsevRoman commented May 24, 2022

I faced with DNS/timeout issues during agent initialization
WebSocket mode

io.jenkins.remoting.shaded.javax.websocket.DeploymentException: Connection failed.
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.JdkClientContainer$1.call(JdkClientContainer.java:187)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.JdkClientContainer$1.call(JdkClientContainer.java:107)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.JdkClientContainer.openClientSocket(JdkClientContainer.java:192)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$3$1.run(ClientManager.java:647)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$3.run(ClientManager.java:696)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$SameThreadExecutorService.execute(ClientManager.java:849)
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:493)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:337)
	at hudson.remoting.Engine.runWebSocket(Engine.java:656)
	at hudson.remoting.Engine.run(Engine.java:495)
Caused by: java.nio.channels.UnresolvedAddressException
	at sun.nio.ch.Net.checkAddress(Net.java:104)
	at sun.nio.ch.UnixAsynchronousSocketChannelImpl.implConnect(UnixAsynchronousSocketChannelImpl.java:302)
	at sun.nio.ch.AsynchronousSocketChannelImpl.connect(AsynchronousSocketChannelImpl.java:210)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter.handleConnect(TransportFilter.java:184)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.connect(Filter.java:80)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.connect(Filter.java:83)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.connect(Filter.java:83)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter.connect(ClientFilter.java:99)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.JdkClientContainer.connectSynchronously(JdkClientContainer.java:326)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.JdkClientContainer.access$700(JdkClientContainer.java:58)
	at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.JdkClientContainer$1.call(JdkClientContainer.java:156)
	... 12 more

TCP socket mode

java.io.IOException: Failed to connect to http://jenkins-url/tcpSlaveAgentListener/: connect timed out
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:214)
	at hudson.remoting.Engine.innerRun(Engine.java:733)
	at hudson.remoting.Engine.run(Engine.java:539)
Caused by: java.net.SocketTimeoutException: connect timed out
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:607)
	at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
	at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
	at sun.net.www.http.HttpClient.New(HttpClient.java:339)
	at sun.net.www.http.HttpClient.New(HttpClient.java:357)
	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1228)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:211)
	... 2 more

I'm trying to detect exact issue. Seems it relates to some K8s/JDK networking corner case.
Anyway the PR introduces exponential backoff workaround which resolves the issue on source code level. It can help to avoid potential network issues.

@PertsevRoman PertsevRoman marked this pull request as ready for review May 24, 2022 11:25
@PertsevRoman PertsevRoman changed the title draft: fix: add exponential backoff fix: add exponential backoff May 24, 2022
@jglick jglick added the bug For changelog: Fixes a bug. label May 24, 2022
@jglick jglick requested a review from Vlatombe May 24, 2022 12:18
Copy link
Contributor

@jeffret-b jeffret-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea seems fine. Exponential backoff can be useful. In this case, we need an associated Jira ticket that sufficiently describes the problem and some indication that this resolves or improves the situation.

import java.net.Socket;
import java.net.URI;
import java.net.URL;
import java.net.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't combine imports. They should be one per line.

Also, it would be nice if the reformatting were separated. At least in a separate commit. Maybe a separate PR.

@@ -72,6 +72,11 @@ THE SOFTWARE.
</properties>

<dependencies>
<dependency>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally we don't like to add dependencies to Remoting, but this one is small, so it's probably okay.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though do we really need a third-party dependency for exponential backoff? Remoting already retries connection, and typically you need to tune backoff behavior a bit for the caller anyway. E.g. https://github.com/jenkinsci/apache-httpcomponents-client-4-api-plugin/blob/4e7d9a7bae61f816f95f6fc1fb144e6ed40895f1/src/main/java/io/jenkins/plugins/httpclient/RobustHTTPClient.java#L159-L216

@@ -236,6 +232,8 @@ public Thread newThread(@NonNull final Runnable r) {
private final String instanceIdentity;
private final Set<String> protocols;

private Integer retryAttempts;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to use primitives?

@jglick
Copy link
Member

jglick commented Nov 29, 2022

Merge conflict, possibly with #603 etc. The idea is fine but this is probably a candidate for closing unless the author still has time and interest in cleaning it up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug For changelog: Fixes a bug.
Projects
None yet
3 participants