io.jaegertracing.jaeger-client 1.7.0 ICMP port unreachable if agent daemonset restarts #827
Comments
There was an attempt to fix something similar in #726. Note that this library is deprecated; there are no plans to fix any bugs unless they are security-related. Please see the notice in the readme.
Thanks. Can I ask a question here? Please ignore it if it is not appropriate. The readme says that the jaeger-client library is being deprecated and that the OpenTelemetry library should be used instead. I followed the suggested guide: https://medium.com/jaegertracing/migrating-from-jaeger-client-to-opentelemetry-sdk-bd337d796759 Why and how is the OpenTelemetry library with the Jaeger bridge considered ready?
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.opentelemetry</groupId>
      <artifactId>opentelemetry-bom</artifactId>
      <version>1.10.1</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
    <dependency>
      <groupId>io.opentelemetry</groupId>
      <artifactId>opentelemetry-bom-alpha</artifactId>
      <version>1.10.1-alpha</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
The reported ICMP port unreachable problem can be fixed with a change to the flush method in the class io.jaegertracing.thrift.internal.reporters.protocols.ThriftUdpTransport.
If the agents run as a DaemonSet rather than as sidecars, every redeployment or restart of the agents causes every jaeger-client in the services to receive ICMP port unreachable, which in turn causes the socket (java.net.DatagramSocket) to be closed. When the agents become available again, spans are not sent from the client to the agent automatically, because the socket is closed.
This fix tries to reconnect the socket first and then flushes again. I tested it and it works. If this is not the correct solution, it would be good if a maintainer could fix it the right way.
It is not a security bug, but it is a major problem that is causing critical issues in production. For instance, we can't restart all services in production every time a Jaeger upgrade requires a restart of the agent DaemonSet.
Fix in ThriftUdpTransport:
@Override
public void flush() throws TTransportException {
    if (this.writeBuffer != null) {
        // Copy the buffered data into an array that can be wrapped in a packet.
        byte[] bytes = new byte[MAX_PACKET_SIZE];
        int len = this.writeBuffer.position();
        this.writeBuffer.flip();
        this.writeBuffer.get(bytes, 0, len);
        try {
            this.socket.send(new DatagramPacket(bytes, len));
        } catch (PortUnreachableException e) {
            // The agent was unreachable: reconnect and resend the same bytes once.
            reconnectSocketAndFlush(bytes, len);
        } catch (IOException e) {
            throw new TTransportException(
                TTransportException.UNKNOWN, "Cannot flush closed transport", e);
        } finally {
            this.writeBuffer = null;
        }
    }
}

// Requires the socket field to be non-final and host/port to be kept as fields.
private void reconnectSocketAndFlush(byte[] bytes, int len) throws TTransportException {
    // Close the old socket before replacing it so that it is not leaked.
    this.socket.close();
    try {
        this.socket = new DatagramSocket(null);
        this.socket.connect(new InetSocketAddress(host, port));
    } catch (SocketException se) {
        throw new TTransportException(
            TTransportException.UNKNOWN, "TUDPTransport cannot reconnect:", se);
    }
    try {
        this.socket.send(new DatagramPacket(bytes, len));
    } catch (IOException ioe) {
        throw new TTransportException(
            TTransportException.UNKNOWN, "Cannot flush on reconnected transport", ioe);
    }
}
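For anyone who wants to try the reconnect-and-resend idea outside of jaeger-client, here is a minimal standalone sketch using only java.net. The class ReconnectingUdpSender and its methods are hypothetical names made up for illustration; they are not part of jaeger-client or Thrift, and the sketch retries only once, as the fix above does.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.net.PortUnreachableException;
import java.net.SocketException;

/** Hypothetical helper (not a jaeger-client class): a connected UDP sender that reconnects once. */
final class ReconnectingUdpSender implements AutoCloseable {
    private final InetSocketAddress target;
    private DatagramSocket socket;
    private int reconnectCount;

    ReconnectingUdpSender(String host, int port) throws SocketException {
        this.target = new InetSocketAddress(host, port);
        this.socket = openConnectedSocket();
    }

    /** Opens a socket on an ephemeral local port and connects it to the target. */
    private DatagramSocket openConnectedSocket() throws SocketException {
        DatagramSocket s = new DatagramSocket();
        s.connect(target);
        return s;
    }

    /** Replaces the current socket with a fresh one, closing the old socket afterwards. */
    void reconnect() throws SocketException {
        DatagramSocket fresh = openConnectedSocket();
        socket.close();
        socket = fresh;
        reconnectCount++;
    }

    /** Sends the payload; on ICMP port unreachable, reconnects and resends exactly once. */
    void send(byte[] payload) throws IOException {
        try {
            socket.send(new DatagramPacket(payload, payload.length));
        } catch (PortUnreachableException e) {
            reconnect();
            socket.send(new DatagramPacket(payload, payload.length));
        }
    }

    int reconnectCount() {
        return reconnectCount;
    }

    @Override
    public void close() {
        socket.close();
    }
}
```

A caller would create it with something like new ReconnectingUdpSender(nodeIp, 6831) and call send for each batch. Whether and when a PortUnreachableException is actually raised depends on the operating system delivering the ICMP message to the connected socket, so treat this as an illustration of the approach rather than a guarantee.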
Won't fix - this repository is being archived.
We have agents deployed as a daemonset on Kubernetes. If the agents restart for any reason, microservices that use io.jaegertracing.jaeger-client 1.7.0 are not able to reconnect to the agent at nodeIP:6831 (UDP port). In the logs I see this exception:
If the microservice is restarted, everything works again.
Microservices are configured with:
To Reproduce
Steps to reproduce the behavior:
Expected behavior
When the Jaeger agent daemonset restarts, jaeger-client should reconnect to the agent successfully.
Version (please complete the following information):
What troubleshooting steps did you try?