RemoteReporter repeatedly logging error and success during unavailable agent #732

pschichtel · 2020-08-03T18:16:39Z

Followup to #728

This is very likely a bug I introduced myself in #729 which got released with 1.3.2.
It seems to toggle between the states even though the agent has not been reachable. I'll investigate this.

jphelp32 · 2021-02-17T18:16:59Z

@pschichtel any update on addressing this? thanks

pschichtel · 2021-02-18T10:46:00Z

Sorry, totally forgot about this. I had a look at it, but the behaviour didn't really make sense to me. I wasn't able to properly debug this in a running instance of our application (some source mapping issues it seemed). We have stopped enabled jaeger by default since then, so haven't really touched this in a white.

lenoch7 · 2021-04-23T11:56:21Z

IMHO, one problem is, that removing "failed" command from commandFailedBefore HashSet (in RemoteReporter.QueueProcessor) is always done when command execution ends successfully (i.e. without throwing SenderException). But there are some situations, when "span is only saved in a buffer" on sender level or there is no any span to be sent and remote communication is not started at all (and SenderException cannot be thrown). In such cases the sender methods (append(), flush() ... ) should return value 0.

I tried to use modification, where I only remove "failed" command from HashSet, when command ends successfully and return value is bigger than 0. I changed execute() method signature of Command interface to return int value (result of sender method), something like this:

interface Command {
    int execute() throws SenderException;
}

public void run() {
  ...
  int sentSpanCount = command.execute();
  if (sentSpanCount > 0) {
    if (failedBefore) {
      log.info(commandClass.getSimpleName() + " is working again!");
      commandFailedBefore.remove(commandClass);
    }
  }
  ...
}

But unfortunately, it seems it is not enough (and it is problem number two). It seems that my modification solved problem when HttpSender is used (I got log only first time), but it does not work in case when UdpSender is used. I tried to debug it and I discovered, that in my environment, socket.send(new DatagramPacket(bytes, len)) statement in ThriftUdpTransport.flush() does not throw PortUnreachableException on each invocation. Here is a piece of javadoc for DatagramSocket.send(DatagramPacket) method:

PortUnreachableException may be thrown if the socket is connected
to a currently unreachable destination. Note, there is no
guarantee that the exception will be thrown.

So finally, I configured io.jaegertracing.internal.reporters.RemoteReporter logger level to ERROR only (to suppress error messages and completely disable feature to log error only one time). I guess (maybe), it was a reason to open this issue #778.

The question is, if it has sense to solve problem one - removing failed command only when some attempt to start remote communication is initiated - because it works only for HttpSender.

lenoch7 · 2021-05-03T06:11:42Z

I created PR (#785), where fix is done only for situation, when HttpSender is used. To make a fix for UdpSender probably requires bigger changes.

BTW: IMHO, the problem, which I described has relation to correct values in metrics too. Because when PortUnreachableException is not thrown, the count of Failure statistics cannot be updated.

lenoch7 mentioned this issue May 3, 2021

Fix repeatedly error logging for HttpSender (#732) #785

Merged

yurishkuro closed this as completed in #785 May 3, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RemoteReporter repeatedly logging error and success during unavailable agent #732

RemoteReporter repeatedly logging error and success during unavailable agent #732

pschichtel commented Aug 3, 2020

jphelp32 commented Feb 17, 2021

pschichtel commented Feb 18, 2021

lenoch7 commented Apr 23, 2021

lenoch7 commented May 3, 2021

RemoteReporter repeatedly logging error and success during unavailable agent #732

RemoteReporter repeatedly logging error and success during unavailable agent #732

Comments

pschichtel commented Aug 3, 2020

jphelp32 commented Feb 17, 2021

pschichtel commented Feb 18, 2021

lenoch7 commented Apr 23, 2021

lenoch7 commented May 3, 2021