Skip to content
This repository has been archived by the owner on Jul 1, 2022. It is now read-only.

RemoteReporter repeatedly logging error and success during unavailable agent #732

Closed
pschichtel opened this issue Aug 3, 2020 · 4 comments · Fixed by #785
Closed

RemoteReporter repeatedly logging error and success during unavailable agent #732

pschichtel opened this issue Aug 3, 2020 · 4 comments · Fixed by #785

Comments

@pschichtel
Copy link
Contributor

Followup to #728

This is very likely a bug I introduced myself in #729 which got released with 1.3.2.
It seems to toggle between the states even though the agent has not been reachable. I'll investigate this.

@jphelp32
Copy link

@pschichtel any update on addressing this? thanks

@pschichtel
Copy link
Contributor Author

Sorry, totally forgot about this. I had a look at it, but the behaviour didn't really make sense to me. I wasn't able to properly debug this in a running instance of our application (some source mapping issues it seemed). We have stopped enabled jaeger by default since then, so haven't really touched this in a white.

@lenoch7
Copy link
Contributor

lenoch7 commented Apr 23, 2021

IMHO, one problem is, that removing "failed" command from commandFailedBefore HashSet (in RemoteReporter.QueueProcessor) is always done when command execution ends successfully (i.e. without throwing SenderException). But there are some situations, when "span is only saved in a buffer" on sender level or there is no any span to be sent and remote communication is not started at all (and SenderException cannot be thrown). In such cases the sender methods (append(), flush() ... ) should return value 0.

I tried to use modification, where I only remove "failed" command from HashSet, when command ends successfully and return value is bigger than 0. I changed execute() method signature of Command interface to return int value (result of sender method), something like this:

interface Command {
    int execute() throws SenderException;
}

public void run() {
  ...
  int sentSpanCount = command.execute();
  if (sentSpanCount > 0) {
    if (failedBefore) {
      log.info(commandClass.getSimpleName() + " is working again!");
      commandFailedBefore.remove(commandClass);
    }
  }
  ...
}

But unfortunately, it seems it is not enough (and it is problem number two). It seems that my modification solved problem when HttpSender is used (I got log only first time), but it does not work in case when UdpSender is used. I tried to debug it and I discovered, that in my environment, socket.send(new DatagramPacket(bytes, len)) statement in ThriftUdpTransport.flush() does not throw PortUnreachableException on each invocation. Here is a piece of javadoc for DatagramSocket.send(DatagramPacket) method:

PortUnreachableException may be thrown if the socket is connected
to a currently unreachable destination. Note, there is no
guarantee that the exception will be thrown.

So finally, I configured io.jaegertracing.internal.reporters.RemoteReporter logger level to ERROR only (to suppress error messages and completely disable feature to log error only one time). I guess (maybe), it was a reason to open this issue #778.

The question is, if it has sense to solve problem one - removing failed command only when some attempt to start remote communication is initiated - because it works only for HttpSender.

@lenoch7
Copy link
Contributor

lenoch7 commented May 3, 2021

I created PR (#785), where fix is done only for situation, when HttpSender is used. To make a fix for UdpSender probably requires bigger changes.

BTW: IMHO, the problem, which I described has relation to correct values in metrics too. Because when PortUnreachableException is not thrown, the count of Failure statistics cannot be updated.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants