[loadbalance] remove 'Invalid argument' confusing stacktrace when NIC speed can't be detected #14537
I think we should consider failing during broker start up when loadBalancerOverrideBrokerNicSpeedGbps is not configured and the broker cannot determine NIC speed for all NICs (and when load balancing is enabled).
The current design is to frequently log an error while ignoring the NIC. The only real course of action for an operator is to re-configure the broker. It seems like a better course of action to fail on start up so that operators know immediately that they need to fix the configuration instead of finding out some time later when observing the logs.
@nicoloboschi @michaeljmarshall I very much agree with Michael. Would you mind thinking about this approach?
@mattisonchao thanks, please go ahead
@mattisonchao - sure, go ahead. For context, we discussed this issue briefly at the community meeting today and there was consensus that failing on startup is the right behavior.
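The fail-on-startup behavior discussed above could look roughly like the following. This is only a sketch, not the code from this PR or its follow-up: the class name `NicSpeedStartupCheck`, the `validate` method, and its parameters are hypothetical, and it assumes "cannot determine NIC speed" means at least one usable NIC has no detectable speed.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of the Pulsar code base.
public class NicSpeedStartupCheck {

    /**
     * Fails fast when load balancing is enabled, no NIC speed override is
     * configured, and some NICs do not expose their speed.
     *
     * @param loadBalancerEnabled  whether load balancing is enabled
     * @param overrideNicSpeedGbps value of loadBalancerOverrideBrokerNicSpeedGbps, if set
     * @param undetectableNics     names of NICs whose speed could not be read (assumed input)
     */
    static void validate(boolean loadBalancerEnabled,
                         Optional<Double> overrideNicSpeedGbps,
                         List<String> undetectableNics) {
        if (!loadBalancerEnabled || overrideNicSpeedGbps.isPresent()) {
            // Nothing to check: NIC speed is either unused or explicitly configured.
            return;
        }
        if (!undetectableNics.isEmpty()) {
            throw new IllegalStateException(
                    "Unable to determine the speed of NIC(s) " + undetectableNics
                    + "; please set loadBalancerOverrideBrokerNicSpeedGbps");
        }
    }
}
```

With a check like this, an operator sees the configuration problem at startup instead of in recurring log errors.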
LGTM
@mattisonchao @michaeljmarshall please note that some CI environments may have unusual NICs, and throwing a blocking error will make them fail.
@nicoloboschi - that's a great thing to call out. The solution is to configure the |
Motivation
After #14340 and #14252, you may find an 'Invalid argument' stack trace in the broker log.
Some NICs are of type 1 but do not expose their speed.
The stack trace is not useful and looks alarming.
Note that since #14252 has been cherry-picked to 2.8, 2.9, and 2.10, it's recommended to pick this one as well.
Modifications
Suppress the 'Invalid argument' exception stack trace and only log the suggestion message.
no-need-doc
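As a rough illustration of the modification, the sketch below reads a NIC speed file and, on failure, logs a one-line suggestion without the stack trace. It is not the actual `LinuxBrokerHostUsageImpl` code: the class and method names are hypothetical, it assumes the speed is read from a sysfs file such as `/sys/class/net/<nic>/speed`, and it uses `System.err` in place of the broker's logger.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalDouble;

// Hypothetical sketch, not the code changed by this PR.
public class NicSpeedSketch {

    /**
     * Reads the NIC speed in Mbps from the given file (assumed to be a sysfs
     * speed file). Returns empty when the speed cannot be determined.
     */
    static OptionalDouble readSpeedMbps(Path speedFile) {
        try {
            String raw = new String(Files.readAllBytes(speedFile), StandardCharsets.UTF_8).trim();
            double speed = Double.parseDouble(raw);
            // Assumption: a non-positive value means the speed is unknown.
            return speed > 0 ? OptionalDouble.of(speed) : OptionalDouble.empty();
        } catch (IOException | NumberFormatException e) {
            // Log only a short suggestion; the stack trace (for example
            // "Invalid argument") is deliberately not printed.
            System.err.printf(
                    "Unable to read NIC speed from %s (%s); consider setting "
                    + "loadBalancerOverrideBrokerNicSpeedGbps%n",
                    speedFile, e.getMessage());
            return OptionalDouble.empty();
        }
    }
}
```

Returning an empty `OptionalDouble` lets the caller decide whether to skip the NIC or fall back to the configured override.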