Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LoadBalance] Optimize find nics process. #14340

Merged
merged 4 commits into from Feb 18, 2022
Merged

[LoadBalance] Optimize find nics process. #14340

merged 4 commits into from Feb 18, 2022

Conversation

mattisonchao
Copy link
Member

@mattisonchao mattisonchao commented Feb 17, 2022

Motivation

According to this PR #14252,

We can know In some cases, VMs in EC2 won't have the speed reported on the NIC and will give a read error.

In a normal case, only eth0 has speed value. other devices have no speed value.

image

So when we check isPhysicalNic , line 232 would get a read error many times.

private boolean isPhysicalNic(Path path) {
if (!path.toString().contains("/virtual/")) {
try {
Files.readAllBytes(path.resolve("speed"));
return true;
} catch (Exception e) {
// In some cases, VMs in EC2 won't have the speed reported on the NIC and will give a read-error.
// Check the type to make sure it's ethernet (type "1")
try {
String type = new String(Files.readAllBytes(path.resolve("type")), StandardCharsets.UTF_8).trim();
return Integer.parseInt(type) == 1;
} catch (IOException ioe) {
// wireless nics don't report speed, ignore them.
return false;
}
}
}
return false;
}

Then I think we can change the process to check type first and then to check speed.

Modifications

  • Change find NICs process.
  • Add log to remind the user to set config loadBalancerOverrideBrokerNicSpeedGbps when we can't read speed that is ethernet.

Verifying this change

  • Make sure that the change passes the CI checks.

Documentation

  • no-need-doc

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Feb 17, 2022
@mattisonchao
Copy link
Member Author

/pulsarbot rerun-failure-checks

@Technoboy-
Copy link
Contributor

Could you help review this patch, for we're not sure about it.
@merlimat @codelipenghui @eolivelli @lhotari .

@mattisonchao
Copy link
Member Author

Supplementary Instructions:

Linux /sys/class/net/<iface>/type values: https://github.com/torvalds/linux/blob/master/include/uapi/linux/if_arp.h
Linux /sys/class/net/<iface>/speeddefination:

Indicates the interface latest or current speed value. Value is an integer representing the link speed in Mbits/sec.
Note: this attribute is only valid for interfaces that implement the ethtool get_link_ksettings method (mostly Ethernet).

@mattisonchao
Copy link
Member Author

/pulsarbot rerun-failure-checks

@merlimat merlimat added this to the 2.11.0 milestone Feb 17, 2022
@merlimat merlimat added the type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages label Feb 17, 2022
@BewareMyPower
Copy link
Contributor

Should it be cherry-picked to other active branches? @codelipenghui

I also found this problem in EC2 VMs.

@michaeljmarshall
Copy link
Member

@BewareMyPower - it might be helpful to cherry pick to active branches. In #14537, there was a discussion about failing fast, instead of constantly logging this error message. #14537 (comment)

I expect that behavior will likely get cherry-picked, and that commit might rely on this one.

@Shawyeok
Copy link
Contributor

Shawyeok commented Apr 7, 2022

We had same problem on Xen VM with pulsar 2.8.3, should we pick this to 2.8.x branch? @codelipenghui

codelipenghui pushed a commit that referenced this pull request Apr 19, 2022
codelipenghui pushed a commit that referenced this pull request Apr 19, 2022
@codelipenghui codelipenghui added the cherry-picked/branch-2.9 Archived: 2.9 is end of life label Apr 19, 2022
Nicklee007 pushed a commit to Nicklee007/pulsar that referenced this pull request Apr 20, 2022
codelipenghui pushed a commit that referenced this pull request Apr 28, 2022
@codelipenghui codelipenghui added the cherry-picked/branch-2.8 Archived: 2.8 is end of life label Apr 28, 2022
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request May 4, 2022
(cherry picked from commit 77d60b3)
(cherry picked from commit 58d074d)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/broker cherry-picked/branch-2.8 Archived: 2.8 is end of life cherry-picked/branch-2.9 Archived: 2.9 is end of life cherry-picked/branch-2.10 doc-not-needed Your PR changes do not impact docs release/2.8.4 release/2.9.3 release/2.10.1 type/bug The PR fixed a bug or issue reported a bug type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants