memberlist fails in large-scale clusters #299

Open
zhuohuashiyi opened this issue Mar 24, 2024 · 1 comment

Comments

@zhuohuashiyi

We recently implemented an improved large-scale gossip protocol on top of memberlist, and memberlist did not meet our expectations once the cluster reached 1,000 nodes. After investigating, we found that the broadcast mechanism (the getBroadcasts function in broadcast.go) takes data from the system broadcast queue first. When the number of nodes is large, the UDP packet has no space left for data from the user-defined broadcast queue, so our system fails to reach consistency.
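To make the starvation concrete, here is a minimal Go sketch of the fill order described above. It is a simplified model, not memberlist's actual code: packetBudget and makeMsgs are hypothetical helpers, and the 1,400-byte limit, 2-byte per-message overhead, and message sizes are assumed values chosen only for illustration.

```go
package main

import "fmt"

// makeMsgs builds n dummy messages of the given size (hypothetical helper).
func makeMsgs(n, size int) [][]byte {
	out := make([][]byte, n)
	for i := range out {
		out[i] = make([]byte, size)
	}
	return out
}

// packetBudget is a simplified, hypothetical model of the fill order:
// system (node-state) broadcasts claim packet space first, and user-defined
// broadcasts only get whatever bytes remain. Messages that do not fit are
// skipped so that smaller ones can still be tried.
func packetBudget(systemMsgs, userMsgs [][]byte, overhead, limit int) (sys, user [][]byte) {
	used := 0
	// Step 1: system broadcasts are taken first.
	for _, m := range systemMsgs {
		need := len(m) + overhead
		if used+need > limit {
			continue
		}
		sys = append(sys, m)
		used += need
	}
	// Step 2: user broadcasts share only the leftover space.
	for _, m := range userMsgs {
		need := len(m) + overhead
		if used+need > limit {
			continue
		}
		user = append(user, m)
		used += need
	}
	return sys, user
}

func main() {
	const limit = 1400 // assumed packet budget in bytes
	const overhead = 2 // assumed per-message framing overhead

	userMsgs := makeMsgs(4, 100) // four hypothetical 100-byte user broadcasts
	for _, pending := range []int{2, 15, 20} {
		// Each pending node-state update is assumed to be 80 bytes.
		sys, user := packetBudget(makeMsgs(pending, 80), userMsgs, overhead, limit)
		fmt.Printf("pending node-state msgs=%d -> system sent=%d, user sent=%d\n",
			pending, len(sys), len(user))
	}
}
```

With a small node-state backlog every user message fits; as the backlog approaches the packet limit, user broadcasts are squeezed out entirely, even though they are waiting in the queue.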

@zhuohuashiyi
Author

No. I ran experiments with memberlist on a cluster of thousands of servers. The real cause of the problem is that the size of a single broadcast is limited by the UDP packet size, and memberlist's broadcast logic first fills the packet with system data (node status information) and only then takes data from the user-defined broadcast queue to fill the remaining space. This works in a small cluster, but as the cluster grows, the UDP packets fill up with node status information, leaving no room for user broadcasts and causing inconsistency at the user level.
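One factor that may keep node status information occupying packet space longer as the cluster grows is memberlist's retransmission limit: each queued broadcast is piggybacked up to RetransmitMult × ⌈log10(N+1)⌉ times before it is dropped. The sketch below reproduces that formula (as we understand it from retransmitLimit in memberlist's util.go) with the LAN default RetransmitMult of 4. It does not model how many node-state changes are pending at once, which is likely also higher in a bigger cluster.

```go
package main

import (
	"fmt"
	"math"
)

// retransmitLimit mirrors the formula memberlist uses to bound how many
// times each queued broadcast is retransmitted:
// retransmitMult * ceil(log10(n+1)), where n is the number of members.
func retransmitLimit(retransmitMult, n int) int {
	nodeScale := math.Ceil(math.Log10(float64(n + 1)))
	return retransmitMult * int(nodeScale)
}

func main() {
	const retransmitMult = 4 // RetransmitMult in memberlist's DefaultLANConfig
	for _, n := range []int{10, 100, 1000, 5000} {
		fmt.Printf("n=%d -> each node-state broadcast is sent up to %d times\n",
			n, retransmitLimit(retransmitMult, n))
	}
}
```

The limit grows only logarithmically, so on its own it does not explain the starvation; combined with a deeper queue of node-state updates, though, it means system messages compete for the same fixed packet budget over more gossip rounds.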
