How to set limits properly #2632
We enabled metrics for monitoring.
If you create your own DHT network, there is no need to change the limits, and everything will work faster. It's very easy to do.
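For illustration, one common way to run a separate DHT network with go-libp2p-kad-dht is to give it its own protocol prefix, so nodes only peer with others configured with the same prefix. A minimal sketch, where the `/pactus` prefix is only an example and not necessarily the project's actual value:

```go
package main

import (
	"context"

	"github.com/libp2p/go-libp2p"
	dht "github.com/libp2p/go-libp2p-kad-dht"
)

func main() {
	ctx := context.Background()

	host, err := libp2p.New()
	if err != nil {
		panic(err)
	}
	defer host.Close()

	// A custom protocol prefix keeps this DHT separate from the public
	// IPFS DHT: only nodes using the same prefix join the routing table.
	// "/pactus" is an illustrative value.
	kadDHT, err := dht.New(ctx, host, dht.ProtocolPrefix("/pactus"))
	if err != nil {
		panic(err)
	}
	defer kadDHT.Close()
}
```

With a distinct prefix, queries and routing-table maintenance never touch the much larger public DHT, which is presumably why a dedicated network is faster.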
This happens for new streams that are pending protocol negotiation via multistream. These are the limits set by the limit.TransientBaseLimit.* config values. In your case, if you have more than 8 streams pending multistream negotiation, you'll trigger this issue. It will help to enable metrics to debug why this is happening.
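For reference, a minimal sketch of raising the transient-scope limits with go-libp2p's resource manager; the values below are illustrative, not recommendations:

```go
package main

import (
	"github.com/libp2p/go-libp2p"
	rcmgr "github.com/libp2p/go-libp2p/p2p/host/resource-manager"
)

func main() {
	// Start from the defaults and raise only the transient scope, which
	// covers streams that haven't finished multistream negotiation yet.
	limits := rcmgr.DefaultLimits
	limits.TransientBaseLimit.Streams = 128 // illustrative values
	limits.TransientBaseLimit.StreamsInbound = 64
	limits.TransientBaseLimit.StreamsOutbound = 128

	rm, err := rcmgr.NewResourceManager(rcmgr.NewFixedLimiter(limits.AutoScale()))
	if err != nil {
		panic(err)
	}

	host, err := libp2p.New(libp2p.ResourceManager(rm))
	if err != nil {
		panic(err)
	}
	defer host.Close()
}
```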
The total connection count is over connmgr.LowWaterMark, but the connection manager cannot trim these new connections because they have been around for less than the grace period. This is not a problem in itself, especially if your resource manager limits are set right.
This happens when the peer closes the connection before you can run identify on it. Again, I'm not sure why this happens without some understanding of your code. It will help to enable metrics to at least understand why this is happening. One theory is that one side is dropping the new stream because it has exceeded its transient connection limit, and the other side, which was about to negotiate identify on that stream, prints this log line. If possible, can you run the nodes with debug logs and metrics enabled?
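For anyone following along, a sketch of turning on both. It assumes go-libp2p's built-in Prometheus metrics (via `libp2p.PrometheusRegisterer`) and go-log subsystem names such as `net/identify` and `rcmgr`, which may differ across versions:

```go
package main

import (
	"net/http"

	logging "github.com/ipfs/go-log/v2"
	"github.com/libp2p/go-libp2p"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Turn on debug logs for the subsystems discussed in this thread.
	// Equivalently: GOLOG_LOG_LEVEL="net/identify=debug,rcmgr=debug"
	_ = logging.SetLogLevel("net/identify", "debug")
	_ = logging.SetLogLevel("rcmgr", "debug")

	// Register libp2p's metrics on our own Prometheus registry and
	// serve them so they can be scraped and graphed.
	reg := prometheus.NewRegistry()
	host, err := libp2p.New(libp2p.PrometheusRegisterer(reg))
	if err != nil {
		panic(err)
	}
	defer host.Close()

	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
	_ = http.ListenAndServe(":8888", nil) // port is arbitrary
}
```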
Thank you for your invaluable assistance in addressing the issues in this thread. We have implemented a connection gater to prevent new connections from opening once the limit is reached, and this has significantly reduced the number of connection-related errors reported by users. However, we are still encountering some issues, particularly with connections to peers where protocol negotiation never completes. Getting the supported protocols is very important for us, as it allows each node to handshake with its neighbors (over streams) in order to start the syncing process. The number of connections without a supported-protocols exchange is significant and appears abnormal. Do you have any idea what leads nodes to open connections without exchanging supported protocols? This information would be really helpful in our efforts to resolve these issues. By the way, I believe @amirvalhalla has sent a link to a specific metric here.
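For readers of this thread: the actual Pactus gater lives in the repository linked below. What follows is only a hypothetical sketch of a gater that refuses new connections at a limit, implementing go-libp2p's `ConnectionGater` interface:

```go
package main

import (
	"github.com/libp2p/go-libp2p"
	"github.com/libp2p/go-libp2p/core/control"
	"github.com/libp2p/go-libp2p/core/network"
	"github.com/libp2p/go-libp2p/core/peer"
	ma "github.com/multiformats/go-multiaddr"
)

// limitGater is a hypothetical gater that refuses brand-new connections
// once the host is at its connection budget. It is not the Pactus gater.
type limitGater struct {
	connCount func() int // wired to the live host after construction
	maxConns  int
}

func (g *limitGater) underLimit() bool {
	if g.connCount == nil { // host not wired up yet during startup
		return true
	}
	return g.connCount() < g.maxConns
}

// Reject dials and inbound accepts when at the limit, before any
// resources are spent on the security or multistream handshakes.
func (g *limitGater) InterceptPeerDial(peer.ID) bool               { return g.underLimit() }
func (g *limitGater) InterceptAddrDial(peer.ID, ma.Multiaddr) bool { return g.underLimit() }
func (g *limitGater) InterceptAccept(network.ConnMultiaddrs) bool  { return g.underLimit() }

// Connections that got this far are allowed to finish the upgrade.
func (g *limitGater) InterceptSecured(network.Direction, peer.ID, network.ConnMultiaddrs) bool {
	return true
}
func (g *limitGater) InterceptUpgraded(network.Conn) (bool, control.DisconnectReason) {
	return true, 0
}

func main() {
	g := &limitGater{maxConns: 16} // illustrative budget

	host, err := libp2p.New(libp2p.ConnectionGater(g))
	if err != nil {
		panic(err)
	}
	defer host.Close()

	// Wire the live connection count once the host exists; real code
	// should guard this handoff with proper synchronization.
	g.connCount = func() int { return len(host.Network().Conns()) }
}
```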
Regarding the resource manager limits, we have defined them as follows: by default, the minimum is set at 8 connections and the maximum at 16. This means that each node only needs connections with 8 to 16 other nodes.
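Expressed with go-libp2p's connection manager, such a target would look roughly like the sketch below; the grace period value is illustrative, and it is the same grace period mentioned above that exempts young connections from trimming:

```go
package main

import (
	"time"

	"github.com/libp2p/go-libp2p"
	"github.com/libp2p/go-libp2p/p2p/net/connmgr"
)

func main() {
	// Low water 8 / high water 16 mirror the limits described above.
	// Connections younger than the grace period are never trimmed,
	// which is why the count can sit above the low-water mark.
	cm, err := connmgr.NewConnManager(8, 16,
		connmgr.WithGracePeriod(time.Minute)) // illustrative grace period
	if err != nil {
		panic(err)
	}

	host, err := libp2p.New(libp2p.ConnectionManager(cm))
	if err != nil {
		panic(err)
	}
	defer host.Close()
}
```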
Consider users who are running the node on their personal computers; some also run the node on a VPS with only 1 or 2 cores and 2 GB of RAM.
So far, the syncing process and networking work smoothly, especially for the consensus messages; we don't have any issues there. We have more than 400 computers in our network. However, there are some strange logs in our system that I want to discuss with you.
Implementation is available here: https://github.com/pactus-project/pactus/tree/main/network
Thanks in advance for your help
A screenshot of these logs was reported by one of the community members.