
How to set limits properly #2632

Closed
b00f opened this issue Nov 2, 2023 · 4 comments
b00f commented Nov 2, 2023

Regarding the resource manager limits, we have defined the limits as follows:

maxConns := conf.MaxConns // default is 16
minConns := conf.MinConns // default is 8
limit := lp2prcmgr.DefaultLimits

limit.SystemBaseLimit.ConnsInbound = logScale(maxConns)
limit.SystemBaseLimit.Conns = logScale(2 * maxConns)
limit.SystemBaseLimit.StreamsInbound = logScale(maxConns)
limit.SystemBaseLimit.Streams = logScale(2 * maxConns)

limit.ServiceLimitIncrease.ConnsInbound = logScale(minConns)
limit.ServiceLimitIncrease.Conns = logScale(2 * minConns)
limit.ServiceLimitIncrease.StreamsInbound = logScale(minConns)
limit.ServiceLimitIncrease.Streams = logScale(2 * minConns)

limit.TransientBaseLimit.ConnsInbound = logScale(maxConns / 2)
limit.TransientBaseLimit.Conns = logScale(2 * maxConns / 2)
limit.TransientBaseLimit.StreamsInbound = logScale(maxConns / 2)
limit.TransientBaseLimit.Streams = logScale(2 * maxConns / 2)

limit.TransientLimitIncrease.ConnsInbound = logScale(minConns / 2)
limit.TransientLimitIncrease.Conns = logScale(2 * minConns / 2)
limit.TransientLimitIncrease.StreamsInbound = logScale(minConns / 2)
limit.TransientLimitIncrease.Streams = logScale(2 * minConns / 2)
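
For reference, a minimal sketch of wiring such a scaled config into a host, assuming current go-libp2p resource-manager APIs. This is not the project's exact code, and this logScale (rounding up to the next power of two) is only a stand-in for the helper used above:

package main

import (
    "math/bits"

    lp2p "github.com/libp2p/go-libp2p"
    lp2phost "github.com/libp2p/go-libp2p/core/host"
    lp2prcmgr "github.com/libp2p/go-libp2p/p2p/host/resource-manager"
)

// logScale rounds n up to the nearest power of two (an assumption about
// the helper used in the snippet above).
func logScale(n int) int {
    if n <= 0 {
        return n
    }
    return 1 << bits.Len(uint(n-1))
}

func newHost(limit lp2prcmgr.ScalingLimitConfig) (lp2phost.Host, error) {
    // AutoScale derives concrete limits from the machine's available
    // memory and file descriptors, starting from the base limits and
    // increases set above.
    limiter := lp2prcmgr.NewFixedLimiter(limit.AutoScale())
    mgr, err := lp2prcmgr.NewResourceManager(limiter)
    if err != nil {
        return nil, err
    }
    return lp2p.New(lp2p.ResourceManager(mgr))
}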

By default, the minimum is set at 8 connections and the maximum at 16. This means that each node only needs to maintain connections with 8 to 16 other nodes.

Consider users who are running the node on their personal computers; some also run the node on a VPS with only 1 or 2 cores and 2 GB of RAM.

So far, the syncing process and networking work smoothly, especially for the consensus messages; we don't have any issues there. We have more than 400 computers in our network. However, there are some strange logs in our system that I want to discuss with you:

  1. Failed to open stream: this makes us worried.
stream-xxxx: transient: cannot reserve stream: resource limit exceeded
  2. Failed to identify: we see many failed protocol negotiations, and once negotiation fails, we close the connection to that node.
INFO net/identify identify/id.go:427 failed to negotiate identify protocol with peer {"peer": "12D3Koo...", "error": "Application error 0x0 (local)"}
WARN net/identify identify/id.go:399 failed to identify 12D3Koo...: Application error 0x0 (local)
  3. Information about the connection manager: probably not important, but worth mentioning here.
INFO connmgr connmgr/connmgr.go:490 open connection count above limit, but too many are in the grace period

Implementation is available here: https://github.com/pactus-project/pactus/tree/main/network

Thanks in advance for your help


One of the community members reported the screenshot below:
[image]

amirvalhalla commented:

We enabled metrics to monitor libp2p better. You can see our monitoring dashboard here.

master255 commented Nov 5, 2023

If you create your own DHT network, there is no need to change the limits, and everything will work faster. It's very easy to do.

sukunrt commented Nov 14, 2023

Failed to open stream: this makes us worried

stream-xxxx: transient: cannot reserve stream: resource limit exceeded

This happens for new streams that are pending protocol negotiation via multistream. These are the limits set by the limit.TransientBaseLimit.* config values. In your case, if you have more than 8 streams pending multistream negotiation, you'll trigger this error. It will help to enable metrics to debug why this is happening.
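
For anyone following along: go-libp2p (v0.26 and later) registers its metrics, including the resource-manager ones, on Prometheus's default registerer by default, so exposing them can be as simple as this sketch (the port and endpoint here are arbitrary choices, not anything the library mandates):

package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // go-libp2p registers its collectors on prometheus.DefaultRegisterer,
    // which promhttp.Handler() serves by default.
    http.Handle("/metrics", promhttp.Handler())
    _ = http.ListenAndServe(":2112", nil) // arbitrary example port
}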

INFO connmgr connmgr/connmgr.go:490 open connection count above limit, but too many are in the grace period

The total connection count is over the connmgr low water mark, but the connection manager cannot trim these new connections because they have been around for less than the grace period. This is not a problem in itself, especially if your resource manager limits are set correctly.
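
To illustrate the mechanism (a sketch, not the project's actual config): the connection manager trims down toward the low water mark once the high water mark is crossed, but connections younger than the grace period are exempt from trimming, which is exactly what produces that log line.

package main

import (
    "time"

    lp2p "github.com/libp2p/go-libp2p"
    lp2phost "github.com/libp2p/go-libp2p/core/host"
    lp2pconnmgr "github.com/libp2p/go-libp2p/p2p/net/connmgr"
)

func newHostWithConnMgr() (lp2phost.Host, error) {
    // Low/high water marks mirror the defaults quoted above
    // (minConns=8, maxConns=16); the grace period is an assumed example.
    cm, err := lp2pconnmgr.NewConnManager(
        8,  // low water mark: trim down to this many connections
        16, // high water mark: start trimming above this
        lp2pconnmgr.WithGracePeriod(time.Minute), // new conns exempt for 1m
    )
    if err != nil {
        return nil, err
    }
    return lp2p.New(lp2p.ConnectionManager(cm))
}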

INFO net/identify identify/id.go:427 failed to negotiate identify protocol with peer {"peer": "12D3Koo...", "error": "Application error 0x0 (local)"}
WARN net/identify identify/id.go:399 failed to identify 12D3Koo...: Application error 0x0 (local)

This happens when the peer closed the connection before you could run identify on it. Again, it's hard to say why this happens without some understanding of your code; it'll help to enable metrics to at least understand what's going on. One theory is that one side is dropping the new stream because it has exceeded its transient limits, while the other side was going to negotiate identify on that stream, so the other side prints this log line.

If it's possible, can you run the nodes with debug logs and metrics enabled?
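
One way to do that, assuming the node links ipfs/go-log as go-libp2p does for the subsystems quoted above (the "rcmgr" logger name below is an assumption; levels can also be set via the GOLOG_LOG_LEVEL environment variable):

package main

import logging "github.com/ipfs/go-log/v2"

func enableDebugLogs() {
    // "net/identify" matches the logger seen in the log lines above;
    // "rcmgr" is the assumed name of the resource-manager logger.
    _ = logging.SetLogLevel("net/identify", "debug")
    _ = logging.SetLogLevel("rcmgr", "debug")
}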


b00f commented Nov 16, 2023

@sukunrt,

Thank you for your invaluable assistance in addressing the issues in this thread.

We have implemented a connection gater to prevent new connections from opening once we are at the limit, and this has significantly reduced the number of connection-related errors reported by users. However, we are still encountering some issues, particularly with connections to peers where protocol negotiation never completes.
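
For context, such a gater implements go-libp2p's ConnectionGater interface. A minimal sketch (not the Pactus implementation; atLimit is a hypothetical callback into the application's connection accounting):

package main

import (
    "github.com/libp2p/go-libp2p/core/control"
    "github.com/libp2p/go-libp2p/core/network"
    "github.com/libp2p/go-libp2p/core/peer"
    ma "github.com/multiformats/go-multiaddr"
)

// limitGater rejects new dials and inbound connections while atLimit
// reports that the node is at its connection limit.
type limitGater struct {
    atLimit func() bool // hypothetical hook into connection accounting
}

func (g *limitGater) InterceptPeerDial(peer.ID) bool               { return !g.atLimit() }
func (g *limitGater) InterceptAddrDial(peer.ID, ma.Multiaddr) bool { return !g.atLimit() }
func (g *limitGater) InterceptAccept(network.ConnMultiaddrs) bool  { return !g.atLimit() }
func (g *limitGater) InterceptSecured(network.Direction, peer.ID, network.ConnMultiaddrs) bool {
    return true
}
func (g *limitGater) InterceptUpgraded(network.Conn) (bool, control.DisconnectReason) {
    return true, 0
}

Such a gater would then be installed with lp2p.ConnectionGater(&limitGater{...}) when constructing the host.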

Learning each peer's supported protocols is very important for us, as it allows each node to handshake with its neighbors (using streams) in order to start the syncing process. The number of connections that never complete this protocol exchange is significant and appears to be abnormal.

Do you have any idea what leads nodes to open connections without exchanging their supported protocols? This information would be really helpful in our efforts to resolve these issues.

By the way, I believe @amirvalhalla has already shared a link to a specific metric here.

@sukunrt sukunrt self-assigned this Apr 25, 2024
@b00f b00f closed this as completed May 5, 2024