Set a better default for idle_connection_timeout #4912

Open
thomaseizinger opened this issue Nov 22, 2023 · 12 comments · May be fixed by #4967

@thomaseizinger
Contributor

As discussed yesterday in #4905.

Needs:

  • Decide on a better default (!= 0 seconds)
  • Update all examples (see the sketch below)
  • Add docs on the method explaining what it does and why
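
For illustration only, a minimal sketch of what the updated examples could look like, assuming the SwarmBuilder API and the with_idle_connection_timeout config method mentioned later in this thread; the ping behaviour and TCP transport here are placeholders:

```rust
use std::time::Duration;

use libp2p::{noise, ping, tcp, yamux, SwarmBuilder};

// Sketch: build a Swarm with a non-zero idle connection timeout.
// The builder chain assumes the libp2p 0.53-style SwarmBuilder.
fn build_swarm() -> Result<libp2p::Swarm<ping::Behaviour>, Box<dyn std::error::Error>> {
    let swarm = SwarmBuilder::with_new_identity()
        .with_tokio()
        .with_tcp(
            tcp::Config::default(),
            noise::Config::new,
            yamux::Config::default,
        )?
        .with_behaviour(|_keypair| ping::Behaviour::default())?
        // Override the 0-second default discussed in this issue,
        // e.g. with the 5 seconds proposed below.
        .with_swarm_config(|cfg| cfg.with_idle_connection_timeout(Duration::from_secs(5)))
        .build();
    Ok(swarm)
}
```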
@dariusc93
Contributor

A default of 10 seconds may be enough to accommodate the majority of protocols, but setting it to 30 seconds may be the best balance, in my opinion.

@thomaseizinger
Contributor Author

30 seconds seems like a lot to me. The main use case we talked about yesterday was being able to initiate protocols with peers from outside the Swarm based on events like Kademlia queries. Those would usually happen very fast, and a default timeout of 5 seconds would easily accommodate that.

Anything longer feels too application-specific but I don't have any data to back that up.

@thomaseizinger
Contributor Author

An alternative would be to entirely remove auto-closing and instead emit an event notifying the user that the connection is idle. Would that perhaps be useful?

@thomaseizinger linked a pull request Dec 1, 2023 that will close this issue
@dariusc93
Contributor

An alternative would be to entirely remove auto-closing and instead emit an event notifying the user that the connection is idle. Would that perhaps be useful?

That actually doesn't sound like a bad idea. Kind of curious to see how that would play out long term.

@thomaseizinger
Contributor Author

thomaseizinger commented Dec 1, 2023

An alternative would be to entirely remove auto-closing and instead emit an event notifying the user that the connection is idle. Would that perhaps be useful?

That actually doesn't sound like a bad idea. Kind of curious to see how that would play out long term.

Typically, the problem with these kinds of approaches is that, by default, users only handle the events they care about and _ => {} match the rest. So an event like ConnectionIsIdle will likely go unnoticed by many of them.

In this particular case though, that doesn't actually do any harm (until you run into connection limits, for example)! At that point, however, you are probably familiar enough with rust-libp2p to learn that this event exists, and you can take whatever action you prefer.

To make the simple case of a timeout easier, we could re-emit this event every X seconds and include how long the connection has already been idle. Then users wouldn't need to keep any state outside of the Swarm if they e.g. wanted to close connections after they have been idle for, say, 10 seconds. A rough sketch of this is below.
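
Purely hypothetical sketch of that idea; the ConnectionIsIdle variant, its idle_for field, and the periodic re-emission do not exist in rust-libp2p today, and the surrounding event loop is the usual StreamExt-based one:

```rust
use std::time::Duration;

use futures::StreamExt; // for swarm.next()
use libp2p::swarm::SwarmEvent;

loop {
    match swarm.next().await.expect("the swarm stream is infinite") {
        // Hypothetical: `ConnectionIsIdle` and `idle_for` are part of the
        // proposal, NOT of today's rust-libp2p API.
        SwarmEvent::ConnectionIsIdle { connection_id, idle_for } => {
            // Because the event would be re-emitted periodically with the
            // accumulated idle duration, no bookkeeping outside the Swarm
            // is needed to implement a simple timeout policy.
            if idle_for >= Duration::from_secs(10) {
                swarm.close_connection(connection_id);
            }
        }
        // The catch-all arm most users write; an idle connection silently
        // falls through here unless the event is handled explicitly.
        _ => {}
    }
}
```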

Curious to hear what @mxinden thinks.

@thomaseizinger
Contributor Author

I find it quite difficult to gauge what would be considered a bigger annoyance / "rust-libp2p"-ism:

  • Auto-closing connections when they are idle (regardless how long the timeout is)
  • Having to close idle connections yourself

From a library PoV, the latter is "better" because it gives the user more power over the policy on when to close connections. At the same time, not closing them at all might be a weird default?

We could add ConnectionEvent::StateChange { idle: bool }, which would allow us to add a module that can auto-close connections once they have been idle for some duration X.
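
Purely as a sketch of the shape of that variant (nothing like it exists in rust-libp2p today):

```rust
// Hypothetical: the proposed variant, not part of today's rust-libp2p API.
pub enum ConnectionEvent {
    // ... existing variants ...
    /// The connection transitioned between busy and idle. A separate module
    /// could track `idle: true` transitions and close the connection once
    /// it has been idle for a configured duration.
    StateChange { idle: bool },
}
```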

@DougAnderson444
Contributor

I'm curious how many resources open connections use. Are they cheap or expensive? I would think that would influence the decision a bit.

@thomaseizinger
Contributor Author

On @mxinden's kademlia-exporter node, a single connection uses about 300 kB. I'd expect an idle connection, even with more protocols enabled, not to consume much more than that, because the presence of buffered data would mean that the connection is not idle. Hence, all buffers should be empty on an idle connection, unless we have a memory leak somewhere.

In addition to that, an open connection consumes a file descriptor (or the equivalent, depending on OS).

@maqi
Contributor

maqi commented Dec 12, 2023

A couple of things might relate to this idle connection handling, which we observed on our large-scale network (over 4k nodes) using kad + gossipsub:
1. A connection that was opened (via get_closest_peer, put/get record ops, or by gossipsub) remains alive and does not get auto-closed, even with the idle connection timeout set via with_idle_connection_timeout.
My guess is: the connection was opened for a network query like get_closest, but since gossipsub periodically contacts connected peers, those open connections remain non-idle even after the original network query has completed.
2. Keeping a connection alive doesn't use much memory (around 50-150 KB per connection).
But combined with point 1, on a large-scale testnet we observed nodes holding over 3k connections and consuming 400 MB.
3. We had to deploy a mechanism that manually closes outdated connections (ones that have been established for a while) to peers not in the k-buckets; see the sketch after this list.
This reduces memory usage a lot, as the number of live connections is capped by the number of peers in the routing table.
4. Gossipsub turns out to rely on live connections (instead of local peer knowledge plus establishing a connection on request) to ensure a message reaches all subscribers.
We noticed no loss of topic messages when nodes keep all connections alive, but some topic messages are lost when closing non-RT connections.
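
For reference, a rough sketch of such a close-non-routing-table-peers mechanism, assuming a behaviour struct with a kademlia field (MyBehaviour and the field name are illustrative; kbuckets, connected_peers, and disconnect_peer_id are existing rust-libp2p APIs):

```rust
use std::collections::HashSet;

use libp2p::{PeerId, Swarm};

// Assumed: #[derive(NetworkBehaviour)] struct MyBehaviour { kademlia: kad::Behaviour<...>, ... }
fn close_non_rt_connections(swarm: &mut Swarm<MyBehaviour>) {
    // Collect every peer currently in the Kademlia routing table.
    let in_routing_table: HashSet<PeerId> = swarm
        .behaviour_mut()
        .kademlia
        .kbuckets()
        .flat_map(|bucket| {
            bucket
                .iter()
                .map(|entry| entry.node.key.preimage().clone())
                .collect::<Vec<_>>()
        })
        .collect();

    // Disconnect every connected peer that is not in the routing table.
    let connected: Vec<PeerId> = swarm.connected_peers().cloned().collect();
    for peer in connected {
        if !in_routing_table.contains(&peer) {
            let _ = swarm.disconnect_peer_id(peer);
        }
    }
}
```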

@dariusc93
Contributor

1. A connection that was opened (via get_closest_peer, put/get record ops, or by gossipsub) remains alive and does not get auto-closed, even with the idle connection timeout set via with_idle_connection_timeout.
My guess is: the connection was opened for a network query like get_closest, but since gossipsub periodically contacts connected peers, those open connections remain non-idle even after the original network query has completed.

To my knowledge, if the connected peers are subscribed to the same gossipsub topic as the node, their connection will be kept alive until they are no longer a part of the mesh (e.g., unsubscribed from the topic). Not sure if this is applicable to your use case though.

2. Keeping a connection alive doesn't use much memory (around 50-150 KB per connection).
But combined with point 1, on a large-scale testnet we observed nodes holding over 3k connections and consuming 400 MB.

Just curious: is this with a TCP-only or a TCP + QUIC setup?

@maqi
Contributor

maqi commented Dec 12, 2023

if the connected peers are subscribed to the same gossipsub topic as the node

No, most of the nodes are not subscribed to any topic; only 2% of the nodes are subscribed to the same topic,
i.e. with 4k nodes, only 80 nodes take part in gossipsub.
However, as I understand it, the gossipsub protocol does mention that it will broadcast to non-subscribed connected peers.

is this with just TCP or TCP + Quic setup?

just TCP

@thomaseizinger
Contributor Author

An active gossipsub stream will keep the connection alive even if the nodes aren't subscribed to the same topics. There isn't really much we can do about that. The best solution is to close those connections manually, as you are already doing.
