Skip to content
This repository has been archived by the owner on Apr 21, 2022. It is now read-only.

Close connections where streams haven't been opened since some time #77

Closed
wants to merge 3 commits into from

Conversation

aarshkshah1992
Copy link
Collaborator

@aarshkshah1992 aarshkshah1992 commented Jan 19, 2021

For libp2p/go-libp2p#1039.

As discussed with @Stebalien , once hole punching is in place, we could end up with multiple connections between peers often. That's because we'd have the Relayed connection on which the successful hole punch was co-ordinated and for QUIC, we'd simply end up with two connections between the peers during a hole punch.

However, in the Swarm, we'll ONLY open stream on the newest direct connection with the most streams going ahead. So, in the connection manager, we should close Connections that haven't seen a stream in some time.

Note that for opening a Stream, the Swarm will always prefer a direct connection over a Relayed connection as implemented in libp2p/go-libp2p-swarm#233.

Copy link
Contributor

@marten-seemann marten-seemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a problem with this approach: QUIC imposes a limit on how many streams can be opened at the same time. Once that limit is reached, calls to OpenStream will block.

It might be cleaner to add a PunchHole(remote *net.UDPAddr) method to the QUIC transport. This method would send out a single UDP packet (with random payload) to the remote addr. That would create the NAT binding we need, and we'd only end up with a single QUIC connection.

@vyzo
Copy link
Contributor

vyzo commented Jan 19, 2021

We also need logic in the swarm to always select a direct connection over a relay connection, otherwise we may end up with clown shoes.

@aarshkshah1992
Copy link
Collaborator Author

@vyzo We already do that. Please see libp2p/go-libp2p-swarm#233.

connmgr.go Outdated
Comment on lines 384 to 387
if !info.lastStreamOpen.IsZero() && time.Since(info.lastStreamOpen) > maxStreamOpenDuration {
selected = append(selected, c)
target--
seen[c] = struct{}{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should only do this if there are no active streams.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And because it might be expensive to count on demand, we can keep a running count using the StreamOpened/StreamClosed events.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vyzo

The current heuristic is:-
"Cleanup connections that vane't seen a stream since 10 Minutes (but could have open streams older than that)".

Are you saying that we should clean-up streams that:
"Don't have any open streams at all inspite of being older than 10 minutes" ?

Any reason to prefer the latter ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That heuristic is incorrect, it will break applications with long lived stream.
Imagine a pubsub client, who simply connects and opens the pubsub streams and does nothing else afterwards.
If we reap the connection because there are no new streams, we just broke the app.

So yeah, we should only close connections if they have been idle for 10min without any streams present.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense.

@aarshkshah1992
Copy link
Collaborator Author

@vyzo

Have addressed your review. Please take a look.

Copy link
Contributor

@vyzo vyzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some small coding issues that need to be addressed.

connmgr.go Outdated
@@ -19,6 +19,8 @@ var SilencePeriod = 10 * time.Second

var log = logging.Logger("connmgr")

var maxStreamOpenDuration = 10 * time.Minute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name... this is poorly named.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

connmgr.go Outdated
}

// connections that haven't seen a new stream since 10 minutes and have NO streams open.
if !info.lastStreamOpen.IsZero() && time.Since(info.lastStreamOpen) > maxStreamOpenDuration && info.nStreams == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put the count check before the time.Since check as it is a lot cheaper and can short-circuit; you can even put it first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, cache the current time and use that instead of doing gettimeofday 1ktimes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point.

connmgr.go Outdated
Comment on lines 667 to 673
_, ok = cinf.conns[c]
if !ok {
log.Error("received stream close notification for conn we are not tracking: ", p)
return
}

cinf.conns[c].nStreams--
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clown shoes... you are doing the map lookup twice, and it's not free. Just capture it in a variable in the first lookup.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

connmgr.go Outdated
Comment on lines 641 to 648
_, ok = cinf.conns[c]
if !ok {
log.Error("received stream open notification for conn we are not tracking: ", p)
return
}

cinf.conns[c].lastStreamOpen = time.Now()
cinf.conns[c].nStreams++
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing here, you are doing the lookup 3(!!!!) times; just capture it in a variable and use it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@vyzo
Copy link
Contributor

vyzo commented Feb 18, 2021

Also, we need to make sure we are not closing a protected connection with this logic.

@aarshkshah1992
Copy link
Collaborator Author

@vyzo Have addressed your changes.

We don't close protected connections with this logic as we filter out protected connections from the candidate set in the getConnsToClose function before we run this logic.

@aarshkshah1992
Copy link
Collaborator Author

aarshkshah1992 commented Feb 19, 2021

As discussed offline with @vyzo , this is quickly getting messy for now. We don't need this for now because:

i. Limited Relays will anyways close conns after a grace period.
ii. We'll introduce a change in QUIC hole punching to send a UDP packet instead of a full-blown connection to avoid creating multiple connections.

@vyzo vyzo closed this Feb 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants