Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of disconnected peers in PubSub #506

Open
Wondertan opened this issue Nov 19, 2022 · 6 comments
Open

Improve handling of disconnected peers in PubSub #506

Wondertan opened this issue Nov 19, 2022 · 6 comments

Comments

@Wondertan
Copy link
Contributor

Context

Right now, the implementation handles dead peers smartly but uncommonly. PubSub determines a dead peer if the write stream returns EOF or other error. This is possible due to the dual-stream model, where peers establish long-lived outbound streams to each other for writes and receive inbound streams for reads.

Problems

  • Isn't idiomatic. The canonical go-libp2p way of handling dead/disconnected peers relies on the Notfiee interface(deprecated) or EvtPeerConnectednessChanged.
  • Undocumented. The exact behavior should be brought up to specs, as the recent dual-stream requirement, and synchronized across implementations.
  • Allocates. To handle dead peers, we allocate a Reader with 4K buffers per peer. Moreover, a malicious peer can send a message and allocate a receive buffer in the Reader of up to 1M. A small yet DOS vector.

Improvement Paths

  • Migrate to the event system, specifically EvtPeerConnectednessChanged, to handle dead peers and spec it out.

Not sure what event system status is in other libp2p implementations.

  • Keep the existing way of handling dead/disconnected peers. Remove allocations and DOS vector by reading the varint only and killing the stream. Spec out the behavior. (by @vyzo )

Other ideas?

@vyzo
Copy link
Collaborator

vyzo commented Nov 19, 2022

The advantage of the current system is that it is very responsive and it also works at the stream level.

Migrating to use the event bus (which appeared in libp2p way after pubsub :) may be worthwhile, but I would suggest that we improve the current peer detection to remove allocation first.
This should be straightforward to implement, and it is not mutually exclusive with using the event bus later.

@Wondertan
Copy link
Contributor Author

@vyzo, should we implement this or wait for inputs from other folks:?

@vyzo
Copy link
Collaborator

vyzo commented Nov 19, 2022

I think we can implement it as it is, it will fix the (minor) dos vecror.

@Wondertan
Copy link
Contributor Author

Ok. Follow up question. Why do we need to read specifically varint or is one byte enough? Should we kill a peer or reset a stream if the unexpected message arrives?

@vyzo
Copy link
Collaborator

vyzo commented Nov 19, 2022

I think just a byte is enough.
Killing the stream is reasonable i think, it may well be unusable at this point and indicative of a bug on the other side.

@vyzo
Copy link
Collaborator

vyzo commented Nov 19, 2022

There is also a backoff mechanism for dealing with dead peers, so we might as well declare the peer dead and enter it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants