
Support per-topic TCP connections #455

Open
synzhu opened this issue Sep 24, 2021 · 12 comments

Comments

@synzhu
Contributor

synzhu commented Sep 24, 2021

Can we optionally support dedicated TCP connections per-topic?

This would make it easier to implement traffic prioritization / rate-limiting based on topic.

@Stebalien
Member

Unfortunately, no. That would require refactoring all of libp2p, or just not using libp2p. Per-topic rate-limiting would need to be implemented inside pubsub itself.

The simplest solution would be a custom validator.
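As a rough illustration of what per-topic rate-limiting inside pubsub might look like, here is a minimal token-bucket sketch. Everything in it (`TopicLimiter`, `Allow`, the `at` helper, the rates) is hypothetical and not part of go-libp2p-pubsub; the assumption is that something like `Allow` would be called from a validator registered for the topic, which would reject the message when it returns false.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucket is the token bucket for a single topic.
type bucket struct {
	tokens float64
	last   time.Time
}

// TopicLimiter is a hypothetical per-topic rate limiter.
// It is NOT part of go-libp2p-pubsub.
type TopicLimiter struct {
	mu      sync.Mutex
	rate    float64 // tokens added per second
	burst   float64 // maximum bucket size
	buckets map[string]*bucket
}

func NewTopicLimiter(rate, burst float64) *TopicLimiter {
	return &TopicLimiter{rate: rate, burst: burst, buckets: make(map[string]*bucket)}
}

// Allow reports whether a message on topic may pass at time now,
// consuming one token if it does.
func (l *TopicLimiter) Allow(topic string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	b, ok := l.buckets[topic]
	if !ok {
		// A new topic starts with a full bucket.
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[topic] = b
	}
	// Refill according to elapsed time, capped at burst.
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * l.rate
		if b.tokens > l.burst {
			b.tokens = l.burst
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// at returns a fixed instant so the example is deterministic.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
	l := NewTopicLimiter(1, 2) // 1 msg/s with a burst of 2 (illustrative values)
	for i := 0; i < 3; i++ {
		fmt.Println("blocks", i, l.Allow("blocks", at(0)))
	}
	fmt.Println("txs", l.Allow("txs", at(0)))       // other topics are unaffected
	fmt.Println("blocks", l.Allow("blocks", at(1))) // one token refilled after 1s
}
```

Note that this only drops messages after they have been received and decoded, so it does not limit bandwidth on the wire.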

@synzhu
Contributor Author

synzhu commented Sep 27, 2021

@Stebalien we can implement rate-limiting inside pubsub, but this would be L7 rate-limiting, right?

Having optional per-topic connections would allow users to do this at L3/L4 instead.

Can we reopen this as a discussion of a long-term implementation?

@huitseeker

That would require refactoring all of libp2p, or just not using libp2p.

Would you have more details? I don't remember @vyzo being as categorical when we broached the subject.

The idea is that we already have several possible endpoints for a node exposed through a multi-address. We could choose to use one or the other depending on which topic a particular message belongs to. Not an easy lift by any means, but I may be missing why it's such a daunting task?

The payoff is that a lot of tools for traffic shaping become instantly available when operating on different TCP endpoints, including, but not limited to, rate-limiting and QoS.

@Stebalien
Member

Thinking about this a bit... The best way to do this would be to just have separate pubsub instances per topic with separate libp2p hosts (with separate peer IDs).

Trying to do this inside of libp2p would be a mess. The best way I can think of doing this would be to "label" connections:

  1. When creating a new stream or connection, the user could specify a "label". If a connection with such a label exists, it would be used. Otherwise, a new connection would be created.
  2. When receiving streams, the receiver would be able to look at the "label" to decide what to do with the connection.

However:

  1. This would add an entirely new dimension to libp2p (instance -> connection -> stream would become instance -> label -> connection -> stream).
  2. For TCP, at least, we'd likely need to pick a different source port per label for outbound dials. Currently, we always use the port we're listening on, but that would prevent us from opening multiple connections to the same remote address.
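To make the "label" idea concrete, here is a minimal sketch of the lookup rule from step 1 of the first list: reuse the connection with a given label if one exists, otherwise create one. The `Conn` and `LabeledPool` types are purely hypothetical; nothing like this exists in libp2p today, and the sketch ignores dialing, source-port selection, and concurrency.

```go
package main

import "fmt"

// Conn stands in for a transport connection. The Label field is the
// hypothetical addition discussed above.
type Conn struct {
	ID    int
	Label string
}

// LabeledPool keeps at most one connection per label for a single
// remote peer. It is not safe for concurrent use.
type LabeledPool struct {
	conns  map[string]*Conn
	nextID int
}

func NewLabeledPool() *LabeledPool {
	return &LabeledPool{conns: make(map[string]*Conn)}
}

// Get returns the connection carrying label, creating one if none
// exists. The second result reports whether a new one was created.
func (p *LabeledPool) Get(label string) (*Conn, bool) {
	if c, ok := p.conns[label]; ok {
		return c, false
	}
	p.nextID++ // a real implementation would dial here
	c := &Conn{ID: p.nextID, Label: label}
	p.conns[label] = c
	return c, true
}

func main() {
	p := NewLabeledPool()
	for _, label := range []string{"default", "topic/blocks", "default"} {
		c, created := p.Get(label)
		fmt.Printf("label=%s conn=%d created=%v\n", label, c.ID, created)
	}
}
```

On the receiving side, step 2 would amount to reading `Label` off the inbound connection before deciding how to handle its streams.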

@Stebalien
Member

Basically, supporting this natively inside of libp2p would need a very good motivation and a massive design lift.

@Stebalien
Member

To be clear, I think a connection labeling feature could be otherwise useful:

  1. NewStream could take a label "policy" (specifying that any label matching the policy may be used) along with a default label for the underlying connection if none exists.
  2. Transports could have label policies, allowing some transports to be used for some classes of connections.

This could even replace our current system for "transient" connections.

  1. The default label policy for new streams would be "default".
  2. When a transient stream is acceptable, NewStream would be called with a label policy of (default, transient).
  3. The relay transport would only create connections with the "transient" label.

But this would need a lot of fleshing out and would, again, be a pretty big design lift (and this repo likely isn't the right place to figure it out).
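A small sketch of how such a label policy might be evaluated, under the assumption that a policy is just an ordered list of acceptable labels whose first entry doubles as the default label for a newly created connection. `Policy` and `Pick` are hypothetical names, not an existing or proposed libp2p API.

```go
package main

import "fmt"

// Policy is a hypothetical label policy: an ordered list of acceptable
// labels. The first entry is treated as the default label to use when
// a new connection has to be created.
type Policy []string

// Pick chooses a label given the labels of the connections that
// already exist. It returns the most preferred existing label with
// reuse=true, or the policy's default label with reuse=false when no
// existing connection matches.
func Pick(p Policy, existing []string) (label string, reuse bool) {
	have := make(map[string]bool, len(existing))
	for _, l := range existing {
		have[l] = true
	}
	for _, l := range p {
		if have[l] {
			return l, true
		}
	}
	if len(p) == 0 {
		return "", false
	}
	return p[0], false
}

func main() {
	// Assume only a relayed ("transient") connection is open.
	existing := []string{"transient"}

	label, reuse := Pick(Policy{"default"}, existing)
	fmt.Println(label, reuse) // needs a new "default" connection

	label, reuse = Pick(Policy{"default", "transient"}, existing)
	fmt.Println(label, reuse) // may reuse the transient connection
}
```

With this shape, the relay transport in point 3 would simply be a transport that only ever produces connections labeled "transient".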

@Stebalien
Member

So, this topic has been bothering me and I think I was too hasty in closing it. I don't think automatically opening a new TCP connection per topic is viable, but I think we can come up with a viable solution. It's not going to be easy, but we might as well discuss it.

I've opened an issue (libp2p/libp2p#99) to discuss a general solution to this problem but there are likely easier short-term solutions (like using multiple libp2p nodes) that we should consider first.

@marten-seemann
Contributor

Can we optionally support dedicated TCP connections per-topic?

This would make it easier to implement traffic prioritization / rate-limiting based on topic.

I can see how doing prioritization / rate limiting on L4 can make reusing existing tools easier, but it does come with significant drawbacks. Having multiple connections to the same endpoint will:

  • increase handshake latency
  • increase FD usage (at least if you don't use QUIC)
  • lead to competing congestion controllers (each congestion controller will ramp up its send rate until it triggers a loss event on a competing connection), effectively reducing overall throughput on all connections
  • make it impossible to share congestion state (RTT, RTTvar, cwnd, tail loss probes) between connections

I'd like to point out that over the last decade, there has been a massive push in the transport community towards reducing the number of concurrent connections, starting with SPDY stream muxing (which evolved to HTTP/2) and moving towards QUIC (which gives you a stream muxer with non-HoL-blocked streams).

It sounds like what you're trying to do is L6/L7 prioritization, and I'd argue that this should be implemented at that layer. Looking at prior art, HTTP faced a very similar problem, and solved it by introducing a prioritization scheme as part of HTTP/2.

@Stebalien
Member

I generally agree, but I can also see a use-case where someone may want to use entirely separate links for different types of traffic.

@vyzo
Collaborator

vyzo commented Oct 14, 2021

In terms of implementation in pubsub, it would take a modest refactoring to allow multiple senders/receivers per peer. Not trivial by any means, but not infeasible either.

@synzhu
Contributor Author

synzhu commented Oct 22, 2021

So, the idea is that this would not be "per topic for all" but "per topic as an exception". In other words, there would be a default connection that pubsub messages go over, plus some number of per-topic connections if the user decides they want a dedicated connection for some topics.

One thing that would probably need to be solved here is how the sender would know whether to open a new connection to a peer in order to send a message on a topic. Peers would probably need to exchange information about any dedicated per-topic endpoints they use when initially joining each topic, or something similar.
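The "default plus per-topic exceptions" shape can be pictured as a small lookup table. This sketch assumes the sender already knows the mapping; how peers would learn it is exactly the open question above. `TopicRoutes`, `EndpointFor`, the topic names, and the addresses are all made up for illustration.

```go
package main

import "fmt"

// TopicRoutes is a hypothetical table for one remote peer: a default
// endpoint for all pubsub traffic, plus optional dedicated endpoints
// for specific topics.
type TopicRoutes struct {
	Default   string
	Dedicated map[string]string
}

// EndpointFor returns the address to use for topic and whether it is
// a dedicated per-topic endpoint rather than the default one.
func (r TopicRoutes) EndpointFor(topic string) (addr string, dedicated bool) {
	if a, ok := r.Dedicated[topic]; ok {
		return a, true
	}
	return r.Default, false
}

func main() {
	routes := TopicRoutes{
		Default: "/ip4/192.0.2.10/tcp/4001", // illustrative addresses only
		Dedicated: map[string]string{
			"consensus": "/ip4/192.0.2.10/tcp/4002",
		},
	}
	for _, topic := range []string{"consensus", "transactions"} {
		addr, dedicated := routes.EndpointFor(topic)
		fmt.Printf("topic=%s addr=%s dedicated=%v\n", topic, addr, dedicated)
	}
}
```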

It sounds like what you're trying to do is L6/L7 prioritization, and I'd argue that this should be implemented at that layer.

On that topic, how would one achieve this today? I don't think there is any prioritization mechanism built into libp2p as of now. If I were to implement this below libp2p, I guess I would need some way to inspect the Topic field of each underlying message sent over the connection.

@Stebalien
Member

One thing that would probably need to be solved here is how the sender would know whether to open a new connection to a peer in order to send a message on a topic. Peers would probably need to exchange information about any dedicated per-topic endpoints they use when initially joining each topic, or something similar.

IMO, this would need to be a global policy in your network, not something negotiated after the fact. The latter would be pretty invasive.

On that topic, how would one achieve this today? I don't think there is any prioritization mechanism built into libp2p as of now. If I were to implement this below libp2p, I guess I would need some way to inspect the Topic field of each underlying message sent over the connection.

This would need to be implemented in pubsub itself, likely using async validators to feed all messages through some priority queue.
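For a sense of what the priority queue part could look like, here is a minimal sketch using Go's `container/heap`. `TopicQueue` and the priority values are hypothetical; the wiring into pubsub validators is not shown, and the assumption is simply that messages are pushed on arrival and drained by a separate consumer.

```go
package main

import (
	"container/heap"
	"fmt"
)

type item struct {
	topic, payload string
	priority       int    // higher values drain first
	seq            uint64 // arrival order, keeps ties FIFO
}

// msgHeap implements heap.Interface over items.
type msgHeap []item

func (h msgHeap) Len() int      { return len(h) }
func (h msgHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h msgHeap) Less(i, j int) bool {
	if h[i].priority != h[j].priority {
		return h[i].priority > h[j].priority
	}
	return h[i].seq < h[j].seq
}
func (h *msgHeap) Push(x interface{}) { *h = append(*h, x.(item)) }
func (h *msgHeap) Pop() interface{} {
	old := *h
	it := old[len(old)-1]
	*h = old[:len(old)-1]
	return it
}

// TopicQueue is a hypothetical queue that orders messages by a
// per-topic priority. Topics without an entry get priority 0.
// It is not safe for concurrent use.
type TopicQueue struct {
	prio map[string]int
	h    msgHeap
	seq  uint64
}

func NewTopicQueue(prio map[string]int) *TopicQueue {
	return &TopicQueue{prio: prio}
}

func (q *TopicQueue) Push(topic, payload string) {
	q.seq++
	heap.Push(&q.h, item{topic: topic, payload: payload, priority: q.prio[topic], seq: q.seq})
}

// Pop returns the highest-priority message, or ok=false if empty.
func (q *TopicQueue) Pop() (topic, payload string, ok bool) {
	if q.h.Len() == 0 {
		return "", "", false
	}
	it := heap.Pop(&q.h).(item)
	return it.topic, it.payload, true
}

func main() {
	q := NewTopicQueue(map[string]int{"control": 10, "bulk": 1})
	q.Push("bulk", "b1")
	q.Push("control", "c1")
	q.Push("bulk", "b2")
	for {
		topic, payload, ok := q.Pop()
		if !ok {
			break
		}
		fmt.Println(topic, payload)
	}
}
```

A strict priority order like this can starve low-priority topics; a weighted scheme may be a better fit depending on the workload.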


Have you considered the multiple peer ID approach? It's sounding more and more like that would just "solve" the issue for you.


5 participants