Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider removing / disabling keep-alives #44

Open
marten-seemann opened this issue Feb 16, 2021 · 9 comments
Open

consider removing / disabling keep-alives #44

marten-seemann opened this issue Feb 16, 2021 · 9 comments

Comments

@marten-seemann
Copy link
Contributor

NAT mappings are kept from expiring if the the NAT occasionally sees a connection being used.

Architecturally, sending keep-alives is a responsibility of the underlying transport, not of the stream multiplexer. For TCP, we can set the SO_KEEPALIVE socket option in go-tcp-transport, and the kernel will take care of keeping the connection alive. When running yamux on top of any other transport, we can't make any assumptions about the necessity and the frequency of keep-alive intervals anyway.

I suggest adding SO_KEEPALIVE support to go-tcp-transport and removing the respective code from go-yamux. Open question: Do we only deprecate Config.EnableKeepAlive and Config.KeepAliveInterval, or do we remove them?

WDYT, @Stebalien and @willscott?

@willscott
Copy link
Contributor

if our release isn't doing anything with them, I'd probably be in favor of a major version update that removes them entirely.

@Stebalien
Copy link
Member

Yeah, I'd love to remove this from the stream multiplexer (especially given that mplex doesn't support it).

However, we need keepalives in all transports before we should consider removing them here.

  1. Websocket.
  2. Relay (so the intermediate node can't "stall" the connection). I'm not sure of a clean way to handle this. Maybe:
    1. Expose whether or not the transports support keepalives via Stat().
    2. When keepalives are not supported, spin up a keepalive process in the host (keeping a keepalive stream open).

@marten-seemann
Copy link
Contributor Author

  1. Websocket.

On the protocol level, Websockets should have support for keep-alive: https://tools.ietf.org/html/rfc6455#section-5.5.2. That said, isn't it guaranteed that we always run Websockets over TCP or QUIC, both of which already support keep-alives?

  1. Relay (so the intermediate node can't "stall" the connection).

I'm not sure there's a problem here. The relay will also run the TCP / QUIC transport, so they'll do keep-alives to both sides of the relayed connection. Even if they disable that (don't really see a reason why they'd do that, but let's just assume they do), receiving and acknowledging a keep-alive packet from both connection endpoints should be sufficient to keep any NAT binding alive.

@Stebalien
Copy link
Member

That said, isn't it guaranteed that we always run Websockets over TCP or QUIC, both of which already support keep-alives?

Yes, we just need to enable them on the websocket transport. That is, the websocket transport doesn't, IIRC, use the TCP transport under the covers. It uses a shared "reuseport transport".

@Stebalien
Copy link
Member

I'm not sure there's a problem here. The relay will also run the TCP / QUIC transport, so they'll do keep-alives to both sides of the relayed connection. Even if they disable that (don't really see a reason why they'd do that, but let's just assume they do), receiving and acknowledging a keep-alive packet from both connection endpoints should be sufficient to keep any NAT binding alive.

The problem is the relay. The relay can let the connection start, then DoS both sides by hanging (not necessarily due to malice). Without keepalives, we won't notice that the connection simply doesn't work. With keepalives, the relay will need to let some traffic through sometimes.

(obviously, we should try to upgrade anyways; but we still want to detect hung relayed connections)

@marten-seemann
Copy link
Contributor Author

The problem is the relay. The relay can let the connection start, then DoS both sides by hanging (not necessarily due to malice). Without keepalives, we won't notice that the connection simply doesn't work.

A malicious relay can always stall the connection, so that's nothing we can defend against. The only thing we can and should do is detect it and time out in a timely manner.
In general, to keep a NAT bindings fresh, only one of the peers has to use keep-alives: As a keep-alive involves sending a packet and receiving an acknowledgement for that packet, every NAT on the way will see both an incoming and an outgoing packet. We only enable keep-alives on both sides to speed up the connection timeout in case the other peer goes offline.

@Stebalien
Copy link
Member

A malicious relay can always stall the connection, so that's nothing we can defend against. The only thing we can and should do is detect it and time out in a timely manner

Yep, that's my point. We need some form of keepalive to detect stalled connections.

@marten-seemann
Copy link
Contributor Author

Yep, that's my point. We need some form of keepalive to detect stalled connections.

Maybe I'm misunderstanding you. Dont we have that, because the two peers send keepalives to the relay?

@Stebalien
Copy link
Member

Stebalien commented Feb 17, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants