feat(quic): implement hole punching #3964

Merged
merged 44 commits into from Jun 13, 2023

Contributor

@arpankapoor arpankapoor commented May 22, 2023

Description

Implement Transport::dial_as_listener for QUIC as specified by the DCUtR spec.

To facilitate hole punching in QUIC, one side needs to send random UDP packets to establish a mapping in the routing table of the NAT device. If successful, our listener will emit a new inbound connection. This connection needs to then be sent to the dialing task. We achieve this by storing a HashMap of hole punch attempts indexed by the remote's SocketAddr. A matching incoming connection is then sent via a oneshot channel to the dialing task which continues with upgrading the connection.
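The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in this PR: `HolePunchAttempts`, `InboundConnection`, `register` and `route` are hypothetical names, and `std::sync::mpsc` stands in for the oneshot channel.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical placeholder for an accepted, not yet upgraded connection.
#[derive(Debug, PartialEq)]
pub struct InboundConnection {
    pub remote: SocketAddr,
}

/// Pending hole punch attempts, keyed by the remote's socket address.
#[derive(Default)]
pub struct HolePunchAttempts {
    pending: HashMap<SocketAddr, Sender<InboundConnection>>,
}

impl HolePunchAttempts {
    /// Called by the dialing task: register interest in `remote` and obtain
    /// the receiving half on which a matching inbound connection will arrive.
    pub fn register(&mut self, remote: SocketAddr) -> Receiver<InboundConnection> {
        let (tx, rx) = channel();
        self.pending.insert(remote, tx);
        rx
    }

    /// Called by the listener for every inbound connection. If a hole punch
    /// attempt is waiting for this remote address, the connection is handed
    /// to the dialing task and `None` is returned. Otherwise (or if the
    /// dialing task has gone away) the connection is returned so that it can
    /// be reported as a regular inbound connection.
    pub fn route(&mut self, conn: InboundConnection) -> Option<InboundConnection> {
        match self.pending.remove(&conn.remote) {
            Some(tx) => tx.send(conn).err().map(|e| e.0),
            None => Some(conn),
        }
    }
}
```

Each entry is removed on first match, so a second inbound connection from the same address is treated as an ordinary one.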

Related #2883.

Notes & open questions

Please consider this a first draft. The design is largely similar to go-libp2p, except that I've only used the remote address as the hole punching attempt key. I have also added 2 examples analogous to the TCP hole punch tutorial and am able to establish a direct connection upgrade. Questions/issues to consider:

  • rust-libp2p's Transport::dial/Transport::dial_as_listener doesn't have peerid as an argument which go-libp2p does. It could perhaps be extracted from the multiaddr or the trait could be changed to pass the peerid as well (not sure how involved this might be).
  • Transport trait doesn't receive the fully established Connection for us to be able to extract the peerid from the incoming connection. Without the peerid, we are assuming that an incoming connection from the ip:port for which holepunching was previously attempted is from the same peer.
  • Which listener should be used to send the random UDP packets? go-libp2p uses the OS routing table, while for this PR, I've simply used the same code as Transport::dial.

Apart from this, I think I also found an issue with the DCUtR NetworkBehaviour's implementation of handle_established_[in|out]bound_connection. Since outgoing_direct_connection_attempts is indexed by the relayed ConnectionId, shouldn't this method first look up the relayed connection ID corresponding to the possibly upgraded direct connection in direct_to_relayed_connections? If this is an issue, I could create a separate PR for it.
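To make the question concrete, here is a small sketch of the lookup order being proposed. Only the two field names come from the paragraph above. The `Attempts` struct, the `usize` stand-in for `ConnectionId`, the `u8` counter and the method name are assumptions for illustration, not the behaviour's actual definition.

```rust
use std::collections::HashMap;

/// Stand-in for libp2p's `ConnectionId` (hypothetical simplification).
pub type ConnectionId = usize;

pub struct Attempts {
    /// Attempt counters, keyed by the *relayed* connection.
    pub outgoing_direct_connection_attempts: HashMap<ConnectionId, u8>,
    /// Maps a direct connection to the relayed connection it upgrades.
    pub direct_to_relayed_connections: HashMap<ConnectionId, ConnectionId>,
}

impl Attempts {
    /// Clear the attempt counter once a direct connection is established.
    /// The direct id is first translated to its relayed id; looking up the
    /// direct id itself would use the wrong key and find nothing.
    pub fn on_direct_connection_established(&mut self, direct: ConnectionId) -> Option<u8> {
        let relayed = self.direct_to_relayed_connections.remove(&direct)?;
        self.outgoing_direct_connection_attempts.remove(&relayed)
    }
}
```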

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@arpankapoor arpankapoor changed the title Implement Transport::dial_as_listener for QUIC feat(QUIC): Implement Transport::dial_as_listener for QUIC May 22, 2023
Member

@mxinden mxinden left a comment

Wow, thank you @arpankapoor for your contribution! It is also exciting that this is working for you in the wild.

In case you haven't seen it yet, note that we will likely move to quinn instead of quinn-proto with our own wrapper "soon". See #3454 and #2883 (comment).

I will give this an in-depth review. No action required from your end yet. Posting some comments so that I don't forget.

@arpankapoor
Contributor Author

Thanks for the initial reviews @mxinden, @thomaseizinger and @kpp! I would like to hear your comments/suggestions on the open questions I posted in the first comment, especially regarding addition of peerid to HolePunchKey.

@thomaseizinger
Contributor

  • rust-libp2p's Transport::dial/Transport::dial_as_listener doesn't have peerid as an argument which go-libp2p does. It could perhaps be extracted from the multiaddr or the trait could be changed to pass the peerid as well (not sure how involved this might be).

  • Transport trait doesn't receive the fully established Connection for us to be able to extract the peerid from the incoming connection. Without the peerid, we are assuming that an incoming connection from the ip:port for which holepunching was previously attempted is from the same peer.

If we expect a particular PeerId when dialing an address, then the address will have a /p2p part so you can attempt to extract it and compare it later when the connection upgrade is complete. Does that help?
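A rough sketch of that idea, working on the textual form of a multiaddr so that it stays self-contained. `expected_peer_id` and `matches_expected` are hypothetical helpers and the peer IDs in the usage note are made up. Real code would more likely iterate over the `Multiaddr`'s `Protocol` components than split a string.

```rust
/// Return the peer id from the last `/p2p/<peer-id>` component of a textual
/// multiaddr, if there is one.
pub fn expected_peer_id(addr: &str) -> Option<&str> {
    let parts: Vec<&str> = addr.split('/').collect();
    // Walk adjacent (protocol, value) pairs from the back so that, for a
    // relayed address, the target's id wins over the relay's id.
    parts
        .windows(2)
        .rev()
        .find(|w| w[0] == "p2p")
        .map(|w| w[1])
}

/// Compare the peer id learned after the upgrade with the one the dialed
/// address asked for. An address without `/p2p` accepts any peer.
pub fn matches_expected(addr: &str, actual: &str) -> bool {
    expected_peer_id(addr).map_or(true, |expected| expected == actual)
}
```

For example, `matches_expected("/ip4/203.0.113.7/udp/4001/quic-v1/p2p/QmTarget", "QmOther")` is `false`, while the same call with an address that has no `/p2p` part is `true`.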

@kpp
Contributor

kpp commented May 24, 2023

@arpankapoor I implemented the logic for questions 1 and 2 from the description of the PR in e507e44 . Thank you!

arpankapoor and others added 2 commits May 25, 2023 13:51
@arpankapoor
Contributor Author

Thanks a lot @kpp! I've copied over the relevant pieces here.

Member

@mxinden mxinden left a comment

Well done passing a potentially hole-punching inbound connection all the way back to the dialing end. While this adds complexity to libp2p-quic, it nicely hides it such that upper layers don't need to worry about it.

@mxinden
Member

mxinden commented May 30, 2023

Moving an out-of-band discussion here:

any chance you can consider [this pull request] for the upcoming release?

Note that libp2p-quic is not included in the libp2p meta crate, and thus a breaking change in libp2p-quic does not require a breaking change in libp2p. This allows us to release libp2p-quic at any point in time, outside of the libp2p v0.52.0 work.

(We cannot release it before libp2p v0.52.0, as it depends on crate versions that are part of libp2p v0.52.0.)

@arpankapoor
Contributor Author

This looks good to me. The only missing change is the move to (2), i.e. not waiting to upgrade an inbound connection, but instead matching an outgoing hole punch with an inbound connection based on the address, not the peer ID.

Given that libp2p-quic is not part of libp2p, we don't have to rush here to get it into the libp2p v0.52.0 release. Once this pull request is merged we can cut a new release of libp2p-quic independent from any other release work.

@arpankapoor can you please add a changelog entry to transports/quic/CHANGELOG.md?

I've made the move to (2) and added a changelog entry. Please review the latest changes.

Contributor

@thomaseizinger thomaseizinger left a comment

I am liking where this is going! Some minor comments.

I wonder how we can best write a test for this. It would be really good to have some automated tests. But perhaps that is best done at an interop-test level?

Contributor

@thomaseizinger thomaseizinger left a comment

Good to go from my end. I'd like @mxinden's approval too.

@thomaseizinger
Contributor

CI needs to be fixed.

Member

@mxinden mxinden left a comment

Apart from the comments @thomaseizinger raised in the last review and the failing CI, this looks good to me.

I will have low availability next week. @thomaseizinger please merge once you are happy.

@thomaseizinger thomaseizinger changed the title feat(QUIC): Implement Transport::dial_as_listener for QUIC feat(quic): implement hole punching Jun 11, 2023
Contributor

@thomaseizinger thomaseizinger left a comment

Awesome!

Thank you so much for this work and your patience in iterating on it. I'll test this manually tomorrow and merge once successful.

@stormshield-pj50
Contributor

From my understanding, sending random packets is required by the DCUtR specification. Isn't doing the simultaneous dial enough? Do we really need those random packets to hole punch? Are they a plus?

@thomaseizinger
Copy link
Contributor

thomaseizinger commented Jun 13, 2023

The random packets establish a mapping in your NAT device which allows the packets of the incoming connection to be routed through. A simultaneous dial is different because the parties need to somehow work out who the dialer is and who the listener is.

It is my understanding that this works well enough for TCP because TCP doesn't really care about the "role" of a socket. For QUIC however, establishing the socket involves a cryptographic handshake which also assumes certain roles (who sends client-hello vs server-hello etc). If we were to just sim-open two connections with QUIC, we'd need support in the underlying QUIC stack to override the role. This is essentially the abstraction we have in rust-libp2p with dial_as_listener.

There may be other reasons too but it seems like just sending random UDP packets to allow the other connection to be routed through is a much simpler solution and hence preferable.

I wasn't involved in the dcutr spec, perhaps @mxinden can explain this further.
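As an illustration of the random packets described above, here is a blocking, standard-library-only sketch. `punch`, `random_payload` and `random_u64` are hypothetical helpers. The 64-byte payload and the 10 to 200 ms pause are assumptions chosen for the example, and a real transport would do this asynchronously on the listener's socket rather than sleeping on a thread.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::io;
use std::net::{SocketAddr, UdpSocket};
use std::thread::sleep;
use std::time::Duration;

/// Cheap source of unpredictable `u64`s without external crates: every
/// `RandomState` carries randomly seeded hash keys.
fn random_u64() -> u64 {
    RandomState::new().build_hasher().finish()
}

/// Build a payload of `len` random bytes.
pub fn random_payload(len: usize) -> Vec<u8> {
    let mut out = Vec::with_capacity(len + 8);
    while out.len() < len {
        out.extend_from_slice(&random_u64().to_le_bytes());
    }
    out.truncate(len);
    out
}

/// Send `count` packets of random bytes from `socket` to `remote`, pausing
/// for a random 10..=200 ms after each one. The outgoing packets are what
/// create the NAT mapping; their content is never interpreted by the peer.
pub fn punch(socket: &UdpSocket, remote: SocketAddr, count: usize) -> io::Result<()> {
    for _ in 0..count {
        socket.send_to(&random_payload(64), remote)?;
        sleep(Duration::from_millis(10 + random_u64() % 191));
    }
    Ok(())
}
```

The packets must leave from the same local socket that the listener accepts on, otherwise the NAT mapping would be created for a different source port than the one the remote dials.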

@stormshield-pj50
Contributor

Thanks for your response, this is clear now. So the incoming connection is authorized through the NAT thanks to the random packets sent by the puncher, and eventually this incoming connection is used as the dialing one. However, those random UDP packets may be dropped by a firewall which performs deep packet inspection, preventing the hole punch from succeeding.

@thomaseizinger
Contributor

Yes, but I wouldn't know how we could circumvent this. DPI may block all sorts of traffic.

@thomaseizinger
Contributor

Awesome!

Thank you so much for this work and your patience in iterating on it. I'll test this manually tomorrow and merge once successful.

I just tested this successfully with my laptop sitting behind CG-NAT as the dialer (can't hole-punch through to CG-NAT unfortunately) and a separate machine in the same city but completely different network and it worked!

Let's go!

@mergify mergify bot merged commit cf3e4c6 into libp2p:master Jun 13, 2023
63 checks passed
.with("0.0.0.0".parse::<Ipv4Addr>().unwrap().into())
.with(Protocol::Tcp(0)),
)
.listen_on("/ip4/0.0.0.0/udp/0/quic-v1".parse().unwrap())
Contributor

Which means that there is no way to perform hole punching in the example because eligible_listener checks is_loopback.

Contributor

I used the example for testing this in two real networks. If you run this example on localhost, you are not really testing hole punching but just that the protocol is executed. On localhost, the connection should always succeed because there is no need for the random UDP packets to punch a hole. Am I wrong?
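For readers following this exchange, one plausible shape for such a loopback check is sketched below. This is purely illustrative and is not the code in transports/quic/src/transport.rs: the `Listener` struct and `is_eligible` function are hypothetical, and the rule (same IP family, and a loopback remote requires a loopback listener) is an assumption made for the example.

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical, simplified view of a listener: the addresses it listens on.
pub struct Listener {
    pub addresses: Vec<IpAddr>,
}

/// Could `listener` be used as the source socket for packets to `remote`?
pub fn is_eligible(listener: &Listener, remote: &SocketAddr) -> bool {
    // The listener must have an address of the same IP family as the remote.
    let same_family = listener
        .addresses
        .iter()
        .any(|a| a.is_ipv4() == remote.ip().is_ipv4());
    if !same_family {
        return false;
    }
    // A loopback remote is only reachable from a loopback listener.
    if remote.ip().is_loopback() {
        listener.addresses.iter().any(|a| a.is_loopback())
    } else {
        true
    }
}
```

Under this assumed rule, a listener that only reports `0.0.0.0` would be rejected for a `127.0.0.1` remote but accepted for a public IPv4 remote, which matches the point that a localhost run does not exercise hole punching.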

@mxinden
Member

mxinden commented Jun 19, 2023

However, those random UDP packets may be dropped by a firewall which performs deep packet inspection, preventing the hole punch from succeeding.

Note @stormshield-pj50 that most of a QUIC UDP packet is encrypted, thus not much for deep-packet-inspection to analyze. Though in case DPI is ever a problem we could be smarter around faking the non-encrypted parts.

@mxinden
Member

mxinden commented Jun 19, 2023

Very happy to see this merged.

Thank you @arpankapoor for the proactive work here.

Thank you @thomaseizinger and @kpp for the continuous reviews and testing.
