fix(quic): fix address translation #4896

nazar-pc · 2023-11-19T09:39:29Z

Description

This fixes address translation for QUIC that was essentially non-existent before.

Notes & open questions

Test is analogous to corresponding test in TCP protocol implementation. I have added test for both async-std and tokio, even though compiling crate with just tokio feature enabled causes a lot of compilation warnings.

What I'm not sure about is whether old behavior is intentional, but to be it seemed like a bug that caused major issues.

Change checklist

I have performed a self-review of my own code
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
A changelog entry has been made in the appropriate crates

thomaseizinger

Thanks! I wonder why it was implemented like this ..

Maybe @mxinden remembers.

thomaseizinger · 2023-11-19T23:29:48Z

transports/quic/tests/smoke.rs

+    let mut iter = translated.iter();
+    assert_eq!(iter.next(), Some(Protocol::Ip4(observed_ip)));
+    assert_eq!(iter.next(), Some(Protocol::Udp(port)));
+    assert_eq!(iter.next(), Some(Protocol::QuicV1));
+    assert_eq!(iter.next(), None);


Wouldn't it be cleaner to use assert_eq against a constructed address?

This test is translated 1:1 from TCP test, I can update it, but then TCP should probably be updated too (and maybe others?).

thomaseizinger · 2023-11-19T23:30:07Z

transports/quic/tests/smoke.rs

+        .with(Protocol::Ip4(observed_ip))
+        .with(Protocol::Udp(1))
+        .with(Protocol::QuicV1)
+        .with(Protocol::P2p(PeerId::random()));


Is this one important?

thomaseizinger · 2023-11-19T23:30:27Z

transports/quic/tests/smoke.rs

+    let quic_listen_addr = Multiaddr::empty()
+        .with(Protocol::Ip4(Ipv4Addr::new(127, 0, 0, 1)))
+        .with(Protocol::Udp(port))
+        .with(Protocol::QuicV1);


Perhaps a helper function that takes in an Ipv4Addr and a port would make this test a bit more concise.

mxinden · 2023-11-20T18:57:37Z

transports/quic/src/transport.rs

-        Some(observed.clone())
+        address_translation(listen, observed)


I am not sure what issue this is addressing.

The libp2p_core::address_translation function takes the port of listen and the IP of observed. This is relevant in the case of TCP without port-reuse where one does not listen on the port of an outgoing connection, i.e. here were one does not listen of the port of observed.

The relevant section of the address_translation doc comment:

rust-libp2p/core/src/translation.rs

Lines 29 to 32 in 338e467

/// This function can for example be useful when handling tcp connections. Tcp does not listen and

/// dial on the same port by default. Thus when receiving an observed address on a connection that

/// we initiated, it will contain our dialing port, not our listening port. We need to take the ip

/// address or dns address from the observed address and the port from the original address.

I argue that this is not relevant for QUIC, given that QUIC will use the same UDP port for both outgoing and incoming connections. The port of observed is either equal to the port of listen, or, if the local node is behind a NAT, equal to the public NAT port mapping of listen. In the former case address_translation is a no-op. In the latter case address_translation returns the wrong port, namely LAN port, not the public Internet port.

Note that with the upcoming redesign of #4568 we can get rid of Transport::address_translation. Whether a connection has been established using port-reuse or not will be visible to a NetworkBehaviour. Thus e.g. libp2p-identify receiving an observed address can judge itself, whether it needs to alter any ports.

Also see #2289 (comment) for past discussion.

I am not sure what issue this is addressing.

This addresses the problem of ephemeral ports with QUIC. At least when behind NAT with port forwarding, without address translation you'll end up with invalid external port. The more annoying thing is that it will be valid for the duration of Autonat probing, but when later anyone else tries to reach the node, the port is no longer functional.

The port of observed is either equal to the port of listen, or, if the local node is behind a NAT, equal to the public NAT port mapping of listen.

This is most certainly not true in practice, at least in some cases. Even on my machine with pfSense acting as a router, I have seen random ports all over the place observed by remote nodes.

In the former case address_translation is a no-op.

Yes

In the latter case address_translation returns the wrong port, namely LAN port, not the public Internet port.

No, in this case address translation will result in correct port. I guess it depends on how to do port forwarding, it should be possible for router to map ports both ways, but I can't imagine many users actually doing it, it seems that external ports are mapped onto LAN ports, but probably rarely outgoing LAN ports are mapped onto specific WAN ports. This is my educated guess based on observation, I'm not claiming some consumer routers are not doing that.

Note that with the upcoming redesign of #4568 we can get rid of Transport::address_translation. Whether a connection has been established using port-reuse or not will be visible to a NetworkBehaviour. Thus e.g. libp2p-identify receiving an observed address can judge itself, whether it needs to alter any ports.

Interesting. Well, then this will need to be adjusted when that happens, but in the meantime network needs to function and lack of address translation was a big issue for us since many nodes were not able to discover their public addresses with QUIC (that we use almost exclusively in place of TCP now).

Also see #2289 (comment) for past discussion.

This means the current implementation is still wrong though as it will happily return an observed TCP address as the translated address. If QUIC is "before" TCP in the transport stack, it would return a wrong address, wouldn't it?

See lines 252-256 above, the protocol is checked there and returns None if it doesn't match. Same is done in TCP and it works properly.

See lines 252-256 above, the protocol is checked there and returns None if it doesn't match. Same is done in TCP and it works properly.

Ah you are right, I missed that because the diff didn't show it 🙄

I don't really understand why we need to change anything here then. QUIC doesn't have a concept of ephemeral ports so address_translation can't do anything useful. If you still observe changing ports with QUIC, it might be that you are sitting behind a symmetric NAT. We can't do anything against that I am afraid.

The more annoying thing is that it will be valid for the duration of Autonat probing, but when later anyone else tries to reach the node, the port is no longer functional.

We are aware of this issue. Unfortunately, it is a design flaw in AutoNAT and one of the things that prompted the design of AutoNATv2. In v2, the server uses a different outgoing port to perform the dial-back, meaning we don't accidentally "hole-punch" through the NAT and get a more trustworthy result that the address is actually reachable.

If you still observe changing ports with QUIC, it might be that you are sitting behind a symmetric NAT. We can't do anything against that I am afraid.

But I still might have port forwarding configured such that the port matching my listening port is still reachable from the outside. This is what we recommend our users to do when they are behind NAT and it seems to work. It doesn't cover the case when forwarded port doesn't match listening port, in that case user has to specify external address explicitly, but it is still useful to do address translation, as mentioned above, in worst case it is no-op.

mxinden · 2023-11-21T20:19:39Z

As discussed in the open maintainers call today, @nazar-pc can you research whether you are behind an endpoint indepedent or dependent NAT?

nazar-pc · 2023-11-21T22:50:59Z

I get unique random outgoing port for every localip:localport:remoteip:remoteport combination, here are the states after doing running stun client test against stun.xten.com:

LAN 	udp 	192.168.1.2:30535 -> 216.93.246.18:3478
WAN 	udp 	a.b.c.d:54496 (192.168.1.2:30535) -> 216.93.246.18:3478
LAN 	udp 	192.168.1.2:30536 -> 216.93.246.18:3478
WAN 	udp 	a.b.c.d:24189 (192.168.1.2:30536)
LAN 	udp 	192.168.1.2:30535 -> 216.93.246.15:3478
WAN 	udp 	a.b.c.d:10373 (192.168.1.2:30535) -> 216.93.246.15:3478

So even if my app is making request from port 30535, the actual request will go from a random port. If I make request from that port to a different host, the port will also be different.

The only reason I'm reachable is because I have port forwarding from a.b.c.d:30535 to 192.168.1.2:30535.

I see autonat doesn't actually care about the address, it will replace all received addresses with an observed one anyway and then will try to dial it. So response in autonat doesn't actually corresponds to request directly. What I don't like about this is that autonat server is essentially doing something similar to address translation, but it should still work.

The problem is that because autonat client concatenates observed addresses with listen addresses rather than the other way around, it will in many cases confirm ephemeral external address instead of actual address with QUIC on my machine. I think eventually it will be able to find public address, but certainly not immediately.

Another concern is that all local listen addresses are exposed when probing, which is somewhat revealing when running on the host and not in an isolated container.

thomaseizinger · 2023-11-22T03:06:46Z

So even if my app is making request from port 30535, the actual request will go from a random port. If I make request from that port to a different host, the port will also be different.

That sounds like a symmetric NAT to me. What is odd is that your NAT device doesn't seem to take into account the configured port-forwarding. Whilst not necessary for things to function, it would seem logical to apply the port-forwarding in a reverse fashion for outgoing connections to allow peers to use the observed address.

In this case, address translation would be helpful because you've done a direct mapping of your port-forwarding. But, we don't know that in the application so not doing address translation is as good of a guess as doing it ...

Once we ship #4568, we should be able to improve things here. In that PR, we know whether we reused a port or not for a new connection. This means, libp2p-identify can stop emitting candidates for ephemeral ports altogether.

As a result, we can remove the current address_translation functionality entirely and instead build out some functionality in libp2p-swarm that generates more candidates based on the current listen addresses and candidates emitted by other protocols. Perhaps this functionality should also be in libp2p-identify as it is the one that knows that the candidate was generated via the remote observing our connection.

mergify · 2023-12-01T21:40:41Z

This pull request has merge conflicts. Could you please resolve them @nazar-pc? 🙏

nazar-pc · 2023-12-10T01:14:19Z

So looks like you're saying that address translation will be moved to identify at some point in the future? I guess that makes sense, but I don't see why this PR shouldn't be merged in the meantime.

I think it is pretty clear that there is an issue with QUIC right now and the fact that it works with Autonat v1 is a coincidence because Autonat v1 essentially also does address translation on its end, replacing IP in listen address with public IP, resulting in reachable address overall.

On the surface it also looks like with Autonat v2 QUIC will break if no other changes are done because server-side address translation will not happen anymore (if my understanding is correct).

thomaseizinger · 2024-01-15T02:02:00Z

So looks like you're saying that address translation will be moved to identify at some point in the future?

Yes that is the plan. Identify is the protocol that gathers the observed address, meaning it is its job to filter out ephemeral ports. With the changes of #4568, we know whether an outgoing connection used a new port or the existing listen port. In the case of a new one, we can just discard the observed address because we know that it is garbage and will never be a valid candidate for an external address. Thus, there is no more need to do address translation.

I think it is pretty clear that there is an issue with QUIC right now

It is not really clear to me. Are the nodes that you are observing this on listening for incoming connections or are they only making outgoing connections? If you don't have a listening socket then we will make one if a random port. Perhaps that is the issue you are observing?

nazar-pc force-pushed the fix-quic-address-translation branch from d6339da to 9b6ca0e Compare November 19, 2023 09:49

fix(quic): fix address translation

ae5fe62

nazar-pc force-pushed the fix-quic-address-translation branch from 9b6ca0e to ae5fe62 Compare November 19, 2023 09:51

nazar-pc mentioned this pull request Nov 19, 2023

QUIC address translation subspace/subspace#2245

Merged

1 task

thomaseizinger reviewed Nov 19, 2023

View reviewed changes

mxinden reviewed Nov 20, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(quic): fix address translation #4896

fix(quic): fix address translation #4896

nazar-pc commented Nov 19, 2023

thomaseizinger left a comment

thomaseizinger Nov 19, 2023

nazar-pc Nov 20, 2023

thomaseizinger Nov 19, 2023

thomaseizinger Nov 19, 2023

mxinden Nov 20, 2023

mxinden Nov 20, 2023

mxinden Nov 20, 2023

nazar-pc Nov 20, 2023

thomaseizinger Nov 20, 2023

nazar-pc Nov 20, 2023

thomaseizinger Nov 21, 2023

nazar-pc Nov 21, 2023

mxinden commented Nov 21, 2023

nazar-pc commented Nov 21, 2023

thomaseizinger commented Nov 22, 2023

mergify bot commented Dec 1, 2023

nazar-pc commented Dec 10, 2023

thomaseizinger commented Jan 15, 2024

	/// This function can for example be useful when handling tcp connections. Tcp does not listen and
	/// dial on the same port by default. Thus when receiving an observed address on a connection that
	/// we initiated, it will contain our dialing port, not our listening port. We need to take the ip
	/// address or dns address from the observed address and the port from the original address.

fix(quic): fix address translation #4896

Are you sure you want to change the base?

fix(quic): fix address translation #4896

Conversation

nazar-pc commented Nov 19, 2023

Description

Notes & open questions

Change checklist

thomaseizinger left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mxinden commented Nov 21, 2023

nazar-pc commented Nov 21, 2023

thomaseizinger commented Nov 22, 2023

mergify bot commented Dec 1, 2023

nazar-pc commented Dec 10, 2023

thomaseizinger commented Jan 15, 2024