Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(quic): fix address translation #4896

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nazar-pc
Copy link
Contributor

Description

This fixes address translation for QUIC that was essentially non-existent before.

Notes & open questions

Test is analogous to corresponding test in TCP protocol implementation. I have added test for both async-std and tokio, even though compiling crate with just tokio feature enabled causes a lot of compilation warnings.

What I'm not sure about is whether old behavior is intentional, but to be it seemed like a bug that caused major issues.

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I wonder why it was implemented like this ..

Maybe @mxinden remembers.

Comment on lines +273 to +277
let mut iter = translated.iter();
assert_eq!(iter.next(), Some(Protocol::Ip4(observed_ip)));
assert_eq!(iter.next(), Some(Protocol::Udp(port)));
assert_eq!(iter.next(), Some(Protocol::QuicV1));
assert_eq!(iter.next(), None);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be cleaner to use assert_eq against a constructed address?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is translated 1:1 from TCP test, I can update it, but then TCP should probably be updated too (and maybe others?).

.with(Protocol::Ip4(observed_ip))
.with(Protocol::Udp(1))
.with(Protocol::QuicV1)
.with(Protocol::P2p(PeerId::random()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this one important?

Comment on lines +259 to +262
let quic_listen_addr = Multiaddr::empty()
.with(Protocol::Ip4(Ipv4Addr::new(127, 0, 0, 1)))
.with(Protocol::Udp(port))
.with(Protocol::QuicV1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a helper function that takes in an Ipv4Addr and a port would make this test a bit more concise.

Comment on lines -256 to +257
Some(observed.clone())
address_translation(listen, observed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what issue this is addressing.

The libp2p_core::address_translation function takes the port of listen and the IP of observed. This is relevant in the case of TCP without port-reuse where one does not listen on the port of an outgoing connection, i.e. here were one does not listen of the port of observed.

The relevant section of the address_translation doc comment:

/// This function can for example be useful when handling tcp connections. Tcp does not listen and
/// dial on the same port by default. Thus when receiving an observed address on a connection that
/// we initiated, it will contain our dialing port, not our listening port. We need to take the ip
/// address or dns address from the observed address and the port from the original address.

I argue that this is not relevant for QUIC, given that QUIC will use the same UDP port for both outgoing and incoming connections. The port of observed is either equal to the port of listen, or, if the local node is behind a NAT, equal to the public NAT port mapping of listen. In the former case address_translation is a no-op. In the latter case address_translation returns the wrong port, namely LAN port, not the public Internet port.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that with the upcoming redesign of #4568 we can get rid of Transport::address_translation. Whether a connection has been established using port-reuse or not will be visible to a NetworkBehaviour. Thus e.g. libp2p-identify receiving an observed address can judge itself, whether it needs to alter any ports.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also see #2289 (comment) for past discussion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what issue this is addressing.

This addresses the problem of ephemeral ports with QUIC. At least when behind NAT with port forwarding, without address translation you'll end up with invalid external port. The more annoying thing is that it will be valid for the duration of Autonat probing, but when later anyone else tries to reach the node, the port is no longer functional.

The port of observed is either equal to the port of listen, or, if the local node is behind a NAT, equal to the public NAT port mapping of listen.

This is most certainly not true in practice, at least in some cases. Even on my machine with pfSense acting as a router, I have seen random ports all over the place observed by remote nodes.

In the former case address_translation is a no-op.

Yes

In the latter case address_translation returns the wrong port, namely LAN port, not the public Internet port.

No, in this case address translation will result in correct port. I guess it depends on how to do port forwarding, it should be possible for router to map ports both ways, but I can't imagine many users actually doing it, it seems that external ports are mapped onto LAN ports, but probably rarely outgoing LAN ports are mapped onto specific WAN ports. This is my educated guess based on observation, I'm not claiming some consumer routers are not doing that.

Note that with the upcoming redesign of #4568 we can get rid of Transport::address_translation. Whether a connection has been established using port-reuse or not will be visible to a NetworkBehaviour. Thus e.g. libp2p-identify receiving an observed address can judge itself, whether it needs to alter any ports.

Interesting. Well, then this will need to be adjusted when that happens, but in the meantime network needs to function and lack of address translation was a big issue for us since many nodes were not able to discover their public addresses with QUIC (that we use almost exclusively in place of TCP now).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also see #2289 (comment) for past discussion.

This means the current implementation is still wrong though as it will happily return an observed TCP address as the translated address. If QUIC is "before" TCP in the transport stack, it would return a wrong address, wouldn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See lines 252-256 above, the protocol is checked there and returns None if it doesn't match. Same is done in TCP and it works properly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See lines 252-256 above, the protocol is checked there and returns None if it doesn't match. Same is done in TCP and it works properly.

Ah you are right, I missed that because the diff didn't show it 🙄

I don't really understand why we need to change anything here then. QUIC doesn't have a concept of ephemeral ports so address_translation can't do anything useful. If you still observe changing ports with QUIC, it might be that you are sitting behind a symmetric NAT. We can't do anything against that I am afraid.

The more annoying thing is that it will be valid for the duration of Autonat probing, but when later anyone else tries to reach the node, the port is no longer functional.

We are aware of this issue. Unfortunately, it is a design flaw in AutoNAT and one of the things that prompted the design of AutoNATv2. In v2, the server uses a different outgoing port to perform the dial-back, meaning we don't accidentally "hole-punch" through the NAT and get a more trustworthy result that the address is actually reachable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you still observe changing ports with QUIC, it might be that you are sitting behind a symmetric NAT. We can't do anything against that I am afraid.

But I still might have port forwarding configured such that the port matching my listening port is still reachable from the outside. This is what we recommend our users to do when they are behind NAT and it seems to work. It doesn't cover the case when forwarded port doesn't match listening port, in that case user has to specify external address explicitly, but it is still useful to do address translation, as mentioned above, in worst case it is no-op.

@mxinden
Copy link
Member

mxinden commented Nov 21, 2023

As discussed in the open maintainers call today, @nazar-pc can you research whether you are behind an endpoint indepedent or dependent NAT?

@nazar-pc
Copy link
Contributor Author

I get unique random outgoing port for every localip:localport:remoteip:remoteport combination, here are the states after doing running stun client test against stun.xten.com:

LAN 	udp 	192.168.1.2:30535 -> 216.93.246.18:3478
WAN 	udp 	a.b.c.d:54496 (192.168.1.2:30535) -> 216.93.246.18:3478
LAN 	udp 	192.168.1.2:30536 -> 216.93.246.18:3478
WAN 	udp 	a.b.c.d:24189 (192.168.1.2:30536)
LAN 	udp 	192.168.1.2:30535 -> 216.93.246.15:3478
WAN 	udp 	a.b.c.d:10373 (192.168.1.2:30535) -> 216.93.246.15:3478

So even if my app is making request from port 30535, the actual request will go from a random port. If I make request from that port to a different host, the port will also be different.

The only reason I'm reachable is because I have port forwarding from a.b.c.d:30535 to 192.168.1.2:30535.

I see autonat doesn't actually care about the address, it will replace all received addresses with an observed one anyway and then will try to dial it. So response in autonat doesn't actually corresponds to request directly. What I don't like about this is that autonat server is essentially doing something similar to address translation, but it should still work.

The problem is that because autonat client concatenates observed addresses with listen addresses rather than the other way around, it will in many cases confirm ephemeral external address instead of actual address with QUIC on my machine. I think eventually it will be able to find public address, but certainly not immediately.

Another concern is that all local listen addresses are exposed when probing, which is somewhat revealing when running on the host and not in an isolated container.

@thomaseizinger
Copy link
Contributor

So even if my app is making request from port 30535, the actual request will go from a random port. If I make request from that port to a different host, the port will also be different.

That sounds like a symmetric NAT to me. What is odd is that your NAT device doesn't seem to take into account the configured port-forwarding. Whilst not necessary for things to function, it would seem logical to apply the port-forwarding in a reverse fashion for outgoing connections to allow peers to use the observed address.

In this case, address translation would be helpful because you've done a direct mapping of your port-forwarding. But, we don't know that in the application so not doing address translation is as good of a guess as doing it ...

Once we ship #4568, we should be able to improve things here. In that PR, we know whether we reused a port or not for a new connection. This means, libp2p-identify can stop emitting candidates for ephemeral ports altogether.

As a result, we can remove the current address_translation functionality entirely and instead build out some functionality in libp2p-swarm that generates more candidates based on the current listen addresses and candidates emitted by other protocols. Perhaps this functionality should also be in libp2p-identify as it is the one that knows that the candidate was generated via the remote observing our connection.

Copy link

mergify bot commented Dec 1, 2023

This pull request has merge conflicts. Could you please resolve them @nazar-pc? 🙏

@nazar-pc
Copy link
Contributor Author

So looks like you're saying that address translation will be moved to identify at some point in the future? I guess that makes sense, but I don't see why this PR shouldn't be merged in the meantime.

I think it is pretty clear that there is an issue with QUIC right now and the fact that it works with Autonat v1 is a coincidence because Autonat v1 essentially also does address translation on its end, replacing IP in listen address with public IP, resulting in reachable address overall.

On the surface it also looks like with Autonat v2 QUIC will break if no other changes are done because server-side address translation will not happen anymore (if my understanding is correct).

@thomaseizinger
Copy link
Contributor

So looks like you're saying that address translation will be moved to identify at some point in the future?

Yes that is the plan. Identify is the protocol that gathers the observed address, meaning it is its job to filter out ephemeral ports. With the changes of #4568, we know whether an outgoing connection used a new port or the existing listen port. In the case of a new one, we can just discard the observed address because we know that it is garbage and will never be a valid candidate for an external address. Thus, there is no more need to do address translation.

I think it is pretty clear that there is an issue with QUIC right now

It is not really clear to me. Are the nodes that you are observing this on listening for incoming connections or are they only making outgoing connections? If you don't have a listening socket then we will make one if a random port. Perhaps that is the issue you are observing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants