Snowflake simulation fails to dial OR port #3278

Open
cohosh opened this issue Jan 10, 2024 · 8 comments · May be fixed by #3279
Labels
Type: Bug Error or flaw producing unexpected results

Comments

@cohosh
Contributor

cohosh commented Jan 10, 2024

Describe the issue
The Snowflake server failed to dial the OR port, with the following log message:

2000/01/01 00:02:01 handleConn: failed to connect to ORPort: dial tcp 127.0.0.1:8080: protocol not available

A look at the shadow logs suggests the following warnings might be related (they occur the same number of times as the OR dialing failures):

00:00:00.418855 [203349:shadow-worker] 00:02:01.902394010 [WARN] [snowflakeserver:11.0.0.5] [legacy_tcp.rs:1249] [shadow_rs::host::descriptor::socket::inet::legacy_tcp] setsockopt called with unsupported level 0 and opt 24

Looking at my system's sys/socket.h file (and related files), I've tracked down the level as SOL_IP. I had a harder time tracking down the option but eventually found it:

asm-generic/socket.h
#define SO_SECURITY_ENCRYPTION_NETWORK		24

Here are the full Shadow logs (at log level info):
shadow.log

This was working for me before, so I'm guessing it's something to do with the newest version of Go.

To Reproduce
Run the minimal snowflake shadow experiment on Debian unstable: https://github.com/cohosh/shadow-snowflake-minimal

Note: this uses Snowflake without Tor

Operating System (please complete the following information):

  • OS and version: Debian GNU/Linux trixie/sid
  • Kernel version:
    Linux 6.6.9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.6.9-1 (2024-01-01) x86_64 GNU/Linux
  • Go version: go1.21.5

Shadow (please complete the following information):

  • Version: v3.1.0
  • Which processes you are trying to run inside the Shadow simulation:
    snowflake, tgen

Additional context

@cohosh cohosh added the Type: Bug Error or flaw producing unexpected results label Jan 10, 2024
@cohosh
Contributor Author

cohosh commented Jan 10, 2024

Ah, I think this issue is due to a recent change in Snowflake: https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/commit/9edaee65470a1483bbdbe984e5e15a885f1e95d2

I'm going to take a closer look at those changes, and whether we even need more support for this.

@stevenengler
Contributor

stevenengler commented Jan 10, 2024

There seems to be nothing about SO_SECURITY_ENCRYPTION_NETWORK anywhere on the Internet. The only reference in the Linux kernel seems to be:

/* Security levels - as per NRL IPv6 - don't actually do anything */
#define SO_SECURITY_AUTHENTICATION		22
#define SO_SECURITY_ENCRYPTION_TRANSPORT	23
#define SO_SECURITY_ENCRYPTION_NETWORK		24

But I think this SO_SECURITY_ENCRYPTION_NETWORK is for the SOL_SOCKET level, not SOL_IP. For SOL_IP, option 24 corresponds to IP_BIND_ADDRESS_NO_PORT which is an option for deferring the port choice until the connect() call. (You bind() with a port of 0, and instead of Linux choosing an unused port immediately, it waits and chooses one when you call connect() later.)

Edit: Oops, a few seconds too late :)
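To make the mix-up concrete: the same option number means different things at different levels. Here is a minimal Go sketch of that lookup, covering only the two options discussed in this thread. `sockoptName` is a hypothetical helper (it is not part of Shadow or Go), and the level values assume Linux, where SOL_IP is 0 and SOL_SOCKET is 1.

```go
package main

import "fmt"

// Option levels as defined on Linux.
const (
	solIP     = 0 // SOL_IP
	solSocket = 1 // SOL_SOCKET
)

// sockoptName is a hypothetical helper for illustration only. It maps a
// (level, optname) pair to a symbolic name for the two options discussed
// in this thread and reports anything else as unknown.
func sockoptName(level, opt int) string {
	switch {
	case level == solIP && opt == 24:
		return "SOL_IP/IP_BIND_ADDRESS_NO_PORT"
	case level == solSocket && opt == 24:
		return "SOL_SOCKET/SO_SECURITY_ENCRYPTION_NETWORK"
	default:
		return fmt.Sprintf("level %d/opt %d (not in this table)", level, opt)
	}
}

func main() {
	// The pair from the Shadow warning: level 0, opt 24.
	fmt.Println(sockoptName(0, 24))
	// The same number at the socket level is a different option.
	fmt.Println(sockoptName(1, 24))
}
```

So a warning about "level 0 and opt 24" has to be read against the SOL_IP option list, not the SOL_SOCKET one.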

@cohosh
Contributor Author

cohosh commented Jan 10, 2024

You're right, it is IP_BIND_ADDRESS_NO_PORT. I just saw this in the commit linked above:

sockErr = syscall.SetsockoptInt(int(fd), unix.SOL_IP, unix.IP_BIND_ADDRESS_NO_PORT, 1)

@stevenengler
Contributor

stevenengler commented Jan 10, 2024

> Ah, I think this issue is due to a recent change in Snowflake

Thanks for finding that link. The link also says:

> tor does bind-before-connect when the OutboundBindAddress option is set in torrc. Since version 0.4.7.13 (January 2023), tor sets IP_BIND_ADDRESS_NO_PORT unconditionally on platforms that support it, and therefore we must do the same, to avoid EADDRNOTAVAIL errors.

which means support for OutboundBindAddress in tor now requires support for IP_BIND_ADDRESS_NO_PORT. I don't think tornettools uses OutboundBindAddress anywhere. Do you know if your snowflake sims need support for OutboundBindAddress?

> But problems arise if there are multiple processes doing bind-before-connect, and some of them use IP_BIND_ADDRESS_NO_PORT and some of them do not. When there is a mix, the ones that do will have their ephemeral ports reserved by the ones that do not, leading to EADDRNOTAVAIL errors.

I'm not super clear about this. It sounds like it's just saying that if snowflake doesn't use IP_BIND_ADDRESS_NO_PORT, then it might use all ephemeral ports before tor (which does use IP_BIND_ADDRESS_NO_PORT) has a chance to call connect() and obtain a port?

Supporting IP_BIND_ADDRESS_NO_PORT in Shadow might be a bit of a pain. If using it could be avoided in tor and snowflake, I think that would be ideal. But I think it's something Shadow could support if it needs to. An easy workaround in Shadow could be to just ignore the option and assign an ephemeral port immediately like usual, which should be roughly the same as long as you don't need more than maybe a few thousand ephemeral ports.

@cohosh
Contributor Author

cohosh commented Jan 10, 2024

> Do you know if your snowflake sims need support for OutboundBindAddress?

They shouldn't, no. And I don't think we're going to have trouble with running out of ports for the size of the simulations we're doing. I kind of suspect we don't actually need this socket option to do what it's supposed to do, we might just need it to not return an error.

I'm going to poke at it a bit and see if I can find out where this protocol not available error is coming from exactly.

@stevenengler
Contributor

> I'm going to poke at it a bit and see if I can find out where this protocol not available error is coming from exactly.

On the Shadow side, it's probably coming from ENOPROTOOPT returned by the TCP socket at:

_ => {
    log::warn!("setsockopt called with unsupported level {level} and opt {optname}");
    return Err(Errno::ENOPROTOOPT.into());
}

ENOPROTOOPT maps to the glibc message "Protocol not available". But I have no idea about what happens on the Go side when a syscall.SetsockoptInt returns ENOPROTOOPT.

@cohosh cohosh linked a pull request Jan 11, 2024 that will close this issue
@cohosh
Contributor Author

cohosh commented Jan 11, 2024

Sure enough, just preventing the error makes everything run without issue. I left in a comment about implementing it as a TODO but honestly I don't think we'll ever need it.

@cohosh
Contributor Author

cohosh commented Jan 26, 2024

> Do you know if your snowflake sims need support for OutboundBindAddress?
>
> They shouldn't, no. And I don't think we're going to have trouble with running out of ports for the size of the simulations we're doing. I kind of suspect we don't actually need this socket option to do what it's supposed to do, we might just need it to not return an error.

I might have been mistaken about this. I don't think the problem comes from running out of ports, but rather that the process that uses IP_BIND_ADDRESS_NO_PORT will try to use ephemeral ports that it (mistakenly) thinks it has, only to find that another process has started using them. But since all processes are run in Shadow, they will all be on equal footing w.r.t. whether this socket option is supported, so this problem won't arise.
