Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transports/tcp/: Call take_error on TCP socket #2458

Merged
merged 5 commits into from
Feb 1, 2022

Conversation

mxinden
Copy link
Member

@mxinden mxinden commented Jan 29, 2022

Within Provider::new_stream we wait for the socket to become writable
(stream.writable), before returning it as a stream. In other words, we
are waiting for the socket to connect before returning it as a new TCP
connection. Waiting to connect before returning it as a new TCP
connection allows us to catch TCP connection establishment errors early.

While stream.writable drives the process of connecting, it does not
surface potential connection errors themselves. These need to be
explicitly collected via TcpSocket::take_error. If not explicitly
collected, they will surface on future operations on the socket.

For now this commit explicitly calls TcpSocket::take_error when using
async-io only. tokio introduced the method (take_error) in
tokio-rs/tokio#4364 though later reverted it in
tokio-rs/tokio#4392. Once re-reverted, the same
patch can be applied when using libp2p-tcp with tokio.


One example on how this bug surfaces today:

A /dnsaddr/xxx Multiaddr can potentially resolve to multiple IP
addresses, e.g. to the IPv4 and the IPv6 addresses of a node.
libp2p-dns tries dialing each of them in sequence using libp2p-tcp,
returning the first that libp2p-tcp reports as successful.

Say that the local node tries the IPv6 address first. In the scenario
where the local node's networking stack does not support IPv6, e.g. has
no IPv6 route, the connection attempt to the resolved IPv6 address of
the remote node fails. Given that libp2p-tcp does not call
TcpSocket::take_error, it would falsly report the TCP connection
attempt as successful. libp2p-dns would receive the "successful" TCP
connection for the IPv6 address from libp2p-tcp and would not attempt
to dial the IPv4 address, even though it supports IPv4, and instead
bubble up the "successful" IPv6 TCP connection. Only later, when writing
or reading from the "successful" IPv6 TCP connection, would the IPv6
error surface.


Thanks @dignifiedquire for the initial bug report as well as the many debugging infos.

Within `Provider::new_stream` we wait for the socket to become writable
(`stream.writable`), before returning it as a stream. In other words, we
are waiting for the socket to connect before returning it as a new TCP
connection.  Waiting to connect before returning it as a new TCP
connection allows us to catch TCP connection establishment errors early.

While `stream.writable` drives the process of connecting, it does not
surface potential connection errors themselves. These need to be
explicitly collected via `TcpSocket::take_error`. If not explicitly
collected, they will surface on future operations on the socket.

For now this commit explicitly calls `TcpSocket::take_error` when using
`async-io` only. `tokio` introduced the method (`take_error`) in
tokio-rs/tokio#4364 though later reverted it in
tokio-rs/tokio#4392. Once re-reverted, the same
patch can be applied when using `libp2p-tcp` with tokio.

---

One example on how this bug surfaces today:

A `/dnsaddr/xxx` `Multiaddr` can potentially resolve to multiple IP
addresses, e.g. to the IPv4 and the IPv6 addresses of a node.
`libp2p-dns` tries dialing each of them in sequence using `libp2p-tcp`,
returning the first that `libp2p-tcp` reports as successful.

Say that the local node tries the IPv6 address first. In the scenario
where the local node's networking stack does not support IPv6, e.g. has
no IPv6 route, the connection attempt to the resolved IPv6 address of
the remote node fails. Given that `libp2p-tcp` does not call
`TcpSocket::take_error`, it would falsly report the TCP connection
attempt as successful. `libp2p-dns` would receive the "successful" TCP
connection for the IPv6 address from `libp2p-tcp` and would not attempt
to dial the IPv4 address, even though it supports IPv4, and instead
bubble up the "successful" IPv6 TCP connection. Only later, when writing
or reading from the "successful" IPv6 TCP connection, would the IPv6
error surface.
@rkuhn
Copy link
Contributor

rkuhn commented Jan 31, 2022

Looks good to me! Do I understand correctly that a socket becomes writable when failing? (the man-page doesn’t explicitly say so, but it would be consistent with “write would not block”)

@mxinden
Copy link
Member Author

mxinden commented Jan 31, 2022

Do I understand correctly that a socket becomes writable when failing?

That is my understanding.

@mxinden
Copy link
Member Author

mxinden commented Jan 31, 2022

@dignifiedquire could you test whether this patch fixes your issue?

Either via libp2p-lookup:

# Inside the libpp2 lookup repo.
git checkout mxinden async-io-take-error # points at mxinden/rust-libp2p:async-io-take-error
cargo run -- direct --address /dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN

Or via rust-libp2p:

# Inside the rust-libp2p repo.
git checkout mxinden async-io-take-error
cargo run --example ipfs-kad

transports/tcp/CHANGELOG.md Outdated Show resolved Hide resolved
@dignifiedquire
Copy link
Member

@mxinden unfortunately the issue is still not fixed

@mxinden
Copy link
Member Author

mxinden commented Feb 1, 2022

@mxinden unfortunately the issue is still not fixed

Discussed further out of band.

I will move forward here, given that it at least solves parts of the core issue. Still more debugging to come.

Will also create a reminder to do the same change on the tokio side once take_error is re-introduced.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants