Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stuck in Retransmission #836

Open
hunhoffe opened this issue Aug 23, 2023 · 2 comments
Open

Stuck in Retransmission #836

hunhoffe opened this issue Aug 23, 2023 · 2 comments

Comments

@hunhoffe
Copy link
Contributor

Hello,

This is probably a user error but I'm unsure how, and I've been unable to replicate it in more minimal example/unit tests. I'm seeing that a TCP connection gets stuck in a state where one endpoint is reporting retransmission. Data stops flowing.

For example, in one endpoint, the smoltcp debug output from the entire connection looks something like this:

2595182934 [DEBUG] - smoltcp::iface::interface: address 172.31.0.20 not in neighbor cache, sending ARP request
22411633806 [DEBUG] - smoltcp::socket::tcp: sending sACK option with current assembler ranges
22414587508 [DEBUG] - smoltcp::socket::tcp: sending sACK option with current assembler ranges
22989792580 [DEBUG] - smoltcp::socket::tcp: sending sACK option with current assembler ranges
22992740658 [DEBUG] - smoltcp::socket::tcp: sending sACK option with current assembler ranges
23710606986 [DEBUG] - smoltcp::socket::tcp: sending sACK option with current assembler ranges
23713578044 [DEBUG] - smoltcp::socket::tcp: sending sACK option with current assembler ranges
24291406372 [DEBUG] - smoltcp::socket::tcp: sending sACK option with current assembler ranges
24297777220 [DEBUG] - smoltcp::socket::tcp: sending sACK option with current assembler ranges
34125075858 [DEBUG] - smoltcp::socket::tcp: sending sACK option with current assembler ranges
34198003734 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.035s
34205010890 [DEBUG] - smoltcp::socket::tcp: sending sACK option with current assembler ranges
40444139694 [DEBUG] - smoltcp::socket::tcp: sending sACK option with current assembler ranges
40512295590 [DEBUG] - smoltcp::socket::tcp: sending sACK option with current assembler ranges
40527663952 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.038s
46695298770 [DEBUG] - smoltcp::socket::tcp: sending sACK option with current assembler ranges
46763925288 [DEBUG] - smoltcp::socket::tcp: sending sACK option with current assembler ranges
46779286840 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.038s
47018390980 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: received duplicate ACK for seq 1631339880 (duplicate nr 1)
47055394618 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.025s
47105578428 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: reset duplicate ACK count
47118221322 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.028s
47166434916 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.025s
47194978896 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: segment not in receive window (33409..33445 not intersecting 33445..41637), will send challenge ACK
47199268658 [DEBUG] - smoltcp::socket::tcp: sending sACK option with current assembler ranges
< no more data is sent/received past this point >
47246045928 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.038s
47325664856 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.038s
47405272454 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.038s
47539355940 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.064s
47673440094 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.064s
47807524226 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.064s
48050556940 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.116s
48293588128 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.116s
48536619018 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.116s
48997531214 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.220s
49458447640 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.220s
49919367822 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.220s
50816062896 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.428s
51712750286 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.428s
52609444680 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.428s
54377700926 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.844s
56145945782 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.844s
57914214806 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+0.844s
61425532650 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+1.676s
64936884838 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+1.676s
68448240406 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+1.676s
75445794208 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+3.340s
82443357336 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+3.340s
89440915258 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+3.340s
103410897126 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+6.668s
117380877286 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+6.668s
131350855552 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+6.668s
152301637296 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.11:10110:172.31.0.12:10110: retransmitting at t+10.000s
< and it continues... >

The other endpoint reports something like this. It is stuck in a poll loop waiting to receive some data:

2856582960 [DEBUG] - smoltcp::iface::interface: address 172.31.0.11 not in neighbor cache, sending ARP request
17736485762 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.12:10110:172.31.0.11:10110: received duplicate ACK for seq 15897 (duplicate nr 1)
17808106524 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.12:10110:172.31.0.11:10110: retransmitting at t+0.037s
17811415352 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.12:10110:172.31.0.11:10110: reset duplicate ACK count
24055259082 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.12:10110:172.31.0.11:10110: received duplicate ACK for seq 24537 (duplicate nr 1)
24114295806 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.12:10110:172.31.0.11:10110: retransmitting at t+0.031s
24123349318 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.12:10110:172.31.0.11:10110: reset duplicate ACK count
30306398102 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.12:10110:172.31.0.11:10110: received duplicate ACK for seq 33069 (duplicate nr 1)
30368099134 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.12:10110:172.31.0.11:10110: retransmitting at t+0.032s
30374968356 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.12:10110:172.31.0.11:10110: reset duplicate ACK count
30710389408 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.12:10110:172.31.0.11:10110: retransmitting at t+0.042s
< no more data is send/received after this point >
30799851290 [DEBUG] - smoltcp::socket::tcp: tcp:172.31.0.12:10110:172.31.0.11:10110: retransmitting at t+0.043s

I've confirmed both endpoints are polling the interface regularly. I've confirmed that the clock I'm using - while it may return the same value - it never goes backwards. I am a bit concerned that it might be a timing error (I'm using the rawtime library to get the current time), but I'm not sure what other clock requirement I need to check for other than monotonicity. I am also using a custom device, which could perhaps be the issue.

I'm using smoltcp v0.8.0. I did go through some previous issues/PRs to see if someone else has reported something similar, and I saw some things about retransmissions but I think those issues were merged before the v0.8.0 release, if I am reading the release dates correctly. So I'm not sure if I should invest time in upgrading to a newer release.

Any insight would be appreciated for how to get to the root of this issue!

@Dirbaio
Copy link
Member

Dirbaio commented Aug 23, 2023

Check packet dumps of both sides, you'll hopefully see one side is transmitting packets that never get to the other side. Could be something wrong with the phy impl or something in the link. If you see only big packets getting dropped it could be a MTU issue, it's common for it to give trouble if you're tunneling/VPNing.

@korken89
Copy link
Contributor

Hi,

In our system we're seeing a similar issue where the TCP connection gets stuck when we send a lot of small packets.
It seems like a packet is somehow lost and smoltcp tries to re-transmit repeatedly.
The interesting part is that whenever smoltcp logs a re-transmit nothing is detected on the wire when looking with wireshark on the computer to which this TCP socket is connected, this is the only measurable effect we've been able to capture as of yet.

We're trying to create a reproducible test so we can make a proper bug report, but the end effect is that smoltcp seems to get stuck as here, until tokio's TCP stream hits timeout which then causes the sockets to be recreated and and the problem is alleviated until the next time it happens.

If it's this issue we are hitting or a separate one I'll let our investigation into the issue show.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants