No limit to exponential growth of PTO #3594

Closed
cliffc-spirent opened this issue Oct 14, 2022 · 11 comments · Fixed by #3595

Comments

@cliffc-spirent
Contributor

If I have a network outage lasting, for example, 1 minute, the PTO seems to grow to a similar size in that time due to exponential backoff, so when the outage clears, the connection still waits 1-2 minutes after the network recovers before resuming communication. I think this is because the code computes (1 << h.ptoCount) in sentPacketHandler's getPTOTimeAndSpace() and there is no limit on ptoCount.

Can there be a configured maximum backoff? Or am I looking in the wrong place? (Or is this precluded by the spec?)

@marten-seemann
Member

Would that be allowed by RFC 9002?

@davidfdzp

Aren't you considering QUIC for interplanetary links?

@marten-seemann
Member

> Aren't you considering QUIC for interplanetary links?

It would be foolish to run QUIC unmodified on links with significant propagation delays.

The behavior of the loss detection timer is specified in RFC 9002, specifically here: https://datatracker.ietf.org/doc/html/rfc9002#appendix-A.8. There's no limit to the exponential backoff.

@davidfdzp

> > Aren't you considering QUIC for interplanetary links?
>
> It would be foolish to run QUIC unmodified on links with significant propagation delays.

I'm wondering where the limit is. What counts as a significant propagation delay for QUIC? More than 1 s (the Moon), 30 s, 1 minute?

@marten-seemann
Member

That depends on the kind of application you're running, and how responsive you expect that application to be. As a rule of thumb, retransmitting a lost packet takes one network round-trip. At certain RTTs, you'd probably be willing to invest significant effort into making sure that you're not hit by that.

@cliffc-spirent
Contributor Author

It's unfortunate that the RFC doesn't specify a limit. Could I add a parameterized limit that defaults to forever? I would like my connections to resume soon after the network comes back and not up to 2x the duration of the outage.

@cliffc-spirent
Contributor Author

To support interplanetary links, the limit could apply to the number of doublings rather than to a specific time, since the original delay is based on the RTT. For example, it could be limited to 256 * the single-packet-loss delay.

@marten-seemann
Member

I'd be ok with limiting it to 60s, as described in https://datatracker.ietf.org/doc/rfc8961/. The 10s proposed in #3595 seems dangerously short.

@cliffc-spirent
Contributor Author

I can change that to 60s. I wrote to quic@ietf.org and they pointed out the 60s from RFC 8961 as well.

@marten-seemann
Member

I saw that discussion. I'm ok with introducing a maximum of 60s (doesn't need to be configurable). It doesn't really help you though, does it? 60s still means that there can be minute-long outages.

@cliffc-spirent
Contributor Author

60s is definitely an improvement. I think we can live with that and explain the reasoning behind it. Sorry it's taking so long to get the CLA signed -- it's a big company and they have procedures for everything assuming you manage to track down the proper person.
