cypress:server should really retry requests on network failures #24716
Comments
Hi @ckoncz-hwx. Thank you for logging this issue. What browser are you using for these tests? We made a change about a year ago that prevents requests going through the proxy from being retried, as Chrome itself changed to also retry failures. The combination caused a large number of retries, which in turn caused various issues.
Hi @ryanthemanuel, thank you for looking into this!
So a bit of context here, courtesy of @flotwig (thanks!).
It's possible that Chromium changed behavior again in a recent version; I'd be curious whether you see the same behavior with failed network requests when trying an older version of Chrome, say 97-99. That, or a reduced reproduction that we can run locally (without spawning 60k requests / waiting 10 min), would be very helpful in tracking this down.
Thank you, @BlueWinds, for the update. The way to reproduce this is to issue two consecutive HTTP requests where the second one is delayed relative to the first by the target server's keep-alive connection timeout. If we repeat this sequence enough times, we eventually hit a situation where the client thinks a connection is still open but the server has already sent its FIN packet. My production server runs CherryPy, which has a keep-alive timeout of 10 seconds.

While searching the net for how to achieve the same timeout on Node.js (so that I can provide a JavaScript-based repro environment), I ran across this article, which describes a situation quite similar to mine: https://shuheikagawa.com/blog/2019/04/25/keep-alive-timeout/ The recommended solution there is to have a shorter timeout on the load balancer/proxy than on the upstream server.
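For reference, here is a minimal sketch (my own untested assumption, not a verified repro) of a Node.js server that mimics CherryPy's 10-second keep-alive timeout, using the `keepAliveTimeout` and `headersTimeout` settings discussed in that article:

```js
const http = require('http');

// Hypothetical repro server: mimic CherryPy's 10-second keep-alive
// timeout so idle connections get a FIN from the server.
const server = http.createServer((req, res) => {
  res.end('ok');
});

// Close idle keep-alive sockets after 10 seconds, like CherryPy does.
server.keepAliveTimeout = 10_000;
// Per the linked article, headersTimeout should exceed keepAliveTimeout.
server.headersTimeout = 11_000;

server.listen(8080);
```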
That article seems to be describing the same situation, good find. Cypress uses the default Node timeout of 5s, and I don't think we have functionality to modify this directly on the server. The proxy does copy most headers from the original request intact - I'm not sure how it would handle a keep-alive timeout set on the request from the browser. It might very well be overridden (and therefore a dead end).

One alternative would be to set retries on your tests. I'm assuming this isn't a single test that issues 30k requests. With as rare as the failure is, setting retries should mask most of these failures. I think we'd probably accept a PR if you wanted to expose the server's keep-alive timeout as a configuration option.
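For example, a sketch of enabling test retries in `cypress.config.js` (the values here are illustrative, not a recommendation from this thread):

```js
// cypress.config.js - a sketch of the suggested workaround; the retry
// counts are illustrative.
const { defineConfig } = require('cypress');

module.exports = defineConfig({
  retries: {
    runMode: 2,  // retry a failing test up to twice during `cypress run`
    openMode: 0, // no retries in interactive mode
  },
});
```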
Yes, configuring retries is an option. However, on the 5-second timeout: the TCP flow included in the description seems to contradict that. The connection between the Cypress server and the server under test was open for more than 10 seconds. My first idea was that there is a one-to-one mapping between client and upstream connections, so if Cypress closes the client connection after the 5s timeout, it would close the matching server connection too. This does not seem to be the case. Is there then a connection pool for the Cypress -> upstream server connections that is managed independently from the client connections? And does that pool have a different timeout (larger than 10 seconds)?
Yes, there's definitely a separate connection pool. All requests from the browser run through the Cypress server - a Node process which in turn makes calls to the backend server. That's how `cy.intercept()` is able to observe and modify traffic.

So it's probably the server pool that has a different timeout. We don't explicitly set a timeout or create a pool anywhere I was able to locate, so it's probably a Node default somewhere that's getting used.
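A rough sketch of that architecture (hypothetical code, not Cypress's actual proxy) showing why the two TCP legs have independent lifetimes:

```js
const http = require('http');

// The browser's socket to the proxy and the proxy's pooled socket to
// the upstream are two independent TCP legs. The upstream pool is
// owned by an http.Agent whose idle timeout has nothing to do with
// the browser-side connection.
const upstreamAgent = new http.Agent({ keepAlive: true });

http.createServer((clientReq, clientRes) => {
  const proxyReq = http.request({
    host: 'upstream.example.com', // assumed upstream host
    port: 8080,
    path: clientReq.url,
    method: clientReq.method,
    headers: clientReq.headers,
    agent: upstreamAgent, // pooled independently of the client socket
  }, (proxyRes) => {
    clientRes.writeHead(proxyRes.statusCode, proxyRes.headers);
    proxyRes.pipe(clientRes);
  });
  clientReq.pipe(proxyReq);
}).listen(3000);
```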
One thing to note is that we internally use a custom HTTP agent for these upstream requests.
@BlueWinds Yeah, we extended Node.js's default `http.Agent` to handle this. Node.js has made a lot of progress on their own Keep-Alive support over the past few years, so maybe we can get rid of some of the logic we have in there to support Keep-Alive for upstream proxied requests.
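As an illustration only (this is not Cypress's actual agent), extending Node's default `http.Agent` looks roughly like this; with `keepAlive` enabled, modern Node versions manage idle-socket reuse on their own:

```js
const http = require('http');

// A minimal sketch of extending Node's default http.Agent.
class InstrumentedAgent extends http.Agent {
  createConnection(options, callback) {
    const socket = super.createConnection(options, callback);
    // Log when the upstream pool loses a socket, e.g. on a server FIN.
    socket.on('close', () => console.log('upstream socket closed'));
    return socket;
  }
}

// keepAlive: true lets Node pool and reuse idle upstream sockets.
const agent = new InstrumentedAgent({ keepAlive: true });
```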
Current behavior
I am performing a Cypress run that issues around 30,000 requests to a CherryPy-powered application.
(I am counting the `cypress:server:request successful response received` messages in the Cypress debug output.) Out of these requests, about 1-10 fail with ECONNRESET errors. Here is one:
(The same formatted manually:)
Despite `retryOnNetworkFailure: true` appearing in the message above, I actually see no retry attempts, and the browser request is aborted. This results in random test failures (status null for API requests and window.load event timeouts for resources included in the HTML page).

I was able to track down the above reset event, and it seems to be caused by bad/unfortunate timing:
the CherryPy server closes the idle connection after a 10-second timeout, but Cypress issues the request just before the FIN arrives.
See the attached screenshot: the server's FIN arrived just 0.2 milliseconds after the second GET was issued by Cypress.
The RST sent by the server is entirely justified.
Desired behavior
In this situation Cypress should resubmit the failed request and provide the browser with the real upstream response.
Test code to reproduce
This issue cannot be reproduced reliably. However, it occurs with high probability during long Cypress runs; a hypothetical stress test is sketched below.
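A sketch along these lines (endpoint and timings are my invention, not my actual suite) should raise the probability of hitting the race:

```js
// Hypothetical stress test: pause long enough between requests that
// the second one races the server's 10-second keep-alive FIN.
describe('keep-alive race', () => {
  for (let i = 0; i < 100; i++) {
    it(`request pair ${i}`, () => {
      cy.request('/api/ping'); // opens or reuses a pooled connection
      cy.wait(10_000);         // idle for roughly the server timeout
      cy.request('/api/ping'); // may race the server's FIN/RST
    });
  }
});
```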
Cypress Version
10.11.0
Node version
16.14.2
Operating System
Debian GNU/Linux 11 (bullseye) as Docker image cypress/included:10.11.0
Debug Logs
Other
No response