Alternator tests fails when the connection is closed before the reply is fully read #12143
Comments
@avikivity @raphaelsc I just ran the test using Avi's command line. Can one of you for whom this error reproduces please attach a snippet of the test output showing what failed, or just send me the output of the failed test run? Thanks.
Thanks @avikivity. The failure is in this test code:
    if Version(urllib3.__version__) < Version('1.26'):
        pytest.skip("urllib3 before 1.26.0 threw broken pipe and did not read response and cause issue #8195. Fixed by pull request urllib3/urllib3#1524")
    ...
    response = requests.post(req.url, headers=req.headers, data=req.body, verify=False)
    # In issue #8195, Alternator closed the connection early, causing the
    # library to incorrectly throw an exception (Broken Pipe) instead of noticing
    # the error code 413 which the server did send.
    assert response.status_code == 413
As the comment explains, old versions of urllib3 had a bug: they threw a Broken Pipe error when the server did not read the entire request, even if the server had already sent a response. This was fixed in urllib3, and indeed this test usually passes, but there appears to be a race condition which can still cause urllib3 to throw:
And this fails the test. @avikivity, @raphaelsc reported that he'd been seeing sporadic failures of this suite for months. Because this specific test was only un-xfailed a week ago (757d2a4), it cannot account for all of them, so there may be additional rare failures. If one of you has additional test/alternator failure logs, please send them to me (or just check if it's again
After writing most of a bug report to urllib3, I realized that this is NOT a urllib3 bug but a Seastar HTTP bug, which I anticipated in #8195 (comment) but forgot about. Note the "Connection Reset By Peer" message above. After the client receives a RST it cannot read the response, even if it wants to. So it is the responsibility of the server to do what needs to be done to avoid sending this RST. I'll open an issue and send a patch to xfail/skip/change this test until we fix that. So that I know whether this patch should say "Fixes" or just "Refs" for this issue, @avikivity / @raphaelsc please help me understand whether there are additional test failures, or just this one.
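The RST described above is what a TCP stack typically sends when a socket is closed while unread request data is still pending. As a minimal sketch of one common way a server can avoid it (an illustration only, not Seastar's code or its eventual fix), the server can send its 413 reply, half-close its sending side, and keep reading and discarding the rest of the request until the client closes. The names `reject_and_drain`, `post_oversized` and `demo` are hypothetical helpers invented for this example.

```python
import socket
import threading

# Minimal 413 reply; the exact headers are an assumption for this sketch.
RESPONSE = (b"HTTP/1.1 413 Payload Too Large\r\n"
            b"Content-Length: 0\r\nConnection: close\r\n\r\n")


def reject_and_drain(listener):
    """Accept one connection, reply 413, then drain the request before closing."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(4096)                # read only the start of the request
        conn.sendall(RESPONSE)         # reject early, before reading the body
        conn.shutdown(socket.SHUT_WR)  # send FIN, but keep the read side open
        while conn.recv(65536):        # discard the remaining body until EOF
            pass


def post_oversized(port, size=1 << 20):
    """Send a request with a large body and return the HTTP status code."""
    with socket.create_connection(("127.0.0.1", port)) as s:
        head = f"POST / HTTP/1.1\r\nHost: localhost\r\nContent-Length: {size}\r\n\r\n"
        s.sendall(head.encode() + b"x" * size)
        reply = b""
        while chunk := s.recv(4096):   # read until the server's FIN
            reply += chunk
    return int(reply.split()[1])


def demo():
    """Run the server and client once on an ephemeral localhost port."""
    with socket.socket() as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=reject_and_drain, args=(listener,))
        server.start()
        status = post_oversized(port)
        server.join()
    return status
```

Calling `demo()` should return the 413 status, because the client finishes sending before the connection is torn down. A real server would also bound the drain with a timeout or byte limit so that a slow or malicious client cannot hold the connection open indefinitely.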
The title now refers to just one bug, so your fix will fix it.
In a recent commit 757d2a4, we removed the "xfail" mark from the test test_manual_requests.py::test_too_large_request_content_length because it started to pass on more modern versions of Python, with a urllib3 bug fixed.

Unfortunately, the celebration was premature: it turns out that although the test now *usually* passes, it sometimes fails. This is caused by a Seastar bug, scylladb/seastar#1325, tracked in this project by scylladb#12166, which I opened. So unfortunately we need to add the "xfail" mark back to this test.

Note that although the test will now be marked "xfail", it will actually pass most of the time, so it will appear as "xpass" to people who run it. I put a note in the xfail reason string as a reminder of why this is happening.

Fixes scylladb#12143
Refs scylladb#12166
Refs scylladb/seastar#1325
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
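For readers unfamiliar with the mechanism, here is a minimal sketch of a non-strict pytest "xfail" mark with a reason string. The reason text, the test name and the placeholder body are assumptions made for illustration; they are not the actual contents of test_manual_requests.py.

```python
import pytest


# strict=False is pytest's default unless a project configures xfail_strict;
# it is spelled out here to show that an unexpected pass is reported as
# "xpass" rather than failing the run.
@pytest.mark.xfail(
    strict=False,
    reason="usually passes, but fails sporadically because of "
           "scylladb/seastar#1325 (tracked in scylladb#12166)",
)
def test_too_large_request_content_length_sketch():
    # Hypothetical stand-in for the real HTTP request made by the test.
    status_code = 413
    assert status_code == 413
```

With this mark, a run where the assertion holds shows up as "xpass" and a run where it fails shows up as "xfail"; neither turns the suite red.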
This is a test patch, fixing an earlier test patch that wasn't backported, so no need to backport this one either.
The same commit message appears again with one added trailer: Closes scylladb#12169
@avikivity and @raphaelsc reported that the full Alternator test suite has been flaky recently, sporadically failing even on fast machines and release builds (so it's not the usual not-long-enough timeout issue).
@avikivity gave an example: