New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alternator's response to oversized requests confuses the client drivers #8195
Comments
In #8196 @avikivity suggested that we should drop the |
I should note that this should probably be considered a bug in the Python HTTP client, not in the server. Other people have noticed this bug in the Python "requests" client too - see psf/requests#2422 (comment) and also issues 5296, 4062, 2422, 1574, 4445, 2490 in that project. The HTTP standard does allow a server to return a response and close the socket even before reading the entire request. In this case, when the client can't write any more ("broken pipe"), it still needs to read from the socket to get a response. Nevertheless, DynamoDB doesn't have this problem. I don't know if it reads the entire request to completion - which is easy to do although a bit silly (do we do this even if the request is 100 MB?), or somehow has another trick to avoid the broken pipe on the client. The HTTP RFC, while it allows sending a response before the request is fully read, mentions that even if the client is correctly written (which, as is now becoming evident, isn't the case), it is difficult to implement correctly in the HTTP server because of the "reset problem" explained in https://tools.ietf.org/html/rfc7230#section-6.6. Seastar's HTTP server probably doesn't handle this properly (it just closes the connection), which is another reason not to send a response before consuming the entire request. A quote from that link:
|
Like DynamoDB, Alternator rejects requests larger than some fixed maximum size (16MB). We had a test for this feature - test_too_large_request, but it was too blunt, and missed two issues: Refs #8195 Refs #8196 So this patch adds two better tests that reproduce these two issues: First, test_too_large_request_chunked verifies that an oversized request is detected even if the body is sent with chunked encoding. Second, both tests - test_too_large_request_chunked and test_too_large_request_content_length - verify that the rather limited (and arguably buggy) Python HTTP client is able to read the 413 status code - and doesn't report some generic I/O error. Both tests pass on DynamoDB, but fail on Alternator because of these two open issues. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210302154555.1488812-1-nyh@scylladb.com>
Like DynamoDB, Alternator rejects requests larger than some fixed maximum size (16MB). We had a test for this feature - test_too_large_request, but it was too blunt, and missed two issues: Refs #8195 Refs #8196 So this patch adds two better tests that reproduce these two issues: First, test_too_large_request_chunked verifies that an oversized request is detected even if the body is sent with chunked encoding. Second, both tests - test_too_large_request_chunked and test_too_large_request_content_length - verify that the rather limited (and arguably buggy) Python HTTP client is able to read the 413 status code - and doesn't report some generic I/O error. Both tests pass on DynamoDB, but fail on Alternator because of these two open issues. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210302154555.1488812-1-nyh@scylladb.com>
Like DynamoDB, Alternator rejects requests larger than some fixed maximum size (16MB). We had a test for this feature - test_too_large_request, but it was too blunt, and missed two issues: Refs #8195 Refs #8196 So this patch adds two better tests that reproduce these two issues: First, test_too_large_request_chunked verifies that an oversized request is detected even if the body is sent with chunked encoding. Second, both tests - test_too_large_request_chunked and test_too_large_request_content_length - verify that the rather limited (and arguably buggy) Python HTTP client is able to read the 413 status code - and doesn't report some generic I/O error. Both tests pass on DynamoDB, but fail on Alternator because of these two open issues. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210302154555.1488812-1-nyh@scylladb.com>
Curiously, this bug seems to have been solved on its own, and It may have been solved not by the AWS library but one of the the Python libraries it uses. I see the urllib3 library recently fixed this issue (see urllib3/urllib3#1524), and it reached urllib3 version 1.26.0 (dated 2020-11-10), I confirmed that botocore does use urllib3, but did not confirm that downgrading urllib3 reproduces the bug. There's still an interesting discussion whether closing the connection as Scylla does is the right thing or we should read the entire request before closing the connection, but I think it's an academic discussion and there's no need to do it if the test passes. We should remove the "xfail" marker (but only if the library is new enough?) and close this issue. |
We had an xfailing test that reproduced a case where Alternator tried to report an error when the request was too long, but the boto library didn't see this error and threw a "Broken Pipe" error instead. It turns out that this wasn't a Scylla bug but rather a bug in urllib3, which overzealously reported a "Broken Pipe" instead of trying to read the server's response. It turns out this issue was already fixed in urllib3/urllib3#1524 and now, on modern installations, the test that used to fail now passes and reports "XPASS". So in this patch we remove the "xfail" tag, and skip the test if running an old version of urllib3. Fixes scylladb#8195 Signed-off-by: Nadav Har'El <nyh@scylladb.com> x
We had an xfailing test that reproduced a case where Alternator tried to report an error when the request was too long, but the boto library didn't see this error and threw a "Broken Pipe" error instead. It turns out that this wasn't a Scylla bug but rather a bug in urllib3, which overzealously reported a "Broken Pipe" instead of trying to read the server's response. It turns out this issue was already fixed in urllib3/urllib3#1524 and now, on modern installations, the test that used to fail now passes and reports "XPASS". So in this patch we remove the "xfail" tag, and skip the test if running an old version of urllib3. Fixes scylladb#8195
We had an xfailing test that reproduced a case where Alternator tried to report an error when the request was too long, but the boto library didn't see this error and threw a "Broken Pipe" error instead. It turns out that this wasn't a Scylla bug but rather a bug in urllib3, which overzealously reported a "Broken Pipe" instead of trying to read the server's response. It turns out this issue was already fixed in urllib3/urllib3#1524 and now, on modern installations, the test that used to fail now passes and reports "XPASS". So in this patch we remove the "xfail" tag, and skip the test if running an old version of urllib3. Fixes scylladb#8195
This is a test patch to enable a test that was previously disabled. It shouldn't be backported. |
We had an xfailing test that reproduced a case where Alternator tried to report an error when the request was too long, but the boto library didn't see this error and threw a "Broken Pipe" error instead. It turns out that this wasn't a Scylla bug but rather a bug in urllib3, which overzealously reported a "Broken Pipe" instead of trying to read the server's response. It turns out this issue was already fixed in urllib3/urllib3#1524 and now, on modern installations, the test that used to fail now passes and reports "XPASS". So in this patch we remove the "xfail" tag, and skip the test if running an old version of urllib3. Fixes scylladb#8195 Closes scylladb#12038
Alternator places a limit (using Seastar's HTTP server feature of
content_length_limit
) on the request size. We have a test,test_manual_requests.py::test_too_large_request
, of what happens when the request's length exceeds the limit. This test still passes because it sees an error in the request - as expected. However, the type of error changed - probably in recent changes to the HTTP server - and we didn't notice. The error that the Cassandra driver shows today when sent an over-sized request (with a Content-Length header saying its over-the-limit size) is:Wireshark shows that the server read the
Content-Length
header, did not read the entire content, and instead sent a response:But evidently the client did not read this response before noticing that the connection was closed and writes it already did got rejected.
It would be better if "common" clients like the Cassandra drivers or even Python's request.post (which currently reports a "broken pipe" in this scenario) would be able to report the 413 error instead of some generic I/O error. Remember that drivers are likely to retry on such generic I/O error, while a permanent 413 error could tell the client not to both retry.
This issue can be reproduced by the tests:
test_manual_requests.py::test_too_large_request_chunked
test_manual_requests.py::test_too_large_request_content_length
Perhaps when sending a response while the client is still sending the content, we should continue reading the content - although AFAIR the RFC is clear that we don't need to. Or, maybe it's enough to not read the content but also not close the connection immediately, to allow the client to hang on the write and read the response before noticing a broken pipe. However, I don't know if this actually works.
Because the
content_length_limit
is a Seastar feature, we will probably need to open a corresponding issue for Seastar.The text was updated successfully, but these errors were encountered: