Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternator's health-check request doesn't work properly with HTTPS #8691

Closed
nyh opened this issue May 23, 2021 · 8 comments
Closed

Alternator's health-check request doesn't work properly with HTTPS #8691

nyh opened this issue May 23, 2021 · 8 comments
Assignees
Labels
area/alternator Alternator related Issues type/bug
Milestone

Comments

@nyh
Copy link
Contributor

nyh commented May 23, 2021

When Alternator is configured to listen to HTTPS, the health-check service should work with HTTPS as well. This works in Amazon DynamoDB:

$ curl https://dynamodb.us-east-1.amazonaws.com
healthy: dynamodb.us-east-1.amazonaws.com

We have a unit test for this, but to run the test in HTTPS mode we need to do it explicitly, and we don't do it in Jenkins, so this appears to be a regression: test/alternator/run --https test_health.py now fails. When running all tests on https (test/alternator/run --https), most pass, but we have all test_health.py tests fail, and also test_cors.py tests fail and test_scylla.py. We need to fix all of those!

Also noticed by @bhalevy in dtests: https://github.com/scylladb/scylla-dtest/pull/2138.

@nyh nyh added type/bug area/alternator Alternator related Issues labels May 23, 2021
@nyh nyh self-assigned this May 23, 2021
@nyh
Copy link
Contributor Author

nyh commented May 23, 2021

This is most likely a recent regression - the test test/alternator/run --https test_health.py passes on scylla-4.4.1.
I'll do a git bisect now.

@nyh
Copy link
Contributor Author

nyh commented May 23, 2021

Bisection shows that the offending commit is f41dac2 (by me):

commit f41dac2a3ae0b631b7381072dfc7d7fd03b3f1fb
Author: Nadav Har'El <nyh@scylladb.com>
Date:   Wed Mar 10 00:25:25 2021 +0200

    alternator: avoid large contiguous allocation for request body

This is a new commit, which did not reach 4.4 or earlier, but the fix will need to be backported to 4.5.

@nyh
Copy link
Contributor Author

nyh commented May 23, 2021

It turns out that two Seastar bugs conspire to create this problem (@wmitros who's the expert in that new Seastar code):

  1. Seastar's new "request body streaming" feature (scylladb/seastar@4fa0e4b) recognizes when the application's request handler did not fully read its request's body, and marks the connection for closing as it can no longer be reused. However, in this case, we're talking about the health check which is a GET request with an empty body. Because our handler expects no body, it doesn't even try to read it. For some reason with normal HTTP, not reading the zero-length body was fine and the EOF was recognized, but with the HTTPS layer, the server doesn't know the unread body is already at the EOF. The handler would need to do a read to get that EOF on the content-length stream.
  2. Although Scylla can easily skip_entire_stream() in the health handler and fix this problem (I verified), and perhaps should, things should have worked even without it (except the unfortunate disconnection) - our response should have still be sent. I saw that we are setting up a response _content, but it is simply not sent to the client before the disconnection. This appears to be a Seastar bug.

Note that although we can circumvent both Seastar bugs in Scylla by simply using skip_entire_stream() in all Scylla's requests that don't care about the request body (including health check and error messages), I think it is still important to fix these two bugs in Seastar. The second is a straight bug, and the first is a performance problem - it causes an HTTPS server which only cares about GET and no bodies and never has anything in its body to not reuse connections - when it could (and would only need to close the connection if the client unexpectedly sends a body and the server doesn't read it).

@nyh
Copy link
Contributor Author

nyh commented May 23, 2021

I analyzed the two issues listed above and opened two Seastar issues scylladb/seastar#907 and scylladb/seastar#906 respectively.

Note that although fixing scylladb/seastar#906 will solve this issue, fixing that issue alone is not enough. The test will start working, but we won't notice that the connection got closed without reason. To fix that remaining problem we must either fix scylladb/seastar#907 or do a fix in Scylla:

Scylla can do a skip_entire_stream() when it doesn't care about the body. This includes healthcheck, as well as error messages. If we do this, it will also solve another longstanding issue - #8195 (however, the problem there is that the body length limit is done in Seastar, so it can't be fixed in Scylla).

If we do this workaround in Scylla, fixing the two Seastar issues will be good, but not necessary for solving this bug. HOWEVER, this is one extra complication - one of the tests, test_health_only_works_for_root_path, sends a bad GET request ("/abc") and expects a 404. This 404 is sent by Seastar, not Scylla, so we can't fix the not-reading-to-the-end-of-the-input in Scylla. To solve this without changes to Seastar we can change the route we use (server::set_routes) to use any URL, not just "/", and have the Scylla code check whether we got "/" or "/localnodes" (fine) or anything else (404).

@nyh
Copy link
Contributor Author

nyh commented May 23, 2021

I sent a patch to the Seastar mailing list, titled "httpd: allow handler to not read an empty content", fixing Seastar issue 907 and making all the broken HTTPS tests involving empty request bodies to work again.

avikivity pushed a commit to scylladb/seastar that referenced this issue May 24, 2021
Starting in commit 4fa0e4b, an HTTP handler
can read a request's body using an input stream instead of as a string.
If the handler does not read the entire request body, the client can not
send any further requests on this socket, so the server needs to close it.
Conversely, if the handler *did* read the content, the server may keep the
socket open.

To check whether the handler read the content we checked the content
stream's eof() state. However, this may only become true after a read()
noticed the content has reached its end (specified by content-length or
chunked encoding). But one interesting case we missed is a simple handler
which assumes an empty request body, and doesn't bother to read it at all.
In this case, even though the request is empty, and the handler didn't
miss any content by not reading it, we close the connection.

The solution in this patch is to replace the check for eof() by an attempt
to read() from the content stream. In the typical case the handler either
read the entire content or there was nothing to read in the first place,
so this read() will return nothing immediately - and we allow reusing the
connection. If anything is returned by the read(), it is ignored, and the
connection is closed (we may still not read the entire content).

Fixes #907
Refs scylladb/scylladb#8691

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210523184038.425954-1-nyh@scylladb.com>
@nyh
Copy link
Contributor Author

nyh commented May 24, 2021

I confirmed that the Alternator tests with HTTPS work on master now. Thanks @avikivity for the commits.
As I mentioned above, we still need to backport the Seastar patch to 4.5, but not to earlier branches (since this is a recent regression).

avikivity pushed a commit to scylladb/scylla-seastar that referenced this issue May 25, 2021
Starting in commit 4fa0e4b, an HTTP handler
can read a request's body using an input stream instead of as a string.
If the handler does not read the entire request body, the client can not
send any further requests on this socket, so the server needs to close it.
Conversely, if the handler *did* read the content, the server may keep the
socket open.

To check whether the handler read the content we checked the content
stream's eof() state. However, this may only become true after a read()
noticed the content has reached its end (specified by content-length or
chunked encoding). But one interesting case we missed is a simple handler
which assumes an empty request body, and doesn't bother to read it at all.
In this case, even though the request is empty, and the handler didn't
miss any content by not reading it, we close the connection.

The solution in this patch is to replace the check for eof() by an attempt
to read() from the content stream. In the typical case the handler either
read the entire content or there was nothing to read in the first place,
so this read() will return nothing immediately - and we allow reusing the
connection. If anything is returned by the read(), it is ignored, and the
connection is closed (we may still not read the entire content).

Fixes #907
Refs scylladb/scylladb#8691

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210523184038.425954-1-nyh@scylladb.com>
(cherry picked from commit f0f28d0)
avikivity added a commit that referenced this issue May 25, 2021
* seastar dadd299e7d...dab10ba6ad (1):
  > httpd: allow handler to not read an empty content
Fixes #8691.
@avikivity
Copy link
Member

Backported to 4.5.

@nyh
Copy link
Contributor Author

nyh commented May 25, 2021

Thanks @avikivity !

@DoronArazii DoronArazii added this to the 4.6 milestone May 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/alternator Alternator related Issues type/bug
Projects
None yet
Development

No branches or pull requests

4 participants