
Fix case where url is fragmented in httptools protocol #1263

Merged
merged 5 commits into from Feb 16, 2022

Conversation


@euri10 euri10 commented Nov 23, 2021

Fixes #1262


euri10 commented Dec 8, 2021

I spent two weeks waiting for the OP in the issue to take the lead on this, so no need to wait any further; this should be good for a review.

@euri10 euri10 requested a review from a team December 8, 2021 15:00
@abersheeran abersheeran left a comment


looks good!

@@ -222,10 +214,20 @@ def on_header(self, name: bytes, value: bytes):

    def on_headers_complete(self):
        http_version = self.parser.get_http_version()
        method = self.parser.get_method()
        self.scope["method"] = method.decode("ascii")
        if http_version != "1.1":
            self.scope["http_version"] = http_version
        if self.parser.should_upgrade():
            return
Member

Does it matter that "path", "raw_path", "query_string" are now not populated in the upgrade case?

Member Author

I'd be tempted to say no, since handle_upgrade does not need them and the ws protocols will build the scope themselves down the road.
That said, I think it doesn't hurt to put it before; both ways pass the tests. Would you prefer it that way?
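For reference, the three scope fields in question are all derived from the request target. uvicorn computes them with `httptools.parse_url()`; the sketch below is a stdlib-only approximation (the `populate_url_scope` helper is hypothetical, and percent-decoding details are simplified):

```python
import urllib.parse

def populate_url_scope(scope: dict, raw_target: bytes) -> None:
    # uvicorn derives these fields via httptools.parse_url(); this stdlib
    # approximation just splits the raw target at the first "?".
    path_part, _, query = raw_target.partition(b"?")
    scope["raw_path"] = path_part
    scope["path"] = urllib.parse.unquote(path_part.decode("ascii"))
    scope["query_string"] = query

scope: dict = {}
populate_url_scope(scope, b"/?param=" + b"q" * 10)
print(scope)  # {'raw_path': b'/', 'path': '/', 'query_string': b'param=qqqqqqqqqq'}
```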

Member Author

Let me know if this is OK for you this way, @tomchristie.
I added the latest master changes.

Member

I think I'm okay with this, yup. 👍

An alternate approach would be to keep the change footprint absolutely as low as possible. I'm almost always in favour of the lowest possible change footprint in PRs, because they're easier to review and lower risk.

In this case we could alternately approach the PR like this...

    def on_message_begin(self):
        self.url = b""

    def on_url(self, url):
        self.url += url

    def on_url_complete(self):
        # This isn't an `httptools` callback, but we need it because `on_url` can actually
        # be called multiple times, and we don't know on each call if it's complete or not.
        # Instead, we call into this method from `on_headers_complete`, so that we've got a
        # single point at which the URL is set.
        ...  # Existing body of `on_url()`

    def on_headers_complete(self):
        self.on_url_complete()
        ...  # Existing body of `on_headers_complete()`

Which would result in a really small changeset, which, as I say, I tend to think is a great thing.

Having said that, it's not a super complex PR. It looks good already, and I don't want to give you extra work, so I'm gonna okay this and then leave the final decision to you.
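To make the buffer-then-parse-once pattern above concrete without pulling in `httptools`, here is a self-contained sketch (the `Protocol` class and the fragment values are illustrative, not uvicorn's actual code):

```python
class Protocol:
    def on_message_begin(self):
        self.url = b""          # reset per request: fresh buffer each message
        self.complete_url = None

    def on_url(self, url: bytes):
        self.url += url         # may be called many times; just accumulate

    def on_url_complete(self):
        # Not an httptools callback: called from on_headers_complete so there
        # is a single point at which the full URL is processed.
        self.complete_url = self.url

    def on_headers_complete(self):
        self.on_url_complete()


proto = Protocol()
proto.on_message_begin()
# Simulate httptools invoking on_url once per received fragment:
for fragment in (b"/?par", b"am=qqqqqqqqqq"):
    proto.on_url(fragment)
proto.on_headers_complete()
print(proto.complete_url)  # b'/?param=qqqqqqqqqq'
```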

Member Author

OK, will merge it that way, I don't have the time!


euri10 commented Feb 14, 2022

On master, the added test gives:

test_http.py::test_fragmentation FAILED                                  [100%]INFO:     Started server process [3037919]
INFO:     Waiting for application startup.
INFO:     ASGI 'lifespan' protocol appears unsupported.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
WARNING:  Invalid HTTP request received.
Traceback (most recent call last):
  File "httptools/parser/parser.pyx", line 258, in httptools.parser.parser.cb_on_url
  File "/home/lotso/PycharmProjects/uvicorn/uvicorn/protocols/http/httptools_impl.py", line 188, in on_url
    parsed_url = httptools.parse_url(url)
  File "httptools/parser/url_parser.pyx", line 105, in httptools.parser.url_parser.parse_url
httptools.parser.errors.HttpParserInvalidURLError: invalid url b'am=qqqqqqqqqq'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lotso/PycharmProjects/uvicorn/uvicorn/protocols/http/httptools_impl.py", line 125, in data_received
    self.parser.feed_data(data)
  File "httptools/parser/parser.pyx", line 212, in httptools.parser.parser.HttpParser.feed_data
httptools.parser.errors.HttpParserCallbackError: `on_url` callback error

tests/protocols/test_http.py:753 (test_fragmentation)
b'HTTP/1.1 400 Bad Request' != b'HTTP/1.1 400 Bad Request'

Expected :b'HTTP/1.1 400 Bad Request'
Actual   :b'HTTP/1.1 400 Bad Request'

def test_fragmentation():
        def receive_all(sock):
            chunks = []
            while True:
                chunk = sock.recv(1024)
                if not chunk:
                    break
                chunks.append(chunk)
            return b"".join(chunks)
    
        app = Response("Hello, world", media_type="text/plain")
    
        def send_fragmented_req(path):
    
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.connect(("127.0.0.1", 8000))
            d = (
                f"GET {path} HTTP/1.1\r\n" "Host: localhost\r\n" "Connection: close\r\n\r\n"
            ).encode()
            split = len(path) // 2
            # send first part
            sock.sendall(d[:split])
            time.sleep(
                0.01
            )  # important, ~100% error, without this the chances of reproducing the error are pretty low
            # send second part
            sock.sendall(d[split:])
            # read response
            resp = receive_all(sock)
            sock.shutdown(socket.SHUT_RDWR)
            sock.close()
            return resp
    
        t = threading.Thread(target=lambda: uvicorn.run(app, http="httptools"))
        t.daemon = True
        t.start()
        time.sleep(1)  # wait for uvicorn to start
    
        path = "/?param=" + "q" * 10
        response = send_fragmented_req(path)
        bad_response = b"HTTP/1.1 400 Bad Request"
>       assert bad_response != response[: len(bad_response)]
E       AssertionError: assert b'HTTP/1.1 400 Bad Request' != b'HTTP/1.1 400 Bad Request'

test_http.py:795: AssertionError
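The invalid URL in the traceback above, `b'am=qqqqqqqqqq'`, falls straight out of the arithmetic of the test's split point; a stdlib-only reproduction:

```python
# Reproduce the fragment boundary used by the failing test: the request is
# split at len(path) // 2 bytes, so the second TCP chunk starts mid-query.
path = "/?param=" + "q" * 10
d = (
    f"GET {path} HTTP/1.1\r\n" "Host: localhost\r\n" "Connection: close\r\n\r\n"
).encode()
split = len(path) // 2  # 9, so the first chunk is b'GET /?par'

first, second = d[:split], d[split:]
print(first)                     # b'GET /?par'
print(second.split(b" ", 1)[0])  # b'am=qqqqqqqqqq'
```

Parsing the second chunk's leading bytes as a standalone URL is exactly what the unpatched `on_url` did, hence the `HttpParserInvalidURLError`.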

@tomchristie

Just to make it absolutely clear why we need this, I ran the following to confirm to myself that on_url really does get multiple callbacks...

import httptools


class ShowCallbacks:
    def on_message_begin(self):
        print("on_message_begin")

    def on_url(self, url: bytes):
        print("on_url", url)

    def on_header(self, name: bytes, value: bytes):
        print("on_header", name, value)

    def on_headers_complete(self):
        print("on_headers_complete")

    def on_body(self, body: bytes):
        print("on_body", body)

    def on_message_complete(self):
        print("on_message_complete")

    def on_chunk_header(self):
        print("on_chunk_header")

    def on_chunk_complete(self):
        print("on_chunk_complete")

    def on_status(self, status: bytes):
        print("on_status", status)


request = b'GET /hello_world HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n'
show_callbacks = httptools.HttpRequestParser(ShowCallbacks())

# parse the entire request in one go...
# show_callbacks.feed_data(request)

# parse the request, drip-feeding it byte-by-byte...
for index in range(len(request)):
    single_byte = request[index : index + 1]
    show_callbacks.feed_data(single_byte)

Which will output this when run...

on_message_begin
on_url b'/'
on_url b'h'
on_url b'e'
on_url b'l'
on_url b'l'
on_url b'o'
on_url b'_'
on_url b'w'
on_url b'o'
on_url b'r'
on_url b'l'
on_url b'd'
on_url b''
on_header b'Host' b'localhost'
on_header b'Connection' b'close'
on_headers_complete
on_message_complete

Copy link
Member

@tomchristie tomchristie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work, yup.

One comment here on an alternate approach, that'd possibly(?) be nice just because of the virtues of aiming for minimal-as-possible PRs wherever possible. But it's not a blocker. You're good either way.

@euri10 euri10 merged commit 399f90a into encode:master Feb 16, 2022
@euri10 euri10 deleted the 1262_chunked_url branch February 16, 2022 10:43

euri10 commented Feb 16, 2022

Big thanks to @div1001 for the repro!

Kludex pushed a commit to sephioh/uvicorn that referenced this pull request Oct 29, 2022
* Fix fragmented url

* Fixed a subtle bug introduced by setting self.url in the init; it should be reinitialized for every new connection

* Blacked

* Adapted failing tests provided in bug report
Successfully merging this pull request may close these issues.

httptools.parser.errors.HttpParserInvalidURLError on fragmented first line of the HTTP request.