
Fix case where url is fragmented in httptools protocol #1263

Merged
merged 5 commits into from Feb 16, 2022

Conversation


@euri10 euri10 commented Nov 23, 2021

Fixes #1262


euri10 commented Dec 8, 2021

I spent two weeks waiting for the OP in the issue to take the lead on this, so no need to wait any further; this should be good for a review.

@euri10 euri10 requested a review from a team December 8, 2021 15:00
@abersheeran abersheeran left a comment


looks good!

@@ -222,10 +214,20 @@ def on_header(self, name: bytes, value: bytes):

    def on_headers_complete(self):
        http_version = self.parser.get_http_version()
        method = self.parser.get_method()
        self.scope["method"] = method.decode("ascii")
        if http_version != "1.1":
            self.scope["http_version"] = http_version
        if self.parser.should_upgrade():
            return
Member

Does it matter that "path", "raw_path", "query_string" are now not populated in the upgrade case?

Member Author

I'd be tempted to say no, since handle_upgrade does not need them and the ws protocols will build the scope themselves down the road.
That said, I think it doesn't hurt to put it before; both ways pass the tests. Would you prefer it that way?
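For reference, the three scope fields in question are all derived from the request target. uvicorn computes them with `httptools.parse_url()`; the sketch below is a stdlib-only approximation (the `populate_url_scope` helper is hypothetical, and percent-decoding details are simplified):

```python
import urllib.parse

def populate_url_scope(scope: dict, raw_target: bytes) -> None:
    # uvicorn derives these fields via httptools.parse_url(); this stdlib
    # approximation just splits the raw target at the first "?".
    path_part, _, query = raw_target.partition(b"?")
    scope["raw_path"] = path_part
    scope["path"] = urllib.parse.unquote(path_part.decode("ascii"))
    scope["query_string"] = query

scope: dict = {}
populate_url_scope(scope, b"/?param=" + b"q" * 10)
print(scope)  # {'raw_path': b'/', 'path': '/', 'query_string': b'param=qqqqqqqqqq'}
```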

Member Author

Let me know if this is OK for you this way, @tomchristie.
I added the latest master changes.

Member

I think I'm okay with this, yup. 👍

An alternate approach would be to keep the change footprint absolutely as low as possible. I'm almost always in favour of the lowest possible change footprint in PRs, because they're easier to review and lower risk.

In this case we could alternately approach the PR like this...

    def on_message_begin(self):
        self.url = b""

    def on_url(self, url):
        self.url += url

    def on_url_complete(self):
        # This isn't an `httptools` callback, but we need it because `on_url` can actually
        # be called multiple times, and we don't know on each call if it's complete or not.
        # Instead, we call into this method from `on_headers_complete`, so that we've got a
        # single point at which the URL is set.
        ...  # Existing body of `on_url()`

    def on_headers_complete(self):
        self.on_url_complete()
        ...  # Existing body of `on_headers_complete()`

Which would result in a really small changeset, which, as I say, I tend to think is a great thing.

Having said that, it's not a super complex PR. It looks good already, and I don't want to give you extra work, so I'm gonna okay this and then leave the final decision to you.
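To make the buffer-then-parse-once pattern above concrete without pulling in `httptools`, here is a self-contained sketch (the `Protocol` class and the fragment values are illustrative, not uvicorn's actual code):

```python
class Protocol:
    def on_message_begin(self):
        self.url = b""          # reset per request: fresh buffer each message
        self.complete_url = None

    def on_url(self, url: bytes):
        self.url += url         # may be called many times; just accumulate

    def on_url_complete(self):
        # Not an httptools callback: called from on_headers_complete so there
        # is a single point at which the full URL is processed.
        self.complete_url = self.url

    def on_headers_complete(self):
        self.on_url_complete()


proto = Protocol()
proto.on_message_begin()
# Simulate httptools invoking on_url once per received fragment:
for fragment in (b"/?par", b"am=qqqqqqqqqq"):
    proto.on_url(fragment)
proto.on_headers_complete()
print(proto.complete_url)  # b'/?param=qqqqqqqqqq'
```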

Member Author

OK, will merge it that way, I don't have the time!


euri10 commented Feb 14, 2022

On master, the added test gives:

test_http.py::test_fragmentation FAILED                                  [100%]INFO:     Started server process [3037919]
INFO:     Waiting for application startup.
INFO:     ASGI 'lifespan' protocol appears unsupported.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
WARNING:  Invalid HTTP request received.
Traceback (most recent call last):
  File "httptools/parser/parser.pyx", line 258, in httptools.parser.parser.cb_on_url
  File "/home/lotso/PycharmProjects/uvicorn/uvicorn/protocols/http/httptools_impl.py", line 188, in on_url
    parsed_url = httptools.parse_url(url)
  File "httptools/parser/url_parser.pyx", line 105, in httptools.parser.url_parser.parse_url
httptools.parser.errors.HttpParserInvalidURLError: invalid url b'am=qqqqqqqqqq'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lotso/PycharmProjects/uvicorn/uvicorn/protocols/http/httptools_impl.py", line 125, in data_received
    self.parser.feed_data(data)
  File "httptools/parser/parser.pyx", line 212, in httptools.parser.parser.HttpParser.feed_data
httptools.parser.errors.HttpParserCallbackError: `on_url` callback error

tests/protocols/test_http.py:753 (test_fragmentation)
b'HTTP/1.1 400 Bad Request' != b'HTTP/1.1 400 Bad Request'

Expected :b'HTTP/1.1 400 Bad Request'
Actual   :b'HTTP/1.1 400 Bad Request'

def test_fragmentation():
        def receive_all(sock):
            chunks = []
            while True:
                chunk = sock.recv(1024)
                if not chunk:
                    break
                chunks.append(chunk)
            return b"".join(chunks)
    
        app = Response("Hello, world", media_type="text/plain")
    
        def send_fragmented_req(path):
    
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.connect(("127.0.0.1", 8000))
            d = (
                f"GET {path} HTTP/1.1\r\n" "Host: localhost\r\n" "Connection: close\r\n\r\n"
            ).encode()
            split = len(path) // 2
            # send first part
            sock.sendall(d[:split])
            time.sleep(
                0.01
            )  # important, ~100% error, without this the chances of reproducing the error are pretty low
            # send second part
            sock.sendall(d[split:])
            # read response
            resp = receive_all(sock)
            sock.shutdown(socket.SHUT_RDWR)
            sock.close()
            return resp
    
        t = threading.Thread(target=lambda: uvicorn.run(app, http="httptools"))
        t.daemon = True
        t.start()
        time.sleep(1)  # wait for uvicorn to start
    
        path = "/?param=" + "q" * 10
        response = send_fragmented_req(path)
        bad_response = b"HTTP/1.1 400 Bad Request"
>       assert bad_response != response[: len(bad_response)]
E       AssertionError: assert b'HTTP/1.1 400 Bad Request' != b'HTTP/1.1 400 Bad Request'

test_http.py:795: AssertionError
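The invalid URL in the traceback above, `b'am=qqqqqqqqqq'`, falls straight out of the arithmetic of the test's split point; a stdlib-only reproduction:

```python
# Reproduce the fragment boundary used by the failing test: the request is
# split at len(path) // 2 bytes, so the second TCP chunk starts mid-query.
path = "/?param=" + "q" * 10
d = (
    f"GET {path} HTTP/1.1\r\n" "Host: localhost\r\n" "Connection: close\r\n\r\n"
).encode()
split = len(path) // 2  # 9, so the first chunk is b'GET /?par'

first, second = d[:split], d[split:]
print(first)                     # b'GET /?par'
print(second.split(b" ", 1)[0])  # b'am=qqqqqqqqqq'
```

Parsing the second chunk's leading bytes as a standalone URL is exactly what the unpatched `on_url` did, hence the `HttpParserInvalidURLError`.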

@tomchristie

Just to make it absolutely clear why we need this, I ran the following to confirm to myself that on_url really does get multiple callbacks...

import httptools


class ShowCallbacks:
    def on_message_begin(self):
        print("on_message_begin")

    def on_url(self, url: bytes):
        print("on_url", url)

    def on_header(self, name: bytes, value: bytes):
        print("on_header", name, value)

    def on_headers_complete(self):
        print("on_headers_complete")

    def on_body(self, body: bytes):
        print("on_body", body)

    def on_message_complete(self):
        print("on_message_complete")

    def on_chunk_header(self):
        print("on_chunk_header")

    def on_chunk_complete(self):
        print("on_chunk_complete")

    def on_status(self, status: bytes):
        print("on_status", status)


request = b'GET /hello_world HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n'
show_callbacks = httptools.HttpRequestParser(ShowCallbacks())

# parse the entire request in one go...
# show_callbacks.feed_data(request)

# parse the request, drip-feeding it byte-by-byte...
for index in range(len(request)):
    single_byte = request[index : index + 1]
    show_callbacks.feed_data(single_byte)

Which will output this when run...

on_message_begin
on_url b'/'
on_url b'h'
on_url b'e'
on_url b'l'
on_url b'l'
on_url b'o'
on_url b'_'
on_url b'w'
on_url b'o'
on_url b'r'
on_url b'l'
on_url b'd'
on_url b''
on_header b'Host' b'localhost'
on_header b'Connection' b'close'
on_headers_complete
on_message_complete

Copy link
Member

@tomchristie tomchristie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work, yup.

One comment here on an alternate approach, that'd possibly(?) be nice just because of the virtues of aiming for minimal-as-possible PRs wherever possible. But it's not a blocker. You're good either way.

@euri10 euri10 merged commit 399f90a into encode:master Feb 16, 2022
@euri10 euri10 deleted the 1262_chunked_url branch February 16, 2022 10:43

euri10 commented Feb 16, 2022

Big thanks to @div1001 for the repro!

Kludex pushed a commit to sephioh/uvicorn that referenced this pull request Oct 29, 2022
* Fix fragmented url

* Fixed a subtle bug introduced by setting self.url in the init; it should be reinitialized for every new connection

* Blacked

* Adapted failing tests provided in bug report
Successfully merging this pull request may close these issues.

httptools.parser.errors.HttpParserInvalidURLError on fragmented first line of the HTTP request.