New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix case where url is fragmented in httptools protocol #1263
Conversation
…d be reinitiliazed for every new connection
spent two weeks waiting for OP in the issue to take the lead on this, so no need to wait further, this should be good for a review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good!
@@ -222,10 +214,20 @@ def on_header(self, name: bytes, value: bytes): | |||
|
|||
def on_headers_complete(self): | |||
http_version = self.parser.get_http_version() | |||
method = self.parser.get_method() | |||
self.scope["method"] = method.decode("ascii") | |||
if http_version != "1.1": | |||
self.scope["http_version"] = http_version | |||
if self.parser.should_upgrade(): | |||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it matter that "path"
, "raw_path"
, "query_string"
are now not populated in the upgrade case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be tempted to say no since the handle_upgrade
does not need them and the ws protocols will build the scope themselves down the road.
this said it doesn't hurt I think to put it before, both ways pass the tests, would you prefer it that way ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me know if this is ok for you this way @tomchristie
I added latest master changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm okay with this, yup. 👍
An alternate approach would be to keep the change footprint absolutely as low as possible. I'm almost always in favour of the lowest possible change footprint in PRs, because they're easier to review and lower risk.
In this case we could alternately approach the PR like this...
def on_message_begin(self):
self.url = b""
def on_url(self, url):
self.url += url
def on_url_complete(self):
# This isn't an `httptools` callback, but we need it because `on_url` can actually
# be called multiple times, and we don't know on each call if it's complete or not.
# Instead into this method from `on_headers_complete`, so that we've got a single
# point at which the URL is set.
... # Existing body of `on_url()`
def on_headers_complete(self):
self.on_url_complete()
... # Existing body of `on_headers_complete()`
Which would result in a really small changeset. Which as I say, I tend to think is a great thing.
Having said that, it's not a super complex PR. It looks good already, and I don't want to give you extra work, so gonna okay this and then leave the final decision to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok will merge that way, I dont have the time !
in master the added test gives:
|
Just to make it absolutely clear why we need this, I ran the following to confirm to myself that import httptools
class ShowCallbacks:
def on_message_begin(self):
print("on_message_begin")
def on_url(self, url: bytes):
print("on_url", url)
def on_header(self, name: bytes, value: bytes):
print("on_header", name, value)
def on_headers_complete(self):
print("on_headers_complete")
def on_body(body: bytes):
print("on_body", body)
def on_message_complete(self):
print("on_message_complete")
def on_chunk_header(self):
print("on_chunk_header")
def on_chunk_complete(self):
print("on_chunk_complete")
def on_status(self, status: bytes):
print("on_status", status)
request = b'GET /hello_world HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n'
show_callbacks = httptools.HttpRequestParser(ShowCallbacks())
# parse the entire request in one go...
# show_callbacks.feed_data(request)
# parse the request, drip-feeding it byte-by-byte...
for index in range(len(request)):
single_byte = request[index : index + 1] Which will output this when run... on_message_begin
on_url b'/'
on_url b'h'
on_url b'e'
on_url b'l'
on_url b'l'
on_url b'o'
on_url b'_'
on_url b'w'
on_url b'o'
on_url b'r'
on_url b'l'
on_url b'd'
on_url b''
on_header b'Host' b'localhost'
on_header b'Connection' b'close'
on_headers_complete
on_message_complete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work, yup.
One comment here on an alternate approach, that'd possibly(?) be nice just because of the virtues of aiming for minimal-as-possible PRs wherever possible. But it's not a blocker. You're good either ways.
big thanks to @div1001 for the repro ! |
* Fix fragmented url * Fixed subtle bug introduced by setting self.url in the init, it should be reinitiliazed for every new connection * Blacked * Adapted failing tests provided in bug report
Fixes #1262