Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#8570 Increase HTTP protocol length limits #12094

Open
wants to merge 8 commits into
base: trunk
Choose a base branch
from

Conversation

Gallaecio
Copy link
Contributor

@Gallaecio Gallaecio commented Jan 26, 2024

Scope and purpose

Fixes #8570, fixes scrapy/scrapy#355, closes scrapy/scrapy#5911.

Increase the length limit affecting the HTTP client parser to support scenarios seen in the wild, such as https://www.vapestore.co.uk/ returning a header with a 28k+ char value.

It is an arbitrary increase mimicking CPython’s, though, so there is a chance that in the future some new scenario for which these new limits are not enough warrants either increasing the limits further or making it easier to increase the limits from user code, which is not straightforward at the moment without relying on private APIs.

I also made a minor change on the contributor docs, switching pip install -e .[dev] to pip install -e .[dev,test] based on my experience. Not sure if that’s OK to do here, or if that’s the right change (maybe you prefer pip install -e .[test] to be suggested on the following section about running tests).

Regarding tests, there are no existing tests that check the specific values, which I think may make sense given these are trivially-selected limits. Also, there are already tests for the limits themselves (those tests use custom values, for performance reasons I imagine).

Post-merge to-do

@Gallaecio
Copy link
Contributor Author

please review

Copy link
Member

@adiroiban adiroiban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for the changes.

I left more comments inline.

I think they are an improvement over what we have in trunk

it is very important to have automated test for these values.

Let me know if you need help with the autoamted tests.

@@ -23,7 +23,7 @@ see in particular `Git and GitHub learning resources <https://help.github.com/ar
$ git clone https://github.com/twisted/twisted twisted
$ python3 -m venv ./venv
$ . venv/bin/activate
$ pip install -e .[dev]
$ pip install -e .[dev,test]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. We can have this for now.

In an ideal case, dev will also install test.
But for another PR.

Free free to send a PR if you have time to help with this.

src/twisted/protocols/basic.py Outdated Show resolved Hide resolved
src/twisted/protocols/newsfragments/8570.bugfix Outdated Show resolved Hide resolved
src/twisted/web/newsfragments/8570.bugfix Outdated Show resolved Hide resolved
src/twisted/web/http.py Outdated Show resolved Hide resolved
@@ -419,7 +422,7 @@ class LineOnlyReceiver(protocol.Protocol):

_buffer = b""
delimiter = b"\r\n"
MAX_LENGTH = 16384
MAX_LENGTH = _MAX_LINE_LENGTH
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this as a change to all LineOnlyReceiver and LineReceiver uses seems pretty excessive. The issue is with certain HTTP servers. Assuming the right fix is to make Twisted more liberal in what it accepts from HTTP servers, this doesn't mean all line-based protocols implemented with Twisted should have the same change applied.

I'd move this fix into the HTTP implementation and leave LineReceiver / LineOnlyReceiver alone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I have now limited the scope of the changes to HTTPClientParser.

@Gallaecio
Copy link
Contributor Author

please review

Copy link
Member

@adiroiban adiroiban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good. Many thanks!

I think that this is very close to being ready to merge.

I left a few minor comments, but that needs to be fixed before merge.

src/twisted/web/test/test_newclient.py Outdated Show resolved Hide resolved
src/twisted/web/test/test_newclient.py Show resolved Hide resolved
Comment on lines 978 to 979
"""Test that HTTP responses with header lines up to 65536 bytes long
are allowed."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is a copy/paste error.

Maybe something like this?

Suggested change
"""Test that HTTP responses with header lines up to 65536 bytes long
are allowed."""
"""
When the remote server returns an HTTP header that has a long line, will disconnected the connection.
"""

Is disconnection everything that it does?

I was expecting to also see an error and the reason for disconnection.

Is it possible to also test the error message in this test?
In yes, it wouldhelp to update the docstring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually originally going to ask about that. I was first checking by testing the resulting headers, which I will probably restore in combination with the disconnect check based on #12094 (comment), and was going to ask for a better way to check that things failed, as I don’t know. The relevant code simply calls “loseConnection” on the transport as far as I can tell.

Comment on lines +980 to +981
prefix = b"a: "
line = prefix + b"a" * (65536 - len(prefix) + 1) + b"\r\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this tests, does it matter the format of the header?

Maybe just write random long data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given https://github.com/twisted/twisted/pull/12094/files/3ade9422f18248993f2921a1439649bd39440c15#r1473704052, maybe it makes sense to keep this a valid header, so that checking that the parsed header list is empty has more value (i.e. it is not empty because it was not a valid header, it is empty due to the length limit).

src/twisted/web/test/test_newclient.py Show resolved Hide resolved
src/twisted/web/newsfragments/8570.bugfix Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect (too long) header from server causes Twisted Agent to fail Default downloader fails to get page
6 participants