New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for 'Failed to parse headers' warning #1439
Conversation
@@ -1328,6 +1328,35 @@ def test_header_without_colon_or_value(self): | |||
]) | |||
|
|||
|
|||
@pytest.mark.skipif( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to skip here, we should be able to run these test cases on all Python versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'll take your word on this. :)
issubclass(httplib.HTTPMessage, MimeToolMessage), | ||
reason='Header parsing errors not available' | ||
) | ||
class TestOkayHeaders(SocketDummyServerTestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's change the name of this test case to something like: TestHeaderParsingContentType
? Same for the helper function, phrases like "okay" don't pinpoint the functionality being tested. It'd also be nice to have a comment within the unittest about why this is being tested separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
src/urllib3/exceptions.py
Outdated
@@ -236,8 +236,8 @@ def __init__(self, scheme): | |||
|
|||
class HeaderParsingError(HTTPError): | |||
"Raised by assert_header_parsing, but we convert it to a log.warning statement." | |||
def __init__(self, defects, unparsed_data): | |||
message = '%s, unparsed data: %r' % (defects or 'Unknown', unparsed_data) | |||
def __init__(self, defects): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's important to maintain the unparsed_data attribute, it's useful to see where the parser finished parsing the raw bytes when there is a defect that stops HTTP parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed; reverted.
src/urllib3/util/response.py
Outdated
|
||
if defects or unparsed_data: | ||
raise HeaderParsingError(defects=defects, unparsed_data=unparsed_data) | ||
if defects: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add back the unparsed_data getter. Let's continue checking if unparsed_data
isn't a list and has len(unparsed_data) > 0
to preserve the same code path that was originally intended by the author.
Also add a comment about why we're ensuring that unparsed_data isn't a list, maybe reference EmailMessage.get_payload() documentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed; I think the new changes address your points.
test/test_exceptions.py
Outdated
@@ -32,7 +32,6 @@ def test_exceptions(self, exception): | |||
|
|||
class TestFormat(object): | |||
def test_header_parsing_errors(self): | |||
hpe = HeaderParsingError('defects', 'unparsed_data') | |||
hpe = HeaderParsingError('defects') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert this change after implementing above changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Also we definitely need a changelog entry for this. Something to the effect of "Fixed bug where responses |
Codecov Report
@@ Coverage Diff @@
## master #1439 +/- ##
======================================
Coverage 100% 100%
======================================
Files 21 21
Lines 1790 1790
======================================
Hits 1790 1790
Continue to review full report at Codecov.
|
This reverts commit 7e78f79.
b'\r\n'.join(headers) means the headers passed into _test_broken_header_parsing don't need to be terminated with b'\r\n'; however the final header needs b'\r\n\r\n' appended
@@ -1289,12 +1289,12 @@ def socket_handler(listener): | |||
) | |||
class TestBrokenHeaders(SocketDummyServerTestCase): | |||
|
|||
def _test_broken_header_parsing(self, headers): | |||
def _test_broken_header_parsing(self, headers, unparsed_data_check=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why were these test cases changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean the addition of unparsed_data_check specifically?
I looked at the various Stackoverflow questions where this had come up, and realised we already had a test case where unparsed_data was non-empty, so I thought it would be good to check for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant more the input data, it looks like you've changed the number of CRLF within the test case, what was the motivation for those changes? I'd be more comfortable adding a new testcase for unparsed_data than changing these existing ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the changes in the values of headers
passed into _test_broken_header_parsing()
and the subsequent generation of the value to pass into self.start_response_handler()
, here's what two of the test cases were actually checking:
b'HTTP/1.1 200 OK\r\nContent-Length: 0\r\nContent-type: text/plain\r\n: Value\r\n\r\nAnother: Header\r\n\r\n'
b'HTTP/1.1 200 OK\r\nContent-Length: 0\r\nContent-type: text/plain\r\n:\r\n\r\nAnother: Header\r\n\r\n'
Looking in more detail, the '\r\n'
at the end of each bytestring in this call:
self._test_broken_header_parsing([
b': Value\r\n',
b'Another: Header\r\n',
])
combined with b'\r\n'.join(headers)
in _test_broken_header_parsing()
result in duplicate CRLFs between each of the headers supplied in the method call.
The duplicate CRLF before Another: Header
meant that Another: Header
wasn't a part of the headers anymore, and wasn't being checked. In all the present test cases that didn't affect the result of the test, but that was more by coincidence. For instance, in the example above, if the order of the two bytestrings in the list was reversed, the test would fail, that is, no warning would be logged.
The best fix seemed to me to remove the CRLFs from the test bytestrings, and ensure the headers are terminated with an explicit CRLFCRLF.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's next for this PR? Are you happy with my explanations, or would you prefer me to rework the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your changes better now. Thanks for explaining! Just one more little comment that I just noticed.
src/urllib3/util/response.py
Outdated
@@ -60,7 +60,13 @@ def assert_header_parsing(headers): | |||
|
|||
unparsed_data = None | |||
if get_payload: # Platform-specific: Python 3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change the # Platform-specific: Python 3
into a different comment format to ensure we're getting coverage below. Our coveragerc
makes any branch with this comment not count towards coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The # Platform-specific: Python 3
comment was there originally, not added by me; also I note that the docstring for the function says Only works on Python 3.
(These were modified/added in commit 275cfda in 2015.) I'm not familiar with the project's usage of this marker and why it's excluded from coverage, so I'll follow your lead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you didn't add them, sorry if what I said above was misunderstood :) We use markers like that to exclude a function from test coverage but (imo) a lot of these are actually not great because having more coverage is a good thing! So I try to remove them if we can get 100% test coverage there.
Since it's documented elsewhere I think it's good to just remove it. If we lose coverage on any of those branches we should create test cases to add it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made that change; it doesn't seem to have changed the coverage.
I think this looks good enough to merge now, thanks @timb07! 🎉 |
Adds a test case for #1438, and one possible fix.
As noted in the issue, it would be worth checking the history behind why the
unparsed_data
check (which this PR removes) was added in the first place, since the test suite doesn't exercise it.