Fix for 'Failed to parse headers' warning #1439

timb07 · 2018-09-12T17:42:54Z

Adds a test case for #1438, and one possible fix.

As noted in the issue, it would be worth checking the history behind why the unparsed_data check (which this PR removes) was added in the first place, since the test suite doesn't exercise it.

sethmlarson · 2018-09-12T17:53:32Z

test/with_dummyserver/test_socketlevel.py

@@ -1328,6 +1328,35 @@ def test_header_without_colon_or_value(self):
        ])


+@pytest.mark.skipif(


I don't think we need to skip here; we should be able to run these test cases on all Python versions.

Okay, I'll take your word on this. :)

sethmlarson · 2018-09-12T17:54:46Z

test/with_dummyserver/test_socketlevel.py

+    issubclass(httplib.HTTPMessage, MimeToolMessage),
+    reason='Header parsing errors not available'
+)
+class TestOkayHeaders(SocketDummyServerTestCase):


Let's rename this test case to something like TestHeaderParsingContentType, and rename the helper function to match. Words like "okay" don't pinpoint the functionality being tested. It'd also be nice to have a comment within the unittest explaining why this is tested separately.

sethmlarson · 2018-09-12T17:55:29Z

src/urllib3/exceptions.py

@@ -236,8 +236,8 @@ def __init__(self, scheme):

 class HeaderParsingError(HTTPError):
    "Raised by assert_header_parsing, but we convert it to a log.warning statement."
-    def __init__(self, defects, unparsed_data):
-        message = '%s, unparsed data: %r' % (defects or 'Unknown', unparsed_data)
+    def __init__(self, defects):


I think it's important to keep the unparsed_data attribute. It's useful to see where the parser stopped in the raw bytes when a defect halts HTTP parsing.

Agreed; reverted.
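
For readers following along, here is a minimal sketch of what the two-argument form of the exception reports. The message format is copied from the removed lines in the diff above; the class name SketchHeaderParsingError, the Exception base class, and storing the arguments as attributes are assumptions made for illustration, not urllib3's actual code.

```python
class SketchHeaderParsingError(Exception):
    """Illustrative stand-in for the exception discussed in this thread."""

    def __init__(self, defects, unparsed_data):
        # Message format taken from the diff: defects fall back to 'Unknown'.
        message = '%s, unparsed data: %r' % (defects or 'Unknown', unparsed_data)
        super().__init__(message)
        # Assumption: keep both values around so callers can inspect them.
        self.defects = defects
        self.unparsed_data = unparsed_data


err = SketchHeaderParsingError([], 'leftover')
print(err)  # Unknown, unparsed data: 'leftover'
```

Keeping unparsed_data in the message shows where the parser stopped, which is the point made in the review comment.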

sethmlarson · 2018-09-12T17:57:22Z

src/urllib3/util/response.py

-
-    if defects or unparsed_data:
-        raise HeaderParsingError(defects=defects, unparsed_data=unparsed_data)
+    if defects:


Let's add back the unparsed_data getter. Let's continue checking if unparsed_data isn't a list and has len(unparsed_data) > 0 to preserve the same code path that was originally intended by the author.

Also add a comment explaining why we ensure that unparsed_data isn't a list, perhaps referencing the EmailMessage.get_payload() documentation.

Agreed; I think the new changes address your points.
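
The list check discussed in this thread can be illustrated with the standard library alone. This is a rough sketch, assuming Python 3's email.parser behaves like the parser behind httplib.HTTPMessage; the helper name leftover_text is hypothetical and this is not the code merged in this PR.

```python
from email.parser import Parser


def leftover_text(raw_headers):
    """Return non-empty text left after the header block, else None."""
    message = Parser().parsestr(raw_headers)
    # For message/* content types the parser attaches a nested message,
    # so the payload is a list of sub-messages rather than leftover text.
    if message.is_multipart():
        return None
    payload = message.get_payload()
    if isinstance(payload, (bytes, str)) and len(payload) > 0:
        return payload
    return None


plain = 'Content-Type: text/plain\r\n\r\n'
nested = 'Content-Type: message/rfc822\r\n\r\n'
broken = 'Content-Type: text/plain\r\n\r\nAnother: Header\r\n'

print(leftover_text(plain))
print(leftover_text(nested))
print(leftover_text(broken))
```

Only the third input leaves text behind. The message/* input yields a list payload, which is why treating any truthy payload as unparsed data produced a spurious warning.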

sethmlarson · 2018-09-12T17:57:45Z

test/test_exceptions.py

@@ -32,7 +32,6 @@ def test_exceptions(self, exception):

 class TestFormat(object):
    def test_header_parsing_errors(self):
-        hpe = HeaderParsingError('defects', 'unparsed_data')
+        hpe = HeaderParsingError('defects')


Revert this change after implementing above changes.

sethmlarson · 2018-09-12T18:02:22Z

We also definitely need a changelog entry for this, something to the effect of "Fixed bug where responses with header Content-Type: message/* erroneously raised HeaderParsingError." Then you can add yourself to the contributors list as well. :)

codecov-io · 2018-09-12T19:18:31Z

Codecov Report

Merging #1439 into master will not change coverage.
The diff coverage is n/a.

@@          Coverage Diff           @@
##           master   #1439   +/-   ##
======================================
  Coverage     100%    100%           
======================================
  Files          21      21           
  Lines        1790    1790           
======================================
  Hits         1790    1790

Impacted Files	Coverage Δ
src/urllib3/util/response.py	`100% <ø> (ø)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1821e73...7649111.


sethmlarson · 2018-09-13T00:42:39Z

test/with_dummyserver/test_socketlevel.py

@@ -1289,12 +1289,12 @@ def socket_handler(listener):
 )
 class TestBrokenHeaders(SocketDummyServerTestCase):

-    def _test_broken_header_parsing(self, headers):
+    def _test_broken_header_parsing(self, headers, unparsed_data_check=None):


Why were these test cases changed?

Do you mean the addition of unparsed_data_check specifically?

I looked at the various Stack Overflow questions where this had come up and realised we already had a test case where unparsed_data was non-empty, so I thought it would be good to check for that.

I meant the input data more. It looks like you've changed the number of CRLFs within the test cases; what was the motivation for those changes? I'd be more comfortable adding a new test case for unparsed_data than changing the existing ones.

As for the changes in the values of headers passed into _test_broken_header_parsing() and the subsequent generation of the value to pass into self.start_response_handler(), here's what two of the test cases were actually checking:

b'HTTP/1.1 200 OK\r\nContent-Length: 0\r\nContent-type: text/plain\r\n: Value\r\n\r\nAnother: Header\r\n\r\n'

b'HTTP/1.1 200 OK\r\nContent-Length: 0\r\nContent-type: text/plain\r\n:\r\n\r\nAnother: Header\r\n\r\n'

Looking in more detail, the '\r\n' at the end of each bytestring in this call:

self._test_broken_header_parsing([
    b': Value\r\n',
    b'Another: Header\r\n',
])

combined with b'\r\n'.join(headers) in _test_broken_header_parsing(), results in duplicate CRLFs between the headers supplied in the method call.

The duplicate CRLF before Another: Header meant that Another: Header was no longer part of the headers and wasn't being checked. In all the present test cases that didn't affect the result of the test, but only by coincidence. For instance, in the example above, if the order of the two bytestrings in the list were reversed, the test would fail because no warning would be logged.

The best fix seemed to me to remove the CRLFs from the test bytestrings, and ensure the headers are terminated with an explicit CRLFCRLF.
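
The effect described above can be reproduced outside the test suite. The following is a small sketch; the constant and the three helper functions are hypothetical names introduced for illustration, and only the byte strings themselves come from this thread.

```python
# Hypothetical stand-in for the fixed part of the test response.
STATUS_AND_FIXED_HEADERS = [
    b'HTTP/1.1 200 OK',
    b'Content-Length: 0',
    b'Content-type: text/plain',
]


def build_response_old(headers):
    # Old shape: callers pass CRLF-terminated headers and join adds another.
    return b'\r\n'.join(STATUS_AND_FIXED_HEADERS + headers) + b'\r\n'


def build_response_new(headers):
    # New shape: bare headers, with one explicit CRLFCRLF ending the block.
    return b'\r\n'.join(STATUS_AND_FIXED_HEADERS + headers) + b'\r\n\r\n'


def header_block(raw):
    # The header section ends at the first blank line (CRLFCRLF).
    return raw.split(b'\r\n\r\n', 1)[0]


old = build_response_old([b': Value\r\n', b'Another: Header\r\n'])
new = build_response_new([b': Value', b'Another: Header'])

print(b'Another: Header' in header_block(old))  # False
print(b'Another: Header' in header_block(new))  # True
```

With the old inputs, the blank line after ": Value" ends the header section early, so the second header lands in the body instead.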

What's next for this PR? Are you happy with my explanations, or would you prefer me to rework the tests?

sethmlarson

I understand your changes better now. Thanks for explaining! Just one more little comment that I just noticed.

sethmlarson · 2018-09-17T23:22:45Z

src/urllib3/util/response.py

@@ -60,7 +60,13 @@ def assert_header_parsing(headers):

    unparsed_data = None
    if get_payload:  # Platform-specific: Python 3.


Can we change the # Platform-specific: Python 3 comment into a different format so that the code below it counts towards coverage? Our coveragerc excludes any branch carrying this comment from coverage.

The # Platform-specific: Python 3 comment was there originally, not added by me; also I note that the docstring for the function says Only works on Python 3. (These were modified/added in commit 275cfda in 2015.) I'm not familiar with the project's usage of this marker and why it's excluded from coverage, so I'll follow your lead.

I know you didn't add them, sorry if what I said above was misunderstood :) We use markers like that to exclude a function from test coverage but (imo) a lot of these are actually not great because having more coverage is a good thing! So I try to remove them if we can get 100% test coverage there.

Since it's documented elsewhere I think it's good to just remove it. If we lose coverage on any of those branches we should create test cases to add it back.

I've made that change; it doesn't seem to have changed the coverage.

sethmlarson · 2018-09-19T01:46:22Z

I think this looks good enough to merge now, thanks @timb07! 🎉


timb07 added 2 commits September 13, 2018 03:36

Add test for spurious header warnings; urllib3#1438

0796637

Simplify assert_header_parsing; fixes urllib3#1438

7e78f79

sethmlarson requested changes Sep 12, 2018


jamesls mentioned this pull request Sep 12, 2018

"Failed to parse headers" warning logged when reading S3 object boto/botocore#1551

Closed

timb07 added 5 commits September 13, 2018 08:53

Revert "Simplify assert_header_parsing; fixes urllib3#1438"

5472700

This reverts commit 7e78f79.

Fix header test cases to remove unintended header separator

129a895

b'\r\n'.join(headers) means the headers passed into _test_broken_header_parsing don't need to be terminated with b'\r\n'; however the final header needs b'\r\n\r\n' appended

Improve test to check for unparsed_data value

7b13457

Rename header parsing test and fix so it passes; fixes urllib3#1438

5c0f06c

Updated changelog

7649111

sethmlarson reviewed Sep 13, 2018


Merge branch 'master' into bug1438

9fff9af

sethmlarson requested changes Sep 17, 2018


Remove platform-specific marker

f22b64b

sethmlarson approved these changes Sep 19, 2018


sethmlarson merged commit f4efcca into urllib3:master Sep 19, 2018

dependabot bot mentioned this pull request Mar 16, 2021

build(deps): bump urllib3 from 1.23 to 1.24.2 in /docker/qc shahcompbio/wgs#36

Closed

This was referenced Mar 17, 2021

Bump urllib3 from 1.21.1 to 1.24.2 shubham-shrivastava/smsify#5

Merged

Bump urllib3 from 1.23 to 1.24.2 in /modulos/proj-isolado-teste victorbertoldo/py#1

Closed

Dobatymo pushed a commit to Dobatymo/urllib3 that referenced this pull request Mar 16, 2022

Fix for parsing Content-Type: message/* Responses without warnings (u…

ce8dccf

…rllib3#1439)
